I have a survey file in which rows are observations and columns are questions.
Here is some fake data; it looks like this:
    People,Food,Music,People
    P1,Very Bad,Bad,Good
    P2,Good,Good,Very Bad
    P3,Good,Bad,Good
    P4,Good,Very Bad,Very Good
    P5,Bad,Good,Very Good
    P6,Bad,Good,Very Good

My aim is to create this kind of plot with ggplot2.
- I absolutely don't care about the colors, design, etc.
- The plot doesn't correspond to the fake data
Here are my fake data:
    raw <- read.csv("pastebin/raw.php?i=L8cEKcxS", sep = ",")
    raw[,2] <- factor(raw[,2], levels = c("Very Bad","Bad","Good","Very Good"), ordered = FALSE)
    raw[,3] <- factor(raw[,3], levels = c("Very Bad","Bad","Good","Very Good"), ordered = FALSE)
    raw[,4] <- factor(raw[,4], levels = c("Very Bad","Bad","Good","Very Good"), ordered = FALSE)

But if I choose Y as count, then I run into the problem of choosing the X and the Group values... I don't know if I can succeed without using reshape2... I've also tried to use reshape with the melt function, but I don't understand how to use it...
Solution

EDIT: Eight years later...
This needs a tidyverse solution, so here is one, with all non-base packages explicitly stated so that you know where each function comes from (except for read.csv which is from utils which comes with base R):
    library(magrittr) # needed for %>% if dplyr is not attached

    "pastebin/raw.php?i=L8cEKcxS" %>%
      read.csv(sep = ",") %>%
      tidyr::pivot_longer(cols = c(Food, Music, People.1),
                          names_to = "variable",
                          values_to = "value") %>%
      dplyr::group_by(variable, value) %>%
      dplyr::summarise(n = dplyr::n()) %>%
      dplyr::mutate(value = factor(
        value,
        levels = c("Very Bad", "Bad", "Good", "Very Good"))
      ) %>%
      ggplot2::ggplot(ggplot2::aes(variable, n)) +
      ggplot2::geom_bar(ggplot2::aes(fill = value),
                        position = "dodge",
                        stat = "identity")
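Since the file path above is abbreviated, here is a minimal sketch of the same pipeline that can be run as-is. It assumes the six fake rows from the question typed in by hand (the real file appears to have more rows), and it assumes the duplicated `People` header becomes `People.1`, as `read.csv` would rename it. `dplyr::count()` is used as a shorthand for the `group_by()` plus `summarise(n = n())` step.

```r
library(magrittr)  # for %>%

# Assumption: the six fake rows from the question stand in for the linked file
raw <- data.frame(
  People   = paste0("P", 1:6),
  Food     = c("Very Bad", "Good", "Good", "Good", "Bad", "Bad"),
  Music    = c("Bad", "Good", "Bad", "Very Bad", "Good", "Good"),
  People.1 = c("Good", "Very Bad", "Good", "Very Good", "Very Good", "Very Good")
)

# Long format: one row per (question, answer) pair, then count each pair
counts <- raw %>%
  tidyr::pivot_longer(cols = c(Food, Music, People.1),
                      names_to = "variable",
                      values_to = "value") %>%
  dplyr::count(variable, value) %>%
  dplyr::mutate(value = factor(
    value, levels = c("Very Bad", "Bad", "Good", "Very Good")))

print(counts)

# Build the grouped bar chart; print(p) draws it
p <- ggplot2::ggplot(counts, ggplot2::aes(variable, n)) +
  ggplot2::geom_bar(ggplot2::aes(fill = value),
                    position = "dodge",
                    stat = "identity")
```

Note that answer levels nobody chose for a given question (for example "Very Good" for Food in these six rows) do not appear in `counts` at all, so that group simply has fewer bars.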
The original answer:
First you need to get the counts for each category, i.e. how many Bads and Goods and so on are there for each group (Food, Music, People). This would be done like so:
    raw <- read.csv("pastebin/raw.php?i=L8cEKcxS", sep = ",")
    raw[,2] <- factor(raw[,2], levels = c("Very Bad","Bad","Good","Very Good"), ordered = FALSE)
    raw[,3] <- factor(raw[,3], levels = c("Very Bad","Bad","Good","Very Good"), ordered = FALSE)
    raw[,4] <- factor(raw[,4], levels = c("Very Bad","Bad","Good","Very Good"), ordered = FALSE)

    raw = raw[,c(2,3,4)] # getting rid of the "people" variable as I see no use for it

    freq = table(col(raw), as.matrix(raw)) # get the counts of each factor level

Then you need to create a data frame out of it, melt it and plot it:
    library(reshape2) # for melt()
    library(ggplot2)  # for ggplot() and geom_bar()

    Names = c("Food","Music","People")       # create vector of names
    data = data.frame(cbind(freq), Names)    # combine them into a data frame
    data = data[,c(5,3,1,2,4)]               # sort columns: Names, Very.Bad, Bad, Good, Very.Good

    # melt the data frame for plotting
    data.m <- melt(data, id.vars = 'Names')

    # plot everything
    ggplot(data.m, aes(Names, value)) +
      geom_bar(aes(fill = variable), position = "dodge", stat = "identity")

Is this what you're after?
To clarify a little bit, in the "ggplot multiple grouping bar" question you had a data frame that looked like this:
    > head(df)
      ID Type Annee X1PCE X2PCE X3PCE X4PCE X5PCE X6PCE
    1  1    A  1980   450   338   154    36    13     9
    2  2    A  2000   288   407   212    54    16    23
    3  3    A  2020   196   434   246    68    19    36
    4  4    B  1980   111   326   441    90    21    11
    5  5    B  2000    63   298   443   133    42    21
    6  6    B  2020    36   257   462   162    55    30

Since you have numerical values in columns 4-9, which would later be plotted on the y axis, this can be easily transformed with reshape and plotted.
For our current data set, we needed something similar, so we used freq=table(col(raw), as.matrix(raw)) to get this:
    > data
       Names Very.Bad Bad Good Very.Good
    1   Food        7   6    5         2
    2  Music        5   5    7         3
    3 People        6   3    7         4

Just imagine you have Very.Bad, Bad, Good and so on instead of X1PCE, X2PCE, X3PCE. See the similarity? But we needed to create such a structure first. Hence the freq=table(col(raw), as.matrix(raw)).
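To see what `table(col(raw), as.matrix(raw))` does on its own, here is a small sketch that uses only base R. It assumes the six fake rows from the question typed in by hand, so the counts differ from the table above, which comes from the full file. `col(raw)` gives the column index of every cell and `as.matrix(raw)` gives the answer in that cell, so cross-tabulating the two counts each answer per question. As a side note, `as.data.frame()` on a table already returns one row per combination, which may be a way to get the long format without reshape2.

```r
# Assumption: the six fake rows from the question, without the ID column
raw <- data.frame(
  Food   = c("Very Bad", "Good", "Good", "Good", "Bad", "Bad"),
  Music  = c("Bad", "Good", "Bad", "Very Bad", "Good", "Good"),
  People = c("Good", "Very Bad", "Good", "Very Good", "Very Good", "Very Good")
)

# Rows of freq are column indices (1, 2, 3), columns are the answers
freq <- table(col(raw), as.matrix(raw))
rownames(freq) <- names(raw)  # label rows with the question names
print(freq)

# Long format straight from the table: columns Var1, Var2, Freq
long <- as.data.frame(freq)
names(long) <- c("Names", "variable", "value")
print(long)
```

The `long` data frame has the same shape as `data.m` in the answer, so it could be passed to the same `ggplot()` call.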