I have the following data:

    df <- USArrests
    df$ID <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1,
               2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3,
               3, 3, 2, 2, 2, 2)
    df$Year <- c(2017, 2017, 2017, 2015, 2012, 2013, 2017, 2017, 2017, 2015,
                 2012, 2013, 2017, 2017, 2017, 2015, 2012, 2013, 2017, 2017,
                 2017, 2015, 2012, 2012, 2017, 2017, 2017, 2015, 2012, 2013,
                 2017, 2017, 2017, 2015, 2012, 2013, 2017, 2017, 2017, 2015,
                 2012, 2013, 2017, 2017, 2017, 2015, 2011, 2015, 2012, 2013)

What I would like to do is create a list of all the states, grouped by their year and ID.

To show it visually, I made a small mockup in Excel. Since Alabama has the year 2017 and the ID 1, it goes in that cell. Alaska and Arizona also have the year 2017 and ID 1, so they go in the same place. That entire cell then makes up a vector like:

    c('Alabama', 'Alaska', 'Arizona')

When I say "cell", I of course mean an "element" in the R sense. I would like R to fill in all the other elements in the same way and output the result as a data frame, with IDs as rows (observations) and years as columns (variables). I really have no idea how to start on this, though.
Best answer
One option is to use data.table and dcast. The keep.rownames = TRUE argument to setDT moves the row names (here, the state names) into a column named rn.

    library(data.table)
    setDT(df, keep.rownames = TRUE)
    # One comma-separated string of state names per ID/Year combination
    dcast(df, ID ~ Year, value.var = "rn", fun.aggregate = paste0, collapse = ",")

If you only need the number of states in each cell:

    dcast(df, ID ~ Year, value.var = "rn", fun.aggregate = length)
    #    ID 2011 2012 2013 2015 2017
    # 1:  1    0    1    2    3    9
    # 2:  2    1    5    3    3    9
    # 3:  3    0    3    2    3    6

Updated based on feedback from the OP, to show both the count and the names in each cell:

    dcast(df, ID ~ Year, value.var = "rn",
          fun.aggregate = function(x) paste0("Count: ", length(x), " ", paste(x, collapse = ",")))
    # (output shown for the 2011 to 2013 columns only)
    #    ID                2011                                                          2012                                  2013
    # 1:  1            Count: 0                                           Count: 1 California           Count: 2 Colorado,Louisiana
    # 2:  2 Count: 1 Washington Count: 5 Minnesota,Mississippi,New Hampshire,Ohio,Wisconsin Count: 3 New Jersey,Oklahoma,Wyoming
    # 3:  3            Count: 0                         Count: 3 Hawaii,Kentucky,South Dakota              Count: 2 Idaho,Tennessee
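If you would rather stay in base R, or you need each cell as a real character vector instead of a pasted string, the same grouping can be sketched with tapply and split. This is not part of the answer above, just a minimal illustration. The toy data frame and its state column are assumptions made for the example: eight rows copied from the question's data, with the row names moved into an ordinary column.

```r
# Hypothetical toy data: eight rows taken from the question's data,
# with the state names stored in a column instead of row names.
toy <- data.frame(
  state = c("Alabama", "Alaska", "Arizona", "Arkansas",
            "California", "Colorado", "Hawaii", "Washington"),
  ID    = c(1, 1, 1, 1, 1, 1, 3, 2),
  Year  = c(2017, 2017, 2017, 2015, 2012, 2013, 2012, 2011),
  stringsAsFactors = FALSE
)

# Wide ID-by-Year character matrix; each cell is a comma-separated string.
# ID/Year combinations with no states come back as NA.
wide <- tapply(toy$state, list(ID = toy$ID, Year = toy$Year),
               paste, collapse = ",")

# Named list of character vectors, one per non-empty ID/Year pair.
# Names have the form "<ID>.<Year>", for example "1.2017".
cells <- split(toy$state, list(toy$ID, toy$Year), drop = TRUE)

print(wide)
print(cells[["1.2017"]])  # "Alabama" "Alaska" "Arizona"
```

The matrix from tapply is convenient for display, while the list from split keeps every group as a vector you can index or loop over. Which one fits better depends on what you plan to do with the cells afterwards.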