I have implemented a simple group-by operation with the stats::aggregate function (see ?stats::aggregate). It collects the elements of each group in a vector. I would like to make it faster using the data.table package, but I am not able to reproduce the desired behaviour with data.table.
Example data set:

    df <- data.frame(group = c("a","a","a","b","b","b","b","c","c"),
                     val = c("A","B","C","A","B","C","D","A","B"))

Output to reproduce with data.table:

    by_group_aggregate <- aggregate(x = df$val, by = list(df$group), FUN = c)

What I have tried:

    library(data.table)
    data_t <- data.table(df)

    # working, but not what I want
    by_group_datatable <- data_t[, j = paste(val, collapse = ","), by = group]

    # no grouping done when using c or as.vector
    by_group_datatable <- data_t[, j = c(val), by = group]
    by_group_datatable <- data_t[, j = as.vector(val), by = group]

    # grouping leads to error when using as.list
    by_group_datatable <- data_t[, j = as.list(val), by = group]

Is it possible to have vectors of different size in a data.table column? If yes, how do I achieve it?
Recommended answer

Here is one approach:

    data_t[, list(list(val)), by = group]
    #   group      V1
    #1:     a   A,B,C
    #2:     b A,B,C,D
    #3:     c     A,B

The outer list() is used because you want to aggregate the result: when aggregating, j has to return a list of columns. The inner list() is used because you want to collect the val column into a separate list element per group, so that each group's vector is stored as a single cell of a list column.
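To make the difference between the attempts in the question and the nested list() call more concrete, here is a minimal sketch. It rebuilds the example data directly as a data.table; the column name `vals` and the object names `long` and `nested` are chosen only for illustration and do not come from the original post.

```r
library(data.table)

# Same example data as in the question, built directly as a data.table
data_t <- data.table(
  group = c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
  val   = c("A", "B", "C", "A", "B", "C", "D", "A", "B")
)

# j returns a plain vector: every element becomes its own row,
# so nothing is collapsed per group
long <- data_t[, c(val), by = group]
nrow(long)            # 9

# j returns a list that holds one list: one row per group,
# and `vals` (illustrative name) is a list column
nested <- data_t[, list(vals = list(val)), by = group]
nrow(nested)          # 3
lengths(nested$vals)  # 3 4 2

# Extract the vector stored for a single group
nested[group == "b", vals][[1]]
```

Naming the inner list (here `vals`) avoids the default column name V1, and `[[1]]` is needed at the end because subsetting a list column returns a list.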
To check the structure:

    str(data_t[, list(list(val)), by = group])
    #Classes ‘data.table’ and 'data.frame': 3 obs. of 2 variables:
    # $ group: Factor w/ 3 levels "a","b","c": 1 2 3
    # $ V1   :List of 3
    #  ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3
    #  ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
    #  ..$ : Factor w/ 4 levels "A","B","C","D": 1 2
    # - attr(*, ".internal.selfref")=<externalptr>
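The str() output above shows factors because the data.frame was created with strings converted to factors, which was the default before R 4.0. If plain character vectors are wanted in the list column, one option is to convert inside j. This is only a sketch: `stringsAsFactors = TRUE` is set explicitly here to reproduce the factor columns on any R version, and the name `res` is illustrative.

```r
library(data.table)

# Force factor columns so the example behaves the same on old and new R versions
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
  val   = c("A", "B", "C", "A", "B", "C", "D", "A", "B"),
  stringsAsFactors = TRUE
)
data_t <- as.data.table(df)

# Convert each group's factor values to character before wrapping them in a list
res <- data_t[, list(val = list(as.character(val))), by = group]

# Each cell of the list column is now a character vector of its own length
str(res$val)
```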
Using dplyr, you could do the following:

    library(dplyr)

    df %>%
      group_by(group) %>%
      summarise(val = list(val))
    #Source: local data frame [3 x 2]
    #
    #   group         val
    #  (fctr)       (chr)
    #1      a <S3:factor>
    #2      b <S3:factor>
    #3      c <S3:factor>

Checking the structure:

    df %>%
      group_by(group) %>%
      summarise(val = list(val)) %>%
      str
    #Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 2 variables:
    # $ group: Factor w/ 3 levels "a","b","c": 1 2 3
    # $ val  :List of 3
    #  ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3
    #  ..$ : Factor w/ 4 levels "A","B","C","D": 1 2 3 4
    #  ..$ : Factor w/ 4 levels "A","B","C","D": 1 2