Summarize all group values and a conditional subset in the same call
Problem description
I'll illustrate my question with an example.
Sample data:

    df <- data.frame(ID = c(1, 1, 2, 2, 3, 5),
                     A = c("foo", "bar", "foo", "foo", "bar", "bar"),
                     B = c(1, 5, 7, 23, 54, 202))
    df
      ID   A   B
    1  1 foo   1
    2  1 bar   5
    3  2 foo   7
    4  2 foo  23
    5  3 bar  54
    6  5 bar 202

What I want to do is to summarize, by ID, the sum of B and the sum of B when A is "foo". I can do this in a couple steps like:

    require(magrittr)
    require(dplyr)

    # Step 1: total of B per ID
    df1 <- df %>%
      group_by(ID) %>%
      summarize(sumB = sum(B))

    # Step 2: total of B per ID, restricted to rows where A is "foo"
    df2 <- df %>%
      filter(A == "foo") %>%
      group_by(ID) %>%
      summarize(sumBfoo = sum(B))

    # IDs with no "foo" rows end up as NA after the join
    left_join(df1, df2)
      ID sumB sumBfoo
    1  1    6       1
    2  2   30      30
    3  3   54      NA
    4  5  202      NA

However, I'm looking for a more elegant/faster way, as I'm dealing with 10gb+ of out-of-memory data in sqlite.

    require(sqldf)
    my_db <- src_sqlite("my_db.sqlite3", create = TRUE)
    df_sqlite <- copy_to(my_db, df)

I thought of using mutate to define a new Bfoo column:

    df_sqlite %>%
      mutate(Bfoo = ifelse(A=="foo", B, 0))

Unfortunately, this doesn't work on the database end of things.

    Error in sqliteExecStatement(conn, statement, ...) :
      RS-DBI driver: (error in statement: no such function: IFELSE)

Recommended answer
Writing up @hadley's comment as an answer

    df_sqlite %>%
      group_by(ID) %>%
      mutate(Bfoo = if(A=="foo") B else 0) %>%
      summarize(sumB = sum(B), sumBfoo = sum(Bfoo)) %>%
      collect()
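
The error in the question arises because SQLite has no IFELSE function; the usual SQL way to express a per-row condition is a CASE WHEN expression. As a rough sketch of the same aggregation written directly in SQL, the block below uses DBI with RSQLite. The in-memory database and the table name "df" are assumptions made for illustration and stand in for my_db.sqlite3; this is not necessarily the exact SQL that dplyr generates for the pipeline above.

```r
library(DBI)

# Same sample data as in the question.
df <- data.frame(ID = c(1, 1, 2, 2, 3, 5),
                 A = c("foo", "bar", "foo", "foo", "bar", "bar"),
                 B = c(1, 5, 7, 23, 54, 202),
                 stringsAsFactors = FALSE)

# Assumption: a throwaway in-memory database replaces the on-disk file.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "df", df)

# One pass over the table: the plain sum and the conditional sum
# are computed in the same GROUP BY.
res <- dbGetQuery(con, "
  SELECT ID,
         SUM(B) AS sumB,
         SUM(CASE WHEN A = 'foo' THEN B ELSE 0 END) AS sumBfoo
  FROM df
  GROUP BY ID
  ORDER BY ID
")

dbDisconnect(con)
res
```

Like the answer above, this gives 0 rather than NA for IDs that have no "foo" rows, which differs from the left_join result in the question. Dropping the ELSE branch should give NULL (NA in R) for those IDs instead, since SUM over only NULL values is NULL in SQLite.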