I have a data.frame df:

    df = data.frame(v = c('E', 'B', 'EB', 'RM'))
    df$n = 100 / apply(df, 1, nchar)

where each letter in v stands for a value: E = 4, B = 3, R = 2 and M = 1.

I want to calculate a column like so:

       v   n idx
    1  E 100 400
    2  B 100 300
    3 EB  50 350
    4 RM  50 150

where idx is n multiplied by the sum of the values of the letters in v. For example, for the first row 4 * 100 = 400, and for the last row (2 + 1) * 50 = 150.

I have something like this:

    df$e = ifelse(grepl('E', df$v), 4, 0)
    df$b = ifelse(grepl('B', df$v), 3, 0)
    df$r = ifelse(grepl('R', df$v), 2, 0)
    df$m = ifelse(grepl('M', df$v), 1, 0)
    df$idx = df$n * (df$e + df$b + df$r + df$m)

But it becomes unfeasible as the number of columns grows.
Best answer
1) Define a lookup table, lookup, and a function Sum that takes a vector of single letters, looks up each letter and sums the resulting numbers. Split v into a list of vectors of single letters, sapply Sum over that list, and multiply the result by n.

    lookup <- c(E = 4, B = 3, R = 2, M = 1)
    Sum <- function(x) sum(lookup[x])
    transform(df, idx = n * sapply(strsplit(as.character(v), ""), Sum))

giving:

       v   n idx
    1  E 100 400
    2  B 100 300
    3 EB  50 350
    4 RM  50 150

2) An alternative using lookup from above is the following. For each character in v, strapply applies lookup through an anonymous function written in formula notation. This creates a list over which we sapply sum, and finally we multiply by n.

    library(gsubfn)
    transform(df, idx = n * sapply(strapply(as.character(v), ".", x ~ lookup[x]), sum))

3) A dplyr/tidyr solution using lookup from above is the following. We insert an id to uniquely identify each row and then use separate_rows to place each letter of v in a separate row. We then summarize all rows with the same id by looking up each letter and summing. Finally we remove id.

    library(dplyr)
    library(tidyr)
    df %>%
      mutate(id = 1:n()) %>%
      separate_rows(v, sep = "(?<=.)(?=.)") %>%
      group_by(id, n) %>%
      summarize(idx = sum(n * lookup[v])) %>%
      ungroup %>%
      select(-id)

giving the following (the printed output still shows id, so it corresponds to the pipeline before the final select(-id) step):

    # A tibble: 4 x 3
         id     n   idx
      <int> <dbl> <dbl>
    1     1  100.  400.
    2     2  100.  300.
    3     3   50.  350.
    4     4   50.  150.

One could avoid the complex regular expression by replacing the separate_rows statement with these two statements:

      mutate(v = strsplit(as.character(v), "")) %>%
      unnest(v) %>%
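To make the mechanics of approach 1 easier to follow, here is a minimal base R sketch that spells out its intermediate steps. The names letters_list and letter_sums are introduced here purely for illustration, vapply stands in for sapply only to pin down the return type, and the data frame is rebuilt with stringsAsFactors = FALSE on the assumption that v should be plain character.

```r
# Lookup table from the question: a named numeric vector, letter -> value
lookup <- c(E = 4, B = 3, R = 2, M = 1)

# Rebuild the question's data; stringsAsFactors = FALSE keeps v as character
df <- data.frame(v = c("E", "B", "EB", "RM"), stringsAsFactors = FALSE)
df$n <- 100 / nchar(df$v)

# Step 1: split each string into single letters (one character vector per row)
letters_list <- strsplit(df$v, "")

# Step 2: index the named vector by those letters and sum per row
letter_sums <- vapply(letters_list, function(x) sum(lookup[x]), numeric(1))

# Step 3: scale each row's sum by n
df$idx <- df$n * letter_sums
print(df)

# Caveat: a letter that is missing from lookup indexes to NA,
# so the sum for that row becomes NA as well
sum(lookup[c("E", "Z")])
```

Because indexing a named vector with an unknown name yields NA, a row containing a letter outside lookup would end up with idx = NA rather than an error. Whether that is the desired behaviour depends on the data; the original answer does not say.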