Calculate new column from mapped values

Updated: 2024-10-22 13:28:07

I have a data.frame df

    df = data.frame(v = c('E', 'B', 'EB', 'RM'))
    df$n = 100 / apply(df, 1, nchar)

Where v represents values E = 4, B = 3, R = 2, and M = 1

I want to calculate a column like so:

       v   n idx
    1  E 100 400
    2  B 100 300
    3 EB  50 350
    4 RM  50 150

Where idx is n multiplied by the sum of the values of the letters in v. For example, the first row gives 4 * 100 = 400 and the last row gives (2 + 1) * 50 = 150.

I have something like this:

    df$e = ifelse(grepl('E', df$v), 4, 0)
    df$b = ifelse(grepl('B', df$v), 3, 0)
    df$r = ifelse(grepl('R', df$v), 2, 0)
    df$m = ifelse(grepl('M', df$v), 1, 0)
    df$idx = df$n * (df$e + df$b + df$r + df$m)

But it becomes unfeasible as the number of columns grows.

Best answer

1) Define a lookup table, lookup, and a function Sum that takes a vector of single letters, looks up each one, and sums the resulting numbers.

Split v into a list of vectors of single letters, sapply over that list using Sum, and multiply the result by n.

    lookup <- c(E = 4, B = 3, R = 2, M = 1)
    Sum <- function(x) sum(lookup[x])
    transform(df, idx = n * sapply(strsplit(as.character(v), ""), Sum))

giving:

       v   n idx
    1  E 100 400
    2  B 100 300
    3 EB  50 350
    4 RM  50 150
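To make the steps in 1) easier to follow, here is a minimal sketch that exposes the intermediate values. The names letters_list and scores are illustrative and not part of the original answer, and stringsAsFactors = FALSE is an assumption made so that v is a character column in any R version.

```r
# Rebuild the example data; stringsAsFactors = FALSE is an assumption
# so that v is character regardless of the R version.
df <- data.frame(v = c("E", "B", "EB", "RM"), stringsAsFactors = FALSE)
df$n <- 100 / nchar(df$v)

lookup <- c(E = 4, B = 3, R = 2, M = 1)

# Step 1: split each string into single letters (one character vector per row).
letters_list <- strsplit(df$v, "")

# Step 2: look up each letter by name and sum the values within each row.
scores <- sapply(letters_list, function(x) sum(lookup[x]))

# Step 3: scale the per-row sum by n.
idx <- df$n * scores
print(data.frame(df, scores = scores, idx = idx))
```

Note that a letter missing from lookup is looked up as NA, so the sum for that row becomes NA as well.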

2) An alternative that reuses lookup from above: for each character in v, strapply applies lookup through an anonymous function written in formula notation, producing a list. We then sapply sum over that list and finally multiply by n.

    library(gsubfn)
    transform(df, idx = n * sapply(strapply(as.character(v), ".", x ~ lookup[x]), sum))
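For readers without gsubfn installed, the same per-character matching idea can be approximated in base R. This is a swapped-in sketch using regmatches and gregexpr, not the strapply call from the answer; the names chars and values are illustrative, and stringsAsFactors = FALSE is again an assumption.

```r
# Example data as in the question (character column assumed).
df <- data.frame(v = c("E", "B", "EB", "RM"), stringsAsFactors = FALSE)
df$n <- 100 / nchar(df$v)

lookup <- c(E = 4, B = 3, R = 2, M = 1)

# gregexpr(".", ...) locates every single character in each string;
# regmatches() extracts those matches as a list of character vectors.
chars <- regmatches(df$v, gregexpr(".", df$v))

# Apply the lookup to each match, mirroring the role of x ~ lookup[x].
values <- lapply(chars, function(ch) lookup[ch])

# Sum per row and multiply by n.
df$idx <- df$n * sapply(values, sum)
print(df)
```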

3) A dplyr/tidyr solution using lookup from above is the following. We insert an id to uniquely identify each row and then use separate_rows to place each letter of v in a separate row. We then summarize all rows with the same id by looking up each letter and summing. Finally we remove id.

    library(dplyr)
    library(tidyr)
    df %>%
      mutate(id = 1:n()) %>%
      separate_rows(v, sep = "(?<=.)(?=.)") %>%
      group_by(id, n) %>%
      summarize(idx = sum(n * lookup[v])) %>%
      ungroup %>%
      select(-id)

giving:

    # A tibble: 4 x 2
          n   idx
      <dbl> <dbl>
    1  100.  400.
    2  100.  300.
    3   50.  350.
    4   50.  150.

One could avoid the complex regular expression by replacing the separate_rows statement with these two statements:

    mutate(v = strsplit(as.character(v), "")) %>%
    unnest(v) %>%
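Putting the strsplit/unnest variant together, a full pipeline might look like the sketch below. It deviates from the answer in a few labeled ways: the letters go into a separate, hypothetical column named letter so that the original v survives in the result, row_number() stands in for 1:n(), and .groups = "drop" assumes dplyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Example data as in the question (character column assumed).
df <- data.frame(v = c("E", "B", "EB", "RM"), stringsAsFactors = FALSE)
df$n <- 100 / nchar(df$v)

lookup <- c(E = 4, B = 3, R = 2, M = 1)

res <- df %>%
  # id keeps rows distinct even if two rows share the same v and n
  mutate(id = row_number(),
         letter = strsplit(v, "")) %>%   # list column: one vector of letters per row
  unnest(letter) %>%                     # one row per letter
  group_by(id, v, n) %>%
  summarize(idx = sum(n * lookup[letter]), .groups = "drop") %>%
  select(-id)

print(res)
```

Grouping by v as well as id and n is a design choice made here so that the final table still shows which string each idx belongs to.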

Published: 2023-07-17 07:41:00

Source link: https://www.elefans.com/category/jswz/34/1141097.html