Conditionally replace the values of multiple columns with the values of other columns

Updated: 2024-10-24 06:33:54

Problem description

Suppose I have this dataset:

library(dplyr)  # provides %>%, group_by(), mutate_at() and n_distinct()

set.seed(1234)
data.frame(cbind(a = rep(c("si", "no"), 30), b = rnorm(60)), c = rep(c("d", "e", "f"), 20)) %>%
  head()

Then I want to add many columns (in this example I only add two) that count the distinct cases within each group (here the groups are defined by column "a").

set.seed(1234)
data.frame(cbind(a = rep(c("si", "no"), 30), b = rnorm(60)), c = rep(c("d", "e", "f"), 20)) %>%
  group_by(a) %>%
  dplyr::mutate_at(vars(c(b, c)), .funs = list(dups_hash_ing = ~ n_distinct(.)))
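As a side note, `mutate_at()` is superseded in dplyr 1.0.0 and later by `across()`. The following is a minimal sketch of the same per-group count written that way, not the asker's code. It assumes dplyr >= 1.0.0, and it builds the data frame without `cbind()`, which keeps `b` numeric (in the original, `cbind()` coerces `b` to text). The name `counted` is only illustrative.

```r
library(dplyr)

set.seed(1234)
# Same shape as the question's data, but b stays numeric (assumption:
# the text coercion caused by cbind() was not intended).
df <- data.frame(
  a = rep(c("si", "no"), 30),
  b = rnorm(60),
  c = rep(c("d", "e", "f"), 20)
)

counted <- df %>%
  group_by(a) %>%
  # One new count column per selected column, named <col>_dups_hash_ing
  mutate(across(c(b, c), ~ n_distinct(.x), .names = "{.col}_dups_hash_ing")) %>%
  ungroup()

head(counted)
```

Each group of `a` holds 30 rows, so `b_dups_hash_ing` is 30 and `c_dups_hash_ing` is 3 on every row, which matches the counts in the question's `dput()` output.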

This code produces the following dataset:

The dput() output of the dataset is:

structure(list(a = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("no", "si"), class = "factor"), b = structure(c(22L, 1L, 51L, 34L, 50L, 57L, 53L, 10L, 47L, 3L, 11L, 23L, 15L, 38L, 58L, 39L, 41L, 17L, 28L, 21L, 37L, 45L, 29L, 46L, 32L, 48L, 56L, 52L, 26L, 19L, 35L, 8L, 55L, 20L, 9L, 36L, 2L, 12L, 6L, 42L, 49L, 43L, 59L, 54L, 31L, 13L, 60L, 44L, 14L, 30L, 7L, 5L, 16L, 27L, 33L, 18L, 24L, 4L, 25L, 40L), .Label = c("-0.0997905884418961", "-0.151736536534977", "-0.198416273822079", "-0.254874652654534", "-0.274704218225806", "-0.304721068966714", "-0.324393300483657", "-0.400235237343163", "-0.415751788401515", "-0.50873701541522", "-0.538070788884863", "-0.60615111526422", "-0.659770093821306", "-0.684320344136007", "-0.789646852263761", "-0.933503340589868", "-0.965903210133575", "-1.07754212275943", "-1.11444896479736", "-1.60708093984972", "-2.07823754188738", "-2.7322195229558", "-2.85575865501923", "-3.23315213292314", "0.0295178303214797", "0.0326639575014441", "0.116845344986082", "0.162654708118265", "0.185513915583057", "0.186492083080971", "0.287709728313787", "0.311681028661359", "0.319160238648117", "0.413868915451097", "0.418057822385083", "0.42200837321742", "0.485226820569252", "0.487814635163685", "0.500694614280786", "0.594273774110513", "0.62021020366732", "0.629536099884472", "0.660212631820405", "0.677415500438328", "0.696768778564913", "0.700733515544461", "0.704180178465512", "0.760462361967838", "0.895171980275539", "0.912322161610113", "0.976031734922396", "1.1123628412626", "1.16910851401363", "1.17349757263239", "1.49349310261748", "1.84246362620766", "1.98373220068438", "2.16803253951933", "2.27348352044748", "2.91914013071762" ), class = "factor"), c = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("d", "e", "f"), class = "factor"), a_dups_hash_ing = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), b_dups_hash_ing = c(30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L), c_dups_hash_ing = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -60L), groups = structure(list( a = structure(1:2, .Label = c("no", "si"), class = "factor"), .rows = list(c(2L, 4L, 6L, 8L, 10L, 12L, 14L, 16L, 18L, 20L, 22L, 24L, 26L, 28L, 30L, 32L, 34L, 36L, 38L, 40L, 42L, 44L, 46L, 48L, 50L, 52L, 54L, 56L, 58L, 60L), c(1L, 3L, 5L, 7L, 9L, 11L, 13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 29L, 31L, 33L, 35L, 37L, 39L, 41L, 43L, 45L, 47L, 49L, 51L, 53L, 55L, 57L, 59L))), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))

What I need to do is replace each of these count columns, column by column, with the value of the corresponding original column whenever the number of distinct cases per group is greater than one. I have to do this for more than 50 columns. Here is an example for a single column using mutate:

dplyr::mutate(b_dups_hash_ing = ifelse(b_dups_hash_ing > 1, b, 0))
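One thing to watch for with this pattern: the `dput()` output above stores `b` and `c` as factors, and `ifelse()` drops factor attributes, so it returns the underlying integer codes rather than the labels. The short sketch below illustrates this on a made-up factor (`f`, `keep`, `codes` and `labels` are hypothetical names) and shows one possible workaround, converting to character first.

```r
# Hypothetical factor, standing in for a column such as c
f <- factor(c("d", "e", "f"))
keep <- c(TRUE, TRUE, FALSE)

# ifelse() returns the integer codes of the factor, not its labels
codes <- ifelse(keep, f, 0)

# Converting to character first keeps the labels; the fallback is then "0"
labels <- ifelse(keep, as.character(f), "0")

print(codes)
print(labels)
```

Here `codes` is the numeric vector 1, 2, 0, while `labels` is "d", "e", "0". Whether a character result is acceptable depends on what the replaced columns are used for afterwards.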

I need to repeat the code above for many variables. This is very similar to a mutate_at call (the word in square brackets marks what I would like to write). The following example does not work, but it is what I would do in an ideal world, and it should make my problem easier to understand.

dplyr::mutate_at(vars(contains('_dups_hash_ing')), .funs = list(~ifelse(.>1,vars([original]),0)))

Recommended answer

Is this what you are looking for?

df %>%
  dplyr::mutate_at(vars(contains('_dups_hash_ing')), ~ ifelse(. > 1, ., 0)) %>%
  head
#> # A tibble: 6 x 6
#> # Groups:   a [2]
#>   a     b                  c     a_dups_hash_ing b_dups_hash_ing c_dups_hash_ing
#>   <fct> <fct>              <fct>           <dbl>           <int>           <int>
#> 1 si    -2.7322195229558   d                   0              30               3
#> 2 no    -0.09979058844189… e                   0              30               3
#> 3 si    0.976031734922396  f                   0              30               3
#> 4 no    0.413868915451097  d                   0              30               3
#> 5 si    0.912322161610113  e                   0              30               3
#> 6 no    1.98373220068438   f                   0              30               3
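Note that the call above keeps each count when it exceeds one; it does not pull in the value of the original column. If the goal is to substitute the original column's value, as the question describes, one possible approach is to pair each `_dups_hash_ing` column with its source column by name. The sketch below is an assumption about the intent, not the answerer's method. It uses base R only, a small made-up dataset with the counts already filled in, and converts the results to character so that original values and the `"0"` fallback share one type.

```r
# Hypothetical data: two original columns plus their per-group distinct counts
df <- data.frame(
  a = rep(c("si", "no"), each = 3),
  b = c("x", "y", "z", "w", "w", "w"),
  c = c("d", "d", "d", "e", "f", "e"),
  b_dups_hash_ing = c(3L, 3L, 3L, 1L, 1L, 1L),
  c_dups_hash_ing = c(1L, 1L, 1L, 2L, 2L, 2L),
  stringsAsFactors = FALSE
)

suffix <- "_dups_hash_ing$"
count_cols <- grep(suffix, names(df), value = TRUE)

for (count_col in count_cols) {
  # Recover the original column's name by stripping the suffix
  orig_col <- sub(suffix, "", count_col)
  # Where the count exceeds 1, take the original value; otherwise "0"
  df[[count_col]] <- ifelse(df[[count_col]] > 1, as.character(df[[orig_col]]), "0")
}

print(df)
```

Because the counts are already stored on every row, the replacement itself needs no grouping, and the loop scales to 50 or more columns without listing them. In a dplyr pipeline, `cur_column()` inside `across()` gives the current column's name and could serve the same lookup.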

Published: 2023-07-08 20:10:35

Article link: https://www.elefans.com/category/jswz/34/1080217.html