Flag N randomly selected rows by group in data.table

Updated: 2024-10-18 16:47:29
Problem description

In a data.table, I want to add a column C3 that flags N randomly selected rows within each group (C1). Several similar questions have already been asked on SO (here, here and here), but based on their answers I still cannot work out a solution for my task.

library(data.table)
set.seed(1)
dt = data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
                C2 = c(2,1,3,1,2,3,4,5,4,5))
dt
    C1 C2
 1:  A  2
 2:  A  1
 3:  A  3
 4:  B  1
 5:  C  2
 6:  C  3
 7:  C  4
 8:  D  5
 9:  D  4
10:  D  5

Here are the row indices of two randomly selected rows per group C1 (this does not work well for group B):

dt[, sample(.I, min(.N, 2)), by = C1]$V1
[1]  1  3  3  7  5 10  9

NB: for B, only one row should be selected, because group B contains only one row.
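The stray index for group B comes from a documented quirk of base R's sample(): when its first argument is a single number x >= 1, it samples from 1:x instead of from the one-element vector. Group B has .I equal to 4, so sample(.I, 1) draws from 1:4. A minimal sketch of one way around this, indexing by position with sample.int(); the helper name pick is an assumption introduced here, not part of the original question:

```r
library(data.table)

dt <- data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
                 C2 = c(2,1,3,1,2,3,4,5,4,5))

# sample(4L, 1) draws from 1:4, not from the single value 4,
# which is why group B can return a row that belongs to group A.

# Hypothetical helper (the name is an assumption): sample positions,
# then index into x, so a length-one x is never expanded to 1:x.
pick <- function(x, n) x[sample.int(length(x), min(length(x), n))]

set.seed(1)
idx <- dt[, pick(.I, 2), by = C1]$V1

# Every returned index now belongs to its own group; B always yields row 4.
dt[idx, .N, by = C1]
```

With this helper, each group contributes at most two of its own row numbers, and the single-row group B contributes exactly row 4.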

Here is a solution for one randomly selected row in each group, which often does not work for group B:

dt[, C3 := .I == sample(.I, 1), by = C1]
dt
    C1 C2    C3
 1:  A  2 FALSE
 2:  A  1  TRUE
 3:  A  3 FALSE
 4:  B  1 FALSE
 5:  C  2  TRUE
 6:  C  3 FALSE
 7:  C  4 FALSE
 8:  D  5  TRUE
 9:  D  4 FALSE
10:  D  5 FALSE

Actually, I want to extend this to N rows. I have tried (for two rows):

dt[, C3 := .I==sample(.I, min(.N, 2)), by = C1]

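That, of course, does not work: == compares .I with the sampled indices element by element (recycling the shorter vector) rather than testing whether each row is among them.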

Thanks a lot for your help!

Recommended answer

dt[, C3 := 1:.N %in% sample(.N, min(.N, 2)), by = C1]

Or use head, but I think that should be slower:

dt[, C3 := 1:.N %in% head(sample(.N), 2), by = C1]
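Both variants above work for two reasons: they test membership with %in% instead of ==, and they sample positions within the group (1 to .N) rather than global row numbers, so a one-row group is handled correctly. The sketch below illustrates this on the question's data; the variable n is an assumption introduced here for the number of rows to flag, and seq_len(.N) is used as an equivalent of 1:.N.

```r
library(data.table)

dt <- data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
                 C2 = c(2,1,3,1,2,3,4,5,4,5))

# == compares element by element and recycles the shorter vector,
# so it cannot answer "is this row among the sampled ones?"
eq_result <- suppressWarnings(1:3 == c(3L, 1L))  # FALSE FALSE TRUE
in_result <- 1:3 %in% c(3L, 1L)                  # TRUE FALSE TRUE

n <- 2  # rows to flag per group (assumed value)

# Variant 1: sample at most n positions within each group.
set.seed(1)
dt[, C3 := seq_len(.N) %in% sample(.N, min(.N, n)), by = C1]
counts_sample <- dt[, .(flagged = sum(C3)), by = C1]

# Variant 2: shuffle all positions, keep the first n (head truncates safely).
dt[, C3 := seq_len(.N) %in% head(sample(.N), n), by = C1]
counts_head <- dt[, .(flagged = sum(C3)), by = C1]
```

Whichever rows are drawn, both variants flag two rows in groups A, C and D and one row in group B. The head variant shuffles the whole group before truncating, which is the extra work the answer expects to make it slower.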

If the number of rows to flag is not the same for every group, you can do:

flagsz <- c(2, 1, 2, 3)
dt[, C3 := 1:.N %in% sample(.N, min(.N, flagsz[.GRP])), by = C1]
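Note that flagsz[.GRP] relies on the group counter .GRP, which with by numbers the groups in order of first appearance, so the vector has to be listed in that same order. If matching sizes to group labels by name feels safer, one possible variant is a named vector looked up through .BY. This is a sketch only; the name flagsz_named and its values are assumptions chosen to mirror flagsz above.

```r
library(data.table)

dt <- data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
                 C2 = c(2,1,3,1,2,3,4,5,4,5))

# Assumed per-group sizes, keyed by group label instead of position.
flagsz_named <- c(A = 2, B = 1, C = 2, D = 3)

set.seed(1)
# .BY$C1 holds the current group's label, so the lookup does not depend
# on the order in which groups appear in the table.
dt[, C3 := seq_len(.N) %in% sample(.N, min(.N, flagsz_named[[.BY$C1]])),
   by = C1]

counts <- dt[, .(flagged = sum(C3)), by = C1]
```

Here group D has all three of its rows flagged, group B its single row, and groups A and C two rows each.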

Published: 2023-11-22 06:29:12
Article link: https://www.elefans.com/category/jswz/34/1616320.html