How to count the number of consecutive occurrences of an element in a data.table?

Updated: 2024-10-24 13:27:40
Problem description

I have a data.table that looks like this

ID, Order, Segment
1, 1, A
1, 2, B
1, 3, B
1, 4, C
1, 5, B
1, 6, B
1, 7, B
1, 8, B

After ordering the data by the Order column, I would like to count the number of consecutive B's for each ID. Ideally, the output would be:

ID, Consec
1, 2
1, 4

This is because Segment B appears consecutively in rows 2 and 3 (2 times), and then again in rows 5, 6, 7 and 8 (4 times).
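The counting behind this expected output is run-length encoding: collapse consecutive equal values into (value, length) pairs, then keep the lengths whose value is B. A minimal base-R sketch of that idea, assuming the rows are sorted by ID and Order; the helper name `consec_b` is hypothetical and not part of the question or answer:

```r
# Sample data from the question (stringsAsFactors = FALSE because rle() rejects factors)
df <- data.frame(
  ID      = rep(1L, 8),
  Order   = 1:8,
  Segment = c("A", "B", "B", "C", "B", "B", "B", "B"),
  stringsAsFactors = FALSE
)
df <- df[order(df$ID, df$Order), ]

# Hypothetical helper: lengths of the runs of "B" in one ID's Segment vector
consec_b <- function(seg) {
  r <- rle(seg)               # collapse consecutive equal values into r$values / r$lengths
  r$lengths[r$values == "B"]  # keep only the runs whose value is "B"
}

# split() processes each ID separately, so a run can never cross an ID boundary
runs <- lapply(split(df$Segment, df$ID), consec_b)
res <- data.frame(
  ID     = rep(as.integer(names(runs)), lengths(runs)),
  Consec = unlist(runs, use.names = FALSE)
)
print(res)
#   ID Consec
# 1  1      2
# 2  1      4
```

This is the same grouping that `rleid()` expresses in the data.table answers: each run gets its own identity, and only the B runs are counted.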

A loop-based solution is obvious, but it would be very slow.

Is there an elegant solution in data.table that is also fast?

P.S. The data I am dealing with has ~20 million rows.
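For reference, the "obvious" loop the question mentions might look roughly like the sketch below: one interpreted R iteration per row, which is what becomes slow at ~20 million rows. It assumes the vectors are already sorted by ID and Order, and the function name `count_b_runs_loop` is made up for illustration; it is mainly useful as a baseline to cross-check the vectorised answers.

```r
# Hypothetical loop-based baseline (count_b_runs_loop is a made-up name).
# Assumes id and seg are already sorted by ID and Order.
count_b_runs_loop <- function(id, seg) {
  out_id <- integer(0)
  out_n  <- integer(0)
  n <- 0L                                   # length of the current run of "B"
  for (i in seq_along(seg)) {
    if (seg[i] != "B") next                 # non-B rows never extend a run
    n <- n + 1L
    # the run ends at the last row, at an ID change, or before a non-B row
    run_ends <- i == length(seg) || id[i + 1L] != id[i] || seg[i + 1L] != "B"
    if (run_ends) {
      out_id <- c(out_id, id[i])
      out_n  <- c(out_n, n)
      n <- 0L                               # start counting afresh for the next run
    }
  }
  data.frame(ID = out_id, Consec = out_n)
}

seg <- c("A", "B", "B", "C", "B", "B", "B", "B")
id  <- rep(1L, 8)
res <- count_b_runs_loop(id, seg)
print(res)
```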

Solution

Try

library(data.table) # v1.9.5+
DT[order(ID, Order)][, indx := rleid(Segment)][Segment == 'B', list(Consec = .N), by = list(indx, ID)][, indx := NULL][]
#    ID Consec
# 1:  1      2
# 2:  1      4
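The key step is `rleid(Segment)`, which assigns each run of identical consecutive values its own integer id; grouping by that id (together with ID) then puts each run of B in a separate group, and `.N` counts its rows. A small sketch that prints the intermediate column, assuming a data.table version that provides `rleid` (the answer notes v1.9.5+) and the question's sample data:

```r
library(data.table)

# Sample data from the question (ID = 1L is recycled to all 8 rows)
DT <- data.table(
  ID      = 1L,
  Order   = 1:8,
  Segment = c("A", "B", "B", "C", "B", "B", "B", "B")
)

# Intermediate step: a run id that increments whenever Segment changes
tmp <- DT[order(ID, Order)][, indx := rleid(Segment)]
print(tmp$indx)
# [1] 1 2 2 3 4 4 4 4

# Keep only the B rows, count rows per (run id, ID), then drop the helper column
res <- tmp[Segment == "B", .(Consec = .N), by = .(indx, ID)][, indx := NULL][]
print(res)
```

Because `DT[order(ID, Order)]` returns a sorted copy, the `indx` column is added to that copy and DT itself is left unchanged.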

Or as @eddi suggested

DT[order(ID, Order)][, .(Consec = .N), by = .(ID, Segment, rleid(Segment))][Segment == 'B', .(ID, Consec)]
#    ID Consec
# 1:  1      2
# 2:  1      4
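Because `rleid(Segment)` is computed over the whole sorted table, a run of B at the end of one ID and a run of B at the start of the next ID receive the same run id; keeping ID in `by` is what separates them. A quick check with a hypothetical second ID (its rows are made up for illustration and are not from the question):

```r
library(data.table)

# Question's ID 1 plus a made-up ID 2 whose first rows are also "B"
DT <- data.table(
  ID      = c(rep(1L, 8), rep(2L, 4)),
  Order   = c(1:8, 1:4),
  Segment = c("A", "B", "B", "C", "B", "B", "B", "B",
              "B", "B", "A", "B")
)

# @eddi's version: ID is part of the grouping, so runs are counted per ID
ok <- DT[order(ID, Order)][, .(Consec = .N), by = .(ID, Segment, rleid(Segment))][Segment == "B", .(ID, Consec)]
print(ok)
# ID 1 gives runs of 2 and 4; ID 2 gives runs of 2 and 1

# Without ID in `by`, the trailing B run of ID 1 merges with the leading B run of ID 2
merged <- DT[order(ID, Order)][, .(Consec = .N), by = .(Segment, rleid(Segment))][Segment == "B"]
print(merged$Consec)
# [1] 2 6 1
```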

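A more memory-efficient method is to use setorder instead of order (as suggested by @Arun): setorder sorts DT in place, by reference, instead of creating a sorted copy, so DT itself stays reordered afterwards.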

setorder(DT, ID, Order)[, .(Consec = .N), by = .(ID, Segment, rleid(Segment))][Segment == 'B', .(ID, Consec)]
#    ID Consec
# 1:  1      2
# 2:  1      4
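A small sketch of the by-reference behaviour, assuming the question's rows arrive out of order (the shuffled order below is made up for illustration). `setorder` returns DT invisibly, which is why the chained `[...]` call works, and after the call the original DT is sorted even though it was never reassigned:

```r
library(data.table)

# Question's rows, deliberately shuffled (same Order -> Segment pairs as the question)
DT <- data.table(
  ID      = 1L,
  Order   = c(5L, 2L, 8L, 1L, 3L, 7L, 4L, 6L),
  Segment = c("B", "B", "B", "A", "B", "B", "C", "B")
)

# setorder() reorders DT's rows in place and returns DT invisibly, so it can be chained
res <- setorder(DT, ID, Order)[, .(Consec = .N), by = .(ID, Segment, rleid(Segment))][Segment == "B", .(ID, Consec)]

print(DT$Order)  # DT itself is now sorted, with no explicit reassignment
# [1] 1 2 3 4 5 6 7 8
print(res)
```

The trade-off is that any later code relying on DT's original row order sees the sorted order instead.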

Published: 2023-11-22 06:56:41
Article link: https://www.elefans.com/category/jswz/34/1616405.html