How to count the number of times an element appears consecutively in a data.table?

Updated: 2024-10-24 13:27:40

Problem description

I have a data.table that looks like this

    ID, Order, Segment
    1, 1, A
    1, 2, B
    1, 3, B
    1, 4, C
    1, 5, B
    1, 6, B
    1, 7, B
    1, 8, B

The data is ordered by the Order column. I would like to count the number of consecutive B's for each ID. Ideally, the output I would like is

    ID, Consec
    1, 2
    1, 4

This is because segment B appears consecutively in rows 2 and 3 (2 times), and then again in rows 5, 6, 7 and 8 (4 times).

A loop solution is quite obvious, but it would also be very slow.

Is there an elegant solution in data.table that is also fast?

P.S. The data I am dealing with has ~20 million rows.
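For anyone who wants to try this out, the sample table above could be built roughly as follows. This is only a sketch: the name `DT` and the column types (integer ID and Order, character Segment) are assumptions, since the question does not state them.

```r
library(data.table)

# Assumed reconstruction of the sample data from the question.
# Column types are a guess: integer ID/Order, character Segment.
DT <- data.table(
  ID      = rep(1L, 8),
  Order   = 1:8,
  Segment = c("A", "B", "B", "C", "B", "B", "B", "B")
)

print(DT)
```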

Solution

Try

    library(data.table)  # v1.9.5+
    DT[order(ID, Order)][, indx := rleid(Segment)][Segment == 'B', list(Consec = .N), by = list(indx, ID)][, indx := NULL][]
    #    ID Consec
    # 1:  1      2
    # 2:  1      4
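To see why this works, it may help to look at what `rleid` returns for the Segment column on its own. The sketch below uses a plain vector named `seg` (a name made up here for illustration) and also shows a base R equivalent built on `rle()`, purely for comparison.

```r
library(data.table)

# Segment values from the question, already in Order
seg <- c("A", "B", "B", "C", "B", "B", "B", "B")

# rleid() gives each run of identical consecutive values its own id,
# so the id increases by one every time the value changes.
ids <- rleid(seg)
print(ids)

# Base R equivalent of the run ids, built from rle()
r <- rle(seg)
ids_base <- rep(seq_along(r$lengths), r$lengths)

# Lengths of the runs whose value is "B"
b_runs <- r$lengths[r$values == "B"]
print(b_runs)
```

Grouping by the run id (together with ID) and counting rows with `.N` therefore yields one row per run, and filtering on `Segment == 'B'` keeps only the runs of B.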

Or as @eddi suggested

    DT[order(ID, Order)][, .(Consec = .N), by = .(ID, Segment, rleid(Segment))][Segment == 'B', .(ID, Consec)]
    #    ID Consec
    # 1:  1      2
    # 2:  1      4

A more memory-efficient method would be to use setorder instead of order (as suggested by @Arun), since setorder sorts DT by reference rather than creating a sorted copy

    setorder(DT, ID, Order)[, .(Consec = .N), by = .(ID, Segment, rleid(Segment))][Segment == 'B', .(ID, Consec)]
    #    ID Consec
    # 1:  1      2
    # 2:  1      4
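The sample data has only one ID, so here is a small check with two IDs. The table `DT2` is hypothetical and deliberately unsorted. ID 1 ends with a run of B and ID 2 starts with one, so `rleid(Segment)` alone gives both runs the same id; having ID in `by` is what keeps them apart.

```r
library(data.table)

# Hypothetical, deliberately unsorted two-ID table (not from the question)
DT2 <- data.table(
  ID      = c(2, 1, 1, 2, 1, 2),
  Order   = c(1, 3, 1, 2, 2, 3),
  Segment = c("B", "B", "A", "B", "B", "A")
)

# setorder() sorts DT2 by reference: DT2 itself is reordered, no copy is made
setorder(DT2, ID, Order)

# Sorted Segment is A B B | B B A; the four B rows share one run id,
# but grouping by ID as well splits them into one run per ID.
res <- DT2[, .(Consec = .N), by = .(ID, Segment, rleid(Segment))][Segment == "B", .(ID, Consec)]
print(res)
```

Note that after the `setorder` call the original table stays sorted, which may or may not be wanted if the row order matters elsewhere.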