在多个条件下加入时的奇怪行为

编程入门 行业动态 更新时间:2024-10-28 00:23:37
本文介绍了在多个条件下加入时的奇怪行为的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

在回答此问题中有关使用 data.table

In answering this question about rolling joins with the data.table package, I've run into some odd behavior when using multiple conditions.

考虑以下数据集:

dt <- data.table(t_id = c(1,4,2,3,5), place = c("a","a","d","a","d"), num = c(5.1, 5.1, 6.2, 5.1, 6.2), key=c("place")) dt_lu <- data.table(f_id = c(rep(1,4),rep(2,3)), place = c("a","b","c","d","a","d","a"), num = c(6,7,8,9,6,7,8), key=c("place"))

$ b b

当我想用 dt_lu 加入 dt ,其中只有 dt_lu ,并且 dt_lu $ num 高于 dt $ num 如下:

When I want to join dt with dt_lu where only those cases of dt_lu that have the same place and where dt_lu$num is higher than dt$num as follows:

dt_lu[dt, list(tid = i.t_id, tnum = i.num, fnum = num[i.num < num], fid = f_id), by = .EACHI]

我得到所需的结果:

place tid tnum fnum fid 1: a 1 5.1 6 1 2: a 1 5.1 6 2 3: a 1 5.1 8 2 4: a 4 5.1 6 1 5: a 4 5.1 6 2 6: a 4 5.1 8 2 7: a 3 5.1 6 1 8: a 3 5.1 6 2 9: a 3 5.1 8 2 10: d 2 6.2 9 1 11: d 2 6.2 7 2 12: d 5 6.2 9 1 13: d 5 6.2 7 2

当我想添加一个附加条件时,我可以通过链接附加条件如下:

When I want to add an additional condition, I can get the desired result easily by chaining that additional conditions as follows:

dt_lu[dt, list(tid = i.t_id, tnum = i.num, fnum = num[i.num < num], fid = f_id), by = .EACHI][fnum - tnum < 2]

这会给我:

place tid tnum fnum fid 1: a 1 5.1 6 1 2: a 1 5.1 6 2 3: a 4 5.1 6 1 4: a 4 5.1 6 2 5: a 3 5.1 6 1 6: a 3 5.1 6 2 7: d 2 6.2 7 2 8: d 5 6.2 7 2

但是当我添加额外的条件(即:差异必须小于 2 )如下:

However when I add the extra condition (i.e.: the difference has to be less than 2) as follows:

dt_lu[dt, list(tid = i.t_id, tnum = i.num, fnum = num[i.num < num & num - i.num < 2], fid = f_id), by = .EACHI]

我没有得到预期结果:

place tid tnum fnum fid 1: a 1 5.1 6 1 2: a 1 5.1 6 2 3: a 1 5.1 6 2 4: a 4 5.1 6 1 5: a 4 5.1 6 2 6: a 4 5.1 6 2 7: a 3 5.1 6 1 8: a 3 5.1 6 2 9: a 3 5.1 6 2 10: d 2 6.2 7 1 11: d 2 6.2 7 2 12: d 5 6.2 7 1 13: d 5 6.2 7 2

警告消息: $。

Moreover, I get the following warning message:

警告消息:在 [。data.table (dt_lu, dt,list(tid = i.t_id,tnum = i.num,fnum = num [i.num < :组1的结果的列3是长度2,但该结果中最长的列是3.再循环离开剩余的1个项目。

Warning message: In [.data.table(dt_lu, dt, list(tid = i.t_id, tnum = i.num, fnum = num[i.num < : Column 3 of result for group 1 is length 2 but the longest column in this result is 3. Recycled leaving remainder of 1 items. This warning is once only for the first group with this issue.

预期的结果是:

place tid tnum fnum fid 1: a 1 5.1 6 1 2: a 1 5.1 6 2 4: a 4 5.1 6 1 5: a 4 5.1 6 2 7: a 3 5.1 6 1 8: a 3 5.1 6 2 11: d 2 6.2 7 2 13: d 5 6.2 7 2

我故意保留示例显示在最终结果(与工作解决方案相同)中必须维护哪些行。

I deliberately kept the rownumbers from the first example to show which rows have to be maintained in the final result (which is the same as the working solution).

作为这个答案显示,应该可以在连接操作中使用多个条件。

As this answer shows, it should be possible to use multiple conditions inside the join operation.

我尝试了以下替代方法,但是它们都不起作用:

I tried the following alternatives, but both of them don't work:

dt_lu[dt, list(tid = i.t_id, tnum = i.num, fnum = num[(i.num < num) & (num - i.num < 2)], fid = f_id), by = .EACHI] dt_lu[dt, { val = num[(i.num < num) & (num - i.num < 2)]; list(tid = i.t_id, tnum = i.num, fnum = val, fid = f_id)}, by = .EACHI]

有人可以解释一下为什么我在连接操作中没有得到多个条件的期望结果? / p>

Could someone explain me why I don't get the desired result with multiple conditions inside the join operation?

推荐答案

警告消息给出了问题。此外,使用 print()在这里非常有用。

The warning message gives away the issue. Also, using print() is quite helpful here.

dt_lu[dt, print(i.num < num & num - i.num < 2), by=.EACHI] # [1] TRUE TRUE FALSE # [1] TRUE TRUE FALSE # [1] TRUE TRUE FALSE # [1] FALSE TRUE # [1] FALSE TRUE # Empty data.table (0 rows) of 3 cols: place,place,num

考虑第一种情况,条件计算结果为 TRUE, FALSE 。这组有3个观察。您的 j-expression 包含:

Consider the first case where the condition evaluates to TRUE, TRUE, FALSE. There are 3 observations for this group. And your j-expression contains:

.(tid = i.t_id, tnum = i.num, fnum = num[i.num < num & num - i.num < 2], fid = f_id)

i.t_id 和 i。 num 长度为1(因为它们来自 dt )。但 num [.. condn ..] 将返回length = 2,而 f_id 将返回length =长度= 1和长度= 2的项目将被回收到最长项目/矢量的长度= 3。这导致不正确的结果。由于3不能完全被2整除,它返回警告。

i.t_id and i.num are of length 1 (as they come from dt). But num[..condn..] will return length = 2, whereas f_id will return length = 3. Both the length=1 and length=2 items will be recycled to the length of the longest item/vector = 3. That leads to incorrect result. Since 3 isn't perfectly divisible by 2, it returns the warning.

您打算做什么:

.(tid = i.t_id, tnum = i.num, fnum = num[i.num < num & num - i.num < 2], fid = f_id[i.num < num & num - i.num < 2])

p>

or equivalently:

{ idx = i.num < num & num - i.num < 2 .(tid = i.t_id, tnum = i.num, fnum = num[idx], fid = f_id[idx]) }

组合在一起:

dt_lu[dt, { idx = i.num < num & num - i.num < 2 .(tid = i.t_id, tnum = i.num, fnum = num[idx], fid = f_id[idx]) }, by = .EACHI] # place tid tnum fnum fid # 1: a 1 5.1 6 1 # 2: a 1 5.1 6 2 # 3: a 4 5.1 6 1 # 4: a 4 5.1 6 2 # 5: a 3 5.1 6 1 # 6: a 3 5.1 6 2 # 7: d 2 6.2 7 2 # 8: d 5 6.2 7 2

更多推荐

在多个条件下加入时的奇怪行为

本文发布于:2023-10-16 11:32:57,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1497438.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:多个   入时   条件下   奇怪

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!