Intersection of two pandas DataFrames

Updated: 2024-10-22 16:37:16
I have two DataFrames, mydataframe1 and mydataframe2, shown below.

    mydataframe1
    Out[13]:
    Start  End  Remove
       50   60       1
       61  105       0
      106  150       1
      151  160       0
      161  180       1
      181  200       0
      201  400       1

    mydataframe2
    Out[14]:
    Start  End
       55  100
      105  140
      151  154
      155  185
      220  240

From mydataframe2 I would like to remove every row whose Start-End interval is contained, even partially, in any of the "Remove" = 1 intervals of mydataframe1. In other words, no interval left in mydataframe2 should intersect any "Remove" = 1 interval of mydataframe1.

In this case mydataframe2 becomes:

    mydataframe2
    Out[15]:
    Start  End
      151  154
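
The "contained, even partially" condition comes down to an interval-overlap test. A minimal sketch in plain Python follows, assuming both endpoints are inclusive (the question does not say so explicitly); the names overlaps, remove_intervals, and candidates are hypothetical and used only for illustration.

```python
# "Remove" = 1 intervals from mydataframe1 and all intervals from mydataframe2.
remove_intervals = [(50, 60), (106, 150), (161, 180), (201, 400)]
candidates = [(55, 100), (105, 140), (151, 154), (155, 185), (220, 240)]


def overlaps(a, b):
    """Return True if closed intervals a and b share at least one point."""
    # Assumption: endpoints are inclusive, so touching intervals overlap.
    return a[0] <= b[1] and b[0] <= a[1]


# Keep a candidate only if it overlaps none of the intervals to remove.
kept = [c for c in candidates if not any(overlaps(c, r) for r in remove_intervals)]
print(kept)  # [(151, 154)]
```

Two intervals overlap exactly when each one starts no later than the other ends, which is why a single pair of comparisons covers both full and partial containment.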

Best answer

You can use pd.IntervalIndex to test for interval overlaps. Below, df1 and df2 stand for mydataframe1 and mydataframe2.

Get the rows of df1 that mark ranges to be removed:

    In [313]: dfr = df1.query('Remove == 1')

Construct an IntervalIndex from the ranges to be removed, with both endpoints closed:

    In [314]: s1 = pd.IntervalIndex.from_arrays(dfr.Start, dfr.End, closed='both')

Construct an IntervalIndex from the ranges to be tested:

    In [315]: s2 = pd.IntervalIndex.from_arrays(df2.Start, df2.End, closed='both')

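Select the rows of df2 whose interval overlaps none of the s1 ranges. On pandas 0.25 and later, x in s1 is True only when x exactly equals one of the intervals in s1, so the test x not in s1 no longer detects overlaps; IntervalIndex.overlaps does:

    In [316]: df2.loc[[not s1.overlaps(x).any() for x in s2]]
    Out[316]:
       Start  End
    2    151  154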
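
Putting the steps above together, here is a self-contained sketch that can be run outside an IPython session. It assumes pandas 0.24 or later (for IntervalIndex.overlaps) and tests for overlap with that method rather than with the in operator; the names keep and result are hypothetical.

```python
import pandas as pd

# Example data from the question.
df1 = pd.DataFrame({
    "Start": [50, 61, 106, 151, 161, 181, 201],
    "End": [60, 105, 150, 160, 180, 200, 400],
    "Remove": [1, 0, 1, 0, 1, 0, 1],
})
df2 = pd.DataFrame({
    "Start": [55, 105, 151, 155, 220],
    "End": [100, 140, 154, 185, 240],
})

# Rows of df1 that mark ranges to be removed.
dfr = df1.query("Remove == 1")

# Closed intervals for the ranges to remove and the ranges to test.
s1 = pd.IntervalIndex.from_arrays(dfr["Start"], dfr["End"], closed="both")
s2 = pd.IntervalIndex.from_arrays(df2["Start"], df2["End"], closed="both")

# One boolean per row of df2: True if it overlaps no interval in s1.
keep = [not s1.overlaps(interval).any() for interval in s2]

result = df2.loc[keep]
print(result)
```

The list comprehension calls overlaps once per row of df2, which is fine for small frames; for large inputs a vectorized comparison may be preferable.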

Details

    In [320]: df1
    Out[320]:
       Start  End  Remove
    0     50   60       1
    1     61  105       0
    2    106  150       1
    3    151  160       0
    4    161  180       1
    5    181  200       0
    6    201  400       1

    In [321]: df2
    Out[321]:
       Start  End
    0     55  100
    1    105  140
    2    151  154
    3    155  185
    4    220  240

    In [322]: dfr
    Out[322]:
       Start  End  Remove
    0     50   60       1
    2    106  150       1
    4    161  180       1
    6    201  400       1

IntervalIndex details (the exact repr varies between pandas versions)

    In [323]: s1
    Out[323]:
    IntervalIndex([[50, 60], [106, 150], [161, 180], [201, 400]]
                  closed='both',
                  dtype='interval[int64]')

    In [324]: s2
    Out[324]:
    IntervalIndex([[55, 100], [105, 140], [151, 154], [155, 185], [220, 240]]
                  closed='both',
                  dtype='interval[int64]')

Boolean mask used for the selection, computed with IntervalIndex.overlaps:

    In [326]: [not s1.overlaps(x).any() for x in s2]
    Out[326]: [False, False, True, False, False]
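
The same mask can also be built without a Python-level loop. This is a different technique from the answer's, not part of it: a sketch using NumPy broadcasting to compare every df2 interval with every "Remove" = 1 interval at once. It assumes inclusive endpoints (matching closed='both'), and the name hits is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Start": [50, 61, 106, 151, 161, 181, 201],
    "End": [60, 105, 150, 160, 180, 200, 400],
    "Remove": [1, 0, 1, 0, 1, 0, 1],
})
df2 = pd.DataFrame({
    "Start": [55, 105, 151, 155, 220],
    "End": [100, 140, 154, 185, 240],
})

dfr = df1[df1["Remove"] == 1]

# Column vectors for df2 (one row per candidate interval) ...
starts = df2["Start"].to_numpy()[:, None]
ends = df2["End"].to_numpy()[:, None]
# ... and row vectors for the intervals to remove.
r_starts = dfr["Start"].to_numpy()[None, :]
r_ends = dfr["End"].to_numpy()[None, :]

# hits[i, j] is True when df2 row i overlaps removal interval j.
hits = (starts <= r_ends) & (r_starts <= ends)

# Keep the rows that overlap no removal interval.
result = df2[~hits.any(axis=1)]
print(result)
```

The intermediate array has one entry per pair of intervals, so memory grows with len(df2) * len(dfr); that trade-off is usually acceptable for moderately sized frames.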

Published: 2023-07-30 22:13:00
Article link: https://www.elefans.com/category/jswz/34/1340347.html