In my problem I have 2 dataframes, mydataframe1 and mydataframe2, as below:
mydataframe1
Out[13]:
Start  End  Remove
   50   60       1
   61  105       0
  106  150       1
  151  160       0
  161  180       1
  181  200       0
  201  400       1

mydataframe2
Out[14]:
Start  End
   55  100
  105  140
  151  154
  155  185
  220  240

From mydataframe2 I would like to remove the rows whose Start-End interval is contained, even partially, in any of the "Remove" = 1 intervals of mydataframe1. In other words, there should not be any intersection between the intervals of mydataframe2 and any of the "Remove" = 1 intervals of mydataframe1.
In this case, mydataframe2 becomes:

mydataframe2
Out[15]:
Start  End
  151  154

Best answer
You could use pd.IntervalIndex for intersections.
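Here "intersection" means that two closed intervals share at least one point: [a, b] and [c, d] intersect exactly when a <= d and c <= b, so touching at an endpoint counts (matching closed='both' in the steps that follow). As a plain-Python cross-check of that criterion on the question's data, and not the IntervalIndex approach itself, a minimal sketch could look like this; the closed_overlap helper is hypothetical:

```python
# Example data from the question, rebuilt as plain tuples.
df1_rows = [(50, 60, 1), (61, 105, 0), (106, 150, 1), (151, 160, 0),
            (161, 180, 1), (181, 200, 0), (201, 400, 1)]  # (Start, End, Remove)
df2_rows = [(55, 100), (105, 140), (151, 154), (155, 185), (220, 240)]  # (Start, End)


def closed_overlap(a, b, c, d):
    """Hypothetical helper: True if the closed intervals [a, b] and [c, d]
    share at least one point (touching endpoints counts as overlap)."""
    return a <= d and c <= b


# Only the Remove == 1 ranges take part in the test.
remove_ranges = [(s, e) for s, e, flag in df1_rows if flag == 1]

# Keep a df2 interval only if it overlaps none of the removal ranges.
kept = [
    (s, e)
    for s, e in df2_rows
    if not any(closed_overlap(s, e, r_start, r_end) for r_start, r_end in remove_ranges)
]

print(kept)  # [(151, 154)]
```

This brute-force loop compares every pair of rows, so it is only meant to make the overlap rule explicit.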
Get the rows to be removed:
In [313]: dfr = df1.query('Remove == 1')

Construct an IntervalIndex from the ranges to be removed:
In [314]: s1 = pd.IntervalIndex.from_arrays(dfr.Start, dfr.End, 'both')

Construct an IntervalIndex from the ranges to be tested:
In [315]: s2 = pd.IntervalIndex.from_arrays(df2.Start, df2.End, 'both')

Select the rows of df2 whose s2 interval is not in any of the s1 ranges:
In [316]: df2.loc[[x not in s1 for x in s2]]
Out[316]:
   Start  End
2    151  154

Details
In [320]: df1
Out[320]:
   Start  End  Remove
0     50   60       1
1     61  105       0
2    106  150       1
3    151  160       0
4    161  180       1
5    181  200       0
6    201  400       1

In [321]: df2
Out[321]:
   Start  End
0     55  100
1    105  140
2    151  154
3    155  185
4    220  240

In [322]: dfr
Out[322]:
   Start  End  Remove
0     50   60       1
2    106  150       1
4    161  180       1
6    201  400       1

IntervalIndex details
In [323]: s1
Out[323]:
IntervalIndex([[50, 60], [106, 150], [161, 180], [201, 400]]
              closed='both',
              dtype='interval[int64]')

In [324]: s2
Out[324]:
IntervalIndex([[55, 100], [105, 140], [151, 154], [155, 185], [220, 240]]
              closed='both',
              dtype='interval[int64]')

In [326]: [x not in s1 for x in s2]
Out[326]: [False, False, True, False, False]
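One caveat, hedged: the transcript appears to come from an older pandas release. To my knowledge, pandas 0.25 changed "Interval in IntervalIndex" to return True only for an exact match instead of for any overlapping interval, so on recent versions [x not in s1 for x in s2] would be all True and every row of df2 would be kept. A minimal sketch of the same approach that asks for overlaps explicitly with IntervalIndex.overlaps (available since pandas 0.24) follows; the helper name drop_overlapping is hypothetical and the frames are rebuilt from the question's data:

```python
import pandas as pd

# Frames rebuilt from the question's example data.
df1 = pd.DataFrame({
    "Start": [50, 61, 106, 151, 161, 181, 201],
    "End": [60, 105, 150, 160, 180, 200, 400],
    "Remove": [1, 0, 1, 0, 1, 0, 1],
})
df2 = pd.DataFrame({
    "Start": [55, 105, 151, 155, 220],
    "End": [100, 140, 154, 185, 240],
})


def drop_overlapping(candidates, ranges):
    """Hypothetical helper: keep the rows of `candidates` whose closed
    [Start, End] interval overlaps none of the Remove == 1 rows of `ranges`."""
    dfr = ranges.query("Remove == 1")
    s1 = pd.IntervalIndex.from_arrays(dfr.Start, dfr.End, closed="both")
    s2 = pd.IntervalIndex.from_arrays(candidates.Start, candidates.End, closed="both")
    # s1.overlaps(x) returns one bool per removal range; keep x only if none is True.
    keep = [not s1.overlaps(x).any() for x in s2]
    return candidates.loc[keep]


result = drop_overlapping(df2, df1)
print(result)  # only the 151-154 row (index 2) remains
```

Each s1.overlaps(x) call checks x against every removal range, so the cost grows with len(df1) × len(df2); for frames of this size that is not a concern.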