Determine if one range is within another

Updated: 2024-10-25 10:33:34

I have a file of ranges, sorted by the first column, in which no two ranges overlap:

```
1 10
12 15
18 19
```

And another, also sorted by the first column, whose ranges can overlap:

```
1 5
2 10
12 13
13 20
```

For each line (range) in the second file, I would like to determine whether it intersects any of the ranges in the first file. So far I have done the following:

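```python
import pandas as pd

# Neither file has a header row: column 0 is the start, column 1 the stop
df_1 = pd.read_csv('range1.txt', sep=' ', header=None)
df_2 = pd.read_csv('range2.txt', sep=' ', header=None)

overlapping = []
for i in range(len(df_1)):
    start_1 = df_1.iloc[i, 0]
    stop_1 = df_1.iloc[i, 1]
    for j in range(len(df_2)):
        start_2 = df_2.iloc[j, 0]
        stop_2 = df_2.iloc[j, 1]
        if start_2 > stop_1:
            # df_2 is sorted by start, so no later range can overlap this one
            break
        elif stop_2 < start_1:
            # this range ends before the current df_1 range begins
            continue
        else:
            # add ranges from second file to list (a range that overlaps
            # several df_1 ranges is added once per overlap)
            overlapping.append((start_2, stop_2))
```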

I know this can be terribly inefficient, so I was wondering whether there is a more computationally efficient (faster) way to solve this.

Best answer

@Olivier Pellier-Cuit has provided a link to a fast overlap test. If you need a membership check instead of an overlap test, use this algorithm.
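The following sketch shows the overlap test for a single pair of ranges. It is a minimal illustration and does not reproduce the code behind the link. The function names are hypothetical. Both functions treat ranges as closed intervals, so ranges that only share an endpoint count as overlapping, which is the same convention the question's loop uses.

```python
def overlaps_endpoints(a1, b1, a2, b2):
    """Closed ranges [a1, b1] and [a2, b2] share at least one point."""
    return a1 <= b2 and a2 <= b1


def overlaps_center(a1, b1, a2, b2):
    """Same test via centers and half-lengths: the distance between the
    centers must not exceed the sum of the half-lengths."""
    c1, r1 = (a1 + b1) / 2, (b1 - a1) / 2
    c2, r2 = (a2 + b2) / 2, (b2 - a2) / 2
    return abs(c1 - c2) <= r1 + r2


# The two forms agree on every pair drawn from the example data
ranges = [(1, 10), (12, 15), (18, 19), (1, 5), (2, 10), (12, 13), (13, 20), (50, 60)]
print(all(overlaps_endpoints(*p, *q) == overlaps_center(*p, *q)
          for p in ranges for q in ranges))  # True
```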

So using this algorithm we can do the following:

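```python
import numpy as np

# m = a + b (twice the center), d = b - a (twice the half-length)
df1['m'] = df1.a + df1.b
df1['d'] = df1.b - df1.a
df2['m'] = df2.a + df2.b
df2['d'] = df2.b - df2.a

# For each df2 range: True if it overlaps at least one df1 range
df2[['m', 'd']].apply(lambda x: (np.abs(df1.m - x.m) < df1.d + x.d).any(), axis=1)
```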
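`apply(..., axis=1)` still runs a Python-level loop over the rows of df2. The sketch below applies the same test with NumPy broadcasting instead. It assumes df1 and df2 have columns `a` and `b`, as in the setup, and the name `hits` is illustrative. It builds a len(df2) × len(df1) boolean matrix, so memory grows with the product of the two lengths.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 12, 18], 'b': [10, 15, 19]})
df2 = pd.DataFrame({'a': [1, 2, 12, 13, 50], 'b': [5, 10, 13, 20, 60]})

# Doubled centers (m) and doubled half-lengths (d), as in the answer
m1 = (df1.a + df1.b).to_numpy()
d1 = (df1.b - df1.a).to_numpy()
m2 = (df2.a + df2.b).to_numpy()
d2 = (df2.b - df2.a).to_numpy()

# hits[i, j] is True when df2 range i overlaps df1 range j
hits = np.abs(m2[:, None] - m1[None, :]) < d2[:, None] + d1[None, :]

# One flag per df2 range: does it overlap any df1 range?
df2['overlaps'] = hits.any(axis=1)
print(df2)
```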

PS: I've slightly simplified the calculation of m and d by dropping the division by 2. Both sides of the comparison carry the same factor of 1/2, so it cancels.
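Here is the simplification written out. The notation is introduced only for this derivation: c_i = (a_i + b_i)/2 is the center of range [a_i, b_i] and r_i = (b_i - a_i)/2 is its half-length.

```latex
\begin{aligned}
|c_1 - c_2| < r_1 + r_2
  &\iff 2\,|c_1 - c_2| < 2\,(r_1 + r_2) \\
  &\iff |m_1 - m_2| < d_1 + d_2, \qquad m_i = a_i + b_i,\; d_i = b_i - a_i \\
  &\iff a_1 < b_2 \ \text{and}\ a_2 < b_1 .
\end{aligned}
```

The last line shows that the strict `<` treats ranges that merely touch (for example 10 12 and 12 15) as non-overlapping. Using `<=` instead gives a_1 <= b_2 and a_2 <= b_1, which is the closed-interval condition the question's loop checks.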

Output:

```
In [105]: df2[['m','d']].apply(lambda x: (np.abs(df1.m - x.m) < df1.d +x.d).any(), axis=1)
Out[105]:
0     True
1     True
2     True
3     True
4    False
dtype: bool
```

Setup:

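```python
import io

import pandas as pd

# sep=r'\s+' splits on runs of whitespace, like delim_whitespace=True
# (which recent pandas versions deprecate)
df1 = pd.read_csv(io.StringIO("""
a b
1 10
12 15
18 19
"""), sep=r'\s+')

df2 = pd.read_csv(io.StringIO("""
a b
1 5
2 10
12 13
13 20
50 60
"""), sep=r'\s+')
```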

NOTE: I've intentionally added the pair (50, 60) to df2. It doesn't overlap any interval in df1.

Data frames with calculated m and d columns:

```
In [106]: df1
Out[106]:
    a   b   m  d
0   1  10  11  9
1  12  15  27  3
2  18  19  37  1

In [107]: df2
Out[107]:
    a   b    m   d
0   1   5    6   4
1   2  10   12   8
2  12  13   25   1
3  13  20   33   7
4  50  60  110  10
```
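The answer's test compares every df2 range against every df1 range. The question guarantees that the first file's ranges are sorted and non-overlapping, so one binary search per df2 range is also possible. The sketch below uses that technique via `np.searchsorted`. It is a different approach from the answer's. It assumes columns `a` and `b` as in the setup, and the helper names (`starts1`, `ends1`, `k`) are illustrative. It treats ranges that share an endpoint as overlapping.

```python
import numpy as np
import pandas as pd

# Assumes, as the question states, that df1's ranges are sorted and
# non-overlapping, so both its starts (a) and its ends (b) are ascending.
df1 = pd.DataFrame({'a': [1, 12, 18], 'b': [10, 15, 19]})
df2 = pd.DataFrame({'a': [1, 2, 12, 13, 50], 'b': [5, 10, 13, 20, 60]})

starts1 = df1.a.to_numpy()
ends1 = df1.b.to_numpy()

# For each df2 range [s, e], k is the first df1 range whose end is >= s;
# every df1 range before k ends before s and cannot overlap.
k = np.searchsorted(ends1, df2.a.to_numpy(), side='left')

# Range k (if it exists) overlaps [s, e] exactly when it starts at or before e;
# later df1 ranges start even further to the right.
valid = k < len(df1)
k_clipped = np.minimum(k, len(df1) - 1)  # keep the index in bounds for the lookup
df2['overlaps'] = valid & (starts1[k_clipped] <= df2.b.to_numpy())
print(df2)
```

This costs O(log n) per df2 range instead of O(n). df2 does not need to be sorted for it to work.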

Published: 2023-08-01 12:50:00
Article link: https://www.elefans.com/category/jswz/34/1357928.html