Remove duplicate elements from two numpy arrays

Updated: 2024-10-28 06:24:17

I have two numpy arrays, a and b, each with twenty million elements (floats). If the pair (a[i], b[i]) at one index equals the pair at another index, we call it a duplicate, and it should be removed from both arrays. For instance:

    a = numpy.array([1,3,6,3,7,8,3,2,9,10,14,6])
    b = numpy.array([2,4,15,4,7,9,2,2,0,11,4,15])

In these two arrays, the pair a[2], b[2] is the same as a[11], b[11], so it is a duplicate and the repeated copy should be removed. The same goes for a[1], b[1] and a[3], b[3]. Values that merely repeat within a single array are not treated as duplicates. So I want the returned arrays to be:

    a = numpy.array([1,3,6,7,8,3,2,9,10,14])
    b = numpy.array([2,4,15,7,9,2,2,0,11,4])

Does anyone have a clever way to implement such a reduction?

Best answer

First you have to pack a and b into a single array so that duplicate pairs can be identified. If the values are non-negative integers (see the edit for other cases), this can be achieved by:

    base = a.max() + 1
    c = a + base * b
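
Why the packing identifies pairs can be checked directly. The following is a minimal sketch using the arrays from the question; it assumes non-negative integers and an integer dtype wide enough that a + base * b does not overflow. The names a_back and b_back are only for illustration.

```python
import numpy as np

a = np.array([1, 3, 6, 3, 7, 8, 3, 2, 9, 10, 14, 6])
b = np.array([2, 4, 15, 4, 7, 9, 2, 2, 0, 11, 4, 15])

base = a.max() + 1   # strictly larger than every value in a
c = a + base * b     # one integer key per (a[i], b[i]) pair

# Because 0 <= a[i] < base, each key can be split back into its pair,
# so two keys are equal only when both components are equal.
a_back = c % base
b_back = c // base

print(np.array_equal(a_back, a), np.array_equal(b_back, b))
```

With twenty million elements the keys can grow as large as roughly a.max() * b.max(), so it is worth checking that this product fits in the array's integer type before relying on the trick.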

Then find the unique values in c:

    import numpy as np
    val, ind = np.unique(c, return_index=True)  # ind: index of the first occurrence of each value

and use the sorted indices to retrieve the associated values in a and b, in their original order:

    ind.sort()
    print(a[ind])
    print(b[ind])

This drops the duplicates (two here):

    [ 1  3  6  7  8  3  2  9 10 14]
    [ 2  4 15  7  9  2  2  0 11  4]
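
Put together, the steps might look like the sketch below. drop_duplicate_pairs is a hypothetical helper name, not part of NumPy, and the dtype and sign checks are assumptions added here to make the precondition on the inputs explicit; the answer itself does not include them.

```python
import numpy as np

def drop_duplicate_pairs(a, b):
    """Keep the first occurrence of each (a[i], b[i]) pair, in original order.

    Hypothetical helper. Assumes a and b are same-length arrays of
    non-negative integers whose packed keys fit in int64.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    if not (np.issubdtype(a.dtype, np.integer)
            and np.issubdtype(b.dtype, np.integer)):
        raise TypeError("integer packing needs integer arrays")
    if a.size == 0:
        return a, b
    if a.min() < 0 or b.min() < 0:
        raise ValueError("integer packing needs non-negative values")

    base = int(a.max()) + 1
    keys = a.astype(np.int64) + base * b.astype(np.int64)

    # return_index gives the index of the first occurrence of each key.
    _, first = np.unique(keys, return_index=True)
    first.sort()  # restore the original ordering of the kept pairs
    return a[first], b[first]

a = np.array([1, 3, 6, 3, 7, 8, 3, 2, 9, 10, 14, 6])
b = np.array([2, 4, 15, 4, 7, 9, 2, 2, 0, 11, 4, 15])
a_out, b_out = drop_duplicate_pairs(a, b)
print(a_out)
print(b_out)
```

Sorting the keys dominates the cost, so the approach is O(n log n) in the number of elements.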

EDIT

Regardless of the data type, the c array can be built as follows, by packing each pair into bytes:

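    ab = np.ascontiguousarray(np.vstack((a, b)).T)  # one contiguous row per (a[i], b[i]) pair
    dtype = 'S' + str(2 * ab.itemsize)              # byte string spanning both values of a row
    c = ab.view(dtype=dtype)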
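
For float arrays, as in the question, the byte-packing variant could be run end to end as in the sketch below. The sample values are made up for illustration. The cross-check at the end uses np.unique with axis=0, which is a different technique from the one in the answer (NumPy's built-in row-wise unique) and is included only as a comparison.

```python
import numpy as np

# Made-up float data; pairs at indices 1/3 and 2/5 are duplicates.
a = np.array([1.5, 3.0, 6.25, 3.0, 7.0, 6.25])
b = np.array([2.0, 4.5, 15.0, 4.5, 7.0, 15.0])

# One contiguous row per (a[i], b[i]) pair.
ab = np.ascontiguousarray(np.vstack((a, b)).T)

# View each row as a single fixed-length byte string.
c = ab.view(dtype=f"S{2 * ab.itemsize}").ravel()

_, ind = np.unique(c, return_index=True)
ind.sort()
a_out, b_out = a[ind], b[ind]

# Alternative: row-wise unique on the stacked array itself.
_, ind_rows = np.unique(ab, axis=0, return_index=True)
ind_rows.sort()

print(a_out, b_out)
```

Note that the byte view compares bit patterns rather than numeric values, so 0.0 and -0.0 count as different there, while the axis=0 variant compares by value.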

Published: 2023-07-29 14:04:00
Link: https://www.elefans.com/category/jswz/34/1316713.html