I have two NumPy arrays, a and b, each with twenty million elements (floating-point numbers). If the pair formed by the elements at the same index in both arrays is identical to another such pair, we call it a duplicate, and it should be removed from both arrays. For instance:
```
a = numpy.array([1,3,6,3,7,8,3,2,9,10,14,6])
b = numpy.array([2,4,15,4,7,9,2,2,0,11,4,15])
```

In these two arrays, the pair a[2]&b[2] is the same as the pair a[11]&b[11], so we call it a duplicate, and the second occurrence should be removed. The same goes for a[1]&b[1] vs a[3]&b[3]. Although each array also contains repeated values on its own, those are not treated as duplicates. So I want the returned arrays to be:
```
a = numpy.array([1,3,6,7,8,3,2,9,10,14])
b = numpy.array([2,4,15,7,9,2,2,0,11,4])
```

Does anyone have a clever way to implement such a reduction?
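To pin down the rule, here is a plain-Python sketch of it, assuming (as the expected output suggests) that the first occurrence of each (a[i], b[i]) pair is kept and order is preserved. The helper name drop_duplicate_pairs_reference is hypothetical. A Python-level loop like this is far too slow for twenty million elements, so it is only meant as a correctness check for a vectorized version.

```python
import numpy as np

def drop_duplicate_pairs_reference(a, b):
    """Keep the first occurrence of each (a[i], b[i]) pair, in original order.

    Hypothetical reference helper: slow, but easy to verify by eye.
    """
    seen = set()
    keep = []
    # tolist() turns NumPy scalars into plain Python numbers so pairs are hashable
    for i, pair in enumerate(zip(a.tolist(), b.tolist())):
        if pair not in seen:
            seen.add(pair)
            keep.append(i)
    keep = np.array(keep, dtype=np.intp)
    return a[keep], b[keep]

a = np.array([1, 3, 6, 3, 7, 8, 3, 2, 9, 10, 14, 6])
b = np.array([2, 4, 15, 4, 7, 9, 2, 2, 0, 11, 4, 15])
a2, b2 = drop_duplicate_pairs_reference(a, b)
print(a2.tolist())  # [1, 3, 6, 7, 8, 3, 2, 9, 10, 14]
print(b2.tolist())  # [2, 4, 15, 7, 9, 2, 2, 0, 11, 4]
```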
Best answer
First you have to pack a and b together so that duplicate pairs can be identified. If the values are non-negative integers (see the edit for other cases), this can be achieved with:
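```
import numpy as np

base = a.max() + 1
c = a + base * b
```

Because every value of a lies in [0, base), each (a, b) pair maps to a distinct value of c (b acts as the high digit and a as the low digit of a number in base "base"), provided base * b.max() does not overflow the integer type. Then just find the unique values in c: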
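```
val, ind = np.unique(c, return_index=True)
```

ind holds the index of the first occurrence of each unique value of c, ordered by value rather than by position, so sort it before retrieving the associated values in a and b: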
```
ind.sort()
print(a[ind])
print(b[ind])
```

The duplicates (two here) have been removed:
```
[ 1  3  6  7  8  3  2  9 10 14]
[ 2  4 15  7  9  2  2  0 11  4]
```

EDIT
Regardless of the data type, the c array can be built as follows, by packing each (a, b) pair into a single byte string:
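```
ab = np.ascontiguousarray(np.vstack((a, b)).T)  # shape (n, 2): one (a, b) pair per row
dtype = 'S' + str(2 * ab.itemsize)              # byte width of one row
c = ab.view(dtype=dtype)                        # shape (n, 1): one byte-string key per pair
```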
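Putting the edit together, a self-contained sketch of the byte-packing approach, wrapped in a hypothetical helper drop_duplicate_pairs and run on the question's data converted to float. For comparison it also includes np.unique with axis=0 (available in NumPy 1.13 and later), which is a different technique from the answer's: it compares the rows of an (n, 2) array directly instead of raw bytes.

```python
import numpy as np

def drop_duplicate_pairs(a, b):
    """Drop repeated (a[i], b[i]) pairs, keeping first occurrences in order.

    Hypothetical helper: each row of the stacked (n, 2) array is viewed as
    one fixed-width byte string, so np.unique compares whole pairs.
    """
    ab = np.ascontiguousarray(np.vstack((a, b)).T)      # (n, 2), one pair per row
    key = ab.view('S' + str(2 * ab.itemsize)).ravel()  # one byte-string key per row
    _, ind = np.unique(key, return_index=True)         # first occurrence of each key
    ind.sort()                                         # back to original order
    return a[ind], b[ind]

def drop_duplicate_pairs_axis(a, b):
    """Alternative using np.unique(..., axis=0) on the stacked rows."""
    _, ind = np.unique(np.column_stack((a, b)), axis=0, return_index=True)
    ind.sort()
    return a[ind], b[ind]

a = np.array([1, 3, 6, 3, 7, 8, 3, 2, 9, 10, 14, 6], dtype=float)
b = np.array([2, 4, 15, 4, 7, 9, 2, 2, 0, 11, 4, 15], dtype=float)
a2, b2 = drop_duplicate_pairs(a, b)
print(a2.tolist())  # [1.0, 3.0, 6.0, 7.0, 8.0, 3.0, 2.0, 9.0, 10.0, 14.0]
print(b2.tolist())  # [2.0, 4.0, 15.0, 7.0, 9.0, 2.0, 2.0, 0.0, 11.0, 4.0]
```

Because the byte version compares raw bit patterns, 0.0 and -0.0 count as different values (their sign bits differ), which may matter for floating-point data.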