找到两对不相交的对，它们总和到同一个向量(Find two disjoint pairs of pairs that sum to the same vector)

找到两对不相交的对，它们总和到同一个向量(Find two disjoint pairs of pairs that sum to the same vector) python

这是查找两对总和为相同值的对的后续行动。

我有随机的2d数组，我使用

import numpy as np from itertools import combinations n = 50 A = np.random.randint(2, size=(m,n))

我想确定矩阵是否有两对不相交的列对，它们总和到相同的列向量。我正在寻找一种快速的方法来做到这一点。在前一个问题中（（0,1），（0,2））可以作为一对列索引接受，但在这种情况下，它不是两对中的0。

上一个问题中接受的答案是如此巧妙地优化，我不明白如何让这个看起来简单的改变。（我对这个问题中的列而不是行感兴趣，但我总是可以做A.transpose（）。）

下面是一些代码，用于显示测试所有4乘4阵列。

n = 4 nxn = np.arange(n*n).reshape(n, -1) count = 0 for i in xrange(2**(n*n)): A = (i >> nxn) %2 p = 1 for firstpair in combinations(range(n), 2): for secondpair in combinations(range(n), 2): if firstpair < secondpair and not set(firstpair) & set(secondpair): if (np.array_equal(A[firstpair[0]] + A[firstpair[1]], A[secondpair[0]] + A[secondpair[1]] )): if (p): count +=1 p = 0 print count

这应输出3136。

This is a follow-up to Find two pairs of pairs that sum to the same value .

I have random 2d arrays which I make using

import numpy as np from itertools import combinations n = 50 A = np.random.randint(2, size=(m,n))

I would like to determine if the matrix has two disjoint pairs of pairs of columns which sum to the same column vector. I am looking for a fast method to do this. In the previous problem ((0,1), (0,2)) was acceptable as a pair of pairs of column indices but in this case it is not as 0 is in both pairs.

The accepted answer from the previous question is so cleverly optimised I can't see how to make this simple looking change unfortunately. (I am interested in columns rather than rows in this question but I can always just do A.transpose().)

Here is some code to show it testing all 4 by 4 arrays.

n = 4 nxn = np.arange(n*n).reshape(n, -1) count = 0 for i in xrange(2**(n*n)): A = (i >> nxn) %2 p = 1 for firstpair in combinations(range(n), 2): for secondpair in combinations(range(n), 2): if firstpair < secondpair and not set(firstpair) & set(secondpair): if (np.array_equal(A[firstpair[0]] + A[firstpair[1]], A[secondpair[0]] + A[secondpair[1]] )): if (p): count +=1 p = 0 print count

This should output 3136.

最满意答案

这是我的解决方案，扩展到我认为你想要的。但是，现在还不完全清楚; 一个可以获得任意数量的行对，它们总和相同的总和; 它们中可能存在唯一的行子集，它们总和为相同的值。例如：

给定这组行对，它们总和相同

[[19 19 30 30] [11 16 11 16]]

这些行中存在唯一的子集，可能仍被视为有效; 但是应该吗？

[[19 30] [16 11]]

无论如何，我希望这些细节很容易处理，给出下面的代码。

import numpy as np n = 20 #also works for non-square A A = np.random.randint(2, size=(n*6,n)).astype(np.int8) ##A = np.array( [[0, 0, 0], [1, 1, 1], [1, 1 ,1]], np.uint8) ##A = np.zeros((6,6)) #force the inclusion of some hits, to keep our algorithm on its toes ##A[0] = A[1] def base_pack_lazy(a, base, dtype=np.uint64): """ pack the last axis of an array as minimal base representation lazily yields packed columns of the original matrix """ a = np.ascontiguousarray( np.rollaxis(a, -1)) packing = int(np.dtype(dtype).itemsize * 8 / (float(base) / 2)) for columns in np.array_split(a, (len(a)-1)//packing+1): R = np.zeros(a.shape[1:], dtype) for col in columns: R *= base R += col yield R def unique_count(a): """returns counts of unique elements""" unique, inverse = np.unique(a, return_inverse=True) count = np.zeros(len(unique), np.int) np.add.at(count, inverse, 1) #note; this scatter operation requires numpy 1.8; use a sparse matrix otherwise! return unique, count, inverse def voidview(arr): """view the last axis of an array as a void object. can be used as a faster form of lexsort""" return np.ascontiguousarray(arr).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1]))).reshape(arr.shape[:-1]) def has_identical_row_sums_lazy(A, combinations_index): """ compute the existence of combinations of rows summing to the same vector, given an nxm matrix A and an index matrix specifying all combinations naively, we need to compute the sum of each row combination at least once, giving n^3 computations however, this isnt strictly required; we can lazily consider the columns, giving an early exit opportunity all nicely vectorized of course """ multiplicity, combinations = combinations_index.shape #list of indices into combinations_index, denoting possibly interacting combinations active_combinations = np.arange(combinations, dtype=np.uint32) #keep all packed columns; we might need them later columns = [] for packed_column in base_pack_lazy(A, base=multiplicity+1): #loop over packed cols columns.append(packed_column) #compute rowsums only for a fixed number of columns at a time. #this is O(n^2) rather than O(n^3), and after considering the first column, #we can typically already exclude almost all combinations partial_rowsums = sum(packed_column[I[active_combinations]] for I in combinations_index) #find duplicates in this column unique, count, inverse = unique_count(partial_rowsums) #prune those combinations which we can exclude as having different sums, based on columns inspected thus far active_combinations = active_combinations[count[inverse] > 1] #early exit; no pairs if len(active_combinations)==0: return False """ we now have a small set of relevant combinations, but we have lost the details of their particulars to see which combinations of rows does sum to the same value, we do need to consider rows as a whole we can simply apply the same mechanism, but for all columns at the same time, but only for the selected subset of row combinations known to be relevant """ #construct full packed matrix B = np.ascontiguousarray(np.vstack(columns).T) #perform all relevant sums, over all columns rowsums = sum(B[I[active_combinations]] for I in combinations_index) #find the unique rowsums, by viewing rows as a void object unique, count, inverse = unique_count(voidview(rowsums)) #if not, we did something wrong in deciding on active combinations assert(np.all(count>1)) #loop over all sets of rows that sum to an identical unique value for i in xrange(len(unique)): #set of indexes into combinations_index; #note that there may be more than two combinations that sum to the same value; we grab them all here combinations_group = active_combinations[inverse==i] #associated row-combinations #array of shape=(mulitplicity,group_size) row_combinations = combinations_index[:,combinations_group] #if no duplicate rows involved, we have a match if len(np.unique(row_combinations[:,[0,-1]])) == multiplicity*2: print row_combinations return True #none of identical rowsums met uniqueness criteria return False def has_identical_triple_row_sums(A): n = len(A) idx = np.array( [(i,j,k) for i in xrange(n) for j in xrange(n) for k in xrange(n) if i<j and j<k], dtype=np.uint16) idx = np.ascontiguousarray( idx.T) return has_identical_row_sums_lazy(A, idx) def has_identical_double_row_sums(A): n = len(A) idx = np.array(np.tril_indices(n,-1), dtype=np.int32) return has_identical_row_sums_lazy(A, idx) from time import clock t = clock() for i in xrange(1): ## print has_identical_double_row_sums(A) print has_identical_triple_row_sums(A) print clock()-t

编辑：代码清理

Here is my solution, extended to do what I believe you want. It isn't entirely clear though; one may get an arbitrary number of row-pairs that sum to the same total; there may exist unique subsets of rows within them that sum to the same value. For instance:

Given this set of row-pairs that sum to the same total

[[19 19 30 30] [11 16 11 16]]

There exists a unique subset of these rows that may still be counted as valid; but should it?

[[19 30] [16 11]]

Anyway, I hope those details are easy to deal with, given the code below.

import numpy as np n = 20 #also works for non-square A A = np.random.randint(2, size=(n*6,n)).astype(np.int8) ##A = np.array( [[0, 0, 0], [1, 1, 1], [1, 1 ,1]], np.uint8) ##A = np.zeros((6,6)) #force the inclusion of some hits, to keep our algorithm on its toes ##A[0] = A[1] def base_pack_lazy(a, base, dtype=np.uint64): """ pack the last axis of an array as minimal base representation lazily yields packed columns of the original matrix """ a = np.ascontiguousarray( np.rollaxis(a, -1)) packing = int(np.dtype(dtype).itemsize * 8 / (float(base) / 2)) for columns in np.array_split(a, (len(a)-1)//packing+1): R = np.zeros(a.shape[1:], dtype) for col in columns: R *= base R += col yield R def unique_count(a): """returns counts of unique elements""" unique, inverse = np.unique(a, return_inverse=True) count = np.zeros(len(unique), np.int) np.add.at(count, inverse, 1) #note; this scatter operation requires numpy 1.8; use a sparse matrix otherwise! return unique, count, inverse def voidview(arr): """view the last axis of an array as a void object. can be used as a faster form of lexsort""" return np.ascontiguousarray(arr).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1]))).reshape(arr.shape[:-1]) def has_identical_row_sums_lazy(A, combinations_index): """ compute the existence of combinations of rows summing to the same vector, given an nxm matrix A and an index matrix specifying all combinations naively, we need to compute the sum of each row combination at least once, giving n^3 computations however, this isnt strictly required; we can lazily consider the columns, giving an early exit opportunity all nicely vectorized of course """ multiplicity, combinations = combinations_index.shape #list of indices into combinations_index, denoting possibly interacting combinations active_combinations = np.arange(combinations, dtype=np.uint32) #keep all packed columns; we might need them later columns = [] for packed_column in base_pack_lazy(A, base=multiplicity+1): #loop over packed cols columns.append(packed_column) #compute rowsums only for a fixed number of columns at a time. #this is O(n^2) rather than O(n^3), and after considering the first column, #we can typically already exclude almost all combinations partial_rowsums = sum(packed_column[I[active_combinations]] for I in combinations_index) #find duplicates in this column unique, count, inverse = unique_count(partial_rowsums) #prune those combinations which we can exclude as having different sums, based on columns inspected thus far active_combinations = active_combinations[count[inverse] > 1] #early exit; no pairs if len(active_combinations)==0: return False """ we now have a small set of relevant combinations, but we have lost the details of their particulars to see which combinations of rows does sum to the same value, we do need to consider rows as a whole we can simply apply the same mechanism, but for all columns at the same time, but only for the selected subset of row combinations known to be relevant """ #construct full packed matrix B = np.ascontiguousarray(np.vstack(columns).T) #perform all relevant sums, over all columns rowsums = sum(B[I[active_combinations]] for I in combinations_index) #find the unique rowsums, by viewing rows as a void object unique, count, inverse = unique_count(voidview(rowsums)) #if not, we did something wrong in deciding on active combinations assert(np.all(count>1)) #loop over all sets of rows that sum to an identical unique value for i in xrange(len(unique)): #set of indexes into combinations_index; #note that there may be more than two combinations that sum to the same value; we grab them all here combinations_group = active_combinations[inverse==i] #associated row-combinations #array of shape=(mulitplicity,group_size) row_combinations = combinations_index[:,combinations_group] #if no duplicate rows involved, we have a match if len(np.unique(row_combinations[:,[0,-1]])) == multiplicity*2: print row_combinations return True #none of identical rowsums met uniqueness criteria return False def has_identical_triple_row_sums(A): n = len(A) idx = np.array( [(i,j,k) for i in xrange(n) for j in xrange(n) for k in xrange(n) if i<j and j<k], dtype=np.uint16) idx = np.ascontiguousarray( idx.T) return has_identical_row_sums_lazy(A, idx) def has_identical_double_row_sums(A): n = len(A) idx = np.array(np.tril_indices(n,-1), dtype=np.int32) return has_identical_row_sums_lazy(A, idx) from time import clock t = clock() for i in xrange(1): ## print has_identical_double_row_sums(A) print has_identical_triple_row_sums(A) print clock()-t

Edit: code cleanup

更多推荐