Eliminating redundant numpy rows

Updated: 2024-10-10 21:31:56

If I have an array

arr = np.array([[0, 1], [1, 2], [2, 3], [4, 3], [5, 6], [3, 4], [2, 1], [6, 7]])

how can I eliminate redundant rows, i.e. rows that repeat an earlier row with the column values swapped? In the example above, [3,4] and [2,1] are redundant, so the code should reduce the array to

arr = np.array([[0, 1], [1, 2], [2, 3], [4, 3], [5, 6], [6, 7]])

I have thought about using a combination of slicing (arr[:, ::-1]), np.all, and np.any, but what I have come up with so far only gives me a True/False flag per row. Both rows of a swapped pair get flagged, so it doesn't tell me which of the similar rows to drop.

j = np.any([np.all(y == arr, axis=1) for y in arr[:, ::-1]], axis=0)

which yields [False, True, False, True, False, True, True, False].

Thanks in advance.

Best answer

Basically you want to find unique rows (the "Find Unique Rows" question), and these answers borrow heavily from the top two answers there. The difference is that you need to sort each row first, so that different orders of the same values compare equal.
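As a minimal sketch of that idea (not part of the original answer), sorting each row and then comparing every pair of rows with broadcasting, much like the asker's np.all/np.any attempt, keeps the first occurrence of each pair in the original order. It builds an n-by-n boolean matrix, so it is assumed to be suitable only for a modest number of rows.

```python
import numpy as np

arr = np.array([[0, 1], [1, 2], [2, 3], [4, 3], [5, 6], [3, 4], [2, 1], [6, 7]])

# Sort each row so that [4, 3] and [3, 4] both become [3, 4].
s = np.sort(arr, axis=1)

# same[i, j] is True when rows i and j hold the same values (in any order).
same = (s[:, None, :] == s[None, :, :]).all(axis=-1)

# Keep row i only if no earlier row j < i matches it.
# np.tril(..., k=-1) keeps only the entries strictly below the diagonal.
keep = ~np.tril(same, k=-1).any(axis=1)

unique_arr = arr[keep]
print(unique_arr.tolist())
# [[0, 1], [1, 2], [2, 3], [4, 3], [5, 6], [6, 7]]
```

Unlike the mask in the question, this one only flags the later member of each swapped pair, which is exactly the row to drop.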

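If you don't care about the order of the rows in the result, this is the short way (but slower than the next approach). Note that it returns the sorted rows, so [4,3] comes back as [3,4]. The set is wrapped in list() because recent NumPy versions require np.vstack to receive a sequence such as a list or tuple:

np.vstack(list({tuple(row) for row in np.sort(arr, -1)}))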
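A variation on the set idea, not from the original answer, keeps both the original row order and the original (unsorted) rows by recording the index of the first occurrence of each sorted row in a dict. This sketch assumes Python 3.7+, where dicts preserve insertion order; first_index is just an illustrative name.

```python
import numpy as np

arr = np.array([[0, 1], [1, 2], [2, 3], [4, 3], [5, 6], [3, 4], [2, 1], [6, 7]])

# Map each sorted row (as a hashable tuple) to the index where it first appears.
# setdefault only stores a value the first time a key is seen, and dicts keep
# insertion order, so the stored indices come out in ascending order.
first_index = {}
for i, row in enumerate(np.sort(arr, axis=1)):
    first_index.setdefault(tuple(row), i)

unique_arr = arr[list(first_index.values())]
print(unique_arr.tolist())
# [[0, 1], [1, 2], [2, 3], [4, 3], [5, 6], [6, 7]]
```

This is a Python-level loop, so it will likely be slower than the vectorized approaches on large arrays.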

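If you do want to maintain order, you can view each sorted row as a single np.void scalar and use np.unique with return_index. The returned indices follow the order of the unique values, not of the original rows, so sort them before indexing: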

b = np.ascontiguousarray(np.sort(arr, -1)).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[1])))
_, idx = np.unique(b, return_index=True)
unique_arr = arr[np.sort(idx)]  # sort idx so the kept rows stay in their original order
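On NumPy 1.13 or later, np.unique also accepts an axis argument that compares whole rows, which may make the np.void view unnecessary. A sketch under that assumption:

```python
import numpy as np

arr = np.array([[0, 1], [1, 2], [2, 3], [4, 3], [5, 6], [3, 4], [2, 1], [6, 7]])

# axis=0 makes np.unique treat each (sorted) row as one element.
_, idx = np.unique(np.sort(arr, axis=1), axis=0, return_index=True)

# idx follows the order of the unique sorted rows; sorting it restores
# the original order of the first occurrences.
unique_arr = arr[np.sort(idx)]
print(unique_arr.tolist())
# [[0, 1], [1, 2], [2, 3], [4, 3], [5, 6], [6, 7]]
```

As with the void version, the kept rows are the original ones, so [4,3] stays [4,3].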

It might be tempting to apply set to each row instead of using np.sort(arr, -1) and np.void, but this only works if there are no repeated values within a row. If there are, a row of [1,2,2] will be considered equivalent to a row of [1,1,2], since both become the set {1, 2}.
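A quick check of that pitfall, assuming plain Python lists as the two rows:

```python
import numpy as np

a = [1, 2, 2]
b = [1, 1, 2]

# A set discards duplicates, so both rows collapse to {1, 2}.
print(set(a) == set(b))  # True

# Sorting keeps the repeated values, so the rows stay distinct.
print(np.array_equal(np.sort(a), np.sort(b)))  # False
```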

Published: 2023-07-09 10:35:00

Link to this article: https://www.elefans.com/category/jswz/34/1085672.html