Let's say I have a simple 1D NumPy array:
x = np.random.rand(1000)

Then retrieve the sorted indices:

idx = np.argsort(x)
However, I need to move a list of indices to the front of idx. So, let's say indices = [10, 20, 30, 40, 50] must always be the first 5, and the rest of idx follows (minus the values found in indices).
A naive way to accomplish this would be:
indices = np.array([10, 20, 30, 40, 50])
out = np.empty(idx.shape[0], dtype=int)
out[:indices.shape[0]] = indices
n = indices.shape[0]
for i in range(idx.shape[0]):
    if idx[i] not in indices:
        out[n] = idx[i]
        n += 1
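(Even this loop can be sped up a bit by testing membership against a Python set rather than scanning the indices array on every iteration — a sketch, same logic otherwise:)

```python
import numpy as np

x = np.random.rand(1000)
idx = np.argsort(x)
indices = np.array([10, 20, 30, 40, 50])

out = np.empty(idx.shape[0], dtype=int)
out[:indices.shape[0]] = indices
n = indices.shape[0]
forced = set(indices.tolist())  # O(1) membership test per element
for i in range(idx.shape[0]):
    if idx[i] not in forced:
        out[n] = idx[i]
        n += 1
```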
Is there a way to do this efficiently and, possibly, in-place?
Answer

Approach #1
One approach would be with np.isin masking -
mask = np.isin(idx, indices, invert=True)
out = np.r_[indices, idx[mask]]

Approach #2 : Skip the starting argsort
Another approach makes the values at those given indices the minimum, thus forcing them to the front when argsorting. We don't need idx for this method, as we are argsort-ing within the solution anyway -
def argsort_constrained(x, indices):
    xc = x.copy()
    xc[indices] = x.min() - np.arange(len(indices), 0, -1)
    return xc.argsort()

Benchmarking : A closer look
Let's study how skipping the computation of the starting argsort idx helps us with the second approach.
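The timings below reference in1d_masking and isin_masking, which aren't defined in the post; presumably they are small wrappers around Approach #1 using np.in1d and np.isin respectively. A sketch of what they might look like (the names and signatures here are assumptions):

```python
import numpy as np

def in1d_masking(x, idx, indices):
    # Assumed wrapper: drop the forced indices from idx, then stack them at the front
    mask = np.in1d(idx, indices, invert=True)
    return np.r_[indices, idx[mask]]

def isin_masking(x, idx, indices):
    # Same idea with the newer np.isin (assumed wrapper)
    mask = np.isin(idx, indices, invert=True)
    return np.r_[indices, idx[mask]]

x = np.random.rand(1000)
idx = np.argsort(x)
indices = np.array([10, 20, 30, 40, 50])
out = isin_masking(x, idx, indices)
```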
We will start off with the given sample:
In [206]: x = np.random.rand(1000)

In [207]: indices = np.array([10, 20, 30, 40, 50])

In [208]: %timeit argsort_constrained(x, indices)
38.6 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [209]: idx = np.argsort(x)

In [211]: %timeit np.argsort(x)
27.7 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [212]: %timeit in1d_masking(x, idx, indices)
     ...: %timeit isin_masking(x, idx, indices)
44.4 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
50.7 µs ± 303 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note that if you use np.concatenate in place of np.r_ with these small datasets, you could do better.
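For instance, the Approach #1 one-liner could be rewritten with np.concatenate, which takes an explicit tuple of arrays and skips np.r_'s index-expression parsing (a sketch of the swap):

```python
import numpy as np

x = np.random.rand(1000)
idx = np.argsort(x)
indices = np.array([10, 20, 30, 40, 50])

mask = np.isin(idx, indices, invert=True)
# np.concatenate avoids np.r_'s string/slice parsing overhead, which
# is noticeable at these small array sizes
out = np.concatenate((indices, idx[mask]))
```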
So, argsort_constrained has a total runtime cost of around 38.6 µs, whereas the two masking-based ones add their own timings on top of the roughly 27.7 µs needed to compute idx in the first place.
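As a quick sanity check that the two approaches are interchangeable (assuming distinct values in x, so ties cannot affect the ordering):

```python
import numpy as np

def argsort_constrained(x, indices):
    xc = x.copy()
    # Assign values strictly below x.min(), decreasing along `indices`,
    # so argsort places them first and in the given order
    xc[indices] = x.min() - np.arange(len(indices), 0, -1)
    return xc.argsort()

x = np.random.rand(1000)
indices = np.array([10, 20, 30, 40, 50])

# Approach #1: mask the forced indices out of a plain argsort
idx = np.argsort(x)
mask = np.isin(idx, indices, invert=True)
via_masking = np.r_[indices, idx[mask]]
```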
Let's scale up everything by 10x and do the same experiments:
In [213]: x = np.random.rand(10000)

In [214]: indices = np.sort(np.random.choice(len(x), 50, replace=False))

In [215]: %timeit argsort_constrained(x, indices)
740 µs ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [216]: idx = np.argsort(x)

In [217]: %timeit np.argsort(x)
731 µs ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [218]: %timeit in1d_masking(x, idx, indices)
     ...: %timeit isin_masking(x, idx, indices)
1.07 ms ± 47.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.02 ms ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Again, the runtime costs with the masking-based ones are higher than with argsort_constrained, and this trend should continue as we scale up further.