Efficiently rearranging a NumPy array

Updated: 2024-10-07 06:50:15

Problem description

Let's say I have a simple 1D NumPy array:

import numpy as np
x = np.random.rand(1000)

I then retrieve the sorted indices:

idx = np.argsort(x)

However, I need to move a list of indices to the front of idx. So, let's say indices = [10, 20, 30, 40, 50] always need to be the first 5 entries, and the rest then follow from idx (minus the indices found in indices).

A naive way to accomplish this would be:

indices = np.array([10, 20, 30, 40, 50])
out = np.empty(idx.shape[0], dtype=np.int64)
out[:indices.shape[0]] = indices  # the requested indices go first
n = indices.shape[0]
for i in range(idx.shape[0]):
    # append the remaining sorted indices, skipping those already placed
    if idx[i] not in indices:
        out[n] = idx[i]
        n += 1

Is there a way to do this efficiently and, possibly, in-place?

Recommended answer

Approach #1

One way is to mask with np.isin:

mask = np.isin(idx, indices, invert=True)
out = np.r_[indices, idx[mask]]
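The masking step above can be checked against the naive loop from the question. The following is a minimal sketch; the wrapper name `move_to_front_masked` and the seeded random generator are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np

def move_to_front_masked(idx, indices):
    """Put `indices` first, then the rest of `idx` in its existing order."""
    # True for every entry of idx that is NOT one of the requested indices
    mask = np.isin(idx, indices, invert=True)
    return np.r_[indices, idx[mask]]

rng = np.random.default_rng(0)          # seeded only for reproducibility
x = rng.random(1000)
idx = np.argsort(x)
indices = np.array([10, 20, 30, 40, 50])

out = move_to_front_masked(idx, indices)

# Reference result, built the slow way as in the question's loop
wanted = set(indices.tolist())
expected = np.array(indices.tolist() + [i for i in idx.tolist() if i not in wanted])

print(np.array_equal(out, expected))
```

Note that this builds a new array rather than rearranging idx in place.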

Approach #2: Skipping the initial argsort

Another approach is to make the values at the given indices the smallest in the array, thus forcing them to the front when argsorting. We don't need idx for this method, since we are argsorting inside the solution anyway:

def argsort_constrained(x, indices):
    xc = x.copy()
    # overwrite the chosen positions with values below the minimum, in increasing order
    xc[indices] = x.min() - np.arange(len(indices), 0, -1)
    return xc.argsort()
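A short usage sketch of `argsort_constrained` follows, checking the two properties the question asks for: the given indices come first in their original order, and the remaining positions follow in ascending order of x. It assumes a floating-point x of moderate magnitude and unique entries in indices; the seeded generator and the names `head` and `tail` are illustrative only.

```python
import numpy as np

def argsort_constrained(x, indices):
    xc = x.copy()
    # values strictly below x.min(), increasing, so indices[0] sorts first
    xc[indices] = x.min() - np.arange(len(indices), 0, -1)
    return xc.argsort()

rng = np.random.default_rng(0)          # seeded only for reproducibility
x = rng.random(1000)
indices = np.array([10, 20, 30, 40, 50])

out = argsort_constrained(x, indices)
head, tail = out[:len(indices)], out[len(indices):]

print(np.array_equal(head, indices))        # requested indices lead
print(bool(np.all(np.diff(x[tail]) >= 0)))  # the rest is sorted by x
```

Because the function works on a copy, x itself is left unchanged.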

Benchmarking: a closer look

Let's study how skipping the computation of the initial argsort idx helps with the second approach.

We will start off with the given sample:

In [206]: x = np.random.rand(1000)

In [207]: indices = np.array([10, 20, 30, 40, 50])

In [208]: %timeit argsort_constrained(x, indices)
38.6 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [209]: idx = np.argsort(x)

In [211]: %timeit np.argsort(x)
27.7 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [212]: %timeit in1d_masking(x, idx, indices)
     ...: %timeit isin_masking(x, idx, indices)
44.4 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
50.7 µs ± 303 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Note that with these small datasets, you could do better by using np.concatenate in place of np.r_.
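The np.concatenate variant mentioned here might look like the sketch below. The function name `isin_masking_concat` is hypothetical; the `in1d_masking` and `isin_masking` helpers timed in the benchmark are not defined in this text, so this is not a reconstruction of them.

```python
import numpy as np

def isin_masking_concat(idx, indices):
    """Same masking as Approach #1, joined with np.concatenate instead of np.r_."""
    mask = np.isin(idx, indices, invert=True)
    return np.concatenate((indices, idx[mask]))

rng = np.random.default_rng(0)          # seeded only for reproducibility
x = rng.random(1000)
idx = np.argsort(x)
indices = np.array([10, 20, 30, 40, 50])

out_concat = isin_masking_concat(idx, indices)
out_r = np.r_[indices, idx[np.isin(idx, indices, invert=True)]]

print(np.array_equal(out_concat, out_r))  # both forms give the same array
```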

So, argsort_constrained has a total runtime cost of around 38.6 µs, whereas the two masking approaches must add the roughly 27.7 µs of the initial np.argsort on top of their individual timings, for totals of about 72.1 µs (44.4 + 27.7) and 78.4 µs (50.7 + 27.7).

Let's scale everything up by 10x and repeat the same experiments:

In [213]: x = np.random.rand(10000)

In [214]: indices = np.sort(np.random.choice(len(x), 50, replace=False))

In [215]: %timeit argsort_constrained(x, indices)
740 µs ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [216]: idx = np.argsort(x)

In [217]: %timeit np.argsort(x)
731 µs ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [218]: %timeit in1d_masking(x, idx, indices)
     ...: %timeit isin_masking(x, idx, indices)
1.07 ms ± 47.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.02 ms ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Again, the individual runtime costs of the masking approaches (1.07 ms and 1.02 ms, before adding the 731 µs for np.argsort) are higher than the 740 µs of argsort_constrained. This trend should continue as we scale up further.
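To repeat a comparison of this kind outside IPython, a plain `timeit` harness along the following lines could be used. This is a sketch only: `argsort_then_mask` is a hypothetical stand-in that includes the initial argsort in the timed work, and absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

def argsort_constrained(x, indices):
    xc = x.copy()
    xc[indices] = x.min() - np.arange(len(indices), 0, -1)
    return xc.argsort()

def argsort_then_mask(x, indices):
    # hypothetical helper: initial argsort plus np.isin masking, timed together
    idx = np.argsort(x)
    mask = np.isin(idx, indices, invert=True)
    return np.concatenate((indices, idx[mask]))

rng = np.random.default_rng(0)          # seeded only for reproducibility
x = rng.random(10000)
indices = np.sort(rng.choice(len(x), 50, replace=False))

# Both routes should produce the same ordering before timing them
same = np.array_equal(argsort_constrained(x, indices), argsort_then_mask(x, indices))
print("same result:", same)

timings = {}
for fn in (argsort_constrained, argsort_then_mask):
    # best of 5 repeats, 100 calls each, reported per call
    best = min(timeit.repeat(lambda: fn(x, indices), number=100, repeat=5)) / 100
    timings[fn.__name__] = best
    print(f"{fn.__name__}: {best * 1e6:.1f} us per call")
```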

Published: 2023-11-29 15:09:11
Article link: https://www.elefans.com/category/jswz/34/1646752.html