东西比的std :: nth

编程入门行业动态更新时间:2024-10-13 18:20:03

本文介绍了东西比的std :: nth_element更快的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！问题描述

我工作的一个kd树实现和我目前使用的的std :: nth_element 作为分区由他们中间元素的载体。然而性病:: nth_element需要90％的树结构的时间。任何人都可以提出一个更有效率的选择？

I'm working on a kd-tree implementation and I'm currently using std::nth_element for partition a vector of elements by their median. However std::nth_element takes 90% of the time of tree construction. Can anyone suggest a more efficient alternative?

在此先感谢

推荐答案

你真的需要第n个元素，或者你需要一个元素接近中间？

Do you really need the nth element, or do you need an element "near" the middle?

有更快的方法来获取元素接近中间。一个例子去大致如下：

There are faster ways to get an element "near" the middle. One example goes roughly like:

function rough_middle(container) divide container into subsequences of length 5 find median of each subsequence of length 5 ~ O(k) * O(n/5) return rough_middle( { median of each subsequence} ) ~ O(rough_middle(n/5))

结果应该是东西，大约是在中间。一个真正的第n个元素算法可能使用类似上面，然后事后清理发现实际的第n个元素。

The result should be something that is roughly in the middle. A real nth element algorithm might use something like the above, and then clean it up afterwards to find the actual nth element.

在 N = 5 ，你得到的中间。

在 N = 25 ，你得到的短序列中段的中间。这会比所有的每个短序列中的较小者要大，或至少是第九元件和不大于16号的元素，或36％的距离边缘。

At n=25, you get the middle of the short sequence middles. This is going to be greater than all of the lesser of each short sequence, or at least the 9th element and no more than the 16th element, or 36% away from edge.

在 N = 125 ，你得到每个短序列中间的粗中间。这至少是在9日中间，所以有8 * 3 + 2 = 26不到你粗糙的中间分子，或20.8％，距边缘。

At n=125, you get the rough middle of each short sequence middle. This is at least the 9th middle, so there are 8*3+2=26 elements less than your rough middle, or 20.8% away from edge.

在 N = 625 ，你得到每个短序列中间的粗中间。这是至少26日中间，所以有比你粗糙的中间77元以下，12％，远离边缘。

At n=625, you get the rough middle of each short sequence middle. This is at least the 26th middle, so there are 77 elements less than your rough middle, or 12% away from the edge.

在 N = 5 ^氏/ code>，你得到的粗中间5 ^（K-1）粗糙的中段。如果 5 ^ k的粗中间顺序 R（K），然后研究（K + 1）= R（K）* 3-1〜3 ^氏/ code>。

At n=5^k, you get the rough middle of the 5^(k-1) rough middles. If the rough middle of a 5^k sequence is r(k), then r(k+1) = r(k)*3-1 ~ 3^k.

3 ^氏/ code>的增长速度低于5 ^ K的O形符号。

3^k grows slower than 5^k in O-notation.

3^log_5(n) = e^( ln(3) ln(n)/ln(5) ) = n^(ln(3)/ln(5)) =~ n^0.68

时的下界非常粗略的估计，其中 rough_middle 的 N 元素序列的结束

在理论上，它可能需要多达约ñ削减^ 0.33 迭代达成一个单一的元素，这是不是真的那么好。（n个位数^ 0.68是n位的〜0.68倍。如果我们剃光那么多关每一个粗糙的中间，我们需要重复它非常粗略 N R个0.33 次n位的数量，以消耗所有位 - 多了，因为当我们从 N ，接下来的 N 得到从中减去一个较小的值）。

In theory, it may take as many as approx n^0.33 iterations of reductions to reach a single element, which isn't really that good. (the number of bits in n^0.68 is ~0.68 times the number of bits in n. If we shave that much off each rough middle, we need to repeat it very roughly n^0.33 times number of bits in n to consume all the bits -- more, because as we subtract from the n, the next n gets a slightly smaller value subtracted from it).

这是我见过的第n个元素的解决方案解决这个问题的方法是做一个分区并修复在每个级别：代替递归到 rough_middle ，你递归到中。中位数的真正的中央，然后保证是pretty的接近你的序列的实际中间，你可以找到真正的中间相对较快（O形符号）来源于此。

The way that the nth element solutions I've seen solve this is by doing a partition and repair at each level: instead of recursing into rough_middle, you recurse into middle. The real middle of the medians is then guaranteed to be pretty close to the actual middle of your sequence, and you can "find the real middle" relatively quickly (in O-notation) from this.

也许我们可以当有多个元素做更精确的 rough_middle 迭代优化这个过程，但从来没有迫使它是实际的中间？更大的结束 N 是，越接近我们所需要的递归调用是中间为最终的结果是相当接近中间的中间。

Possibly we can optimize this process by doing a more accurate rough_middle iterations when there are more elements, but never forcing it to be the actual middle? The bigger the end n is, the closer to the middle we need the recursive calls to be to the middle for the end result to be reasonably close to the middle.

但在实践中，你的序列是一个非常糟糕的一个实际占据N ^ 0.33步骤划分到什么可能是非常低的概率。有点像快速排序问题：3个元素中值通常是不够好

But in practice, the probability that your sequence is a really bad one that actually takes n^0.33 steps to partition down to nothing might be really low. Sort of like the quicksort problem: median of 3 elements is usually good enough.

一个快速统计分析。

您挑选随机5个元素，并选择中间的一个。

You pick 5 elements at random, and pick the middle one.

一组的平均指数 2M + 1 均匀分布的随机样本如下，大约有（M + 1，M + 1）参数的β分布，有可能一些调节因子的非 - [0,1] 的间隔

The median index of a set of 2m+1 random sample of a uniform distribution follows the beta distribution with parameters of roughly (m+1, m+1), with maybe some scaling factors for non-[0,1] intervals.

中值的平均值是显然的1/2。方差为：

The mean of the median is clearly 1/2. The variance is:

(3*3)^2 / ( (3+3)^2 (3+3+1) ) = 81 / (36 * 7) =~ 0.32

搞清楚，下一步是超出了我的统计数据。我会作弊。

Figuring out the next step is beyond my stats. I'll cheat.

如果我们想象，从一堆均值为0.5，方差为0.32物品取中位数指数元素是一样好，他们的平均指数...

If we imagine that taking the median index element from a bunch of items with mean 0.5 and variance 0.32 is as good as averaging their index...

让 N 现在是我们最初的集合元素的数量。

Let n now be the number of elements in our original set.

那么短序列的中位数的指数之和的平均值为n倍N / 5 * 0.5 = 0.1 * N ^ 2 。短序列的中位数的指数之和的差额为n倍N / 5 * 0.32 = 0.064 * N ^ 2 。

Then the sum of the indexes of medians of the short sequences has an average of n times n/5*0.5 = 0.1 * n^2. The variance of the sum of the indexes of the medians of the short sequences is n times n/5*0.32 = 0.064 * n^2.

如果我们再通过N / 5，我们得到划分值：

If we then divide the value by n/5 we get:

因此次的平均值为1.6 / 2和方差。

So mean of n/2 and variance of 1.6.

哦，如果这是真的，这将是真棒。差异不与规模增长 N 表示，由于 N 得到的中位数大，平均指数短序列被紧紧得离谱分布。我想这有一定道理。可悲的是，我们不是很这么做 - 我们想要的伪中间的短序列的中位数的分布。这几乎肯定是糟糕的。

Oh, if that was true, that would be awesome. Variance that doesn't grow with the size of n means that as n gets large, the average index of the medians of the short sequences gets ridiculously tightly distributed. I guess it makes some sense. Sadly, we aren't quite doing that -- we want the distribution of the pseudo-median of the medians of the short sequences. Which is almost certainly worse.

实现细节。我们可以用的内存数数的开销做一个就地粗加工位数。（我们甚至可以做到这一点没有内存开销！）

Implementation detail. We can with logarithmic number of memory overhead do an in-place rough median. (we might even be able to do it without the memory overhead!)

我们维持5个指标向量与这里没有什么占位符。

We maintain a vector of 5 indexes with a "nothing here" placeholder.

每个是一个连续的层。

在每个元素，我们提前底部指标。如果是完整的，我们抢位，然后将它放在一个新的水平了，并清除底层。

At each element, we advance the bottom index. If it is full, we grab the median, and insert it on the next level up, and clear the bottom layer.

最后，我们完成。

using target = std::pair<size_t,std::array<size_t, 5>>; bool push( target& t, size_t i ) { t.second[t.first]=i; ++t.first; if (t.first==5) return true; } template<class Container> size_t extract_median( Container const& c, target& t ) { Assert(t.first != 0); std::sort( t.data(), t.data()+t.first, [&c](size_t lhs, size_t rhs){ return c[lhs]<c[rhs]; } ); size_t r = t[(t.first+1)/2]; t.first = 0; return r; } template<class Container> void advance(Container const& c, std::vector<target>& targets, size_t i) { size_t height = 0; while(true) { if (targets.size() <= height) targets.push_back({}); if (!push(targets[height], i)) return; i = extract_median(c, targets[height]); } } template<class Container> size_t collapse(Container const& c, target* b, target* e) { if (b==e) return -1; size_t before = collapse(c, b, e-1); target& last = (*e-1); if (before!=-1) push(before, last); if (last.first == 0) return -1; return extract_median(c, last); } template<class Container> size_t rough_median_index( Container const& c ) { std::vector<target> targets; for (auto const& x:c) { advance(c, targets, &x-c.data()); } return collapse(c, targets.data(), targets.data()+targets.size()); }

勾画出它如何工作，随机访问的容器。

which sketches out how it could work on random access containers.

更多推荐

东西比的std :: nth

本文发布于:2023-11-29 22:23:08，感谢您对本站的认可！

本文链接:https://www.elefans.com/category/jswz/34/1647802.html