Performance of vector sort/unique/erase vs. copy to unordered_set

I have a function that gets all neighbours of a list of points in a grid out to a certain distance, which involves a lot of duplicates (my neighbour's neighbour == me again).

I've been experimenting with a couple of different solutions, but I have no idea which is the more efficient. Below is some code demonstrating two solutions running side by side, one using std::vector sort-unique-erase, the other using std::copy into a std::unordered_set.

I also tried another solution, which is to pass the vector containing the neighbours so far to the neighbour function, which will use std::find to ensure a neighbour doesn't already exist before adding it.

So three solutions, but I can't quite wrap my head around which is gonna be faster. Any ideas anyone?

Code snippet follows:

// Vector of all neighbours of all modified phi points, which may initially include duplicates.
std::vector<VecDi> aneighs;

// Hash function, mapping points to their norm distance.
auto hasher = [&] (const VecDi& a) {
    return std::hash<UINT>()(a.squaredNorm() >> 2);
};

// Unordered set for storing neighbours without duplication.
// decltype(hasher) is used as the hash type because a capturing lambda
// cannot be stored as a plain function pointer.
std::unordered_set<VecDi, decltype(hasher)> sneighs(phi.dims().squaredNorm() >> 2, hasher);

... compute big long list of points including many duplicates ...

// Insert neighbours into unordered_set to remove duplicates.
std::copy(aneighs.begin(), aneighs.end(), std::inserter(sneighs, sneighs.end()));

// De-dupe neighbours list.
// TODO: is this method faster or slower than unordered_set?
std::sort(aneighs.begin(), aneighs.end(),
    [&] (const VecDi& a, const VecDi& b) {
        const UINT aidx = Grid<VecDi, D>::index(a, phi.dims(), phi.offset());
        const UINT bidx = Grid<VecDi, D>::index(b, phi.dims(), phi.offset());
        return aidx < bidx;
    });
aneighs.erase(std::unique(aneighs.begin(), aneighs.end()), aneighs.end());
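One straightforward way to see which is faster on the real data is simply to time the two paths. Below is a minimal sketch (not part of the original question); time_ms is a hypothetical helper, and the lambda bodies are placeholders for the two code paths above.

// Hypothetical timing helper for comparing the two de-duplication paths.
#include <chrono>
#include <iostream>

template <typename F>
double time_ms(F&& f)
{
    const auto t0 = std::chrono::steady_clock::now();
    f();                                            // run the code path under test
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Usage sketch:
// std::cout << "copy to unordered_set: " << time_ms([&] { /* std::copy into sneighs */ }) << " ms\n";
// std::cout << "sort/unique/erase:     " << time_ms([&] { /* sort + unique + erase */ }) << " ms\n";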

Accepted answer

A great deal here is likely to depend on the size of the output set (which, in turn, will depend on how far out you sample neighbors).

If it's small (no more than a few dozen items or so), your hand-rolled set implementation using std::vector and std::find will probably remain fairly competitive. Its problem is that it's an O(N²) algorithm -- each time you insert an item, you have to search all the existing items, so each insertion is linear in the number of items already in the set. Therefore, as the set grows larger, its time to insert items grows roughly quadratically.
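For reference, a minimal sketch of that hand-rolled approach, with Point standing in for the question's VecDi (assumed here to be equality-comparable):

// Hand-rolled "search before insert" de-duplication: O(N) per insertion,
// O(N^2) overall, so it only stays competitive while the set is small.
#include <algorithm>
#include <vector>

template <typename Point>
void insert_unique(std::vector<Point>& neighs, const Point& p)
{
    if (std::find(neighs.begin(), neighs.end(), p) == neighs.end())
        neighs.push_back(p);
}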

Using std::set, each insertion only has to do approximately log₂(N) comparisons instead of N comparisons. That reduces the overall complexity from O(N²) to O(N log N). The major shortcoming is that it's (at least normally) implemented as a tree built up of individually allocated nodes. That typically reduces its locality of reference -- i.e., each item you insert will consist of the data itself plus some pointers, and traversing the tree means following pointers around. Since they're allocated individually, chances are pretty good that nodes that are (currently) adjacent in the tree won't be adjacent in memory, so you'll see a fair number of cache misses. Bottom line: while its speed grows fairly slowly as the number of items increases, the constants involved are fairly large -- for a small number of items, it'll start out fairly slow (typically quite a bit slower than your hand-rolled version).
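A minimal sketch of the std::set route, with less_by_index standing in for an ordering such as the grid-index comparison used in the question's sort:

// std::set rejects duplicates at insertion time, at the cost of one node
// allocation per element and pointer-chasing on lookup. Each insertion does
// roughly log2(M) comparisons, where M is the number of unique points so far,
// for O(N log M) in total.
#include <set>
#include <vector>

template <typename Point, typename Compare>
std::set<Point, Compare> dedupe_with_set(const std::vector<Point>& aneighs,
                                         Compare less_by_index)
{
    return std::set<Point, Compare>(aneighs.begin(), aneighs.end(), less_by_index);
}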

Using a vector/sort/unique combines some of the advantages of each of the preceding. Storing the items in a vector (without extra pointers for each) typically leads to better cache usage -- items at adjacent indexes are also at adjacent memory locations, so when you insert a new item, chances are that the location for the new item will already be in the cache. The major disadvantage is that if you're dealing with a really large set, this could use quite a bit more memory. Whereas a set eliminates duplicates as you insert each item (i.e., an item will only be inserted if it's different from anything already in the set), this will insert all the items and then delete all the duplicates at the end. Given current memory availability and the number of neighbors I'd guess you're probably visiting, I doubt this is a major disadvantage in practice, but under the wrong circumstances it could lead to a serious problem -- nearly any use of virtual memory would almost certainly make it a net loss.

Looking at the last from a complexity viewpoint, it's going to be O(N log N), sort of like the set. The difference is that with the set it's really more like O(N log M), where N is the total number of neighbors and M is the number of unique neighbors. With the vector, it's really O(N log N), where N is (again) the total number of neighbors. As such, if the number of duplicates is extremely large, a set could have a significant algorithmic advantage.
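To put rough, purely illustrative numbers on that: with N = 1,000,000 neighbour entries of which only M = 10,000 are unique, the set does on the order of N·log₂(M) ≈ 13 million comparisons, while sort/unique does on the order of N·log₂(N) ≈ 20 million, so heavy duplication shifts the balance toward the set.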

It's also possible to implement a set-like structure in a purely linear sequence. This retains the set's advantage of only storing unique items, but also the vector's locality-of-reference advantage. The idea is to keep most of the current set sorted, so you can search it in log(N) complexity. When you insert a new item, however, you just put it in a separate vector (or an unsorted portion of the existing vector). When you do a new insertion, you also do a linear search over those unsorted items.

When that unsorted part gets too large (for some definition of "too large"), you sort those items and merge them into the main group, then start the same sequence again. If you define "too large" in terms of log(N) (where N is the number of items in the sorted group), you can retain O(N log N) complexity for the data structure as a whole. When I've played with it, though, I've found that the unsorted portion can be larger than I'd have expected before it starts to cause a problem.
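A minimal sketch of that hybrid layout (the name flat_set and the slack added to the log-based threshold are illustrative choices, not something from the answer above):

// A set-like structure kept in one contiguous vector: a sorted prefix searched
// by binary search plus a small unsorted tail scanned linearly, merged back in
// once the tail grows past a log-sized threshold.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

template <typename Point>
class flat_set
{
public:
    // Returns true if p was actually added (i.e. it was not already present).
    bool insert(const Point& p)
    {
        const auto sorted_end = data_.begin() + sorted_size_;
        if (std::binary_search(data_.begin(), sorted_end, p))
            return false;                      // already in the sorted prefix
        if (std::find(sorted_end, data_.end(), p) != data_.end())
            return false;                      // already in the unsorted tail
        data_.push_back(p);
        if (data_.size() - sorted_size_ > threshold())
        {
            // Sort the tail and merge it into the sorted prefix.
            std::sort(data_.begin() + sorted_size_, data_.end());
            std::inplace_merge(data_.begin(), data_.begin() + sorted_size_, data_.end());
            sorted_size_ = data_.size();
        }
        return true;
    }

    const std::vector<Point>& items() const { return data_; }

private:
    std::size_t threshold() const
    {
        // "Too large" defined in terms of log2 of the sorted part, plus some slack.
        return sorted_size_ < 2 ? 8
             : static_cast<std::size_t>(std::log2(static_cast<double>(sorted_size_))) + 8;
    }

    std::vector<Point> data_;
    std::size_t sorted_size_ = 0;
};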
