如何有效地将袜子配对?

编程入门行业动态更新时间:2024-10-12 16:21:07

本文介绍了如何有效地将袜子配对?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！问题描述

昨天，我从干净的衣物中取出袜子，弄清楚我的操作方式不是很有效.我一直在天真地搜寻-挑选一只袜子，然后迭代"一堆，以找到那双袜子.这需要平均迭代n/2 * n/4 = n 2 /8个袜子.

Yesterday I was pairing the socks from the clean laundry and figured out the way I was doing it is not very efficient. I was doing a naive search— picking one sock and "iterating" the pile in order to find its pair. This requires iterating over n/2 * n/4 = n2/8 socks on average.

作为计算机科学家，我在想我能做什么?当然会想到进行排序(根据大小/颜色/...)以实现O(NlogN)解决方案.

As a computer scientist I was thinking what I could do? Sorting (according to size/color/...) of course came to mind to achieve an O(NlogN) solution.

因为我无法复制袜子，所以不能选择使用散列或其他非现场解决方案(尽管如果可以的话，这可能会很好).

Hashing or other not-in-place solutions are not an option, because I am not able to duplicate my socks (though it could be nice if I could).

所以，问题基本上是:

给一堆n双袜子，其中包含2n个元素(假设每只袜子都有一对完全匹配的袜子)，最好的方法是将它们有效配对，并增加对数额外空间? (我相信我可以记住该信息的数量.)

Given a pile of n pairs of socks, containing 2n elements (assume each sock has exactly one matching pair), what is the best way to pair them up efficiently with up to logarithmic extra space? (I believe I can remember that amount of info if needed.)

我希望能从以下几个方面回答这个问题:

I will appreciate an answer that addresses the following aspects:

针对大量袜子的理论通用解决方案.
袜子的实际数量不是那么多，我不相信我的配偶和我有超过30对. (而且很容易区分我的袜子和她的袜子；也可以使用吗?)
是否等同于元素区别问题?

A general theoretical solution for a huge number of socks.
The actual number of socks is not that large, I don't believe my spouse and I have more than 30 pairs. (And it is fairly easy to distinguish between my socks and hers; can this be used as well?)
Is it equivalent to the element distinctness problem?

推荐答案

排序解决方案，但是排序有点过多:我们不需要订购；我们只需要平等组.

Sorting solutions have been proposed, but sorting is a little too much: We don't need order; we just need equality groups.

因此哈希就足够了(并且更快).

So hashing would be enough (and faster).

对于每种颜色的袜子，形成一堆.遍历输入篮中的所有袜子，然后将它们分配到颜色堆上.

遍历每个桩，并通过其他指标(例如模式)将其分配到第二组桩中

递归地应用此方案，直到将所有袜子分配到可以立即进行视觉处理的很小的绒头

For each color of socks, form a pile. Iterate over all socks in your input basket and distribute them onto the color piles.

Iterate over each pile and distribute it by some other metric (e.g. pattern) into the second set of piles

Recursively apply this scheme until you have distributed all socks onto very small piles that you can visually process immediately

这种递归哈希分区实际上是由 SQL Server 在需要进行哈希处理时完成的对庞大的数据集进行联接或哈希聚合.它将其构建输入流分配到许多独立的分区中.该方案可线性扩展到任意数量的数据和多个CPU.

This kind of recursive hash partitioning is actually being done by SQL Server when it needs to hash join or hash aggregate over huge data sets. It distributes its build input stream into many partitions which are independent. This scheme scales to arbitrary amounts of data and multiple CPUs linearly.

如果您可以找到提供了足够的存储桶并且每个存储桶足够小以能够非常快速地处理的分发密钥(哈希密钥)，则不需要递归分区.不幸的是，我认为袜子没有这种特性.

You don't need recursive partitioning if you can find a distribution key (hash key) that provides enough buckets that each bucket is small enough to be processed very quickly. Unfortunately, I don't think socks have such a property.

如果每只袜子都有一个称为"PairID"的整数，则可以根据PairID % 10(最后一位)轻松地将它们分配到10个存储桶中.

If each sock had an integer called "PairID" one could easily distribute them into 10 buckets according to PairID % 10 (the last digit).

我能想到的最好的现实分区是创建一个矩形矩形:一个维是颜色，另一个维是图案.为什么是矩形?因为我们需要O(1)对堆的随机访问. (3D 立方体也可以，但这不是很实用.)

The best real-world partitioning I can think of is creating a rectangle of piles: one dimension is color, the other is the pattern. Why a rectangle? Because we need O(1) random-access to piles. (A 3D cuboid would also work, but that is not very practical.)

更新:

并行性如何?可以让多个人更快地匹配袜子吗?

What about parallelism? Can multiple humans match the socks faster?

最简单的并行化策略是让多名工人从输入筐中取出并将袜子放在堆上.这只会如此之大-想象一下有100个人在10堆土地上战斗. 同步成本(表现为手动碰撞和人际交往)破坏效率和加快速度(请参见通用可扩展性法律！).这容易发生死锁吗?否，因为每个工人一次只需要访问一堆.只有一个锁"就不会出现死锁. 活锁可能是可行的，具体取决于人类如何协调对桩的访问.他们可能只使用随机补偿就像网卡在物理级别上那样确定哪些卡可以专门访问网络线.如果它适用于 NIC ，它也应适用于人类.

如果每个工人都有自己的一堆桩，几乎可以无限缩放.然后，工人可以从进料篮中取出大块的袜子(很少有竞争，因为他们很少这样做)，并且在分配袜子时根本不需要同步(因为他们有局部线程堆).最后，所有工人都需要结合他们的桩组.我相信，如果工人形成聚集树，就可以用O(log(工人数*每个工人的堆数))完成.

The simplest parallelization strategy is to have multiple workers take from the input basket and put the socks onto the piles. This only scales up so much - imagine 100 people fighting over 10 piles. The synchronization costs (manifesting themselves as hand-collisions and human communication) destroy efficiency and speed-up (see the Universal Scalability Law!). Is this prone to deadlocks? No, because each worker only needs to access one pile at a time. With just one "lock" there cannot be a deadlock. Livelocks might be possible depending on how the humans coordinate access to piles. They might just use random backoff like network cards do that on a physical level to determine what card can exclusively access the network wire. If it works for NICs, it should work for humans as well.

It scales nearly indefinitely if each worker has its own set of piles. Workers can then take big chunks of socks from the input basket (very little contention as they are doing it rarely) and they do not need to synchronise when distributing the socks at all (because they have thread-local piles). At the end, all workers need to union their pile-sets. I believe that can be done in O(log (worker count * piles per worker)) if the workers form an aggregation tree.

元素区别问题如何处理?如文章所述，元素差异性问题可以在O(N)中解决.对于袜子问题(也是O(N)，如果您只需要一个分配步骤，我也建议使用多个步骤，因为人类在计算上很差-如果您在md5(color, length, pattern, ...)上分配一个步骤，即一个所有属性的完美哈希).

What about the element distinctness problem? As the article states, the element distinctness problem can be solved in O(N). This is the same for the socks problem (also O(N), if you need only one distribution step (I proposed multiple steps only because humans are bad at calculations - one step is enough if you distribute on md5(color, length, pattern, ...), i.e. a perfect hash of all attributes)).

很明显，不能比O(N)快，所以我们已经达到了最佳下限.

Clearly, one cannot go faster than O(N), so we have reached the optimal lower bound.

尽管输出并不完全相同(在一种情况下，只是布尔值.在另一种情况下，是成对的袜子)，则渐进复杂度是相同的.

Although the outputs are not exactly the same (in one case, just a boolean. In the other case, the pairs of socks), the asymptotic complexities are the same.