昨天,我从干净的衣物中取出袜子,弄清楚我的操作方式不是很有效.我一直在天真地搜寻-挑选一只袜子,然后迭代"一堆,以找到那双袜子.这需要平均迭代n/2 * n/4 = n 2 /8个袜子.
Yesterday I was pairing the socks from the clean laundry and figured out the way I was doing it is not very efficient. I was doing a naive search— picking one sock and "iterating" the pile in order to find its pair. This requires iterating over n/2 * n/4 = n2/8 socks on average.
作为计算机科学家,我在想我能做什么?当然会想到进行排序(根据大小/颜色/...)以实现O(NlogN)解决方案.
As a computer scientist I was thinking what I could do? Sorting (according to size/color/...) of course came to mind to achieve an O(NlogN) solution.
因为我无法复制袜子,所以不能选择使用散列或其他非现场解决方案(尽管如果可以的话,这可能会很好).
Hashing or other not-in-place solutions are not an option, because I am not able to duplicate my socks (though it could be nice if I could).
所以,问题基本上是:
给一堆n双袜子,其中包含2n个元素(假设每只袜子都有一对完全匹配的袜子),最好的方法是将它们有效配对,并增加对数额外空间? (我相信我可以记住该信息的数量.)
Given a pile of n pairs of socks, containing 2n elements (assume each sock has exactly one matching pair), what is the best way to pair them up efficiently with up to logarithmic extra space? (I believe I can remember that amount of info if needed.)
我希望能从以下几个方面回答这个问题:
I will appreciate an answer that addresses the following aspects:
- 针对大量袜子的理论通用解决方案.
- 袜子的实际数量不是那么多,我不相信我的配偶和我有超过30对. (而且很容易区分我的袜子和她的袜子;也可以使用吗?)
- 是否等同于元素区别问题?
- A general theoretical solution for a huge number of socks.
- The actual number of socks is not that large, I don't believe my spouse and I have more than 30 pairs. (And it is fairly easy to distinguish between my socks and hers; can this be used as well?)
- Is it equivalent to the element distinctness problem?
排序解决方案,但是排序有点过多:我们不需要订购; 我们只需要平等组.
Sorting solutions have been proposed, but sorting is a little too much: We don't need order; we just need equality groups.
因此哈希就足够了(并且更快).
So hashing would be enough (and faster).
这种递归哈希分区实际上是由 SQL Server 在需要进行哈希处理时完成的对庞大的数据集进行联接或哈希聚合.它将其构建输入流分配到许多独立的分区中.该方案可线性扩展到任意数量的数据和多个CPU.
This kind of recursive hash partitioning is actually being done by SQL Server when it needs to hash join or hash aggregate over huge data sets. It distributes its build input stream into many partitions which are independent. This scheme scales to arbitrary amounts of data and multiple CPUs linearly.
如果您可以找到提供了足够的存储桶并且每个存储桶足够小以能够非常快速地处理的分发密钥(哈希密钥),则不需要递归分区.不幸的是,我认为袜子没有这种特性.
You don't need recursive partitioning if you can find a distribution key (hash key) that provides enough buckets that each bucket is small enough to be processed very quickly. Unfortunately, I don't think socks have such a property.
如果每只袜子都有一个称为"PairID"的整数,则可以根据PairID % 10(最后一位)轻松地将它们分配到10个存储桶中.
If each sock had an integer called "PairID" one could easily distribute them into 10 buckets according to PairID % 10 (the last digit).
我能想到的最好的现实分区是创建一个矩形矩形:一个维是颜色,另一个维是图案.为什么是矩形?因为我们需要O(1)对堆的随机访问. (3D 立方体也可以,但这不是很实用.)
The best real-world partitioning I can think of is creating a rectangle of piles: one dimension is color, the other is the pattern. Why a rectangle? Because we need O(1) random-access to piles. (A 3D cuboid would also work, but that is not very practical.)
更新:
并行性如何?可以让多个人更快地匹配袜子吗?
What about parallelism? Can multiple humans match the socks faster?
元素区别问题如何处理?如文章所述,元素差异性问题可以在O(N)中解决.对于袜子问题(也是O(N),如果您只需要一个分配步骤,我也建议使用多个步骤,因为人类在计算上很差-如果您在md5(color, length, pattern, ...)上分配一个步骤,即一个所有属性的完美哈希).
What about the element distinctness problem? As the article states, the element distinctness problem can be solved in O(N). This is the same for the socks problem (also O(N), if you need only one distribution step (I proposed multiple steps only because humans are bad at calculations - one step is enough if you distribute on md5(color, length, pattern, ...), i.e. a perfect hash of all attributes)).
很明显,不能比O(N)快,所以我们已经达到了最佳下限.
Clearly, one cannot go faster than O(N), so we have reached the optimal lower bound.
尽管输出并不完全相同(在一种情况下,只是布尔值.在另一种情况下,是成对的袜子),则渐进复杂度是相同的.
Although the outputs are not exactly the same (in one case, just a boolean. In the other case, the pairs of socks), the asymptotic complexities are the same.
更多推荐
如何有效地将袜子配对?
发布评论