集合的简单通用哈希函数(Simple general

系统教程 行业动态 更新时间:2024-06-14 16:52:52
集合的简单通用哈希函数(Simple general-purpose hash function for a collection)

请标记为重复,但到目前为止我发现的大多数问题都比我想要的更具体或更复杂。 例如,在“什么是好的散列函数”中 ,接受的答案似乎是针对散列字符串。

我最近开始用.NET编程,我发现很遗憾内置类缺乏检查等式和查找哈希等基本功能的能力。 我相信他们有自己的设计理由; 无需保护.NET。 我只是想知道当我需要使用集合作为字典的关键时如何避免重大的副轨道。 例如,我希望两个不同的List对象包含所有相等的值,以映射到字典中的相同条目。 开箱即用,它们不会:List的默认行为是List除了它本身不等于任何东西,因此具有相同值的列表的另一个实例是不同的键。

实现Equals非常简单。 这是我不确定的哈希函数。

是否提供了我可以在我的GetHashCode实现中调用的东西?

如果我必须从头开始编写,那么什么是一个非常简单但足够好的哈希算法呢? 我可以使用SHA1,但我认为这将是矫枉过正的。 我可以只考虑项目的所有哈希值,但我认为这会有一些讨厌的碰撞属性。 我不关心计算哈希是否非常快,但我不希望我的哈希表在具有某些特定分布的数据集上减慢到线性。 我想要的是这么简单,我可以记住它。 如果你可以解释(或链接)它的工作原因,你可以获得奖金。

Please mark as duplicate, but most questions I've found so far are too specific or more complex than I'm looking for. E.g. in "What is a good hash function", the accepted answer seems to be oriented toward hashing strings.

I've recently started programming in .NET, and I find it unfortunate that built-in classes lack the ability to do some basic things like check equality and find their hash. I'm sure they have their design reasons for that; no need to defend .NET. I just want to know how to avoid a significant sidetrack when I need to use a collection as a key to a dictionary. I want, for example, two different List objects containing all equal values to map to the same entry in the dictionary. Out of the box, they don't: the default behavior for List is that a List is not equal to anything but itself, so another instance of a list with the same values is a different key.

Implementing Equals is straightforward. It's the hash function that I am unsure of.

Is there just something provided that I can call in my implementation of GetHashCode?

If I have to write it from scratch, what's a really simple but good enough hash algorithm? I could use SHA1 but I think it would be overkill. I could just xor all the hashes of the items, but I think that would have some nasty collision properties. I don't care if computing hashes is blazingly fast, but I don't want my hash table to slow to linear on data sets with some particular distribution. What I'd like is something so simple that I can memorize it. Bonus if you can explain (or link to) why it works.

最满意答案

这里要非常小心。 如果你为List<T> (或类似的集合)创建一个GetHashCode方法,那么可能它会做这样的事情:

public override int GetHashCode() { int hash = 13; foreach (var t in this) { // X is an operation (undefined here) that somehow combines // the previous hash value and the item's hash value hash = hash X t.GetHashCode(); } return hash; }

(我会建议类似Jenkins哈希来计算哈希码。还要看看王哈希 (或比特混音器)。)

除非您第一次计算该值并对其进行缓存,否则每次调用GetHashCode时,您最终都会遍历所有项目。

因此,您已为集合创建了GetHashCode和Equals ,并将实例放入Dictionary 。 现在你必须非常小心,不要更改集合(即不添加或删除任何项目)或集合中的任何项目。 否则, GetHashCode的值将改变,字典将不再起作用。

我强烈建议如果你想使用集合作为字典的关键,你要确保集合是不可变的。

还有一件事需要考虑。 列表相等的概念并不像您指出的那么简单。 例如,列表[1, 2, 3, 4, 5]和[5, 1, 3, 4, 2]相等? 这取决于你对平等的定义。 当然A.Union(B) == A.Intersect(B) ,这意味着如果你的相等定义是“包含相同的项目”,它们是相等的。 但如果订单很重要,那么这些清单并不相同。

如果您的定义是“包含相同的项目”,那么上面显示的哈希码计算将不起作用,因为哈希码计算是依赖于顺序的。 因此,如果您想计算这些列表的哈希码,则必须先对它们进行排序。

如果列表不能包含重复项,则计算相等性是创建一个列表的哈希集并从该哈希集中的另一个列表中查找每个项目的问题。 如果列表可以包含重复项,那么您必须对它们进行排序以确定相等性,或者使用某种带有计数的字典。 并且这两者都暗示列表中包含的对象将实现某种形式的相等比较器等。

一些平等的定义根本不考虑重复。 也就是说, [1, 2, 3]将等于[3, 3, 3, 2, 1, 1] 。

考虑到平等的不同差异以及为定义List<T>的行为而允许的那些以及更多的努力,我可以理解为什么设计集合类的人没有实现值相等。 特别是考虑到使用List<T>或类似的集合作为字典或散列表中的键是非常罕见的。

Be very careful here. If you create a GetHashCode method for a List<T> (or similar collection), then presumably it'll do something like this:

public override int GetHashCode() { int hash = 13; foreach (var t in this) { // X is an operation (undefined here) that somehow combines // the previous hash value and the item's hash value hash = hash X t.GetHashCode(); } return hash; }

(I would suggest something like the Jenkins hash for computing the hash code. Also look into the Wang hash (or bit mixer).)

Unless you compute that value the first time and the cache it, you will end up iterating over all of the items every time GetHashCode is called.

So you've created a GetHashCode and Equals for your collection and you put an instance into a Dictionary. Now you have to be very careful not to change the collection (i.e. don't add or remove any items) or any of the items inside the collection. Otherwise the value of GetHashCode will change, and the dictionary won't work anymore.

I strongly suggest that if you want to use a collection as the key to a dictionary, you make very sure that the collection is immutable.

One other thing to consider. The concept of list equality isn't as simple as you indicate. For example, are the lists [1, 2, 3, 4, 5] and [5, 1, 3, 4, 2] equal? It rather depends on your definition of equality. Certainly A.Union(B) == A.Intersect(B), which means that they're equal if your definition of equality is "contain the same items." But if order matters, then the lists aren't equal.

If your definition is "contain the same items," then the hash code calculation I showed above isn't going to work because hash code computations are order dependent. So if you wanted to compute the hash code of those lists, you'd have to sort them first.

If the lists cannot contain duplicates, then computing equality is a matter of creating a hash set of one list and looking up each item from the other list in that hash set. If the lists can contain duplicates, then you either have to sort them to determine equality, or use some kind of dictionary with a count. And both of those imply that the objects contained in the list will implement some form of equality comparer, etc.

And some definitions of equality don't take duplicates into account at all. That is, [1, 2, 3] would be equal to [3, 3, 3, 2, 1, 1].

Considering the varying differences of equality and the effort it would have taken to allow for those and more in defining the behavior of List<T>, I can understand why whoever designed the collection classes didn't implement value equality. Especially considering that it's pretty uncommon to use a List<T> or similar collection as the key in a dictionary or hash table.

更多推荐

本文发布于:2023-04-05 12:25:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/dzcp/037c0c88636ecde646cfec2d14f8246d.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:函数   简单   general   Simple

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!