HashSet 如何比List .Contains更快地包含?(How is HashSet.Contains faster than List.Contains?)

系统教程 行业动态 更新时间:2024-06-14 17:01:31
HashSet 如何比List .Contains更快地包含?(How is HashSet.Contains faster than List.Contains?)

我有一个简单的要求:我有数以百万计的字符串,并且想要测试它们是否存在于一个小集合中。 我对这个集合使用List<T> vs HashSet<T>有疑问。

当需求相反时,例如,你有100个字符串,需要检查它们是否存在于一组数百个字符串中,我完全理解HashSet<T>是最佳选择。

但就我而言,似乎.NET在HashSet<T>上调用Contains时必须计算数百万次哈希(调用GetHashCode ),因此调用List<T> Contains可能会更快?

任何人都可以解释这个假设是否正确?

I have a simple requirement: I have millions of strings, and want to test if they exist in a small set. I'm in doubt between using a List<T> vs a HashSet<T> for this set.

When the requirement is the opposite, for example, you have 100 strings and need to check if they exist in a set of millions of strings, I fully understand that the HashSet<T> is the best choice.

But in my case, it seems that .NET has to calculate millions of hashes (calls to GetHashCode) when calling Contains on the HashSet<T>, so calling Contains of a List<T> could be faster?

Can anyone explain if this assumption is correct?

最满意答案

这些都不适合我HashSet<string>听起来像是对我来说最好的方法。

是的,.NET必须为每个字符串计算哈希代码 - 问题是,只要检查候选集中数百个字符串中的每一个字符串是否相等,就会花费多少时间。

根据所有性能问题,你应该真正测试这个而不是猜测。 例如,如果所有字符串的长度都不相同,并且它们都很长,那么Equals对每个候选对象都很便宜,而GetHashCode可能需要很长时间。 但是,如果所有的字符串都是以相同的6个字符开始的长度10,那么GetHashCode会相当便宜,但是每个字符串相等性检查都必须检查所有这些通用前缀字符。 这些更像你的实际情况? 你的基准测试显示了什么? 你需要多快?

Neither of these seems appropriate to me - a HashSet<string> sounds like it may be the best approach to me.

Yes, .NET has to compute the hash code for each string - the question is whether or not that takes as long as checking for equality with each of the hundreds of strings in the candidate set.

As per all performance questions, you should really test this rather than guessing. For example, if all the strings have different lengths and they're all long, then Equals will be cheap against each candidate, and GetHashCode could take a long time. However, if all your strings are length 10 starting with the same 6 characters, say, then GetHashCode will be reasonably cheap but each string equality check will have to check all of those common prefix characters. Which of these is more like your actual situation? What have your benchmarks shown? How fast do you need this to be?

更多推荐

本文发布于:2023-04-20 16:04:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/dzcp/c0e20f785e8d02d990ed66ded3a44e1f.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:更快   HashSet   List   faster

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!