Efficient string matching algorithm

Updated: 2024-10-13 18:25:42

Question

I'm trying to build an efficient string matching algorithm. This will execute in a high-volume environment, so performance is critical.

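Here are my requirements:

  • Given a domain name, e.g. www.example, determine whether it matches one of a list of entries.
  • Entries may be absolute matches, e.g. www.example.
  • Entries may include wildcards, e.g. *.example.
  • Wildcard entries match from the most-defined level and up. For example, *.example would match www.example, example, and sub.www.example.
  • Wildcard entries are not embedded, i.e. sub.*.example will not be an entry.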
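To make the rules above concrete, they can be read as the following reference check. This is only a minimal sketch and not part of the original question: `DomainRules`, `MatchesEntry`, and `MatchesAny` are hypothetical names, and case-insensitive comparison is an assumption. It scans the list linearly, so it is a baseline for correctness rather than a fast solution.

```csharp
using System;
using System.Collections.Generic;

static class DomainRules
{
    // Returns true when `domain` satisfies a single entry.
    // Assumption: names compare case-insensitively.
    public static bool MatchesEntry(string domain, string entry)
    {
        if (!entry.StartsWith("*.", StringComparison.Ordinal))
        {
            // Absolute entry: the whole name must be identical.
            return string.Equals(domain, entry, StringComparison.OrdinalIgnoreCase);
        }

        string baseName = entry.Substring(2); // "*.example" -> "example"
        string suffix = entry.Substring(1);   // "*.example" -> ".example"

        // The wildcard covers the bare name and any depth of subdomain.
        return string.Equals(domain, baseName, StringComparison.OrdinalIgnoreCase)
            || domain.EndsWith(suffix, StringComparison.OrdinalIgnoreCase);
    }

    // Linear scan over all entries.
    public static bool MatchesAny(string domain, IEnumerable<string> entries)
    {
        foreach (string entry in entries)
        {
            if (MatchesEntry(domain, entry)) return true;
        }
        return false;
    }
}
```

Comparing against the suffix ".example" rather than "example" keeps a name such as badexample from matching the wildcard entry.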

Language/environment: C# (.NET Framework 3.5)

I've considered splitting the entries (and domain lookup) into arrays, reversing the order, then iterating through the arrays. While accurate, it feels slow.
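The split-and-reverse idea might look roughly like the sketch below. It is an illustration under assumptions, not code from the question: `LabelMatcher` and its methods are hypothetical names, and a wildcard entry is assumed to cover the bare name as well as its subdomains.

```csharp
using System;

static class LabelMatcher
{
    // Splits "sub.www.example" into { "example", "www", "sub" }.
    public static string[] ReversedLabels(string name)
    {
        string[] labels = name.Split('.');
        Array.Reverse(labels);
        return labels;
    }

    // Compares one pre-split domain against one pre-split entry.
    public static bool Matches(string[] domain, string[] entry)
    {
        // After reversal, a wildcard entry ends with the "*" label.
        bool wildcard = entry[entry.Length - 1] == "*";
        int fixedCount = wildcard ? entry.Length - 1 : entry.Length;

        // An absolute entry needs the same number of labels;
        // a wildcard entry needs at least its fixed labels.
        if (wildcard ? domain.Length < fixedCount : domain.Length != fixedCount)
            return false;

        for (int i = 0; i < fixedCount; i++)
        {
            if (!string.Equals(domain[i], entry[i], StringComparison.OrdinalIgnoreCase))
                return false;
        }
        return true;
    }
}
```

Entries can be split once up front, but every lookup still has to split the incoming domain and then walk the entry list, which is likely where the "feels slow" impression comes from.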

I've considered Regex, but am concerned about accurately representing the list of entries as regular expressions.
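One way the entry list could be turned into a single regular expression is sketched below. This is an assumption-laden illustration rather than a recommendation: `RegexMatcher` and `Build` are hypothetical names, and the wildcard is taken to mean "an optional prefix ending in a dot". `Regex.Escape` handles the accuracy concern for literal characters such as ".".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexMatcher
{
    // Builds one anchored alternation covering every entry.
    public static Regex Build(IEnumerable<string> entries)
    {
        List<string> alternatives = new List<string>();
        foreach (string entry in entries)
        {
            if (entry.StartsWith("*.", StringComparison.Ordinal))
            {
                // "*.example" -> optional "anything." prefix, then the literal name.
                alternatives.Add(@"(?:.+\.)?" + Regex.Escape(entry.Substring(2)));
            }
            else
            {
                // Absolute entry: escape so "." only matches a literal dot.
                alternatives.Add(Regex.Escape(entry));
            }
        }

        // \A and \z anchor the match to the whole input string.
        string pattern = @"\A(?:" + string.Join("|", alternatives.ToArray()) + @")\z";
        return new Regex(pattern,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);
    }
}
```

The resulting `Regex` would be built once and reused for every lookup via `IsMatch`. Whether one large alternation performs well for a long entry list is something to measure rather than assume.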

My question: what's an efficient way of finding if a string, in the form of a domain name, matches any one in a list of strings, given the description listed above?

Recommended answer

If you're looking to roll your own, I would store the entries in a tree structure. See my answer to another SO question about spell checkers to see what I mean.

Rather than tokenize the structure by "." characters, I would just treat each entry as a full string. Any tokenized implementation would still have to do string matching on the full set of characters anyway, so you may as well do it all in one shot.

The only differences between this and a regular spell-checking tree are:

  • The matching needs to be done in reverse
  • You have to take into account the wildcards

    To address point #2, you would simply check for the "*" character at the end of a test.

    A quick example:

    Entries:

    *.fark.com
    www.cnn.com

    Tree:

    m -> o -> c -> . -> k -> r -> a -> f -> . -> *
                    \
                     -> n -> n -> c -> . -> w -> w -> w

    Checking www.blog.fark.com would involve tracing through the tree up to the first "*". Because the traversal ended on a "*", there is a match.

    Checking www.cern.com would fail on the second "n" of n,n,c,...

    Checking dev.www.cnn.com would also fail, since the traversal ends on a character other than "*".
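A reversed character tree along these lines could be sketched as follows. This is one possible reading of the answer, not its author's code: `DomainTrie`, `Node`, `Add`, and `Matches` are hypothetical names, and lower-casing is an assumption. The final check, which lets a wildcard entry also match its bare name (a requirement in the question), goes beyond what the answer describes.

```csharp
using System;
using System.Collections.Generic;

class DomainTrie
{
    private class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsEnd; // an entry ends exactly at this node
    }

    private readonly Node root = new Node();

    // Inserts an entry character by character, last character first.
    public void Add(string entry)
    {
        Node node = root;
        for (int i = entry.Length - 1; i >= 0; i--)
        {
            char c = char.ToLowerInvariant(entry[i]);
            Node next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsEnd = true;
    }

    // Walks the domain in reverse through the tree.
    public bool Matches(string domain)
    {
        Node node = root;
        for (int i = domain.Length - 1; i >= 0; i--)
        {
            // A "*" child here means a wildcard entry covers the rest of the name.
            if (node.Children.ContainsKey('*')) return true;

            if (!node.Children.TryGetValue(char.ToLowerInvariant(domain[i]), out node))
                return false;
        }

        // The whole name was consumed: an absolute entry must end here.
        if (node.IsEnd) return true;

        // Assumption beyond the answer: "*.name" also matches the bare "name".
        Node dot;
        return node.Children.TryGetValue('.', out dot) && dot.Children.ContainsKey('*');
    }
}
```

With the two entries spelled out by the tree above added through `Add`, `Matches` follows the same traces as the worked checks: it stops at the "*" for the blog subdomain, fails on the second "n" for the cern name, and fails for the dev name because no node follows the last "w". Each lookup touches at most one node per character of the domain, independent of how many entries are stored.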

Published: 2023-11-29 21:19:04
Source: https://www.elefans.com/category/jswz/34/1647657.html