算法"模糊匹配"字符串

编程入门 行业动态 更新时间:2024-10-14 22:23:15
本文介绍了算法"模糊匹配"字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

通过模糊匹配我的意思不是类似字符串由Levenshtein距离或类似的东西,但它在TextMate中/伊多文/冰柱使用方法:给定一个字符串列表,找到那些包括在搜索字符串中的所有字符,但可能与之间,preferring最合适的。

By fuzzy matching I don't mean similar strings by Levenshtein distance or something similar, but the way it's used in TextMate/Ido/Icicles: given a list of strings, find those which include all characters in the search string, but possibly with other characters between, preferring the best fit.

推荐答案

我终于明白了什么,你所期待的。这个问题很有趣但是在看2算法,你发现它似乎人们对规范广泛不同意见;)

I've finally understood what you were looking for. The issue is interesting however looking at the 2 algorithms you found it seems that people have widely different opinions about the specifications ;)

我认为这将是有益的陈述问题和要求更清楚。

I think it would be useful to state the problem and the requirements more clearly.

问题

我们正在寻找一种方式,允许用户只输入关键字他们居然打算的几个字母,并提出他们从中选择一个列表,加快打字。

We are looking for a way to speed up typing by allowing users to only type a few letters of the keyword they actually intended and propose them a list from which to select.

  • 预期的输入的所有字母是在关键字
  • 据预计,在输入的字母在同一个顺序在关键字
  • 关键字的列表中返回应psented一致(reproductible)为了$ P $
  • 在该算法应不区分大小写
  • 分析

    前两个要求可以总结一下像这样:对于输入 AXG 我们正在寻找的话匹配此正EX pression [^ A] * A [^ X] * X [^ G] * G *

    The first two requirements can be sum up like such: for an input axg we are looking for words matching this regular expression [^a]*a[^x]*x[^g]*g.*

    第三个要求是故意松。其中是一致的话应该出现在列表中需要的命令......但它很难猜测评分方法是否会比按字母顺序排列好。如果列表extremy长,则评分方法可能会更好,但对于短名单更容易让眼睛去寻找下一个列表排序明显地特定项目。

    The third requirement is purposely loose. The order in which the words should appear in the list need being consistent... however it's difficult to guess whether a scoring approach would be better than alphabetical order. If the list is extremy long, then a scoring approach could be better, however for short list it's easier for the eye to look for a particular item down a list sorted in an obvious manner.

    此外,按字母顺序排列具有一致性的优势打字时:即增加一个字母不完全列表重新排序(痛苦的眼睛和大脑),它只是过滤掉不符合任何更长的项目

    Also, the alphabetical order has the advantage of consistency during typing: ie adding a letter does not completely reorder the list (painful for the eye and brain), it merely filters out the items that do not match any longer.

    有一个关于处理UNI code字符,例如没有precision是 A 类似于 A 或其它字符完全?因为我知道,目前使用这些字符在他们的关键字没有语言的,我会让它溜走现在。

    There is no precision about handling unicode characters, for example is à similar to a or another character altogether ? Since I know of no language that currently uses such characters in their keywords, I'll let it slip for now.

    我的解决办法:

    对于任何投入,我将建立早期pssed正规前pression EX $ P $。它适用于Python的,因为语言已经拥有不区分大小写匹配。

    For any input, I would build the regular expression expressed earlier. It suitable for Python because the language already features case-insensitive matching.

    我会再匹配关键字的我(按字母顺序排序)列表,并输出使过滤。

    I would then match my (alphabetically sorted) list of keywords, and output it so filtered.

    在伪code:

    WORDS = ['Bar', 'Foo', 'FooBar', 'Other'] def GetList(input, words = WORDS): expr = ['[^' + i + ']*' + i for i in input] return [w for w in words if re.match(expr, w, re.IGNORECASE)]

    我可以用一个一行但认为它会掩盖了code;)

    I could have used a one-liner but thought it would obscure the code ;)

    该解决方案非常适用于增量的情况下(例如,当您匹配的用户类型,从而保持重建),因为当用户添加,你可以简单地重新过滤结果你只计算一个字符。因此:

    This solution works very well for incremental situations (ie, when you match as the user type and thus keep rebuilding) because when the user adds a character you can simply refilter the result you just computed. Thus:

    • 无论有几个字符,从而匹配的是快速和列表的长度没有多大关系
    • 要么也有许多字符,这意味着我们正在筛选候选名单,因此它没有太大的关系,如果匹配需要更长的时间逐元素

    我也应该注意到,这种常规的前pression不涉及回溯,因此十分有效。它也可以被建模为一个简单的状态机。

    I should also note that this regular expression does not involve back-tracking and is thus quite efficient. It could also be modeled as a simple state machine.

    更多推荐

    算法"模糊匹配"字符串

    本文发布于:2023-10-23 05:29:23,感谢您对本站的认可!
    本文链接:https://www.elefans.com/category/jswz/34/1519927.html
    版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
    本文标签:字符串   算法   模糊   QUOT

    发布评论

    评论列表 (有 0 条评论)
    草根站长

    >www.elefans.com

    编程频道|电子爱好者 - 技术资讯及电子产品介绍!