String matching techniques

Updated: 2024-10-28 12:20:38

Problem description

The following strings are considered equal. How can I match stuff like this?

"Hazard Const. Company"
"hazard construction company"
"PETERSON-CHASE GENERAL ENGINEERING CONSTRUCTION INC"
"peterson-chase general engineering construction inc"
"TRAFFIC DEVELOPMENT SERVICES "
"traffic development services"

My environment is Ruby, but I'm mainly interested in the general principles of matching strings. The examples above don't work with a rudimentary a == b comparison because of whitespace issues and abbreviations. I can mitigate casing issues with a case-insensitive regex or by downcasing the strings...

Recommended answer

The following sample compares all of your strings and computes the Levenshtein distance between them (the number of single-character insertions, deletions, or substitutions it takes to turn one string into the other).
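
To make the distance itself concrete, here is a minimal plain-Ruby sketch of the standard two-row dynamic-programming formulation. The method name levenshtein_distance is made up for this illustration and is not part of the levenshtein gem used in the answer; the gem's own implementation may differ.

```ruby
# Plain-Ruby Levenshtein distance using two rows of the DP table.
# levenshtein_distance is a hypothetical helper name for this sketch.
def levenshtein_distance(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # previous[j] holds the distance between the prefix of a processed
  # so far and the first j characters of b.
  previous = (0..b.length).to_a

  a.each_char.with_index(1) do |char_a, i|
    current = [i] # turning i characters of a into "" takes i deletions
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution (or match when cost is 0)
      ].min
    end
    previous = current
  end

  previous.last
end

puts levenshtein_distance("kitten", "sitting") # → 3
puts levenshtein_distance("hazard const. company",
                          "hazard construction company") # → 7
```

The second call shows why an exact comparison fails for the abbreviated name: "const." is seven edits away from "construction" (one substitution plus six insertions).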

Based on a defined maximum distance, plus a compensation for the length of the string, it then puts each string into a hash as a key, with the number of occurrences as the value.
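
The length compensation can be worked through on its own. The sketch below reuses the two constants from the answer's code; the helper name allowed_distance is an assumption introduced only for this illustration.

```ruby
MAX_DISTANCE = 3
COMPENSATION = 5

# Hypothetical helper: the distance threshold for a given string.
# Integer division means every 5 characters allow one extra edit.
def allowed_distance(string)
  MAX_DISTANCE + (string.length / COMPENSATION)
end

puts allowed_distance("hazard construction company") # 27 chars → 8
puts allowed_distance("peterson-chase general engineering construction inc") # 51 chars → 13
```

Because the answer's code compares with a strict <, two strings are treated as the same only when their distance is below this threshold. With a distance of 7 between the two "hazard" spellings and a threshold of 8, they end up under the same key.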

require 'levenshtein'

MAX_DISTANCE, COMPENSATION = 3, 5

strings = [
  "Hazard Const. Company",
  "hazard construction company",
  "PETERSON-CHASE GENERAL ENGINEERING CONSTRUCTION INC",
  "peterson-chase general engineering construction inc",
  "TRAFFIC DEVELOPMENT SERVICES ",
  "traffic development services"
]

result = {}

# Compare lowercased copies so the input strings are not mutated.
strings.map(&:downcase).each do |s|
  # Keys already in the hash that are close enough to s; longer strings
  # are allowed proportionally more edits.
  similar = result.keys.select do |key|
    Levenshtein.distance(key, s) < MAX_DISTANCE + (s.length / COMPENSATION)
  end

  if similar.any?
    result[similar.first] += 1 # count s under the first matching key
  else
    result[s] = 1              # first occurrence: s becomes a new key
  end
end

puts result.inspect
# {"hazard const. company"=>2, "peterson-chase general engineering construction inc"=>2, "traffic development services "=>2}
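
The casing, whitespace, and abbreviation differences mentioned in the question could also be reduced before any distance is computed. The following is only a sketch of that idea, not part of the original answer: the ABBREVIATIONS table and the normalize helper are hypothetical, and a real table would have to be built from the actual data.

```ruby
# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
  "const." => "construction"
}.freeze

# Lowercase, trim, collapse runs of whitespace, and expand known abbreviations.
def normalize(name)
  words = name.downcase.strip.split(/\s+/)
  words.map { |word| ABBREVIATIONS.fetch(word, word) }.join(" ")
end

names = [
  "Hazard Const. Company",
  "hazard construction company",
  "TRAFFIC DEVELOPMENT SERVICES ",
  "traffic development services"
]

# Strings that normalize to the same value land in the same group.
groups = names.group_by { |name| normalize(name) }
puts groups.keys.inspect
```

After this step the sample names compare equal with a plain ==, so a fuzzy distance would only be needed for variations the table does not cover, such as typos.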

Published: 2023-11-12 07:57:06
Article link: https://www.elefans.com/category/jswz/34/1580937.html