How do I generate a percentage for a regex string match in Ruby?

Updated: 2024-10-23 13:29:57

Problem description

I'm trying to build a simple method to look at about 100 entries in a database for a last name and pull out all the ones that match above a specific percentage of letters. My current approach is:

1. Pull all 100 entries from the database into an array.
2. Iterate through them while performing the following actions.
3. Split the last name into an array of letters.
4. Subtract that array from another array that contains the letters of the name I am trying to match, which leaves only the letters that weren't matched.
5. Take the size of the result and divide by the original size of the array from step 3 to get a percentage.
6. If the percentage is above a predefined threshold, push that database object into a results array.
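The steps above can be sketched roughly as follows. This is only one reading of the description: it scores the fraction of the target's letters that were matched, and `last_names`, `target` and `threshold` are illustrative stand-ins for the database records and settings, which the question does not show.

```ruby
# Sketch of the approach described in the question, using plain Array#-.
# All names and sample values here are assumptions for illustration.
def matches_above(last_names, target, threshold)
  target_letters = target.downcase.chars
  last_names.select do |last_name|
    name_letters = last_name.downcase.chars
    # Array#- removes every occurrence of each letter found in name_letters.
    unmatched = target_letters - name_letters
    matched_fraction = 1.0 - unmatched.size / target_letters.size.to_f
    matched_fraction > threshold
  end
end

last_names = ["Bond", "Bind", "Smith"]
p matches_above(last_names, "Bond", 0.7) # prints ["Bond", "Bind"]
```

Note that `Array#-` removes every occurrence of a matched letter, so repeated letters are not counted separately.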

This works, but I feel like there must be some cool ruby/regex/active record method of doing this more efficiently. I have googled quite a bit but can't find anything.

Recommended answer

To comment on the merit of the measure you suggested would require speculation, which is out-of-bounds at SO. I therefore will merely demonstrate how you might implement your proposed approach.

Code

First define a helper method:

class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end

In short, if

a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]

then

a - b #=> [1]

whereas

a.difference(b) #=> [1, 3, 2, 2]

This method is elaborated in my answer to this SO question. I've found so many uses for it that I've proposed it be added to the Ruby Core.
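Ruby 2.6 and later ship their own `Array#difference`, which behaves like `Array#-`, so reopening `Array` as above replaces the built-in method. A minimal sketch that avoids the clash, assuming a standalone method with the hypothetical name `multiset_difference`:

```ruby
# Hypothetical standalone version of the helper; the name
# multiset_difference is an assumption, not part of the original answer.
def multiset_difference(array, other)
  # Count how many times each element occurs in other.
  counts = other.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  # Drop one element of array for each occurrence still available in other.
  array.reject do |e|
    if counts[e] > 0
      counts[e] -= 1
      true
    else
      false
    end
  end
end

a = [3, 1, 2, 3, 4, 3, 2, 2, 4]
b = [2, 3, 4, 4, 3, 4]
p multiset_difference(a, b) # prints [1, 3, 2, 2]
```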

The following method produces a hash whose keys are the elements of names (strings) and whose values are the fractions of the letters in the target string that are contained in each string in names.

def target_fractions(names, target)
  target_arr = target.downcase.scan(/[a-z]/)
  target_size = target_arr.size
  names.each_with_object({}) do |s,h|
    s_arr = s.downcase.scan(/[a-z]/)
    target_remaining = target_arr.difference(s_arr)
    h[s] = (target_size-target_remaining.size)/target_size.to_f
  end
end

Example

Suppose the target is

target = "Jimmy S. Bond"

and the names you want to compare against it are given by

names = ["Jill Dandy", "Boomer Asad", "Josefine Simbad"]

Then

target_fractions(names, target) #=> {"Jill Dandy"=>0.5, "Boomer Asad"=>0.5, "Josefine Simbad"=>0.8}
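To get from these fractions to the list of names above a cutoff, one option is to filter the resulting hash. The sketch below is self-contained: `letter_counts` and `matched_fractions` are hypothetical helpers that recompute the same fractions by counting letters, so the block runs without the `Array` patch, and the `threshold` value is an arbitrary assumption.

```ruby
# Count the letters a-z in a string, ignoring case and other characters.
def letter_counts(str)
  str.downcase.scan(/[a-z]/).each_with_object(Hash.new(0)) { |c, h| h[c] += 1 }
end

# Hypothetical counterpart of target_fractions based on letter counts.
def matched_fractions(names, target)
  target_counts = letter_counts(target)
  target_size = target_counts.values.sum
  names.each_with_object({}) do |name, h|
    name_counts = letter_counts(name)
    # A target letter is matched as often as it appears in both strings.
    matched = target_counts.sum { |c, n| [n, name_counts[c]].min }
    h[name] = matched / target_size.to_f
  end
end

threshold = 0.6 # assumed cutoff, chosen only for illustration
names = ["Jill Dandy", "Boomer Asad", "Josefine Simbad"]
fractions = matched_fractions(names, "Jimmy S. Bond")
p fractions.select { |_name, fraction| fraction > threshold }.keys
# prints ["Josefine Simbad"]
```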

Explanation

For names and target above,

target_arr = target.downcase.scan(/[a-z]/)
  #=> ["j", "i", "m", "m", "y", "s", "b", "o", "n", "d"]
target_size = target_arr.size
  #=> 10

Now consider

s = "Jill Dandy"
h = {}

then

s_arr = s.downcase.scan(/[a-z]/)
  #=> ["j", "i", "l", "l", "d", "a", "n", "d", "y"]
target_remaining = target_arr.difference(s_arr)
  #=> ["m", "m", "s", "b", "o"]
h[s] = (target_size-target_remaining.size)/target_size.to_f
  #=> (10-5)/10.0 => 0.5
h #=> {"Jill Dandy"=>0.5}