Using Jaro-Winkler, is the distance between A and B the same as between B and A?

Updated: 2024-10-07 08:24:28
Problem description

I'm using the following class to calculate the Jaro-Winkler distance between two strings. I've noticed that the distance calculated from string A to string B is not always the same as the distance from string B to string A. Is this to be expected?

RAMADI ~ TRADING    0.73492063492063
TRADING ~ RAMADI    0.71825396825397

Recommended answer

It turns out there is a bug in the PHP versions of the Jaro-Winkler string comparison method that can be found in many places online.

Currently, comparing string A to string B yields a different result from comparing string B to string A whenever a character that appears in both strings occurs more than once in one of them. This is incorrect: the Jaro-Winkler method should yield the same match value for A compared to B as for B compared to A.
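One way to check the symmetry property is a small harness that scores a pair in both argument orders. The sketch below is not the class from the question: `jaroSimilarity` is a hypothetical helper that follows a common formulation of the Jaro similarity, tracking matched positions with boolean flags instead of building a common-characters string. It also assumes single-byte strings, since it indexes by byte.

```php
<?php
// Hypothetical helper: a common formulation of the Jaro similarity that
// marks matched positions with flags. Assumes single-byte characters.
function jaroSimilarity(string $a, string $b): float
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    if ($lenA === 0 || $lenB === 0) {
        return ($lenA === 0 && $lenB === 0) ? 1.0 : 0.0;
    }

    // Two equal characters count as a match only if their positions
    // differ by at most this window.
    $window = max(0, intdiv(max($lenA, $lenB), 2) - 1);

    $matchedA = array_fill(0, $lenA, false);
    $matchedB = array_fill(0, $lenB, false);
    $matches = 0;

    for ($i = 0; $i < $lenA; $i++) {
        $start = max(0, $i - $window);
        $end = min($lenB, $i + $window + 1);
        for ($j = $start; $j < $end; $j++) {
            // Each position in $b can be consumed by one match only.
            if (!$matchedB[$j] && $a[$i] === $b[$j]) {
                $matchedA[$i] = true;
                $matchedB[$j] = true;
                $matches++;
                break;
            }
        }
    }
    if ($matches === 0) {
        return 0.0;
    }

    // Walk both strings' matched characters in order and count the
    // positions where they disagree; half of that is the transpositions.
    $outOfOrder = 0;
    $j = 0;
    for ($i = 0; $i < $lenA; $i++) {
        if (!$matchedA[$i]) {
            continue;
        }
        while (!$matchedB[$j]) {
            $j++;
        }
        if ($a[$i] !== $b[$j]) {
            $outOfOrder++;
        }
        $j++;
    }
    $transpositions = $outOfOrder / 2;

    return ($matches / $lenA + $matches / $lenB
        + ($matches - $transpositions) / $matches) / 3;
}

// Score the pair from the question in both orders and compare.
$ab = jaroSimilarity('RAMADI', 'TRADING');
$ba = jaroSimilarity('TRADING', 'RAMADI');
var_dump(abs($ab - $ba) < 1e-12); // bool(true) for this pair
```

For this pair both orders find the same four matched characters (R, A, D, I) with no transpositions, so the two scores agree. The same two-call comparison can be pointed at any other implementation to see whether it treats its arguments symmetrically.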

To rectify this, the same character should not be repeated when identifying the common characters. The common characters variable needs to be deduplicated before it is returned.

The code below replaces the common characters string with an array that uses each common character as its key, to avoid duplication. With this change, A compared to B yields the same result as B compared to A.

This is in line with the C# version of the method.

    //$commonCharacters='';
    # The Common Characters variable must be an array
    $commonCharacters = [];
    for( $i=0; $i < $str1_len; $i++){

        $noMatch = true;
        // compare if char does match inside given allowedDistance
        // and if it does add it to commonCharacters
        for( $j= max( 0, $i-$allowedDistance ); $noMatch && $j < min( $i + $allowedDistance + 1, $str2_len ); $j++) {
            if( $temp_string2[(int)$j] == $string1[$i] ){ // MJR
                $noMatch = false;
                //$commonCharacters .= $string1[$i];
                # The Common Characters array uses the character as a key to avoid duplication.
                $commonCharacters[$string1[$i]] = $string1[$i];
                // Blank out the matched position so it cannot match again.
                // A NUL byte is used because PHP 7.1+ raises an error when an
                // empty string is assigned to a string offset.
                $temp_string2[(int)$j] = "\0"; // MJR
            }
        }
    }
    //return $commonCharacters;
    # When returning, turn the array back to a string, as expected
    return implode("", $commonCharacters);
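The array-key technique used in the snippet above can be seen in isolation in the following minimal sketch. The helper name `uniqueCharacters` is hypothetical and not part of the original class; it only illustrates how keying an array by character collapses repeats while keeping first-occurrence order.

```php
<?php
// Hypothetical helper, not part of the original class: collapse repeated
// characters by using each character as an array key.
function uniqueCharacters(string $chars): string
{
    $seen = [];
    $length = strlen($chars);
    for ($i = 0; $i < $length; $i++) {
        // Assigning to an existing key overwrites it in place, so each
        // character is kept once, at the position of its first occurrence.
        $seen[$chars[$i]] = $chars[$i];
    }
    // Join the stored values back into a plain string.
    return implode('', $seen);
}

echo uniqueCharacters('RADIA'), "\n"; // prints "RADI"
echo uniqueCharacters('AABB'), "\n";  // prints "AB"
```

Because repeats collapse to a single entry, the string returned this way can be shorter than the number of positions that matched.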

Published: 2023-11-29 09:17:35
Article link: https://www.elefans.com/category/jswz/34/1645978.html