I'm using the following class to calculate the Jaro-Winkler distance between two strings. What I'm noticing is that the distance calculated between string A and B is not always the same as string B and A. Is this to be expected?
RAMADI ~ TRADING 0.73492063492063
TRADING ~ RAMADI 0.71825396825397
Recommended answer
It turns out there is a bug in the PHP version of the Jaro-Winkler string comparison method that is posted in many places online.
Currently, comparing string A to string B yields a different result from comparing string B to string A whenever a character common to both strings occurs more than once in one of them. This is incorrect: the Jaro-Winkler method should yield the same match value for A compared to B as for B compared to A.
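The symmetry property can be tested directly, which makes this kind of bug easy to catch. The helper below is only a minimal sketch: `isSymmetric` is a hypothetical name, not part of the class in the question, and it accepts any callable that takes two strings and returns a number. PHP's built-in `levenshtein()` is used purely as a stand-in for a metric that is symmetric with its default costs.

```php
<?php
// Hypothetical helper: returns true when $metric($a, $b) and $metric($b, $a)
// agree to within $epsilon. $metric is any callable taking two strings.
function isSymmetric(callable $metric, string $a, string $b, float $epsilon = 1e-9): bool
{
    return abs($metric($a, $b) - $metric($b, $a)) < $epsilon;
}

// levenshtein() is only a stand-in; pass your own Jaro-Winkler function instead.
var_dump(isSymmetric('levenshtein', 'RAMADI', 'TRADING'));
```

Running the same check over pairs such as RAMADI and TRADING with the Jaro-Winkler function in place of `levenshtein` should flag the asymmetry described in the question.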
To rectify this, the same character should not be repeated when identifying the common characters. The common characters variable needs to be deduplicated before it is returned.
The code below replaces the common characters string with an array that uses each common character as its key, which avoids duplication. With this change, A compared to B yields the same result as B compared to A.
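To see the keyed-array trick in isolation, here is a small sketch. `uniqueCharacters` is a hypothetical helper written only for illustration; it is not part of the original class.

```php
<?php
// Hypothetical helper illustrating the keyed-array trick on its own:
// each character is stored under itself as the key, so a repeated
// character overwrites its earlier entry instead of being appended again.
function uniqueCharacters(string $characters): string
{
    $seen = [];
    foreach (str_split($characters) as $char) {
        $seen[$char] = $char;
    }
    // Overwriting a key does not move it, so implode() returns the
    // characters in the order in which they first appeared.
    return implode('', $seen);
}

echo uniqueCharacters('RAADI'), "\n"; // prints "RADI"
```

Using the character as the key means no separate lookup (such as `in_array`) is needed to detect repeats.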
This is in line with the C# version of the method.
    // $commonCharacters = '';
    // The common characters variable must be an array.
    $commonCharacters = [];
    for ($i = 0; $i < $str1_len; $i++) {
        $noMatch = true;
        // Check whether the character matches within the allowed distance
        // and, if it does, add it to $commonCharacters.
        for ($j = max(0, $i - $allowedDistance);
             $noMatch && $j < min($i + $allowedDistance + 1, $str2_len);
             $j++) {
            if ($temp_string2[(int)$j] == $string1[$i]) { // MJR
                $noMatch = false;
                // $commonCharacters .= $string1[$i];
                // Use the character as the array key to avoid duplication.
                $commonCharacters[$string1[$i]] = $string1[$i];
                // Mark the character as used. PHP 7.1+ rejects assigning ''
                // to a string offset (older versions wrote a NUL byte), so
                // write the NUL byte explicitly.
                $temp_string2[(int)$j] = "\0"; // MJR
            }
        }
    }
    // return $commonCharacters;
    // Turn the array back into a string, as the caller expects.
    return implode("", $commonCharacters);