Is there an implementation of the Jaro-Winkler string comparison in SAS?
It looks like Link King has Jaro-Winkler, but I'd prefer the flexibility of calling the function myself.
Thanks!
Recommended answer

There is no built-in function for Jaro-Winkler distance that I am aware of. @Itzy already referenced the only ones that I know of. You can roll your own functions with proc fcmp, though, if you feel up to it. I'll even give you a head start with the code below. I just tried to follow the Wikipedia article on it. It certainly isn't a perfect representation of Bill Winkler's strcmp.c file by any means, and it likely has lots of bugs.
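For reference, these are the definitions from the Wikipedia article as I read them, written out so the code can be checked against them. The symbols are my own shorthand and do not come from strcmp.c: m is the number of matching characters, t is half the number of matched characters that are out of order, ℓ is the length of the common prefix (capped at 4), and p is the prefix scale (0.1 is the usual choice).

```latex
% Two characters count as a match only if they are equal and no farther apart than
w = \left\lfloor \frac{\max(|s_1|, |s_2|)}{2} \right\rfloor - 1

% Jaro similarity
d_j =
\begin{cases}
  0 & \text{if } m = 0 \\
  \dfrac{1}{3}\left( \dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m} \right) & \text{otherwise}
\end{cases}

% Jaro-Winkler similarity
d_w = d_j + \ell \, p \, (1 - d_j)

% Worked example: s_1 = DIXON, s_2 = DICKSONX, p = 0.1
% w = 3, matches are D, I, O, N, so m = 4 and t = 0; the common prefix is DI, so \ell = 2
d_j = \frac{1}{3}\left( \frac{4}{5} + \frac{4}{8} + \frac{4 - 0}{4} \right) \approx 0.767
\qquad
d_w \approx 0.767 + 2 \cdot 0.1 \cdot (1 - 0.767) \approx 0.813
```

If the functions below are working as intended, the DIXON / DICKSONX test at the end should land close to these two values.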
    proc fcmp outlib=work.jaro.chars;

      subroutine jaromatch(string1 $, string2 $, matchChars $);
        outargs matchChars;
        /* Builds the list of characters in string1 that match string2, excluding blanks. */
        /* Two chars from string1 and string2 are considered matching if they are */
        /* no farther apart than floor(max(|s1|, |s2|)/2) - 1 */
        str1_len = length(strip(string1));
        str2_len = length(strip(string2));
        allowedDist = floor(max(str1_len, str2_len)/2) - 1;
        matchChars = "";
        /* walk through string1 and match characters to string2 */
        do i = 1 to str1_len;
          x = substr(string1, i, 1);
          position = findc(string2, x, max(1, i - allowedDist));
          if position > 0 then do;
            if position - i <= allowedDist then do;
              y = substr(string2, position, 1);
              /* build list of matched characters */
              matchChars = cats(matchChars, y);
            end;
          end;
        end;
        matchChars = strip(matchChars);
      endsub;

      function jarotrans(string1 $, string2 $);
        /* half the number of positions at which the two match lists differ */
        ntrans = 0;
        ubnd = min(length(strip(string1)), length(strip(string2)));
        do i = 1 to ubnd;
          if substr(string1, i, 1) ne substr(string2, i, 1) then do;
            ntrans = ntrans + 1;
          end;
        end;
        return(ntrans/2);
      endsub;

      function getPrefixlen(string1 $, string2 $, maxprelen);
        /* get the length of the matching characters at the beginning */
        n = min(maxprelen, length(string1), length(string2));
        do i = 1 to n;
          /* first mismatch at position i means i-1 leading characters matched */
          if substr(string1, i, 1) ne substr(string2, i, 1) then return(i - 1);
        end;
        /* no mismatch within the first n characters */
        return(n);
      endsub;

      function jarodist(string1 $, string2 $);
        /* declare the match lists as character; 200 is an arbitrary upper bound */
        length m1 m2 $ 200;
        /* get number of matched characters */
        call jaromatch(string1, string2, m1);
        /* lengthn returns 0 for a blank string, whereas length returns 1 */
        m1_len = lengthn(m1);
        if m1_len = 0 then return(0);
        call jaromatch(string2, string1, m2);
        m2_len = lengthn(m2);
        if m2_len = 0 then return(0);
        /* get number of transposed characters */
        ntrans = jarotrans(m1, m2);
        /* debug output */
        put m1_len= m2_len= ntrans= ;
        j_dist = (m1_len/length(string1)
                + m2_len/length(string2)
                + (m1_len - ntrans)/m1_len) / 3;
        return(j_dist);
      endsub;

      function jarowink(string1 $, string2 $, prefixscale);
        /* local variable renamed so it does not share a name with the jarodist function */
        jdist = jarodist(string1, string2);
        prelen = getPrefixlen(string1, string2, 4);
        if prelen = 0 then return(jdist);
        else return(jdist + prelen * prefixscale * (1 - jdist));
      endsub;

    run;
    quit;

    /* tell SAS where to find the functions we just wrote */
    options cmplib=work.jaro;

    /* Now let's try it out! */
    data _null_;
      string1 = 'DIXON';
      string2 = 'DICKSONX';
      x = jarodist(string1, string2);
      y = jarowink(string1, string2, 0.1);
      put x= y=;
    run;