Jaro-Winkler in SAS

Updated: 2024-10-07 14:28:24
Question

Is there an implementation of the Jaro-Winkler string comparison in SAS?

It looks like Link King has Jaro-Winkler, but I'd prefer the flexibility of calling the function myself.

Thanks!

Recommended answer

There is no built-in function for Jaro-Winkler distance that I am aware of. @Itzy already referenced the only ones that I know of. You can roll your own functions with proc fcmp, though, if you feel up to it. I'll even give you a head start with the code below. I just tried to follow the Wikipedia article on it. It is by no means a faithful reproduction of Bill Winkler's strcmp.c file and likely has lots of bugs.

proc fcmp outlib=work.jaro.chars;

  subroutine jaromatch(string1 $, string2 $, matchChars $);
    outargs matchChars;
    /* Returns (in matchChars) the characters of string1 that have a match
       in string2, excluding blanks. Two characters are considered matching
       if they are no farther apart than floor(max(|s1|, |s2|)/2) - 1. */
    str1_len = length(strip(string1));
    str2_len = length(strip(string2));
    allowedDist = floor(max(str1_len, str2_len)/2) - 1;
    matchChars = "";
    /* walk through string1 and match characters to string2 */
    do i = 1 to str1_len;
      x = substr(string1, i, 1);
      /* first occurrence of x in string2 at or after the start of the window */
      position = findc(string2, x, max(1, i - allowedDist));
      if position > 0 then do;
        if position - i <= allowedDist then do;
          y = substr(string2, position, 1);
          /* build list of matched characters */
          matchChars = cats(matchChars, y);
        end;
      end;
    end;
    matchChars = strip(matchChars);
  endsub;

  function jarotrans(string1 $, string2 $);
    /* half the number of positions at which the two match lists differ */
    ntrans = 0;
    ubnd = min(length(strip(string1)), length(strip(string2)));
    do i = 1 to ubnd;
      if substr(string1, i, 1) ne substr(string2, i, 1) then do;
        ntrans = ntrans + 1;
      end;
    end;
    return(ntrans/2);
  endsub;

  function getPrefixlen(string1 $, string2 $, maxprelen);
    /* get the length of the matching characters at the beginning */
    n = min(maxprelen, length(string1), length(string2));
    do i = 1 to n;
      /* first mismatch at position i: the common prefix has i-1 characters */
      if substr(string1, i, 1) ne substr(string2, i, 1) then return(i - 1);
    end;
    /* no mismatch within the first n characters */
    return(n);
  endsub;

  function jarodist(string1 $, string2 $);
    /* get number of matched characters */
    call jaromatch(string1, string2, m1);
    m1_len = length(m1);
    if m1_len = 0 then return(0);
    call jaromatch(string2, string1, m2);
    m2_len = length(m2);
    if m2_len = 0 then return(0);
    /* get number of transposed characters */
    ntrans = jarotrans(m1, m2);
    /* debug output */
    put m1_len= m2_len= ntrans=;
    j_dist = (m1_len/length(string1)
            + m2_len/length(string2)
            + (m1_len - ntrans)/m1_len) / 3;
    return(j_dist);
  endsub;

  function jarowink(string1 $, string2 $, prefixscale);
    jdist = jarodist(string1, string2);
    prelen = getPrefixlen(string1, string2, 4);
    if prelen = 0 then return(jdist);
    else return(jdist + prelen * prefixscale * (1 - jdist));
  endsub;

run;
quit;

/* tell SAS where to find the functions we just wrote */
options cmplib=work.jaro;

/* Now let's try it out! */
data _null_;
  string1 = 'DIXON';
  string2 = 'DICKSONX';
  x = jarodist(string1, string2);
  y = jarowink(string1, string2, 0.1);
  put x= y=;
run;
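As a rough cross-check for the test case above, here are the textbook Jaro and Jaro-Winkler definitions worked by hand for DIXON / DICKSONX. The symbols m, t, ℓ and p are the usual notation for these formulas, not names taken from the code, and the arithmetic assumes the standard matching window of floor(max(|s1|, |s2|)/2) - 1 = 3.

```latex
% Jaro similarity: m = number of matching characters,
% t = half the number of matched characters that are out of order
d_j =
\begin{cases}
  0 & \text{if } m = 0 \\[4pt]
  \dfrac{1}{3}\left( \dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m} \right) & \text{otherwise}
\end{cases}

% Jaro-Winkler: \ell = length of the common prefix (at most 4), p = prefix scale
d_w = d_j + \ell \, p \, (1 - d_j)

% Worked example: s_1 = \text{DIXON}, \; s_2 = \text{DICKSONX}
% Matches within the window: D, I, O, N  =>  m = 4, t = 0
% (the two X's are 5 positions apart, outside the window of 3)
d_j = \frac{1}{3}\left( \frac{4}{5} + \frac{4}{8} + \frac{4 - 0}{4} \right)
    = \frac{2.3}{3} \approx 0.767

% Common prefix "DI"  =>  \ell = 2, with p = 0.1
d_w \approx 0.767 + 2 \cdot 0.1 \cdot (1 - 0.767) \approx 0.813
```

These are the values to compare x and y against. A difference on other inputs would not be surprising, since the code is only an approximation of the reference algorithm.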

Published: 2023-11-29 09:18:47
Article link: https://www.elefans.com/category/jswz/34/1645981.html