Fuzzy matching two strings

Updated: 2024-10-17 09:43:02
Problem description

I have two vectors, each of which includes a series of strings. For example,

V1 <- c("pen", "document folder", "warn")
V2 <- c("pens", "copy folder", "warning")

I need to find which pairs match best. I used the Levenshtein distance directly, but it is not good enough. In my case, "pen" and "pens" should mean the same thing, "document folder" and "copy folder" are probably the same thing, and "warn" and "warning" are effectively the same. I have been trying packages such as tm, but I am not sure which functions are suitable for this. Can anyone point me in the right direction?
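For reference, the Levenshtein baseline mentioned in the question can be reproduced with base R's adist(). This is only an illustrative sketch, not the asker's actual code; the normalization by the longer string's length is an assumption added here to make distances comparable across strings of different lengths.

```r
V1 <- c("pen", "document folder", "warn")
V2 <- c("pens", "copy folder", "warning")

# adist() (base R, package utils) computes the generalized Levenshtein distance;
# rows correspond to V1, columns to V2
lev <- adist(V1, V2)
dimnames(lev) <- list(V1, V2)
lev

# Assumed normalization (not from the question): divide each distance by the
# length of the longer of the two strings, so every value falls in [0, 1]
lev_norm <- lev / outer(nchar(V1), nchar(V2), pmax)
round(lev_norm, 2)
```

The raw distance between "warn" and "warning" (3) is larger than the one between "pen" and "pens" (1), even though the question treats both pairs as equivalent, which is one way to see why a plain edit distance feels unsatisfying here.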

Recommended answer

In my experience, the cosine distance from the stringdist package works well for this kind of job:

library(stringdist)

V1 <- c("pen", "document folder", "warn")
V2 <- c("copy folder", "warning", "pens")

# Cosine distance on single-character (q = 1) profiles;
# each column holds the distances from one V1 entry to every V2 entry
result <- sapply(V1, function(x) stringdist(x, V2, method = 'cosine', q = 1))
rownames(result) <- V2
result

                  pen document folder      warn
copy folder 0.6797437       0.2132042 0.8613250
warning     0.6150998       0.7817821 0.1666667
pens        0.1339746       0.6726732 0.7500000
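To make the cosine method less of a black box, here is a minimal base-R sketch of a cosine distance on q-gram counts. The helper names qgrams and cosine_qgram_dist are hypothetical, and this is not the stringdist implementation, only an illustration of the idea that happens to reproduce some of the values shown above.

```r
# Split a string into its overlapping q-grams (runs of q consecutive characters).
# Returns an empty vector if the string is shorter than q.
qgrams <- function(s, q) {
  n <- nchar(s)
  if (n < q) return(character(0))
  substring(s, 1:(n - q + 1), q:n)
}

# Cosine distance between the q-gram count vectors of two strings.
# Yields NaN if either string is shorter than q (no q-grams to compare).
cosine_qgram_dist <- function(a, b, q = 1) {
  ga <- qgrams(a, q)
  gb <- qgrams(b, q)
  keys <- union(ga, gb)
  # Count each q-gram over a shared set of keys so the vectors line up
  va <- as.numeric(table(factor(ga, levels = keys)))
  vb <- as.numeric(table(factor(gb, levels = keys)))
  1 - sum(va * vb) / (sqrt(sum(va^2)) * sqrt(sum(vb^2)))
}

cosine_qgram_dist("pen", "pens", q = 1)
cosine_qgram_dist("warn", "warning", q = 1)
```

With q = 1, "pen" has the counts p, e, n and "pens" has p, e, n, s, so the cosine similarity is 3 / (sqrt(3) * sqrt(4)) and the distance is about 0.134, in line with the pens/pen entry in the output above.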

You have to define a cutoff below which a distance counts as a match; the lower the distance, the better the match. You can also experiment with the q parameter, which sets the size of the q-grams (runs of q consecutive characters) that are compared. For example:

result <- sapply(V1, function(x) stringdist(x, V2, method = 'cosine', q = 3))
rownames(result) <- V2
result

                  pen document folder      warn
copy folder 1.0000000       0.5377498 1.0000000
warning     1.0000000       1.0000000 0.3675445
pens        0.2928932       1.0000000 1.0000000
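The cutoff idea can be sketched as follows. The helper match_with_cutoff and the cutoff value 0.4 are assumptions chosen for illustration, not part of the original answer; a suitable cutoff depends on your data. The sketch uses stringdistmatrix() from the same stringdist package, which returns one row per query string and one column per candidate.

```r
library(stringdist)

V1 <- c("pen", "document folder", "warn")
V2 <- c("copy folder", "warning", "pens")

# Hypothetical helper: for each query, return the closest candidate,
# or NA if even the closest candidate is farther away than `cutoff`.
match_with_cutoff <- function(x, candidates, q = 1, cutoff = 0.4) {
  # Rows correspond to x, columns to candidates
  d <- stringdistmatrix(x, candidates, method = "cosine", q = q)
  best <- apply(d, 1, which.min)              # column index of the closest candidate
  best_dist <- d[cbind(seq_along(x), best)]   # distance to that candidate
  data.frame(
    query = x,
    match = ifelse(best_dist <= cutoff, candidates[best], NA_character_),
    distance = best_dist,
    stringsAsFactors = FALSE
  )
}

match_with_cutoff(V1, V2, q = 1)
match_with_cutoff(V1, V2, q = 3)
```

With these inputs and the assumed cutoff of 0.4, q = 1 pairs all three entries, while q = 3 leaves "document folder" unmatched because its best distance (0.5377498, to "copy folder") exceeds the cutoff. A larger q is stricter about shared substrings, so the cutoff and q should be tuned together.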

Published: 2023-10-23 05:31:54
Article link: https://www.elefans.com/category/jswz/34/1519936.html