Problem description
I am trying to perform string matching in R using grep. I have to match df1$ColA to df2$ColA. The inputs and expected outputs are given below:
df1:
ColA
text1
text2
text3
text4
text5
text6
text7
df2:
ColA
text1 text2 text12
text23 text22 text7
Intermediate output:
ColA                 ColB
text1 text2 text12   text1, text2
text23 text22 text7  text7
Final output:
ColA                 ColB
text1 text2 text12   text1
text1 text2 text12   text2
text23 text22 text7  text7
Approach:
I am currently using:
test$test <- sapply(df2$ColA, function(x) ifelse(grep(paste(as.character(unlist(df1$ColA)),collapse="|"),x),1,0))
This tells me whether a df1$ColA string matches df2$ColA, but it does not return the matching strings. Please advise.
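For reference, a minimal sketch of the difference between flagging a match and extracting it with base R regular expressions. The data frames are rebuilt from the inputs listed above; the names pattern, has_match and matched are illustrative only, and the \\b word boundaries are an assumption about the intent (without them, "text1" would also match inside "text12").

```r
# rebuild the inputs shown in the question
df1 <- data.frame(ColA = paste0("text", 1:7), stringsAsFactors = FALSE)
df2 <- data.frame(ColA = c("text1 text2 text12", "text23 text22 text7"),
                  stringsAsFactors = FALSE)

# whole-word alternation: \\b keeps "text1" from matching inside "text12"
pattern <- paste0("\\b(", paste(df1$ColA, collapse = "|"), ")\\b")

# grepl() only reports whether each row contains at least one match
has_match <- grepl(pattern, df2$ColA, perl = TRUE)

# gregexpr() + regmatches() return the matched substrings, one list element per row
matched <- regmatches(df2$ColA, gregexpr(pattern, df2$ColA, perl = TRUE))
```

Here has_match is a logical vector with one entry per df2 row, while matched is a list holding the strings that were actually found in each row.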
Recommended answer
Here's a semi-vectorised solution based on match() that should be fast and produce exactly what you are looking for. The way to match the items in df1$ColA is to tokenise df2$ColA and match each of the tokens against df1$ColA. It then builds up a repetition of the entire (original) df2$ColA element, and adds the df1$ColA match as ColB in the output.
# set up the data, which the OP should have done
df1 <- data.frame(ColA = paste0("text", 1:7),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ColA = c("text1 text2 text12",
                           "text23 text22 text7"),
                  stringsAsFactors = FALSE)
# create a matrix of matches: one column per df2 row, one row per token,
# holding the index of the matching df1$ColA element (NA if there is none)
# (sapply() simplifies to a matrix here because every row has three tokens)
matmatrix <- sapply(strsplit(df2$ColA, " "), match, df1$ColA)
# repeat original text in same length as potential match
origdfColArep <- rep(df2$ColA, each = nrow(matmatrix))
# create the results dataset, first the matches of the second part
result <- data.frame(ColA = origdfColArep[!is.na(as.vector(matmatrix))],
                     stringsAsFactors = FALSE)
# then add the matching first part
result$ColB <- df1$ColA[na.omit(as.vector(matmatrix))]
result
##                  ColA  ColB
## 1  text1 text2 text12 text1
## 2  text1 text2 text12 text2
## 3 text23 text22 text7 text7
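The code above relies on sapply() returning a matrix, which happens here because both df2 rows split into three tokens. If the rows can hold different numbers of tokens, a list-based variant of the same tokenise-and-match idea may be safer. The following is only a sketch under that assumption: the three-row df2 is hypothetical test data, and tokens, hits and result2 are illustrative names, not part of the original answer.

```r
df1 <- data.frame(ColA = paste0("text", 1:7), stringsAsFactors = FALSE)
# hypothetical input: rows with different numbers of tokens
df2 <- data.frame(ColA = c("text1 text2 text12", "text23 text7", "text9"),
                  stringsAsFactors = FALSE)

# split each row into tokens; keeping a list lets rows differ in length
tokens <- strsplit(df2$ColA, " ", fixed = TRUE)

# keep only the tokens that occur in df1$ColA, row by row
hits <- lapply(tokens, function(tok) tok[tok %in% df1$ColA])

# repeat each original row once per hit and attach the hit as ColB;
# rows without any hit are repeated zero times and therefore dropped
result2 <- data.frame(ColA = rep(df2$ColA, times = lengths(hits)),
                      ColB = unlist(hits),
                      stringsAsFactors = FALSE)
result2
```

Using %in% on whole tokens keeps the exact-match behaviour of match(), so "text12" is still not treated as a hit for "text1".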