R: find duplicates in one column and collapse the values in a second column

Updated: 2024-10-12 03:24:03

Problem description

I have a data frame with two columns containing character strings. In one column (named probes) I have duplicated cases (that is, several cases with the same character string). For each case in probes I want to find all the cases containing the same string, and then merge the values of all the corresponding cases in the second column (named genes) into a single case. For example, if I have this structure:

      probes genes
1 cg00050873 TSPY4
2 cg00061679  DAZ1
3 cg00061679  DAZ4
4 cg00061679  DAZ4

I want to change it to this structure:

      probes          genes
1 cg00050873          TSPY4
2 cg00061679 DAZ1 DAZ4 DAZ4

Obviously there is no problem doing this for a single probe, using which and then paste with collapse:

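# rows whose probe ID matches the one of interest
ind <- which(olap$probes == "cg00061679")
# gene names (second column) for those rows
genename <- olap[ind, 2]
# join them into one space-separated string
genecomb <- paste(genename, collapse = " ")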

But I'm not sure how to extract the indices of the duplicates in the probes column across the whole data frame. Any ideas?

Thanks in advance.

Recommended answer

You can use tapply in base R:

data.frame(
  probes = unique(olap$probes),
  genes  = tapply(olap$genes, olap$probes, paste, collapse = " ")
)
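To see what tapply returns here, the following is a minimal sketch that rebuilds the four-row example from the question. The stringsAsFactors = FALSE argument and the variable name collapsed are assumptions made for illustration; they are not in the original post.

```r
# Example data from the question, rebuilt by hand (assumed to be character columns).
olap <- data.frame(
  probes = c("cg00050873", "cg00061679", "cg00061679", "cg00061679"),
  genes  = c("TSPY4", "DAZ1", "DAZ4", "DAZ4"),
  stringsAsFactors = FALSE
)

# Group genes by probe and paste each group into one space-separated string.
# "collapsed" is a hypothetical name used only in this sketch.
collapsed <- tapply(olap$genes, olap$probes, paste, collapse = " ")

print(collapsed)
print(names(collapsed))
```

The result is a named one-dimensional array with one element per distinct probe, and its names are the probe IDs in sorted order rather than in order of first appearance.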

Or use plyr:

library(plyr)
ddply(olap, "probes", summarize, genes = paste(genes, collapse = " "))
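If plyr is not available, base R's aggregate may be another way to get a similar result. This is a swapped-in alternative, not something the answer itself proposes. The sketch below runs on the question's example data and assumes both columns are plain character vectors.

```r
# Example data from the question, rebuilt by hand (assumed to be character columns).
olap <- data.frame(
  probes = c("cg00050873", "cg00061679", "cg00061679", "cg00061679"),
  genes  = c("TSPY4", "DAZ1", "DAZ4", "DAZ4"),
  stringsAsFactors = FALSE
)

# For each distinct probe, paste its genes together; the extra argument
# collapse = " " is passed through to paste.
# "collapsed_df" is a hypothetical name used only in this sketch.
collapsed_df <- aggregate(genes ~ probes, data = olap, FUN = paste, collapse = " ")

print(collapsed_df)
```

Like ddply, this returns a data frame that keeps each probe ID next to its collapsed genes, so the labels cannot drift out of step with the values.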

Update

It's probably safer in the first version to do this:

tmp <- tapply(olap$genes, olap$probes, paste, collapse = " ")
data.frame(probes = names(tmp), genes = tmp)

Just in case unique gives the probes in a different order from tapply. Personally, I would always use ddply.
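The ordering concern can be illustrated with a small sketch. It assumes a reordered three-row variant of the question's data in which the probes are not sorted; the names mislabeled and safe are hypothetical and used only here.

```r
# Unsorted variant of the example data (an assumption for illustration).
olap <- data.frame(
  probes = c("cg00061679", "cg00050873", "cg00061679"),
  genes  = c("DAZ1", "TSPY4", "DAZ4"),
  stringsAsFactors = FALSE
)

# unique() keeps the order of first appearance.
print(unique(olap$probes))

# tapply() groups by factor levels, which are sorted.
tmp <- tapply(olap$genes, olap$probes, paste, collapse = " ")
print(names(tmp))

# Pairing unique() with tapply() output attaches genes to the wrong probe here.
mislabeled <- data.frame(probes = unique(olap$probes), genes = tmp)

# Taking the labels from the tapply() result keeps probes and genes aligned.
safe <- data.frame(probes = names(tmp), genes = tmp)

print(mislabeled)
print(safe)
```

In this variant the first row of mislabeled pairs cg00061679 with TSPY4, while safe pairs each probe with its own genes.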

Published: 2023-10-31 09:20:16

Article link: https://www.elefans.com/category/jswz/34/1545709.html
