R: find duplicates in one column and collapse the second column

Updated: 2024-10-12 05:53:39

Problem description

I have a data frame with two columns containing character strings. In one column (named probes) I have duplicated cases (that is, several cases with the same character string). For each case in probes, I want to find all the cases containing the same string and then merge the values of all the corresponding cases in the second column (named genes) into a single case.

For example, if I have this structure:

      probes      genes
    1 cg00050873  TSPY4
    2 cg00061679  DAZ1
    3 cg00061679  DAZ4
    4 cg00061679  DAZ4

I want to change it to this structure:

      probes      genes
    1 cg00050873  TSPY4
    2 cg00061679  DAZ1 DAZ4 DAZ4

Obviously there is no problem doing this for a single probe using which, and then paste and collapse:

    ind <- which(olap$probes == "cg00061679")
    genename <- (olap[ind, 2])
    genecomb <- paste(genename[1:length(genename)], collapse = " ")

But I'm not sure how to extract the indices of the duplicates in the probes column across the whole data frame. Any ideas?

Thanks in advance
Solution

You can use tapply in base R:

    data.frame(probes = unique(olap$probes),
               genes = tapply(olap$genes, olap$probes, paste, collapse = " "))

or use plyr:

    library(plyr)
    ddply(olap, "probes", summarize, genes = paste(genes, collapse = " "))
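To try the tapply approach end to end, here is a minimal sketch that rebuilds the example data from the question. The data frame name olap comes from the question. The use of stringsAsFactors = FALSE and the aggregate() alternative are assumptions added for illustration and are not part of the original answer.

```r
# Rebuild the example data from the question (assumed to be character columns).
olap <- data.frame(
  probes = c("cg00050873", "cg00061679", "cg00061679", "cg00061679"),
  genes  = c("TSPY4", "DAZ1", "DAZ4", "DAZ4"),
  stringsAsFactors = FALSE
)

# tapply groups genes by probe and pastes each group into one string.
# The result is a named 1-d array: one element per distinct probe.
collapsed <- tapply(olap$genes, olap$probes, paste, collapse = " ")
print(collapsed)

# Base-R alternative (not in the original answer): aggregate with a formula
# returns a data frame directly, with one row per distinct probe.
result_agg <- aggregate(genes ~ probes, data = olap, FUN = paste, collapse = " ")
print(result_agg)
```

Both calls pass collapse = " " through to paste, so each group of genes ends up as a single space-separated string.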
UPDATE

It's probably safer in the first version to do this:

    tmp <- tapply(olap$genes, olap$probes, paste, collapse = " ")
    data.frame(probes = names(tmp), genes = tmp)

Just in case unique gives the probes in a different order from tapply (unique returns values in order of first appearance, whereas tapply orders its groups by factor level). Personally, I would always use ddply.
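The ordering concern can be seen with a small sketch. The data below is hypothetical (the probe and gene names are invented so that the probes do not appear in sorted order), and the duplicated() step at the end is an addition for the "indices of the duplicates" part of the question, not something from the original answer.

```r
# Hypothetical data in which probes do not appear in sorted order.
olap <- data.frame(
  probes = c("cg2", "cg1", "cg2"),
  genes  = c("B1", "A1", "B2"),
  stringsAsFactors = FALSE
)

tmp <- tapply(olap$genes, olap$probes, paste, collapse = " ")

# unique() keeps first-appearance order; tapply() orders by sorted factor levels.
unique(olap$probes)  # "cg2" "cg1"
names(tmp)           # "cg1" "cg2"

# Pairing unique() with the tapply() output mislabels the rows for this data.
unsafe <- data.frame(probes = unique(olap$probes), genes = as.vector(tmp),
                     stringsAsFactors = FALSE)

# Taking the labels from names(tmp) keeps each probe with its own genes.
safe <- data.frame(probes = names(tmp), genes = as.vector(tmp),
                   stringsAsFactors = FALSE)

# Row indices of every probe that occurs more than once (first and later
# occurrences alike), using duplicated() in both directions.
dup_idx <- which(duplicated(olap$probes) |
                 duplicated(olap$probes, fromLast = TRUE))
```

In the example from the question the probes already appear in sorted order, so both versions give the same result there. The difference only shows up when the order of first appearance and the sorted order disagree.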



Published: 2023-10-31 09:20:35

Article link: https://www.elefans.com/category/jswz/34/1545710.html

