Hello, I have two tab-delimited files, such as:

file1.txt
Clustername  Seqname1   Seqname2
Cluster1     Seq1(+)    SeqA
Cluster1     Seq2(-)    SeqA
Cluster1     Seq3(+)    SeqB
Cluster1     Seq300(+)  SeqB
Cluster1     Seq90(+)   SeqL
Cluster1     Seq90(+)   SeqO
Cluster1     Seq2(-)    SeqC
Cluster2     Seq8(-)    SeqY
Cluster2     Seq8(-)    SeqH
Cluster2     Seq8(-)    SeqP
Cluster2     Seq79(-)   SeqY
Cluster3     Seq10(+)   SeqK
Cluster3     Seq10(+)   SeqS
Cluster3     Seq10(+)   SeqT
Cluster4     Seq300(+)  SeqB

file2.txt
Clustername  Names
Cluster1     SeqA
Cluster1     Seq1(+)
Cluster1     SeqC
Cluster1     Seq2(-)
Cluster1     SeqO
Cluster1     Seq3(+)
Cluster1     Seq90(+)
Cluster1     SeqB
Cluster1     SeqG
Cluster2     Seq8(-)
Cluster2     SeqY
Cluster2     SeqH
Cluster3     Seq10(+)
Cluster3     SeqK
Cluster4     SeqB
Cluster4     Seq300(+)

As you can see in file2.txt, SeqL is not present in Cluster1, so I want to remove this line from file1.txt:

Cluster1     Seq90(+)   SeqL
Seq300(+) is not present in Cluster1 either, so I remove this line from file1.txt:

Cluster1     Seq300(+)  SeqB
The same goes for Cluster2: file2.txt lists neither SeqP nor Seq79(-) in Cluster2, so I remove these lines from file1.txt:

Cluster2     Seq8(-)    SeqP
Cluster2     Seq79(-)   SeqY
The same goes for Cluster3: SeqS and SeqT are not in Cluster3 in file2.txt, so I remove these two lines from file1.txt:

Cluster3     Seq10(+)   SeqS
Cluster3     Seq10(+)   SeqT

At the end I should get a file1.txt such as:
Clustername  Seqname1   Seqname2
Cluster1     Seq1(+)    SeqA
Cluster1     Seq2(-)    SeqA
Cluster1     Seq3(+)    SeqB
Cluster1     Seq90(+)   SeqO
Cluster1     Seq2(-)    SeqC
Cluster2     Seq8(-)    SeqY
Cluster2     Seq8(-)    SeqH
Cluster3     Seq10(+)   SeqK
Cluster4     Seq300(+)  SeqB

Recommended answer
Use DataFrame.merge twice, once per Seqname column, then DataFrame.reindex to get the original columns back:
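# Assumes df1 and df2 are DataFrames already loaded from file1.txt and file2.txt.
new_df = (
    # Keep rows whose (Clustername, Seqname1) pair is listed in df2.
    df1.merge(df2, left_on=['Clustername', 'Seqname1'], right_on=['Clustername', 'Names'])
    # Of those, keep rows whose (Clustername, Seqname2) pair is also listed.
    .merge(df2, left_on=['Clustername', 'Seqname2'], right_on=['Clustername', 'Names'])
    # Drop the helper Names columns added by the merges.
    .reindex(columns=df1.columns)
)
print(new_df)

Output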
   Clustername   Seqname1  Seqname2
0     Cluster1    Seq1(+)      SeqA
1     Cluster1    Seq2(-)      SeqA
2     Cluster1    Seq2(-)      SeqC
3     Cluster1    Seq3(+)      SeqB
4     Cluster1   Seq90(+)      SeqO
5     Cluster2    Seq8(-)      SeqY
6     Cluster2    Seq8(-)      SeqH
7     Cluster3   Seq10(+)      SeqK
8     Cluster4  Seq300(+)      SeqB
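The merge-based snippet takes df1 and df2 as given, and its output lists the rows in a different order from the one asked for in the question. As a minimal sketch (not the answer's method), the same rule can also be written as a membership test that leaves the rows of file1.txt in their original order. The inline strings stand in for the two files, and the names allowed, in_file2 and filtered are made up for this example. For real files, pd.read_csv("file1.txt", sep="\t") should do, assuming they are tab-delimited.

```python
import io

import pandas as pd

# Inline copies of the question's files (whitespace-separated here for
# readability; the real files are assumed to be tab-delimited).
FILE1 = """\
Clustername Seqname1 Seqname2
Cluster1 Seq1(+) SeqA
Cluster1 Seq2(-) SeqA
Cluster1 Seq3(+) SeqB
Cluster1 Seq300(+) SeqB
Cluster1 Seq90(+) SeqL
Cluster1 Seq90(+) SeqO
Cluster1 Seq2(-) SeqC
Cluster2 Seq8(-) SeqY
Cluster2 Seq8(-) SeqH
Cluster2 Seq8(-) SeqP
Cluster2 Seq79(-) SeqY
Cluster3 Seq10(+) SeqK
Cluster3 Seq10(+) SeqS
Cluster3 Seq10(+) SeqT
Cluster4 Seq300(+) SeqB
"""
FILE2 = """\
Clustername Names
Cluster1 SeqA
Cluster1 Seq1(+)
Cluster1 SeqC
Cluster1 Seq2(-)
Cluster1 SeqO
Cluster1 Seq3(+)
Cluster1 Seq90(+)
Cluster1 SeqB
Cluster1 SeqG
Cluster2 Seq8(-)
Cluster2 SeqY
Cluster2 SeqH
Cluster3 Seq10(+)
Cluster3 SeqK
Cluster4 SeqB
Cluster4 Seq300(+)
"""

df1 = pd.read_csv(io.StringIO(FILE1), sep=r"\s+")
df2 = pd.read_csv(io.StringIO(FILE2), sep=r"\s+")

# Every (cluster, name) pair that file2.txt lists.
allowed = set(zip(df2["Clustername"], df2["Names"]))


def in_file2(col):
    """Boolean Series: is (Clustername, df1[col]) listed in file2.txt?"""
    flags = [pair in allowed for pair in zip(df1["Clustername"], df1[col])]
    return pd.Series(flags, index=df1.index)


# A row survives only if both of its sequence names are listed for its cluster.
filtered = df1[in_file2("Seqname1") & in_file2("Seqname2")].reset_index(drop=True)
print(filtered)
```

Because this only builds a boolean mask, duplicate lines in file2.txt cannot multiply rows the way a merge would. If the result needs to go back to disk, filtered.to_csv with sep="\t" and index=False is one option.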
Solution for n Seqname columns:
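# Row position within each cluster, used to put the rows back together later.
df1['aux'] = df1.groupby('Clustername').cumcount()
new_df = (
    # Long format: one row per (Clustername, aux, Seq column, value).
    df1.melt(['Clustername', 'aux'], var_name='Seq')
    # Keep only the values that df2 lists for the same cluster.
    .merge(df2, left_on=['Clustername', 'value'], right_on=['Clustername', 'Names'])
    .groupby(['Clustername', 'aux'])
    # Keep an original row only if all of its Seqname values survived
    # (df1 has two non-Seqname columns: Clustername and aux).
    .filter(lambda x: x.value.size >= (len(df1.columns) - 2))
    # Back to wide format, one column per original Seqname column.
    .pivot_table(index=['Clustername', 'aux'], columns='Seq', values='value', aggfunc=''.join)
    .reset_index()
    .drop('aux', axis=1)
    .rename_axis(columns=None)
)
print(new_df)

Output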
   Clustername   Seqname1  Seqname2
0     Cluster1    Seq1(+)      SeqA
1     Cluster1    Seq2(-)      SeqA
2     Cluster1    Seq3(+)      SeqB
3     Cluster1   Seq90(+)      SeqO
4     Cluster1    Seq2(-)      SeqC
5     Cluster2    Seq8(-)      SeqY
6     Cluster2    Seq8(-)      SeqH
7     Cluster3   Seq10(+)      SeqK
8     Cluster4  Seq300(+)      SeqB
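For comparison, here is a rough sketch of the same n-column rule without pandas, using only the standard library's csv module. It is an alternative under stated assumptions, not part of the answer above: the files are assumed to be tab-delimited with one header line, the helper names filter_rows and as_tsv are made up for this example, and the Seqname3 column in the demo data is hypothetical.

```python
import csv
import io


def filter_rows(file1, file2):
    """Return the header plus every file1 row whose sequence names are all
    listed for the same cluster in file2 (both are open text files)."""
    reader2 = csv.reader(file2, delimiter="\t")
    next(reader2, None)  # skip the "Clustername Names" header
    allowed = {(row[0], row[1]) for row in reader2 if len(row) >= 2}

    reader1 = csv.reader(file1, delimiter="\t")
    header = next(reader1)
    kept = [header]
    for row in reader1:
        if not row:
            continue  # ignore blank lines
        cluster, names = row[0], row[1:]
        # Works for any number of Seqname columns after Clustername.
        if all((cluster, name) in allowed for name in names):
            kept.append(row)
    return kept


def as_tsv(rows):
    """Build an in-memory tab-delimited file from a list of rows (demo only)."""
    return io.StringIO("\n".join("\t".join(r) for r in rows) + "\n")


# Small demo with a hypothetical third sequence column.
demo_file1 = as_tsv([
    ["Clustername", "Seqname1", "Seqname2", "Seqname3"],
    ["Cluster1", "Seq1(+)", "SeqA", "SeqC"],
    ["Cluster1", "Seq90(+)", "SeqL", "SeqA"],   # SeqL is not in Cluster1
    ["Cluster2", "Seq8(-)", "SeqY", "SeqH"],
    ["Cluster2", "Seq79(-)", "SeqY", "SeqH"],   # Seq79(-) is not in Cluster2
])
demo_file2 = as_tsv([
    ["Clustername", "Names"],
    ["Cluster1", "Seq1(+)"],
    ["Cluster1", "SeqA"],
    ["Cluster1", "SeqC"],
    ["Cluster1", "Seq90(+)"],
    ["Cluster2", "Seq8(-)"],
    ["Cluster2", "SeqY"],
    ["Cluster2", "SeqH"],
])

result = filter_rows(demo_file1, demo_file2)
for row in result:
    print("\t".join(row))
```

With real files, the two arguments would be handles from open("file1.txt", newline="") and open("file2.txt", newline=""), and the kept rows could be written back out with csv.writer using the same delimiter.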