Aggregating all pairwise row combinations of a DataFrame with pandas

Updated: 2024-10-08 19:41:29

Problem description

I use pandas in Python to group and aggregate across data frames, but I would now like to aggregate specific pairs of rows (n choose 2, i.e. all combinations). Here is some example data; I would like to look at all pairs of the genes listed in mygenes:

    import pandas
    import itertools

    mygenes = ['ABC1', 'ABC2', 'ABC3', 'ABC4']

    df = pandas.DataFrame({'Gene': ['ABC1', 'ABC2', 'ABC3', 'ABC4', 'ABC5'],
                           'case1': [0, 1, 1, 0, 0],
                           'case2': [1, 1, 1, 0, 1],
                           'control1': [0, 0, 1, 1, 1],
                           'control2': [1, 0, 0, 1, 0]})

    >>> df
       Gene  case1  case2  control1  control2
    0  ABC1      0      1         0         1
    1  ABC2      1      1         0         0
    2  ABC3      1      1         1         0
    3  ABC4      0      0         1         1
    4  ABC5      0      1         1         0

The final result should look like this (applying np.sum by default is fine):

                    case1  case2  control1  control2
    'ABC1', 'ABC2'      1      2         0         1
    'ABC1', 'ABC3'      1      2         1         1
    'ABC1', 'ABC4'      0      1         1         2
    'ABC2', 'ABC3'      2      2         1         0
    'ABC2', 'ABC4'      1      1         1         1
    'ABC3', 'ABC4'      1      1         2         1

The set of gene pairs is easy to obtain with itertools (itertools.combinations(mygenes, 2)), but I can't figure out how to aggregate specific rows based on their values. Can anyone advise? Thank you.
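For reference, here is a minimal sketch of enumerating the gene pairs with itertools.combinations, using the mygenes list from the question. The variable name pairs is only illustrative.

```python
from itertools import combinations

mygenes = ['ABC1', 'ABC2', 'ABC3', 'ABC4']

# All unordered pairs ("n choose 2"), in the order the genes are listed.
pairs = list(combinations(mygenes, 2))

for pair in pairs:
    print(pair)
```

With four genes this yields 4 choose 2 = 6 pairs, each as a tuple of two gene names.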

Recommended answer

I can't think of a clever vectorized way to do this, but unless performance is a real bottleneck I tend to use the simplest thing that makes sense. In this case, I might call set_index("Gene") and then use loc to pick out the rows:

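    >>> import pandas as pd
    >>> from itertools import combinations
    >>> df = df.set_index("Gene")
    >>> cc = list(combinations(mygenes, 2))
    >>> # list(c) selects both rows of the pair by label; sum() adds them column-wise
    >>> out = pd.DataFrame([df.loc[list(c), :].sum() for c in cc], index=cc)
    >>> out
                  case1  case2  control1  control2
    (ABC1, ABC2)      1      2         0         1
    (ABC1, ABC3)      1      2         1         1
    (ABC1, ABC4)      0      1         1         2
    (ABC2, ABC3)      2      2         1         0
    (ABC2, ABC4)      1      1         1         1
    (ABC3, ABC4)      1      1         2         1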
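The same idea can be wrapped in a small reusable function. The sketch below is an illustration rather than part of the original answer: the helper names pairwise_aggregate and pairwise_sum_vectorized, the func parameter, and the choice to join each pair into a plain string label are all assumptions. The second helper is a possible vectorized variant for the sum case only, built on NumPy index arrays. The answer itself does not propose it.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def pairwise_aggregate(df, labels, func="sum"):
    """Aggregate every pair of rows with a loop (hypothetical helper)."""
    pairs = list(combinations(labels, 2))
    # Select the two rows of each pair by label and reduce them column-wise.
    rows = [df.loc[list(pair)].agg(func) for pair in pairs]
    out = pd.DataFrame(rows)
    # Assumption: a flat string label per pair, e.g. "ABC1, ABC2".
    out.index = [", ".join(pair) for pair in pairs]
    return out


def pairwise_sum_vectorized(df, labels):
    """Sum every pair of rows without a Python loop over pairs (sum only)."""
    sub = df.loc[list(labels)]
    # Row positions (i, j) with i < j, in the same order as combinations().
    i, j = np.triu_indices(len(sub), k=1)
    values = sub.to_numpy()
    out = pd.DataFrame(values[i] + values[j], columns=sub.columns)
    out.index = [f"{a}, {b}" for a, b in zip(sub.index[i], sub.index[j])]
    return out


# Example data from the question, indexed by gene name.
mygenes = ['ABC1', 'ABC2', 'ABC3', 'ABC4']
df = pd.DataFrame({'Gene': ['ABC1', 'ABC2', 'ABC3', 'ABC4', 'ABC5'],
                   'case1': [0, 1, 1, 0, 0],
                   'case2': [1, 1, 1, 0, 1],
                   'control1': [0, 0, 1, 1, 1],
                   'control2': [1, 0, 0, 1, 0]}).set_index('Gene')

out = pairwise_aggregate(df, mygenes)
vec = pairwise_sum_vectorized(df, mygenes)
print(out)
```

The loop version accepts any reduction that DataFrame.agg understands (for example "max"), while the vectorized one trades that flexibility for speed on large inputs.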


Published: 2023-11-22 08:04:17

Article link: https://www.elefans.com/category/jswz/34/1616607.html