pandas concat/merge and sum one column

Updated: 2024-10-28 05:13:08

Question

I have two pandas.DataFrame objects with MultiIndex indices. Some of the index values are shared between the two dataframes, but not all. I would like to merge the two and, for rows (index values) present in both, take the sum of one of the columns. Otherwise, the row and its column values should be kept as they are.

An existing question comes close, but it does not use a MultiIndex.

I tried to create an example:

    import numpy as np
    import pandas as pd

    def mklbl(prefix, n):
        # n is either a count (labels 0..n-1) or an iterable of explicit suffixes
        try:
            return ["%s%s" % (prefix, i) for i in range(n)]
        except TypeError:
            return ["%s%s" % (prefix, i) for i in n]

    mi1 = pd.MultiIndex.from_product([mklbl('A', 4), mklbl('C', 2)])
    mi2 = pd.MultiIndex.from_product([mklbl('A', [2, 3, 4]), mklbl('C', 2)])

    df2 = pd.DataFrame({'b': np.arange(len(mi2)),
                        'c': np.arange(len(mi2))[::-1]},
                       index=mi2).sort_index().sort_index(axis=1)
    df1 = pd.DataFrame({'a': np.arange(len(mi1)),
                        'b': np.arange(len(mi1))[::-1]},
                       index=mi1).sort_index().sort_index(axis=1)

The individual DataFrame objects look like:

    In [117]: df1
    Out[117]:
           a  b
    A0 C0  0  7
       C1  1  6
    A1 C0  2  5
       C1  3  4
    A2 C0  4  3
       C1  5  2
    A3 C0  6  1
       C1  7  0

and

    In [118]: df2
    Out[118]:
           b  c
    A2 C0  0  5
       C1  1  4
    A3 C0  2  3
       C1  3  2
    A4 C0  4  1
       C1  5  0

What I want to do is merge these two, and sum the 'b' column, but keep all rows whether they exist in one or the other dataframe:

    In [117]: df_merged_bsummed
    Out[117]:
             a  b    c
    A0 C0    0  7  NaN
       C1    1  6  NaN
    A1 C0    2  5  NaN
       C1    3  4  NaN
    A2 C0    4  3    5
       C1    5  3    4
    A3 C0    6  3    3
       C1    7  3    2
    A4 C0  NaN  4    1
       C1  NaN  5    0

Recommended answer

In this particular case, I think you could just add them and use fill_value=0, relying on the default alignment behaviour:

    >>> df1.add(df2, fill_value=0)
             a  b    c
    A0 C0    0  7  NaN
       C1    1  6  NaN
    A1 C0    2  5  NaN
       C1    3  4  NaN
    A2 C0    4  3    5
       C1    5  3    4
    A3 C0    6  3    3
       C1    7  3    2
    A4 C0  NaN  4    1
       C1  NaN  5    0
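To see why the alignment gives the requested result, here is a minimal self-contained sketch. It rebuilds frames equivalent to the question's df1 and df2 with literal labels (the names idx1, idx2 and summed are only illustrative) and spot-checks a few cells. Cells that are missing on both sides after alignment stay NaN, which is why column a is NaN for the A4 rows.

```python
import numpy as np
import pandas as pd

# Frames equivalent to the question's df1 and df2, built with literal labels.
idx1 = pd.MultiIndex.from_product([["A0", "A1", "A2", "A3"], ["C0", "C1"]])
idx2 = pd.MultiIndex.from_product([["A2", "A3", "A4"], ["C0", "C1"]])

df1 = pd.DataFrame({"a": np.arange(8), "b": np.arange(8)[::-1]}, index=idx1)
df2 = pd.DataFrame({"b": np.arange(6), "c": np.arange(6)[::-1]}, index=idx2)

# add() aligns on both the row index and the columns (outer join).
# fill_value=0 stands in for a value missing on one side only;
# cells missing on both sides remain NaN.
summed = df1.add(df2, fill_value=0).sort_index()

print(summed)
print(summed.loc[("A2", "C1"), "b"])  # shared row: 2 from df1 plus 1 from df2
print(summed.loc[("A0", "C0"), "c"])  # column c has no data for A0, stays NaN
```

One thing to keep in mind: fill_value also replaces a genuine NaN in one frame when the other frame has a value in that cell, so this shortcut assumes the inputs have no meaningful missing values in the overlapping column.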

There being only one column in common, only one is summed, but if you wanted to make that explicit you could instead do something like

    >>> m = pd.concat([df1, df2], axis=1)
    >>> m["b"] = m.pop("b").sum(axis=1)
    >>> m
             a    c  b
    A0 C0    0  NaN  7
       C1    1  NaN  6
    A1 C0    2  NaN  5
       C1    3  NaN  4
    A2 C0    4    5  3
       C1    5    4  3
    A3 C0    6    3  3
       C1    7    2  3
    A4 C0  NaN    1  4
       C1  NaN    0  5
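If more than one column is shared and only some of them should be summed, another way to make the choice explicit is to name the columns. The sketch below is not the approach from the answer: it swaps in combine_first to build the union of rows and columns, then overwrites only the named columns with an aligned Series.add. The helper name merge_and_sum and its sum_cols parameter are hypothetical, introduced here for illustration.

```python
import numpy as np
import pandas as pd

def merge_and_sum(left, right, sum_cols):
    """Hypothetical helper: union of rows/columns, summing only sum_cols."""
    # combine_first keeps left's values and fills the gaps from right.
    out = left.combine_first(right).sort_index()
    for col in sum_cols:
        # Series.add aligns on the (Multi)Index; fill_value=0 covers rows
        # that exist in only one of the two frames.
        out[col] = left[col].add(right[col], fill_value=0)
    return out

# Frames equivalent to the question's df1 and df2.
idx1 = pd.MultiIndex.from_product([["A0", "A1", "A2", "A3"], ["C0", "C1"]])
idx2 = pd.MultiIndex.from_product([["A2", "A3", "A4"], ["C0", "C1"]])
df1 = pd.DataFrame({"a": np.arange(8), "b": np.arange(8)[::-1]}, index=idx1)
df2 = pd.DataFrame({"b": np.arange(6), "c": np.arange(6)[::-1]}, index=idx2)

merged = merge_and_sum(df1, df2, sum_cols=["b"])
print(merged)
```

Any shared column that is not listed in sum_cols keeps the value from the left frame where one exists, which may or may not be what you want.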

Published: 2023-10-14 19:03:03

Article link: https://www.elefans.com/category/jswz/34/1491929.html