pandas: merge on aggregated columns

Updated: 2024-10-25 08:15:56

Problem description

Let's say I create a DataFrame:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 13, 15],
                   "b": [4, 5, 6, 6, 6],
                   "c": ["wish", "you", "were", "here", "here"]})

Like this:

    a  b     c
0   1  4  wish
1   2  5   you
2   3  6  were
3  13  6  here
4  15  6  here

... and then group by a couple of columns and aggregate ...

gb = df.groupby(['b','c']).agg({"a": lambda x: x.nunique()})

This produces the following:

        a
b c
4 wish  1
5 you   1
6 here  2
  were  1

Is it possible to merge df with the newly aggregated table gb such that I create a new column in df, containing the corresponding values from gb? Like this:

    a  b     c  nc
0   1  4  wish   1
1   2  5   you   1
2   3  6  were   1
3  13  6  here   2
4  15  6  here   2

I tried doing the simplest thing:

df.merge(gb, on=['b','c'])

But this results in an error:

KeyError: 'b'

This makes sense, because the grouped table has a MultiIndex and b is not a column. So my question is two-fold:
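A minimal sketch that rebuilds the frames from the question and inspects gb, to confirm where b and c actually end up: the group keys become index levels, and the only column left is a. (Hedged note: since pandas 0.23, the on argument of merge also accepts index level names, so on a recent version the merge call above may not raise at all; the KeyError reflects older behaviour.)

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({"a": [1, 2, 3, 13, 15],
                   "b": [4, 5, 6, 6, 6],
                   "c": ["wish", "you", "were", "here", "here"]})
gb = df.groupby(["b", "c"]).agg({"a": lambda x: x.nunique()})

# The group keys were moved into a MultiIndex, so they are not columns of gb
print(list(gb.index.names))  # ['b', 'c']
print(list(gb.columns))      # ['a']
```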

Can I turn the MultiIndex of the gb DataFrame back into columns (so that it has b and c columns)?

Can I merge df with gb on the column names?

Recommended answer

Whenever you want to add an aggregated column from a groupby operation back to the df, use transform: it produces a Series whose index is aligned with your original df:

In [4]:
df['nc'] = df.groupby(['b','c'])['a'].transform(pd.Series.nunique)
df
Out[4]:
    a  b     c  nc
0   1  4  wish   1
1   2  5   you   1
2   3  6  were   1
3  13  6  here   2
4  15  6  here   2
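To make the "aligned index" claim concrete, here is a small sketch (an illustration, not part of the original answer) that passes the string alias "nunique" to transform instead of pd.Series.nunique, and checks that the result has exactly one value per original row, indexed like df:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 13, 15],
                   "b": [4, 5, 6, 6, 6],
                   "c": ["wish", "you", "were", "here", "here"]})

# transform broadcasts each group's scalar result back to every row of that group
nc = df.groupby(["b", "c"])["a"].transform("nunique")

# The result shares df's index, so plain column assignment lines up row by row
assert nc.index.equals(df.index)
df["nc"] = nc
print(df["nc"].tolist())  # [1, 1, 1, 2, 2]
```

By contrast, agg returns one row per group (four rows here), which is why it needs a merge to get back to the original five rows.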

There is no need to reset the index or perform an additional merge.
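For completeness, the two-step route the question asked about also works. A minimal sketch: reset_index moves the MultiIndex levels b and c back into ordinary columns, the aggregated column is renamed to nc (an assumed name, matching the desired output) so it does not clash with df["a"], and how="left" is a choice made here so that every row of df is kept in its original order.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 13, 15],
                   "b": [4, 5, 6, 6, 6],
                   "c": ["wish", "you", "were", "here", "here"]})
gb = df.groupby(["b", "c"]).agg({"a": lambda x: x.nunique()})

# Question 1: turn the index levels b and c back into columns
gb_flat = gb.reset_index().rename(columns={"a": "nc"})

# Question 2: merge on the now-ordinary columns b and c
merged = df.merge(gb_flat, on=["b", "c"], how="left")
print(merged["nc"].tolist())  # [1, 1, 1, 2, 2]

# Same values as the transform approach
via_transform = df.groupby(["b", "c"])["a"].transform("nunique")
print((merged["nc"] == via_transform).all())  # True
```

The merge builds an intermediate table and needs a rename; transform gets the same column in one step.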

Published: 2023-10-17 10:25:18

Article link: https://www.elefans.com/category/jswz/34/1500640.html