Pandas groupby with a user-defined function

Updated: 2024-10-26 04:29:58

Problem description

I understand that passing a function as a group key calls the function once per index value with the return values being used as the group names. What I can't figure out is how to call the function on column values.

So I can do this:

import numpy as np
import pandas as pd

people = pd.DataFrame(np.random.randn(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])

# Called once per index label; the return value becomes the group name
def GroupFunc(x):
    if len(x) > 3:
        return 'Group1'
    else:
        return 'Group2'

people.groupby(GroupFunc).sum()

This splits the data into two groups: one whose index values have length 3 or less, and another whose index values are longer than 3. But how can I pass one of the column values instead? For example, grouping on whether the value in column a is greater than 1 for each index entry. I realise I could just do the following:

people.groupby(people.a > 1).sum()

But I want to know how to do this in a user defined function for future reference.

Something like:

def GroupColFunc(x):
    if x > 1:
        return 'Group1'
    else:
        return 'Group2'

But how do I call this? I tried

people.groupby(GroupColFunc(people.a))

and similar variants but this does not work.

How do I pass the column values to the function? And how would I pass multiple column values, e.g. to group on whether people.a > people.b?

Recommended answer

To group by a > 1, you can define your function like:

>>> def GroupColFunc(df, ind, col):
...     if df[col].loc[ind] > 1:
...         return 'Group1'
...     else:
...         return 'Group2'
...

Then call it like this:

>>> people.groupby(lambda x: GroupColFunc(people, x, 'a')).sum()
               a         b         c         d        e
Group2 -2.384614 -0.762208  3.359299 -1.574938 -2.65963

Or you can do it with just an anonymous function:

>>> people.groupby(lambda x: 'Group1' if people['b'].loc[x] > people['a'].loc[x] else 'Group2').sum()
               a         b         c         d         e
Group1 -3.280319 -0.007196  1.525356  0.324154 -1.002439
Group2  0.895705 -0.755012  1.833943 -1.899092 -1.657191
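The lambda above looks up people by label on every call. As a sketch of another route (not from the original answer), the one-argument GroupColFunc from the question can be applied to the column values first with Series.map, or a function can be applied to whole rows with DataFrame.apply(axis=1), and the resulting Series of labels passed to groupby. Calling GroupColFunc(people.a) directly fails because x > 1 is then a whole boolean Series, which an if statement cannot evaluate. The data values and the name GroupRowFunc below are made up for illustration.

```python
import pandas as pd

# Hypothetical fixed values so the grouping is reproducible
people = pd.DataFrame(
    {'a': [1.5, 0.2, -0.3, 2.0, 0.1],
     'b': [0.5, 0.4, 1.0, 3.0, -1.0]},
    index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])

def GroupColFunc(x):
    # x is a single column value, not a whole Series
    if x > 1:
        return 'Group1'
    else:
        return 'Group2'

def GroupRowFunc(row):
    # Hypothetical helper: row is one row of the frame
    return 'Group1' if row['b'] > row['a'] else 'Group2'

# One group label per row, indexed like people
by_a = people['a'].map(GroupColFunc)
by_row = people.apply(GroupRowFunc, axis=1)

sums_by_a = people.groupby(by_a).sum()
sums_by_row = people.groupby(by_row).sum()
```

Because by_a and by_row share the index of people, groupby aligns them with the rows by label.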

As the documentation says, you can also group by passing a Series that provides a label -> group name mapping:

>>> mapping = np.where(people['b'] > people['a'], 'Group1', 'Group2')
>>> mapping
Joe       Group2
Steve     Group1
Wes       Group2
Jim       Group1
Travis    Group1
dtype: string48
>>> people.groupby(mapping).sum()
               a         b         c         d         e
Group1 -3.280319 -0.007196  1.525356  0.324154 -1.002439
Group2  0.895705 -0.755012  1.833943 -1.899092 -1.657191
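In recent pandas versions, np.where applied to a Series returns, as far as I can tell, a plain NumPy array without the index, so the printed mapping may look different from the output above. A minimal sketch that keeps the label -> group name mapping explicit by wrapping the result in a Series; the data values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values so the grouping is reproducible
people = pd.DataFrame(
    {'a': [1.5, 0.2, -0.3, 2.0, 0.1],
     'b': [0.5, 0.4, 1.0, 3.0, -1.0]},
    index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])

# One label per row, in row order
labels = np.where(people['b'] > people['a'], 'Group1', 'Group2')

# Attach the row labels so the mapping is label -> group name
mapping = pd.Series(labels, index=people.index)

result = people.groupby(mapping).sum()
```

Passing the bare array to groupby should also work when its length matches the number of rows, but the Series form does not depend on row order.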

Published: 2023-10-21 17:34:40

Article link: https://www.elefans.com/category/jswz/34/1514895.html