Summarizing Dataframes with ambiguous columns with apply function

Updated: 2024-10-22 18:33:50

I have a dataframe and a for loop driven by a dictionary that defines how to summarize specific columns, taken from my previous question: Pandas Generating dataframe based on columns being present

import pandas as pd

df = pd.DataFrame({
    'Players': ['Sam', 'Greg', 'Steve', 'Sam', 'Greg', 'Steve', 'Greg', 'Steve', 'Greg', 'Steve'],
    'Wins': [10, 5, 5, 20, 30, 20, 6, 9, 3, 10],
    'Losses': [5, 5, 5, 2, 3, 2, 16, 20, 3, 12],
    'Type': ['A', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'B'],
})
p = df.groupby('Players')

sumdict = {
    'Total Games': (None, 'count'),
    'Average Wins': ('Wins', 'mean'),
    'Greatest Wins': ('Wins', 'max'),
    'Unique games': ('Type', 'nunique'),
    'Max Score': ('Score', 'max'),
}

summary = []
for key, (column, op) in sumdict.items():
    if column is None:
        res = p.agg(op).max(axis=1)
    elif column not in df:
        continue
    else:
        res = p[column].agg(lambda x: getattr(x, op)())
    summary.append(pd.DataFrame({key: res}))
summary = pd.concat(summary, axis=1)

The code works for almost all cases except for apply functions that count specific cases inside a column:

streak = pd.DataFrame({'Streak':p.Wins.apply(lambda x: (x > 5).sum())})

Is there a way to incorporate the apply function into the dictionary sumdict?

Best answer

You have a couple of options here.

1. Check for a function and use that rather than getattr.
2. Just use the string and let the function fall through...

IMO option 2 is a little cleaner. It relies on the (perhaps lesser-known) fact that g.agg("max") is an alias for g.max(), so agg accepts either a string or a function and the getattr wrapper is not needed.
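As a quick check of that alias, here is a minimal sketch on a small made-up frame (the data and the variable names by_name, by_method and by_func are only for illustration, not from the question). It compares the string form with the method call, and passes a callable through the same agg call.

```python
import pandas as pd

# Small hypothetical frame, only used to compare the spellings.
df = pd.DataFrame({
    'Players': ['Sam', 'Greg', 'Sam', 'Greg'],
    'Wins': [10, 5, 20, 30],
})
g = df.groupby('Players')['Wins']

by_name = g.agg('max')                     # aggregation named by a string
by_method = g.max()                        # the equivalent method call
by_func = g.agg(lambda x: (x > 5).sum())   # a callable goes through agg as well

print(by_name.equals(by_method))
print(by_func.to_dict())
```

Because agg takes both forms, the dictionary values can mix strings and functions freely.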

sumdict["Streak"] = "Wins", lambda x: (x > 5).sum()

Then run the loop as follows; the commented line is the only change:

summary = []
for key, (column, op) in sumdict.items():
    if column is None:
        res = p.agg(op).max(axis=1)
    elif column not in df:
        continue
    else:
        res = p[column].agg(op)  # just use the string (or it could be a func)
    summary.append(pd.DataFrame({key: res}))
summary = pd.concat(summary, axis=1)

Then Streak works perfectly:

In [23]: summary
Out[23]:
         Greatest Wins  Total Games  Streak  Average Wins  Unique games
Players
Greg                30            4       2            11             2
Sam                 20            2       2            15             2
Steve               20            4       3            11             2
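For comparison, the first option (checking for a function and using it instead of getattr) might look roughly like the sketch below. The helper name make_aggregator is hypothetical, and the dictionary is trimmed to three entries to keep the example short. The column=None case from the question is left out.

```python
import pandas as pd

df = pd.DataFrame({
    'Players': ['Sam', 'Greg', 'Steve', 'Sam', 'Greg', 'Steve', 'Greg', 'Steve', 'Greg', 'Steve'],
    'Wins': [10, 5, 5, 20, 30, 20, 6, 9, 3, 10],
    'Type': ['A', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'B'],
})
p = df.groupby('Players')

sumdict = {
    'Average Wins': ('Wins', 'mean'),
    'Streak': ('Wins', lambda x: (x > 5).sum()),
    'Max Score': ('Score', 'max'),  # skipped below: df has no 'Score' column
}

def make_aggregator(op):
    """Hypothetical helper: return op if it is callable, else wrap the method name."""
    if callable(op):
        return op
    return lambda x: getattr(x, op)()

summary = []
for key, (column, op) in sumdict.items():
    if column not in df:
        continue
    res = p[column].agg(make_aggregator(op))
    summary.append(pd.DataFrame({key: res}))
summary = pd.concat(summary, axis=1)
print(summary)
```

This keeps the original getattr path for strings, at the cost of an extra helper, which is why passing op straight to agg is the shorter route.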

Published: 2023-07-04 11:04:00
Source: https://www.elefans.com/category/jswz/34/1023581.html