I have a DataFrame and a for loop with a dictionary that defines how to handle specific column names, from my previous question: Pandas Generating dataframe based on columns being present
    import pandas as pd

    df = pd.DataFrame({'Players': ['Sam', 'Greg', 'Steve', 'Sam', 'Greg', 'Steve', 'Greg', 'Steve', 'Greg', 'Steve'],
                       'Wins': [10, 5, 5, 20, 30, 20, 6, 9, 3, 10],
                       'Losses': [5, 5, 5, 2, 3, 2, 16, 20, 3, 12],
                       'Type': ['A', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'B'],
                       })
    p = df.groupby('Players')

    sumdict = {'Total Games': (None, 'count'),
               'Average Wins': ('Wins', 'mean'),
               'Greatest Wins': ('Wins', 'max'),
               'Unique games': ('Type', 'nunique'),
               'Max Score': ('Score', 'max')}

    summary = []
    for key, (column, op) in sumdict.items():
        if column is None:
            res = p.agg(op).max(axis=1)
        elif column not in df:
            continue
        else:
            res = p[column].agg(lambda x: getattr(x, op)())
        summary.append(pd.DataFrame({key: res}))
    summary = pd.concat(summary, axis=1)

The code works for almost all cases except for apply functions that count specific cases inside a column:
    streak = pd.DataFrame({'Streak': p.Wins.apply(lambda x: (x > 5).sum())})

Is there a way to incorporate the apply function into the dictionary sumdict?
Best answer
You have a couple of options here.
1. Check for a function and use it rather than getattr.
2. Just use the string and let it (or the function) fall through to agg.

IMO 2. is a little cleaner (although perhaps lesser known?): you can do g.agg("max") as an alias for g.max().
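The answer does not show the first option in code. The sketch below is one way it could look; it is not the answerer's code. It rebuilds the question's data and makes two assumptions: Python's built-in callable() is used to tell functions apart from method-name strings, and the Streak lambda is stored in sumdict next to the string ops.

```python
import pandas as pd

df = pd.DataFrame({
    'Players': ['Sam', 'Greg', 'Steve', 'Sam', 'Greg', 'Steve', 'Greg', 'Steve', 'Greg', 'Steve'],
    'Wins': [10, 5, 5, 20, 30, 20, 6, 9, 3, 10],
    'Losses': [5, 5, 5, 2, 3, 2, 16, 20, 3, 12],
    'Type': ['A', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'B'],
})
p = df.groupby('Players')

sumdict = {
    'Total Games': (None, 'count'),
    'Average Wins': ('Wins', 'mean'),
    'Greatest Wins': ('Wins', 'max'),
    'Unique games': ('Type', 'nunique'),
    'Max Score': ('Score', 'max'),                 # skipped below: df has no 'Score' column
    'Streak': ('Wins', lambda x: (x > 5).sum()),   # a function instead of a method name
}

summary = []
for key, (column, op) in sumdict.items():
    if column is None:
        res = p.agg(op).max(axis=1)
    elif column not in df:
        continue
    elif callable(op):
        # Assumption: callable() marks ops that are functions; apply them to each group
        res = p[column].agg(op)
    else:
        # Strings keep the question's getattr-based method lookup
        res = p[column].agg(lambda x: getattr(x, op)())
    summary.append(pd.DataFrame({key: res}))
summary = pd.concat(summary, axis=1)
print(summary)
```

String ops still go through getattr exactly as in the question. Only entries whose op is callable take the new branch.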
    sumdict["Streak"] = "Wins", lambda x: (x > 5).sum()

Then do the following; the commented line is the only change:
    summary = []
    for key, (column, op) in sumdict.items():
        if column is None:
            res = p.agg(op).max(axis=1)
        elif column not in df:
            continue
        else:
            res = p[column].agg(op)  # just use the string (or it could be a func)
        summary.append(pd.DataFrame({key: res}))
    summary = pd.concat(summary, axis=1)

Then Streak works perfectly:
    In [23]: summary
    Out[23]:
             Greatest Wins  Total Games  Streak  Average Wins  Unique games
    Players
    Greg                30            4       2            11             2
    Sam                 20            2       2            15             2
    Steve               20            4       3            11             2
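This answer relies on two claims. First, g.agg("max") behaves like g.max(). Second, agg accepts the same lambda the question passed to apply. The short self-contained sketch below checks both. It rebuilds only the Players and Wins columns from the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    'Players': ['Sam', 'Greg', 'Steve', 'Sam', 'Greg', 'Steve', 'Greg', 'Steve', 'Greg', 'Steve'],
    'Wins': [10, 5, 5, 20, 30, 20, 6, 9, 3, 10],
})
p = df.groupby('Players')

# A string passed to agg names the groupby method to run
by_string = p['Wins'].agg('max')
by_method = p['Wins'].max()

# The question's apply-based Streak versus the same lambda passed to agg
streak_apply = p.Wins.apply(lambda x: (x > 5).sum())
streak_agg = p['Wins'].agg(lambda x: (x > 5).sum())

print(by_string.tolist() == by_method.tolist())      # True
print(streak_agg.tolist() == streak_apply.tolist())  # True
```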