This post covers how to sort the result of a pandas groupby sum so that the groups appear in order of their totals.
Problem description
I've already seen this question, but the desired outcome there is slightly different from mine.

Imagine a dataframe grouped thusly:

    df.groupby(['product_name', 'usage_type']).total_cost.sum()

    product_name  usage_type
    Lorem         A                30.694665
                  B                 0.000634
                  C                 1.659360
                  D                 0.000031
                  E              3339.140042
                  F                 0.074340
    Ipsum         G                 9.627360
                  A                19.053377
                  D                14.492155
    Dolor         B                 9.698245
                  H              6993.792163
                  C             31947.955679
                  D              2150.400001
                  E                26.337789
    Name: total_cost, dtype: float64

The output I want is the same structure, but with two properties:

- Product names ordered by the sum of their costs
- Usage types ordered lexicographically (another viable alternative: sort the usage types in descending order)

Such that the highest-cost products show up first, but still preserving the breakdown.

If it is significantly simpler, I'm okay with dropping the secondary sorting by usage type.
Recommended answer
Starting with your grouped DataFrame:

    import pandas as pd

    # 'data' is a whitespace-delimited file with the columns product_name, usage_type and val
    df2 = pd.read_table('data', sep=r'\s+').set_index(['product_name', 'usage_type'])

    #                                   val
    # product_name usage_type
    # Lorem        A              30.694665
    #              B               0.000634
    #              C               1.659360
    #              D               0.000031
    #              E            3339.140042
    #              F               0.074340
    # Ipsum        G               9.627360
    #              A              19.053377
    #              D              14.492155
    # Dolor        B               9.698245
    #              H            6993.792163
    #              C           31947.955679
    #              D            2150.400001
    #              E              26.337789

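If no such `data` file is at hand, the same frame can be built inline. This is a minimal sketch that assumes only the figures shown in the question; the `rows` list is an illustrative name, not part of the original answer.

```python
import pandas as pd

# (product_name, usage_type, val) triples copied from the grouped output in the question.
rows = [
    ("Lorem", "A", 30.694665), ("Lorem", "B", 0.000634), ("Lorem", "C", 1.659360),
    ("Lorem", "D", 0.000031), ("Lorem", "E", 3339.140042), ("Lorem", "F", 0.074340),
    ("Ipsum", "G", 9.627360), ("Ipsum", "A", 19.053377), ("Ipsum", "D", 14.492155),
    ("Dolor", "B", 9.698245), ("Dolor", "H", 6993.792163), ("Dolor", "C", 31947.955679),
    ("Dolor", "D", 2150.400001), ("Dolor", "E", 26.337789),
]

# Same shape as the answer's df2: a two-level index and a single 'val' column.
df2 = (
    pd.DataFrame(rows, columns=["product_name", "usage_type", "val"])
    .set_index(["product_name", "usage_type"])
)
```
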
You could store the key values in new columns:
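
    # key1: each product's total cost, repeated on every row of that product
    df2['key1'] = df2.groupby(level='product_name')['val'].transform('sum')
    # key2: the usage type, copied out of the index so it can be used as a sort key
    df2['key2'] = df2.index.get_level_values('usage_type')
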
and then sort by those key columns:

    # sort_values replaces the older DataFrame.sort, which current pandas no longer provides
    # >>> df2.sort_values(['key1', 'key2'], ascending=[False, True])
    #                                   val          key1 key2
    # product_name usage_type
    # Dolor        B               9.698245  41128.183877    B
    #              C           31947.955679  41128.183877    C
    #              D            2150.400001  41128.183877    D
    #              E              26.337789  41128.183877    E
    #              H            6993.792163  41128.183877    H
    # Lorem        A              30.694665   3371.569072    A
    #              B               0.000634   3371.569072    B
    #              C               1.659360   3371.569072    C
    #              D               0.000031   3371.569072    D
    #              E            3339.140042   3371.569072    E
    #              F               0.074340   3371.569072    F
    # Ipsum        A              19.053377     43.172892    A
    #              D              14.492155     43.172892    D
    #              G               9.627360     43.172892    G

    result = df2.sort_values(['key1', 'key2'], ascending=[False, True])['val']
    print(result)

yields

    product_name  usage_type
    Dolor         B                 9.698245
                  C             31947.955679
                  D              2150.400001
                  E                26.337789
                  H              6993.792163
    Lorem         A                30.694665
                  B                 0.000634
                  C                 1.659360
                  D                 0.000031
                  E              3339.140042
                  F                 0.074340
    Ipsum         A                19.053377
                  D                14.492155
                  G                 9.627360
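
The same idea can also be sketched on flat columns, which avoids the helper column for the usage type. The block below is a sketch under stated assumptions, not the answer's own code: `frame`, `total`, `by_name` and `by_cost` are illustrative names, and adding `product_name` as a tie-breaker is an extra precaution so that two products with equal totals would not interleave. The second variant sorts usage types by their own cost, descending, which is only one possible reading of the alternative mentioned in the question.

```python
import pandas as pd

# (product_name, usage_type, val) triples copied from the question's grouped output.
rows = [
    ("Lorem", "A", 30.694665), ("Lorem", "B", 0.000634), ("Lorem", "C", 1.659360),
    ("Lorem", "D", 0.000031), ("Lorem", "E", 3339.140042), ("Lorem", "F", 0.074340),
    ("Ipsum", "G", 9.627360), ("Ipsum", "A", 19.053377), ("Ipsum", "D", 14.492155),
    ("Dolor", "B", 9.698245), ("Dolor", "H", 6993.792163), ("Dolor", "C", 31947.955679),
    ("Dolor", "D", 2150.400001), ("Dolor", "E", 26.337789),
]
frame = pd.DataFrame(rows, columns=["product_name", "usage_type", "val"])

# Broadcast each product's total cost onto every row of that product.
frame["total"] = frame.groupby("product_name")["val"].transform("sum")

# Variant 1: products by total cost (descending), usage types alphabetically.
by_name = (
    frame.sort_values(
        ["total", "product_name", "usage_type"], ascending=[False, True, True]
    )
    .set_index(["product_name", "usage_type"])["val"]
)

# Variant 2 (assumed reading): products by total cost (descending),
# usage types by their own cost (descending).
by_cost = (
    frame.sort_values(
        ["total", "product_name", "val"], ascending=[False, True, False]
    )
    .set_index(["product_name", "usage_type"])["val"]
)

print(by_name)
print(by_cost)
```

Both results keep the two-level index of the original grouped Series, so the per-product breakdown is preserved; only the row order changes.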