Dataframe category sorting optimization, question 2

Updated: 2024-10-16 00:19:10
Problem description

I asked a sorting question before, and someone solved it by first calling DataFrame.sort_values on both columns and then adding GroupBy.head.

Previous question: Dataframe category sorting optimization

Now I have a more complicated sort. I need to group the DataFrame by category. Within each category, take the data1 value of the row where data2 is largest, keep only the rows whose data1 is less than or equal to that value, then sort by data1 in descending order and keep the top 4 rows per category.
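To make the rule concrete, here is a step-by-step trace for a single category. The four rows are invented for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical toy data for one category, chosen so each step is easy to follow.
a = pd.DataFrame({
    "category": ["A", "A", "A", "A"],
    "data1": [10.0, 50.0, 30.0, 20.0],
    "data2": [90.0, 60.0, 99.0, 70.0],
})

# Step 1: the row with the largest data2 (99.0) has data1 == 30.0; that is the cutoff.
# Taking .max() of data1 mirrors the question's code if several rows tie on data2.
cutoff = a.loc[a["data2"] == a["data2"].max(), "data1"].max()

# Step 2: drop rows whose data1 is above the cutoff (the 50.0 row goes).
kept = a[a["data1"] <= cutoff]

# Step 3: largest data1 first, at most 4 rows.
top = kept.sort_values("data1", ascending=False).head(4)

print(cutoff)                 # 30.0
print(top["data1"].tolist())  # [30.0, 20.0, 10.0]
```

Note that the cutoff row itself always survives the filter, so every category contributes at least one row.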

Here is my code. How can it be optimized?

    import numpy as np
    import pandas as pd

    df = pd.DataFrame()
    n = 200
    df['category'] = np.random.choice(('A', 'B'), n)
    df['data1'] = np.random.rand(len(df))*100
    df['data2'] = np.random.rand(len(df))*100

    # Category A: cutoff = data1 of the row with the largest data2
    a = df[df['category'] == 'A']
    c = a[a['data2'] == a.data2.max()].data1.max()
    a = a[a['data1'] <= c]
    a = a.sort_values(by='data1', ascending=False).head(4)

    # Category B: the same steps
    b = df[df['category'] == 'B']
    c = b[b['data2'] == b.data2.max()].data1.max()
    b = b[b['data1'] <= c]
    b = b.sort_values(by='data1', ascending=False).head(4)

    df = pd.concat([a, b]).sort_values(by=['category', 'data1'], ascending=[True, False]).reset_index(drop=True)
    print(df)

Output:

      category      data1      data2
    0        A  28.194042  98.813271
    1        A  26.635099  82.768130
    2        A  24.345177  80.558532
    3        A  24.222105  89.596726
    4        B  60.883981  98.444699
    5        B  49.934815  90.319787
    6        B  10.751913  86.124271
    7        B   4.029914  89.802120
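The A and B blocks above differ only in the category label. As a sketch, the same steps can be written once and run for every group produced by `df.groupby('category')`, which also covers any number of categories. The helper name `top_under_cutoff` and its `k` parameter are hypothetical, not from the question:

```python
import numpy as np
import pandas as pd

n = 200
df = pd.DataFrame({"category": np.random.choice(("A", "B"), n)})
df["data1"] = np.random.rand(n) * 100
df["data2"] = np.random.rand(n) * 100

def top_under_cutoff(group: pd.DataFrame, k: int = 4) -> pd.DataFrame:
    """Hypothetical helper: the question's per-category steps for one group."""
    # data1 of the row(s) with the largest data2; max() handles ties like the question
    cutoff = group.loc[group["data2"] == group["data2"].max(), "data1"].max()
    kept = group[group["data1"] <= cutoff]
    return kept.sort_values("data1", ascending=False).head(k)

# One pass over every category instead of copy-pasting a block per label.
parts = [top_under_cutoff(g) for _, g in df.groupby("category")]
result = (pd.concat(parts)
            .sort_values(by=["category", "data1"], ascending=[True, False])
            .reset_index(drop=True))
print(result)
```

Iterating over the groups explicitly keeps the category column in each group, which avoids relying on how `groupby.apply` treats grouping columns across pandas versions.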

I tried groupby, but the code still feels too complicated. Can it be optimized?

    import numpy as np
    import pandas as pd

    df = pd.DataFrame()
    n = 200
    df['category'] = np.random.choice(('A', 'B'), n)
    df['data1'] = np.random.rand(len(df))*100
    df['data2'] = np.random.rand(len(df))*100

    a = df[df['category'] == 'A']
    c = a[a['data2'] == a.data2.max()].data1.max()
    a = a[a['data1'] <= c]
    a = a.sort_values(by='data1', ascending=False).head(4)

    b = df[df['category'] == 'B']
    c = b[b['data2'] == b.data2.max()].data1.max()
    b = b[b['data1'] <= c]
    b = b.sort_values(by='data1', ascending=False).head(4)

    df2 = pd.concat([a, b]).sort_values(by=['category', 'data1'], ascending=[True, False]).reset_index(drop=True)

    df3 = df.groupby('category').apply(
        lambda x: x[x['data1'].isin(
            x[x['data1'] <= x[x['data2'] == x['data2'].max()].data1.max()]['data1'].nlargest(4))]
    ).reset_index(drop=True)
    df3 = df3.sort_values(by=['category', 'data1'], ascending=[True, False]).reset_index(drop=True)

    print((df2.data1-df3.data1).max())
    print((df2.data2-df3.data2).max())

Output:

    0.0
    0.0
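The check above prints the largest signed difference, which can be 0.0 even when some rows differ, because negative differences never exceed zero. A small sketch with two made-up frames shows the effect, together with a sign-insensitive `.abs()` variant and pandas' `pd.testing.assert_frame_equal`, which compares every cell, the dtypes, and the index:

```python
import pandas as pd

# Hypothetical frames: same first row, different second row.
left = pd.DataFrame({"data1": [1.0, 2.0]})
right = pd.DataFrame({"data1": [1.0, 5.0]})

# The signed check: the differences are [0.0, -3.0], so the maximum is 0.0.
print((left.data1 - right.data1).max())        # 0.0, although the frames differ

# Taking absolute values first catches differences in either direction.
print((left.data1 - right.data1).abs().max())  # 3.0

# pandas' testing helper raises AssertionError on any mismatch.
try:
    pd.testing.assert_frame_equal(left, right)
except AssertionError:
    print("frames differ")
```

Applied to the question's results, `pd.testing.assert_frame_equal(df2, df3)` should return silently when the two approaches really agree.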

Recommended answer

For reference, here is the question's approach run on a new random sample:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame()
    n = 200
    df['category'] = np.random.choice(('A', 'B'), n)
    df['data1'] = np.random.rand(len(df))*100
    df['data2'] = np.random.rand(len(df))*100

    a = df[df['category'] == 'A']
    c = a[a['data2'] == a.data2.max()].data1.max()
    a = a[a['data1'] <= c]
    a = a.sort_values(by='data1', ascending=False).head(4)

    b = df[df['category'] == 'B']
    c = b[b['data2'] == b.data2.max()].data1.max()
    b = b[b['data1'] <= c]
    b = b.sort_values(by='data1', ascending=False).head(4)

    df1 = pd.concat([a, b]).sort_values(by=['category', 'data1'], ascending=[True, False]).reset_index(drop=True)
    print(df1)

Output:

      category      data1      data2
    0        A  87.560430  99.262452
    1        A  85.798945  99.200321
    2        A  68.614311  97.796274
    3        A  41.641961  95.544980
    4        B  69.937691  99.711156
    5        B  56.932784  99.227111
    6        B  19.903620  94.389186
    7        B  12.701288  98.455274

First, get the data1 value at each category's maximum data2, use it to filter the rows with <=, and finally use groupby.head:

    # data1 of the row with the largest data2 in each category, indexed by category
    s = (df.sort_values('data2')
           .drop_duplicates('category', keep='last')
           .set_index('category')['data1'])
    # keep rows whose data1 does not exceed their category's cutoff
    df = df[df['data1'] <= df['category'].map(s)]
    # largest data1 first within each category, then the first 4 rows per category
    df1 = (df.sort_values(by=['category', 'data1'], ascending=[True, False])
             .groupby('category')
             .head(4)
             .reset_index(drop=True))
    print(df1)

Output:

      category      data1      data2
    0        A  87.560430  99.262452
    1        A  85.798945  99.200321
    2        A  68.614311  97.796274
    3        A  41.641961  95.544980
    4        B  69.937691  99.711156
    5        B  56.932784  99.227111
    6        B  19.903620  94.389186
    7        B  12.701288  98.455274
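To see what each step of the answer does, here is the same chain on a tiny hypothetical frame (the values are invented for illustration). Sorting by `data2` moves each category's largest-`data2` row behind that category's other rows, so `drop_duplicates('category', keep='last')` keeps exactly that row; `Series.map` then broadcasts each category's cutoff to its rows:

```python
import pandas as pd

# Hypothetical toy data with two categories.
df = pd.DataFrame({
    "category": ["A", "A", "A", "B", "B", "B"],
    "data1": [10.0, 50.0, 30.0, 5.0, 40.0, 25.0],
    "data2": [90.0, 60.0, 99.0, 80.0, 85.0, 95.0],
})

# After sorting by data2, the last row of each category is its max-data2 row.
s = (df.sort_values("data2")
       .drop_duplicates("category", keep="last")
       .set_index("category")["data1"])
print(s.to_dict())  # {'B': 25.0, 'A': 30.0}  (order follows the sorted frame)

# map() looks up each row's category in s, giving a per-row cutoff aligned with df.
cutoff = df["category"].map(s)
print(cutoff.tolist())  # [30.0, 30.0, 30.0, 25.0, 25.0, 25.0]

filtered = df[df["data1"] <= cutoff]
print(filtered["data1"].tolist())  # [10.0, 30.0, 5.0, 25.0]
```

Because `map` aligns on category labels, the order in which `s` lists the categories does not matter.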

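One difference worth knowing: the question's code takes `data1.max()` over all rows that tie for the largest `data2`, while `drop_duplicates(keep='last')` keeps a single row whose position among the tied rows depends on the sort (the default `sort_values` algorithm is not stable). With continuous random data such ties are practically impossible, so both give the same result. For data where ties can occur, a possible variant (a sketch, not part of the answer) uses `groupby(...).transform('max')` to flag every top row and then takes the largest `data1` among them. The toy data below is invented and contains such a tie in category A:

```python
import pandas as pd

# Hypothetical toy data: in category A, two rows tie on the largest data2 (99.0).
df = pd.DataFrame({
    "category": ["A", "A", "A", "A", "B", "B", "B"],
    "data1": [30.0, 45.0, 60.0, 10.0, 5.0, 40.0, 25.0],
    "data2": [99.0, 99.0, 50.0, 70.0, 80.0, 85.0, 95.0],
})

# Flag every row that holds its category's maximum data2 (all tied rows are kept).
is_top = df["data2"] == df.groupby("category")["data2"].transform("max")

# Largest data1 among the flagged rows: the same tie rule as the question's .data1.max().
s = df[is_top].groupby("category")["data1"].max()
print(s.to_dict())  # {'A': 45.0, 'B': 25.0}

filtered = df[df["data1"] <= df["category"].map(s)]
df1 = (filtered.sort_values(by=["category", "data1"], ascending=[True, False])
               .groupby("category")
               .head(4)
               .reset_index(drop=True))
print(df1["data1"].tolist())  # [45.0, 30.0, 10.0, 25.0, 5.0]
```

The rest of the pipeline is unchanged from the answer; only the way the per-category cutoff is computed differs.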

Published: 2023-11-30 14:02:27
Article link: https://www.elefans.com/category/jswz/34/1650255.html