Pandas: group by and add the unique count as a new column

Updated: 2024-10-24 03:21:41

Problem description

I want to add a new column, col, to my pandas DataFrame, computed as:

select count(distinct ITEM) as col from base_data where STOCK > 0 group by DEPT, CLAS, DATE;
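The SQL query maps fairly directly onto pandas. The sketch below is an editorial illustration, not the asker's code: it reuses the toy frame from the answer, turns the group keys back into ordinary columns with reset_index, and then left-merges the counts onto the parent frame. The variable names in_stock, counts and result are hypothetical.

```python
import pandas as pd

# Toy data reused from the answer (columns DEPT, CLAS, STOCK, DATE, ITEM).
base_data = pd.DataFrame({
    "DEPT": ["a", "a", "b", "b"],
    "CLAS": ["d", "d", "d", "d"],
    "STOCK": [-1, 1, 2, 2],
    "DATE": pd.to_datetime(["2001-10-10"] * 4),
    "ITEM": [1, 2, 3, 4],
})

# WHERE STOCK > 0
in_stock = base_data[base_data["STOCK"] > 0]

# GROUP BY DEPT, CLAS, DATE with COUNT(DISTINCT ITEM) AS col;
# reset_index moves the group keys out of the index into regular columns.
counts = (
    in_stock.groupby(["DEPT", "CLAS", "DATE"])["ITEM"]
    .nunique()
    .reset_index(name="col")
)
print(counts)

# Attach the counts to every row of the parent frame (left join on the keys).
result = base_data.merge(counts, on=["DEPT", "CLAS", "DATE"], how="left")
print(result)
```

Because counts is a DataFrame with DEPT, CLAS and DATE as real columns, merge can join on them; with how="left", a key combination that has no in-stock items would get NaN in col.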

What I am doing

    assort_size = base_data[(base_data['STOCK'] > 0)]\
        .groupby(['DEPT','CLAS','DATE'])['ITEM']\
        .transform('nunique')

Basically, for each DEPT, CLAS, DATE combination I want the number of distinct items that are in stock (STOCK > 0). I then want to merge this result with the parent data frame, but it comes out as a pandas.core.series.Series, so I cannot append it back with axis=1 (the row counts differ, e.g. 1.6 M vs. 1.4 M). I also don't have the DEPT, CLAS, DATE columns to join on. What can I do to get a DataFrame that includes the group-by columns?
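The row-count mismatch comes from filtering before transform: the result has one value per in-stock row, but it keeps those rows' original index labels. A small sketch on the answer's toy frame (the names mask and per_row are illustrative) shows that such a shorter Series can still be assigned straight back, because column assignment aligns on the index.

```python
import pandas as pd

# Toy data reused from the answer.
base_data = pd.DataFrame({
    "DEPT": ["a", "a", "b", "b"],
    "CLAS": ["d", "d", "d", "d"],
    "STOCK": [-1, 1, 2, 2],
    "DATE": pd.to_datetime(["2001-10-10"] * 4),
    "ITEM": [1, 2, 3, 4],
})

mask = base_data["STOCK"] > 0
# transform on the filtered frame returns one value per filtered row,
# indexed by those rows' original labels (row 0 was filtered out).
per_row = base_data[mask].groupby(["DEPT", "CLAS", "DATE"])["ITEM"].transform("nunique")
print(per_row.index.tolist())  # [1, 2, 3]

# Assigning a Series to a column aligns on the index, so the shorter Series
# fits back into the parent frame; row 0 has no matching label and gets NaN.
base_data["col"] = per_row
print(base_data)
```

The trade-off is that out-of-stock rows receive NaN instead of their group's count, and the column becomes float because of the NaN.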

Is there a better way to create the new column directly in the parent pandas DataFrame (base_data) than creating a separate object, as I do with assort_size?

Recommended answer

You can use boolean indexing first, then groupby with nunique, and finally join:

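    import pandas as pd

    base_data = pd.DataFrame({"DEPT": ["a", "a", "b", "b"],
                              "CLAS": ['d', 'd', 'd', 'd'],
                              "STOCK": [-1, 1, 2, 2],
                              "DATE": pd.to_datetime(['2001-10-10', '2001-10-10',
                                                      '2001-10-10', '2001-10-10']),
                              "ITEM": [1, 2, 3, 4]})
    print(base_data)
    #   CLAS       DATE DEPT  ITEM  STOCK
    # 0    d 2001-10-10    a     1     -1
    # 1    d 2001-10-10    a     2      1
    # 2    d 2001-10-10    b     3      2
    # 3    d 2001-10-10    b     4      2

    # Keep in-stock rows, count distinct ITEMs per group, and name the result
    # so join can use it as the new column.
    assort_size = base_data[(base_data['STOCK'] > 0)]\
        .groupby(['DEPT','CLAS','DATE'])['ITEM'].nunique().rename('n_item')
    print(assort_size)
    # DEPT  CLAS  DATE
    # a     d     2001-10-10    1
    # b     d     2001-10-10    2
    # Name: n_item, dtype: int64

    # Match each row's DEPT/CLAS/DATE against the MultiIndex of assort_size.
    print(base_data.join(assort_size, on=['DEPT','CLAS','DATE']))
    #   CLAS       DATE DEPT  ITEM  STOCK  n_item
    # 0    d 2001-10-10    a     1     -1       1
    # 1    d 2001-10-10    a     2      1       1
    # 2    d 2001-10-10    b     3      2       2
    # 3    d 2001-10-10    b     4      2       2

The alphabetical column order in the printed output comes from an older pandas version that sorted dict keys; pandas 0.23 and later on Python 3.6+ keep insertion order, so the columns appear as DEPT, CLAS, STOCK, DATE, ITEM (followed by n_item after the join).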
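For the second part of the question (creating the column directly in base_data), one alternative, not part of the original answer, is to mask ITEM instead of filtering rows, so the grouped Series keeps the full length of the frame. A minimal sketch on the same toy data; in_stock_item is an illustrative name.

```python
import pandas as pd

# Toy data reused from the answer.
base_data = pd.DataFrame({
    "DEPT": ["a", "a", "b", "b"],
    "CLAS": ["d", "d", "d", "d"],
    "STOCK": [-1, 1, 2, 2],
    "DATE": pd.to_datetime(["2001-10-10"] * 4),
    "ITEM": [1, 2, 3, 4],
})

# Blank out ITEM (set it to NaN) wherever STOCK <= 0 instead of dropping rows.
in_stock_item = base_data["ITEM"].where(base_data["STOCK"] > 0)

# Group the full-length Series by the key columns. nunique skips NaN by
# default, so only in-stock items are counted, and transform broadcasts each
# group's count to every row of that group, including out-of-stock rows.
base_data["n_item"] = in_stock_item.groupby(
    [base_data["DEPT"], base_data["CLAS"], base_data["DATE"]]
).transform("nunique")
print(base_data)
```

On this data the column matches the join result. One difference: a group with no in-stock rows gets 0 here, whereas the join leaves NaN for it.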

Published: 2023-10-30 19:08:55

Article link: https://www.elefans.com/category/jswz/34/1543736.html