I want to add a new column, col, to my pandas data frame, calculated as:
    select count(distinct ITEM) as col from base_data where STOCK > 0 group by DEPT, CLAS, DATE;

What I am doing:
    assort_size = base_data[(base_data['STOCK'] > 0)]\
        .groupby(['DEPT','CLAS','DATE'])['ITEM']\
        .transform('nunique')

Basically, for each DEPT, CLAS, DATE combination I want the number of items that are in stock. I then want to merge this result back into the parent data frame, but the result comes out as a pandas.core.series.Series, so I cannot append it back with axis=1 (the row counts differ, e.g. 1.6 M vs 1.4 M). I also don't have the DEPT, CLAS, DATE columns to join on. What can I do here to get a data frame that keeps the group-by columns?
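The row-count mismatch described above can be reproduced on a small frame. This is only an illustrative sketch: the frame `toy` and its values are made up, and only the column names come from the question. Filtering on STOCK before calling transform means the result covers just the in-stock rows, so it is shorter than the parent frame.

```python
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
toy = pd.DataFrame({
    "DEPT": ["a", "a", "b", "b"],
    "CLAS": ["d", "d", "d", "d"],
    "DATE": pd.to_datetime(["2001-10-10"] * 4),
    "ITEM": [1, 2, 3, 4],
    "STOCK": [-1, 1, 2, 2],
})

# Filter first, then transform: the result only has the filtered rows.
in_stock = toy[toy["STOCK"] > 0]
counts = in_stock.groupby(["DEPT", "CLAS", "DATE"])["ITEM"].transform("nunique")

print(len(toy), len(counts))   # parent has 4 rows, the result has 3
print(counts.index.tolist())   # the index of the filtered-out row is missing
```

The result still carries the original row index, but not the DEPT, CLAS, DATE columns, which is why there is nothing to join on.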
Is there a better way to create the new column directly in the parent pandas data frame (base_data), rather than creating a new object such as assort_size?
Recommended answer

You can use boolean indexing first, then groupby with nunique, and finally join:
    import pandas as pd

    base_data = pd.DataFrame({"DEPT": ["a", "a", "b", "b"],
                              "CLAS": ['d', 'd', 'd', 'd'],
                              "STOCK": [-1, 1, 2, 2],
                              "DATE": pd.to_datetime(['2001-10-10', '2001-10-10',
                                                      '2001-10-10', '2001-10-10']),
                              "ITEM": [1, 2, 3, 4]})
    print (base_data)
      CLAS       DATE DEPT  ITEM  STOCK
    0    d 2001-10-10    a     1     -1
    1    d 2001-10-10    a     2      1
    2    d 2001-10-10    b     3      2
    3    d 2001-10-10    b     4      2

    # keep in-stock rows only, then count distinct ITEM per group
    assort_size = base_data[(base_data['STOCK'] > 0)]\
        .groupby(['DEPT','CLAS','DATE'])['ITEM'].nunique().rename('n_item')
    print (assort_size)
    DEPT  CLAS  DATE
    a     d     2001-10-10    1
    b     d     2001-10-10    2
    Name: n_item, dtype: int64

    # join the per-group counts back onto every row of the parent frame
    print (base_data.join(assort_size, on=['DEPT','CLAS','DATE']))
      CLAS       DATE DEPT  ITEM  STOCK  n_item
    0    d 2001-10-10    a     1     -1       1
    1    d 2001-10-10    a     2      1       1
    2    d 2001-10-10    b     3      2       2
    3    d 2001-10-10    b     4      2       2
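As a possible alternative to the join above, the column can be written straight into base_data without building a separate aggregated object. This is a sketch and not part of the original answer: it blanks out ITEM on rows that are not in stock and then uses transform('nunique'), which ignores missing values, so the result keeps one value per original row. The helper column name `_in_stock_item` is hypothetical.

```python
import pandas as pd

base_data = pd.DataFrame({
    "DEPT": ["a", "a", "b", "b"],
    "CLAS": ["d", "d", "d", "d"],
    "STOCK": [-1, 1, 2, 2],
    "DATE": pd.to_datetime(["2001-10-10"] * 4),
    "ITEM": [1, 2, 3, 4],
})

keys = ["DEPT", "CLAS", "DATE"]

# ITEM where the row is in stock, NaN otherwise (nunique skips NaN).
in_stock_item = base_data["ITEM"].where(base_data["STOCK"] > 0)

# transform returns a Series aligned to the original index,
# so it can be assigned directly as a new column.
base_data["n_item"] = (
    base_data.assign(_in_stock_item=in_stock_item)  # hypothetical helper column
    .groupby(keys)["_in_stock_item"]
    .transform("nunique")
)

print(base_data)
```

One difference to keep in mind: a group with no in-stock rows gets 0 here, whereas the join approach leaves NaN for such a group because it is absent from assort_size.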