Merging a DataFrame with an aggregation result

Updated: 2024-10-24 16:29:38

Problem description

I want to aggregate a DataFrame: take the first row of every group and, at the same time, concatenate that group's values from the 'upc' column:

import pandas as pd

df = pd.DataFrame({
    'id1': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'id2': [11, 22, 11, 11, 22, 33, 33, 33, 33, 44, 44, 55, 66, 66, 22, 77, 77],
    'value1': ["1first", "1second", "1third", "2first", "2second", "3first", "3second", "3third", "3fourth",
               "4first", "4second", "5first", "6first", "6second", "6third", "7first", "7second"],
    'upc': [str(x) for x in range(100, 117)]
})

# First row of every (id1, id2) group
firsts_df = df.groupby(['id1', 'id2']).first()
# 'upc' values of every group joined with '|'
concat_upcs_df = df[['id1', 'id2', 'upc']].groupby(['id1', 'id2']).apply(lambda x: '|'.join(x.upc))
# The merge below is the call that fails
firsts_df.merge(concat_upcs_df, how='inner', left_on=['id1', 'id2'], right_on=['id1', 'id2'])

This results in the following error:

ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>

How can I merge an aggregation result with a DataFrame? Could I get the same result with a less costly operation?

Recommended answer

I think you need to pass as_index=False to groupby before calling first, and add reset_index() to concat_upcs_df, so that both objects are DataFrames with 'id1' and 'id2' as ordinary columns:

firsts_df = df.groupby(['id1', 'id2'], as_index=False).first()
concat_upcs_df = (df[['id1', 'id2', 'upc']]
                  .groupby(['id1', 'id2'])
                  .apply(lambda x: '|'.join(x.upc))
                  .reset_index(name='val'))
# Assign the merged result so that print shows it
df = firsts_df.merge(concat_upcs_df, how='inner', left_on=['id1', 'id2'], right_on=['id1', 'id2'])
print(df)

   id1  id2  upc   value1              val
0    1   11  100   1first          100|102
1    1   22  101  1second              101
2    2   11  103   2first              103
3    2   22  104  2second              104
4    3   33  105   3first  105|106|107|108
5    4   44  109   4first          109|110
6    5   55  111   5first              111
7    6   22  114   6third              114
8    6   66  112   6first          112|113
9    7   77  115   7first          115|116
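The error in the question comes down to the type of the second operand: a groupby followed by apply with a function that returns one string per group yields a Series, not a DataFrame. The following is a minimal sketch of that point on a shortened, made-up sample; the sample rows and the names `joined` and `joined_df` are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Shortened, made-up sample with the same column layout as the question
df = pd.DataFrame({
    "id1": [1, 1, 1, 2],
    "id2": [11, 22, 11, 11],
    "upc": ["100", "101", "102", "103"],
})

# One joined string per (id1, id2) group; the result is a Series
# indexed by the group keys
joined = df.groupby(["id1", "id2"])["upc"].apply("|".join)
is_series = isinstance(joined, pd.Series)

# reset_index(name=...) moves the group keys back into columns and
# names the value column, producing a regular DataFrame
joined_df = joined.reset_index(name="val")
columns = list(joined_df.columns)

print(type(joined).__name__)
print(joined_df)
```

Once the keys are ordinary columns on both sides, merging on `id1` and `id2` is a plain DataFrame-to-DataFrame join.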

You can also use drop_duplicates instead of first, call apply without a lambda, and pass on to merge, because the left and right join columns are the same:

firsts_df = df.drop_duplicates(['id1', 'id2'])
concat_upcs_df = df.groupby(['id1', 'id2'])['upc'].apply('|'.join).reset_index(name='val')
df = firsts_df.merge(concat_upcs_df, on=['id1', 'id2'])
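On the question of a less costly operation, one option not taken from the answer above is to skip the merge and do both aggregations in a single groupby with named aggregation (available in pandas 0.25 and later). This is a sketch under that assumption; the variable name `result` is illustrative, and whether it is actually faster on a given dataset would need to be measured.

```python
import pandas as pd

df = pd.DataFrame({
    "id1": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "id2": [11, 22, 11, 11, 22, 33, 33, 33, 33, 44, 44, 55, 66, 66, 22, 77, 77],
    "value1": ["1first", "1second", "1third", "2first", "2second", "3first",
               "3second", "3third", "3fourth", "4first", "4second", "5first",
               "6first", "6second", "6third", "7first", "7second"],
    "upc": [str(x) for x in range(100, 117)],
})

# One pass over the groups: first value of each column plus the
# '|'-joined upc values, without a separate merge step
result = df.groupby(["id1", "id2"], as_index=False).agg(
    value1=("value1", "first"),
    upc=("upc", "first"),
    val=("upc", "|".join),
)

print(result)
```

Note that the "first" aggregation takes the first non-null value of each column within a group, whereas drop_duplicates keeps the first row as a whole; the two agree here because the sample has no missing values.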


Published: 2023-10-17 10:28:20

Article link: https://www.elefans.com/category/jswz/34/1500648.html