How to group by multiple columns and aggregate the differences on different columns?

Updated: 2024-10-25 10:21:30
Problem description

I am looking for help on how to do this in Python / pandas:

I am looking to take the original data (below) and find the daily difference of multiple columns (cnt_a and cnt_b) within groups defined by multiple columns (state, county and date).

I've been trying it different ways, and I can't seem to get past the "check for duplicate" issue:

dft_a = df.sort_values(['state','county','date']).groupby['state','county','date','cnt_a'].diff(-1)

Tried splitting it out to fix one thing at a time:

df1 = df.sort_values(['state','county','date'])
df2 = df1.groupby(['state','county'])['cnt_a'].diff()
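For reference, here is a minimal sketch of how that second attempt can be finished for both columns. It assumes the question's sample data and reuses the column names daydiff_a and daydiff_b from the desired output. The key points are that date is left out of the group keys, and that the Series returned by diff() keeps df1's index, so it can be assigned straight back as a column.

```python
import pandas as pd

# Sample data from the question; ISO date strings sort chronologically
df = pd.DataFrame({
    "date": ["2020-06-13", "2020-06-14", "2020-06-15", "2020-06-12",
             "2020-06-13", "2020-06-14", "2020-06-12", "2020-06-13"],
    "county": ["Bergen", "Bergen", "Bergen", "Union", "Union", "Union",
               "Bronx", "Bronx"],
    "state": ["New Jersey"] * 6 + ["New York"] * 2,
    "cnt_a": [308, 308, 320, 100, 130, 150, 200, 210],
    "cnt_b": [11, 11, 15, 3, 4, 5, 100, 200],
})

# Same first step as in the question: order rows by group, then by date
df1 = df.sort_values(["state", "county", "date"])

# Group only by state and county (not date). diff() returns a Series that
# keeps df1's index, so it can be assigned back as a new column.
# The first row of each group has no predecessor, so its NaN becomes 0.
for src, dst in [("cnt_a", "daydiff_a"), ("cnt_b", "daydiff_b")]:
    df1[dst] = df1.groupby(["state", "county"])[src].diff().fillna(0).astype(int)

print(df1)
```

The astype(int) cast is an assumption that the counts are always whole numbers; without it the new columns come out as floats because of the intermediate NaN values.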

Original data => df

date        county  state       cnt_a  cnt_b
2020-06-13  Bergen  New Jersey  308    11
2020-06-14  Bergen  New Jersey  308    11
2020-06-15  Bergen  New Jersey  320    15
2020-06-12  Union   New Jersey  100    3
2020-06-13  Union   New Jersey  130    4
2020-06-14  Union   New Jersey  150    5
2020-06-12  Bronx   New York    200    100
2020-06-13  Bronx   New York    210    200

Desired output

date        county  state       cnt_a  cnt_b  daydiff_a  daydiff_b
2020-06-13  Bergen  New Jersey  308    11     0          0
2020-06-14  Bergen  New Jersey  308    11     0          0
2020-06-15  Bergen  New Jersey  320    15     12         4
2020-06-12  Union   New Jersey  100    3      0          0
2020-06-13  Union   New Jersey  130    4      30         1
2020-06-14  Union   New Jersey  150    5      20         1
2020-06-12  Bronx   New York    200    100    0          0
2020-06-13  Bronx   New York    210    200    10         100

Recommended answer

  • It's important to sort df first. .groupby(...).diff() subtracts each row from the previous row of its group in the current row order, so the rows must be in date order within each group. The result keeps df's index, which is what lets .join attach it to the right rows.
  • Be certain to sort df, in order, by 'state', 'county', and 'date'; however, leave 'date' out of the .groupby keys, otherwise every group holds a single row and every difference is NaN.
  • Specify rsuffix, and/or use .rename, to change the column headers.
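The two sorting points can be checked with a small sketch. It uses a deliberately shuffled copy of the Union county rows (the shuffled order is an assumption for illustration, not part of the question's data) and compares diff() before sorting, after sorting, and with 'date' wrongly included as a group key.

```python
import pandas as pd

# Union county rows from the question, deliberately out of date order
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-14", "2020-06-12", "2020-06-13"]),
    "county": ["Union"] * 3,
    "state": ["New Jersey"] * 3,
    "cnt_a": [150, 100, 130],
})

# 1) Without sorting, diff() follows row order, not date order
unsorted_diff = df.groupby(["state", "county"])["cnt_a"].diff()

# 2) Sorting by date first gives true day-over-day changes
df_sorted = df.sort_values(["state", "county", "date"])
sorted_diff = df_sorted.groupby(["state", "county"])["cnt_a"].diff()

# 3) With 'date' as a key, every group is a single row, so every diff is NaN
date_keyed = df_sorted.groupby(["state", "county", "date"])["cnt_a"].diff()

print(unsorted_diff.tolist())  # [nan, -50.0, 30.0]
print(sorted_diff.tolist())    # [nan, 30.0, 20.0]
print(date_keyed.isna().all())  # True
```

Only the sorted version reproduces the 30 and 20 shown for Union in the desired output.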
import pandas as pd

# setup the test dataframe
data = {'date': ['2020-06-13', '2020-06-14', '2020-06-15', '2020-06-12', '2020-06-13', '2020-06-14', '2020-06-12', '2020-06-13'],
        'county': ['Bergen', 'Bergen', 'Bergen', 'Union', 'Union', 'Union', 'Bronx', 'Bronx'],
        'state': ['New Jersey', 'New Jersey', 'New Jersey', 'New Jersey', 'New Jersey', 'New Jersey', 'New York', 'New York'],
        'cnt_a': [308, 308, 320, 100, 130, 150, 200, 210],
        'cnt_b': [11, 11, 15, 3, 4, 5, 100, 200]}
df = pd.DataFrame(data)

# set the date column to a datetime format
df.date = pd.to_datetime(df.date)

# sort the values
df = df.sort_values(['state', 'county', 'date'])

# groupby and join back to dataframe df
df = df.join(df.groupby(['state', 'county'])[['cnt_a', 'cnt_b']].diff().fillna(0), rsuffix='_diff')

# display(df)
        date  county       state  cnt_a  cnt_b  cnt_a_diff  cnt_b_diff
0 2020-06-13  Bergen  New Jersey    308     11         0.0         0.0
1 2020-06-14  Bergen  New Jersey    308     11         0.0         0.0
2 2020-06-15  Bergen  New Jersey    320     15        12.0         4.0
3 2020-06-12   Union  New Jersey    100      3         0.0         0.0
4 2020-06-13   Union  New Jersey    130      4        30.0         1.0
5 2020-06-14   Union  New Jersey    150      5        20.0         1.0
6 2020-06-12   Bronx    New York    200    100         0.0         0.0
7 2020-06-13   Bronx    New York    210    200        10.0       100.0
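If you want the exact column names and integer values from the desired output, the .rename route mentioned in the answer can replace rsuffix. This is a sketch under the assumption that the counts are always whole numbers, so casting the filled differences back to int is safe.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-06-13", "2020-06-14", "2020-06-15", "2020-06-12",
             "2020-06-13", "2020-06-14", "2020-06-12", "2020-06-13"],
    "county": ["Bergen", "Bergen", "Bergen", "Union", "Union", "Union",
               "Bronx", "Bronx"],
    "state": ["New Jersey"] * 6 + ["New York"] * 2,
    "cnt_a": [308, 308, 320, 100, 130, 150, 200, 210],
    "cnt_b": [11, 11, 15, 3, 4, 5, 100, 200],
})
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(["state", "county", "date"])

# Per-group differences, renamed instead of suffixed
diffs = (
    df.groupby(["state", "county"])[["cnt_a", "cnt_b"]]
    .diff()
    .fillna(0)
    .astype(int)  # assumes whole-number counts; undoes the NaN-induced float
    .rename(columns={"cnt_a": "daydiff_a", "cnt_b": "daydiff_b"})
)

# No column names overlap any more, so join needs no suffix
df = df.join(diffs)
print(df)
```

Because the renamed columns no longer clash with cnt_a and cnt_b, join does not require lsuffix or rsuffix here.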

This article was published on 2023-10-14 11:53:06.
Article link: https://www.elefans.com/category/jswz/34/1490982.html