Counting consecutive occurrences of a number in a DataFrame with a MultiIndex

Updated: 2024-10-25 03:28:43

Problem description

I have a DataFrame with a MultiIndex (stock and datetime) and a 'Dummy' column that contains 1s and 0s. For each stock and each day, I would like to count, row by row, how many times in a row the current value has occurred in the 'Dummy' column. The count restarts at 1 whenever the value changes, counting up (1, 2, ...) for 1s and counting down (-1, -2, ...) for 0s. In the example below, the 'Counter' column represents what I would like to create:

    import pandas as pd

    df = pd.DataFrame({
        'stock': ['AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL',
                  'MSFT', 'MSFT'],
        'datetime': ['2015-01-02 20:57', '2015-01-02 20:58', '2015-01-02 20:59',
                     '2015-01-02 21:00', '2015-01-03 20:57', '2015-01-03 20:58',
                     '2015-01-03 20:59', '2015-01-02 20:57', '2015-01-02 20:58'],
        'Dummy': [0, 0, 1, 1, 1, 1, 0, 1, 1],
        # Desired result; the 0 on 2015-01-03 20:59 counts down, so it is -1
        'Counter': [-1, -2, 1, 2, 1, 2, -1, 1, 2]})
    df['datetime'] = pd.to_datetime(df['datetime'])
    df.set_index(['stock', 'datetime'], inplace=True)

A simpler version of this problem was answered here (it ignores the tickers and dates, however):

Counting consecutive occurrences of a number in a DataFrame

Recommended answer

Just slightly modify your previous solution:

    import numpy as np

    # Run id that increases every time Dummy changes value
    m = df.Dummy.diff().ne(0).cumsum()
    # Number the rows within each (stock, calendar day, run) group, starting at 1
    counters = df.groupby([df.index.get_level_values(0),
                           df.index.get_level_values(1).date,
                           m]).cumcount() + 1
    # Negative sign for 0s, positive sign for 1s
    df['Counter'] = np.where(df['Dummy'] == 0, -1, 1) * counters

    Out[95]:
                               Dummy  Counter
    stock datetime
    AAPL  2015-01-02 20:57:00      0       -1
          2015-01-02 20:58:00      0       -2
          2015-01-02 20:59:00      1        1
          2015-01-02 21:00:00      1        2
          2015-01-03 20:57:00      1        1
          2015-01-03 20:58:00      1        2
          2015-01-03 20:59:00      0       -1
    MSFT  2015-01-02 20:57:00      1        1
          2015-01-02 20:58:00      1        2
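For reuse, the same idea can be wrapped in a small function. The following is only a sketch under a few assumptions: the name `signed_run_counter` is made up for illustration, the index levels are assumed to be named 'stock' and 'datetime' as in the question, `ne(shift())` is used in place of `diff().ne(0)` to mark run starts, and `normalize()` is used in place of `.date` to get the calendar day. It rebuilds the question's data so that it runs on its own.

```python
import numpy as np
import pandas as pd


def signed_run_counter(frame, column="Dummy"):
    """Signed run length of `column`, restarting for each stock and calendar day.

    Hypothetical helper: assumes index levels named 'stock' and 'datetime'.
    """
    stock = frame.index.get_level_values("stock")
    # normalize() sets the time to midnight, leaving one value per calendar day
    day = frame.index.get_level_values("datetime").normalize()
    values = frame[column]
    # A new run starts wherever the value differs from the previous row
    run_id = values.ne(values.shift()).cumsum()
    # Position of each row inside its (stock, day, run) group, counted from 1
    position = frame.groupby([stock, day, run_id]).cumcount() + 1
    # 1s count up, 0s count down
    sign = np.where(values.eq(0), -1, 1)
    return position * sign


df = pd.DataFrame({
    "stock": ["AAPL"] * 7 + ["MSFT"] * 2,
    "datetime": ["2015-01-02 20:57", "2015-01-02 20:58", "2015-01-02 20:59",
                 "2015-01-02 21:00", "2015-01-03 20:57", "2015-01-03 20:58",
                 "2015-01-03 20:59", "2015-01-02 20:57", "2015-01-02 20:58"],
    "Dummy": [0, 0, 1, 1, 1, 1, 0, 1, 1],
})
df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index(["stock", "datetime"])

df["Counter"] = signed_run_counter(df)
print(df)
```

The run id is computed over the whole frame, so a run of 1s can span two days or two stocks. Adding the stock and the day to the groupby keys is what makes the count restart at those boundaries.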


Published: 2023-11-22 06:58:49

Article link: https://www.elefans.com/category/jswz/34/1616412.html