Counting consecutive occurrences of numbers in a DataFrame using multi-index daily data

Updated: 2024-10-25 09:25:29
Problem description

A version of this question has already been answered in the post below (that one uses minute-frequency data, however):

Counting the number of consecutive occurrences of numbers in dataframe with multi index

I have a DataFrame with a MultiIndex (stock ticker and date) and a 'Dummy' column that contains 1s and 0s. For each stock, I would like to count, in each row, how many consecutive times the current value has occurred in the 'Dummy' column. The count restarts at 1 whenever the value changes, counting up (1, 2, ...) for 1s and counting down (-1, -2, ...) for 0s. In the example below, the 'Counter' column represents what I would like to create:

    import pandas as pd

    df = pd.DataFrame({
        'stock': ['AAPL', 'AAPL', 'AAPL', 'AAPL',
                  'MSFT', 'MSFT', 'MSFT', 'MSFT'],
        'datetime': ['2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05',
                     '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05'],
        'Dummy': [0, 0, 1, 1, 1, 1, 0, 1],
        'Counter': [-1, -2, 1, 2, 1, 2, -1, 1]})
    df['datetime'] = pd.to_datetime(df['datetime'])
    df.set_index(['stock', 'datetime'], inplace=True)
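The counting rule described above can be pinned down independently of pandas. The following is a minimal sketch in plain Python for the values of a single stock, taken in date order. The helper name `signed_run_lengths` is hypothetical and does not appear in the original post; it is only meant as a reference for checking the expected 'Counter' values.

```python
def signed_run_lengths(values):
    """Return each value's signed position within its run of equal values."""
    counters = []
    previous = None
    run_length = 0
    for value in values:
        # Restart the count at 1 whenever the value changes.
        run_length = run_length + 1 if value == previous else 1
        # 1s count up (1, 2, ...), 0s count down (-1, -2, ...).
        counters.append(run_length if value == 1 else -run_length)
        previous = value
    return counters


aapl = signed_run_lengths([0, 0, 1, 1])  # AAPL rows of the example
msft = signed_run_lengths([1, 1, 0, 1])  # MSFT rows of the example
print(aapl)  # [-1, -2, 1, 2]
print(msft)  # [1, 2, -1, 1]
```

Both lists match the 'Counter' values given for AAPL and MSFT in the example frame.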

Solution

Try something like:

    import pandas as pd

    df = pd.DataFrame({
        'stock': ['AAPL', 'AAPL', 'AAPL', 'AAPL',
                  'MSFT', 'MSFT', 'MSFT', 'MSFT'],
        'datetime': ['2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05',
                     '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05'],
        'Dummy': [0, 0, 1, 1, 1, 1, 0, 1]})
    df['datetime'] = pd.to_datetime(df['datetime'])
    df.set_index(['stock', 'datetime'], inplace=True)

    # Group consecutive 1 and 0 groups in each stock together
    df['group'] = df.groupby('stock')['Dummy'] \
        .transform(lambda g: g.ne(g.shift()).cumsum())

    # Set value placeholder in Counter: 1 -> 1, 0 -> -1
    df['Counter'] = df['Dummy'].apply(lambda x: 1 if x == 1 else -1)

    # Get cumsum from each stock and group
    df['Counter'] = df.groupby(['stock', 'group'])['Counter'].cumsum().astype(int)

    # Drop group column
    df = df.drop(columns='group')

    # For display
    print(df.to_string())

Output:

                      Dummy  Counter
    stock datetime                  
    AAPL  2015-01-02      0       -1
          2015-01-03      0       -2
          2015-01-04      1        1
          2015-01-05      1        2
    MSFT  2015-01-02      1        1
          2015-01-03      1        2
          2015-01-04      0       -1
          2015-01-05      1        1
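The same idea can also be wrapped in a reusable function. The sketch below is one possible variant, not the answer's own code: it labels runs with the same `ne(shift()).cumsum()` trick, but takes the position inside each run from `cumcount()` and multiplies by a sign instead of summing placeholder values. The function name `add_signed_run_counter` and its parameters are hypothetical, and the sketch assumes the rows are already sorted by date within each stock.

```python
import numpy as np
import pandas as pd


def add_signed_run_counter(df, value_col="Dummy", group_level="stock",
                           out_col="Counter"):
    """Return a copy of df with a signed consecutive-run counter in out_col.

    Assumes rows are already in date order within each group_level value.
    """
    values = df[value_col]
    # Run id per stock: increases whenever the value differs from the
    # previous row of the same stock (ids restart at 1 for every stock).
    run_id = values.groupby(level=group_level).transform(
        lambda s: s.ne(s.shift()).cumsum())
    # Position inside each (stock, run) pair, starting at 1.
    stock_key = df.index.get_level_values(group_level).to_numpy()
    position = values.groupby([stock_key, run_id.to_numpy()]).cumcount() + 1
    # 1s count up, everything else (here: 0s) counts down.
    sign = np.where(values.eq(1), 1, -1)
    result = df.copy()
    result[out_col] = position * sign
    return result


df = pd.DataFrame({
    "stock": ["AAPL"] * 4 + ["MSFT"] * 4,
    "datetime": ["2015-01-02", "2015-01-03", "2015-01-04", "2015-01-05"] * 2,
    "Dummy": [0, 0, 1, 1, 1, 1, 0, 1]})
df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index(["stock", "datetime"])

result = add_signed_run_counter(df)
print(result.to_string())
```

Because the run ids restart for every stock, the position is computed per (stock, run id) pair rather than per run id alone. If the data might not be sorted, calling `df.sort_index()` before the function would be a reasonable precaution.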

Published: 2023-11-22 06:59:45
Article link: https://www.elefans.com/category/jswz/34/1616415.html