A version of this problem was answered here (although that question uses minute-frequency data):
Counting the number of consecutive occurrences of numbers in dataframe with multi index
I have a DataFrame with a MultiIndex (stock ticker and date) and a dummy column containing 1s and 0s. For each stock, I would like every row to show how many times in a row the current value has occurred in the `Dummy` column, restarting each time the value changes: consecutive 1s count up (1, 2, 3, ...) and consecutive 0s count down (-1, -2, -3, ...). In the example below, the `Counter` column shows what I would like to create:
```python
import pandas as pd

df = pd.DataFrame({
    'stock': ['AAPL', 'AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT', 'MSFT', 'MSFT'],
    'datetime': ['2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05',
                 '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05'],
    'Dummy': [0, 0, 1, 1, 1, 1, 0, 1],
    'Counter': [-1, -2, 1, 2, 1, 2, -1, 1]})  # desired result
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index(['stock', 'datetime'], inplace=True)
```

Solution
Try something like:
```python
import pandas as pd

df = pd.DataFrame({
    'stock': ['AAPL', 'AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT', 'MSFT', 'MSFT'],
    'datetime': ['2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05',
                 '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05'],
    'Dummy': [0, 0, 1, 1, 1, 1, 0, 1]})
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index(['stock', 'datetime'], inplace=True)

# Group consecutive 1s and 0s within each stock together
df['group'] = df.groupby('stock')['Dummy'] \
    .transform(lambda g: g.ne(g.shift()).cumsum())

# Set value placeholder in Counter: 1 -> 1, 0 -> -1
df['Counter'] = df['Dummy'].apply(lambda x: 1 if x == 1 else -1)

# Get cumsum within each stock and group
df['Counter'] = df.groupby(['stock', 'group'])['Counter'].cumsum().astype(int)

# Drop the helper group column
df = df.drop(columns='group')

# For display
print(df.to_string())
```

Output:
```
                  Dummy  Counter
stock datetime                  
AAPL  2015-01-02      0       -1
      2015-01-03      0       -2
      2015-01-04      1        1
      2015-01-05      1        2
MSFT  2015-01-02      1        1
      2015-01-03      1        2
      2015-01-04      0       -1
      2015-01-05      1        1
```
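A possible variation, sketched here under the assumption that rows are sorted by stock and then by date, replaces the row-wise `apply` and the temporary `group` column with `GroupBy.cumcount` and `np.where`. Comparing each row with the previous row of the same stock gives a run id that is unique across the whole frame, `cumcount` numbers the rows inside each run, and the sign is applied afterwards. The helper `signed_run_counter` is a hypothetical name, used only to cross-check the vectorized column against a plain loop that spells out the counting rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'stock': ['AAPL'] * 4 + ['MSFT'] * 4,
    'datetime': pd.to_datetime(['2015-01-02', '2015-01-03',
                                '2015-01-04', '2015-01-05'] * 2),
    'Dummy': [0, 0, 1, 1, 1, 1, 0, 1],
}).set_index(['stock', 'datetime']).sort_index()  # assumption: sorted rows

# A new run starts whenever Dummy differs from the previous row of the SAME
# stock. Each stock's first row is compared with NaN, so it always starts a
# new run; with rows sorted by stock, the cumsum gives frame-wide unique ids.
prev = df.groupby(level='stock')['Dummy'].shift()
run_id = df['Dummy'].ne(prev).cumsum()

# Position inside each run: 1, 2, 3, ...
position = df.groupby(run_id).cumcount() + 1

# Direction of the count: +1 for runs of 1s, -1 for runs of 0s.
sign = np.where(df['Dummy'].eq(1), 1, -1)

df['Counter'] = position * sign


def signed_run_counter(values):
    """Hypothetical plain-Python reference for one stock's 0/1 sequence."""
    out = []
    for i, v in enumerate(values):
        step = 1 if v == 1 else -1
        same_run = i > 0 and values[i - 1] == v
        # Extend the current run, or restart it at +1 / -1.
        out.append(out[-1] + step if same_run else step)
    return out


# Cross-check the vectorized column against the loop, stock by stock.
for _, part in df.groupby(level='stock'):
    assert part['Counter'].tolist() == signed_run_counter(part['Dummy'].tolist())

print(df.to_string())
```

If the rows are not sorted by stock, rows from different stocks could share a run id in this sketch, which is why it calls `sort_index()` first; the original answer's grouping on both `stock` and `group` does not have that limitation.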