I have a DataFrame with a MultiIndex (stock and datetime) and a dummy column that contains 1s and 0s. For each stock and each day, I would like to count, in each row, how many times in a row the current value has occurred in the 'Dummy' column. The count starts at 1 every time the value changes, counting up for 1s and down for 0s. In the example below, the 'Counter' column shows what I would like to create:
import pandas as pd

df = pd.DataFrame(
    {
        'stock': ['AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT'],
        'datetime': ['2015-01-02 20:57', '2015-01-02 20:58', '2015-01-02 20:59', '2015-01-02 21:00',
                     '2015-01-03 20:57', '2015-01-03 20:58', '2015-01-03 20:59',
                     '2015-01-02 20:57', '2015-01-02 20:58'],
        'Dummy': [0, 0, 1, 1, 1, 1, 0, 1, 1],
        # Desired output: count up for 1s, down for 0s, restarting per stock and day
        'Counter': [-1, -2, 1, 2, 1, 2, -1, 1, 2],
    }
)
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index(['stock', 'datetime'], inplace=True)
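To make the rule concrete for one stock on one day, here is a minimal plain-Python sketch. The function name signed_run_counter is hypothetical, and it assumes the values are only 0 or 1 and already in time order.

```python
def signed_run_counter(values):
    """Signed length of the current run at each position.

    Hypothetical helper: assumes values are 0/1 and already in time order.
    Runs of 1 count up (1, 2, 3, ...); runs of 0 count down (-1, -2, -3, ...).
    """
    out = []
    run = 0
    prev = None
    for v in values:
        # Continue the run if the value repeats, otherwise restart at 1.
        run = run + 1 if v == prev else 1
        out.append(run if v == 1 else -run)
        prev = v
    return out


print(signed_run_counter([0, 0, 1, 1]))  # [-1, -2, 1, 2]
print(signed_run_counter([1, 1, 0]))     # [1, 2, -1]
```

The two calls correspond to the AAPL rows for 2015-01-02 and 2015-01-03 in the example above.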
A simpler version of this problem, one that ignores the tickers and dates, was answered here:
Counting consecutive occurrences of a number in a dataframe
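For reference, here is a minimal sketch of the single-column idiom, assuming a plain pandas Series with no stock or date grouping. It may not match the linked answer line for line, but it is the same run-id trick (diff, ne(0), cumsum) that the answer below extends.

```python
import pandas as pd

s = pd.Series([0, 0, 1, 1, 1, 0])

# Every change in value (and the first row, whose diff is NaN) starts a new run;
# cumsum turns those start flags into a run id.
run_id = s.diff().ne(0).cumsum()

# 1-based position within each run.
run_len = s.groupby(run_id).cumcount() + 1

# Keep the count for 1s, negate it for 0s.
counter = run_len.where(s == 1, -run_len)
print(counter.tolist())  # [-1, -2, 1, 2, 3, -1]
```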
Recommended answer
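Just slightly modify your previous solution: add the stock (index level 0) and the calendar date of the datetime (index level 1) as extra groupby keys next to the run id m, then negate the count for rows where Dummy is 0.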
import numpy as np

# Run id: increments every time Dummy changes value
m = df.Dummy.diff().ne(0).cumsum()
# Position within each run, restarting for every stock, calendar day and run
counters = df.groupby([df.index.get_level_values(0),
                       df.index.get_level_values(1).date,
                       m]).cumcount() + 1
# Count up for 1s and down for 0s
df['Counter'] = np.where(df['Dummy'] == 0, -1, 1) * counters

Out[95]:
                           Dummy  Counter
stock datetime
AAPL  2015-01-02 20:57:00      0       -1
      2015-01-02 20:58:00      0       -2
      2015-01-02 20:59:00      1        1
      2015-01-02 21:00:00      1        2
      2015-01-03 20:57:00      1        1
      2015-01-03 20:58:00      1        2
      2015-01-03 20:59:00      0       -1
MSFT  2015-01-02 20:57:00      1        1
      2015-01-02 20:58:00      1        2
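The frame-wide diff() and cumcount() above count in row order, so they assume the rows are already sorted by stock and time, as they are in the example. As a hedged variant (not the answerer's code; the function name add_signed_counter is hypothetical), the sketch below sorts the index first and detects run starts with a per-group groupby().shift(), so the first row of each (stock, day) group always starts a new run. It uses DatetimeIndex.normalize() instead of .date to get the calendar day, and on sorted input it should produce the same Counter values as the answer.

```python
import numpy as np
import pandas as pd


def add_signed_counter(df, col="Dummy"):
    """Return a sorted copy of df with a signed run counter per (stock, day).

    Hypothetical helper: assumes a two-level index (stock, datetime) with the
    datetime level already parsed, and a 0/1 column `col`.
    """
    # Sort so that runs are counted in time order within each stock.
    out = df.sort_index().copy()
    stock = out.index.get_level_values(0)
    day = out.index.get_level_values(1).normalize()  # timestamp -> midnight
    keys = [stock, day]

    # A run starts where the value differs from the previous row of the same
    # (stock, day) group; shift() gives NaN for each group's first row, so
    # that row always starts a new run.
    new_run = out[col].ne(out.groupby(keys)[col].shift())
    run_id = new_run.cumsum().to_numpy()

    # 1-based position within each run, negated for 0s.
    run_len = out.groupby(keys + [run_id]).cumcount() + 1
    out["Counter"] = np.where(out[col] == 1, run_len, -run_len)
    return out


df = pd.DataFrame(
    {
        "stock": ["AAPL"] * 7 + ["MSFT"] * 2,
        "datetime": pd.to_datetime(
            [
                "2015-01-02 20:57", "2015-01-02 20:58", "2015-01-02 20:59",
                "2015-01-02 21:00", "2015-01-03 20:57", "2015-01-03 20:58",
                "2015-01-03 20:59", "2015-01-02 20:57", "2015-01-02 20:58",
            ]
        ),
        "Dummy": [0, 0, 1, 1, 1, 1, 0, 1, 1],
    }
).set_index(["stock", "datetime"])

print(add_signed_counter(df))
```

Because the function works on a sorted copy, shuffling the input rows does not change the resulting counters, and the caller's frame is left untouched.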