我有以下数据框:
Hotel_id Month_Year Chef_Id Chef_is_masterchef Transition 2400188 February-2018 4597566 1 0 2400188 March-2018 4597566 1 0 2400188 April-2018 4597566 1 0 2400188 May-2018 4597566 1 0 2400188 June-2018 4597566 1 0 2400188 July-2018 4597566 1 0 2400188 August-2018 4597566 1 0 2400188 September-2018 4597566 0 1 2400188 October-2018 4597566 0 0 2400188 November-2018 4597566 0 0 2400188 December-2018 4597566 0 0 2400188 January-2019 4597566 0 0 2400188 February-2019 4597566 0 0 2400188 March-2019 4597566 0 0 2400188 April-2019 4597566 0 0 2400188 May-2019 4597566 0 0 2400614 May-2015 2297544 0 0 2400614 June-2015 2297544 0 0 2400614 July-2015 2297544 0 0 2400614 August-2015 2297544 0 0 2400614 September-2015 2297544 0 0 2400614 October-2015 2297544 0 0 2400614 November-2015 2297544 0 0 2400614 December-2015 2297544 0 0 2400614 January-2016 2297544 1 1 2400614 February-2016 2297544 1 0 2400614 March-2016 2297544 1 0 3400624 May-2016 2597531 0 0 3400624 June-2016 2597531 0 0 3400624 July-2016 2597531 0 0 3400624 August-2016 2597531 1 1 2400133 February-2016 4597531 0 0 2400133 March-2016 4597531 0 0 2400133 April-2016 4597531 0 0 2400133 May-2016 4597531 0 0 2400133 June-2016 4597531 0 0 2400133 July-2016 4597531 0 0 2400133 August-2016 4597531 1 1 2400133 September-2016 4597531 1 0 2400133 October-2016 4597531 1 0 2400133 November-2016 4597531 1 0 2400133 December-2016 4597531 1 0 2400133 January-2017 4597531 1 0 2400133 February-2017 4597531 1 0 2400133 March-2017 4597531 1 0 2400133 April-2017 4597531 1 0 2400133 May-2017 4597531 1 0当 Chef_is_Masterchef 列中从 0 到 1 或 1 到 0 的转换发生时,此转换会在 >Transition 列作为 1.
When the transition takes place from 0 to 1 or 1 to 0 in the Chef_is_Masterchef column, this transition is indicated in the Transition column as 1.
实际上,我想创建另一列(名为Var"),其中的值将按照下面提到的原始数据框填充,
Actually, I thought of creating another column (named as "Var") where the values will be filled as mentioned below for the original data frame,
预期数据框:
Hotel_id Month_Year Chef_Id Chef_is_masterchef Transition Var 2400188 February-2018 4597566 1 0 -7 2400188 March-2018 4597566 1 0 -6 2400188 April-2018 4597566 1 0 -5 2400188 May-2018 4597566 1 0 -4 2400188 June-2018 4597566 1 0 -3 2400188 July-2018 4597566 1 0 -2 2400188 August-2018 4597566 1 0 -1 2400188 September-2018 4597566 0 1 0 2400188 October-2018 4597566 0 0 1 2400188 November-2018 4597566 0 0 2 2400188 December-2018 4597566 0 0 3 2400188 January-2019 4597566 0 0 4 2400188 February-2019 4597566 0 0 5 2400188 March-2019 4597566 0 0 6 2400188 April-2019 4597566 0 0 7 2400188 May-2019 4597566 0 0 8 2400614 May-2015 2297544 0 0 -8 2400614 June-2015 2297544 0 0 -7 2400614 July-2015 2297544 0 0 -6 2400614 August-2015 2297544 0 0 -5 2400614 September-2015 2297544 0 0 -4 2400614 October-2015 2297544 0 0 -3 2400614 November-2015 2297544 0 0 -2 2400614 December-2015 2297544 0 0 -1 2400614 January-2016 2297544 1 1 0 2400614 February-2016 2297544 1 0 1 2400614 March-2016 2297544 1 0 2 3400624 May-2016 2597531 0 0 -3 3400624 June-2016 2597531 0 0 -2 3400624 July-2016 2597531 0 0 -1 3400624 August-2016 2597531 1 1 0 2400133 February-2016 4597531 0 0 -6 2400133 March-2016 4597531 0 0 -5 2400133 April-2016 4597531 0 0 -4 2400133 May-2016 4597531 0 0 -3 2400133 June-2016 4597531 0 0 -2 2400133 July-2016 4597531 0 0 -1 2400133 August-2016 4597531 1 1 0 2400133 September-2016 4597531 1 0 1 2400133 October-2016 4597531 1 0 2 2400133 November-2016 4597531 1 0 3 2400133 December-2016 4597531 1 0 4 2400133 January-2017 4597531 1 0 5 2400133 February-2017 4597531 1 0 6 2400133 March-2017 4597531 1 0 7 2400133 April-2017 4597531 1 0 8 2400133 May-2017 4597531 1 0 9如果观察到,在 Var 列中的转换点,我将值设为零,并且对于我维护相应整数值之前和之后的行.
If observed, at the point of transition in the Var column I am giving the value as zero and for the rows before and after I am maintaining the corresponding integer values.
但是在使用以下代码后,我在 Var 列中遇到了问题,
s = df['Chef_is_masterchef'].eq(0).groupby(df['Chef_Id']).transform('sum') df['var'] = df.groupby('Chef_Id').cumcount().sub(s)以上代码的输出:
Hotel_id Month_Year Chef_Id Chef_is_masterchef Transition Var 2400188 February-2018 4597566 1 0 -9 2400188 March-2018 4597566 1 0 -8 2400188 April-2018 4597566 1 0 -7 2400188 May-2018 4597566 1 0 -6 2400188 June-2018 4597566 1 0 -5 2400188 July-2018 4597566 1 0 -4 2400188 August-2018 4597566 1 0 -3 2400188 September-2018 4597566 0 1 -2 2400188 October-2018 4597566 0 0 -1 2400188 November-2018 4597566 0 0 0 2400188 December-2018 4597566 0 0 1 2400188 January-2019 4597566 0 0 2 2400188 February-2019 4597566 0 0 3 2400188 March-2019 4597566 0 0 4 2400188 April-2019 4597566 0 0 5 2400188 May-2019 4597566 0 0 6 2400614 May-2015 2297544 0 0 -8 2400614 June-2015 2297544 0 0 -7 2400614 July-2015 2297544 0 0 -6 2400614 August-2015 2297544 0 0 -5 2400614 September-2015 2297544 0 0 -4 2400614 October-2015 2297544 0 0 -3 2400614 November-2015 2297544 0 0 -2 2400614 December-2015 2297544 0 0 -1 2400614 January-2016 2297544 1 1 0 2400614 February-2016 2297544 1 0 1 2400614 March-2016 2297544 1 0 2 3400624 May-2016 2597531 0 0 -3 3400624 June-2016 2597531 0 0 -2 3400624 July-2016 2597531 0 0 -1 3400624 August-2016 2597531 1 1 0 2400133 February-2016 4597531 0 0 -6 2400133 March-2016 4597531 0 0 -5 2400133 April-2016 4597531 0 0 -4 2400133 May-2016 4597531 0 0 -3 2400133 June-2016 4597531 0 0 -2 2400133 July-2016 4597531 0 0 -1 2400133 August-2016 4597531 1 1 0 2400133 September-2016 4597531 1 0 1 2400133 October-2016 4597531 1 0 2 2400133 November-2016 4597531 1 0 3 2400133 December-2016 4597531 1 0 4 2400133 January-2017 4597531 1 0 5 2400133 February-2017 4597531 1 0 6 2400133 March-2017 4597531 1 0 7 2400133 April-2017 4597531 1 0 8 2400133 May-2017 4597531 1 0 9如果观察到,对于 Chef_Id = 4597566,您可以在转换点看到 Var 列中的值不同而不是零.
If Observed, for the Chef_Id = 4597566 you can see at the point of transition the value is different instead of zero in the Var column.
这会产生一个问题,因为在转换点,我必须为每个 ID 选择最多 3 个月前和 2 个月后的行.同样在转换点,我必须使用以下代码为每个 id 选择最多 6 个月之前和 5 个月之后的行:
This creates a problem because, at the point of transition, I have to select rows including up to 3 months before and 2 months after for each id. Also at the point of transition, I have to select rows including up to 6 months before and 5 months after for each id using the below code:
df1 = df[df['var'].between(-3, 2)] print (df1) df2 = df[df['var'].between(-6, 5)] print (df2)所以请告诉我解决方案.
So please let me know the solution.
提前致谢!
推荐答案使用 GroupBy.cumcount 用于每组的计数器,然后通过 比较减去 0 值的数量0 和 GroupBy.transform:
Use GroupBy.cumcount for counter per groups and then subtract number of 0 values by compare by 0 and GroupBy.transform:
s = df['Chef_is_masterchef'].eq(0).groupby(df['Chef_Id']).transform('sum') df['var'] = df.groupby('Chef_Id').cumcount().sub(s) print (df) Hotel_id Month_Year Chef_Id Chef_is_masterchef Transition var 0 2400614 May-2015 2297544 0 0 -8 1 2400614 June-2015 2297544 0 0 -7 2 2400614 July-2015 2297544 0 0 -6 3 2400614 August-2015 2297544 0 0 -5 4 2400614 September-2015 2297544 0 0 -4 5 2400614 October-2015 2297544 0 0 -3 6 2400614 November-2015 2297544 0 0 -2 7 2400614 December-2015 2297544 0 0 -1 8 2400614 January-2016 2297544 1 1 0 9 2400614 February-2016 2297544 1 0 1 10 2400614 March-2016 2297544 1 0 2 11 3400624 May-2016 2597531 0 0 -3 12 3400624 June-2016 2597531 0 0 -2 13 3400624 July-2016 2597531 0 0 -1 14 3400624 August-2016 2597531 1 1 0 15 2400133 February-2016 4597531 0 0 -6 16 2400133 March-2016 4597531 0 0 -5 17 2400133 April-2016 4597531 0 0 -4 18 2400133 May-2016 4597531 0 0 -3 19 2400133 June-2016 4597531 0 0 -2 20 2400133 July-2016 4597531 0 0 -1 21 2400133 August-2016 4597531 1 1 0 22 2400133 September-2016 4597531 1 0 1 23 2400133 October-2016 4597531 1 0 2 24 2400133 November-2016 4597531 1 0 3 25 2400133 December-2016 4597531 1 0 4 26 2400133 January-2017 4597531 1 0 5 27 2400133 February-2017 4597531 1 0 6 28 2400133 March-2017 4597531 1 0 7 29 2400133 April-2017 4597531 1 0 8 30 2400133 May-2017 4597531 1 0 9最后过滤器 系列.之间:
Last filter by Series.between:
df1 = df[df['var'].between(-3, 2)] print (df1) df2 = df[df['var'].between(-6, 5)] print (df2)更多推荐
如何根据每个id的条件选择行
发布评论