
Fill the column using the previous value in the column and some calculations in pandas

My dataframe looks like this:

```
    scale  cons  hold  supply  add.supply  s_res  z_res
48     -5   NaN   NaN     NaN         NaN    NaN    NaN
49     -4   NaN   NaN     NaN         NaN    NaN    NaN
50     -3   NaN   NaN     NaN         NaN    NaN    NaN
51     -2   NaN   NaN     NaN         NaN    NaN    NaN
52     -1   NaN   NaN     NaN         NaN    NaN    NaN
53      0     0   300       0         NaN    100    200
54      1    20   NaN       0         NaN    200    322
55      2    30   NaN      70         NaN    100    100
56      3    25   NaN       0         NaN    400    110
57      4    15   NaN       0         NaN    100    300
58      5    10   NaN       0         NaN    100    180
59      6    40   NaN       0         NaN    100    100
...
```

I need to do the following:

Starting from the row where scale = 1, fill the hold column with values calculated as follows:

I take the previous value in the hold column, subtract the current row's value from the cons column, and add the current row's value from the supply column.

(For the hold cell that corresponds to scale = 1 it will be (300 - 20) + 0 = 280, for the next cell (280 - 30) + 70 = 320, for the next cell (320 - 25) + 0 = 295, and so on.)
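Leaving the s_res condition aside for a moment, the rule above is a plain running balance. A minimal pure-Python sketch of just that recurrence, using the cons and supply values for scale 0 to 6 from the table (the list names are illustrative, not from the question):

```python
# cons and supply for scale = 0..6, copied from the table above
cons = [0, 20, 30, 25, 15, 10, 40]
supply = [0, 0, 70, 0, 0, 0, 0]

hold = [300]  # starting value at scale = 0
for c, s in zip(cons[1:], supply[1:]):
    # previous hold, minus current consumption, plus current supply
    hold.append(hold[-1] - c + s)

print(hold)  # [300, 280, 320, 295, 280, 270, 230]
```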

If the value in the hold column is less than the corresponding value in the s_res column, then for the next cell I must also add the difference between the next row's z_res and s_res values.

For example, the value in the hold column is 295 where scale = 3. This is less than the value in the s_res column (400). So the next value is calculated as (295 - 15) + 0 + (300 - 100) = 480, and this difference between z_res and s_res (200) is written to the add.supply column.

Every newly calculated value in the hold column must be checked against the s_res value in the same row.
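Putting both parts of the rule together, here is a minimal row-by-row sketch in plain Python. It assumes the reading given by the worked example: when hold in row i-1 is below s_res in row i-1, row i gets z_res[i] - s_res[i] added and recorded in add.supply. The function name `fill_hold` is hypothetical.

```python
def fill_hold(start, cons, supply, s_res, z_res):
    """Return (hold, add_supply) lists for rows 0..n-1, with hold[0] = start."""
    n = len(cons)
    hold = [start] + [None] * (n - 1)
    add_supply = [None] * n
    for i in range(1, n):
        # base rule: previous hold - current cons + current supply
        value = hold[i - 1] - cons[i] + supply[i]
        # previous hold below its s_res: top up with this row's z_res - s_res
        if hold[i - 1] < s_res[i - 1]:
            add_supply[i] = z_res[i] - s_res[i]
            value += add_supply[i]
        hold[i] = value
    return hold, add_supply

# rows scale = 0..6 from the table above
cons   = [0, 20, 30, 25, 15, 10, 40]
supply = [0, 0, 70, 0, 0, 0, 0]
s_res  = [100, 200, 100, 400, 100, 100, 100]
z_res  = [200, 322, 100, 110, 300, 180, 100]

hold, add_supply = fill_hold(300, cons, supply, s_res, z_res)
print(hold)        # [300, 280, 320, 295, 480, 470, 430]
print(add_supply)  # [None, None, None, None, 200, None, None]
```

Because each step reads the already-updated previous hold, any top-up automatically feeds into every later comparison.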

The result should look like this:

```
    scale  cons  hold  supply  add.supply  s_res  z_res
48     -5   NaN   NaN     NaN         NaN    NaN    NaN
49     -4   NaN   NaN     NaN         NaN    NaN    NaN
50     -3   NaN   NaN     NaN         NaN    NaN    NaN
51     -2   NaN   NaN     NaN         NaN    NaN    NaN
52     -1   NaN   NaN     NaN         NaN    NaN    NaN
53      0     0   300       0         NaN    100    200
54      1    20   280       0         NaN    200    322
55      2    30   320      70         NaN    100    100
56      3    25   295       0         NaN    400    110
57      4    15   480       0         200    100    300
58      5    10   470       0         NaN    100    180
59      6    40   430       0         NaN    100    100
...
```

I would be grateful for any advice.

UPD: I tried applying this code to a larger dataframe and ran into problems:

```python
df['hold'] = df.hold.fillna(method='ffill') - df.cons.cumsum() + df.supply.cumsum()
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)
df['hold'] = df.hold + df['add.supply'].fillna(0).cumsum()
```

My new dataframe:

```
    scale  cons  hold  supply  add.supply  s_res  z_res
 0      0     0   300       0         NaN    100    200
 1      1    20   NaN       0         NaN    200    322
 2      2    30   NaN      70         NaN    100    100
 3      3    25   NaN       0         NaN    400    110
 4      4    15   NaN       0         NaN    100    300
 5      5    10   NaN       0         NaN    100    180
 6      6    40   NaN       0         NaN    100    100
 7      7    60   NaN       0         NaN    300    400
 8      8    50   NaN       0         NaN    245    300
 9      9    70   NaN       0         NaN    300    600
10     10    50   NaN       0         NaN    143    228
...
```

The result should be the following:

```
    scale  cons  hold  supply  add.supply  s_res  z_res
 0      0     0   300       0         NaN    100    200
 1      1    20   280       0         NaN    200    322
 2      2    30   320      70         NaN    100    100
 3      3    25   295       0         NaN    400    110
 4      4    15   480       0         200    100    300
 5      5    10   470       0         NaN    100    180
 6      6    40   430       0         NaN    100    100
 7      7    60   370       0         NaN    300    400
 8      8    50   320       0         NaN    245    300
 9      9    70   250       0         NaN    300    600
10     10    50   285       0          85    143    228
...
```

But the code produced this instead:

```
    scale  cons  hold  supply  add.supply  s_res  z_res
 0      0     0   300       0         NaN    100    200
 1      1    20   280       0         NaN    200    322
 2      2    30   320      70         NaN    100    100
 3      3    25   295       0         NaN    400    110
 4      4    15   480       0         200    100    300
 5      5    10   470       0         NaN    100    180
 6      6    40   430       0         NaN    100    100
 7      7    60   370       0         NaN    300    400
 8      8    50   375       0          55    245    300
 9      9    70   605       0         300    300    600
10     10    50   640       0          85    143    228
...
```

The results start to diverge right after hold = 370 (scale 8), but I don't understand why.
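One way to see where the divergence comes from is to print the intermediate "stage 1" hold that the attempted code compares against s_res. This sketch rebuilds only the columns it needs from the larger table and uses the constant 300 in place of `df.hold.fillna(method='ffill')`, which yields the same value in every row here:

```python
import pandas as pd

# the "new dataframe" from the question (scale 0..10), only the columns needed here
df = pd.DataFrame({
    "cons":   [0, 20, 30, 25, 15, 10, 40, 60, 50, 70, 50],
    "supply": [0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 0],
    "s_res":  [100, 200, 100, 400, 100, 100, 100, 300, 245, 300, 143],
})

# stage 1 of the attempted code: running balance with no top-ups applied yet
stage1 = 300 - df.cons.cumsum() + df.supply.cumsum()

# the s_res check is evaluated on stage 1, not on the final hold
triggered = stage1 < df.s_res
print(stage1.tolist())                     # [300, 280, 320, 295, 280, 270, 230, 170, 120, 50, 0]
print(triggered[triggered].index.tolist())  # [3, 7, 8, 9, 10]

# at scale 7 the check sees 170, while the real hold (with the +200 from scale 4) is 370
print(stage1[7], stage1[7] + 200, df.s_res[7])  # 170 370 300
```

The check at scale 7 fires because 170 < 300, so 55 is added at scale 8 even though the real hold, 370, is not below 300. From then on each wrongly triggered top-up is accumulated, producing 375, 605 and 640.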

Best answer

Instead of doing this row by row, you can use a combination of cumsum() and np.where to do this across the whole DataFrame:

```python
import numpy as np
import pandas as pd

df['hold'] = df.hold.ffill() - df.cons.cumsum() + df.supply.cumsum()  # stage 1
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)
df['hold'] = df.hold + df['add.supply'].fillna(0).cumsum()  # stage 2: carry top-ups forward
```

Think of the transformation in two stages. First, you add to and subtract from the initial value of df.hold. Then, in some rows, you adjust that new hold value according to your conditions.

cumsum() takes a Series or DataFrame and returns a new one in which each row is the cumulative sum of all previous rows plus the current row. Applying it to df.cons and df.supply gives the cumulative amounts to subtract from and add to df.hold. That gives you the first stage of df.hold.

You can use np.where to find the rows where df.hold meets your condition and set df['add.supply'] in those rows. Then you add this new column to df.hold. Note that fillna(0) makes sure every row has a value, and cumsum() is used again so that each added value carries forward to all later rows.
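To make stage 1 concrete, here is a small sketch with the cons and supply values for scale 0 to 6 and the starting hold of 300 (the standalone Series stand in for the DataFrame columns):

```python
import pandas as pd

cons = pd.Series([0, 20, 30, 25, 15, 10, 40])
supply = pd.Series([0, 0, 70, 0, 0, 0, 0])

print(cons.cumsum().tolist())    # [0, 20, 50, 75, 90, 100, 140]
print(supply.cumsum().tolist())  # [0, 0, 70, 70, 70, 70, 70]

# stage 1: start value minus everything consumed so far plus everything supplied so far
stage1 = 300 - cons.cumsum() + supply.cumsum()
print(stage1.tolist())           # [300, 280, 320, 295, 280, 270, 230]
```

This is exactly the running balance (previous hold - cons + supply), just expressed as totals from the first row instead of row-to-row steps.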
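A small sketch of stage 2 on the first example (scale 0 to 6), starting from the stage-1 values, shows how `fillna(0).cumsum()` keeps the 200 added at scale 4 in every later row:

```python
import numpy as np
import pandas as pd

stage1 = pd.Series([300, 280, 320, 295, 280, 270, 230])
s_res = pd.Series([100, 200, 100, 400, 100, 100, 100])
z_res = pd.Series([200, 322, 100, 110, 300, 180, 100])

# previous row below its s_res -> this row gets z_res - s_res, otherwise NaN
add_supply = pd.Series(np.where(stage1.shift() < s_res.shift(), z_res - s_res, np.nan))

# NaN -> 0, then a running total so each top-up stays in all later rows
carried = add_supply.fillna(0).cumsum()
hold = stage1 + carried
print(carried.tolist())  # [0.0, 0.0, 0.0, 0.0, 200.0, 200.0, 200.0]
print(hold.tolist())     # [300.0, 280.0, 320.0, 295.0, 480.0, 470.0, 430.0]
```

On this small table only one top-up happens, so the result matches. The comparison, however, is still made against stage-1 values, which is why this breaks once several top-ups interact.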

UPDATE

The original code above breaks once a single add.supply value has been added, because the later stage-1 values of df.hold don't include it yet. There may be a way to do this non-iteratively, and there's certainly a better and cleaner way than what I've done below, but this at least gets the job done:

```python
import numpy as np
import pandas as pd

df['hold'] = df.hold.ffill() - df.cons.cumsum() + df.supply.cumsum()  # stage 1
hold = df.hold.tolist()
s_res = df.s_res.tolist()
add = (df.z_res - df.s_res).shift(-1).tolist()  # top-up for the *next* row
newh = []
totala = 0
for h, s, a in zip(hold, s_res, add):
    newh.append(h + totala)  # stage-1 value plus all top-ups so far
    if newh[-1] < s:
        totala += a
df['hold'] = newh  # assign by position so a non-default index still lines up
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)
```
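If you prefer a single pass that fills both columns at once, a sketch along these lines should work. It assumes the column names from the question; `fill_hold_column` and its `start_pos` parameter (the position, not the label, of the row holding the initial hold value) are hypothetical names. Results are written back by position, so an index such as 48..59 does not cause misalignment.

```python
import numpy as np
import pandas as pd

def fill_hold_column(df, start_pos):
    """Return a copy of df with 'hold' and 'add.supply' filled row by row."""
    cons = df["cons"].to_numpy(dtype=float)
    supply = df["supply"].to_numpy(dtype=float)
    s_res = df["s_res"].to_numpy(dtype=float)
    z_res = df["z_res"].to_numpy(dtype=float)

    hold = df["hold"].to_numpy(dtype=float, copy=True)  # copy: never mutate df
    add = np.full(len(df), np.nan)
    for i in range(start_pos + 1, len(df)):
        value = hold[i - 1] - cons[i] + supply[i]
        # the check uses the final (already topped-up) previous hold
        if hold[i - 1] < s_res[i - 1]:
            add[i] = z_res[i] - s_res[i]
            value += add[i]
        hold[i] = value

    out = df.copy()
    out["hold"] = hold
    out["add.supply"] = add
    return out

nan = np.nan
df = pd.DataFrame({
    "scale":      list(range(11)),
    "cons":       [0, 20, 30, 25, 15, 10, 40, 60, 50, 70, 50],
    "hold":       [300] + [nan] * 10,
    "supply":     [0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 0],
    "add.supply": [nan] * 11,
    "s_res":      [100, 200, 100, 400, 100, 100, 100, 300, 245, 300, 143],
    "z_res":      [200, 322, 100, 110, 300, 180, 100, 400, 300, 600, 228],
})

result = fill_hold_column(df, start_pos=0)
print(result["hold"].tolist())
# [300.0, 280.0, 320.0, 295.0, 480.0, 470.0, 430.0, 370.0, 320.0, 250.0, 285.0]
print(result["add.supply"].tolist())
```

For the first dataframe in the question (index 48..59), the starting row is at position 5, so the call would be `fill_hold_column(df, start_pos=5)`.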