Creating an Excel-like SUMIFS in Pandas

Updated: 2024-10-18 22:34:43
Problem description

I recently learned about pandas and was happy to see its analytics functionality. I am trying to convert Excel array functions into their pandas equivalents, so that I can automate spreadsheets I built for producing performance attribution reports. In this example, I created a new column in Excel based on conditions within other columns:

={SUMIFS($F$10:$F$4518,$A$10:$A$4518,$C$4,$B$10:$B$4518,0,$C$10:$C$4518," ",$D$10:$D$4518,$D10,$E$10:$E$4518,$E10)}

The formula sums the values in the "F" array (security weights) subject to several conditions: the "A" array (portfolio ID) equals a certain number, the "B" array (security ID) is zero, the "C" array (group description) is " ", the "D" array (start date) equals the start date of the current row, and the "E" array (end date) equals the end date of the current row.

In pandas, I am using a DataFrame. Creating a new column from the first three conditions is straightforward, but I am having difficulty with the last two conditions.

reportAggregateDF['PORT_WEIGHT'] = reportAggregateDF['SEC_WEIGHT_RATE'][
    (reportAggregateDF['PORT_ID'] == portID)
    & (reportAggregateDF['SEC_ID'] == 0)
    & (reportAggregateDF['GROUP_LIST'] == " ")
    & (reportAggregateDF['START_DATE'] == reportAggregateDF['START_DATE'].ix[:])
    & (reportAggregateDF['END_DATE'] == reportAggregateDF['END_DATE'].ix[:])
].sum()

Obviously the .ix[:] in the last two conditions is not doing anything for me, but is there a way to make the sum conditional on the row that I am on without looping? My goal is to avoid loops entirely and use purely vectorized operations.

Recommended answer

You want to use the apply function and a lambda:

>>> df
       A  B    C    D     E
0  mitfx  0  200  300  0.25
1     gs  1  150  320  0.35
2    duk  1    5    2  0.45
3    bmo  1  145   65  0.65

Let's say I want to sum column C times column E, but only for rows where column B == 1 and column D is greater than 5:

df['matches'] = df.apply(lambda x: x['C'] * x['E'] if x['B'] == 1 and x['D'] > 5 else 0, axis=1)
df.matches.sum()

It might be cleaner to split this into two steps, filtering first and then computing:

df_subset = df[(df.B == 1) & (df.D > 5)]
df_subset.apply(lambda x: x.C * x.E, axis=1).sum()

Or simply use column-wise multiplication for speed:

df_subset = df[(df.B == 1) & (df.D > 5)]
# Multiply the two columns element-wise, then sum the resulting Series
print((df_subset.C * df_subset.E).sum())
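For readers who want to run the comparison, here is a minimal self-contained sketch that rebuilds the sample frame shown above and checks that the row-wise apply and the vectorized multiplication agree. The names mask, total_apply, and total_vectorized are illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the answer's printed DataFrame
df = pd.DataFrame({
    "A": ["mitfx", "gs", "duk", "bmo"],
    "B": [0, 1, 1, 1],
    "C": [200, 150, 5, 145],
    "D": [300, 320, 2, 65],
    "E": [0.25, 0.35, 0.45, 0.65],
})

# Boolean mask for the two conditions (illustrative name)
mask = (df["B"] == 1) & (df["D"] > 5)

# Row-wise version: evaluates the lambda once per row
total_apply = df.apply(
    lambda x: x["C"] * x["E"] if x["B"] == 1 and x["D"] > 5 else 0,
    axis=1,
).sum()

# Vectorized version: filter once, multiply whole columns, then sum
total_vectorized = (df.loc[mask, "C"] * df.loc[mask, "E"]).sum()

# Only rows 1 and 3 match, so both totals should be about 146.75
print(total_apply, total_vectorized)
```

On a frame this small the difference is negligible; the vectorized form tends to matter on frames with thousands of rows, such as the 4,500-row range in the question.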

You are absolutely right to want to solve this problem without loops.
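The answer above covers fixed conditions, but not the per-row date conditions in the question. One possible way to get that SUMIFS behaviour without loops is to zero out the rows that fail the fixed conditions and then use groupby with transform over the two date columns. This is only a sketch under assumptions: the column names come from the question's code, while the sample rows, the port_id value, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the question's code
df = pd.DataFrame({
    "PORT_ID": [1, 1, 1, 1, 2],
    "SEC_ID": [0, 0, 5, 0, 0],
    "GROUP_LIST": [" ", " ", " ", " ", " "],
    "START_DATE": pd.to_datetime(
        ["2013-01-01", "2013-01-01", "2013-01-01", "2013-02-01", "2013-02-01"]
    ),
    "END_DATE": pd.to_datetime(
        ["2013-01-31", "2013-01-31", "2013-01-31", "2013-02-28", "2013-02-28"]
    ),
    "SEC_WEIGHT_RATE": [0.25, 0.5, 0.125, 0.5, 1.0],
})

port_id = 1  # assumed stand-in for the question's portID / $C$4

# Fixed conditions: the first three SUMIFS criteria
fixed = (
    (df["PORT_ID"] == port_id)
    & (df["SEC_ID"] == 0)
    & (df["GROUP_LIST"] == " ")
)

# Rows failing the fixed criteria contribute 0 to every sum
contrib = df["SEC_WEIGHT_RATE"].where(fixed, 0.0)

# Per-row criteria: sum within each (START_DATE, END_DATE) pair and
# broadcast the group total back onto every row of that group
df["PORT_WEIGHT"] = contrib.groupby(
    [df["START_DATE"], df["END_DATE"]]
).transform("sum")

print(df[["START_DATE", "END_DATE", "SEC_WEIGHT_RATE", "PORT_WEIGHT"]])
```

As in the Excel formula, every row receives the total for its own date pair, including rows that do not themselves satisfy the fixed conditions.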

Published: 2023-11-22 06:38:25
Link: https://www.elefans.com/category/jswz/34/1616349.html