I have a dataframe that looks like this:
a1 | a2 | b3 | b4 | b5 | c | d
1 | 2 | 3 | 4 | 5 | 1 | 1
1 | 4 | 5 | 3 | 2 | 0 | 0
2 | 3 | 1 | 1 | 0 | 0 | 0
I want to create two columns, "a_count", and "b_count".
For each row where the value of "d" is 1 OR "c" is 1:
- "a_count" should represent the number of times '1' appears in a1 or a2
- "b_count" should represent the number of times '1' appears in b3/b4/b5
If both 'd' and 'c' are 0 it should just be a 0.
So the resulting output would look like...
a1 | a2 | b3 | b4 | b5 | c | d | a_count | b_count
1 | 2 | 3 | 4 | 5 | 0 | 0 | 0 | 0
1 | 4 | 5 | 3 | 2 | 1 | 0 | 1 | 0
1 | 1 | 1 | 1 | 0 | 0 | 1 | 2 | 2
It's fine if I compute a_count and b_count separately. I guess I could use a combination of np.where, etc. but I think what confused me was figuring out how to get a count within either columns a1/a2 or b3/b4/b5 where the respective values were 1 AND the condition for c and d was met.
Maybe it's a straightforward question but my brain is just fried right now :( If it is too trivial can someone just point me in the right direction?
Recommended answer:
Yes, np.where is a good choice for this problem.
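import numpy as np

# Rows where both c and d are 0 get 0; otherwise count the 1s across the group's columns.
df['a_count'] = np.where((df['c'] == 0) & (df['d'] == 0), 0, (df[['a1', 'a2']] == 1).sum(axis=1))
df['b_count'] = np.where((df['c'] == 0) & (df['d'] == 0), 0, (df[['b3', 'b4', 'b5']] == 1).sum(axis=1))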
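To check the approach end to end, here is a minimal self-contained sketch. It assumes the input values from the expected-output table in the question (the a/b/c/d columns of that table), and the helper names `both_zero`, `a_ones` and `b_ones` are only illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed input: the a/b/c/d columns of the question's expected-output table.
df = pd.DataFrame({
    "a1": [1, 1, 1],
    "a2": [2, 4, 1],
    "b3": [3, 5, 1],
    "b4": [4, 3, 1],
    "b5": [5, 2, 0],
    "c":  [0, 1, 0],
    "d":  [0, 0, 1],
})

# True for rows where both flags are 0; those rows should get a count of 0.
both_zero = (df["c"] == 0) & (df["d"] == 0)

# Comparing to 1 gives a boolean frame; summing along axis=1 counts the 1s per row.
a_ones = (df[["a1", "a2"]] == 1).sum(axis=1)
b_ones = (df[["b3", "b4", "b5"]] == 1).sum(axis=1)

# np.where(condition, value_if_true, value_if_false), evaluated element-wise.
df["a_count"] = np.where(both_zero, 0, a_ones)
df["b_count"] = np.where(both_zero, 0, b_ones)

print(df)
```

With this input, a_count comes out as 0, 1, 2 and b_count as 0, 0, 2, matching the table in the question. If you would rather stay within pandas, `a_ones.where(~both_zero, 0)` should give the same values.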