Pandas OR statement ending in series contains

Updated: 2024-06-14 17:01:31

I have a DataFrame df with columns type and subtype and about 100k rows. I'm trying to classify what kind of data df contains by checking type/subtype combinations. While df can contain many different combinations, there are particular combinations that only appear in certain data types. To check whether my object contains any of these combinations, I'm currently doing:

    typeA = (
        ((df.type == 0) & ((df.subtype == 2) | (df.subtype == 3) | (df.subtype == 5) | (df.subtype == 6)))
        | ((df.type == 5) & ((df.subtype == 3) | (df.subtype == 4) | (df.subtype == 7) | (df.subtype == 8)))
    )
    A = typeA.sum()

Here typeA is a long Series of False values that may contain some True values; if A > 0, I know it contained at least one True. The problem with this scheme is that even if the first row of df produces a True, it still has to check every other row. Checking the whole DataFrame is faster than using a for loop with a break, but I'm wondering if there is a better way to do it.

Thanks for any suggestions.

Best answer

Use crosstab:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randint(0, 10, size=(100, 2)), columns=["type", "subtype"])
    # Count how many rows fall into each (type, subtype) pair.
    counts = pd.crosstab(df.type, df.subtype)
    # Sum the counts for the combinations of interest.
    print(counts.loc[0, [2, 3, 5, 6]].sum() + counts.loc[5, [3, 4, 7, 8]].sum())

The result is the same as:

    a = (
        ((df.type == 0) & ((df.subtype == 2) | (df.subtype == 3) | (df.subtype == 5) | (df.subtype == 6)))
        | ((df.type == 5) & ((df.subtype == 3) | (df.subtype == 4) | (df.subtype == 7) | (df.subtype == 8)))
    )
    a.sum()
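As a side note that is not part of the original answer, the long chain of `|` comparisons can probably be written more compactly with `Series.isin`, and `.any()` states the "does it contain at least one match" intent more directly than `.sum() > 0`. The sketch below is a minimal illustration under assumptions: the seeded random data is hypothetical, and the `reindex` call is an added safeguard so that a type or subtype value missing from the data counts as zero instead of raising a `KeyError` in `.loc`. Note that the mask is still computed over every row; this shortens the expression rather than short-circuiting the scan.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the fixed seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(100, 2)), columns=["type", "subtype"])

# Same type/subtype combinations as the long expression, written with isin.
mask = (
    ((df["type"] == 0) & df["subtype"].isin([2, 3, 5, 6]))
    | ((df["type"] == 5) & df["subtype"].isin([3, 4, 7, 8]))
)

# crosstab variant; reindex fills in any absent labels with a count of zero.
counts = pd.crosstab(df["type"], df["subtype"]).reindex(
    index=range(10), columns=range(10), fill_value=0
)
crosstab_total = int(
    counts.loc[0, [2, 3, 5, 6]].sum() + counts.loc[5, [3, 4, 7, 8]].sum()
)

mask_total = int(mask.sum())        # number of matching rows
contains_type_a = bool(mask.any())  # True if at least one row matches

print(mask_total, crosstab_total, contains_type_a)
```

Both totals should agree, since the crosstab sums count exactly the rows that the mask marks as True.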

Published: 2023-04-20 16:17:00