This question is essentially the same as the following ones, with one more twist:
- Pandas: Replacing column values in dataframe
- Conditional Substitution of values in pandas dataframe columns
So, I want to set, or conditionally set, pandas dataframe column values. The added complexity is that, instead of addressing the dataframe columns with a string constant (df['data1']), I need to address them with variables (df[var_for_data1]), because my df column names are constructed.
Here is a much simplified example to explain what I want:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'data1': np.random.randn(100), 'data2': np.random.randn(100)})
    print(df.head())

    Col = 'data1'
    print(df[Col].head())

    df.data1 = df.data1 + .1
    print(df[Col].head())

    # so far so good, now how to do above with variable dataframe column name `Col`
    #df.Col = df.Col + .1
The question is in the code: so far so good, but how do I do the above with the variable dataframe column name Col?
The next question is how to add a condition to the above assignment, say to apply it only where df.data1 >= .25 and df.data1 <= .35, again expressed using the variable dataframe column name Col.
Recommended answer
You can use square brackets to access a column by its name as a string rather than as an attribute. I also strongly recommend that you ditch the habit of accessing columns by attribute, as it can lead to confusing behaviour: for example, if you have a column named sum, df.sum returns the method sum rather than the column 'sum'.
So df[Col] = df[Col] + 1 will work.
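As a rough illustration of the two points above, here is a minimal sketch. The small fixed-value frame, the prefix-based naming scheme, and the `shadow` frame are assumptions made up for the example; only bracket access with a variable and the `sum` pitfall come from the answer itself.

```python
import numpy as np
import pandas as pd

# Column names are built at runtime (hypothetical naming scheme),
# which is the situation described in the question.
prefix = "data"
cols = [f"{prefix}{i}" for i in (1, 2)]
df = pd.DataFrame({cols[0]: [0.0, 0.3, 0.5], cols[1]: [1.0, 2.0, 3.0]})

Col = cols[0]                 # holds the string 'data1'
df[Col] = df[Col] + 0.1       # same effect as df.data1 = df.data1 + .1

# Attribute access looks up the literal name "Col", not the variable's value,
# so it fails because no column is called "Col".
try:
    df.Col
except AttributeError as exc:
    print("df.Col failed:", exc)

# A column called 'sum' is shadowed by the DataFrame.sum method.
shadow = pd.DataFrame({"sum": [1, 2, 3]})
print(callable(shadow.sum))    # attribute access gives the method
print(shadow["sum"].tolist())  # brackets give the column
```

Bracket access takes any expression that evaluates to a column label, so it works equally well with a literal, a variable, or a name assembled in a loop.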
Regarding your 2nd question: comparing a column against a scalar value returns an array of boolean values. To combine such conditions, use the bitwise operators &, | and ~ in place of and, or and not respectively. With more than one condition you need to wrap each condition in parentheses, because & has higher precedence than the comparison operators.
So:
    df[(df[Col] >= .25) & (df[Col] <= .35)]
should work; this masks the df down to only the rows where both conditions are met.
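The mask above only selects rows. To perform the conditional assignment the question asks about, one common approach is to pass the same mask to .loc together with the column variable. The following is a sketch under assumed sample values, not necessarily what the answer's author had in mind:

```python
import pandas as pd

# Assumed sample data chosen so that some values fall inside [.25, .35].
df = pd.DataFrame({"data1": [0.10, 0.25, 0.30, 0.35, 0.50]})
Col = "data1"

# Each comparison is parenthesised because & binds tighter than >= and <=.
mask = (df[Col] >= 0.25) & (df[Col] <= 0.35)

# .loc[rows, column] assigns in place, only on rows where the mask is True.
df.loc[mask, Col] = df.loc[mask, Col] + 0.1

print(df[Col].tolist())
```

Rows outside the range are left untouched. For an inclusive range like this one, df[Col].between(0.25, 0.35) should produce the same mask.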