I have three dataframes:

    import pandas as pd

    df1 = pd.DataFrame({'code': [123, 124], 'name': ['Manchester', 'Liverpool']})
    df2 = pd.DataFrame({'code': [123], 'group': ['b'], 'items_highcost': [10]})
    df3 = pd.DataFrame({'code': [123, 124], 'group': ['b', 'c'], 'items_not_highcost': [11, 8]})

I want to merge them into a single frame, excluding any rows that aren't in df1, so I do two left joins:

    df_merged = pd.merge(df1, df2, on='code', how='left')
    df_merged = pd.merge(df_merged, df3, on='code', how='left')

This gives me the following merged dataset:

       code        name group_x  items_highcost group_y  items_not_highcost
    0   123  Manchester       b              10       b                  11
    1   124   Liverpool     NaN             NaN       c                   8

But I want a single group column for each row, holding the non-null value of group, whether it comes from df2, df3, or both. (It's safe to assume that the value of group will be the same for matching codes in df2 and df3, as they come from the same data source.)

Right now I'm doing this:

    def get_group(row):
        # NaN is truthy in Python, so test for missing values explicitly
        if pd.notna(row['group_x']):
            return row['group_x']
        if pd.notna(row['group_y']):
            return row['group_y']
        return None

    df_merged['group'] = df_merged.apply(get_group, axis=1)
    df_merged.drop(['group_x', 'group_y'], inplace=True, axis=1)

This works, but is ugly. Ideally I wouldn't have the intermediate step with group_x and group_y, but could get the non-null value in a single step. Is there a better way?
Best answer
It looks like all you're doing is filling in the missing values, in which case you could instead call combine_first successively:

    In [196]: df1.combine_first(df2).combine_first(df3)
    Out[196]:
       code group  items_highcost  items_not_highcost        name
    0   123     b            10.0                11.0  Manchester
    1   124     c             NaN                 8.0   Liverpool
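One thing worth noting: combine_first aligns on the index, not on the code column, so the one-liner above gives the expected result here because the default row positions happen to line up with the codes. A minimal sketch of a more defensive variant, assuming code is unique within each frame, is to index every frame by code first. The names frames, combined, result and merged are just illustrative. The second half shows an alternative that keeps the two left joins from the question and coalesces the suffixed columns afterwards.

```python
import pandas as pd

df1 = pd.DataFrame({'code': [123, 124], 'name': ['Manchester', 'Liverpool']})
df2 = pd.DataFrame({'code': [123], 'group': ['b'], 'items_highcost': [10]})
df3 = pd.DataFrame({'code': [123, 124], 'group': ['b', 'c'], 'items_not_highcost': [11, 8]})

# Variant 1: index every frame by 'code' so that combine_first aligns on the
# key rather than on row position (assumes 'code' is unique in each frame).
frames = [df.set_index('code') for df in (df1, df2, df3)]

combined = frames[0]
for other in frames[1:]:
    combined = combined.combine_first(other)

# combine_first keeps the union of the indexes, so restrict the result to the
# codes present in df1 to mimic the left joins from the question.
result = combined.loc[frames[0].index].reset_index()

# Variant 2: keep the two left joins, then coalesce the suffixed columns.
# pop() removes each column from the frame and returns it as a Series.
merged = df1.merge(df2, on='code', how='left').merge(df3, on='code', how='left')
merged['group'] = merged.pop('group_x').combine_first(merged.pop('group_y'))

print(result)
print(merged)
```

Both variants should end up with a single group column and no group_x or group_y. The first avoids the intermediate columns entirely; the second may be easier to reason about if the frames share other column names that should not be coalesced.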