Merge 2 Pandas DataFrames, Take One Record Over Another if Index Matches

I want to merge two pandas DataFrames, but wherever an index matches I only want to keep the row from one specific DataFrame.

So if I have

    df1
                   A    B
    type   model
    apple  v1     10  xyz
    orange v2     11  pqs

    df2
                   A    B
    type   model
    apple  v3     11  xyz
    grape  v4     12  def

I would get

    df3
                   A    B
    type   model
    apple  v1     10  xyz
    orange v2     11  pqs
    grape  v4     12  def

This is because df1.ix['apple'] takes precedence over df2.ix['apple'], while orange and grape each appear in only one of the two frames.

I have been trying to make some index comparison work, but df2.drop(df1.index[[0]]) is just removing the entire contents of df2.

Both data frames are multi-indexed with a similar structure, created by:

    pd.read_csv(..., index_col=[3, 1])

Which results in an index like this:

    MultiIndex(levels=[[u'apple', u'orange', u'grape', ...],
                       [u'v1', u'v2', u'v3', ...]],
               labels=[[0, 1, 2, 3, 4, 6, 7, 8, 9, 10, ...]],
               names=[u'type', u'model'])
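A minimal sketch of the comparison the drop attempt was aiming for, assuming the index levels are named type and model as in the repr above: compare only the type level, keep every row of df1, and append the df2 rows whose type df1 does not contain. The frames are rebuilt by hand with MultiIndex.from_tuples instead of read_csv, and the names df1_types and df2_is_new are illustrative only.

```python
import pandas as pd

# Rebuild the question's frames with a (type, model) MultiIndex
idx1 = pd.MultiIndex.from_tuples([("apple", "v1"), ("orange", "v2")], names=["type", "model"])
idx2 = pd.MultiIndex.from_tuples([("apple", "v3"), ("grape", "v4")], names=["type", "model"])
df1 = pd.DataFrame({"A": [10, 11], "B": ["xyz", "pqs"]}, index=idx1)
df2 = pd.DataFrame({"A": [11, 12], "B": ["xyz", "def"]}, index=idx2)

# Compare only the `type` level: a df2 row is redundant when its type
# already appears anywhere in df1, whatever its `model` value is.
df1_types = df1.index.get_level_values("type")
df2_is_new = ~df2.index.get_level_values("type").isin(df1_types)

# Keep all of df1, then append the df2 rows whose type df1 lacks
df3 = pd.concat([df1, df2[df2_is_new]])
print(df3)
```

Because nothing is reindexed, column A keeps its integer dtype; call df3.sort_index() if you want the rows in sorted order.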

Best answer

That's what DataFrame.combine_first() is for:

    import pandas as pd

    df1 = pd.DataFrame({'A': [10, 11], 'B': ['xyz', 'pqs']},
                       index=['apple', 'orange'])
    df2 = pd.DataFrame({'A': [11, 12], 'B': ['xyz', 'def']},
                       index=['apple', 'grape'])

    # Values from df1 win; labels missing from df1 are filled from df2
    df3 = df1.combine_first(df2)

yields

    df3
               A    B
    apple   10.0  xyz
    grape   12.0  def
    orange  11.0  pqs
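One caveat worth knowing: combine_first works cell by cell, not row by row. df1 wins wherever it holds a non-null value, but a NaN in df1 is filled from df2's matching row. The sketch below uses a hypothetical variant of the frames above in which df1's apple row is missing its B value:

```python
import numpy as np
import pandas as pd

# Same frames as above, except df1's 'apple' row has no value in B
df1 = pd.DataFrame({"A": [10, 11], "B": [np.nan, "pqs"]}, index=["apple", "orange"])
df2 = pd.DataFrame({"A": [11, 12], "B": ["xyz", "def"]}, index=["apple", "grape"])

df3 = df1.combine_first(df2)

# df1's A value for 'apple' is kept, but its NaN in B is replaced
# by df2's 'apple' value, so the result mixes the two rows.
print(df3.loc["apple"])
```

If df1 can contain missing values and you need the whole row from df1, filter rows on the index instead of relying on combine_first.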

EDIT: The question was substantially modified after I posted the answer above: the model level was added to the index, effectively turning it into a MultiIndex.

    import pandas as pd

    # Create the df1 in the question
    df1 = pd.DataFrame({'model': ['v1', 'v2'], 'A': [10, 11], 'B': ['xyz', 'pqs']},
                       index=['apple', 'orange'])
    df1.index.name = 'type'
    df1.set_index('model', append=True, inplace=True)

    # Create the df2 in the question
    df2 = pd.DataFrame({'model': ['v3', 'v4'], 'A': [11, 12], 'B': ['xyz', 'def']},
                       index=['apple', 'grape'])
    df2.index.name = 'type'
    df2.set_index('model', append=True, inplace=True)

    # Solution: remove `model` from the index and apply the above
    # technique. Restore it to the index at the end if you want.
    df1.reset_index(level=1, inplace=True)
    df2.reset_index(level=1, inplace=True)
    df3 = df1.combine_first(df2).set_index('model', append=True)

Result:

    df3
                     A    B
    type   model
    apple  v1     10.0  xyz
    grape  v4     12.0  def
    orange v2     11.0  pqs
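To see why the model level has to come out of the index first, here is a sketch that calls combine_first on the full (type, model) index. The frames are rebuilt with MultiIndex.from_tuples as an illustrative shortcut; the name naive is hypothetical. Since ("apple", "v1") and ("apple", "v3") are different keys, nothing matches and both apple rows survive:

```python
import pandas as pd

idx1 = pd.MultiIndex.from_tuples([("apple", "v1"), ("orange", "v2")], names=["type", "model"])
idx2 = pd.MultiIndex.from_tuples([("apple", "v3"), ("grape", "v4")], names=["type", "model"])
df1 = pd.DataFrame({"A": [10, 11], "B": ["xyz", "pqs"]}, index=idx1)
df2 = pd.DataFrame({"A": [11, 12], "B": ["xyz", "def"]}, index=idx2)

# Aligning on the full (type, model) key: no key is shared, so every
# row from both frames ends up in the result, including both apples.
naive = df1.combine_first(df2)
print(naive)
```

The reset_index approach therefore assumes each type appears at most once in each frame; with repeated types, the rows would no longer pair up one to one.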

更多推荐