I want to merge two Pandas DataFrames, but anywhere an index matches I only want to merge in the row from a specific df.
So if I have
    df1
                  A    B
    type   model
    apple  v1    10  xyz
    orange v2    11  pqs

    df2
                  A    B
    type   model
    apple  v3    11  xyz
    grape  v4    12  def

I would get

    df3
                  A    B
    type   model
    apple  v1    10  xyz
    orange v2    11  pqs
    grape  v4    12  def

because df1.ix['apple'] takes precedence over df2.ix['apple'], and orange and grape are unique.
I have been trying to make some index comparison work, but df2.drop(df1.index[[0]]) is just removing the entire contents of df2.
Both data frames are multi-indexed with a similar structure, created by:
pd.read_csv(..., index_col=[3, 1])
Which results in an index like this:
    MultiIndex(
        levels=[[u'apple', u'orange', u'grape', ...], [u'v1', u'v2', u'v3', ... ]],
        labels=[[0, 1, 2, 3, 4, 6, 7, 8, 9, 10, ...]],
        names=[u'type', u'model']
    )

Best answer
That's what DataFrame.combine_first() is for:
    import pandas as pd

    df1 = pd.DataFrame({'A': [10, 11], 'B': ['xyz', 'pqs']}, index=['apple', 'orange'])
    df2 = pd.DataFrame({'A': [11, 12], 'B': ['xyz', 'def']}, index=['apple', 'grape'])
    df3 = df1.combine_first(df2)

yields
    df3
               A    B
    apple   10.0  xyz
    grape   12.0  def
    orange  11.0  pqs

EDIT: The question was substantially modified after I posted the answer above, adding the model level to the index and effectively turning it into a MultiIndex.
    import pandas as pd

    # Create the df1 in the question
    df1 = pd.DataFrame({'model': ['v1', 'v2'], 'A': [10, 11], 'B': ['xyz', 'pqs']},
                       index=['apple', 'orange'])
    df1.index.name = 'type'
    df1.set_index('model', append=True, inplace=True)

    # Create the df2 in the question
    df2 = pd.DataFrame({'model': ['v3', 'v4'], 'A': [11, 12], 'B': ['xyz', 'def']},
                       index=['apple', 'grape'])
    df2.index.name = 'type'
    df2.set_index('model', append=True, inplace=True)

    # Solution: remove the `model` from the index and apply the above
    # technique. Restore it to the index at the end if you want.
    df1.reset_index(level=1, inplace=True)
    df2.reset_index(level=1, inplace=True)
    df3 = df1.combine_first(df2).set_index('model', append=True)

Result:
    df3
                     A    B
    type   model
    apple  v1     10.0  xyz
    grape  v4     12.0  def
    orange v2     11.0  pqs
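
If you would rather keep the MultiIndex in place and keep df1's rows ahead of df2's, one possible alternative is to filter df2 on the `type` level and concatenate. This is only a sketch and not part of the answer above: it assumes precedence is decided by `type` alone (as in the question's example), and the names `seen` and `only_in_df2` are made up for illustration.

```python
import pandas as pd

# Rebuild the question's frames with a (type, model) MultiIndex.
df1 = pd.DataFrame(
    {"type": ["apple", "orange"], "model": ["v1", "v2"],
     "A": [10, 11], "B": ["xyz", "pqs"]}
).set_index(["type", "model"])
df2 = pd.DataFrame(
    {"type": ["apple", "grape"], "model": ["v3", "v4"],
     "A": [11, 12], "B": ["xyz", "def"]}
).set_index(["type", "model"])

# Assumption: a df2 row is dropped whenever its `type` already appears in df1.
seen = df1.index.get_level_values("type")
only_in_df2 = df2[~df2.index.get_level_values("type").isin(seen)]

# df1's rows go first, followed by the df2 rows that df1 did not cover.
df3 = pd.concat([df1, only_in_df2])
print(df3)
```

Unlike combine_first(), this does not fill missing values in df1 from df2; it only decides which whole rows to keep.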