I have two DataFrames:
    s1:
             time     X1
    0  1234567000  96.32
    1  1234567005  96.01
    2  1234567009  96.05

    s2:
             time     X2
    0  1234566999  23.88
    1  1234567006  23.96
I would like to replace the values of the first time series/DataFrame with the second DataFrame while keeping the timestamp, to obtain:
    frame:
             time     X2
    0  1234567000  23.88
    1  1234567005  23.88
    2  1234567009  23.96
The output (frame) should have the timestamps of s1 but the values of s2. time is integer (It isn't a UNIX timestamp). X1 and X2 are float.
Is there any neat way to do it with pandas?
I currently use a chain of outer join/merge + fillna + inner join/merge + del columns, but that doesn't seem efficient.
    import pandas as pd


    def merge_dataframes(s1, s2, common_column, back_fill=False, verbose=False):
        if verbose:
            print('s1: \n{0}'.format(s1))
            print('s2: \n{0}'.format(s2))

        # Outer merge on the shared column, sorted so the fill follows time order.
        frame = pd.merge(s1, s2, how='outer').sort_values(by=common_column)
        if verbose:
            print('frame: \n{0}'.format(frame))

        # Carry the last known values forward. ffill() replaces
        # fillna(method='ffill'), which newer pandas releases deprecate.
        frame = frame.ffill()
        if verbose:
            print('frame: \n{0}'.format(frame))

        # Inner merge with s1 keeps only the rows that carry s1's timestamps.
        frame = pd.merge(frame, s1, how='inner').sort_values(by=common_column)
        if verbose:
            print('frame: \n{0}'.format(frame))

        # Drop the columns that exist only in s1 (here: X1).
        for column_name in s1.columns:
            if column_name != common_column and column_name not in s2.columns:
                del frame[column_name]

        if back_fill:
            frame = frame.bfill()
        if verbose:
            print('frame: \n{0}'.format(frame))
        return frame


    def main():
        '''
        Demonstrate the use of merge_dataframes(s1, s2, common_column)
        '''
        s1 = pd.DataFrame({
            'time': [1234567000, 1234567005, 1234567009],
            'X1': [96.32, 96.01, 96.05]
        }, columns=['time', 'X1'])
        s2 = pd.DataFrame({
            'time': [1234566999, 1234567006],
            'X2': [23.88, 23.96]
        }, columns=['time', 'X2'])
        common_column = 'time'
        frame = merge_dataframes(s1, s2, common_column, verbose=True)
        print('frame: \n{0}'.format(frame))


    if __name__ == "__main__":
        main()
        # cProfile.run('main()')  # if you want to do some profiling

Recommended answer
pd.merge_asof works for me on your sample
    pd.merge_asof(s1, s2, on='time')

    Out[108]:
             time     X1     X2
    0  1234567000  96.32  23.88
    1  1234567005  96.01  23.88
    2  1234567009  96.05  23.96

Edit: a solution for an absolute (nearest-value) merge
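    import numpy as np

    def Matcher2(value, mat):
        # Position of the entry in mat closest to value (absolute difference).
        return np.argmin(np.absolute(mat - value))

    # as_matrix() has been removed from pandas; to_numpy() replaces it.
    mat = s2.time.to_numpy()
    s1['dex'] = s1.time.apply(lambda row: Matcher2(row, mat))
    # right_index=True matches positions to labels, so s2 needs its default 0..n-1 index.
    mg = pd.merge(s1, s2, left_on='dex', right_index=True, how='left')
    print(mg[['time_x', 'X1', 'X2']])

           time_x     X1     X2
    0  1234567000  96.32  23.88
    1  1234567005  96.01  23.96
    2  1234567009  96.05  23.96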
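As a possible alternative to the hand-written matcher, pd.merge_asof itself takes a direction argument ('backward' by default, also 'forward' and 'nearest'). The sketch below is not part of the original answer. It assumes both frames are already sorted by time, and it passes only the time column of s1 so that the result has s1's timestamps and s2's values without a separate step to drop X1. The variable names backward and nearest are illustrative only.

```python
import pandas as pd

s1 = pd.DataFrame({'time': [1234567000, 1234567005, 1234567009],
                   'X1': [96.32, 96.01, 96.05]})
s2 = pd.DataFrame({'time': [1234566999, 1234567006],
                   'X2': [23.88, 23.96]})

# merge_asof requires both inputs to be sorted by the key column.
# Default direction='backward': take the last s2 row at or before each s1 time.
backward = pd.merge_asof(s1[['time']], s2, on='time')

# direction='nearest': take the s2 row whose time is closest in absolute terms.
nearest = pd.merge_asof(s1[['time']], s2, on='time', direction='nearest')

print(backward)
print(nearest)
```

With the sample data, backward reproduces the frame asked for in the question, and nearest gives the same X2 values as the Matcher2 approach.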