Combining pandas data frames with overlapping columns/rows

Updated: 2024-10-17 07:38:49

Problem description

I am looking for an efficient way to combine 100 pandas data frames, which represent a grid of information points. Each of these data frames' points is unique, and does not overlap points represented by another, but they do share columns and rows over a larger patchwork space. i.e.

       1    2    3    4    5    6    7    8    9
    A  df1  df1  df1  df2  df2  df2  df3  df3  df3
    B  df1  df1  df1  df2  df2  df2  df3  df3  df3
    C  df1  df1  df1  df2  df2  df2  df3  df3  df3
    D  df4  df4  df4  df5  df5  df5  etc  etc  etc
    E  df4  df4  df4  df5  df5  df5  etc  etc  etc
    F  df4  df4  df4  df5  df5  df5  etc  etc  etc

pandas' concat only combines along either the row axis or the column axis, but not both at once. So I've been iterating over the data frames and calling df1.combine_first(df2) on each in turn (repeat ad infinitum).

Is this the best way to proceed, or is there another more efficient method that I should be aware of?
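To illustrate the limitation described above, here is a minimal sketch (the frame names `left` and `right` and their contents are made up for illustration, not taken from the question). Two patches that share rows A to C but cover different columns are concatenated along the row axis; the result repeats the row labels and pads the gaps with NaN instead of placing the patches side by side.

```python
import numpy as np
import pandas as pd

# Two hypothetical 3x3 patches sharing rows A-C but covering different columns.
left = pd.DataFrame(np.ones((3, 3)), index=list("ABC"), columns=list("123"))
right = pd.DataFrame(np.zeros((3, 3)), index=list("ABC"), columns=list("456"))

# Concatenating along the row axis stacks the patches vertically:
# row labels are repeated and the missing cells are filled with NaN.
stacked = pd.concat([left, right], axis=0)

print(stacked.shape)                 # 6 rows, 6 columns
print(stacked.index.is_unique)       # row labels A, B, C appear twice
print(int(stacked.isna().sum().sum()))  # number of NaN-padded cells
```

For this pair of patches, `axis=1` would be the right choice, which is why a full grid needs concatenation along both axes in turn.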

Recommended answer

Here's a quick guess at both the convenience and efficiency angles, based on non-overlapping datapoints and assuming very regular data (everything 3x3 in this case).

    import numpy as np
    import pandas as pd

    # Four non-overlapping 3x3 patches of a 6x6 grid
    df1 = pd.DataFrame(np.random.randn(3, 3), index=list('ABC'), columns=list('123'))
    df2 = pd.DataFrame(np.random.randn(3, 3), index=list('DEF'), columns=list('123'))
    df3 = pd.DataFrame(np.random.randn(3, 3), index=list('ABC'), columns=list('456'))
    df4 = pd.DataFrame(np.random.randn(3, 3), index=list('DEF'), columns=list('456'))

The combine_first way has the advantage that you can just dump everything in a list without worrying about the order:

    %%timeit
    comb_df = pd.DataFrame()
    for df in [df1, df2, df3, df4]:
        comb_df = comb_df.combine_first(df)

    100 loops, best of 3: 8.92 ms per loop

The concat way requires you to group things in a specific order, but is more than twice as fast:

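    %%timeit
    df5 = pd.concat([df1, df2], axis=0)   # left column band: rows A-F, columns 1-3
    df6 = pd.concat([df3, df4], axis=0)   # right column band: rows A-F, columns 4-6
    df7 = pd.concat([df5, df6], axis=1)   # join the two bands side by side

    100 loops, best of 3: 3.84 ms per loop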
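For a larger patchwork such as the 100 frames in the question, the same two-step concat can be written once for an arbitrary grid. The following is only a sketch under assumptions: the helper name `concat_grid` is hypothetical, and it assumes the frames are already arranged as a list of grid rows, each ordered left to right, with consistent labels. It joins each grid row along the columns first and then stacks the resulting bands, which is the mirror image of the order used above.

```python
import numpy as np
import pandas as pd

def concat_grid(grid):
    """Combine a rectangular grid of DataFrames (hypothetical helper).

    `grid` is a list of grid rows; each grid row is a list of DataFrames
    ordered left to right that share the same row labels.
    """
    # Join the patches of each grid row side by side ...
    bands = [pd.concat(row, axis=1) for row in grid]
    # ... then stack the resulting horizontal bands on top of each other.
    return pd.concat(bands, axis=0)

# Made-up 2x3 grid of 3x3 patches for illustration.
rng = np.random.default_rng(0)
row_labels = [list("ABC"), list("DEF")]
col_labels = [list("123"), list("456"), list("789")]
grid = [
    [pd.DataFrame(rng.standard_normal((3, 3)), index=r, columns=c) for c in col_labels]
    for r in row_labels
]

full = concat_grid(grid)
print(full.shape)
```

The cost of this approach is that the frames must be sorted into grid order beforehand, which is exactly the ordering requirement mentioned for the concat way.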

Quick check that both ways work the same:

    # Reduce over both axes; the built-in all() would only iterate over the column labels
    (comb_df == df7).all().all()

    True
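A stricter version of this check can be sketched with `pandas.testing.assert_frame_equal`, which compares labels as well as values and raises an error on any mismatch. The variable names `by_combine` and `by_concat` are made up for this sketch, and the frames are deliberately fed to `combine_first` in a shuffled order to show that the order does not matter for that approach.

```python
from functools import reduce

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df1 = pd.DataFrame(rng.standard_normal((3, 3)), index=list("ABC"), columns=list("123"))
df2 = pd.DataFrame(rng.standard_normal((3, 3)), index=list("DEF"), columns=list("123"))
df3 = pd.DataFrame(rng.standard_normal((3, 3)), index=list("ABC"), columns=list("456"))
df4 = pd.DataFrame(rng.standard_normal((3, 3)), index=list("DEF"), columns=list("456"))

# combine_first in an arbitrary order, folded over the list of frames.
by_combine = reduce(lambda acc, df: acc.combine_first(df), [df4, df1, df3, df2])

# concat in grid order: build the two column bands, then join them.
by_concat = pd.concat(
    [pd.concat([df1, df2], axis=0), pd.concat([df3, df4], axis=0)],
    axis=1,
)

# Raises AssertionError if labels or values differ; silent on success.
pd.testing.assert_frame_equal(by_combine, by_concat)
```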

Published: 2023-11-29 20:13:45
Article link: https://www.elefans.com/category/jswz/34/1647511.html