I have a very large DataFrame that has duplicate column names, but the values under those columns differ. I want to merge the duplicate columns and add their values together.
This really large DataFrame is made by appending Series together, and that is where the duplication occurs.
          Py  Java  Ruby  C  Ruby
    2010   1     5     8  1     5
    2011   5     5     1  9     8
    2012   1     5     8  2     8
    2013   6     3     8  1     9
    2014   4     8     9  9     9
So I want to add both Ruby columns together to get this result:
          Py  Java  Ruby  C  Ruby
    2010   1     5    13  1     5
    2011   5     5     9  9     8
    2012   1     5    16  2     8
    2013   6     3    17  1     9
    2014   4     8    18  9     9
I am running Python 2.7.
Recommended answer
I would suggest using groupby:
    df = df.groupby(axis=1, level=0).sum()
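As a worked sketch on the question's sample data: newer pandas releases deprecate groupby(axis=1), so this example assumes an equivalent transpose-based form (transpose, group the rows by label, sum, transpose back) rather than the exact call above. The variable name summed is only for illustration.

```python
import pandas as pd

# Sample data from the question, with the column name "Ruby" appearing twice.
df = pd.DataFrame(
    [[1, 5, 8, 1, 5],
     [5, 5, 1, 9, 8],
     [1, 5, 8, 2, 8],
     [6, 3, 8, 1, 9],
     [4, 8, 9, 9, 9]],
    index=[2010, 2011, 2012, 2013, 2014],
    columns=["Py", "Java", "Ruby", "C", "Ruby"],
)

# Transpose so the column labels become the row index, group equal labels,
# sum each group, then transpose back to the original orientation.
summed = df.T.groupby(level=0).sum().T

print(summed)
# The two Ruby columns collapse into one holding 13, 9, 16, 17, 18.
```

Note that grouping sorts the labels, so the resulting columns come out as C, Java, Py, Ruby rather than in their original order.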
To make this work for MultiIndex columns as well, one can do:
    if df.columns.duplicated().any():
        # Group on every column level. A list of levels also works for a
        # single-level Index, where passing level=1 would raise an error.
        all_levels = list(range(df.columns.nlevels))
        df = df.groupby(axis=1, level=all_levels).sum()
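A small sketch of the MultiIndex case, assuming a made-up two-level column index (the "lang" level and the name df_mi are hypothetical) and the transpose-based grouping in place of groupby(axis=1), which newer pandas releases deprecate:

```python
import pandas as pd

# Hypothetical two-level columns in which the tuple ("lang", "Ruby") is duplicated.
cols = pd.MultiIndex.from_tuples(
    [("lang", "Py"), ("lang", "Ruby"), ("lang", "Ruby")]
)
df_mi = pd.DataFrame([[1, 8, 5], [5, 1, 8]], index=[2010, 2011], columns=cols)

if df_mi.columns.duplicated().any():
    # Group on all column levels so that only fully identical tuples are merged.
    levels = list(range(df_mi.columns.nlevels))
    df_mi = df_mi.T.groupby(level=levels).sum().T

print(df_mi)
```

Grouping on all levels matters here: grouping on level 0 alone would merge every column under "lang", not just the duplicated Ruby ones.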
EDIT
Instead of using groupby, one can now simply do:
    df = df.sum(axis=1, level=0)
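Be aware that NaNs are treated as 0 by the procedures above. To avoid that, use either skipna=False or min_count=1, depending on the use case. With skipna=False the sum is NaN whenever any of the summed values is NaN; with min_count=1 it is NaN only when all of them are: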
    df = df.sum(axis=1, level=0, skipna=False)
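To compare the NaN behaviours side by side, here is a minimal sketch. The level argument of sum was removed in pandas 2.0, so the example assumes a hypothetical helper, sum_duplicate_columns, that selects the columns sharing each name and calls DataFrame.sum(axis=1, ...) on them. The helper and the sample frame df_nan are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd


def sum_duplicate_columns(df, **sum_kwargs):
    """Hypothetical helper: sum columns that share a name.

    Extra keyword arguments (e.g. skipna, min_count) are forwarded
    to DataFrame.sum for each group of same-named columns.
    """
    return pd.DataFrame(
        {
            name: df.loc[:, df.columns == name].sum(axis=1, **sum_kwargs)
            for name in df.columns.unique()
        }
    )


# Illustrative data: one Ruby value is missing in 2010, both are missing in 2011.
df_nan = pd.DataFrame(
    [[1, 8, np.nan], [5, np.nan, np.nan]],
    index=[2010, 2011],
    columns=["Py", "Ruby", "Ruby"],
)

default = sum_duplicate_columns(df_nan)                # NaN counted as 0
strict = sum_duplicate_columns(df_nan, skipna=False)   # any NaN gives NaN
lenient = sum_duplicate_columns(df_nan, min_count=1)   # NaN only if all are NaN

print(default["Ruby"].tolist())
print(strict["Ruby"].tolist())
print(lenient["Ruby"].tolist())
```

Unlike the groupby-based form, this helper keeps the columns in order of first appearance.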