So I have a DataFrame:
    import pandas as pd

    df = pd.DataFrame([["foo","fizz",1],["foo","fizz",2],["foo","buzz",3],["foo","buzz",4],["bar","fizz",6],["bar","buzz",8]],columns=["a","b","c"])

         a     b  c
    0  foo  fizz  1
    1  foo  fizz  2
    2  foo  buzz  3
    3  foo  buzz  4
    4  bar  fizz  6
    5  bar  buzz  8

I can group it:
    df2 = df.groupby(["a","b"]).sum()

              c
    a   b
    bar buzz  8
        fizz  6
    foo buzz  7
        fizz  3

Which is awesome! But what I really need, instead of the "c" column, is two columns, "foo" and "bar":
          foo  bar
    b
    buzz    7    8
    fizz    3    6

Can someone suggest a way to do this? I tried searching, but I guess I don't have the correct terminology for this, so I couldn't find anything.
Recommended answer

You can use unstack for this:
    df2.unstack(level='a')

Example:
    In [146]: df2.unstack(level='a')
    Out[146]:
            c
    a     bar foo
    b
    buzz    8   7
    fizz    6   3

After that you'll get MultiIndex columns. If you need a flat DataFrame, you can use the MultiIndex's droplevel:
    df3 = df2.unstack(level='a')
    df3.columns = df3.columns.droplevel()

    In [177]: df3
    Out[177]:
    a     bar  foo
    b
    buzz    8    7
    fizz    6    3

EDIT
droplevel drops a level from the MultiIndex that your columns become after unstack. By default it drops level 0, which is the level you want to remove for this DataFrame.
Copied from help(pd.core.index.MultiIndex.droplevel):

    Help on function droplevel in module pandas.core.index:

    droplevel(self, level=0)
        Return Index with requested level removed. If MultiIndex has only 2
        levels, the result will be of Index type not MultiIndex.

        Parameters
        ----------
        level : int/level name or list thereof

        Notes
        -----
        Does not check if result index is unique or not

        Returns
        -------
        index : Index or MultiIndex
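To see the "only 2 levels" remark from the help text in action, here is a minimal sketch that rebuilds the question's data and checks the column index before and after droplevel. The names `flat` and `alt` are just illustrative. The last step, selecting the "c" column before unstacking, is an alternative I'm adding for comparison rather than something from the answer above. It should give the same flat result, but treat it as a suggestion to verify on your own data.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(
    [["foo", "fizz", 1], ["foo", "fizz", 2], ["foo", "buzz", 3],
     ["foo", "buzz", 4], ["bar", "fizz", 6], ["bar", "buzz", 8]],
    columns=["a", "b", "c"],
)

df2 = df.groupby(["a", "b"]).sum()
df3 = df2.unstack(level="a")

# After unstack the columns are a MultiIndex with two levels:
# the original column name ("c") and the unstacked level "a".
print(type(df3.columns).__name__, df3.columns.nlevels)

# droplevel() removes level 0 by default; with only two levels to start
# with, what is left is a plain Index rather than a MultiIndex.
flat = df3.columns.droplevel()
print(type(flat).__name__, list(flat))

df3.columns = flat
print(df3)

# Alternative (not from the answer above): pick the "c" Series first,
# so unstack never creates the extra column level.
alt = df2["c"].unstack(level="a")
print(alt)
```

If the grouped frame had more than one value column, dropping level 0 would leave duplicate column names, which matches the note in the help text that droplevel does not check for uniqueness.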