管理MultiIndex Dataframe对象(Managing MultiIndex Dataframe objects)

所以，使用分组DataFrame的另一个问题是我对此感到困惑...

我已经将聚合字典定义为：

aggregations_level_1 = { 'A': { 'mean': 'mean', }, 'B': { 'mean': 'mean', }, }

现在我有两个分组的DataFrame，我已经使用上面的聚合，然后加入：

grouped_top = df1.groupby(['group_lvl']).agg(aggregations_level_1) grouped_bottom = df2.groupby(['group_lvl']).agg(aggregations_level_1)

加入这些：

df3 = grouped_top.join(grouped_bottom, how='left', lsuffix='_top_10', rsuffix='_low_10') A_top_10 A_low_10 B_top_10 B_low_10 mean mean mean mean group_lvl a 3.711413 14.515901 3.711413 14.515901 b 4.024877 14.442106 3.694689 14.209040 c 3.694689 14.209040 4.024877 14.442106

现在，如果我调用索引和列，我有：

print df3.index >> Index([u'a', u'b', u'c'], dtype='object', name=u'group_lvl') print df3.columns >> MultiIndex(levels=[[u'A_top_10', u'A_low_10', u'B_top_10', u'B_low_10'], [u'mean']], labels=[[0, 1, 2, 3], [0, 0, 0, 0]])

因此，看起来我有一个带索引a,b,c的常规DataFrame对象a,b,c但每列都是一个MultiIndex对象。这是正确的解释吗？

我如何切片并调用它？说我想只为a,b,c A_top_10, A_low_10 ？对于a和c A_top_10, B_top_10只有A_top_10, B_top_10 ？

我很困惑所以任何整体帮助都会很棒！

So, yet another problem using grouped DataFrames that I am getting so confused over...

I have defined an aggregation dictionary as:

aggregations_level_1 = { 'A': { 'mean': 'mean', }, 'B': { 'mean': 'mean', }, }

And now I have two grouped DataFrames that I have aggregated using the above, then joined:

grouped_top = df1.groupby(['group_lvl']).agg(aggregations_level_1) grouped_bottom = df2.groupby(['group_lvl']).agg(aggregations_level_1)

Joining these:

df3 = grouped_top.join(grouped_bottom, how='left', lsuffix='_top_10', rsuffix='_low_10') A_top_10 A_low_10 B_top_10 B_low_10 mean mean mean mean group_lvl a 3.711413 14.515901 3.711413 14.515901 b 4.024877 14.442106 3.694689 14.209040 c 3.694689 14.209040 4.024877 14.442106

Now, if I call index and columns I have:

print df3.index >> Index([u'a', u'b', u'c'], dtype='object', name=u'group_lvl') print df3.columns >> MultiIndex(levels=[[u'A_top_10', u'A_low_10', u'B_top_10', u'B_low_10'], [u'mean']], labels=[[0, 1, 2, 3], [0, 0, 0, 0]])

So, it looks as though I have a regular DataFrame-object with index a,b,c but each column is a MultiIndex-object. Is this a correct interpretation?

How do I slice and call this? Say I would like to have only A_top_10, A_low_10 for all a,b,c? Only A_top_10, B_top_10 for a and c?

I am pretty confused so any overall help would be great!

最满意答案

需要切片器，但首先按sort_index排序列，否则出错：

UnsortedIndexError：'MultiIndex Slicing要求索引是完全lexsorted元组len（1），lexsort depth（0）'

df = df.sort_index(axis=1) idx = pd.IndexSlice df1 = df.loc[:, idx[['A_low_10', 'A_top_10'], :]] print (df1) A_low_10 A_top_10 mean mean group_lvl a 14.515901 3.711413 b 14.442106 4.024877 c 14.209040 3.694689

和：

idx = pd.IndexSlice df2 = df.loc[['a','c'], idx[['A_top_10', 'B_top_10'], :]] print (df2) A_top_10 B_top_10 mean mean group_lvl a 3.711413 3.711413 c 3.694689 4.024877

编辑：

因此，看起来我有一个带索引a，b，c的常规DataFrame对象，但每列都是一个MultiIndex对象。这是正确的解释吗？

我认为非常接近，更好的是我在columns有MultiIndex 。

Need slicers, but first sort columns by sort_index else error:

UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (1), lexsort depth (0)'

df = df.sort_index(axis=1) idx = pd.IndexSlice df1 = df.loc[:, idx[['A_low_10', 'A_top_10'], :]] print (df1) A_low_10 A_top_10 mean mean group_lvl a 14.515901 3.711413 b 14.442106 4.024877 c 14.209040 3.694689

And:

idx = pd.IndexSlice df2 = df.loc[['a','c'], idx[['A_top_10', 'B_top_10'], :]] print (df2) A_top_10 B_top_10 mean mean group_lvl a 3.711413 3.711413 c 3.694689 4.024877

EDIT: