所以,使用分组DataFrame的另一个问题是我对此感到困惑...
我已经将聚合字典定义为:
aggregations_level_1 = { 'A': { 'mean': 'mean', }, 'B': { 'mean': 'mean', }, }现在我有两个分组的DataFrame,我已经使用上面的聚合,然后加入:
grouped_top = df1.groupby(['group_lvl']).agg(aggregations_level_1) grouped_bottom = df2.groupby(['group_lvl']).agg(aggregations_level_1)加入这些:
df3 = grouped_top.join(grouped_bottom, how='left', lsuffix='_top_10', rsuffix='_low_10') A_top_10 A_low_10 B_top_10 B_low_10 mean mean mean mean group_lvl a 3.711413 14.515901 3.711413 14.515901 b 4.024877 14.442106 3.694689 14.209040 c 3.694689 14.209040 4.024877 14.442106现在,如果我调用索引和列,我有:
print df3.index >> Index([u'a', u'b', u'c'], dtype='object', name=u'group_lvl') print df3.columns >> MultiIndex(levels=[[u'A_top_10', u'A_low_10', u'B_top_10', u'B_low_10'], [u'mean']], labels=[[0, 1, 2, 3], [0, 0, 0, 0]])因此,看起来我有一个带索引a,b,c的常规DataFrame对象a,b,c但每列都是一个MultiIndex对象。 这是正确的解释吗?
我如何切片并调用它? 说我想只为a,b,c A_top_10, A_low_10 ? 对于a和c A_top_10, B_top_10只有A_top_10, B_top_10 ?我很困惑所以任何整体帮助都会很棒!
So, yet another problem using grouped DataFrames that I am getting so confused over...
I have defined an aggregation dictionary as:
aggregations_level_1 = { 'A': { 'mean': 'mean', }, 'B': { 'mean': 'mean', }, }And now I have two grouped DataFrames that I have aggregated using the above, then joined:
grouped_top = df1.groupby(['group_lvl']).agg(aggregations_level_1) grouped_bottom = df2.groupby(['group_lvl']).agg(aggregations_level_1)Joining these:
df3 = grouped_top.join(grouped_bottom, how='left', lsuffix='_top_10', rsuffix='_low_10') A_top_10 A_low_10 B_top_10 B_low_10 mean mean mean mean group_lvl a 3.711413 14.515901 3.711413 14.515901 b 4.024877 14.442106 3.694689 14.209040 c 3.694689 14.209040 4.024877 14.442106Now, if I call index and columns I have:
print df3.index >> Index([u'a', u'b', u'c'], dtype='object', name=u'group_lvl') print df3.columns >> MultiIndex(levels=[[u'A_top_10', u'A_low_10', u'B_top_10', u'B_low_10'], [u'mean']], labels=[[0, 1, 2, 3], [0, 0, 0, 0]])So, it looks as though I have a regular DataFrame-object with index a,b,c but each column is a MultiIndex-object. Is this a correct interpretation?
How do I slice and call this? Say I would like to have only A_top_10, A_low_10 for all a,b,c? Only A_top_10, B_top_10 for a and c?I am pretty confused so any overall help would be great!
最满意答案
需要切片器 ,但首先按sort_index排序列,否则出错:
df = df.sort_index(axis=1) idx = pd.IndexSlice df1 = df.loc[:, idx[['A_low_10', 'A_top_10'], :]] print (df1) A_low_10 A_top_10 mean mean group_lvl a 14.515901 3.711413 b 14.442106 4.024877 c 14.209040 3.694689UnsortedIndexError:'MultiIndex Slicing要求索引是完全lexsorted元组len(1),lexsort depth(0)'
和:
idx = pd.IndexSlice df2 = df.loc[['a','c'], idx[['A_top_10', 'B_top_10'], :]] print (df2) A_top_10 B_top_10 mean mean group_lvl a 3.711413 3.711413 c 3.694689 4.024877编辑:
因此,看起来我有一个带索引a,b,c的常规DataFrame对象,但每列都是一个MultiIndex对象。 这是正确的解释吗?
我认为非常接近,更好的是我在columns有MultiIndex 。
Need slicers, but first sort columns by sort_index else error:
df = df.sort_index(axis=1) idx = pd.IndexSlice df1 = df.loc[:, idx[['A_low_10', 'A_top_10'], :]] print (df1) A_low_10 A_top_10 mean mean group_lvl a 14.515901 3.711413 b 14.442106 4.024877 c 14.209040 3.694689UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (1), lexsort depth (0)'
And:
idx = pd.IndexSlice df2 = df.loc[['a','c'], idx[['A_top_10', 'B_top_10'], :]] print (df2) A_top_10 B_top_10 mean mean group_lvl a 3.711413 3.711413 c 3.694689 4.024877EDIT:
So, it looks as though I have a regular DataFrame-object with index a,b,c but each column is a MultiIndex-object. Is this a correct interpretation?
I think very close, better is say I have MultiIndex in columns.
更多推荐
发布评论