How to convert row values in a DataFrame to column labels in Python after groupby?

I have a specific case where I want to convert this df:

    print(df)

       Schoolname    Attribute  Value
    0  xyz School         Safe   3.44
    1  xyz School  Cleanliness   2.34
    2  xyz School        Money   4.65
    3  abc School         Safe   4.40
    4  abc School  Cleanliness   4.50
    5  abc School        Money   4.90
    6  lmn School         Safe   2.34
    7  lmn School  Cleanliness   3.89
    8  lmn School        Money   4.65

I need to get it into this format so that I can convert it to a NumPy array for linear regression modelling:

    required_df:

       Schoolname  Safe  Cleanliness  Money
    0  xyz School  3.44         2.34   4.65
    1  abc School  4.40         4.50   4.90
    2  lmn School  2.34         3.89   4.65

I know we need to use groupby('Schoolname'), but I can't work out how to go from there so that the row values become column labels, with the corresponding values filled in as in required_df.

The resulting array will be passed to the linear regression model as my X vector.

Best answer

You could use df.pivot:

    In [171]: df.pivot(index='Schoolname', columns='Attribute', values='Value')
    Out[171]:
    Attribute   Cleanliness  Money  Safe
    Schoolname
    abc-School         4.50   4.90  4.40
    lmn-School         3.89   4.65  2.34
    xyz-School         2.34   4.65  3.44
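To try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and reshapes it into the layout of required_df. The names `wide` and `required_df`, and the reindex/rename_axis steps, are illustrative assumptions rather than part of the original answer; they are there only because pivot sorts the row and column labels alphabetically and moves Schoolname into the index.

```python
import pandas as pd

# Sample data copied from the question (long format: one row per school/attribute pair).
df = pd.DataFrame({
    "Schoolname": ["xyz School"] * 3 + ["abc School"] * 3 + ["lmn School"] * 3,
    "Attribute": ["Safe", "Cleanliness", "Money"] * 3,
    "Value": [3.44, 2.34, 4.65, 4.40, 4.50, 4.90, 2.34, 3.89, 4.65],
})

# Long -> wide: one row per school, one column per attribute.
wide = df.pivot(index="Schoolname", columns="Attribute", values="Value")

# pivot sorts labels alphabetically; restore the order of first appearance.
wide = wide.reindex(
    index=df["Schoolname"].unique(),
    columns=df["Attribute"].unique(),
)

# Turn the index back into a regular column and drop the "Attribute" columns name.
required_df = wide.rename_axis(index="Schoolname", columns=None).reset_index()
print(required_df)
```

If the alphabetical order is acceptable, the reindex step can be left out.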

or the more expressive pd.pivot_table, which also aggregates duplicate entries (by mean, by default):

    In [172]: pd.pivot_table(df, values='Value', index='Schoolname', columns='Attribute')
    Out[172]:
    Attribute   Cleanliness  Money  Safe
    Schoolname
    abc-School         4.50   4.90  4.40
    lmn-School         3.89   4.65  2.34
    xyz-School         2.34   4.65  3.44
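Since the stated goal is an X matrix for linear regression, the sketch below (an illustration, not part of the original answer) runs pivot_table on the question's sample data and converts the result to a NumPy array. The extra duplicated row and the names `df_dup`, `table` and `X` are hypothetical, added only to show how the two functions differ when a school/attribute pair occurs more than once.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Schoolname": ["xyz School"] * 3 + ["abc School"] * 3 + ["lmn School"] * 3,
    "Attribute": ["Safe", "Cleanliness", "Money"] * 3,
    "Value": [3.44, 2.34, 4.65, 4.40, 4.50, 4.90, 2.34, 3.89, 4.65],
})

# Hypothetical second "Safe" score for one school, to create a duplicate pair.
extra = pd.DataFrame(
    {"Schoolname": ["xyz School"], "Attribute": ["Safe"], "Value": [3.56]}
)
df_dup = pd.concat([df, extra], ignore_index=True)

# df.pivot cannot reshape duplicate (index, column) pairs and raises ValueError.
pivot_failed = False
try:
    df_dup.pivot(index="Schoolname", columns="Attribute", values="Value")
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table aggregates the duplicates instead (aggfunc defaults to the mean).
table = pd.pivot_table(
    df_dup, values="Value", index="Schoolname", columns="Attribute"
)

# Fix the column order explicitly, then extract the feature matrix.
X = table[["Safe", "Cleanliness", "Money"]].to_numpy()
print(X.shape)
```

Rows of X follow the alphabetically sorted school names, so any target vector needs to be put in the same order before fitting.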
