I have a specific case where I want to convert this df:

    print(df)

       Schoolname    Attribute  Value
    0  xyz School         Safe   3.44
    1  xyz School  Cleanliness   2.34
    2  xyz School        Money   4.65
    3  abc School         Safe   4.40
    4  abc School  Cleanliness   4.50
    5  abc School        Money   4.90
    6  lmn School         Safe   2.34
    7  lmn School  Cleanliness   3.89
    8  lmn School        Money   4.65

I need to get it into the following format so that I can convert it to a NumPy array for linear regression modelling.

required_df:

       Schoolname  Safe  Cleanliness  Money
    0  xyz School  3.44         2.34   4.65
    1  abc School  4.40         4.50   4.90
    2  lmn School  2.34         3.89   4.65

I know we need to do groupby('Schoolname'), but I can't work out what to do after that so that the Attribute values in the rows become column labels, with the corresponding values filled in as shown in required_df.

I need this format so that I can convert it to a NumPy array and give it to a linear regression model as my X vector.
Best answer
You could use pd.pivot:

    In [171]: df.pivot(index='Schoolname', columns='Attribute', values='Value')
    Out[171]:
    Attribute   Cleanliness  Money  Safe
    Schoolname
    abc-School         4.50   4.90  4.40
    lmn-School         3.89   4.65  2.34
    xyz-School         2.34   4.65  3.44

or the more expressive pd.pivot_table:

    In [172]: pd.pivot_table(df, values='Value', index='Schoolname', columns='Attribute')
    Out[172]:
    Attribute   Cleanliness  Money  Safe
    Schoolname
    abc-School         4.50   4.90  4.40
    lmn-School         3.89   4.65  2.34
    xyz-School         2.34   4.65  3.44
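For completeness, here is a minimal end-to-end sketch of the pivot approach, going from the long-format frame in the question to a NumPy feature matrix. The names `wide`, `feature_order`, `school_order` and `X` are made up for illustration, and the frame is rebuilt by hand from the values in the question. Note that the pivot sorts schools and attributes alphabetically, so the sketch reorders them to match required_df.

```python
import pandas as pd

# Rebuild the long-format frame from the question (values copied by hand).
df = pd.DataFrame({
    "Schoolname": ["xyz School"] * 3 + ["abc School"] * 3 + ["lmn School"] * 3,
    "Attribute": ["Safe", "Cleanliness", "Money"] * 3,
    "Value": [3.44, 2.34, 4.65, 4.40, 4.50, 4.90, 2.34, 3.89, 4.65],
})

# Long -> wide: one row per school, one column per attribute.
wide = df.pivot(index="Schoolname", columns="Attribute", values="Value")

# The pivot result is sorted alphabetically on both axes, so restore the
# row and column order used in the question (illustrative names).
feature_order = ["Safe", "Cleanliness", "Money"]
school_order = df["Schoolname"].drop_duplicates().tolist()
wide = wide.loc[school_order, feature_order]

# Feature matrix for the regression: shape (n_schools, n_attributes).
X = wide.to_numpy()

# Optional: a flat frame shaped like required_df, with Schoolname as a
# regular column and the leftover "Attribute" axis label removed.
required_df = wide.reset_index().rename_axis(columns=None)

print(required_df)
print(X.shape)
```

One design note: df.pivot raises a ValueError if any (Schoolname, Attribute) pair occurs more than once, whereas pd.pivot_table aggregates duplicates (using the mean by default). If the real data might contain repeated measurements, pivot_table is probably the safer choice.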