I'm trying to create a histogram from grouped data in pandas.
So far I have been able to create a standard line plot, but I can't figure out how to get a histogram (bar chart) the same way. I would like two age histograms, one for people who survived the Titanic sinking and one for those who didn't, to see whether the age distributions differ.
Source data: https://www.udacity.com/api/nodes/5454512672/supplemental_media/titanic-datacsv/download
My code so far:

    import pandas as pn

    titanic = pn.DataFrame.from_csv('titanic_data.csv')
    SurvivedAge = titanic.groupby(['Survived', 'Age']).size()
    SurvivedAge = SurvivedAge.reset_index()
    SurvivedAge.columns = ['Survived', 'Age', 'Num']
    SurvivedAge.index = (SurvivedAge['Survived'])
    del SurvivedAge['Survived']
    SurvivedAget = SurvivedAge.reset_index().pivot('Age', 'Survived', 'Num')
    SurvivedAget.plot()

When I try to plot a histogram from this data set, I get strange results:

    SurvivedAget.hist()

I would be grateful for any help with this.
Best answer
You can:
    import pandas as pd

    titanic = pd.read_csv('titanic_data.csv')
    # Count passengers per (Age, Survived) pair, then move Survived into columns
    survival_by_age = titanic.groupby(['Age', 'Survived']).size().unstack('Survived')
    survival_by_age.columns = ['No', 'Yes']
    survival_by_age.plot.bar(title='Survival by Age')

This gives a bar chart with a "No" bar and a "Yes" bar for each age, which you can further tweak. You could also consolidate the fractional ages so you can use integer indices, or bin the data into, say, 5-year age spans to get more user-friendly output. And then there is seaborn, which offers various types of distribution plots.
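As a rough sketch of the binning idea, the snippet below groups ages into 5-year spans before counting. The sample rows are made up so the example runs without the CSV, and the names `BIN_WIDTH`, `AgeBin` and `survival_by_bin` are my own, not part of the dataset. It also assumes some ages may be missing and drops those rows first.

```python
import pandas as pd

# Made-up rows standing in for titanic_data.csv (illustration only);
# in practice you would use: titanic = pd.read_csv('titanic_data.csv')
titanic = pd.DataFrame({
    'Age': [0.92, 4.0, 22.0, 24.5, 38.0, 35.0, 54.0, 2.0, 27.0, None],
    'Survived': [1, 1, 0, 1, 1, 0, 0, 0, 1, 1],
})

BIN_WIDTH = 5  # years per bin (an assumed value; adjust to taste)

# Drop rows without an age, then label each row with the lower edge of its bin,
# so 0.92 -> 0, 24.5 -> 20, 38.0 -> 35, and so on.
known_age = titanic.dropna(subset=['Age'])
binned = known_age.assign(
    AgeBin=(known_age['Age'] // BIN_WIDTH * BIN_WIDTH).astype(int)
)

# Count passengers per (bin, outcome); the outcomes become the columns.
survival_by_bin = (
    binned.groupby(['AgeBin', 'Survived'])
    .size()
    .unstack('Survived', fill_value=0)
    .rename(columns={0: 'No', 1: 'Yes'})
)

print(survival_by_bin)
# With matplotlib installed, the same bar plot call as above applies:
# survival_by_bin.plot.bar(title='Survival by 5-year age group')
```

Labelling each bin by its lower edge keeps the index as plain integers, which tends to read better on the x-axis than one tick per fractional age.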