A Simple pandas Tutorial

Updated: 2024-10-07 10:17:46
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Object creation

Create a Series by passing a list of values; pandas automatically creates a default integer index:

s = pd.Series([1,3,5,np.nan,6,8])
s
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
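
As a small illustration of the point above, the sketch below shows why the output is float64 (the missing value forces a floating-point dtype) and how an explicit index replaces the default one. The names s_int and s_labeled and the sample values are made up for this example.

```python
import numpy as np
import pandas as pd

# A missing value (np.nan) forces the integer inputs to be stored as float64.
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s.dtype)               # float64
print(list(s.index))         # [0, 1, 2, 3, 4, 5]

# Without a NaN, the same kind of list keeps an integer dtype.
s_int = pd.Series([1, 3, 5])
print(s_int.dtype.kind)      # i  (an integer dtype)

# Passing index= replaces the default 0..n-1 integer index (hypothetical labels).
s_labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s_labeled["b"])        # 20
```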

Create a DataFrame by passing a NumPy array together with a datetime index and labeled columns:

dates = pd.date_range('20130101',periods=6)
dates
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
df
                   A         B         C         D
2013-01-01 -0.828948  0.281765  0.803692  0.030016
2013-01-02  0.418212  1.537528  0.407742  0.625449
2013-01-03  0.746757 -0.338140 -0.734583 -2.377116
2013-01-04 -0.507705 -0.409561 -2.596286  0.464993
2013-01-05 -0.154101 -0.675057 -0.747016 -0.192082
2013-01-06  0.892789 -1.848313  0.897434  0.157656
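
Because np.random.randn is unseeded here, rerunning the tutorial produces different numbers from the ones shown. If reproducible values are wanted, one option (not what the original code does) is a seeded NumPy Generator. In this sketch the seed 0 and the names df_seeded and df_again are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)

# Assumption: any fixed seed works; 0 is an arbitrary choice.
rng = np.random.default_rng(seed=0)
df_seeded = pd.DataFrame(rng.standard_normal((6, 4)),
                         index=dates, columns=list("ABCD"))

# Rebuilding from a generator with the same seed yields identical values.
rng_again = np.random.default_rng(seed=0)
df_again = pd.DataFrame(rng_again.standard_normal((6, 4)),
                        index=dates, columns=list("ABCD"))

print(df_seeded.equals(df_again))  # True
```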

Create a DataFrame by passing a dict of objects that can each be converted to something Series-like:

df2 = pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
    'F': 'foo',
})
df2
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
df2.dtypes
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
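
The dict-based construction relies on scalars being broadcast to the length of the index that the other values define. A minimal sketch of that behavior follows, using a smaller, made-up frame named small. The select_dtypes call is one way to filter columns by the dtypes listed above.

```python
import pandas as pd

small = pd.DataFrame({
    "A": 1.0,                                            # scalar, repeated on every row
    "C": pd.Series(1, index=range(3), dtype="float32"),  # defines the index 0..2
    "F": "foo",                                          # scalar string, repeated on every row
})
print(small.shape)                  # (3, 3)

# Keep only the numeric columns (float64 and float32 here).
numeric_only = small.select_dtypes(include="number")
print(list(numeric_only.columns))   # ['A', 'C']
```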

Viewing data

df.head()
                   A         B         C         D
2013-01-01 -0.828948  0.281765  0.803692  0.030016
2013-01-02  0.418212  1.537528  0.407742  0.625449
2013-01-03  0.746757 -0.338140 -0.734583 -2.377116
2013-01-04 -0.507705 -0.409561 -2.596286  0.464993
2013-01-05 -0.154101 -0.675057 -0.747016 -0.192082
df.tail(2)
                   A         B         C         D
2013-01-05 -0.154101 -0.675057 -0.747016 -0.192082
2013-01-06  0.892789 -1.848313  0.897434  0.157656

Display the index, the columns, and the underlying NumPy data:

df.index
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04','2013-01-05', '2013-01-06'],dtype='datetime64[ns]', freq='D')
df.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
df.values
array([[-0.82894761,  0.28176527,  0.80369199,  0.03001636],
       [ 0.41821203,  1.53752828,  0.40774162,  0.62544912],
       [ 0.74675688, -0.33814015, -0.73458287, -2.3771161 ],
       [-0.5077046 , -0.4095612 , -2.59628619,  0.46499331],
       [-0.15410053, -0.67505665, -0.74701636, -0.19208195],
       [ 0.89278944, -1.84831322,  0.8974336 ,  0.15765575]])
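
Besides .values, current pandas documentation recommends DataFrame.to_numpy() for getting the underlying array. The sketch below uses two made-up frames (frame and mixed) to show that a homogeneous frame gives a float array, while mixed column types fall back to an object array.

```python
import pandas as pd

# All-float columns convert to a plain 2-D float array.
frame = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
arr = frame.to_numpy()
print(arr.shape)                 # (2, 2)
print(arr.dtype)                 # float64

# Mixing floats and strings forces a common dtype: object.
mixed = pd.DataFrame({"A": [1.0, 2.0], "B": ["x", "y"]})
print(mixed.to_numpy().dtype)    # object
```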
df.describe()  # show a quick statistical summary of the data
              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean   0.094501 -0.241963 -0.328170 -0.215181
std    0.699243  1.117690  1.327384  1.099356
min   -0.828948 -1.848313 -2.596286 -2.377116
25%   -0.419304 -0.608683 -0.743908 -0.136557
50%    0.132056 -0.373851 -0.163421  0.093836
75%    0.664621  0.126789  0.704704  0.388159
max    0.892789  1.537528  0.897434  0.625449
df.T  # transpose the data
   2013-01-01 00:00:00  2013-01-02 00:00:00  2013-01-03 00:00:00  2013-01-04 00:00:00  2013-01-05 00:00:00  2013-01-06 00:00:00
A            -0.828948             0.418212             0.746757            -0.507705            -0.154101             0.892789
B             0.281765             1.537528            -0.338140            -0.409561            -0.675057            -1.848313
C             0.803692             0.407742            -0.734583            -2.596286            -0.747016             0.897434
D             0.030016             0.625449            -2.377116             0.464993            -0.192082             0.157656
print(df.sort_index(axis=1, ascending=False))  # sort by column labels, descending
print(df.sort_index(axis=0, ascending=False))  # sort by row labels, descending
print(df.sort_index(axis=1, ascending=True))   # sort by column labels, ascending
print(df.sort_index(axis=0, ascending=True))   # sort by row labels, ascending
                   D         C         B         A
2013-01-01  0.030016  0.803692  0.281765 -0.828948
2013-01-02  0.625449  0.407742  1.537528  0.418212
2013-01-03 -2.377116 -0.734583 -0.338140  0.746757
2013-01-04  0.464993 -2.596286 -0.409561 -0.507705
2013-01-05 -0.192082 -0.747016 -0.675057 -0.154101
2013-01-06  0.157656  0.897434 -1.848313  0.892789
                   A         B         C         D
2013-01-06  0.892789 -1.848313  0.897434  0.157656
2013-01-05 -0.154101 -0.675057 -0.747016 -0.192082
2013-01-04 -0.507705 -0.409561 -2.596286  0.464993
2013-01-03  0.746757 -0.338140 -0.734583 -2.377116
2013-01-02  0.418212  1.537528  0.407742  0.625449
2013-01-01 -0.828948  0.281765  0.803692  0.030016
                   A         B         C         D
2013-01-01 -0.828948  0.281765  0.803692  0.030016
2013-01-02  0.418212  1.537528  0.407742  0.625449
2013-01-03  0.746757 -0.338140 -0.734583 -2.377116
2013-01-04 -0.507705 -0.409561 -2.596286  0.464993
2013-01-05 -0.154101 -0.675057 -0.747016 -0.192082
2013-01-06  0.892789 -1.848313  0.897434  0.157656
                   A         B         C         D
2013-01-01 -0.828948  0.281765  0.803692  0.030016
2013-01-02  0.418212  1.537528  0.407742  0.625449
2013-01-03  0.746757 -0.338140 -0.734583 -2.377116
2013-01-04 -0.507705 -0.409561 -2.596286  0.464993
2013-01-05 -0.154101 -0.675057 -0.747016 -0.192082
2013-01-06  0.892789 -1.848313  0.897434  0.157656
df.sort_values(by='B')  # sort by the values in column B
                   A         B         C         D
2013-01-06  0.892789 -1.848313  0.897434  0.157656
2013-01-05 -0.154101 -0.675057 -0.747016 -0.192082
2013-01-04 -0.507705 -0.409561 -2.596286  0.464993
2013-01-03  0.746757 -0.338140 -0.734583 -2.377116
2013-01-01 -0.828948  0.281765  0.803692  0.030016
2013-01-02  0.418212  1.537528  0.407742  0.625449
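
sort_values also accepts ascending=False and a list of columns for multi-key sorting. The following sketch uses a tiny made-up frame (scores, with hypothetical columns group and value) so the resulting order is easy to check by hand.

```python
import pandas as pd

scores = pd.DataFrame(
    {"group": ["b", "a", "b", "a"], "value": [2, 4, 1, 3]}
)

# Largest values first.
by_value_desc = scores.sort_values(by="value", ascending=False)
print(by_value_desc["value"].tolist())       # [4, 3, 2, 1]

# Sort by group first, then by value within each group.
by_group_then_value = scores.sort_values(by=["group", "value"])
print(by_group_then_value.index.tolist())    # [3, 1, 2, 0]
```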

Selection

df['A']  # select a single column, which yields a Series; equivalent to df.A
2013-01-01   -0.828948
2013-01-02    0.418212
2013-01-03    0.746757
2013-01-04   -0.507705
2013-01-05   -0.154101
2013-01-06    0.892789
Freq: D, Name: A, dtype: float64
df[0:3]  # slicing with [] selects rows
                   A         B         C         D
2013-01-01 -0.828948  0.281765  0.803692  0.030016
2013-01-02  0.418212  1.537528  0.407742  0.625449
2013-01-03  0.746757 -0.338140 -0.734583 -2.377116
df['20130102':'20130104']
                   A         B         C         D
2013-01-02  0.418212  1.537528  0.407742  0.625449
2013-01-03  0.746757 -0.338140 -0.734583 -2.377116
2013-01-04 -0.507705 -0.409561 -2.596286  0.464993
df.loc[dates[0]]  # get one row by label
A   -0.828948
B    0.281765
C    0.803692
D    0.030016
Name: 2013-01-01 00:00:00, dtype: float64
df.loc['20130102':'20130104', ['A', 'B']]  # select on both axes by label
                   A         B
2013-01-02  0.418212  1.537528
2013-01-03  0.746757 -0.338140
2013-01-04 -0.507705 -0.409561
df.loc['20130102', ['A', 'B']]  # a single row label reduces the result to a Series
A    0.418212
B    1.537528
Name: 2013-01-02 00:00:00, dtype: float64
df.loc[dates[0], 'A']  # get a scalar value
-0.8289476073976824
df.at[dates[0], 'A']  # fast access to a scalar (same result as the previous call)
-0.8289476073976824
df.iloc[3]  # select by integer position
A   -0.507705
B   -0.409561
C   -2.596286
D    0.464993
Name: 2013-01-04 00:00:00, dtype: float64
df.iloc[3:5, 0:2]  # select with integer slices
                   A         B
2013-01-04 -0.507705 -0.409561
2013-01-05 -0.154101 -0.675057
df.iloc[[1, 2, 4], [0, 2]]  # select with lists of integer positions
                   A         C
2013-01-02  0.418212  0.407742
2013-01-03  0.746757 -0.734583
2013-01-05 -0.154101 -0.747016
df.iloc[1, 1]  # get a single value by position
1.5375282822642125
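
One detail worth spelling out from the examples above: label slices with loc include both endpoints, while position slices with iloc exclude the end position. A minimal sketch follows; the frame demo and its integer contents are made up so the counts are easy to verify.

```python
import pandas as pd

dates = pd.date_range("20130101", periods=6)
demo = pd.DataFrame({"A": range(6), "B": range(10, 16)}, index=dates)

# Label-based slice: 2013-01-02, 2013-01-03 and 2013-01-04 are all included.
by_label = demo.loc["20130102":"20130104", "A"]
print(len(by_label))        # 3

# Position-based slice: positions 1 and 2 only; position 3 is excluded.
by_position = demo.iloc[1:3, 0]
print(len(by_position))     # 2

# at (label) and iat (position) both return a single scalar.
print(demo.at[dates[0], "A"], demo.iat[0, 0])   # 0 0
```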

Boolean indexing

df[df.A > 0]  # keep the rows where column A is positive
                   A         B         C         D
2013-01-02  0.418212  1.537528  0.407742  0.625449
2013-01-03  0.746757 -0.338140 -0.734583 -2.377116
2013-01-06  0.892789 -1.848313  0.897434  0.157656
df[df > 0]  # keep positive values; everything else becomes NaN
                   A         B         C         D
2013-01-01       NaN  0.281765  0.803692  0.030016
2013-01-02  0.418212  1.537528  0.407742  0.625449
2013-01-03  0.746757       NaN       NaN       NaN
2013-01-04       NaN       NaN       NaN  0.464993
2013-01-05       NaN       NaN       NaN       NaN
2013-01-06  0.892789       NaN  0.897434  0.157656
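
Boolean masks can also be combined, or built with isin for membership tests. The sketch below is not part of the original walkthrough. The frame demo and its column E are hypothetical, and the parentheses around each condition are required because & binds more tightly than the comparisons.

```python
import pandas as pd

demo = pd.DataFrame(
    {"A": [-1.0, 0.5, 2.0, -0.3], "E": ["one", "two", "three", "two"]}
)

# Combine two conditions element-wise with & (and); use | for or.
both = demo[(demo["A"] > 0) & (demo["E"] == "two")]
print(both.index.tolist())      # [1]

# isin keeps the rows whose value appears in the given list.
picked = demo[demo["E"].isin(["two", "three"])]
print(picked.index.tolist())    # [1, 2, 3]
```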

Setting

s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))
# Setting a new column automatically aligns the data by index
s1
2013-01-02    1
2013-01-03    2
2013-01-04    3
2013-01-05    4
2013-01-06    5
2013-01-07    6
Freq: D, dtype: int64
df['F'] = s1
df.at[dates[0], 'A'] = 0  # set a value by label
df.iat[0, 1] = 0  # set a value by position
df.loc[:, 'D'] = np.array([5] * len(df))  # set a whole column from a NumPy array
df
                   A         B         C  D    F
2013-01-01  0.000000  0.000000  0.803692  5  NaN
2013-01-02  0.418212  1.537528  0.407742  5  1.0
2013-01-03  0.746757 -0.338140 -0.734583  5  2.0
2013-01-04 -0.507705 -0.409561 -2.596286  5  3.0
2013-01-05 -0.154101 -0.675057 -0.747016  5  4.0
2013-01-06  0.892789 -1.848313  0.897434  5  5.0
df2 = df.copy()
df2[df2 > 0] = -df2
df2
                   A         B         C  D    F
2013-01-01  0.000000  0.000000 -0.803692 -5  NaN
2013-01-02 -0.418212 -1.537528 -0.407742 -5 -1.0
2013-01-03 -0.746757 -0.338140 -0.734583 -5 -2.0
2013-01-04 -0.507705 -0.409561 -2.596286 -5 -3.0
2013-01-05 -0.154101 -0.675057 -0.747016 -5 -4.0
2013-01-06 -0.892789 -1.848313 -0.897434 -5 -5.0
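
The NaN in column F for 2013-01-01 comes from index alignment: s1 starts one day later than df, so the first row has no matching label, and the extra label 2013-01-07 is dropped. The same effect can be reproduced in a smaller sketch; demo and shifted are made-up names and values.

```python
import pandas as pd

dates = pd.date_range("20130101", periods=3)
demo = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=dates)

# The Series index starts one day later than the DataFrame index.
shifted = pd.Series([10, 20, 30], index=pd.date_range("20130102", periods=3))

# Assignment aligns on the index: no match for 2013-01-01, and the
# value for 2013-01-04 has no row to land in, so it is discarded.
demo["F"] = shifted
print(demo["F"].tolist())   # [nan, 10.0, 20.0]
print(len(demo))            # 3
```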

Inserting rows

df.loc['new'] = [1, 2, 3, 4, 5]  # assigning to a label that does not exist yet appends a row
df
                            A         B         C  D    F
2013-01-01 00:00:00  0.000000  0.000000  0.803692  5  NaN
2013-01-02 00:00:00  0.418212  1.537528  0.407742  5  1.0
2013-01-03 00:00:00  0.746757 -0.338140 -0.734583  5  2.0
2013-01-04 00:00:00 -0.507705 -0.409561 -2.596286  5  3.0
2013-01-05 00:00:00 -0.154101 -0.675057 -0.747016  5  4.0
2013-01-06 00:00:00  0.892789 -1.848313  0.897434  5  5.0
new                  1.000000  2.000000  3.000000  4  5.0
df3 = pd.DataFrame([6, 6, 6, 6, 6]).T  # build a one-row DataFrame
df3.columns = df.columns  # make df3's columns match those of df
# Concatenate the two DataFrames; ignore_index=True discards the old row labels and renumbers the rows
df_new = pd.concat([df, df3], ignore_index=True)
df_new
          A         B         C  D    F
0  0.000000  0.000000  0.803692  5  NaN
1  0.418212  1.537528  0.407742  5  1.0
2  0.746757 -0.338140 -0.734583  5  2.0
3 -0.507705 -0.409561 -2.596286  5  3.0
4 -0.154101 -0.675057 -0.747016  5  4.0
5  0.892789 -1.848313  0.897434  5  5.0
6  1.000000  2.000000  3.000000  4  5.0
7  6.000000  6.000000  6.000000  6  6.0
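
To make the effect of ignore_index concrete, the sketch below concatenates the same two frames with and without it. The frames top and extra are made-up examples. Without the flag the original row labels are kept, which can leave a mixed or duplicated index.

```python
import pandas as pd

top = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])
extra = pd.DataFrame([[6, 6]], columns=top.columns)

# Default: each input keeps its own row labels.
kept = pd.concat([top, extra])
print(kept.index.tolist())          # ['x', 'y', 0]

# ignore_index=True: labels are replaced by 0..n-1.
renumbered = pd.concat([top, extra], ignore_index=True)
print(renumbered.index.tolist())    # [0, 1, 2]
```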

Statistics

df.mean()  # mean of each column
A    0.342279
B    0.038065
C    0.147283
D    4.857143
F    3.333333
dtype: float64
df.mean(axis=1)  # mean of each row
2013-01-01 00:00:00    1.450923
2013-01-02 00:00:00    1.672696
2013-01-03 00:00:00    1.334807
2013-01-04 00:00:00    0.897290
2013-01-05 00:00:00    1.484765
2013-01-06 00:00:00    1.988382
new                    3.000000
dtype: float64
df.sum()
A     2.395953
B     0.266457
C     1.030982
D    34.000000
F    20.000000
dtype: float64
df.sum(axis=1)  # sum of each row
2013-01-01 00:00:00     5.803692
2013-01-02 00:00:00     8.363482
2013-01-03 00:00:00     6.674034
2013-01-04 00:00:00     4.486448
2013-01-05 00:00:00     7.423826
2013-01-06 00:00:00     9.941910
new                    15.000000
dtype: float64
df.var()  # variance
A    0.331841
B    1.751315
C    3.050677
D    0.142857
F    2.666667
dtype: float64
df.std()  # standard deviation
df.corr()  # correlation coefficients
df.cov()  # covariance
df.describe()  # summary statistics
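
These reductions skip missing values by default, which is why the mean of column F above is 20 / 6 rather than 20 / 7. A small sketch of that behavior and of the axis argument follows, using a made-up frame named demo.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"D": [5.0, 5.0, 4.0], "F": [np.nan, 1.0, 5.0]})

# NaN is skipped by default: (1 + 5) / 2.
print(demo["F"].mean())               # 3.0

# With skipna=False a single NaN makes the result NaN.
print(demo["F"].mean(skipna=False))   # nan

# axis=1 reduces across the columns, giving one value per row.
print(demo.mean(axis=1).tolist())     # [5.0, 3.0, 4.5]
```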

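Statistical plotting

import matplotlib.pyplot as plt  # import the plotting library
plt.rcParams['font.sans-serif'] = ['SimHei']  # use a font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
plt.figure(figsize=(7, 5))  # create the figure and set its size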
<matplotlib.figure.Figure at 0x229f4e7ee10>

plt.plot(x, y, s)

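This is Matplotlib's general-purpose plotting call. It plots y against x (a two-dimensional graph with x on the horizontal axis). The string argument s sets the marker type, line style, and color. Common options include 'b' for blue, 'r' for red, 'g' for green, 'o' for circle markers, '+' for plus markers, '-' for a solid line, and '--' for a dashed line. When x and y are real-valued vectors of the same length, the points (x[i], y[i]) are plotted and then joined in order by straight line segments.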
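
A minimal sketch of how the format string combines color, marker, and line style is shown below. The Agg backend is selected only so the example runs without a display (an assumption about the environment); in an interactive session, drop that line and call plt.show(). The names sin_line and cos_line are chosen for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit in interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
fig, ax = plt.subplots(figsize=(7, 5))

# 'r-'  : red, solid line, no markers
(sin_line,) = ax.plot(x, np.sin(x), "r-")
# 'go--': green, circle markers, dashed line
(cos_line,) = ax.plot(x, np.cos(x), "go--")

ax.legend(["sin", "cos"])
print(sin_line.get_linestyle(), cos_line.get_marker(), cos_line.get_linestyle())
```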

df.plot(kind='box')

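This form uses the plotting method built into DataFrame and Series objects. By default it uses the index as the horizontal axis and plots each column as a separate series on the vertical axis. The kind parameter selects the chart type; supported values include line, bar, barh (horizontal bar), hist (histogram), box (box plot), kde (density plot), area, and pie. The method also accepts the arguments that plt.plot() accepts. So if the data is already loaded into a pandas object, plotting this way is the more concise option.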
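
As a rough illustration of the built-in plotting method, the sketch below draws a box plot and a line plot from the same frame. The data is random and seeded only for repeatability, the names data, box_ax, and line_ax are made up, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit in interactive use
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed
data = pd.DataFrame(rng.standard_normal((20, 3)), columns=["A", "B", "C"])

# One box per column; the call returns a Matplotlib Axes object.
box_ax = data.plot(kind="box")

# One line per column against the index; style is passed through to Matplotlib.
line_ax = data.plot(kind="line", style="--")
print(len(line_ax.get_lines()))   # 3
```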

x = np.linspace(0, 2 * np.pi, 50)  # x coordinates
y = np.sin(x)  # y values
plt.plot(x, y, 'bp--')  # blue pentagon markers joined by a dashed line
plt.show()

plt.pie(size)

This draws a pie chart with Matplotlib, where size is a list giving the proportion of each wedge. pie accepts many optional parameters.

import matplotlib.pyplot as plt
# The slices will be ordered and plotted counter-clockwise.
labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'  # labels for the wedges
sizes = [15, 30, 45, 10]  # proportion of each wedge
colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral']  # color of each wedge
explode = (0, 0.1, 0, 0)  # offset a wedge to emphasize it; here only the second one ('Hogs')
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%',
        shadow=True, startangle=90)
plt.axis('equal')  # equal aspect ratio so the pie is drawn as a circle, not an ellipse
plt.show()

Published: 2023-05-28 00:47:08