Python Time Series, Part 2: Data Analysis in Python with Period, Timestamp, PeriodIndex, date_range, resample, and OHLC


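Periods and Period Arithmetic

Period Frequency Conversion

Quarterly Period Frequencies

Converting Timestamps to Periods (and Back)

Creating a PeriodIndex from Arrays

Resampling and Frequency Conversion

Downsampling

OHLC Resampling

Resampling with groupby

Upsampling and Interpolation

Resampling with Periods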

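Periods and Period Arithmetic

A Period represents a time span such as a day, a month, a quarter, or a year. The Period below covers the whole of 2007, from January 1 through December 31. (The transcripts in this article come from an older pandas release; pandas 2.2 and later spell the annual alias 'Y-DEC' rather than 'A-DEC' and the minute alias 'min' rather than 'T'.)

>>> import numpy as np
>>> import pandas as pd
>>> from pandas import Series, DataFrame
>>> p = pd.Period(2007,freq='A-DEC')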
>>> p
Period('2007', 'A-DEC')
>>> p+5,p-2
(Period('2012', 'A-DEC'), Period('2005', 'A-DEC'))

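If two Period objects have the same frequency, their difference is the number of units between them. (The trailing L in the output below is Python 2's long-integer suffix; recent pandas versions return a DateOffset such as <7 * YearEnds: month=12> instead.)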

>>> pd.Period(2014,freq='A-DEC') - p
7L
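
The arithmetic above can be reproduced in a short script. This is a minimal sketch: the monthly frequency is an illustrative choice (not the one used in the transcript), and the `getattr` fallback is only there because the type returned by subtracting two periods differs between pandas versions.

```python
import pandas as pd

# Monthly frequency is an illustrative choice; the same arithmetic
# applies to annual periods such as the 'A-DEC' ones above.
p = pd.Period('2007-01', freq='M')

later = p + 5      # five months after January 2007
earlier = p - 2    # two months before January 2007

# Period - Period: recent pandas returns a DateOffset whose count is in
# `.n`; older releases return a plain integer, hence the getattr fallback.
diff = pd.Period('2007-08', freq='M') - p
n_months = getattr(diff, 'n', diff)

print(later, earlier, n_months)
```

Adding or subtracting an integer always moves by whole units of the period's own frequency, so the same code shifts by years when the frequency is annual.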

Regular ranges of periods can be created with period_range:

>>> rng = pd.period_range('1/1/2000','6/30/2000',freq='M')
>>> rng
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]', freq='M')

The PeriodIndex class stores a sequence of Period objects and can serve as an axis index in any pandas data structure:

>>> Series(np.random.randn(6),index=rng)
2000-01    1.784802
2000-02   -1.429045
2000-03   -0.145070
2000-04   -0.443552
2000-05    0.242232
2000-06   -0.442667
Freq: M, dtype: float64

The PeriodIndex constructor also accepts an array of strings directly:

>>> value = ['2001Q3','2002Q2','2003Q3']
>>> index = pd.PeriodIndex(value,freq='Q-DEC')
>>> index
PeriodIndex(['2001Q3', '2002Q2', '2003Q3'], dtype='period[Q-DEC]', freq='Q-DEC')
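
Once a PeriodIndex exists, its calendar fields and label-based lookups are available directly. The following is a small sketch; the Series values are hypothetical and only serve to show a lookup.

```python
import pandas as pd

values = ['2001Q3', '2002Q2', '2003Q3']
index = pd.PeriodIndex(values, freq='Q-DEC')

# Hypothetical data, only to have something to look up.
s = pd.Series([10, 20, 30], index=index)

# Field accessors expose the calendar components of every period.
years = index.year.tolist()
quarters = index.quarter.tolist()

# Label-based lookup with a Period of the same frequency.
q2_value = s.loc[pd.Period('2002Q2', freq='Q-DEC')]

print(years, quarters, q2_value)
```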

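Period Frequency Conversion

Period and PeriodIndex objects can be converted to another frequency with their asfreq method. For example, an annual period can be converted to the monthly period at either its start or its end: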

>>> p = pd.Period('2007',freq='A-DEC')
>>> p.asfreq('M',how='start')
Period('2007-01', 'M')
>>> p.asfreq('M',how='end')
Period('2007-12', 'M')

For a fiscal year ending in a month other than December, the monthly subperiods that belong to it are different:

>>> p = pd.Period('2007','A-JUN')
>>> p.asfreq('M','start')
Period('2006-07', 'M')
>>> p.asfreq('M','end')
Period('2007-06', 'M')

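When converting from a higher to a lower frequency, the superperiod is determined by where the subperiod falls. For example, with A-JUN frequency, August 2007 is part of the 2008 period: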

>>> p = pd.Period('2007-08','M')
>>> p.asfreq('A-JUN')
Period('2008', 'A-JUN')
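
The same subperiod and superperiod logic can be sketched with quarterly and monthly frequencies. The quarterly aliases here are a substitution for the annual ones in the transcript, chosen only because their spelling has stayed stable across pandas versions.

```python
import pandas as pd

# Quarter -> month: pick the first or last monthly subperiod.
q = pd.Period('2007Q4', freq='Q-DEC')
first_month = q.asfreq('M', how='start')
last_month = q.asfreq('M', how='end')

# Month -> quarter: the superperiod is whichever quarter contains the month.
aug = pd.Period('2007-08', freq='M')
calendar_quarter = aug.asfreq('Q-DEC')
# With quarters of a fiscal year ending in January, August 2007 falls in
# the third quarter of fiscal 2008.
fiscal_quarter = aug.asfreq('Q-JAN')

print(first_month, last_month, calendar_quarter, fiscal_quarter)
```

The `how` argument only matters when moving to a higher frequency, where several subperiods are candidates; moving to a lower frequency has exactly one containing period.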

A whole PeriodIndex or time series can be converted with the same semantics:

>>> rng = pd.period_range('2006','2009',freq='A-DEC')
>>> ts = Series(np.random.randn(len(rng)),index=rng)
>>> ts
2006   -1.672370
2007   -1.303141
2008    0.470823
2009    0.318962
Freq: A-DEC, dtype: float64
>>> ts.asfreq('M',how='start')
2006-01   -1.672370
2007-01   -1.303141
2008-01    0.470823
2009-01    0.318962
Freq: M, dtype: float64
>>> ts.asfreq('B',how='end')
2006-12-29   -1.672370
2007-12-31   -1.303141
2008-12-31    0.470823
2009-12-31    0.318962
Freq: B, dtype: float64

Quarterly Period Frequencies

>>> p = pd.Period('2014Q4',freq='Q-JAN')
>>> p
Period('2014Q4', 'Q-JAN')

In a fiscal year ending in January, 2012Q4 runs from November 2011 through January 2012:

>>> p = pd.Period('2012Q4',freq='Q-JAN')
>>> p.asfreq('D','start')
Period('2011-11-01', 'D')
>>> p.asfreq('D','end')
Period('2012-01-31', 'D')

To get the timestamp at 4 PM on the second-to-last business day of the quarter:

>>> p4pm = (p.asfreq('B','e') -1).asfreq('T','s') + 16*60
>>> p4pm
Period('2012-01-30 16:00', 'T')
>>> p4pm.to_timestamp()
Timestamp('2012-01-30 16:00:00')
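
In case the business-day Period frequency is unavailable or flagged as deprecated in your pandas version, the same timestamp can be reached through timestamps and business-day offsets. This is an alternative route, not the method used in the transcript above.

```python
import pandas as pd

p = pd.Period('2012Q4', freq='Q-JAN')

# Midnight on the last calendar day of the quarter.
quarter_end = p.asfreq('D', how='end').to_timestamp()

# Last business day on or before that date, then one business day earlier.
last_bday = pd.offsets.BDay().rollback(quarter_end)
second_to_last_bday = last_bday - pd.offsets.BDay(1)

# 4 PM on that day.
p4pm = second_to_last_bday + pd.Timedelta(hours=16)
print(p4pm)
```

`rollback` leaves the date unchanged when it is already a business day, which is the case here because January 31, 2012 was a Tuesday.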

The same arithmetic works on a whole PeriodIndex:

>>> rng = pd.period_range('2011Q3','2012Q4',freq='Q-JAN')
>>> ts = Series(np.arange(len(rng)),index=rng)
>>> ts
2011Q3    0
2011Q4    1
2012Q1    2
2012Q2    3
2012Q3    4
2012Q4    5
Freq: Q-JAN, dtype: int32
>>> new_rng = (rng.asfreq('B','e')-1).asfreq('T','s') + 16*60
>>> ts.index = new_rng.to_timestamp()
>>> ts
2010-10-28 16:00:00    0
2011-01-28 16:00:00    1
2011-04-28 16:00:00    2
2011-07-28 16:00:00    3
2011-10-28 16:00:00    4
2012-01-30 16:00:00    5
dtype: int32

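Converting Timestamps to Periods (and Back)

Series and DataFrame objects indexed by timestamps can be converted to periods with the to_period method: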

>>> rng = pd.date_range('1/1/2000',periods=3,freq='M')
>>> ts = Series(np.random.randn(3),index=rng)
>>> pts = ts.to_period()
>>> ts
2000-01-31    0.008677
2000-02-29    1.061882
2000-03-31    0.529983
Freq: M, dtype: float64
>>> pts
2000-01    0.008677
2000-02    1.061882
2000-03    0.529983
Freq: M, dtype: float64

To convert back to timestamps, use to_timestamp:

>>> pts.to_timestamp(how='end')
2000-01-31    0.008677
2000-02-29    1.061882
2000-03-31    0.529983
Freq: M, dtype: float64
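
Because periods are non-overlapping spans, several timestamps can map to the same period. The sketch below uses a hypothetical daily series that straddles a month boundary and passes the target frequency to to_period explicitly.

```python
import pandas as pd

rng = pd.date_range('2000-01-29', periods=6, freq='D')
ts = pd.Series(range(6), index=rng)

# Each daily timestamp maps to the month that contains it, so the
# resulting PeriodIndex holds duplicate periods.
pts = ts.to_period('M')
labels = [str(p) for p in pts.index]

# Back to timestamps, anchored at the start of each period.
back = pts.to_timestamp(how='start')

print(labels)
print(back.index[0], back.index[-1])
```

The round trip is lossy here: the original days are gone, and every row comes back stamped with the first day of its month.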

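Creating a PeriodIndex from Arrays

Fixed-frequency datasets sometimes store the time span information in several columns. In this macroeconomic dataset, the year and quarter are in separate columns. (The path is written as a raw string so that the backslashes are not treated as escape sequences.)

>>> data = pd.read_csv(r'D:\python\DataAnalysis\data\macrodata.csv')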
>>> data.year[:5]
0    1959.0
1    1959.0
2    1959.0
3    1959.0
4    1960.0
Name: year, dtype: float64
>>> data.quarter[:5]
0    1.0
1    2.0
2    3.0
3    4.0
4    1.0
Name: quarter, dtype: float64

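Passing these arrays to PeriodIndex together with a frequency combines them into an index for the DataFrame:

>>> index = pd.PeriodIndex(year = data.year,quarter = data.quarter,freq='Q-DEC')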
>>> index[:5]
PeriodIndex(['1959Q1', '1959Q2', '1959Q3', '1959Q4', '1960Q1'], dtype='period[Q-DEC]', freq='Q-DEC')
>>> data.index = index
>>> data.infl
1959Q1    0.00
1959Q2    2.34
1959Q3    2.74
1959Q4    0.27
1960Q1    2.31
1960Q2    0.14
1960Q3    2.70
1960Q4    1.21
1961Q1   -0.40
1961Q2    1.47
1961Q3    0.80
1961Q4    0.80
1962Q1    2.26
1962Q2    0.13
1962Q3    2.11
1962Q4    0.79
1963Q1    0.53
1963Q2    2.75
1963Q3    0.78
1963Q4    2.46
1964Q1    0.13
1964Q2    0.90
1964Q3    1.29
1964Q4    2.05
1965Q1    1.28
1965Q2    2.54
1965Q3    0.89
1965Q4    2.90
1966Q1    4.99
1966Q2    2.10
          ... 
2002Q2    1.56
2002Q3    2.66
2002Q4    3.08
2003Q1    1.31
2003Q2    1.09
2003Q3    2.60
2003Q4    3.02
2004Q1    2.35
2004Q2    3.61
2004Q3    3.58
2004Q4    2.09
2005Q1    4.15
2005Q2    1.85
2005Q3    9.14
2005Q4    0.40
2006Q1    2.60
2006Q2    3.97
2006Q3   -1.58
2006Q4    3.30
2007Q1    4.58
2007Q2    2.75
2007Q3    3.45
2007Q4    6.38
2008Q1    2.82
2008Q2    8.53
2008Q3   -3.16
2008Q4   -8.79
2009Q1    0.94
2009Q2    3.37
2009Q3    3.56
Freq: Q-DEC, Name: infl, Length: 203, dtype: float64

>>> data[:5]
          year  quarter   realgdp   ...         pop  infl  realint
1959Q1  1959.0      1.0  2710.349   ...     177.146  0.00     0.00
1959Q2  1959.0      2.0  2778.801   ...     177.830  2.34     0.74
1959Q3  1959.0      3.0  2775.488   ...     178.657  2.74     1.09
1959Q4  1959.0      4.0  2785.204   ...     179.386  0.27     4.06
1960Q1  1960.0      1.0  2847.699   ...     180.007  2.31     1.19

[5 rows x 14 columns]
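
Recent pandas releases warn about the year= and quarter= constructor keywords, so a version-neutral alternative may be useful. The sketch below builds quarter labels as strings and lets PeriodIndex parse them; the small DataFrame is a stand-in for macrodata.csv, which is assumed to have float year and quarter columns as shown above.

```python
import pandas as pd

# Stand-in values mirroring the first rows of the year and quarter columns.
data = pd.DataFrame({'year': [1959.0, 1959.0, 1959.0, 1959.0, 1960.0],
                     'quarter': [1.0, 2.0, 3.0, 4.0, 1.0]})

# Build labels such as '1959Q1' and let PeriodIndex parse them.
labels = [f"{int(y)}Q{int(q)}" for y, q in zip(data['year'], data['quarter'])]
index = pd.PeriodIndex(labels, freq='Q-DEC')

data.index = index
print(data.index)
```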

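Resampling and Frequency Conversion

Resampling is the process of converting a time series from one frequency to another. Aggregating higher-frequency data to a lower frequency is called downsampling, while converting lower-frequency data to a higher frequency is called upsampling. Not all resampling falls into these two categories; for example, converting W-WED (weekly on Wednesday) to W-FRI is neither upsampling nor downsampling.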

>>> rng = pd.date_range('1/1/2000',periods=100,freq='D')
>>> ts = Series(np.random.randn(len(rng)),index=rng)
>>> ts.resample('M',how='mean')
<string>:1: FutureWarning: how in .resample() is deprecated
the new syntax is .resample(...).mean()
2000-01-31   -0.073764
2000-02-29    0.018607
2000-03-31   -0.292522
2000-04-30    0.363046
Freq: M, dtype: float64
>>> ts.resample('M').mean()
2000-01-31   -0.073764
2000-02-29    0.018607
2000-03-31   -0.292522
2000-04-30    0.363046
Freq: M, dtype: float64

resample is a flexible, high-performance method that can be used to process very large time series.
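
A resampler object can be reused for several aggregations. The sketch below uses the 'MS' (month start) alias instead of the month-end alias from the transcript, an assumption made only because the month-end spelling differs between pandas versions; the data is a simple counter rather than random values so the results are predictable.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.arange(100), index=rng)

# 'MS' labels each bin with the first day of its month.
monthly = ts.resample('MS')
monthly_mean = monthly.mean()
monthly_count = monthly.count()

print(monthly_mean)
print(monthly_count)
```

The counts show that the last bin is partial: only nine April days fall inside the 100-day range.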

Downsampling

Here are 12 observations at one-minute frequency:

>>> rng = pd.date_range('1/1/2000',periods=12,freq='T')
>>> ts = Series(np.arange(12),index=rng)
>>> ts
2000-01-01 00:00:00     0
2000-01-01 00:01:00     1
2000-01-01 00:02:00     2
2000-01-01 00:03:00     3
2000-01-01 00:04:00     4
2000-01-01 00:05:00     5
2000-01-01 00:06:00     6
2000-01-01 00:07:00     7
2000-01-01 00:08:00     8
2000-01-01 00:09:00     9
2000-01-01 00:10:00    10
2000-01-01 00:11:00    11
Freq: T, dtype: int32

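Aggregate the data into five-minute bins. By default the left bin edge is inclusive, so the 00:00 to 00:05 interval includes the 00:00 value but not the 00:05 value: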

>>> ts.resample('5min').sum()
2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int32

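Left-closed and right-closed bins. The default, closed='left', gives the same result as before, while closed='right' makes the right bin edge inclusive: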

>>> ts.resample('5min').sum()
2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int32
>>> ts.resample('5min',closed='left').sum()
2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int32
>>> ts.resample('5min',closed='right').sum()
1999-12-31 23:55:00     0
2000-01-01 00:00:00    15
2000-01-01 00:05:00    40
2000-01-01 00:10:00    11
Freq: 5T, dtype: int32
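
Which edge is closed and which edge supplies the label are independent settings. A minimal sketch comparing the two fully left and fully right configurations on the same twelve-minute series:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# Bins [00:00, 00:05), [00:05, 00:10), ... labelled by their left edge.
left = ts.resample('5min', closed='left', label='left').sum()

# Bins (23:55, 00:00], (00:00, 00:05], ... labelled by their right edge.
right = ts.resample('5min', closed='right', label='right').sum()

print(left)
print(right)
```

With right-closed bins the 00:00 observation sits alone in the bin ending at 00:00, which is why a fourth bin appears.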

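Shifting the result index by a small amount, such as subtracting one second from a right-edge label, can make it clearer which interval a timestamp refers to. The example below applies the same one-second shift to the default left-edge labels. The how and loffset arguments that older pandas accepted for this have since been removed from resample, so the shift is applied to the index directly:

>>> result = ts.resample('5min').sum()
>>> result.index = result.index + pd.Timedelta('-1s')
>>> result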
1999-12-31 23:59:59    10
2000-01-01 00:04:59    35
2000-01-01 00:09:59    21
Freq: 5T, dtype: int32

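OHLC Resampling

A ubiquitous way of aggregating time series in finance is to compute four values for each bin: the first (open), last (close), maximum (high), and minimum (low):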

>>> ts.resample('5min').ohlc()
                     open  high  low  close
2000-01-01 00:00:00     0     4    0      4
2000-01-01 00:05:00     5     9    5      9
2000-01-01 00:10:00    10    11   10     11
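
With a monotonically increasing series like the one above, open equals low and close equals high, which hides what ohlc actually computes. A sketch with hypothetical prices that move up and down makes the four columns distinct:

```python
import pandas as pd

# Hypothetical one-minute prices.
prices = pd.Series(
    [10.0, 12.0, 9.0, 11.0, 11.5, 13.0],
    index=pd.date_range('2000-01-01', periods=6, freq='min'),
)

# One row per three-minute bin with open, high, low and close columns.
bars = prices.resample('3min').ohlc()
print(bars)
```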

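Resampling with groupby

An alternative way to downsample is to use groupby. For example, you can group by month or by weekday by passing a function that reads those fields from the index: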

>>> rng = pd.date_range('1/1/2000',periods=100,freq='D')
>>> ts = Series(np.arange(100),index=rng)
>>> ts.groupby(lambda x:x.month).mean()
1    15.0
2    45.0
3    75.0
4    95.0
dtype: float64
>>> ts.groupby(lambda x: x.weekday).mean()
0    47.5
1    48.5
2    49.5
3    50.5
4    51.5
5    49.0
6    50.0
dtype: float64
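
Grouping on a calendar field is not quite the same as resampling once the data spans more than one year. The sketch below, on a hypothetical 400-day series, shows the difference in the number of groups.

```python
import numpy as np
import pandas as pd

# 400 days starting 2000-01-01 reach into February 2001.
rng = pd.date_range('2000-01-01', periods=400, freq='D')
ts = pd.Series(np.arange(400), index=rng)

# Grouping on the month number pools January 2000 with January 2001.
by_month_number = ts.groupby(ts.index.month).mean()

# Resampling keeps one bin per calendar month.
by_calendar_month = ts.resample('MS').mean()

print(len(by_month_number), len(by_calendar_month))
```

Use the groupby form when you want seasonal profiles (all Januaries together) and resample when you want a chronological series.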

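Upsampling and Interpolation

When converting from a low frequency to a higher frequency, no aggregation is needed. Consider a DataFrame with some weekly data: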

>>> frame = DataFrame(np.random.randn(2,4),index = pd.date_range('1/1/2000',periods=2,freq='W-WED'),columns=['Colorado','Texas','New York','Ohio'])
>>> frame[:5]
            Colorado     Texas  New York      Ohio
2000-01-05 -0.581785 -0.270777  0.885574 -1.072725
2000-01-12 -1.487840  1.603010  0.860155  0.157164

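Resampling to daily frequency introduces missing values by default; the counts below show that only the two original dates hold data: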

>>> df_daily = frame.resample('D')
>>> df_daily
DatetimeIndexResampler [freq=<Day>, axis=0, closed=left, label=left, convention=start, base=0]
>>> df_daily.count()
            Colorado  Texas  New York  Ohio
2000-01-05         1      1         1     1
2000-01-06         0      0         0     0
2000-01-07         0      0         0     0
2000-01-08         0      0         0     0
2000-01-09         0      0         0     0
2000-01-10         0      0         0     0
2000-01-11         0      0         0     0
2000-01-12         1      1         1     1

To fill the gaps with the preceding weekly value, forward-fill the result:

>>> frame.resample('D').ffill()
            Colorado     Texas  New York      Ohio
2000-01-05 -0.581785 -0.270777  0.885574 -1.072725
2000-01-06 -0.581785 -0.270777  0.885574 -1.072725
2000-01-07 -0.581785 -0.270777  0.885574 -1.072725
2000-01-08 -0.581785 -0.270777  0.885574 -1.072725
2000-01-09 -0.581785 -0.270777  0.885574 -1.072725
2000-01-10 -0.581785 -0.270777  0.885574 -1.072725
2000-01-11 -0.581785 -0.270777  0.885574 -1.072725
2000-01-12 -1.487840  1.603010  0.860155  0.157164
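
Two related options are worth sketching: asfreq exposes the gaps as NaN without filling them, and the limit argument of ffill caps how far an observation is carried forward. The single-column frame here is a hypothetical stand-in for the one above.

```python
import pandas as pd

frame = pd.DataFrame(
    {'Colorado': [1.0, 2.0]},
    index=pd.date_range('2000-01-05', periods=2, freq='W-WED'),
)

# asfreq() materialises the daily index without filling anything.
daily = frame.resample('D').asfreq()

# Forward-fill at most two days past each observation.
partly_filled = frame.resample('D').ffill(limit=2)

print(daily)
print(partly_filled)
```

Limiting the fill is a reasonable choice when a stale weekly value should not be treated as valid for the whole following week.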

Resampling with Periods

>>> frame = DataFrame(np.random.randn(24,4),index=pd.period_range('1-2000','12-2001',freq='M'),columns=['Colorado','Texas','New York','Ohio'])
>>> frame[:5]
         Colorado     Texas  New York      Ohio
2000-01 -0.871782  1.125858 -0.545824 -1.135282
2000-02  0.197301 -0.636073 -0.937487 -0.661491
2000-03  0.026330 -2.000660 -1.352893  1.083992
2000-04  1.183148  1.761140  0.272041  0.300900
2000-05  0.275420 -0.064059  1.472698  0.471408
>>> annual_frame = frame.resample('A-DEC').mean()
>>> annual_frame
      Colorado     Texas  New York      Ohio
2000  0.742654  0.088633 -0.097009  0.139006
2001  0.068373 -0.373837  0.050390  0.254034

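Upsampling is slightly more involved, because you must decide at which end of the time span in the new frequency to place the values before filling. The convention argument defaults to 'start' and can also be 'end':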

>>> annual_frame.resample('Q-DEC').ffill()
        Colorado     Texas  New York      Ohio
2000Q1  0.742654  0.088633 -0.097009  0.139006
2000Q2  0.742654  0.088633 -0.097009  0.139006
2000Q3  0.742654  0.088633 -0.097009  0.139006
2000Q4  0.742654  0.088633 -0.097009  0.139006
2001Q1  0.068373 -0.373837  0.050390  0.254034
2001Q2  0.068373 -0.373837  0.050390  0.254034
2001Q3  0.068373 -0.373837  0.050390  0.254034
2001Q4  0.068373 -0.373837  0.050390  0.254034
>>> annual_frame.resample('Q-DEC',convention='start').ffill()
        Colorado     Texas  New York      Ohio
2000Q1  0.742654  0.088633 -0.097009  0.139006
2000Q2  0.742654  0.088633 -0.097009  0.139006
2000Q3  0.742654  0.088633 -0.097009  0.139006
2000Q4  0.742654  0.088633 -0.097009  0.139006
2001Q1  0.068373 -0.373837  0.050390  0.254034
2001Q2  0.068373 -0.373837  0.050390  0.254034
2001Q3  0.068373 -0.373837  0.050390  0.254034
2001Q4  0.068373 -0.373837  0.050390  0.254034

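Because periods refer to time spans, the rules for upsampling and downsampling are stricter:

In downsampling, the target frequency must be a superperiod of the source frequency, so that every source period falls entirely inside one target period.

In upsampling, the target frequency must be a subperiod of the source frequency.

If these rules are not satisfied, an exception is raised.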
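
The nesting requirement can be seen by mapping each source period to its containing target period. The sketch below downsamples monthly periods to calendar quarters with asfreq and groupby rather than resample; this is an alternative formulation chosen to make the mapping explicit, not the approach used in the transcript.

```python
import numpy as np
import pandas as pd

idx = pd.period_range('2000-01', '2000-12', freq='M')
s = pd.Series(np.arange(12.0), index=idx)

# Each month lies entirely inside one calendar quarter, so mapping the
# index to 'Q-DEC' gives an unambiguous grouping key.
quarter_of_month = idx.asfreq('Q-DEC')
quarterly_mean = s.groupby(quarter_of_month).mean()

print(quarterly_mean)
```

When source periods straddle two target periods, no such one-to-one mapping exists, which is the situation the rules above exclude.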


