Python Pandas使用说明(期货行情数据分析)

编程入门 行业动态 更新时间:2024-10-17 09:49:15

Python Pandas<a href=https://www.elefans.com/category/jswz/34/1769912.html style=使用说明(期货行情数据分析)"/>

Python Pandas使用说明(期货行情数据分析)

文章目录

  • Python Pandas使用说明(期货行情数据分析)
    • 安装pandas模块
    • 导入pandas模块
    • 从CSV文件读取数据
      • 沪金2102.csv文件
    • Futures1.py代码
      • 读取csv文件所有行(从filename变量指定的文件读取)
      • 读取csv文件指定行数
      • 打印df的所有数据
      • 列引用
        • 单列引用
        • 多列引用
    • Futures2.py代码
      • 行引用
      • 列引用的另一种方式
      • 位置引用(行列均指定)
        • 先列后行
        • 先行后列
        • loc函数
      • 条件引用
    • Futures3.py代码
      • 生成序列
      • 将文本转换成日期时间格式
      • 将'Date'列与'Time'列以空格间隔合并列,转换成datetime类型并生成序列
      • 删除列
      • 插入列
    • Futures4.py代码
      • 写CSV文件
        • 带索引列,指定列名称为'Index'
        • 不带索引列
      • 写XLSX文件
    • Futures5.py代码
      • 按标签排序sort_index()函数
      • 按值排序sort_values()函数
    • Futures6.py代码
      • rolling()函数

Python Pandas使用说明(期货行情数据分析)

安装pandas模块

pip3 install pandas

导入pandas模块

代码:

import pandas as pd

从CSV文件读取数据

df = pd.read_csv(filename)

沪金2102.csv文件

截取文件前21行如下,第1行:标题行;第2-21行:数据

Date,Time,Open,Max,Min,Close,TradeNumber,HolderNumber,Average
2020/12/23,02:27,391.8600,391.8600,391.7000,391.7000,59.0000,50041.0000,391.8169
2020/12/23,02:28,391.7000,391.9200,391.6800,391.8400,75.0000,50000.0000,391.8222
2020/12/23,02:29,391.8800,391.9200,391.6000,391.6000,131.0000,49936.0000,391.7632
2020/12/23,09:00,391.3400,391.5800,391.2200,391.5200,689.0000,49902.0000,391.4023
2020/12/23,09:01,391.5200,391.8000,391.5000,391.6000,296.0000,49869.0000,391.5945
2020/12/23,09:02,391.5800,391.8600,391.5400,391.8200,430.0000,49827.0000,391.7182
2020/12/23,09:03,391.8000,391.9200,391.6600,391.7800,268.0000,49859.0000,391.7683
2020/12/23,09:04,391.7800,391.8600,391.6000,391.7000,211.0000,49833.0000,391.7079
2020/12/23,09:05,391.6800,391.7200,391.5400,391.6000,244.0000,49834.0000,391.6280
2020/12/23,09:06,391.6200,391.6200,391.5000,391.5800,188.0000,49842.0000,391.5675
2020/12/23,09:07,391.5600,391.6000,391.5000,391.5000,207.0000,49848.0000,391.5408
2020/12/23,09:08,391.5200,391.5200,391.4200,391.4600,169.0000,49851.0000,391.4600
2020/12/23,09:09,391.4400,391.4400,391.3400,391.4400,267.0000,49821.0000,391.3938
2020/12/23,09:10,391.4400,391.5600,391.3800,391.4800,181.0000,49825.0000,391.4754
2020/12/23,09:11,391.4800,391.6200,391.3400,391.6000,254.0000,49831.0000,391.4771
2020/12/23,09:12,391.6200,391.7400,391.5600,391.7400,134.0000,49826.0000,391.6423
2020/12/23,09:13,391.7200,391.7400,391.5600,391.5800,210.0000,49776.0000,391.6367
2020/12/23,09:14,391.5400,391.7000,391.5400,391.7000,86.0000,49766.0000,391.5933
2020/12/23,09:15,391.6800,391.9200,391.5800,391.7800,373.0000,49744.0000,391.7805
2020/12/23,09:16,391.7600,392.0200,391.7600,391.9800,513.0000,49802.0000,391.9118

Futures1.py代码

#!/usr/local/bin/python3
# -*- coding:utf-8 -*-import os
import pandas as pddef get_cwd():return os.path.dirname(__file__)def get_path():return os.path.realpath(__file__)def read_data(filename, nrows = None):try:if nrows == None:df = pd.read_csv(filename)else:df = pd.read_csv(filename, nrows = nrows)return dfexcept:print("Cannot open file %s." %filename)def print_data(df):print(df)print(df[['Date', 'Time', 'Open', 'Max', 'Min', 'Close', 'TradeNumber', 'HolderNumber']])if __name__ == '__main__':def main():filename = get_cwd() + '/1分钟/沪金2102.csv'df = read_data(filename, nrows = 10)print_data(df)main()

运行结果:

         Date   Time    Open  ...  TradeNumber  HolderNumber   Average
0  2020/12/23  02:27  391.86  ...         59.0       50041.0  391.8169
1  2020/12/23  02:28  391.70  ...         75.0       50000.0  391.8222
2  2020/12/23  02:29  391.88  ...        131.0       49936.0  391.7632
3  2020/12/23  09:00  391.34  ...        689.0       49902.0  391.4023
4  2020/12/23  09:01  391.52  ...        296.0       49869.0  391.5945
5  2020/12/23  09:02  391.58  ...        430.0       49827.0  391.7182
6  2020/12/23  09:03  391.80  ...        268.0       49859.0  391.7683
7  2020/12/23  09:04  391.78  ...        211.0       49833.0  391.7079
8  2020/12/23  09:05  391.68  ...        244.0       49834.0  391.6280
9  2020/12/23  09:06  391.62  ...        188.0       49842.0  391.5675
[10 rows x 9 columns]Date   Time    Open     Max     Min   Close  TradeNumber  HolderNumber
0  2020/12/23  02:27  391.86  391.86  391.70  391.70         59.0       50041.0
1  2020/12/23  02:28  391.70  391.92  391.68  391.84         75.0       50000.0
2  2020/12/23  02:29  391.88  391.92  391.60  391.60        131.0       49936.0
3  2020/12/23  09:00  391.34  391.58  391.22  391.52        689.0       49902.0
4  2020/12/23  09:01  391.52  391.80  391.50  391.60        296.0       49869.0
5  2020/12/23  09:02  391.58  391.86  391.54  391.82        430.0       49827.0
6  2020/12/23  09:03  391.80  391.92  391.66  391.78        268.0       49859.0
7  2020/12/23  09:04  391.78  391.86  391.60  391.70        211.0       49833.0
8  2020/12/23  09:05  391.68  391.72  391.54  391.60        244.0       49834.0
9  2020/12/23  09:06  391.62  391.62  391.50  391.58        188.0       49842.0

读取csv文件所有行(从filename变量指定的文件读取)

df = pd.read_csv(filename)

读取csv文件指定行数

读取前10行(第0至9行)数据

df = pd.read_csv(filename, nrows = 10)

打印df的所有数据

print(df)

列引用

单列引用

dataframe类型的对象使用方括号[ ]引用列数据。引用’Date’列:

df['Date']
多列引用

引用’Date’, 'Time’两列数据,说明:多列使用了列表[ ]形式,因此出现连续两个方括号。

df[['Date', 'Time']]

如下代码打印Date, Time, Open, Max, Min, Close, TradeNumber, HolderNumber所在列的数据,没有打印Average列的数据。

print(df[['Date', 'Time', 'Open', 'Max', 'Min', 'Close', 'TradeNumber', 'HolderNumber']])

Futures2.py代码

以下仅列出print_data函数与main函数的代码(与Future1.py不同的部分)

def print_data(df):print(df[:6]) # 行引用,第0至5行,前闭后开print(df[:20:5])  # 第0至19行,间隔5行取一行,即:第0、5、10、15行print(df.Open[:6]) # 'Open'列引用的另一种方式,第0至5行print(df[0:6]['Open']) # 第0至5行(0:6,前闭后开),'Open'列print(df.loc[0:5,'Open']) # 第0至5行(0:5,前后闭合),'Open'列print()print("上涨行情数据:")print(df[df['Close'] > df['Open']][['Date', 'Time', 'Open','Close','Max','Min']]) # 上涨print()print("下跌行情数据:")print(df[df['Close'] < df['Open']][['Date', 'Time', 'Open','Close','Max','Min']]) # 下跌print()print("平盘行情数据:")print(df[df['Close'] == df['Open']][['Date', 'Time', 'Open','Close','Max','Min']]) # 平盘if __name__ == '__main__':def main():filename = get_cwd() + '/1分钟/沪金2102.csv'df = read_data(filename, nrows = 20)print_data(df)main()

运行结果:

         Date   Time    Open  ...  TradeNumber  HolderNumber   Average
0  2020/12/23  02:27  391.86  ...         59.0       50041.0  391.8169
1  2020/12/23  02:28  391.70  ...         75.0       50000.0  391.8222
2  2020/12/23  02:29  391.88  ...        131.0       49936.0  391.7632
3  2020/12/23  09:00  391.34  ...        689.0       49902.0  391.4023
4  2020/12/23  09:01  391.52  ...        296.0       49869.0  391.5945
5  2020/12/23  09:02  391.58  ...        430.0       49827.0  391.7182
[6 rows x 9 columns]Date   Time    Open  ...  TradeNumber  HolderNumber   Average
0   2020/12/23  02:27  391.86  ...         59.0       50041.0  391.8169
5   2020/12/23  09:02  391.58  ...        430.0       49827.0  391.7182
10  2020/12/23  09:07  391.56  ...        207.0       49848.0  391.5408
15  2020/12/23  09:12  391.62  ...        134.0       49826.0  391.6423
[4 rows x 9 columns]
0    391.86
1    391.70
2    391.88
3    391.34
4    391.52
5    391.58
Name: Open, dtype: float64
0    391.86
1    391.70
2    391.88
3    391.34
4    391.52
5    391.58
Name: Open, dtype: float64
0    391.86
1    391.70
2    391.88
3    391.34
4    391.52
5    391.58
Name: Open, dtype: float64
上涨行情数据:Date   Time    Open   Close     Max     Min
1   2020/12/23  02:28  391.70  391.84  391.92  391.68
3   2020/12/23  09:00  391.34  391.52  391.58  391.22
4   2020/12/23  09:01  391.52  391.60  391.80  391.50
5   2020/12/23  09:02  391.58  391.82  391.86  391.54
13  2020/12/23  09:10  391.44  391.48  391.56  391.38
14  2020/12/23  09:11  391.48  391.60  391.62  391.34
15  2020/12/23  09:12  391.62  391.74  391.74  391.56
17  2020/12/23  09:14  391.54  391.70  391.70  391.54
18  2020/12/23  09:15  391.68  391.78  391.92  391.58
19  2020/12/23  09:16  391.76  391.98  392.02  391.76
下跌行情数据:Date   Time    Open   Close     Max     Min
0   2020/12/23  02:27  391.86  391.70  391.86  391.70
2   2020/12/23  02:29  391.88  391.60  391.92  391.60
6   2020/12/23  09:03  391.80  391.78  391.92  391.66
7   2020/12/23  09:04  391.78  391.70  391.86  391.60
8   2020/12/23  09:05  391.68  391.60  391.72  391.54
9   2020/12/23  09:06  391.62  391.58  391.62  391.50
10  2020/12/23  09:07  391.56  391.50  391.60  391.50
11  2020/12/23  09:08  391.52  391.46  391.52  391.42
16  2020/12/23  09:13  391.72  391.58  391.74  391.56
平盘行情数据:Date   Time    Open   Close     Max     Min
12  2020/12/23  09:09  391.44  391.44  391.44  391.34

行引用

[start = 0 : end : step = 1]
start : 开始行号,缺省为0(pandas的行号从0开始计数)
end:结束行号(不包含),必须指定,不能省略
step:步幅(间隔),缺省为1

print(df[:10]) # 行引用,第0至9行,前闭后开
print(df[:30:3]) # 第0至29行,间隔3行取一行,即:第0、3、6、9、12、15、18、21、24、27行

列引用的另一种方式

df.Open与df[‘Open’]相同

	print(df.Open) # 'Open'列引用的另一种方式,与df['Open']相同

位置引用(行列均指定)

先列后行
	print(df.Open[:10]) # 'Open'列引用的另一种方式,第0至9行print(df['Open'][:10]) # 'Open'列引用的另一种方式,第0至9行
先行后列
	print(df[0:6]['Open']) # 第0至5行(0:6,前闭后开),'Open'列
loc函数
	print(df.loc[0:5,'Open']) # 第0至5行(0:5,前后闭合),'Open'列

条件引用

	# 'Close'列数据大于'Open'列数据的所有行print(df[df['Close'] > df['Open']]) # 'Close'列数据大于'Open'列数据的所有行中指定的列print(df[df['Close'] > df['Open']][['Date', 'Time', 'Open','Close','Max','Min']])

Futures3.py代码

以下仅列出print_data函数与main函数的代码(与Future1.py不同的部分)

def print_data(df):dt = pd.Series([pd.to_datetime(date) for date in df['Date'] + ' ' + df['Time']])df.drop(['Date', 'Time'], axis=1, inplace=True)print(df)df.insert(0, 'DateTime', dt) # 在第0列以dt为数据插入'DateTime'列print(df)if __name__ == '__main__':def main():filename = get_cwd() + '/1分钟/沪金2102.csv'df = read_data(filename, nrows = 5)print_data(df)main()

运行结果:

     Open     Max     Min   Close  TradeNumber  HolderNumber   Average
0  391.86  391.86  391.70  391.70         59.0       50041.0  391.8169
1  391.70  391.92  391.68  391.84         75.0       50000.0  391.8222
2  391.88  391.92  391.60  391.60        131.0       49936.0  391.7632
3  391.34  391.58  391.22  391.52        689.0       49902.0  391.4023
4  391.52  391.80  391.50  391.60        296.0       49869.0  391.5945DateTime    Open     Max  ...  TradeNumber  HolderNumber   Average
0 2020-12-23 02:27:00  391.86  391.86  ...         59.0       50041.0  391.8169
1 2020-12-23 02:28:00  391.70  391.92  ...         75.0       50000.0  391.8222
2 2020-12-23 02:29:00  391.88  391.92  ...        131.0       49936.0  391.7632
3 2020-12-23 09:00:00  391.34  391.58  ...        689.0       49902.0  391.4023
4 2020-12-23 09:01:00  391.52  391.80  ...        296.0       49869.0  391.5945
[5 rows x 8 columns]

生成序列

dt = pd.Series([1,2,3])
print(dt)

将文本转换成日期时间格式

pd.to_datetime(date) for date in df['Date'] + ' ' + df['Time']

将’Date’列与’Time’列以空格间隔合并列,转换成datetime类型并生成序列

	dt = pd.Series([pd.to_datetime(date) for date in df['Date'] + ' ' + df['Time']])

删除列

删除’Date’与’Time’两列数据
inplace = True # 在df对象直接操作

df.drop(['Date', 'Time'], axis=1, inplace=True)

插入列

将dt序列以’DateTime’为列标题插入到第0列

df.insert(0, 'DateTime', dt) 

Futures4.py代码

以下仅给出print_data函数的内容。

def print_data(df):dt = pd.Series([pd.to_datetime(date) for date in df['Date'] + ' ' + df['Time']])df.drop(['Date', 'Time'], axis=1, inplace=True)df.insert(0, 'DateTime', dt) # 在第0列以dt为数据插入'DateTime'列df.to_csv('withindex.csv', index_label = 'Index') # default: index = Truedf.to_csv('noindex.csv', index = False)df.to_excel('noindex.xlsx', sheet_name = "Gold2102", index = False)print(df.sort_index(axis=1, ascending = False))print(df.sort_values(by='Average', ascending = False))

运行结果(生成3个文件):

  1. withindex.csv:包含索引列的CSV文件
  2. noindex.csv:不包含索引列的CSV文件
  3. noindex.xlsx:不包含索引列的xlsx文件


图1 withindex.csv文件截图


图2 noindex.csv文件截图

图3 noindex.xlsx文件截图

写CSV文件

带索引列,指定列名称为’Index’
	df.to_csv('withindex.csv', index_label = 'Index') # default: index = True
不带索引列
	df.to_csv('noindex.csv', index = False)

写XLSX文件

	df.to_excel('noindex.xlsx', sheet_name = "Gold2102", index = False)

Futures5.py代码

def print_data(df):dt = pd.Series([pd.to_datetime(date) for date in df['Date'] + ' ' + df['Time']])df.drop(['Date', 'Time'], axis=1, inplace=True)df.insert(0, 'DateTime', dt) # 在第0列以dt为数据插入'DateTime'列print(df.sort_index(axis=0, ascending=False))print(df.sort_index(axis=1, ascending = False))print(df.sort_values(by='Average', ascending = False))if __name__ == '__main__':def main():filename = get_cwd() + '/1分钟/沪金2102.csv'df = read_data(filename, nrows = 5)print_data(df)main()

运行结果:

             DateTime    Open     Max  ...  TradeNumber  HolderNumber   Average
4 2020-12-23 09:01:00  391.52  391.80  ...        296.0       49869.0  391.5945
3 2020-12-23 09:00:00  391.34  391.58  ...        689.0       49902.0  391.4023
2 2020-12-23 02:29:00  391.88  391.92  ...        131.0       49936.0  391.7632
1 2020-12-23 02:28:00  391.70  391.92  ...         75.0       50000.0  391.8222
0 2020-12-23 02:27:00  391.86  391.86  ...         59.0       50041.0  391.8169[5 rows x 8 columns]TradeNumber    Open     Min  ...            DateTime   Close   Average
0         59.0  391.86  391.70  ... 2020-12-23 02:27:00  391.70  391.8169
1         75.0  391.70  391.68  ... 2020-12-23 02:28:00  391.84  391.8222
2        131.0  391.88  391.60  ... 2020-12-23 02:29:00  391.60  391.7632
3        689.0  391.34  391.22  ... 2020-12-23 09:00:00  391.52  391.4023
4        296.0  391.52  391.50  ... 2020-12-23 09:01:00  391.60  391.5945[5 rows x 8 columns]DateTime    Open     Max  ...  TradeNumber  HolderNumber   Average
1 2020-12-23 02:28:00  391.70  391.92  ...         75.0       50000.0  391.8222
0 2020-12-23 02:27:00  391.86  391.86  ...         59.0       50041.0  391.8169
2 2020-12-23 02:29:00  391.88  391.92  ...        131.0       49936.0  391.7632
4 2020-12-23 09:01:00  391.52  391.80  ...        296.0       49869.0  391.5945
3 2020-12-23 09:00:00  391.34  391.58  ...        689.0       49902.0  391.4023[5 rows x 8 columns]

按标签排序sort_index()函数

函数定义:

    def sort_index(self,axis=0,level=None,ascending: bool_t = True,inplace: bool_t = False,kind: str = "quicksort",na_position: str = "last",sort_remaining: bool_t = True,ignore_index: bool_t = False,key: IndexKeyFunc = None,):

说明:

  1. axis:0:按照纵轴方向INDEX的值进行排序;1:按照横轴方向即各列的列标签名称进行排序
  2. ascending:True(缺省):升序;False:降序
  3. inplace:False(缺省):复制当前对象,然后进行操作;True:在当前对象上进行操作
  4. kind:缺省为"quicksort",即:快速排序

按值排序sort_values()函数

DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')

说明:

  1. by:字符串(行名称或者列名称)以及列表(行名称或者列名称的列表)
  2. axis:0:按列排序(by指定列名称或其列表);1:按行排序(by指定行名称或其列表)
	df.sort_values(by='Average', ascending = False)

by = ‘Average’ # 按照’Average’列的数值排序
ascending = False # False为降序排列,True为升序排列

Futures6.py代码

def print_data(df):print(df['Open'][:len(df)])print(df['Open'].rolling(window=5).mean()[4:])def save_data(df, filename, sheet_name = None):df.to_excel(filename, index = False, sheet_name = 'Sheet1' if sheet_name is None else sheet_name)if __name__ == '__main__':def main():n = 10filename = get_cwd() + '/1分钟/沪金2102.csv'df = read_data(filename, nrows = n)process_data(df)print_data(df)save_data(df[:n], filename = 'df.xlsx', sheet_name = 'df')save_data(df.rolling(window=5).mean(), filename = 'ma5.xlsx', sheet_name = 'ma5')main()

运行结果:

0    391.86
1    391.70
2    391.88
3    391.34
4    391.52
5    391.58
6    391.80
7    391.78
8    391.68
9    391.62
Name: Open, dtype: float64
4    391.660
5    391.604
6    391.624
7    391.604
8    391.672
9    391.692
Name: Open, dtype: float64

说明:第一部分输出为’Open’列的前10行(第0至9行,不包含列标签行)。第二部分为’Open’列的5个移动平均线。注:前4行(第0至3行)由于采样数不够,故其移动平均值为空值。

rolling()函数

函数原型:

    def rolling(self,window: Union[int, timedelta, BaseOffset, BaseIndexer],min_periods: Optional[int] = None,center: bool_t = False,win_type: Optional[str] = None,on: Optional[str] = None,axis: Axis = 0,closed: Optional[str] = None,):

说明:
window = 5:以5个元素进行组合
.mean():取平均值

df['Open'].rolling(window=5).mean()[4:]

更多推荐

Python Pandas使用说明(期货行情数据分析)

本文发布于:2024-02-25 08:54:10,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1698425.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:使用说明   期货   行情   数据   Python

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!