使用说明(期货行情数据分析)"/>
Python Pandas使用说明(期货行情数据分析)
文章目录
- Python Pandas使用说明(期货行情数据分析)
- 安装pandas模块
- 导入pandas模块
- 从CSV文件读取数据
- 沪金2102.csv文件
- Futures1.py代码
- 读取csv文件所有行(从filename变量指定的文件读取)
- 读取csv文件指定行数
- 打印df的所有数据
- 列引用
- 单列引用
- 多列引用
- Futures2.py代码
- 行引用
- 列引用的另一种方式
- 位置引用(行列均指定)
- 先列后行
- 先行后列
- loc函数
- 条件引用
- Futures3.py代码
- 生成序列
- 将文本转换成日期时间格式
- 将'Date'列与'Time'列以空格间隔合并列,转换成datetime类型并生成序列
- 删除列
- 插入列
- Futures4.py代码
- 写CSV文件
- 带索引列,指定列名称为'Index'
- 不带索引列
- 写XLSX文件
- Futures5.py代码
- 按标签排序sort_index()函数
- 按值排序sort_values()函数
- Futures6.py代码
- rolling()函数
Python Pandas使用说明(期货行情数据分析)
安装pandas模块
pip3 install pandas
导入pandas模块
代码:
import pandas as pd
从CSV文件读取数据
df = pd.read_csv(filename)
沪金2102.csv文件
截取文件前21行如下,第1行:标题行;第2-21行:数据
Date,Time,Open,Max,Min,Close,TradeNumber,HolderNumber,Average
2020/12/23,02:27,391.8600,391.8600,391.7000,391.7000,59.0000,50041.0000,391.8169
2020/12/23,02:28,391.7000,391.9200,391.6800,391.8400,75.0000,50000.0000,391.8222
2020/12/23,02:29,391.8800,391.9200,391.6000,391.6000,131.0000,49936.0000,391.7632
2020/12/23,09:00,391.3400,391.5800,391.2200,391.5200,689.0000,49902.0000,391.4023
2020/12/23,09:01,391.5200,391.8000,391.5000,391.6000,296.0000,49869.0000,391.5945
2020/12/23,09:02,391.5800,391.8600,391.5400,391.8200,430.0000,49827.0000,391.7182
2020/12/23,09:03,391.8000,391.9200,391.6600,391.7800,268.0000,49859.0000,391.7683
2020/12/23,09:04,391.7800,391.8600,391.6000,391.7000,211.0000,49833.0000,391.7079
2020/12/23,09:05,391.6800,391.7200,391.5400,391.6000,244.0000,49834.0000,391.6280
2020/12/23,09:06,391.6200,391.6200,391.5000,391.5800,188.0000,49842.0000,391.5675
2020/12/23,09:07,391.5600,391.6000,391.5000,391.5000,207.0000,49848.0000,391.5408
2020/12/23,09:08,391.5200,391.5200,391.4200,391.4600,169.0000,49851.0000,391.4600
2020/12/23,09:09,391.4400,391.4400,391.3400,391.4400,267.0000,49821.0000,391.3938
2020/12/23,09:10,391.4400,391.5600,391.3800,391.4800,181.0000,49825.0000,391.4754
2020/12/23,09:11,391.4800,391.6200,391.3400,391.6000,254.0000,49831.0000,391.4771
2020/12/23,09:12,391.6200,391.7400,391.5600,391.7400,134.0000,49826.0000,391.6423
2020/12/23,09:13,391.7200,391.7400,391.5600,391.5800,210.0000,49776.0000,391.6367
2020/12/23,09:14,391.5400,391.7000,391.5400,391.7000,86.0000,49766.0000,391.5933
2020/12/23,09:15,391.6800,391.9200,391.5800,391.7800,373.0000,49744.0000,391.7805
2020/12/23,09:16,391.7600,392.0200,391.7600,391.9800,513.0000,49802.0000,391.9118
Futures1.py代码
#!/usr/local/bin/python3
# -*- coding:utf-8 -*-import os
import pandas as pddef get_cwd():return os.path.dirname(__file__)def get_path():return os.path.realpath(__file__)def read_data(filename, nrows = None):try:if nrows == None:df = pd.read_csv(filename)else:df = pd.read_csv(filename, nrows = nrows)return dfexcept:print("Cannot open file %s." %filename)def print_data(df):print(df)print(df[['Date', 'Time', 'Open', 'Max', 'Min', 'Close', 'TradeNumber', 'HolderNumber']])if __name__ == '__main__':def main():filename = get_cwd() + '/1分钟/沪金2102.csv'df = read_data(filename, nrows = 10)print_data(df)main()
运行结果:
Date Time Open ... TradeNumber HolderNumber Average
0 2020/12/23 02:27 391.86 ... 59.0 50041.0 391.8169
1 2020/12/23 02:28 391.70 ... 75.0 50000.0 391.8222
2 2020/12/23 02:29 391.88 ... 131.0 49936.0 391.7632
3 2020/12/23 09:00 391.34 ... 689.0 49902.0 391.4023
4 2020/12/23 09:01 391.52 ... 296.0 49869.0 391.5945
5 2020/12/23 09:02 391.58 ... 430.0 49827.0 391.7182
6 2020/12/23 09:03 391.80 ... 268.0 49859.0 391.7683
7 2020/12/23 09:04 391.78 ... 211.0 49833.0 391.7079
8 2020/12/23 09:05 391.68 ... 244.0 49834.0 391.6280
9 2020/12/23 09:06 391.62 ... 188.0 49842.0 391.5675
[10 rows x 9 columns]Date Time Open Max Min Close TradeNumber HolderNumber
0 2020/12/23 02:27 391.86 391.86 391.70 391.70 59.0 50041.0
1 2020/12/23 02:28 391.70 391.92 391.68 391.84 75.0 50000.0
2 2020/12/23 02:29 391.88 391.92 391.60 391.60 131.0 49936.0
3 2020/12/23 09:00 391.34 391.58 391.22 391.52 689.0 49902.0
4 2020/12/23 09:01 391.52 391.80 391.50 391.60 296.0 49869.0
5 2020/12/23 09:02 391.58 391.86 391.54 391.82 430.0 49827.0
6 2020/12/23 09:03 391.80 391.92 391.66 391.78 268.0 49859.0
7 2020/12/23 09:04 391.78 391.86 391.60 391.70 211.0 49833.0
8 2020/12/23 09:05 391.68 391.72 391.54 391.60 244.0 49834.0
9 2020/12/23 09:06 391.62 391.62 391.50 391.58 188.0 49842.0
读取csv文件所有行(从filename变量指定的文件读取)
df = pd.read_csv(filename)
读取csv文件指定行数
读取前10行(第0至9行)数据
df = pd.read_csv(filename, nrows = 10)
打印df的所有数据
print(df)
列引用
单列引用
dataframe类型的对象使用方括号[ ]引用列数据。引用’Date’列:
df['Date']
多列引用
引用’Date’, 'Time’两列数据,说明:多列使用了列表[ ]形式,因此出现连续两个方括号。
df[['Date', 'Time']]
如下代码打印Date, Time, Open, Max, Min, Close, TradeNumber, HolderNumber所在列的数据,没有打印Average列的数据。
print(df[['Date', 'Time', 'Open', 'Max', 'Min', 'Close', 'TradeNumber', 'HolderNumber']])
Futures2.py代码
以下仅列出print_data函数与main函数的代码(与Future1.py不同的部分)
def print_data(df):print(df[:6]) # 行引用,第0至5行,前闭后开print(df[:20:5]) # 第0至19行,间隔5行取一行,即:第0、5、10、15行print(df.Open[:6]) # 'Open'列引用的另一种方式,第0至5行print(df[0:6]['Open']) # 第0至5行(0:6,前闭后开),'Open'列print(df.loc[0:5,'Open']) # 第0至5行(0:5,前后闭合),'Open'列print()print("上涨行情数据:")print(df[df['Close'] > df['Open']][['Date', 'Time', 'Open','Close','Max','Min']]) # 上涨print()print("下跌行情数据:")print(df[df['Close'] < df['Open']][['Date', 'Time', 'Open','Close','Max','Min']]) # 下跌print()print("平盘行情数据:")print(df[df['Close'] == df['Open']][['Date', 'Time', 'Open','Close','Max','Min']]) # 平盘if __name__ == '__main__':def main():filename = get_cwd() + '/1分钟/沪金2102.csv'df = read_data(filename, nrows = 20)print_data(df)main()
运行结果:
Date Time Open ... TradeNumber HolderNumber Average
0 2020/12/23 02:27 391.86 ... 59.0 50041.0 391.8169
1 2020/12/23 02:28 391.70 ... 75.0 50000.0 391.8222
2 2020/12/23 02:29 391.88 ... 131.0 49936.0 391.7632
3 2020/12/23 09:00 391.34 ... 689.0 49902.0 391.4023
4 2020/12/23 09:01 391.52 ... 296.0 49869.0 391.5945
5 2020/12/23 09:02 391.58 ... 430.0 49827.0 391.7182
[6 rows x 9 columns]Date Time Open ... TradeNumber HolderNumber Average
0 2020/12/23 02:27 391.86 ... 59.0 50041.0 391.8169
5 2020/12/23 09:02 391.58 ... 430.0 49827.0 391.7182
10 2020/12/23 09:07 391.56 ... 207.0 49848.0 391.5408
15 2020/12/23 09:12 391.62 ... 134.0 49826.0 391.6423
[4 rows x 9 columns]
0 391.86
1 391.70
2 391.88
3 391.34
4 391.52
5 391.58
Name: Open, dtype: float64
0 391.86
1 391.70
2 391.88
3 391.34
4 391.52
5 391.58
Name: Open, dtype: float64
0 391.86
1 391.70
2 391.88
3 391.34
4 391.52
5 391.58
Name: Open, dtype: float64
上涨行情数据:Date Time Open Close Max Min
1 2020/12/23 02:28 391.70 391.84 391.92 391.68
3 2020/12/23 09:00 391.34 391.52 391.58 391.22
4 2020/12/23 09:01 391.52 391.60 391.80 391.50
5 2020/12/23 09:02 391.58 391.82 391.86 391.54
13 2020/12/23 09:10 391.44 391.48 391.56 391.38
14 2020/12/23 09:11 391.48 391.60 391.62 391.34
15 2020/12/23 09:12 391.62 391.74 391.74 391.56
17 2020/12/23 09:14 391.54 391.70 391.70 391.54
18 2020/12/23 09:15 391.68 391.78 391.92 391.58
19 2020/12/23 09:16 391.76 391.98 392.02 391.76
下跌行情数据:Date Time Open Close Max Min
0 2020/12/23 02:27 391.86 391.70 391.86 391.70
2 2020/12/23 02:29 391.88 391.60 391.92 391.60
6 2020/12/23 09:03 391.80 391.78 391.92 391.66
7 2020/12/23 09:04 391.78 391.70 391.86 391.60
8 2020/12/23 09:05 391.68 391.60 391.72 391.54
9 2020/12/23 09:06 391.62 391.58 391.62 391.50
10 2020/12/23 09:07 391.56 391.50 391.60 391.50
11 2020/12/23 09:08 391.52 391.46 391.52 391.42
16 2020/12/23 09:13 391.72 391.58 391.74 391.56
平盘行情数据:Date Time Open Close Max Min
12 2020/12/23 09:09 391.44 391.44 391.44 391.34
行引用
[start = 0 : end : step = 1]
start : 开始行号,缺省为0(pandas的行号从0开始计数)
end:结束行号(不包含),必须指定,不能省略
step:步幅(间隔),缺省为1
print(df[:10]) # 行引用,第0至9行,前闭后开
print(df[:30:3]) # 第0至29行,间隔3行取一行,即:第0、3、6、9、12、15、18、21、24、27行
列引用的另一种方式
df.Open与df[‘Open’]相同
print(df.Open) # 'Open'列引用的另一种方式,与df['Open']相同
位置引用(行列均指定)
先列后行
print(df.Open[:10]) # 'Open'列引用的另一种方式,第0至9行print(df['Open'][:10]) # 'Open'列引用的另一种方式,第0至9行
先行后列
print(df[0:6]['Open']) # 第0至5行(0:6,前闭后开),'Open'列
loc函数
print(df.loc[0:5,'Open']) # 第0至5行(0:5,前后闭合),'Open'列
条件引用
# 'Close'列数据大于'Open'列数据的所有行print(df[df['Close'] > df['Open']]) # 'Close'列数据大于'Open'列数据的所有行中指定的列print(df[df['Close'] > df['Open']][['Date', 'Time', 'Open','Close','Max','Min']])
Futures3.py代码
以下仅列出print_data函数与main函数的代码(与Future1.py不同的部分)
def print_data(df):dt = pd.Series([pd.to_datetime(date) for date in df['Date'] + ' ' + df['Time']])df.drop(['Date', 'Time'], axis=1, inplace=True)print(df)df.insert(0, 'DateTime', dt) # 在第0列以dt为数据插入'DateTime'列print(df)if __name__ == '__main__':def main():filename = get_cwd() + '/1分钟/沪金2102.csv'df = read_data(filename, nrows = 5)print_data(df)main()
运行结果:
Open Max Min Close TradeNumber HolderNumber Average
0 391.86 391.86 391.70 391.70 59.0 50041.0 391.8169
1 391.70 391.92 391.68 391.84 75.0 50000.0 391.8222
2 391.88 391.92 391.60 391.60 131.0 49936.0 391.7632
3 391.34 391.58 391.22 391.52 689.0 49902.0 391.4023
4 391.52 391.80 391.50 391.60 296.0 49869.0 391.5945DateTime Open Max ... TradeNumber HolderNumber Average
0 2020-12-23 02:27:00 391.86 391.86 ... 59.0 50041.0 391.8169
1 2020-12-23 02:28:00 391.70 391.92 ... 75.0 50000.0 391.8222
2 2020-12-23 02:29:00 391.88 391.92 ... 131.0 49936.0 391.7632
3 2020-12-23 09:00:00 391.34 391.58 ... 689.0 49902.0 391.4023
4 2020-12-23 09:01:00 391.52 391.80 ... 296.0 49869.0 391.5945
[5 rows x 8 columns]
生成序列
dt = pd.Series([1,2,3])
print(dt)
将文本转换成日期时间格式
pd.to_datetime(date) for date in df['Date'] + ' ' + df['Time']
将’Date’列与’Time’列以空格间隔合并列,转换成datetime类型并生成序列
dt = pd.Series([pd.to_datetime(date) for date in df['Date'] + ' ' + df['Time']])
删除列
删除’Date’与’Time’两列数据
inplace = True # 在df对象直接操作
df.drop(['Date', 'Time'], axis=1, inplace=True)
插入列
将dt序列以’DateTime’为列标题插入到第0列
df.insert(0, 'DateTime', dt)
Futures4.py代码
以下仅给出print_data函数的内容。
def print_data(df):dt = pd.Series([pd.to_datetime(date) for date in df['Date'] + ' ' + df['Time']])df.drop(['Date', 'Time'], axis=1, inplace=True)df.insert(0, 'DateTime', dt) # 在第0列以dt为数据插入'DateTime'列df.to_csv('withindex.csv', index_label = 'Index') # default: index = Truedf.to_csv('noindex.csv', index = False)df.to_excel('noindex.xlsx', sheet_name = "Gold2102", index = False)print(df.sort_index(axis=1, ascending = False))print(df.sort_values(by='Average', ascending = False))
运行结果(生成3个文件):
- withindex.csv:包含索引列的CSV文件
- noindex.csv:不包含索引列的CSV文件
- noindex.xlsx:不包含索引列的xlsx文件
图1 withindex.csv文件截图
图2 noindex.csv文件截图
图3 noindex.xlsx文件截图
写CSV文件
带索引列,指定列名称为’Index’
df.to_csv('withindex.csv', index_label = 'Index') # default: index = True
不带索引列
df.to_csv('noindex.csv', index = False)
写XLSX文件
df.to_excel('noindex.xlsx', sheet_name = "Gold2102", index = False)
Futures5.py代码
def print_data(df):dt = pd.Series([pd.to_datetime(date) for date in df['Date'] + ' ' + df['Time']])df.drop(['Date', 'Time'], axis=1, inplace=True)df.insert(0, 'DateTime', dt) # 在第0列以dt为数据插入'DateTime'列print(df.sort_index(axis=0, ascending=False))print(df.sort_index(axis=1, ascending = False))print(df.sort_values(by='Average', ascending = False))if __name__ == '__main__':def main():filename = get_cwd() + '/1分钟/沪金2102.csv'df = read_data(filename, nrows = 5)print_data(df)main()
运行结果:
DateTime Open Max ... TradeNumber HolderNumber Average
4 2020-12-23 09:01:00 391.52 391.80 ... 296.0 49869.0 391.5945
3 2020-12-23 09:00:00 391.34 391.58 ... 689.0 49902.0 391.4023
2 2020-12-23 02:29:00 391.88 391.92 ... 131.0 49936.0 391.7632
1 2020-12-23 02:28:00 391.70 391.92 ... 75.0 50000.0 391.8222
0 2020-12-23 02:27:00 391.86 391.86 ... 59.0 50041.0 391.8169[5 rows x 8 columns]TradeNumber Open Min ... DateTime Close Average
0 59.0 391.86 391.70 ... 2020-12-23 02:27:00 391.70 391.8169
1 75.0 391.70 391.68 ... 2020-12-23 02:28:00 391.84 391.8222
2 131.0 391.88 391.60 ... 2020-12-23 02:29:00 391.60 391.7632
3 689.0 391.34 391.22 ... 2020-12-23 09:00:00 391.52 391.4023
4 296.0 391.52 391.50 ... 2020-12-23 09:01:00 391.60 391.5945[5 rows x 8 columns]DateTime Open Max ... TradeNumber HolderNumber Average
1 2020-12-23 02:28:00 391.70 391.92 ... 75.0 50000.0 391.8222
0 2020-12-23 02:27:00 391.86 391.86 ... 59.0 50041.0 391.8169
2 2020-12-23 02:29:00 391.88 391.92 ... 131.0 49936.0 391.7632
4 2020-12-23 09:01:00 391.52 391.80 ... 296.0 49869.0 391.5945
3 2020-12-23 09:00:00 391.34 391.58 ... 689.0 49902.0 391.4023[5 rows x 8 columns]
按标签排序sort_index()函数
函数定义:
def sort_index(self,axis=0,level=None,ascending: bool_t = True,inplace: bool_t = False,kind: str = "quicksort",na_position: str = "last",sort_remaining: bool_t = True,ignore_index: bool_t = False,key: IndexKeyFunc = None,):
说明:
- axis:0:按照纵轴方向INDEX的值进行排序;1:按照横轴方向即各列的列标签名称进行排序
- ascending:True(缺省):升序;False:降序
- inplace:False(缺省):复制当前对象,然后进行操作;True:在当前对象上进行操作
- kind:缺省为"quicksort",即:快速排序
按值排序sort_values()函数
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
说明:
- by:字符串(行名称或者列名称)以及列表(行名称或者列名称的列表)
- axis:0:按列排序(by指定列名称或其列表);1:按行排序(by指定行名称或其列表)
df.sort_values(by='Average', ascending = False)
by = ‘Average’ # 按照’Average’列的数值排序
ascending = False # False为降序排列,True为升序排列
Futures6.py代码
def print_data(df):print(df['Open'][:len(df)])print(df['Open'].rolling(window=5).mean()[4:])def save_data(df, filename, sheet_name = None):df.to_excel(filename, index = False, sheet_name = 'Sheet1' if sheet_name is None else sheet_name)if __name__ == '__main__':def main():n = 10filename = get_cwd() + '/1分钟/沪金2102.csv'df = read_data(filename, nrows = n)process_data(df)print_data(df)save_data(df[:n], filename = 'df.xlsx', sheet_name = 'df')save_data(df.rolling(window=5).mean(), filename = 'ma5.xlsx', sheet_name = 'ma5')main()
运行结果:
0 391.86
1 391.70
2 391.88
3 391.34
4 391.52
5 391.58
6 391.80
7 391.78
8 391.68
9 391.62
Name: Open, dtype: float64
4 391.660
5 391.604
6 391.624
7 391.604
8 391.672
9 391.692
Name: Open, dtype: float64
说明:第一部分输出为’Open’列的前10行(第0至9行,不包含列标签行)。第二部分为’Open’列的5个移动平均线。注:前4行(第0至3行)由于采样数不够,故其移动平均值为空值。
rolling()函数
函数原型:
def rolling(self,window: Union[int, timedelta, BaseOffset, BaseIndexer],min_periods: Optional[int] = None,center: bool_t = False,win_type: Optional[str] = None,on: Optional[str] = None,axis: Axis = 0,closed: Optional[str] = None,):
说明:
window = 5:以5个元素进行组合
.mean():取平均值
df['Open'].rolling(window=5).mean()[4:]
更多推荐
Python Pandas使用说明(期货行情数据分析)
发布评论