pandas教程:Interacting with Web APIs API和数据库的交互

编程入门 行业动态 更新时间:2024-10-23 05:49:24

pandas教程:Interacting with Web APIs API和<a href=https://www.elefans.com/category/jswz/34/1771350.html style=数据库的交互"/>

pandas教程:Interacting with Web APIs API和数据库的交互

文章目录

  • 6.3 Interacting with Web APIs (网络相关的API交互)
  • 6.4 Interacting with Databases(与数据库的交互)

6.3 Interacting with Web APIs (网络相关的API交互)

很多网站都有公开的API,通过JSON等格式提供数据流。有很多方法可以访问这些API,这里推荐一个易用的requests包。

找到githubpandas最新的30个issues,制作一个GET HTTP request, 通过使用requests包:

import pandas as pd
import numpy as np
import requests
url = ''
resp = requests.get(url)
resp
<Response [200]>

responsejson方法能返回一个dict,包含可以解析为python objectJSON

data = resp.json()
data[0]['title']
'Optimize data type'
data[0]
{'assignee': None,'assignees': [],'author_association': 'NONE','body': 'Hi guys, i\'m user of mysql\r\nwe have an "function" PROCEDURE ANALYSE\r\n.5/en/procedure-analyse.html\r\n\r\nit get all "dataframe" and show what\'s the best "dtype", could we do something like it in Pandas?\r\n\r\nthanks!','closed_at': None,'comments': 1,'comments_url': '','created_at': '2017-11-13T22:51:32Z','events_url': '','html_url': '','id': 273606786,'labels': [],'labels_url': '{/name}','locked': False,'milestone': None,'number': 18272,'repository_url': '','state': 'open','title': 'Optimize data type','updated_at': '2017-11-13T22:57:27Z','url': '','user': {'avatar_url': '=4','events_url': '{/privacy}','followers_url': '','following_url': '{/other_user}','gists_url': '{/gist_id}','gravatar_id': '','html_url': '','id': 2468782,'login': 'rspadim','organizations_url': '','received_events_url': '','repos_url': '','site_admin': False,'starred_url': '{/owner}{/repo}','subscriptions_url': '','type': 'User','url': ''}}

data中的每一个元素都是一个dict,这个dict就是在github上找到的issue页面上的信息。我们可以把data传给DataFrame并提取感兴趣的部分:

issues = pd.DataFrame(data, columns=['number', 'title', 'labels', 'state'])
issues
numbertitlelabelsstate
018272Optimize data type[]open
118271BUG: Series.rank(pct=True).max() != 1 for a la...[]open
218270(Series|DataFrame) datetimelike ops[]open
318268DOC: update Seriesbine/DataFramebine d...[]open
418266DOC: updated bine_first doc strings[{'url': '...open
518265Calling DataFrame.stack on an out-of-order col...[]open
618264cleaned up imports[{'url': '...open
718263Tslibs offsets paramd[]open
818262DEPR: let's deprecate[{'url': '...open
918258DEPR: deprecate (Sparse)Series.from_array[{'url': '...open
1018255ENH/PERF: Add cache='infer' to to_datetime[{'url': '...open
1118250Categorical.replace() unexpectedly returns non...[{'url': '...open
1218246pandas.MultiIndex.reorder_levels has no inplac...[]open
1318245TST: test tz-aware DatetimeIndex as separate m...[{'url': '...open
1418244RLS 0.21.1[{'url': '...open
1518243DEPR: deprecate .ftypes, get_ftype_counts[{'url': '...open
1618242CLN: Remove days, seconds and microseconds pro...[{'url': '...open
1718241DEPS: drop 2.7 support[{'url': '...open
1818238BUG: Fix filter method so that accepts byte an...[{'url': '...open
1918237Deprecate Series.asobject, Index.asobject, ren...[{'url': '...open
2018236df.plot() very slow compared to explicit matpl...[{'url': '...open
2118235Quarter.onOffset looks fishy[]open
2218231Reduce copying of input data on Series constru...[{'url': '...open
2318226Patch __init__ to prevent passing invalid kwds[{'url': '...open
2418222DataFrame.plot() produces incorrect legend lab...[{'url': '...open
2518220DataFrame.groupy renames columns when given a ...[]open
2618217Deprecate Index.summary[{'url': '...open
2718216Pass kwargs from read_parquet() to the underly...[{'url': '...open
2818215DOC/DEPR: ensure that @deprecated functions ha...[{'url': '...open
2918213Deprecate Series.from_array ?[{'url': '...open

6.4 Interacting with Databases(与数据库的交互)

如果在工作中,大部分数据并不会以textexcel的格式存储。最广泛使用的是SQL-based的关系型数据库(SQL Server,PostgreSQL,MySQL)。选择数据库通常取决于性能,数据整合性,实际应用的可扩展性。

读取SQLDataFrame非常直观,pandas中有一些函数能简化这个过程。举个例子,这里创建一个SQLite数据库,通过使用python内建的sqlite3 driver

import sqlite3
import pandas as pd
query = """
CREATE TABLE test
(a VARCHAR(20), b VARCHAR(20),c REAL,        d INTEGER
);"""
con = sqlite3.connect('../examples/mydata.sqlite')
con.execute(query)
<sqlite3.Cursor at 0x1049931f0>
conmit()

然后我们插入几行数据:

data = [('Atlanta', 'Georgia', 1.25, 6),('Tallahassee', 'Florida', 2.6, 3),('Sacramento', 'California', 1.7, 5)]
stmt = "INSERT INTO test VALUES(?, ?, ?, ?)"
con.executemany(stmt, data)
<sqlite3.Cursor at 0x1049932d0>
conmit()

大部分pythonSQL驱动(PyODBC, psycopg2, MySQLdb, pymssql, 等)返回a list of tuple,当从一个表格选择数据的时候:

cursor = con.execute('select * from test')
rows = cursor.fetchall()
rows
[('Atlanta', 'Georgia', 1.25, 6),('Tallahassee', 'Florida', 2.6, 3),('Sacramento', 'California', 1.7, 5)]

我们可以把list of tuples传递给DataFrame,但是我们也需要column names,包含cursordescription属性:

cursor.description
(('a', None, None, None, None, None, None),('b', None, None, None, None, None, None),('c', None, None, None, None, None, None),('d', None, None, None, None, None, None))
pd.DataFrame(rows, columns=[x[0] for x in cursor.description])
abcd
0AtlantaGeorgia1.256
1TallahasseeFlorida2.603
2SacramentoCalifornia1.705

我们不希望每次询问数据库的时候都重复以上步骤,这样对计算机很不好(逐步对计算机系统或文件做小改动导致大的损害)。SQLAlchemy计划是一个六星的Python SQL工具箱,它能抽象出不同SQL数据库之间的不同。pandas有一个read_sql函数,能让我们从SQLAlchemy connection从读取数据。这里我们用SQLAlchemy连接到同一个SQLite数据库,并从之前创建的表格读取数据:

import sqlalchemy as sqla
db = sqla.create_engine('sqlite:///../examples/mydata.sqlite')
pd.read_sql('select * from test', db)
abcd
0AtlantaGeorgia1.256
1TallahasseeFlorida2.603
2SacramentoCalifornia1.705

更多推荐

pandas教程:Interacting with Web APIs API和数据库的交互

本文发布于:2023-11-16 01:40:18,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1611141.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:数据库   教程   Interacting   pandas   API

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!