Scraping the admission score lines of 2,822 universities nationwide, by province, for the last three years with Python

Updated: 2024-10-25 22:31:55


Data update: the scraped data for 2022, 2021 and 2020 is available here
Link:
Extraction code: ozu5

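The national college entrance exam (gaokao) has just finished and candidates are waiting for their scores. I had long wanted to scrape information about universities to help candidates choose, so I wrote the code below. It scrapes the admission score lines in each province for 2,822 universities nationwide, covering both undergraduate institutions and higher vocational colleges.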

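The figure below shows the admission scores in Hubei province for the last three years, with the universities sorted by their ShanghaiRanking (软科) rank: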
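The ranking-based ordering described above could be reproduced from one of the per-province CSV files roughly as follows. This is a minimal sketch, not the author's method: it assumes the file uses the `软科排名` column header that the script writes and that unranked schools have an empty cell in that column. The function name `sort_by_ruanke_rank` and the sample rows are hypothetical.

```python
import csv
import io


def sort_by_ruanke_rank(rows):
    """Return rows sorted by the '软科排名' column; unranked schools go last."""
    def key(row):
        rank = (row.get('软科排名') or '').strip()
        # Ranked schools sort as (0, rank); unranked ones as (1, 0), i.e. after them.
        return (0, int(rank)) if rank.isdigit() else (1, 0)
    return sorted(rows, key=key)


# Hypothetical sample standing in for a file such as 湖北.csv.
sample = io.StringIO(
    "名称,软科排名\n"
    "School A,35\n"
    "School B,\n"
    "School C,4\n"
)
ranked = sort_by_ruanke_rank(list(csv.DictReader(sample)))
print([row['名称'] for row in ranked])  # ['School C', 'School A', 'School B']
```

For a real file, the `io.StringIO` object would be replaced by `open(path, encoding='utf-8-sig', newline='')`, matching the encoding the script uses when it writes the header row.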

Full data download
Link:
Extraction code: z1db

A blank cell in a score column means that the school does not enroll students in that province.
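Following that convention, schools with no score in any of the three years can be filtered out before comparing options. The sketch below assumes the three score columns are named `2022分数线`, `2021分数线` and `2020分数线`, as in the script's header row, and treats whitespace-only cells as blank. The helper name `enrolls_in_province` and the sample rows are hypothetical.

```python
SCORE_COLUMNS = ("2022分数线", "2021分数线", "2020分数线")


def enrolls_in_province(row):
    """True if at least one of the three score columns is non-blank."""
    return any((row.get(col) or "").strip() for col in SCORE_COLUMNS)


# Hypothetical rows, shaped like the output of csv.DictReader.
rows = [
    {"名称": "School A", "2022分数线": " 物理类:520 ", "2021分数线": "", "2020分数线": ""},
    {"名称": "School B", "2022分数线": "", "2021分数线": " ", "2020分数线": ""},
]
enrolled = [r["名称"] for r in rows if enrolls_in_province(r)]
print(enrolled)  # ['School A']
```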

Part of the code follows. It has not been optimized... (code updated)

import csv
import json
import random

import requests

OUTPUT_DIR = 'D:/PYTHON_CODE/高校分数线/'  # output folder, one CSV file per province


def save_data(s, data):
    # Append one row to the CSV file of province s.
    with open(OUTPUT_DIR + s + '.csv', encoding='UTF-8', mode='a+', newline='') as f:
        f_csv = csv.writer(f)
        f_csv.writerow(data)


# Pool of user-agent headers; one entry is picked at random for the whole run.
headers_list = [
    {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SM-G955U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 10; SM-G981B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (iPad; CPU OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/87.0.4280.77 Mobile/15E148 Safari/604.1'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.109 Safari/537.36 CrKey/1.54.248666'},
    {'user-agent': 'Mozilla/5.0 (X11; Linux aarch64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.188 Safari/537.36 CrKey/1.54.250320'},
    {'user-agent': 'Mozilla/5.0 (BB10; Touch) AppleWebKit/537.10+ (KHTML, like Gecko) Version/10.0.9.2372 Mobile Safari/537.10+'},
    {'user-agent': 'Mozilla/5.0 (PlayBook; U; RIM Tablet OS 2.1.0; en-US) AppleWebKit/536.2+ (KHTML like Gecko) Version/7.2.1.0 Safari/536.2+'},
    {'user-agent': 'Mozilla/5.0 (Linux; U; Android 4.3; en-us; SM-N900T Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'},
    {'user-agent': 'Mozilla/5.0 (Linux; U; Android 4.1; en-us; GT-N7100 Build/JRO03C) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'},
    {'user-agent': 'Mozilla/5.0 (Linux; U; Android 4.0; en-us; GT-I9300 Build/IMM76D) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 7.0; SM-G950U Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SM-G965U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.111 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.1.0; SM-T837A) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.80 Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; U; en-us; KFAPWI Build/JDQ39) AppleWebKit/535.19 (KHTML, like Gecko) Silk/3.13 Safari/535.19 Silk-Accelerated=true'},
    {'user-agent': 'Mozilla/5.0 (Linux; U; Android 4.4.2; en-us; LGMS323 Build/KOT49I.MS32310c) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 550) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Mobile Safari/537.36 Edge/14.14263'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Moto G (4)) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 10 Build/MOB31T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; Nexus 5X Build/OPR4.170623.006) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 7.1.1; Nexus 6 Build/N6F26U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; Nexus 6P Build/OPP3.170518.006) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 7 Build/MOB30X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; NOKIA; Lumia 520)'},
    {'user-agent': 'Mozilla/5.0 (MeeGo; NokiaN9) AppleWebKit/534.13 (KHTML, like Gecko) NokiaBrowser/8.5.0 Mobile Safari/534.13'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 9; Pixel 3 Build/PQ1A.181105.017.A1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.158 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 10; Pixel 4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 11; Pixel 3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.181 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (Linux; Android 8.0.0; Pixel 2 XL Build/OPD1.170816.004) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Mobile Safari/537.36'},
    {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1'},
    {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1'},
    {'user-agent': 'Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1'},
]
headers = random.choice(headers_list)


def get_url(url):
    # Fetch a URL and return the response body as text.
    try:
        response = requests.get(url, headers=headers, timeout=1)  # first attempt: 1-second timeout
    except requests.RequestException:
        for i in range(4):  # retry up to four times with a longer timeout
            response = requests.get(url, headers=headers, timeout=20)
            if response.status_code == 200:
                break
    html_str = response.text
    return html_str


print("######### Copyright: 殷宗敏 & data API source -  & with thanks! ##########")

# The host part of the API URLs is truncated in the source and is left as published.
url = '.0/school/name.json'
html = requests.get(url).text
unicodestr = json.loads(html)  # convert the JSON string to a dict
dat = unicodestr["data"]

# Province codes ("name") and province names ("value"); the names are also the CSV file names.
province_id = [
    {"name": 11, "value": "北京"}, {"name": 12, "value": "天津"}, {"name": 13, "value": "河北"},
    {"name": 14, "value": "山西"}, {"name": 15, "value": "内蒙古"}, {"name": 21, "value": "辽宁"},
    {"name": 22, "value": "吉林"}, {"name": 23, "value": "黑龙江"}, {"name": 31, "value": "上海"},
    {"name": 32, "value": "江苏"}, {"name": 33, "value": "浙江"}, {"name": 34, "value": "安徽"},
    {"name": 35, "value": "福建"}, {"name": 36, "value": "江西"}, {"name": 37, "value": "山东"},
    {"name": 41, "value": "河南"}, {"name": 42, "value": "湖北"}, {"name": 43, "value": "湖南"},
    {"name": 44, "value": "广东"}, {"name": 45, "value": "广西"}, {"name": 46, "value": "海南"},
    {"name": 50, "value": "重庆"}, {"name": 51, "value": "四川"}, {"name": 52, "value": "贵州"},
    {"name": 53, "value": "云南"}, {"name": 54, "value": "西藏"}, {"name": 61, "value": "陕西"},
    {"name": 62, "value": "甘肃"}, {"name": 63, "value": "青海"}, {"name": 64, "value": "宁夏"},
    {"name": 65, "value": "新疆"},
]

# Subject-type codes returned by the API and the labels written to the CSV:
# 物理类 = physics track, 历史类 = history track, 理科 = science,
# 文科 = liberal arts, 综合类 = comprehensive.
type_labels = {'2073': '物理类', '2074': '历史类', '1': '理科', '2': '文科', '3': '综合类'}


def format_scores(types):
    # Build a string such as ' 物理类:<score> 历史类:<score> ' from one year's scores.
    s = ' '
    for j in types.keys():
        if j in type_labels:
            s = s + type_labels[j] + ':' + types[j] + ' '
    return s


for province in province_id:
    # Columns: name, province, city, district, address, introduction, 985, 211,
    # ShanghaiRanking rank, school type, school nature, featured majors,
    # and the score lines for 2022, 2021 and 2020.
    header = ['名称', '省', '市', '区', '地址', '介绍', '985', '211', '软科排名', '学校类型',
              '学校属性', '特色专业', "2022分数线", "2021分数线", "2020分数线"]
    with open(OUTPUT_DIR + province["value"] + '.csv', encoding='utf-8-sig', mode='w', newline='') as f:
        f_csv = csv.writer(f)
        f_csv.writerow(header)
    for school in dat:
        schoolid = school['school_id']
        schoolname = school['name']
        url1 = '.0/school/' + schoolid + '/info.json'
        print("Downloading " + schoolname)
        html1 = get_url(url1)
        unicodestr1 = json.loads(html1)  # convert the JSON string to a dict
        if len(unicodestr1) != 0:
            dat1 = unicodestr1["data"]
            name = dat1["name"]
            content = dat1["content"]
            f985 = "是" if dat1["f985"] == "1" else "否"  # 是 = yes, 否 = no
            f211 = "是" if dat1["f211"] == "1" else "否"
            ruanke_rank = dat1["ruanke_rank"]
            if ruanke_rank == '0':  # '0' means the school is unranked
                ruanke_rank = ''
            type_name = dat1["type_name"]
            school_nature_name = dat1["school_nature_name"]
            province_name = dat1["province_name"]
            city_name = dat1["city_name"]
            town_name = dat1["town_name"]
            address = dat1["address"]
            special = []
            for j in dat1["special"]:
                special.append(j["special_name"])
            pro_type_min = dat1["pro_type_min"]
            fen2021 = ''
            fen2020 = ''
            fen2022 = ''
            for k in pro_type_min.keys():
                if int(k) == province["name"]:
                    print(pro_type_min[k])
                    for m in pro_type_min[k]:
                        if m['year'] == 2022:
                            fen2022 = format_scores(m['type'])
                        elif m['year'] == 2021:
                            fen2021 = format_scores(m['type'])
                        else:  # any other year is stored in the 2020 column
                            fen2020 = format_scores(m['type'])
            tap = (name, province_name, city_name, town_name, address, content, f985, f211,
                   ruanke_rank, type_name, school_nature_name, special, fen2022, fen2021, fen2020)
            save_data(province["value"], tap)
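The retry logic in `get_url` can still raise if one of the retries itself fails. One possible alternative, sketched here under assumptions and not taken from the original post, is a small helper that retries any fetch callable with exponential backoff. The names `fetch_with_retry` and `flaky_fetch`, the delay values, and the `example.invalid` URL are all hypothetical. In real use the callable would wrap `requests.get`, and the `except` clause would catch `requests.RequestException` instead of `Exception`.

```python
import random
import time


def fetch_with_retry(fetch, url, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times (attempts >= 1), backing off between failures."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:  # assumption: narrow this to the client's error type
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff plus a little jitter to avoid a fixed request rhythm.
                sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
    raise last_error


# Simulated endpoint that fails twice and then succeeds (no network needed).
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("simulated failure")
    return '{"data": []}'


result = fetch_with_retry(flaky_fetch, "https://example.invalid/info.json", sleep=lambda s: None)
print(result, len(calls))
```

Passing `sleep` as a parameter keeps the helper testable without real delays. A polite crawler would leave the default `time.sleep` in place so that the target server is not hit in quick succession.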


Published: 2024-03-07 11:22:18
Article link: https://www.elefans.com/category/jswz/34/1717692.html