Scraping course information from an academic affairs system with Requests + re

Updated: 2024-10-19 04:22:19

Scraping course data from the academic affairs system

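The scraper uses requests to fetch pages and re to parse them. Note that it targets this particular type of academic affairs system.
[Screenshot of the result]
Here is the source directly. Reading it until you understand it and then adapting it yourself is worth more than copying it. I have blanked out the username and password.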

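The source code follows (entirely original). The regular-expression part contains small errors; readers who want stricter results can look for the patterns in the response and fix them.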
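On the regular-expression caveat: the listing in this post pulls each field out by slicing the string form of a `re.findall` result, which only works when the key matches exactly once. One possible alternative, shown here as a minimal sketch rather than the author's method, is a capture group that returns the value directly. The helper name `extract_field` and the sample string are made up for illustration and are not real output from the system.

```python
import re


def extract_field(key, text):
    """Return the first value of "key":"value" in text, or '' if the key is absent."""
    # re.escape guards against keys that contain regex metacharacters
    match = re.search(r'"%s":"(.*?)"' % re.escape(key), text)
    return match.group(1) if match else ''


# Hypothetical fragment shaped like the fields the post parses
sample = '{"kcmc":"Calculus","xf":"4.0","kch":"000120030","kch_id":"X1"}'

print(extract_field('kcmc', sample))
print(extract_field('kch', sample))
print(extract_field('missing', sample))
```

Because the pattern includes the closing quote and colon after the key, `"kch"` does not accidentally match `"kch_id"`, and a missing key yields an empty string instead of a mangled slice.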

# coding:utf-8
import re

import pandas as pd
import requests

# Login request URL (the address is truncated in the source)
url = '.html'
# Request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36",
    "Cookie": "wengine_vpn_ticket=665a8f93264e9d45; refresh=1",
}
# Request body
data = {
    'yhm': "",  # account
    "mm": "",   # password
}
# Log in; the session keeps the login cookies for the later requests
session = requests.session()
session.post(url, headers=headers, data=data)

# Query parameters shared by the list and detail requests
common_query = ('vpn-12-o1-jwxt.zufe.edu&gnmkdm=N253512&su=160103900104&rwlx=2&xkly=0'
                '&bklx_id=0&xqh_id=2&jg_id=03&zyh_id=0390&zyfx_id=wfx&njdm_id=2016'
                '&bh_id=16039001&xbm=1&xslbdm=wlb&ccdm=w&xsbj=4294967296'
                '&sfkknj=0&sfkkzy=0&sfznkx=0&zdkxms=0&sfkxq=0&sfkcfx=0&kkbk=0&kkbkdj=0')
# url_index = '.html'
# url_index = '.html?' + common_query + '&xkxnm=2019&xkxqm=12&rlkz=0&kklxdm=10&kch_id=000120030&xkkz_id=A1E3328A1E0B79B1E053A40810AC64CF&cxbj=0&fxbj=0'
course_list_detail = [["Course category", "Location", "Time", "Notes"]]


def get_course_detail(url_index_2):
    """Fetch one course's detail data and return [category, location, time, notes]."""
    r = session.post(url_index_2, headers=headers)
    # Strip spaces and line breaks so each pattern matches on a single line
    result = r.content.decode().replace(' ', '').replace('\n', '').replace('\r', '')
    # str() of the findall list looks like ['"kcgsmc":"value"']; the slice drops
    # the key prefix and the trailing "'] (correct only when there is exactly one match)
    kcgsmc = str(re.findall('"kcgsmc":".*?"', result))[12:-3]
    jxdd = str(re.findall('"jxdd":".*?"', result))[10:-3]
    sksj = str(re.findall('"sksj":".*?"', result))[10:-3]
    xkbz = str(re.findall('"xkbz":".*?"', result))[10:-3]
    # course_list_detail.append([kcgsmc, jxdd, sksj, xkbz])
    return [kcgsmc, jxdd, sksj, xkbz]


def set_kch_id(kch_id):
    """Build the detail URL for one course ID and return that course's detail fields."""
    url_index_2 = ('.html?' + common_query
                   + '&xkxnm=2019&xkxqm=12&rlkz=0&kklxdm=10&kch_id=' + str(kch_id)
                   + '&xkkz_id=A1E3328A1E0B79B1E053A40810AC64CF&cxbj=0&fxbj=0')
    return get_course_detail(url_index_2)


def get_course_brief(url_index):
    """Fetch one page of the course list and append one row per course to course_list."""
    r = session.post(url_index, headers=headers)
    content = r.content.decode().replace(' ', '').replace('\n', '').replace('\r', '')
    # Each course record runs from "cxbj to the next occurrence of year
    pattern = re.compile('"cxbj.*?year', re.S)
    results = re.findall(pattern, content)
    # course_list = [["Course number", "Course name", "Course ID", "Credits"]]
    for result in results:
        jxbmc = str(re.findall('"jxbmc":".*?"', result))[11:-3]
        kcmc = str(re.findall('"kcmc":".*?"', result))[10:-3]
        xf = str(re.findall('"xf":".*?"', result))[8:-3]
        kch_id = str(re.findall('"kch":".*?"', result))[9:-3]
        tmp_list = [jxbmc, kcmc, xf] + set_kch_id(kch_id)
        course_list.append(tmp_list)


# Request six pages of ten courses each: kspage/jspage = 1-10, 11-20, ...
num1 = 1
num2 = 10
course_list = []
for i in range(6):
    url_index = ('.html?' + common_query
                 + '&sfkgbcx=0&sfrxtgkcxd=0&tykczgxdcs=0&xkxnm=2019&xkxqm=12&kklxdm=10&rlkz=0'
                 + '&kspage=' + str(num1) + '&jspage=' + str(num2) + '&jxbzb=')
    try:
        get_course_brief(url_index)
    except Exception:
        # Stop at the first page that fails and report its index
        print(i)
        break
    num1 += 10
    num2 += 10
df = pd.DataFrame(course_list, columns=["Course number", "Course name", "Credits", "Course category", "Location", "Time", "Notes"])
print(df)
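The paging in the listing above concatenates `kspage` and `jspage` into a long query string by hand. A hedged alternative is to keep the parameters in a dict and let the standard library encode them. In this sketch the base URL `https://example.invalid/list.html`, the helper names `page_ranges` and `build_list_url`, and the three-parameter `common` dict are all hypothetical stand-ins, and only `key=value` pairs are covered (the valueless `vpn-12-o1-jwxt.zufe.edu` token from the post is left out).

```python
from urllib.parse import urlencode


def page_ranges(pages, page_size=10):
    """Yield (kspage, jspage) pairs: (1, 10), (11, 20), ..."""
    for i in range(pages):
        start = i * page_size + 1
        yield start, start + page_size - 1


def build_list_url(base_url, common_params, kspage, jspage):
    """Combine the shared parameters with one page window into a full URL."""
    params = dict(common_params, kspage=kspage, jspage=jspage)
    return base_url + '?' + urlencode(params)


# Hypothetical subset of the shared parameters from the post
common = {'xkxnm': '2019', 'xkxqm': '12', 'kklxdm': '10'}
urls = [build_list_url('https://example.invalid/list.html', common, ks, js)
        for ks, js in page_ranges(3)]

for u in urls:
    print(u)
```

With requests itself, the same dict can be passed as the `params=` argument of `session.post`, which appends the encoded query string to the URL for you.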

Published: 2024-03-08 19:04:40
Article link: https://www.elefans.com/category/jswz/34/1721954.html