数据"/>
Scraping ssr1 Data from Scrape Center with Python
Contents
- 1. Scraping ssr1 with the requests library and regular expressions
- Scrape Center
- ssr1 URLs
- (1) Define getHTML(url) to fetch the source code of a given page.
- (2) Define findSSR1(html) to parse the source and extract each movie's information.
- (3) Define write_to_file() to save the movie data to an Excel file.
- (4) Define main() to tie all the methods together.
1. Scraping ssr1 with the requests library and regular expressions
Scrape Center
ssr1 URLs (the listing spans ten pages):
/page/1
/page/2
…
/page/10
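Since every listing page follows the same path pattern, the ten paths above can be generated in one line. A minimal sketch (the base URL is omitted here, matching the relative paths shown above; prepend the site's base URL before making requests):

```python
# Build the ten listing-page paths; in practice, prepend the site's
# base URL (not shown in this post) before passing each to requests.get.
urls = ["/page/" + str(i) for i in range(1, 11)]
print(urls[0], urls[-1])  # → /page/1 /page/10
```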
```python
import re
import time
import requests
from requests.exceptions import RequestException
import xlwings as xw
# from fake_useragent import UserAgent
```
(1) Define getHTML(url) to fetch the source code of a given page.
```python
def getHTML(url):
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
        response = requests.get(url, timeout=30, headers=headers)
        response.encoding = response.apparent_encoding
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None
```
(2) Define findSSR1(html) to parse the source and extract each movie's information.
```python
def findSSR1(html):
    global slist
    pattern = re.compile(
        '<div.*?el-col-md-4'
        '.*?src="(.*?)"'                   # image
        '.*?<h2.*?>(.*?)</h2>'             # name
        '.*?button.*?<span>(.*?)</span>'
        '.*?button.*?<span>(.*?)</span>'
        '.*?button.*?<span>(.*?)</span>'   # categories
        '.*?info.*?<span.*?>(.*?)</span>'
        '.*?<span.*?>(.*?)</span>'
        '.*?<span.*?>(.*?)</span>'
        '.*?info.*?<span.*?>(.*?)</span>'  # country / duration / release date
        '.*?score.*?>(.*?)</p>'            # score
        '.*?</div>', re.S)
    items = re.findall(pattern, html)
    print(items)
    for item in items:
        slist.append([
            item[0],  # image
            item[1],  # name
            '、'.join(set([item[2], item[3], item[4]])),  # categories (set() drops duplicates but loses order)
            item[5],  # country
            item[7],  # duration
            item[8],  # release date
            item[9].strip(),  # score
        ])
    return slist
```
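The key to this pattern is the re.S flag, which lets `.` (and therefore the non-greedy `.*?`) match across line breaks, so a single expression can span one whole movie card. A self-contained sketch with a made-up HTML fragment (this is not the real Scrape Center markup, just a simplified stand-in):

```python
import re

# A made-up fragment loosely mimicking one movie card (hypothetical markup).
html = '''
<div class="el-col-md-4">
  <img src="/img/1.jpg">
  <h2 class="name">霸王别姬</h2>
  <p class="score">9.5</p>
</div>
'''

# re.S makes '.' match newlines, so one pattern can cover the whole card.
pattern = re.compile(
    '<div.*?el-col-md-4'
    '.*?src="(.*?)"'          # image URL
    '.*?<h2.*?>(.*?)</h2>'    # movie name
    '.*?score.*?>(.*?)</p>',  # score
    re.S)

for image, name, score in re.findall(pattern, html):
    print(image, name, score)  # → /img/1.jpg 霸王别姬 9.5
```

Each pair of parentheses becomes one element of the tuple that re.findall returns, which is why the real findSSR1 indexes into item by position.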
(3) Define write_to_file() to save the movie data to an Excel file.
```python
def write_to_file():
    global slist
    # Write the collected rows into an Excel workbook
    wb = xw.Book()
    sht = wb.sheets('Sheet1')
    sht.range('a1').value = slist  # a 2D list fills the sheet starting at A1
```
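Note that xlwings drives a local Excel installation. If Excel is not available, the same nested list can be written with the standard csv module instead. A sketch under that assumption (movies.csv is a hypothetical filename, and the sample row is made up):

```python
import csv

# Same shape as slist: a header row followed by one row per movie
# (the data row here is invented for illustration).
rows = [
    ['image', 'name', 'categories', 'country', 'duration', 'release date', 'score'],
    ['/img/1.jpg', '霸王别姬', '剧情、爱情', '中国内地、中国香港', '171 分钟', '1993-07-26', '9.5'],
]

# utf-8-sig writes a BOM so Excel opens the Chinese text correctly.
with open('movies.csv', 'w', newline='', encoding='utf-8-sig') as f:
    csv.writer(f).writerows(rows)
```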
(4) Define main() to tie all the methods together.
```python
def main():
    global slist
    slist = [['image', 'name', 'categories', 'country', 'duration', 'release date', 'score']]
    for i in range(1, 11):
        url = "/page/" + str(i)
        html = getHTML(url)
        findSSR1(html)
        time.sleep(1)  # be polite: pause between page requests
    write_to_file()
```
The complete code:

```python
import re
import time
import requests
from requests.exceptions import RequestException
import xlwings as xw
# from fake_useragent import UserAgent


def getHTML(url):
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
        response = requests.get(url, timeout=30, headers=headers)
        response.encoding = response.apparent_encoding
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


def findSSR1(html):
    global slist
    pattern = re.compile(
        '<div.*?el-col-md-4'
        '.*?src="(.*?)"'                   # image
        '.*?<h2.*?>(.*?)</h2>'             # name
        '.*?button.*?<span>(.*?)</span>'
        '.*?button.*?<span>(.*?)</span>'
        '.*?button.*?<span>(.*?)</span>'   # categories
        '.*?info.*?<span.*?>(.*?)</span>'
        '.*?<span.*?>(.*?)</span>'
        '.*?<span.*?>(.*?)</span>'
        '.*?info.*?<span.*?>(.*?)</span>'  # country / duration / release date
        '.*?score.*?>(.*?)</p>'            # score
        '.*?</div>', re.S)
    items = re.findall(pattern, html)
    print(items)
    for item in items:
        slist.append([
            item[0],  # image
            item[1],  # name
            '、'.join(set([item[2], item[3], item[4]])),  # categories
            item[5],  # country
            item[7],  # duration
            item[8],  # release date
            item[9].strip(),  # score
        ])
    return slist


def write_to_file():
    global slist
    # Write the collected rows into an Excel workbook
    wb = xw.Book()
    sht = wb.sheets('Sheet1')
    sht.range('a1').value = slist  # a 2D list fills the sheet starting at A1


def main():
    global slist
    slist = [['image', 'name', 'categories', 'country', 'duration', 'release date', 'score']]
    for i in range(1, 11):
        url = "/page/" + str(i)
        html = getHTML(url)
        findSSR1(html)
        time.sleep(1)
    write_to_file()


if __name__ == '__main__':
    main()
```