数据"/>
Scraping ssr1 Data from Scrape Center with Python
Contents
- 1. Scraping ssr1 with the requests library and regular expressions
- Scrape Center
- ssr1 URLs
- (1) Define getHTML(url) to fetch the source code of a given page.
- (2) Define findSSR1(html) to parse the source and extract each movie's information.
- (3) Define write_to_file() to save the movie data to an Excel file.
- (4) Define main() to tie all the methods together.
1. Scraping ssr1 with the requests library and regular expressions
Scrape Center
ssr1 URLs (the listing spans ten pages):
/page/1
/page/2
…
/page/10
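Since every listing page follows the same path pattern, the ten paths above can be generated in one line. A minimal sketch (the base URL is omitted here, matching the relative paths shown above; prepend the site's base URL before making requests):

```python
# Build the ten listing-page paths; in practice, prepend the site's
# base URL (not shown in this post) before passing each to requests.get.
urls = ["/page/" + str(i) for i in range(1, 11)]
print(urls[0], urls[-1])  # → /page/1 /page/10
```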
```python
import re
import time
import requests
from requests.exceptions import RequestException
import xlwings as xw
# from fake_useragent import UserAgent
```
(1) Define getHTML(url) to fetch the source code of a given page.
```python
def getHTML(url):
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
        response = requests.get(url, timeout=30, headers=headers)
        response.encoding = response.apparent_encoding
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None
```
(2) Define findSSR1(html) to parse the source and extract each movie's information.
```python
def findSSR1(html):
    global slist
    pattern = re.compile(
        '<div.*?el-col-md-4'
        '.*?src="(.*?)"'                   # image
        '.*?<h2.*?>(.*?)</h2>'             # name
        '.*?button.*?<span>(.*?)</span>'
        '.*?button.*?<span>(.*?)</span>'
        '.*?button.*?<span>(.*?)</span>'   # categories
        '.*?info.*?<span.*?>(.*?)</span>'
        '.*?<span.*?>(.*?)</span>'
        '.*?<span.*?>(.*?)</span>'
        '.*?info.*?<span.*?>(.*?)</span>'  # country / duration / release date
        '.*?score.*?>(.*?)</p>'            # score
        '.*?</div>', re.S)
    items = re.findall(pattern, html)
    print(items)
    for item in items:
        slist.append([
            item[0],  # image
            item[1],  # name
            '、'.join(set([item[2], item[3], item[4]])),  # categories (set() drops duplicates but loses order)
            item[5],  # country
            item[7],  # duration
            item[8],  # release date
            item[9].strip(),  # score
        ])
    return slist
```
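The key to this pattern is the re.S flag, which lets `.` (and therefore the non-greedy `.*?`) match across line breaks, so a single expression can span one whole movie card. A self-contained sketch with a made-up HTML fragment (this is not the real Scrape Center markup, just a simplified stand-in):

```python
import re

# A made-up fragment loosely mimicking one movie card (hypothetical markup).
html = '''
<div class="el-col-md-4">
  <img src="/img/1.jpg">
  <h2 class="name">霸王别姬</h2>
  <p class="score">9.5</p>
</div>
'''

# re.S makes '.' match newlines, so one pattern can cover the whole card.
pattern = re.compile(
    '<div.*?el-col-md-4'
    '.*?src="(.*?)"'          # image URL
    '.*?<h2.*?>(.*?)</h2>'    # movie name
    '.*?score.*?>(.*?)</p>',  # score
    re.S)

for image, name, score in re.findall(pattern, html):
    print(image, name, score)  # → /img/1.jpg 霸王别姬 9.5
```

Each pair of parentheses becomes one element of the tuple that re.findall returns, which is why the real findSSR1 indexes into item by position.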
(3) Define write_to_file() to save the movie data to an Excel file.
```python
def write_to_file():
    global slist
    # Write the collected rows into an Excel workbook
    wb = xw.Book()
    sht = wb.sheets('Sheet1')
    sht.range('a1').value = slist  # a 2D list fills the sheet starting at A1
```
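Note that xlwings drives a local Excel installation. If Excel is not available, the same nested list can be written with the standard csv module instead. A sketch under that assumption (movies.csv is a hypothetical filename, and the sample row is made up):

```python
import csv

# Same shape as slist: a header row followed by one row per movie
# (the data row here is invented for illustration).
rows = [
    ['image', 'name', 'categories', 'country', 'duration', 'release date', 'score'],
    ['/img/1.jpg', '霸王别姬', '剧情、爱情', '中国内地、中国香港', '171 分钟', '1993-07-26', '9.5'],
]

# utf-8-sig writes a BOM so Excel opens the Chinese text correctly.
with open('movies.csv', 'w', newline='', encoding='utf-8-sig') as f:
    csv.writer(f).writerows(rows)
```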
(4) Define main() to tie all the methods together.
```python
def main():
    global slist
    slist = [['image', 'name', 'categories', 'country', 'duration', 'release date', 'score']]
    for i in range(1, 11):
        url = "/page/" + str(i)
        html = getHTML(url)
        findSSR1(html)
        time.sleep(1)  # be polite: pause between page requests
    write_to_file()
```
The complete code:

```python
import re
import time
import requests
from requests.exceptions import RequestException
import xlwings as xw
# from fake_useragent import UserAgent


def getHTML(url):
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
        response = requests.get(url, timeout=30, headers=headers)
        response.encoding = response.apparent_encoding
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


def findSSR1(html):
    global slist
    pattern = re.compile(
        '<div.*?el-col-md-4'
        '.*?src="(.*?)"'                   # image
        '.*?<h2.*?>(.*?)</h2>'             # name
        '.*?button.*?<span>(.*?)</span>'
        '.*?button.*?<span>(.*?)</span>'
        '.*?button.*?<span>(.*?)</span>'   # categories
        '.*?info.*?<span.*?>(.*?)</span>'
        '.*?<span.*?>(.*?)</span>'
        '.*?<span.*?>(.*?)</span>'
        '.*?info.*?<span.*?>(.*?)</span>'  # country / duration / release date
        '.*?score.*?>(.*?)</p>'            # score
        '.*?</div>', re.S)
    items = re.findall(pattern, html)
    print(items)
    for item in items:
        slist.append([
            item[0],  # image
            item[1],  # name
            '、'.join(set([item[2], item[3], item[4]])),  # categories
            item[5],  # country
            item[7],  # duration
            item[8],  # release date
            item[9].strip(),  # score
        ])
    return slist


def write_to_file():
    global slist
    # Write the collected rows into an Excel workbook
    wb = xw.Book()
    sht = wb.sheets('Sheet1')
    sht.range('a1').value = slist  # a 2D list fills the sheet starting at A1


def main():
    global slist
    slist = [['image', 'name', 'categories', 'country', 'duration', 'release date', 'score']]
    for i in range(1, 11):
        url = "/page/" + str(i)
        html = getHTML(url)
        findSSR1(html)
        time.sleep(1)
    write_to_file()


if __name__ == '__main__':
    main()
```