Scrape Center crawler practice platform: the spa2 case

Updated: 2024-10-27 18:20:47



References:
- Zhihu: LLI, ibra146
- 会修电脑的程序猿: "scrapy学习之爬虫练习平台2"
- Bilibili

The whole exercise comes down to reproducing this token value.

Analysis:

1: Take the current timestamp `time.time()` and truncate it to an integer `t`; suppose `t` is 1625572736.

2: Join `["/api/movie", 0, "1625572736"]` with commas into `/api/movie,0,1625572736`, then SHA-1 that string and take the hex digest; the digest is the value `o`:

o = "d1983455197bf53903c2a2f6cf57a9a53863b923"

3: Join `o` and `t` with a comma and Base64-encode the result, i.e. encode

d1983455197bf53903c2a2f6cf57a9a53863b923,1625572736

which yields a 68-character string no human reads but the browser is happy with:

ZDE5ODM0NTUxOTdiZjUzOTAzYzJhMmY2Y2Y1N2E5YTUzODYzYjkyMywxNjI1NTcyNzM2

This is the token.

4: The Base64 output is bytes, so it still has to be decoded into an ordinary string.
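Put together, steps 1-4 amount to one small helper. A sketch (`make_token` is my own name for it, not the site's; path and offset are parameters so the same function also covers later pages):

```python
import time
import hashlib
import base64

def make_token(path, offset, t=None):
    # Hypothetical helper reproducing steps 1-4 above
    t = int(time.time()) if t is None else t              # 1: integer timestamp
    raw = f"{path},{offset},{t}"                          # 2: comma-joined string...
    o = hashlib.sha1(raw.encode("utf-8")).hexdigest()     #    ...SHA-1 hex digest
    return base64.b64encode(f"{o},{t}".encode()).decode() # 3: Base64, 4: bytes -> str

print(make_token("/api/movie", 0, 1625572736))  # 68-character token for the example above
```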

```python
import requests
import time
import hashlib
import base64

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=60)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.json()
    except Exception:
        return None

# 1: integer timestamp
t = int(time.time())
# 2: SHA-1
s1 = f"/api/movie,0,{t}"
o = hashlib.sha1(s1.encode("utf-8")).hexdigest()
s2 = f'{o},{t}'
s3 = s2.encode('utf-8')
# 3: Base64
token = base64.b64encode(s3)
# 4: bytes -> str
token = token.decode()
print(token)
# Relative path as in the original post; prepend the site's base URL before running
url = f"/api/movie/?limit=10&offset=0&token={token}"
html = getHTMLText(url)
print(html)
for i in range(10):
    print(html['results'][i]['id'], html['results'][i]['name'])
```

The code above fetches the first page of the list; next is the second page. It turns out the highlighted parameter is not fixed: it changes with the offset.

```python
import requests
import time
import hashlib
import base64

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=60)
        print(r.status_code)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.json()
    except Exception:
        return None

# 1: integer timestamp
t = int(time.time())
# 2: SHA-1 -- the middle field is now the offset, 10
s1 = f"/api/movie,10,{t}"
o = hashlib.sha1(s1.encode("utf-8")).hexdigest()
s2 = f'{o},{t}'
# 3: Base64
token = base64.b64encode(s2.encode('utf-8'))
# 4: bytes -> str (the original round-tripped the bytes through a
# temp file and read them back as text; .decode() does the same job)
token = token.decode()
print(token)
# Relative path as in the original post; prepend the site's base URL before running
url = f"/api/movie/?limit=10&offset=10&token={token}"
html = getHTMLText(url)
print(html)
```
```python
import requests
import time
import hashlib
import base64

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=60)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.json()
    except Exception:
        return None

for j in range(10):
    # 1: integer timestamp
    t = int(time.time())
    # 2: SHA-1, with this page's offset
    s1 = f"/api/movie,{j*10},{t}"
    o = hashlib.sha1(s1.encode("utf-8")).hexdigest()
    s2 = f'{o},{t}'
    s3 = s2.encode('utf-8')
    # 3: Base64
    token = base64.b64encode(s3)
    # 4: bytes -> str
    token = token.decode()
    # Relative path as in the original post; prepend the site's base URL before running
    url = f"/api/movie/?limit=10&offset={j*10}&token={token}"
    html = getHTMLText(url)
    for i in range(10):
        print(html['results'][i]['id'], html['results'][i]['name'])
```

Detail pages

As for the spa2 detail-page URLs, I think you can simply guess them, because the URLs look too much alike. Here are the URL ids of the first three movies plus the last one:
ZWYzNCN0ZXVxMGJ0dWEjKC01N3cxcTVvNS0takA5OHh5Z2ltbHlmeHMqLSFpLTAtbWIx
ZWYzNCN0ZXVxMGJ0dWEjKC01N3cxcTVvNS0takA5OHh5Z2ltbHlmeHMqLSFpLTAtbWIy
ZWYzNCN0ZXVxMGJ0dWEjKC01N3cxcTVvNS0takA5OHh5Z2ltbHlmeHMqLSFpLTAtbWIz
ZWYzNCN0ZXVxMGJ0dWEjKC01N3cxcTVvNS0takA5OHh5Z2ltbHlmeHMqLSFpLTAtbWIxMDA=
Decoded with an online Base64 tool:

ef34#teuq0btua#(-57w1q5o5--j@98xygimlyfxs*-!i-0-mb1
ef34#teuq0btua#(-57w1q5o5--j@98xygimlyfxs*-!i-0-mb2
ef34#teuq0btua#(-57w1q5o5--j@98xygimlyfxs*-!i-0-mb3
ef34#teuq0btua#(-57w1q5o5--j@98xygimlyfxs*-!i-0-mb100

The pattern jumps out at once: the front part is a fixed string, and the tail is the movie's position in the list. One step and we're done.
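In other words, a detail id is just Base64 over the fixed prefix plus the movie's ordinal. A minimal sketch (`detail_id` is a name I made up; the prefix is taken verbatim from the decoded strings above):

```python
import base64

PREFIX = "ef34#teuq0btua#(-57w1q5o5--j@98xygimlyfxs*-!i-0-mb"

def detail_id(n):
    # Base64-encode prefix + ordinal to rebuild the id used in the detail URL
    return base64.b64encode(f"{PREFIX}{n}".encode("utf-8")).decode()

print(detail_id(1))    # id of the first movie
print(detail_id(100))  # id of the last movie
```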


```python
# Single-threaded
import requests
import time
import hashlib
import base64

t1 = time.time()

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=60)
        r.encoding = 'utf-8'
        return r.json()
    except Exception:
        return None

a = "ef34#teuq0btua#(-57w1q5o5--j@98xygimlyfxs*-!i-0-mb"
for b in range(1, 101):
    c = f"{a}{b}"
    url_id = base64.b64encode(c.encode('utf-8')).decode()
    # 1: integer timestamp
    t = int(time.time())
    # 2: SHA-1 -- the signed path now includes the detail id, offset is 0
    s1 = f"/api/movie/{url_id},0,{t}"
    o = hashlib.sha1(s1.encode("utf-8")).hexdigest()
    s2 = f'{o},{t}'
    # 3: Base64, 4: bytes -> str
    token = base64.b64encode(s2.encode('utf-8')).decode()
    # Relative path as in the original post; prepend the site's base URL before running
    url = f"/api/movie/{url_id}/?token={token}"
    html = getHTMLText(url)
    print(html['id'], html['drama'])
print(time.time() - t1)  # about one minute
```
```python
# Asynchronous crawl of the detail pages
import time
import hashlib
import base64
import asyncio
import aiohttp

t1 = time.time()

def getURL(b):
    a = "ef34#teuq0btua#(-57w1q5o5--j@98xygimlyfxs*-!i-0-mb"
    url_id = base64.b64encode(f"{a}{b}".encode('utf-8')).decode()
    t = int(time.time())
    o = hashlib.sha1(f"/api/movie/{url_id},0,{t}".encode("utf-8")).hexdigest()
    token = base64.b64encode(f'{o},{t}'.encode('utf-8')).decode()
    # Relative path as in the original post; prepend the site's base URL before running
    return f"/api/movie/{url_id}/?token={token}"

async def get(session, queue):
    while True:
        try:
            page = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        url = getURL(page)
        resp = await session.get(url, timeout=60)
        html = await resp.json(encoding='utf-8')
        print(html['id'], html['drama'])

async def main():
    async with aiohttp.ClientSession() as session:
        queue = asyncio.Queue()
        for page in range(1, 101):
            queue.put_nowait(page)
        # asyncio.wait() on bare coroutines is deprecated; gather them instead
        await asyncio.gather(*(get(session, queue) for _ in range(100)))

asyncio.run(main())
print(time.time() - t1)  # about 5 seconds
```
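The concurrency pattern here, a fixed pool of coroutines draining one queue, can be tried in isolation with a stub in place of the HTTP call. A sketch, not the site code:

```python
import asyncio

async def worker(queue, results):
    # Pull ids until the queue is empty, then exit -- the same
    # exit condition the get() coroutine above relies on.
    while True:
        try:
            item = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        await asyncio.sleep(0)   # stand-in for `await session.get(...)`
        results.append(item)     # stand-in for printing the parsed JSON

async def crawl(n_items, n_workers):
    queue = asyncio.Queue()
    for i in range(1, n_items + 1):
        queue.put_nowait(i)
    results = []
    await asyncio.gather(*(worker(queue, results) for _ in range(n_workers)))
    return results

print(len(asyncio.run(crawl(100, 10))))  # 100: every id is fetched exactly once
```

Because `get_nowait` pops within the single-threaded event loop, each id is handed to exactly one coroutine, with no locking needed.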


Published: 2024-02-07 06:28:06
Link: https://www.elefans.com/category/jswz/34/1754183.html
