Web Scraping Examples

Updated: 2024-10-25 04:17:14


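  Common request headers:

    Host: the site's domain name, e.g. www.lagou

    Content-Type: the type of the data in the request body

    User-Agent: the client (user agent) that sends the request

    Cookie: the cookies sent with the request

    Referer: the address of the page the request came from

    Location: (a response header) the address to redirect to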

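Scraping Chouti:

Note: the most common anti-scraping measure is to check whether the request headers include User-Agent, so always send this header when scraping.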
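As a minimal sketch of the headers discussed above, the helper below assembles a header dict that always includes User-Agent. The name build_headers and the example values are assumptions made for illustration; the resulting dict is the kind of value passed as headers= to requests.get or requests.post.

```python
# Hypothetical helper: assemble the request headers discussed above.
DEFAULT_USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36'
)


def build_headers(host, referer=None, cookies=None, user_agent=DEFAULT_USER_AGENT):
    """Return a header dict that always carries User-Agent."""
    headers = {'Host': host, 'User-Agent': user_agent}
    if referer:
        headers['Referer'] = referer
    if cookies:
        # The Cookie header joins name=value pairs with "; ".
        headers['Cookie'] = '; '.join(f'{k}={v}' for k, v in cookies.items())
    return headers


if __name__ == '__main__':
    # example.com and the cookie value are placeholders, not real site data.
    print(build_headers('example.com', referer='https://example.com/', cookies={'sid': 'abc'}))
```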

 

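Logging in to Chouti automatically and viewing the profile page

 

Note: the requests must carry cookies. They can be read from the response: the requests module parses the cookies that a response sets, and ret.cookies.get_dict() returns them as a dict.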
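To make the cookie step concrete, the sketch below parses a Set-Cookie header value into a plain dict with the standard library's http.cookies.SimpleCookie. This is roughly the shape that ret.cookies.get_dict() returns. The function name cookies_to_dict and the cookie name and value are invented for illustration.

```python
from http.cookies import SimpleCookie


def cookies_to_dict(set_cookie_value):
    """Parse one Set-Cookie header value into {name: value}."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Attributes such as Path belong to the morsel, not to the dict of values.
    return {name: morsel.value for name, morsel in jar.items()}


if __name__ == '__main__':
    print(cookies_to_dict('sid=abc123; Path=/'))
```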

 

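Logging in to GitHub automatically

Fetch the login page

Send the login POST request

Besides the required login name and password, the POST data must include the commit and utf8 parameters, plus a dynamic parameter, authenticity_token, which changes on every request. Where does this parameter come from?

  Such parameters are usually either hidden in the page, like the csrf-token in Django, or generated dynamically in JS by some algorithm.

  On GitHub, authenticity_token is a hidden field in the page.

It can be parsed out directly with BeautifulSoup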
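The extraction of a hidden form field can be sketched without third-party packages. The example below uses the standard library's html.parser instead of BeautifulSoup, so it is a stand-in for the approach rather than the code used in this article. The class name, the function name, and the sample HTML are assumptions.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the value of every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.inputs = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            attr = dict(attrs)
            if 'name' in attr:
                self.inputs[attr['name']] = attr.get('value')


def extract_token(html, field='authenticity_token'):
    """Return the value of the named input field, or None if it is absent."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.inputs.get(field)


if __name__ == '__main__':
    # Made-up login form, not GitHub's real markup.
    sample = '<form><input type="hidden" name="authenticity_token" value="tok123"></form>'
    print(extract_token(sample))
```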

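The POST request must carry cookies

 

 Every response may set cookies. When viewing a page, the cookies to send may be those from the previous response, those from the first visit to the page, or a combination of cookies from two or more responses.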
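One way to handle the "combination of cookies" case is to merge the dicts from each response in order. The sketch below assumes that later responses should override earlier ones; the cookie names and values are invented. As an alternative to this manual bookkeeping, requests.Session keeps cookies across requests automatically.

```python
def merge_cookies(*cookie_dicts):
    """Merge cookie dicts in order; later responses override earlier ones."""
    merged = {}
    for cookies in cookie_dicts:
        merged.update(cookies)
    return merged


if __name__ == '__main__':
    first_visit = {'sess': 'v1', 'logged_in': 'no'}    # invented values
    after_login = {'sess': 'v2', 'user_session': 'xyz'}
    print(merge_cookies(first_visit, after_login))
```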

Code example:

import requests
from bs4 import BeautifulSoup

USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36'

# ########## Send a GET request to fetch the login page ##########
ret = requests.get(url='')
ret_cookie_dict = ret.cookies.get_dict()

# Extract the authenticity_token parameter from the login page
soup = BeautifulSoup(ret.text, 'html.parser')
token = soup.find(name='input', attrs={'name': 'authenticity_token'}).attrs.get('value')

# ########## Send the login POST request ##########
ret1 = requests.post(
    url='',
    headers={
        'Host': 'github',
        'Origin': '',
        'Pragma': 'no-cache',
        'Referer': '',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': USER_AGENT,
    },
    data={
        'commit': 'Sign in',
        'utf8': '✓',
        'authenticity_token': token,
        'login': 'Zhao-panpan',
        'password': 'xxxxx',
    },
    cookies=ret_cookie_dict,
)
ret1_cookie_dict = ret1.cookies.get_dict()

# ########## View the personal email settings page ##########
ret3 = requests.get(
    url='',
    headers={
        'Host': 'github',
        'Origin': '',
        'Pragma': 'no-cache',
        'Referer': '',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': USER_AGENT,
    },
    cookies=ret1_cookie_dict,
)
print(ret3.text)

GitHub code example

 

Note: requests often carry custom headers as well, such as code or token.

 

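Scraping Lagou

Login sends a POST request to the login.json URL

Format of the request headers and request data

After clicking log in, the browser ends up on the home page /. The process sends four requests: three redirects and one final request.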
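The redirect chain described above can be walked by hand, one hop at a time, collecting cookies along the way. The sketch below is an illustration under stated assumptions: follow_redirects and the fetch callback are hypothetical names, and the fake pages stand in for the real site. With the requests module, fetch would wrap a call made with allow_redirects=False.

```python
def follow_redirects(fetch, url, max_hops=10):
    """Follow Location headers by hand.

    fetch(url, cookies) must return (status_code, headers, new_cookies)
    and must not follow redirects itself.
    Returns (final_url, cookies, visited_urls).
    """
    cookies, visited = {}, []
    for _ in range(max_hops):
        status, headers, new_cookies = fetch(url, dict(cookies))
        visited.append(url)
        cookies.update(new_cookies)
        if status in (301, 302, 303, 307, 308) and 'Location' in headers:
            url = headers['Location']
            continue
        return url, cookies, visited
    raise RuntimeError('too many redirects')


if __name__ == '__main__':
    # Fake chain: three redirects and one final page (paths are made up).
    pages = {
        '/grant': (302, {'Location': '/action'}, {'a': '1'}),
        '/action': (302, {'Location': '/auth'}, {'b': '2'}),
        '/auth': (302, {'Location': '/'}, {'c': '3'}),
        '/': (200, {}, {'d': '4'}),
    }
    print(follow_redirects(lambda u, c: pages[u], '/grant'))
```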

 

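 Notes:

  The request header x-requested-with: XMLHttpRequest marks the request as an AJAX request.

  When the response body is a JSON string rather than HTML, res.json() in the requests module returns the corresponding Python object (a dict for a JSON object).

  Passing allow_redirects=False when sending a request stops the requests module from following redirects automatically.

  data versus json when sending a POST request: both are sent as the request body. With data, the Content-Type is application/x-www-form-urlencoded and the body is joined as name=xxxx&age=18. With json, the Content-Type is application/json and the data is serialized to a JSON string.

  When the browser shows the sent data under Request Payload, it is usually JSON, so pass it as json={'k1': 'v1'}.

 

  To keep every request across redirects in the browser, tick Preserve log in the Network panel.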
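The difference between data and json is easiest to see in the encoded bodies. The sketch below reproduces the two encodings with the standard library only, so it approximates what ends up on the wire rather than calling the requests module itself; whitespace in the JSON body may differ from what requests produces. The function names are assumptions.

```python
import json
from urllib.parse import urlencode


def encode_form(payload):
    """Body and Content-Type corresponding to data=payload."""
    return urlencode(payload), 'application/x-www-form-urlencoded'


def encode_json(payload):
    """Body and Content-Type corresponding to json=payload."""
    return json.dumps(payload), 'application/json'


if __name__ == '__main__':
    payload = {'name': 'xxxx', 'age': 18}
    print(encode_form(payload))
    print(encode_json(payload))
```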

 

Code example:

import re
import requests

USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
all_cookie_dict = {}

# ########## Step 1: visit the login page ##########
r1 = requests.get(
    url='.html',
    headers={'User-Agent': USER_AGENT},
)
# The anti-forgery token and code are embedded in the page's JS
token = re.findall("X_Anti_Forge_Token = '(.*)';", r1.text)[0]
code = re.findall("X_Anti_Forge_Code = '(.*)';", r1.text)[0]
r1_cookie_dict = r1.cookies.get_dict()
all_cookie_dict.update(r1_cookie_dict)

# ########## Step 2: log in ##########
r2 = requests.post(
    url='.json',
    data={
        'isValidate': 'true',
        'username': '15131255089',
        'password': '4565465',
        'request_form_verifyCode': '',
        'submit': '',
    },
    headers={
        'User-Agent': USER_AGENT,
        'X-Requested-With': 'XMLHttpRequest',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Host': 'passport.lagou',
        'Origin': '',
        'Referer': '.html',
        'X-Anit-Forge-Code': code,
        'X-Anit-Forge-Token': token,
    },
    cookies=all_cookie_dict,
)
r2_response_json = r2.json()
r2_cookie_dict = r2.cookies.get_dict()
all_cookie_dict.update(r2_cookie_dict)

# ########## Step 3: grant ##########
r3 = requests.get(
    url='.html',
    headers={
        'User-Agent': USER_AGENT,
        'Referer': '.html',
        'Host': 'passport.lagou',
    },
    cookies=all_cookie_dict,
    allow_redirects=False,
)
r3_cookie_dict = r3.cookies.get_dict()
all_cookie_dict.update(r3_cookie_dict)

# ########## Step 4: action ##########
r4 = requests.get(
    url=r3.headers['Location'],
    headers={
        'User-Agent': USER_AGENT,
        'Referer': '.html',
        'Host': 'www.lagou',
        'Upgrade-Insecure-Requests': '1',
    },
    cookies=all_cookie_dict,
    allow_redirects=False,
)
r4_cookie_dict = r4.cookies.get_dict()
all_cookie_dict.update(r4_cookie_dict)

# ########## Step 5: get the authentication info ##########
r5 = requests.get(
    url=r4.headers['Location'],
    headers={
        'User-Agent': USER_AGENT,
        'Referer': '.html',
        'Host': 'www.lagou',
        'Upgrade-Insecure-Requests': '1',
    },
    cookies=all_cookie_dict,
    allow_redirects=False,
)
r5_cookie_dict = r5.cookies.get_dict()
all_cookie_dict.update(r5_cookie_dict)
print(r5.headers['Location'])

# ########## Step 6 ##########
r6 = requests.get(
    url=r5.headers['Location'],
    headers={
        'User-Agent': USER_AGENT,
        'Referer': '.html',
        'Host': 'www.lagou',
        'Upgrade-Insecure-Requests': '1',
    },
    cookies=all_cookie_dict,
    allow_redirects=False,
)
r6_cookie_dict = r6.cookies.get_dict()
all_cookie_dict.update(r6_cookie_dict)
print(r6.headers['Location'])

# ########## Step 7 ##########
r7 = requests.get(
    url=r6.headers['Location'],
    headers={
        'User-Agent': USER_AGENT,
        'Referer': '.html',
        'Host': 'www.lagou',
        'Upgrade-Insecure-Requests': '1',
    },
    cookies=all_cookie_dict,
    allow_redirects=False,
)
r7_cookie_dict = r7.cookies.get_dict()
all_cookie_dict.update(r7_cookie_dict)

# ########## Step 8: view personal info ##########
r8 = requests.get(
    url='/',
    headers={
        'User-Agent': USER_AGENT,
        'Host': 'gate.lagou',
        'Pragma': 'no-cache',
        'Referer': '.html',
        'X-L-REQ-HEADER': '{deviceType:1}',
    },
    cookies=all_cookie_dict,
)
r8_response_json = r8.json()
# print(r8_response_json)
all_cookie_dict.update(r8.cookies.get_dict())

# ########## Step 9: update personal info ##########
r9 = requests.put(
    url='/',
    headers={
        'User-Agent': USER_AGENT,
        'Host': 'gate.lagou',
        'Origin': '',
        'Referer': '.html',
        'X-L-REQ-HEADER': '{deviceType:1}',
        # The submit code and token come from the step 8 response
        'X-Anit-Forge-Code': r8_response_json.get('submitCode'),
        'X-Anit-Forge-Token': r8_response_json.get('submitToken'),
        'Content-Type': 'application/json;charset=UTF-8',
    },
    json={
        "userName": "wupeiqi999",
        "sex": "MALE",
        "portrait": "images/myresume/default_headpic.png",
        "positionName": "...",
        "introduce": "....",
    },
    cookies=all_cookie_dict,
)
print(r9.text)

Lagou code example

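 Scraping Douyin

When a user's page is opened, an AJAX request is sent to fetch all of that user's videos

In that AJAX request, the _signature parameter in the URL query string changes; it is generated by JS code

 The JS that generates this parameter must have been loaded before the AJAX request is sent

 First search all the JS for _signature

 Then search for signature

signature is generated by the sign function of _bytedAcrawler, which takes the uid (user ID)
Then search for _bytedAcrawler

 The sign method turns out to live in an imported module: require("douyin_falcon:node_modules/byted-acrawler/dist/runtime")

Next, look through all the static resources in the Sources panel

Find these JS files, put them into an HTML file, pass in the data, and console.log the result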

<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1"><title>Title</title>
</head>
<body><script>__M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime", function (l, e) {Function(function (l) {return 'e(e,a,r){(b[e]||(b[e]=t("x,y","x "+e+" y")(r,a)}a(e,a,r){(k[r]||(k[r]=t("x,y","new x[y]("+Array(r+1).join(",x[y]")(1)+")")(e,a)}r(e,a,r){n,t,s={},b=s.d=r?r.d+1:0;for(s["$"+b]=s,t=0;t<b;t)s[n="$"+t]=r[n];for(t=0,b=s=a;t<b;t)s[t]=a[t];c(e,0,s)}c(t,b,k){u(e){v[x]=e}f{g=,ting(bg)}l{try{y=c(t,b,k)}catch(e){h=e,y=l}}for(h,y,d,g,v=[],x=0;;)switch(g=){case 1:u(!)4:f5:u((e){a=0,r=e;{c=a<r;c&&u(e[a]),c}}(6:y=,u((y8:if(g=,lg,g=,y===c)b+=g;else if(y!==l)y9:c10:u(s(11:y=,u(+y)12:for(y=f,d=[],g=0;g<y;g)d[g]=y.charCodeAt(g)^g+y;u(String.fromCharCode.apply(null,d13:y=,h=delete [y]14:59:u((g=)?(y=x,v.slice(x-=g,y:[])61:u([])62:g=,k[0]=65599*k[0]+k[1].charCodeAt(g)>>>065:h=,y=,[y]=h66:u(e(t[b],,67:y=,d=,u((g=).x===c?r(g.y,y,k):g.apply(d,y68:u(e((g=t[b])<"<"?(b--,f):g+g,,70:u(!1)71:n72:+f73:u(parseInt(f,3675:if(){bcase 74:g=<<16>>16g76:u(k[])77:y=,u([y])78:g=,u(a(v,x-=g+1,g79:g=,u(k["$"+g])81:h=,[f]=h82:u([f])83:h=,k[]=h84:!085:void 086:u(v[x-1])88:h=,y=,h,y89:u({e{r(e.y,arguments,k)}e.y=f,e.x=c,e})90:null91:h93:h=0:;default:u((g<<16>>16)-16)}}n=this,t=n.Function,s=Object.keys||(e){a={},r=0;for(c in e)a[r]=c;a=r,a},b={},k={};r'.replace(/[-]/g, function (e) {return l[15 & e.charCodeAt(0)]})}("v[x++]=v[--x]t.charCodeAt(b++)-32function return ))++.substrvar .length(),b+=;break;case ;break}".split("")))()('gr$Daten Иb/s!l y͒yĹg,(lfi~ah`{mv,-n|jqewVxp{rvmmx,&effkx[!cs"l".Pq%widthl"@q&heightl"vr*getContextx$"2d[!cs#l#,*;?|u.|uc{uq$fontl#vr(fillTextx$$龘ฑภ경2<[#c}l#2q*shadowBlurl#1q-shadowOffsetXl#$$limeq+shadowColorl#vr#arcx88802[%c}l#vr&strokex[ c}l"v,)}eOmyoZB]mx[ cs!0s$l$Pb<k7l l!r&lengthb%^l$1+s$jl  s#i$1ek1s$gr#tack4)zgr#tac$! 
+0o![#cj?o ]!l$b%s"o ]!l"l$b*b^0d#>>>s!0s%yA0s"l"l!r&lengthb<k+l"^l"1+s"jl  s&l&z0l!$ +["cs\'(0l#i\'1ps9wxb&s() &{s)/s(gr&Stringr,fromCharCodes)0s*yWl ._b&s o!])l l Jb<k$.aj;l .Tb<k$.gj/l .^b<k&i"-4j!+& s+yPo!]+s!l!l Hd>&l!l Bd>&+l!l <d>&+l!l 6d>&+l!l &+ s,y=o!o!]/q"13o!l q"10o!],l 2d>& s.{s-yMo!o!]0q"13o!]*Ld<l 4d#>>>b|s!o!l q"10o!],l!& s/yIo!o!].q"13o!],o!]*Jd<l 6d#>>>b|&o!]+l &+ s0l-l!&l-l!i\'1z141z4b/@d<l"b|&+l-l(l!b^&+l-l&zl\'g,)gk}ejo{cm,)|yn~Lij~em["cl$b%@d<l&zl\'l $ +["cl$b%b|&+l-l%8d<@b|l!b^&+ q$sign ', [Object.defineProperty(e, "__esModule", {value: !0})])
});_bytedAcrawler = require("douyin_falcon:node_modules/byted-acrawler/dist/runtime");signature = _bytedAcrawler.sign('58841646784');console.log(signature);
</script></body>
</html>

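Running it raises an error

Searching for __M shows that it is set on t, the parameter of a self-executing function; the argument passed in for t is this

Note: !function denotes a self-executing function

require is a property of __M, so it has to be called as __M.require

 

With that change, the dynamic _signature value can be obtained

The final JS file to execute is: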

    !function(t) {if (t.__M = t.__M || {},!t.__M.require) {var e, n, r = document.getElementsByTagName("head")[0], i = {}, o = {}, a = {}, u = {}, c = {}, s = {}, l = function(t, n) {if (!(t in u)) {u[t] = !0;var i = document.createElement("script");if (n) {var o = setTimeout(n, e.timeout);i.onerror = function() {clearTimeout(o),n()};var a = function() {clearTimeout(o)};"onload"in i ? i.onload = a : i.onreadystatechange = function() {("loaded" === this.readyState || "complete" === this.readyState) && a()}}return i.type = "text/javascript",i.src = t,r.appendChild(i),i}}, f = function(t, e, n) {var r = i[t] || (i[t] = []);r.push(e);var o, a = c[t] || c[t + ".js"] || {}, u = a.pkg;o = u ? s[u].url || s[u].uri : a.url || a.uri || t,l(o, n && function() {n(t)})};n = function(t, e) {"function" != typeof e && (e = arguments[2]),t = t.replace(/\.js$/i, ""),o[t] = e;var n = i[t];if (n) {for (var r = 0, a = n.length; a > r; r++)n[r]();delete i[t]}},e = function(t) {if (t && t.splice)return e.async.apply(this, arguments);t = e.alias(t);var n = a[t];if (n)return n.exports;var r = o[t];if (!r)throw "[ModJS] Cannot find module `" + t + "`";n = a[t] = {exports: {}};var i = "function" == typeof r ? r.apply(n, [e, n.exports, n]) : r;return i && (n.exports = i),n.exports && !n.exports["default"] && Object.defineProperty && Object.isExtensible(n.exports) && Object.defineProperty(n.exports, "default", {value: n.exports}),n.exports},e.async = function(n, r, i) {function a(t) {for (var n, r = 0, h = t.length; h > r; r++) {var p = e.alias(t[r]);p in o ? 
(n = c[p] || c[p + ".js"],n && "deps"in n && a(n.deps)) : p in s || (s[p] = !0,l++,f(p, u, i),n = c[p] || c[p + ".js"],n && "deps"in n && a(n.deps))}}function u() {if (0 === l--) {for (var i = [], o = 0, a = n.length; a > o; o++)i[o] = e(n[o]);r && r.apply(t, i)}}"string" == typeof n && (n = [n]);var s = {}, l = 0;a(n),u()},e.resourceMap = function(t) {var e, n;n = t.res;for (e in n)n.hasOwnProperty(e) && (c[e] = n[e]);n = t.pkg;for (e in n)n.hasOwnProperty(e) && (s[e] = n[e])},e.loadJs = function(t) {l(t)},e.loadCss = function(t) {if (t.content) {var e = document.createElement("style");e.type = "text/css",e.styleSheet ? e.styleSheet.cssText = t.content : e.innerHTML = t.content,r.appendChild(e)} else if (t.url) {var n = document.createElement("link");n.href = t.url,n.rel = "stylesheet",n.type = "text/css",r.appendChild(n)}},e.alias = function(t) {return t.replace(/\.js$/i, "")},e.timeout = 5e3,t.__M.define = n,t.__M.require = e}
}(this);__M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime", function (l, e) {Function(function (l) {return 'e(e,a,r){(b[e]||(b[e]=t("x,y","x "+e+" y")(r,a)}a(e,a,r){(k[r]||(k[r]=t("x,y","new x[y]("+Array(r+1).join(",x[y]")(1)+")")(e,a)}r(e,a,r){n,t,s={},b=s.d=r?r.d+1:0;for(s["$"+b]=s,t=0;t<b;t)s[n="$"+t]=r[n];for(t=0,b=s=a;t<b;t)s[t]=a[t];c(e,0,s)}c(t,b,k){u(e){v[x]=e}f{g=,ting(bg)}l{try{y=c(t,b,k)}catch(e){h=e,y=l}}for(h,y,d,g,v=[],x=0;;)switch(g=){case 1:u(!)4:f5:u((e){a=0,r=e;{c=a<r;c&&u(e[a]),c}}(6:y=,u((y8:if(g=,lg,g=,y===c)b+=g;else if(y!==l)y9:c10:u(s(11:y=,u(+y)12:for(y=f,d=[],g=0;g<y;g)d[g]=y.charCodeAt(g)^g+y;u(String.fromCharCode.apply(null,d13:y=,h=delete [y]14:59:u((g=)?(y=x,v.slice(x-=g,y:[])61:u([])62:g=,k[0]=65599*k[0]+k[1].charCodeAt(g)>>>065:h=,y=,[y]=h66:u(e(t[b],,67:y=,d=,u((g=).x===c?r(g.y,y,k):g.apply(d,y68:u(e((g=t[b])<"<"?(b--,f):g+g,,70:u(!1)71:n72:+f73:u(parseInt(f,3675:if(){bcase 74:g=<<16>>16g76:u(k[])77:y=,u([y])78:g=,u(a(v,x-=g+1,g79:g=,u(k["$"+g])81:h=,[f]=h82:u([f])83:h=,k[]=h84:!085:void 086:u(v[x-1])88:h=,y=,h,y89:u({e{r(e.y,arguments,k)}e.y=f,e.x=c,e})90:null91:h93:h=0:;default:u((g<<16>>16)-16)}}n=this,t=n.Function,s=Object.keys||(e){a={},r=0;for(c in e)a[r]=c;a=r,a},b={},k={};r'.replace(/[-]/g, function (e) {return l[15 & e.charCodeAt(0)]})}("v[x++]=v[--x]t.charCodeAt(b++)-32function return ))++.substrvar .length(),b+=;break;case ;break}".split("")))()('gr$Daten Иb/s!l y͒yĹg,(lfi~ah`{mv,-n|jqewVxp{rvmmx,&effkx[!cs"l".Pq%widthl"@q&heightl"vr*getContextx$"2d[!cs#l#,*;?|u.|uc{uq$fontl#vr(fillTextx$$龘ฑภ경2<[#c}l#2q*shadowBlurl#1q-shadowOffsetXl#$$limeq+shadowColorl#vr#arcx88802[%c}l#vr&strokex[ c}l"v,)}eOmyoZB]mx[ cs!0s$l$Pb<k7l l!r&lengthb%^l$1+s$jl  s#i$1ek1s$gr#tack4)zgr#tac$! 
+0o![#cj?o ]!l$b%s"o ]!l"l$b*b^0d#>>>s!0s%yA0s"l"l!r&lengthb<k+l"^l"1+s"jl  s&l&z0l!$ +["cs\'(0l#i\'1ps9wxb&s() &{s)/s(gr&Stringr,fromCharCodes)0s*yWl ._b&s o!])l l Jb<k$.aj;l .Tb<k$.gj/l .^b<k&i"-4j!+& s+yPo!]+s!l!l Hd>&l!l Bd>&+l!l <d>&+l!l 6d>&+l!l &+ s,y=o!o!]/q"13o!l q"10o!],l 2d>& s.{s-yMo!o!]0q"13o!]*Ld<l 4d#>>>b|s!o!l q"10o!],l!& s/yIo!o!].q"13o!],o!]*Jd<l 6d#>>>b|&o!]+l &+ s0l-l!&l-l!i\'1z141z4b/@d<l"b|&+l-l(l!b^&+l-l&zl\'g,)gk}ejo{cm,)|yn~Lij~em["cl$b%@d<l&zl\'l $ +["cl$b%b|&+l-l%8d<@b|l!b^&+ q$sign ', [Object.defineProperty(e, "__esModule", {value: !0})])
});_bytedAcrawler = __M.require("douyin_falcon:node_modules/byted-acrawler/dist/runtime");signature = _bytedAcrawler.sign('58841646784');console.log(signature);

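The next step is to run this JS code from Python

  To run JS from Python, first run the code with node on the command line, then execute that command from Python with popen in the os module, or with the subprocess module, and read the result

  Running it with node (command line: node <path to the JS file>) raises an error

The failing expression is document.getElementsByTagName("head")[0]

  Fix: on the user's Douyin page, console.log(document.getElementsByTagName("head")[0]) prints a head tag

Replace document.getElementsByTagName("head")[0] with that tag, written as a string

After the replacement another error appears: __M cannot be found

  Fix: in the browser this refers to the window object, but under node it does not, so __M is accessed as this.__M

Then another error appears: userAgent cannot be found

  Fix: in the browser, userAgent is read from the navigator object

So define a navigator object with a userAgent property in the JS file

The final JS file is therefore: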

  navigator = {
userAgent:"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
}if (t.__M = t.__M || {},!t.__M.require) {var e, n, r = "<head> <meta charset=\"utf-8\"><title>快来加入抖音短视频,让你发现最有趣的我!</title><meta name=\"viewport\" content=\"width=device-width,initial-scale=1,user-scalable=0,minimum-scale=1,maximum-scale=1,minimal-ui,viewport-fit=cover\"><meta name=\"format-detection\" content=\"telephone=no\"><meta name=\"baidu-site-verification\" content=\"szjdG38sKy\"><meta name=\"keywords\" content=\"抖音、抖音音乐、抖音短视频、抖音官网、amemv\"><meta name=\"description\" content=\"抖音短视频-记录美好生活的视频平台\"><meta name=\"apple-mobile-web-app-capable\" content=\"yes\"><meta name=\"apple-mobile-web-app-status-bar-style\" content=\"default\"><link rel=\"apple-touch-icon-precomposed\" href=\"//s3.bytecdn/ies/resource/falcon/douyin_falcon/static/image/logo/logo_launcher_v2_40f12f4.png\"><link rel=\"shortcut icon\" href=\"//s3.bytecdn/ies/resource/falcon/douyin_falcon/static/image/logo/favicon_v2_7145ff0.ico\" type=\"image/x-icon\"><meta http-equiv=\"X-UA-Compatible\" content=\"IE=Edge;chrome=1\"><meta name=\"screen-orientation\" content=\"portrait\"><meta name=\"x5-orientation\" content=\"portrait\"><script async=\"\" src=\"//www.google-analytics/analytics.js\"></script><script type=\"text/javascript\">!function(){function e(e){return this.config=e,this}e.prototype={reset:function(){var e=Math.min(document.documentElement.clientWidth,750)/750*100;document.documentElement.style.fontSize=e+\"px\";var t=parseFloat(window.getComputedStyle(document.documentElement).fontSize),n=e/t;1!=n&&(document.documentElement.style.fontSize=e*n+\"px\")}},window.Adapter=new e,window.Adapter.reset(),window.οnlοad=function(){window.Adapter.reset()},window.οnresize=function(){window.Adapter.reset()}}();</script> <meta name=\"screen-orientation\" content=\"portrait\"><meta name=\"x5-orientation\" 
content=\"portrait\"><script>tac='i)69eo056r4s!i$1afls\"0,<8~z|\x7f@QGNCJF[\\\\^D\\\\KFYSk~^WSZhg,(lfi~ah`{md\"inb|1d<,%Dscafgd\"in,8[xtm}nLzNEGQMKAdGG^NTY\x1ckgd\"inb<b|1d<g,&TboLr{m,(\x02)!jx-2n&vr$testxg,%@tug{mn ,%vrfkbm[!cb|'</script><script type=\"text/javascript\">!function(){function e(e){return this.config=e,this}e.prototype={reset:function(){var e=Math.min(document.documentElement.clientWidth,750)/750*100;document.documentElement.style.fontSize=e+\"px\";var t=parseFloat(window.getComputedStyle(document.documentElement).fontSize),n=e/t;1!=n&&(document.documentElement.style.fontSize=e*n+\"px\")}},window.Adapter=new e,window.Adapter.reset(),window.οnlοad=function(){window.Adapter.reset()},window.οnresize=function(){window.Adapter.reset()}}();</script><meta name=\"pathname\" content=\"aweme_mobile_user\"> <meta name=\"screen-orientation\" content=\"portrait\"><meta name=\"x5-orientation\" content=\"portrait\"><meta name=\"theme-color\" content=\"#161823\"><meta name=\"pathname\" content=\"aweme_mobile_video\"><link rel=\"dns-prefetch\" href=\"//s3.bytecdn/\"><link rel=\"dns-prefetch\" href=\"//s3a.bytecdn/\"><link rel=\"dns-prefetch\" href=\"//s3b.bytecdn/\"><link rel=\"dns-prefetch\" href=\"//s0.pstatp/\"><link rel=\"dns-prefetch\" href=\"//s1.pstatp/\"><link rel=\"dns-prefetch\" href=\"//s2.pstatp/\"><link rel=\"dns-prefetch\" href=\"//v1-dy.ixigua/\"><link rel=\"dns-prefetch\" href=\"//v1-dy.ixiguavideo/\"><link rel=\"dns-prefetch\" href=\"//v3-dy.ixigua/\"><link rel=\"dns-prefetch\" href=\"//v3-dy.ixiguavideo/\"><link rel=\"dns-prefetch\" href=\"//v6-dy.ixigua/\"><link rel=\"dns-prefetch\" href=\"//v6-dy.ixiguavideo/\"><link rel=\"dns-prefetch\" href=\"//v9-dy.ixigua/\"><link rel=\"dns-prefetch\" href=\"//v9-dy.ixiguavideo/\"><link rel=\"dns-prefetch\" href=\"//v11-dy.ixigua/\"><link rel=\"dns-prefetch\" href=\"//v11-dy.ixiguavideo/\"><link rel=\"stylesheet\" 
href=\"//s3.bytecdn/ies/resource/falcon/douyin_falcon/static/style/base_99078a4.css\"><style>@font-face{font-family:iconfont;src:url(//s3a.bytecdn/ies/resource/falcon/douyin_falcon/static/font/iconfont_9eadf2f.eot);src:url(//s3a.bytecdn/ies/resource/falcon/douyin_falcon/static/font/iconfont_9eadf2f.eot#iefix) format('embedded-opentype'),url(//s3.bytecdn/ies/resource/falcon/douyin_falcon/static/font/iconfont_9eb9a50.woff) format('woff'),url(//s3a.bytecdn/ies/resource/falcon/douyin_falcon/static/font/iconfont_da2e2ef.ttf) format('truetype'),url(//s3.bytecdn/ies/resource/falcon/douyin_falcon/static/font/iconfont_31180f7.svg#iconfont) format('svg')}.iconfont{font-family:iconfont!important;font-size:.24rem;font-style:normal;letter-spacing:-.045rem;margin-left:-.085rem}@font-face{font-family:icons;src:url(//s3.bytecdn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_2f1b1cd.eot);src:url(//s3.bytecdn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_2f1b1cd.eot#iefix) format('embedded-opentype'),url(//s3.bytecdn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_87ad39c.woff) format('woff'),url(//s3.bytecdn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_5848858.ttf) format('truetype'),url(//s3a.bytecdn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_20c7f77.svg#iconfont) format('svg')}.icons{font-family:icons!important;font-size:.24rem;font-style:normal;-webkit-font-smoothing:antialiased;-webkit-text-stroke-width:.2px;-moz-osx-font-smoothing:grayscale}@font-face{font-family:Ies;src:url(//s3.bytecdn/ies/resource/falcon/douyin_falcon/static/icons/Ies_317064f.woff2?ba9fc668cd9544e80b6f5998cdce1672) format(\"woff2\"),url(//s3a.bytecdn/ies/resource/falcon/douyin_falcon/static/icons/Ies_a07f3d4.woff?ba9fc668cd9544e80b6f5998cdce1672) format(\"woff\"),url(//s3.bytecdn/ies/resource/falcon/douyin_falcon/static/icons/Ies_4c0d8be.ttf?ba9fc668cd9544e80b6f5998cdce1672) 
format(\"truetype\"),url(//s3.bytecdn/ies/resource/falcon/douyin_falcon/static/icons/Ies_1ac3f94.svg?ba9fc668cd9544e80b6f5998cdce1672#Ies) format(\"svg\")}i{line-height:1}i[class^=ies-]:before,i[class*=\" ies-\"]:before{font-family:Ies!important;font-style:normal;font-weight:400!important;font-variant:normal;text-transform:none;line-height:1;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.ies-checked:before{content:\"\\f101\"}.ies-chevron-left:before{content:\"\\f102\"}.ies-chevron-right:before{content:\"\\f103\"}.ies-clear:before{content:\"\\f104\"}.ies-close:before{content:\"\\f105\"}.ies-copy:before{content:\"\\f106\"}.ies-delete:before{content:\"\\f107\"}.ies-edit:before{content:\"\\f108\"}.ies-help-circle:before{content:\"\\f109\"}.ies-info:before{content:\"\\f10a\"}.ies-loading:before{content:\"\\f10b\"}.ies-location:before{content:\"\\f10c\"}.ies-paste:before{content:\"\\f10d\"}.ies-plus:before{content:\"\\f10e\"}.ies-query:before{content:\"\\f10f\"}.ies-remove:before{content:\"\\f110\"}.ies-search:before{content:\"\\f111\"}.ies-settings:before{content:\"\\f112\"}.ies-shopping-bag:before{content:\"\\f113\"}.ies-sort-left:before{content:\"\\f114\"}.ies-sort-right:before{content:\"\\f115\"}.ies-title-decorate-left:before{content:\"\\f116\"}.ies-title-decorate-right:before{content:\"\\f117\"}.ies-triangle-right:before{content:\"\\f118\"}.ies-triangle-top:before{content:\"\\f119\"}.ies-video:before{content:\"\\f11a\"}</style> <link rel=\"stylesheet\" href=\"//s3a.bytecdn/ies/resource/falcon/douyin_falcon/component/loading/index_5108ff2.css\">\n" +"<link rel=\"stylesheet\" href=\"//s3.bytecdn/ies/resource/falcon/douyin_falcon/component/banner/index_3941ffc.css\">\n" +"<link rel=\"stylesheet\" href=\"//s3a.bytecdn/ies/resource/falcon/douyin_falcon/component/common/openBrowser/index_2c31596.css\">\n" +"<link rel=\"stylesheet\" href=\"//s3a.bytecdn/ies/resource/falcon/douyin_falcon/page/reflow_user/index_ecb0bc9.css\">\n" +"<link 
rel=\"stylesheet\" href=\"//s3a.bytecdn/ies/resource/falcon/douyin_falcon/pkg/video_93fd288.css\"></head>", i = {}, o = {}, a = {}, u = {}, c = {}, s = {},l = function (t, n) {if (!(t in u)) {u[t] = !0;var i = document.createElement("script");if (n) {var o = setTimeout(n, e.timeout);i.onerror = function () {clearTimeout(o),n()};var a = function () {clearTimeout(o)};"onload" in i ? i.onload = a : i.onreadystatechange = function () {("loaded" === this.readyState || "complete" === this.readyState) && a()}}return i.type = "text/javascript",i.src = t,r.appendChild(i),i}}, f = function (t, e, n) {var r = i[t] || (i[t] = []);r.push(e);var o, a = c[t] || c[t + ".js"] || {}, u = a.pkg;o = u ? s[u].url || s[u].uri : a.url || a.uri || t,l(o, n && function () {n(t)})};n = function (t, e) {"function" != typeof e && (e = arguments[2]),t = t.replace(/\.js$/i, ""),o[t] = e;var n = i[t];if (n) {for (var r = 0, a = n.length; a > r; r++)n[r]();delete i[t]}},e = function (t) {if (t && t.splice)return e.async.apply(this, arguments);t = e.alias(t);var n = a[t];if (n)return n.exports;var r = o[t];if (!r)throw "[ModJS] Cannot find module `" + t + "`";n = a[t] = {exports: {}};var i = "function" == typeof r ? r.apply(n, [e, n.exports, n]) : r;return i && (n.exports = i),n.exports && !n.exports["default"] && Object.defineProperty && Object.isExtensible(n.exports) && Object.defineProperty(n.exports, "default", {value: n.exports}),n.exports},e.async = function (n, r, i) {function a(t) {for (var n, r = 0, h = t.length; h > r; r++) {var p = e.alias(t[r]);p in o ? 
(n = c[p] || c[p + ".js"],n && "deps" in n && a(n.deps)) : p in s || (s[p] = !0,l++,f(p, u, i),n = c[p] || c[p + ".js"],n && "deps" in n && a(n.deps))}}function u() {if (0 === l--) {for (var i = [], o = 0, a = n.length; a > o; o++)i[o] = e(n[o]);r && r.apply(t, i)}}"string" == typeof n && (n = [n]);var s = {}, l = 0;a(n),u()},e.resourceMap = function (t) {var e, n;n = t.res;for (e in n)n.hasOwnProperty(e) && (c[e] = n[e]);n = t.pkg;for (e in n)n.hasOwnProperty(e) && (s[e] = n[e])},e.loadJs = function (t) {l(t)},e.loadCss = function (t) {if (t.content) {var e = document.createElement("style");e.type = "text/css",e.styleSheet ? e.styleSheet.cssText = t.content : e.innerHTML = t.content,r.appendChild(e)} else if (t.url) {var n = document.createElement("link");n.href = t.url,n.rel = "stylesheet",n.type = "text/css",r.appendChild(n)}},e.alias = function (t) {return t.replace(/\.js$/i, "")},e.timeout = 5e3,t.__M.define = n,t.__M.require = e} }(this);this.__M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime", function (l, e) {Function(function (l) {return 'e(e,a,r){(b[e]||(b[e]=t("x,y","x "+e+" y")(r,a)}a(e,a,r){(k[r]||(k[r]=t("x,y","new x[y]("+Array(r+1).join(",x[y]")(1)+")")(e,a)}r(e,a,r){n,t,s={},b=s.d=r?r.d+1:0;for(s["$"+b]=s,t=0;t<b;t)s[n="$"+t]=r[n];for(t=0,b=s=a;t<b;t)s[t]=a[t];c(e,0,s)}c(t,b,k){u(e){v[x]=e}f{g=,ting(bg)}l{try{y=c(t,b,k)}catch(e){h=e,y=l}}for(h,y,d,g,v=[],x=0;;)switch(g=){case 1:u(!)4:f5:u((e){a=0,r=e;{c=a<r;c&&u(e[a]),c}}(6:y=,u((y8:if(g=,lg,g=,y===c)b+=g;else if(y!==l)y9:c10:u(s(11:y=,u(+y)12:for(y=f,d=[],g=0;g<y;g)d[g]=y.charCodeAt(g)^g+y;u(String.fromCharCode.apply(null,d13:y=,h=delete [y]14:59:u((g=)?(y=x,v.slice(x-=g,y:[])61:u([])62:g=,k[0]=65599*k[0]+k[1].charCodeAt(g)>>>065:h=,y=,[y]=h66:u(e(t[b],,67:y=,d=,u((g=).x===c?r(g.y,y,k):g.apply(d,y68:u(e((g=t[b])<"<"?(b--,f):g+g,,70:u(!1)71:n72:+f73:u(parseInt(f,3675:if(){bcase 
74:g=<<16>>16g76:u(k[])77:y=,u([y])78:g=,u(a(v,x-=g+1,g79:g=,u(k["$"+g])81:h=,[f]=h82:u([f])83:h=,k[]=h84:!085:void 086:u(v[x-1])88:h=,y=,h,y89:u({e{r(e.y,arguments,k)}e.y=f,e.x=c,e})90:null91:h93:h=0:;default:u((g<<16>>16)-16)}}n=this,t=n.Function,s=Object.keys||(e){a={},r=0;for(c in e)a[r]=c;a=r,a},b={},k={};r'.replace(/[-]/g, function (e) {return l[15 & e.charCodeAt(0)]})}("v[x++]=v[--x]t.charCodeAt(b++)-32function return ))++.substrvar .length(),b+=;break;case ;break}".split("")))()('gr$Daten Иb/s!l y͒yĹg,(lfi~ah`{mv,-n|jqewVxp{rvmmx,&effkx[!cs"l".Pq%widthl"@q&heightl"vr*getContextx$"2d[!cs#l#,*;?|u.|uc{uq$fontl#vr(fillTextx$$龘ฑภ경2<[#c}l#2q*shadowBlurl#1q-shadowOffsetXl#$$limeq+shadowColorl#vr#arcx88802[%c}l#vr&strokex[ c}l"v,)}eOmyoZB]mx[ cs!0s$l$Pb<k7l l!r&lengthb%^l$1+s$jl s#i$1ek1s$gr#tack4)zgr#tac$! +0o![#cj?o ]!l$b%s"o ]!l"l$b*b^0d#>>>s!0s%yA0s"l"l!r&lengthb<k+l"^l"1+s"jl s&l&z0l!$ +["cs\'(0l#i\'1ps9wxb&s() &{s)/s(gr&Stringr,fromCharCodes)0s*yWl ._b&s o!])l l Jb<k$.aj;l .Tb<k$.gj/l .^b<k&i"-4j!+& s+yPo!]+s!l!l Hd>&l!l Bd>&+l!l <d>&+l!l 6d>&+l!l &+ s,y=o!o!]/q"13o!l q"10o!],l 2d>& s.{s-yMo!o!]0q"13o!]*Ld<l 4d#>>>b|s!o!l q"10o!],l!& s/yIo!o!].q"13o!],o!]*Jd<l 6d#>>>b|&o!]+l &+ s0l-l!&l-l!i\'1z141z4b/@d<l"b|&+l-l(l!b^&+l-l&zl\'g,)gk}ejo{cm,)|yn~Lij~em["cl$b%@d<l&zl\'l $ +["cl$b%b|&+l-l%8d<@b|l!b^&+ q$sign ', [Object.defineProperty(e, "__esModule", {value: !0})]) });_bytedAcrawler = this.__M.require("douyin_falcon:node_modules/byted-acrawler/dist/runtime");signature = _bytedAcrawler.sign(process.argv[2])console.log(signature);

 

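Note: to pass an argument when running JS with node on the command line, read it from process.argv.

  process.argv is an array holding the command-line arguments. The first element is the path to the node executable and the second is the path to the JS file; the arguments we pass come after those.

So process.argv[2] is the argument we passed.

The dynamic value can now be produced from the terminal; the next step is to run the command from Python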

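import os
import subprocess

user_id = '58841646784'

# Option 1: os.popen (raw string so the backslashes in the path are kept as-is)
signature = os.popen(r'node D:\Python\pachong\signed.js %s' % user_id)
print(signature.read())

# Option 2: subprocess.getoutput
signat = subprocess.getoutput(r'node D:\Python\pachong\signed.js %s' % user_id)
print(signat)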
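A somewhat more defensive variant of the two options above is sketched here with subprocess.run. Passing the command as a list avoids shell quoting problems, and check=True turns a failing node process into an exception instead of returning its error text as if it were the signature. The function name run_sign_script, the default script name, and the timeout are assumptions for illustration.

```python
import subprocess


def run_sign_script(user_id, script='signed.js', runtime='node', timeout=10):
    """Run `runtime script user_id` and return its stripped stdout."""
    result = subprocess.run(
        [runtime, script, str(user_id)],  # list form: no shell involved
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

With node installed and the JS file in the working directory, run_sign_script('58841646784') would return whatever the script prints with console.log.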

 Sending the request:

import requests
import subprocess

user_id = '58841646784'
signature = subprocess.getoutput('node signed.js %s' % user_id)
user_video_list = []

# ########## Fetch the user's videos ##########
user_video_params = {
    'user_id': str(user_id),
    'count': '21',
    'max_cursor': '0',
    'aid': '1128',
    '_signature': signature,
    'dytk': 'b4dceed99803a04a1c4395ffc81f3dbc',  # '114f1984d1917343ccfb14d94e7ce5f5'
}
res = requests.get(
    url="/",
    params=user_video_params,
    headers={
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
        'x-requested-with': 'XMLHttpRequest',
        'referer': '',
    },
)
print(res.text)

The result: {"status_code": 0, "has_more": true, "aweme_list": []}

  has_more being true means the request was accepted and more data is available, so the request has to be repeated several times.
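The repeat-until-done logic can be sketched as a loop over has_more and max_cursor, the field names shown in the response above. The collect_all function, the fetch_page callback, and the fake responses are assumptions; in real use fetch_page would send the request with the given max_cursor and return res.json().

```python
def collect_all(fetch_page, max_pages=50):
    """Page through results using has_more / max_cursor.

    fetch_page(max_cursor) must return the decoded JSON dict of one response.
    """
    items, cursor = [], 0
    for _ in range(max_pages):
        page = fetch_page(cursor)
        items.extend(page.get('aweme_list', []))
        if not page.get('has_more'):
            break
        # Keep the old cursor if the response does not supply a new one.
        cursor = page.get('max_cursor', cursor)
    return items


if __name__ == '__main__':
    # Fake responses: the first one is empty although has_more is true.
    fake = {
        0: {'has_more': True, 'max_cursor': 5, 'aweme_list': []},
        5: {'has_more': True, 'max_cursor': 9, 'aweme_list': ['v1', 'v2']},
        9: {'has_more': False, 'max_cursor': 9, 'aweme_list': ['v3']},
    }
    print(collect_all(fake.__getitem__))
```

The max_pages cap is a design choice to keep the loop from running forever if the server keeps answering with has_more set to true.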

 

Scraping code example:

import subprocess

import requests

USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
user_id = '58841646784'  # 6556303280

# Get all of the user's videos.
# The signature comes from _bytedAcrawler.sign('<user ID>'), defined in
# douyin_falcon:node_modules/byted-acrawler/dist/runtime
signature = subprocess.getoutput('node s1.js %s' % user_id)

# ########## Fetch the user's own videos ##########
user_video_list = []
user_video_params = {
    'user_id': str(user_id),
    'count': '21',
    'max_cursor': '0',
    'aid': '1128',
    '_signature': signature,
    'dytk': 'b4dceed99803a04a1c4395ffc81f3dbc',  # '114f1984d1917343ccfb14d94e7ce5f5'
}


def get_aweme_list(max_cursor=None):
    if max_cursor:
        user_video_params['max_cursor'] = str(max_cursor)
    res = requests.get(
        url="/",
        params=user_video_params,
        headers={
            'user-agent': USER_AGENT,
            'x-requested-with': 'XMLHttpRequest',
            'referer': '',
        },
    )
    content_json = res.json()
    aweme_list = content_json.get('aweme_list', [])
    user_video_list.extend(aweme_list)
    # Keep requesting while the server reports more data
    if content_json.get('has_more') == 1:
        return get_aweme_list(content_json.get('max_cursor'))


get_aweme_list()

# ########## Fetch the videos the user liked ##########
favor_video_list = []
favor_video_params = {
    'user_id': str(user_id),
    'count': '21',
    'max_cursor': '0',
    'aid': '1128',
    '_signature': signature,
    'dytk': 'b4dceed99803a04a1c4395ffc81f3dbc',
}


def get_favor_list(max_cursor=None):
    if max_cursor:
        favor_video_params['max_cursor'] = str(max_cursor)
    res = requests.get(
        url="/",
        params=favor_video_params,
        headers={
            'user-agent': USER_AGENT,
            'x-requested-with': 'XMLHttpRequest',
            'referer': '',
        },
    )
    content_json = res.json()
    aweme_list = content_json.get('aweme_list', [])
    favor_video_list.extend(aweme_list)
    if content_json.get('has_more') == 1:
        return get_favor_list(content_json.get('max_cursor'))


get_favor_list()


# ########## Download the videos ##########
def download_videos(video_list):
    for item in video_list:
        video_id = item['video']['play_addr']['uri']
        video = requests.get(
            url='/',
            params={'video_id': video_id},
            headers={
                'user-agent': USER_AGENT,
                'x-requested-with': 'XMLHttpRequest',
                'referer': '',
            },
            stream=True,
        )
        file_name = video_id + '.mp4'
        with open(file_name, 'wb') as f:
            # Write in 1 KB chunks instead of the default single byte
            for chunk in video.iter_content(chunk_size=1024):
                f.write(chunk)


download_videos(user_video_list)
download_videos(favor_video_list)

Reposted from: .html


Published: 2024-02-11 15:14:40
Link: https://www.elefans.com/category/jswz/34/1681667.html