星图数据采集研究"/>
关于巨量星图数据采集研究
准备采集星图达人信息这块数据,抓个包看看
比较方便的直接抓到包
然后写成python脚本
import requestscookies = {
}headers = {'Host': 'www.xingtu','sec-ch-ua': '"Chromium";v="118", "Microsoft Edge";v="118", "Not=A?Brand";v="99"','sec-ch-ua-mobile': '?0','User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 Edg/118.0.2088.76','Accept': 'application/json, text/plain, */*','x-login-source': '1','Agw-Js-Conv': 'str','X-CSRFToken': 'W3HXlV9Agod6vTBUi0ojwG0FaC8jmhQh','sec-ch-ua-platform': '"Windows"','Sec-Fetch-Site': 'same-origin','Sec-Fetch-Mode': 'cors','Sec-Fetch-Dest': 'empty','Referer': '=7297066212915052581&video_type=2&_route_from=from_page%3DMarket%26search_session_id%3D7297066212915052581%26is_for_order%3D1%26platform_source%3D1%26order_by%3Dscore%26sort_type%3D2%26search_scene%3D1%26display_scene%3D1%26limit%3D20%26page%3D1%26regular_filter%3D%255Bobject%2BObject%255D%26rel_attribute_filter%3D%255Bobject%2BObject%255D%26task_category%3D1%26package_id%3D%26is_filter%3D1%26current_tab%3D1%26displayScene%3Dmarket%26is_limit_time_price%3D0&btm_ppre=a0.b0.c0.d0&btm_pre=a4738.b16016.c26503.d3332&btm_show_id=f937aa1a-407d-46af-a87d-a776068b0e9b&btm_pre_unit_params=%257B%2522platform_source%2522%253A1%252C%2522order_by%2522%253A%2522score%2522%252C%2522sort_type%2522%253A2%252C%2522search_scene%2522%253A1%252C%2522display_scene%2522%253A1%252C%2522limit%2522%253A20%252C%2522page%2522%253A1%252C%2522regular_filter%2522%253A%257B%2522current_tab%2522%253A1%252C%2522marketing_target%2522%253A1%252C%2522task_category%2522%253A1%257D%252C%2522rel_attribute_filter%2522%253A%257B%2522price_by_video_type__ge%2522%253A%257B%2522field_value%2522%253A%25220%2522%252C%2522rel_id%2522%253A%25222%2522%257D%257D%252C%2522task_category%2522%253A%25221%2522%252C%2522package_id%2522%253A%2522%2522%252C%2522is_filter%2522%253A%25221%2522%252C%2522current_tab%2522%253A1%252C%2522displayScene%2522%253A%2522market%2522%257D','Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
}params = {'platform_source': '1','platform_channel': '1','o_author_id': '7234394998563684413',
}response = requests.get('', params=params, cookies=cookies, headers=headers)
这样就很好的拿到数据了
相对来说比较简单
但是星图大量采集需要过很多风控,我这边已经解决大量采集风控的问题,下一期再写一篇过风控的方法
更多推荐
关于巨量星图数据采集研究
发布评论