Fetching ajax data with a crawler

Updated: 2024-10-06 20:27:17

Using a crawler to fetch Douban movie ranking data

Analysis

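  • The Douban movie ranking page loads its content via ajax. Simply requesting the URL “=剧情&type=11&interval_id=100:90&action=” returns only the skeleton HTML of the page, with none of the actual content we need.
  • Packet capture shows that the content data comes from the URL “=11&interval_id=100%3A90&action=&start=0&limit=20”, which returns data in JSON format.
    Fetching the ranking list therefore comes down to fetching that JSON data, and changing the interval_id, start, and limit parameters in the URL is enough to retrieve it.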
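
The query string described above can be assembled from its parameters instead of being edited by hand. The following is a minimal sketch, not the original author's code: `BASE_URL` is a hypothetical placeholder (the real host and path are not shown here), and the helper name `build_chart_url` is invented for illustration.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the endpoint found via packet capture.
BASE_URL = "https://example.com/chart"


def build_chart_url(interval_id="100:90", start=0, limit=20, type_id=11):
    """Build the request URL from the parameters named in the analysis."""
    params = {
        "type": type_id,
        "interval_id": interval_id,  # urlencode escapes ":" as %3A
        "action": "",
        "start": start,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


print(build_chart_url(limit=100))
```

Letting `urlencode` do the escaping avoids hand-writing sequences such as `%3A`.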

Modules used

  • urllib.request
  • json
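
As a rough illustration of what each module contributes, the sketch below builds a request object with a custom User-Agent and parses a JSON string. Nothing is sent over the network. The URL is a hypothetical placeholder, and the sample body is hand-written to mimic the shape of one response entry; the second element of `rating` is an assumption.

```python
import json
from urllib import request

# Hypothetical URL, used only to build the request object; nothing is sent.
req = request.Request(
    "https://example.com/data",
    headers={"User-Agent": "Mozilla/5.0"},
)

# Hand-written sample shaped like one entry of the response
# (the second element of "rating" is an assumption).
sample_body = '[{"rating": ["9.6", "50"], "rank": 1, "title": "肖申克的救赎"}]'
entries = json.loads(sample_body)

print(req.full_url)
print(entries[0]["title"], entries[0]["rating"][0])
```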

Code

  • With a small change to the URL (limit=100), the first 100 entries can be fetched

from urllib import request
import json


class DouBanMovieSpide:
    """Douban drama-film ranking list."""

    def __init__(self):
        # The URL is reproduced as given; its scheme, host, and path are missing.
        self.url = "=11&interval_id=100%3A90&action=&start=0&limit=100"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
        }

    def load_page(self):
        """Load the page and fetch the JSON data."""
        try:
            req = request.Request(self.url, headers=self.headers)
            # The with-block closes the connection once the body is read.
            with request.urlopen(req) as response:
                html = response.read().decode()
            # print(type(html))     # > <class 'str'>
            self.parse_page(html)
        except Exception as e:
            print("load_page error:{}".format(e))

    def parse_page(self, html):
        """Parse the response body, which really means extracting the JSON data."""
        try:
            text = json.loads(html)
            movie_list = []
            for t in text:
                # 'rating' is a list; its first element is the score.
                rating = t['rating'][0]
                rank = t['rank']
                title = t['title']
                movie_info = {
                    "rating": rating,
                    "rank": rank,
                    "title": title,
                }
                movie_list.append(movie_info)
            self.write_info(movie_list)
        except Exception as e:
            print("parse_page error:{}".format(e))

    def write_info(self, movie):
        """Store the extracted data in a JSON file."""
        with open("../text/doubanmovie.json", 'w', encoding="utf-8") as f:
            # ensure_ascii=False keeps the Chinese titles readable in the file.
            f.write(json.dumps(movie, ensure_ascii=False))
        print("write success")


if __name__ == "__main__":
    dbm = DouBanMovieSpide()
    dbm.load_page()
  • Data retrieved
[{"rating": "9.6", "rank": 1, "title": "肖申克的救赎"}, {"rating": "9.6", "rank": 2, "title": "霸王别姬"}, {"rating": "9.6", "rank": 3, "title": "控方证人"}, {"rating": "9.5", "rank": 4, "title": "美丽人生"}, {"rating": "9.5", "rank": 5, "title": "辛德勒的名单"}, {"rating": "9.4", "rank": 6, "title": "这个杀手不太冷"}, {"rating": "9.4", "rank": 7, "title": "阿甘正传"}, {"rating": "9.4", "rank": 8, "title": "十二怒汉"}, {"rating": "9.4", "rank": 9, "title": "泰坦尼克号 3D版"}, {"rating": "9.4", "rank": 10, "title": "背靠背,脸对脸"}, {"rating": "9.4", "rank": 11, "title": "灿烂人生"}, {"rating": "9.4", "rank": 12, "title": "茶馆"}, {"rating": "9.4", "rank": 13, "title": "十二怒汉"}, {"rating": "9.4", "rank": 14, "title": "控方证人"}, {"rating": "9.3", "rank": 15, "title": "盗梦空间"}, {"rating": "9.3", "rank": 16, "title": "泰坦尼克号"}, {"rating": "9.3", "rank": 17, "title": "千与千寻"}, {"rating": "9.3", "rank": 18, "title": "忠犬八公的故事"}, {"rating": "9.3", "rank": 19, "title": "放牛班的春天"}, {"rating": "9.3", "rank": 20, "title": "熔炉"}, {"rating": "9.3", "rank": 21, "title": "城市之光"}, {"rating": "9.3", "rank": 22, "title": "巴黎圣母院"}, {"rating": "9.2", "rank": 23, "title": "三傻大闹宝莱坞"}, {"rating": "9.2", "rank": 24, "title": "海上钢琴师"}, {"rating": "9.2", "rank": 25, "title": "星际穿越"}, {"rating": "9.2", "rank": 26, "title": "楚门的世界"}, {"rating": "9.2", "rank": 27, "title": "触不可及"}, {"rating": "9.2", "rank": 28, "title": "教父"}, {"rating": "9.2", "rank": 29, "title": "活着"}, {"rating": "9.2", "rank": 30, "title": "天堂电影院"}, {"rating": "9.2", "rank": 31, "title": "乱世佳人"}, {"rating": "9.2", "rank": 32, "title": "鬼子来了"}, {"rating": "9.2", "rank": 33, "title": "辩护人"}, {"rating": "9.2", "rank": 34, "title": "素媛"}, {"rating": "9.2", "rank": 35, "title": "小鞋子"}, {"rating": "9.2", "rank": 36, "title": "摩登时代"}, {"rating": "9.2", "rank": 37, "title": "七武士"}, {"rating": "9.2", "rank": 38, "title": "东京物语"}, {"rating": "9.2", "rank": 39, "title": "生活多美好"}, {"rating": "9.2", "rank": 40, "title": "超感猎杀:完结特别篇"}, {"rating": "9.2", "rank": 41, "title": "洞"}, {"rating": "9.2", 
"rank": 42, "title": "切腹"}, {"rating": "9.2", "rank": 43, "title": "哀乐中年"}, {"rating": "9.2", "rank": 44, "title": "狐妖小红娘剧场版:王权富贵"}, {"rating": "9.1", "rank": 45, "title": "摔跤吧!爸爸"}, {"rating": "9.1", "rank": 46, "title": "无间道"}, {"rating": "9.1", "rank": 47, "title": "蝙蝠侠:黑暗骑士"}, {"rating": "9.1", "rank": 48, "title": "指环王3:王者无敌"}, {"rating": "9.1", "rank": 49, "title": "飞越疯人院"}, {"rating": "9.1", "rank": 50, "title": "两杆大烟枪"}, {"rating": "9.1", "rank": 51, "title": "窃听风暴"}, {"rating": "9.1", "rank": 52, "title": "末代皇帝"}, {"rating": "9.1", "rank": 53, "title": "饮食男女"}, {"rating": "9.1", "rank": 54, "title": "钢琴家"}, {"rating": "9.1", "rank": 55, "title": "教父2"}, {"rating": "9.1", "rank": 56, "title": "美国往事"}, {"rating": "9.1", "rank": 57, "title": "狩猎"}, {"rating": "9.1", "rank": 58, "title": "无人知晓"}, {"rating": "9.1", "rank": 59, "title": "完美的世界"}, {"rating": "9.1", "rank": 60, "title": "忠犬八公物语"}, {"rating": "9.1", "rank": 61, "title": "海蒂和爷爷"}, {"rating": "9.1", "rank": 62, "title": "爱·回家"}, {"rating": "9.1", "rank": 63, "title": "芙蓉镇"}, {"rating": "9.1", "rank": 64, "title": "攻壳机动队2:无罪"}, {"rating": "9.1", "rank": 65, "title": "沉静如海"}, {"rating": "9.1", "rank": 66, "title": "地下"}, {"rating": "9.1", "rank": 67, "title": "熊的故事"}, {"rating": "9.1", "rank": 68, "title": "南海十三郎"}, {"rating": "9.1", "rank": 69, "title": "寻子遇仙记"}, {"rating": "9.1", "rank": 70, "title": "生之欲"}, {"rating": "9.1", "rank": 71, "title": "天堂回信"}, {"rating": "9.1", "rank": 72, "title": "鳄鱼波鞋走天涯"}, {"rating": "9.1", "rank": 73, "title": "剃头匠"}, {"rating": "9.1", "rank": 74, "title": "女人步上楼梯时"}, {"rating": "9.1", "rank": 75, "title": "丛林赤子心"}, {"rating": "9.1", "rank": 76, "title": "情迷意乱"}, {"rating": "9.1", "rank": 77, "title": "无言的山丘"}, {"rating": "9.1", "rank": 78, "title": "战争与和平"}, {"rating": "9.0", "rank": 79, "title": "我不是药神"}, {"rating": "9.0", "rank": 80, "title": "怦然心动"}, {"rating": "9.0", "rank": 81, "title": "少年派的奇幻漂流"}, {"rating": "9.0", "rank": 82, "title": "当幸福来敲门"}, {"rating": 
"9.0", "rank": 83, "title": "罗马假日"}, {"rating": "9.0", "rank": 84, "title": "搏击俱乐部"}, {"rating": "9.0", "rank": 85, "title": "闻香识女人"}, {"rating": "9.0", "rank": 86, "title": "指环王1:魔戒再现"}, {"rating": "9.0", "rank": 87, "title": "狮子王"}, {"rating": "9.0", "rank": 88, "title": "死亡诗社"}, {"rating": "9.0", "rank": 89, "title": "指环王2:双塔奇兵"}, {"rating": "9.0", "rank": 90, "title": "音乐之声"}, {"rating": "9.0", "rank": 91, "title": "穿条纹睡衣的男孩"}, {"rating": "9.0", "rank": 92, "title": "小森林 夏秋篇"}, {"rating": "9.0", "rank": 93, "title": "一一"}, {"rating": "9.0", "rank": 94, "title": "小森林 冬春篇"}, {"rating": "9.0", "rank": 95, "title": "我爱你"}, {"rating": "9.0", "rank": 96, "title": "大独裁者"}, {"rating": "9.0", "rank": 97, "title": "红鳉鱼"}, {"rating": "9.0", "rank": 98, "title": "莫娣"}, {"rating": "9.0", "rank": 99, "title": "从海底出击"}, {"rating": "9.0", "rank": 100, "title": "大路"}]
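
The run above requests all 100 entries in a single call by setting limit=100. An alternative is to walk through the list in smaller pages by varying start. The sketch below only computes the (start, limit) pairs; the helper name `page_params` is invented for illustration, and whether the server accepts a given page size is an assumption to verify.

```python
def page_params(total=100, page_size=20):
    """Yield (start, limit) pairs that together cover `total` entries."""
    for start in range(0, total, page_size):
        # The last page may be shorter than page_size.
        yield start, min(page_size, total - start)


pages = list(page_params())
print(pages)
```

Each pair can then be substituted into the start and limit parameters of the request URL.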

Summary

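  • Fetching ajax data with a crawler essentially means fetching the JSON that the ajax request returns (though not every ajax response is JSON).
  • When crawling, pay attention to where the data actually comes from; the content of the page itself sometimes matters little.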
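
Since an ajax response is not always JSON, a crawler can try to parse the body and fall back to the raw text when that fails. This is a minimal sketch of that idea; the helper name `parse_ajax_body` is invented for illustration.

```python
import json


def parse_ajax_body(body):
    """Return ("json", data) if body parses as JSON, else ("text", body)."""
    try:
        return "json", json.loads(body)
    except json.JSONDecodeError:
        # Not JSON (for example an HTML fragment); hand back the raw text.
        return "text", body


print(parse_ajax_body('[{"rank": 1}]'))
print(parse_ajax_body("<div>fragment</div>"))
```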

Published: 2024-02-14 05:14:58
Article link: https://www.elefans.com/category/jswz/34/1762440.html