Scraping Runoob (菜鸟教程 | 菜鸟笔记): as a scraping hobbyist I would rather not copy by hand, but I needed the data, so I wrote a spider.

Updated: 2024-06-13 00:19:45

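I want the table data from the page linked below.

The code runs as-is once the required packages are installed (requests, pandas, beautifulsoup4, lxml, plus openpyxl for writing .xlsx files). It writes the scraped data to an Excel spreadsheet.
URL scraped: https://www.runoob.com/python/python-exceptions.html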

import bs4
import pandas as pd
import requests

# Browser-like headers sent with the request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.9',
}
url1 = 'https://www.runoob.com/python/python-exceptions.html'

# Fetch the page and stop early on an HTTP error status.
response = requests.get(url=url1, headers=headers, timeout=10)
response.raise_for_status()
a = response.content.decode('utf-8')
# print(a)

html = bs4.BeautifulSoup(a, 'lxml')
# Every <td> cell of the first table on the page, in reading order.
s = html.table.find_all('td')

lists = []
for i in s:
    # get_text() also works when a cell contains nested tags.
    lists.append(i.get_text().replace('\r\n', '').strip())

# Cells alternate: even positions hold the names, odd positions the descriptions.
a1 = lists[0::2]
a2 = lists[1::2]

# Names become the row index, descriptions the single data column.
aa2 = pd.DataFrame(data=a2, index=a1)

# Writing .xlsx needs an Excel engine such as openpyxl.
aa2.to_excel(excel_writer=r'2.xlsx')
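
The cell-cleaning and even/odd pairing steps above can be tried offline on a small inline table. The following is only a sketch under stated assumptions: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs without a network connection or third-party packages, the `CellCollector` and `pair_cells` names are hypothetical helpers, and the sample markup is made up rather than copied from the Runoob page. It also collapses all whitespace runs to a single space, which is a slightly stronger cleanup than removing `\r\n`.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []      # finished cell texts
        self._parts = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>.
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._parts is not None:
            # Collapse line breaks and repeated whitespace into single spaces.
            self.cells.append(' '.join(''.join(self._parts).split()))
            self._parts = None


def pair_cells(cells):
    """Pair even-indexed cells (names) with odd-indexed cells (descriptions)."""
    return list(zip(cells[0::2], cells[1::2]))


# Made-up sample markup shaped like a two-column table.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Description</th></tr>
  <tr><td>BaseException</td><td>Base class of
      all exceptions</td></tr>
  <tr><td><b>KeyError</b></td><td>Key not found in a mapping</td></tr>
</table>
"""

collector = CellCollector()
collector.feed(SAMPLE)
rows = pair_cells(collector.cells)
print(rows)
```

Because `zip` stops at the shorter input, a trailing cell without a partner is dropped instead of shifting every later pair by one position.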

Published: 2023-03-28 19:08:00
Article link: https://www.elefans.com/category/jswz/c2f28d10cbad5d97614a1d99091ab462.html