Python 3.7
Python 3 crawler: Douban movie titles
#!/usr/bin/python
# coding: utf-8
import sys
import requests
from bs4 import BeautifulSoup  # import BeautifulSoup from the bs4 library

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36',
    'Host': 'movie.douban'
}
link = ''  # URL of the Douban list page to fetch (left blank here)
r = requests.get(link, headers=headers, timeout=20)
# "页响应状态码" means "page response status code"
print(str(1), "页响应状态码:", r.status_code)
# print(r.text)
print(type(r.status_code))
# Compare integers with !=, not "is not", which tests object identity
if r.status_code != 200:
    sys.exit(0)
soup = BeautifulSoup(r.text, "lxml")
# Each movie title sits inside a <div class="hd">
div_list = soup.find_all('div', class_='hd')
print('##################\n')
print(div_list[0])
each = div_list[0]
print('+++++++++++++++\n')
movie = each.a.span  # first <span> inside the link: the Chinese title
print(movie)
movie_text = movie.text.strip()
print(movie_text)
print('##################\n')
print(div_list[1])
each = div_list[1]
print('+++++++++++++++\n')
movie = each.a.span
print(movie)
movie_text = movie.text.strip()
print(movie_text)
log:
C:\ProgramData\Anaconda3\python.exe E:/python/work/ana_test/test.py
1 页响应状态码: 200
<class 'int'>
##################

<div class="hd">
<a class="" href="/">
<span class="title">肖申克的救赎</span>
<span class="title"> / The Shawshank Redemption</span>
<span class="other"> / 月黑高飞(港) / 刺激1995(台)</span>
</a>
<span class="playable">[可播放]</span>
</div>
+++++++++++++++

<span class="title">肖申克的救赎</span>
肖申克的救赎
##################

<div class="hd">
<a class="" href="/">
<span class="title">霸王别姬</span>
<span class="other"> / 再见,我的妾 / Farewell My Concubine</span>
</a>
<span class="playable">[可播放]</span>
</div>
+++++++++++++++

<span class="title">霸王别姬</span>
霸王别姬

Process finished with exit code 0
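The script above indexes div_list[0] and div_list[1] by hand. A loop over every div.hd block handles any number of movies, and it can also pick up the English and Hong Kong/Taiwan titles from the remaining spans. The following is only a minimal sketch: SAMPLE_HTML is copied from the log above, extract_titles and split_names are hypothetical helper names, and the built-in "html.parser" is used instead of "lxml" purely so the example needs no extra parser installed.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the log output above (two movies).
SAMPLE_HTML = """
<div class="hd">
<a class="" href="/">
<span class="title">肖申克的救赎</span>
<span class="title"> / The Shawshank Redemption</span>
<span class="other"> / 月黑高飞(港) / 刺激1995(台)</span>
</a>
<span class="playable">[可播放]</span>
</div>
<div class="hd">
<a class="" href="/">
<span class="title">霸王别姬</span>
<span class="other"> / 再见,我的妾 / Farewell My Concubine</span>
</a>
<span class="playable">[可播放]</span>
</div>
"""


def split_names(text):
    """Split ' / A / B' into ['A', 'B'], dropping empty pieces."""
    return [part.strip() for part in text.split("/") if part.strip()]


def extract_titles(html):
    """Return one dict per <div class="hd"> block (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for hd in soup.find_all("div", class_="hd"):
        title_spans = hd.a.find_all("span", class_="title")
        other_span = hd.a.find("span", class_="other")
        movies.append({
            # The first span.title is the Chinese title.
            "title": title_spans[0].text.strip(),
            # Any further span.title holds the original or English title.
            "alt_titles": [n for s in title_spans[1:] for n in split_names(s.text)],
            # span.other holds the Hong Kong/Taiwan and other names.
            "other_titles": split_names(other_span.text) if other_span else [],
        })
    return movies


movies = extract_titles(SAMPLE_HTML)
for m in movies:
    print(m["title"], m["alt_titles"], m["other_titles"])
```

Splitting on "/" is an assumption that fits the two samples here; a title that itself contains a slash would be split in the wrong place.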
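To inspect the page, take the content printed by print(r.text)
and save it with Notepad++ to a file named 123.html.
Set the Notepad++ encoding to UTF-8.
When the file is opened in Notepad++, it must show readable Chinese rather than garbled characters.
After saving, double-click 123.html and Internet Explorer opens and renders the saved page.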
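The same file can also be written directly from Python, which avoids the copy-and-paste step and any encoding mismatch in the editor. This is a minimal sketch, not part of the original script: save_html is a hypothetical helper, the sample string stands in for r.text, and the file is written to the system temp directory only to keep the example self-contained.

```python
import os
import tempfile


def save_html(text, path):
    """Write text to path as UTF-8 so Chinese characters are preserved."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Stand-in for r.text; in the real script, pass r.text instead.
sample = '<html><head><meta charset="utf-8"></head><body>肖申克的救赎</body></html>'
path = os.path.join(tempfile.gettempdir(), "123.html")
save_html(sample, path)

# Read the file back with the same encoding to confirm nothing was garbled.
with open(path, encoding="utf-8") as f:
    restored = f.read()
print(restored == sample)
```

Passing encoding="utf-8" explicitly matters on Windows, where open() may otherwise fall back to a locale code page.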
Extracting the English title, Hong Kong/Taiwan titles, director, lead actors, release year, genres, and rating
The code is as follows:
#!/usr/bin/python
# coding: utf-8
import sys
import requests
from bs4 import BeautifulSoup  # import BeautifulSoup from the bs4 library

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36',
    'Host': 'movie.douban'
}
link = ''  # URL of the Douban list page to fetch (left blank here)
r = requests.get(link, headers=headers, timeout=20)
# "页响应状态码" means "page response status code"
print(str(1), "页响应状态码:", r.status_code)
# print(r.text)
print(type(r.status_code))
# Compare integers with !=, not "is not", which tests object identity
if r.status_code != 200:
    sys.exit(0)
soup = BeautifulSoup(r.text, "lxml")
# Each <div class="info"> holds the title, details, rating, and quote of one movie
div_info_list = soup.find_all('div', class_='info')
print(div_info_list[0])
each = div_info_list[0]
title = each.find('div', class_='hd').a.span.text.strip()
print(title)
info = each.find('div', class_='bd').p.text.strip()
print('-------------- 1 ---------------- \n')
print(info)
# Replace newlines and non-breaking spaces with ordinary spaces
info = info.replace("\n", " ").replace("\xa0", " ")
print('-------------- 2 ---------------- \n')
print(info)
# Collapse runs of whitespace into a single space
info = ' '.join(info.split())
print('-------------- 3 ---------------- \n')
print(info)
rating = each.find('span', class_='rating_num').text.strip()
# contents[7] is the fourth <span> in div.star (the odd indexes are the tags, the even ones whitespace)
num_rating = each.find('div', class_='star').contents[7].text.strip()
try:
    quote = each.find('span', class_='inq').text.strip()
except AttributeError:  # find() returned None: this movie has no quote
    quote = ""
print('-------------- 4 ---------------- \n')
print('title =', title)
print('info = ', info)
print('rating = ', rating)
print('num_rating = ', num_rating)
print('quote = ', quote)
Output:
C:\ProgramData\Anaconda3\python.exe E:/python/work/ana_test/test.py
1 页响应状态码: 200
<class 'int'>
<div class="info">
<div class="hd">
<a class="" href="/">
<span class="title">肖申克的救赎</span>
<span class="title"> / The Shawshank Redemption</span>
<span class="other"> / 月黑高飞(港) / 刺激1995(台)</span>
</a>
<span class="playable">[可播放]</span>
</div>
<div class="bd">
<p class="">导演: 弗兰克·德拉邦特 Frank Darabont 主演: 蒂姆·罗宾斯 Tim Robbins /...<br/>1994 / 美国 / 犯罪 剧情</p>
<div class="star">
<span class="rating5-t"></span>
<span class="rating_num" property="v:average">9.7</span>
<span content="10.0" property="v:best"></span>
<span>1559199人评价</span>
</div>
<p class="quote">
<span class="inq">希望让人自由。</span>
</p>
</div>
</div>
肖申克的救赎
-------------- 1 ----------------

导演: 弗兰克·德拉邦特 Frank Darabont 主演: 蒂姆·罗宾斯 Tim Robbins /...1994 / 美国 / 犯罪 剧情
-------------- 2 ----------------

导演: 弗兰克·德拉邦特 Frank Darabont 主演: 蒂姆·罗宾斯 Tim Robbins /... 1994 / 美国 / 犯罪 剧情
-------------- 3 ----------------

导演: 弗兰克·德拉邦特 Frank Darabont 主演: 蒂姆·罗宾斯 Tim Robbins /... 1994 / 美国 / 犯罪 剧情
-------------- 4 ----------------

title = 肖申克的救赎
info = 导演: 弗兰克·德拉邦特 Frank Darabont 主演: 蒂姆·罗宾斯 Tim Robbins /... 1994 / 美国 / 犯罪 剧情
rating = 9.7
num_rating = 1559199人评价
quote = 希望让人自由。

Process finished with exit code 0
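The script stops at a single normalized info string, so the director, lead actors, year, country, and genres are still packed into one line. One possible way to split them is a regular expression keyed on the 导演: (director) and 主演: (lead actors) labels. This is only a sketch under assumptions: parse_info, parse_vote_count, and INFO_PATTERN are hypothetical names, and the pattern assumes the exact layout seen in the output above (both labels present, a four-digit year, then country and genres separated by " / "). Entries that deviate from that layout simply return None.

```python
import re

# Layout assumed from the sample output:
# "导演: <director> 主演: <actors> <year> / <country> / <genres>"
INFO_PATTERN = re.compile(
    r"导演:\s*(?P<director>.*?)\s*"
    r"主演:\s*(?P<actors>.*?)\s*"
    r"(?P<year>\d{4})\s*/\s*"
    r"(?P<country>[^/]+?)\s*/\s*"
    r"(?P<genres>.+)$"
)


def parse_info(info):
    """Split the normalized info string into fields; None if the layout differs."""
    match = INFO_PATTERN.search(info)
    if match is None:
        return None
    return {
        "director": match.group("director"),
        "actors": match.group("actors"),  # may end in "/..." because the page truncates it
        "year": int(match.group("year")),
        "country": match.group("country"),
        "genres": match.group("genres").split(),
    }


def parse_vote_count(num_rating):
    """Pull the leading integer out of a string such as '1559199人评价'."""
    digits = re.search(r"\d+", num_rating)
    return int(digits.group()) if digits else None


info = "导演: 弗兰克·德拉邦特 Frank Darabont 主演: 蒂姆·罗宾斯 Tim Robbins /... 1994 / 美国 / 犯罪 剧情"
parsed = parse_info(info)
votes = parse_vote_count("1559199人评价")
rating = float("9.7")
print(parsed)
print(rating, votes)
```

Converting rating and the vote count to numbers at this point makes later sorting or filtering easier than keeping them as strings.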