Here is my current code. I am not sure what I am doing wrong. Maybe I am not digging deep enough into the HTML and giving BeautifulSoup the right tags? At the moment, my code returns blanks.
    from bs4 import BeautifulSoup
    from urllib.request import urlopen

    html = urlopen("https://www.youtube.com/watch?v=5_zrHZdhaBU")
    soup = BeautifulSoup(html, 'html.parser')
    nameList = soup.findAll("div", {"id": "cp-2"})
    for name in nameList:
        print(name.get_text())

Here is the markup that I inspected. I'm trying to get Python to return "but it was untucked."
    <div id="cp-2" class="caption-line" data-time="7.54">
        <div class="caption-line-time">0:07</div>
        <div class="caption-line-text">but it was untucked.</div>
    </div>

Edit: The markup can be found by clicking "More" next to the Share button. Then click Transcript and you will see all the text there.
Best answer
Oh yes, it's loaded via Ajax: open the page, open the Network tab, sort the requests by start time (latest requests first), and click the CC button on YouTube.
You get an api/timedtext request, and the response is XML. Here is the full URL of the transcript:
https://www.youtube.com/api/timedtext?signature=1A03D323CBD455E9993B7AC447CA64764FA6FE75.59F4BD2D45A32E89FBF54B418EE2F763283A1007&asr_langs=fr%2Cja%2Cnl%2Ces%2Cru%2Cko%2Cit%2Cde%2Cpt%2Cen&key=yttt1&caps=asr&v=5_zrHZdhaBU&hl=en_US&expire=1480702409&sparams=asr_langs%2Ccaps%2Cv%2Cexpire&lang=en&fmt=srv3
I have no idea how this URL is generated, though. Working that out would require investigating YouTube's complex scripts.
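Since the response is XML rather than HTML, it can be read with the standard library instead of BeautifulSoup. The following is only a minimal sketch: the `<transcript>`/`<text start=... dur=...>` layout, the `dur` value, and the names `SAMPLE_XML` and `parse_captions` are assumptions made for illustration, and the real payload (especially with `fmt=srv3`) may be structured differently.

```python
import html
import xml.etree.ElementTree as ET

# Assumed shape of a timedtext response; the element and attribute names
# here are illustrative and should be checked against a real response.
SAMPLE_XML = (
    '<transcript>'
    '<text start="7.54" dur="1.5">but it was untucked.</text>'
    '</transcript>'
)


def parse_captions(xml_text):
    """Return a list of (start_seconds, caption_text) tuples."""
    root = ET.fromstring(xml_text)
    captions = []
    for node in root.iter("text"):
        start = float(node.get("start", "0"))
        # Caption text may still contain HTML entities, so unescape it.
        text = html.unescape(node.text or "")
        captions.append((start, text))
    return captions
```

Calling `parse_captions(SAMPLE_XML)` yields one tuple per caption line, which can then be filtered by start time or joined into plain text.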
EDIT: This answer helped me. You can omit most of these parameters and just use this URL:
https://www.youtube.com/api/timedtext?&v=5_zrHZdhaBU&lang=en
Or, in general:
https://www.youtube.com/api/timedtext?&v={video_id}&lang={language_code}
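As a rough sketch, the general URL pattern above could be built and requested like this. The helper names `build_timedtext_url` and `fetch_transcript_xml` are hypothetical, and whether the endpoint still answers requests that omit the signature parameters is an assumption taken from this answer, not something guaranteed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

TIMEDTEXT_ENDPOINT = "https://www.youtube.com/api/timedtext"


def build_timedtext_url(video_id, language_code="en"):
    """Build the short timedtext URL from a video id and language code."""
    query = urlencode({"v": video_id, "lang": language_code})
    return f"{TIMEDTEXT_ENDPOINT}?{query}"


def fetch_transcript_xml(video_id, language_code="en", timeout=10):
    """Download the raw response bytes (assumes the endpoint responds)."""
    url = build_timedtext_url(video_id, language_code)
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, `build_timedtext_url("5_zrHZdhaBU")` produces the same URL as above, minus the stray `&` after the `?`. `fetch_transcript_xml("5_zrHZdhaBU")` would then return the bytes to parse, and an empty result probably means the endpoint needs the extra parameters.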