Getting the URL with Beautiful Soup

Updated: 2024-10-15 10:14:13

Question

I have a URL that redirects (i.e. HTTP 302) to the actual website, which I then parse. However, I want to get the actual URL of the website (i.e. the true URL). Is there a way of doing this in BeautifulSoup?

That is, www.bananas redirects to www.realfruit. It's the www.realfruit URL that I want to obtain as a string.

Recommended answer

The URL of an HTML page is HTTP metadata; it has nothing to do with the HTML source. BeautifulSoup is handed the HTML source (in the form of a file object or a string), not the HTTP context. It doesn't know anything about where the source came from.

At best, if you are lucky, the HTML source includes a canonical URL <link> tag, which is the URL a search engine should use when directing people to the same page again. But that's not necessarily the actual URL used to load the page before handing it to BeautifulSoup!
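
As a rough illustration of the canonical-link fallback, the sketch below looks for a <link rel="canonical"> tag in HTML that has already been downloaded. The helper name find_canonical_url is hypothetical (it is not part of BeautifulSoup), and the result is only a hint from the page author, not the URL the page was actually loaded from.

```python
from typing import Optional

from bs4 import BeautifulSoup


def find_canonical_url(html: str) -> Optional[str]:
    """Return the href of <link rel="canonical">, or None if absent.

    Hypothetical helper for illustration; this value is whatever the
    page author declared, not necessarily the URL that was fetched.
    """
    soup = BeautifulSoup(html, "html.parser")
    # rel is a multi-valued attribute; bs4 matches if any value is "canonical".
    link = soup.find("link", rel="canonical")
    if link is None:
        return None
    return link.get("href")
```

If the function returns None, the page simply does not declare a canonical URL, and the HTTP client is the only reliable source for the final address.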

If you are using requests to load your pages, then simply ask it for the URL. response.url tells you which URL the response was loaded from. You can access the redirection history with response.history, which contains any 30x responses that led to the final response.
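
A minimal sketch of the requests approach, assuming the page is fetched with a plain GET and that redirects are followed (the default for requests.get). The function name resolve_final_url and the 10-second timeout are illustrative choices, not anything prescribed by the answer.

```python
import requests


def resolve_final_url(url, timeout=10.0):
    """Fetch url and return (final_url, hops).

    hops is a list of (status_code, url) pairs, one per redirect
    response that was followed on the way to the final response.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    # response.history holds the intermediate 30x responses, oldest first.
    hops = [(r.status_code, r.url) for r in response.history]
    # response.url is the URL the final response was loaded from.
    return response.url, hops
```

For the example in the question, calling this with the www.bananas address would be expected to return the www.realfruit address as the first element, with the 302 hop listed in the second. The same response object's text can then be handed to BeautifulSoup for parsing.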

urllib2 responses have a .geturl() method that returns the final URL used; the same goes for Python 3's urllib.request.urlopen() responses.
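
The standard-library route can be sketched the same way for Python 3. The helper name final_url_with_urllib and the timeout value are assumptions made for this example; urlopen() follows redirects by default, so geturl() reports the address that was finally retrieved.

```python
from urllib.request import urlopen


def final_url_with_urllib(url, timeout=10.0):
    """Open url with urllib and return the final URL after redirects."""
    # urlopen() follows HTTP redirects automatically; geturl() returns
    # the URL of the resource that was actually retrieved.
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()
```

Unlike requests, this does not expose the list of intermediate redirects, only the end result.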

Published: 2023-11-28 16:02:29
Article link: https://www.elefans.com/category/jswz/34/1643059.html