I am looking to unshorten (resolve) a URL in Python when the final URL is https. I have seen the question "How can I un-shorten a URL using python?" (and other similar ones). However, as noted in the comment on the accepted answer, that solution only works when the URL is not redirected to https.
For reference, the code in that question (which works fine when redirecting to http urls) is:
```python
# This is for Py2k. For Py3k, use http.client and urllib.parse instead, and
# use // instead of / for the division
import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    # HTTPConnection speaks plain HTTP only, which is why https targets fail
    h = httplib.HTTPConnection(parsed.netloc)
    resource = parsed.path
    if parsed.query != "":
        resource += "?" + parsed.query
    h.request('HEAD', resource)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return unshorten_url(response.getheader('Location'))  # changed to process chains of short urls
    else:
        return url
```

(Note: for obvious bandwidth reasons, I want to do this by requesting only the headers [i.e. like the http-only version above], not the content of the whole page.)
Solution: You can get the scheme from the URL and then use HTTPSConnection if parsed.scheme is https. You can also do this very simply with the requests library.
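The HTTPSConnection approach can be sketched in Python 3 with only the standard library, following the port described in the question's code comment (http.client and urllib.parse, `//` for division). The helper `connection_class` and the `max_redirects` and `timeout` parameters are hypothetical additions, as are resolving a relative `Location` header with `urljoin` and sending `/` when the URL has no path. The original recursion is replaced by a bounded loop so that a redirect cycle cannot recurse forever.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urljoin, urlsplit


def connection_class(scheme):
    """Hypothetical helper: pick the connection class from the URL scheme."""
    if scheme == "https":
        return HTTPSConnection
    if scheme == "http":
        return HTTPConnection
    raise ValueError("unsupported scheme: %r" % scheme)


def unshorten_url(url, max_redirects=10, timeout=10):
    """Follow redirects using HEAD requests only and return the final URL."""
    for _ in range(max_redirects):
        parts = urlsplit(url)
        conn = connection_class(parts.scheme)(parts.netloc, timeout=timeout)
        # Request target: path plus query string ("/" assumed for a bare host).
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        try:
            conn.request("HEAD", target)
            response = conn.getresponse()
            status = response.status
            location = response.getheader("Location")
        finally:
            conn.close()
        if status // 100 == 3 and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            return url
    # Give up after max_redirects requests that all answered with a redirect.
    raise RuntimeError("too many redirects (limit %d)" % max_redirects)
```

Because every hop sends only HEAD, no page bodies are downloaded, which keeps the bandwidth constraint from the question.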
```python
>>> import requests
>>> r = requests.head('http://bit.ly/IFHzvO', allow_redirects=True)
>>> print(r.url)
www.google
```
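`requests.head` does not follow redirects unless `allow_redirects=True` is passed, which is why the answer sets it, and requests needs the full URL including the scheme. If you also want to see each intermediate hop, the responses in `response.history` record them. The function name `redirect_chain` and the 10-second timeout below are illustrative assumptions, not part of the answer.

```python
import requests


def redirect_chain(url, timeout=10):
    """Hypothetical helper: list every URL visited while following redirects."""
    # HEAD keeps bandwidth low; allow_redirects=True makes requests follow 3xx.
    response = requests.head(url, allow_redirects=True, timeout=timeout)
    # response.history holds the intermediate redirect responses, oldest first.
    return [hop.url for hop in response.history] + [response.url]
```

The last element of the returned list is the same value as `r.url` in the session above.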