Effect of passing headers in requests?

Updated: 2024-10-24 20:20:45

Question

I want to know what difference it makes when you pass headers to requests.get, i.e. the difference between requests.get(url, headers=headers) and requests.get(url).

I have these two pieces of code:

from lxml import html
from lxml import etree
import requests
import re

# requests needs an explicit scheme, otherwise it raises MissingSchema
url = "https://www.amazon.in/SanDisk-micro-USB-connector-OTG-enabled-Android/dp/B00RBGYGMO"
page = requests.get(url)
tree = html.fromstring(page.text)
XPATH_IMAGE_SOURCE = '//*[@id="main-image-container"]//img/@src'
image_source = tree.xpath(XPATH_IMAGE_SOURCE)
print('type: ', type(image_source[0]))
print(image_source[0])

Its output is a URL, as you'd expect. But this:

from lxml import html
from lxml import etree
import requests
import re

# requests needs an explicit scheme, otherwise it raises MissingSchema
url = "https://www.amazon.in/SanDisk-micro-USB-connector-OTG-enabled-Android/dp/B00RBGYGMO"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
page = requests.get(url, headers=headers)
tree = html.fromstring(page.text)
XPATH_IMAGE_SOURCE = '//*[@id="main-image-container"]//img/@src'
image_source = tree.xpath(XPATH_IMAGE_SOURCE)
print('type: ', type(image_source[0]))
print(image_source[0])

has an output that starts with data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAoHBwgHBgoIC. I'm guessing this is the actual image data embedded inline rather than a link to it. Any idea how I could keep it in URL form? In what other ways does the presence of a header affect the response we get?

Thanks

Recommended answer

Save the first code's response to an HTML file and open it in your browser.

As you can see, Amazon blocks the request when it is sent without headers.
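The save-and-inspect step above might look like the following minimal sketch. The helper names `save_html` and `fetch_and_save` and the file name `no_headers.html` are hypothetical. `fetch_and_save` needs the `requests` package and network access, so it is only defined here and not called; the stand-in body at the bottom is an assumption, not Amazon's real reply.

```python
import os
import tempfile


def save_html(text, path):
    """Write response text to an HTML file so it can be opened in a browser."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path


def fetch_and_save(url, path, headers=None):
    """Fetch url (optionally with headers) and save the body for inspection."""
    import requests  # third-party; imported here so save_html works without it

    page = requests.get(url, headers=headers)
    # Status code plus the headers that were actually sent; when no
    # User-Agent is supplied, requests fills in its own default one.
    print(page.status_code, dict(page.request.headers))
    return save_html(page.text, path)


# Offline demonstration of the saving step with a stand-in body
# (an assumption for illustration, not a real server response).
demo_path = os.path.join(tempfile.gettempdir(), "no_headers.html")
save_html("<html><body>stand-in response body</body></html>", demo_path)
```

Calling `fetch_and_save(url, "no_headers.html")` and then again with `headers=headers` into a second file gives two pages that can be compared side by side in a browser.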

Use this XPath:

XPATH_IMAGE_SOURCE = '//*[@id="main-image-container"]//img/@data-old-hires'

Output:

type: <class 'lxml.etree._ElementStringResult'>
ecx.images-amazon/images/I/617TjMIouyL._SL1274_.jpg

This is the raw HTML data:

<img alt=".." src="&#10;data:image/webp;base64,UklGRuYIAABXRUJQVlA4INoIAACQQQCdASosAcsAPrFWpEqkIqQhIxN6gIgWCek6r4bUf/..." data-old-hires="ecx.images-amazon/images/I/617TjMIouyL._SL1274_.jpg"

The image URL is in the data-old-hires attribute.
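One way to always end up with a real URL is to prefer data-old-hires and fall back to src only when src is not an inline data: URI. This is a minimal sketch, assuming the attributes are available as a mapping (lxml elements expose one as `.attrib`). `pick_image_url` is a hypothetical helper, and the sample values are shortened from the raw HTML above.

```python
def pick_image_url(attrs):
    """Return a fetchable image URL from an <img> attribute mapping, or None."""
    hires = (attrs.get("data-old-hires") or "").strip()
    if hires:
        return hires
    src = (attrs.get("src") or "").strip()
    # An inline data: URI carries the image bytes themselves, not a location
    if src and not src.startswith("data:"):
        return src
    return None


# Attribute values shortened from the <img> tag shown above
sample = {
    "src": "\ndata:image/webp;base64,UklGRuYIAABXRUJQVlA4...",
    "data-old-hires": "ecx.images-amazon/images/I/617TjMIouyL._SL1274_.jpg",
}
print(pick_image_url(sample))
```

With lxml, `tree.xpath('//*[@id="main-image-container"]//img')` returns elements whose `.attrib` can be passed to the helper directly.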


Published: 2023-10-29 10:21:13
Link: https://www.elefans.com/category/jswz/34/1539438.html