Python Request Authorization [closed]

I need to pull some of my information off a web site for tax purposes, and unfortunately the "export" features don't give me all the information I need. The information does exist on the web site, as a single field on each of over 250 web pages. I could click on each one, save them all, and process them with a script, but I'd rather learn something instead.

The requests package bills itself as a godsend for this, though I'm not wedded to it. The problem is that I have to log into my web site. The requests documentation includes this link on authentication, documenting various forms of authentication, but it contains no information on how to tell which form of authentication my web site actually uses. I assume that a cookie of some sort is put on my computer when I log into the web site and that, in theory, I could find it on my hard drive and send it along with my requests, but I have next to no experience with cookies or authorization and wouldn't know what to send.

If I can make a batch of requests to a list of URLs, all on the same site, and download just the HTML, I can process it and build a report.

In your reply, if there are any links to general knowledge about how HTTP authentication and cookies work together, I'd be happy to read that as well.

Thank you very much for any help you can provide.

Best answer

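Try the following:

I'm using BasicAuth because it is one of the most common forms of auth; you can switch to any other form by checking the documentation.

Using the code below as a base, you can build a list or dict of the URLs to visit and loop over them. Because a requests.Session stores the cookies the site sets at login and sends them on later requests, it also saves you the trouble of having to "locate the cookies on the hard disk and load them".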
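For the Basic auth case, a minimal sketch might look like the following. It assumes the site really does accept HTTP Basic authentication, which a server normally signals by answering an unauthenticated request with status 401 and a WWW-Authenticate header that starts with "Basic". A site that shows an HTML login form usually does not work this way and needs the form-posting approach instead. The helper names advertises_basic_auth and get_with_basic_auth are made up for illustration, and the check is deliberately simple (it only looks at the first challenge in the header).

```python
def advertises_basic_auth(response):
    """Return True if a response is a 401 challenge asking for HTTP Basic auth."""
    challenge = response.headers.get("WWW-Authenticate", "")
    return response.status_code == 401 and challenge.lower().startswith("basic")


def get_with_basic_auth(url, username, password):
    """Fetch one page, sending the credentials in an Authorization header."""
    import requests  # third-party: pip install requests

    # A (username, password) tuple is shorthand for requests.auth.HTTPBasicAuth.
    response = requests.get(url, auth=(username, password), timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return response.text
```

To probe a site, request a protected page without credentials and pass the response to advertises_basic_auth. If it returns False, the site most likely relies on a login form and cookies.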

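EDIT: After reading the OP's comment:

    import requests

    # The login form's action URL, taken from the page source.
    login_url = "https://www.wyzant.com/sso/login"
    payload = {
        "Username": "<username>",
        "Password": "<password>",
    }
    url = "<url>"  # placeholder: a page that requires login

    with requests.Session() as s:
        r = s.post(login_url, data=payload)
        # The session already stores these cookies; passing them explicitly is optional.
        cookies = r.cookies
        r = s.get(url, cookies=cookies)
        # do whatever

I tried this exact code and it works perfectly: I can log in and reach the student dashboard.

Cheers.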
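The loop over a list of URLs could be sketched roughly as follows. This is an illustration under assumptions, not tested against the real site: fetch_pages and login_and_fetch are hypothetical helper names, and login_url and payload are expected to be the same values as in the login code above. The point is that one session logs in once and is then reused for every page, so the login cookies travel with each request.

```python
def fetch_pages(session, urls):
    """Fetch each URL with the same session and return a dict of {url: html}."""
    pages = {}
    for url in urls:
        response = session.get(url, timeout=30)
        response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
        pages[url] = response.text
    return pages


def login_and_fetch(login_url, payload, urls):
    """Log in once, then reuse the session (and its cookies) for every page."""
    import requests  # third-party: pip install requests

    with requests.Session() as session:
        login = session.post(login_url, data=payload, timeout=30)
        login.raise_for_status()
        return fetch_pages(session, urls)
```

Calling login_and_fetch(login_url, payload, urls) with the list of 250 or so page URLs returns the HTML of each page keyed by URL, ready to be parsed for the one field needed in the report.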
