Python Mechanize select_form() - ParseError: OPTION outside of SELECT

Updated: 2024-10-11 13:28:40

Problem description

I am using Python 2.7 and Mechanize 2.5. I am trying to use the select_form() method, but I am getting the following error:

      File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 499, in select_form
        global_form = self._factory.global_form
      File "C:\Python27\lib\site-packages\mechanize\_html.py", line 544, in __getattr__
        self.forms()
      File "C:\Python27\lib\site-packages\mechanize\_html.py", line 557, in forms
        self._forms_factory.forms())
      File "C:\Python27\lib\site-packages\mechanize\_html.py", line 237, in forms
        _urlunparse=_rfc3986.urlunsplit,
      File "C:\Python27\lib\site-packages\mechanize\_form.py", line 845, in ParseResponseEx
        _urlunparse=_urlunparse,
      File "C:\Python27\lib\site-packages\mechanize\_form.py", line 982, in _ParseFileEx
        fp.feed(data)
      File "C:\Python27\lib\site-packages\mechanize\_form.py", line 759, in feed
        _sgmllib_copy.SGMLParser.feed(self, data)
      File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 110, in feed
        self.goahead(0)
      File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 144, in goahead
        k = self.parse_starttag(i)
      File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 302, in parse_starttag
        self.finish_starttag(tag, attrs)
      File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 347, in finish_starttag
        self.handle_starttag(tag, method, attrs)
      File "C:\Python27\lib\site-packages\mechanize\_sgmllib_copy.py", line 387, in handle_starttag
        method(attrs)
      File "C:\Python27\lib\site-packages\mechanize\_form.py", line 736, in do_option
        _AbstractFormParser._start_option(self, attrs)
      File "C:\Python27\lib\site-packages\mechanize\_form.py", line 481, in _start_option
        raise ParseError("OPTION outside of SELECT")
    ParseError: OPTION outside of SELECT

Here is my code:

    import cookielib
    import mechanize

    # Browser with a cookie jar and the usual handlers enabled
    cj = cookielib.LWPCookieJar()
    br = mechanize.Browser()
    br.set_cookiejar(cj)
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
    br.addheaders = [('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

    br.open("website_url_which_i_will_not_share")
    br.select_form(nr=0)  # raises ParseError: OPTION outside of SELECT

The following is the form section of the HTML on the webpage that I opened:

    <html lang="en-us" xml:lang="en-us" xmlns="http://www.w3.org/1999/xhtml">
    <head>
      I omitted this section
    </head>
    <body class="login">
    <div id="container">
      <div id="header" style="background-color: #13397A;">
      <div id="content" class="colM">
        <div id="content-main">
          <form id="login-form" method="post" action="/admin/">
            <div style="display:none">
              <input type="hidden" value="8a689f2e3d215a3465f1bb66e037d1a5" name="csrfmiddlewaretoken">
            </div>
            <div class="form-row">
              <label class="required" for="id_username">Username:</label>
              <input id="id_username" type="text" maxlength="30" name="username">
            </div>
            <div class="form-row">
              <label class="required" for="id_password">Password:</label>
              <input id="id_password" type="password" name="password">
              <input type="hidden" value="1" name="this_is_the_login_form">
              <input type="hidden" value="/admin/" name="next">
            </div>
            <div class="submit-row">
              <label> </label>
              <input type="submit" value="Log in">
            </div>
          </form>
          <script type="text/javascript">
        </div>
        <br class="clear">
      </div>
      <div id="footer"></div>
    </div>
    <script type="text/javascript">
    </body>
    </html>

I have researched this on Stack Overflow and on Google, but I cannot find a similar question or even a description of this error.

If anyone could tell me what this error means and help me find what's wrong here, I would greatly appreciate it.

Thanks

I have been submitting a lot of forms, and every site works fine except for this one. It is a database API, which I am trying to scrape data from.

Recommended answer

I had the same problem (and unfortunately haven't solved it yet), but I found this interesting piece of code that might help.

From comments.gmane.org/gmane.comp.python.wwwsearch.general/1991:

    import mechanize
    from BeautifulSoup import BeautifulSoup

    class SanitizeHandler(mechanize.BaseHandler):
        def http_response(self, request, response):
            # Make sure the response body can be read and then replaced
            if not hasattr(response, "seek"):
                response = mechanize.response_seek_wrapper(response)
            # If the response is HTML, pass it through a robust parser like BeautifulSoup
            if response.info().dict.has_key('content-type') and ('html' in response.info().dict['content-type']):
                soup = BeautifulSoup(response.get_data())
                response.set_data(soup.prettify())
            return response

    br = mechanize.Browser()
    br.add_handler(SanitizeHandler())
    # Now you get good HTML

The handler's http_response method should run on each response and "clean" your HTML before mechanize parses the forms.
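As for what the error means: according to the traceback, mechanize's form parser raises ParseError("OPTION outside of SELECT") from _start_option when it meets an <option> start tag while no <select> is open. The form excerpt in the question contains no <option> at all, so the offending tag is presumably in a part of the page that was not posted. The following is a minimal sketch, not part of mechanize, for locating such tags in the raw page source using only the standard library. StrayOptionFinder, find_stray_options and sample are hypothetical names chosen for illustration, and the code assumes Python 3 (on Python 2.7 the module is HTMLParser rather than html.parser, and the super() call would need adjusting).

```python
from html.parser import HTMLParser


class StrayOptionFinder(HTMLParser):
    """Record the position of every <option> start tag seen outside a <select>.

    Hypothetical helper for diagnosis only; it is not part of mechanize.
    """

    def __init__(self):
        super().__init__()
        self.select_depth = 0    # number of currently open <select> elements
        self.stray_options = []  # (line, column) of each stray <option>

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self.select_depth += 1
        elif tag == "option" and self.select_depth == 0:
            # getpos() reports the parser's current (line, offset)
            self.stray_options.append(self.getpos())

    def handle_endtag(self, tag):
        if tag == "select" and self.select_depth > 0:
            self.select_depth -= 1


def find_stray_options(html):
    """Return the positions of <option> tags that are not inside a <select>."""
    finder = StrayOptionFinder()
    finder.feed(html)
    finder.close()
    return finder.stray_options


# Made-up sample markup: the second <option> sits outside any <select>.
sample = (
    "<form>\n"
    "<select name='a'><option value='1'>one</option></select>\n"
    "<option value='2'>two</option>\n"
    "</form>"
)

for line, column in find_stray_options(sample):
    print("stray <option> at line %d, column %d" % (line, column))
```

Running find_stray_options on the page source (for example, the text read from the response that br.open() returns) should show whether, and where, the page has an <option> without an enclosing <select>. That may help decide whether sanitizing the HTML as above is enough.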

Published: 2023-08-01 21:42:51

Article link: https://www.elefans.com/category/jswz/34/1271991.html