How to use urllib to download image from web

I'm trying to download an image using this code:

from urllib import urlretrieve
urlretrieve('http://gdimitriou.eu/wp-content/uploads/2008/04/google-image-search.jpg', 'google-image-search.jpg')

It worked. The image was downloaded and can be opened by any image viewer.

However, the code below does not work. The downloaded image is only 2 KB and can't be opened by any image viewer.

from urllib import urlretrieve
urlretrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 'Zindagi1976.jpg')

Here is the content of the downloaded file, which is an HTML error page rather than an image:

ERROR
The requested URL could not be retrieved

While trying to retrieve the URL: http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg

The following error was encountered:
Access Denied.
Access control configuration prevents your request from being allowed at this time. Please contact your service provider if you feel this is incorrect.

Your cache administrator is nobody.
Generated Mon, 05 Dec 2011 17:19:53 GMT by sq56.wikimedia.org (squid/2.7.STABLE9)
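A quick way to tell a real JPEG from an error page like this one is to look at the file's first bytes: JPEG data begins with the bytes FF D8 FF, while an HTML page begins with text. The sketch below assumes Python 3; is_jpeg_bytes and looks_like_jpeg are hypothetical helper names, not part of urllib.

```python
# JPEG data starts with the Start Of Image marker (0xFF 0xD8) followed by
# the 0xFF byte that opens the next marker.
JPEG_SIGNATURE = b"\xff\xd8\xff"


def is_jpeg_bytes(data):
    """Return True if `data` starts with the JPEG signature."""
    return data[:3] == JPEG_SIGNATURE


def looks_like_jpeg(path):
    """Read the first three bytes of the file at `path` and check them."""
    with open(path, "rb") as f:
        return is_jpeg_bytes(f.read(3))
```

For the 2 KB file from the question, looks_like_jpeg('Zindagi1976.jpg') would return False, because the file holds HTML markup instead of image data.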

Best answer

If you use the following command, you can download the image:

wget http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg

But if you do the following in Python:

from urllib import urlretrieve
urlretrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 'Zindagi1976.jpg')

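You may not be able to download the image. A likely reason is that Wikimedia's servers reject requests from clients they do not recognize, judging by the User-Agent header, and urllib identifies itself as Python-urllib/x.y by default. (A robots.txt file only publishes crawling rules for bots to follow; it does not block requests by itself.) Try emulating a browser.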
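You can check which User-Agent urllib sends when you do not set one. This sketch uses Python 3's urllib.request; the Python 2 urllib module in the question similarly stores a Python-urllib/... string in URLopener.version. default_user_agent is a hypothetical helper name.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent that urllib.request sends when none is set."""
    opener = urllib.request.build_opener()
    # addheaders holds (name, value) pairs; by default it contains a single
    # ('User-agent', 'Python-urllib/<major>.<minor>') entry.
    return dict(opener.addheaders)["User-agent"]


print(default_user_agent())  # a string like 'Python-urllib/3.x'
```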

To emulate a browser, add a browser-like User-Agent to the request headers, for example:

('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')
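In Python 3's urllib.request, a (name, value) tuple like this one goes into an opener's addheaders list. A minimal sketch, assuming Python 3 (the question uses Python 2's urllib); make_browser_opener is a hypothetical helper name:

```python
import urllib.request

# The browser User-Agent string quoted above.
BROWSER_UA = ('Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) '
              'Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')


def make_browser_opener(user_agent=BROWSER_UA):
    """Return an opener that sends `user_agent` instead of Python-urllib/x.y."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples, the same shape as the
    # tuple above. Assigning a new list replaces the default User-agent entry.
    opener.addheaders = [('User-agent', user_agent)]
    return opener
```

To make urllib.request.urlretrieve use this header, call urllib.request.install_opener(make_browser_opener()) first; later urlretrieve(url, 'Zindagi1976.jpg') calls then go through the installed opener.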

You can do something like this:

>>> from urllib import FancyURLopener
>>> class MyOpener(FancyURLopener):
...     version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
...
>>> myopener = MyOpener()
>>> myopener.retrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 'Zindagi1976.jpg')
('Zindagi1976.jpg', <httplib.HTTPMessage instance at 0x1007bfe18>)

This retrieves the file and saves it as Zindagi1976.jpg.
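FancyURLopener belongs to Python 2's urllib; in Python 3 it survives only as the legacy urllib.request.FancyURLopener, deprecated since 3.3. A rough Python 3 equivalent attaches the User-Agent to a Request instead. This is a sketch, not the answer's own code; build_request and download are hypothetical helper names.

```python
import shutil
import urllib.request

# User-Agent string from the MyOpener example above.
BROWSER_UA = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) '
              'Gecko/20071127 Firefox/2.0.0.11')


def build_request(url, user_agent=BROWSER_UA):
    """Create a GET request that identifies itself with `user_agent`."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def download(url, filename, user_agent=BROWSER_UA):
    """Save the body of `url` to `filename`, sending a custom User-Agent.

    If the server answers with an HTTP error status, urlopen raises
    urllib.error.HTTPError, so the error page is not written to `filename`.
    """
    with urllib.request.urlopen(build_request(url, user_agent)) as response, \
            open(filename, 'wb') as out:
        # Stream the response body to disk in chunks.
        shutil.copyfileobj(response, out)
    return filename
```

Calling download('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 'Zindagi1976.jpg') performs the same download as the MyOpener session. Raising on an error status differs from Python 2's urlretrieve, which used FancyURLopener and saved the error page to disk, which is how the 2 KB file in the question came about.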