Download image files from an HTML page source using Python?
Question
I am writing a scraper that downloads all the image files from an HTML page and saves them to a specific folder. All the images are part of the HTML page.
Answer
Here is some code that downloads all the images from the supplied URL and saves them in the specified output folder. You can modify it to suit your needs.
```python
"""
dumpimages.py
    Downloads all the images on the supplied URL, and saves them to the
    specified output folder ("/test/" by default)

Usage:
    python dumpimages.py example/ [output]
"""
import os
import sys
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve

from bs4 import BeautifulSoup as bs


def main(url, out_folder="/test/"):
    """Downloads all the images at 'url' to 'out_folder'."""
    os.makedirs(out_folder, exist_ok=True)  # create the folder if it is missing
    soup = bs(urlopen(url), "html.parser")
    for image in soup.find_all("img"):
        src = image.get("src")
        if not src:  # skip <img> tags without a src attribute
            continue
        # Resolve relative, root-relative and protocol-relative paths against
        # the page URL; absolute URLs are returned unchanged.
        img_url = urljoin(url, src)
        if urlparse(img_url).scheme not in ("http", "https"):
            continue  # skip data: URIs and other non-HTTP sources
        print(f"Image: {img_url}")
        # Use the last path segment (without any query string) as the file name
        filename = os.path.basename(urlparse(img_url).path)
        if not filename:
            continue
        urlretrieve(img_url, os.path.join(out_folder, filename))


def _usage():
    print("usage: python dumpimages.py example [outpath]")


if __name__ == "__main__":
    args = sys.argv[1:]
    # Expect a URL, optionally followed by an output folder
    if len(args) not in (1, 2) or not args[0].lower().startswith("http"):
        _usage()
        sys.exit(-1)
    main(args[0], args[1] if len(args) == 2 else "/test/")
```
You can now specify the output folder as an optional second argument; it defaults to "/test/".
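If installing BeautifulSoup is not an option, the link-extraction step can be sketched with only the standard library. This is a minimal sketch, assuming each image URL sits in a plain `src` attribute (lazy-loading attributes such as `data-src` are ignored); the names `image_urls` and `_ImgSrcParser` are hypothetical. It only collects and resolves the URLs, so each result could then be passed to `urlretrieve` as in the script above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _ImgSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names;
        # attrs is a list of (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip missing or empty src values
                self.srcs.append(src)


def image_urls(html, base_url):
    """Return absolute URLs for all <img> tags in `html`, resolved against base_url."""
    parser = _ImgSrcParser()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, src) for src in parser.srcs]


if __name__ == "__main__":
    page = """
    <html><body>
      <img src="logo.png">
      <img src="/static/banner.jpg">
      <IMG SRC="https://cdn.example.org/photo.gif">
      <img alt="no source">
    </body></html>
    """
    for u in image_urls(page, "https://example.com/blog/post.html"):
        print(u)
```

Run as a script, it prints https://example.com/blog/logo.png, https://example.com/static/banner.jpg and https://cdn.example.org/photo.gif; the `<img>` without a `src` is skipped. Using `urljoin` instead of overwriting the path component handles page-relative, root-relative and absolute `src` values in one call.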