I've got a script that gets DNS (CNAME, MX, NS) data in the following way:
```python
from dns import resolver

def resolve_dns(url):
    response_dict = {}
    print("\nResolving DNS for %s" % url)
    try:
        response_dict['CNAME'] = [rdata for rdata in resolver.query(url, 'CNAME')]
    except Exception:
        pass
    try:
        response_dict['MX'] = [rdata for rdata in resolver.query(url, 'MX')]
    except Exception:
        pass
    try:
        response_dict['NS'] = [rdata for rdata in resolver.query(url, 'NS')]
    except Exception:
        pass
    return response_dict
```
This function is being called sequentially for successive URLs. If possible, I'd like to speed up the above process by getting the data for multiple URLs simultaneously.
Is there a way to accomplish what the above script does for a batch of URLs (perhaps returning a list of dict objects, with each dict corresponding to the data for a particular URL)?
Answer:
You can put the work into a thread pool. Your resolve_dns does 3 requests serially, so I created a slightly more generic worker that does only 1 query and used itertools.product to generate all (host, query type) combinations. In the thread pool I set chunksize to 1 to avoid thread-pool batching, which can increase execution time if some queries take a long time.
```python
import collections
import itertools
import multiprocessing.pool

import dns.exception
from dns import resolver

def worker(arg):
    """Query DNS for (hostname, qname) and return (qname, [rdata, ...])."""
    url, qname = arg
    try:
        rdatalist = [rdata for rdata in resolver.query(url, qname)]
        return qname, rdatalist
    except dns.exception.DNSException:
        return qname, []

def resolve_dns(url_list):
    """Given a list of hosts, return a dict that maps each qname to the
    rdata records returned for it."""
    response_dict = collections.defaultdict(list)
    # Create a pool for the queries, but cap the max number of threads.
    pool = multiprocessing.pool.ThreadPool(processes=min(len(url_list) * 3, 60))
    # Run for all combinations of hosts and qnames.
    for qname, rdatalist in pool.imap(
            worker,
            itertools.product(url_list, ('CNAME', 'MX', 'NS')),
            chunksize=1):
        response_dict[qname].extend(rdatalist)
    pool.close()
    return response_dict

url_list = ['example', 'stackoverflow']
result = resolve_dns(url_list)
for qname, rdatalist in result.items():
    print(qname)
    for rdata in rdatalist:
        print('   ', rdata)
```
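Note that the dict above is keyed by query type, so records from different hosts end up merged. If you need the results grouped per URL, as the question asks, a minimal sketch could look like the following (the names `resolve_many`, `query_fn`, and `default_query` are my own; it assumes dnspython is installed for the real lookup, and the query function is injectable so the pooling logic can be tried without network access):

```python
import collections
import itertools
from multiprocessing.pool import ThreadPool

def default_query(host, qtype):
    # Real lookup via dnspython (assumed installed); any failure yields [].
    from dns import resolver
    try:
        return [str(rdata) for rdata in resolver.query(host, qtype)]
    except Exception:
        return []

def resolve_many(hosts, qtypes=('CNAME', 'MX', 'NS'), query_fn=default_query):
    """Return {host: {qtype: [records]}}, one thread-pool task per query."""
    jobs = list(itertools.product(hosts, qtypes))
    results = collections.defaultdict(dict)
    with ThreadPool(processes=max(1, min(len(jobs), 60))) as pool:
        # pool.map preserves input order, so jobs and records line up.
        for (host, qtype), records in zip(
                jobs, pool.map(lambda job: query_fn(*job), jobs)):
            results[host][qtype] = records
    return dict(results)
```

Each inner dict then holds exactly the data for one URL, which also makes it easy to build the "list of dicts" shape the question mentions (`[results[h] for h in hosts]`).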