My data, id_list, comes into the function in this format: [(u'SGP-3630', 1202), (u'MTSCR-534', 1244)]. Each entry is two values paired together, and there could be one pair or a hundred pairs.
This is the function:
    def ListParser(id_list):
        list_length = len(id_list)
        count = 0
        table = ""
        while count < list_length:
            jira = id_list[count][0]
            stash = id_list[count][1]
            count = count + 1
            table = table + RetrieveFromAPI(stash, jira)
        table = TableFormatter(table)
        table = TableColouriser(table)
        return table

What this function does is go through the list, extract each pair, and pass it to a function called RetrieveFromAPI(), which fetches information from a URL.
Does anyone have an idea how to implement multithreading here? I've had a shot at splitting the pairs into two separate lists and getting the pool to iterate through each list, but it hasn't quite worked.
    def ListParser(id_list):
        pool = ThreadPool(4)
        list_length = len(id_list)
        count = 0
        table = ""
        jira_list = list()
        stash_list = list()
        while count < list_length:
            jira_list = jira_list.extend(id_list[count][0])
            print jira_list
            stash_list = stash_list.extend(id_list[count][1])
            print stash_list
            count = count + 1
        table = table + pool.map(RetrieveFromAPI, stash_list, jira_list)
        table = TableFormatter(table)
        table = TableColouriser(table)
        return table

The error I'm getting for this attempt is TypeError: 'int' object is not iterable
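For context, the TypeError in this attempt most likely comes from list.extend(), which expects an iterable and returns None, rather than from the thread pool itself. Here is a minimal sketch of the difference, written for Python 3 and using the sample values from the question; the variable names error_message, result and stash_ids are illustrative only.

```python
# list.extend() wants an iterable and always returns None;
# list.append() adds a single item.
stash_list = []

try:
    stash_list.extend(1202)  # an int is not iterable, so this raises
except TypeError as exc:
    error_message = str(exc)

stash_list.append(1202)            # append mutates the list in place
result = stash_list.append(1244)   # ...and returns None, so never reassign it

# Comprehensions build both lists without counters or reassignment.
id_list = [(u'SGP-3630', 1202), (u'MTSCR-534', 1244)]
jira_list = [jira for jira, _ in id_list]
stash_ids = [stash for _, stash in id_list]
```

Reassigning the return value of extend() or append() replaces the list with None, which is why the first print in the attempt would show None before the error appears.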
EDIT 2: Okay, so I've managed to split the list of tuples into two different lists, but I'm unsure how to get multithreading working with them.
jira,stash= map(list,zip(*id_list))
Best answer
You're working too hard! From help(multiprocessing.pool.ThreadPool):
    map(self, func, iterable, chunksize=None)
        Apply `func` to each element in `iterable`, collecting the results
        in a list that is returned.

The second argument is an iterable of the arguments you want to pass to the worker threads. You have a list of lists and you want the first two items from the inner list for each call. id_list is already iterable, so we're close. A small function (in this case implemented as a lambda) bridges the gap.
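The same bridging idea can be sketched with a named helper instead of a lambda, which some readers find easier to debug. This is only an illustration under assumptions: retrieve_from_api is a hypothetical stand-in for the question's RetrieveFromAPI, and fetch_pair is a name introduced here.

```python
from multiprocessing.pool import ThreadPool


def retrieve_from_api(stash, jira):
    # Hypothetical stand-in for the question's RetrieveFromAPI.
    return '{}:{};'.format(stash, jira)


def fetch_pair(pair):
    # Unpack one (jira, stash) tuple and call the worker in its argument order.
    jira, stash = pair
    return retrieve_from_api(stash, jira)


id_list = [(u'SGP-3630', 1202), (u'MTSCR-534', 1244)]

pool = ThreadPool(2)
try:
    # map() hands each tuple to fetch_pair and returns results in input order.
    rows = pool.map(fetch_pair, id_list)
finally:
    pool.close()
    pool.join()

table = ''.join(rows)
```

Because map() returns a list in the same order as its input, the joined table keeps the rows in the order of id_list even though the calls run concurrently.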
I worked up a full mock solution just to make sure it works, so here it goes. As an aside, you can benefit from a fairly large pool size, since the worker threads spend much of their time waiting on I/O.
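To see why pool size matters for I/O-bound work, here is a rough timing sketch. It assumes time.sleep() is a fair stand-in for waiting on a network response; slow_fetch and timed_run are hypothetical names introduced for this illustration, and the exact timings will vary by machine.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_fetch(item):
    # Hypothetical stand-in for a network call: sleeping simulates I/O wait.
    time.sleep(0.1)
    return item


items = list(range(8))


def timed_run(pool_size):
    # Run all items through a pool of the given size and time the map() call.
    pool = ThreadPool(pool_size)
    start = time.perf_counter()
    try:
        results = pool.map(slow_fetch, items, chunksize=1)
    finally:
        pool.close()
        pool.join()
    return results, time.perf_counter() - start


# One worker handles the waits one after another; eight workers overlap them.
small_results, small_elapsed = timed_run(1)
large_results, large_elapsed = timed_run(8)
```

With one worker the eight waits add up, while with eight workers they overlap, so the larger pool should finish in a fraction of the time.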
    from multiprocessing.pool import ThreadPool

    def RetrieveFromAPI(stash, jira):
        # boring mock of api
        return '{}-{}.'.format(stash, jira)

    def TableFormatter(table):
        # mock
        return table

    def TableColouriser(table):
        # mock
        return table

    def ListParser(id_list):
        if id_list:
            pool = ThreadPool(min(12, len(id_list)))
            table = ''.join(pool.map(lambda item: RetrieveFromAPI(item[1], item[0]),
                id_list, chunksize=1))
            pool.close()
            pool.join()
        else:
            table = ''
        table = TableFormatter(table)
        table = TableColouriser(table)
        return table

    id_list = [[0,1,'foo'], [2,3,'bar'], [4,5, 'baz']]

    print(ListParser(id_list))
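As an alternative that is not part of the answer above, Python 3's concurrent.futures.ThreadPoolExecutor has a map() that accepts several iterables, which is closer to what the question's second attempt was trying to do with two lists. A minimal sketch, assuming retrieve_from_api is a hypothetical stand-in for RetrieveFromAPI and list_parser is an illustrative name:

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve_from_api(stash, jira):
    # Hypothetical stand-in for the question's RetrieveFromAPI.
    return '{}-{}.'.format(stash, jira)


def list_parser(id_list):
    if not id_list:
        return ''
    # Split [(jira, stash), ...] into two parallel tuples.
    jira_ids, stash_ids = zip(*id_list)
    workers = min(12, len(id_list))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Executor.map takes one item from each iterable per call
        # and yields results in input order.
        rows = executor.map(retrieve_from_api, stash_ids, jira_ids)
        return ''.join(rows)


table = list_parser([(u'SGP-3630', 1202), (u'MTSCR-534', 1244)])
```

The with block shuts the executor down once the results have been joined, so no explicit close() and join() calls are needed.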