Storing the content from a for loop in a list (Python)

Updated: 2024-10-26 09:30:47
This is a Python program written in a PySpark IPython notebook. I am trying to count the number of occurrences of each word in the list 'names' in each RDD (each RDD can be thought of as a file) using a for loop. I want to store the counts for a word across all files in a list named after that word.

For example, suppose the count of the word harry is 1214 in the first RDD, 1506 in the second RDD, and so on. I want to create a list harryList = [1214, 1506, 1825, 2933, 3748, 2617, 2887]

The list of names is dynamic.

    names = ['harry', 'hermione', 'ron', 'hagrid']
    rdds = [hp1RDD, hp2RDD, hp3RDD, hp4RDD, hp5RDD, hp6RDD, hp7RDD]

    for n in names:
        a = []
        for x in rdds:
            a.append(x.flatMap(lambda line: line.split(" "))
                      .filter(lambda word: word == n)
                      .count())
        print(a)

With the code above I can print the contents of each list, but I cannot save it in the way shown above.

Best answer

If you don't mind words like hagrid's being counted separately from hagrid, collections.Counter will help:

    from collections import Counter

    # Plain strings stand in for the RDDs in this example.
    hp1RDD = "harry potter has a girlfriend who's name is hermione granger and a friend called ron. harry has an uncle who's name is hagrid. hagrid is a big guy"
    hp2RDD = "harry potter is the best movie I've ever saw. hermione is very beautfiful"

    names = ['harry', 'hermione', 'ron', 'hagrid']
    rdds = [hp1RDD, hp2RDD]

    # Map each name to a list holding its count in every text.
    results = {}
    for name in names:
        tmp_list = []
        for rdd in rdds:
            count = Counter(rdd.split())
            tmp_list.append(count[name])
        results[name] = tmp_list

    print(results)
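The dictionary pattern above answers the original request directly: instead of creating a separate variable such as harryList for each name, the lists are kept in a dict keyed by name. A minimal sketch of the same idea, assuming plain strings stand in for the RDDs (count_names, texts, and harry_list are hypothetical names used only for illustration), and building each Counter once per text rather than once per name:

```python
from collections import Counter


def count_names(names, texts):
    """Return {name: [count in texts[0], count in texts[1], ...]}."""
    # Split and count each text once, not once per name.
    counters = [Counter(text.split()) for text in texts]
    # A Counter returns 0 for missing words, so every list has len(texts) entries.
    return {name: [c[name] for c in counters] for name in names}


texts = [
    "harry met ron and harry met hermione",
    "hermione saw hagrid",
]
results = count_names(["harry", "hermione", "ron", "hagrid"], texts)

# Plays the role of harryList from the question.
harry_list = results["harry"]
print(harry_list)
```

Because the names are dynamic, looking a list up with results[name] is usually easier to work with than generating variable names at runtime.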

You can also make the count case-insensitive by lowercasing each word with lower():

    count = Counter(x.lower() for x in rdd.split())
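If tokens such as hagrid. or Hagrid's should count toward hagrid after all, the text can be normalised before counting. A rough sketch, assuming a simple regular expression is good enough for the data (the tokenize helper and its pattern are illustrative assumptions, not part of the original answer):

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words without punctuation."""
    # A word is a run of letters, optionally with apostrophes inside it.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Drop a trailing 's so that "hagrid's" is counted as "hagrid".
    return [w[:-2] if w.endswith("'s") else w for w in words]


text = "Hagrid's hut is big. Ron likes hagrid."
count = Counter(tokenize(text))
print(count["hagrid"], count["ron"])
```

Note that this also turns contractions such as who's into who, which may or may not be what you want.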

Published: 2023-07-29 14:13:00
Article link: https://www.elefans.com/category/jswz/34/1316799.html