I want to sort a specific dictionary and return a list of the top n most frequent words. The dictionary is built from the words in a txt document: each key is a single word from the file, and each value is the number of times that word occurs in the document.
I have the following `__init__` method:
```python
def __init__(self: 'Collection_of_words', file_name: str) -> None:
    '''this initializer will read in the words from the file,
    and store them in self.counts'''
    with open(file_name) as f:
        l_words = f.read().split()
    s_words = set(l_words)
    # one l_words.count() scan per distinct word
    self.counts = dict([[word, l_words.count(word)] for word in s_words])
```
Now, one of my instance methods should return a list of the n most frequently occurring words, given an int argument `i`. I gave it a shot:
```python
def top_n_words(self, i):
    '''takes one additional parameter, an int, <i> which is the top number
    of occurrences. Returns a list of the top <i> words.'''
    return [pair[0] for pair in
            sorted(associations, key=lambda pair: pair[1], reverse=True)[:5]]
```
However, whenever I run this code I get errors and can't figure out why. I'm not sure how to sort dictionary objects (e.g. `self.counts`).
Recommended answer

```python
sorted(self.counts, key=lambda pair: pair[1], reverse=True)
```
Iterating over `self.counts` gives the keys, not key-value pairs. That means `pair[1]` is the second character of each word (and raises `IndexError` for one-letter words), not its count. You want `key=self.counts.get`.
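A minimal sketch of the corrected sort, written as a standalone function so it runs without a file. The name `top_n_words` and its `counts`/`n` parameters are illustrative assumptions standing in for the question's method and `self.counts`:

```python
from typing import Dict, List


def top_n_words(counts: Dict[str, int], n: int) -> List[str]:
    """Return the n most frequent words in counts (a word -> count dict)."""
    # Iterating over a dict yields its keys, so sorted() sees the words.
    # key=counts.get maps each word to its count, so words are ordered by count.
    return sorted(counts, key=counts.get, reverse=True)[:n]


counts = {"the": 3, "cat": 1, "sat": 2}
print(top_n_words(counts, 2))  # ['the', 'sat']
```

Inside the class, the same line would read `sorted(self.counts, key=self.counts.get, reverse=True)[:i]`, slicing by the argument `i` rather than a hard-coded 5.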
If your list needs to include the counts as well as the keys, sort the key-value pairs by value instead:
```python
import operator

# each item is a (word, count) tuple; itemgetter(1) selects the count
sorted(self.counts.items(), key=operator.itemgetter(1), reverse=True)
```
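A small worked example of sorting the pairs and keeping only the top n. The function name `top_n_pairs` is a hypothetical helper, and a plain dict stands in for `self.counts`:

```python
import operator
from typing import Dict, List, Tuple


def top_n_pairs(counts: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent (word, count) pairs, highest count first."""
    # items() yields (word, count) tuples, so itemgetter(1) picks the count.
    return sorted(counts.items(), key=operator.itemgetter(1), reverse=True)[:n]


counts = {"the": 3, "cat": 1, "sat": 2}
print(top_n_pairs(counts, 2))  # [('the', 3), ('sat', 2)]
```

Because `items()` really does yield pairs, the question's `key=lambda pair: pair[1]` would also work here; `operator.itemgetter(1)` is an equivalent, slightly more compact spelling.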
Also, note that `collections.Counter` already does what you need, and it counts in linear time, whereas calling `l_words.count(word)` once per distinct word is quadratic.
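A sketch of the `Counter` approach, assuming the words come from a string rather than a file so the example is self-contained; `top_n_from_text` is a hypothetical name:

```python
from collections import Counter
from typing import List


def top_n_from_text(text: str, n: int) -> List[str]:
    """Return the n most frequent whitespace-separated words in text."""
    # Counter tallies every word in a single pass over the list.
    counts = Counter(text.split())
    # most_common(n) returns (word, count) pairs, highest count first.
    return [word for word, _ in counts.most_common(n)]


print(top_n_from_text("the cat sat the cat the", 2))  # ['the', 'cat']
```

In the question's class, `self.counts = Counter(l_words)` could replace the `dict(...)` line. Since `Counter` is a `dict` subclass, the `sorted(...)` calls above keep working, and `self.counts.most_common(i)` gives the top `i` pairs directly.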