Word frequency using a dictionary
Problem description
My problem is that I can't figure out how to use a dictionary to display the word count keyed by word length. For example, consider the following piece of text:
"This is the sample text to get an idea!. "
Then the required output would be
3 2
2 3
0 5
as there are 3 words of length 2, 2 words of length 3, and 0 words of length 5 in the given sample text.
Listing how often each word occurs:
    def word_frequency(filename):
        word_count_list = []
        word_freq = {}
        # Read the whole file, lowercase it, and split on whitespace.
        text = open(filename, "r").read().lower().split()
        # Count every word's occurrences, then pair each word with its count.
        word_freq = [text.count(p) for p in text]
        dictionary = dict(zip(text, word_freq))
        return dictionary

    print(word_frequency("text.txt"))

which displays the dict in this format:
{'all': 3, 'show': 1, 'welcomed': 1, 'not': 2, 'availability': 1, 'television,': 1, '28': 1, 'to': 11, 'has': 2, 'ehealth,': 1, 'do': 1, 'get': 1, 'they': 1, 'milestone': 1, 'kroes,': 1, 'now': 3, 'bringing': 2, 'eu.': 1, 'like': 1, 'states.': 1, 'them.': 1, 'european': 2, 'essential': 1, 'available': 4, 'because': 2, 'people': 3, 'generation': 1, 'economic': 1, '99.4%': 1, 'are': 3, 'eu': 1, 'achievement,': 1, 'said': 3, 'for': 3, 'broadband': 7, 'networks': 2, 'access': 2, 'internet': 1, 'across': 2, 'europe': 1, 'subscriptions': 1, 'million': 1, 'target.': 1, '2020,': 1, 'news': 1, 'neelie': 1, 'by': 1, 'improve': 1, 'fixed': 2, 'of': 8, '100%': 1, '30': 1, 'affordable': 1, 'union,': 2, 'countries.': 1, 'products': 1, 'or': 3, 'speeds': 1, 'cars."': 1, 'via': 1, 'reached': 1, 'cloud': 1, 'from': 1, 'needed': 1, '50%': 1, 'been': 1, 'next': 2, 'households': 3, 'commission': 5, 'live': 1, 'basic': 1, 'was': 1, 'said:': 1, 'more': 1, 'higher.': 1, '30mbps': 2, 'that': 4, 'but': 2, 'aware': 1, '50mbps': 1, 'line': 1, 'statement,': 1, 'with': 2, 'population': 1, "europe's": 1, 'target': 1, 'these': 1, 'reliable': 1, 'work': 1, '96%': 1, 'can': 1, 'ms': 1, 'many': 1, 'further.': 1, 'and': 6, 'computing': 1, 'is': 4, 'it': 2, 'according': 1, 'have': 2, 'in': 5, 'claimed': 1, 'their': 1, 'respective': 1, 'kroes': 1, 'areas.': 1, 'responsible': 1, 'isolated': 1, 'member': 1, '100mbps': 1, 'digital': 2, 'figures': 1, 'out': 1, 'higher': 1, 'development': 1, 'satellite': 4, 'who': 1, 'connected': 2, 'coverage': 2, 'services': 2, 'president': 1, 'a': 1, 'vice': 1, 'mobile': 2, "commission's": 1, 'points': 1, '"access': 1, 'rural': 1, 'the': 16, 'agenda,': 1, 'having': 1}
Recommended answer
    def freqCounter(infilepath):
        answer = {}
        with open(infilepath) as infile:
            # Iterate over the lines of the open file, not the characters of the path string.
            for line in infile:
                for word in line.strip().split():
                    l = len(word)
                    if l not in answer:
                        answer[l] = 0
                    answer[l] += 1
        return answer
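As a quick check of this counting logic, the sketch below applies the same idea to the sample sentence from the question instead of a file. The function name `count_word_lengths` is made up for illustration and is not part of the original answer. Note that splitting on whitespace leaves punctuation attached, so `idea!.` counts as a word of length 6.

```python
def count_word_lengths(text):
    """Map each word length to the number of words with that length."""
    answer = {}
    for word in text.split():
        length = len(word)
        # dict.get supplies 0 the first time a length is seen.
        answer[length] = answer.get(length, 0) + 1
    return answer


sample = "This is the sample text to get an idea!. "
counts = count_word_lengths(sample)
print(counts)  # {4: 2, 2: 3, 3: 2, 6: 2}
```

A plain dict has no entry for length 5 here, so printing a zero count requires something like `counts.get(5, 0)`.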
Alternatively:
    import collections

    def freqCounter(infilepath):
        with open(infilepath) as infile:
            return collections.Counter(len(word) for line in infile for word in line.strip().split())
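To get from these counts to the "count length" lines requested in the question, one possible sketch follows. It assumes the lengths of interest (2, 3 and 5 here) are supplied explicitly, since the question does not say how they are chosen, and it counts from an in-memory string rather than a file. The helper name `format_length_counts` is hypothetical. A `Counter` returns 0 for a missing key, which is what makes the `0 5` line work without extra checks.

```python
import collections


def format_length_counts(counts, lengths):
    """Return one "count length" line per requested length."""
    return "\n".join("{} {}".format(counts[length], length) for length in lengths)


sample = "This is the sample text to get an idea!. "
# Count words by length; Counter yields 0 for lengths that never occur.
counts = collections.Counter(len(word) for word in sample.split())
print(format_length_counts(counts, [2, 3, 5]))
# 3 2
# 2 3
# 0 5
```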