I have lists of the paths of multiple text files stored in a dictionary 'd':
'd:/individual-articles/9.txt', 'd:/individual-articles/11.txt', 'd:/individual-articles/12.txt', ... and so on ...
Now I need to read each file in the dictionary and keep a count of how often each word occurs across all the files in the dictionary.
My output should be of the form:
the-500 a-78 in-56 and so on,
where 500 is the number of times the word "the" occurs in all the files in the dictionary, and so on.
I need to do this for all the words.
I am a Python newbie, please help!
My code below doesn't work; it shows no output! There must be a mistake in my logic, please rectify!
    import collections
    import itertools
    import os
    from glob import glob
    from collections import Counter

    folderpaths='d:/individual-articles'
    counter=Counter()

    filepaths = glob(os.path.join(folderpaths,'*.txt'))

    folderpath='d:/individual-articles/'
    # i am creating my dictionary here, can be ignored
    d = collections.defaultdict(list)
    with open('topics.txt') as f:
        for line in f:
            value, *keys = line.strip().split('~')
            for key in filter(None, keys):
                if key=='earn':
                    d[key].append(folderpath+value+".txt")

    for key, value in d.items() :
        print(value)

    word_count_dict={}

    for file in d.values():
        with open(file,"r") as f:
            words = re.findall(r'\w+', f.read().lower())
            counter = counter + Counter(words)
            for word in words:
                word_count_dict[word].append(counter)

    for word, counts in word_count_dict.values():
        print(word, counts)

Recommended answer
Inspired by the Counter collection that you already use:
    import os
    import re
    from collections import Counter
    from glob import glob

    folderpaths = 'd:/individual-articles'
    counter = Counter()

    # Collect every .txt file in the folder.
    filepaths = glob(os.path.join(folderpaths, '*.txt'))
    for file in filepaths:
        with open(file) as f:
            # Lowercase the text and split it into words.
            words = re.findall(r'\w+', f.read().lower())
            # Add this file's counts to the running total.
            counter.update(words)

    print(counter)
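The answer's code counts every .txt file in the folder. If only the files listed in the dictionary 'd' should be counted, note that each value of 'd' is a list of paths, so the question's loop over d.values() hands a whole list to open() instead of a single path. The following is a minimal sketch of one way to handle that and to print the requested word-count form. The helper names count_words and format_counts are hypothetical, and reading the files as UTF-8 with undecodable bytes replaced is an assumption, since the question does not state the encoding.

```python
import re
from collections import Counter
from itertools import chain


def count_words(path_lists):
    """Count word occurrences across every file named in an iterable of path lists."""
    counter = Counter()
    # Flatten the lists of paths (for example d.values()) into one stream of paths.
    for path in chain.from_iterable(path_lists):
        # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
        with open(path, encoding="utf-8", errors="replace") as f:
            # \w+ matches runs of letters, digits and underscores.
            counter.update(re.findall(r"\w+", f.read().lower()))
    return counter


def format_counts(counter):
    """Render counts as 'word-count' pairs, most frequent first."""
    return " ".join(f"{word}-{count}" for word, count in counter.most_common())
```

With the dictionary from the question, this could be called as print(format_counts(count_words(d.values()))). Using Counter.update avoids building a new Counter for every file, which may matter when there are many articles.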