Counting word frequency across multiple documents in Python

Updated: 2024-10-26 03:35:04
Problem description

I have a list of the paths of multiple text files in a dictionary 'd':

'd:/individual-articles/9.txt', 'd:/individual-articles/11.txt', 'd:/individual-articles/12.txt',...

and so on...

Now I need to read each file in the dictionary and keep a count of the occurrences of every word that appears across all of the files.

My output should be of the form:

the-500 a-78 in-56

and so on,

where 500 is the number of times the word "the" occurs across all the files in the dictionary.

I need to do this for all the words.

I am a Python newbie, please help!

My code below doesn't work; it shows no output. There must be a mistake in my logic, please help me correct it.

    import collections
    import itertools
    import os
    from glob import glob
    from collections import Counter

    folderpaths='d:/individual-articles'
    counter=Counter()

    filepaths = glob(os.path.join(folderpaths,'*.txt'))

    folderpath='d:/individual-articles/'
    # i am creating my dictionary here, can be ignored
    d = collections.defaultdict(list)
    with open('topics.txt') as f:
        for line in f:
            value, *keys = line.strip().split('~')
            for key in filter(None, keys):
                if key=='earn':
                    d[key].append(folderpath+value+".txt")

    for key, value in d.items() :
        print(value)

    word_count_dict={}

    for file in d.values():
        with open(file,"r") as f:
            words = re.findall(r'\w+', f.read().lower())
            counter = counter + Counter(words)
            for word in words:
                word_count_dict[word].append(counter)

    for word, counts in word_count_dict.values():
        print(word, counts)

Recommended answer

Inspired by the Counter collection that you already use:

    import os
    import re
    from glob import glob
    from collections import Counter

    folderpaths = 'd:/individual-articles'
    counter = Counter()

    # Collect every .txt file in the folder
    filepaths = glob(os.path.join(folderpaths, '*.txt'))
    for file in filepaths:
        with open(file) as f:
            # Lowercase the text and split it into words
            words = re.findall(r'\w+', f.read().lower())
            # Add this file's counts to the running total
            counter = counter + Counter(words)
    print(counter)
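The answer above counts every .txt file in the folder, whereas the question wants only the files listed in 'd' and output of the form the-500 a-78. A few things in the question's code would likely get in the way of that: each value in 'd' is a list of paths rather than a single path, `re` is never imported, and `word_count_dict` is a plain dict, so `word_count_dict[word].append(...)` fails for a word that has no entry yet. The following is a minimal sketch of one way to handle this, not the answerer's own method. The helper names `count_words` and `format_counts`, the UTF-8 encoding, and the two temporary demo files are assumptions made for illustration.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def count_words(paths_by_key):
    """Count words across every file listed in a dict of key -> list of paths."""
    counter = Counter()
    for paths in paths_by_key.values():  # each value is a list of paths
        for path in paths:
            # Assumption: the files are UTF-8 encoded
            text = Path(path).read_text(encoding="utf-8")
            # update() adds this file's counts to the running total in place
            counter.update(re.findall(r"\w+", text.lower()))
    return counter


def format_counts(counter):
    """Render counts as 'word-count' pairs, most frequent first."""
    return " ".join(f"{word}-{count}" for word, count in counter.most_common())


# Demonstration: two temporary files stand in for the real articles.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "9.txt"
    second = Path(tmp) / "11.txt"
    first.write_text("The cat sat in the hat", encoding="utf-8")
    second.write_text("The dog", encoding="utf-8")

    d = {"earn": [str(first), str(second)]}  # same shape as 'd' in the question
    totals = count_words(d)
    print(format_counts(totals))
```

With the real data, the 'd' built from topics.txt in the question could be passed straight to `count_words`. Using `counter.update(...)` instead of `counter = counter + Counter(words)` avoids building a new Counter for every file, which may matter when there are many articles.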

Published: 2023-05-27 09:58:31
Link: https://www.elefans.com/category/jswz/34/285793.html