List of dictionaries — tracking word frequencies per file

Programming Basics · Industry News · Updated: 2024-10-28 19:25:25
This article introduces a list-of-dictionaries approach to tracking word frequencies per file, which may serve as a useful reference for anyone tackling a similar problem.

Problem description

I have written some code to count word frequency in multiple text files and store them in a dictionary.

I have been trying to find a method to keep a running total per file of counts for each word in a form something like:

word1 [1] [20] [30] [22]
word2 [5] [7] [0] [4]

I have tried using counters but I've not been able to find an appropriate method/data structure for this yet.

import string
from collections import defaultdict
from collections import Counter
import glob
import os

# Words to remove
noise_words_set = {'the', 'to', 'of', 'a', 'in', 'is', ...etc...}

# Find files
path = r"C:\Users\Logs"
os.chdir(path)
print("Processing files...")
for file in glob.glob("*.txt"):
    # Read file
    txt = open("{}\{}".format(path, file), 'r', encoding="utf8").read()

    # Remove punctuation
    for punct in string.punctuation:
        txt = txt.replace(punct, "")

    # Split into words and make lower case
    words = [item.lower() for item in txt.split()]

    # Remove uninteresting words
    words = [w for w in words if w not in noise_words_set]

    # Make a dictionary of words
    D = defaultdict(int)
    for word in words:
        D[word] += 1

    # Add to some data structure (?) that keeps count per file
    # ...word1 [1] [20] [30] [22]
    # ...word2 [5] [7] [0] [4]
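As an aside, the `defaultdict(int)` loop above is equivalent to building a `collections.Counter` directly, and a `Counter` returns 0 for any word it has never seen, which is exactly what a per-file tally needs. A minimal sketch with made-up words:

```python
from collections import Counter

words = ["apple", "banana", "apple", "cherry", "apple"]

counts = Counter(words)   # same result as the defaultdict(int) loop
print(counts["apple"])    # → 3
print(counts["missing"])  # → 0 (no KeyError, unlike a plain dict)
```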

Solution

Using almost your entire structure!

import glob
import string
from collections import Counter

files = dict()  # this may be better as a list, tbh
table = str.maketrans('', '', string.punctuation)
for file in glob.glob("*.txt"):
    with open(file) as f:
        word_count = Counter()
        for line in f:
            word_count += Counter([word.lower() for word in line.translate(table).split()
                                   if word.lower() not in noise_words_set])
        files[file] = word_count  # if list: files.append(word_count)
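Because each per-file tally is a `Counter`, you can already read any word's count across files, with 0 for files that never contain it, without any extra bookkeeping. A sketch with an illustrative stand-in for the `files` dict built above:

```python
from collections import Counter

# Illustrative stand-in for the `files` dict built above
files = {
    "a.txt": Counter({"word1": 1, "word2": 5}),
    "b.txt": Counter({"word1": 20, "word2": 7}),
    "c.txt": Counter({"word1": 30}),  # no word2 in this file at all
    "d.txt": Counter({"word1": 22, "word2": 4}),
}

row = [files[f]["word2"] for f in sorted(files)]
print(row)  # → [5, 7, 0, 4], matching the desired "word2 [5] [7] [0] [4]"
```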

If you want them translated to some dictionary, do this afterwards

words_count = dict()
for word_count in files.values():
    for word, value in word_count.items():
        try:
            words_count[word].append(value)
        except KeyError:
            words_count[word] = [value]
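One caveat with the try/except version: it only appends counts for files that actually contain the word, so a file with zero occurrences contributes nothing and the lists end up with different lengths. If you need a 0 entry for every absent file (as in `word2 [5] [7] [0] [4]`), build the lists from the full vocabulary instead. A sketch, assuming `files` maps filename to `Counter` as above (the data here is illustrative):

```python
from collections import Counter

# Illustrative stand-in for the `files` dict built earlier
files = {
    "a.txt": Counter({"word1": 1, "word2": 5}),
    "b.txt": Counter({"word1": 20}),  # word2 absent -> should count as 0
}

vocab = set().union(*files.values())  # iterating a Counter yields its keys
order = sorted(files)                 # fix a file order so list positions line up
words_count = {word: [files[f][word] for f in order] for word in vocab}
print(words_count["word2"])  # → [5, 0]
```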

This article was published on 2023-11-12 02:50:09.
Original link: https://www.elefans.com/category/jswz/34/1580290.html