Creating a dynamic nested counting dictionary

Updated: 2024-10-18 18:22:29

Problem description

I have a text file abc.txt:

abc/pqr/lmn/xyz:pass
abc/pqr/lmn/bcd:pass

I need to parse these lines and produce a nested dictionary like this:

{'abc':{'pqr':{'lmn':{'xyz':{'pass':1},'bcd':{'pass':1}}}}}

where 1 is the count of "pass".

This is as far as I have got:

import re

d = {}
p = re.compile('[a-zA-Z]+')
for line in open('abc.txt'):
    for key in p.findall(line):
        # Incomplete attempt: this always writes to the literal key 'key'.
        d['key'] = {}

Recommended answer

Here's an updated version of my answer in which the leaves of the tree data structure are different from the rest of it. Instead of the tree being strictly a dict of nested dicts, the "leaves" on each branch are now instances of a different dict subclass, collections.Counter, which is useful for counting the number of times each of its keys occurs. I did this because of your response to my question about what should happen if the last part of a line was something other than ":pass" (your response was "we have to put new count for that key").
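As a small illustration of why a Counter suits the leaves, the sketch below counts a few statuses by hand. The variable name leaf and the sample statuses are assumptions made for this example only.

```python
from collections import Counter

# A Counter is a dict subclass that maps each key to how often it was seen.
leaf = Counter(["pass"])   # the first occurrence creates the count
leaf.update(["fail"])      # a new key starts its own count
leaf.update(["pass"])      # an existing key is incremented

print(dict(leaf))  # {'pass': 2, 'fail': 1}
```

Because the leaf is still a dict, it nests inside the tree like any other branch.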

Nested dictionaries are often called tree data structures and can be defined recursively: the root is a dictionary, and so are the branches. The following uses a dict subclass instead of a plain dict because it makes constructing the tree easier: you don't need to special-case the creation of the first branch at the next level down (except that I still do so when adding the "leaves", because they are a different subclass, collections.Counter).

from collections import Counter
from functools import reduce
import re

# (Optional) trick to make Counter subclass print like a regular dict.
class Counter(Counter):
    def __repr__(self):
        return dict(self).__repr__()

# Borrowed from answer @ stackoverflow/a/19829714/355230
class Tree(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

# Utility functions based on answer @ stackoverflow/a/14692747/355230
def nested_dict_get(nested_dict, keys):
    return reduce(lambda d, k: d[k], keys, nested_dict)

def nested_dict_set(nested_dict, keys, value):
    nested_dict_get(nested_dict, keys[:-1])[keys[-1]] = value

def nested_dict_update_count(nested_dict, keys):
    counter = nested_dict_get(nested_dict, keys[:-1])
    if counter:  # Update existing Counter.
        counter.update([keys[-1]])
    else:  # Create a new Counter.
        nested_dict_set(nested_dict, keys[:-1], Counter([keys[-1]]))

d = Tree()
pat = re.compile(r'[a-zA-Z]+')
with open('abc.txt') as file:
    for line in file:
        keys = pat.findall(line)
        if keys:  # Skip blank lines.
            nested_dict_update_count(d, keys)

print(d)  # Prints like a regular dict.
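To see in isolation what the __missing__ hook does, here is a minimal sketch that repeats only the Tree class and indexes a few keys that do not exist yet. The variable name t is an assumption for this example.

```python
class Tree(dict):
    """dict subclass that creates a nested Tree for any missing key."""
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

t = Tree()
t["abc"]["pqr"]["lmn"]  # each lookup of a missing key creates a branch

print(t)  # {'abc': {'pqr': {'lmn': {}}}}
```

Only subscript access triggers __missing__; membership tests such as "xyz" in t and calls to t.get("xyz") do not create new branches.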

To test the leaf-counting capabilities of the revised code, I used the following test file, which repeats the xyz line twice more: once ending in :pass again and once ending in :fail.

Expanded abc.txt test file:

abc/pqr/lmn/xyz:pass
abc/pqr/lmn/bcd:pass
abc/pqr/lmn/xyz:fail
abc/pqr/lmn/xyz:pass

Output (the order of the keys may differ depending on the Python version):

{'abc': {'pqr': {'lmn': {'bcd': {'pass': 1}, 'xyz': {'fail': 1, 'pass': 2}}}}}
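For comparison, the same result can probably be reached with plain dicts and dict.setdefault instead of a dict subclass. The sketch below is a different technique from the answer's Tree class, not a restatement of it. The function name count_paths is hypothetical, the lines are read from an in-memory list rather than from abc.txt, and it assumes a path is never used both as a branch and as a leaf.

```python
import re
from collections import Counter

def count_paths(lines):
    """Build a nested dict whose leaves count the last word of each line."""
    pattern = re.compile(r"[a-zA-Z]+")
    root = {}
    for line in lines:
        words = pattern.findall(line)
        if len(words) < 2:  # skip blank or malformed lines
            continue
        *path, status = words
        node = root
        for part in path[:-1]:
            # Walk down, creating plain dict branches as needed.
            node = node.setdefault(part, {})
        # The last path component holds a Counter of statuses.
        node.setdefault(path[-1], Counter())[status] += 1
    return root

lines = [
    "abc/pqr/lmn/xyz:pass",
    "abc/pqr/lmn/bcd:pass",
    "abc/pqr/lmn/xyz:fail",
    "abc/pqr/lmn/xyz:pass",
]
result = count_paths(lines)
print(result["abc"]["pqr"]["lmn"]["xyz"]["pass"])  # 2
```

The trade-off is that setdefault keeps the tree made of ordinary dicts, so reading a missing key raises KeyError instead of silently creating a branch.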


Published: 2023-11-22 05:50:49

Article link: https://www.elefans.com/category/jswz/34/1616205.html