This is a follow-up question to this one:

Python DictReader - Skipping rows with missing columns?

Turns out I was being silly, and using the wrong ID field.

I'm using Python 3.x here, btw.

I have a dict of employees, indexed by a string, "directory_id". Each value is a nested dict with employee attributes (phone number, surname etc.). One of these values is a secondary ID, say "internal_id", and another is their manager, call it "manager_internal_id". The "internal_id" field is non-mandatory, and not every employee has one.

    {'6443410501': {'manager_internal_id': '989634', 'givenName': 'Mary', 'phoneNumber': '+65 3434 3434', 'sn': 'Jones', 'internal_id': '434214'},
     '8117062158': {'manager_internal_id': '180682', 'givenName': 'John', 'phoneNumber': '+65 3434 3434', 'sn': 'Ashmore', 'internal_id': ''},
     '9227629067': {'manager_internal_id': '347394', 'givenName': 'Wright', 'phoneNumber': '+65 3434 3434', 'sn': 'Earl', 'internal_id': '257839'},
     '1724696976': {'manager_internal_id': '907239', 'givenName': 'Jane', 'phoneNumber': '+65 3434 3434', 'sn': 'Bronte', 'internal_id': '629067'}}


(I've simplified the fields a little, both to make it easier to read, and also for privacy/compliance reasons).

The issue here is that we index (key) each employee by their directory_id, but when we look up their manager, we need to find managers by their "internal_id".

Before, when our dict was using internal_id as the key, employee.keys() was a list of internal_ids, and I was using a membership check on this. Now, the last part of my if statement won't work, since each internal_id is part of a dict value instead of being the key itself.

    def lookup_supervisor(manager_internal_id, employees):
        if manager_internal_id is not None and manager_internal_id != "" and manager_internal_id in employees.keys():
            return (employees[manager_internal_id]['mail'], employees[manager_internal_id]['givenName'], employees[manager_internal_id]['sn'])
        else:
            return ('Supervisor Not Found', 'Supervisor Not Found', 'Supervisor Not Found')

So the first question is, how do I fix the if statement to check whether the manager_internal_id is present in the dict's list of internal_ids?

I've tried substituting employee.keys() with employee.values(), that didn't work. Also, I'm hoping for something a little more efficient, not sure if there's a way to get a subset of the values, specifically, all the entries for employees[directory_id]['internal_id'].

Hopefully there's some Pythonic way of doing this, without using a massive heap of nested for/if loops.

My second question is, how do I then cleanly return the required employee attributes (mail, givenName, sn etc.)? My for loop is iterating over each employee, and calling lookup_supervisor. I'm feeling a bit stupid/stumped here.

    def tidy_data(employees):
        for directory_id, data in employees.items():
            # We really shouldn't be passing employees back and forth like this - hmm, classes?
            data['SupervisorEmail'], data['SupervisorFirstName'], data['SupervisorSurname'] = lookup_supervisor(data['manager_internal_id'], employees)

Should I redesign my data-structure? Or is there another way?

EDIT: I've tweaked the code slightly, see below:

    import csv
    import re
    import sys

    class Employees:
        def import_gd_dump(self, input_file="test.csv"):
            gd_extract = csv.DictReader(open(input_file), dialect='excel')
            self.employees = {row['directory_id']:row for row in gd_extract}

        def write_gd_formatted(self, output_file="gd_formatted.csv"):
            gd_output_fieldnames = ('internal_id', 'mail', 'givenName', 'sn', 'dbcostcenter', 'directory_id', 'manager_internal_id', 'PHFull', 'PHFull_message', 'SupervisorEmail', 'SupervisorFirstName', 'SupervisorSurname')
            try:
                gd_formatted = csv.DictWriter(open(output_file, 'w', newline=''), fieldnames=gd_output_fieldnames, extrasaction='ignore', dialect='excel')
            except IOError:
                print('Unable to open file, IO error (Is it locked?)')
                sys.exit(1)

            headers = {n:n for n in gd_output_fieldnames}
            gd_formatted.writerow(headers)
            for internal_id, data in self.employees.items():
                gd_formatted.writerow(data)

        def tidy_data(self):
            for directory_id, data in self.employees.items():
                data['PHFull'], data['PHFull_message'] = self.clean_phone_number(data['telephoneNumber'])
                data['SupervisorEmail'], data['SupervisorFirstName'], data['SupervisorSurname'] = self.lookup_supervisor(data['manager_internal_id'])

        def clean_phone_number(self, original_telephone_number):
            standard_format = re.compile(r'^\+(?P<intl_prefix>\d{2})\((?P<area_code>\d)\)(?P<local_first_half>\d{4})-(?P<local_second_half>\d{4})')
            extra_zero = re.compile(r'^\+(?P<intl_prefix>\d{2})\(0(?P<area_code>\d)\)(?P<local_first_half>\d{4})-(?P<local_second_half>\d{4})')
            missing_hyphen = re.compile(r'^\+(?P<intl_prefix>\d{2})\(0(?P<area_code>\d)\)(?P<local_first_half>\d{4})(?P<local_second_half>\d{4})')
            if standard_format.search(original_telephone_number):
                result = standard_format.search(original_telephone_number)
                return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), ''
            elif extra_zero.search(original_telephone_number):
                result = extra_zero.search(original_telephone_number)
                return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), 'Extra zero in area code - ask user to remediate. '
            elif missing_hyphen.search(original_telephone_number):
                result = missing_hyphen.search(original_telephone_number)
                return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), 'Missing hyphen in local component - ask user to remediate. '
            else:
                return '', "Number didn't match format. Original text is: " + original_telephone_number

        def lookup_supervisor(self, manager_internal_id):
            if manager_internal_id is not None and manager_internal_id != "":# and manager_internal_id in self.employees.values():
                return (employees[manager_internal_id]['mail'], employees[manager_internal_id]['givenName'], employees[manager_internal_id]['sn'])
            else:
                return ('Supervisor Not Found', 'Supervisor Not Found', 'Supervisor Not Found')

    if __name__ == '__main__':
        our_employees = Employees()
        our_employees.import_gd_dump('test.csv')
        our_employees.tidy_data()
        our_employees.write_gd_formatted()

I guess (1) I'm looking for a better way to structure/store Employee/Employees, and (2) I'm having issues in particular with lookup_supervisor().

Should I be creating an Employee Class, and nesting these inside Employees?

And should I even be doing what I'm doing with tidy_data(), calling clean_phone_number() and lookup_supervisor() in a for loop over the dict's items? Urgh, confused.


My Python skills are poor, so I am far too ignorant to write out what I have in mind in any kind of reasonable time. But I do know how to do OO decomposition.

Why does the Employees class do all the work? There are several types of things that your monolithic Employees class does:

  • Read and write data from a file - aka serialization
  • Manage and access data from individual employees
  • Manage relationships between employees.

I suggest that you create a class to handle each task group listed.

Define an Employee class to keep track of employee data and handle field processing/tidying tasks.

Use the Employees class as a container for employee objects. It can handle tasks like tracking down an Employee's supervisor.
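A rough sketch of how those two classes might look, in case it helps. The field names come from the question's data; the method names `add` and `supervisor_of`, the use of a dataclass, and the secondary index on internal_id are my own assumptions, not something the question's code already has. Mary's manager ID is changed here so that it points at Jane, purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Employee:
    """One employee record; field names follow the question's columns."""
    directory_id: str
    internal_id: str = ""
    manager_internal_id: str = ""
    mail: str = ""
    givenName: str = ""
    sn: str = ""


class Employees:
    """Container keyed by directory_id, with a second index on internal_id."""

    def __init__(self) -> None:
        self._by_directory_id: Dict[str, Employee] = {}
        self._by_internal_id: Dict[str, Employee] = {}

    def add(self, employee: Employee) -> None:
        self._by_directory_id[employee.directory_id] = employee
        # internal_id is optional, so only index employees that have one.
        if employee.internal_id:
            self._by_internal_id[employee.internal_id] = employee

    def supervisor_of(self, employee: Employee) -> Optional[Employee]:
        """Return the manager's Employee, or None if it cannot be resolved."""
        if not employee.manager_internal_id:
            return None
        return self._by_internal_id.get(employee.manager_internal_id)


staff = Employees()
# Illustrative data: Mary's manager ID is altered to match Jane's internal_id.
mary = Employee("6443410501", internal_id="434214", manager_internal_id="629067",
                givenName="Mary", sn="Jones")
jane = Employee("1724696976", internal_id="629067", manager_internal_id="907239",
                givenName="Jane", sn="Bronte")
john = Employee("8117062158", internal_id="", manager_internal_id="180682",
                givenName="John", sn="Ashmore")
for person in (mary, jane, john):
    staff.add(person)

boss = staff.supervisor_of(mary)  # Jane's Employee object
```

Because the second dict is keyed by internal_id, the membership check from the question turns into a single dict lookup instead of a scan over every value.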

Define a virtual base class EmployeeLoader to define an interface (load, store, ?? ). Then implement a subclass for CSV file serialization. (The virtual base class is optional--I'm not sure how Python handles virtual classes, so this may not even make sense.)
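For what it's worth, the closest thing Python has to a virtual base class is an abstract base class from the standard `abc` module. Below is a minimal sketch under that assumption. The `load`/`store` signatures, the `Record` alias, and the choice to pass plain dict rows around are guesses of mine to keep the example short; the CSV calls mirror the ones already used in the question.

```python
import csv
from abc import ABC, abstractmethod
from typing import Dict, List

Record = Dict[str, str]  # one CSV row, column name -> value


class EmployeeLoader(ABC):
    """Interface that every serialization format is expected to implement."""

    @abstractmethod
    def load(self) -> List[Record]:
        """Read all employee records from the backing store."""

    @abstractmethod
    def store(self, records: List[Record]) -> None:
        """Write all employee records to the backing store."""


class EmployeeCSVLoader(EmployeeLoader):
    """CSV implementation of the loader interface."""

    def __init__(self, path: str, fieldnames: List[str]) -> None:
        self.path = path
        self.fieldnames = fieldnames

    def load(self) -> List[Record]:
        with open(self.path, newline="") as handle:
            return list(csv.DictReader(handle, dialect="excel"))

    def store(self, records: List[Record]) -> None:
        with open(self.path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=self.fieldnames,
                                    extrasaction="ignore", dialect="excel")
            writer.writeheader()
            writer.writerows(records)
```

Trying to instantiate `EmployeeLoader` directly raises a `TypeError`, which is what makes the base class an interface rather than a usable loader.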


  • Create an instance of EmployeeCSVLoader with a file name to work with.
  • The loader can then build an Employees object and parse the file.
  • As each record is read, a new Employee object will be created and stored in the Employees object.
  • Now ask the Employees object to populate supervisor links.
  • Iterate over the Employees object's collection of employees and ask each one to tidy itself.
  • Finally, let the serialization object handle updating the data file.
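The steps above could be strung together roughly like this. It is only a sketch: to keep it runnable without a file on disk, the loader here reads and writes CSV text in memory rather than taking a file name, and the method names `tidy`, `link_supervisors`, `load`, and `store`, along with the sample rows and e-mail addresses, are made up for illustration.

```python
import csv
import io
from typing import Dict, List, Optional


class Employee:
    def __init__(self, record: Dict[str, str]) -> None:
        self.record = record
        self.supervisor: Optional["Employee"] = None

    def tidy(self) -> None:
        # Stand-in for the field clean-up (phone numbers etc.) in the question:
        # copy the supervisor's details into this employee's own record.
        found = self.supervisor.record if self.supervisor else {}
        for column, source in (("SupervisorEmail", "mail"),
                               ("SupervisorFirstName", "givenName"),
                               ("SupervisorSurname", "sn")):
            self.record[column] = found.get(source, "Supervisor Not Found")


class Employees:
    def __init__(self) -> None:
        self.members: List[Employee] = []

    def add(self, employee: Employee) -> None:
        self.members.append(employee)

    def link_supervisors(self) -> None:
        # Index only employees that actually have an internal_id.
        by_internal_id = {e.record["internal_id"]: e
                          for e in self.members if e.record.get("internal_id")}
        for e in self.members:
            e.supervisor = by_internal_id.get(e.record.get("manager_internal_id"))


class EmployeeCSVLoader:
    def load(self, text: str) -> Employees:
        employees = Employees()
        for row in csv.DictReader(io.StringIO(text)):
            employees.add(Employee(row))
        return employees

    def store(self, employees: Employees, fieldnames: List[str]) -> str:
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(e.record for e in employees.members)
        return out.getvalue()


# Made-up sample data; the addresses are placeholders.
SAMPLE = (
    "directory_id,internal_id,manager_internal_id,mail,givenName,sn\n"
    "1724696976,629067,,jane@example.com,Jane,Bronte\n"
    "6443410501,434214,629067,mary@example.com,Mary,Jones\n"
)

loader = EmployeeCSVLoader()                 # create the loader
staff = loader.load(SAMPLE)                  # build Employees, one Employee per record
staff.link_supervisors()                     # populate supervisor links
for employee in staff.members:               # ask each employee to tidy itself
    employee.tidy()
output = loader.store(staff, ["directory_id", "sn", "SupervisorEmail"])
```

The driver code at the bottom never touches a dict key or a CSV column directly, which is the payoff of splitting the work this way.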

Why is this design worth the effort?

It makes things easier to understand. It is easier to create clean, consistent APIs for smaller, task-focused objects.

If you find that you need an XML serialization format, it becomes trivial to add the new format. Subclass your virtual loader class to handle the XML parsing/generation. Now you can seamlessly move between CSV and XML formats.

In summary, use objects to simplify and structure your data. Section off common data and behaviors into separate classes. Keep each class tightly focused on a single type of ability. If your class is a collection, accessor, factory, kitchen sink, the API can never be usable: it will be too big and loaded with dissimilar groups of methods. But if your classes stay on topic, they will be easy to test, maintain, use, reuse, and extend.



