将SQL查询的结果转换为紧凑的Python表单(Turning results from SQL query into compact Python form)

我在Postgres中有一个看起来像这样的数据库模式（伪代码）：

users (table): pk (field, unique) name (field) permissions (table): pk (field, unique) permission (field, unique) addresses (table): pk (field, unique) address (field, unique) association1 (table): user_pk (field, foreign_key) permission_pk (field, foreign_key) association2 (table): user_pk (field, foreign_key) address_pk (field, foreign_key)

希望这具有直观意义。它是一个与权限表具有多对多关系的用户表，以及与地址表的多对多关系。

在Python中，当我执行正确的SQLAlchemy查询咒语时，我得到的结果看起来像这样（在将它们转换为Python中的字典列表之后）：

results = [ {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'home'}, {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'work'}, {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'home'}, {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'work'}, {'pk': 2, 'name': 'John', 'permission': 'user', 'address': 'home'}, ]

所以在这个人为的例子中，Joe既是用户又是管理员。约翰只是一个用户。 Joe的家庭和工作地址都存在于数据库中。只有John的家庭住址存在。

所以问题是，是否有人知道从这些SQL查询“结果”到下面更紧凑的“desired_results”的最佳方式？

desired_results = [ { 'pk': 1, 'name': 'Joe', 'permissions': ['user', 'admin'], 'addresses': ['home', 'work'] }, { 'pk': 2, 'name': 'John', 'permissions': ['user'], 'addresses': ['home'] }, ]

所需的其他信息：描述“标签”的小词典列表，我希望在具有多对多关系的每个字段的desired_results中使用这些词典。

relationships = [ {'label': 'permissions', 'back_populates': 'permission'}, {'label': 'addresses', 'back_populates': 'address'}, ]

最后的考虑，我已经为这个问题的目的提出了一个具体的例子，但总的来说，我试图解决一般查询SQL数据库的问题，假设任意数量的关系。 SQLAlchemy ORM很好地解决了这个问题，但我只能使用SQLAlchemy Core; 所以我想建立自己的解决方案。

更新

这是一个答案，但我不确定它是最好/最有效的解决方案。任何人都能想出更好的东西吗？

# step 1: generate set of keys that will be replaced by new keys in desired_result back_populates = set(rel['back_populates'] for rel in relationships) # step 2: delete from results keys generated in step 1 intermediate_results = [ {k: v for k, v in res.items() if k not in back_populates} for res in results] # step 3: eliminate duplicates intermediate_results = [ dict(t) for t in set([tuple(ires.items()) for ires in intermediate_results])] # step 4: add back information from deleted fields but in desired form for ires in intermediate_results: for rel in relationships: ires[rel['label']] = set([ res[rel['back_populates']] for res in results if res['pk'] == ires['pk']]) # done desired_results = intermediate_results

I have a database schema in Postgres that looks like this (in pseudo code):

users (table): pk (field, unique) name (field) permissions (table): pk (field, unique) permission (field, unique) addresses (table): pk (field, unique) address (field, unique) association1 (table): user_pk (field, foreign_key) permission_pk (field, foreign_key) association2 (table): user_pk (field, foreign_key) address_pk (field, foreign_key)

Hopefully this makes intuitive sense. It's a users table that has a many-to-many relationship with a permissions table as well as a many-to-many relationship with an addresses table.

In Python, when I perform the correct SQLAlchemy query incantations, I get back results that look something like this (after converting them to a list of dictionaries in Python):

results = [ {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'home'}, {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'work'}, {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'home'}, {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'work'}, {'pk': 2, 'name': 'John', 'permission': 'user', 'address': 'home'}, ]

So in this contrived example, Joe is both a user and and an admin. John is only a user. Both Joe's home and work addresses exist in the database. Only John's home address exists.

So the question is, does anybody know the best way to go from these SQL query 'results' to the more compact 'desired_results' below?

desired_results = [ { 'pk': 1, 'name': 'Joe', 'permissions': ['user', 'admin'], 'addresses': ['home', 'work'] }, { 'pk': 2, 'name': 'John', 'permissions': ['user'], 'addresses': ['home'] }, ]

Additional information required: Small list of dictionaries describing the 'labels' I would like to use in the desired_results for each of the fields that have many-to-many relationships.

relationships = [ {'label': 'permissions', 'back_populates': 'permission'}, {'label': 'addresses', 'back_populates': 'address'}, ]

Final consideration, I've put together a concrete example for the purposes of this question, but in general I'm trying to solve the problem of querying SQL databases in general, assuming an arbitrary amount of relationships. SQLAlchemy ORM solves this problem well, but I'm limited to using SQLAlchemy Core; so am trying to build my own solution.

Update

Here's an answer, but I'm not sure it's the best / most efficient solution. Can anyone come up with something better?

# step 1: generate set of keys that will be replaced by new keys in desired_result back_populates = set(rel['back_populates'] for rel in relationships) # step 2: delete from results keys generated in step 1 intermediate_results = [ {k: v for k, v in res.items() if k not in back_populates} for res in results] # step 3: eliminate duplicates intermediate_results = [ dict(t) for t in set([tuple(ires.items()) for ires in intermediate_results])] # step 4: add back information from deleted fields but in desired form for ires in intermediate_results: for rel in relationships: ires[rel['label']] = set([ res[rel['back_populates']] for res in results if res['pk'] == ires['pk']]) # done desired_results = intermediate_results

最满意答案

迭代部分条目组看起来像是itertools.groupby的工作。

但首先让relationships变成一种更容易使用的格式，或许是一个back_populates:label字典？

conversions = {d["back_populates"]:d['label'] for d in relationships}

接下来因为我们将使用itertools.groupby ，它将需要一个keyfunc来区分不同的条目组。因此，从初始results给出一个条目，此函数将返回仅包含不会被压缩/转换的对的字典

def grouper(entry): #each group is identified by all key:values that are not identified in conversions return {k:v for k,v in entry.items() if k not in conversions}

现在我们将能够以类似的方式遍历组中的results ：

for base_info, group in itertools.groupby(old_results, grouper): #base_info is dict with info unique to all entries in group for partial in group: #partial is one entry from results that will contribute to the final result #but wait, what do we add it too?

唯一的问题是，如果我们从base_info构建我们的entry ，它将混淆groupby所以我们需要创建一个entry来使用：

entry = {new_field:set() for new_field in conversions.values()} entry.update(base_info)

请注意，我在这里使用set ，因为它们是所有竞争都是唯一的自然容器，但是因为它不是json兼容的，所以我们需要在最后将它们更改为列表。

现在我们有一个构建条目，我们可以遍历该group以add到original每个new字段

for partial in group: for original, new in conversions.items(): entry[new].add(partial[original])

然后，一旦构造了最后的条目，剩下的就是将set s转换回list s

for new in conversions.values(): entry[new] = list(entry[new])

并且该条目已完成，现在我们可以append其append到名为new_results的列表中，但由于此过程实际上是生成结果，因此将它放入generator会更有意义，使最终代码看起来像这样：

import itertools results = [ {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'home'}, {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'work'}, {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'home'}, {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'work'}, {'pk': 2, 'name': 'John', 'permission': 'user', 'address': 'home'}, ] relationships = [ {'label': 'permissions', 'back_populates': 'permission'}, {'label': 'addresses', 'back_populates': 'address'}, ] #first we put the "relationships" in a format that is much easier to use. conversions = {d["back_populates"]:d['label'] for d in relationships} def grouper(entry): #each group is identified by all key:values that are not identified in conversions return {k:v for k,v in entry.items() if k not in conversions} def parse_results(old_results, conversions=conversions): for base_info, group in itertools.groupby(old_results, grouper): entry = {new_field:set() for new_field in conversions.values()} entry.update(base_info) for partial in group: #for each entry in the original results set for original, new in conversions.items(): #for each field that will be condensed entry[new].add(partial[original]) #convert sets back to lists so it can be put back into json for new in conversions.values(): entry[new] = list(entry[new]) yield entry

然后new_results可以这样得到：

>>> new_results = list(parse_results(results)) >>> from pprint import pprint #for demo purpose >>> pprint(new_results,width=50) [{'addresses': ['home', 'work'], 'name': 'Joe', 'permissions': ['admin', 'user'], 'pk': 1}, {'addresses': ['home'], 'name': 'John', 'permissions': ['user'], 'pk': 2}]

Iterating over the groups of partial entries looks like a job for itertools.groupby.

But first lets put relationships into a format that is easier to use, prehaps a back_populates:label dictionary?

conversions = {d["back_populates"]:d['label'] for d in relationships}

Next because we will be using itertools.groupby it will need a keyfunc to distinguish between the different groups of entries. So given one entry from the initial results, this function will return a dictionary with only the pairs that will not be condensed/converted

def grouper(entry): #each group is identified by all key:values that are not identified in conversions return {k:v for k,v in entry.items() if k not in conversions}

Now we will be able to traverse the results in groups something like this:

for base_info, group in itertools.groupby(old_results, grouper): #base_info is dict with info unique to all entries in group for partial in group: #partial is one entry from results that will contribute to the final result #but wait, what do we add it too?

The only issue is that if we build our entry from base_info it will confuse groupby so we need to make an entry to work with:

entry = {new_field:set() for new_field in conversions.values()} entry.update(base_info)

Note that I am using sets here because they are the natural container when all contence are unique, however because it is not json-compatible we will need to change them into lists at the end.

Now that we have an entry to build we can just iterate through the group to add to each new field from the original

for partial in group: for original, new in conversions.items(): entry[new].add(partial[original])

then once the final entry is constructed all that is left is to convert the sets back into lists

for new in conversions.values(): entry[new] = list(entry[new])

And that entry is done, now we can either append it to a list called new_results but since this process is essentially generating results it would make more sense to put it into a generator making the final code look something like this:

import itertools results = [ {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'home'}, {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'work'}, {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'home'}, {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'work'}, {'pk': 2, 'name': 'John', 'permission': 'user', 'address': 'home'}, ] relationships = [ {'label': 'permissions', 'back_populates': 'permission'}, {'label': 'addresses', 'back_populates': 'address'}, ] #first we put the "relationships" in a format that is much easier to use. conversions = {d["back_populates"]:d['label'] for d in relationships} def grouper(entry): #each group is identified by all key:values that are not identified in conversions return {k:v for k,v in entry.items() if k not in conversions} def parse_results(old_results, conversions=conversions): for base_info, group in itertools.groupby(old_results, grouper): entry = {new_field:set() for new_field in conversions.values()} entry.update(base_info) for partial in group: #for each entry in the original results set for original, new in conversions.items(): #for each field that will be condensed entry[new].add(partial[original]) #convert sets back to lists so it can be put back into json for new in conversions.values(): entry[new] = list(entry[new]) yield entry

Then the new_results can be gotten like this:

>>> new_results = list(parse_results(results)) >>> from pprint import pprint #for demo purpose >>> pprint(new_results,width=50) [{'addresses': ['home', 'work'], 'name': 'Joe', 'permissions': ['admin', 'user'], 'pk': 1}, {'addresses': ['home'], 'name': 'John', 'permissions': ['user'], 'pk': 2}]

更多推荐