如何使用Pymongo Aggregation查询此数据集?(How can I use Pymongo Aggregation to inquiry this dataset?)

编程入门 行业动态 更新时间:2024-10-24 23:14:53
如何使用Pymongo Aggregation查询此数据集?(How can I use Pymongo Aggregation to inquiry this dataset?)

这是我第一次使用MongoDB聚合查询。 我的数据集如下:

{ // doc 1 "_id" : ObjectId("55f2481bc9b4cd1c0c198c9f"), "channels" : [ "channel_3", "channel_2", "channel_1", "channel_4" ], "msd" : 25, "uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632" } { // doc 2 "_id" : ObjectId("55f2481bc9b4cd123452345f"), "channels" : [ "channel_3", "channel_4" ], "msd" : 50, "uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632" } { // doc 3 "_id" : ObjectId("55f2481bc9b4cd1c0c198c9f"), "channels" : [ "channel_2" ], "msd" : 100, "uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632" } { // doc 4 "_id" : ObjectId("55f2481bc9b4cd1c0c198c9f"), "channels" : [ "channel_2" ], "msd" : 80, "uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632" }

我已经构建了一个复合索引:

userlog.create_index([('uid', ASCENDING), ('channels', ASCENDING)])

现在, 给定一个用户和一个通道数组,我想检索其中至少有一个通道位于查询通道中的msd的平均值。 例如,查询是:

{"uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632", "channels" : ["channel_1", "channel_2"], }

doc 1的频道包含“channel_1”和“channel_2”,doc 3和4的频道包含“channels_2”。 所以预期的回报值是(25 + 100 + 80)/ 3 = 68.33

======================试用1 =======================

码:

pipe=[ {"$unwind":'$channels'}, {"$match":{'uid':"000012bb-2e5a-8bd3-d36a-fa037973e632", 'channels':{'$in':channels}}}, {"$group":{'_id': '$channels', 'averageMSD':{'$avg':'$msd'}}} ] for res in db.aggregate(pipeline=pipe): print(res)

结果:

{'_id': 'channel_1', 'averageMSD': 25.0} {'_id': 'channel_2', 'averageMSD': 68.33333333333333}

似乎“$ unwind”使得doc 1意外地被计算两次。 另外,“$ unwind”非常慢。

======================试用2 =======================

码:

pipe=[ {"$match":{'uid':"000012bb-2e5a-8bd3-d36a-fa037973e632", 'channels':{'$in':channels}}}, {"$group":{'_id': '$channels', 'averageMSD':{'$avg':'$msd'}}} ] for res in db.aggregate(pipeline=pipe): print(res)

结果:

{'averageMSD': 90.0, '_id': ['channel_2']} {'averageMSD': 25.0, '_id': ['channel_3', 'channel_2', 'channel_1', 'channel_4']}

结果仍然不是我想要的。 似乎我不应该通过“渠道”对结果进行分组。 但我不知道如何解决它。

如何使用聚合有效地查询数据库?

This is my first time use MongoDB aggregation query. The dataset I have is as follow:

{ // doc 1 "_id" : ObjectId("55f2481bc9b4cd1c0c198c9f"), "channels" : [ "channel_3", "channel_2", "channel_1", "channel_4" ], "msd" : 25, "uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632" } { // doc 2 "_id" : ObjectId("55f2481bc9b4cd123452345f"), "channels" : [ "channel_3", "channel_4" ], "msd" : 50, "uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632" } { // doc 3 "_id" : ObjectId("55f2481bc9b4cd1c0c198c9f"), "channels" : [ "channel_2" ], "msd" : 100, "uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632" } { // doc 4 "_id" : ObjectId("55f2481bc9b4cd1c0c198c9f"), "channels" : [ "channel_2" ], "msd" : 80, "uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632" }

I have built a compound index:

userlog.create_index([('uid', ASCENDING), ('channels', ASCENDING)])

Now, given a user and an array of channels, I want to retrieve the average of msd whose at least one channel is in the query channels. For instance, a query is:

{"uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632", "channels" : ["channel_1", "channel_2"], }

The channels of doc 1 contains "channel_1" and "channel_2" and the channels of doc 3 and 4 contains "channels_2". So the expected return value is (25+100+80)/3 = 68.33

====================== TRIAL 1 =======================

CODE:

pipe=[ {"$unwind":'$channels'}, {"$match":{'uid':"000012bb-2e5a-8bd3-d36a-fa037973e632", 'channels':{'$in':channels}}}, {"$group":{'_id': '$channels', 'averageMSD':{'$avg':'$msd'}}} ] for res in db.aggregate(pipeline=pipe): print(res)

RESULT:

{'_id': 'channel_1', 'averageMSD': 25.0} {'_id': 'channel_2', 'averageMSD': 68.33333333333333}

It seems that "$unwind" makes the doc 1 is unexpectedly counted twice. Also, "$unwind" is quite slow.

====================== TRIAL 2 =======================

CODE:

pipe=[ {"$match":{'uid':"000012bb-2e5a-8bd3-d36a-fa037973e632", 'channels':{'$in':channels}}}, {"$group":{'_id': '$channels', 'averageMSD':{'$avg':'$msd'}}} ] for res in db.aggregate(pipeline=pipe): print(res)

RESULT:

{'averageMSD': 90.0, '_id': ['channel_2']} {'averageMSD': 25.0, '_id': ['channel_3', 'channel_2', 'channel_1', 'channel_4']}

The result is still not what I want. It seems that I should not group the result by "channels". But I do not how I can fix it.

How can I use aggregation to inquire the DB efficiently?

最满意答案

我想到了:

pipe=[ {"$match":{'uid':"000012bb-2e5a-8bd3-d36a-fa037973e632", 'channels':{'$in':channels}}}, {"$group":{'_id': None, 'averageMSD':{'$avg':'$msd'}}} ]

I figured it out:

pipe=[ {"$match":{'uid':"000012bb-2e5a-8bd3-d36a-fa037973e632", 'channels':{'$in':channels}}}, {"$group":{'_id': None, 'averageMSD':{'$avg':'$msd'}}} ]

更多推荐

本文发布于:2023-08-02 14:43:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1376926.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:如何使用   数据   Pymongo   Aggregation   inquiry

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!