Mongo查询按不同计数排序

编程入门 行业动态 更新时间:2024-10-11 07:30:57
本文介绍了Mongo查询按不同计数排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

我有两个字段公司"和网址".我想按不同公司"发生的次数对其进行排序,然后显示与该特定公司相对应的三个网址".数据是这样存储的:

I have two fields 'company' and 'url'. I want to sort it by number of times distinct 'company' is occurring and then display three 'url' corresponding to that particular company. Data is stored like this:

{ "_id" : ObjectId("56c4f73664af6f7305f3670f"), "title" : "Full Stack Software Developer", "url" : "www.indeed/cmp/Upside-Commerce,-Inc./jobs/Full-Stack-Software-Developer-6e93e36ea5d0e57e?sjdu=QwrRXKrqZ3CNX5W-O9jEvRQls7y2xdBHzhqWkvhd5FFfs8wS9wesfMWXjNNFaUXen2pO-kyc_Qbr7-_3Gf40AvyEQT3jn6IRxIwvw9-aFy8", "company" : "Upside Commerce, Inc." }

以下查询计算不同公司的数量.

following query counts the number of distinct companies.

db.Books.aggregate({$group : { _id : '$company', count : {$sum : 1}}})

以下是输出:

{ "_id" : "Microsoft", "count" : 14 } { "_id" : "Tableau", "count" : 64 } { "_id" : "Amazon", "count" : 64 } { "_id" : "Dropbox", "count" : 64 } { "_id" : "Amazon Corporate LLC", "count" : 64 } { "_id" : "Electronic Arts", "count" : 64 } { "_id" : "CDK Global", "count" : 65 } { "_id" : "IDC Technologies", "count" : 64 } { "_id" : "Concur", "count" : 64 } { "_id" : "Microsoft", "count" : 14 } { "_id" : "Tableau", "count" : 64 } { "_id" : "Amazon", "count" : 64 } { "_id" : "Dropbox", "count" : 64 } { "_id" : "Amazon Corporate LLC", "count" : 64 } { "_id" : "Electronic Arts", "count" : 64 } { "_id" : "CDK Global", "count" : 65 } { "_id" : "IDC Technologies", "count" : 64 } { "_id" : "Concur", "count" : 64 }

但是我希望它按不同公司的数量排序(将其限制为前 10 名出现率最高的公司),然后显示与不同公司相对应的三个网址(如果不同公司的计数至少为三个).比如:

However I want it sort by count of distinct companies (limit it to Top 10 highest occurring companies) and then display three urls corresponding to distinct company (if count for distinct company is atleast three). Something like:

{for microsoft: {"url" : "careers.microsoft/jobdetails.aspx?jid=216571&memid=1071484607&utm_source=Indeed"} {"url" : "careers.microsoft/jobdetails.aspx?jid=216571&memid=1695844082&utm_source=Indeed" } { "url" : "careers.microsoft/jobdetails.aspx?jid=216571&memid=932148152&utm_source=Indeed"}}

其他公司也一样

推荐答案

这确实(仍然)最好由多个查询处理,因为 MongoDB 确实仍然"没有真正有效的运算符来执行此操作.

This really is (still) best handled by multiple queries, since MongoDB really "still" does not have the really efficient operators to do this yet.

你可以用 MongoDB 3.2 做这样的事情,但是有明显的问题":

You can do something like this with MongoDB 3.2 though, but there are obvious "catches":

db.Books.aggregate([ { "$group": { "_id": "$company", "count": { "$sum": 1 }, "urls": { "$push": "$url" } }}, { "$sort": { "count": -1 } }, { "$limit": 10 }, { "$project": { "count": 1, "urls": { "$slice": ["$urls",0, 3] } }} ])

显而易见的问题是,无论如何,您仍然将 all 的url"内容添加到分组数组中.这有可能超过 16MB 的 BSON 限制.可能不会,但是当您只想要三个"内容时添加所有"内容仍然有点浪费.

And the obvious problem is that no matter what, you are still adding all of the "url" content into the grouped array. This has the potential to exceed the BSON limit of 16MB. It might not, but it's still a bit wasteful to add "all" content when you only want "three" of them.

因此,即便如此,在前 10 个结果中分别实际查询url"可能更实用.

So even then it's probably more practical to just actually query for the "urls" seperately on each of the top 10 results.

这是一个 node.js 的清单,用于演示:

Here's a listing for node.js that demonstrates:

var async = require('async'), mongodb = require('mongodb'), MongoClient = mongodb.MongoClient; MongoClient.connect("mongodb://localhost/test",function(err,db) { if (err) throw err; // Get the top 10 db.collection("Books").aggregate( [ { "$group": { "_id": "$company", "count": { "$sum": 1 } }}, { "$sort": { "count": -1 } }, { "$limit": 10 } ],function(err,results) { if (err) throw err; // Query for each result and map query response as urls async.map( results, function(result,callback) { db.collection("Books").find({ "company": resultpany }).limit(3).toArray(function(err,items) { result.urls = items.map(function(item) { return item.url; }); callback(err,result); }) }, function(err,results) { if (err) throw err; // each result entry has 3 urls } ); } ) });

是的,它对数据库的调用次数更多,但确实只有十次,因此不是真正的问题.

Yes it's more calls to the database, but it really is only ten and therefore not really an issue.

SERVER-9377 - Extend$push 或 $max 允许在 $group 阶段为每个 _id 键收集前"N 个值.这具有有希望的进行中"状态,因此正在积极进行中.

The real resolution for this is covered in SERVER-9377 - Extend $push or $max to allow collecting "top" N values per _id key in $group phase. This has the promising "In Progress" status, so it is actively being worked on.

一旦解决,单个聚合语句就变得可行,从那时起,您就可以将初始 $push 中的结果url"限制"为三个条目,而不是事后删除除三个以外的所有内容.

Once that is resolved, then a single aggregation statement becomes viable, since then you would be able to "limit" the resulting "urls" in the intial $push to just three entries, rather than remove all but three after the fact.

更多推荐

Mongo查询按不同计数排序

本文发布于:2023-10-26 18:13:38,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1530947.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:按不   Mongo

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!