Solving pymongo.errors.DocumentTooLarge: 'aggregate' command document too large

A note up front: this article does not describe a direct, quick fix (adding a parameter, changing a config, etc.). Instead, it solves the problem by changing the approach. If that's not what you're after, feel free to skip.

I recently ran into this while doing aggregation statistics with MongoDB. After some digging, I learned that MongoDB and pymongo enforce a 16 MB limit on document size. I had previously only known that a single stored document is capped at 16 MB; this time, the list inside the aggregation's $match filter also blew past the limit!

Requirement: there is a MySQL order database and a MongoDB order database, and the orders in MongoDB are a superset of those in MySQL. The goal is to compute the total power and the order count for the orders that exist in MongoDB but not in MySQL.

The original code:

filter_pipe = {
    "StartType": 3,
    "EndTime": {"$lte": end_date, "$gte": start_date},
    "OrderID": {"$nin": all_mysql_orders},
}
ret = db.get_collection("table_name").aggregate([
    {"$match": filter_pipe},
    {"$group": {"_id": {}, "totalPower": {"$sum": "$TotalPower"}, "count": {"$sum": 1}}},
])

all_mysql_orders was about 24 MB, which exceeded the limit and raised:

File "/usr/local/python3/lib/python3.7/site-packages/pymongo/message.py", line 1070, in _raise_document_too_large
raise DocumentTooLarge("%r command document too large" % (operation,))
pymongo.errors.DocumentTooLarge: 'aggregate' command document too large
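
Before changing the query, it helps to confirm where the 16 MB goes. Below is a minimal sketch of how you might measure the BSON size of a filter before handing it to aggregate; the fake ID list is hypothetical, and bson.encode requires pymongo >= 3.9 (older versions expose bson.BSON.encode instead):

import bson  # ships with pymongo

# hypothetical example: a $nin list of one million 24-character order IDs
big_filter = {"OrderID": {"$nin": ["A" * 24 for _ in range(1000000)]}}
# bson.encode serializes the dict the same way pymongo does before sending it,
# so its length is a good estimate of the command document size
print(len(bson.encode(big_filter)) / 1024 / 1024, "MB")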

A colleague suggested: "Increase MongoDB's memory configuration and restart, since the database relies heavily on memory; you'd also need to enlarge the buffer." But this is production data, and I couldn't risk such a rash operation. If it broke something, we'd be done for!

So I tried a workaround: modify the statistics program to first find all OrderIDs that satisfy the condition and flip $nin to $in. But after rerunning the modified code, the order list behind $in turned out to be even larger than 24 MB, so that route was a dead end too!

In the end I tried paginated (batched) aggregation. First, a small test method for the paging logic:

def test_method():
    aggregate_num = 5
    diff = [i for i in range(0, 11)]
    print(diff)
    # number of full pages
    page = len(diff) // aggregate_num
    # leftover items on the last page
    left = len(diff) % aggregate_num
    # divides evenly
    if page and not left:
        print("page: {},left:{}".format(page, 0))
    # does not divide evenly: one extra page for the remainder
    else:
        print("page: {},left:{}".format(page, left))
        page += 1
    # walk over every page and print the slice it covers
    for i in range(page):
        begin_index = i * aggregate_num
        end_index = (i + 1) * aggregate_num
        print(begin_index, end_index, diff[begin_index: end_index])
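
For reference, running test_method() with the values above prints:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
page: 2,left:1
0 5 [0, 1, 2, 3, 4]
5 10 [5, 6, 7, 8, 9]
10 15 [10]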

That works, so I changed the statistics program to use paginated aggregation. The updated code:

filter_pipes = {"StartType": 3, "EndTime": {"$lte": end_date, "$gte": start_date}}
ret0 = db.get_collection("table_name").find(filter_pipes, {"OrderID": 1})
# all order IDs in mongo that satisfy the conditions
all_in_mongo_ids = set([order_info['OrderID'] for order_info in ret0])
# 400,000 IDs come to about 10.3 MB (one order ID is 27 bytes), small enough for one aggregate call
aggregate_num = 400000
# order IDs that need to be aggregated (in mongo but not in mysql)
diff = all_in_mongo_ids.difference(set(all_mysql_ids))
# number of full pages
page = len(diff) // aggregate_num
# leftover items on the last page
left = len(diff) % aggregate_num
# divides evenly
if page and not left:
    print("page: {},left:{}".format(page, 0))
# does not divide evenly: one extra page for the remainder
else:
    print("page: {},left:{}".format(page, left))
    page += 1
diff_list = list(diff)
total_power, order_num = 0, 0
for i in range(page):
    begin_index = i * aggregate_num
    end_index = (i + 1) * aggregate_num
    filter_pipes.update({"OrderID": {"$in": diff_list[begin_index: end_index]}})
    ret = db.get_collection("table_name").aggregate([
        {"$match": filter_pipes},
        {"$group": {"_id": {}, "totalPower": {"$sum": "$TotalPower"}, "count": {"$sum": 1}}},
    ])
    result = [info for info in ret]
    if result:  # $group yields no document when a page matches nothing
        order_num += result[0]['count']
        total_power += result[0]['totalPower']
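
For readability, the same batching can be wrapped in a small helper. This is only a sketch of the approach above (the chunked and sum_orders names are mine; collection and field names are taken from the code); slicing with range() removes the explicit page/left bookkeeping, since the loop simply stops at the end of the list:

def chunked(items, size):
    # yield successive slices of at most `size` elements
    for start in range(0, len(items), size):
        yield items[start:start + size]

def sum_orders(db, diff_ids, start_date, end_date, batch_size=400000):
    total_power, order_num = 0, 0
    base_filter = {"StartType": 3, "EndTime": {"$lte": end_date, "$gte": start_date}}
    for batch in chunked(list(diff_ids), batch_size):
        # copy the base filter so each batch gets its own $in list
        match = dict(base_filter, OrderID={"$in": batch})
        cursor = db.get_collection("table_name").aggregate([
            {"$match": match},
            {"$group": {"_id": None, "totalPower": {"$sum": "$TotalPower"}, "count": {"$sum": 1}}},
        ])
        result = list(cursor)
        if result:  # $group returns nothing when the batch matches no documents
            total_power += result[0]["totalPower"]
            order_num += result[0]["count"]
    return total_power, order_num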

It runs fast. Job done!

Another colleague suggested pulling the MySQL orders out as a dict and using hash lookups to check membership, which would make each membership test faster. The check itself would indeed be quicker, but you would still have to fetch all the MongoDB records (over 3 million) and loop over them to sum the power, which would slow things down again!
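
For comparison, that suggestion would look roughly like the sketch below (variable names are mine). The membership test is O(1), but every matching MongoDB document still has to be pulled into Python and summed in a loop:

# hypothetical sketch of the rejected approach: hash lookup + client-side summing
mysql_ids = set(all_mysql_ids)  # a set/dict gives O(1) membership checks
total_power, order_num = 0, 0
cursor = db.get_collection("table_name").find(
    {"StartType": 3, "EndTime": {"$lte": end_date, "$gte": start_date}},
    {"OrderID": 1, "TotalPower": 1},
)
for doc in cursor:  # iterates over all 3M+ matching documents
    if doc["OrderID"] not in mysql_ids:
        total_power += doc.get("TotalPower", 0)
        order_num += 1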

Note: I hope this article helps. If anything is unclear, leave a comment and let's discuss!
