I'm having a much more difficult time than I thought I would importing multiple documents from Mongo into RAM in batches. I am writing an application that communicates with MongoDB via pymongo; the database currently holds 2 GB, but in the near future it could grow to over 1 TB. Because of this, batch-reading a limited number of records into RAM at a time is important for scalability.
Based on this post and this documentation I thought this would be about as easy as:
HOST = MongoClient(MONGO_CONN)
DB_CONN = HOST.database_name
collection = DB_CONN.collection_name
cursor = collection.find()
cursor.batch_size(1000)
next_1K_records_in_RAM = cursor.next()

This isn't working for me, however. Even though I have a Mongo collection populated with >200K BSON objects, it reads them in one at a time as single dictionaries, e.g. {_id : ID1, ...}, instead of what I'm looking for, which is a list of dictionaries representing multiple documents in my collection, e.g. [{_id : ID1, ...}, {_id : ID2, ...}, ..., {_id: ID1000, ...}].
I wouldn't expect this to matter, but I'm on Python 3.5 instead of 2.7.
As this example references a secure, remote data source, it isn't reproducible. Apologies for that. If you have a suggestion for how the question can be improved, please let me know.
Accepted Answer
The Python version is irrelevant here; it has nothing to do with your output. batch_size only defines how many documents MongoDB returns in a single round trip to the database (under some limitations: see here). collection.find() always returns a cursor, even when no documents match (the cursor is simply empty, never None), and batching does its job transparently underneath it. To examine the returned documents you have to iterate through the cursor, i.e.
for document in cursor:
    print(document)
or if you want a list of the documents: list(cursor)
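That said, list(cursor) pulls the entire result set into RAM at once. If the goal is fixed-size lists of documents at a time, as in the question, one option is to chunk the cursor yourself. Here is a minimal sketch, assuming the same collection handle as in the question; iter_chunks and process are hypothetical names, not pymongo API:

from itertools import islice

def iter_chunks(cursor, chunk_size=1000):
    # Yield successive lists of up to chunk_size documents from a cursor.
    while True:
        chunk = list(islice(cursor, chunk_size))
        if not chunk:
            break
        yield chunk

# batch_size() tunes how many documents each network round trip fetches;
# it does not change how the cursor is iterated from Python.
cursor = collection.find().batch_size(1000)

for docs in iter_chunks(cursor, 1000):
    # docs is a list like [{'_id': ID1, ...}, ..., {'_id': ID1000, ...}]
    process(docs)  # hypothetical processing function

This keeps at most one chunk of documents in RAM at a time, which is what matters for the 1 TB scenario.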
Remember to call cursor.rewind() if you need to revisit the documents.
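For example (a sketch; rewind() resets the cursor to its unevaluated state so the query is re-run from the beginning on the next iteration):

cursor = collection.find()
first_pass = list(cursor)    # the cursor is now exhausted
cursor.rewind()              # reset the cursor to its unevaluated state
second_pass = list(cursor)   # iterates over the same result set again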