I'm having a much more difficult time than I thought I would importing multiple documents from Mongo into RAM in batches. I am writing an application that communicates with MongoDB via pymongo; the database currently holds 2 GB, but in the near future it could grow to over 1 TB. Because of this, batch-reading a limited number of records into RAM at a time is important for scalability.
Based on this post and this documentation I thought this would be about as easy as:
HOST = MongoClient(MONGO_CONN)
DB_CONN = HOST.database_name
collection = DB_CONN.collection_name
cursor = collection.find()
cursor.batch_size(1000)
next_1K_records_in_RAM = cursor.next()

This isn't working for me, however. Even though I have a Mongo collection populated with >200K BSON objects, it reads them in one at a time as single dictionaries, e.g. {_id : ID1, ...}, instead of what I'm looking for, which is a list of dictionaries representing multiple documents in my collection, e.g. [{_id : ID1, ...}, {_id : ID2, ...}, ..., {_id: ID1000, ...}].
I wouldn't expect this to matter, but I'm on Python 3.5 instead of 2.7.
As this example references a secure, remote data source, it isn't reproducible. Apologies for that. If you have a suggestion for how the question can be improved, please let me know.
Accepted Answer
The Python version is irrelevant here; it has nothing to do with your output. batch_size only defines how many documents MongoDB returns in a single round trip to the database (under some limitations: see here). collection.find() always returns a cursor, even when no documents match (the cursor is simply empty, never None), and batching does its job transparently underneath it. To examine the returned documents you have to iterate through the cursor, i.e.
for document in cursor:
    print(document)
or if you want a list of the documents: list(cursor)
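That said, list(cursor) pulls the entire result set into RAM at once. If the goal is fixed-size lists of documents at a time, as in the question, one option is to chunk the cursor yourself. Here is a minimal sketch, assuming the same collection handle as in the question; iter_chunks and process are hypothetical names, not pymongo API:

from itertools import islice

def iter_chunks(cursor, chunk_size=1000):
    # Yield successive lists of up to chunk_size documents from a cursor.
    while True:
        chunk = list(islice(cursor, chunk_size))
        if not chunk:
            break
        yield chunk

# batch_size() tunes how many documents each network round trip fetches;
# it does not change how the cursor is iterated from Python.
cursor = collection.find().batch_size(1000)

for docs in iter_chunks(cursor, 1000):
    # docs is a list like [{'_id': ID1, ...}, ..., {'_id': ID1000, ...}]
    process(docs)  # hypothetical processing function

This keeps at most one chunk of documents in RAM at a time, which is what matters for the 1 TB scenario.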
Remember to call cursor.rewind() if you need to revisit the documents.
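For example (a sketch; rewind() resets the cursor to its unevaluated state so the query is re-run from the beginning on the next iteration):

cursor = collection.find()
first_pass = list(cursor)    # the cursor is now exhausted
cursor.rewind()              # reset the cursor to its unevaluated state
second_pass = list(cursor)   # iterates over the same result set again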