我有一个Postgres数据库,其中包含大约200,000,000行和6列。这些列具有int,date和string,但没有主键,也没有作为主键基础的唯一值。
I've got a postgres DB with around 200,000,000 rows and 6 columns. The columns have int, date, and string, but no primary key and no unique values on which to base a primary key.
此数据库中的记录包含原始数据我需要我的rails postgres模型之一。我想遍历整个外部数据库,对它的每一行进行计算,然后将输出写入我的rails模型。
The records in this DB contain the raw data I need for one of my rails postgres models. I'd like to iterate through the full external DB, perform a calculation on each row of it, and then write the output to my rails model.
我已经连接到数据库或通过ActiveRecord访问记录都没有问题,但是我尝试遍历数据库的所有操作都失败或花费了太长时间。我尝试了以下操作:
I've got no issue connecting to the DB or accessing records through ActiveRecord, but everything I try for iterating over the DB is failing or taking far too long. I've tried the following:
- ExternalDB.all.each
- ExternalDB.find_all.each
- 使用这些说明
- ExternalDB.all.each
- ExternalDB.find_all.each
- Adding an "id" column to ExternalDB using these instructions
我认为答案将是使用SQL进行迭代,但是我什至不知道如何开始。
I think the answer will be to do the iterations using SQL, but I'm not even sure how to start on that.
推荐答案您将要使用协议级别的游标或带有 DECLARE 和 FETCH 的SQL级游标。
You will want to use a cursor, either a protocol-level one or an SQL-level cursor with DECLARE and FETCH.
很方便地,有人已经为PostgreSQL游标编写了ActiveRecord适配器;参见 rubygems 。
您可能还会发现这个问题的内容翔实:是否存在使用游标或智能获取的Ruby ORM?
You might also find this question informative: Are there any Ruby ORMs which use cursors or smart fetch?
我还没有检查源代码/文档,以查看Pg gem是否支持PostgreSQL的协议级游标用于批量读取,但是如果已经有工具(如上链接),可能就不值得探索
I haven't checked the source code / docs to see if the Pg gem supports PostgreSQL's protocol-level cursors for batched reads, but if there's already a tool to do it (as linked above) it's probably not worth exploring.
更多推荐
遍历大型外部postgres db,操作行,将输出写入rails postgres db
发布评论