我正在尝试从表中获取 15,000 个 ID 的 25 个随机样本.我不是每次都手动按运行,而是尝试进行循环.我完全理解这不是 Postgres 的最佳使用,但它是我拥有的工具.这是我目前所拥有的:
I am trying to get 25 random samples of 15,000 IDs from a table. Instead of manually pressing run every time, I'm trying to do a loop. Which I fully understand is not the optimum use of Postgres, but it is the tool I have. This is what I have so far:
for i in 1..25 LOOP insert into playtime.meta_random_sample select i, ID from tbl order by random() limit 15000 end loop 推荐答案过程元素,如循环不是SQL语言的一部分,只能在过程语言的主体内使用函数、procedure(Postgres 11 或更高版本)或 DO 语句,其中此类附加元素由相应的过程语言定义.默认为 PL/pgSQL,但 还有其他的.
Procedural elements like loops are not part of the SQL language and can only be used inside the body of a procedural language function, procedure (Postgres 11 or later) or a DO statement, where such additional elements are defined by the respective procedural language. The default is PL/pgSQL, but there are others.
plpgsql 示例:
Example with plpgsql:
DO $do$ BEGIN FOR i IN 1..25 LOOP INSERT INTO playtime.meta_random_sample (col_i, col_id) -- declare target columns! SELECT i, id FROM tbl ORDER BY random() LIMIT 15000; END LOOP; END $do$;对于许多可以用循环解决的任务,有一种更短、更快的基于集合的解决方案即将出现.您的示例的纯 SQL 等效项:
For many tasks that can be solved with a loop, there is a shorter and faster set-based solution around the corner. Pure SQL equivalent for your example:
INSERT INTO playtime.meta_random_sample (col_i, col_id) SELECT t.* FROM generate_series(1,25) i CROSS JOIN LATERAL ( SELECT i, id FROM tbl ORDER BY random() LIMIT 15000 ) t;关于generate_series():
- SELECT 子句中多个返回集合的函数的预期行为是什么?
关于优化随机选择的性能:
About optimizing performance of random selections:
- PostgreSQL 选择随机行的最佳方法
更多推荐
Postgres FOR 循环
发布评论