如何在postgres中编写combinatorics函数?

编程入门行业动态更新时间:2024-10-16 22:19:04

本文介绍了如何在postgres中编写combinatorics函数?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！问题描述

我有这种形式的PostgreSQL表:

I have a PostgreSQL table of this form:

base_id int | mods smallint[] 3 | {7,15,48}

我需要填充这种形式的表:

I need to populate a table of this form:

combo_id int | base_id int | mods smallint[] 1 | 3 | 2 | 3 | {7} 3 | 3 | {7,15} 4 | 3 | {7,48} 5 | 3 | {7,15,48} 6 | 3 | {15} 7 | 3 | {15,48} 8 | 3 | {48}

我认为我可以使用几乎做到这一点的函数来完成此任务，该函数遍历第一个表并将组合写入第二个表: 在SQL中生成所有组合

I think I could accomplish this using a function that does almost exactly this, iterating over the first table and writing combinations to the second table: Generate all combinations in SQL

但是，我是Postgres的新手，我一生都无法弄清楚如何使用plpgsql做到这一点.它并不需要特别快；它只会在后端定期运行.第一张表大约有80条记录，粗略计算表明我们可以预期第二张表大约有2600条记录.

But, I'm a Postgres novice and cannot for the life of me figure out how to do this using plpgsql. It doesn't need to be particularly fast; it will only be run periodically on the backend. The first table has approximately 80 records and a rough calculation suggests we can expect around 2600 records for the second table.

有人能至少指出我正确的方向吗?

Can anybody at least point me in the right direction?

:Craig:我有PostgreSQL 9.0.我可以成功使用UNNEST():

Craig: I've got PostgreSQL 9.0. I was successfully able to use UNNEST():

FOR messvar IN SELECT * FROM UNNEST(mods) AS mod WHERE mod BETWEEN 0 AND POWER(2, @n) - 1 LOOP RAISE NOTICE '%', messvar; END LOOP;

但随后不知道下一步该怎么做.

but then didn't know where to go next.

作为参考，我最终使用了Erwin的解决方案，在其中添加了一行以向每个集合添加一个空结果('{}')，并删除了Erwin提到的特殊情况:

For reference, I ended up using Erwin's solution, with a single line added to add a null result ('{}') to each set and the special case Erwin refers to removed:

CREATE OR REPLACE FUNCTION f_combos(_arr integer[], _a integer[] DEFAULT '{}'::integer[], _z integer[] DEFAULT '{}'::integer[]) RETURNS SETOF integer[] LANGUAGE plpgsql AS $BODY$ DECLARE i int; j int; _up int; BEGIN IF array_length(_arr,1) > 0 THEN _up := array_upper(_arr, 1); IF _a = '{}' AND _z = '{}' THEN RETURN QUERY SELECT '{}'::int[]; END IF; FOR i IN array_lower(_arr, 1) .. _up LOOP FOR j IN i .. _up LOOP CASE j-i WHEN 0,1 THEN RETURN NEXT _a || _arr[i:j] || _z; ELSE RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z; RETURN QUERY SELECT * FROM f_combos(_arr[i+1:j-1], _a || _arr[i], _arr[j] || _z); END CASE; END LOOP; END LOOP; ELSE RETURN NEXT _arr; END IF; END; $BODY$

然后，我使用该函数填充表格:

Then, I used that function to populate my table:

INSERT INTO e_ecosystem_modified (ide_ecosystem, modifiers) (SELECT ide_ecosystem, f_combos(modifiers) AS modifiers FROM e_ecosystem WHERE ecosystemgroup <> 'modifier' ORDER BY ide_ecosystem, modifiers);

在源表中的79行中，修饰符数组中最多包含7个项目，查询用了250ms的时间在我的输出表中填充了2630行.很棒.

From 79 rows in my source table with a maximum of 7 items in the modifiers array, the query took 250ms to populate 2630 rows in my output table. Fantastic.

推荐答案

在我睡过之后，我有了一个全新，更简单，更快的主意:

After I slept over it I had a completely new, simpler, faster idea:

CREATE OR REPLACE FUNCTION f_combos(_arr anyarray) RETURNS TABLE (combo anyarray) LANGUAGE plpgsql AS $BODY$ BEGIN IF array_upper(_arr, 1) IS NULL THEN combo := _arr; RETURN NEXT; RETURN; END IF; CASE array_upper(_arr, 1) -- WHEN 0 THEN -- does not exist WHEN 1 THEN RETURN QUERY VALUES ('{}'), (_arr); WHEN 2 THEN RETURN QUERY VALUES ('{}'), (_arr[1:1]), (_arr), (_arr[2:2]); ELSE RETURN QUERY WITH x AS ( SELECT fbo FROM f_combos(_arr[1:array_upper(_arr, 1)-1]) f ) SELECT xbo FROM x UNION ALL SELECT xbo || _arr[array_upper(_arr, 1)] FROM x; END CASE; END $BODY$;

致电:

SELECT * FROM f_combos('{1,2,3,4,5,6,7,8,9}'::int[]) ORDER BY 1;

512行，总运行时间:2.899毫秒

512 rows, total runtime: 2.899 ms

使用NULL和空数组处理特殊情况.
为两个原始数组构建组合.
任何更长的数组都分解为:
- 长度为n-1的相同数组的组合
- 加上所有与元素n组合的元素.递归.
- Treat special cases with NULL and empty array.
- Build combinations for a primitive array of two.
- Any longer array is broken down into:
  - the combinations for same array of length n-1
  - plus all of those combined with element n .. recursively.
  一经掌握，便非常简单.
  
  Really simple, once you got it.
  - 适用于以下标1 开头的一维整数数组(请参见下文).
  - 速度是旧解决方案的2-3倍，扩展性更好.
  - 再次使用任何元素类型(使用多态类型).
  - 在问题中(以及@Craig在评论中向我指出)中显示的结果中包含空数组.
  - 更短，更优雅.
  - Works for 1-dimensional integer arrays starting with subscript 1 (see below).
  - 2-3 times as fast as old solution, scales better.
  - Works for any element type again (using polymorphic types).
  - Includes the empty array in the result as is displayed in the question (and as @Craig pointed out to me in the comments).
  - Shorter, more elegant.
  这假定数组下标从 1 (默认)开始.如果您不确定自己的值，请调用以下函数进行标准化:
  
  This assumes array subscripts starting at 1 (Default). If you are not sure about your values, call the function like this to normalize:
  SELECT * FROM f_combos(_arr[array_lower(_arr, 1):array_upper(_arr, 1)]);
  不确定是否有更优雅的方法来标准化数组下标.我对此发表了一个问题: 对一维数组的数组下标进行归一化，使其以1开头
  
  Not sure if there is a more elegant way to normalize array subscripts. I posted a question about that: Normalize array subscripts for 1-dimensional array so they start with 1
  CREATE OR REPLACE FUNCTION f_combos2(_arr int[], _a int[] = '{}', _z int[] = '{}') RETURNS SETOF int[] LANGUAGE plpgsql AS $BODY$ DECLARE i int; j int; _up int; BEGIN IF array_length(_arr,1) > 0 THEN _up := array_upper(_arr, 1); FOR i IN array_lower(_arr, 1) .. _up LOOP FOR j IN i .. _up LOOP CASE j-i WHEN 0,1 THEN RETURN NEXT _a || _arr[i:j] || _z; WHEN 2 THEN RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z; RETURN NEXT _a || _arr[i:j] || _z; ELSE RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z; RETURN QUERY SELECT * FROM f_combos2(_arr[i+1:j-1], _a || _arr[i], _arr[j] || _z); END CASE; END LOOP; END LOOP; ELSE RETURN NEXT _arr; END IF; END; $BODY$;
  致电:
  SELECT * FROM f_combos2('{7,15,48}'::int[]) ORDER BY 1;
  适用于一维整数数组. 可以进一步优化，但是对于此问题的范围肯定不是必需的. ORDER BY强制显示问题中显示的顺序.
  
  Works for 1-dimensional integer arrays. This could be further optimized, but that's certainly not needed for the scope of this question. ORDER BY to impose the order displayed in the question.
  
  提供NULL或空数组，因为注释中提到了NULL.
  
  Provide for NULL or empty array, as NULL is mentioned in the comments.
  
  已在PostgreSQL 9.1上进行了测试，但可以与任何中途的现代版本一起使用. array_lower()和array_upper() 至少自PostgreSQL 7.4起就存在了.在版本8.4中，只有参数默认值是新的.可以轻松更换.
  
  Tested with PostgreSQL 9.1, but should work with any halfway modern version. array_lower() and array_upper() have been around for at least since PostgreSQL 7.4. Only parameter defaults are new in version 8.4. Could easily be replaced.
  
  表现不错.
  SELECT DISTINCT * FROM f_combos('{1,2,3,4,5,6,7,8,9}'::int[]) ORDER BY 1;
  511行，总运行时间:7.729毫秒
  
  511 rows, total runtime: 7.729 ms
  
  它基于此简单表单构建，该表单仅创建相邻元素的所有组合:
  
  It builds on this simple form that only creates all combinations of neighboring elements:
  CREATE FUNCTION f_combos(_arr int[]) RETURNS SETOF int[] LANGUAGE plpgsql AS $BODY$ DECLARE i int; j int; _up int; BEGIN _up := array_upper(_arr, 1); FOR i in array_lower(_arr, 1) .. _up LOOP FOR j in i .. _up LOOP RETURN NEXT _arr[i:j]; END LOOP; END LOOP; END; $BODY$;
  但是，这对于包含两个以上元素的子数组将失败.因此:
  
  But this will fail for sub-arrays with more than two elements. So:
  - 对于任何具有3个元素的子数组，将添加一个仅包含外部两个元素的数组.这是提高性能的特殊情况的快捷方式，并且并非严格要求.
  对于具有3个以上元素的任何子数组，我采用两个外部元素并填充由同一函数构建的内部元素的所有组合递归.
  
  For any sub-array with more than 3 elements I take the outer two elements and fill in with all combinations of inner elements built by the same function recursively.