有人可以帮助我了解在数据集中有行的情况下, regr_slope 返回NULL的情况吗?例如:
Can someone please help me understand the circumstances in which regr_slope returns NULL where there are rows in the data set? For example:
log=> select * from sb1 order by id, ts; id | elapsed | ts ------+---------+---------------- 317e | 86 | 1552861322.627 317e | 58 | 1552861324.747 317e | 52 | 1552861325.722 317e | 58 | 1552861326.647 317e | 82 | 1552861327.609 317e | 118 | 1552861328.514 317e | 58 | 1552861329.336 317e | 58 | 1552861330.317 317e | 54 | 1552861330.935 3441 | 68 | 1552861324.765 3441 | 84 | 1552861326.665 3441 | 56 | 1552861327.627 3441 | 50 | 1552861330.952 5fe6 | 42 | 1552993248.398 5fe6 | 44 | 1552993255.883 5fe6 | 44 | 1553166049.261 c742 | 62 | 1552861322.149 c742 | 68 | 1552861322.455 (18 rows) log=> select id, regr_slope(elapsed, ts) as trend from sb1 group by id; id | trend ------+---------------------- c742 | 317e | 5fe6 | 5.78750952760444e-06 3441 | (4 rows)有趣的是,相同的数据集和函数在Oracle 11.2中返回不同的结果:
Interestingly, the same dataset and function returns different results in Oracle 11.2 :
SQL> select * from sb1 order by id, ts; ID ELAPSED TS ---------- ---------- ---------------- 317e 86 1552861322.627 317e 58 1552861324.747 317e 52 1552861325.722 317e 58 1552861326.647 317e 82 1552861327.609 317e 118 1552861328.514 317e 58 1552861329.336 317e 58 1552861330.317 317e 54 1552861330.935 3441 68 1552861324.765 3441 84 1552861326.665 3441 56 1552861327.627 3441 50 1552861330.952 5fe6 42 1552993248.398 5fe6 44 1552993255.883 5fe6 44 1553166049.261 c742 62 1552861322.149 c742 68 1552861322.455 18 rows selected. SQL> select id, regr_slope(elapsed, ts) from sb1 group by id; ID REGR_SLOPE(ELAPSED,TS) ---------- ---------------------- c742 19.6078431 5fe6 5.7875E-06 317e -1.0838511 3441 -3.8283951尽管 5fe6 的结果是相同的,但我不知道这是否意味着Postgres,Oracle或两者都不存在问题.
I don't know if this means there is a problem with Postgres, Oracle, neither or both, although the results for 5fe6 are the same.
推荐答案在深入研究代码之后,我得到了答案:
After digging into the code, I have the answer:
问题在于,在这种情况下,PostgreSQL的原始方法到v12会导致不必要的大舍入错误.
The problem is that the naïve approach of PostgreSQL up to v12 leads to unnecessarily big rounding errors in this case.
让我们考虑 id ='c742':
regr_slope 的公式如下:
regr_slope :=( N ⋅Σ( X i ⋅ Y i )-Σ X i ⋅Σ Y i )/( N ⋅Σ( X i 2 )-Σ X i ⋅Σ X i )
regr_slope := (N ⋅ Σ(Xi⋅Yi) - ΣXi ⋅ ΣYi) / (N ⋅ Σ(Xi2) - ΣXi ⋅ ΣXi)
问题出在除数中:
SELECT 2::float8 * (1552861322.149::float8 * 1552861322.149::float8 + 1552861322.455::float8 * 1552861322.455::float8) - (1552861322.149::float8 + 1552861322.455::float8) * (1552861322.149::float8 + 1552861322.455::float8); ?column? ---------- -2048 (1 row)由于结果为负,因此PostgreSQL返回NULL结果.
Since the result is negative, PostgreSQL returns a NULL result.
使用精确的计算(使用数字)不会发生这种情况:
This would not have happened using exact computation (using numeric):
SELECT 2 * (1552861322.149 * 1552861322.149 + 1552861322.455 * 1552861322.455) - (1552861322.149 + 1552861322.455) * (1552861322.149 + 1552861322.455); ?column? ---------- 0.093636 (1 row)自PostgreSQL提交以来 e954a727f0c8872
Since PostgreSQL commit e954a727f0c8872bf5203186ad0f5312f6183746, things are improved, and in PostgreSQL v12, PostgreSQL also returns the correct result:
select id, regr_slope(elapsed, ts) from sb1 group by id; id | regr_slope ------+----------------------- c742 | 19.607858781290517 317e | -1.0838511987808963 5fe6 | 5.787509483586743e-06 3441 | -3.828395463097356 (4 rows)更多推荐
Postgres regr
发布评论