Why does this Postgres stored procedure want to `use utf8`?

Updated: 2024-10-11 01:09:46

I have come across a peculiarity in a plperl stored procedure on Postgres 9.2 with Perl 5.12.4.

The curious behavior can be reproduced using this "broken" SP:

    CREATE FUNCTION foo(VARCHAR) RETURNS VARCHAR AS $$
        my ( $re ) = @_;
        $re = ''.qr/\b($re)\b/i;
        return $re;
    $$ LANGUAGE plperl;

When executed:

    # select foo('foo');
    ERROR:  Unable to load utf8.pm into plperl at line 3.
    BEGIN failed--compilation aborted.
    CONTEXT:  PL/Perl function "foo"

However, if I move the qr// operation into an eval, it works:

    CREATE OR REPLACE FUNCTION bar(VARCHAR) RETURNS VARCHAR AS $$
        my ( $re ) = @_;
        eval "\$re = ''.qr/\\b($re)\\b/i;";
        return $re;
    $$ LANGUAGE plperl;

Result:

    # select bar('foo');
           bar
    -----------------
     (?^i:\b(foo)\b)
    (1 row)

Why does the eval bypass the automatic use utf8?

Why is use utf8 even required in the first place? My code is not written in UTF-8, which is said to be the only case in which one should use utf8.

If anything, I would expect the eval version to break without use utf8 when the input to the script contains non-ASCII values. (Further testing shows that passing non-ASCII values to bar() does indeed cause the eval to fail with the same error.)

Note that many Postgres installations automatically load 'utf8' on startup of the perl interpreter. This is the default in Debian at least, as demonstrated by executing:

    DO 'elog(WARNING, join ", ", sort keys %INC)' language plperl;

    WARNING:  Carp.pm, Carp/Heavy.pm, Exporter.pm, feature.pm, overload.pm, strict.pm, unicore/Heavy.pl, unicore/To/Fold.pl, unicore/lib/Perl/SpacePer.pl, utf8.pm, utf8_heavy.pl, vars.pm, warnings.pm, warnings/register.pm
    CONTEXT:  PL/Perl anonymous code block
    DO

But not so on the machine demonstrating the odd behavior:

    WARNING:  Carp.pm, Carp/Heavy.pm, Exporter.pm, feature.pm, overload.pm, overloading.pm, strict.pm, vars.pm, warnings.pm, warnings/register.pm
    CONTEXT:  PL/Perl anonymous code block
    DO

This question is not about how to get my target machine to load utf8 automatically; I know how to do that. I'm curious why it seems to be necessary in the first place.

Best answer

In the version that's failing, you're executing

    $re = ''.qr/\b($re)\b/i

In the version that's succeeding, you're executing

    $re = ''.qr/\b(foo)\b/i

It sounds like qr// needs utf8.pm when the pattern is compiled as a Unicode pattern (whatever that means). The first pattern, which interpolates $re at run time, is compiled that way; the second, which contains the literal foo, is not.
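One way to check this guess outside the database is a small standalone script that looks at %INC before and after compiling each kind of pattern. This is only a diagnostic sketch: the helper name utf8_loaded is made up for illustration, and using utf8::upgrade to imitate how PL/Perl hands over the argument is an assumption. What it prints depends on the Perl version, so no expected output is given.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether utf8.pm is loaded (%INC lists loaded files).
sub utf8_loaded { return exists $INC{'utf8.pm'} ? 'yes' : 'no' }

my $arg = 'foo';

# Byte-string pattern, comparable to the literal "foo" in the eval version.
my $plain = qr/\b($arg)\b/i;
print "after byte pattern:    utf8.pm loaded? ", utf8_loaded(), "\n";

# utf8::upgrade is built into the perl core and does not need "use utf8".
# It flags the string as character data, which is assumed here to resemble
# the argument PL/Perl passes in from the database.
utf8::upgrade($arg);
my $unicode = qr/\b($arg)\b/i;
print "after Unicode pattern: utf8.pm loaded? ", utf8_loaded(), "\n";
```

If the second line reports "yes" where the first reported "no", compiling the flagged pattern is what pulled in utf8.pm on that Perl build.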

The failure to load utf8.pm is due to the limitations imposed by the Safe compartment created by plperl.

The fix is to load the module outside the Safe compartment.
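PL/Perl's documented plperl.on_init setting runs Perl code when the interpreter is first initialized, before it is specialized for trusted use, so it is one place to do such loading. The fragment below is a sketch, not a tested fix: the choice of files simply mirrors utf8.pm and utf8_heavy.pl from the Debian %INC listing in the question, it assumes a Perl of that era where utf8_heavy.pl exists, and the unicore tables in that listing may need the same treatment.

```perl
# postgresql.conf fragment (sketch; affects newly initialized interpreters)
# Preload the Unicode support files before the Safe restrictions apply.
plperl.on_init = 'require utf8; require "utf8_heavy.pl";'
```

Afterwards, the DO block from the question that prints %INC can be rerun to confirm that utf8.pm now appears in the list.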

The workaround is to use the more efficient

    $re = '(?^u:\\b(?i:'.$re.')\\b)';

Published: 2023-07-27 03:57:00

Article link: https://www.elefans.com/category/jswz/34/1284839.html