使用多字节字符集确定CLOB / NCLOB的字节长度(Determine length in bytes of CLOB/NCLOB with multibyte charset)

编程入门 行业动态 更新时间:2024-10-28 10:30:49
使用多字节字符集确定CLOB / NCLOB的字节长度(Determine length in bytes of CLOB/NCLOB with multibyte charset)

我正在使用Oracle数据库,并希望使用多字节字符集(UTF-8)确定NCLOB的字节长度。

LENGTHB()不支持具有多字节字符集的CLOB或NCLOBS。 我可以将NCLOB转换为BLOB然后获得它的长度。 但是,有没有更好的方法来做到这一点?

I'm working with an Oracle database and want to determine the length in bytes of a NCLOB using a multibyte charset (UTF-8).

LENGTHB() does not support CLOBs oder NCLOBS with multibyte charset. I could convert the NCLOB to a BLOB and get its length then. But isn't there a better way to do this?

最满意答案

Oracle将CLOB存储为UTF-16(或者可能是您的NCHARACTER_SET ?)。 每个字符都存储为两个字节。

以下是您可以查看原始数据的方法:

SQL> CREATE TABLE test_clob (c CLOB, v VARCHAR2(10 CHAR), nv NVARCHAR2(10)); Table created SQL> INSERT INTO test_clob VALUES ('0123456789', '0123456789', '0123456789'); 1 row inserted SQL> SELECT dbms_rowid.rowid_relative_fno(ROWID), 2 dbms_rowid.rowid_block_number(ROWID) 3 FROM test_clob; DBMS_ROWID.ROWID_RELATIVE_FNO( DBMS_ROWID.ROWID_BLOCK_NUMBER( ------------------------------ ------------------------------ 13 94314 SQL> alter system dump datafile 13 block 94314; System altered

导航到您的USER_DUMP_DEST目录并打开跟踪文件,您应该看到如下内容:

col 0: [56] 00 54 00 01 02 0c 80 00 00 02 00 00 00 01 00 00 03 64 c6 a5 00 24 09 00 00 00 00 00 00 14 00 00 00 00 00 01 00 30 00 31 00 32 00 33 00 34 00 35 00 36 00 37 00 38 00 39 LOB Locator: Length: 84(56) Version: 1 Byte Length: 2 LobID: 00.00.00.01.00.00.03.64.c6.a5 Flags[ 0x02 0x0c 0x80 0x00 ]: Type: CLOB [...] Inline data[20] [...] col 1: [10] 30 31 32 33 34 35 36 37 38 39 col 2: [20] 00 30 00 31 00 32 00 33 00 34 00 35 00 36 00 37 00 38 00 39

如您所见, CLOB (第0列)由几个头字节和与UTF-16 NVARCHAR2列相同的字节原始数据组成。

因此我认为您必须将CLOB转换为UTF-8以确定其在此字符集中的长度。

这是我使用DBMS_LOB.converttoblob的一个例子:

SQL> DECLARE 2 l_clob CLOB := 'abcdéfghij'; -- the é will take two bytes! 3 l_blob BLOB; 4 l_dest_offset NUMBER := 1; 5 l_src_offset NUMBER := 1; 6 l_lang_context NUMBER := 0; 7 l_warning NUMBER; 8 BEGIN 9 dbms_lob.createtemporary(l_blob, FALSE, dbms_lob.call); 10 dbms_lob.converttoblob(dest_lob => l_blob, 11 src_clob => l_clob, 12 amount => dbms_lob.lobmaxsize, 13 dest_offset => l_dest_offset, 14 src_offset => l_src_offset, 15 blob_csid => nls_charset_id('AL32UTF8'), 16 lang_context => l_lang_context, 17 warning => l_warning); 18 dbms_output.put_line('byte length:'||dbms_lob.getlength(l_blob)); 19 dbms_lob.freetemporary(l_blob); 20 END; 21 / byte length:11 PL/SQL procedure successfully completed

您可以使用函数nls_charset_id转换为任何字符集。

Oracle stores CLOB in UTF-16 (or possibly your NCHARACTER_SET?). Each character is stored as two bytes.

Here's how you can see the raw data:

SQL> CREATE TABLE test_clob (c CLOB, v VARCHAR2(10 CHAR), nv NVARCHAR2(10)); Table created SQL> INSERT INTO test_clob VALUES ('0123456789', '0123456789', '0123456789'); 1 row inserted SQL> SELECT dbms_rowid.rowid_relative_fno(ROWID), 2 dbms_rowid.rowid_block_number(ROWID) 3 FROM test_clob; DBMS_ROWID.ROWID_RELATIVE_FNO( DBMS_ROWID.ROWID_BLOCK_NUMBER( ------------------------------ ------------------------------ 13 94314 SQL> alter system dump datafile 13 block 94314; System altered

Navigate to your USER_DUMP_DEST directory and open the trace file, you should see something like this:

col 0: [56] 00 54 00 01 02 0c 80 00 00 02 00 00 00 01 00 00 03 64 c6 a5 00 24 09 00 00 00 00 00 00 14 00 00 00 00 00 01 00 30 00 31 00 32 00 33 00 34 00 35 00 36 00 37 00 38 00 39 LOB Locator: Length: 84(56) Version: 1 Byte Length: 2 LobID: 00.00.00.01.00.00.03.64.c6.a5 Flags[ 0x02 0x0c 0x80 0x00 ]: Type: CLOB [...] Inline data[20] [...] col 1: [10] 30 31 32 33 34 35 36 37 38 39 col 2: [20] 00 30 00 31 00 32 00 33 00 34 00 35 00 36 00 37 00 38 00 39

As you can see the CLOB (column 0) is composed of a few header bytes and the same byte raw data as the UTF-16 NVARCHAR2 column.

As such I think you will have to convert your CLOB to UTF-8 to determine its length in this character set.

Here's an example I've used using DBMS_LOB.converttoblob:

SQL> DECLARE 2 l_clob CLOB := 'abcdéfghij'; -- the é will take two bytes! 3 l_blob BLOB; 4 l_dest_offset NUMBER := 1; 5 l_src_offset NUMBER := 1; 6 l_lang_context NUMBER := 0; 7 l_warning NUMBER; 8 BEGIN 9 dbms_lob.createtemporary(l_blob, FALSE, dbms_lob.call); 10 dbms_lob.converttoblob(dest_lob => l_blob, 11 src_clob => l_clob, 12 amount => dbms_lob.lobmaxsize, 13 dest_offset => l_dest_offset, 14 src_offset => l_src_offset, 15 blob_csid => nls_charset_id('AL32UTF8'), 16 lang_context => l_lang_context, 17 warning => l_warning); 18 dbms_output.put_line('byte length:'||dbms_lob.getlength(l_blob)); 19 dbms_lob.freetemporary(l_blob); 20 END; 21 / byte length:11 PL/SQL procedure successfully completed

You can convert to any character set using the function nls_charset_id.

更多推荐

本文发布于:2023-07-25 10:35:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1259967.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:多字   字符集   字节   长度   CLOB

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!