How do I decode a hex-encoded PostgreSQL bytea column to int16 / uint16 in R?

Problem description

I have some image data stored in a PostgreSQL database table column as bytea. I also have metadata about the data for use in interpreting it, relevant ones being image dimensions and class. Classes include int16, uint16. I cannot find any information on interpreting signed/unsigned ints correctly in R.

I am using RPostgreSQL to pull the data into R and I want to view the image in R.

MWE:

# fakeDataQuery <- dbGetQuery(conn,
#   'select byteArray, ImageSize, ImageClass from table where id = 1')

# Example 1 (no negative numbers)
# the actual byte array shown in octal sequences in pgadmin (1.22.2) Query Output is:
# "\001\000\002\000\003\000\004\000\005\000\006\000\007\000\010\000\011\000"
# but RPostgreSQL returns the hex-encoded version:
byteArray <- "\\x010002000300040005000600070008000900"
ImageSize <- c(3, 3, 1)
ImageClass <- 'int16'

# expected result
array(c(1,2,3,4,5,6,7,8,9), dim=c(3,3,1))
# , , 1
#
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9

# Example 2 (with negative numbers)
byteArray <- "\\xffff00000100020003000400050006000700080009000a00"
ImageSize <- c(3, 4, 1)
ImageClass <- 'int16'

# expected result
array(c(-1,0,1,2,3,4,5,6,7,8,9,10), dim=c(3,4,1))
# , , 1
#
#      [,1] [,2] [,3] [,4]
# [1,]   -1    2    5    8
# [2,]    0    3    6    9
# [3,]    1    4    7   10

What I've tried:

The bytea data from PostgreSQL is a long character string of digits encoded as "hex", which you can tell by the \\x prepended to it (I believe the extra \ is there to escape the existing one?): https://www.postgresql.org/docs/9.1/static/datatype-binary.html (see: section 8.4.1, 'bytea Hex Format')

Decode 'hex' back to the original type ('int16' based on ImageClass)

Per the same url above, hex encoding uses '2 hexadecimal digits per byte'. So I need to split the encoded byteArray into the appropriate length substrings, see: this link

# remove the \\x hex encoding indicator(s) added by PostgreSQL
byteArray <- gsub("\\x", "", x = byteArray, fixed = TRUE)

l <- 2 # hex digits per byte (substring length)
byteArray <- strsplit(trimws(gsub(pattern = paste0("(.{", l, "})"),
                                  replacement = "\\1 ",
                                  x = byteArray)), " ")[[1]]

# for some reason these appear to be in the opposite order than i expect
# Ex: 1 is stored as '0100' rather than '0001'
# so reverse the digits (int16 specific)
byteArray <- paste0(byteArray[c(FALSE, TRUE)], byteArray[c(TRUE, FALSE)])

# strtoi() converts a vector of hex values given a decimal base
byteArray <- strtoi(byteArray, 16L)

# now make it into an n x m x s array,
# e.g., 512 x 512 x (# slices)
V <- array(byteArray, dim = ImageSize)

There are two problems with this solution:

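It does not work with signed types, so negative integer values are interpreted as unsigned values (e.g., 'ffff' is -1 as int16 but 65535 as uint16, and strtoi() will always return 65535).

It is currently coded only for int16 and would need extra code to work with other types (e.g., int32, int64).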

Anyone have a solution that would work with signed types?

Recommended answer

You can start with this conversion function, substitute a faster strsplit and use readBin on the result:

byteArray <- "\\xffff00000100020003000400050006000700080009000a00"

## Split a long string into a vector of character pairs
Rcpp::cppFunction( code = '
CharacterVector strsplit2(const std::string& hex) {
  unsigned int length = hex.length()/2;
  CharacterVector res(length);
  for (unsigned int i = 0; i < length; ++i) {
    res(i) = hex.substr(2*i, 2);
  }
  return res;
}')

## A function to convert one string to a vector of raw bytes
f <- function(x) {
  ## Split a long string into a vector of character pairs
  x <- strsplit2(x)
  ## Remove the first element, "\\x"
  x <- x[-1]
  ## Complete the conversion
  as.raw(as.hexmode(x))
}

raw <- f(byteArray)

# int16
readBin(con = raw, what = "integer", n = length(raw) / 2, size = 2, signed = TRUE, endian = "little")
# -1 0 1 2 3 4 5 6 7 8 9 10

# uint16
readBin(con = raw, what = "integer", n = length(raw) / 2, size = 2, signed = FALSE, endian = "little")
# 65535 0 1 2 3 4 5 6 7 8 9 10

# int32
readBin(con = raw, what = "integer", n = length(raw) / 4, size = 4, signed = TRUE, endian = "little")
# 65535 131073 262147 393221 524295 655369
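If Rcpp is not available, the same idea can be sketched in base R and wrapped so that the ImageClass and ImageSize metadata from the question drive the decoding. This is only a minimal sketch: `hex_to_raw` and `decode_bytea` are hypothetical helper names, not part of any package, and it assumes the bytes are little-endian, as in the examples above.

```r
# Hypothetical helper: hex-encoded bytea string -> raw vector (base R only)
hex_to_raw <- function(x) {
  x <- sub("^\\\\x", "", x)                  # drop the leading \x marker
  if (nchar(x) == 0) return(raw(0))
  starts <- seq(1, nchar(x), by = 2)
  pairs <- substring(x, starts, starts + 1)  # two hex digits per byte
  as.raw(strtoi(pairs, 16L))
}

# Hypothetical helper: decode according to the image class and dimensions
decode_bytea <- function(x, class, dims) {
  spec <- switch(class,
    int16  = list(size = 2L, signed = TRUE),
    uint16 = list(size = 2L, signed = FALSE),
    int32  = list(size = 4L, signed = TRUE),
    stop("unsupported class: ", class))
  bytes <- hex_to_raw(x)
  vals <- readBin(bytes, what = "integer", n = length(bytes) / spec$size,
                  size = spec$size, signed = spec$signed, endian = "little")
  array(vals, dim = dims)
}

byteArray <- "\\xffff00000100020003000400050006000700080009000a00"
V <- decode_bytea(byteArray, "int16", c(3, 4, 1))
V[, , 1]
```

With the second example from the question this should give the 3 x 4 slice that starts with -1 in the top-left corner; passing "uint16" instead turns that first value into 65535.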

This won't work for uint32 and (u)int64, though, since R uses int32 internally. However, R can also use numerics (doubles) to store integers exactly up to 2^53. So we can use this:

# uint32
byteArray <- "\\xffffffff0100020003000400050006000700080009000a00"
raw32 <- f(byteArray)
int32 <- readBin(con = raw32, what = "integer", n = length(raw32) / 4, size = 4, signed = TRUE, endian = "little")
ifelse(int32 < 0, int32 + 2^32, int32)
# 4294967295 131073 262147 393221 524295 655369
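The same wrap-around arithmetic works in the other direction, so it can also repair the strtoi() approach from the question: any unsigned value at or above 2^(bits - 1) stands for a negative number and is shifted down by 2^bits. A minimal sketch, where `to_signed` is a hypothetical helper name:

```r
# Hypothetical helper: reinterpret unsigned values as two's-complement signed
to_signed <- function(u, bits = 16) {
  ifelse(u >= 2^(bits - 1), u - 2^bits, u)
}

# A few little-endian byte pairs taken from Example 2 in the question
words <- c("ffff", "0000", "0100", "0a00")
# Swap the two bytes so the most significant byte comes first
swapped <- paste0(substr(words, 3, 4), substr(words, 1, 2))
u <- strtoi(swapped, 16L)
u
# 65535 0 1 10
to_signed(u)
# -1 0 1 10
```

readBin is still the more direct route, since it handles byte order and sign in one call; this version only shows what the sign conversion amounts to.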

For gzip-compressed data:

# gzip
byteArray <- "\\x1f8b080000000000000005c1870100200800209a56faffbd41d30dd3b285e37a52f9d033018818000000"
con <- gzcon(rawConnection(f(byteArray)))
readBin(con = con, what = "integer", n = length(raw) / 2, size = 2, signed = TRUE, endian = "little")
close(con = con)
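To try the gzip path without a database, a round trip through a temporary file can stand in for the compressed column. This is a sketch under the assumption that the stored bytes are a regular gzip stream, like the one gzfile() writes; the file and variable names are made up for illustration. Note that n in readBin is only an upper bound, so it does not have to match the decompressed length exactly.

```r
# Write three int16 values through a gzip connection to a temporary file
tmp <- tempfile(fileext = ".gz")
out <- gzfile(tmp, "wb")
writeBin(c(-1L, 0L, 1L), out, size = 2, endian = "little")
close(out)

# Read the compressed bytes back as a raw vector (stand-in for f(byteArray))
compressed <- readBin(tmp, what = "raw", n = file.size(tmp))

# Decompress on the fly and decode as little-endian int16
gz <- gzcon(rawConnection(compressed))
vals <- readBin(gz, what = "integer", n = 10, size = 2, signed = TRUE, endian = "little")
close(gz)
unlink(tmp)

vals
```

Here n = 10 asks for more values than the stream holds, and readBin simply returns the ones that are available.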