I am running 64-bit R 3.1 in a 64-bit Ubuntu environment with 400 GB of RAM, and I am encountering a strange limitation when dealing with large matrices.
I have a numeric matrix called A that is 4000 rows by 950,000 columns. When I try to access any element in it, I receive the following error:
    Error: long vectors not supported yet: subset.c:733
Although my matrix was read in via scan, you can replicate the problem with the following code:
    test <- matrix(1, 4000, 900000)  # no error
    test[1, 1]                       # error
My Googling reveals this was a common error message prior to R 3.0, when vectors were limited to 2^31 - 1 elements. However, that limit should not apply in my environment.
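A quick way to confirm that the matrix really is past the old limit is to compare its element count with .Machine$integer.max. This is a minimal sketch; the 4000 x 950,000 dimensions are taken from the question and nothing large is allocated:

```r
# Element count of the question's 4000 x 950,000 matrix.
# Numeric literals are doubles in R, so this product does not overflow.
n_elements <- 4000 * 950000

# The old per-vector limit: 2^31 - 1 = 2147483647.
int_limit <- .Machine$integer.max

# TRUE means the matrix has more elements than an integer index can address,
# so it is stored as a "long vector".
n_elements > int_limit
```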
Should I not be using the native matrix type for this kind of matrix?
Answer:
A matrix is just an atomic vector with a dim attribute, which allows R to access it as a matrix. Your test matrix is a vector of length 4000 * 900000, which is 3.6e+9 elements (the largest integer value, .Machine$integer.max, is 2^31 - 1, approx 2.147e+9). Atomic vectors do support long-vector subsetting, i.e. accessing elements beyond the 2.147e+9 limit with a single index, even though two-index subsetting such as test[1, 1] fails here. So just treat your matrix as a long vector.
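To make the "vector plus dim attribute" point concrete, here is a small sketch with arbitrary values: it builds a matrix by assigning dim to a plain vector, checks that this is the same object matrix() would produce, and reads one element both with two indices and with a single linear index:

```r
# A plain atomic vector of six doubles.
v <- c(10, 20, 30, 40, 50, 60)

# Attaching a dim attribute turns it into a 2 x 3 matrix, filled column-wise.
m <- v
dim(m) <- c(2, 3)

attributes(m)                        # only a $dim attribute: 2 3
identical(m, matrix(v, nrow = 2))    # same object as building it with matrix()

m[2, 3]   # matrix-style subsetting: row 2, column 3
m[6]      # vector-style subsetting: 6th element of the underlying vector
```

Both subsetting calls return 60, because column 3 holds the 5th and 6th elements of the underlying vector.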
Remember that by default R fills matrices column-wise, so element [row, col] of a matrix with nrow rows sits at linear index (col - 1) * nrow + row. If we wanted to retrieve, say, the value at test[2701, 850000], we could access it via:
    i <- (850000 - 1) * 4000 + 2701
    test[i]
    #[1] 1
Note that this really is long vector subsetting because:
    849999L * 4000L
    #[1] NA
    #Warning message:
    #In 849999L * 4000L : NAs produced by integer overflow
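The column-major arithmetic can be wrapped in a small helper. Note that rc_to_index is a hypothetical name for illustration, not a base R function. It does the arithmetic in doubles so the result cannot overflow, and the sketch checks it on a small matrix before applying it to the dimensions used in this answer (no large matrix is allocated):

```r
# Hypothetical helper (not part of base R): convert a (row, col) position
# in a matrix with `nrow` rows into a column-major linear index.
# as.numeric() forces double arithmetic, so indices above 2^31 - 1 are exact.
rc_to_index <- function(row, col, nrow) {
  (as.numeric(col) - 1) * as.numeric(nrow) + as.numeric(row)
}

# Sanity check on a small 4 x 3 matrix: both ways of indexing agree.
m <- matrix(seq_len(12), nrow = 4)
stopifnot(m[3, 2] == m[rc_to_index(3, 2, nrow(m))])

# Linear index of test[2701, 850000] in a 4000-row matrix.
i <- rc_to_index(2701, 850000, 4000)
sprintf("%.0f", i)             # print the full index without scientific notation
i > .Machine$integer.max       # larger than any integer index can be
```

Using doubles here mirrors why the answer's i works while the integer product overflows to NA: R accepts double-valued indices for long vectors.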