I am running 64-bit R 3.1 in a 64-bit Ubuntu environment with 400 GB of RAM, and I am encountering a strange limitation when dealing with large matrices.
I have a numeric matrix called `A` that is 4000 rows by 950,000 columns. When I try to access any element in it, I receive the following error:

```
Error: long vectors not supported yet: subset.c:733
```

Although my matrix was read in via `scan`, you can replicate the problem with the following code:

```r
test <- matrix(1, 4000, 900000)  # no error
test[1, 1]                       # error
```
My Googling shows this was a common error message before R 3.0, when vectors were limited to 2^31 - 1 elements. However, that limit should not apply here, given my environment.
Should I not be using the native matrix type for this kind of matrix?
Recommended answer
A matrix is just an atomic vector with a dimension attribute that lets R access it as a matrix. Your `test` matrix is a vector of length 4000 * 900000, which is 3.6e+9 elements (the largest integer value, `.Machine$integer.max` = 2^31 - 1, is approx. 2.147e+9). Long-vector subsetting (i.e. accessing elements beyond the 2.147e+9 limit) is supported for plain atomic vectors, so just treat your matrix as a long vector.
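To see the "vector plus a `dim` attribute" point concretely, here is a minimal sketch on a small 3 x 4 matrix (a stand-in chosen for illustration, since the real matrix needs roughly 29 GB). It only uses base R and shows that single-index `[ ]` addresses the underlying column-major vector.

```r
# A small stand-in for the question's matrix: 3 rows x 4 columns of 1:12,
# filled column by column.
m <- matrix(1:12, nrow = 3, ncol = 4)

# A matrix is an atomic vector plus a "dim" attribute.
stopifnot(is.atomic(m))
attributes(m)                 # list(dim = c(3L, 4L))
length(m)                     # 12, i.e. nrow * ncol

# Removing the dim attribute leaves the underlying vector, unchanged.
v <- m
dim(v) <- NULL
stopifnot(identical(v, 1:12))

# With a single index, [ ] ignores the dimensions and indexes that vector:
# column 2 starts at position 4, so [2, 2] is position 5.
stopifnot(m[5] == m[2, 2])

# The question's matrix has more elements than the largest integer.
4000 * 900000 > .Machine$integer.max   # TRUE
```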
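Remembering that R fills matrices column-wise by default, element `[row, col]` of a matrix with `nrow` rows sits at linear index `(col - 1) * nrow + row`. So if we wanted to retrieve, say, the value at `test[2701, 850000]`, we could access it via:

```r
i <- (850000 - 1) * 4000 + 2701  # (col - 1) * nrow + row, in double arithmetic
test[i]
#[1] 1
```

Note that this really is long-vector subsetting, because the index is larger than the largest integer and overflows if computed with integers:

```r
(850000L - 1L) * 4000L
#[1] NA
#Warning message:
#In (850000L - 1L) * 4000L : NAs produced by integer overflow
```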
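The same arithmetic can be wrapped in small helpers if you need to do this repeatedly. This is a sketch, not part of base R: `linear_index`, `get_elem`, `get_col` and `get_row` are hypothetical names, and the checks run on a small 4 x 5 matrix where ordinary `[row, col]` indexing still works, so the helpers can be compared against it. All index arithmetic is done in doubles, which hold whole numbers exactly up to 2^53.

```r
# Hypothetical helpers (not base R) that treat a column-major matrix as the
# plain vector it is. Indices are computed as doubles so they do not
# overflow past .Machine$integer.max.

# Linear position of element [row, col] in a matrix with n_rows rows.
linear_index <- function(row, col, n_rows) {
  (as.numeric(col) - 1) * as.numeric(n_rows) + as.numeric(row)
}

# Single element x[row, col], fetched via one (possibly long) index.
get_elem <- function(x, row, col) {
  x[linear_index(row, col, nrow(x))]
}

# Whole column: nrow(x) consecutive positions starting at column col.
get_col <- function(x, col) {
  x[(as.numeric(col) - 1) * nrow(x) + seq_len(nrow(x))]
}

# Whole row: every nrow(x)-th position, starting at position row.
get_row <- function(x, row) {
  x[seq(as.numeric(row), by = as.numeric(nrow(x)), length.out = ncol(x))]
}

# Check the helpers against ordinary [row, col] indexing on a small matrix.
m <- matrix(rnorm(20), nrow = 4, ncol = 5)
for (r in seq_len(nrow(m))) {
  for (c in seq_len(ncol(m))) {
    stopifnot(get_elem(m, r, c) == m[r, c])
  }
}
stopifnot(identical(get_col(m, 3), m[, 3]))
stopifnot(identical(get_row(m, 2), m[2, ]))

# Position of test[2701, 850000] in the 4000-row matrix from the question.
i <- linear_index(2701, 850000, 4000)
sprintf("%.0f", i)           # "3399998701"
i > .Machine$integer.max     # TRUE
```

On the large matrix you would then call, for example, `get_elem(test, 2701, 850000)` instead of `test[2701, 850000]`, which is the form that raised the error in R 3.1.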