I have a problem similar to this question: selecting every Nth column in using SQLDF or read.csv.sql
I want to read a few columns from large files (tables of 150 rows and more than 500,000 columns, space-separated, filled with numeric data; only a 32-bit system is available). The file has no header, so the code in the thread above didn't work, and I decided to write a new post.
Do you have any ideas for solving this problem?
I was thinking of something like the following, but any solution using fread or read.table would also be fine:
    MyConnection <- file("path/file.txt")
    df <- sqldf("select column 1 100 1000 235612 from MyConnection", file.format = list(header = FALSE, sep = " "))

Recommended answer
If the columns are fixed width, you can use substr to specify the start position and length of each column you want to read in:
    library(sqldf)

    # Write a small fixed-width sample file
    x <- tempfile()
    cat("12345", "67890", "09876", "54321", sep = "\n", file = x)
    myfile <- file(x)

    # substr(V1, start, length): take 1 character from position 1,
    # and up to 5 characters from position 3
    sqldf("select substr(V1, 1, 1) var1, substr(V1, 3, 5) var2 from myfile")
    #   var1 var2
    # 1    1  345
    # 2    6  890
    # 3    9   76
    # 4    5  321
See this blog post for some more examples. The "select" statement can easily be constructed with paste if you know the details about the column starting positions and widths.
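As a minimal sketch of that last point, the "select" statement from the example above could be assembled with paste roughly as follows. The `starts` and `widths` vectors and the `varN` column names are illustrative assumptions, not values from the original file; only base R is used to build the string, and the sqldf call is left commented out.

```r
# Hypothetical layout: start position and width of each fixed-width field
# (illustrative values, chosen to reproduce the query in the example above).
starts <- c(1, 3)
widths <- c(1, 5)

# One "substr(V1, start, width) varN" term per field.
# SQLite's substr() takes a start position and a length.
terms <- paste0("substr(V1, ", starts, ", ", widths, ") var", seq_along(starts))

# Join the terms into a single SELECT statement against the connection `myfile`.
query <- paste("select", paste(terms, collapse = ", "), "from myfile")
print(query)

# With sqldf loaded and `myfile` defined as in the example above:
# sqldf(query)
```

Because the terms are generated from two vectors, selecting more fields should only require extending `starts` and `widths`.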