How to read a one-row .txt dataset in R?

Updated: 2024-10-16 16:28:24

My .txt dataset looks like the following:

perms ['AC', 'AT', 'AG', 'AN', 'CA', 'CT', 'CG', 'CN', 'TA', 'TC', 'TG', 'TN', 'GA', 'GC', 'GT', 'GN', 'NA', 'NC', 'NT', 'NG', 'AA', 'CC', 'TT', 'GG', 'NN'] link [11413851, 16930583, 16197703, 1085, 16533859, 16218116, 2309941, 572, 14414084, 13609414, 16552907, 1015, 13594224, 10038778, 11427660, 480, 1055, 445, 1061, 591, 15557040, 9822185, 15583349, 9815249, 11653456]

There are two variables in this dataset: 'perms' and 'link'. How can I read this dataset into R? I cannot do it by brute force, because my samples are too large (some of them have n > 100,000), but the structure is exactly the same in every file. Thank you in advance!

Best answer

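We read the dataset with readLines and split the line wherever there is a [ preceded by zero or more spaces, or a ] followed by zero or more spaces. This leaves the variable names and their value strings in alternating positions. We then create a logical index ('ind') to separate names from values, loop through the value strings, use scan to break each one into its individual elements, and convert the result to a 'data.frame'.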
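
To make the split and the alternating index concrete, here is a minimal sketch on a shortened line. The string `x` is made up for illustration and only mimics the structure of the real file.

```r
# Shortened stand-in for the one-line file (hypothetical data)
x <- "perms ['AC', 'AT', 'NA'] link [11, 22, 33]"

# Split on "[" (with any whitespace before it) or "]" (with any whitespace after it)
parts <- strsplit(x, "\\s*\\[|\\]\\s*")[[1]]
print(parts)

# The two-element logical index is recycled along 'parts':
# odd positions hold the variable names, even positions hold the values
ind <- c(TRUE, FALSE)
print(parts[ind])   # variable names
print(parts[!ind])  # comma-separated values, still one string per variable
```

Each value string is still a single piece of text at this point, which is why the full solution passes it to scan.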

# Read the single line of the file
lines <- readLines("file.txt")
# Split into: name, values, name, values
lines1 <- strsplit(lines, "\\s*\\[|\\]\\s*")[[1]]
# Recycled index: TRUE picks the names, FALSE picks the value strings
ind <- c(TRUE, FALSE)
# Scan each value string into a character vector and name the columns
data.frame(setNames(lapply(lines1[!ind], function(x) trimws(scan(text=x, what = "", sep=",", quiet=TRUE))), lines1[ind]))
#   perms     link
#1     AC 11413851
#2     AT 16930583
#3     AG 16197703
#4     AN     1085
#5     CA 16533859
#6     CT 16218116
#7     CG  2309941
#8     CN      572
#9     TA 14414084
#10    TC 13609414
#11    TG 16552907
#12    TN     1015
#13    GA 13594224
#14    GC 10038778
#15    GT 11427660
#16    GN      480
#17    NA     1055
#18    NC      445
#19    NT     1061
#20    NG      591
#21    AA 15557040
#22    CC  9822185
#23    TT 15583349
#24    GG  9815249
#25    NN 11653456
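
Because scan is called with what = "", both columns come back as text rather than numbers. Since the question mentions many samples with the same structure, one possible way to package the steps is a small helper that also converts 'link' to numeric. This is only a sketch: the function name `read_one_row` and the temporary file are assumptions made for the example, and it assumes each file holds exactly one line and that 'link' always contains numbers, as in the question.

```r
# Hypothetical helper: parse one "name [values] name [values]" line
# into a data.frame, assuming the file contains a single line
read_one_row <- function(path) {
  line  <- readLines(path, n = 1, warn = FALSE)
  parts <- strsplit(line, "\\s*\\[|\\]\\s*")[[1]]
  ind   <- c(TRUE, FALSE)
  # Break each comma-separated value string into trimmed character elements
  vals  <- lapply(parts[!ind], function(x)
    trimws(scan(text = x, what = "", sep = ",", quiet = TRUE)))
  df <- data.frame(setNames(vals, parts[ind]), stringsAsFactors = FALSE)
  # Assumption: 'link' always holds numbers, as in the question's example
  df$link <- as.numeric(df$link)
  df
}

# Usage on a small temporary file standing in for the real data
tmp <- tempfile(fileext = ".txt")
writeLines("perms ['AC', 'NA', 'NN'] link [11413851, 1055, 11653456]", tmp)
res <- read_one_row(tmp)
str(res)
```

For several files, something like `lapply(paths, read_one_row)` would apply the same parsing to each one, where `paths` is a hypothetical character vector of file names.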

Published: 2023-07-07 21:31:00
Source: https://www.elefans.com/category/jswz/34/1068464.html
