R: How to quickly read large .dta files without RAM limitations

Updated: 2024-10-23 11:23:28
Question

I have a 10 GB .dta Stata file and I am trying to read it into 64-bit R 3.3.1. I am working on a virtual machine with about 130 GB of RAM (and a 4 TB hard disk), and the .dta file has about 3 million rows and somewhere between 400 and 800 variables.

I know data.table's fread() is the fastest way to read .txt and .csv files, but does anyone have a recommendation for reading largeish .dta files into R? Reading the file into Stata as a .dta file takes about 20-30 seconds, although I need to raise Stata's maximum working memory before opening the file (I set the maximum to 100 GB).

I have not tried exporting the file to .csv from Stata, because I would rather not touch the file with Stata at all. The question "Using memisc to import stata .dta file into R" offers a solution, but it assumes RAM is scarce. In my case, I should have enough RAM to hold the whole file.
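A rough check of the "enough RAM" assumption: the sketch below estimates the in-memory size of the data, assuming every variable is stored as an 8-byte double. String variables, factors, and temporary copies made during import can change the real figure, so treat it as an order-of-magnitude guide only. The helper name `est_gb` is hypothetical.

```r
# Back-of-envelope size estimate. Assumption: every cell is an 8-byte double;
# real data with strings or factors will differ.
est_gb <- function(rows, cols, bytes_per_cell = 8) {
  rows * cols * bytes_per_cell / 1e9  # decimal gigabytes
}

est_gb(3e6, 400)  # lower end of the variable range, about 9.6 GB
est_gb(3e6, 800)  # upper end, about 19.2 GB
```

Even the upper estimate is well below the 130 GB available, although R can need extra headroom while the import is running.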

Answer

The fastest way to load a large Stata dataset in R is the readstata13 package. I have compared the performance of the foreign, readstata13, and haven packages on a large dataset in this post, and the results repeatedly showed that readstata13 is the fastest available package for reading Stata datasets in R.
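A minimal sketch of how such a comparison could be run, assuming the readstata13 and haven packages are installed. It writes a small synthetic file with `save.dta13()` instead of using the 10 GB file from the question, so the timings only show how to compare the packages and do not reproduce the benchmark results above. foreign is left out because `foreign::read.dta()` only reads the older file formats (up to Stata 12).

```r
library(readstata13)  # read.dta13(), save.dta13()
library(haven)        # read_dta()

set.seed(1)
n <- 1e5
# Hypothetical synthetic data standing in for the real file
df <- data.frame(
  id    = seq_len(n),
  x     = rnorm(n),
  group = sample(c("a", "b", "c"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Write the data to a temporary .dta file with readstata13
path <- tempfile(fileext = ".dta")
save.dta13(df, path)

# Elapsed (wall-clock) seconds for each reader; d1 and d2 keep the results
timings <- c(
  readstata13 = system.time(d1 <- read.dta13(path))[["elapsed"]],
  haven       = system.time(d2 <- read_dta(path))[["elapsed"]]
)
print(timings)
```

On the real data, point `path` at the actual .dta file. `read.dta13()` returns an ordinary data.frame, which `data.table::setDT()` can convert in place if you prefer to continue working with data.table.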

Published: 2023-11-11 06:18:22