I'd like to know if there's a method or a Python package that lets me work with a large dataset without loading it entirely into RAM.
I'm also using pandas for statistical functions.
I need access to the entire dataset because many statistical functions need the full dataset to return credible results.
I'm using PyDev (with the Python 3.4 interpreter) on LiClipse with Windows 10.
Best answer
You could use SFrames or Dask for large-dataset support, or use pandas and read/iterate in chunks in order to minimise RAM usage. Also worth having a look at the blaze library.

Read in chunks:

    chunksize = 10 ** 6
    for chunk in pd.read_csv(filename, chunksize=chunksize):
        process(chunk)
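As a note on the "statistics need the whole dataset" concern: some statistics (counts, sums, and therefore the mean) can be accumulated across chunks without ever holding the full file in memory. Below is a minimal sketch of that pattern; the in-memory CSV and the `value` column name are placeholders standing in for a real on-disk file, and the tiny `chunksize` is only for demonstration.

```python
import io
import pandas as pd

# Hypothetical small CSV standing in for a large on-disk file.
csv_data = io.StringIO("value\n1\n2\n3\n4\n5\n6\n")

total = 0.0
count = 0
# chunksize=2 is just for illustration; for a real file you
# would use something much larger, e.g. 10 ** 6 rows per chunk.
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += chunk["value"].sum()
    count += len(chunk)

mean = total / count
print(mean)  # mean of 1..6 -> 3.5
```

Statistics that cannot be decomposed this way (e.g. exact medians) are where out-of-core tools such as Dask become more useful, since they manage the chunking and aggregation for you.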