Reading a large text file with 70 million lines of text in Java

Problem description

I have a big text file with 70 million lines of text. I have to read the file line by line.

I used two different approaches:

// Approach 1: BufferedReader over an InputStreamReader
InputStreamReader isr = new InputStreamReader(new FileInputStream(FilePath), "unicode");
BufferedReader br = new BufferedReader(isr);
while ((cur = br.readLine()) != null);

// Approach 2: Apache Commons IO LineIterator
LineIterator it = FileUtils.lineIterator(new File(FilePath), "unicode");
while (it.hasNext()) cur = it.nextLine();

Is there another approach that can make this task faster?

Recommended answer

1) I am sure there is no difference speed-wise; both use a FileInputStream internally and do buffering.

2) You can take measurements and see for yourself.
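
For example, a minimal timing sketch around the BufferedReader variant (it reuses FilePath and the "unicode" charset from the question; the counter is just there to give the loop something to report):

long start = System.nanoTime();
long count = 0;
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(FilePath), "unicode"))) {
    while (br.readLine() != null) count++;
}
System.out.println(count + " lines in " + (System.nanoTime() - start) / 1_000_000 + " ms");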

3) Though there's no performance benefit, I like the Java 1.7 approach:

try (BufferedReader br = Files.newBufferedReader(Paths.get("test.txt"), StandardCharsets.UTF_8)) {
    for (String line; (line = br.readLine()) != null;) {
        // process line
    }
}

4) A Scanner-based version:

try (Scanner sc = new Scanner(new File("test.txt"), "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}

5) This may be faster than the rest:

try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
    ByteBuffer bb = ByteBuffer.allocateDirect(1000);
    StringBuilder line = new StringBuilder();
    int n;
    while ((n = ch.read(bb)) != -1) {   // -1 means end of file
        bb.flip();
        // add chars to line
        // ...
        bb.clear();
    }
}

It requires a bit of coding, but it can be really faster because of ByteBuffer.allocateDirect: it allows the OS to read bytes from the file into the ByteBuffer directly, without copying.
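
One way the missing "add chars to line" step could be fleshed out is sketched below. This is a rough sketch, not the answer's exact code: it assumes UTF-8 content and plain '\n' line endings, the 64 KB buffer sizes are arbitrary, and the final flush of the decoder and the last unterminated line are left out.

try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
    ByteBuffer bb = ByteBuffer.allocateDirect(1 << 16);
    CharBuffer cb = CharBuffer.allocate(1 << 16);
    CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
    StringBuilder line = new StringBuilder();
    while (ch.read(bb) != -1) {
        bb.flip();
        dec.decode(bb, cb, false);   // decode the bytes read so far
        cb.flip();
        while (cb.hasRemaining()) {
            char c = cb.get();
            if (c == '\n') {
                // process line.toString() here
                line.setLength(0);
            } else {
                line.append(c);
            }
        }
        cb.clear();
        bb.compact();                // keep the bytes of a partially decoded character
    }
}

Carrying the leftover bytes of a multi-byte character across reads (the bb.compact() call) is the fiddly part the answer alludes to.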

6) Parallel processing would definitely increase speed: make a big byte buffer, run several tasks that read bytes from the file into that buffer in parallel; when a chunk is ready, find the first end of line, make a String, find the next one, and so on.
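
The shared-buffer scheme above takes real work to get right. A much simpler way to get some parallelism (this is my own substitution, not the scheme the answer describes, and it needs Java 8 streams) is to keep the reading sequential and parallelize only the per-line processing, assuming each line can be processed independently:

try (Stream<String> lines = Files.lines(Paths.get("test.txt"), StandardCharsets.UTF_8)) {
    lines.parallel().forEach(line -> {
        // per-line work runs on the common Fork/Join pool
    });
}

Reading itself stays single-threaded here, so this only pays off when the per-line work dominates the I/O.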
