Processing huge files in Java


I have a huge file of around 10 GB. I have to do operations such as sort, filter, etc on the files in Java. Each operation can be done in parallel.

Is it good to start 10 threads and read the file in parallel? Each thread reads 1 GB of the file. Is there any other option to solve the issue with extra large files and processing them as fast as possible? Is NIO good for such scenarios?

Currently, I am performing operations in serial and it takes around 20 mins to process such files.
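For reference, a serial pass like the one described is typically a streaming read, so memory use stays flat even for a 10 GB file. This is a minimal sketch; the `SerialFilter` class and the line-counting "filter" are illustrative, not from the original post:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class SerialFilter {
    /**
     * Streams the input line by line, so only one line is held in memory
     * at a time; counts lines containing the given substring as a stand-in
     * for whatever filtering the real job does.
     */
    public static long countLinesContaining(Reader source, String needle)
            throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(needle)) {
                    count++;
                }
            }
        }
        return count;
    }
}
```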

Thanks,

Accepted answer

Is it good to start 10 threads and read the file in parallel?

Almost certainly not - although it depends. If it's from an SSD (where there's effectively no seek time) then maybe. If it's a traditional disk, definitely not.

That doesn't mean you can't use multiple threads though - you could potentially create one thread to read the file, performing only the most rudimentary tasks to get the data into processable chunks. Then use a producer/consumer queue to let multiple threads process the data.
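That producer/consumer layout could be sketched roughly as follows, assuming the per-chunk work is something simple like counting matching lines (the `ChunkedFilter` class, method names, and parameters are illustrative): one thread reads the file sequentially and enqueues chunks of lines, while a fixed pool of workers drains the queue.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedFilter {
    // Sentinel chunk telling workers the reader has finished.
    private static final List<String> EOF = new ArrayList<>();

    /**
     * One thread (the caller) does all the sequential IO and enqueues
     * chunks of lines; 'workers' threads drain the queue in parallel.
     */
    public static long countLinesContaining(Reader source, String needle,
                                            int chunkSize, int workers)
            throws Exception {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(64);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            results.add(pool.submit(() -> {
                long count = 0;
                List<String> chunk;
                while ((chunk = queue.take()) != EOF) {
                    for (String line : chunk) {
                        if (line.contains(needle)) count++;
                    }
                }
                queue.put(EOF); // pass the sentinel on so other workers stop too
                return count;
            }));
        }
        // Producer: single-threaded sequential read, chunked into the queue.
        try (BufferedReader reader = new BufferedReader(source)) {
            List<String> chunk = new ArrayList<>(chunkSize);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == chunkSize) {
                    queue.put(chunk);
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            if (!chunk.isEmpty()) queue.put(chunk);
        }
        queue.put(EOF);
        long total = 0;
        for (Future<Long> f : results) total += f.get();
        pool.shutdown();
        return total;
    }
}
```

The bounded queue provides backpressure: the reader cannot run far ahead of the workers, so memory use stays proportional to the queue capacity rather than the file size.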

Without knowing more than "sort, filter, etc" (which is pretty vague) we can't really tell how parallelizable the process is in the first place - but trying to perform the IO in parallel on a single file will probably not help.

Published: 2023-07-26 19:21:00
Link: https://www.elefans.com/category/jswz/34/1279855.html