使用K方式合并合并N个排序文件

编程入门 行业动态 更新时间:2024-10-20 07:54:51
本文介绍了使用K方式合并合并N个排序文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

关于合并排序的文件或说合并K排序的文件,有不错的文献.它们都基于以下理论:将每个文件的第一个元素放入堆中,然后直到堆为空,轮询该元素,然后从文件中获取该元素的位置再获取另一个.只要每个文件的一条记录可以放入堆中,此方法就起作用.

There is decent literature about merging sorted files or say merging K sorted files. They all work on the theory that first element of each file is put in a Heap, then until the heap is empty poll that element, get another from the file from where this element was taken. This works as long as one record of each file can be put in a heap.

现在让我们说我有N个排序文件,但是我只能将K个记录放入堆中,而K< N,我们说N = Kc,其中"c"是乘数,表示N很大,以至于是c的倍数.显然,这将需要一遍又一遍地进行K方式合并,直到我们只剩下K个文件,然后我们将它们作为最后一次合并到最终排序中.我该如何实现它?它的复杂性是什么?

Now let us say I have N sorted files but I can only bring K records in the heap and K < N and let us say N = Kc where "c" is the multiplier implying that N is so large that it is some multiple of c. Clearly, it will require doing K way merge over and over until we only are left with K files and then we merge them as one last time into the final sort. How do I implement this and what will be the complexity of this?

推荐答案

有许多用Java编写的k路合并示例.一个是 www.sanfoundry/java-program -k-way-merge-algorithm/.

There are multiple examples of k-way merge written in Java. One is www.sanfoundry/java-program-k-way-merge-algorithm/.

要实现合并,您只需要编写一个简单的包装程序,该包装程序就会连续扫描您的目录,馈送东西文件,直到只剩下一个.基本思想是:

To implement your merge, you just have to write a simple wrapper that continually scans your directory, feeding the thing files until there's only one left. The basic idea is:

while number of files > 1 fileList = Load all file names i = 0 while i < fileList.length filesToMerge = copy files i through i+k-1 from file list merge(filesToMerge, output file name) i += k end while end while

复杂度分析

如果我们假设每个文件包含相同数量的项目,这将更容易考虑.

Complexity analysis

This is easier to think about if we assume that each file contains the same number of items.

您必须合并M个文件,每个文件包含n个项目,但是一次只能合并k个文件.因此,您必须执行log k (M)遍.也就是说,如果您有1,024个文件,并且一次只能合并16个文件,那么您将进行一次通过,一次合并16个文件,总共创建了64个文件.然后,您将进行另一遍,一次合并16个文件,创建四个文件,最后一遍将合并这四个文件以创建输出.

You have to merge M files, each of which contains n items, but you can only merge k files at a time. So you have to do logk(M) passes. That is, if you have 1,024 files and you can only merge 16 at a time, then you'll make one pass that merges 16 files at a time, creating a total of 64 files. Then you'll make another pass that merges 16 files at a time, creating four files, and your final pass will merge those four files to create the output.

如果您有k个文件,每个文件包含n个项目,则合并它们的复杂度为O(n * k log 2 k).

If you have k files, each of which contains n items, then complexity of merging them is O(n*k log2 k).

因此,在第一遍中,您执行M/k合并,每个合并的复杂度为O(nk log k).就是O((M/k)* n * k * log 2 k),或者O(Mn log k).

So in the first pass you do M/k merges, each of which has complexity O(nk log k). That's O((M/k) * n * k * log2 k), or O(Mn log k).

现在,您的每个文件都包含n k k个项目,并且您对每个k个文件进行了M/k/k合并.因此,第二遍复杂度为O((M/k 2 )n * k 2 * log 2 k).简化后,也可以得出O(Mn log k).

Now, each of your files contains nkk items, and you do M/k/k merges of k files each. So the second pass complexity is O((M/k2) n * k2 * log2 k). Simplified, that, too works out to O(Mn log k).

在第二遍中,您将进行k个合并,每个合并的复杂度为O(nk). 请注意,在每遍操作中,您都在处理M * n个项目.因此,您每次通过的次数都是O(Mn log k).并且您正在执行log k (M)通过.因此,总复杂度为:O(log k (M)*(Mn log k))或

In the second pass, you do k merges, each of which has complexity O(nk). Note that in every pass you're working with M*n items. So each pass you do is O(Mn log k). And you're doing logk(M) passes. So the total complexity is: O(logk(M) * (Mn log k)), or

O((Mn log k) log M)

每个文件包含相同数量的项目的假设不会影响渐近分析,因为,正如我所展示的,每次通过都操纵相同数量的项目:M * n.

The assumption that every file contains the same number of items doesn't affect the asymptotic analysis because, as I've shown, every pass manipulates the same number of items: M*n.

更多推荐

使用K方式合并合并N个排序文件

本文发布于:2023-11-29 16:28:53,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1646945.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:方式   文件

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!