How to compare two huge List<String> in Java?

Updated: 2024-10-21 16:22:57
Problem description

My application generates two large lists (up to 3.5 million string records each). I need the best and fastest way to compare them. Currently I do it like this:

List list1 = ListUtils.subtract(sourceDbResults, hiveResults);
List list2 = ListUtils.subtract(hiveResults, sourceDbResults);

But this method is very expensive in memory, as I can see from jconsole, and sometimes the process even gets stuck on it. Are there any good solutions or ideas?

The element order in both lists is always the same, so I don't need to handle ordering. After comparing, I need to know whether the lists are identical and, if they are not, to get the differences between them. Subtract works perfectly for small lists.

Recommended answer

Given that you've said your two lists are already sorted, they can be compared in O(N) time, which is much faster than your current solution that uses ListUtils. The following method does this with an algorithm similar to the textbook one for merging two sorted lists: it walks both lists in step and records each element that appears in only one of them.

import java.util.*;

public class CompareSortedLists {

    public static void main(String[] args) {
        List<Integer> sourceDbResults = Arrays.asList(1, 2, 3, 4, 5, 8);
        List<Integer> hiveResults = Arrays.asList(2, 3, 6, 7);
        List<Integer> inSourceDb_notInHive = new ArrayList<>();
        List<Integer> inHive_notInSourceDb = new ArrayList<>();

        compareSortedLists(
                sourceDbResults, hiveResults,
                inSourceDb_notInHive, inHive_notInSourceDb);

        // Assertions only run when the JVM is started with -ea.
        assert inSourceDb_notInHive.equals(Arrays.asList(1, 4, 5, 8));
        assert inHive_notInSourceDb.equals(Arrays.asList(6, 7));
    }

    /**
     * Compares two sorted lists (or other iterable collections in ascending order).
     * Adds to onlyInList1 any and all elements in list1 that are not in list2; and
     * conversely to onlyInList2. The caller must ensure the two input lists are
     * already sorted and should initialize onlyInList1 and onlyInList2 to empty,
     * writable collections. Null is used as the end-of-list marker, so the input
     * lists must not contain null elements.
     */
    public static <T extends Comparable<? super T>> void compareSortedLists(
            Iterable<T> list1, Iterable<T> list2,
            Collection<T> onlyInList1, Collection<T> onlyInList2) {
        Iterator<T> it1 = list1.iterator();
        Iterator<T> it2 = list2.iterator();
        T e1 = it1.hasNext() ? it1.next() : null;
        T e2 = it2.hasNext() ? it2.next() : null;
        while (e1 != null || e2 != null) {
            if (e2 == null) {
                // No more elements in list2, some remaining in list1
                onlyInList1.add(e1);
                e1 = it1.hasNext() ? it1.next() : null;
            } else if (e1 == null) {
                // No more elements in list1, some remaining in list2
                onlyInList2.add(e2);
                e2 = it2.hasNext() ? it2.next() : null;
            } else {
                int comp = e1.compareTo(e2);
                if (comp < 0) {
                    // e1 sorts before e2, so e1 cannot appear in list2
                    onlyInList1.add(e1);
                    e1 = it1.hasNext() ? it1.next() : null;
                } else if (comp > 0) {
                    // e2 sorts before e1, so e2 cannot appear in list1
                    onlyInList2.add(e2);
                    e2 = it2.hasNext() ? it2.next() : null;
                } else /* comp == 0 */ {
                    // Present in both lists: advance both iterators
                    e1 = it1.hasNext() ? it1.next() : null;
                    e2 = it2.hasNext() ? it2.next() : null;
                }
            }
        }
    }
}
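The method above depends on both inputs being sorted in the same order that compareTo defines. For strings this is worth checking, since a database may sort with a collation that differs from String.compareTo, which compares UTF-16 code units. The following is a minimal sketch of such a check; SortedCheck and isSorted are hypothetical names that do not appear in the original answer.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SortedCheck {

    /**
     * Returns true if the elements are in non-descending natural order.
     * Hypothetical helper, not part of the original answer.
     */
    static <T extends Comparable<? super T>> boolean isSorted(Iterable<T> items) {
        Iterator<T> it = items.iterator();
        if (!it.hasNext()) {
            return true; // an empty sequence is trivially sorted
        }
        T previous = it.next();
        while (it.hasNext()) {
            T current = it.next();
            if (previous.compareTo(current) > 0) {
                return false; // found an adjacent pair out of order
            }
            previous = current;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("alpha", "beta", "gamma");
        List<String> unsorted = Arrays.asList("beta", "alpha", "gamma");
        System.out.println(isSorted(sorted));   // true
        System.out.println(isSorted(unsorted)); // false
    }
}
```

The check is a single O(N) pass with no extra allocation, so running it on each list before the comparison adds little cost even at millions of records.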

The above method uses no external libraries and can be used with any version of Java from 6 upwards (the demo main uses the diamond operator <>, which needs Java 7). If you use a PeekingIterator, such as the one from Apache Commons Collections or Guava, or write your own, then you can make the method simpler, especially if you also use Java 8:

// The constructor call below matches
// org.apache.commons.collections4.iterators.PeekingIterator;
// with Guava, obtain one via Iterators.peekingIterator(list1.iterator()) instead.
public static <T extends Comparable<? super T>> void compareSortedLists(
        Iterable<T> list1, Iterable<T> list2,
        Collection<T> onlyInList1, Collection<T> onlyInList2) {
    PeekingIterator<T> it1 = new PeekingIterator<>(list1.iterator());
    PeekingIterator<T> it2 = new PeekingIterator<>(list2.iterator());
    while (it1.hasNext() && it2.hasNext()) {
        int comp = it1.peek().compareTo(it2.peek());
        if (comp < 0)
            onlyInList1.add(it1.next());
        else if (comp > 0)
            onlyInList2.add(it2.next());
        else /* comp == 0 */ {
            it1.next();
            it2.next();
        }
    }
    // At most one iterator still has elements; they are unique to that list.
    it1.forEachRemaining(onlyInList1::add);
    it2.forEachRemaining(onlyInList2::add);
}
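For the "write your own" option mentioned above, a peeking iterator only needs to buffer one element. The following is a minimal sketch assuming Java 8 or later; SimplePeekingIterator is a hypothetical name, not a class from Apache Commons Collections or Guava, and it leaves remove() unsupported.

```java
import java.util.Arrays;
import java.util.Iterator;

final class SimplePeekingIterator<T> implements Iterator<T> {
    private final Iterator<? extends T> delegate;
    private T peeked;          // buffered element, valid only when hasPeeked is true
    private boolean hasPeeked; // a flag, so that null elements can be buffered too

    SimplePeekingIterator(Iterator<? extends T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return hasPeeked || delegate.hasNext();
    }

    /** Returns the next element without consuming it. */
    public T peek() {
        if (!hasPeeked) {
            peeked = delegate.next(); // throws NoSuchElementException if exhausted
            hasPeeked = true;
        }
        return peeked;
    }

    @Override
    public T next() {
        if (!hasPeeked) {
            return delegate.next();
        }
        T result = peeked;
        peeked = null;     // release the reference
        hasPeeked = false;
        return result;
    }

    public static void main(String[] args) {
        SimplePeekingIterator<String> it =
                new SimplePeekingIterator<>(Arrays.asList("a", "b").iterator());
        System.out.println(it.peek());    // a
        System.out.println(it.next());    // a
        System.out.println(it.next());    // b
        System.out.println(it.hasNext()); // false
    }
}
```

Because the class implements Iterator, the default forEachRemaining method from Java 8 works on it unchanged, which is what the simplified method relies on to drain whatever is left in either list.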

Published: 2023-11-07 06:15:09
Article link: https://www.elefans.com/category/jswz/34/1565756.html