Java: Using stream() to Get the Intersection of Two Lists of Entity Objects

Updated: 2024-10-19 12:47:44

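This article uses stream() to get the intersection of two lists of entity objects.
It implements the intersection in several different ways and compares the efficiency of each.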

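Contents

Data
Conclusion:
Method 1: convert list 2 to a Map, then use filter + containsKey
Method 2: convert the codes of list 2 to a List, then use filter + contains
Method 2 (improved): convert the codes of list 2 to a List in advance, then use it directly in stream().filter()
Method 3: use filter + anyMatch
In a nested loop, keep the larger loop inside and the smaller loop outside where possible.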

Data

// List 1: 10,000 elements with codes "code0" .. "code9999"
List<Fruit> fruits = new ArrayList<>();
for (int i = 0; i < 10000; i++) {
    fruits.add(new Fruit(i, "name" + i, "code" + i));
}
// List 2: 100 elements with codes "code70" .. "code169"
List<Fruit> fruit2 = new ArrayList<>();
for (int i = 70; i < 170; i++) {
    fruit2.add(new Fruit(i * 49, "name" + i, "code" + i));
}
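
The article does not show the Fruit class itself. The following is a minimal sketch of what it might look like, assuming three fields (id, name, code) inferred from the constructor call and from the Fruit::getCode reference used later. The FruitData helper and its build method are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed shape of the entity: field names and types are inferred from
// new Fruit(i, "name" + i, "code" + i) and Fruit::getCode, not from the article.
class Fruit {
    private final int id;
    private final String name;
    private final String code;

    Fruit(int id, String name, String code) {
        this.id = id;
        this.name = name;
        this.code = code;
    }

    int getId() { return id; }
    String getName() { return name; }
    String getCode() { return code; }
}

// Hypothetical helper that wraps the two loops from the Data section.
class FruitData {
    // Builds Fruit objects for i in [from, to); the id is i * idFactor.
    static List<Fruit> build(int from, int to, int idFactor) {
        List<Fruit> list = new ArrayList<>();
        for (int i = from; i < to; i++) {
            list.add(new Fruit(i * idFactor, "name" + i, "code" + i));
        }
        return list;
    }

    public static void main(String[] args) {
        List<Fruit> fruits = build(0, 10000, 1);   // list 1
        List<Fruit> fruit2 = build(70, 170, 49);   // list 2
        System.out.println(fruits.size() + " / " + fruit2.size());
    }
}
```

Because the ids of the two lists differ while the codes overlap, the intersection in this article is effectively defined by code only.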

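Conclusion:

If the strategy is to take the larger list 1 as the base and filter it against the elements of list 2, Method 1 is the fastest.
List sizes:

List 1: 10,000, 500,000, and 10,000,000 elements
List 2: 100 elements

The table below gives the elapsed time of each method for the different list sizes, in ms.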

	10k × 100	500k × 100	10M × 100
Method 1	5	24	217
Method 2	45	588	12382
Method 2 (improved)	11	80	1204
Method 3	22	240	3438
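
The article does not say how these timings were taken. One simple way to obtain comparable numbers is sketched below, assuming a single wall-clock measurement with System.nanoTime; the names IntersectionTiming, timeMillis, method1, and method3 are hypothetical, and Fruit is a minimal stand-in for the entity. Single-run timings like this are sensitive to JIT warm-up and garbage collection, so a benchmark harness such as JMH would likely give more trustworthy figures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Minimal stand-in for the article's entity (assumed shape).
class Fruit {
    final int id;
    final String name;
    final String code;
    Fruit(int id, String name, String code) { this.id = id; this.name = name; this.code = code; }
    String getCode() { return code; }
}

class IntersectionTiming {
    static List<Fruit> build(int from, int to) {
        List<Fruit> list = new ArrayList<>();
        for (int i = from; i < to; i++) {
            list.add(new Fruit(i, "name" + i, "code" + i));
        }
        return list;
    }

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static <T> long timeMillis(Supplier<T> task) {
        long start = System.nanoTime();
        task.get();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Method 1: Map built from list 2, then filter + containsKey.
    static List<Fruit> method1(List<Fruit> fruits, List<Fruit> fruit2) {
        Map<String, Fruit> frMap = fruit2.stream()
                .collect(Collectors.toMap(Fruit::getCode, Function.identity(), (x1, x2) -> x1));
        return fruits.stream()
                .filter(item -> frMap.containsKey(item.getCode()))
                .collect(Collectors.toList());
    }

    // Method 3: filter + anyMatch over list 2.
    static List<Fruit> method3(List<Fruit> fruits, List<Fruit> fruit2) {
        return fruits.stream()
                .filter(item -> fruit2.stream().anyMatch(o -> o.getCode().equals(item.getCode())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Fruit> fruits = build(0, 10000);
        List<Fruit> fruit2 = build(70, 170);
        System.out.println("method 1: " + timeMillis(() -> method1(fruits, fruit2)) + " ms");
        System.out.println("method 3: " + timeMillis(() -> method3(fruits, fruit2)) + " ms");
    }
}
```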

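Method 1: convert list 2 to a Map, then use filter + containsKey

First convert list 2 into a Map<code, entity>, then filter list 1, dropping (!Map.containsKey) every element whose code does not appear in list 2. What remains is the intersection of the two lists.
This is the fastest method, and the reason is easy to explain: a HashMap is backed by an array plus linked lists/red-black trees, so a lookup by key takes roughly constant time on average, whereas List.contains scans the list linearly.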

Map<String, Fruit> frMap = fruit2.stream()
        .collect(Collectors.toMap(Fruit::getCode, Function.identity(), (x1, x2) -> x1));
List<Fruit> res1 = fruits.stream()
        .filter(item -> frMap.containsKey(item.getCode())).collect(Collectors.toList());
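
Since Method 1 only ever calls containsKey, the Map values are never read. A variant that collects the codes into a HashSet should behave the same way with a little less bookkeeping. This is a sketch of that alternative, not something the article measured; SetIntersection, build, and intersect are hypothetical names, and Fruit is a minimal stand-in for the entity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Minimal stand-in for the article's entity (assumed shape).
class Fruit {
    final int id;
    final String name;
    final String code;
    Fruit(int id, String name, String code) { this.id = id; this.name = name; this.code = code; }
    String getCode() { return code; }
}

class SetIntersection {
    static List<Fruit> build(int from, int to) {
        List<Fruit> list = new ArrayList<>();
        for (int i = from; i < to; i++) {
            list.add(new Fruit(i, "name" + i, "code" + i));
        }
        return list;
    }

    // Keeps the elements of fruits whose code also occurs in fruit2.
    static List<Fruit> intersect(List<Fruit> fruits, List<Fruit> fruit2) {
        // toCollection(HashSet::new) guarantees a hash-based set;
        // duplicate codes in fruit2 collapse automatically.
        Set<String> codes = fruit2.stream()
                .map(Fruit::getCode)
                .collect(Collectors.toCollection(HashSet::new));
        return fruits.stream()
                .filter(item -> codes.contains(item.getCode()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Fruit> res = intersect(build(0, 10000), build(70, 170));
        System.out.println(res.size());
    }
}
```

If the matching Fruit from list 2 is needed later (for example to merge fields), the Map form from Method 1 remains the better fit.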

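Method 2: convert the codes of list 2 to a List, then use filter + contains

First convert list 2 into a List<code>, then filter list 1, dropping (!List.contains) every element whose code does not appear in list 2. What remains is the intersection of the two lists.
This method is the slowest in these tests, and the gap widens as the data grows.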

List<Fruit> res2 = fruits.stream()
        .filter(item -> fruit2.stream().map(Fruit::getCode).collect(Collectors.toList()).contains(item.getCode()))
        .collect(Collectors.toList());

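Method 2 (improved): convert the codes of list 2 to a List in advance, then use it directly in stream().filter()

The example in Method 2 is the one that turns up most often in a search on CSDN. On reflection, though, it rebuilds the List<code> once for every element being filtered, which amounts to a nested loop with an extra list allocation on each pass. We can instead build codeList once up front for the stream to use.
The results show that the improved version runs much faster, and the larger the data, the more pronounced the effect.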

List<String> codeList = fruit2.stream().map(Fruit::getCode).collect(Collectors.toList());
List<Fruit> res2Ext = fruits.stream().filter(item -> codeList.contains(item.getCode())).collect(Collectors.toList());
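
To make the difference between Method 2 and its improved form concrete, the sketch below counts how many times the code list is built in each case. The counter, the RebuildCount class, and the codesOf helper are hypothetical additions for illustration only, and Fruit is a minimal stand-in for the entity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Minimal stand-in for the article's entity (assumed shape).
class Fruit {
    final int id;
    final String name;
    final String code;
    Fruit(int id, String name, String code) { this.id = id; this.name = name; this.code = code; }
    String getCode() { return code; }
}

class RebuildCount {
    // Counts how often the List<code> is constructed.
    static final AtomicInteger builds = new AtomicInteger();

    static List<Fruit> build(int from, int to) {
        List<Fruit> list = new ArrayList<>();
        for (int i = from; i < to; i++) {
            list.add(new Fruit(i, "name" + i, "code" + i));
        }
        return list;
    }

    // Same expression as in Method 2, wrapped so each construction is counted.
    static List<String> codesOf(List<Fruit> list) {
        builds.incrementAndGet();
        return list.stream().map(Fruit::getCode).collect(Collectors.toList());
    }

    // Method 2: the code list is rebuilt inside the filter predicate.
    static int buildsForMethod2(List<Fruit> fruits, List<Fruit> fruit2) {
        builds.set(0);
        fruits.stream()
                .filter(item -> codesOf(fruit2).contains(item.getCode()))
                .collect(Collectors.toList());
        return builds.get();
    }

    // Improved Method 2: the code list is built once, outside the stream.
    static int buildsForImproved(List<Fruit> fruits, List<Fruit> fruit2) {
        builds.set(0);
        List<String> codeList = codesOf(fruit2);
        fruits.stream()
                .filter(item -> codeList.contains(item.getCode()))
                .collect(Collectors.toList());
        return builds.get();
    }

    public static void main(String[] args) {
        List<Fruit> fruits = build(0, 1000);
        List<Fruit> fruit2 = build(70, 170);
        System.out.println("method 2 builds: " + buildsForMethod2(fruits, fruit2));
        System.out.println("improved builds: " + buildsForImproved(fruits, fruit2));
    }
}
```

The number of rebuilds in Method 2 equals the size of list 1, which is consistent with the gap in the table growing as list 1 grows. Note that the improved form still performs a linear List.contains scan per element, so it remains slower than the hash-based lookup of Method 1.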

Method 3: use filter + anyMatch

I had not used this approach before; I came across it on CSDN while writing this article.

List<Fruit> res3 = fruits.stream()
        .filter(item -> fruit2.stream().anyMatch(item1 -> item1.getCode().equals(item.getCode())))
        .collect(Collectors.toList());

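In a nested loop, keep the larger loop inside and the smaller loop outside where possible.

Unless the business logic requires otherwise, put the larger loop on the inside.
Swap the order of list1 and list2 in the methods above, so that the smaller list 2 is the base and is filtered against the elements of list 1, and run the methods again.
This time Method 3 (anyMatch) is the fastest, and its advantage grows with the data size.
anyMatch returns true as soon as one element satisfies the condition and then stops traversing. In this test every element of the small list is guaranteed to exist in the large list, and in order, so very few elements were actually traversed. That makes me doubt whether this test is meaningful at all... In real production the relative efficiency of these methods is unknown; the table is for reference only.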
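
The article gives no code for the swapped order. A minimal sketch of what it could look like for Method 3 follows, with a comparison counter added to show the short-circuit effect described above. SwappedOrder, intersectSmallOuter, and the counter are hypothetical names, and Fruit is a minimal stand-in for the entity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

// Minimal stand-in for the article's entity (assumed shape).
class Fruit {
    final int id;
    final String name;
    final String code;
    Fruit(int id, String name, String code) { this.id = id; this.name = name; this.code = code; }
    String getCode() { return code; }
}

class SwappedOrder {
    // Counts how many code comparisons anyMatch actually performs.
    static final AtomicLong comparisons = new AtomicLong();

    static List<Fruit> build(int from, int to) {
        List<Fruit> list = new ArrayList<>();
        for (int i = from; i < to; i++) {
            list.add(new Fruit(i, "name" + i, "code" + i));
        }
        return list;
    }

    // Outer stream over the small list, inner anyMatch over the large list.
    // The result contains objects taken from the small list.
    static List<Fruit> intersectSmallOuter(List<Fruit> small, List<Fruit> large) {
        return small.stream()
                .filter(s -> large.stream().anyMatch(l -> {
                    comparisons.incrementAndGet();
                    return l.getCode().equals(s.getCode());
                }))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Fruit> fruits = build(0, 10000);  // large list
        List<Fruit> fruit2 = build(70, 170);   // small list
        comparisons.set(0);
        List<Fruit> res = intersectSmallOuter(fruit2, fruits);
        System.out.println(res.size() + " matches, " + comparisons.get() + " comparisons");
    }
}
```

With this test data every code in the small list is found within the first 170 elements of the large list, so the number of comparisons stays far below the 1,000,000 that a full scan of 100 × 10,000 would need. With data where matches are rare or sit near the end of the large list, the advantage would shrink accordingly. Note also that the result now holds the objects from list 2 rather than list 1, which matters when the two lists differ in fields other than code.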