Hadoop: one Map and multiple Reduces

Updated: 2024-10-24 12:25:35
Problem description

We have a large dataset to analyze with multiple reduce functions.

All the reduce algorithms work on the same dataset, generated by the same map function. Reading the large dataset is too expensive to do every time, so it would be better to read it only once and pass the mapped data to multiple reduce functions.

Can I do this with Hadoop? I've searched the examples and the intarweb but I could not find any solutions.

Solution

Are you expecting every reducer to work on exactly the same mapped data? At least the "key" should be different, since it decides which reducer a record goes to.
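To see why the key decides the reducer: Hadoop's default partitioner, `HashPartitioner`, computes the target reduce task as `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The plain-Java sketch below reproduces only that formula and has no Hadoop dependency. The class and method names are hypothetical, and it uses ordinary Java objects as keys; real Hadoop keys such as `Text` have their own `hashCode()`, so the concrete partition numbers would differ. The point is only that equal keys always map to the same reducer.

```java
import java.util.List;

// Hypothetical demo class; mirrors the arithmetic of Hadoop's HashPartitioner.
public class DefaultPartitionDemo {

    // Clear the sign bit so negative hash codes still give a non-negative index,
    // then take the remainder by the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3;
        // Plain String keys stand in for Hadoop Writable keys (assumption).
        for (String key : List.of("apple", "banana", "apple")) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Both `"apple"` records print the same reducer index, which is why sending identical mapped data to different reducers requires changing the key.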

You can write each output multiple times in the mapper, once per reduce function, using the key <$i, $key> (where $i stands for the i-th reducer and $key is your original key). You then need to add a "Partitioner" to make sure these n records are distributed among the reducers based on $i. Finally, use a "GroupingComparator" to group the records by the original $key.
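The recipe above can be sketched without a cluster. The plain-Java simulation below (Java 16+, no Hadoop dependency) assumes a hypothetical input of `user,amount` lines, two hypothetical reduce functions (total and maximum per user), and exactly one reduce task per reduce function. `TaggedKey` plays the role of the composite <$i, $key> map-output key, `getPartition` is the partitioner that routes on $i alone, and the per-partition `TreeMap` stands in for the shuffle sort plus a grouping comparator that groups on the original $key. In a real job these would be a custom `WritableComparable` key, a `Partitioner` registered with `job.setPartitionerClass(...)`, a comparator registered with `job.setGroupingComparatorClass(...)`, and `job.setNumReduceTasks(n)`; none of that Hadoop code is shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical simulation of "one map, n reduce functions" in a single job.
public class MultiReduceSimulation {

    // Composite map-output key <$i, $key>: which reduce function, plus the original key.
    record TaggedKey(int reducerIndex, String originalKey) {}

    // One map-output record.
    record Pair(TaggedKey key, int value) {}

    // Hypothetical reduce functions; each consumes the same mapped values for a key.
    static final List<Function<List<Integer>, Integer>> REDUCE_FUNCTIONS = List.of(
            values -> values.stream().mapToInt(Integer::intValue).sum(),             // $i = 0: total
            values -> values.stream().mapToInt(Integer::intValue).max().orElse(0));  // $i = 1: maximum

    // Mapper: each input line is read and parsed once, then emitted n times, once per reduce function.
    static List<Pair> map(List<String> lines) {
        List<Pair> out = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            String key = parts[0].trim();
            int value = Integer.parseInt(parts[1].trim());
            for (int i = 0; i < REDUCE_FUNCTIONS.size(); i++) {
                out.add(new Pair(new TaggedKey(i, key), value));
            }
        }
        return out;
    }

    // Partitioner: route on $i only, so with n reduce tasks reducer i gets exactly the copies tagged i.
    static int getPartition(TaggedKey key, int numReduceTasks) {
        return key.reducerIndex() % numReduceTasks;
    }

    // Shuffle + grouping: inside each partition, group values by the original $key only.
    static List<Map<String, List<Integer>>> shuffle(List<Pair> mapped, int numReduceTasks) {
        List<Map<String, List<Integer>>> partitions = new ArrayList<>();
        for (int p = 0; p < numReduceTasks; p++) {
            partitions.add(new TreeMap<>()); // TreeMap keeps keys sorted, like the shuffle sort
        }
        for (Pair pair : mapped) {
            int p = getPartition(pair.key(), numReduceTasks);
            partitions.get(p)
                    .computeIfAbsent(pair.key().originalKey(), k -> new ArrayList<>())
                    .add(pair.value());
        }
        return partitions;
    }

    // Reducer i applies reduce function i to every key group in its partition.
    static List<Map<String, Integer>> run(List<String> lines) {
        int n = REDUCE_FUNCTIONS.size();
        List<Map<String, List<Integer>>> partitions = shuffle(map(lines), n);
        List<Map<String, Integer>> results = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Map<String, Integer> result = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> group : partitions.get(i).entrySet()) {
                result.put(group.getKey(), REDUCE_FUNCTIONS.get(i).apply(group.getValue()));
            }
            results.add(result);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> input = List.of("alice,30", "bob,5", "alice,12", "bob,40"); // hypothetical data
        List<Map<String, Integer>> results = run(input);
        System.out.println("reduce 0 (total):   " + results.get(0)); // {alice=42, bob=45}
        System.out.println("reduce 1 (maximum): " + results.get(1)); // {alice=30, bob=40}
    }
}
```

Because the partitioner ignores the original key and the job has exactly n reduce tasks, reducer i only ever sees keys tagged i, so grouping by $key inside it is enough. The trade-off is that the map output, and therefore the shuffle, is n times larger; only the reading and parsing of the input happen once.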

It's possible to do that, but not in a trivial way within a single MapReduce job.

Published: 2023-11-24 10:30:40