Hadoop MapReduce: find max key value pair from output of mapper

Updated: 2024-10-20 20:40:39

It sounds like a simple job, but with MapReduce it doesn't seem that straightforward.

I have N files, each containing a single line of text. I'd like the Mapper to output key-value pairs of the form <filename, score>, where 'score' is an integer calculated from the line of text. As a side note, I am using the snippet below to get the file name (I hope it's correct).

FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
String fileName = fileSplit.getPath().getName();

Assuming the mapper does its job correctly, it should output N key-value pairs. Now the problem is: how should I program the Reducer to output the one key-value pair with the maximum 'score'?

As far as I know, a Reducer only works on key-value pairs that share the same key. Since every output pair in this scenario has a different key, I am guessing something should be done before the Reduce step. Or should the Reduce step perhaps be omitted altogether?

Best answer

You can use the setup() and cleanup() methods (configure() and close() in the old API). Declare an instance field in the reducer class that holds the maximum score seen so far, together with its key. On each call to reduce(), compare the input value (the score) with that field and update it if the new score is larger. Write the final pair to the output in cleanup() rather than in reduce().

setup() is called once before all reduce() invocations in the same reduce task, and cleanup() is called once after the last one. If you have multiple reducers, setup() and cleanup() are called separately in each reduce task, so each task would report only its own local maximum; running the job with a single reducer gives one global maximum.
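
The pattern described above can be sketched in plain Java, with no Hadoop dependency, so that it runs on its own. This is only an illustration under stated assumptions: MaxScoreSketch, MaxScoreReducer, runTask and the emitted field are hypothetical names, and the emitted string stands in for what a real reducer would pass to context.write(...) inside cleanup(). In an actual job the three methods would override Reducer.setup(), reduce() and cleanup(), and the job would presumably be configured with a single reduce task (for example via job.setNumReduceTasks(1)) so that the maximum is global.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for one reduce task; no Hadoop classes are used.
class MaxScoreSketch {

    // Hypothetical reducer: method names mirror Hadoop's setup/reduce/cleanup.
    static class MaxScoreReducer {
        private String maxFile;  // key with the highest score seen so far
        private int maxScore;    // highest score seen so far
        private String emitted;  // stands in for the output of context.write(...)

        // Called once before the first reduce() call of the task.
        void setup() {
            maxFile = null;
            maxScore = Integer.MIN_VALUE;
            emitted = null;
        }

        // Called once per distinct key; only updates the running maximum.
        void reduce(String fileName, Iterable<Integer> scores) {
            for (int score : scores) {
                if (maxFile == null || score > maxScore) {
                    maxFile = fileName;
                    maxScore = score;
                }
            }
        }

        // Called once after the last reduce() call; emits the single winner.
        void cleanup() {
            if (maxFile != null) {
                emitted = maxFile + "\t" + maxScore;
            }
        }
    }

    // Drives one simulated reduce task over the mapper output in sorted key order.
    static String runTask(Map<String, Integer> mapperOutput) {
        MaxScoreReducer reducer = new MaxScoreReducer();
        reducer.setup();
        for (Map.Entry<String, Integer> e : new TreeMap<>(mapperOutput).entrySet()) {
            reducer.reduce(e.getKey(), List.of(e.getValue()));
        }
        reducer.cleanup();
        return reducer.emitted; // null when the task received no input
    }

    public static void main(String[] args) {
        System.out.println(runTask(Map.of("a.txt", 3, "b.txt", 9, "c.txt", 5)));
    }
}
```

Because reduce() only updates the fields and never writes anything, exactly one pair is produced per reduce task, no matter how many distinct keys the task sees. On ties, this sketch keeps the first key in sorted order.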

Published: 2023-07-04 17:19:00
Link: https://www.elefans.com/category/jswz/34/1027003.html