How does Hive choose the number of reducers for a job?

Updated: 2024-10-26 17:19:22

Question

Several sources say the default number of reducers in a Hadoop job is 1, and that you can set it manually with the mapred.reduce.tasks property.

When I run a Hive job (on Amazon EMR, AMI 2.3.3), it uses more than one reducer. Looking at the job settings, something has set mapred.reduce.tasks, and I presume it is Hive. How does it choose that number?

Note: here are some messages printed while running a Hive job that should be a clue:

...
Number of reduce tasks not specified. Estimated from input data size: 500
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
...

Recommended answer

The default of 1 is probably for a vanilla Hadoop install. Hive overrides it.

In open-source Hive (and likely on EMR):

# reducers = (# bytes of input to mappers) / (hive.exec.reducers.bytes.per.reducer)

This post says the default value of hive.exec.reducers.bytes.per.reducer is 1G.

You can cap the number of reducers this heuristic produces with hive.exec.reducers.max.
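
As a rough illustration of the formula and the cap, here is a minimal Python sketch. It is not Hive's actual code. The function name, rounding up, the floor of one reducer, treating 1G as 10^9 bytes, and the max_reducers value of 999 are all assumptions made for the example; check the real values in your own Hive configuration.

```python
def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,  # assumed: "1G" taken as 10**9 bytes
                      max_reducers=999):                # assumed placeholder for hive.exec.reducers.max
    """Estimate a reducer count from the total bytes of input to the mappers."""
    # Integer ceiling division, so a partial chunk still gets a reducer (assumption).
    estimated = (input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Clamp to at least 1 reducer and at most the configured cap.
    return max(1, min(max_reducers, estimated))


if __name__ == "__main__":
    # About 500 GB of input at 1 GB per reducer.
    print(estimate_reducers(500 * 1_000_000_000))  # 500
```

Under these assumptions, the "Estimated from input data size: 500" message in the question would correspond to roughly 500 GB of mapper input.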

If you know exactly how many reducers you want, you can set mapred.reduce.tasks, and this overrides all heuristics. (By default it is set to -1, which tells Hive to use its heuristics.)
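
The override rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above: the function name and parameters are hypothetical, and treating any positive value as an explicit setting is an assumption.

```python
def resolve_reducers(mapred_reduce_tasks, heuristic_estimate):
    """Pick the reducer count: an explicit setting wins, -1 falls back to the heuristic."""
    if mapred_reduce_tasks > 0:
        # An explicit mapred.reduce.tasks value overrides the heuristic.
        return mapred_reduce_tasks
    # -1 (the default described above) means "use the size-based estimate".
    return heuristic_estimate


if __name__ == "__main__":
    print(resolve_reducers(-1, 500))  # 500
    print(resolve_reducers(32, 500))  # 32
```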

In some cases, such as 'select count(1) from T', Hive sets the number of reducers to 1 regardless of the input data size. These are called 'full aggregates'. If full aggregates are the only thing the query does, the compiler knows that the data from the mappers will be reduced to a trivial amount, so there is no point in running multiple reducers.
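
To see why one reducer is enough for a full aggregate, consider this toy Python simulation of a count. It is a conceptual sketch, not how Hive is implemented: the split contents and function names are made up, and it assumes each mapper emits a single partial count.

```python
def map_partial_count(rows):
    """Each mapper reduces its whole input split to one number."""
    return len(rows)


def reduce_total(partial_counts):
    """A single reducer only has to add up one small value per mapper."""
    return sum(partial_counts)


if __name__ == "__main__":
    # Three hypothetical input splits of table T.
    splits = [["a", "b", "c"], ["d", "e"], ["f"]]
    partials = [map_partial_count(split) for split in splits]
    print(partials)                # [3, 2, 1]
    print(reduce_total(partials))  # 6
```

However large the splits are, the reducer receives only one value per mapper, which is the "trivial amount" of data the answer refers to.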

Published: 2023-11-17 03:33:31
Link: https://www.elefans.com/category/jswz/34/1608670.html