Hive CLUSTER BY vs ORDER BY vs SORT BY

Updated: 2024-10-10 19:20:31

Problem description

As far as I understand:

SORT BY only sorts within each reducer.

ORDER BY orders things globally, but pushes everything through a single reducer.

CLUSTER BY distributes rows to reducers by the hash of the key and then does a SORT BY.

So my question is: does CLUSTER BY guarantee a global order? DISTRIBUTE BY puts the same keys into the same reducer, but what about adjacent keys?

The only document I can find on this is here, and from the example it seems to order them globally. But from the definition, I feel it doesn't always do that.
Solution
The short answer: not in general. CLUSTER BY guarantees that each output file is sorted and that all rows with the same key go to the same reducer, but joining the output files yourself only gives a global order if the reducers happen to receive non-overlapping key ranges, and hash distribution does not ensure that.

The longer version:

ORDER BY x: guarantees global ordering, but does this by pushing all data through just one reducer. This is basically unacceptable for large datasets. You end up with one sorted file as output.

SORT BY x: orders data at each of N reducers, but each reducer can receive overlapping ranges of data. You end up with N or more sorted files with overlapping ranges.

DISTRIBUTE BY x: ensures that all rows with the same value of x go to the same reducer, but doesn't sort the output of each reducer. Rows are assigned by a hash of x, so adjacent values of x can land on different reducers. You end up with N or more unsorted files whose ranges can overlap, although no value of x is split across reducers.

CLUSTER BY x: the same as doing (DISTRIBUTE BY x and SORT BY x). Each of N reducers gets its own set of x values and sorts them. You end up with N or more sorted files, but their ranges can still overlap, so concatenating them does not in general give a globally sorted result.

Make sense? So CLUSTER BY is not simply a more scalable version of ORDER BY; it is shorthand for DISTRIBUTE BY plus SORT BY on the same column.
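
To make the four behaviours concrete, here is a small Python toy model, not Hive code. It assumes rows are assigned to reducers by a hash of the key, as the question describes, and models that hash as `key % n` for integer keys. The function names and the sample keys are made up for illustration.

```python
# Toy model of Hive's row-distribution clauses. This is NOT Hive code.
# Assumption: DISTRIBUTE BY sends a row to reducer hash(key) % n,
# modelled here as key % n for small integer keys.

def distribute_by(keys, n):
    """DISTRIBUTE BY: equal keys land on the same reducer; nothing is sorted."""
    reducers = [[] for _ in range(n)]
    for k in keys:
        reducers[k % n].append(k)
    return reducers

def sort_by(keys, n):
    """SORT BY: rows are spread arbitrarily (round-robin here), then sorted per reducer."""
    reducers = [[] for _ in range(n)]
    for i, k in enumerate(keys):
        reducers[i % n].append(k)
    return [sorted(r) for r in reducers]

def cluster_by(keys, n):
    """CLUSTER BY: DISTRIBUTE BY followed by a sort inside each reducer."""
    return [sorted(r) for r in distribute_by(keys, n)]

def order_by(keys):
    """ORDER BY: a single reducer sorts everything."""
    return [sorted(keys)]

def concatenated(reducers):
    """Join the per-reducer outputs in reducer order."""
    return [k for r in reducers for k in r]

keys = [5, 2, 8, 1, 4, 7, 3, 6, 2]
clustered = cluster_by(keys, 2)
print(clustered)                                # [[2, 2, 4, 6, 8], [1, 3, 5, 7]]
print(concatenated(clustered) == sorted(keys))  # False
```

Under this assumption each reducer's output is sorted and both copies of key 2 sit on the same reducer, but the two outputs interleave (evens and odds), so joining the files does not produce a global order. Only `order_by` does that, at the cost of a single reducer.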

Published: 2023-11-25 10:24:38

Article link: https://www.elefans.com/category/jswz/34/1629385.html
