如何评估HIVE中的CTE(通用表格表达)

编程入门 行业动态 更新时间:2024-10-28 01:27:04
本文介绍了如何评估HIVE中的CTE(通用表格表达)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

我打算通过定义一个基本投影然后定义多个CTE来重用代码。我的问题是围绕性能和CTE在运行时进行评估的方式。

这是否会导致任何性能问题。更具体地说,是否每次都会评估基础投影。

例如:

WITH CTE_PERSON as( SELECT * FROM PersonTable ), CTE_PERSON_WITH_AGE as( SELECT * FROM CTE_PERSON年龄> 24 ), CTE_PERSON_WITH_AGE_AND_GENDER as( SELECT * FROM CTE_PERSON_WITH_AGE WHERE gender ='m'), CTE_PERSON_WITH_NAME as( SELECT * FROM CTE_PERSON WHERE name ='abc')

  • 每次来自PersonTable的所有条目都会被加载内存,然后过滤器将应用于(或)
  • $之后b $ b
  • 只有筛选器的结果集将被加载到内存中。

解决方案

单次扫描。

注意: - 单个阶段 - 单个 TableScan

create table t(i int,j int,k int);

解释 with t1 as(select i,j,k from t where i = 1),t2 as(从t1选择i,j,k,其中j = 2),t3 as从t2中选择i,j,k,其中k = 3) select * from t3 ;

解释舞台依赖:舞台-0是根舞台 舞台计划:舞台:舞台-0 提取操作符限制: -1 处理器树: TableScan 别名:t 统计数据:数据行数:1数据大小:0基本统计数据:PARTIAL列统计数据:无筛选运算符谓词:(((i = 1)和(j = 2))和(k = 3))(类型:布尔值)统计:数据行数:1数据大小:0基本统计数据:PARTIAL Column stats:NONE 选择运算符表达式:1(类型:int),2(类型:int),3(类型:int) outputColumnNames:_col0,_col1,_col2 统计:数据行数:1数据大小:0基本统计数据:PARTIAL列统计数据:无 ListSink

My question is around performance and the way a CTE gets evaluated in runtime.

I am planning to reuse code by defining a base projection and then defining multiple CTE's on top of this base projection with different filters.

Does that cause any performance issues.More specifically, does base projection will be evaluated every time.

For example:

WITH CTE_PERSON as ( SELECT * FROM PersonTable ), CTE_PERSON_WITH_AGE as ( SELECT * FROM CTE_PERSON WHERE age > 24 ), CTE_PERSON_WITH_AGE_AND_GENDER as ( SELECT * FROM CTE_PERSON_WITH_AGE WHERE gender = 'm' ), CTE_PERSON_WITH_NAME as ( SELECT * FROM CTE_PERSON WHERE name = 'abc' )

  • Does every time all the entries from PersonTable will get loaded into memory and then filters will be applied after (or)
  • Only Result set after filters will be loaded into memory.

解决方案

A single scan.

Note: - a single stage - a single TableScan - predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean)

create table t (i int,j int,k int);

explain with t1 as (select i,j,k from t where i=1) ,t2 as (select i,j,k from t1 where j=2) ,t3 as (select i,j,k from t2 where k=3) select * from t3 ;

Explain STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: t Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Filter Operator predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean) Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Select Operator expressions: 1 (type: int), 2 (type: int), 3 (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE ListSink

更多推荐

如何评估HIVE中的CTE(通用表格表达)

本文发布于:2023-10-26 07:27:35,感谢您对本站的认可!
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:表格   HIVE   CTE

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!