我打算通过定义一个基本投影然后定义多个CTE来重用代码。我的问题是围绕性能和CTE在运行时进行评估的方式。
这是否会导致任何性能问题。更具体地说,是否每次都会评估基础投影。
例如:
WITH CTE_PERSON as( SELECT * FROM PersonTable ), CTE_PERSON_WITH_AGE as( SELECT * FROM CTE_PERSON年龄> 24 ), CTE_PERSON_WITH_AGE_AND_GENDER as( SELECT * FROM CTE_PERSON_WITH_AGE WHERE gender ='m'), CTE_PERSON_WITH_NAME as( SELECT * FROM CTE_PERSON WHERE name ='abc')
- 每次来自PersonTable的所有条目都会被加载内存,然后过滤器将应用于(或) $之后b $ b
- 只有筛选器的结果集将被加载到内存中。
单次扫描。
注意: - 单个阶段 - 单个 TableScan $ c $ (((i = 1)和(j = 2))和(k = 3))(type:boolean) >
create table t(i int,j int,k int);解释 with t1 as(select i,j,k from t where i = 1),t2 as(从t1选择i,j,k,其中j = 2),t3 as从t2中选择i,j,k,其中k = 3) select * from t3 ;
解释舞台依赖:舞台-0是根舞台 舞台计划:舞台:舞台-0 提取操作符限制: -1 处理器树: TableScan 别名:t 统计数据:数据行数:1数据大小:0基本统计数据:PARTIAL列统计数据:无筛选运算符谓词:(((i = 1)和(j = 2))和(k = 3))(类型:布尔值)统计:数据行数:1数据大小:0基本统计数据:PARTIAL Column stats:NONE 选择运算符表达式:1(类型:int),2(类型:int),3(类型:int) outputColumnNames:_col0,_col1,_col2 统计:数据行数:1数据大小:0基本统计数据:PARTIAL列统计数据:无 ListSink
My question is around performance and the way a CTE gets evaluated in runtime.
I am planning to reuse code by defining a base projection and then defining multiple CTE's on top of this base projection with different filters.
Does that cause any performance issues.More specifically, does base projection will be evaluated every time.
For example:
WITH CTE_PERSON as ( SELECT * FROM PersonTable ), CTE_PERSON_WITH_AGE as ( SELECT * FROM CTE_PERSON WHERE age > 24 ), CTE_PERSON_WITH_AGE_AND_GENDER as ( SELECT * FROM CTE_PERSON_WITH_AGE WHERE gender = 'm' ), CTE_PERSON_WITH_NAME as ( SELECT * FROM CTE_PERSON WHERE name = 'abc' )
- Does every time all the entries from PersonTable will get loaded into memory and then filters will be applied after (or)
- Only Result set after filters will be loaded into memory.
A single scan.
Note: - a single stage - a single TableScan - predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean)
create table t (i int,j int,k int);explain with t1 as (select i,j,k from t where i=1) ,t2 as (select i,j,k from t1 where j=2) ,t3 as (select i,j,k from t2 where k=3) select * from t3 ;
Explain STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: t Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Filter Operator predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean) Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Select Operator expressions: 1 (type: int), 2 (type: int), 3 (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE ListSink
更多推荐
如何评估HIVE中的CTE(通用表格表达)
发布评论