Hive: Partitioning by part of an integer column

Updated: 2024-10-21 11:49:28

Problem description

I want to create an external Hive table partitioned by record type and date (year, month, day). One complication is that the date in my data files is a single integer of the form yyyymmddhhmmss rather than the required format yyyy-mm-dd hh:mm:ss. Can I define three new partition columns based on that single data value? Something like the example below, which does not work:

create external table cdrs (
    record_id int,
    record_detail tinyint,
    datetime_start int
)
partitioned by (
    record_type int,
    createyear=datetime_start(0,3) int,
    createmonth=datetime_start(4,5) int,
    createday=datetime_start(6,7) int
)
row format delimited fields terminated by '|' lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");

Recommended answer

If you want MSCK REPAIR TABLE to add the partitions for you based on the directory structure, use the following convention:

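  • The nesting of the directories should match the order of the partition columns.
  • Each directory name should be {partition column name}={value}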
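
The two rules above can be sketched as a small Python path builder. This is only an illustration: the helper name `partition_path` is hypothetical and not part of Hive, and the column names and values are borrowed from the examples in this answer.

```python
from typing import Iterable, Tuple


def partition_path(table_location: str, partitions: Iterable[Tuple[str, str]]) -> str:
    """Build a directory path that follows the {column}={value} convention.

    `partitions` must be listed in the same order as the table's
    PARTITIONED BY clause, because the nesting has to match that order.
    """
    parts = [table_location.rstrip("/")]
    for column, value in partitions:
        parts.append(f"{column}={value}")
    return "/".join(parts) + "/"


# Hypothetical example values, matching the layout used in this answer.
path = partition_path(
    "hdfs://nameservice1/tmp/sbx_unleashed.db",
    [("record_type", "TYP123"), ("createdate", "2017-03-22")],
)
print(path)
```

Files written under such a path can then be picked up by MSCK REPAIR TABLE without naming each partition by hand.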

If you intend to add the partitions manually, the directory structure has no meaning: any set of partition values can be coupled with any directory. For example:

alter table cdrs add if not exists
    partition (record_type='TYP123', createdate=date '2017-03-22')
    location 'hdfs://nameservice1/tmp/sbx_unleashed.db/2017MAR22_OF_TYPE_123';
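
When many directories have to be registered this way, the statements can be generated rather than typed. The following is a minimal Python sketch under stated assumptions: the function name `add_partition_statement` is hypothetical, it only covers the two partition columns used in this example (record_type and createdate), and it rejects values containing quotes or backslashes instead of trying to escape them.

```python
from datetime import date


def add_partition_statement(table: str, record_type: str,
                            createdate: date, location: str) -> str:
    """Render an ALTER TABLE ... ADD PARTITION statement as text."""
    # Refuse characters that would need escaping inside a quoted literal.
    for text in (record_type, location):
        if "'" in text or "\\" in text:
            raise ValueError(f"unsupported character in {text!r}")
    return (
        f"alter table {table} add if not exists "
        f"partition (record_type='{record_type}', "
        f"createdate=date '{createdate.isoformat()}') "
        f"location '{location}';"
    )


statement = add_partition_statement(
    "cdrs",
    "TYP123",
    date(2017, 3, 22),
    "hdfs://nameservice1/tmp/sbx_unleashed.db/2017MAR22_OF_TYPE_123",
)
print(statement)
```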

Assuming the directory structure:

.../sbx_unleashed.db/record_type=.../createyear=.../createmonth=.../createday=.../

For example:

.../sbx_unleashed.db/record_type=TYP123/createyear=2017/createmonth=03/createday=22/

create external table cdrs (
    record_id int,
    record_detail tinyint,
    datetime_start bigint  -- a 14-digit yyyymmddhhmmss value does not fit in a 4-byte int
)
partitioned by (
    record_type string,    -- string, because example values such as TYP123 are not numeric
    createyear int,
    createmonth tinyint,
    createday tinyint
)
row format delimited fields terminated by '|' lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");
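
Hive does not derive these partition values from datetime_start. Whatever process writes the files has to place them in the right directory. As a rough Python sketch of that step, the year, month and day can be taken from the yyyymmddhhmmss integer with integer arithmetic. The function names `split_datetime` and `day_partition_dir` are hypothetical, and the sample value is made up for illustration.

```python
from typing import Tuple


def split_datetime(datetime_start: int) -> Tuple[int, int, int]:
    """Split a yyyymmddhhmmss integer into (year, month, day)."""
    date_part = datetime_start // 1_000_000  # drop hhmmss, leaving yyyymmdd
    year = date_part // 10_000
    month = (date_part // 100) % 100
    day = date_part % 100
    return year, month, day


def day_partition_dir(record_type: str, datetime_start: int) -> str:
    """Relative directory for the record_type/createyear/createmonth/createday layout."""
    year, month, day = split_datetime(datetime_start)
    return (
        f"record_type={record_type}/createyear={year:04d}/"
        f"createmonth={month:02d}/createday={day:02d}/"
    )


# Made-up sample value: 22 March 2017, 10:15:30.
print(split_datetime(20170322101530))
print(day_partition_dir("TYP123", 20170322101530))
```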

Alternatively, assuming the directory structure:

.../sbx_unleashed.db/record_type=.../createdate=.../

For example:

.../sbx_unleashed.db/record_type=TYP123/createdate=2017-03-22/

create external table cdrs (
    record_id int,
    record_detail tinyint,
    datetime_start bigint  -- a 14-digit yyyymmddhhmmss value does not fit in a 4-byte int
)
partitioned by (
    record_type string,    -- string, because example values such as TYP123 are not numeric
    createdate date
)
row format delimited fields terminated by '|' lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");
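
For this single-column layout, the createdate directory value is the yyyy-mm-dd form of the integer timestamp. A minimal Python sketch of that conversion follows. The function name `createdate_value` is hypothetical and the sample value is made up. Parsing through `datetime.strptime` also rejects impossible dates, such as month 13.

```python
from datetime import datetime


def createdate_value(datetime_start: int) -> str:
    """Convert a yyyymmddhhmmss integer to a yyyy-mm-dd partition value."""
    parsed = datetime.strptime(str(datetime_start), "%Y%m%d%H%M%S")
    return parsed.date().isoformat()


# Made-up sample value: 22 March 2017, 10:15:30.
partition_dir = f"record_type=TYP123/createdate={createdate_value(20170322101530)}/"
print(partition_dir)
```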

Published: 2023-06-07 22:01:06
Link: https://www.elefans.com/category/jswz/34/568543.html