按现有字段对 Hive 表进行分区?

编程入门 行业动态 更新时间:2024-10-25 02:21:07
本文介绍了按现有字段对 Hive 表进行分区?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

Can I partition a Hive table upon insert by an existing field?

I have a 10 GB file with a date field and an hour of day field. Can I load this file into a table, then insert-overwrite into another partitioned table that uses those fields as a partition? Would something like the following work?

INSERT OVERWRITE TABLE tealeaf_event PARTITION(dt=evt.datestring,hour=evt.hour) SELECT * FROM staging_event evt;

Thanks!

Travis

解决方案

I just ran across this trying to answer the same question and it was helpful but not quite complete. The short answer is yes, something like the query in the question will work but the syntax is not quite right.

Say you have three tables which were created using the following statements:

CREATE TABLE staging_unpartitioned (datestring string, hour int, a int, b int); CREATE TABLE staging_partitioned (a int, b int) PARTITIONED BY (datestring string, hour int); CREATE TABLE production_partitioned (a int, b int) PARTITIONED BY (dt string, hour int);

Columns a and b are just some example columns. dt and hour are the values we want to partition on once it gets to the production table. Moving the staging data to production from staging_unpartitioned and staging_partitioned looks exactly the same.

INSERT OVERWRITE TABLE production_partitioned PARTITION (dt, hour) SELECT a, b, datestring, hour FROM staging_unpartitioned; INSERT OVERWRITE TABLE production_partitioned PARTITION (dt, hour) SELECT a, b, datestring, hour FROM staging_partitioned;

This uses a process called Dynamic Partitioning which you can read about here. The important thing to note is that which columns are associated with which partitions is determined by the SELECT order. All dynamic partitions must be selected last and in order.

There's a good chance when you try to run the code above you will hit an error due to the properties you have set. First, it will not work if you have dynamic partitioning disabled so make sure to:

set hive.exec.dynamic.partition=true;

Then you might hit an error if you aren't partitioning on at least one static partition before the dynamic partitions. This restriction would save you accidentally removing a root partition when you meant to overwrite its sub-partitions with dynamic partitions. In my experience this behavior has never been helpful and has often been annoying, but your mileage may vary. At any rate, it is easy to change:

set hive.exec.dynamic.partition.mode=nonstrict;

And that should do it.

更多推荐

按现有字段对 Hive 表进行分区?

本文发布于:2023-06-07 22:03:50,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/568582.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:字段   分区   Hive

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!