Problem description
CREATE EXTERNAL TABLE IF NOT EXISTS LOGS (LGACT STRING,NTNAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION '/user/hive/warehouse/LOGS/test';
Under the 'test' folder I write new data every day, one directory per day, for example:
/user/hive/warehouse/LOGS/test/20170420
/user/hive/warehouse/LOGS/test/20170421
/user/hive/warehouse/LOGS/test/20170422
I cannot see any data in the LOGS table I created.
However, when I use
LOCATION '/user/hive/warehouse/LOGS/test/20170422';
I can see that day's records.
I want my Hive table to show all the data under the /test directory, which receives new files every day.
Recommended answer
Option 1
To make Hive read files in subdirectories of the table location:
set mapred.input.dir.recursive=true;
and if your Hive version is lower than 2.0.0, then also
set hive.mapred.supports.subdirectories=true;
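Put together, Option 1 is a per-session change to how the existing table is read. A minimal sketch, assuming the original unpartitioned LOGS table from the question; the commented-out property is the newer Hadoop name for mapred.input.dir.recursive, which Hadoop keeps as a deprecated alias.

```sql
-- Session-scoped settings: issue them in the same Hive session, before the query.
SET mapred.input.dir.recursive=true;
-- Newer Hadoop name for the same property (the old name is a deprecated alias):
-- SET mapreduce.input.fileinputformat.input.dir.recursive=true;
-- On Hive < 2.0.0, also apply the hive.mapred.supports.subdirectories setting above.

-- Query the table from the question, whose LOCATION is the parent 'test' directory;
-- rows from every daily subdirectory should now be included.
SELECT COUNT(*) FROM LOGS;
SELECT * FROM LOGS LIMIT 10;
```

Because SET only affects the current session, every new session or script needs these lines again unless the properties are set in the cluster configuration.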
Option 2
Create a partitioned table:
CREATE EXTERNAL TABLE IF NOT EXISTS LOGS (LGACT STRING,NTNAME STRING)
partitioned by (dt date)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION '/user/hive/warehouse/LOGS/test';
alter table LOGS add if not exists partition (dt=date '2017-04-20') LOCATION '/user/hive/warehouse/LOGS/test/20170420';
alter table LOGS add if not exists partition (dt=date '2017-04-21') LOCATION '/user/hive/warehouse/LOGS/test/20170421';
alter table LOGS add if not exists partition (dt=date '2017-04-22') LOCATION '/user/hive/warehouse/LOGS/test/20170422';
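With the partitioned table, Hive reads only the directories registered as partitions, so each new day's directory has to be added before its rows appear. A sketch of typical follow-up statements against the table above; the 20170423 directory is a hypothetical next day that follows the same naming pattern.

```sql
-- List the partitions registered so far.
SHOW PARTITIONS LOGS;

-- All days: reads every registered partition.
SELECT COUNT(*) FROM LOGS;

-- One day: filtering on the partition column lets Hive skip the other directories.
SELECT * FROM LOGS WHERE dt = DATE '2017-04-22' LIMIT 10;

-- Register the next day's directory (hypothetical path, same pattern as above).
ALTER TABLE LOGS ADD IF NOT EXISTS PARTITION (dt=DATE '2017-04-23')
  LOCATION '/user/hive/warehouse/LOGS/test/20170423';
```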
It would be easier to manage if you name your directories using the standard convention, e.g. dt=2017-04-20 instead of 20170420.
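The benefit of dt=... names is that each directory name then encodes the partition column and its value, so Hive can discover partitions itself instead of needing one ALTER TABLE per day with an explicit LOCATION. A minimal sketch, assuming the partitioned LOGS table from Option 2 and a hypothetical layout in which the daily directories are named dt=YYYY-MM-DD:

```sql
-- Hypothetical layout under the table LOCATION:
--   /user/hive/warehouse/LOGS/test/dt=2017-04-20
--   /user/hive/warehouse/LOGS/test/dt=2017-04-21
--   /user/hive/warehouse/LOGS/test/dt=2017-04-22

-- Scan the table location and register any dt=... directories
-- not yet known to the metastore (e.g. run once a day after new data lands).
MSCK REPAIR TABLE LOGS;

-- Alternatively, add a single day without a LOCATION clause; Hive then
-- uses the default path <table location>/dt=2017-04-23.
ALTER TABLE LOGS ADD IF NOT EXISTS PARTITION (dt=DATE '2017-04-23');

SHOW PARTITIONS LOGS;
```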