我是学习Hive和Hadoop的新手。 我创建了一个表,它引用了包含文件的特定位置。
CREATE DATABASE IF NOT EXISTS <dbname> LOCATION '/user/<username>/hive/<dbname>.db'; USE <dbname>; CREATE EXTERNAL TABLE IF NOT EXISTS my_table (json STRING) PARTITIONED BY (year INT, month INT, day INT) STORED AS Parquet LOCATION '/my-data/my/files';此表有四列:年,月,日和json。
json看起来像:
{ "t_id":"user.login", "e_time":"2014-11-30T23:59:52Z", "user_email_address":"someemail@email.com", "la_id":"10", "dbnum":16, "remote_ip":"171.154.1.8", "server_name":"some.server", "protocol":"IMAPS", "secure":true, "result":"success" }一个有效的基本查询看起来像这样:
SELECT json FROM mydb WHERE year=2015 AND month=12 LIMIT 10;我想做的是有一个where子句,我可以在上面列出的json字段上进行过滤。 我想它会如下所示,但它不起作用:
SELECT get_json_object(mytable.json, '$.t_id') as whatever FROM mytable WHERE year=2015 AND month=12 AND json like '%user.login%' LIMIT 1;或者更好的是,能够基于json进行查询,如下所示:
SELECT COUNT(*) FROM mytable WHERE json.t_id = 'user.login' AND json.someDate > ... and so on...任何建议表示赞赏。
I'm new to learning about Hive and Hadoop. There is a table that I've created, which references a certain location containing files.
CREATE DATABASE IF NOT EXISTS <dbname> LOCATION '/user/<username>/hive/<dbname>.db'; USE <dbname>; CREATE EXTERNAL TABLE IF NOT EXISTS my_table (json STRING) PARTITIONED BY (year INT, month INT, day INT) STORED AS Parquet LOCATION '/my-data/my/files';This table has four columns: year, month, day, and json.
The json would look something like:
{ "t_id":"user.login", "e_time":"2014-11-30T23:59:52Z", "user_email_address":"someemail@email.com", "la_id":"10", "dbnum":16, "remote_ip":"171.154.1.8", "server_name":"some.server", "protocol":"IMAPS", "secure":true, "result":"success" }A basic query, that works, looks something like this:
SELECT json FROM mydb WHERE year=2015 AND month=12 LIMIT 10;What I would like to do is have a where clause where I could filter on the json fields listed above. I imagine that it would look like the following, but it does not work:
SELECT get_json_object(mytable.json, '$.t_id') as whatever FROM mytable WHERE year=2015 AND month=12 AND json like '%user.login%' LIMIT 1;Or better yet, be able to query based on the json like so:
SELECT COUNT(*) FROM mytable WHERE json.t_id = 'user.login' AND json.someDate > ... and so on...Any advice is appreciated.
最满意答案
试试这个查询:
select b.t_id from my_table a lateral view json_tuple(a.json,'t_id') b as t_id where a.year=2015 and a.month=12 LIMIT 10;
你可以在json_tuple中调用另一个键,并在where子句中使用它。 例如: select b.t_id from my_table a lateral view json_tuple(a.json,'t_id','result') b as t_id, result where a.year=2015 and a.month=12 and b.result ='true' LIMIT 10;
try this query:
select b.t_id from my_table a lateral view json_tuple(a.json,'t_id') b as t_id where a.year=2015 and a.month=12 LIMIT 10;
you can call another key in the json_tuple and use it in a where clause as well. e.g.: select b.t_id from my_table a lateral view json_tuple(a.json,'t_id','result') b as t_id, result where a.year=2015 and a.month=12 and b.result ='true' LIMIT 10;
更多推荐
发布评论