How to flatten complex data types in Pig?
I have an input.txt that looks like the following:

{"item":[{"sdb_id":107817,"quantity":1},{"sdb_id":101733,"quantity":1}],"table_id":62}
{"item":[{"sdb_id":107795,"quantity":1},{"sdb_id":107785,"quantity":1}],"table_id":62}
{"item":[{"sdb_id":107836,"quantity":1}],"table_id":34}

Here I have already loaded input.txt:

raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader() AS (json:map[]);
item_data = FOREACH raw_data GENERATE json#'item' AS (item:{(sdb_id:int, quantity:int)});

And DUMP item_data looks like:

([{"sdb_id":107817,"quantity":1},{"sdb_id":101733,"quantity":1}])
([{"sdb_id":107795,"quantity":1},{"sdb_id":107785,"quantity":1}])
([{"sdb_id":107836,"quantity":1}])

My question is: how can I get the output to look like the following (only the "sdb_id" and "quantity" values)?

(107817, 1)
(101733, 1)
(107795, 1)
(107785, 1)
(107836, 1)

Thank you so much for your help. I really appreciate it.
Best answer
Try the script below. I have changed the JsonLoader call to do a nested load:

raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
b = FOREACH raw_data GENERATE FLATTEN(json#'item') AS (k:map[]);
c = FOREACH b GENERATE k#'sdb_id', k#'quantity';

Hope this helps.
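To make the effect of the flatten step easier to see, here is a minimal plain-Python sketch of the same transformation on the sample records from the question. It is not Pig code and does not use Elephant Bird. The names `SAMPLE_LINES` and `flatten_items` are made up for illustration, and it assumes each input line is one JSON object with an "item" array.

```python
import json

# The three sample records from the question, one JSON object per line.
SAMPLE_LINES = [
    '{"item":[{"sdb_id":107817,"quantity":1},{"sdb_id":101733,"quantity":1}],"table_id":62}',
    '{"item":[{"sdb_id":107795,"quantity":1},{"sdb_id":107785,"quantity":1}],"table_id":62}',
    '{"item":[{"sdb_id":107836,"quantity":1}],"table_id":34}',
]


def flatten_items(lines):
    """Yield one (sdb_id, quantity) tuple per element of each record's "item" array."""
    for line in lines:
        record = json.loads(line)  # plays the role of the json map in the Pig script
        # Analogue of FLATTEN: every element of the array becomes its own output row.
        for entry in record.get("item", []):
            # Analogue of k#'sdb_id', k#'quantity': look up two keys in the inner map.
            yield (entry["sdb_id"], entry["quantity"])


rows = list(flatten_items(SAMPLE_LINES))
for row in rows:
    print(row)
```

The inner loop is the part that corresponds to flattening: a record with two items produces two rows, so the three input records yield five rows in total, matching the output asked for in the question.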