How to flatten complex data types in Pig?

Updated: 2024-10-25 22:26:13

I have an input.txt that looks like the following:

{"item":[{"sdb_id":107817,"quantity":1},{"sdb_id":101733,"quantity":1}],"table_id":62}
{"item":[{"sdb_id":107795,"quantity":1},{"sdb_id":107785,"quantity":1}],"table_id":62}
{"item":[{"sdb_id":107836,"quantity":1}],"table_id":34}

I have already loaded input.txt:

raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader() AS (json:map[]);
item_data = FOREACH raw_data GENERATE json#'item' AS (item:{(sdb_id:int, quantity:int)});

The output of DUMP item_data looks like:

([{"sdb_id":107817,"quantity":1},{"sdb_id":101733,"quantity":1}])
([{"sdb_id":107795,"quantity":1},{"sdb_id":107785,"quantity":1}])
([{"sdb_id":107836,"quantity":1}])

My question is: how can I get output that looks like the following (only the "sdb_id" and "quantity" values)?

(107817, 1)
(101733, 1)
(107795, 1)
(107785, 1)
(107836, 1)

Thank you so much for your help. I really appreciate it.

Best answer

Try the script below. I have changed the JsonLoader call to do a nested load:

raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
b = foreach raw_data generate flatten(json#'item') as (k:MAP[]);
c = foreach b generate k#'sdb_id', k#'quantity';
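
For readers less familiar with Pig, here is a rough plain-Python sketch of what the script above computes on the sample data. It is not Pig code and does not reflect how elephant-bird is implemented; the names SAMPLE_LINES and flatten_items are made up for illustration only.

```python
import json

# The three sample records from input.txt, one JSON object per line.
SAMPLE_LINES = [
    '{"item":[{"sdb_id":107817,"quantity":1},{"sdb_id":101733,"quantity":1}],"table_id":62}',
    '{"item":[{"sdb_id":107795,"quantity":1},{"sdb_id":107785,"quantity":1}],"table_id":62}',
    '{"item":[{"sdb_id":107836,"quantity":1}],"table_id":34}',
]


def flatten_items(lines):
    """Yield one (sdb_id, quantity) tuple per element of each record's "item" array."""
    for line in lines:
        # Parse the whole record, including the nested array of objects
        # (roughly the role the nested load plays in the Pig script).
        record = json.loads(line)
        # Emit one output row per array element
        # (roughly the role of flatten(json#'item')).
        for item in record.get("item", []):
            # Keep only the two wanted fields (k#'sdb_id', k#'quantity').
            yield (item["sdb_id"], item["quantity"])


if __name__ == "__main__":
    for row in flatten_items(SAMPLE_LINES):
        print(row)
```

Each input record contributes as many output rows as its "item" array has elements, so the three sample records produce the five pairs asked for in the question.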

Hope this helps.

Published: 2023-08-05 04:47:00
Article link: https://www.elefans.com/category/jswz/34/1428467.html