Using FILTER in a nested FOREACH in Pig

Updated: 2024-10-28 05:16:03
Problem description

I have two Pig relations. The first, count_pairs, holds pairs of words and how many times each pair was seen, e.g. ((car,tire), 4). The second, word_counts, tracks how many times each word was seen, e.g. (car, 20). For each pair, I would like to find the ratio of the number of times the pair was seen to the number of times its first word was seen. In this example I would want ((car,tire), 4/20). I tried to write a nested FOREACH to solve this problem:

> percent_count_pairs = FOREACH count_pairs {
>     denom = FILTER word_counts BY ($0 == count_pairs.pair.word1);
>     GENERATE pair, count2/(double)denom.$1;
> }

I keep getting this error:

'Pig script failed to parse: <file src/cluster.pig, line 27, column 15> expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)'

The error points to the line with the FILTER. Googling it did not lead me to anything helpful. Please help! (P.S. The FILTER line does work if I move it out of the FOREACH.)

Recommended answer

After more googling I came to realize that this is a bug in Pig that does not allow this: issues.apache/jira/browse/PIG-1798. I ended up writing my own UDF to do the filtering.
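For readers who would rather avoid a custom UDF, a JOIN is a common alternative to the nested FILTER. This is a different technique from the one the answer used. The sketch below is untested and assumes schemas the question does not state: count_pairs as (pair: (word1, word2), count2) and word_counts as (word, total). The names word, total, first_word, pairs_flat, joined, and fraction are hypothetical.

```pig
-- Assumed schemas (not given in the question):
--   count_pairs: (pair: (word1: chararray, word2: chararray), count2: long)
--   word_counts: (word: chararray, total: long)

-- Pull the first word of each pair out as a top-level field to use as the join key.
pairs_flat = FOREACH count_pairs GENERATE pair, pair.word1 AS first_word, count2;

-- Attach each first word's total count to its pair.
joined = JOIN pairs_flat BY first_word, word_counts BY word;

-- Divide the pair count by the first word's count; cast so the division is not integer division.
percent_count_pairs = FOREACH joined GENERATE
    pairs_flat::pair AS pair,
    (double)pairs_flat::count2 / (double)word_counts::total AS fraction;
```

Under these assumptions, the pair ((car,tire), 4) joined with (car, 20) would yield a fraction of 4/20. Because this is an inner join, any pair whose first word is missing from word_counts is dropped from the result.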

Published: 2023-07-07 13:35:17
Article link: https://www.elefans.com/category/jswz/34/1063918.html