Distributed Cache in Pig UDF

Updated: 2024-10-27 15:23:37

Here is my code for a Pig UDF that uses the distributed cache:

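import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class Regex extends EvalFunc<Integer> {
    static HashMap<String, String> map = new HashMap<String, String>();

    // Ship the HDFS file with the job; "#id_lookup" names its local symlink.
    public List<String> getCacheFiles() {
        Path lookup_file = new Path(
                "hdfs://localhost.localdomain:8020/user/cloudera/top");
        List<String> list = new ArrayList<String>(1);
        list.add(lookup_file + "#id_lookup");
        return list;
    }

    // Read each "key#value" line of the cached file into map.
    public void VectorizeData() throws IOException {
        FileReader fr = new FileReader("./id_lookup");
        BufferedReader brd = new BufferedReader(fr);
        String line;
        while ((line = brd.readLine()) != null) {
            String str[] = line.split("#");
            map.put(str[0], str[1]);
        }
        fr.close();
    }

    @Override
    public Integer exec(Tuple input) throws IOException {
        // TODO Auto-generated method stub
        return map.size();
    }
}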

Below is my distributed cache input file (hdfs://localhost.localdomain:8020/user/cloudera/top):

Impetigo|Streptococcus pyogenes#Impetigo
indeterminate leprosy|Uncharacteristic leprosy#indeterminate leprosy
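The split in VectorizeData() implies that the lookup file holds one key#value pair per line. Under that assumption, the parsing step can be checked on its own, outside Pig. What follows is a minimal sketch, and the class LookupParser and its parse() method are hypothetical names. It also differs from the original loop in two ways. It uses split("#", 2) so that a '#' inside the value is kept, and it skips lines with no '#' instead of failing on str[1].

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupParser {
    // Turns "key#value" lines into a map. split("#", 2) keeps any later '#'
    // inside the value, and lines with no '#' are skipped rather than
    // triggering an ArrayIndexOutOfBoundsException on parts[1].
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> map = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("#", 2);
            if (parts.length == 2) {
                map.put(parts[0], parts[1]);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // The two sample lines from the lookup file, one pair per line.
        List<String> sample = Arrays.asList(
                "Impetigo|Streptococcus pyogenes#Impetigo",
                "indeterminate leprosy|Uncharacteristic leprosy#indeterminate leprosy");
        Map<String, String> map = parse(sample);
        System.out.println(map.size());                                 // 2
        System.out.println(map.get("Impetigo|Streptococcus pyogenes")); // Impetigo
    }
}
```

Running main() prints 2 and then Impetigo. If the file really has this layout, the parsing is not what leaves the map empty.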

The output I get is:

(0)
(0)
(0)
(0)
(0)

This means that my HashMap is empty. How do I fill my HashMap using the distributed cache?

Best answer

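This was because VectorizeData() was never called. getCacheFiles() only tells Pig to ship the HDFS file to each task as ./id_lookup. Something still has to read that file into the map, so exec() (or code it calls) must invoke VectorizeData() before using map.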
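A common way to apply this fix is to load the file lazily, once, on first use. Pig usually also instantiates UDFs on the client side, where ./id_lookup does not exist, so loading in a constructor is risky. The sketch below is plain Java, so it can be run without Pig. It assumes the file is readable at a local path, which inside a task would be ./id_lookup. The class LazyLookup and its get() method are hypothetical names, not part of Pig's API.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class LazyLookup {
    private final String path;        // e.g. "./id_lookup" inside a Pig task (assumption)
    private Map<String, String> map;  // null until the file has been read

    public LazyLookup(String path) {
        this.path = path;
    }

    // Reads the file on the first call and returns the cached map on later
    // calls. If the first read throws, map stays null and the next call retries.
    public Map<String, String> get() throws IOException {
        if (map == null) {
            Map<String, String> loaded = new HashMap<>();
            try (BufferedReader br = new BufferedReader(new FileReader(path))) {
                String line;
                while ((line = br.readLine()) != null) {
                    String[] parts = line.split("#", 2);
                    if (parts.length == 2) {
                        loaded.put(parts[0], parts[1]);
                    }
                }
            }
            map = loaded;
        }
        return map;
    }
}
```

Applied to the Regex UDF, the same pattern means exec() would check whether the map has been loaded, call VectorizeData() if it has not, and only then return map.size(). The result would then be the number of key#value lines in the file instead of 0.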

Published: 2023-08-05 22:27:00
Source: https://www.elefans.com/category/jswz/34/1440223.html