Distributed Cache in Pig UDF
Here is my code to implement a UDF in Pig using the distributed cache.
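```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class Regex extends EvalFunc<Integer> {
    static HashMap<String, String> map = new HashMap<String, String>();

    // Ship the HDFS file to each task, symlinked locally as ./id_lookup
    public List<String> getCacheFiles() {
        Path lookup_file = new Path(
                "hdfs://localhost.localdomain:8020/user/cloudera/top");
        List<String> list = new ArrayList<String>(1);
        list.add(lookup_file + "#id_lookup");
        return list;
    }

    // Read "key#value" lines from the local symlink into map
    public void VectorizeData() throws IOException {
        FileReader fr = new FileReader("./id_lookup");
        BufferedReader brd = new BufferedReader(fr);
        String line;
        while ((line = brd.readLine()) != null) {
            String str[] = line.split("#");
            map.put(str[0], str[1]);
        }
        fr.close();
    }

    @Override
    public Integer exec(Tuple input) throws IOException {
        // Return the number of entries currently in the lookup map
        return map.size();
    }
}
```

Given below is my distributed cache input file (hdfs://localhost.localdomain:8020/user/cloudera/top):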
```
Impetigo|Streptococcus pyogenes#Impetigo
indeterminate leprosy|Uncharacteristic leprosy#indeterminate leprosy
```

The output I get is:

```
(0)
(0)
(0)
(0)
(0)
```

This means that my HashMap is empty. How do I fill my HashMap using the distributed cache?
Best answer
This was because VectorizeData() was never called from exec(), so the cached file was never read and the HashMap stayed empty.
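A minimal sketch of the fix, assuming the lookup file holds one key#value pair per line (the two sample lines from the question). To keep it runnable without Pig on the classpath, the loading logic is pulled into a hypothetical helper class, LazyLookup; the commented lines show how exec() in the Regex UDF could use it. The map is filled on the first get() call rather than in a constructor, since a UDF is also instantiated on the client side, where the ./id_lookup symlink typically does not exist.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: reads a "key#value" lookup file the first time it is needed.
 * In the Pig UDF, the file name would be the "./id_lookup" symlink created by the
 * "#id_lookup" suffix returned from getCacheFiles().
 */
public class LazyLookup {
    private final Path file;
    private Map<String, String> map; // null until the first get()

    public LazyLookup(String fileName) {
        this.file = Paths.get(fileName);
    }

    /** Returns the lookup map, reading the file only on the first call. */
    public synchronized Map<String, String> get() throws IOException {
        if (map == null) {
            Map<String, String> loaded = new HashMap<>();
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Split on the first '#' only; skip lines without a separator.
                    String[] parts = line.split("#", 2);
                    if (parts.length == 2) {
                        loaded.put(parts[0], parts[1]);
                    }
                }
            }
            map = loaded;
        }
        return map;
    }

    // Hypothetical wiring inside the Regex UDF (requires Pig on the classpath):
    //   private static final LazyLookup LOOKUP = new LazyLookup("./id_lookup");
    //   @Override
    //   public Integer exec(Tuple input) throws IOException {
    //       return LOOKUP.get().size(); // file is read on the first call
    //   }

    public static void main(String[] args) throws IOException {
        // Temporary stand-in for ./id_lookup, holding the question's sample lines.
        Path tmp = Files.createTempFile("id_lookup", ".txt");
        Files.write(tmp, Arrays.asList(
                "Impetigo|Streptococcus pyogenes#Impetigo",
                "indeterminate leprosy|Uncharacteristic leprosy#indeterminate leprosy"),
                StandardCharsets.UTF_8);
        LazyLookup lookup = new LazyLookup(tmp.toString());
        System.out.println(lookup.get().size()); // prints 2
        Files.delete(tmp);
    }
}
```

Using split("#", 2) instead of split("#") keeps any further '#' characters in the value and avoids an ArrayIndexOutOfBoundsException on lines that have no separator at all.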