Running a Hadoop job from another Java program

Problem Description

I am writing a program that receives the source code of the mappers/reducers, dynamically compiles them and makes a JAR file out of them. It then has to run this JAR file on a Hadoop cluster.
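For reference, here is a minimal sketch of that compile-and-package step, assuming the mapper/reducer source arrives as a plain string at run time. The class, method and path names here are placeholders for illustration, not the asker's actual code, and the JDK compiler API is one possible way to do it:

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class DynamicJarBuilder {

    // Compiles the received source string (whose public class is 'className') into classDir.
    // Returns true if javac reported no errors. This needs a JDK at run time,
    // because a plain JRE has no system compiler.
    static boolean compile(String className, String sourceCode, File classDir) throws IOException {
        File sourceFile = new File(classDir, className + ".java");
        Files.write(sourceFile.toPath(), sourceCode.getBytes("UTF-8"));
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        return javac.run(null, null, null,
                "-d", classDir.getAbsolutePath(), sourceFile.getAbsolutePath()) == 0;
    }

    // Packs every .class file in classDir (including Outer$Inner.class files
    // produced for nested mapper/reducer classes) into jarFile.
    static void buildJar(File classDir, File jarFile) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (JarOutputStream jar = new JarOutputStream(new FileOutputStream(jarFile), manifest)) {
            for (File f : classDir.listFiles((dir, name) -> name.endsWith(".class"))) {
                jar.putNextEntry(new JarEntry(f.getName()));
                Files.copy(f.toPath(), jar);
                jar.closeEntry();
            }
        }
    }
}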

For the last part, I set up all the required parameters dynamically through my code. However, the problem I am facing now is that the code requires the compiled mapper and reducer classes at compile time. But at compile time I do not have these classes; they will only be received at run time (e.g. through a message received from a remote node). I would appreciate any idea/suggestion on how to get around this problem.

Below you can find the code for this last part. The problem is that job.setMapperClass(Mapper_Class.class) and job.setReducerClass(Reducer_Class.class) require the class files (Mapper_Class.class and Reducer_Class.class) to be present at compile time:

private boolean run_Hadoop_Job(String className) {
    try {
        System.out.println("Starting to run the code on Hadoop...");
        String[] argsTemp = { "project_test/input", "project_test/output" };

        // create a configuration
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:54310");
        conf.set("mapred.job.tracker", "localhost:54311");
        conf.set("mapred.jar", jar_Output_Folder + java.io.File.separator + className + ".jar");
        conf.set("mapreduce.map.class", "Mapper_Reducer_Classes$Mapper_Class.class");
        conf.set("mapreduce.reduce.class", "Mapper_Reducer_Classes$Reducer_Class.class");

        // create a new job based on the configuration
        Job job = new Job(conf, "Hadoop Example for dynamically and programmatically compiling-running a job");
        job.setJarByClass(Platform.class);
        //job.setMapperClass(Mapper_Class.class);
        //job.setReducerClass(Reducer_Class.class);

        // key/value of your reducer output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(argsTemp[0]));
        // this deletes possible output paths to prevent job failures
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(argsTemp[1]);
        fs.delete(out, true);
        // finally set the empty out path
        FileOutputFormat.setOutputPath(job, new Path(argsTemp[1]));

        //job.submit();
        System.exit(job.waitForCompletion(true) ? 0 : 1);
        System.out.println("Job Finished!");
    } catch (Exception e) {
        return false;
    }
    return true;
}

Revised: So I revised the code to specify the mapper and reducer using conf.set("mapreduce.map.class", "my mapper.class"). Now the code compiles correctly, but when it is executed it throws the following error:

Dec 24, 2012 6:49:43 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Task Id : attempt_201212240511_0006_m_000001_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Mapper_Reducer_Classes$Mapper_Class.class
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Solution

If you don't have them at compile time, then directly set the name in the configuration like this:

conf.set("mapreduce.map.class", "org.what.ever.ClassName"); conf.set("mapreduce.reduce.class", "org.what.ever.ClassName");
