I am writing a program that receives the source code of the mappers/reducers, dynamically compiles them, and builds a JAR file out of them. It then has to run this JAR file on a Hadoop cluster.
For the last part, I set up all the required parameters dynamically through my code. However, the problem I am facing is that the code requires the compiled mapper and reducer classes at compile time. But at compile time I do not have these classes; they are only received at run time (e.g. through a message from a remote node). I would appreciate any idea/suggestion on how to get around this problem.
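For context, the run-time compilation step described above can be sketched with the standard `javax.tools` API. This is a minimal, Hadoop-free sketch; the `Hello` class and its `greet` method are made-up names for illustration, and it assumes a full JDK is available (on a bare JRE, `getSystemJavaCompiler()` returns null):

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class DynamicCompile {

    // Compiles a single source string to a temp directory and loads the resulting class.
    public static Class<?> compileAndLoad(String className, String source) throws Exception {
        Path dir = Files.createTempDirectory("dyn");
        Path src = dir.resolve(className + ".java");
        Files.write(src, source.getBytes("UTF-8"));

        // Requires a JDK; returns null when running on a plain JRE.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int result = compiler.run(null, null, null, src.toString(), "-d", dir.toString());
        if (result != 0) {
            throw new IllegalStateException("compilation failed for " + className);
        }

        // Load the freshly compiled class from the temp directory.
        URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() });
        return Class.forName(className, true, loader);
    }

    public static void main(String[] args) throws Exception {
        Class<?> c = compileAndLoad("Hello",
                "public class Hello { public String greet() { return \"hi\"; } }");
        Object instance = c.getDeclaredConstructor().newInstance();
        System.out.println(c.getMethod("greet").invoke(instance)); // prints "hi"
    }
}
```

The class compiled this way can then be packaged into the JAR that is shipped to the cluster.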
Below is the code for the last part. The problem is that job.setMapperClass(Mapper_Class.class) and job.setReducerClass(Reducer_Class.class) require the Mapper_Class.class and Reducer_Class.class files to be present at compile time:
    private boolean run_Hadoop_Job(String className) {
        try {
            System.out.println("Starting to run the code on Hadoop...");
            String[] argsTemp = { "project_test/input", "project_test/output" };

            // create a configuration
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://localhost:54310");
            conf.set("mapred.job.tracker", "localhost:54311");
            conf.set("mapred.jar", jar_Output_Folder + java.io.File.separator + className + ".jar");
            conf.set("mapreduce.map.class", "Mapper_Reducer_Classes$Mapper_Class.class");
            conf.set("mapreduce.reduce.class", "Mapper_Reducer_Classes$Reducer_Class.class");

            // create a new job based on the configuration
            Job job = new Job(conf, "Hadoop Example for dynamically and programmatically compiling-running a job");
            job.setJarByClass(Platform.class);
            //job.setMapperClass(Mapper_Class.class);
            //job.setReducerClass(Reducer_Class.class);

            // key/value of your reducer output
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path(argsTemp[0]));

            // this deletes possible output paths to prevent job failures
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path(argsTemp[1]);
            fs.delete(out, true);

            // finally set the empty out path
            FileOutputFormat.setOutputPath(job, new Path(argsTemp[1]));

            //job.submit();
            System.exit(job.waitForCompletion(true) ? 0 : 1);
            System.out.println("Job Finished!");
        } catch (Exception e) {
            return false;
        }
        return true;
    }

Revised: So I revised the code to specify the mapper and reducer by name using conf.set("mapreduce.map.class", "myMapper.class"). Now the code compiles correctly, but when it is executed it throws the following error:
    Dec 24, 2012 6:49:43 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Task Id : attempt_201212240511_0006_m_000001_2, Status : FAILED
    java.lang.RuntimeException: java.lang.ClassNotFoundException: Mapper_Reducer_Classes$Mapper_Class.class
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
        at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Solution: If you don't have the classes at compile time, then set the class name directly in the configuration, like this:
    conf.set("mapreduce.map.class", "org.what.ever.ClassName");
    conf.set("mapreduce.reduce.class", "org.what.ever.ClassName");
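Note that the value must be the binary class name with no ".class" suffix, which is why the earlier conf.set(..., "Mapper_Reducer_Classes$Mapper_Class.class") call fails with ClassNotFoundException at run time. A minimal, Hadoop-free sketch of the difference, using java.lang.String as a stand-in class:

```java
public class ClassNameDemo {
    public static void main(String[] args) {
        try {
            // The plain binary name resolves fine.
            Class<?> ok = Class.forName("java.lang.String");
            System.out.println("loaded: " + ok.getName()); // prints "loaded: java.lang.String"
        } catch (ClassNotFoundException e) {
            System.out.println("unexpected failure");
        }
        try {
            // Appending ".class" makes lookup fail: the suffix is a file-name
            // convention, not part of the class's binary name.
            Class.forName("java.lang.String.class");
            System.out.println("loaded with suffix");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException for name with .class suffix");
        }
    }
}
```

Hadoop resolves these configuration values through the same Class.forName-style lookup, so "Mapper_Reducer_Classes$Mapper_Class" (with the JAR on the job classpath via mapred.jar) is the form it expects.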