I am writing a program that receives the source code of the mappers/reducers, dynamically compiles them, and builds a JAR file out of them. It then has to run this JAR file on a Hadoop cluster.
For the last part, I set up all the required parameters dynamically through my code. However, the problem I am facing is that the code requires the compiled mapper and reducer classes at compile time. But at compile time I do not have these classes; they are only received at run time (e.g. through a message from a remote node). I would appreciate any idea/suggestion on how to get around this problem.
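For context, the run-time compilation step described above can be sketched with the standard `javax.tools` API. This is a minimal, Hadoop-free sketch; the `Hello` class and its `greet` method are made-up names for illustration, and it assumes a full JDK is available (on a bare JRE, `getSystemJavaCompiler()` returns null):

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class DynamicCompile {

    // Compiles a single source string to a temp directory and loads the resulting class.
    public static Class<?> compileAndLoad(String className, String source) throws Exception {
        Path dir = Files.createTempDirectory("dyn");
        Path src = dir.resolve(className + ".java");
        Files.write(src, source.getBytes("UTF-8"));

        // Requires a JDK; returns null when running on a plain JRE.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int result = compiler.run(null, null, null, src.toString(), "-d", dir.toString());
        if (result != 0) {
            throw new IllegalStateException("compilation failed for " + className);
        }

        // Load the freshly compiled class from the temp directory.
        URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() });
        return Class.forName(className, true, loader);
    }

    public static void main(String[] args) throws Exception {
        Class<?> c = compileAndLoad("Hello",
                "public class Hello { public String greet() { return \"hi\"; } }");
        Object instance = c.getDeclaredConstructor().newInstance();
        System.out.println(c.getMethod("greet").invoke(instance)); // prints "hi"
    }
}
```

The class compiled this way can then be packaged into the JAR that is shipped to the cluster.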
Below is the code for the last part. The problem is that job.setMapperClass(Mapper_Class.class) and job.setReducerClass(Reducer_Class.class) require the Mapper_Class.class and Reducer_Class.class files to be present at compile time:
    private boolean run_Hadoop_Job(String className) {
        try {
            System.out.println("Starting to run the code on Hadoop...");
            String[] argsTemp = { "project_test/input", "project_test/output" };

            // create a configuration
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://localhost:54310");
            conf.set("mapred.job.tracker", "localhost:54311");
            conf.set("mapred.jar", jar_Output_Folder + java.io.File.separator + className + ".jar");
            conf.set("mapreduce.map.class", "Mapper_Reducer_Classes$Mapper_Class.class");
            conf.set("mapreduce.reduce.class", "Mapper_Reducer_Classes$Reducer_Class.class");

            // create a new job based on the configuration
            Job job = new Job(conf, "Hadoop Example for dynamically and programmatically compiling-running a job");
            job.setJarByClass(Platform.class);
            //job.setMapperClass(Mapper_Class.class);
            //job.setReducerClass(Reducer_Class.class);

            // key/value of your reducer output
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path(argsTemp[0]));

            // this deletes possible output paths to prevent job failures
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path(argsTemp[1]);
            fs.delete(out, true);

            // finally set the empty out path
            FileOutputFormat.setOutputPath(job, new Path(argsTemp[1]));

            //job.submit();
            System.exit(job.waitForCompletion(true) ? 0 : 1);
            System.out.println("Job Finished!");
        } catch (Exception e) {
            return false;
        }
        return true;
    }

Revised: So I revised the code to specify the mapper and reducer by name using conf.set("mapreduce.map.class", "myMapper.class"). Now the code compiles correctly, but when it is executed it throws the following error:
    Dec 24, 2012 6:49:43 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Task Id : attempt_201212240511_0006_m_000001_2, Status : FAILED
    java.lang.RuntimeException: java.lang.ClassNotFoundException: Mapper_Reducer_Classes$Mapper_Class.class
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
        at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Solution: If you don't have the classes at compile time, then set the class name directly in the configuration, like this:
    conf.set("mapreduce.map.class", "org.what.ever.ClassName");
    conf.set("mapreduce.reduce.class", "org.what.ever.ClassName");
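Note that the value must be the binary class name with no ".class" suffix, which is why the earlier conf.set(..., "Mapper_Reducer_Classes$Mapper_Class.class") call fails with ClassNotFoundException at run time. A minimal, Hadoop-free sketch of the difference, using java.lang.String as a stand-in class:

```java
public class ClassNameDemo {
    public static void main(String[] args) {
        try {
            // The plain binary name resolves fine.
            Class<?> ok = Class.forName("java.lang.String");
            System.out.println("loaded: " + ok.getName()); // prints "loaded: java.lang.String"
        } catch (ClassNotFoundException e) {
            System.out.println("unexpected failure");
        }
        try {
            // Appending ".class" makes lookup fail: the suffix is a file-name
            // convention, not part of the class's binary name.
            Class.forName("java.lang.String.class");
            System.out.println("loaded with suffix");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException for name with .class suffix");
        }
    }
}
```

Hadoop resolves these configuration values through the same Class.forName-style lookup, so "Mapper_Reducer_Classes$Mapper_Class" (with the JAR on the job classpath via mapred.jar) is the form it expects.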