Running a MapReduce job through the REST API

Updated: 2024-10-24 20:19:34

Question

I use the REST APIs of Hadoop 2.7.1 to run a MapReduce job from outside the cluster. This example, "hadoop-forum/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api", really helped me. But when I submit the POST request, something strange happens:

  • Looking at "master:8088/cluster/apps", I see that a single POST request produces two jobs, as shown in the following picture: [image: strange things: a response produces two jobs]

    After a long wait, the job that I defined in the HTTP request body fails with a FileAlreadyExistsException. The reason is that the other job has already created the output directory, so the output directory hdfs://master:9000/output/output16 already exists.

    This is my request body:

    {
      "application-id": "application_1445825741228_0011",
      "application-name": "wordcount-demo",
      "am-container-spec": {
        "commands": {
          "command": "{{HADOOP_HOME}}/bin/hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /data/ /output/output16"
        },
        "environment": {
          "entry": [
            {
              "key": "CLASSPATH",
              "value": "{{CLASSPATH}}<CPS>./*<CPS>{{HADOOP_CONF_DIR}}<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/*<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/lib/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/lib/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/lib/*<CPS>./log4j.properties"
            }
          ]
        }
      },
      "unmanaged-AM": false,
      "max-app-attempts": 2,
      "resource": {
        "memory": 1024,
        "vCores": 1
      },
      "application-type": "MAPREDUCE",
      "keep-containers-across-application-attempts": false
    }

    This is my command:

    curl -i -X POST -H 'Accept: application/json' -H 'Content-Type: application/json' 'master:8088/ws/v1/cluster/apps?user.name=hadoop' -d @post-json.txt

    Can anybody help me? Thanks a lot.
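
    For readers who want to script the submission instead of using curl, here is a minimal Python sketch. It assumes the ResourceManager address and user from the question (master:8088, hadoop) and that a fresh application id is first requested from the ResourceManager's new-application endpoint. The function names are hypothetical, and the body file is the post-json.txt from the command above.

```python
import json
import urllib.request

RM = "http://master:8088"  # ResourceManager address from the question (assumption: plain HTTP)


def new_application_request(rm=RM, user="hadoop"):
    """Build the POST that asks the ResourceManager for a fresh application id."""
    url = f"{rm}/ws/v1/cluster/apps/new-application?user.name={user}"
    return urllib.request.Request(
        url, method="POST", headers={"Accept": "application/json"}
    )


def submit_request(body, rm=RM, user="hadoop"):
    """Build the POST that submits the application described by `body` (a dict)."""
    url = f"{rm}/ws/v1/cluster/apps?user.name={user}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Accept": "application/json", "Content-Type": "application/json"},
    )


def submit(body_path="post-json.txt"):
    """Fetch a new application id, write it into the body, and submit the job."""
    with urllib.request.urlopen(new_application_request()) as resp:
        app_id = json.load(resp)["application-id"]
    with open(body_path) as f:
        body = json.load(f)
    body["application-id"] = app_id  # replace the hard-coded id in the file
    with urllib.request.urlopen(submit_request(body)) as resp:
        return app_id, resp.status
```

    Calling submit() needs a reachable cluster. The two request builders can be inspected offline to check the URL and payload before anything is sent.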

    Recommended answer

    Before you run the MapReduce job, make sure the output folder does not already exist; the job will not run if it is present. You can either have your program delete the folder if it exists, or delete it manually before calling the REST API. The check exists to prevent data loss and to avoid overwriting the output of another job.
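
    One way to delete the folder from outside the cluster is the WebHDFS REST API. The sketch below is an illustration under assumptions, not part of the original answer: it assumes WebHDFS is enabled, that the NameNode web interface listens on master:50070 (the usual default in Hadoop 2.x, not stated in the question), and that the user is hadoop. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def webhdfs_delete_url(path, namenode="http://master:50070", user="hadoop"):
    """Build the WebHDFS URL that recursively deletes `path` (an absolute HDFS path)."""
    query = urllib.parse.urlencode(
        {"op": "DELETE", "recursive": "true", "user.name": user}
    )
    return f"{namenode}/webhdfs/v1{urllib.parse.quote(path)}?{query}"


def delete_output_dir(path, **kwargs):
    """Send the DELETE request; returns True if WebHDFS reports something was deleted."""
    req = urllib.request.Request(webhdfs_delete_url(path, **kwargs), method="DELETE")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["boolean"]


if __name__ == "__main__":
    # Output directory from the question; remove it before submitting the job.
    print(delete_output_dir("/output/output16"))
```

    Deleting recursively discards any earlier results in that directory, so choosing a new output path for each submission is the safer option when the old output still matters.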

    Published: 2023-10-25 08:07:26
    Source: https://www.elefans.com/category/jswz/34/1526411.html