Hadoop Streaming command fails with a Python error

Updated: 2024-10-27 12:24:12

Problem description


I'm a newcomer to Ubuntu, Hadoop and DFS but I've managed to install a single-node hadoop instance on my local ubuntu machine following the directions posted on Michael-Noll here:

www.michael-noll/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/#copy-local-example-data-to-hdfs

www.michael-noll/tutorials/writing-an-hadoop-mapreduce-program-in-python/

I'm currently stuck on running the basic word count example on Hadoop. I'm not sure if the fact I've been running Hadoop out of my Downloads directory makes too much of a difference, but I've attempted to tweak my file locations for the mapper.py and reducer.py functions by placing them in the Hadoop working directory, with no success. I've exhausted all of my research and still cannot solve this problem (i.e., using -file parameters, etc.). I really appreciate any help in advance, and I hope I framed this question in a way that can help others who are just beginning with Python + Hadoop.

I tested mapper.py and reducer.py independently, and both work fine when fed toy text data from the bash shell.
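For reference, a local pipeline like the following reproduces the Streaming contract (mapper → sort → reducer) without Hadoop. The awk one-liners here are illustrative stand-ins for mapper.py and reducer.py, assuming the usual word-count contract of tab-separated "word&lt;TAB&gt;count" records; your actual scripts would take their place.

```shell
# Simulate Hadoop Streaming locally on toy data: mapper | sort | reducer.
# The awk programs are stand-ins for mapper.py and reducer.py (assumed
# word-count contract); sort -k1,1 plays the part of the shuffle/sort phase.
printf 'foo foo bar\n' \
  | awk '{for (i = 1; i <= NF; i++) print $i "\t1"}' \
  | sort -k1,1 \
  | awk -F'\t' '{count[$1] += $2} END {for (w in count) print w "\t" count[w]}'
```

If this local pipeline works but the cluster job fails, the problem is usually in how the tasks launch the scripts rather than in the scripts' logic.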

Output from my Bash Shell:

hduser@chris-linux:/home/chris/Downloads/hadoop$ bin/hadoop jar /home/chris/Downloads/hadoop/contrib/streaming/hadoop-streaming-1.0.4.jar -file mapper.py -file reducer.py -mapper mapper.py -reducer reducer.py -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output3
Warning: $HADOOP_HOME is deprecated.
packageJobJar: [mapper.py, reducer.py, /app/hadoop/tmp/hadoop-unjar4681300115516015516/] [] /tmp/streamjob2215860242221125845.jar tmpDir=null
13/03/08 14:43:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/03/08 14:43:46 WARN snappy.LoadSnappy: Snappy native library not loaded
13/03/08 14:43:46 INFO mapred.FileInputFormat: Total input paths to process : 3
13/03/08 14:43:47 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
13/03/08 14:43:47 INFO streaming.StreamJob: Running job: job_201303081155_0032
13/03/08 14:43:47 INFO streaming.StreamJob: To kill this job, run:
13/03/08 14:43:47 INFO streaming.StreamJob: /home/chris/Downloads/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201303081155_0032
13/03/08 14:43:47 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201303081155_0032
13/03/08 14:43:48 INFO streaming.StreamJob:  map 0%  reduce 0%
13/03/08 14:44:12 INFO streaming.StreamJob:  map 100%  reduce 100%
13/03/08 14:44:12 INFO streaming.StreamJob: To kill this job, run:
13/03/08 14:44:12 INFO streaming.StreamJob: /home/chris/Downloads/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201303081155_0032
13/03/08 14:44:12 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201303081155_0032
13/03/08 14:44:12 ERROR streaming.StreamJob: Job not successful. Error: JobCleanup Task Failure, Task: task_201303081155_0032_m_000003
13/03/08 14:44:12 INFO streaming.StreamJob: killJob...
Streaming Command Failed!

My HDFS is located at /app/hadoop/tmp which, I believe, is also the same as my /user/hduser directory on my hadoop instance.

Input data is located at /user/hduser/gutenberg/* (3 UTF plain text files) Output is set to be created at /user/hduser/gutenberg-output

Solution

Have a look at the logs in the following path (based on the information supplied above):

$HADOOP_HOME/logs/userlogs/job_201303081155_0032/task_201303081155_0032_m_000003

This should provide you with some information on that specific task.
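Inside that job directory, Hadoop 1.x keeps one subdirectory per task attempt (depending on version they may be named attempt_… rather than task_…), each containing the three log files for that attempt. A sketch of the layout, using the job and task IDs from the question; the stderr file is where Python tracebacks usually land:

```
$HADOOP_HOME/logs/userlogs/job_201303081155_0032/
    task_201303081155_0032_m_000003/
        stdout   <- anything the script wrote to standard output
        stderr   <- Python tracebacks usually land here
        syslog   <- Hadoop framework messages for this attempt
```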

The logs supplied by Hadoop are pretty good; it just takes some digging around to find the information :)
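Two frequent culprits behind "Streaming Command Failed!" with Python scripts, once you find the traceback in stderr, are a missing shebang line and scripts that are not marked executable, since the task JVM execs the scripts directly. A minimal sketch of the checks; the toy mapper.py written here is purely illustrative (and targets python3 for portability), your real mapper.py takes its place:

```shell
# Illustrative toy mapper.py; in practice, check your own script instead.
cat > mapper.py <<'EOF'
#!/usr/bin/env python3
import sys
for line in sys.stdin:
    for word in line.split():
        print('%s\t1' % word)
EOF

chmod +x mapper.py                     # Streaming execs the script directly
head -n 1 mapper.py                    # first line must be the shebang
printf 'foo foo bar\n' | ./mapper.py   # quick sanity check outside Hadoop
```

If either check fails on your real mapper.py or reducer.py, fix it and resubmit the job with the same -file/-mapper/-reducer flags as in the question.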


Published: 2023-07-05 14:45:16