How to run the Mahout examples from the manual

Updated: 2024-10-09 10:21:49

Problem description


I am trying to run the hello-world example in chapter 7. I created the following code in Eclipse and then packaged it into a jar:

package com.mycode.mahout;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.clustering.WeightedVectorWritable;
import org.apache.mahout.clustering.kmeans.Cluster;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class SimpleKMeansClustering {

    public static final double[][] points = {
        {1, 1}, {2, 1}, {1, 2},
        {2, 2}, {3, 3}, {8, 8},
        {9, 8}, {8, 9}, {9, 9}};

    public static void writePointsToFile(List<Vector> points, String fileName,
                                         FileSystem fs, Configuration conf) throws IOException {
        Path path = new Path(fileName);
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
            path, LongWritable.class, VectorWritable.class);
        long recNum = 0;
        VectorWritable vec = new VectorWritable();
        for (Vector point : points) {
            vec.set(point);
            writer.append(new LongWritable(recNum++), vec);
        }
        writer.close();
    }

    public static List<Vector> getPoints(double[][] raw) {
        List<Vector> points = new ArrayList<Vector>();
        for (int i = 0; i < raw.length; i++) {
            double[] fr = raw[i];
            Vector vec = new RandomAccessSparseVector(fr.length);
            vec.assign(fr);
            points.add(vec);
        }
        return points;
    }

    public static void main(String[] args) throws Exception {
        int k = 2;
        List<Vector> vectors = getPoints(points);

        File testData = new File("testdata");
        if (!testData.exists()) {
            testData.mkdir();
        }
        testData = new File("testdata/points");
        if (!testData.exists()) {
            testData.mkdir();
        }

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        writePointsToFile(vectors, "testdata/points/file1", fs, conf);

        Path path = new Path("testdata/clusters/part-00000");
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
            path, Text.class, Cluster.class);
        for (int i = 0; i < k; i++) {
            Vector vec = vectors.get(i);
            Cluster cluster = new Cluster(vec, i, new EuclideanDistanceMeasure());
            writer.append(new Text(cluster.getIdentifier()), cluster);
        }
        writer.close();

        KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
            new Path("output"), new EuclideanDistanceMeasure(), 0.001, 10, true, false);

        SequenceFile.Reader reader = new SequenceFile.Reader(fs,
            new Path("output/" + Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000"), conf);
        IntWritable key = new IntWritable();
        WeightedVectorWritable value = new WeightedVectorWritable();
        while (reader.next(key, value)) {
            System.out.println(value.toString() + " belongs to cluster " + key.toString());
        }
        reader.close();
    }
}

I packaged it as myjob.jar.

Now, how do I execute this on my cluster?

I tried the following:

hadoop jar myjob.jar com.mycode.mahout.SimpleKMeansClustering
java -jar myjob.jar
java -cp myjob.jar

I get the following error:

[root@node1 tmp]# hadoop jar mahoutfirst.jar com.mahout.emc.SimpleKMeansClustering
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:201)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 3 more

Please advise: what is the right way to run code written using Mahout?

Solution

Even though this is pretty late, I faced a similar issue, and the following approach worked for me, since I didn't want to use Maven:

1) Go to your Mahout installation directory and look for the *job.jar files:

ls /usr/lib/mahout/
conf  lib  mahout-core-0.5-cdh3u3-job.jar  mahout-examples-0.5-cdh3u3-job.jar  mahout-taste-webapp-0.5-cdh3u3.war

2) Copy mahout-examples-0.5-cdh3u3-job.jar to the directory where code resides

3) Use the "job" JAR file provided by Mahout. It packages up all the dependencies. You need to add your classes to it too. As you have already compiled your class against the Hadoop and Mahout libraries, you have your .class file ready.

4) Add your class file to the job jar mahout-core-0.5-cdh3u3-job.jar in your directory:

jar uf mahout-core-0.5-cdh3u3-job.jar SimpleKMeansClustering.class
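Before running, it can help to confirm the class really made it into the job jar. A quick check (assuming the same jar name as above):

```shell
# List the jar's contents and look for the class added with "jar uf".
jar tf mahout-core-0.5-cdh3u3-job.jar | grep SimpleKMeansClustering
```

If the grep prints nothing, the update step did not take effect and the run below will fail with the same NoClassDefFoundError-style problem, this time for your own class.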

5) Run hadoop jar with your class:

hadoop jar mahout-core-0.5-cdh3u3-job.jar SimpleKMeansClustering
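If you would rather not modify the Mahout job jar, another commonly used workaround is to put the Mahout jars on the client-side classpath. This is a sketch under the assumption that Mahout is installed in /usr/lib/mahout as shown above; adjust jar names and versions to your installation:

```shell
# Put the Mahout job jar on the classpath that "hadoop jar" uses
# when loading your main class, so org/apache/mahout/math/Vector
# resolves at launch time.
export HADOOP_CLASSPATH=/usr/lib/mahout/mahout-core-0.5-cdh3u3-job.jar

hadoop jar myjob.jar com.mycode.mahout.SimpleKMeansClustering
```

Note that shipping extra jars to the task nodes via -libjars is only honored when the main class parses options through ToolRunner/GenericOptionsParser, which the example above does not do; packing everything into one job jar, as in the steps above, sidesteps that.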

6) At the end of the map-reduce job, you should see:

1.0: [1.000, 1.000] belongs to cluster 0
1.0: [2.000, 1.000] belongs to cluster 0
1.0: [1.000, 2.000] belongs to cluster 0
1.0: [2.000, 2.000] belongs to cluster 0
1.0: [3.000, 3.000] belongs to cluster 0
1.0: [8.000, 8.000] belongs to cluster 1
1.0: [9.000, 8.000] belongs to cluster 1
1.0: [8.000, 9.000] belongs to cluster 1
1.0: [9.000, 9.000] belongs to cluster 1
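The clustering result is easy to sanity-check without Hadoop or Mahout at all. This is a minimal plain-Java k-means sketch (Lloyd's algorithm) on the same nine points, seeding the centroids with the first k points just as the Mahout example does; the class name KMeansSanityCheck is mine, not part of the example:

```java
import java.util.Arrays;

public class KMeansSanityCheck {

    // Euclidean distance between two 2-D points.
    static double dist(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3},
                             {8, 8}, {9, 8}, {8, 9}, {9, 9}};
        int k = 2;
        // Seed centroids with the first k points, mirroring the example above.
        double[][] centroids = {points[0].clone(), points[1].clone()};
        int[] assign = new int[points.length];

        for (int iter = 0; iter < 10; iter++) {
            // Assignment step: each point goes to its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                assign[i] = dist(points[i], centroids[0]) <= dist(points[i], centroids[1]) ? 0 : 1;
            }
            // Update step: move each centroid to the mean of its points.
            for (int c = 0; c < k; c++) {
                double sx = 0, sy = 0;
                int n = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assign[i] == c) { sx += points[i][0]; sy += points[i][1]; n++; }
                }
                if (n > 0) { centroids[c][0] = sx / n; centroids[c][1] = sy / n; }
            }
        }
        for (int i = 0; i < points.length; i++) {
            System.out.println(Arrays.toString(points[i]) + " belongs to cluster " + assign[i]);
        }
        // The first five points end up in cluster 0 and the last four in
        // cluster 1, matching the Mahout output above.
    }
}
```

This reproduces the same two clusters as the map-reduce run, which confirms the job jar approach executed the algorithm correctly rather than just happening to finish.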

Published: 2023-11-23 07:25:55