Error while importing data from MongoDB to HDFS

I'm trying to import the documents of a MongoDB collection into HDFS through a MapReduce job, using the old (org.apache.hadoop.mapred) API. This is the driver code:

package my.pac;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.mongodb.hadoop.mapred.MongoInputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class ImportDriver extends Configured implements Tool {

	public static void main(String[] args) throws Exception {
		int exitCode = ToolRunner.run(new ImportDriver(), args);
		System.exit(exitCode);
	}
	@Override
	public int run(String[] args) throws Exception {
		JobConf conf = new JobConf();
		// Input comes straight from MongoDB, so no FileInputFormat path is set.
		MongoConfigUtil.setInputURI(conf, "mongodb://127.0.0.1:27017/SampleDb.shows");
		conf.setJarByClass(ImportDriver.class);
		conf.addResource(new Path("/usr/lib/hadoop/hadoop-1.2.1/conf/core-site.xml"));
		conf.addResource(new Path("/usr/lib/hadoop/hadoop-1.2.1/conf/hdfs-site.xml"));
		FileOutputFormat.setOutputPath(conf, new Path(args[0]));
		conf.setInputFormat(MongoInputFormat.class);
		conf.setOutputFormat(TextOutputFormat.class);
		conf.setMapperClass(ImportMapper.class);
		conf.setMapOutputKeyClass(Text.class);
		conf.setMapOutputValueClass(Text.class);
		JobClient.runJob(conf);
		return 0;
	}
} 

This is my Mapper code:

package my.pac;

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.bson.BSONObject;
import com.mongodb.hadoop.io.BSONWritable;

public class ImportMapper extends MapReduceBase implements Mapper<BSONWritable, BSONWritable, Text, Text>{

	@Override
	public void map(BSONWritable key, BSONWritable value,
			OutputCollector<Text, Text> o, Reporter arg3)
			throws IOException {
		// This cast is what fails at runtime: BSONWritable does not implement org.bson.BSONObject.
		String val = ((BSONObject) value).get("_id").toString();
		System.out.println(val);
		
		o.collect( new Text(val), new Text(val));
		
	}

} 

I am using

Ubuntu 14.0, Hadoop 1.2.1, MongoDB 3.0.4

I have added the following jars:

mongo-2.9.3.jar, mongo-hadoop-core-1.3.0.jar, mongo-java-driver-2.13.2.jar

When I run this, I get the following error:

java.lang.Exception: java.lang.ClassCastException: com.mongodb.hadoop.io.BSONWritable cannot be cast to org.bson.BSONObject
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ClassCastException: com.mongodb.hadoop.io.BSONWritable cannot be cast to org.bson.BSONObject
	at my.pac.ImportMapper.map(ImportMapper.java:18)
	at my.pac.ImportMapper.map(ImportMapper.java:1)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745) 

How can I rectify this?

Accepted answer

You may have an outdated driver on your classpath, causing a conflict in the read preference settings. (Note that the jar list above includes both mongo-2.9.3.jar and mongo-java-driver-2.13.2.jar, two versions of the same Java driver, which can produce exactly this kind of conflict.)

See the following links for similar issues: https://jira.mongodb.org/browse/JAVA-849

https://serverfault.com/questions/268953/mongodb-java-r2-5-3-nosuchmethoderror-on-dbcollection-savedbobject-in-tomca

If that does not help, https://jira.talendforge.org/browse/TBD-1002 suggests that you may need to restart MongoDB or use a separate connection.

Apparently all the jars I used were correct; the way I tried to get data out of the BSONWritable was wrong. I had tried to cast BSONWritable to BSONObject, which is not a valid cast. Here is how I solved the problem:

String name = (String)value.getDoc().get("name");
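Based on that fix, here is a minimal sketch of what the corrected mapper could look like (same types and job setup as in the question; the only real change is that getDoc() is used to unwrap the document instead of casting the writable):

package my.pac;

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import com.mongodb.hadoop.io.BSONWritable;

public class ImportMapper extends MapReduceBase implements Mapper<BSONWritable, BSONWritable, Text, Text> {

	@Override
	public void map(BSONWritable key, BSONWritable value,
			OutputCollector<Text, Text> output, Reporter reporter)
			throws IOException {
		// getDoc() returns the wrapped org.bson.BSONObject, so no cast on the writable is needed.
		String val = value.getDoc().get("_id").toString();
		output.collect(new Text(val), new Text(val));
	}
}

With this change, the driver and jar setup above should work unchanged.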
