Spark won't read/write data from S3 (ResponseCode = 400, ResponseMessage = Bad Request)

Updated: 2024-10-17 18:16:49
This article describes how to handle the error "Spark won't read/write data from S3 (ResponseCode = 400, ResponseMessage = Bad Request)" and may serve as a useful reference.

Problem description


I implemented a Spark application and created the Spark context:

private JavaSparkContext createJavaSparkContext() {
    SparkConf conf = new SparkConf();
    conf.setAppName("test");
    if (conf.get("spark.master", null) == null) {
        conf.setMaster("local[4]");
    }
    conf.set("fs.s3a.awsAccessKeyId", getCredentialConfig().getS3Key());
    conf.set("fs.s3a.awsSecretAccessKey", getCredentialConfig().getS3Secret());
    conf.set("fs.s3a.endpoint", getCredentialConfig().getS3Endpoint());
    return new JavaSparkContext(conf);
}

And I try to get data from S3 via the Spark Dataset API (Spark SQL):

String s = "s3a://" + getCredentialConfig().getS3Bucket();
Dataset<Row> csv = getSparkSession()
    .read()
    .option("header", "true")
    .csv(s + "/dataset.csv");
System.out.println("Read size: " + csv.count());

There is an error:

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 1A3E8CBD4959289D, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: Q1Fv8sNvcSOWGbhJSu2d3Nfgow00388IpXiiHNKHz8vI/zysC8V8/YyQ1ILVsM2gWQIyTy1miJc=

Hadoop version: 2.7

AWS endpoint: s3.eu-central-1.amazonaws.com

(On Hadoop 2.8, everything works fine.)

Solution

The problem is that Frankfurt doesn't support s3n; you need to use s3a. This region also supports only the V4 signature version: http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

EU (Frankfurt) eu-central-1 Version 4 only

That means V4 signing needs to be enabled on the AWS client, by adding the system property

com.amazonaws.services.s3.enableV4 -> true

conf.set("com.amazonaws.services.s3.enableV4", "true"); // doesn't work for me: SparkConf entries are not JVM system properties

On a local machine I've used:

System.setProperty("com.amazonaws.services.s3.enableV4", "true");
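This works where conf.set does not because the AWS SDK reads com.amazonaws.services.s3.enableV4 as a plain JVM system property when choosing a signer, and it must be set before the S3 client is created. A minimal, self-contained sketch of that ordering requirement (no Spark or AWS SDK on the classpath needed):

```java
public class Main {
    public static void main(String[] args) {
        // The AWS SDK checks this JVM system property at signer-selection
        // time, so set it before any S3 client (and thus before the Spark
        // context that creates one) is constructed. SparkConf.set() only
        // writes to Spark's own configuration, never to system properties.
        System.setProperty("com.amazonaws.services.s3.enableV4", "true");
        System.out.println(System.getProperty("com.amazonaws.services.s3.enableV4"));
    }
}
```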

To run on AWS EMR, you need to add these parameters to spark-submit:

spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
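A hypothetical spark-submit invocation passing both options via --conf; the jar and class names are placeholders, not from the original post:

```shell
# Placeholder application jar and main class; only the --conf flags are the point.
spark-submit \
  --conf spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
  --conf spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
  --class com.example.MyApp \
  my-app.jar
```

Setting the flag on both the driver and the executors matters, because S3 requests are issued from both.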

Additionally, you should add the class implementations for the file systems:

conf.set("spark.hadoop.fs.s3a.impl", org.apache.hadoop.fs.s3a.S3AFileSystem.class.getName());
conf.set("spark.hadoop.fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("spark.hadoop.fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
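Using Class#getName() instead of a hard-coded string means a missing implementation class fails at compile time rather than at job runtime. A small sketch of the idiom, using a JDK class as a stand-in since the Hadoop classes above would require hadoop-aws on the classpath:

```java
public class Main {
    public static void main(String[] args) {
        // Class#getName() yields the fully qualified class name the
        // configuration expects, and the compiler verifies the class exists.
        String implName = java.io.File.class.getName();
        System.out.println(implName); // prints "java.io.File"
    }
}
```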

Published: 2023-11-24 04:48:51