Reading a file from an S3 bucket into a Spark DataFrame using Scala with DataStax spark-submit fails with AWS Error Message: Bad Request

Updated: 2024-10-25 18:25:42

Problem description

I'm trying to read CSV files from an S3 bucket located in the Mumbai region (ap-south-1). I'm reading the files with DataStax `dse spark-submit`.

I have tried several other versions of hadoop-aws. The current hadoop-aws version is 2.7.3.

spark.sparkContext.hadoopConfiguration.set("com.amazonaws.services.s3.enableV4", "true")
// Regional endpoint for Mumbai (ap-south-1)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "s3.ap-south-1.amazonaws.com")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", accessKeyId)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", secretAccessKey)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

val df = spark.read.csv("s3a://bucket_path/csv_name.csv")

When I run it, I get the following error:

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 8C7D34A38E359FCE, AWS Error Code: null, AWS Error Message: Bad Request
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
    at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
    at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.sql.execution.datasources.DataSource$$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:616)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:355)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:349)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:533)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:412)

Recommended answer

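Your Signature Version 4 (V4) option is not being applied. com.amazonaws.services.s3.enableV4 is a JVM system property read by the AWS SDK, so setting it on hadoopConfiguration has no effect. The Mumbai region (ap-south-1) accepts only V4-signed requests, which is why S3 answers with 400 Bad Request.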

Add the Java option for both the driver and the executors when you run spark-submit or spark-shell:

spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
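To confirm that the option actually reached a JVM, a small check along these lines can be run or logged from the driver. This is only a sketch: the object name `SigV4Check` and its helpers are hypothetical and not part of Spark or the AWS SDK. It reports the raw property value and does not try to reproduce how the SDK interprets it.

```scala
// Hypothetical diagnostic helper (not part of Spark or the AWS SDK).
object SigV4Check {
  // The JVM system property named in the answer above.
  val EnableV4Property = "com.amazonaws.services.s3.enableV4"

  // Returns the raw value of the property in the current JVM, if any.
  def currentValue(): Option[String] = sys.props.get(EnableV4Property)

  // Human-readable summary suitable for a log line.
  def describe(): String = currentValue() match {
    case Some(v) => s"$EnableV4Property is set to '$v' in this JVM"
    case None    => s"$EnableV4Property is NOT set in this JVM"
  }

  def main(args: Array[String]): Unit = println(describe())
}
```

If the summary reports that the property is not set, the `extraJavaOptions` values were probably not picked up by that JVM.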

Alternatively, set the system property in code, before the S3A file system is first used:

System.setProperty("com.amazonaws.services.s3.enableV4", "true");
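Putting the pieces together, a minimal sketch of a driver program might look like the following. Several things here are assumptions rather than part of the original question: the object name `ReadCsvFromS3`, the application name, and reading the credentials from the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. Note that `System.setProperty` only affects the driver JVM, so the executor option is still passed through `spark.executor.extraJavaOptions`. In client mode the driver JVM has already started by the time this code runs, which is why the driver side uses `System.setProperty` instead of `spark.driver.extraJavaOptions`.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical application object; names are for illustration only.
object ReadCsvFromS3 {
  def main(args: Array[String]): Unit = {
    // Assumption: credentials are provided through environment variables.
    val accessKeyId = sys.env("AWS_ACCESS_KEY_ID")
    val secretAccessKey = sys.env("AWS_SECRET_ACCESS_KEY")

    // Driver JVM: enable Signature V4 before any S3A file system is created.
    System.setProperty("com.amazonaws.services.s3.enableV4", "true")

    val spark = SparkSession.builder()
      .appName("ReadCsvFromS3")
      // Executor JVMs: pass the same flag as a Java option.
      .config("spark.executor.extraJavaOptions",
        "-Dcom.amazonaws.services.s3.enableV4=true")
      .getOrCreate()

    val hadoopConf = spark.sparkContext.hadoopConfiguration
    // Regional endpoint for Mumbai (ap-south-1).
    hadoopConf.set("fs.s3a.endpoint", "s3.ap-south-1.amazonaws.com")
    hadoopConf.set("fs.s3a.access.key", accessKeyId)
    hadoopConf.set("fs.s3a.secret.key", secretAccessKey)
    hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

    // Bucket path and file name are the placeholders from the question.
    val df = spark.read.csv("s3a://bucket_path/csv_name.csv")
    df.show(5)

    spark.stop()
  }
}
```

The ordering matters: the property has to be set before the first `s3a://` path is resolved, because the trace above shows the failure happening while `S3AFileSystem.initialize` checks the bucket.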

Published: 2023-04-18 21:02:08
Article link: https://www.elefans.com/category/jswz/34/947882.html