Spark exception: Task failed while writing rows

Problem description

I am reading text files and converting them to Parquet files using Spark. But when I try to run the code, I get the following exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (TID 9, ukfhpdbivp12.uk.experian.local): org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:191)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArithmeticException: / by zero
    at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:101)
    at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:94)
    at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:64)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
    at org.apache.spark.sql.parquet.ParquetOutputWriter.<init>(newParquet.scala:83)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anon$4.newInstance(newParquet.scala:229)
    at org.apache.spark.sql.sources.DefaultWriterContainer.initWriters(commands.scala:470)
    at org.apache.spark.sql.sources.BaseWriterContainer.executorSideSetup(commands.scala:360)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:172)
    ... 8 more

I am trying to write the dataframe in the following fashion:

dataframe.write().parquet(Path)

Any help is highly appreciated.
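
For context, a minimal end-to-end version of the flow described in the question might look like the following Java sketch. It targets the modern SparkSession API (the stack trace above is from Spark 1.x, where SQLContext filled this role), and the input/output paths are illustrative, not from the original job. The schema check is included because one known trigger of the "/ by zero" in InternalParquetRecordWriter.initStore is a zero-column DataFrame: the parquet-mr code of that era divides the row-group size by the number of columns in the schema.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TextToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("text-to-parquet")
                .getOrCreate();

        // Each line of the input file becomes a row with a single "value" column.
        Dataset<Row> dataframe = spark.read().text("hdfs:///input/data.txt");

        // Guard against a zero-column schema, which would make the old
        // Parquet writer divide by zero inside initStore.
        if (dataframe.schema().fields().length == 0) {
            throw new IllegalStateException("DataFrame schema is empty; nothing to write");
        }

        dataframe.write().parquet("hdfs:///output/data.parquet");
        spark.stop();
    }
}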

Solution

Another possible reason is that you're hitting S3 request rate limits. If you look closely at your logs, you may see something like this:

AmazonS3Exception: Please reduce your request rate.

While the Spark UI will say:

Task failed while writing rows

I doubt this is the reason you're running into the issue, but it is a possible cause if you're running a highly intensive job, so I'm including it for the sake of completeness.
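
If S3 throttling really were the culprit, one common mitigation is to reduce the number of simultaneous requests the write generates, for example by coalescing to fewer output partitions before writing. The sketch below assumes the same dataframe as in the question; the bucket path and the partition count of 64 are purely illustrative and should be tuned to the job.

// Fewer output partitions means fewer concurrent S3 PUT requests.
dataframe.coalesce(64)
         .write()
         .parquet("s3a://my-bucket/output/");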
