Spark flatMap is giving Iterator error

I am getting an error when I apply a flatMap over a JSONArray to produce JSONObjects. If I run locally (on my laptop) from Eclipse, it runs fine, but when running on the cluster (YARN), it gives a weird error. Spark version: 2.0.0

Code:

JavaRDD<JSONObject> rdd7 = rdd6.flatMap(new FlatMapFunction<JSONArray, JSONObject>() {
    @Override
    public Iterable<JSONObject> call(JSONArray array) throws Exception {
        List<JSONObject> list = new ArrayList<JSONObject>();
        for (int i = 0; i < array.length(); list.add(array.getJSONObject(i++)));
        return list;
    }
});

Error log:

java.lang.AbstractMethodError: com.pwc.spark.tifcretrolookup.TIFCRetroJob$2.call(Ljava/lang/Object;)Ljava/util/Iterator;
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
    at com.pwc.spark.ElasticsearchClientLib.CommonESClient.index(CommonESClient.java:33)
    at com.pwc.spark.ElasticsearchClientLib.ESClient.call(ESClient.java:34)
    at com.pwc.spark.ElasticsearchClientLib.ESClient.call(ESClient.java:15)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:218)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:218)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:883)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:883)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Accepted answer

Since Spark 2.0.0, the function passed to a flatMap call must return an Iterator instead of an Iterable, as the release notes state:

Java RDD’s flatMap and mapPartitions functions used to require functions returning Java Iterable. They have been updated to require functions returning Java Iterator so the functions do not need to materialize all the data.

And here is the relevant Jira issue
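
This also accounts for the AbstractMethodError: the class was compiled against the old Iterable-returning call signature, so the Iterator-returning method that Spark 2.0.0 invokes on the cluster simply does not exist in the compiled class; a local run from Eclipse would not hit it if an older Spark jar is on the local classpath. Below is a minimal sketch of the fix, keeping the rdd6/rdd7 names from the question and assuming org.json's JSONArray/JSONObject (the question does not name the JSON library): declare the return type as Iterator<JSONObject> and return list.iterator() instead of the list itself.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.json.JSONArray;
import org.json.JSONObject;

// In Spark 2.x, FlatMapFunction.call returns java.util.Iterator, not Iterable
JavaRDD<JSONObject> rdd7 = rdd6.flatMap(new FlatMapFunction<JSONArray, JSONObject>() {
    @Override
    public Iterator<JSONObject> call(JSONArray array) throws Exception {
        List<JSONObject> list = new ArrayList<>();
        for (int i = 0; i < array.length(); i++) {
            list.add(array.getJSONObject(i)); // copy each element out of the array
        }
        return list.iterator(); // hand Spark an Iterator so it can stream the data
    }
});

With Java 8 the same thing collapses to a lambda, e.g. rdd6.flatMap(array -> { ...; return list.iterator(); }). Either way, make sure you compile against the same Spark 2.0.0 dependency that runs on the cluster, since a version mismatch between compile and runtime classpaths is exactly what produces an AbstractMethodError like the one above.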
