How to match an RDD[ParentClass] with an RDD[Subclass] in Apache Spark?

Problem Description

I have to match an RDD against its element type.

trait Fruit
case class Apple(price: Int) extends Fruit
case class Mango(price: Int) extends Fruit

Now a stream of type DStream[Fruit] is coming in. Each element is either an Apple or a Mango.

How can I perform an operation based on the subclass? Something like the below (which doesn't work):

dStream.foreachRDD { rdd: RDD[Fruit] =>
  rdd match {
    case rdd: RDD[Apple] => // do something
    case rdd: RDD[Mango] => // do something
    case _ => println(rdd.count() + "<<<< not matched anything")
  }
}

Recommended Answer

Since we have an RDD[Fruit], any row can be either an Apple or a Mango. When using foreachRDD, each RDD will contain a mix of these (and possibly other) types.

To differentiate between the different types, we can use collect[U](f: PartialFunction[T, U]): RDD[U] (not to be confused with collect(): Array[T], which returns an array with all elements of the RDD). By applying the partial function f (here, a pattern match), this method returns an RDD containing only the matching values.
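As a quick aside, here is a minimal non-streaming sketch of the difference between the two collect methods. It assumes a SparkContext named sc and the Fruit classes from the question; the variable names are only for illustration.

// Assumes a SparkContext `sc` and the Fruit / Apple / Mango classes above.
val fruits: RDD[Fruit] = sc.parallelize[Fruit](Seq(Apple(5), Mango(11), Mango(10)))

// collect(): Array[T] -- materializes every element on the driver.
val all: Array[Fruit] = fruits.collect()

// collect(f: PartialFunction[T, U]): RDD[U] -- keeps only the rows the partial
// function is defined for; here, only the Apple rows survive.
val applesOnly: RDD[Apple] = fruits.collect { case a: Apple => a }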

Below follows a small illustrative example (adding Orange to the fruits as well).

Setup:

val ssc = new StreamingContext(spark.sparkContext, Seconds(1))
val inputData: Queue[RDD[Fruit]] = Queue()
val dStream: InputDStream[Fruit] = ssc.queueStream(inputData)

inputData += spark.sparkContext.parallelize(Seq(Apple(5), Apple(5), Mango(11)))
inputData += spark.sparkContext.parallelize(Seq(Mango(10), Orange(1), Orange(3)))

This creates a stream of RDD[Fruit] made up of two separate RDDs.

dStream.foreachRDD { rdd: RDD[Fruit] =>
  val mix = rdd.collect {
    case row: Apple => ("APPLE", row.price) // do any computation on apple rows
    case row: Mango => ("MANGO", row.price) // do any computation on mango rows
    // case _@row => do something with other rows (will be removed by default).
  }
  mix foreach println
}

In the collect above, we change each row slightly (replacing the class with a label) and then print the resulting RDD. Result:

// First RDD
(MANGO,11)
(APPLE,5)
(APPLE,5)

// Second RDD
(MANGO,10)

As can be seen, the pattern match has kept and transformed the rows containing Apple and Mango while removing all Orange rows.
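If the Orange rows should not be dropped, the commented-out catch-all hints at the fix: add a case for them as well. A sketch of the same collect with Orange handled explicitly (inside the same foreachRDD):

val mix = rdd.collect {
  case row: Apple  => ("APPLE", row.price)
  case row: Mango  => ("MANGO", row.price)
  case row: Orange => ("ORANGE", row.price) // handle the remaining subclass instead of dropping it
}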

Separate RDDs

If wanted, it is also possible to separate the two subclasses into their own RDDs as follows. Any computation can then be performed on these two RDDs.

val apple = rdd.collect { case row: Apple => row }
val mango = rdd.collect { case row: Mango => row }
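For instance, a possible follow-up computation on the separated RDDs (just an illustration, not part of the original answer; the aggregations are arbitrary):

val totalApplePrice = apple.map(_.price).sum() // total price of all apples in this RDD
val mangoCount      = mango.count()            // number of mangoes in this RDD
println(s"apples cost $totalApplePrice in total, $mangoCount mangoes seen")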

Complete example code

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.{Seconds, StreamingContext}

import scala.collection.mutable.Queue

trait Fruit
case class Apple(price: Int) extends Fruit
case class Mango(price: Int) extends Fruit
case class Orange(price: Int) extends Fruit

object Test {
  def main(args: Array[String]) {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    val ssc = new StreamingContext(spark.sparkContext, Seconds(1))

    val inputData: Queue[RDD[Fruit]] = Queue()
    val inputStream: InputDStream[Fruit] = ssc.queueStream(inputData)

    inputData += spark.sparkContext.parallelize(Seq(Apple(5), Apple(5), Mango(11)))
    inputData += spark.sparkContext.parallelize(Seq(Mango(10), Orange(1), Orange(3)))

    inputStream.foreachRDD { rdd: RDD[Fruit] =>
      val mix = rdd.collect {
        case row: Apple => ("APPLE", row.price) // do any computation on apple rows
        case row: Mango => ("MANGO", row.price) // do any computation on mango rows
        // case _@row => do something with other rows (will be removed by default).
      }
      mix foreach println
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
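A practical note on the build (an assumption, not part of the original answer): besides spark-core, the example needs the spark-sql module (for SparkSession) and the spark-streaming module (for StreamingContext) on the classpath, e.g. in sbt:

// build.sbt sketch -- the version is a placeholder, use whatever Spark version you target
val sparkVersion = "3.5.1"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"       % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion
)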
