How to use collect_set and collect_list in window aggregation in Spark 1.6

Updated: 2024-10-19 12:41:51
Question

In Spark 1.6.0 with Scala, is it possible to evaluate collect_list("colC") or collect_set("colC") over a window, i.e. .over(Window.partitionBy("colA").orderBy("colB"))?

Recommended answer

Given a DataFrame such as

+----+----+----+
|colA|colB|colC|
+----+----+----+
|1   |1   |23  |
|1   |2   |63  |
|1   |3   |31  |
|2   |1   |32  |
|2   |2   |56  |
+----+----+----+

you can apply the aggregate as a window function as follows:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._

df.withColumn("colD", collect_list("colC").over(Window.partitionBy("colA").orderBy("colB"))).show(false)

Result:

+----+----+----+------------+
|colA|colB|colC|colD        |
+----+----+----+------------+
|1   |1   |23  |[23]        |
|1   |2   |63  |[23, 63]    |
|1   |3   |31  |[23, 63, 31]|
|2   |1   |32  |[32]        |
|2   |2   |56  |[32, 56]    |
+----+----+----+------------+
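The growing lists follow from the window frame: when a window has an orderBy and no explicit frame, Spark SQL defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so each row sees only the rows of its partition up to and including itself. The following is a minimal plain-Scala sketch of that behaviour (no Spark involved); the names InputRow, RunningCollectList and runningCollectList are hypothetical and exist only for this illustration.

```scala
// Plain-Scala emulation (not Spark code) of
// collect_list("colC").over(Window.partitionBy("colA").orderBy("colB")).
// InputRow and runningCollectList are hypothetical names for this sketch.
final case class InputRow(colA: Int, colB: Int, colC: Int)

object RunningCollectList {
  val rows: Seq[InputRow] = Seq(
    InputRow(1, 1, 23), InputRow(1, 2, 63), InputRow(1, 3, 31),
    InputRow(2, 1, 32), InputRow(2, 2, 56)
  )

  // For each row, collect colC from every row in the same partition (colA)
  // whose colB is <= the current row's colB, ordered by colB.
  def runningCollectList(data: Seq[InputRow]): Seq[(InputRow, Seq[Int])] =
    data.map { current =>
      val frame = data
        .filter(r => r.colA == current.colA && r.colB <= current.colB)
        .sortBy(_.colB)
      current -> frame.map(_.colC)
    }

  def main(args: Array[String]): Unit =
    runningCollectList(rows).foreach { case (r, colD) =>
      println(s"${r.colA} ${r.colB} ${r.colC} ${colD.mkString("[", ", ", "]")}")
    }
}
```

On the sample data this reproduces the colD column shown above, which suggests the result is a property of the frame rather than of collect_list itself.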

The result for collect_set is similar, but the order of elements within each set is not guaranteed, unlike with collect_list:

df.withColumn("colD", collect_set("colC").over(Window.partitionBy("colA").orderBy("colB"))).show(false)

+----+----+----+------------+
|colA|colB|colC|colD        |
+----+----+----+------------+
|1   |1   |23  |[23]        |
|1   |2   |63  |[63, 23]    |
|1   |3   |31  |[63, 31, 23]|
|2   |1   |32  |[32]        |
|2   |2   |56  |[56, 32]    |
+----+----+----+------------+
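Because a set carries no ordering guarantee, comparing or displaying collect_set output is more predictable after sorting. Here is a small plain-Scala sketch (again not Spark code) for the colA = 1 partition; RunningCollectSet is a hypothetical name. In Spark itself, wrapping the windowed expression in sort_array from org.apache.spark.sql.functions may give the same effect, though that combination is an untested assumption for 1.6.

```scala
// Plain-Scala illustration of a running "collect_set" and why sorting helps.
// RunningCollectSet is a hypothetical name used only for this sketch.
object RunningCollectSet {
  // colC values of partition colA = 1, already ordered by colB.
  val colC: Seq[Int] = Seq(23, 63, 31)

  // One set of distinct values per row; scanLeft also emits the empty
  // starting set, which .tail drops.
  val running: Seq[Set[Int]] = colC.scanLeft(Set.empty[Int])(_ + _).tail

  // Sorting each set gives a deterministic sequence for display or comparison.
  val sorted: Seq[Seq[Int]] = running.map(_.toSeq.sorted)

  def main(args: Array[String]): Unit =
    sorted.foreach(s => println(s.mkString("[", ", ", "]")))
}
```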

If you remove orderBy, as below:

df.withColumn("colD", collect_list("colC").over(Window.partitionBy("colA"))).show(false)

the result should be:

+----+----+----+------------+
|colA|colB|colC|colD        |
+----+----+----+------------+
|1   |1   |23  |[23, 63, 31]|
|1   |2   |63  |[23, 63, 31]|
|1   |3   |31  |[23, 63, 31]|
|2   |1   |32  |[32, 56]    |
|2   |2   |56  |[32, 56]    |
+----+----+----+------------+
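Without orderBy the frame is the whole partition, so every row receives the same list. A plain-Scala sketch of that case (not Spark code; PartitionRow and wholePartitionCollectList are hypothetical names) could look like this. Note that without an ordering, Spark makes no promise about the order of elements inside the list; the sketch simply keeps input order.

```scala
// Plain-Scala emulation (not Spark code) of
// collect_list("colC").over(Window.partitionBy("colA")).
// PartitionRow and wholePartitionCollectList are hypothetical names.
final case class PartitionRow(colA: Int, colB: Int, colC: Int)

object WholePartitionCollectList {
  val rows: Seq[PartitionRow] = Seq(
    PartitionRow(1, 1, 23), PartitionRow(1, 2, 63), PartitionRow(1, 3, 31),
    PartitionRow(2, 1, 32), PartitionRow(2, 2, 56)
  )

  // Every row gets all colC values of its partition (colA), in input order.
  def wholePartitionCollectList(data: Seq[PartitionRow]): Seq[(PartitionRow, Seq[Int])] = {
    val byKey: Map[Int, Seq[Int]] =
      data.groupBy(_.colA).map { case (k, rs) => k -> rs.map(_.colC) }
    data.map(r => r -> byKey(r.colA))
  }

  def main(args: Array[String]): Unit =
    wholePartitionCollectList(rows).foreach { case (r, colD) =>
      println(s"${r.colA} ${r.colB} ${r.colC} ${colD.mkString("[", ", ", "]")}")
    }
}
```

If the full list is needed on every row in a defined order, one option that should work is to keep orderBy("colB") and add an explicit frame with rowsBetween(Long.MinValue, Long.MaxValue) on the window specification; treat that as an assumption to verify against your Spark version.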

I hope this answer helps.

Published: 2023-11-22 05:46:41
Article link: https://www.elefans.com/category/jswz/34/1616196.html