Top-n per group in Spark

This article describes how to get the top-n rows per group in Spark SQL; it may be a useful reference for anyone facing the same problem.

Problem description

How can I get the top-n (let's say top 10 or top 3) per group in spark-sql?

www.xaprb/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ provides a tutorial for general SQL; however, Spark does not implement subqueries in the WHERE clause.

Solution

You can use the window function feature that was added in Spark 1.4. Suppose that we have a productRevenue table as shown below.
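The table listing from the original answer is not reproduced here. As a minimal sketch, assuming the table has the columns product, category, and revenue, a small test view with made-up rows could be registered like this (the inline VALUES syntax requires Spark 2.0 or later; on Spark 1.4 the view would have to be built from a DataFrame instead):

-- Hypothetical sample data for illustration only.
CREATE OR REPLACE TEMPORARY VIEW productRevenue AS
SELECT * FROM VALUES
  ('Thin',       'Cell Phone', 6000),
  ('Normal',     'Tablet',     1500),
  ('Mini',       'Tablet',     5500),
  ('Ultra thin', 'Cell Phone', 5000),
  ('Very thin',  'Cell Phone', 6000),
  ('Big',        'Tablet',     2500),
  ('Bendable',   'Cell Phone', 3000),
  ('Foldable',   'Cell Phone', 3000),
  ('Pro',        'Tablet',     4500),
  ('Pro2',       'Tablet',     6500)
AS productRevenue(product, category, revenue);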

The answer to "What are the best-selling and the second best-selling products in every category?" is as follows:

SELECT product, category, revenue
FROM (
  SELECT product, category, revenue,
         dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) AS rank
  FROM productRevenue
) tmp
WHERE rank <= 2

This will give you the desired result.
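To generalize to top-n, only the threshold in the WHERE clause changes. Note that dense_rank() keeps every tied row, so a category can return more than n rows when revenues tie; if exactly n rows per group are needed, row_number() is the usual choice. A hedged top-3 sketch against the same assumed productRevenue view:

-- Top 3 products per category. row_number() assigns a unique rank within each
-- category, so ties are broken arbitrarily unless another column is added to ORDER BY.
SELECT product, category, revenue
FROM (
  SELECT product, category, revenue,
         row_number() OVER (PARTITION BY category ORDER BY revenue DESC) AS rn
  FROM productRevenue
) tmp
WHERE rn <= 3;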
