Unable to select the top 10 records per group in Spark SQL
Problem Description
Hi, I am new to Spark SQL. I have a data frame like this:
+-----+----------+-------+-----+------+----+
|tagid| timestamp|listner|orgid|org2id|RSSI|
+-----+----------+-------+-----+------+----+
|    4|1496745912|    362|    4|     3|0.60|
|    4|1496745924|   1901|    4|     3|0.60|
|    4|1496746030|   1901|    4|     3|0.60|
|    4|1496746110|    718|    4|     3|0.30|
|    2|1496746128|    718|    4|     3|0.60|
|    2|1496746188|   1901|    4|     3|0.10|
+-----+----------+-------+-----+------+----+
I want to select for each listner top 10 timestamp values in spark sql.
I tried the following queries. The first throws an error, and the second returns only 10 records in total:
val avg = sqlContext.sql("select top 10 * from avg_table") // throws error
val avg = sqlContext.sql("select rssi,timestamp,tagid from avg_table order by desc limit 10") // it prints only 10 records
For each listner, I need to take the top 10 timestamp values. Any help would be appreciated.
Answer

Doesn't this work?
select rssi, timestamp, tagid
from avg_table
order by timestamp desc
limit 10;

Oh, I see. You want row_number():
select rssi, timestamp, tagid
from (select a.*,
             row_number() over (partition by listner order by timestamp desc) as seqnum
      from avg_table
     ) a
where seqnum <= 10
order by a.timestamp desc;
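The per-group semantics of `row_number() over (partition by ... order by ...)` can be sketched in plain Python. This is an illustrative sketch, not Spark code; the sample rows are hypothetical values mirroring the data frame above, and `top_n_per_group` is a helper name invented for this example:

```python
from itertools import groupby

# Hypothetical sample rows mirroring the frame above: (listner, timestamp, rssi)
rows = [
    (362,  1496745912, 0.60),
    (1901, 1496745924, 0.60),
    (1901, 1496746030, 0.60),
    (718,  1496746110, 0.30),
    (718,  1496746128, 0.60),
    (1901, 1496746188, 0.10),
]

def top_n_per_group(rows, n):
    # Partition by listner and order each partition by timestamp descending,
    # then keep the first n rows of each partition -- the same filter as
    # row_number() over (partition by listner order by timestamp desc) <= n.
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
    result = []
    for _listner, group in groupby(ordered, key=lambda r: r[0]):
        result.extend(list(group)[:n])
    return result

top2 = top_n_per_group(rows, 2)  # at most 2 newest timestamps per listner
```

Note that a plain `order by ... limit 10` (the asker's second attempt) applies one global limit across all rows, while the windowed `seqnum <= 10` filter limits each listner's partition independently.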