本文介绍了使用Window函数将前一行与当前行相加的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
限时送ChatGPT账号..我有一个 spark 数据框,我想根据当前行的金额值和基于 groupid 和 id 的金额值的上一行总和来计算运行总计.让我放出df
I have a spark dataframe where, I want to calculate a running total based on current row Amount value and Previous row sum of Amount value based on groupid and id. Let me put out the df
import findspark
findspark.init()
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
import pandas as pd
sc = spark.sparkContext
data1 = {'date': {0: '2018-04-03', 1: '2018-04-04', 2: '2018-04-05', 3: '2018-04-06', 4: '2018-04-07'},
'id': {0: 'id1', 1: 'id2', 2: 'id1', 3: 'id3', 4: 'id2'},
'group': {0: '1', 1: '1', 2: '1', 3: '2', 4: '1'},
'amount': {0: 50, 1: 40, 2: 50, 3: 55, 4: 20}}
df1_pd = pd.DataFrame(data1, columns=data1.keys())
df1 = spark.createDataFrame(df1_pd)
df1.show()
+----------+---+-----+------+
| date| id|group|amount|
+----------+---+-----+------+
|2018-04-03|id1| 1| 50|
|2018-04-04|id2| 1| 40|
|2018-04-05|id1| 1| 50|
|2018-04-06|id3| 2| 55|
|2018-04-07|id2| 1| 20|
+----------+---+-----+------+
我正在寻找的输出
+----------+---+-----+------+---+
| date| id|group|amount|sum|
+----------+---+-----+------+---+
|2018-04-03|id1| 1| 50|50 |
|2018-04-04|id2| 1| 40|90 |
|2018-04-05|id1| 1| 50|140|
|2018-04-06|id3| 2| 55|55 |
|2018-04-07|id2| 1| 20|160|
+----------+---+-----+------+---+
推荐答案
窗口定义:
from pyspark.sql.window import Window
from pyspark.sql.functions import sum
w = Window.partitionBy("group").orderBy("date").rowsBetween(
Window.unboundedPreceding, # Take all rows from the beginning of frame
Window.currentRow # To current row
)
总和:
(df1.withColumn("sum", sum("amount").over(w))
.orderBy("date") # Sort for easy inspection. Not necessary
.show())
结果:
+----------+---+-----+------+---+
| date| id|group|amount|sum|
+----------+---+-----+------+---+
|2018-04-03|id1| 1| 50| 50|
|2018-04-04|id2| 1| 40| 90|
|2018-04-05|id1| 1| 50|140|
|2018-04-06|id3| 2| 55| 55|
|2018-04-07|id2| 1| 20|160|
+----------+---+-----+------+---+
这篇关于使用Window函数将前一行与当前行相加的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
更多推荐
[db:关键词]
发布评论