Aggregating a Spark data frame using an array of column names while retaining the names
Problem description
I would like to aggregate a Spark data frame using an array of column names as input, and at the same time retain the original names of the columns.
    df.groupBy($"id").sum(colNames: _*)

This works but fails to preserve the names. Inspired by the answer found here, I unsuccessfully tried this:
    df.groupBy($"id").agg(sum(colNames: _*).alias(colNames: _*))
    error: no `: _*' annotation allowed here

It does work for a single element, for example:
    df.groupBy($"id").agg(sum(colNames(2)).alias(colNames(2)))

How can I make this happen for the entire array?
Recommended answer
Just provide a sequence of columns with aliases:
    val colNames: Seq[String] = ???
    val exprs = colNames.map(c => sum(c).alias(c))
    df.groupBy($"id").agg(exprs.head, exprs.tail: _*)
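
The head/tail split is needed because `agg` on grouped data is declared as `agg(expr: Column, exprs: Column*)`: one required column followed by varargs, so a whole sequence cannot be passed with a single `: _*`. A minimal end-to-end sketch of the answer follows. The object name `AggregateKeepNames`, the helper `sumKeepingNames`, the sample rows, and the column names `a` and `b` are assumptions made up for illustration and are not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum

object AggregateKeepNames {

  // Hypothetical helper: sums every column in colNames per group and
  // aliases each sum back to the original column name.
  // Assumes colNames is non-empty, because exprs.head throws on an empty Seq.
  def sumKeepingNames(df: DataFrame, key: String, colNames: Seq[String]): DataFrame = {
    val exprs = colNames.map(c => sum(c).alias(c))
    // agg takes one Column plus varargs, hence head and tail
    df.groupBy(key).agg(exprs.head, exprs.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("agg-keep-names")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data only
    val df = Seq((1, 10, 100), (1, 20, 200), (2, 30, 300)).toDF("id", "a", "b")

    val result = sumKeepingNames(df, "id", Seq("a", "b"))
    result.printSchema() // the aggregated columns keep the names a and b
    result.show()

    spark.stop()
  }
}
```

If `colNames` may be empty, the caller would need to guard against that before calling `exprs.head`, for example by returning the distinct keys instead.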