Aggregating a Spark DataFrame using an array of column names while retaining the names

Updated: 2024-10-09 18:19:23

Problem description

I would like to aggregate a Spark DataFrame using an array of column names as input, and at the same time retain the original names of the columns.

df.groupBy($"id").sum(colNames:_*)

This works but fails to preserve the names. Inspired by the answer found here, I tried the following without success:

df.groupBy($"id").agg(sum(colNames:_*).alias(colNames:_*))
error: no `: _*' annotation allowed here

It does work for a single element, like

df.groupBy($"id").agg(sum(colNames(2)).alias(colNames(2)))

How can I make this happen for the entire array?

Recommended answer

Just provide a sequence of columns with aliases:

val colNames: Seq[String] = ???
val exprs = colNames.map(c => sum(c).alias(c))
df.groupBy($"id").agg(exprs.head, exprs.tail: _*)
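For anyone who wants to try this end to end, here is a minimal sketch of the same idea as a small standalone program. The object name `AggKeepNames`, the helper `sumKeepingNames`, the local session settings, and the sample data are all assumptions made for illustration; they are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object AggKeepNames {
  // Hypothetical helper: sums every column in colNames per group of `key`,
  // aliasing each sum back to its original column name.
  def sumKeepingNames(df: DataFrame, key: String, colNames: Seq[String]): DataFrame = {
    // exprs.head below would throw on an empty sequence, so fail early.
    require(colNames.nonEmpty, "colNames must contain at least one column")
    val exprs = colNames.map(c => sum(c).alias(c))
    // agg is declared as agg(expr: Column, exprs: Column*), so the first
    // expression is passed on its own and the rest are spread with `: _*`.
    df.groupBy(col(key)).agg(exprs.head, exprs.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("agg-keep-names")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: an id column plus two numeric columns.
    val df = Seq((1, 10, 1.5), (1, 20, 2.5), (2, 30, 3.5)).toDF("id", "a", "b")

    val result = sumKeepingNames(df, "id", Seq("a", "b"))
    result.orderBy(col("id")).show()

    spark.stop()
  }
}
```

The attempt in the question fails because `sum` and `alias` each take a single argument, and Scala only allows the `: _*` annotation on an argument passed to a repeated (varargs) parameter. Mapping over the names builds one aliased `Column` per name, and splitting into head and tail matches the shape of `agg`.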

Published: 2023-11-22 08:00:38