I have a Spark DataFrame that looks like:
| id | value | bin |
|----|-------|-----|
| 1  | 3.4   | 2   |
| 2  | 2.6   | 1   |
| 3  | 1.8   | 1   |
| 4  | 9.6   | 2   |
I have a function f that takes an array of values and returns a number. I want to add a column to the above DataFrame where the value of the new column in each row is f applied to all the value entries that share the same bin entry, i.e.:
| id | value | bin | f_value       |
|----|-------|-----|---------------|
| 1  | 3.4   | 2   | f([3.4, 9.6]) |
| 2  | 2.6   | 1   | f([2.6, 1.8]) |
| 3  | 1.8   | 1   | f([2.6, 1.8]) |
| 4  | 9.6   | 2   | f([3.4, 9.6]) |
Since I need to aggregate all values per bin, I cannot use the withColumn function to add this new column. What is the best way to do this until user-defined aggregation functions make their way into Spark?
Answer
The code below is not tested, but it gives the idea.
In Hive, this can be done using the collect_list function:
```scala
val newDF = sqlContext.sql(
  "select bin, collect_list(value) as values from aboveDF group by bin")
```

Then join `aboveDF` and `newDF` on `bin` and you are done.
Is this what you were looking for?
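A fuller, self-contained sketch of the collect_list-plus-join approach, using the DataFrame API of more recent Spark versions. It is untested against your setup; `f` is stood in by a summing UDF purely for illustration, and the names `FPerBin` and `perBin` are made up:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, udf}

object FPerBin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("f-per-bin")
      .getOrCreate()
    import spark.implicits._

    val aboveDF = Seq((1, 3.4, 2), (2, 2.6, 1), (3, 1.8, 1), (4, 9.6, 2))
      .toDF("id", "value", "bin")

    // Hypothetical stand-in for f: sums the collected values.
    val f = udf((xs: Seq[Double]) => xs.sum)

    // Aggregate the values per bin, apply f, then join back onto the original rows.
    val perBin = aboveDF.groupBy("bin").agg(f(collect_list($"value")).as("f_value"))
    val result = aboveDF.join(perBin, "bin").select("id", "value", "bin", "f_value")
    result.show()
    spark.stop()
  }
}
```

The join duplicates the per-bin result onto every row of that bin, which is exactly the shape your target table has.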