Problem Description
I followed a post on StackOverflow about returning the maximum of a column grouped by another column, and got an unexpected Java exception.
Here is the test data:
import pyspark.sql.functions as f
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # predefined in Databricks notebooks and the pyspark shell

data = [('a', 5), ('a', 8), ('a', 7), ('b', 1), ('b', 3)]
df = spark.createDataFrame(data, ["A", "B"])
df.show()
+---+---+
| A| B|
+---+---+
| a| 5|
| a| 8|
| a| 7|
| b| 1|
| b| 3|
+---+---+
Here is the solution that allegedly works for other users:
from pyspark.sql import Window

# Compute the per-group max of B over a window partitioned by A,
# then keep only the rows where B equals that max.
w = Window.partitionBy('A')
df.withColumn('maxB', f.max('B').over(w))\
  .where(f.col('B') == f.col('maxB'))\
  .drop('maxB').show()
It should produce this output:
+---+---+
| A| B|
+---+---+
| a| 8|
| b| 3|
+---+---+
Instead, I get:
java.lang.UnsupportedOperationException: Cannot evaluate expression: max(input[2, bigint, false]) windowspecdefinition(input[0, string, true], specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$()))
I have only tried this on Spark 2.4 on Databricks. I tried the equivalent SQL syntax and got the same error.
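For reference, here is a minimal sketch of that equivalent SQL form (the temp view name t is illustrative, not from the original post); per the question, the SQL version hits the same error on Spark 2.4:

df.createOrReplaceTempView("t")  # "t" is a hypothetical view name
spark.sql("""
    SELECT A, B
    FROM (
        SELECT A, B, max(B) OVER (PARTITION BY A) AS maxB
        FROM t
    ) windowed
    WHERE B = maxB
""").show()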
Recommended Answer
Databricks Support was able to reproduce the issue on Spark 2.4 but not on earlier versions. Apparently, it arises from a difference in the way the physical plan is formulated (I can post their response if requested). A fix is planned.
Meanwhile, here is one alternative solution to the original problem that does not fall prey to the version 2.4 issue:
df.withColumn("maxB", f.max('B').over(w)).drop('B').distinct().show()
+---+----+
| A|maxB|
+---+----+
| b| 3|
| a| 8|
+---+----+
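If keeping the original column name B matters, a plain groupBy aggregation is another sketch that avoids the window function entirely (standard PySpark API, not part of the Databricks response):

# Aggregate the max of B per group; no window expression is evaluated.
df.groupBy('A').agg(f.max('B').alias('B')).show()

Like the distinct() workaround above, this returns one row per group, and row order is not guaranteed.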