根据列值联接

编程入门 行业动态 更新时间:2024-10-26 11:17:18
本文介绍了根据列值联接的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

我正在使用spark-sql-2.4.1v 如何进行各种连接取决于column的值

I am using spark-sql-2.4.1v how to do various joins depend on the value of column

样本数据

val data = List( ("20", "score", "school", 14 ,12), ("21", "score", "school", 13 , 13), ("22", "rate", "school", 11 ,14) ) val df = data.toDF("id", "code", "entity", "value1","value2") +---+-----+------+------+------+ | id| code|entity|value1|value2| +---+-----+------+------+------+ | 20|score|school| 14| 12| | 21|score|school| 13| 13| | 22| rate|school| 11| 14| | 21| rate|school| 13| 12|

基于代码"列值,我需要与其他各种表联接

based the "code" column value i need to do join with various other tables

val rateDs = // val data1= List( ("22", 11 ,A), ("22", 14 ,B), ("20", 13 ,C), ("21", 12 ,C), ("21", 13 ,D) )

val df = data1.toDF("id","map_code","map_val")

val df = data1.toDF("id", "map_code","map_val")

val scoreDs = // scoreTable

如果代码"列值是"rate".我需要加入rateDs 如果代码"是列值是分数"我需要加入scoreDs

if the "code" column value is "rate" i need to join with rateDs if the "code" column value is "score" i need to join with scoreDs

如何处理这类事情?有什么最佳方法可以实现这一目标?

how to handle these kind of things in spark ? any optimum way to achieve this?

预期的费率"结果字段

+---+-----+------+------+------+ | id| code|entity|value1|value2| +---+-----+------+------+------+ | 22| rate|school| A| B | | 21| rate|school| D| C |

推荐答案

例如,您可以简单地加入两次

You can simply join twice, for example

val data = List( ("20", "score", "school", 14 , 12), ("21", "score", "school", 13 , 13), ("22", "rate", "school", 11 , 14), ("21", "rate", "school", 13 , 12) ) val df = data.toDF("id", "code", "entity", "value1","value2") val data1 = List( ("22", 11 ,"A"), ("22", 14 ,"B"), ("20", 13 ,"C"), ("21", 12 ,"C"), ("21", 13 ,"D") ) val rateDF = data1.toDF("id", "map_code","map_val") df.as("a") .join(rateDF.as("b"), col("a.code") === lit("rate") && col("a.id") === col("b.id") && col("a.value1") === col("b.map_code"), "inner") .join(rateDF.as("c"), col("a.code") === lit("rate") && col("a.id") === col("c.id") && col("a.value2") === col("c.map_code"), "inner") .select(col("a.id"), col("a.code"), col("a.entity"), col("b.map_val").as("value1"), col("c.map_val").as("value2")) .show(false) +---+----+------+------+------+ |id |code|entity|value1|value2| +---+----+------+------+------+ |22 |rate|school|A |B | |21 |rate|school|D |C | +---+----+------+------+------+

好吧,这看起来有点脏,但是我不知道多列...

Well, this looks a bit dirty, but I have no idea for the multiple columns...

更多推荐

根据列值联接

本文发布于:2023-05-29 23:04:29,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/355018.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!