Joining two DataFrames in Spark with Java

Updated: 2024-10-12 22:33:50

Question

First of all, thank you for taking the time to read my question.

My question is the following: in Spark with Java, I load the data from two CSV files into two DataFrames.

These DataFrames contain the following information.

DataFrame Airport

Id | Name    | City
-----------------------
1  | Barajas | Madrid

DataFrame airport_city_state

City   | state
----------------
Madrid | España

I want to join these two DataFrames so that the result looks like this:

Result DataFrame

Id | Name    | City   | state
--------------------------
1  | Barajas | Madrid | España

where dfairport.city = dfairport_city_state.city

But I cannot work out the syntax needed to do the join correctly. Here is a little code showing how I created the variables:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Load the CSV files; the loader has to be told that they have a header and which delimiter they use
    Dataset<Row> dfairport = Load.Csv(sqlContext, data_airport);
    Dataset<Row> dfairport_city_state = Load.Csv(sqlContext, data_airport_city_state);

    // Change the names of the columns in the CSV DataFrames to match the columns in the database.
    // Once the names match, we can insert them.
    // Note: withColumnRenamed returns a new Dataset. The results are not assigned here,
    // so dfairport and dfairport_city_state keep their original column names.
    dfairport
        .withColumnRenamed("leg_key", "id")
        .withColumnRenamed("leg_name", "name")
        .withColumnRenamed("leg_city", "city");

    dfairport_city_state
        .withColumnRenamed("city", "ciudad")
        .withColumnRenamed("state", "estado");

Recommended answer

First, thank you very much for your response.

I have tried both solutions, but neither of them works. I get the following error: The method dfairport_city_state(String) is undefined for the type ETL_Airport

I cannot access a specific column of the DataFrame for the join.

Edit: I have now managed to do the join. I am posting the solution here in case it helps someone else ;)

Thanks for everything, and best regards.

    // Join the tables on the city they share
    Dataset<Row> joined = dfairport.join(
        dfairport_city_state,
        dfairport.col("leg_city").equalTo(dfairport_city_state.col("city")));
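The join above is an inner equi-join on the city column. To check which rows it should produce without a Spark installation, here is a minimal plain-Java sketch of the same semantics applied to the sample rows from the question. It is not Spark code: the class name `JoinSketch`, the method `innerJoinOnCity`, and the use of string arrays as rows are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Spark or of the original code.
class JoinSketch {

    // Inner equi-join on city.
    // airports rows are {id, name, city}; cityStates rows are {city, state}.
    static List<String[]> innerJoinOnCity(List<String[]> airports, List<String[]> cityStates) {
        // Index the right-hand side by city so each airport is matched by lookup.
        Map<String, List<String>> statesByCity = new HashMap<>();
        for (String[] cs : cityStates) {
            statesByCity.computeIfAbsent(cs[0], k -> new ArrayList<>()).add(cs[1]);
        }

        List<String[]> joined = new ArrayList<>();
        for (String[] a : airports) {
            // An airport whose city has no match produces no row (inner join).
            List<String> states = statesByCity.getOrDefault(a[2], Collections.<String>emptyList());
            for (String state : states) {
                joined.add(new String[] {a[0], a[1], a[2], state});
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> airports = new ArrayList<>();
        airports.add(new String[] {"1", "Barajas", "Madrid"});

        List<String[]> cityStates = new ArrayList<>();
        // \u00f1 is the n-with-tilde from the question's sample value, escaped to keep the source ASCII.
        cityStates.add(new String[] {"Madrid", "Espa\u00f1a"});

        for (String[] row : innerJoinOnCity(airports, cityStates)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

This sketch emits the city only once per row. Spark's `join` with a column expression, as used in the answer, keeps the key column from both sides (`leg_city` and `city`), so an additional `select` or `drop` would be needed to obtain exactly the four columns shown in the question.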


Published: 2023-06-11 03:32:23
Article link: https://www.elefans.com/category/jswz/34/625761.html