Getting a DataFrame schema and loading it into a metadata table

Updated: 2024-10-25 00:30:37

Question

The use case is to read a file and create a dataframe on top of it, then get the schema of that file and store it in a DB table.

For the purpose of this example I am just creating a case class and getting its schema (the same information printSchema shows), but I am unable to create a dataframe out of that schema.

Here is the sample code:

    import org.apache.spark.sql.SparkSession

    case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)

    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
      .config("spark.master", "local")
      .getOrCreate()

    // Needed for the .toDF conversion on a Seq of case class instances
    import spark.implicits._

    val EmployeesData = Seq(Employee("Anto", 21, "Software Engineer", 2000, 56798))
    val Employee_DataFrame = EmployeesData.toDF
    val dfschema = Employee_DataFrame.schema  // a StructType

Now dfschema is a StructType, and I want to convert it into a dataframe with two columns. How can I achieve that?

Recommended answer

Spark >= 2.4.0

To save the schema in string format, you can use the toDDL method of StructType. In your case the DDL should look like this (exact quoting and spacing can differ between Spark versions):

`Name` STRING, `Age` INT, `Designation` STRING, `Salary` INT, `ZipCode` INT

After saving the schema, you can load it from the database and parse it with StructType.fromDDL(my_schema). This returns a StructType instance, which you can use to create the new dataframe with spark.createDataFrame, as @Ajay already mentioned.
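The round trip described above can be sketched as follows. This is only an illustration under stated assumptions: the metadata "table" is an in-memory dataframe standing in for a real database table, and the names SchemaDdlRoundTrip, toMetadata, loadSchema, source_name and schema_ddl are hypothetical, not part of Spark or of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.StructType

object SchemaDdlRoundTrip {
  // Hypothetical metadata layout: one row per source, schema kept as a DDL string.
  def toMetadata(spark: SparkSession, name: String, df: DataFrame): DataFrame = {
    import spark.implicits._
    Seq((name, df.schema.toDDL)).toDF("source_name", "schema_ddl")
  }

  // Look up the stored DDL string and parse it back into a StructType.
  def loadSchema(spark: SparkSession, metadata: DataFrame, name: String): StructType = {
    import spark.implicits._
    val ddl = metadata
      .filter($"source_name" === name)
      .select("schema_ddl")
      .as[String]
      .head()
    StructType.fromDDL(ddl)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("schema-ddl-round-trip")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val employees = Seq(("Anto", 21, "Software Engineer", 2000, 56798))
      .toDF("Name", "Age", "Designation", "Salary", "ZipCode")

    // In a real job this dataframe would be written to the DB table instead.
    val metadata = toMetadata(spark, "employee", employees)
    val restored = loadSchema(spark, metadata, "employee")

    // Build a new dataframe from plain rows plus the restored schema.
    val rows = java.util.Arrays.asList(Row("Anto", 21, "Software Engineer", 2000, 56798))
    spark.createDataFrame(rows, restored).printSchema()

    spark.stop()
  }
}
```

Keeping the schema as a single DDL string per source means the metadata table needs only a key column and a text column, at the cost of having to parse the string to inspect individual fields.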

It is also useful to remember that you can always extract the schema from a case class with:

    import org.apache.spark.sql.catalyst.ScalaReflection
    import org.apache.spark.sql.types.StructType

    val empSchema = ScalaReflection.schemaFor[Employee].dataType.asInstanceOf[StructType]

You can then get the DDL representation with empSchema.toDDL.

Spark < 2.4

For Spark < 2.4, use DataType.fromDDL and schema.simpleString instead. Also, rather than casting to StructType, work with the DataType instance directly and omit the cast, as follows:

val empSchema = ScalaReflection.schemaFor[Employee].dataType

Sample output for empSchema.simpleString:

struct<Name:string,Age:int,Designation:string,Salary:int,ZipCode:int>
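The question also asks for the schema as a two-column dataframe. The answer above does not cover that directly, but one possible approach that does not depend on the Spark version is to map each field of the StructType to a (name, type) pair. A minimal sketch, where SchemaAsRows, schemaToDataFrame, column_name and data_type are hypothetical names chosen for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

object SchemaAsRows {
  // One row per field: (column name, Spark SQL type name).
  def schemaToDataFrame(spark: SparkSession, schema: StructType): DataFrame = {
    import spark.implicits._
    schema.fields
      .map(f => (f.name, f.dataType.simpleString))
      .toSeq
      .toDF("column_name", "data_type")
  }
}
```

With the question's variables this would be called as SchemaAsRows.schemaToDataFrame(spark, dfschema), and the result could then be written to the metadata table like any other dataframe.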

Published: 2023-10-16 10:32:49
Source: https://www.elefans.com/category/jswz/34/1497317.html