The use case is to read a file and create a dataframe on top of it, then get the schema of that file and store it in a DB table.
For the purposes of this example I am just creating a case class and calling printSchema, but I am unable to create a dataframe out of the schema.
Here is the sample code:
    import org.apache.spark.sql.SparkSession

    case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)

    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
      .config("spark.master", "local")
      .getOrCreate()
    import spark.implicits._

    val EmployeesData = Seq(Employee("Anto", 21, "Software Engineer", 2000, 56798))
    val Employee_DataFrame = EmployeesData.toDF
    val dfschema = Employee_DataFrame.schema
Now dfschema is a StructType and I want to convert it into a dataframe with two columns. How can I achieve that?
Recommended answer
Spark >= 2.4.0
To save the schema as a string you can use the toDDL method of StructType. In your case the DDL should look like this (exact quoting and spacing can differ between Spark versions):
    `Name` STRING, `Age` INT, `Designation` STRING, `Salary` INT, `ZipCode` INT
After saving the schema you can load it from the database and pass it to StructType.fromDDL(my_schema). This returns an instance of StructType, which you can use to create the new dataframe with spark.createDataFrame, as @Ajay already mentioned.
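The save and load steps described above can be sketched as follows. This is only a minimal sketch: the object and method names (SchemaDdlRoundTrip, schemaToDdl, ddlToSchema, buildDataFrame) are made up for illustration, and reading or writing the DDL string to your database is left out because the table layout is not given in the question.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.StructType

// Illustrative helper object; these names are not part of Spark.
object SchemaDdlRoundTrip {
  // Serialize a schema to the DDL string that would be stored in the DB table.
  def schemaToDdl(schema: StructType): String = schema.toDDL

  // Rebuild the StructType from a previously stored DDL string.
  def ddlToSchema(ddl: String): StructType = StructType.fromDDL(ddl)

  // Create a dataframe from plain rows using the restored schema.
  def buildDataFrame(spark: SparkSession, ddl: String, rows: Seq[Row]): DataFrame =
    spark.createDataFrame(spark.sparkContext.parallelize(rows), ddlToSchema(ddl))
}
```

With the question's data this could be called as SchemaDdlRoundTrip.buildDataFrame(spark, storedDdl, Seq(Row("Anto", 21, "Software Engineer", 2000, 56798))), where storedDdl stands for the string read back from the table.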
It is also useful to remember that you can always extract the schema from a case class with:
    import org.apache.spark.sql.catalyst.ScalaReflection
    import org.apache.spark.sql.types.StructType

    val empSchema = ScalaReflection.schemaFor[Employee].dataType.asInstanceOf[StructType]
You can then get the DDL representation with empSchema.toDDL.
Spark < 2.4
For Spark < 2.4 use DataType.fromDDL and schema.simpleString instead. Also, rather than a StructType, work with a DataType instance and omit the cast to StructType, as follows:
    val empSchema = ScalaReflection.schemaFor[Employee].dataType
Sample output for empSchema.simpleString:
    struct<Name:string,Age:int,Designation:string,Salary:int,ZipCode:int>
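If what you literally need is the two-column dataframe from the question (one row per field), one possible approach is to map over the fields of the StructType. The sketch below assumes the columns should be called column_name and data_type; those names, as well as the SchemaAsDataFrame object and its methods, are hypothetical and not taken from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

// Illustrative helper object; these names are not part of Spark.
object SchemaAsDataFrame {
  // Turn each field of the schema into a (column name, data type) pair.
  def schemaRows(schema: StructType): Seq[(String, String)] =
    schema.fields.toSeq.map(f => (f.name, f.dataType.simpleString))

  // Build a dataframe with one row per field; column names are an assumption.
  def schemaToDataFrame(spark: SparkSession, schema: StructType): DataFrame = {
    import spark.implicits._
    schemaRows(schema).toDF("column_name", "data_type")
  }
}
```

For example, SchemaAsDataFrame.schemaToDataFrame(spark, dfschema).show() would list Name, Age and the other fields next to their types. The resulting dataframe could then be written to the metadata table with the usual DataFrameWriter API, using whatever connection details apply to your database.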