How do I execute a stored procedure in Azure Databricks with PySpark?

Updated: 2024-10-25 12:21:15
Problem description

I am able to execute a simple SQL statement using PySpark in Azure Databricks, but I want to execute a stored procedure instead. Below is the PySpark code I tried.

# initialize pyspark
import findspark
findspark.init(r'C:\Spark\spark-2.4.5-bin-hadoop2.7')

# import required modules
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import *
import pandas as pd

# Create spark configuration object
conf = SparkConf()
conf.setMaster("local").setAppName("My app")

# Create spark context and sparksession
sc = SparkContext.getOrCreate(conf=conf)
spark = SparkSession(sc)

table = "dbo.test"

# read table data into a spark dataframe
jdbcDF = spark.read.format("jdbc") \
    .option("url", "jdbc:sqlserver://localhost:1433;databaseName=Demo;integratedSecurity=true;") \
    .option("dbtable", table) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .load()

# show the data loaded into dataframe
# jdbcDF.show()

sqlQueries = "execute testJoin"
resultDF = spark.sql(sqlQueries)
resultDF.show(resultDF.count(), False)

This doesn't work. How do I do it?

Recommended answer

Running a stored procedure through a JDBC connection from Azure Databricks is not supported as of now. But you have these options:

  • Use the pyodbc library to connect and execute your procedure. Note that by using this library, the code runs on the driver node while all your workers sit idle. See this article for details: datathirst/blog/2018/10/12/executing-sql-server-stored-procedures-on-databricks-pyspark

  • Use a SQL table function rather than a procedure. In a sense, you can use anything that is legal in the FROM clause of a SQL query.

  • Since you are in an Azure environment, combining Azure Data Factory (to execute your procedure) with Azure Databricks can help you build pretty powerful pipelines.
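The Data Factory option typically chains a Stored Procedure activity with a Databricks Notebook activity. A hedged sketch of such a pipeline definition follows; the pipeline, activity, linked-service, and notebook names are all placeholders, only the `SqlServerStoredProcedure` and `DatabricksNotebook` activity types come from ADF itself:

```json
{
  "name": "RunProcThenNotebook",
  "properties": {
    "activities": [
      {
        "name": "ExecuteTestJoin",
        "type": "SqlServerStoredProcedure",
        "typeProperties": { "storedProcedureName": "dbo.testJoin" },
        "linkedServiceName": {
          "referenceName": "SqlServerLinkedService",
          "type": "LinkedServiceReference"
        }
      },
      {
        "name": "ProcessInDatabricks",
        "type": "DatabricksNotebook",
        "dependsOn": [
          { "activity": "ExecuteTestJoin", "dependencyConditions": ["Succeeded"] }
        ],
        "typeProperties": { "notebookPath": "/Shared/process_results" },
        "linkedServiceName": {
          "referenceName": "DatabricksLinkedService",
          "type": "LinkedServiceReference"
        }
      }
    ]
  }
}
```

The dependency condition makes the notebook run only after the procedure succeeds, which is the ordering the stored-procedure-then-Spark workflow needs.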
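The pyodbc option might look like the following sketch. The connection string, procedure name, and authentication settings are illustrative assumptions, and pyodbc plus the Microsoft ODBC driver must be installed on the driver node (e.g. via `%pip install pyodbc`); everything here runs on the driver only.

```python
# Sketch only: this executes on the driver node, not on the workers.
try:
    import pyodbc  # assumed installed on the cluster (%pip install pyodbc)
except ImportError:
    pyodbc = None  # pyodbc is not part of the Databricks runtime by default

def build_exec_sql(proc_name: str, n_params: int = 0) -> str:
    """Build an EXEC statement with '?' placeholders for pyodbc parameters."""
    placeholders = ", ".join(["?"] * n_params)
    return f"EXEC {proc_name} {placeholders}".rstrip()

def run_stored_procedure(conn_str: str, proc_name: str, params=()):
    """Execute a stored procedure and return any result rows it produces."""
    conn = pyodbc.connect(conn_str)
    try:
        cursor = conn.cursor()
        sql = build_exec_sql(proc_name, len(params))
        if params:
            cursor.execute(sql, list(params))
        else:
            cursor.execute(sql)
        # fetch only if the procedure returned a result set
        rows = cursor.fetchall() if cursor.description else []
        conn.commit()
        return rows
    finally:
        conn.close()

# Usage (hypothetical server/database names):
# rows = run_stored_procedure(
#     "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost,1433;"
#     "DATABASE=Demo;Trusted_Connection=yes;",
#     "dbo.testJoin",
# )
```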
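The table-function option exploits the fact that the JDBC source's `dbtable` option accepts any parenthesized subquery, i.e. anything legal in a FROM clause. A minimal sketch, assuming a hypothetical inline table-valued function `dbo.testJoinFn()` wrapping the same logic as the procedure (`spark` is the SparkSession that Databricks provides in every notebook):

```python
def tvf_subquery(sql: str, alias: str = "q") -> str:
    """Wrap an arbitrary SELECT so it can be passed as the JDBC 'dbtable' option."""
    return f"({sql}) AS {alias}"

def read_table_function(spark, jdbc_url: str, sql: str):
    """Load the result of a table-valued function (or any other FROM-clause
    expression) into a DataFrame through the Spark JDBC source."""
    return (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", tvf_subquery(sql))
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load()
    )

# Usage (in a Databricks notebook, where `spark` already exists):
# df = read_table_function(
#     spark,
#     "jdbc:sqlserver://localhost:1433;databaseName=Demo;integratedSecurity=true;",
#     "SELECT * FROM dbo.testJoinFn()",  # hypothetical TVF mirroring testJoin
# )
# df.show()
```

Unlike the pyodbc route, this keeps the read on the Spark executors, so the result comes back as a distributed DataFrame.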


Published: 2023-10-26 09:28:22
Original link: https://www.elefans.com/category/jswz/34/1529765.html