I want to list all the tables in every database in Azure Databricks.
So I want the output to look something like this:
Database  | Table_name
Database1 | Table_1
Database1 | Table_2
Database1 | Table_3
Database2 | Table_1
etc.

This is what I have so far:
    from pyspark.sql.types import *

    DatabaseDF = spark.sql(f"show databases")
    df = spark.sql(f"show Tables FROM {DatabaseDF}")
    #df = df.select("databaseName")
    #list = [x["databaseName"] for x in df.collect()]
    print(DatabaseDF)
    display(DatabaseDF)

    df = spark.sql(f"show Tables FROM {schemaName}")
    df = df.select("TableName")
    list = [x["TableName"] for x in df.collect()]

    ## Iterate through list of schema
    for x in list:
        ### INPUT Required: Change for target table
        tempTable = x
        df2 = spark.sql(f"SELECT COUNT(*) FROM {schemaName}.{tempTable}").collect()
        for x in df2:
            rowCount = x[0]
            if rowCount == 0:
                print(schemaName + "." + tempTable + " has 0 rows")

But I'm not quite getting the results I want.
Recommended answer

There is a catalog property on the Spark session, which is probably what you are looking for:
    spark.catalog.listDatabases()
    spark.catalog.listTables("database_name")

listDatabases returns the list of databases you have. listTables returns the list of tables for a given database name.
You can do something like this, for example:
    [
        (table.database, table.name)
        for database in spark.catalog.listDatabases()
        for table in spark.catalog.listTables(database.name)
    ]

to get the list of databases and tables.
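To get from that list to the "Database | Table_name" layout asked for in the question, something along these lines might work. It is only a sketch: the function names list_database_tables and format_pairs are made up for illustration, and it pairs each table with database.name instead of table.database, on the assumption that table.database can be empty for temporary views.

```python
def list_database_tables(spark):
    """Return (database, table) pairs for every database the session can see."""
    pairs = []
    for database in spark.catalog.listDatabases():
        for table in spark.catalog.listTables(database.name):
            # Assumption: table.database can be None for temporary views,
            # so the name of the database being listed is used instead.
            pairs.append((database.name, table.name))
    return pairs


def format_pairs(pairs):
    """Render the pairs in the 'Database | Table_name' layout from the question."""
    lines = ["Database | Table_name"]
    lines.extend(f"{db} | {tbl}" for db, tbl in pairs)
    return "\n".join(lines)
```

In a Databricks notebook, where spark is already defined, print(format_pairs(list_database_tables(spark))) would print one line per table.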
(thx @Alex Ott) Even though this solution works fine, it is quite slow. Using SQL commands directly, such as show databases or show tables in ..., should do the job faster.
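A rough sketch of that SQL-based variant could look like the following. The function name list_tables_via_sql is hypothetical. The code reads the database name by position because the column name of show databases differs between Spark versions, and it assumes show tables exposes tableName and isTemporary columns, skipping temporary views so they are not repeated under every database.

```python
def list_tables_via_sql(spark):
    """Collect (database, table) pairs using SHOW DATABASES / SHOW TABLES."""
    pairs = []
    for db_row in spark.sql("SHOW DATABASES").collect():
        # First column holds the database name; read it by position because
        # its column name is not the same in every Spark version.
        db_name = db_row[0]
        # Backticks guard against database names with unusual characters.
        for tbl_row in spark.sql(f"SHOW TABLES IN `{db_name}`").collect():
            # Assumption: temporary views are flagged via isTemporary;
            # they are skipped so they do not show up under each database.
            if tbl_row["isTemporary"]:
                continue
            pairs.append((db_name, tbl_row["tableName"]))
    return pairs
```

The result can then be turned into a DataFrame for display, for example with spark.createDataFrame(list_tables_via_sql(spark), ["Database", "Table_name"]).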