Sort a Spark data frame / Hive result set
I'm trying to retrieve the list of columns from a Hive table and store the result in a Spark DataFrame.
    var my_column_list = hiveContext.sql(s""" SHOW COLUMNS IN $my_hive_table""")
But I'm unable to sort the DataFrame alphabetically, or even to sort the result of the SHOW COLUMNS query. I tried both sort() and orderBy().
How can I sort the result alphabetically?
Update: Added a sample of my code
    import org.apache.spark.{ SparkConf, SparkContext }
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.hive.HiveContext
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("USE my_test_db")
    var lv_column_list = hiveContext.sql(s""" SHOW COLUMNS IN MYTABLE""")
    // WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems
    lv_column_list.show // Works fine
    lv_column_list.orderBy("result").show // Error arises
Best answer
The SHOW COLUMNS query produces a DataFrame with a single column named result. Ordering by that column gives you what you need:
    val df = hiveContext.sql(s""" SHOW COLUMNS IN $my_hive_table """)
    df.orderBy("result").show
Instead of 'SHOW COLUMNS', I used 'DESC' and retrieved the column list with "col_name".
    var lv_column_list = hiveContext.sql(s""" DESC MYTABLE""")
    lv_column_list.select("col_name").orderBy("col_name")
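A column list is normally small, so another option is to sort the names on the driver rather than inside Spark. This is only a sketch and not something taken from the answers above: ColumnNames and sortedNames are hypothetical names, and the trim is there on the assumption that the names may come back padded with whitespace. The usage lines in the comments assume the hiveContext and MYTABLE from the question, and it is untested whether collect() avoids the error that orderBy triggers there.

```scala
// Hypothetical helper (not part of the original post): tidy up and sort column names locally.
object ColumnNames {
  // Trims whitespace, drops empty entries, and sorts with the default String ordering.
  // The ordering is case-sensitive, so upper-case names sort before lower-case ones.
  def sortedNames(raw: Seq[String]): Seq[String] =
    raw.map(_.trim).filter(_.nonEmpty).sorted
}

// Possible usage in spark-shell (assumes the hiveContext and MYTABLE from the question):
//   val raw = hiveContext.sql("SHOW COLUMNS IN MYTABLE").collect().map(_.getString(0)).toSeq
//   ColumnNames.sortedNames(raw).foreach(println)
//
// Or read the names from the table schema and skip SHOW COLUMNS altogether:
//   ColumnNames.sortedNames(hiveContext.table("MYTABLE").columns.toSeq).foreach(println)
```

The second usage line relies on DataFrame.columns, which returns the schema's column names as an Array[String] without running a query over the table's rows.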