I know Spark SQL is almost the same as Hive.
I have created a table, and when I run a Spark SQL query to create an index on it, I always get this error:
Error in SQL statement: AnalysisException: mismatched input '' expecting AS near ')' in create index statement
The Spark SQL query I am using is:
CREATE INDEX word_idx ON TABLE t (id)
The data type of id is bigint. Before this, I also tried to create an index on the "word" column of this table, and it gave me the same error.
So, is there any way to create an index through a Spark SQL query?
Best answer
There's no way to do this through a Spark SQL query: Spark SQL does not support the CREATE INDEX statement. There is, however, an RDD function called zipWithIndex. You can convert the DataFrame to an RDD, call zipWithIndex, and convert the resulting RDD back to a DataFrame.
See this community Wiki article for a full-blown solution.
Another approach could be to use Spark MLlib's StringIndexer.