Explode in PySpark

Updated: 2024-10-19 10:25:22

Question

I would like to transform from a DataFrame that contains lists of words into a DataFrame with each word in its own row.

How do I do explode on a column in a DataFrame?

Here is an example with some of my attempts where you can uncomment each code line and get the error listed in the following comment. I use PySpark in Python 2.7 with Spark 1.6.1.

    from pyspark.sql.functions import split, explode

    DF = sqlContext.createDataFrame([('cat \n\n elephant rat \n rat cat', )], ['word'])
    print 'Dataset:'
    DF.show()

    print '\n\n Trying to do explode: \n'
    DFsplit_explode = (
        DF
        .select(split(DF['word'], ' '))
    #    .select(explode(DF['word']))  # AnalysisException: u"cannot resolve 'explode(word)' due to data type mismatch: input to function explode should be array or map type, not StringType;"
    #    .map(explode)  # AttributeError: 'PipelinedRDD' object has no attribute 'show'
    #    .explode()  # AttributeError: 'DataFrame' object has no attribute 'explode'
    ).show()

    # Trying without split
    print '\n\n Only explode: \n'

    DFsplit_explode = (
        DF
        .select(explode(DF['word']))  # AnalysisException: u"cannot resolve 'explode(word)' due to data type mismatch: input to function explode should be array or map type, not StringType;"
    ).show()

Please advise.

Recommended answer

explode and split are SQL functions, and both operate on a SQL Column. split takes a Java regular expression as its second argument. To split the data on arbitrary whitespace, you need something like this:

    from pyspark.sql.functions import col, explode, split

    df = sqlContext.createDataFrame(
        [('cat \n\n elephant rat \n rat cat', )], ['word']
    )

    # Split on runs of whitespace, then emit one row per array element.
    df.select(explode(split(col("word"), r"\s+")).alias("word")).show()

    ## +--------+
    ## |    word|
    ## +--------+
    ## |     cat|
    ## |elephant|
    ## |     rat|
    ## |     rat|
    ## |     cat|
    ## +--------+
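To see why the pattern matters, here is a minimal sketch in plain Python, with no Spark involved. It assumes that Python's re module is a fair stand-in for the Java regular expressions that split uses, which holds for these two simple patterns. The helper explode_rows is hypothetical and only mimics the one-row-per-element behaviour of explode; it is not how Spark implements it.

```python
import re

# Same sample string as in the question.
text = 'cat \n\n elephant rat \n rat cat'

# Splitting on a single space leaves the newline runs behind as tokens.
by_space = re.split(' ', text)

# Splitting on one or more whitespace characters yields only the words.
by_whitespace = re.split(r'\s+', text)


def explode_rows(rows):
    """Hypothetical helper: emit one single-value row per list element,
    mimicking what explode does to an array column."""
    return [(item,) for (items,) in rows for item in items]


# One input row holding a list becomes one output row per word.
exploded = explode_rows([(by_whitespace,)])

print(by_space)
print(by_whitespace)
print(exploded)
```

Splitting on ' ' keeps the '\n\n' and '\n' fragments as separate tokens, which is why the question's first attempt would not produce clean words even after explode is applied.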

Published: 2023-11-22 05:47:37
Article link: https://www.elefans.com/category/jswz/34/1616197.html