This article walks through fixing a PySpark program that throws `NameError: name 'spark' is not defined`; it should be a useful reference for anyone hitting the same problem.

Problem description
The program below throws `NameError: name 'spark' is not defined`:
```
Traceback (most recent call last):
  File "pgm_latest.py", line 232, in <module>
    sconf = SparkConf().set(spark.dynamicAllocation.enabled, true)
        .set(spark.dynamicAllocation.maxExecutors, 300)
        .set(spark.shuffle.service.enabled, true)
        .set(spark.shuffle.spill.compress, true)
NameError: name 'spark' is not defined
```

Submitted with:

```
spark-submit --driver-memory 12g --master yarn-cluster --executor-memory 6g --executor-cores 3 pgm_latest.py
```

pgm_latest.py code:
```python
#!/usr/bin/python
import sys
import os
from datetime import *
from time import *
from pyspark.sql import *
from pyspark import SparkContext
from pyspark import SparkConf

sc = SparkContext()
sqlCtx = HiveContext(sc)
sqlCtx.sql('SET spark.sql.autoBroadcastJoinThreshold=104857600')
sqlCtx.sql('SET Tungsten=true')
sqlCtx.sql('SET spark.sql.shuffle.partitions=500')
sqlCtx.sql('SET spark.sql.inMemoryColumnarStorage.compressed=true')
sqlCtx.sql('SET spark.sql.inMemoryColumnarStorage.batchSize=12000')
sqlCtx.sql('SET spark.sql.parquet.cacheMetadata=true')
sqlCtx.sql('SET spark.sql.parquet.filterPushdown=true')
sqlCtx.sql('SET spark.sql.hive.convertMetastoreParquet=true')
sqlCtx.sql('SET spark.sql.parquet.binaryAsString=true')
sqlCtx.sql('SET spark.sql.parquet.compression.codec=snappy')
sqlCtx.sql('SET spark.sql.hive.convertMetastoreParquet=true')

## Main functionality
def main(sc):
    ...

if __name__ == '__main__':
    # Configure OPTIONS
    sconf = SparkConf() \
        .set("spark.dynamicAllocation.enabled", "true") \
        .set("spark.dynamicAllocation.maxExecutors", 300) \
        .set("spark.shuffle.service.enabled", "true") \
        .set("spark.shuffle.spill.compress", "true")
    sc = SparkContext(conf=sconf)
    # Execute Main functionality
    main(sc)
    sc.stop()
```
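For what it's worth, the NameError in the traceback can be reproduced without Spark at all: Python evaluates an unquoted dotted name such as `spark.dynamicAllocation.enabled` as a variable lookup before any call happens. A minimal sketch (`set_option` here is a hypothetical stand-in for `SparkConf.set`):

```python
# The .set() calls in the traceback pass the option names unquoted, e.g.
# SparkConf().set(spark.dynamicAllocation.enabled, true).  Python then
# looks up a variable named `spark` before Spark is ever involved, which
# is exactly the NameError reported.
def set_option(key, value):  # hypothetical stand-in for SparkConf.set
    return (key, value)

try:
    set_option(spark.dynamicAllocation.enabled, true)  # unquoted key
except NameError as exc:
    print(exc)  # -> name 'spark' is not defined

# Quoting the key and value, as the corrected program above does,
# avoids the variable lookup entirely:
print(set_option("spark.dynamicAllocation.enabled", "true"))
```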
Answer
I think you are using an old Spark version, older than 2.x.
Instead of this

```python
spark.createDataFrame(..)
```

use the one below

```python
df = sqlContext.createDataFrame(...)
```
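The version difference matters because the prebuilt `spark` variable (a `SparkSession`) only exists in Spark 2.x shells; in a `spark-submit` script the entry point has to be created explicitly. A sketch that handles both lines, assuming a local Spark installation (`make_dataframe` is a hypothetical helper, not part of the pyspark API):

```python
def make_dataframe(rows, schema=None):
    """Create a DataFrame on either Spark 1.x or 2.x (sketch)."""
    try:
        # Spark 2.x+: SparkSession is the unified entry point; the `spark`
        # variable seen in the 2.x shells is just one of these.
        from pyspark.sql import SparkSession
        session = SparkSession.builder.enableHiveSupport().getOrCreate()
        return session.createDataFrame(rows, schema)
    except ImportError:
        # Spark 1.x: there is no SparkSession (and no `spark` name), so
        # build a HiveContext from the SparkContext, as the answer suggests.
        from pyspark import SparkContext
        from pyspark.sql import HiveContext
        sqlContext = HiveContext(SparkContext.getOrCreate())
        return sqlContext.createDataFrame(rows, schema)
```

The imports live inside the function so the sketch can be defined even where pyspark is not installed; the `ImportError` branch only distinguishes the two Spark generations.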