Airflow ModuleNotFoundError: No module named 'pyspark'

Updated: 2024-10-18 01:38:52
Problem description

I installed Airflow on my machine and it works well. I also have a local Spark installation, which is operational too. I want to use Airflow to orchestrate two Spark tasks: task_spark_datatransform >> task_spark_model_reco. The two pyspark modules associated with these tasks have been tested and run correctly under Spark.

I also created a very simple Airflow DAG that uses BashOperator* to run each Spark task. For example, for the task task_spark_datatransform I have:

task_spark_datatransform = BashOperator(task_id='task_spark_datatransform', bash_command=spark_home + 'spark-submit --master local[*] ' + srcDir + 'dataprep.py')

where, in my case, spark_home = '/usr/bin/spark/bin/'.

*As recommended in several reputable tutorials on this subject.
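When a command string like the one above is assembled by concatenation, a missing space or an unquoted `local[*]` can change what the shell receives. The following is a minimal sketch of building the same command from its parts; the helper name `build_spark_submit` and the `/path/to/src/` directory are hypothetical, since the question does not give the value of srcDir.

```python
import os
import shlex


def build_spark_submit(spark_home, src_dir, script, master="local[*]"):
    """Build a shell-safe spark-submit command line as one string."""
    argv = [
        os.path.join(spark_home, "spark-submit"),
        "--master", master,
        os.path.join(src_dir, script),
    ]
    # shlex.quote protects characters such as [ and * from the shell.
    return " ".join(shlex.quote(arg) for arg in argv)


# Hypothetical source directory; the question does not state srcDir.
cmd = build_spark_submit("/usr/bin/spark/bin/", "/path/to/src/", "dataprep.py")
print(cmd)
```

The resulting string could then be passed as bash_command to the BashOperator.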

Question: Why doesn't Airflow recognize pyspark?

Log:

[2019-09-20 10:21:21 +0200] [5945] [INFO] Worker exiting (pid: 5945)
[2019-09-20 10:21:51 +0200] [5554] [INFO] Handling signal: ttin
[2019-09-20 10:21:51 +0200] [6128] [INFO] Booting worker with pid: 6128
[2019-09-20 10:21:51,609] {__init__.py:51} INFO - Using executor SequentialExecutor
[2019-09-20 10:21:52,021] {__init__.py:305} INFO - Filling up the DagBag from /home/ach/airflow/dags
[2019-09-20 10:21:52,026] {__init__.py:416} ERROR - Failed to import: /home/ach/airflow/dags/spark_af.py
Traceback (most recent call last):
  File "/home/ach/airflow/lib/python3.7/site-packages/airflow/models/__init__.py", line 413, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/home/ach/airflow/lib/python3.7/imp.py", line 171, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 696, in _load
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/ach/airflow/dags/spark_af.py", line 3, in <module>
    import dataprep
  File "/home/ach/airflow/dags/dataprep.py", line 2, in <module>
    from pyspark.sql import SparkSession
ModuleNotFoundError: No module named 'pyspark'

Recommended answer

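It looks like pyspark is missing from the Python environment that Airflow runs in. The traceback shows the DAG file spark_af.py importing dataprep, which in turn imports pyspark, so the import fails while Airflow is loading the DAG file, before any spark-submit command runs.

Run the following command in that environment: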

pip install pyspark
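To confirm that the install landed in the right place, it can help to check which interpreter is running and whether it can locate pyspark. This is a minimal sketch using only the standard library; the helper name `module_available` is hypothetical, and the script is assumed to be run with the same Python interpreter that Airflow uses.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the current interpreter can locate module `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The interpreter path shows which environment is being checked.
    print("interpreter:", sys.executable)
    print("pyspark importable:", module_available("pyspark"))
```

If the interpreter path printed here differs from the one in the traceback, pyspark was probably installed into a different environment than the one Airflow uses.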

Published: 2023-11-23 20:01:23
Article link: https://www.elefans.com/category/jswz/34/1622715.html