Airflow Browser UI not detecting imported Module

I am running Airflow with Docker Compose on a Linux server. DAGs created from other .py scripts work fine, including scripts that import other modules: they run and show up in the DAG list.
However, importing the modules below in my Launch.py results in: Broken DAG: [/usr/local/airflow/dags/ScanLaunchDemo.py] No module named 'tenable_io'.
Oddly, Launch.py runs perfectly well on the Linux instance itself and in a Python terminal (the "No module named 'tenable_io'" error does not appear). It seems that only Airflow cannot find the module below.
from tenable_io.client import TenableIOClient
from tenable_io.api.scans import ScanCreateRequest
from tenable_io.api.models import ScanSettings
Running pip3 list shows that tenable-io is installed.
Thanks for the help, peeps.

If you are using Docker Compose, then to make the module available to Airflow you need to use a custom image in which you have installed your additional dependencies. We just updated the documentation to make it clearer why you need this and how to do it, including examples:
https://airflow.apache.org/docs/docker-stack/build.html
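
To confirm that the module is missing from the container rather than from the host, a check like the one below can be run with the same interpreter Airflow uses, for example from a shell inside the scheduler or webserver container. This is only a sketch: module_report is a hypothetical helper name, and tenable_io is the top-level module from the question. It is meant for top-level module names only.

```python
import importlib.util
import sys


def module_report(names):
    """Map each top-level module name to True if this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # The interpreter path shows which Python environment is being inspected.
    print("interpreter:", sys.executable)
    for name, found in module_report(["tenable_io"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If the module is reported as found on the host but missing inside the container, the package was installed into a different Python environment than the one Airflow runs in.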

Luckily for me, I was able to rewrite the API request so that it did not require a module that was not already built into Airflow. Jarek seems to be correct, however: for modules that don't come installed with Airflow, a custom image needs to be created.

Related

MWAA - Airflow - PythonVirtualenvOperator requires virtualenv

I am using AWS's MWAA service (2.2.2) to run a variety of DAGs, most of which are implemented with standard PythonOperator types. I bundle the DAGs into an S3 bucket alongside any shared requirements, then point MWAA to the relevant objects & versions. Everything runs smoothly so far.
I would now like to implement a DAG using the PythonVirtualenvOperator type, which AWS acknowledge is not supported out of the box. I am following their guide on how to patch the behaviour using a custom plugin, but continue to receive an error from Airflow, shown at the top of the dashboard in big red writing:
DAG Import Errors (1)
... ...
AirflowException: PythonVirtualenvOperator requires virtualenv, please install it.
I've confirmed that the plugin is indeed being picked up by Airflow (I see it referenced in the admin screen), and for the avoidance of doubt I am using the exact code provided by AWS in their examples for the DAG. AWS's documentation on this is pretty light and I've yet to stumble across any community discussion for the same.
From AWS's docs, we'd expect the plugin to run at startup prior to any DAGs being processed. The plugin itself appears to effectively rewrite the venv command to use the pip-installed version, rather than that which is installed on the machine, however I've struggled to verify that things are happening in the order I expect. Any pointers on debugging the instance's behavior would be very much appreciated.
Has anyone faced a similar issue? Is there a gap in the MWAA documentation that needs addressing? Am I missing something incredibly obvious?
Possibly related, but I do see this warning in the scheduler's logs, which may indicate why MWAA is struggling to resolve the dependency?
WARNING: The script virtualenv is installed in '/usr/local/airflow/.local/bin' which is not on PATH.

Airflow uses shutil.which to look for virtualenv. The virtualenv installed via requirements.txt isn't on the PATH. Adding the directory that contains virtualenv to PATH solves this.
The doc here is wrong: https://docs.aws.amazon.com/mwaa/latest/userguide/samples-virtualenv.html
import os
from airflow.plugins_manager import AirflowPlugin
import airflow.utils.python_virtualenv
from typing import List

def _generate_virtualenv_cmd(tmp_dir: str, python_bin: str, system_site_packages: bool) -> List[str]:
    cmd = ['python3','/usr/local/airflow/.local/lib/python3.7/site-packages/virtualenv', tmp_dir]
    if system_site_packages:
        cmd.append('--system-site-packages')
    if python_bin is not None:
        cmd.append(f'--python={python_bin}')
    return cmd

airflow.utils.python_virtualenv._generate_virtualenv_cmd=_generate_virtualenv_cmd

#This is the added path code
os.environ["PATH"] = f"/usr/local/airflow/.local/bin:{os.environ['PATH']}"

class VirtualPythonPlugin(AirflowPlugin):
    name = 'virtual_python_plugin'
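
The effect of the added PATH line can be reproduced outside MWAA. The following is a minimal sketch, assuming a POSIX system: demo-virtualenv-tool is a made-up executable name standing in for virtualenv, and prepend_to_path and demo are hypothetical helpers, not part of Airflow or MWAA.

```python
import os
import shutil
import stat
import tempfile


def prepend_to_path(directory):
    """Put directory at the front of PATH for the current process."""
    os.environ["PATH"] = f"{directory}{os.pathsep}{os.environ.get('PATH', '')}"


def demo():
    """Show that shutil.which only finds executables in directories listed on PATH."""
    with tempfile.TemporaryDirectory() as bin_dir:
        # Create a stand-in executable (assumption: POSIX shell script).
        tool = os.path.join(bin_dir, "demo-virtualenv-tool")
        with open(tool, "w") as fh:
            fh.write("#!/bin/sh\nexit 0\n")
        os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)

        before = shutil.which("demo-virtualenv-tool")  # directory not on PATH yet
        original_path = os.environ.get("PATH", "")
        try:
            prepend_to_path(bin_dir)
            after = shutil.which("demo-virtualenv-tool")  # now resolvable
        finally:
            os.environ["PATH"] = original_path  # leave the process unchanged
        return before, after, tool


if __name__ == "__main__":
    before, after, tool = demo()
    print("before:", before)
    print("after: ", after)
```

Because shutil.which reads PATH at call time, the environment change has to happen in the same process, and before Airflow performs the lookup.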

AWS lambda, Unable to import a module(in package) (written in Cython)

I am trying to import python-dependency-injector package within lambda function.
To run lambda function, I have zipped my project contents together with packages in /opt/anaconda3/envs/.../python3.9/site-packages to deploy a fast-api(with mangum) app.
Use of any other packages works fine, but weirdly using this package (written in Cython) results in the error below:
{"errorMessage": "Unable to import module 'main': cannot import name 'providers' from 'dependency_injector' (/var/task/dependency_injector/__init__.py)", "errorType": "Runtime.ImportModuleError", "requestId": "5c63c01b-5be1-4481-adf8-691167cb54bd", "stackTrace": []}
I am not sure whether this is a known issue when using packages written in Cython or there are any workarounds.
I have also tried manually importing the package (that was in site-packages) within a newly created EC2 (AWS linux) and also my local (Mac)
Without importing this specific package, everything else works fine.
Would really appreciate if someone can guide me how to resolve this issue.
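
One thing that may be worth checking (a possibility, not something confirmed in this thread) is whether the compiled extension files inside the zip were built for a different platform or Python version than the Lambda runtime. The sketch below lists compiled files whose suffix the current interpreter would not load; foreign_extensions is a hypothetical helper, and it has to be run on the target platform (for example in a Lambda-like Linux environment) to be meaningful.

```python
import importlib.machinery
from pathlib import Path

# File endings commonly used for compiled Python extension modules.
BINARY_SUFFIXES = (".so", ".pyd", ".dylib")


def foreign_extensions(package_dir):
    """Return compiled files under package_dir that this interpreter cannot import by name."""
    loadable = set(importlib.machinery.EXTENSION_SUFFIXES)
    root = Path(package_dir)
    foreign = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in BINARY_SUFFIXES:
            continue
        # "providers.cpython-39-darwin.so" -> module "providers", suffix ".cpython-39-darwin.so"
        _module_name, _, rest = path.name.partition(".")
        if "." + rest not in loadable:
            foreign.append(path.relative_to(root).as_posix())
    return sorted(foreign)


if __name__ == "__main__":
    # Assumption: run from the directory that was zipped for deployment.
    for name in foreign_extensions("."):
        print("not loadable here:", name)
```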

import matplotlib failed while deploying my model in AWS sagemaker

I have deployed my model on AWS successfully.
But while testing I get a runtime error at "import matplotlib.pyplot as plt". I think it is due to the PyTorch framework version I used (framework_version='1.2.0'). I face the same issue when I use higher versions as well.
PyTorchModel(model_data=model_artifact,
             role=role,
             framework_version='1.2.0',
             entry_point='predict.py',
             predictor_cls=ImagePredictor)
I have another issue when I use version 1.0.0: I am not able to import libraries from sub-directories, and the deployment itself fails.
E.g. I have some code files in the "Code" directory.
from Code.CTModel import NetWork  # fails with "No module named Code" when I use version 1.0.0
Ultimately I want to know how to use/import libraries that are written under sub-directories.

It sounds like you want to inject some additional code libraries into the SageMaker PyTorch serving container. You might have to dig into the source code for how the PyTorch serving container is built to further customize it: https://github.com/aws/sagemaker-pytorch-inference-toolkit, or build your own image.
Digging into that source code a bit, I see that the container has enabled the importing of arbitrary code, but only when "multi-model mode" is enabled. Can you verify that the code exists under a directory "code" in your model directory and that "multi-model mode" is enabled?
def initialize(self, context):
    # Adding the 'code' directory path to sys.path to allow importing user modules when multi-model mode is enabled.
    if (not self._initialized) and ENABLE_MULTI_MODEL:
        code_dir = os.path.join(context.system_properties.get("model_dir"), 'code')
        sys.path.append(code_dir)
        self._initialized = True
Reference: https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/c4e7abc49aeebc2f9b6035337548a90e4330113d/src/sagemaker_pytorch_serving_container/handler_service.py#L47
If this all seems complicated to you (it is), you might want to look into some standardized formats for serializing your PyTorch model such as https://onnx.ai/. I'd love to learn more about what you're trying to do here sometime if you reach out to me at contact@modelzoo.dev. I'm beta-testing a platform that enables deployment in a single line of code and would love to test it out here.

Let me restate my query at a slightly higher level: I have predict.py, a Jupyter notebook, Code (directory), Evaluation (directory) and other .py files in source_dir.
--Code
    --ResNet.py
    --Densenet.py
    --DataLoader.py
--Evaluation
    --Evaluation.py
--predict.py
--CT_Code.ipynb
When I execute the predict file from a Jupyter notebook on my local system, all the modules are imported properly and everything works fine. But when I deploy the same thing from a SageMaker notebook, I face the issues mentioned in my question (not able to import libraries from the Code directory, nor some basic modules like imageio, PIL and Matplotlib).
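
For the sub-directory imports, the same sys.path idea used by the serving container can be applied by hand at the top of the entry-point script. The following is a minimal sketch and has not been verified against the SageMaker containers; add_source_dir and import_from are hypothetical helpers, and it assumes the Code directory is shipped next to predict.py.

```python
import importlib
import sys
from pathlib import Path


def add_source_dir(source_dir):
    """Put source_dir at the front of sys.path so its sub-packages become importable."""
    resolved = str(Path(source_dir).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


def import_from(source_dir, dotted_name):
    """Import dotted_name (e.g. 'Code.CTModel') relative to source_dir."""
    add_source_dir(source_dir)
    importlib.invalidate_caches()  # pick up files created after interpreter start
    return importlib.import_module(dotted_name)


# Intended use at the top of predict.py (assumption: Code/ sits beside this file):
#   add_source_dir(Path(__file__).parent)
#   from Code.CTModel import NetWork
```

This only addresses modules that live in your own source directory. Third-party packages such as imageio, PIL or Matplotlib still have to be installed in the container's Python environment.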

In Shiny, Python Virtual environment PERMISSION DENIED (Error 126)

We are building a user-interface app (predicting a continuous variable with a machine learning model) in R Shiny.
Since we built the machine learning model in Python 3 with the sklearn module, we hoped to write Python code in R Shiny to call that model and the corresponding functions.
We used the R package "reticulate" to create a Python virtual environment that holds the Python packages and through which we can call Python 3 functions.
We created the virtual environment using the following line of code (a function from the R package "reticulate"):
use_virtualenv("env", required = TRUE)
We do have the directory "env/bin", which contains the python and python3 executables.
The Shiny app worked perfectly locally. However, when we attempted to publish it, it gave the following error (please see picture), even though the app had been deployed successfully and shinyapps.io reported that it was running.
The issue was "Error 126", which denied our app permission to access the virtual environment. We found no previous (similar) case on Stack Overflow, and we have spent a long time debugging without resolving the issue.
If anyone knows how to solve this problem, could you kindly post your solution tips below? (We hope your solution would not modify our basic layout, i.e. "calling a Python-made model in Shiny and publishing through Shiny".) We really appreciate your efforts to help us out!
Thank you so much!

Could you share the code where the actual call to the Python script is made? Is it a Python module function that you are calling from R Shiny? What does the Python module/function do and return? I have used reticulate inside Shiny to call Python scripts and it works fine. I didn't need to set the environment; I just provided the source of the Python script and called it like any other R function.

If you're trying to deploy to shinyapps.io, you may need to set the RETICULATE_PYTHON env variable so that reticulate uses the correct version of Python when running your app:
VIRTUALENV_NAME = 'env'
Sys.setenv(RETICULATE_PYTHON = paste0('/home/shiny/.virtualenvs/',
                                      VIRTUALENV_NAME,
                                      '/bin/python'))
Full example here demonstrates one method for configuring a Shiny + reticulate app so that it can easily run both locally and on shinyapps.io.

Unable to launch specific module properly

I am installing a specific module from GitHub, but I am having problems using its functions.
These are the steps I took to install the module:
Downloaded the zip file and unzipped it normally.
Ran setup.py with the install option (python setup.py install).
The module didn't have any documentation, so I checked setup.py, and its name was "Exchange".
I tried importing the module (import Exchange) and it worked.
Since I couldn't find any documentation, I viewed exchange.py on GitHub (/Exchange/exchange.py).
I tried using one of the functions; it didn't work.
Then I realized that I had imported the folder (the package) rather than the file, so:
I imported exchange.py itself (from Exchange import exchange).
Then, from exchange.py, I imported the Exchange class (from Exchange.exchange import Exchange).
I tried to call the class (Exchange), but I needed to specify 7 arguments for __init__.
Again, I realized that I needed to launch exchange.py itself to avoid these problems. This is where I got stuck: I couldn't launch it.
How can I launch the module properly? Am I right that I need to start from exchange.py? If so, how do I launch it? If not, what is the proper entry point?

Fixed the problem, thanks to @metatoaster.
The entry point of the module is the main function in exchange.py; calling it starts the module.
So to start the application, import exchange.py from the Exchange package, then import and call its main function:
from Exchange.exchange import main
main()
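
When a package ships without documentation, listing what a module defines can help locate an entry point such as main. The sketch below uses only the standard library; public_callables is a hypothetical helper, and "Exchange.exchange" in the comment is the module name from this question.

```python
import importlib
import inspect


def public_callables(module_name):
    """Map each public function or class defined in the module to its signature."""
    module = importlib.import_module(module_name)
    result = {}
    for name, obj in inspect.getmembers(module, callable):
        if name.startswith("_"):
            continue  # skip private helpers
        if getattr(obj, "__module__", None) != module.__name__:
            continue  # skip names that were merely imported into the module
        try:
            result[name] = str(inspect.signature(obj))
        except (TypeError, ValueError):
            result[name] = "(signature unavailable)"
    return result


if __name__ == "__main__":
    # Demonstrated on a standard-library module; for this question one would
    # pass "Exchange.exchange" instead (assumes the package is installed).
    for name, signature in public_callables("json").items():
        print(name + signature)
```

A function named main that takes no required arguments is a common, though not guaranteed, sign of the intended entry point.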
