Airflow occasionally cannot find the local Python module

I import my own Python modules via
pip install -e path
The webserver shows a "No module named" warning and cannot import the DAG directly.
However, after refreshing, it sometimes imports the DAG successfully and sometimes fails.
If I execute it, the tasks work without error.
How can I solve this issue?

What you must do is put all your modules in a folder and then set the PYTHONPATH environment variable to that folder, e.g.:
- name: PYTHONPATH
  value: "/opt/airflow/modules/"
Remember that your package folder must include an __init__.py file.
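Setting PYTHONPATH works because its entries are added to sys.path (ahead of site-packages) when the interpreter starts, so any package inside that folder becomes importable by name. A minimal sketch of the same mechanism from inside Python, assuming a temporary folder as a stand-in for /opt/airflow/modules/ and a hypothetical package named my_modules_pkg:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_demo_package(root: Path) -> None:
    """Create the hypothetical package root/my_modules_pkg/__init__.py."""
    pkg = root / "my_modules_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("GREETING = 'hello from my_modules_pkg'\n")


with tempfile.TemporaryDirectory() as modules_dir:
    make_demo_package(Path(modules_dir))
    # Roughly what starting Python with PYTHONPATH=<modules_dir> does at start-up.
    sys.path.insert(0, modules_dir)
    importlib.invalidate_caches()  # drop stale finder caches so the new folder is seen
    import my_modules_pkg

    print(my_modules_pkg.GREETING)
```

In a real deployment, setting the variable on every process that imports your modules keeps them consistent, rather than editing sys.path inside DAG code.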

I finally figured it out. Because I am using Jupyter Notebook, it creates .ipynb_checkpoints folders for me.
After I removed the folder, all DAGs worked.
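If you want to script that cleanup, here is a minimal sketch. remove_notebook_checkpoints is a hypothetical helper, and the DAGs folder path is an assumption you would replace with your own; it deletes every .ipynb_checkpoints directory below that folder, so run it only where those copies are disposable.

```python
import shutil
import tempfile
from pathlib import Path


def remove_notebook_checkpoints(dags_folder: Path) -> list[Path]:
    """Delete every .ipynb_checkpoints directory below dags_folder and return the removed paths."""
    removed = []
    # sorted() materialises the matches first, so nothing is deleted while rglob is still walking.
    for checkpoint_dir in sorted(dags_folder.rglob(".ipynb_checkpoints")):
        # A nested checkpoint folder may already be gone if its parent was removed earlier.
        if checkpoint_dir.is_dir():
            shutil.rmtree(checkpoint_dir)
            removed.append(checkpoint_dir)
    return removed


# Demo on a throwaway folder standing in for the real DAGs folder.
with tempfile.TemporaryDirectory() as tmp:
    dags = Path(tmp)
    (dags / "my_dag.py").write_text("# a DAG file\n")
    (dags / ".ipynb_checkpoints").mkdir()
    (dags / ".ipynb_checkpoints" / "my_dag-checkpoint.py").write_text("# stale copy\n")
    removed_names = [p.name for p in remove_notebook_checkpoints(dags)]
    remaining = sorted(p.name for p in dags.iterdir())
    print(removed_names, remaining)
```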

Maybe you installed the module into a different Python executable than the one used by Airflow. To check its location, run:
airflow info
In the output, under the System info section, you'll find python_location. Copy its value and run:
<python_location> -m pip install <module>
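The same check can be done from Python itself: sys.executable is the interpreter running the current process, and invoking pip as "<python> -m pip" ties the install to that interpreter. A minimal sketch; pip_install_command is a hypothetical helper and my_module a placeholder package name:

```python
import sys


def pip_install_command(package: str, python_executable: str = sys.executable) -> list[str]:
    """Build a pip command that installs into one specific interpreter."""
    # "<python> -m pip" installs into that interpreter's environment,
    # whereas a bare "pip" is whichever script comes first on PATH.
    return [python_executable, "-m", "pip", "install", package]


# Run this inside the Airflow environment and compare with python_location from "airflow info".
print("This process runs:", sys.executable)

cmd = pip_install_command("my_module")  # placeholder package name
print(" ".join(cmd))
# To actually install: subprocess.run(cmd, check=True)
```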

Related

import error that doesn't replicate in different clones of the same repository

I have a big Python project. When I try to execute a certain script I get an import error,
something like:
from my_package import xyz
ImportError: cannot import name 'xyz'
Where my_package is an existing directory in the code base, and xyz.py is within this directory. Yes, I checked, and this directory is in the search path.
What really boggles my mind is that on a different machine, when I clone the same repository, activate the same virtual environment, and run the exact same script, I don't get this error.
I'm trying to figure out what is wrong and how to fix it; the fact that it happens on a remote server but does not replicate on my own computer just seems weird.
Any clue what could go wrong and where I should look for the bug?
EDIT: some additional information:
When I run the script as intended, i.e.
python ./myscript.py
it gives the above error. But when I run the problematic import statement within the interpreter:
python
from my_package import xyz
I get no error whatsoever.
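A hedged diagnostic sketch for this kind of discrepancy: run it both from myscript.py and from the interactive interpreter and compare where each import would resolve. When a script is run as python ./myscript.py, sys.path[0] is the script's directory, while an interactive session uses the current working directory, so the same name can resolve differently. describe_import is a hypothetical helper; my_package.xyz is the module from the question.

```python
import importlib.util
import sys


def describe_import(module_name: str) -> str:
    """Report which file module_name would be loaded from (parent packages do get imported)."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # find_spec imports parent packages; this fires when a parent is missing.
        spec = None
    if spec is None:
        return f"{module_name}: not found on sys.path"
    return f"{module_name}: {spec.origin}"


# '' (the current directory) in an interactive session, the script's folder otherwise.
print("sys.path[0] =", repr(sys.path[0]))
print(describe_import("json"))            # standard library, for comparison
print(describe_import("my_package.xyz"))  # the import from the question
```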

The three potential causes I can think of are the following:
Does the git project use submodules? If so, make sure you run git clone with the --recursive flag; otherwise it might not have downloaded all the submodules correctly.
How are you setting up the virtual environments? Usually they shouldn't be stored in the git repository but created on each machine, with the dependencies installed from the project's requirements.txt file. Sharing the same virtual environment between multiple machines could mean that it doesn't behave correctly on one of them.
Have you compared the Python versions on both machines? If it's a built-in module that fails to import, perhaps your second machine isn't using a new enough version of Python.
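For the last point, a script can fail fast with a readable message instead of an ImportError further down. A minimal sketch, assuming a hypothetical minimum of Python 3.8 that you would replace with your project's actual requirement; check_python_version is likewise a hypothetical helper:

```python
import sys

# Hypothetical minimum; replace with the oldest version your project supports.
MINIMUM_PYTHON = (3, 8)


def check_python_version(minimum: tuple = MINIMUM_PYTHON) -> None:
    """Raise a readable error if this interpreter is older than `minimum`."""
    if sys.version_info < minimum:
        found = ".".join(map(str, sys.version_info[:3]))
        raise RuntimeError(
            f"Python {'.'.join(map(str, minimum))}+ required, "
            f"but {sys.executable} is Python {found}"
        )


check_python_version()
print("Python version OK:", sys.version.split()[0])
```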

How to use a local Python library

I've written a Python script that uses steampy.
I cloned that library into a local folder, but now I don't know how to make my script use the local library instead of the installed one.
I'm coming from Angular, where this can be achieved by linking the two libraries with npm link.
Also, in my local steampy, all imports referring to steampy error out, for example:
from steampy.exceptions import ApiException, ...
No name 'exceptions' in module 'steampy.exceptions' pylint(no-name-in-module)
Unable to import 'steampy.exceptions' pylint(import-error)

If you are working in a virtualenv, you can just try:
pip install -e <path to the lib>
The -e flag makes the install editable, which means that any changes you make in the steampy repo will be available in the virtualenv.

Simply put the steampy folder in the same directory as your script:
steampy/
main.py
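This works because Python puts the running script's directory at the front of sys.path, so a steampy folder next to main.py is found before any steampy installed in site-packages. The sketch below demonstrates that with a throwaway project; the stub steampy package is hypothetical and only records where it was loaded from.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    # A stand-in for the cloned library: project/steampy/__init__.py
    (project / "steampy").mkdir()
    (project / "steampy" / "__init__.py").write_text("SOURCE = 'local clone'\n")
    # main.py sits next to the steampy folder, as in the layout above.
    (project / "main.py").write_text(
        "import steampy\n"
        "print(steampy.SOURCE, steampy.__file__)\n"
    )
    # The script's own directory becomes sys.path[0], so the local folder wins
    # over any steampy installed in site-packages.
    result = subprocess.run(
        [sys.executable, str(project / "main.py")],
        capture_output=True, text=True, check=True,
    )
    output = result.stdout.strip()
    print(output)
```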

How to install Python packages in Kubernetes pods

I have a custom Airflow image that has some Python packages in it, and I am running it in a local Kubernetes cluster. I have created a DAG in Airflow that uses one of those packages, and it works totally fine.
But if I use a Python package that's not in my custom base image (imageio or any other), I get a module-not-found error.
So I added the line "RUN pip3 install imageio==2.8.0" (or any other package) to my Dockerfile. First of all, it gave me a Python install warning.
Now the import error is gone, but if I run my DAG it fails without outputting any logs.
So next I added the line ENV PATH="/usr/local/airflow/.local/bin:${PATH}", but the result is still the same.
I am guessing my DAG is not able to find the extra Python packages being installed; more precisely, the pods being created somehow don't have those extra packages.
However, if I run "docker run -it {image_name} bash" and type import imageio inside a Python shell, it works fine.
Is there some config file in which I need to list the extra Python packages so that the pods running the DAGs will register them?
Or is there a way to specify them in the values.yaml file?

No module named error but pip freeze shows module is installed in virtualenv

I'm currently writing a Flask app using virtualenv. When I try to run some of my Python files I get:
ImportError: No module named <module>
In this case, the module I am trying to use is 'Click'. If I run pip freeze or pip list inside the virtualenv, I can see the module listed there. I'm also inside my virtualenv when I try to run the .py file. How come pip freeze/list can find the module but my .py program cannot? Could it be an issue with my .wsgi file?

Actually, I've just figured this out: the path to "activate_this.py" in my .wsgi file was incorrect because I was running it on an EC2 instance instead of in my usual directory. After changing the path, things seem to be working again :)
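For reference, a minimal sketch of the activation step in a .wsgi file, assuming the environment was created with the virtualenv tool (python -m venv does not generate activate_this.py) and, hypothetically, that it lives in a folder named venv next to the .wsgi file. Resolving the path relative to the .wsgi file, rather than hard-coding an absolute path, avoids breakage when the app moves to another machine. activate_this_path and activate_virtualenv are hypothetical helper names.

```python
import os


def activate_this_path(wsgi_dir: str, venv_name: str = "venv") -> str:
    """Locate activate_this.py relative to the .wsgi file instead of hard-coding it.

    Assumes (hypothetically) the virtualenv sits next to the .wsgi file.
    On Windows, virtualenv uses "Scripts" instead of "bin".
    """
    return os.path.join(wsgi_dir, venv_name, "bin", "activate_this.py")


def activate_virtualenv(activate_this: str) -> None:
    """Run virtualenv's activate_this.py, with a clear error if the path is wrong."""
    if not os.path.isfile(activate_this):
        raise FileNotFoundError(f"activate_this.py not found at {activate_this!r}")
    with open(activate_this) as f:
        # Passing __file__ lets the script locate its own virtualenv.
        exec(f.read(), {"__file__": activate_this})


# In the .wsgi file itself you might call:
# activate_virtualenv(activate_this_path(os.path.dirname(os.path.abspath(__file__))))
```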

Module not found during import in Jupyter Notebook

I have the following package (and working directory):
WorkingDirectory--
|--MyPackage--
| |--__init__.py
| |--module1.py
| |--module2.py
|
|--notebook.ipynb
In __init__.py I have:
import module1
import module2
If I try to import MyPackage into my notebook:
import MyPackage as mp
I get ModuleNotFoundError: No module named 'module1'. But the import works fine if I run the code outside a notebook: if I create test.py in the same directory and do the same as in the notebook, the import works properly. It also works inside the notebook if I use the fully qualified name in __init__.py (import MyPackage.module1).
What's the reason for different import behavior?
I have confirmed the working directory of the notebook is WorkingDirectory.
---Update---------
The exact error is:
C:\Users\Me\Documents\Working Directory\MyPackage\__init__.py in <module>()
---> 17 import module1
ModuleNotFoundError: No module named 'module1'
My problem differs from the possible duplicate:
The notebook was able to find the package but was unable to load the module. I inferred this because substituting module1 with MyPackage.module1 worked, which suggests it may not be a PATH-related problem.
I cd'ed into WorkingDirectory and started the server there, so the working directory should be the folder containing my package.

I'm pretty sure this issue is related and the answer there will help you: https://stackoverflow.com/a/15622021/7458681
tl;dr the cwd of the notebook server is always the base path where you started the server, no matter what running import os; os.getcwd() says. Use import sys; sys.path.append("/path/to/your/module/folder").
I ran it with some dummy modules in the same structure you specified: before modifying sys.path it wouldn't run, and afterwards it did.

Understand these two functions and your problem will be solved:
import os
# list the current working directory
os.getcwd()
# change the current working directory
os.chdir('/path/to/your/module/folder')
Change the path, import the module, and have fun.
Sometimes it won't work. Then try this:
import sys
# sys.path is a list of absolute path strings
sys.path.append('/path/to/application/app/folder')
import file

If you get a module-not-found error in a Jupyter environment, you have to install the module in the Jupyter environment instead of from the command prompt,
by running this command (on Windows) in Jupyter:
!pip install <module name>
After that you can import and use it.
Whenever you want to tell Jupyter that a line is a system command, put ( ! ) before it.

The best way to tackle this issue is to create a virtual environment and point your kernel to it.
Steps:
python -m venv venv
source venv/bin/activate
pip install ipykernel
ipython kernel install --user --name=venv
jupyter lab
Go to JupyterLab -> Kernel -> Change Kernel and select venv from the dropdown.
Now, if your venv has the package installed, JupyterLab can also see it and will have no problem importing it.
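To confirm from a notebook cell that the kernel really runs inside the venv, you can compare sys.prefix with sys.base_prefix; they differ inside environments created by venv (and recent virtualenv releases), though not inside conda environments. A minimal sketch; running_in_venv is a hypothetical helper:

```python
import sys


def running_in_venv() -> bool:
    """True if this interpreter belongs to an environment created by venv (or recent virtualenv)."""
    # venv points sys.prefix at the environment but keeps sys.base_prefix at the base install.
    return sys.prefix != sys.base_prefix


# Run in a notebook cell after switching kernels.
print("Kernel interpreter:", sys.executable)
print("Inside a virtual environment:", running_in_venv())
```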

You can do that by installing the import_ipynb package:
pip install import_ipynb
Suppose you want to import B.ipynb in A.ipynb; you can do it as follows.
In A.ipynb:
import import_ipynb
import B as b
Then you can use all the functions of B.ipynb in A.

My problem was that I used the wrong conda environment in VS Code.
Enter your conda environment:
conda activate environment_name
To check where a module is installed, enter Python interactive mode by running python or python3, then import cv2:
import cv2
Then, to see where this module is installed:
print(cv2.__file__)
You will see the installed path of the module. My problem was that my VS Code kernel was set to the wrong environment; this can be changed in the top-right corner of VS Code.
Hope this helps.

This happened to me when I moved my notebook into a new directory while the JupyterLab server was running. The import broke for that notebook, but when I created a new notebook in the directory I had just moved it to and used the same import, it worked. To fix this I:
Went to the root directory of my project.
Searched for all folders named __pycache__.
Deleted all __pycache__ folders found in the root and its subfolders.
Restarted the JupyterLab server.
Once JupyterLab restarts and runs your code, the __pycache__ folders will be regenerated.

The best solution by far (for me) is to have a kernel for each environment you are working in. Then, with that kernel defined, all you have to do is update the kernel's environment variables to point at the project folder where your modules are located.
Steps (using pip):
source activate <your environment name>
pip install ipykernel (if not installed already)
python -m ipykernel install --user --name <your environment name> --display-name "<a display name>" (where <your environment name> is the name you want to give your kernel and <a display name> is just a name Jupyter uses for display).
Once you've run the command above, it will output the location of the kernel configuration files, e.g. C:\Users\<your user name>\AppData\Roaming\jupyter\kernels\<selected environment name>. Go to this folder and open the kernel.json file.
Add the following entry to this file:
"env": {
  "PYTHONPATH": "${PYTHONPATH};<the path to your project with your modules>"
}
Good reference about the kernel install command here.
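If you prefer to make that edit programmatically, here is a minimal sketch using only the json module. add_pythonpath_to_kernel is a hypothetical helper; it writes the project path literally instead of relying on ${PYTHONPATH} expansion, and joins multiple entries with os.pathsep (';' on Windows, ':' elsewhere). The kernel.json contents and project paths in the demo are placeholders.

```python
import json
import os
import tempfile
from pathlib import Path


def add_pythonpath_to_kernel(kernel_json: Path, project_dir: str) -> dict:
    """Add project_dir to the "env" -> "PYTHONPATH" entry of a Jupyter kernel.json."""
    spec = json.loads(kernel_json.read_text())
    env = spec.setdefault("env", {})
    existing = env.get("PYTHONPATH")
    # os.pathsep is ';' on Windows and ':' elsewhere.
    env["PYTHONPATH"] = project_dir if not existing else existing + os.pathsep + project_dir
    kernel_json.write_text(json.dumps(spec, indent=1))
    return spec


# Demo on a throwaway kernel.json; real ones live in the folder printed by "ipykernel install".
with tempfile.TemporaryDirectory() as tmp:
    kernel_json = Path(tmp) / "kernel.json"
    kernel_json.write_text(json.dumps({
        "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": "My env",
        "language": "python",
    }))
    add_pythonpath_to_kernel(kernel_json, "/path/to/project_a")
    add_pythonpath_to_kernel(kernel_json, "/path/to/project_b")
    reloaded = json.loads(kernel_json.read_text())
    print(reloaded["env"])
```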

The reason is that your MyPackage/__init__.py is run from the current working directory, i.e. from WorkingDirectory in this case. This means the interpreter cannot find the module named module1, since it is located in neither the current directory nor the global packages directory.
There are a few workarounds for this. For example, you can temporarily override the current working directory like this:
import os
cwd = os.getcwd()
csd = __path__[0]
os.chdir(csd)
and then, after all package initialization actions such as import module1 are done, restore the caller's working directory with os.chdir(cwd).
This is quite a bad approach in my view since, for example, if an exception is raised during initialization, the working directory would not be restored. You'd need a try...finally block to handle this.
Another approach is to use relative imports. Refer to the documentation for more details.
Here is an example of MyPackage/__init__.py that will work for your example:
from .module1 import *
But it has a few disadvantages that I found empirically rather than through the documentation. For example, you cannot write something like import .module1.
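The explicit relative form that does work is from . import module1. A minimal sketch with a throwaway package (MyPackageDemo is a hypothetical stand-in for MyPackage) showing that __init__.py can load a sibling module this way:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "MyPackageDemo"  # hypothetical stand-in for MyPackage
    pkg.mkdir()
    (pkg / "module1.py").write_text("def hello():\n    return 'hello from module1'\n")
    # Explicit relative import: "." means "the package this __init__.py belongs to".
    (pkg / "__init__.py").write_text("from . import module1\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    import MyPackageDemo

    print(MyPackageDemo.module1.hello())  # hello from module1
```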
Upd:
I've found that this exception is raised even when import MyPackage is run from a regular Python console, not only from IPython or Jupyter Notebook. So this does not seem to be an IPython issue.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
