How to install Python packages in Kubernetes pods - python

I have a custom Airflow image that has some Python packages baked into it, and I am running it in a local Kubernetes cluster. I have created a DAG in Airflow that uses one of those packages and it works totally fine.
But if I use some other Python package that is not in my custom base image (imageio, or any other package), I get a ModuleNotFoundError.
So I added the line "RUN pip3 install imageio==2.8.0" (or any other package) to my Dockerfile. First of all, it gave me the warning shown in the "python install warning" screenshot.
The import error is now gone, but if I run my DAG it fails without producing any logs.
So next I added the line "ENV PATH="/usr/local/airflow/.local/bin:${PATH}"", but the result is the same.
My guess is that my DAG cannot find the extra Python packages being installed; more precisely, the pods being created somehow don't have those extra packages in them.
However, if I do "docker run -it {image_name} bash" and type "import imageio" inside a Python shell, it works fine.
Is there some config file in which I need to list the extra Python packages I want, so that the pods running the DAGs will pick up those packages?
Or is there a way to specify it in the values.yaml file?
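For reference, the kind of Dockerfile change being described might be sketched as below. This is only a sketch: the base image name and the airflow user are assumptions about this setup, not details taken from the question.

```
# Hypothetical Dockerfile based on the steps described above.
# The base image name is an assumed placeholder.
FROM my-custom-airflow-base:latest

# Install the extra package system-wide (as root) so every process in the
# pod sees it; a non-root "pip3 install --user" would land under ~/.local.
USER root
RUN pip3 install imageio==2.8.0
USER airflow

# Only needed if packages were installed with --user for the airflow user.
ENV PATH="/usr/local/airflow/.local/bin:${PATH}"
```

Note that if the scheduler and the task pods run from different images, the same install has to be present in whichever image the pods executing the tasks actually use.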

Related

Airflow occasionally cannot find the local python module

I import my own Python modules via
pip install -e path
The webserver warns with "No module named" and cannot import the DAG directly.
However, after refreshing, the DAG sometimes imports successfully and sometimes fails.
If I execute it, the tasks work without error.
How can I solve this issue?
What you must do is put all your modules in a folder and then set the environment variable to the path of your modules, e.g.
- name: PYTHONPATH
value: "/opt/airflow/modules/"
Remember that your package folder must include an __init__.py file.
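The effect of that variable can be sketched in plain Python: entries in PYTHONPATH are prepended to sys.path at interpreter startup, which is what makes the modules folder importable. The path below is just the example value from above.

```python
import sys

# Simulate what setting PYTHONPATH=/opt/airflow/modules/ does: at startup the
# interpreter prepends each PYTHONPATH entry to sys.path, so packages in that
# folder (each with an __init__.py) become importable.
modules_dir = "/opt/airflow/modules/"
if modules_dir not in sys.path:
    sys.path.insert(0, modules_dir)

print(modules_dir in sys.path)  # prints True
```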
I finally figured out. Because I am using Jupyter Notebook, it creates .ipynb_checkpoints for me.
After I remove the folder, all dags work
Maybe you installed the module into a different Python interpreter than the one used by Airflow. To check the location, run:
airflow info
In the result, under the System info section, you'll find python_location. Copy its value and run:
<python_location> -m pip install <module>
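The same check can be done from Python itself: sys.executable is the interpreter currently running, and invoking pip through it guarantees the package lands in that interpreter's site-packages. A sketch; imageio is only an example package name.

```python
import subprocess
import sys

# The interpreter running this script -- on an Airflow worker this is the
# same value that "airflow info" reports as python_location.
print(sys.executable)

# Running pip as "<interpreter> -m pip" targets exactly this environment.
# Shown with --version here; swap in ["install", "imageio"] to install.
subprocess.check_call([sys.executable, "-m", "pip", "--version"])
```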

How do I install python dependency modules through bamboo

I am trying to run a Python program through Bamboo.
How do I install Python dependency modules through Bamboo?
I need to install some Python modules like flask, xlrd, etc.
You have two options:
Remote into (or log into) the Bamboo agent and manually install the modules. This is a one-time install, and they will then be there for tasks to use in the future.
Run the job using a Docker host instead of the local agent. Then you can specify all the dependencies in the Docker image that is used to build (e.g., Python version, imports).
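The second option might look like the sketch below: a build image that bakes the dependencies in, so every Bamboo run starts from the same environment. The file names and Python version here are assumptions, not details from the question.

```
# Hypothetical build image for the Bamboo Docker task.
FROM python:3.11-slim

# requirements.txt would list flask, xlrd, and any other modules the job needs.
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```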

root:OSError while attempting to symlink the latest log directory - Apache Airflow

I am using Windows 10, Python 3.6.4, and apache-airflow 1.10.2.
I am trying to use Apache Airflow to create a workflow for my data pipeline. When I import airflow (the first time in a new session), I keep running into the following error:
import airflow
**WARNING:root:OSError while attempting to symlink the latest log directory**
import airflow
(No Error)
When I try to import airflow again, I do not get this error. I checked my config file and it is pointing to the right directory (C:\Users\user\airflow, where my logs folder is).
As for solving this error by creating the symbolic link to the log folder manually, I am not sure how to approach that, or whether I should even do it.
I also created environment variables for my airflow folder. Please let me know if more details are needed!
Airflow does not work on Windows directly. Instead, you can run it from a Linux virtual machine, Windows Subsystem for Linux (WSL), or Docker container.
To run it from a virtual machine, you can download VirtualBox along with the latest ubuntu-desktop ISO file, create a virtual machine, and install Airflow on it.

python script on azure web jobs - No module named request

I need to run Python scripts as Azure WebJobs, but I am getting the error below. I tried all the possible approaches, like scripts with virtualenv and appending the path, but none of them is working.
[10/08/2018 11:27:27 > ca6024: ERR ] ImportError: No module named request
Can you please help me fix it?
The script used in the file is:
import urllib.request
print('success')
According to
https://docs.python.org/2/library/urllib.html
you should check your Python version; the urllib API is different between Python 2 and Python 3.
In Python 2.7, use:
urllib.urlopen()
instead of:
urllib.request.urlopen()
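A script that has to run under either interpreter can guard the import on the running version. This is only a sketch; urlopen is the one name assumed to be needed.

```python
import sys

# urllib was reorganized in Python 3: urllib.request.urlopen replaced
# the Python 2 urllib.urlopen.
if sys.version_info[0] >= 3:
    from urllib.request import urlopen
else:
    from urllib import urlopen  # Python 2 layout

print('success')
```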
Please refer to the steps below, which are how I uploaded a Python script into WebJobs previously.
1. Use the virtualenv package to create an independent Python runtime environment on your system. If you don't have it, install it first with the command pip install virtualenv.
If you installed it successfully, you will see it in your python/Scripts folder.
2. Run the command to create the independent Python runtime environment.
3. Go into the created directory's Scripts folder and activate it (this step is important, don't miss it).
Do not close this command window; use pip install <your libraryname> in it to download the external libraries, such as pip install requests.
4. Compress Sample.py into a zip folder together with the libraries it depends on from the virtualenv's Lib/site-packages folder.
5. Create a WebJob in the Web App service and upload the zip file; then you can execute your WebJob and check the log.
You could also refer to this SO thread: Options for running Python scripts in Azure

Azure functions: Installing Python modules and extensions on consumption plan

I am trying to run a Python script with Azure Functions.
I had success updating the Python version and installing modules on Azure Functions under the App Service plan, but I need to use the Consumption plan, as my script will only execute once a day and for only a few minutes, so I want to pay only for the execution time. See: https://azure.microsoft.com/en-au/services/functions/
Now, I'm still new to this, but from my understanding the Consumption plan spins up the VM and terminates it after your script has executed, unlike the App Service plan, which is always on.
I am not sure why this would mean that I can't install anything on it. I thought it would just mean I have to install it every time it spins up.
I have tried installing modules through the Python script itself and through the Kudu command line, with no success.
Under the App Service plan it was simple, following this tutorial: https://prmadi.com/running-python-code-on-azure-functions-app/
On the Functions Consumption plan, Kudu extensions are not available. However, you can update pip to be able to install all your dependencies correctly:
Create your Python script on Functions (let's say NameOfMyFunction/run.py)
Open a Kudu console
Go to the folder of your script (should be d:/home/site/wwwroot/NameOfMyFunction)
Create a virtualenv in this folder (python -m virtualenv myvenv)
Load this venv (cd myvenv/Scripts and call activate.bat)
Your shell should now be prefixed by (myvenv)
Update pip (python -m pip install -U pip)
Install what you need (python -m pip install flask)
Now, in the Azure portal, in your script, update sys.path to add this venv:
import sys, os.path
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), 'myvenv/Lib/site-packages')))
You should be able to start what you want now.
(Reference: https://github.com/Azure/azure-sdk-for-python/issues/1044)
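The sys.path tweak above can be wrapped in a small helper that fails loudly if the venv folder is missing, instead of silently importing from the wrong place. This is a hypothetical function sketching the same idea, not part of any Azure SDK.

```python
import os.path
import sys

def add_venv_site_packages(base_dir, venv_name="myvenv"):
    """Append <base_dir>/<venv_name>/Lib/site-packages to sys.path.

    Raises FileNotFoundError when the virtualenv has not been created,
    so missing dependencies surface immediately rather than at import time.
    """
    candidate = os.path.abspath(
        os.path.join(base_dir, venv_name, "Lib", "site-packages"))
    if not os.path.isdir(candidate):
        raise FileNotFoundError(candidate)
    if candidate not in sys.path:
        sys.path.append(candidate)
    return candidate
```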
Edit: reading the previous comment, it seems you need numpy. I just tested right now and was able to install 1.12.1 with no issues.
You may upload the modules for the Python version of your choice on the Consumption plan. Kindly refer to the instructions at this link: https://github.com/Azure/azure-webjobs-sdk-script/wiki/Using-a-custom-version-of-Python
This is what worked for me:
Disclaimer: I use a C# Function that includes Python script execution, via the command line with the System.Diagnostics.Process class.
Add the relevant Python extension for the Azure Function from the Azure portal:
Platform Features -> Development Tools -> Extensions
It installed Python to D:\home\python364x86 (as seen from the Kudu console).
Add an application setting called WEBSITE_USE_PLACEHOLDER and set its value to 0. This is necessary to work around an Azure Functions issue that causes the Python extension to stop working after the function app is unloaded.
See the Using Python 3 in Azure Functions question.
Install the packages from the Kudu CMD console using pip install ...
(in my case it was pip install pandas)
