Dask Worker can't find module on AWS - python

So I have a distributed setup of Dask, with the scheduler running in one container and a worker running in another. I have a similar setup on AWS, where the scheduler runs on an EC2 instance and the worker runs in a Docker container on another EC2 instance.
I want a Python file to be available to the worker. I don't want to install it as a dependency on the worker just yet; instead I want to manually copy the file to the worker so that it's available in the Python environment the worker uses. To achieve this, I added this to the Dockerfile:
# syntax=docker/dockerfile:experimental
FROM daskdev/dask:2020.12.0
WORKDIR /src/
COPY ./python_file.py /src/python_file.py
Basically I want the dask worker to be able to run a method inside python_file.py. So I submit the method like:
client.submit(python_file.some_method, arg1, arg2)
This works fine on my local setup of dask and the worker is able to deserialize this call and run the method. Somehow this doesn't work on the AWS setup. The worker keeps complaining:
ModuleNotFoundError: No module named 'python_file'
To debug:
I logged into the EC2 machine and I see that the container is alive.
I entered the container and I see that the file also exists where I want it to (exactly like my local).
I ran python and tried importing the module and that works too.
I ran `pickle.loads(b'\x80\x04\x95#\x00\x00\x00\x00\x00\x00\x00\x8c\python_file\x94\x8c\x0esome_method\x94\x93\x94.')` and that returns the desired method.
If all this works, what else could be the reason that the worker still complains of the module not existing? Has anybody else faced something similar?

This sounds like a PYTHONPATH issue - on your local machine the Python file is probably in the current working directory (which is on your PYTHONPATH). Can you confirm that /src is on the worker's PYTHONPATH? If not, I would add it as an ENV in your Docker image.
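If it helps, here is a minimal sketch of how to check this from the client side, assuming a dask.distributed Client connected to your scheduler (the address below is a placeholder):

import sys
from dask.distributed import Client

client = Client("tcp://<scheduler-address>:8786")  # placeholder scheduler address

# Ask every worker to report its sys.path, so you can see whether /src is on it.
print(client.run(lambda: sys.path))

If /src is missing from the workers' sys.path, adding something like ENV PYTHONPATH="/src:${PYTHONPATH}" to the worker image's Dockerfile should make the import resolvable.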

Related

Installing packages in a Kubernetes Pod

I am experimenting with running Jenkins on a Kubernetes cluster. I have managed to run Jenkins on the cluster using a Helm chart. However, I'm unable to run any test cases, since my code base requires Python and MongoDB.
In my Jenkinsfile, I have tried the following:
withPythonEnv('python3.9') {
    pysh 'pip3 install pytest'
}
stage('Test') {
    sh 'python --version'
}
But it says java.io.IOException: error=2, No such file or directory.
It is not feasible to always run the Python install command hardcoded into the Jenkinsfile. After some research I found out that I have to tell Kubernetes to install Python while the pod is being provisioned, but there seems to be no PreStart lifecycle hook for pods; there are only PostStart and PreStop.
I'm not sure how to install Python and MongoDB and use that as a template for the Kubernetes pods.
This is the default YAML file that I used for the helm chart - jenkins-values.yaml
Also I'm not sure if I need to use helm.
You should create a new container image with the packages installed. In this case, the Dockerfile could look something like this:
FROM jenkins/jenkins
USER root
RUN apt-get update && apt-get install -y appname
USER jenkins
Then build the container image, push it to a container registry, and replace the image: jenkins/jenkins reference in your Helm chart with the registry path and name of the image you built. With this, your applications are installed in the container every time it runs.
The second way, which works but isn't ideal, is to run the install commands through the container's startup command, along the lines of what is described here:
https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/
The issue with this method is that some deployments already rely on their startup command, and by redefining the entrypoint you can prevent the container's original command from ever running, causing the container to fail.
(This should work if added to the helm chart in the deployment section, as they should share roughly the same format)
Otherwise, there's a really improper way of installing programs in a running pod - use kubectl exec -it deployment.apps/jenkins -- bash, then run your installation commands in the pod itself.
That being said, it's a poor idea to do this, because if the pod restarts it will revert to the original image without the required applications installed. If you build a new container image instead, your apps remain installed each time the pod restarts. The exec approach should basically never be used, unless it's a temporary pod serving as a testing environment.

Using Pycharm debugger with Airflow running locally

I am using Airflow locally to execute some ETL tasks (everything is done locally using Airflow, Python and Docker), and I have a task which failed.
It would be great if I could use the PyCharm debugger. I am looking for a way for PyCharm to listen to what is happening in Airflow (localhost/airflow), so that once I run a task in Airflow I only need to jump to PyCharm to start debugging and see the logs.
I have read about the remote debug server, but in all the tutorials I've seen, people run their program from PyCharm with a main function inside the file.
What I want is to launch my task through Airflow and then jump to PyCharm to see the logs and start debugging.
So I started something, but I am not sure if this is the right way. When I try to add a remote interpreter to my project (Preferences > Interpreter > Add > Docker Compose), here is what I get:
[screenshot: PyCharm "Add Python Interpreter" dialog for Docker Compose]
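A minimal sketch of the remote-debug-server approach mentioned above, assuming PyCharm Professional's pydevd-pycharm package is installed in the container that runs the task; the host name, port and callable name below are placeholders that must match a "Python Debug Server" run configuration in PyCharm:

import pydevd_pycharm

def my_etl_task(**context):  # hypothetical Airflow task callable
    # Connect back to PyCharm; execution pauses here until the debugger attaches.
    pydevd_pycharm.settrace(
        "host.docker.internal",  # placeholder: host where PyCharm is running
        port=5678,               # placeholder: port of the Python Debug Server
        stdoutToServer=True,
        stderrToServer=True,
    )
    # ... the rest of the task logic then runs under the debugger ...

With this in place, triggering the task from the Airflow UI should drop you into the PyCharm debugger instead of requiring the program to be started from PyCharm.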

Run Python Code on SSH Target using Airflow

There are 2 systems: A and B. The Airflow scheduler, webserver, Redis and Flower run on A, while an Airflow worker runs on B. Both systems run Ubuntu 18.04 and use Airflow 1.10.10 in Docker containers.
Is it possible to create a DAG that remotely runs Python code (defined in that DAG) on B?
SSHOperator allows the remote execution of a bash command on B over SSH, but we require a remote execution of Python code over SSH instead.
Thank you!
I don't know if you've gotten your answer already, but I had a very similar (if not the very same) problem until just a few moments ago and thought I could provide the answer here.
The easiest way is to mount a shared folder onto both nodes so they can both access the actual physical DAG files.
More details about my case can be found here.
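For illustration, a minimal sketch of such a DAG, assuming the CeleryExecutor and a worker on B listening on a dedicated queue (the DAG id, queue name and callable below are placeholders, not from the original setup); once the DAG folder is shared between A and B, the Python callable defined here executes on B:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10.x import path


def run_on_b():
    # Plain Python code; with the CeleryExecutor this runs inside the
    # worker process on B once the task is routed to B's queue.
    print("Hello from system B")


dag = DAG(
    dag_id="remote_python_example",  # placeholder name
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
)

run_python_on_b = PythonOperator(
    task_id="run_python_on_b",
    python_callable=run_on_b,
    queue="worker_b",  # placeholder: a queue that only B's worker consumes
    dag=dag,
)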

One-time running Python code when AWS EC2 instance start

I have two EC2 instances named "aws-example" and "aws-sandbox". At the same time, they are Docker machines: "aws-example" is a manager and "aws-sandbox" is a worker in a Docker swarm.
I wrote two Python scripts. When I run the script on "aws-example", it stops the "aws-sandbox" instance and starts it again.
When I run the script on "aws-sandbox", the worker leaves the swarm and joins again.
Right now I do all of this by hand, but I have to automate it. How do I run the Python script on "aws-sandbox" once, when the "aws-sandbox" instance starts? I've investigated services like AWS Lambda and CloudWatch, and I'm very confused. Does anyone have a clear pathway?
Make use of @reboot /path/to/script.py in your crontab; it should work.
For more info check this out.

How do you debug python code with kubernetes and skaffold?

I am currently running a Django app under Python 3 on Kubernetes via skaffold dev. I have hot reload working for the Python source code. Is it currently possible to do interactive debugging with Python on Kubernetes?
For example,
def index(request):
    import pdb; pdb.set_trace()
    return render(request, 'index.html', {})
Usually, outside a container, hitting the endpoint will drop me in the (pdb) shell.
In the current setup, I have set stdin and tty to true in the Deployment file. The code does stop at the breakpoint but it doesn't give me access to the (pdb) shell.
There is a kubectl command that allows you to attach to a running container in a pod:
kubectl attach <pod-name> -c <container-name> [-n namespace] -i -t
-i (default:false) Pass stdin to the container
-t (default:false) Stdin is a TTY
It should allow you to interact with the debugger in the container.
Probably you may need to adjust your pod to use a debugger, so the following article might be helpful:
How to use PDB inside a docker container.
There is also the Telepresence tool, which offers a different approach to application debugging:
Using telepresence allows you to use custom tools, such as a debugger and IDE, for a local service and provides the service full access to ConfigMap, secrets, and the services running on the remote cluster.
Use the --swap-deployment option to swap an existing deployment with the Telepresence proxy. Swapping allows you to run a service locally and connect to the remote Kubernetes cluster. The services in the remote cluster can now access the locally running instance.
It might be worth looking into Rookout which allows in-prod live debugging of Python on Kubernetes pods without restarts or redeploys. You lose path-forcing etc but you gain loads of flexibility for effectively simulating breakpoint-type stack traces on the fly.
This doesn't use Skaffold, but you can attach the VSCode debugger to any running Python pod with an open source project I wrote.
There is some setup involved to install it on your cluster, but after installation you can debug any pod with one command:
robusta playbooks trigger python_debugger name=myapp namespace=default
You can take a look at okteto/okteto. There's a good tutorial which explains how you can develop and debug directly on Kubernetes.
