I need a bit of help with deploying Dagster projects on AWS; unfortunately I couldn't find this in the official documentation.
A bit of context: the simple solids and pipelines using repo.py work perfectly fine. The problem starts to occur in AWS when I move the solids and pipelines into a new project directory. The objective is not to use repo.py to trigger pipelines (here is the example I am referring to). Line 65 of the docker-compose.yaml file in that example uses this command:
dagster api grpc -h 0.0.0.0 -p 4000 -f repo.py
and the same command goes into our AWS infrastructure to trigger pipelines. Instead, what I am looking for is to utilise the workspace.yaml file (where I can add multiple Python packages).
So, does anyone think a command like this could work? (Presently there is no '-w' parameter on dagster api.)
dagster api grpc -h 0.0.0.0 -p 4000 -w workspace.yaml
If not, then another idea is to use a module instead of repo.py in the main directory. Dagit works really well with a module:
dagit -m project-01
but is this possible with dagster as well? The command would then become dagster api grpc -h 0.0.0.0 -p 4000 -m project-01 (presently it throws an error that project-01 doesn't exist).
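For context, the workspace.yaml I have in mind would look roughly like this (the package names are placeholders for my real projects):

```yaml
load_from:
  - python_package: project_01
  - python_package: project_02
  - python_file: repo.py
```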
I am experimenting with running Jenkins on a Kubernetes cluster. I have achieved running Jenkins on the cluster using the Helm chart. However, I'm unable to run any test cases, since my codebase requires Python and MongoDB.
In my Jenkinsfile, I have tried the following:
1.
withPythonEnv('python3.9') {
    pysh 'pip3 install pytest'
}
stage('Test') {
    sh 'python --version'
}
But it says java.io.IOException: error=2, No such file or directory.
It is not feasible to always run the Python install command hardcoded into the Jenkinsfile. After some research I found out that I would have to tell Kubernetes to install Python while the pod is being provisioned, but there seems to be no PreStart hook/lifecycle for pods; there are only PostStart and PreStop.
I'm not sure how to install Python and MongoDB and use that as a template for the kube pods.
This is the default YAML file that I used for the Helm chart - jenkins-values.yaml
Also, I'm not sure if I need to use Helm at all.
You should create a new container image with the packages installed. In this case, the Dockerfile could look something like this:
FROM jenkins/jenkins
USER root
RUN apt-get update && apt-get install -y appname
USER jenkins
Then build the image, push it to a container registry, and replace "image: jenkins/jenkins" in your Helm chart with the name of the image you built plus the registry you uploaded it to. With this, your applications are installed in your container every time it runs.
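The build-and-push step might look like this (the image name and registry are placeholders; substitute your own):

```shell
# Build the custom Jenkins image from the Dockerfile above, then push it
# to your registry so the cluster can pull it.
docker build -t registry.example.com/custom-jenkins:1.0 .
docker push registry.example.com/custom-jenkins:1.0
```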
The second way, which works but isn't perfect, is to override the container's startup command, with something like what is described here:
https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/
The issue with this method is that some deployments already rely on their startup command, and by redefining the entrypoint you can stop the container's original start command from ever running, causing the container to fail.
(This should work if added to the helm chart in the deployment section, as they should share roughly the same format)
Otherwise, there's a really improper way of installing programs in a running pod: use kubectl exec -it deployment.apps/jenkins -- bash, then run your installation commands in the pod itself.
That being said, this is a poor idea, because if the pod restarts it reverts to the original image without the required applications installed. If you build a new container image instead, your apps remain installed each time the pod restarts. The exec approach should basically never be used, unless it is a temporary pod acting as a testing environment.
I'm looking for the OS version (such as Ubuntu 20.04.1 LTS) of the server that my Kubernetes containers run on. That is, I need the OS of the server on which I have Kubernetes with a number of pods (and containers).
I saw there is a library called "kubernetes", but I didn't find any relevant info on this specific subject.
Is there a way to get this info with Python?
Many thanks for the help!
If you need to get the OS version of a running container, you should read
https://kubernetes.io/docs/tasks/debug/debug-application/get-shell-running-container/
As described there, you can get access to your running pod with this command:
kubectl exec --stdin --tty <pod_name> -- /bin/bash
Then just run cat /etc/os-release and you will see the OS info your pod is running on. In most cases containers run on Unix-like systems, so this will show the current pod's OS.
You can also install Python or anything else inside your pod, but I do not recommend it. Containers carry the minimum needed to make your app work. For a quick check it is fine, but after that just deploy a new container.
Alternatively, use the info of the node the pod is running on, via kubectl. In the command below, replace <PODNAME> with your pod name.
kubectl get node $(kubectl get pod <PODNAME> -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.nodeInfo.osImage}'
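Since the question asks for Python: the official `kubernetes` client library (`pip install kubernetes`) exposes the same nodeInfo.osImage field. A minimal sketch (the helper function name is mine; the main block assumes a reachable cluster and a valid kubeconfig):

```python
def node_os_images(v1):
    """Return {node_name: os_image} for every node the API server reports."""
    return {
        node.metadata.name: node.status.node_info.os_image
        for node in v1.list_node().items
    }

if __name__ == "__main__":
    # Requires `pip install kubernetes` and access to a cluster.
    from kubernetes import client, config
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    for name, os_image in node_os_images(client.CoreV1Api()).items():
        print(f"{name}: {os_image}")
```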
I'm using nginx to serve some of my docs. I have a python script that processes these docs for me. I don't want to pre-process the docs and then add them in before the docker container is built since these docs can grow to be pretty big and they increase in number. What I want is to run my python (and bash) scripts inside the nginx container and have nginx just serve those docs. Is there a way to do this without pre-processing the docs before building the container?
I've attempted to execute RUN python3 process_docs.py, but I keep seeing the following error:
/bin/sh: 1: python: not found
The command '/bin/sh -c python process_docs.py' returned a non-zero code: 127
Is there a way to get python3 onto the Nginx docker container? I was thinking of installing python3 using:
apt-get update -y
apt-get install python3.6 -y
but I'm not sure that this would be good practice. Please let me know the best way to run my pre-processing script.
You can use a bind mount to inject data from your host system into the container. This will automatically update itself when the host data changes. If you're running this in Docker Compose, the syntax looks like
version: '3.8'
services:
  nginx:
    image: nginx
    volumes:
      - ./html:/usr/share/nginx/html
      - ./data:/usr/share/nginx/html/data
    ports:
      - '8000:80' # access via http://localhost:8000
In this sample setup, the html directory holds your static assets (checked into source control) and the data directory holds the generated data. You can regenerate the data from the host, outside Docker, the same way you would if Docker weren't involved:
# on the host
. venv/bin/activate
./regenerate_data.py --output-directory ./data
You should not need docker exec in normal operation, though it can be extremely useful as a debugging tool. Conceptually it might help to think of a container as identical to a process; if you ask "can I run this Python script inside the Nginx process", no, you should generally run it somewhere else.
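As a sketch of that host-side step, a stand-in processing script could look like this (the upper-casing transform, the `process_docs` helper, and the directory names are placeholders for your real pipeline):

```python
from pathlib import Path

def process_docs(src_dir, out_dir):
    """Render each .txt doc from src_dir into out_dir.

    Upper-casing stands in for whatever processing the real script does.
    Returns the list of file names written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in sorted(Path(src_dir).glob("*.txt")):
        (out / doc.name).write_text(doc.read_text().upper())
        written.append(doc.name)
    return written

if __name__ == "__main__":
    # Write into ./data, which the compose file bind-mounts into the container.
    if Path("./docs").is_dir():
        print(process_docs("./docs", "./data"))
```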
I am currently running a Django app under Python 3 on Kubernetes via skaffold dev. Hot reload is working with the Python source code. Is it currently possible to do interactive debugging with Python on Kubernetes?
For example,
def index(request):
    import pdb; pdb.set_trace()
    return render(request, 'index.html', {})
Usually, outside a container, hitting the endpoint will drop me in the (pdb) shell.
In the current setup, I have set stdin and tty to true in the Deployment file. The code does stop at the breakpoint but it doesn't give me access to the (pdb) shell.
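For reference, the relevant part of my Deployment looks roughly like this (the container and image names are placeholders):

```yaml
spec:
  template:
    spec:
      containers:
        - name: django
          image: my-django-app   # placeholder
          stdin: true
          tty: true
```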
There is a kubectl command that allows you to attach to a running container in a pod:
kubectl attach <pod-name> -c <container-name> [-n namespace] -i -t
-i (default:false) Pass stdin to the container
-t (default:false) Stdin is a TTY
It should allow you to interact with the debugger in the container.
You may need to adjust your pod to use a debugger, so the following article might be helpful:
How to use PDB inside a docker container.
There is also the telepresence tool, which enables a different approach to application debugging:
Using telepresence allows you to use custom tools, such as a debugger and IDE, for a local service and provides the service full access to ConfigMap, secrets, and the services running on the remote cluster.
Use the --swap-deployment option to swap an existing deployment with the Telepresence proxy. Swapping allows you to run a service locally and connect to the remote Kubernetes cluster. The services in the remote cluster can now access the locally running instance.
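An invocation might look like this (Telepresence 1.x syntax; the deployment name and the local command to run are examples):

```shell
# Swap the remote "myapp" Deployment for a local proxy, then run the Django
# dev server locally with full access to the cluster's services.
telepresence --swap-deployment myapp --run python3 manage.py runserver 0.0.0.0:8000
```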
It might be worth looking into Rookout which allows in-prod live debugging of Python on Kubernetes pods without restarts or redeploys. You lose path-forcing etc but you gain loads of flexibility for effectively simulating breakpoint-type stack traces on the fly.
This doesn't use Skaffold, but you can attach the VSCode debugger to any running Python pod with an open source project I wrote.
There is some setup involved to install it on your cluster, but after installation you can debug any pod with one command:
robusta playbooks trigger python_debugger name=myapp namespace=default
You can take a look at okteto/okteto. There's a good tutorial which explains how you can develop and debug directly on Kubernetes.
I currently have an instance of scrapyd up and running locally on my machine. This instance needs to be available to other PCs on my employer's network. I've read about scrapy-cloud (https://doc.scrapinghub.com/scrapy-cloud.html) and other cloud-based services. However, I'd much rather host scrapyd on our own network, since the spiders I've built pull data from CSV files stored on our servers.
I've searched through the scrapyd documentation (https://scrapyd.readthedocs.io/en/stable/) and understand how to install and run scrapyd. I am also comfortable with uploading scrapy projects to scrapyd and running specific spiders.
What steps do I need to take in order to make my scrapyd instance available to other machines on our network? All of our PCs and servers run Windows.
The answer doesn't need to be a specific step by step guide. I'm just looking for someone to point me in the right direction, because I am unsure how to proceed.
If the machines are on a LAN in the same IP range, you can follow the manual. First check your IP address:
ifconfig on Linux
ipconfig on Windows
(Also note that scrapyd binds to 127.0.0.1 by default; set bind_address = 0.0.0.0 in your scrapyd configuration so other machines can reach it.)
Then run the commands from the manual, for example:
curl http://localhost:6800/addversion.json -F project=myproject -F version=r23 -F egg=@myproject.egg
replacing localhost with your IP address. For example, if your IP is 192.168.1.10, then on another PC you would run:
curl http://192.168.1.10:6800/addversion.json -F project=myproject -F version=r23 -F egg=@myproject.egg
You need to open the port if you use a firewall, and if you don't have cURL on Windows, you can download and install it:
How do I install/set up and use cURL on Windows?
For more information about the API, check the manual.
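Once scrapyd is reachable over the network, you can also drive its API from Python instead of cURL. A sketch using the requests library against scrapyd's schedule.json endpoint (the host IP and spider name are examples, and the helper functions are mine):

```python
def scrapyd_schedule_request(host, project, spider, port=6800):
    """Build the URL and form payload for scrapyd's schedule.json endpoint."""
    url = f"http://{host}:{port}/schedule.json"
    payload = {"project": project, "spider": spider}
    return url, payload

def schedule(host, project, spider):
    """POST the scheduling request and return scrapyd's JSON response."""
    import requests  # pip install requests
    url, payload = scrapyd_schedule_request(host, project, spider)
    return requests.post(url, data=payload, timeout=10).json()
```

Calling schedule("192.168.1.10", "myproject", "myspider") from any PC on the network should return scrapyd's JSON response with a status and job id.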