Getting data in Kubernetes pod before the container starts - python

I have a python program that can control a Kubernetes cluster (from outside). During that program execution, it obtains a byte array.
I have a full pod spec ready to be created.
I need to modify the pod spec (adding an init container) so that when the main container starts, there is a file somewhere with those exact bytes.
What's the easiest way to do that?

If I understand your question correctly, you want to run a Python script that will extract or derive a byte array from somewhere before your Pod starts, and write this byte array to a file for your actual application to read when running inside the Pod.
I can see 2 ways to achieve this:
Slightly modify your Docker image to run your script as an entrypoint and then run your application (command: and args: in your Pod spec). You would ship both together and wouldn't need an initContainer.
Or, as you were leaning towards: use a combination of an initContainer and volumes.
For the latter:
template:
  spec:
    volumes:
    - name: byte-array
      emptyDir: {}
    initContainers:
    - name: byte-array-generator
      image: your/init-image:latest
      command: ["/usr/bin/python", "byte_array_generator.py"]
      volumeMounts:
      - mountPath: /my/byte-array/
        name: byte-array
    containers:
    - name: application
      image: your/actual-app:latest
      volumeMounts:
      - name: byte-array
        mountPath: /byte-array/
This condenses all 3 parts:
1 emptyDir volume definition used to pass the file over
1 initContainer running your script, which generates the byte array and writes it to disk at, say, /my/byte-array/bytearray.bin (where the volume has been mounted)
1 actual container running your application and reading the byte array from /byte-array/bytearray.bin (where the volume has been mounted)
One important note you also need to take into consideration: if you mount your volume on a pre-existing folder that contains files, they will all be hidden by your volume. The source takes the destination's place.
You might be able to prevent that using subPath, but I have never tried it this way; I only know it works when mounting ConfigMaps as volumes.
Edit: the answer to your comment is too long for a comment, so here it is.
Outside the container or outside the Kubernetes cluster? Maybe there is a misconception, so just in case: an initContainer doesn't have to use the same image as your Pod's main container. You could even load your script as a ConfigMap and mount it into an initContainer running a Python base image, if you want...
But if your script really has to run outside the cluster and send a file enabling your Pod to start, I'd suggest adding logic to your byte generation that writes the output to a file named after the Pod's hostname, for example (pulled from the Kubernetes API), and scp it to the Kubernetes node running that Pod (also pulled from the Kubernetes API) into a known destination. Just define a folder on each of your nodes, like /var/data/your_app/, and mount it on all your pods.
volumes:
- hostPath:
    path: /var/data/your_app
    type: Directory
  name: bite-arrays
Then mount bite-arrays wherever you want in whatever container needs to read it, reusing the Pod's hostname (to allow you to scale if necessary).
Since you said your script is controlling the cluster, I assume it's already talking to Kubernetes' API... You might also want to add some logic to clean up left-overs...
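For that approach, a minimal sketch of finding which node a pod landed on, assuming the official kubernetes Python client (the pod and namespace names are illustrative), so you know where to scp the file:
from kubernetes import client, config

config.load_kube_config()                 # the script runs outside the cluster
v1 = client.CoreV1Api()

pod = v1.read_namespaced_pod(name="your-app-pod-xyz", namespace="default")
node_name = pod.spec.node_name            # node to scp the byte file to
print(f"Pod is scheduled on {node_name}")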
Or maybe we got it all wrong and your script is also generating and applying the Pod spec on the fly, in which case it could just be solved by an environment variable or a ConfigMap shipped alongside...
Pod spec:
volumes:
- name: byte-array
  configMap:
    name: your-app-bytes
volumeMounts:
- name: byte-array
  mountPath: /data/your-app/byte-array
  readOnly: true
  subPath: byte-array
ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: your-app-bytes
data:
  byte-array: |-
    WHATEVERBYTESAREGENERATEDHERE
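If your script is indeed the one applying the spec, here is a minimal sketch of creating that ConfigMap from Python, assuming the official kubernetes client (the namespace and key names are illustrative). Note that the data field only holds UTF-8 strings; arbitrary bytes have to go into binaryData (or a Secret):
import base64
from kubernetes import client, config

config.load_kube_config()            # script runs outside the cluster
v1 = client.CoreV1Api()

byte_array = b"..."                  # whatever your program produced

# binaryData values must be base64-encoded strings in the API request
cm = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(name="your-app-bytes"),
    binary_data={"byte-array": base64.b64encode(byte_array).decode()},
)
v1.create_namespaced_config_map(namespace="default", body=cm)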

This is more of an opinion question/answer.
If it happens that your Python script generates specific bytes, I would go with an initContainer and a volume. Something like this:
initContainers:
- name: init-container
  image: container:version
  command: [ 'your-python.py' ]
  volumeMounts:
  - name: conf
    mountPath: /mnt/conf.d
containers:
- name: app-container
  image: container:version
  command: [ 'your-actual-app' ]
  volumeMounts:
  - name: conf
    mountPath: /mnt/conf.d
If your bytes are straight UTF-8 characters, for example, it's easier to just use a ConfigMap.


How to detect the current node where a pod is running in python

So I am not a good coder in Python or a Kubernetes expert, but I have a project that needs to do this:
In Python, I want to connect to the BMC (iLO interface of a bare-metal node) to get some hardware info.
My goal is to create a DaemonSet so the code can run on every node of the K8s cluster and retrieve some hardware info. Now, I need the code to detect which node the daemon is currently running on, so I can use this as a way to connect to the node's BMC interface with some API calls (e.g., if the detected node is node1.domain.com, I can then check node1.bmc.domain.com).
If my question is not clear enough, please let me know. If you can give me some code sample that could achieve this, it would be very appreciated :)
Thanks!
Right now, in Python I only have a way to connect to the K8s API and get the list of nodes of a cluster, but I have not found a way to detect, while running as a pod, which node the pod is currently running on. I found some info here https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CoreV1Api.md#read_namespaced_pod but I am not sure how to combine running the code in a pod with getting the pod's own info.
I also saw this: how to get the host name of the node where a POD is running from within POD, but I am not sure whether I have to add something to the pod spec or whether the info already comes as an environment variable in a pod.
You can use the downward API to expose pod details to a specific container, as below:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-daemon-set
  namespace: my-namespace
spec:
  selector:
    matchLabels:
      name: app-name
  template:
    metadata:
      labels:
        name: app-name
    spec:
      containers:
      - name: my-image-name
        image: my-image:v1
        env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
More info in Expose Pod Information to Containers Through Environment Variables
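A minimal sketch of consuming that variable from the Python code inside the DaemonSet pod; the BMC naming below just mirrors the node1.domain.com -> node1.bmc.domain.com pattern from the question and should be adjusted to your DNS scheme:
import os

node_fqdn = os.environ["MY_NODE_NAME"]    # e.g. "node1.domain.com"
short_name, _, domain = node_fqdn.partition(".")
bmc_host = f"{short_name}.bmc.{domain}"   # e.g. "node1.bmc.domain.com"
print(f"Querying BMC at {bmc_host}")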

Environment variables: Pod vs Container. Trying to access Container envvar with Python os.getenv

I have deployed a Pod with several containers. In my Pod I have certain environment variables that I can access in a Python script with os.getenv(). However, if I try to use os.getenv to access the Container's environment variables, I get an error stating they don't exist (NoneType). When I run kubectl describe pod <POD_Name> I see that all the environment variables (both Pod and Container) are set.
Any ideas?
The issue was with Helm tests. In order to access the containers' environment variables from a Helm test, the environment variables need to be duplicated in the test.yaml file or injected from a shared ConfigMap.
To complement your answer, I would like to add a little theory.
See this documentation about ConfigMaps.
A ConfigMap is an API object used to store non-confidential data in key-value pairs. Pods can consume ConfigMaps as environment variables, command-line arguments, or as configuration files in a volume.
There one can also find an example of a Pod that uses values from a ConfigMap to configure it:
env:
  # Define the environment variable
  - name: PLAYER_INITIAL_LIVES # Notice that the case is different here
                               # from the key name in the ConfigMap.
    valueFrom:
      configMapKeyRef:
        name: game-demo           # The ConfigMap this value comes from.
        key: player_initial_lives # The key to fetch.
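For completeness, a small sketch of reading that variable from Python: os.getenv returns None (the NoneType from the question) whenever the variable is not defined for the specific container the code runs in, so it helps to fail loudly:
import os

lives = os.getenv("PLAYER_INITIAL_LIVES")
if lives is None:
    raise RuntimeError("PLAYER_INITIAL_LIVES is not set in this container")
print(f"Initial lives: {lives}")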

AWS ECS cli command equivalent in boto3

I'm trying to port to Python an ECS service deployment that at the moment is done with a bunch of bash scripts containing commands like the following:
ecs-cli compose -f foo.yml -p foo --cluster bar --ecs-params "dir/ecs-params.yml" service up
I thought that the easiest/fastest way could be using boto3 (which I already use extensively elsewhere, so it's a safe spot), but I couldn't tell from the documentation which calls would be the equivalent of the command above.
Thanks in advance.
UPDATE: this is the content of foo.yml:
version: '3'
services:
  my-service:
    image: ecr-image:version
    env_file:
      - ./some_envs.env
      - ./more_envs.env
    command: python3 src/main.py param1 param2
    logging:
      driver: awslogs
      options:
        awslogs-group: /my-service-log-group
        awslogs-region: my-region
        awslogs-stream-prefix: my-prefix
UPDATE2: this is the content of dir/ecs-params.yml:
version: 1
task_definition:
  task_role_arn: my-role
  services:
    my-service:
      cpu_shares: my-cpu-shares
      mem_reservation: my-mem-reservation
The ecs-cli is a high-level construct that creates a workflow wrapping many lower-level API calls. It is NOT the same thing, but you can think of the ecs-cli compose up command as the trigger to deploy what's described in your foo.yml file. Based on what's in your foo.yml file you can walk backwards and try to map it to single atomic ECS API calls.
None of this answers your question, but for background, the ecs-cli is no longer what we suggest using for deploying on ECS. Its evolution is Copilot (if you are not starting from a docker compose story) OR the new docker compose integration with ECS (if docker compose is your jam).
If you want / can post the content of your foo.yml file I can take a stab at how many lower level API calls you'd need to make to do the same (or suggest some other alternatives).
[UPDATE]
Based on the content of your two files you could try this one docker compose file:
services:
  my-service:
    image: ecr-image:version
    env_file:
      - ./some_envs.env
      - ./more_envs.env
    x-aws-policies:
      - <my-role>
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 2048M
Some of the ECS params are interpreted from the compose spec (e.g. resource limits). Some others do not have a specific compose-to-ECS mapping, so they are managed through x-aws extensions (e.g. the IAM role). Please note that compose only deploys to Fargate, so the cpu_shares do not make much sense and you'd need to use limits (to pick the right Fargate task size). As a reminder, this is an alternative CLI way to deploy the service to ECS, but it does not solve how you would translate ALL the API calls to boto3.
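For reference, a rough and non-exhaustive sketch of the atomic boto3 calls that approximate ecs-cli compose service up for these two files. The cluster name bar and family foo mirror the question, the env_file contents would have to be inlined as environment entries, and create_service would become update_service on subsequent deploys:
import boto3

ecs = boto3.client("ecs")

task_def = ecs.register_task_definition(
    family="foo",
    taskRoleArn="my-role",
    containerDefinitions=[{
        "name": "my-service",
        "image": "ecr-image:version",
        "command": ["python3", "src/main.py", "param1", "param2"],
        "cpu": 128,                 # my-cpu-shares
        "memoryReservation": 512,   # my-mem-reservation
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/my-service-log-group",
                "awslogs-region": "my-region",
                "awslogs-stream-prefix": "my-prefix",
            },
        },
    }],
)

# Create the service the first time; use update_service on later deploys.
ecs.create_service(
    cluster="bar",
    serviceName="foo",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    desiredCount=1,
)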

Why can't my two docker containers communicate even though they are both responding separately?

I know this question has been asked in various ways already, but so far none of the existing answers seem to work as they all reference docker-compose which I'm already using.
I'm trying to start a multi-container service (locally for now). One is a web frontend container running flask and exposing port 5000 (labeled 'web_page' in my docker-compose file). The other container is a text generation model (labeled "model" in my docker-compose file).
Here is my docker-compose.yml file:
version: '3'
services:
  web_page:
    build: ./web_app
    ports:
      - "5000:5000"
  model:
    build: ./gpt-2-cloud-run
    ports:
      - "8080:8080"
After I run docker-compose up and use a browser (or Postman) to go to 0.0.0.0:5000 or 0.0.0.0:8080, I get back a response showing exactly what I expect. So both services are up and running and responding on the correct IP/port. But when I click "submit" on the web_page to send the request to the 'model', I get a connection error even though both IP/ports are responding if I test them.
If I run the 'model' container as a stand-alone container and just start up the web_page app NOT in a container, it works fine. When I put BOTH in containers, the web_page immediately gives me
requests.exceptions.ConnectionError
Within the web_page.py code is:
requests.post('http://0.0.0.0:8080',json={'length': 100, 'temperature': 0.85,"prefix":question})
which goes out to that IP with the payload and receives the response back. Again, this works fine when the 'model' is running in a container and has port 8080:8080 mapped. When the web_page is running in the container it can't reach the model endpoint for some reason. Why would this be and how could I fix it?
Looks like you're using the default network that gets spun up by docker-compose (so it'll be named something like <directory-name_default>). If you switch your base URL for the requests to the host name of the backend docker container (so model rather than 0.0.0.0), your requests should be able to succeed. Environment variables are good here.
By the way, in case you weren't aware, you don't need to expose the backend application if it is only ever accessed by the frontend one. They both sit in the same Docker network, so they'll be able to talk to one another.
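A small sketch of what the frontend call could look like, using a hypothetical MODEL_URL environment variable that falls back to the compose service name:
import os
import requests

# "model" is the docker-compose service name, resolvable on the compose network
MODEL_URL = os.getenv("MODEL_URL", "http://model:8080")

question = "example prompt"   # placeholder for the user's input
response = requests.post(
    MODEL_URL,
    json={"length": 100, "temperature": 0.85, "prefix": question},
)
print(response.status_code)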
Elements of the other answers are correct, but there are a couple of points that were missing or assumed in the other answers and not made explicit.
According to the Docker documentation, the default bridge network that gets created will NOT provide DNS resolution by container name; containers on it can only reach each other by IP address: https://docs.docker.com/network/bridge/#differences-between-user-defined-bridges-and-the-default-bridge
So, my final compose file was:
version: '3'
services:
  web_page:
    build: ./web_app
    ports:
      - "5000:5000"
    networks:
      - bot-net
    depends_on:
      - model
  model:
    image: sports_int_bot_api_model
    networks:
      - bot-net
networks:
  bot-net:
    external: true
This was after I first created the 'bot-net' network on the CLI. I don't know that this is necessarily what has to be done; perhaps you can create a non-default bridge network in the docker-compose file as well. But it does seem that you cannot use the default bridge network and resolve container names on it (per the docs).
The final endpoint that I pointed to is:
'http://model:8080'
I suppose this was alluded to in the other answers, but they omitted the need to include the http:// scheme. This is also not shown in the docs, where another protocol takes the place of http, as in the docker example postgres://db:5432
https://docs.docker.com/compose/networking/

How to Get Seldon Sklearn servers to work with Google Cloud Storage from GKE

I needed to understand how to make Seldon work with pre-packaged Python pickles and servers.
After following the instructions from the SeldonIO site for the sklearn server, I am still unable to get the pre-defined server models to work.
I have the iris model placed on google cloud storage at location
-> gcs://mymodels/sklearn/iris.pkl
I successfully installed seldon-core on GKE and ran through a wrapped-model example. Now I want to be able to use the pre-packaged servers, which can pick up Python pickles from Google Cloud Storage.
When I specify the location in the SeldonDeployment object, the service never comes up and the pod continues to die.
Here is my SeldonDeployment:
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: sklearniris
spec:
  name: seldon_skiris
  predictors:
  - graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: gcs://mymodels/sklearn/iris.pkl
      name: classifier
    name: default
    replicas: 1
What do I setup on gke and gcs to make this work?
The error is in the SeldonDeployment line modelUri: gcs://mymodels/sklearn/iris.pkl. gcs:// is not the proper URI scheme for Google Cloud Storage; it should be modelUri: gs://mymodels/sklearn/iris.pkl
Checking the pod's logs should help you see why it continues to die. Describing the pod would also help, by showing the events for that pod.
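If you prefer doing that from Python rather than kubectl, here is a minimal sketch using the kubernetes client to dump the failing pod's logs and events (pod and namespace names are illustrative):
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pod_name, ns = "sklearniris-pod-name", "default"   # illustrative names

# Container logs from the crashing pod
print(v1.read_namespaced_pod_log(name=pod_name, namespace=ns))

# Events, roughly what "kubectl describe pod" shows at the bottom
events = v1.list_namespaced_event(
    namespace=ns, field_selector=f"involvedObject.name={pod_name}")
for e in events.items:
    print(e.reason, e.message)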
