I'm building a Docker container to submit ML training jobs using gcloud. The runnable is actually a Python program, and gcloud is executed via subprocess.check_output. Running the program outside a Docker container works just fine, which makes me wonder if some dependency is not installed, but gcloud simply outputs no useful logs at all.
While running gcloud ml-engine jobs submit training, the executable returns exit status 1, outputting only "Internal Error". The logs available in the Google Cloud Console are always five entries of "Validating job requirements..." with no further information.
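For reference, the call is made roughly like this (the job name and remaining flags are placeholders for the real arguments):
import subprocess

# subprocess.check_output raises CalledProcessError when gcloud
# returns exit status 1
cmd = ["gcloud", "ml-engine", "jobs", "submit", "training", "my_job"]
output = subprocess.check_output(cmd)
print(output)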
The Docker container has the following installed dependencies (some are not relevant to Google Cloud ML but are used by other features in the program):
Via apt-get: python, python-pip, python-dev, libmysqlclient-dev, curl
Via pip install: flask, MySQL-python, configparser, pandas, tensorflow
The gcloud tool itself is installed by downloading the SDK and installing it through command line:
RUN curl https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz > /tmp/google-cloud-sdk.tar.gz
RUN mkdir -p /usr/local/gcloud
RUN tar -C /usr/local/gcloud -xvf /tmp/google-cloud-sdk.tar.gz
RUN /usr/local/gcloud/google-cloud-sdk/install.sh
ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin
Account credentials are set up via
RUN gcloud auth activate-service-account --key-file path-to-keyfile-in-docker-container
RUN gsutil version -l
The last gsutil version command is there simply to verify that the SDK installation works.
Does anyone have any clue what might be happening, or how to further debug what might be causing an Internal Error in gcloud?
Thanks in advance! :)
Please make sure all the parameters are set properly and that all your dependencies are uploaded and packaged correctly.
If everything is done and you still can't run the job, you will need more than just "Internal Error" to pinpoint the issue. Please either contact Google Cloud Platform support or file a bug on the Public Issue Tracker to get further assistance.
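Before filing, it may also help to capture more client-side detail. subprocess.check_output only captures stdout, while gcloud writes most of its diagnostics to stderr, and the tool accepts a --verbosity flag. A minimal sketch, with a placeholder job name and flags:
import subprocess

# Placeholder job name/flags -- substitute your real submission arguments.
cmd = [
    "gcloud", "ml-engine", "jobs", "submit", "training", "my_job",
    "--verbosity", "debug",  # detailed client-side logging
]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
print(out)
print(err)  # gcloud's error details usually land on stderr
if proc.returncode != 0:
    raise RuntimeError("gcloud exited with status %d" % proc.returncode)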
Related
I have a React/Flask app running in a Docker container. There is no issue with building the project using docker-compose and running the app itself in the container. Where I run into issues is a particular API route that is supposed to fetch user profiles from the DB, encrypt the values in a text file, and return it to the frontend for download. The encryption script is written in C, though the API route is written in Python. When I try to encrypt through the app running in Docker, I am given the following error message:
OSError: [Errno 8] Exec format error: './app/Crypto/encrypt.exe'
I know the following command works in the CLI if invoked outside of the Docker container (still invoked at the same directory level as it would be in the app):
./app/Crypto/encrypt.exe -k ./app/Crypto/secretkey -i ./profile.txt -o ./profile.encr
I am using the following Python code to invoke the command in the API route which is where it fails:
proc = subprocess.Popen(f"./app/Crypto/encrypt.exe -k ./app/Crypto/secretkey -i ./{profile.profile_name}.txt -o ./{profile.profile_name}.encr", shell=True)
The Dockerfile for my backend is pasted below:
FROM python:3
WORKDIR /app
ENV FLASK_APP=main.py
COPY ./requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "main.py"]
I have tried to tackle the issue a few different ways:
1. By default, my Docker container was built with an ARM64 architecture. I read that the OSError was caused by the architecture not being AMD64, so I rebuilt the container with AMD64, and it gave me the same error.
2. In case this was a permissions error, I ran chmod +rwx on encrypt.exe through the Dockerfile when building the container. I'm pretty sure it has nothing to do with permissions, especially as it still failed.
3. I added a shebang (#!/bin/bash) to the script as well as to the Dockerfile.
At the end of the day, I know I am failing when using subprocess.Popen, so I am positive I must be missing something when invoking the script through Python, or there is a configuration in my Docker container that is preventing this functionality. My machine is a MacBook Pro, on which the script runs fine. The script has also been used successfully on a machine running Linux.
Any chances folks have seen similar issues arise with this error? Thanks in advance!
So thanks to David Maze's comment on this, I followed the lead that maybe the executable I wanted to run needed to be built within the Dockerfile. I destroyed my original container, added a step that runs the Makefile generating the executable, and finally ran the program through the app running in the Docker container. This did the trick! In hindsight, the likely explanation is that "Exec format error" means the binary's format or architecture doesn't match the container's platform; a binary compiled on the host won't necessarily match, whereas compiling inside the container guarantees the executable is built for the environment it actually runs in.
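For anyone hitting the same thing, a minimal sketch of the Dockerfile change, assuming the Makefile lives in app/Crypto and produces encrypt.exe there (the paths and the build-essential install are assumptions; adjust to your layout):
FROM python:3
WORKDIR /app
ENV FLASK_APP=main.py
COPY ./requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Compile the C tool inside the image so the binary matches the
# container's architecture rather than the host's.
RUN apt-get update && apt-get install -y build-essential \
 && make -C ./app/Crypto
CMD ["python", "main.py"]
If you want to confirm a mismatch first, running file ./app/Crypto/encrypt.exe both on the host and inside the container (if the file utility is available there) shows which architecture each binary targets.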
I am experimenting with running Jenkins on a Kubernetes cluster. I have achieved running Jenkins on the cluster using the helm chart. However, I'm unable to run any test cases, since my code base requires Python and MongoDB.
In my Jenkinsfile, I have tried the following:
withPythonEnv('python3.9') {
    pysh 'pip3 install pytest'
}
stage('Test') {
    sh 'python --version'
}
But it says java.io.IOException: error=2, No such file or directory.
It is not feasible to always run the Python install command hardcoded into the Jenkinsfile. After some research, I found out that I would have to tell Kubernetes to install Python while the pod is being provisioned, but there seems to be no PreStart hook/lifecycle for pods; there are only PostStart and PreStop.
I'm not sure how to install Python and MongoDB in the pod and use that as a template for the Kubernetes pods.
This is the default YAML file that I used for the helm chart - jenkins-values.yaml
Also I'm not sure if I need to use helm.
You should create a new container image with the packages installed. In this case, the Dockerfile could look something like this (note that the jenkins/jenkins image runs as the jenkins user by default, so you have to switch to root to install packages, and apt-get update is needed before installing):
FROM jenkins/jenkins
USER root
RUN apt-get update && apt-get install -y appname
USER jenkins
Then build the container image, push it to a container registry, and replace the image: jenkins/jenkins reference in your helm chart with the name of the image you built, including the registry you uploaded it to. With this, your applications are installed in your container every time it runs.
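A hedged sketch of that build-and-push step (the image name and registry are placeholders for your own):
docker build -t registry.example.com/my-jenkins:1.0 .
docker push registry.example.com/my-jenkins:1.0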
The second way, which works but isn't perfect, is to run environment commands, with something like what is described here:
https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/
The issue with this method is that some deployments already use the startup commands, and by redefining the entrypoint you can stop the container's original start command from ever running, thus causing the container to fail.
(This should work if added to the helm chart in the deployment section, as they should share roughly the same format)
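For reference, the kind of command/args override that page describes looks like this (trimmed from the Kubernetes docs' own example; the pod name, image, and command are theirs, not specific to Jenkins):
apiVersion: v1
kind: Pod
metadata:
  name: command-demo
spec:
  containers:
  - name: command-demo-container
    image: debian
    command: ["printenv"]
    args: ["HOSTNAME", "KUBERNETES_PORT"]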
Otherwise, there's a really improper way of installing programs in a running pod: use kubectl exec -it deployment.apps/jenkins -- bash, then run your installation commands in the pod itself.
That being said, it's a poor idea, because if the pod restarts, it will revert to the original image without the required applications installed. If you build a new container image, your apps will remain installed each time the pod restarts. The exec approach should basically never be used, unless it is a temporary pod serving as a testing environment.
How do I deploy a Flask app on an AWS Linux/UNIX EC2 instance, either:
1. using Gunicorn, or
2. using an Apache server?
It's absolutely possible, but it's not the quickest process! You'll probably want to use Docker to containerize your Flask app before you deploy it, so it boils down to these steps:
Install Docker (if you don't have it), build an image for your application, and make sure you can start the container locally and the app works as intended. You'll also need to write a Dockerfile that sets your runtime, copies all your directories, and exposes port 80 (this will be handy for AWS later).
The command to build an image is docker build -t your-app-name .
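A minimal Dockerfile sketch for this step, assuming the Flask object is named app in app.py and that gunicorn is listed in requirements.txt (adjust the names to your project):
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 80
# Serve the "app" object from app.py on port 80
CMD ["gunicorn", "--bind", "0.0.0.0:80", "app:app"]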
Once you're ready to deploy the container, head over to AWS and launch an EC2 instance with the Amazon Linux 2 AMI. You'll be required to create a security key (.pem file) and move it somewhere on your computer; this acts as your credential to log in to your instance. This is where things differ depending on your OS. On Mac, you need to cd into the directory where the key is and modify its permissions by running chmod 400 key-file-name.pem. On Windows, you have to go into the security settings and make sure only your account (ideally the owner of the computer) can use this file, essentially setting it to private. At this point, you can connect to your instance from your command prompt with the command AWS gives you when you click "Connect to instance" on the EC2 dashboard.
Once you're logged in, you can configure your instance to install docker and let you use it by running the following:
sudo amazon-linux-extras install docker
sudo yum install docker
sudo service docker start
sudo usermod -a -G docker ec2-user
Great, now you need to copy all your files from your local directory to your instance using scp (secure copy). The long way is to use this command for each file: scp -i /path/my-key-pair.pem file-to-copy ec2-user@public-dns-name:/home/ec2-user. Another route is to install FileZilla or WinSCP to speed up this process.
Now that all your files are in the instance, build the Docker image using the same command from the first step and run the container. If you go to the URL that AWS gives you, your app should be running on AWS!
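For reference, a sketch of those two commands on the instance (assuming the image name from earlier and that the app listens on port 80 inside the container):
docker build -t your-app-name .
docker run -d -p 80:80 your-app-name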
Here's a reference I used when I did this for the first time; it might be helpful for you to look at too.
I have a Python script for provisioning infrastructure in Azure - IaC. This script mostly uses the Python SDK, but it also runs multiple Azure CLI commands, which is needed at times when I didn't find an equivalent command in the Python SDK.
My goal is to trigger this script on demand using Azure Functions. When testing the Azure Function locally, everything works fine, as I have the Azure CLI installed on my machine. However, when I publish it to Azure Functions, I run into the error: /bin/sh: 1: az: not found
Below is the sample Python function I trigger in the Azure Function (please note that the rest of the script works fine, so I can create an RG, SQL server, etc.; the problem is just the az commands). I wonder if and how I could install the Azure CLI on the Azure Function in order to be able to run the CLI commands.
Here is the python function causing the error:
# Logging in to Azure (call and check_output come from Python's subprocess module)
call("az login --service-principal -u '%s' -p '%s' --tenant '%s'" % (client_id, client_secret, tenant_id), shell=True)
b2c_id = check_output("az resource show -g '<rg_name>' -n '<b2c_name>' --resource-type 'Microsoft.AzureActiveDirectory/b2cDirectories' --query id --output tsv", shell=True)
print("The B2C ID is: %s" % b2c_id)
I tried to create a simple Azure Function with an HttpTrigger to invoke the Azure CLI tools in different ways, but it never works in the cloud after being published.
It seems the only solution is to publish the function as a docker image with the --build-native-deps option for the command func azure functionapp publish <your function app name>, after adding the required package azure-cli to the requirements.txt file, as the error below says:
There was an error restoring dependencies.ERROR: cannot install antlr4-python3-runtime-4.7.2 dependency: binary dependencies without wheels are not supported. Use the --build-native-deps option to automatically build and configure the dependencies using a Docker container. More information at https://aka.ms/func-python-publish
Because there are no Docker tools on my local machine, I was not able to successfully run func azure functionapp publish <your function app name> --build-native-deps.
Meanwhile, running Azure CLI commands is not the only way to use the functionality of the Azure CLI. The az command is just a runnable script file, not a binary executable. After I reviewed az and some source code of the azure-cli package, I think you can directly import the package via from azure.cli.core import get_default_cli and use it to do the same operations, like the code below.
import sys
from azure.cli.core import get_default_cli

# args is the CLI command as a list of strings, e.g. ['group', 'list']
az_cli = get_default_cli()
exit_code = az_cli.invoke(args)
sys.exit(exit_code)
The code is written by referring to the source code of azure/cli/__main__.py of azure-cli package, you can see it from the lib/python3.x/site-packages path of your virtual environment.
Hope it helps.
Thanks, Peter, I used something similar in the end and got it working. I have this function, which runs the AZ CLI commands and returns results when needed - for example, when I need to run a CLI command but also store its output, such as looking up the object ID of a service principal:
from azure.cli.core import get_default_cli

def az_cli(args):
    cli = get_default_cli()
    cli.invoke(args)
    if cli.result.result:
        return cli.result.result
    elif cli.result.error:
        raise cli.result.error
    return True
Now, I can make the call like this (client_id is the ServicePrincipal ID):
ob_id = az_cli(['ad', 'sp', 'show', '--id', client_id])
print(ob_id["objectId"])
I've gotten appcfg.py to run; however, when I run the command, it doesn't actually download the files. Nothing appears in the destination directory.
There was a mixup, a lot of work got lost, and the only way to recover it is to download the files from the host.
python appcfg.py download_app -A beatemup-1097 /home/chaserlewis/Desktop/gcloud
The output is
Host: appengine.google.com
Fetching file list...
Fetching files...
Then it just returns without having downloaded anything. The app is definitely hosted, so I'm not sure what else to do.
I am doing this from a different computer than the one I deployed from, if that matters. I couldn't get appcfg.py to run on my Windows machine, unfortunately.
It might be due to the omitted version flag. Try the following:
Go to the App Engine versions page in the console and check the version of your app that is serving traffic. If you don't specify the -V flag, the appcfg command will try to download the default version, which isn't necessarily your latest version or the version serving traffic.
Add the -V flag to your command with the target version that you identified from the console:
python appcfg.py download_app -A beatemup-1097 -V [YOUR_VERSION] /home/chaserlewis/Desktop/gcloud
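If you'd rather check versions from the command line than the console, the Cloud SDK can list them too (assuming gcloud is installed and authenticated for this project):
gcloud app versions list --project beatemup-1097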