Dask Gateway - Dask Workers Dying Due to PermissionError - python

I am trying to deploy Dask Gateway on Google Kubernetes Engine. There are no issues with the deployment itself. However, I am experiencing issues when using a custom dask-gateway Dockerfile that inherits from the default Docker image on Docker Hub; the image is then pushed to Google Container Registry (GCR). It seems to result in the following PermissionError.
PermissionError: [Errno 13] Permission denied: '/home/dask/dask-worker-space'
(See screenshot below for full stacktrace)
The intriguing part is that the dask workers start up without any issue when they use the Docker image directly from Docker Hub instead of GCR. I need to use a custom Dockerfile to add a few more Python packages to the dask workers, but other than that there are no configuration changes. It's as though pushing the Docker image to GCR does something funky to the permissions.
Here is the dockerfile I am using for the dask workers:
FROM daskgateway/dask-gateway:0.9.0
RUN pip --no-cache-dir install --upgrade cloudpickle dask-ml scikit-learn \
nltk gensim spacy keras asyncio google-cloud-storage SQLAlchemy snowflake-sqlalchemy google-api-core gcsfs pyarrow mlflow \
tensorflow prefect hvac aiofile google-cloud-logging
Any help would be greatly appreciated because I have no idea how to debug.

As you are using a GKE cluster, make sure that the service account you set for the cluster has the correct permissions on Container Registry.
You are creating an image and pushing it to Container Registry, so you will need write permissions there. The process differs depending on whether you are using the default service account or a custom one.
If you are using the default service account, you will need at least the Storage read/write scope for this action (GKE clusters are created by default with only the read scope).
If you have a running cluster, you will need to change the scopes on every node pool by creating a replacement pool:
gcloud container node-pools create [new pool name] \
--cluster [cluster name] \
--machine-type [your desired machine type] \
--num-nodes [the same amount of nodes you have] \
--scopes [your new set of scopes]
(All the possible options can be found with gcloud container node-pools create --help.)
After you have done that, drain the old nodes with kubectl drain [node] and delete the old node pool:
gcloud container node-pools delete [POOL_NAME] \
--cluster [CLUSTER_NAME]
If you don't have a cluster yet, you can set the scopes in the console while creating it, or, if you create it using gcloud, pass the scopes that you want (full list).
If you are using a custom service account, make sure it has the role "roles/storage.admin" granted (source).
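To check whether the node's credentials can actually read the bucket that backs gcr.io images (GCR stores layers in a Cloud Storage bucket named artifacts.[PROJECT-ID].appspot.com for the gcr.io host), here is a rough, hypothetical debugging sketch using the google-cloud-storage client, run from a pod or VM that uses those credentials. Keep in mind that restrictive OAuth scopes on the node can still block access even when IAM would allow it.

# Hypothetical debugging sketch: check which storage permissions the active
# credentials hold on the Cloud Storage bucket backing gcr.io for this project.
from google.cloud import storage  # pip install google-cloud-storage

PROJECT_ID = "my-project"  # placeholder: your project ID
bucket_name = f"artifacts.{PROJECT_ID}.appspot.com"  # GCR backing bucket for the gcr.io host

client = storage.Client(project=PROJECT_ID)
bucket = client.bucket(bucket_name)

# test_iam_permissions returns the subset of the requested permissions the caller has.
granted = bucket.test_iam_permissions(
    ["storage.objects.get", "storage.objects.list", "storage.objects.create"]
)
print(f"Permissions granted on {bucket_name}: {granted}")
# An image pull needs storage.objects.get/list; a push needs storage.objects.create.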

Related

How to use Code from another Repository in Azure Devops Docker Build

I need to include code from another Azure DevOps repo in a Docker build step of an Azure DevOps pipeline. Specifically, I have another repository that contains a Python package that needs to be installed in a Docker container. However, the Python package and the Dockerfile are in separate Azure DevOps repos.
On my local machine I have az and jq installed, so I can grab a user token like this:
TOK=$(az account get-access-token | jq -r .accessToken)
and use it to install the python package locally like this:
pip install git+https://${TOK}@dev.azure.com/MyTenant/MyProject/_git/MyPythonPackage
And likewise, I can add these steps to my Dockerfile:
ARG ACCESS_TOKEN
RUN pip install git+https://${ACCESS_TOKEN}@dev.azure.com/MyTenant/MyProject/_git/MyPythonPackage
and build the Dockerfile locally like this:
TOK=$(az account get-access-token | jq -r .accessToken)
docker build --build-arg ACCESS_TOKEN=${TOK} .
However, when I try to do the same thing in my Azure pipeline like this:
resources:
  repositories:
    - repository: ANameLocalToThisPipeline
      type: git
      name: MyProject/MyPythonPackage
steps:
  - task: Docker@2
    displayName: Build the Docker Image
    inputs:
      command: build
      Dockerfile: "$(Build.SourcesDirectory)/Dockerfile"
      buildContext: "$(Build.SourcesDirectory)/"
      tags: |
        latest
      arguments: --build-arg ACCESS_TOKEN=$(System.AccessToken)
The docker build step fails with this message:
fatal: could not read Password for 'https://dev.azure.com': No such device or address
error: subprocess-exited-with-error
The short answer is that this happens because the service principal that is
used by your pipeline doesn't have permissions for the other repositories in
your project. A quick and dirty hack to get those permissions is to define
your other repository in the list of resources and check it out, like this:
steps:
  - checkout: ANameLocalToThisPipeline
Only after adding the checkout step will the pipeline realize it has insufficient
permissions and ask you to grant the required permissions to the pipeline's service principal.
To be clear, without additional changes, your pipeline will still fail because
adding a checkout step currently has several side effects that will break other parts of your pipeline.
However, after
your pipeline has run and failed, you can revert the pipeline definition to
the previous state, and the pipeline will retain the newly granted
permissions to your other repo, allowing your original pipeline to
successfully clone (via pip install) the other package in the Dockerfile
using the service principal (via $(System.AccessToken)).

GCP Dockerfile using Artifact Registry

I have a question.
What's the best approach to building a Docker image using the pip artifact from the Artifact Registry?
I have a Cloud Build build that runs a Docker build; essentially all the Dockerfile does is pip install -r requirements.txt, and one of the dependencies is a library hosted in Artifact Registry.
When executing a step with the image gcr.io/cloud-builders/docker, I get an error that my Artifact Registry is not accessible, which is quite logical: I have access only from the image performing the given step, not from the image that is being built in this step.
Any ideas?
Edit:
For now I will use Secret Manager to pass a JSON key to my Dockerfile, but I hope for a better solution.
When you use Cloud Build, you can forward the metadata server access through the Docker build process. It's documented, but absolutely not clear (personally, the first time, I emailed the Cloud Build PM to ask, and he sent me the documentation link).
Now your docker build can access the metadata server and be authenticated with the Cloud Build runtime service account. It should make your process easier.
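For illustration, here is a rough sketch of what that looks like from inside the build. It assumes the Docker build step is run with --network=cloudbuild (the documented way to expose the metadata server to the build) and that the private dependency lives in an Artifact Registry Python repository; the index URL and script name are placeholders. The supported route is the keyrings.google-artifactregistry-auth pip backend, but the raw token exchange shows what is going on:

# Hypothetical helper run inside the Dockerfile (docker build --network=cloudbuild):
# fetch an access token for the Cloud Build runtime service account from the
# metadata server, then hand it to pip as basic auth for Artifact Registry.
import json
import subprocess
import urllib.request

METADATA_TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)

def fetch_access_token() -> str:
    """Ask the metadata server for an OAuth2 access token."""
    req = urllib.request.Request(METADATA_TOKEN_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]

def install_requirements(index_url: str) -> None:
    """Install requirements.txt against a private Artifact Registry index.
    index_url is a placeholder like https://REGION-python.pkg.dev/PROJECT/REPO/simple/."""
    token = fetch_access_token()
    authed = index_url.replace("https://", f"https://oauth2accesstoken:{token}@")
    subprocess.check_call(
        ["pip", "install", "-r", "requirements.txt", "--extra-index-url", authed]
    )

if __name__ == "__main__":
    install_requirements("https://us-central1-python.pkg.dev/my-project/my-repo/simple/")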

Is there a way to run an already-built python API from google cloud?

I built a functioning Python API that runs from my local machine. I'd like to run this API from the Google Cloud SDK, but after looking through the documentation and googling every variation of "run local python API from google cloud SDK" I had no luck finding anything that wouldn't involve restructuring the script heavily. I have a hunch that "google run" or "API endpoint" might be what I'm looking for, but as a complete newbie to everything other than Firestore (which I would rather not convert my entire API into if I don't have to), I want to ask if there's a straightforward way to do this.
tl;dr The API runs successfully when I simply type python apiscript.py into my local console; is there a way I can move it to Google Cloud without adjusting the script itself too much?
IMO, the easiest solution for a portable app is to use a container, and to host the container in serverless mode you can use Cloud Run.
In the getting started guide, you have a Python example. The main task for you is to create a Dockerfile:
FROM python:3.9-slim
ENV PYTHONUNBUFFERED True
# Copy local code to the container image.
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
# Install production dependencies.
RUN pip install -r requirements.txt
CMD python apiscript.py
I adapted the script to your description, and I assumed that you have a requirements.txt file for the dependencies.
Now, build your container
gcloud builds submit --tag gcr.io/<PROJECT_ID>/apiscript
Replace PROJECT_ID with your project ID, not the name of the project (even if they are sometimes the same, this is a common mistake for newcomers).
Deploy on Cloud Run
gcloud run deploy --region=us-central1 --image=gcr.io/<PROJECT_ID>/apiscript --allow-unauthenticated --platform=managed apiscript
I assume that your API is served on port 8080; otherwise you need to add a --port parameter to override this.
That should be enough.
This is a getting-started example; you can change the region, the security mode (here, unauthenticated access), the name and the project.
In addition, for this deployment, the Compute Engine default service account is used. You can use another service account if you want, but in any case you need to grant the service account you use permission to access the Firestore database.
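For reference, here is a minimal, hypothetical apiscript.py (your real script and framework may differ; Flask is only an assumption here) that respects the PORT environment variable Cloud Run injects, so the container listens where Cloud Run expects:

# Hypothetical minimal apiscript.py for Cloud Run; assumes flask is in requirements.txt.
import os
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from Cloud Run"

if __name__ == "__main__":
    # Cloud Run sends traffic to $PORT (8080 by default); bind on all interfaces.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))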

How to upload packages to an instance in a Processing step in Sagemaker?

I have to do large-scale feature engineering on some data. My current approach is to spin up an instance using SKLearnProcessor and then scale the job by choosing a larger instance size or increasing the number of instances. I need some packages that are not installed on SageMaker instances by default, so I want to install them using .whl files.
Another hurdle is that the SageMaker role does not have internet access.
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor

sess = sagemaker.Session()
sess.default_bucket()
region = boto3.session.Session().region_name
role = get_execution_role()

sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
                                     role=role,
                                     sagemaker_session=sess,
                                     instance_type="ml.t3.medium",
                                     instance_count=1)

sklearn_processor.run(code='script.py')
Attempted resolutions:
1. Upload the packages to a CodeCommit repository and clone the repo into the SKLearnProcessor instances. Failed with the error fatal: could not read Username for 'https://git-codecommit.eu-west-1.amazonaws.com': No such device or address. I tried cloning the repo into a SageMaker notebook instance and it works, so it's not a problem with my script.
2. Use a bash script to copy the packages from S3 using the CLI. The bash script I used is based off this post. But the packages never get copied, and an error message is not thrown.
I also looked into using the package s3fs, but it didn't seem suitable for copying the wheel files.
Alternatives
My client is hesitant to spin up containers from custom Docker images. Any alternatives?
2. Use a bash script to copy the packages from s3 using the CLI. The bash script I used is based off this post. But the packages never get copied, and an error message is not thrown.
This approach seems sound.
You may be better off overriding the command field on the SKLearnProcessor to /bin/bash and running a bash script like install_and_run_my_python_code.sh that installs the wheels containing your Python dependencies and then runs your main Python entry point script.
Additionally, instead of downloading your packages with AWS CLI calls in a bash script, you could use a ProcessingInput to download them onto the instances; this is the same mechanism SKLearnProcessor uses to distribute your entry point script.py across all the instances. A rough sketch follows.
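This is a sketch under those assumptions, not a drop-in solution: the script name install_and_run.sh and the S3 prefix are hypothetical, and role and sess come from your snippet above.

from sagemaker.processing import ProcessingInput
from sagemaker.sklearn.processing import SKLearnProcessor

# Override the default 'python3' command so the entry point is a bash script.
sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
                                     role=role,
                                     sagemaker_session=sess,
                                     instance_type="ml.t3.medium",
                                     instance_count=1,
                                     command=['/bin/bash'])

sklearn_processor.run(
    # install_and_run.sh: pip install /opt/ml/processing/input/wheels/*.whl, then run script.py
    code='install_and_run.sh',
    inputs=[
        ProcessingInput(
            source='s3://my-bucket/wheels/',                # hypothetical S3 prefix holding the .whl files
            destination='/opt/ml/processing/input/wheels',  # copied onto every instance before the job starts
        )
    ],
)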

Mirror Docker container image to Google Container Registry using least dependencies/permissions

I need to perform the following from a python program:
docker pull foo/bar:tag
docker tag foo/bar:tag gcr.io/project_id/mirror/foo/bar:tag
gcloud auth configure-docker --quiet
docker push gcr.io/project_id/mirror/foo/bar:tag
I want to accomplish this with the minimal possible footprint - no root, no privileged Docker installation, etc. The Google Cloud SDK is installed.
How to programmatically mirror the image with minimal app footprint?
The Google Cloud Build API can be used to perform all your required steps in one command, or you can use a trigger:
gcloud builds submit --tag gcr.io/$DEVSHELL_PROJECT_ID/$IMAGE_NAME:v0.1 .
You can call the above using the Python Cloud Build API:
https://googleapis.dev/python/cloudbuild/latest/gapic/v1/api.html
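For example, here is a rough sketch with the google-cloud-build Python client (assuming a recent version of the library; the project ID and image names are placeholders). Cloud Build pulls, retags and pushes the image for you, so no local Docker daemon, root access or privileged installation is needed:

# Hypothetical mirroring helper via the Cloud Build API (pip install google-cloud-build).
from google.cloud.devtools import cloudbuild_v1

def mirror_image(project_id: str, source: str, target: str) -> None:
    """Mirror source (e.g. 'foo/bar:tag') to target
    (e.g. 'gcr.io/project_id/mirror/foo/bar:tag') using Cloud Build."""
    client = cloudbuild_v1.CloudBuildClient()
    build = cloudbuild_v1.Build(
        steps=[
            cloudbuild_v1.BuildStep(name="gcr.io/cloud-builders/docker",
                                    args=["pull", source]),
            cloudbuild_v1.BuildStep(name="gcr.io/cloud-builders/docker",
                                    args=["tag", source, target]),
        ],
        images=[target],  # images listed here are pushed when all steps succeed
    )
    operation = client.create_build(project_id=project_id, build=build)
    operation.result()  # block until the build finishes

mirror_image("my-project-id", "foo/bar:tag", "gcr.io/my-project-id/mirror/foo/bar:tag")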
