I have a question.
What's the best approach to building a Docker image using the pip artifact from the Artifact Registry?
I have a Cloud Build build that runs a Docker build, the only Dockerfile is pip install -r requirements.txt, one of the dependencies of which is the library located in the Artifact Registry.
When executing a stage with the image gcr.io / cloud-builders / docker, I get the error that my Artifact Registry is not accessible, which is quite logical. I have access only from the image performing the given step, not from the image that is being built in this step.
Any ideas?
Edit:
For now I will use Secret Manager to pass JSON key to my Dockerfile, but hope for better solution.
When you use Cloud Build, you can forward the metadata server access through the Docker build process. It's documented, but absolutely not clear (personally, the first time I made a mail to Cloud Build PM to ask him, and he send me the documentation link.)
Now, your docker build can access the metadata server and be authenticated with the Cloud Build runtime service account. It should make your process easiest.
Related
I built a functioning python API that runs from my local machine. I'd like to run this API from Google Cloud SDK, but after looking through the documentation and googling every variation of "run local python API from google cloud SDK" I had no luck finding anything that wouldn't involve me restructuring the script heavily. I have a hunch that "google run" or "API endpoint" might be what I'm looking for, but as a complete newbie to everything other than Firestore (which I would rather not convert my entire api into if I don't have to), I want to ask if there's a straightforward way to do this.
tl;dr The API runs successfully when I simply type "python apiscript.py" into local console, is there a way I can transfer it to Google Cloud without adjusting the script itself too much?
IMO, the easiest solution for portable app is to use Container. And to host the container in serverless mode, you can use Cloud Run.
In the getting started guide, you have python example. The main task for you is to create a Dockerfile
FROM python:3.9-slim
ENV PYTHONUNBUFFERED True
# Copy local code to the container image.
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
# Install production dependencies.
RUN pip install -r requirements.txt
CMD python apiscript.py
I adapted the script to your description, and I assumed that you have a requirements.txt file for the dependencies.
Now, build your container
gcloud builds submit --tag gcr.io/<PROJECT_ID>/apiscript
Replace the PROJECT_ID by your project ID, not the name of the project (even if sometimes it's the same, it's a common mistake for the newcomers)
Deploy on Cloud Run
gcloud run deploy --region=us-central1 --image=gcr.io/<PROJECT_ID>/apiscript --allow-unauthenticated --platform=managed apiscript
I assume that your API is served on the port 8080. else you need to add a --port parameter to override this.
That should be enough
Here it's a getting started example, you can change the region, the security mode (here no security) the name and the project.
In addition, for this deployment, the Compute Engine default service account is used. You can use another service account if you want, but, in any cases, you need to grant the used service account the permission to access to the Firestore database.
I am using google cloud appengine and deploying with gcloud app deploy and a standard app.yaml file. My requirements.txt file has one private package that is fetched from github (git+ssh://git#github.com/...git). This install works locally, but when I run the deploy I get
Host key verification failed.
fatal: Could not read from remote repository.
This suggests there is no ssh key when installing. Reading docs (https://cloud.google.com/appengine/docs/standard/python3/specifying-dependencies) it appears that this just isn't an option???
Dependencies are installed in a Cloud Build environment that does not provide access to SSH keys. Packages hosted on repositories that require SSH-based authentication must be copied into your project directory and uploaded alongside your project's code using the pip package manager.
To me this seems severely not-optimal - the whole point of factoring out code into a package was to be able to avoid duplication in repos. Now, if I want to use appengine, you're telling me this not possible?
Is there really no workaround?
See:
https://cloud.google.com/appengine/docs/standard/python3/specifying-dependencies#private_dependencies
The App Engine service does not (and should not) have access to your private repo.
One alternative (that you don't want) is to upload your public key to the App Engine service.
The other -- as documented -- is that you must provide the content of your private repo to the service as part of your upload.
I'm going through the same issue, deploying on gcloud a python project that contains in its requirements.txt some private repositories. As #DazWilkin wrote already, there's no way to deploy it like you do normally.
One option would be to create a docker image of the whole project and its dependencies, save it into the gcloud docker registry and then pull it into the App Engine instance.
I want to use dataflow to process in parallel a bunch of video clips I have stored in google storage. My processing algorithm has non-python dependencies and is expected to change over development iterations.
My preference would be to use a dockerized container with the logic to process the clips, but it appears that custom containers are not supported (in 2017):
use docker for google cloud data flow dependencies
Although they may be supported now - since it was being worked on:
Posthoc connect FFMPEG to opencv-python binary for Google Cloud Dataflow job
According to this issue a custom docker image may be pulled, but I couldn't find any documentation on how to do it with dataflow.
https://issues.apache.org/jira/browse/BEAM-6706?focusedCommentId=16773376&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16773376
Another option might be to use setup.py to install any dependencies as described in this dated example:
https://cloud.google.com/blog/products/gcp/how-to-do-distributed-processing-of-landsat-data-in-python
However, when running the example I get an error that there is no module named osgeo.gdal.
For pure python dependencies I have also tried to pass the --requirements_file argument, however I still get an error: Pip install failed for package: -r
I could find documentation for adding dependencies to apache_beam, but not to dataflow, and it appears the apache_beam instructions do not work, based on my tests of --requirements_file and --setup_file
This was answered in the comments, rewriting here for clarity:
In Apache Beam you can modify the setup.py file while will be run once per container on start-up. This file allows you to perform arbitrary commands before the the SDK Harness start to receive commands from the Runner Harness.
A complete example can be found in the Apache Beam repo.
As of 2020, you can use Dataflow Flex Templates, which allow you to specify a custom Docker container in which to execute your pipeline.
I'm new to using Docker, so I'm either looking for direct help or a link to a relevant guide. I need to train some deep learning models on my school's linux server, but I can't manually install pytorch and other python packages since I don't have root access (sudo). Another student said that he uses docker and has everything ready to go in his container.
I'm wondering how to wrap up my code and relevant packages into a container that I can push to the linux server and then run.
To address your specific problem the easiest way I found to get code into a container is to use git.
start the container in interactive mode or ssh to it if it's attached to a network.
git clone <your awesome deep learning code>. In your git repo have a requirements.txt file. Change directories into your local clone of your repo and run pip install -r requirements.txt
Run whatever script you need to run your code. Note you can easily put your pip install command in one of your run scripts.
It's important to remember that docker containers are stateless/ephemeral. You should not expect the container nor its contents to exist in some durable fashion. This specific issue is addressed by mapping a directory on the host system to a directory in the container.
Side note: I first recommend starting with the docker tutorial. You can easily skip over the installation parts if you are working on system that already has docker installed and where you have permissions to build, start, and stop containers.
I don't have root access (sudo). Another student said that he uses docker
I would like to point out that docker requires sudo permissions.
Instead I think you should look at using something like Google Colab or JupyterLab. This gives you the added benefit of code that is backed-up on a remote server
My Python App Engine Flex application needs to connect to an external Oracle database. Currently I'm using the cx_Oracle Python package which requires me to install the Oracle Instant Client.
I have successfully run this locally (on macOS) by following the Instant Client installation steps. The steps required me to do the following:
Make a directory called /opt/oracle
Create a symlink from /opt/oracle/instantclient_12_2/libclntsh.dylib.12.1 to ~/lib/
However, I am confused about how to do the same thing in App Engine Flex (instructions). Specifically, here's what I'm confused about:
The instructions say I should run sudo yum install libaio to install the libaio package. How do I do this on GAE Flex? Or is this package already available?
I think I can add the Instant Client files to GAE (a whopping ~100MB!), then set the LD_LIBRARY_PATH environment variable in app.yaml to export LD_LIBRARY_PATH=/opt/oracle/instantclient_12_2:$LD_LIBRARY_PATH. Will this work?
Is this even feasible without using custom Docker containers on App Engine Flex?
Overall I'm not sure if I'm on the right track. Would love to hear from someone who has managed this before :)
If any of your dependencies is not available in the base GAE flex images provided by Google and cannot be installed via pip (because it's not a python package or it's not available in PyPI or whatever other reason) then you can't use the requirements.txt file to get it installed in your GAE flex app.
The proper way to satisfy such dependencies would be to build your own custom runtime. From About Custom Runtimes:
Custom runtimes allow you to define new runtime environments, which
might include additional components like language interpreters or
application servers.
Yes, that means providing a custom Docker file. In your particular case you'd be installing the Instant Client and libaio inside this Dockerfile. See also Building Custom Runtimes.
Answering your first question, I think that the instructions in the oracle website just show that you have to install said library for your application to work.
In the case of App engine flex, they way to ensure that the libraries are present in the deployment is with the requirements.txt textfile. There is a documentation page which does explain how to do so.
On the other hand, I will assume that "Instant Client Files" are not libraries, but necessary data for your App to run. You should use Google Cloud Storage to serve them, or any other alternative of Storage within Google Cloud.
I believe that, if this is all what you need for your App to work, pushing your own custom container should not be necessary.