Dockerfile COPY command is missing a single file when using `gcloud build` - python

I have run into an incredibly frustrating problem where a COPY command in my Dockerfile successfully copies all of my app's files except one. I do not have a .dockerignore file, so I know the file isn't being excluded from the build that way.
Note: I do have a .gitignore which excludes file2.json, a file I do not want to version. But as you will see below, I'm building from my local folder, not remotely from a clone/checkout, so I don't see why .gitignore would influence the docker build in this case.
Below is what my directory looks like:
$ tree -a -I .git app
app
├── app
│   ├── data
│   │   ├── file1.txt
│   │   ├── file2.json
│   │   ├── file3.txt
│   │   └── file4.yml
│   ├── somefile2.py
│   └── somefile.py
├── Dockerfile
├── .gitignore
├── requirements.txt
└── setup.py
And this is what my Dockerfile looks like:
FROM ubuntu:18.04
FROM python:3.7
COPY . /app
RUN cp app/app/data/file2.json ~/.somenewhiddendirectory
RUN pip install app/.
ENTRYPOINT ["python", "app/app/somefile.py"]
For some reason, file2.json is not being copied during the COPY . /app call, and I get an error when I try to cp it somewhere else. I have done a call like RUN ls app/app/data/ and all the files except file2.json are in there. I checked the file's permissions and made sure they are the same as all the other files'. I have tried doing a direct COPY of that file, which results in an error since Docker says the file doesn't exist.
On my system, that file exists, I can see it with ls, and I can cat its contents. I have played around with ensuring the build context is squarely the root directory of my app, and like I said, all files are correctly copied except that JSON file. I can't for the life of me figure out why Docker hates this file.
For some added context, I am using Google's Cloud Build to build the image, and the YAML config looks like this:
steps:
- name: gcr.io/cloud-builders/docker
  id: base-image-build
  waitFor: ['-']
  args:
  - build
  - .
  - -t
  - us.gcr.io/${PROJECT_ID}/base/${BRANCH_NAME}:${SHORT_SHA}
images:
- us.gcr.io/${PROJECT_ID}/base/${BRANCH_NAME}:${SHORT_SHA}
and the command I am executing looks like this:
gcloud builds submit --config=cloudbuild.yaml . \
--substitutions=SHORT_SHA="$(git rev-parse --short HEAD)",BRANCH_NAME="$(git rev-parse --abbrev-ref HEAD)"

Disclaimer: I have never used Google's Cloud Build, so my answer is based only on reading the documentation.
I don't see why .gitignore would influence the docker build in this case
Indeed, docker build by itself does not care about your .gitignore file. But you are building through Google's Cloud Build, and that is a totally different story.
Quoting the documentation for the source argument of the gcloud builds submit command:
[SOURCE]
The location of the source to build. The location can be a directory on a local disk or a gzipped archive file (.tar.gz) in Google Cloud Storage. If the source is a local directory, this command skips the files specified in the --ignore-file. If --ignore-file is not specified, use .gcloudignore file. If a .gcloudignore file is absent and a .gitignore file is present in the local source directory, gcloud will use a generated Git-compatible .gcloudignore file that respects your .gitignore files. The global .gitignore is not respected. For more information on .gcloudignore, see gcloud topic gcloudignore
So in your case, your file will be ignored even for a build from your local directory. At this point I see two options to work around this problem:
Remove the entry for your file in .gitignore so that the default gcloud mechanism does not ignore it during your build
Provide a --ignore-file or a default .gcloudignore which actually re-includes the local file that is ignored for versioning.
I would personally go for the second option, with something super simple like the following .gcloudignore file (crafted from the relevant documentation):
.git
.gcloudignore
.gitignore
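If you would rather keep mirroring your .gitignore and only re-include the one file, .gcloudignore uses gitignore-style syntax and also supports an #!include directive. A sketch, assuming the build source root is the top-level app directory so the path inside it is app/data/file2.json (adjust the path to your actual layout, and note that gitignore-style negation must come after the include and cannot re-include a file whose parent directory is excluded):

.git
.gcloudignore
#!include:.gitignore
!app/data/file2.json

Either way, as the quoted documentation mentions, gcloud builds submit also accepts --ignore-file if you prefer to keep the ignore rules in a file with a different name.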

Related

How to point to a folder relative to Python file when running in virtualenv/Pipenv?

I don't think my question is FastAPI-specific; it is more generic to Python virtualenvs, but here's my example use case with FastAPI anyway.
I'm running FastAPI from a virtualenv set up by Pipenv and I'd like it to serve static files for development purposes only. I know this is solved with a proper production deployment and I will be using a proper web server for that, but my question below is about the development environment.
File structure looks like this:
├── frontend <-- frontend code dir
│   ├── dist <-- frontend build output (static files)
│   ├── public
│   └── [..]
├── Pipfile <-- contains 'myproject = {editable = true, path = "."}' as dep
├── Pipfile.lock
├── setup.cfg
├── setup.py
├── myproject <-- backend code dir
│   ├── web.py <-- fastapi file
│   └── [...]
└── [...]
myproject/web.py:
import fastapi
import fastapi.staticfiles

app = fastapi.FastAPI()
app.mount(
    "/",
    fastapi.staticfiles.StaticFiles(directory="/absolute/path/to/frontend/dist"),
    name="static",
)
Running the FastAPI dev server with pipenv run uvicorn myproject.web:app --reload.
This works, but I'd like to avoid the absolute path in my public source. Any relative path does not work, as it appears to be resolved relative to the install path inside the virtualenv created by Pipenv, which is not deterministic. The same goes for the __file__ path; it points inside the virtualenv.
I've also considered adding the statics as data files to the Python package and loading them from the directory provided by pkg_resources (e.g. pkg_resources.resource_filename('myproject', 'data/')), but I don't plan to ship the statics with the Python package in production, so it feels hackish.
Would it be possible to somehow link to the right directory with a relative path, or do you have another clever idea for sharing the project in a team of devs (or publicly on GitHub, for that matter) a little more cleanly?
Because you want to be flexible about where it will run, and the statics' location is unrelated to the install location, no constant relative path can work.
I would go for:
Discovering the statics by some heuristic, like walking back up to some known "anchor" folder (the repository root?); see the sketch after this list.
I've done something like this once, and even cached the discovered absolute path in a persistent temp file so the discovery happens only once per machine (unless the location moves or the cached file is deleted).
Passing the statics location as a config parameter, a command-line parameter, or both, that your server is launched with.
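A minimal sketch combining both ideas: an explicit environment variable wins if set, otherwise walk up from the current working directory until an anchor file is found. The STATICS_DIR variable, the Pipfile anchor and the find_statics_dir helper are my own illustrative names (nothing FastAPI or Pipenv define), and it assumes the dev server is started from somewhere inside the repository:

# Sketch only: STATICS_DIR, the Pipfile anchor and find_statics_dir are illustrative names.
import os
from pathlib import Path

import fastapi
import fastapi.staticfiles


def find_statics_dir(marker: str = "Pipfile") -> Path:
    # Explicit configuration wins if it is present.
    override = os.environ.get("STATICS_DIR")
    if override:
        return Path(override)
    # Otherwise walk up from the working directory until the anchor file is found.
    for candidate in (Path.cwd(), *Path.cwd().parents):
        if (candidate / marker).exists():
            return candidate / "frontend" / "dist"
    raise RuntimeError(f"No directory containing {marker} found above {Path.cwd()}")


app = fastapi.FastAPI()
app.mount(
    "/",
    fastapi.staticfiles.StaticFiles(directory=str(find_statics_dir())),
    name="static",
)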

Change tox workdir

Looking at the global settings section of the tox documentation, the .tox working directory is created in the directory where tox.ini is located:
toxworkdir={toxinidir}/.tox  (PATH)
Directory for tox to generate its environments into, will be created if
it does not exist.
Is there a way to change this location?
For instance, I have a project as follows:
awesome_project/
├── main.py
├── src
│   └── app.py
└── tox.ini
I want to execute tox from the awesome_project dir, but I want the .tox dir to be written to /tmp/.tox, not awesome_project/.tox.
Thanks for the help.
{toxinidir}/.tox is just a default value that you can change in tox.ini:
[tox]
toxworkdir=/tmp/.tox
You can change the working directory from the command line:
tox --workdir /tmp/.tox
This may be useful if you want to use the same tox.ini file on two different computers with different working directories.
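If you want a default other than {toxinidir}/.tox but still want it overridable per machine without editing the file, tox's {env:KEY:DEFAULT} substitution can be combined with toxworkdir. This is only a sketch: the TOX_WORK_DIR name is arbitrary, and it assumes your tox version applies {env:...} substitution to toxworkdir:

[tox]
toxworkdir = {env:TOX_WORK_DIR:/tmp/.tox}

With this, tox uses /tmp/.tox unless TOX_WORK_DIR is set in the environment.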

How to tell pbr to include non-code files in package

I only run into this problem when I build in a python:alpine image. Reproducing it is a bit of a pain, but these are the steps:
Docker container setup:
$ docker run -it python:3.7-rc-alpine /bin/ash
$ pip install pbr
Small package setup, including non-Python files:
test
├── .git
├── setup.cfg
├── setup.py
└── src
    └── test
        ├── __init__.py
        ├── test.yml
        └── sub_test
            ├── __init__.py
            └── test.yml
setup.py:
from setuptools import find_packages, setup

setup(
    setup_requires=['pbr'],
    pbr=True,
    package_dir={'': 'src'},
    packages=find_packages(where='src'),
)
setup.cfg:
[metadata]
name = test
All other files are empty. I copy them to the container with docker cp test <docker_container>:/test.
Back in the container, I now try to build the package with cd test; pip wheel -w wheel . and find that the test.yml in test/src/test is included in the wheel, but the one in test/src/test/sub_test is not.
I have no clue why this happens, since the (pitifully sparse, and imo quite confusing) documentation of pbr on that matter states that
Just like AUTHORS and ChangeLog, why keep a list of files you wish to include when you can find many of these in git. MANIFEST.in generation ensures almost all files stored in git, with the exception of .gitignore, .gitreview and .pyc files, are automatically included in your distribution.
I could not find a pbr-parameter that lets me explicitly include some file or file type, which I expected to exist.
Creating a MANIFEST.in with include src/test/sub_test/test.yml actually solves the problem, but I'd rather understand and avoid this behavior altogether.
pbr needs git in order to correctly compile its list of files to include, so the problem can be solved by installing git into the build environment before building the package. With an alpine image, that would be apk add --no-cache git.
pbr uses the .git directory to figure out which files should be part of the package and which ones shouldn't. The short version is that it takes the intersection of the file list derived from the packages parameter in the setup() call in setup.py and everything that is committed or staged in the currently checked-out git branch.
So if the project came with no .git directory, you'd additionally need to execute git init; git add src as well.
The reason for the 'bug' is that pbr silently assumes that all .py files should be added regardless of whether they are committed, which makes the actual problem harder to identify. It will also only throw an error if it can't find a .git directory, and not if the directory is there but it can't get any info from it because git isn't installed.
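Putting that together inside the python:3.7-rc-alpine container from the question, the steps would look roughly like this (a sketch based on the above; paths match the question's layout):

apk add --no-cache git     # pbr shells out to git to build its file list
cd /test
git init && git add .      # only needed if the project was copied in without its git history
pip wheel -w wheel .       # both test.yml files should now end up in the wheel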

How to package sub-folder for gcloud ml?

I am trying to upload my project to Google Cloud ML Engine for training. I have followed the "getting started" guide, replacing files in the relevant places with my own.
I manage to train locally using
gcloud ml-engine local train --module-name="my-model.task" --package-path=my-model/ -- ./my_model/model_params_google.json
Yes, I have dashes in the module name :(. I also made a symbolic link my_model -> my-model so that I can use the name with an underscore instead of a dash. In any case, I don't think this is the problem, since the above command works well locally.
My folder structure doesn't follow the recommended one, since I had the project before thinking about ml-engine. It looks like this:
my-model/
├── __init__.py
├── setup.py
├── task.py
├── model_params_google.json
├── src
│   ├── __init__.py
│   ├── data_handler.py
│   ├── elastic_helpers.py
│   ├── model.py
The problem is that the src folder is not packaged/uploaded with the code, so in the cloud, when I say from .src.model import model_fn in task.py, it fails.
The command I use for packaging is (in folder my-model/../):
gcloud ml-engine jobs submit training my_model_$(date +"%Y%m%d_%H%M%S") \
--staging-bucket gs://model-data \
--job-dir $OUTPUT_PATH \
--module-name="my_model.task" \
--package-path=my_model/ \
--region=$REGION \
--config config.yaml --runtime-version 1.8 \
-- \
tf_crnn/model_params_google.json --verbosity DEBUG
It packages my-model.0.0.0.tar.gz without the contents of my-model/src. I cannot figure out why. I'm using the example setup.py:
from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES = ['tensorflow>=1.8']

setup(
    name='my_model',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='my first model',
)
So, the question is: why does gcloud not pack the src folder?
You need to put the setup.py in the directory above my-model.
You can check your results by invoking:
python setup.py sdist
Then un-tar the tarball in the dist directory. As things stand, you'll see that task.py is not included in the tarball.
By moving setup.py one directory higher and repeating, you'll see that task.py is included, as is everything in src.
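A quick way to run that check from the directory that now contains setup.py (the archive name depends on the name and version in setup.py, so the wildcard below is just a convenience):

python setup.py sdist
tar -tzf dist/*.tar.gz | grep -E 'task\.py|src/'

If the src files show up in the listing, the gcloud packaging step should pick them up as well.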

AWS Chalice using custom config path or custom app path?

I am trying to fit Chalice into a pre-existing build folder structure; the Python source file (app.py) is one level deeper than in a vanilla Chalice project:
├── .chalice
│   └── config.json
└── src
    ├── app.py
    ├── requirements.txt
    └── requirements_test.txt
When I run chalice local in the src folder, it says it cannot find the config file:
Unable to load the project config file. Are you sure this is a chalice project?
When I run chalice local in the project folder, it says it cannot find the source file:
No module named 'app'
I had a look at the config file; it doesn't seem to have an option to specify where the source file is.
Since there were no responses on Stack Overflow, I went to the project and opened an issue ticket for it. Straight from the horse's mouth: this is not possible at this stage.
