How to package a sub-folder for gcloud ml? - python

I am trying to upload my project to Google Cloud ML Engine for training. I have followed the "getting started" guide, replacing the sample files with my own in the relevant places.
I manage to train locally using
gcloud ml-engine local train --module-name="my-model.task" --package-path=my-model/ -- ./my_model/model_params_google.json
Yes, I have dashes in the module name :(. I also made a symbolic link my_model -> my-model so that I can use the name with an underscore instead of a dash. In any case, I don't think this is the problem, since the above command works fine locally.
My folder structure doesn't follow the recommended one, since I had the project before thinking about ml-engine. It looks like this:
my-model/
├── __init__.py
├── setup.py
├── task.py
├── model_params_google.json
├── src
│   ├── __init__.py
│   ├── data_handler.py
│   ├── elastic_helpers.py
│   ├── model.py
The problem is that the src folder is not packaged/uploaded with the code, so in the cloud, when I say from .src.model import model_fn in task.py, it fails.
The command I use for packaging is (in folder my-model/../):
gcloud ml-engine jobs submit training my_model_$(date +"%Y%m%d_%H%M%S") \
--staging-bucket gs://model-data \
--job-dir $OUTPUT_PATH \
--module-name="my_model.task" \
--package-path=my_model/ \
--region=$REGION \
--config config.yaml --runtime-version 1.8 \
-- \
my_model/model_params_google.json --verbosity DEBUG
It packages my-model.0.0.0.tar.gz without the contents of my-model/src. I cannot figure out why. I'm using the example setup.py:
from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES = ['tensorflow>=1.8']

setup(
    name='my_model',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='my first model'
)
So, the question is: why does gcloud not pack the src folder?

You need to put the setup.py in the directory above my-model.
You can check your results by invoking:
python setup.py sdist
Then untar the tarball in the dist directory. As is, you'll see that task.py is not included in the tarball.
By moving setup.py one directory higher and repeating, you'll see that task.py is included, as is everything in src.
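For reference, a sketch of the layout that works, using the underscored name from the question's symlink so that find_packages() can discover the package (assumed, since the question mixes my-model and my_model):

.
├── setup.py
└── my_model/
    ├── __init__.py
    ├── task.py
    ├── model_params_google.json
    └── src/
        ├── __init__.py
        ├── data_handler.py
        ├── elastic_helpers.py
        └── model.py

With name='my_model' and version='0.1' from the setup.py above, the check becomes:

python setup.py sdist
tar -tzf dist/my_model-0.1.tar.gz

The listing should now contain my_model/task.py and everything under my_model/src/.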


How to run a script using pyproject.toml settings and poetry?

I am using poetry to create .whl files.
I have an FTP server running on a remote host.
I wrote a Python script (log_revision.py) which saves the git commit and a few more parameters in a database, and at the end sends the .whl (that poetry created) to the remote server (each .whl goes to a different path on the server; the path is saved in the db).
At the moment I run the script manually each time after I run the poetry build command.
I know the pyproject.toml has the [tool.poetry.scripts] section, but I don't get how I can use it to run a Python script.
I tried
[tool.poetry.scripts]
my-script = "my_package_name:log_revision.py"
and then poetry run my-script, but I always get an error:
AttributeError: module 'my_package_name' has no attribute 'log_revision'
1. Can someone please help me understand how to run the desired command?
As a short-term option (without git and params) I tried poetry publish -r http://192.168.1.xxx/home/whl -u hello -p world, but I get the following error:
[RuntimeError]
Repository http://192.168.1.xxx/home/whl is not defined
2. What am I doing wrong and how can I fix it?
Would appreciate any help, thanks!
At the moment the [tool.poetry.scripts] section is equivalent to setuptools' console_scripts.
So the argument must be a valid module and method name. Let's imagine that within your package my_package you have log_revision.py, which has a method start(). Then you have to write:
[tool.poetry.scripts]
my-script = "my_package.log_revision:start"
Here's a complete example:
You should have this folder structure:
my_package
├── my_package
│   ├── __init__.py
│   └── log_revision.py
└── pyproject.toml
The content of pyproject.toml is:
[tool.poetry]
name = "my_package"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]
[tool.poetry.dependencies]
python = "^3.8"
[tool.poetry.scripts]
my-script = "my_package.log_revision:start"
[build-system]
requires = ["poetry_core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
and of log_revision.py:
def start():
    print("Hello")
After you have run poetry install once you should be able to do this:
$ poetry run my-script
Hello
You cannot pass arguments to the start() method directly. Instead you can use command line arguments and parse them, e.g. with Python's argparse.
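For example, a minimal sketch of log_revision.py reading a flag from the command line (the --message flag is illustrative, not part of the question's script):

import argparse

def start():
    # Poetry invokes start() with no arguments, so read them from sys.argv
    parser = argparse.ArgumentParser(description="Log a revision")
    parser.add_argument("--message", default="Hello", help="text to print")
    args = parser.parse_args()
    print(args.message)

which you could then invoke as poetry run my-script --message "Hi".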
Although the previous answers are correct, they are a bit complicated. The simplest way to run a python script with poetry is as follows:
poetry run python myscript.py
If you are using a dev framework like streamlit you can use
poetry run streamlit run myapp.py
Basically anything you put after poetry run will execute from the poetry virtual environment.
For future visitors, I think what OP is asking for (a post build hook?) isn't directly supported. But you might find satisfaction from using a tool I wrote called poethepoet which integrates with poetry to run arbitrary tasks defined in the pyproject.toml in terms of shell commands or by referencing python functions.
For example you could define something like the following in your pyproject.toml
[tool.poe.tasks.log_revision]
script = "my_package.log_revision:main" # where main is the name of the python function in the log_revision module
help = "Register this revision in the catalog db"
[tool.poe.tasks.build]
cmd = "poetry build"
help = "Build the project"
[tool.poe.tasks.shipit]
sequence = ["build", "log_revision"]
help = "Build and deploy"
And then execute any of the tasks with the poe CLI, like the following, which will run the other two tasks in sequence, thus building your project and running the deployment script in one go:
poe shipit
By default tasks are executed inside the poetry managed virtualenv (like using poetry run) so the python script can use dev dependencies.
If you need to define CLI arguments or load values into a task from a dotenv file (such as credentials) then this is also supported.
Update: poetry plugin support
poethepoet can now support post build hooks when used as a poetry plugin. For example when using poetry >=1.2.0b1 you can configure the following to run your log_revision task automatically after poetry build is run:
[tool.poe.poetry_hooks]
post_build = "log-revision"
[tool.poe.tasks.log-revision]
script = "scripts:log_revision"
I tinkered with such a problem for a couple of hours and found a solution.
I had a task to start the Django server via a poetry script.
Here are the directories; manage.py is in the test folder:
├── pyproject.toml
├── README.rst
├── runserver.py
├── test
│   ├── db.sqlite3
│   ├── manage.py
│   └── test
│       ├── asgi.py
│       ├── __init__.py
│       ├── __pycache__
│       │   ├── __init__.cpython-39.pyc
│       │   ├── settings.cpython-39.pyc
│       │   ├── urls.cpython-39.pyc
│       │   └── wsgi.cpython-39.pyc
│       ├── settings.py
│       ├── urls.py
│       └── wsgi.py
├── tests
│   ├── __init__.py
│   └── test_tmp.py
└── tmp
    └── __init__.py
I had to create a file runserver.py:
import subprocess

def djtest():
    # Launch the Django dev server from the nested test/ project
    cmd = ['python', 'test/manage.py', 'runserver']
    subprocess.run(cmd)
then write the script itself in pyproject.toml:
[tool.poetry.scripts]
dj = "runserver:djtest"
Only then did the command poetry run dj start working.

How can I make pip install package data (a config file)?

I have a package called clana (Github, PyPI) with the following structure:
.
├── clana
│   ├── cli.py
│   ├── config.yaml
│   ├── __init__.py
│   ├── utils.py
│   └── visualize_predictions.py
├── docs/
├── setup.cfg
├── setup.py
├── tests/
└── tox.ini
The setup.py looks like this:
from setuptools import find_packages
from setuptools import setup

requires_tests = [...]
install_requires = [...]

config = {
    "name": "clana",
    "version": "0.3.6",
    "author": "Martin Thoma",
    "author_email": "info@martin-thoma.de",
    "maintainer": "Martin Thoma",
    "maintainer_email": "info@martin-thoma.de",
    "packages": find_packages(),
    "entry_points": {"console_scripts": ["clana=clana.cli:entry_point"]},
    "install_requires": install_requires,
    "tests_require": requires_tests,
    "package_data": {"clana": ["clana/config.yaml"]},
    "include_package_data": True,
    "zip_safe": False,
}
setup(**config)
How to check that it didn't work
Quick
python3 setup.py sdist
open dist/clana-0.3.6.tar.gz # config.yaml is not in this file
The real check
I thought this would make sure that the config.yaml is in the same directory as the cli.py when the package is installed. But when I try this:
virtualenv venv
source venv/bin/activate
pip install clana
cd venv/lib/python3.6/site-packages/clana
ls
I get:
cli.py __init__.py __pycache__ utils.py visualize_predictions.py
The way I upload it to PyPI:
python3 setup.py sdist bdist_wheel && twine upload dist/*
So the config.yaml is missing. How can I make sure it is there?
You can add a file named MANIFEST.in next to setup.py with a list of the files you want to add, wildcards allowed (e.g. include *.yaml or include clana/config.yaml).
The option include_package_data=True will then activate the manifest file.
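For the layout in the question, the whole MANIFEST.in can be a single line (the wildcard form include *.yaml would work just as well):

include clana/config.yaml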
In short: add config.yaml to MANIFEST.in, and set include_package_data. One without the other is not enough.
Basically it goes like this:
MANIFEST.in adds files to sdist (source distribution).
include_package_data adds these same files to bdist (built distribution), i.e. it extends the effect of MANIFEST.in to bdist.
exclude_package_data prevents files in the sdist from being added to the bdist, i.e. it filters the effect of include_package_data.
package_data adds files to bdist, i.e. it adds build artifacts (typically the products of custom build steps) to your bdist and has of course no effect on sdist.
So in your case, the file config.yaml is not installed, because it is not added to your bdist (built distribution). There are 2 ways to fix this depending on where the file comes from:
either the file is a build artifact (typically it is somehow created during the ./setup.py build phase), then you need to add it to package_data;
or the file is part of your source (typically it is in your source code repository), then you need to add it to MANIFEST.in, set include_package_data, and leave it out of exclude_package_data (this seems to be your case here).
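A quick way to verify both distributions after the fix (a sketch, using the version 0.3.6 from the setup.py above):

python3 setup.py sdist bdist_wheel
tar -tzf dist/clana-0.3.6.tar.gz | grep config.yaml            # sdist
unzip -l dist/clana-0.3.6-py3-none-any.whl | grep config.yaml  # bdist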
See:
https://stackoverflow.com/a/54953494/11138259
https://setuptools.readthedocs.io/en/latest/setuptools.html#including-data-files
Following from the documentation on including data files, if your package has data files such as .yaml files, you may include them like so:
setup(
    ...
    package_data={
        "": ["*.yaml"],
    },
    ...
)
This will allow any file in your package with the file extension .yaml to be included.
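Once the file ships inside the package, remember to load it relative to the package rather than the current working directory, or the installed copy won't be found. A minimal sketch using the standard library's importlib.resources (Python 3.7+; the function name is illustrative):

from importlib import resources

def load_config_text() -> str:
    # Read config.yaml from inside the installed 'clana' package
    return resources.read_text("clana", "config.yaml")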

Python package published with poetry is not found after install

In the last few days, I was working on a Python module. Until now, I have used poetry as a package management tool in many other projects, but this is my first time wanting to publish a package to PyPI.
I was able to run the poetry build and poetry publish commands. I was also able to install the published package:
$ pip3 install git-profiles
Collecting git-profiles
Using cached https://files.pythonhosted.org/packages/0e/e7/bac9027effd1e34a5b5718f2b35c0b28b3d67f3809e2f2981b6c7b58963e/git_profiles-1.1.0-py3-none-any.whl
Installing collected packages: git-profiles
Successfully installed git-profiles-1.1.0
However, right after the install, I am not able to run my package:
$ git-profiles --help
git-profiles: command not found
My project has the following structure:
git-profiles/
├── src/
│   ├── commands/
│   ├── executor/
│   ├── git_manager/
│   ├── profile/
│   ├── utils/
│   ├── __init__.py
│   └── git_profiles.py
└── tests
I tried different script configurations in the pyproject.toml file, but I've never been able to make it work after installing.
[tool.poetry.scripts]
poetry = "src:git_profiles.py"
or
[tool.poetry.scripts]
git-profile = "src:git_profiles.py"
I don't know if this is a python/pip path/version problem or if I need to change something in the configuration file.
If it is helpful, this is the GitHub repository I'm talking about. The package is also published on PyPI.
Poetry's scripts section wraps around the console script definition of setuptools. As such, the entrypoint name and the call path you give it need to follow exactly the same rules.
In short, a console script does more or less this from the shell:
import my_lib # the module isn't called src, that's just a folder name
# the right name to import is whatever you put at [tool.poetry].name
my_lib.my_module.function()
Which, if given the name my-lib-call (the name can be the same as your module, but it doesn't need to be) would be written like this:
[tool.poetry.scripts]
my-lib-call = "my_lib.my_module:function"
Adapted to your project structure, the following should do the job (note that the part before the colon must be an importable module name, so it uses an underscore, not a hyphen):
[tool.poetry.scripts]
git-profile = "git_profiles:main"
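One caveat: with the src layout from the question, poetry also has to be told to ship the module under its import name, otherwise the installed entrypoint has nothing to import. A sketch of the relevant section, assuming src/git_profiles.py is the module to expose and defines a main() function (the question doesn't show its function names):

[tool.poetry]
name = "git-profiles"
version = "1.1.0"
packages = [{ include = "git_profiles.py", from = "src" }]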

How to tell pbr to include non-code files in package

I only run into this problem when I build in a python:alpine image. Reproducing it is a bit of a pain, but these are the steps:
Docker container setup:
$ docker run -it python:3.7-rc-alpine /bin/ash
$ pip install pbr
Small package setup, including non-python files
test
├── .git
├── setup.cfg
├── setup.py
└── src
    └── test
        ├── __init__.py
        ├── test.yml
        └── sub_test
            ├── __init__.py
            └── test.yml
setup.py:
from setuptools import find_packages, setup

setup(
    setup_requires=['pbr'],
    pbr=True,
    package_dir={'': 'src'},
    packages=find_packages(where='src'),
)
setup.cfg:
[metadata]
name = test
All other files are empty. I copy them to the container with docker cp test <docker_container>:/test.
Back in the container I now try to build the package with cd test; pip wheel -w wheel . (note the trailing dot). The test.yml in test/src/test will be included in the wheel, but the one in test/src/test/sub_test won't.
I have no clue why this happens, since the (pitifully sparse, and imo quite confusing) documentation of pbr on that matter states:
Just like AUTHORS and ChangeLog, why keep a list of files you wish to include when you can find many of these in git. MANIFEST.in generation ensures almost all files stored in git, with the exception of .gitignore, .gitreview and .pyc files, are automatically included in your distribution.
I could not find a pbr-parameter that lets me explicitly include some file or file type, which I expected to exist.
Creating a MANIFEST.in with include src/test/sub_test/test.yml actually solves this problem, but I'd rather understand and avoid this behavior altogether.
pbr needs git in order to correctly compile its files-to-include list, so the problem can be solved by installing git into the build environment before building the package. With an alpine image, that would be apk add --no-cache git.
pbr uses the .git directory to figure out which files should be part of the package and which shouldn't. The short version is that it takes the intersection of the file list from the packages parameter in the setup() call in setup.py and everything that is committed or staged in the currently checked-out git branch.
So if the project came without a .git directory, you'd additionally need to execute git init; git add src.
The reason the 'bug' is hard to identify is that pbr silently assumes all .py files should be added regardless of whether they are committed. It will also only throw an error if it can't find a .git directory, not if the directory is there but pbr can't get any info from it because git isn't installed.
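Putting it together, a minimal sequence inside the alpine container might look like this (a sketch, assuming the project was copied to /test as in the question):

apk add --no-cache git
cd /test
git init
git add .            # staged files are enough for pbr's file list
pip wheel -w wheel .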

Packaging stub files

Let's say I have very simple package with a following structure:
.
├── foo
│   ├── bar
│   │   └── __init__.py
│   └── __init__.py
└── setup.py
Content of the files:
setup.py:
from distutils.core import setup

setup(
    name='foobar',
    version='',
    packages=['foo', 'foo.bar'],
    url='',
    license='Apache License 2.0',
    author='foobar',
    author_email='',
    description=''
)
foo/bar/__init__.py:
def foobar(x):
    return x
The remaining files are empty.
I install the package using pip:
cd foobar
pip install .
and can confirm it is installed correctly.
Now I want to create a separate package with stub files:
.
├── foo
│   ├── bar
│   │   └── __init__.pyi
│   └── __init__.pyi
└── setup.py
Content of the files:
setup.py:
from distutils.core import setup
import sys
import pathlib

setup(
    name='foobar_annot',
    version='',
    packages=['foo', 'foo.bar'],
    url='',
    license='Apache License 2.0',
    author='foobar',
    author_email='',
    description='',
    data_files=[
        (
            'shared/typehints/python{}.{}/foo/bar'.format(*sys.version_info[:2]),
            ["foo/bar/__init__.pyi"]
        ),
    ],
)
foo/bar/__init__.pyi:
def foobar(x: int) -> int: ...
I can install this package, see that it creates anaconda3/shared/typehints/python3.5/foo/bar/__init__.pyi in my Anaconda root, but it doesn't look like it is recognized by PyCharm (I get no warnings). When I place the pyi file in the main package, everything works OK.
I would be grateful for any hints on how to make this work:
I've been trying to make some sense of PEP 484 - Storing and distributing stub files, but to no avail. Even the pathlib part seems to offend my version of distutils.
PY-18597 and https://github.com/python/mypy/issues/1190#issuecomment-188526651 seem to be related but somehow I cannot connect the dots.
I tried putting stubs in the .PyCharmX.X/config/python-skeletons folder, but it didn't help.
Some things that work, but don't resolve the problem:
Putting stub files in the current project and marking as sources.
Adding stub package root to the interpreter path (at least in some simple cases).
So the question: how do I create a minimal, distributable package with Python stubs which will be recognized by existing tools? Based on the experiments I suspect one of these problems:
I misunderstood the structure which should be created by the package in the shared/typehints/pythonX.Y - if this is true, how should I define data_files?
PyCharm doesn't consider these files at all (this seem to be contradicted by some comments in the linked issue).
It is supposed to work just fine, but I made some configuration mistake and am looking for an external problem which doesn't exist.
Are there any established procedures to troubleshoot problems like this?
The problem is that you didn't include the foo/__init__.pyi file in your stub distribution. Even though it's empty, it makes foo a stub file package and enables the search for foo.bar.
You can modify the data_files in your setup.py to include both:

data_files=[
    (
        'shared/typehints/python{}.{}/foo/bar'.format(*sys.version_info[:2]),
        ["foo/bar/__init__.pyi"]
    ),
    (
        'shared/typehints/python{}.{}/foo'.format(*sys.version_info[:2]),
        ["foo/__init__.pyi"]
    ),
],
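As a newer alternative to the shared/typehints approach, PEP 561 standardizes stub distribution: a stub-only package whose directory is named <package>-stubs, installed into site-packages, is discovered by type checkers automatically. A sketch of that layout (this is the PEP 561 mechanism, not the data_files approach from the question, and PyCharm's support for it depends on the version):

.
├── foo-stubs
│   ├── __init__.pyi
│   └── bar
│       └── __init__.pyi
└── setup.py

with a setup.py along these lines:

from distutils.core import setup

setup(
    name='foobar_annot',
    version='0.1',
    # PEP 561: a <package>-stubs directory is picked up by type checkers
    packages=['foo-stubs'],
    package_data={'foo-stubs': ['*.pyi', 'bar/*.pyi']},
)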
