How to specify the source (git branch) of a conda build package - python

I created a conda package that builds successfully and that I can install with conda. I am using versioneer to automatically generate the version number of my builds. My project is in a git repository with multiple branches.
My problem is that when I want to install the package, conda installs the last built version (regardless of the branch), whereas I would like it to install, by default, the latest version built from the master branch.
My workaround is to manually specify the version number of the version I want.
Is there a way to generate a version number with versioneer that will make conda prefer the latest build from the master branch? Alternatively, is there a way to tell conda which branch to take the latest build from?
Thanks

Rather than varying the version, I'd suggest looking into encoding the branch info into either the build string or the label/subdirectory. To me, these seem more semantically consistent with the situation.
Build Variants
For the former, this could either be done explicitly by defining a build string that includes some jinja-templated variable coordinated with the Git branch, or automatically through variants defined in the conda_build_config.yaml. If you get this working, then installing a build from branch foo would go something like:
conda install my_package=*=*foo
I don't know a simple example of this, but the Conda Forge blas-feedstock uses a conda_build_config.yaml to define the set of blas_impl options, which is then used to define build strings on the various outputs in meta.yaml.
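To make the first option concrete, here is a minimal sketch, assuming a variant key named git_branch that you define yourself (conda-build does not provide it) and coordinate with the branch being built:

# conda_build_config.yaml (sketch; the key name git_branch is an assumption)
git_branch:
  - master

# meta.yaml (relevant fragment) -- each key from conda_build_config.yaml
# becomes a Jinja variable, so it can be baked into the build string
build:
  number: 0
  string: "{{ PKG_BUILDNUM }}_{{ git_branch }}"

Building from a feature branch would then override the variant with something like conda build recipe/ --variants "{git_branch: foo}", and the resulting package would match the my_package=*=*foo spec above.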
Repository Labels
For the latter, I only know about Anaconda Cloud hosting (which you may not be using). In that case, one adds a label (subdir) with:
anaconda upload -l foo my_package.tar.gz
If you went this route, then installing a build from a branch foo would go something like:
conda install channel/foo::my_package
where "channel" is the channel to which you upload.

Related

Get the list of packages used in Anaconda

Is there a way to get a list of packages that are being used rather than just installed in the environment?
Example: I can install matplotlib with conda install matplotlib, but if I never used it in any of the files I don't want it to be in the list.
Interesting idea, to check the 'frequently used' packages in your environment.
It appears to me that there is no direct way to check this.
I am also trying to work this out. My layman's idea is that it can be done in two consecutive stages: (a) find the most-used packages, either those that are frequently updated (check with conda list --revisions) or those the user readily recognizes; (b) trace the dependencies of those packages (whether one package is related to another or not) with the pipdeptree command. This Anaconda link might also be useful: Managing Anaconda packages
The first step is to identify the packages your applications actually use from time to time. Only then trace their dependencies on other packages, so that related packages are not removed by mistake. That said, I still think it is better to stick with the default packages provided by conda and only add more packages when required.
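For reference, a rough sketch of those commands (matplotlib is only an example package, and pipdeptree is a separate pip-installable tool, not part of conda):

conda list --revisions            # revision history of the environment
pip install pipdeptree            # install the dependency-tree tool
pipdeptree --packages matplotlib  # what does matplotlib depend on?
pipdeptree --reverse              # reverse view: which packages require which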

Can conda perform an install whilst minimally updating dependencies?

The conda install man page says
Conda attempts to install the newest versions of the requested packages. To accomplish this, it may update some packages that are already installed, or install additional packages.
So first, does this also apply to dependencies that it determines it needs to install or update? Assuming that the answer is "yes"; can that behaviour be changed? For example when working with legacy code it can be useful to update dependencies as little as possible, or install the oldest version of a dependency that will still work. Is there some way to get the conda dependency resolver to figure this out automatically, or does one have to resort to manually figuring out the dependency updates in this case?
Or maybe I am wrong entirely and this is the default behaviour? The dependency resolution rules are not clear to me from the documentation.
Conda's Two-Stage Solving
Conda first tries to find a version of the requested package that could be installed without changing any installed packages (a frozen solve). If that fails, it simply re-solves the entire environment from scratch with the new constraint added (a full solve). There is no in-between (e.g., minimize packages updated). Perhaps this will change in the future, but this has been the state of things for versions 4.6[?]-4.12.
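As an aside, conda exposes flags that push the solver toward one behaviour or the other; a quick sketch (somepkg is a placeholder):

conda install --freeze-installed somepkg  # do not update or change already-installed packages
conda install --update-deps somepkg       # the opposite: also update the package's dependencies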
Mamba
If one needs to manually work things out, I'd strongly suggest looking into Mamba. In addition to being a compiled (fast!) drop-in replacement for conda, the mamba repoquery tool could be helpful for identifying the constraints that are problematic. It has a depends subcommand for identifying dependencies and a whoneeds subcommand for reverse dependencies.
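For example (numpy here is just a stand-in for whichever package you are investigating; exact flags may differ between mamba versions):

mamba repoquery depends numpy   # what does numpy depend on?
mamba repoquery whoneeds numpy  # what depends on numpy?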
Suggested Workflow
Were I working with legacy code, I might try defining a YAML file for the environment (env.yaml) and placing upper bounds on crucial packages. If I needed a new package, I would dry-run adding it (e.g., mamba install -d somepkg) to see how it affects the environment, figure out what constraint (again, an upper bound), if any, it needs, add it to the YAML, and then actually install it with mamba env update -f env.yaml.
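A minimal sketch of what such an env.yaml could look like (the package names and bounds are purely illustrative):

# env.yaml -- hypothetical pins for a legacy project
name: legacy
channels:
  - conda-forge
dependencies:
  - python=3.6
  - numpy<1.17
  - pandas<0.25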

Python wheel anti-dependency

I'm managing a python project which can be released in two different variants, "full" and "lightweight", called e.g. my-project and my-project-lw. Both use the same top-level name, e.g. myproject. I have a script that cuts off the "heavy" parts of the project and builds both pip-installable wheel archives with their dependencies (the lightweight one has considerably fewer). Everything works, and I can install them using the wheels.
Now I would like to make sure that a user wouldn't have both packages installed at the same time. Ideally I'd like pip to uninstall one when installing the other, or at least fail when the other is present (such that the user would have to uninstall the current manually).
Otherwise, when you install the my-project package it installs into /lib/python3.6/site-packages/myproject, and then when you install the my-project-lw package it overwrites the files in the same folder, so you get a weird hybrid where some files are from "full" and some from "lightweight", which is not good.
Is there a way to specify an anti-dependency? To mark them somehow as mutually exclusive? Thanks!
Pip doesn't support it. See also the related 'obsoletes' metadata. https://github.com/pypa/packaging-problems/issues/154

conda environment to AWS Lambda

I would like to set up a Python function I've written on AWS Lambda, a function that depends on a bunch of Python libraries I have already collected in a conda environment.
To set this up on Lambda, I'm supposed to zip this environment up, but the Lambda docs only give instructions for how to do this using pip/VirtualEnv. Does anyone have experience with this?
You should use the Serverless Framework in combination with the serverless-python-requirements plugin. You just need a requirements.txt and the plugin automatically packages your code and the dependencies in a zip file, uploads everything to S3, and deploys your function. Bonus: since it can do this dockerized, it can also help you with packages that need binary dependencies.
Have a look here (https://serverless.com/blog/serverless-python-packaging/) for a how-to.
From experience I strongly recommend you look into that. Every bit of manual labour for deployment and such is something that keeps you from developing your logic.
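To make this concrete, a minimal serverless.yml along the lines of that blog post might look roughly like the following (the service and handler names are placeholders, and plugin options can differ between versions):

# serverless.yml (sketch)
service: my-service
provider:
  name: aws
  runtime: python3.6
functions:
  main:
    handler: handler.main   # handler.py with a main() function (placeholder)
plugins:
  - serverless-python-requirements
custom:
  pythonRequirements:
    dockerizePip: true      # build requirements inside a Lambda-like Docker image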
Edit 2017-12-17:
Your comment makes sense, @eelco-hoogendoorn.
However, in my mind a conda environment is just an encapsulated place where a bunch of python packages live. So, if you would put all these dependencies (from your conda env) into a requirements.txt (and use serverless + plugin) that would solve your problem, no?
IMHO it would essentially be the same as zipping all the packages you installed in your env into your deployment package. That being said, here is a snippet, which does essentially this:
conda env export --name Name_of_your_Conda_env | yq -r '.dependencies[] | .. | select(type == "string")' | sed -E "s/(^[^=]*)(=+)([0-9.]+)(=.*|$)/\1==\3/" > requirements.txt
Unfortunately conda env export only exports the environment in yaml format. The --json flag doesn't work right now, but is supposed to be fixed in the next release. That is why I had to use yq instead of jq. You can install yq using pip install yq. It is just a wrapper around jq to allow it to also work with yaml files.
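To illustrate what the sed expression does, a (hypothetical) line from conda env export in conda's name=version=build format is rewritten into pip's name==version format:

numpy=1.13.3=py36h2b20989_0   # before the sed substitution
numpy==1.13.3                 # after it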
KEEP IN MIND
A Lambda deployment package can only be 50 MB in size, so your environment shouldn't be too big.
I have not tried deploying a Lambda with serverless + serverless-python-requirements and a requirements.txt created like that, and I don't know whether it will work.
The main reason I use conda is the option not to compile different binary packages myself (like numpy, matplotlib, pyqt, etc.), or to compile them less frequently. When you do need to compile something yourself for a specific version of Python (like uwsgi), you should compile the binaries with the same gcc version that the Python in your conda environment was compiled with; most probably it is not the same gcc that your OS is using, since conda now uses recent gcc versions that can be installed with conda install gxx_linux-64.
This leads us to two situations:
All your dependencies are pure Python; then you can save a list of them using pip freeze and bundle them as described for virtualenv (a rough sketch follows at the end of this answer).
You have some binary extensions. In that case, the binaries from your conda environment will not work with the Python used by AWS Lambda. Unfortunately, you will need to visit the page describing the execution environment (AMI: amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2), set up the environment, build the binaries for that specific version of built-in Python in a separate directory (as well as the pure-Python packages), and then bundle them into a zip archive.
This is a general answer to your question, but the main idea is that you cannot reuse your binary packages, only the list of them.
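For the pure-Python case, the bundling might look roughly like this (handler.py and the directory names are placeholders):

pip freeze > requirements.txt                  # capture the pure-Python dependencies
pip install -r requirements.txt -t build/      # install them into a local directory
cp handler.py build/                           # your Lambda handler (placeholder name)
cd build && zip -r ../lambda_deployment.zip .  # zip the contents (not the folder) for Lambda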
I can't think of a good reason why zipping up your conda environment wouldn't work.
I think you can go into your anaconda2/envs/ or anaconda3/envs/ directory and copy/zip the env directory you want to upload. Conda is just a souped-up version of a virtualenv, plus a different & somewhat optional package manager. The big reason I think it's ok is that conda environments encapsulate all their dependencies within their particular .../anaconda[2|3]/envs/$VIRTUAL_ENV_DIR/ directories by default.
Using the normal virtualenv approach gives you a bit more freedom, in sort of the same way that cavemen had more freedom than modern people. Personally I prefer cars. With virtualenv you basically get a semi-empty $PYTHONPATH that you can fill with whatever you want, rather than the more robust, pre-populated env that conda spits out. The following is a good table for reference: https://conda.io/docs/commands.html#conda-vs-pip-vs-virtualenv-commands
Conda turns the command ~$ source $VIRTUAL_ENV_ROOT_DIR/$VIRTUAL_ENV_NAME/bin/activate into ~$ source activate $VIRTUAL_ENV_NAME
Say you want to make a virtualenv the old fashioned way. You'd choose a directory (let's call it $VIRTUAL_ENV_ROOT_DIR,) & name (which we'll call $VIRTUAL_ENV_NAME.) At this point you would type:
~$ cd $VIRTUAL_ENV_ROOT_DIR && virtualenv $VIRTUAL_ENV_NAME
python then creates a copy of its own interpreter library (plus pip and setuptools, I think) & places a script called activate in this clone's bin/ directory. The $VIRTUAL_ENV_ROOT_DIR/$VIRTUAL_ENV_NAME/bin/activate script works by changing your environment variables (chiefly prepending the env's bin/ to $PATH), which determines what python interpreter gets called when you type ~$ python into the shell, & hence also the list of directories containing all modules which the interpreter will see when it is told to import something. This is the primary reason you'll see #!/usr/bin/env python in people's code instead of /usr/bin/python.
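If you go the zip-the-env route suggested above, it might look roughly like this (my_env is a hypothetical environment name; whether Lambda can actually run the binaries inside it is a separate question, as the other answer points out):

cd ~/anaconda3/envs
zip -r ~/my_env_deployment.zip my_env/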
In https://github.com/dazza-codes/aws-lambda-layer-packing, the pip wheels seem to be working for many packages (pure-pip installs). It is difficult to bundle a lot of packages into a compact AWS Lambda layer, since pip wheels do not use shared libraries and tend to get bloated a bit, but they work. Based on some discussions in github, the conda vs. pip challenges are not trivial:
https://github.com/pypa/packaging-problems/issues/25
try https://github.com/conda-incubator/conda-press
AFAICT, the AWS SAM uses https://github.com/aws/aws-lambda-builders and it appears to be pip based, but it also has a conda package at https://anaconda.org/conda-forge/aws_lambda_builders

Anaconda and different python builds for the same python version

Some time ago, when I wanted to install a package using conda in the Anaconda Python distribution, I saw that conda wanted to update the python package from 2.7.10-0 to 2.7.10-1. It's the same Python version (2.7.10 in this case).
Checking the channel's content I see there are multiple packages for the same python version:
python-2.7.10-0.tar.bz2 18.3M
python-2.7.10-1.tar.bz2 16.7M
python-2.7.10-3.tar.bz2 16.7M
...
So what is the difference between these builds, and how can I prevent them from being updated?
What you're seeing there are build numbers.
They're usually used to fix a build of the same version of a package.
For example, imagine you have built this Python version accidentally as a pydebug build. That's not what you want, since it will lead to crashes for users of this package if they're not aware that it is a pydebug build.
In this case you should rebuild the package (correctly this time), increment the build number and re-upload it.
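In recipe terms, that bookkeeping is just the build number in meta.yaml; a sketch of the bumped fragment (the package shown is only an example):

# meta.yaml (fragment)
package:
  name: python
  version: "2.7.10"
build:
  number: 1   # incremented from 0 for the fixed rebuild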
So what is the difference between these builds?
You can't easily know the difference, unless Continuum provides a changelog for each build of python they provide (which I sincerely doubt).
To install a package with a specific build number you could do: conda install "python=2.7.10 0". The 0 means the build number.
I don't know if this syntax is officially supported, however it worked the last time I used it.
how can i prevent them to be updated?
First I would have to know what is your workflow.
If you're asking about the command-line, I don't think that is possible.
If you're asking about using environment.yml files you can pin a package to a specific version (including the build number) using a similar syntax of conda install.
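For example, an environment.yml fragment pinned down to the build number might look like this (using conda's name=version=build spec form):

# environment.yml (fragment)
dependencies:
  - python=2.7.10=0   # version and build number pinned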
