How to manage a shared conda environment and a local environment? - python

Say the team has a shared conda environment called 'env1' located at this directory:
/home/share/conda/envs/env1/
While my working directory is at:
/home/my_name/...
I do not have write permission for any files under /home/share/, just read permission.
Now I want to use the 'env1' environment with one additional library installed (this library is not originally present in /home/share/conda/envs/env1/).
How can I achieve that without re-installing everything from env1 into my own directory? Also, I have to use 'conda install' for that additional package.
I feel this has something to do with 'conda install --use-local' for handling such a shared-plus-local environment situation, but I'm not sure about the exact procedure.
Thanks for any help and explanation!

It looks like the --use-local flag only controls whether conda should install a package you built locally, i.e. one that may not be distributed through the usual channels (or that you want used instead of them). So I don't think it directly relates to your case.
Perhaps one solution is to clone this shared environment into a new one under your own account, where you have write permission, and then conda install the new package you need in that environment. If you are concerned that it takes up space or duplicates the packages, I recommend reading this answer here, which explains that conda tries not to waste space by using hardlinks, so most likely the packages will not actually be re-installed but rather reused in the new environment.
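A minimal sketch of that workflow (the new environment name env1_plus and the extra package are placeholders):
conda create --name env1_plus --clone /home/share/conda/envs/env1
conda activate env1_plus
conda install <extra-package>
Since you only have read permission on the shared prefix, the clone should end up under your own envs directory, which is where the extra package gets written.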
Finally, I'd personally create a new environment even just for the sake of clarity. If I came back to this project later, I'd like to know that it requires your "base/shared" env plus an additional package. If it were named identically to the shared one, that difference wouldn't be so obvious.

Related

Installing a second Python environment on Mac

I’m trying to figure out how to install a second python environment alongside anaconda.
On Windows I can just install Python in a different folder and reference the desired Python environment using env variables. I'd like to do the same on Mac.
A virtual env won't do the trick, as it does not copy the standard library and other things. It needs to be a complete standalone environment. I guess I could compile it, but is there an easier way?
Thank you very much for any input.
You can do that using pyenv.
It allows you to have several python versions, and even different distributions.
It works mostly in user space, so no additional requirements are needed (apart from compilation tools).
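A rough sketch of how that looks on a Mac (the version number is just an example, and pyenv also needs its shell init added to your profile):
brew install pyenv
pyenv install 3.12.2        # builds a complete, standalone CPython under ~/.pyenv/versions/
pyenv global 3.12.2         # or `pyenv local 3.12.2` inside a project
python --version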

Should I uninstall all globally installed Python packages and only install them locally in venv environments?

I just read it's possible to have conflicts between globally and locally installed Python packages and that it's better to install Python itself and packages only in local VENV environments to avoid those potential problems.
I know I installed Python globally and (by mistake) Jupyterlab globally as well, but when I check pip list globally I get a long list of packages I don't remember ever installing, such as:
anyio, argon2, async-generator, ..., ipython, Jinja2 (I have PyCharm, but isn't it supposed to only install locally when you create a new project?), numpy, pandas, etc...
and many others, perhaps 50 more names.
Should I erase everything that's installed globally and only install Python itself and project relevant packages in VENV environments?
And if so, how?
I've been looking into something similar and thought I would add an answer here for future browsers.
As with many things, the answer is: it depends... It would be good practice to only use virtual environments; this helps ensure, among other things, that different projects can run on different versions of packages without conflict, and without you needing to update packages and potentially breaking older pieces of code.
On the other hand, if you maintain multiple applications that use a given package, you will have to update each of these venvs individually (if you want to update), which can become something of a headache; in that case you might decide to install the package globally to save yourself the pain.
As for the original question about deleting everything: again, no one but you can answer this. My advice would be (if it's manageable) to check each package and delete it if you don't see yourself using it often enough to justify keeping it global.
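If you do go the venv route, the per-project setup is short (the package names here are just the ones mentioned in the question):
python -m venv .venv
source .venv/bin/activate        # on Windows: .venv\Scripts\activate
pip install jupyterlab numpy pandas
Anything installed while the venv is active stays inside the project's .venv directory and won't touch the global site-packages.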

Question about virtual environments: from conda to virtualenv

I have been working on a Python project that lives inside an environment created on a Linux machine. I recently got a new PC and tried FreeBSD, so I decided to see if I could port the settings, since these environments are supposed to be platform independent.
Since there is no support for conda on FreeBSD, I decided to write a script to migrate the dependencies from conda to virtualenv. Although the script translates the .yml file into the .txt file that pip needs to install the dependencies, I can see that a lot of packages are still missing, especially from the dependencies section of the .yml file.
Does it mean that these packages are not yet ported on freebsd or is there a different way to add them in the .txt file instead of just their name?
Does it mean that these packages are not yet ported on freebsd or is there a different way to add them in the .txt file instead of just their name?
It sounds like pip can't find a number of your dependencies, so yes.
Keep in mind that conda and pip are completely different packaging systems, despite being mostly compatible with each other and despite most packages available on one being available on the other. This also means that conda list usually includes some packages you don't necessarily need to install via pip. So you may be better off starting from scratch with a new requirements.txt file that includes only the packages you actually need, and letting pip work out what else it needs (which, again, is likely different from what conda needs).
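If it helps as a starting point for that hand-curated requirements.txt, two rough options (the --from-history flag needs a reasonably recent conda):
conda env export --from-history              # only the packages you explicitly asked for
pip list --format=freeze > requirements.txt  # only what pip can see in the active env
Either list will still need pruning by hand, since some conda-only packages won't exist on PyPI under the same name.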

What exactly is aggressive_update_packages in Anaconda?

I've recently started using the Anaconda environment, and in the config list I came across an option called aggressive_update_packages. It is not very clear to me what happens when I add a new package to it. I couldn't find any satisfying description of this option (only a little bit here), so I can only assume what it does: I think it keeps the given package automatically updated. However, I'm not sure exactly how it works, which is what I'm asking. I'm actively developing a package specifically for the Anaconda environment, and for my users it would be a nice feature to keep it automatically updated.
Why it exists
The default settings for the aggressive_update_packages set are provided mostly for security purposes. Because Conda brings many native libraries with it, some of which provide core functionality for securely communicating on the internet, there is an implicit responsibility to ensure that it makes some effort to patch software that is a frequent surface for generic cyberattacks.
Try searching any of the default software (e.g., openssl) in the NIST's National Vulnerability Database and you'll quickly get a sense of why it might be crucial to keep those packages patched. Running an old SSL protocol or having an outdated list of certificate authorities leaves one generically vulnerable.
How it works
Essentially, whenever one indicates a willingness to mutate an environment (e.g., conda (install|update|remove)), Conda will check for and request to install the latest versions of the packages in the set. Not much more to it than that. It does not autoupdate packages. If the user never tries to mutate the environment, the package will never be updated.
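For illustration (requests and scipy here are just placeholder package names):
conda config --add aggressive_update_packages requests --env
conda install scipy        # this env mutation will now also try to pull the latest requests
Nothing happens between such commands; the update attempt only rides along with an explicit solve.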
Repurposing functionality
OP suggests using this as a way to keep their own package automatically updated. It's possible that, if your users already frequently mutate their envs, the package will get updated often via this setting. However, the setting is not something the package can manipulate on its own (manipulating anything other than install files is expressly forbidden). Users would have to manually edit their settings to add the package in question to the list.
For users who are reproducibility-minded, I would actively discourage changing global settings to add non-security-essential packages to the aggressive_update_packages list.
According to conda release notes
aggressive updates: Conda now supports an aggressive_update_packages configuration parameter that holds a sequence of MatchSpec strings, in addition to the pinned_packages configuration parameter. Currently, the default value contains the packages ca-certificates, certifi, and openssl. When manipulating configuration with the conda config command, use of the --system and --env flags will be especially helpful here. For example:
conda config --add aggressive_update_packages defaults::pyopenssl --system
would ensure that, system-wide, solves on all environments enforce using the latest version of pyopenssl from the defaults channel.
conda config --add pinned_packages Python=2.7 --env
would lock all solves for the current active environment to Python versions matching 2.7.*.
According to this issue - https://github.com/conda/conda/issues/7419
This might mean that any newly created env by default adds/updates the packages listed in the aggressive_update_packages configuration.
How to get the variable's current value? - conda config --show
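With the defaults mentioned in the release notes, the output looks roughly like:
conda config --show aggressive_update_packages
aggressive_update_packages:
  - ca-certificates
  - certifi
  - openssl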

conda environment to AWS Lambda

I would like to set up a Python function I've written on AWS Lambda, a function that depends on a bunch of Python libraries I have already collected in a conda environment.
To set this up on Lambda, I'm supposed to zip this environment up, but the Lambda docs only give instructions for how to do this using pip/VirtualEnv. Does anyone have experience with this?
You should use the Serverless Framework in combination with the serverless-python-requirements plugin. You just need a requirements.txt, and the plugin automatically packages your code and the dependencies into a zip file, uploads everything to S3, and deploys your function. Bonus: since it can do this dockerized, it can also help you with packages that need binary dependencies.
Have a look here (https://serverless.com/blog/serverless-python-packaging/) for a how-to.
From experience I strongly recommend you look into that. Every bit of manual labour for deployment and such is something that keeps you from developing your logic.
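Roughly, the workflow looks like this (assuming Node/npm and a serverless.yml already set up for your function):
npm install -g serverless
serverless plugin install -n serverless-python-requirements
serverless deploy        # packages requirements.txt, uploads to S3, creates/updates the function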
Edit 2017-12-17:
Your comment makes sense, @eelco-hoogendoorn.
However, in my mind a conda environment is just an encapsulated place where a bunch of Python packages live. So, if you put all these dependencies (from your conda env) into a requirements.txt (and use serverless + the plugin), that would solve your problem, no?
IMHO it would essentially be the same as zipping up all the packages you installed in your env into your deployment package. That being said, here is a snippet that does essentially this:
conda env export --name Name_of_your_Conda_env | yq -r '.dependencies[] | .. | select(type == "string")' | sed -E "s/(^[^=]*)(=+)([0-9.]+)(=.*|$)/\1==\3/" > requirements.txt
Unfortunately, conda env export only exports the environment in YAML format. The --json flag doesn't work right now, but it is supposed to be fixed in the next release. That is why I had to use yq instead of jq. You can install yq using pip install yq; it is just a wrapper around jq that also works with YAML files.
KEEP IN MIND
A Lambda deployment package can only be 50 MB (zipped, for direct upload), so your environment shouldn't be too big.
I have not tried deploying a Lambda with serverless + serverless-python-requirements and a requirements.txt created like that, so I don't know whether it will work.
The main reason I use conda is the option not to compile different binary packages myself (like numpy, matplotlib, pyqt, etc.), or at least to compile them less often. When you do need to compile something yourself for a specific version of Python (like uwsgi), you should compile the binaries with the same gcc version that the Python in your conda environment was compiled with. Most probably that is not the same gcc your OS uses, since conda now ships recent gcc versions, which you install with conda install gxx_linux-64.
This leads us to two situations:
All your dependencies are pure Python, and you can simply save a list of them using pip freeze and bundle them as described for virtualenv (see the sketch below).
You have some binary extensions. In that case, the binaries from your conda environment will not work with the Python used by AWS Lambda. Unfortunately, you will need to visit the page describing the execution environment (AMI: amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2), set up that environment, build the binaries for the specific version of the built-in Python in a separate directory (along with the pure-Python packages), and then bundle them into a zip archive.
This is a general answer to your question, but the main idea is that you cannot reuse your binary packages, only the list of them.
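For the first case, the pip-based bundling the Lambda docs describe boils down to roughly this (file and directory names are placeholders):
pip freeze > requirements.txt
pip install -r requirements.txt -t package/
cd package && zip -r ../deployment.zip . && cd ..
zip -g deployment.zip lambda_function.py        # add your handler module on top of the deps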
I can't think of a good reason why zipping up your conda environment wouldn't work.
I think you can go into your anaconda2/envs/ or anaconda3/envs/ directory and copy/zip the env directory you want to upload. Conda is just a souped-up version of a virtualenv, plus a different and somewhat optional package manager. The big reason I think it's OK is that conda environments encapsulate all their dependencies within their particular .../anaconda[2|3]/envs/$VIRTUAL_ENV_DIR/ directories by default.
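For example (env name and install location assumed):
cd ~/anaconda3/envs && zip -r my_env.zip my_env/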
Using a normal virtualenv gives you a bit more freedom, in sort of the same way that cavemen had more freedom than modern people. Personally I prefer cars. With virtualenv you basically get a semi-empty environment that you can fill with whatever you want, rather than the more robust, pre-populated env that conda spits out. The following is a good table for reference: https://conda.io/docs/commands.html#conda-vs-pip-vs-virtualenv-commands
Conda turns the command ~$ /path/to/$VIRTUAL_ENV_ROOT_DIR/bin/activate into ~$ source activate $VIRTUAL_ENV_NAME
Say you want to make a virtualenv the old-fashioned way. You'd choose a directory (let's call it $VIRTUAL_ENV_ROOT_DIR) and a name (which we'll call $VIRTUAL_ENV_NAME). At this point you would type:
~$ cd $VIRTUAL_ENV_ROOT_DIR && virtualenv $VIRTUAL_ENV_NAME
Python then creates a copy of its own interpreter and standard library (plus pip and setuptools, I think) and places an executable called activate in this clone's bin/ directory. The $VIRTUAL_ENV_ROOT_DIR/$VIRTUAL_ENV_NAME/bin/activate script works by changing your $PATH environment variable, which determines which python interpreter gets called when you type ~$ python into the shell; that interpreter in turn knows the list of directories containing all the modules it will see when told to import something. This is the primary reason you'll see #!/usr/bin/env python in people's code instead of /usr/bin/python.
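To see it in action (names reused from above):
~$ source $VIRTUAL_ENV_ROOT_DIR/$VIRTUAL_ENV_NAME/bin/activate
($VIRTUAL_ENV_NAME) ~$ which python        # now resolves to the virtualenv's own bin/python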
In https://github.com/dazza-codes/aws-lambda-layer-packing, the pip wheels seem to work for many packages (pure-pip installs). It is difficult to bundle a lot of packages into a compact AWS Lambda layer, since pip wheels do not use shared libraries and tend to get a bit bloated, but they work. Based on some discussions on GitHub, the conda vs. pip challenges are not trivial:
https://github.com/pypa/packaging-problems/issues/25
try https://github.com/conda-incubator/conda-press
AFAICT, AWS SAM uses https://github.com/aws/aws-lambda-builders and it appears to be pip-based, but there is also a conda package for it at https://anaconda.org/conda-forge/aws_lambda_builders
