conda environment to AWS Lambda - python

I would like to set up a Python function I've written on AWS Lambda, a function that depends on a bunch of Python libraries I have already collected in a conda environment.
To set this up on Lambda, I'm supposed to zip this environment up, but the Lambda docs only give instructions for how to do this using pip/VirtualEnv. Does anyone have experience with this?

You should use the serverless framework in combination with the serverless-python-requirements plugin. You just need a requirements.txt, and the plugin automatically packages your code and the dependencies into a zip file, uploads everything to S3 and deploys your function. Bonus: since it can do this inside a Docker container, it can also help you with packages that need binary dependencies.
Have a look here (https://serverless.com/blog/serverless-python-packaging/) for a how-to.
From experience I strongly recommend you look into that. Every bit of manual labour spent on deployment and the like is something that keeps you from developing your actual logic.
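For reference, the basic workflow with that setup looks roughly like this (a sketch only; it assumes Node.js is installed and my-service is a placeholder name):
$ npm install -g serverless
$ serverless create --template aws-python3 --path my-service
$ cd my-service
$ npm init -y
$ npm install --save-dev serverless-python-requirements
# add serverless-python-requirements to the plugins section of serverless.yml,
# list your dependencies in requirements.txt, then:
$ serverless deploy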
Edit 2017-12-17:
Your comment makes sense, @eelco-hoogendoorn.
However, in my mind a conda environment is just an encapsulated place where a bunch of Python packages live. So, if you put all these dependencies (from your conda env) into a requirements.txt (and use serverless + the plugin), wouldn't that solve your problem?
IMHO it would essentially be the same as zipping all the packages you installed in your env into your deployment package. That being said, here is a snippet that does exactly this:
conda env export --name Name_of_your_Conda_env | yq -r '.dependencies[] | .. | select(type == "string")' | sed -E "s/(^[^=]*)(=+)([0-9.]+)(=.*|$)/\1==\3/" > requirements.txt
Unfortunately conda env export only exports the environment in yaml format. The --json flag doesn't work right now, but is supposed to be fixed in the next release. That is why I had to use yq instead of jq. You can install yq using pip install yq. It is just a wrapper around jq to allow it to also work with yaml files.
KEEP IN MIND
A Lambda deployment package is limited to 50 MB (zipped), so your environment shouldn't be too big.
I have not tried deploying a Lambda with serverless + serverless-python-requirements and a requirements.txt created like that, so I don't know if it will work.

The main reason why I use conda is the option not to compile binary packages like numpy, matplotlib or pyqt myself (or to compile them less frequently). When you do need to compile something yourself for a specific version of Python (like uwsgi), you should compile it with the same gcc version that the Python in your conda environment was compiled with. That is most probably not the gcc your OS uses, since conda now ships recent gcc versions, which you install with conda install gxx_linux-64.
This leads us to two situations:
All your dependencies are pure Python. In that case you can save a list of them using pip freeze and bundle them just as the docs describe for virtualenv.
You have some binary extensions. In that case, the binaries from your conda environment will not work with the Python used by AWS Lambda. Unfortunately, you will need to visit the page describing the execution environment (AMI: amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2), set up that environment, build the binaries for the specific version of the built-in Python in a separate directory (along with the pure Python packages), and then bundle them into a zip archive, as sketched below.
This is a general answer to your question, but the main idea is that you cannot reuse your binary packages, only the list of them.
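As a minimal sketch of that build-and-bundle step (assuming you run it on a machine or container matching that AMI, and that your handler lives in lambda_function.py):
$ pip install -r requirements.txt -t ./package    # install deps into a local directory
$ cd package && zip -r ../deployment.zip . && cd ..
$ zip -g deployment.zip lambda_function.py        # add your handler to the archive
# upload deployment.zip to Lambda (via the console, the CLI, or S3 if it is large)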

I can't think of a good reason why zipping up your conda environment wouldn't work.
I think you can go into your anaconda2/envs/ or anaconda3/envs/ directory and copy/zip the env directory you want to upload. Conda is just a souped-up version of a virtualenv, plus a different and somewhat optional package manager. The big reason I think it's OK is that conda environments encapsulate all their dependencies within their particular .../anaconda[2|3]/envs/$VIRTUAL_ENV_DIR/ directories by default.
Using the normal virtualenv approach gives you a bit more freedom, in sort of the same way that cavemen had more freedom than modern people. Personally I prefer cars. With virtualenv you basically get a semi-empty environment that you can fill with whatever you want, rather than the more robust, pre-populated env that conda spits out. The following is a good table for reference: https://conda.io/docs/commands.html#conda-vs-pip-vs-virtualenv-commands
Conda turns the command ~$ /path/to/$VIRTUAL_ENV_ROOT_DIR/bin/activate into ~$ source activate $VIRTUAL_ENV_NAME
Say you want to make a virtualenv the old-fashioned way. You'd choose a directory (let's call it $VIRTUAL_ENV_ROOT_DIR) and a name (which we'll call $VIRTUAL_ENV_NAME). At this point you would type:
~$ cd $VIRTUAL_ENV_ROOT_DIR && virtualenv $VIRTUAL_ENV_NAME
Python then creates a copy of its own interpreter (plus pip and setuptools, I think) and places an executable called activate in this clone's bin/ directory. The $VIRTUAL_ENV_ROOT_DIR/$VIRTUAL_ENV_NAME/bin/activate script works by prepending the environment's bin/ directory to your $PATH environment variable, which determines which python interpreter gets called when you type ~$ python into the shell, and therefore also which directories of modules the interpreter will see when it is told to import something. This is the primary reason you'll see #!/usr/bin/env python in people's code instead of /usr/bin/python.
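To make that concrete, a typical session looks something like this (names are placeholders):
$ cd $VIRTUAL_ENV_ROOT_DIR
$ virtualenv $VIRTUAL_ENV_NAME
$ source $VIRTUAL_ENV_NAME/bin/activate
($VIRTUAL_ENV_NAME) $ which python    # now resolves to the interpreter inside the env
($VIRTUAL_ENV_NAME) $ deactivate      # restores your previous shell environment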

In https://github.com/dazza-codes/aws-lambda-layer-packing, the pip wheels seem to be working for many packages (pure-pip installs). It is difficult to bundle a lot of packages into a compact AWS Lambda layer, since pip wheels do not use shared libraries and tend to get bloated a bit, but they work. Based on some discussions in github, the conda vs. pip challenges are not trivial:
https://github.com/pypa/packaging-problems/issues/25
try https://github.com/conda-incubator/conda-press
AFAICT, AWS SAM uses https://github.com/aws/aws-lambda-builders, which appears to be pip-based, but it is also packaged for conda at https://anaconda.org/conda-forge/aws_lambda_builders


Managing Python and Python package versions for Test Automation

Folks,
I plan to use Python and various Python packages like Robot Framework, Appium, Selenium, etc. for test automation. But as we all know, Python and all these package versions keep revving.
If we pick a version of each of these to start with, what is the recommended process for keeping the development environment up to date as these packages rev to newer versions?
Appreciate some guidance on this.
Thanks.
If you wrote the code with a given version of a library, updating that library in the future is more likely to break your code than make it run better unless you intend to make use of the new features. Most of the time, you are better off sticking with the version you used when you wrote the code unless you want to change the code to use a new toy.
In order to ensure that the proper versions of every library are installed when the program is loaded on a new machine, you need a requirements.txt file. Making one is easy: build your program inside a virtual environment (e.g. conda create -n newenv followed by conda activate newenv), install only the libraries your program needs, and then, once all of your dependencies are installed, run pip freeze > requirements.txt in your terminal. This will put all your dependencies and their version information into the text file. When you want to use the program on a new machine, simply incorporate pip install -r requirements.txt into the loading process for the program.
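A rough sketch of that workflow for the packages mentioned in the question (robotframework, selenium and Appium-Python-Client are the usual PyPI names, but double-check them for your setup):
$ conda create -n newenv python=3.9
$ conda activate newenv
$ pip install robotframework selenium Appium-Python-Client
$ pip freeze > requirements.txt
# later, on a fresh machine or in a fresh environment:
$ pip install -r requirements.txt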
If you containerize it using something like docker, your requirements.txt dependencies can be installed automatically whenever the container is created. If you want to use a new library or library version, simply update it in your requirements.txt and boom, you are up to date.
In this case you would want to isolate your package (and the external packages/versions it depends on) using a virtual environment. A virtual environment can be thought of as a self-contained directory that tracks the specific package versions your project imports. Thus you can have the latest package installed on your system, but your project will still only import the version in your virtual environment.
What is the difference between venv, pyvenv, pyenv, virtualenv, virtualenvwrapper, pipenv, etc?
https://virtualenv.pypa.io/en/stable/
https://docs.python-guide.org/dev/virtualenvs/

setup.py + virtualenv = chicken and egg issue?

I'm a Java/Scala dev transitioning to Python for a work project. To dust off the cobwebs on the Python side of my brain, I wrote a webapp that acts as a front-end for Docker when doing local Docker work. I'm now working on packaging it up and, as such, am learning about setup.py and virtualenv. Coming from the JVM world, where dependencies aren't "installed" so much as downloaded to a repository and referenced when needed, the way pip handles things is a bit foreign. It seems like best practice for production Python work is to first create a virtual environment for your project, do your coding work, then package it up with setup.py.
My question is, what happens on the other end when someone needs to install what I've written? They too will have to create a virtual environment for the package, but won't know how to set it up without inspecting the setup.py file to figure out what version of Python to use, etc. Is there a way for me to create a setup.py file that also creates the appropriate virtual environment as part of the install process? If not, or if that's considered a "no" as one respondent to this SO post stated, what is considered "best practice" in this situation?
You can think of virtualenv as isolation for the packages you install using pip. It is a simple way to handle different versions of Python and packages. For instance, say you have two projects which use the same packages but in different versions. With virtualenv you can isolate those two projects and install each version of the packages separately, rather than on your working system.
Now, let's say you want to work on a project with your friend. In order to have the same packages installed, you have to somehow share which packages and versions your project depends on. If you are delivering a reusable package (a library), then you need to distribute it, and that is where setup.py helps. You can learn more in the Quick Start.
However, if you work on a web site, all you need is to put the library versions into a separate file. Best practice is to create separate requirements files for tests, development and production. To see the format of such a file, run pip freeze. You will be presented with the list of packages currently installed on the system (or in the virtualenv). Put it into a file and you can install it later on another PC, into a completely clean virtualenv, using pip install -r development.txt
One more thing: please do not pin strict versions the way pip freeze shows them; most of the time you want >= at least some X.X version. The good news is that pip handles dependencies on its own, which means you do not have to list the dependent packages there; pip will sort them out.
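For example, a hand-maintained requirements file with loose pins (package names and versions are just illustrative) might look like:
Django>=1.11,<2.0
requests>=2.18
pytest>=3.0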
Talking about deployment, you may want to check out tox, a tool for managing virtualenvs. It helps a lot with deployment.
Python's default package path always points to the system environment, which needs administrator access to install into. Virtualenv is able to localise the installation to an isolated environment.
For deployment/distribution of a package, you can choose to:
Distribute the source code; the user then needs to run python setup.py install, or
Pack your Python package and upload it to PyPI or a custom devpi server, so the user can simply run pip install <yourpackage>.
However, as you noticed, this raises the issue above: without virtualenv, the user needs administrator access to install any Python package.
In addition, the PyPI package world contains a certain number of badly tested packages that don't work out of the box.
Note: virtualenv itself is actually a hack to achieve isolation.

Will Anaconda Python config scripts clash with Homebrew's?

Will Anaconda Python config scripts clash with Homebrew's? Note that I do not use these config scripts in any of my workflows, I'm just wondering if any of these config scripts may get called "behind the scenes". Sample output below (with username replaced by '..'):
$ brew doctor
...
Having additional scripts in your path can confuse software installed via
Homebrew if the config script overrides a system or Homebrew provided
script of the same name. We found the following "config" scripts:
/Users/../anaconda/bin/curl-config
/Users/../anaconda/bin/freetype-config
/Users/../anaconda/bin/libdynd-config
/Users/../anaconda/bin/libpng-config
/Users/../anaconda/bin/libpng15-config
/Users/../anaconda/bin/llvm-config
/Users/../anaconda/bin/python-config
/Users/../anaconda/bin/python2-config
/Users/../anaconda/bin/python2.7-config
/Users/../anaconda/bin/xml2-config
/Users/../anaconda/bin/xslt-config
Clearly some of these clash with some Homebrew-installed packages.
$ ls /usr/local/bin/*-config
/usr/local/bin/Magick++-config /usr/local/bin/libpng-config
/usr/local/bin/Magick-config /usr/local/bin/libpng16-config
/usr/local/bin/MagickCore-config /usr/local/bin/pcre-config
/usr/local/bin/MagickWand-config /usr/local/bin/pkg-config
/usr/local/bin/Wand-config /usr/local/bin/python-config
/usr/local/bin/freetype-config /usr/local/bin/python2-config
/usr/local/bin/gdlib-config /usr/local/bin/python2.7-config
Such clashes are entirely possible. When you install software that depends on Python using Homebrew, you want it to see the Python packages and libraries installed via Homebrew, not those installed by Anaconda.
My solution to this is not putting
export PATH=$HOME/anaconda/bin:$PATH
into .bashrc. Normally, you'll just use the Python and pip installed via Homebrew and the packages installed by that pip. Sometimes, when you are developing Python projects for which it is convenient to use Anaconda's environment management mechanism (conda create -n my-env), you can temporarily run export PATH=$HOME/anaconda/bin:$PATH to turn it on. From what I gathered, one important benefit of using Anaconda compared to regular Python is that conda create -n my-env anaconda will not duplicate package installations unnecessarily, as virtualenv my-env will when you have a large number of virtual environments. If you do not mind some degree of duplication, you could avoid installing Anaconda altogether and just use virtualenv.
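As a sketch, the switch can be as simple as this (assuming Anaconda lives in $HOME/anaconda):
# normally: Homebrew's python and config scripts come first on $PATH
# only when you want conda's environment management:
$ export PATH=$HOME/anaconda/bin:$PATH
$ conda create -n my-env
$ source activate my-env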
It's entirely possible you won't notice any problems. On the other hand, you may run into some pretty frustrating ones. It all depends on what you use and how your $PATH is ordered. Homebrew will take whichever file has precedence in your $PATH; if another Homebrew package needs to use a Homebrew-installed config file but sees the Anaconda version first, it doesn't know any better than to use the wrong one. In a sense, that's what you told it to do.
My recommendation is to keep things simple and clean. Unless you have a particular reason to keep Anaconda on your $PATH, you should probably pop it out and alias anything you need. Alternatively you could just install the things you require (e.g., numpy) via Homebrew and eliminate Anaconda altogether. (Actually, that's really what I would do. Anaconda comes with way more stuff than I have any reason to be dumping onto my machine.)
I don't know what your $PATH looks like, but in my experience, keeping it short and systematic has a lot of advantages.

Replicate virtualenv without downloading all the packages again on the same machine

I have a couple of projects that require similar dependencies, and I don't want to have pip going out and downloading the dependencies from the web every time. For instance, I am using the norel-django package, which would conflict with my standard Django (RDBMS version) if I installed it system-wide.
Is there a way for me to "reuse" the downloaded dependencies using pip? Do I need to download the source tar.bz2 files and make a folder structure similar to that of a pip archive or something? Any assistance would be appreciated.
Thanks
Add the following to $HOME/.pip/pip.conf:
[global]
download_cache = ~/.pip/cache
This tells pip to cache downloads in ~/.pip/cache so it won't need to go out and download them again next time.
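Note that the download_cache option was removed in later pip releases because pip now caches downloads automatically; on a recent pip you can inspect that cache with the pip cache subcommand:
$ pip cache dir     # show where pip keeps its wheel/HTTP cache
$ pip cache list    # list cached wheels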
It looks like virtualenv has a virtualenv-clone command, or perhaps virtualenvwrapper does?
Regardless, it looks to be a little more involved than just copying and pasting virtual environment directories:
https://github.com/edwardgeorge/virtualenv-clone
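If you go that route, usage is roughly this (a sketch; the paths are placeholders):
$ pip install virtualenv-clone
$ virtualenv-clone /path/to/existing_env /path/to/new_env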
Additionally, it appears virtualenv has a flag that will help with moving your virtualenv:
http://www.virtualenv.org/en/latest/#making-environments-relocatable
$ virtualenv --relocatable ENV
From the virtualenv docs:
This will make some of the files created by setuptools or distribute
use relative paths, and will change all the scripts to use
activate_this.py instead of using the location of the Python
interpreter to select the environment.
Note: you must run this after you’ve installed any packages into the
environment. If you make an environment relocatable, then install a
new package, you must run virtualenv --relocatable again.
Also, this does not make your packages cross-platform. You can move
the directory around, but it can only be used on other similar
computers. Some known environmental differences that can cause
incompatibilities: a different version of Python, when one platform
uses UCS2 for its internal unicode representation and another uses
UCS4 (a compile-time option), obvious platform changes like Windows
vs. Linux, or Intel vs. ARM, and if you have libraries that bind to C
libraries on the system, if those C libraries are located somewhere
different (either different versions, or a different filesystem
layout).
If you use this flag to create an environment, currently, the
--system-site-packages option will be implied.

What are the Python equivalents to Ruby's bundler / Perl's carton?

I know about virtualenv and pip. But these are a bit different from bundler/carton.
For instance:
pip writes the absolute path to shebang or activate script
pip doesn't have the exec sub command (bundle exec bar)
virtualenv copies the Python interpreter to a local directory
Does every Python developer use virtualenv/pip? Are there other package management tools for Python?
From what I've read about bundler, pip without virtualenv should work just fine for you. You can think of it as something between the regular gem command and bundler. Common things that you can do with pip:
Installing packages (gem install)
pip install mypackage
Dependencies and bulk-install (gemfile)
Probably the easiest way is to use pip's requirements.txt files. Basically it's just a plain list of required packages with possible version constraints. It might look something like:
nose==1.1.2
django<1.3
PIL
Later when you'd want to install those dependencies you would do:
$ pip install -r requirements.txt
A simple way to see all your current packages in requirements-file syntax is to do:
$ pip freeze
You can read more about it here.
Execution (bundler exec)
All Python packages that come with executable files usually make them directly available after install (unless you have a custom setup or it's a special package). For example:
$ pip install gunicorn
$ gunicorn -h
Package gems for install from cache (bundler package)
There are pip bundle and pip zip/unzip (in older pip versions; they have since been removed), but I'm not sure many people use them.
p.s. If you do care about environment isolation you can also use virtualenv together with pip (they are close friends and work perfectly together). By default pip installs packages system-wide which might require admin rights.
You can use pipenv, which has a similar interface to bundler.
$ pip install pipenv
Pipenv creates virtualenv automatically and installs dependencies from Pipfile or Pipfile.lock.
$ pipenv --three # Create virtualenv with Python3
$ pipenv install # Install dependencies from Pipfile
$ pipenv install requests # Install `requests` and update Pipfile
$ pipenv lock # Generate `Pipfile.lock`
$ pipenv shell # Run shell with virtualenv activated
You can run command with virtualenv scope like bundle exec.
$ pipenv run python3 -c "print('hello!')"
There is a clone, pbundler.
The version that is currently on PyPI simply reads the requirements.txt file you already have, but is much out of date. It's also not totally equivalent: it insists on creating a virtualenv. Bundler, I notice, only installs the packages that are missing, and gives you the option of entering your sudo password to install into your system dirs or of restarting, which doesn't seem to be a feature of pbundler.
However, the version on git is an almost complete rewrite to be much closer to Bundler's behaviour, including having a "Cheesefile" and no longer supporting requirements.txt. This is unfortunate, since requirements.txt is the de facto standard in pythonland, and there's even official BDFL-stamped work to standardize it. When that comes into force, you can be sure that something like pbundler will become the de facto standard. Alas, nothing quite stable yet that I know of (but I would love to be proven wrong).
I wrote one: https://github.com/Deepwalker/pundler .
On PyPI it is pundle; the name pundler was already taken.
It uses requirements(_\w+)?.txt files as your desired dependencies and creates frozen(_\w+)?.txt files with the frozen versions.
About the (_\w+)? part: these are envs. You can create requirements_test.txt and then set PUNDLEENV=test to use those deps in your run, alongside the ones from requirements.txt.
And about virtualenv: you don't need one; that is what pundle takes from bundler in the first place.
Python Poetry is the closest to Ruby's bundler as of 2020 (and has been since 2018). It's already more than two years old, still very active, and has great documentation. One might complain about the curl-pipe-python style being the recommended way of installing, but there are alternatives, e.g. homebrew on macOS.
Primary website: https://python-poetry.org/
Github: https://github.com/python-poetry/poetry
Documentation: https://python-poetry.org/docs/
It uses virtualenvs behind the scenes (in contrast to bundler), but it provides and uses a lock file, takes care of sub-dependencies, adheres to specified version constraints and can automatically update outdated packages. There's even autocompletion for your favorite shell.
With its use of a pyproject.toml file, it also goes a bit further than bundler (closer to a gemspec; it's also comparable to JavaScript's and TypeScript's npm and yarn).
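A quick sketch of the day-to-day commands (app.py is a placeholder; see the docs for the recommended installer):
$ poetry init                 # create pyproject.toml interactively
$ poetry add requests         # add a dependency and update poetry.lock
$ poetry install              # install everything from pyproject.toml / poetry.lock
$ poetry run python app.py    # run a command inside the managed virtualenv
$ poetry update               # update outdated packages within your constraints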
Poetrify (a complementing project) helps converting projects from requirements.txt to pyproject.toml for Poetry.
The lock file can be exported to requirements.txt with poetry export -f requirements.txt > requirements.txt, if you need that for other tooling (or in the unlikely case you want to go back).
I'd say Shovel is worth a look. It was developed specifically to be a Pythonish version of Rake. There's not a ton of commit activity on the project, but it seems stable and useful.
You can use pipx to install and run Python applications in isolated environments automatically.
You can use pipenv to create and manage a virtualenv for your projects automatically.
Both wrap pip with virtual environment tools, and they aim at different use cases.
Both are among the most starred projects listed in the GitHub PyPA repository.
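For example, to keep a command-line tool like black isolated from your projects with pipx (a sketch):
$ pip install --user pipx
$ pipx ensurepath
$ pipx install black    # black gets its own virtualenv under the hood
$ black --version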
FYI: Debian bullseye/testing currently lacks pipx, but the package from sid should work fine. (2021-06-19)
No, not all developers use virtualenv and/or pip, but many developers use/prefer these tools.
And now, for package development tools and different environments, which is your real question: there are other tools, like Buildout (http://www.buildout.org/en/latest/), for the same purpose of isolating your Python build environment for every project you manage. I used it for some time, but not anymore.
Independent environments per project in Python are a little different from the same situation in Ruby. In my case I use pyenv (https://github.com/yyuu/pyenv), which is something like rbenv but for Python: different versions of Python and virtualenvs per project, and within these isolated environments I can use pip or easy_install (if needed).
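As a sketch of that pyenv workflow (version numbers are just examples):
$ pyenv install 3.8.2             # install a Python version
$ cd my-project
$ pyenv local 3.8.2               # pin this project to that version (writes .python-version)
$ python -m venv .venv            # create an isolated environment with that interpreter
$ source .venv/bin/activate
$ pip install -r requirements.txt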
