Packaging Python dependencies in a subdirectory for AWS Lambda

I came across an article on serverlesscode.com about building Python 3 apps for AWS Lambda that recommends using pip (or pip3) to install dependencies in a /vendored subdirectory. I like this idea as it keeps the file structure clean, but I'm having some issues achieving it.
I'm using Serverless Framework and my modules are imported in my code in the normal way, e.g. from pynamodb.models import Model
I've used the command pip install -t vendored/ -r requirements.txt to install my various dependencies (per requirements.txt) in the subdirectory, which seems to work as expected - I can see all modules installed in the subdirectory.
When the function is called, however, I get the error Unable to import module 'handler': No module named 'pynamodb' (where pynamodb is one of the installed modules).
I can resolve this error by changing my pip installation to the project root, i.e. not in the /vendored folder (pip install -t ./ -r requirements.txt). This installs exactly the same files.
There must be a configuration that I'm missing that points to the subfolder, but Googling hasn't revealed whether I need to import my modules in a different way, or if there is some other global config I need to change.
To summarise: how can I use Pip to install my dependencies in a subfolder within my project?
Edit: noting tkwargs' good suggestion on the use of the serverless plugin for packaging, it would still be good to understand how this might be done without venv, for example. The primary purpose is not specifically to make packaging easier (it's pretty easy as-is with pip), but to keep my file structure cleaner by avoiding additional folders in the root.

I've seen some people use the sys module in their lambda function's code to add the subdirectory, vendored in this case, to their Python path. I'm not a fan of that as a solution because it would mean doing it for every single lambda function, adding extra boilerplate code. The solution I ended up using is to modify the PYTHONPATH runtime environment variable to include my subdirectories. For example, in my serverless.yml I have:
provider:
  environment:
    PYTHONPATH: '/var/task/vendored:/var/runtime'
By setting this as an environment variable at this level it will apply to every lambda function you are deploying in your serverless.yml -- you could also specify it at a per lambda function level if for some reason you didn't want it applied to all of them.
I wasn't sure how to self-reference the existing value of PYTHONPATH to ensure I wasn't incorrectly overwriting it while adding my custom path "/var/task/vendored"... would love to know if anyone else has found a way.
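For comparison, the sys.path boilerplate mentioned above would sit at the top of each handler module and look roughly like this (a minimal sketch, assuming the vendored/ folder is deployed alongside the handler file):
import os
import sys
# Put the vendored/ subdirectory on the import path before importing anything from it.
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'vendored'))
from pynamodb.models import Model  # now resolves from vendored/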

Related

How to export a python project with its full dependency tree for execution anywhere

Say you have a project, and you want to export it so that you can run it on another machine that:
You don't have root access on
Cannot be assumed to have any Python packages installed other than Python itself (not even pip)
Is there a way to export the project, even if it is just a simple script, with everything that it imports, and then everything that the imports need etc.
For example, my project uses a library called python-telegram-bot. It has a list of requirements, and I have tried running pip install -r requirements.txt --target myapp to install the requirements into the app's folder, but this is not recursive. For example, requests is not in that requirements list, yet it is needed by the app. And if I manually add requests, there are things that requests needs that aren't part of that either.
Is there a way to collect every last bit of requirements into a single folder so that my script functions on an entirely vanilla installation of python?
PyInstaller creates an executable, and you can either roll all dependencies into one file (which makes it a bit slow to load) or create a folder with all the packages, modules, imports etc. that your script needs to run on any machine. You can Google the docs for PyInstaller; it's all pretty well covered.
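For instance, the two modes would be invoked roughly like this (a sketch, with a hypothetical entry script myapp.py):
pip install pyinstaller
pyinstaller --onefile myapp.py
builds a single self-contained executable, while
pyinstaller myapp.py
produces the default one-folder build under dist/myapp/.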
Hope that helps, Kuda

How to make PyPI-hosted Ansible plugin discoverable by Ansible

I'm writing a lookup plugin for Ansible and would like to deploy it to PyPI, so users would be able to install it with pip install and use it in their playbooks.
However I do not understand how Ansible does the plugin discovery. It apparently checks the ./lookup_plugins path in the playbook's folder, as well as a couple of fixed paths (one in ~/.ansible, and another in /usr/share/). However, what I want is to install the plugin package into a virtualenv.
Is it even possible? If so, how?
AFAIK this is not possible (at least as of Ansible 2.3).
But I'm not a Python expert, maybe there are some workarounds possible.
Ansible searches for lookup plugins in the following locations:
lookup_plugins directory near your playbook file
lookup_plugins directory inside any role that is applied in playbook
configured lookup plugins directories
default location: ~/.ansible/plugins/lookup:/usr/share/ansible/plugins/lookup
can be overwritten with the lookup_plugins configuration option or with the ANSIBLE_LOOKUP_PLUGINS environment variable
ansible/plugins/lookup directory inside current ansible python package
So for plugin mylookup to be found by Ansible, there should be a file mylookup.py in any of the above locations.
If your plugin is too complex to be distributed as a single .py file, you can wrap it into a package plus a separate tiny helper file, so users will have to:
pip install my_super_lookup
Create ~/.ansible/plugins/lookup/easy_name.py:
import my_super_lookup
LookupModule = my_super_lookup.LookupModule
Use with_easy_name: ... or lookup('easy_name',...) in playbooks
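For completeness, the class re-exported above is an ordinary lookup plugin class living inside the pip-installed package; a minimal sketch (names and logic are placeholders) might look like:
from ansible.plugins.lookup import LookupBase

class LookupModule(LookupBase):
    def run(self, terms, variables=None, **kwargs):
        # Return one result per term; the real lookup logic goes here.
        return [str(term) for term in terms]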

Delete unused packages from requirements file

Is there any easy way to delete no-longer-used packages from a requirements file?
I wrote a bash script for this task, but it doesn't work as I expected, because some packages are not imported under their PyPI project names. For example, the dj-database-url package is imported as dj_database_url.
My project has many packages in its requirements file, so searching for them one by one is messy, error-prone and takes too much time. As far as I can tell, IDEs don't have this feature yet.
You can use Code Inspection in PyCharm.
Delete the contents of your requirements.txt but keep the empty file.
Load your project in PyCharm, then go to Code -> Inspect Code....
Choose the Whole project option in the dialog and click OK.
In the inspection results panel, locate the Package requirements section under Python (note that this section is shown only if there is a requirements.txt or setup.py file).
The section will contain one of the following messages:
Package requirement '<package>' is not satisfied if there is any package that is listed in requirements.txt but not used in any .py file.
Package '<package>' is not listed in project requirements if there is any package that is used in .py files, but not listed in requirements.txt.
You are interested in the second inspection.
You can add all used packages to requirements.txt by right-clicking the Package requirements section and selecting Apply Fix 'Add requirements '<package>' to requirements.txt'. Note that it will show only one package name, but it will actually add all used packages to requirements.txt when invoked on the whole section.
If you want, you can add them one by one, just right click the inspection corresponding to certain package and choose Apply Fix 'Add requirements '<package>' to requirements.txt', repeat for each inspection of this kind.
After that you can create a clean virtual environment and install the packages from the new requirements.txt.
Also note that PyCharm has an import optimisation feature, see Optimize imports.... It can be useful to run it before any of the other steps listed above.
The best bet is to use a fresh Python venv/virtualenv with no packages, or only those you definitely know you need. Test your package, installing missing packages with pip as you hit problems (which should be quite quick for most software), then use the pip freeze command to list the packages you really need. Better still, you could use pip wheel to create a wheel containing those packages.
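That workflow would look roughly like this (a sketch, assuming a Unix shell and that you run it from your project folder):
python -m venv fresh-env
source fresh-env/bin/activate
pip install .
then test, installing any missing packages as you hit problems, and finally
pip freeze > requirements.txt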
The other approach would be to:
Use pylint to check each file for unused imports and delete them (you should be doing this anyway),
Run your tests to make sure that it was right,
Use a tool like snakefood or snakefood3 to generate your new list of dependencies
Note that for any dependency checking to work well it is advisable to avoid conditional imports and imports within functions.
Also note that, to be sure you have everything, it is a good idea to build a new venv/virtualenv, install from your dependencies list and then re-test your code.
You can find obsolete dependencies by using deptry, a command line utility that checks for various issues with a project's dependencies, such as obsolete, missing or transitive dependencies.
Add it to your project with
pip install deptry
and then run
deptry .
Example output:
-----------------------------------------------------
The project contains obsolete dependencies:
Flask
scikit-learn
scipy
Consider removing them from your projects dependencies. If a package is used for development purposes, you should add
it to your development dependencies instead.
-----------------------------------------------------
Note that for the best results, you should be using a virtual environment for your project, see e.g. here.
Disclaimer: I am the author of deptry.
In PyCharm, go to Tools -> Sync Python Requirements. There's a 'Remove unused requirements' checkbox.
I've used pip-check-reqs with success.
The command pip-extra-reqs your_directory will check for unused dependencies in your_directory.
Install it with pip install pip-check-reqs.

How to specify another tox project folder as a dependency for a tox project

We have a tox-enabled project (let's call it "main" project) which has a dependency on another tox project (let's call it "library" project) - all united in one repository since it is all part of a large overarching project.
How the project works for the regular user
For a regular install as an end-user you would simply install "library" first and then "main", right from the repository or whatever sources, and then run it.
What our issue is with tox
However, as a developer the situation is different, because "tox" should work and you might want to have multiple versions around at the same time.
You usually check out the large overarching repository and then the filesystem layout is this:
overarchingproject/main/
overarchingproject/main/src/
overarchingproject/main/tox.ini
overarchingproject/main/setup.py
...
overarchingproject/library/
overarchingproject/library/src/
overarchingproject/library/tox.ini
overarchingproject/library/setup.py
...
Now if I go into main/ and type "tox", this happens:
Current behavior: It will try to build the "main" project with the dependency on "library" - which will obviously result in an attempt to get "library" from PyPI. However, the project is not released yet (therefore not on PyPI) so it won't work - even though the lib is right there in the same repo.
Result: it doesn't work.
Workaround 1: We could set up our own package index/ask users to do that. However, asking everyone contributing to the project to do that with DevPI or similar just to be able to run the unit tests seems not such a good idea, so we'd need to do it centrally.
But if we provided a package index at some central place or a pip package for "library", people couldn't run the tests of "main" easily with involvement of a modified version of "library" they created themselves:
"library" is in the same repository after all, so people might as well modify it at some point.
That typing "tox" inside the "main" project folder will not easily pick up upon the present neighbouring "library" version but only a prepackaged online thing isn't exactly intuitive.
Workaround 2: We tried sitepackages=True and installing "library" in the system - however, sitepackages=True has caused us a notable amount of trouble and it doesn't seem a good idea in general.
Desired behavior: We want tox to use the local version of the "library" in that folder right in the same overarching repository which people will usually get in one thing:
That version might be newer or even locally modified, so this is clearly what the dev user wants to use. And it exists, which cannot be said about the pip package right now.
Why do we want the overarching repository anyway with subprojects ("main", "library", ...) and not just one single project?
We develop a multi-daemon large project with many daemons for various purposes, with shared code in some libs to form a university course management system (which deals with forums, course management with possibility to hand in things, attached code versioning systems for student projects etc.).
It is possible to just use a subset of daemons, so it makes sense that they are separate projects, but still they are connected enough that most people would want to have most of them - therefore all of them in one repository.
The library itself is also suitable to be used for entirely different projects, but it is usually used for ours to start with - so that is where it is stuffed into the repository. So that means it is always around in the given relative path, but it has its separate tox.ini and unit tests.
TL;DR / Summary
So how can we make tox look for a specific dependency in another toxable project folder instead of just pip when installing a project?
Of course "main"'s regular setup.py install process shouldn't mess around with tox or searching the local disk: it should just check for one specific relative path, and then give up if that one is not present (and fall back to pip or whatever).
So best would be if the relative path could be somehow stored in tox.ini.
Or is this all just a pretty bad idea? Should we solve this differently to make our "main" project easily toxable with the latest local dev version of "library" as present in the local repository?
You can use pip's --editable option in your main project, like the following:
deps =
    --editable=file:///{toxinidir}/../library
    -r{toxinidir}/requirements.txt
P.S. Don't use this style: -e file:///{toxinidir}/../library, because tox passes the whole string as a single argument to argparse, in a format it cannot parse.
As suggested in the comments to diabloneo's answer, it is possible to supply an install_command in the tox.ini file:
I used this to make a bash script that takes all the usual pip arguments, but first runs pip install --editable="file://`pwd`/../path/to/neighbour/repo", and only then runs the regular pip install "$@" with the arguments passed to the script (as tox would otherwise pass them to pip directly). I then used this script as install_command instead of the default pip command.
With this two-step procedure it works fine :-)
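A sketch of what this might look like, assuming the layout from the question (the script name is made up). In tox.ini:
install_command = {toxinidir}/pip-with-library.sh {opts} {packages}
And pip-with-library.sh (made executable):
#!/usr/bin/env bash
# Install the neighbouring library from the local checkout first, then run the normal pip install.
set -e
pip install --editable="file://$(pwd)/../library"
pip install "$@"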
I realize it's been a long time since the question was asked, but this might be useful to someone: I tried to do the same, and it turns out it can be done using tox's distshare directory: https://tox.wiki/en/latest/example/general.html#access-package-artifacts-between-multiple-tox-runs
# example two/tox.ini
[testenv]
# install latest package from "one" project
deps = {distshare}/one-*.zip
You could try a separate tool for that, like Ansible.
Example:
Ansible creates and runs your virtualenv
Ansible installs local dependencies such as your subprojects
In every subproject, use tox to run tests, lint and docs
For us this was a more convenient way to run tox across subprojects.

How to install a python package without copying everything into lib/site-packages?

I want to develop a common python package that other packages depend on. For example:
packageA/
packageB/
packageC/
commonPackage/
packageA, packageB and packageC can all be executed directly, but they all depend on commonPackage. I want to install commonPackage into lib/site-packages, but I don't want it to copy the source code there. Instead, I want it to create a commonPackage.pth in lib/site-packages containing the path where commonPackage lives, so that when I modify commonPackage or update it from SVN I don't need to install it again. Here comes the problem: how can I write the setup.py, or which options of python setup.py install should I use, so that it does what I want?
Oops, I just found exactly what I want here. The develop command of setuptools does what I described. You just type
python setup.py develop
It creates a .pth file rather than copying everything into site-packages.
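For reference, a minimal setup.py for commonPackage that supports the develop command could look like this (a sketch; name and version are placeholders):
from setuptools import setup, find_packages

setup(
    name='commonPackage',
    version='0.1',
    packages=find_packages(),
)
With more recent tooling, pip install -e path/to/commonPackage gives the same editable install.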
You can always take a look at virtualenv, which will allow you to create a python environment for each of your projects - this is the ideal way to develop/build/deploy your app without having to load up your site-packages directory with all and sundry.
There's a good tutorial here:
http://iamzed.com/2009/05/07/a-primer-on-virtualenv/
Good luck !
