First let me explain the current situation:
We have several Python applications which depend on custom (not publicly released) packages as well as well-known public ones. These dependencies are all installed into the system Python installation. The applications are distributed as source via git. All these computers are hidden inside a corporate network and don't have internet access.
This approach is a bit of a pain since it has the following downsides:
Libs have to be installed manually on each computer :(
How to better deploy an application? I recently saw virtualenv which seems to be the solution but I don't see it yet.
virtualenv creates a clean Python instance for my application. How exactly should I deploy this so that users of the software can easily start it?
Should there be a startup script inside the application which creates the virtualenv during start?
The next problem is that the computers don't have internet access. I know that I can specify a custom location for packages (network share?) but is that the right approach? Or should I deploy the zipped packages too?
Would another approach be to ship the whole Python instance, so the user doesn't have to start up the virtualenv? In this Python instance all necessary packages would be pre-installed.
Since our apps are growing fast, we have a fast release cycle (2 weeks). Deploying via git was very easy. Users could pull from a stable branch via an update script to get the latest release. Would that still be possible, or are there better approaches?
I know that's a lot of questions. Hopefully someone can answer them or give me some advice.
You can use pip to install directly from git:
pip install -e git+http://192.168.1.1/git/packagename#egg=packagename
This applies whether you use virtualenv (which you should) or not.
You can also create a requirements.txt file containing all the stuff you want installed:
-e git+http://192.168.1.1/git/packagename#egg=packagename
-e git+http://192.168.1.1/git/packagename2#egg=packagename2
And then you just do this:
pip install -r requirements.txt
So the deployment procedure would consist in getting the requirements.txt file and then executing the above command. Adding virtualenv would make it cleaner, not easier; without virtualenv you would pollute the systemwide Python installation. virtualenv is meant to provide a solution for running many apps each in its own distinct virtual Python environment; it doesn't have much to do with how to actually install stuff in that environment.
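The procedure above can be sketched as a small setup script. This is only a sketch under assumptions: it uses the standard library's venv module instead of the virtualenv tool, and the names install_command, deploy, and the "env" directory are made up for illustration.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the path of the interpreter inside the virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(env_dir: Path, requirements: Path) -> list:
    """Build the pip command that installs requirements into the environment."""
    return [str(venv_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)]


def deploy(env_dir: Path, requirements: Path) -> None:
    """Create the environment (with pip available) and install the requirements."""
    venv.create(env_dir, with_pip=True)
    subprocess.run(install_command(env_dir, requirements), check=True)
```

Calling deploy(Path("env"), Path("requirements.txt")) once per machine would leave the system-wide Python untouched; the application can then be started with the interpreter returned by venv_python.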
I have a little Python side project which is experiencing some growing pains, wondering how people on larger Python projects manage this issue.
The project is Python/Flask/Docker deployed to AWS. Listed dependencies (that we import directly in the project) are installed from a requirements.txt file with explicit version numbers. We added the version numbers after noticing our new deployments (which rebuild Docker/dependencies etc) would sometimes install newer versions of the packages, causing the project to break.
The issue we're facing now is that an onboarding developer is setting up her environment and facing the same issue, this time with sub-dependencies of the original dependencies. (For example, Flask might install Werkzeug, Jinja2, etc., and if some of these are the wrong version, the app breaks.) The obvious solution is to go through each sub-dependency and list out every package, with explicit versions, in requirements.txt. But this is a bit of a pain so I'm asking around to see what people do on Real Projects.
You guys can't be doing this all manually, right? In JS we have npm and package-lock.json files and so on, which are built automatically. Is there some equivalent in Python? Have I missed something basic that we should be using here?
Thanks in advance
I wrote a tool that might be helpful for this called realreq. You can install it with pip: pip install realreq. It will generate the requirements you have by reading through your source files and recursively specifying their requirements.
realreq --deep -s /path/to/source will fully specify your dependencies and their sub-dependencies. Note that if you are using a virtual environment you need to have it activated for realreq to be able to find the dependencies, and they must be installed (i.e. realreq needs to be run in an environment where the dependencies are installed). One of your engineers who has a working environment can run it and then pass the output as a requirements.txt file to your new engineers.
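To illustrate what a fully pinned list looks like, here is a minimal sketch that pins every distribution installed in the current environment, sub-dependencies included. It is not how realreq works internally (realreq starts from your source files); it only uses the standard library's importlib.metadata, and the function name pinned_requirements is made up for this example.

```python
from importlib.metadata import distributions


def pinned_requirements() -> list:
    """Return 'name==version' lines for everything installed in this environment."""
    pins = {}
    for dist in distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            # Skip distributions whose metadata cannot be read.
            continue
        if name and version:
            pins[name.lower()] = f"{name}=={version}"
    # Sort by lower-cased project name for a stable, diff-friendly file.
    return [pins[key] for key in sorted(pins)]


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```

Run inside the activated virtual environment, the output could be saved as requirements.txt and handed to a new developer.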
My production server has no access to the internet, so it's a bit of a mess copying all the dependencies from my dev machine to the production/development server.
If I'd use virtualenv, I'd have all my dependencies in this environment. Doing this I'd also be able to deploy it on any machine, which has python & virtualenv installed.
But I've seen this rarely, and it seems kind of dirty.
Am I wrong and this could be a good practice, or are there other ways to solve that nicely?
Three options I would consider:
1. Run your own PyPI mirror with the dependencies you need. You really only need to build the file layout and pull from your local server using the --index-url flag:
$ pip install --index-url http://pypi.beastcraft.net/ numpy
2. Build virtualenvs on the same architecture and copy those over as needed. This works, but you're taking a risk on true portability.
3. Use terrarium to build virtual environments then bring those over (basically option 2 but with easier bookkeeping/automation).
I've done all of these and actually think that hosting your own PyPI mirror is the best option. It gives you the most flexibility when you're making a deployment or trying out new code.
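As a rough sketch of what "building the file layout" for the mirror option can mean: the script below arranges a folder of already downloaded wheels and source archives into the "simple" index layout described in PEP 503, with one directory per normalized project name, each holding an index.html of links. The function names and directory arguments are assumptions for illustration, and the filename parsing is deliberately simplistic.

```python
import re
import shutil
from html import escape
from pathlib import Path

ARCHIVE_SUFFIXES = (".whl", ".tar.gz", ".zip")


def project_name(filename: str) -> str:
    """Derive the normalized project name (PEP 503) from an archive filename."""
    if filename.endswith(".whl"):
        raw = filename.split("-")[0]
    else:
        stem = re.sub(r"\.(tar\.gz|zip)$", "", filename)
        raw = stem.rsplit("-", 1)[0]
    return re.sub(r"[-_.]+", "-", raw).lower()


def build_simple_index(package_dir: Path, out_dir: Path) -> list:
    """Copy archives into out_dir/<project>/ and write the index pages."""
    projects = {}
    for path in sorted(package_dir.iterdir()):
        if path.name.endswith(ARCHIVE_SUFFIXES):
            name = project_name(path.name)
            target = out_dir / name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target / path.name)
            projects.setdefault(name, []).append(path.name)
    for name, files in projects.items():
        links = "\n".join(f'<a href="{escape(f)}">{escape(f)}</a><br>' for f in files)
        page = f"<!DOCTYPE html>\n<html><body>\n{links}\n</body></html>\n"
        (out_dir / name / "index.html").write_text(page)
    out_dir.mkdir(parents=True, exist_ok=True)
    root = "\n".join(f'<a href="{n}/">{n}</a><br>' for n in sorted(projects))
    (out_dir / "index.html").write_text(
        f"<!DOCTYPE html>\n<html><body>\n{root}\n</body></html>\n"
    )
    return sorted(projects)
```

Serving out_dir with any static web server on the internal network and pointing --index-url at it should then be enough for pip to find the packages.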
I'm looking to set up python on a new machine.
I've found many instructions on this; however, I'm concerned with keeping the main installation clean so that each future environment can be modified specifically while I become familiar with the ins and outs of the program and packages.
I installed Python and git on my old machine and, not really knowing anything, did all the installs via the admin account and made all settings global.
I later discovered this was likely not the best way to do it.
I wonder if anyone here might be able to point this crayon eater in the right direction?
Would I be best off to make a user account on the computer specifically for my developing projects and install python, git, etc locally on this profile? Or are there parts of the install which one would want to have installed from the admin account?
It is OK to have git installed globally. Just create a new repository for each project, using git init.
For maintaining Python dependencies per project, consider using virtualenv, or pyenv with its pyenv-virtualenv plugin. They create virtual environments which can be activated and deactivated, and keep you from cluttering up your globally installed Python packages.
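If you are ever unsure whether an environment is currently active, a small check like the following can help. It is a sketch; the function name in_virtualenv is made up, and it relies on how the interpreter reports its prefixes (sys.base_prefix for venv and recent virtualenv, sys.real_prefix for older virtualenv releases).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # venv and recent virtualenv make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```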
An alternative is to create a Docker image for each project and run your projects inside Docker containers.
If you are a beginner, the latter might be overkill.
I am after advice on how to manage python modules within the context of docker.
Current options that I'm aware of include:
(1) Installing them individually via pip in the build process
(2) Installing them together via pip in the build process via requirements.txt
(3) Installing them to a volume and adding the volume to the PYTHONPATH
Ideally I want a solution that is fully reproducible and that doesn't require every module to be re-installed if I decide to add another module or update the version of one of them.
From my perspective:
(2) is an issue because the docker ADD command (to get access to the requirements.txt file) apparently invalidates the cache, which means that any change to the file forces everything to be re-built / re-installed every time you build the image.
(1) keeps the cache intact but means you'd need to specify the exact version for each package (and potentially their dependencies?) which seems like it could be pretty tedious and error prone.
(3) is currently my personal favorite as it allows the packages to persist between images/builds and allows for requirements.txt to be used. The only downside is that essentially you are storing the packages on your local machine rather than in the image, which makes the container dependent on the host OS and kind of defeats the point of a container.
So yeah, I'm not entirely sure what best practices are here and would appreciate advice.
For reference there have been other questions on this topic but I don't feel any of them properly address my above question:
docker with modified python modules?
Docker compose installing requirements.txt
How can I install python modules in a docker image?
EDIT:
Just some additional notes to give some more context. My projects are typically data analysis focused (rather than software development or web development). I tend to use multiple images (1 for python, 1 for R, 1 for the database) using docker compose to manage them all together. So far I've been using a makefile on the host OS to re-build the project from scratch i.e. something like
some_output.pdf: some_input.py
	docker-compose run python_container python some_input.py
where the outputs are written to a volume on the host OS
The requirements.txt file is the best option. (Even if changing it does a complete reinstall.)
A new developer starts on your project. They check out your source control repository and say, "oh, it's a Python project!", create a virtual environment, and run pip install -r requirements.txt, and they're set to go. A week later they come by and say "so how do we deploy this?", but since you've wrapped the normal Python setup in Docker they don't have to go out of their way to use a weird Docker-specific development process.
Disaster! Your primary server's hard disk has crashed! You have backups of all of your data, but the application code just gets rebuilt from source control. If you're keeping code in a Docker volume (or a bind-mounted host directory) you need to figure out how to rebuild it; but your first two options have that written down in the Dockerfile. This is also important for the new developer in the previous paragraph (who wants to test their image locally before deploying it) and any sort of cluster-based deployment system (Swarm, Kubernetes) where you'd like to just deploy an image and not also have to deploy the code alongside it, by hand, outside of the deployment system framework.
Another option is to use Docker's multi-stage build feature. Create an intermediate build stage that installs the dependencies and then just copy the folder into the production image (second build stage). This gives you the benefit of your option 3 as well.
It depends on which step in your build is more expensive and would benefit from caching. Compare the following:
Dockerfile A
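FROM ubuntu:16.04
# Install Python, pip etc.
RUN apt-get update && apt-get install -y python3 python3-pip
# Add requirements.txt
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt
# Run my build steps which are expensive.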
Dockerfile B
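FROM ubuntu:16.04 AS intermediate
# Install Python, pip etc.
RUN apt-get update && apt-get install -y python3 python3-pip
# Add requirements.txt
COPY requirements.txt /tmp/requirements.txt
# Install into a dedicated folder so it can be copied into the next stage
RUN pip3 install --target=/pip-packages -r /tmp/requirements.txt
FROM ubuntu:16.04
# Run my build steps which are expensive.
COPY --from=intermediate /pip-packages/ /pip-packages/
# Make the copied packages importable
ENV PYTHONPATH=/pip-packages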
In the first case touching your requirements.txt will force a full build. In the second case, your expensive build steps are still cached. The intermediate build still runs but I assume that is not the expensive step here.
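The caching argument can be illustrated with a toy model. This is not Docker's actual implementation, only a simplified sketch of the rule that a layer is reused while its parent layer, its instruction, and the contents of any copied files are unchanged, and that everything after the first miss is rebuilt. All names and the step descriptions are made up for the example.

```python
import hashlib


def layer_keys(steps):
    """Compute a cache key per step from the parent key, instruction and file content."""
    keys, parent = [], ""
    for instruction, file_content in steps:
        digest = hashlib.sha256()
        digest.update(parent.encode())
        digest.update(instruction.encode())
        if file_content is not None:
            digest.update(file_content.encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def first_rebuilt_step(old_steps, new_steps):
    """Return the index of the first step whose cache key changed, or None."""
    pairs = zip(layer_keys(old_steps), layer_keys(new_steps))
    for index, (old, new) in enumerate(pairs):
        if old != new:
            return index
    return None


def dockerfile_a(requirements):
    """Single-stage build: the expensive steps come after pip install."""
    return [
        ("FROM ubuntu:16.04", None),
        ("RUN install python and pip", None),
        ("COPY requirements.txt", requirements),
        ("RUN pip install", None),
        ("RUN expensive build steps", None),
    ]


def dockerfile_b_final_stage(pip_packages):
    """Final stage of the multi-stage build: the packages are copied in last."""
    return [
        ("FROM ubuntu:16.04", None),
        ("RUN expensive build steps", None),
        ("COPY --from=intermediate /pip-packages/", pip_packages),
    ]
```

In this model, changing the requirements makes step index 2 the first rebuilt step in both cases. In Dockerfile A the expensive step sits at index 4 and is therefore rebuilt; in the final stage of Dockerfile B it sits at index 1 and stays cached.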
I'm a Java/Scala dev transitioning to Python for a work project. To dust off the cobwebs on the Python side of my brain, I wrote a webapp that acts as a front-end for Docker when doing local Docker work. I'm now working on packaging it up and, as such, am learning about setup.py and virtualenv. Coming from the JVM world, where dependencies aren't "installed" so much as downloaded to a repository and referenced when needed, the way pip handles things is a bit foreign. It seems like best practice for production Python work is to first create a virtual environment for your project, do your coding work, then package it up with setup.py.
My question is, what happens on the other end when someone needs to install what I've written? They too will have to create a virtual environment for the package but won't know how to set it up without inspecting the setup.py file to figure out what version of Python to use, etc. Is there a way for me to create a setup.py file that also creates the appropriate virtual environment as part of the install process? If not, or if that's considered a "no" as one respondent stated in this SO post, what is considered "best practice" in this situation?
You can think of virtualenv as isolation for the packages you install using pip. It is a simple way to handle different versions of Python and packages. For instance, say you have two projects which use the same packages but different versions of them. By using virtualenv you can isolate those two projects and install the different package versions separately, rather than on your working system.
Now, let's say you want to work on a project with your friend. In order to have the same packages installed, you have to share somehow which packages, and which versions, your project depends on. If you are delivering a reusable package (a library) then you need to distribute it, and this is where setup.py helps. You can learn more in the Quick Start.
However, if you work on a web site, all you need is to put the library versions into a separate file. Best practice is to create separate requirements files for tests, development and production. To see the format of the file, run pip freeze. You will be presented with a list of the packages currently installed on the system (or in the virtualenv). Put it into a file and you can install it later on another PC, into a completely clean virtualenv, using pip install -r development.txt
And one more thing: please do not put strict versions of packages in there the way pip freeze shows them; most of the time you want >=, i.e. at least version X.X. The good news is that pip handles dependencies on its own. It means you do not have to list dependent packages there; pip will sort it out.
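As a small sketch of that last suggestion, the helper below turns the exact pins printed by pip freeze into minimum-version requirements. The function name relax_pins and the sample lines are made up for illustration; whether loosening pins is right for a given project is a judgment call, since exact pins give more reproducible installs.

```python
def relax_pins(freeze_lines):
    """Turn 'name==version' lines into 'name>=version', leaving other lines alone."""
    relaxed = []
    for line in freeze_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            # Drop blank lines and comments.
            continue
        if "==" in line and not line.startswith("-e"):
            name, version = line.split("==", 1)
            relaxed.append(f"{name}>={version}")
        else:
            # Editable installs and other options are kept unchanged.
            relaxed.append(line)
    return relaxed


if __name__ == "__main__":
    sample = ["Flask==2.3.2", "-e git+http://192.168.1.1/git/packagename#egg=packagename"]
    print("\n".join(relax_pins(sample)))
```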
Talking about deployment, you may want to check out tox, a tool for managing virtualenvs. It helps a lot with deployment.
By default, Python's package path points to the system environment, which typically requires administrator access to install into. virtualenv localizes the installation to an isolated environment.
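Whether administrator access is really needed depends on the machine. A quick way to check, sketched here with made-up function names, is to ask the interpreter where it installs packages and test whether the current user can write there.

```python
import os
import sysconfig


def site_packages_dir() -> str:
    """Return the directory where pure-Python packages get installed."""
    return sysconfig.get_paths()["purelib"]


def can_install_without_admin() -> bool:
    """Return True if the current user may write to the package directory."""
    return os.access(site_packages_dir(), os.W_OK)


if __name__ == "__main__":
    print(site_packages_dir(), "writable:", can_install_without_admin())
```

Inside a virtual environment created by the current user, the reported directory lies within the environment and is normally writable.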
For deployment/distribution of a package, you can choose to:
Distribute by source code. The user needs to run python setup.py install, or
Package your Python project and upload it to PyPI or a custom devpi server, so the user can simply run pip install <yourpackage>.
However, as noted above, without virtualenv the user needs administrator access to install any Python package.
In addition, PyPI contains a certain number of badly tested packages that don't work out of the box.
Note: virtualenv itself is actually a hack to achieve isolation.