Quick question: I am used to devpi and was wondering what the difference is between devpi and a PyPI server.
Is one better than the other? Which one scales better?
PyPI (Python Package Index) - is the official repository for third-party Python software packages. Every time you use e.g. pip to install a package that is not in the standard library, it gets downloaded from the PyPI server.
All of the packages that are on PyPI are publicly visible. So if you upload your own package then anybody can start using it. And obviously you need internet access in order to use it.
devpi (not sure what the acronym stands for) - is a self hosted private Python Package server. Additionally you can use it for testing and releasing of your own packages.
Being self hosted it's ideal for proprietary work that maybe you wouldn't want (or can't) share with the rest of the world.
So other features that devpi offers:
PyPI mirror - caches locally any packages that you download from PyPI. This is excellent for CI systems: you don't have to worry if a package or server goes missing, and you can even keep using it without internet access.
multiple indexes - unlike PyPI (which has only one index), in devpi you can create multiple indexes. For example, a main index for packages that are rock solid and a development index where you can release packages that are still under development. Be careful with this, though, because a large number of indexes can make things hard to track.
The server has a simple web interface where you can view and search for packages.
You can integrate it with pip so that you can use your local devpi server as if you were using PyPI.
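As a rough illustration of that pip integration, here is a minimal sketch that renders a pip configuration file pointing at a devpi index. The URL is an assumption (a devpi server on localhost, port 3141, with the root/pypi mirror index), and the helper name render_pip_conf is made up for this example. Where the file has to live depends on your platform (for instance a per-user pip.conf on Linux), so check the pip documentation for your setup.

```python
import configparser
import io

# Assumption: devpi runs locally on port 3141 and exposes the root/pypi mirror index.
DEVPI_INDEX_URL = "http://localhost:3141/root/pypi/+simple/"


def render_pip_conf(index_url: str) -> str:
    """Return the text of a pip config file that makes pip use index_url."""
    config = configparser.ConfigParser()
    # "index-url" in the [global] section replaces the default PyPI index.
    config["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the config so it can be reviewed before saving it as pip.conf.
    print(render_pip_conf(DEVPI_INDEX_URL))
```

For a one-off install, passing the same URL to pip install --index-url should have the same effect without touching any config file.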
So, answering your questions:
Is one better than the other? - well these are two different tools really. No clear answer here, depends on what your needs are.
Which scales better? - definitely devpi.
The official website is very useful with good examples: http://doc.devpi.net/latest/
Related
We have a server behind a proxy and we want this server to be able to run commands such as:
python: pip install module
R: install.packages("fortunes")
...
Simply to install packages from these sources. Since we are behind a proxy, we cannot install these unless the proxy has them whitelisted (otherwise the proxy prohibits the connection between the server and wherever the package resides).
My question is: what should we whitelist to be able to run these commands?
I am not sure how the package websites actually work (whether they store the packages themselves, or whether they are just an index and the actual packages reside on other domains/hostnames/...). I believe PyPI is quite friendly here (packages are actually found there), but for CRAN or Maven I don't know. We are running Spark servers, so our primary concerns are Python, R, Java or Scala libraries/packages.
Maven: actually stores the packages. Regarding mirroring, see this answer. It also contains the URL of the central repository.
PyPI: from the documentation on how to upload a package to the index, it seems like it also physically stores the packages.
CRAN: also hosts the packages. There are several mirrors; you will need to whitelist the one you want to use.
You might want to consider setting up an internal mirror where you put your dependencies once, and then don't need to go to the outside internet.
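To turn the list above into something a proxy admin can use, here is a small sketch that extracts the hostnames from a set of repository URLs. The URLs are assumptions, not taken from the answers above: they are the hosts these tools commonly talk to by default (note that PyPI serves its index pages and its package files from two different hosts), and a CRAN mirror or an alternative Maven repository would change the list.

```python
from urllib.parse import urlsplit

# Assumed default endpoints; verify them against your own pip, Maven and R configuration.
REPOSITORY_URLS = [
    "https://pypi.org/simple/",         # PyPI index pages
    "https://files.pythonhosted.org/",  # host that serves PyPI package files
    "https://repo1.maven.org/maven2/",  # Maven Central
    "https://cran.r-project.org/",      # main CRAN site; a chosen mirror will differ
]


def hosts_to_whitelist(urls):
    """Return the sorted, de-duplicated hostnames behind a list of URLs."""
    return sorted({urlsplit(url).hostname for url in urls})


if __name__ == "__main__":
    for host in hosts_to_whitelist(REPOSITORY_URLS):
        print(host)
```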
I am looking for a flexible solution for uploading generic builds to an artifact repository (in my case it would be Artifactory but I would not mind if it would also support others, like Nexus)
Because I am not building Java code, adding Maven to the process would only add unneeded complexity.
Still, the entire infrastructure already supports bash and Python everywhere (including Windows), which makes me interested in finding something that involves those two.
I do know that I could code it myself, but now I am looking for a way to make it as easy and flexible as possible.
Gathering the metadata seems simple, only publishing it in the format required by the artefact repository seems to be the issue.
After discovering that the two existing Python packages related to Artifactory are kinda useless, since neither is actively maintained (one is usable only as a query interface and the other has serious bugs that prevent its use), I found something that seems close to what I was looking for: http://teamfruit.github.io/defend_against_fruit/
Still, it seems that it was designed to deal only with Python packages, not with generic builds.
Some points to consider:
Tools like Maven and Gradle are capable of building more than Java projects. Artifactory already integrates with them and this includes gathering the metadata and publishing it together with the build artifacts.
The Artifactory Jenkins plugin supports generic (freestyle) builds. You can use this integration to deploy whatever type of files you like.
You can create your own integration based on Artifactory's open integration layer for CI build servers - build-info. This is an open source project and all the implementations are also open sourced. The relevant Artifactory REST APIs are documented here.
Disclaimer: I'm affiliated with Artifactory
I am currently developing an automated test framework. The framework consists of several packages. These packages will be referenced in different projects and may be modified locally by the developer. I want to manage the Python package eggs, and I am thinking of using Artifactory. I looked for Artifactory help for Python, but I couldn't find anything useful.
Should I use Artifactory or pip?
Edit:
Is there any way or command in Python which can help me put the eggs in Artifactory?
There are numerous reasons to prefer a binary repository manager over a simple shared directory/SCM binary storage:
Fine grained security.
Ability to proxy and cache remote repositories.
More efficient handling of binaries (because it's a tool that's tailored to do so).
Sharing the binaries with other teams and the world is a lot safer and easier.
Integration with many tools in the ecosystem.
Search and manipulation facilities.
Administration tools.
Artifactory exposes a very rich REST API and the deployment of any artifact can be achieved by a simple HTTP PUT request.
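A minimal sketch of such a PUT, using only the Python standard library, could look like the following. The base URL, the repository name generic-local, the target path and the credentials are all hypothetical placeholders, and basic authentication is an assumption (your server may be set up for API keys or tokens instead). The function only builds the request, so it can be inspected before anything is sent.

```python
import base64
import urllib.request


def build_deploy_request(base_url, repo, target_path, payload, user, password):
    """Build (but do not send) an HTTP PUT that uploads payload to repo/target_path."""
    url = f"{base_url.rstrip('/')}/{repo}/{target_path.lstrip('/')}"
    request = urllib.request.Request(url, data=payload, method="PUT")
    # Assumption: the server accepts HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical server, repository and artifact, for illustration only.
    request = build_deploy_request(
        "https://artifactory.example.com/artifactory",
        "generic-local",
        "builds/app-1.0.zip",
        b"example bytes standing in for the artifact",
        "deployer",
        "secret",
    )
    print(request.get_method(), request.full_url)
    # To actually upload, pass the request to urllib.request.urlopen(request).
```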
Take a look at the Defend Against Fruit project. It provides the previously missing glue between Python and Artifactory.
http://teamfruit.github.io/defend_against_fruit/
You can use an "in house" PyPI (either with easy_install -f ... or pip install -f ...).
For a server you can have just Apache serving a directory with all the eggs or something like http://pypi.python.org/pypi/pypiserver
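For the pip side of this, a sketch of the install command might look like the one below. The server URL and package name are hypothetical. Combining --find-links with --no-index is a choice made here so that pip looks only at the in-house directory and never falls back to the public index. Keep in mind that pip installs source distributions and wheels rather than .egg files, so the served directory would need to contain those.

```python
import sys


def find_links_install_command(package, links_url):
    """Return a pip command that installs package only from the page at links_url."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",               # never contact the public index
        "--find-links", links_url,  # directory listing or local path holding the archives
        package,
    ]


if __name__ == "__main__":
    # Hypothetical in-house server and package name.
    command = find_links_install_command("mypackage", "http://packages.example.com/eggs/")
    print(" ".join(command))
    # To run it: subprocess.run(command, check=True)
```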
I'm building various python-based projects that use pip/buildout to install dependencies. But I don't like the idea of someone deleting a github project and crippling my apps, or a network outage meaning I can't perform a deployment.
How do other people solve this?
I've got various ideas, but I think perhaps the one that sounds most promising would be some kind of caching proxy server. I'd point pip to use this internal proxy server which would cache a copy of the downloaded project, and periodically check for updates (if there's a net connection) before serving cached versions.
Does anything like this already exist?
Use case:
I have a project which I deploy to web server 1. I add new features with a remote dependency, and when I come to update the production web server, PyPI is down so I can't deploy. Or perhaps when I come to set up a new web server, a dependency has disappeared from GitHub or wherever.
How can I make it so my deployments/dev environments can always be brought up regardless of what happens in the wider world?
Also, when I deploy, I won't deploy over the top of existing code. Rather I'll build a new virtualenv and switch over to it so I can rollback if anything goes wrong. So each time I deploy I'll need to rebuild my environment and will need dependencies to exist.
So I'm looking for a solution that will insulate me against short-term network outages to servers hosting dependencies, as well as guarding against projects being deleted.
You should keep a "reference copy" of the projects on which you depend.
If someone removes the project from GitHub (and PyPI and all the mirrors, and every other site on the net) then you still have the source and can distribute it yourself.
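One way to build such a reference copy with current pip is the pip download subcommand, which fetches the archives for a requirements file into a directory without installing them. The sketch below only assembles that command. The file name requirements.txt and the directory name vendor are assumptions for illustration.

```python
import sys


def download_reference_copy_command(requirements_file, dest_dir):
    """Return a pip command that saves every required archive into dest_dir."""
    return [
        sys.executable, "-m", "pip", "download",
        "--requirement", requirements_file,  # pinned dependencies of the project
        "--dest", dest_dir,                  # where the archives are stored
    ]


if __name__ == "__main__":
    # Hypothetical file and directory names.
    command = download_reference_copy_command("requirements.txt", "vendor")
    print(" ".join(command))
    # To run it while the network is available: subprocess.run(command, check=True)
```

The resulting directory can be backed up or committed alongside the project, and later deployments can install from it even when the original hosts are unreachable.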
I have exactly the same requirements, and also use buildout to manage my deployments. I try not to install ANY of my package dependencies system-wide; I let buildout install eggs for all of them into my buildout. That way if I depend on a newer version of some package in rev N+1 of my project, and at "go-live" time N+1 falls on its face, I can roll back to N and automatically get the package dependencies that N worked with.
We run a private eggbasket server, and configure buildout to fetch packages only from that. Server contents were initialized by allowing buildout to grab eggs from the network one time, then copying the downloaded eggs.
This way, upgrades to each package are totally under control and I can ensure that 2 successive buildouts of the same snapshot of my code will build out the same thing. When I want to upgrade all, I will let buildout fetch most-recent-versions again, test test test, then copy my eggs to the eggbasket server to go into production mode.
This is what I'm looking for:
http://pypi.python.org/pypi/collective.eggproxy
I'm looking for a tool to keep track of "what's running where". We have a bunch of servers, and on each of those a bunch of projects. These projects may be running on a specific version (hg tag/commit nr) and have their requirements at specific versions as well.
Fabric looks like a great start to do the actual deployments by automating the ssh part. However, once a deployment is done there is no overview of what was done.
Before reinventing the wheel I'd like to check here on SO as well (I did my best w/ Google but could be looking for the wrong keywords). Is there any such tool already?
(In practice I'm deploying Django projects, but I'm not sure that's relevant for the question; anything that keeps track of pip/virtualenv installs or server state in general should be fine)
many thanks,
Klaas
==========
EDIT FOR TEMP. SOLUTION
==========
For now, we've chosen to simply store this information in a simple key-value store (in our case: the filesystem) that we take great care to back up (in our case: using a DVCS). We keep track of this store with the same deployment tool that we use to do the actual deploys (in our case: fabric).
Passwords are stored inside a TrueCrypt volume that's stored inside our key-value store.
==========
I will still gladly accept any answer when some kind of Open Source solution to this problem pops up somewhere. I might share (part of) our solution somewhere myself in the near future.
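To make the filesystem-as-key-value-store idea concrete, here is a rough sketch, not the actual solution described above. The layout (one JSON file per host and project) and the names record_deployment and read_deployment are invented for this example.

```python
import json
from pathlib import Path


def _record_path(store_dir, host, project):
    """Key layout assumed here: <store_dir>/<host>/<project>.json"""
    return Path(store_dir) / host / f"{project}.json"


def record_deployment(store_dir, host, project, revision, requirements):
    """Write what was deployed where; return the path of the record file."""
    path = _record_path(store_dir, host, project)
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"revision": revision, "requirements": sorted(requirements)}
    # Sorted keys and indentation keep diffs readable when the store is under version control.
    path.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path


def read_deployment(store_dir, host, project):
    """Return the stored record for project on host."""
    return json.loads(_record_path(store_dir, host, project).read_text(encoding="utf-8"))
```

A deployment task could call record_deployment right after a successful deploy and commit the store directory, which gives a history of what ran where.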
pip freeze gives you a listing of all installed packages. Bonus: if you redirect the output to a file, you can use it as part of your deployment process to install all those packages (pip can programmatically install all packages from the file).
I see you're already using virtualenv. Good. You can run pip freeze -E myvirtualenv > myproject.reqs to generate a dependency file that doubles as a status report of the Python environment.
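As a sketch of using that output as a status report, the code below parses pip freeze text into a name-to-version mapping. The -E option shown above was removed from later pip releases, so this version instead runs pip through the interpreter of whichever environment is active. The function names are made up for this example, and lines that are not plain name==version pins (editable installs, for instance) are simply skipped.

```python
import subprocess
import sys


def parse_freeze(text):
    """Turn pip freeze output into a {name: version} dict, skipping non-pinned lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name] = version
    return pins


def freeze_current_environment():
    """Run pip freeze with the current interpreter and parse the result."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        check=True, capture_output=True, text=True,
    )
    return parse_freeze(result.stdout)
```

Comparing the dicts from two servers, or from before and after a deploy, shows which package versions differ.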
Perhaps you want something like Opscode Chef.
In their own words:
Chef works by allowing you to write recipes that describe how you want a part of your server (such as Apache, MySQL, or Hadoop) to be configured. These recipes describe a series of resources that should be in a particular state - for example, packages that should be installed, services that should be running, or files that should be written. We then make sure that each resource is properly configured, only taking corrective action when it's necessary. The result is a safe, flexible mechanism for making sure your servers are always running exactly how you want them to be.
EDIT: Note Chef is not a Python tool, it is a general purpose tool, written in Ruby (it seems). But it is capable of supporting various "cookbooks", including one for installing/maintaining Python apps.