Generating patches for open source, source code in virtualenv site-packages - python

I have an open source python library sitting in my virtualenv site-packages. And I noticed a bug in that library and would like to contribute my patches back to the open source project.
The problem is, my virtualenv site-packages is not version controlled by git (obviously, since it was installed via pip), and it's a pain to manually rename the specific string that is causing the bug (it appears in 10+ files) and then use diff to generate the patches.
A simpler way, since the project is hosted on GitHub, would be to place that library under git control and then make a pull request on GitHub. But I am not sure whether it makes sense to manage a git repository directly inside my virtualenv's site-packages directory (will that cause problems for pip?).
How would you manage your personal workflow to contribute back to open source projects efficiently in such a scenario?

Fork the project on github, clone it to a directory separate from your virtualenv, make the pull request, and install your own fork into the virtualenv by pointing pip at your fork in github.
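A rough sketch of that workflow, assuming a GitHub username <you> and project <project> (both placeholders):
git clone https://github.com/<you>/<project>.git ~/src/<project>
cd ~/src/<project>
git checkout -b fix-that-bug    # make the fix here, commit, push, then open the pull request
pip install -e .                # editable install into the active virtualenv, or install straight from your fork:
pip install git+https://github.com/<you>/<project>.git@fix-that-bug
Either install keeps site-packages itself free of a git repository while still letting you run your patched version.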

Related

Downsides of making a git repo in the root

I am currently working on a git repo located in
/home/user/bin/git/my-project
so I am saving the project as I work on it. But I have an issue: my project depends on Python libraries, crontab configuration, and bashrc/.profile variables, and if I need to transfer all of this to another Linux system, I was wondering whether it is possible to host the whole directory right from the root, like:
git/home/user/bin/git/my-project
So that when I want to retrieve the exact same copy and update the second machine, it works exactly like the first. Is it possible? Are there any downsides?
Thanks in advance for your help.
EDIT: Thanks for your answers. By root I meant the directory which does not have a parent directory; on my Raspberry Pi that is the place where you find etc, var, usr, boot, dev, and so on. So how would you proceed if you wanted to (for example) have another RPi with the exact same function without having to duplicate what's on the SD card, and have everyone working on a different system get the exact same functionality, up to date?
Thanks!
Downsides of making a git repo in the root (i.e. in /)
Some IDE applications work horribly slowly, because they "autodetect" whether they are inside a git repo and then try to run git status to check for changes. This ends up indexing the whole filesystem. I remember a problem with Eclipse and NetBeans (I think NetBeans didn't start at all, because it tried to open some git files read-write, and they were owned by root, but I'm not sure).
how would you proceed, if you wanted to (for example) have another RPI with the exact same function without having to duplicate what's in the SD card?
Create a package for the distribution installed on the platform. Create a publicly available repository with that package. Add that repository to the native package manager of that platform and install it like a normal package.
A package manager is the way to distribute software to systems. Package managers have dependency resolution, easy updates, pre-/post-install scripts and hooks: all the functionality you need to distribute programs. And package managers have great support for removing files and for updates.
my project depends on python libraries
So install them, via pip or pyenv or, best of all, via the package manager available on the system (the system administrator's choice). Or ask the user to set up a venv after installing your application.
, crontab configuration
(Side note: see systemd timers).
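For example, a daily cron entry could be replaced by a pair of units roughly like these (unit names and paths are made up for illustration):
# /etc/systemd/system/mytask.service
[Unit]
Description=Run mytask once

[Service]
Type=oneshot
ExecStart=/opt/mytask/run.sh

# /etc/systemd/system/mytask.timer
[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
Enable it with systemctl enable --now mytask.timer.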
, bashrc/.profile variables
If your library depends on my home customization, you have zero chance of installing it on any of my systems. There is zero need to touch home-directory files if you want to distribute a library to systems. Applications have /etc for configuration, and most distributions have /etc/profile.d/ as the standard directory for drop-in shell configuration. Please research the Linux directory structure; see the Filesystem Hierarchy Standard and the XDG specifications.
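For instance, instead of editing anyone's ~/.bashrc, a package can ship a drop-in file (the file name and variables below are hypothetical):
# /etc/profile.d/myapp.sh -- installed by the package, sourced by login shells
export MYAPP_CONFIG=/etc/myapp/config.ini
export PATH="$PATH:/opt/myapp/bin"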
Still, asking users to do manual configuration after installing a package is normal. Some applications need environment variables in order to work.
If you want to propagate changes to different computers and manage that change, then use automation tools such as Ansible, Puppet, or Chef.

Where do you clone Python module git repositories?

I'm interested in contributing to a GitHub Python module repo, but I'm not entirely sure where to clone it. This is a simple module, just an __init__.py and some .py files. No other files need to be installed or changed outside of the module's folder.
I would like to be able to clone the repository directly into my site-packages folder. When I want to use the library as is, I would switch to the master branch. If I want to develop a new feature, I can branch off of devel. If I want to try out a new feature someone else implemented, I can switch to that particular branch. I can even keep it on the development branch, to get the latest, albeit possibly unstable, features. All this without having to change the import statement to point to a different location in any of my scripts. This option, even though it seems to do all the things I want it to do, feels a bit wrong for some reason. Also, I'm not sure what this would do to pip when calling python -m pip list --outdated. I have a feeling it won't know what the current version is.
Another option would be to clone it to some other folder and keep only the pip-installed variant in the site-packages folder. That way I would have a properly installed library in site-packages and I could try out new features by creating a script inside the repo folder. This doesn't seem nearly as flexible as the option above, but it doesn't mess with the site-packages folder.
Which is the best way to go about this? How do you clone repositories when you both want to work on them and use them with the latest features?
I think this is more a question about packaging and open source than Python itself, but I'll try to help you out.
If you want to host your package on PyPI, you should look at the official packaging documentation, which shows how to upload and tag your package appropriately for usage.
If you want to add some functionality to an open source library, what you could do is submit a Pull Request to that library, so everybody can use it. Rules for PRs are specific to each project, so you should ask the maintainer.
If your modification doesn't get merged into master, but you still want to use it without changing import statements, you could fork that repo and publish your own modifications on, for instance, GitHub.
In that case, you could install your modifications like this:
pip install git+https://github.com/username/amazing-project.git
So in that way, your library will come from your own repo.
If you're going for the third option, I strongly recommend using virtualenv, with which you can create different virtual environments with different packages, dependencies and so on, without messing up your Python installation. A nice guide is available here.
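A minimal sketch of that setup, reusing the fork URL from above:
python -m venv venv          # or: virtualenv venv
source venv/bin/activate
pip install git+https://github.com/username/amazing-project.git
The fork is then installed only inside that environment, so your system Python stays untouched.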

setup.py + virtualenv = chicken and egg issue?

I'm a Java/Scala dev transitioning to Python for a work project. To dust off the cobwebs on the Python side of my brain, I wrote a webapp that acts as a front-end for Docker when doing local Docker work. I'm now working on packaging it up and, as such, am learning about setup.py and virtualenv. Coming from the JVM world, where dependencies aren't "installed" so much as downloaded to a repository and referenced when needed, the way pip handles things is a bit foreign. It seems like best practice for production Python work is to first create a virtual environment for your project, do your coding work, then package it up with setup.py.
My question is, what happens on the other end when someone needs to install what I've written? They too will have to create a virtual environment for the package but won't know how to set it up without inspecting the setup.py file to figure out what version of Python to use, etc. Is there a way for me to create a setup.py file that also creates the appropriate virtual environment as part of the install process? If not — or if that's considered a "no" as this respondent stated to this SO post — what is considered "best practice" in this situation?
You can think of virtualenv as isolation for every package you install using pip. It is a simple way to handle different versions of Python and packages. For instance, you might have two projects which use the same packages but different versions of them. By using virtualenv you can isolate those two projects and install different versions of the packages separately, instead of into your working system.
Now, let's say you want to work on a project with your friend. In order to have the same packages installed, you have to somehow share which packages and versions your project depends on. If you are delivering a reusable package (a library), then you need to distribute it, and here is where setup.py helps. You can learn more in the Quick Start guide.
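A minimal setup.py sketch (the project name and dependency pin are placeholders, not from the original question):
from setuptools import setup, find_packages

setup(
    name="myproject",                    # hypothetical project name
    version="0.1.0",
    packages=find_packages(),
    install_requires=["requests>=2.0"],  # runtime dependencies declared here
)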
However, if you work on a web site, all you need is to put the library versions into a separate file. Best practice is to create separate requirements files for tests, development and production. To see the format of the file, run pip freeze: you will be presented with a list of the packages currently installed on the system (or in the virtualenv). Put it into a file and you can install it later on another PC, into a completely clean virtualenv, using pip install -r development.txt
And one more thing: please do not pin the strict versions that pip freeze shows; most of the time you want >= some minimum version. The good news is that pip handles dependencies on its own, which means you do not have to list transitive dependencies there; pip will sort it out.
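A rough sketch of that workflow (the file names and pins are only examples):
pip freeze > requirements-frozen.txt   # exact snapshot of the current environment
# development.txt, hand-edited with loose pins instead of exact ones:
#   Django>=3.2
#   requests>=2.25
pip install -r development.txt         # recreate the environment in a clean virtualenv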
Talking about deployment, you may want to check out tox, a tool for managing virtualenvs. It helps a lot with deployment.
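A minimal tox.ini sketch (the Python versions and the test command are assumptions):
[tox]
envlist = py39,py310

[testenv]
deps = -rrequirements.txt
commands = pytest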
Python's default package path points to the system environment, which needs administrator access to install into. Virtualenv is able to localise the installation to an isolated environment.
For deployment/distribution of a package, you can choose to:
Distribute the source code. The user needs to run python setup.py install, or
Pack your Python package and upload it to PyPI or a custom devpi server, so the user can simply use pip install <yourpackage>
However, as you noticed in the issue on top: without virtualenv, the user needs administrator access to install any Python package.
In addition, the PyPI package world contains a certain number of badly tested packages that don't work out of the box.
Note: virtualenv itself is actually a hack to achieve isolation.

Best practice for installing python modules from an arbitrary VCS repository

I'm newish to the python ecosystem, and have a question about module editing.
I use a bunch of third-party modules, distributed on PyPi. Coming from a C and Java background, I love the ease of easy_install <whatever>. This is a new, wonderful world, but the model breaks down when I want to edit the newly installed module for two reasons:
The egg files may be stored in a folder or archive somewhere crazy on the file system.
Using an egg seems to preclude using the version control system of the originating project, just as using a debian package precludes development from an originating VCS repository.
What is the best practice for installing modules from an arbitrary VCS repository? I want to be able to continue to import foomodule in other scripts. And if I modify the module's source code, will I need to perform any additional commands?
Pip lets you install packages given a URL to a Subversion, git, Mercurial or bzr repository.
pip install -e svn+http://path_to_some_svn/repo#egg=package_name
Example:
pip install -e hg+https://rwilcox@bitbucket.org/ianb/cmdutils#egg=cmdutils
That is what I would run if I wanted to download the latest version of cmdutils (a random package I decided to pull).
I installed this into a virtualenv (using the -E parameter), and pip installed cmdutils into a src folder at the top level of my virtualenv folder.
pip install -E thisIsATest -e hg+https://rwilcox@bitbucket.org/ianb/cmdutils#egg=cmdutils
$ ls thisIsATest/src
cmdutils
Do you want to do development but have the development version handled as an egg by the system (for instance, to get entry points)? If so, then you should check out the source and use Development Mode by doing:
python setup.py develop
If the project happens not to be a setuptools-based project, which is required for the above, a quick work-around is this command:
python -c "import setuptools; execfile('setup.py')" develop
Almost everything you ever wanted to know about setuptools (the basis of easy_install) is available in the setuptools docs. There are also docs for easy_install.
Development mode adds the project to your import path in the same way that easy_install does. Any changes you make will be available to your apps the next time they import the module.
As others mentioned, you can also directly use version control URLs if you just want to get the latest version as it is now without the ability to edit, but that will only take a snapshot, and indeed creates a normal egg as part of the process. I know for sure it does Subversion and I thought it did others but I can't find the docs on that.
You can use the PYTHONPATH environment variable or symlink your code to somewhere in site-packages.
Packages installed by easy_install tend to come from snapshots of the developer's version control, generally made when the developer releases an official version. You're therefore going to have to choose between convenient automatic downloads via easy_install and up-to-the-minute code updates via version control. If you pick the latter, you can build and install most packages seen in the python package index directly from a version control checkout by running python setup.py install.
If you don't like the default installation directory, you can install to a custom location instead, and export a PYTHONPATH environment variable whose value is the path of the installed package's parent folder.
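For example (the prefix path is only illustrative; replace X.Y with your interpreter version):
python setup.py install --prefix=$HOME/pylibs
export PYTHONPATH=$HOME/pylibs/lib/pythonX.Y/site-packages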

Is site-packages appropriate for applications or just libraries?

I'm in a bit of a discussion with some other developers on an open source project. I'm new to python but it seems to me that site-packages is meant for libraries and not end user applications. Is that true or is site-packages an appropriate place to install an application meant to be run by an end user?
Once you get to the point where your application is ready for distribution, package it up for your favorite distributions/OSes in a way that puts your library code in site-packages and executable scripts on the system path.
Until then (i.e. for all development work), don't do any of the above: save yourself major headaches and use zc.buildout or virtualenv to keep your development code (and, if you like, its dependencies as well) isolated from the rest of the system.
We do it like this.
Most stuff we download is in site-packages. It comes from PyPI or SourceForge or some other external source; it is easy to rebuild; it's highly reused; it doesn't change much.
Most stuff we write is in other locations (usually under /opt, or c:\opt) AND is included in the PYTHONPATH.
There's no great reason for keeping our stuff out of site-packages. However, our feeble excuse is that our stuff changes a lot. Pretty much constantly. To reinstall in site-packages every time we think we have something better is a bit of a pain.
Since we're testing out of our working directories or SVN checkout directories, our test environments make heavy use of PYTHONPATH.
The development use of PYTHONPATH bled over into production. We use a setup.py for production installs, but install to an alternate home under /opt and set the PYTHONPATH to include /opt/ourapp-1.1.
The program run by the end user is usually somewhere in their path, with most of the code in the module directory, which is often in site-packages.
Many python programs will have a small script located in the path, which imports the module, and calls a "main" method to run the program. This allows the programmer to do some upfront checks, and possibly modify sys.path if needed to find the needed module. This can also speed up load time on larger programs, because only files that are imported will be run from bytecode.
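A typical launcher of that shape might look like this (the package and function names are hypothetical):
#!/usr/bin/env python
# small wrapper placed on the user's PATH; the real code lives in site-packages
import sys

def main():
    # upfront checks or sys.path adjustments could go here
    from foomodule.cli import run   # hypothetical module providing the real entry point
    sys.exit(run(sys.argv[1:]))

if __name__ == "__main__":
    main()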
Site-packages is for libraries, definitely.
A hybrid approach might work: you can install the libraries required by your application in site-packages and then install the main module elsewhere.
If you can turn part of the application to a library and provide an API, then site-packages is a good place for it. This is actually how many python applications do it.
But from the user's or administrator's point of view, that isn't actually the problem. The problem is how we can manage the installed stuff: after I have installed it, how can I upgrade and uninstall it?
I use Fedora. If I use the Python that came with it, I don't like installing things to site-packages outside the RPM system. In some cases I have built an RPM myself to install things.
If I build my own python outside RPM, then I naturally want to use python's mechanisms to manage it.
A third way is to use something like easy_install to install such a thing, for example as a user, into the home directory.
So
Allow packaging to distributions.
Allow selecting the python to use.
Allow using python installed by distribution where you don't have permissions to site-packages.
Allow using python installed outside distribution where you can use site-packages.
