locally modifying a python library (sklearn, linux)

locally modifying a python library (sklearn, linux) - python

I want to modify a couple of functions from sci-kit learn - adding a few lines at most.
Working on my Windows machine at home I can (and did) edit the source code directly (though I realize this is risky in view of future projects...).
Now, however, I'm working on a remote Linux server where I don't have privileges to edit /usr/lib/python2.7/dist-packages/sklearn/.../
What's the best way to go about this?

Depending of your modifications you can:
Override method/function in a new python module
Install library in a different folder for your project
pip install --install-option="--prefix=$PREFIX_PATH" package_name
Copy the library in your project (pip or git or cp)

Related

setup.py + virtualenv = chicken and egg issue?

I'm a Java/Scala dev transitioning to Python for a work project. To dust off the cobwebs on the Python side of my brain, I wrote a webapp that acts as a front-end for Docker when doing local Docker work. I'm now working on packaging it up and, as such, am learning about setup.py and virtualenv. Coming from the JVM world, where dependencies aren't "installed" so much as downloaded to a repository and referenced when needed, the way pip handles things is a bit foreign. It seems like best practice for production Python work is to first create a virtual environment for your project, do your coding work, then package it up with setup.py.
My question is, what happens on the other end when someone needs to install what I've written? They too will have to create a virtual environment for the package but won't know how to set it up without inspecting the setup.py file to figure out what version of Python to use, etc. Is there a way for me to create a setup.py file that also creates the appropriate virtual environment as part of the install process? If not — or if that's considered a "no" as this respondent stated to this SO post — what is considered "best practice" in this situation?

You can think of virtualenv as an isolation for every package you install using pip. It is a simple way to handle different versions of python and packages. For instance you have two projects which use same packages but different versions of them. So, by using virtualenv you can isolate those two projects and install different version of packages separately, not on your working system.
Now, let's say, you want work on a project with your friend. In order to have the same packages installed you have to share somehow what versions and which packages your project depends on. If you are delivering a reusable package (a library) then you need to distribute it and here where setup.py helps. You can learn more in Quick Start
However, if you work on a web site, all you need is to put libraries versions into a separate file. Best practice is to create separate requirements for tests, development and production. In order to see the format of the file - write pip freeze. You will be presented with a list of packages installed on the system (or in the virtualenv) right now. Put it into the file and you can install it later on another pc, with completely clear virtualenv using pip install -r development.txt
And one more thing, please do not put strict versions of packages like pip freeze shows, most of time you want >= at least X.X version. And good news here is that pip handles dependencies by its own. It means you do not have to put dependent packages there, pip will sort it out.
Talking about deploy, you may want to check tox, a tool for managing virtualenvs. It helps a lot with deploy.

Python default package path always point to system environment, that need Administrator access to install. Virtualenv able to localised the installation to an isolated environment.
For deployment/distribution of package, you can choose to
Distribute by source code. User need to run python setup.py --install, or
Pack your python package and upload to Pypi or custom Devpi. So the user can simply use pip install <yourpackage>
However, as you notice the issue on top : without virtualenv, they user need administrator access to install any python package.
In addition, the Pypi package worlds contains a certain amount of badly tested package that doesn't work out of the box.
Note : virtualenv itself is actually a hack to achieve isolation.

Export pip packages

I have a project with multiple dependencies installed using virtualenv and pip. I want to run my project on a server which does not have pip installed. Unfortunately, installing pip is not an option.
Is there a way to export my required packages and bundle them with my project? What is the common approach in this situation?

Twitter uses pex files to bundle Python code with its dependencies. This will produce a single file. Another relevant tool is platter which also aims to reduce the complexity of deploying Python code to a server.
Another alternative is to write a tool yourself which creates a zip file with the Python and dependencies and unzips it in the right location on the server.
In Python 3.5 the module zipapp was introduced to improve support for this way of deploying / using code. This allows you to manage the creation of zip files containing Python code and run them directly using the Python interpreter.

#Simeon Visser's answer is a good way to deal with that. Mine is to build my python project with buildout.

This may be outside of scope of the question, but if your need is deploying applications to servers with their dependencies, have a look at virtualization and linux containers.
It is by far the most used solution to this problem, and will work with any type of application (python or not), and it is lightweight (the performance hit of LXC is not noticeable in most cases, and isolating apps is a GREAT feature).
Docker containers, besides being trendy right now, are a very convenient way to deploy applications without caring about dependencies, etc...
The same goes for development envs with vagrant.

How to "build" a python script with its dependencies

I have a simple python shell script (no gui) who uses a couple of dependencies (requests and BeautifulfSoup4).
I would like to share this simple script over multiple computers. Each computer has already python installed and they are all Linux powered.
At this moment, on my development environments, the application runs inside a virtualenv with all its dependencies.
Is there any way to share this application with all the dependencies without the needing of installing them with pip?
I would like to just run python myapp.py to run it.

You will need to either create a single-file executable, using something like bbfreeze or pyinstaller or bundle your dependencies (assuming they're pure-python) into a .zip file and then source it as your PYTHONPATH (ex: PYTHONPATH=deps.zip python myapp.py).
The much better solution would be to create a setup.py file and use pip. Your setup.py file can create dependency links to files or repos if you don't want those machines to have access to the outside world. See this related issue.

As long as you make the virtualenv relocatable (use the --relocatable option on it in its original place), you can literally just copy the whole virtualenv over. If you create it with --copy-only (you'll need to patch the bug in virtualenv), then you shouldn't even need to have python installed elsewhere on the target machines.
Alternatively, look at http://guide.python-distribute.org/ and learn how to create an egg or wheel. An egg can then be run directly by python.

I haven't tested your particular case, but you can find source code (either mirrored or original) on a site like github.
For example, for BeautifulSoup, you can find the code here.
You can put the code into the same folder (probably a rename is a good idea, so as to not call an existing package). Just note that you won't get any updates.

Howto deploy python applications inside corporate network

First let me explain the current situation:
We do have several python applications which depend on custom (not public released ones) as well as general known packages. These depedencies are all installed on the system python installation. Distribution of the application is done via git by source. All these computers are hidden inside a corporate network and don't have internet access.
This approach is bit pain in the ass since it has the following downsides:
Libs have to be installed manually on each computer :(
How to better deploy an application? I recently saw virtualenv which seems to be the solution but I don't see it yet.
virtualenv creates a clean python instance for my application. How exactly should I deploy this so that usesrs of the software can easily start it?
Should there be a startup script inside the application which creates the virtualenv during start?
The next problem is that the computers don't have internet access. I know that I can specify a custom location for packages (network share?) but is that the right approach? Or should I deploy the zipped packages too?
Would another approach would be to ship the whole python instance? So the user doesn't have to startup the virutalenv? In this python instance all necessary packages would be pre-installed.
Since our apps are fast growing we have a fast release cycle (2 weeks). Deploying via git was very easy. Users could pull from a stable branch via an update script to get the last release - would that be still possible or are there better approaches?
I know that there are a lot questions. Hopefully someone can answer me r give me some advice.

You can use pip to install directly from git:
pip install -e git+http://192.168.1.1/git/packagename#egg=packagename
This applies whether you use virtualenv (which you should) or not.
You can also create a requirements.txt file containing all the stuff you want installed:
-e git+http://192.168.1.1/git/packagename#egg=packagename
-e git+http://192.168.1.1/git/packagename2#egg=packagename2
And then you just do this:
pip install -r requirements.txt
So the deployment procedure would consist in getting the requirements.txt file and then executing the above command. Adding virtualenv would make it cleaner, not easier; without virtualenv you would pollute the systemwide Python installation. virtualenv is meant to provide a solution for running many apps each in its own distinct virtual Python environment; it doesn't have much to do with how to actually install stuff in that environment.

Is site-packages appropriate for applications or just libraries?

I'm in a bit of a discussion with some other developers on an open source project. I'm new to python but it seems to me that site-packages is meant for libraries and not end user applications. Is that true or is site-packages an appropriate place to install an application meant to be run by an end user?

Once you get to the point where your application is ready for distribution, package it up for your favorite distributions/OSes in a way that puts your library code in site-packages and executable scripts on the system path.
Until then (i.e. for all development work), don't do any of the above: save yourself major headaches and use zc.buildout or virtualenv to keep your development code (and, if you like, its dependencies as well) isolated from the rest of the system.

We do it like this.
Most stuff we download is in site-packages. They come from pypi or Source Forge or some other external source; they are easy to rebuild; they're highly reused; they don't change much.
Must stuff we write is in other locations (usually under /opt, or c:\opt) AND is included in the PYTHONPATH.
There's no great reason for keeping our stuff out of site-packages. However, our feeble excuse is that our stuff changes a lot. Pretty much constantly. To reinstall in site-packages every time we think we have something better is a bit of a pain.
Since we're testing out of our working directories or SVN checkout directories, our test environments make heavy use of PYTHONPATH.
The development use of PYTHONPATH bled over into production. We use a setup.py for production installs, but install to an alternate home under /opt and set the PYTHONPATH to include /opt/ourapp-1.1.

The program run by the end user is usually somewhere in their path, with most of the code in the module directory, which is often in site-packages.
Many python programs will have a small script located in the path, which imports the module, and calls a "main" method to run the program. This allows the programmer to do some upfront checks, and possibly modify sys.path if needed to find the needed module. This can also speed up load time on larger programs, because only files that are imported will be run from bytecode.

Site-packages is for libraries, definitely.
A hybrid approach might work: you can install the libraries required by your application in site-packages and then install the main module elsewhere.

If you can turn part of the application to a library and provide an API, then site-packages is a good place for it. This is actually how many python applications do it.
But from user or administrator point of view that isn't actually the problem. The problem is how we can manage the installed stuff. After I have installed it, how can I upgrade and uninstall it?
I use Fedora. If I use the python that came with it, I don't like installing things to site-packages outside the RPM system. In some cases I have built rpm myself to install it.
If I build my own python outside RPM, then I naturally want to use python's mechanisms to manage it.
Third way is to use something like easy_install to install such thing for example as a user to home directory.
So
Allow packaging to distributions.
Allow selecting the python to use.
Allow using python installed by distribution where you don't have permissions to site-packages.
Allow using python installed outside distribution where you can use site-packages.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

locally modifying a python library (sklearn, linux) - python

Depending of your modifications you can: Override method/function in a new python module Install library in a different folder for your project pip install --install-option="--prefix=$PREFIX_PATH" package_name Copy the library in your project (pip or git or cp)

Related

setup.py + virtualenv = chicken and egg issue?

Export pip packages

How to "build" a python script with its dependencies

Howto deploy python applications inside corporate network

Is site-packages appropriate for applications or just libraries?

Categories

Resources