How to modify / edit an installed anaconda package - python

I have some issues with a published package and wish to edit the code myself (may generate a pull request later to contribute). I am quite confused about how to do this since it seems there is a lack of step-by-step guidance. Could anybody give me a very detailed instruction about how this is done (or a link)? My understanding and also my questions about the workflow are:
Fork the package through git/github and have a local synced copy (done!).
Create a new Anaconda environment (done!)?
Install the package as normal: $conda install xxx or $python setup.py develop?
Do I make changes to the package directly in the package folder in Anaconda if I use python setup.py develop?
Or make changes to the local forked copy and install/update again and what are the commands for this?
Do I need to update the setup.py file as well before running it either way?

You can simply git-clone the package repo to your local computer and then install it in "development" or "editable" mode. This way you can easily make changes to the code while at the same time incorporating it into your own projects. Of course, this will also allow you to create pull requests later on.
Using Anaconda (or Miniconda) you have 2 equivalent options for this:
using conda (conda-develop):
conda develop <path_to_local_repo>
using pip (pip install options)
pip install --editable <path_to_local_repo>
What these commands basically do is creating a link to the local repo-folder inside the environments site-packages folder.
Note that for "editable" pip installs you need a a basic setup.py:
import setuptools
setuptools.setup(name=<anything>)
On the other hand the conda develop <path_to_local_repo> command unfortunately doesn't work in environment.yml files.

Related

Using virtual environments to manage develop/deploy packages in Jupyterhub

I have a package that I am developing for a local server. I would like to have the current stable release importable in a Jupyter notebook using import my_package and the current development state importable (for end-to-end testing and stuff) with import my_package_dev, or something like that.
The package is version controlled with git. The master branch holds the stable release, and new development work is done in the develop branch.
I currently pulled these two branches into two different folders:
my_package/ # tracks master branch of repository
setup.py
requirements.txt
my_package/
__init__.py
# other stuff
my_package_dev/ # tracks develop branch of repository
setup.py
requirements.txt
my_package/
__init__.py
# other stuff for dev branch
My setup.py file looks like this:
from setuptools import setup
setup(
name='my_package', # or 'my_package_dev' for the dev version
# metadata stuff...
)
I can pip install my_package just fine, but I have been unable to get anything to link to the name my_package_dev in Python.
Things I have tried
pip install my_package_dev
Doesn't seem to overwrite the existing my_package, but doesn't seem to make my_package_dev available either, even though pip says it finishes OK.
pip install -e my_package_dev
makes an egg and puts the development package path in easy-install.pth, but I cannot import my_package_dev, and my_package is still the old content.
Adding a file my_package_dev.pth to site-packages directory and filling it with /path/to/my_package_dev
causes no visible change. Still does not allow me to import my_package_dev.
Thoughts on a solution
It looks like the best approach is going to be to use virtual environments, as discussed in the answers.
With pip install you install packages by its name in setup.py's name attribute. If you have installed both and execute pip freeze, you will see both packages listed. Which code is available depends on how they are included in Python path.
The issue is those two packages contains just a python module named my_package, that it why you can not import my_package_dev (it does not exist).
I would suggest you to have an working copy for each version (without modifying package name) and use virtualenv to keep environments isolated (one virtualenv for stable version and the other for dev).
You could also use pip's editable install to keep the environment updated with the working copies.
Note: Renaming my_package_dev's my_package module directory to my_package_dev, will also work. But it will be harder to merge changes from one version to the other.
The answer provided by Gonzalo got me on the right track: use virtual environments to manage two different builds. I created the virtual environment for the master (stable) branch with:
$ cd my_package
$ virtualenv venv # make the virtual environment
$ source venv/bin/activate
(venv) $ pip install -r requirements.txt # install everything listed as a requirement
(venv) $ pip install -e . # install my_package dynamicially so that any changes are visible right away
(venv) $ sudo venv/bin/python -m ipykernel install --name 'master' --display-name 'Python 3 (default)'
And for the develop branch, I followed the same procedure in my my_package_dev folder, giving it a different --name and --display-name value.
Note that I needed to use sudo for the final ipykernel install command because I kept getting permission denied errors on my system. I would recommend trying without sudo first, but for this system it needed to be installed system-wide.
Finally, to switch between which version of the tools I am using, I just have to select Kernel -> Change kernel and choose Python 3 (default) or Python 3 (develop). The import stays the same (import my_package), so nothing in the notebook has to change.
This isn't quite my ideal scenario since it means that I will then have to re-run the whole notebook any time I change kernels, but it works!

How to debug python system library modules [duplicate]

Two options in setup.py develop and install are confusing me. According to this site, using develop creates a special link to site-packages directory.
People have suggested that I use python setup.py install for a fresh installation and python setup.py develop after any changes have been made to the setup file.
Can anyone shed some light on the usage of these commands?
python setup.py install is used to install (typically third party) packages that you're not going to develop/modify/debug yourself.
For your own stuff, you want to first install your package and then be able to frequently edit the code without having to re-install the package every time — and that is exactly what python setup.py develop does: it installs the package (typically just a source folder) in a way that allows you to conveniently edit your code after it’s installed to the (virtual) environment, and have the changes take effect immediately.
Note: It is highly recommended to use pip install . (regular install) and pip install -e . (developer install) to install packages, as invoking setup.py directly will do the wrong things for many dependencies, such as pull prereleases and incompatible package versions, or make the package hard to uninstall with pip.
Update:
The develop counterpart for the latest python -m build approach is as follows (as per):
From the documentation. The develop will not install the package but it will create a .egg-link in the deployment directory back to the project source code directory.
So it's like installing but instead of copying to the site-packages it adds a symbolic link (the .egg-link acts as a multiplatform symbolic link).
That way you can edit the source code and see the changes directly without having to reinstall every time that you make a little change. This is useful when you are the developer of that project hence the name develop. If you are just installing someone else's package you should use install
Another thing that people may find useful when using the develop method is the --user option to install without sudo. Ex:
python setup.py develop --user
instead of
sudo python setup.py develop

Python - how to let users run script with libraries

I plan on sharing a Python program on GitHub.
However it uses additional libraries like http, Selenium, BeautifulSoup and the Google Calendar API.
How do I include these libraries in the directory I push to GitHub, so that all users have to do is run python script.py, instead of having to install the libraries?
I thought of generating an executable with pyinstaller but that didn't work :/
You use a pip requirements.txt file for this.
If you have done your work inside of a virtual environment then run on your command line/terminal:
pip freeze > requirements.txt
Then commit and push the file to your github repository.
If you have not done your script within a virtual environment then run:
pip freeze > requirements.txt
And edit the file so you only have the modules you require.
I'd recommend always use a virtual environment for this since it makes your application easy to share. In django framework employing virtualenv's is very common.
Your collaborators can install your dependencies using:
pip install -r requirements.txt
After cloning your github repo.
Usually you don't need to embed your dependencies in your project (not practical! specially when they are many). Instead you may include requirements.txt inside your project to list the modules (and version number) that are required by your application. Then when a user need to use your script, they can run something like this:
pip install -r requirements.txt
read more about requirements files here:
https://pip.readthedocs.org/en/1.1/requirements.html#requirements-files

How to install python packages without root privileges?

I am using numpy / scipy / pynest to do some research computing on Mac OS X. For performance, we rent a 400-node cluster (with Linux) from our university so that the tasks could be done parallel. The problem is that we are NOT allowed to install any extra packages on the cluster (no sudo or any installation tool), they only provide the raw python itself.
How can I run my scripts on the cluster then? Is there any way to integrate the modules (numpy and scipy also have some compiled binaries I think) so that it could be interpreted and executed without installing packages?
You don't need root privileges to install packages in your home directory. You can do that with a command such as
pip install --user numpy
or from source
python setup.py install --user
See https://stackoverflow.com/a/7143496/284795
The first alternative is much more convenient, so if the server doesn't have pip or easy_install, you should politely ask the admins to add it, explaining the benefit to them (they won't be bothered anymore by requests for individual packages).
You could create a virtual environment through the virtualenv package.
This creates a folder (say venv) with a new copy of the Python executable and a new site-packages directory, into which you can "install" any number of packages without needing any kind of administrative access at all. Thus, activating the environment through source venv/bin/activate will give Python an environment that's equivalent to having those packages installed.
I know this works for SGE clusters, although how the virtual environment is activated might depend on your cluster's configuration.
You can try installing virtualenv on your cluster within your own site-packages directory using the following steps:
Download virtualenv from here, put it on your cluster
Install it using setup.py to a specific, local directory to serve as your own site-packages:
python setup.py build
python setup.py install --install-base /path/to/local-site-packages
Add that directory to your PYTHONPATH:
export PYTHONPATH="/path/to/local-site-packages:${PYTHONPATH}"
Create a virtualenv:
virtualenv venv
You can import a module from an arbitrary path by calling:
sys.path.append()
The Python Distribution Anaconda solves many of the issues discussed in this questions. Anaconda does not require Admin or root access and is able to install to your home directory. Anaconda comes with many of the packages in question (scipy, numpy, sklearn, etc...) as well as the conda installer to install additional packages should additional ones be necessary.
It can be downloaded from https://www.continuum.io/downloads

Working with many different modules in python

I am new to python just a few weeks back i started using python(Classic Noob-Disclaimer)
Now whenever i install a module by copying the unzipped folder in site-packages under Lib and running the source install by using "c:\python27\lib\site-packages\tweepy-1.2\setup.py install" in command prompt it installs without any errors.
But now when i make a python script (*.py)
and store it on the desktop it wont work
and it gives out an error "No module found"
but when i store it in the same folder as the source it works perfectly.
also if i open the IDLE GUI it also returns the same error.
But this doesnt happen with the win32com module which i use for TTS.
I missing something..but i cudnt find the answer to it.
Plz help me!
i need to use many of these modules..they work great differently but not together as the modules are always missing!
Copying an unzipped folder to site-packages does not install a Python package.
To install manually, unzip the package to a temporary directory, then run:
python setup.py install
in this directory, after that you can remove the directory.
To download and install a pure Python package automatically, run:
pip install tweepy
if you have pip installed.
The simplest way to install Python packages that have C extensions is to use binary installers (*.exe, *.msi files).
To avoid all this use VirtualEnv
Virtualenv is a tool to create isolated Python environments.
The basic problem being addressed is one of dependencies and versions, and indirectly permissions. Imagine you have an application that needs version 1 of LibFoo, but another application requires version 2. How can you use both these applications? If you install everything into /usr/lib/python2.7/site-packages (or whatever your platform's standard location is), it's easy to end up in a situation where you unintentionally upgrade an application that shouldn't be upgraded.
Or more generally, what if you want to install an application and leave it be? If an application works, any change in its libraries or the versions of those libraries can break the application.
Also, what if you can't install packages into the global site-packages directory? For instance, on a shared host.
The easiest way to install python packages is by using pip. First you need to install pip as explained here if you use windows. Then you can query some packages, from command line, for example
> pip search twitter
Then to install certain packages, just use pip something like this:
> pip install tweepy

Categories