Idiom for script directory in python application?

I have a python application (Django based), and I have a couple of standalone maintenance scripts that go along with the application, that I have to call every now and then. They have to import parts of my application (sub-packages). Currently, I just put them in my toplevel directory:
application/
    djangoproject/
    djangoapp/
    otherpackage/
    tool1.py
    tool2.py
Where tool1.py would do
from djangoproject import wsgi
from djangoapp.models import Poll
I've accumulated quite some of these tools, and would like to move them to a scripts subdirectory. Then, I would like to be able to call them via python scripts/tool1.py or maybe cd scripts; python tool1.py.
I understand (and sometimes lament) how Python's imports work, and I know that I can add some lines to each script to add the parent directory to PYTHONPATH. I am wondering if there is a widespread pattern to handle such a collection of assorted scripts. Maybe one could put the path manipulation into another file, and have every script start with import mainproject?
I am using a virtualenv, and installing dependencies with pip. But the application itself currently doesn't use a setup.py, and I think it wouldn't help to move the scripts to a separate package installed via pip, since I change them a lot during development, and there are lots of one-offs.

The way source code is organized varies from project to project. In my experience, the best and most Pythonic way is to always have a setup.py.
In that case, you can run pip install -e . and the editable version from the . dir will be pseudo-installed into the virtualenv. It is not actually installed (i.e. copied), but "linked": the source code dir is added to sys.path via .pth files, so you can edit and try things without any special copying/installing steps afterward.
On top of that, you can extend setup.py with extra dependencies, e.g. for development purposes, and install them with pip install -e .[dev]. This is more of a nice side benefit.
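For illustration, a minimal setup.py with such a development extra might look like this (the package names and dependency lists are hypothetical):
# setup.py -- minimal sketch; names and dependencies are illustrative
from setuptools import setup

setup(
    name='application',
    packages=['djangoproject', 'djangoapp', 'otherpackage'],
    install_requires=['django'],          # runtime dependencies
    extras_require={
        'dev': ['pytest', 'flake8'],      # installed via: pip install -e .[dev]
    },
)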
The rest depends on the nature of the scripts.
If the scripts are part of the application, they should be installed via the entry-points in setup.py.
# setup.py:
setup(
    entry_points={
        'console_scripts': [
            'tool1 = mytools.tool1:main',
            'tool2 = mytools.tool2:main',
        ],
    },
)
In that case, after pip install -e ., they will be in the bin folder of the virtualenv, or in /usr/local/bin or similar if the system python is used. You can execute them like this:
source .venv/bin/activate
tool1 ...
# OR:
~/path/to/venv/bin/tool2
The scripts installed this way are fully aware of the virtualenv into which they were installed, so neither activation nor an explicit python binary is needed.
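For completeness, here is a minimal sketch of what such an entry-point target might look like; the mytools/tool1.py path and the argument handling are illustrative, not prescribed:
# mytools/tool1.py -- minimal sketch of a console_scripts target
import argparse

def main():
    parser = argparse.ArgumentParser(description='Example maintenance tool')
    parser.add_argument('--dry-run', action='store_true',
                        help='show what would be done without doing it')
    args = parser.parse_args()
    # the actual maintenance logic would go here
    print('dry run' if args.dry_run else 'running for real')

if __name__ == '__main__':
    main()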
If the scripts are for code maintenance and are not semantically part of the application, then they are usually put into a ./scripts/ directory (or any other, e.g. ./ci/), with a shebang at the top (#!/usr/bin/env python). E.g., tool1.py:
#!/usr/bin/env python

def main():
    pass

if __name__ == '__main__':
    main()
Thanks to that shebang, they are executed in the currently active virtualenv as follows:
source .venv/bin/activate
./scripts/tool1.py ...
# OR:
~/path/to/venv/bin/python ./scripts/tool1.py
Unlike the scripts installed via entry points, these scripts do not know anything about their own virtualenv, so the virtualenv must be activated or the proper python binary used explicitly.
This approach is also used for non-Python scripts, e.g. bash scripts.
In both cases, a requirements.txt file is sometimes used to pin the versions of the application and its dependencies (with pip freeze), so that deployments are reproducible and predictable. But that is another story, about deploying the application rather than about packaging and maintenance.
The requirements.txt file is regenerated from time to time to satisfy the new unpinned (i.e. flexible) requirements in setup.py and the new package versions available. Usually it is generated content (despite being committed to the repo), not content maintained by hand.
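For example, with the virtualenv activated, the pinned file is typically regenerated like this:
pip freeze > requirements.txt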
If you strictly do not want a setup.py for any reason, then either execute those scripts with a modified environment variable:
PYTHONPATH=. python scripts/tool1.py
Or hack the sys.path from inside:
# tool1.py
import os
import sys

# add the project root (the parent of scripts/) to the import path
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
This is essentially what pip install -e . does, except that it is done manually on every call rather than once via a .pth file in the virtualenv. It also looks hacky.
However, neither hacky solutions nor solutions that duplicate the standard toolkit are considered "Pythonic".

Related

Calling script in standard project directory structure (Python path for bin subdirectory)

I am experimenting with putting my Python code into the standard directory structure used for deployment with setup.py and maybe PyPI. For a Python library called mylib it would be something like this:
mylibsrc/
    README.rst
    setup.py
    bin/
        some_script.py
    mylib/
        __init__.py
        foo.py
There's often also a test/ subdirectory but I haven't tried writing unit tests yet. The recommendation to have scripts in a bin/ subdirectory can be found in the official Python packaging documentation.
Of course, the scripts start with code that looks like this:
#!/usr/bin/env python
from mylib.foo import something
something("bar")
This works well when it eventually comes to deploying the script (e.g. to devpi) and then installing it with pip. But if I run the script directly from the source directory, as I would while developing new changes to the library/script, I get this error:
ImportError: No module named 'mylib'
This is true even if the current working directory is the root mylibsrc/ and I ran the script by typing ./bin/some_script.py. This is because Python starts searching for packages in the directory of the script being run (i.e. from bin/), not the current working directory.
What is a good, permanent way to make it easy to run scripts while developing packages?
Here is a relevant other question (especially comments to the first answer).
The solutions for this that I've found so far fall into three categories, but none of them are ideal:
Manually fix up your Python's module search path somehow before running your scripts.
Manually add mylibsrc to my PYTHONPATH environment variable. This seems to be the most official (Pythonic?) solution, but it means that every time I check out a project I have to remember to manually change my environment before I can run any code in it.
Add . to the start of my PYTHONPATH environment variable. As I understand it this could have some security problems. This would actually be my favoured trick if I was the only person to use my code, but I'm not, and I don't want to ask others to do this.
While looking at answers on the internet, for files in a test/ directory I've seen recommendations that they all (indirectly) include a line of code sys.path.insert(0, os.path.abspath('..')) (e.g. in structuring your project). Yuck! This seems like a bearable hack for files that are only for testing, but not those that will be installed with the package.
Edit: I have since found an alternative, which turns out to be in this category: by running the scripts with Python's -m switch, the search path starts in the working directory instead of the bin/ directory. See my answer below for more details.
Install the package to a virtual environment before using it, using a setup.py (either running it directly or using pip).
This seems like overkill if I'm just testing a change that I'm not sure is even syntactically correct yet. Some of the projects I'm working on aren't even meant to be installed as packages but I want to use the same directory structure for everything, and this would mean writing a setup.py just so I could test them!
Edit: Two interesting variants of this are discussed in the answers below: the setup.py develop command in logc's answer and pip install -e in mine. They avoid having to re-"install" for every little edit, but you still need to create a setup.py for packages you never intend to fully install, and they don't work very well with PyCharm (which has a menu entry to run the develop command but no easy way to run the scripts that it copies to the virtual environment).
Move the scripts to the project's root directory (i.e. in mylibsrc/ instead of mylibsrc/bin/).
Yuck! This is a last resort but, unfortunately, this seems like the only feasible option at the moment.
Run modules as scripts
Since I posted this question, I've learnt that you can run a module as if it were a script using Python's -m command-line switch (which I had thought only applied to packages).
So I think the best solution is this:
Instead of writing wrapper scripts in a bin subdirectory, put the bulk of the logic in modules (as you should anyway), and put at the end of relevant modules if __name__ == "__main__": main(), as you would in a script.
To run the scripts on the command line, call the modules directly like this: python -m pkg_name.module_name
If you have a setup.py, as Alik said you can generate wrapper scripts at installation time so your users don't need to run them in this funny way.
PyCharm doesn't support running modules in this way (see this request). However, you can just run modules (and also scripts in bin) like normal because PyCharm automatically adds the project root to the PYTHONPATH, so import statements resolve without any further effort. There are a few gotchas for this though:
The main problem is that the working directory will be incorrect, so opening data files won't work. Unfortunately there is no quick fix; the first time you run each script, you must stop it and change its configured working directory (see this link).
If your package directory is not directly within the root project directory, you need to mark its parent directory as a source directory in the project structure settings page.
Relative imports don't work, i.e. you can do from pkg_name.other_module import fn but not from .other_module import fn. Relative imports are usually poor style anyway, but they're useful for unit tests.
If a module has a circular dependency and you run it directly, it will end up being imported twice (once as pkg_name.module_name and once as __main__). But you shouldn't have circular dependencies anyway.
Bonus command line fun:
If you still want to put some scripts in bin/ you can call them with python -m bin.scriptname (but in Python 2 you'll need to put an __init__.py in the bin directory).
You can even run the overall package, if it has a __main__.py, like this: python -m pkg_name
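As a minimal illustration, a package runnable via python -m pkg_name just needs a __main__.py; a sketch (the module and function names echo the hypothetical ones above):
# pkg_name/__main__.py -- executed by: python -m pkg_name
from pkg_name.module_name import main

main()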
Pip editable mode
There is an alternative for the command line, which is not as simple, but still worth knowing about:
Use pip's editable mode, documented here
To use it, make a setup.py, and use the following command to install the package into your virtual environment: pip install -e .
Note the trailing dot, which refers to the current directory.
This puts the scripts generated from your setup.py in your virtual environment's bin directory, and links to your package source code so you can edit and debug it without re-running pip.
When you're done, you can run pip uninstall pkg_name
This is similar to setup.py's develop command, but uninstallation seems to work better.
The simplest way is to use setuptools in your setup.py script and use the entry_points keyword; see the documentation on Automatic Script Creation.
In more detail: you create a setup.py that looks like this
from setuptools import setup

setup(
    # other arguments here...
    entry_points={
        'console_scripts': [
            'foo = my_package.some_module:main_func',
            'bar = other_module:some_func',
        ],
        'gui_scripts': [
            'baz = my_package_gui:start_func',
        ]
    }
)
then create other Python packages and modules underneath the directory where this setup.py exists, e.g. following the above example:
.
├── my_package
│   ├── __init__.py
│   └── some_module.py
├── my_package_gui
│   └── __init__.py
├── other_module.py
└── setup.py
and then run
$ python setup.py install
or
$ python setup.py develop
Either way, new Python scripts (executable scripts without the .py suffix) are created for you that point to the entry points you have described in setup.py. Usually, they are placed in the Python interpreter's notion of the "directory where executable binaries should be", which is usually on your PATH already. If you are using a virtualenv, then virtualenv tricks the Python interpreter into thinking this directory is bin/ under wherever you defined the virtualenv to be. Following the example above, in a virtualenv, running the previous commands should result in:
bin
├── bar
├── baz
└── foo

how to set different PYTHONPATH variables for python3 and python2 respectively

I want to add a specific library path only to python2. After adding export PYTHONPATH="/path/to/lib/" to my .bashrc, however, executing python3 gets the error: Your PYTHONPATH points to a site-packages dir for Python 2.x but you are running Python 3.x!
I think this is because python2 and python3 share the same PYTHONPATH variable.
So, can I set different PYTHONPATH variables for python2 and python3 respectively? If not, how can I add a library path exclusively to a particular version of Python?
PYTHONPATH is somewhat of a hack as far as package management is concerned. A "pretty" solution would be to package your library and install it.
This could sound more tricky than it is, so let me show you how it works.
Let us assume your "package" has a single file named wow.py and you keep it in /home/user/mylib/wow.py.
Create the file /home/user/mylib/setup.py with the following content:
from setuptools import setup

setup(name="WowPackage",
      packages=["."],
      )
That's it, now you can "properly install" your package into the Python distribution of your choice without the need to bother about PYTHONPATH. As far as "proper installation" is concerned, you have at least three options:
"Really proper". Will copy your code to your python site-packages directory:
$ python setup.py install
"Development". Will only add a link from the python site-packages to /home/user/mylib. This means that changes to code in your directory will have effect.
$ python setup.py develop
"User". If you do not want to write to the system directories, you can install the package (either "properly" or "in development mode") to /home/user/.local directory, where Python will also find them on its own. For that, just add --user to the command.
$ python setup.py install --user
$ python setup.py develop --user
To remove a package installed in development mode, do
$ python setup.py develop -u
or
$ python setup.py develop -u --user
To remove a package installed "properly", do
$ pip uninstall WowPackage
If your package is more interesting than a single file (e.g. you have subdirectories and such), just list those in the packages parameter of the setup function (you will need to list everything recursively, hence you'll want a helper function for larger libraries). Once you get the hang of it, make sure to read a more detailed manual as well.
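The helper usually used for that is setuptools' find_packages; a minimal sketch (reusing the hypothetical WowPackage name from above):
from setuptools import setup, find_packages

setup(name="WowPackage",
      # find_packages() recursively discovers every directory containing an __init__.py
      packages=find_packages(),
      )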
In the end, go and contribute your package to PyPI -- it is as simple as calling python setup.py sdist register upload (you'll need a PyPI username, though).
You can create a configuration file mymodule.pth under lib/site-packages (on Windows) or lib/pythonX.Y/site-packages (on Unix and Macintosh), then add one line containing the directory to add to the Python path.
From docs.python2 and docs.python3:
A path configuration file is a file whose name has the form name.pth and exists in one of the four directories mentioned above; its contents are additional items (one per line) to be added to sys.path. Non-existing items are never added to sys.path, and no check is made that the item refers to a directory rather than a file. No item is added to sys.path more than once. Blank lines and lines beginning with # are skipped. Lines starting with import (followed by space or tab) are executed.
I found that there is no way to modify PYTHONPATH that is only for python2 or only for python3. I had to use a .pth file.
What I had to do was:
make sure the directory exists in my home: $HOME/.local/lib/python${MAJOR_VERSION}.${MINOR_VERSION}/site-packages
create a .pth file in that directory
test that your .pth file works
done
For more info on .pth file syntax and how these files work, please see: python2 docs and python3 docs.
(.pth files in a nutshell: when your Python interpreter starts, it looks in certain directories, sees the .pth files, opens and parses them, and adds the listed directories to your sys.path (i.e. the same behavior as PYTHONPATH), making any Python modules located in those directories available for normal importing.)
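For illustration, placing a .pth file only under the Python 2 per-user site-packages directory affects only that interpreter; a hypothetical example matching the question's path:
$HOME/.local/lib/python2.7/site-packages/mylib.pth:
/path/to/lib/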
If you don't want to bother with moving/adding files in lib/site-packages, try adding two lines of code to the python2.7 script you would like to run (below).
import sys
sys.path = [p for p in sys.path if p.startswith(r'C:\Python27')]
This way, sys.path is filtered (ignoring all python3.x packages) every time you run your code.

How to switch between test code and production code in python?

I have a project that is constantly undergoing development. I have installed a release of the project in my python distribution's site-packages directory using the setup.py script for the project.
However, when I make changes to the project I would like my test scripts to find the files that are under the project's directory and not those that it finds in site-packages. What is the proper way to do this? I only know of one approach which is to modify the search path in the test script itself using sys.path, but then it means that I cannot use the same scripts to test the "installed" version of my codes without editing the sys.path again.
I'm not quite sure what you are asking, but you could use python setup.py develop to create a development version of your project:
https://pythonhosted.org/setuptools/setuptools.html#development-mode
Under normal circumstances, the distutils assume that you are going to
build a distribution of your project, not use it in its “raw” or
“unbuilt” form. If you were to use the distutils that way, you would
have to rebuild and reinstall your project every time you made a
change to it during development.
Another problem that sometimes comes up with the distutils is that you
may need to do development on two related projects at the same time.
You may need to put both projects’ packages in the same directory to
run them, but need to keep them separate for revision control
purposes. How can you do this?
Setuptools allows you to deploy your projects for use in a common
directory or staging area, but without copying any files. Thus, you
can edit each project’s code in its checkout directory, and only need
to run build commands when you change a project’s C extensions or
similarly compiled files. You can even deploy a project into another
project’s checkout directory, if that’s your preferred way of working
(as opposed to using a common independent staging area or the
site-packages directory).
Use "Editable" package installation like:
pip install -e path/to/SomeProject
Assuming we are in the same directory as setup.py, the command is:
pip install -e .

Use local copy of python package rather than the one installed in site-packages

I've installed a Python based package in my site-packages directory. However, I'm trying to learn how the code works so I'd like to hack it a bunch by putting in lots of print statements so I can understand what the code is doing. But at the end of the day I want a clean installation without all my hacks in it.
Of course I could just copy the original files to something else, make some hacks, and then at the end copy all the original files back over. But that's really tedious. At the very least, I'd like to install a local copy of the Python package and then have the python script use this copy preferentially (perhaps by suitable statements at the top of the script). But perhaps this isn't even the best way to do python development/hacking.
What's the best solution for my problem? I want to be able to hack on the package (and use that package) but without messing up my clean version.
Take a look at virtualenv. You can basically set up a local Python environment in which you can install anything you like without having to mess around with the system environment.
The virtualenv advice given is correct; depending on your actual package, you can even go beyond that and not mess with the site-packages inside the virtualenv at all.
If the package is setuptools-based, a simple
$ python setup.py develop
from within a working copy of its source means the package won't actually be installed (copied); instead it is just hooked into the virtualenv, pointing at the working copy. Advantage: you can edit (and e.g. roll back using Git or whatever SCM the package maintainer uses) files in a well-defined and non-volatile location.
This is what the Python virtualenv tool is for. It allows you to create a local Python environment with a set of packages distinct from your system installation. For example, I could do something like this:
$ virtualenv myenv
$ . myenv/bin/activate
$ pip install nifty-module
The activate script modifies your PATH so that any script that starts with:
#!/usr/bin/env python
will use the Python from your virtual environment, rather than the system Python, and will see the modules installed in that environment.

PYTHONPATH vs symbolic link

Yesterday, I edited the bin/activate script of my virtualenv so that it sets the PYTHONPATH environment variable to include a development version of some external package. I had to do this because the setup.py of the package uses distutils and does not support the develop command à la setuptools. Setting PYTHONPATH works fine as far as using the Python interpreter in the terminal is concerned.
However, just now I opened the project settings in PyCharm and discovered that PyCharm is unaware of the external package in question - PyCharm lists neither the external package nor its path. Naturally, that's because PyCharm does not (and cannot reliably) parse or source the bin/activate script. I could manually add the path in the PyCharm project settings, but that means I have to repeat myself (once in bin/activate, and again in the PyCharm project settings). That's not DRY and that's bad.
Creating, in site-packages, a symlink that points to the external package is almost perfect. This way, at least the source editor of PyCharm can find the package and so does the Python interpreter in the terminal. However, somehow PyCharm still does not list the package in the project settings and I'm not sure if it's ok to leave it like that.
So how can I add the external package to my virtualenv/project in such a way that…
I don't have to repeat myself; and…
both the Python interpreter and PyCharm would be aware of it?
Even when a package is not using setuptools, pip monkeypatches setup.py to force it to use setuptools.
Maybe you can remove that PYTHONPATH hack and pip install -e /path/to/package.
One option is to add the path dynamically:
import sys

try:
    import foo
except ImportError:
    # fall back to the development copy if foo is not installed
    sys.path.insert(0, "/path/to/your/package/directory")
    import foo
But this is not the best solution, because such code will most likely not make it into the final version of the application. A better option (more appropriate, imho) is to make a simple setup.py file for the package and deploy it into the virtualenv with the develop command, or via pip with the -e parameter:
python setup.py develop
or:
pip install -e /path/to/your/package/directory
http://packages.python.org/distribute/setuptools.html#development-mode
This is an improvement on ndpu's answer that will work regardless of where the real file is.
You can dereference the symlink and then fix up sys.path before doing any local imports.
import os.path
import sys

# Ensure this file is dereferenced if it is a symlink
if __name__ == '__main__' and os.path.islink(__file__):
    try:
        sys.path.remove(os.path.dirname(__file__))
    except ValueError:
        pass
    sys.path.insert(0, os.path.dirname(os.path.realpath(__file__)))

# local imports go here
