Python scripts that depend on binaries... how to distribute? - python

I have a codebase that includes some C++ code and Python scripts that make use of the resulting binaries (via the subprocess module).
root/
    experiments/
        script_1.py    (needs to call binary_1)
    clis/
        binary_1.cc
        binary_1
What's the best way to refer to the binary from the Python scripts?
A relative path from the Python script's directory to the binary, which assumes the user will be running the Python script from a particular directory
Just the binary name, which assumes the user will have added the binary's directory to the $PATH variable, or copied the binary to /usr/local/bin, or something
Something else?

If your binaries are pre-compiled you can use the data_files parameter to setuptools. Have it installed in /usr/local/bin.
data_files=[("/usr/local/bin", glob("bin/*"))], ...

You could use __file__ to find out the location of the Python script, so it wouldn't matter where the user ran the script from.
path = os.path.normpath(os.path.join(
    os.path.dirname(__file__), '..', 'clis', 'binary_1'
))
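The two ideas above (a path relative to the script, or a lookup on $PATH) can be combined. Here is a minimal sketch that assumes the directory layout from the question; the helper names find_binary and run_binary and the fallback order are my own choices for illustration, not something from the answers.

```python
import os
import shutil
import subprocess


def find_binary(name, script_file, relative_dir=os.path.join("..", "clis")):
    """Return a path to `name`, preferring the copy shipped next to the scripts."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    candidate = os.path.normpath(os.path.join(script_dir, relative_dir, name))
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    # Fall back to a PATH lookup; shutil.which returns None if nothing is found.
    found = shutil.which(name)
    if found is None:
        raise FileNotFoundError("cannot locate %r" % name)
    return found


def run_binary(name, script_file, *args):
    """Run the located binary with the given arguments and return its exit code."""
    return subprocess.call([find_binary(name, script_file), *args])
```

From script_1.py this would be called as run_binary("binary_1", __file__), so it does not matter which directory the user runs the script from.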

In my experience, the best way to integrate your C(pp) code in your Python program is to make a compiled Python module out of the C(pp) code instead of using the subprocess module as you are now doing.
In addition to a more consistent and readable Python codebase, you get the added benefit of modularity (solving among others the $PATH issues) and can use distutils as build tool. Distribution is also easier, then, as setup.py automates it.

Related

How to "install" Python code from CMake?

I have a mainly c++ project that I use CMake to manage. After setting cmake_install_prefix and configuring, it generates makefiles which can then be used to build and install with the very standard:
make
make install
At this point, my binaries end up in cmake_install_prefix, and they can be executed with no additional work. Recently I've added some Python scripts to a few places in the source tree, and some of them depend on others. I can use CMake to copy the Python files+directory structure to the cmake_install_prefix, but if I go into that path and try to use one of the scripts, Python cannot find the other scripts used as imports because PYTHONPATH does not contain cmake_install_prefix. I know you can set an environment variable with CMake, but it doesn't persist across shells, so it's not really "setup" for the user for more than the current terminal session.
The solution seems to be to add a step to your software build instructions that says "set your PYTHONPATH". Is there any way to avoid this? Is this the standard practice for "installing" Python scripts as part of a bigger project? It seems to really complicate things like setting up continuous integration for the project, as something like Jenkins has to be manually configured to inject environment variables, whereas nothing special was required for it to build and execute executables built from c++ code.
Python provides the sys.path list, which it searches for modules when executing import statements. You can adjust this list before importing your modules:
script1.py:
# Do some things useful for other scripts
script2.py.in:
# Uses script1.py.
...
import sys
sys.path.insert(1, "@SCRIPT1_INSTALL_PATH@")
import script1
...
CMakeLists.txt:
...
# Installation path for script1. Depends on CMAKE_INSTALL_PREFIX.
set(SCRIPT1_INSTALL_PATH ${CMAKE_INSTALL_PREFIX}/<...>)
install(FILES script1.py DESTINATION ${SCRIPT1_INSTALL_PATH})
# Configure 'sys.path' in script2.py, so it can find script1.py.
configure_file("script2.py.in" "script2.py" @ONLY)
set(SCRIPT2_INSTALL_PATH ${CMAKE_INSTALL_PREFIX}/<...>)
# configure_file() writes its output to the build directory.
install(FILES ${CMAKE_CURRENT_BINARY_DIR}/script2.py DESTINATION ${SCRIPT2_INSTALL_PATH})
...
If you want script2.py to work in both the build tree and the install tree, you need two instances of it: one that works in the build tree and one that works after installation. Both instances can be configured from a single .in file.
For compiled executables and libraries, a similar mechanism helps binaries find libraries in non-standard locations. It is known as RPATH.
Because CMake
knows every binary created (it tracks add_executable and add_library calls),
knows the linkage between binaries (target_link_libraries calls are also tracked),
has full control over the linking procedure,
CMake is able to adjust RPATH automatically when installing binaries.
For Python scripts CMake has no such information, so the search path has to be adjusted manually.
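If the scripts keep the same relative layout in the build tree and the install tree, a script can also work out its neighbour's directory at run time instead of having the path configured in. This is only a sketch under that assumption; add_relative_dir is a hypothetical helper, not part of CMake or of the answer above.

```python
import os
import sys


def add_relative_dir(script_file, relative_dir):
    """Put a directory, given relative to the script's own location, on sys.path."""
    base = os.path.dirname(os.path.abspath(script_file))
    target = os.path.normpath(os.path.join(base, relative_dir))
    if target not in sys.path:
        # Index 1 mirrors the answer above: index 0 stays the script's own directory.
        sys.path.insert(1, target)
    return target
```

In script2.py this would be called as add_relative_dir(__file__, "../lib") before import script1, where "../lib" is a made-up relative location for script1.py.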

How to make Python API using py2exe?

Is it possible to "compile" a Python script with py2exe (or similar) and then allow the user access to modify the top-level Python scripts? Or possibly import the compiled modules into their normal Python scripts? I'm looking for the ability to distribute an easy installer for some customers, but allow other customers to build upon that installed version by creating their own scripts that work with the installed framework modules, like an API.
I have tried to use py2exe to import files that I have placed in the "dist" directory, but it complains that they aren't frozen. Why can't it use a mix of frozen binary modules and interpreted modules?
The reason that I am using py2exe is because I have some troublesome libraries (paramiko/pycrypto, plus some internally developed ones) that I don't want to require my customers to trudge through those installations. I also don't want them to have open access to my framework files. I know that they can reverse-compile the py2exe objects, but they will have to work to modify the framework, which is good enough protection.
I figured out how to get it to work. I placed my "head" framework file in the "includes" list in the setup.py file. Then I have a compiled runner that uses the imp module to dynamically load regular Python scripts, and those scripts call upon that head framework file. This is exactly the kind of hidden framework with a reachable API that I was looking for.
For example, let's say we have a directory called "framework" with a master file "foo" that contains all of the API calls. The line in the py2exe setup.py file would look like this:
includes = ['framework.foo', 'some_other_module', 'etc']
I then make a target for this runner script:
FrameworkTarget = Target(
    # what to build
    script = "run_framework.py",
    dest_base = "run_framework"
)
Then add the target to the setup() command in the setup.py script among the other things:
console = [FrameworkTarget],
The compiled runner script is passed the name of the "test suite" script from the command line:
import imp
import os
import sys
test_suite_name = sys.argv[1]
file_name = test_suite_name + ".py"
path_name = os.path.join(os.getcwd(), file_name)
print("Loading source %s at %s" % (file_name, path_name))
module = imp.load_source(file_name, path_name)
Then, in the file called by the imp.load_source() command, I have this:
import framework.foo
When I didn't have 'framework.foo' in my includes, it couldn't find the compiled version of framework.foo. Maybe someone will find this useful in the future. I don't know if I could do one useful thing without Stackoverflow!
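The imp module used by the runner above is deprecated and was removed in Python 3.12. On Python 3 the same dynamic loading can be sketched with importlib; the function name load_source is reused here only for familiarity, and this is not tested against py2exe itself.

```python
import importlib.util
import sys


def load_source(module_name, path_name):
    """Load a plain .py file as a module, similar in spirit to imp.load_source."""
    spec = importlib.util.spec_from_file_location(module_name, path_name)
    if spec is None or spec.loader is None:
        raise ImportError("cannot load %s" % path_name)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register before executing, as import does
    spec.loader.exec_module(module)
    return module
```

The loaded script can still do import framework.foo, since that import is resolved by the normal import machinery when the script's code runs.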
Is it possible to "compile" a Python script with py2exe (or similar)
and then allow the user access to modify the top-level Python scripts?
I'm not particularly familiar with py2exe, but looking at the tutorial page, it would seem relatively straightforward to replace the hello.py script with something along the lines of...
import sys
import os
# Import your framework here, and anything else you want py2exe to embed
import my_framework
TOP_LEVEL_SCRIPT_DIR = '/path/to/scripts'
MAIN_SCRIPT = os.path.join(TOP_LEVEL_SCRIPT_DIR, 'main.py')
sys.path.append(TOP_LEVEL_SCRIPT_DIR)
execfile(MAIN_SCRIPT)
...and put any scripts you want the user to be able to modify in /path/to/scripts, although it'd probably make more sense to define TOP_LEVEL_SCRIPT_DIR as a path relative to the binary.
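Note that execfile only exists in Python 2. A rough Python 3 sketch of the same launcher idea uses runpy. The helper names are hypothetical, and the base_dir function assumes the freezing tool sets sys.frozen (py2exe and PyInstaller do) so that the script directory can be found relative to the binary.

```python
import os
import runpy
import sys


def base_dir():
    """Directory of the frozen executable, or of this file when run unfrozen."""
    if getattr(sys, "frozen", False):
        return os.path.dirname(os.path.abspath(sys.executable))
    return os.path.dirname(os.path.abspath(__file__))


def run_user_script(script_dir, script_name="main.py"):
    """Execute a user-editable script, making its directory importable first."""
    script_dir = os.path.abspath(script_dir)
    if script_dir not in sys.path:
        sys.path.append(script_dir)
    # run_path executes the file and returns its resulting globals.
    return runpy.run_path(os.path.join(script_dir, script_name), run_name="__main__")
```

The frozen entry point would then import the framework and call something like run_user_script(os.path.join(base_dir(), "scripts")), where "scripts" is an example directory name.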
The reason that I am using py2exe is because I have some troublesome
libraries (paramiko/pycrypto, plus some internally developed ones)
that I don't want to require my customers to trudge through those
installations. I also don't want them to have open access to my
framework files.
If the goal is ease of installation, it might also suffice to create a regular InstallShield-esque installer to put all the files in the right places, and just include the .pyc versions of your "framework files" if you don't want them reading the source code.
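For the .pyc route, the standard library can produce the compiled files. A small sketch, with compile_to_pyc as a made-up helper name; keep in mind that a .pyc file is tied to the Python version that produced it and can be decompiled, so this only hides the source from casual reading.

```python
import os
import py_compile


def compile_to_pyc(source_path, dest_dir):
    """Byte-compile one source file into dest_dir and return the .pyc path."""
    os.makedirs(dest_dir, exist_ok=True)
    name = os.path.splitext(os.path.basename(source_path))[0] + ".pyc"
    target = os.path.join(dest_dir, name)
    # doraise=True turns compile errors into exceptions instead of printed messages.
    return py_compile.compile(source_path, cfile=target, doraise=True)
```

A foo.pyc placed directly in a directory on sys.path (not inside __pycache__) can be imported as foo without the .py file being present.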

Local collection of Python packages: best way to import them?

I need to ship a collection of Python programs that use multiple packages stored in a local Library directory: the goal is to avoid having users install packages before using my programs (the packages are shipped in the Library directory). What is the best way of importing the packages contained in Library?
I tried three methods, but none of them appears perfect: is there a simpler and robust method? or is one of these methods the best one can do?
In the first method, the Library folder is simply added to the library path:
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
import package_from_Library
The Library folder is put at the beginning so that the packages shipped with my programs have priority over the same modules installed by the user (this way I am sure that they have the correct version to work with my programs). This method also works when the Library folder is not in the current directory, which is good. However, this approach has drawbacks. Each and every one of my programs adds a copy of the same path to sys.path, which is a waste. In addition, all programs must contain the same three path-modifying lines, which goes against the Don't Repeat Yourself principle.
An improvement over the above problems consists in trying to add the Library path only once, by doing it in an imported module:
# In module add_Library_path:
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
and then to use, in each of my programs:
import add_Library_path
import package_from_Library
This way, thanks to the caching mechanism of CPython, the module add_Library_path is only run once, and the Library path is added only once to sys.path. However, a drawback of this approach is that import add_Library_path has an invisible side effect, and that the order of the imports matters: this makes the code less legible and more fragile. Also, this forces my distribution of programs to include an add_Library_path.py program that users will not use.
Python modules from Library can also be imported by making it a package (empty __init__.py file stored inside), which allows one to do:
from Library import module_from_Library
However, this breaks for packages in Library, as they might do something like from xlutils.filter import …, which breaks because xlutils is not found in sys.path. So, this method works, but only when including modules in Library, not packages.
All these methods have some drawback.
Is there a better way of shipping programs with a collection of packages (that they use) stored in a local Library directory? or is one of the methods above (method 1?) the best one can do?
PS: In my case, all the packages from Library are pure Python packages, but a more general solution that works for any operating system is best.
PPS: The goal is that the user be able to use my programs without having to install anything (beyond copying the directory I ship them regularly), like in the examples above.
PPPS: More precisely, the goal is to have the flexibility of easily updating both my collection of programs and their associated third-party packages from Library by having my users do a simple copy of a directory containing my programs and the Library folder of "hidden" third-party packages. (I do frequent updates, so I prefer not forcing the users to update their Python distribution too.)
Messing around with sys.path leads to pain... The modern package template and Distribute contain a vast array of information and were in part set up to solve your problem.
What I would do is set up setup.py to install all your packages to a specific site-packages location or, if you can, to the system's site-packages. In the former case, the local site-packages would then be added to the PYTHONPATH of the system/user. In the latter case, nothing needs to change.
You could use a batch file to set the Python path as well. Or change the python executable to point to a shell script that sets a modified PYTHONPATH and then executes the Python interpreter. The latter, of course, means that you have to have access to the user's machine, which you do not. However, if your users only run scripts and do not import your own libraries, you could use your own wrapper for scripts:
#!/path/to/my/python
And the /path/to/my/python script would be something like:
#!/bin/sh
PYTHONPATH=/whatever/lib/path:$PYTHONPATH /usr/bin/python "$@"
I think you should have a look at path import hooks, which let you modify how Python searches for modules.
For example, you could try something like what KDE's script engine does for Python plugins [1].
It adds a special token to sys.path (such as "<plasmaXXXXXX>", with XXXXXX being a random number to avoid name collisions); when Python tries to import a module and can't find it in the other paths, it calls your importer, which can deal with it.
A simpler alternative is to have a main script act as a launcher that adds the path to sys.path and executes the target file (so that you avoid putting the sys.path.append(...) line in every file).
Yet another alternative, which works on Python 2.6+, is to install the library under the per-user site-packages directory.
[1] You can find the source code under /usr/share/kde4/apps/plasma_scriptengine_python in a linux installation with kde.
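For the per-user site-packages alternative, the site module reports where that directory is. A minimal sketch (user_site_status is a hypothetical helper name):

```python
import site


def user_site_status():
    """Return the per-user site-packages path and whether Python will use it."""
    path = site.getusersitepackages()
    # ENABLE_USER_SITE is True, False, or None depending on how Python was started.
    return path, bool(site.ENABLE_USER_SITE)
```

Packages installed with pip install --user end up in that directory, which Python adds to sys.path at startup when it exists and the user site is enabled.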

python: how/where to put a simple library installed in a well-known-place on my computer

I need to put a python script somewhere on my computer so that in another file I can use it. How do I do this and where do I put it? And where in the python documentation do I learn how to do this? I'm a beginner + don't use python much.
library file: MyLib.py put in a well-known place
def myfunc():
....
other file SourceFile.py located elsewhere, doesn't need to know where MyLib.py is:
something = MyLib.myfunc()
Option 1:
Put your file at:
<Wherever your Python is>/Lib/site-packages/myfile.py
Add this to your code:
import myfile
Pros: Easy
Cons: Clutters site-packages
Option 2:
Put your file at:
<Wherever your Python is>/Lib/site-packages/mypackage/myfile.py
Create an empty text file called:
<Wherever your Python is>/Lib/site-packages/mypackage/__init__.py
Add this to your code:
from mypackage import myfile
Pros: Reduces clutter in site-packages by keeping your stuff consolidated in a single directory
Cons: Slightly more work; still some clutter in site-packages. This isn't bad for stable stuff, but may be regarded as inappropriate for development work, and may be impossible if Python is installed on a shared drive
Option 3
Put your file in any directory you like
Add that directory to the PYTHONPATH environment variable
Proceed as with Option 1 or Option 2, except substitute the directory you just created for <Wherever your Python is>/Lib/site-packages/
Pros: Keeps development code out of the site-packages directory
Cons: slightly more setup
This is the approach I usually use for development work
In general, the Modules section of the Python tutorial is a good introduction for beginners on this topic. It explains how to write your own modules and where to put them, but I'll summarize the answer to your question below:
Your Python installation has a site-packages directory; any python file you put in that directory will be available to any script you write. For example, if you put the file MyLib.py in the site-packages directory, then in your script you can say
import MyLib
something = MyLib.myfunc()
If you're not sure where Python is installed, the Stack Overflow question How do I find the location of my Python site-packages directory will be helpful to you.
Alternatively, you can modify sys.path, which is a list of directories where Python looks for libraries when you use the import statement. Your site-packages directory is already in this list, but you can add (or remove) entries yourself. For example, if you wanted to put your MyLib.py file in /usr/local/pythonModules, you could say
import sys
sys.path.append("/usr/local/pythonModules")
import MyLib
something = MyLib.myfunc()
Finally, you could use the PYTHONPATH environment variable to indicate the directory where your MyLib.py is located.
However, I recommend simply placing your MyLib.py file in the site-packages directory, as described above.
No one has mentioned using .pth files in site-packages to abstract away the location.
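A .pth file is a text file in a site directory whose lines name extra directories to put on sys.path. Here is a rough sketch of the mechanism; register_directory and the file name mylibs.pth are made up for illustration, and site.addsitedir stands in for what Python does with site-packages at startup.

```python
import os
import site


def register_directory(site_dir, target_dir, pth_name="mylibs.pth"):
    """Write a .pth file in site_dir that points at target_dir, then process it."""
    os.makedirs(site_dir, exist_ok=True)
    with open(os.path.join(site_dir, pth_name), "w") as handle:
        handle.write(os.path.abspath(target_dir) + "\n")
    # addsitedir adds site_dir to sys.path and reads the .pth files inside it.
    site.addsitedir(site_dir)
```

In normal use you would just drop the .pth file into your real site-packages directory once; after that, import MyLib works in any script without touching sys.path or PYTHONPATH.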
You will have to place your MyLib.py somewhere in your load path (that is, one of the paths in your sys.path variable), and then you'll be able to import it fine. Your code would look like
import MyLib
MyLib.myfunc()
Generally speaking, you should distribute your packages using distutils so that they can be easily installed in the proper locations. It would help you as well.
Also, you might not want to install packages in your global Python install. It's customary (and recommended) to use virtualenv which you can use to create small isolated Python environments that can hold local packages.
It's best you give the whole thing a shot and then ask further questions if you have them.
The private version, from my .profile
export PYTHONPATH=${PYTHONPATH}:$HOME/lib/python
which has a subdirectory "msw", so import msw.primes is self-documenting. Alternatively, add the file to a local directory that is already in sys.path.
The Python tutorial section 6 talks about modules, and 6.1.2 talks about the PYTHONPATH, which determines where Python will look for modules you try to import. The tutorial: http://docs.python.org/tutorial/modules.html

How can I make a Python extension module packaged as an egg loadable without installing it?

I'm in the middle of reworking our build scripts to be based upon the wonderful Waf tool (I did use SCons for ages but it's just way too slow).
Anyway, I've hit the following situation and I cannot find a resolution to it:
I have a product that depends on a number of previously built egg files.
I'm trying to package the product using PyInstaller as part of the build process.
I build the dependencies first.
Next I want to run PyInstaller to package the product that depends on the eggs I built. I need PyInstaller to be able to load those egg files as part of its packaging process.
This sounds easy: you work out what PYTHONPATH should be, construct a copy of os.environ with the variable set correctly, and then invoke the PyInstaller script using subprocess.Popen, passing the previously configured environment as the env argument.
The problem is that setting PYTHONPATH alone does not seem to be enough if the eggs you are adding are extension modules that are packaged as zip-safe. In this case, it turns out that the embedded libraries cannot be imported.
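That step might look roughly like this. It is a generic sketch, not the actual Waf or PyInstaller invocation; env_with_pythonpath and run_python are hypothetical helper names.

```python
import os
import subprocess
import sys


def env_with_pythonpath(*directories):
    """Return a copy of os.environ with the directories prepended to PYTHONPATH."""
    env = os.environ.copy()
    parts = [os.path.abspath(d) for d in directories]
    existing = env.get("PYTHONPATH")
    if existing:
        parts.append(existing)
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


def run_python(args, *directories):
    """Run the current interpreter with the adjusted environment; return its exit code."""
    process = subprocess.Popen([sys.executable, *args],
                               env=env_with_pythonpath(*directories))
    return process.wait()
```

The parent process's own environment is left untouched, because only the copy passed as env is modified.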
If I unzip the eggs (renaming the directories to .egg), I can import them with no further settings but this is not what I want in this case.
I can also get the eggs to import from a subshell by doing the following:
Setting PYTHONPATH to the directory that contains the egg you want to import (not the path of the egg itself)
Loading a python shell and using pkg_resources.require to locate the egg.
Once this has been done, the egg loads as normal. Again, this is not practical because I need to be able to run my python shell in a manner where it is ready to import these eggs from the off.
The dirty option would be to output a wrapper script that took the above actions before calling the real target script but this seems like the wrong thing to do: there must be a better way to do this.
Heh, I think this was my bad. The issue appears to have been that the zip_safe flag in setup.py for the extension package was set to False, which appears to affect your ability to treat it as such at all.
Now that I've set that to True I can import the egg files, simply by adding each one to the PYTHONPATH.
I hope someone else finds this answer useful one day!
Although you have a solution, you could always try "virtualenv" that creates a virtual environment of python where you can install and test Python Packages without messing with the core system python:
http://pypi.python.org/pypi/virtualenv
