setup.py put top level module in subdirectory - python

I have a project that looks like this:
tool.py
utils/
    tool2.py
    tool3.py
I would like to bundle these into a distribution such that I can call tool or tool2 or tool3 from the command line. But so far I can't figure out how to make setup.py include the scripts under utils/ without creating a new utils python module. Basically I want to do this in setup.cfg without changing my directory layout:
[options.entry_points]
console_scripts =
    tool = tool:main
    tool2 = tool2:main
    tool3 = tool3:main
If I flatten my project directory so that all 3 are in the root directory, it works, but there are potentially many utility tools that I don't want spamming up the root directory. But if I put them under utils/, setup.py seems to insist on treating them as submodules of a utils package.
One potential workaround is to just do:
[options]
scripts =
    tool.py
    utils/tool2.py
    utils/tool3.py
but then you always have to type the .py suffix to invoke them from the command line, whereas I would prefer to leave it off so they feel more "command-liny".
Is what I'm trying to do possible?

I worked around the issue by doing something like:
[options]
scripts =
    dist_scripts/tool
    dist_scripts/tool2
    dist_scripts/tool3
Where dist_scripts/tool was a relative softlink to tool.py, etc. Seems to work fine.
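If it helps, here is a small sketch of creating those relative softlinks with Python instead of ln -s. The dist_scripts name and the targets are just taken from the layout above; adjust them to your tree:
# sketch: recreate the dist_scripts/ softlinks from the workaround above
from pathlib import Path

links = {
    "tool": "../tool.py",
    "tool2": "../utils/tool2.py",
    "tool3": "../utils/tool3.py",
}
dist = Path("dist_scripts")
dist.mkdir(exist_ok=True)
for name, target in links.items():
    link = dist / name
    if not link.is_symlink():
        link.symlink_to(target)  # target path is resolved relative to dist_scripts/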


How to properly structure internal scripts in a Python project?

Consider the following Python project skeleton:
proj/
├── foo
│   └── __init__.py
├── README.md
└── scripts
    └── run.py
In this case foo holds the main project files, for example
# foo/__init__.py
class Foo():
    def run(self):
        print('Running...')
And scripts holds auxiliary scripts that need to import files from foo, which are then invoked via:
[~/proj]$ python scripts/run.py
There are two ways of importing Foo which both fail:
If a relative import is attempted from ..foo import Foo then the error is ValueError: attempted relative import beyond top-level package
If an absolute import is attempted from foo import Foo then the error is ModuleNotFoundError: No module named 'foo'
My current workaround is to append the running path to sys.path:
import sys
sys.path.append('.')
from foo import Foo
Foo().run()
But this feels like a hack, and has to be added to every new script in scripts/.
Is there a better way to structure scripts in such projects?
There are two ways you could resolve this.
(1) Turn your project into an installable package
Add a proj/setup.py file with the following contents:
import setuptools

setuptools.setup(
    name="my-project",
    version="1.0.0",
    author="You",
    author_email="you@example.com",
    description="This is my project",
    packages=["foo"],
)
create a virtualenv:
python3 -m venv virtualenv # this creates a directory "virtualenv" in your project
source ./virtualenv/bin/activate # this switches you into the new environment
python setup.py develop # this places your "foo" package in the environment
Inside the virtualenv, foo behaves as an installed package and is importable via import foo, so you can use absolute imports in your scripts.
To make them run from anywhere, without needing to activate the virtualenv, you can then specify the path as a shebang.
In scripts/run.py (the first line is important):
#!/path/to/proj/virtualenv/bin/python
import foo
print(foo.callfunc())
(2) Make the scripts part of the foo package
Instead of a separate subdirectory scripts, make a subpackage. In proj/foo/commands/run.py:
from .. import callfunc

def main():
    print(callfunc())

if __name__ == "__main__":
    main()
Then execute the script from the top-level proj/ directory with:
python -m foo.commands.run
If you combine this with (1) and install your package, you can then run python -m foo.commands.run from anywhere.
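As a rough sketch of combining the two, an entry point can expose the same function as a plain command. The package names below are carried over from the setup.py in (1); the command name run is just an example:
# in setup.py, building on the example from (1) -- a sketch, not the answer's exact code
setuptools.setup(
    name="my-project",
    version="1.0.0",
    packages=["foo", "foo.commands"],  # foo/commands/ needs an __init__.py
    entry_points={
        "console_scripts": [
            "run = foo.commands.run:main",  # installs a `run` command that calls main()
        ],
    },
)
After installing, running run on the command line does the same thing as python -m foo.commands.run.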
Solution
There are multiple ways to achieve this. Both options below require creating a Python package by adding a setup.py (building on @matejcik's answer).
Option 1 (recommended): entry_points + console_scripts: register a function in your project as the entry point for script execution (e.g. proj.foo.cli:run).
Option 2: scripts: use this keyword argument in the setup() call to reference the path to your script (e.g. bin/script.py).
Note
I recommend using a CLI library/framework like Click so that your codebase is only concerned with maintaining application-specific business logic rather than robust CLI framework logic. Also, Click recommends the entry_points + console_scripts method of script integration due to cross-platform compatibility (a minimal Click sketch follows the links below).
Setup Tools - Automatic script creation: https://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-script-creation
Setup Tools - keyword arguments: https://setuptools.readthedocs.io/en/latest/setuptools.html#new-and-changed-setup-keywords
Click GitHub: https://github.com/pallets/click/
Click Setuptools integration: https://click.palletsprojects.com/en/master/setuptools/
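For illustration, here is a minimal Click command that could back such an entry point. The proj/cli.py path and the run name are made up, not taken from the original project:
# file: proj/cli.py (illustrative)
import click

@click.command()
@click.option("--name", default="world", help="Who to greet.")
def run(name):
    """Example command; replace the body with your tool's logic."""
    click.echo("Hello, %s!" % name)

# setup.py would then register it with something like:
#   entry_points={"console_scripts": ["tool = proj.cli:run"]}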
Best practice? Put a single entry-point in the root
I know this might sound absurd if you have lots of scripts you want to be able to execute... But it's actually the cleanest option, and it's the one most often used in big Python projects, like manage.py in Django, for example. It also doesn't need to be a huge undertaking. Even more importantly, it is always more secure to have a single entry point than several smaller ones.
proj/
├── run.py
├── foo
│   └── __init__.py
├── README.md
└── scripts
    └── my_script.py
When run.py lives in the root directory, it can be very lightweight... Basically it's just a wrapper that calls the function you need from my_script.py. It ties everything together, so now all of your imports just work.
Just keep in mind that your entry point is your root. The parent of a root doesn't exist. So put your entry point in the root, and then import packages relative to the root, i.e. import foo or from scripts import my_script.
But how do I call multiple scripts!?
If you need to be able to call multiple scripts, this is a good argument for... Well... arguments! Keep run.py as your single entrypoint/command, and leverage subcommands to pass functionality to the script you care about.
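For example, a bare-bones run.py along these lines. The subcommand name and the my_script.main() call are assumptions about what lives in scripts/:
# file: proj/run.py (sketch)
import argparse
from scripts import my_script  # resolves because run.py sits in the project root

def main():
    parser = argparse.ArgumentParser(prog="run")
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("my-script", help="run the code from scripts/my_script.py")
    args = parser.parse_args()
    if args.command == "my-script":
        my_script.main()  # assumes my_script exposes a main() function

if __name__ == "__main__":
    main()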
Reinventing the wheel?
Generally, frameworks have already done the architecture for you to add your own subcommands, such as Django and, for a smaller footprint, Flask.
You can easily wrap up a small project without that help, though, as I've illustrated.
Security
No one ever wishes their code was less refactorable after a few years of working with it. No one ever wishes their codebase had less security. As we drive toward more secure systems in general, it makes sense to have a gatekeeper script that determines what is and isn't a safe operation, and by whom. Moving the code to an LDAP-based system and need to lock things down by group? No problem. You can change the single file, or add LDAP security in your codebase, even creating your own internal API.
With distributed scripts, security options are much less flexible and much harder to maintain, and a single vulnerability could leave you wide open to exploitation.
Bonus advantage
You're adding abstraction to your script base. If you ever want to change the structure of your codebase (maybe you want scripts to have subfolders with more organization), you/your users don't need to do any refactoring for any dependencies, or change paths to longer, more verbose names. Your package is self-contained, and the only thing a user will ever need to touch is your proj/run.py entry-point.
And, obviously, you don't need to play with Python paths as much!
You need to add __init__.py files to the scripts and proj folders for them to be considered Python packages, and for you to be able to import from them.
One way this is also commonly done is to place your foo and scripts folders into a proj/src folder, which then has an __init__.py file and is thus a Python package.
If you like simplicity, and there are no additional restrictions on what you asked: add one __init__.py to the scripts folder (and to any other sibling folders), making them packages, and always use the absolute import form. Since you said you do not want proj as a parent package of those, there is no __init__.py there; instead, call your scripts from inside the proj folder with:
python -m scripts.run
or whatever name you give to scripts other than run.py.
This is similar to option 2 of @matejcik's answer, but even simpler.
Another solution is to add a .pth file in your Python directory with the following content:
# your.pth
# ↓ put the directory of proj here
C:\...\proj
done
# scripts.py
from foo import Foo
Foo().run()
It will work well.
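If you are not sure which directory counts as "your Python directory", the site module can show where .pth files are picked up (a quick check, assuming a standard CPython install):
# print the directories scanned for .pth files
import site

print(site.getsitepackages())      # system-wide site-packages directories
print(site.getusersitepackages())  # the per-user site-packages directory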
Note: if your IDE is PyCharm, you can also use Source Roots to help with this.
Python looks for packages/modules in the directories listed in sys.path. There are several ways of ensuring that your directory of interest, in this case proj, is one of those directories:
1. Move your scripts to the proj directory. Python adds the directory containing the input script to sys.path.
2. Put the directory proj into the contents of the PYTHONPATH environment variable.
3. Make the module part of an installable package and install it, either in a virtual environment or not.
4. At run time, dynamically add the directory proj to sys.path.
Option 1 is the most logical and requires no source changes. If you are afraid that might break something, you can perhaps make scripts a symbolic link pointing back to proj?
If you are unwilling to do that, then ...
You may consider it a hack, but I would recommend that you do modify your scripts to update sys.path at runtime, but instead append an absolute path so that the scripts can be executed regardless of the current directory. In your case, directory proj is the parent of directory scripts, where the scripts reside, so:
import sys
import os.path

parent_directory = os.path.split(os.path.dirname(__file__))[0]
if parent_directory not in sys.path:
    # sys.path.insert(0, parent_directory)  # the first entry is the directory of the running script, so maybe insert after that at index 1
    sys.path.append(parent_directory)

Install python repository without parent directory structure

I have a repository I inherited that is used by a lot of teams; lots of scripts call it, and it seems like it's going to be a real headache to make any structural changes to it. I would like to make this repo installable somehow. It is structured like this:
my_repo/
    scripts.py
If it were my repository, I would change the structure like so, make it installable, and run python setup.py install:
my_repo/
    setup.py
    my_repo/
        __init__.py
        scripts.py
If this is not feasible (and it sounds like it might not be), can I somehow do something like:
my_repo/
    setup.py
    __init__.py
    scripts.py
And add something to setup.py to let it know that the repo is structured funny like this, so that I can install it?
You can do what you suggest.
my_repo/
    setup.py
    __init__.py
    scripts.py
The only thing is that you will need to import modules in your package by their bare name if they are at the base level. So, for example, if your structure looked like this:
my_repo/
    setup.py
    __init__.py
    scripts.py
    lib.py
    pkg/
        __init__.py
        pkgmodule.py
Then your imports in scripts.py might look like
from lib import func1, func2
from pkg.pkgmodule import stuff1, stuff2
So in your base directory, imports are essentially by module name, not by package. This could clash with other packages' namespaces if you're not careful, for example if another dependency provides a package named lib. So it would be best to run these scripts in a virtualenv and to test that namespacing doesn't get messed up.
There are keyword arguments in setup.py to set the name of the package to install and where it should get its modules from for installation. That would let you use the desired directory structure. For instance, given a directory structure like:
my_repo/
    setup.py
    __init__.py
    scripts.py
You could write a setup.py such as:
setup(
    # -- Package structure ----
    packages=['my_repo'],
    package_dir={'my_repo': '.'},
)
Thus anyone installing the contents of my_repo with the command "./setup.py install" or "pip install ." would end up with an installed copy of my_repo's modules.
As a side note: relative imports work differently in Python 2 and Python 3. In the latter, relative imports must be written explicitly. This method of installing my_repo will work in Python 3 when using absolute imports:
from my_repo import scripts

Running a script from a package

I'm new to Python, coming from Java. I created a folder called 'Project'. In 'Project' I created several packages (with __init__.py files) such as 'test1' and 'test2'. 'test1' contains a Python script (.py file) that uses modules from 'test2' (it imports a module from test2). I want to run a script x.py in 'test1' from the command line. How can I do that?
Edit: if you have better recommendations on how I can better organize my files I would be thankful. (notice my java mentality)
Edit: I need to run the script from a bash script, so I need to provide full paths.
There are probably several ways to achieve what you want.
One thing that I do when I need to make sure the module paths are correct in an executable script is to get the parent directory and insert it into the module search path (sys.path):
import sys, os
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
import test1  # next imports go here...
from test2 import something
# any import that works from the parent dir will work here
This way you are safe to run your scripts without worrying how the script is called.
Python code is organized into modules and packages. A module is just a .py file that can contain class definitions, function definitions and variables. A package is a directory with an __init__.py file.
A standard Python project might look something like this:
thingsproject/
    README
    setup.py
    doc/
        ...
    things/
        __init__.py
        animals.py
        vegetables.py
        minerals.py
    test/
        test_animals.py
        test_vegetables.py
        test_minerals.py
The setup.py file describes the metadata about your project. See Writing the Setup Script and particularly the section on installing scripts.
Entry points exist to help distribute command line tools in Python. An entry point is defined in setup.py like this:
setup(
    name='thingsproject',
    ....
    entry_points = {
        'console_scripts': ['dog = things.animals:dog_main_function']
    },
    ...
)
The effect is that when the package is installed using python setup.py install a script is automatically created in some reasonable place according to your OS, such as /usr/local/bin. The script then calls the dog_main_function in the animals module of the things package.
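For completeness, the target could be as simple as this; the body is invented, only the module and function names come from the entry point above:
# file: things/animals.py (sketch)
def dog_main_function():
    # whatever the installed `dog` command should do
    print("woof")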
Yet another Python convention to consider is to have a __main__.py file. This signifies the "main" script within a directory or zip file full of Python code. This is a good place to define a command-line interface to your code using the argparse parser for command-line arguments.
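A minimal sketch of that convention, reusing the things package from above (the single dog subcommand is just an example):
# file: things/__main__.py (sketch) -- run with `python -m things`
import argparse

from . import animals  # relative import works when run as `python -m things`

def main():
    parser = argparse.ArgumentParser(prog="things")
    parser.add_argument("command", choices=["dog"], help="which command to run")
    args = parser.parse_args()
    if args.command == "dog":
        animals.dog_main_function()

if __name__ == "__main__":
    main()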
Good and up-to-date information on the somewhat muddled world of Python packaging can be found in the Python Packaging User Guide.

Automatically call common initialization code without creating __init__.py file

I have two directories in my project:
project/
    src/
    scripts/
"src" contains my polished code, and "scripts" contains one-off Python scripts.
I would like all the scripts to have "../src" added to their sys.path, so that they can access the modules under the "src" tree. One way to do this is to write a scripts/__init__.py file, with the contents:
# scripts/__init__.py
import sys
sys.path.append("../src")
This works, but has the unwanted side-effect of putting all of my scripts in a package called "scripts". Is there some other way to get all my scripts to automatically call the above initialization code?
I could just edit the PYTHONPATH environment variable in my .bashrc, but I want my scripts to work out-of-the-box, without requiring the user to fiddle with PYTHONPATH. Also, I don't like having to make account-wide changes just to accommodate this one project.
Even if you have other plans for distribution, it might be worth putting together a basic setup.py in your src folder. That way, you can run setup.py develop to have distutils put a link to your code onto your default path (meaning any changes you make will be reflected in-place without having to "reinstall", and all modules will "just work," no matter where your scripts are). It'd be a one-time step, but that's still one more step than zero, so it depends on whether that's more trouble than updating .bashrc. If you use pip, the equivalent would be pip install -e /path/to/src.
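Such a basic setup.py could look roughly like this. This is a sketch: the project name and version are placeholders, and find_packages() only picks up directories under src/ that have an __init__.py:
# file: project/src/setup.py (sketch)
from setuptools import setup, find_packages

setup(
    name="my-project",   # placeholder
    version="0.1.0",
    packages=find_packages(),  # packages living under src/
    # py_modules=["some_module"],  # for loose top-level modules instead of packages
)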
The more-robust solution--especially if you're going to be mirroring/versioning these scripts on several developers' machines--is to do your development work inside a controlled virtual environment. It turns out virtualenv even has built-in support for making your own bootstrap customizations. It seems like you'd just need an after_install() hook to either tweak sitecustomize, run pip install -e, or add a plain .pth file to site-packages. The custom bootstrap could live in your source control along with the other scripts, and would need to be run once for each developer's setup. You'd also have the normal benefits of using virtualenv (explicit dependency versioning, isolation from system-wide configuration, and standardization between disparate machines, to name a few).
If you really don't want to have any setup steps whatsoever and are willing to only run these scripts from inside the 'project' directory, then you could plop in an __init__.py as such:
project/
    src/
        some_module.py
    scripts/
        __init__.py     # special "magic"
        some_script.py
And these are what your files could look like:
# file: project/src/some_module.py
print("importing %r" % __name__)
def some_function():
    print("called some_function() inside %s" % __name__)
--------------------------------------------------------
# file: project/scripts/some_script.py
import some_module
if __name__ == '__main__':
    some_module.some_function()
--------------------------------------------------------
# file: project/scripts/__init__.py
import sys
from os.path import dirname, abspath, join
print("doing magic!")
sys.path.insert(0, join(dirname(dirname(abspath(__file__))), 'src'))
Then you'd have to run your scripts like so:
[~/project] $ python -m scripts.some_script
doing magic!
importing 'some_module'
called some_function() inside some_module
Beware! The scripts can only be called like this from inside project/:
[~/otherdir] $ python -m scripts.some_script
ImportError: no module named scripts
To enable that, you're back to editing .bashrc, or using one of the options above. The last option should really be a last resort; as @Simon said, you're really fighting the language at that point.
If you want your scripts to be runnable (I assume from the command line), they have to be on the path somewhere.
Something sounds odd about what you're trying to do though. Can you show us an example of exactly what you're trying to accomplish?
You can add a file called 'pathHack.py' in the project dir and put something like this into it:
import os
import sys

pkgDir = os.path.dirname(__file__)
sys.path.insert(0, os.path.join(pkgDir, 'scripts'))
Then, in a python file in your project dir, start by:
import pathHack
And now you can import stuff from the scripts dir without the 'scripts.' prefix. If you have only one file in this directory, and you don't care about hiding this kind of thing, you may inline this snippet.

Distributing a python application

I have a simple python application where my directory structure is as follows:
project/
    main.py
    config.py
    plugins/
        plugin1
        plugin2
        ...
The config.py module only loads configuration files; it does not contain any configuration info itself.
I now want to distribute this program, and I thought I'd use setuptools to do it. The file users are expected to use is main.py, so that one clearly goes into /usr/bin and the rest of the files go into /usr/share/project.
But there's one problem: I would somehow need to tell main.py to look for config.py in the share directory. But I can't really be sure where exactly the share directory is since that's up to setuptools, right?
What's the best practice when distributing Python-based applications?
setuptools installs your package in a location that is reachable from Python, i.e. you can import it:
import project
The problem arises when you do relative imports instead of absolute imports. If your main.py imports config.py, it works because they live in the same directory. When you move your main.py to another location, such as /usr/bin or another location present in the PATH environment variable, Python tries to import config.py from sys.path and not from your package dir. The solution is to use an absolute import:
from project import config
Now main.py is "movable".
Another solution, which I prefer, is to use the automatic script creation offered by setuptools.
Instead of having your code in an
if __name__ == "__main__":
    # here all your beautiful code
block, put your code in a function (main could be a good name):
def main():
    # put your code here
    ...

if __name__ == "__main__":  # not needed, just in case...
    main()
Now modify your setup.py:
setup(
    # ...
    entry_points={
        "console_scripts": [
            # modify script_name with the name you want to use from the shell
            # $ script_name [params]
            "script_name = project.main:main",
        ],
    },
)
That's all. After an install, setuptools will create a wrapper script that is callable from the shell and that calls your main function. Now main.py can live in your project directory, and you no longer need to move it into a bin/ directory. Note that setuptools automatically puts this script in the bin/ directory relative to the installation prefix.
E.g.:
python setup.py install --prefix ~/.local
installs your project package in
~/.local/lib/python<version>/site-packages/<package_name>
and your script in
~/.local/bin/<script_name>
so be sure that ~/.local/bin is present in your PATH environment variable.
more info at: http://peak.telecommunity.com/DevCenter/setuptools#automatic-script-creation
