Using code in init.py when running python from command line - python

I'm setting up some code for unittesting. My directory currently looks like this:
project/
src/
__init__.py
sources.py
test/
__init__.py
sources_test.py
In __init__.py for the test directory, I have these two lines:
import sys
sys.path.insert(0, '../')
In the test files, I have the line import src.sources.
When I use nose to run these tests from the project directory, everything works just fine. If I try to run the tests individually it gives me this error:
ImportError: No module named src.sources
I assume that this is because when I run the test from the command line it isn't using __init__.py. Is there a way I can make sure that it will use those lines even when I try to run the tests individually?
I could take the lines out of __init__.py and put them into my test files, but I'm trying to avoid doing that.
To run the tests individually I am running python sources_test.py

You're really trying to abuse packages here, and that isn't a good idea.
The simple solution is to not run the tests from within the tests directory. Just cd up a level, then do python tests/sources_test.py.
Of course that in itself isn't going to import test/__init__.py. For that, you really need to import the package. So python -m tests.sources_test is probably a better idea… except, of course, that if your package is made to be run as a script but not to be imported, that won't work.
Alternatively, you could (on POSIX platforms, at least) do PYTHONPATH=.. python sources_test.py from within tests. This is a bit hacky, but it should work.
Or, better, combine the above, and, from outside of tests, do PYTHONPATH=. python tests/sources_test.py.
A really hacky workaround is to explicitly import __init__. This should basically work for you simple use case, but everything ends up wrong—in particular, you end up with a module named __init__ instead of one named test, and of course your main module isn't named test.sources_test, and in fact there is no test package at all. Unless you accidentally re-import anything after modifying sys.path, in which case you may get duplicates of the modules.

If you write
import src.source
the python interpreter looks into the src directory for a __init__.py file. If it exists, you can use the directory as a package name. If your are not in your project directory, which is the case when you are in the src directory, then python looks into the directories in $PYTHONPATH environment variable (at least in linux, windows should also have some environment variable, maybe with another name), if it can find some directory src with a __init__.py file in it.
Did you set your $PYTHONPATH?

Related

How to properly structure internal scripts in a Python project?

Consider the following Python project skeleton:
proj/
├── foo
│   └── __init__.py
├── README.md
└── scripts
└── run.py
In this case foo holds the main project files, for example
# foo/__init__.py
class Foo():
def run(self):
print('Running...')
And scripts holds auxiliary scripts that need to import files from foo, which are then invoked via:
[~/proj]$ python scripts/run.py
There are two ways of importing Foo which both fail:
If a relative import is attempted from ..foo import Foo then the error is ValueError: attempted relative import beyond top-level package
If an absolute import is attempted from foo import Foo then the error is ModuleNotFoundError: No module named 'foo'
My current workaround is to append the running path to sys.path:
import sys
sys.path.append('.')
from foo import Foo
Foo().run()
But this feels like a hack, and has to be added to every new script in scripts/.
Is there a better way to structure scripts in such projects?
There's two ways you could resolve this.
(1) Turn your project into an installable package
Add a proj/setup.py file with the following contents:
import setuptools
setuptools.setup(
name="my-project",
version="1.0.0",
author="You",
author_email="you#example.com",
description="This is my project",
packages=["foo"],
)
create a virtualenv:
python3 -m venv virtualenv # this creates a directory "virtualenv" in your project
source ./virtualenv/bin/activate # this switches you into the new environment
python setup.py develop # this places your "foo" package in the environment
inside the virtualenv, foo behaves as an installed package and is importable via import foo.
So you can use absolute imports in your scripts.
To make them run from anywhere, without needing to activate the virtualenv, you can then specify the path as a shebang.
In scripts/run.py (the first line is important):
#!/path/to/proj/virtualenv/bin/python
import foo
print(foo.callfunc())
(2) Make the scripts part of the foo package
Instead of a separate subdirectory scripts, make a subpackage. In proj/foo/commands/run.py:
from .. import callfunc()
def main():
print(callfunc())
if __name__ == "__main__":
main()
Then execute the script from the top-level proj/ directory with:
python -m foo.commands.run
If you combine this with (1) and install your package, you can then run python -m foo.commands.run from anywhere.
Solution
There are multiple ways to achieve this. Both require creating a python package by adding a setup.py (building on #matejcik's answer).
Option 1 (recommended): entry_point + console_scripts register a function in your project as the entry point to script execution (ie: proj:foo:cli:run).
Option 2: scripts: Use this keyword argument in the setup() method to reference the path to your script (ie: `bin/script.py).
Note
I recommend using a CLI library/framework like Click so that your codebase is only concerned with maintaining application specific business logic rather than CLI robust framework feature logic. Also, click recommends using entry_point + console_scripts method of script integration due to cross-platform compatibility.
Setup Tools - Automatic script creation: https://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-script-creation
Setup Tools - keyword arguments: https://setuptools.readthedocs.io/en/latest/setuptools.html#new-and-changed-setup-keywords
Click GitHub: https://github.com/pallets/click/
Click Setuptools integration: https://click.palletsprojects.com/en/master/setuptools/
Best practice? Put a single entry-point in the root
I know this might sound absurd, if you have lots of scripts you want to be able to execute... But it's actually the cleanest option and it's the one that is most often used in big Python projects like magage.py in Django, for example. It also doesn't need to be a huge undertaking. Even more importantly, it is always more secure to have a single entry point than several smaller ones.
proj/
├── run.py
├── foo
│ └── __init__.py
├── README.md
└── scripts
└── my_script.py
When run.py lives in the root directory, it can be very lightweight... Basically just a wrapper to call the function you need from my_scripts.py. It just ties everything together so now all of your imports just work.
Just keep in mind that your entrypoint is your root. The parent of a root doesn't exist. So put your entrypoint in the root, and then import packages relative to the root, aka import foo from scripts.
But how do I call multiple scripts!?
If you need to be able to call multiple scripts, this is a good argument for... Well... arguments! Keep run.py as your single entrypoint/command, and leverage subcommands to pass functionality to the script you care about.
Reinventing the wheel?
Generally, frameworks have already done the architecture for you to add your own subcommands, such as Django and, for a smaller footprint, Flask.
You can easily wrap up a small project without that help, though, as I've illustrated.
Security
No one ever wishes their code was less refactorable after a few years of working with it. No one ever wishes their codebase has less security. As we drive to more secure systems in general, it would make sense to create some gatekeeper script that determines what is and isn’t a safe operation and by whom. Moving the code to an LDAP based system, and need to lock things down by group? No problem. You can either change the single file or add LDAP security in your codebase, even creating your own internal API.
With distributed scripts, security options are much less flexible and much harder to maintain, and a single vulnerability could leave you wide open to exploit.
Bonus advantage
You're adding abstraction to your script base. If you ever want to change the structure of your codebase (maybe you want scripts to have subfolders with more organization), you/your users don't need to do any refactoring for any dependencies, or change paths to longer, more verbose names. Your package is self-contained, and the only thing a user will ever need to touch is your proj/run.py entry-point.
And, obviously, you don't need to play with Python paths as much!
You need to add __init__.py files to scripts and to proj folders for those to be considered Python packages and for you to be able to import from those.
One way this is also commonly done, is to place your foo and scripts folders into a proj/src folder, which then has a __init__.py file, and thus is a Python package.
If you like simplicity, and there are no additional restrictions on what you asked, add one __init__.py to the scripts folder, and to any other sibling folders, making them packages, then always use the absolute import form, as you said you do not want proj as a parent package of those and so there is no __init__.py there, and then call your scripts (instead) from inside the proj folder with:
python -m scripts.run
or whatever name you give to other scripts other than run.py
This is similar to option 2 of #matejcik answer, but even simpler.
another solution is you add a.pth file on your Python directory
and write the content of the following,
# your.pth
#↓ input the directory of proj
C:\...\proj
done
# scripts.py
from foo import Foo
Foo().run()
It will work well.
.. note:: If your IDE is PyCharm, then you can use the Source roots to help you too.
Python looks for packages/modules in the directories listed in sys.path. There are several ways of ensuring that your directories of interest, in this case proj, is one of those directories:
Move your scripts to the proj directory. Python adds the directory containing the input script to sys.path.
Put the directory proj into the contents of the PYTHONPATH environment variable.
Make the module part of an installable package and install it, either in a virtual environment or not.
At run time, dynamically add the directory proj to sys.path.
Option 1 is the most logical and requires no source changes. If you are afraid that might break something, you can perhaps make scripts a symbolic link pointing back to proj?
If you are unwilling to do that, then ...
You may consider it a hack, but I would recommend that you do modify your scripts to update sys.path at runtime. But instead append an absolute path so that the scripts can be executed regardless of what the current directory is. In your case, directory proj is the parent directory of directory scripts, where the scripts reside, so:
import sys
import os.path
parent_directory = os.path.split(os.path.dirname(__file__))[0]
if parent_directory not in sys.path:
#sys.path.insert(0, parent_directory) # the first entry is directory of the running script, so maybe insert after that at index 1
sys.append(parent_directory)

Python finding some, not all custom packages

I have a project with the following file structure:
root/
run.py
bot/
__init__.py
my_discord_bot.py
dice/
__init__.py
dice.py
# dice files
help/
__init__.py
help.py
# help files
parser/
__init__.py
parser.py
# other parser files
The program is run from within the root directory by calling python run.py. run.py imports bot.my_discord_bot and then makes use of a class defined there.
The file bot/my_discord_bot.py has the following import statements:
import dice.dice as d
import help.help as h
import parser.parser as p
On Linux, all three import statements execute correctly. On Windows, the first two seem to execute fine, but then on the third I'm told:
ImportError: No module named 'parser.parser'; 'parser' is not a package
Why does it break on the third import statement, and why does it only break on Windows?
Edit: clarifies how the program is run
Make sure that your parser is not shadowing a built-in or third-party package/module/library.
I am not 100% sure about the specifics of how this name conflict would be resolved, but it seems like you can potentially a). have your module overridden by the existing module (which seems like it might be happening in your Windows case), or b). override the existing module, which could cause bugs down the road. It seems like b is what commonly trips people up.
If you think this might be happening with one of your modules (which seems fairly likely with a name like parser), try renaming your module.
See this very nice article for more details and more common Python "import traps".
Put run.py outside root folder, so you'll have run.py next to root folder, then create __init__.py inside root folder, and change imports to:
import root.parser.parser as p
Or just rename your parser module.
Anyway you should be careful with naming, because you can simply mess your own stuff someday.

Expanding sys.path via __init__.py

There're a lot of threads on importing modules from sibling directories, and majority recommends to either simply add init.py to source tree, or modify sys.path from inside those init files.
Suppose I have following project structure:
project_root/
__init__.py
wrappers/
__init__.py
wrapper1.py
wrapper2.py
samples/
__init__.py
sample1.py
sample2.py
All init.py files contain code which inserts absolute path to project_root/ directory into the sys.path. I get "No module names x", no matter how I'm trying to import wrapperX modules into sampleX. And when I try to print sys.path from sampleX, it appears that it does not contain path to project_root.
So how do I use init.py correctly to set up project environment variables?
Do not run sampleX.py directly, execute as module instead:
# (in project root directory)
python -m samples.sample1
This way you do not need to fiddle with sys.path at all (which is generally discouraged). It also makes it much easier to use the samples/ package as a library later on.
Oh, and init.py is not run because it only gets run/imported (which is more or less the same thing) if you import the samples package, not if you run an individual file as script.

Python: Nosetests with multiple files

This is a broad question because no one seems to have found a solution to it as yet so I think asking to see a working example might prove more useful. So here goes:
Has anyone run a nosetests on a python project using imports of multiple files/packages?
What I mean is, do you have a directory listing such as:
project/
|
|____app/
|___main.py
|___2ndFile.py
|___3rdFile.py
|____tests/
|____main_tests.py
Where your main.py imports multiple files and you perform a nosetests from the project file of utilizing a test script in the main_tests.py file? If so please can you screen shot your import section both of all your main files and your main_tests.py file?
This seems to be a major issue in nosetests, with no apparent solution:
Nosetests Import Error
A test running with nosetests fails with ImportError, but works with python command
https://github.com/nose-devs/nose/issues/978
https://github.com/nose-devs/nose/issues/964
You can't have python modules starting with a digit, so 2ndFile.py, 3rdFile.py won't actually work (rename them).
You'll need an __init__.py inside the app directory, for it to be considered a package, so add that (it can be empty file).
You don't need an __init__.py in the tests directory!
The import statements in main_tests.py should look like from app.main import blah
The absolute path of the project directory needs to be in your sys.path. To achieve this, set an environment variable: export PYTHONPATH=/path/to/project
Now running nosetests should work.

Automatically call common initialization code without creating __init__.py file

I have two directories in my project:
project/
src/
scripts/
"src" contains my polished code, and "scripts" contains one-off Python scripts.
I would like all the scripts to have "../src" added to their sys.path, so that they can access the modules under the "src" tree. One way to do this is to write a scripts/__init__.py file, with the contents:
scripts/__init__.py:
import sys
sys.path.append("../src")
This works, but has the unwanted side-effect of putting all of my scripts in a package called "scripts". Is there some other way to get all my scripts to automatically call the above initialization code?
I could just edit the PYTHONPATH environment variable in my .bashrc, but I want my scripts to work out-of-the-box, without requiring the user to fiddle with PYTHONPATH. Also, I don't like having to make account-wide changes just to accommodate this one project.
Even if you have other plans for distribution, it might be worth putting together a basic setup.py in your src folder. That way, you can run setup.py develop to have distutils put a link to your code onto your default path (meaning any changes you make will be reflected in-place without having to "reinstall", and all modules will "just work," no matter where your scripts are). It'd be a one-time step, but that's still one more step than zero, so it depends on whether that's more trouble than updating .bashrc. If you use pip, the equivalent would be pip install -e /path/to/src.
The more-robust solution--especially if you're going to be mirroring/versioning these scripts on several developers' machines--is to do your development work inside a controlled virtual environment. It turns out virtualenv even has built-in support for making your own bootstrap customizations. It seems like you'd just need an after_install() hook to either tweak sitecustomize, run pip install -e, or add a plain .pth file to site-packages. The custom bootstrap could live in your source control along with the other scripts, and would need to be run once for each developer's setup. You'd also have the normal benefits of using virtualenv (explicit dependency versioning, isolation from system-wide configuration, and standardization between disparate machines, to name a few).
If you really don't want to have any setup steps whatsoever and are willing to only run these scripts from inside the 'project' directory, then you could plop in an __init__.py as such:
project/
src/
some_module.py
scripts/
__init__.py # special "magic"
some_script.py
And these are what your files could look like:
# file: project/src/some_module.py
print("importing %r" % __name__)
def some_function():
print("called some_function() inside %s" % __name__)
--------------------------------------------------------
# file: project/scripts/some_script.py
import some_module
if __name__ == '__main__':
some_module.some_function()
--------------------------------------------------------
# file: project/scripts/__init__.py
import sys
from os.path import dirname, abspath, join
print("doing magic!")
sys.path.insert(0, join(dirname(dirname(abspath(__file__))), 'src'))
Then you'd have to run your scripts like so:
[~/project] $ python -m scripts.some_script
doing magic!
importing 'some_module'
called some_function() inside some_module
Beware! The scripts can only be called like this from inside project/:
[~/otherdir] $ python -m scripts.some_script
ImportError: no module named scripts
To enable that, you're back to editing .bashrc, or using one of the options above. The last option should really be a last resort; as #Simon said, you're really fighting the language at that point.
If you want your scripts to be runnable (I assume from the command line), they have to be on the path somewhere.
Something sounds odd about what you're trying to do though. Can you show us an example of exactly what you're trying to accomplish?
You can add a file called 'pathHack.py' in the project dir and put something like this into it:
import os
import sys
pkgDir = os.path.dirname(__file__)
sys.path.insert(os.path.join(pkgDir, 'scripts')
Then, in a python file in your project dir, start by:
import pathHack
And now you can import stuff from the scripts dir without the 'scripts.' prefix. If you have only one file in this directory, and you don't care about hiding this kind of thing, you may inline this snippet.

Categories