How to deal with submodule interdependency in Python?

I have a program with several submodules. I want the submodules to be usable independently, so that I can use some of them in other programs as well. However, the submodules have inter-dependency, requiring aspects of each other to run. What is the least problematic way to deal with this?
Right now, I have structured my program like this:
myapp/
|-- __init__.py
|-- app.py
|-- special_classes
|   |-- __init__.py
|   `-- tools
|       `-- __init__.py
|-- special_functions
|   |-- __init__.py
|   `-- tools
|       `-- __init__.py
`-- tools
    |-- __init__.py
    |-- classes.py
    `-- functions.py
Where each submodule is a git submodule of its parent.
The advantage of this is that I can manage and develop each submodule independently, and adding one of these submodules to a new project is as simple as running git clone and git submodule add. Because I work in a managed, shared computing environment, this also makes it easy to run the program, since user environment management and software versioning and installation are contentious issues.
The disadvantage is that in this example I now have 3 copies of the tools submodule, which are independent of each other and have to be manually updated every time there is a change in any of them. Doing any sort of development on the submodules becomes very cumbersome. Similarly, it has tripled the number of unit tests I run, since the tests get run in each submodule and there are 3 copies of the tools module.
I have seen various importing methods, such as those mentioned here, but that does not seem like an ideal solution for this.
I have read about how to create a formal Python package here, but this seems like a large undertaking and would make it much more difficult for my end users to actually install and run the program.
Another relevant question was asked here.

It's better to have a single tools package in the parent and import it from the submodules. That feels by far the best approach to me.
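For illustration, a minimal sketch of what that could look like, assuming the layout from the question (the shared copy lives at myapp/tools and the submodules import it from the parent):

# myapp/special_classes/__init__.py
# A sketch (names taken from the layout above): the submodules keep no copy
# of tools and instead import the single shared one from the parent package.
from ..tools import classes, functions   # or: from myapp.tools import classes, functions

The trade-off is that special_classes can then only be imported as part of the myapp package (or with myapp's parent directory on sys.path), which is exactly what makes the submodule harder to reuse standalone, as the question notes.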

Related

VSCode Python autocomplete for generated code in separate directory

I use pants to manage a Python project that uses protocol buffers. Pants places the generated _pb2.py and _pb2.pyi files under a separate dist/codegen tree. Is it possible to get VS Code autocomplete to work when using the _pb2 modules?
The file tree looks like this:
.
|-- dist/
|   `-- codegen/
|       `-- src/
|           `-- project/
|               |-- data_pb2.py
|               `-- data_pb2.pyi
`-- src/
    `-- project/
        |-- __init__.py
        |-- code.py
        `-- data.proto
And in code.py I have import statements like this:
from project import data_pb2
I've tried setting python.analysis.extraPaths to ["dist/codegen/src"] in settings.json. This makes pylance stop complaining that data_pb2 is missing. But autocomplete still does not work, and pylance has no type information for members of data_pb2.
Replace your python.analysis.extraPaths with the following content:
"python.analysis.extraPaths": [
    "./dist/codegen/src"
],
And add the following code to your code.py:
import sys
sys.path.append("./dist/codegen/src")
You can use Python implicit namespace packages (PEP 420) to make this work. Namespace packages allow modules within the same package to reside in different directories, which lets pylance and other tools work correctly when the code is split between src and dist/codegen/src.
To use implicit namespace packages, you just need to remove src/project/__init__.py and leave "python.analysis.extraPaths" set to ["dist/codegen/src"].
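As a quick sanity check, a sketch (assuming the tree above, the script run from the repository root, and that src/project/__init__.py has been removed) showing that both directories contribute to the same project namespace package:

# check_namespace.py -- a sketch; with no __init__.py, "project" becomes a
# PEP 420 namespace package spanning both roots
import sys

sys.path[:0] = ["src", "dist/codegen/src"]   # roughly what extraPaths tells pylance

import project
print(list(project.__path__))                # lists both .../src/project and .../dist/codegen/src/project

from project import code, data_pb2          # both resolve, from different directories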
See also the GitHub issue microsoft/pylance-release#2855, which describes using implicit namespace packages to make pylance work correctly in a similar situation.

Python multi-project build

I'm in the process of splitting up a monolithic project code base into several smaller projects. I'm having a hard time understanding how to handle dependencies amongst the different projects properly.
The structure looks somewhat like this:
SCM_ROOT
|-- core
|   |-- src
|   `-- setup.py
|-- project1
|   |-- src
|   `-- setup.py
|-- project2
|   |-- src
|   `-- setup.py
`-- project3
    |-- src
    `-- setup.py
What's the recommended way to handle dependencies between multi-package projects and to set up a development environment? I'm using pip, virtualenv and requirements.txt files. Are there any tools that allow me to bootstrap my environment from the repository quickly?
Using a build tool like PyBuilder or Pants was unnecessarily complicating the process. I ended up splitting it up into multiple projects in svn, each with its own trunk/tags/branches directories. Dependencies are handled using a combination of install_requires and requirements.txt files, based on information from here and here. Each project has a fabfile to run common tasks like clean, build, upload to PyPI, etc.
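For illustration, a hedged sketch of how one of the split projects might declare the shared dependency (the "core" name comes from the layout in the question; the version and metadata are assumptions):

# project1/setup.py -- a minimal sketch
from setuptools import setup, find_packages

setup(
    name="project1",
    version="0.1.0",
    package_dir={"": "src"},
    packages=find_packages("src"),
    # Abstract dependency on the shared core project; the concrete source
    # (a sibling checkout, an internal index, or a VCS URL) is pinned in
    # requirements.txt, e.g. with an editable install: -e ../core
    install_requires=["core"],
)

A development environment can then be bootstrapped inside a fresh virtualenv with editable installs, for example a per-project requirements.txt containing "-e ../core" and "-e .", so that pip install -r requirements.txt pulls in the sibling checkouts.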

Sharing python modules between applications with git sub modules or svn:externals

In our company we're using subversion. We have different python modules (own and third party) in different versions in use. The various applications we develop have various dependencies regarding the version of the shared modules.
One possibility is using virtualenv installing the modules from a local pypi server. So on every initial checkout we need to create a virtualenv, activate it and install dependent modules from requirements.txt.
Disadvantages:
Relatively complex operation for a simple task like checkout and run
You're able to miss the creation of the virtualenv and end up working with the modules installed in site-packages
Need for a local PyPI server (OK, you're otherwise able to use URLs pointing to your VCS)
So we came up with another solution and I ask for your opinion:
In the path of the application we use svn:externals (aka git submodules) to "link" to the specified module (from its release path and with a specified revision number to keep it read-only), so the module will be placed locally in the path of the application. An "import mylib" will work as if it were installed in Python's site-packages or in the virtualenv. This could be extended to even put a release of wx, numpy and other often-used libraries into our repository and link them locally.
The advantages are:
After the initial checkout you're ready to run (a really important point for me)
Version dependencies are fixed (as with requirements.txt)
The actual question is:
Are there projects out there on github/sourceforge using this scheme? Why is everybody using virtualenv instead of this (seemingly) simpler scheme?
I never saw such a solution, so maybe we're missing a point?
PS: I posted this already on the pypa-dev mailing list, but it seems to be the wrong place for this kind of question. Please excuse this cross-post.
In the path of the application we use svn:externals (aka git submodules) to "link" to the specified module (from its release path and with a specified revision number to keep it read-only), so the module will be placed locally in the path of the application.
This is a more traditional method for managing package dependencies, and is the simpler of the two options for software which is only used internally. With regards to...
After the initial checkout you're ready to run
...that's not strictly true. If one of your dependencies is a Python library written in C, it will need to be compiled first.
We tried it with git's submodule functionality but it's not possible to get a subpath of a repository (like /source/lib)
This is fairly easy to work around if you check out the whole repository in a location outside your PYTHONPATH, then just symlink to the required files or directories inside your PYTHONPATH, although it does require you to be using a filesystem which supports symlinks.
For example, with a layout like...
myproject
 |- bin
 |   |- myprogram.py
 |
 |- lib
 |   |- mymodule.py
 |   |- mypackage
 |   |    |- __init__.py
 |   |
 |   |- foopackage -> ../submodules/libfoo/lib/foopackage
 |   |- barmodule
 |        |- __init__.py -> ../../submodules/libbar/lib/barmodule.py
 |
 |- submodules
      |- libfoo
      |    |- bin
      |    |- lib
      |         |- foopackage
      |              |- __init__.py
      |
      |- libbar
           |- bin
           |- lib
                |- barmodule.py
...you need only have myproject/lib in your PYTHONPATH, and everything should import correctly.
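If you'd rather not require users to set PYTHONPATH by hand, one possible variation (a sketch, not something the original answer spells out; the names come from the layout above) is to have the entry-point script extend sys.path relative to its own location:

# bin/myprogram.py -- a sketch; prepends the project's lib/ directory to
# sys.path so the symlinked packages import cleanly without any environment setup
import os
import sys

LIB_DIR = os.path.normpath(
    os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "lib")
)
sys.path.insert(0, LIB_DIR)

import mymodule      # lib/mymodule.py
import foopackage    # symlink -> ../submodules/libfoo/lib/foopackage
import barmodule     # lib/barmodule/__init__.py -> libbar's barmodule.py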
Are there projects out there on github/sourceforge using this scheme?
The submodule information is just stored in a file called .gitmodules, and a quick Google for "site:github.com .gitmodules" returns quite a few results.
Why is everybody using virtualenv instead of this (seemingly) simpler scheme?
For packages published on PyPI, and installed with pip, it's arguably easier from a dependency-management point-of-view.
If your software has a relatively simple dependency graph, like...
myproject
|- libfoo
|- libbar
...it's no big deal, but when it becomes more like...
myproject
 |- libfoo
 |   |- libsubfoo
 |        |- libsubsubfoo
 |             |- libsubsubsubfoo
 |                  |- libsubsubsubsubfoo
 |- libbar
     |- libsubbar1
     |- libsubbar2
     |- libsubbar3
     |- libsubbar4
...you may not want to take on the responsibility of working out which versions of all those sub-packages are compatible, should you need to upgrade libbar for whatever reason. You can delegate that responsibility to the maintainer of the libbar package.
In your particular case, the decision as to whether your solution is the right one will depend on the answers to these questions:
Are all of the external modules you need to use actually available from svn repositories?
Do those repositories use svn:externals correctly to include compatible versions of any dependencies they require, or if not, are you prepared to take on the responsibility of managing those dependencies yourself?
If the answer to both questions is "yes", then your solution is probably right for your case.

Create a python executable using setuptools

I have a small python application that I would like to make into a downloadable / installable executable for UNIX-like systems. I am under the impression that setuptools would be the best way to make this happen but somehow this doesn't seem to be a common task.
My directory structure looks like this:
myappname/
|-- setup.py
|-- myappname/
|   |-- __init__.py
|   |-- myappname.py
|   |-- src/
|       |-- __init__.py
|       |-- mainclassfile.py
|       |-- morepython/
|           |-- __init__.py
|           |-- extrapython1.py
|           |-- extrapython2.py
The file which contains if __name__ == "__main__": is myappname.py. This file has a line at the top, import src.mainclassfile.
When this is downloaded, I would like for a user to be able to do something like:
$ python setup.py build
$ python setup.py install
And then it will be an installed executable which they can invoke from anywhere on the command line with:
$ myappname arg1 arg2
The important parts of my setup.py are like:
from setuptools import setup, find_packages

setup(
    name='code2flow',
    scripts=['myappname/myappname.py'],
    package_dir={'myappname': 'myappname'},
    packages=find_packages(),
)
Current state
By running:
$ sudo python setup.py install
And then in a new shell:
$ myapp.py
I am getting a No module named error
The problem here is that your package layout is broken.
It happens to work in-place, at least in 2.x. Why? You're not accessing the package as myappname; instead, the package's directory doubles as the top-level script directory, so you end up picking up any of its siblings via old-style relative imports.
Once you install things, of course, you'll end up with the myappname package installed in your site-packages, and then a copy of myappname.py installed somewhere on your PATH, so relative import can't possibly work.
The right way to do this is to put your top-level scripts outside the package (or, ideally, into a bin directory).
Also, your module and your script shouldn't have the same name. (There are ways you can make that work, but… just don't try it.)
So, for example:
myappname/
|-- setup.py
|-- myscriptname.py
|-- myappname/
|   |-- __init__.py
|   |-- src/
|       |-- __init__.py
|       |-- mainclassfile.py
Of course, so far all this does is make it break in in-place mode the exact same way it breaks when installed. But at least that makes things easier to debug, right?
Anyway, your myscriptname.py then has to use an absolute import:
import myappname.src.mainclassfile
And your setup.py has to find the script in the right place:
scripts=['myscriptname.py'],
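Putting those pieces together, the revised setup.py could look roughly like this (a sketch based on the answer's advice; the name field is kept from the question's original setup.py):

# setup.py -- a sketch of the corrected layout: the script sits next to
# setup.py, outside the myappname package, and is installed onto the PATH
from setuptools import setup, find_packages

setup(
    name='code2flow',
    packages=find_packages(),      # finds myappname and myappname.src
    scripts=['myscriptname.py'],   # the top-level script, using absolute imports
)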
Finally, if you need some code from myscriptname.py to be accessible inside the module, as well as in the script, the right thing to do is to refactor it into two files—but if that's too difficult for some reason, you can always write a wrapper script.
See Arranging your file and directory structure and related sections in the Hitchhiker's Guide to Packaging for more details.
Also see PEP 328 for details on absolute vs. relative imports (but keep in mind that when it refers to "up to Python 2.5" it really means "up to 2.7", and "starting in 2.6" means "starting in 3.0").
For a few examples of packages that include scripts that get installed this way via setup.py (and, usually, easy_install and pip), see ipython, bpython, modulegraph, py2app, and of course easy_install and pip themselves.

How to run python test files

I am developing Python in Eclipse. As a result, the Python source files and test files are in different directories.
The question is: how do I run specific test files in the test folder from the command line? These obviously depend on files in the src folder.
Cheers
Edit: if I run
python test/myTestFile.py
I get dependency errors, e.g. ImportError: No module named SrcFile1
You need to make sure your PYTHONPATH is set correctly so the command-line interpreter can find your packages, or run your test cases from within Eclipse PyDev. Update: another option is running your tests using nose, which might make things a bit easier, since it can auto-discover packages and test cases.
If your project is laid out like so:
/home/user/dev/
    src/pkg1/
        mod1.py
    test/
        mod1_test.py
Use: PYTHONPATH=$HOME/dev/src python test/mod1_test.py. I'd also recommend using distribute and virtualenv to set up your project for development.
Updated in response to question in comments:
This shows how the PYTHONPATH environment variable extends Python's package search path:
% PYTHONPATH=foo:bar python -c 'import sys; print sys.path[:3]'
['', '/home/user/foo', '/home/user/bar']
# exporting the variable makes it sticky for your current session. you can
# add this to your shell's resource file (e.g. ~/.profile) or source
# it from a textfile to save typing:
% export PYTHONPATH=bar:baz
% python -c 'import sys; print sys.path[:3]'
['', '/home/user/bar', '/home/user/baz']
% python -c 'import sys; print sys.path[:3]'
['', '/home/user/bar', '/home/user/baz']
The above should get you going in the short term. Using distribute and virtualenv has a higher one-time setup cost, but you get longer-term benefits from using them. When you get a chance, read some of the many tutorials on SO for setting these up to see if they're a good fit for your project.
There are two principal solutions to this. Either you need to use e.g. the PYTHONPATH environment variable to tell the tests where the source is, or you need to make the tests and production code part of the same module tree by inserting the relevant __init__.py files. In the latter approach, the tree may look something like this:
|-- qbit
|   |-- __init__.py
|   |-- master.py
|   |-- policy.py
|   |-- pool.py
|   |-- synchronize.py
|   `-- worker.py
`-- test
    |-- __init__.py
    |-- support.py
    |-- test_policy.py
    |-- test_synchronize.py
    `-- test_worker.py
__init__.py can be an empty file.
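With that layout in place, a test file can import the production code directly. A sketch (the test contents are assumptions; only the module names come from the tree above):

# test/test_policy.py -- run from the project root, e.g. with:
#   python -m unittest test.test_policy
import unittest

from qbit import policy   # resolves because qbit/ and test/ are packages
                          # under the directory you run Python from


class PolicyTest(unittest.TestCase):
    def test_module_imports(self):
        # placeholder assertion; real tests would exercise qbit.policy's API
        self.assertTrue(hasattr(policy, "__name__"))


if __name__ == "__main__":
    unittest.main()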
