Structuring a Keras project

I am developing a Keras environment to train DL models, and I am struggling to decide on the file structure of the project.
Currently it looks like this:
.
├── build_model.py
├── ctalearn
│   ├── data
│   │   ├── data_loading.py
│   │   ├── data_processing.py
│   │   ├── image_mapping.py
│   │   ├── __init__.py
│   │   └── pixel_pos_files
│   │       ├── FACT_pos.npy
│   │       ├── HESS-II_pos.npy
│   │       ├── HESS-I_pos.npy
│   │       ├── LST_pos.npy
│   │       ├── MSTF_pos.npy
│   │       ├── MSTN_pos.npy
│   │       ├── MSTS_pos.npy
│   │       ├── SST1_pos.npy
│   │       ├── SSTA_pos.npy
│   │       └── SSTC_pos.npy
│   ├── __init__.py
│   ├── predict.py
│   ├── train.py
│   └── utils.py
├── example_model_schematics.yaml
├── example_train_config.yaml
├── models
│   ├── build_cnn_rnn.py
│   └── cnn_rnn.yml
├── setup.py
└── test.py
train.py and predict.py are the main scripts. They take a path to a model architecture stored in .yml format, plus a configuration file (also .yml), and run the training or prediction.
data_loading.py is a factory of Sequence objects that are fed to model.fit_generator(), and a manager for accessing the real data files (which live outside the project structure).
The rest of the data folder holds helper classes for the factory.
The Keras architectures are currently built from a .yaml schematic by the script build_model.py, or by an ad hoc script (for example build_cnn_rnn.py), and are stored in .yml format, to be loaded by train.py or predict.py and then compiled and fed a data generator.
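For reference, a minimal sketch of that load/compile/fit flow (hypothetical helper and file names, and assuming an older Keras release that still provides model_from_yaml and fit_generator):

# Load an architecture produced by build_model.py, compile it, and train.
from keras.models import model_from_yaml

with open("models/cnn_rnn.yml") as f:
    model = model_from_yaml(f.read())  # architecture only, no weights

model.compile(optimizer="adam", loss="categorical_crossentropy")

train_seq = make_sequence(config)  # hypothetical Sequence factory from data_loading.py
model.fit_generator(train_seq, epochs=10)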
Does my structure make sense?
Where should I put build_model.py?
Should I have separate train and predict scripts, or combine them and let the user choose whether to train or predict via the config .yml file?
Should the model path be specified as a command line argument or as an option inside the config.yaml file?
Should I remove the __init__.py in the inner folder?

Related

ModuleNotFoundError: No module named (*)

I'm trying to run my tests using python -m pytest, but I get this error:
ModuleNotFoundError: No module named 'sample'
When using nosetests or anything else it works fine, but when trying to use pytest it doesn't work.
My tree looks like the one below. Do you have any advice on why it doesn't work?
├── LICENSE.txt
├── README.md
├── data
│   └── data_file
├── exported_register.csv
├── pyproject.toml
├── requirements.txt
├── setup.cfg
├── setup.py
├── src
│   └── sample
│       ├── __init__.py
│       ├── __pycache__
│       │   ├── __init__.cpython-39.pyc
│       │   ├── dziennik.cpython-39.pyc
│       │   ├── przedmiot.cpython-39.pyc
│       │   ├── simple.cpython-39.pyc
│       │   └── uczen.cpython-39.pyc
│       ├── dziennik.py
│       ├── package_data.dat
│       ├── przedmiot.py
│       ├── simple.py
│       └── uczen.py
├── tests
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-39.pyc
│   │   ├── test_ASSERTPY_uczen.cpython-39-pytest-6.2.1.pyc
│   │   ├── test_ASSERTPY_uczen.cpython-39-pytest-6.2.5.pyc
│   │   ├── test_ASSERTPY_uczen.cpython-39.pyc
│   │   ├── test_PYHAMCREST_uczen.cpython-39-pytest-6.2.1.pyc
│   │   ├── test_PYHAMCREST_uczen.cpython-39-pytest-6.2.5.pyc
│   │   ├── test_PYHAMCREST_uczen.cpython-39.pyc
│   │   ├── test_UNITTEST_register.cpython-39-pytest-6.2.1.pyc
│   │   ├── test_UNITTEST_register.cpython-39-pytest-6.2.5.pyc
│   │   ├── test_UNITTEST_register.cpython-39.pyc
│   │   ├── test_UNITTEST_uczen.cpython-39-pytest-6.2.1.pyc
│   │   ├── test_UNITTEST_uczen.cpython-39-pytest-6.2.5.pyc
│   │   ├── test_UNITTEST_uczen.cpython-39.pyc
│   │   ├── test_simple.cpython-39-pytest-6.2.1.pyc
│   │   ├── test_simple.cpython-39-pytest-6.2.5.pyc
│   │   └── test_simple.cpython-39.pyc
│   ├── test_ASSERTPY_uczen.py
│   ├── test_PYHAMCREST_uczen.py
│   ├── test_UNITTEST_register.py
│   ├── test_UNITTEST_uczen.py
│   └── test_simple.py
└── tox.ini
When you run pytest with python -m pytest, it uses the current directory as its working directory, which doesn't contain the sample module (located inside ./src). The way I deal with this is to have a conftest.py inside my tests directory where I add my source directory to the Python path, something like this:
# tests/conftest.py: make ./src importable before tests are collected
import sys
from pathlib import Path

source_path = Path(__file__).parents[1].joinpath("src").resolve()
sys.path.append(str(source_path))
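Alternatively (assuming pytest 7.0 or newer, which added a built-in pythonpath option), the same effect can be had from configuration alone, for example in pyproject.toml:

[tool.pytest.ini_options]
pythonpath = ["src"]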
I've recently started using PyTorch and have had similar problems. A couple of steps come to mind:
How are you writing the .py file that contains the tests? It may simply be that you need to change how you import sample within the unit test file. I would expect that you need something like import src.sample.simple. In other words, it could just be a path issue.
Try a (much) simpler folder structure and try again. If that doesn't work, try to copy an example of a simple scheme that someone has posted, like the one below. That is, just get python -m pytest to run somehow, then slowly add the complexities of your project back in.
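For example, a minimal flat scheme (hypothetical names) where python -m pytest works when run from the project root, because the -m switch puts the current directory on sys.path:

.
├── sample
│   ├── __init__.py
│   └── simple.py
└── tests
    └── test_simple.py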

Generate rst files and directories mirroring the package and module tree

I'm trying to generate documentation for my library. Since the library directory structure is quite big, I want Sphinx to generate the .rst files as a nested directory that mirrors the package and module structure.
The library structure:
pyflocker/
├── __init__.py
├── ciphers/
│   ├── __init__.py
│   ├── backends/
│   │   ├── __init__.py
│   │   ├── _asymmetric.py
│   │   ├── _symmetric.py
│   │   ├── cryptodome_/
│   │   │   ├── AES.py
│   │   │   ├── ChaCha20.py
│   │   │   ├── ECC.py
│   │   │   ├── Hash.py
│   │   │   ├── RSA.py
│   │   │   ├── __init__.py
│   │   │   ├── _serialization.py
│   │   │   └── _symmetric.py
│   │   └── cryptography_/
│   │       ├── AES.py
│   │       ├── Camellia.py
│   │       ├── ChaCha20.py
│   │       ├── DH.py
│   │       ├── ECC.py
│   │       ├── Hash.py
│   │       ├── RSA.py
│   │       ├── __init__.py
│   │       ├── _serialization.py
│   │       └── _symmetric.py
│   ├── base.py
│   ├── exc.py
│   ├── interfaces/
│   │   ├── AES.py
│   │   ├── Camellia.py
│   │   ├── ChaCha20.py
│   │   ├── DH.py
│   │   ├── ECC.py
│   │   ├── Hash.py
│   │   ├── RSA.py
│   │   └── __init__.py
│   └── modes.py
└── locker.py
Until now I was using sphinx-apidoc -e -o ... to generate the documentation within the docs/source/ folder, but this doesn't work as expected.
Expected Results:
Documentation generated as a nested directory. The files have been removed to keep the backbone only.
docs/source/
└── ciphers/
   └── backends/
      ├── cryptodome_/
      └── cryptography_/
Actual results:
The whole module name is retained.
docs/source/
├── ... # skipping boilerplate files
├── pyflocker.ciphers.backends.cryptodome_.AES.rst
├── pyflocker.ciphers.backends.cryptodome_.ChaCha20.rst
├── pyflocker.ciphers.backends.cryptodome_.ECC.rst
├── pyflocker.ciphers.backends.cryptodome_.Hash.rst
├── pyflocker.ciphers.backends.cryptodome_.RSA.rst
├── pyflocker.ciphers.backends.cryptodome_.rst
├── pyflocker.ciphers.backends.cryptography_.AES.rst
├── pyflocker.ciphers.backends.cryptography_.Camellia.rst
├── pyflocker.ciphers.backends.cryptography_.ChaCha20.rst
├── pyflocker.ciphers.backends.cryptography_.DH.rst
├── pyflocker.ciphers.backends.cryptography_.ECC.rst
├── pyflocker.ciphers.backends.cryptography_.Hash.rst
├── pyflocker.ciphers.backends.cryptography_.RSA.rst
├── pyflocker.ciphers.backends.cryptography_.rst
├── pyflocker.ciphers.backends.rst
├── pyflocker.ciphers.base.rst
├── pyflocker.ciphers.exc.rst
├── pyflocker.ciphers.interfaces.AES.rst
├── pyflocker.ciphers.interfaces.Camellia.rst
├── pyflocker.ciphers.interfaces.ChaCha20.rst
├── pyflocker.ciphers.interfaces.DH.rst
├── pyflocker.ciphers.interfaces.ECC.rst
├── pyflocker.ciphers.interfaces.Hash.rst
├── pyflocker.ciphers.interfaces.RSA.rst
├── pyflocker.ciphers.interfaces.rst
├── pyflocker.ciphers.modes.rst
├── pyflocker.ciphers.rst
├── pyflocker.locker.rst
└── pyflocker.rst
Is there any way to generate the doc as a directory tree?
What you specify isn't currently possible.
sphinx-apidoc will not create directories mirroring your package/file structure.
sphinx-apidoc will not distribute .rst files along several directories mirroring your package/file structure.
Notice the sphinx-apidoc signature: you can specify one input path for modules and one output path for the .rst files:
Synopsis
sphinx-apidoc [OPTIONS] -o <OUTPUT_PATH> <MODULE_PATH> [EXCLUDE_PATTERN …]
You'll have to write your own script to recurse into your file system and execute sphinx-apidoc once for every package/directory with <MODULE_PATH> mirroring <OUTPUT_PATH>.
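A minimal sketch of such a driver script (untested; it assumes sphinx-apidoc is on PATH and that every package directory contains an __init__.py):

# Run sphinx-apidoc once per package so the generated .rst files
# land in a directory tree mirroring the source tree.
import os
import subprocess

SRC_ROOT = "pyflocker"
DOCS_ROOT = "docs/source"

for dirpath, dirnames, filenames in os.walk(SRC_ROOT):
    if "__init__.py" not in filenames:
        continue  # skip non-package directories
    out_dir = os.path.join(DOCS_ROOT, os.path.relpath(dirpath, SRC_ROOT))
    # Exclude subdirectories so each run documents only this package's own modules.
    excludes = [os.path.join(dirpath, d) for d in dirnames]
    subprocess.run(
        ["sphinx-apidoc", "-e", "--no-toc", "-o", out_dir, dirpath, *excludes],
        check=True,
    )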
This may seem counter-intuitive; however, the Python philosophy is:
The Zen of Python - PEP 20
Flat is better than nested.
Arguably it is more convenient to have sphinx-apidoc produce the .rst files with dotted names mirroring the package/module structure, because you get an overview of the packages at a glance and it tends to save clicking.
If you want to organize some .rst files into directories afterwards, it is possible to link them; at the time of this writing, however, it is not possible to generate such a tree automatically with a single sphinx-apidoc invocation.
It is possible to do so using sphinx-nested-apidoc.
It mirrors the original package structure and generates appropriate files.
Note that it does not edit the files or the links within them; it just renames or moves them.

Importing multiple files as a single module?

I have been chasing my tail for the last 4 hours here and can't find the solution.
I have the following module/package structure for my project.
.
├── app-cli.py
├── tools
│   ├── __init__.py
│   ├── adapters
│   │   ├── __init__.py
│   │   ├── cli.py
│   │   ├── web.py
│   ├── utils
│   │   ├── __init__.py
│   │   ├── core.py
│   │   │   ├── my_public_method()
│   │   ├── io.py
│   │   │   ├── some_other_public_method()
What I'm trying to do is bundle everything inside utils within the utils namespace.
So when I do import tools at the main level, I can access the util functions as:
tools.utils.my_public_method()
tools.utils.some_other_public_method()
Instead of:
tools.utils.core.my_public_method()
tools.utils.io.some_other_public_method()
I have been editing the __init__.py files, messing around with the levels of imports and attempting to create a shortcut, but with no success.
In your __init__.py inside the utils package you can add
# tools/utils/__init__.py
from .core import my_public_method
from .io import some_other_public_method
and then you can do:
import tools.utils
tools.utils.my_public_method()
tools.utils.some_other_public_method()

How to structure object-oriented Python 3 project and its imports?

I have an object-oriented Python 3.7 project with a structure along the following lines:
├── plugins
│   ├── book_management
│   │   ├── book_inserter.py
│   │   ├── book_remover.py
│   │   ├── __init__.py
│   │   ├── book.py
│   │   ├── book_sampler.py
│   │   ├── operators
│   │   │   ├── __init__.py
│   │   │   ├── register_book.py
│   │   │   ├── unregister_book.py
│   │   │   └── mark_book_as_missing.py
│   ├── __init__.py
│   ├── reader_management
│   │   ├── __init__.py
│   │   ├── reader.py
│   │   ├── reader_creator.py
│   │   ├── reader_emailer.py
│   │   ├── reader_remover.py
│   │   ├── operators
│   │   │   ├── __init__.py
│   │   │   ├── create_reader.py
│   │   │   ├── remove_reader.py
│   │   │   └── email_reader.py
├── tests
│   ├── __init__.py
│   ├── book_management_tests
│   │   ├── __init__.py
│   │   ├── test_book.py
│   │   ├── test_book_inserter.py
│   │   ├── test_book_remover.py
│   │   ├── test_book_sampler.py
│   │   ├── test_mark_book_as_missing_operator.py
│   │   ├── test_register_book_operator.py
│   │   ├── test_unregister_book_operator.py
│   ├── reader_management_tests
│   │   ├── __init__.py
│   │   ├── test_reader.py
│   │   ├── test_reader_creator.py
In a test like test_mark_book_as_missing_operator I end up having imports like:
from plugins.book_management.book_inserter import BookInserter
from plugins.book_management.operators.mark_book_as_missing import (
    MarkBookAsMissingOperator
)
from plugins.reader_management.reader_creator import ReaderCreator
from plugins.reader_management.operators.create_reader import (
    CreateReaderOperator
)
Having these verbose partial imports feels really bad, so I am guessing I must be doing it wrong. Ideally, importing plugins.reader_management and plugins.reader_management.operators, possibly as something shorter, would seem much more readable.
book_inserter.py defines a single class, BookInserter. Ideally, I would like to keep this 1-class / 1-file structure. Obviously, this leads to an inflation of the number of files, but it also allows for shorter, more focused files. But if this is deeply non-Pythonic I am willing to hear why and how I should adapt the code structure.
Finally, I have been using this kind of several-layer architecture (plugins/*_management/operators/*.py), but it leads to very long import lines, and I frequently get legitimate lint issues as a result.
I have been considering importing submodules from top modules (like book_management, in book_management/__init__.py), but I am not sure whether that is good practice, and it also seems like a violation of the principle of not having unused imports in your files. (Also, would I be at risk of circular imports as a result?)
My main question in short: what would be a(the?) Pythonic way to structure such a project and setup the imports (with ideally some justification of why this would be a/the Pythonic way to do it).
It is perfectly fine to use __init__.py to compress your namespace. Use __all__ to clearly declare which imported names are meant for export.
# plugins/book_management/__init__.py
from .book_inserter import BookInserter
from .operators.mark_book_as_missing import MarkBookAsMissingOperator
# more imports

__all__ = [
    'BookInserter',
    'MarkBookAsMissingOperator',
    # more exports
]
This reduces the length and number of imports on usage:
# test_mark_book_as_missing_operator
from plugins.book_management import BookInserter, MarkBookAsMissingOperator
from plugins.reader_management import ReaderCreator, CreateReaderOperator
There seems to be no consensus on whether one-definition-per-file is a bad thing. For the standard library and many third-party modules, though, it is customary to keep all directly related classes and functions together.

Not able to run a Python file which is under a Django project

My project tree looks like this:
├── sizer
│   ├── manage.py
│   ├── node
│   │   ├── __init__.py
│   │   ├── __init__.pyc
│   │   ├── models.py
│   │   ├── models.pyc
│   │   ├── node_serializer.py
│   │   ├── node_serializer.pyc
│   │   ├── part_serializer.py
│   │   ├── part_serializer.pyc
│   │   ├── Part_Serializer.pyc
│   │   ├── test.py
│   │   ├── test.pyc
│   │   ├── tests.py
│   │   ├── tests.pyc
│   │   ├── urls.py
│   │   ├── urls.pyc
│   │   ├── views.py
│   │   └── views.pyc
│   ├── requirement.txt
│   ├── sizer
│   │   ├── __init__.py
│   │   ├── __init__.pyc
│   │   ├── settings.py
│   │   ├── settings.pyc
│   │   ├── urls.py
│   │   ├── urls.pyc
│   │   ├── wsgi.py
│   │   └── wsgi.pyc
│   ├── solver
│   │   ├── attrib.py
│   │   ├── attrib.pyc
│   │   ├── cap.py
│   │   ├── cap.pyc
│   │   ├── __init__.py
│   │   ├── node.py
│   │   ├── node.pyc
│   │   ├── nodes1.json
│   │   ├── nodes2.json
│   │   ├── parts.json
│   ├── strings.py
│   ├── strings.pyc
│   └── workload
│       ├── __init__.py
│       ├── __init__.pyc
│       ├── models.py
│       ├── models.pyc
│       ├── tests.py
│       ├── tests.pyc
│       ├── urls.py
│       ├── urls.pyc
│       ├── views.py
│       └── views.pyc
I have created the node and workload apps with the manage.py startapp command.
Into the above directory structure I copied solver. Now I am importing my node models in the sizer.py file like this:
import json
from pulp import *
from attrib import *
from cap import *
from node import *
from wl import *
from sizer.node.models import Part,Node
When I run python solver/sizer.py I keep getting
ImportError: No module named node.models
Please help me figure out what I am doing wrong here. I have spent more than 4 hours and still can't work it out.
Thanks
If your app name is node then your import statement should look like:
from node.models import Part, Node
Note that this requires that you have already included node in INSTALLED_APPS in your settings.py.
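A sketch of that setting (abbreviating Django's defaults; both apps from the question are listed):

# sizer/sizer/settings.py
INSTALLED_APPS = [
    # ... Django's default apps ...
    'node',
    'workload',
]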
There are multiple reasons why an import might fail.
The module is not on the path. To check this, print sys.path in your script just before you import the module (see the snippet after this list).
The module is broken and cannot be imported. You can check this by opening a Python console in the same directory as the module and attempting the import. Does that work?
Importing the module results in a CIRCULAR import. This means that the module imports another module that imports another module that imports the original module. This is easy enough to avoid with a little thought and a clear hierarchy.
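For the first check, a quick snippet at the top of your script (printing the search path right before the failing import):

import sys
print('\n'.join(sys.path))

from sizer.node.models import Part, Node  # the import that fails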
So, which problem do you have? I have no idea because I can't see the sys.path, and I can't see the code in your files.
What I can see is a bit of a mess. You have multiple modules named 'node'. You have manage.py files at multiple levels. You've included the .pyc files in the output instead of editing them out for the reader. You have so many different modules called 'node', 'sizer' or 'solver' that it must be VERY confusing to figure out which one is being imported at any given time.
Your underlying problem might be that you are trying to work on a project without using source control (git), which means you don't know which changes broke things, and you don't feel brave about making big changes because you have no way of stepping back in time if they don't work out.
