Python project with several modules and dependencies at several levels, for development - python

I have a project, this project has several modules, let's pick this as an example:
Utility: shared functions used by all the other modules.
Read: reads files using configuration provided in its init function; it can be called as needed.
Process Data 1 (PD1): works with some of the data read by the "Read" module.
Process Data 2 (PD2): works with some of the data read by the "Read" module and also uses PD1.
Manage: registers every process-data module and handles them.
First, Utility is used by all the modules, so I can't just copy its folder inside every module that wants to use it. The dependencies between PD1, PD2, and Manage mean that Manage must be able to load or access PD1, but PD2 needs PD1 too. The Read module has similar problems.
But where is the problem? Usually, when we add a new module we put it in its own folder, especially if it has a lot of files, so as not to mix things up. That doesn't work well here: excluding symbolic links, there is no arrangement of folders alone that gives every module the right access to every other one. Well, there are some ways, but here is the point:
If we settle on an arrangement now, adding a new module that uses the right modules can break it.
I tried the following: I put all the module folders in one folder and then added the ".." path, so now any existing or new module can import any module it needs. But this has several problems: the import machinery starts doing weird things. The workaround is to first import all external modules and then add the new ones, but that is far too messy.
There is a bigger solution: I can handle all the modules independently and install each of them on the system, so that all the modules are exposed to one another and everything works properly. That is how it should work; conceptually, and in Python terms, it is right. The major issue is that installing everything this way is time-consuming during development: I can't test the "system" unless everything is installed, so I would have to reinstall everything for every single test.
Another thing to consider: if I install all the modules on the system and work with many of them, their names can conflict with other installed modules. So I need a way to "encapsulate" all these modules in a single module and expose them only "inside" it.
In summary, I need a "super module" that is exposed to the system and contains Utility, Read, PD1, PD2, and Manage, where each of these modules can call every other one, without adding any new "path" to Python's variables in any way.
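One way to get exactly this is a single top-level package with relative imports inside it. A minimal sketch, assuming a hypothetical top-level name supermodule (all names and the file layout below are illustrative, not taken from the question):

supermodule/
    __init__.py
    utility/__init__.py
    read/__init__.py
    pd1/__init__.py
    pd2/__init__.py
    manage/__init__.py

# supermodule/pd2/__init__.py -- modules reach each other with relative
# imports, so nothing outside supermodule ever has to be added to sys.path:
from ..utility import helpers   # hypothetical shared-function module
from ..pd1 import process       # PD2 depends on PD1 (hypothetical name)

Installed once with pip install -e . (editable mode), the package exposes only the single name supermodule to the system, and source edits take effect without reinstalling, which removes the per-test installation cost.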

Related

List all files/modules imported but nothing else in Python?

I'm working with some code that's been handed off to me by someone else. I need to reuse some parts of the project, but not all of it. I'm basically adding a Flask server that hosts a single function from the whole project.
I'm packaging it into a Docker image, and I want to keep it as lightweight as possible. For this, I'm planning to include only the files that are actually used/referenced from my entry point, a Flask server written in the "server.py" file.
Is there a way to get a list of files that are referenced/imported via my entry point into the code? I'm thinking something along the lines of pipreqs, except I'm not looking for pip modules, but rather files in my folder.
For example, my "server.py" imports a local module called "feature_extraction", which then imports another local module called "preprocessing", and so on. But starting from my server, the module "train_model" is never imported. I'd like to keep only the files that are necessary (~"feature_extraction" and "preprocessing") and not use any of the other files in the project (~"train_model").
Is there a module that supports this? If not, how would I go about it programmatically? Doing this by hand would take a long time.
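The standard library's modulefinder can produce such a list by statically following the imports from an entry point. A minimal sketch (it only sees static import statements, so anything imported dynamically is missed; the project-root filter below is an assumption):

import os
from modulefinder import ModuleFinder

project_dir = os.path.abspath('.')   # assumes you run this from the project root
finder = ModuleFinder()
finder.run_script('server.py')
for name, mod in finder.modules.items():
    # keep only modules whose source file lives inside this project
    if mod.__file__ and os.path.abspath(mod.__file__).startswith(project_dir):
        print(name, mod.__file__)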

import all functions in a package to one module of the same package

I have a package animals with several thousand modules dog_1.py, cat_3.py, etc. Each module contains functions color(), size(), etc., and these functions often depend on functions in other modules of the package. In other words, the bark() function in dog_2.py might depend on five other functions from different modules within the same animals package. At the bottom of each module I have if __name__ == '__main__':, because I need to be able to run each module as a stand-alone script, but I don't want the module to execute when it is imported into another module. I could accomplish what I need by adding a bunch of import statements to each module to satisfy the dependencies, but I thought there must be a better way to do this.
How is this type of problem typically handled?
Essentially, I need to be able to run each module separately as a script, and each module has (possibly) hundreds of dependencies within the same package.
I tried adding to the __init__.py the statement __all__ = ['dog_1.py', 'cat_3.py', ...]. This seemed like a good idea. But the problem I ran into is that when I include from animals import * at the top of a given module, importing the current module itself causes an error. I thought it was possible to import a module into itself, but for some reason it is not working. If I remove the current module from the __all__ list, then it seems to work fine.
I also tried installing the package locally by creating an outer directory and adding .cfg and setup.py files. I thought that might fix the problem, but I wasn't able to get anywhere.
I feel like I might be going about this completely the wrong way. It seems like it should be an easy problem to handle: make a bunch of program files in one directory and have them all share functions with each other. Any help is appreciated. Thank you.
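The usual pattern is indeed explicit per-module imports, combined with running each module as part of the package so those imports resolve. A minimal sketch, reusing the hypothetical names from the question:

# animals/dog_2.py
from animals.cat_3 import color   # explicit imports of just what this module needs
from animals.dog_1 import size

def bark():
    return color() + size()

if __name__ == '__main__':        # still runnable stand-alone
    print(bark())

Run it from the directory containing the package with python -m animals.dog_2 rather than python animals/dog_2.py, so that the animals package is importable while the module executes as a script.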

Can a package containing several sub-packages also be called a library in Python?

I am a little bit confused about the difference between a package and a library. When I install packages from pypi.org, these packages contain several sub-packages, which contain modules. When I googled the difference between a package and a library, I found this.
That being the case, can a package that contains several sub-packages also be called a library? If not, then what is a library? And what is the difference between a library and a package containing sub-packages?
Library
Most often this refers to the general library, or to another collection created with a similar format and use. The general library is the sum of 'standard', popular, and widely used modules, which can be thought of as single-file tools or shortcuts that make things possible or faster. The general library is an option most people enable when installing Python. Because it has the name "Python General Library", collections with a similar structure and idea are often described the same way: simply a bunch of modules, maybe even packages, grouped together, usually in a list. The list usually exists so you can download them. Generally it is just related files with similar interests. That is the easiest way to describe it.
Module
A module refers to a file. The file has a script 'in it', and the name of the file is the name of the module; Python files end with .py. All the file contains is code that, run together, makes something happen, by using functions, strings, etc. The modules you probably see most often are popular because they are special modules that can get info from other files/modules. It is confusing because the names of the file and the module are equal, just dropping the .py. Really it's just code you can use as a shortcut, written by somebody, to make something easier or possible.
Package
This is a term that is used somewhat loosely, and context makes a difference. The most common use, in my experience, is for multiple modules (or files) that are grouped together. Why they are grouped together can be for a few reasons; that is where context matters. These are the ways I have noticed the term package(s) used: a group of downloaded, created, and/or stored modules. All of these can be true, or only one, but really a package is a file that references other files, which need to be in the correct structure or format; that entire sum is the package itself, whether installed or included in the Python general library. A package can contain modules (.py files) because they depend on each other, and sometimes would not work correctly, or at all, without one another. There is always a common goal shared by every part (module/file) of a package, and the total sum of all the parts is the package itself.
Most often in Python, packages are also modules, because the package name is the name of the module used to connect all the pieces. So you can import a package because it is a module, and this also allows it to call upon other modules that are not packages, because they only perform a certain function or task and don't involve other files. Packages have a goal, and each module works together to achieve that final goal.
Most confusion comes from a simple file name or prefix to a file being used first as the module name and then again as the package name.
Remember that modules and packages can be installed. Library is usually a generic term for a listed or formatted group of modules and packages, much like Python's general library. A strict hierarchy would not really work: APIs do not belong in it, and if you included them they could sit anywhere and everywhere involving script, module, and package; the word library is such a general word, so easily applied to many things, that an API can sit above or below it too. Some modules can be based on other code, and that is the only time I think this would relate to a purely Python-related discussion.
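A short concrete illustration of the distinction, using only standard-library names:

import math                    # math is a module: a single file (or built-in)
import xml.etree.ElementTree   # xml is a package; etree is a sub-package inside
                               # it; ElementTree is a module inside etree
# The whole collection of such modules and packages shipped with Python is
# what people loosely call its (standard) library.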

python 3 import from subdir

My project has to be extensible; I have a lot of scripts with the same interface that look up things online. Before, I was using __import__, but that does not let me put my 'plugins' in a dedicated directory:
root/
    main.py
    plugins/
        [...]
So my question is: is there a way to individually import modules from that subdirectory? I'm guessing importlib, but I'm so lost in how the Python module-loading process works... What I want to do is something like this:
for pluginname in plugins:
    plugin = somekindofimport("plugins/{name}".format(name=pluginname))
    plugin.unififedinterface()
Also, as a side question: is the way I'm trying to achieve extensibility a good one?
I'm on Python 3.3.
Stop thinking in terms of pathnames and start thinking in terms of packages. Read Packages in the tutorial, and if you want more detail see The import system.
But the basic idea is this:
Create a file named plugins/__init__.py. It can be empty; that's enough to turn plugins into a package. Which means you can import modules from that package with:
import plugins.plugin
So, how do you do this dynamically? That's what importlib is for. (You can also use __import__ here, but it's less flexible, and less readable in non-trivial cases, so unless you need pre-3.3 compatibility, don't.)
plugin = importlib.import_module('plugins.{name}'.format(name=pluginname))
It would probably be cleaner to import plugins to get the package, and then use relative imports from within that package, as shown in the examples in the import_module docs.
This also means Python takes care of the .pyc creation and caching, etc.
And it means that you can later expand plugins to be a "namespace package", which can be split across multiple directories like /usr/share/myapp/plugins for stock plugins, /etc/myapp/plugins for site plugins and ~/myapp/plugins for user-specific plugins.
If you really, really want to import from a directory that isn't a package, you can create a module loader and use it, but that's a whole lot of work for no actual benefit. (It's actually not that hard in 3.3 (SourceLoader and friends will do most of the work for you), but you will find almost no examples out there to guide you; instead, you'll find examples of the 2.6-3.2 way, or the 2.0-2.5 way, both of which are hard.) Plus, it means that if someone creates a plugin named, say, gzip, you can end up blocking the stdlib gzip module with the plugin. (That's especially fun if the gzip plugin tries to use the gzip stdlib module, as it likely will…) If the plugin ends up being named plugins.gzip, there's no problem.
Also, as a side question: is the way I'm trying to achieve extensibility a good one?
As long as you only want to support 3.3+, yes, I think this is a great solution.
Before 3.3, using a package for plugins was a lot more problematic. People have come up with a variety of different plugin systems—in one case going so far as to dynamically create module objects and execfile into them. If you need to deal with that, I would suggest looking at existing Python apps with plugins (e.g., MusicBrainz Picard) to get different ideas.
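Putting the pieces together, a minimal sketch of discovering and loading every module in the package at runtime, using only the stdlib pkgutil and importlib (unififedinterface is the placeholder name from the question):

import importlib
import pkgutil

import plugins  # the package created above

for _, name, _ in pkgutil.iter_modules(plugins.__path__):
    plugin = importlib.import_module('plugins.{}'.format(name))
    plugin.unififedinterface()  # call whatever common interface the plugins share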

Local collection of Python packages: best way to import them?

I need to ship a collection of Python programs that use multiple packages stored in a local Library directory: the goal is to avoid having users install packages before using my programs (the packages are shipped in the Library directory). What is the best way of importing the packages contained in Library?
I tried three methods, but none of them appears perfect: is there a simpler and robust method? or is one of these methods the best one can do?
In the first method, the Library folder is simply added to the library path:
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
import package_from_Library
The Library folder is put at the beginning so that the packages shipped with my programs have priority over the same modules installed by the user (this way I am sure that they have the correct version to work with my programs). This method also works when the Library folder is not in the current directory, which is good. However, this approach has drawbacks. Each and every one of my programs adds a copy of the same path to sys.path, which is a waste. In addition, all programs must contain the same three path-modifying lines, which goes against the Don't Repeat Yourself principle.
An improvement over the above problems consists in trying to add the Library path only once, by doing it in an imported module:
# In module add_Library_path:
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
and then to use, in each of my programs:
import add_Library_path
import package_from_Library
This way, thanks to the caching mechanism of CPython, the module add_Library_path is only run once, and the Library path is added only once to sys.path. However, a drawback of this approach is that import add_Library_path has an invisible side effect, and the order of the imports matters: this makes the code less legible and more fragile. It also forces my distribution of programs to include an add_Library_path.py file that users will never use directly.
Python modules from Library can also be imported by making it a package (empty __init__.py file stored inside), which allows one to do:
from Library import module_from_Library
However, this breaks for packages in Library, as they might do something like from xlutils.filter import …, which breaks because xlutils is not found in sys.path. So, this method works, but only when including modules in Library, not packages.
All these methods have some drawback.
Is there a better way of shipping programs with a collection of packages (that they use) stored in a local Library directory? or is one of the methods above (method 1?) the best one can do?
PS: In my case, all the packages from Library are pure Python packages, but a more general solution that works for any operating system is best.
PPS: The goal is that the user be able to use my programs without having to install anything (beyond copying the directory I ship them regularly), like in the examples above.
PPPS: More precisely, the goal is to have the flexibility of easily updating both my collection of programs and their associated third-party packages from Library by having my users do a simple copy of a directory containing my programs and the Library folder of "hidden" third-party packages. (I do frequent updates, so I prefer not forcing the users to update their Python distribution too.)
Messing around with sys.path leads to pain... The modern package template and Distribute contain a vast array of information and were in part set up to solve your problem.
What I would do is set up setup.py to install all your packages either to a specific site-packages location or to the system's site-packages. In the former case, the local site-packages directory would then be added to the PYTHONPATH of the system/user. In the latter case, nothing needs to change.
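A minimal sketch of such a setup.py, assuming the third-party packages sit under Library/ (the distribution name is hypothetical; setuptools is the real API):

from setuptools import setup, find_packages

setup(
    name='my-programs',               # hypothetical distribution name
    version='1.0',
    package_dir={'': 'Library'},      # packages are rooted in the Library folder
    packages=find_packages('Library'),
)

# Installed with, e.g.:
#   python setup.py install --prefix=/path/to/local/site-packages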
You could use a batch file to set the Python path as well. Or change the Python executable to point to a shell script that contains a modified PYTHONPATH and then executes the Python interpreter. The latter, of course, means that you have to have access to the user's machine, which you do not. However, if your users only run scripts and do not import your own libraries, you could use your own wrapper for scripts:
#!/path/to/my/python
And the /path/to/my/python script would be something like:
#!/bin/sh
PYTHONPATH=/whatever/lib/path:$PYTHONPATH /usr/bin/python "$@"
I think you should have a look at path import hooks, which allow you to modify Python's behaviour when searching for modules.
For example, you could do something like KDE's script engine does for Python plugins [1].
It adds a special token to sys.path (like "<plasmaXXXXXX>", with XXXXXX being a random number, just to avoid name collisions), and then, when Python tries to import a module and can't find it on the other paths, it calls your importer, which can deal with it.
A simpler alternative is to have a main script used as a launcher, which simply adds the path to sys.path and executes the target file, so that you can safely avoid putting the sys.path.append(...) line in every file (a minimal sketch follows).
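A minimal sketch of that launcher, using the stdlib runpy module (file and folder names are illustrative):

# launcher.py -- run as: python launcher.py myprogram.py
import os
import runpy
import sys

# Add the shipped Library folder once, in front, for every program launched.
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), 'Library'))

# Execute the requested program as if it had been run directly.
runpy.run_path(sys.argv[1], run_name='__main__')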
Yet another alternative, which works on Python 2.6+, is to install the library under the per-user site-packages directory.
[1] You can find the source code under /usr/share/kde4/apps/plasma_scriptengine_python in a Linux installation with KDE.
