Python circular import in custom package and __init__.py - python

I get ImportError: cannot import name 'Result' from partially initialized module 'libs.elastic_search_hunt' (most likely due to a circular import) error when I try to run tests.
But I don't see any circular imports in my code.
I have a package named elastic_search_hunt, which contains three modules:
elastic_query.py
elastic_query_result.py
search_processor.py
I also have an __init__.py file with the following content:
from libs.elastic_search_hunt.elastic_query import Query
from libs.elastic_search_hunt.search_processor import SearchProcessor
from libs.elastic_search_hunt.elastic_query_result import Result
__all__ = ['Query', 'SearchProcessor', 'Result'] # I guess it does not have any effect
elastic_query.py has only external imports.
elastic_query_result.py has only external imports as well.
search_processor.py has these imports:
from . import Query
from . import Result
Then I have a test file, which imports the Query class:
from libs.elastic_search_hunt import Query
When I run the tests, I get these errors:
test_query.py:2: in <module>
from libs.elastic_search_hunt import Query
..\src\libs\elastic_search_hunt\__init__.py:2: in <module>
from libs.elastic_search_hunt.search_processor import SearchProcessor
..\src\libs\elastic_search_hunt\search_processor.py:4: in <module>
from . import Result
E ImportError: cannot import name 'Result' from partially initialized module 'libs.elastic_search_hunt' (most likely due to a circular import)
But where is the circular import in my code?
I can only assume that when I import Query from the tests, __init__.py also imports search_processor, which in turn loads Query one more time. But the error is about Result from the elastic_query_result module, and I see only one import of Result.
When I delete search_processor from __init__.py, everything works fine.
I have read a lot of questions about circular imports, but all of them were quite obvious and did not involve __init__.py. What am I missing?

TL;DR: replace from . import Query with from .elastic_query import Query (and from . import Result with from .elastic_query_result import Result)
Explanation:
When you import something from the libs.elastic_search_hunt package, Python loads its __init__.py first. Every module is executed on its first import, so __init__.py is executed too.
Python then executes the code in __init__.py, and on the second line
from libs.elastic_search_hunt.search_processor import SearchProcessor
it imports search_processor.py. Since this is its first import, the file must be executed, so all the imports in that file must be executed right now as well.
As you mentioned, you have the following imports in that file:
from . import Query
from . import Result
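At this point you tell Python to look up Query and Result as attributes of the libs.elastic_search_hunt package itself, so that is what Python does.
It goes to libs/elastic_search_hunt/__init__.py but wait... that module is still not completely loaded. Python does not run it a second time; it finds the partially initialized package in sys.modules and looks the names up there. Query is already there, because the first line of __init__.py has already run. Result is not, because the third line (from libs.elastic_search_hunt.elastic_query_result import Result) has not run yet: it is waiting for search_processor to finish, and search_processor is waiting for __init__.py to provide Result.... oh well, there's a loop. That is also why the error mentions Result and not Query.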
So, to avoid this, say explicitly which module you want to load Query and Result from, i.e. change
from . import Query
from . import Result
to
from .elastic_query import Query
from .elastic_query_result import Result
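To see the difference between the two import styles without the real project, here is a minimal, self-contained sketch. It assumes throwaway copies of the package written to a temporary folder (the names hunt_failed and hunt_fixed and the empty class bodies are made up), with the same import order in __init__.py as in the question. The first copy should fail with an ImportError that mentions Result, while the second imports cleanly.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_package(root: Path, name: str, explicit_submodules: bool) -> None:
    """Write a tiny stand-in for libs.elastic_search_hunt (hypothetical content)."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "elastic_query.py").write_text("class Query:\n    pass\n")
    (pkg / "elastic_query_result.py").write_text("class Result:\n    pass\n")
    if explicit_submodules:
        # Fixed version: name the submodule that defines each class.
        imports = ("from .elastic_query import Query\n"
                   "from .elastic_query_result import Result\n")
    else:
        # Original version: ask the half-initialized package for the names.
        imports = "from . import Query\nfrom . import Result\n"
    (pkg / "search_processor.py").write_text(
        imports + "class SearchProcessor:\n    pass\n")
    # Same order as the question's __init__.py: Query, SearchProcessor, Result.
    (pkg / "__init__.py").write_text(
        f"from {name}.elastic_query import Query\n"
        f"from {name}.search_processor import SearchProcessor\n"
        f"from {name}.elastic_query_result import Result\n")


def try_import(name: str) -> str:
    try:
        importlib.import_module(name)
        return "ok"
    except ImportError as exc:
        return f"ImportError: {exc}"


with tempfile.TemporaryDirectory() as tmp:
    build_package(Path(tmp), "hunt_failed", explicit_submodules=False)
    build_package(Path(tmp), "hunt_fixed", explicit_submodules=True)
    importlib.invalidate_caches()
    sys.path.insert(0, tmp)
    failed = try_import("hunt_failed")
    fixed = try_import("hunt_fixed")
    sys.path.remove(tmp)

print(failed)
print(fixed)  # ok
```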

Related

In a Python package's __init__.py file, is there a way of detecting if a package was executed directly?

I would like a way to detect if my module was executed directly, as in import module or from module import * rather than by import module.submodule (which also executes module), and have this information accessible in module's __init__.py.
Here is a use case:
In Python, a common idiom is to add import statements to a package's __init__.py file, for instance to "flatten" the package's namespace and make the contents of its submodules accessible directly. Unfortunately, doing so can make loading a specific submodule very slow, as all other siblings imported in __init__.py will also be executed.
For instance:
module/
    __init__.py
    submodule/
        __init__.py
        ...
    sibling/
        __init__.py
        ...
By adding to module/__init__.py:
from .submodule import *
from .sibling import *
It is now possible for users of the module to access definitions in submodules without knowing the details of the package structure (i.e. from module import SomeClass, where SomeClass is defined somewhere in submodule and exposed in its own __init__.py file).
However, if I now run submodule directly (as in import module.submodule, by calling python3 -m module.submodule, or even indirectly via pytest) I will also, unavoidably, execute sibling! If sibling is large, this can slow things down for no reason.
I would instead like to write module/__init__.py something like:
if __???__ == 'module':
    from .submodule import *
    from .sibling import *
Where __???__ gives me the fully qualified name of the import. Any similar mechanism would also work, although I'm mostly interested in the general case (detecting direct executing) rather than this specific example.
What is being asked for would, if it were actually possible, result in undefined behavior given how the import system works (undefined in the sense of whether or not the flattened names are importable from module).
Hypothetically, suppose what you want is possible: some __dunder__ disambiguates which import statement was used to import module/__init__.py (e.g. import module and from module import *, vs import module.submodule). In the first case, module may trigger the subsequent (slow) imports to produce a "flattened" version of the desired names, while in the latter case (import module.submodule) it will skip them, so module will not contain any of the "flattened" bindings.
To illustrate a bit more, say one may import SiblingClass, which is defined in module.sibling, by simply doing from module import SiblingClass, because module/__init__.py executes the from .sibling import * statement to create that binding. But if executing import module.submodule skips that flattening import, we get the following scenario:
import module.submodule
# module.submodule gets imported
from module import SiblingClass
# ImportError will occur
Why is that? This is simply due to how Python imports a file: the source file is executed in its entirety once, binding its imports, functions and classes to their names, and the module is registered in sys.modules under its import name. Importing the module again will not execute the file again; the module object created by the initial import and stored in sys.modules is simply returned. So if the from .sibling import * statement was not executed during the initial import (i.e. import module.submodule), it will never be executed during a subsequent import of the same module (unless the module is reloaded manually, in which case its code is executed again).
You may verify this by putting a print statement in a file, importing the corresponding module to see the output, and observing that no further output is produced on subsequent imports of that module (related: What happens when a module is imported twice?).
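The print-statement check can be sketched like this. The module name print_once_demo is made up, and the module is written to a temporary folder so the example is self-contained; stdout is captured so the number of times the module body runs can be counted.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

captured = io.StringIO()
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module whose only job is to announce that its body ran.
    Path(tmp, "print_once_demo.py").write_text('print("executing print_once_demo")\n')
    importlib.invalidate_caches()
    sys.path.insert(0, tmp)
    with contextlib.redirect_stdout(captured):
        import print_once_demo             # first import: the file is executed
        import print_once_demo             # found in sys.modules: nothing runs
        importlib.reload(print_once_demo)  # explicit reload: executed again
    sys.path.remove(tmp)

runs = captured.getvalue().splitlines()
print(len(runs))  # 2
```

The second plain import produces no output; only the explicit importlib.reload runs the body again.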
Effectively, the desired functionality as described in the question cannot be implemented in Python.
A related thread on this topic: How to only import sub module without exec __init__.py in the package
This is not a complete solution, but standalone py.test (ignore __init__.py files) proposes setting a global flag to detect when running under tests. This fixes the problem for tests at least, provided the modules concerned don't call each other.

Can't import classes from local files in Python

Local imports between files stored at the same directory level are often confusing to me. I have the following directory structure:
/distrib
__init__.py # empty
bases.py # contains classes MainBase etc.
extension.py # contains classes MainExtension etc
/errors
__init__.py
base_error.py
In bases.py, importing classes from extension.py succeeds. For example, simply using from .extensions import MainExtension works. On the other hand, importing a class from bases.py in extensions.py runs into problems.
Attempt 1
If I use from bases import MainBase, it fails with ModuleNotFoundError: No module named 'bases'.
Attempt 2
If I make it a relative import with from .bases import MainBase, it fails with ImportError: cannot import name 'MainBase'.
Attempt 3
If I import it using from . import bases, there is no error. But then using bases.MainBase triggers the error module distrib.bases has no attribute 'MainBase', and it seems that all classes defined in bases.py are 'missing'.
However, in a different folder such as errors, I can import classes from distrib.bases normally. What exactly is happening here? Doesn't Python allow cyclical import?
You have a circular import. In your bases module you import from extension, and in extension you import from bases. But bases has not finished importing at that point (the classes it defines below that import line do not exist yet), so trying to import anything from bases inside extension results in an ImportError.
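A self-contained sketch that reproduces the cycle, assuming bases.py imports from extension before defining MainBase (the package names distrib_eager/distrib_deferred and the class bodies are made up). It also shows one common workaround that is not part of the answer above: extension binds the bases module object and only reads bases.MainBase at call time, after bases has finished loading. That variant relies on Python 3.7 or newer, which can return a partially initialized submodule for from . import bases.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# bases.py imports from extension.py before it defines MainBase.
BASES_SRC = (
    "from .extension import MainExtension\n"
    "class MainBase:\n"
    "    pass\n"
)
# extension.py as in Attempt 2: needs MainBase while bases.py is half-executed.
EAGER_EXTENSION_SRC = (
    "from .bases import MainBase\n"
    "class MainExtension:\n"
    "    pass\n"
)
# Workaround (not from the answer): bind the module, read MainBase at call time.
DEFERRED_EXTENSION_SRC = (
    "from . import bases\n"
    "class MainExtension:\n"
    "    def make_base(self):\n"
    "        return bases.MainBase()\n"
)


def import_bases(tmp: str, pkg_name: str, extension_src: str):
    pkg = Path(tmp, pkg_name)
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "bases.py").write_text(BASES_SRC)
    (pkg / "extension.py").write_text(extension_src)
    importlib.invalidate_caches()
    return importlib.import_module(f"{pkg_name}.bases")


with tempfile.TemporaryDirectory() as tmp:
    sys.path.insert(0, tmp)
    try:
        import_bases(tmp, "distrib_eager", EAGER_EXTENSION_SRC)
        eager = "ok"
    except ImportError as exc:
        eager = f"ImportError: {exc}"
    import_bases(tmp, "distrib_deferred", DEFERRED_EXTENSION_SRC)
    extension = sys.modules["distrib_deferred.extension"]
    deferred = type(extension.MainExtension().make_base()).__name__
    sys.path.remove(tmp)

print(eager)
print(deferred)  # MainBase
```

Attempt 3 in the question fails for the same reason as the eager version: bases.MainBase is read while bases.py is still only half executed.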

ModuleNotFoundError: struggling with imports within an imported program

I have a program, program1.py, that has this structure:
Program
--program1.py
--__init__.py
--data\
----__init__.py
----helper_data.py
--classes\
----__init__.py
----helper_class.py
In helper_class.py, there is an import statement from data.helper_data import *. When I run program1, this works perfectly.
I have a second program, program2.py. I have put program1.py on my PYTHONPATH. In program2.py, I use import program1. It finds the program, but when running the imports from program1.py, I get the following error stemming from the classes.helper_class: ModuleNotFoundError: No module named 'data.helper_data'.
I think I vaguely understand what's going on, but I can't figure out the fix or the search terms to find the answer. I've tried changing the import in program1 to from ..data.helper_data import * and get an error saying I've tried a relative import beyond the parent-level package. I've also tried from .data.helper_data import * and get the same ModuleNotFoundError.
What can I do?
I think you have to use the sys module:
import sys
sys.path.append(r'E:\ToDataScientist')  # raw string keeps the backslash literal; this is the folder that contains "Program"
from Program.data.helper_data import aa # "aa" is the class or function in helper_data
from Program.data.helper_data import * # include all from helper_data
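A self-contained sketch of the same sys.path idea, assuming a throwaway copy of the Program layout in a temporary folder instead of E:\ToDataScientist (the contents of helper_data.py and the value of aa are made up for illustration).

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway copy of the layout from the question; only the parts needed here.
    program = Path(tmp, "Program")
    (program / "data").mkdir(parents=True)
    (program / "__init__.py").write_text("")
    (program / "data" / "__init__.py").write_text("")
    # Hypothetical content standing in for the real helper_data.py.
    (program / "data" / "helper_data.py").write_text("aa = 'helper value'\n")
    importlib.invalidate_caches()

    # The answer's idea: put the folder that *contains* Program on sys.path,
    # then import through the Program package. insert(0, ...) is used instead
    # of append() only so nothing else named Program can shadow it here.
    sys.path.insert(0, tmp)
    from Program.data.helper_data import aa
    sys.path.remove(tmp)

print(aa)  # helper value
```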

Is it possible to make subpackage appear as the actual package

I came across this question and got quite disappointed by how the tensorflow developers try to make the tensorflow directory appear as the actual package, whereas the real package root is tensorflow/python. By using an __init__.py file of the form
from tensorflow.python import *
del python
they try to achieve this goal. This results in some inconsistent behaviour (at least so it seems to me) when working with the package, e.g.
import tensorflow.python # seems to work
tensorflow.python # AttributeError: no attribute 'python'
from tensorflow.python import Session # works as expected
tensorflow.python.Session # AttributeError: no attribute 'python'
from tensorflow import python # works as expected
tensorflow.nn # works as expected
import tensorflow.nn # ImportError: no module 'tensorflow.nn'
tensorflow.nn.tanh # works as expected
from tensorflow.nn import tanh # ImportError: no module 'tensorflow.nn'
Now, I was wondering whether/how it could be possible to avoid most/all of these issues to get a more consistent behaviour. The first set of inconsistencies could be easily resolved by not deleting the python attribute. However, given that the goal would be to make the complete package appear as if it is a sub-package, this might not be entirely satisfactory.
To keep things simple, let's consider the following package structure
package/
    __init__.py
    api/
        __init__.py
        code.py
where package/api/code.py looks something like
def a():
    return 'alpha'

def b():
    return 'bravo'
and package/api/__init__.py would be
import package.api.code
would it be possible to create package/__init__.py so that the following works
import package.api # ImportError: no module 'package.api'
package.api # AttributeError: no attribute 'api'
from package.api import code # ImportError: no module 'package.api'
package.api.code # AttributeError: no attribute 'api'
from package import api # ImportError: cannot import 'api'
package.code # works as expected
import package.code # works as above
package.code.a # works as expected
from package import a # correctly imports function a
I believe that the last four lines of code should give the expected result by adding to sys.modules, but I do not seem to be able to find a way to make import package.api fail.
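The sys.modules idea can be sketched as follows. This is a minimal, hypothetical example: the package is called aliaspkg so it does not clash with anything real, and it only covers the package.code half (registering package.api.code under a second name); it does not make import package.api fail.

```python
import importlib
import sys
import tempfile
from pathlib import Path

FILES = {
    "aliaspkg/api/__init__.py": "from . import code\n",
    "aliaspkg/api/code.py": (
        "def a():\n"
        "    return 'alpha'\n"
        "\n"
        "def b():\n"
        "    return 'bravo'\n"
    ),
    "aliaspkg/__init__.py": (
        "import sys\n"
        "from .api import code\n"
        "from .api.code import *\n"
        "# Register the real submodule under a second, package.code style name.\n"
        "sys.modules[__name__ + '.code'] = code\n"
        "del sys\n"
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    for relative_path, source in FILES.items():
        path = Path(tmp, relative_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)
    importlib.invalidate_caches()
    sys.path.insert(0, tmp)
    import aliaspkg.code  # resolved through the sys.modules alias set in __init__.py
    from aliaspkg import a
    sys.path.remove(tmp)

print(aliaspkg.code is aliaspkg.api.code)  # True
print(aliaspkg.code.a(), a())              # alpha alpha
```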
Would anyone have an idea on how this could be done? Feel free to point me to use-cases that I am overlooking or should consider to achieve the above-mentioned goal.
First of all - I must say that I literally hate "shady" techniques like this. It has a bad effect on various IDE intellisense and makes the library structure less understandable. But still...
If you want the submodule code.py to act as an actual subpackage, you need to create a dummy package:
package/
    __init__.py
    api/
        __init__.py
        code.py
    code/
        __init__.py
Add this to code/__init__.py:
from package.api.code import *
And this in package/__init__.py:
from package.code import *
And then this part should work as intended:
import package.code # works as expected
package.code # works as expected
package.code.a # works as expected
from package import a # works as expected
If you further add this to the package/__init__.py:
import package.api
del package.api
You basically disconnect users from accessing package.api as an attribute, but nothing else: they can still access the submodule through the subpackage using 'from x import y':
import package.api # works
package.api.a() # AttributeError: module 'package' has no attribute 'api'
import package.api.code # works
package.api.code.a() # AttributeError: module 'package' has no attribute 'api'
from package.api import code # works
code.a() # works
from package import api # works
api.code.a() # AttributeError: module 'package.api' has no attribute 'code'
I managed to write something that almost works (in package/__init__.py):
import sys
from package.api import *

# Iterate over a snapshot: the loop body adds and removes sys.modules entries.
for key in list(sys.modules):
    parts = key.split('.')
    if len(parts) > 1 and parts.pop(0) == __name__:
        subkey = parts.pop(0)
        if subkey == 'api' and len(parts) == 0:
            # Block direct imports of package.api.
            sys.modules['.'.join([__name__, subkey])] = None
        elif subkey == 'api':
            # Re-register package.api.X as package.X.
            m = sys.modules.pop(key)
            sys.modules['.'.join([__name__] + parts)] = m
del api
del sys
The import errors suggest that it is still quite a hack, but apart from that almost all of the examples work as specified if and only if the package has already been loaded once (i.e. if import package or similar has been invoked before running the statements from my question). If the first statement is import package.api, there is therefore no ImportError, which is not what I want.
In an attempt to find a solution for this problem, I stumbled upon this answer, which practically leads to the same behaviour with much more elegant code:
import sys
from package import api
# clean up this module
self = sys.modules.pop(__name__)
del self
# this module becomes hidden module
sys.modules[__name__] = api
sys.modules[api.__name__] = None
del api
del sys
However, this still suffers from the problem that if the first import is something like import package.api, no ImportError is thrown.

Python 3 modules and package relative import doesn't work?

I have some difficulties constructing my project structure.
This is my project directory structure:
MusicDownloader/
    __init__.py
    main.py
    util.py
    chart/
        __init__.py
        chart_crawler.py
    test/
        __init__.py
        test_chart_crawler.py
Here is the code:
1. main.py
from chart.chart_crawler import MelonChartCrawler
crawler = MelonChartCrawler()
2. test_chart_crawler.py
from ..chart.chart_crawler import MelonChartCrawler

def test_melon_chart_crawler():
    crawler = MelonChartCrawler()
3. chart_crawler.py
import sys
sys.path.append("/Users/Chois/Desktop/Programming/Project/WebScrape/MusicDownloader")
from .. import util

class MelonChartCrawler:
    def __init__(self):
        pass
4. util.py
def hi():
    print("hi")
When I run main.py with python main.py inside MusicDownloader, I get this error:
File "main.py", line 1, in <module>
from chart.chart_crawler import MelonChartCrawler
File "/Users/Chois/Desktop/Programming/Project/WebScrape/MusicDownloader/chart/chart_crawler.py", line 4, in <module>
from .. import util
ValueError: attempted relative import beyond top-level package
But when I run my test code in the test directory with py.test test_chart_crawler.py, it works.
When I first encountered absolute and relative imports, they seemed very easy and intuitive, but now they are driving me crazy. I'd appreciate your help. Thanks!
The first problem is that MusicDownloader is not a package. Add an __init__.py to MusicDownloader, next to main.py, and your relative import ..chart should work. Relative imports work only inside packages, so you can't use .. to reach a non-package folder.
Edit: here is a more accurate answer, following the edit to your question.
It's all about __name__. A relative import uses the __name__ of the module it appears in, together with the from .(.) part, to form the full package/module name to import. In simple terms, the importer's __name__ is combined with the from part, with the dots indicating how many trailing components of the name to remove, i.e.:
__name__='packageA.packageB.moduleA' in a file containing the line from .moduleB import something leads to the combined value packageA.packageB.moduleB, so roughly from packageA.packageB.moduleB import something (but not an absolute import, as it would be if typed like that directly).
__name__='packageA.packageB.moduleA' in a file containing the line from ..moduleC import something leads to the combined value packageA.moduleC, so roughly from packageA.moduleC import something (again, not an absolute import, as it would be if typed like that directly).
Here it doesn't really matter whether it's a moduleB(C) or a packageB(C). What matters is that we still have the packageA part, which works as an 'anchor' for the relative import in both cases. If there is no packageA part, the relative import can't be resolved, and we get an error like "Attempted relative import beyond toplevel package".
One more note: when a module is run directly, its __name__ gets the special value __main__, which obviously prevents it from resolving any relative imports.
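The name arithmetic described above can be checked with importlib.util.resolve_name, which turns a relative module name plus a package name into an absolute name. A minimal sketch, assuming that for a plain (non-package) module the package is its __name__ minus the last component; the helper names are made up, and the exact error messages differ from the ones raised by the import statement itself.

```python
from importlib.util import resolve_name


def resolve_in_module(relative: str, module_name: str) -> str:
    """Resolve the part after `from` as it would be inside the plain
    (non-package) module whose __name__ is `module_name`."""
    # For a plain module, __package__ is __name__ minus its last component.
    package = module_name.rpartition(".")[0]
    return resolve_name(relative, package)


def try_resolve(relative: str, module_name: str) -> str:
    try:
        return resolve_in_module(relative, module_name)
    except (ImportError, ValueError) as exc:
        # Python 3.9+ raises ImportError here; older 3.x releases used ValueError.
        return f"error: {exc}"


same_level = try_resolve(".moduleB", "packageA.packageB.moduleA")
two_up = try_resolve("..moduleC", "packageA.packageB.moduleA")
via_main_py = try_resolve("..", "chart.chart_crawler")
via_launcher = try_resolve("..", "MusicDownloader.chart.chart_crawler")
as_script = try_resolve(".", "__main__")

print(same_level)    # packageA.packageB.moduleB
print(two_up)        # packageA.moduleC
print(via_main_py)   # error: attempted relative import beyond top-level package
print(via_launcher)  # MusicDownloader
print(as_script)     # error: no package specified ...
```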
Now, for your case, try adding print(__name__) as the very first line of every file, then run your files in different scenarios and see how the output changes.
Namely if you run your main.py directly, you'll get:
__main__
chart.chart_crawler
Traceback (most recent call last):
File "D:\MusicDownloader\main.py", line 2, in <module>
from chart.chart_crawler import MelonChartCrawler
File "D:\MusicDownloader\chart\chart_crawler.py", line 2, in <module>
from .. import util
ValueError: Attempted relative import beyond toplevel package
What happened here is that main.py has no idea MusicDownloader is a package (even after the previous edit adding __init__.py). In your chart_crawler.py, __name__='chart.chart_crawler', and when resolving the relative import from .., the combined package name needs two parts removed (one for every dot), as explained above. There are just two parts and no enclosing package, so the result would be '', which leads to the exception.
When you import a module, the code inside it is run, so it's almost the same as executing it, except that __name__ does not become __main__ and the enclosing package, if there is one, is 'noticed'.
So the solution is to import main.py as part of the MusicDownloader package. To do that, create a module, say launcher.py, at the same level of the hierarchy as the MusicDownloader folder (next to it, not inside it next to main.py) with the following code:
print(__name__)
from MusicDownloader import main
Now run launcher.py and see the changes. The output:
__main__
MusicDownloader.main
MusicDownloader.chart.chart_crawler
MusicDownloader.util
Here __main__ is the __name__ inside launcher.py. Inside chart_crawler.py, __name__='MusicDownloader.chart.chart_crawler', and when resolving the relative import from .., two parts are removed (one for every dot), as explained above, so the result becomes 'MusicDownloader' and the import becomes from MusicDownloader import util. And as the next line shows, util.py is imported successfully and prints its __name__='MusicDownloader.util'.
So that's pretty much it - "it's all about that __name__".
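Here is a self-contained sketch of the launcher approach: it recreates a stripped-down MusicDownloader in a temporary folder and imports MusicDownloader.main the way launcher.py does. One assumption worth stating: for the output shown above (MusicDownloader.chart.chart_crawler), main.py has to reach chart relatively (from .chart.chart_crawler import MelonChartCrawler) rather than as a top-level chart package, so that is what the sketch uses; the file contents are otherwise placeholders.

```python
import importlib
import sys
import tempfile
from pathlib import Path

FILES = {
    "MusicDownloader/__init__.py": "",
    # Records the name it was imported under instead of printing it.
    "MusicDownloader/util.py": "LOADED_AS = __name__\n",
    "MusicDownloader/chart/__init__.py": "",
    "MusicDownloader/chart/chart_crawler.py": (
        "from .. import util\n"
        "class MelonChartCrawler:\n"
        "    pass\n"
    ),
    # Assumption: main.py imports chart relative to the MusicDownloader package.
    "MusicDownloader/main.py": (
        "from .chart.chart_crawler import MelonChartCrawler\n"
        "crawler = MelonChartCrawler()\n"
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    for relative_path, source in FILES.items():
        path = Path(tmp, relative_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)
    importlib.invalidate_caches()
    # Equivalent of running launcher.py from the folder that contains MusicDownloader.
    sys.path.insert(0, tmp)
    main = importlib.import_module("MusicDownloader.main")
    sys.path.remove(tmp)

crawler_module = type(main.crawler).__module__
util_name = sys.modules["MusicDownloader.util"].LOADED_AS
print(crawler_module)  # MusicDownloader.chart.chart_crawler
print(util_name)       # MusicDownloader.util
```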
P.S. One thing not mentioned is why the test package worked. It wasn't launched in the usual way: you used an additional program (py.test) to launch it, which probably imported it in a way that made the relative import resolvable. To understand this, it's best to look at how that program works.
There's a note in official docs:
Note that relative imports are based on the name of the current module. Since the name of the main module is always "__main__", modules intended for use as the main module of a Python application must always use absolute imports.
