Distinguish between imported and globally defined module attributes - python

How can I programmatically distinguish between attributes defined at a module's global level and those imported from other modules? For instance, I want to know which module HIGHEST_PROTOCOL and MY_HIGHEST_PROTOCOL defined in mymod.py belong to.
Contents of mymod.py:
from pickle import HIGHEST_PROTOCOL
MY_HIGHEST_PROTOCOL = 123
Inspecting in IPython.
In [2]: import mymod
In [3]: dir(mymod)
Out[3]:
['HIGHEST_PROTOCOL',
'MY_HIGHEST_PROTOCOL',
'__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__']
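There's no built-in flag on an attribute recording where it came from, but one workable heuristic is to scan sys.modules for another loaded module that exposes the identical object under the same name. A self-contained sketch of that idea (mymod is simulated in-line; note that identity checks can misfire on interned small ints and strings):

```python
import sys
import types

# Simulate mymod.py so the sketch runs standalone
mymod = types.ModuleType("mymod")
exec("from pickle import HIGHEST_PROTOCOL\nMY_HIGHEST_PROTOCOL = 123",
     mymod.__dict__)

def probable_origin(module, name):
    """Heuristic: if another loaded module exposes the identical object
    under the same name, the attribute was probably imported from there."""
    obj = getattr(module, name)
    for modname, mod in list(sys.modules.items()):
        if mod is None or mod is module:
            continue
        try:
            if getattr(mod, name, None) is obj:
                return modname
        except Exception:       # some modules raise on attribute access
            continue
    return module.__name__      # no other owner found: defined locally

print(probable_origin(mymod, "HIGHEST_PROTOCOL"))     # e.g. 'pickle'
print(probable_origin(mymod, "MY_HIGHEST_PROTOCOL"))  # 'mymod'
```

Because the check is identity-based, treat the result as a best guess rather than ground truth, especially for small integers and short strings that CPython interns.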


python importlib.import_module requires explicitly imported module

I want to implement a sort of plugin architecture that dynamically loads modules and calls a function from them.
For instance, the plugin code looks like this (in file "foo_func.py"):
foo_local = []

def foo_add(a: int, b: int) -> int:
    c = a + b
    foo_local.append(c)
    return c

def foo_print():
    print(foo_local)
I need to support two plugins with the same code but with different memory state, so I created a directory structure like this:
<ROOT_PROJECT>
    app.py
    bar/
        apple/
            foo/
                foo_func.py
                __init__.py
        orange/
            foo/
                foo_func.py
                __init__.py
The code in the "apple" and "orange" folders is identical.
Then in the app file I try to load the modules and invoke functions from them:
import importlib
from bar.apple.foo.foo_func import foo_add as apple_foo_add, foo_print as apple_foo_print
from bar.orange.foo.foo_func import foo_add as orange_foo_add, foo_print as orange_foo_print
apple = importlib.import_module('bar.apple.foo')
orange = importlib.import_module('bar.orange.foo')
apple_foo = getattr(apple, 'foo_func')
orange_foo = getattr(orange, 'foo_func')
apple_foo_add_my = getattr(apple_foo, 'foo_add')
apple_foo_print_my = getattr(apple_foo, 'foo_print')
apple_foo_add_my(1, 2)
apple_foo_print_my()
And this works fine, but notice these import lines at the top:
from bar.apple.foo.foo_func import foo_add as apple_foo_add, foo_print as apple_foo_print
from bar.orange.foo.foo_func import foo_add as orange_foo_add, foo_print as orange_foo_print
They are not used in the code (even PyCharm complains about that),
but if I comment them out and run the app, it fails:
AttributeError: module 'bar.apple.foo' has no attribute 'foo_func'
Why?
I would assume normal plugins should only need importlib.import_module and getattr; shouldn't that be enough?
What is wrong here?
Let's switch completely to direct imports for this explanation, because:
import something.whatever as name
is the same as:
name = importlib.import_module("something.whatever")
So let's rewrite your apple code:
apple = importlib.import_module('bar.apple.foo')
apple_foo = getattr(apple, 'foo_func')
will become:
import bar.apple.foo as apple
apple_foo = apple.foo_func
Now, the first line loads bar.apple.foo as a module. For a package, this means executing the package's __init__.py and treating the resulting module object as the package.
And what's in the package's __init__.py? Usually nothing! That's why the name lookup fails.
However, when you do any import my_package.whatever, the submodule is loaded and bound as an attribute on the package, so the name becomes visible. You're essentially pre-loading the module for the interpreter to find.
Why is PyCharm giving you the "not used" suggestion? Because the name isn't used as a variable anywhere; you're relying only on the import's side effect, and PyCharm doesn't analyze strings for imports or attributes.
Visual example, with a part of standard library:
>>> import xml
>>> dir(xml)
['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>> xml.etree
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'xml' has no attribute 'etree'
>>>
>>> import xml.etree
>>> dir(xml)
['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'etree']
And another example, what happens if there are multiple modules in the package:
>>> import dateutil
>>> dir(dateutil)
['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_version']
but:
>>> import dateutil.parser
>>> dir(dateutil)
['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_common', '_version', 'parser', 'relativedelta', 'tz']
All the sub-modules that dateutil.parser pulls in are now visible and usable via their qualified names.
tl;dr: import my_package == only my_package/__init__.py is executed. import my_package.whatever == Python loads the submodule and binds it as an attribute of my_package, so it (and any siblings it imports in turn) becomes visible and usable.
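Applied to the plugin loader above, this means you can drop the static imports entirely and import each submodule by its full dotted path. A self-contained sketch of that idea (it rebuilds a throwaway copy of the bar/&lt;fruit&gt;/foo layout in a temp directory purely so the example runs standalone):

```python
import importlib
import os
import sys
import tempfile

# Scaffolding: recreate bar/<fruit>/foo/foo_func.py on disk so this
# sketch is runnable without the real project.
PLUGIN_SRC = """\
foo_local = []

def foo_add(a: int, b: int) -> int:
    c = a + b
    foo_local.append(c)
    return c

def foo_print():
    print(foo_local)
"""

root = tempfile.mkdtemp()
for fruit in ("apple", "orange"):
    pkg_dir = os.path.join(root, "bar", fruit, "foo")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "foo_func.py"), "w") as f:
        f.write(PLUGIN_SRC)
for dirpath, _, _ in os.walk(os.path.join(root, "bar")):
    open(os.path.join(dirpath, "__init__.py"), "w").close()  # mark packages
sys.path.insert(0, root)

# The point: import the submodule by its full dotted path. No static
# "from bar.apple.foo.foo_func import ..." lines are needed.
apple = importlib.import_module("bar.apple.foo.foo_func")
orange = importlib.import_module("bar.orange.foo.foo_func")

apple.foo_add(1, 2)
apple.foo_print()   # [3]
orange.foo_print()  # [] -- two separate modules, two separate states
```

Because each dotted path names a distinct module, the two plugins keep independent foo_local state, which is exactly the requirement in the question.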

Import not Working

I have two files, say a.py and b.py.
In a.py we do:
import xxx
from b import *
In b.py we have a function which requires module xxx.
Now when the function in b.py is called from a.py, it can't find the module xxx.
Why is that, and what can be the solution here?
I can't do import xxx in b.py for some reason.
MCVE:
a.py
import xxx
from b import *
fun()
b.py
def fun():
    xxx.dosomething()
Error:
NameError: global name 'xxx' is not defined
In Python, every module has its own global namespace. There is also a single namespace containing all the built-in names, which is shared by all modules; module globals, however, are not shared. When you import a module, its name is added to the importing module's global namespace, not to the built-in namespace.
The import statement does two things:
one, if the requested module has not been loaded yet, it executes the code in the imported file;
two, it binds the module object to a name. Subsequent import statements skip the first step.
The main point is that the code in a module is executed exactly once, no matter how many times it is imported from various other modules.
SOURCE
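To see why fun() fails even though a.py imported xxx: a function resolves global names through the namespace of the module it was defined in, never the caller's. A minimal sketch, with b.py simulated via exec and xxx replaced by a hypothetical stand-in object:

```python
import types

# Simulate b.py: a module whose function body refers to a global 'xxx'
b = types.ModuleType("b")
exec("def fun():\n    return xxx.dosomething()", b.__dict__)

# Simulate a.py: bind a stand-in for module xxx here,
# then grab fun the way "from b import *" would
class _Xxx:                     # hypothetical stand-in for module xxx
    @staticmethod
    def dosomething():
        return "done"

xxx = _Xxx()
fun = b.fun

try:
    fun()                       # looks up 'xxx' in b's globals, not ours
except NameError as e:
    print(e)                    # name 'xxx' is not defined

b.xxx = xxx                     # inject into b's namespace -> now it works
print(fun())                    # done
```

fun.__globals__ is b's module dict, so only names placed there (or built-ins) are visible inside fun, regardless of what the caller imported.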
Question:
a.py:
import numpy
print("a.py is imported")
b.py:
import a
numpy.zeros(8)
Result (python3 b.py):
a.py is imported
Traceback (most recent call last):
File "b.py", line 3, in <module>
numpy.zeros(8)
NameError: name 'numpy' is not defined
Answer:
I guess this makes sense for writing a library. Say a.py is part of the library, b.py is the user's program that uses the library, and I wrote the library. If everything I imported in a.py (import numpy) showed up in b.py, my library's API wouldn't be clean, because I couldn't hide the numpy library from its users. I guess that is why libraries imported in a.py are hidden from b.py when b.py imports a.py.
Here is a set of two files that attempt to simulate your issue. Version 1 is what you describe and Version 2 is what works.
VERSION 1 (OP issue)
file 'a.py':
print("a.py: entered a.py")
import math
print("a.py: imported math")
print("a.py: 1st dir()={}".format(dir()))
from b import *
print("a.py: imported * from b")
print("a.py: 2nd dir()={}".format(dir()))
def angle(x, y):
    return math.acos(x/mysq(x*x+y*y))
print("a.py: angle has been defined")
print("a.py: 3rd dir()={}".format(dir()))
import b
print("a.py: dir(b)={}".format(dir(b)))
file 'b.py':
print("b.py: entered b.py")
print("b.py: 1st dir():{}".format(dir()))
def mysq(x):
    return math.sqrt(x)
print("b.py: mysq has been defined")
print("b.py: 2nd dir():{}".format(dir()))
print("b.py: leaving b.py...")
Then
>>> import a
a.py: entered a.py
a.py: imported math
a.py: 1st dir()=['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'math']
b.py: entered b.py
b.py: 1st dir():['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__']
b.py: mysq has been defined
b.py: 2nd dir():['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'mysq'] # <-- NOTICE that module 'b' still hasn't
        #     loaded 'math' before leaving!
b.py: leaving b.py...
a.py: imported * from b
a.py: 2nd dir()=['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'math', 'mysq']
a.py: angle has been defined
a.py: 3rd dir()=['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'angle', 'math', 'mysq']
a.py: dir(b)=['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'mysq'] # <-- NOTICE that module 'b' is still not aware of 'math'!!!
>>> a.angle(7,8)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/.../a.py", line 9, in angle
return math.acos(x/mysq(x*x+y*y))
File "/Users/.../b.py", line 4, in mysq
return math.sqrt(x)
NameError: name 'math' is not defined
VERSION 2 (working)
Put import math in b.py and remove it from a.py:
file 'a.py':
from b import *
print("a.py: imported * from b")
print("a.py: 1st dir()={}".format(dir()))
def angle(x, y):
    return math.acos(x/mysq(x*x+y*y))
print("a.py: angle has been defined")
print("a.py: 2nd dir()={}".format(dir()))
file 'b.py':
print("b.py: entered b.py")
import math
print("b.py: loaded math")
print("b.py: 1st dir():{}".format(dir()))
def mysq(x):
    return math.sqrt(x)
print("b.py: mysq has been defined")
print("b.py: 2nd dir():{}".format(dir()))
print("b.py: leaving b.py...")
Then
>>> import a
b.py: entered b.py
b.py: loaded math
b.py: 1st dir():['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__', 'math']
b.py: mysq has been defined
b.py: 2nd dir():['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'math', 'mysq']
b.py: leaving b.py...
a.py: imported * from b
a.py: 1st dir()=['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'math', 'mysq'] # <-- NOTICE 'math' in a.py!!!
a.py: angle has been defined
a.py: 2nd dir()=['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'angle', 'math', 'mysq']
>>> a.angle(7,8)
0.8519663271732721
I can't explain (formulate) exactly the machinery behind this behavior, but it seems reasonable to me: how is mysq() in b.py supposed to know about math? The output from the numerous print statements indicates that in Version 1 (the OP's setup), importing from b brings into a.py's namespace everything that was defined or imported in b.py. The entire b.py is executed once at the time of the import into a.py; however, b itself never "knows" anything about math.
In Version 2 everything works as expected because math is imported into b; b is executed at the time of its import into a, and from b import * then brings everything in b (including math) into a.
Now, let's do some more experimentation... Let's break version 2:
VERSION 2b (broken)
In this version we modify a.py as follows (b.py stays the same as in Version 2):
file 'a.py':
import b # <-- We do not import 'math' from b into a!
# Is it still "loaded" somehow into 'a'?
def angle(x, y):
    return math.acos(x/b.mysq(x*x+y*y))
Importing "just" b itself (as opposed to importing everything from b) does not import math into a:
>>> a.angle(7,8)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/.../a.py", line 10, in angle
return math.acos(x/b.mysq(x*x+y*y))
NameError: name 'math' is not defined
VERSION 1b (fixed)
Finally, let's fix Version 1 by importing everything from a into b as well as continuing to import everything from b into a:
file 'a.py':
print("a.py: entered a.py")
import math
print("a.py: imported math")
print("a.py: 1st dir()={}".format(dir()))
from b import *
print("a.py: imported * from b")
print("a.py: 2nd dir()={}".format(dir()))
def angle(x, y):
    return math.acos(x/mysq(x*x+y*y))
print("a.py: angle has been defined")
print("a.py: 3rd dir()={}".format(dir()))
import b # extra check of b
print("a.py: dir(b)={}".format(dir(b)))
file 'b.py':
print("b.py: entered b.py")
print("b.py: 1st dir():{}".format(dir()))
from a import *
print("b.py: imported * from a")
print("b.py: 2nd dir():{}".format(dir()))
def mysq(x):
    return math.sqrt(x)
print("b.py: mysq has been defined")
print("b.py: 3rd dir():{}".format(dir()))
print("b.py: leaving b.py...")
Then
>>> import a
a.py: entered a.py
a.py: imported math
a.py: 1st dir()=['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'math'] # 'math' is loaded first into 'a'
b.py: entered b.py
b.py: 1st dir():['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__'
] # 'b' doesn't "know" yet about 'math'
b.py: imported * from a
b.py: 2nd dir():['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'math'] # after importing *(!!!) from 'a' into 'b', 'b' now has 'math'
b.py: mysq has been defined
b.py: 3rd dir():['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'math', 'mysq']
b.py: leaving b.py...
a.py: imported * from b
a.py: 2nd dir()=['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'math', 'mysq'] # NOTICE: math is not imported twice into 'a'
a.py: angle has been defined
a.py: 3rd dir()=['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'angle', 'math', 'mysq']
a.py: dir(b)=['__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__spec__',
'math', 'mysq'] # just to make sure, check that 'b' still has 'math' defined.
>>> a.angle(7,8)
0.8519663271732721
So, you can fix your code by importing * from a into b as well as * from b into a. You cannot just import a package xxx and b into a and expect b to magically learn about xxx. b is not aware of a when it is imported into a, just like math has no clue that it was imported into a and cannot "learn" what other packages were imported into a alongside it.
By the way, you can easily break the fixed version 1b again by switching the order of imports in a.py:
file 'a.py':
from b import * # swap order of imports breaks Version 1b!
import math
Based on my experimentation in my previous answer and with some information from How to get a reference to current module's attributes in Python, I came up with a solution that actually may fix your problem with imports. All the changes are made exclusively to the file a.py and b.py is not touched.
Solution 1:
# in file a.py do this
import xxx
import sys # OR import b (see below)
from b import *
b = sys.modules[fun.__module__] # alternatively, "import b" and drop "import sys" above
# "inject" 'xxx' into 'b':
b.__dict__['xxx'] = globals()['xxx']
Solution 2:
# in file a.py do this
import xxx
import sys
from b import *
b = sys.modules[fun.__module__] # alternatively, "import b"
# "inject" 'xxx' into 'b':
b.__dict__['xxx'] = sys.modules[__name__].__dict__['xxx']
EXAMPLE:
file a.py:
import math # my version of 'xxx'
import sys
from b import *
b = sys.modules[mysq.__module__] # mysq is a function defined in b.py
b.__dict__['math'] = globals()['math']
def angle(x, y):
    return math.acos(x / mysq(x*x + y*y))
file b.py:
def mysq(x):
    return math.sqrt(x)
Run:
>>> import a
>>> a.angle(7, 8)
0.8519663271732721
You can import xxx in b.py.
If its name conflicts with another module you import in b, do this:
import xxx as some_name
and within b.py you can now refer to it as some_name, i.e.,
some_name.run()
a.py:
import numpy as np_patched
def f():
    print("patched")

np_patched.array = f
b.py
import a as np_patched
import numpy as np
np.array()
c.py (order of import doesn't matter?)
import numpy as np
import a as np_patched
np.array()
Result (python3 b.py, or python3 c.py)
patched
Explanation:
a.py imports library X (numpy) and monkey-patches it. Then b.py imports a.py; at this point X is not directly visible to b.py. After that, b.py imports X itself. Python won't import the same module twice, so it reuses the copy of X already patched by a.py instead of loading a fresh one for b.py. That is why b.py only ever sees the patched X, not the original.

Python - submodule loading detection

I am currently working on a library that patches several other modules upon loading. It mostly works fine; however, in some cases it runs into problems when the functions to be patched live in a sub-module that needs to be explicitly loaded. For instance in scikit-learn, the sub-module datasets behaves as follows:
>>> import sklearn
>>> dir(sklearn)
['__SKLEARN_SETUP__', '__all__', '__builtins__', '__check_build', '__doc__', '__file__', '__name__', '__package__', '__path__', '__version__', 'base', 'clone', 'externals', 're', 'setup_module', 'sys', 'utils', 'warnings']
The sub-module only appears once datasets has been explicitly imported:
>>> from sklearn import datasets
>>> dir(sklearn)
['__SKLEARN_SETUP__', '__all__', '__builtins__', '__check_build', '__doc__', '__file__', '__name__', '__package__', '__path__', '__version__', 'base', 'clone', 'datasets', 'externals', 'feature_extraction', 'preprocessing', 're', 'setup_module', 'sys', 'utils', 'warnings']
How can I detect when datasets is explicitly imported, so that I can run my patching only when this sub-module is loaded?
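One way to approach this (a sketch of a standard technique, not an answer from the thread) is a sys.meta_path hook: a finder that delegates the real loading to the normal machinery but fires a callback once the watched module finishes executing. The demo below watches the stdlib module colorsys as a stand-in for sklearn.datasets:

```python
import importlib.abc
import sys

class ImportWatcher(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Fire a callback right after a specific module finishes loading."""

    def __init__(self, name, callback):
        self.name = name
        self.callback = callback
        self._real_loader = None

    def find_spec(self, fullname, path=None, target=None):
        if fullname != self.name:
            return None
        # Ask the remaining finders for the real spec, then wrap its loader.
        for finder in sys.meta_path:
            find = getattr(finder, "find_spec", None)
            if finder is self or find is None:
                continue
            spec = find(fullname, path, target)
            if spec is not None:
                self._real_loader = spec.loader
                spec.loader = self
                return spec
        return None

    def create_module(self, spec):
        return self._real_loader.create_module(spec)

    def exec_module(self, module):
        self._real_loader.exec_module(module)   # normal load first
        self.callback(module)                   # then notify/patch

# Demo: watch "colorsys" as a stand-in for "sklearn.datasets"
loaded = []
watcher = ImportWatcher("colorsys", loaded.append)
sys.meta_path.insert(0, watcher)
sys.modules.pop("colorsys", None)   # make sure the import really runs

import colorsys

sys.meta_path.remove(watcher)
print(loaded[0].__name__)           # colorsys
```

The callback receives the fully initialized module object, so it's a natural place to apply patches exactly when the sub-module is first loaded; if the module is already in sys.modules, no import runs and the callback never fires.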

Calling dir function on a module

When I ran dir to list the attributes of boltons, I got the output below:
>>> import boltons
>>> dir(boltons)
['__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__']
When I explicitly did
>>> from boltons.strutils import camel2under
>>> dir(boltons)
['__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'strutils']
I found that strutils got added as an attribute of boltons.
Why does strutils not show up before the explicit import?
From the docs on what dir does:
With an argument, attempt to return a list of valid attributes for
that object.
When we import the boltons package we can see that strutils is not an attribute on the boltons object. Therefore we do not expect it to show up in dir(boltons).
>>> import boltons
>>> getattr(boltons, 'strutils')
AttributeError: module 'boltons' has no attribute 'strutils'
The docs on importing submodules say:
For example, if package spam has a submodule foo, after importing spam.foo, spam will have an attribute foo which is bound to the submodule.
Importing a submodule creates an attribute on the package. In your example:
>>> import boltons
>>> getattr(boltons, 'strutils')
AttributeError: module 'boltons' has no attribute 'strutils'
>>> from boltons.strutils import camel2under
>>> getattr(boltons, 'strutils')
<module 'boltons.strutils' from '/usr/local/lib/python3.5/site-packages/boltons/strutils.py'>
Therefore, in this case, we do expect strutils to show up in dir(boltons).
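The same behavior can be reproduced with a standard-library package, for readers without boltons installed (xml.sax plays the role of boltons.strutils):

```python
import sys

# Start from a clean slate so the result doesn't depend on earlier imports
sys.modules.pop("xml", None)
sys.modules.pop("xml.sax", None)

import xml
before = "sax" in dir(xml)      # submodule not loaded yet

import xml.sax
after = "sax" in dir(xml)       # now bound as an attribute on the package

print(before, after)            # False True
```

Importing the submodule is what creates the attribute on the package object; until then, the package only knows what its own __init__.py defined.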

Why does python put 'A' in global namespace when I import A.B.C

I am reading about how import works in python.
When I do:
import A.B.C
A, A.B, A.B.C are put in sys.modules. Expected.
A's __init__, A.B's __init__ get executed. Expected.
But here is a surprise: When I print globals(), only A is put into the namespace, while 'A.B.C' is not. I expect 'A.B.C' to be in global namespace.
And this means, I can access A.x defined in A's __init__.
Why is import implemented this way?
Only objects and their names are put in the global namespace, and A.B.C is not a valid name.
In your case, the object is the module object for A, and its name is A.
In this particular case, if you do:
dir(A)
you would see B inside it, which means it's an attribute of the module object A. If you do:
hasattr(A, 'B')
it would return True.
In the same way, if you do dir(A.B), you would see C in it; C is an attribute of A.B.
A very simple example to show this.
My directory structure:
shared/
    __init__.py
    pkg/
        __init__.py
        b.py
Then in code I do:
>>> import shared.pkg.b
>>> dir(shared)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'pkg']
>>> hasattr(shared,'pkg')
True
>>>
>>> dir(shared.pkg)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'b']
>>> hasattr(shared.pkg,'b')
True
B and C are reachable through A.
E.g.:
import A.B.C
print(A.B.C)
If you want B and C to appear directly in your current namespace then do
from A import B
from A.B import C
print(B, C)
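A runnable check of the whole point, with xml.sax.handler playing the role of A.B.C: the full dotted chain is cached in sys.modules, but only the top-level name lands in the importing namespace.

```python
import sys
import xml.sax.handler        # the analogue of "import A.B.C"

# Every link in the dotted chain is registered in sys.modules...
chain_cached = all(name in sys.modules
                   for name in ("xml", "xml.sax", "xml.sax.handler"))

# ...but only the top-level name is bound in the importing namespace:
top_bound = "xml" in globals()
dotted_bound = "xml.sax" in globals()   # never true; not a valid identifier

print(chain_cached, top_bound, dotted_bound)        # True True False
print(hasattr(xml, "sax"), hasattr(xml.sax, "handler"))  # True True
```

The submodules are still reachable, just as attributes hanging off the top-level name rather than as entries in globals().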
