Python Import Star Creating Hidden Namespace? - python

I recently ran into some unusual behavior.
foo.py
a = 0
def bar():
    print(a)
Console:
>>> import foo
>>> foo.bar()
0
>>> foo.a = 10
>>> foo.bar()
10
Console:
>>> from foo import *
>>> bar()
0
>>> a
0
>>> a = 10
>>> a
10
>>> bar()
0
I'm inferring that import * is actually creating two copies of a - one in the global namespace and one inside the foo module which cannot be accessed. Is this behavior explained/documented anywhere? I'm having trouble figuring out what to search for.
This seems like a notable and unexpected consequence of import * but for some reason I've never seen it brought up before.

There is no such thing as a hidden namespace in Python, and the behaviour you describe is the normal, expected one. from foo import * copies foo's top-level names into the importing namespace as new, independent bindings; rebinding a afterwards changes only that binding, not foo.a. You should read https://docs.python.org/3/tutorial/modules.html#more-on-modules in order to understand better how module globals work.
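The copied-bindings behaviour can be reproduced without any files by building a stand-in foo module at runtime (the module here is synthetic so the snippet is self-contained):

```python
import types

# Build a stand-in for foo.py at runtime so the example runs on its own.
foo = types.ModuleType("foo")
exec("a = 0\ndef bar():\n    print(a)", foo.__dict__)

# "from foo import *" essentially copies the module's top-level bindings:
a = foo.a          # a new, independent binding in this namespace

a = 10             # rebinds only the local name...
foo.bar()          # ...bar() still reads foo's own global: prints 0
print(foo.a)       # 0
```

bar() looks names up in foo's globals (its own module dict), which is why rebinding the local a never affects what bar() prints.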

Related

class hidden from module dictionary

I'm developing a code analysis tool for Python programs.
I'm using introspection techniques to navigate the program structure.
Recently, I tested my tool on big packages like tkinter and matplotlib. It worked well.
But I found an oddity when analyzing numpy.
import numpy, inspect
for elem in inspect.getmembers(numpy, inspect.isclass):
    print(elem)
print('Tester' in dir(numpy))
print(numpy.__dict__['Tester'])
Result:
blablabla
('Tester', <class 'numpy.testing._private.nosetester.NoseTester'>),
blablabla
True
KeyError: 'Tester'
getmembers() and dir() agree that there is a 'Tester' class but it is not in __dict__ dictionary. I dug a little further:
1 >>> import numpy,inspect
2 >>> d1 = inspect.getmembers( numpy)
3 >>> d2 = dir( numpy)
4 >>> d3 = numpy.__dict__.keys()
5 >>> len(d1),len(d2),len(d3)
6 (602, 602, 601)
7 >>> set([d[0] for d in d1]) - set(d3)
8 {'Tester'}
9 >>> numpy.Tester
10 <class 'numpy.testing._private.nosetester.NoseTester'>
11 >>>
getmembers() and dir() agree, but __dict__ does not. Line 8 shows that 'Tester' is not in __dict__.
This brings up two questions:
what is the mechanism used by numpy to hide the 'Tester' class?
where are getmembers() and dir() finding the reference to 'Tester' class?
I'm using Python 3.9.2 and numpy 1.23.5
I believe inspect.getmembers relies on dir() of an object for the keys and getattr() for the values, and numpy's module-level __dir__ is overridden to:
def __dir__():
    return list(globals().keys() | {'Tester', 'testing'})
with the module-level __getattr__ overridden, specifically in regard to the above, to:
    if attr == 'testing':
        import numpy.testing as testing
        return testing
    elif attr == 'Tester':
        from .testing import Tester
        return Tester
so dir() will return a 'Tester', and getattr will find and return a corresponding object, but the name is never stored in __dict__.
The reasoning they use is to allow for a lazy import:
# Importing Tester requires importing all of UnitTest, which is not a
# cheap import. Since it is mainly used in test suites, we lazy import it
# here to save on the order of 10 ms of import time for most users
#
# The previous way Tester was imported also had a side effect of adding
# the full `numpy.testing` namespace
numpy dir definition
numpy getattr
getmembers definition
Example
>>> import numpy as np
>>> import inspect
>>>
>>> np.__dir__ = lambda: ["poly"]
>>>
>>> dir(np)
['poly']
>>>
>>> inspect.getmembers(np)
[('poly', <function poly at 0x101fd8280>)]
>>>
If you override __getattr__ as well, then you can create that "hidden" attribute:
>>> import numpy as np
>>> import inspect
>>>
>>> np.__dir__ = lambda: ["this_doesnt_exist","poly"]
>>>
>>> "this_doesnt_exist" in np.__dict__
False
>>> "poly" in np.__dict__
True
>>>
>>> inspect.getmembers(np)  # this_doesnt_exist is neither in __dict__ nor successfully returned from getattr
[('poly', <function poly at 0x105ccc280>)]
>>>
>>> np.__getattr__ = lambda x: f"{x} doesnt exist, but my getattr pretends it does."
>>>
>>> inspect.getmembers(np)
[('poly', <function poly at 0x105ccc280>), ('this_doesnt_exist', 'this_doesnt_exist doesnt exist, but my getattr pretends it does.')]
>>>
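The numpy behaviour boils down to PEP 562, which lets a module define __getattr__ and __dir__ at top level. A minimal sketch using a synthetic module (the name lazy_mod and the returned string are invented for illustration, standing in for numpy's lazily imported Tester):

```python
import types

# Source of a toy module that mimics numpy's lazy-attribute trick (PEP 562).
src = """
def __getattr__(name):
    if name == "Tester":
        return "lazily created Tester"   # stand-in for the real class
    raise AttributeError(name)

def __dir__():
    return list(globals().keys()) + ["Tester"]
"""
lazy_mod = types.ModuleType("lazy_mod")
exec(src, lazy_mod.__dict__)

print("Tester" in dir(lazy_mod))      # True:  dir() consults module __dir__
print("Tester" in lazy_mod.__dict__)  # False: the name is never stored
print(lazy_mod.Tester)                # module __getattr__ supplies the value
```

Because the attribute is produced on demand and never assigned into the module dict, dir()/getmembers() and __dict__ disagree, exactly as observed with numpy.Tester.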

How to properly export globals in Python modules?

I'm seeing this strange behavior with a module, which only happens if the module is implemented using an __init__.py and another file, containing the 'guts' of the module. So, let's say, I have a module, called module, implemented in the module directory. This directory contains two files:
__init__.py
# contents of module/__init__.py
from .guts import *
There's also guts.py in the same directory:
# contents of module/guts.py
test_x = 1
def inc_x():
    global test_x
    test_x += 1
def print_x(prefix):
    print(f"{prefix} in module: {test_x}")
Now, I'm trying to use this module, including its global test_x from an application:
x = module.test_x
print(f"before in test : {module.test_x}")
module.print_x("before")
print("incrementing...")
module.inc_x()
print(f"after in test : {module.test_x}")
module.print_x("after ")
assert x==module.test_x-1
print("SUCCESS!!\n")
If you execute this code under Python (I'm using 3.7.6), you get the following output:
before in test : 1
before in module: 1
incrementing...
after in test : 1
after in module: 2
Traceback (most recent call last):
File "test.py", line 9, in <module>
assert x==module.test_x-1
AssertionError
I'm completely baffled about this behavior, it's almost as if test_x had a split personality: one inside the module, one outside.
NOTE: this doesn't happen if I copy the contents of guts.py into __init__.py instead of importing it.
NOTE: this also doesn't happen if the module is implemented as a single file (module.py) as opposed to in a directory with two files.
Can anyone enlighten me as to what is going on and - ideally - what to do about this behavior?
Thanks!
Two things have happened:
when you import * from guts, test_x is imported into the namespace of module and has the value 1.
when you run inc_x, you declare test_x global, that is, global within its own module (guts).
Now, since int is immutable (and does not / cannot support in-place incrementing), inc_x effectively assigns 2 (the incremented value) to the test_x that is global to guts, while module.test_x keeps referring to the original 1.
You can try these two things, which may demonstrate that. Replace:
print(f"after in test : {module.test_x}")
with:
import module.guts
print(f"after in test : {module.guts.test_x}")
After the same operation, module.test_x was 1, but module.guts.test_x shows the incremented value as you have expected.
You can also change guts to look like this:
test_x = [1]
def inc_x():
    global test_x
    test_x[0] += 1
def print_x(prefix):
    print(f"{prefix} in module: {test_x[0]}")
And test to match:
x = module.test_x[0]
print(f"before in test : {module.test_x[0]}")
module.print_x("before")
print("incrementing...")
module.inc_x()
print(f"after in test : {module.test_x[0]}")
module.print_x("after ")
assert x==module.test_x[0]-1
print("SUCCESS!!\n")
Now when you import guts you have module.test_x being a list (and the very same one as module.guts.test_x). When you manipulate items in that list, you are still accessing the same instance, regardless of whether it's through guts.test_x (also as its global inside inc_x) or through module.test_x. You do not assign a new value to guts.test_x; you change the object to which module.test_x also refers.
That said, I am generally somewhat dubious about global being used properly. And yeah, globals are a good way to confuse anyone reading the code.
Regarding your further inquiry: each object has its identity (and resides somewhere in memory). Variables (names) are references to these objects. Assigning to a variable establishes a reference to an object:
>>> class C: pass
...
>>> a = C() # create new instance and assign it to a
>>> b = a # assign that instance to b
>>> c = C() # create new instance and assign it to c
>>> id(a) == id(b) # or: a is b
True
>>> id(a) == id(c) # or: a is c
False
Both a and b are different names referring to one and the same object.
>>> a.a = 1
>>> print(b.a)
1
However:
>>> a = 1
>>> b = a
>>> id(a) == id(b) # or: a is b
True
>>> b = 2
>>> id(a) == id(b) # or: a is b
False
>>> a
1
What has happened here? I have an object (an int literal with value 1) that I have assigned to the name a, and then I have also made b reference this same object. However, with b = 2, I have not changed the value of the object, but reassigned what b refers to (an int literal with value 2).
In the first example, by changing attribute a of the instance of class C, I was modifying the object in place: both names keep pointing to the same object (instance). In this second example, I have changed (reassigned) the reference, and a and b no longer point to the same object.
In your function, you have used the += operator, which would attempt an in-place addition, but this operation is not (and cannot be) supported by int, since it's an immutable type: it cannot be changed in place. I.e. a new object is created and the name is reassigned to refer to the result (in this case a += 1 has the same effect as a = a + 1).
Now, even when your variable (name) is global, it's still only global within its module. You can inspect global variables using globals(). Two small files: m1.py
script_a = 1
# prune keys starting with "__" from printed dict
print("in m1:", {k: v for (k, v) in globals().items() if not k.startswith("__")})
and script.py:
import m1
m1_a = 1
print("in script:", {k: v for (k, v) in globals().items() if not k.startswith("__")})
Will give you:
$ python3 script.py
in m1: {'script_a': 1}
in script: {'m1': <module 'm1' from '/tmp/m1.py'>, 'm1_a': 1}
This means that when a name is created as a result of from ... import *, a new name is created. While initially both names refer to the same object, when you call inc_x, the name in guts gets reassigned and refers to something new (the result of the addition), while the other name in module still points to the original object (1).
Now, as hinted, I would generally discourage use of global variables, as they may result in less obvious behavior and can make future reading and maintenance of the script more difficult. And while it is possible to get the behavior you wanted by using a mutable type (such as list) and modifying the object in place, crossing module boundaries takes the confusion to another level. It can be fairly difficult to immediately see the effect of a change in one part of the code and the (possibly unforeseen) effects it may have elsewhere, wherever the same module is imported and used in a larger program.
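The rebinding-versus-mutation distinction the explanation above relies on fits in a few lines (the names shared and alias are illustrative):

```python
shared = [1]      # one list object
alias = shared    # a second name bound to the same object

alias[0] += 1     # mutates the object in place: both names see it
print(shared[0])  # 2

alias = [99]      # rebinds the name only: shared is unaffected
print(shared[0])  # 2
```

Mutation travels through every name bound to the object; assignment only moves one name. That is exactly why the list-based guts works while the int-based one appears to split.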

Why dill dumps external classes by reference, no matter what?

In the example below, I have placed the class Foo inside its own module foo.
Why is the external class dumped by ref? The instance ff is not being dumped with its source code.
I am using Python 3.4.3 and dill-0.2.4.
import dill
import foo
class Foo:
    y = 1
    def bar(self, x):
        return x + y
f = Foo()
ff = foo.Foo()
print( dill.dumps( f, byref=False, recurse=True ) )
print( '\n' )
print( dill.dumps( ff, byref=False, recurse=True ) )
Well, the code above is actually wrong (should be Foo.y, instead of y). Correcting the code gives me an exception while dumping the f instance.
I'm the dill author. The foo.Foo instance (ff) pickles by reference because it's defined in a file. This is primarily for compactness of the pickled string. So the primary issue I can think of when importing a class by reference is that the class definition is not found on the other resource you might want to unpickle to (i.e. no module foo exists there). I believe that's a current feature request (and if it's not, feel free to submit a ticket on the github page).
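For comparison, the standard library's pickle always stores a class defined in a module by reference: the payload records the defining module and qualified name rather than the class's code. A quick check with collections.OrderedDict (chosen so the snippet does not depend on dill):

```python
import pickle
from collections import OrderedDict

# The pickled bytes name the class's module and qualname; no code is stored.
data = pickle.dumps(OrderedDict([("a", 1)]))
print(b"collections" in data)   # True
print(b"OrderedDict" in data)   # True
```

dill's byref option extends this model to classes defined in __main__; as described here, classes defined in importable modules are (currently) always treated this way.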
Note, however, if you do modify the class dynamically, it does pull in the dynamically modified code to the pickled string.
>>> import dill
>>> import foo
>>>
>>> class Foo:
...     y = 1
...     def bar(self, x):
...         return x + Foo.y
...
>>> f = Foo()
>>> ff = foo.Foo()
So when Foo is defined in __main__, byref is respected.
>>> dill.dumps(f, byref=False)
b'\x80\x03cdill.dill\n_create_type\nq\x00(cdill.dill\n_load_type\nq\x01X\x04\x00\x00\x00typeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05h\x01X\x06\x00\x00\x00objectq\x06\x85q\x07Rq\x08\x85q\t}q\n(X\r\x00\x00\x00__slotnames__q\x0b]q\x0cX\x03\x00\x00\x00barq\rcdill.dill\n_create_function\nq\x0e(cdill.dill\n_unmarshal\nq\x0fC]\xe3\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x01\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x03Foo\xda\x01y)\x02\xda\x04self\xda\x01x\xa9\x00r\x05\x00\x00\x00\xfa\x07<stdin>\xda\x03bar\x03\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x10\x85q\x11Rq\x12c__builtin__\n__main__\nh\rNN}q\x13tq\x14Rq\x15X\x07\x00\x00\x00__doc__q\x16NX\n\x00\x00\x00__module__q\x17X\x08\x00\x00\x00__main__q\x18X\x01\x00\x00\x00yq\x19K\x01utq\x1aRq\x1b)\x81q\x1c.'
>>> dill.dumps(f, byref=True)
b'\x80\x03c__main__\nFoo\nq\x00)\x81q\x01.'
>>>
However, when the class is defined in a module, byref is not respected.
>>> dill.dumps(ff, byref=False)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01.'
>>> dill.dumps(ff, byref=True)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01.'
Note that I wouldn't use the recurse option in this case, as Foo.y will likely infinitely recurse. That's also something that I believe there's a current ticket for, but if there isn't, there should be.
Let's dig a little deeper… what if we modify the instance...
>>> ff.zap = lambda x: x + ff.y
>>> _ff = dill.loads(dill.dumps(ff))
>>> _ff.zap(2)
3
>>> dill.dumps(ff, byref=True)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01}q\x02X\x03\x00\x00\x00zapq\x03cdill.dill\n_create_function\nq\x04(cdill.dill\n_unmarshal\nq\x05CY\xe3\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x00\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x02ff\xda\x01y)\x01\xda\x01x\xa9\x00r\x04\x00\x00\x00\xfa\x07<stdin>\xda\x08<lambda>\x01\x00\x00\x00s\x00\x00\x00\x00q\x06\x85q\x07Rq\x08c__builtin__\n__main__\nX\x08\x00\x00\x00<lambda>q\tNN}q\ntq\x0bRq\x0csb.'
>>> dill.dumps(ff, byref=False)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01}q\x02X\x03\x00\x00\x00zapq\x03cdill.dill\n_create_function\nq\x04(cdill.dill\n_unmarshal\nq\x05CY\xe3\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x00\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x02ff\xda\x01y)\x01\xda\x01x\xa9\x00r\x04\x00\x00\x00\xfa\x07<stdin>\xda\x08<lambda>\x01\x00\x00\x00s\x00\x00\x00\x00q\x06\x85q\x07Rq\x08c__builtin__\n__main__\nX\x08\x00\x00\x00<lambda>q\tNN}q\ntq\x0bRq\x0csb.'
>>>
No biggie, it pulls in the dynamically added code. However, we'd probably like to modify Foo and not the instance.
>>> Foo.zap = lambda self,x: x + Foo.y
>>> dill.dumps(f, byref=True)
b'\x80\x03c__main__\nFoo\nq\x00)\x81q\x01.'
>>> dill.dumps(f, byref=False)
b'\x80\x03cdill.dill\n_create_type\nq\x00(cdill.dill\n_load_type\nq\x01X\x04\x00\x00\x00typeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05h\x01X\x06\x00\x00\x00objectq\x06\x85q\x07Rq\x08\x85q\t}q\n(X\x03\x00\x00\x00barq\x0bcdill.dill\n_create_function\nq\x0c(cdill.dill\n_unmarshal\nq\rC]\xe3\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x01\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x03Foo\xda\x01y)\x02\xda\x04self\xda\x01x\xa9\x00r\x05\x00\x00\x00\xfa\x07<stdin>\xda\x03bar\x03\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x0e\x85q\x0fRq\x10c__builtin__\n__main__\nh\x0bNN}q\x11tq\x12Rq\x13X\x07\x00\x00\x00__doc__q\x14NX\r\x00\x00\x00__slotnames__q\x15]q\x16X\n\x00\x00\x00__module__q\x17X\x08\x00\x00\x00__main__q\x18X\x01\x00\x00\x00yq\x19K\x01X\x03\x00\x00\x00zapq\x1ah\x0c(h\rC`\xe3\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x01\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x03Foo\xda\x01y)\x02\xda\x04self\xda\x01x\xa9\x00r\x05\x00\x00\x00\xfa\x07<stdin>\xda\x08<lambda>\x01\x00\x00\x00s\x00\x00\x00\x00q\x1b\x85q\x1cRq\x1dc__builtin__\n__main__\nX\x08\x00\x00\x00<lambda>q\x1eNN}q\x1ftq Rq!utq"Rq#)\x81q$.'
Ok, that's fine, but what about the Foo in our external module?
>>> ff = foo.Foo()
>>>
>>> foo.Foo.zap = lambda self,x: x + foo.Foo.y
>>> dill.dumps(ff, byref=False)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01.'
>>> dill.dumps(ff, byref=True)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01.'
>>>
Hmmm… not good. So the above is probably a pretty compelling use case to change the behavior dill exhibits for classes defined in modules -- or at least enable one of the settings to provide better behavior.
In sum, the answer is: we didn't have a use case for it, so now that we do… this should be a feature request if it is not already.

python class with user defined expressions executed at runtime

I'd like to build a class that is able to take a few user-defined expressions at runtime and do calculations based on them and a few predefined variables that the class owns, e.g. the user will know that the variables a, b, c and d exist:
pseudo code:
>>> foo = myclass()
>>> foo.a = 2
>>> foo.b = 3
>>> foo.expression = 'a + b'
>>> foo.run_expression()
5
>>> foo.expression = 'a * b'
>>> foo.run_expression()
10
I've explored lambda functions, but they seem to need me to explicitly define what the inputs are for the lambda every time I create a new one, which would mean a lot of boilerplate input from the user every time they wanted to update the lambda, even though I know the inputs would always be a predefined set of variables.
Does anybody have experience doing anything similar, or any thoughts on how to structure a program like this?
To evaluate expressions as Python, use the eval() function, passing in vars(self) as the namespace:
def run_expression(self):
    return eval(self.expression, vars(self))
Do know this opens you up to attack vectors, where malicious users can execute arbitrary code and change your program to do completely different things.
Demo:
>>> class Foo(object):
...     def run_expression(self):
...         return eval(self.expression, vars(self))
...
>>> f = Foo()
>>> f.a = 2
>>> f.b = 3
>>> f.expression = 'a + b'
>>> f.run_expression()
5
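A slightly fuller sketch passes an empty __builtins__ mapping to eval() to shrink the attack surface; this reduces but does not eliminate the risk, so untrusted input is still unsafe. The class name Expr is invented for the example:

```python
class Expr:
    def run_expression(self):
        # Copy vars(self) because eval() may write into the mapping it's given.
        namespace = dict(vars(self))
        # An empty __builtins__ dict blocks names like open or __import__,
        # but eval() on genuinely untrusted input remains unsafe.
        return eval(self.expression, {"__builtins__": {}}, namespace)

e = Expr()
e.a, e.b = 2, 3
e.expression = "a + b"
print(e.run_expression())   # 5
e.expression = "a * b"
print(e.run_expression())   # 6
```

Keeping the attributes in vars(self) means the user never has to re-declare inputs: any attribute set on the instance is automatically visible to the expression.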

How to export a variable from PDB?

Imagine the following scenario: a script is started from the IPython shell and at a break point the python debugger is called. Using the PDB commands one can analyze the code and variables at this point. But often it turns out that the values of the variables call for a deeper research.
Is it possible to export the value of a variable to the IPython shell?
My specific use case:
I struggle with a quite huge numpy array which does not seem to have the correct values. I know that I can run any Python commands from the debugger, but it would be helpful to save the values of the variable at different break points and to use all of them at the IPython shell. I am imagining something like
ipdb> global var1; var1 = var
ipdb> continue
...
ipdb> global var2; var2 = var
ipdb> continue
...
In [2]: abs(var1 - var2) # do some interesting calculations with IPython
You can use globals():
ipdb> __name__
'my_module'
ipdb> get_var = 'a value'
ipdb> globals()['myvar'] = get_var
ipdb> q
In [11]: my_module.myvar
Out[11]: 'a value'
This assumes the break point is set in my_module.py, so we are editing the globals of the module my_module.
Not a pretty solution, but working:
ipdb> import cPickle; f=open('/tmp/dump1','w+'); cPickle.dump(var,f); f.close()
...
ipdb> import cPickle; f=open('/tmp/dump2','w+'); cPickle.dump(var,f); f.close()
then
In [2]: var1 = cPickle.load(open('/tmp/dump1'))
In [3]: var2 = cPickle.load(open('/tmp/dump2'))
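In current Python 3 the same round trip looks like this (cPickle became pickle, and files must be opened in binary mode; the path and value here are examples):

```python
import os
import pickle
import tempfile

var = {"data": [1, 2, 3]}                                # example value
path = os.path.join(tempfile.gettempdir(), "dump1.pkl")  # example path

# At a breakpoint: dump the variable to disk...
with open(path, "wb") as f:    # binary mode is required for pickle
    pickle.dump(var, f)

# ...then, back in the IPython session: load it again.
with open(path, "rb") as f:
    var1 = pickle.load(f)
print(var1 == var)   # True
```

In ipdb the dump fits on one line, e.g. import pickle; pickle.dump(var, open('/tmp/dump1.pkl', 'wb')), mirroring the cPickle one-liners above.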
You need to distinguish different globals().
For example, suppose we have a module: mymodule.py
foo = 100
def test():
bar = 200
return bar
We run it under the control of pdb.
>>> import pdb
>>> import mymodule
>>> foobar = 300
>>> pdb.run('mymodule.test()')
> <string>(1)<module>()
(Pdb) print foobar
300
(Pdb) print foo
*** NameError: name 'foo' is not defined
(Pdb) global foobar2; foobar2 = 301
(Pdb) print foobar2
301
At the beginning, namely, before executing test(), the environment in pdb is your current globals(). Thus foobar is defined, while foo is not defined.
Then we execute test() and stop at the end of bar = 200
-> bar = 200
(Pdb) print bar
200
(Pdb) print foo
100
(Pdb) print foobar
*** NameError: name 'foobar' is not defined
(Pdb) global foo2; foo2 = 101
(Pdb) print foo2
101
(Pdb) c
>>>
The environment in pdb has been changed. It uses mymodule's globals() inside test(). Thus foobar is not defined, while foo is defined.
We have exported two variables foobar2 and foo2. But they live in different scopes.
>>> foobar2
301
>>> mymodule.foobar2
Traceback (most recent call last):
File "<pyshell#16>", line 1, in <module>
mymodule.foobar2
AttributeError: 'module' object has no attribute 'foobar2'
>>> mymodule.foo2
101
>>> foo2
Traceback (most recent call last):
File "<pyshell#18>", line 1, in <module>
foo2
NameError: name 'foo2' is not defined
You have already found the solution. But it works slightly differently.
