In the example below, I have placed the class Foo inside its own module foo.
Why is the external class dumped by ref? The instance ff is not being dumped with its source code.
I am using Python 3.4.3 and dill-0.2.4.
import dill
import foo

class Foo:
    y = 1
    def bar( self, x ):
        return x + y

f = Foo()
ff = foo.Foo()

print( dill.dumps( f, byref=False, recurse=True ) )
print( '\n' )
print( dill.dumps( ff, byref=False, recurse=True ) )
Well, the code above is actually wrong (should be Foo.y, instead of y). Correcting the code gives me an exception while dumping the f instance.
I'm the dill author. The foo.Foo instance (ff) pickles by reference because it's defined in a file. This is primarily for compactness of the pickled string. So the primary issue I can think of when importing a class by reference is that the class definition is not found on the other resource you might want to unpickle to (i.e. no module foo exists there). I believe that's a current feature request (and if it's not, feel free to submit a ticket on the github page).
Note, however, that if you do modify the class dynamically, it does pull the dynamically modified code into the pickled string.
>>> import dill
>>> import foo
>>>
>>> class Foo:
...     y = 1
...     def bar( self, x ):
...         return x + Foo.y
...
>>> f = Foo()
>>> ff = foo.Foo()
So when Foo is defined in __main__, byref is respected.
>>> dill.dumps(f, byref=False)
b'\x80\x03cdill.dill\n_create_type\nq\x00(cdill.dill\n_load_type\nq\x01X\x04\x00\x00\x00typeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05h\x01X\x06\x00\x00\x00objectq\x06\x85q\x07Rq\x08\x85q\t}q\n(X\r\x00\x00\x00__slotnames__q\x0b]q\x0cX\x03\x00\x00\x00barq\rcdill.dill\n_create_function\nq\x0e(cdill.dill\n_unmarshal\nq\x0fC]\xe3\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x01\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x03Foo\xda\x01y)\x02\xda\x04self\xda\x01x\xa9\x00r\x05\x00\x00\x00\xfa\x07<stdin>\xda\x03bar\x03\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x10\x85q\x11Rq\x12c__builtin__\n__main__\nh\rNN}q\x13tq\x14Rq\x15X\x07\x00\x00\x00__doc__q\x16NX\n\x00\x00\x00__module__q\x17X\x08\x00\x00\x00__main__q\x18X\x01\x00\x00\x00yq\x19K\x01utq\x1aRq\x1b)\x81q\x1c.'
>>> dill.dumps(f, byref=True)
b'\x80\x03c__main__\nFoo\nq\x00)\x81q\x01.'
>>>
However, when the class is defined in a module, byref is not respected.
>>> dill.dumps(ff, byref=False)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01.'
>>> dill.dumps(ff, byref=True)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01.'
Note that I wouldn't use the recurse option in this case, as Foo.y will likely recurse infinitely. That's also something I believe there's a current ticket for, but if there isn't, there should be.
Let's dig a little deeper… what if we modify the instance...
>>> ff.zap = lambda x: x + ff.y
>>> _ff = dill.loads(dill.dumps(ff))
>>> _ff.zap(2)
3
>>> dill.dumps(ff, byref=True)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01}q\x02X\x03\x00\x00\x00zapq\x03cdill.dill\n_create_function\nq\x04(cdill.dill\n_unmarshal\nq\x05CY\xe3\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x00\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x02ff\xda\x01y)\x01\xda\x01x\xa9\x00r\x04\x00\x00\x00\xfa\x07<stdin>\xda\x08<lambda>\x01\x00\x00\x00s\x00\x00\x00\x00q\x06\x85q\x07Rq\x08c__builtin__\n__main__\nX\x08\x00\x00\x00<lambda>q\tNN}q\ntq\x0bRq\x0csb.'
>>> dill.dumps(ff, byref=False)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01}q\x02X\x03\x00\x00\x00zapq\x03cdill.dill\n_create_function\nq\x04(cdill.dill\n_unmarshal\nq\x05CY\xe3\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x00\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x02ff\xda\x01y)\x01\xda\x01x\xa9\x00r\x04\x00\x00\x00\xfa\x07<stdin>\xda\x08<lambda>\x01\x00\x00\x00s\x00\x00\x00\x00q\x06\x85q\x07Rq\x08c__builtin__\n__main__\nX\x08\x00\x00\x00<lambda>q\tNN}q\ntq\x0bRq\x0csb.'
>>>
No biggie, it pulls in the dynamically added code. However, we'd probably like to modify Foo and not the instance.
>>> Foo.zap = lambda self,x: x + Foo.y
>>> dill.dumps(f, byref=True)
b'\x80\x03c__main__\nFoo\nq\x00)\x81q\x01.'
>>> dill.dumps(f, byref=False)
b'\x80\x03cdill.dill\n_create_type\nq\x00(cdill.dill\n_load_type\nq\x01X\x04\x00\x00\x00typeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05h\x01X\x06\x00\x00\x00objectq\x06\x85q\x07Rq\x08\x85q\t}q\n(X\x03\x00\x00\x00barq\x0bcdill.dill\n_create_function\nq\x0c(cdill.dill\n_unmarshal\nq\rC]\xe3\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x01\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x03Foo\xda\x01y)\x02\xda\x04self\xda\x01x\xa9\x00r\x05\x00\x00\x00\xfa\x07<stdin>\xda\x03bar\x03\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x0e\x85q\x0fRq\x10c__builtin__\n__main__\nh\x0bNN}q\x11tq\x12Rq\x13X\x07\x00\x00\x00__doc__q\x14NX\r\x00\x00\x00__slotnames__q\x15]q\x16X\n\x00\x00\x00__module__q\x17X\x08\x00\x00\x00__main__q\x18X\x01\x00\x00\x00yq\x19K\x01X\x03\x00\x00\x00zapq\x1ah\x0c(h\rC`\xe3\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x01\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x03Foo\xda\x01y)\x02\xda\x04self\xda\x01x\xa9\x00r\x05\x00\x00\x00\xfa\x07<stdin>\xda\x08<lambda>\x01\x00\x00\x00s\x00\x00\x00\x00q\x1b\x85q\x1cRq\x1dc__builtin__\n__main__\nX\x08\x00\x00\x00<lambda>q\x1eNN}q\x1ftq Rq!utq"Rq#)\x81q$.'
Ok, that's fine, but what about the Foo in our external module?
>>> ff = foo.Foo()
>>>
>>> foo.Foo.zap = lambda self,x: x + foo.Foo.y
>>> dill.dumps(ff, byref=False)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01.'
>>> dill.dumps(ff, byref=True)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01.'
>>>
Hmmm… not good. So the above is probably a pretty compelling use case to change the behavior dill exhibits for classes defined in modules -- or at least enable one of the settings to provide better behavior.
In sum, the answer is: we didn't have a use case for it, so now that we do… this should be a feature request if it is not already.
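If you need the external class to travel with the pickle today, one rough workaround (my own sketch, not a dill feature; MainFoo and the attribute filtering are mine) is to rebuild the object on a dynamically created copy of the class in __main__, which dill should then serialize by value:

import dill
import foo

# copy foo.Foo's namespace, dropping the descriptors that type() would reject
ns = {k: v for k, v in vars(foo.Foo).items()
      if k not in ('__dict__', '__weakref__')}
MainFoo = type('Foo', (object,), ns)     # a dynamic, __main__-level copy of foo.Foo
MainFoo.__module__ = '__main__'

ff = foo.Foo()
ff2 = MainFoo()
ff2.__dict__.update(ff.__dict__)         # carry over any instance state

payload = dill.dumps(ff2, byref=False)   # the class definition now travels with the pickle
print(dill.loads(payload))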
I'm developing a code analysis tool for Python programs.
I'm using introspection techniques to navigate the program structure.
Recently, I tested my tool on big packages like tkinter and matplotlib. It worked well.
But I found an oddity when analyzing numpy.
import numpy, inspect

for elem in inspect.getmembers( numpy, inspect.isclass):
    print( elem)

print( 'Tester' in dir( numpy))
print( numpy.__dict__['Tester'])
Result:
blablabla
('Tester', <class 'numpy.testing._private.nosetester.NoseTester'>),
blablabla
True
KeyError: 'Tester'
getmembers() and dir() agree that there is a 'Tester' class, but it is not in the __dict__ dictionary. I dug a little further:
1 >>> import numpy,inspect
2 >>> d1 = inspect.getmembers( numpy)
3 >>> d2 = dir( numpy)
4 >>> d3 = numpy.__dict__.keys()
5 >>> len(d1),len(d2),len(d3)
6 (602, 602, 601)
7 >>> set([d[0] for d in d1]) - set(d3)
8 {'Tester'}
9 >>> numpy.Tester
10 <class 'numpy.testing._private.nosetester.NoseTester'>
11 >>>
getmembers() and dir() agree, but __dict__ does not. Line 8 shows that 'Tester' is not in __dict__.
This brings up some questions:
What is the mechanism used by numpy to hide the 'Tester' class?
Where are getmembers() and dir() finding the reference to the 'Tester' class?
I'm using Python 3.9.2 and numpy 1.23.5.
I believe inspect.getmembers relies on dir() of an object for the keys and getattr() for the values, and __dir__ for the numpy module is overridden to:
def __dir__():
    return list(globals().keys() | {'Tester', 'testing'})
with the module's __getattr__ overridden, specifically in regard to the above, to:
if attr == 'testing':
    import numpy.testing as testing
    return testing
elif attr == 'Tester':
    from .testing import Tester
    return Tester
so dir will return a "Tester", and getattr will find and return a corresponding object, but it's not in the __dict__.
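A quick check of that statement with the numpy version from the question (1.23.5; newer releases have removed Tester, so this is version-dependent):

import numpy

print(hasattr(numpy, 'Tester'))    # True  -- resolved through numpy.__getattr__
print('Tester' in numpy.__dict__)  # False -- never inserted into the module dict
print('Tester' in dir(numpy))      # True  -- advertised by numpy.__dir__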
The reasoning they give is to allow for a lazy import:
# Importing Tester requires importing all of UnitTest which is not a
# cheap import Since it is mainly used in test suits, we lazy import it
# here to save on the order of 10 ms of import time for most users
#
# The previous way Tester was imported also had a side effect of adding
# the full `numpy.testing` namespace
For reference: the numpy __dir__ definition, the numpy __getattr__ definition, and the inspect.getmembers source.
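As an aside, here is a minimal sketch of the same lazy-import pattern (PEP 562, Python 3.7+) for a hypothetical package of your own; mypackage and heavy are placeholder names:

# mypackage/__init__.py
def __getattr__(attr):
    if attr == 'heavy':
        import mypackage.heavy as heavy   # imported only on first attribute access
        return heavy
    raise AttributeError("module 'mypackage' has no attribute %r" % attr)

def __dir__():
    return list(globals().keys() | {'heavy'})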
Example
>>> import numpy as np
>>> import inspect
>>>
>>> np.__dir__ = lambda: ["poly"]
>>>
>>> dir(np)
['poly']
>>>
>>> inspect.getmembers(np)
[('poly', <function poly at 0x101fd8280>)]
>>>
If you override __getattr__ as well, then you can create that "hidden" attribute:
>>> import numpy as np
>>> import inspect
>>>
>>> np.__dir__ = lambda: ["this_doesnt_exist","poly"]
>>>
>>> "this_doesnt_exist" in np.__dict__
False
>>> "poly" in np.__dict__
True
>>>
>>> inspect.getmembers(np) # this_doesnt_exist neither in dict, or successfully returned from getattr
[('poly', <function poly at 0x105ccc280>)]
>>>
>>> np.__getattr__ = lambda x: f"{x} doesnt exist, but my getattr pretends it does."
>>>
>>> inspect.getmembers(np)
[('poly', <function poly at 0x105ccc280>), ('this_doesnt_exist', 'this_doesnt_exist doesnt exist, but my getattr pretends it does.')]
>>>
I need to patch current datetime in tests. I am using this solution:
import datetime

def _utcnow():
    return datetime.datetime.utcnow()

def utcnow():
    """A proxy which can be patched in tests."""
    # another level of indirection, because some modules import utcnow
    return _utcnow()
Then in my tests I do something like:
with mock.patch('***.utils._utcnow', return_value=***):
...
But today an idea came to me, that I could make the implementation simpler by patching __call__ of function utcnow instead of having an additional _utcnow.
This does not work for me:
from ***.utils import utcnow
with mock.patch.object(utcnow, '__call__', return_value=***):
...
How to do this elegantly?
When you patch __call__ of a function, you are setting the __call__ attribute of that instance. Python actually calls the __call__ method defined on the class.
For example:
>>> class A(object):
...     def __call__(self):
...         print 'a'
...
>>> a = A()
>>> a()
a
>>> def b(): print 'b'
...
>>> b()
b
>>> a.__call__ = b
>>> a()
a
>>> a.__call__ = b.__call__
>>> a()
a
Assigning anything to a.__call__ is pointless.
However:
>>> A.__call__ = b.__call__
>>> a()
b
TLDR;
a() does not call a.__call__. It calls type(a).__call__(a).
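A small Python 3 illustration of that (the class and names here are mine):

>>> class A:
...     def __call__(self):
...         return 'class __call__'
...
>>> a = A()
>>> a.__call__ = lambda: 'instance __call__'   # ignored by implicit calls
>>> a()                                        # looked up on type(a)
'class __call__'
>>> a.__call__()                               # explicit attribute access finds the instance attribute
'instance __call__'
>>> type(a).__call__(a)
'class __call__'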
Links
There is a good explanation of why that happens in answer to "Why type(x).__enter__(x) instead of x.__enter__() in Python standard contextlib?".
This behaviour is documented in Python documentation on Special method lookup.
[EDIT]
Maybe the most interesting part of this question is Why I cannot patch somefunction.__call__?
Because the function doesn't use __call__'s code; rather, __call__ (a method-wrapper object) uses the function's code.
I couldn't find any well-sourced documentation about that, but I can demonstrate it (Python 2.7):
>>> def f():
...     return "f"
...
>>> def g():
...     return "g"
...
>>> f
<function f at 0x7f1576381848>
>>> f.__call__
<method-wrapper '__call__' of function object at 0x7f1576381848>
>>> g
<function g at 0x7f15763817d0>
>>> g.__call__
<method-wrapper '__call__' of function object at 0x7f15763817d0>
Replace f's code by g's code:
>>> f.func_code = g.func_code
>>> f()
'g'
>>> f.__call__()
'g'
Of course f and f.__call__ references are not changed:
>>> f
<function f at 0x7f1576381848>
>>> f.__call__
<method-wrapper '__call__' of function object at 0x7f1576381848>
Recover original implementation and copy __call__ references instead:
>>> def f():
... return "f"
...
>>> f()
'f'
>>> f.__call__ = g.__call__
>>> f()
'f'
>>> f.__call__()
'g'
This doesn't have any effect on the f function. Note: in Python 3 you should use __code__ instead of func_code.
I hope that somebody can point me to the documentation that explains this behavior.
You have a way to work around that: in utils you can define
class Utcnow(object):
    def __call__(self):
        return datetime.datetime.utcnow()

utcnow = Utcnow()
And now your patch can work like a charm.
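For instance, a self-contained sketch of what a test could look like (the class is repeated here just to keep the snippet runnable; patching __call__ on the class is what implicit calls actually go through, per the lookup rules discussed above):

import datetime
from unittest import mock

class Utcnow(object):
    def __call__(self):
        return datetime.datetime.utcnow()

utcnow = Utcnow()

def test_patched_utcnow():
    fixed = datetime.datetime(2000, 1, 1)
    with mock.patch.object(Utcnow, '__call__', return_value=fixed):
        assert utcnow() == fixed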
What follows is my original answer, which I still consider the best way to implement your tests.
I have my own golden rule: never patch protected methods. In this case things are a little smoother because the protected method was introduced just for testing, but I still cannot see why it is needed.
The real problem here is that you cannot patch datetime.datetime.utcnow directly (it is a C extension, as you wrote in the comment above). What you can do is patch datetime by wrapping the standard behavior and overriding the utcnow function:
>>> with mock.patch("datetime.datetime", mock.Mock(wraps=datetime.datetime, utcnow=mock.Mock(return_value=3))):
... print(datetime.datetime.utcnow())
...
3
OK, that is not really clear and neat, but you can introduce your own function like:
def mock_utcnow(return_value):
    return mock.Mock(wraps=datetime.datetime,
                     utcnow=mock.Mock(return_value=return_value))
and now
mock.patch("datetime.datetime", mock_utcnow(***))
does exactly what you need without any extra layer, and it works for every kind of import.
Another solution can be to import datetime in utils and patch ***.utils.datetime; that gives you some freedom to change the datetime reference implementation without changing your tests (in this case, take care to change the mock_utcnow() wraps argument too).
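Putting the pieces together, a hedged sketch of how this might look in a test (the fixed value is a placeholder):

import datetime
from unittest import mock

def mock_utcnow(return_value):
    return mock.Mock(wraps=datetime.datetime,
                     utcnow=mock.Mock(return_value=return_value))

def test_utcnow_is_patched():
    fixed = datetime.datetime(2000, 1, 1, 12, 0, 0)
    with mock.patch("datetime.datetime", mock_utcnow(fixed)):
        assert datetime.datetime.utcnow() == fixed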
As commented on the question, since datetime.datetime is written in C, Mock can't replace attributes on the class (see Mocking datetime.today by Ned Batchelder). Instead you can use freezegun.
$ pip install freezegun
Here's an example:
import datetime
from freezegun import freeze_time

def my_now():
    return datetime.datetime.utcnow()

@freeze_time('2000-01-01 12:00:01')
def test_freezegun():
    assert my_now() == datetime.datetime(2000, 1, 1, 12, 00, 1)
As you mention, an alternative is to track each module importing datetime and patch them. This is in essence what freezegun does. It takes an object mocking datetime, iterates through sys.modules to find where datetime has been imported and replaces every instance. I guess it's arguable whether you can do this elegantly in one function.
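freeze_time can also be used as a context manager, which is handy when only part of a test needs the frozen clock:

import datetime
from freezegun import freeze_time

def test_freezegun_context_manager():
    with freeze_time('2000-01-01 12:00:01'):
        assert datetime.datetime.utcnow() == datetime.datetime(2000, 1, 1, 12, 0, 1)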
I'd like to build a class that is able to take a few user-defined expressions at runtime and do calculations based on them and a few predefined variables that the class owns; e.g., the user will know that the variables a, b, c & d exist:
pseudo code:
>>> foo = myclass()
>>> foo.a = 2
>>> foo.b = 3
>>> foo.expression = 'a + b'
>>> foo.run_expression()
5
>>> foo.expression = 'a * b'
>>> foo.run_expression()
6
I've explored lambda functions, but they seem to require me to explicitly define the inputs for the lambda every time I create a new one, which would mean a lot of boilerplate input from the user every time they wanted to update the lambda, since I know the inputs would always be a predefined set of variables.
Does anybody have experience doing anything similar, or have any thoughts on how to structure a program like this?
To evaluate expressions as Python, use the eval() function, passing in vars(self) as the namespace:
def run_expression(self):
    return eval(self.expression, vars(self))
Do know this opens you up to attack vectors, where malicious users can execute arbitrary code and change your program to do completely different things.
Demo:
>>> class Foo(object):
...     def run_expression(self):
...         return eval(self.expression, vars(self))
...
>>> f = Foo()
>>> f.a = 2
>>> f.b = 3
>>> f.expression = 'a + b'
>>> f.run_expression()
5
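If you do go the eval() route, one common partial mitigation (a sketch, not a real sandbox; crafted input can still escape it) is to strip the builtins from the evaluation namespace:

class Foo(object):
    def run_expression(self):
        # no builtins are exposed to the expression; only instance attributes
        return eval(self.expression, {'__builtins__': {}}, vars(self))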
Using the example
def foo(a):
    def bar(b):
        return a+b
    return bar

d = {1:foo(1), 2:foo(2)}
It appears that the pickle module will not work with a function not defined at module scope, so pickling 'd' will not work. Is there another pickling mechanism available that I should consider?
I'm afraid that you can't pickle nested functions.
The pickle module serializes functions by name. That is, if you have a function myfunc in a module mymodule it simply saves the name mymodule.myfunc and looks it up again when unserializing. (This is an important security and compatibility issue, as it guarantees that the unserializing code uses its own definition for the function, rather than the original definition which might be compromised or obsolete.)
Alas, pickle can't do that with nested functions, because there's no way to directly address them by name. Your bar function, for instance, can't be accessed from outside of foo.
If you need a serializable object that works like a function, you can instead make a class with a __call__ method:
class foo(object):
    def __init__(self, a):
        self.a = a
    def __call__(self, b): # the function formerly known as "bar"
        return self.a + b
This works just like the nested functions in the question, and should pose no problem to pickle. Do be aware though, that you'll need to have the same class definition available when you unserialize a foo instance.
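A quick usage sketch of the callable-class approach, repeating the class so the snippet runs on its own:

import pickle

class foo(object):
    def __init__(self, a):
        self.a = a
    def __call__(self, b):
        return self.a + b

d = {1: foo(1), 2: foo(2)}
restored = pickle.loads(pickle.dumps(d))
print(restored[1](0) + restored[2](10))   # -> 13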
You can pickle nested functions if you use dill instead of pickle.
>>> import dill
>>>
>>> def foo(a):
...     def bar(b):
...         return a+b
...     return bar
...
>>> d = {1:foo(1), 2:foo(2)}
>>>
>>> _d = dill.dumps(d)
>>> d_ = dill.loads(_d)
>>> d_
{1: <function bar at 0x108cfe848>, 2: <function bar at 0x108cfe8c0>}
>>> d[1](0) + d[2](10)
13
>>>
Following up on Blckknght's answer: if the nested function is only being serialized because it is used as a decorator, you can just add functools.wraps on top of the inner function definition so that the interpreter can find the correct name:
from functools import wraps

def foo(func):
    @wraps(func)
    def bar(b):
        return func(b)
    return bar

@foo
def zzz(b):
    return b
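A hedged check of why that works: @wraps copies __name__, __qualname__ and __module__ from the decorated function, so pickle's by-name lookup finds the decorated object again:

import pickle
from functools import wraps

def foo(func):
    @wraps(func)          # bar now reports itself as 'zzz' in this module
    def bar(b):
        return func(b)
    return bar

@foo
def zzz(b):
    return b

print(pickle.loads(pickle.dumps(zzz))(5))   # -> 5, pickled by reference as module.zzz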
I have code which contains the following two lines in it:
instanceMethod = new.instancemethod(testFunc, None, TestCase)
setattr(TestCase, testName, instanceMethod)
How could it be rewritten without using the "new" module? I'm sure new-style classes provide some kind of workaround for this, but I am not sure how.
There is a discussion that suggests that in Python 3 this is not required. The same works in Python 2.6:
http://mail.python.org/pipermail/python-list/2009-April/531898.html
See:
>>> class C: pass
...
>>> c=C()
>>> def f(self): pass
...
>>> c.f = f.__get__(c, C)
>>> c.f
<bound method C.f of <__main__.C instance at 0x10042efc8>>
>>> f.__get__(None, C)
<unbound method C.f>
>>>
Reiterating the question for everyone's benefit, including mine.
Is there a replacement in Python 3 for new.instancemethod? That is, given an arbitrary instance (not its class), how can I add a new, appropriately defined function as a method to it?
So the following should suffice:
TestCase.testFunc = testFunc.__get__(None, TestCase)
You can replace "new.instancemethod" by "types.MethodType":
from types import MethodType as instancemethod

class Foo:
    def __init__(self):
        print 'I am ', id(self)

def bar(self):
    print 'I have been bound to ', id(self)

foo = Foo()                    # prints 'I am <instance id>'
mm = instancemethod(bar, foo)  # automatically uses foo.__class__
mm()                           # prints 'I have been bound to <same instance id>'
foo.mm                         # traceback because no 'field' created in foo to hold ref to mm
foo.mm = mm                    # create ref to bound method in foo
foo.mm()                       # prints 'I have been bound to <same instance id>'
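For reference, a Python 3 sketch of the same thing (names follow the example above):

from types import MethodType

class Foo:
    def __init__(self):
        print('I am', id(self))

def bar(self):
    print('I have been bound to', id(self))

foo = Foo()                     # prints 'I am <instance id>'
foo.mm = MethodType(bar, foo)   # bind bar to this particular instance
foo.mm()                        # prints 'I have been bound to <same instance id>'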
This will do the same:
>>> TestCase.testName = testFunc
Yeah, it's really that simple.
Your line
>>> instanceMethod = new.instancemethod(testFunc, None, TestCase)
Is in practice (although not in theory) a noop. :) You could just as well do
>>> instanceMethod = testFunc
In fact, in Python 3 I'm pretty sure it would be the same in theory as well, but the new module is gone so I can't test it in practice.
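For the record, a quick Python 3 check of that claim (the class and function names are mine):

class TestCase:
    pass

def testFunc(self):
    return 'called on %r' % self

TestCase.testName = testFunc     # plain assignment is enough in Python 3
print(TestCase().testName())     # functions are descriptors, so this binds as a method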
To confirm that it hasn't been necessary to use new.instancemethod() at all since Python v2.4, here's an example of how to replace an instance method. It's also not necessary to use descriptors (even though that works).
class Ham(object):
    def spam(self):
        pass

h = Ham()

def fake_spam():
    h._spam = True

h.spam = fake_spam
h.spam()

# h._spam should be True now.
Handy for unit testing.