This question already has answers here:
Is there an easy way to pickle a python function (or otherwise serialize its code)?
(12 answers)
Closed 2 years ago.
I am trying to run the following code:
import pickle

def foo():
    print("i am foo")

pickle_foo = pickle.dumps(foo)

def foo():
    print("i am the new foo")
    fkt = pickle.loads(pickle_foo)
    return fkt()

foo()
The expected behavior would be:
the newly defined function "foo" is called
inside the new function, the old function is unpickled and then called
output:
i am the new foo
i am foo
What actually happens is:
the new function foo gets called, which then recursively calls itself until a RecursionError is thrown:
RecursionError: maximum recursion depth exceeded while calling a Python object
The error does not occur when the two functions are named differently, but that would be very impractical for my project.
Could anyone explain why this behavior occurs and how to avoid it (without changing the function names)?
The pickle module pickles functions based on their fully-qualified name reference. This means that if your function is redefined somewhere in code, and then you unpickle a pickled reference to it, calling it will result in a call to the new definition.
From the Python docs on pickle:
Note that functions (built-in and user-defined) are pickled by “fully
qualified” name reference, not by value. 2 This means that only the
function name is pickled, along with the name of the module the
function is defined in. Neither the function’s code, nor any of its
function attributes are pickled. Thus the defining module must be
importable in the unpickling environment, and the module must contain
the named object, otherwise an exception will be raised.
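This is exactly what produces the recursion above. A minimal sketch (the function name greet is illustrative) showing that unpickling re-resolves the name and finds whatever the name currently points to:

```python
import pickle

def greet():
    return "old"

# dumps() records essentially just the name "greet" plus its module name.
data = pickle.dumps(greet)

def greet():  # rebind the same name to a new function
    return "new"

# loads() looks the name up again, so it finds the *new* definition.
restored = pickle.loads(data)
assert restored() == "new"
```

In the question's code, the unpickled function is the new foo itself, so calling it recurses.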
What you can do, however, is use inspect.getsource() to retrieve the source code for your function and pickle that. This requires that your code be available as source somewhere on the file system, so functions imported from compiled C code or other outside sources (interpreter input, dynamically loaded modules) will not work.
When you unpickle it, you can use exec to convert it into a function and execute it.
Note: this will redefine foo every time, so calls to foo cannot be guaranteed to have the same effect.
Note 2: exec is unsafe and usually unsuitable for code that will be interacting with external sources. Make sure you protect calls to exec from potential external attacks that attempt to execute arbitrary code.
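For concreteness, here is a minimal sketch of the source-pickling approach. The source string is written out literally so the sketch is self-contained; in a real script you would obtain it with inspect.getsource(foo) while the old definition is still bound:

```python
import pickle

# Stand-in for inspect.getsource(foo); in a real script, call
# inspect.getsource(foo) before foo is redefined.
foo_src = 'def foo():\n    return "i am foo"\n'
pickled_src = pickle.dumps(foo_src)

def foo():
    namespace = {}
    exec(pickle.loads(pickled_src), namespace)  # rebuild the old foo from source
    old_result = namespace["foo"]()             # call the reconstructed function
    return ("i am the new foo", old_result)

assert foo() == ("i am the new foo", "i am foo")
```

Executing the unpickled source in its own namespace dict also avoids clobbering the new foo.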
Related
I'm having some trouble importing some stuff using a function, despite being able to do so in the interpreter.
Imagine there is a file, inputs.py, in a folder A, which in turn is in the same directory as my script. In this file, we define a variable B.
B = 5
When I go into the interpreter, the following commands give me the correct value of B
>>> import sys
>>> sys.path.append('A')
>>> exec('from inputs import *')
>>> print(B)
Yet if I move that code to a separate file, say test.py:
import sys

def import_stuff(import_dir):
    sys.path.append(import_dir)
    exec('from inputs import *')
    print(B)
Then call it from the interpreter like so:
>>> import test
>>> test.import_stuff('A')
I get a NameError and B is not found. What's going on?
Local variables in functions are treated differently than global variables and object attributes (which both use dictionaries to map names to values).
When a function is defined, the Python compiler examines its code and makes a note of which local variable names are used, and designates a "slot" for each one. When the function is called, the slots refer to part of the memory in the frame object. Local variable assignments and lookups access the slots by number, not by name. This makes local variable lookups notably faster than global variable and attribute lookups (since indexing a slot is much faster than doing a dict lookup).
When you try to use exec to create local variables, it bypasses the slots. The compiler doesn't know what variables will be created in the exec'd code, so there are no slots allocated for them. This is also the reason Python 3 doesn't allow you to use from module import * inside a function: the names to be imported are not known at the function's compile time (only when it is run and the imported module is loaded) so the compiler can't set up slots for the names.
Even if you separately initialized local variables for the names you expect to be assigned in the exec'd code, it still wouldn't work. The exec'd code doesn't know it's being run from within a function, and always wants to write variables to a dictionary (never to function slots). Functions do have a local namespace dictionary that catches the assignments (they don't become global variables), but using it via the locals() function is very flaky. The values of all the local variables stored in slots are copied into the dictionary each time you call locals(), but no copying in the other direction ever happens (modifying the dictionary returned from locals() doesn't affect the values of variables stored in slots and accessed the normal way).
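That one-way copying is easy to observe directly. A small sketch (in CPython 3.13+, locals() inside a function returns an independent snapshot, with the same visible effect):

```python
def demo():
    x = 1
    snapshot = locals()   # copies slot values into a dict: {'x': 1}
    snapshot['x'] = 99    # modifies only the dict, not the slot
    return x              # the slot still holds 1

assert demo() == 1
```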
That brings me to what I think is the best way to work around this issue. Rather than having exec modify the current namespace (which occurs in a flaky way if you're in a function), you should explicitly pass in a dictionary for it to use as its namespace.
def import_stuff(import_dir):
    sys.path.append(import_dir)
    namespace = {}                           # create a namespace dict
    exec('from inputs import *', namespace)  # pass it to exec
    print(namespace["B"])                    # read the results from the namespace
I have a project in which I want to repeatedly change code in a class and then run other modules to test the changes (verification, etc.). Currently, after each edit I have to reload the code and the testing modules that run it, and then run the test. I want to reduce this cycle to one line. Moreover, I will later want to test different classes, so I want to be able to receive the name of the tested class as a parameter, meaning I need dynamic imports.
I wrote a function for clean imports of any module, it seems to work:
from importlib import import_module, reload  # Python 3; in Python 2, reload is a builtin

def build_module_clean(module_string, attr_strings):
    module = import_module(module_string)
    module = reload(module)
    for f in attr_strings:
        globals()[f] = getattr(module, f)
Now, in the name of cleanliness, I want to keep this function in a wrapper module (which will contain the one-liner I want to rebuild and test all the code each time), and run it from the various modules, i.e. among the import statements of my ModelChecker module I would place the line
from wrapper import build_module_clean
build_module_clean('test_class_module',['test_class_name'])
however, when I do this, it seems the test class is added to the globals in the wrapper module, but not in the ModelChecker module (attempting to access globals()['test_class_name'] in ModelChecker raises a KeyError). I have tried passing globals or globals() as further parameters to build_module_clean, but globals is a function (so the test module is still loaded into the wrapper's globals), and passing and then using globals() gives the error
TypeError: 'builtin_function_or_method' object does not support item assignment
So I need some way to edit one module's globals() from another module.
Alternatively, (ideally?) I would like to import the test_class module in the wrapper, in a manner that would make it visible to all the modules that use it (e.g. ModelChecker). How can I do that?
Your function should look like:
def build_module_clean(globals, module_string, attr_strings):
    module = import_module(module_string)
    module = reload(module)
    globals[module_string] = module
    for f in attr_strings:
        globals[f] = getattr(module, f)
and call it like so:
build_module_clean(globals(), 'test_class_module', ['test_class_name'])
Explanation:
Calling globals() in the function call (build_module_clean(globals(), ...)) grabs the module's __dict__ while still in the correct module and passes that to your function.
The function is then able to (re)assign the names to the newly loaded module and its current attributes.
Note that I also (re)assigned the newly-loaded module itself to the globals (you may not want that part).
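The pattern generalizes: any mutable mapping can serve as the target namespace, which makes the mechanism easy to see in isolation. A self-contained sketch using the standard math module (the helper name inject is made up for the example):

```python
import math

def inject(namespace, module, attr_names):
    # Copy the requested attributes of `module` into the given namespace dict.
    for name in attr_names:
        namespace[name] = getattr(module, name)

ns = {}
inject(ns, math, ['pi', 'sqrt'])
assert ns['pi'] == math.pi
assert ns['sqrt'](9) == 3.0
```

Passing globals() from the caller works the same way, because a module's globals() is just its __dict__.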
I just started with python a couple of days ago, coming from a C++ background. When I write a class, call it by a script, and afterwards update the interface of the class, I get some behaviour I find very unintuitive.
Once successfully compiled, the class seems not to be changeable anymore. Here is an example:
testModule.py:
class testClass:
    def __init__(self, _A):
        self.First = _A

    def Method(self, X, Y):
        print X
testScript.py:
import testModule

tm = testModule.testClass(10)
tm.Method(3, 4)
Execution gives me
3
Now I change the argument list of Method:
def Method(self, X):
I then delete the testModule.pyc file, and in my script I call
tm.Method(3)
As a result, I get
TypeError: Method() takes exactly 3 arguments (2 given)
What am I doing wrong? Why does the script not use the updated version of the class? I use the Canopy editor, but I saw this behaviour with the python.exe interpreter as well.
And apologies if something similar was asked before; I did not find a question related to this one.
Python loads the code objects into memory; the class statement is executed when a file is first imported, and a class object is created and stored in the module namespace. Subsequent imports re-use the already created objects.
The .pyc file is only used the next time the module is imported for the first time in that Python session. Replacing the file will not result in a module reload.
You can use the reload() function (importlib.reload() in Python 3) to force Python to replace an already-loaded module with fresh code from disk. Note that any and all other direct references to the class are not replaced; an instance of the testClass class (tm in your case) would still reference the old class object.
When developing code, it is often just easier to restart the Python interpreter and start afresh. That way you don't have to worry about hunting down all direct references and replacing those, for example.
testModule is already loaded in your interpreter. Deleting the pyc file won't change anything. You will need to do reload(testModule), or even better restart the interpreter.
Deleting the .pyc file won't make a difference in your case. When you import a module for the first time in the interpreter, it gets fully loaded, and deleting or modifying its files afterwards won't change anything.
Better restart the interpreter or use the built-in reload function.
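The caching and reload behaviour can be demonstrated end to end. A Python 3 sketch; the module name demo_mod and the temporary directory are made up for the example:

```python
import importlib
import os
import sys
import tempfile
import time

# Create a throwaway module on disk so the example is self-contained.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'demo_mod.py')
with open(path, 'w') as f:
    f.write('VALUE = 1\n')
sys.path.insert(0, tmpdir)

import demo_mod
assert demo_mod.VALUE == 1

# Edit the source on disk, then bump the mtime so a stale .pyc cannot be
# mistaken for current on filesystems with coarse timestamps.
with open(path, 'w') as f:
    f.write('VALUE = 2\n')
future = time.time() + 10
os.utime(path, (future, future))

import demo_mod              # re-import: returns the cached module object
assert demo_mod.VALUE == 1   # still the old code

importlib.invalidate_caches()
importlib.reload(demo_mod)   # reload: re-executes the module's source from disk
assert demo_mod.VALUE == 2
```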
I am trying to save a dictionary that contains a lambda function in django.core.cache. The example below fails silently.
from django.core.cache import cache
cache.set("lambda", {"name": "lambda function", "function":lambda x: x+1})
cache.get("lambda")
#None
I am looking for an explanation for this behaviour. Also, I would like to know if there is a workaround without using def.
The example below fails silently.
No, it doesn't. The cache.set() call should give you an error like:
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
Why? Internally, Django is using Python's pickle library to serialize the value you are attempting to store in cache. When you want to pull it out of cache again with your cache.get() call, Django needs to know exactly how to reconstruct the cached value. And due to this desire not to lose information or incorrectly/improperly reconstruct a cached value, there are several restrictions on what kinds of objects can be pickled. You'll note that only these types of functions may be pickled:
functions defined at the top level of a module
built-in functions defined at the top level of a module
And there is this further explanation about how pickling functions works:
Note that functions (built-in and user-defined) are pickled by “fully qualified” name reference, not by value. This means that only the function name is pickled, along with the name of the module the function is defined in. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised.
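The failure is easy to observe directly with plain pickle (a small sketch):

```python
import pickle

f = lambda x: x + 1

# A lambda has no importable qualified name ("<lambda>" cannot be looked up
# in its module), so pickling it raises.
try:
    pickle.dumps(f)
    raised = False
except (pickle.PicklingError, AttributeError):
    raised = True

assert raised
```

As for a workaround without def: the standard pickle module cannot serialize a lambda by value; third-party libraries such as dill can, at the cost of portability of the cached data.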
I am having some trouble understanding sys.stdout and sys.stderr.
The problem is: how did the variable put get into the cmd function?
My aim is to write a single function that accepts a string; basically I am using it to write exceptions caught in my application to the screen and to a log file. I saw similar code somewhere, so I decided to learn more by using the same example, which was a little different from mine, as the other person was writing to two files simultaneously with just one function call.
According to my understanding:
The cmd function receives a string and then calls the output function on the received string.
The output function takes two arguments (the first must evaluate to a standard stream object, and the second is a string). Fine.
output then calls the logs function, which does the actual printing: it calls the write method of the stream object, passing it the string to be written.
If my explanation is not clear, it means I am truly not understanding the whole process.
My question is: how does the put variable, defined outside the function, get into the cmd function (or any other function) when I have commented out the global declarations and never passed it in?
Please find the code below:
import sys

def logs(prints, message):
    #global put
    #print prints
    prints.write(message)
    prints.flush()

def output(prints, message):
    #global put
    #logs(prints, content)
    logs(prints, message)
    #logs(put, via)

''' This is where the confusion is, how did put get into this function when i did
not declare it...'''
def cmd(message):
    #global put
    output(put, message)
    output(sys.stderr, message)

put = open('think.txt', 'w')
#print put, '000000000'
cmd('Write me out to screen/file')
put.close()
It's because of the way that Python handles scopes. When you execute the script, the logs, output and cmd functions are defined in the module namespace. Then put = open('think.txt', 'w') creates a variable called put in the module namespace.
When you call cmd, you are executing in the function's local namespace; it is created when the function is called and destroyed when the function exits. When Python hits the expression output(put, message), it needs to resolve the names output, put and message to see what to do with them. The rule for a function is that Python looks for each name in the local function namespace first, then falls back to the global module namespace if the name is not found there.
So, python checks the function namespace for output, doesn't find anything, looks at the module namespace and finds that output refers to a function object. It then checks the function namespace for put, doesn't find anything, looks at the module namespace and finds that put refers to an open file object. Finally, it looks up message, finds it in the function namespace (the function parameters go into the function namespace) and off it goes.
put is declared as a global variable, so when you access it from within cmd, it is accessing that global variable without you needing to declare it within the function.
For example, this code prints 5 for the same reason:
def foo():
    print "bar: {0}".format(bar)

bar = 5
foo()