I found the following post extremely helpful:
How to pickle yourself?
However, the limitation of that solution is that when the class is reloaded, it is not returned to its "runtime" state: it reloads the variables and the general state of the instance at the moment it was dumped, but it does not continue running from that point.
Consider:
import pickle

class someClass(object):
    def doSomething(self):
        i = 0
        while i <= 20:
            execute
            i += 1
            if i == 10:
                self.dumpState()

    def dumpState(self):
        with open('somePickleFile', 'wb') as handle:
            pickle.dump(self, handle)

    @classmethod
    def loadState(cls, file_name):
        with open(file_name, 'rb') as handle:
            return pickle.load(handle)
If the above is run, by creating an instance of someClass:
sC = someClass()
sC.doSomething()
sC.loadState('somePickleFile')
This does not return the class to its runtime state; it does not continue through the while loop until i == 20.
This may not be the correct approach, but I am trying to find a way to capture the runtime state of my program, i.e. freeze/hibernate it, and then relaunch it after possibly moving it to another machine. This is due to issues I have with time restrictions enforced by a queuing system on a cluster which does not support checkpointing.
That approach won't be possible with pickle and unpickle alone, without your code being aware of it.
Pickle can save fundamental Python objects, and ordinary user classes that reference those fundamental types, but it can't freeze the information of a running context the way you want.
Python does allow limited (yet powerful) ways of accessing a running code context through its frame objects -- you can get a frame object with a call to inspect.currentframe in the inspect module. This will let you see the currently running line of code, the local variables and their contents, and so on -- but there is no way, in pure Python and without resorting to raw memory manipulation of the interpreter's data structures, to rebuild a mid-execution frame object and jump execution back into it.
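As a small illustration of that introspection (a sketch only; the helper and variable names here are made up, and this is read-only inspection, not something you can pickle and resume):

import inspect

def show_caller_state():
    caller = inspect.currentframe().f_back   # the frame that called us
    print("in", caller.f_code.co_name,
          "at line", caller.f_lineno,
          "locals:", caller.f_locals)

def worker():
    counter = 7
    show_caller_state()   # you can inspect this frame, but not rebuild and resume it later

worker()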
So, for that approach, it would be better to "freeze" the entire process and its in-memory data structures using an OS-level mechanism (there probably is a way to do that on Linux, and it should work as long as the process has no files or file-like resources in use).
Or, from within Python, as you want, you have to keep bookkeeping of all your state data in a manner that pickle is able to "see". In your basic example, you would refactor your code to something like:
class someClass(object):
    def setup(self):
        self.i = 0

    def doSomething(self):
        while self.i <= 20:
            execute
            self.i += 1
            if self.i == 10:
                self.dumpState()
    ...

    @classmethod
    def loadState(cls, file_name):
        with open(file_name, 'rb') as handle:
            self = pickle.load(handle)
        if self.i <= 20:  # or some other check for a "running context"
            return self.doSomething()
The fundamental difference here is the bookkeeping of the otherwise local "i" variable as an object attribute, and the separation of the initialization code. In this way, all the state needed to continue the execution -- for this small example -- is recorded in the object attributes, which can be properly pickled.
loadState is a classmethod returning a new instance of someClass (or something else pickled into the file). So you should write instead:
sC = someClass()
sC.doSomething()
sC = someClass.loadState('somePickleFile')
I believe pickle only keeps the attribute values of the instance, not the internal state of any methods executing. It will not save the fact that a method was executing, and it won't save the values of the local variables, like i in your example.
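To see this concretely, here is a small sketch (the Counter class is hypothetical; it just shows that only the instance's __dict__ survives the round trip, never a method's local variables):

import pickle

class Counter(object):
    def __init__(self):
        self.i = 0    # instance attribute: pickled

    def run(self):
        j = 99        # local variable of a running method: never pickled
        return pickle.dumps(self)

c = Counter()
restored = pickle.loads(c.run())
print(restored.__dict__)   # {'i': 0} -- no trace of j, or of where run() was executing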
Related
I have a class with a method which modifies its internal state, for instance:
class Example():
    def __init__(self, value):
        self.param = value

    def example_method(self, m):
        self.param = self.param * m
        # By convention, these methods in my implementation return the object itself
        return self
I want to run example_method in parallel (I am using the mpire lib, but other options are welcome as well) for many instances of Example, and have the internal state changes reflected in my instances. Something like:
import mpire

list_of_instances = [Example(i) for i in range(1, 6)]

def run_method(ex):
    ex.example_method(10)

print("Before parallel calls, this should print <1>")
print(f"<{list_of_instances[0].param}>")

with mpire.WorkerPool(n_jobs=3) as pool:
    pool.map_unordered(run_method, [(example,) for example in list_of_instances])

print("After parallel calls, this should print <10>")
print(f"<{list_of_instances[0].param}>")
However, the way mpire works, what gets modified are copies of example, not the objects within list_of_instances, so changes to internal state are not kept after the parallel processing. The second print will therefore print <1>, because that object's internal state was not changed; a copy of it was.
I am wondering if there are any solutions to have the internal state changes be applied to the original objects in list_of_instances.
The only solution I can think of is to replace list_of_instances with the result of pool.map_unordered (switching to an ordered map if order is important), since in any other case (even when using shared_objects) a copy of the original objects is made, and the state changes are lost.
Is there any way to solve this with parallel processing? I also accept answers using other libs.
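For reference, a minimal sketch of that replace-the-list option, using the standard library's multiprocessing.Pool rather than mpire (mpire's map should behave analogously); it assumes Example is defined at module level so instances can be pickled:

from multiprocessing import Pool

def run_method(ex):
    return ex.example_method(10)   # return the modified copy so the parent can keep it

if __name__ == "__main__":
    list_of_instances = [Example(i) for i in range(1, 6)]
    with Pool(processes=3) as pool:
        # the originals are replaced by the modified copies coming back from the workers
        list_of_instances = pool.map(run_method, list_of_instances)
    print(list_of_instances[0].param)   # 10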
I am using the Pool class from python's multiprocessing library to write a program that will run on an HPC cluster.
Here is an abstraction of what I am trying to do:
def myFunction(x):
    # myObject is a global variable in this case
    return myFunction2(x, myObject)

def myFunction2(x, myObject):
    myObject.modify()   # here I am calling some method that changes myObject
    return myObject.f(x)

poolVar = Pool()
argsArray = [ARGS ARRAY GOES HERE]
output = poolVar.map(myFunction, argsArray)
The function f(x) is contained in a *.so file, i.e., it is calling a C function.
The problem I am having is that the value of the output variable is different each time I run my program (even though the function myObject.f() is a deterministic function). (If I only have one process then the output variable is the same each time I run the program.)
I have tried creating the object rather than storing it as a global variable:
def myFunction(x):
    myObject = createObject()
    return myFunction2(x, myObject)
However, in my program the object creation is expensive, and thus, it is a lot easier to create myObject once and then modify it each time I call myFunction2(). Thus, I would like to not have to create the object each time.
Do you have any tips? I am very new to parallel programming so I could be going about this all wrong. I decided to use the Pool class since I wanted to start with something simple. But I am willing to try a better way of doing it.
"I am using the Pool class from python's multiprocessing library to do some shared memory processing on an HPC cluster."
Processes are not threads! You cannot simply replace Thread with Process and expect all to work the same. Processes do not share memory, which means that the global variables are copied, hence their value in the original process doesn't change.
If you want to use shared memory between processes then you must use the multiprocessing's data types, such as Value, Array, or use the Manager to create shared lists etc.
In particular you might be interested in the Manager.register method, which allows the Manager to create shared custom objects (although they must be picklable).
However, I'm not sure whether this will improve performance, since any communication between processes requires pickling, and pickling usually takes more time than simply instantiating the object.
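As a small, hedged sketch of the Manager route (the worker function and dictionary key here are made up; a managed dict proxy can be passed to pool workers, at the cost of inter-process communication on every access):

from multiprocessing import Pool, Manager

def worker(args):
    shared, x = args
    shared['count'] = shared.get('count', 0) + 1   # forwarded to the manager process (not atomic)
    return x * x

if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict()     # a proxy object; mutations are applied in the manager process
        with Pool(processes=4) as pool:
            results = pool.map(worker, [(shared, x) for x in range(10)])
        print(dict(shared), results)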
Note that you can do some initialization of the worker processes passing the initializer and initargs argument when creating the Pool.
For example, in its simplest form, to create a global variable in the worker process:
def initializer():
    global data
    data = createObject()
Used as:
pool = Pool(4, initializer, ())
Then the worker functions can use the data global variable without worries.
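Putting that together, a minimal runnable sketch (createObject here is a stand-in for your own expensive setup):

from multiprocessing import Pool

def createObject():
    return {"offset": 100}      # stand-in for the expensive object

def initializer():
    global data
    data = createObject()       # built once per worker process

def myFunction(x):
    return data["offset"] + x   # every task reuses the per-worker object

if __name__ == "__main__":
    with Pool(4, initializer, ()) as pool:
        print(pool.map(myFunction, range(5)))   # [100, 101, 102, 103, 104]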
Style note: Never use the name of a built-in for your variables/modules. In your case object is a built-in. Otherwise you'll end up with unexpected errors which may be obscure and hard to track down.
The global keyword works within the same module (file) only. Another way is to set the value dynamically in the pool process initializer; somefile.py can just be an empty file:
import importlib

def pool_process_init():
    m = importlib.import_module("somefile")
    m.my_global_var = "some value"

pool = Pool(4, initializer=pool_process_init)
How to use the var in task:
def my_coroutine():
    m = importlib.import_module("somefile")
    print(m.my_global_var)
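Wired together, that looks roughly like this (a sketch; somefile is assumed to be an importable, possibly empty, module on the path):

import importlib
from multiprocessing import Pool

def pool_process_init():
    m = importlib.import_module("somefile")
    m.my_global_var = "some value"           # set once per worker process

def task(i):
    m = importlib.import_module("somefile")  # returns the already-imported, cached module
    return f"{i}: {m.my_global_var}"

if __name__ == "__main__":
    with Pool(4, initializer=pool_process_init) as pool:
        print(pool.map(task, range(3)))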
Is it possible to call a function in a kind of protected environment with the following feature: if calling function f raises an exception, then make sure all (outer) variables are restored to their previous values.
For instance, the following code:
a = 42

def f():
    global a
    a += 1
    error   # an undefined name, so this raises NameError

f()
will obviously set a to 43 before raising the exception. I would like to build some try/except structure for calling f() where the exception would restore local variables to their previous state.
Of course I thought of something related to sys._getframe(1).f_locals. Is it possible? Would it be portable across different versions of Python? etc.
No major goal right now; just curious about that idea.
Short answer is no, there's no snapshot feature to these executions and thus no way of reverting the variables.
However there are some things you can do. One of them being:
(I'm writing this as I go, so this will be a resource-exhausting way to solve your problem if you use it on large variables.)
from pickle import load, dump

def snapshot(v):
    with open('snapshot.bin', 'wb') as fh:
        dump(v, fh)

def restore():
    with open('snapshot.bin', 'rb') as fh:
        v = load(fh)
    return v

a = 42
snapshot(a)

def f():
    global a
    a += 1
    error

try:
    f()
except:
    a = restore()
If this were a class with initialized values, you could also snapshot the entire class, or peek inside it and pull out certain variables. But there's no way to do these things for you automatically.
Of course, this requires you to know ahead of time which variables will be affected. I'm not sure there is a way to "peek inside" a function and see which variable names will be used, and even then you'd have to use a traceback call to see on which row you got the error and restore based on that.
One way I would solve it is to store all my critical variables in a dictionary and snapshot branches of that dictionary, or the entire dictionary itself.
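That last idea can also stay in memory instead of going through a pickle file; a sketch using copy.deepcopy on a state dictionary (the names are made up):

from copy import deepcopy

state = {"a": 42, "items": [1, 2, 3]}

def f(s):
    s["a"] += 1
    s["items"].append(4)
    raise RuntimeError("something went wrong")

backup = deepcopy(state)   # snapshot before the risky call
try:
    f(state)
except Exception:
    state = backup         # roll back to the snapshot
print(state)               # {'a': 42, 'items': [1, 2, 3]}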
I have to open a file-like object in python (it's a serial connection through /dev/) and then close it. This is done several times in several methods of my class. How I WAS doing it was opening the file in the constructor, and then closing it in the destructor. I'm getting weird errors though and I think it has to do with the garbage collector and such, I'm still not used to not knowing exactly when my objects are being deleted =\
The reason I was doing this is because I have to use tcsetattr with a bunch of parameters each time I open it and it gets annoying doing all that all over the place. So I want to implement an inner class to handle all that so I can use it doing
with Meter('/dev/ttyS2') as m:
I was looking online and couldn't find a really good answer on how the with syntax is implemented. I saw that it uses the __enter__(self) and __exit__(self) methods. But is implementing those methods all I have to do to use the with syntax, or is there more to it?
Is there either an example on how to do this or some documentation on how it's implemented on file objects already that I can look at?
Those methods are pretty much all you need to make the object work with the with statement.
In __enter__ you open and set up the file, then return the object.
In __exit__ you close the file object. The code that writes to it goes in the body of the with statement.
class Meter():
    def __init__(self, dev):
        self.dev = dev

    def __enter__(self):
        # tcsetattr etc. goes here, before opening and returning the file object
        self.fd = open(self.dev, MODE)
        return self

    def __exit__(self, type, value, traceback):
        # exception handling here
        self.fd.close()

meter = Meter('/dev/tty0')
with meter as m:
    # here you work with the file object
    m.fd.read()
Easiest may be to use the standard Python library module contextlib:

import contextlib

@contextlib.contextmanager
def themeter(name):
    theobj = Meter(name)
    try:
        yield theobj
    finally:
        theobj.close()   # or whatever you need to do at exit

# usage
with themeter('/dev/ttyS2') as m:
    # do what you need with m
    m.read()
This doesn't make Meter itself a context manager (and therefore is non-invasive to that class), but rather "decorates" it (not in the sense of Python's "decorator syntax", but rather almost, but not quite, in the sense of the decorator design pattern;-) with a factory function themeter which is a context manager (which the contextlib.contextmanager decorator builds from the "single-yield" generator function you write) -- this makes it so much easier to separate the entering and exiting condition, avoids nesting, &c.
The first Google hit (for me) explains it simply enough:
http://effbot.org/zone/python-with-statement.htm
and the PEP explains it more precisely (but also more verbosely):
http://www.python.org/dev/peps/pep-0343/
I am developing a medium size program in python spread across 5 modules. The program accepts command line arguments using OptionParser in the main module, e.g. main.py. These options are later used to determine how methods in other modules behave (e.g. a.py, b.py). As I extend the ability for the user to customise the behaviour of the program, I find that I end up requiring a user-defined parameter in a method in a.py that is not directly called by main.py, but is instead called by another method in a.py:
main.py:
import a
p = some_command_line_argument_value
a.meth1(p)
a.py:
def meth1(p):
    # some code
    res = meth2(p)
    # some more code w/ res

def meth2(p):
    # do something with p
This excessive parameter passing seems wasteful and wrong, but as hard as I try I cannot think of a design pattern that solves this problem. While I had some formal CS education (minor in CS during my B.Sc.), I've only really come to appreciate good coding practices since I started using python. Please help me become a better programmer!
Create objects of types relevant to your program, and store the command line options relevant to each in them. Example:
import WidgetFrobnosticator
f = WidgetFrobnosticator()
f.allow_concave_widgets = option_allow_concave_widgets
f.respect_weasel_pins = option_respect_weasel_pins
# Now the methods of WidgetFrobnosticator have access to your command-line parameters,
# in a way that's not dependent on the input format.
import PlatypusFactory
p = PlatypusFactory()
p.allow_parthenogenesis = option_allow_parthenogenesis
p.max_population = option_max_population
# The platypus factory knows about its own options, but not those of the WidgetFrobnosticator
# or vice versa. This makes each class easier to read and implement.
Maybe you should organize your code more into classes and objects? As I was writing this, Jimmy showed a class-instance based answer, so here is a pure class-based answer. This would be most useful if you only ever wanted a single behavior; if there is any chance at all you might want different defaults some of the time, you should use ordinary object-oriented programming in Python, i.e. pass around class instances with the property p set in the instance, not the class.
class Aclass(object):
    p = None

    @classmethod
    def init_p(cls, value):
        cls.p = value

    @classmethod
    def meth1(cls):
        # some code
        res = cls.meth2()
        # some more code w/ res

    @classmethod
    def meth2(cls):
        # do something with cls.p
        pass

from a import Aclass as ac

ac.init_p(some_command_line_argument_value)
ac.meth1()
ac.meth2()
If "a" is a real object and not just a set of independent helper methods, you can create an "p" member variable in "a" and set it when you instantiate an "a" object. Then your main class will not need to pass "p" into meth1 and meth2 once "a" has been instantiated.
[Caution: my answer isn't specific to python.]
I remember that Code Complete called this kind of parameter a "tramp parameter". Googling for "tramp parameter" doesn't return many results, however.
Some alternatives to tramp parameters might include:
Put the data in a global variable
Put the data in a static variable of a class (similar to global data)
Put the data in an instance variable of a class
Pseudo-global variable: hidden behind a singleton, or some dependency injection mechanism
Personally, I don't mind a tramp parameter as long as there's no more than one; i.e. your example is OK for me, but I wouldn't like ...
import a
p1 = some_command_line_argument_value
p2 = another_command_line_argument_value
p3 = a_further_command_line_argument_value
a.meth1(p1, p2, p3)
... instead I'd prefer ...
import a
p = several_command_line_argument_values
a.meth1(p)
... because if meth2 decides that it wants more data than before, I'd prefer if it could extract this extra data from the original parameter which it's already being passed, so that I don't need to edit meth1.
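A minimal sketch of that single-bundle style (the Options class and its fields are made up):

from dataclasses import dataclass

@dataclass
class Options:
    verbose: bool = False
    scale: int = 1            # new fields can be added without touching meth1's signature

def meth1(opts):
    return meth2(opts)        # meth1 just forwards the bundle

def meth2(opts):
    value = 10 * opts.scale
    if opts.verbose:
        print("meth2 computed", value)
    return value

print(meth1(Options(verbose=True, scale=3)))   # prints "meth2 computed 30", then 30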
With objects, parameter lists should normally be very small, since most appropriate information is a property of the object itself. The standard way to handle this is to configure the object properties and then call the appropriate methods of that object. In this case set p as an attribute of a. Your meth2 should also complain if p is not set.
Your example is reminiscent of the code smell Message Chains. You may find the corresponding refactoring, Hide Delegate, informative.