Python thread pool copies parameters

I'm learning about multithreading and I'm trying to implement a few things to understand it.
After reading several (very technical) threads, I still cannot find a solution or a way to understand my issue.
Basically, I have the following structure:
import datetime
from multiprocessing import Pool  # assumed: the copying behaviour described below matches a process-based Pool

class MyObject():
    def __init__(self):
        self.lastupdate = datetime.datetime.now()

    def DoThings(self):
        ...

def MyThreadFunction(OneOfMyObject):
    OneOfMyObject.DoThings()
    OneOfMyObject.lastupdate = datetime.datetime.now()

def main():
    MyObject1 = MyObject()
    MyObject2 = MyObject()
    MyObjects = [MyObject1, MyObject2]
    pool = Pool(2)
    while True:
        pool.map(MyThreadFunction, MyObjects)

if __name__ == '__main__':
    main()
I think the .map function makes a copy of my objects, because it does not update the time. Is that right? If so, how could I pass in a global version of my objects? If not, do you have any idea why the time stays fixed in my objects?
When I check the new time with print(MyObject.lastupdate), the time is right, but not in the next loop iteration.
Thank you very much for any ideas.

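Yes. multiprocessing's Pool runs your function in separate worker processes (not threads), so it serializes (actually, pickles) your objects and reconstructs copies of them in the workers; the originals in your main process are never modified. However, whatever the function returns is pickled and sent back. So return the updated object and work with the returned copies, as in the commented additions to the code below: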
import datetime
from multiprocessing import Pool

class MyObject():
    def __init__(self):
        self.lastupdate = datetime.datetime.now()

    def DoThings(self):
        ...

def MyThreadFunction(OneOfMyObject):
    OneOfMyObject.DoThings()
    OneOfMyObject.lastupdate = datetime.datetime.now()
    # NOW, RETURN THE OBJECT
    return OneOfMyObject

def main():
    MyObject1 = MyObject()
    MyObject2 = MyObject()
    MyObjects = [MyObject1, MyObject2]
    with Pool(2) as pool:  # <- the context manager shuts the pool down when the block exits. Check out context managers if interested.
        # Now we recover a list of the updated objects:
        processed_object_list = pool.map(MyThreadFunction, MyObjects)
        # Inside a loop, reassign the returned copies so the next iteration starts from them:
        # MyObjects = pool.map(MyThreadFunction, MyObjects)
    # Now inspect
    for my_object in processed_object_list:
        print(my_object.lastupdate)

if __name__ == '__main__':
    main()
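If the objects should instead be updated in place, a pool of threads avoids the copying, because threads share the main process's memory. A minimal sketch, assuming `multiprocessing.pool.ThreadPool` (a thread-backed pool with the same `map()` API as `Pool`); the `updates` counter and the `iterations` parameter are illustrative additions, not part of the question. Note that in CPython, threads only run concurrently while the work releases the GIL (for example, during I/O), so CPU-bound `DoThings()` code will not get faster this way.

```python
import datetime
from multiprocessing.pool import ThreadPool


class MyObject:
    def __init__(self):
        self.lastupdate = datetime.datetime.now()
        self.updates = 0  # illustrative counter, not in the original

    def do_things(self):
        pass  # placeholder for the real work


def my_thread_function(obj):
    obj.do_things()
    obj.lastupdate = datetime.datetime.now()
    obj.updates += 1


def main(iterations=3):
    objects = [MyObject(), MyObject()]
    with ThreadPool(2) as pool:
        for _ in range(iterations):
            # Threads share memory, so the workers modify these very objects.
            pool.map(my_thread_function, objects)
    return objects


if __name__ == "__main__":
    for obj in main():
        print(obj.updates, obj.lastupdate)
```

After `main(3)`, every object has `updates == 3`, because all three `map()` calls operate on the same two objects rather than on copies.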

Related

How to set an instance attribute in parallel in a Python class?

I want to set an instance attribute by running an instance method in parallel. Let's say the attribute is initially an empty dictionary called d, and I want to update it in parallel with an instance method called update_d. I am currently using multiprocessing.Pool:
from multiprocessing import Pool
import random

class A():
    def __init__(self, n_jobs):
        self.d = dict()
        self.n_jobs = n_jobs
        pool = Pool(self.n_jobs)
        pool.map(self.update_d, range(100))
        pool.close()

    def update_d(self, key):
        self.d[key] = random.randint(0, 100)

if __name__ == '__main__':
    a = A(n_jobs=4)
    print(a.d)
However, the attribute is not updated after running update_d in parallel. I understand that this is because multiprocessing.Pool always copies the instance into the individual processes. But what is the recommended way to do this in Python? Note that I don't want to return anything from update_d, and we can assume the code is written so that the individual processes won't conflict with each other.
Edit: I am only using a dictionary as an example. I need a solution that allows the attribute to be any type of variable, e.g. a Pandas dataframe.
You may need a Manager to create the dict for you; the workers then update a shared proxy instead of a copy. I still don't know how well the updates will work, or whether there will be any race conditions.
from multiprocessing import Pool, Manager
import random

class A():
    def __init__(self, n_jobs, manager):
        self.d = manager.dict()
        self.n_jobs = n_jobs
        pool = Pool(self.n_jobs)
        pool.map(self.update_d, range(100))
        pool.close()

    def update_d(self, key):
        self.d[key] = random.randint(0, 100)

if __name__ == '__main__':
    with Manager() as manager:
        a = A(n_jobs=4, manager=manager)
        print(a.d)
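A common alternative (not from the answer above) is to let each worker return its result and have the parent process, which owns the instance, set the attribute. Only return values cross the process boundary, so this works for any picklable value, and the parent can assemble any type it likes (a pandas DataFrame, for instance). It does return values from the workers, which the question wanted to avoid, so treat it as a sketch; `compute_value` and the `executor_cls` parameter are hypothetical names added for illustration.

```python
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def compute_value(key):
    # Hypothetical worker: runs in a child process and *returns* its result
    # instead of mutating an attribute on a copy of the instance.
    return key, random.randint(0, 100)


class A:
    def __init__(self, n_jobs, executor_cls=ProcessPoolExecutor):
        self.d = {}
        with executor_cls(max_workers=n_jobs) as executor:
            # Results are pickled back to the parent, which owns self.d.
            for key, value in executor.map(compute_value, range(100)):
                self.d[key] = value


if __name__ == "__main__":
    a = A(n_jobs=4)
    print(len(a.d))  # 100
```

Passing `executor_cls=ThreadPoolExecutor` runs the same code with threads, which is handy for quick tests; `compute_value` is kept at module level so that `ProcessPoolExecutor` can pickle it.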

How to execute AST or code object in a separate process without exceeding max recursion depth

I am trying to write a metamorphic quine. Without the "spawn" context, the subprocesses seem to inherit the stack, so I eventually exceed the maximum recursion depth. With the "spawn" context, the subprocess doesn't seem to recurse. How would I go about executing the modified AST?
import ast
import inspect
import multiprocessing
import sys

# Visitor and Y are defined elsewhere in the author's module (not shown)
def main():
    module = sys.modules[__name__]
    source = inspect.getsource(module)
    tree = ast.parse(source)
    visitor = Visitor()  # TODO mutate
    tree = visitor.visit(tree)
    tree = ast.fix_missing_locations(tree)
    ctx = multiprocessing.get_context("spawn")
    process = ctx.Process(target=Y, args=(tree,))
    # Y() encapsulates these lines, since code objects can't be pickled
    #code = compile(tree, filename="<ast>", mode='exec', optimize=2)
    #process = ctx.Process(target=exec, args=(code, globals())) # locals()
    process.daemon = True
    process.start()
    # TODO why do daemonized processes need to be joined in order to run?
    process.join()
    return 0

if __name__ == '__main__':
    exit(main())
It really is that easy: with daemon.DaemonContext(): foo() (DaemonContext comes from the python-daemon package).
Based on comments by @user2357112 supports Monica.
import random
import sys
from functools import partial
from typing import Callable, TypeVar

import daemon  # the python-daemon package

# trace, Y, morph and status are defined elsewhere in the author's code (not shown)
@trace
def spawn_child(f: Callable):
    with daemon.DaemonContext(stdin=sys.stdin, stdout=sys.stdout):
        return f()

I = TypeVar('I')

def ai(f: Callable[[int], I]) -> Callable[[int], I]:
    def g(*args, **kwargs) -> int:
        # assuming we have a higher-order function morph()
        # that has a concept of eta-equivalence
        # (e.g., a probabilistic notion),
        # then the recursive call should be "metamorphic"
        O = [morph(f), status, partial(spawn_child, f),]
        i = random.randrange(0, len(O))  # TODO something magickal
        return O[i]()
    return g

def main() -> int:
    return Y(ai)()

if __name__ == '__main__':
    exit(main())
The next problem is compiling the source for a nested function definition, since f() is not a reference to ai() but to a function defined within Y().
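For the original question of how to execute the modified AST in a spawned child, one option is to turn the tree back into source text with `ast.unparse()` (Python 3.9+), pass that string (strings pickle fine, code objects do not), and compile it inside the child. A sketch under those assumptions; `exec_source` and `run_in_spawned_process` are hypothetical helpers, and this has not been tried against the quine itself. The namespace sets `__name__` to `"__main__"` so that an `if __name__ == '__main__':` guard in the mutated program runs. A spawned child imports the parent's main module under the name `__mp_main__`, so executing with that module's `globals()` skips the guard, which may explain why the "spawn" version did not recurse.

```python
import ast
import multiprocessing


def exec_source(source):
    """Compile and run program text inside the current (child) process."""
    # Compile here: code objects cannot be pickled, but strings can.
    code = compile(source, filename="<ast>", mode="exec")
    # "__main__" lets the program's own `if __name__ == '__main__':` guard fire.
    namespace = {"__name__": "__main__"}
    exec(code, namespace)
    return namespace


def run_in_spawned_process(tree):
    """Run an AST in a fresh "spawn" process and return its exit code."""
    source = ast.unparse(ast.fix_missing_locations(tree))  # Python 3.9+
    ctx = multiprocessing.get_context("spawn")
    process = ctx.Process(target=exec_source, args=(source,))
    process.start()
    process.join()  # wait for the child to finish
    return process.exitcode


if __name__ == "__main__":
    tree = ast.parse("print('hello from the child')")
    print(run_in_spawned_process(tree))
```

On the TODO in the question: when a process exits, multiprocessing attempts to terminate its daemonic children, which is likely why the `daemon = True` child only ran when the parent waited for it with `join()`.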

How to define a function that I can use efficiently across modules and directories

I want to implement a timer to measure how long a block of code takes to run. I then want to do this across an entire application containing multiple modules (40+) across multiple directories (4+).
My timer is created with two functions that are within a class with a structure like this:
import logging
import time

class SubClass(Class1):
    def getStartTime(self):
        start = time.time()
        return start

    def logTiming(self, classstring, start):
        fin = time.time() - start
        logging.getLogger('perf_log_handler').info((classstring + ' sec').format(round(fin, 3)))
The first function gets the start time, and the second function calculates the time for the block to run and then logs it to a logger.
This code is in a module that we'll call module1.py.
In practice, it is used generically like this:
class SubSubClass(SubClass):
    def Some_Process(self):
        stim = super().getStartTime()
        # ... the code being timed ...
        super().logTiming("The Process took: {}", stim)
        return Result_Of_Process
This code resides in a module called module2.py; it already works and logs successfully. My problem is that, structured like this, I can seemingly only use the timer in code under the umbrella of SubClass, where it is defined (elsewhere, my application fails to render and I get a "can't find page" error in my browser). But I want to use this timer everywhere in the application, globally: whether the module is in another directory, or whether the code sits inside other classes and subclasses in other modules.
What is the easiest, most efficient way to create this timing instrument so that I can use it anywhere in my application? I understand I may have to define it completely differently. I am very new to all of this, so any help is appreciated.
OPTION 1) You should define another module, for example "mytimer.py", fully dedicated to the timer:
import time

class MyTimer():
    def __init__(self):
        self.start = time.time()

    def log(self):
        # returns the elapsed seconds; log or print the value as you like
        now = time.time()
        return now - self.start
And then, from any line of your code, for example, in module2.py:
from mytimer import MyTimer

class SomeClass():
    def Some_Function(self):
        t = MyTimer()
        ...  # your code
        elapsed = t.log()
        return ...
OPTION 2) You could also use a simple function instead of a class:
import time

def mytimer(start=None, tag=""):
    if start is None:
        start = time.time()
    now = time.time()
    delay = float(now - start)
    print("%(tag)s %(delay).2f seconds." % {'tag': tag, 'delay': delay})
    return now
And then, in your code:
from mytimer import mytimer

class SomeClass():
    def Some_Function(self):
        t = mytimer(tag='BREAK0')
        ...
        t = mytimer(start=t, tag='BREAK1')
        ...
        t = mytimer(start=t, tag='BREAK2')
        ...
        t = mytimer(start=t, tag='BREAK3')
        return ...
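A third option, not from the answer above, is a context manager in the shared mytimer.py module: any module can import it, and it logs through the same perf_log_handler logger used in the question. A sketch; `timed_block` and `demo` are hypothetical names, and `time.perf_counter()` is used because it is intended for measuring short intervals.

```python
import logging
import time
from contextlib import contextmanager

# Logger name taken from the question's setup.
perf_log = logging.getLogger("perf_log_handler")


@contextmanager
def timed_block(label):
    """Time the enclosed block and log e.g. 'The Process took: 0.012 sec'."""
    timing = {}
    start = time.perf_counter()
    try:
        yield timing
    finally:
        # Runs even if the block raises, so every timing gets logged.
        timing["elapsed"] = time.perf_counter() - start
        perf_log.info("%s took: %.3f sec", label, timing["elapsed"])


def demo():
    # Usage from any module: from mytimer import timed_block
    with timed_block("The Process") as timing:
        time.sleep(0.01)  # stands in for the code being timed
    return timing


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    demo()
```

Because the timer is a plain module-level function, it does not need to live in any class hierarchy, which is what tied the original timer to SubClass.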
I am not quite sure what you are looking for, but I once used a decorator for a similar problem.
The snippet below is the closest I can remember to what I implemented at the time. Hopefully it is useful to you.
Brief explanation
timed is a decorator that wraps methods of a Python object and times each call.
The class contains a log that the wrapper updates whenever a @timed method is called.
Note that if you want the @property to act as a "class property", you can draw inspiration from this post.
from functools import wraps
from time import sleep, time

# -----------------
# Define Decorators
# ------------------
def timed(wrapped):
    @wraps(wrapped)  # keep the wrapped method's name and docstring
    def wrapper(self, *arg, **kwargs):
        start = time()
        res = wrapped(self, *arg, **kwargs)
        stop = time()
        self.log = {'method': wrapped.__name__, 'called': start, 'elapsed': stop - start}
        return res
    return wrapper

# -----------------
# Define Classes
# ------------------
class Test(object):
    __log = []  # class attribute: shared by all instances of Test

    @property
    def log(self):
        return self.__log

    @log.setter
    def log(self, kwargs):
        self.__log.append(kwargs)

    @timed
    def test(self):
        print("Running timed method")
        sleep(2)

    @timed
    def test2(self, a, b=2):
        print("Running another timed method")
        sleep(2)
        return a+b

# ::::::::::::::::::
if __name__ == '__main__':
    t = Test()
    res = t.test()
    res = t.test2(1)
    print(t.log)
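The decorator above relies on the class providing a log setter. A variant that works on any function or method in any module, sketched here as an assumption rather than the answerer's code, logs through logging and uses `functools.wraps` to keep the wrapped function's name. `slow_add` is a hypothetical example function.

```python
import functools
import logging
import time

perf_log = logging.getLogger("perf_log_handler")  # logger name from the question


def timed(func):
    """Log the duration of every call to func, whatever module it lives in."""
    @functools.wraps(func)  # preserve __name__, __doc__, etc.
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            perf_log.info("%s took: %.3f sec", func.__qualname__, elapsed)
    return wrapper


@timed
def slow_add(a, b):
    time.sleep(0.01)
    return a + b


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(slow_add(1, 2))  # prints 3 and logs the elapsed time
```

Since `wrapper` accepts `*args`, the same decorator works on methods too (`self` simply arrives as the first positional argument).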

Apply a method to a list of objects in parallel using multi-processing

I have created a class with a number of methods. One of the methods, my_process, is very time-consuming, and I'd like to run it in parallel. I came across "Python Multiprocessing - apply class method to a list of objects", but I'm not sure how to apply it to my problem, or what effect it will have on the other methods of my class.
class MyClass():
    def __init__(self, input):
        self.input = input
        self.result = int

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to

list_of_numbers = list(range(0, 5))
list_of_objects = [MyClass(i) for i in list_of_numbers]
list_of_results = [obj.my_process(100, 1) for obj in list_of_objects]  # multi-process this for-loop

print(list_of_numbers)
print(list_of_results)

# Output:
# [0, 1, 2, 3, 4]
# [1, 101, 201, 301, 401]
I'm going to go against the grain here, and suggest sticking to the simplest thing that could possibly work ;-) That is, Pool.map()-like functions are ideal for this, but are restricted to passing a single argument. Rather than make heroic efforts to worm around that, simply write a helper function that only needs a single argument: a tuple. Then it's all easy and clear.
Here's a complete program taking that approach, which prints what you want under Python 2, and regardless of OS:
class MyClass():
    def __init__(self, input):
        self.input = input
        self.result = int

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to

import multiprocessing as mp
NUM_CORE = 4  # set to the number of cores you want to use

def worker(arg):
    obj, m, a = arg
    return obj.my_process(m, a)

if __name__ == "__main__":
    list_of_numbers = list(range(0, 5))
    list_of_objects = [MyClass(i) for i in list_of_numbers]
    pool = mp.Pool(NUM_CORE)
    list_of_results = pool.map(worker, ((obj, 100, 1) for obj in list_of_objects))
    pool.close()
    pool.join()
    print(list_of_numbers)
    print(list_of_results)
A bit of magic
I should note there are many advantages to taking the very simple approach I suggest. Beyond the fact that it "just works" on Python 2 and 3, requires no changes to your classes, and is easy to understand, it also plays nicely with all of the Pool methods.
However, if you have multiple methods you want to run in parallel, it can get a bit annoying to write a tiny worker function for each. So here's a tiny bit of "magic" to worm around that. Change worker() like so:
def worker(arg):
    obj, methname = arg[:2]
    return getattr(obj, methname)(*arg[2:])
Now a single worker function suffices for any number of methods, with any number of arguments. In your specific case, just change one line to match:
list_of_results = pool.map(worker, ((obj, "my_process", 100, 1) for obj in list_of_objects))
More-or-less obvious generalizations can also cater to methods with keyword arguments. But, in real life, I usually stick to the original suggestion. At some point catering to generalizations does more harm than good. Then again, I like obvious things ;-)
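On Python 3.3 and later, `Pool.starmap()` unpacks each tuple into positional arguments, so the tuple-unpacking helper is not needed at all: `MyClass.my_process` is a plain function whose first parameter is `self`. A sketch assuming Python 3; the `run()` function and its `pool_factory` parameter are illustrative additions that also let you swap in `multiprocessing.pool.ThreadPool`.

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


class MyClass:
    def __init__(self, input):
        self.input = input
        self.result = None

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to


def run(pool_factory=Pool, processes=4):
    objects = [MyClass(i) for i in range(5)]
    with pool_factory(processes) as pool:
        # Each tuple becomes (self, multiply_by, add_to) for MyClass.my_process.
        return pool.starmap(MyClass.my_process, [(obj, 100, 1) for obj in objects])


if __name__ == "__main__":
    print(run())  # [1, 101, 201, 301, 401]
```

For example, `run(ThreadPool, 2)` runs the same calls in threads. As with `map()`, each process works on a pickled copy of `obj`, so only the returned values reach the parent.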
If your class is not "huge", I think a process-oriented approach is better.
I suggest using a Pool from multiprocessing.
This is the tutorial: https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers
Then separate add_to from my_process, since add_to is quick and you can wait until the last process has finished.
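from multiprocessing import Pool

def my_process(input, multiby):
    return input * multiby

def add_to(result, a_list):
    ...  # combine the results here (left unspecified in the original)

if __name__ == '__main__':
    p = Pool(5)
    res = []
    for i in range(10):
        res.append(p.apply_async(my_process, (i, 5)))
    p.close()  # no more tasks; join() raises ValueError unless close() or terminate() was called
    p.join()   # wait for the end of the last process
    for i in range(10):
        print(res[i].get())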
Generally the easiest way to run the same calculation in parallel is the map method of a multiprocessing.Pool (or, in Python 3, an executor's map method or the as_completed function from concurrent.futures).
However, the map method applies a function that only takes one argument to an iterable of data using multiple processes.
So this function cannot be a normal method, because that requires at least two arguments; it must also include self! It could be a staticmethod, however. See also this answer for a more in-depth explanation.
Based on the answer to "Python Multiprocessing - apply class method to a list of objects" and your code:
Add the MyClass object to a simulation object:
import multiprocessing
import os
import sys

class simulation(multiprocessing.Process):
    def __init__(self, id, worker, *args, **kwargs):
        # must call this before anything else
        multiprocessing.Process.__init__(self)
        self.id = id
        self.worker = worker
        self.args = args
        self.kwargs = kwargs
        sys.stdout.write('[%d] created\n' % (self.id))
Run what you want in the run() method:
    def run(self):
        sys.stdout.write('[%d] running ...  process id: %s\n' % (self.id, os.getpid()))
        self.worker.my_process(*self.args, **self.kwargs)
        sys.stdout.write('[%d] completed\n' % (self.id))
Try this:
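if __name__ == '__main__':
    list_of_numbers = range(0, 5)
    list_of_objects = [MyClass(i) for i in list_of_numbers]
    list_of_sim = [simulation(id=k, worker=obj, multiply_by=100*k, add_to=10*k)
                   for k, obj in enumerate(list_of_objects)]
    for sim in list_of_sim:
        sim.start()
    for sim in list_of_sim:
        sim.join()  # wait for all simulations; each result stays in its child process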
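If you don't absolutely need to stick with the multiprocessing module, this can easily be achieved with the concurrent.futures library. Note that in CPython, threads don't run Python bytecode in parallel because of the GIL, so for CPU-bound work like this, ProcessPoolExecutor (same interface) is the better fit.
Here's the example code: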
from concurrent.futures import ThreadPoolExecutor, wait

MAX_WORKERS = 20

class MyClass():
    def __init__(self, input):
        self.input = input
        self.result = int

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to

def on_finish(future):
    # must be defined before it is passed to add_done_callback()
    result = future.result()  # do stuff with your result

list_of_numbers = range(0, 5)
list_of_objects = [MyClass(i) for i in list_of_numbers]

with ThreadPoolExecutor(MAX_WORKERS) as executor:
    for obj in list_of_objects:
        executor.submit(obj.my_process, 100, 1).add_done_callback(on_finish)
Here the executor returns a future for every task it submits. Keep in mind that a callback added with add_done_callback() typically runs in the worker thread that finished the task (or immediately in the calling thread, if the future has already finished), so keep callbacks short. If you would rather wait for all the futures and handle their results in the main thread, here's a snippet for that:
futures = []
with ThreadPoolExecutor(MAX_WORKERS) as executor:
    for obj in list_of_objects:
        futures.append(executor.submit(obj.my_process, 100, 1))
    # wait() returns a named tuple of two sets: (done, not_done)
    done, not_done = wait(futures)
for future in futures:
    # work with your result here
    if future.exception() is None:
        print(future.result())
    else:
        print(future.exception())
Hope this helps.

Process containing object method doesn't recognize edit to object

I have the following situation: process = Process(target=sample_object.run). I would then like to edit a property of sample_object: sample_object.edit_property(some_other_object).
class sample_object:
    def __init__(self):
        self.storage = []

    def edit_property(self, some_other_object):
        self.storage.append(some_other_object)

    def run(self):
        while True:
            if len(self.storage) != 0:
                print("1")
        # I know it's an infinite loop. It's just an example.
_______________________________________________________
from multiprocessing import Process
from sample import sample_object
from sample2 import some_other_object

class driver:
    if __name__ == "__main__":
        samp = sample_object()
        proc = Process(target=samp.run)
        proc.start()
        while True:
            some = some_other_object()
            samp.edit_property(some)
            # I know it's an infinite loop
The previous code never prints "1". How would I connect the Process to sample_object so that an edit made to the object whose method the Process is calling is recognized by the process? In other words, is there a way to get .run to recognize the change in sample_object?
Thank you.
You can use multiprocessing.Manager to share Python data structures between processes.
from multiprocessing import Process, Manager

class A(object):
    def __init__(self, storage):
        self.storage = storage

    def add(self, item):
        self.storage.append(item)

    def run(self):
        while True:
            if self.storage:
                print(1)

if __name__ == '__main__':
    manager = Manager()
    storage = manager.list()  # proxy shared between the parent and the child
    a = A(storage)
    p = Process(target=a.run)
    p.start()
    for i in range(10):
        a.add({'id': i})
    p.join()  # run() loops forever, so this waits indefinitely
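Another option, sketched here as an alternative rather than the answerer's code, is to leave the child's copy separate and send it new items through a multiprocessing.Queue; the child appends each item to its own storage. The `None` sentinel that ends the loop and the `SampleObject` name are assumptions made so that the sketch terminates.

```python
import multiprocessing

SENTINEL = None  # assumed "no more items" marker


class SampleObject:
    def __init__(self):
        self.storage = []

    def run(self, queue):
        # Executes in the child, which works on its own copy of self.
        while True:
            item = queue.get()  # blocks until the parent sends something
            if item is SENTINEL:
                break
            self.storage.append(item)
            print("received", item)
        return len(self.storage)


if __name__ == "__main__":
    queue = multiprocessing.Queue()
    samp = SampleObject()
    proc = multiprocessing.Process(target=samp.run, args=(queue,))
    proc.start()
    for i in range(3):
        queue.put({"id": i})
    queue.put(SENTINEL)
    proc.join()
    # samp.storage is still empty here: only the child's copy was filled.
```

Compared with a Manager list, the queue makes the one-way data flow explicit and avoids a round trip to the manager process on every `if self.storage` check.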
