Python Multiprocessing -- Variable not being updated [duplicate]

I am trying to return values from subprocesses, but these values are unfortunately unpicklable. I used global variables with the threading module with success, but I have not been able to retrieve updates made in subprocesses when using the multiprocessing module. I hope I'm missing something.
The results printed at the end are always the same as the initial values given to the variables dataDV03 and dataDV04. The subprocesses are updating these global variables, but the globals remain unchanged in the parent.
import multiprocessing

# NOT ABLE to get python to return values in passed variables.

ants = ['DV03', 'DV04']
dataDV03 = ['', '']
dataDV04 = {'driver': '', 'status': ''}

def getDV03CclDrivers(lib):  # call global variable
    global dataDV03
    dataDV03[1] = 1
    dataDV03[0] = 0
    # eval( 'CCL.' + lib + '.' + lib + '( "DV03" )' )  these are unpicklable instantiations

def getDV04CclDrivers(lib, dataDV04):  # pass global variable
    dataDV04['driver'] = 0  # eval( 'CCL.' + lib + '.' + lib + '( "DV04" )' )

if __name__ == "__main__":

    jobs = []
    if 'DV03' in ants:
        j = multiprocessing.Process(target=getDV03CclDrivers, args=('LORR',))
        jobs.append(j)

    if 'DV04' in ants:
        j = multiprocessing.Process(target=getDV04CclDrivers, args=('LORR', dataDV04))
        jobs.append(j)

    for j in jobs:
        j.start()

    for j in jobs:
        j.join()

    print 'Results:\n'
    print 'DV03', dataDV03
    print 'DV04', dataDV04
I cannot post an answer to my own question, so I will edit the original instead.
Here is the object that is not picklable:
In [1]: from CCL import LORR
In [2]: lorr=LORR.LORR('DV20', None)
In [3]: lorr
Out[3]: <CCL.LORR.LORR instance at 0x94b188c>
This is the error returned when I use a multiprocessing.Pool to return the instance back to the parent:
Thread getCcl (('DV20', 'LORR'),)
Process PoolWorker-1:
Traceback (most recent call last):
File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/pool.py", line 71, in worker
put((job, i, result))
File "/alma/ACS-10.1/casa/lib/python2.6/multiprocessing/queues.py", line 366, in put
return send(obj)
UnpickleableError: Cannot pickle <type 'thread.lock'> objects
In [5]: dir(lorr)
Out[5]:
['GET_AMBIENT_TEMPERATURE',
'GET_CAN_ERROR',
'GET_CAN_ERROR_COUNT',
'GET_CHANNEL_NUMBER',
'GET_COUNT_PER_C_OP',
'GET_COUNT_REMAINING_OP',
'GET_DCM_LOCKED',
'GET_EFC_125_MHZ',
'GET_EFC_COMB_LINE_PLL',
'GET_ERROR_CODE_LAST_CAN_ERROR',
'GET_INTERNAL_SLAVE_ERROR_CODE',
'GET_MAGNITUDE_CELSIUS_OP',
'GET_MAJOR_REV_LEVEL',
'GET_MINOR_REV_LEVEL',
'GET_MODULE_CODES_CDAY',
'GET_MODULE_CODES_CMONTH',
'GET_MODULE_CODES_DIG1',
'GET_MODULE_CODES_DIG2',
'GET_MODULE_CODES_DIG4',
'GET_MODULE_CODES_DIG6',
'GET_MODULE_CODES_SERIAL',
'GET_MODULE_CODES_VERSION_MAJOR',
'GET_MODULE_CODES_VERSION_MINOR',
'GET_MODULE_CODES_YEAR',
'GET_NODE_ADDRESS',
'GET_OPTICAL_POWER_OFF',
'GET_OUTPUT_125MHZ_LOCKED',
'GET_OUTPUT_2GHZ_LOCKED',
'GET_PATCH_LEVEL',
'GET_POWER_SUPPLY_12V_NOT_OK',
'GET_POWER_SUPPLY_15V_NOT_OK',
'GET_PROTOCOL_MAJOR_REV_LEVEL',
'GET_PROTOCOL_MINOR_REV_LEVEL',
'GET_PROTOCOL_PATCH_LEVEL',
'GET_PROTOCOL_REV_LEVEL',
'GET_PWR_125_MHZ',
'GET_PWR_25_MHZ',
'GET_PWR_2_GHZ',
'GET_READ_MODULE_CODES',
'GET_RX_OPT_PWR',
'GET_SERIAL_NUMBER',
'GET_SIGN_OP',
'GET_STATUS',
'GET_SW_REV_LEVEL',
'GET_TE_LENGTH',
'GET_TE_LONG_FLAG_SET',
'GET_TE_OFFSET_COUNTER',
'GET_TE_SHORT_FLAG_SET',
'GET_TRANS_NUM',
'GET_VDC_12',
'GET_VDC_15',
'GET_VDC_7',
'GET_VDC_MINUS_7',
'SET_CLEAR_FLAGS',
'SET_FPGA_LOGIC_RESET',
'SET_RESET_AMBSI',
'SET_RESET_DEVICE',
'SET_RESYNC_TE',
'STATUS',
'_HardwareDevice__componentName',
'_HardwareDevice__hw',
'_HardwareDevice__stickyFlag',
'_LORRBase__logger',
'__del__',
'__doc__',
'__init__',
'__module__',
'_devices',
'clearDeviceCommunicationErrorAlarm',
'getControlList',
'getDeviceCommunicationErrorCounter',
'getErrorMessage',
'getHwState',
'getInternalSlaveCanErrorMsg',
'getLastCanErrorMsg',
'getMonitorList',
'hwConfigure',
'hwDiagnostic',
'hwInitialize',
'hwOperational',
'hwSimulation',
'hwStart',
'hwStop',
'inErrorState',
'isMonitoring',
'isSimulated']

When you use multiprocessing to open a second process, an entirely new instance of Python, with its own global state, is created. That global state is not shared, so changes made by child processes to global variables will be invisible to the parent process.
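A minimal sketch of that effect (my own example, not from the question): the child's assignment to a module-level global is invisible to the parent, because each process works on its own copy.

import multiprocessing

counter = 0

def bump():
    global counter
    counter = 99
    print("in child: %d" % counter)   # prints 99 inside the child process

if __name__ == "__main__":
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    print("in parent: %d" % counter)  # still prints 0 in the parent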
Additionally, most of the abstractions that multiprocessing provides use pickle to transfer data. All data transferred using proxies must be pickleable; that includes all the objects that a Manager provides. Relevant quotations (my emphasis):
Ensure that the arguments to the methods of proxies are picklable.
And (in the Manager section):
Other processes can access the shared objects by using proxies.
Queues also require pickleable data; the docs don't say so, but a quick test confirms it:
import multiprocessing
import pickle

class Thing(object):
    def __getstate__(self):
        print 'got pickled'
        return self.__dict__

    def __setstate__(self, state):
        print 'got unpickled'
        self.__dict__.update(state)

q = multiprocessing.Queue()
p = multiprocessing.Process(target=q.put, args=(Thing(),))
p.start()
print q.get()
p.join()
Output:
$ python mp.py
got pickled
got unpickled
<__main__.Thing object at 0x10056b350>
The one approach that might work for you, if you really can't pickle the data, is to find a way to store it as a ctype object; a reference to the memory can then be passed to a child process. This seems pretty dodgy to me; I've never done it. But it might be a possible solution for you.
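For what it's worth, here is a rough sketch of the ctypes idea using multiprocessing.Value, which wraps a ctypes object in shared memory; it only helps for simple C types (ints, doubles, char arrays), not for a full LORR instance.

import multiprocessing

def worker(shared_val):
    shared_val.value = 42  # written directly into shared memory

if __name__ == "__main__":
    shared_val = multiprocessing.Value('i', 0)  # 'i' means a C int
    p = multiprocessing.Process(target=worker, args=(shared_val,))
    p.start()
    p.join()
    print(shared_val.value)  # prints 42 in the parent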
Given your update, it seems like you need to know a lot more about the internals of a LORR. Is LORR a class? Can you subclass from it? Is it a subclass of something else? What's its MRO? (Try LORR.__mro__ and post the output if it works.) If it's a pure python object, it might be possible to subclass it, creating a __setstate__ and a __getstate__ to enable pickling.
Another approach might be to figure out how to get the relevant data out of a LORR instance and pass it via a simple string. Since you say that you really just want to call the methods of the object, why not just do so using Queues to send messages back and forth? In other words, something like this (schematically):
Main Process              Child 1 (owns LORR 1)       Child 2 (owns LORR 2)

child1_in_queue  ->       get message 'foo'
                          call 'foo' method
child1_out_queue <-       return foo data string

child2_in_queue  ->                                   get message 'bar'
                                                      call 'bar' method
child2_out_queue <-                                   return bar data string
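Here is a rough sketch of that scheme (the LORR calls are placeholders, since I don't have CCL): each child owns its own LORR instance, and only plain strings ever travel through the queues, so nothing unpicklable is transferred.

import multiprocessing

def lorr_worker(antenna, in_queue, out_queue):
    # lorr = CCL.LORR.LORR(antenna, None)  # the real, unpicklable object lives only here
    while True:
        method_name = in_queue.get()
        if method_name is None:
            break
        # result = str(getattr(lorr, method_name)())  # the real call would go here
        result = "%s result for %s" % (method_name, antenna)  # placeholder string
        out_queue.put(result)

if __name__ == "__main__":
    in_q, out_q = multiprocessing.Queue(), multiprocessing.Queue()
    p = multiprocessing.Process(target=lorr_worker, args=('DV03', in_q, out_q))
    p.start()
    in_q.put('GET_STATUS')  # send a message naming the method to call
    print(out_q.get())      # a plain string comes back through the queue
    in_q.put(None)          # tell the worker to shut down
    p.join()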

@DBlas gives you a quick URL and reference to the Manager class in an answer, but I think it's still a bit vague, so I thought it might be helpful for you to just see it applied...
import multiprocessing
from multiprocessing import Manager

ants = ['DV03', 'DV04']

def getDV03CclDrivers(lib, data_list):
    data_list[1] = 1
    data_list[0] = 0

def getDV04CclDrivers(lib, data_dict):
    data_dict['driver'] = 0

if __name__ == "__main__":

    manager = Manager()
    dataDV03 = manager.list(['', ''])
    dataDV04 = manager.dict({'driver': '', 'status': ''})

    jobs = []
    if 'DV03' in ants:
        j = multiprocessing.Process(
            target=getDV03CclDrivers,
            args=('LORR', dataDV03))
        jobs.append(j)

    if 'DV04' in ants:
        j = multiprocessing.Process(
            target=getDV04CclDrivers,
            args=('LORR', dataDV04))
        jobs.append(j)

    for j in jobs:
        j.start()

    for j in jobs:
        j.join()

    print 'Results:\n'
    print 'DV03', dataDV03
    print 'DV04', dataDV04
Because multiprocessing actually uses separate processes, you cannot simply share global variables; they live in completely different "spaces" in memory. What you do to a global in one process will not be reflected in another. I admit it seems confusing, since the way you see it, it's all living right there in the same piece of code, so "why shouldn't those methods have access to the global?" It's harder to wrap your head around the idea that they will be running in different processes.
The Manager class is provided to act as a proxy for data structures that can shuttle info back and forth for you between processes. What you will do is create a special dict and list from a manager, pass them into your methods, and operate on them locally.
Un-pickle-able data
For your specialized LORR object, you might need to create something like a proxy that can represent the picklable state of the instance.
Not super robust or tested much, but gives you the idea.
class LORRProxy(object):

    def __init__(self, lorrObject=None):
        self.instance = lorrObject

    def __getstate__(self):
        # how to get the state data out of a lorr instance
        inst = self.instance
        state = dict(
            foo = inst.a,
            bar = inst.b,
        )
        return state

    def __setstate__(self, state):
        # rebuild a lorr instance from state
        lorr = LORR.LORR()
        lorr.a = state['foo']
        lorr.b = state['bar']
        self.instance = lorr
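Hypothetical usage of such a proxy (assuming LORR really does expose a and b attributes as above): wrap the live instance, and pickling the wrapper exercises __getstate__ and __setstate__, so a fresh LORR gets rebuilt on the other side.

import pickle

proxy = LORRProxy(lorr)
data = pickle.dumps(proxy)     # calls __getstate__ on the wrapper
restored = pickle.loads(data)  # calls __setstate__, rebuilding a fresh LORR
print restored.instance        # a new LORR carrying the captured a and b values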

When using multiprocessing, the only way to pass objects between processes is to use Queue or Pipe; globals are not shared. Objects must be pickleable, so multiprocessing won't help you here.
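A quick way to see that pickling requirement in action (my own check, analogous to the Queue test above): Pipe also pickles whatever you send, so a non-picklable object such as a lambda fails.

import multiprocessing

def child(conn):
    conn.send("plain strings are fine")  # picklable, goes through
    try:
        conn.send(lambda: None)          # lambdas cannot be pickled
    except Exception as e:
        conn.send("send failed: %s" % e)
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=child, args=(child_conn,))
    p.start()
    print(parent_conn.recv())
    print(parent_conn.recv())
    p.join()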

You could also use a multiprocessing Array. This allows you to have a shared state between processes and is probably the closest thing to a global variable.
At the top of main, declare an Array. The first argument 'i' says it will be integers. The second argument gives the initial values:
shared_dataDV03 = multiprocessing.Array('i', (0, 0))  # a shared array
Then pass this array to the process as an argument:
j = multiprocessing.Process(target=getDV03CclDrivers, args=('LORR',shared_dataDV03))
You have to receive the array argument in the function being called, and then you can modify it within the function:
def getDV03CclDrivers(lib, arr):  # receive the shared array as an argument
    arr[1] = 1
    arr[0] = 0
The array is shared with the parent, so you can print out the values at the end in the parent:
print 'DV03', shared_dataDV03[:]
And it will show the changes:
DV03 [0, 1]

I use p.map() to spin off a number of processes to remote servers and print the results when they come back at unpredictable times:
Servers=[...]
from multiprocessing import Pool
p=Pool(len(Servers))
p.map(DoIndividualSummary, Servers)
This worked fine if DoIndividualSummary used print for the results, but the overall result was in unpredictable order, which made interpretation difficult. I tried a number of approaches to use global variables but ran into problems. Finally, I succeeded with sqlite3.
Before p.map(), open a sqlite connection and create a table:
import sqlite3
conn=sqlite3.connect('servers.db') # need conn for commit and close
db=conn.cursor()
try: db.execute('''drop table servers''')
except: pass
db.execute('''CREATE TABLE servers (server text, serverdetail text, readings text)''')
conn.commit()
Then, when returning from DoIndividualSummary(), save the results into the table:
db.execute('''INSERT INTO servers VALUES (?,?,?)''', (server,serverdetail,readings))
conn.commit()
return
After the map() statement, print the results:
db.execute('''select * from servers order by server''')
rows=db.fetchall()
for server,serverdetail,readings in rows: print serverdetail,readings
May seem like overkill but it was simpler for me than the recommended solutions.
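For what it's worth, Pool.map itself collects the workers' return values in the order of the input list, so another option (assuming DoIndividualSummary can be changed to return, say, a (serverdetail, readings) tuple instead of printing) would be:

results = p.map(DoIndividualSummary, Servers)  # results come back in the order of Servers
for serverdetail, readings in results:
    print serverdetail, readings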

Related

Is there a way to sync a serializable structure with python multiprocessing?

If you create a new Process in python, it will serialize and copy the entire available scope, as far as I understand it. If you use multiprocessing.Pipe() it also allows sending various things, not just raw bytes.
However, instead of sending, I simply want to update a variable that contains a simple POD object like this:
class MyStats:
    def __init__(self):
        self.bytes_read = 0
        self.bytes_written = 0
So say that in a process, when I update these stats, I want to tell Python to serialize them and send them to the parent process somehow. I don't want to have to create a multiprocessing.Value for each and every one of these things; that sounds super tedious.
Is there a way to tell python to pass and overwrite a specific object property somehow?
A manager is what you need here: it will be slower, but all data stored inside it will be automatically synced with other processes. Here is a simple example:
from multiprocessing.managers import BaseManager, public_methods, NamespaceProxy
from multiprocessing import Process

def make_proxy(name, cls, base=None):
    """
    Args:
        name : A string that should match the variable name the proxy will be assigned to
        cls : The class you want to create a proxy for
        base : If you are subclassing NamespaceProxy (or any other implementation) and want to use that subclass as the
               base for this new proxy, then pass the subclass as the base using this argument
    """
    exposed = public_methods(cls) + ['__getattribute__', '__setattr__', '__delattr__']
    return _MakeProxyType(name, exposed, base)

def _MakeProxyType(name, exposed, base=None):
    """
    Attempts to replicate multiprocessing.managers.MakeProxyType properly
    """
    if base is None:
        base = NamespaceProxy
    exposed = tuple(exposed)
    dic = {}
    for meth in exposed:
        if hasattr(base, meth):
            continue
        exec('''def %s(self, *args, **kwds):
            return self._callmethod(%r, args, kwds)''' % (meth, meth), dic)
    ProxyType = type(name, (base,), dic)
    ProxyType._exposed_ = exposed
    return ProxyType

class MyStats:
    def __init__(self):
        self.bytes_read = 0
        self.bytes_written = 0

def worker(my_stats):
    my_stats.bytes_read = 100
    print("Worker process read 100 bytes!")

# Remember to set the name of the variable and the "name" argument to the same value otherwise you will have trouble
# pickling this. If for some reason you cannot do this then you must change the variable's __qualname__ property to
# reflect where the object actually resides so pickle can find it.
MyStatsProxy = make_proxy('MyStatsProxy', MyStats)

if __name__ == "__main__":
    # Register our proxy and start the manager process
    BaseManager.register("MyStats", MyStats, MyStatsProxy)
    manager = BaseManager()
    manager.start()

    # Create our shared instance and modify it from another process
    my_stats = manager.MyStats()
    p = Process(target=worker, args=(my_stats,))
    p.start()
    p.join()

    # Check value from main process
    print(f"In main process, bytes read are {my_stats.bytes_read}!")
Output
Worker process read 100 bytes!
In main process, bytes read are 100!
Check this question and its answers for more useful information about managers, registering classes, and alternate methods to achieve the same result.
Note: Keep in mind that managers return pickled values for all objects you access through them, so any modification to a mutable object should be done from within an instance method rather than by requesting the mutable object through the proxy and modifying it from outside. For example, the following will not modify the attribute some_list in the manager at all; only the local copy (local to the process) of this attribute will be modified:
my_stats.some_list[0] = "some value"
Instead, you should create an instance method for modifications and call that instead:
my_stats.modify_list(0, "some value")
Alternatively, you can also force the manager to update the mutable object by re-assigning the new value for the object:
local_copy = my_stats.some_list
local_copy[0] = "some value"
my_stats.some_list = local_copy
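A small hypothetical extension of MyStats showing that pattern (some_list and modify_list are illustrative additions, not part of the original class): the mutation happens inside an instance method, so it runs in the manager process and the shared copy is what actually changes.

class MyStats:
    def __init__(self):
        self.bytes_read = 0
        self.bytes_written = 0
        self.some_list = ["initial"]  # hypothetical mutable attribute

    def modify_list(self, index, value):
        # Runs inside the manager process, so the shared list is modified,
        # not a pickled local copy in the caller.
        self.some_list[index] = value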

Python multiprocessing.Pool.apply_async() not executing class function

In a custom class I have the following code:
class CustomClass():
    triggerQueue: multiprocessing.Queue

    def __init__(self):
        self.triggerQueue = multiprocessing.Queue()

    def poolFunc(queueString):
        print(queueString)

    def listenerFunc(self):
        pool = multiprocessing.Pool(5)
        while True:
            try:
                queueString = self.triggerQueue.get_nowait()
                pool.apply_async(func=self.poolFunc, args=(queueString,))
            except queue.Empty:
                break
What I intend to do is:
add a trigger to the queue (not implemented in this snippet) -> works as intended
run an endless loop within the listenerFunc that reads all triggers from the queue (if any are found) -> works as intended
pass trigger to poolFunc which is to be executed asynchronously -> not working
It works as soon as I define my poolFunc() outside of the class, like:
def poolFunc(queueString):
    print(queueString)

class CustomClass():
    [...]
But why is that so? Do I have to pass the self argument somehow? Is it impossible to perform it this way in general?
Thank you for any hint!
There are several problems going on here.
Your instance method, poolFunc, is missing a self parameter.
You are never properly terminating the Pool. You should take advantage of the fact that a multiprocessing.Pool object is a context manager.
You're calling apply_async, but you're never waiting for the results. Read the documentation: you need to call the get method on the AsyncResult object to receive the result; if you don't do this before your program exits, your poolFunc function may never run.
By making the Queue object part of your class, you won't be able to pass instance methods to workers.
We can fix all of the above like this:
import multiprocessing
import queue

triggerQueue = multiprocessing.Queue()

class CustomClass:
    def poolFunc(self, queueString):
        print(queueString)

    def listenerFunc(self):
        results = []
        with multiprocessing.Pool(5) as pool:
            while True:
                try:
                    queueString = triggerQueue.get_nowait()
                    results.append(pool.apply_async(self.poolFunc, (queueString,)))
                except queue.Empty:
                    break
            for res in results:
                print(res.get())

c = CustomClass()
for i in range(10):
    triggerQueue.put(f"testval{i}")
c.listenerFunc()
You can, as you mention, also replace your instance method with a static method, in which case we can keep triggerQueue as part of the class:
import multiprocessing
import queue

class CustomClass:
    def __init__(self):
        self.triggerQueue = multiprocessing.Queue()

    @staticmethod
    def poolFunc(queueString):
        print(queueString)

    def listenerFunc(self):
        results = []
        with multiprocessing.Pool(5) as pool:
            while True:
                try:
                    queueString = self.triggerQueue.get_nowait()
                    results.append(pool.apply_async(self.poolFunc, (queueString,)))
                except queue.Empty:
                    break
            for r in results:
                print(r.get())

c = CustomClass()
for i in range(10):
    c.triggerQueue.put(f"testval{i}")
c.listenerFunc()
But we still need to reap the apply_async results.
Okay, I found an answer and a workaround:
The answer is based on the answer of noxdafox to this question.
Instance methods cannot be serialized that easily. What the pickle protocol does when serialising a function is simply turning it into a string.
For a child process it would be quite hard to find the right object your instance method is referring to, due to separate process address spaces.
A functioning workaround is to declare poolFunc() as a static method, like:
@staticmethod
def poolFunc(queueString):
    print(queueString)

Use threads to speed up slow initialization of context manager

I am consuming from a python2.7 module which I cannot easily change. This module has a factory for a class that is a context manager. This works great for managing the lifetime of the object, but initializing this class involves waiting on creation of a cloud based resource and it can take many minutes to run. Within the scope of my program I need to create two of the objects such as the following:
import immutable_module

cloud_resource_name1 = "load_ai_model"
cloud_resource_name2 = "create_input_output_model"

with immutable_module.create_remote_cloud_resource(cloud_resource_name1) as a, \
     immutable_module.create_remote_cloud_resource(cloud_resource_name2) as b:
    result = __do_imporant_thing(a, b)

print(result)
Is there a way to call these two context managers concurrently using threads to speed up the load time?
After doing more research I found a solution, but it may not be the best answer. The solution relies on calling __enter__ and __exit__ manually on the context managers.
import sys
from threading import Thread

import immutable_module

cloud_resource_name1 = "load_ai_model"
cloud_resource_name2 = "create_input_output_model"

a_context = immutable_module.create_remote_cloud_resource(cloud_resource_name1)
b_context = immutable_module.create_remote_cloud_resource(cloud_resource_name2)
a = None
b = None

try:
    def __set_a():
        global a
        a = a_context.__enter__()

    def __set_b():
        global b
        b = b_context.__enter__()

    a_thread = Thread(target=__set_a)
    b_thread = Thread(target=__set_b)
    a_thread.start()
    b_thread.start()
    a_thread.join()
    b_thread.join()

    result = __do_imporant_thing(a, b)
finally:
    try:
        if a is not None:
            a_context.__exit__(*sys.exc_info())
    finally:
        if b is not None:
            b_context.__exit__(*sys.exc_info())

print(result)
This is really dirty, not generic at all, and there are potential issues with exception handling, but it works.
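A slightly tidier variant of the same manual __enter__/__exit__ trick (my own sketch, with the same caveats about exception handling) uses a thread pool instead of hand-rolled threads:

import sys
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed pool

contexts = [
    immutable_module.create_remote_cloud_resource(cloud_resource_name1),
    immutable_module.create_remote_cloud_resource(cloud_resource_name2),
]
pool = ThreadPool(len(contexts))
try:
    a, b = pool.map(lambda ctx: ctx.__enter__(), contexts)  # enter both concurrently
    result = __do_imporant_thing(a, b)
finally:
    pool.close()
    for ctx in contexts:
        ctx.__exit__(*sys.exc_info())  # caveat: exits both even if one never entered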

Passing object references in a queue across processes

I have several multiprocessing.Processes and would like them to consume (queue get()) callable non-picklable objects and call them. These were created before the fork(), so they shouldn't need pickling.
Using multiprocessing.Queue doesn't work as it tries to pickle everything:
import multiprocessing as mp

# create non-global callable to make it unpicklable
def make_callable():
    def foo():
        print("running foo")
    return foo

def bar():
    print("running bar")

def runall(q):
    while True:
        c = q.get()
        if c is None:
            break
        c()

if __name__ == '__main__':
    q = mp.Queue()
    call = make_callable()

    p = mp.Process(target=runall, args=(q,))
    p.start()

    q.put(bar)
    q.put(call)
    q.put(None)
    p.join()
running bar
Traceback (most recent call last):
File "/usr/lib64/python3.7/multiprocessing/queues.py", line 236, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib64/python3.7/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'make_callable.<locals>.foo'
An equivalent implementation would be to put all the objects into a global (or passed) list and pass just the indexes, which works:
import multiprocessing as mp

# create non-global callable to make it unpicklable
def make_callable():
    def foo():
        print("running foo")
    return foo

def bar():
    print("running bar")

def runall(q, everything):
    while True:
        c = q.get()
        if c is None:
            break
        everything[c]()

if __name__ == '__main__':
    q = mp.Queue()
    call = make_callable()
    everything = [bar, call]

    p = mp.Process(target=runall, args=(q, everything))
    p.start()

    q.put(0)
    q.put(1)
    q.put(None)
    p.join()
running bar
running foo
The problem is that while I know that none of the callables passed will be garbage collected (and thus their addresses will stay valid), I do not have the full list beforehand.
I also know I could probably use multiprocessing.Manager and its Queue implementation using a Proxy object, but this seems like a lot of overhead, especially as in the real implementation I would be passing other picklable data as well.
Is there a way to pickle and pass only the address reference to an object, shared across multiple processes?
Thanks!
It is true that objects passed to a Process must be picklable.
Note that functions (built-in and user-defined) are pickled by “fully qualified” name reference, not by value. This means that only the function name is pickled, along with the name of the module the function is defined in. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised.
Picklable functions and classes must be defined in the top level of a module.
So in your case you need to proceed with passing top-level callables but applying additional checks/workarounds in the crucial runall function:
import multiprocessing as mp

# create non-global callable to make it unpicklable
def make_callable():
    def foo():
        print("running foo")
    return foo

def bar():
    print("running bar")

def runall(q):
    while True:
        c = q.get()
        if c is None:
            break
        res = c()
        if callable(res):
            res()

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=runall, args=(q,))
    p.start()

    q.put(bar)
    q.put(make_callable)
    q.put(None)
    p.join()
    q.close()
The output:
running bar
running foo
After a bit of thinking and searching, I believe I have the answer I was looking for, mostly from: Get object by id()?.
I could pass an id() of the callable and then translate it back in the spawned process:
import ctypes

a = "hello world"
print(ctypes.cast(id(a), ctypes.py_object).value)
Or use the gc module and, as long as I keep a reference to the object alive, that should work too:
import gc

def objects_by_id(id_):
    for obj in gc.get_objects():
        if id(obj) == id_:
            return obj
    raise Exception("Not found")
However neither of these are very clean and, in the end, it may be worth imposing a limitation of having all the callables first and just passing indexes.

Problems with session ids on parallel threads

So I am trying to use multiprocessing to iterate concurrently through files in separate folders. I have a function that calls the parallel process:
from multiprocessing.dummy import Pool

lsFolders = ['Folder1', 'Folder2']
pool = Pool(processes=6)

iterateThroughFiles = IterateThroughFiles()  # instantiated by call to pool.map()
pool.map(iterateThroughFiles.runProcess, lsFolders)
Then I have the implementation of the IterateThroughFiles-class:
class IterateThroughFiles(object):

    def runProcess(self, folder):
        self.sessionId = uuid.uuid4()
        print(self.sessionId)  # Prints a correct sessionId
        logAtLevel("INFO",
                   "Session ID of: "
                   + str(self.sessionId)
                   + " has been generated for folder: "
                   + folder
                   )
        print(self.sessionId)  # Prints only the second generated
                               # session id for both threads
        print(folder)          # Prints the correct folder
When I generate the sessionId and print it directly after, the sessionId is correct, additionally the logAtLevel() wrapper function logs the correct value of the sessionId.
The next print statement, though, prints only the second session id and apparently the first sessionId is forgotten in the thread.
Does anyone know why this is happening? I thought when running in parallel each thread was distinct in terms of the objects it created and its memory? Is this incorrect? Does this have something to do with the uuid generator?
The issue is that you are only generating one instance of IterateThroughFiles which is being used in both threads.
Instead, you want something like the following
def factory(folder):
    return IterateThroughFiles().runProcess(folder)
and pass that factory function into map.
That way you will get two instances.
pool.map(iterateThroughFiles.runProcess, lsFolders)
In this line you are calling runProcess many times on a single instance of the class IterateThroughFiles. If you are thinking of each instance as a session, you need to instantiate a new object for each folder in lsFolders.
from multiprocessing.dummy import Pool

lsFolders = ['Folder1', 'Folder2']
pool = Pool(processes=6)

def worker(folder):
    p = IterateThroughFiles()
    p.runProcess(folder)

pool.map(worker, lsFolders)
This way, the worker function creates a NEW instance of IterateThroughFiles for each folder, and that way in the runProcess function, self refers to that individual instance, rather than re-using the same instance for each folder.
