Python sharing multiprocessing.Manager.list with child processes

My question is quite simple: I want to share a multiprocessing.Manager().dict() with child processes, but the shared dict will only be initialized AFTER the children have started. Here is the example code:
import multiprocessing
import time


class Singleton:
    _instance = None
    _lock = multiprocessing.Lock()
    dns_list = multiprocessing.Manager().dict()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
        return cls._instance


def worker1(singleton_object):
    i = 0
    while i in range(25):
        i += 1
        print(singleton_object.dns_list)
        time.sleep(1)


singleton = Singleton()
p1 = multiprocessing.Process(target=worker1, args=(singleton,))
p1.start()
singleton.dns_list = {'1.1.1.1': {'status': True}, '2.2.2.2': {'status': True}}
In my real code, I have multiple processes running. One of them changes the dns_list, but the other ones don't receive the updated list. I've tried to use an Event, but that didn't work either. What I need to see is the printed variable changing from {} to {'1.1.1.1': ...}. If I can get this simple code running, I can adapt it to my own code :D
Thanks for every comment.

Thanks to Spencer Pollock for showing the solution. In my code I was updating the dict in the wrong way.
dns_list = multiprocessing.Manager().dict()

# This is wrong because it rebinds dns_list to a plain dict(),
# replacing the managed proxy object:
dns_list = {'1.1.1.1': {'status': True}, '2.2.2.2': {'status': True}}

# This is the correct way to change a multiprocessing.Manager().dict() object:
dns_list.update({'1.1.1.1': {'status': True}, '2.2.2.2': {'status': True}})

# To clear the dict:
dns_list.clear()
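For reference, here is a minimal runnable sketch of the corrected pattern (the Singleton wrapper from the question is dropped for brevity; the dict proxy is simply passed to the worker):

import multiprocessing
import time

def worker1(dns_list):
    # The DictProxy reflects updates made by the parent process.
    for _ in range(5):
        print(dns_list)
        time.sleep(1)

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    dns_list = manager.dict()  # create the proxy once, before starting children
    p1 = multiprocessing.Process(target=worker1, args=(dns_list,))
    p1.start()
    # update() mutates the managed dict instead of rebinding the name
    dns_list.update({'1.1.1.1': {'status': True}, '2.2.2.2': {'status': True}})
    p1.join()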

Related

Why Does my Python Multiprocessing Process Subclass Return as None When Instantiated?

My goal is to create a subclass of multiprocessing.Process that executes tasks fed into a multiprocessing.Queue as objects; after a task is completed, it sets a task attribute to indicate success and passes the task object on to another queue that handles responses. Later I plan to instantiate this custom class multiple times to have the different tasks completed faster.
My code is as follows:
import multiprocessing as mp
import time


# Define task-object that should be passed through queues
class Task(object):
    def __init__(self, task_type, detail=None, error=None):
        self.type = task_type
        self.detail = detail
        self.error = error


# Define how to handle a task - simulating for now, to see that everything is passed around properly
def handle_task(task):
    if task.type == 'UPDATE':
        task.detail = 'updating data'
    elif task.type == 'ACTUALIZE':
        task.detail = 'actualizing knowledge'
    else:
        task.detail = 'UNKNOWN TASK TYPE'
        task.error = True
    # if no errors by now, assume success
    if task.error == None:
        task.error = False
    return task


# Define worker process that executes task handling
class Task_handler(mp.Process):
    def __init__(self, task_queue, response_queue):
        mp.Process.__init__(self)
        self.task_queue = task_queue
        self.response_queue = response_queue
        self.keep_going = True

    def run(self):
        while self.keep_going:
            task = self.task_queue.get()
            if task.type == 'TERMINATE':
                self.keep_going = False
                self.detail = self.name
                self.error = False
            elif task.type == 'STATUS':
                task.detail = self.name
                task.error = False
            else:
                task = handle_task(task)
            self.response_queue.put(task)


if __name__ == '__main__':
    task_queue = mp.Queue()
    response_queue = mp.Queue()
    t = Task_handler(task_queue, response_queue)
    t.start()
    task_queue.put(Task('STATUS'))
    task_queue.put(Task('TERMINATE'))
    t.join()
    while not response_queue.empty():
        task = response_queue.get()
        print('{} {}, error {}'.format(task.type, task.detail, task.error))
When I run my code with Python 3.7.3 on Windows 10 it runs fine, but with Python 3.6.9 on Linux it gets stuck, and I don't understand why that is the case. I would also appreciate hints on how to do this most efficiently, as I have received no formal training in programming and am likely not aware of all the "dos and don'ts".
Thank you in advance.

Updating member variable of object while using multiprocessing pool

I have a class B which is composed of another class A.
In class B I am using a multiprocessing pool to call a method of class A. This method updates a member variable of A (which is a dict).
When I print out this member variable, it doesn't seem to have been updated. Here is the code describing the issue:
import multiprocessing as mp


class A():
    def __init__(self):
        self.aDict = {'key': 0}

    def set_lock(self, lock):
        self.lock = lock

    def do_work(self, item):
        print("Doing work for item: {}".format(item))
        self.aDict['key'] += 1
        return [1, 2, 3]  # return some list


class B():
    def __init__(self):
        self.objA = A()

    def run_with_mp(self):
        items = ['item1', 'item2']
        with mp.Pool(processes=mp.cpu_count()) as pool:
            result = pool.map_async(self.objA.do_work, items)
            result.wait()
            pool.terminate()
        print(self.objA.aDict)

    def run(self):
        items = ['item1', 'item2']
        for item in items:
            self.objA.do_work(item)
        print(self.objA.aDict)


if __name__ == "__main__":
    b = B()
    b.run_with_mp()  # prints {'key': 0}
    b.run()  # prints {'key': 2}
b.run_with_mp() prints {'key': 0} while b.run() prints {'key': 2}. I thought the multiprocessing pool version would do the same, since the object self.objA is in scope for the whole of class B, where the multiprocessing pool runs.
I think each worker in the pool sees a different version of self.objA, all of which differ from the one in the main program flow. Is there a way to make all the workers update a common variable?
You are close to the explanation: indeed, each spawned process holds its own area of memory, which means they are independent. When you run do_work, each process updates its own version of aDict because that variable is not shared. If you want to share a variable, the easiest way is to use a Manager, for example:
import multiprocessing as mp


class A():
    def __init__(self):
        self.aDict = mp.Manager().dict({'key': 0})

    def set_lock(self, lock):
        self.lock = lock

    def do_work(self, item):
        print("Doing work for item: {}".format(item))
        self.aDict['key'] += 1
        return [1, 2, 3]  # return some list


class B():
    def __init__(self):
        self.objA = A()

    def run_with_mp(self):
        items = ['item1', 'item2']
        with mp.Pool(processes=mp.cpu_count()) as pool:
            result = pool.map_async(self.objA.do_work, items)
            result.wait()
            pool.terminate()
        print(self.objA.aDict)

    def run(self):
        items = ['item1', 'item2']
        for item in items:
            self.objA.do_work(item)
        print(self.objA.aDict)


if __name__ == "__main__":
    b = B()
    b.run_with_mp()  # prints {'key': 2}
    b.run()  # prints {'key': 4}
I modified your example to share the aDict variable, so each process will update that property (in both the run_with_mp and run methods). Consider reading more in the docs.

python thread pool copy parameters

I'm learning about multithreading and am trying to implement a few things to understand it.
After reading several (and very technical) topics, I cannot find a solution or a way to understand my issue.
Basically, I have the following structure:
import datetime
from multiprocessing import Pool  # assuming multiprocessing.Pool, which matches the copy behaviour described below


class MyObject():
    def __init__(self):
        self.lastupdate = datetime.datetime.now()

    def DoThings(self):
        ...


def MyThreadFunction(OneOfMyObject):
    OneOfMyObject.DoThings()
    OneOfMyObject.lastupdate = datetime.datetime.now()


def main():
    MyObject1 = MyObject()
    MyObject2 = MyObject()
    MyObjects = [MyObject1, MyObject2]
    pool = Pool(2)
    while True:
        pool.map(MyThreadFunction, MyObjects)


if __name__ == '__main__':
    main()
I think the function .map makes a copy of my objects, because it does not update the time. Is that right? If yes, how could I pass in a global version of my objects? If not, do you have any idea why the time stays fixed in my objects?
When I check the new time with print(MyObject.lastupdate), the time is right, but not in the next loop.
Thank you very much for any of your ideas.
Yes, Python multiprocessing will serialize (actually, pickle) your objects and then reconstruct them in the worker process. However, it also sends them back. To recover them, see the commented additions to the code below:
class MyObject():
    def __init__(self):
        self.lastupdate = datetime.datetime.now()

    def DoThings(self):
        ...


def MyThreadFunction(OneOfMyObject):
    OneOfMyObject.DoThings()
    OneOfMyObject.lastupdate = datetime.datetime.now()
    # NOW, RETURN THE OBJECT
    return OneOfMyObject


def main():
    MyObject1 = MyObject()
    MyObject2 = MyObject()
    MyObjects = [MyObject1, MyObject2]
    with Pool(2) as pool:  # <- this is just a neater way of doing it than a while loop for various reasons. Check out context managers if interested.
        # Now we recover a list of the updated objects:
        processed_object_list = pool.map(MyThreadFunction, MyObjects)
    # Now inspect
    for my_object in processed_object_list:
        print(my_object.lastupdate)


if __name__ == '__main__':
    main()
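As a side note on the original while True loop: if that loop is kept, the returned copies have to be fed back in on every pass, since the pool pickles whatever is currently in MyObjects. A rough sketch of that variant, reusing the definitions above:

def main():
    MyObjects = [MyObject(), MyObject()]
    with Pool(2) as pool:
        while True:
            # Feed the returned copies back in; otherwise each pass pickles the
            # stale originals again and lastupdate never appears to advance.
            MyObjects = pool.map(MyThreadFunction, MyObjects)
            for my_object in MyObjects:
                print(my_object.lastupdate)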

QThread fails to use a QTimer properly

In my application, I have a custom QThread responsible for communicating with the backend, and I call a utility function with a URL and data from the run() method:
class SomeThread(QtCore.QThread):
    def __init__(self, parent=None...):
        QtCore.QThread.__init__(self, parent)
        ...

    def run(self):
        final_desired_content = some_utility_method(url, data ...)
        # emitting success with final_desired_content
In the utility method(s), I'm making an HTTP POST, getting a response back, parsing the response, and eventually passing the desired information to the thread variable final_desired_content above. Before I pass the information back, I parse some more information which I don't want to return, and would like to store in a SomeClass singleton instance:
def some_utility_method( ... ):
    ...
    return response_parsing(response)


def response_parsing(response):
    ...
    some_file.SomeClass.instance().setNewData(otherData)
    return mainParsedData
Because there may be multiple threads contacting the BE within a few seconds (specifically during application start), I would like to prevent data from being written again before <some_time> has passed (it is OK that the data we ignore is thrown away):
class SomeClass(QtCore.QObject):
    _instance = None

    @classmethod
    def instance(klass):
        if not klass._instance:
            klass._instance = SomeClass()
        return klass._instance

    def __init__(self):
        QtCore.QObject.__init__(self)
        self._recentlyUpdatedTimer = QtCore.QTimer()
        self._recentlyUpdatedTimer.setSingleShot(True)
        self._recentlyUpdatedTimer.timeout.connect(self._setOkToUpdateCB)
        self._storedData = None
        self._allowUpdate = True

    def _setOkToUpdateCB(self):
        self._allowUpdate = True

    def setNewData(self, newData):
        if self._allowUpdate:
            print "UPDATING!"
            self._allowUpdate = False
            self._storedData = newData
            self._recentlyUpdatedTimer.start(<some_time>)
        else:
            print "BLOCKED!"  # ok to ignore newData
The problem is that this successfully updates once; then, when the second update comes in, I get this error: QObject::startTimer: Timers cannot be started from another thread
From what I know and have read about threads, run() in the QThread executes in another thread, which might not know what has been happening in the main thread.
Debugging, it appears that the timer is still running, even though it is set to singleShot.
I would appreciate any suggestions :)
I resolved this by not using a timer.
What I ended up doing was:
import time

class SomeClass(QtCore.QObject):
    ...
    DATA_EXPIRE_THRESHOLD = 3
    ...

    def __init__(self):
        ...
        self._counter = 0  # Just for debugging
        self._dataExpireTime = None
        self._data = None

    def setNewData(self, new_data):
        self._counter += 1
        if self._dataExpireTime is None or self._isExpired():
            print "self._isExpired(): [%s], counter: [%s]" % (self._isExpired(), self._counter)
            self._data = new_data
            self._dataExpireTime = time.time() + self.DATA_EXPIRE_THRESHOLD

    def _isExpired(self):
        return time.time() >= self._dataExpireTime
With the threshold set to three seconds, the output is:
self._isExpired(): [True], counter: [1]
self._isExpired(): [True], counter: [2]
self._isExpired(): [True], counter: [6]
self._isExpired(): [True], counter: [15]
self._isExpired(): [True], counter: [17]
self._isExpired(): [True], counter: [18]
I also tried using a mutex, but it had a similar issue to the timer.
I would still appreciate an explanation, or advice on dealing with such issues.
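Not from the original post, but since the root cause is that a QTimer may only be started from the thread it lives in, one possible direction is to route setNewData through a signal. Assuming the SomeClass singleton is created in the main GUI thread (so it has main-thread affinity), the default auto connection queues the slot into that thread, and the timer can then be started there safely. A sketch:

class SomeClass(QtCore.QObject):
    _newData = QtCore.pyqtSignal(object)  # hypothetical signal, not in the original code

    def __init__(self):
        QtCore.QObject.__init__(self)
        self._recentlyUpdatedTimer = QtCore.QTimer()
        self._recentlyUpdatedTimer.setSingleShot(True)
        self._recentlyUpdatedTimer.timeout.connect(self._setOkToUpdateCB)
        self._storedData = None
        self._allowUpdate = True
        # Auto connection: when emitted from a worker thread, the slot is
        # queued into this object's own (main) thread.
        self._newData.connect(self._applyNewData)

    def setNewData(self, newData):
        self._newData.emit(newData)  # safe to call from any thread

    def _applyNewData(self, newData):
        if self._allowUpdate:
            self._allowUpdate = False
            self._storedData = newData
            self._recentlyUpdatedTimer.start(3000)  # placeholder interval

    def _setOkToUpdateCB(self):
        self._allowUpdate = True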

Shared variable in Python Process subclass

I was wondering if it would be possible to create some sort of static set in a Python Process subclass to keep track of the types of processes that are currently running asynchronously.
import winsound
import win32api
from multiprocessing import Process


class showError(Process):
    # Define some form of shared set that is shared by all Processes
    displayed_errors = set()

    def __init__(self, file_name, error_type):
        super(showError, self).__init__()
        self.error_type = error_type

    def run(self):
        if self.error_type not in self.displayed_errors:
            self.displayed_errors.add(self.error_type)
            message = 'Please try again. ' + str(self.error_type)
            winsound.MessageBeep(-1)
            result = win32api.MessageBox(0, message, 'Error', 0x00001000)
            if result == 0:
                self.displayed_errors.discard(self.error_type)
That way, when I create/start multiple showError processes with the same error_type, subsequent error windows will not be created. So how can we define this shared set?
You can use a multiprocessing.Manager.dict (there's no set object available, but you can use a dict in the same way) and share that between all your subprocesses.
import multiprocessing as mp

if __name__ == "__main__":
    m = mp.Manager()
    displayed_errors = m.dict()
    subp = showError("some filename", "some error type", displayed_errors)
Then change showError.__init__ to accept the shared dict:
def __init__(self, file_name, error_type, displayed_errors):
    super(showError, self).__init__()
    self.error_type = error_type
    self.displayed_errors = displayed_errors
Then this:
displayed_errors.add(error_type)
Becomes:
self.displayed_errors[error_type] = 1
And this:
displayed_errors.discard(error_type)
Becomes:
try:
    del self.displayed_errors[error_type]
except KeyError:
    pass
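Put together, a minimal self-contained sketch of how the pieces above might fit (the winsound/win32api calls are replaced by a print so it runs anywhere; names follow the question):

import multiprocessing as mp

class showError(mp.Process):
    def __init__(self, file_name, error_type, displayed_errors):
        super(showError, self).__init__()
        self.error_type = error_type
        self.displayed_errors = displayed_errors  # shared Manager dict used as a set

    def run(self):
        if self.error_type not in self.displayed_errors:
            self.displayed_errors[self.error_type] = 1
            print('Please try again. ' + str(self.error_type))  # stand-in for the MessageBox
            try:
                del self.displayed_errors[self.error_type]
            except KeyError:
                pass

if __name__ == "__main__":
    m = mp.Manager()
    displayed_errors = m.dict()
    procs = [showError("some filename", "some error type", displayed_errors) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()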
