Update a progress bar inside multiprocessing functions (Ray) simultaneously - Python

I'm writing a program that uses the ray package for multiprocessing. In the program, there is a function that is called 5 times at the same time. During the execution, I want to show a progress bar using a PyQt5 QProgressBar to indicate how much work is done. My idea is to let every execution of the function update the progress bar by 20%. So I wrote code like the following:
running_tasks = [myFunction.remote(x,y,z,self.progressBar,QApplication) for x in myList]
Results = list(ray.get(running_tasks))
Inside myFunction, there is a line to update the passed-in progress bar, as follows:
QApplication.processEvents()
progressBar.setValue(progressBar.value() + 20)
But when I run the code, I get the following error:
TypeError: Could not serialize the argument
<PyQt5.QtWidgets.QProgressBar object at 0x000001B787A36B80> for a task
or actor myFile.myFunction. Check
https://docs.ray.io/en/master/serialization.html#troubleshooting for
more information.
I searched around and I understand that this error occurs because multiprocessing in ray doesn't share memory between the processes, so sending a class attribute (like self.progressBar) gives each process its own local copy to modify. I also tried using the multiprocessing package instead of ray, but it throws a pickling error, and I assume it is for the same reason. So, can anyone confirm whether I'm right, or provide a further explanation of the error?
Also, how can I achieve my requirement in multiprocessing (i.e., updating the same progress bar simultaneously) if multiprocessing doesn't have shared memory between the processes?

I am unfamiliar with ray, but you can do this with the multiprocessing library using multiprocessing.Queue.
The Queue is exactly what its name suggests: a queue where you can put data for other processes to read. In my case I usually put a dictionary on the Queue with a command (key) and what to do with that command (value).
If you only want to pass data in one direction, one process calls Queue.put() and the other calls Queue.get(). The example below emulates what you may be looking to do.
I usually use a QTimer to check whether there is any data in the queue, but you can also check whenever you like by calling a method that does so.
from multiprocessing import Process, Queue

myQueue = Queue()

class FirstProcess():
    ...
    def update_progress_percentage(self, percentage):
        self.progress_percentage = percentage

    def send_data_to_other_process(self):
        # Post the current percentage for the other process to pick up
        myQueue.put({"UpdateProgress": self.progress_percentage})

class SecondProcess():
    ...
    def get_data_from_other_process(self):
        # Drain everything that has arrived so far
        while not myQueue.empty():
            queue_dict = myQueue.get()
            for key in queue_dict:
                if key == "UpdateProgress":
                    percentage = queue_dict["UpdateProgress"]
                    progressBar.setValue(percentage)
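To tie this back to the question above, here is a minimal, runnable sketch of that idea using PyQt5 (assumed installed) and plain multiprocessing in place of ray: each of the 5 workers reports its 20% share through a Queue, and a QTimer in the GUI process drains the queue and updates the QProgressBar. The worker name my_function and the sleep are stand-ins for the real work; with ray itself, the GUI process could instead poll ray.wait() on the task refs and add 20% for each ref that completes.

import sys
import time
from multiprocessing import Process, Queue

from PyQt5.QtCore import QTimer
from PyQt5.QtWidgets import QApplication, QProgressBar

def my_function(x, progress_queue):
    """Stand-in for the real worker: do some work, then report 20% done."""
    time.sleep(1 + x)           # pretend to work
    progress_queue.put(20)      # each finished task is worth 20%

def main():
    app = QApplication(sys.argv)
    bar = QProgressBar()
    bar.setRange(0, 100)
    bar.show()

    progress_queue = Queue()
    workers = [Process(target=my_function, args=(x, progress_queue)) for x in range(5)]
    for w in workers:
        w.start()

    def poll_queue():
        # Drain whatever the workers have reported so far.
        while not progress_queue.empty():
            bar.setValue(bar.value() + progress_queue.get())

    timer = QTimer()
    timer.timeout.connect(poll_queue)
    timer.start(100)            # check the queue every 100 ms

    sys.exit(app.exec_())

if __name__ == '__main__':
    main()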

Related

Share variables across scripts

I have 2 separate scripts working with the same variables.
To be more precise, one script edits the variables and the other one uses them (it would be nice if it could edit them too, but that is not absolutely necessary).
This is what I am currently doing:
When script 1 edits a variable, it dumps it into a JSON file.
Script 2 repeatedly opens the JSON file to get the variables.
This method is really not elegant and the while loop is really slow.
How can I share variables across scripts?
My first script gets data from a MIDI controller and sends web requests.
My second script is for LED strips (those run off the same MIDI controller). Both scripts run in a "while True" loop.
I can't simply put them in the same script, since every web request would slow the LEDs down. I am currently just sharing the variables via a JSON file.
If enough people ask for it I will post the whole code, but I have been told not to do this.
Considering the information you provided, meaning...
Both scripts run in a "while True" loop.
I can't simply put them in the same script, since every web request would slow the LEDs down.
To me, you have 2 choices:
Use a client/server model. You have 2 machines: one acts as the server, the second as the client. The server runs a script with an infinite loop that constantly updates the data, plus an API that just reads and exposes the current state of your file/database to the client. The client, on another machine, simply requests the current data and processes it.
Make a single multiprocessing script. Each script would run on a separate 'thread' and manage its own memory. Since you also want to share variables between your two programs, you could pass a shared object as an argument to both of them. See this resource to help you.
Note that there are more solutions to this. For instance, you're using a JSON file that you are constantly opening and closing (that is probably what takes the most time in your program). You could use a real database that is opened only once and read many times, while still being updated.
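As a rough sketch of that last suggestion, using only the standard-library sqlite3 module (the file, table, and key names here are made up for illustration), the MIDI/web-request script and the LED script could share a small SQLite file instead of re-dumping JSON:

import sqlite3

DB_PATH = "shared_state.db"

def init_db():
    with sqlite3.connect(DB_PATH) as con:
        con.execute("CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)")

# Script 1 (the MIDI/web-request script) writes:
def set_value(key, value):
    with sqlite3.connect(DB_PATH) as con:
        con.execute("INSERT OR REPLACE INTO state (key, value) VALUES (?, ?)", (key, value))

# Script 2 (the LED script) reads:
def get_value(key):
    with sqlite3.connect(DB_PATH) as con:
        row = con.execute("SELECT value FROM state WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None

SQLite handles the file locking between the two scripts, so neither one has to re-parse a whole JSON document on every read.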
A Manager from multiprocessing lets you do this sort of thing pretty easily.
First, I'll simplify your "MIDI controller and web-request" code down to something that just sleeps for random amounts of time and updates a value in a managed dictionary:
from time import sleep
from random import random

def slow_fn(d):
    i = 0
    while True:
        sleep(random() ** 2)
        i += 1
        d['value'] = i
Next, we simplify the "LED strip" control down to something that just prints to the screen:
from time import perf_counter

def fast_fn(d):
    last = perf_counter()
    while True:
        sleep(0.05)
        value = d.get('value')
        now = perf_counter()
        print(f'fast {value} {(now - last) * 1000:.2f}ms')
        last = now
You can then run these functions in separate processes (the __main__ guard below matters on platforms like Windows that spawn new interpreters rather than fork):
import multiprocessing as mp

if __name__ == '__main__':
    with mp.Manager() as manager:
        d = manager.dict()
        procs = []
        for fn in [slow_fn, fast_fn]:
            p = mp.Process(target=fn, args=[d])
            procs.append(p)
            p.start()
        for p in procs:
            p.join()
the "fast" output happens regularly with no obvious visual pauses

Trying to understand multiprocessing and queues across python modules

I'm trying to understand multiprocessing. My actual application is to display log messages in real time in a PyQt5 GUI, but I ran into some problems using queues, so I made a simple program to test it out.
The issue I'm seeing is that I am unable to add elements to a Queue across Python modules and across processes. Here is my code and my output, along with the expected output.
Config file for globals:
# cfg.py
# Using a config file to import my globals across modules
#import queue
import multiprocessing
# q = queue.Queue()
q = multiprocessing.Queue()
Main module:
# mod1.py
import cfg
import mod2
import multiprocessing

def testq():
    global q
    print("q has {} elements".format(cfg.q.qsize()))

if __name__ == '__main__':
    testq()
    p = multiprocessing.Process(target=mod2.add_to_q)
    p.start()
    p.join()
    testq()
    mod2.pullfromq()
    testq()
Secondary module:
# mod2.py
import cfg

def add_to_q():
    cfg.q.put("Hello")
    cfg.q.put("World!")
    print("qsize in add_to_q is {}".format(cfg.q.qsize()))

def pullfromq():
    if not cfg.q.empty():
        msg = cfg.q.get()
        print(msg)
Here is the output that I actually get from this:
q has 0 elements
qsize in add_to_q is 2
q has 0 elements
q has 0 elements
vs the output that I would expect to get:
q has 0 elements
qsize in add_to_q is 2
q has 2 elements
Hello
q has 1 elements
So far I have tried using both multiprocessing.Queue and queue.Queue. I have also tested this with and without Process.join().
If I run the same program without using multiprocessing, I get the expected output shown above.
What am I doing wrong here?
EDIT:
Process.run() gives me the expected output, but it also blocks the main process while it is running, which is not what I want to do.
My understanding is that Process.run() runs the created process in the context of the calling process (in my case the main process), meaning that it is no different from the main process calling the same function.
I still don't understand why my queue behavior isn't working as expected
I've discovered the root of the issue and I'll document it here for future searches, but I'd still like to know if there's a standard solution to creating a global queue between modules so I'll accept any other answers/comments.
I found the problem when I added the following to my cfg.py file.
print("cfg.py is running in process {}".format(multiprocessing.current_process()))
This gave me the following output:
cfg.py is running in process <_MainProcess(MainProcess, started)>
cfg.py is running in process <_MainProcess(Process-1, started)>
cfg.py is running in process <_MainProcess(Process-2, started)>
It would appear that I'm creating separate Queue objects for each process that I create, which would certainly explain why they aren't interacting as expected.
This question has a comment stating that
a shared queue needs to originate from the master process, which is then passed to all of its subprocesses.
All this being said, I'd still like to know if there is an effective way to share a global queue between modules without having to pass it between methods.
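For reference, here is a sketch of the pattern that comment describes, applied to the modules above: the Queue is created once in the main process and passed explicitly to the child process and to every function that needs it, instead of living in cfg.py (module and function names mirror the question).

# mod1.py
import multiprocessing
import mod2

def testq(q):
    print("q has {} elements".format(q.qsize()))

if __name__ == '__main__':
    q = multiprocessing.Queue()    # created exactly once, in the main process
    testq(q)
    p = multiprocessing.Process(target=mod2.add_to_q, args=(q,))  # handed to the child
    p.start()
    p.join()
    testq(q)
    mod2.pullfromq(q)
    testq(q)

# mod2.py
def add_to_q(q):
    q.put("Hello")
    q.put("World!")
    print("qsize in add_to_q is {}".format(q.qsize()))

def pullfromq(q):
    if not q.empty():
        print(q.get())

With this arrangement every process operates on the same underlying queue, and the run should produce the expected output shown in the question.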

multiprocessing -> pathos.multiprocessing and windows

I'm currently using the standard multiprocessing in python to generate a bunch of processes that will run indefinitely. I'm not particularly concerned with performance; each thread is simply watching for a different change on the filesystem, and will take the appropriate action when a file is modified.
Currently, I have a solution that works, for my needs, in Linux. I have a dictionary of functions and arguments that looks like:
job_dict['func1'] = {'target': func1, 'args': (args,)}
For each, I create a process:
import multiprocessing

jobs = {}
for k in job_dict.keys():
    jobs[k] = multiprocessing.Process(target=job_dict[k]['target'],
                                      args=job_dict[k]['args'])
With this, I can keep track of each one that is running, and, if necessary, restart a job that crashes for any reason.
This does not work in Windows. Many of the functions I'm using are wrappers, using various functools functions, and I get messages about not being able to serialize the functions (see What can multiprocessing and dill do together?). I have not figured out why I do not get this error in Linux, but do in Windows.
If I import dill before starting my processes in Windows, I do not get the serialization error. However, the processes do not actually do anything. I cannot figure out why.
I then switched to the multiprocessing implementation in pathos, but did not find an analog to the simple Process class within the standard multiprocessing module. I was able to generate threads for each job using pathos.pools.ThreadPool. This is not the intended use for map, I'm sure, but it started all the threads, and they ran in Windows:
import pathos

tp = pathos.pools.ThreadPool()
for k in job_dict.keys():
    tp.uimap(job_dict[k]['target'], job_dict[k]['args'])
However, now I'm not sure how to monitor whether a thread is still active, which I'm looking for so that I can restart threads that crash for some reason or another. Any suggestions?
I'm the pathos and dill author. The Process class is buried deep within pathos at pathos.helpers.mp.process.Process, where mp itself is the actual fork of the multiprocessing library. Everything in multiprocessing should be accessible from there.
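For completeness, a small hedged sketch of what that looks like for the job_dict from the question, assuming (per the answer above) that pathos.helpers.mp exposes the full multiprocessing API backed by dill:

from pathos.helpers import mp   # pathos' bundled multiprocessing fork (serializes with dill)

def func1(path):
    print("watching", path)

# same structure as in the question; the path is a made-up placeholder
job_dict = {'func1': {'target': func1, 'args': ('/tmp/watched',)}}

jobs = {}
for k, job in job_dict.items():
    jobs[k] = mp.Process(target=job['target'], args=job['args'])
    jobs[k].start()

# later, poll is_alive() to spot crashed jobs and restart them
for k, p in jobs.items():
    if not p.is_alive():
        pass  # recreate mp.Process(...) for job k and start it again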
Another thing to know about pathos is that it keeps the pool alive for you until you remove it from the held state. This helps reduce overhead in creating "new" pools. To remove a pool, you do:
>>> # create
>>> p = pathos.pools.ProcessPool()
>>> # remove
>>> p.clear()
There's no such mechanism for a Process however.
For multiprocessing, Windows is different from Linux and Macintosh… because Windows doesn't have a proper fork like Linux does… Linux can share objects across processes, while on Windows there is no sharing… essentially a fully independent new process is created… and therefore the serialization has to be better for the object to pass across to the other process -- just as if you were sending the object to another computer. On Linux, you'd have to do this to get the same behavior:
def check(obj, *args, **kwds):
    """check pickling of an object across another process"""
    import dill
    import subprocess
    fail = True
    try:
        _x = dill.dumps(obj, *args, **kwds)
        fail = False
    finally:
        if fail:
            print "DUMP FAILED"
    msg = "python -c import dill; print dill.loads(%s)" % repr(_x)
    print "SUCCESS" if not subprocess.call(msg.split(None, 2)) else "LOAD FAILED"

Strange blocking behavior with python multiprocessing queue put() and get()

I have written a class in Python 2.7 (under Linux) that uses multiple processes to manipulate a database asynchronously. I encountered a very strange blocking behaviour when using multiprocessing.Queue.put() and multiprocessing.Queue.get(), which I can't explain.
Here is a simplified version of what I do:
import time
from multiprocessing import Process, Queue

class MyDB(object):
    def __init__(self):
        self.inqueue = Queue()
        p1 = Process(target=self._worker_process, kwargs={"inqueue": self.inqueue})
        p1.daemon = True
        started = False
        while not started:
            try:
                p1.start()
                started = True
            except:
                time.sleep(1)
        # Sometimes I start a second process, but it makes no difference to my problem
        p2 = Process(target=self._worker_process, kwargs={"inqueue": self.inqueue})
        # blahblah... (same as above)

    @staticmethod
    def _worker_process(inqueue):
        while True:
            # -------------- this blocks despite data having arrived ------------
            op = inqueue.get(block=True)
            # do something with the specified operation
            # --------------- problem area end --------------------
            print "if this text gets printed, the problem was solved"

    def delete_parallel(self, key, rawkey=False):
        someid = ...  # blahblah
        # -------------- this section blocked when I was posting the question, but for unknown reasons it's fine now
        self.inqueue.put({"optype": "delete", "kwargs": {"key": key, "rawkey": rawkey}, "callid": someid}, block=True)
        # -------------- problem area end ----------------
        print "if you see this text, there was no blocking or the block was released"
If I run the code above inside a test (in which I call delete_parallel on the MyDB object), then everything works. But if I run it in the context of my entire application (importing other stuff, including pygtk), strange things happen:
For some reason self.inqueue.get blocks and never releases, despite self.inqueue having the data in its buffer. When I instead call self.inqueue.get(block=False, timeout=1), the call finishes by raising Queue.Empty, despite the queue containing data. qsize() returns 1 (suggesting that data is there) while empty() returns True (suggesting that there is no data).
Clearly there must be something somewhere else in my application that renders self.inqueue unusable by causing acquisition of some internal semaphore. However, I don't know what to look for. Eclipse debugging becomes useless once a blocking semaphore is reached.
Edit 8 (cleaning up and summarizing my previous edits): Last time I had a similar problem, it turned out that pygtk was hijacking the global interpreter lock, but I solved it by calling gobject.threads_init() before I called anything else. Could this issue be related?
When I introduce a print "successful reception" after the get() method and execute my application in a terminal, the same behaviour happens at first. When I then terminate by pressing CTRL+D, I suddenly get the string "successful reception" in between messages. This looks to me like some other process/thread is terminated and releases the lock that blocks the process that is stuck at get().
Since the process that was stuck terminates later, I still see the message. What kind of process could externally mess with a Queue like that? self.inqueue is only accessed inside my class.
Right now it seems to come down to this queue, which won't return anything despite the data being there:
the get() method seems to get stuck when it attempts to receive the actual data from some internal pipe. The last line before my debugger hangs is:
res = self._recv()
which is inside multiprocessing.queues.Queue.get().
Tracking this internal Python machinery further, I find the assignments
self._recv = self._reader.recv and self._reader, self._writer = Pipe(duplex=False).
Edit 9
I'm currently trying to hunt down the import that causes it. My application is quite complex, with hundreds of classes and each class importing a lot of other classes, so it's a pretty painful process. I have found a first candidate class which uses 3 different MyDB instances when I track all its imports (but doesn't access MyDB.inqueue at any time, as far as I can tell). The strange thing is, it's basically just a wrapper, and the wrapped class works just fine when imported on its own. This also means that it uses MyDB without freezing. As soon as I import the wrapper (which imports that class), I have the blocking issue.
I started rewriting the wrapper by gradually reusing the old code, testing each time I introduce a couple of new lines, until I hopefully see which line causes the problem to return.
multiprocessing.Queue uses an internal feeder thread to maintain its state. If you are using GTK without initializing its threading support, it can break that thread, so you will need to call gobject.threads_init() before anything else.
It should be noted that qsize() only returns an approximate size of the queue. The real size may be anywhere between 0 and the value returned by qsize().
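A minimal ordering sketch of that suggestion (gobject/gtk are the pygtk modules; the mydb module name is a placeholder for wherever the MyDB class above lives):

import gobject
gobject.threads_init()   # must run before any other GTK/GObject work

import gtk                 # only now pull in the rest of the GUI
from mydb import MyDB      # placeholder import for the class shown in the question

db = MyDB()
db.delete_parallel("some_key")   # the worker's get() should now receive this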

Python simplest form of multiprocessing

I've been trying to read up on threading and multiprocessing, but all the examples are too intricate and advanced for my level of Python/programming knowledge. I want to run a function which consists of a while loop, and while that loop runs I want to continue with the program and eventually change the condition for the while loop and end that process. This is the code:
import time

class Example():
    def __init__(self):
        self.condition = False

    def func1(self):
        self.condition = True
        while self.condition:
            print "Still looping"
            time.sleep(1)
        print "Finished loop"

    def end_loop(self):
        self.condition = False
Then I make the following function calls:
ex = Example()
ex.func1()
time.sleep(5)
ex.end_loop()
What I want is for func1 to run for 5 seconds before end_loop() is called, which changes the condition and ends the loop, and thus also the function. I.e., I want one process to start and "go" into func1, and at the same time I want time.sleep(5) to be called, so the processes "split" when arriving at func1: one process enters the function while the other continues down the program and starts with the time.sleep(5) execution.
This must be the most basic example of multiprocessing, yet I've had trouble finding a simple way to do it!
Thank you
EDIT1: Regarding do_something: in my real problem, do_something is replaced by some code that communicates with another program via a socket, receives packages with coordinates every 0.02 s, and stores them in member variables of the class. I want this constant updating of the coordinates to start, and then be able to read the coordinates via other functions at the same time.
However, that is not so relevant. What if do_something is replaced by:
time.sleep(1)
print "Still looping"
How do I solve my problem then?
EDIT2: I have tried multiprocessing like this:
from multiprocessing import Process
ex = Example()
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
p1.start()
time.sleep(5)
p2.start()
When I ran this, I never got to p2.start(), so that did not help. Even if it had, this is not really what I'm looking for either. What I want would be just to start the process p1, and then continue with time.sleep and ex.end_loop().
The first problem with your code is these calls:
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
With ex.func1() you're calling the function and passing its return value as the target parameter. Since the function doesn't return anything, you're effectively calling
p1 = Process(target=None)
p2 = Process(target=None)
which makes, of course, no sense.
After fixing that, the next problem will be shared data: when using the multiprocessing package, you implement concurrency using multiple processes which, by default, cannot simply share data afaik. Have a look at Sharing state between processes in the package's documentation to read about this. Especially take the first sentence into account: "when doing concurrent programming it is usually best to avoid using shared state as far as possible"!
So you might want to also have a look at Exchanging objects between processes to read about how to send/receive data between two different processes. So, instead of simply setting a flag to stop the loop, it might be better to send a message to signal the loop should be terminated.
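As a small illustration of that last point, here is a hedged sketch of stopping a worker process with a sentinel message on a Queue rather than a shared flag:

import time
from multiprocessing import Process, Queue
from queue import Empty

def worker(q):
    while True:
        try:
            if q.get(block=False) == "stop":   # sentinel message ends the loop
                break
        except Empty:
            pass
        print("Still looping")
        time.sleep(1)
    print("Finished loop")

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    time.sleep(5)
    q.put("stop")    # signal the worker to terminate its loop
    p.join()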
Also note that processes are a heavyweight form of multiprocessing: they spawn multiple OS processes, which comes with relatively big overhead. multiprocessing's main purpose is to avoid the problems imposed by Python's Global Interpreter Lock (google this to read more...). If your problem isn't much more complex than what you've told us, you might want to use the threading package instead: threads come with less overhead than processes and also allow access to the same data (although you really should read about synchronization when doing this...).
I'm afraid multiprocessing is an inherently complex subject, so I think you will need to advance your programming/Python skills to successfully use it. But I'm sure you'll manage it; the Python documentation on the subject is comprehensive and there are a lot of other resources about it.
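If threading is acceptable, here is a minimal sketch of that suggestion, reusing the Example class from the question: because threads share memory, end_loop() can simply flip the flag that func1() is polling.

import time
import threading

class Example(object):
    def __init__(self):
        self.condition = False

    def func1(self):
        self.condition = True
        while self.condition:
            print("Still looping")
            time.sleep(1)
        print("Finished loop")

    def end_loop(self):
        self.condition = False

ex = Example()
t = threading.Thread(target=ex.func1)   # note: no parentheses after func1
t.start()                               # the loop runs in a background thread
time.sleep(5)                           # the main thread continues immediately
ex.end_loop()                           # flips the shared flag; the loop exits
t.join()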
To tackle your EDIT2 problem, you could try using the shared memory type Value:
import time
from multiprocessing import Process, Value

class Example():
    def func1(self, cond):
        while cond.value == 1:
            print('do something')
            time.sleep(1)
        return

if __name__ == '__main__':
    ex = Example()
    cond = Value('i', 1)
    proc = Process(target=ex.func1, args=(cond,))
    proc.start()
    time.sleep(5)
    cond.value = 0
    proc.join()
(Note the target=ex.func1 without the parentheses and the comma after cond in args=(cond,).)
But look at the answer provided by MartinStettner to find a good solution.
