I set up some multiprocessing.Queue code to run two functions in parallel. The first function parses text and writes data to a text file; the second function pulls data from that same text file and shows a live graph. The second function must kick off once the first has created the text file. The code works well.
However:
It takes almost all my RAM (ca. 6 GB). Is that because it is multi-process? In Task Manager I see three python.exe processes of about 2 GB each running at the same time, while when I run only the first function (the most RAM-consuming one) I see only one python.exe of 2 GB.
Once the code has parsed all the text and the graph has stopped, the processes keep running until I terminate the code manually with the Eclipse console's red button. Is that normal?
I have a small script that runs before, and outside of, the multiprocessing functions. It provides a value I need in order to run them. It runs fine, but once the multiprocessing functions are called, it runs again! I don't know why, because it is definitely outside the multiprocessing functions.
Part of this was resolved in another Stack Overflow question.
from multiprocessing import Process, Queue

# define newlista
# define key
# def get_all(myjson, kind, type): ...
# def on_data(data): ...
# (small script that runs outside the multiprocessing functions)
def parsing(q):
    keep_running = True
    numentries = 0
    for text in get_all(newlista, "sentence", "text"):
        if 80 < len(text) < 500:
            firstword = key.lower().split(None, 1)[0]
            if text.lower().startswith(firstword):
                pass
            else:
                on_data(text)
                numentries += 1
                q.put(keep_running)
    keep_running = False
    q.put(keep_running)
def live_graph(q):
    keep_running = True
    while keep_running:
        keep_running = q.get()
        # do the graph updates here
if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=parsing, args=(q,))
    p1.start()
    p2 = Process(target=live_graph, args=(q,))
    p2.start()
UPDATE
The graph function is the one that generates the two extra python processes, and once the first function has terminated, the second function keeps running.
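A minimal sketch of how the entry point might be restructured; nothing here is from the original post, and get_start_value() is a hypothetical stand-in for the small setup script. On Windows, each child process re-imports the main module, so any module-level code runs again in every python.exe and its data is duplicated per process, which would explain both the extra runs of the setup script and the multiplied RAM. Joining both workers at the end lets the script exit on its own once the parsing and the graph are done.

from multiprocessing import Process, Queue

def get_start_value():
    # Hypothetical stand-in for the small setup script.
    return 42

def parsing(q, start_value):
    ...  # parse and write the text file as above, then q.put(False)

def live_graph(q):
    ...  # pull from the queue and update the graph as above

if __name__ == '__main__':
    # Runs only in the parent; on Windows every child re-imports this module,
    # so anything left at module level executes once per python.exe.
    start_value = get_start_value()
    q = Queue()
    p1 = Process(target=parsing, args=(q, start_value))
    p2 = Process(target=live_graph, args=(q,))
    p1.start()
    p2.start()
    p1.join()   # waiting on both workers lets the script
    p2.join()   # exit on its own once they finish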
Related
I'm trying to get my program to process data faster by using multiprocessing.Pool, but I'm running into problems implementing it.
My program has a main file in which I have a tkinter GUI running, something like this:
import tkinter as tk

WIDTH = 1345
HEIGHT = 665

root = tk.Tk()

def Calculate(Config, matrices):
    # calls the function Manager(), which is stored in another .py
    ...

def main():
    # --variables, bindings and stuff--
    btncalc.bind('<Button-1>', lambda event: Calculate(Config=Config, matrices=matrices))

if __name__ == '__main__':
    main()
Then, if the user clicks the button, the Calculate function calls a function that manages the data processing and is stored in another .py file. This is where I use multiprocessing.Pool. The code in the other file is similar to this:
import os
from multiprocessing import Pool

def worker(lista):
    a, b, c, d, e = lista
    # (processing)
    return [f, g, h, i]

def MultiprocessingFunc(a, b, c, d, e):
    lista = []
    results = []
    for p in range(len(a)):
        lista.append([a[p], b, c, d, e])
    pool = Pool(os.cpu_count())  # in my case this is 4
    results.append(pool.map(worker, lista))  # hangs here sometimes
    pool.close()
    pool.join()  # hangs here if imap
    return results

def Manager(Config, matrices):
    # (prepares stuff)
    results = MultiprocessingFunc(a, b, c, d, e)
    # (uses the results)
The odd thing is that sometimes this code works and sometimes it just hangs on pool.map, and I don't know why. Maybe I should be using Process and Queue instead of Pool? Since it just hangs indefinitely, there are no error messages, so I don't know where to start debugging. I don't know if it matters, but I'm using Python 3.8.
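One pattern that may be worth trying, sketched below under the assumption that worker stays at module level and the pool is only ever created from the main process: since Python 3.3, Pool can be used as a context manager, which guarantees the pool is torn down even if map raises, and on Windows the spawn start method re-imports the calling module in every worker, so any module-level tkinter code needs to sit behind an if __name__ == '__main__' guard. This is a generic sketch, not the original project's code; the dummy arguments are made up.

import os
from multiprocessing import Pool

def worker(lista):
    a, b, c, d, e = lista
    # placeholder for the real processing
    return [a, b, c, d, e]

def MultiprocessingFunc(a, b, c, d, e):
    lista = [[a[p], b, c, d, e] for p in range(len(a))]
    # The context manager tears the pool down on exit, so collect the
    # results inside the with block.
    with Pool(os.cpu_count()) as pool:
        results = pool.map(worker, lista)
    return results

if __name__ == '__main__':
    # Keeping pool creation (and any GUI code) behind this guard matters on
    # Windows, where child processes re-import the main module.
    print(MultiprocessingFunc([1, 2, 3], 10, 20, 30, 40))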
What I'd like is for the following program to print out:
Running Main
Running Second
Running Main
Running Second
[...]
Code:
from multiprocessing import Process
import time

def main():
    while True:
        print('Running Main')
        time.sleep(1)

def second():
    while True:
        print('Running Second')
        time.sleep(1)

p1 = Process(main())
p2 = Process(second())
p1.start()
p2.start()
But it doesn't have the desired behavior. Instead it just prints out:
Running Main
Running Main
[...]
I suspect my program doesn't work because of the while statement?
Is there any way I can overcome this problem and have my program print out what I mentioned no matter what I execute in my function?
The issue is in how you create the Process objects. The reason only the first function runs is the call syntax: instead of handing each Process a function to run later, you are calling the function yourself and passing its result to Process.
When you create a Process object, avoid writing
p1 = Process(target=main())
and instead write
p1 = Process(target=main)
That also means that if you want to pass any arguments to the function, you have to use
p1 = Process(target=main, args=('hi',))
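Putting that together, a corrected version of the program above might look like the following. The if __name__ == '__main__' guard and the join() calls are additions (they are needed on platforms that use the spawn start method, such as Windows), and the two lines will interleave roughly rather than in a strict Main/Second alternation, since the processes run independently.

from multiprocessing import Process
import time

def main():
    while True:
        print('Running Main')
        time.sleep(1)

def second():
    while True:
        print('Running Second')
        time.sleep(1)

if __name__ == '__main__':
    p1 = Process(target=main)    # pass the function itself, do not call it
    p2 = Process(target=second)
    p1.start()
    p2.start()
    p1.join()
    p2.join()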
I am a beginner when it comes to Python threading and multiprocessing, so please bear with me.
I want to make a system that consists of three Python scripts. The first one continuously creates data and sends it to the second script. The second script takes the data and saves it to a file until the file exceeds a defined size limit. When that happens, the third script sends the data to an external device and gets rid of this "cache". I need all of this to happen concurrently. The pseudocode below sums up what I am trying to do.
def main_1():
    data = [1, 2, 3]
    send_to_second_script(data)

def main_2():
    rec_data = receive_from_first_script()
    save_to_file(rec_data)
    if file > limit:
        signal_third_script()

def main_3():
    if signal is true:
        send_data_to_external_device()
        remove_data_from_disk()
I understand that I can use queues to make this happen, but I am not sure how.
Also, so far I have tried a different approach, where I created one Python script and used threading to spawn a thread for each part of the process. Is that correct, or is using queues better?
Firstly, with Python you need to be aware of what multithreading versus multiprocessing actually buys you. IMO you should be considering multiprocessing rather than multithreading: because of the GIL, Python threads never run Python bytecode in parallel, and there are many write-ups on which one to use. The easiest way to choose is to check whether your program is IO-bound or CPU-bound. Anyway, on to the Queue, which is a simple way to pass data between multiple processes in Python.
Using your pseudocode as an example, here is how you would use a Queue.
import multiprocessing

def main_1(output_queue):
    test = 0
    while test <= 10:  # simple limit so it does not run forever
        data = [1, 2, 3]
        print("Process 1: sending data")
        output_queue.put(data)  # put data in the queue (FIFO)
        test += 1
    output_queue.put("EXIT")  # triggers the exit clause

def main_2(input_queue, output_queue):
    file = 0  # dummy pseudo variables
    limit = 1
    while True:
        rec_data = input_queue.get()  # get the oldest item from the queue; blocks if empty
        if rec_data == "EXIT":  # the exit clause is a way to cleanly shut down your processes
            output_queue.put("EXIT")
            print("Process 2: exiting")
            break
        print("Process 2: saving to file:", rec_data, "count =", file)
        file += 1
        # save_to_file(rec_data)
        if file > limit:
            file = 0
            output_queue.put(True)

def main_3(input_queue):
    while True:
        signal = input_queue.get()
        if signal is True:
            print("Process 3: data sent and removed")
            # send_data_to_external_device()
            # remove_data_from_disk()
        elif signal == "EXIT":
            print("Process 3: exiting")
            break

if __name__ == '__main__':
    q1 = multiprocessing.Queue()  # initialize the queues and the processes
    q2 = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=main_1, args=(q1,))
    p2 = multiprocessing.Process(target=main_2, args=(q1, q2))
    p3 = multiprocessing.Process(target=main_3, args=(q2,))

    processes = [p1, p2, p3]
    for p in processes:  # start all processes
        p.start()
    for p in processes:  # make sure all processes finish
        p.join()
The prints may be interleaved a little because I did not bother to lock stdout, but using a queue ensures that data moves from one process to another.
EDIT: Do be aware that you should also have a look at multiprocessing locks to ensure that your file is 'thread-safe' when performing the move/delete. The pseudocode above only demonstrates how to use the queue.
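To illustrate that point, here is a minimal, self-contained sketch (not part of the answer above; the function names are made up) of sharing a multiprocessing.Lock between the writing process and the move/delete process so they never touch the cache file at the same time:

import multiprocessing

def writer(lock):
    for i in range(3):
        with lock:  # only one process works on the cache file at a time
            print("writer: appending chunk", i, "to the cache file")

def mover(lock):
    with lock:
        print("mover: sending the cache file and deleting it")

if __name__ == '__main__':
    lock = multiprocessing.Lock()  # created in the parent, handed to both children
    p2 = multiprocessing.Process(target=writer, args=(lock,))
    p3 = multiprocessing.Process(target=mover, args=(lock,))
    p2.start()
    p3.start()
    p2.join()
    p3.join()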
I am performing a large parallel mapping computation from within an IPython Notebook. I am mapping a dataframe, by subject and condition, to a machine learning prediction function, and I want the subject/condition combinations to be spread among 20 cores.
import itertools
from multiprocessing import Pool

def map_vars_to_functionPredict(subject, condition):
    ans = map(predictBasic, [subject], [df], [condition])
    return ans

def main_helperPredict(args):
    return map_vars_to_functionPredict(*args)

def parallel_predict(subjects, conditions):
    p = Pool(20)
    # set each matching item into a tuple
    job_args = list(itertools.product(*[subjects, conditions]))
    print job_args
    # map to pool
    ans = p.map(main_helperPredict, job_args)
    p.close()
    p.join()
    return ans
When I run these functions from the IPython Notebook right after starting it, they run quickly and as expected (in the 'Running' state at ~100% CPU on 20 cores). However, if I re-run parallel_predict right after running it for the first time, all 20 processes are sometimes marked as being in uninterruptible sleep (D) state for no apparent reason. I am not writing anything to disk; the output is just kept as a variable in the notebook.
As a last-ditch attempt, I tried adding del p after p.join(), and this helped somewhat (the function runs normally more often), but I still occasionally see processes stuck in the D state, especially if I have a lot of jobs queued.
Edit:
In general, adding del p after p.join() kept the processes from entering the (D) state, but I still had an issue where the function would finish all the processes (as far as I could tell from top) yet never return results. When I stopped the IPython Notebook kernel, I got the error ZMQError: Address already in use.
How should I properly start or finish the multiprocessing Pool to keep this from happening?
I changed four things and now 1) the processes no longer go into (D) state and 2) I can run these functions back-to-back and they always return results and don't hang.
In parallel_predict, I added freeze_support() and replaced p.close() with p.terminate(). I also added del p and a print line (I don't think the print makes a difference, but I'm including it since all of this is superstition anyway).
def parallel_predict(subjects, conditions):
    freeze_support()
    p = Pool(20)
    # set each matching item into a tuple
    job_args = list(itertools.product(*[subjects, conditions]))
    print job_args
    # map to pool
    ans = p.map(main_helperPredict, job_args)
    p.terminate()
    p.join()
    del p
    print "finished"
    return ans
Finally, I wrapped the line where I call parallel_predict in if __name__ == "__main__", like this:
if __name__ == "__main__":
all_results = parallel_predict(subjects,conditions)
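A follow-up note rather than part of the original answer: the code above uses Python 2 print statements, but if the notebook kernel is ever moved to Python 3.3 or later, multiprocessing.Pool can be used as a context manager, which does the terminate/join housekeeping on exit and avoids leaking half-closed pools between notebook runs. A sketch under that assumption, reusing main_helperPredict as defined above:

import itertools
from multiprocessing import Pool

def parallel_predict(subjects, conditions):
    job_args = list(itertools.product(subjects, conditions))
    # The pool is terminated when the block exits, so collect results inside it.
    with Pool(20) as p:
        ans = p.map(main_helperPredict, job_args)  # main_helperPredict as defined above
    return ans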
Hello, I am trying to run two functions at the same time in Python. Both read data from two separate meters over USB, and they are not dependent on each other. I have tried multiprocessing, but the second meter never starts.
from multiprocessing import Process

def readMeter1():
    while True:
        # read Meter1
        ...

def readMeter2():
    while True:
        # read Meter2
        ...

if __name__ == "__main__":
    Process(target=readMeter1()).start()
    Process(target=readMeter2()).start()
The target parameter must be something callable (a function, in your case). You don't need to call that function yourself; start() will do it after launching the new process:
Process(target=readMeter1).start() # fork a new process, call readMeter1
Process(target=readMeter2).start() # fork a new process, call readMeter2
Because you call readMeter1 yourself, its infinite loop runs in the current process and blocks everything else, so the second Process is never even created.
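For completeness, a minimal sketch of the corrected script; the meter-reading bodies are stubbed out with a sleep, and an if __name__ == "__main__" guard plus join() calls are added so it also behaves under the spawn start method on Windows.

from multiprocessing import Process
import time

def readMeter1():
    while True:
        # read Meter1 over USB here (stubbed)
        time.sleep(1)

def readMeter2():
    while True:
        # read Meter2 over USB here (stubbed)
        time.sleep(1)

if __name__ == "__main__":
    p1 = Process(target=readMeter1)  # pass the function itself, no parentheses
    p2 = Process(target=readMeter2)
    p1.start()
    p2.start()
    p1.join()
    p2.join()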