Why does an infinite loop not run in multiprocessing? - python

So I'm trying to have a function run in 4 separate instances using the multiprocessing module. Inside the function is an infinite loop, but for some reason each instance only runs once and then control of my terminal window is returned to me.
Here are the two functions.
The function that creates the pool:
def mainLogic():
    global direction_array
    pool = Pool()
    for dir in direction_array:
        pool.apply_async(generic_arrow_logic, args=(dir, direction_array.index(dir)))
        print("starting process " + str(direction_array.index(dir)))
    pool.close()
    pool.join()
The function that I'm trying to run infinitely:
def generic_arrow_logic(arrType, thread):
    # Average runtime = .3 seconds, so roughly 3 fps
    global color_dictionary, key_dictionary, arrowArrayCurr, default_arrow_color
    last_time = time.time()
    parseCoords(False)
    while True:
        working = screenGrab("not")  # numpy array of the entire image
        currArr = cutImage(working, arrType, "not")  # .convert("RGB") - another numpy array
        (height, width, depth) = currArr.shape
        print("Loop on process {0} took {1} seconds...".format(thread, time.time() - last_time))
        last_time = time.time()
        if not (currArr[int(width/2), int(height/2)] == default_arrow_color).all():
            pydirectinput.press(key_dictionary[arrType])
        # sys.stdout.flush()
And this is what happens when I run the program:
starting process 0
starting process 1
starting process 2
starting process 3
Loop on process 0 took 0.02699136734008789 seconds...
Loop on process 2 took 0.04453277587890625 seconds...
Loop on process 1 took 0.060872793197631836 seconds...
Loop on process 3 took 0.07178044319152832 seconds...
Jalen Morgan#gridl0ck-TL MINGW64 ~
$
Does anything stand out in my code that would explain why this doesn't run forever?
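One thing worth checking: pool.apply_async silently discards any exception a worker raises unless the AsyncResult is retrieved with .get(), so a crash partway through the second loop iteration would look exactly like this output, with each process printing one timing line and then dying quietly. A sketch of mainLogic that surfaces such exceptions, assuming the same setup as above:
def mainLogic():
    global direction_array
    pool = Pool()
    results = []
    for dir in direction_array:
        # keep the AsyncResult so worker exceptions are not silently discarded
        results.append(pool.apply_async(generic_arrow_logic,
                                        args=(dir, direction_array.index(dir))))
        print("starting process " + str(direction_array.index(dir)))
    pool.close()
    for r in results:
        r.get()  # blocks; re-raises any exception raised inside the worker
    pool.join()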

Related

Python multithreading: run 10 threads at a time until 200 threads are completed

I have 200 SQL Servers, each with the same 40 tables. I want to execute my Python data pipeline per table across all 200 servers, one thread per server. I just want to know if I can run 10 threads concurrently at a time and, as they finish, run the next 10 threads until all 200 threads are completed in one job automatically.
for index, row in shops.iterrows():
    tf = isOpen(row['shop_ip'], row['port'])
    endTime = time.time()
    pingTime = endTime - startTime
    if tf:
        print(f"UP {row['shop_ip']} Ping Successful Time Taken : " + str(pingTime) + " seconds")
        x = threading.Thread(target=ETLScript.ETLLoadingShopPos,
                             args=(SelectColumns, tableName, tableName, row['shop_code'],
                                   'where 1=1', str(row['shop_code']), row))
        jobs.append(x)
        x.start()
        x.join()
Based on your explanation, does it matter that the 10 threads run as a group, or does it only matter that no more than 10 threads are allowed to run at a time?
I think a queue-like concept will work:
import time
import threading

jobs = []
for index, row in shops.iterrows():
    tf = isOpen(row['shop_ip'], row['port'])
    endTime = time.time()
    pingTime = endTime - startTime
    if tf:
        # block while 10 or more of our threads are still running; re-check every 0.3 s
        # so the main thread doesn't consume resources by spinning at full speed
        while sum([j.is_alive() for j in jobs]) >= 10:
            time.sleep(0.3)
        print(f"UP {row['shop_ip']} Ping Successful Time Taken : " + str(pingTime) + " seconds")
        x = threading.Thread(target=ETLScript.ETLLoadingShopPos,
                             args=(SelectColumns, tableName, tableName, row['shop_code'],
                                   'where 1=1', str(row['shop_code']), row))
        jobs.append(x)
        x.start()
This allows you to run at most 10 threads at once. The while loop acts as a blocker: as long as 10 of the threads are still alive it waits, and as soon as one of them finishes, a new thread is started. (Note also that the x.join() immediately after x.start() in your original loop waits for each thread to finish before starting the next one, which is why that version effectively runs the threads one at a time.)
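For reference, concurrent.futures.ThreadPoolExecutor in the standard library gives the same "at most 10 running at once" behaviour without the manual bookkeeping. A minimal sketch, assuming the same shops DataFrame, isOpen helper and ETLScript.ETLLoadingShopPos function from the question:
from concurrent.futures import ThreadPoolExecutor

def run_one(row):
    # one unit of work per shop; same call as in the question
    return ETLScript.ETLLoadingShopPos(SelectColumns, tableName, tableName,
                                       row['shop_code'], 'where 1=1',
                                       str(row['shop_code']), row)

with ThreadPoolExecutor(max_workers=10) as executor:  # never more than 10 worker threads
    futures = [executor.submit(run_one, row)
               for index, row in shops.iterrows()
               if isOpen(row['shop_ip'], row['port'])]
    for f in futures:
        f.result()  # blocks until done and re-raises any exception from the worker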

Multiprocessing queue termination

I have a program I want to split into 10 parts with multiprocessing. Each worker will be searching for the same answer using different variables to look for it (in this case it's brute-forcing a password). How do I get the processes to communicate their status, and how do I terminate all processes once one of them has found the answer? Thank you!
If you are going to split it into 10 parts, then either you should have 10 cores or your worker function should at least not be 100% CPU bound.
The following code initializes each pool process with a multiprocessing.Queue instance to which the worker function will write its result. The main process waits for the first entry written to the queue and then terminates all pool processes. For this demo, the worker function is passed the arguments 1, 2, 3, ... 10, sleeps for that many seconds and then writes the argument it was passed to the queue. So we would expect the worker that was passed the value 1 to complete first, and the total running time of the program to be slightly more than 1 second (it takes some time to create the 10 processes):
import multiprocessing
import time

def init_pool(q):
    global queue
    queue = q

def worker(x):
    time.sleep(x)
    # write result to queue
    queue.put_nowait(x)

def main():
    queue = multiprocessing.Queue()
    pool = multiprocessing.Pool(10, initializer=init_pool, initargs=(queue,))
    for i in range(1, 11):
        # non-blocking:
        pool.apply_async(worker, args=(i,))
    # wait for first result
    result = queue.get()
    pool.terminate()  # kill all tasks
    print('Result: ', result)

# required for Windows:
if __name__ == '__main__':
    t = time.time()
    main()
    print('total time =', time.time() - t)
Prints:
Result: 1
total time = 1.2548246383666992
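A variant of the same idea (a sketch, not from the original answer): pool.imap_unordered yields results in completion order, so the first value it produces comes from the first worker to finish, and the pool can be terminated right after:
import multiprocessing
import time

def worker(x):
    time.sleep(x)
    return x

if __name__ == '__main__':
    t = time.time()
    with multiprocessing.Pool(10) as pool:
        # results arrive in completion order, so next() blocks only until the fastest worker is done
        result = next(pool.imap_unordered(worker, range(1, 11)))
        pool.terminate()  # stop the remaining, slower workers
    print('Result:', result)
    print('total time =', time.time() - t)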

Raspberry Pi 3 multiprocessing queue synchronization between 2 processes

I have written some simple code using the multiprocessing library to launch an extra process apart from the main one (2 processes in total). I wrote this code on W7 Professional x64 through Anaconda-Spyder v3.2.4 and it works almost as I want, except that when I run it, the memory consumption of my second process (not the main one) keeps increasing until it reaches total capacity and the computer gets stuck and freezes (you can see this in the Windows Task Manager).
"""
Example to print data from a function using multiprocessing library
Created on Thu Jan 30 12:07:49 2018
author: Kevin Machado Gamboa
Contct: ing.kevin#hotmail.com
"""
from time import time
import numpy as np
from multiprocessing import Process, Queue, Event
t0=time()
def ppg_parameters(hr, minR, ampR, minIR, ampIR, t):
HR = float(hr)
f= HR * (1/60)
# Spo2 Red signal function
sR = minR + ampR * (0.05*np.sin(2*np.pi*t*3*f)
+ 0.4*np.sin(2*np.pi*t*f) + 0.25*np.sin(2*np.pi*t*2*f+45))
# Spo2 InfraRed signal function
sIR = minIR + ampIR * (0.05*np.sin(2*np.pi*t*3*f)
+ 0.4*np.sin(2*np.pi*t*f) + 0.25*np.sin(2*np.pi*t*2*f+45))
return sR, sIR
def loop(q):
"""
generates the values of the function ppg_parameters
"""
hr = 60
ampR = 1.0814 # amplitud for Red signal
minR = 0.0 # Desplacement from zero for Red signal
ampIR = 1.12 # amplitud for InfraRed signal
minIR = 0.7 # Desplacement from zero for Red signal
# infinite loop to generate the signal
while True:
t = time()-t0
y = ppg_parameters(hr, minR, ampR, minIR, ampIR, t)
q.put([t, y[0], y[1]])
if __name__ == "__main__":
_exit = Event()
q = Queue()
p = Process(target=loop, args=(q,))
p.start()
# starts the main process
while q.qsize() != 1:
try:
data = q.get(True,2) # takes each data from the queue
print(data[0], data[1], data[2])
except KeyboardInterrupt:
p.terminate()
p.join()
print('supposed to stop')
break
Why is this happening? Perhaps it is the while loop in my 2nd process? I don't know; I haven't seen this issue anywhere.
Moreover, if I run the same code on my RPi 3 Model B, at some point it raises an error saying "the queue is empty", as if the main process were running faster than process two.
Any guess as to why this is happening, or any suggestion or link, would be helpful.
Thanks
It looks like inside your infinite loop you are adding to the queue, and I'm guessing that you are adding data faster than it can be taken off the queue by the other process.
You could check the queue size periodically from inside the infinite loop, and if it is over a certain amount (say 500 items), sleep for a few seconds and then check again.
https://docs.python.org/2/library/queue.html#Queue.Queue.qsize
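A minimal sketch of that throttling idea applied to the loop() producer from the question, assuming t0 and ppg_parameters as defined there (the 500-item threshold and the 2-second pause are arbitrary example values; also note that multiprocessing.Queue.qsize() is approximate and raises NotImplementedError on some platforms such as macOS):
from time import time, sleep

def loop(q):
    """Producer from the question, throttled so the queue cannot grow without bound."""
    hr = 60
    ampR, minR = 1.0814, 0.0
    ampIR, minIR = 1.12, 0.7
    while True:
        # if the consumer is falling behind, pause instead of piling up more items
        if q.qsize() > 500:
            sleep(2)
            continue
        t = time() - t0
        y = ppg_parameters(hr, minR, ampR, minIR, ampIR, t)
        q.put([t, y[0], y[1]])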

Run parallel Stata do files in Python using multiprocessing and subprocess

I have a Stata do file, pyexample3.do, which uses its argument as a regressor to run a regression. The F-statistic from the regression is saved in a text file. The code is as follows:
clear all
set more off
local y `1'
display `"first parameter: `y'"'
sysuse auto
regress price `y'
local f=e(F)
display "`f'"
file open myhandle using test_result.txt, write append
file write myhandle "`f'" _n
file close myhandle
exit, STATA clear
Now I am trying to run the Stata do file in parallel in Python and write all the F-statistics to one text file. My CPU has 4 cores.
import multiprocessing
import subprocess

def work(staname):
    dofile = "pyexample3.do"
    cmd = ["StataMP-64.exe", "/e", "do", dofile, staname]
    return subprocess.call(cmd, shell=False)

if __name__ == '__main__':
    my_list = ["mpg", "rep78", "headroom", "trunk", "weight", "length",
               "turn", "displacement", "gear_ratio"]
    my_list.sort()
    print my_list
    # Get the number of processors available
    num_processes = multiprocessing.cpu_count()
    threads = []
    len_stas = len(my_list)
    print "+++ Number of stations to process: %s" % (len_stas)
    # run until all the threads are done, and there is no data left
    for list_item in my_list:
        # if we aren't using all the processors AND there is still data left to
        # compute, then spawn another thread
        if len(threads) < num_processes:
            p = multiprocessing.Process(target=work, args=[list_item])
            p.start()
            print p, p.is_alive()
            threads.append(p)
        else:
            for thread in threads:
                if not thread.is_alive():
                    threads.remove(thread)
Although the do file is supposed to run 9 times, as there are 9 strings in my_list, it only ran 4 times. So where did it go wrong?
In your for list_item in my_list loop, after the first 4 processes have been started, the loop goes into the else branch:
for thread in threads:
    if not thread.is_alive():
        threads.remove(thread)
As you can see, since thread.is_alive() doesn't block, this loop executes immediately, before any of those 4 processes have finished their task, and the remaining list items are simply skipped. Therefore only the first 4 processes get executed in total.
You could simply use a while loop to check the process status repeatedly at a small interval:
keep_checking = True
while keep_checking:
    for thread in threads:
        if not thread.is_alive():
            threads.remove(thread)
            keep_checking = False
    time.sleep(0.5)  # wait 0.5 s between checks
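For what it's worth, a multiprocessing.Pool handles this bookkeeping automatically: it keeps cpu_count() workers busy and queues the remaining items until a worker frees up. A sketch reusing the work function from the question (written in Python 3 syntax, unlike the Python 2 code above):
import multiprocessing

if __name__ == '__main__':
    my_list = ["mpg", "rep78", "headroom", "trunk", "weight", "length",
               "turn", "displacement", "gear_ratio"]
    my_list.sort()
    # one worker per core; the 9 jobs queue up and run as workers become free
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        return_codes = pool.map(work, my_list)
    print(return_codes)  # exit status of each Stata run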

Python numpy.fft very slow (10x slower) when run in subprocess

I've found that numpy.fft.fft (and its variants) is very slow when run in the background. Here is an example of what I'm talking about:
import numpy as np
import multiprocessing as mproc
import time
import sys

# the producer function, which will run in the background and produce data
def Producer(dataQ):
    numFrames = 5
    n = 0
    while n < numFrames:
        data = np.random.rand(3000, 200)
        dataQ.put(data)  # send the data to the consumer
        time.sleep(0.1)  # sleep for 0.1 second so we don't overload the CPU
        n += 1

# the consumer function, which will run in the background and consume data from the producer
def Consumer(dataQ):
    while True:
        data = dataQ.get()
        t1 = time.time()
        fftdata = np.fft.rfft(data, n=3000*5)
        tDiff = time.time() - t1
        print("Elapsed time is %0.3f" % tDiff)
        time.sleep(0.01)
        sys.stdout.flush()

# the if __name__ == '__main__' guard is necessary so this block is only run when the
# program is started by the user, not when the module is imported by a child process
if __name__ == '__main__':
    data = np.random.rand(3000, 200)
    t1 = time.time()
    fftdata = np.fft.rfft(data, n=3000*5, axis=0)
    tDiff = time.time() - t1
    print("Elapsed time is %0.3f" % tDiff)

    # generate a queue for transferring data between the producer and the consumer
    dataQ = mproc.Queue(4)

    # start up the processes
    producerProcess = mproc.Process(target=Producer, args=[dataQ], daemon=False)
    consumerProcess = mproc.Process(target=Consumer, args=[dataQ], daemon=False)
    print("starting up processes")
    producerProcess.start()
    consumerProcess.start()

    time.sleep(10)  # let the program run for 10 seconds
    producerProcess.terminate()
    consumerProcess.terminate()
The output it produces on my machine:
Elapsed time is 0.079
starting up processes
Elapsed time is 0.859
Elapsed time is 0.861
Elapsed time is 0.878
Elapsed time is 0.863
Elapsed time is 0.758
As you can see, it is roughly 10x slower when run in the background, and I can't figure out why this would be the case. The time.sleep() calls should ensure that the other processes (the main process and the producer process) aren't doing anything while the FFT is being computed, so it should be able to use all the cores. I've checked CPU utilization through the Windows Task Manager and it sits at about 25% when numpy.fft.fft is called heavily, in both the single-process and multiprocess cases.
Anyone have an idea what's going on?
The main problem is that your fft call in the background process is:
fftdata = np.fft.rfft(data, n=3000*5)
rather than:
fftdata = np.fft.rfft(data, n=3000*5, axis=0)
which for me made all the difference.
There are a few other things worth noting. Rather than having the time.sleep() calls everywhere, why not just let the processor take care of this itself? Furthermore, rather than suspending the main process, you can use
consumerProcess.join()
and then have the producer process run dataQ.put(None) once it has finished loading the data, and break out of the loop in the consumer process, i.e.:
def Consumer(dataQ):
    while True:
        data = dataQ.get()
        if data is None:
            break
        ...
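Putting these suggestions together, a sketch of how the two worker functions and the bottom of the script might look, assuming the same imports as the question (numpy as np, multiprocessing as mproc, time); the None sentinel is just a convention for "no more data":
def Producer(dataQ):
    for _ in range(5):
        dataQ.put(np.random.rand(3000, 200))
    dataQ.put(None)  # sentinel: tells the consumer there is no more data

def Consumer(dataQ):
    while True:
        data = dataQ.get()
        if data is None:
            break  # producer is finished
        t1 = time.time()
        fftdata = np.fft.rfft(data, n=3000*5, axis=0)  # note axis=0, as suggested above
        print("Elapsed time is %0.3f" % (time.time() - t1))

if __name__ == '__main__':
    dataQ = mproc.Queue(4)
    producerProcess = mproc.Process(target=Producer, args=(dataQ,))
    consumerProcess = mproc.Process(target=Consumer, args=(dataQ,))
    producerProcess.start()
    consumerProcess.start()
    producerProcess.join()   # wait for the producer to finish feeding the queue
    consumerProcess.join()   # then wait for the consumer to drain it and exit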
