Python - multiprocessing max # of processes

I would like to create and run at most N processes at once.
As soon as a process is finished, a new one should take its place.
The following code works (assuming Dostuff is the function to execute).
The problem is that I am using a loop and need time.sleep to allow
the processes to do their work. This is rather inefficient.
What's the best method for this task?
import time, multiprocessing

if __name__ == "__main__":
    Jobs = []
    for i in range(10):
        # wait until fewer than 4 jobs are still running
        while len(Jobs) >= 4:
            NotDead = []
            for Job in Jobs:
                if Job.is_alive():
                    NotDead.append(Job)
            Jobs = NotDead
            time.sleep(0.05)
        NewJob = multiprocessing.Process(target=Dostuff)
        Jobs.append(NewJob)
        NewJob.start()
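A sketch of one way to drop the time.sleep polling from the loop above: every multiprocessing.Process has a sentinel handle that becomes ready when the process ends, and multiprocessing.connection.wait can block on several of them at once. The names dostuff, run_all and MAX_PROCS are hypothetical stand-ins for Dostuff and the question's loop, not part of the original code.

```python
import multiprocessing
from multiprocessing.connection import wait

MAX_PROCS = 4  # the "N" from the question


def dostuff(i):
    # hypothetical stand-in for Dostuff: a bit of CPU work
    sum(range(100_000))


def run_all(n_jobs=10, max_procs=MAX_PROCS):
    running = []
    exitcodes = []
    for i in range(n_jobs):
        while len(running) >= max_procs:
            # Block until at least one child has exited; no sleep() polling.
            ready = wait([p.sentinel for p in running])
            for p in [p for p in running if p.sentinel in ready]:
                p.join()  # already finished, so this returns promptly
                exitcodes.append(p.exitcode)
                running.remove(p)
        p = multiprocessing.Process(target=dostuff, args=(i,))
        p.start()
        running.append(p)
    for p in running:  # wait for the last batch
        p.join()
        exitcodes.append(p.exitcode)
    return exitcodes


if __name__ == "__main__":
    print(run_all())
```

A new process is started as soon as any running one exits, so at most max_procs children exist at a time.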
After a bit of tinkering, I thought about creating new threads and then
launching my processes from these threads like so:
import threading, multiprocessing, time

def processf(num):
    print("in process:", num)
    now = time.perf_counter()  # time.clock() was removed in Python 3.8
    while time.perf_counter() - now < 2:
        pass  ##..Intensive processing..

def main():
    z = [0]  # mutable counter shared by the threads (`nonlocal` would also work)
    lock = threading.Lock()

    def threadf():
        while True:
            with lock:
                # test and increment under the lock, so two threads cannot
                # both pass the check and launch more than 20 processes
                if z[0] >= 20:
                    break
                work = multiprocessing.Process(target=processf, args=(z[0],))
                z[0] = z[0] + 1
            work.start()
            work.join()

    activet = []
    for i in range(2):
        newt = threading.Thread(target=threadf)
        activet.append(newt)
        newt.start()
    for i in activet:
        i.join()

if __name__ == "__main__":
    main()
This solution is better (it doesn't slow down the launched processes); however,
I wouldn't really trust code that I wrote in a field I don't know.
I've had to use a list (z = [0]) since an integer is immutable.
Is there a way to embed processf into main()? I'd prefer not to need an additional
global function. If I simply copy/paste the function inside, I get a nasty error
(AttributeError: Can't pickle local object 'main.<locals>.processf').

Why not use concurrent.futures.ThreadPoolExecutor?
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=20)
res = executor.submit(any_def)  # any_def: the callable to run; res is a Future
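Since the question runs its work in processes (in CPython, pure-Python CPU-bound code in threads does not run in parallel because of the GIL), the process-based sibling ProcessPoolExecutor is probably the closer fit. A minimal sketch, with dostuff and run_all as hypothetical names; the function must live at module level for the same reason the local processf could not be pickled.

```python
from concurrent.futures import ProcessPoolExecutor

MAX_WORKERS = 4  # at most this many worker processes exist at once


def dostuff(i):
    # hypothetical stand-in for the question's Dostuff; it must be a top-level
    # function so it can be pickled and sent to the worker processes
    return i * i


def run_all(n_jobs=10, max_workers=MAX_WORKERS):
    # the executor keeps up to max_workers processes busy and hands each one
    # the next job as soon as it finishes its previous one
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(dostuff, range(n_jobs)))


if __name__ == "__main__":
    print(run_all())
```

executor.map returns results in submission order, and leaving the with block waits for all jobs, so no sleep loop or helper threads are needed.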


How to stop all processes if one of them changes a global "stop" variable to True

import multiprocessing

global stop
stop = False

def makeprocesses():
    processes = []
    for _ in range(50):
        p = multiprocessing.Process(target=runprocess)
        processes.append(p)
    for _ in range(50):
        processes[_].start()
    runprocess()

def runprocess():
    global stop
    while stop == False:
        x = 1  # do something here
        if x == 1:
            stop = True
            makeprocesses()
    while stop == True:
        x = 0

if __name__ == '__main__':
    makeprocesses()
How could I make all the other 49 processes stop if just one changes stop to True?
I would think that since stop is a global variable, once one process changes stop, all the others would stop.
No. Each process gets its own copy. It's global to the script, but not across processes. Remember that each process has a completely separate address space. It gets a COPY of the first process' data.
If you need to communicate across processes, you need to use one of the synchronization techniques in the multiprocessing documentation (https://docs.python.org/3/library/multiprocessing.html#synchronization-primitives), like an Event or a shared object.
Whenever you want to synchronize processes, you need some shared context, and you need to make sure it is safe. As @Tim Roberts mentioned, suitable primitives can be taken from (https://docs.python.org/3/library/multiprocessing.html#synchronization-primitives).
Try something like this:
import multiprocessing
from multiprocessing import Event
from time import sleep

def makeprocesses():
    processes = []
    e = Event()  # one flag shared by every child process
    for i in range(50):
        p = multiprocessing.Process(target=runprocess, args=(e, i))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()

def runprocess(e, name=0):
    while not e.is_set():
        sleep(1)
        if name == 1:
            e.set()  # here we make all other processes stop
    print("end")

if __name__ == '__main__':
    makeprocesses()
My favorite way is to use a cancellation token, which is an object wrapping what we did here.
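A minimal sketch of such a token, assuming it is simply a small class around a multiprocessing.Event (CancellationToken and its cancel/cancelled members are hypothetical names, not a standard-library API). The token is passed to each Process as an argument, so all children share the same underlying flag.

```python
import multiprocessing
import time


class CancellationToken:
    """Hypothetical minimal cancellation token wrapping a multiprocessing.Event,
    so every process that receives the token sees the same flag."""

    def __init__(self):
        self._event = multiprocessing.Event()

    def cancel(self):
        self._event.set()

    @property
    def cancelled(self):
        return self._event.is_set()


def runprocess(token, name):
    while not token.cancelled:
        time.sleep(0.01)    # one small unit of work
        if name == 1:
            token.cancel()  # any process may request that all of them stop


def makeprocesses(n=5):
    token = CancellationToken()
    processes = [multiprocessing.Process(target=runprocess, args=(token, i))
                 for i in range(n)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return token.cancelled, [p.exitcode for p in processes]


if __name__ == "__main__":
    print(makeprocesses())  # (True, [0, 0, 0, 0, 0])
```

The worker code only asks "am I cancelled?", so the same functions could later be driven by a different token implementation without changes.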

How do read and writes work with a manager in Python?

Sorry if this is a stupid question, but I'm having trouble understanding how managers work in python.
Let's say I have a manager that contains a dictionary to be shared across all processes. I want to have just one process writing to the dictionary at a time, while many others read from the dictionary.
Can this happen concurrently, with no synchronization primitives or will something break if read/writes happen at the same time?
What if I want to have multiple processes writing to the dictionary at once - is that allowed or will it break (I know it could cause race conditions, but could it error out)?
Additionally, does a manager process each read and write transaction in a queue-like fashion, one at a time, or does it do them all at once?
https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes
It depends on how you write to the dictionary, i.e. whether the operation is atomic or not:
my_dict[some_key] = 9 # this is atomic
my_dict[some_key] += 1 # this is not atomic
So creating a new key or updating an existing key, as in the first line of code above, is an atomic operation. But the second line of code is really multiple operations, equivalent to:
temp = my_dict[some_key]
temp = temp + 1
my_dict[some_key] = temp
So if two processes were executing my_dict[some_key] += 1 in parallel, they could both read the same value with temp = my_dict[some_key], both increment temp to the same new value, and the net effect would be that the dictionary value is incremented only once. This can be demonstrated as follows:
from multiprocessing import Pool, Manager, Lock

def init_pool(the_lock):
    global lock
    lock = the_lock

def worker1(d):
    for _ in range(1000):
        with lock:
            d['x'] += 1

def worker2(d):
    for _ in range(1000):
        d['y'] += 1

if __name__ == '__main__':
    lock = Lock()
    with Manager() as manager, \
         Pool(4, initializer=init_pool, initargs=(lock,)) as pool:
        d = manager.dict()
        d['x'] = 0
        d['y'] = 0
        # worker1 will serialize with a lock
        pool.apply_async(worker1, args=(d,))
        pool.apply_async(worker1, args=(d,))
        # worker2 will not serialize with a lock:
        pool.apply_async(worker2, args=(d,))
        pool.apply_async(worker2, args=(d,))
        # wait for the 4 tasks to complete:
        pool.close()
        pool.join()
        print(d)
Prints:
{'x': 2000, 'y': 1162}
Update
As far as serialization goes:
By default, BaseManager creates a server that listens on a Unix domain socket on Linux and on a named pipe on Windows. So essentially every method you execute against a managed dictionary, for example, is much like a remote method call implemented with message passing. This also means that the server could be running on a different computer altogether (when it is given a TCP address). But these method calls are not serialized: the server handles each client connection in its own thread (and each client thread or process uses its own connection), so calls can run concurrently, and the methods of the managed object must therefore be thread-safe.
The following is an example of creating our own managed type, with the server listening for requests possibly coming from a different computer (although in this example the client runs on the same computer). The client calls increment on the managed object 1000 times across two threads, but the method implementation is not done under a lock, so the final value of self.x is not 1000. Also, when we retrieve the value of x twice concurrently with method get_x, we see that both invocations start at more or less the same time:
from multiprocessing.managers import BaseManager
from multiprocessing.pool import ThreadPool
from threading import Event, Thread, get_ident
import time

class MathManager(BaseManager):
    pass

class MathClass:
    def __init__(self, x=0):
        self.x = x

    def increment(self, y):
        temp = self.x
        time.sleep(.01)
        self.x = temp + 1

    def get_x(self):
        print(f'get_x started by thread {get_ident()}', time.time())
        time.sleep(2)
        return self.x

    def set_x(self, value):
        self.x = value

def server(event1, event2):
    MathManager.register('Math', MathClass)
    manager = MathManager(address=('localhost', 5000), authkey=b'abracadabra')
    manager.start()
    event1.set()  # show we are started
    print('Math server running; waiting for shutdown...')
    event2.wait()  # wait for shutdown
    print("Math server shutting down.")
    manager.shutdown()

def client():
    MathManager.register('Math')
    manager = MathManager(address=('localhost', 5000), authkey=b'abracadabra')
    manager.connect()
    math = manager.Math()
    pool = ThreadPool(2)
    pool.map(math.increment, [1] * 1000)
    results = [pool.apply_async(math.get_x) for _ in range(2)]
    for result in results:
        print(result.get())

def main():
    event1 = Event()
    event2 = Event()
    t = Thread(target=server, args=(event1, event2))
    t.start()
    event1.wait()  # server started
    client()  # now we can run client
    event2.set()
    t.join()

# Required for Windows:
if __name__ == '__main__':
    main()
Prints:
Math server running; waiting for shutdown...
get_x started by thread 43052 1629375415.2502146
get_x started by thread 71260 1629375415.2502146
502
502
Math server shutting down.
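The way to get 1000 is to make the managed object's methods thread-safe themselves. A sketch, assuming it is acceptable to hold a threading.Lock inside the object that lives in the manager's server process; SafeMathManager and SafeMathClass are hypothetical names, and this version uses the manager's default local address instead of the TCP address above.

```python
import threading
import time
from multiprocessing.managers import BaseManager
from multiprocessing.pool import ThreadPool


class SafeMathManager(BaseManager):
    pass


class SafeMathClass:
    """Like MathClass, but increment() holds a lock for the whole
    read-modify-write, so concurrent server threads cannot interleave."""

    def __init__(self, x=0):
        self.x = x
        self._lock = threading.Lock()  # lives in the server process

    def increment(self, y):
        with self._lock:
            temp = self.x
            time.sleep(.001)  # widen the race window, as in the example above
            self.x = temp + 1

    def get_x(self):
        with self._lock:
            return self.x


SafeMathManager.register('SafeMath', SafeMathClass)


def run(n=200):
    with SafeMathManager() as manager:  # __enter__ starts the server process
        math = manager.SafeMath()
        with ThreadPool(2) as pool:
            pool.map(math.increment, [1] * n)
        return math.get_x()


if __name__ == '__main__':
    print(run())  # 200
```

The lock serializes only increment and get_x on this one object; other managed objects are unaffected.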

Run a function (in a new thread) on a subprocess that is already running

I have some expensive long-running functions that I'd like to run on multiple cores. This is easy to do with multiprocessing. But I will also need to periodically run a function that calculates a value based on the state (global variables) of a specific process. I think this should be possible by simply spawning a thread on the subprocess.
Here's a simplified example. Please suggest how I can call process_query_state().
import multiprocessing
import time

def process_runner(x: int):
    global xx
    xx = x
    while True:
        time.sleep(0.1)
        xx += 1  # actually an expensive calculation

def process_query_state() -> int:
    y = xx * 2  # actually an expensive calculation
    return y

def main():
    processes = {}
    for x in range(10):
        p = multiprocessing.get_context('spawn').Process(target=process_runner, args=(x,))
        p.start()
        processes[x] = p
    while True:
        time.sleep(1)
        print(processes[3].process_query_state())  # this doesn't actually work

if __name__ == '__main__':
    main()
I see two problems:
Process is not RPC (Remote Procedure Call), so you can't execute another function such as process_query_state from the main process. You can only use a queue to send some information to the other process, and that process has to check periodically whether there is a new message.
A process can run only one function, so it would have to stop one function when it gets a message to run another, or it would have to start threads inside the new processes to run several functions at the same time.
EDIT: This may cause another problem: if two functions work on the same data at the same time, one can change a value before the other uses the old value, and this can produce wrong results.
I created an example which uses queues to send a message to process_runner; process_runner periodically checks whether there is a message, runs process_query_state, and sends the result back to the main process.
The main process waits for the result from the selected process, which blocks the code; if you want to work with more processes at once, it would have to be more complex.
import multiprocessing
import time

def process_query_state():
    y = xx * 2  # actually an expensive calculation
    return y

def process_runner(x: int, queue_in, queue_out):
    global xx
    xx = x
    # reverse direction
    q_in = queue_out
    q_out = queue_in
    while True:
        time.sleep(0.1)
        xx += 1  # actually an expensive calculation
        # run other function - it will block main calculations
        # but this way it will use correct `xx` (other calculations will not change it)
        if not q_in.empty():
            if q_in.get() == 'run':
                result = process_query_state()
                q_out.put(result)

def main():
    processes = {}
    for x in range(4):
        ctx = multiprocessing.get_context('spawn')
        q_in = ctx.Queue()
        q_out = ctx.Queue()
        p = ctx.Process(target=process_runner, args=(x, q_in, q_out))
        p.start()
        processes[x] = (p, q_in, q_out)
    while True:
        time.sleep(1)
        q_in = processes[3][1]
        q_out = processes[3][2]
        q_out.put('run')
        # non blocking version
        #if not q_in.empty():
        #    print(q_in.get())
        # blocking version
        print(q_in.get())

if __name__ == '__main__':
    main()
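The question's original idea (a thread inside the subprocess that answers queries) can also be sketched, assuming one multiprocessing.Pipe per child and a threading.Lock that guards the shared state, which addresses the race described in the answer. The helper thread answers immediately instead of waiting for the main loop to poll a queue. query, demo and the message strings "query"/"stop" are hypothetical.

```python
import multiprocessing
import threading
import time


def process_runner(x, conn):
    # Child process: the main loop keeps doing the "expensive" work while a
    # helper thread answers queries that arrive over the pipe `conn`.
    state = {"xx": x}
    lock = threading.Lock()  # guards state["xx"] against concurrent access

    def query_listener():
        while True:
            msg = conn.recv()
            if msg == "stop":
                break
            with lock:
                y = state["xx"] * 2  # the work of process_query_state()
            conn.send(y)

    listener = threading.Thread(target=query_listener, daemon=True)
    listener.start()
    while listener.is_alive():  # run until the parent sends "stop"
        time.sleep(0.01)
        with lock:
            state["xx"] += 1  # actually an expensive calculation


def query(conn):
    conn.send("query")
    return conn.recv()


def demo(n_queries=3):
    ctx = multiprocessing.get_context("spawn")
    parent_conn, child_conn = ctx.Pipe()
    p = ctx.Process(target=process_runner, args=(3, child_conn))
    p.start()
    answers = []
    for _ in range(n_queries):
        time.sleep(0.05)
        answers.append(query(parent_conn))
    parent_conn.send("stop")
    p.join()
    return answers


if __name__ == "__main__":
    print(demo())
```

The lock means a query never sees a half-updated value, at the cost of briefly pausing the main loop while the answer is computed.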

Child process hangs, preventing main process from terminating

Good afternoon,
I am trying to parallelize a linear programming solving scheme; the code is partially reproduced below. The solving method makes use of the PuLP library, which uses subprocesses to run solver commands.
from collections import OrderedDict
from time import time
from multiprocessing import Queue, Process
from queue import Empty
from os import getpid, path, mkdir
import sys

import pulp as plp  # PuLP, referred to as `plp` below

SOLVER = None
NUMBER_OF_PROCESSES = 12
# other parameters

def choose_solver():
    """Choose an initial solver"""
    if SOLVER == "CHOCO":
        solver = plp.PULP_CHOCO_CMD()
    elif SOLVER == "GLPK":
        solver = plp.GLPK_CMD(msg=0)
    elif SOLVER == "GUROBI":
        solver = plp.GUROBI_CMD(msg=0)
    else:
        solver = plp.PULP_CBC_CMD(msg=0)
    return solver

# other functions that are not multiprocess relevant

def is_infeasible(status):
    """Wrapper around PuLP infeasible status"""
    return status in (plp.LpStatusInfeasible, plp.LpStatusUndefined)

def feasible_problems(input_var, output_var, initial_problem, solver):
    """Perform LP solving on an initial
    problem, return the feasible ones"""
    input_gt = input_var - TOL >= 0
    input_lt = input_var + TOL <= 0
    output_eq_input = (output_var - input_var == 0)
    output_eq_zero = (output_var == 0)
    problem_a = initial_problem.deepcopy()
    problem_a += input_gt
    problem_a += output_eq_input
    problem_b = initial_problem.deepcopy()
    problem_b += input_lt
    problem_b += output_eq_zero
    problem_a.solve(solver)
    problem_b.solve(solver)
    status_act = problem_a.status
    status_inact = problem_b.status
    if is_infeasible(status_act):
        return (problem_b,)
    else:
        if is_infeasible(status_inact):
            return (problem_a,)
        else:
            return (problem_a, problem_b)

def worker(q, r, start_problem, start_idx, to_check):
    """Worker spawned in a new process.
    Iterates over the neuron expression list.
    Sends a new job to the tasks queue if two activations are available.
    """
    problem = start_problem
    solver = choose_solver()
    for idx in range(start_idx, len(to_check) + 1):
        if idx == len(to_check):
            r.put_nowait(problem)
        else:
            output_var, input_var = to_check[idx]
            pbs = feasible_problems(input_var, output_var, problem, solver)
            if len(pbs) == 1:
                problem = pbs[0]
            elif len(pbs) == 2:
                q.put_nowait((idx+1, pbs[0]))
                problem = pbs[1]

def overseer(init_prob, neuron_exprs):
    """Running in the initial process,
    this function creates tasks and results queues,
    maintains the number of currently running processes
    and spawns new processes when there are enough resources
    for them to run.
    """
    tasks = Queue()
    results = Queue()
    working_processes = {}
    init_p = Process(target=worker,
                     args=(tasks, results, init_prob, 0, neuron_exprs))
    init_p.start()
    working_processes[init_p.pid] = init_p
    res_list = []
    while len(working_processes) > 0:
        if len(working_processes) <= NUMBER_OF_PROCESSES:
            # if there is enough room in the working queue,
            # spawn a new process and add it
            try:
                (idx, problem) = tasks.get(timeout=1)
            except Empty:
                break
            proc = Process(target=worker, args=(tasks,
                           results, problem, idx, neuron_exprs))
            proc.start()
            working_processes[proc.pid] = proc
        to_del = []
        for pid in working_processes:
            pwork = working_processes[pid]
            pwork.join(timeout=0)
            if pwork.exitcode is not None:
                to_del.append(pid)
        for pid in to_del:
            # deleting working process
            del working_processes[pid]
    results.join_thread()
    for i in range(results.qsize()):
        elt = results.get()
        res_list.append(elt)
    return res_list

def test_multi(init_prob, neuron_exprs):
    print("Testing multi process mode")
    now = time()
    init_prob, exprs = #some function that calculate those
    res = overseer(init_prob, exprs)
    print("Time spent: {:.4f}s".format(time()-now))
    for idx, problem in enumerate(res):
        if not path.exists("results"):
            mkdir("results")
        problem.writeLP("results/"+str(idx))

if __name__ == '__main__':
    torch_model = read_model(MODEL_PATH)
    print("Number of neurons: ", count_neurons(torch_model))
    print("Expected number of facets: ",
          theoretical_number(torch_model, DIM_INPUT))
    prob, to_check, hp, var_dict = init_problem(torch_model)
    test_multi(prob, to_check)
In my worker, I perform some costly calculations that may result in two different problems; if that happens, I send one problem to a tasks queue while keeping the other for the current worker process. My overseer takes a task from the queue and launches a process when it can.
to_check is a list of PuLP expressions.
What I want to do is to fill the working_processes dictionary with processes that are actually running, then look for their results at each iteration and remove those that have finished. The expected behaviour is to keep spawning new processes when old ones terminate, which does not seem to be the case. Instead, the program hangs indefinitely: I successfully take the tasks from the queue, but my program hangs when I spawn more than NUMBER_OF_PROCESSES.
I'm quite new to multiprocessing, so there is maybe something wrong with how I'm doing it. Does anyone have any idea?
Take a look at the ProcessPoolExecutor from concurrent.futures.
Executor objects allow you to specify a pool of workers with a capped size. You can submit all your jobs simultaneously and the executors run through them picking up new jobs as old ones are completed.
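One documented pitfall that may explain the hang: a process that has put items on a multiprocessing.Queue does not terminate until all buffered items have been flushed to the underlying pipe, so once the overseer stops reading tasks (when more than NUMBER_OF_PROCESSES are running), children can block at exit and their exitcode never gets set. With an executor, new tasks can be returned as the function's result instead of going through a queue. A sketch, with a toy branching rule standing in for feasible_problems/worker; expand, run_branching and the rule are hypothetical, and real PuLP problems would need to be picklable.

```python
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait

MAX_WORKERS = 4  # plays the role of NUMBER_OF_PROCESSES


def expand(idx, state, n):
    """Hypothetical stand-in for `worker`: walk positions idx..n-1 and, at each
    position where both branches are "feasible", keep one branch and return
    the other as a new task instead of putting it on a Queue."""
    new_tasks = []
    for i in range(idx, n):
        branch_a, branch_b = state + (0,), state + (1,)
        if i % 2 == 0:  # toy rule: both branches feasible at even positions
            new_tasks.append((i + 1, branch_b))
        state = branch_a
    return state, new_tasks


def run_branching(n):
    results = []
    with ProcessPoolExecutor(max_workers=MAX_WORKERS) as pool:
        pending = {pool.submit(expand, 0, (), n)}
        while pending:
            # block until at least one task finishes (no polling loop)
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                final_state, new_tasks = fut.result()
                results.append(final_state)
                for idx, state in new_tasks:
                    pending.add(pool.submit(expand, idx, state, n))
    return results


if __name__ == "__main__":
    print(len(run_branching(6)))  # 8 leaves: 3 branch points -> 2**3
```

The pool caps concurrency, and the loop ends exactly when no submitted task is still pending, so there is no timeout-based break.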

How to structure code to be able to launch tasks that can kill/replace each other

I have a Python program that does the following:
1) wait endlessly on the COM port for a command character
2) on character reception, launch a new thread to execute a particular piece of code
What I would need to do if a new command is received is:
1) kill the previous thread
2) launch a new one
I read here and there that doing so is not the right way to proceed.
What would be the best way to do this, knowing that I need to do it in the same process, so I guess I need to use threads?
I would suggest two different approaches:
if your processes are both called internally from a function, you could set a timeout on the first function.
if you are running external script, you might want to kill the process.
Let me try to be more precise in my question by adding an example of my code structure.
Suppose the synchronous functionA is still running because it is waiting internally for a particular event; if command "c" is received, I need to stop functionA and launch functionC.
import threading

def functionA():
    ...
    # call a synchronous serviceA that can take several seconds or even more to execute
    ...

def functionB():
    ...
    # call a synchronous serviceB that returns almost immediately
    ...

def functionC():
    ...
    # call a synchronous serviceC
    ...

#-------------------
def launch_async_task(function):
    t = threading.Thread(target=function, name="async")
    t.daemon = True  # setDaemon() is deprecated since Python 3.10
    t.start()

#------main----------
while True:
    # if COM_port is a pyserial Serial object, read() returns bytes, hence b"a"
    car = COM_port.read(1)
    if car == b"a":
        launch_async_task(functionA)
    elif car == b"b":
        launch_async_task(functionB)
    elif car == b"c":
        launch_async_task(functionC)
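You may want to run the serial port reader in a separate thread. When it receives a byte, put that byte in a queue. Have the main program loop and read the queue to decide what to do with each byte. From the main program you can then start a new thread for each command. Note, however, that join() does not kill a thread: it only waits for the thread to finish, and join(0) returns immediately, so the previous function keeps running until it ends on its own. You may also want to look into a thread pool to see if it is what you want.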
import queue
import threading

import serial  # pyserial

ser = serial.Serial("COM1", 9600)
que = queue.Queue()

def read_serial(com, q):
    # keep reading one byte at a time and hand each byte to the main loop
    while True:
        val = com.read(1)
        q.put(val)

ser_th = threading.Thread(target=read_serial, args=(ser, que), daemon=True)
ser_th.start()

th = None
while True:
    val = que.get()  # blocks until the reader thread delivers a byte
    if val == b"e":
        break  # quit
    elif val == b"a":
        if th is not None:
            th.join(0)  # does NOT kill the previous function; only waits 0 s
        th = threading.Thread(target=functionA)
        th.start()
    elif val == b"b":
        if th is not None:
            th.join(0)  # does NOT kill the previous function; only waits 0 s
        th = threading.Thread(target=functionB)
        th.start()
    elif val == b"c":
        if th is not None:
            th.join(0)  # does NOT kill the previous thread (functionA)
        th = threading.Thread(target=functionC)
        th.start()

try:
    ser.close()
    if th is not None:
        th.join(0)
except Exception:
    pass
If you are creating and joining a lot of threads you may want to just have a function that checks what command to run.
running = True

def run_options(option):
    global running  # without this, `running = False` would only create a local
    if option == 0:
        print("Running Option 0")
    elif option == 1:
        print("Running Option 1")
    else:
        running = False

while running:
    val = que.get()  # blocks until a command arrives
    run_options(val)
OK, I finally used a piece of code that uses the ctypes library to provide a kind of thread-killing function.
I know this is not a clean way to proceed, but in my case no resources are shared by the threads, so it shouldn't have any impact...
If it can help, here is the piece of code that can easily be found on the net:
import ctypes

def terminate_thread(thread):
    """Terminates a python thread from another thread.

    :param thread: a threading.Thread instance

    Note: the exception is only raised when the target thread next executes
    Python bytecode, so a thread blocked inside a long C call (for example a
    synchronous service call) is not interrupted until that call returns.
    """
    if not thread.is_alive():  # isAlive() was removed in Python 3.9
        return
    exc = ctypes.py_object(SystemExit)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(thread.ident), exc)
    if res == 0:
        raise ValueError("nonexistent thread id")
    elif res > 1:
        # """if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(thread.ident), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")
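A cooperative alternative to injecting an exception is to give each command its own threading.Event and have the long-running function check it between short steps. A sketch under that assumption: TaskRunner, function_a, function_c and demo are hypothetical, and it only works if functionA's work can be split into steps short enough to check the flag between them (a single blocking call of several seconds still cannot be interrupted this way).

```python
import threading
import time


class TaskRunner:
    """Runs at most one command thread at a time. Launching a new command
    asks the previous one to stop (via its Event) and waits for it to exit."""

    def __init__(self):
        self._thread = None
        self._cancel = None

    def launch(self, func, *args):
        if self._thread is not None:
            self._cancel.set()   # request cancellation of the running task
            self._thread.join()  # wait until it has actually returned
        self._cancel = threading.Event()
        self._thread = threading.Thread(
            target=func, args=(self._cancel, *args), daemon=True)
        self._thread.start()

    def wait(self):
        if self._thread is not None:
            self._thread.join()


def function_a(cancel, log):
    # hypothetical long task, split into short steps so it can react
    for _ in range(100):
        if cancel.is_set():
            log.append("a cancelled")
            return
        time.sleep(0.01)  # stand-in for one short piece of serviceA
    log.append("a finished")


def function_c(cancel, log):
    log.append("c ran")  # hypothetical quick task


def demo():
    log = []
    runner = TaskRunner()
    runner.launch(function_a, log)
    time.sleep(0.05)               # command "c" arrives while A is running
    runner.launch(function_c, log)
    runner.wait()
    return log


if __name__ == "__main__":
    print(demo())  # ['a cancelled', 'c ran']
```

In the serial loop, each received command character would map to one runner.launch(...) call.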
