I want to develop a system that reads input from two devices at the same time. Each process works independently at the moment but since I need to sync them, I want them both to write their output on the same file.
import multiprocessing as mp
from multiprocessing import Process
from multiprocessing import Pool
import time

# running the data acquisition from the screen
def Screen(fname):
    for x in range(1, 9):
        fname.write(str(x)+ '\n')
        fname.flush()
        time.sleep(0.5)
        print(x)

# running the data acquisition from the EEG
def EEG(fname):
    for y in range(10, 19):
        fname.write(str(y)+ '\n')
        fname.flush()
        time.sleep(0.3)
        print(y)

# main program body #
# open the common file that the processes write to
fname = open('C:/Users/Yaron/Documents/Python Scripts/research/demofile.txt', 'w+')
pool = Pool(processes=2)
p1 = pool.map_async(Screen,fname)
p2 = pool.map_async(EEG,fname)
print ('end')
fname.close()
In multiprocessing, depending on the OS you may not be able to pass an open file handle to the process. Here's code that should work on any OS:
import multiprocessing as mp
import time

def Screen(fname,lock):
    with open(fname,'a') as f:
        for y in range(1,11):
            time.sleep(0.5)
            with lock:
                print(y)
                print(y,file=f,flush=True)

def EEG(fname,lock):
    with open(fname,'a') as f:
        for y in range(11, 21):
            time.sleep(0.3)
            with lock:
                print(y)
                print(y,file=f,flush=True)

if __name__ == '__main__':
    fname = 'demofile.txt'
    lock = mp.Lock()
    with open(fname,'w'): pass # truncates existing file and closes it
    processes = [mp.Process(target=Screen,args=(fname,lock)),
                 mp.Process(target=EEG,args=(fname,lock))]
    s = time.perf_counter()
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print (f'end (time={time.perf_counter() - s}s)')
Some notes:
Open the file in each process. Windows, for example, doesn't fork() the process and doesn't inherit the handle. The handle isn't picklable to pass between processes.
Open the file for append. Two processes would have two different file pointers. Append makes sure it seeks to the end each time.
Protect the file accesses with a lock for serialization. Create the lock in the main process and pass the same Lock to each child process.
Use if __name__ == '__main__': to run one-time code only in the main process. Platforms that spawn rather than fork (Windows, for example) import the script again in each child process, and this guard keeps that code from running multiple times.
map_async isn't used correctly. It takes an iterable of arguments to pass to the function. Instead, make the two processes, start them, and join them to wait for completion.
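The note about handles not being picklable can be checked without starting any processes. This is a minimal sketch, assuming Python 3; the helper name try_pickle_open_file is made up for illustration, and os.devnull is used only so that no real file is created:

```python
import os
import pickle


def try_pickle_open_file():
    """Try to pickle an open file object; return the exception, or None on success."""
    with open(os.devnull, 'w') as f:
        try:
            pickle.dumps(f)
        except TypeError as exc:
            # Open file objects refuse to be pickled, so they cannot be
            # sent to a worker process as a pickled argument.
            return exc
    return None


if __name__ == '__main__':
    print(type(try_pickle_open_file()).__name__)
```

Passing the file name (a plain string) and opening the file inside each process sidesteps the problem.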
So this is the first time I am playing around with threading, so please bear with me here. In my main application (which I will implement this into), I need to add multithreading to my script. The script will read account info from a text file, then log in and do some tasks with that account. I need to make sure that threads aren't reading the same line from the accounts text file, since that would screw everything up, and I'm not quite sure how to do that.
from multiprocessing import Queue, Process
from threading import Thread
from time import sleep

urls_queue = Queue()
max_process = 10

def dostuff():
    with open ('acc.txt', 'r') as accounts:
        for account in accounts:
            account.strip()
            split = account.split(":")
            a = {
                'user': split[0],
                'pass': split[1],
                'name': split[2].replace('\n', ''),
            }
            sleep(1)
            print(a)
    for i in range(max_process):
        urls_queue.put("DONE")

def doshit_processor():
    while True:
        url = urls_queue.get()
        if url == "DONE":
            break

def main():
    file_reader_thread = Thread(target=dostuff)
    file_reader_thread.start()
    procs = []
    for i in range(max_process):
        p = Process(target=doshit_processor)
        procs.append(p)
        p.start()
    for p in procs:
        p.join()
    print('all done')
    # wait for all tasks in the queue
    file_reader_thread.join()

if __name__ == '__main__':
    main()
So at the moment I don't think the threading is even working, because it's printing one account out per second, even with 10 threads. So it should be printing 10 accounts per second which it isn't which has me confused. Also I am not sure how to make sure that threads won't pick the same account line. Help by a big brain is much appreciated
The problem is that you create a single thread to generate the data for your processes but then don't post that data to the queue. You sleep in that single thread so you see one item generated per second and then... nothing because the item isn't queued. It seems that all you are doing is creating a process pool and the inbuilt multiprocessing.Pool should work for you.
I've set pool "chunk size" low so that workers are only given 1 work item at a time. This is good for workflows where processing time can vary for each work item. By default, pool tries to optimize for the case where processing time is roughly equivalent and instead tries to reduce interprocess communication time.
Your data looks like a colon-separated file and you can use csv to cut down the processing there too. This smaller script should do what you want.
import multiprocessing as mp
from time import sleep
import csv

max_process = 10

def doshit_processor(row):
    sleep(1) # if you want to simulate work
    print(row)

def main():
    with open ('acc.txt', newline='') as accounts:
        table = list(csv.DictReader(accounts, fieldnames=('user', 'pass', 'name'),
                                    delimiter=':'))
    with mp.Pool(max_process) as pool:
        pool.map(doshit_processor, table, chunksize=1)
    print('all done')

if __name__ == '__main__':
    main()
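To see what each worker receives, the parsing step can be tried on an in-memory sample. This is a small sketch; the account lines below are invented placeholders, not taken from the real acc.txt, and parse_accounts is a hypothetical helper name:

```python
import csv
import io

# Hypothetical sample in the same user:pass:name layout as acc.txt.
SAMPLE = "alice:pw1:Alice A\nbob:pw2:Bob B\n"


def parse_accounts(text):
    """Parse colon-separated account lines into one dict per line."""
    reader = csv.DictReader(io.StringIO(text, newline=''),
                            fieldnames=('user', 'pass', 'name'),
                            delimiter=':')
    return [dict(row) for row in reader]


if __name__ == '__main__':
    for row in parse_accounts(SAMPLE):
        print(row)
```

Because pool.map hands every element of the list to exactly one worker call, no two workers end up processing the same account line.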
I have a requirement where I have to launch multiple applications, get each of their stdout,stderr and these applications run infinitely. The processes do not exchange/share data and are independent of each other. To do variable stress tests, their timeouts might be different.
For eg:
app1 -> 70sec
app2 -> 30sec
.
.
appN -> 20sec
If these were multiple apps with one common timeout, I would have wrapped it in a timed while loop and killed all processes in the end.
Here are some approaches I think should work:
A timer thread for each app, which reads stdout and as soon as it expires, kills the process. The process is launched within the thread
One timer thread that loops through a dictionary of pid/process_objects:end_time for each process and kills the process when its end_time is <= current time
I have tried using asyncio gather, but it doesn't fully meet my needs and I have faced some issues on Windows.
Are there any other approaches that I can use?
Second option is pretty production-ready. Have a control loop where you poll for processes to complete and kill them if they are timing out
Here is the code for the second approach (extended https://stackoverflow.com/a/9745864/286990).
#!/usr/bin/env python
import io
import os
import sys
from subprocess import Popen
import threading
import time
import psutil

def proc_monitor_thread(proc_dict):
    while proc_dict != {}:
        for k,v in list(proc_dict.items()):
            if time.time() > v:
                print("killing " + str(k))
                m = psutil.Process(k)
                m.kill()
                del proc_dict[k]
        time.sleep(2)

pros = {}
ON_POSIX = 'posix' in sys.builtin_module_names
# create a pipe to get data
input_fd, output_fd = os.pipe()
# start several subprocesses
st_time = time.time()
for i in ["www.google.com", "www.amd.com", "www.wix.com"]:
    proc = Popen(["ping", "-t", str(i)], stdout=output_fd,
                 close_fds=ON_POSIX) # close input_fd in children
    if "google" in i:
        pros[proc.pid] = time.time() + 5
    elif "amd" in i:
        pros[proc.pid] = time.time() + 8
    else:
        pros[proc.pid] = time.time() + 10
os.close(output_fd)
x = threading.Thread(target=proc_monitor_thread, args=(pros,))
x.start()
# read output line by line as soon as it is available
with io.open(input_fd, 'r', buffering=1) as file:
    for line in file:
        print(line, end='')
#
print("End")
x.join()
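The same control loop can also be written with only the standard library, by keeping the Popen objects themselves and using poll() and kill(). This is a minimal sketch under assumptions: the function name run_with_deadlines and the short Python child commands are placeholders for the real applications, and output capture is left out to keep the loop readable:

```python
import subprocess
import sys
import time


def run_with_deadlines(commands, poll_interval=0.05):
    """Start every (argv, timeout_seconds) pair and enforce each timeout.

    Returns a list with 'exited' or 'killed' for each command, in input order.
    """
    running = {}
    for index, (argv, timeout) in enumerate(commands):
        proc = subprocess.Popen(argv)
        running[index] = (proc, time.monotonic() + timeout)

    outcome = [None] * len(commands)
    while running:
        for index, (proc, deadline) in list(running.items()):
            if proc.poll() is not None:          # finished on its own
                outcome[index] = 'exited'
                del running[index]
            elif time.monotonic() >= deadline:   # deadline has passed
                proc.kill()
                proc.wait()                      # reap the killed child
                outcome[index] = 'killed'
                del running[index]
        time.sleep(poll_interval)
    return outcome


if __name__ == '__main__':
    # Placeholder children: one sleeps far longer than its timeout, one exits at once.
    sleeper = [sys.executable, '-c', 'import time; time.sleep(30)']
    quick = [sys.executable, '-c', 'pass']
    print(run_with_deadlines([(sleeper, 0.5), (quick, 10)]))
```

Using time.monotonic() for the deadlines keeps them unaffected by system clock adjustments during a long stress run.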
import random
import queue as Queue
import _thread as Thread

a = Queue.Queue()

def af():
    while True:
        a.put(random.randint(0,1000))

def bf():
    while True:
        if (not a.empty()): print (a.get())

def main():
    Thread.start_new_thread(af, ())
    Thread.start_new_thread(bf, ())
    return

if __name__ == "__main__":
    main()
The above code works, but with extremely high CPU usage. I tried to use multiprocessing, to no avail. I have tried
def main():
    multiprocessing.Process(target=af).run()
    multiprocessing.Process(target=bf).run()
and
def main():
    manager = multiprocessing.Manager()
    a = manager.Queue()
    pool = multiprocessing.Pool()
    pool.apply_async(af)
    pool.apply_async(bf)
Neither works. Can anyone please help me? Thanks a bunch ^_^
def main():
    multiprocessing.Process(target=af).run() # will not return
    multiprocessing.Process(target=bf).run()
The above code does not work because af never returns, so bf is never called. You need to replace the run() calls with start() and join() so that both can run in parallel (and make them share a manager.Queue).
To make the second version work, you need to pass a (the manager.Queue object) to the functions. Otherwise they use the global Queue.Queue object, which is not shared between processes. Modify af and bf to accept a, and main to pass it.
def af(a):
    while True:
        a.put(random.randint(0, 1000))

def bf(a):
    while True:
        print(a.get())

def main():
    manager = multiprocessing.Manager()
    a = manager.Queue()
    pool = multiprocessing.Pool()
    proc1 = pool.apply_async(af, [a])
    proc2 = pool.apply_async(bf, [a])
    # Wait until process ends. Uncomment following line if there's no waiting code.
    # proc1.get()
    # proc2.get()
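On the first attempt: Process.run() simply calls the target in the current process, whereas start() launches a child. This small sketch checks that with process IDs; the function name pid_seen_by_run is made up for the demonstration:

```python
import multiprocessing
import os


def pid_seen_by_run():
    """Call run() directly and report which process executed the target."""
    seen = []
    proc = multiprocessing.Process(target=lambda: seen.append(os.getpid()))
    proc.run()  # plain method call: executes the target right here, no child
    return seen[0]


if __name__ == '__main__':
    # Compares the PID recorded by the target with the caller's own PID.
    print(pid_seen_by_run() == os.getpid())
```

Since af loops forever, calling run() on it blocks the caller forever, while start() would return immediately.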
In the first alternative main you use Process, but the method you should call to start the activity is not run(), as one would think, but rather start(). You will want to follow that up with appropriate join() statements. Following the information in multiprocessing (available here: https://docs.python.org/2/library/multiprocessing.html), here is a working sample:
import random
from multiprocessing import Process, Queue

def af(q):
    while True:
        q.put(random.randint(0,1000))

def bf(q):
    while True:
        if not q.empty():
            print (q.get())

def main():
    a = Queue()
    p = Process(target=af, args=(a,))
    c = Process(target=bf, args=(a,))
    p.start()
    c.start()
    p.join()
    c.join()

if __name__ == "__main__":
    main()
To add to the accepted answer, in the original code:
while True:
    if not q.empty():
        print (q.get())
q.empty() is called on every iteration, which is unnecessary: if the queue is empty, q.get() blocks until an item is available, as the documentation for get() describes.
Similar answer here
I assume that this could affect performance, since calling .empty() on every iteration consumes extra resources (it should be more noticeable if Thread were used instead of Process, because of Python's Global Interpreter Lock (GIL)).
I know it's an old question but hope it helps!
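A blocking get() combined with a sentinel value is one way to drop the polling loop and still let the consumer stop. This is a minimal sketch using threads and queue.Queue so that it runs anywhere; multiprocessing.Queue.get() blocks in the same way. The SENTINEL marker and the function names are assumptions for illustration:

```python
import queue
import random
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def producer(q, count):
    for _ in range(count):
        q.put(random.randint(0, 1000))
    q.put(SENTINEL)  # tell the consumer there is nothing more to read


def consumer(q, received):
    while True:
        item = q.get()  # blocks while the queue is empty, no busy loop
        if item is SENTINEL:
            break
        received.append(item)


def run_pipeline(count):
    q = queue.Queue()
    received = []
    threads = [threading.Thread(target=producer, args=(q, count)),
               threading.Thread(target=consumer, args=(q, received))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == '__main__':
    print(len(run_pipeline(100)))
```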
I'm struggling to get my head around multiprocessing and passing a global True/False variable into my function.
After get_data() finishes I want the analysis() function to start and process the data, while fetch() continues running. How can I make this work? TIA
import multiprocessing

ready = False

def fetch():
    global ready
    get_data()
    ready = True
    return

def analysis():
    analyse_data()

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=fetch)
    p2 = multiprocessing.Process(target=analysis)
    p1.start()
    if ready:
        p2.start()
You should run the two processes and use a shared queue to exchange information between them, such as signaling the completion of an action in one of the processes.
Also, you need to have a join() statement to properly wait for completion of the processes you spawn.
from multiprocessing import Process, Queue
import time

def get_data(q):
    #Do something to get data
    time.sleep(2)
    #Put an event in the queue to signal that get_data has finished
    q.put('message from get_data to analyse_data')

def analyse_data(q):
    #waiting for get_data to finish...
    msg = q.get()
    print(msg) #Will print 'message from get_data to analyse_data'
    #get_data has finished

if __name__ == '__main__':
    #Create queue for exchanging messages between processes
    q = Queue()
    #Create processes, and send the shared queue to them
    processes = [Process(target=get_data,args=(q,)),Process(target=analyse_data,args=(q,))]
    #Start processes
    for p in processes:
        p.start()
    #Wait until all processes complete
    for p in processes:
        p.join()
Your example won't work for a few reasons:
Processes do not share memory with each other (you can't change the global in one process and see the change in the other).
Even if you could change the global value, you are checking it too soon, and most likely it won't have changed yet.
Read https://docs.python.org/3/library/ipc.html for more possibilities for inter-process communication.
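One such possibility is multiprocessing.Event, which replaces the global flag with an object that both processes can see. This is a minimal sketch; the time.sleep call stands in for the real get_data(), and the function bodies are placeholders:

```python
import multiprocessing
import time


def fetch(ready):
    time.sleep(0.2)   # placeholder for get_data()
    ready.set()       # signal that the data is available
    # fetch() could keep running here after signalling


def analysis(ready):
    ready.wait()      # blocks until fetch() calls ready.set()
    print('analysing data')  # placeholder for analyse_data()


if __name__ == '__main__':
    ready = multiprocessing.Event()
    p1 = multiprocessing.Process(target=fetch, args=(ready,))
    p2 = multiprocessing.Process(target=analysis, args=(ready,))
    p1.start()
    p2.start()   # started right away; it waits on the event by itself
    p1.join()
    p2.join()
```

Both processes are started immediately, so the parent never has to check a flag at the right moment; the waiting happens inside analysis().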
I'm running python 2.7.3 and I noticed the following strange behavior. Consider this minimal example:
from multiprocessing import Process, Queue

def foo(qin, qout):
    while True:
        bar = qin.get()
        if bar is None:
            break
        qout.put({'bar': bar})

if __name__ == '__main__':
    import sys
    qin = Queue()
    qout = Queue()
    worker = Process(target=foo,args=(qin,qout))
    worker.start()
    for i in range(100000):
        print i
        sys.stdout.flush()
        qin.put(i**2)
    qin.put(None)
    worker.join()
When I loop over 10,000 or more, my script hangs on worker.join(). It works fine when the loop only goes to 1,000.
Any ideas?
The qout queue in the subprocess gets full. The data you put in it from foo() doesn't fit in the buffer of the OS's pipes used internally, so the subprocess blocks trying to fit more data. But the parent process is not reading this data: it is simply blocked too, waiting for the subprocess to finish. This is a typical deadlock.
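One way out is to read the results in the parent before calling join(), so the child's pipe never stays full. This is a sketch in Python 3 syntax (the question uses Python 2.7); it assumes exactly one result per input, and the helper name square_all is made up:

```python
from multiprocessing import Process, Queue


def foo(qin, qout):
    """Worker from the question: wrap each input until None arrives."""
    while True:
        bar = qin.get()
        if bar is None:
            break
        qout.put({'bar': bar})


def square_all(count):
    qin, qout = Queue(), Queue()
    worker = Process(target=foo, args=(qin, qout))
    worker.start()
    for i in range(count):
        qin.put(i ** 2)
    qin.put(None)
    # Drain qout BEFORE join(): one result is expected for each input.
    results = [qout.get() for _ in range(count)]
    worker.join()
    return results


if __name__ == '__main__':
    print(len(square_all(20000)))
```

If the number of results is not known in advance, the worker could put its own end marker on qout and the parent could read until it sees that marker.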
There must be a limit on the size of queues. Consider the following modification:
from multiprocessing import Process, Queue

def foo(qin,qout):
    while True:
        bar = qin.get()
        if bar is None:
            break
        #qout.put({'bar':bar})

if __name__=='__main__':
    import sys
    qin=Queue()
    qout=Queue() ## POSITION 1
    for i in range(100):
        #qout=Queue() ## POSITION 2
        worker=Process(target=foo,args=(qin,qout))
        worker.start()
        for j in range(1000):
            x=i*100+j
            print x
            sys.stdout.flush()
            qin.put(x**2)
        qin.put(None)
        worker.join()
    print 'Done!'
This works as-is (with the qout.put line commented out). If you try to save all 100000 results, then qout becomes too large: if I uncomment the qout.put({'bar':bar}) line in foo and leave the definition of qout in POSITION 1, the code hangs. If, however, I move the qout definition to POSITION 2, then the script finishes.
So in short, you have to be careful that neither qin nor qout becomes too large. (See also: Multiprocessing Queue maxsize limit is 32767)
I had the same problem on Python 3 when I tried to put strings with a total size of about 5000 chars into a queue.
In my project there was a host process that sets up a queue, starts a subprocess, and then joins it. After the join, the host process reads from the queue. When the subprocess produces too much data, the host hangs on join. I fixed this by using the following function to wait for the subprocess in the host process:
from multiprocessing import Process, Queue
from queue import Empty

def yield_from_process(q: Queue, p: Process):
    while p.is_alive():
        p.join(timeout=1)
        while True:
            try:
                yield q.get(block=False)
            except Empty:
                break
I read from queue as soon as it fills so it never gets very large
I was trying to .get() an async result after the pool had closed.
It was an indentation error: the loop ended up outside of the with block.
I had this:
with multiprocessing.Pool() as pool:
    async_results = list()
    for job in jobs:
        async_results.append(
            pool.apply_async(
                _worker_func,
                (job,),
            )
        )
# wrong
for async_result in async_results:
    yield async_result.get()
I needed this:
with multiprocessing.Pool() as pool:
    async_results = list()
    for job in jobs:
        async_results.append(
            pool.apply_async(
                _worker_func,
                (job,),
            )
        )
    # right
    for async_result in async_results:
        yield async_result.get()