I want to run several processes in parallel without giving the CPU too much work, so that the CPU can also do other jobs.
In Python, I use os.system to call some binaries. These calls are independent and can run in parallel, but the binaries may run for different lengths of time.
What I want is, for example, to always keep 8 of them running in parallel; if one exits early, start another one.
What I am doing now is like this:
count = 0
for f in files:
    count = count + 1
    cmd = exe
    if count != 8:
        cmd = cmd + " &"
    else:
        count = 0
    os.system(cmd)
But this is not ideal if the command without the & runs too long or too short.
I also tried the multiprocessing module:
from multiprocessing import Pool

p = Pool(8)
print(p.map(f, list_of_args))
but in this case I am not running 8 processes in parallel most of the time, since some of them exit early.
There is no need for synchronization.
I have 16 CPU cores and I want to use half of them (8 processes running in parallel).
You'd better not use os.system but subprocess.Popen, as it is more powerful and safer. Moreover, subprocess.Popen does not block on the call, so you don't need to append any '&' at the end of the command.
For the question itself: operating systems are quite good at balancing the workload automatically, so you should not worry about idling processes vs. running ones. Just launch your workers with the Pool and let them run as long as needed without worrying about 'wasting' any resources. An idle process takes just a bit of memory and that's it.
When it comes to improving your code, something you might want to use is a pool of threads instead of a pool of processes. This is because your workers simply wait for other processes to finish, so threads are better suited than processes for that.
If you can use Python 3, something like this will do the job for you:
import subprocess
from concurrent.futures import ThreadPoolExecutor

def function(myfile):
    command = ('whatever', 'you', 'want', 'to', 'do', 'with', myfile)
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    process.communicate()

with ThreadPoolExecutor(max_workers=8) as executor:
    # map returns an iterator; consuming it waits for the workers and surfaces any exceptions
    results = list(executor.map(function, files))
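If you are stuck on Python 2, multiprocessing.dummy gives you the same Pool API backed by threads; a rough equivalent sketch (untested, same assumptions as above):
import subprocess
from multiprocessing.dummy import Pool  # a thread pool with the multiprocessing.Pool API

def function(myfile):
    command = ('whatever', 'you', 'want', 'to', 'do', 'with', myfile)
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    process.communicate()

pool = Pool(8)              # 8 threads, each waiting on one external process at a time
pool.map(function, files)   # blocks until every file has been processed
pool.close()
pool.join()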
Related
I have a problem with using Python to run Windows executables in parallel.
I will explain my problem in more detail.
I was able to write some code that creates a number of threads equal to the number of cores. Each thread executes the following function, which starts the executable using subprocess.Popen().
The executables are unit tests for an application. The tests use the gtest library. From what I know, they just read and write on the file system.
def _execute(self, test_file_path) -> None:
    test_path = self._get_test_path_without_extension(test_file_path)
    process = subprocess.Popen(test_path,
                               shell=False,
                               stdout=sys.stdout,
                               stderr=sys.stderr,
                               universal_newlines=True)
    try:
        process.communicate(timeout=TEST_TIMEOUT_IN_SECONDS)
        if process.returncode != 0:
            print(f'Test fail')
    except subprocess.TimeoutExpired:
        process.kill()
During execution it happens that some processes hang and never end. I set a timeout as a workaround, but I am wondering why some of these applications never terminate; this blocks the execution of the Python code.
The following code shows the creation of the threads. The function _execute_tests just takes a test from the queue (with the .get() function) and passes it to the function _execute(test_file_path), roughly as sketched below.
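Roughly, _execute_tests looks like this (a sketch; the real method is not shown and the exact queue handling is assumed):
import queue

def _execute_tests(self, tests):
    # Pull test paths off the shared queue until it is empty
    while True:
        try:
            test_file_path = tests.get(block=False)
        except queue.Empty:
            break
        self._execute(test_file_path)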
### Piece of code used to spawn the threads
for i in range(int(core_num)):
    thread = threading.Thread(target=self._execute_tests,
                              args=(tests,),
                              daemon=True)
    threads.append(thread)
    thread.start()
for thread in threads:
    thread.join()
I have already tried to:
use subprocess.run, subprocess.call and the other functions explained on the documentation page
use a larger buffer via the bufsize parameter
disable the buffer
move the stdout to a file per thread (see the sketch after this list)
move the stdout to subprocess.DEVNULL
remove the use of subprocess.communicate()
remove the use of threading
use multiprocessing
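The "file per thread" attempt looked roughly like this (an illustrative sketch; the real file naming and cleanup were different):
def _execute(self, test_file_path) -> None:
    test_path = self._get_test_path_without_extension(test_file_path)
    # Give each test its own log file so no two executables share a stream
    log_path = test_path + '.log'   # illustrative naming
    with open(log_path, 'w') as log_file:
        process = subprocess.Popen(test_path,
                                   shell=False,
                                   stdout=log_file,
                                   stderr=subprocess.STDOUT,
                                   universal_newlines=True)
        try:
            process.communicate(timeout=TEST_TIMEOUT_IN_SECONDS)
        except subprocess.TimeoutExpired:
            process.kill()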
On my local machine with 16 cores / 64 GB RAM I can run 16 threads without problems; all of them always terminate. To reproduce the problem there, I need to increase the number of threads to 30-40.
On an Azure machine with 8 cores / 32 GB RAM the issue can be reproduced with just 8 threads in parallel.
If I run the executables from a batch file
for /r "." %%a in (*.exe) do start /B "" "%%~fa"
the problem never happens.
Does anyone have an idea of what the problem could be?
The Problem
I have a program I'm running with Popen that spawns a bunch of subprocesses. Those subprocesses close every few minutes to be replaced by new ones. The main process (the program I called Popen on) dies pretty quickly. I'm trying to figure out how to get the cpu usage of everything in the process group.
What I've Tried
I've tried wait4 and getrusage. I've also tried psutil, but I realized that if I want to check the resource usage I'd have to spawn a bunch of threads to concurrently poll all the subprocesses, and it'd be messy and error-prone.
Here's a sample of code that doesn't work: it only gets the resource usage for the immediate child and none of the grandchildren.
import time
from os import wait4, setsid
from resource import getrusage, RUSAGE_CHILDREN
from subprocess import Popen, PIPE

old = time.time()
proc = Popen(["g09", "../sto.com"], text=True, stderr=PIPE, preexec_fn=setsid)
_, _, ru = wait4(proc.pid, 0)  # wait4 returns (pid, status, rusage)
new = time.time()
print(100*(ru.ru_utime + ru.ru_stime)/(new - old))
ru = getrusage(RUSAGE_CHILDREN)
print(100*(ru.ru_utime + ru.ru_stime)/(new - old))
I need to set a new sid (or pgid) because sometimes I want to kill the process before it's finished, and if I don't set a new one the whole Python script goes down with it.
PS
I could just use the time command (like Popen(["time", "g09",...])) but I was wondering if there is a way to do this purely in Python.
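For reference, the psutil approach I ruled out would look roughly like this; the polling is exactly the messy part, since any child that exits between polls drops out of the total (names here are illustrative):
import psutil

def group_cpu_seconds(root_pid):
    # Sum user+system CPU time over the root process and all descendants
    # that are still alive at the moment of the call.
    total = 0.0
    try:
        root = psutil.Process(root_pid)
        procs = [root] + root.children(recursive=True)
    except psutil.NoSuchProcess:
        return total
    for p in procs:
        try:
            t = p.cpu_times()
            total += t.user + t.system
        except psutil.NoSuchProcess:
            pass
    return total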
I am using the multiprocessing module in Python to launch a few processes in parallel. These processes are independent of each other: they generate their own output and write the results to different files. Each process calls an external tool using the subprocess.call method.
It was working fine until I discovered an issue in the external tool where, due to some error condition, it goes into a 'prompt' mode and waits for user input. In my Python script I use the join method to wait until all the processes finish their tasks, which causes the whole thing to wait on this erroneous subprocess call. I could put a timeout on each process, but I do not know in advance how long each one is going to run, so that option is ruled out.
How do I figure out if any child process is waiting for user input, and how do I send an 'exit' command to it? Any pointers or suggestions about relevant Python modules would be really appreciated.
My code here:
import subprocess
import sys
import os
import multiprocessing
def write_script(fname, e):
    f = open(fname, 'w')
    f.write("Some useful command calling external tool")
    f.close()
    subprocess.call(['chmod', '+x', os.path.abspath(fname)])
    return os.path.abspath(fname)

def run_use(mname, script):
    print "ssh "+mname+" "+script
    subprocess.call(['ssh', mname, script])

if __name__ == '__main__':
    dict1 = {}
    dict1['mod1'] = ['pp1', 'ext2', 'les3', 'pw4']
    dict1['mod2'] = ['aaa', 'bbb', 'ccc', 'ddd']
    machines = ['machine1', 'machine2', 'machine3', 'machine4']
    log_file = open('run.log', 'w')  # assumed here; the original snippet does not show where log_file comes from
    log_file.write(str(dict1.keys()))
    for key in dict1.keys():
        arr = []
        for mod in dict1[key]:
            d = {}
            arr.append(mod)
            if (mod == dict1[key][-1]) or (len(arr) % 4 == 0):
                for i in range(0, len(arr)):
                    e = arr.pop()
                    script = write_script(e+"_temp.sh", e)
                    d[i] = multiprocessing.Process(target=run_use, args=(machines[i], script,))
                    d[i].daemon = True
                for pp in d:
                    d[pp].start()
                for pp in d:
                    d[pp].join()
Since you're writing a shell script to run your subcommands, can you simply tell them to read input from /dev/null?
#!/bin/bash
# ...
my_other_command -a -b arg1 arg2 < /dev/null
# ...
This may stop them blocking on input and is a really simple solution. If this doesn't work for you, read on for some other options.
The subprocess.call() function is simply shorthand for constructing a subprocess.Popen instance and then calling the wait() method on it. So, your spawned processes could instead create their own subprocess.Popen instances and poll them with the poll() method on the object instead of wait() (in a loop with a suitable delay). This leaves them free to remain in communication with the main process, so you can, for example, let the main process tell the child process to terminate the Popen instance with the terminate() or kill() methods and then exit itself.
So, the question is how does the child process tell whether the subprocess is awaiting user input, and that's a trickier question. I would say perhaps the easiest approach is to monitor the output of the subprocess and search for the user input prompt, assuming that it always uses some string that you can look for. Alternatively, if the subprocess is expected to generate output continually then you could simply look for any output and if a configured amount of time goes past without any output then you declare that process dead and terminate it as detailed above.
Since you're reading the output, actually you don't need poll() or wait() - the process closing its output file descriptor is good enough to know that it's terminated in this case.
Here's an example of a modified run_use() method which watches the output of the subprocess:
def run_use(mname, script):
    print "ssh "+mname+" "+script
    proc = subprocess.Popen(['ssh', mname, script], stdout=subprocess.PIPE)
    for line in proc.stdout:
        if "UserPrompt>>>" in line:
            proc.terminate()
            break
In this example we assume that the process either gets hung up on UserPrompt>>> (replace with the appropriate string) or terminates naturally. If it were to get stuck in an infinite loop, for example, then your script would still not terminate; you can only really address that with an overall timeout, but you didn't seem keen to do that. Hopefully your subprocess won't misbehave in that way, however.
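The other approach mentioned above, declaring a process dead if it produces no output for a configured amount of time, could look roughly like this (a sketch only; the function name and idle_timeout value are placeholders):
import os
import select
import subprocess

def run_use_with_idle_timeout(mname, script, idle_timeout=300):
    proc = subprocess.Popen(['ssh', mname, script], stdout=subprocess.PIPE)
    while True:
        # Wait up to idle_timeout seconds for the subprocess to produce output
        ready, _, _ = select.select([proc.stdout], [], [], idle_timeout)
        if not ready:
            # No output for too long: assume it is stuck waiting for input
            proc.terminate()
            break
        chunk = os.read(proc.stdout.fileno(), 4096)
        if not chunk:
            # EOF: the subprocess closed stdout, so it has finished
            break
    proc.wait()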
Finally, if you don't know in advance the prompt that your process will give, then your job is rather harder. Effectively what you're asking to do is monitor an external process and know when it's blocked reading on a file descriptor, and I don't believe there's a particularly clean solution to this. You could consider running a process under strace or similar, but that's quite an awful hack and I really wouldn't recommend it. Things like strace are great for manual diagnostics, but they really shouldn't be part of a production setup.
Most of the examples I've seen with os.fork and the subprocess/multiprocessing modules show how to fork a new instance of the calling Python script or a chunk of Python code. What would be the best way to spawn a set of arbitrary shell commands concurrently?
I suppose I could just use subprocess.call or one of the Popen commands and pipe the output to a file, which I believe will return immediately, at least to the caller. I know this is not that hard to do; I'm just trying to figure out the simplest, most Pythonic way to do it.
Thanks in advance
All calls to subprocess.Popen return immediately to the caller. It's the calls to wait and communicate which block. So all you need to do is spin up a number of processes using subprocess.Popen (set stdin to /dev/null for safety), and then one by one call communicate until they're all complete.
Naturally I'm assuming you're just trying to start a bunch of unrelated (i.e. not piped together) commands.
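A minimal sketch of that pattern (Python 3; the command lists are placeholders):
import subprocess

commands = [['./job_a', 'arg1'], ['./job_b', 'arg2']]   # placeholder commands

# Popen returns immediately, so all commands start running in parallel
procs = [subprocess.Popen(cmd,
                          stdin=subprocess.DEVNULL,     # "set stdin to /dev/null"
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE)
         for cmd in commands]

# communicate() blocks until each process exits and returns its captured output
results = [p.communicate() for p in procs]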
I like to use PTYs instead of pipes. For a bunch of processes where I only wanted to capture error messages, I did this.
import sys
import pty
from subprocess import Popen

RNULL = open('/dev/null', 'r')
WNULL = open('/dev/null', 'w')
logfile = open("myprocess.log", "a", 1)
REALSTDERR = sys.stderr
sys.stderr = logfile
This next part was in a loop spawning about 30 processes.
sys.stderr = REALSTDERR
master, slave = pty.openpty()
self.subp = Popen(self.parsed, shell=False, stdin=RNULL, stdout=WNULL, stderr=slave)
sys.stderr = logfile
After this I had a select loop which collected any error messages and sent them to the single log file. Using PTYs meant that I never had to worry about partial lines getting mixed up because the line discipline provides simple framing.
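That select loop is not shown here, but it was roughly along these lines (a sketch; it assumes the master fds from the pty.openpty() calls were collected into a list):
import os
import select

masters = [master]          # in the real code, one master fd per spawned process

while masters:
    readable, _, _ = select.select(masters, [], [], 1.0)
    for fd in readable:
        try:
            data = os.read(fd, 4096)
        except OSError:
            data = b''      # the child exited and closed its side of the PTY
        if not data:
            masters.remove(fd)
            continue
        logfile.write(data.decode(errors='replace'))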
There is no single best approach for all possible circumstances; the best one depends on the problem at hand.
Here's how to spawn a process and save its output to a file combining stdout/stderr:
import os
import subprocess
import sys

def spawn(cmd, output_file):
    on_posix = 'posix' in sys.builtin_module_names
    return subprocess.Popen(cmd, close_fds=on_posix, bufsize=-1,
                            stdin=open(os.devnull, 'rb'),
                            stdout=output_file,
                            stderr=subprocess.STDOUT)
To spawn multiple processes that can run in parallel with your script and each other:
processes, files = [], []
try:
    for i, cmd in enumerate(commands):
        files.append(open('out%d' % i, 'wb'))
        processes.append(spawn(cmd, files[-1]))
finally:
    for p in processes:
        p.wait()
    for f in files:
        f.close()
Note: cmd is a list everywhere.
I suppose I could just use subprocess.call or one of the Popen commands and pipe the output to a file, which I believe will return immediately, at least to the caller.
That's not a good way to do it if you want to process the data.
In this case, it is better to do
sp = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE)
and then use sp.communicate(), or read directly via sp.stdout.read().
If the data is to be processed in the calling program at a later time, there are two ways to go:
You can retrieve the data as soon as possible, maybe via a separate thread, reading it and storing it somewhere the consumer can get it.
You can let the producing subprocess block and retrieve the data from it when you need it. The subprocess produces as much data as fits in the pipe buffer (usually 64 KiB) and then blocks on further writes. As soon as you need the data, you read() from the subprocess object's stdout (and maybe stderr as well) and use it; or, again, you use sp.communicate() at that later time.
Way 1 would be the way to go if producing the data takes a lot of time, so that your program would otherwise have to wait.
Way 2 would be preferred if the data is quite large and/or is produced so fast that buffering it all would make no sense.
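A minimal sketch of way 1, using a background thread that drains the pipe into a queue the consumer can read later (Python 3; the names are illustrative):
import queue
import subprocess
import threading

def drain(pipe, sink):
    # Read the child's output as it appears so the pipe buffer never fills up
    for chunk in iter(pipe.readline, b''):
        sink.put(chunk)
    pipe.close()

sp = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE)
lines = queue.Queue()
t = threading.Thread(target=drain, args=(sp.stdout, lines), daemon=True)
t.start()

# ... do other work here ...

sp.wait()
t.join()                    # make sure everything has been drained
while not lines.empty():
    print(lines.get().decode())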
See an older answer of mine, including code snippets, which does the following:
Uses processes, not threads, for blocking I/O because they can be terminated more reliably with p.terminate()
Implements a retriggerable timeout watchdog that restarts counting whenever some output happens
Implements a long-term timeout watchdog to limit overall runtime
Can feed in stdin (although I only need to feed in one-time short strings)
Can capture stdout/stderr in the usual Popen manner (only stdout is coded, with stderr redirected to stdout, but they can easily be separated)
It's almost real-time because it only checks for output every 0.2 seconds, but you could decrease this or remove the waiting interval easily
Lots of debugging printouts are still enabled so you can see what's happening when
For spawning multiple concurrent commands, you would need to alter the RunCmd class to instantiate multiple read-output/write-input queues and to spawn multiple Popen subprocesses.
This question is more about fact-finding and thought process than code.
I have many compiled C++ programs that I need to run at different times and with different parameters. I'm looking at using Python multiprocessing to read a job from a job queue (RabbitMQ) and then feed that job to a C++ program to run (maybe via subprocess). I was looking at the multiprocessing module because this will all run on a dual-Xeon server, so I want to take full advantage of its multiprocessor capability.
The Python program would be the central manager and would simply read jobs from the queue, spawn a process (or subprocess?) with the appropriate C++ program to run the job, get the results (subprocess stdout & stderr), feed them to a callback, and put the process back in a queue of processes waiting for the next job to run.
First, does this sound like a valid strategy?
Second, are there any type of examples of something similar to this?
Thank you in advance.
The Python program would be the central manager and would simply read jobs from the queue, spawn a process (or subprocess?) with the appropriate C++ program to run the job, get the results (subprocess stdout & stderr), feed them to a callback, and put the process back in a queue of processes waiting for the next job to run.
You don't need the multiprocessing module for this. The multiprocessing module is good for running Python functions as separate processes. To run a C++ program and read results from stdout, you'd only need the subprocess module. The queue could be a list, and your Python program would simply loop while the list is non-empty.
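For example, a minimal subprocess-only sketch (the program names are placeholders):
import subprocess

# Placeholder job list; each entry is a C++ program plus its arguments
jobs = [['./prog_a', '--n', '1'], ['./prog_b', '--n', '2']]

while jobs:
    cmd = jobs.pop(0)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()   # run this job to completion
    # ... hand out/err to a callback, which may append new jobs to the list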
However, if you want to
spawn multiple worker processes
have them read from a common queue
use the arguments from the queue to spawn C++ programs (in parallel)
use the output of the C++ programs to put new items in the queue
then you could do it with multiprocessing like this:
test.py:
import multiprocessing as mp
import subprocess
import shlex

def worker(q):
    while True:
        # Get an argument from the queue
        x = q.get()
        # You might change this to run your C++ program
        proc = subprocess.Popen(
            shlex.split('test2.py {x}'.format(x=x)), stdout=subprocess.PIPE)
        out, err = proc.communicate()
        print('{name}: using argument {x} outputs {o}'.format(
            x=x, name=mp.current_process().name, o=out))
        q.task_done()
        # Put a new argument into the queue
        q.put(int(out))

def main():
    q = mp.JoinableQueue()
    # Put some initial values into the queue
    for t in range(1, 3):
        q.put(t)
    # Create and start a pool of worker processes
    for i in range(3):
        p = mp.Process(target=worker, args=(q,))
        p.daemon = True
        p.start()
    q.join()
    print "Finished!"

if __name__ == '__main__':
    main()
test2.py (a simple substitute for your C++ program):
import time
import sys
x=int(sys.argv[1])
time.sleep(0.5)
print(x+3)
Running test.py might yield something like this:
Process-1: using argument 1 outputs 4
Process-3: using argument 3 outputs 6
Process-2: using argument 2 outputs 5
Process-3: using argument 6 outputs 9
Process-1: using argument 4 outputs 7
Process-2: using argument 5 outputs 8
Process-3: using argument 9 outputs 12
Process-1: using argument 7 outputs 10
Process-2: using argument 8 outputs 11
Process-1: using argument 10 outputs 13
Notice that the numbers in the right-hand column are fed back into the queue and are (eventually) used as arguments to test2.py, showing up as numbers in the left-hand column.
First, does this sound like a valid strategy?
Yes.
Second, are there any type of examples of something similar to this?
Celery
It sounds like a good strategy, but you don't need the multiprocessing module for it; you need the subprocess module instead. subprocess is for running child processes from a Python program and interacting with them (stdin, stdout, pipes, etc.), while multiprocessing is more about distributing Python code to run in multiple processes to gain performance through parallelism.
Depending on the responsiveness strategy, you may also want to look at threading for launching subprocesses from a thread. This will allow you to wait on one subprocess while still being responsive on the queue to accept other jobs.
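A rough sketch of that idea (the program name and callback are placeholders):
import subprocess
import threading

def run_job(cmd, on_done):
    # Run one C++ program in a worker thread and report its result via a callback,
    # so the main thread stays free to accept more jobs from the queue.
    def target():
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()
        on_done(proc.returncode, out, err)
    t = threading.Thread(target=target, daemon=True)
    t.start()
    return t

# Example usage:
# run_job(['./my_cpp_program', '--param', 'value'],
#         lambda rc, out, err: print(rc, out))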