Run command for all files in parallel - python

I have the following command on the build-server as a part of the build process:
os.system ('signtool sign /a /t http://timestamp.verisign.com/scripts/timstamp.dll "%s\\*.exe"' % (dir) )
This command signs each executable file in the specified directory. Is there a way to run this command in parallel for each executable file using Python? Is there something like OpenMP for Python?

You could use threads. This tutorial shows how to do something similar to what you're asking for.

Perhaps multiprocessing could be of help here?
Specifically, multiprocessing.Pool.map() might be relevant to your needs.

The above answers are perfectly sensible ways of approaching things from the Python side, e.g.:
from multiprocessing import Pool
import os

def processFile(x):
    return os.system('ls ' + x)

if __name__ == '__main__':
    pool = Pool(processes=2)
    files = ['foo', 'foo.py', 'foo.cpp', 'foo.txt', 'foo.bar']
    result = pool.map(processFile, files)
    print('Results are', result)
But if you're using the shell anyway, you might want to consider using GNU Parallel on the shell side, which runs like xargs but does the individual tasks in parallel, with options to control how many jobs can run simultaneously, etc.
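If you'd rather stay in Python, here is a minimal sketch that adapts the Pool.map idea above to the original signtool call (the directory path and worker count are placeholders, and it assumes signtool is happy signing one file per invocation):
import glob
import os
from multiprocessing import Pool

EXE_DIR = r'C:\build\output'  # placeholder for the original dir variable
SIGN_CMD = 'signtool sign /a /t http://timestamp.verisign.com/scripts/timstamp.dll "%s"'

def sign_file(path):
    # Each worker signs one executable; the return value is signtool's exit status.
    return os.system(SIGN_CMD % path)

if __name__ == '__main__':
    exe_files = glob.glob(os.path.join(EXE_DIR, '*.exe'))
    with Pool(processes=4) as pool:  # worker count chosen arbitrarily
        results = pool.map(sign_file, exe_files)
    print('Results are', results)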

Related

How to restart a Python script?

In a program I am writing in Python I need to completely restart the program if a variable becomes true. After looking around for a while I found this command:
while True:
    if reboot == True:
        os.execv(sys.argv[0], sys.argv)
When executed it returns the error [Errno 8] Exec format error. I searched for further documentation on os.execv, but didn't find anything relevant, so my question is whether anyone knows what I did wrong or knows a better way to restart a script (by restarting I mean completely re-running the script, as if it had been opened for the first time, so with all variables unassigned and no threads running).
There are multiple ways to achieve the same thing. Start by modifying the program to exit whenever the flag turns True. Then there are various options, each one with its advantages and disadvantages.
Wrap it using a bash script.
The script should handle exits and restart your program. A really basic version could be:
#!/bin/bash
while :
do
    python program.py
    sleep 1
done
Start the program as a sub-process of another program.
Start by wrapping your program's code in a function. Then your __main__ could look like this:
from multiprocessing import Process

def program():
    ### Here is the code of your program
    ...

if __name__ == '__main__':
    while True:
        process = Process(target=program)
        process.start()
        process.join()
        print("Restarting...")
This code is relatively basic, and it requires error handling to be implemented.
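For instance, a rough sketch of that error handling (the exitcode check and the give-up threshold below are assumptions, not part of the original suggestion) could look like:
from multiprocessing import Process

def program():
    ...  # your program's code goes here

MAX_CONSECUTIVE_FAILURES = 5  # arbitrary threshold, adjust to taste

if __name__ == '__main__':
    failures = 0
    while failures < MAX_CONSECUTIVE_FAILURES:
        process = Process(target=program)
        process.start()
        process.join()
        if process.exitcode == 0:
            failures = 0   # clean exit: reset the counter and restart
        else:
            failures += 1  # crash or non-zero exit: count it before retrying
        print("Restarting...")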
Use a process manager
There are a lot of tools available that can monitor the process, run multiple processes in parallel and automatically restart stopped processes. It's worth having a look at PM2 or similar.
IMHO the third option (a process manager) looks like the safest approach. The other approaches have edge cases that you would need to handle yourself.
This has worked for me. Add the shebang at the top of your code and call os.execv() as shown below:
#!/usr/bin/env python3
import os
import sys

if __name__ == '__main__':
    while True:
        reboot = input('Enter:')
        if reboot == '1':
            sys.stdout.flush()
            os.execv(sys.executable, [sys.executable, __file__] + [sys.argv[0]])
        else:
            print('OLD')
I got the same "Exec format error", and I believe it is basically the same error you get when you simply type a Python script name at the command prompt and expect it to execute. On Linux it won't work because a path is required, and the execv method is basically encountering the same error.
You could add the pathname of your Python interpreter, and that error goes away, except that the name of your script then becomes a parameter and must be added to the argv list. To avoid that, make your script independently executable by adding "#!/usr/bin/python3" to the top of the script AND chmod 755.
This works for me:
#!/usr/bin/python3
# this script is called foo.py
import os
import sys
import time

if len(sys.argv) >= 2:
    Arg1 = int(sys.argv[1])
else:
    sys.argv.append(None)
    Arg1 = 1
print(f"Arg1: {Arg1}")
sys.argv[1] = str(Arg1 + 1)
time.sleep(3)
os.execv("./foo.py", sys.argv)
Output:
Arg1: 1
Arg1: 2
Arg1: 3
.
.
.

Python subprocess shell scripts still runs in background

I am running two Python scripts using subprocess, and one of them keeps running after the other exits.
import subprocess
subprocess.run("python3 script_with_loop.py & python3 script_with_io.py", shell=True)
script_with_loop still runs in the background.
What is the way to kill both scripts if one of them dies?
So, you're basically not using python here, you're using your shell.
a & b runs a, disavows it, and runs b. Since you're using the shell, if you wanted to terminate the background task, you'd have to use shell commands to do that.
Of course, since you're using python, there is a better way.
import subprocess

with subprocess.Popen(["somecommand"]) as proc:
    try:
        subprocess.run(["othercommand"])
    finally:
        proc.terminate()
Looking at your code though - python3 script_with_loop.py and python3 script_with_io.py - my guess is you'd be better off using the asyncio module because it basically does what the names of those two files are describing.
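For example, a minimal asyncio sketch of that idea (the policy of terminating the survivor as soon as either script exits is an assumption about what you want) might look like:
import asyncio

async def main():
    # Start both scripts as child processes.
    loop_proc = await asyncio.create_subprocess_exec("python3", "script_with_loop.py")
    io_proc = await asyncio.create_subprocess_exec("python3", "script_with_io.py")
    waiters = [asyncio.create_task(p.wait()) for p in (loop_proc, io_proc)]
    # Wait until either process exits ...
    done, pending = await asyncio.wait(waiters, return_when=asyncio.FIRST_COMPLETED)
    # ... then terminate whichever one is still running.
    for proc in (loop_proc, io_proc):
        if proc.returncode is None:
            proc.terminate()
    await asyncio.gather(*pending, return_exceptions=True)

asyncio.run(main())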
You should use threading for this sort of thing. Try this:
import os
import threading

def script_with_loop():
    try:
        # script_with_loop.py code goes here
        pass
    except:
        os._exit(1)

def script_with_io():
    try:
        # script_with_io.py code goes here
        pass
    except:
        os._exit(1)

threading.Thread(target=script_with_loop, daemon=True).start()
threading.Thread(target=script_with_io, daemon=True).start()

How to Parallelize a Python program on Linux

I have a script that takes as input a list of filenames and loops over them to generate an output file per input file, so I think this is a case which can be easily parallelized.
I have an 8-core machine.
I tried using parallel on this command:
python perfile_code.py list_of_files.txt
But I can't make it work; specifically, my question is how to use parallel in bash with a Python command on Linux, along with the arguments, for the specific case mentioned above.
There is a Linux parallel command (sudo apt-get install parallel), which I read somewhere can do this job, but I don't know how to use it.
Most of the internet resources explain how to do it in Python, but can it be done in bash?
Please help, thanks.
Based on an answer below, here is an example that is still not working; please suggest how to make it work.
I have a folder with 2 files; in this example I just want to create duplicates of them, with different names, in parallel.
# filelist is the directory containing the two files, a.txt and b.txt.
# a.txt is the first file, b.txt is the second file
# I pass a .txt file with both names to the main program
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path
import sys

def translate(filename):
    print(filename)
    f = open(filename, "r")
    g = open(filename + ".x", "w")
    for line in f:
        g.write(line)

def main(path_to_file_with_list):
    futures = []
    with ProcessPoolExecutor(max_workers=8) as executor:
        for filename in Path(path_to_file_with_list).open():
            executor.submit(translate, "filelist/" + filename)
        for future in as_completed(futures):
            future.result()

if __name__ == "__main__":
    main(sys.argv[1])
Based on your comment,
#Ouroborus no, no consider this opensource.com/article/18/5/gnu-parallel i want to run a python program along with this parallel..for a very specific case..if an arbitrary convert program can be piped to parallel ..why wouldn't a python program?
I think this might help:
convert wasn't chosen arbitrarily. It was chosen because it is a better known program that (roughly) maps a single input file, provided via the command line, to a single output file, also provided via the command line.
The typical shell for loop can be used to iterate over a list. In the article you linked, they show an example
for i in *jpeg; do convert $i $i.png ; done
This (again, roughly) takes a list of file names and applies them, one by one, to a command template and then runs that command.
The issue here is that for would necessarily wait until a command is finished before running the next one and so may under-utilize today's multi-core processors.
parallel acts as a kind of replacement for for. It makes the assumption that a command can be executed multiple times simultaneously, each with different arguments, without each instance interfering with the others.
In the article, they show a command using parallel
find . -name "*jpeg" | parallel -I% --max-args 1 convert % %.png
that is equivalent to the previous for command. The difference (still roughly) is that parallel runs several variants of the templated command simultaneously without necessarily waiting for each to complete.
For your specific situation, in order to be able to use parallel, you would need to:
Adjust your python script so that it takes one input (such as a file name) and one output (also possibly a file name), both via the command line (see the sketch after this list).
Figure out how to setup parallel so that it can receive a list of those file names for insertion into a command template to run your python script on each of those files individually.
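For the first step, a hypothetical sketch of what perfile_code.py could be reshaped into (the copy-lines body is just a stand-in borrowed from the example in the question) might be:
# perfile_code.py (hypothetical): one input file and one output file, both from the command line
import sys

def translate(in_path, out_path):
    # Placeholder body: simply copy the input file to the output file.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(line)

if __name__ == "__main__":
    translate(sys.argv[1], sys.argv[2])
With a script shaped like that, each parallel job only has to supply the two file names.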
You can just use an ordinary shell for command, and append the & background indicator to the python command inside the for:
for file in `cat list_of_files.txt`;
do python perfile_code.py $file &
done
Of course, assuming your python code will generate separate outputs by itself.
It is just this simple.
Although this is not the usual way, in general people will favor using Python itself to control the parallel execution of the loop, if you can edit the program. One nice way to do this is to use concurrent.futures in Python to create a worker pool with 8 workers; the shell approach above will launch all instances in parallel at once.
Assuming your code has a translate function that takes in a filename, your Python code could be written as:
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path

def translate(filename):
    ...

def main(path_to_file_with_list):
    futures = []
    with ProcessPoolExecutor(max_workers=8) as executor:
        for filename in Path(path_to_file_with_list).open():
            # strip the trailing newline from each line of the list file
            futures.append(executor.submit(translate, filename.strip()))
        for future in as_completed(futures):
            future.result()

if __name__ == "__main__":
    import sys
    main(sys.argv[1])
This won't depend on special shell syntax, and it takes care of corner cases and number-of-workers handling, which could be hard to do properly from bash.
It is unclear from your question how you run your tasks in serial. But if we assume you run:
python perfile_code.py file1
python perfile_code.py file2
python perfile_code.py file3
:
python perfile_code.py fileN
then the simple way to parallelize this would be:
parallel python perfile_code.py ::: file*
If you have a list of files with one line per file then use:
parallel python perfile_code.py :::: filelist.txt
It will run one job per CPU thread in parallel. So if filelist.txt contains 1000000 names, it will not run them all at the same time, but will only start a new job when one finishes.

Python subprocess always waits for program [duplicate]

I'm trying to port a shell script to the much more readable Python version. The original shell script starts several processes (utilities, monitors, etc.) in the background with "&". How can I achieve the same effect in Python? I'd like these processes not to die when the Python script completes. I am sure it's related to the concept of a daemon somehow, but I couldn't find how to do this easily.
While jkp's solution works, the newer way of doing things (and the way the documentation recommends) is to use the subprocess module. For simple commands it's equivalent, but it offers more options if you want to do something complicated.
Example for your case:
import subprocess
subprocess.Popen(["rm","-r","some.file"])
This will run rm -r some.file in the background. Note that calling .communicate() on the object returned from Popen will block until it completes, so don't do that if you want it to run in the background:
import subprocess
ls_output = subprocess.Popen(["sleep", "30"])
ls_output.communicate() # Will block for 30 seconds
See the documentation here.
Also, a point of clarification: "Background" as you use it here is purely a shell concept; technically, what you mean is that you want to spawn a process without blocking while you wait for it to complete. However, I've used "background" here to refer to shell-background-like behavior.
Note: This answer is less current than it was when posted in 2009. Using the subprocess module shown in other answers is now recommended in the docs:
(Note that the subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using these functions.)
If you want your process to start in the background you can either use system() and call it in the same way your shell script did, or you can spawn it:
import os
os.spawnl(os.P_DETACH, 'some_long_running_command')
(or, alternatively, you may try the more portable os.P_NOWAIT flag).
See the documentation here.
You probably want the answer to "How to call an external command in Python".
The simplest approach is to use the os.system function, e.g.:
import os
os.system("some_command &")
Basically, whatever you pass to the system function will be executed the same as if you'd passed it to the shell in a script.
I found this here:
On Windows (Win XP), the parent process will not finish until longtask.py has finished its work. It is not what you want in a CGI script. The problem is not specific to Python; in the PHP community the problems are the same.
The solution is to pass the DETACHED_PROCESS Process Creation Flag to the underlying CreateProcess function in the Win API. If you happen to have pywin32 installed you can import the flag from the win32process module, otherwise you should define it yourself:
import subprocess
import sys

DETACHED_PROCESS = 0x00000008
pid = subprocess.Popen([sys.executable, "longtask.py"],
                       creationflags=DETACHED_PROCESS).pid
Use subprocess.Popen() with the close_fds=True parameter, which will allow the spawned subprocess to be detached from the Python process itself and continue running even after Python exits.
https://gist.github.com/yinjimmy/d6ad0742d03d54518e9f
import os, time, sys, subprocess

if len(sys.argv) == 2:
    time.sleep(5)
    print('track end')
    if sys.platform == 'darwin':
        subprocess.Popen(['say', 'hello'])
else:
    print('main begin')
    subprocess.Popen(['python', os.path.realpath(__file__), '0'], close_fds=True)
    print('main end')
Both capture output and run in the background with threading
As mentioned on this answer, if you capture the output with stdout= and then try to read(), then the process blocks.
However, there are cases where you need this. For example, I wanted to launch two processes that talk over a port between them, and save their stdout both to a log file and to stdout.
The threading module allows us to do that.
First, have a look at how to do the output redirection part alone in this question: Python Popen: Write to stdout AND log file simultaneously
Then:
main.py
#!/usr/bin/env python3
import os
import subprocess
import sys
import threading
def output_reader(proc, file):
    while True:
        byte = proc.stdout.read(1)
        if byte:
            sys.stdout.buffer.write(byte)
            sys.stdout.flush()
            file.buffer.write(byte)
        else:
            break

with subprocess.Popen(['./sleep.py', '0'], stdout=subprocess.PIPE, stderr=subprocess.PIPE) as proc1, \
     subprocess.Popen(['./sleep.py', '10'], stdout=subprocess.PIPE, stderr=subprocess.PIPE) as proc2, \
     open('log1.log', 'w') as file1, \
     open('log2.log', 'w') as file2:
    t1 = threading.Thread(target=output_reader, args=(proc1, file1))
    t2 = threading.Thread(target=output_reader, args=(proc2, file2))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
sleep.py
#!/usr/bin/env python3
import sys
import time
for i in range(4):
    print(i + int(sys.argv[1]))
    sys.stdout.flush()
    time.sleep(0.5)
After running:
./main.py
stdout gets updated every 0.5 seconds, two lines at a time, to contain:
0
10
1
11
2
12
3
13
and each log file contains the respective log for a given process.
Inspired by: https://eli.thegreenplace.net/2017/interacting-with-a-long-running-child-process-in-python/
Tested on Ubuntu 18.04, Python 3.6.7.
You probably want to start investigating the os module for forking child processes (by opening an interactive session and issuing help(os)). The relevant functions are fork and any of the exec ones. To give you an idea of how to start, put something like this in a function that performs the fork (the function needs to take a list or tuple 'args' as an argument that contains the program's name and its parameters; you may also want to define stdin, stdout and stderr for the new process):
import os
import sys

try:
    pid = os.fork()
except OSError as e:
    ## some debug output
    sys.exit(1)
if pid == 0:
    ## eventually use os.putenv(..) to set environment variables
    ## os.execv runs args[0] as the program and passes args as its argument list
    os.execv(args[0], args)
You can use
import os

pid = os.fork()
if pid == 0:
    pass  # continue with the rest of your code here
This will make the Python process run in the background.
I haven't tried this yet, but using .pyw files instead of .py files should help. .pyw files don't have a console, so in theory the program should not appear and should work like a background process.

Python linux shells

In my program I want to access multiple Linux shells using different processes.
Currently I am using subprocess. I don't have a Linux machine to test this on at the moment, so can you tell me if this works?
Does subprocess work on one terminal? If so, is there an alternative?
This is something like what I am developing:
import multiprocessing
import subprocess

def doSomething(filepath):
    subprocess.call("somecommands")
    subprocess.call("somecommands")

if __name__ == "__main__":
    while True:
        processList = []
        for i in range(numberOfThreads):
            process = multiprocessing.Process(target=doSomething, args=[files])
            process.start()
            processList.append(process)
        for process in processList:
            process.join()
You should use the Popen feature of the subprocess module; that way, I don't think you will need threading anymore, since it doesn't look like you're doing anything serious with shared data.
Now your code should look like:
import subprocess as s_p

# pass the command and its arguments as a list of strings (placeholders here)
proc = s_p.Popen(['some_command', 'arg1', 'arg2'])
print('Process started in a separate shell')
I believe this will do your job!
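For what it's worth, if the goal is simply to have several independent commands running at once, a minimal sketch along these lines (the commands below are placeholders) could skip multiprocessing entirely and use Popen alone:
import subprocess

# Placeholder commands; each Popen starts its own child process immediately.
commands = [["ls", "-l"], ["df", "-h"], ["uname", "-a"]]

procs = [subprocess.Popen(cmd) for cmd in commands]
for proc in procs:
    proc.wait()  # block until every command has finished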
