I am just getting started with multiprocessing and Python, and I need some help catching Ctrl-C in my program. The script I am making reads in a file and then performs some tasks on each line. Before anyone comments on I/O and the advantages/disadvantages of multiprocessing, I am aware :) these tasks lend themselves well to being multithreaded.
I have the following code, and from the documentation I would expect it to work, however it is not catching my keyboard exception! ARRGH... Please help.
Running on Win10 if that makes any difference:
from multiprocessing import cpu_count
from multiprocessing.dummy import Pool as ThreadPool
import argparse
from time import sleep
import signal
import sys

def readfile(filename):
    with open(filename, 'r') as f:
        data = f.readlines()
    return data

def work(line):
    while True:
        try:
            print(f"\rgoing to do some work on {line}")
            countdown(5)
        except (KeyboardInterrupt, SystemExit):
            print("Exiting...")
            break

def countdown(time=30):
    sleep(time)

def parseArgs(args):
    if args.verbose:
        verbose = True
        print("[+] Verbosity turned on")
    else:
        verbose = False
    if args.threads:
        threads = args.threads
    else:
        threads = cpu_count()
    print(f'[+] Using {threads} threads')
    return threads, verbose, args.file

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--file", required=True, help="Insert the file you plan on parsing")
    parser.add_argument("-t", "--threads", type=int, help="Number of threads, by default will use all available processors")
    parser.add_argument("-v", "--verbose", help="increase output verbosity",
                        action="store_true")
    threads, verbose, filename = parseArgs(parser.parse_args())

    # Read the entire file and store it in a variable:
    data = readfile(filename)

    # Init the thread pool
    pool = ThreadPool(threads)  # Number of threads to use
    try:
        pool.map(work, data)  # This launches the workers at the function to do work
    except KeyboardInterrupt:
        print("Exiting...")
    finally:
        pool.close()
        pool.join()
When you press Ctrl-C, the program is probably already past pool.map() and blocked at pool.join() in the finally block, waiting for all threads to finish. By that point execution has left the try block, so the KeyboardInterrupt is not caught.
I am not too sure about the best practices here but I would try:
try:
    pool.map(work, data)  # This launches the workers at the function to do work
    pool.close()
    pool.join()
except KeyboardInterrupt:
    print("Exiting...")
Related
I have written an Nmap TCP port scanner in Python and everything works just fine, except that I'm no longer able to see what I'm typing on the terminal.
First things first.
The code:
import argparse, nmap, sys
from threading import *

def initParser():
    parser = argparse.ArgumentParser()
    parser.add_argument("tgtHost", help="Specify target host")
    parser.add_argument("tgtPort", help="Specify target port")
    args = parser.parse_args()
    return (args.tgtHost, args.tgtPort.split(","))

def nmapScan(tgtHost, tgtPorts):
    nm = nmap.PortScanner()
    lock = Semaphore(value=1)
    for tgtPort in tgtPorts:
        t = Thread(target=nmapScanThread, args=(tgtHost, tgtPort, lock, nm))
        t.start()

def nmapScanThread(tgtHost, tgtPort, lock, nm):
    nm.scan(tgtHost, tgtPort)
    state = nm[tgtHost]['tcp'][int(tgtPort)]['state']
    lock.acquire()
    print("Port {} is {}".format(tgtPort, state))
    lock.release()

if __name__ == '__main__':
    (tgtHost, tgtPorts) = initParser()
    nmapScan(tgtHost, tgtPorts)
    sys.exit(0)
So, after I have run the script I don't see what I'm typing on the console anymore, but I can still execute my invisible commands. As you can see, I want to start a thread for each port, just because I am learning about threading right now.
My assumption is that not all threads are terminated properly, because everything works just fine after I added "t.join()" to the code.
Unfortunately I couldn't manage to find anything about this issue.
Just like this:
import argparse, nmap, sys
from threading import *

def initParser():
    parser = argparse.ArgumentParser()
    parser.add_argument("tgtHost", help="Specify target host")
    parser.add_argument("tgtPort", help="Specify target port")
    args = parser.parse_args()
    return (args.tgtHost, args.tgtPort.split(","))

def nmapScan(tgtHost, tgtPorts):
    nm = nmap.PortScanner()
    lock = Semaphore(value=1)
    for tgtPort in tgtPorts:
        t = Thread(target=nmapScanThread, args=(tgtHost, tgtPort, lock, nm))
        t.start()
        t.join()

def nmapScanThread(tgtHost, tgtPort, lock, nm):
    nm.scan(tgtHost, tgtPort)
    state = nm[tgtHost]['tcp'][int(tgtPort)]['state']
    lock.acquire()
    print("Port {} is {}".format(tgtPort, state))
    lock.release()

if __name__ == '__main__':
    (tgtHost, tgtPorts) = initParser()
    nmapScan(tgtHost, tgtPorts)
    sys.exit(0)
Is this the proper way to handle the problem, or did I mess things up a bit?
Additionally:
I can't see how join() is useful in this example, because I don't think there is any major difference from the same script without threading (see the sketch below).
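That observation is correct: joining each thread immediately after starting it makes the loop wait for one scan to finish before starting the next, which serializes the whole run. A common pattern (my sketch, not from the original thread) is to start all the threads first and join them afterwards, so the scans stay concurrent but the program still waits for every thread before exiting:

def nmapScan(tgtHost, tgtPorts):
    nm = nmap.PortScanner()
    lock = Semaphore(value=1)
    threads = []
    for tgtPort in tgtPorts:
        t = Thread(target=nmapScanThread, args=(tgtHost, tgtPort, lock, nm))
        t.start()
        threads.append(t)
    for t in threads:  # join only after every scan has been started
        t.join()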
I've read a lot of questions on SO and elsewhere on this topic but can't get it working. Perhaps it's because I'm using Windows, I don't know.
What I'm trying to do is download a bunch of files (whose URLs are read from a CSV file) in parallel. I've tried using multiprocessing and concurrent.futures for this with no success.
The main problem is that I can't stop the program on Ctrl-C - it just keeps running. This is especially bad in the case of processes instead of threads (I used multiprocessing for that) because I have to kill each process manually every time.
Here is my current code:
import concurrent.futures
import signal
import sys
import urllib.request

class Download(object):
    def __init__(self, url, filename):
        self.url = url
        self.filename = filename

def perform_download(download):
    print('Downloading {} to {}'.format(download.url, download.filename))
    return urllib.request.urlretrieve(download.url, filename=download.filename)

def main(argv):
    args = parse_args(argv)
    queue = []
    with open(args.results_file, 'r', encoding='utf8') as results_file:
        # Irrelevant CSV parsing...
        queue.append(Download(url, filename))

    def handle_interrupt(signum, frame):
        print('CAUGHT SIGINT!!!!!!!!!!!!!!!!!!!11111111')
        sys.exit(1)

    signal.signal(signal.SIGINT, handle_interrupt)

    with concurrent.futures.ThreadPoolExecutor(max_workers=args.num_jobs) as executor:
        futures = {executor.submit(perform_download, d): d for d in queue}
        try:
            concurrent.futures.wait(futures)
        except KeyboardInterrupt:
            print('Interrupted')
            sys.exit(1)
I'm trying to catch Ctrl-C in two different ways here, but neither of them works. The latter one (except KeyboardInterrupt) actually gets run, but the process won't exit after calling sys.exit.
Before this I used the multiprocessing module like this:
try:
    pool = multiprocessing.Pool(processes=args.num_jobs)
    pool.map_async(perform_download, queue).get(1000000)
except Exception as e:
    pool.close()
    pool.terminate()
    sys.exit(0)
So what is the proper way to add ability to terminate all worker threads or processes once you hit Ctrl-C in the terminal?
System information:
Python version: 3.6.1 32-bit
OS: Windows 10
You are catching the SIGINT signal in a signal handler and re-routing it as a SystemExit exception. This prevents the KeyboardInterrupt exception from ever reaching your main loop.
Moreover, if the SystemExit is not raised in the main thread, it will just kill the child thread where it is raised.
Jesse Noller, the author of the multiprocessing library, explains how to deal with Ctrl-C in an old blog post.
import signal
from multiprocessing import Pool

def initializer():
    """Ignore CTRL+C in the worker process."""
    signal.signal(SIGINT, SIG_IGN)

pool = Pool(initializer=initializer)

try:
    pool.map(perform_download, downloads)
except KeyboardInterrupt:
    pool.terminate()
    pool.join()
I don't believe the accepted answer works under Windows, certainly not under current versions of Python (I am running 3.8.5). In fact, it won't run at all since SIGINT and SIG_IGN will be undefined (what is needed is signal.SIGINT and signal.SIG_IGN).
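For reference, a version of that initializer that actually runs, using the qualified names just mentioned (my edit, not part of the original answer):

import signal
from multiprocessing import Pool

def initializer():
    """Ignore CTRL+C in the worker processes."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)  # fully qualified names

pool = Pool(initializer=initializer)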
This is a known problem under Windows. A solution I have come up with is essentially the reverse of the accepted solution: the main process must ignore keyboard interrupts, and we initialize the process pool so that each worker sets a global flag ctrl_c_entered to False and sets it to True if Ctrl-C is entered. Then any multiprocessing worker function (or method) is decorated with a special decorator, handle_ctrl_c, that first tests the ctrl_c_entered flag; only if it is False does it run the worker function, after re-enabling keyboard interrupts and establishing a try/except handler for KeyboardInterrupt. If the ctrl_c_entered flag was True, or if a keyboard interrupt occurs during the execution of the worker function, the value returned is an instance of KeyboardInterrupt, which the main process can check to determine whether a Ctrl-C was entered.
Thus all submitted tasks will be allowed to start, but they will immediately terminate with a return value of a KeyboardInterrupt instance, and the actual worker function will never be called by the decorator function once a Ctrl-C has been entered.
import signal
from multiprocessing import Pool
from functools import wraps
import time

def handle_ctrl_c(func):
    """
    Decorator function.
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        global ctrl_c_entered
        if not ctrl_c_entered:
            # re-enable keyboard interrupts:
            signal.signal(signal.SIGINT, default_sigint_handler)
            try:
                return func(*args, **kwargs)
            except KeyboardInterrupt:
                ctrl_c_entered = True
                return KeyboardInterrupt()
            finally:
                signal.signal(signal.SIGINT, pool_ctrl_c_handler)
        else:
            return KeyboardInterrupt()
    return wrapper

def pool_ctrl_c_handler(*args, **kwargs):
    global ctrl_c_entered
    ctrl_c_entered = True

def init_pool():
    # set global variables for each process in the pool:
    global ctrl_c_entered
    global default_sigint_handler
    ctrl_c_entered = False
    default_sigint_handler = signal.signal(signal.SIGINT, pool_ctrl_c_handler)

@handle_ctrl_c
def perform_download(download):
    print('begin')
    time.sleep(2)
    print('end')
    return True

if __name__ == '__main__':
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    pool = Pool(initializer=init_pool)
    results = pool.map(perform_download, range(20))
    if any(map(lambda x: isinstance(x, KeyboardInterrupt), results)):
        print('Ctrl-C was entered.')
    print(results)
I have a script that takes a text file as input and performs the testing. What I want to do is create two threads, divide the input text file into two parts, and run them so as to minimize the execution time. Is there a way I can do this?
Thanks
import subprocess
import threading
import time

THREAD_COUNT = 2  # assumption: two threads, as described in the question
input_list = []   # lines read from the input file

class myThread (threading.Thread):
    def __init__(self, ip_list):
        threading.Thread.__init__(self)
        self.input_list = ip_list

    def run(self):
        # Get lock to synchronize threads
        threadLock.acquire()
        print "python Audit.py " + (",".join(x for x in self.input_list))
        p = subprocess.Popen("python Audit.py " + (",".join(x for x in self.input_list)), shell=True)
        # Free lock to release next thread
        threadLock.release()
        while p.poll() is None:
            print('Test Execution in Progress ....')
            time.sleep(60)
        print('Not sleeping any longer. Exited with returncode %d' % p.returncode)

def split_list(input_list, split_count):
    for i in range(0, len(input_list), split_count):
        yield input_list[i:i + split_count]

if __name__ == '__main__':
    threadLock = threading.Lock()
    threads = []
    with open("inputList.txt", "r") as Ptr:
        for i in Ptr:
            try:
                id = str(i).rstrip('\n').rstrip('\r')
                input_list.append(id)
            except Exception as err:
                print err
                print "Exception occurred..."
    try:
        test = split_list(input_list, len(input_list)/THREAD_COUNT)
        list_of_lists = list(test)
    except Exception as err:
        print err
        print "Exception caught in splitting list"
    try:
        # Create threads & start
        for i in range(0, len(list_of_lists)-1):
            # Create new threads
            threads.append(myThread(list_of_lists[i]))
            threads[i].start()
            time.sleep(1)
        # Wait for all threads to complete
        for thread in threads:
            thread.join()
        print "Exiting Main Thread..!"
    except Exception as err:
        print err
        print "Exception caught during THREADING..."
You are trying to do two things at the same time, which is the definition of parallelism. The problem here is that if you are using CPython, threads won't give you parallelism because of the GIL (Global Interpreter Lock). The GIL ensures that only one thread executes Python bytecode at a time, because the CPython interpreter is not considered thread-safe.
What you should use if you really want to do two operations in parallel is the multiprocessing module (import multiprocessing).
Read this: Multiprocessing vs Threading Python
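A minimal sketch of that switch (my illustration; process_line is a stand-in for whatever each line's test actually does):

from multiprocessing import Pool

def process_line(line):
    # stand-in for the per-line test work
    return len(line)

if __name__ == '__main__':
    lines = ["alpha", "beta", "gamma", "delta"]
    with Pool(processes=2) as pool:  # two workers, as in the question
        results = pool.map(process_line, lines)
    print(results)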
Some notes, in random order:
In Python, multithreading is not a good way to approach computationally intensive tasks. A better approach is multiprocessing:
Python: what are the differences between the threading and multiprocessing modules?
For resources that are not shared (in your case, each line will be used exclusively by a single process) you do not need locks. A better approach would be the map function.
import subprocess
import multiprocessing

def processing_function(chunk):
    # hand one chunk of lines to Audit.py, comma-separated as in the question
    subprocess.call(["python", "Audit.py", ",".join(chunk)])

with open('file.txt', 'r') as f:
    lines = [line.rstrip('\n') for line in f]

# split the lines into two halves, one per worker
to_process = [lines[:len(lines)//2], lines[len(lines)//2:]]
p = multiprocessing.Pool(2)
results = p.map(processing_function, to_process)
If the computation requires a variable amount of time depending on the line, using queues to move data between processes instead of mapping could help balance the load (see the sketch below).
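As a rough sketch of that idea (my illustration, not part of the original answer; Audit.py is the script from the question), each worker pulls the next line from a shared queue as soon as it is free, so a fast worker never sits idle waiting for a slow one:

import multiprocessing
import subprocess

def worker(queue):
    # keep pulling lines until the sentinel arrives
    while True:
        line = queue.get()
        if line is None:  # sentinel: no more work
            break
        subprocess.call(["python", "Audit.py", line])

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=worker, args=(queue,)) for _ in range(2)]
    for w in workers:
        w.start()
    with open('file.txt', 'r') as f:
        for line in f:
            queue.put(line.rstrip('\n'))
    for _ in workers:
        queue.put(None)  # one sentinel per worker
    for w in workers:
        w.join()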
I was messing around with a zip file cracker and decided to use the multiprocessing module to speed the process up. It was a complete pain since it was my first time using the module and I don't even fully understand it yet. However, I got it to work.
The problem is that it doesn't complete the word list; it just stops at random points during the word list, and if the password is found it continues to go through the word list instead of just stopping the process.
Does anyone know why it's exhibiting this behaviour?
Source Code For ZipFile Cracker
#!/usr/bin/env python3
import multiprocessing as mp
import zipfile  # Handling the zipfile
import sys      # Command line arguments, and quitting application
import time     # To calculate runtime

def usage(program_name):
    print("Usage: {0} <path to zipfile> <dictionary>".format(program_name))
    sys.exit(1)

def cracker(password):
    try:
        zFile.extractall(pwd=password)
        print("[+] Password Found! : {0}".format(password.decode('utf-8')))
        pool.close()
    except:
        pass

def main():
    global zFile
    global pool
    if len(sys.argv) < 3:
        usage(sys.argv[0])
    zFile = zipfile.ZipFile(sys.argv[1])
    print("[*] Started Cracking")
    startime = time.time()
    pool = mp.Pool()
    for i in open(sys.argv[2], 'r', errors='ignore'):
        pswd = bytes(i.strip('\n'), 'utf-8')
        pool.apply_async(cracker, (pswd,))
        print(pswd)
    runtime = round(time.time() - startime, 5)
    print("[*] Runtime:", runtime, 'seconds')
    sys.exit(0)

if __name__ == "__main__":
    main()
You are terminating your program too early. To test this out, add a harmless time.sleep(10) in the cracker method and observe your program still terminating within a second.
Call join to wait for the pool to finish:
pool = mp.Pool()
for i in open(sys.argv[2], 'r', errors='ignore'):
    pswd = bytes(i.strip('\n'), 'utf-8')
    pool.apply_async(cracker, (pswd,))
pool.close()  # Indicate that no more data is coming
pool.join()   # Wait for pool to finish processing

runtime = round(time.time() - startime, 5)
print("[*] Runtime:", runtime, 'seconds')
sys.exit(0)
Additionally, once you find the right password, calling close just indicates that no more future tasks are coming; all tasks already submitted will still be done. Instead, call terminate to kill the pool without processing any more tasks.
Furthermore, depending on the implementation details of multiprocessing.Pool, the global variable pool may not be available when you need it (and its value isn't serializable anyway). To solve this problem, you can use a callback, as in
def cracker(password):
    try:
        zFile.extractall(pwd=password)
    except RuntimeError:
        return
    return password

def callback(found):
    if found:
        pool.terminate()

...

pool.apply_async(cracker, (pswd,), callback=callback)
Of course, since you now look at the result all the time, apply is not the right way to go. Instead, you can write your code using imap_unordered:
with open(sys.argv[2], 'r', errors='ignore') as passf, \
        multiprocessing.Pool() as pool:
    passwords = (line.strip('\n').encode('utf-8') for line in passf)
    for found in pool.imap_unordered(cracker, passwords):
        if found:
            break
Instead of using globals, you may also want to open the zip file (and create a ZipFile object) in each process, by using an initializer for the pool. Even better (and way faster), forgo all of the I/O in the first place and read just the bytes you need once and then pass them on to the children.
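A sketch of that last suggestion (my illustration, not part of the original answer; the file names are placeholders): read the archive bytes once in the parent and hand them to each worker through a pool initializer, so the children never touch the filesystem:

import io
import zipfile
from multiprocessing import Pool

def init_worker(zip_bytes):
    # build one ZipFile per worker process from the bytes read by the parent
    global zfile
    zfile = zipfile.ZipFile(io.BytesIO(zip_bytes))

def cracker(password):
    try:
        zfile.extractall(pwd=password)
    except RuntimeError:  # bad password
        return None
    return password

if __name__ == '__main__':
    with open('archive.zip', 'rb') as f:  # placeholder archive name
        zip_bytes = f.read()  # read the archive bytes once
    with open('wordlist.txt', 'rb') as wordlist, \
            Pool(initializer=init_worker, initargs=(zip_bytes,)) as pool:
        passwords = (line.rstrip(b'\n') for line in wordlist)
        for found in pool.imap_unordered(cracker, passwords):
            if found:
                print('Password:', found.decode('utf-8'))
                break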
phihag's answer is the correct solution.
I just wanted to provide an additional detail regarding calling terminate() when you've found the correct password. The pool variable in cracker() was not defined when I ran the code. So trying to invoke it from there simply threw an exception:
NameError: name 'pool' is not defined
(My fork() experience is weak, so I don't completely understand why the global zFile is copied to the child processes successfully while pool is not. Even if it were copied, it would not be the same pool in the parent process, right? So any methods invoked on it would have no effect on the real pool in the parent process. Regardless, I prefer this advice listed within the multiprocessing module's Programming guidelines section: Explicitly pass resources to child processes.)
My suggestion is to make cracker() return the password if it is correct, otherwise return None. Then pass a callback to apply_async() that records the correct password, as well as terminating the pool. Here's my take at modifying your code to do this:
#!/usr/bin/env python3
import multiprocessing as mp
import zipfile  # Handling the zipfile
import sys      # Command line arguments, and quitting application
import time     # To calculate runtime
import os

def usage(program_name):
    print("Usage: {0} <path to zipfile> <dictionary>".format(program_name))
    sys.exit(1)

def cracker(zip_file_path, password):
    print('[*] Starting new cracker (pid={0}, password="{1}")'.format(os.getpid(), password))
    try:
        time.sleep(1)  # XXX: to simulate the task taking a bit of time
        with zipfile.ZipFile(zip_file_path) as zFile:
            zFile.extractall(pwd=bytes(password, 'utf-8'))
        return password
    except:
        return None

def main():
    if len(sys.argv) < 3:
        usage(sys.argv[0])
    print('[*] Starting main (pid={0})'.format(os.getpid()))
    zip_file_path = sys.argv[1]
    password_file_path = sys.argv[2]
    startime = time.time()
    actual_password = None
    with mp.Pool() as pool:
        def set_actual_password(password):
            nonlocal actual_password
            if password:
                print('[*] Found password; stopping future tasks')
                pool.terminate()
                actual_password = password

        with open(password_file_path, 'r', errors='ignore') as password_file:
            for pswd in password_file:
                pswd = pswd.strip('\n')
                pool.apply_async(cracker, (zip_file_path, pswd,), callback=set_actual_password)
        pool.close()
        pool.join()

    if actual_password:
        print('[*] Cracked password: "{0}"'.format(actual_password))
    else:
        print('[*] Unable to crack password')
    runtime = round(time.time() - startime, 5)
    print("[*] Runtime:", runtime, 'seconds')
    sys.exit(0)

if __name__ == "__main__":
    main()
Here's an implementation of the advice from #phihag's and #Equality 7-2521's answers:
#!/usr/bin/env python3
"""Brute force zip password.

Usage: brute-force-zip-password <zip archive> <passwords>
"""
import sys
from multiprocessing import Pool
from time import monotonic as timer
from zipfile import ZipFile

def init(archive):  # run at the start of a worker process
    global zfile
    zfile = ZipFile(open(archive, 'rb'))  # open file in each process once

def check(password):
    assert password
    try:
        with zfile.open(zfile.infolist()[0], pwd=password):
            return password  # assume success
    except Exception as e:
        if e.args[0] != 'Bad password for file':
            # assume all other errors happen after the password was accepted
            raise RuntimeError(password) from e

def main():
    if len(sys.argv) != 3:
        sys.exit(__doc__)  # print usage

    start = timer()
    # decode passwords using the preferred locale encoding
    with open(sys.argv[2], errors='ignore') as file, \
            Pool(initializer=init, initargs=[sys.argv[1]]) as pool:  # use all CPUs
        # check passwords encoded using utf-8
        passwords = (line.rstrip('\n').encode('utf-8') for line in file)
        passwords = filter(None, passwords)  # filter empty passwords
        for password in pool.imap_unordered(check, passwords, chunksize=100):
            if password is not None:  # found
                print("Password: '{}'".format(password.decode('utf-8')))
                break
        else:
            sys.exit('Unable to find password')
    print('Runtime: %.5f seconds' % (timer() - start,))

if __name__ == "__main__":
    main()
Note:
each worker process has its own ZipFile object and the zip file is opened once per process: it should make it more portable (Windows support) and improve time performance
the content is not extracted: check(password) tries to open and immediately closes an archive member on success: it is safer and it should improve time performance (no need to create directories, etc)
all errors except 'Bad password for file' while decrypting the archive member are assumed to happen after the password is accepted: the rationale is to avoid silencing unexpected errors; each exception should be considered individually
check(password) expects nonempty passwords
chunksize parameter may drastically improve performance
the rare for/else syntax is used to report the case when the password is not found
the with-statement calls pool.terminate() for you
Related: Keyboard Interrupts with python's multiprocessing Pool
I would like my program to exit as soon as I press Ctrl+C:
import multiprocessing
import os
import time

def sqr(a):
    time.sleep(0.2)
    print 'local {}'.format(os.getpid())
    #raise Exception()
    return a * a

pool = multiprocessing.Pool(processes=4)
try:
    r = [pool.apply_async(sqr, (x,)) for x in range(100)]
    pool.close()
    pool.join()
except:
    print 121312313
    pool.terminate()
    pool.join()

print 'main {}'.format(os.getpid())
This code doesn't work as intended: the program does not quit when I press Ctrl+C. Instead, it prints a few KeyboardInterrupt each time, and just gets stuck forever.
Also, I would like it to exit ASAP if I uncomment #raise ... in sqr. The solutions in Exception thrown in multiprocessing Pool not detected do not seem to be helpful.
Update
I think I finally ended up with this: (let me know if it is wrong)
def sqr(a):
    time.sleep(0.2)
    print 'local {}'.format(os.getpid())
    if a == 20:
        raise Exception('fff')
    return a * a

pool = Pool(processes=4)
r = [pool.apply_async(sqr, (x,)) for x in range(100)]
pool.close()

# Without timeout, cannot respond to KeyboardInterrupt.
# Also need get to raise the exceptions workers may throw.
for item in r:
    item.get(timeout=999999)

# I don't think I need join since I already get everything.
pool.join()

print 'main {}'.format(os.getpid())
This is because of a Python 2.x bug that makes the call to pool.join() uninterruptible. It works fine in Python 3.x. Normally the workaround is to pass a really large timeout to join, but multiprocessing.Pool.join doesn't take a timeout parameter, so you can't use it at all. Instead, you'll need to wait for each individual task in the pool to complete, passing a timeout to the wait() method:
import multiprocessing
import time
import os

Pool = multiprocessing.Pool

def sqr(a):
    time.sleep(0.2)
    print('local {}'.format(os.getpid()))
    #raise Exception()
    return a * a

pool = Pool(processes=4)
try:
    r = [pool.apply_async(sqr, (x,)) for x in range(100)]
    pool.close()
    for item in r:
        item.wait(timeout=9999999)  # Without a timeout, you can't interrupt this.
except KeyboardInterrupt:
    pool.terminate()
finally:
    pool.join()

print('main {}'.format(os.getpid()))
This can be interrupted on both Python 2 and 3.