Speed up Python script's for loop

Assuming you got something like this (copied from here):
#!/usr/bin/python
from scapy.all import *
TIMEOUT = 2
conf.verb = 0
for ip in range(0, 256):
    packet = IP(dst="192.168.0." + str(ip), ttl=20)/ICMP()
    reply = sr1(packet, timeout=TIMEOUT)
    if not (reply is None):
        print reply.src, "is online"
    else:
        # report the address that was pinged, not our own source address
        print "Timeout waiting for %s" % packet[IP].dst
There is no need to wait for each ping to finish before trying the next host. Could I put the loop interior each time into the background along the lines of the & in:
for ip in 192.168.0.{0..255}; do
    ping -c 1 $ip &
done

The first thing you should do is change your range to range(0, 256) so that it is inclusive of 0-255.
Second, you're looking at Python's threading, which can be somewhat similar to Bash process daemonization at an abstract level.
Import multiprocessing and create a pool:
from multiprocessing.pool import ThreadPool
pool = ThreadPool(20) # However many you wish to run in parallel
So take the ping lookup, which is everything inside of the for loop, and make it a function.
def ping(ip):
    packet = IP(dst="192.168.0." + str(ip), ttl=20)/ICMP()
    reply = sr1(packet, timeout=TIMEOUT)
    if not (reply is None):
        print reply.src, "is online"
    else:
        # report the address that was pinged, not our own source address
        print "Timeout waiting for %s" % packet[IP].dst
Then in your for loop,
for ip in range(0, 256):
    pool.apply_async(ping, (ip,))
pool.close()
pool.join()
pool.join() is what waits for all of your threads to return.
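As a self-contained illustration of the same pattern on Python 3, the sketch below replaces the scapy ping with a hypothetical probe function that merely sleeps, so it runs without root privileges or network access. The names probe and scan, and the rule that even-numbered hosts count as "online", are assumptions made for the demonstration only.

```python
import time
from multiprocessing.pool import ThreadPool

def probe(host_id):
    """Hypothetical stand-in for the ping: sleep briefly, report even hosts as online."""
    time.sleep(0.05)  # simulates waiting for a reply
    return host_id, host_id % 2 == 0

def scan(host_ids, workers=20):
    """Run probe() on every host with at most `workers` threads at once."""
    pool = ThreadPool(workers)
    try:
        # map() blocks until all results are in and preserves input order
        return pool.map(probe, host_ids)
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    start = time.time()
    results = scan(range(0, 256))
    online = [host for host, is_up in results if is_up]
    print("%d hosts online, scanned in %.2fs" % (len(online), time.time() - start))
```

With 20 workers the 256 simulated waits overlap in roughly 13 rounds instead of running one after another, which is the speed-up the question is after.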

You can use the threading or multiprocessing module to run asynchronous/non-blocking IO calls.
Read about the difference on SO:
multiprocess or threading in python?

Related

Multiple stdout w/ flush going on in Python threading

I have a small piece of code that I made to test out and hopefully debug the problem without having to modify the code in my main applet in Python. This has led me to build this code:
#!/usr/bin/env python
import sys, threading, time
def loop1():
    count = 0
    while True:
        sys.stdout.write('\r thread 1: ' + str(count))
        sys.stdout.flush()
        count = count + 1
        time.sleep(.3)
        pass
    pass
def loop2():
    count = 0
    print ""
    while True:
        sys.stdout.write('\r thread 2: ' + str(count))
        sys.stdout.flush()
        count = count + 2
        time.sleep(.3)
        pass
if __name__ == '__main__':
    try:
        th = threading.Thread(target=loop1)
        th.start()
        th1 = threading.Thread(target=loop2)
        th1.start()
        pass
    except KeyboardInterrupt:
        print ""
        pass
    pass
My goal with this code is to have both of these threads writing to stdout (with flushing) at the same time, and to have them displayed side by side or something similar. The problem, I assume, is that because each thread flushes its own output, it overwrites the other thread's string. I don't quite know how to get this to work, if it is even possible.
If you just run one of the threads, it works fine. However, I want to be able to run both threads with their own string running at the same time in the terminal output. Here is a picture displaying what I'm getting:
[image: terminal screenshot]
Let me know if you need more info. Thanks in advance.

Instead of allowing each thread to output to stdout, a better solution is to have one thread control stdout exclusively. Then provide a threadsafe channel for the other threads to dispatch data to be output.
One good method to achieve this is to share a Queue between all threads. Ensure that only the output thread is accessing data after it has been added to the queue.
The output thread can store the last message from each other thread and use that data to format stdout nicely. This can include clearing output to display something like this, and update it as each thread generates new data.
Threads
#1: 0
#2: 0
Example
Some decisions were made to simplify this example:
There are gotchas to be wary of when giving arguments to threads.
Daemon threads terminate themselves when the main thread exits. They are used to avoid adding complexity to this answer. Using them on long-running or large applications can pose problems. Other questions discuss how to exit a multithreaded application without leaking memory or locking system resources. You will need to think about how your program needs to signal an exit. Consider using asyncio to save yourself these considerations.
No newlines are used because \r carriage returns cannot clear the whole console. They only allow the current line to be rewritten.
import queue, threading
import time, sys
q = queue.Queue()
keepRunning = True
def loop_output():
    thread_outputs = dict()
    while keepRunning:
        try:
            thread_id, data = q.get_nowait()
            thread_outputs[thread_id] = data
        except queue.Empty:
            # because the queue is used to update, there's no need to wait or block.
            pass
        pretty_output = ""
        for thread_id, data in thread_outputs.items():
            pretty_output += '({}:{}) '.format(thread_id, str(data))
        sys.stdout.write('\r' + pretty_output)
        sys.stdout.flush()
        time.sleep(1)
def loop_count(thread_id, increment):
    count = 0
    while keepRunning:
        msg = (thread_id, count)
        try:
            q.put_nowait(msg)
        except queue.Full:
            pass
        count = count + increment
        time.sleep(.3)
        pass
    pass
if __name__ == '__main__':
    try:
        th_out = threading.Thread(target=loop_output)
        th_out.start()
        # make sure to use args, not pass arguments directly
        th0 = threading.Thread(target=loop_count, args=("Thread0", 1))
        th0.daemon = True
        th0.start()
        th1 = threading.Thread(target=loop_count, args=("Thread1", 3))
        th1.daemon = True
        th1.start()
        # Keep the main thread alive to wait for KeyboardInterrupt
        while True:
            time.sleep(.1)
    except KeyboardInterrupt:
        print("Ended by keyboard stroke")
        keepRunning = False
        for th in [th0, th1]:
            th.join()
Example Output:
(Thread0:110) (Thread1:330)

Python multiprocessing - AssertionError: can only join a child process

I'm taking my first foray into the Python multiprocessing module and I'm running into some problems. I'm very familiar with the threading module but I need to make sure the processes I'm executing are running in parallel.
Here's an outline of what I'm trying to do. Please ignore things like undeclared variables/functions because I can't paste my code in full.
import multiprocessing
import time
def wrap_func_to_run(host, args, output):
    output.append(do_something(host, args))
    return
def func_to_run(host, args):
    return do_something(host, args)
def do_work(server, client, server_args, client_args):
    server_output = func_to_run(server, server_args)
    client_output = func_to_run(client, client_args)
    #handle this output and return a result
    return result
def run_server_client(server, client, server_args, client_args, server_output, client_output):
    server_process = multiprocessing.Process(target=wrap_func_to_run, args=(server, server_args, server_output))
    server_process.start()
    client_process = multiprocessing.Process(target=wrap_func_to_run, args=(client, client_args, client_output))
    client_process.start()
    server_process.join()
    client_process.join()
    #handle the output and return some result
def run_in_parallel(server, client):
    #set up commands for first process
    server_output = client_output = []
    server_cmd = "cmd"
    client_cmd = "cmd"
    process_one = multiprocessing.Process(target=run_server_client, args=(server, client, server_cmd, client_cmd, server_output, client_output))
    process_one.start()
    #set up second process to run - but this one can run here
    result = do_work(server, client, "some server args", "some client args")
    process_one.join()
    #use outputs above and the result to determine result
    return final_result
def main():
    #grab client
    client = client()
    #grab server
    server = server()
    return run_in_parallel(server, client)
if __name__ == "__main__":
    main()
Here's the error I'm getting:
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib64/python2.7/multiprocessing/util.py", line 319, in _exit_function
    p.join()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 143, in join
    assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
I've tried a lot of different things to fix this but my feeling is that there's something wrong with the way I'm using this module.
EDIT:
So I created a file that will reproduce this by simulating the client/server and the work they do - Also I missed an important point which was that I was running this in unix. Another important bit of information was that do_work in my actual case involves using os.fork(). I was unable to reproduce the error without also using os.fork() so I'm assuming the problem is there. In my real world case, that part of the code was not mine so I was treating it like a black box (likely a mistake on my part). Anyways here's the code to reproduce -
#!/usr/bin/python
import multiprocessing
import time
import os
import signal
import sys
class Host():
    def __init__(self):
        self.name = "host"
    def work(self):
        #override - use to simulate work
        pass
class Server(Host):
    def __init__(self):
        self.name = "server"
    def work(self):
        x = 0
        for i in range(10000):
            x+=1
        print x
        time.sleep(1)
class Client(Host):
    def __init__(self):
        self.name = "client"
    def work(self):
        x = 0
        for i in range(5000):
            x+=1
        print x
        time.sleep(1)
def func_to_run(host, args):
    print host.name + " is working"
    host.work()
    print host.name + ": " + args
    return "done"
def do_work(server, client, server_args, client_args):
    print "in do_work"
    server_output = client_output = ""
    child_pid = os.fork()
    if child_pid == 0:
        server_output = func_to_run(server, server_args)
        sys.exit(server_output)
    time.sleep(1)
    client_output = func_to_run(client, client_args)
    # kill and wait for server to finish
    os.kill(child_pid, signal.SIGTERM)
    (pid, status) = os.waitpid(child_pid, 0)
    return (server_output == "done" and client_output =="done")
def run_server_client(server, client, server_args, client_args):
    server_process = multiprocessing.Process(target=func_to_run, args=(server, server_args))
    print "Starting server process"
    server_process.start()
    client_process = multiprocessing.Process(target=func_to_run, args=(client, client_args))
    print "Starting client process"
    client_process.start()
    print "joining processes"
    server_process.join()
    client_process.join()
    print "processes joined and done"
def run_in_parallel(server, client):
    #set up commands for first process
    server_cmd = "server command for run_server_client"
    client_cmd = "client command for run_server_client"
    process_one = multiprocessing.Process(target=run_server_client, args=(server, client, server_cmd, client_cmd))
    print "Starting process one"
    process_one.start()
    #set up second process to run - but this one can run here
    print "About to do work"
    result = do_work(server, client, "server args from do work", "client args from do work")
    print "Joining process one"
    process_one.join()
    #use outputs above and the result to determine result
    print "Process one has joined"
    return result
def main():
    #grab client
    client = Client()
    #grab server
    server = Server()
    return run_in_parallel(server, client)
if __name__ == "__main__":
    main()
If I remove the use of os.fork() in do_work I don't get the error and the code behaves like I would have expected it before (except for the passing of outputs which I've accepted as my mistake/misunderstanding). I can change the old code to not use os.fork() but I'd also like to know why this caused this problem and if there's a workable solution.
EDIT 2:
I started working on a solution that omits os.fork() before the accepted answer. Here's what I have with some tweaking to the amount of simulated work that can be done -
#!/usr/bin/python
import multiprocessing
import time
import os
import signal
import sys
from Queue import Empty
class Host():
    def __init__(self):
        self.name = "host"
    def work(self, w):
        #override - use to simulate work
        pass
class Server(Host):
    def __init__(self):
        self.name = "server"
    def work(self, w):
        x = 0
        for i in range(w):
            x+=1
        print x
        time.sleep(1)
class Client(Host):
    def __init__(self):
        self.name = "client"
    def work(self, w):
        x = 0
        for i in range(w):
            x+=1
        print x
        time.sleep(1)
def func_to_run(host, args, w, q):
    print host.name + " is working"
    host.work(w)
    print host.name + ": " + args
    q.put("ZERO")
    return "done"
def handle_queue(queue):
    done = False
    results = []
    return_val = 0
    while not done:
        #try to grab item from Queue
        tr = None
        try:
            tr = queue.get_nowait()
            print "found element in queue"
            print tr
        except Empty:
            done = True
        if tr is not None:
            results.append(tr)
    for el in results:
        if el != "ZERO":
            return_val = 1
    return return_val
def do_work(server, client, server_args, client_args):
    print "in do_work"
    server_output = client_output = ""
    child_pid = os.fork()
    if child_pid == 0:
        server_output = func_to_run(server, server_args)
        sys.exit(server_output)
    time.sleep(1)
    client_output = func_to_run(client, client_args)
    # kill and wait for server to finish
    os.kill(child_pid, signal.SIGTERM)
    (pid, status) = os.waitpid(child_pid, 0)
    return (server_output == "done" and client_output =="done")
def run_server_client(server, client, server_args, client_args, w, mq):
    local_queue = multiprocessing.Queue()
    server_process = multiprocessing.Process(target=func_to_run, args=(server, server_args, w, local_queue))
    print "Starting server process"
    server_process.start()
    client_process = multiprocessing.Process(target=func_to_run, args=(client, client_args, w, local_queue))
    print "Starting client process"
    client_process.start()
    print "joining processes"
    server_process.join()
    client_process.join()
    print "processes joined and done"
    if handle_queue(local_queue) == 0:
        mq.put("ZERO")
def run_in_parallel(server, client):
    #set up commands for first process
    master_queue = multiprocessing.Queue()
    server_cmd = "server command for run_server_client"
    client_cmd = "client command for run_server_client"
    process_one = multiprocessing.Process(target=run_server_client, args=(server, client, server_cmd, client_cmd, 400000000, master_queue))
    print "Starting process one"
    process_one.start()
    #set up second process to run - but this one can run here
    print "About to do work"
    #result = do_work(server, client, "server args from do work", "client args from do work")
    run_server_client(server, client, "server args from do work", "client args from do work", 5000, master_queue)
    print "Joining process one"
    process_one.join()
    #use outputs above and the result to determine result
    print "Process one has joined"
    return_val = handle_queue(master_queue)
    print return_val
    return return_val
def main():
    #grab client
    client = Client()
    #grab server
    server = Server()
    val = run_in_parallel(server, client)
    if val:
        print "failed"
    else:
        print "passed"
    return val
if __name__ == "__main__":
    main()
This code has some tweaked printouts just to see exactly what is happening. I used a multiprocessing.Queue to store and share outputs across the processes and back into my main thread to be handled. I think this solves the python portion of my problem but there's still some issues in the code I'm working on. The only other thing I can say is that the equivalent to func_to_run involves sending a command over ssh and grabbing any err along with the output. For some reason, this works perfectly fine for a command that has a low execution time, but not well for a command that has a much larger execution time/output. I tried simulating this with the drastically different work values in my code here but haven't been able to reproduce similar results.
EDIT 3
Library code I'm using (again not mine) uses Popen.wait() for the ssh commands and I just read this:
Popen.wait()
Wait for child process to terminate. Set and return returncode attribute.
Warning: This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
I adjusted the code to not buffer and just print as it is received and everything works.
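The communicate() approach recommended in the quoted warning can be sketched as follows (Python 3). The child command is an assumption chosen only to produce more output than a typical OS pipe buffer holds; it is not the ssh command from the question, and run_and_capture is a hypothetical helper name.

```python
import subprocess
import sys

def run_and_capture(cmd):
    """Run cmd, draining stdout and stderr while it runs; return (returncode, out, err)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes until EOF and then waits for the process,
    # so the child never blocks on a full pipe buffer.
    out, err = proc.communicate()
    return proc.returncode, out, err

if __name__ == "__main__":
    # The child writes about 1 MB to stdout, well above a typical pipe buffer.
    child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]
    code, out, err = run_and_capture(child)
    print(code, len(out), len(err))
```

Calling proc.wait() before reading the pipes in the same situation may hang, because the child blocks once the pipe is full while the parent is waiting for it to exit.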

I can change the old code to not use os.fork() but I'd also like to know why this caused this problem and if there's a workable solution.
The key to understanding the problem is knowing exactly what fork() does. CPython docs state "Fork a child process." but this presumes you understand the C library call fork().
Here's what glibc's manpage says about it:
fork() creates a new process by duplicating the calling process. The new process, referred to as the child, is an exact duplicate of the calling process, referred to as the parent, except for the following points: ...
It's basically as if you took your program and made a copy of its program state (heap, stack, instruction pointer, etc) with small differences and let it execute independent of the original. When this child process exits naturally, it will use exit() and that will trigger atexit() handlers registered by the multiprocessing module.
What can you do to avoid it?
omit os.fork(): use multiprocessing instead, like you are exploring now
probably effective: import multiprocessing after executing fork(), only in the child or parent as necessary.
use _exit() in the child (CPython docs state, "Note The standard way to exit is sys.exit(n). _exit() should normally only be used in the child process after a fork().")
https://docs.python.org/2/library/os.html#os._exit
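A minimal sketch of the third option (Python 3, Unix only): the forked child finishes with os._exit(), so exit handlers inherited from the parent, such as the one multiprocessing registers, do not run in the child. The helper name run_in_child is hypothetical.

```python
import os

def run_in_child(func):
    """Fork, run func() in the child, and return the child's exit code to the parent."""
    pid = os.fork()
    if pid == 0:
        # Child process: leave via os._exit() so inherited atexit handlers are skipped.
        code = 1
        try:
            func()
            code = 0
        finally:
            os._exit(code)
    # Parent process: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1  # child was terminated by a signal

if __name__ == "__main__":
    print(run_in_child(lambda: None))
```

Because os._exit() skips normal interpreter cleanup, any buffered output written in the child should be flushed explicitly before exiting.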

In addition to the excellent solution from Cain, if you're facing the same situation as I was, where you can't control how the subprocesses are created, you can try to unregister the atexit function in your subprocesses to get rid of these messages:
import atexit
from multiprocessing.util import _exit_function
atexit.unregister(_exit_function)
ATTENTION: This may lead to leaked resources. For instance, if your subprocesses have their own children, those will not be cleaned up. So clarify your situation and test thoroughly afterwards.

It seems to me that you are threading it one time too many. I would not thread it from run_in_parallel, but simply call run_server_client with the proper arguments, because the server and client will run in parallel inside it.

Multiprocessing Advice

I've been trying to get my head around multiprocessing. The problem is that none of the examples I've come across seem to fit my scenario. I'd like to multiprocess or thread work on a list that is passed in as an argument. Of course, I don't want any item from that list to be worked on twice, so the work needs to be divided among the new threads or processes.
Any advice on the approach I should be looking at would be appreciated.
I am aware my code below is not correct by any means, it is only to aid in visualising what I am trying to attempt to explain.
Pseudocode:
def work_do(ip_list)
    for ip in list
        ping -c 4 ip
def mp_handler(ip_range):
    p = multiprocessing.Pool(4)
    p.map(work_do, args=(ip_range))
ip_list = [192.168.1.1-192.168.1.254]
mp_handler(ip_list)
EDITED:
Some Working Code
import multiprocessing
import subprocess
def job(ip_range):
    p = subprocess.check_output(["ping", "-c", "4", ip])
    print p
def mp_handler(ip_range):
    p = multiprocessing.Pool(2)
    p.map(job, ip_list)
ip_list = ("192.168.1.74", "192.168.1.254")
for ip in ip_list:
    mp_handler(ip)
If you run the above code, you'll notice both IP's are run twice. How do I manage the processes to only work on unique data from the list?

What you are currently doing should pose no problem, but if you want to manually create the processes and then join them later on:
import subprocess
import multiprocessing as mp
# Creating our target function here
def do_work(ip):
    # ping the address that was passed in as the argument
    p = subprocess.check_output(["ping", "-c", "4", ip])
    print(p)
# Your ip list
ip_list = ['8.8.8.8', '8.8.4.4']
procs = [] # Will contain references to our processes
for ip in ip_list:
    # Creating a new process
    p = mp.Process(target=do_work, args=(ip,))
    # Appending to procs
    procs.append(p)
    # starting process
    p.start()
# Waiting for other processes to join
for p in procs:
    p.join()
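Regarding the duplicated pings in the question: the loop there calls mp_handler once per address, and each call maps job over the whole ip_list, so every address is processed once per loop iteration. A sketch that hands the list to the pool a single time is shown below. It uses ThreadPool, which offers the same map interface as multiprocessing.Pool, and a hypothetical job that only reports its argument instead of running ping.

```python
from multiprocessing.pool import ThreadPool

def job(ip):
    """Hypothetical stand-in for the ping call; returns a note about the address it got."""
    return "checked %s" % ip

def mp_handler(ip_list, workers=2):
    """Distribute ip_list across the pool; each item is passed to job() exactly once."""
    pool = ThreadPool(workers)
    try:
        return pool.map(job, ip_list)
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    ip_list = ("192.168.1.74", "192.168.1.254")
    # One call for the whole list, not one call per address.
    for line in mp_handler(ip_list):
        print(line)
```

The pool itself divides the items among its workers, so no manual splitting of the list is needed.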

To ping multiple ip addresses concurrently is easy using multiprocessing:
#!/usr/bin/env python
from multiprocessing.pool import ThreadPool # use threads
from subprocess import check_output
def ping(ip, timeout=10):
    cmd = "ping -c4 -n -w {timeout} {ip}".format(**vars())
    try:
        result = check_output(cmd.split())
    except Exception as e:
        return ip, None, str(e)
    else:
        return ip, result, None
pool = ThreadPool(100) # no more than 100 pings at any single time
for ip, result, error in pool.imap_unordered(ping, ip_list):
    if error is None: # no error
        print(ip) # print ips that have returned 4 packets in timeout seconds
Note: I've used ThreadPool here as a convenient way to limit the number of concurrent pings. If you want to do all pings at once, then you need neither the threading nor the multiprocessing module, because each ping is already in its own process. See Multiple ping script in Python.
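The point in the note, that each external command already runs in its own process, can be sketched without any pool: start every child first, then collect the results. The commands below are placeholder Python one-liners (an assumption, so the sketch runs anywhere); in the scenario above each would be a ping invocation. The helper name run_all is hypothetical.

```python
import subprocess
import sys

def run_all(commands):
    """Start every command at once, then wait for each and return the exit codes in order."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for cmd in commands
    ]
    # All children are already running concurrently; wait() only collects them.
    return [proc.wait() for proc in procs]

if __name__ == "__main__":
    commands = [
        [sys.executable, "-c", "import sys; sys.exit(0)"],  # stands in for a host that answers
        [sys.executable, "-c", "import sys; sys.exit(1)"],  # stands in for a host that does not
    ]
    print(run_all(commands))
```

Output is sent to DEVNULL here, so wait() cannot block on a full pipe; only the exit codes are used.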

Python threads hang and don't close

This is my first try with threads in Python.
I wrote the following program as a very simple example. It just gets a list and prints it using some threads. However, whenever there is an error, the program just hangs in Ubuntu, and I can't seem to do anything to get the command prompt back, so I have to start another SSH session to get back in.
I also have no idea what the issue with my program is.
Is there some kind of error handling I can put in to ensure it doesn't hang?
Also, any idea why Ctrl+C doesn't work? (I don't have a break key.)
from Queue import Queue
from threading import Thread
import HAInstances
import logging
log = logging.getLogger()
logging.basicConfig()
class GetHAInstances:
    def oraHAInstanceData(self):
        log.info('Getting HA instance routing data')
        # HAData = SolrGetHAInstances.TalkToOracle.main()
        HAData = HAInstances.main()
        log.info('Query fetched ' + str(len(HAData)) + ' HA Instances to query')
        # for row in HAData:
        #     print row
        return(HAData)
def do_stuff(q):
    while True:
        print q.get()
        print threading.current_thread().name
        q.task_done()
oraHAInstances = GetHAInstances()
mainHAData = oraHAInstances.oraHAInstanceData()
q = Queue(maxsize=0)
num_threads = 10
for i in range(num_threads):
    worker = Thread(target=do_stuff, args=(q,))
    worker.setDaemon(True)
    worker.start()
for row in mainHAData:
    #print str(row[0]) + ':' + str(row[1]) + ':' + str(row[2]) + ':' + str(row[3])i
    q.put((row[0],row[1],row[2],row[3]))
q.join()

In your thread method, it is recommended to use a try ... except ... finally structure. It guarantees that q.task_done() is called, so control returns to the main thread even when errors occur.
def do_stuff(q):
    while True:
        item = q.get()
        try:
            pass  # do your work with item here
        except Exception:
            pass  # log the error here
        finally:
            q.task_done()
Also, in case you want to kill your program, go find out the pid of your main thread and use kill #pid to kill it. In Ubuntu or Mint, use ps -Ao pid,cmd, in the output, you can find out the pid (first column) by searching for the command (second column) you yourself typed to run your Python script.
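A runnable sketch of the try/except/finally worker described above, written for Python 3, where the module is named queue. The process function and the failing item are hypothetical and exist only to show that q.join() still returns after a worker hits an error.

```python
import logging
import queue
import threading

log = logging.getLogger(__name__)

def process(item):
    """Hypothetical work function that fails on negative input."""
    if item < 0:
        raise ValueError("negative item: %r" % item)
    return item * 2

def worker(q, results):
    while True:
        item = q.get()
        try:
            results.append(process(item))
        except Exception:
            log.exception("failed to process %r", item)
        finally:
            # Always mark the item done, otherwise q.join() blocks forever.
            q.task_done()

def run(items, num_threads=4):
    q = queue.Queue()
    results = []
    for _ in range(num_threads):
        threading.Thread(target=worker, args=(q, results), daemon=True).start()
    for item in items:
        q.put(item)
    q.join()  # returns once every item has been marked done
    return sorted(results)

if __name__ == "__main__":
    print(run([1, -1, 2]))
```

The failing item is logged and skipped, while the remaining items are still processed and the main thread regains control.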

Your q.join() is hanging because your worker has errored, so q.task_done() never got called. You need
import threading
to use
print threading.current_thread().name

parallelly execute blocking calls in python

I need to make blocking XML-RPC calls from my Python script to several physical servers simultaneously and act on the response from each server independently.
To explain in detail, let us assume the following pseudocode:
while True:
    response=call_to_server1() #blocking and takes very long time
    if response==this:
        do that
I want to do this for all the servers simultaneously and independently, but from the same script.

Use the threading module.

Boilerplate threading code (I can tailor this if you give me a little more detail on what you are trying to accomplish)
import threading
def run_me(func):
    while not stop_event.is_set():
        response = func() #blocking and takes very long time
        if response == this: # 'this' is the placeholder from the question
            pass # do that
def call_to_server1():
    #code to call server 1...
    return magic_server1_call()
def call_to_server2():
    #code to call server 2...
    return magic_server2_call()
#used to stop your loop.
stop_event = threading.Event()
# args must be a tuple, hence the trailing comma
t = threading.Thread(target=run_me, args=(call_to_server1,))
t.start()
t2 = threading.Thread(target=run_me, args=(call_to_server2,))
t2.start()
#wait for threads to return.
t.join()
t2.join()
#we are done....
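On Python 3 the same idea can also be expressed with concurrent.futures, which handles starting and joining the threads. This is an alternative technique, not the code from the answer above; call_server is a hypothetical stand-in for the blocking XML-RPC call, and the server names are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def call_server(name, delay):
    """Hypothetical blocking call: sleeps for `delay` seconds, then returns a response."""
    time.sleep(delay)
    return "response from %s" % name

def call_all(servers):
    """Call every server concurrently and handle each response as soon as it arrives."""
    responses = {}
    with ThreadPoolExecutor(max_workers=len(servers)) as executor:
        futures = {executor.submit(call_server, name, delay): name
                   for name, delay in servers.items()}
        for future in as_completed(futures):
            name = futures[future]
            # React to each server independently of the others.
            responses[name] = future.result()
    return responses

if __name__ == "__main__":
    print(call_all({"server1": 0.2, "server2": 0.1}))
```

as_completed yields each future when its call finishes, so a slow server does not delay the handling of a fast one.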

You can use multiprocessing module
import multiprocessing
def call_to_server(ip,port):
    ....
    ....
process = [] # holds the started processes
for i in xrange(server_count):
    process.append( multiprocessing.Process(target=call_to_server,args=(ip,port)))
    process[i].start()
#waiting process to stop
for p in process:
    p.join()

You can use multiprocessing plus queues. With one single sub-process this is the example:
import multiprocessing
import time
def processWorker(input, result):
    def remoteRequest( params ):
        ## this is my remote request
        return True
    while True:
        work = input.get()
        if 'STOP' in work:
            break
        result.put( remoteRequest(work) )
input = multiprocessing.Queue()
result = multiprocessing.Queue()
p = multiprocessing.Process(target = processWorker, args = (input, result))
p.start()
requestlist = ['1', '2']
for req in requestlist:
    input.put(req)
for i in xrange(len(requestlist)):
    res = result.get(block = True)
    print 'retrieved ', res
input.put('STOP')
time.sleep(1)
print 'done'
To have more than one sub-process, simply use a list to store all the sub-processes you start.
The multiprocessing queue is safe to share between processes.
You can then keep track of which request is being executed by each sub-process by storing the request together with a workid (the workid can be a counter incremented whenever new work is put on the queue). Using multiprocessing.Queue is robust because you do not need to rely on parsing stdout/stderr, and you also avoid the related limitations.
You can also set a timeout on how long a get call should wait at most, e.g.:
import Queue
try:
    res = result.get(block = True, timeout = 10)
except Queue.Empty:
    print 'timed out waiting for a result'

Use Twisted.
It has a lot of useful tools for working with networks. It is also very good at working asynchronously.
