I've been trying to get my head around multiprocessing. The problem is that all the examples I've come across don't seem to fit my scenario. I'd like to multiprocess or thread work on a list passed in as an argument; of course I don't want an item from that list being worked on twice, so the work needs to be divided out across the new threads/processes.
Any advice on the approach I should be looking at would be appreciated.
I am aware my code below is not correct by any means; it is only to help visualise what I am trying to explain.
Pseudocode:
def work_do(ip_list):
    for ip in ip_list:
        ping -c 4 ip          # pseudo: ping each address in the list

def mp_handler(ip_range):
    p = multiprocessing.Pool(4)
    p.map(work_do, ip_range)

ip_list = [192.168.1.1 - 192.168.1.254]   # pseudo: the whole address range
mp_handler(ip_list)
EDITED:
Some Working Code
import multiprocessing
import subprocess
def job(ip_range):
p = subprocess.check_output(["ping", "-c", "4", ip])
print p
def mp_handler(ip_range):
p = multiprocessing.Pool(2)
p.map(job, ip_list)
ip_list = ("192.168.1.74", "192.168.1.254")
for ip in ip_list:
mp_handler(ip)
If you run the above code, you'll notice both IPs are pinged twice. How do I manage the processes so that they only work on unique items from the list?
What you are currently doing should pose no problem, but if you want to manually create the processes and then join them later on:
import subprocess
import multiprocessing as mp
# Creating our target function here
def do_work(ip):
    # ping the given address and print the output
    p = subprocess.check_output(["ping", "-c", "4", ip])
    print(p)
# Your ip list
ip_list = ['8.8.8.8', '8.8.4.4']
procs = [] # Will contain references to our processes
for ip in ip_list:
# Creating a new process
p = mp.Process(target=do_work, args=(ip,))
# Appending to procs
procs.append(p)
# starting process
p.start()
# Waiting for other processes to join
for p in procs:
p.join()
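As for the duplicated pings in the edited code above: they most likely come from calling mp_handler() once per address while mapping the worker over the whole list each time. Calling map() a single time over the full list hands each worker a unique item; a minimal sketch:
import multiprocessing
import subprocess
def job(ip):
    # ping one address and return the output
    return subprocess.check_output(["ping", "-c", "4", ip])
def mp_handler(ip_list):
    p = multiprocessing.Pool(2)
    for output in p.map(job, ip_list):   # each address is handed out exactly once
        print output
if __name__ == '__main__':
    ip_list = ("192.168.1.74", "192.168.1.254")
    mp_handler(ip_list)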
Pinging multiple IP addresses concurrently is easy using multiprocessing:
#!/usr/bin/env python
from multiprocessing.pool import ThreadPool # use threads
from subprocess import check_output
def ping(ip, timeout=10):
cmd = "ping -c4 -n -w {timeout} {ip}".format(**vars())
try:
result = check_output(cmd.split())
except Exception as e:
return ip, None, str(e)
else:
return ip, result, None
pool = ThreadPool(100) # no more than 100 pings at any single time
for ip, result, error in pool.imap_unordered(ping, ip_list):
if error is None: # no error
print(ip) # print ips that have returned 4 packets in timeout seconds
Note: I've used ThreadPool here as a convenient way to limit the number of concurrent pings. If you want to do all pings at once then you need neither the threading nor the multiprocessing module, because each ping is already in its own process. See Multiple ping script in Python.
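If you really do want to launch every ping at once with no pool at all, a minimal sketch (no threads; one ping subprocess per address, reusing the two addresses from the question):
from subprocess import Popen
ip_list = ["192.168.1.74", "192.168.1.254"]
# start all pings at once; each ping is already its own OS process
procs = [(ip, Popen(["ping", "-c", "4", ip])) for ip in ip_list]
# afterwards, collect the exit codes
for ip, proc in procs:
    if proc.wait() == 0:
        print ip, "responded"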
Related
I'm running multiple threads in Python. I've tried using the threading module and the multiprocessing module. Even though the execution gives the correct result, every time the terminal gets stuck and the printing of the output gets messed up.
Here's a simplified version of the code.
import subprocess
import threading
import argparse
import sys
result = []
def check_thread(args,components,id):
for i in components:
cmd = <command to be given to terminal>
output = subprocess.check_output([cmd],shell=True)
result.append((id,i,output))
def check(args,components):
# lock = threading.Lock()
# lock = threading.Semaphore(value=1)
thread_list = []
for id in range(3):
        t = threading.Thread(target=check_thread, args=(args, components, id))
thread_list.append(t)
for thread in thread_list:
thread.start()
for thread in thread_list:
thread.join()
for res in result:
print(res)
return res
if __name__ == '__main__':
parser = argparse.ArgumentParser(....)
parser.add_argument(.....)
args = parser.parse_args()
components = ['comp1','comp2']
while True:
print('SELECTION MENU\n1)\n2)\n')
option = raw_input('Enter option')
if option=='1':
res = check(args, components)
        elif option=='2':
<do something else>
else:
sys.exit(0)
I've tried using the multiprocessing module with Process and Pool, tried passing a lock to check_thread, and tried returning a value from check_thread() and using a queue to take in the values, but every time it's the same result: execution is successful but the terminal gets stuck and the printed output is shabby.
Is there any fix to this? I'm using Python 2.7 on a Linux terminal.
Here is how the shabby output looks (screenshot of the garbled terminal output omitted).
You should use a multiprocessing Queue, not a shared list.
import multiprocessing as mp
import random
import string

# Define an output queue
output = mp.Queue()

# Define an example function
def function(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    res = ''.join(random.choice(string.ascii_letters + string.digits)
                  for _ in range(length))
    # Store the result in the output queue instead of a shared list
    output.put(res)
# Setup a list of processes that we want to run
processes = [mp.Process(target=function, args=(5, output)) for x in range(10)]
# Run processes
for p in processes:
p.start()
# Exit the completed processes
for p in processes:
p.join()
# Get process results from the output queue
results = [output.get() for p in processes]
print(results)
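Applied to the check()/check_thread() functions from the question, a minimal sketch could look like this (the real command is replaced by a hypothetical echo placeholder); results go onto a queue and are printed only from the parent process, which keeps the terminal output from interleaving:
import multiprocessing as mp
import subprocess
def check_worker(component, comp_id, output):
    # run the command for one component and queue the result
    cmd = "echo checking %s" % component   # placeholder for the real command
    result = subprocess.check_output([cmd], shell=True)
    output.put((comp_id, component, result))
def check(components):
    output = mp.Queue()
    procs = [mp.Process(target=check_worker, args=(c, i, output))
             for i, c in enumerate(components)]
    for p in procs:
        p.start()
    results = [output.get() for _ in procs]   # collect before joining
    for p in procs:
        p.join()
    for res in results:
        print res
    return results
check(['comp1', 'comp2'])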
I am new to Python and I am currently working on a multiple-webpage scraper. While I was playing around with Python I found out about threading, and this really speeds up the code. The problem is that the script scrapes lots of sites, and I'd like to do this in 'batches' while using threading.
When I've got an array of 1000 items, I'd like to grab 10 items; when the script is done with these 10 items, grab 10 new items until there is nothing left.
I hope someone can help me. Thanks in advance!
import subprocess
import threading
from multiprocessing import Pool
def scrape(url):
return subprocess.call("casperjs test.js --url=" + url, shell=True)
if __name__ == '__main__':
pool = Pool()
sites = ["http://site1.com", "http://site2.com", "http://site3.com", "http://site4.com"]
results = pool.imap(scrape, sites)
for result in results:
print(result)
In the future I'll use an SQLite database where I store all the URLs (this will replace the array). When I run the script I want to be able to stop the process and continue whenever I want. This is not my question, but it is the context of my problem.
Question: ... array of 1000 items I'd like to grab 10 items
for p in range(0, 1000, 10):
process10(sites[p:p+10])
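process10() above is only a placeholder name; a hedged sketch of one way to implement it, reusing the scrape() function from the question and a throwaway Pool per batch of 10:
from multiprocessing import Pool
def process10(batch):
    # scrape up to 10 urls in parallel and wait for the whole batch to finish
    pool = Pool(10)
    results = pool.map(scrape, batch)
    pool.close()
    pool.join()
    return results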
How about using Process and Queue from multiprocessing? Writing a worker function and calling it from a loop makes it run in batches. With Process, jobs can be started and stopped when needed, giving you more control over them.
import subprocess
from multiprocessing import Process, Queue
def worker(url_queue, result_queue):
for url in iter(url_queue.get, 'DONE'):
scrape_result = scrape(url)
result_queue.put(scrape_result)
def scrape(url):
return subprocess.call("casperjs test.js --url=" + url, shell=True)
if __name__ == '__main__':
sites = ['http://site1.com', "http://site2.com", "http://site3.com", "http://site4.com", "http://site5.com",
"http://site6.com", "http://site7.com", "http://site8.com", "http://site9.com", "http://site10.com",
"http://site11.com", "http://site12.com", "http://site13.com", "http://site14.com", "http://site15.com",
"http://site16.com", "http://site17.com", "http://site18.com", "http://site19.com", "http://site20.com"]
url_queue = Queue()
result_queue = Queue()
processes = []
for url in sites:
url_queue.put(url)
    for i in range(10):
        p = Process(target=worker, args=(url_queue, result_queue))
        p.start()
        processes.append(p)
        url_queue.put('DONE')   # one sentinel per worker so every process exits
    for p in processes:
        p.join()
    result_queue.put('DONE')
    for response in iter(result_queue.get, 'DONE'):
        print(response)
Note that Queue is a FIFO queue, so elements are pulled out in the same order they were put in.
Assuming you got something like this (copied from here):
#!/usr/bin/python
from scapy.all import *
TIMEOUT = 2
conf.verb = 0
for ip in range(0, 256):
packet = IP(dst="192.168.0." + str(ip), ttl=20)/ICMP()
reply = sr1(packet, timeout=TIMEOUT)
if not (reply is None):
print reply.src, "is online"
else:
print "Timeout waiting for %s" % packet[IP].src
There is no need to wait for each ping to finish before trying the next host. Could I put the body of the loop into the background each time, along the lines of the & in:
for ip in 192.168.0.{0..255}; do
ping -c 1 $ip &
done
The first thing you should do is make sure your range is range(0, 256), so that it covers 0-255 inclusive.
Second, you're looking for Python's threading, which at an abstract level can be somewhat similar to backgrounding a process with & in Bash.
Import ThreadPool from multiprocessing.pool and create a pool:
from multiprocessing.pool import ThreadPool
pool = ThreadPool(20) # However many you wish to run in parallel
So take the ping lookup, which is everything inside of the for loop, and make it a function.
def ping(ip):
    packet = IP(dst="192.168.0." + str(ip), ttl=20)/ICMP()
    reply = sr1(packet, timeout=TIMEOUT)
    if reply is not None:
        print reply.src, "is online"
    else:
        print "Timeout waiting for %s" % packet[IP].dst
Then in your for loop,
for ip in range(0, 256):
pool.apply_async(ping, (ip,))
pool.close()
pool.join()
pool.join() is what waits for all of your threads to return.
You can use the threading or multiprocessing module to run asynchronous/non-blocking IO calls.
Read about the difference on SO:
multiprocess or threading in python?
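For instance, a minimal sketch that starts one plain thread per address, assuming the ping() function defined in the previous answer:
import threading
threads = []
for ip in range(0, 256):
    t = threading.Thread(target=ping, args=(ip,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()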
I have to launch and execute 24 independent Python scripts on Windows 7. I want one script to launch them all at the same time... without ruling them all (I'm not Sauron) or waiting for them to end. I find os.startfile() interesting for that, but I did not succeed in sending arguments to those 24 scripts.
coincoin1.py (one of the 24 scripts to be launched)
import sys
print "hello:",sys.argv
Anti_Sauron_script.py (the one that will launch the 24 all together)
import os
import sys
sys.argv = ["send", "those", "arguments"]
os.startfile("C:\\Users\\coincoin1.py")
How to send arguments to those scripts and launch them all together?
You may use an independent process (multiprocessing.Process) and two queues (multiprocessing.Queue) to communicate with it, one for the input and the other for the output.
Example on starting the process:
import multiprocessing
import subprocess

def processWorker(input, result):
    for command in iter(input.get, 'STOP'):
        ## execute your command here
        pipe = subprocess.Popen(command, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, shell=True)
        stdout, stderr = pipe.communicate()
        result.put(pipe.returncode)

input = multiprocessing.Queue()
result = multiprocessing.Queue()
p = multiprocessing.Process(target=processWorker, args=(input, result))
p.start()

commandlist = ['ls -l /', 'ls -l /tmp/']
for command in commandlist:
    input.put(command)
input.put('STOP')

for i in xrange(len(commandlist)):
    res = result.get(block=True)
    if res != 0:
        print 'One command failed'
You can then keep track of which command is being executed by each subprocess simply by storing the command associated with a work id (the work id can be a counter incremented each time the queue is filled with new work).
Usage of multiprocessing.Queue is robust since you do not need to rely on stdout/stderr parsing, and you also avoid the related limitations.
Moreover, you can easily manage more subprocesses.
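To make the work-id bookkeeping concrete, here is a hedged sketch, a variation of the example above in which the input queue carries hypothetical (workid, command) tuples and the worker reports (workid, returncode) pairs:
import multiprocessing
import subprocess
def processWorker(input, result):
    # each work item is a (workid, command) tuple; 'STOP' ends the loop
    for workid, command in iter(input.get, 'STOP'):
        pipe = subprocess.Popen(command, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, shell=True)
        pipe.communicate()
        result.put((workid, pipe.returncode))
input = multiprocessing.Queue()
result = multiprocessing.Queue()
p = multiprocessing.Process(target=processWorker, args=(input, result))
p.start()
commandlist = ['ls -l /', 'ls -l /tmp/']
for workid, command in enumerate(commandlist):
    input.put((workid, command))
input.put('STOP')
for _ in commandlist:
    workid, returncode = result.get(block=True)
    if returncode != 0:
        print 'Command %r failed' % commandlist[workid]
p.join()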
You can also set a timeout on how long you want a get call to wait at most, e.g.:
import Queue
try:
    res = result.get(block=True, timeout=10)
except Queue.Empty:
    print 'No result within the timeout'
You want to use the subprocess module: http://docs.python.org/library/subprocess.html and specifically the first example in this subsection on spawning processes without waiting for them to finish http://docs.python.org/library/subprocess.html#replacing-the-os-spawn-family
something like this?
from subprocess import Popen, PIPE
python_scripts = ['coincoin1.py','coincoin2.py','coincoin3.py'...]
args = ' -w hat -e ver'
procs = []
for f in python_scripts:
procs.append(Popen(f+args, shell=True,stdout=PIPE,stderr=PIPE))
results = []
while procs:
    results.append(procs.pop(0).communicate())
do_something_with_results(results)
Use the call function from the subprocess module (http://docs.python.org/library/subprocess.html#module-subprocess).
import subprocess
subprocess.call([path, arg1, arg2...])
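For instance, with one of the scripts and the arguments from the question (note that call waits for the script to finish before returning):
import subprocess
import sys
# launch the script with its arguments using the current interpreter
subprocess.call([sys.executable, "C:\\Users\\coincoin1.py", "send", "those", "arguments"])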
I need to do a blocking XML-RPC call from my Python script to several physical servers simultaneously and perform actions based on the response from each server independently.
To explain in detail, let us assume the following pseudo code:
while True:
response=call_to_server1() #blocking and takes very long time
if response==this:
do that
I want to do this for all the servers simultaneously and independently, but from the same script.
Use the threading module.
Boilerplate threading code (I can tailor this if you give me a little more detail on what you are trying to accomplish)
import threading

def run_me(func):
while not stop_event.isSet():
response= func() #blocking and takes very long time
if response==this:
do that
def call_to_server1():
#code to call server 1...
return magic_server1_call()
def call_to_server2():
#code to call server 2...
return magic_server2_call()
#used to stop your loop.
stop_event = threading.Event()
t = threading.Thread(target=run_me, args=(call_to_server1,))
t.start()
t2 = threading.Thread(target=run_me, args=(call_to_server2,))
t2.start()
#wait for threads to return.
t.join()
t2.join()
#we are done....
You can use the multiprocessing module:
import multiprocessing

def call_to_server(ip, port):
    ....
    ....

process = []
for i in xrange(server_count):
    process.append(multiprocessing.Process(target=call_to_server, args=(ip, port)))
    process[i].start()

# waiting for the processes to stop
for p in process:
    p.join()
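If the blocking call is a plain XML-RPC request, a hedged sketch of what call_to_server might look like with the standard xmlrpclib module (hypothetical URLs and method name):
import multiprocessing
import xmlrpclib
def call_to_server(url):
    server = xmlrpclib.ServerProxy(url)
    response = server.some_method()   # hypothetical remote method, blocking
    if response == 'this':
        print url, 'do that'          # act on this server's reply independently
urls = ['http://server1:8000/RPC2', 'http://server2:8000/RPC2']   # hypothetical endpoints
process = [multiprocessing.Process(target=call_to_server, args=(u,)) for u in urls]
for p in process:
    p.start()
for p in process:
    p.join()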
You can use multiprocessing plus queues. With one single sub-process, this is an example:
import multiprocessing
import time
def processWorker(input, result):
def remoteRequest( params ):
## this is my remote request
return True
while True:
work = input.get()
if 'STOP' in work:
break
result.put( remoteRequest(work) )
input = multiprocessing.Queue()
result = multiprocessing.Queue()
p = multiprocessing.Process(target = processWorker, args = (input, result))
p.start()
requestlist = ['1', '2']
for req in requestlist:
input.put(req)
for i in xrange(len(requestlist)):
res = result.get(block = True)
print 'retrieved ', res
input.put('STOP')
time.sleep(1)
print 'done'
To have more than one sub-process, simply use a list object to store all the sub-processes you start.
The multiprocessing queue is a process-safe object.
You can then keep track of which request is being executed by each sub-process simply by storing the request associated with a work id (the work id can be a counter incremented each time the queue is filled with new work). Usage of multiprocessing.Queue is robust since you do not need to rely on stdout/stderr parsing, and you also avoid the related limitations.
You can also set a timeout on how long you want a get call to wait at most, e.g.:
import Queue
try:
    res = result.get(block=True, timeout=10)
except Queue.Empty:
    print 'No result within the timeout'
Use Twisted.
It has a lot of useful functionality for working with the network, and it is also very good at working asynchronously.
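For the XML-RPC use case above, a hedged sketch with Twisted's built-in XML-RPC client (hypothetical server URLs and method name); each server's response is handled independently as it arrives:
from twisted.internet import reactor, defer
from twisted.web.xmlrpc import Proxy
SERVERS = ['http://server1:8000/RPC2', 'http://server2:8000/RPC2']   # hypothetical endpoints
def on_response(response, server):
    print server, 'answered:', response   # act on this server's reply independently
def on_error(failure, server):
    print server, 'failed:', failure.getErrorMessage()
deferreds = []
for url in SERVERS:
    d = Proxy(url).callRemote('some_method')   # hypothetical remote method
    d.addCallback(on_response, url)
    d.addErrback(on_error, url)
    deferreds.append(d)
# stop the reactor once every call has either succeeded or failed
defer.DeferredList(deferreds).addBoth(lambda _: reactor.stop())
reactor.run()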