The multiprocessing python module is unable to put back the processed results - python

I'm writing a TCP SYN scanner that checks for all the open ports. The script is able to find all the open ports by making use of multiple cores. At the end of the script, when trying to fetch the results using the get() method, the script becomes unresponsive. On a keyboard interrupt, the traceback shown below the code appears. When I'm using 2 cores, the script runs fine, but when the loop is made to run 3 or more times (utilizing 3 or more cores), the script gets stuck. Any suggestions on how to proceed?
==============Code is below=====================================
#!/usr/bin/python
import multiprocessing as mp
from scapy.all import *
import sys
import time

results = []
output = mp.Queue()
processes = []

def portScan(ports, output):
    ip = sys.argv[1]
    for port in range(ports - 100, ports):
        response = sr1(IP(dst=ip)/TCP(dport=port, flags="S"), verbose=False, timeout=.2)
        if response:
            if response[TCP].flags == 18:
                print "port number ======> %d <====== Status: OPEN" % (port)
                output.put(port)

ports = 0
for loop in range(4):
    ports += 100
    print "Ports %d sent as the argument" % ports
    processes.append(mp.Process(target=portScan, args=(ports, output)))

for p in processes:
    p.start()

for p in processes:
    p.join()

results = [output.get() for p in processes]
===========Output======================
./tcpSynmultiprocess.py 10.0.2.1
WARNING: No route found for IPv6 destination :: (no default route?)
Ports 100 sent as the argument
Ports 200 sent as the argument
Ports 300 sent as the argument
port number ======> 23 <====== Status: OPEN
port number ======> 80 <====== Status: OPEN
^CTraceback (most recent call last):
===========TraceBack===================
^CTraceback (most recent call last):
File "./tcpSynmultiprocess.py", line 43, in <module>
results = [output.get() for p in processes]
File "/usr/lib/python2.7/multiprocessing/queues.py", line 117, in get
res = self._recv()
KeyboardInterrupt

By default, Queue.get() blocks until it has data to return, which it never will if all the processes have already ended.
You can use output.get(False) to avoid blocking when there is nothing left to fetch (you'll have to handle the Queue.Empty exception).
Or, since the queue can also hold more items than there are processes, you should rather iterate over Queue.qsize() instead of processes:
results = [output.get() for x in range(output.qsize())]
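If you go with the non-blocking variant instead, a minimal sketch (Python 2, reusing the output queue from the code above) could look like:

import Queue  # use "import queue" on Python 3

results = []
while True:
    try:
        results.append(output.get(False))  # same effect as output.get_nowait()
    except Queue.Empty:
        break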

#!/usr/bin/python
import multiprocessing as mp
from scapy.all import *
import sys
import time

results = []
output = mp.Queue()
processes = []

def portScan(ports, output):
    ip = sys.argv[1]
    for port in range(ports - 100, ports):
        response = sr1(IP(dst=ip)/TCP(dport=port, flags="S"), verbose=False, timeout=.2)
        if response:
            if response[TCP].flags == 18:
                print "port number ======> %d <====== Status: OPEN" % (port)
                output.put(port)

ports = 0
for loop in range(4):
    ports += 100
    print "Ports %d sent as the argument" % ports
    processes.append(mp.Process(target=portScan, args=(ports, output)))

for p in processes:
    p.start()

for p in processes:
    p.join()

for size in range(output.qsize()):
    try:
        results.append(output.get())
    except:
        print "Nothing fetched from the Queue..."

print results

Related

How can I get user input in a thread without EOFError occurring in Python?

I am trying to receive and send data at the same time, and my idea for doing this was:
import multiprocessing
import time
from reprint import output
import time
import random

def receiveThread(queue):
    while True:
        queue.put(random.randint(0, 50))
        time.sleep(0.5)

def sendThread(queue):
    while True:
        queue.put(input())

if __name__ == "__main__":
    send_queue = multiprocessing.Queue()
    receive_queue = multiprocessing.Queue()

    send_thread = multiprocessing.Process(target=sendThread, args=[send_queue],)
    receive_thread = multiprocessing.Process(target=receiveThread, args=[receive_queue],)

    receive_thread.start()
    send_thread.start()

    with output(initial_len=2, interval=0) as output_lines:
        while True:
            output_lines[0] = "Received: {}".format(str(receive_queue.get()))
            output_lines[1] = "Last Sent: {}".format(str(send_queue.get()))
            #output_lines[2] = "Input: {}".format() i don't know how
            #also storing the data in a file but that's irrelevant for here
This, however, results in:
Received: 38 Process Process-1:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/mge/repos/python/post_ug/manual_post/main.py", line 14, in sendThread
    queue.put(input())
EOFError: EOF when reading a line
I hope you can see what I am trying to do, but I will explain it some more: I want one thread that gets data from a server (replaced here with random.randint()), and I want another thread that, while the first one is constantly checking for data, reads an input. I would like it to look somewhat like:
Received: 38 Received: 21 Received: 12
Last Sent: => Last Sent: Hello World! => Last Sent: Lorem Ipsum => ...
Input: Hello Wo Input: Lore Input:
But I have no idea how to get it done. If I replace the queue.put(input()) with another queue.put(random.randint(0, 50)), the printing in the two lines works as expected, but:
how can I have an 'input field' at the bottom, and
how can I get the input without the EOF?
Looks like, according to your description ("I want one thread that gets data from a server that I have replaced with the random.randint(), and I want one thread that, while the other one is constantly checking for the data, is getting an input."), what you really want to use is multi-threading. In your code, however, you are creating and executing 2 new processes instead of 2 new threads. So, if what you want is multi-threading, do the following instead, replacing the use of multiprocessing with threading:
from queue import Queue
import threading
import time
from reprint import output
import time
import random

def receiveThread(queue):
    while True:
        queue.put(random.randint(0, 50))
        time.sleep(0.5)

def sendThread(queue):
    while True:
        queue.put(input())

if __name__ == "__main__":
    send_queue = Queue()
    receive_queue = Queue()

    send_thread = threading.Thread(target=sendThread, daemon=True, args=(send_queue,))
    receive_thread = threading.Thread(target=receiveThread, daemon=True, args=(receive_queue,))

    receive_thread.start()
    send_thread.start()

    with output(initial_len=2, interval=0) as output_lines:
        while True:
            output_lines[0] = "Received: {}".format(str(receive_queue.get()))
            output_lines[1] = "Last Sent: {}".format(str(send_queue.get()))
            #output_lines[2] = "Input: {}".format() i don't know how
            #also storing the data in a file but that's irrelevant for here
The instances of queue.Queue are thread-safe, so they can safely be used by multi-threaded code, as in the code above.
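Note that both get() calls in the display loop block, so a new line is only drawn once both queues have produced something. If that becomes a problem, a hedged sketch of a non-blocking inner loop (a drop-in for the while True body above) could look like:

import queue  # standard-library module, only needed here for the Empty exception

while True:
    try:
        output_lines[0] = "Received: {}".format(receive_queue.get_nowait())
    except queue.Empty:
        pass
    try:
        output_lines[1] = "Last Sent: {}".format(send_queue.get_nowait())
    except queue.Empty:
        pass
    time.sleep(0.1)  # avoid busy-waiting at 100% CPU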

Running lots of processes modifying the same list with Python Multiprocessing results in OSError: Too many open files

I can't get the following code to execute successfully, always ending up with the following error:
OSError: [Errno 24] Too many open files
with the most recent call being p.start().
As you can see, I already tried to execute the processes in chunks of 500, since they run fine when only 500 are executed; however, after the second loop I still receive the above error.
I guess that the processes are not properly closed after execution, but I could not figure out how to check for that and close them properly...
here is my code:
import multiprocessing

manager = multiprocessing.Manager()
manager_lists = manager.list()

def multiprocessing_lists(a_list):
    thread_lists = []
    for r in range(400):
        thread_lists.append(a_list)
    manager_lists.extend(thread_lists)

manager = multiprocessing.Manager()
manager_lists = manager.list()

processes = []
counter = 0
chunk_size = 500
all_processes = 4000

for i in range(0, all_processes, chunk_size):
    for j in range(chunk_size):
        a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
        p = multiprocessing.Process(target=multiprocessing_lists, args=(a_list,))
        processes.append(p)
        p.start()
        counter += 1
    for process in processes:
        process.join()
        process.terminate()

print(len(manager_lists))
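A pattern that often helps with the descriptor limit is to keep only one chunk of processes alive at a time: join each chunk and drop the references before starting the next one. A sketch of that idea (not a guaranteed fix), reusing multiprocessing_lists from above:

for i in range(0, all_processes, chunk_size):
    chunk = []
    for j in range(chunk_size):
        p = multiprocessing.Process(target=multiprocessing_lists, args=([1, 2, 3, 4, 5, 6, 7, 8, 9],))
        chunk.append(p)
        p.start()
    for p in chunk:
        p.join()      # wait for the whole chunk before starting the next one
    del chunk         # drop references so the finished Process objects (and their pipes) can be freed

print(len(manager_lists))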

How can I terminate running jobs without closing connection to the core? (currently using execnet)

I have a cluster of computers which uses a master node to communicate with the slave nodes in the cluster.
The main problem I'm facing with execnet is being able to kill certain jobs that are running and then having new jobs requeued on the same core the killed job was running on (as I want to utilize all cores of the slave nodes at any given time).
As of now there is no way to terminate running jobs using execnet, so I figured that if I could just kill the jobs manually through a bash script, say sudo kill 12345 where 12345 is the PID of the job (obtaining the PID of each job is another thing not supported by execnet, but that's another topic), then it would terminate the job and another one could be requeued on the core that was just freed. It does kill the job correctly, however it also closes the connection to that channel (the core; the master node communicates with each core individually) and that core is then not utilized anymore until all jobs are done. Is there a way to terminate a running job without killing the connection to the core?
Here is the script to submit jobs
import execnet, os, sys
import re
import socket
import numpy as np
import pickle, cPickle
from copy import deepcopy
import time
import job

def main():
    print 'execnet source files are located at:\n {}/\n'.format(
        os.path.join(os.path.dirname(execnet.__file__))
    )
    # Generate a group of gateways.
    work_dir = '/home/mpiuser/pn2/'
    f = 'cluster_core_info.txt'
    n_start, n_end = 250000, 250008
    ci = get_cluster_info(f)
    group, g_labels = make_gateway_group(ci, work_dir)
    mch = group.remote_exec(job)
    args = range(n_start, n_end+1) # List of parameters to compute factorial.
    manage_jobs(group, mch, queue, g_labels, args)
    # Close the group of gateways.
    group.terminate()

def get_cluster_info(f):
    nodes, ncores = [], []
    with open(f, 'r') as fid:
        while True:
            line = fid.readline()
            if not line:
                fid.close()
                break
            line = line.strip('\n').split()
            nodes.append(line[0])
            ncores.append(int(line[1]))
    return dict( zip(nodes, ncores) )

def make_gateway_group(cluster_info, work_dir):
    ''' Generate gateways on all cores in remote nodes. '''
    print 'Gateways generated:\n'
    group = execnet.Group()
    g_labels = []
    nodes = list(cluster_info.keys())
    for node in nodes:
        for i in range(cluster_info[node]):
            group.makegateway(
                "ssh={0}//id={0}_{1}//chdir={2}".format(
                    node, i, work_dir
                ))
            sys.stdout.write(' ')
            sys.stdout.flush()
            print list(group)[-1]
            # Generate a string 'node-id_core-id'.
            g_labels.append('{}_{}'.format(re.findall(r'\d+', node)[0], i))
    print ''
    return group, g_labels

def get_mch_id(g_labels, string):
    ids = [x for x in re.findall(r'\d+', string)]
    ids = '{}_{}'.format(*ids)
    return g_labels.index(ids)

def manage_jobs(group, mch, queue, g_labels, args):
    args_ref = deepcopy(args)
    terminated_channels = 0
    active_jobs, active_args = [], []
    while True:
        channel, item = queue.get()
        if item == 'terminate_channel':
            terminated_channels += 1
            print " Gateway closed: {}".format(channel.gateway.id)
            if terminated_channels == len(mch):
                print "\nAll jobs done.\n"
                break
            continue
        if item != "ready":
            mch_id_completed = get_mch_id(g_labels, channel.gateway.id)
            depopulate_list(active_jobs, mch_id_completed, active_args)
            print " Gateway {} channel id {} returned:".format(
                channel.gateway.id, mch_id_completed)
            print " {}".format(item)
        if not args:
            print "\nNo more jobs to submit, sending termination request...\n"
            mch.send_each(None)
            args = 'terminate_channel'
        if args and \
           args != 'terminate_channel':
            arg = args.pop(0)
            idx = args_ref.index(arg)
            channel.send(arg)  # arg is copied by value to the remote side of
                               # channel to be executed. Maybe blocked if the
                               # sender queue is full.
            # Get the id of current channel used to submit a job,
            # this id can be used to refer mch[id] to terminate a job later.
            mch_id_active = get_mch_id(g_labels, channel.gateway.id)
            print "Job {}: {}! submitted to gateway {}, channel id {}".format(
                idx, arg, channel.gateway.id, mch_id_active)
            populate_list(active_jobs, mch_id_active,
                          active_args, arg)

def populate_list(jobs, job_active, args, arg_active):
    jobs.append(job_active)
    args.append(arg_active)

def depopulate_list(jobs, job_completed, args):
    i = jobs.index(job_completed)
    jobs.pop(i)
    args.pop(i)

if __name__ == '__main__':
    main()
and here is my job.py script:
#!/usr/bin/env python
import os, sys
import socket
import time
import numpy as np
import pickle, cPickle
import random
import job

def hostname():
    return socket.gethostname()

def working_dir():
    return os.getcwd()

def listdir(path):
    return os.listdir(path)

def fac(arg):
    return np.math.factorial(arg)

def dump(arg):
    path = working_dir() + '/out'
    if not os.path.exists(path):
        os.mkdir(path)
    f_path = path + '/fac_{}.txt'.format(arg)
    t_0 = time.time()
    num = fac(arg)  # Main operation
    t_1 = time.time()
    cPickle.dump(num, open(f_path, "w"), protocol=2)  # Main operation
    t_2 = time.time()
    duration_0 = "{:.4f}".format(t_1 - t_0)
    duration_1 = "{:.4f}".format(t_2 - t_1)
    #num2 = cPickle.load(open(f_path, "rb"))
    return '--Calculation: {} s, dumping: {} s'.format(
        duration_0, duration_1)

if __name__ == '__channelexec__':
    channel.send("ready")
    for arg in channel:
        if arg is None:
            break
        elif str(arg).isdigit():
            channel.send((
                str(arg)+'!',
                job.hostname(),
                job.dump(arg)
            ))
        else:
            print 'Warning! arg sent should be number | None'
Yes, you are on the right track. Use the psutil library to manage the processes: find their PIDs, and kill them.
There is no need to involve bash anywhere; Python covers it all.
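For example, a rough sketch with psutil (the 'job.py' filter below is just an illustrative assumption; adapt it to however your jobs identify themselves):

import psutil

# look through all running processes for ones whose command line mentions our job script
for proc in psutil.process_iter(['pid', 'name', 'cmdline']):
    cmdline = ' '.join(proc.info['cmdline'] or [])
    if 'job.py' in cmdline:      # illustrative filter, not part of the original setup
        proc.terminate()         # sends SIGTERM; proc.kill() would send SIGKILL
        proc.wait(timeout=5)     # reap the process so it does not linger as a zombie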
Or, even better, program your script to terminate when the master says so.
That is usually how it is done.
You can even make it start another script before terminating itself if you want/need to.
Or, if the new work is the same kind of thing you would run in another process, just stop the current work and start the new work in the same script without terminating it at all.
And, if I may make a suggestion: don't read your file line by line; read the whole file and then use *.splitlines(). For small files, reading them in chunks just tortures the IO. You wouldn't need *.strip() either. And you should remove the unused imports too.
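For instance, the file-reading suggestion applied to get_cluster_info() boils down to something like this (a sketch; it assumes the same well-formed input file as the original):

def get_cluster_info(f):
    with open(f, 'r') as fid:
        lines = fid.read().splitlines()   # whole file at once, no strip('\n') needed
    nodes = [line.split()[0] for line in lines if line]
    ncores = [int(line.split()[1]) for line in lines if line]
    return dict(zip(nodes, ncores))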

Python multiprocessing does not exit

I had code that was running successfully but took too long to run, so I decided to try to parallelize it.
Here is a simplified version of the code:
import multiprocessing as mp
import os
import time

output = mp.Queue()

def calcSum(Nstart, Nstop, output):
    pid = os.getpid()
    for s in range(Nstart, Nstop):
        file_name = 'model' + str(s) + '.pdb'
        file = 'modelMap' + str(pid) + '.dat'
        #does something with the contents of the pdb file
        #creates another file by using some other library:
        someVar.someFunc(file_name=file)
        #uses a function to read the file
        density += readFile(file)
        os.remove(file)
        print pid, s
    output.put(density)

if __name__ == '__main__':
    snapshots = int(sys.argv[1])
    cpuNum = int(sys.argv[2])
    rangeSet = np.zeros((cpuNum)) + snapshots//cpuNum
    for i in range(snapshots%cpuNum):
        rangeSet[i] += 1
    processes = []
    for c in range(cpuNum):
        na, nb = (np.sum(rangeSet[:c])+1, np.sum(rangeSet[:c+1]))
        processes.append(mp.Process(target=calcSum, args=(int(na), int(nb), output)))
    for p in processes:
        p.start()
    print 'now i''m here'
    results = [output.get() for p in processes]
    print 'now i''m there'
    for p in processes:
        p.join()
    print 'think i''l stay around'
    t1 = time.time()
    print len(results)
    print (t1-t0)
I run this code with the command python run.py 10 4.
This code prints the pid and s successfully in the loop in calcSum. I can also see that two CPUs are at 100% in the terminal. What happens is that finally pid 5 and pid 10 are printed, then the CPU usage drops to zero, and nothing happens. None of the following print statements work, and the script still looks like it's running in the terminal. I'm guessing that the processes are not exiting. Is that the case? How can I fix it?
Here's the complete output:
$ python run.py 10 4
now im here
9600
9601
9602
9603
9602 7
9603 9
9601 4
9600 1
now im there
9602 8
9600 2
9601 5
9603 10
9600 3
9601 6
At that point I have to terminate the script with Ctrl+C.
A few other notes:
if I comment os.remove(file) out, I can see the created files in the directory
unfortunately, I cannot bypass the part in which a file is created and then read, within calcSum
EDIT: At first, switching output.get() and p.join() worked, but after some other edits to the code this is no longer the case. I have updated the code above.

List of IP addresses/hostnames from local network in Python

How can I get a list of the IP addresses or host names from a local network easily in Python?
It would be best if it was multi-platform, but it needs to work on Mac OS X first, then others follow.
Edit: By local I mean all active addresses within a local network, such as 192.168.xxx.xxx.
So, if the IP address of my computer (within the local network) is 192.168.1.1, and I have three other connected computers, I would want it to return the IP addresses 192.168.1.2, 192.168.1.3, 192.168.1.4, and possibly their hostnames.
If by "local" you mean on the same network segment, then you have to perform the following steps:
Determine your own IP address
Determine your own netmask
Determine the network range
Scan all the addresses (except the lowest, which is your network address, and the highest, which is your broadcast address).
Use your DNS's reverse lookup to determine the hostname for IP addresses which respond to your scan.
Or you can just let Python execute nmap externally and pipe the results back into your program.
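The first three steps (and the reverse lookup) can be sketched with the standard library; the address and netmask below are placeholders you would detect on your own machine:

import ipaddress
import socket

my_ip = "192.168.1.1"        # placeholder: your own address
netmask = "255.255.255.0"    # placeholder: your own netmask

network = ipaddress.ip_network("{}/{}".format(my_ip, netmask), strict=False)
for host in network.hosts():              # skips the network and broadcast addresses
    addr = str(host)
    try:
        name = socket.gethostbyaddr(addr)[0]   # reverse DNS lookup (step 5)
    except socket.herror:
        name = None
    # step 4 would go here: ping/arp addr and keep the ones that answer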
Update: The script is now located on github.
I wrote a small python script, that leverages scapy's arping().
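The core of that approach is roughly the following (a sketch; the subnet is an assumption you would replace with your own):

from scapy.all import arping

# send ARP who-has requests to the whole subnet and collect the replies
answered, unanswered = arping("192.168.1.0/24", timeout=2, verbose=False)
for sent, received in answered:
    print(received.psrc, received.hwsrc)   # IP address and MAC of each responder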
If you know the names of your computers you can use:
import socket
IP1 = socket.gethostbyname(socket.gethostname()) # local IP address of your computer
IP2 = socket.gethostbyname('name_of_your_computer') # IP address of remote computer
Otherwise you will have to scan for all the IP addresses that follow the same mask as your local computer (IP1), as stated in another answer.
I have collected the following functionality from some other threads and it works for me in Ubuntu.
import os
import socket
import multiprocessing
import subprocess

def pinger(job_q, results_q):
    """
    Do Ping
    :param job_q:
    :param results_q:
    :return:
    """
    DEVNULL = open(os.devnull, 'w')
    while True:
        ip = job_q.get()
        if ip is None:
            break
        try:
            subprocess.check_call(['ping', '-c1', ip],
                                  stdout=DEVNULL)
            results_q.put(ip)
        except:
            pass

def get_my_ip():
    """
    Find my IP address
    :return:
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(("8.8.8.8", 80))
    ip = s.getsockname()[0]
    s.close()
    return ip

def map_network(pool_size=255):
    """
    Maps the network
    :param pool_size: amount of parallel ping processes
    :return: list of valid ip addresses
    """
    ip_list = list()

    # get my IP and compose a base like 192.168.1.xxx
    ip_parts = get_my_ip().split('.')
    base_ip = ip_parts[0] + '.' + ip_parts[1] + '.' + ip_parts[2] + '.'

    # prepare the jobs queue
    jobs = multiprocessing.Queue()
    results = multiprocessing.Queue()

    pool = [multiprocessing.Process(target=pinger, args=(jobs, results)) for i in range(pool_size)]

    for p in pool:
        p.start()

    # queue the ping jobs
    for i in range(1, 255):
        jobs.put(base_ip + '{0}'.format(i))

    for p in pool:
        jobs.put(None)

    for p in pool:
        p.join()

    # collect the results
    while not results.empty():
        ip = results.get()
        ip_list.append(ip)

    return ip_list

if __name__ == '__main__':
    print('Mapping...')
    lst = map_network()
    print(lst)
For OSX (and Linux), a simple solution is to use either os.popen or os.system and run the arp -a command.
For example:
import os
devices = []
for device in os.popen('arp -a'): devices.append(device)
This will give you a list of the devices on your local network.
I found a 'network scanner in Python' article and wrote this short code. It does what you want! You do, however, need to know which ports are accessible on your devices. Port 22 is the SSH standard and what I am using. I suppose you could loop over all ports. Some defaults are:
linux: [20, 21, 22, 23, 25, 80, 111, 443, 445, 631, 993, 995]
windows: [135, 137, 138, 139, 445]
mac: [22, 445, 548, 631]
import socket

def connect(hostname, port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    socket.setdefaulttimeout(1)
    result = sock.connect_ex((hostname, port))
    sock.close()
    return result == 0

for i in range(0, 255):
    res = connect("192.168.1." + str(i), 22)
    if res:
        print("Device found at: ", "192.168.1." + str(i) + ":" + str(22))
EDIT by TheLizzard:
Using the code above and adding threading:
from threading import Thread, Lock
from time import perf_counter
from sys import stderr
from time import sleep
import socket

# I changed this from "192.168.1.%i" to "192.168.0.%i"
BASE_IP = "192.168.0.%i"
PORT = 80

class Threader:
    """
    This is a class that calls a list of functions in a limited number of
    threads. It uses locks to make sure the data is thread safe.
    Usage:
        from time import sleep

        def function(i):
            sleep(2)
            with threader.print_lock:
                print(i)

        threader = Threader(10) # The maximum number of threads = 10
        for i in range(20):
            threader.append(function, i)
        threader.start()
        threader.join()
    This class also provides a lock called: `<Threader>.print_lock`
    """
    def __init__(self, threads=30):
        self.thread_lock = Lock()
        self.functions_lock = Lock()
        self.functions = []
        self.threads = []
        self.nthreads = threads
        self.running = True
        self.print_lock = Lock()

    def stop(self) -> None:
        # Signal all worker threads to stop
        self.running = False

    def append(self, function, *args) -> None:
        # Add the function to a list of functions to be run
        self.functions.append((function, args))

    def start(self) -> None:
        # Create a limited number of threads
        for i in range(self.nthreads):
            thread = Thread(target=self.worker, daemon=True)
            # We need to pass in `thread` as a parameter so we
            # have to use `<threading.Thread>._args` like this:
            thread._args = (thread, )
            self.threads.append(thread)
            thread.start()

    def join(self) -> None:
        # Joins the threads one by one until all of them are done.
        for thread in self.threads:
            thread.join()

    def worker(self, thread: Thread) -> None:
        # While we are running and there are functions to call:
        while self.running and (len(self.functions) > 0):
            # Get a function
            with self.functions_lock:
                function, args = self.functions.pop(0)
            # Call that function
            function(*args)
        # Remove the thread from the list of threads.
        # This may cause issues if the user calls `<Threader>.join()`
        # But I haven't seen this problem while testing/using it.
        with self.thread_lock:
            self.threads.remove(thread)

start = perf_counter()
# I didn't need a timeout of 1 so I used 0.1
socket.setdefaulttimeout(0.1)

def connect(hostname, port):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        result = sock.connect_ex((hostname, port))
    with threader.print_lock:
        if result == 0:
            stderr.write(f"[{perf_counter() - start:.5f}] Found {hostname}\n")

threader = Threader(10)
for i in range(255):
    threader.append(connect, BASE_IP%i, PORT)

threader.start()
threader.join()
print(f"[{perf_counter() - start:.5f}] Done searching")
input("Press enter to exit.\n? ")
Try:
import socket
print ([ip for ip in socket.gethostbyname_ex(socket.gethostname())[2] if not ip.startswith("127.")][:1])
I used the following code to get the IP of a device with a known MAC address. With some string manipulation it can be modified to collect all the IPs. Hope this helps.
import os
import re

# mac_address is assumed to be set elsewhere to the MAC you are looking for
# running a Windows cmd line statement and putting the output into a string
cmd_out = os.popen("arp -a").read()
line_arr = cmd_out.split('\n')
line_count = len(line_arr)

# search all lines for the ip
for i in range(0, line_count):
    y = line_arr[i]
    z = y.find(mac_address)
    # if the mac address is found then get the ip using regex matching
    if z > 0:
        ip_out = re.search('[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+', y, re.M | re.I)
I just had this problem, and I solved it like this:
import kthread  # pip install kthread
from time import sleep
import subprocess

def getips():
    ipadressen = {}

    def ping(ipadresse):
        try:
            outputcap = subprocess.run(['ping', ipadresse, '-n', '1'], capture_output=True)  # sends only one packet, faster
            ipadressen[ipadresse] = outputcap
        except Exception as Fehler:
            print(Fehler)

    t = [kthread.KThread(target=ping, name=f"ipgetter{ipend}", args=(f'192.168.0.{ipend}',)) for ipend in range(255)]  # prepares the threads
    [kk.start() for kk in t]  # starts 255 threads
    while len(ipadressen) < 255:
        print('Searching network')
        sleep(.3)

    alldevices = []
    for key, item in ipadressen.items():
        # keep the address if there was neither a general failure nor an 'unreachable host' reply
        if 'unreachable' not in item.stdout.decode('utf-8') and 'failure' not in item.stdout.decode('utf-8'):
            alldevices.append(key)
    return alldevices

allips = getips()  # takes 1.5 seconds on my pc
One of the answers in this question might help you. There seems to be a platform agnostic version for python, but I haven't tried it yet.
Here is a small tool, scanip, that will help you get all the IP addresses and their corresponding MAC addresses on the network (works on Linux).
https://github.com/vivkv/scanip
