Python Scapy vs dpkt - python

I have been analysing packets with Python's Scapy from the beginning. Recently I found another Python module named dpkt. With it I can parse the layers of a packet, create packets, read a .pcap file and write to a .pcap file. The differences I have found between the two are:
dpkt has no live packet sniffer.
Some fields need to be unpacked with struct.unpack in dpkt.
Are there any other differences I am missing?
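To make the struct.unpack point concrete, here is a minimal sketch of manual field unpacking using only the standard library. The frame bytes and the function name parse_ethernet_header are made up for illustration; this is not dpkt's own API.

```python
import struct

def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join("%02x" % b for b in raw)

def parse_ethernet_header(frame: bytes):
    """Split the first 14 bytes of an Ethernet II frame into its three fields."""
    # "!" = network byte order, 6s = 6 raw bytes (a MAC), H = unsigned 16-bit
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return format_mac(dst), format_mac(src), ethertype

# Hypothetical frame: broadcast destination, made-up source, EtherType 0x86DD (IPv6)
frame = bytes.fromhex("ffffffffffff" "001122334455" "86dd") + b"payload"
dst, src, ethertype = parse_ethernet_header(frame)
print(dst, src, hex(ethertype))
```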

Scapy is a better performer than dpkt.
You can create, sniff, modify and send packets with Scapy, while dpkt can only analyse and create them. To send packets created with dpkt, you need raw sockets.
As you mentioned, Scapy can sniff live. It can capture from a network, and it can also read a .pcap file using the rdpcap function or the offline parameter of the sniff function.
Scapy is generally used to build packet analysers and injectors. Its modules can be used to build an application for a specific purpose.
There might be many other differences also.

I don't understand why people say that Scapy is the better performer. I ran the quick check shown below and the winner is dpkt: dpkt > scapy > pyshark.
My input pcap file is about 12.5 MB. The times come from the bash time command: time python testing.py. In each snippet I make sure that the packet is actually decoded from raw bytes. Assign the variable FILENAME the name of the pcap file you want to test.
dpkt
from dpkt.pcap import *
from dpkt.ethernet import *
import os
readBytes = 0
fileSize = os.stat(FILENAME).st_size
with open(FILENAME, 'rb') as f:
    for t, pkt in Reader(f):
        readBytes += len(Ethernet(pkt))
print("%.2f" % (float(readBytes) / fileSize * 100))
The average time is about 0.3 seconds.
scapy -- using PcapReader
from scapy.all import *
import os
readBytes = 0
fileSize = os.stat(FILENAME).st_size
for pkt in PcapReader(FILENAME):
    readBytes += len(pkt)
print("%.2f" % (float(readBytes) / fileSize * 100))
The average time is about 4.5 seconds.
scapy -- using RawPcapReader
from scapy.all import *
import os
readBytes = 0
fileSize = os.stat(FILENAME).st_size
for pkt, (sec, usec, wirelen, c) in RawPcapReader(FILENAME):
    readBytes += len(Ether(pkt))
print("%.2f" % (float(readBytes) / fileSize * 100))
The average time is about 4.5 seconds.
pyshark
import pyshark
import os
filtered_cap = pyshark.FileCapture(FILENAME)
readBytes = 0
fileSize = os.stat(FILENAME).st_size
for pkt in filtered_cap:
    readBytes += int(pkt.length)
print("%.2f" % (float(readBytes) / fileSize * 100))
The average time is about 12 seconds.
I am not advertising dpkt at all; I do not care which library wins. The point is that I currently need to parse 8 GB files. With dpkt, the code above finishes an 8 GB pcap file in about 4.5 minutes, which is bearable, while I would not even wait for the other libraries to finish. At least, this is my quick first impression. If I get new information I will update the post.
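One caveat about the measurements: time python testing.py also counts interpreter startup and the cost of importing each library. A minimal sketch of timing only the parsing loop from inside the script could look like this. The names time_call and count_bytes are hypothetical, and count_bytes is only a stand-in workload for one of the pcap loops.

```python
import time

def time_call(func, *args, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` runs of func(*args)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

def count_bytes(chunks):
    # Stand-in workload; replace it with one of the pcap reading loops.
    return sum(len(c) for c in chunks)

elapsed = time_call(count_bytes, [b"x" * 100] * 1000)
print("%.6f s" % elapsed)
```

Taking the best of several runs reduces noise from disk caching and other processes, which matters more for a 12.5 MB file than for an 8 GB one.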

Related

Deserializing messages without loading entire file into memory?

I am using Google Protocol Buffers and Python to decode some large data files--200MB each. I have some code below that shows how to decode a delimited stream and it works just fine. However it uses the read() command which loads the whole file into memory and then iterates over it.
import feed_pb2 as sfeed
import sys
from google.protobuf.internal.encoder import _VarintBytes
from google.protobuf.internal.decoder import _DecodeVarint32
with open('/home/working/data/feed.pb', 'rb') as f:
    buf = f.read() ## PROBLEM-LOADS ENTIRE FILE TO MEMORY.
    n = 0
    while n < len(buf):
        msg_len, new_pos = _DecodeVarint32(buf, n)
        n = new_pos
        msg_buf = buf[n:n+msg_len]
        n += msg_len
        read_row = sfeed.standard_feed()
        read_row.ParseFromString(msg_buf)
        # do something with read_row
        print(read_row)
Note that this code comes from another SO post, but I don't remember the exact url. I was wondering if there was a readlines() equivalent with protocol buffers that allows me to read in one delimited message at a time and decode it? I basically want a pipeline that is not limited by the RAM I have to load the file.
Seems like there was a pystream-protobuf package that supported some of this functionality, but it has not been updated in a year or two. There is also a post from 7 years ago that asked a similar question. But I was wondering if there was any new information since then.
python example for reading multiple protobuf messages from a stream
If it is ok to load one full message at a time, this is quite simple to implement by modifying the code you posted:
import feed_pb2 as sfeed
import sys
from google.protobuf.internal.encoder import _VarintBytes
from google.protobuf.internal.decoder import _DecodeVarint32
with open('/home/working/data/feed.pb', 'rb') as f:
    buf = f.read(10) # Maximum length of length prefix
    while buf:
        msg_len, new_pos = _DecodeVarint32(buf, 0)
        buf = buf[new_pos:]
        # read rest of the message; a negative size would read the whole
        # file, so clamp at 0 when the message is already fully buffered
        buf += f.read(max(0, msg_len - len(buf)))
        read_row = sfeed.standard_feed()
        # parse only this message, not the bytes buffered after it
        read_row.ParseFromString(buf[:msg_len])
        buf = buf[msg_len:]
        # do something with read_row
        print(read_row)
        # read length prefix for next message
        buf += f.read(10 - len(buf))
This reads 10 bytes, which is enough to parse the length prefix, and then reads the rest of the message once its length is known.
String mutations are not very efficient in Python (they make a lot of copies of the data), so using bytearray can improve performance if your individual messages are also large.
https://github.com/cartoonist/pystream-protobuf/ was updated 6 months ago. I haven't tested it much so far, but it seems to work fine without any need for an update. It provides optional gzip and async.
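As an alternative that avoids the private _DecodeVarint32 helper, here is a minimal sketch that decodes the varint length prefix by hand and yields one raw message at a time. It assumes the file uses the usual varint-delimited framing. The names read_varint and iter_delimited are hypothetical, and the demo payloads are fake bytes rather than real protobuf messages.

```python
import io

def read_varint(stream):
    """Read one base-128 varint from a binary stream; return None at clean EOF."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a length prefix")
        value = byte[0]
        result |= (value & 0x7F) << shift   # low 7 bits carry data
        if not value & 0x80:                # high bit clear: last byte
            return result
        shift += 7

def iter_delimited(stream):
    """Yield the raw bytes of each length-prefixed message, one at a time."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message")
        yield payload

# Demo with two fake payloads; 300 encodes as the two-byte varint 0xAC 0x02.
demo = io.BytesIO(b"\x03abc" + b"\xac\x02" + b"x" * 300)
sizes = [len(m) for m in iter_delimited(demo)]
print(sizes)
```

Each yielded payload can be passed to ParseFromString on a fresh message object, for example sfeed.standard_feed() from the question. Only one message is held in memory at a time, and the one-byte reads are served from the file object's buffer.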

Write packets captured with scapy sniff in time intervals

I’m trying to dump packets captured by scapy's sniff function to a file every 10 seconds, to no avail.
This is possible with tcpdump: tcpdump -s 0 -i <interface> -G 10 -w <output.pcap>.
The -G flag is rotate_seconds.
Is this achievable with scapy?
Of course it is. Have a look at the wrpcap() documentation.
Essentially, you will simply build a callback function that receives packets and takes actions. Here's a very simple example that is not necessarily intended to be functional. (I'm writing it on the fly here) This should save a cap file every 100 packets. You would simply need to change the logic to be time based instead of packet count based.
#!/usr/bin/env python
from scapy.all import sniff, wrpcap
pendingPackets = []
baseFilename = "capture-"
totalPackets = 0
def handle_packet(packet):
    # both names are reassigned below, so they must be declared global
    global pendingPackets, totalPackets
    pendingPackets.append(packet)
    totalPackets += 1
    if len(pendingPackets) >= 100:
        filename = baseFilename + str(totalPackets) + ".pcap"
        wrpcap(filename, pendingPackets)
        pendingPackets = []
sniff(filter="ip", prn=handle_packet)
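The time-based variant mentioned above could be sketched as follows. This is an illustration under assumptions, not a tested capture tool: TimedBatcher is a hypothetical helper, the flush callback stands in for wrpcap, and the demo uses a fake clock so it runs without scapy.

```python
import time

class TimedBatcher:
    """Collect items and hand them to `flush` once `interval` seconds have passed."""

    def __init__(self, interval, flush, clock=time.monotonic):
        self.interval = interval
        self.flush = flush            # called as flush(index, items)
        self.clock = clock
        self.items = []
        self.index = 0
        self.started = clock()

    def add(self, item):
        now = self.clock()
        if now - self.started >= self.interval:
            if self.items:
                self.flush(self.index, self.items)
                self.items = []
                self.index += 1
            self.started = now        # start a new window either way
        self.items.append(item)

    def close(self):
        """Flush whatever is left, e.g. after the capture stops."""
        if self.items:
            self.flush(self.index, self.items)
            self.items = []
            self.index += 1

# Demo with a fake clock instead of real packets and real time.
written = []
fake_ticks = iter([0, 1, 5, 11, 12, 25])
batcher = TimedBatcher(10, lambda i, items: written.append((i, list(items))),
                       clock=lambda: next(fake_ticks))
for name in ["a", "b", "c", "d"]:
    batcher.add(name)
batcher.close()
print(written)
```

With scapy, the flush callback would presumably be something like lambda i, pkts: wrpcap("capture-%d.pcap" % i, pkts), and batcher.add would be passed as prn to sniff. Note that rotation is only checked when a packet arrives, so a quiet interface delays the write.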

Reading PCAP file with scapy

I have about 10 GB of pcap data with IPv6 traffic, and I want to analyse the information stored in the IPv6 header and the extension headers. To do this I decided to use the Scapy framework. I tried the rdpcap function, but it is not recommended for such big files: it tries to load the whole file into memory, and in my case it gets stuck.
I read online that sniff is recommended in this situation. My code looks like this:
from scapy.all import sniff
def main():
    sniff(offline='traffic.pcap', prn=my_method, store=0)
def my_method(packet):
    packet.show()
In my_method I receive each packet separately and can parse it, but...
When I call the built-in show() method, I get something like this:
When I open the file in Wireshark, the packet looks correct:
Could you tell me how to parse these packets in scapy to get proper results?
EDIT:
Following the discussion in the comments, I found a way to parse the PCAP file with Python. In my opinion the easiest way is to use the pyshark framework:
import pyshark
pcap = pyshark.FileCapture(pcap_path) ### for reading PCAP file
The capture can then be iterated with a for loop:
for pkt in pcap:
    pass  # do what you want
For parsing the IPv6 header, the following fields may be useful:
pkt['ipv6'].tclass #Traffic class field
pkt['ipv6'].tclass_dscp #Traffic class DSCP field
pkt['ipv6'].tclass_ecn #Traffic class ECN field
pkt['ipv6'].flow #Flow label field
pkt['ipv6'].plen #Payload length field
pkt['ipv6'].nxt #Next header field
pkt['ipv6'].hlim #Hop limit field
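For reference, the same fields can be pulled out of the raw 40-byte IPv6 fixed header with the standard library alone. The sketch below is an illustration, not pyshark's or scapy's implementation. The function name parse_ipv6_header is hypothetical, the dictionary keys simply mirror the field names listed above, and the sample header is made up.

```python
import ipaddress
import struct

def parse_ipv6_header(data: bytes):
    """Decode the 40-byte IPv6 fixed header into a dict of fields."""
    # First 32 bits: version (4 bits), traffic class (8 bits), flow label (20 bits)
    first_word, plen, nxt, hlim = struct.unpack("!IHBB", data[:8])
    tclass = (first_word >> 20) & 0xFF
    return {
        "version": first_word >> 28,
        "tclass": tclass,
        "tclass_dscp": tclass >> 2,     # upper 6 bits of the traffic class
        "tclass_ecn": tclass & 0x3,     # lower 2 bits of the traffic class
        "flow": first_word & 0xFFFFF,
        "plen": plen,
        "nxt": nxt,
        "hlim": hlim,
        "src": str(ipaddress.IPv6Address(data[8:24])),
        "dst": str(ipaddress.IPv6Address(data[24:40])),
    }

# Made-up header: traffic class 0xB8, flow label 0x12345, UDP payload of 8 bytes
sample = (struct.pack("!IHBB", 0x6B812345, 8, 17, 64)
          + ipaddress.IPv6Address("::1").packed
          + ipaddress.IPv6Address("::2").packed)
fields = parse_ipv6_header(sample)
print(fields)
```

Extension headers follow the fixed header and are chained through the next header value, so they would need a separate loop that this sketch does not cover.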
Update
The latest scapy versions now support IPv6 parsing.
So an IPv6 ".pcap" file can now be parsed with scapy like so:
from scapy.all import *
scapy_cap = rdpcap('file.pcap')
for packet in scapy_cap:
    print(packet[IPv6].src)
As I commented when this question was originally asked, for older scapy versions (that don't support IPv6 parsing) pyshark can be used instead (pyshark is a tshark wrapper), like so:
import pyshark
shark_cap = pyshark.FileCapture('file.pcap')
for packet in shark_cap:
    print(packet.ipv6.src)
or even of course tshark (kind of the terminal version of wireshark):
$ tshark -r file.pcap -q -Tfields -e ipv6.src
If you want to keep using scapy and read the file iteratively, I'd recommend giving PcapReader() a try.
It does the same thing you tried to do with pyshark, but in Scapy:
from scapy.all import *
for packet in PcapReader('file.pcap'):
    try:
        print(packet[IPv6].src)
    except IndexError:
        # packet has no IPv6 layer
        pass
I'd recommend wrapping the lookup in try/except like this as a failsafe, in case any packet does not have an IPv6 layer.

How to parse Ethernet Header of pcap file using Python?

I would like to decode the link-layer type and version of packets in a pcap file using Python. So, I have to parse pcap using Python. Here is my code.
import dpkt
import socket
import sys
# pcap files are binary, so open in 'rb' mode
with open('filename', 'rb') as f:
    pcap = dpkt.pcap.Reader(f)
    for ts, buf in pcap:
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data
        tcp = ip.data
        print(ts, len(buf))
        print(eth)
        print(ip)
        print(tcp)
The dpkt Python library may not be the best tool for this task. According to its website, dpkt is best used for:
fast, simple packet creation / parsing, with definitions for the basic TCP/IP protocols.
Use scapy instead, which is a
powerful interactive packet manipulation program for Python.
You can use a script like this:
from scapy.all import *
packets = rdpcap('tmp.pcap')
for p in packets:
    # rdpcap already decodes the link layer, so show the packet directly
    p.show()
Read Infinite possibilities with Python's Scapy Module for more details.
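Since the question asks about the link-layer type and version, note that in a classic pcap file both live in the 24-byte global header at the start of the file rather than in each packet. A minimal standard-library sketch, assuming a classic pcap file (not pcapng), could read them like this. The function name read_pcap_global_header is hypothetical and the demo header is synthetic.

```python
import io
import struct

# Magic number as stored in the file -> struct byte-order prefix
PCAP_MAGICS = {
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
}

def read_pcap_global_header(stream):
    """Return version, snaplen and link-layer type from a classic pcap header."""
    header = stream.read(24)
    if len(header) < 24 or header[:4] not in PCAP_MAGICS:
        raise ValueError("not a classic pcap file")
    order = PCAP_MAGICS[header[:4]]
    # version major/minor, timezone offset, sigfigs, snaplen, link-layer type
    major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
        order + "HHiIII", header[4:])
    return {"version": (major, minor), "snaplen": snaplen, "linktype": linktype}

# Synthetic little-endian header: pcap version 2.4, link type 1 (Ethernet)
demo = io.BytesIO(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
info = read_pcap_global_header(demo)
print(info)
```

For a real file, pass open('filename', 'rb') instead of the BytesIO demo. A pcapng file starts with a different block structure and would raise ValueError here.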

Scapy PcapReader and packets time

I'm reading a PCAP file with Scapy using a script such as the following (simplified) one:
#! /usr/bin/env python
from scapy.all import *
# ...
myreader = PcapReader(myinputfile)
for p in myreader:
    pkt = p.payload
    print(pkt.time)
In this case the packet time is not relative to the PCAP capture time, but starts from the instant I launched my script.
I'd like the times to start from 0.0 or to be relative to the PCAP capture.
How can I fix this (if possible without "manually" retrieving the first packet's time and repeatedly doing the arithmetic myself)?
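I found that using pkt.time is wrong in this case: pkt is the payload, and its time apparently reflects when the object was created in the script rather than when the packet was captured.
I should print p.time instead.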
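If times starting from 0.0 are wanted, the subtraction can be hidden in a small generator so the loop body stays clean. This is a minimal sketch: relative_times is a hypothetical helper, and the demo uses stand-in objects with a time attribute instead of real scapy packets.

```python
from types import SimpleNamespace

def relative_times(packets):
    """Yield (offset, packet), where offset is measured from the first packet."""
    first = None
    for packet in packets:
        if first is None:
            first = packet.time
        yield packet.time - first, packet

# Stand-ins for packets read from a capture file
fake_packets = [SimpleNamespace(time=t) for t in (1000.5, 1000.75, 1002.0)]
offsets = [offset for offset, _ in relative_times(fake_packets)]
print(offsets)
```

With scapy this would presumably be used as for offset, p in relative_times(PcapReader(myinputfile)), reading p.time from the outer packet as described above.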
