How to parse the Ethernet header of a pcap file using Python?

I would like to decode the link-layer type and version of the packets in a pcap file using Python, so I need to parse the pcap file in Python. Here is my code:
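import dpkt

# pcap files are binary, so the file must be opened in 'rb' mode
with open('filename', 'rb') as f:
    pcap = dpkt.pcap.Reader(f)
    for ts, buf in pcap:
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data    # a dpkt IP object if the frame carries IPv4
        tcp = ip.data    # a dpkt TCP object if the IP payload is TCP
        print(ts, len(buf))
        # repr() shows the decoded fields; a plain print may show raw bytes
        print(repr(eth))
        print(repr(ip))
        print(repr(tcp))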

The dpkt Python library is probably not the best tool for this task. According to its website, dpkt is best used for "fast, simple packet creation / parsing, with definitions for the basic TCP/IP protocols."
Use Scapy instead, which is a "powerful interactive packet manipulation program for Python."
You can use a script like this:
from scapy.all import rdpcap, Ether

packets = rdpcap('tmp.pcap')
for p in packets:
    # p[Ether] is the decoded Ethernet layer (with the layers above it)
    if Ether in p:
        p[Ether].show()
See "Infinite possibilities with Python's Scapy Module" for more details.
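If all you need is the Ethernet header itself, its fields can also be decoded straight from the raw frame bytes with the standard library. This is a minimal sketch, assuming an untagged Ethernet II frame (no 802.1Q VLAN tag); the helpers parse_ethernet_header and format_mac are hypothetical names, not part of dpkt or Scapy.

```python
import struct

def format_mac(raw):
    """Render 6 raw bytes as a colon-separated, lowercase hex MAC address."""
    return ':'.join('%02x' % b for b in raw)

def parse_ethernet_header(frame):
    """Split a raw Ethernet II frame into header fields and payload.

    Assumes an untagged frame: destination MAC (6 bytes), source MAC
    (6 bytes), then a 2-byte big-endian EtherType. 802.1Q VLAN tags are
    not handled.
    """
    if len(frame) < 14:
        raise ValueError('frame too short for an Ethernet header')
    dst, src, ethertype = struct.unpack('!6s6sH', frame[:14])
    return {
        'dst': format_mac(dst),
        'src': format_mac(src),
        'ethertype': ethertype,  # e.g. 0x0800 = IPv4, 0x86DD = IPv6
        'payload': frame[14:],
    }

# Hand-built example frame: dst MAC, src MAC, EtherType 0x0800, dummy payload
frame = b'\x00\x13I\xae\x84,' + b'\x00\x1a\xa0kUf' + b'\x08\x00' + b'payload'
hdr = parse_ethernet_header(frame)
print(hdr['src'], '->', hdr['dst'], hex(hdr['ethertype']))
```

With Scapy, bytes(p) gives a packet's raw bytes, and with dpkt the buf values yielded by dpkt.pcap.Reader are already raw bytes, so either could be fed to this function. This only makes sense once you know the file's link-layer type is Ethernet.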

Related

How to get TCP-Timestamp (TSval) using python

I've searched in several places but didn't find a simple answer to this question.
I have a .pcap file, generated with Wireshark, that contains several packets, and I want to extract the TCP timestamp (TSval) from each packet.
I've managed to open each packet using Scapy:
from scapy.all import rdpcap

packets = rdpcap('pcap_file.pcap')
for packet in packets:
    print(packet.payload.id)
but I can't find the TSval of the packet, even though I can see the TSval field when I open the packet in Wireshark (as shown in the picture below).
Packets can be accessed like dictionaries whose keys are protocols and whose values are payloads. For instance, you can print the TCP layer of a packet like this:
if TCP in packet:
    packet[TCP].show()
To get the TSval, you have to look in the TCP options. Scapy encodes each TCP option as a pair (option name, option value). For the timestamp option, the option value is itself a pair (TSval, TSecr). So you can get what you want like this:
from scapy.all import TCP, rdpcap

packets = rdpcap('packets.pcapng')
for packet in packets:
    if TCP in packet:  # ignore packets without TCP payload
        for opt, val in packet[TCP].options:  # consider all TCP options
            if opt == 'Timestamp':
                TSval, TSecr = val  # decode the value of the option
                print('TSval =', TSval)
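For reference, this is how the (TSval, TSecr) pair looks on the wire: per RFC 7323, the timestamp option is kind 8, length 10, followed by TSval and TSecr as 32-bit big-endian integers. Here is a minimal stdlib sketch that scans raw TCP option bytes for it. find_tcp_timestamp is a hypothetical helper, and it simply stops at the first malformed option.

```python
import struct

def find_tcp_timestamp(options):
    """Return (TSval, TSecr) from raw TCP option bytes, or None if absent."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:            # End of Option List
            break
        if kind == 1:            # No-Operation: a single padding byte
            i += 1
            continue
        if i + 1 >= len(options):
            break                # truncated option: no length byte
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                # malformed length; stop parsing
        if kind == 8 and length == 10:   # Timestamp option
            return struct.unpack('!II', options[i + 2:i + 10])
        i += length              # skip any other option (MSS, SACK, ...)
    return None

# NOP, NOP, then a Timestamp option with TSval=123456, TSecr=654321
opts = b'\x01\x01' + b'\x08\x0a' + struct.pack('!II', 123456, 654321)
print(find_tcp_timestamp(opts))
```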

Reading PCAP file with scapy

I have about 10 GB of pcap data with IPv6 traffic, and I want to analyze the information stored in the IPv6 header and the extension headers. To do this, I decided to use the Scapy framework. I tried the rdpcap function, but it is not recommended for files this large: it tries to load the whole file into memory, and in my case it gets stuck.
I read online that sniff is recommended in this situation. My code looks like this:
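from scapy.all import sniff

def main():
    # store=0 stops sniff from keeping every packet in memory
    sniff(offline='traffic.pcap', prn=my_method, store=0)

def my_method(packet):
    packet.show()

if __name__ == '__main__':
    main()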
In my_method I receive each packet separately and can parse it, but...
When I call show, the framework's built-in method, I get something like this:
When opened in Wireshark, the packet looks correct:
Could you tell me how to parse these packets in Scapy to get proper results?
EDIT:
Following the discussion in the comments, I found a way to parse a PCAP file with Python. In my opinion, the easiest way is to use the pyshark framework:
import pyshark

pcap = pyshark.FileCapture(pcap_path)  # read the PCAP file
The file can then easily be iterated with a for loop:
for pkt in pcap:
    pass  # do what you want with pkt
For parsing the IPv6 header, the following attributes may be useful:
pkt['ipv6'].tclass       # Traffic class field
pkt['ipv6'].tclass_dscp  # Traffic class DSCP field
pkt['ipv6'].tclass_ecn   # Traffic class ECN field
pkt['ipv6'].flow         # Flow label field
pkt['ipv6'].plen         # Payload length field
pkt['ipv6'].nxt          # Next header field
pkt['ipv6'].hlim         # Hop limit field
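The fields listed above all come from the fixed 40-byte IPv6 header defined in RFC 8200. Here is a minimal stdlib sketch of decoding them from raw bytes. The helper parse_ipv6_header is hypothetical, and its dict keys were chosen only to mirror the pyshark attribute names. Extension headers, which follow the fixed header, are not handled.

```python
import ipaddress
import struct

def parse_ipv6_header(data):
    """Decode the fixed 40-byte IPv6 header (RFC 8200) from raw bytes."""
    if len(data) < 40:
        raise ValueError('need at least 40 bytes for an IPv6 header')
    # version (4 bits) | traffic class (8 bits) | flow label (20 bits),
    # then payload length (16), next header (8), hop limit (8)
    first_word, plen, nxt, hlim = struct.unpack('!IHBB', data[:8])
    version = first_word >> 28
    if version != 6:
        raise ValueError('not an IPv6 header (version %d)' % version)
    tclass = (first_word >> 20) & 0xFF
    return {
        'tclass': tclass,
        'tclass_dscp': tclass >> 2,    # upper 6 bits of the traffic class
        'tclass_ecn': tclass & 0x3,    # lower 2 bits of the traffic class
        'flow': first_word & 0xFFFFF,  # 20-bit flow label
        'plen': plen,
        'nxt': nxt,
        'hlim': hlim,
        'src': str(ipaddress.IPv6Address(data[8:24])),
        'dst': str(ipaddress.IPv6Address(data[24:40])),
    }

# Hand-built header: DSCP 46 (traffic class 0xb8), flow label 0x12345,
# 20-byte payload, next header 6 (TCP), hop limit 64
first = (6 << 28) | (0xB8 << 20) | 0x12345
raw = (struct.pack('!IHBB', first, 20, 6, 64)
       + ipaddress.IPv6Address('::1').packed
       + ipaddress.IPv6Address('2001:db8::1').packed)
print(parse_ipv6_header(raw))
```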
Update
Recent Scapy versions support IPv6 parsing.
So an IPv6 .pcap file can now be parsed with Scapy like this:
from scapy.all import rdpcap, IPv6

scapy_cap = rdpcap('file.pcap')
for packet in scapy_cap:
    if IPv6 in packet:  # skip packets without an IPv6 layer
        print(packet[IPv6].src)
As I commented when this question was originally asked, for older Scapy versions that don't support IPv6 parsing, pyshark (a tshark wrapper) can be used instead:
import pyshark

shark_cap = pyshark.FileCapture('file.pcap')
for packet in shark_cap:
    print(packet.ipv6.src)
or, of course, tshark itself (essentially the command-line version of Wireshark):
$ tshark -r file.pcap -q -Tfields -e ipv6.src
If you want to keep using Scapy and read the file iteratively, I'd recommend giving PcapReader() a try.
It does the same thing you did with pyshark, but in Scapy:
from scapy.all import PcapReader, IPv6

for packet in PcapReader('file.pcap'):
    try:
        print(packet[IPv6].src)
    except IndexError:
        pass  # this packet has no IPv6 layer
The try/except is a failsafe for any packet that does not have an IPv6 layer.
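An iterative reader avoids the memory problem because of how the classic pcap format is laid out. A file is a 24-byte global header followed by records, and each record is a 16-byte header plus the captured bytes. A reader therefore only has to hold one record at a time. Here is a minimal stdlib sketch of such a generator. iter_pcap_records is a hypothetical helper, it handles only the classic libpcap format (not pcapng), and it does no protocol decoding.

```python
import io
import struct

USEC_MAGICS = (b'\xd4\xc3\xb2\xa1', b'\xa1\xb2\xc3\xd4')  # LE, BE microsecond
NSEC_MAGICS = (b'\x4d\x3c\xb2\xa1', b'\xa1\xb2\x3c\x4d')  # LE, BE nanosecond

def iter_pcap_records(fileobj):
    """Yield (timestamp, raw_bytes) per record of a classic pcap file.

    Only one record is held in memory at a time, so memory use does not
    grow with the file size.
    """
    global_header = fileobj.read(24)
    if len(global_header) < 24:
        raise ValueError('file too short for a pcap global header')
    magic = global_header[:4]
    if magic not in USEC_MAGICS + NSEC_MAGICS:
        raise ValueError('not a classic pcap file (pcapng is not handled)')
    endian = '<' if magic in (USEC_MAGICS[0], NSEC_MAGICS[0]) else '>'
    divisor = 1e9 if magic in NSEC_MAGICS else 1e6
    record_header = struct.Struct(endian + 'IIII')
    while True:
        hdr = fileobj.read(record_header.size)
        if len(hdr) < record_header.size:
            break  # end of file (or a truncated trailing header)
        sec, frac, incl_len, orig_len = record_header.unpack(hdr)
        data = fileobj.read(incl_len)
        if len(data) < incl_len:
            break  # truncated final record
        yield sec + frac / divisor, data

# In-memory demo file: classic pcap header (microsecond magic, linktype 1)
# followed by one 3-byte record captured at t = 10.5 s
demo = io.BytesIO(
    struct.pack('<IHHiIII', 0xa1b2c3d4, 2, 4, 0, 0, 65535, 1)
    + struct.pack('<IIII', 10, 500000, 3, 3) + b'abc'
)
records = list(iter_pcap_records(demo))
print(records)
```

For a real file, pass open('file.pcap', 'rb') instead of the BytesIO object. The raw bytes of each record can then be handed to whatever decoder you use.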

Parsing a PCAP File in python

I am trying to parse a pcap file in Python. When I run this code
for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    print(eth)
I get junk values instead of the following output:
Ethernet(src='\x00\x1a\xa0kUf', dst='\x00\x13I\xae\x84,',
data=IP(src='\xc0\xa8\n\n', off=16384, dst='C\x17\x030', sum=25129,
len=52, p=6, id=51105, data=TCP(seq=9632694, off_x2=128,
ack=3382015884, win=54, sum=65372, flags=17, dport=80, sport=56145)))
Can anyone tell me how to get the output above?
Make sure the file is opened in binary mode.
https://stackoverflow.com/a/15746971
import dpkt

with open(pcapfile, 'rb') as f:  # 'rb': pcap files are binary
    pcap = dpkt.pcap.Reader(f)
    for ts, buf in pcap:
        eth = dpkt.ethernet.Ethernet(buf)
        print(eth)
If the link-layer header type of the file isn't Ethernet, you will not get useful information by parsing the packets as Ethernet frames. The dpkt documentation isn't very good, but there is a way to get the link-layer header type. Before a program that reads a pcap file tries to extract anything from the raw packet data, it must determine the file's link-layer header type and base the way it extracts information on that type (or quit if the file's link-layer header type is one it can't parse).
(And feel free to tell Mr. Oberheide that his code is broken because it's not checking the link-layer header type!)
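In the dpkt versions I know of, the pcap Reader object exposes this value through its datalink() method, which can be compared with dpkt.pcap.DLT_EN10MB (Ethernet) before decoding. The value is simply the last 4-byte field of the 24-byte pcap global header. Here is a minimal stdlib sketch of the check. pcap_link_type is a hypothetical helper, and it handles only the classic pcap format, not pcapng.

```python
import io
import struct

LINKTYPE_ETHERNET = 1  # Ethernet's value in the tcpdump.org LINKTYPE list

def pcap_link_type(fileobj):
    """Return the link-layer header type stored in a classic pcap global header."""
    header = fileobj.read(24)
    if len(header) < 24:
        raise ValueError('file too short for a pcap global header')
    magic = header[:4]
    if magic in (b'\xd4\xc3\xb2\xa1', b'\x4d\x3c\xb2\xa1'):
        endian = '<'  # file was written little-endian
    elif magic in (b'\xa1\xb2\xc3\xd4', b'\xa1\xb2\x3c\x4d'):
        endian = '>'  # file was written big-endian
    else:
        raise ValueError('not a classic pcap file')
    # The link-layer type is the last 4-byte field of the global header
    (linktype,) = struct.unpack(endian + 'I', header[20:24])
    return linktype

# In-memory stand-in for open('file.pcap', 'rb'): a header with linktype 1
demo = io.BytesIO(struct.pack('<IHHiIII', 0xa1b2c3d4, 2, 4, 0, 0, 65535, 1))
if pcap_link_type(demo) == LINKTYPE_ETHERNET:
    print('Ethernet capture: safe to decode frames as Ethernet')
```

The function consumes the global header. If you then want to pass the same file object to dpkt.pcap.Reader, call f.seek(0) first.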
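The output you expect is what the Python REPL shows when you evaluate eth on its own, because the REPL displays values with repr(). print uses str() instead, which for a dpkt packet gives the raw packed bytes (the "junk" you see). So in a script you need to call repr() explicitly:
for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    print(repr(eth))  # this is key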
Ethernet(src='\x00\x1a\xa0kUf', dst='\x00\x13I\xae\x84,', data=IP(src='\xc0\xa8\n\n', off=16384, dst='C\x17\x030', sum=25129,
len=52, p=6, id=51105, data=TCP(seq=9632694, off_x2=128,
ack=3382015884, win=54, sum=65372, flags=17, dport=80, sport=56145)))
This assumes your pcap contains proper Ethernet frames and that you have checked the link-layer header type.
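dpkt leaves addresses as raw bytes, which is why the repr output above shows values like '\xc0\xa8\n\n'. The standard library can render them readably. Here is a minimal sketch using the field values from that output (bytes.hex with a separator argument needs Python 3.8 or later).

```python
import socket

# Raw field values copied from the dpkt repr output above
eth_src = b'\x00\x1a\xa0kUf'
eth_dst = b'\x00\x13I\xae\x84,'
ip_src = b'\xc0\xa8\n\n'
ip_dst = b'C\x17\x030'

# MAC addresses: hex-encode each byte, separated by ':' (Python 3.8+)
print(eth_src.hex(':'), '->', eth_dst.hex(':'))
# IPv4 addresses: 4 packed bytes to dotted-quad notation
print(socket.inet_ntoa(ip_src), '->', socket.inet_ntoa(ip_dst))
# A 16-byte IPv6 address would use socket.inet_ntop(socket.AF_INET6, raw)
```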

Python Scapy vs dpkt

I have been analysing packets with Python's Scapy from the start. While searching recently, I found another Python module named dpkt. With it I can parse the layers of a packet, create packets, read a .pcap file and write to a .pcap file. The differences I have found between them are:
dpkt has no live packet sniffer.
Some fields need to be unpacked with struct.unpack in dpkt.
Is there any other differences I am missing?
Scapy is a better performer than dpkt.
You can create, sniff, modify and send packets using Scapy, while dpkt can only analyse and create packets; to send them, you need raw sockets.
As you mentioned, Scapy can sniff live. It can sniff from a network interface and can also read a .pcap file, using either the rdpcap function or the offline parameter of sniff.
Scapy is generally used to build packet analysers and injectors. Its modules can be used to create an application for a specific purpose.
There are probably other differences as well.
I don't understand why people say that Scapy performs better. I did a quick check, shown below, and the winner is dpkt: dpkt > scapy > pyshark.
My input pcap file for testing is about 12.5 MB. Times were measured with the bash time command (time python testing.py). In each snippet I make sure that the packet is actually decoded from the raw bytes. Set the variable FILENAME to the name of the pcap file you want to test.
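Timing the whole interpreter run with bash time also counts interpreter start-up and library import time. An alternative is to time only the parsing loop from inside Python. Here is a minimal sketch: average_runtime is a hypothetical helper, and placeholder_workload only stands in for one of the parsing loops that follow.

```python
import statistics
import time

def average_runtime(func, runs=5):
    """Run func() `runs` times; return (mean, stdev) of wall-clock seconds."""
    if runs < 2:
        raise ValueError('need at least 2 runs to compute a standard deviation')
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

def placeholder_workload():
    # Stand-in for a parsing loop such as the dpkt or Scapy snippets below
    sum(range(100000))

mean, stdev = average_runtime(placeholder_workload, runs=5)
print('%.4f s +/- %.4f s' % (mean, stdev))
```

To compare libraries, wrap each snippet's loop in a function and pass that function instead of placeholder_workload. The imports then happen once, outside the timed region.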
dpkt
from dpkt.pcap import *
from dpkt.ethernet import *
import os

readBytes = 0
fileSize = os.stat(FILENAME).st_size
with open(FILENAME, 'rb') as f:
    for t, pkt in Reader(f):
        readBytes += len(Ethernet(pkt))
        print("%.2f" % (float(readBytes) / fileSize * 100))
The average time is about 0.3 second.
scapy -- using PcapReader
from scapy.all import *
import os

readBytes = 0
fileSize = os.stat(FILENAME).st_size
for pkt in PcapReader(FILENAME):
    readBytes += len(pkt)
    print("%.2f" % (float(readBytes) / fileSize * 100))
The average time is about 4.5 seconds.
scapy -- using RawPcapReader
from scapy.all import *
import os

readBytes = 0
fileSize = os.stat(FILENAME).st_size
for pkt, (sec, usec, wirelen, c) in RawPcapReader(FILENAME):
    readBytes += len(Ether(pkt))
    print("%.2f" % (float(readBytes) / fileSize * 100))
The average time is about 4.5 seconds.
pyshark
import pyshark
import os

filtered_cap = pyshark.FileCapture(FILENAME)
readBytes = 0
fileSize = os.stat(FILENAME).st_size
for pkt in filtered_cap:
    readBytes += int(pkt.length)
    print("%.2f" % (float(readBytes) / fileSize * 100))
The average time is about 12 seconds.
I am not advertising dpkt; I have no stake in it. The point is that I currently need to parse 8 GB files. I checked that with dpkt, the code above finishes an 8 GB pcap file in about 4.5 minutes, which is bearable, whereas I would not even wait for the other libraries to finish. At least, that is my quick first impression. If I get new information, I will update the post.

Scapy PcapReader and packets time

I'm reading a PCAP file with Scapy, using a script like the following (simplified) one:
#! /usr/bin/env python
from scapy.all import *
# ...
myreader = PcapReader(myinputfile)
for p in myreader:
    pkt = p.payload
    print(pkt.time)
In this case the packet times are not relative to the PCAP capture time, but start from the instant I launched my script.
I'd like the times to start from 0.0, or at least to be relative to the PCAP capture.
How can I fix it (possibly without "manually" retrieving the first packet time and repeatedly using math to fix the problem)?
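I found that using pkt.time is wrong here. pkt is p.payload, a separate layer object whose time attribute appears to default to the moment it was created, while p.time holds the capture timestamp read from the PCAP record.
I should print p.time instead.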
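p.time is the absolute capture timestamp. If you also want the times to start at 0.0, as the question asks, one option is to subtract the first packet's capture time while iterating. Here is a minimal sketch: relative_times is a hypothetical helper that works on any iterable of timestamps, such as the p.time values from PcapReader.

```python
def relative_times(timestamps):
    """Yield each timestamp minus the first one, so the sequence starts at 0."""
    first = None
    for ts in timestamps:
        if first is None:
            first = ts  # remember the first capture time
        yield ts - first

# With Scapy this could be fed (p.time for p in PcapReader(myinputfile));
# plain floats are used here so the example runs on its own.
print(list(relative_times([100.5, 101.0, 103.25])))
```

Because it is a generator, it still processes one packet at a time, so it keeps the memory behaviour of PcapReader.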
