Parsing a PCAP File in Python


I am trying to parse a pcap file in Python. When I run this code:
for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    print eth
I get junk values instead of getting the following output:
Ethernet(src='\x00\x1a\xa0kUf', dst='\x00\x13I\xae\x84,',
data=IP(src='\xc0\xa8\n\n', off=16384, dst='C\x17\x030', sum=25129,
len=52, p=6, id=51105, data=TCP(seq=9632694, off_x2=128,
ack=3382015884, win=54, sum=65372, flags=17, dport=80, sport=56145)))
Can anyone please tell me how to get the output above?

Be sure the file is opened to read as binary.
https://stackoverflow.com/a/15746971
import dpkt

f = open(pcapfile, 'rb')  # binary mode: pcap data is not text
pcap = dpkt.pcap.Reader(f)
for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    print(eth)

If the link-layer header type of the file isn't Ethernet, parsing the packets as Ethernet frames will not give you useful information. The dpkt documentation isn't very good, but there is a way to get the link-layer header type. Before a program that reads a pcap file tries to extract anything from the raw packet data, it must determine the file's link-layer header type and base its parsing on that type (or quit if the file's link-layer header type is not one it can parse).
(And feel free to tell Mr. Oberheide that his code is broken because it's not checking the link-layer header type!)
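To show where the link-layer header type lives, here is a minimal stdlib-only sketch (not dpkt code) that reads it from the 24-byte pcap global header. The file starts with a 4-byte magic number that also reveals the byte order, and the link-layer type is the last 4-byte field. Value 1 means Ethernet. The sketch assumes a classic pcap file, not pcapng, and the function name `pcap_linktype` is hypothetical. In dpkt itself, `dpkt.pcap.Reader` has a `datalink()` method that returns this value.

```python
import io
import struct

# Classic pcap magic numbers: microsecond and nanosecond timestamp variants.
PCAP_MAGICS = {0xA1B2C3D4, 0xA1B23C4D}
LINKTYPE_ETHERNET = 1


def pcap_linktype(f):
    """Return (byte_order, linktype) from a classic pcap global header.

    f must be a file object opened in binary mode ('rb').
    """
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("file too short for a pcap global header")
    for order in ("<", ">"):
        (magic,) = struct.unpack(order + "I", header[:4])
        if magic in PCAP_MAGICS:
            # Layout: magic, version_major, version_minor, thiszone,
            # sigfigs, snaplen, network (the link-layer header type)
            fields = struct.unpack(order + "IHHiIII", header)
            return order, fields[6]
    raise ValueError("not a classic pcap file (pcapng is not handled)")


# A tiny little-endian header built in memory: version 2.4, snaplen 65535,
# link-layer type 1 (Ethernet).
sample = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, LINKTYPE_ETHERNET)
order, linktype = pcap_linktype(io.BytesIO(sample))
if linktype != LINKTYPE_ETHERNET:
    print("not Ethernet: don't parse with dpkt.ethernet.Ethernet")
else:
    print("Ethernet capture, byte order", order)
```

With a real capture you would pass `open(path, 'rb')` instead of the in-memory `BytesIO`, and only go on to Ethernet parsing when the type is 1.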

The formatted output you expect is what the Python REPL shows when you evaluate eth at the prompt, because the REPL displays repr(eth). In a Python script, print uses str(eth) instead, so you need to call repr() explicitly, like so:
for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    print(repr(eth))  # this is key: repr() gives the formatted view
Ethernet(src='\x00\x1a\xa0kUf', dst='\x00\x13I\xae\x84,', data=IP(src='\xc0\xa8\n\n', off=16384, dst='C\x17\x030', sum=25129,
len=52, p=6, id=51105, data=TCP(seq=9632694, off_x2=128,
ack=3382015884, win=54, sum=65372, flags=17, dport=80, sport=56145)))
This assumes that your pcap contains proper Ethernet frames and that you have checked the link-layer header type.

Related

Python2 and Python3 DPKT appears to return different output formats

The dpkt library says it now supports Python 3, but it behaves differently in Python 2.x and 3.x, and both outputs appear to be incorrect.
For example, in Python 2.x, the example given here:
with open('test.pcap') as f:
    pcap = dpkt.pcap.Reader(f)
    for ts, buf in pcap:
        eth = dpkt.ethernet.Ethernet(buf)
        print eth
returns output I don't expect, similar to:
[several lines of raw packet bytes, printed as unreadable binary characters]
However, in Python 3, I'm forced to open the pcap file in 'rb' mode, which is fine, except for the output issues (I'm not sure 'rb' has anything to do with the issues now):
with open('test.pcap', 'rb') as f:
    pcap = dpkt.pcap.Reader(f)
    for ts, buf in pcap:
        eth = dpkt.ethernet.Ethernet(buf)
        print eth
This now returns what I believe is a byte string, and I haven't found a way to get the data I need out of it. For example, if I needed the flags value, I could easily read 17 from the example output on their site, but I can't get their example to work at all:
b'\x00\x0f\x1f\x16\xd1\xcd\x00\xc0\xf0y\x9a\xfd\x08\x00E\x00\x00\x1c\xb1\xce\x00\x006\x01N\xf7\xc0\xa8\x01d\xc0\xa8\x01g\x08\x00\xd9\xd7\xb7\xc4fc'
I haven't had any luck converting this string into a human readable object. No combination of decode, binascii or anything else I've tried has worked. Am I using this library incorrectly?

One of the major differences between Python 2 and Python 3 is that in Python 3, str and bytes are no longer the same type. Compare:
$ python2 -c 'print(b"foo" == "foo")'
True
$ python3 -c 'print(b"foo" == "foo")'
False
This explains why you must open the file with "rb" in Python 3. (Even with Python 2 you could well get bogus results on some platforms, such as Windows, if you didn't: without the b, bytes in the file that happen to look like line endings may be translated inappropriately.)
Another difference: in Python 3, print is a function, not a statement, so the Python 3 code you've shown above is actually a syntax error. Instead you need print(eth).
To answer your actual question: when you simply print eth, you are implicitly asking the eth object to make itself printable. That is the same as calling print(str(eth)), so it gives you a printable string version of the binary data buffer that contains the Ethernet frame.
You need to use the facilities of dpkt to discover, then dissect the parts of the frame that you care about.
Here's a short example that decodes a pcap containing DNS packets:
import dpkt

with open("/tmp/dns.pcap", "rb") as f:
    pcap = dpkt.pcap.Reader(f)
    for ts, buf in pcap:
        l2 = dpkt.ethernet.Ethernet(buf)
        print("Ethernet (L2) frame:", repr(l2))
        if l2.type not in (dpkt.ethernet.ETH_TYPE_IP, dpkt.ethernet.ETH_TYPE_IP6):
            print("Not an IP packet")
            continue
        l3 = l2.data
        print("IP packet:", repr(l3))
        if l3.p not in (dpkt.ip.IP_PROTO_TCP, dpkt.ip.IP_PROTO_UDP):
            print("Not TCP or UDP")
            continue
        l4 = l3.data
        print("Layer 4:", repr(l4))
        if l4.dport in (53, 5353) or l4.sport in (53, 5353):
            dns = l4.data
            if not isinstance(dns, dpkt.dns.DNS):
                # The payload may still be raw bytes; decode it as DNS.
                # (DNS over TCP has a 2-byte length prefix, not stripped here.)
                dns = dpkt.dns.DNS(dns)
            print("DNS packet:", repr(dns))
As for why your output looks different from the tutorial's: the tutorial is out of date. At some point, the implementation of the __str__ magic method on dpkt objects apparently changed (when you just print an object, you get the result of its __str__ method).
Originally, __str__ returned a formatted representation of the object; now it returns a string representation of the object's raw bytes. So you need to call repr(obj) to get the formatted representation.
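To make the `__str__`/`__repr__` distinction concrete, here is a hypothetical toy class. `ToyTCP` is made up and is not part of dpkt. It behaves the way this answer describes newer dpkt objects: `print()` shows the raw bytes, `repr()` shows the formatted fields, and individual values such as `flags` are plain attributes. Its 5-byte wire format is an illustrative assumption, not real TCP.

```python
import struct


class ToyTCP:
    """Hypothetical stand-in for a dpkt packet class (not dpkt's real code)."""

    def __init__(self, sport, dport, flags):
        self.sport = sport
        self.dport = dport
        self.flags = flags

    def __bytes__(self):
        # Simplified toy wire format: two 16-bit ports, then one flags byte.
        return struct.pack("!HHB", self.sport, self.dport, self.flags)

    def __str__(self):
        # What print(obj) uses: a printable version of the raw bytes.
        return str(bytes(self))

    def __repr__(self):
        # The formatted representation you have to ask for explicitly.
        return f"ToyTCP(sport={self.sport}, dport={self.dport}, flags={self.flags})"


tcp = ToyTCP(sport=56145, dport=80, flags=17)
print(tcp)        # a b'...' byte string, like the output in the question
print(repr(tcp))  # ToyTCP(sport=56145, dport=80, flags=17)
print(tcp.flags)  # 17: read individual fields as attributes
```

With real dpkt objects the same idea applies: print `repr(eth)` to see the fields, and read a value such as the TCP flags as an attribute of the decoded layer (for example `eth.data.data.flags` for an Ethernet/IP/TCP packet) rather than trying to decode the printed string.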

Try opening the pcap file in binary mode, i.e. open('test.pcap', 'rb').

Parsing the pcap file through scapy

I would like to read a pcap file with Scapy and filter on source address, destination address, and a packet length of at least 400.
After matching those packets, I would like to remove the first 16 bytes and then extract the remaining bytes sequentially.
file=rdpcap(pcap)
for pkt in file:
    if pkt[0].src=='198.18.32.1' and pkt[0].dst=='198.18.50.97':

That is to be expected: pkt[0] does not select any particular layer; it is just the packet, starting from its first layer.
So when you write pkt[0].src (or pkt.src), you get the address of the first layer, which is the Ethernet (MAC) address, not an IP address. You need pkt[IP].src to get the IP address.

Getting a particular packet from the pcap file using the Scapy module (Python)

Is there a way to load a particular packet from a pcap file using Scapy?
I know that I can load a specific number of packets using the sniff function and its count argument, e.g.:
sniff(offline='file.pcap', prn=action, count=31)
However, I need to get the 30th packet without loading the previous packets.
In other words, I am not satisfied with an approach like this:
packages = sniff(offline=path, prn=action, count=31)
print(packages[30])
Loading the millionth packet this way takes far too long.

Each packet header states how long it is. Once the parser has read that header, it can calculate the position of the next one. So as far as I know, you cannot open a pcap file and instantly locate packet 30; you'll need to parse the headers of the first 29.
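But you don't have to keep all packets in memory either, as long as you process them one at a time as they are read. Note that sniff(offline=...) collects the whole capture before the loop starts; Scapy's PcapReader yields packets one by one:
from scapy.all import PcapReader

with PcapReader(path) as reader:
    for i, pkt in enumerate(reader):  # i counts from 0, so i == 30 matches packages[30]
        if i == 30:
            print(pkt)
            break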
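The point that each record header states the packet's length can be shown directly. This is a stdlib-only sketch, not Scapy or dpkt code, and `nth_packet` and the demo helper `build_pcap` are hypothetical names. It walks a classic pcap file by reading only the 16-byte record headers (seconds, sub-second timestamp, included length, original length) and seeking past each packet's data, so earlier packets are never decoded. The file is still walked header by header, as the answer says. The sketch assumes classic pcap (not pcapng) and a seekable file opened with 'rb'.

```python
import io
import struct

PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)  # microsecond / nanosecond variants


def nth_packet(f, n):
    """Return (ts_sec, ts_frac, data) for packet n (0-based) of a classic pcap."""
    header = f.read(24)  # global header
    order = None
    for candidate in ("<", ">"):
        if len(header) == 24 and struct.unpack(candidate + "I", header[:4])[0] in PCAP_MAGICS:
            order = candidate
            break
    if order is None:
        raise ValueError("not a classic pcap file")
    index = 0
    while True:
        rec = f.read(16)  # record header: ts_sec, ts_frac, incl_len, orig_len
        if len(rec) < 16:
            raise IndexError(f"capture has only {index} packets")
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(order + "IIII", rec)
        if index == n:
            return ts_sec, ts_frac, f.read(incl_len)
        f.seek(incl_len, io.SEEK_CUR)  # jump over this packet's bytes unread
        index += 1


def build_pcap(payloads):
    """Hypothetical demo helper: a little-endian, Ethernet-linktype pcap in memory."""
    buf = io.BytesIO()
    buf.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
    for i, data in enumerate(payloads):
        buf.write(struct.pack("<IIII", i, 0, len(data), len(data)))
        buf.write(data)
    buf.seek(0)
    return buf


capture = build_pcap([bytes([i]) * (i + 1) for i in range(40)])
ts_sec, _, data = nth_packet(capture, 30)
print(ts_sec, len(data))  # 30 31
```

The returned bytes can then be handed to a decoder only for the one packet you want, for example `dpkt.ethernet.Ethernet(data)` or Scapy's `Ether(data)` when the link-layer type is Ethernet.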

How to parse Ethernet Header of pcap file using Python?

I would like to decode the link-layer type and version of the packets in a pcap file, so I need to parse the pcap file in Python. Here is my code:
import dpkt
import socket
import sys

f = open('filename', 'rb')  # pcap files must be opened in binary mode
pcap = dpkt.pcap.Reader(f)
for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    ip = eth.data
    tcp = ip.data
    print(ts, len(buf))
    print(repr(eth))
    print(repr(ip))
    print(repr(tcp))
f.close()

The dpkt Python library may not be the best tool for this task. According to its website, dpkt is best used for:
fast, simple packet creation / parsing, with definitions for the basic TCP/IP protocols.
Consider using Scapy instead, which is a "powerful interactive packet manipulation program for Python."
You can use a script like this:
from scapy.all import rdpcap

packets = rdpcap('tmp.pcap')
for p in packets:
    p.show()  # print every layer (Ether, IP, TCP, ...) with its decoded fields
Read Infinite possibilities with Python's Scapy Module for more details.
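Whichever library you use, each packet in a capture whose link-layer type is Ethernet starts with a 14-byte Ethernet II header: destination MAC, source MAC, then a 2-byte EtherType (0x0800 for IPv4, 0x86DD for IPv6). This minimal stdlib-only sketch uses neither dpkt nor Scapy. It decodes that header with `struct` and reads the IP version from the top 4 bits of the next byte. The function names are hypothetical, VLAN tags and 802.3 length fields are not handled, and the sample frame is built from the MAC addresses in the first question's expected output. In dpkt the same header fields are `eth.dst`, `eth.src` and `eth.type`.

```python
import struct

ETHERTYPE_NAMES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6", 0x8100: "802.1Q VLAN"}


def mac_to_str(raw):
    """Format 6 raw bytes as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet(frame):
    """Decode the Ethernet II header of a raw frame (bytes) into a dict."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    info = {
        "dst": mac_to_str(dst),
        "src": mac_to_str(src),
        "ethertype": ethertype,
        "type_name": ETHERTYPE_NAMES.get(ethertype, "unknown"),
    }
    payload = frame[14:]
    if ethertype in (0x0800, 0x86DD) and payload:
        # The first 4 bits of an IP header hold the version (4 or 6).
        info["ip_version"] = payload[0] >> 4
    return info


# dst 00:13:49:ae:84:2c, src 00:1a:a0:6b:55:66, EtherType IPv4, first IP bytes 45 00
frame = bytes.fromhex("001349ae842c" "001aa06b5566" "0800" "4500")
info = parse_ethernet(frame)
print(info["src"], "->", info["dst"], info["type_name"], "version", info["ip_version"])
# 00:1a:a0:6b:55:66 -> 00:13:49:ae:84:2c IPv4 version 4
```

In a loop over `dpkt.pcap.Reader`, each `buf` could be passed to `parse_ethernet` in the same way, after checking that the file's link-layer type is Ethernet.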

print scapy sniff output to file

I have created a sniffer in Scapy, and I want the packets it captures to be written to a file for further analysis. How can I do that?
def sniffer(ip):
    filter_str = "icmp and host " + ip
    packets = sniff(filter=filter_str, count=20)
    f = open('log.txt', "a")
    #f.write(packets)
The last line of code does not work. Is there any way I could do this?

f.write expects a string (a character buffer), but you are passing it a Sniffed object, which is what sniff returns. Very simply, you can do the following:
f.write(str(packets))
This should work, but it probably won't display the information the way you want: str(packets) typically gives only a one-line summary of the capture, such as the number of packets per protocol. You will have to do more work to collect information from the individual packets as strings before you write to f.
