I have a script that loops through many pcap files. For each pcap file I need to read it and then write some information to a txt file. I'm using the rdpcap function from Scapy. Is there any way to close a pcap file once I'm done reading it? My script has a memory leak and I'm worried this may be the culprit (by leaving many pcap files essentially open).
Inspecting Scapy's source code reveals that the rdpcap function neglects to close the pcap file:
@conf.commands.register
def rdpcap(filename, count=-1):
    """Read a pcap file and return a packet list
    count: read only <count> packets"""
    return PcapReader(filename).read_all(count=count)
I suggest you implement your own version of this function as follows:
def rdpcap_and_close(filename, count=-1):
    """Read a pcap file, return a packet list and close the file
    count: read only <count> packets"""
    pcap_reader = PcapReader(filename)
    packets = pcap_reader.read_all(count=count)
    pcap_reader.close()
    return packets
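For example, a loop over many capture files could use it like this (a minimal sketch; the captures/ directory and the per-file summary are hypothetical):
import os
from scapy.all import PcapReader  # rdpcap_and_close() as defined above

for name in os.listdir('captures'):  # hypothetical directory of pcap files
    if not name.endswith('.pcap'):
        continue
    packets = rdpcap_and_close(os.path.join('captures', name))
    # write some information about the capture to a txt file
    with open(name + '.txt', 'w') as out:
        out.write('%s: %d packets\n' % (name, len(packets)))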
I've created an issue for this problem here.
EDIT: The issue has been resolved in this changeset.
I am reading a pcap file I acquired with tcpdump. The pcap file is ~500MB. I read the file with FileCapture() and then I want to loop through each packet to extract the TLS payload. When I create the FileCapture object I also pass override_prefs={'tls.keylog_file': os.path.abspath('tlsKey')}, where tlsKey is the file with the master keys to decrypt the traffic. The decryption works just fine; I can extract all the information from each single packet. However, when I loop through the packets to extract some information, the loop stops working at the packet with packet.number = 258, even though my file contains more than 258 packets. What is going on?
My code
import pyshark
import os
cap = pyshark.FileCapture('traffic.pcap',
                          override_prefs={'tls.keylog_file': os.path.abspath('tlsKey')})
for packet in cap:
    print(packet.number)
    if "IP" in packet:
        print(packet)
print('Finished')
The last output I get is here. As you can see, the TLS layer does not get printed. Why?
Expected behavior
I would expect my script to print Finished at the end, but it doesn't. The for loop looks stuck. Since the pcap file is large I cannot attach it. Any explanation of what's happening?
Versions (please complete the following information):
OS: MacOS 13.1
pyshark version: 0.5.3
tshark version: TShark (Wireshark) 4.0.2 (v4.0.2-0-g415456d13370)
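(Not part of the original report, but one way to narrow this down: pyshark's keep_packets=False stops the capture object from caching every parsed packet, and apply_on_packets() iterates the whole file with a callback. If the stall is memory-related, a sketch like this should get past packet 258; the file names are the ones from the question.)
import os
import pyshark

cap = pyshark.FileCapture(
    'traffic.pcap',
    keep_packets=False,  # drop each packet after the callback has seen it
    override_prefs={'tls.keylog_file': os.path.abspath('tlsKey')},
)

def show_number(packet):
    print(packet.number)

cap.apply_on_packets(show_number)  # iterate the entire file
cap.close()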
Is there a way to load a particular packet from a pcap file using Scapy?
I know that I can load a specific number of packets using the sniff function and its count attribute, e.g.:
sniff(offline='file.pcap', prn=action, count=31)
However, I need to get the 30th packet without loading the previous packets.
In other words, I am not satisfied with an example like this:
packages = [pkt for pkt in sniff(offline=path, prn=action, count=31)]
print(packages[30])
Trying to load, say, the millionth packet this way takes far too long.
Each packet header states how long it is. Once the parser has read that header, it can calculate the position of the next one. So as far as I know, you cannot open a pcap file and instantly locate packet 30; you'll need to parse the headers of the first 29.
But you don't have to keep all the packets in memory either, as long as you process each one as it is read:
for i, pkt in enumerate(sniff(offline=path, prn=action)):
    if i == 30:
        print(pkt)
        break
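Along the same lines, Scapy's PcapReader is a plain iterator over the file, so itertools.islice from the standard library can pull out just the packet you want without building a list (a small sketch; path is a placeholder):
from itertools import islice
from scapy.all import PcapReader

reader = PcapReader(path)  # placeholder: path to your pcap file
pkt = next(islice(reader, 30, 31), None)  # skip 30 packets, take the next one
reader.close()

if pkt is not None:
    pkt.show()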
I have about 10GB of pcap data with IPv6 traffic, and I want to analyze information stored in the IPv6 header and other extension headers. To do this I decided to use the Scapy framework. I tried the rdpcap function, but for such big files it is not recommended: it tries to load the whole file into memory and, in my case, gets stuck.
I found on the net that sniff is recommended in such situations; my code looks like:
from scapy.all import sniff

def main():
    sniff(offline='traffic.pcap', prn=my_method, store=0)

def my_method(packet):
    packet.show()
In the function my_method I receive each packet separately and can parse it, but...
When I call the built-in show method, the output comes out mangled, whereas the same packet opened in Wireshark looks correct.
Could you tell me how to parse these packets in Scapy to get proper results?
EDIT:
Following the discussion in the comments, I found a way to parse the PCAP file with Python. In my opinion the easiest way is to use the pyshark framework:
import pyshark
pcap = pyshark.FileCapture(pcap_path)  # open the PCAP file for reading
The opened capture can then be iterated with a for loop:
for pkt in pcap:
    pass  # do what you want with each packet
For parsing the IPv6 header, the following fields may be useful:
pkt['ipv6'].tclass       # Traffic class field
pkt['ipv6'].tclass_dscp  # Traffic class DSCP field
pkt['ipv6'].tclass_ecn   # Traffic class ECN field
pkt['ipv6'].flow         # Flow label field
pkt['ipv6'].plen         # Payload length field
pkt['ipv6'].nxt          # Next header field
pkt['ipv6'].hlim         # Hop limit field
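Putting those fields together, a loop that prints a few of them might look like this (a sketch; the capture path is a placeholder and non-IPv6 packets are skipped):
import pyshark

pcap = pyshark.FileCapture('traffic.pcap')  # placeholder path
for pkt in pcap:
    if 'ipv6' in pkt:  # layer lookup is case-insensitive in pyshark
        print(pkt['ipv6'].src, pkt['ipv6'].nxt, pkt['ipv6'].hlim)
pcap.close()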
Update
Recent Scapy versions support IPv6 parsing, so an IPv6 ".pcap" file can now be read with Scapy directly:
from scapy.all import *

scapy_cap = rdpcap('file.pcap')
for packet in scapy_cap:
    print(packet[IPv6].src)
Now, as I had commented back when this question was originally asked, for older Scapy versions (that don't support IPv6 parsing) pyshark can be used instead (pyshark is a tshark wrapper) like so:
import pyshark

shark_cap = pyshark.FileCapture('file.pcap')
for packet in shark_cap:
    print(packet.ipv6.src)
or of course tshark itself (more or less the terminal version of Wireshark):
$ tshark -r file.pcap -q -Tfields -e ipv6.src
If you want to keep using Scapy and read the file iteratively, I'd recommend giving PcapReader() a shot. It does the same thing you tried with pyshark, but in Scapy:
from scapy.all import *

for packet in PcapReader('file.pcap'):
    try:
        print(packet[IPv6].src)
    except IndexError:
        pass  # packet has no IPv6 layer
I'd recommend keeping the try/except as a failsafe, in case any packet does not have an IPv6 address.
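An equivalent guard, arguably cleaner than catching the exception, is to test for the layer first with Scapy's haslayer (a small sketch):
from scapy.all import IPv6, PcapReader

for packet in PcapReader('file.pcap'):
    if packet.haslayer(IPv6):  # skip packets without an IPv6 layer
        print(packet[IPv6].src)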
I am trying to read a PCAP file in Python 2.7.10. The code is:
import dpkt

f = open('testbed-11jun.pcap')
pcap = dpkt.pcap.Reader(f)
for ts, buf in pcap:
    print ts, len(buf)
But I got this error:
1276225266.46 60
1276225266.72 60
1276225266.84 110
1276225266.84 110
1276225266.84 134
277171502.827 132
Traceback (most recent call last):
  File "D:/UC subjects/MS Thesis/code/python/readpcap_dpkt.py", line 5, in <module>
    for ts, buf in pcap:
  File "C:\Python27\lib\site-packages\dpkt\pcap.py", line 159, in __iter__
    buf = self.__f.read(hdr.caplen)
MemoryError
So after reading six records from testbed-11jun.pcap it raised a MemoryError. The file is 2 GB and holds hundreds of traces, so six traces should amount to a few MB at most; still I got the error (my laptop has 6 GB of RAM).
Can anybody tell me how to read all the traces without a memory error?
I realize that this question was asked a long time ago, but I thought I should still provide a couple of possible resolutions, as they may help others.
There could be a couple of reasons for this error:
1: The pcap file was opened as a text file instead of a binary file. Try opening the file explicitly with the "b" flag, i.e.
f = open('testbed-11jun.pcap', 'rb')
Note that not specifying a mode defaults to 'r', which is meant for reading text files, as per the Python documentation. On Windows, text mode mangles bytes that happen to look like line endings, which is why the sixth timestamp in the output above is garbage and why dpkt then reads a nonsensically large caplen and runs out of memory. (A corrected version of the full snippet appears after this list.)
2: The PCAP file is in a format that dpkt cannot fully parse. Note that there are multiple capture formats, e.g. classic libpcap and pcap-ng, and both commonly use the same extension. Ensure that you captured the dump in a format dpkt understands. For example, with Dumpcap the following command line produces a capture that dpkt can parse:
dumpcap.exe -P -i "Wireless Network Connection" -w input.pcap -a duration:10
The -P flag ensures that the capture is saved in the classic libpcap format.
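For completeness, here is the question's loop with the fix from point 1 applied (a sketch in Python 3 print style):
import dpkt

# Open in binary mode ('rb'); text mode corrupts the pcap bytes on Windows.
with open('testbed-11jun.pcap', 'rb') as f:
    pcap = dpkt.pcap.Reader(f)
    for ts, buf in pcap:
        print(ts, len(buf))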
So here's the problem. I have a sample.gz file, roughly 60KB in size. I want to decompress the first 2000 bytes of this file, but I run into a "CRC check failed" error, I guess because the gzip CRC field appears at the end of the file and requires the entire gzipped file for verification. Is there a way to get around this? I don't care about the CRC check; even if decompression fails because of a bad CRC, that is OK. Is there a way to unzip partial .gz files?
The code I have so far is
import gzip
import StringIO

raw = open('sample.gz', 'rb')
mybuf = StringIO.StringIO(raw.read(2000))
f = gzip.GzipFile(fileobj=mybuf)
data = f.read()
print data
The error encountered is
File "gunzip.py", line 27, in ?
data = f.read()
File "/usr/local/lib/python2.4/gzip.py", line 218, in read
self._read(readsize)
File "/usr/local/lib/python2.4/gzip.py", line 273, in _read
self._read_eof()
File "/usr/local/lib/python2.4/gzip.py", line 309, in _read_eof
raise IOError, "CRC check failed"
IOError: CRC check failed
Also, is there any way to use the zlib module to do this and ignore the gzip headers?
The issue with the gzip module is not that it can't decompress the partial file; the error occurs only at the end, when it tries to verify the checksum of the decompressed content. (The original checksum is stored at the end of the compressed file, so the verification can never succeed with a partial file.)
The key is to trick gzip into skipping the verification. The answer by caesar0301 does this by modifying the gzip source code, but it's not necessary to go that far: simple monkey patching will do. I wrote this context manager to temporarily replace gzip.GzipFile._read_eof while I decompress a partial file:
import contextlib
import gzip

@contextlib.contextmanager
def patch_gzip_for_partial():
    """
    Context manager that replaces gzip.GzipFile._read_eof with a no-op.

    This is useful when decompressing partial files, something that won't
    work if GzipFile does its checksum comparison.
    """
    _read_eof = gzip.GzipFile._read_eof
    gzip.GzipFile._read_eof = lambda *args, **kwargs: None
    try:
        yield
    finally:
        # Restore the original method even if decompression raises.
        gzip.GzipFile._read_eof = _read_eof
An example usage:
from cStringIO import StringIO

# 'compressed' holds the partial gzip data read earlier
with patch_gzip_for_partial():
    decompressed = gzip.GzipFile(fileobj=StringIO(compressed)).read()
It seems that you need to look into the Python zlib library instead.
The GZIP format relies on zlib, but adds a file-level envelope along with CRC checking, and that envelope appears to be what you do not want or need at the moment.
See for example these code snippets from Doug Hellmann.
Edit: the code on Doug Hellmann's site only shows how to compress or decompress with zlib. As indicated above, GZIP is "zlib with an envelope", and you'll need to decode the envelope before getting to the zlib-compressed data per se. Here's more information on how to go about it; it's really not that complicated:
See RFC 1952 for details about the GZIP format.
This format starts with a 10-byte header, followed by optional, non-compressed elements such as the file name or a comment, followed by the deflate-compressed data, itself followed by a CRC-32 of the uncompressed data and its size.
By using Python's struct module, parsing the header should be relatively simple.
The deflate sequence (or its first few thousand bytes, since that is what you want to do) can then be decompressed with Python's zlib module, as shown in the examples above.
One possible problem to handle: if there is more than one file in the GZip archive, the second file might start within the block of a few thousand bytes we want to decompress.
Sorry to provide neither a simple procedure nor a ready-to-go snippet, but decoding the file with the indications above should be relatively quick and simple (see the sketch below).
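For illustration, here is a minimal sketch of that approach. Rather than parsing the header by hand, zlib.decompressobj can be told to consume the gzip envelope itself by passing 16 + zlib.MAX_WBITS as the wbits argument; and because a decompressobj only verifies the CRC when it reaches the end of the stream, feeding it a truncated file simply yields whatever output those bytes contain ('sample.gz' and the 2000-byte count come from the question):
import zlib

with open('sample.gz', 'rb') as f:
    head = f.read(2000)  # only the first 2000 bytes of the .gz file

# 16 + MAX_WBITS tells zlib to expect (and skip) the gzip header.
d = zlib.decompressobj(16 + zlib.MAX_WBITS)
data = d.decompress(head)  # partial stream: no CRC check is triggered
print('%d bytes decompressed' % len(data))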
I can't see any reason why you would want to decompress exactly the first 2000 compressed bytes; depending on the data, they may expand to any number of output bytes.
Surely what you want is to decompress the file and stop when you have as much output as you need, something like:
f = gzip.GzipFile(fileobj=open('postcode-code.tar.gz', 'rb'))
data = f.read(4000)
print data
AFAIK, this won't cause the whole file to be read; it will only read as much as is necessary to produce the first 4000 decompressed bytes.
I also encountered this problem when I used my Python script to read compressed files generated by the gzip tool under Linux, after the original files had been lost.
Reading the implementation in Python's gzip.py, I found that gzip.GzipFile offers methods similar to the File class and uses the zlib module to de/compress data. At the same time, the _read_eof() method checks the CRC of each file.
But in some situations, like processing a stream or a .gz file without a correct CRC (my problem), an IOError("CRC check failed") is raised by _read_eof(). I therefore modified the gzip module to disable the CRC check, and the problem disappeared:
def _read_eof(self):
    pass
https://github.com/caesar0301/PcapEx/blob/master/live-scripts/gzip_mod.py
I know it's a brute-force solution, but it saves a lot of time compared with rewriting low-level methods around the zlib module yourself, such as reading data chunk by chunk from the zipped file and extracting it line by line, most of which is already present in the gzip module.
Jamin