Decode HTTP packet content in python as seen in wireshark

Decode HTTP packet content in python as seen in wireshark - python

Ok, so bascially what I want to do is intercept some packets that I know contains some JSON data. But HTTP packets aren't human-readable, so that's my problem, I need to make the entire packet (not just the header, which is already plain text), human-readable. I have no experience with networking at all.
import pcap
from impacket import ImpactDecoder, ImpactPacket
def print_packet(pktlen, data, timestamp):
if not data:
return
decoder = ImpactDecoder.EthDecoder()
ether = decoder.decode(data)
iphdr = ether.child()
tcphdr = iphdr.child()
if iphdr.get_ip_src() == '*******':
print tcphdr
p = pcap.pcapObject()
dev = 'wlan0'
p.open_live(dev, 1600, 0, 100)
try:
p.setfilter('tcp', 0, 0)
while 1:
p.loop(1, print_packet)
except KeyboardInterrupt:
print 'shutting down'
I've found tools like libpcap-python, scapy, Impacket pcapy and so on. They all seem good, but I can't figure out how to decode the packets properly with them.
Wireshark has this thing called "Line-based text data: text/html" which basically displays the information I'm after, so I thought it would be trivial to get the same info with python, it turns out it was not.

Both HTTP and JSON are human readable. On Wireshark, select a packet that relates to your HTTP transaction and right-click, select Follow TCP Stream, which should display the transaction in a Human readable form.

Related

Scapy - TCPSession from list of packets

I'm trying to use TCPSession funcionality (like: sniff(offline="./my_file.pcap", prn=func, store=False, session=TCPSession)) but without creating a PCAP file.
I receive a list of RAW Packets so I can build a list of Scapy packets but I need the TCPSession funcionality because of the HTTP Packets: Without TCPSession the headers and the body are in different packets so HTTP Layers Class can't identify the body part.
So I have this code that finds the HTTP Requests:
import pickle
from scapy.all import *
from scapy.layers import http
load_layer("http")
def expand(x):
yield x
while x.payload:
x = x.payload
yield x
file_pickle = open('prueba.pkl','rb')
pkt_list = pickle.load(file_pickle)
for pkt_raw in pkt_list:
p = Ether(pkt_raw)
if p.haslayer(IP):
srcIP = p[IP].src
if p.haslayer(HTTP):
if p.haslayer(HTTPRequest):
print(list(expand(p)), end="\n---------------------------------------------------\n")
The execution of this code finds the HTTP Requests but without the Body part of the POST Requests:
[...]<HTTPRequest Method='POST' Path='/NP3POCF.jsp' Http_Version='HTTP/1.1' Accept='*/*' Accept_Encoding='gzip, deflate' Connection='keep-alive' Content_Length='56' Content_Type='application/x-www-form-urlencoded' Host='172.16.191.129' User_Agent='python-requests/2.7.0 CPython/3.7.5 Linux/5.3.0-kali2-amd64' |>]
With a sniffer with TCPSession (such as Scapy sniff function) the packet has a Raw Layer that contains the body of the request.
Any help to apply TCPSession? Thank You.

You can call sniff(offline=X) with X a packet list, a packet, a file name or a list of files.
Make sure you are using the github development version (see https://scapy.readthedocs.io/en/latest/installation.html#current-development-version), as I'm not sure if this is in a release yet.

Twisted Server receive data stream via POST, read request.content.read() byte by byte in a deferred for over an hour

I'm intending to receive a binary stream of data via a http POST call.
I believe the client side is working, that is, it writes chunks of bytes to the server, I can see the amount of data being sent with tcpdump, yet Twisted's request.content file-like object only starts producing output once the client disconnects.
This is what the server handler looks like:
def render(self, request):
if request.path == '/incoming-stream':
d = deferLater(reactor, 0, lambda: request)
d.addCallback(self.async_read)
return NOT_DONE_YET
def async_read(self, request):
sys.stdout.write('\nasync_read ' + str(request) + '\n')
sys.stdout.flush()
while True:
byte = request.content.read(1) # <--- read one byte
if len(byte) > 0:
sys.stdout.write(repr(byte))
sys.stdout.flush()
else:
break
sys.stdout.write('\nfinished ' + str(request) + '\n')
sys.stdout.flush()
request.write(b"finished")
request.finish()
If I can't do this with POST, I have no problem with switching over to WebSocket, but I'd first like to try to get this done via POST. The data posted are long running (one new POST request every hour, with it being alive and receiving data for an hour), relatively high bandwidth sensor data at approx 1kbps.
I am aware that there are better methods of transferring the data (WebSocket, MQTT, AMQP), but POST and WebSocket will give me least amount of trouble when receiving the data through an NGINX SSL endpoint. Currently NGINX is not being used (to discard any buffering it could be causing).

Twisted Web does not support streaming uploads in its IResource abstraction.
See https://twistedmatrix.com/trac/ticket/288

How to obtain packet count of TCP packets?

I am retrieving flow statistics using a _flow_stats_reply_handler as demonstrated in the Ryu Traffic Monitor example.
I print using the following:
file.write("\n{},{},{},{},{},{},{},{},{}"
.format(ev.msg.datapath.id,
stat.match['in_port'], stat.match['eth_src'], stat.match['eth_dst'],
stat.instructions[0].actions[0].port,
stat.packet_count, stat.byte_count,
stat.duration_sec, stat.duration_nsec))
Note the stat.packet_count.
How could I change this to count TCP packets? I understand there is an ip_proto field and a tcp_flags field but I don't know how to code the match/count.
Edit:
I have further investigated this and added a flow match to my request flow stats function:
def _request_stats(self, datapath):
self.logger.debug('send stats request: %016x', datapath.id)
ofp = datapath.ofproto
parser = datapath.ofproto_parser
cookie = cookie_mask = 0
match = parser.OFPMatch(eth_type=0x0800)
req = parser.OFPFlowStatsRequest(datapath, 0, ofp.OFPTT_ALL, ofp.OFPP_ANY, ofp.OFPG_ANY,
cookie, cookie_mask, match)
datapath.send_msg(req)
This unfortunately still doesn't work, any ideas as to why not would be greatly appreciated.

You should add more data to your match, like ip_proto in order to match with tcp, as you may know, IP protocol number of TCP is 6, for more information about IP Protocol numbers check Wikipedia.
Please use the code below, You don't need to settcp_flags in this case.
match = parser.OFPMatch(
eth_type=0x0800,
ip_proto=6,
)

Efficiently reading lines from compressed, chunked HTTP stream as they arrive

I've written a HTTP-Server that produces endless HTTP streams consisting of JSON-structured events. Similar to Twitter's streaming API. These events are separated by \n (according to Server-sent events with Content-Type:text/event-stream) and can vary in length.
The response is
chunked (HTTP 1.1 Transfer-Encoding:chunked) due to the endless stream
compressed (Content-Encoding: gzip) to save bandwidth.
I want to consume these lines in Python as soon as they arrive and as resource-efficient as possible, without reinventing the wheel.
As I'm currently using python-requests, do you know how to make it work?
If you think, python-requests cannot help here, I'm totally open for alternative frameworks/libraries.
My current implementation is based on requests and uses iter_lines(...) to receive the lines. But the chunk_size parameter is tricky. If set to 1 it is very cpu-intense, since some events can be several kilobytes. If set to any value above 1, some events got stuck until the next arrive and the whole buffer "got filled". And the time between events can be several seconds.
I expected that the chunk_size is some sort of "maximum number of bytes to receive" as in unix's recv(...). The corresponding man-page says:
The receive calls normally return any data available, up to the
requested amount, rather than waiting for receipt of the full amount
requested.
But this is obviously not how it works in the requests-library. They use it more or less as an "exact number of bytes to receive".
While looking at their source code, I couldn't identify which part is responsible for that. Maybe httplib's Response or ssl's SSLSocket.
As a workaround I tried padding my lines on the server to a multiple of the chunk-size.
But the chunk-size in the requests-library is used to fetch bytes from the compressed response stream.
So this won't work until I can pad my lines so that their compressed byte-sequence is a multiple of the chunk-size. But this seems far too hacky.
I've read that Twisted could be used for non-blocking, non-buffered processing of http streams on the client, but I only found code for creating stream responses on the server.

Thanks to Martijn Pieters answer I stopped working around python-requests behavior and looked for a completely different approach.
I ended up using pyCurl. You can use it similar to a select+recv loop without inverting the control flow and giving up control to a dedicated IO-loop as in Tornado, etc. This way it is easy to use a generator that yields new lines as soon as they arrive - without further buffering in intermediate layers that could introduce delay or additional threads that run the IO-loop.
At the same time, it is high-level enough, that you don't need to bother about chunked transfer encoding, SSL encryption or gzip compression.
This was my old code, where chunk_size=1 resulted in 45% CPU load and chunk_size>1 introduced additional lag.
import requests
class RequestsHTTPStream(object):
def __init__(self, url):
self.url = url
def iter_lines(self):
headers = {'Cache-Control':'no-cache',
'Accept': 'text/event-stream',
'Accept-Encoding': 'gzip'}
response = requests.get(self.url, stream=True, headers=headers)
return response.iter_lines(chunk_size=1)
Here is my new code based on pyCurl:
(Unfortunately the curl_easy_* style perform blocks completely, which makes it difficult to yield lines in between without using threads. Thus I'm using the curl_multi_* methods)
import pycurl
import urllib2
import httplib
import StringIO
class CurlHTTPStream(object):
def __init__(self, url):
self.url = url
self.received_buffer = StringIO.StringIO()
self.curl = pycurl.Curl()
self.curl.setopt(pycurl.URL, url)
self.curl.setopt(pycurl.HTTPHEADER, ['Cache-Control: no-cache', 'Accept: text/event-stream'])
self.curl.setopt(pycurl.ENCODING, 'gzip')
self.curl.setopt(pycurl.CONNECTTIMEOUT, 5)
self.curl.setopt(pycurl.WRITEFUNCTION, self.received_buffer.write)
self.curlmulti = pycurl.CurlMulti()
self.curlmulti.add_handle(self.curl)
self.status_code = 0
SELECT_TIMEOUT = 10
def _any_data_received(self):
return self.received_buffer.tell() != 0
def _get_received_data(self):
result = self.received_buffer.getvalue()
self.received_buffer.truncate(0)
self.received_buffer.seek(0)
return result
def _check_status_code(self):
if self.status_code == 0:
self.status_code = self.curl.getinfo(pycurl.HTTP_CODE)
if self.status_code != 0 and self.status_code != httplib.OK:
raise urllib2.HTTPError(self.url, self.status_code, None, None, None)
def _perform_on_curl(self):
while True:
ret, num_handles = self.curlmulti.perform()
if ret != pycurl.E_CALL_MULTI_PERFORM:
break
return num_handles
def _iter_chunks(self):
while True:
remaining = self._perform_on_curl()
if self._any_data_received():
self._check_status_code()
yield self._get_received_data()
if remaining == 0:
break
self.curlmulti.select(self.SELECT_TIMEOUT)
self._check_status_code()
self._check_curl_errors()
def _check_curl_errors(self):
for f in self.curlmulti.info_read()[2]:
raise pycurl.error(*f[1:])
def iter_lines(self):
chunks = self._iter_chunks()
return self._split_lines_from_chunks(chunks)
#staticmethod
def _split_lines_from_chunks(chunks):
#same behaviour as requests' Response.iter_lines(...)
pending = None
for chunk in chunks:
if pending is not None:
chunk = pending + chunk
lines = chunk.splitlines()
if lines and lines[-1] and chunk and lines[-1][-1] == chunk[-1]:
pending = lines.pop()
else:
pending = None
for line in lines:
yield line
if pending is not None:
yield pending
This code tries to fetch as many bytes as possible from the incoming stream, without blocking unnecessarily if there are only a few. In comparison, the CPU load is around 0.2%

It is not requests' fault that your iter_lines() calls are blocking.
The Response.iter_lines() method calls Response.iter_content(), which calls urllib3's HTTPResponse.stream(), which calls HTTPResponse.read().
These calls pass along a chunk-size, which is what is passed on to the socket as self._fp.read(amt). This is the problematic call, as self._fp is a file object produced by socket.makefile() (as done by the httplib module); and this .read() call will block until amt (compressed) bytes are read.
This low-level socket file object does support a .readline() call that will work more efficiently, but urllib3 cannot make use of this call when handling compressed data; line terminators are not going to be visible in the compressed stream.
Unfortunately, urllib3 won't call self._fp.readline() when the response isn't compressed either; the way the calls are structured it'd be hard to pass along you want to read in line-buffering mode instead of in chunk-buffering mode as it is.
I must say that HTTP is not the best protocol to use for streaming events; I'd use a different protocol for this. Websockets spring to mind, or a custom protocol for your specific use-case.

Using python sockets to receive large http requests

I am using python sockets to receive web style and soap requests. The code I have is
import socket
svrsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
host = socket.gethostname();
svrsocket.bind((host,8091))
svrsocket.listen(1)
clientSocket, clientAddress = svrsocket.accept()
message = clientSocket.recv(4096)
Some of the soap requests I receive, however, are huge. 650k huge, and this could become several Mb. Instead of the single recv I tried
message = ''
while True:
data = clientSocket.recv(4096)
if len(data) == 0:
break;
message = message + data
but I never receive a 0 byte data chunk with firefox or safari, although the python socket how to says I should.
What can I do to get round this?

Unfortunately you can't solve this on the TCP level - HTTP defines its own connection management, see RFC 2616. This basically means you need to parse the stream (at least the headers) to figure out when a connection could be closed.
See related questions here - https://stackoverflow.com/search?q=http+connection

Hiya
Firstly I want to reinforce what the previous answer said
Unfortunately you can't solve this on the TCP level
Which is true, you can't. However you can implement an http parser on top of your tcp sockets. And that's what I want to explore here.
Let's get started
Problem and Desired Outcome
Right now we are struggling to find the end to a datastream. We expected our stream to end with a fixed ending but now we know that HTTP does not define any message suffix
And yet, we move forward.
There is one question we can now ask, "Can we ever know the length of the message in advance?" and the answer to that is YES! Sometimes...
You see HTTP/1.1 defines a header called Content-Length and as you'd expect it has exactly what we want, the content length; but there is something else in the shadows: Transfer-Encoding: chunked. unless you really want to learn about it, we'll stay away from it for now.
Solution
Here is a solution. You're not gonna know what some of these functions are at first, but if you stick with me, I'll explain. Alright... Take a deep breath.
Assuming conn is a socket connection to the desired HTTP server
...
rawheaders = recvheaders(conn,end=CRLF)
headers = dict_headers(io.StringIO(rawheaders))
l_content = headers['Content-Length']
#okay. we've got content length by magic
buffersize = 4096
while True:
if l_content <= 0: break
data = clientSocket.recv(buffersize)
message += data
l_content -= len(data)
...
As you can see, we enter the loop already knowing the Content-Length as l_content
While we iterate we keep track of the remaining content by subtracting the length of clientSocket.recv(buff) from l_content.
When we've read at least as much data as l_content, we are done
if l_content <= 0: break
Frustration
Note: For some these next bits I'm gonna give psuedo code because the code can be a bit dense
So now you're asking, what is rawheaders = recvheaders(conn), what is headers = dict_headers(io.StringIO(rawheaders)),
and HOW did we get headers['Content-Length']?!
For starters, recvheaders. The HTTP/1.1 spec doesn't define a message suffix, but it does define something useful: a suffix for the http headers! And that suffix is CRLF aka \r\n.That means we know when we've recieved the headers when we read CRLF. So we can write a function like
def recvheaders(sock):
rawheaders = ''
until we read crlf:
rawheaders = sock.recv()
return rawheaders
Next, parsing the headers.
def dict_header(ioheaders:io.StringIO):
"""
parses an http response into the status-line and headers
"""
#here I expect ioheaders to be io.StringIO
#the status line is always the first line
status = ioheaders.readline().strip()
headers = {}
for line in ioheaders:
item = line.strip()
if not item:
break
//headers look like this
//'Header-Name' : 'Value'
item = item.split(':', 1)
if len(item) == 2:
key, value = item
headers[key] = value
return status, headers
Here we read the status line then we continue to iterate over every remaining line
and build [key,value] pairs from Header: Value with
item = line.strip()
item = item.split(':', 1)
# We do split(':',1) to avoid cases like
# 'Header' : 'foo:bar' -> ['Header','foo','bar']
# when we want ---------> ['Header','foo:bar']
then we take that list and add it to the headers dict
#unpacking
#key = item[0], value = item[1]
key, value = item
header[key] = value
BAM, we've created a map of headers
From there headers['Content-Length'] falls right out.
So,
This structure will work as long as you can guarantee that you will always recieve Content-Length
If you've made it this far WOW, thanks for taking the time and I hope this helped you out!
TLDR; if you want to know the length of an http message with sockets, write an http parser

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.