Decoding tcp packets using python


I am trying to decode data received over a TCP connection. The packets are small, no more than 100 bytes. However, when there are a lot of them, I receive some of the packets joined together. Is there a way to prevent this? I am using Python.
I have tried to separate the packets; my source is below. Each packet starts with an STX byte and ends with an ETX byte. The byte following the STX is the packet length (packet lengths less than 5 are invalid), and the checksum is the last bytes before the ETX.
def decode(data):
    while True:
        start = data.find(STX)
        if start == -1: #no stx in message
            pkt = ''
            data = ''
            break
        #stx found , next byte is the length
        pktlen = ord(data[1])
        #check message ends in ETX (pktlen -1) or checksum invalid
        if pktlen < 5 or data[pktlen-1] != ETX or checksum_valid(data[start:pktlen]) == False:
            print "Invalid Pkt"
            data = data[start+1:]
            continue
        else:
            pkt = data[start:pktlen]
            data = data[pktlen:]
            break
    return data , pkt
I use it like this:
#process reports
try:
    data = sock.recv(256)
except: continue
else:
    while data:
        data, pkt = decode(data)
        if pkt:
            process(pkt)
Also, if there are multiple packets in the data stream, is it best to return the packets as a list, or just return the first packet?
I am not that familiar with Python, only C. Is this method OK? Any advice would be most appreciated. Thanks in advance.

I would create a class that is responsible for decoding the packets from a stream, like this:
class PacketDecoder(object):
    STX = ...
    ETX = ...
    def __init__(self):
        self._stream = ''
    def feed(self, buffer):
        self._stream += buffer
    def decode(self):
        '''
        Yields packets from the current stream.
        '''
        while len(self._stream) > 2:
            end = self._stream.find(self.ETX)
            if end == -1:
                break
            packet_len = ord(self._stream[1])
            packet = self._stream[:end]
            if packet_len >= 5 and check_sum_valid(packet):
                yield packet
            self._stream = self._stream[end+1:]
And then use like this:
decoder = PacketDecoder()
while True:
    data = sock.recv(256)
    if not data:
        # handle lost connection...
        break
    decoder.feed(data)
    for packet in decoder.decode():
        process(packet)

TCP provides a data stream, not individual packets, at the interface level. If you want discrete packets, you can use UDP (and handle lost or out of order packets on your own), or put some data separator inline. It sounds like you are doing that already, with STX/ETX as your separators. However, as you note, you get multiple messages in one data chunk from your TCP stack.
Note that unless you are doing some other processing, data in the code you show does not necessarily contain an integral number of messages. That is, it is likely that the last STX will not have a matching ETX. The ETX will be in the next data chunk without an STX.
You should probably read individual messages from the TCP data stream and return them as they occur.
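Reading individual messages out of the stream can be sketched as a small buffer that keeps any incomplete tail until the next recv(). This is a minimal Python 3 sketch under stated assumptions: STX and ETX are taken to be 0x02 and 0x03 (the question does not give their values), the length byte is taken to count the whole frame including STX and ETX (as the question's own code implies), and the checksum test is left out because its algorithm is not described. FrameBuffer is a hypothetical name.

```python
STX = 0x02  # assumed value, not given in the question
ETX = 0x03  # assumed value, not given in the question


class FrameBuffer:
    """Collects bytes from recv() and yields complete STX/len/.../ETX frames."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        # Append whatever recv() returned; it may hold part of a frame or several.
        self._buf += chunk

    def frames(self):
        while True:
            start = self._buf.find(bytes([STX]))
            if start == -1:
                self._buf.clear()        # no frame start at all: discard
                return
            del self._buf[:start]        # drop bytes before the STX
            if len(self._buf) < 2:
                return                   # length byte has not arrived yet
            length = self._buf[1]
            if length < 5:
                del self._buf[:1]        # invalid length: resync on next STX
                continue
            if len(self._buf) < length:
                return                   # frame incomplete: wait for more data
            if self._buf[length - 1] != ETX:
                del self._buf[:1]        # bad terminator: resync on next STX
                continue
            frame = bytes(self._buf[:length])
            del self._buf[:length]
            yield frame


# Two frames arriving split across two recv() calls.
frame1 = bytes([STX, 6, 65, 66, 0, ETX])
frame2 = bytes([STX, 5, 67, 0, ETX])
fb = FrameBuffer()
fb.feed(frame1 + frame2[:2])
first_batch = list(fb.frames())
fb.feed(frame2[2:])
second_batch = list(fb.frames())
```

The first call yields only the complete frame; the partial second frame stays buffered until the rest arrives.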

Try scapy, a powerful interactive packet manipulation program.

Where does the data come from? Instead of trying to decode it by hand, why not use the excellent Impacket package:
http://oss.coresecurity.com/projects/impacket.html

Nice and simple... :)
The trick is in the file object.
f=sock.makefile()
while True:
    STX = f.read(1)
    pktlen = f.read(1)
    wholePacket = STX + pktlen + f.read(ord(pktlen)-2)
    doSomethingWithPacket(wholePacket)
And that's it! (There is also no need to check checksums when using TCP.)
And here is a more "robust"(?) version (it uses STX and checksum):
f=sock.makefile()
while True:
    while f.read(1)!=STX:
        continue
    pktlen = f.read(1)
    wholePacket = STX + pktlen + f.read(ord(pktlen)-2)
    if checksum_valid(wholePacket):
        doSomethingWithPacket(wholePacket)
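One thing the loops above do not handle is the connection closing: read() then returns an empty string and the loop spins. A hedged Python 3 sketch of the same file-object idea with an end-of-stream check follows. read_frame is a hypothetical helper, the STX value 0x02 is an assumption, and io.BytesIO stands in for the object returned by sock.makefile('rb').

```python
import io


def read_frame(f, stx=b"\x02"):
    """Read one STX/len/... frame from a binary file object; None at end of stream."""
    while True:
        first = f.read(1)
        if not first:
            return None                  # stream closed
        if first != stx:
            continue                     # skip bytes until an STX appears
        length_byte = f.read(1)
        if not length_byte:
            return None
        length = length_byte[0]
        if length < 5:
            continue                     # lengths below 5 are invalid
        rest = f.read(length - 2)
        if len(rest) < length - 2:
            return None                  # stream ended mid-frame
        return first + length_byte + rest


# One noise byte, one complete frame, then a truncated frame.
stream = io.BytesIO(b"\xff" + bytes([2, 6, 65, 66, 0, 3]) + bytes([2, 5, 67]))
got_first = read_frame(stream)
got_second = read_frame(stream)
```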

Related

Send UDP Datagrams using Python

I want to send a data request over UDP using the socket API. The format of the request is as follows:
ID | Data_Length | Data
The request contains the following parameters: an identifier (ID), Data_Length, which is the size of Data, and Data, which is the data to be sent. Data has a variable size.
The code I wrote is as follows:
def send_request():
    request_format="bbs" # 1 Byte for the ID 1 Byte for Data_Length and s for data
    data_buff=np.array([1,2,3,4,5,6,7,8,9]) # Data to be sent
    msg = struct.pack(request_format,0x01,0x09,data_buff.tobytes())
    print("msg = ", msg)
    s0.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s0.sendto(msg, (UDP_BC_IP, UDP_SERVER_PORT))
My questions:
1- Using Wireshark I can see that only the 1st byte of Data has been sent. Why?
2- The output of the print instruction is msg = b'\x01\t\x01'. Why did I get this output? I was expecting something similar to [0x01,0x09,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09]

Check the dtype of data_buff - it is int64 unless you use:
data_buff = np.array([1,2,3,4,5,6,7,8,9], dtype=np.uint8)
Then repeat your s specifier according to the size of the array:
request_format="bb" + str(data_buff.size) + "s"
Now you can pack with:
msg = struct.pack(request_format,0x01,0x09,data_buff.tobytes())
and your message will look like this:
b'\x01\t\x01\x02\x03\x04\x05\x06\x07\x08\t'
The TAB character is ASCII code 9, so you will see \t where your data is 9.
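To see why only one data byte went out: in struct, an s with no count is a 1-byte field, so a longer payload is silently truncated. A minimal sketch, using plain bytes in place of the NumPy array (an assumption made to keep the example self-contained):

```python
import struct

payload = bytes(range(1, 10))            # 9 data bytes: 0x01 .. 0x09

# "s" without a count is a 1-byte field, so only the first byte is packed.
truncated = struct.pack("bbs", 0x01, len(payload), payload)

# Prefixing "s" with the payload length packs the whole payload.
full = struct.pack("bb%ds" % len(payload), 0x01, len(payload), payload)

print(truncated)   # b'\x01\t\x01'
print(full)        # b'\x01\t\x01\x02\x03\x04\x05\x06\x07\x08\t'
```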

Is the output of scapy.sprintf a raw string? Why is the length wrong?

I want to analyze TCP packets with scapy, and I use pkt.sprintf('%Raw.load%') to extract the TCP data. But the length of the output string is wrong: the '\' is treated as an ordinary character instead of an escape character, so '\x11' is counted as 4 separate characters instead of one ASCII character.
Here is my code:
from scapy.all import *
def findTCPdata(pkt):
    raw = pkt.sprintf("%Raw.load%")
    print raw
    print 'length of TCP data: '+ str(len(raw))
def main():
    pkts = rdpcap('XXX.pcap')
    for pkt in pkts:
        findTCPdata(pkt)
if __name__ == '__main__':
    main()
The length of each TCP payload should be 17, not the values printed on screen (53, 52, 46, 52).
The four TCP payloads are:
'U\x11\x04\x92\x02\x03\x1e\x03#\x03\xf8q=e\xcb\x15\r'
'U\x11\x04\x92\x02\x03.\x03#\x03\xf8q=e\xcb\xb8\x05'
'U\x11\x04\x92\x02\x03X\x03#\x03\xf8q=e\xcbiO'
'U\x11\x04\x92\x02\x03n\x03#\x03\xf8q=e\xcb\xdb\xe3'
Please help me solve the problem. Thank you!

I see. I need a function to turn the escaped (raw) string into a real string.
So I added the following after line 3 (raw = pkt.sprintf("%Raw.load%")):
raw = raw.replace('\'','')
string = raw.decode('string_escape')
then the output is right
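The string_escape codec exists only in Python 2. Under Python 3, one possible equivalent (a sketch, not part of scapy's API) is to let ast.literal_eval parse the quoted, escaped text as a bytes literal. The sample text is the first payload from the question.

```python
import ast

# What sprintf returned: the quotes and backslash escapes are literal characters.
escaped = r"'U\x11\x04\x92\x02\x03\x1e\x03#\x03\xf8q=e\xcb\x15\r'"

# Prefix with b so the text is parsed as a bytes literal.
payload = ast.literal_eval("b" + escaped)

print(len(escaped))   # 53
print(len(payload))   # 17
```

Reading the payload bytes directly from the packet's Raw layer, where that is available, avoids the round trip through an escaped string altogether.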

Sending Sparse Files

I've been messing around with sockets in Python and I'd like to be able to send a sparse image file from one machine to another. As expected, sending a sparse file over a python socket doesn't preserve the sparseness of the file. I'd like to do a sparse tar and send it that way, but I just can't figure it out.
The tarfile module says it supports reading sparse files with the GNU format which doesn't help me for creating them... but the python docs say the Pax format has "virtually no limits". I'm not sure if that means I can create an archive and preserve the sparse file or not using the pax format... I've been trying but I just have no idea how it might work.
If this solution isn't an option, is there any other way to send a sparse file over a socket? I hate to have to call 'tar -xSf' via a system command from my application...
Thanks,
Server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
s.bind((socket.gethostname(), 50001))
s.listen(1)
img = open('test.img', 'rb')
client, addr = s.accept()
l = img.read(8192)
while(l):
    client.send(l)
    l = img.read(8192)
img.close()
s.close()
Client
host = ''
port = 50001
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
s.connect((host, port))
img = open('./newimg.img', 'wb')
l = s.recv(8192)
while(l):
    img.write(l)
    l = s.recv(8192)
img.close()
s.close()
On the server, I make a new sparse file: truncate -s 1G test.img
a du -h shows: 0 test.img
I run my server and client. Here is a du -h on the transferred file: 1.0G newimg.img
As you can see, it expands the file and it is no longer sparse.

Holes in files are normally created when you write to the beginning of a file, seek further on and write there. When you read the file, you read zeros wherever there are holes. So when you send the file, those literal zero bytes are read and sent, and when the receiver writes them out, every byte is written and the filesystem never gets the chance to create holes.
To mitigate that, you can first find the holes in the file, send their positions, and then send the rest of the file.
The following is not polished but should give you a starting point.
import errno
import json
import os
f = open(path, "rb")
fd = f.fileno()
end = os.fstat(fd).st_size
holes = []
offset = os.lseek(fd, 0, os.SEEK_HOLE)
while offset < end:
    try:
        end_hole = os.lseek(fd, offset, os.SEEK_DATA)
    except OSError as e:
        if e.errno != errno.ENXIO:
            raise
        end_hole = end  # no data after this hole: it runs to the end of the file
    holes.append((offset, end_hole))
    if end_hole >= end:
        break
    offset = os.lseek(fd, end_hole, os.SEEK_HOLE)  # start of the next hole
# [open socket and stuff]
# send the holes
sock.sendall(json.dumps(holes).encode())  # encode appropriately
# send file, skipping the holes
f.seek(0)
total = 0
for hole_start, hole_end in holes + [(end, end)]:
    while total < hole_start:
        l = f.read(min(8192, hole_start - total))
        if not l:
            break
        sock.sendall(l)
        total += len(l)
    f.seek(hole_end)
    total = hole_end
Then on the client side:
still_json = True
a = []
l = s.recv(8192)
while still_json:
    a.append(l)
    if check_json_end(l):
        still_json = False
    else:
        l = s.recv(8192)
holes = parse_json(a) # the last chunk can contain something that is not json
# I assume that a still contains the bytes that are not json
fout = open(outfile, "wb")
total = 0
fout.write(a[0]) # handle the case where the first rest after the json in a is already after a hole
total += len(a[0])
for hole in holes:
    while total < hole[0]:
        l = s.recv(8192)
        if len(l) + total > hole[0]:
            cut = hole[0] - total # number of bytes that belong before the hole
            fout.write(l[:cut])
            fout.seek(hole[1])
            fout.write(l[cut:])
            total = hole[1] + len(l) - cut
        else:
            fout.write(l)
            total += len(l)
There are probably lots of bugs in it and you should rethink each line, but the general principle should be alright. JSON is of course arbitrarily chosen, there are probably other protocols that are better in that case. You could also create your own.
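A simpler alternative, and a different technique from the hole list above: the receiver can recreate holes on its own by seeking over any block that is entirely zero instead of writing it. The zeros still cross the network, and whether the skipped ranges actually become holes depends on the filesystem. write_sparse and its parameters are hypothetical names, and the in-memory chunks stand in for the results of s.recv(8192).

```python
import os
import tempfile


def write_sparse(chunks, path):
    """Write an iterable of byte chunks to path, seeking over all-zero chunks."""
    size = 0
    with open(path, "wb") as fout:
        for chunk in chunks:
            if chunk.count(0) == len(chunk):
                fout.seek(len(chunk), os.SEEK_CUR)   # skip: leaves a hole
            else:
                fout.write(chunk)
            size += len(chunk)
        fout.truncate(size)   # extend the file if it ends in a hole
    return size


# Stand-in for data received from the socket.
chunks = [b"head" * 1024, bytes(8192), bytes(8192), b"tail"]
out = os.path.join(tempfile.mkdtemp(), "newimg.img")
total = write_sparse(chunks, out)
```

The file content read back is identical to the input; only the on-disk allocation may differ.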

Python (pySerial): cannot trim/remove CR, LF or unexpected characters from string

I am using an Arduino to output Temp & Hum data from a sensor, this is being read by a PC running Python using pySerial. The data is reading in correctly but I would like to remove the CR/LF and unexpected characters. One idea I found on this site was to use lstrip or lreplace but they do not seem to work correctly. They will remove one instance of the character but even repeating the line or making a small loop has no effect.
This is what the program Prints (Bottom line is string after I've tried to cut out the unnecessary characters):
[b'\n', b'\r\n', b'Read sensor: OK\r\n', b'Hum40.00\r\n', b'TempC18.00\r\n']
[" b'Hum40.00\r\n'", " b'TempC18.00\r\n']"]
I am aiming for it to read:
[Hum40.00, TempC18.00]
I can hopefully fine tune the message later.
This is the code:
import serial as ser
import time
count = 0
msgArray = []
saveMsg = []
ser = ser.Serial('COM16', 9600, timeout=1, parity='N', stopbits=1, bytesize=8, xonxoff=0) # Setting up and opening COM port
ser.close()
ser.open()
def readSerial(): #reads a whole line from COM port
    serLine = ser.readline()
    return serLine
def sveMsgCut(): #saves the buffer as a message then cuts message
    cutMsg = saveMsg
    words = cutMsg.split(',')
    return words
while True: #main program
    dataSerial = readSerial()
    if count < 5: #reads COM port 5 times and passes along to buffer msgArray
        msgArray.append(dataSerial)
        count = count+1
    else:
        print(msgArray) #~display msgArray
        saveMsg = str(msgArray) #convert to string
        splitMsg = saveMsg.split(',') #splits string (csv)
        phrase = splitMsg[3:5] #cuts out excess either side of Temp & Hum/
        phraseString = str(phrase)
        phraseNew = phraseString.lstrip("/n") #an attempt to remove CR
        print(phraseNew) #~print adjusted string
        saveMsg = msgArray
        count = 0 #resets msgArray and counter
        msgArray = []
        time.sleep(5)
I am fairly new to programming, especially Python so it may be something simple that I've missed but have tried several different ideas and cannot remove the extra characters.

Not sure why rstrip/lstrip do not work for you.
This code runs as expected on my machine:
s = '\r\nHum40.00\r\n'
print (s.rstrip().lstrip())
The only difference I see is the "/n" parameter so instead try:
phraseNew = phraseString.lstrip()
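Stripping works as expected once the data is real text rather than the str() of a list of bytes objects. A sketch of that approach, using the list printed in the question as sample input and assuming the device sends ASCII:

```python
# Lines as returned by ser.readline() in the question.
lines = [b'\n', b'\r\n', b'Read sensor: OK\r\n', b'Hum40.00\r\n', b'TempC18.00\r\n']

# Decode bytes to str, then strip surrounding whitespace, including \r and \n.
cleaned = [line.decode('ascii').strip() for line in lines]

# Keep only the readings of interest.
readings = [text for text in cleaned if text.startswith(('Hum', 'TempC'))]

print(readings)   # ['Hum40.00', 'TempC18.00']
```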

Decided to go about this another way. Placed the raw serial data into a list and extracted the temp and humidity readings then joined them together as a string:
else:
    print(msgArray) #Raw serial data
    msgString = str(msgArray) #convert serial data to str
    character = list(msgString) #str to list
    # the next two lines extract the temp & hum readings then convert
    # them to their respective strings
    humidity = ''.join(character[46:51])
    temperature = ''.join(character[65:70])
    print('H:' + (humidity) + ' T:' + (temperature))
The output now looks like this (Raw then processed data):
[b'\n', b'\r\n', b'Read sensor: OK\r\n', b'Hum40.00\r\n', b'TempC21.00\r\n']
H:40.00 T:21.00
Now onto the next stage.

One thought:
So what you have now is:
"b'Hum40.00\r\n'"
What I see here is that the ' symbol is now part of the string, which means that ' is the last character instead of \r\n. I have had success starting on the outside and working inwards. Let's start by removing the b. To do this, try:
x = str(ser.readline())
x = x.lstrip("b")
Now we should see this:
"'Hum40.00\r\n'"
Next remove the '
x = x.strip("'")
Now we see this:
"Hum40.00\r\n"
Now the tricky part is deleting the \r\n. There is a post here: Can't delete "\r\n" from a string
that explains why:
x = x.rstrip("\r\n")
will not work.
Instead type:
x = x.rstrip("\\r\\n")
Final code:
x = str(ser.readline())
x = x.lstrip("b")
x = x.strip("'")
x = x.rstrip("\\r\\n")
print(x)
Should yield:
"Hum40.00"

Read Wireshark dump file for packet times

My instructions were to read the wireshark.bin data file dumped from the Wireshark program and pick out the packet times. I have no idea how to skip the header and find the first time.
"""
reads the wireshark.bin data file dumped from the wireshark program
"""
from datetime import datetime
import struct
import datetime
#file = r"V:\workspace\Python3_Homework08\src\wireshark.bin"
file = open("wireshark.bin", "rb")
idList = [ ]
with open("wireshark.bin", "rb") as f:
    while True:
        bytes_read = file.read(struct.calcsize("=l"))
        if not bytes_read:
            break
        else:
            if len(bytes_read) > 3:
                idList.append(struct.unpack("=l", bytes_read)[0])
                o = struct.unpack("=l111", bytes_read)[0]
                print( datetime.date.fromtimestamp(o))

Try reading the entire file at once, and then accessing it as a list:
data = open("wireshark.bin", "rb").read() # let Python automatically close file
magic = data[:4] # magic wireshark number (also reveals byte order)
gmt_correction = data[8:12] # GMT offset
data = data[24:] # actual packets
Now you can loop through data in (16?) byte size chunks, looking at the appropriate offset in each chunk for the timestamp.
The magic number is 0xa1b2c3d4, which takes four bytes, or two words. We can determine the order (big-endian or little-endian) by examining those first four bytes by using the struct module:
magic = struct.unpack('>L', data[0:4])[0] # before the data = data[24:] line above
if magic == 0xa1b2c3d4:
    order = '>'
elif magic == 0xd4c3b2a1:
    order = '<'
else:
    raise NotWireSharkFile()
Now that we have the order (and know it's a wireshark file), we can loop through the packets:
field0, field1, field2, field3 = \
    struct.unpack('%sllll' % order, data[:16])
payload = data[16 : 16 + field?]
data = data[16 + field?:]
I left the names vague, since this is homework, but those field? names represent the information stored in the packet header which includes the timestamp and the length of the following packet data.
This code is incomplete, but hopefully will be enough to get you going.
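For reference, a sketch of the full loop, assuming the file is in the classic libpcap format: a 24-byte global header, then for each packet a 16-byte record header of four 32-bit fields (timestamp seconds, timestamp microseconds, captured length, original length) followed by the captured bytes. packet_times is a hypothetical helper name, and the sample capture is synthetic.

```python
import struct


def packet_times(data):
    """Return a (seconds, microseconds) timestamp for each packet in a pcap blob."""
    magic = struct.unpack('>L', data[:4])[0]
    if magic == 0xa1b2c3d4:
        order = '>'
    elif magic == 0xd4c3b2a1:
        order = '<'
    else:
        raise ValueError('not a classic pcap file')
    times = []
    pos = 24                                    # skip the global header
    while pos + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            order + 'LLLL', data[pos:pos + 16])
        times.append((ts_sec, ts_usec))
        pos += 16 + incl_len                    # skip the captured bytes
    return times


# Synthetic little-endian capture with two packets, to exercise the parser.
header = struct.pack('<LHHlLLL', 0xa1b2c3d4, 2, 4, 0, 0, 65535, 1)
pkt1 = struct.pack('<LLLL', 1000, 5, 3, 3) + b'abc'
pkt2 = struct.pack('<LLLL', 1001, 7, 2, 2) + b'de'
times = packet_times(header + pkt1 + pkt2)
print(times)   # [(1000, 5), (1001, 7)]
```

Each seconds value can then be passed to datetime.datetime.fromtimestamp to get a readable time.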
