How to get raw hex values from pcap file? - python

I've been playing around with scapy and want to read through and analyse every hex byte. So far I've been using scapy simply because I don't know another way currently. Before just writing tools myself to go through the pcap files I was wondering if there was an easy way to do it. Here's what I've done so far.
packets = rdpcap('file.pcap')
tcpPackets = []
for packet in packets:
if packet.haslayer(TCP):
tcpPackets.append(packet)
When I run type(tcpPackets[0]) the type I get is:
<class 'scapy.layers.l2.Ether'>
Then when I try to covert the Ether object into a string it gives me a mix of hex and ascii (as noted by the random parenthesis and brackets).
str(tcpPackets[0])
"b'$\\xa2\\xe1\\xe6\\xee\\x9b(\\xcf\\xe9!\\x14\\x8f\\x08\\x00E\\x00\\x00[:\\xc6#\\x00#\\x06\\x0f\\xb9\\n\\x00\\x01\\x04\\xc6)\\x1e\\xf1\\xc0\\xaf\\x07[\\xc1\\xe1\\xff0y<\\x11\\xe3\\x80\\x18 1(\\xb8\\x00\\x00\\x01\\x01\\x08\\n8!\\xd1\\x888\\xac\\xc2\\x9c\\x10%\\x00\\x06MQIsdp\\x03\\x02\\x00\\x05\\x00\\x17paho/34AAE54A75D839566E'"
I have also tried using hexdump but I can't find a way to parse through it.

I can't find the proper dupe now, but this is just a miss-use/miss-understanding of str(). The original data is in a bytes format, for instance x = b'moo'.
When str() retrieves your bytes string, it will do so by calling the __str__ function of the bytes class/object. That will return a representation of itself. The representation will keep b at the beginning because it's believed to distinguish and make it easier for humans to understand that it's a bytes object, as well as avoid encoding issues I guess (alltho that's speculations).
Same as if you tried accessing tcpPackets[0] from a terminal, it would call __repr__ and show you something like <class 'scapy.layers.l2.Ether'> most likely.
As an example code you can experiment with, try this out:
class YourEther(bytes):
def __str__(self):
return '<Made Up Representation>'
print(YourEther())
Obviously scapy's returns another representation, not just a static string that says "made up representation". But you probably get the idea.
So in the case of <class 'scapy.layers.l2.Ether'> it's __repr__ or __str__ function probably returns b'$\\xa2\\....... instead of just it's default class representation (some correction here might be in place tho as I don't remember/know all the technical namification of the behaviors).
As a workaround, this might fix your issue:
hexlify(str(tcpPackets[0]))
All tho you probably have to account for the prepended b' as well as trailing ' and remove those accordingly. (Note: " won't be added in the beginning or end, those are just a second representation in your console when printing. They're not actually there in terms of data)
Scapy is probably more intended to use tcpPackets[0].dst rather than grabing the raw data. But I've got very little experience with Scapy, but it's an abstraction layer for a reason and it's probably hiding the raw data or it's in the core docs some where which I can't find right now.
More info on the __str__ description: Does python `str()` function call `__str__()` function of a class?
Last note, and that is if you actually want to access the raw data, it seams like you can access it with the Raw class: Raw load found, how to access?

You can put all the bytes of a packet into a numpy array as follows:
for p in tcpPackets:
raw_pack_data = np.frombuffer(p.load, dtype = np.uint8)
# Manipulate the bytes stored in raw_pack_data as you like.
This is fast. In my case, rdpcap takes ~20 times longer than putting all the packets into a big array in a similar for loop for a 1.5GB file.

Related

disabling XML output for the type() function in Python

I'm working with an external compiled API, that does some XML manipulation prior to screen rendering. I am passing data to and from the API, and I want it to break cleanly so I can see where it is broken. This means I need to be able to send it text, rather Pythons internal formatting for types.
So I need to debug what types I'm sending and recieving. type() hands back xml?, and isinstance() would require testing every possible type.
So is there an alternative that will give me the stringified type that is suitable for inline evaluation?
Example:
mystring = str("")
print (type(mystr))
returns <class "str">
If I am passing angle brackets into a binary API, I have no idea what it is doing with the data, or whether it will render in the UI at all. It does its own string parsing based on code I don't have, or even want to know.
The only part I care about is: "str".
So I have:
mystr = str("")
blackbox.String(mystr)
Where the contents and even type of mystr are unknown. (the string value is also handed to me by a blackbox) The API is rendering nothing. Though I don't know whether the fact that it is rendering nothing is because of bad string formatting, or a bad type, or because there is a string, but it is empty. I know it is SUPPOSED to be a text string with length. But I don't know if it is. If I use isinstance I have to know what I'm testing for. Which tells me nothing if I'm getting something weird. If I use type() I am sending something weird to the rending engine. So I am screwed both ways.
What I need is:
blackbox.String(type(mystr))
where the result of type is a plain ascii type name without punctuation, so that I can reasonably assess that the black box is giving and getting a plain, untainted text string.
The angle brackets are a representational convention of the output of type() rather xml as such.
The output of type() is an object, so you can access it's __name__ attribute to get the name of the type.
>>> type('')
<class 'str'>
>>> type('').__name__
'str'

python proper way to decode raw udp packet

I'm making a script to get Valve's server information (players online, map, etc)
the packet I get when I request for information is this:
'\xff\xff\xff\xffI\x11Stargate Central CAP SBEP\x00sb_wuwgalaxy_fix\x00garrysmod\x00Spacebuild\x00\xa0\x0f\n\x0c\x00dw\x00\x0114.09.08\x00\xb1\x87i\x06\xb4g\x17.\x15#\x01gm:spacebuild3\x00\xa0\x0f\x00\x00\x00\x00\x00\x00'
This may help you to see what I'm trying to do https://developer.valvesoftware.com/wiki/Server_queries#A2S_INFO
The problem is, I don't know how to decode this properly, it's easy to get the string but I have no idea how to get other types like byte and short
for example '\xa0\x0f'
For now I'm doing multiple split but do you know if there is any better way of doing this?
Python has functions for encoding/decoding different data types into bytes. Take a look at the struct package, the functions struct.pack() and struct.unpack() are your friends there.
taken from https://docs.python.org/2/library/struct.html
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
The first argument of the unpack function defines the format of the data stored in the second argument. Now you need to translate the description given by valve to a format string. If you wanted to unpack 2 bytes and a short from a data string (that would have a length of 4 bytes, of course), you could do something like this:
(first_byte, second_byte, the_short) = unpack("cc!h", data)
You'll have to take care yourself, to get the correct part of the data string (and I don't know if those numbers are signed or not, be sure to take care of that, too).
The strings you'll have to do differently (they are null-terminated here, so start were you know a string starts and read to the first "\0" byte).
pack() work's the other way around and stores data in a byte string. Take a look at the examples on the python doc and play around with it a bit to get a feel for it (when a tuple is returned/needed, e.g.).
struct supports you in getting the right byte order, which most of the time is network byte order and different from your system. That is of course only necessary for multi byte integers (like short) - so a format string of `"!h" should unpack a short correctly.

Serialization via pickle/eval and zlib

I wish to compress a string using zlib and append it to a text message as a string. I have a couple of issues with it:
a. Is there a problem to combine "binary" string with normal string? For example, is there any problem sending via socket a string that looks like that:
MSG 10=12 20=x\x9c+(\xc0\x00\x00S3\x08Q 33=hansz
I ask it since when opening files one usually declares whether he intends to read at binary mode or not, and I never fully understood that.
b. Can I be sure that some characters will not appear in the compressed string? For example, if the compressed string will include some char sequence like x\x9c 33=eve, I'll have trouble parsing the message properly. If I know that whitespaces will never appear in a zlib compressed string, I can do some string split; If I know that quotes and apostrophes do not appear, I might use shlex split.
c. My intention is to use either zlib.compress(str(obj)) or zlib.compress(pickle.dumps(obj)) as kind of pickling, and use either eval(zlib.decompress(s)) or pickle.loads(zlib.decompress(s)) for unpickling. Do you think it makes sense? The first idea is less safe (as eval is never that safe), but it's an inner system, so I'm ok with it, and on the other hand the compressing turns out to be shorter on most cases, and as quick. Do you think it's a good practice?
d. The reason I wish to have these messages short is that I wish to send them later via socket. I am not proficient with sockets, however, I know these tend to read small (4k?) buffers, so I try to make my messages not much longer than that.
a. The Problem with combining bytes and a unicode string is the following: There are more letters that 255. So historically, hundreds of encodings were created to put different alphabets into one byte.
>>> print b'\xE4'.decode('cp1251') # russian d
д
>>> print b'\xE4'.decode('cp1252') # german ae
ä
The letters have different meaning. To not loose the meaning of these letters, you use unicode.
>>> print u'\u00e4\u0434'
äд
However when you see bytes then you may not know the encoding. So you can not combine unicode and bytes straight away because one byte may be different letters.
Use 'UTF-8' as encoding for the next years. It uses more than one byte if necessairy and stores all letters.
b. zlib takes bytes and outputs bytes. It can contain any byte.
c. zlib.compress(pickle.dumps(obj)) and pickle.loads(zlib.decompress(s)) is totally fine. Pickle takes objects and returns bytes. You can save and store more objects than with zlib.compress(repr(obj)) and eval(zlib.decompress(s)). pickle is as safe as eval. If you need save evaluation have a look at import ast ast.literal_eval or use json instead of pickle.
d. Make sure to know when a message ends and another message starts. I think you can use zlib.decompressobj for this. Otherwise zib can get confused. Sockets can send much more that 4k bytes. The buffer means that a socket saves up to 4k bytes and does not want to receive more until you take bytes out of the buffer. If you use TCP you can send endless streams of bytes and nothing is lost.

Uncompress Zlib string in using ByteArrays

I have a web application developed in Adobe Flex 3 and Python 2.5 (deployed on Google App Engine). A RESTful web service has been created in Python and its results are currently in an XML format which is being read by Flex using the HttpService object.
Now the main objective is to compress the XML so that there is as less a time between the HttpService send() method and result events. I looked up Python docs and managed to use zlib.compress() to compress the XML result.
Then I set the HttpService result type from "xml" to "text" and tried using ByteArrays to uncompress the string back to XML. Here's where I failed. I am doing something like this:
var byteArray:ByteArray = new ByteArray();
byteArray.writeUTF( event.result.toString() );
byteArray.uncompress();
var xmlResult:XML = byteArray.readUTF();
Its throwing an exception at byteArray.uncompress() and says unable to uncompress the byteArray. Also when I trace the length of the byteArray it gets 0.
Unable to figure out what I'm doing wrong. All help is appreciated.
-- Edit --
The code:
# compressing the xml result in Python
print zlib.compress(xmlResult)
# decompresisng it in AS3
var byteArray:ByteArray = new ByteArray();
byteArray.writeUTF( event.result.toString() );
byteArray.uncompress()
Event is of type ResultEvent.
The error:
Error: Error #2058: There was an error decompressing the data.
The error could be because the value of byteArray.bytesAvailable = 0 which means the raw bytes python generated hasn't been written into byteArray properly..
-- Sri
What is byteArray.writeUTF( event.result.toString() ); supposed to do? The result of zlib.compress() is neither unicode nor "UTF" (meaningless without a number after it!?); it is binary aka raw bytes; you should neither decode it nor encode it nor apply any other transformation to it. The receiver should decompress immediately the raw bytes that it receives, in order to recover the data that was passed to zlib.compress().
Update What documentation do you have to support the notion that byteArray.uncompress() is expecting a true zlib stream and not a deflate stream (i.e. a zlib stream after you've snipped the first 2 bytes and the last 4)?
The Flex 3 documentation of ByteArray gives this example:
bytes.uncompress(CompressionAlgorithm.DEFLATE);
but unhelpfully doesn't say what the default (if any) is. If there is a default, it's not documented anywhere obvious, so it would be a very good idea for you to use
bytes.uncompress(CompressionAlgorithm.ZLIB);
to make it obvious what you intend.
AND the docs talk about a writeUTFBytes method, not a writeUTF method. Are you sure that you copy/pasted the exact receiver code in your question?
Update 2
Thanks for the URL. Looks like I got hold of the "help", not the real docs :=(. A couple of points:
(1) Yes, there is an explicit inflate() method. However uncompress DOES have an algorithm arg; it can be either CompressionAlgorithm.ZLIB (the default) or CompressionAlgorithm.DEFLATE ... interestingly the latter is however only available in Adobe Air, not in Flash Player. At least we know the uncompress() call appears OK, and we can get back to the problem of getting the raw bytes onto the wire and off again into a ByteArray instance.
(2) More importantly, there are both writeUTF (Writes a UTF-8 string to the byte stream. The length of the UTF-8 string in bytes is written first, as a 16-bit integer, followed by the bytes representing the characters of the string) and writeUTFBytes (Writes a UTF-8 string to the byte stream. Similar to the writeUTF() method, but writeUTFBytes() does not prefix the string with a 16-bit length word).
Whatever the merits of supplying UTF8-encoded bytes (nil, IMHO), you don't want a 2-byte length prefix there; using writeUTF() is guaranteed to cause uncompress() to bork.
Getting it on to the wire: using Python print on binary data doesn't seem like a good idea (unless sys.stdout has been nobbled to run in raw mode, which you didn't show in your code).
Likewise doing event.result.toString() getting a string (similar to a Python unicode object, yes/no?) -- with what and then encoding it in UTF-8 seem rather unlikely to work.
Given I didn't know that flex existed until today, I really can't help you effectively. Here are some further suggestions towards self-sufficiency in case nobody who knows more flex comes along soon:
(1) Do some debugging. Start off with a minimal XML document. Show repr(xml_doc). Show repr(zlib_compress_output). In (a cut-down version of) your flex script, use the closest function/method to repr() that you can find to show: event.result, event.result.toString() and the result of writeUTF*(). Make sure you understand the effects of everything that can happen after zlib.compress(). Reading the docs carefully may help.
(2) Look at how you can get raw bytes out of event.result.
HTH,
John

Type of object from udp buffer in python using metaclasses/reflection

Is it possible to extract type of object or class name from message received on a udp socket in python using metaclasses/reflection ?
The scenario is like this:
Receive udp buffer on a socket.
The UDP buffer is a serialized binary string(a message). But the type of message is not known at this time. So can't de-serialize into appropriate message.
Now, my ques is Can I know the classname of the seraialized binary string(recvd as UDP buffer) so that I can de-serialize into appropriate message and process further.
Thanks in Advance.
What you receive from the udp socket is a byte string -- that's all the "type of object or class name" that's actually there. If the byte string was built as a serialized object (e.g. via pickle, or maybe marshal etc) then you can deserialize it back to an object (using e.g. pickle.loads) and then introspect to your heart's content. But most byte strings were built otherwise and will raise exceptions when you try to loads from them;-).
Edit: the OP's edit mentions the string is "a serialized object" but still doesn't say what serialization approach produced it, and that makes all the difference. pickle (and for a much narrower range of type marshal) place enough information on the strings they produce (via the .dumps functions of the modules) that their respective loads functions can deserialize back to the appropriate type; but other approaches (e.g., struct.pack) do not place such metadata in the strings they produce, so it's not feasible to deserialize without other, "out of bands" so to speak, indications about the format in use. So, o O.P., how was that serialized string of bytes produced in the first place...?
You need to use a serialization module. pickle and marshal are both options. They provide functions to turn objects into bytestreams, and back again.
Updated answer after updated question:
"But the type of message is not known at this time. So can't de-serialize into appropriate message."
What you get is a sequence of bytes. How that sequence of types should be interpreted is a question of how the protocol looks. Only you know what protocol you use. So if you don't know the type of message, then there is nothing you can do about it. If you are to receive a stream of data an interpret it, you must know what that data means, otherwise you can't interpret it.
It's as simple as that.
"Now, my ques is Can I know the classname of the seraialized binary string"
Yes. The classname is "str", as all strings. (Unless you use Python 3, in which case you would not get a str but a binary). The data inside that str has no classname. It's just binary data. It means whatever the sender wants it to mean.
Again, I need to stress that you should not try to make this into a generic question. Explain exactly what you are trying to do, not generically, but specifically.

Categories