Interpreting padded string - python

I have little experience with unicode strings. I am not even sure this fits the criteria.
In any case I was using nmap and ran:
# nmap -sV -O 192.168.0.8
against a box in my LAN. Nmap produced a string over several lines returned from an open port, but I cannot understand a lot of the output due to its formatting. For example, a small snippet looks like this:
-Port8081-TCP:V=6.00%I=7%D=10/20%Time=52642C3A%P=i686-pc-linux-gnu%r(FourOhFourRequest,37,"HTTP/1\.1\x20503\x20Service\x20Unavailable\r\nContent-Length:\x200\r\n\r\n")%r
My first thought was URL encoding, which requires decoding, but that's incorrect. It looks almost like padding from serial communication? Is anybody able to shed light on how to interpret "\x200" or "\x20503", or another sequence that shows up often, "\x20"?
I thought about writing a small Python script to take in the entire string and convert it to ASCII with:
>>> s = '<STRING>'
>>> eval('"' + s.replace('"', r'\"') + '"').encode('ascii')
Am I on the right track?

The string you see is a service fingerprint. It contains the responses that were received to the various probes that Nmap sends. If you think there is identifying information in the responses, please submit the fingerprint to the Nmap project to improve detection in the future.
More than likely, what happened is that the service is not sending any useful information. The sample you gave, for instance, does not have a Server: header that would identify the HTTP server.
To answer the technical problem of how to turn this string:
"HTTP/1\.1\x20503\x20Service\x20Unavailable\r\nContent-Length:\x200\r\n\r\n"
into an unescaped version, you can do this in Python 2 (the 'string-escape' codec does not exist in Python 3):
>>> print mystring
"HTTP/1\.1\x20503\x20Service\x20Unavailable\r\nContent-Length:\x200\r\n\r\n"
>>> print mystring.decode('string-escape')
"HTTP/1\.1 503 Service Unavailable
Content-Length: 0
"
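To read these by hand: `\x20` is a hexadecimal escape for the byte 0x20, which is the ASCII space character. So `\x200` is a space followed by the digit 0, and `\x20503` is a space followed by 503. For Python 3, one option is a small hand-written decoder that avoids `eval` entirely. This is a sketch under assumptions: it expects the fingerprint to use `\xHH`, `\r`, `\n`, `\t` and `\0`, and it treats a backslash before any other character (as in `1\.1`) as simply escaping that character. `unescape_fingerprint` is a hypothetical helper name.

```python
import re

# Assumed named escapes; any other backslash-escaped character is kept as-is.
_NAMED_ESCAPES = {"r": "\r", "n": "\n", "t": "\t", "0": "\0"}

# Either \xHH (two hex digits) or a backslash followed by any single character.
_ESCAPE_RE = re.compile(r"\\(x[0-9A-Fa-f]{2}|.)", re.DOTALL)


def unescape_fingerprint(s: str) -> str:
    """Decode the escaped text of an Nmap probe response into raw characters."""
    def replace(match):
        token = match.group(1)
        if len(token) == 3:  # only the \xHH branch matches three characters
            return chr(int(token[1:], 16))
        return _NAMED_ESCAPES.get(token, token)
    return _ESCAPE_RE.sub(replace, s)


raw = r"HTTP/1\.1\x20503\x20Service\x20Unavailable\r\nContent-Length:\x200\r\n\r\n"
print(repr(unescape_fingerprint(raw)))
# 'HTTP/1.1 503 Service Unavailable\r\nContent-Length: 0\r\n\r\n'
```

Because the pattern consumes the backslash and the escaped character together, `\x200` is split as `\x20` plus a literal `0`, which is exactly how the fingerprint should be read.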

Those numbers bring to mind hexadecimal values due to the 'x' in front.
I know that hexadecimal values actually start with '0x' and not just x, but I thought it was worth googling them as hex values with the '0x' in front. I did get a full page of search results which seemed to contain these three values (perhaps inevitable that three random values would show up somewhere, but then again, perhaps not):
0x200, 0x20503, 0x20
Sorry that this isn't an answer as such, but I thought I would mention it since you didn't mention trying this in your post. I wanted to post this as a comment, but the option wasn't available for some reason...

Related

How to decode Common Industrial Protocol (CIP) packets using python?

I tried to decode this highlighted segment, however I ran into some issues.
I used this code to decipher the content:
import binascii
from scapy.all import Ether

hexed = "01000c0000000040000040400000803f0000003f2af0ce4004040000404000008040cdcc4c3ecdcccc3d305b1a3e2903fa42240000484400006144000048430000c8424ddc4143200000484400006144000048430000c84218380b440000000000000000000000000000000000000000000000000b010001deddf7420b0100016666e6400201000102000000000000000000000000305b1a3e4ddc414318380b4400010000000101000100010002000300121204000200010000050006000600ffffffff00000000deddf742"
# Parse the raw bytes as an Ethernet frame and print every layer Scapy decodes.
ether_pkt = Ether(binascii.unhexlify(hexed))
ether_pkt.show()
And the result I got is:
'\x80?\x00\x00\x00?*\xf0\xce@\x04\x04\x00\x00@@\x00\x00\x80@\xcd\xccL>\xcd\xcc\xcc=0[\x1a>)\x03\xfaB$\x00\x00HD\x00\x00aD\x00\x00HC\x00\x00\xc8BM\xdcAC \x00\x00HD\x00\x00aD\x00\x00HC\x00\x00\xc8B\x188\x0bD\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0b\x01\x00\x01\xde\xdd\xf7B\x0b\x01\x00\x01ff\xe6@\x02\x01\x00\x01\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x000[\x1a>M\xdcAC\x188\x0bD\x00\x01\x00\x00\x00\x01\x01\x00\x01\x00\x01\x00\x02\x00\x03\x00\x12\x12\x04\x00\x02\x00\x01\x00\x00\x05\x00\x06\x00\x06\x00\xff\xff\xff\xff\x00\x00\x00\x00\xde\xdd\xf7B'
How do I further decipher this content?
I've tried .decode() and hex() to turn it into a string, however the output is not human-readable.
Have a look at pycomm3, especially its CIP reference.
According to the reference, 0x4c is the "read_tag" custom service for Rockwell devices, whatever that means.
The data you highlighted is listed as "command specific data". That suggests that it is not defined in the CIP, but is custom to the device that sent it. If it had been part of the CIP, Wireshark could probably have decoded it further. So you will have to find and read documentation for the device in question.
There is no magic: you need to download the specs and write a parser to decode it. As you can see in your Wireshark screenshot, the protocol isn't string/ASCII.
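As a starting point for such a parser: several 4-byte groups in the dump decode cleanly as little-endian IEEE 754 float32 values (for example `0000803f` is the bit pattern 0x3f800000, i.e. 1.0, and `0000003f` is 0.5), which is also why `.decode()` produces unreadable output. The sketch below uses Python's `struct` module on the first 20 bytes only. The layout (two `uint16` fields followed by four floats) is a hypothetical guess made for illustration; the real field layout has to come from the device documentation.

```python
import struct

# First 20 bytes of the hex dump in the question.
SAMPLE_HEX = "01000c0000000040000040400000803f0000003f"

# Hypothetical layout: two little-endian uint16 fields followed by four
# little-endian IEEE 754 float32 values ("<" = little-endian, no padding).
# The real field layout has to come from the device documentation.
HEADER = struct.Struct("<HH4f")


def parse_header(data: bytes) -> tuple:
    """Unpack the first HEADER.size bytes of data using the assumed layout."""
    return HEADER.unpack_from(data, 0)


values = parse_header(bytes.fromhex(SAMPLE_HEX))
print(values)  # (1, 12, 2.0, 3.0, 1.0, 0.5)
```

Once the documented layout is known, the parser is just a sequence of `Struct` definitions applied at the right offsets.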

Hexadecimal and words mixed in Python bytes

I'm receiving frames from a websocket server and I'm not sure how to interpret some of the bytes objects, because they have actual words mixed in.
I get something like this:
b'\x00\x17\x04\x00\x00\x00\xc0\x05FOCUS\x01\x00\xff\xfc\x00\x05;\xea\x01\x03\xe8\x81'
This one has 'FOCUS' and a ';' in it. I am expecting 'FOCUS' to be part of the payload, but I don't know why it's showing up as is, and not in hex form. Can someone explain what's going on and how I can unpack the rest of the data?
Also, it seems I'm getting the data in reverse order. I think \x81 is supposed to be the first byte of the frame.
I'm using Python 3.6 and the websocket-client lib. Thank you.
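Why the words show up as-is: Python's `bytes` repr prints any byte in the printable ASCII range (0x20 to 0x7e) as its character and only the others as `\xNN` escapes, so `b'F'` and `b'\x46'` are the same byte, and `;` is just 0x3b. The sketch below shows the same frame as hex and as integers, then reads two fields under guessed layouts: a big-endian 2-byte length at the start (0x0017 = 23 happens to equal the number of bytes that follow) and a 1-byte length before `FOCUS`. Both layouts and the helper `read_length_prefixed` are hypothetical; the real format has to come from the server's protocol documentation.

```python
import struct

frame = b'\x00\x17\x04\x00\x00\x00\xc0\x05FOCUS\x01\x00\xff\xfc\x00\x05;\xea\x01\x03\xe8\x81'

hex_view = frame.hex()   # every byte as two hex digits
int_view = list(frame)   # every byte as an int in 0..255

# Hypothetical: the first two bytes are a big-endian length of the rest.
(declared_length,) = struct.unpack_from(">H", frame, 0)


def read_length_prefixed(buf: bytes, offset: int):
    """Hypothetical: read a 1-byte length, then that many bytes.

    Returns the field and the offset just past it.
    """
    length = buf[offset]
    start = offset + 1
    return buf[start:start + length], start + length


name, next_offset = read_length_prefixed(frame, 7)
print(hex_view)
print(declared_length, len(frame) - 2)  # 23 23
print(name, next_offset)                # b'FOCUS' 13
```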

Using python to measure Wi-Fi

I am working on a school project in which I must measure and log Wi-Fi (I know how to log the data, I just don't know the most efficient way to do it). I have tried using
subprocess.check_output('iwconfig', stderr=subprocess.STDOUT)
but that outputs bytes, which I really don't want to deal with (and I don't know how to, either, so if that is the only option, then can someone explain how to handle bytes). Is there any other way, maybe to get it in plain text? And please do not just give me the code I need, tell me how to do it.
Thank you in advance!
You are almost there. I assume that you are using Python 3.x. iwconfig is sending you text encoded in whatever character set your terminal uses. That encoding is available as sys.stdin.encoding. So just put it together to get a string. BTW, you want a command list instead of a string.
import subprocess
import sys
raw = subprocess.check_output(['iwconfig'], stderr=subprocess.STDOUT)
data = raw.decode(sys.stdin.encoding)
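If you would rather get a `str` straight away, `check_output` can do the decoding for you: with `universal_newlines=True` (also spelled `text=True` from Python 3.7) it decodes the output using the locale's preferred encoding. Once you have text, a regular expression can pull out the values you want to log. This is a sketch: the sample output and the `Signal level=... dBm` pattern are assumptions about iwconfig's format on your system, and `read_iwconfig` / `parse_signal_levels` are hypothetical helper names.

```python
import re
import subprocess


def read_iwconfig() -> str:
    """Run iwconfig and return its output as str rather than bytes."""
    # universal_newlines=True makes check_output decode the output with the
    # locale's preferred encoding (text=True is the same flag on 3.7+).
    return subprocess.check_output(["iwconfig"], stderr=subprocess.STDOUT,
                                   universal_newlines=True)


def parse_signal_levels(text: str) -> dict:
    """Map interface name -> signal level in dBm.

    Assumes an unindented line starting with the interface name, followed
    by indented lines containing "Signal level=N dBm".
    """
    levels = {}
    current = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            current = line.split()[0]  # start of a new interface block
        match = re.search(r"Signal level=(-?\d+) dBm", line)
        if match and current is not None:
            levels[current] = int(match.group(1))
    return levels


# Hypothetical sample in the assumed format (not real captured output).
SAMPLE = (
    'wlan0     IEEE 802.11  ESSID:"example"\n'
    '          Link Quality=70/70  Signal level=-39 dBm\n'
)

print(parse_signal_levels(SAMPLE))  # {'wlan0': -39}
```

On a machine with a wireless interface, `parse_signal_levels(read_iwconfig())` would give one reading per call, which you can then log on a timer.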

getting data from python script on webserver using swift

I have a python script on my webserver which simply prints out 2 to 5 words, one under the other.
Ruyterplaats
Civic Centre
Racecourse
Atlantis
What I need to do is the following:
open the url www.webserveraddress.com?variable1=variable2
get the words from each line and put them into an array
No need to display the webpage, I just need the words. That's all.
I've seen I can use things like Libxml2 and Hpple, but these are ObjC wrappers around other code. I'm not sure how Swift will cope with that.
I quite frankly have no idea where to start or even if I'm going about it the wrong way or not :/
PS. I would post code but the python script is around 6500 lines :)
The quickest way to get the contents of a URL as a string is to use the constructor on NSString:
var contents = NSString(contentsOfURL: NSURL(string: "http://example.com"), encoding: NSUTF8StringEncoding, error: nil)
Then you can separate the contents into an array using componentsSeparatedByCharactersInSet:
var wordArray = contents.componentsSeparatedByCharactersInSet(NSCharacterSet.newlineCharacterSet())
Note: The server-side technology doesn't matter at all, which is one of the best things about the HTTP protocol ;). That URL could just as well return a static file; the Swift code (or anyone else) won't care.

How to reliably tell the uploaded file type (text or binary)?

I have an application where users should be able to upload a wide variety of files, but I need to know for each file, if I can safely display its textual representation as plain text.
Using python-magic like
m = Magic(mime=True).from_buffer(cgi.FieldStorage.file.read())
gives me the correct MIME type.
But sometimes, the MIME type for scripts is application/*, so simply looking for m.startswith('text/') is not enough.
Another site suggested using
m = Magic().from_buffer(cgi.FieldStorage.file.read())
and checking for 'text' in m.
Would the second approach be reliable enough for a collection of arbitrary file uploads or could someone give me another idea?
Thanks a lot.
What is your goal? Do you want the real mime type? Is that important for security reasons? Or is it "nice to have"?
The problem is that the same file can have different mime types. When a script file has a proper #! header, python-magic can determine the script type and tell you. If the header is missing, text/plain might be the best you can get.
This means there is no general "will always work" magic solution (despite the name of the module). You will have to sit down and think what information you can get, what it means and how you want to treat it.
The secure solution would be to create a list of mime types that you accept and check them with:
allowed_mime_types = [ ... ]
if m in allowed_mime_types:
That means only perfect matches are accepted. It also means that your server will reject valid files which don't have the correct mime type for some reason (missing header, magic failed to recognize the file, you forgot to mention the mime type in your list).
Or to put it another way: Why do you check the mime type of the file if you don't really care?
[EDIT] When you say
I need to know for each file, if I can safely display its textual representation as plain text.
then this isn't as easy as it sounds. First of all, "text" files have no encoding stored in them, so you will need to know the encoding that the user used when they created the file. This isn't a trivial task. There are heuristics to do so but things get hairy when encodings like ISO 8859-1 and 8859-15 are used (the latter has the Euro symbol).
To fix this, you will need to force your users to either save the text files in a specific encoding (UTF-8 is currently the best choice) or you need to supply a form into which users will have to paste the text.
When using a form, the user can see whether the text is encoded correctly (they see it on the screen), they can fix any problems and you can make sure that the browser sends you the text encoded with UTF-8.
If you can't do that, your only choice is to check for any bytes below 0x20 in the input with the exception of \r, \n and \t. That is a pretty good check for "is this a text document".
But when users use umlauts (like when you write an application that is being used world wide), this approach will eventually fail unless you can enforce a specific encoding on the user's side (which you probably can't since you don't trust the user).
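The byte check described above fits in a few lines of Python. This is a sketch of that heuristic only (`looks_like_text` is a hypothetical name); as noted, passing it says nothing about which encoding the bytes are in, since bytes from 0x80 upward are accepted whether they are UTF-8 or ISO 8859-1.

```python
# Control characters that are normal in text files: \t, \n, \r.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def looks_like_text(data: bytes) -> bool:
    """Heuristic: reject any byte below 0x20 except \\t, \\n and \\r."""
    return all(b >= 0x20 or b in ALLOWED_CONTROL for b in data)


print(looks_like_text(b"print('hi')\r\n"))  # True
print(looks_like_text(b"GIF89a\x01\x00"))  # False
```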
[EDIT2] Since you need this to check actual source code: If you want to make sure the source code is "safe", then parse it. Most languages allow you to parse the code without actually executing it. That would give you some real information (because the parsers know what to look for) and you wouldn't need to make wild guesses :-)
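For Python source, the standard library's `ast` module does this: `ast.parse` builds a syntax tree without running the code. A minimal sketch, assuming the uploads are Python (other languages need their own parsers); `parses_as_python` is a hypothetical helper name. Depending on the Python version, null bytes in the source raise either `ValueError` or `SyntaxError`, so both are caught.

```python
import ast


def parses_as_python(source: str) -> bool:
    """Return True if source is syntactically valid Python (never executed)."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


print(parses_as_python("x = 1\n"))   # True
print(parses_as_python("def (:\n"))  # False
```

Parsing only checks syntax; walking the resulting tree (for example with `ast.walk`) is where you would look for constructs you consider unsafe.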
After playing around a bit, I discovered that I can probably use the Magic(mime_encoding=True) results!
I ran a simple script on my Dropbox folder and grouped the results both by encoding and by extension to check for irregularities.
But it does seem pretty usable by looking for 'binary' in encoding.
I think I will hang on to that, but thank you all.
