I am working on a school project in which I must measure and log Wi-Fi (I know how to log the data, I just don't know the most efficient way to do it). I have tried using
subprocess.check_output('iwconfig', stderr=subprocess.STDOUT)
but that outputs bytes, which I really don't want to deal with (and I don't know how to, either, so if that is the only option, then can someone explain how to handle bytes). Is there any other way, maybe to get it in plain text? And please do not just give me the code I need, tell me how to do it.
Thank you in advance!
You are almost there. I assume that you are using python 3.x. iwconfig is sending you text encoded in whatever character set your terminal uses. That encoding is available as sys.stdin.encoding. So just put it together to get a string. BTW, you want a command list instead of a string.
import subprocess
import sys
raw = subprocess.check_output(['iwconfig'], stderr=subprocess.STDOUT)
data = raw.decode(sys.stdin.encoding)
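To make the bytes-to-text step concrete, here is a minimal sketch of decoding command output. The helper name `to_text` and the sample bytes are made up for illustration; falling back to the locale's preferred encoding is an assumption, not the only reasonable choice.

```python
import locale


def to_text(raw, encoding=None):
    """Decode bytes from a subprocess into str.

    `to_text` is a hypothetical helper. If no encoding is given, it
    falls back to the locale's preferred encoding (an assumption).
    """
    encoding = encoding or locale.getpreferredencoding(False)
    # errors="replace" keeps going if a byte does not fit the encoding
    return raw.decode(encoding, errors="replace")


# Made-up sample standing in for what check_output would return
sample = b'wlan0     IEEE 802.11  ESSID:"home"\n'
text = to_text(sample, "utf-8")
print(text)
```

In Python 3 you can also pass `universal_newlines=True` (or `text=True` on 3.7+) to `subprocess.check_output` so it returns a string directly.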
I am trying to read a message in a Pidgin window using Python. I have read the Pidgin how-to and I am using the following code:
purple.PurpleGetConversations()
and I get the following output:
dbus.Array([dbus.Int32(14414)], signature=dbus.Signature('i'))
I don't know how to access the elements of this dbus.Array.
Best Regards
PS: I am interested in reading the messages; if there is a better way, please let me know.
Progress update: If anyone else is interested in this, I came up with an alternative solution. Pidgin leaves chat logs in ~/.purple/logs; from Python you can open these files and use a regex to read all the messages.
(If there is a more straightforward way, please tell me.)
I found it. Here is the resulting code:
convID = purple.PurpleGetConversations()
msgpos = purple.PurpleConversationGetMessageHistory(convID[0])[0]
print purple.PurpleConversationMessageGetMessage(msgpos)
This will print the last message from an open chat
You need to use the PurpleConversationGetChatData method; it takes a conversation id as a parameter (14414 in your case).
I have a JavaScript client generated from the introspection XML; it might be a helpful addition to the D-Bus documentation - https://github.com/sidorares/node-pidgin/blob/master/index.js
I have little experience with unicode strings. I am not even sure this fits the criteria.
In any case I was using nmap and ran:
# nmap -sV -O 192.168.0.8
against a box in my LAN. Nmap produced a string over several lines returned from an open port, but I cannot understand a lot of the output due to its formatting. For example, a small snippet looks like this:
-Port8081-TCP:V=6.00%I=7%D=10/20%Time=52642C3A%P=i686-pc-linux-gnu%r(FourOhFourRequest,37,"HTTP/1\.1\x20503\x20Service\x20Unavailable\r\nContent-Length:\x200\r\n\r\n")%r
My first thought was URL encoding, which requires decoding, but that's incorrect. It looks almost like padding from serial communication? Is anybody able to shed light on how to interpret "\x200" or "\x20503", or another one that shows up often, "\x20"?
I thought about writing a small Python script to take in the entire string and convert to ASCII with:
>>> s = '<STRING>'
>>> eval('\x20"'+s.replace('"', r'"')+'"').encode('ascii')
Am I on the right track?
The string you see is a service fingerprint. It contains the responses that were received to the various probes that Nmap sends. If you think there is identifying information in the responses, please submit the fingerprint to the Nmap project to improve detection in the future.
More than likely, what happened is that the service is not sending any useful information. The sample you gave, for instance, does not have a Server: header that would identify the HTTP server.
To answer the technical problem of how to turn this string:
"HTTP/1\.1\x20503\x20Service\x20Unavailable\r\nContent-Length:\x200\r\n\r\n"
into an unescaped version, you can do this:
>>> print mystring
"HTTP/1\.1\x20503\x20Service\x20Unavailable\r\nContent-Length:\x200\r\n\r\n"
>>> print mystring.decode('string-escape')
"HTTP/1\.1 503 Service Unavailable
Content-Length: 0
"
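The `string-escape` codec only exists in Python 2. As a rough Python 3 sketch of the same idea, the escapes can be expanded with a regular expression. The helper name `unescape_fingerprint` is hypothetical, and the rule that an unrecognised backslash sequence such as `\.` becomes the bare character is my assumption (the Python 2 output above keeps the backslash).

```python
import re

# Escapes that map to a single control character
_SIMPLE = {"r": "\r", "n": "\n", "t": "\t", "0": "\0"}


def unescape_fingerprint(text):
    """Expand \\xHH, \\r, \\n, \\t and \\0 in a fingerprint string.

    Hypothetical helper; any other backslash-escaped character is
    returned without its backslash (an assumption for illustration).
    """
    def repl(match):
        body = match.group(1)
        if len(body) == 3 and body[0] == "x":
            # \xHH: two hex digits naming one character
            return chr(int(body[1:], 16))
        return _SIMPLE.get(body, body)

    return re.sub(r"\\(x[0-9a-fA-F]{2}|.)", repl, text, flags=re.DOTALL)


raw = r"HTTP/1\.1\x20503\x20Service\x20Unavailable\r\nContent-Length:\x200\r\n\r\n"
decoded = unescape_fingerprint(raw)
print(decoded)
```

This also shows why "\x20503" is not one number: `\x20` is a single space, and "503" is the HTTP status code that follows it.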
Those numbers bring to mind hexadecimal values due to the 'x' in front.
I know that hexadecimal values actually start with '0x' and not just x, but I thought it was worth googling them as hex values with the '0x' in front. I did get a full page of search results which seemed to contain these three values (perhaps inevitable that three random values would show up somewhere, but then again, perhaps not):
0x200, 0x20503, 0x20
Sorry that this isn't an answer as such, but I thought I would mention it since you didn't mention trying this in your post. I wanted to post this as a comment, but the option wasn't available for some reason...
I have an application where users should be able to upload a wide variety of files, but I need to know for each file, if I can safely display its textual representation as plain text.
Using python-magic like
m = Magic(mime=True).from_buffer(cgi.FieldStorage.file.read())
gives me the correct MIME type.
But sometimes, the MIME type for scripts is application/*, so simply looking for m.startswith('text/') is not enough.
Another site suggested using
m = Magic().from_buffer(cgi.FieldStorage.file.read())
and checking for 'text' in m.
Would the second approach be reliable enough for a collection of arbitrary file uploads or could someone give me another idea?
Thanks a lot.
What is your goal? Do you want the real mime type? Is that important for security reasons? Or is it "nice to have"?
The problem is that the same file can have different mime types. When a script file has a proper #! header, python-magic can determine the script type and tell you. If the header is missing, text/plain might be the best you can get.
This means there is no general "will always work" magic solution (despite the name of the module). You will have to sit down and think what information you can get, what it means and how you want to treat it.
The secure solution would be to create a list of mime types that you accept and check them with:
allowed_mime_types = [ ... ]
if m in allowed_mime_types:
That means only perfect matches are accepted. It also means that your server will reject valid files which don't have the correct mime type for some reason (missing header, magic failed to recognize the file, you forgot to mention the mime type in your list).
Or to put it another way: Why do you check the mime type of the file if you don't really care?
[EDIT] When you say
I need to know for each file, if I can safely display its textual representation as plain text.
then this isn't as easy as it sounds. First of all, "text" files have no encoding stored in them, so you will need to know the encoding that the user used when they created the file. This isn't a trivial task. There are heuristics to do so but things get hairy when encodings like ISO 8859-1 and 8859-15 are used (the latter has the Euro symbol).
To fix this, you will need to force your users to either save the text files in a specific encoding (UTF-8 is currently the best choice) or you need to supply a form into which users will have to paste the text.
When using a form, the user can see whether the text is encoded correctly (they see it on the screen), they can fix any problems and you can make sure that the browser sends you the text encoded with UTF-8.
If you can't do that, your only choice is to check for any bytes below 0x20 in the input with the exception of \r, \n and \t. That is a pretty good check for "is this a text document".
But when users use umlauts (like when you write an application that is being used world wide), this approach will eventually fail unless you can enforce a specific encoding on the user's side (which you probably can't since you don't trust the user).
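The control-byte check described above could be sketched roughly as follows. The function name `looks_like_text` is made up for illustration, and the check assumes an ASCII-compatible encoding.

```python
# Control bytes that are normal in text files: tab, line feed, carriage return
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def looks_like_text(data: bytes) -> bool:
    """Return False if any byte below 0x20 other than \\t, \\n, \\r appears.

    Hypothetical helper; this is a heuristic, not a guarantee.
    """
    return not any(b < 0x20 and b not in ALLOWED_CONTROL for b in data)


print(looks_like_text(b"hello\r\nworld\t!"))
print(looks_like_text(b"\x00\x01binary"))
```

Text saved as UTF-16 contains 0x00 bytes and would be rejected by this check, which is one concrete case of the encoding problem mentioned above.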
[EDIT2] Since you need this to check actual source code: If you want to make sure the source code is "safe", then parse it. Most languages allow you to parse the code without actually executing it. That would give you some real information (because the parsers know what to look for) and you wouldn't need to make wild guesses :-)
After playing around a bit, I discovered that I can probably use the Magic(mime_encoding=True) results!
I ran a simple script on my Dropbox folder and grouped the results both by encoding and by extension to check for irregularities.
But it does seem pretty usable by looking for 'binary' in encoding.
I think I will hang on to that, but thank you all.
I am converting a Python 2 program to Python 3 and I'm not sure about the approach to take.
The program reads in either a single email from STDIN, or file(s) are specified containing emails. The program then parses the emails and does some processing on them.
So we need to work with the raw data of the email input, to store it on disk and do an MD5 hash on it. We also need to work with the text of the email input in order to run it through the Python email parser and extract fields etc.
With Python 3 it is unclear to me how we should be reading in the data. I believe we need the raw binary data in order to do an md5 on it, and also to be able to write it to disk. I understand we also need it in text form to be able to parse it with the email library. Python 3 has made significant changes to the IO handling and text handling and I can't see the "correct" approach to read the email raw data and also use the same data in text form.
Can anyone offer general guidance on this?
The general guidance is to convert everything to unicode ASAP and keep it that way until the last possible minute.
Remember that str is the old unicode and bytes is the old str.
See http://docs.python.org/dev/howto/unicode.html for a start.
With Python 3 it is unclear to me how we should be reading in the data.
Specify the encoding when you open the file and it will automatically give you unicode. If you're reading from stdin, you'll get unicode. You can read from stdin.buffer to get binary data.
I believe we need the raw binary data in order to do an md5 on it
Yes, you do. Encode it when you need to hash it.
and also to be able to write it to disk.
You specify the encoding when you open the file you're writing it to, and the file object encodes it for you.
I understand we also need it in text form to be able to parse it with the email library.
Yep, but since it'll get decoded when you open the file, that's what you'll have.
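As an alternative to decoding first and re-encoding for the hash, here is a hedged sketch that keeps the raw bytes and lets the standard library parse them directly. The function name `process_raw_email` and the sample message are made up for illustration.

```python
import email
import hashlib
from email import policy


def process_raw_email(raw: bytes):
    """Hash the raw bytes and parse the same bytes into a message object.

    Hypothetical helper; `raw` is the untouched email as bytes.
    """
    digest = hashlib.md5(raw).hexdigest()
    # message_from_bytes handles header and body decoding itself
    msg = email.message_from_bytes(raw, policy=policy.default)
    return digest, msg


# Made-up sample message
sample = b"From: a@example.com\r\nSubject: Hello\r\n\r\nBody text\r\n"
digest, msg = process_raw_email(sample)
print(digest, msg["Subject"])
```

For real input, the bytes could come from `sys.stdin.buffer.read()` or from a file opened with mode `"rb"`, and the same bytes can be written back to disk with mode `"wb"`.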
That said, this question is really too open ended for Stack Overflow. When you have a specific problem / question, come back and we'll help.
To preface I'm very new to python (about 7 days) but I'm an experienced software eng undergrad.
I would like to send data between machines running python scripts. The idea I had (in order to simplify things) was to concatenate the data (strings & ints) into a string and do the parsing client-side.
The UDP packets send beautifully with simple strings but when I try to send useful data python always complains about the data I send; specifically python won't let me concatenate tuples.
In order to parse the data on the client I need to separate the data with a dash character: '-'.
nodeList is of type dictionary where the key is a string and value is a double.
randKey = random.choice( nodeList.keys() )
data = str(randKey) +'-'+ str(nodeList[randKey])
mySocket.sendto ( data , address )
The code above produces the following error:
TypeError: coercing to Unicode: need string or buffer, tuple found
I don't understand why it thinks it is a tuple I am trying to concatenate...
So my question is: how can I correct this to keep Python happy, or can someone suggest a better way of sending the data?
Thank you in advance.
I highly suggest using Google Protocol Buffers as implemented in Python as protobuf for this as it will handle the serialization on both ends of the line. It has Python bindings that will allow you to easily use it with your existing Python program.
Using your example code you would create a .proto file like so:
message SomeCoolMessage {
required string key = 1;
required double value = 2;
}
Then after generating, you can use it like so:
randKey = random.choice( nodeList.keys() )
data = SomeCoolMessage()
data.key = randKey
data.value = nodeList[randKey]
mySocket.sendto ( data.SerializeToString() , address )
I'd probably use the json module to serialize the data.
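A minimal sketch of what that could look like, assuming the same key/value pair as in the question. The helper names `encode_reading` and `decode_reading` and the sample values are hypothetical.

```python
import json


def encode_reading(key, value):
    """Serialize one key/value pair to UTF-8 bytes for sendto()."""
    return json.dumps({"key": key, "value": value}).encode("utf-8")


def decode_reading(payload):
    """Reverse of encode_reading; returns (key, value)."""
    obj = json.loads(payload.decode("utf-8"))
    return obj["key"], obj["value"]


# Made-up sample; note the dash inside the key survives the round trip
payload = encode_reading("node-7", 0.25)
print(decode_reading(payload))
```

The sender would pass `payload` to `mySocket.sendto(payload, address)`, where `address` is a `(host, port)` tuple, and the receiver would call `decode_reading` on the bytes returned by `recvfrom`. Unlike a dash-separated string, this does not break when a key itself contains a dash.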
You need to serialize the data. Pickle does this for you, and you can ask pickle for an ASCII representation of the data instead of binary data (see the docs), or you could use json (it also serializes the data for you); both are in the standard library. But really there are a hundred thousand different libraries that handle ALL the work for you in getting data from one machine to another. I'd suggest using a library.
Depending on speed, etc. there are different trade offs for the various libraries. In the standard library you get HTTP, that's about it (well and raw sockets). But there are others.
If super fast speed is more important than other things..., zeroMQ, or google's protocol buffers might be valid options.
For me, I use rpyc usually, it lets me be totally lazy, and just call over to the other process across the network. It's fast enough usually.
You do know that UDP gives no guarantee that the data will ever show up on the other side, or that it will show up IN ORDER? For your application you may not care, I don't know, but I just thought I'd bring it up.