I'm trying to use the nntplib module that comes with Python to make some posts to Usenet. However, I can't figure out how to post binary files using the .post() method.
I can post plain text files just fine, but not binary files. Any ideas?
-- EDIT--
So, thanks to Adrian's comment below, I've managed to take one step towards my goal.
I now use the email library to build a multipart message and attach the binary files to it. However, I can't figure out how to pass that message directly to the nntplib post method.
I have to first write it to a temporary file, then read it back in for the nntplib method. There has to be a way to do this all in memory... any suggestions?
You have to MIME-encode your post: a binary post in an NNTP newsgroup is like a mail with an attachment.
The file has to be encoded as ASCII, generally using base64, and the encoded file is then packaged into a multipart MIME message and posted.
Have a look at the email module: it implements everything you need.
I encourage you to read RFC 3977, which is the official standard defining the NNTP protocol.
For the second part of your question:
Use StringIO to build a fake file object from a string (the post() method of nntplib accepts open file objects).
email.Message objects have an as_string() method that returns the content of the message as a plain string.
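Putting both answers together in Python 3 terms, here is a minimal sketch, assuming Python 3.4+ (for Message.as_bytes()). In Python 3, NNTP.post() expects bytes, so io.BytesIO takes the place StringIO had in Python 2. The helper names build_article and post_article are hypothetical. nntplib was deprecated in Python 3.11 and removed in 3.13, so the sketch receives the server object as a parameter instead of importing nntplib itself.

```python
import io
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_article(sender, newsgroups, subject, body, filename, payload):
    """Build a multipart MIME article in memory and return it as a binary file object."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["Newsgroups"] = newsgroups
    msg["Subject"] = subject
    msg.attach(MIMEText(body))
    # MIMEApplication base64-encodes the binary payload by default
    part = MIMEApplication(payload)
    part.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(part)
    # as_bytes() serializes the whole message; BytesIO wraps it as a file, no temp file needed
    return io.BytesIO(msg.as_bytes())


def post_article(server, article):
    """Post the article; `server` is assumed to be a connected nntplib.NNTP instance."""
    return server.post(article)
```

With a live server this would look something like server = nntplib.NNTP("news.example.com") followed by post_article(server, build_article(...)); the hostname is a placeholder.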
Related
I am converting a Python 2 program to Python 3 and I'm not sure about the approach to take.
The program reads in either a single email from STDIN, or file(s) are specified containing emails. The program then parses the emails and does some processing on them.
So we need to work with the raw data of the email input, to store it on disk and compute an MD5 hash of it. We also need to work with the text of the email input in order to run it through the Python email parser and extract fields, etc.
With Python 3 it is unclear to me how we should be reading in the data. I believe we need the raw binary data in order to do an md5 on it, and also to be able to write it to disk. I understand we also need it in text form to be able to parse it with the email library. Python 3 has made significant changes to the IO handling and text handling and I can't see the "correct" approach to read the email raw data and also use the same data in text form.
Can anyone offer general guidance on this?
The general guidance is: convert everything to Unicode ASAP and keep it that way until the last possible minute.
Remember that str is the old unicode and bytes is the old str.
See http://docs.python.org/dev/howto/unicode.html for a start.
With Python 3 it is unclear to me how we should be reading in the data.
Specify the encoding when you open the file and it will automatically give you Unicode (str). If you're reading from sys.stdin, you'll get str as well. You can read from sys.stdin.buffer to get the binary data.
I believe we need the raw binary data in order to do an md5 on it
Yes, you do. Encode it when you need to hash it.
and also to be able to write it to disk.
You specify the encoding when you open the file you're writing it to, and the file object encodes it for you.
I understand we also need it in text form to be able to parse it with the email library.
Yep, but since it'll get decoded when you open the file, that's what you'll have.
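An alternative sketch (not the decode-first approach described above), assuming the input is read as bytes, e.g. raw = sys.stdin.buffer.read() or from a file opened in "rb" mode: keep the raw bytes for hashing and writing, and let email.message_from_bytes (Python 3.2+) do the decoding for parsing. This avoids re-encoding text that may not reproduce the original bytes exactly. The function name process_raw_email is hypothetical.

```python
import email
import hashlib


def process_raw_email(raw, out_path):
    """Hash, store and parse one email given as raw bytes."""
    # MD5 over the exact bytes received, before any decoding
    digest = hashlib.md5(raw).hexdigest()
    # binary mode writes the bytes unchanged; no encoding step is involved
    with open(out_path, "wb") as fh:
        fh.write(raw)
    # message_from_bytes handles the byte-to-text decoding itself
    msg = email.message_from_bytes(raw)
    return digest, msg
```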
That said, this question is really too open ended for Stack Overflow. When you have a specific problem / question, come back and we'll help.
I've found this snippet, which seems to do the job, but I can't understand why it uses StringIO. Isn't f already a file-like object? What is the need to read it, then make it look like a file again, only to read it again? I've tested it (well, a slightly modified version of it), and it doesn't work without StringIO.
This seems to be a flaw in the Python standard library, which was fixed in Python 3.2.
see http://www.enricozini.org/2011/cazzeggio/python-gzip/
urllib and urllib2 file objects do not provide the tell() method that gzip requires.
It's possible that the gunzip code needs a file-like object that has a seek method, which a HTTP library is very unlikely to provide. What does "doesn't work" mean? Error message?
If efficiency is your real concern, slightly modify the code so that it uses cStringIO, not StringIO.
The way I read it, the relevant part of the code does the following:
Open a URL
Download it completely into memory (with the read method)
Store the content in a StringIO object, so that it's available as a file-like object
Decompress it with gzip and parse the JSON.
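A Python 3 sketch of those steps, assuming the response body is gzip-compressed UTF-8 JSON. Because the downloaded data is bytes, io.BytesIO stands in for (c)StringIO; load_gzipped_json is a hypothetical name.

```python
import gzip
import io
import json


def load_gzipped_json(raw):
    """Decompress gzip-compressed bytes and parse the result as UTF-8 JSON."""
    # BytesIO gives the downloaded bytes a file-like interface, as StringIO did in the snippet
    with gzip.GzipFile(fileobj=io.BytesIO(raw)) as gz:
        return json.loads(gz.read().decode("utf-8"))
```

Usage would be along the lines of: with urllib.request.urlopen(url) as resp: data = load_gzipped_json(resp.read()). On Python 3.2+ gzip.decompress(raw) skips the file object entirely.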
Hello, I have a Python script that takes apart an email from a string. I am using the get_payload(decode=True) method of the email message class, and it works great for PDFs and JPGs, but it does not decode BMP files. The file is still base64-encoded when I write it to disk.
Has anyone come across this issue themselves?
OK, so I finally found the problem, and it was not related to the Python email package at all. I was reading from a named pipe using .read() and it was not reading the entire email from the pipe. Once I passed read() a size argument, it read the entire email. So ultimately my BMP file was not decoded because the base64 data was invalid (truncated), and get_payload() could not decode the attachment.
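A minimal Python 3 sketch of reading defensively, assuming a binary stream such as a named pipe opened in "rb" mode: keep calling read() with a size argument until it returns an empty bytes object (EOF), then decode the attachments. Both function names and the default chunk size are hypothetical choices.

```python
import email


def read_all(stream, chunk_size=65536):
    """Read from a pipe-like binary stream until EOF."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # b"" signals EOF
            break
        chunks.append(chunk)
    return b"".join(chunks)


def extract_attachments(raw):
    """Map each attachment filename to its decoded bytes."""
    msg = email.message_from_bytes(raw)
    found = {}
    for part in msg.walk():
        filename = part.get_filename()
        if filename:
            # decode=True undoes the base64 transfer encoding
            found[filename] = part.get_payload(decode=True)
    return found
```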
Usually I would download it to a StringIO object, then run this:
m = magic.Magic()
m.from_buffer(thefile.read(1024))
But this time I can't download the file, because the image might be 20 megabytes. I want to use python-magic to find the file type without downloading the entire file.
If python-magic can't do it, is the next best way to look at the MIME type in the headers? But how accurate is that?
I need accuracy.
You can call read(1024) without downloading the whole file:
thefile = urllib2.urlopen(someURL)
Then, just use your existing code. urlopen returns a file-like object, so this works naturally.
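The same idea in Python 3, as a sketch: urllib.request.urlopen replaces urllib2.urlopen, and the response is closed after the first n bytes, so the rest of the body is never read. fetch_head is a hypothetical name; the Content-Type header is returned too, for comparison.

```python
import urllib.request


def fetch_head(url, n=1024):
    """Return the first n bytes of the resource plus its Content-Type header."""
    with urllib.request.urlopen(url) as resp:
        head = resp.read(n)  # reads at most n bytes of the body
        content_type = resp.headers.get("Content-Type")
    # leaving the with block closes the response without reading the remainder
    return head, content_type
```

With python-magic installed (a third-party package), the bytes can then be passed on as before, e.g. magic.from_buffer(head, mime=True).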
If it is one of the common image formats like PNG or JPEG, and the server is a reliable one, then you can use the 'Content-Type' header to get what you are looking for.
But this is not as reliable as passing a portion of the file to python-magic, because if the server does not recognize the format, it may fall back to application/octet-stream. This is more common with video formats; for pictures, I think Content-Type is usually okay.
Sorry, I can't find any statistics or research on Content-Type's accuracy. The suggested answer of downloading only part of the file is a good option too.
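One way to combine both signals, sketched here with a hand-rolled signature check standing in for python-magic (a deliberate substitution that only covers a few image formats): trust the file's leading bytes first, and fall back to Content-Type only when it is present and not the generic application/octet-stream. The SIGNATURES table and guess_image_type are hypothetical; the two-byte BMP signature "BM" in particular is weak and can match unrelated data.

```python
# Leading "magic number" bytes for a few common image formats
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"BM": "image/bmp",
}


def guess_image_type(head, content_type=None):
    """Prefer the file's own signature; fall back to a specific Content-Type."""
    for signature, mime in SIGNATURES.items():
        if head.startswith(signature):
            return mime
    # a missing or generic header tells us nothing useful
    if content_type and not content_type.startswith("application/octet-stream"):
        return content_type.split(";")[0].strip()
    return None
```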
I'm trying to read a field from an Active Directory entry which contains raw JPEG binary data. I'd like to read that data and convert it to an image file for use in my Django-based application. I cannot for the life of me figure out how to handle this data in a nice way. Any ideas?
Edit:
To anyone who might come across this in the future: there's a function in Python's os module:
os.tmpfile()
It creates a temporary file and destroys it once the file is closed. Very useful for this situation.
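os.tmpfile() exists only in Python 2; it was removed in Python 3, where tempfile.TemporaryFile() is the replacement. A minimal sketch, assuming the Active Directory attribute has already been read into a bytes object; spool_to_tempfile is a hypothetical name.

```python
import tempfile


def spool_to_tempfile(raw_jpeg):
    """Write raw JPEG bytes to an anonymous temporary file and rewind it."""
    # TemporaryFile replaces os.tmpfile(); the file is gone once it is closed
    tmp = tempfile.TemporaryFile()  # opened in "w+b" mode by default
    tmp.write(raw_jpeg)
    tmp.seek(0)  # rewind so readers start at the first byte
    return tmp
```

In Django, the same bytes can also be wrapped without any temporary file, e.g. SimpleUploadedFile("photo.jpg", raw_jpeg, content_type="image/jpeg"); the filename is a placeholder.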
Here is somebody who was having the same problem -- check out the latest post at the bottom.
http://groups.google.com/group/django-users/browse_thread/thread/4214db6699863ded/5d816b02daca3186
Looks like passing raw data to SimpleUploadedFile is what you are looking for.
request._raw_post_data
The raw HTTP POST data as a byte string. This is useful for processing data in different formats than of conventional HTML forms: binary images, XML payload etc.
http://docs.djangoproject.com/en/dev/ref/request-response/#httprequest-objects
I know this isn't part of the question, but this looks pretty awesome! "HttpRequest.read() file-like interface"
http://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.read