Python 3 email body encoding

Python 3 email body encoding - python

I am working on setting up a script that forwards incoming mail to a list of recipients.
Here's what I have now:
I read the email from stdin (that's how postfix passes it):
email_in = sys.stdin.read()
incoming = Parser().parse(email_in)
sender = incoming['from']
this_address = incoming['to']
I test for multipart:
if incoming.is_multipart():
for payload in incoming.get_payload():
# if payload.is_multipart(): ...
body = payload.get_payload()
else:
body = incoming.get_payload(decode=True)`
I set up the outgoing message:
msg = MIMEMultipart()
msg['Subject'] = incoming['subject']
msg['From'] = this_address
msg['reply-to'] = sender
msg['To'] = "foo#bar.com"
msg.attach(MIMEText(body.encode('utf-8'), 'html', _charset='UTF-8'))
s = smtplib.SMTP('localhost')
s.send_message(msg)
s.quit()
This works pretty well with ASCII characters (English text), forwards it and all.
When I send non-ascii characters though, it gives back gibberish (depending on email client bytes or ascii representations of the utf-8 chars)
What can be the problem? Is it on the incoming or the outgoing side?

The problem is that many email clients (including Gmail) send non-ascii emails in base64. stdin on the other hand passes everything into a string. If you parse that with Parser.parse(), it returns a string type with base64 inside.
Instead the optional decode argument should be used on the get_payload() method. When that is set, the method returns a bytes type. After that you can use the builtin decode() method to get utf-8 string like so:
body = payload.get_payload(decode=True)
body = body.decode('utf-8')
There is great insight into utf-8 and python in Ned Batchelder's talk.
My final code works a bit differently, you can check that, too here.

Related

Converting ASCII encoded characters inside string in Python

I am using the IMapLib library to read emails from my mailserver. The emails contain JSON encoded messages which my program should interpret.
Mail code:
tmp, data = imap.search(None, "UNSEEN")
emails = []
for num in data[0].split():
tmp, data = imap.fetch(num, "(BODY[TEXT])")
# Only append the email body
emails.append(str(data[0][1]))
The strings I get from imaplib however contain some special characters. I have figured out that the =xx looks like the ASCII encoded version of the 'special' characters. How could I convert a string containing such characters to a 'regular' Python string or am I perhaps missing an option in the imaplib code which is encoding the strings incorrectly?
An example string I get:
b'This is a message in Mime Format. If you see this, your mail reader does not support this format.\r\n\r\n--=_8e336d0902b13eaec4e7906847c21a6d\r\nContent-Type: text/plain; charset=UTF-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n=0A=0A=0A=0A =0A =0A =0A =0A =0A =0A JSON{"arrival":"03.03.21","departure":"07.03.21","email":"test=\r\n=2Etest#gmail.com","apartment":"app","ov=\r\nerride":0}JSON =0A =0A=0A\r\n--=_8e336d0902b13eaec4e7906847c21a6d\r\nContent-Type: text/html; charset=UTF-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n=0A=0A=0A=0A =0A <meta charset=3D"utf-8"=20=\r\n/>=0A <meta http-equiv=3D"Content-Type" content=3D"text/html charset=\r\n=3DUTF-8" />=0A =0A =0A =0A JSON{"arrival":"03.03.21","departure":"07.03.21","email":"test=\r\n=2Etest#gmail.com","apartment":"app","ov=\r\nerride":0}JSON =0A =0A=0A\r\n--=_8e336d0902b13eaec4e7906847c21a6d--\r\n'
I was initially just removing all '\n', '\r' and '=' but today I received this email/string and my code incorrectly interpreted "test=\r\n=2Etest#gmail.com" as "test2Etest#gmail.com" instead of "test.test#gmail.com"

You are dealing with the encoding scheme named "quoted printable" (more details in RFC 2045,section 6.7).
You have at least two options:
You could use the Python module quopri
You could parse your email with the parser of the Python email module (email.parser).
But if your goal is to easily get the email content, it would be easier to use the modules imap_tools or IMAPClient.
Some example code from their documentations:
imap_tools (https://pypi.org/project/imap-tools/):
from imap_tools import MailBox, AND
# get list of email subjects from INBOX folder
with MailBox('imap.mail.com').login('test#mail.com', 'pwd') as mailbox:
subjects = [msg.subject for msg in mailbox.fetch()]
# get list of email subjects from INBOX folder - equivalent verbose version
mailbox = MailBox('imap.mail.com')
mailbox.login('test#mail.com', 'pwd', initial_folder='INBOX') # or mailbox.folder.set instead 3d arg
subjects = [msg.subject for msg in mailbox.fetch(AND(all=True))]
mailbox.logout()
IMAPClient (https://imapclient.readthedocs.io/en/2.1.0/):
from imapclient import IMAPClient
server = IMAPClient('imap.mailserver.com', use_uid=True)
server.login('someuser', 'somepassword')
select_info = server.select_folder('INBOX')
print('%d messages in INBOX' % select_info[b'EXISTS'])
#34 messages in INBOX
messages = server.search(['FROM', 'best-friend#domain.com'])
print("%d messages from our best friend" % len(messages))
#5 messages from our best friend
for msgid, data in server.fetch(messages, ['ENVELOPE']).items():
envelope = data[b'ENVELOPE']

You have hint relating encoding in your message, namely:
Content-Transfer-Encoding: quoted-printable
which explains =s in your text. You might use quopri built-in module for dealing with it, following way:
import quopri
message = b'test=\r\n=2Etest#gmail.com'
decoded = quopri.decodestring(message)
print(decoded)
output:
b'test.test#gmail.com'
Note that quopri.decodestring return bytes, so you would have to make correct .decode if you must have text, if utf-8 is used it will be:
decoded = quopri.decodestring(message).decode('utf-8')

G mail API for Sending Mail with attachment is failing when recipients name is in multi byte character

I am using below APi to send mail with attachment
https://www.googleapis.com/upload/gmail/v1/users/me/messages/send?uploadType=multipart
which is working fine when recipients name is in English, but if recipients name contains multi-byte(e.g Japanese), I am getting 400(Bad request) as response.
Code snippet
def create_raw():
message['to'] = ','.join([recipients_dict['name']+
<"+recipients_dict['email_address']+">" for recipients_dict in
recipients['to']])
message['from'] = email_address
message['subject'] = subject
msg = MIMEText(body)
message.attach(msg)
When recipients_dict['name'] is "English" the API works as expected, but for multi-byte character
getting HTTP 400(Bad request) Error
.

You have a good approach over the Gmail API. The only step necessary is to encode the string into UTF-8 before sending it in bytes over base64 (due to the definition of MIME). You can accomplish that using a code similar to:
import base64
…
recipients_dict['name'] = base64.b64encode(u'ジョージ'.encode("utf-8"))
If you still have any question, please do not hesitate to ask for further help.

Python email module: form header "From" with some unicode name + email

I'm generating email with the help of Python email module.
Here are few lines of code, which demonstrates my question:
msg = email.MIMEMultipart.MIMEMultipart('alternative')
msg['From'] = "somemail#somedomain.com"
msg.as_string()
Out[7]: 'Content-Type: multipart/alternative;\n boundary="===============9006870443159801881=="\nMIME-Version: 1.0\nFrom: somemail#somedomain.com\n\n--===============9006870443159801881==\n\n--===============9006870443159801881==--'
As you can see, everything is okay here, From field contains email ant it is cool. But what if I want to add some name before email? Especially unicode one:
In [8]: u.get_full_name()
Out[8]: u'\u0414\u0438\u043c\u0430 \u0426\u0443\u043a\u0430\u043d\u043e\u0432'
In [9]: msg = email.MIMEMultipart.MIMEMultipart('alternative')
In [10]: msg['From'] = "%s <%s>" % (u.get_full_name(), "email#at.com")
In [11]: msg.as_string()
Out[11]: 'Content-Type: multipart/alternative;\n boundary="===============5792069034892928634=="\nMIME-Version: 1.0\nFrom: =?utf-8?b?0JTQuNC80LAg0KbRg9C60LDQvdC+0LIgPGVtYWlsQGF0LmNvbT4=?=\n\n--===============5792069034892928634==\n\n--===============5792069034892928634==--'
Here you can see, that all the string (name, email) was encoded in base64 (and it is even quite logical, how MIMEMultipart will know that string contains unicode and non-unicode parts).
So, my question is: how do I have to tell email module to make me pretty "From" header like:
From: =?UTF-8?B?0JLQmtC+0L3RgtCw0LrRgtC1?= <admin#notify.vk.com> ?
Also, I've learned a little RFC2822 (http://www.faqs.org/rfcs/rfc2822.html , p.3.6.2). It tells:
The originator fields indicate the mailbox(es) of the source of the
message. The "From:" field specifies the author(s) of the message,
that is, the mailbox(es) of the person(s) or system(s) responsible
for the writing of the message. The "Sender:" field specifies the
mailbox of the agent responsible for the actual transmission of the
message. For example, if a secretary were to send a message for
another person, the mailbox of the secretary would appear in the
"Sender:" field and the mailbox of the actual author would appear in
the "From:" field. If the originator of the message can be indicated
by a single mailbox and the author and transmitter are identical, the
"Sender:" field SHOULD NOT be used. Otherwise, both fields SHOULD
appear.
Does it mean that I should combine these two headers? (From and Sender). I'm a bit confused, because I noticed a lot of emails in my gmail (looking through "Show original") where in From field name and email are presented.
Thanks for help.

You need to encode the name part separately using email.header.Header:
from email.MIMEMultipart import MIMEMultipart
from email.header import Header
from email.utils import formataddr
author = formataddr((str(Header(u'Alał', 'utf-8')), "somemail#somedomain.com"))
msg = MIMEMultipart('alternative')
msg['From'] = author
print msg
I hope this will help.

Getting mail attachment to python file object

I have got an email multipart message object, and I want to convert the attachment in that email message into python file object. Is this possible? If it is possible, what method or class in Python I should look into to do such task?

I don't really understand what you mean by "email multipart message object". Do you mean an object belonging to the email.message.Message class?
If that is what you mean, it's straightforward. On a multipart message, the get_payload method returns a list of message parts (each of which is itself a Message object). You can iterate over these parts and examine their properties: for example, the get_content_type method returns the part's MIME type, and the get_filename method returns the part's filename (if any is specified in the message). Then when you've found the correct message part, you can call get_payload(decode=True) to get the decoded contents.
>>> import email
>>> msg = email.message_from_file(open('message.txt'))
>>> len(msg.get_payload())
2
>>> attachment = msg.get_payload()[1]
>>> attachment.get_content_type()
'image/png'
>>> open('attachment.png', 'wb').write(attachment.get_payload(decode=True))
If you're programmatically extracting attachments from email messages you have received, you might want to take precautions against viruses and trojans. In particular, you probably ought only to extract attachments whose MIME types you know are safe, and you probably want to pick your own filename, or at least sanitize the output of get_filename.

Here is working solution, messages are form IMAP server
self.imap.select()
typ, data = self.imap.uid('SEARCH', 'ALL')
msgs = data[0].split()
print "Found {0} msgs".format(len(msgs))
for uid in msgs:
typ, s = self.imap.uid('FETCH', uid, '(RFC822)')
mail = email.message_from_string(s[0][1])
print "From: {0}, Subject: {1}, Date: {2}\n".format(mail["From"], mail["Subject"], mail["Date"])
if mail.is_multipart():
print 'multipart'
for part in mail.walk():
ctype = part.get_content_type()
if ctype in ['image/jpeg', 'image/png']:
open(part.get_filename(), 'wb').write(part.get_payload(decode=True))

Actually using now-suggested email.EmailMessage API (don't confuse with old email.Message API) it is fairly easy to:
Iterate over all message elements and select only attachments
Iterate over just attachments
Let's assume that you have your message stored as byte content in envelope variable
Solution no.1:
import email
from email.message import EmailMessage
email_message: EmailMessage = email.message_from_bytes(envelope, _class=EmailMessage)
for email_message_part in email_message.walk():
if email_message.is_attachment():
# Do something with your attachment
Solution no.2: (preferable since you don't have to walk through other parts of your message object)
import email
from email.message import EmailMessage
email_message: EmailMessage = email.message_from_bytes(envelope, _class=EmailMessage)
for email_message_attachment in email_message.iter_attachments():
# Do something with your attachment
Couple things to note:
We explicitly tell to use new EmailMessage class in our byte read method through _class=EmailMessage parameter
You can read your email message (aka envelope) from sources such as bytes-like object, binary file object or string thanks to built-in methods in message.Parser API

Python 3 smtplib send with unicode characters

I'm having a problem emailing unicode characters using smtplib in Python 3. This fails in 3.1.1, but works in 2.5.4:
import smtplib
from email.mime.text import MIMEText
sender = to = 'ABC#DEF.com'
server = 'smtp.DEF.com'
msg = MIMEText('€10')
msg['Subject'] = 'Hello'
msg['From'] = sender
msg['To'] = to
s = smtplib.SMTP(server)
s.sendmail(sender, [to], msg.as_string())
s.quit()
I tried an example from the docs, which also failed. http://docs.python.org/3.1/library/email-examples.html, the Send the contents of a directory as a MIME message example
Any suggestions?

The key is in the docs:
class email.mime.text.MIMEText(_text, _subtype='plain', _charset='us-ascii')
A subclass of MIMENonMultipart, the
MIMEText class is used to create MIME
objects of major type text. _text is
the string for the payload. _subtype
is the minor type and defaults to
plain. _charset is the character set
of the text and is passed as a
parameter to the MIMENonMultipart
constructor; it defaults to us-ascii.
No guessing or encoding is performed
on the text data.
So what you need is clearly, not msg = MIMEText('€10'), but rather:
msg = MIMEText('€10'.encode('utf-8'), _charset='utf-8')
While not all that clearly documented, sendmail needs a byte-string, not a Unicode one (that's what the SMTP protocol specifies); look to what msg.as_string() looks like for each of the two ways of building it -- given the "no guessing or encoding", your way still has that euro character in there (and no way for sendmail to turn it into a bytestring), mine doesn't (and utf-8 is clearly specified throughout).

_charset parameter of MIMEText defaults to us-ascii according to the docs. Since € is not from us-ascii set it isn't working.
example in the docs that you've tried clearly states:
For this example, assume that the text file contains only ASCII characters.
You could use .get_charset method on your message to investigate the charset, there is incidentally .set_charset as well.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python 3 email body encoding - python

Related

Converting ASCII encoded characters inside string in Python

G mail API for Sending Mail with attachment is failing when recipients name is in multi byte character

Python email module: form header "From" with some unicode name + email

Getting mail attachment to python file object

Python 3 smtplib send with unicode characters

Categories

Resources