Python 3 smtplib send with unicode characters - python

I'm having a problem emailing unicode characters using smtplib in Python 3. This fails in 3.1.1, but works in 2.5.4:
import smtplib
from email.mime.text import MIMEText
sender = to = 'ABC#DEF.com'
server = 'smtp.DEF.com'
msg = MIMEText('€10')
msg['Subject'] = 'Hello'
msg['From'] = sender
msg['To'] = to
s = smtplib.SMTP(server)
s.sendmail(sender, [to], msg.as_string())
s.quit()
I tried an example from the docs, which also failed. http://docs.python.org/3.1/library/email-examples.html, the Send the contents of a directory as a MIME message example
Any suggestions?

The key is in the docs:
class email.mime.text.MIMEText(_text, _subtype='plain', _charset='us-ascii')
A subclass of MIMENonMultipart, the
MIMEText class is used to create MIME
objects of major type text. _text is
the string for the payload. _subtype
is the minor type and defaults to
plain. _charset is the character set
of the text and is passed as a
parameter to the MIMENonMultipart
constructor; it defaults to us-ascii.
No guessing or encoding is performed
on the text data.
So what you need is clearly, not msg = MIMEText('€10'), but rather:
msg = MIMEText('€10'.encode('utf-8'), _charset='utf-8')
While not all that clearly documented, sendmail needs a byte-string, not a Unicode one (that's what the SMTP protocol specifies); look to what msg.as_string() looks like for each of the two ways of building it -- given the "no guessing or encoding", your way still has that euro character in there (and no way for sendmail to turn it into a bytestring), mine doesn't (and utf-8 is clearly specified throughout).

_charset parameter of MIMEText defaults to us-ascii according to the docs. Since € is not from us-ascii set it isn't working.
example in the docs that you've tried clearly states:
For this example, assume that the text file contains only ASCII characters.
You could use .get_charset method on your message to investigate the charset, there is incidentally .set_charset as well.

Related

python3 email message to disable base64 and remove MIME-Version

from email.message import EmailMessage
from email.headerregistry import Address
msg = EmailMessage()
msg['From'] = Address("Pepé Le Pew", "pepe", "example.com")
msg['To'] = (
Address("Penelope Pussycat", "penelope", "example.com")
, Address("Fabrette Pussycat", "fabrette", "example.com")
)
msg['Subject'] = 'This email sent from Python code'
msg.set_content("""\
Salut!
Cela ressemble à un excellent recipie[1] déjeuner.
[1] http://www.yummly.com/recipe/Roasted-Asparagus-Epicurious-203718
--Pepé
""")
print(msg)
The above code produces an email message that uses base64 encoding. How to disable it? How to remove the field of MIME-Version?
Will the encoding of "Pepé" be correctly interpreted by the recipient? If not, what is the correct way to ensure its encoding is interpreted by the recipient correctly?
From: Pepé Le Pew <pepe#example.com>
To: Penelope Pussycat <penelope#example.com>,
Fabrette Pussycat <fabrette#example.com>
Subject: This email sent from Python code
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
MIME-Version: 1.0
CQlTYWx1dCEKCgkJQ2VsYSByZXNzZW1ibGUgw6AgdW4gZXhjZWxsZW50IHJlY2lwaWVbMV0gZMOp
amV1bmVyLgoKCQlbMV0gaHR0cDovL3d3dy55dW1tbHkuY29tL3JlY2lwZS9Sb2FzdGVkLUFzcGFy
YWd1cy1FcGljdXJpb3VzLTIwMzcxOAoKCQktLVBlcMOpCgkJCg==
You absolutely must not remove the MIME-Version: header; it's what identifies this as a MIME message.
The From: header should indeed be RFC2047-encoded, and the documentation suggests that it will be "when the message is serialized". When you print(msg) you are not properly serializing it; you want print(msg.as_string()) which does exhibit the required serialization.
When it comes to the transfer encoding, Python's email library has an unattractive penchant for using base64 for content which could very well be encoded as quoted-printable instead. You can't really reliably send the content completely unencoded (though if you wanted to, the MIME 8bit or binary encodings would be able to accommodate that; but for backwards compatibility, SMTP requires everything to be encoded into a 7-bit representation).
In the old email library, various shenanigans were required to do this, but in the new EmailMessage API introduced in Python 3.6, you really only have to add cte='quoted-printable' to the set_content call.
from email.message import EmailMessage
from email.headerregistry import Address
msg = EmailMessage()
msg['From'] = Address("Pepé Le Pew", "pepe", "example.com")
msg['To'] = (
Address("Penelope Pussycat", "penelope", "example.com")
, Address("Fabrette Pussycat", "fabrette", "example.com")
)
msg['Subject'] = 'This email sent from Python code'
msg.set_content("""\
Salut!
Cela ressemble à un excellent recipie[1] déjeuner.
[1] http://www.yummly.com/recipe/Roasted-Asparagus-Epicurious-203718
--Pepé
""", cte="quoted-printable") # <-- notice added parameter
print(msg.as_string()) # <-- properly serialize
Unfortunately, figuring this out from the documentation is next to impossible. The documentation for set_content basically just defers to the policy which obscurely points to the raw_data_manager (if you even notice the link) ... where finally you hopefully notice the presence of the cte keyword argument.
Demo: https://ideone.com/eLAt11
(As an aside, you might also want to
replace('\n ', '\n')
in the body text.)
If you go for an 8bit or binary content transfer encoding, the difference between them is that the former has a line length limit (max 900-something characters) whereas the latter is completely unconstrained. But you need to be sure that the entire SMTP transfer path is 8-bit clean (at which point you might as well pivot to Unicode email / ESMTP SMTPUTF8 entirely).
For your possible entertainment, here are some old questions with crazy hacks for Python 3.5 and earlier.

Python 3 email body encoding

I am working on setting up a script that forwards incoming mail to a list of recipients.
Here's what I have now:
I read the email from stdin (that's how postfix passes it):
email_in = sys.stdin.read()
incoming = Parser().parse(email_in)
sender = incoming['from']
this_address = incoming['to']
I test for multipart:
if incoming.is_multipart():
for payload in incoming.get_payload():
# if payload.is_multipart(): ...
body = payload.get_payload()
else:
body = incoming.get_payload(decode=True)`
I set up the outgoing message:
msg = MIMEMultipart()
msg['Subject'] = incoming['subject']
msg['From'] = this_address
msg['reply-to'] = sender
msg['To'] = "foo#bar.com"
msg.attach(MIMEText(body.encode('utf-8'), 'html', _charset='UTF-8'))
s = smtplib.SMTP('localhost')
s.send_message(msg)
s.quit()
This works pretty well with ASCII characters (English text), forwards it and all.
When I send non-ascii characters though, it gives back gibberish (depending on email client bytes or ascii representations of the utf-8 chars)
What can be the problem? Is it on the incoming or the outgoing side?
The problem is that many email clients (including Gmail) send non-ascii emails in base64. stdin on the other hand passes everything into a string. If you parse that with Parser.parse(), it returns a string type with base64 inside.
Instead the optional decode argument should be used on the get_payload() method. When that is set, the method returns a bytes type. After that you can use the builtin decode() method to get utf-8 string like so:
body = payload.get_payload(decode=True)
body = body.decode('utf-8')
There is great insight into utf-8 and python in Ned Batchelder's talk.
My final code works a bit differently, you can check that, too here.

Python: Attaching MIME encoded text file

After a bunch of fiddling, I finally hit upon the magical sequence to attach a text file to an email (many thanks to previous posts on this service).
I'm left wondering what the lines:
attachment.add_header('Content-Disposition'. . .)
--and--
e_msg = MIMEMultipart('alternative')
actually do.
Can someone unsilence the Mimes for me please (sorry couldn't resist)
import smtplib
from email import Encoders
from email.message import Message
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
smtp_server = "1.2.3.4"
smtp_login = "account"
smpt_password = "password"
server = smtplib.SMTP(smtp_server)
server.login(smtp_login,smtp_password)
f = file("filename.csv")
attachment = MIMEText(f.read())
attachment.add_header('Content-Disposition', 'attachment', filename="filename.csv")
e_msg = MIMEMultipart('alternative')
e_msg.attach(attachment)
e_msg['Subject'] = 'Domestic Toll Monitor'
e_msg['From'] = smtp_account
body = 'Some nifty text goes here'
content = MIMEText(body)
e_msg.attach(content)
server.sendmail(smtp_from, smtp_to, e_msg.as_string())
Basically, MIME is the specification defining email structure. The Multipart structure is designed to allow for multiple types of messages and attachments to be sent within the same message. For example, an email might have a plain text version for backwards compatibility and a rich text or html formatted message for modern clients. Attachments count as a "part", and thus require their own header. In this case, you're adding a "Content-Disposition" type header for the attachment. If you're really interested in what that means, you can read the specification here. As for the "Alternative portion, you're setting the message to multipart and defining the types of parts that you have attached and how the client needs to handle them. There are some standard presets defining various scenarios, but Alternative is something of a wildcard, used when there is a part whose type might not be recognized or handled by most clients. For the record, I believe you also could have used a "Mixed" type. The nice thing about MIME is that while it is complicated, its thoroughly defined and its very easy to look up the specification.

Python email module: form header "From" with some unicode name + email

I'm generating email with the help of Python email module.
Here are few lines of code, which demonstrates my question:
msg = email.MIMEMultipart.MIMEMultipart('alternative')
msg['From'] = "somemail#somedomain.com"
msg.as_string()
Out[7]: 'Content-Type: multipart/alternative;\n boundary="===============9006870443159801881=="\nMIME-Version: 1.0\nFrom: somemail#somedomain.com\n\n--===============9006870443159801881==\n\n--===============9006870443159801881==--'
As you can see, everything is okay here, From field contains email ant it is cool. But what if I want to add some name before email? Especially unicode one:
In [8]: u.get_full_name()
Out[8]: u'\u0414\u0438\u043c\u0430 \u0426\u0443\u043a\u0430\u043d\u043e\u0432'
In [9]: msg = email.MIMEMultipart.MIMEMultipart('alternative')
In [10]: msg['From'] = "%s <%s>" % (u.get_full_name(), "email#at.com")
In [11]: msg.as_string()
Out[11]: 'Content-Type: multipart/alternative;\n boundary="===============5792069034892928634=="\nMIME-Version: 1.0\nFrom: =?utf-8?b?0JTQuNC80LAg0KbRg9C60LDQvdC+0LIgPGVtYWlsQGF0LmNvbT4=?=\n\n--===============5792069034892928634==\n\n--===============5792069034892928634==--'
Here you can see, that all the string (name, email) was encoded in base64 (and it is even quite logical, how MIMEMultipart will know that string contains unicode and non-unicode parts).
So, my question is: how do I have to tell email module to make me pretty "From" header like:
From: =?UTF-8?B?0JLQmtC+0L3RgtCw0LrRgtC1?= <admin#notify.vk.com> ?
Also, I've learned a little RFC2822 (http://www.faqs.org/rfcs/rfc2822.html , p.3.6.2). It tells:
The originator fields indicate the mailbox(es) of the source of the
message. The "From:" field specifies the author(s) of the message,
that is, the mailbox(es) of the person(s) or system(s) responsible
for the writing of the message. The "Sender:" field specifies the
mailbox of the agent responsible for the actual transmission of the
message. For example, if a secretary were to send a message for
another person, the mailbox of the secretary would appear in the
"Sender:" field and the mailbox of the actual author would appear in
the "From:" field. If the originator of the message can be indicated
by a single mailbox and the author and transmitter are identical, the
"Sender:" field SHOULD NOT be used. Otherwise, both fields SHOULD
appear.
Does it mean that I should combine these two headers? (From and Sender). I'm a bit confused, because I noticed a lot of emails in my gmail (looking through "Show original") where in From field name and email are presented.
Thanks for help.
You need to encode the name part separately using email.header.Header:
from email.MIMEMultipart import MIMEMultipart
from email.header import Header
from email.utils import formataddr
author = formataddr((str(Header(u'Alał', 'utf-8')), "somemail#somedomain.com"))
msg = MIMEMultipart('alternative')
msg['From'] = author
print msg
I hope this will help.

python - parse email - special characters [duplicate]

I am displaying new email with IMAP, and everything looks fine, except for one message subject shows as:
=?utf-8?Q?Subject?=
How can I fix it?
In MIME terminology, those encoded chunks are called encoded-words. You can decode them like this:
import email.header
text, encoding = email.header.decode_header('=?utf-8?Q?Subject?=')[0]
Check out the docs for email.header for more details.
This is a MIME encoded-word. You can parse it with email.header:
import email.header
def decode_mime_words(s):
return u''.join(
word.decode(encoding or 'utf8') if isinstance(word, bytes) else word
for word, encoding in email.header.decode_header(s))
print(decode_mime_words(u'=?utf-8?Q?Subject=c3=a4?=X=?utf-8?Q?=c3=bc?='))
The text is encoded as a MIME encoded-word. This is a mechanism defined in RFC2047 for encoding headers that contain non-ASCII text such that the encoded output contains only ASCII characters.
In Python 3.3+, the parsing classes and functions in email.parser automatically decode "encoded words" in headers if their policy argument is set to policy.default
>>> import email
>>> from email import policy
>>> msg = email.message_from_file(open('message.txt'), policy=policy.default)
>>> msg['from']
'Pepé Le Pew <pepe#example.com>'
The parsing classes and functions are:
email.parser.BytesParser
email.parser.Parser
email.message_from_bytes
email.message_from_binary_file
email.message_from_string
email.message_from_file
Confusingly, up to at least Python 3.10, the default policy for these parsing functions is not policy.default, but policy.compat32, which does not decode "encoded words".
>>> msg = email.message_from_file(open('message.txt'))
>>> msg['from']
'=?utf-8?q?Pep=C3=A9?= Le Pew <pepe#example.com>'
Try Imbox
Because imaplib is a very excessive low level library and returns results which are hard to work with
Installation
pip install imbox
Usage
from imbox import Imbox
with Imbox('imap.gmail.com',
username='username',
password='password',
ssl=True,
ssl_context=None,
starttls=False) as imbox:
all_inbox_messages = imbox.messages()
for uid, message in all_inbox_messages:
message.subject
In Python 3, decoding this to an approximated string is as easy as:
from email.header import decode_header, make_header
decoded = str(make_header(decode_header("=?utf-8?Q?Subject?=")))
See the documentation of decode_header and make_header.
High level IMAP lib may be useful here: imap_tools
from imap_tools import MailBox, AND
# get list of email subjects from INBOX folder
with MailBox('imap.mail.com').login('test#mail.com', 'pwd', 'INBOX') as mailbox:
subjects = [msg.subject for msg in mailbox.fetch()]
Parsed email message attributes
Query builder for searching emails
Actions with emails: copy, delete, flag, move, seen
Actions with folders: list, set, get, create, exists, rename, delete, status
No dependencies

Categories