Python % character problem while manipulation (Escape char problem) - python

I am trying to send a confirmation email in Django, but I have a problem with escape characters.
I have a helper function that builds the content of the mail:
def getActivationMailBody():
    email_body = "<table width='100%'>"
    email_body = email_body + '<p>' + '%(confirmLink)s' + '</p>'
    return email_body
The confirmation URL is then embedded like this:
email_body = getActivationMailBody()
email_body = email_body % {'confirmLink': '%s/kullanici/onay/%s/%s'%(WEB_URL,md5.new(form.cleaned_data['email']).hexdigest()[:30], activation_key)}
msg = EmailMessage(email_subject, email_body, DEFAULT_FROM_EMAIL, [email_to])
msg.content_subtype="html"
res = msg.send(fail_silently=False)
However, when the confirmLink is substituted I get this error:
unsupported format character ''' (0x27) at index 18
I found that the problem is caused by the % character, but I couldn't figure out how to correct it.
Could you give me any suggestions? Thanks

In a format string, a % can be escaped by doubling:
email_body = "<table width='100%%'>"
The way you've constructed this is a little odd, since getActivationMailBody doesn't return the body of the email, but a format string used to create the body. You might want to rename the function.
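As a minimal sketch of the doubling rule, assuming the helper is renamed to something like get_activation_mail_template (a hypothetical name) and using a made-up link value:

```python
def get_activation_mail_template():
    """Return a %-format template; literal percent signs are doubled."""
    template = "<table width='100%%'>"      # '%%' becomes a single '%' after formatting
    template += "<p>%(confirmLink)s</p>"    # named placeholder filled in later
    return template

# Hypothetical link value, used only for illustration.
confirm_link = "https://example.com/kullanici/onay/abc123/key456"
email_body = get_activation_mail_template() % {"confirmLink": confirm_link}
print(email_body)
```

The doubling is only needed in strings that are later passed through the % operator; a string that is never formatted should keep a single %.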

Related

Downloading emails with UTF-8 B encoded header

I have a problem with code that is supposed to download emails as .eml files.
It is supposed to go through the INBOX email listing, retrieve each email's content and attachments (if any), and create an .eml file that contains all of that.
It works for emails with a text content type and for most multipart emails. However, if an email in the listing has a UTF-8 "B" (base64) encoded header, the code simply behaves as if it had reached the end of the email listing, without displaying any error.
The code in question is:
result, data = p.uid('search', None, search_criteria)  # search_criteria is defined earlier in code
if result == 'OK':
    data = get_newer_emails_first(data)  # get_newer_emails_first() is a function defined to return the list of UIDs in reverse order (newer first)
    context['emailsum'] = len(data)  # total amount of emails based on the search_criteria parameter.
    for num in data:
        mymail2 = {}
        result, data1 = p.uid('fetch', num, '(RFC822)')
        email_message = email.message_from_bytes(data1[0][1])
        fullemail = email_message.as_bytes()
        default_charset = 'ASCII'
        if email_message.is_multipart():
            m_subject = make_header(decode_header(email_message['Subject']))
        else:
            m_subject = r''.join([six.text_type(t[0], t[1] or default_charset) for t in email.header.decode_header(email_message['Subject'])])
        m_from = string(make_header(decode_header(email_message['From'])))
        m_date = email_message['Date']
I ran some tests and found that the fullemail variable contains the email properly (so the data is read from the actual email successfully). The problem should therefore be in the if/else immediately after, but I cannot find what exactly it is.
Any ideas?
PS: I accidentally posted this question as a guest, but I opted to delete it and repost it from my account.
Apparently the error lay in my code in the silliest of ways.
Instead of:
m_from = string(make_header(decode_header(email_message['From'])))
m_date = email_message['Date']
It should be:
m_from = str(make_header(decode_header(email_message['From'])))
m_date = str(make_header(decode_header(email_message['Date'])))
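A small helper can wrap this pattern and also cope with a header that is missing altogether. This is only a sketch: header_text is a hypothetical name, and the raw message below is made up for illustration.

```python
import email
from email.header import decode_header, make_header

def header_text(message, name, default=''):
    """Return the named header as a decoded str, or default if it is absent."""
    raw_value = message[name]
    if raw_value is None:          # header not present in this message
        return default
    return str(make_header(decode_header(raw_value)))

# Made-up message with base64 ("B") encoded From and Subject headers.
raw = (b"From: =?UTF-8?B?Sm9zw6k=?= <jose@example.com>\r\n"
       b"Subject: =?UTF-8?B?b3cgYXMg4oK5NDU1?=\r\n"
       b"\r\n"
       b"body\r\n")
msg = email.message_from_bytes(raw)

m_from = header_text(msg, 'From')
m_subject = header_text(msg, 'Subject')
m_date = header_text(msg, 'Date')   # no Date header here, so the default is used
print(m_from, m_subject, repr(m_date))
```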

Facing problem to decode ?UTF-8?B?ZnVjayDwn5CO?=! type in subject. Using IMAP and Python

I need to get the real string instead of the encoded one. Some subjects arrive as plain strings, but a few are in this encoded format, and I don't know how to handle them.
How can I decode the string and print the decoded part of the subject?
import email
import imaplib

FROM_EMAIL = "my_id@gmail.com"
FROM_PWD = "my Password"
SMTP_SERVER = "imap.gmail.com"
SMTP_PORT = 993
l=['Developer','Architect','NEED','Internship','Urgent']
def get_body(msg):
    if msg.is_multipart():
        return get_body(msg.get_payload(0))
    else:
        return msg.get_payload(None,True)
def readmail():
    mail = imaplib.IMAP4_SSL(SMTP_SERVER)
    mail.login(FROM_EMAIL,FROM_PWD)
    mail.select('inbox')
    type, data = mail.search(None, '(SINCE "20-May-2020" BEFORE "26-May-2020")')
    mail_ids = data[0]
    id_list = mail_ids.split()
    id_list=id_list[::-1]
    first_email_id = id_list[0]
    latest_email_id = id_list[-1]
    for byte_obj in id_list:
        typ, data = mail.fetch(byte_obj, '(RFC822)' )
        raw=email.message_from_bytes(data[0][1])
        msg=get_body(raw)
        s=''
        s=raw['SUBJECT']
        s1=raw['Date']
        print(s)
readmail()
output:
Winner announcement! Amazon Kindle Oasis.
[FREE WEBINAR] Natural Language Processing for Beginners
Godrej 24 | Get Rs. 2 Lakh Gold Voucher | 2 & 3 BHK at Rs. 83 Lakh*
=?UTF-8?B?TGFzdCBkYXkgdG8gc2F2ZSEgUG9wdWxhciBjb3Vyc2VzIGFzIGw=?=
=?UTF-8?B?b3cgYXMg4oK5NDU1?=
Panda just uploaded a video
Vernix Gamerz just uploaded a video
Most of your question has been answered here:
Find, decode and replace all base64 values in text file
To help you understand your example, here is some additional information.
Some of your subject lines are base64-encoded (as RFC 2047 encoded-words).
Take the following part of your string s=raw['SUBJECT'] as an example:
=?UTF-8?B?TGFzdCBkYXkgdG8gc2F2ZSEgUG9wdWxhciBjb3Vyc2VzIGFzIGw=?=
=?UTF-8?B?b3cgYXMg4oK5NDU1?=
The structure of each encoded-word is as follows.
First comes the prefix, which names the charset (UTF-8) and the encoding (B for base64):
=?UTF-8?B?
Then comes the encoded string:
TGFzdCBkYXkgdG8gc2F2ZSEgUG9wdWxhciBjb3Vyc2VzIGFzIGw=
Followed by the terminator:
?=
Converting the encoded string from base64 to UTF-8 gives you the text:
Last day to save! Popular courses as l
You can verify this under https://www.base64decode.org/
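The same check can be done in Python. The sketch below first decodes one encoded-word by hand to show the structure, then lets the standard library's email.header functions handle the whole two-line subject. It assumes Python 3, and the variable names are only illustrative.

```python
import base64
from email.header import decode_header, make_header

first_word = "=?UTF-8?B?TGFzdCBkYXkgdG8gc2F2ZSEgUG9wdWxhciBjb3Vyc2VzIGFzIGw=?="
second_word = "=?UTF-8?B?b3cgYXMg4oK5NDU1?="

# Manual decoding: drop the leading "=?" and trailing "?=", then split on "?".
charset, encoding, payload = first_word[2:-2].split("?")
manual_text = base64.b64decode(payload).decode(charset)
print(manual_text)

# Library decoding: handles both encoded-words of the folded subject at once.
subject = first_word + "\r\n " + second_word
decoded_subject = str(make_header(decode_header(subject)))
print(decoded_subject)
```

In the question's loop, this would mean printing str(make_header(decode_header(s))) instead of s. Subjects that are not encoded pass through unchanged.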

Python / json : Check content of keys that may or may not exist

For those familiar with imageboards: an OP post may or may not contain a 'subject' and a 'comment'.
I wrote this to search all pages of a given board for thread subjects and OP posts.
If my search term appears in one of them but the other key does not exist, the thread is not appended to my res list.
So how do I search JSON keys when one key or the other may not exist?
import urllib, json, HTMLParser
def s4Chan(board, search):
    logo = '3::54chan'
    res = []
    p = HTMLParser.HTMLParser()
    catalog = json.load(urllib.urlopen('https://api.4chan.org/%s/catalog.json' % board))
    for i in catalog:
        for j in i['threads']:
            try:
                if search.lower() in j['sub'].lower() or search.lower() in j['com'].lower():
                    subject = j['sub']
                    post = p.unescape(str(j['com'])).replace('<br>', ' ')
                    if len(post) > 300:
                        post = post[0:300]
                        post = post + '...'
                    text = str('%s /%s/ %s | %s | %s (R:%s, I:%s)' % (logo, board, subject, post, 'https://4chan.org/%s/res/%s' % (board, j['no']), j['replies'], j['images']))
                    res.append(text)
            except(KeyError):
                continue
    return res
json.load returns objects as Python dictionaries. You can, for example, use the get method of dict:
if search.lower() in j.get('sub', '').lower() or search.lower() in j.get('com', '').lower():
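A minimal sketch of that idea on made-up thread dictionaries (thread_matches and the sample data are hypothetical, not part of the 4chan API). Later lookups such as j['sub'] and j['com'] inside the if block would presumably need the same .get treatment, otherwise the KeyError still skips the thread.

```python
def thread_matches(thread, search):
    """Return True if search occurs in the thread's subject or comment.

    Missing keys are treated as empty strings instead of raising KeyError.
    """
    needle = search.lower()
    subject = thread.get('sub', '')
    comment = thread.get('com', '')
    return needle in subject.lower() or needle in comment.lower()

# Made-up sample threads: one without 'sub', one without 'com', one with neither.
threads = [
    {'no': 1, 'com': 'Learning Python this week'},
    {'no': 2, 'sub': 'Python general'},
    {'no': 3},
]
matching_ids = [t['no'] for t in threads if thread_matches(t, 'python')]
print(matching_ids)
```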

Can't seem to find how to check for valid emails in App Engine

Does anyone know where the docs for this might be?
So far I've only found this:
http://code.google.com/appengine/articles/djangoforms.html
EmailProperty() only validates for empty strings... sigh
The following validates the email address on the server:
from google.appengine.api import mail
if not mail.is_email_valid(to_addr):
    # Return an error message...
Hope that helps?
If you check the source for Google's mail function you'll see that mail.is_email_valid() only checks that the string is not None/empty.
From this site I found an RFC822 compliant Python email address validator.
import re
qtext = '[^\\x0d\\x22\\x5c\\x80-\\xff]'
dtext = '[^\\x0d\\x5b-\\x5d\\x80-\\xff]'
atom = '[^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+'
quoted_pair = '\\x5c[\\x00-\\x7f]'
domain_literal = "\\x5b(?:%s|%s)*\\x5d" % (dtext, quoted_pair)
quoted_string = "\\x22(?:%s|%s)*\\x22" % (qtext, quoted_pair)
domain_ref = atom
sub_domain = "(?:%s|%s)" % (domain_ref, domain_literal)
word = "(?:%s|%s)" % (atom, quoted_string)
domain = "%s(?:\\x2e%s)*" % (sub_domain, sub_domain)
local_part = "%s(?:\\x2e%s)*" % (word, word)
addr_spec = "%s\\x40%s" % (local_part, domain)
email_address = re.compile(r'\A%s\Z' % addr_spec)
# How this is used:
def isValidEmailAddress(email):
    if email_address.match(email):
        return True
    else:
        return False
* If you use this, please use this version, as it contains the name and other details of the person who created it.

Get python getaddresses() to decode encoded-word encoding

msg = \
"""To: =?ISO-8859-1?Q?Caren_K=F8lter?= <ck@example.dk>, bob@example.com
Cc: "James =?ISO-8859-1?Q?K=F8lter?=" <jk@example.dk>
Subject: hello
message body blah blah blah
"""
import email.parser, email.utils
import itertools
parser = email.parser.Parser()
parsed_message = parser.parsestr(msg)
address_fields = ('to', 'cc')
addresses = itertools.chain(*(parsed_message.get_all(field) for field in address_fields if parsed_message.has_key(field)))
address_list = set(email.utils.getaddresses(addresses))
print address_list
It seems that email.utils.getaddresses() doesn't automatically handle RFC 2047 encoded-words in address fields.
How can I get the expected result below?
actual result:
set([('', 'bob@example.com'), ('=?ISO-8859-1?Q?Caren_K=F8lter?=', 'ck@example.dk'), ('James =?ISO-8859-1?Q?K=F8lter?=', 'jk@example.dk')])
desired result:
set([('', 'bob@example.com'), (u'Caren_K\xf8lter', 'ck@example.dk'), (u'James \xf8lter', 'jk@example.dk')])
The function you want is email.header.decode_header, which returns a list of (decoded_string, charset) pairs. It's up to you to further decode them according to charset and join them back together again before passing them to email.utils.getaddresses or wherever.
You might think that this would be straightforward:
def decode_rfc2047_header(h):
    return ' '.join(s.decode(charset or 'ascii')
                    for s, charset in email.header.decode_header(h))
But since message headers typically come from untrusted sources, you have to handle (1) badly encoded data; and (2) bogus character set names. So you might do something like this:
def decode_safely(s, charset='ascii'):
    """Return s decoded according to charset, but do so safely."""
    try:
        return s.decode(charset or 'ascii', 'replace')
    except LookupError:  # bogus charset
        return s.decode('ascii', 'replace')
def decode_rfc2047_header(h):
    return ' '.join(decode_safely(s, charset)
                    for s, charset in email.header.decode_header(h))
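The snippets in this thread are written for Python 2. As a rough Python 3 sketch of the same approach, the code below splits the headers with getaddresses first and then decodes each display name. It makes two assumptions that differ from the code above: decode_header may return plain str parts in Python 3, so those are passed through, and the parts are joined with '' because unencoded parts keep their own spacing.

```python
from email.header import decode_header
from email.utils import getaddresses

def decode_safely(part, charset):
    """Decode one (part, charset) pair from decode_header without raising."""
    if isinstance(part, str):          # Python 3 returns str when nothing was encoded
        return part
    try:
        return part.decode(charset or 'ascii', 'replace')
    except LookupError:                # bogus charset name
        return part.decode('ascii', 'replace')

def decode_rfc2047_header(value):
    return ''.join(decode_safely(part, charset)
                   for part, charset in decode_header(value))

# Header values taken from the question's To and Cc fields.
headers = [
    '=?ISO-8859-1?Q?Caren_K=F8lter?= <ck@example.dk>, bob@example.com',
    '"James =?ISO-8859-1?Q?K=F8lter?=" <jk@example.dk>',
]
address_list = {(decode_rfc2047_header(name), address)
                for name, address in getaddresses(headers)}
print(address_list)
```

Note that Q-encoding maps the underscore to a space, so the first name decodes with a space between the two words rather than an underscore.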
Yeah, the email package interface really isn't very helpful a lot of the time.
Here, you have to use email.header.decode_header manually on each address, and then, since that gives you a list of decoded tokens, you have to stitch them back together again manually:
for name, address in email.utils.getaddresses(addresses):
    name = u' '.join(
        unicode(b, e or 'ascii') for b, e in email.header.decode_header(name)
    )
    ...
Thank you, Gareth Rees. Your answer was helpful in solving a problem case:
Input: 'application/octet-stream;\r\n\tname="=?utf-8?B?KFVTTXMpX0FSTE8uanBn?="'
The absence of whitespace around the encoded-word caused email.Header.decode_header to overlook it. I'm too new to this to know whether I've only made things worse, but this kludge, along with joining with '' instead of ' ', fixed it:
if not ' =?' in h:
    h = h.replace('=?', ' =?').replace('?=', '?= ')
Output: u'application/octet-stream; name="(USMs)_ARLO.jpg"'
