App Engine Datastore storing email in wrong format - python

My app receives incoming email and saves the sender's email address into the datastore. Problem is it saves them in this format:
John Smith <jsmith#email.com>
Because of the way my app and its queries are set up, I can only search for addresses in this format: jsmith#email.com
I have tried using a regex to extract the address, but it doesn't work: every expression I try raises a "list index out of range" error. Here is the code for my mail handler, in case it is helpful.
I have checked the logs in App Engine and incoming messages do come in in the format I need, but then I check the datastore entities and it adds it in with the name as well.
I just need to know how I can get email addresses stored without the extra bits.
import webapp2
import logging
from google.appengine.ext.webapp import mail_handlers
from google.appengine.api import mail
import os
from main import WorkRequest
import re

class IncomingMailHandler(mail_handlers.InboundMailHandler):
    def receive(self, message):
        (encoding, payload) = list(message.bodies(content_type='text/plain'))[0]
        body_text = payload.decode()
        logging.info('Received email message from %s, subject "%s": %s' %
                     (message.sender, message.subject, body_text))
        logging.info(message.sender)
        logging.info(message.subject)
        logging.info(body_text)
        sender = str(message.sender)
        logging.info(sender)
        email_address = re.findall('<([^>])>', sender)[0]
        wr = WorkRequest()
        wr.email = email_address
        wr.userId = None
        wr.title = message.subject
        wr.content = body_text
        wr.status = "OPEN"
        wr.submission_type = "EMAIL"
        wr.assigned_to = "UNASSIGNED"
        wr.put()

application = webapp2.WSGIApplication([('/_ah/mail/.+', IncomingMailHandler)],debug=True)

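Unless something got mangled when your code fragment was posted, that regex is very unlikely to match: [^>] without a quantifier matches exactly one character, so the pattern only matches a single character between the angle brackets. Try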
email_address = re.findall('<(.*?)>', sender)[0]
That will handle that one particular form of address.
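To see why the original pattern fails, here is a minimal sketch comparing the expressions on a made-up sender string. With no quantifier after `[^>]`, `findall` returns an empty list, and indexing it with `[0]` is what raises the "list index out of range" error.

```python
import re

# Made-up sender string in the "Name <address>" form.
sender = 'John Smith <jsmith@example.com>'

# Original pattern: exactly one non-'>' character between the brackets.
broken = re.findall('<([^>])>', sender)

# Adding '+' (one or more characters) captures the whole address.
fixed = re.findall('<([^>]+)>', sender)

# The non-greedy form from the answer captures the same thing.
lazy = re.findall('<(.*?)>', sender)

print(broken)
print(fixed)
print(lazy)
```

Both working patterns still assume the sender contains angle brackets; a bare address with no display name would again give an empty list.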

Alternatively, instead of regex:
email_address = sender.split('<')[1].split('>')[0]

Use the parseaddr function from the standard library's email package.
>>> from email.utils import parseaddr
>>> sender = 'John Smith <jsmith#email.com>'
>>> name, address = parseaddr(sender)
>>> print name
John Smith
>>> print address
jsmith#email.com
From the docs:
Parse address – which should be the value of some address-containing
field such as To or Cc – into its constituent realname and email
address parts. Returns a tuple of that information, unless the parse
fails, in which case a 2-tuple of ('', '') is returned.
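As a rough sketch of how this could be wrapped for the mail handler, the helper below returns only the address part and treats a failed parse as None. The name `bare_address` is hypothetical and the example addresses are made up; it is written for Python 3, but `parseaddr` behaves the same way in the Python 2.7 session above.

```python
from email.utils import parseaddr


def bare_address(sender):
    """Return only the address part of a sender string, or None if parsing fails."""
    # parseaddr returns ('', '') when it cannot parse the value.
    _name, address = parseaddr(sender)
    return address if address else None


print(bare_address('John Smith <jsmith@example.com>'))
print(bare_address('jsmith@example.com'))
print(bare_address(''))
```

In the handler this would be used as `wr.email = bare_address(message.sender)`. Unlike the regex and split approaches, it also copes with a sender that has no display name or angle brackets.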

Related

App Engine - Check if incoming email is from Google group member

I'm trying to restrict incoming emails to my app so it only accepts mail from members of a google group. More specifically, I want to only add the contents of the email to my datastore if they are part of the group. I've found the hasMember/IsMember method here: https://developers.google.com/admin-sdk/directory/v1/reference/members/hasMember and think this may be what I am looking for, but I do not know how to use it as they haven't provided an example and I'm very new to this.
Would this be the correct API to use for this? Here is my incoming mail handler code, I have added the IF statement comment to show what I would like to do:
import webapp2
import logging
from google.appengine.ext.webapp import mail_handlers
from google.appengine.api import mail
import os
from main import WorkRequest
import re

class IncomingMailHandler(mail_handlers.InboundMailHandler):
    def receive(self, message):
        (encoding, payload) = list(message.bodies(content_type='text/plain'))[0]
        body_text = payload.decode()
        logging.info('Received email message from %s, subject "%s": %s' %
                     (message.sender, message.subject, body_text))
        logging.info(message.sender)
        logging.info(message.subject)
        logging.info(body_text)
        #IF MESSAGE_SENDER == MEMBER OF GOOGLE GROUP:
        wr = WorkRequest()
        wr.email = message.sender
        wr.userId = None
        wr.title = message.subject
        wr.content = body_text
        wr.status = "OPEN"
        wr.submission_type = "EMAIL"
        wr.assigned_to = "UNASSIGNED"
        wr.put()

application = webapp2.WSGIApplication([('/_ah/mail/.+', IncomingMailHandler)],debug=True)

Python IMAP - Read Gmail with '+' in email address

I've previously used imaplib in Python 3 to extract emails from Gmail. However, I now want to write a script that differentiates emails sent to the same address with different strings after a plus sign. For example, the base email address can be:
example#gmail.com
Then I would want to separately read all emails with the addresses:
example+test1#gmail.com,
example+test2#gmail.com,
example#gmail.com.
Therefore I would wind up with a dictionary of lists containing the specific emails. At the moment this only works for example#gmail.com. For example:
{'example':[],
'example_test':[],
'example_test2':[]}
Currently I can retrieve the emails that I need with this function from a class:
def get_emails(self):
    """Retrieve emails"""
    self.M = imaplib.IMAP4_SSL(self.server)
    self.M.login(self.emailaddress,self.password)
    self.M.select(readonly=1)
    self.M.select('INBOX', readonly=True)
    #Yesterdays date
    date = (datetime.date.today() - datetime.timedelta(self.daysback)).strftime("%d-%b-%Y")
    print("Selecting email messages since %s" % date)
    #Retrieve all emails from yesterday on
    result,data = self.M.uid('search', None, '(SENTSINCE {date})'.format(date=date))
    return result,data
You should directly use the exact mail address you want in the IMAP search request. For example, it could be something like:
result,data = self.M.uid('search', None, '(SENTSINCE {date})'.format(date=date),
                         ('TO example+test1#gmail.com'))
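One possible way to turn this into the dictionary the question asks for is sketched below. The helper names `plus_tag_key` and `search_criteria` are hypothetical, the addresses are made up, and the date format assumes the default (English) locale for `%b`. The helpers only build the dictionary key and the search string; the actual search still goes through `M.uid` as in the question.

```python
import datetime
from email.utils import parseaddr


def plus_tag_key(address):
    """Return 'local' or 'local_tag' for a (possibly plus-addressed) recipient."""
    _name, addr = parseaddr(address)
    local = addr.split('@', 1)[0]
    base, _sep, tag = local.partition('+')
    return '{0}_{1}'.format(base, tag) if tag else base


def search_criteria(since, recipient):
    """Build an IMAP SEARCH string for mail sent since a date to one recipient."""
    date = since.strftime("%d-%b-%Y")
    return '(SENTSINCE {0} TO "{1}")'.format(date, recipient)


recipients = ['example@gmail.com', 'example+test1@gmail.com', 'example+test2@gmail.com']
emails = {plus_tag_key(r): [] for r in recipients}
print(emails)

# Hypothetical usage with an already logged-in, selected imaplib connection M:
# since = datetime.date.today() - datetime.timedelta(1)
# for recipient in recipients:
#     result, data = M.uid('search', None, search_criteria(since, recipient))
#     emails[plus_tag_key(recipient)] = data[0].split()
```

Running one search per address keeps the grouping on the server side, so the script never has to download messages just to inspect their To header.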

Python 3 Receiving email problems

I'm writing a script in Python to receive emails from my Gmail account. I can download the raw email, but I am then unable to access certain parts of it, e.g. BODY, TO, FROM, etc.
import imaplib, email
msrvr = imaplib.IMAP4_SSL('imap.gmail.com', 993)
unm = 'stackoverflow#gmail.com'
pwd = 'lovetocode'
msrvr.login(unm,pwd)
stat,cnt = msrvr.select('Inbox')
stat, dta = msrvr.fetch(cnt[0], '(RFC822)')
b = email.message_from_string(str(dta))
print(b)
print(b['[To]'])
msrvr.close()
msrvr.logout()
Where am I going wrong?
You might find it easier to use Google's native Python SDKs for working with their email:
https://developers.google.com/appengine/docs/python/mail/
The imaplib module you are using will only give you a subset of all Gmail features.
Here's some code that parses an email and prints some header fields:
msg = email.message_from_string(raw_email)
for field in ('From', 'Subject', 'Received', 'Message-ID'):
    print '{0}: {1}'.format(field, msg[field])
For debugging, also print the raw parts of the Message object:
print msg.__dict__
(Note: I'm using Python2.7, but I believe there's not much difference.)
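For Python 3 specifically, a minimal sketch of the parsing step might look like the following. It assumes the usual shape of what `imaplib`'s `fetch(num, '(RFC822)')` returns, a list whose first item is a tuple with the raw message bytes in second position, and it uses a simulated response with made-up addresses so that it runs without a server. The helper name `message_from_fetch` is hypothetical. Note that header names are looked up as `msg['To']`, not `msg['[To]']`.

```python
import email


def message_from_fetch(data):
    """Build a Message from the data part of an imaplib fetch(num, '(RFC822)') call."""
    # The raw message bytes are the second item of the first tuple.
    raw_bytes = data[0][1]
    return email.message_from_bytes(raw_bytes)


# Simulated fetch response; the size in braces is only a placeholder.
sample = [(b'1 (RFC822 {75}',
           b'From: Alice <alice@example.com>\r\n'
           b'To: bob@example.com\r\n'
           b'Subject: Hello\r\n'
           b'\r\n'
           b'Body text\r\n'),
          b')']

msg = message_from_fetch(sample)
for field in ('From', 'To', 'Subject'):
    print('{0}: {1}'.format(field, msg[field]))
print(msg.get_payload())
```

Passing `str(dta)` to `message_from_string`, as in the question, parses the printed representation of the whole response list rather than the message itself, which is why no headers are found.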

Python email module: form header "From" with some unicode name + email

I'm generating email with the help of Python email module.
Here are few lines of code, which demonstrates my question:
msg = email.MIMEMultipart.MIMEMultipart('alternative')
msg['From'] = "somemail#somedomain.com"
msg.as_string()
Out[7]: 'Content-Type: multipart/alternative;\n boundary="===============9006870443159801881=="\nMIME-Version: 1.0\nFrom: somemail#somedomain.com\n\n--===============9006870443159801881==\n\n--===============9006870443159801881==--'
As you can see, everything is okay here: the From field contains the email address. But what if I want to add a name before the email address, especially a unicode one?
In [8]: u.get_full_name()
Out[8]: u'\u0414\u0438\u043c\u0430 \u0426\u0443\u043a\u0430\u043d\u043e\u0432'
In [9]: msg = email.MIMEMultipart.MIMEMultipart('alternative')
In [10]: msg['From'] = "%s <%s>" % (u.get_full_name(), "email#at.com")
In [11]: msg.as_string()
Out[11]: 'Content-Type: multipart/alternative;\n boundary="===============5792069034892928634=="\nMIME-Version: 1.0\nFrom: =?utf-8?b?0JTQuNC80LAg0KbRg9C60LDQvdC+0LIgPGVtYWlsQGF0LmNvbT4=?=\n\n--===============5792069034892928634==\n\n--===============5792069034892928634==--'
Here you can see that the whole string (name and email) was encoded in base64 (which is even quite logical: how would MIMEMultipart know that the string contains unicode and non-unicode parts?).
So, my question is: how do I tell the email module to produce a pretty "From" header like:
From: =?UTF-8?B?0JLQmtC+0L3RgtCw0LrRgtC1?= <admin#notify.vk.com> ?
Also, I've read a little of RFC 2822 (http://www.faqs.org/rfcs/rfc2822.html, section 3.6.2). It says:
The originator fields indicate the mailbox(es) of the source of the
message. The "From:" field specifies the author(s) of the message,
that is, the mailbox(es) of the person(s) or system(s) responsible
for the writing of the message. The "Sender:" field specifies the
mailbox of the agent responsible for the actual transmission of the
message. For example, if a secretary were to send a message for
another person, the mailbox of the secretary would appear in the
"Sender:" field and the mailbox of the actual author would appear in
the "From:" field. If the originator of the message can be indicated
by a single mailbox and the author and transmitter are identical, the
"Sender:" field SHOULD NOT be used. Otherwise, both fields SHOULD
appear.
Does it mean that I should combine these two headers? (From and Sender). I'm a bit confused, because I noticed a lot of emails in my gmail (looking through "Show original") where in From field name and email are presented.
Thanks for help.
You need to encode the name part separately using email.header.Header:
from email.MIMEMultipart import MIMEMultipart
from email.header import Header
from email.utils import formataddr
author = formataddr((str(Header(u'Alał', 'utf-8')), "somemail#somedomain.com"))
msg = MIMEMultipart('alternative')
msg['From'] = author
print msg
I hope this will help.
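The snippet above is written for Python 2. As a hedged sketch for Python 3 (3.3 or later), `formataddr` encodes a non-ASCII display name on its own, using UTF-8 by default, and leaves the address part readable, so the explicit `Header` call should not be needed. The address is made up, and the round trip at the end only checks that the name decodes back correctly.

```python
from email.header import decode_header, make_header
from email.mime.multipart import MIMEMultipart
from email.utils import formataddr, parseaddr

# formataddr encodes only the display name; the address stays as plain text.
author = formataddr(('Alał', 'somemail@example.com'))

msg = MIMEMultipart('alternative')
msg['From'] = author
print(msg['From'])

# Round trip: split the header again and decode the encoded name.
name, address = parseaddr(author)
decoded = str(make_header(decode_header(name)))
print(decoded, address)
```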

Get sender email address with Python IMAP

I have this python IMAP script, but my problem is that, every time I want to get the sender's email address, (From), I always get the sender's first name followed by their email address:
Example:
Souleiman Benhida <souleb#gmail.com>
How can I extract just the email address (souleb#gmail.com)?
I did this before, in PHP:
$headerinfo = imap_headerinfo($connection, $count)
    or die("Couldn't get header for message " . $count . " : " . imap_last_error());
$from = $headerinfo->fromaddress;
But, in python I can only get the full name w/address, how can I get the address alone? I currently use this:
typ, data = M.fetch(num, '(RFC822)')
mail = email.message_from_string(data[0][1])
headers = HeaderParser().parsestr(data[0][1])
message = parse_message(mail) #body
org = headers['From']
Thanks!
Just one more step, using email.utils:
email.utils.parseaddr(address)
Parse address – which should be the value of some address-containing field such as To or Cc – into its constituent realname and email address parts. Returns a tuple of that information, unless the parse fails, in which case a 2-tuple of ('', '') is returned.
Note: originally referenced rfc822, which is now deprecated.
to = email.utils.parseaddr(msg['cc'])
This works for me.
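Putting the pieces together, here is a small sketch that parses a hand-written message (all addresses are made up) and pulls out the bare sender address with `parseaddr`. For headers that can hold several addresses, such as To and Cc, `email.utils.getaddresses` does the same job for a list of header values.

```python
import email
from email.utils import getaddresses, parseaddr

# Hand-written raw message standing in for data[0][1] from M.fetch().
raw = (
    'From: Souleiman Benhida <souleb@example.com>\r\n'
    'To: a@example.com, Bob <b@example.com>\r\n'
    'Cc: Carol <c@example.com>\r\n'
    'Subject: Test\r\n'
    '\r\n'
    'Hello\r\n'
)
msg = email.message_from_string(raw)

# parseaddr returns (realname, address); keep only the address.
sender = parseaddr(msg['From'])[1]

# get_all returns every value of a header, or the default if it is absent.
headers = msg.get_all('To', []) + msg.get_all('Cc', [])
recipients = [addr for _name, addr in getaddresses(headers)]

print(sender)
print(recipients)
```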
My external library, https://github.com/ikvk/imap_tools, lets you work with mail instead of reading the IMAP specifications.
from imap_tools import MailBox, A
# get all emails from INBOX folder
with MailBox('imap.mail.com').login('test#mail.com', 'pwd', 'INBOX') as mailbox:
    for msg in mailbox.fetch(A(all=True)):
        print(msg.date, msg.from_, msg.to, len(msg.text or msg.html))
msg.from_, msg.to - parsed addresses, like: 'Sender#ya.ru'
I didn't like the existing solutions so I decided to make a sister library for my email sender called Red Box.
Here is how to search and process emails including getting the from address:
from redbox import EmailBox

# Create email box instance
box = EmailBox(
    host="imap.example.com",
    port=993,
    username="me#example.com",
    password="<PASSWORD>"
)

# Select an email folder
inbox = box["INBOX"]

# Search and process messages
for msg in inbox.search(unseen=True):
    # Process the message
    print(msg.from_)
    print(msg.to)
    print(msg.subject)
    print(msg.text_body)
    print(msg.html_body)
    # Flag the email as read/seen
    msg.read()
I also wrote extensive documentation for it. It also has query language that fully supports nested logical operations.
