How do I know the "From" of an email? (Using IMAP) - python

In this case I am downloading the plain text of emails that match a search criterion,
but how do I find out the @gmail.com address that sent each one?
I am using Python 3.5.4.
import imaplib
import email
mail = imaplib.IMAP4_SSL('imap.gmail.com')
# imaplib module implements connection based on IMAPv4 protocol
mail.login('myemail', 'mypassword')
mail.list()  # Lists all labels in GMail
mail.select('inbox')  # Connected to inbox.
result, data = mail.uid('search', None, '(HEADER Subject "[News]")')
# search and return uids instead
i = len(data[0].split())  # data[0] is a space-separated string
for x in range(i):
    latest_email_uid = data[0].split()[x]  # unique ids wrt label selected
    result, email_data = mail.uid('fetch', latest_email_uid, '(RFC822)')
    # fetch the whole message (RFC822) for the given ID
    raw_email = email_data[0][1]
    #From = email.utils.parseaddr(email_data['From'])
    # continue inside the same for loop as above
    raw_email_string = raw_email.decode('utf-8')
    # converts bytes to str
    email_message = email.message_from_string(raw_email_string)
    # this will loop through all the available multiparts in mail
    for part in email_message.walk():
        if part.get_content_type() == "text/plain":  # ignore attachments/html
            body = part.get_payload(decode=True)
            save_string = str("Llave de amigo" + str(x) + str("a"))
            # location on disk
            myfile = open(save_string, 'a')
            myfile.write(body.decode('utf-8'))
            # body is again bytes
            myfile.close()

This is perhaps not obvious from the documentation (assuming Python 2.7), but the email_message object acts like a dict, by implementing the __getitem__ function. Since you fetched and parsed the entirety of the message, you should be able to access it simply as:
email_message['from']
Note, this gives you a raw representation of the header, which is probably okay in a lot of cases.
You may then want to use email.utils.parseaddr to break it into constituent parts:
realname, addr = email.utils.parseaddr(email_message['from'])
email.utils.getaddresses might be useful if you then parse To or Cc headers with more than one recipient.
If you need to deal with internationalized headers in older versions of Python, email.header.decode_header and email.header.make_header can be used.
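In Python 3.6, this has changed significantly: the documentation now promotes the newer EmailMessage and email.policy API, with which header handling should be more straightforward.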
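As a rough sketch of the above, here is the header lookup applied to a small hand-written message. The addresses and the sample message are made up for illustration; in the question's loop, raw would be email_data[0][1].

```python
import email
from email.utils import getaddresses, parseaddr

# Hypothetical raw message standing in for email_data[0][1].
raw = (
    b"From: Jane Doe <jane.doe@example.com>\r\n"
    b"To: a@example.com, Bob <b@example.com>\r\n"
    b"Subject: [News] demo\r\n"
    b"\r\n"
    b"body\r\n"
)

email_message = email.message_from_bytes(raw)

# Header lookup is case-insensitive, so 'from' and 'From' both work.
realname, addr = parseaddr(email_message["From"])

# getaddresses handles headers that list several recipients.
recipients = getaddresses(email_message.get_all("To", []))

print(realname, addr)
print(recipients)
```

Here addr is the bare sender address, which is usually what you want to compare or store.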

Related

Python decoding message object of email but got 3 "From" headers

Problem
I'm trying to print email metadata using Python's imaplib. The message object I get from IMAP4.fetch is supposed to contain proper header information, but one specific email returns 3 "From" headers instead of 1.
What I've tried
I printed out the result of array = decode_header(msg.get("From")). For normal emails it returns something like [('"Agnes Lee (ADVS)" <agnes.lee@xxx.com>', None)], but one email returns 3 "From" entries as bytes: [(b'"', None), (b'\xe5\x8c\x97\xe9\x83\xa8\xe5\x8c\xba', 'utf-8'), (b'" <datacollection.xxx@xxxx.com>', None)].
Apparently array[2] rather than array[0] contains the email address I want in this case, but it is bytes and no charset is provided to decode it. I wonder why some emails produce 3 "From" entries.
My code
import email
import imaplib
from email.header import decode_header

M = imaplib.IMAP4_SSL(SERVER, 993)
M.login(EMAIL, PASSWORD)
rsp, data = M.select('INBOX')
data = int(data[0])
# imaplib expects the message number as str or bytes, not int
res, msg = M.fetch(str(data), '(RFC822)')  # fetch last email received
for response in msg:
    if isinstance(response, tuple):
        msg = email.message_from_bytes(response[1])
        # decode the From header (only the first chunk is inspected here)
        From, encoding = decode_header(msg.get("From"))[0]
        if isinstance(From, bytes):
            From = From.decode(encoding)
        print(From)
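One possible way to handle this, offered as a sketch rather than a definitive answer: decode_header returns one (value, charset) pair per chunk of the header, so a quoted, encoded display name followed by a plain address comes back as several pairs. Splitting the header with parseaddr first means the address never depends on which chunk it landed in. The helper name split_from and the example.com address below are made up for illustration.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr


def split_from(raw_from):
    """Return (display_name, address) for a raw From header value."""
    # Separate the display name from the address before any decoding.
    raw_name, address = parseaddr(raw_from)
    # make_header joins every decoded chunk using that chunk's own charset.
    name = str(make_header(decode_header(raw_name)))
    return name, address


# Hypothetical header with a base64 encoded-word as the display name.
raw = '"=?utf-8?b?5YyX6YOo5Yy6?=" <datacollection@example.com>'
name, address = split_from(raw)
print(address)
```

Chunks whose charset is None are plain ASCII, which is why no charset is reported for the part holding the address.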

python decode email from base64

Hello, I am using a Python script to fetch messages from a specific address. Everything seems to work fine, but the printed result is base64 code.
I want to decode it so that the final print shows the decoded message. Please help!
Thanks in advance.
The code used:
# Importing libraries
import imaplib, email
user = 'USER_EMAIL_ADDRESS'
password = 'USER_PASSWORD'
imap_url = 'imap.gmail.com'
# Function to get the email content part, i.e. its body part
def get_body(msg):
    if msg.is_multipart():
        return get_body(msg.get_payload(0))
    else:
        return msg.get_payload(None, True)
# Function to search for a key-value pair
def search(key, value, con):
    result, data = con.search(None, key, '"{}"'.format(value))
    return data
# Function to get the list of emails under this label
def get_emails(result_bytes):
    msgs = []  # all the email data are pushed inside a list
    for num in result_bytes[0].split():
        # BODY.PEEK[1] returns only the first body part, without its MIME headers
        typ, data = con.fetch(num, 'BODY.PEEK[1]')
        msgs.append(data)
    return msgs
# this is done to make an SSL connection with Gmail
con = imaplib.IMAP4_SSL(imap_url)
# logging the user in
con.login(user, password)
# selecting the label to check for email under
con.select('Inbox')
# fetching emails from this user "tu**h*****1@gmail.com"
msgs = get_emails(search('FROM', 'MY_ANOTHER_GMAIL_ADDRESS', con))
# Uncomment this to see what actually comes as data
# print(msgs)
# Finding the required content from our msgs
# User can make custom changes in this part to
# fetch the required content he / she needs
# printing them in the order they are displayed in your Gmail
for msg in msgs[::-1]:
    for sent in msg:
        if type(sent) is tuple:
            # encoding set as utf-8
            content = str(sent[1], 'utf-8')
            data = str(content)
            # Handling errors related to UnicodeEncodeError
            try:
                indexstart = data.find("ltr")
                data2 = data[indexstart + 5: len(data)]
                indexend = data2.find("</div>")
                # printing the required content which we need
                # to extract from our email, i.e. our body
                print(data2[0: indexend])
            except UnicodeEncodeError as e:
                pass
The result printed:
'''
aGVsbG8gd29yZCBpYW0gdGhlIG1lc3NhZ2UgZnJvbSBnbWFpbA==
'''
You could just use the base64 module to decode base64 encoded strings:
import base64
your_string = "aGVsbG8gV29ybGQ="  # the base64 encoded string you need to decode
result = base64.b64decode(your_string.encode("utf8")).decode("utf8")
print(result)
Edit: encoding changed from ASCII to utf-8
If you need to decode every base64 encoded-word (these can appear in Subject, From, and To headers that carry names), the code below might be useful. It assumes contentData holds the entire email as a string:
import re, base64
# [^?]+ keeps each match inside a single =?charset?B?data?= token
encodedParts = re.findall(r'(=\?([^?]+)\?[Bb]\?([^?]+)\?=)', contentData)
for part in encodedParts:
    encodedPart = part[0]
    charset = part[1]
    encodedContent = part[2]
    contentData = contentData.replace(encodedPart, base64.b64decode(encodedContent).decode(charset))
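An alternative worth considering, sketched here under some assumptions: if the message is fetched together with its MIME headers (for example with '(RFC822)' instead of 'BODY.PEEK[1]'), the email package can undo the base64 transfer encoding itself via get_payload(decode=True). The raw message below is hand-written for illustration and reuses the base64 string from the question; first_text is a hypothetical helper name.

```python
import email

# Hypothetical raw message; with imaplib this would be the bytes
# returned by a fetch that includes the MIME headers.
raw = (
    b"From: sender@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"aGVsbG8gd29yZCBpYW0gdGhlIG1lc3NhZ2UgZnJvbSBnbWFpbA==\r\n"
)


def first_text(msg):
    """Return the first text part of msg as str, or None if there is none."""
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            charset = part.get_content_charset() or "utf-8"
            # decode=True reverses base64 or quoted-printable as declared
            # in the part's Content-Transfer-Encoding header.
            return part.get_payload(decode=True).decode(charset, errors="replace")
    return None


body = first_text(email.message_from_bytes(raw))
print(body)
```

This avoids guessing whether a given part is base64, quoted-printable, or plain, because the part's own headers say which one applies.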

IMAP message gets UnicodeDecodeError 'utf-8' codec can't decode

After 5 hours of trying, time to get some help. Sifted through all the stackoverflow questions related to this but couldn't find the answer.
The code is a gmail parser - works for most emails but some emails cause the UnicodeDecodeError. The problem is "raw_email.decode('utf-8')" but changing it (see comments) causes a different problem down below.
# Source: https://stackoverflow.com/questions/7314942/python-imaplib-to-get-gmail-inbox-subjects-titles-and-sender-name
import datetime
import time
import email
import imaplib
import mailbox
from vars import *
import re  # to remove links from str
import string
EMAIL_ACCOUNT = 'gmail_login'
PASSWORD = 'gmail_psswd'
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login(EMAIL_ACCOUNT, PASSWORD)
mail.list()
mail.select('inbox')
result, data = mail.uid('search', None, "ALL")  # (ALL/UNSEEN)
id_list = data[0].split()
email_rev = reversed(id_list)  # Returns a list_reverseiterator, which is not a list
email_list = list(email_rev)
i = len(email_list)
todays_date = time.strftime("%m/%d/%Y")
for x in range(i):
    latest_email_uid = email_list[x]
    result, email_data = mail.uid('fetch', latest_email_uid, '(RFC822)')
    raw_email = email_data[0][1]  # Returns bytes
    raw_email_str = raw_email.decode('utf-8')  # Returns a str
    #raw_email_str = base64.b64decode(raw_email_str1) # Tried this but didn't work.
    #raw_email_str = raw_email.decode('utf-8', errors='ignore') # Tried this but caused a TypeError down where var subject is created because something there is expecting a str or byte-like
    email_message = email.message_from_string(raw_email_str)
    date_tuple = email.utils.parsedate_tz(email_message['Date'])
    date_short = f'{date_tuple[1]}/{date_tuple[2]}/{date_tuple[0]}'
    # Header Details
    if date_short == '12/23/2019':
        #if date_tuple:
        #    local_date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
        #    local_message_date = "%s" %(str(local_date.strftime("%a, %d %b %Y %H:%M:%S")))
        email_from = str(email.header.make_header(email.header.decode_header(email_message['From'])))
        subject = str(email.header.make_header(email.header.decode_header(email_message['Subject'])))
        #print(subject)
        if email_from.find('restaurants@uber.com') != -1:
            print('yay')
        # Body details
        if email_from.find('restaurants@uber.com') != -1 and subject.find('Payment Summary') != -1:
            for part in email_message.walk():
                if part.get_content_type() == "text/plain":
                    body = part.get_payload(decode=True)
                    body = body.decode("utf-8")  # Convert bytes to str
                    body = body.replace("\r\n", " ")
                    text = re.sub(r'\w+:\/{2}[\d\w-]+(\.[\d\w-]+)*(?:(?:\/[^\s/]*))*', '', body)  # removes url links
                    text2 = text.translate(str.maketrans('', '', string.punctuation))
                    body_list = re.sub(r"[^\w]", " ", text2).split()
                    print(body_list)
                    print(date_short)
                else:
                    continue
Here is an example of how to retrieve and read mail parts with imapclient and the email.* modules from the Python standard library:
from imapclient import IMAPClient
import email
from email import policy
def walk_parts(part, level=0):
    print(' ' * 4 * level + part.get_content_type())
    # do something with part content (applies encoding by default)
    # part.get_content()
    if part.is_multipart():
        for subpart in part.get_payload():
            walk_parts(subpart, level + 1)
# context manager ensures the session is cleaned up
with IMAPClient(host="your_mail_host") as client:
    client.login('user', 'password')
    # select some folder
    client.select_folder('INBOX')
    # do something with folder, e.g. search & grab unseen mails
    messages = client.search('UNSEEN')
    for uid, message_data in client.fetch(messages, 'RFC822').items():
        email_message = email.message_from_bytes(
            message_data[b'RFC822'], policy=policy.default)
        print(uid, email_message.get('From'), email_message.get('Subject'))
    # alternatively search for specific mails
    msgs = client.search(['SUBJECT', 'some subject'])
    #
    # do something with a specific mail:
    #
    # fetch a single mail with UID 12345
    raw_mails = client.fetch([12345], 'RFC822')
    # parse the mail (very expensive for big mails with attachments!)
    mail = email.message_from_bytes(
        raw_mails[12345][b'RFC822'], policy=policy.default)
    # Now you have a python object representation of the mail and can dig
    # into it. Since a mail can be composed of several subparts we have
    # to walk the subparts.
    # walk all parts at once
    for part in mail.walk():
        # do something with that part
        print(part.get_content_type())
    # or recurse yourself into sub parts until you find the interesting part
    walk_parts(mail)
See the docs for email.message.EmailMessage. There you will find everything needed to read a mail message.
Use 'ISO-8859-1' instead of 'utf-8'.
I had the same issue, and after a lot of research I realized that I simply needed to use the message_from_bytes function from email rather than message_from_string.
So in your code, simply replace:
raw_email_str = raw_email.decode('utf-8')
email_message = email.message_from_string(raw_email_str)
with
email_message = email.message_from_bytes(raw_email)
That should work like a charm :)
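To illustrate why this helps, here is a small sketch with a hand-written message whose body is ISO-8859-1 rather than UTF-8 (the message and address are made up). Decoding the whole message as UTF-8 fails, while message_from_bytes parses it and lets each part be decoded with the charset it declares.

```python
import email

# Hypothetical message: the body contains the single byte 0xE9 ("é" in
# ISO-8859-1), which is not valid UTF-8 on its own.
raw_email = (
    b"From: sender@example.com\r\n"
    b"Subject: demo\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)

# Decoding the whole message up front is what raises the error.
try:
    raw_email.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Parsing the bytes directly defers decoding to each part.
email_message = email.message_from_bytes(raw_email)
payload = email_message.get_payload(decode=True)
charset = email_message.get_content_charset() or "utf-8"
body = payload.decode(charset, errors="replace")

print(utf8_ok, body.strip())
```

The errors="replace" argument is a defensive choice for mail that declares one charset but actually uses another; drop it if you would rather see the exception.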

Python search imap email for a string

New to Python, having some trouble getting past this.
I am getting back emails from Gmail via IMAP (with starter code from https://yuji.wordpress.com/2011/06/22/python-imaplib-imap-example-with-gmail/) and want to search a specific email (which I am able to fetch) for a specific string. Something like this:
ids = data[0]
id_list = ids.split()
latest_email_id = id_list[-1]
result, data = mail.fetch(latest_email_id, "(RFC822)")
raw_email = data[0][1]
def search_raw():
    if 'gave' in raw_email:
        done = 'yes'
    else:
        done = 'no'
and it always sets done to no. Here's the output for the email (for the body section of the email):
Content-Type multipart/related;boundary=1_56D8EAE1_29AD7EA0;type="text/html"
--1_56D8EAE1_29AD7EA0
Content-Type text/html;charset="UTF-8"
Content-Transfer-Encoding base64
PEhUTUw+CiAgICAgICAgPEhFQUQ+CiAgICAgICAgICAgICAgICA8VElUTEU+PC9USVRMRT4KICAg
ICAgICA8L0hFQUQ+CiAgICAgICAgPEJPRFk+CiAgICAgICAgICAgICAgICA8UCBhbGlnbj0ibGVm
dCI+PEZPTlQgZmFjZT0iVmVyZGFuYSIgY29sb3I9IiNjYzAwMDAiIHNpemU9IjIiPlNlbnQgZnJv
bSBteSBtb2JpbGUuCiAgICAgICAgICAgICAgICA8QlI+X19fX19fX19fX19fX19fX19fX19fX19f
X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fXzwvRk9OVD48L1A+CgogICAgICAg
ICAgICAgICAgPFBSRT4KR2F2ZQoKPC9QUkU+CiAgICAgICAgPC9CT0RZPgo8L0hUTUw+Cg==
--1_56D8EAE1_29AD7EA0--
I know the issue is the html, but can't seem to figure out how to parse the email properly.
Thank you!
The text above is base64-encoded. Python has a module named base64 that can decode it.
import base64
import re
def has_gave(raw_email):
    # raw_email must hold only the base64 payload, not the headers or MIME boundaries
    email_body = base64.b64decode(raw_email)
    match = re.search(r'.*gave.*', email_body, re.IGNORECASE)
    if match:
        done = 'yes'
        print 'match found for word ', match.group()
    else:
        done = 'no'
        print 'no match found'
    return done
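For Python 3, a sketch along the same lines could let the email package find and decode the text parts, so the search runs on decoded text instead of on the base64 characters. The function name contains_word and the sample message are assumptions made for this illustration; the sample is built with email.mime so that its HTML part is base64-encoded like the one in the question.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def contains_word(raw_email, word):
    """Return True if any text part of the raw message contains word."""
    msg = email.message_from_bytes(raw_email)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)  # undoes base64
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        if word.lower() in text.lower():
            return True
    return False


# Hypothetical sample: a multipart/related message with one HTML part.
sample = MIMEMultipart("related")
sample.attach(MIMEText("<HTML><BODY><PRE>\nGave\n</PRE></BODY></HTML>", "html", "utf-8"))
raw = sample.as_bytes()

print(contains_word(raw, "gave"))
```

The comparison is case-insensitive because the body in the question contains "Gave" with a capital G.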

Python pull back plain text body from message from IMAP account

I have been working on this and am missing the mark.
I am able to connect and get the mail via imaplib.
msrv = imaplib.IMAP4(server)
msrv.login(username, password)
# Get mail
msrv.select()
#msrv.search(None, 'ALL')
typ, data = msrv.search(None, 'ALL')
# iterate through messages
for num in data[0].split():
    typ, msg_itm = msrv.fetch(num, '(RFC822)')
    print msg_itm
    print num
But what I need to do is get the body of the message as plain text and I think that works with the email parser but I am having problems getting it working.
Does anyone have a complete example I can look at?
Thanks,
To get the plain text version of the body of the email I did something like this:
xxx = data[0][1]  # puts message from list into string
xyz = email.message_from_string(xxx)  # converts string to a message instance, so multipart and walk work on it
# Finds the plain text version of the body of the message.
if xyz.get_content_maintype() == 'multipart':  # if the message is multipart we only want the text version of the body; this walks the message and gets the body
    for part in xyz.walk():
        if part.get_content_type() == "text/plain":
            body = part.get_payload(decode=True)
        else:
            continue
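On Python 3.6 or later, a shorter sketch of the same idea is possible with the newer email API: parse with policy.default and ask for the plain-text body via get_body. The helper name plain_text_body and the sample message below are made up for illustration; with imaplib, raw_email would be the bytes from the fetch response.

```python
import email
from email import policy
from email.message import EmailMessage


def plain_text_body(raw_email):
    """Return the text/plain body of a raw message, or None if it has none."""
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else None


# Hypothetical sample message with a plain and an HTML alternative.
sample = EmailMessage()
sample["From"] = "sender@example.com"
sample["Subject"] = "demo"
sample.set_content("plain version")
sample.add_alternative("<p>html version</p>", subtype="html")

print(plain_text_body(sample.as_bytes()))
```

get_content() already applies the transfer encoding and charset, so the result is a str rather than bytes.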
Here is a minimal example from the docs:
import getpass, imaplib
M = imaplib.IMAP4()
M.login(getpass.getuser(), getpass.getpass())
M.select()
typ, data = M.search(None, 'ALL')
for num in data[0].split():
    typ, data = M.fetch(num, '(RFC822)')
    print 'Message %s\n%s\n' % (num, data[0][1])
M.close()
M.logout()
In this case, data[0][1] contains the full raw message (headers and body), because the fetch requests RFC822.
