Convert message object to string (Gmail api and python) - python

I am trying to save only a specific part of an email using Python. I have the variables service, userId and msg_id, but I don't know how to convert the variable plainText to a string so that I can take the part I want in the get_info function.
import base64
import email

def get_message(service, user_id, msg_id):
    try:
        #get the message in raw format
        message = service.users().messages().get(userId=user_id, id=msg_id, format='raw').execute()
        #decode the base64url-encoded raw message into an ASCII string
        msg_str = base64.urlsafe_b64decode(message['raw'].encode("ASCII")).decode("ASCII")
        #put it in an email object
        mime_msg = email.message_from_string(msg_str)
        #get the plain and html text from the payload
        plainText, htmlText = mime_msg.get_payload()
        print(get_info(plainText, "start", "finish"))
    except Exception as error:
        print('An error occurred in the get_message function: %s' % error)

def get_info(plainText, start, end):
    usefullText = plainText.split(start)[2]
    usefullText = usefullText.split(end)[0]
    return usefullText
After running the code, I get the following error message:
An error occurred in the get_message function: 'Message' object has no attribute 'split'

Answer:
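The method get_payload() does exist on email.message.Message, but for a multipart message it returns a list of Message objects (one per part), not strings. So plainText is a Message object, which has no split() method. You need to take the text out of the part itself, for example with the part's own get_payload() (as_string() also works, but its result includes the part's headers).
Code Fix:
The code inside your try block needs to be updated, from:
#get the plain and html text from the payload
plainText, htmlText = mime_msg.get_payload()
to:
#get the plain and html parts from the payload (each one is a Message object)
plainPart, htmlPart = mime_msg.get_payload()
#get the text of the plain part as a string
plainText = plainPart.get_payload()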
References:
email.message: Representing an email message — Python 3.8.5 documentation
as_string() method
email.parser: Parsing email messages — Python 3.8.5 documentation
message_from_string() method
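A minimal runnable sketch of the point above, assuming (as the question does) that the first part of the multipart message is the text/plain one. The helper name plain_text_of is hypothetical. It uses get_payload(decode=True), which also undoes any quoted-printable or base64 transfer encoding of the part, and then decodes with the part's declared charset (falling back to UTF-8).

```python
import email


def plain_text_of(mime_msg):
    """Return the text/plain body of a parsed message as a str (hypothetical helper)."""
    if mime_msg.is_multipart():
        # For multipart messages get_payload(0) returns the first Message subpart
        plain_part = mime_msg.get_payload(0)
    else:
        plain_part = mime_msg
    # decode=True undoes the Content-Transfer-Encoding and returns bytes
    data = plain_part.get_payload(decode=True)
    return data.decode(plain_part.get_content_charset() or "utf-8")
```

With this, get_info(plain_text_of(mime_msg), "start", "finish") receives a str rather than a Message object.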

Related

Gmail API Pull out plain text email body

I am trying to decode an email sent to me from a specific source. The email looks like a CSS-styled box that contains the info I need. When I run it through the function provided by Google, I get what appears to be CSS code, and I cannot extract the information I need; the content type (from get_content_maintype()) is "text". But if I forward the same email to myself and run the exact same function on it, the content type is "multipart", and I am able to extract the plain text of the CSS body and grab the info I need. I think this is because, when I forward it to myself, it contains plain text at the top (showing the forward info) as well as the CSS body.
So my question is, how can I extract the same plain text I get from the CSS body after I forward the email to myself, without forwarding the email to myself? Below is the function I am using:
import base64
import email

def get_message(service, user_id, msg_id):
    try:
        # Makes the connection and GETS the emails in RAW format.
        message = service.users().messages().get(userId=user_id, id=msg_id, format='raw').execute()
        # Decodes the base64url-encoded RAW message into bytes
        msg_raw = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
        # Parses the bytes into an email Message object
        msg_str = email.message_from_bytes(msg_raw)
        # This line checks what the content is, if multipart (plaintext and html) or single part
        content_types = msg_str.get_content_maintype()
        print(content_types)
        if content_types == 'multipart':
            # Part1 is plaintext
            part1, part2 = msg_str.get_payload()
            raw_email = part1.get_payload()
            remove_char = ["|", "=20", "=C2=A0"]
            for i in remove_char:
                raw_email = raw_email.replace(i, "")
            raw_email = "".join([s for s in raw_email.strip().splitlines(True) if s.strip()])
            return str(raw_email)
        else:
            return msg_str.get_payload()
    except Exception as error:
        print('An error has occurred during the get_message function: %s' % error)
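No answer is included for this question here. A minimal sketch, not from the original thread, assuming the single-part message is text/html: get_payload(decode=True) undoes the quoted-printable encoding (so "=20" and similar sequences need no manual removal), and the standard library's html.parser collects the visible text while skipping <style> and <script> contents. The class and function names are hypothetical.

```python
import email
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <style> and <script> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside <style>/<script>
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_body_to_text(msg):
    """Return the text of a single-part text/html message (hypothetical helper)."""
    # decode=True undoes quoted-printable/base64 and returns bytes
    raw = msg.get_payload(decode=True)
    html = raw.decode(msg.get_content_charset() or "utf-8", errors="replace")
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.chunks)
```

Each separate run of text (for example each table cell) ends up on its own line, which usually makes it easy to split out the field you need.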

How to get message body (or bodies) from Message object returned by email.parser.Parser?

I'm reading the Python 3 docs here and I must be blind or something... Where does it say how to get the body of a message?
What I want to do is to open a message and perform some loop in text-based bodies of the message, skipping binary attachments. Pseudocode:
def read_all_bodies(local_email_file):
    email = Parser().parse(open(local_email_file, 'r'))
    for pseudo_body in email.pseudo_bodies:
        if pseudo_body.pseudo_is_binary():
            continue
        # Pseudo-parse the body here
How do I do that? Is Message even the correct class for this? Isn't it only for headers?
This is best done using two functions:
One to open and parse the file. If the message is single-part, get_payload() returns the body as a string; if the message is multipart, it returns a list of sub-messages.
A second one to handle each text payload.
This is how it can be done:
from email.parser import Parser

def parse_file_bodies(filename):
    # Opens file and parses email (the with block closes the file again)
    with open(filename, 'r') as f:
        email = Parser().parse(f)
    # For multipart emails, all bodies will be handled in a loop;
    # walk() also descends into nested multipart parts
    if email.is_multipart():
        for msg in email.walk():
            if not msg.is_multipart():
                parse_single_body(msg)
    else:
        # Single part message is passed directly
        parse_single_body(email)

def parse_single_body(email):
    # Skip binary parts such as attachments
    if email.get_content_maintype() != 'text':
        return
    payload = email.get_payload(decode=True)
    # The payload is binary. It must be converted to a
    # python string depending on the input charset,
    # which may vary from message to message
    charset = email.get_content_charset() or 'utf-8'
    try:
        text = payload.decode(charset)
        # Now you can work with text as with any other string:
        ...
    except (UnicodeDecodeError, LookupError):
        print("Error: cannot parse message as %s" % charset)
        return
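A different option from the two-function approach above, sketched under the assumption of Python 3.6 or later: parsing with email.policy.default makes the parser return EmailMessage objects, whose get_content() handles both the transfer encoding and the charset, and whose is_attachment() flags parts sent as attachments. The generator name read_text_bodies is hypothetical.

```python
from email import policy
from email.parser import BytesParser


def read_text_bodies(raw_bytes):
    """Yield (content_type, text) for each text body, skipping binary parts and attachments."""
    # policy.default makes the parser build EmailMessage objects
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no body of their own
        if part.get_content_maintype() != "text":
            continue  # skip binary parts (images, PDFs, ...)
        if part.is_attachment():
            continue  # skip text files that were attached
        # get_content() decodes the transfer encoding and the charset
        yield part.get_content_type(), part.get_content()
```

For a file on disk, pass the bytes read with open(path, "rb").read().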

Get Mime Message isn't returning a base 64 decoded version? (Gmail API)

In my script I need to extract a set of emails that match some query. I decided to use the Gmail API Python client for this. My understanding was that GetMimeMessage() was supposed to return a set of base64-decoded messages. Here is my code:
def GmailInput():
    credentials = get_credentials()
    http = credentials.authorize(httplib2.Http())
    service = discovery.build('gmail', 'v1', http=http)
    defaultList = ListMessagesMatchingQuery(service, 'me', 'subject:infringement label:unread ')
    print(defaultList)
    for msg in defaultList:
        currentMSG = GetMimeMessage(service, 'me', msg['id'])
        # ....then I parse the text of the emails and extract some things
The problem is that I am unable to actually parse the message body because GetMimeMessage is not returning a base64-decoded message, so what I am actually parsing ends up being completely unreadable by humans.
I find this peculiar because GetMimeMessage (copied below for convenience) literally does a URL-safe base64 decode of the message data. Does anyone have any suggestions? I'm really stumped on this.
def GetMimeMessage(service, user_id, msg_id):
    """Get a Message and use it to create a MIME Message.
    Args:
        service: Authorized Gmail API service instance.
        user_id: User's email address. The special value "me"
        can be used to indicate the authenticated user.
        msg_id: The ID of the Message required.
    Returns:
        A MIME Message, consisting of data from Message.
    """
    try:
        message = service.users().messages().get(userId=user_id, id=msg_id, format='raw').execute()
        print ('Message snippet: %s' % message['snippet'])
        msg_str = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
        mime_msg = email.message_from_string(msg_str)
        return mime_msg
    except errors.HttpError as error:
        print ('An error occurred: %s' % error)
You can use users.messages.get. This request requires authorization with at least one of the Gmail OAuth scopes listed in its reference documentation.
HTTP request
GET https://www.googleapis.com/gmail/v1/users/userId/messages/id
import base64
import email
from googleapiclient import errors

def GetMessage(service, user_id, msg_id):
    """Get a Message with given ID.
    Args:
        service: Authorized Gmail API service instance.
        user_id: User's email address. The special value "me"
        can be used to indicate the authenticated user.
        msg_id: The ID of the Message required.
    Returns:
        A Message.
    """
    try:
        message = service.users().messages().get(userId=user_id, id=msg_id).execute()
        print('Message snippet: %s' % message['snippet'])
        return message
    except errors.HttpError as error:
        print('An error occurred: %s' % error)

def GetMimeMessage(service, user_id, msg_id):
    """Get a Message and use it to create a MIME Message.
    Args:
        service: Authorized Gmail API service instance.
        user_id: User's email address. The special value "me"
        can be used to indicate the authenticated user.
        msg_id: The ID of the Message required.
    Returns:
        A MIME Message, consisting of data from Message.
    """
    try:
        message = service.users().messages().get(userId=user_id, id=msg_id,
                                                 format='raw').execute()
        print('Message snippet: %s' % message['snippet'])
        msg_bytes = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
        # In Python 3 the decoded value is bytes, so parse it with message_from_bytes
        mime_msg = email.message_from_bytes(msg_bytes)
        return mime_msg
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
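The answer above points to the API reference but does not explain why the body can still look unreadable. The base64url decode in GetMimeMessage only removes the encoding the API applies to the message as a whole; each MIME part inside can carry its own Content-Transfer-Encoding (often base64 or quoted-printable), which get_payload(decode=True) removes. A minimal sketch, assuming api_message is the dict returned by messages().get(..., format='raw').execute() as in the question; text_parts_from_raw is a hypothetical name.

```python
import base64
import email


def text_parts_from_raw(api_message):
    """Decode a Gmail API message fetched with format='raw' into a list of text bodies."""
    # Step 1: undo the base64url encoding the API applies to the whole message
    raw_bytes = base64.urlsafe_b64decode(api_message["raw"].encode("ASCII"))
    mime_msg = email.message_from_bytes(raw_bytes)

    texts = []
    for part in mime_msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and binary parts
        # Step 2: undo the part's own Content-Transfer-Encoding
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        texts.append(payload.decode(charset, errors="replace"))
    return texts
```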

GMail API Python and Encoding/Decoding

I'm trying to read my Gmail messages with the API provided by Google, using Python 3.4.
I'm using this function that is provided by Google at this link:
def GetMimeMessage(service, user_id, msg_id):
    try:
        message = service.users().messages().get(userId=user_id, id=msg_id,
                                                 format='raw').execute()
        print 'Message snippet: %s' % message['snippet']
        msg_str = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
        mime_msg = email.message_from_string(msg_str)
        return mime_msg
    except errors.HttpError, error:
        print 'An error occurred: %s' % error
However if I use this function as it is I get the following error:
TypeError: initial_value must be str or None, not bytes
So I changed the function a bit:
def GetMimeMessage(service, user_id, msg_id):
    try:
        message = service.users().messages().get(userId=user_id, id=msg_id,
                                                 format='raw').execute()
        #print ('Message snippet: %s' % message['snippet'])
        msg_str = base64.urlsafe_b64decode(message['raw'].encode('utf-8','ignore'))
        print(msg_str)
        mime_msg = email.message_from_string(msg_str.decode('utf-8','ignore'))
        return mime_msg
    except errors.HttpError:
        print('An error occurred')
If I don't add the 'ignore' argument, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xeb in position 2214: invalid continuation byte
If I use the 'ignore' argument, the content of the mail (for example, HTML text) contains some weird characters, for example:
=09=09body=2C#bodyTable=2C#bodyCell{
=09=09=09height:100% !important;
=09=09=09margin:0;
=09=09=09padding:0;
=09=09=09width:100% !important;
=09=09}
My problem seems very similar to this one but, given that I'm not a Python expert and I need to use the GMail API, I cannot see how to fix it.
Any idea?
It appears that the mail contents are quoted-printable encoded.
You can use the quopri module to handle it: https://docs.python.org/2/library/quopri.html
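A minimal sketch using one line of the output from the question: quopri.decodestring() (the module also exists in Python 3) turns each =XX escape back into the byte it stands for, so =09 becomes a tab and =2C a comma. The result is still bytes and has to be decoded with the part's charset afterwards.

```python
import quopri

# One line copied from the garbled output in the question
encoded = b"=09=09body=2C#bodyTable=2C#bodyCell{"

# decodestring() replaces each =XX escape with the corresponding byte
decoded = quopri.decodestring(encoded)

# The bytes still need decoding with the message part's charset
print(decoded.decode("utf-8"))
```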
As Arkanus suggested, the problem was related to the quoted-printable encoding.
Instead of using quopri, I used the decode argument of get_payload(), with code similar to this one.
The first error was caused by the fact that I'm using Python 3.4: there, base64.urlsafe_b64decode() returns bytes, while email.message_from_string() expects a str (email.message_from_bytes() accepts bytes directly). With Python 2.7, the original function works fine.

How to receive mail using python

I would like to receive email using python. So far I have been able to get the subject but not the body. Here is the code I have been using:
import poplib
from email import parser

pop_conn = poplib.POP3_SSL('pop.gmail.com')
pop_conn.user('myusername')
pop_conn.pass_('mypassword')
#Get messages from server:
messages = [pop_conn.retr(i) for i in range(1, len(pop_conn.list()[1]) + 1)]
# Concat message pieces:
messages = ["\n".join(mssg[1]) for mssg in messages]
#Parse message into an email object:
messages = [parser.Parser().parsestr(mssg) for mssg in messages]
for message in messages:
    print message['subject']
    print message['body']
pop_conn.quit()
My issue is that when I run this code it properly returns the Subject but not the body. So if I send an email with the subject "Tester" and the body "This is a test message" it looks like this in IDLE.
>>>>Tester
>>>>None
So it appears to read the subject correctly but not the body. I think the problem is in the parsing method, right? The issue is that I don't know enough about these libraries to figure out how to change it so that it returns both a subject and a body.
The message object does not have a body field; you will need to walk through its parts, like this:
for part in message.walk():
    if part.get_content_type() == 'text/plain':
        body = part.get_payload(decode=True)
The walk() method iterates depth-first through the parts of the email, and you are looking for the parts whose content type is text/plain (or text/html, if you want the HTML version). One email can contain both (when the message's content type is multipart/alternative).
The email parser returns an email.message.Message object, which does not contain a body key, as you'll see if you run
print message.keys()
What you want is the get_payload() method:
for message in messages:
    print message['subject']
    print message.get_payload()
pop_conn.quit()
But this gets complicated when it comes to multi-part messages; get_payload() returns a list of parts, each of which is a Message object. You can get a particular part of the multipart message by using get_payload(i), which returns the ith part, raises an IndexError if i is out of range, or raises a TypeError if the message is not multipart.
As Gustavo Costa De Oliveir points out, you can use the walk() method to get the parts in order -- it does a depth-first traversal of the parts and subparts of the message.
There's more about the email.message.Message class at http://docs.python.org/library/email.message.html#email.message.Message.
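A small Python 3 sketch of the two methods described above, using a made-up message: a multipart/mixed container holding a multipart/alternative (plain and HTML) and a PDF attachment. walk() yields the message itself first and then every part depth-first, while get_payload(i) returns only the i-th direct subpart.

```python
import email

# Made-up message: multipart/mixed wrapping multipart/alternative + a PDF attachment
raw = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain

This is a test message
--inner
Content-Type: text/html

<p>This is a test message</p>
--inner--
--outer
Content-Type: application/pdf
Content-Disposition: attachment; filename="a.pdf"

JVBERi0=
--outer--
"""

message = email.message_from_string(raw)

# walk() yields the message itself, then each part depth-first
content_types = [part.get_content_type() for part in message.walk()]

# get_payload(0) returns the first direct subpart (here the multipart/alternative)
first = message.get_payload(0)
plain = first.get_payload(0).get_payload()
```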
It is also good to return the data in the correct encoding if the message contains multilingual content:
# fall back to UTF-8 when the part declares no charset
charset = part.get_content_charset() or 'utf-8'
content = part.get_payload(decode=True)
content = content.decode(charset).encode('utf-8')
Here is how I solved the problem using Python 3's new capabilities:
import imaplib
import email

mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login(username, password)  # username and password are defined elsewhere
mail.select(readonly=True)  # refresh inbox
status, message_ids = mail.search(None, 'ALL')  # get all emails
for message_id in message_ids[0].split():  # iterate over all message ids
    # for every id get the actual email
    status, message_data = mail.fetch(message_id, '(RFC822)')
    actual_message = email.message_from_bytes(message_data[0][1])
    # extract the needed fields
    email_date = actual_message["Date"]
    subject = actual_message["Subject"]
    message_body = get_message_body(actual_message)
Now, get_message_body is actually pretty tricky to write because of the MIME format. I used the function suggested in this answer.
This particular example works with Gmail, but IMAP is a standard protocol, so it should work for other email providers as well, possibly with minor changes.
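The function referenced above is not reproduced here. As a hypothetical stand-in (not the linked answer's code), a minimal get_message_body could return the first text/plain part that is not an attachment, decoded with its declared charset. It works on the Message object produced by email.message_from_bytes() in the snippet above; get_content_disposition() requires Python 3.5 or later.

```python
import email


def get_message_body(message):
    """Return the first non-attachment text/plain body as str, or "" (hypothetical helper)."""
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        # Skip text files that were attached rather than written as the body
        if part.get_content_disposition() == "attachment":
            continue
        # decode=True undoes quoted-printable/base64; then apply the charset
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return ""
```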
If you want to use IMAP4, you can use the outlook Python library, available here: https://github.com/awangga/outlook
To retrieve unread email from your inbox:
import outlook
mail = outlook.Outlook()
mail.login('emailaccount@live.com','yourpassword')
mail.inbox()
print mail.unread()
To retrieve email elements:
print mail.mailbody()
print mail.mailsubject()
print mail.mailfrom()
print mail.mailto()
