Gmail API Pull out plain text email body

I am trying to decode an email sent to me from a specific source. The email looks like a CSS-styled box that contains the information I need. When I run it through the function provided by Google, I get what appears to be CSS/HTML markup from which I cannot extract the information, and get_content_maintype() returns "text". But if I forward the same email to myself and run the exact same function on it, get_content_maintype() returns "multipart", and I am able to extract the plain text of the body and grab the information I need. I think this is because the forwarded copy contains plain text at the top (the forwarding header) as well as the styled body.
So my question is: how can I extract the same plain text that I get after forwarding the email to myself, without forwarding it? Below is the function I am using:
import base64
import email

def get_message(service, user_id, msg_id):
    try:
        # Makes the connection and GETS the email in RAW format.
        message = service.users().messages().get(userId=user_id, id=msg_id, format='raw').execute()
        # Decodes the URL-safe base64 'raw' field into bytes
        msg_raw = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
        # Parses the bytes into an email.message.Message object
        msg_str = email.message_from_bytes(msg_raw)
        # This line checks what the content is, if multipart (plaintext and html) or single part
        content_types = msg_str.get_content_maintype()
        print(content_types)
        if content_types == 'multipart':
            # Part1 is plaintext
            part1, part2 = msg_str.get_payload()
            raw_email = part1.get_payload()
            remove_char = ["|", "=20", "=C2=A0"]
            for i in remove_char:
                raw_email = raw_email.replace(i, "")
            raw_email = "".join([s for s in raw_email.strip().splitlines(True) if s.strip()])
            return str(raw_email)
        else:
            return msg_str.get_payload()
    except Exception:
        print('An error has occurred during the get_message function.')

Related

Trouble decoding email using gmail API

I am having a hard time getting Python to read my emails. I am trying to pull information from the body of the email. The problem is that when I run the code on the email directly from the original source, even after running it through the base64 decoder, it still returns unreadable base64 data. But if I forward that same email to myself, so that the code runs over the forwarded copy, it works perfectly and decodes the entire email. Here is the function I am using to get the email body. I have noticed that get_content_maintype() returns "text" when the email comes directly from the source, but "multipart" when I forward it to myself. Any help is greatly appreciated. I am at a loss for where to go from here.
Thanks in advance!
def get_message(service, user_id, msg_id):
    try:
        # Makes the connection and GETS the email in RAW format.
        message = service.users().messages().get(userId=user_id, id=msg_id, format='raw').execute()
        # Decodes the URL-safe base64 'raw' field into bytes
        msg_raw = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
        # Parses the bytes into an email.message.Message object
        msg_str = email.message_from_bytes(msg_raw)
        # This line checks what the content is, if multipart (plaintext and html) or single part
        content_types = msg_str.get_content_maintype()
        print(content_types)
        if content_types == 'multipart':
            # Part1 is plaintext and part2 is html text
            part1, part2 = msg_str.get_payload()
            raw_email = part1.get_payload()
            remove_char = ["|", "=20", "=C2=A0"]
            for i in remove_char:
                raw_email = raw_email.replace(i, "")
            raw_email = "".join([s for s in raw_email.strip().splitlines(True) if s.strip()])
            print('Inside correct part')
            print(raw_email)
            return str(raw_email)
        else:
            print('Inside the Else')
            print(msg_str.get_payload())
            return msg_str.get_payload()
    except Exception:
        print('An error has occurred during the get_message function.')
Edit: Here is what this function prints out when looking over this from the original source:
text
Inside the Else
PCFET0NUWVBFIGh0bWwgUFVCTElDICItLy93M2MvL2R0ZCB4aHRtbCAxLjAgdHJhbnNpdGlvbmFs
Ly9lbiIgImh0dHA6Ly93d3cudzMub3JnL3RyL3hodG1sMS9kdGQveGh0bWwxLXRyYW5zaXRpb25h
bC5kdGQiPjxodG1sIHN0eWxlPSJtYXJnaW46IDA7cGFkZGluZzogMDtmb250LWZhbWlseTogJ0hl
bHZldGljYSBOZXVlJywgJ0hlbHZldGljYScsIEhlbHZldGljYSwgQXJpYWwsIHNhbnMtc2VyaWY7
Plus about 100 lines of stuff like this.
Here is what it prints out from the same email if I forward it to myself:
multipart
Inside correct part
---------- Forwarded message ---------
From: <originalSource@email.com>
Date: Wed, Jun 10, 2020 at 10:34 AM
Subject: You added cash to your Account
To: <xxxxxxxxxx@gmail.com>
[image: card] Account ending in XXXX
Hi, XXXX XXXX,
Success!
You added cash with

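Solution
base64.urlsafe_b64decode returns bytes, not a string, so email.message_from_bytes is already the right call; in Python 3, email.message_from_string raises a TypeError when it is given bytes. The unreadable output comes from the last step instead. The original email is a single text/html part whose Content-Transfer-Encoding is base64 (the printed text starting with PCFET0NUWVBF is the base64 form of <!DOCTYPE), and get_payload() without arguments returns the payload still encoded. Therefore you should change this
return msg_str.get_payload()
for this
payload = msg_str.get_payload(decode=True)
return payload.decode(msg_str.get_content_charset() or 'utf-8')
The result is the HTML source of the email, so the markup still has to be stripped if you want plain text only. Check out this documentation example in Python for more info regarding this.
I hope this has helped you. Let me know if you need anything else or if anything was unclear. :)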
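Because get_payload() without decode=True returns the body with its transfer encoding still applied, one way to get readable text from both the original and the forwarded email is sketched below. This is only a sketch for Python 3.6+: the names extract_text and _TextExtractor are hypothetical helpers (not part of the Gmail API), the input is assumed to be the 'raw' string returned by messages().get(format='raw'), and the tag stripping is deliberately crude.

```python
import base64
import email
from email import policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the visible text of an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside <style>/<script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not CSS or JavaScript.
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def extract_text(raw_b64url):
    """Return readable text for a Gmail message given its 'raw' field."""
    msg_bytes = base64.urlsafe_b64decode(raw_b64url)
    msg = email.message_from_bytes(msg_bytes, policy=policy.default)
    # Prefer a text/plain part; fall back to text/html if that is all there is.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    # get_content() undoes the transfer encoding (base64, quoted-printable)
    # and decodes the bytes using the part's declared charset.
    content = part.get_content()
    if part.get_content_type() == "text/html":
        extractor = _TextExtractor()
        extractor.feed(content)
        extractor.close()
        return "\n".join(extractor.chunks)
    return content
```

Inside get_message this could be called as extract_text(message['raw']). For anything beyond simple markup, a dedicated HTML library would likely give cleaner text than this parser.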

python IMAP content of email contains a string

I am able to log in to a Gmail account with Python's imaplib:
imap = imaplib.IMAP4_SSL('imap.gmail.com')
imap.login(myDict["emailUsername"], myDict["emailPassword"])
imap.select(mailbox='inbox', readonly=False)
resp, items = imap.search(None, 'All')
email_ids = items[0].split()
latest_email_id = email_ids[-1]
resp, data = imap.fetch(latest_email_id, "(UID)")
print ("resp= ", resp, " data=", data)
#msg_uid = parse_uid(data[0])
match = pattern_uid.match(data[0].decode("utf-8"))
#print ("match= ", match)
msg_uid = match.group('uid')
I need to check that the last email, identified by this UID, contains a certain string (XYZ). I am NOT looking at the Subject header but at the content of the email. How can I do that?

There are a couple of ways you could go:
1. Fetch the message and walk through the text body parts looking for your string; there is an example at "Finding links in an emails body with Python".
2. Get the server to do the search by sending 'latest_email_id' and your search criteria back to the server in a UID SEARCH command. For Gmail, you can even use the X-GM-RAW attribute to get the same search syntax supported by the Gmail web interface. See https://developers.google.com/gmail/imap/imap-extensions for details.
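The first option (fetch the message and walk its text parts) could look roughly like the sketch below. It assumes Python 3 and that the whole message was fetched with "(RFC822)"; body_contains is a hypothetical helper name, and a part that declares a charset unknown to Python would make get_content() raise.

```python
import email
from email import policy


def body_contains(raw_message, needle):
    """Return True if any text part of the message contains `needle`."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        # Skip multipart containers and non-text parts (images, PDFs, ...).
        if part.get_content_maintype() != "text":
            continue
        # get_content() returns the decoded text of this part.
        if needle in part.get_content():
            return True
    return False
```

With the connection from the question, this might be used as resp, data = imap.fetch(latest_email_id, "(RFC822)") followed by body_contains(data[0][1], "XYZ"). Headers such as Subject are not searched, only the text parts.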

Python gmail imap - get text of email body not in a single string

I've been trying to figure this out and to find a solution here on Stack Overflow and elsewhere, but I can't get it (not enough experience in Python, I guess), so please help:
I'm using the imaplib and email libraries in Python to get emails from my Gmail account. I can log in and find the mail I want, and I have implemented handling for multipart emails, but the body text I get back (via the get_payload method) is a single string. I would like to get the body as it was sent, with each line as a separate string stored in a list. Here is the relevant part of my code:
mail = imaplib.IMAP4_SSL('imap.gmail.com', 993)
mail.login('mymail@gmail.com', 'password')
mail.select("inbox")
date = (datetime.datetime.now() - datetime.timedelta(days=1)).strftime("%d-%b-%Y")
result, data = mail.uid('search', 'UNSEEN', '(SENTSINCE {date} FROM "someone@gmail.com")'.format(date=date))
latest_email_uid = data[0].split()[-1]
result, data = mail.uid('fetch', latest_email_uid, '(RFC822)')
raw_email = data[0][1]
email_message = email.message_from_string(raw_email)
text = ''
if email_message.is_multipart():
    html = None
    for part in email_message.get_payload():
        if part.get_content_charset() is None:
            text = part.get_payload(decode=True)
            continue
        charset = part.get_content_charset()
        if part.get_content_type() == 'text/plain':
            text = unicode(part.get_payload(decode=True), str(charset), "ignore").encode('windows-1250', 'replace')
        if part.get_content_type() == 'text/html':
            html = unicode(part.get_payload(decode=True), str(charset), "ignore").encode('windows-1250', 'replace')
    if text is not None:
        text.strip()
    else:
        html.strip()
else:
    text = unicode(email_message.get_payload(decode=True), email_message.get_content_charset(), 'ignore').encode('windows-1250', 'replace')
    text.strip()
print text
There is more code before this, and the required libraries are imported at the top, so there is no need to check that. I've tried declaring text = [], and I've tried not calling strip() on text or html, but I just can't get it. Is there a simple way to get the text of the body as it was sent, with each line as its own string? I feel that it's simple, but I don't get it.
Thanks in advance!!
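One possible approach, sketched here for Python 3 rather than the Python 2 code above: decode the plain-text part to a string and split it with splitlines(). The function name body_lines is hypothetical, and the input is assumed to be the bytes returned by an "(RFC822)" fetch.

```python
import email
from email import policy


def body_lines(raw_message):
    """Return the plain-text body as a list with one string per line."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return []
    # splitlines() handles \n, \r\n and \r and drops the line terminators.
    return part.get_content().splitlines()
```

Empty lines in the body come back as empty strings, so the list keeps the layout of the message as it was sent.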

IMAP get sender name and body text?

I am using this code:
import imaplib
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login(myusername, mypassword)
mail.list()
# Out: list of "folders" aka labels in gmail.
mail.select("inbox") # connect to inbox.
result, data = mail.search(None, "ALL")
ids = data[0] # data is a list.
id_list = ids.split() # ids is a space separated string
latest_email_id = id_list[-1] # get the latest
result, data = mail.fetch(latest_email_id, "(RFC822)") # fetch the email body (RFC822) for the given ID
raw_email = data[0][1] # here's the body, which is raw text of the whole email
# including headers and alternate payloads
print raw_email
and it works, except that when I print raw_email it returns a lot of extra information. How can I parse out that extra information and get just the From header and the body text?

Python's email package is probably a good place to start.
import email
msg = email.message_from_string(raw_email)
print msg['From']
print msg.get_payload(decode=True)
That should do what you ask, though when an email has multiple parts (attachments, text and HTML versions of the body, etc.) things are a bit more complicated.
In that case, msg.is_multipart() will return True and msg.get_payload() will return a list instead of a string. There's a lot more information in the email.message documentation.
Alternately, rather than parsing the raw RFC822-formatted message - which could be very large, if the email contains attachments - you could just ask the IMAP server for the information you want. Changing your mail.fetch line to:
mail.fetch(latest_email_id, "(BODY[HEADER.FIELDS (FROM)])")
Would just request (and return) the From line of the email from the server. Likewise setting the second parameter to "(UID BODY[TEXT])" would return the body of the email. RFC2060 has a list of parameters that should be valid here.
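For Python 3, where the IMAP fetch returns bytes, the same idea can be sketched with the newer email API, which also takes care of the multipart case. This is a sketch under assumptions: sender_and_body is a hypothetical helper name, and the plain-text version of the body is preferred over the HTML one.

```python
import email
from email import policy


def sender_and_body(raw_message):
    """Return (From header, body text) for an RFC 822 message given as bytes."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    # get_body() picks the best matching part, even inside multipart messages.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    sender = msg["From"]
    return (str(sender) if sender is not None else "", body)
```

With the code from the question this would be called as sender_and_body(data[0][1]).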

A high-level IMAP library: https://github.com/ikvk/imap_tools (I am the author)
from imap_tools import MailBox, A
with MailBox('imap.mail.com').login('test@mail.com', 'password', 'INBOX') as mailbox:
    for msg in mailbox.fetch(A(all=True)):
        sender = msg.from_
        body = msg.text or msg.html

Alternatively, you can use Red Box (I'm the author):
from redbox import EmailBox
# Create email box instance
box = EmailBox(
    host="imap.example.com",
    port=993,
    username="me@example.com",
    password="<PASSWORD>"
)
# Select an email folder
inbox = box["INBOX"]
# Search and process messages
for msg in inbox.search(all=True):
    # Process the message
    print(msg.from_)
    print(msg.to)
    print(msg.subject)
    print(msg.text_body)
    print(msg.html_body)
To install:
pip install redbox

How to receive mail using python

I would like to receive email using python. So far I have been able to get the subject but not the body. Here is the code I have been using:
import poplib
from email import parser
pop_conn = poplib.POP3_SSL('pop.gmail.com')
pop_conn.user('myusername')
pop_conn.pass_('mypassword')
# Get messages from server:
messages = [pop_conn.retr(i) for i in range(1, len(pop_conn.list()[1]) + 1)]
# Concatenate message pieces:
messages = ["\n".join(mssg[1]) for mssg in messages]
# Parse each message into an email object:
messages = [parser.Parser().parsestr(mssg) for mssg in messages]
for message in messages:
    print message['subject']
    print message['body']
pop_conn.quit()
My issue is that when I run this code it properly returns the Subject but not the body. So if I send an email with the subject "Tester" and the body "This is a test message" it looks like this in IDLE.
>>>>Tester
>>>>None
So it appears to read the subject correctly but not the body. I think the problem is in the parsing step, right? The issue is that I don't know enough about these libraries to figure out how to change it so that it returns both a subject and a body.

The message object does not have a body key; you will need to go through its parts, like this:
for part in message.walk():
    if part.get_content_maintype() == 'text':
        body = part.get_payload(decode=True)
The walk() function iterates depth-first through the parts of the email, and you are looking for the parts that hold text. The body is usually text/plain or text/html, and sometimes one e-mail contains both (when the message content type is multipart/alternative).
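To keep the plain-text and HTML versions apart while walking the message, something along these lines could be used. It is a sketch for Python 3: text_parts is a hypothetical helper name, and falling back to UTF-8 when a part declares no charset is an assumption.

```python
import email


def text_parts(message):
    """Map content type -> decoded text for the first text/plain and text/html parts."""
    found = {}
    for part in message.walk():
        # Multipart containers have no payload of their own; skip them.
        if part.is_multipart():
            continue
        content_type = part.get_content_type()
        if content_type in ("text/plain", "text/html") and content_type not in found:
            payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
            charset = part.get_content_charset() or "utf-8"  # assumed fallback
            found[content_type] = payload.decode(charset, errors="replace")
    return found
```

For a multipart/alternative message, text_parts(message).get("text/plain") then gives the plain version and text_parts(message).get("text/html") the HTML one.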

The email parser returns an email.message.Message object, which does not contain a body key, as you'll see if you run
print message.keys()
What you want is the get_payload() method:
for message in messages:
    print message['subject']
    print message.get_payload()
pop_conn.quit()
But this gets complicated when it comes to multi-part messages; get_payload() returns a list of parts, each of which is a Message object. You can get a particular part of the multipart message by using get_payload(i), which returns the ith part, raises an IndexError if i is out of range, or raises a TypeError if the message is not multipart.
As Gustavo Costa De Oliveir points out, you can use the walk() method to get the parts in order -- it does a depth-first traversal of the parts and subparts of the message.
There's more about the email.message.Message class at http://docs.python.org/library/email.message.html#email.message.Message.
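The difference between single-part and multipart messages can be seen with two small hand-written messages. This is an illustrative Python 3 sketch; the message texts and the boundary "b" are made up for the example.

```python
import email

single = email.message_from_string("Subject: hi\n\njust text\n")
multi = email.message_from_string(
    'Content-Type: multipart/alternative; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/plain\n"
    "\n"
    "plain\n"
    "--b\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>html</p>\n"
    "--b--\n"
)

# Single-part: get_payload() is a string. Multipart: a list of Message objects.
print(single.is_multipart(), type(single.get_payload()).__name__)
print(multi.is_multipart(), type(multi.get_payload()).__name__)

# get_payload(i) returns the i-th part of a multipart message.
print(multi.get_payload(1).get_content_type())

# Indexing a non-multipart message raises TypeError; an index past the
# last part raises IndexError.
for label, msg, index in (("single", single, 0), ("multi", multi, 5)):
    try:
        msg.get_payload(index)
    except (TypeError, IndexError) as error:
        print(label, type(error).__name__)
```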

It is also a good idea to return the data in the correct encoding when the message contains multilingual content:
charset = part.get_content_charset()
content = part.get_payload(decode=True)
content = content.decode(charset).encode('utf-8')
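In Python 3 the decoded result is already a str, so the final encode step is usually unnecessary. The sketch below also covers two cases the snippet above does not handle: a part that declares no charset and a part that declares one Python does not know. decode_part is a hypothetical helper name and the UTF-8 fallback is an assumption.

```python
import email


def decode_part(part, fallback="utf-8"):
    """Return the text of a message part as str (hypothetical helper)."""
    payload = part.get_payload(decode=True)  # bytes, or None for multipart containers
    if payload is None:
        return ""
    charset = part.get_content_charset() or fallback
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:
        # The message declared a charset that Python does not know.
        return payload.decode(fallback, errors="replace")
```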

Here is how I solved the problem using the newer capabilities of Python 3:
import imaplib
import email
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login(username, password)
mail.select(readonly=True) # refresh inbox
status, message_ids = mail.search(None, 'ALL') # get all emails
for message_id in message_ids[0].split(): # returns all message ids
    # for every id get the actual email
    status, message_data = mail.fetch(message_id, '(RFC822)')
    actual_message = email.message_from_bytes(message_data[0][1])
    # extract the needed fields
    email_date = actual_message["Date"]
    subject = actual_message["Subject"]
    message_body = get_message_body(actual_message)
Now get_message_body is actually pretty tricky due to MIME format. I used the function suggested in this answer.
This particular example works with Gmail, but IMAP is a standard protocol, so it should work for other email providers as well, possibly with minor changes.

If you want to use IMAP4, you can use the outlook Python library, which can be downloaded here: https://github.com/awangga/outlook
To retrieve unread email from your inbox:
import outlook
mail = outlook.Outlook()
mail.login('emailaccount@live.com','yourpassword')
mail.inbox()
print mail.unread()
To retrieve the elements of the email:
print mail.mailbody()
print mail.mailsubject()
print mail.mailfrom()
print mail.mailto()
