Trouble decoding email using gmail API - python

I am having a hard time getting Python to read my emails. I am trying to pull information from the body of the email. The problem is that when I run the code on the email as it arrived from the original source, it still returns unreadable base64 data, even after running it through the base64 decoder. BUT if I forward that same email to myself, so the code is going over the forwarded email, it works perfectly and decodes the entire email appropriately. Here is the function I am using to get the email body. I have noticed that the content type is "text" when the email comes directly from the source, but it is read as "multipart" when I forward it to myself. ANY HELP is greatly appreciated. I am at a loss for where to go from here.
Thanks in advance!
import base64
import email


def get_message(service, user_id, msg_id):
    try:
        # Makes the connection and GETS the emails in RAW format.
        message = service.users().messages().get(userId=user_id, id=msg_id, format='raw').execute()
        # Decodes the base64url 'raw' field into the message's bytes
        msg_raw = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
        # Parses those bytes into an email.message.Message object
        msg_str = email.message_from_bytes(msg_raw)
        # This line checks what the content is, if multipart (plaintext and html) or single part
        content_types = msg_str.get_content_maintype()
        print(content_types)
        if content_types == 'multipart':
            # Part1 is plaintext and part2 is html text
            part1, part2 = msg_str.get_payload()
            raw_email = part1.get_payload()
            remove_char = ["|", "=20", "=C2=A0"]
            for i in remove_char:
                raw_email = raw_email.replace(i, "")
            raw_email = "".join([s for s in raw_email.strip().splitlines(True) if s.strip()])
            print('Inside correct part')
            print(raw_email)
            return str(raw_email)
        else:
            print('Inside the Else')
            print(msg_str.get_payload())
            return msg_str.get_payload()
    except:
        print('An error has occurred during the get_message function.')
Edit: Here is what this function prints when run on the email from the original source:
text
Inside the Else
PCFET0NUWVBFIGh0bWwgUFVCTElDICItLy93M2MvL2R0ZCB4aHRtbCAxLjAgdHJhbnNpdGlvbmFs
Ly9lbiIgImh0dHA6Ly93d3cudzMub3JnL3RyL3hodG1sMS9kdGQveGh0bWwxLXRyYW5zaXRpb25h
bC5kdGQiPjxodG1sIHN0eWxlPSJtYXJnaW46IDA7cGFkZGluZzogMDtmb250LWZhbWlseTogJ0hl
bHZldGljYSBOZXVlJywgJ0hlbHZldGljYScsIEhlbHZldGljYSwgQXJpYWwsIHNhbnMtc2VyaWY7
Plus about 100 lines of stuff like this.
Here is what it prints out from the same email if I forward it to myself:
multipart
Inside correct part
---------- Forwarded message ---------
From: <originalSource#email.com>
Date: Wed, Jun 10, 2020 at 10:34 AM
Subject: You added cash to your Account
To: <xxxxxxxxxx#gmail.com>
[image: card] Account ending in XXXX
Hi, XXXX XXXX,
Success!
You added cash with

Solution
The base64 will decode the data as a string, not as bytes. Therefore you should change this
msg_str = email.message_from_bytes(msg_raw)
to this
msg_str = email.message_from_string(msg_raw)
Check out this documentation example in Python for more info regarding this.
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)
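One detail worth checking against the question's code: base64.urlsafe_b64decode returns bytes, so email.message_from_bytes is the parser that matches msg_raw (email.message_from_string expects a str and raises TypeError when given bytes). The base64 text printed under "Inside the Else" is the body's own Content-Transfer-Encoding: the original email appears to be a single-part HTML message (the printed base64 decodes to an XHTML document starting with <!DOCTYPE), and get_payload() without decode=True returns it still encoded. A minimal sketch, assuming msg_raw holds the bytes from the question's urlsafe_b64decode call; extract_body is a hypothetical helper name:

```python
import email
from email import policy


def extract_body(msg_raw):
    """Return the decoded text of the best body part, or None.

    msg_raw is assumed to be the bytes produced by
    base64.urlsafe_b64decode(message['raw']) in the question's code.
    """
    # policy.default makes the parser return an EmailMessage, which has get_body()
    msg = email.message_from_bytes(msg_raw, policy=policy.default)
    # Prefer a text/plain part and fall back to text/html; a single-part
    # text/html message (like the original email here) is its own body
    part = msg.get_body(preferencelist=('plain', 'html'))
    if part is None:
        return None
    # get_content() undoes the base64/quoted-printable transfer encoding
    # and decodes the declared charset, returning a str
    return part.get_content()
```

With the older compat32 message objects the question uses, part.get_payload(decode=True) also undoes the transfer encoding, but it returns bytes that you still need to decode with part.get_content_charset().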

Related

I am trying to make an email chatbot, but it spams. How could I fix this?

I am trying to build an email chatbot, but it has a bug: after it sends the first message and gets a response, it keeps spamming replies to that response until it gets another response, and then the cycle repeats. I was thinking of solving this with a variable that detects new emails and, later in the code, a condition that responds only if an email has been received. Does anyone have any idea how I could fix this? Thanks
import email
import imaplib


def receive_email():
    try:
        # IMAP is served by imap.gmail.com; smtp.gmail.com is Gmail's outgoing (SMTP) server
        mail = imaplib.IMAP4_SSL("imap.gmail.com")
        mail.login(email_address, email_password)
        mail.select('inbox')
        # searches inbox
        status, data = mail.search(None, 'Recent')
        mail_ids = data[0].split()
        latest_email_id = mail_ids[-1]
        status, data = mail.fetch(latest_email_id, '(RFC822)')
        # gets message
        for response_part in data:
            if isinstance(response_part, tuple):
                msg = email.message_from_bytes(response_part[1])
                sender = msg['from']
                subject = msg['subject']
                if msg.is_multipart():
                    for part in msg.get_payload():
                        if part.get_content_type() == 'text/plain':
                            return part.get_payload()
                message = msg.get_payload()
                return message,
    except Exception as e:
        print("Error: ", e)
        print("Could not receive email")
        return None, None
This is the usual problem for an email autoresponder, if I understand you correctly, and RFC 3834 offers good advice.
Since answers should be self-contained I offer a summary:
Add the Auto-Submitted: auto-replied header field on your outgoing messages. Any value other than no will prevent well-written autoresponders from replying to your outgoing messages.
Set the \answered flag on the message you reply to, immediately before you send the reply.
Change the search key from recent to unanswered not header "auto-submitted" "". unanswered means that the search won't match the messages on which you set the \answered flag, not header "auto-submitted" "" means that you'll not match messages that contain any auto-submitted header field.
Direct your replies to the address in return-path or sender, not the one in from. This is a matter of convention. Auto-submitted mail will often have a special return-path that points to an address that never sends any autoreply.
You may also extend the search key with more details from RFC 3834. The one I suggest should work, but not header "precedence" "junk" will for example prevent your code from replying to a bit of autogenerated mail. Sendgrid and its friends also add header fields you may want to look for and exclude.
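The search and flag steps above might look like this with imaplib (Python 3). This is a minimal sketch, assuming conn is an already logged-in imaplib.IMAP4_SSL connection with the inbox selected normally (not read-only, so flags can be stored); the helper names are hypothetical. IMAP defines a zero-length HEADER search string as matching any message that has that field at all, which is what makes the NOT HEADER clause exclude every auto-submitted message.

```python
# Search key from the answer: messages not yet answered that carry no
# Auto-Submitted header at all (an empty string matches any value).
SEARCH_CRITERIA = '(UNANSWERED NOT HEADER "Auto-Submitted" "")'


def find_messages_to_answer(conn):
    """Return the message sequence numbers the bot should reply to.

    conn is assumed to be a logged-in imaplib.IMAP4_SSL object with a
    mailbox selected; this helper name is hypothetical.
    """
    status, data = conn.search(None, SEARCH_CRITERIA)
    if status != 'OK' or not data or not data[0]:
        return []
    return data[0].split()


def mark_answered(conn, msg_id):
    """Set \\Answered on msg_id; call this immediately before sending the reply."""
    status, _ = conn.store(msg_id, '+FLAGS', '\\Answered')
    return status == 'OK'
```

In the chatbot's loop, each id from find_messages_to_answer would be fetched, flagged with mark_answered, and only then replied to, so the next search no longer returns it.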
If the incoming message has headers like this (use the "view headers" function of most mail readers to see it):
From: example#example.com
Subject: Weekend
To: srtai22#gmail.com
Message-id: <56451182ae7a62978cd6f6ff06dd21e0#example.com>
Then your reply should have headers like this:
Return-Path: <>
From: srtai22#gmail.com
To: example#example.com
Auto-Submitted: auto-replied
Subject: Auto: Weekend
References: <56451182ae7a62978cd6f6ff06dd21e0#example.com>
There'll be many more fields in both, of course. Your reply's return-path says that nothing should respond automatically, From and To are as expected, auto-submitted specifies what sort of response this is, subject doesn't matter very much but this one's polite and well-behaved, and finally references links to the original message.
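The reply headers from the example can be assembled with the standard library's EmailMessage. A minimal sketch, assuming incoming is the parsed email.message.Message you are answering; build_auto_reply is a hypothetical name. It returns None when the destination would be the null address <>, since RFC 3834 says not to respond to a null return path. Return-Path itself is normally written by the receiving server from the SMTP envelope sender, so the sketch does not set it.

```python
from email.message import EmailMessage
from email.utils import parseaddr


def build_auto_reply(incoming, my_address, body):
    """Build a reply carrying the headers suggested above (hypothetical helper).

    Returns None if there is no usable address to reply to.
    """
    # Reply to Return-Path or Sender when present, otherwise fall back to From
    target = incoming['Return-Path'] or incoming['Sender'] or incoming['From']
    address = parseaddr(str(target))[1] if target else ''
    if not address:
        # Null return path (<>) or no address at all: do not respond
        return None
    reply = EmailMessage()
    reply['From'] = my_address
    reply['To'] = address
    # Any value other than "no" tells well-behaved autoresponders not to answer
    reply['Auto-Submitted'] = 'auto-replied'
    reply['Subject'] = 'Auto: ' + str(incoming['Subject'] or '')
    # Link the reply to the original message, if it has a Message-ID
    if incoming['Message-ID']:
        reply['References'] = str(incoming['Message-ID'])
    reply.set_content(body)
    return reply
```

When sending with smtplib, passing from_addr='<>' to send_message should produce the empty envelope sender that the example's Return-Path shows; whether your provider accepts a null sender from an authenticated account is worth testing.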

Gmail API Pull out plain text email body

I am trying to decode an email sent to me from a specific source. The email looks like a CSS-styled box that contains the info I need. When I run it through the function provided by Google, I get what appears to be CSS code, it is not possible for me to extract the information I need, and the content type is "text". But if I forward the same email to myself and run the exact same function on it, the content type is "multipart", and I am able to extract the plain text of the CSS body and grab the info I need. I think this is because, when I forward it to myself, the message contains plain text at the top (showing the forward info) as well as the CSS body.
So my question is, how can I extract the same plain text I get from the CSS body after I forward the email to myself, without forwarding the email to myself? Below is the function I am using:
import base64
import email


def get_message(service, user_id, msg_id):
    try:
        # Makes the connection and GETS the emails in RAW format.
        message = service.users().messages().get(userId=user_id, id=msg_id, format='raw').execute()
        # Decodes the base64url 'raw' field into the message's bytes
        msg_raw = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
        # Parses those bytes into an email.message.Message object
        msg_str = email.message_from_bytes(msg_raw)
        # This line checks what the content is, if multipart (plaintext and html) or single part
        content_types = msg_str.get_content_maintype()
        print(content_types)
        if content_types == 'multipart':
            # Part1 is plaintext
            part1, part2 = msg_str.get_payload()
            raw_email = part1.get_payload()
            remove_char = ["|", "=20", "=C2=A0"]
            for i in remove_char:
                raw_email = raw_email.replace(i, "")
            raw_email = "".join([s for s in raw_email.strip().splitlines(True) if s.strip()])
            return str(raw_email)
        else:
            return msg_str.get_payload()
    except:
        print('An error has occurred during the get_message function.')
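Judging by the forwarded output, the original email seems to be HTML only, and the plain-text part appears because Gmail generates one when the message is forwarded. Without forwarding, one option is to decode the HTML part yourself (get_payload(decode=True) plus its charset) and strip the markup. A minimal sketch using the standard library's html.parser; html_to_text and _TextExtractor are hypothetical names, and this is a crude tag stripper rather than a full HTML-to-text renderer:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text nodes, skipping <style> and <script> contents."""

    def __init__(self):
        # convert_charrefs=True (the default) turns &nbsp; etc. into characters
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('style', 'script'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('style', 'script') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()  # str.strip() also removes non-breaking spaces
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text nodes of an HTML document, one per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return '\n'.join(parser.chunks)
```

In the question's else branch this could be called as html_to_text(msg_str.get_payload(decode=True).decode(msg_str.get_content_charset() or 'utf-8')), after which the existing line-cleanup code can run on the result.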

How to get message body (or bodies) from Message object returned by email.parser.Parser?

I'm reading the Python 3 docs here and I must be blind or something... Where does it say how to get the body of a message?
What I want to do is to open a message and perform some loop in text-based bodies of the message, skipping binary attachments. Pseudocode:
def read_all_bodies(local_email_file):
    email = Parser().parse(open(local_email_file, 'r'))
    for pseudo_body in email.pseudo_bodies:
        if pseudo_body.pseudo_is_binary():
            continue
        # Pseudo-parse the body here
How do I do that? Is Message even the correct class for this? Isn't it only for headers?
This is best done using two functions:
One to open and parse the file. If the message is single-part, get_payload() returns the message's payload as a string; if the message is multipart, it returns a list of sub-messages.
A second one to handle the text payload.
This is how it can be done:
from email.parser import Parser


def parse_file_bodies(filename):
    # Opens the file and parses the email; "with" closes the file afterwards
    with open(filename, 'r') as fp:
        msg = Parser().parse(fp)
    # For multipart emails, all bodies are handled in a loop;
    # walk() also descends into nested multipart containers
    if msg.is_multipart():
        for part in msg.walk():
            if not part.is_multipart():
                parse_single_body(part)
    else:
        # Single-part message is passed directly
        parse_single_body(msg)


def parse_single_body(part):
    payload = part.get_payload(decode=True)
    if payload is None:
        return
    # The payload is binary. It must be converted to a
    # Python string using the part's declared charset,
    # which may vary from message to message
    charset = part.get_content_charset() or "utf-8"
    try:
        text = payload.decode(charset)
        # Now you can work with text as with any other string:
        ...
    except (UnicodeDecodeError, LookupError):
        print("Error: cannot parse message as", charset)
        return
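For the loop in the question's pseudocode, walk() already provides the iteration over bodies. A minimal sketch (text_bodies is a hypothetical name) that yields every text part, skips binary parts and anything marked as an attachment, and decodes with each part's declared charset, falling back to UTF-8 as an assumption:

```python
from email.message import Message


def text_bodies(msg: Message):
    """Yield the decoded text of each non-attachment text/* part of msg."""
    for part in msg.walk():  # depth-first, starting with msg itself
        if part.get_content_maintype() != 'text':
            continue  # skips multipart containers and binary parts
        if part.get_content_disposition() == 'attachment':
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or 'utf-8'
        yield payload.decode(charset, errors='replace')
```

To read from a file, parsing in binary mode avoids charset problems at open time: `with open(local_email_file, 'rb') as fp: msg = BytesParser().parse(fp)` (BytesParser is in email.parser), then `for body in text_bodies(msg): ...`.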

How to receive mail using python

I would like to receive email using python. So far I have been able to get the subject but not the body. Here is the code I have been using:
import poplib
from email import parser

pop_conn = poplib.POP3_SSL('pop.gmail.com')
pop_conn.user('myusername')
pop_conn.pass_('mypassword')
# Get messages from server:
messages = [pop_conn.retr(i) for i in range(1, len(pop_conn.list()[1]) + 1)]
# Concat message pieces:
messages = ["\n".join(mssg[1]) for mssg in messages]
# Parse message into an email object:
messages = [parser.Parser().parsestr(mssg) for mssg in messages]
for message in messages:
    print message['subject']
    print message['body']
pop_conn.quit()
My issue is that when I run this code, it properly returns the subject but not the body. So if I send an email with the subject "Tester" and the body "This is a test message", it looks like this in IDLE:
>>>>Tester
>>>>None
So it appears to be reading the subject correctly but not the body. I think the problem is in the parsing method, right? The issue is that I don't know enough about these libraries to figure out how to change it so that it returns both a subject and a body.
The message object does not have a body; you will need to parse its parts, like this:
for part in message.walk():
    if part.get_content_maintype() == 'text':
        body = part.get_payload(decode=True)
The walk() method iterates depth-first through the parts of the email, and you are looking for the text parts (get_content_type() always returns some type, defaulting to text/plain, so checking it for truthiness filters nothing). The content types can be either text/plain or text/html, and sometimes one email contains both (when the message's content type is multipart/alternative).
The email parser returns an email.message.Message object, which does not contain a body key, as you'll see if you run
print message.keys()
What you want is the get_payload() method:
for message in messages:
    print message['subject']
    print message.get_payload()
pop_conn.quit()
But this gets complicated when it comes to multi-part messages; get_payload() returns a list of parts, each of which is a Message object. You can get a particular part of the multipart message by using get_payload(i), which returns the ith part, raises an IndexError if i is out of range, or raises a TypeError if the message is not multipart.
As Gustavo Costa De Oliveir points out, you can use the walk() method to get the parts in order -- it does a depth-first traversal of the parts and subparts of the message.
There's more about the Message class returned by the parser at http://docs.python.org/library/email.message.html#email.message.Message.
It is also good to return the data in the correct encoding if the message contains multilingual content:
charset = part.get_content_charset() or 'utf-8'  # fall back when no charset is declared
content = part.get_payload(decode=True)
content = content.decode(charset).encode('utf-8')  # re-encode as UTF-8 bytes
Here is how I solved the problem using Python 3's newer capabilities:
import imaplib
import email

mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login(username, password)
mail.select(readonly=True)  # select INBOX read-only, so fetching won't mark messages as seen
status, message_ids = mail.search(None, 'ALL')  # get all emails
for message_id in message_ids[0].split():  # returns all message ids
    # for every id get the actual email
    status, message_data = mail.fetch(message_id, '(RFC822)')
    actual_message = email.message_from_bytes(message_data[0][1])
    # extract the needed fields
    email_date = actual_message["Date"]
    subject = actual_message["Subject"]
    message_body = get_message_body(actual_message)
Now get_message_body is actually pretty tricky due to MIME format. I used the function suggested in this answer.
This particular example works with Gmail, but IMAP is a standard protocol, so it should work for other email providers as well, possibly with minor changes.
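The linked function is not reproduced here. As a stand-in, a minimal sketch of what a get_message_body could look like (hypothetical, not the function from that answer): it returns the first text/plain part, falls back to the first text/html part, skips attachments, and decodes using each part's charset, assuming UTF-8 when none is declared.

```python
from email.message import Message


def get_message_body(message: Message):
    """Hypothetical helper: first text/plain body, else first text/html, else None."""
    html_body = None
    for part in message.walk():
        if part.get_content_disposition() == 'attachment':
            continue
        content_type = part.get_content_type()
        if content_type not in ('text/plain', 'text/html'):
            continue
        # decode=True undoes base64/quoted-printable and returns bytes
        payload = part.get_payload(decode=True)
        text = payload.decode(part.get_content_charset() or 'utf-8', errors='replace')
        if content_type == 'text/plain':
            return text
        if html_body is None:
            html_body = text
    return html_body
```

Preferring text/plain keeps the result free of markup when the sender provides both versions; HTML-only senders still yield something rather than None.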
If you want to use IMAP4, you can use the outlook Python library, available at https://github.com/awangga/outlook
To retrieve unread email from your inbox:
import outlook
mail = outlook.Outlook()
mail.login('emailaccount#live.com','yourpassword')
mail.inbox()
print mail.unread()
To retrieve email elements:
print mail.mailbody()
print mail.mailsubject()
print mail.mailfrom()
print mail.mailto()

How can I extract only the email body with Python using IMAP?

I am relatively new to programming and to Python, but I think I have done OK so far. This is the code I have, and it works fine, except that it gets the entire message in MIME format. I only want the text body of unread emails, but I can't quite figure out how to strip out all of the formatting and header info. If I send a basic email using an SMTP Python script that I made, it works fine and only prints the body, but if I send the email using Outlook, it prints a bunch of extra garbage. Any help is very much appreciated.
client = imaplib.IMAP4_SSL(PopServer)
client.login(USER, PASSWORD)
client.select('INBOX')
status, email_ids = client.search(None, '(UNSEEN SUBJECT "%s")' % PrintSubject)
print email_ids
client.store(email_ids[0].replace(' ',','),'+FLAGS','\Seen')
for email in get_emails(email_ids):
    get_emails()
def get_emails(email_ids):
    data = []
    for e_id in email_ids[0].split():
        _, response = client.fetch(e_id, '(UID BODY[TEXT])')
        data.append(response[0][1])
    return data
Sounds like you're looking for the email package:
The email package provides a standard parser that understands most email document structures, including MIME documents. You can pass the parser a string or a file object, and the parser will return to you the root Message instance of the object structure. For simple, non-MIME messages the payload of this root object will likely be a string containing the text of the message. For MIME messages, the root object will return True from its is_multipart() method, and the subparts can be accessed via the get_payload() and walk() methods.
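Applied to the question's code, the key change is to fetch the whole message (RFC822, as in the Python 3 answer above) instead of BODY[TEXT]: BODY[TEXT] returns the body without the message headers, so the parser never sees the top-level Content-Type that declares the MIME boundaries. A minimal Python 3 sketch, assuming client is the logged-in, selected imaplib connection from the question and email_ids is what client.search() returned; fetch_plain_bodies is a hypothetical name:

```python
import email


def fetch_plain_bodies(client, email_ids):
    """Return the first text/plain body of each message id in email_ids.

    client is assumed to be a logged-in imaplib.IMAP4_SSL with a mailbox
    selected, and email_ids the data list returned by client.search().
    """
    bodies = []
    for e_id in email_ids[0].split():
        # Fetch the full message (headers + body) so the MIME structure is parseable
        status, response = client.fetch(e_id, '(RFC822)')
        if status != 'OK':
            continue
        msg = email.message_from_bytes(response[0][1])
        for part in msg.walk():
            if (part.get_content_type() == 'text/plain'
                    and part.get_content_disposition() != 'attachment'):
                charset = part.get_content_charset() or 'utf-8'
                bodies.append(part.get_payload(decode=True).decode(charset, errors='replace'))
                break  # first plain-text part only
    return bodies
```

Messages sent from Outlook are typically multipart/alternative, so the walk finds the text/plain alternative and leaves the HTML version and headers behind.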
