Extract body from email message objects in Python

I have an .mbox file that represents many messages at location mbox_fname. In Python 3, I have already loaded each of the messages, which are objects of the class email.message.Message.
I'd like to get access to the body content of the message.
For instance, something like:
import mailbox
the_mailbox = mailbox.mbox(mbox_fname)
for message in the_mailbox:
    subject = message["subject"]
    content = <???>
How do I access the body of the message?

I made some progress modifying this answer. This is the best I have so far:
import email.message
def get_body(message: email.message.Message, encoding: str = "utf-8") -> str:
    # start with empty bytes so that .decode() below always works
    body_in_bytes = b""
    if message.is_multipart():
        for part in message.walk():
            ctype = part.get_content_type()
            cdispo = str(part.get("Content-Disposition"))
            # skip any text/plain (txt) attachments
            if ctype == "text/plain" and "attachment" not in cdispo:
                body_in_bytes = part.get_payload(decode=True)  # decode
                break
    else:
        # not multipart - i.e. plain text, no attachments, keeping fingers crossed
        body_in_bytes = message.get_payload(decode=True)
    body = body_in_bytes.decode(encoding)
    return body
So modifying the code in the original question, this gets called like the following:
for message in the_mailbox:
    content = get_body(message)
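The function above decodes every body with one fixed encoding. As a minimal sketch, assuming the message declares its charset in the Content-Type header, the part's own charset can be used instead. The name get_text_body and the fallback parameter are hypothetical, chosen only for this illustration.

```python
import email
from email.message import Message

def get_text_body(message: Message, fallback: str = "utf-8") -> str:
    """Return the first non-attachment text/plain part, decoded with its declared charset."""
    # walk() also yields the message itself, so single-part messages are covered
    for part in message.walk():
        disposition = str(part.get("Content-Disposition", ""))
        if part.get_content_type() == "text/plain" and "attachment" not in disposition:
            # decode=True undoes base64 / quoted-printable and returns bytes
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or fallback
            return payload.decode(charset, errors="replace")
    return ""

# Hypothetical sample message, used only to exercise the function
raw = (
    "Subject: demo\n"
    "Content-Type: text/plain; charset=\"iso-8859-1\"\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "caf=E9\n"
)
demo_body = get_text_body(email.message_from_string(raw))
print(demo_body)
```

A charset name that Python does not recognize would still raise LookupError here, so real mailboxes may need an extra guard around the decode call.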

Related

python decode email from base64

Hello, I am using a Python script to fetch messages from a specific email address. Everything seems to work, but the printed result is Base64-encoded text.
I want to decode the result so that the final print shows the decoded message. Please help!
Thanks in advance.
The code used:
# Importing libraries
import imaplib, email
user = 'USER_EMAIL_ADDRESS'
password = 'USER_PASSWORD'
imap_url = 'imap.gmail.com'
# Function to get email content part i.e its body part
def get_body(msg):
    if msg.is_multipart():
        return get_body(msg.get_payload(0))
    else:
        return msg.get_payload(None, True)
# Function to search for a key value pair
def search(key, value, con):
    result, data = con.search(None, key, '"{}"'.format(value))
    return data
# Function to get the list of emails under this label
def get_emails(result_bytes):
    msgs = []  # all the email data are pushed inside an array
    for num in result_bytes[0].split():
        typ, data = con.fetch(num, 'BODY.PEEK[1]')
        msgs.append(data)
    return msgs
# this is done to make SSL connection with GMAIL
con = imaplib.IMAP4_SSL(imap_url)
# logging the user in
con.login(user, password)
# calling function to check for email under this label
con.select('Inbox')
# fetching emails from this user "tu**h*****1@gmail.com"
msgs = get_emails(search('FROM', 'MY_ANOTHER_GMAIL_ADDRESS', con))
# Uncomment this to see what actually comes as data
# print(msgs)
# Finding the required content from our msgs
# User can make custom changes in this part to
# fetch the required content he / she needs
# printing them by the order they are displayed in your gmail
for msg in msgs[::-1]:
    for sent in msg:
        if type(sent) is tuple:
            # encoding set as utf-8
            content = str(sent[1], 'utf-8')
            data = str(content)
            # Handling errors related to UnicodeEncodeError
            try:
                indexstart = data.find("ltr")
                data2 = data[indexstart + 5: len(data)]
                indexend = data2.find("</div>")
                # printing the required content which we need
                # to extract from our email i.e our body
                print(data2[0: indexend])
            except UnicodeEncodeError as e:
                pass
THE RESULT PRINTED
'''
aGVsbG8gd29yZCBpYW0gdGhlIG1lc3NhZ2UgZnJvbSBnbWFpbA==
'''

You could just use the base64 module to decode base64 encoded strings:
import base64
your_string = "aGVsbG8gV29ybGQ="  # the base64 encoded string you need to decode
result = base64.b64decode(your_string.encode("utf8")).decode("utf8")
print(result)
Edit: encoding changed from ASCII to utf-8

If you need to decode every encoded word in the message (they can appear in Subject, From, and To headers that carry display names), the code below might be useful. Here contentData is the entire email:
import re, base64
# raw string; [^?]+ keeps each match inside a single =?charset?B?data?= word
encodedParts = re.findall(r'(=\?([^?]+)\?[Bb]\?([^?]+)\?=)', contentData)
for part in encodedParts:
    encodedPart = part[0]
    charset = part[1]
    encodedContent = part[2]
    contentData = contentData.replace(encodedPart, base64.b64decode(encodedContent).decode(charset))
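For header values specifically, the standard library's email.header module can do the same job without a hand-written regex. The sketch below is one possible alternative to the answer's approach, not a replacement for it. The helper name decode_mime_words is hypothetical.

```python
from email.header import decode_header, make_header

def decode_mime_words(value: str) -> str:
    """Decode RFC 2047 encoded-words (Base64 'B' or quoted-printable 'Q') in a header value."""
    # decode_header splits the value into (bytes, charset) chunks;
    # make_header joins them back into a Header that converts to str
    return str(make_header(decode_header(value)))

decoded_subject = decode_mime_words("=?UTF-8?B?aGVsbG8gd29ybGQ=?=")
print(decoded_subject)
```

This only applies to header values such as Subject or From. A Base64-encoded body is handled by get_payload(decode=True) instead.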

Spaces replaced by =20 after extracting text from email

I tried to get the text of a received gmail, using the email and imaplib modules in python. After decoding with utf-8 and after getting the payload of the message, all the spaces are still replaced by =20. Can I use another decoding step in order to fix this?
The code is the following: (I got it from a youtube tutorial - https://youtu.be/Jt8LizzxkPU )
import email
import imaplib
username = "abc"
password = "123"
mail = imaplib.IMAP4_SSL("imap.gmail.com")
mail.login(username,password)
mail.select("inbox")
result, data = mail.uid("search", None,"ALL")
inbox_item_list = data[0].split()
for item in inbox_item_list:
    #most_recent = inbox_item_list[-1]
    #oldest = inbox_item_list[0]
    result2, email_data = mail.uid('fetch',item,'(RFC822)')
    raw_email = email_data[0][1].decode("utf-8")
    email_message = email.message_from_string(raw_email)
    to_ = email_message['To']
    from_ = email_message['From']
    subject_ = email_message['Subject']
    counter = 1
    for part in email_message.walk():
        if part.get_content_maintype() == "multipart":
            continue
        filename = part.get_filename()
        if not filename:
            ext = ".html"
            filename = "msg-part-%08d%s" %(counter, ext)
        counter += 1
        #save file
        content_type = part.get_content_type()
        print(subject_)
        print (content_type)
        if "plain" in content_type:
            print(part.get_payload())
        elif "html" in content_type:
            print("do some beautiful soup")
        else:
            print(content_type)

Import quopri, and then, when you have the content of the email body (or whatever text contains the =20 sequences), call quopri.decodestring() on it.
I do it like this:
quopri.decodestring(part.get_payload())
Keep in mind that this only applies if you specifically want to decode quoted-printable. Normally I would say the answer from @jfs is neater.
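To make the quopri suggestion concrete, here is a small sketch that compares it with get_payload(decode=True), which applies the same decoding based on the Content-Transfer-Encoding header. The sample message is made up for the illustration.

```python
import email
import quopri

# Hypothetical quoted-printable message with an encoded space and a soft line break
raw = (
    "Content-Type: text/plain; charset=\"utf-8\"\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "two=20words and a soft line=\n"
    " break\n"
)
part = email.message_from_string(raw)

undecoded = part.get_payload()                                # str, still contains =20
via_quopri = quopri.decodestring(undecoded.encode("ascii"))   # bytes
via_payload = part.get_payload(decode=True)                   # bytes, decoded per the header
print(via_payload.decode("utf-8"))
```

Both calls return bytes, so a final .decode() with the part's charset is still needed to get a str.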

Here's a complete code example of how a simple email (that contains both a literal =20 as well as =20 sequence that should be replaced by a space) could be decoded:
#!/usr/bin/env python3
import email.policy
email_text = """Subject: =?UTF-8?B?dGVzdCDwn5OnID0yMA==?=
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo=
oooooooooooooooooooooooooooooong=20word
=3D20
^ line starts with =3D20
emoji: <=F0=9F=93=A7>"""
msg = email.message_from_string(
    email_text, policy=email.policy.default
)
print("Subject: <{subject}>".format_map(msg))
assert not msg.is_multipart()
print(msg.get_content())
Output
Subject: <test 📧 =20>
loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong word
=20
^ line starts with =20
emoji: <📧>
msg.walk(), part.get_payload(decode=True) could be used to traverse more complex EmailMessage objects. See email Examples.
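For multipart messages parsed with policy.default, EmailMessage.get_body() is another option next to walk(). The following is a minimal sketch under the assumption that the plain-text alternative is the part you want. The sample message is built in code only so that the example runs on its own.

```python
import email
from email import policy
from email.message import EmailMessage

# Build a small multipart/alternative message to have something to parse
original = EmailMessage()
original["Subject"] = "demo"
original.set_content("plain two words\n")
original.add_alternative("<p>html two words</p>\n", subtype="html")

# Parsing with policy.default yields EmailMessage objects
parsed = email.message_from_bytes(original.as_bytes(), policy=policy.default)

# get_body() returns the best matching part, or None if there is none
plain_part = parsed.get_body(preferencelist=("plain",))
plain_text = plain_part.get_content() if plain_part is not None else ""
print(plain_text)
```

get_content() already undoes the transfer encoding and applies the declared charset, so no manual quoted-printable or Base64 step is needed.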

Get message and then send to another email using Gmail API with Python

I'm following the advice on this thread to set up bulk forwarding using Python. I'm trying to search for messages in the inbox with specific keywords, get the message IDs of those messages, and send those messages to a different person.
The search and getting IDs part works fine. Here's the code:
def search_message(service, user_id, search_string):
    # initiate the list for returning
    list_ids = []
    # get the id of all messages that are in the search string
    search_ids = service.users().messages().list(userId=user_id, q=search_string).execute()
    # if there were no results, print warning and return empty string
    try:
        ids = search_ids['messages']
    except KeyError:
        print("WARNING: the search queried returned 0 results")
        print("returning an empty string")
        return ""
    if len(ids)>1:
        for msg_id in ids:
            list_ids.append(msg_id['id'])
        return(list_ids)
    else:
        # ids is a list even when it holds a single message
        list_ids.append(ids[0]['id'])
        return list_ids
It's when I try to send a message that things get hairy. I'm testing this on a single message ID at the moment, and here's what I've tried:
message_raw = service.users().messages().get(userId=user_id, id=msg_id,format='raw').execute()
message_full = service.users().messages().get(userId="me", id=msg_id, format="full", metadataHeaders=None).execute()
## get the subject line
msg_header = message_full['payload']['headers']
# this is a little faster than a loop
subj = [i['value'] for i in msg_header if i["name"]=="Subject"]
subject = subj[0]
msg_str = base64.urlsafe_b64decode(message_raw['raw'].encode('UTF-8'))
msg_bytes = email.message_from_bytes(msg_str)
# get content type for msg
content_type = msg_bytes.get_content_maintype()
if content_type == 'multipart':
    # there will usually be 2 parts: the first will be the body as a raw string,
    # the second will be the body as html
    parts = msg_bytes.get_payload()
    # return the encoded text
    send_string = parts[0].get_payload()
    # force utf-8 encoding on the string
    send_string = send_string.encode('utf-8').decode('utf-8')
# now that we have the body in raw string, we will build a new mime object and
# send it via gmail
final_message = MIMEText(send_string)
# set send-to and subject line in the msg
final_message['to'] = to
final_message['subject'] = subject
final_message['from'] = 'tradethenewsapi@gmail.com'
# turn back into raw format and return
raw = base64.urlsafe_b64decode(final_message.as_bytes())
body = {'raw': raw}
However, when I go to send the actual email (as this thread suggests),
message_sent = (service.users().messages().send(userId='me', body=body).execute())
I keep getting this error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 1: invalid start byte
No matter what I do to search_string, it won't work and keeps giving me the unicode error.
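One plausible cause, stated as an assumption because the full traceback is not shown, is the line that builds raw: it calls urlsafe_b64decode on the message bytes, while the raw field of a Gmail API message is expected to hold a base64url-encoded string. A minimal sketch of the encoding step follows. The helper name build_raw_body and the example addresses are hypothetical.

```python
import base64
from email.mime.text import MIMEText

def build_raw_body(text: str, to: str, sender: str, subject: str) -> dict:
    """Build a {'raw': ...} request body from plain text."""
    message = MIMEText(text)
    message["to"] = to
    message["from"] = sender
    message["subject"] = subject
    # Encode (not decode) the message bytes, then convert to str so the
    # request body can be serialized as JSON
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode("ascii")
    return {"raw": raw}

body = build_raw_body("forwarded text", "someone@example.com", "me@example.com", "Fwd: demo")
print(body["raw"][:20])
```

The resulting dictionary is what would be passed as body= to messages().send(). Whether this resolves the error in the question depends on the rest of the asker's setup.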

Replace body message of an email using python

I created a class in python that will send emails via one of my private servers. It works but I'm wondering if there is a method to replace an existing email body message with a new one?
Emailer Class
class Emailer:
    def __init__(self, subj=None, message=None, toAddr=None, attachment=None, image=None):
        # initialize email inputs
        self.msg = email.MIMEMultipart.MIMEMultipart()
        self.cidNum = 0
        self.message = []
        if message is not None:
            self.addToMessage(message,image)
        # set the subject of the email if there is one specified
        self.subj = []
        if subj is not None:
            self.setSubject(subj)
        # set the body of the email and any attachments specified
        self.attachment = []
        if attachment is not None:
            self.addAttachment(attachment)
        # set the recipient list
        self.toAddr = []
        if toAddr is not None:
            self.addRecipient(toAddr)
    def addAttachment(self,attachment):
        logger.debug("Adding attachment to email")
        # loop through list of attachments and add them to the email
        if attachment is not None:
            if type(attachment) is not list:
                attachment = [attachment]
            for f in attachment:
                part = email.MIMEBase.MIMEBase('application',"octet-stream")
                part.set_payload( open(f,"rb").read() )
                encoders.encode_base64(part)
                part.add_header('Content-Disposition', 'attachment; filename="{0}"'.format(os.path.basename(f)))
                self.msg.attach(part)
    def addToMessage(self,message,image=None):
        logger.debug("Adding to email message. Content: [%s]" % message)
        # add the plain text message
        self.message.append(message)
        # add embedded images to message
        if image is not None:
            if type(image) is not list:
                image = [image]
            for i in image:
                msgText = email.MIMEText.MIMEText('<br><img src="cid:image%s"><br>' % self.cidNum, 'html')
                self.msg.attach(msgText)
                fp = open(i, 'rb')
                img = email.MIMEImage.MIMEImage(fp.read())
                fp.close()
                img.add_header('Content-ID','<image%s>' % self.cidNum)
                self.msg.attach(img)
                self.cidNum += 1
    # method to set the subject of the email
    def setSubject(self,subj):
        self.msg['Subject'] = subj
    # method to add recipients to the email
    def addRecipient(self, toAddr):
        # loop through recipient list
        for x in toAddr:
            self.msg['To'] = x
    # method to configure server settings: the server host/port and the senders login info
    def configure(self, serverLogin, serverPassword, fromAddr, toAddr, serverHost='myserver', serverPort=465):
        self.server=smtplib.SMTP_SSL(serverHost,serverPort)
        self.server.set_debuglevel(True)
        # self.server.ehlo()
        # self.server.ehlo()
        self.server.login(serverLogin, serverPassword) #login to senders email
        self.fromAddr = fromAddr
        self.toAddr = toAddr
    # method to send the email
    def send(self):
        logger.debug("Sending email!")
        msgText = email.MIMEText.MIMEText("\n".join(self.message))
        self.msg.attach(msgText)
        print("Sending email to %s " % self.toAddr)
        text = self.msg.as_string() #convert the message contents to string format
        try:
            self.server.sendmail(self.fromAddr, self.toAddr, text) #send the email
        except Exception as e:
            logger.error(e)
Currently, the addToMessage() method is what adds text to the body of the email. If addToMessage() had already been called but I wanted to replace that body text with new text, is there a way?

If addToMessage() had already been called but I wanted to replace that body text with new text, is there a way?
Yes. If you are always replacing the last entry added to self.message, you can reference this element with self.message[-1] since it is a list. If you want to replace a specific element, you can search for it with the index() method.
Example #1: Replace Last Written Text in Body
def replace_last_written_body_text(self, new_text):
    if len(self.message) > 0:
        self.message[-1] = new_text
Example #2: Replace Specified Text in Body
def replace_specified_body_text(self, text_to_replace, new_text):
    try:
        index_of_text_to_replace = self.message.index(text_to_replace)
    except ValueError:
        # list.index() raises ValueError when the text is not in the list
        logger.warning("Cannot replace non-existent body text")
    else:
        self.message[index_of_text_to_replace] = new_text

If addToMessage has been called just once, then:
message is a list, and its first element is the body text, so you just need to replace that element with the new text:
def replace_body(self, new_text):
    if len(self.message) > 0:
        self.message[0] = new_text
    else:
        self.message = [new_text]
I haven't tested that, but it should work. Make sure you write some unit tests for this project!
EDIT:
If addToMessage has been called multiple times, then the new replace function could replace the entire text, or just part of it. If you want to replace all of it, then just replace message, like the part after else above: self.message = [new_text]. Otherwise, you're going to have to find the element you need to replace, like @BobDylan is doing in his answer.

Reading the mail content of an mbox file using python mailbox

I am trying to print the content of the mail ( Mail body) using Python mailbox.
import mailbox
mbox = mailbox.mbox('Inbox')
i=1
for message in mbox:
    print i
    print "from :",message['from']
    print "subject:",message['subject']
    print "message:",message['messages']
    print "**************************************"
    i+=1
But I feel message['messages'] is not the right one to print the mail content here. I could not understand it from the documentation.

To get the message content, you want to use get_payload(). mailbox.Message is a subclass of email.message.Message. You'll also want to check is_multipart() because that will affect the return value of get_payload(). Example:
if message.is_multipart():
    # get_payload(decode=True) returns bytes, so join with a bytes separator
    content = b''.join(part.get_payload(decode=True) for part in message.get_payload())
else:
    content = message.get_payload(decode=True)

def getbody(message): #getting plain text 'email body'
    body = None
    if message.is_multipart():
        for part in message.walk():
            if part.is_multipart():
                for subpart in part.walk():
                    if subpart.get_content_type() == 'text/plain':
                        body = subpart.get_payload(decode=True)
            elif part.get_content_type() == 'text/plain':
                body = part.get_payload(decode=True)
    elif message.get_content_type() == 'text/plain':
        body = message.get_payload(decode=True)
    return body
This function returns the message body if the body is plain text.
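For readers on Python 3, the loop from the question could be sketched end to end as follows. This is an illustration under stated assumptions rather than a tested recipe for any particular mailbox. The function name iter_plain_bodies and the temporary demo mbox are hypothetical.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage

def iter_plain_bodies(mbox_path):
    """Yield (subject, body_bytes) per message; body is None without a text/plain part."""
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            body = None
            # walk() covers both single-part and multipart messages
            for part in message.walk():
                if part.get_content_type() == "text/plain":
                    body = part.get_payload(decode=True)
                    break
            yield message["subject"], body
    finally:
        box.close()

# Demo: write one message to a temporary mbox, then read it back
demo_path = os.path.join(tempfile.mkdtemp(), "demo.mbox")
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["Subject"] = "demo"
msg.set_content("mail body text\n")
demo_box = mailbox.mbox(demo_path)
demo_box.add(msg)
demo_box.flush()
demo_box.close()

results = list(iter_plain_bodies(demo_path))
for subject, body in results:
    print("subject:", subject)
    print("message:", body.decode("utf-8"))
```

The body is yielded as bytes, so the caller decides which charset to decode with. The demo assumes UTF-8 because it wrote the message itself.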
