Extract a single line of text from an email


I'm trying to get one sentence from a lot of HTML emails. The sentence is located in the exact same place in every email (including the same line if you view the source code).
So far I have used imaplib to set up the connection to the correct mailbox, search and fetch the body of the email.
response_code_fetch, data_fetch = mail.fetch('1', '(BODY.PEEK[TEXT])')
if response_code_fetch == "OK":
    print("Body Text: " + str(data_fetch[0]))
else:
    print("Unable to find requested messages")
However, I get an incoherent list that has the entire body of the email at index [0] of the returned list. I've tried str(data_fetch[0]) and then using the splitlines method, but it doesn't work.
I've also found the below suggestion online using the email module, but it doesn't seem to work as it prints the else statement.
my_email = email.message_from_string(data_fetch)
body = ""
if my_email.is_multipart():
    for part in my_email.walk():
        ctype = part.get_content_type()
        cdispo = str(part.get('Content-Disposition'))
        print(ctype, cdispo)
# not multipart - i.e. plain text, no attachments, keeping fingers crossed
else:
    print("Email is not multipart")
    body = my_email.get_payload(decode=True)
    print(body)
I won't include the whole result as it's very long but it basically looks like I get the code for the email, HTML formatting, body text and all:
Body Text: [(b'1 (BODY[TEXT] {78687}', b'--_av-uaAIyctTRCxY0f6Fw54pvw\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n
Does anyone know how I can get the one sentence out of the body text?

I think the b in front of your string makes it a bytes literal. What if you call .decode('UTF-8') on the bytes that hold your body text?
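Decoding the bytes by hand still leaves the quoted-printable encoding and the MIME boundaries in place. Here is a minimal sketch of an alternative, assuming the whole message is fetched with BODY.PEEK[] rather than BODY.PEEK[TEXT]. The TEXT section omits the message headers, which would explain why the email module reported "not multipart" in the question. The function name nth_body_line and the sample message are made up for illustration; in real use the bytes would come from data_fetch[0][1].

```python
import email
from email import policy


def nth_body_line(raw_message: bytes, line_index: int) -> str:
    """Return one line (0-based) of the decoded text body of a raw message."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    # Prefer the text/plain part; fall back to text/html if that is all there is
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        raise ValueError("message has no text body")
    # get_content() undoes quoted-printable/base64 and decodes the charset
    return part.get_content().splitlines()[line_index]


# Hypothetical stand-in for data_fetch[0][1] after mail.fetch('1', '(BODY.PEEK[])')
sample = (
    b"Subject: demo\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Order status\r\n"
    b"Your code =3D 42\r\n"
    b"Goodbye\r\n"
)

print(nth_body_line(sample, 1))
```

Because the sentence sits on the same line in every email, a fixed line_index should be enough; for HTML-only messages the returned line will still contain markup.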

Related

If statement is not working in the Python code below: it skips the if branch and runs only the else branch

I have prepared code for mail automation. It reads the contents of a CSV file, compares them with the mail body, fetches the matching information and shares it via mail.
When the code runs, the if-statement does not work: only the else branch executes.
The if-statement works only if we remove the else.
import csv
from ast import Break, If
import win32com.client as client
outlook = client.Dispatch("Outlook.Application").GetNamespace("MAPI")
folder = outlook.Folders.Item("xyx.comt")
Verizon = folder.Folders.Item("Inbox").Folders("Ver")
Inbox =folder.Folders.Item("Inbox")
message = Ver.Items
message = message.GetLast()
csv_file=csv.reader(open("C:\\Users\\Documents\\Ver2022.csv",'r'))
if message.SenderName == 'abc.com' and "INCIDENT" in message.Subject:
    # Compare mail body and CSV file
    messageBody=str(message.body)
    for row in csv_file:
        if row[2] in messageBody:
            print(row)
            # Extract data from CSV file and send it by mail
            NewMsg=message.Forward()
            NewMsg.Subject=f"Ver||{row[1:3],row[5:7]}"
            NewMsg.Body = f"Hello Team,\n\tPlease find below device details:\n\t Site ID :{row[1]}\n\t Device Name: {row[2]}\n\t Circuit ID :{row[3]}\n\t Circuit Type : {row[4]}\n\t Topology: {row[5]}\n\t Site Country: {row[6]}\n\t Site City: {row[7]}\n\t Site Contact details:{row[8]}\n\t Site Address :{row[9]}\n\t\n\nThanks&Regards\nabc"+NewMsg.Body
            NewMsg.To=('xyz.com')
            NewMsg.Send()
            break
        else:
            NewMsg=message.Forward()
            NewMsg.Subject=f"The device is not available in inventory"
            NewMsg.Body = f"Hello Team,\n\tThe device is not available in inventory \n\t\nThanks&Regards\nxyz"+NewMsg.Body
            NewMsg.To=('abc.com')
            NewMsg.Send()
            break
    message.Move(Inbox)

It is not clear what values are used in the following condition:
if row[2] in messageBody:
Be aware that you can also use the HTMLBody property, which returns the HTML markup of the message body, or the Word object model. The WordEditor property of the Inspector class returns a Document instance that represents the item's message body, so you can use Word properties and methods on it.
The Outlook object model supports three main ways of working with the message body:
The Body property returns or sets a string representing the plain-text body of the Outlook item.
The HTMLBody property of the MailItem class returns or sets a string representing the HTML body of the specified item. Setting the HTMLBody property always updates the Body property immediately. For example:
Sub CreateHTMLMail()
    'Creates a new e-mail item and modifies its properties.
    Dim objMail As Outlook.MailItem
    'Create e-mail item
    Set objMail = Application.CreateItem(olMailItem)
    With objMail
        'Set body format to HTML
        .BodyFormat = olFormatHTML
        .HTMLBody = "<HTML><BODY>Enter the message text here. </BODY></HTML>"
        .Display
    End With
End Sub
The Word object model can be used for dealing with message bodies. See Chapter 17: Working with Item Bodies for more information.
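Apart from the choice of body property, the symptom in the question ("only the else runs") is what you would expect if the else belongs to the inner if and ends with break: the loop then stops at the first CSV row that does not match, so later rows are never checked. The following is a rough sketch of one way around that, using Python's for/else so the "not found" path runs only after every row has been tested. The function name and the sample inventory are hypothetical, and the Outlook calls are left out so that the logic can be run on its own.

```python
import csv
import io


def find_matching_row(rows, message_body):
    """Return the first row whose device name (column 2) occurs in the body, else None."""
    for row in rows:
        if row[2] in message_body:
            break  # stop at the first match
    else:
        # Reached only when the loop ended without break,
        # i.e. after every row has been checked.
        return None
    return row


# Hypothetical inventory data standing in for the CSV file
inventory = io.StringIO(
    "1,SITE-A,router-01,C100\n"
    "2,SITE-B,switch-07,C200\n"
)
rows = list(csv.reader(inventory))
body = "INCIDENT: link down on switch-07 since 09:00"

match = find_matching_row(rows, body)
print(match)
```

With this shape, the "device details" mail would be sent when a row is returned and the "not available in inventory" mail when the result is None.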

Importing text and html template for e-mail correctly

I'm working on code to automate survey participation requests. My current code looks as follows:
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def survey_mail(name, receiver, sender):
    # Adjacent string literals are joined, so each line can be its own f-string
    text_content = (
        f"Hello {name},\r\n"
        f"Thank you for participating in my survey via your mail {receiver}.\r\n"
        f"You can contact me via {sender}."
    )
    html_content = """\
Hello """ + str(name) + """,<br>
Thank you for participating in my survey via your mail """ + str(receiver) + """.<br>
You can contact me via """ + str(sender) + """.
"""
    content = MIMEMultipart('alternative')
    content.attach(MIMEText(text_content, 'plain'))
    content.attach(MIMEText(html_content, 'html'))
    ...
I have two questions here:
First, would it be possible to import the two strings above simply as template files?
Second, is there a better way to handle variables in the string? The current method comes with two different ways to format variables: {} vs. """ + var + """.
I tried to insert the two templates as *.txt files, and then load the templates:
with open("text.txt") as f:
    text_content = f.read()
with open("html.txt") as f:
    html_content = f.read()
However, this did not work. The code just imports each template as a literal string, without filling in the variables.

f-strings are evaluated at definition time, so you cannot read them from a file. The second way in your example (for html) is an expression. While an expression can be eval-ed, that is generally seen as a poor security practice, because it allows execution of uncontrolled code.
But you could just use format as a poor man's templating engine: it has far fewer features than full-fledged template engines, but it is enough here.
Example file for the text part:
Hello {name},
Thank you for participating in my survey via your mail {receiver}.
You can contact me via {sender}.
You can then use it that way:
with open("text.txt") as f:
    text_content = f.read().format(name=name, sender=sender, receiver=receiver)
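One caveat with format is that literal braces in the template (for instance inline CSS in the HTML part) have to be doubled. Below is a small sketch of how both parts could be filled in, assuming string.Template from the standard library is acceptable for the HTML file. The template strings here are inline stand-ins for the contents of text.txt and html.txt, and the $name placeholder style is my substitution, not something from the question.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from string import Template

# Stand-ins for the contents of text.txt and html.txt
TEXT_TEMPLATE = (
    "Hello {name},\n"
    "Thank you for participating in my survey via your mail {receiver}.\n"
    "You can contact me via {sender}.\n"
)
HTML_TEMPLATE = (
    "<style>p { margin: 0 }</style>\n"
    "<p>Hello $name,<br>\n"
    "Thank you for participating in my survey via your mail $receiver.<br>\n"
    "You can contact me via $sender.</p>\n"
)


def survey_mail(name, receiver, sender):
    fields = {"name": name, "receiver": receiver, "sender": sender}
    # str.format for the plain-text part, as in the answer above
    text_content = TEXT_TEMPLATE.format(**fields)
    # string.Template leaves the CSS braces alone and only replaces $placeholders
    html_content = Template(HTML_TEMPLATE).substitute(fields)
    content = MIMEMultipart("alternative")
    content.attach(MIMEText(text_content, "plain"))
    content.attach(MIMEText(html_content, "html"))
    return content


message = survey_mail("Ada", "ada@example.com", "me@example.com")
print(message.get_payload()[0].get_payload())
```

Template.substitute raises KeyError when a placeholder has no value, which makes a forgotten variable visible early.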

How to add .htm to email body using win32com

I need to use win32com.client to make an email where I add a signature with the .htm extension to the mail.HtmlBody. However, each time I do this, I get UnicodeDecodeError.
In other words, how do I correct the UnicodeDecodeError problem and add my string & htm file to the HtmlBody?
self.mail = win32.Dispatch('outlook.application').CreateItem(0)
self.curText = str(self.email.currentText())
self.projectNameT = ' '.join(self.curText.split(' ')[7:])
self.mail.To = 'ABC#XYZ.com'
self.mail.Subject = "Subject: " + str(self.projectNameT)
self.someStr = 'Hello '
self.html_url = open("SomePath//Signature.htm",encoding = 'utf16')
self.data = self.html_url.read()
self.mail.HtmlBody = self.someStr + ('<p>self.data</p>')

If you want to insert a signature fully programmatically using Python, Redemption exposes the RDOSignature object, which implements an ApplyTo method (it deals with signature image files and merges HTML styles). Because of the Outlook security patch, a lot cannot be done natively, so you must work around this before you can proceed as normal.
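If Redemption is not an option, the two problems in the question can also be approached in plain Python. This is only a sketch, under two assumptions: that the UnicodeDecodeError comes from the signature file not actually being UTF-16, and that a file without a byte order mark uses the Windows-1252 code page. Note as well that '<p>self.data</p>' in the question is a literal string, so the file contents never reach the body. The helper names below are hypothetical.

```python
import codecs


def decode_signature(raw: bytes) -> str:
    """Decode signature bytes, picking the codec from the byte order mark if present."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")
    # Assumption: no BOM means a legacy Windows code page
    return raw.decode("cp1252", errors="replace")


def build_html_body(greeting: str, signature_html: str) -> str:
    """Concatenate the greeting and the signature markup itself, not its variable name."""
    return "<p>{}</p>{}".format(greeting, signature_html)


# In real use: raw = open("SomePath//Signature.htm", "rb").read()
raw = "<p>Regards,<br>Jos\u00e9</p>".encode("cp1252")
body = build_html_body("Hello", decode_signature(raw))
print(ascii(body))
```

The resulting string would then be assigned to the mail item's HTMLBody property.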

Unable to display Japanese (UTF-8) characters in email body with webbrowser

I am reading text from two different .txt files and concatenating them. I then add the result to the body of an email by using webbrowser.
One text file is English characters (ascii) and the other Japanese (UTF-8). The text will display fine if I write it to a text file. But if I use webbrowser to insert the text into an email body the Japanese text displays as question marks.
I have tried running the script on multiple machines that have different mail clients as their defaults. Initially I thought maybe that was the issue, but that does not appear to be. Thunderbird and Mail (MacOSX) display question marks.
Hello. Today is 2014-05-09
????????????????2014-05-09????
I have looked at similar issues around on SO but they have not solved the issue.
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)
Japanese in python function
Printing out Japanese (Chinese) characters
python utf-8 japanese
Is there a way to have the Japanese (UTF-8) display in the body of an email created with webbrowser in python? I could use the email functionality but the requirement is the script needs to open the default mail client and insert all the information.
The code and text files I am using are below. I have simplified it to focus on the issue.
email-template.txt
Hello. Today is {{date}}
email-template-jp.txt
こんにちは。今日は {{date}} です。
Python Script
#
# -*- coding: utf-8 -*-
#
import sys
import re
import os
import glob
import webbrowser
import codecs,sys
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
# vars
date_range = sys.argv[1:][0]
email_template_en = "email-template.txt"
email_template_jp = "email-template-jp.txt"
email_to_send = "email-to-send.txt" # finished email is saved here
# Default values for the composed email that will be opened
mail_list = "test#test.com"
cc_list = "test1#test.com, test2#test.com"
subject = "Email Subject"
# Open email templates and insert the date from the parameters sent in
try:
    f_en = open(email_template_en, "r")
    f_jp = codecs.open(email_template_jp, "r", "UTF-8")
    try:
        email_content_en = f_en.read()
        email_content_jp = f_jp.read()
        email_en = re.sub(r'{{date}}', date_range, email_content_en)
        email_jp = re.sub(r'{{date}}', date_range, email_content_jp).encode("UTF-8")
        # this throws an error
        # UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 26: ordinal not in range(128)
        # email_en_jp = (email_en + email_jp).encode("UTF-8")
        email_en_jp = (email_en + email_jp)
    finally:
        f_en.close()
        f_jp.close()
        pass
except Exception, e:
    raise e
# Open the default mail client and fill in all the information
try:
    f = open(email_to_send, "w")
    try:
        f.write(email_en_jp)
        # Does not send Japanese text to the mail client. But will write to the .txt file fine. Unsure why.
        webbrowser.open("mailto:%s?subject=%s&cc=%s&body=%s" %(mail_list, subject, cc_list, email_en_jp), new=1) # open mail client with prefilled info
    finally:
        f.close()
        pass
except Exception, e:
    raise e
edit: Forgot to add I am using Python 2.7.1

EDIT 2: Found a workable solution after all.
Replace your webbrowser call with this.
import subprocess
[... other code ...]
arg = "mailto:%s?subject=%s&cc=%s&body=%s" % (mail_list, subject, cc_list, email_en_jp)
subprocess.call(["open", arg])
This will open your default email client on MacOS. For other OSes please replace "open" in the subprocess line with the proper executable.
EDIT: I looked into it a bit more and Mark's comment above made me read the RFC (2368) for mailto URL scheme.
The special hname "body" indicates that the associated hvalue is the
body of the message. The "body" hname should contain the content for
the first text/plain body part of the message. The mailto URL is
primarily intended for generation of short text messages that are
actually the content of automatic processing (such as "subscribe"
messages for mailing lists), not general MIME bodies.
And a bit further down:
8-bit characters in mailto URLs are forbidden. MIME encoded words (as
defined in [RFC2047]) are permitted in header values, but not for any
part of a "body" hname.
So it looks like this is not possible as per RFC, although that makes me question why the JavaScript solution in the JSFiddle provided by naota works at all.
I leave my previous answer as is below, although it does not work.
I have run into the same issues with Python 2.7.x quite a few times now, and every time a different solution somehow worked.
So here are several suggestions that may or may not work, as I haven't tested them.
a) Force unicode strings:
webbrowser.open(u"mailto:%s?subject=%s&cc=%s&body=%s" % (mail_list, subject, cc_list, email_en_jp), new=1)
Notice the small u right after the opening ( and before the ".
b) Force the regex to use unicode:
email_jp = re.sub(ur'{{date}}', date_range, email_content_jp).encode("UTF-8")
# or maybe
email_jp = re.sub(ur'{{date}}', date_range, email_content_jp)
c) Another idea regarding the regex, try compiling it first with the re.UNICODE flag, before applying it.
pattern = re.compile(ur'{{date}}', re.UNICODE)
d) Not directly related, but I noticed you write the combined text via the normal open method. Try using the codecs.open here as well.
f = codecs.open(email_to_send, "w", "UTF-8")
Hope this helps.
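One more option that fits the RFC restriction on 8-bit characters quoted above is to percent-encode every field, so the URL itself contains only ASCII. RFC 6068, which replaced RFC 2368, describes non-ASCII text in mailto URLs as percent-encoded UTF-8. Whether a particular mail client decodes it correctly is not guaranteed, so treat this as a sketch. It uses Python 3's urllib.parse.quote (Python 2.7 has urllib.quote, which expects UTF-8 encoded byte strings), and the addresses are placeholders.

```python
from urllib.parse import quote


def build_mailto(to: str, subject: str, cc: str, body: str) -> str:
    """Build a mailto URL whose fields are percent-encoded UTF-8."""
    return "mailto:{}?subject={}&cc={}&body={}".format(
        quote(to, safe="@,"),      # keep address separators readable
        quote(subject, safe=""),
        quote(cc, safe="@,"),
        quote(body, safe=""),      # non-ASCII text becomes %XX sequences
    )


url = build_mailto(
    "test@example.com",
    "Email Subject",
    "a@example.com,b@example.com",
    "Hello.\nこんにちは",
)
print(url)
```

The resulting string can be passed to webbrowser.open or to the subprocess call shown earlier.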

Get the Gmail attachment filename without downloading it

I'm trying to get all the messages from a Gmail account that may contain some large attachments (about 30MB). I just need the names, not the whole files. I found a piece of code that gets a message and the attachment's name, but it downloads the file and then reads its name:
import imaplib, email
#log in and select the inbox
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('username', 'password')
mail.select('inbox')
#get uids of all messages
result, data = mail.uid('search', None, 'ALL')
uids = data[0].split()
#read the latest message
result, data = mail.uid('fetch', uids[-1], '(RFC822)')
m = email.message_from_string(data[0][1])
if m.get_content_maintype() == 'multipart': #multipart messages only
    for part in m.walk():
        #find the attachment part
        if part.get_content_maintype() == 'multipart': continue
        if part.get('Content-Disposition') is None: continue
        #save the attachment in the program directory
        filename = part.get_filename()
        fp = open(filename, 'wb')
        fp.write(part.get_payload(decode=True))
        fp.close()
        print '%s saved!' % filename
I have to do this once a minute, so I can't download hundreds of MB of data. I am new to web scripting, so could anyone help me? I don't actually need to use imaplib; any Python library will be fine for me.
Best regards

Rather than fetch RFC822, which is the full content, you could specify BODYSTRUCTURE.
The resulting data structure from imaplib is pretty confusing, but you should be able to find the filename, content-type and sizes of each part of the message without downloading the entire thing.

If you know something about the file name, you can use the X-GM-RAW gmail extensions for imap SEARCH command. These extensions let you use any gmail advanced search query to filter the messages. This way you can restrict the downloads to the matching messages, or exclude some messages you don't want.
mail.uid('search', None, 'X-GM-RAW',
         'has:attachment filename:pdf in:inbox -label:parsed')
The above searches for messages with PDF attachments in INBOX that are not labeled "parsed".
Some pro tips:
label the messages you have already parsed, so you don't need to fetch them again (the -label:parsed filter in the above example)
always use the uid version instead of the standard sequential ids (you are already doing this)
unfortunately MIME is messy: there are a lot of clients that do weird (or plain wrong) things. You could try to download and parse only the headers, but is it worth the trouble?
[edit]
If you label a message after parsing it, you can skip the messages you have parsed already. This should be reasonable enough to monitor your class mailbox.
Perhaps you live in a corner of the world where internet bandwidth is more expensive than programmer time; in this case, you can fetch only the headers and look for "Content-disposition" == "attachment; filename=somefilename.ext".

A FETCH of the RFC822 message data item is functionally equivalent to BODY[]. IMAP4 supports other message data items, listed in section 6.4.5 of RFC 3501.
Try requesting a different set of message data items to get just the information that you need. For example, you could try RFC822.HEADER or BODY.PEEK[HEADER]. Note that the MIME specifier needs a part number in front of it, as in BODY.PEEK[1.MIME].

Old question, but just wanted to share the solution to this I came up with today. Searches for all emails with attachments and outputs the uid, sender, subject, and a formatted list of attachments. Edited relevant code to show how to format BODYSTRUCTURE:
data = mailobj.uid('fetch', mail_uid, '(BODYSTRUCTURE)')[1]
struct = data[0].split()
list = [] #holds list of attachment filenames
for j, k in enumerate(struct):
    if k == '("FILENAME"':
        count = 1
        val = struct[j + count]
        while val[-3] != '"':
            count += 1
            val += " " + struct[j + count]
        list.append(val[1:-3])
    elif k == '"FILENAME"':
        count = 1
        val = struct[j + count]
        while val[-1] != '"':
            count += 1
            val += " " + struct[j + count]
        list.append(val[1:-1])
I've also published it on GitHub.
EDIT
The solution above is good, but the logic that extracts the attachment file name from the payload is not robust. It fails when the file name contains a space and the first word has only two characters, for example: "ad cde gh.png".
Try this:
import re # Somewhere at the top
result, data = mailobj.uid("fetch", mail_uid, "BODYSTRUCTURE")
# Raw string, so the backslash in the character class is not treated as an escape
itr = re.finditer(r'("FILENAME" "([^\/:*?"<>|]+)")', data[0].decode("ascii"))
for match in itr:
    print(f"File name: {match.group(2)}")
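To try the idea without a live mailbox, here is a self-contained sketch that runs a slightly simpler pattern over a hand-written BODYSTRUCTURE response. The sample bytes are invented to resemble what an IMAP server returns, and the pattern is my simplification: it assumes file names are sent as plain quoted strings and does not handle escaped quotes, IMAP literals, or RFC 2231 encoded names (FILENAME*).

```python
import re

# Match the quoted value that follows a "FILENAME" key, in any letter case
FILENAME_RE = re.compile(r'"FILENAME" "([^"]+)"', re.IGNORECASE)


def attachment_names(bodystructure: bytes) -> list:
    """Return the attachment file names found in one BODYSTRUCTURE response."""
    return FILENAME_RE.findall(bodystructure.decode("ascii", errors="replace"))


# Invented sample standing in for data[0] of a BODYSTRUCTURE fetch
sample = (
    b'5 (UID 42 BODYSTRUCTURE (("TEXT" "PLAIN" ("CHARSET" "UTF-8") NIL NIL "7BIT" 12 1 NIL NIL NIL)'
    b'("IMAGE" "PNG" ("NAME" "ad cde gh.png") NIL NIL "BASE64" 3456 NIL ("ATTACHMENT" ("FILENAME" "ad cde gh.png")) NIL)'
    b'("APPLICATION" "PDF" ("NAME" "report.pdf") NIL NIL "BASE64" 7890 NIL ("ATTACHMENT" ("FILENAME" "report.pdf")) NIL)'
    b' "MIXED" ("BOUNDARY" "xyz") NIL NIL))'
)

print(attachment_names(sample))
```

Only the structure of the message is transferred for this, so the size of the attachments does not matter.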
