I am using Python's imaplib to scrape a Zoho inbox for bounced and failed emails that were sent through SES.
When I try to fetch an email from an abuse report notification, the email body comes back empty (None).
The code is:
def ss():
    yesterday = (datetime.today() - timedelta(days=30)).strftime('%d-%b-%Y')
    M = imaplib.IMAP4_SSL('imap.zoho.com')
    M.login('email', password)
    M.select()
    line = '(FROM "complaints@us-west-2.email-abuse.amazonses.com" SINCE {0})'.format(yesterday)
    typ, data = M.uid('search', line)
    # print(typ,data)
    for i in reversed(data[0].split()):
        print(i)
        result, data = M.fetch(i, "(RFC822)")
        print(data)
Normally M.fetch(i, "(RFC822)") returns the full message, including the body.
Here the data is None. How can I get the actual content so that I can use a regex to extract the relevant email address?
I found the solution; it was a careless mistake.
Instead of using
result, data = M.fetch(i, "(RFC822)")
I had to use:
result, data = M.uid('fetch', i, '(RFC822)')
I had searched by UID, but was then trying to fetch the RFC822 content using those UIDs as if they were the volatile message sequence numbers.
The result was None most likely because no message in the mailbox had a sequence number equal to that UID.
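The fix above amounts to staying in one identifier space: search by UID, then fetch by UID. A minimal sketch of that pattern follows; the helper name fetch_messages_by_uid is hypothetical, and conn is assumed to be a logged-in imaplib connection with a mailbox already selected.

```python
def fetch_messages_by_uid(conn, criteria):
    """Search by UID and fetch each hit by UID, highest UID first.

    `conn` is assumed to be a logged-in imaplib.IMAP4 / IMAP4_SSL object
    with a mailbox selected. Returns a list of (uid, raw_message_bytes).
    """
    typ, data = conn.uid('search', None, criteria)
    if typ != 'OK' or not data or not data[0]:
        return []
    messages = []
    for uid in reversed(data[0].split()):
        # UID SEARCH results must be fetched with UID FETCH, not plain FETCH.
        typ, parts = conn.uid('fetch', uid, '(RFC822)')
        # A successful fetch starts with an (envelope, raw_bytes) tuple;
        # a miss comes back as [None], which is skipped here.
        if typ == 'OK' and parts and isinstance(parts[0], tuple):
            messages.append((uid, parts[0][1]))
    return messages
```

With the connection from the question, this could be called as fetch_messages_by_uid(M, line), where line is the search criteria string built earlier.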
I am able to log in to a Gmail account with Python's imaplib:
imap = imaplib.IMAP4_SSL('imap.gmail.com')
imap.login(myDict["emailUsername"], myDict["emailPassword"])
imap.select(mailbox='inbox', readonly=False)
resp, items = imap.search(None, 'All')
email_ids = items[0].split()
latest_email_id = email_ids[-1]
resp, data = imap.fetch(latest_email_id, "(UID)")
print ("resp= ", resp, " data=", data)
#msg_uid = parse_uid(data[0])
match = pattern_uid.match(data[0].decode("utf-8"))
#print ("match= ", match)
msg_uid = match.group('uid')
I need to make sure that the last email (the one whose UID I have) contains a certain string (XYZ). I am NOT looking at the Subject header but at the content of the email. How can I do that?
There are a couple of ways you could go:
Fetch the message and walk through the text body parts looking for your string (see the example at "Finding links in an emails body with Python").
Have the server do the search by sending the message's UID (msg_uid) together with your search criteria in a UID SEARCH command. For Gmail, you can even use the X-GM-RAW attribute to get the same search syntax supported by the Gmail web interface. See https://developers.google.com/gmail/imap/imap-extensions for details.
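The second option could look roughly like the sketch below. The helper name message_contains is hypothetical, imap is assumed to be a logged-in imaplib connection with a mailbox selected, and the search string is assumed to be plain ASCII. How a server matches BODY against encoded message content can vary, so treat a negative result with some caution.

```python
def message_contains(imap, msg_uid, needle):
    """Return True if the server reports that message `msg_uid` matches
    a BODY search for `needle`.

    Combines the UID and BODY search keys in one UID SEARCH command, so
    the server only considers that single message.
    """
    # Quote the search string and escape characters special in IMAP quoted strings.
    quoted = '"' + needle.replace('\\', '\\\\').replace('"', '\\"') + '"'
    typ, data = imap.uid('search', None, 'UID', str(msg_uid), 'BODY', quoted)
    hits = data[0].split() if typ == 'OK' and data and data[0] else []
    return bool(hits)
```

In the question's code this would be called as message_contains(imap, msg_uid, 'XYZ') after msg_uid has been extracted.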
I am using this code:
import imaplib
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login(myusername, mypassword)
mail.list()
# Out: list of "folders" aka labels in gmail.
mail.select("inbox") # connect to inbox.
result, data = mail.search(None, "ALL")
ids = data[0] # data is a list.
id_list = ids.split() # ids is a space separated string
latest_email_id = id_list[-1] # get the latest
result, data = mail.fetch(latest_email_id, "(RFC822)") # fetch the email body (RFC822) for the given ID
raw_email = data[0][1] # here's the body, which is raw text of the whole email
# including headers and alternate payloads
print(raw_email)
This works, except that printing raw_email shows a lot of extra information. How can I parse it and get just the From header and the body text?
Python's email package is probably a good place to start.
import email
msg = email.message_from_bytes(raw_email)  # on Python 2, use email.message_from_string
print(msg['From'])
print(msg.get_payload(decode=True))
That should do what you ask, though things get a bit more complicated when an email has multiple parts (attachments, text and HTML versions of the body, etc.).
In that case, msg.is_multipart() will return True and msg.get_payload() will return a list of sub-messages instead of a string (with decode=True it returns None). There's a lot more information in the email.message documentation.
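For the multipart case, one option on Python 3.6+ is to parse with the modern policy and let get_body() pick the text part. This is only a sketch: the helper name sender_and_text and the demo message are made up for illustration, and in real use raw_email would be the bytes returned by the fetch.

```python
import email
from email import policy
from email.message import EmailMessage


def sender_and_text(raw_email):
    """Return (From header, best-effort body text) for raw RFC 822 bytes."""
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    # get_body() returns the best body candidate, preferring plain text to HTML.
    body_part = msg.get_body(preferencelist=('plain', 'html'))
    text = body_part.get_content() if body_part is not None else ''
    return msg['From'], text


# Build a small multipart/alternative message to exercise the helper.
demo = EmailMessage()
demo['From'] = 'Alice <alice@example.com>'
demo['Subject'] = 'Tester'
demo.set_content('This is a test message')
demo.add_alternative('<p>This is a test message</p>', subtype='html')

sender, text = sender_and_text(demo.as_bytes())
print(sender)
print(text)
```

Attachments are ignored by get_body(), so this returns only the message text even when files are attached.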
Alternatively, rather than parsing the raw RFC822-formatted message (which could be very large if the email contains attachments), you could just ask the IMAP server for the information you want. Changing your mail.fetch line to:
mail.fetch(latest_email_id, "(BODY[HEADER.FIELDS (FROM)])")
would request (and return) just the From line of the email from the server. Likewise, setting the second parameter to "(UID BODY[TEXT])" would return the body of the email. RFC 2060 (since obsoleted by RFC 3501) lists the parameters that should be valid here.
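As a rough sketch of the header-only approach, the helper below (fetch_from_header is a made-up name) requests just the From header and parses the few bytes that come back. It assumes mail is a logged-in imaplib connection with a mailbox selected, and it uses BODY.PEEK rather than BODY, which is a substitution on my part so that the message is not marked as read.

```python
import email


def fetch_from_header(mail, message_id):
    """Ask the server for only the From header of one message.

    Returns the header value as a string, or None if nothing came back.
    """
    # BODY.PEEK behaves like BODY but does not set the \Seen flag.
    typ, data = mail.fetch(message_id, '(BODY.PEEK[HEADER.FIELDS (FROM)])')
    if typ != 'OK' or not data or not isinstance(data[0], tuple):
        return None
    # data[0] is (response envelope, header bytes); parse only the header bytes.
    headers = email.message_from_bytes(data[0][1])
    return headers['From']
```

With the code from the question, fetch_from_header(mail, latest_email_id) would return the sender without downloading the body or any attachments.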
IMAP high level lib: https://github.com/ikvk/imap_tools (I am author)
from imap_tools import MailBox, A
with MailBox('imap.mail.com').login('test@mail.com', 'password', 'INBOX') as mailbox:
    for msg in mailbox.fetch(A(all=True)):
        sender = msg.from_
        body = msg.text or msg.html
Alternatively, you can use Red Box (I'm the author):
from redbox import EmailBox
# Create email box instance
box = EmailBox(
    host="imap.example.com",
    port=993,
    username="me@example.com",
    password="<PASSWORD>"
)
# Select an email folder
inbox = box["INBOX"]
# Search and process messages
for msg in inbox.search(all=True):
    # Process the message
    print(msg.from_)
    print(msg.to)
    print(msg.subject)
    print(msg.text_body)
    print(msg.html_body)
To install:
pip install redbox
I have a Python script that checks emails on Gmail's IMAP server and displays the latest email on the server. However, the mailbox receives emails addressed to multiple different accounts. How would I change this script so that it shows ONLY messages sent to a specific recipient? For example, only emails sent to "johndoe@gmail.com" should come up.
import imaplib
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('myemail@gmail.com', 'password123')
mail.list()
# Out: list of "folders" aka labels in gmail.
mail.select("inbox") # connect to inbox.
result, data = mail.search(None, "ALL")
ids = data[0] # data is a list.
id_list = ids.split() # ids is a space separated string
latest_email_id = id_list[-1] # get the latest
result, data = mail.fetch(latest_email_id, "(RFC822)") # fetch the email body (RFC822) for the given ID
raw_email = data[0][1] # here's the body, which is raw text of the whole email
# including headers and alternate payloads
Thanks in advance! :)
Change:
result, data = mail.search(None, "ALL")
to:
result, data = mail.search(None, '(TO "johndoe@gmail.com")')
I am encountering a really strange bug.
for emailid in item_ids:
    resp, data = conn.fetch(emailid, "(RFC822)")
    try:
        db.emails.insert({'raw': data})
    except Exception:
        raise  # error handling omitted here
I am fetching a batch of emails from Gmail using oauth2.clients.imap. After fetching each email, I first store it as "raw" in my MongoDB.
Then, in another part of my script, I do something like this:
for i, j in enumerate(db.emails.find()):
    raw_s = j['raw'][0][1]
    email = email_module.message_from_string(raw_s)
    if email.is_multipart():
        print(get_cleaned_body(email))
Note that I did import email as email_module here and use email as the variable name, as I cannot think of a better name for a variable that holds an email instance.
Now the strange thing is, none of my email instances are multipart!
If I modify my retrieval code to be:
for emailid in item_ids:
    resp, data = conn.fetch(emailid, "(RFC822)")
    try:
        # db.emails.insert({'raw': data})
        e = email.message_from_string(data[0][1])
        print(e.is_multipart())
    except Exception:
        raise  # error handling omitted here
I see a few Trues.
I guess one possible explanation is that saving the data into MongoDB alters it in a way that prevents the email from being parsed correctly?
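It turns out you should store the raw message as binary:
oid = db.emails.insert({'raw': bson.binary.Binary(data[0][1])})
Saving it as binary ensures that the original bytes are not changed. Binary expects a byte string, so this passes the raw message data[0][1] rather than the whole fetch result; the reading code then parses j['raw'] directly.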
I would like to receive email using python. So far I have been able to get the subject but not the body. Here is the code I have been using:
import poplib
from email import parser
pop_conn = poplib.POP3_SSL('pop.gmail.com')
pop_conn.user('myusername')
pop_conn.pass_('mypassword')
# Get messages from server:
messages = [pop_conn.retr(i) for i in range(1, len(pop_conn.list()[1]) + 1)]
# Concatenate message pieces (on Python 3, retr() returns the lines as bytes):
messages = [b"\n".join(mssg[1]) for mssg in messages]
# Parse each message into an email object:
messages = [parser.BytesParser().parsebytes(mssg) for mssg in messages]
for message in messages:
    print(message['subject'])
    print(message['body'])
pop_conn.quit()
My issue is that when I run this code it correctly prints the subject but not the body. If I send an email with the subject "Tester" and the body "This is a test message", this is the output in IDLE:
Tester
None
So the subject is read correctly but the body is not. I think the problem is in how I parse the message, but I don't know these libraries well enough to change the code so that it returns both the subject and the body.
The message object does not have a body key; you need to go through its parts, like this:
for part in message.walk():
    if part.get_content_type() == 'text/plain':
        body = part.get_payload(decode=True)
The walk() method iterates depth-first through the parts of the email, and you are looking for the parts whose content type is textual. The content type can be text/plain or text/html, and one email can contain both (when the message content type is multipart/alternative). Checking the type explicitly matters because get_content_type() always returns a string, and get_payload(decode=True) returns None for the multipart containers themselves.
The email parser returns an email.message.Message object, which does not contain a body key, as you'll see if you run
print(message.keys())
What you want is the get_payload() method:
for message in messages:
    print(message['subject'])
    print(message.get_payload())
pop_conn.quit()
But this gets complicated when it comes to multipart messages: for those, get_payload() returns a list of parts, each of which is a Message object. You can get a particular part of a multipart message by using get_payload(i), which returns the ith part, raises an IndexError if i is out of range, or raises a TypeError if the message is not multipart.
As Gustavo Costa De Oliveir points out, you can use the walk() method to get the parts in order; it does a depth-first traversal of the parts and subparts of the message.
There's more about the Message class at http://docs.python.org/library/email.message.html#email.message.Message.
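The behaviour of get_payload() and walk() described above can be checked without a mail server. The sketch below builds two throwaway messages (the subjects and bodies are made up) and shows the single-part and multipart cases side by side.

```python
from email.message import EmailMessage

# A plain, single-part message: get_payload() returns a string.
simple = EmailMessage()
simple['Subject'] = 'Tester'
simple.set_content('This is a test message')

# Adding an HTML alternative makes this one multipart/alternative.
multi = EmailMessage()
multi['Subject'] = 'Tester'
multi.set_content('This is a test message')
multi.add_alternative('<p>This is a test message</p>', subtype='html')

print(simple.is_multipart())   # False
print(multi.is_multipart())    # True

# walk() visits the container first, then each sub-part, depth-first.
print([part.get_content_type() for part in multi.walk()])

first_part = multi.get_payload(0)   # the first sub-message
print(first_part.get_content_type())

try:
    simple.get_payload(0)           # not multipart, so indexing fails
except TypeError as exc:
    print('TypeError:', exc)
```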
It is also a good idea to return the data in the correct encoding when the message contains multilingual content:
charset = part.get_content_charset() or 'utf-8'  # fall back when no charset is declared
content = part.get_payload(decode=True)
content = content.decode(charset).encode('utf-8')
Here is how I solved the problem using Python 3's newer capabilities:
import imaplib
import email
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login(username, password)
mail.select(readonly=True) # open the default mailbox (INBOX) read-only
status, message_ids = mail.search(None, 'ALL') # get all emails
for message_id in message_ids[0].split(): # iterate over all message ids
    # for every id get the actual email
    status, message_data = mail.fetch(message_id, '(RFC822)')
    actual_message = email.message_from_bytes(message_data[0][1])
    # extract the needed fields
    email_date = actual_message["Date"]
    subject = actual_message["Subject"]
    message_body = get_message_body(actual_message)
Now get_message_body is actually pretty tricky due to MIME format. I used the function suggested in this answer.
This particular example works with Gmail, but IMAP is a standard protocol, so it should work for other email providers as well, possibly with minor changes.
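The get_message_body function from the linked answer is not reproduced here. As a stand-in, a minimal version might look like the sketch below; it is an assumption about what such a helper does, not the original function. It returns the first inline text/plain part and falls back to UTF-8 when no usable charset is declared.

```python
import email
from email.message import EmailMessage


def get_message_body(message):
    """Return the first inline text/plain part as str ('' if there is none)."""
    for part in message.walk():
        if part.get_content_type() != 'text/plain':
            continue
        # Skip text files sent as attachments; only the inline body is wanted.
        if part.get_content_disposition() == 'attachment':
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        if payload is None:
            continue
        charset = part.get_content_charset() or 'utf-8'
        try:
            return payload.decode(charset, errors='replace')
        except LookupError:
            # The declared charset is unknown to Python; fall back to UTF-8.
            return payload.decode('utf-8', errors='replace')
    return ''


# Exercise the helper on a locally built multipart message.
demo = EmailMessage()
demo['Subject'] = 'Tester'
demo.set_content('This is a test message')
demo.add_alternative('<p>This is a test message</p>', subtype='html')

body = get_message_body(email.message_from_bytes(demo.as_bytes()))
print(body)
```

HTML-only messages yield an empty string with this sketch; handling them would mean also accepting text/html parts and stripping the markup.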
If you want to use IMAP4, you can use the outlook Python library; download it here: https://github.com/awangga/outlook
To retrieve unread email from your inbox:
import outlook
mail = outlook.Outlook()
mail.login('emailaccount@live.com','yourpassword')
mail.inbox()
print(mail.unread())
To retrieve the elements of the email:
print(mail.mailbody())
print(mail.mailsubject())
print(mail.mailfrom())
print(mail.mailto())