Storing email into mongodb - python

I am encountering a really strange bug.
for emailid in item_ids:
resp, data = conn.fetch(emailid, "(RFC822)")
try:
db.emails.insert({'raw': data})
So I am fetching a bunch of data from gmail using oauth2.clients.imap. After fetching the email from gmail, I have decided to store it as "raw" first in my mongodb.
And then in another part of my script, I do something like this:
for i,j in enumerate(db.emails.find()):
raw_s = j['raw'][0][1]
email = email_module.message_from_string(raw_s)
if email.is_multipart():
print get_cleaned_body(email)
note that I did a import email as email_module and shadowed the variable email as I cannot think of a better term for a variable to hold an email instance
Now the strange this is, none of my email instances are multipart!
If i modify my retrieval code to be:
for emailid in item_ids:
resp, data = conn.fetch(emailid, "(RFC822)")
try:
#db.emails.insert({'raw': data})
e = email.message_from_string(data[0][1])
print e.is_multipart()
I am seeing a few Trues.
I guess one possible explanation for this might be that saving the data into mongodb messes up something which doesn't allow the email to be parse correctly?

Turns out you should do this:
oid = db.emails.insert({'raw': bson.binary.Binary(data)})
Saving in binary ensures that the original content of the data is not changed.

Related

Transfer an email with Python

I've tried with no conclusions to resend emails with Python.
Once I've logged in SMTP and IMAP with TLS, this is what I have written:
status, data = self._imapserver.fetch(id, "(RFC822)")
email_data = data[0][1]
# create a Message instance from the email data
message = email.message_from_string(email_data)
# replace headers (could do other processing here)
message.replace_header("From", 'blablabla#bliblibli.com')
message.replace_header("To", 'blobloblo#blublublu.com')
self._smtpserver.sendmail('blablabla#bliblibli.com', 'blobloblo#blublublu.com', message.as_string())
But the problem is that the variable data doesn't catch the information from the email, even if the ID is the one I need.
It tells me:
b'The specified message set is invalid.'
How can I transfer an email with Python?
Like the error message says, whatever you have in id is invalid. We don't know what you put there, so all we can tell you is what's already in the error message.
(Also, probably don't use id as a variable name, as you will shadow the built-in function with the same name.)
There are additional bugs further on in your code; you need to use message_from_bytes if you want to parse it, though there is really no need to replace the headers just to resend it.
status, data = self._imapserver.fetch(correct_id, "(RFC822)")
self._smtpserver.sendmail('blablabla#bliblibli.com', 'blobloblo#blublublu.com', data[0][1])
If you want to parse the message, you should perhaps add a policy argument; this selects the modern EmailMessage API which was introduced in Python 3.6.
from email.policy import default
...
message = email.message_from_bytes(data[0][1], policy=default)
message["From"] = "blablabla#bliblibli.com"
message["To"] = "blobloblo#blublublu.com"
self._smtpserver.send_message(message)
The send_message method is an addition to the new API. If the message could contain other recipient headers like Cc:, Bcc: etc, perhaps using the good old sendmail method would be better, as it ignores the message's headers entirely.

python IMAP content of email contains a string

I am able to log in a gmail account with python IMAP
imap = imaplib.IMAP4_SSL('imap.gmail.com')
imap.login(myDict["emailUsername"], myDict["emailPassword"])
imap.select(mailbox='inbox', readonly=False)
resp, items = imap.search(None, 'All')
email_ids = items[0].split()
latest_email_id = email_ids[-1]
resp, data = imap.fetch(latest_email_id, "(UID)")
print ("resp= ", resp, " data=", data)
#msg_uid = parse_uid(data[0])
match = pattern_uid.match(data[0].decode("utf-8"))
#print ("match= ", match)
msg_uid = match.group('uid')
I need to make sure that the UID for the last email I have contains a certain string (XYZ). I am NOT looking for header subject but the content of email. How can I do that ?
There's a couple ways you could go:
Fetch the message and walk through the text body parts looking for your string -- example at Finding links in an emails body with Python
Get the server to do the search by supplying 'latest_email_id' and your search criteria back to the server in a UID SEARCH command. For Gmail, you can even use the X-GM-RAW attribute to use the same syntax support by the GMail web interface. See https://developers.google.com/gmail/imap/imap-extensions for details of that.

Getting complaint email from SES abuse report email

I am using python Imaplib to scrape zoho inbox for getting bounced emails & failed emails which are being sent from SES.
Now while trying to get the email from abuse report notification, the email body gives no result (NONE)
The Code is:
def ss():
yesterday = (datetime.today() - timedelta(days=30)).strftime('%d-%b-%Y')
M = imaplib.IMAP4_SSL('imap.zoho.com')
M.login('email', password)
M.select()
line = '(FROM "complaints#us-west-2.email-abuse.amazonses.com" SINCE {0})'.format(yesterday)
typ, data = M.uid('search', line)
# print(typ,data)
for i in reversed(data[0].split()):
print(i)
result, data = M.fetch(i, "(RFC822)")
print(data)
Normally M.fetch(i, "(RFC822)") returns Body of the email.
Here the data is None. I want to know how to get the right content so that i could use regex to get relevant mail id
Got the solution, It was a bad mistake.
Instead of using
result, data = M.fetch(i, "(RFC822)")
I had to use :
result, data = M.uid('fetch', i, '(RFC822)')
As previously I had searched through UID instead fo the volatile id. Then later I was trying to get RFC822 or body of mail by volatile id.
It was perhaps giving none because the mail might have been deleted or something.

imaplib.error: command FETCH illegal in state AUTH

I'm trying to download attachments from Gmail, using a combination of pieces of code I found online, and some editing from myself. However, the following code:
import email, getpass, imaplib, os, random, time
import oauth2 as oauth
import oauth2.clients.imap as imaplib
MY_EMAIL = 'example#gmail.com'
MY_TOKEN = "token"
MY_SECRET = "secret"
consumer = oauth.Consumer('anonymous', 'anonymous')
token = oauth.Token(MY_TOKEN, MY_SECRET)
url = "https://mail.google.com/mail/b/"+MY_EMAIL+"/imap/"
m = imaplib.IMAP4_SSL('imap.gmail.com')
m.authenticate(url, consumer, token)
m.select('INBOX')
items = m.select("UNSEEN");
items = items[0].split()
for emailid in items:
data = m.fetch(emailid, "(RFC822)")
returns this error:
imaplib.error: command FETCH illegal
in state AUTH
Why would Fetch be illegal while I'm authorized?
You're lacking error checking on your calls to select. Typically, this is how I'll structure the first parts of a connection to a mailbox:
# self.conn is an instance of IMAP4 connected to my server.
status, msgs = self.conn.select('INBOX')
if status != 'OK':
return # could be break, or continue, depending on surrounding code.
msgs = int(msgs[0])
Essentially, the trouble you're encountering is that you've selected a mailbox that doesn't exist, your status message is probably not "OK" as it should be, and the value you're iterating over isn't valid. Remember, select expects a mailbox name. It does not search based on a flag (which may be what you're attempting with "UNSEEN"). When you select a non-existent mail box you actually get this as a response:
('NO', ['The requested item could not be found.'])
In which case, for email id in items is not operating properly. Not what you're after in any way, unfortunately. What you'd get on a valid mailbox would be like this:
('OK', ['337'])
Hope that helps.
To address the question in comments, if you want to actually retrieve the unseen messages in the mailbox you'd use this:
status, msgs = self.conn.select('INBOX')
# returns ('OK', ['336'])
status, ids = self.conn.search(None, 'UNSEEN')
# returns ('OK', ['324 325 326 336'])
if status == 'OK':
ids = map(int, ids[0].split())
The response is going to be similar to the response from select, but instead of a single integer for the number of messages you'll get a list of ids.
Why would Fetch be illegal while I'm authorized?
IMAP has a state concept.
You can only fetch messages if you have selected a folder.
Here is a nice picture which shows this:
Image source: http://www.tcpipguide.com/free/t_IMAP4GeneralOperationClientServerCommunicationandS-2.htm

How can I extract only the email body with Python using IMAP?

I am relatively new to programming and to python, but I think I have done ok so far. This is the code I have, and it works fine, except it gets the entire message in MIME format. I only want the text body of unread emails, but I can't quite figure it out how to strip out all of the formatting and header info. If I send a basic email using a smtp python script that I made it works fine, and only prints the body, but if I send the email using outlook it prints a bunch of extra garbage. Any help is very much appreciated.
client = imaplib.IMAP4_SSL(PopServer)
client.login(USER, PASSWORD)
client.select('INBOX')
status, email_ids = client.search(None, '(UNSEEN SUBJECT "%s")' % PrintSubject)
print email_ids
client.store(email_ids[0].replace(' ',','),'+FLAGS','\Seen')
for email in get_emails(email_ids):
get_emails()
def get_emails(email_ids):
data = []
for e_id in email_ids[0].split():
_, response = client.fetch(e_id, '(UID BODY[TEXT])')
data.append(response[0][1])
return data
Sounds like you're looking for the email package:
The email package provides a standard parser that understands most email document structures, including MIME documents. You can pass the parser a string or a file object, and the parser will return to you the root Message instance of the object structure. For simple, non-MIME messages the payload of this root object will likely be a string containing the text of the message. For MIME messages, the root object will return True from its is_multipart() method, and the subparts can be accessed via the get_payload() and walk() methods.

Categories