What I am trying to do:
I want to retrieve emails from gmail, download the email as a text file then delete the original email off the gmail server.
The Issue:
The Issue I run into is when I add the delete portion of the code. So If I don't delete, it will pull down all the emails I want and save them as text files. Once I added the delete portion of my code it downloads 2 emails then deletes them off the gmail server then gives the error below on the third. I can run the script again and it will download another 2 and give the same error on the third. I can't see what i'm doing wrong, any help with this would be greatly appreciated.
The Code
#!/usr/bin/env python
import getpass, imaplib, email, os
from email.parser import HeaderParser
detach_dir = os.path.expanduser('~/Documents/Test/')
M = imaplib.IMAP4_SSL("imap.gmail.com", 993)
address = "EMAIL#GMAIL.com"
password = "PASSWORD"
M.login(address, password)
M.select("EMAIL LABEL")
resp, items = M.search(None, "ALL")
items = items[0].split()
for emailid in items:
resp, data = M.FETCH(emailid, '(RFC822)')
mail = email.message_from_string(data[0][1])
for part in mail.walk():
if part.get_content_maintype() == 'multipart':
continue
if part.get_content_subtype() != 'plain':
continue
payload = part.get_payload()
filename = mail["Subject"] + "_gmail.txt"
filename = filename.replace('FW: ', '').replace(' ', '_').replace('\n', '').replace('\r', '').replace('__', '_')
print "FILENAME IS: " + filename
att_path = os.path.join(detach_dir, filename)
if (not os.path.isfile(att_path)):
fp = open(att_path, 'w+')
fp.write(payload)
fp.close()
M.store(emailid, '+FLAGS', '\\Deleted')
M.expunge()
The Error
FILENAME IS: email_1_gmail.txt
FILENAME IS: email_2_gmail.txt
Traceback (most recent call last):
File "/Users/user1/Documents/Personal Work/Project/gmail4.py", line 20, in <module>
mail = email.message_from_string(data[0][1])
TypeError: 'NoneType' object has no attribute '__getitem__'
Related
I am trying to parse some emails using python and I am hitting a wall with html2text. The project is to take a bunch of emails that are responses to a form and get that into a workable csv format. You can invasion the body of the mail being akin to
name: <name>
email: <email address>
grade level: <grade level>
interests: <interests>
(etc)
I need get the body of the email and parse it from raw HTML to text so that I can work through it for questions and their corresponding answers. This is my code so far:
#!/usr/bin/env python3
""" Access email and get contents to csv """
import datetime, os, glob
import email, html2text, imaplib
EMAIL_UN = '<email address>'
EMAIL_PW = '<app token>'
MENTOR = 'Form Submission - Mentor Application'
MENTEE = 'Form Submission - Mentee Application'
cwd = os.getcwd()
sentSince = "SENTSINCE 01-Oct-2021"
def download_emails(SUBJECT):
un = EMAIL_UN
pw = EMAIL_PW
url = 'imap.gmail.com'
folder = "\"" + "Mentoring Program" + "\""
complexSearch = sentSince + " Subject " + "\"" + SUBJECT + "\""
detach_dir = cwd # directory where to save output (default: current)
# connecting to the gmail imap server
m = imaplib.IMAP4_SSL(url,993)
m.login(un,pw)
m.select(folder)
# # This allows us to cycle through the folders and labels in Gmail to find what we need
# for items in m.list('/'):
# for item in items:
# print(item)
resp, items = m.search(None, complexSearch)
# you could filter using the IMAP rules here (check http://www.example-code.com/csharp/imap-search-critera.asp)
items = items[0].split() # getting the mails id
print("Items: \n{}".format(items))
results = [] # This will be a list of dicts, to breakdown each email
for emailid in items:
resp, data = m.fetch(emailid, "(RFC822)") # fetching the mail, "`(RFC822)`" means "get the whole stuff", but you can ask for headers only, etc
if resp != 'OK':
raise Exception("Error reading email: {}".format(data))
message = email.message_from_string(str(data[0][1])) # parsing the mail content to get a mail object
# Next parse the mail object to get a dictionary with important data, text = body of email
res = {
'From' : email.utils.parseaddr(message['From'])[1],
'From name' : email.utils.parseaddr(message['From'])[0],
'Time' : message['Date'],
'To' : message['To'],
'Text' : '',
'File' : None
}
# message.walk is a generator that allows us to walk through the parts of the email
for part in message.walk():
if part.get_content_maintype() == 'multipart':
continue
if part.get_content_maintype() == 'text':
# reading as HTML (not plain text)
_html = part.get_payload(decode = True)
print(type(_html)) # checking the type to make sure it's bytes
res['Text'] = html2text.html2text(_html)
results.append(res)
print(results)
if __name__ == '__main__':
download = input("Do you want to search for Mentor or Mentee forms? (enter: Mentor / Mentee) \n")
if download.upper() == 'MENTOR':
download_emails(MENTOR)
elif download.upper() == 'MENTEE':
download_emails(MENTEE)
else:
print('Usage: ./download_emails.py # prompt enter Mentor or Mentee:')
The issue I am running into is that html2text is complaining about needing a bytes like object, but the class of _email is bytes.... what am I messing up here?? You can see the result of print(type(_html)) right before the error.
The error:
<class 'bytes'>
Traceback (most recent call last):
File "Email_ScriptV0.0.1.py", line 97, in <module>
download_emails(MENTOR)
File "Email_ScriptV0.0.1.py", line 61, in download_emails
res['Text'] = html2text.html2text(_html)
File "/opt/anaconda3/lib/python3.8/site-packages/html2text/__init__.py", line 947, in html2text
return h.handle(html)
File "/opt/anaconda3/lib/python3.8/site-packages/html2text/__init__.py", line 142, in handle
self.feed(data)
File "/opt/anaconda3/lib/python3.8/site-packages/html2text/__init__.py", line 138, in feed
data = data.replace("</' + 'script>", "</ignore>")
TypeError: a bytes-like object is required, not 'str'
Thanks for any and all help!
I have a piece of code to get emails from messages from my inbox (gmail). Getting emails work correct when I print email_from but I would like to do some operation on data split name and email etc. but then code broke after print first loop step and I got the error:
Traceback (most recent call last):
File "C:\Users\loc\Desktop\extract_gmail.py", line 24, in <module>
email_message_raw = email.message_from_bytes(data[0][1])
AttributeError: 'str' object has no attribute 'message_from_bytes'
Can you give me some advice how to solve this problem?
Code:
import imaplib
import email
from email.header import Header, decode_header, make_header
# Connection settings
HOST = 'imap.gmail.com'
USERNAME = '***'
PASSWORD = '***'
m = imaplib.IMAP4_SSL(HOST, 993)
m.login(USERNAME, PASSWORD)
m.select('INBOX')
result, data = m.uid('search', None, "ALL")
if result == 'OK':
for num in data[0].split()[:5]:
result, data = m.uid('fetch', num, '(RFC822)')
if result == 'OK':
# Get raw message
email_message_raw = email.message_from_bytes(data[0][1])
# Decode headers
email_from = str(make_header(decode_header(email_message_raw['From'])))
# Print each name and email
name, email = email_from.split('<')
email.replace(">", "")
print(name + "|" + email)
# When i swap to just print email_from then works
# print(email_from)
# Close server connection
m.close()
m.logout()
In your code you replaced the email variable..
Try this...
import imaplib
import email
from email.header import Header, decode_header, make_header
# Connection settings
HOST = 'imap.gmail.com'
USERNAME = '***'
PASSWORD = '***'
m = imaplib.IMAP4_SSL(HOST, 993)
m.login(USERNAME, PASSWORD)
m.select('INBOX')
result, data = m.uid('search', None, "ALL")
if result == 'OK':
for num in data[0].split()[:5]:
result, data = m.uid('fetch', num, '(RFC822)')
if result == 'OK':
# Get raw message
email_message_raw = email.message_from_bytes(data[0][1])
# Decode headers
email_from = str(make_header(decode_header(email_message_raw['From'])))
# Print each name and email
name, email_addr = email_from.split('<')
email_addr.replace(">", "")
print(name + "|" + email_addr)
# When i swap to just print email_from then works
# print(email_from)
# Close server connection
m.close()
m.logout()
I had the same error [happily solved], my mistake was in the shebang. That should point to python3.
#! /usr/bin/env python3
Update: my code works under python 2.6.5 but not python 3 (I'm using 3.4.1).
I'm unable to search for messages in the "All Mail" or "Sent Mail" folders - I get an exception:
imaplib.error: SELECT command error: BAD [b'Could not parse command']
my code:
import imaplib
m = imaplib.IMAP4_SSL("imap.gmail.com", 993)
m.login("myemail#gmail.com","mypassword")
m.select("[Gmail]/All Mail")
using m.select("[Gmail]/Sent Mail") doesn't work either.
But reading from the inbox works:
import imaplib
m = imaplib.IMAP4_SSL("imap.gmail.com", 993)
m.login("myemail#gmail.com","mypassword")
m.select("inbox")
...
I used the mail.list() command to verify the folder names are correct:
b'(\\HasNoChildren) "/" "INBOX"',
b'(\\Noselect \\HasChildren) "/" "[Gmail]"',
b'(\\HasNoChildren \\All) "/" "[Gmail]/All Mail"',
b'(\\HasNoChildren \\Drafts) "/" "[Gmail]/Drafts"',
b'(\\HasNoChildren \\Important) "/" "[Gmail]/Important"',
b'(\\HasNoChildren \\Sent) "/" "[Gmail]/Sent Mail"',
b'(\\HasNoChildren \\Junk) "/" "[Gmail]/Spam"',
b'(\\HasNoChildren \\Flagged) "/" "[Gmail]/Starred"',
b'(\\HasNoChildren \\Trash) "/" "[Gmail]/Trash"'
I'm following the solutions from these questions, but they don't work for me:
imaplib - What is the correct folder name for Archive/All Mail in Gmail?
I cannot search sent emails in Gmail with Python
Here is a complete sample program that doesn't work on Python 3:
import imaplib
import email
m = imaplib.IMAP4_SSL("imap.gmail.com", 993)
m.login("myemail#gmail.com","mypassword")
m.select("[Gmail]/All Mail")
result, data = m.uid('search', None, "ALL") # search all email and return uids
if result == 'OK':
for num in data[0].split():
result, data = m.uid('fetch', num, '(RFC822)')
if result == 'OK':
email_message = email.message_from_bytes(data[0][1]) # raw email text including headers
print('From:' + email_message['From'])
m.close()
m.logout()
The following exception is thrown:
Traceback (most recent call last):
File "./eport3.py", line 9, in <module>
m.select("[Gmail]/All Mail")
File "/RVM/lib/python3/lib/python3.4/imaplib.py", line 682, in select
typ, dat = self._simple_command(name, mailbox)
File "/RVM/lib/python3/lib/python3.4/imaplib.py", line 1134, in _simple_command
return self._command_complete(name, self._command(name, *args))
File "/RVM/lib/python3/lib/python3.4/imaplib.py", line 965, in _command_complete
raise self.error('%s command error: %s %s' % (name, typ, data))
imaplib.error: SELECT command error: BAD [b'Could not parse command']
Here's the corresponding Python 2 version that works:
import imaplib
import email
m = imaplib.IMAP4_SSL("imap.gmail.com", 993)
m.login("myemail#gmail.com","mypassword")
m.select("[Gmail]/All Mail")
result, data = m.uid('search', None, "ALL") # search all email and return uids
if result == 'OK':
for num in data[0].split():
result, data = m.uid('fetch', num, '(RFC822)')
if result == 'OK':
email_message = email.message_from_string(data[0][1]) # raw email text including headers
print 'From:' + email_message['From']
m.close()
m.logout()
As it's mentioned in this answer:
Try using m.select('"[Gmail]/All Mail"'), so that the double quotes get transmitted.
I suspect imaplib is not properly quoting the string, so the server gets what looks like two arguments: [Gmail]/All, and Mail.
And it works in python v3.4.1
import imaplib
import email
m = imaplib.IMAP4_SSL("imap.gmail.com", 993)
m.login("myemail#gmail.com","mypassword")
m.select('"[Gmail]/All Mail"')
result, data = m.uid('search', None, "ALL") # search all email and return uids
if result == 'OK':
for num in data[0].split():
result, data = m.uid('fetch', num, '(RFC822)')
if result == 'OK':
email_message = email.message_from_bytes(data[0][1]) # raw email text including headers
print('From:' + email_message['From'])
m.close()
m.logout()
I want to download the attachments from Unread Messages, but also does not want the messages to be flagged Seen.
The below code works, but currently setting the mail as Seen
Tried '(BODY.PEEK[HEADER])' , but then even mail download stopped.
import upload,checkFileAtServer,sha1sum,email, getpass, imaplib, os
detach_dir = '.'
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login('myaccount#gmail.com','password')
m.select("inbox")
resp, items = m.search(None, "(UNSEEN)")
items = items[0].split()
for emailid in items:
#resp, data = m.fetch(emailid, '(BODY.PEEK[HEADER])')
resp, data = m.fetch(emailid, "(RFC822)")
email_body = data[0][1]
mail = email.message_from_string(email_body)
temp = m.store(emailid,'+FLAGS', '\\Seen')
m.expunge()
if mail.get_content_maintype() != 'multipart':
continue
print "["+mail["From"]+"] :" + mail["Subject"]
for part in mail.walk():
if part.get_content_maintype() == 'multipart':
continue
if part.get('Content-Disposition') is None:
continue
filename = part.get_filename()
att_path = os.path.join(detach_dir, filename)
if not os.path.isfile(att_path) :
fp = open(att_path, 'wb')
fp.write(part.get_payload(decode=True))
fp.close()
sha1sum = sha1sum.calculateSHA1(att_path)
print type(sha1sum)
responseFromServer = checkFileAtServer.responseFromServer(sha1sum)
if(responseFromServer == "NOT_CHECKED"):
upload.uploadToSecureServer('root','root',att_path,att_path)
Anybody can guide me what am I missing ?
Thanks.
If you do not want to mark a message as \Seen, don't call the STORE IMAP command and don't use FETCHable items which are documented to cause an implicit marking as such (yes, the RFC822 is an alias for BODY[] which causes the message to be marked as read).
This is covered extensively in SO, so I apologize in advance ...however, I've gone through the posts and can't get this to work.
GOALS
Want to get email from gmail that match certain criteria, save the attachments, then delete them.
ISSUE
So, I can get everything to work except deleting the emails. It deletes a few then I get this error:
Traceback (most recent call last): File "get_overdues.py", line 22,
in
email_body = data[0][1] TypeError: 'NoneType' object is unsubscriptable
Every time I run it it deletes more emails then exits with the same error. This has to run on a cronjob and can't be babysat.
What am I doing wrong?
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user,word)
m.select("INBOX")
searchString = "(SUBJECT \"Daily Mail Notices\")"
resp, items = m.search(None,searchString)
items = items[0].split()
for emailid in items:
print emailid
resp, data = m.fetch(emailid, "(RFC822)")
email_body = data[0][1]
mail = email.message_from_string(email_body)
if mail.get_content_maintype() != 'multipart':
continue
print "["+mail["From"]+"] :" + mail["Subject"] + mail["Date"]
sub_dir = re.sub('[,:\- ]','', mail["Date"])
for part in mail.walk():
if part.get_content_maintype() == 'multipart':
continue
if part.get('Content-Disposition') is None:
continue
message_dir = os.path.join(dump_dir, sub_dir)
if not os.path.exists(message_dir):
os.makedirs(message_dir)
filename = part.get_filename()
counter = 1
if not filename:
filename = 'overdues-%s' % counter
counter += 1
att_path = os.path.join(dump_dir, message_dir, filename)
if not os.path.isfile(att_path) :
fp = open(att_path, 'wb')
fp.write(part.get_payload(decode=True))
fp.close()
m.store(emailid, '+FLAGS', r'(\Deleted)')
m.expunge()
m.close()
m.logout()
Your problem is clearly with fetch:
resp, data = m.fetch(emailid, "(RFC822)")
email_body = data[0][1]
It's returning a NoneType for either data or, less likely, for data[0], and None obviously isn't subscriptable. You may want to double check the results of m.fetch and see if it's coming the form you expect it to.
This is probably because this email was deleted (and not expunged).