Python IMAP - Read Gmail with '+' in email address

I've previously used imaplib in Python 3 to extract emails from Gmail. Now I want to write a script that differentiates between emails sent to the same address with different strings after a plus sign. For example, the base email address can be:
example@gmail.com
Then I want to read separately all emails sent to the addresses:
example+test1@gmail.com,
example+test2@gmail.com,
example@gmail.com.
At the moment this only works for example@gmail.com. The result I want is a dictionary of lists containing the emails for each address, for example:
{'example':[],
'example_test':[],
'example_test2':[]}
Currently I can retrieve the emails that I need with this function from a class:
def get_emails(self):
    """Retrieve emails"""
    self.M = imaplib.IMAP4_SSL(self.server)
    self.M.login(self.emailaddress, self.password)
    self.M.select('INBOX', readonly=True)
    # Date self.daysback days ago, in the DD-Mon-YYYY form used by IMAP
    date = (datetime.date.today() - datetime.timedelta(self.daysback)).strftime("%d-%b-%Y")
    print("Selecting email messages since %s" % date)
    # Retrieve the UIDs of all emails sent since that date
    result, data = self.M.uid('search', None, '(SENTSINCE {date})'.format(date=date))
    return result, data

You should directly use the exact email address you want in the IMAP search request. For example, it could be something like:
result, data = self.M.uid('search', None, '(SENTSINCE {date})'.format(date=date),
                          'TO example+test1@gmail.com')
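To build the dictionary from the question, the same search can be repeated once per address. The following is a minimal sketch under a few assumptions: the helper names imap_date, plus_key, build_criteria and search_by_address are made up for illustration, and conn is assumed to be an already logged-in imaplib connection with a mailbox selected. The month names are spelled out so the date does not depend on the locale.

```python
import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(day):
    """Format a date as DD-Mon-YYYY without relying on the locale."""
    return "{:02d}-{}-{}".format(day.day, MONTHS[day.month - 1], day.year)


def plus_key(address):
    """Map 'example+test1@gmail.com' to the dictionary key 'example_test1'."""
    local_part = address.partition("@")[0]
    return local_part.replace("+", "_")


def build_criteria(address, since):
    """Combine the date filter and the recipient filter in one search key."""
    return '(SENTSINCE {} TO "{}")'.format(imap_date(since), address)


def search_by_address(conn, addresses, since):
    """Run one UID SEARCH per address and collect the UIDs under each key."""
    found = {}
    for address in addresses:
        result, data = conn.uid("search", None, build_criteria(address, since))
        found[plus_key(address)] = data[0].split() if result == "OK" else []
    return found
```

Inside the class from the question this could be called as search_by_address(self.M, addresses, since), where addresses is the list of plus addresses to separate and since is a datetime.date.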

Related

How can I get the date received / sent from an email in Python

I have a program that needs to read in emails and validate if they are from this month, before continuing.
I obtain the email info via the following code
import email
import smtplib
import imaplib
mail = imaplib.IMAP4_SSL('redacted', 993)
mail.login(username, bytes(password).decode('utf-8')) #password is bytes that have been decrypted
msg_data2 = [] #My template allows for multiple email data to be appended
mailbox_data = mail.list()
mail.select('INBOX', readonly=True)
result, msg_ids = mail.search(None, f'(SEARCH CRITERIA REDACTED)')
lister = msg_ids[0].split()
most_recent = lister[-1]
result2, msg_data = mail.fetch(most_recent, '(RFC822)')
msg_data2.append(msg_data)
raw = email.message_from_bytes(msg_data[0][1])
From here I'm able to get the attachments from the emails matching the search criteria. Previously, vendors would name the files properly with the month their jobs ran. Now some do not, so I'm attempting to just check the date the email was sent or received.
You can get the sending date from the email's 'date' header.
from email import utils
...
raw = email.message_from_bytes(msg_data[0][1])
datestring = raw['date']
print(datestring)
# Convert to datetime object
datetime_obj = utils.parsedate_to_datetime(datestring)
print(repr(datetime_obj))
The Date: header is inserted by the sender, and may or may not be accurate. For example, when I write an email and place it in the outbox, it gets the date and time of me placing it in the outbox in the Date: header. The header remains the same even if I only send the email hours (or possibly days) later.
This still doesn't say anything about when it was received. It may be stuck in transit for days. For that it depends on your mail client. For example, Claws inserts an X-Received header when it fetches mail, and that will have the timestamp when Claws downloaded the email from the server to your local machine. This may be minutes or even days after it arrived in your inbox.
To check when the email actually was received by your email provider, look at the Received: headers. The top header is from your (provider's) mail server. It should end in a time stamp, with a semicolon separating the time stamp from the rest of the header.
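All RFC 5322 time stamps can be parsed with email.utils.parsedate, or with email.utils.parsedate_to_datetime when you need datetime objects for arithmetic.
So the code would be something along these lines:
from email import utils
mail = ...  # the parsed message, e.g. the result of email.message_from_bytes()
sent = mail['Date']
print(f"Date header: {sent}")
# A message usually carries several Received headers; the first one is the newest.
received = mail.get_all('Received')[0]
# The time stamp follows the last semicolon of the header.
received = received.split(";")[-1].strip()
print(f"Received: {received}")
# Datetime objects can be subtracted; the tuples from parsedate cannot.
sent_ts = utils.parsedate_to_datetime(sent)
received_ts = utils.parsedate_to_datetime(received)
time_in_transit = received_ts - sent_ts
print(f"Sent {sent_ts}, received {received_ts}, took {time_in_transit}")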
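To tie this back to the question (is the email from a given month?), here is a small self-contained sketch. The message in RAW is invented for the example, and the helper names received_datetime and is_from_month are hypothetical. It prefers the newest Received header and falls back to the Date header when no Received header is present.

```python
import email
from email import utils

# A made-up message used only for this illustration.
RAW = (
    "Received: from mx.example.net by mail.example.org; Tue, 05 Mar 2024 10:15:00 +0000\r\n"
    "Received: from sender.example.com by mx.example.net; Tue, 05 Mar 2024 10:14:30 +0000\r\n"
    "Date: Tue, 05 Mar 2024 10:00:00 +0000\r\n"
    "Subject: demo\r\n"
    "\r\n"
    "body\r\n"
)


def received_datetime(msg):
    """Return the time stamp of the topmost Received header, or None."""
    headers = msg.get_all("Received") or []
    if not headers:
        return None
    # The time stamp is the part after the last semicolon.
    stamp = headers[0].rsplit(";", 1)[-1].strip()
    return utils.parsedate_to_datetime(stamp)


def is_from_month(msg, year, month):
    """Check the received date, falling back to the sender's Date header."""
    when = received_datetime(msg) or utils.parsedate_to_datetime(msg["Date"])
    return (when.year, when.month) == (year, month)


msg = email.message_from_string(RAW)
transit = received_datetime(msg) - utils.parsedate_to_datetime(msg["Date"])
```

For "this month", the year and month of datetime.date.today() can be passed in. Note that the comparison uses the time zone written in the header, so a message received close to midnight at the end of a month may need converting to your own time zone first.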

python IMAP content of email contains a string

I am able to log in a gmail account with python IMAP
imap = imaplib.IMAP4_SSL('imap.gmail.com')
imap.login(myDict["emailUsername"], myDict["emailPassword"])
imap.select(mailbox='inbox', readonly=False)
resp, items = imap.search(None, 'All')
email_ids = items[0].split()
latest_email_id = email_ids[-1]
resp, data = imap.fetch(latest_email_id, "(UID)")
print ("resp= ", resp, " data=", data)
#msg_uid = parse_uid(data[0])
match = pattern_uid.match(data[0].decode("utf-8"))
#print ("match= ", match)
msg_uid = match.group('uid')
I need to make sure that the last email, identified by this UID, contains a certain string (XYZ). I am NOT looking at the subject header but at the content of the email. How can I do that?
There are a couple of ways you could go:
1. Fetch the message and walk through the text body parts looking for your string -- example at Finding links in an emails body with Python
2. Get the server to do the search by supplying 'latest_email_id' and your search criteria back to the server in a UID SEARCH command. For Gmail, you can even use the X-GM-RAW attribute to use the same syntax supported by the Gmail web interface. See https://developers.google.com/gmail/imap/imap-extensions for details of that.
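Both approaches can be sketched roughly as follows. This is an illustration only: the function names text_contains and server_body_contains are hypothetical, SAMPLE is an invented message, and conn is assumed to be a logged-in imaplib connection with the mailbox selected. The server-side variant assumes the search string contains no double quotes or backslashes.

```python
import email


def text_contains(msg, needle):
    """Walk the text parts of a parsed message and look for needle."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        if needle in payload.decode(charset, errors="replace"):
            return True
    return False


def server_body_contains(conn, msg_uid, needle):
    """Ask the server to search the body, restricted to a single UID."""
    # Assumption: needle has no double quotes or backslashes.
    result, data = conn.uid("search", None, "UID", str(msg_uid),
                            "BODY", '"{}"'.format(needle))
    return result == "OK" and bool(data[0].split())


# A made-up message: the subject must not count, only the body.
SAMPLE = email.message_from_string(
    "Subject: no match here\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    "\r\n"
    "The order code is XYZ-42.\r\n"
)
```

The first function needs the full message (for example from a fetch of RFC822 parsed with email.message_from_bytes), while the second only transfers the search result. For HTML parts, text_contains also matches text inside tags, which may or may not be what you want.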

How to filter a huge email list by domain with Python

I need help with Python.
How can I filter a huge email list by domain?
My email list contains addresses from different providers (AOL, Gmail, Hotmail, ...).
I want to select one domain, e.g. Gmail, and create a new file containing only the Gmail addresses.
This is the regex I have. How can I edit it to get only Gmail accounts?
regex = re.compile(("([a-z0-9!#$%&*+\/=?^_{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_" "{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|" "\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))
Can you provide an example of input data?
Anyway, you don't need regex here, just split each email address on its @ and get the domains.
If you have a string with one address per line, you can do the following.
# Maps each domain to the list of addresses that use it
hosts = {}
for address in addresses.splitlines():
    _, host = address.split('@')
    if host not in hosts:
        hosts[host] = [address]
    else:
        hosts[host].append(address)
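To get from the grouping to the new file the question asks for, something like the following could work. It is a sketch under assumptions: the names group_by_domain and write_domain are made up, the input is any iterable of lines (an open file works, so a huge list is not loaded into one string), and lines without an @ are silently skipped.

```python
from collections import defaultdict


def group_by_domain(lines):
    """Group addresses by their lower-cased domain, skipping invalid lines."""
    hosts = defaultdict(list)
    for line in lines:
        address = line.strip()
        if "@" not in address:
            continue  # blank or malformed line
        host = address.rsplit("@", 1)[1].lower()
        hosts[host].append(address)
    return hosts


def write_domain(lines, domain, out_path):
    """Write the addresses of one domain to out_path, one per line."""
    matches = group_by_domain(lines).get(domain.lower(), [])
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.writelines(address + "\n" for address in matches)
    return len(matches)
```

With hypothetical file names, usage would look like: open the list with open("emails.txt") and pass the file object to write_domain(source, "gmail.com", "gmail_only.txt").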

Python GMail IMAP doesn't return UIDs

I am using imaplib with Python to fetch the contents of my inbox or GMail labels.
My problem is: imaplib returns the UIDs when I'm querying the inbox, but not my email labels.
If I query the inbox, I get UIDs:
inbox EMAIL_UIDS: 24408 24599 25193 25224 25237 25406 25411 25412 25413 25415
But if I query my label "New" (which contains two of the messages in the inbox, thus should contain two of the above UIDs), I get only the ordered indices:
New EMAIL_UIDS: 1 2
My code is:
#Main file
if folder_name is None:
    email_uids = obtain_inbox_email_uids(mail)
else:
    email_uids = obtain_folder_email_uids(folder_name,mail)
email_uids = list(email_uids)
and:
#Email utilities file
def obtain_folder_email_uids(folder_name, mail):
    """
    Given an IMAP instance,
    return the UIDs of the emails in a specific folder.
    """
    mail.select(folder_name)
    result, data = mail.uid('search', None, "ALL")
    print "RESULT, DATA",result,data
    email_uids = data[0]
    print folder_name,"EMAIL_UIDS:",email_uids
    email_uids = email_uids.split(" ")
    email_uids = reversed(email_uids)
    return email_uids

def obtain_inbox_email_uids(mail):
    """
    Given an IMAP instance,
    return the UIDs of the inbox emails.
    """
    return obtain_folder_email_uids('inbox', mail)
Does anybody know why imaplib returns UIDs for the inbox but ordered indices for the specific labels, and how can I get it to return the UIDs?
Thank you
Found out the problem. The UIDs were being returned correctly.
My problem was: I was using a function to manually fetch an email based on its UID. So in the case of the inbox, I obtained the UIDs, then used this function to fetch each individual mail. For each label, I did the same.
However, each specific email was being returned for the inbox, but in the case of a label, I wasn't able to fetch each mail. I therefore assumed I wasn't passing the correct UID to the function.
My mistake was not related. Inside the function to fetch a specific mail, instead of fetching it from the label in question, I was always fetching from the inbox.
So in this case, for the inbox I would obtain a list of UIDs, then use this function to successfully obtain each mail in specific, but in the case of a specific GMail label I would obtain the list of UIDs, then fail to fetch each specific mail because I was performing select on the inbox.
I changed the select from the inbox to the specific label inside the function that fetches each mail, and now it works perfectly.
IMAP UIDs are unique per mailbox. If you just created the mailbox, 1 and 2 are almost certainly the correct UIDs for those two messages. Why do you think they aren't?
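Because UIDs are only meaningful inside the mailbox they came from, a fetch helper should select that same mailbox before fetching. The following is a minimal sketch of that idea; fetch_by_uid is a hypothetical name, and mail is assumed to be a logged-in imaplib connection.

```python
def fetch_by_uid(mail, folder_name, uid):
    """Select folder_name, then fetch the message with that UID from it."""
    result, _ = mail.select(folder_name, readonly=True)
    if result != "OK":
        raise ValueError("cannot select mailbox %r" % folder_name)
    result, data = mail.uid("fetch", uid, "(RFC822)")
    # An unknown UID yields an empty response rather than an error.
    if result != "OK" or not data or data[0] is None:
        return None
    # data[0] is a (response line, raw message) pair.
    return data[0][1]
```

Passing UID 1 from the "New" label together with 'inbox' as the folder would look up a different message (or none at all), which matches the symptom described above.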

Get sender email address with Python IMAP

I have this python IMAP script, but my problem is that, every time I want to get the sender's email address, (From), I always get the sender's first name followed by their email address:
Example:
Souleiman Benhida <souleb@gmail.com>
How can I extract just the email address (souleb@gmail.com)?
I did this before, in PHP:
$headerinfo = imap_headerinfo($connection, $count)
or die("Couldn't get header for message " . $count . " : " . imap_last_error());
$from = $headerinfo->fromaddress;
But, in python I can only get the full name w/address, how can I get the address alone? I currently use this:
typ, data = M.fetch(num, '(RFC822)')
mail = email.message_from_string(data[0][1])
headers = HeaderParser().parsestr(data[0][1])
message = parse_message(mail) #body
org = headers['From']
Thanks!
Just one more step, using email.utils:
email.utils.parseaddr(address)
Parse address – which should be the value of some address-containing field such as To or Cc – into its constituent realname and email address parts. Returns a tuple of that information, unless the parse fails, in which case a 2-tuple of ('', '') is returned.
Note: originally referenced rfc822, which is now deprecated.
to = email.utils.parseaddr(msg['cc'])
This works for me.
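Applied to the From header from the question, this could look like the sketch below. The wrapper name sender_address is made up for the example; parseaddr itself is part of the standard library.

```python
from email.utils import parseaddr


def sender_address(from_header):
    """Return only the address part of a From header value."""
    _name, address = parseaddr(from_header)
    return address


name, address = parseaddr("Souleiman Benhida <souleb@gmail.com>")
```

In the question's code, sender_address(headers['From']) would then give the bare address. If the header cannot be parsed, the result is an empty string.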
My external lib https://github.com/ikvk/imap_tools
lets you work with mail instead of reading IMAP specifications.
from imap_tools import MailBox, A
# get all emails from INBOX folder
with MailBox('imap.mail.com').login('test@mail.com', 'pwd', 'INBOX') as mailbox:
    for msg in mailbox.fetch(A(all=True)):
        print(msg.date, msg.from_, msg.to, len(msg.text or msg.html))
msg.from_, msg.to - parsed addresses, like: 'Sender@ya.ru'
I didn't like the existing solutions so I decided to make a sister library for my email sender called Red Box.
Here is how to search and process emails including getting the from address:
from redbox import EmailBox
# Create email box instance
box = EmailBox(
    host="imap.example.com",
    port=993,
    username="me@example.com",
    password="<PASSWORD>"
)
# Select an email folder
inbox = box["INBOX"]
# Search and process messages
for msg in inbox.search(unseen=True):
    # Process the message
    print(msg.from_)
    print(msg.to)
    print(msg.subject)
    print(msg.text_body)
    print(msg.html_body)
    # Flag the email as read/seen
    msg.read()
I also wrote extensive documentation for it. It also has query language that fully supports nested logical operations.
