Determine unique "from" email addresses in Maildir folder

I want to build a list of the "From" addresses in a Maildir folder. The following script illustrates the varying formats that are valid in From:
import mailbox
mbox = mailbox.Maildir("/home/paul/Maildir/.folder")
for message in mbox:
    print(message["from"])
It prints:
"John Smith" <jsmith@domain.com>
Tony <tony@domain2.com>
brendang@domain.net
All I need is the email address, for any valid (or common) "From:" field format. This must have been solved a crazillion times before, so I was expecting a library. All I can find is various regexes.
Is there a standard approach?

email.utils.parseaddr is your friend:
>>> emails = """"John Smith" <jsmith@domain.com>
Tony <tony@domain2.com>
brendang@domain.net"""
>>> lines = emails.splitlines()
>>> from email.utils import parseaddr
>>> [parseaddr(email)[1] for email in lines]
['jsmith@domain.com', 'tony@domain2.com', 'brendang@domain.net']
So you should just be able to work with:
for message in mbox:
    print(parseaddr(message['from']))
Then, if you just want the unique email addresses, you can build a set directly over mbox, e.g.:
mbox = mailbox.Maildir('/some/path')
uniq_emails = set(parseaddr(email['from'])[1] for email in mbox)
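For anyone who wants to try the set approach without a real Maildir, here is a minimal sketch that works on plain header strings. The helper name unique_senders and the sample values are made up for illustration, and lowercasing the address is an assumption on my part (the local part is technically case-sensitive, though most providers ignore case).

```python
from email.utils import parseaddr


def unique_senders(from_headers):
    """Return the set of bare addresses found in an iterable of From: values."""
    addresses = set()
    for header in from_headers:
        if header is None:  # a message may have no From: header at all
            continue
        # str() guards against Header objects returned for non-ASCII headers
        _name, address = parseaddr(str(header))
        if address:  # parseaddr returns '' when it cannot find an address
            addresses.add(address.lower())  # assumption: treat case as irrelevant
    return addresses


# Hypothetical sample values standing in for message["from"]
sample = [
    '"John Smith" <jsmith@domain.com>',
    'Tony <tony@domain2.com>',
    'brendang@domain.net',
    'JSmith@domain.com',
    None,
]
print(sorted(unique_senders(sample)))
```

With a real folder you would pass something like (m["from"] for m in mailbox.Maildir(path)) instead of the sample list.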

Related

Moving emails in Outlook between folders while inputting the subject list, and restricting certain conditions

I'm trying to search "All Outlook Items" and then find emails based on the subject list I input into the code. Once the email is found, it is moved to another folder and marked as "Task Complete" (The green check in the emails).
However, I'm having a couple of errors when trying to run the code. If anyone can guide me it'd be amazing.
Here's the code:
import win32com.client
Email = 'johndoe@gmail.com'
subjects = input("Enter a list of subjects separated by commas: ").split(",")
MoveToFolder = "folder1"
Iter_Folder = "folder2"
def find_and_download_case_number_related_emails():
    Outlook = win32com.client.Dispatch("Outlook.Application")
    Outlook_Location = Outlook.GetNamespace("MAPI")
    Lookin_Folder = Outlook_Location.Folders[Email].Folders[Iter_Folder]
    Out_MoveToFolder = Outlook_Location.Folders[Email].Folders[MoveToFolder]
    for message in Lookin_Folder:
        if message.TaskCompleted:
            continue
        for message in Lookin_Folder:
            if message.Subject in subjects:
                message.Move(Out_MoveToFolder)
    for message in Out_MoveToFolder:
        message.MarkAsTaskCompleted()
if __name__ == "__main__":
    find_and_download_case_number_related_emails()
and here's the error I'm getting at the moment:
raise AttributeError("%s.%s" % (self._username_, attr))
AttributeError: <unknown>.Items. Did you mean: 'Item'?

The following line of code contains a wrong property call:
outlook.Folders.Items.Restrict
The Folders class doesn't provide the Items property. You need to get a Folder instance first and only then use its Items property.
I'd suggest using the NameSpace.GetDefaultFolder method which returns a Folder object that represents the default folder of the requested type for the current profile; for example, obtains the default Inbox folder for the user who is currently logged on.
To understand how the Restrict or Find/FindNext methods work in Outlook you may take a look at the following articles that I wrote for the technical blog:
How To: Use Find and FindNext methods to retrieve Outlook mail items from a folder (C#, VB.NET)
How To: Use Restrict method to retrieve Outlook mail items from a folder

Debunking outlook email features with library win32com

I found ways to check with python using library win32com for outlook the following attributes for any given email.
#imports:
import time
from time import strftime
import pandas as pd, win32com.client as client
from win32com.client import Dispatch
#importing the excel file that contains email addresses and corresponding flags:
df_excel = pd.read_excel(r'\\user\...\addresses.xlsx')
#adding both columns as lists:
df_excel_mail = df_excel['mail'].tolist()
df_excel_flag = df_excel['flag'].tolist()
outlook = client.Dispatch('Outlook.Application').GetNamespace('MAPI')
main_account = outlook.Folders.Item(1)
folder_inbox = main_account.Folders['Inbox'].Folders['Test']
folder_inbox_WIP = main_account.Folders['Inbox'].Folders['Test'].Folders['WIP']
while True:
    time.sleep(0)
    messages = folder_inbox.Items.Count
    if messages > 0:
        # Items.Item() is 1-based; iterate backwards because Move shrinks the collection
        for i in reversed(range(1, messages + 1)):
            message = folder_inbox.Items.Item(i)
            for y, z in zip(df_excel_mail, df_excel_flag):
                if message.Categories == '' and y == message.SenderEmailAddress and z != 'nan':
                    message.Categories = z
                    message.Save()
                    message.Move(folder_inbox_WIP)
    messages_v2 = folder_inbox_WIP.Items.Count
    if messages_v2 > 0:
        for ii in reversed(range(1, messages_v2 + 1)):
            message_v2 = folder_inbox_WIP.Items.Item(ii)
            message_v2.Move(folder_inbox)
    # stop at 18:00; zero-padded HH:MM:SS strings compare correctly as text
    if strftime('%H:%M:%S') >= '18:00:00':
        break
I would like to access for any given email:
1. The receiver list (how would that work if I have more than one)?
2. The cc list (same question).
3. Is there any other way to update the category on an email other than moving this email from one folder to another? I am working on a batch process and this moving in/out is slowing things down.
4. When the email is sent from an email address "on behalf" of another email address, how can I access the "on behalf of" address?

1. Use the MailItem.Recipients collection.
2. See #1 and check whether each recipient's Recipient.Type property equals olCC (= 2).
3. Of course: set the MailItem.Categories property. Don't forget to call MailItem.Save.
4. Use MailItem.SenderEmailAddress. For the "sent on behalf of" address, read the PR_SENT_REPRESENTING_EMAIL_ADDRESS MAPI property. Access it using MailItem.PropertyAccessor.GetProperty("http://schemas.microsoft.com/mapi/proptag/0x0065001F")
In general, take a look at the various Outlook objects using OutlookSpy (I am its author) to familiarize yourself with the Outlook Object Model.
Also keep in mind that to access a subfolder of the Inbox folder, it is better to use something like
out_iter_folder = outlook.GetDefaultFolder(6).Folders['TEST']
where 6 is olFolderInbox constant.
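To illustrate the Recipients / Recipient.Type points, here is a rough sketch of how the To and CC lists could be separated. The function name split_recipients is hypothetical, and the stand-in objects are only there so the snippet runs without Outlook; the numeric values are the OlMailRecipientType constants (olTo = 1, olCC = 2, olBCC = 3).

```python
from types import SimpleNamespace

# OlMailRecipientType values from the Outlook object model
OL_TO, OL_CC, OL_BCC = 1, 2, 3


def split_recipients(recipients):
    """Group recipient addresses by type; works on any iterable of objects
    exposing .Type and .Address, which is what Recipient objects provide."""
    labels = {OL_TO: "to", OL_CC: "cc", OL_BCC: "bcc"}
    groups = {"to": [], "cc": [], "bcc": []}
    for recipient in recipients:
        label = labels.get(recipient.Type)
        if label is not None:  # ignore any other type value
            groups[label].append(recipient.Address)
    return groups


# Stand-ins for Outlook Recipient objects (illustration only)
fake_recipients = [
    SimpleNamespace(Type=OL_TO, Address="first@example.com"),
    SimpleNamespace(Type=OL_TO, Address="second@example.com"),
    SimpleNamespace(Type=OL_CC, Address="copy@example.com"),
]
print(split_recipients(fake_recipients))
```

With a live session you would presumably call split_recipients(message.Recipients). Keep in mind that for Exchange users Recipient.Address may hold an Exchange-style address rather than an SMTP one.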

Insert a value before comma in a file

I'm new to Python and I'm trying to create a simple program for login which reads/writes the information to/from a text file, but I'm having an issue.
Let's say I have the following content in a text file:
mytest@gmail.com, testPass123
First the email and after the comma the password. How can I read those two separately?
I have used .split(',') but it stores the whole line.
If I run this:
email = []
for line in file:
    email.append(line.split(','))
print(email[0])
I get the following output:
['mytest@gmail.com', ' testPass123\n']

I think your variable naming is confusing you here. If you rename email to accounts, things might become clearer:
accounts = []
for line in file:
    # strip each field so the password has no leading space or trailing newline
    accounts.append([field.strip() for field in line.split(',')])
for email, password in accounts:
    print("Email:", email, "Password:", password)

You may be looking for multiple assignment
>>> a, b = "em@ail, pass".split(",")
>>> a
'em@ail'
>>> b
' pass'
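Putting the two answers together, a small sketch of reading "email, password" lines could look like this. The names parse_account and load_accounts are invented for the example, and it assumes passwords never start or end with meaningful whitespace, since strip() would remove it.

```python
def parse_account(line):
    """Split one 'email, password' line into its two fields."""
    # maxsplit=1 keeps any further commas inside the password intact
    email, password = line.split(",", 1)
    return email.strip(), password.strip()


def load_accounts(lines):
    """Build an {email: password} dict from an iterable of lines."""
    accounts = {}
    for line in lines:
        if not line.strip():  # skip blank lines
            continue
        email, password = parse_account(line)
        accounts[email] = password
    return accounts


# Hypothetical file content; an open file object can be passed the same way
sample_lines = ["mytest@gmail.com, testPass123\n", "\n", "other@example.com, pa,ss\n"]
print(load_accounts(sample_lines))
```

A line without any comma raises ValueError on the unpacking, which is probably what you want for a malformed file.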

Using Regex to find and replace email addresses

I'm new to Python and would like to use it with regex to work on a list of 5k+ email addresses. I need to wrap each address in quotes. I am using \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b to identify each email address. How would I replace each entry such as user@email.com with "user@email.com", adding quotes around each of the 5k email addresses?

You can use re.sub with a back-reference like this:
>>> import re
>>> a = "this is email: someone@mail.com and this one is another email foo@bar.com"
>>> re.sub(r'([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})', r'"\1"', a)
'this is email: "someone@mail.com" and this one is another email "foo@bar.com"'
UPDATE: If you have a file and want to replace the emails in each line of it, you can use readlines() like this:
import re
with open("email.txt", "r") as file:
    lines = file.readlines()
new_lines = []
for line in lines:
    new_lines.append(re.sub(r'([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})', r'"\1"', line))
with open("email-new.txt", "w") as file:
    file.writelines(new_lines)
email.txt:
this is test@something.com and another email here foo@bar.com
another email abc@bcd.com
still remaining someone@something.com
email-new.txt (after running the code):
this is "test@something.com" and another email here "foo@bar.com"
another email "abc@bcd.com"
still remaining "someone@something.com"
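As a variation, the asker's own pattern (which only lists A-Z) can be kept as-is if it is compiled with re.IGNORECASE. This is just a sketch; quote_emails is a name I made up, and \g<0> refers to the whole match, so no capture group is needed.

```python
import re

# The pattern from the question, made case-insensitive instead of spelling out a-z
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)


def quote_emails(text):
    """Wrap every address matched by EMAIL_RE in double quotes."""
    # \g<0> inserts the entire match into the replacement
    return EMAIL_RE.sub(r'"\g<0>"', text)


print(quote_emails("this is test@something.com and another email here foo@bar.com"))
```

Note that running it twice over the same text will add a second pair of quotes, since the pattern does not check for existing ones.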

Parse email fields

I want to parse email addresses from a To: email field.
When looping over the emails in an mbox:
import mailbox
mbox = mailbox.mbox('test.mbox')
for m in mbox:
    print(m['To'])
we can get things like:
info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>
That should be parsed into:
[{email: "info@test.org", name: ""},
{email: "blah@test.com", name: "Blahblah"},
{email: "another@blah.org", name: ""},
{email: "last@one.com", name: "Hey"}]
Is there something already built-in (in mailbox or another module) for this or nothing?
I read this doc a few times but didn't find anything relevant.

You can use email.utils.getaddresses() for this:
>>> from email.utils import getaddresses
>>> getaddresses(['info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>'])
[('', 'info@test.org'), ('Blahblah', 'blah@test.com'), ('', 'another@blah.org'), ('Hey', 'last@one.com')]
(Note that the function expects a list, so you have to enclose the string in [...].)
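To get from those tuples to the list of dicts the question asks for, something along these lines should do. The function name parse_recipients is only illustrative, and dropping entries with an empty address is my own choice.

```python
from email.utils import getaddresses


def parse_recipients(header_value):
    """Turn one To:/Cc: header string into a list of {email, name} dicts."""
    return [
        {"email": address, "name": name}
        for name, address in getaddresses([header_value])
        if address  # skip entries where no address could be parsed
    ]


to_header = 'info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>'
for entry in parse_recipients(to_header):
    print(entry)
```

Unlike a plain split(','), getaddresses() also copes with a display name that itself contains a comma, as long as it is quoted.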

email.parser has the modules you're looking for. email.message is still relevant, because the parser will return messages using this structure, so you'll be getting your header data from that. But to actually read the files in, email.parser is the way to go.

As pointed out by @TheSpooniest, email has a parser:
import email.utils
s = 'info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>'
# note: a plain split breaks on display names that contain a comma
for em in s.split(','):
    print(email.utils.parseaddr(em))
gives:
('', 'info@test.org')
('Blahblah', 'blah@test.com')
('', 'another@blah.org')
('Hey', 'last@one.com')

Python provides email.header.decode_header() for decoding headers. The function decodes each atom and returns a list of tuples (text, encoding) that you still have to decode and join to get the full text.
For addresses, Python provides email.utils.getaddresses(), which splits addresses into a list of tuples (display-name, address). The display-name needs to be decoded too, and addresses must match the RFC 2822 syntax. The function getmailaddresses() does the whole job.
Here's a tutorial that might help: http://blog.magiksys.net/parsing-email-using-python-header
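This is not the tutorial's getmailaddresses(), but a minimal sketch of the same idea using only the two standard-library functions mentioned above. The names decode_text and parse_address_header are hypothetical, and falling back to ASCII with replacement characters when no charset is given is an assumption; an unknown charset name would still raise LookupError.

```python
from email.header import decode_header
from email.utils import getaddresses


def decode_text(value):
    """Decode an RFC 2047 encoded header fragment into a plain str."""
    parts = []
    for text, charset in decode_header(value):
        if isinstance(text, bytes):
            # assumption: fall back to ASCII when no charset is declared
            text = text.decode(charset or "ascii", errors="replace")
        parts.append(text)
    return "".join(parts)


def parse_address_header(header_value):
    """Return (decoded display-name, address) tuples for one header string."""
    return [
        (decode_text(name), address)
        for name, address in getaddresses([header_value])
    ]


header = "=?utf-8?q?J=C3=BCrgen?= <j@example.org>, plain@example.org"
print(parse_address_header(header))
```

The addresses are split first and the display names decoded afterwards, so an encoded name cannot smuggle extra commas or angle brackets into the address parsing.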
