Scraping information from Outlook email folder - python

Python newbie here...
I'm attempting to pull information from multiple emails within a folder in Outlook.
Every day an email containing a table of information is sent to the mailbox and automatically filed into a folder. My aim is to pull the information from the tables in these emails for the last 6 months and present it in a pandas DataFrame.
I have no idea how to scrape this information from an email and would appreciate any help.
Thanks!!

It seems you need to automate Outlook to get the required information. To get all items for the last 6 months you need to use the Find/FindNext or Restrict methods of the Items class. Read more about these methods in the following articles:
How To: Use Find and FindNext methods to retrieve Outlook mail items from a folder (C#, VB.NET)
How To: Use Restrict method to retrieve Outlook mail items from a folder
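As a rough illustration of the Restrict approach from Python, here is a minimal sketch. It rests on several assumptions that are not in the question: the pywin32 and pandas packages are installed, the target folder is a subfolder of the Inbox (the name "Daily Report" in the usage note is hypothetical), each email holds a simple table whose first row is the header, and 182 days approximates six months. The date string format that Outlook accepts in a filter depends on the machine's locale, so the format used here may need adjusting.

```python
from datetime import datetime, timedelta
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return the table cells found in an HTML string as a list of rows."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def received_since_filter(days=182, now=None):
    """Build a Restrict filter for items received in the last `days` days."""
    cutoff = (now or datetime.now()) - timedelta(days=days)
    # Assumption: a US-style short date is accepted on this machine.
    return "[ReceivedTime] >= '" + cutoff.strftime("%m/%d/%Y %I:%M %p") + "'"


def load_tables(folder_name, days=182):
    """Read the table from each recent email into one DataFrame (Windows only)."""
    import pandas as pd
    import win32com.client  # provided by the pywin32 package

    namespace = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
    # 6 is the Inbox; assumes the target folder sits directly under it.
    folder = namespace.GetDefaultFolder(6).Folders.Item(folder_name)
    frames = []
    for item in folder.Items.Restrict(received_since_filter(days)):
        rows = table_rows(getattr(item, "HTMLBody", ""))
        if len(rows) > 1:
            frames.append(pd.DataFrame(rows[1:], columns=rows[0]))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With Outlook running on Windows, calling load_tables("Daily Report") would return the combined table. The two helper functions have no Outlook dependency, so they can be tried on a saved HTML body first.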

Related

Scraping Contact Information from Several Websites with Python

I want to collect contact information from all county governments. I do not have a list of their websites. I want to do three things with Python: 1) create a list of county government websites, 2) extract the names, email addresses, and phone numbers of government officials, and 3) write the URLs and all the contact information to an Excel sheet or CSV file.
I am a beginner in Python, and any guidance would be greatly appreciated. Thanks!
For creating tables, you would use a package called pandas.
For extracting information from websites, a package called beautifulsoup4 is commonly used.
To scrape websites, first define what kind of search you want to start with:
do you want to search Google or a specific website? For either one you need
the requests library to fetch a page, or to send a query to Google (as if typing
it in the search bar), and get the HTML back. To parse the HTML you have
fetched, you can use BeautifulSoup. Both libraries are well documented, and you
should read their documentation. Don't be discouraged, it's easy.
Because there are more than 170 countries around the world,
you will need to manage your data; for that I recommend pandas.
Finally, after processing the data you can write it to many file types
with DataFrame.to_excel, DataFrame.to_csv and others.
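The extraction and export steps described above could be sketched as follows. To stay runnable without installing anything, this version uses only the standard library (re and csv) in place of BeautifulSoup and pandas, and it takes already-downloaded HTML as input rather than fetching it. The regular expressions are deliberately simple assumptions (US-style phone numbers, plain email addresses) and will miss many real-world formats; the function names are hypothetical.

```python
import csv
import io
import re

# Simplistic patterns, good enough for a first pass only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}")


def extract_contacts(url, html):
    """Return the unique emails and phone numbers found in a page's visible text."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    return {
        "url": url,
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "phones": sorted(set(PHONE_RE.findall(text))),
    }


def to_csv(records):
    """Serialize contact records to CSV text, one row per page."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "emails", "phones"])
    writer.writeheader()
    for record in records:
        writer.writerow({
            "url": record["url"],
            "emails": "; ".join(record["emails"]),
            "phones": "; ".join(record["phones"]),
        })
    return buffer.getvalue()
```

In a fuller script the html argument would come from something like requests.get(url).text, and the list of records could be passed to pandas.DataFrame and exported with to_csv or to_excel instead.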

Python: how to click on an opened outlook email (win32)

I need to click a specific link, labeled "Approve request", in an opened Outlook email.
I opened the wanted email correctly, but I can't click on the specific link.
Here is the code:
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
messages.Sort("[ReceivedTime]", Descending=True)
for i in range(100):
    message = messages.GetNext()
    print("" + message.Subject, str(message.ReceivedTime))
    if message.Subject == "Mail to approve request":
        message.Display(False)
    else:
        pass
You will need to parse the MailItem.HTMLBody property, extract the relevant link, and launch the browser with that link.
Also, never loop through all items in a folder. Use Items.Find/FindNext or Items.Restrict with a query like "[Subject]='Mail to approve request'".
There are several ways to open a hyperlink programmatically in Python. The How to open a URL in python page explains possible scenarios. For example:
import os
os.system("start \"\" https://example.com")
Use the HTMLBody property, which returns a string representing the HTML body of the specified item. You can then find the required URL by parsing the message body and open it programmatically.
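A minimal sketch of that parsing step, using only the standard library, might look like the following. The helper names are hypothetical, and it assumes the link's visible text matches the label exactly apart from case and whitespace.

```python
import webbrowser
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collect (visible text, href) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the anchor being read, or None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def find_link(html_body, link_text):
    """Return the href of the first link whose text matches, else None."""
    finder = LinkFinder()
    finder.feed(html_body)
    finder.close()
    for text, href in finder.links:
        if text.casefold() == link_text.casefold():
            return href
    return None


def open_link(html_body, link_text):
    """Open the matching link in the default browser; return True if found."""
    url = find_link(html_body, link_text)
    if url is None:
        return False
    webbrowser.open(url)
    return True
```

With the message object from the question, the call would be open_link(message.HTMLBody, "Approve request"). The webbrowser module is used here as a portable alternative to os.system("start ...").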
And, finally, to find items that correspond to your conditions use the Find/FindNext or Restrict methods of the Items class. Read more about them in the following articles:
How To: Use Find and FindNext methods to retrieve Outlook mail items from a folder (C#, VB.NET)
How To: Use Restrict method to retrieve Outlook mail items from a folder
See Filtering Items Using Query Keywords for building a search criteria string properly.
Also you may find the AdvancedSearch method of the Application class helpful. The key benefits of using the AdvancedSearch method in Outlook are:
The search is performed in another thread. You don’t need to run another thread manually since the AdvancedSearch method runs it automatically in the background.
Possibility to search for any item types: mail, appointment, calendar, notes etc. in any location, i.e. beyond the scope of a certain folder. The Restrict and Find/FindNext methods can be applied to a particular Items collection (see the Items property of the Folder class in Outlook).
Full support for DASL queries (custom properties can be used for searching too). You can read more about this in the Filtering article in MSDN. To improve the search performance, Instant Search keywords can be used if Instant Search is enabled for the store (see the IsInstantSearchEnabled property of the Store class).
You can stop the search process at any moment using the Stop method of the Search class.
Read more about this method in the Advanced search in Outlook programmatically: C#, VB.NET article.

Is there a way to search for user defined strings in different outlook attachments using python?

Currently I am working on a project where I have to extract attachments and emails from Outlook and check whether a user-defined string is present in them. I've completed the extraction part but am still looking for a way to search for text within the attached documents. Is there a way to do this using Python?
For Microsoft Office files you can:
Automate Office applications.
Use the Open XML SDK if you deal with Open XML documents only.
Use third-party libraries for dealing with documents.
Which approach to choose is up to you.
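For the Open XML case specifically, a .docx file is a zip archive whose main text lives in word/document.xml, so a basic search can be sketched with Python's standard library alone. This is not the Open XML SDK (which targets .NET) but a simplified stand-in: it reads only the main document body of .docx files, ignores headers, footers, and comments, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(file):
    """Return the body text of a .docx file, one line per paragraph.

    `file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is split across <w:t> runs; join them back.
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def docx_contains(file, needle):
    """Case-insensitive check for `needle` in the document body."""
    return needle.casefold() in docx_text(file).casefold()
```

Joining the runs before searching matters because Word often splits a single visible phrase across several runs. Other attachment types (.xlsx, .pdf, legacy .doc) would need their own readers.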

Inbox api use them to tag relevant mails just like inboxsdk.com

I tried using the newly released Inbox API to sort and tag my mails, but I am stuck at categorising mails after successfully logging in through the API.
I am not quite sure where exactly you are stuck, but please refer to the following documentation: https://www.inboxsdk.com/docs/#ComposeView (note that it is written for JavaScript).

extracting individual email from a single thunderbird email data file

I have a Thunderbird email data file from which I need to extract individual emails. I tried to use a regex and do a plain vanilla extraction based on the "From" tag, but this doesn't give me the required result. An email can have another email attached within the body, hence a single email can contain more than one "From:" string. How can I extract individual emails out of this data file?
Try the python mailbox module.
The mailbox module can read mbox/maildir message stores.
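A short sketch of that suggestion, assuming the Thunderbird file is in mbox format (the function name is hypothetical):

```python
import mailbox


def list_messages(path):
    """Return (From, Subject) header pairs for each message in an mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        return [(message["From"], message["Subject"]) for message in box]
    finally:
        box.close()
```

The mbox format separates messages on lines that begin with "From " (with a trailing space, no colon), so a "From:" header quoted inside a message body does not start a new message. Each item yielded by the loop is a full message object, so message.get_payload() or message.walk() can be used to reach bodies and attachments.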
