Data extraction from Outlook .csv attachments using Python

I have a .csv attachment that is emailed to me daily. I'd like to read in this email using Python and perform some modifications on it. The emails are sent to my Outlook email account.
This is what I am doing:
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
folder = outlook.GetDefaultFolder(6)  # 6 = inbox
for item in folder.Items:
    print(item.Body)
However, this only extracts the text within the email. How would I read the actual attachment that is being sent? I am also looking into the extract-msg package on PyPI.
Any insight will be helpful.

To read the attachment, use the following:
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)  # 6 = inbox; change depending on the folder you wish to see
for message in inbox.Items:
    if message.UnRead:  # finds unread messages
        for attachment in message.Attachments:
            print(attachment.FileName)
This lists the attachments of all unread emails. To save them, call attachment.SaveAsFile with the full path of the file you wish to write.
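Since the question is about reading the .csv itself, here is a minimal sketch of the remaining step: save each .csv attachment to a folder, then parse it with the standard csv module. The helper names save_csv_attachments and read_csv_rows are made up for illustration, and the utf-8-sig encoding is an assumption about how the file was exported.

```python
import csv
import os


def save_csv_attachments(message, target_dir):
    """Save every .csv attachment of an Outlook message; return the saved paths."""
    saved = []
    for attachment in message.Attachments:
        name = attachment.FileName
        if name.lower().endswith(".csv"):
            path = os.path.join(target_dir, name)
            attachment.SaveAsFile(path)  # Outlook expects a full file path
            saved.append(path)
    return saved


def read_csv_rows(path):
    """Read a saved CSV file into a list of rows (each row is a list of strings)."""
    # Assumption: the file is UTF-8, possibly with a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.reader(handle))
```

Inside the loop over inbox.Items, save_csv_attachments(message, some_existing_folder) would return the saved paths, and read_csv_rows(path) would give the rows to modify.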

Related

python: check if an email attachment is encrypted

I am new to writing Python code and am trying to run a test to determine if an email attachment (.xls file) is encrypted with a password. I am using win32com to retrieve from Outlook and then loop through emails and attachments.
I've reviewed Microsoft documentation, but couldn't find what I need.
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
email_count = inbox.Items.Count
messages = inbox.Items
for message in messages:
    emailSendDate = message.SentOn.date()
    attachments = message.Attachments
    emailSubject = message.Subject
    for attachment in attachments:
        attachment_name = attachment.FileName
        # a way to test if email attachment is encrypted?
Outlook knows nothing about a particular attachment type - it does not know and does not care. You would need to use the Excel Object Model to try to figure that out after you save the attachment using Attachment.SaveAsFile. Start at https://learn.microsoft.com/en-us/office/vba/api/excel.workbook.passwordencryptionfileproperties
I went the route of saving the attachment, but it would be much better if I could retrieve the temporary file path of the attachment and not need to save it. I tried using the GetTemporaryFilePath() method, but the rules of the method made it not work for me. For now, I used xlrd to test if it can open the workbook.
attachment.SaveAsFile(os.path.join(testfilepath, attachment.FileName))
try:
    wb = xlrd.open_workbook(os.path.join(testfilepath, attachment.FileName))
    attachmentEncryption = 'N'
except Exception:  # any failure to open the workbook is treated as "encrypted"
    attachmentEncryption = 'Y'
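If the concern with saving is mostly about leftover files, one option is to save into a temporary folder that is deleted right after the check. The sketch below is only an illustration of that idea: the function name opens_cleanly is hypothetical, and the opener (for example xlrd.open_workbook) is passed in as a parameter rather than imported.

```python
import os
import tempfile


def opens_cleanly(attachment, open_workbook):
    """Save the attachment to a throwaway folder and report whether it opens.

    `open_workbook` is any callable taking a file path, for example
    xlrd.open_workbook; it is passed in rather than imported here.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, attachment.FileName)
        attachment.SaveAsFile(path)
        try:
            open_workbook(path)
        except Exception:
            # Could be encryption, but also corruption or an unsupported format.
            return False
        return True
```

As with the try/except above, a False result only says the workbook could not be opened; it does not prove that a password is the reason.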

How to read only the 1st email body of the msg file, excluding the mails that are attached to that msg file

I am currently trying to figure out how to parse all the msg files I have stored in a specific folder and then save the body text to a dataframe, but when I try to extract the body of the email, it also extracts the emails that are attached to it. I want to extract only the body of the first email that is present in the msg file.
# src-code: https://stackoverflow.com/questions/52608069/parsing-multiple-msg-files-and-storing-the-body-text-in-a-csv-file
# reading multiple .msg files using python
from pathlib import Path
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
# Assuming \Documents\Email Reader is the directory containing files
for p in Path(r'C:\Users\XY\Documents\Email Reader').iterdir():
    if p.is_file() and p.suffix == '.msg':
        msg = outlook.OpenSharedItem(str(p))  # pass the path as a string, not a Path object
        print(msg.Body)
I had a similar requirement. Full code is here:
https://medium.com/@theamazingexposure/accessing-shared-mailbox-using-exchangelib-python-f020e71a96ab
For your purpose I think this snippet is going to work. It reads the latest message with a specific subject line:
from exchangelib import Credentials, Account
credentials = Credentials('First_Name.Last_Name@some_domain.com', 'Your_Password_Here')
account = Account('First_Name.Last_Name@some_domain.com', credentials=credentials, autodiscover=True)
print("Getting latest email from Given Search String...")
# newest message whose subject contains the search string
for item in account.inbox.filter(subject__contains='Your Search String Here').order_by('-datetime_received')[:1]:
    # the body of the email is extracted using item.text_body.encode('UTF-8')
    print(item.subject, item.text_body.encode('UTF-8'), item.sender, item.datetime_received)
from exchangelib import Credentials, Account
credentials = Credentials('First_Name.Last_Name@some_domain.com', 'Your_Password_Here')
account = Account('First_Name.Last_Name@some_domain.com', credentials=credentials, autodiscover=True)
# unread mails, newest first
unread_mails = account.inbox.filter(is_read=False).order_by('-datetime_received')
# your unread mail list
unread_mail_list = [mail for mail in unread_mails]
# get text body of the latest unread mail
mail_body = unread_mail_list[0].text_body
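The snippets above return the whole body text. If the unwanted part is the older messages quoted below the newest one (an assumption; the question does not say whether the extra mails are quoted replies or true attachments), a rough heuristic is to cut the body at the first quoted-message marker. The markers below are assumed from typical English-language Outlook bodies, and top_message is a hypothetical helper name.

```python
import re

# Assumed markers that start a quoted message in an English-language Outlook body.
QUOTE_MARKERS = re.compile(
    r"^(?:-{2,}\s*Original Message\s*-{2,}|From:\s)",
    re.MULTILINE,
)


def top_message(body):
    """Return the text above the first quoted-message marker, stripped."""
    match = QUOTE_MARKERS.search(body)
    return (body[: match.start()] if match else body).strip()
```

Applied as top_message(msg.Body), this would keep only the newest text. It will cut too early if the newest message itself contains a line starting with "From:", so it is a starting point rather than a reliable parser.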

How to save MS Outlook attachments from specific sender and date using Python

I am a bit new to coding and I am trying to understand how to get Python to save MS Outlook attachments from a specific sender. I currently receive the same email from the same person each day regarding data that I need to save to a specific folder. Below are the requirements I am trying to meet:
I want to open MS Outlook and search for specific sender
I want to make sure that the email that I am opening from the specific sender is the most current date
I want to save all attached files from this sender to a specific folder on my desktop
I have seen some posts on using win32com.client but have not had much luck getting it to work with MS Outlook. I will attach some code I have tried below. I appreciate any feedback!
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
for message in messages:
    attachments = message.Attachments
    for attachment in attachments:
        pass
You almost got it. Add a filter on the sender's email address:
import win32com.client
Outlook = win32com.client.Dispatch("Outlook.Application")
olNs = Outlook.GetNamespace("MAPI")
Inbox = olNs.GetDefaultFolder(6)
Filter = "[SenderEmailAddress] = '0m3r@email.com'"
Items = Inbox.Items.Restrict(Filter)
Item = Items.GetFirst()
for attachment in Item.Attachments:
    print(attachment.FileName)
    attachment.SaveAsFile(r"C:\path\to\my\folder\Attachment.xlsx")
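The filter alone does not guarantee that GetFirst returns the most recent message, and a fixed file name means several attachments overwrite each other. A possible variant, sketched under the assumption that sorting on ReceivedTime is what "most current" means here, sorts the restricted items newest first and saves each attachment under its own name. The function name save_latest_attachments is made up for this example.

```python
import os


def save_latest_attachments(inbox, sender_address, target_dir):
    """Save all attachments of the newest message from one sender.

    `inbox` is an Outlook folder object such as GetDefaultFolder(6).
    Returns the saved file paths (empty if no message matched).
    """
    items = inbox.Items.Restrict(f"[SenderEmailAddress] = '{sender_address}'")
    items.Sort("[ReceivedTime]", True)  # True = descending, newest first
    latest = items.GetFirst()
    if latest is None:
        return []
    saved = []
    for attachment in latest.Attachments:
        path = os.path.join(target_dir, attachment.FileName)
        attachment.SaveAsFile(path)  # the target folder must already exist
        saved.append(path)
    return saved
```

With the objects from the answer above, this would be called as save_latest_attachments(Inbox, '0m3r@email.com', r"C:\path\to\my\folder"). For senders on the same Exchange server, SenderEmailAddress may hold an internal Exchange address instead of an SMTP one, in which case filtering on [SenderName] may be the easier route.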
Python 3.8 on Windows:
def saveAttachments(email: object):
    for attachedFile in email.Attachments:  # iterate over the attachments
        try:
            filename = attachedFile.FileName
            attachedFile.SaveAsFile("C:\\EmailAttachmentDump\\" + filename)  # folder must already exist
        except Exception as e:
            print(e)

# inbox is the folder object from the question, e.g. outlook.GetDefaultFolder(6)
for mailItem in inbox.Items:
    # Here you just need to build your own conditions
    if mailItem.SenderEmailAddress == "x" or mailItem.SenderName == "y":
        saveAttachments(mailItem)
You can change the actual conditions to your liking. I would recommend referring to the object model for Outlook MailItem objects, specifically its properties: https://learn.microsoft.com/en-gb/office/vba/api/outlook.mailitem

Filtering Outlook Mails on a date range and downloading attachments that satisfy a given sub condition?

I was trying to write a Python script to automate certain aspects of my Outlook account:
whenever I receive emails with attachments and a certain subject, the attachments are automatically downloaded to a folder on my system. Here is my code:
import win32com.client
import os
current_path = os.getcwd()
outlook = win32com.client.Dispatch("Outlook.Application").GetNameSpace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
for m in messages:
    if m.Subject == "TEST MAIL":
        print("message body:", m.body)
        attachment = messages.Attachments
        for x in attachment:
            x.SaveASFile(os.path.join(current_path, x.FileName))
The above code doesn't work and throws an exception: pywintypes.com_error: (-2147221005, 'Invalid class string', None, None)
Replace the line
x.SaveASFile(os.path.join(current_path, "Test1.csv"))
with
x.SaveASFile(os.path.join(current_path, x.FileName))
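Two further points may be worth checking. The question's loop reads Attachments from the messages collection, while attachments belong to the individual message m. The "Invalid class string" error is typically raised when COM cannot resolve the name passed to Dispatch, which would point at the Outlook installation rather than at the loop. The sketch below shows a corrected loop plus a filter string for the date range mentioned in the title. Both function names are hypothetical, and the date format is an assumption that fits US-English regional settings.

```python
import os
from datetime import datetime


def date_range_filter(start, end):
    """Build an Outlook Restrict filter for messages received in [start, end].

    Assumption: the date format below suits a US-English locale; Outlook
    parses filter dates according to the system's regional settings.
    """
    fmt = "%m/%d/%Y %I:%M %p"
    return (
        f"[ReceivedTime] >= '{start.strftime(fmt)}' AND "
        f"[ReceivedTime] <= '{end.strftime(fmt)}'"
    )


def save_attachments_for_subject(items, subject, target_dir):
    """Save attachments of every message in `items` whose subject matches."""
    saved = []
    for m in items:
        if m.Subject == subject:
            for x in m.Attachments:  # attachments belong to the message, not the collection
                path = os.path.join(target_dir, x.FileName)
                x.SaveAsFile(path)
                saved.append(path)
    return saved
```

A possible call would be items = inbox.Items.Restrict(date_range_filter(datetime(2024, 1, 1), datetime(2024, 1, 31, 23, 59))) followed by save_attachments_for_subject(items, "TEST MAIL", current_path); the dates here are placeholders.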

Check whether the email is a reply or response with Python win32com

I am using Python win32com to parse email from Outlook. I am able to fetch email from the Outlook folder, but I am not able to verify whether an email is a reply or a forwarded message. I need to check whether the received email is a reply to a previous mail (and if so, find the original mail) or a forwarded message. I am using the following code to fetch emails from Outlook.
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.Folders['xyz@xyz.com'].Folders['Inbox'].Folders['abc']
messagesReach = inbox.Items
for message in messagesReach:
    if message.Unread == True:
        print(message.body)
The property you need is ConversationID, which can be read as message.ConversationID.
See https://msdn.microsoft.com/en-us/library/microsoft.office.interop.outlook.mailitem_properties.aspx
You could try to read the first three characters of the subject, and determine if it has the "Re:"-prefix and therefore is a reply. This should be the case most times.
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.Folders["xyz@xyz.com"].Folders["Inbox"].Folders["abc"]
messagesReach = inbox.Items
for message in messagesReach:
    if message.Unread == True:
        if message.Subject[:3].upper() == "RE:":  # compare case-insensitively; Outlook usually writes "RE:"
            print(message.body)
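The two suggestions can be combined: use the subject prefix to guess whether a message is a reply or a forward, and use ConversationID to look for the earliest message of the same conversation. The sketch below is a heuristic under stated assumptions: the prefixes are the English ones ("RE:", "FW:", "FWD:"), the function names are made up for illustration, and the "original" is taken to be the message with the earliest ReceivedTime among those passed in.

```python
def classify_subject(subject):
    """Guess the message kind from its subject prefix (a heuristic only)."""
    text = subject.strip().upper()
    if text.startswith("RE:"):
        return "reply"
    if text.startswith(("FW:", "FWD:")):
        return "forward"
    return "original"


def find_conversation_start(message, messages):
    """Return the earliest message sharing `message`'s ConversationID."""
    same_thread = [m for m in messages if m.ConversationID == message.ConversationID]
    # Include `message` itself so the result is defined even if nothing else matches.
    return min([message, *same_thread], key=lambda m: m.ReceivedTime)
```

In the loop above, classify_subject(message.Subject) would replace the prefix check and find_conversation_start(message, messagesReach) would search the same folder. If the original mail was sent by you, it probably sits in Sent Items rather than in this folder, so that folder's items would need to be searched as well.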
