I receive a batch email containing an .xls file. I have a script that searches my Outlook inbox for that email and extracts the attachment. I would like to save the file as .xlsx instead of its current .xls format.
I tried amending the file name passed to the attachment's SaveAsFile method to add an "x" at the end: attachment.SaveAsFile(os.path.join(file_home_path, new_file_name)+"x"). This did save the file with an .xlsx extension, but the file was corrupted and I couldn't open it.
Are there any other attachment methods that allow the file extension to be changed at source?
import win32com.client
import os
import datetime
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
now = datetime.datetime.now().strftime("%Y %m %d")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
file_home_path = "C:/Desktop"
for message in messages:
    if message.Subject == 'subject_to_search_for':
        attachments = message.Attachments
        for attachment in attachments:
            new_file_name = 'required_file_{}.xls'.format(now)
            attachment.SaveAsFile(os.path.join(file_home_path, new_file_name))
            break
        message.Delete()
My workaround is below.
os.chdir(file_home_path)
excel = win32com.client.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(new_file_name)
wb.SaveAs(new_file_name + "x", FileFormat=51)  # FileFormat 51 is xlOpenXMLWorkbook, i.e. .xlsx
wb.Close()
excel.Application.Quit()
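For background on why appending "x" to the name does not work: a legacy .xls workbook is normally stored as an OLE2 compound file, while .xlsx is a ZIP container, so changing the extension leaves the bytes in the old format and Excel refuses the mismatch. The sketch below is only an illustration of that point; the function name sniff_container is hypothetical, and it checks nothing beyond the leading signature bytes.

```python
# Signature bytes of the two container formats.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (legacy .xls)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP archive (.xlsx is a ZIP)


def sniff_container(path):
    """Return 'ole2', 'zip' or 'unknown' based on the first bytes of the file.

    Hypothetical helper: it looks only at the signature, not at the contents.
    """
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

If a file named .xlsx reports 'ole2', it is still an .xls workbook under a new name and needs a real conversion, such as the Excel SaveAs call above.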
I'm trying to use Python to get some data that is in an attachment on an Outlook email and then use that data in Python. I've managed to write code that gets into the Outlook inbox and the folder I want and then gets the attachments of a specific message; however, I'm not sure how to view the content of that attachment. A lot of the other questions and tutorials I've found seem to be about saving the attachment to a folder rather than viewing the attachment in Python itself.
For context, the data I'm trying to get to is an exported report from Adobe Analytics. The report is a CSV file that is attached to an email as a zip file. The CSV file shows data for a specific time period, and I'm planning to schedule this report to run weekly. What I want is for Python to look through all the emails containing this report, stack all the data into one dataframe so that I have all the history plus the latest week's data in one place, and then export that file.
Please find below the code I've written so far. If you need more details or I haven't explained something well, please let me know. I am fairly new to Python, especially the win32com library, so there might be obvious things I'm missing.
#STEP 1---------------------------------------------
#import all methods needed
from pathlib import Path
import win32com.client
import requests
import time
import datetime
import os
import zipfile
from zipfile import ZipFile
import pandas as pd
#STEP 2 --------------------------------------------
#connect to outlook
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
#STEP 3 --------------------------------------------
#connect to inbox
inbox = outlook.GetDefaultFolder(6)
#STEP 4 --------------------------------------------
#connect to adobe data reports folder within inbox
adobe_data_reports_folder = inbox.Folders['Cust Insights'].Folders['Adobe data reports']
#STEP 5 --------------------------------------------
#get all messages from adobe reports folder
messages_from_adr_folder = adobe_data_reports_folder.Items
#STEP 6 ---------------------------------------------
#get attachments for a specific message (this is just for testing; in the real world I'll do this for all messages)
for message in messages_from_adr_folder:
    if message.SentOn.strftime("%d-%m-%y") == '07-12-22':
        attachment = message.Attachments
    else:
        pass
#STEP 7 ----------------------------------------------
#get the content of the attachment
##????????????????????????????
With the Outlook Object Model, the best you can do is save the attachment as a file (Attachment.SaveAsFile). Keep in mind that the MailItem.Attachments property returns the Attachments collection, not a single Attachment object: loop through all attachments in the collection, figure out which one you want (if there is more than one), and save it as a file.
To access file attachment data directly without saving as a file, you will need to use Extended MAPI (C++ or Delphi only) or Redemption (any language, I am its author).
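The loop described above might look something like the following sketch. It relies only on the Attachment.FileName property and the Attachment.SaveAsFile method of the Outlook Object Model; the function name save_matching_attachments, the out_dir argument, and the choice to filter by file extension are assumptions made for illustration.

```python
from pathlib import Path


def save_matching_attachments(attachments, out_dir, suffix=".csv"):
    """Save every attachment whose FileName ends with `suffix`.

    `attachments` is any iterable of objects exposing FileName and
    SaveAsFile, such as message.Attachments. Returns the saved paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for att in attachments:  # Attachments is a collection, so iterate over it
        name = att.FileName
        if name.lower().endswith(suffix.lower()):
            target = out_dir / name
            att.SaveAsFile(str(target))  # pass a plain string path to the COM call
            saved.append(target)
    return saved
```

With a real message this would be called as save_matching_attachments(message.Attachments, "some_folder"), after which the returned paths can be opened like any other file.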
As Dmitry mentioned in his answer, the Outlook Object Model has no option to view attachment content directly.
So I've come up with a solution which involves using the save method to save the attachment into a folder under the current working directory and then, once the file is saved, loading it back into Python as a dataframe. The only thing to note is that I've added an if statement that only saves files that are CSVs; this part can be removed if needed.
If you want to do this with multiple files and stack them all into a single dataframe, create a blank dataframe at the start (with the column names of the files that will be loaded), concatenate it with "importeddata", and put this code inside the "attachment" for loop so that each iteration appends the data saved and loaded from the attachment.
#STEP 1---------------------------------------------
#import all methods needed
from pathlib import Path
import win32com.client
import requests
import time
import datetime
import os
import zipfile
from zipfile import ZipFile
import pandas as pd
#STEP 1b ---------------------------------------------
#create a directory where I can save the files
output_dir = Path.cwd() / "outlook_testing"
output_dir.mkdir(parents=True, exist_ok=True)
#STEP 2 --------------------------------------------
#connect to outlook
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
#STEP 3 --------------------------------------------
#connect to inbox
inbox = outlook.GetDefaultFolder(6)
#STEP 4 --------------------------------------------
#connect to adobe data reports folder within inbox
adobe_data_reports_folder = inbox.Folders['Cust Insights'].Folders['Adobe data reports']
#STEP 5 --------------------------------------------
#get all messages from adobe reports folder
messages_from_adr_folder = adobe_data_reports_folder.Items
#STEP 6 ---------------------------------------------
#get attachments for a specific message (this is just for testing; in the real world
#I'll do this for all messages)
for message in messages_from_adr_folder:
    body = message.Body
    if message.SentOn.strftime("%d-%m-%y") == '07-12-22':
        attachments = message.Attachments
        for attachment in attachments:
            stringofattachment = str(attachment)
            #STEP 6b - if the attachment is a csv file then save the attachment to a folder
            if stringofattachment.find('.csv') != -1:
                #pass a plain string path to the COM call rather than a Path object
                attachment.SaveAsFile(str(output_dir / stringofattachment))
                print(output_dir / stringofattachment)
                #STEP 6C - reload the saved file as a dataframe
                importeddata = pd.read_csv(output_dir / stringofattachment)
            else:
                print('NOT CSV')
                pass
    else:
        pass
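For the multi-file case described above, one possible shape is to collect the saved paths first and stack their rows in a single pass. The sketch below uses only the standard library csv module so that it runs anywhere; the function name stack_csv_files and the UTF-8 encoding are assumptions. With pandas, the equivalent idea would be to append each frame from pd.read_csv to a list and call pd.concat once at the end.

```python
import csv


def stack_csv_files(paths, encoding="utf-8"):
    """Read each CSV file (with a header row) and return all rows as one list of dicts.

    The encoding is an assumption; adjust it to match the exported reports.
    """
    rows = []
    for path in paths:
        with open(path, newline="", encoding=encoding) as handle:
            rows.extend(csv.DictReader(handle))
    return rows
```

Stacking once at the end avoids rebuilding the combined table on every loop iteration.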
I have a folder that contains nearly 12k .msg files, each with a CSV attachment.
I managed to get some code to extract the attachment from each .msg file, but because the attachment names and subjects are similar, the attachment keeps getting overwritten. I tried renaming the file with msg.subject, but the subjects of the messages are similar too.
import win32com.client
import os
inputFolder = r'directory with my msg' ## Change here the input folder
outputFolder = r'directory for attachments' ## Change here the attachments output folder
for file in os.listdir(inputFolder):
    if file.endswith(".msg"):
        outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
        filePath = inputFolder + '\\' + file
        msg = outlook.OpenSharedItem(filePath)
        att = msg.Attachments
        for i in att:
            #Saves the file under the message subject, so messages with the same subject overwrite each other
            i.SaveAsFile(os.path.join(outputFolder, str(msg.subject + ".csv")))
You need a naming scheme that identifies attachments uniquely: try combining the attachment file name with email data such as ReceivedTime.
Don't forget to strip forbidden characters from the resulting file name before trying to save the attachment.
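A minimal sketch of such a naming scheme is shown below, assuming the received time is available as a datetime-like value and that a running counter is added as a tie-breaker. The function name unique_attachment_name, the timestamp format, and the counter are illustrative choices, not something the Outlook Object Model prescribes.

```python
import re
from datetime import datetime

# Characters that Windows does not allow in file names, plus control characters.
FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def unique_attachment_name(file_name, received_time, index):
    """Build 'stem_YYYYmmdd_HHMMSS_index.ext' with forbidden characters replaced."""
    safe = FORBIDDEN.sub("_", file_name)
    stem, dot, ext = safe.rpartition(".")
    if not dot:  # the name has no extension
        stem, ext = safe, ""
    stamp = received_time.strftime("%Y%m%d_%H%M%S")
    return f"{stem}_{stamp}_{index}{dot}{ext}"


example = unique_attachment_name("daily:report.csv", datetime(2022, 12, 7, 9, 30, 0), 1)
```

In the loop over .msg files, the counter could simply come from enumerate, so that two messages received in the same second still get different names.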
I am able to send email to different recipients in Outlook with the script below using a single attachment, but if I try to send a different attachment to each user in a for loop, it fails.
Currently the script uses attachment = r'C:\Users\roy\Royfile.csv'. But I want attachment = file, so that the attachment changes on each iteration of the for loop for different users. This part is not working.
There are different files for different users, for example Royfile.csv below, and there are 50 more such files.
Folder        FolderOwner   EmailAddress      AttachmentPath
C:\folder1\   Roy           Roy#gmail.com     Royfile.csv
D:\folder2\   Roy           Roy#gmail.com     Royfile.csv

2nd file in same folder, Jackfile.csv:

Folder        FolderOwner   EmailAddress      AttachmentPath
C:\folder3\   Jack          Jack#gmail.com    Jackfile.csv
D:\folder4\   Jack          Jack#gmail.com    Jackfile.csv

3rd file, for example Mandyfile.csv. In the same way there are 50 files in total for 50 users in the same folder.

Folder        FolderOwner   EmailAddress      AttachmentPath
C:\folder5\   Mandy         Mandy#gmail.com   Mandyfile.csv
D:\folder6\   Mandy         Mandy#gmail.com   Mandyfile.csv
Python Script
import glob, os
import win32com.client as win32
import pandas as pd
for file in glob.glob("*file.csv"):
    print(file)
    email_list = pd.read_csv(file)
    names = email_list['FolderOwner']
    emails = email_list['EmailAddress']
    attachments = email_list['AttachmentPath']
    for i in range(len(emails)):
        print(file)
        name = names[i]
        email = emails[i]
        attachment = r'{}.csv'.format(attachments)
        with open(attachment, 'r') as my_attachment:
            myfile = my_attachment.read()
        outlook = win32.Dispatch('outlook.application')
        mail = outlook.CreateItem(0)
        mail.To = email
        mail.Subject = 'Message subject'
        mail.Body = 'Hello ' + name
        mail.Attachments.Add(attachment)
        mail.Send()
        break
Current output of the script if I remove the attachment part:
Royfile.csv
Royfile.csv
Jackfile.csv
Jackfile.csv
Mandyfile.csv
Mandyfile.csv
...
..
.
I'm now struggling with what attachment = ??? needs to be so that each of the 50 files gets sent to its user.
I don't know how your files are named or how they are distributed across folders. Try putting all their names, along with their paths, in one column of an Excel sheet and iterating through them the same way you already do for names and emails.
# filepaths is the column of full paths read from the Excel sheet
attachment = r'{}.csv'.format(filepaths[i])
with open(attachment, 'r') as my_attachment:
    myfile = my_attachment.read()
I finally found the answer to my question; the full code is below.
The error occurred because the path was missing.
The win32com library needs the full path to the attachment, even if the script runs in the same folder as the attachments.
It works perfectly now. :)
import glob, os
import win32com.client as win32
import pandas as pd
for file in glob.glob("*file.csv"):
    print(file)
    email_list = pd.read_csv(file)
    names = email_list['FolderOwner']
    emails = email_list['EmailAddress']
    attachments = email_list['AttachmentPath']
    PATH = "C:\\Users\\roy\\myfolder\\"
    for i in range(len(emails)):
        print("Sending email with " + file)
        name = names[i]
        email = emails[i]
        attachment = attachments[i]
        # Attachments.Add needs the full path, so prepend the folder
        attachment1 = PATH + attachment
        with open(attachment1, 'r') as my_attachment:
            myfile = my_attachment.read()
        outlook = win32.Dispatch('outlook.application')
        mail = outlook.CreateItem(0)
        mail.To = email
        mail.Subject = 'Message subject'
        mail.Body = 'Hello ' + name
        mail.Attachments.Add(attachment1)
        mail.Send()
        break
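Instead of hard-coding PATH, the full path could also be derived at run time. Outlook runs as a separate process, which is likely why a relative file name is not resolved against the script's working directory. The helper below is a hypothetical sketch (resolve_attachment is not part of win32com) that builds an absolute path and fails early when the file is missing.

```python
from pathlib import Path


def resolve_attachment(base_dir, attachment_name):
    """Return an absolute path string for the attachment, or raise if it is missing."""
    path = (Path(base_dir) / attachment_name).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Attachment not found: {path}")
    return str(path)  # a plain string is what mail.Attachments.Add would receive
```

Inside the loop this would replace the PATH concatenation, for example resolve_attachment(os.getcwd(), attachments[i]) when the attachments sit next to the script.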
I found and modified some code to read CSV attachments from the Outlook application using Python.
What happens in my case is this: when I request data for a certain period, they send me the monthly data for that period in separate emails (e.g. request: January 2018 - December 2018; receive: 12 mails, each with a single CSV attachment). I save all the emails coming from the data warehouse in 'DWH Mail'.
All of the emails have the same subject. So my code saves all CSV attachments stored in 'DWH Mail' for the specified subject.
import win32com.client as client
import datetime as date
import os.path
def attach(mail_subject):
    outlook = client.Dispatch("Outlook.Application").GetNamespace("MAPI")
    folder = outlook.GetDefaultFolder(6).Folders["DWH Mail"]  # 6 is the inbox
    val_date = date.date.today()
    sub_target = mail_subject
    for msg in folder.Items:
        if msg.ReceivedTime.date() == val_date and msg.Subject == sub_target:
            for att in msg.Attachments:
                att.SaveAsFile(os.getcwd() + "\\" + att.FileName)
                print ("Mail Successfully Extracted")
                break
    print ("Done")
Now I can request a ZIP file containing the CSV, so that I receive the file faster. Where and what should I add to my code so that the loop extracts and saves the CSV file from the ZIP file, instead of saving the ZIP file and extracting it manually later?
I am relatively new to Python, so any help would be appreciated. Thank you.
import os
import pandas as pd
import zipfile
curDir = os.getcwd()
# yourFileName is a placeholder: set it to the name of the ZIP file without the .zip extension
zf = zipfile.ZipFile(curDir + '/targetfolder/' + yourFileName + '.zip')
text_files = zf.infolist()
# list_ = []
print ("Decompressing and loading data into multiple files... ")
for text_file in text_files:
    print(text_file.filename)
    df = pd.read_csv(zf.open(text_file.filename))
    # do df manipulations if required
    df.to_csv(curDir + '/targetfolder/' + text_file.filename + '.csv')
# df = pd.concat(list_)
This will iterate through all the files in the ZIP and save each one under the name it has inside the archive.
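If the goal is only to end up with the CSV on disk, without loading it into pandas, the standard library zipfile module is enough. The sketch below assumes the ZIP attachment has already been saved with SaveAsFile; the function name extract_csv_members and the dest_dir argument are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_csv_members(zip_path, dest_dir):
    """Extract only the .csv members of a ZIP archive and return their paths."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Skip folders and anything that is not a CSV file
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            extracted.append(Path(archive.extract(info, dest_dir)))
    return extracted
```

In the attach function from the question, this could be called right after att.SaveAsFile(...) with the path of the saved ZIP, and the ZIP itself could then be deleted with os.remove if it is no longer needed.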