I'm trying to use Python to read some data from an attachment on an Outlook email and then work with that data in Python. I've managed to write code that gets into the Outlook inbox and the folder I want and then gets the attachments of a specific message, but I'm not sure how to view the content of that attachment. Most of the other questions and tutorials I've found are about saving the attachment to a folder rather than viewing it in Python itself.
For context, the data I'm trying to get to is an exported report from Adobe Analytics: a CSV file that is attached to the email inside a zip file. The CSV shows data for a specific time period, and I'm planning to schedule the report to run weekly. What I want is for Python to go through all the emails containing this report and stack the data into one dataframe, so that I have all the history plus the latest week's data in one place, and then export that file.
Please find the code I've written so far below. If you need more details, or I haven't explained something well, please let me know. I'm fairly new to Python, and especially to the win32com library, so I might be missing something obvious.
#STEP 1---------------------------------------------
#import all methods needed
from pathlib import Path
import win32com.client
import requests
import time
import datetime
import os
import zipfile
from zipfile import ZipFile
import pandas as pd
#STEP 2 --------------------------------------------
#connect to outlook
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
#STEP 3 --------------------------------------------
#connect to inbox
inbox = outlook.GetDefaultFolder(6)
#STEP 4 --------------------------------------------
#connect to adobe data reports folder within inbox
adobe_data_reports_folder = inbox.Folders['Cust Insights'].Folders['Adobe data reports']
#STEP 5 --------------------------------------------
#get all messages from adobe reports folder
messages_from_adr_folder = adobe_data_reports_folder.Items
#STEP 6 ---------------------------------------------
#get attachment for a specific message (this is just for testing; in the real world I'll do this for all messages)
for message in messages_from_adr_folder:
    if message.SentOn.strftime("%d-%m-%y") == '07-12-22':
        attachment = message.Attachments
    else:
        pass
#STEP 7 ----------------------------------------------
#get the content of the attachment
##????????????????????????????
With the Outlook Object Model, the best you can do is save the attachment as a file (Attachment.SaveAsFile). Keep in mind that the MailItem.Attachments property returns the Attachments collection, not a single Attachment object: loop through all attachments in the collection, figure out which one you want (if there is more than one), and save it as a file.
To access file attachment data directly without saving as a file, you will need to use Extended MAPI (C++ or Delphi only) or Redemption (any language, I am its author).
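A minimal sketch of the loop described above, assuming `message` is a MailItem obtained through win32com. The helper name `save_matching_attachments` and the suffix filter are illustrative and not part of the Outlook object model; only `Attachments`, `FileName` and `SaveAsFile` come from it.

```python
from pathlib import Path


def save_matching_attachments(message, out_dir, suffix=".zip"):
    """Save every attachment of `message` whose file name ends with `suffix`.

    `message` is expected to behave like an Outlook MailItem: it exposes an
    iterable `Attachments` collection whose items have `FileName` and
    `SaveAsFile`. Returns the list of saved paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for attachment in message.Attachments:
        name = attachment.FileName
        if not name.lower().endswith(suffix.lower()):
            continue  # skip other file types, e.g. inline images
        target = out_dir / name
        # Pass the full path as a plain string to the COM method
        attachment.SaveAsFile(str(target))
        saved.append(target)
    return saved
```

Once the file is on disk it can be opened with ordinary Python tools (`zipfile`, `pandas.read_csv`, and so on).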
As Dmitry mentioned in his answer, the Outlook object model has no option to view attachment content directly.
So I've come up with a solution that uses the save method to save the attachment into a folder under the current working directory and then, once the file is saved, loads it back into Python as a dataframe. The only thing to note is that I've added an if statement that only saves CSV files; this part can be removed if it isn't needed.
If you want to do this with multiple files and stack them all into a single dataframe, create a blank dataframe at the start (with the column names of the files that will be loaded), concatenate it with "importeddata", and put that code inside the "attachment" for loop so that each pass appends the data saved and loaded from the attachment.
#STEP 1---------------------------------------------
#import all methods needed
from pathlib import Path
import win32com.client
import requests
import time
import datetime
import os
import zipfile
from zipfile import ZipFile
import pandas as pd
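#STEP 1b ---------------------------------------------
#create a directory where I can save the files
output_dir = Path.cwd() / "outlook_testing"
#make sure the directory exists before anything is saved into it
output_dir.mkdir(parents=True, exist_ok=True)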
#STEP 2 --------------------------------------------
#connect to outlook
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
#STEP 3 --------------------------------------------
#connect to inbox
inbox = outlook.GetDefaultFolder(6)
#STEP 4 --------------------------------------------
#connect to adobe data reports folder within inbox
adobe_data_reports_folder = inbox.Folders['Cust Insights'].Folders['Adobe data reports']
#STEP 5 --------------------------------------------
#get all messages from adobe reports folder
messages_from_adr_folder = adobe_data_reports_folder.Items
#STEP 6 ---------------------------------------------
#get attachment for a specific message (this is just for testing; in the real world
#I'll do this for all messages)
for message in messages_from_adr_folder:
    body = message.Body
    if message.SentOn.strftime("%d-%m-%y") == '07-12-22':
        attachments = message.Attachments
        for attachment in attachments:
            #Attachment.FileName holds the attachment's file name
            filename = attachment.FileName
            #STEP 6b - if the attachment is a csv file then save the attachment to a folder
            if filename.lower().endswith('.csv'):
                #pass a plain string path, the safest form for a COM call
                saved_path = str(output_dir / filename)
                attachment.SaveAsFile(saved_path)
                print(saved_path)
                #STEP 6C - reload the saved file as a dataframe
                importeddata = pd.read_csv(saved_path)
            else:
                print('NOT CSV')
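The stacking step described above could be sketched roughly as follows. Since the report in the question arrives as a zip file containing a CSV, this version reads the CSV members straight from saved zip files using only the standard library and collects the rows in one list. The function name `stack_zipped_csvs`, the extra `source_file` key and the UTF-8 encoding are assumptions for illustration. With pandas, the equivalent would be to collect one DataFrame per file and call `pd.concat` once at the end.

```python
import csv
import io
import zipfile
from pathlib import Path


def stack_zipped_csvs(zip_paths):
    """Read every .csv member of each zip file and stack the rows.

    Returns a list of dicts (one per CSV row), with an extra
    'source_file' key recording which zip the row came from.
    """
    rows = []
    for zip_path in zip_paths:
        zip_path = Path(zip_path)
        with zipfile.ZipFile(zip_path) as zf:
            for member in zf.namelist():
                if not member.lower().endswith(".csv"):
                    continue
                with zf.open(member) as raw:
                    # Assumption: the export is UTF-8; utf-8-sig also drops a BOM
                    text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                    for row in csv.DictReader(text):
                        row["source_file"] = zip_path.name
                        rows.append(row)
    return rows
```

Reading each file once and combining at the end avoids growing a dataframe inside the loop, which gets slower as the history builds up.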
I am trying to download an email from the Outlook Sent Items folder. Currently I am able to save it in '.msg' format. Is there any way to save the mail as '.html' or '.pdf' using Python?
from pathlib import Path
import win32com.client as win32
from datetime import date, timedelta
import os
import glob
# Create output folder
output_dir = Path.cwd()
output_dir.mkdir(parents=True, exist_ok=True)
# Connect to Outlook
outlook = win32.Dispatch('outlook.application').GetNamespace("MAPI")
# Connect to the Sent Items folder
sent_items = outlook.GetDefaultFolder(5)
# Get the required mail and store it locally
messages = sent_items.Items
message = messages.GetLast()
name = str(message.Subject)
message.SaveAs(os.getcwd()+'//'+name+".msg")
When I replace .msg with .html or .pdf in the last line, it does not work: the resulting .html or .pdf file shows special characters rather than the actual message content.
The Outlook object model doesn't provide any property or method for saving messages using the PDF file format. But you can use the OlSaveAsType enumeration for all available file formats. The HTML format (.html) is available. So, you just need to pass the olHTML value for the second parameter in addition to the file path:
message.SaveAs(os.getcwd()+'//'+name+".html", 5)  # 5 is the value of OlSaveAsType.olHTML
If you really need to save the message using the PDF file format you may consider using the Word object model for that. The Document.ExportAsFixedFormat2 method saves a document in PDF or XPS format. Use the GetInspector method to get the inspector where you may retrieve an instance of the Word Document object which represents the message body. The Inspector.WordEditor property returns the Microsoft Word Document Object Model of the message being displayed. The WordEditor property is only valid if the IsWordMail method returns true and the EditorType property is olEditorWord. The returned Word Document object provides access to most of the Word object model
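The Word-based route described above might look roughly like this from Python. This is a sketch under assumptions, not a tested recipe: `export_message_as_pdf` is a hypothetical helper, it uses the older `Document.ExportAsFixedFormat` method (which takes the output file name and export format as its first two arguments) rather than `ExportAsFixedFormat2`, and depending on the Outlook version the inspector may need to be displayed before `WordEditor` is available.

```python
WD_EXPORT_FORMAT_PDF = 17  # Word's WdExportFormat.wdExportFormatPDF
OL_EDITOR_WORD = 4         # Outlook's OlEditorType.olEditorWord


def export_message_as_pdf(message, pdf_path):
    """Export the body of an Outlook message to PDF through its Word editor.

    `message` is expected to behave like an Outlook MailItem.
    Returns True on success, False if no Word editor is available.
    """
    inspector = message.GetInspector
    # WordEditor is only valid when both of these checks pass
    if not inspector.IsWordMail() or inspector.EditorType != OL_EDITOR_WORD:
        return False
    document = inspector.WordEditor  # Word Document object for the body
    document.ExportAsFixedFormat(str(pdf_path), WD_EXPORT_FORMAT_PDF)
    return True
```

With the question's code, the call would be something like `export_message_as_pdf(message, os.path.join(os.getcwd(), name + ".pdf"))`.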
This code takes PDF attachments from emails, downloads them, merges them into one PDF file and sends it on.
Currently it takes all emails in that inbox folder that are marked with a specific category, so it merges the PDFs from all emails into one file.
But I want it to handle emails one by one: after downloading the PDFs from one email it should merge and send them, delete them from the folder, and only then move on to the second email.
How can I write such a loop for this code?
import datetime
import os
import win32com.client as win32
from PyPDF2 import PdfFileMerger
from pathlib import Path
path = ('C:\\Users\\Desktop\\Work')
today = datetime.date.today()
outlook = win32.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
subFolder = inbox.Folders("Test")
messages = subFolder.Items
def save_attachments(subject):
    for message in messages:
        if message.Categories == "Red Category":
            for attachment in message.Attachments:
                print(attachment.FileName)
                attachment.SaveAsFile(os.path.join(path, str(attachment)))

if __name__ == "__main__":
    save_attachments('PB report - next steps')
#Merge PDF's
merger = PdfFileMerger()
path_to_files = r'C:\Users\Desktop\Work/'
for root, dirs, file_names in os.walk(path_to_files):
    for file_name in file_names:
        merger.append(path_to_files + file_name)
merger.write(r"C:\Users\Desktop\Work\merged.pdf")
merger.close()
#Send PDF with outlook
# construct Outlook application instance
olApp = win32.Dispatch('Outlook.Application')
olNS = olApp.GetNameSpace('MAPI')
# construct the email item object
mailItem = olApp.CreateItem(0)
mailItem.Subject = 'Test'
mailItem.BodyFormat = 1
mailItem.Body = "Pdf merged"
mailItem.To = 'email'
path = (os.path.join('C:\\Users\\Desktop\\Work\\merged.pdf'))
mailItem.Attachments.Add(path)
mailItem.Display()
mailItem.Save()
mailItem.Send()
#Delete PDF's from folder
[f.unlink() for f in Path("C:\\Users\\Desktop\\Work").glob("*") if f.is_file()]
Iterating over all items in the folder is not really a good idea:
for message in messages:
    if message.Categories == "Red Category":
Instead, you need to use the Find/FindNext or Restrict methods of the Items class from the Outlook object model. So, in that case you will get all items that correspond to your search criteria and iterate over them only. Read more about these methods in the following articles:
How To: Use Find and FindNext methods to retrieve Outlook mail items from a folder (C#, VB.NET)
How To: Use Restrict method to retrieve Outlook mail items from a folder
Second, there is no need to create a new Outlook Application instance:
# construct Outlook application instance
olApp = win32.Dispatch('Outlook.Application')
olNS = olApp.GetNameSpace('MAPI')
Re-use the existing application instance instead. Moreover, Outlook is a singleton: you can't have two instances running at the same time.
Third, there is no need to display and save the item created before sending:
mailItem.Attachments.Add(path)
mailItem.Send()
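Putting the first point together with the one-email-at-a-time requirement from the question, the loop could be sketched like this. `category_filter` and `process_each_message` are hypothetical helper names; `items` stands for the folder's `Items` collection, and `handle_message` is whatever per-email work is needed (save that email's PDFs, merge them, send, delete the files). The sketch assumes a plain category name without quote characters.

```python
def category_filter(category):
    """Build a Restrict/Find filter string matching one Outlook category."""
    return f"[Categories] = '{category}'"


def process_each_message(items, category, handle_message):
    """Call handle_message once per message that carries `category`.

    `items` is expected to behave like an Outlook Items collection,
    i.e. to offer Restrict(filter). Returns the number of messages handled.
    """
    matching = items.Restrict(category_filter(category))
    count = 0
    # Copy to a list first so work done by the handler cannot disturb the iteration
    for message in list(matching):
        handle_message(message)
        count += 1
    return count
```

Because the handler receives a single message, the save, merge, send and delete steps all run to completion for one email before the next one is touched.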
I am an absolute beginner when it comes to working with REST APIs in Python. We have received a SharePoint URL which has multiple folders, with multiple files inside those folders, in the 'document' section. I have been provided an 'app_id' and a 'secret_token'.
I am trying to access the .csv files, read them as dataframes and perform operations on them.
The code for the operations is ready (I wrote it after downloading the .csv files and working locally), but I need help connecting to SharePoint from Python so that I don't have to download such heavy files ever again.
I know there have been multiple questions on this on Stack Overflow already, but none helped me get to where I want.
I did the following and I am unsure what to do next:
import json
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.runtime.http.request_options import RequestOptions
site_url = "https://<company-name>.sharepoint.com"
ctx = ClientContext(site_url).with_credentials(UserCredential("{app_id}", "{secret_token}"))
For site_url above, should I use the whole URL, or is it fine to stop at ####.com?
This is what I have so far. Next I want to read files from the respective folders and convert them into dataframes. The files will always be in .csv format.
The example hierarchy of the folders are as follows:
Documents --> Folder A, Folder B
Folder A --> a1.csv, a2.csv
Folder B --> b1.csv, b2.csv
I should be able to move to whichever folder I want and read the files based on my requirement.
Thanks for the help.
This works for me, using a Sharepoint App Identity with an associated client Id and client Secret.
First, I demonstrate authenticating and reading a specific file, then getting a list of files from a folder and reading the first one.
import pandas as pd
import json
import io
from office365.sharepoint.client_context import ClientCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
#Authentication (shown for a 'modern teams site', but I think it should work for a company.sharepoint.com site):
site="https://<myteams.companyname.com>/sites/<site name>/<sub-site name>"
#Read credentials from a json configuration file:
spo_conf = json.load(open(r"conf\spo.conf", "r"))
client_credentials = ClientCredential(spo_conf["RMAppID"]["clientId"],spo_conf["RMAppID"]["clientSecret"])
ctx = ClientContext(site).with_credentials(client_credentials)
#Read a specific CSV file into a dataframe:
folder_relative_url = "/sites/<site name>/<sub site>/<Library Name>/<Folder Name>"
filename = "MyFileName.csv"
response = File.open_binary(ctx, "/".join([folder_relative_url, filename]))
df = pd.read_csv(io.BytesIO(response.content))
#Get a list of file objects from a folder and read one into a DataFrame:
def getFolderContents(relativeUrl):
    contents = []
    library = ctx.web.get_list(relativeUrl)
    all_items = library.items.filter("FSObjType eq 0").expand(["File"]).get().execute_query()
    for item in all_items:  # type: ListItem
        cur_file = item.file
        contents.append(cur_file)
    return contents
fldrContents = getFolderContents('/sites/<site name>/<sub site>/<Library Name>')
response2 = File.open_binary(ctx, fldrContents[0].serverRelativeUrl)
df2 = pd.read_csv(io.BytesIO(response2.content))
Some References:
Related SO thread.
Office365 library github site.
Getting a list of contents in a doc library folder.
Additional notes following up on comments:
The site path does not include the full URL of the site home page (ending in .aspx); it just ends with the name of the site (or sub-site, if relevant to your case).
You don't need to use a configuration file to store your authentication credentials for the Sharepoint application identity - you could just replace spo_conf["RMAppID"]["clientId"] with the value for the Sharepoint-generated client Id and do similarly for the client Secret. But this is a simple example of what the text of a JSON file could look like:
{
    "MyAppName":{
        "clientId": "my-client-id",
        "clientSecret": "my-client-secret",
        "title":"name_for_application"
    }
}
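As a small illustration, a file shaped like the JSON above could be loaded with a helper along these lines. `load_app_credentials` is a hypothetical name, not part of the office365 library. Note that the top-level key passed in has to match the one in the file: the earlier snippet looks up "RMAppID", while this sample file uses "MyAppName".

```python
import json


def load_app_credentials(conf_path, app_name):
    """Return (client_id, client_secret) for `app_name` from a JSON config file."""
    with open(conf_path, "r", encoding="utf-8") as handle:
        conf = json.load(handle)
    if app_name not in conf:
        raise KeyError(f"no entry named {app_name!r} in {conf_path}")
    entry = conf[app_name]
    return entry["clientId"], entry["clientSecret"]
```

The returned pair can then be passed on as `ClientCredential(client_id, client_secret)`, as in the authentication code above.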
I'm using the simple-smartsheet library to read data from a sheet in Smartsheet and download the existing attachments on each row of the sheet.
I can already read the data for each row, but I cannot download the existing attachments.
import config
from simple_smartsheet import Smartsheet
sheet = smartsheet.sheets.get(id=config.SHEET_ID)
for row in sheet.rows:
    attachments = row.attachments
    print(attachments)
When I execute the code above, I get this result:
[]
I use the simple-smartsheet library as it is the only one that supports Python versions 3.6+.
My Python version is 3.7.5.
You can use list_row_attachments to find information about the attachments that belong to a row.
The code might look like this:
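import config
import smartsheet

#list_row_attachments belongs to the official smartsheet SDK, so that client is used here
#Smartsheet() with no arguments reads the token from the SMARTSHEET_ACCESS_TOKEN environment variable
smartsheet_client = smartsheet.Smartsheet()
sheet = smartsheet_client.Sheets.get_sheet(config.SHEET_ID)
for row in sheet.rows:
    response = smartsheet_client.Attachments.list_row_attachments(
        config.SHEET_ID,
        row.id,
        include_all=True
    )
    attachments = response.data
    print(attachments)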
My solution is not very pythonic, but it works. It consists of 2 steps:
Get the attachment links
Save the file to a local HDD (I'm doing backups too) as a pivot place
1. To get the list of attachments:
import smartsheet
import urllib.request
smart = smartsheet.Smartsheet()
att_list = smart.Attachments.list_all_attachments(<sheet_id>, include_all=True)
2. To download the attachments to local disk, loop through the list of attachments; you can also add your own conditions to decide which ones to download:
for attach in att_list.data:
    att_id = attach.id  # get the id of the attachment
    att_name = attach.name  # get the name of the attachment
    retrieve_att = smart.Attachments.get_attachment(<sheet_id>, att_id)  # fetches the attachment details, including a temporary download URL
    dest_dir = "C:\\path\\to\\folder\\"
    dest_file = dest_dir + str(att_name)  # build the destination path
    dwnld_url = retrieve_att.url  # this link gives you access to download the file for about 5 to 10 min. before it expires
    urllib.request.urlretrieve(dwnld_url, dest_file)  # retrieve the attachment and save it locally
Now you have the file and you can do whatever you need with it
It looks like that library has not implemented logic for dealing with attachments yet.
As an alternative, I implemented a solution with the code below:
import requests
#token = 'Your smartsheet Token'
#sheetId = 'Your sheet id'
#rowId = 'Your row id'
r = requests.get(f'https://api.smartsheet.com/2.0/sheets/{sheetId}/rows/{rowId}/attachments', headers={'Authorization': f'Bearer {token}'})
response_json = r.json()
print(response_json)
See Get Attachments for more details on handling attachments in Smartsheet.
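For reference, the same request can be put together with only the standard library. This is a sketch: `row_attachments_request` is a hypothetical helper name, and the token, sheet id and row id are placeholders that you supply. It builds the request without sending it, which makes it easy to confirm that the ids really end up in the URL.

```python
import urllib.request

API_BASE = "https://api.smartsheet.com/2.0"


def row_attachments_request(token, sheet_id, row_id):
    """Build (but do not send) the GET request listing a row's attachments."""
    url = f"{API_BASE}/sheets/{sheet_id}/rows/{row_id}/attachments"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

To send it, pass the result to `urllib.request.urlopen` and parse the response body with `json.load`.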
I found and modified some code that reads CSV attachments from the Outlook application using Python.
What happens in my case is this: when I request data for a certain period, they send me the monthly data for the requested period in separate emails (e.g. request: January 2018 - December 2018; receive: 12 mails with a single CSV attachment in each one of them). I save all of the emails coming from the data warehouse in 'DWH Mail'.
All of the emails have the same subject. So my code will save all CSV attachments that are stored in 'DWH Mail' for the specified subject.
import win32com.client as client
import datetime as date
import os.path
def attach(mail_subject):
    outlook = client.Dispatch("Outlook.Application").GetNamespace("MAPI")
    folder = outlook.GetDefaultFolder("6").Folders["DWH Mail"]
    val_date = date.date.today()
    sub_target = mail_subject
    for msg in folder.Items:
        if msg.ReceivedTime.date() == val_date and msg.Subject == sub_target:
            for att in msg.Attachments:
                att.SaveASFile(os.getcwd() + "\\" + att.FileName)
                print ("Mail Successfully Extracted")
                break
    print ("Done")
Now I can request a ZIP file containing the CSV instead, so that I receive the file faster. Where and what should I add in my code so that the loop extracts and saves the CSV file from the ZIP file, instead of saving the ZIP file and leaving me to extract it manually later?
I am relatively new to Python, so any help would be appreciated. Thank you.
import os
import pandas as pd
import zipfile

curDir = os.getcwd()
# yourFileName is the name of the saved ZIP file without the extension; set it before running
zf = zipfile.ZipFile(curDir + '/targetfolder/' + yourFileName + '.zip')
text_files = zf.infolist()
# list_ = []
print ("Decompressing and loading data into multiple files... ")
for text_file in text_files:
    print(text_file.filename)
    df = pd.read_csv(zf.open(text_file.filename))
    # do df manipulations if required
    df.to_csv(curDir + '/targetfolder/' + text_file.filename + '.csv')
# df = pd.concat(list_)
This will iterate through all the files and load them with the respective names as present in the zip file.
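If the goal is only to get the CSV out of the ZIP without loading it into pandas, a standard-library sketch like the following may be enough. `extract_csvs` is a hypothetical helper name; it could be called right after `SaveAsFile` inside the question's attachment loop, with the path the ZIP was saved to.

```python
import zipfile
from pathlib import Path


def extract_csvs(zip_path, dest_dir):
    """Extract only the .csv members of `zip_path` into `dest_dir`.

    Returns the paths of the extracted files.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            # extract() returns the path of the file it wrote
            extracted.append(Path(zf.extract(info, dest_dir)))
    return extracted
```

For example, `extract_csvs(os.getcwd() + "\\" + att.FileName, os.getcwd())` would unpack the CSV next to the saved ZIP.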