Splitting Up Excel Worksheet By Unique Column Values Using Openpyxl - python

I am attempting to separate an Excel spreadsheet with multiple email addresses in the same column to a separate workbook based on each unique email address value.
My intent is to copy and paste each group of rows with the same email address value into a separate workbook I can then attach to an email message and send as an Excel attachment to each corresponding recipient within the new workbook.
So far I have only been able to copy and paste rows for a specific value into a new workbook and then send that file to the email address in a specific cell location, in this case "O2".
import os
import openpyxl
import win32com.client as win32

wb1 = openpyxl.load_workbook(r'file.xlsx', data_only=True)
wb2 = openpyxl.Workbook()
sheet1 = wb1['Sheet1']
sheet2 = wb2.active  # a new Workbook has one default sheet; 'Sheet2' does not exist yet
header = sheet1[1:1]
listH = []
for h in header:
    listH.append(h.value)
sheet2.append(listH)
colOfInterest = 15  # This is the column which contains the email addresses
for rowNum in range(2, sheet1.max_row + 1):
    if sheet1.cell(row=rowNum, column=colOfInterest).value is not None:
        listA = []
        row = sheet1[rowNum:rowNum]
        for cell in row:
            listA.append(cell.value)
        # list indexes are 0-based, so column 15 is listA[14]
        if listA[colOfInterest - 1] == "email#domainname.com":
            sheet2.append(listA)
wb2.save('TestList1.xlsx')  # save the new workbook, not the source

outlook = win32.Dispatch("outlook.application")
mail = outlook.CreateItem(0)
mail.To = (str(sheet1['O2'].value))
mail.Cc = (str(sheet1['P2'].value))
mail.Subject = "Subject Line Goes Here"
mail.Body = ""
mail.HTMLBody = """ Email content goes here."""
attachment1 = os.path.abspath("TestList1.xlsx")  # Outlook needs the full path of the saved file
mail.Attachments.Add(attachment1)
mail.Send()
The code listed above works only for one specific email address value. I have yet to figure out how to have Python loop through the entire original spreadsheet, create a separate Excel attachment for each unique email address value, and email it to the appropriate recipient.
Thanks in advance for any help you can provide!
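One way to extend this to every address is to group the rows by the value in the email column first and then write one workbook per group. The following is only a minimal sketch: `group_rows_by_email`, `write_group_workbooks`, and the `List{n}.xlsx` file naming are hypothetical names that do not appear in the question, and it assumes the addresses sit in column 15 of "Sheet1" as in the code above.

```python
from collections import defaultdict


def group_rows_by_email(rows, email_index):
    """Group row tuples by the value found at email_index (0-based)."""
    groups = defaultdict(list)
    for row in rows:
        email = row[email_index]
        if email is not None:  # skip rows without an address
            groups[email].append(list(row))
    return dict(groups)


def write_group_workbooks(source_path, email_column=15, sheet_name="Sheet1"):
    """Write one workbook per unique address; returns {email: file name}."""
    import openpyxl  # imported here so the grouping helper has no dependency

    sheet = openpyxl.load_workbook(source_path, data_only=True)[sheet_name]
    rows = list(sheet.iter_rows(values_only=True))
    header, data = rows[0], rows[1:]
    saved = {}
    groups = group_rows_by_email(data, email_column - 1)
    for n, (email, group) in enumerate(groups.items(), start=1):
        out = openpyxl.Workbook()
        out_sheet = out.active
        out_sheet.append(list(header))
        for row in group:
            out_sheet.append(row)
        file_name = f"List{n}.xlsx"  # hypothetical naming scheme
        out.save(file_name)
        saved[email] = file_name
    return saved


# Small in-memory demonstration of the grouping step
sample = [
    ("a", "x#example.com"),
    ("b", "y#example.com"),
    ("c", "x#example.com"),
    ("d", None),
]
grouped = group_rows_by_email(sample, 1)
print(grouped)
```

Each (email, file name) pair returned by `write_group_workbooks` could then be fed to the Outlook part of the script, setting `mail.To` to the address and attaching the absolute path of the matching file.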

Related

Multiple excel tables in Outlook with Python

I need to send an Outlook email with 3 Excel tables.
I have one Excel file, master_file.csv (this file is filled with automated data from a pandas data frame).
This file has one sheet (Sheet1) with 3 tables.
These tables always have the same number of columns:
table_1 from A to R
table_2 from S to AJ
table_3 from AK to BD
The number of rows changes every time, so the row range should be determined from the filled cells (probably with XlDirectionDown).
These tables have their own formatting in the Excel file, and this formatting needs to be copied into the email.
The email should look something like this:
"Text"
"Table 1"
"Text"
"Table 2"
"Text"
"Table 3"
"Text"
I already tried the code below but can't figure out how to put all this together, and I have run into a hundred options, none of which work.
Can you help me add Excel tables to an Outlook email when each table's range has to be determined from the filled rows?
import sys
from pathlib import Path
import win32com.client as win32
from PIL import ImageGrab
excel_path = str(Path.cwd() / 'master_file.xlsm')
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = False
excel.DisplayAlerts = False
wb = excel.Workbooks.Open(excel_path)
ws = wb.Worksheets(1)
win32c = win32.constants
ws.Range("A1:R11").CopyPicture(Appearance=1, Format=win32c.xlBitmap)
img = ImageGrab.grabclipboard()
image_path = str(Path.cwd() / 'test.png')
img.save(image_path)
outlook = win32.gencache.EnsureDispatch('Outlook.Application')
new_mail = outlook.CreateItem(0)
new_mail.To = 'person#email.com'
new_mail.Attachments.Add(Source=image_path)
body = "<h1>Email text...</h1><br><br> <img src=test.png>"
new_mail.HTMLBody = (body)
new_mail.Display()
wb.Close()
I'm answering my own question as I found a solution and maybe it will be helpful for somebody.
There are actually 2 solutions but the second one is more suitable for nice email in my opinion.
First solution: We have one csv/excel file.
import sys
from pathlib import Path
import win32com.client as win32
from PIL import ImageGrab
import xlwings as xw
# open raw data file
filename_read = 'master_file.csv'
wb = xw.Book(filename_read)
sht = wb.sheets[0]
# find the numbers of columns and rows in the sheet
num_col = sht.range('A1').end('right').column
num_row = sht.range('A4').end('down').row
# collect data
content_list = sht.range((1,1),(num_row,num_col-1))
excel_path = str(Path.cwd() / 'master_file.xlsm')
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = False
excel.DisplayAlerts = False
wb = excel.Workbooks.Open(excel_path)
ws = wb.Worksheets(1)
win32c = win32.constants
ws.Range(f'A1:R{num_row}').CopyPicture(Appearance=1, Format=win32c.xlBitmap)
img = ImageGrab.grabclipboard()
image_path1 = str(Path.cwd() / 'test.png')
img.save(image_path1)
win32c = win32.constants
ws.Range(f'S1:AJ{num_row}').CopyPicture(Appearance=1, Format=win32c.xlBitmap)
img = ImageGrab.grabclipboard()
image_path2 = str(Path.cwd() / 'test2.png')
img.save(image_path2)
outlook = win32.gencache.EnsureDispatch('Outlook.Application')
new_mail = outlook.CreateItem(0)
new_mail.To = 'person#email.com'
new_mail.Attachments.Add(Source=image_path1)
new_mail.Attachments.Add(Source=image_path2)
body = "<h1>Hello team,</h1> <br><br> <h2> Here are data for yesterday.</h2> <br><br> <h2>Call Metrics:</h2> <br><br> <img src=test.png width=1700 height=600> <br><br> <h2>Back office metrics:</h2> <img src=test2.png width=1700 height=600> "
new_mail.HTMLBody = (body)
new_mail.Display()
wb.Close()
This code finds the end of each table, takes a picture of it, and puts it in the email. It is a good solution when the number of rows is always roughly the same. But if that changes a lot (say 30 rows one time and 100 rows the next), the images will be too small or too big if you set one fixed width and height like I did.
Second solution: Create one Excel file per table (i.e. "table_1.xlsm", "table_2.xlsm") and put this simple VBA code in each:
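The range strings passed to `ws.Range(...)` in the first solution can be built in one place once the last filled row is known. This is a small sketch, not part of the original answer: `table_ranges` and `TABLE_COLUMNS` are hypothetical names, the column spans are the ones listed in the question (A to R, S to AJ, AK to BD), and it assumes all three tables end on the same row, as the code above does.

```python
TABLE_COLUMNS = [("A", "R"), ("S", "AJ"), ("AK", "BD")]


def table_ranges(last_row, first_row=1, columns=TABLE_COLUMNS):
    """Return an A1-style address for each table, ending at last_row."""
    if last_row < first_row:
        raise ValueError("last_row must not be smaller than first_row")
    return [f"{start}{first_row}:{end}{last_row}" for start, end in columns]


ranges = table_ranges(37)
print(ranges)  # ['A1:R37', 'S1:AJ37', 'AK1:BD37']
```

Looping over the returned addresses and calling `CopyPicture` on each would also cover the third table, which the code above does not capture.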
Sub auto_open()
Application.ScreenUpdating = False
Application.AlertBeforeOverwriting = False
Application.DisplayAlerts = False
Range("A4:R200").Clear
Workbooks.Open "C:\Users\xxxxxxx\master_source.csv"
Windows("master_source.csv").Activate
'This range below select data till rows are filled'
Range("A2:R2", Range("B2:R2").End(xlDown).End(xlToRight)).Select
Range("A2").Activate
Selection.Copy
Windows("table_1.xlsm").Activate
Sheet1.Select
Range("A4").Select
Sheet1.Paste
Windows("master_source.csv").Application.CutCopyMode = False
Windows("master_source.csv").Close
Range("A2:R2", Range("B2:R2").End(xlDown).End(xlToRight)).Borders.LineStyle = XlLineStyle.xlContinuous
Range("A2:R2", Range("B2:R2").End(xlDown).End(xlToRight)).HorizontalAlignment = xlCenter
Range("A2:R2", Range("B2:R2").End(xlDown).End(xlToRight)).VerticalAlignment = xlCenter
ActiveWorkbook.Save
Application.Quit
End Sub
Save this SHEET as a Web Page. (!!) This is important: save JUST the sheet. If you save the whole workbook, it will be saved with FRAMES, which Outlook doesn't support.
When saving, tick the AutoRepublish checkbox (after every save it will update our HTML file).
Then use this Python code:
import os, time, sys
from datetime import datetime, timedelta, date
from pathlib import Path
import win32com.client as win32
# setting up yesterday's date and the date format
d = date.today() - timedelta(days=1)
dt = d.strftime("%d/%m/%y")
# saving a copy of the file for future use (master_df is the pandas DataFrame built earlier)
filepath = Path(f'C:/Users/xxxxxxxx/Agent report {d}.csv')
filepath.parent.mkdir(parents=True, exist_ok=True)
master_df.to_csv(filepath)
# updating the 3 tables: opening each table file triggers its "auto_open" VBA macro
for x in range(1, 4):
    os.system(f'start "excel" "C:\\xxxxxxxxxxx\\table_{x}.xlsm"')
    time.sleep(10)
outlook = win32.gencache.EnsureDispatch('Outlook.Application')
mail = outlook.CreateItem(0)
mail.To = 'person#gmail.com'
mail.Subject = f'Agent report for {dt}'
table1 = open(r'C:\xxxxxxxxxxxxxxxxxxxxxx\table_1.htm').read()
table2 = open(r'C:\xxxxxxxxxxxxxxxxx\table_2.htm').read()
table3 = open(r'C:\xxxxxxxxxxxxxxxxxxxxx\table_3.htm').read()
mail.HTMLBody = f"""\
<html>
<head></head>
<body>
Hello team,<br><br>
Below are metrics for your agents for previous day<br><br>
<b>First Metrics:</b><br><br>
{table1}<br><br>
<b>Second metrics:</b><br><br>
{table2}<br><br>
<b>Third statistics:</b><br><br>
{table3}<br>
Reference:<br>
Kind regards,<br>
</body>
</html>
"""
mail.Send()

Problem when trying to run python script from VBA

I'm new to coding. I'm trying to write my first script that is called from VBA. The script works well on its own, but now I want to call it from VBA, passing two variables (the URL and the file name).
I always get an error on the shell line.
I hope you can help me.
Thanks.
This is the URL containing the table to download: https://jde.erpref.com/?schema=920&table=F4311
This is my VBA code:
Sub Dw_table()
Dim url As String
Dim file_name As String
Dim PythonExe, PythonScript As String
Dim objShell As Object
' Prompt the user to enter the URL and file name
url = InputBox("Enter the URL of the table:")
file_name = InputBox("Enter the file name:")
'paths for exe and script
PythonExe = """C:\Users\Mario Rdz\AppData\Local\Programs\Python\Python311\python.exe"""
PythonScript = "C:\Users\Mario Rdz\PycharmProjects\HelloWorld\JDEREF.py"
objShell.Run PythonExe & PythonScript & url & file_name
End Sub
This is my Python code:
from bs4 import BeautifulSoup
import requests
import openpyxl
import sys
# Passed arguments from excel VBA
url = sys.argv[1]
file_name = sys.argv[2]
page = requests.get(url)
#Parse the HTML content of the webpage
soup = BeautifulSoup(page.content, "html.parser")
#Find the table with the ID "columnselectcollection"
table = soup.find(id="columnselectcollection")
#Create a new workbook and add a worksheet
workbook = openpyxl.Workbook()
worksheet = workbook.active
#Set the values for column headers
worksheet.cell(row=1, column=2).value = "Seq"
worksheet.cell(row=1, column=3).value = "Field"
worksheet.cell(row=1, column=4).value = "Description"
worksheet.cell(row=1, column=5).value = "Data type"
worksheet.cell(row=1, column=6).value = "Edit Type type"
worksheet.cell(row=1, column=7).value = "Lenght"
worksheet.cell(row=1, column=8).value = "Decimals"
#Iterate over the rows of the table and write the data to the worksheet
for row in table.find_all('tr'):
    row_data = []
    for cell in row.find_all('td'):
        row_data.append(cell.text)
    worksheet.append(row_data)
#delete row 2 which is blanks and delete unuseful column 9
worksheet.delete_cols(9)
worksheet.delete_rows(2)
#Save the workbook to an Excel file
workbook.save(f"{file_name}.xlsx")
I think the issue is the way I'm calling the Python script or how I'm passing the URL and file name to it.
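Looking at the VBA above, three things stand out: `objShell` is declared but never assigned (for example with `CreateObject("WScript.Shell")`), the four pieces passed to `Run` are concatenated with no spaces between them, and the script path contains a space but is not wrapped in quotes. The sketch below shows, in Python, what the finished command string should look like. `build_command` is a hypothetical helper, and "F4311" is only an assumed file name for illustration.

```python
import subprocess


def build_command(python_exe, script, url, file_name):
    """Return the command line as one string with Windows-style quoting."""
    # list2cmdline separates arguments with spaces and wraps any argument
    # that contains a space in double quotes
    return subprocess.list2cmdline([python_exe, script, url, file_name])


command = build_command(
    r"C:\Users\Mario Rdz\AppData\Local\Programs\Python\Python311\python.exe",
    r"C:\Users\Mario Rdz\PycharmProjects\HelloWorld\JDEREF.py",
    "https://jde.erpref.com/?schema=920&table=F4311",
    "F4311",
)
print(command)
```

The string built in VBA needs the same shape: each path in its own pair of double quotes and a single space between every argument, so that `sys.argv[1]` and `sys.argv[2]` receive the URL and the file name.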

Grab only latest received outlook email with python

I am trying to save the attachment from an Outlook email if its subject contains a specific string. During the day there might be 2 or 3 emails with the same subject, but the content of the attachment changes. I wrote the code to save the attachment, but it saves the attachments from all emails with this subject received today. Is it possible to save only the attachment of the latest received email with this subject?
Here is my code:
import datetime
import os
import win32com.client

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
folder = outlook.GetDefaultFolder(6).folders("target folder")
items = folder.items
# 'date' was not defined in the snippet; today's date is assumed here
received_dt = datetime.date.today().strftime('%Y%m%d')
time_writing = datetime.date.today().strftime('%Y%m%d')
items.Sort("[ReceivedTime]", Descending=True)
# input_folder is defined elsewhere in the script
for i in items:
    RT = i.ReceivedTime
    Msgdate = datetime.datetime(RT.year, RT.month, RT.day, RT.hour, RT.minute, RT.second)
    msgdate = Msgdate.strftime('%Y%m%d')
    if "apples" in i.Subject and received_dt == msgdate:
        for att in i.Attachments:
            att.SaveAsFile(os.path.join(input_folder + 'Apples_folder\\', 'apples_' + time_writing + '.csv'))
            print('Apples attachment is saved')
    if "oranges" in i.Subject and received_dt == msgdate:
        for att in i.Attachments:
            att.SaveAsFile(os.path.join(input_folder + 'Oranges_folder\\', 'oranges_' + time_writing + '.csv'))
            print('Oranges attachment is saved')
If I use break after the apples block to stop once the attachment is saved, the loop never reaches oranges. Also, when I try to enumerate and grab only the first email, Python shows:
This object does not support enumeration
Use Items.GetLast
items = folder.Items # needs to be uppercase IIUC
# sort ascending so that the newest item is the last one in the collection
items.Sort("[ReceivedTime]")
most_recent = items.GetLast()
# you can call strftime directly on the ReceivedTime object
msgdate = most_recent.ReceivedTime.strftime("%Y%m%d")
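`GetLast` returns a single item for the whole folder, while the question needs the newest message for each subject ("apples" and "oranges"). A minimal sketch of that selection step is below. `latest_per_keyword` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that carry only the `Subject` and `ReceivedTime` attributes, so the logic can be tried without Outlook.

```python
from datetime import datetime
from types import SimpleNamespace


def latest_per_keyword(messages, keywords):
    """Return {keyword: newest message whose Subject contains the keyword}."""
    latest = {}
    for msg in messages:
        for kw in keywords:
            if kw in msg.Subject:
                best = latest.get(kw)
                if best is None or msg.ReceivedTime > best.ReceivedTime:
                    latest[kw] = msg
    return latest


# Stand-in objects with the two attributes the function reads
inbox = [
    SimpleNamespace(Subject="apples report", ReceivedTime=datetime(2023, 1, 5, 9, 0)),
    SimpleNamespace(Subject="oranges report", ReceivedTime=datetime(2023, 1, 5, 10, 0)),
    SimpleNamespace(Subject="apples report", ReceivedTime=datetime(2023, 1, 5, 15, 30)),
]
picked = latest_per_keyword(inbox, ["apples", "oranges"])
print({kw: m.ReceivedTime for kw, m in picked.items()})
```

With real Outlook items, the attachments of each picked message would then be saved once, which avoids both the `break` problem and saving every matching email of the day.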

Password protected xls data transfer to master sheet

I am quite new to Python and am writing code to replace a VBA process that takes 5 to 6 hours to complete. The code needs to open a password-protected Excel file, extract certain sheet and cell data to a master sheet, and, if the number in column A already exists there, overwrite that row so there are no duplicates:
Process:
Step 1: Open the password-protected xls.
Step 2: Check for a duplicate number in column A and, if the same value exists, overwrite it; copy the required cells from each sheet to the master workbook's Data sheet as shown below.
Step 3: Go back to step 1 until all xls files are done.
This is part of the VBA to show the process to a degree:
wbThis.Worksheets("Data").Range("A" & Store_Row_no) = NewNumber
wbThis.Worksheets("Data").Range("B" & Store_Row_no) = DateNew
wbThis.Worksheets("Data").Range("C" & Store_Row_no) = wbNew.Worksheets("Sheet1").Range("F2").Value
wbThis.Worksheets("Data").Range("D" & Store_Row_no) = wbNew.Worksheets("Sheet2").Range("H152").Value
wbThis.Worksheets("Data").Range("E" & Store_Row_no) = wbNew.Worksheets("Sheet3").Range("D3").Value
This is my current code, but I can't work out how to open a password-protected Excel file, copy the data to the master sheet, and overwrite the row when the value in column A is a duplicate.
Python code so far:
import win32com.client
import sys
import os
foldername = ('C:\\Users\\')
password = 'ORANGE'
pmaster = (r'C:\Users')
xlApp = win32com.client.Dispatch("Excel.Application")
xlApp.Visible = False
master = xlApp.Workbooks.Open(Filename=pmaster)
wb = xlApp.Workbooks.Open(foldername, False, True, None, password)
sh1 = wb.Sheets('sheet1') #sheet name1
sh2 = wb.Sheets('sheet2') #sheet name2
sh3 = wb.Sheets('sheet3') #sheet name3
out1 = sh1.Range("B2").value
out2 = sh1.Range("D2").value
out3 = sh1.Range("F2").value
out4 = sh2.Range("H152").value
out5 = sh3.Range("D3").value
print(out1,out2,out3,out4,out5)
I just need help looping through the files and copying the values to the new master workbook.
Thank you so much in advance
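The "overwrite if column A already has this number" rule can be handled by collecting the extracted values in a dictionary keyed on the column A number before anything is written to the master sheet. This is only a sketch of that step: `upsert_record` and `master_rows` are hypothetical helpers, and the numbers and dates are made-up sample values.

```python
def upsert_record(master, key, values):
    """Store values under key, replacing any earlier record with the same key.

    Returns True if an existing record was overwritten.
    """
    replaced = key in master
    master[key] = list(values)
    return replaced


def master_rows(master):
    """Flatten the dict into rows with the key in column A."""
    return [[key, *values] for key, values in master.items()]


master = {}
upsert_record(master, 1001, ["2023-01-05", 10, 20, 30])
upsert_record(master, 1002, ["2023-01-06", 11, 21, 31])
was_duplicate = upsert_record(master, 1001, ["2023-01-07", 12, 22, 32])
print(was_duplicate)
print(master_rows(master))
```

In the full script, each password-protected workbook would be opened in a loop, its `out1` to `out5` values passed to `upsert_record`, and the rows from `master_rows` written to the Data sheet once at the end, which also keeps the number of COM calls low.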

Copy excel sheet from one worksheet to another in Python

All I want to do is copy a worksheet from an excel workbook to another excel workbook in Python.
I want to maintain all formatting (coloured cells, tables etc.)
I have a number of excel files and I want to copy the first sheet from all of them into one workbook. I also want to be able to update the main workbook if changes are made to any of the individual workbooks.
It's a code block that will run every few hours and update the master spreadsheet.
I've tried pandas, but it doesn't maintain formatting and tables.
I've tried openpyxl to no avail
I thought xlwings code below would work:
import xlwings as xw
wb = xw.Book('individual_files\\file1.xlsx')
sht = wb.sheets[0]
new_wb = xw.Book('Master Spreadsheet.xlsx')
new_wb.sheets["Sheet1"] = sht
But I just get the error:
----> 4 new_wb.sheets["Sheet1"] = sht
AttributeError: __setitem__
"file1.xlsx" above is an example first excel file.
"Master Spreadsheet.xlsx" is my master spreadsheet with all individual files.
In the end I did this:
from copy import copy
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter

def copyExcelSheet(sheetName):
    # 'item' is the path of the source workbook, set by the calling code
    read_from = load_workbook(item)
    #open(destination, 'wb').write(open(source, 'rb').read())
    read_sheet = read_from.active
    write_to = load_workbook("Master file.xlsx")
    write_sheet = write_to[sheetName]
    for row in read_sheet.rows:
        for cell in row:
            new_cell = write_sheet.cell(row=cell.row, column=cell.column,
                                        value=cell.value)
            write_sheet.column_dimensions[get_column_letter(cell.column)].width = read_sheet.column_dimensions[get_column_letter(cell.column)].width
            if cell.has_style:
                new_cell.font = copy(cell.font)
                new_cell.border = copy(cell.border)
                new_cell.fill = copy(cell.fill)
                new_cell.number_format = copy(cell.number_format)
                new_cell.protection = copy(cell.protection)
                new_cell.alignment = copy(cell.alignment)
    write_sheet.merge_cells('C8:G8')
    write_sheet.merge_cells('K8:P8')
    write_sheet.merge_cells('R8:S8')
    write_sheet.add_table(newTable("table1","C10:G76","TableStyleLight8"))
    write_sheet.add_table(newTable("table2","K10:P59","TableStyleLight9"))
    write_to.save('Master file.xlsx')
    read_from.close()
With this to check if the sheet already exists:
#checks if sheet already exists and updates sheet if it does.
def checkExists(sheetName):
    book = load_workbook("Master file.xlsx") # open an Excel file and return a workbook
    if sheetName in book.sheetnames:
        print ("Removing sheet",sheetName)
        del book[sheetName]
    else:
        print ("No sheet ",sheetName," found, will create sheet")
    # in both cases an empty sheet with this name is (re)created
    book.create_sheet(sheetName)
    book.save('Master file.xlsx')
with this to create new tables:
import random
import string
from openpyxl.worksheet.table import Table, TableStyleInfo

def newTable(tableName,ref,styleName):
    # append a random suffix so the table name is unique within the workbook
    tableName = tableName + ''.join(random.choices(string.ascii_uppercase + string.digits + string.ascii_lowercase, k=15))
    tab = Table(displayName=tableName, ref=ref)
    # Add a default style with striped rows and banded columns
    tab.tableStyleInfo = TableStyleInfo(name=styleName, showFirstColumn=False,showLastColumn=False, showRowStripes=True, showColumnStripes=True)
    return tab
Adapted from this solution, but note that in my (limited) testing (and as observed in the other Q&A), this does not support the After parameter of the Copy method, only Before. If you try to use After, it creates a new workbook instead.
import xlwings as xw
wb = xw.Book('individual_files\\file1.xlsx')
sht = wb.sheets[0]
new_wb = xw.Book('Master Spreadsheet.xlsx')
# copy this sheet into the new_wb *before* Sheet1:
sht.api.Copy(Before=new_wb.sheets['Sheet1'].api)
# now, remove Sheet1 from new_wb
new_wb.sheets['Sheet1'].delete()
This can be done using pywin32 directly. The Before or After parameter needs to be provided (see the api docs), and the parameter needs to be a worksheet <object>, not simply a worksheet Name or index value. So, for example, to add it to the end of an existing workbook:
import win32com.client as win32com_client  # alias assumed from the name used below

def copy_sheet_within_excel_file(excel_filename, sheet_name_or_number_to_copy):
    excel_app = win32com_client.gencache.EnsureDispatch('Excel.Application')
    wb = excel_app.Workbooks.Open(excel_filename)
    wb.Worksheets[sheet_name_or_number_to_copy].Copy(After=wb.Worksheets[wb.Worksheets.Count])
    new_ws = wb.ActiveSheet
    return new_ws
As most of my code runs on end-user machines, I don't like to make assumptions whether Excel is open or not so my code determines if Excel is already open (see GetActiveObject), as in:
from pywintypes import com_error

try:
    excel_app = win32com_client.GetActiveObject('Excel.Application')
except com_error:
    excel_app = win32com_client.gencache.EnsureDispatch('Excel.Application')
And then I also check to see if the workbook is already loaded (see Workbook.FullName). Iterate through the Application.Workbooks testing the FullName to see if the file is already open. If so, grab that wb as your wb handle.
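The "is this workbook already open" check described above could look roughly like the following. This is a sketch under assumptions: `find_open_workbook` is a hypothetical helper, and the `SimpleNamespace` objects stand in for COM Workbook objects, of which only the `FullName` property is read.

```python
import os
from types import SimpleNamespace


def find_open_workbook(workbooks, path):
    """Return the first workbook whose FullName matches path, else None."""
    wanted = os.path.normcase(os.path.abspath(path))
    for wb in workbooks:
        if os.path.normcase(os.path.abspath(wb.FullName)) == wanted:
            return wb
    return None


# Stand-ins for COM Workbook objects; only FullName is used here
open_books = [
    SimpleNamespace(FullName=os.path.abspath("report.xlsx")),
    SimpleNamespace(FullName=os.path.abspath("master.xlsx")),
]
match = find_open_workbook(open_books, "master.xlsx")
print(match is open_books[1])
print(find_open_workbook(open_books, "missing.xlsx"))
```

With a live Excel instance, `excel_app.Workbooks` would be passed as the first argument, and `Workbooks.Open` called only when the function returns `None`.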
You might find this helpful for digging around the available Excel APIs directly from pywin32:
import os

def show_python_interface_modules():
    os.startfile(os.path.dirname(win32com_client.gencache.GetModuleForProgID('Excel.Application').__file__))
