I am quite new to Python and am writing code to replace a VBA process that takes 5 to 6 hours to complete. The code needs to open a password-protected Excel file, extract data from certain sheets and cells into a master sheet, and, if the number in column A already exists there, overwrite that row so there are no duplicates:
Process:
Step 1: Open the password-protected .xls file.
Step 2: Check for a duplicate number in column A; if the same value already exists, overwrite that row. Copy the required cells from each sheet to the "Data" sheet of the master workbook, as shown below.
Step 3: Go back to step 1 until all .xls files are done.
This is part of the VBA to show the process to a degree:
wbThis.Worksheets("Data").Range("A" & Store_Row_no) = NewNumber
wbThis.Worksheets("Data").Range("B" & Store_Row_no) = DateNew
wbThis.Worksheets("Data").Range("C" & Store_Row_no) = wbNew.Worksheets("Sheet1").Range("F2").Value
wbThis.Worksheets("Data").Range("D" & Store_Row_no) = wbNew.Worksheets("Sheet2").Range("H152").Value
wbThis.Worksheets("Data").Range("E" & Store_Row_no) = wbNew.Worksheets("Sheet3").Range("D3").Value
This is my current code, but I can't work out how to open a password-protected Excel file, copy the values to the master sheet, and then overwrite the existing row if the value in column A is a duplicate.
Python code so far:
import win32com.client
import sys
import os
foldername = ('C:\\Users\\')
password = 'ORANGE'
pmaster = (r'C:\Users')
xlApp = win32com.client.Dispatch("Excel.Application")
xlApp.Visible = False
master = xlApp.Workbooks.Open(Filename=pmaster)
wb = xlApp.Workbooks.Open(foldername, False, True, None, password)
sh1 = wb.Sheets('sheet1') #sheet name1
sh2 = wb.Sheets('sheet2') #sheet name2
sh3 = wb.Sheets('sheet3') #sheet name3
out1 = sh1.Range("B2").value
out2 = sh1.Range("D2").value
out3 = sh1.Range("F2").value
out4 = sh2.Range("H152").value
out5 = sh3.Range("D3").value
print(out1,out2,out3,out4,out5)
I just need help looping through the files and copying the values to the new master workbook.
Thank you so much in advance
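The "overwrite if column A already exists" step can be sketched independently of Excel. The following is a minimal sketch, not the asker's method: `build_index` and `upsert_row` are hypothetical helper names, and a plain list of lists stands in for the master "Data" sheet, with column A at index 0. Keeping a dictionary of the column A values avoids rescanning the sheet for every source file.

```python
def build_index(rows):
    """Map each column-A value to its position in rows."""
    return {row[0]: position for position, row in enumerate(rows)}


def upsert_row(rows, index, new_row):
    """Overwrite the row with the same column-A value, or append a new one.

    Returns the 1-based row number that was written.
    """
    key = new_row[0]
    if key in index:
        rows[index[key]] = list(new_row)
    else:
        index[key] = len(rows)
        rows.append(list(new_row))
    return index[key] + 1


# Stand-in for the master "Data" sheet (columns A to E).
master = [[101, "2020-01-01", "a", "b", "c"]]
index = build_index(master)
print(upsert_row(master, index, [102, "2020-01-02", "d", "e", "f"]))  # new key: appended
print(upsert_row(master, index, [101, "2020-02-01", "x", "y", "z"]))  # existing key: overwritten
print(master)
```

The returned row number plays the role of Store_Row_no in the VBA above, so the same value could be used when writing the cells through win32com.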
I am trying to update an Excel sheet using openpyxl. When I read a formula-based cell after the update, I get None. The updates do not appear to be saved even though I have used openpyxl's save command.
import openpyxl

# data_only=False to update the Excel file
def write_cell(data_only):
    wb_obj = openpyxl.load_workbook("mydata.xlsx", data_only=data_only)
    sheet_obj = wb_obj["Sheet1"]
    sheet_obj = wb_obj.active
    sheet_obj.cell(row=1, column=1).value = 8
    wb_obj.save(filename="mydata.xlsx")

# data_only=True to read the Excel file
def read_cell(data_only):
    wb_obj = openpyxl.load_workbook("mydata.xlsx", data_only=data_only)
    sheet = wb_obj["Sheet1"]
    # Formula at column 2 : =A1*5
    val = sheet.cell(row=1, column=2).value
    return val

write_cell(False)
print(read_cell(True))
Actual Output -> None
Expected output -> 40
There are two solutions to this:
If you refer to the documentation, it says that a workbook can be loaded with either the formulae or the values last calculated for them (the data_only flag), not both. If you modify a file that contains formulae, you must pass it through an application such as Excel and save it again, which recalculates and stores the formula values. After that, reading the cell that contains the formula no longer returns None.
Another solution is to open the Excel file and save it again from the script itself, right after saving it with openpyxl:
import os
from win32com.client import Dispatch
import openpyxl

def write_cell(data_only):
    wb_obj = openpyxl.load_workbook("mydata.xlsx", data_only=data_only)
    sheet_obj = wb_obj["Sheet1"]
    sheet_obj = wb_obj.active
    sheet_obj.cell(row=1, column=1).value = 8
    wb_obj.save(filename="mydata.xlsx")
    open_save("mydata.xlsx")

def open_save(filename):
    """Function to open and save the excel file"""
    xlApp = Dispatch("Excel.Application")
    xlApp.Visible = False
    # Excel resolves relative paths against its own working directory,
    # so pass an absolute path
    xlBook = xlApp.Workbooks.Open(os.path.abspath(filename))
    xlBook.Save()
    xlBook.Close()
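The behaviour described in the first solution can be reproduced without Excel. This is a minimal sketch, assuming openpyxl is installed; `cached_value_after_openpyxl_save` is a hypothetical helper name and the file is written to a temporary directory. It writes a formula with openpyxl and reads it back both ways: with data_only=False the formula text comes back, and with data_only=True the result is None because no application has calculated and stored a value yet.

```python
import os
import tempfile

import openpyxl


def cached_value_after_openpyxl_save(path):
    """Write A1 and a formula in B1, then read B1 back in both modes."""
    workbook = openpyxl.Workbook()
    sheet = workbook.active
    sheet["A1"] = 8
    sheet["B1"] = "=A1*5"
    workbook.save(path)

    # data_only=False returns the formula text as stored in the file.
    formula = openpyxl.load_workbook(path, data_only=False).active["B1"].value
    # data_only=True returns the value cached by the last application that
    # calculated the sheet; openpyxl itself never calculates formulae.
    cached = openpyxl.load_workbook(path, data_only=True).active["B1"].value
    return formula, cached


with tempfile.TemporaryDirectory() as folder:
    formula, cached = cached_value_after_openpyxl_save(os.path.join(folder, "demo.xlsx"))
print(formula)
print(cached)
```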
I have a sticky situation in my project. I am trying to update an Excel sheet and export it to PDF in one loop.
At the moment I believe the best tool for this is the openpyxl library.
The issue is that the writing and printing functions open Excel in different ways, using:
book = openpyxl.load_workbook(excel_file) and
wb = excel.Workbooks.Open(excel_file).
The two calls interfere with each other and cause permission issues (at least it looks that way), and they also crash Jupyter :).
Is there an elegant way to do this, or do I really need two loops?
Example error:
PermissionError: [Errno 13] Permission denied: 'C:/Users/admin/test_files/dir#$$.xlsx'
The code looks like this:
import os
import openpyxl
from tkinter import filedialog
from win32com import client

def update_directory():
    excel_file = r'C:/Users/admin/test_files/doo.xlsx'
    excel = client.DispatchEx("Excel.Application")
    excel.Visible = 0
    folder_selected = filedialog.askdirectory()
    os.chdir(folder_selected)
    for root, dirs, files in os.walk(".", topdown=False):
        for name in dirs:
            a_pth = os.getcwd()
            pdf_file = os.path.join(a_pth,name," ")+"Dic_"+"%s.pdf" % name
            book = openpyxl.load_workbook(excel_file)
            sheet = book['Sheet1']
            sheet.cell(row=4, column=6).value = name
            book.save(excel_file)
            wb = excel.Workbooks.Open(excel_file)
            ws = wb.Worksheets[1]
            ws.SaveAs(pdf_file, FileFormat=57)
            wb.Close()  # <- needs to be part of the loop (comment from Amiga500). File-save
            # prompt from Excel appears.
    excel.Quit()  # the Excel Application object has Quit(), not Exit()
Adding the line
wb.Application.DisplayAlerts = False
just before the
wb.Close()
line seems to have worked for me, so the code snippet would look like this:
book = openpyxl.load_workbook(excel_file)
sheet = book['Sheet1']
sheet.cell(row=4, column=6).value = name
book.save(excel_file)
wb = excel.Workbooks.Open(excel_file)
ws = wb.Worksheets[1]
ws.SaveAs(pdf_file, FileFormat=57)
wb.Application.DisplayAlerts = False  # This stops the popup asking for a save
wb.Close()  # <- needs to be part of the loop (comment from Amiga500). File-save
# prompt from Excel appears.
Note that wb.Close() is at the same indentation as the rest of the inner for loop.
I have this simple code, which creates a file "example.xlsx".
On the first run, only cell A1 should receive the output.
This is my initial code:
from openpyxl import Workbook
import requests
workbook = Workbook()
sheet = workbook.active
success= "DONE"
sheet["A1"] = requests.get('http://ip.42.pl/raw').text
workbook.save(filename="example.xlsx")
print(success)
The first output is an Excel file, example.xlsx. I need to update the same file every time the program runs. For example:
The first run fills only A1 with the output from the website http://ip.42.pl/raw, and each following run writes to A2, A3 and so on.
Thank you. I am a beginner, so please bear with me.
I modified the code, and now I think it does what you ask for:
from openpyxl import Workbook, load_workbook
import os
import requests

workbook = Workbook()
filename = "example.xlsx"
success = "DONE"
# First verifies if the file exists
if os.path.exists(filename):
    workbook = load_workbook(filename, read_only=False)
    sheet = workbook.active
    counter = 1
    keep_going = True
    while keep_going:
        cell_id = 'A' + str(counter)
        if sheet[cell_id].value is None:
            sheet[cell_id] = requests.get('http://ip.42.pl/raw').text
            keep_going = False
        else:
            counter += 1
    workbook.save(filename)
    print(success)
else:
    # If file does not exist, you have to create an empty file from excel first
    print('Please create an empty file ' + filename + ' from excel, as it throws error when created from openpyxl')
See the question "xlsx and xlsm files return badzipfile: file is not a zip file" for an explanation of why you have to create an empty file from Excel before openpyxl can work with it (the line in the else: branch).
You could use sheet.max_row in openpyxl to find the last used row, like so:
from openpyxl import load_workbook
import requests
# Load the existing file (assumes example.xlsx already exists);
# a fresh Workbook() would start empty on every run
workbook = load_workbook("example.xlsx")
sheet = workbook.active
max_row = sheet.max_row
success = "DONE"
sheet.cell(row=max_row+1, column=1).value = requests.get('http://ip.42.pl/raw').text
# sheet["A1"] = requests.get('http://ip.42.pl/raw').text
workbook.save(filename="example.xlsx")
print(success)
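One detail to watch with max_row: openpyxl reports max_row == 1 for a sheet that has no cells at all, so max_row + 1 skips A1 on the very first run. The following is a minimal sketch that handles that case and also creates the file when it is missing. `append_to_column_a` is a hypothetical helper name, and a fixed string stands in for the value fetched with requests.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def append_to_column_a(path, value):
    """Write value to the first free row of column A and return that row number."""
    if os.path.exists(path):
        workbook = load_workbook(path)
    else:
        workbook = Workbook()
    sheet = workbook.active
    # An empty sheet still reports max_row == 1, so check A1 explicitly.
    if sheet.max_row == 1 and sheet["A1"].value is None:
        row = 1
    else:
        row = sheet.max_row + 1
    sheet.cell(row=row, column=1).value = value
    workbook.save(path)
    return row


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.xlsx")
    for run, value in enumerate(["first", "second", "third"], start=1):
        print(run, append_to_column_a(path, value))
```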
All I want to do is copy a worksheet from an excel workbook to another excel workbook in Python.
I want to maintain all formatting (coloured cells, tables etc.)
I have a number of excel files and I want to copy the first sheet from all of them into one workbook. I also want to be able to update the main workbook if changes are made to any of the individual workbooks.
It's a code block that will run every few hours and update the master spreadsheet.
I've tried pandas, but it doesn't maintain formatting and tables.
I've tried openpyxl to no avail
I thought xlwings code below would work:
import xlwings as xw
wb = xw.Book('individual_files\\file1.xlsx')
sht = wb.sheets[0]
new_wb = xw.Book('Master Spreadsheet.xlsx')
new_wb.sheets["Sheet1"] = sht
But I just get the error:
----> 4 new_wb.sheets["Sheet1"] = sht
AttributeError: __setitem__
"file1.xlsx" above is an example first excel file.
"Master Spreadsheet.xlsx" is my master spreadsheet with all individual files.
In the end I did this:
from copy import copy
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter

def copyExcelSheet(sheetName):
    read_from = load_workbook(item)  # item: path of the source workbook, set by the caller
    #open(destination, 'wb').write(open(source, 'rb').read())
    read_sheet = read_from.active
    write_to = load_workbook("Master file.xlsx")
    write_sheet = write_to[sheetName]
    for row in read_sheet.rows:
        for cell in row:
            new_cell = write_sheet.cell(row=cell.row, column=cell.column,
                                        value=cell.value)
            write_sheet.column_dimensions[get_column_letter(cell.column)].width = read_sheet.column_dimensions[get_column_letter(cell.column)].width
            if cell.has_style:
                new_cell.font = copy(cell.font)
                new_cell.border = copy(cell.border)
                new_cell.fill = copy(cell.fill)
                new_cell.number_format = copy(cell.number_format)
                new_cell.protection = copy(cell.protection)
                new_cell.alignment = copy(cell.alignment)
    write_sheet.merge_cells('C8:G8')
    write_sheet.merge_cells('K8:P8')
    write_sheet.merge_cells('R8:S8')
    write_sheet.add_table(newTable("table1","C10:G76","TableStyleLight8"))
    write_sheet.add_table(newTable("table2","K10:P59","TableStyleLight9"))
    write_to.save('Master file.xlsx')
    read_from.close()
With this to check if the sheet already exists:
# Checks if the sheet already exists and replaces it if it does.
def checkExists(sheetName):
    book = load_workbook("Master file.xlsx")  # open an Excel file and return a workbook
    if sheetName in book.sheetnames:
        print("Removing sheet", sheetName)
        del book[sheetName]
    else:
        print("No sheet ", sheetName, " found, will create sheet")
    book.create_sheet(sheetName)
    book.save('Master file.xlsx')
And with this to create new tables:
import random
import string
from openpyxl.worksheet.table import Table, TableStyleInfo

def newTable(tableName, ref, styleName):
    # Append a random suffix so every table name in the workbook is unique
    tableName = tableName + ''.join(random.choices(string.ascii_uppercase + string.digits + string.ascii_lowercase, k=15))
    tab = Table(displayName=tableName, ref=ref)
    # Add a default style with striped rows and banded columns
    tab.tableStyleInfo = TableStyleInfo(name=styleName, showFirstColumn=False, showLastColumn=False, showRowStripes=True, showColumnStripes=True)
    return tab
Adapted from this solution, but note that in my (limited) testing (and as observed in the other Q&A), this does not support the After parameter of the Copy method, only Before. If you try to use After, it creates a new workbook instead.
import xlwings as xw
wb = xw.Book('individual_files\\file1.xlsx')
sht = wb.sheets[0]
new_wb = xw.Book('Master Spreadsheet.xlsx')
# copy this sheet into the new_wb *before* Sheet1:
sht.api.Copy(Before=new_wb.sheets['Sheet1'].api)
# now, remove Sheet1 from new_wb
new_wb.sheets['Sheet1'].delete()
This can be done using pywin32 directly. The Before or After parameter needs to be provided (see the api docs), and the parameter needs to be a worksheet <object>, not simply a worksheet Name or index value. So, for example, to add it to the end of an existing workbook:
import win32com.client as win32com_client

def copy_sheet_within_excel_file(excel_filename, sheet_name_or_number_to_copy):
    excel_app = win32com_client.gencache.EnsureDispatch('Excel.Application')
    wb = excel_app.Workbooks.Open(excel_filename)
    # After= must be a worksheet object; here it is the last sheet in the workbook
    wb.Worksheets[sheet_name_or_number_to_copy].Copy(After=wb.Worksheets[wb.Worksheets.Count])
    new_ws = wb.ActiveSheet
    return new_ws
As most of my code runs on end-user machines, I don't like to make assumptions about whether Excel is open or not, so my code checks whether Excel is already running (see GetActiveObject), as in:
from pywintypes import com_error

try:
    excel_app = win32com_client.GetActiveObject('Excel.Application')
except com_error:
    excel_app = win32com_client.gencache.EnsureDispatch('Excel.Application')
And then I also check to see if the workbook is already loaded (see Workbook.FullName). Iterate through the Application.Workbooks testing the FullName to see if the file is already open. If so, grab that wb as your wb handle.
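The FullName check just described can be sketched as a small helper. This is an illustrative sketch, not the answerer's code: `find_open_workbook` is a hypothetical name, the comparison ignores case on the assumption that the paths are Windows paths, and SimpleNamespace objects stand in for COM workbook objects so the example runs without Excel.

```python
from types import SimpleNamespace


def find_open_workbook(workbooks, full_name):
    """Return the first workbook whose FullName matches full_name, else None."""
    wanted = full_name.casefold()
    for workbook in workbooks:
        if workbook.FullName.casefold() == wanted:
            return workbook
    return None


# Stand-ins for the items of excel_app.Workbooks, used only to exercise the helper.
open_books = [
    SimpleNamespace(FullName=r"C:\data\Budget.xlsx"),
    SimpleNamespace(FullName=r"C:\data\Master.xlsx"),
]
match = find_open_workbook(open_books, r"c:\data\master.xlsx")
print(match.FullName if match else "not open")
```

With a real Excel instance, the first argument would be excel_app.Workbooks, and a None result would mean the file still has to be opened with Workbooks.Open.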
You might find this helpful for digging around the available Excel APIs directly from pywin32:
def show_python_interface_modules():
    os.startfile(os.path.dirname(win32com_client.gencache.GetModuleForProgID('Excel.Application').__file__))
I can open a password-protected Excel file with this:
import sys
import win32com.client
xlApp = win32com.client.Dispatch("Excel.Application")
print("Excel library version:", xlApp.Version)
filename, password = sys.argv[1:3]
xlwb = xlApp.Workbooks.Open(filename, Password=password)
# xlwb = xlApp.Workbooks.Open(filename)
xlws = xlwb.Sheets(1) # counts from 1, not from 0
print(xlws.Name)
print(xlws.Cells(1, 1)) # that's A1
I'm not sure, though, how to transfer the information to a pandas dataframe. Do I need to read the cells one by one, or is there a more convenient way to do this?
Simple solution
import io
import pandas as pd
import msoffcrypto
passwd = 'xyz'
decrypted_workbook = io.BytesIO()
with open(i, 'rb') as file:  # i is the path to the encrypted workbook
    office_file = msoffcrypto.OfficeFile(file)
    office_file.load_key(password=passwd)
    office_file.decrypt(decrypted_workbook)
df = pd.read_excel(decrypted_workbook, sheet_name='abc')
pip install --user msoffcrypto-tool
Exporting all sheets of each Excel file in the directories and sub-directories to separate CSV files
import os
from glob import glob

PATH = "Active Cons data"
# Scan all the Excel files in the directories and sub-directories
excel_files = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.xlsx'))]
for path in excel_files:
    print(path)
    decrypted_workbook = io.BytesIO()
    with open(path, 'rb') as file:
        office_file = msoffcrypto.OfficeFile(file)
        office_file.load_key(password=passwd)
        office_file.decrypt(decrypted_workbook)
    # sheet_name=None returns a dict mapping each sheet name to its DataFrame
    sheets = pd.read_excel(decrypted_workbook, sheet_name=None)
    print(list(sheets))  # list of sheet names
    for sheet, df in sheets.items():
        new_file = f"D:\\all_csv\\{sheet}.csv"
        df.to_csv(new_file, index=False)
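Because the loop above names each output file after the sheet only, two workbooks that both contain a sheet with the same name write to the same CSV, and the later one wins. A minimal sketch of one way to keep the names unique is to include the workbook's file name. `csv_path_for` is a hypothetical helper, and the output folder is the one used in the answer.

```python
from pathlib import PureWindowsPath


def csv_path_for(workbook_path, sheet_name, out_dir="D:\\all_csv"):
    """Build '<out_dir>\\<workbook stem>_<sheet>.csv' so names stay unique per workbook."""
    stem = PureWindowsPath(workbook_path).stem
    return str(PureWindowsPath(out_dir) / f"{stem}_{sheet_name}.csv")


print(csv_path_for(r"Active Cons data\2021\jan.xlsx", "abc"))
print(csv_path_for(r"Active Cons data\2021\feb.xlsx", "abc"))
```

Workbooks with the same file name in different sub-directories would still collide; including part of the directory name is one possible extension.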
Assuming the starting cell is given as (StartRow, StartCol) and the ending cell is given as (EndRow, EndCol), I found the following worked for me:
# Get the content in the rectangular selection region
# content is a tuple of tuples
content = xlws.Range(xlws.Cells(StartRow, StartCol), xlws.Cells(EndRow, EndCol)).Value
# Transfer content to pandas dataframe
dataframe = pandas.DataFrame(list(content))
Note: Excel Cell B5 is given as row 5, col 2 in win32com. Also, we need list(...) to convert from tuple of tuples to list of tuples, since there is no pandas.DataFrame constructor for a tuple of tuples.
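If the first row of the selection holds the column names, the tuple of tuples can be split before it is handed to pandas. This is a minimal sketch using plain Python only: `split_header` is a hypothetical helper, and the sample tuple is made up to stand in for what Range(...).Value returns.

```python
def split_header(content):
    """Split a tuple of row tuples into (column names, list of data rows)."""
    rows = [list(row) for row in content]
    if not rows:
        return [], []
    return rows[0], rows[1:]


# Made-up stand-in for the value of a 3-row, 2-column range.
content = (("id", "name"), (1.0, "a"), (2.0, "b"))
columns, data = split_header(content)
print(columns)
print(data)
```

The two results could then be passed on as pandas.DataFrame(data, columns=columns).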
from David Hamann's site (all credits go to him)
https://davidhamann.de/2018/02/21/read-password-protected-excel-files-into-pandas-dataframe/
Use xlwings; opening the file first launches the Excel application, so you can enter the password there.
import pandas as pd
import xlwings as xw
PATH = '/Users/me/Desktop/xlwings_sample.xlsx'
wb = xw.Book(PATH)
sheet = wb.sheets['sample']
df = sheet['A1:C4'].options(pd.DataFrame, index=False, header=True).value
df
Assuming that you can save the encrypted file back to disk using the win32com API (which I realize might defeat the purpose) you could then immediately call the top-level pandas function read_excel. You'll need to install some combination of xlrd (for Excel 2003), xlwt (also for 2003), and openpyxl (for Excel 2007) first though. Here is the documentation for reading in Excel files. Currently pandas does not provide support for using the win32com API to read Excel files. You're welcome to open up a GitHub issue if you'd like.
Based on the suggestion provided by @ikeoddy, this should put the pieces together:
How to open a password protected excel file using python?
# Import modules
import pandas as pd
import win32com.client
import os
import getpass
# Name file variables
file_path = r'your_file_path'
file_name = r'your_file_name.extension'
full_name = os.path.join(file_path, file_name)
# print(full_name)
Getting command-line password input in Python
# You are prompted to provide the password to open the file
xl_app = win32com.client.Dispatch('Excel.Application')
pwd = getpass.getpass('Enter file password: ')
Workbooks.Open Method (Excel)
xl_wb = xl_app.Workbooks.Open(full_name, False, True, None, pwd)
xl_app.Visible = False
xl_sh = xl_wb.Worksheets('your_sheet_name')
# Get last_row: walk down column 1 until the first empty cell
row_num = 0
cell_val = ''
while cell_val is not None:
    row_num += 1
    cell_val = xl_sh.Cells(row_num, 1).Value
    # print(row_num, '|', cell_val, type(cell_val))
last_row = row_num - 1
# print(last_row)
# Get last_column: walk along row 1 until the first empty cell
col_num = 0
cell_val = ''
while cell_val is not None:
    col_num += 1
    cell_val = xl_sh.Cells(1, col_num).Value
    # print(col_num, '|', cell_val, type(cell_val))
last_col = col_num - 1
# print(last_col)
ikeoddy's answer:
content = xl_sh.Range(xl_sh.Cells(1, 1), xl_sh.Cells(last_row, last_col)).Value
# list(content)
df = pd.DataFrame(list(content[1:]), columns=content[0])
df.head()
python win32 COM closing excel workbook
xl_wb.Close(False)
Adding to @Maurice's answer, to get all the cells in the sheet without having to specify the range:
wb = xw.Book(PATH, password='somestring')
sheet = wb.sheets[0] #get first sheet
#sheet.used_range.address returns string of used range
df = sheet[sheet.used_range.address].options(pd.DataFrame, index=False, header=True).value