Set cell format and style using Optimized Writer in openpyxl - python

I have to write a huge Excel file, and the optimized writer in openpyxl is what I need.
The question is: is it possible to set the style and format of cells when using the optimized writer? Style is not so important (I would only like to highlight column headers), but I need the correct number format for some columns containing currency values.
I saw that the ws.cell() method is not available when using the optimized writer, so how can I do it?
Thank you in advance for your help!

As I can't comment, I'll post an update to Dean's answer here:
openpyxl's API (version 2.4.7) has changed slightly, so it should now read:
from openpyxl import Workbook
# In current openpyxl releases WriteOnlyCell is imported from openpyxl.cell;
# older releases exposed it from openpyxl.writer.dump_worksheet
from openpyxl.cell import WriteOnlyCell
from openpyxl.styles import Font
wb = Workbook(write_only=True)
ws = wb.create_sheet()
cell = WriteOnlyCell(ws, value="highlight")
cell.font = Font(name='Courier', size=36)
cols = []
cols.append(cell)
cols.append("some other value")
ws.append(cols)
wb.save("test.xlsx")
Hope it helps
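For the currency columns in the question, the same WriteOnlyCell can carry a number_format as well as a font. A minimal sketch, assuming a recent openpyxl (3.x) where WriteOnlyCell is imported from openpyxl.cell; the function name write_report, the sheet name "report" and the "$" format code are illustrative, not part of the original answer:

```python
from openpyxl import Workbook
from openpyxl.cell import WriteOnlyCell
from openpyxl.styles import Font

# Illustrative Excel number format code: dollar amount with two decimals.
CURRENCY_FORMAT = '"$"#,##0.00'


def write_report(path, rows):
    """Write a bold header row, then (item, amount) rows with amounts formatted as currency."""
    wb = Workbook(write_only=True)
    ws = wb.create_sheet("report")

    # Header cells: WriteOnlyCell lets us attach a font before appending.
    header = []
    for title in ("item", "amount"):
        cell = WriteOnlyCell(ws, value=title)
        cell.font = Font(bold=True)
        header.append(cell)
    ws.append(header)

    # Data rows: plain values and styled cells can be mixed in one append().
    for item, amount in rows:
        amount_cell = WriteOnlyCell(ws, value=amount)
        amount_cell.number_format = CURRENCY_FORMAT
        ws.append([item, amount_cell])

    wb.save(path)
```

Usage: write_report("report.xlsx", [("widget", 12.5), ("gadget", 1999.99)]). A fresh WriteOnlyCell is created for every styled value, which keeps each row independent of the previous one.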

You could also look at the XlsxWriter module which allows writing huge files in optimised mode with formatting.
from xlsxwriter.workbook import Workbook
workbook = Workbook('file.xlsx', {'constant_memory': True})
worksheet = workbook.add_worksheet()
...
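A fuller sketch of the XlsxWriter route, covering both parts of the question (a bold header and a currency column), assuming XlsxWriter is installed; write_report and the format string are illustrative. In constant_memory mode XlsxWriter keeps only the current row in memory, so rows have to be written in order:

```python
import xlsxwriter

# Illustrative Excel number format code: dollar amount with two decimals.
CURRENCY_FORMAT = '"$"#,##0.00'


def write_report(path, rows):
    """Write a bold header, then (item, amount) rows with a currency format, row by row."""
    workbook = xlsxwriter.Workbook(path, {"constant_memory": True})
    worksheet = workbook.add_worksheet("report")

    header_fmt = workbook.add_format({"bold": True})
    money_fmt = workbook.add_format({"num_format": CURRENCY_FORMAT})

    # Row 0: header. Rows must go in increasing order in constant_memory mode.
    worksheet.write_row(0, 0, ["item", "amount"], header_fmt)
    for row_idx, (item, amount) in enumerate(rows, start=1):
        worksheet.write_string(row_idx, 0, item)
        worksheet.write_number(row_idx, 1, amount, money_fmt)

    workbook.close()
```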

Quote from docs:
Those worksheet only have an append() method, it’s not possible to
access independent cells directly (through cell() or range()). They
are write-only.
When you pass optimized_write=True to the Workbook constructor, openpyxl will use the DumpWorksheet class instead of Worksheet. The DumpWorksheet class is very limited in terms of styling and formatting.
But look at the append method: it maps the Python types of the data you pass to the corresponding Excel types. So you will see correct cell formats in the resulting file after running this:
import datetime
from openpyxl import Workbook
# write_only=True is the current name of the old optimized_write=True flag
wb = Workbook(write_only=True)
ws = wb.create_sheet()
for irow in range(5):
    ws.append([True, datetime.datetime.now(), 'test', 1, 1.25, '=D1+E1'])
wb.save('output.xlsx')
As for changing the column headers' style, there is just no way to do it using the optimized writer.
Hope that helps.
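To check the type mapping described in this answer, here is a small sketch that writes one mixed row in write-only mode and reads it back with load_workbook, reporting each cell's data_type ('b' boolean, 'd' date, 's' string, 'n' number, 'f' formula). It assumes a current openpyxl, where write_only=True is the flag to use; write_and_inspect is a hypothetical helper name, and a fixed datetime replaces now() so the result is reproducible:

```python
import datetime

from openpyxl import Workbook, load_workbook


def write_and_inspect(path):
    """Append one mixed-type row in write-only mode, then report each cell's stored type."""
    wb = Workbook(write_only=True)
    ws = wb.create_sheet("data")
    ws.append([True, datetime.datetime(2020, 1, 2, 3, 4, 5), "test", 1, 1.25, "=D1+E1"])
    wb.save(path)

    # Read back normally (not read-only) and collect (data_type, value) per cell.
    row = next(load_workbook(path)["data"].iter_rows(min_row=1, max_row=1))
    return {cell.coordinate: (cell.data_type, cell.value) for cell in row}
```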

You can use the WriteOnlyCell to do this.
from openpyxl import Workbook
from openpyxl.cell import WriteOnlyCell
from openpyxl.styles import Font, PatternFill
wb = Workbook(write_only=True)
ws = wb.create_sheet()
cell = WriteOnlyCell(ws, value="highlight")
# Current openpyxl has no Style class; set font and fill directly on the cell
cell.font = Font(name='Courier', size=36)
cell.fill = PatternFill(fill_type='solid', start_color='8557e5')
cols = []
cols.append(cell)
cols.append("some other value")
ws.append(cols)
wb.save("test.xlsx")
I hope this helps. You can set any style attribute on the cell before appending it to the worksheet row.

Related

Pandas dataframe to specific sheet in a excel file without losing formatting

I have a dataframe like the one shown below
Date,cust,region,Abr,Number,
12/01/2010,Company_Name,Somecity,Chi,36,
12/02/2010,Company_Name,Someothercity,Nyc,156,
df = pd.read_clipboard(sep=',')
I would like to write this dataframe to a specific sheet (called temp_data) in the file output.xlsx
Therefore I tried the following:
import pandas
from openpyxl import load_workbook
book = load_workbook('output.xlsx')
writer = pandas.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
I also tried the following:
path = 'output.xlsx'
with pd.ExcelWriter(path) as writer:
writer.book = openpyxl.load_workbook(path)
final_df.to_excel(writer, sheet_name='temp_data',startrow=10)
writer.save()
But I am not sure whether I am overcomplicating it. I get the error shown below, although I verified in Task Manager that no Excel file/task is running:
BadZipFile: File is not a zip file
Moreover, when I do manage to write the file based on the suggestions below, I lose the formatting of output.xlsx. I already have a neatly formatted file (fonts, colors, etc.) and just need to put the data inside.
Is there any way to write the pandas dataframe to a specific sheet in an existing Excel file WITHOUT LOSING THE FORMATTING OF THE DESTINATION FILE?
You just need to use to_excel on the pandas dataframe.
Try the snippet below:
df1.to_excel("output.xlsx",sheet_name='Sheet_name')
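If the sheet already contains data, try the snippet below (it needs pandas 1.4 or later; newer pandas no longer lets you assign writer.book):
import pandas as pd
# existing data rows on the sheet (the header row is not counted)
reader = pd.read_excel('output.xlsx', sheet_name='Sheet_name')
# mode='a' opens the existing workbook instead of replacing it
with pd.ExcelWriter('output.xlsx', engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    df.to_excel(writer, sheet_name='Sheet_name', index=False, header=False, startrow=len(reader) + 1)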
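With pandas 1.4 or later, ExcelWriter can open an existing workbook in append mode and write over the cells of an existing sheet (if_sheet_exists='overlay') instead of rebuilding the file, which is what keeps the template's fonts and colors. A minimal sketch, assuming pandas >= 1.4 with openpyxl installed; write_into_formatted_sheet is a hypothetical helper. header=False and index=False stop pandas from writing its own styled header and index cells over the template:

```python
import pandas as pd


def write_into_formatted_sheet(path, df, sheet_name, startrow=0, startcol=0):
    """Write df's values into an existing sheet of an existing workbook."""
    # mode="a" loads the existing file; "overlay" writes into the existing sheet
    # rather than replacing it or raising an error.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    startcol=startcol, header=False, index=False)
```

For the question's case this would be write_into_formatted_sheet('output.xlsx', df, 'temp_data', startrow=10). Only the cells pandas writes get new values; the styles already on those cells, and everything else in the file that openpyxl reads, are kept.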
Are you restricted to using pandas or openpyxl?
Because if you're comfortable using other libraries, the easiest way is probably using win32com to puppet excel as if you were a user manually copying and pasting the information over.
import pandas as pd
import io
import win32com.client as win32
import os
csv_text = """Date,cust,region,Abr,Number
12/01/2010,Company_Name,Somecity,Chi,36
12/02/2010,Company_Name,Someothercity,Nyc,156"""
df = pd.read_csv(io.StringIO(csv_text),sep = ',')
temp_path = r"C:\Users\[User]\Desktop\temp.xlsx" #temporary location where to write this dataframe
df.to_excel(temp_path,index = False) #temporarily write this file to excel, change the output path as needed
excel = win32.Dispatch("Excel.Application")
excel.Visible = True #Switch these attributes to False if you'd prefer Excel to be invisible while executing this script
excel.ScreenUpdating = True
temp_wb = excel.Workbooks.Open(temp_path)
temp_ws = temp_wb.Sheets("Sheet1")
output_path = r"C:\Users\[User]\Desktop\output.xlsx" #Path to your output excel file
output_wb = excel.Workbooks.Open(output_path)
output_ws = output_wb.Sheets("Output_sheet")
temp_ws.Range('A1').CurrentRegion.Copy(Destination = output_ws.Range('A1')) # Feel free to modify the Cell where you'd like the data to be copied to
input('Check that output looks like you expected\n') # Added pause here to make sure script doesn't overwrite your file before you've looked at the output
temp_wb.Close()
output_wb.Close(True) #Close output workbook and save changes
excel.Quit() #Close excel
os.remove(temp_path) #Delete temporary excel file
Let me know if this achieves what you were after.
I spent all day on this (and a co-worker of mine spent even longer). Thankfully, it seems to work for my purposes: pasting a dataframe into an Excel sheet without changing any of the Excel source formatting. It requires the pywin32 package, which "drives" Excel as if it were a user, through the same object model that VBA uses.
import pandas as pd
from win32com import client
# Grab your source data any way you please - I'm defining it manually here:
df = pd.DataFrame([
['LOOK','','','','','','','',''],
['','MA!','','','','','','',''],
['','','I pasted','','','','','',''],
['','','','into','','','','',''],
['','','','','Excel','','','',''],
['','','','','','without','','',''],
['','','','','','','breaking','',''],
['','','','','','','','all the',''],
['','','','','','','','','FORMATTING!']
])
# Copy the df to clipboard, so we can later paste it as text.
df.to_clipboard(index=False, header=False)
excel_app = client.gencache.EnsureDispatch("Excel.Application") # Initialize instance
wb = excel_app.Workbooks.Open("Template.xlsx") # Load your (formatted) template workbook
ws = wb.Worksheets(1) # First worksheet becomes active - you could also refer to a sheet by name
ws.Range("A3").Select() # Only select a single cell using Excel nomenclature, otherwise this breaks
ws.PasteSpecial(Format='Unicode Text') # Paste as text
wb.SaveAs("Updated Template.xlsx") # Save our work
excel_app.Quit() # End the Excel instance
In general, when using the win32com approach, it's helpful to record yourself (with a macro) doing what you want to accomplish in Excel, then reading the generated macro code. Often this will give you excellent clues as to what commands you could invoke.
The solution to your problem exists here: How to save a new sheet in an existing excel file, using Pandas?
To add a new sheet from a df:
import pandas as pd
import os
import numpy as np
os.chdir(r'C:\workdir')
path = 'output.xlsx'
### replace with your df ###
x = np.random.randn(100, 2)
df = pd.DataFrame(x)
# mode='a' adds the sheet to the existing workbook instead of overwriting the file
with pd.ExcelWriter(path, engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, sheet_name='x')
You can try xltpl.
Create a template file based on your output.xlsx file.
Render a file with your data.
from xltpl.writerx import BookWriterx
writer = BookWriterx('template.xlsx')
d = {'rows': df.values}
d['tpl_name'] = 'tpl_sheet'
d['sheet_name'] = 'temp_data'
writer.render_sheet(d)
d['tpl_name'] = 'other_sheet'
d['sheet_name'] = 'other'
writer.render_sheet(d)
writer.save('out.xlsx')
See examples.

python - adding rows to an existing worksheet table

I'm working with an .xlsx file that has a tab with a worksheet table where a lot of conditional formatting is used. From time to time I need to append new rows to this table.
My plan is to use python openpyxl (or other package) to append this table.
So far I could identify this table as follows:
from openpyxl import load_workbook
wb=load_workbook(myfile)
ws=wb['mytab']
tab = ws._tables[0]
Can I use something like the .append() method, or change the data of this table, to add more rows to it?
My goal is to keep the formatting.
I've already tried this approach: Manipulate existing excel table using openpyxl, and it doesn't work for me.
I'm using openpyxl 2.6.1
Regards,
Pavel
from openpyxl import load_workbook
filename= r'C:\Users\PC/test.xlsx'
wb = load_workbook(filename)
ws = wb['Hoja1']
ws["A1"] = "AAA"
ws["A2"] = "BBB"
wb.save(filename)
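from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
wb = load_workbook(myfile)
ws = wb['mytab']
tab = ws.tables["Table1"]
# ref needs a column letter, not the integer ws.max_column
tab.ref = f"A1:{get_column_letter(ws.max_column)}{ws.max_row}"
wb.save(myfile)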
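A slightly fuller sketch for the case in the question, assuming openpyxl 3.x (where ws.tables maps table names to Table objects). It writes the new rows directly below the table rather than with ws.append(), which appends after the sheet's last used row wherever that is, then widens both the table's ref and its autoFilter. append_rows_to_table is a hypothetical helper; it only adds rows, not columns, and it does not touch the ranges of conditional formatting rules, so rules tied to a fixed range may need widening separately:

```python
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter, range_boundaries


def append_rows_to_table(path, sheet_name, table_name, rows):
    """Write rows directly below an existing table and grow the table to cover them.

    Returns the new table ref, e.g. "A1:B5".
    """
    wb = load_workbook(path)
    ws = wb[sheet_name]
    table = ws.tables[table_name]  # ws.tables is keyed by table name

    min_col, min_row, max_col, max_row = range_boundaries(table.ref)

    # Write each new row under the table's current last row, starting in its first column.
    for row_offset, values in enumerate(rows, start=1):
        for col_offset, value in enumerate(values):
            ws.cell(row=max_row + row_offset, column=min_col + col_offset, value=value)

    new_ref = (f"{get_column_letter(min_col)}{min_row}:"
               f"{get_column_letter(max_col)}{max_row + len(rows)}")
    table.ref = new_ref
    # Keep the table's filter range in step with the table itself.
    if table.autoFilter is not None:
        table.autoFilter.ref = new_ref

    wb.save(path)
    return new_ref
```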

Reading and updating sheets in an XLSM file using pandas while preserving the VBA code

I have a requirement to read an xlsm file and update some of the sheets in the file. I want to use pandas for this purpose.
I tried answers presented in the following post. I couldn't see the VBA macros when I add the VBA project back.
https://stackoverflow.com/posts/28170939/revisions
Here are the steps I tried,
Extracted the VBA_project.bin out of the original.xlsm file and then
writer = pd.ExcelWriter('original.xlsx', engine='xlsxwriter')
workbook = writer.book
workbook.filename = 'test.xlsm'
workbook.add_vba_project('vbaProject.bin')
writer.save()
With this I don't see the VBA macros attached to "test.xlsm". The result is the same even if I write it to the "original.xlsm" file.
How do I preserve the VBA macros or add them back to the original xlsm file?
Also, is there a way I can open the "xlsm" file itself rather than the "xlsx" counterpart using pd.ExcelWriter?
You can do this easily with pandas:
import pandas as pd
# YOU MUST PASS sheet_name=None TO READ ALL SHEETS IN YOUR XLSM FILE
df = pd.read_excel('YourFile.xlsm', sheet_name=None)
# prints all sheets (a dict mapping sheet name -> DataFrame)
print(df)
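Reading with read_excel does not write anything back, so on its own it does not keep the macros in an updated file. A sketch of one way to update a sheet while keeping the VBA project, assuming openpyxl is installed: load the .xlsm with keep_vba=True, write the DataFrame's rows with openpyxl.utils.dataframe.dataframe_to_rows, and save under the same .xlsm name. write_df_to_xlsm and its parameters are hypothetical, and since openpyxl does not read every object an Excel file can contain, try it on a copy first:

```python
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def write_df_to_xlsm(path, df, sheet_name, start_row=1, start_col=1):
    """Write df (header row first) into an existing sheet of an .xlsm file, keeping its macros."""
    # keep_vba=True keeps the VBA project so it is written back out on save.
    wb = load_workbook(path, keep_vba=True)
    ws = wb[sheet_name]

    # dataframe_to_rows yields the header row, then one list of values per DataFrame row.
    for r_offset, row in enumerate(dataframe_to_rows(df, index=False, header=True)):
        for c_offset, value in enumerate(row):
            ws.cell(row=start_row + r_offset, column=start_col + c_offset, value=value)

    # Save under the same .xlsm name so the file stays macro-enabled.
    wb.save(path)
```

Usage, with illustrative names: write_df_to_xlsm('original.xlsm', df, 'Sheet1').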
Ah, I see. I still can't tell what you are doing, but here are a few general samples of code to get Python to communicate with Excel.
Read contents of a worksheet in Excel:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
df = pd.read_excel('C:\\your_path\\test.xls', sheet_name='Sheet1')
************************************************************************************
Use Python to run Macros in Excel:
import os
import win32com.client
# Launch Excel and open the workbook
xl = win32com.client.Dispatch("Excel.Application")
wb = xl.Workbooks.Open(Filename=r"C:\your_path\excelsheet.xlsm")
# Run Macro
xl.Application.Run("excelsheet.xlsm!modulename.macroname")
# Save Document and Quit.
wb.Save()
xl.Application.Quit()
# Cleanup the com reference.
del xl
Write, from Python, to Excel:
import xlsxwriter
# Create a new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('C:/your_path/ranges_and_offsets.xlsx')
worksheet = workbook.add_worksheet()
# Widen the first column to make the text clearer.
worksheet.set_column('A:A', 20)
# Add a bold format to use to highlight cells.
bold = workbook.add_format({'bold': True})
# Write some simple text.
worksheet.write('A1', 'Hello')
# Text with formatting.
worksheet.write('A2', 'World', bold)
# Write some numbers, with row/column notation.
worksheet.write(2, 0, 123)
worksheet.write(3, 0, 123.456)
workbook.close()
from openpyxl import Workbook
wb = Workbook()
# grab the active worksheet
ws = wb.active
# Data can be assigned directly to cells
ws['A1'] = 42
# Rows can also be appended
ws.append([1, 2, 3])
# Python types will automatically be converted
import datetime
ws['A2'] = datetime.datetime.now()
# Save the file
wb.save("C:\\your_path\\sample.xlsx")

How get a excel sheet with its code name property with "python"

I want to get an Excel sheet with Python. I can do this with the sheet's name, but I want to get it with its Code Name property. The following code uses the sheet's name:
from openpyxl import load_workbook
wb_donnees = load_workbook("Données.xlsm", read_only = True)
name_ws_1 = wb_donnees.sheetnames[0]
ws_1 = wb_donnees[name_ws_1]
But I want to get the sheet with its Code Name property. Is it possible?
Charlie Clark's answer works for me in read mode.
I'm not sure whether the OP needed this, but when writing a new workbook you cannot get the code name this way. Instead, you need to specify it yourself; otherwise codeName is None, and sheets will only be code-named 'Sheet1' etc. at workbook creation.
from openpyxl import load_workbook
# keep_vba=True keeps the macros when the .xlsm is saved again
wb = load_workbook('input.xlsm', keep_vba=True)
wsx = wb.create_sheet('New Worksheet')
wsx.sheet_properties.codeName = 'wsx'
wb.save('output.xlsm')
The following will only work if the file is not opened in read-only mode:
from openpyxl import load_workbook
wb = load_workbook("Données.xlsm")
for n in wb.sheetnames:
    ws = wb[n]
    print(n, ws.sheet_properties.codeName)
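Building on the loop above, a small helper that returns the worksheet whose code name matches, assuming the workbook is loaded in normal (not read-only) mode. sheet_by_codename is a hypothetical name, and "Feuil1" in the usage line is only an example code name:

```python
from openpyxl import load_workbook


def sheet_by_codename(wb, code_name):
    """Return the worksheet whose VBA code name equals code_name."""
    for ws in wb.worksheets:
        if ws.sheet_properties.codeName == code_name:
            return ws
    raise KeyError(f"no worksheet with code name {code_name!r}")
```

Usage: ws_1 = sheet_by_codename(load_workbook("Données.xlsm", keep_vba=True), "Feuil1").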

Read Excel cell value and not the formula computing it -openpyxl

I am using openpyxl to read a cell value (an Excel add-in web service updates this column).
I have used data_only=True, but it does not show the current cell value; instead it shows the value stored the last time Excel saved the sheet.
wbFile = openpyxl.load_workbook(filename = xxxx,data_only=True)
wsFile = wbFile[c_sSheet]
How can I read the cell's actual value?
wb = openpyxl.load_workbook(filename, data_only=True)
The data_only flag helps.
As @alex-martelli says, openpyxl does not evaluate formulae. When you open an Excel file with openpyxl, you have the choice of reading either the formulae or the last calculated value. If, as you indicate, the formula depends on add-ins, then the cached value can never be accurate. As add-ins are outside the file specification, they will never be supported. Instead, you might want to look at something like xlwings, which can interact with the Excel runtime.
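To see the two choices side by side, a minimal sketch: make_sample saves a workbook whose B3 holds a formula, and read_formula_and_cached (both hypothetical helper names) loads the same cell with and without data_only. Because openpyxl never calculates formulas, a file written by openpyxl and not yet opened and saved by Excel has no cached result:

```python
from openpyxl import Workbook, load_workbook


def make_sample(path):
    """Save a workbook whose B3 holds a formula; openpyxl stores no result for it."""
    wb = Workbook()
    ws = wb.active  # default title is "Sheet"
    ws["B1"] = 1
    ws["B2"] = 2
    ws["B3"] = "=B1+B2"
    wb.save(path)


def read_formula_and_cached(path, sheet, address):
    """Return (formula text, value last cached by Excel) for one cell."""
    formula = load_workbook(path)[sheet][address].value
    cached = load_workbook(path, data_only=True)[sheet][address].value
    return formula, cached
```

For the file from make_sample, read_formula_and_cached(path, "Sheet", "B3") gives ('=B1+B2', None); the second item only becomes 3 once Excel has recalculated and saved the file.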
data_only: read values even for formula cells (the value cached when the file was last saved).
keep_vba: only needed for macro-enabled (.xlsm) workbooks, to keep their VBA project.
from openpyxl import load_workbook
file_location = r'C:\Arpan Saini\Monsters\Project_Testing\SecCardGrad\SecCardGrad_Latest_docs\Derived_Test_Cases_Secure_Card_Graduate.xlsm'
wb = load_workbook(file_location, keep_vba=True, data_only=True)
As @Charlie Clark mentioned, you could use xlwings (if you have MS Excel installed). Here is an example.
Say you have an Excel sheet with formulas; for the example I define one with openpyxl:
from openpyxl import Workbook, load_workbook
wb=Workbook()
ws1=wb['Sheet']
ws1['A1']='a'
ws1['A2']='b'
ws1['A3']='c'
ws1['B1']=1
ws1['B2']=2
ws1['B3']='=B1+B2'
wb.save('to_erase.xlsx')
As mentioned, if we load the file again with openpyxl, we will not get the evaluated formula (the value is None, since openpyxl never calculated it):
wb2 = load_workbook(filename='to_erase.xlsx',data_only=True)
wb2['Sheet']['B3'].value
You can use xlwings to get the formula evaluated by Excel:
import xlwings as xw
wbxl=xw.Book('to_erase.xlsx')
wbxl.sheets['Sheet'].range('B3').value
which returns 3, the expected value.
I found it quite useful when working with spreadsheets with very complicated formulas and references between sheets.
Faced the same problem. Needed to read cell values whatever those cells are: scalars, formulae with precomputed values or formulae without them, with fail-tolerance preferred over correctness.
The strategy is pretty straightforward:
1. if a cell doesn't contain a formula, return the cell's value;
2. if it's a formula, try to get its precomputed value;
3. if that isn't possible, try to evaluate it using pycel;
4. if that fails (due to pycel's limited support of formulae or some other error), warn and return None.
I made a class which hides all this machinery and provides simple interface for reading cell values.
It's easy to modify the class so that it will raise an exception on step 4, if correctness is preferred over fail-tolerance.
Hope it will help someone.
from traceback import format_exc
from pathlib import Path
from openpyxl import load_workbook
from pycel.excelcompiler import ExcelCompiler
import logging
class MESSAGES:
    CANT_EVALUATE_CELL = ("Couldn't evaluate cell {address}."
                          " Try to load and save xlsx file.")
class XLSXReader:
    """
    Provides (almost) universal interface to read xlsx file cell values.
    For formulae, tries to get their precomputed values or, if none,
    to evaluate them.
    """
    # Interface.
    def __init__(self, path: Path):
        self.__path = path
        self.__book = load_workbook(self.__path, data_only=False)
        # Created lazily on first use.
        self.__book_with_precomputed_values = None
        self.__formulae_calculator = None
    def get_cell_value(self, address: str, sheet: str = None):
        # If no sheet given, work with active one.
        if sheet is None:
            sheet = self.__book.active.title
        # If cell doesn't contain a formula, return cell value.
        if not self.__cell_contains_formula(address, sheet):
            return self.__get_as_is(address, sheet)
        # If cell contains formula:
        # If there's precomputed value of the cell, return it.
        precomputed_value = self.__get_precomputed(address, sheet)
        if precomputed_value is not None:
            return precomputed_value
        # If not, try to compute its value from the formula and return it.
        # If failed, report an error and return empty value.
        try:
            computed_value = self.__compute(address, sheet)
        except Exception:
            logging.warning(MESSAGES.CANT_EVALUATE_CELL
                            .format(address=address))
            logging.debug(format_exc())
            return None
        return computed_value
    # Private part.
    def __cell_contains_formula(self, address, sheet):
        # openpyxl marks formula cells with data_type 'f'.
        return self.__book[sheet][address].data_type == 'f'
    def __get_as_is(self, address, sheet):
        # Return cell value.
        return self.__book[sheet][address].value
    def __get_precomputed(self, address, sheet):
        # If the data-only workbook is not loaded yet, load it once.
        if self.__book_with_precomputed_values is None:
            self.__book_with_precomputed_values = load_workbook(
                self.__path, data_only=True)
        # Return precomputed value.
        return self.__book_with_precomputed_values[sheet][address].value
    def __compute(self, address, sheet):
        # If the computation engine is not created yet, create it once.
        if self.__formulae_calculator is None:
            # pycel >= 1.0 API: ExcelCompiler(filename=...).evaluate('Sheet!A1')
            self.__formulae_calculator = ExcelCompiler(filename=str(self.__path))
        # Compute cell value.
        return self.__formulae_calculator.evaluate(f"{sheet}!{address}")
I solved this problem the following way:
import xlwings
from openpyxl import load_workbook
data = load_workbook('PATH_TO_YOUR_XLSX_FILE')
data['sheet_name']['A1'].value = 1
data.save('PATH_TO_YOUR_XLSX_FILE')
excel_app = xlwings.App(visible=False)
excel_book = excel_app.books.open('PATH_TO_YOUR_XLSX_FILE')
excel_book.save()
excel_book.close()
excel_app.quit()
data = load_workbook('PATH_TO_YOUR_XLSX_FILE', data_only=True)
I hope this can help you.
Instead of openpyxl, use xlwings.
I found that the data_only option does not work properly if there is a "#REF!" error cell in a worksheet.
Openpyxl returns None for each cell value in my tiny test xlsx file.
For me, after opening Excel and fixing the cell, data_only works perfectly.
I use openpyxl 3.0.3
Rather than use a Python library to do the Excel calculations, I have Excel do them.
Why? It's not pure Python, but it minimizes the amount of Python involved. Instead of using Python to evaluate the Excel formulas, I let Excel handle its own functionality. This avoids any possible bugs in the Python that evaluates the Excel formulas.
Here's an outline of how this approach works:
Call openpyxl with data_only=False to edit and then save the spreadsheet.
Use subprocess.Popen to open the new spreadsheet in Excel, and let Excel evaluate the spreadsheet formulas.
Use pynput.keyboard to save the updated spreadsheet and exit Excel.
Use openpyxl with data_only=True to open the updated spreadsheet and get the values of the formulas.
Here is a test program for Windows that creates a new workbook, puts the formula "=SUM(A1:C3)" in cell E2, puts data into cells A1-C3, and evaluates the formula.
from openpyxl import load_workbook, Workbook
from pynput.keyboard import Key, Controller
import subprocess
import time
import os
excel_prog = r'C:\Program Files\Microsoft Office\root\Office16\EXCEL.EXE'
# Create test Excel workbook, get default worksheet.
wb = Workbook()
ws = wb.active
# Put data and a formula into worksheet.
for row_index in range(1,4):
    for column_index in range(1,4):
        ws.cell(row = row_index, column = column_index).value = row_index + column_index
ws['E1'].value = 'Sum of cells in range A1:C3:'
ws['E2'].value = '=SUM(A1:C3)'
# Try to get value of formula. We'll see the formula instead.
print('E2:', ws['E2'].value)
# Save and close workbook.
wb.save(filename = 'test.xlsx')
wb.close()
# Pause to give workbook time to close.
time.sleep(5)
# Open the workbook in Excel. I specify folder, otherwise Excel will
# open in "Protected View", interfering with using pynput.
subprocess.Popen([excel_prog, os.path.join(os.getcwd(), 'test.xlsx')])
# Pause to give workbook time to open and for formulas to update.
time.sleep(5)
# Save workbook using pynput.
keyboard = Controller()
with keyboard.pressed(Key.ctrl):
    keyboard.press('s')
    keyboard.release('s')
# Pause to give workbook time to save.
time.sleep(5)
# Close workbook.
with keyboard.pressed(Key.alt):
    keyboard.press(Key.f4)
    keyboard.release(Key.f4)
# Pause to give workbook time to fully close.
time.sleep(5)
# Open Excel workbook and worksheet in openpyxl, data-only.
wb = load_workbook(filename = 'test.xlsx', data_only = True)
ws = wb.active
# Get value of the cell containing the formula.
print('E2:', ws['E2'].value)
# Close workbook.
wb.close()
Xlcalculator has the ability to evaluate a cell.
from xlcalculator import ModelCompiler
from xlcalculator import Model
from xlcalculator import Evaluator
filename = r'xxxx.xlsm'
compiler = ModelCompiler()
new_model = compiler.read_and_parse_archive(filename)
evaluator = Evaluator(new_model)
val1 = evaluator.evaluate('First!A2')
print("value 'evaluated' for First!A2:", val1)
The output is:
value 'evaluated' for First!A2: 0.1
