Excel formatting in python without loading workbook - python

I am trying to format an Excel document within Python that I am creating in the same script. All of the answers I have found involve loading an existing workbook into Python and formatting from there. In my script, I currently write the entire unformatted Excel sheet, save the file, then immediately reload the document into Python to format it. This is the only workaround I can find that gives me an active sheet.
writer=pd.ExcelWriter(file_name, engine='openpyxl')
writer.save()#saving my file
wb=load_workbook(file_name) #reloading file to format
ws=wb.active
ws.column_dimensions['A'].width=33
ws.column_dimensions['B'].width=16
wb.save(file_name)
This works to change aspects such as column width, but I would like a way to format the sheet without saving and reloading. Is there a way to get around the need for an active sheet when nothing has been written to file_name yet? I want a way to remove lines 2 and 3, however that may be.

The object that pandas creates in ExcelWriter depends on the "engine" you give it. In this case, you're passing "openpyxl", so ExcelWriter is making an openpyxl.Workbook() object. You can create a new Workbook in openpyxl using Workbook(), like so:
https://openpyxl.readthedocs.io/en/default/tutorial.html#create-a-workbook
It is created with 1 active sheet. Basically:
import openpyxl
wb = openpyxl.Workbook()
ws=wb.active
ws.column_dimensions['A'].width=33
ws.column_dimensions['B'].width=16
wb.save(file_name)
...would do the job
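Since the writer already holds an openpyxl workbook, the worksheet can also be formatted before the file is ever closed. The following is a minimal sketch, assuming a DataFrame is written through the writer; the helper name write_formatted, the widths mapping and the sheet name are made up for illustration.

```python
import pandas as pd


def write_formatted(df, path, widths, sheet_name="Sheet1"):
    """Write df to path and set column widths in the same pass.

    widths maps column letters to widths, e.g. {"A": 33, "B": 16}.
    (Hypothetical helper, not part of pandas or openpyxl.)
    """
    # The context manager saves the file on exit, so no explicit save is needed.
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        # writer.sheets maps sheet names to the underlying openpyxl worksheets.
        ws = writer.sheets[sheet_name]
        for column_letter, width in widths.items():
            ws.column_dimensions[column_letter].width = width
```

Called as write_formatted(df, file_name, {"A": 33, "B": 16}), this should replace the save-and-reload steps from the question.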

Your title is misleading: you're working in Pandas and dumping to Excel. Pandas does allow some formatting for this but, because it tries to support different Python libraries (openpyxl, xlsxwriter and xlwt) there are restrictions on this.
For full control openpyxl provides support for Pandas' DataFrame objects: http://openpyxl.readthedocs.io/en/latest/pandas.html
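As a rough sketch of that approach, the DataFrame can be streamed into a fresh openpyxl workbook with dataframe_to_rows and formatted before anything is saved. The function name dataframe_to_workbook and the bold header are illustrative choices, not something the linked page prescribes.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows


def dataframe_to_workbook(df, widths):
    """Build an openpyxl Workbook from df and apply basic formatting.

    widths maps column letters to widths, e.g. {"A": 33}.
    (Hypothetical helper for illustration.)
    """
    wb = Workbook()
    ws = wb.active
    # Header row first, then one worksheet row per DataFrame row.
    for row in dataframe_to_rows(df, index=False, header=True):
        ws.append(row)
    # Make the header row bold.
    for cell in ws[1]:
        cell.font = Font(bold=True)
    for column_letter, width in widths.items():
        ws.column_dimensions[column_letter].width = width
    return wb
```

The returned workbook is saved once, with wb.save(file_name), after all formatting is done.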

Related

How to split an Excel workbook by worksheet while preserving grouping

I am doing some excel reports for work and am given a book exported from SSRS daily. The book is nicely set up, with groupings applied to every sheet for an effect similar to pivot tables.
However the book comes with 32 sheets, and I eventually need to send out each sheet individually as a distinct report. Right now I am splitting them up manually, but I am wondering if there is a way to automate this while preserving the grouping.
I previously tried something like:
import xlrd
import pandas as pd
targetWorkbook = xlrd.open_workbook(r'report.xlsx', on_demand=True)
xlsxDoc = pd.ExcelFile('report.xlsx')
for sheet in targetWorkbook.sheet_names():
    reportDF = pd.read_excel(xlsxDoc, sheet)
    reportDF.to_excel("report - {}.xlsx".format(sheet))
However, since I'm converting each sheet to a pandas DataFrame, the grouping is lost.
There are multiple ways to read/interact with excel docs in python, but I can't find a clear way to pick out a sheet and save it as its own document without losing the grouping.
This is my full answer. I have used the Worksheets().Move() method. The main idea is to use win32com.client library.
This was tested and works on my Windows 10 system with Excel 2013 installed, and Python 3.7. The grouping format was moved intact with the worksheets. I am still working on getting the looping to work. I will revise my answer again when I get the looping to work.
My example has 3 worksheets, each with different grouping (subtotal) formats.
#
# Refined .Move() method, save new file using Active Worksheet property.
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb0 = excel.Workbooks.Open(r'C:\python\so\original.xlsx')
excel.Visible = True
# Move sheet1.
wb0.Worksheets(1).Move()
excel.Application.ActiveWorkbook.SaveAs(r'C:\python\so\sheet1.xlsx')
# Move sheet2, which is now the front sheet.
wb0.Worksheets(1).Move()
excel.Application.ActiveWorkbook.SaveAs(r'C:\python\so\sheet2.xlsx')
# Save single remaining sheet as sheet3.
wb0.SaveAs(r'C:\python\so\sheet3.xlsx')
wb0.Close()
excel.Application.Quit()
You would also need to install pywin32, which is not a standard library item.
https://github.com/mhammond/pywin32
pip install pywin32

Reading cell value without redefining it with Openpyxl

I need to read this .xlsm database, and some of the cell values I need are derived from Excel functions. To accomplish this I used:
from openpyxl import load_workbook
wb = load_workbook('file.xlsm', data_only=True, keep_vba=True)
ws = wb['Plan1']
And then, for every cell I wanted to read:
ws.cell(row=row, column=column).value
This works fine for getting the data out. But the problem comes with saving. When I do:
wb.save('file.xlsm')
It saves the file, but all the formulas inside the sheets are lost.
My dilemma is reading the cells' displayed values on one of the database's sheets without modifying them, writing the code's output to a new sheet and saving it.
Read the file twice: once in read-only and data-only mode to look at the values, and a second time keeping the VBA around (without data_only, so the formulas stay in place). Then save under a different name.
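A minimal sketch of that two-pass idea follows. The function name, the "Output" sheet name and the keep_vba parameter are assumptions for illustration; for the .xlsm file in the question, keep_vba=True would be passed. Note that cached values are only present if the file was last saved by Excel itself; otherwise formula cells read back as None.

```python
from openpyxl import load_workbook


def read_values_then_write(src_path, dst_path, sheet_name,
                           out_sheet="Output", keep_vba=False):
    """Copy the displayed values of one sheet into a new sheet of a copy.

    (Hypothetical helper; names and defaults are illustrative.)
    """
    # Pass 1: cached values only. This workbook object is never saved,
    # so the formulas in the source file are not overwritten.
    values_wb = load_workbook(src_path, data_only=True, read_only=True)
    rows = [list(row) for row in values_wb[sheet_name].iter_rows(values_only=True)]
    values_wb.close()

    # Pass 2: load again without data_only so formulas stay intact.
    full_wb = load_workbook(src_path, keep_vba=keep_vba)
    out_ws = full_wb.create_sheet(out_sheet)
    for row in rows:
        out_ws.append(row)

    # Save under a different name to leave the original untouched.
    full_wb.save(dst_path)
    return rows
```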

Python append xls file using only xlwt/xlrd

I am having problems appending data to an xls file.
Long story short, I am using a program to get some data from something and writing it in an xls file.
If I run the script 10 times, I would like the results to be appended to the same xls file.
My problem is that I am forced to use Python 3.4 and xlutils is not supported, so I cannot use the copy function.
I just have to use xlwt / xlrd. Note, the file cannot be a xlsx.
Is there any way I can do this?
I would look into using openpyxl, which is supported by Python 3.4. An example of appending to a file can be found at https://openpyxl.readthedocs.org/en/default/. Please also see: How to append to an existing excel sheet with XLWT in Python. Here is an example that will do it, assuming you have an Excel sheet called sample.xlsx:
from openpyxl import Workbook, load_workbook
# grab the active worksheet
wb = load_workbook("sample.xlsx")
ws = wb.active
ws.append([3])
# Save the file
wb.save("sample.xlsx")
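To cover the "run the script 10 times" case, the same idea can be wrapped in a small helper that creates the file on the first run and appends on later runs. This is a sketch under the assumption that an .xlsx file is acceptable (openpyxl does not write .xls); the name append_rows is made up for illustration.

```python
import os

from openpyxl import Workbook, load_workbook


def append_rows(path, rows):
    """Append rows to the active sheet of path, creating the file if needed.

    Returns the number of the last filled row after appending.
    (Hypothetical helper for illustration.)
    """
    if os.path.exists(path):
        wb = load_workbook(path)
    else:
        wb = Workbook()
    ws = wb.active
    for row in rows:
        # append() writes below the last row that already holds data.
        ws.append(row)
    wb.save(path)
    return ws.max_row
```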

Delete excel row with Python

I'm doing some testing using python-excel modules. I can't seem to find a way to delete a row in an excel sheet using these modules and the internet hasn't offered up a solution. Is there a way to delete a row using one of the python-excel modules?
In my case, I want to open an excel sheet, read the first row, determine if it contains some valid data, if not, then delete it.
Any suggestions are welcome.
xlwt provides, as the module name suggests, Excel writer functionality (creation rather than modification).
xlrd, on the other hand, provides Excel reader functionality.
If your source excel file is rather simple (no fancy graphs, pivot tables, etc.), you should proceed this way:
with xlrd module read the contents of the targeted excel file, and then with xlwt module create new excel file which contains the necessary rows.
If, however, you are running this on a Windows platform, you might be able to manipulate Excel directly through Microsoft COM objects, see old book reference.
I was having the same issue but found a workaround:
Use a custom filter process (Reader>Filter1>Filter2>...>Writer) to generate a copy of the source excel file but with a blank column inserted at the front. Let's call this file augmented.xls.
Then, read augmented.xls into a xlrd.Workbook object, rb, using xlrd.open_workbook().
Use xlutils.copy.copy() to convert rb into a xlwt.Workbook object, wb.
Set the value of the first column of each of the to-be-deleted rows as "x" (or other values as a marker) in wb.
Save wb back to augmented.xls.
Use another custom filter process to generate a resulting excel file from augmented.xls by omitting those rows with "x" in the first column and shifting all columns one column left (equivalent to deleting the first column of markers).
Information and examples of defining a filter process can be found in http://www.simplistix.co.uk/presentations/python-excel.pdf
Hope this helps in some way.
You can use the library openpyxl. When opening a file it is both for reading and for writing. Then, with a simple function you can achieve that:
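from openpyxl import load_workbook

wb = load_workbook(filename)
ws = wb.active  # active is a property, not a method
first_row = ws[1]
# Your code here using first_row. As a placeholder (an assumption),
# a row whose cells are all empty is treated as not valid.
if all(cell.value is None for cell in first_row):
    ws.delete_rows(1, amount=1)
    wb.save(filename)  # the deletion only persists once the workbook is saved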

Reading .xlsx format in python

I've got to read .xlsx file every 10min in python.
What is the most efficient way to do this?
I've tried using xlrd, but it doesn't read .xlsx. According to the documentation it does, but I can't get it to work: I keep getting "Unsupported format, or corrupt file" exceptions.
What is the best way to read xlsx?
I need to read comments in cells too.
xlrd hasn't released the version yet to read xlsx. Until then, Eric Gazoni built a package called openpyxl - reads xlsx files, and does limited writing of them.
Use openpyxl. Some basic examples:
import openpyxl
# Open Workbook
wb = openpyxl.load_workbook(filename='example.xlsx', data_only=True)
# Get all sheet names (sheetnames replaces the deprecated get_sheet_names())
a_sheet_names = wb.sheetnames
print(a_sheet_names)
# Get a sheet object by name (indexing replaces the deprecated get_sheet_by_name())
o_sheet = wb["Sheet1"]
print(o_sheet)
# Get Cell Values
o_cell = o_sheet['A1']
print(o_cell.value)
o_cell = o_sheet.cell(row=2, column=1)
print(o_cell.value)
o_cell = o_sheet['H1']
print(o_cell.value)
# Sheet Maximum filled Rows and columns
print(o_sheet.max_row)
print(o_sheet.max_column)
There are multiple ways to read XLSX formatted files using Python. Two are illustrated below. Both require openpyxl; if you want to parse directly into pandas, install pandas as well, e.g. pip install pandas openpyxl
Option 1: pandas direct
Primary use case: load just the data for further processing.
Using the read_excel() function in pandas would be your best choice. Note that pandas should fall back to openpyxl automatically, but in the event of format issues it's best to specify the engine directly.
import pandas as pd
df_pd = pd.read_excel("path/file_name.xlsx", engine="openpyxl")
Option 2: openpyxl direct
Primary use case: getting or editing specific Excel document elements such as comments (requested by OP), formatting properties or formulas.
Using load_workbook() followed by comment extraction using the comment attribute for each cell would be achieved by the following.
from openpyxl import load_workbook
wb = load_workbook(filename = "path/file_name.xlsx")
ws = wb.active
ws["A1"].comment # <- loop through row & columns to extract all comments
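The loop hinted at in that last comment could look roughly like this. It is a sketch: the helper name collect_comments and the returned dict shape are illustrative choices rather than part of openpyxl.

```python
from openpyxl import load_workbook


def collect_comments(path, sheet_name=None):
    """Return {cell coordinate: comment text} for one worksheet.

    Uses the active sheet when sheet_name is not given.
    (Hypothetical helper for illustration.)
    """
    wb = load_workbook(filename=path)
    ws = wb[sheet_name] if sheet_name else wb.active
    comments = {}
    for row in ws.iter_rows():
        for cell in row:
            # cell.comment is None when the cell carries no comment.
            if cell.comment is not None:
                comments[cell.coordinate] = cell.comment.text
    return comments
```

Since the file has to be re-read every 10 minutes, the function can simply be called again on each pass; it keeps no state between calls.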
