XlsxWriter has a method for adding frozen panes to an Excel file:
import xlsxwriter
workbook = xlsxwriter.Workbook('frozen_panes.xlsx')
worksheet1 = workbook.add_worksheet('Panes 1')
worksheet1.freeze_panes(1, 0)
However, I have to use PyExcelerate, and I can't find anything in its docs related to frozen panes. Does PyExcelerate have a similar method that would allow me to add frozen panes?
To whom it may concern:
The solution was to get a worksheet and add a Pane with the option freeze = true.
The class Pane can be seen here:
https://github.com/kz26/PyExcelerate/blob/dev/pyexcelerate/Panes.py
import pyexcelerate
wb = pyexcelerate.Workbook()
ws = wb.new_sheet("sheet name")
# suppose you want to freeze rows 1-2 and columns A-D
rows = 2
columns = 4
ws.panes = pyexcelerate.Panes(columns, rows) # note order versus set_cell_value
wb.save("example_freeze_panes.xlsx")
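When porting from XlsxWriter, the argument order is the easy thing to get wrong. The sketch below only swaps the two counts; the helper name panes_args_from_freeze is hypothetical, and it assumes, as in the example above, that Panes takes the number of frozen columns first and the number of frozen rows second.

```python
def panes_args_from_freeze(row, col):
    """Reorder XlsxWriter-style freeze_panes(row, col) arguments into the
    (columns, rows) order used for pyexcelerate.Panes in the example above.

    Hypothetical helper: it does nothing except validate and swap the counts.
    """
    if row < 0 or col < 0:
        raise ValueError("row and col must be non-negative")
    return (col, row)


# freeze_panes(1, 0) freezes only the first row
print(panes_args_from_freeze(1, 0))
# freezing rows 1-2 and columns A-D corresponds to freeze_panes(2, 4)
print(panes_args_from_freeze(2, 4))
```

It could then be used as ws.panes = pyexcelerate.Panes(*panes_args_from_freeze(1, 0)).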
I am struggling to set the alignment for data in Excel using Python.
My Python function loads data from Excel into a pandas dataframe, calculates some new columns, then adds these columns to the original sheet. This all works well, but I now want to tidy up the result.
I can set italics / bold etc using
sheet['E1:J24'].font.bold = True
sheet['E1:J24'].font.italic = True
But I cannot set the alignment properly. I have tried the following, and several other suggestions I found online, but none of them seems to work.
sheet['E1:J24'].alignment = Alignment(horizontal="center")
Any help would be appreciated.
Update to question:
With further on-line searching I came upon this line of code which successfully adjusts the alignment.
sheet.range(f'$E1:J24').api.HorizontalAlignment = -4152
I think the problem is that I connected to the worksheet using xlwings and then tried to use openpyxl to format it. Jupyter didn't give an error because I had imported 'Alignment' from openpyxl
Note: for alignment, use the following settings:
center = -4108
right = -4152
left = -4131
Not sure where the numbers come from.
Use 'VerticalAlignment' and/or 'HorizontalAlignment'.
Import VAlign and HAlign from the xlwings constants to use the names, or just use the Excel codes directly. I have copied the values into the comment below for reference.
import xlwings as xw
from xlwings.constants import VAlign, HAlign
### Xlwings constants
"""
VAlign Class
xlVAlignBottom = -4107
xlVAlignCenter = -4108
xlVAlignDistributed = -4117
xlVAlignJustify = -4130
xlVAlignTop = -4160
HAlign Class
xlHAlignCenter = -4108
xlHAlignCenterAcrossSelection = 7
xlHAlignDistributed = -4117
xlHAlignFill = 5
xlHAlignGeneral = 1
xlHAlignJustify = -4130
xlHAlignLeft = -4131
xlHAlignRight = -4152
"""
path = "foo.xlsx"
with xw.App() as app:
    wb = xw.Book(path)
    ws = wb.sheets[0]
    # Align text vertically
    ws.range(1, 1).api.VerticalAlignment = -4160
    ws.range(1, 2).api.VerticalAlignment = VAlign.xlVAlignCenter
    ws.range(1, 3).api.VerticalAlignment = VAlign.xlVAlignBottom
    # Align text horizontally
    ws.range(2, 1).api.HorizontalAlignment = HAlign.xlHAlignLeft
    ws.range(2, 2).api.HorizontalAlignment = HAlign.xlHAlignCenter
    ws.range(2, 3).api.HorizontalAlignment = -4152
    wb.save(path)
    wb.close()
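If you prefer readable names without importing the xlwings constants, the codes listed above can be kept in a small lookup table. This is only a sketch: the dictionary keys and the function name alignment_code are made up for illustration, while the numbers are the ones from the list above.

```python
# Values copied from the VAlign / HAlign constants listed above.
H_ALIGN = {
    "general": 1,
    "fill": 5,
    "center_across_selection": 7,
    "center": -4108,
    "distributed": -4117,
    "justify": -4130,
    "left": -4131,
    "right": -4152,
}
V_ALIGN = {
    "bottom": -4107,
    "center": -4108,
    "distributed": -4117,
    "justify": -4130,
    "top": -4160,
}


def alignment_code(name, vertical=False):
    """Look up the Excel code for a friendly alignment name (case-insensitive)."""
    table = V_ALIGN if vertical else H_ALIGN
    key = name.strip().lower()
    if key not in table:
        raise ValueError(f"unknown alignment {name!r}; choose from {sorted(table)}")
    return table[key]


print(alignment_code("right"))
print(alignment_code("Top", vertical=True))
```

With xlwings this would be used as sheet.range('E1:J24').api.HorizontalAlignment = alignment_code("right").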
I'm trying to copy/append a dataframe with multiple column headers (similar to the one below) to an existing Excel sheet, starting from a particular cell, AA2.
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'sub1': [np.nan,'E',np.nan,'S'],
                    'sub2': [np.nan,'D',np.nan,'A']})
df2 = pd.DataFrame({'sub1': [np.nan,'D',np.nan,'S'],
                    'sub2': [np.nan,'C',np.nan,'S']})
df = pd.concat({'Af':df1, 'Dp':df2}, axis=1)
df
I'm thinking of exporting this dataframe to a separate Excel file starting at that cell, and then using openpyxl to copy the data from one file to the other, column by column, but I'm not sure that is the correct approach. Any ideas?
(The Excel sheet that I'm working with has formatting, so I can't load it into a dataframe and use merge.)
I've had success manipulating Excel files in the past with xlsxwriter (you will need to pip install this as a dependency first - although it does not need to be explicitly imported).
import io
import pandas as pd
# Load your file here instead
file_bytes = io.BytesIO()
with pd.ExcelWriter(file_bytes, engine = 'xlsxwriter') as writer:
    # Write a DataFrame to Excel into specific cells
    pd.DataFrame().to_excel(
        writer,
        sheet_name = 'test_sheet',
        startrow = 10, startcol = 5,
        index = False
    )
    # Note: You can repeat any of these operations within the context manager
    # and keep adding stuff...
    # Add some text to cells as well:
    writer.sheets['test_sheet'].write('A1', 'Your text goes here')
file_bytes.seek(0)
# Then write your bytes to a file...
# Overwriting it in your case?
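Since startrow and startcol are zero-based numbers while the question names the target cell as AA2, a small converter can save some counting. The following is a sketch using only the standard library; the function name a1_to_zero_based is hypothetical.

```python
import re


def a1_to_zero_based(ref):
    """Return (startrow, startcol), both zero-based, for an A1-style reference."""
    match = re.fullmatch(r"\$?([A-Za-z]{1,3})\$?([1-9][0-9]*)", ref.strip())
    if match is None:
        raise ValueError(f"not an A1-style cell reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters are a base-26 number without a zero digit (A=1 ... Z=26).
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


startrow, startcol = a1_to_zero_based("AA2")
print(startrow, startcol)  # 1 26
```

The pair can be passed straight to to_excel(..., startrow=startrow, startcol=startcol). For comparison, the startrow = 10, startcol = 5 used above corresponds to cell F11.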
Bonus:
You can add plots too: write them to a BytesIO object, call <your_image_bytes>.seek(0), and then pass the object to the insert_image() function.
... # still inside ExcelWriter context manager
plot_bytes = io.BytesIO()
# Create plot in matplotlib here
plt.savefig(plot_bytes, format='png') # Instead of plt.show()
plot_bytes.seek(0)
writer.sheets['test_sheet'].insert_image(
    5, # Row start
    5, # Col start
    'some_image_name.png',
    options = {'image_data': plot_bytes}
)
The full documentation is really helpful too:
https://xlsxwriter.readthedocs.io/working_with_pandas.html
I've already read "Can Pandas read and modify a single Excel file worksheet (tab) without modifying the rest of the file?", but my question here is specific to the layout described below.
How to open an Excel file with Pandas, do some modifications, and save it back:
(1) without removing that there is a Filter on the first row
(2) without modifying the "displayed column width" of the columns as displayed in Excel
(3) without removing the formulas which might be present on some cells
?
Here is what I tried, it's a short example (in reality I do more processing with Pandas):
import pandas as pd
df = pd.read_excel('in.xlsx')
df['AB'] = df['A'].astype(str) + ' ' + df['B'].astype(str) # create a new column from 2 others
del df['Date'] # delete columns
del df['Time']
df.to_excel('out.xlsx', index=False)
With this code, the Filter on the first row is removed and the displayed column widths are reset to a default, which is not very handy (because we would have to manually set the correct width for every column again).
If you are using a machine that has Excel installed on it, then I highly recommend using the flexible xlwings API. This answers all your questions.
Let's assume I have an Excel file called demo.xlsx in the same directory as my program.
app.py
import xlwings as xw # pip install xlwings
import pandas as pd
wb = xw.Book('demo.xlsx')
This creates a workbook instance and opens the file in Excel, so that you can then drive it with Python commands.
Let's assume we have the following dataframe that we want to use to replace the ID and Name column:
new_name
A John_new
B Adams_new
C Mo_new
D Safia_new
wb.sheets['Sheet1']['A1:B1'].value = df
Finally, you can save and close.
wb.save()
wb.close()
I would recommend xlwings, as it drives Excel through its COM interface (much like built-in VBA), so it is more powerful. I have never tested whether filters or formulas are preserved; the official documentation may cover that.
For my own use, I build everything in Python (filtering, formulas), so I don't even touch the Excel sheet.
Demo:
# [step 0] boilerplate stuff
import numpy as np
import pandas as pd
df = pd.DataFrame(
    index=pd.date_range("2020-01-01 11:11:11", periods=100, freq="min"),
    columns=list('abc'))
df['a'] = np.random.randn(100)
df['b'] = df['a'] * 2 + 10
# [step 1] google xlwings, and pip/conda install xlwings
# [step 2] open a new excel sheet, no need to save
# (basically this code will indiscriminately wipe whatever sheet is active on your desktop)
# [step 3] magic, ...and things you can do
import xlwings as xw
wb = xw.books.active
ws = wb.sheets.active
ws.range('A1').current_region.options(index=1).value = df
# I believe this preserves existing formatting; HOWEVER, it will destroy filtering
if 1:
    # showcasing some formatting you can do
    active_window = wb.app.api.ActiveWindow
    active_window.FreezePanes = False
    active_window.SplitColumn = 2 # const_splitcolumn
    active_window.SplitRow = 1
    active_window.FreezePanes = True
    ws.cells.api.Font.Name = 'consolas'
    ws.api.Rows(1).Orientation = 60
    ws.api.Columns(1).Font.Bold = True
    ws.api.Columns(1).Font.ColorIndex = 26
    ws.api.Rows(1).Font.Bold = True
    ws.api.Rows(1).Borders.Weight = 4
    ws.autofit('c') # 'c' means columns, autofitting columns
    ws.range(1,1).api.AutoFilter(1)
This is a solution for (1), (2), but not (3) from my original question. (If you have an idea for (3), a comment and/or another answer is welcome).
In this solution, we open the input Excel file two times:
once with openpyxl: this is useful to keep the original layout (which seems totally discarded when reading as a pandas dataframe!)
once as a pandas dataframe df to benefit from pandas' great API to manipulate/modify the data itself. Note: data modification is handier with pandas than with openpyxl because we have vectorization, filtering df[df['foo'] == 'bar'], direct access to the columns by name df['foo'], etc.
The following code modifies the input file and keeps the layout: the "Filter" on the first row is not removed and the width of each column is not modified.
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import load_workbook
wb = load_workbook('test.xlsx') # load as openpyxl workbook; useful to keep the original layout
# which is discarded in the following dataframe
df = pd.read_excel('test.xlsx') # load as dataframe (modifications will be easier with pandas API!)
ws = wb.active
df.iloc[1, 1] = 'hello world' # modify a few things
rows = dataframe_to_rows(df, index=False)
for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
        ws.cell(row=r_idx, column=c_idx, value=value)
wb.save('test2.xlsx')
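For point (3), one possible direction is to skip any cell that already holds a formula when writing the values back. When a workbook is loaded with load_workbook without data_only=True, openpyxl returns the value of an ordinary formula cell as a string starting with "=", so such cells can be recognised before they are overwritten. The sketch below uses plain lists as stand-ins for the worksheet rows and the dataframe rows, so it runs without a workbook; the names is_formula and merge_keep_formulas are hypothetical, and it assumes the row and column layout has not changed between reading and writing.

```python
def is_formula(value):
    """True if a cell value looks like an Excel formula string."""
    return isinstance(value, str) and value.startswith("=")


def merge_keep_formulas(existing_rows, new_rows):
    """Overlay new_rows on existing_rows, leaving formula cells untouched.

    Both arguments are lists of rows (lists of cell values) of the same
    shape; this stands in for the ws.cell(...) loop above.
    """
    merged = []
    for old_row, new_row in zip(existing_rows, new_rows):
        merged.append([
            old if is_formula(old) else new
            for old, new in zip(old_row, new_row)
        ])
    return merged


existing = [["A", "B", "Sum"], [1, 2, "=A2+B2"]]
updated = [["A", "B", "Sum"], [10, 2, 3]]
print(merge_keep_formulas(existing, updated))
```

In the loop above, the same idea amounts to reading ws.cell(row=r_idx, column=c_idx).value first and only assigning the new value when it is not a formula.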
I think this is outside the scope of pandas; you have to use openpyxl in order to take care of all the formatting, blocked rows, named ranges and so on. The main difference is that you cannot use vectorized computation as in pandas, so you need to introduce some loops.
I am copying an existing file into a new workbook, and then hiding some unnecessary tabs. One of the tabs that needs to stay visible contains a pivot table which, after the script finishes, appears as plain values (instead of the actual pivot table). I need to "preserve" the pivot table.
Edit: Excel 2013 version
This is my code:
import xlsxwriter
import openpyxl as xl
import shutil
shutil.copy('C:/Prueba/GOOG.xlsm', 'C:/Prueba/GOOG-copia.xlsm')
workbook = xl.load_workbook('C:/Prueba/GOOG-copia.xlsm', keep_vba=True)
keep = ['Cacaca','Sheet1'] # Cacaca contains a pivot table that needs to be preserved
for i in workbook.sheetnames:
    if i in keep:
        pivot = workbook[i]._pivots[0]
        pivot.cache.refreshOnLoad = True
        workbook[i].sheet_state = 'visible'
    else:
        workbook[i].sheet_state = 'hidden'
workbook.save('C:/Prueba/GOOG-copia.xlsm')
workbook.close()
Error:
AttributeError: 'Worksheet' object has no attribute '_pivots'
According to the documentation, to preserve a pivot table you have to set at least one of its boolean options, such as pivot.cache.refreshOnLoad, to True.
Due to _pivots not existing on a sheet unless it contains an actual pivot-table, we can check for a pivot-table and set the cache if one is found:
for i in workbook.sheetnames:
    if i in keep:
        ws = workbook[i]
        if hasattr(ws, "_pivots"):
            pivot = ws._pivots[0]
            pivot.cache.refreshOnLoad = True
        workbook[i].sheet_state = 'visible'
    else:
        workbook[i].sheet_state = 'hidden'
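One caveat with the hasattr check: if a sheet has a _pivots attribute that happens to be an empty list, indexing [0] would raise an IndexError. A slightly more defensive variant is sketched below. The helper name first_pivot is hypothetical, and the objects it is exercised on are simple stand-ins, not real openpyxl worksheets.

```python
from types import SimpleNamespace


def first_pivot(ws):
    """Return the first pivot table of a worksheet-like object, or None.

    Covers both a missing _pivots attribute and an empty _pivots list.
    """
    pivots = getattr(ws, "_pivots", None)
    return pivots[0] if pivots else None


# Stand-in objects, only to exercise the helper without a workbook.
no_attr = SimpleNamespace()
empty = SimpleNamespace(_pivots=[])
pivot = SimpleNamespace(cache=SimpleNamespace(refreshOnLoad=False))
with_pivot = SimpleNamespace(_pivots=[pivot])

for sheet in (no_attr, empty, with_pivot):
    found = first_pivot(sheet)
    if found is not None:
        found.cache.refreshOnLoad = True

print(pivot.cache.refreshOnLoad)  # True
```

In the loop above, pivot = first_pivot(workbook[i]) followed by an "if pivot is not None" check would take the place of the hasattr test.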
I'm using openpyxl 2.4.8 to generate some Excel files. I fill some columns with data and insert a chart that plots that data. If I do this manually in Excel, the chart updates dynamically when I use the data filter to hide data points. However, the chart generated by openpyxl is static and ignores any such filtering.
When comparing the XML of a chart generated by Excel with that of one generated by openpyxl, I see a lot of differences (e.g. all tags are prefixed with 'c:' in Excel), but nothing that looks like a setting that would automatically update the content. I cannot find a setting in Excel that would turn this on or off either.
The code I use to generate an excel file with a chart is here:
import numpy as np
from openpyxl import *
from random import random
from openpyxl.utils.cell import get_column_letter
from openpyxl.chart import (
    LineChart,
    BarChart,
    ScatterChart,
    Reference,
    Series,
)
from openpyxl.drawing.text import CharacterProperties
wb = Workbook()
ws = wb.create_sheet()
ws.title = 'interactiveChart'
num = 9
ws.cell(column=1, row=2, value='X')
ws.cell(column=2, row=2, value='Y')
for i in range(num+1):
    ws.cell(column=1, row=3+i, value=random()*100)
    ws.cell(column=2, row=3+i, value='=A{0}*3+4+ABS(5/(11-A{0}))+ABS(10/(35- A{0}))+ABS(30/(67-A{0}))'.format(3+i))
textSize = 10
modeChart = ScatterChart()
modeChart.title = 'Resonance'
modeChart.title.tx.rich.p[0].r.rPr = CharacterProperties(sz=textSize*100, b=True)
modeChart.style = 48
modeChart.x_axis.title = "X"
modeChart.x_axis.title.tx.rich.p[0].r.rPr = CharacterProperties(sz=textSize*100, b=True)
modeChart.y_axis.title = "Y"
modeChart.y_axis.title.tx.rich.p[0].r.rPr = CharacterProperties(sz=textSize*100, b=True)
modeChart.legend = None
xvalues = Reference(ws, min_col=1, min_row=2, max_row=num+3)
yvalues = Reference(ws, min_col=2, min_row=2, max_row=num+3)
series = Series(yvalues, xvalues, title_from_data=False, title='Resonace')
modeChart.series.append(series)
s1 = modeChart.series[0]
s1.marker.symbol = "diamond"
s1.marker.graphicalProperties.solidFill = "6495ED"
s1.marker.graphicalProperties.line.solidFill = "6495ED"
s1.graphicalProperties.line.noFill = True
modeChart.x_axis.tickLblPos = "low"
modeChart.y_axis.tickLblPos = "low"
modeChart.width = 12
modeChart.height = 7
ws.add_chart(modeChart, "F6")
ws.auto_filter.ref = 'A2:B{}'.format(num+3)
ws = wb.get_sheet_by_name("Sheet")
wb.remove_sheet(ws)
wb.save('aTest.xlsx')
I can't find a reference to this behavior so I'm not certain what I should be looking for either.
This cannot be done directly in openpyxl.
openpyxl is a file format library and not a replacement for Excel. Filtering and sorting needs to be done by an application such as Excel, openpyxl just maintains the relevant metadata.
I found the solution. There is a tag in the xml of Excel charts called: <plotVisOnly/>. This tag was added in openpyxl 2.5 (see the bitbucket issue: Setting property plotVisOnly for charts).
Thanks to a friendly openpyxl contributor for helping me solve this.
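To check whether a given chart actually carries the tag, the chart XML can be inspected with the standard library alone. The following is a sketch: the function name plot_vis_only and the sample XML are made up for illustration, and treating a missing val attribute as true is an assumption about the schema default.

```python
import xml.etree.ElementTree as ET

# Namespace behind the 'c:' prefix in chart XML.
CHART_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"


def plot_vis_only(chart_xml):
    """Return True/False for <c:plotVisOnly>, or None if the tag is absent."""
    root = ET.fromstring(chart_xml)
    tag = root.find(f".//{{{CHART_NS}}}plotVisOnly")
    if tag is None:
        return None
    # Assumption: a missing val attribute means "true".
    return tag.get("val", "1") in ("1", "true")


sample = (
    f'<c:chartSpace xmlns:c="{CHART_NS}">'
    '<c:chart><c:plotArea/><c:plotVisOnly val="1"/></c:chart>'
    '</c:chartSpace>'
)
print(plot_vis_only(sample))  # True
```

Since an .xlsx file is a zip archive, the chart XML can be read with zipfile.ZipFile('aTest.xlsx').read(...) and passed to this function; the member name is typically something like xl/charts/chart1.xml, but check the archive listing rather than relying on that.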
In order to make this work with my anaconda installation I first removed openpyxl:
pip uninstall openpyxl
Then I downloaded the latest openpyxl version (>2.5) from the Python Package Index, unzipped it and ran:
python setup.py install
This installed the >2.5 version of openpyxl. I double checked this in python by running:
import openpyxl
openpyxl.__version__
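If a script depends on this, the version string can also be checked programmatically instead of by eye. This is a minimal sketch with hypothetical helper names; it only looks at the leading numeric parts of the version string.

```python
def version_tuple(version):
    """Turn '2.4.8' into (2, 4, 8); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_plot_vis_only_support(version):
    """True if the version is at least 2.5, where the tag was added."""
    return version_tuple(version) >= (2, 5)


print(has_plot_vis_only_support("2.4.8"))  # False
print(has_plot_vis_only_support("2.5.0"))  # True
```

In practice the argument would be openpyxl.__version__.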
Hope this helps someone!