I'm using openpyxl 2.4.8 to generate some Excel files. I fill some columns with data and insert a chart that plots that data. If I do this manually in Excel, the chart updates dynamically when I use the data-sort methods to remove data points. However, the chart generated by openpyxl is static and ignores any such sorting.
When comparing the XML of a chart generated by Excel with the XML of one generated by openpyxl I see a lot of differences (e.g. all tags are prefixed with 'c:' in Excel), but nothing that looks like a setting which would automatically update the content. I cannot find a setting in Excel that would turn this on or off either.
The code I use to generate an excel file with a chart is here:
from random import random
from openpyxl import Workbook
from openpyxl.chart import (
    Reference,
    ScatterChart,
    Series,
)
from openpyxl.drawing.text import CharacterProperties
wb = Workbook()
ws = wb.create_sheet()
ws.title = 'interactiveChart'
num = 9
ws.cell(column=1, row=2, value='X')
ws.cell(column=2, row=2, value='Y')
for i in range(num+1):
    ws.cell(column=1, row=3+i, value=random()*100)
    ws.cell(column=2, row=3+i, value='=A{0}*3+4+ABS(5/(11-A{0}))+ABS(10/(35-A{0}))+ABS(30/(67-A{0}))'.format(3+i))
textSize = 10
modeChart = ScatterChart()
modeChart.title = 'Resonance'
modeChart.title.tx.rich.p[0].r.rPr = CharacterProperties(sz=textSize*100, b=True)
modeChart.style = 48
modeChart.x_axis.title = "X"
modeChart.x_axis.title.tx.rich.p[0].r.rPr = CharacterProperties(sz=textSize*100, b=True)
modeChart.y_axis.title = "Y"
modeChart.y_axis.title.tx.rich.p[0].r.rPr = CharacterProperties(sz=textSize*100, b=True)
modeChart.legend = None
xvalues = Reference(ws, min_col=1, min_row=2, max_row=num+3)
yvalues = Reference(ws, min_col=2, min_row=2, max_row=num+3)
series = Series(yvalues, xvalues, title_from_data=False, title='Resonance')
modeChart.series.append(series)
s1 = modeChart.series[0]
s1.marker.symbol = "diamond"
s1.marker.graphicalProperties.solidFill = "6495ED"
s1.marker.graphicalProperties.line.solidFill = "6495ED"
s1.graphicalProperties.line.noFill = True
modeChart.x_axis.tickLblPos = "low"
modeChart.y_axis.tickLblPos = "low"
modeChart.width = 12
modeChart.height = 7
ws.add_chart(modeChart, "F6")
ws.auto_filter.ref = 'A2:B{}'.format(num+3)
ws = wb.get_sheet_by_name("Sheet")
wb.remove_sheet(ws)
wb.save('aTest.xlsx')
I can't find a reference to this behavior so I'm not certain what I should be looking for either.
This cannot be done directly in openpyxl.
openpyxl is a file format library and not a replacement for Excel. Filtering and sorting needs to be done by an application such as Excel, openpyxl just maintains the relevant metadata.
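A minimal sketch of what "maintains the relevant metadata" means in practice, assuming openpyxl is installed. Setting `auto_filter.ref` only records the filter range in the file; no row is hidden by that call. The threshold and the hide-by-hand loop are assumptions made for illustration, not something this answer prescribes.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["X", "Y"])
for x in (5, 50, 95):
    ws.append([x, x * 3 + 4])

# Records the filter range in the file; no rows are filtered by this call.
ws.auto_filter.ref = "A1:B4"

# Assumption for illustration: emulate a filter "X < 60" by hiding rows ourselves.
for row in ws.iter_rows(min_row=2, max_col=1):
    cell = row[0]
    if cell.value >= 60:
        ws.row_dimensions[cell.row].hidden = True

hidden_rows = [r for r in range(2, ws.max_row + 1) if ws.row_dimensions[r].hidden]
print(hidden_rows)
```

Whether a chart then skips the hidden rows depends on the chart's own "plot visible cells only" setting, which Excel evaluates when it opens the file.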
I found the solution. There is a tag in the xml of Excel charts called: <plotVisOnly/>. This tag was added in openpyxl 2.5 (see the bitbucket issue: Setting property plotVisOnly for charts).
Thanks to a friendly openpyxl contributor for helping me solve this.
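For reference, a minimal sketch of how the flag can be set from Python. It assumes openpyxl 2.5 or later and assumes the chart attribute is named `visible_cells_only` (the name I believe openpyxl uses for `<plotVisOnly/>`; check the documentation of your installed version). The data and the anchor cell are made up for the example.

```python
from openpyxl import Workbook
from openpyxl.chart import Reference, ScatterChart, Series

wb = Workbook()
ws = wb.active
ws.append(["X", "Y"])
for x in range(1, 6):
    ws.append([x, x * 3 + 4])

chart = ScatterChart()
xvalues = Reference(ws, min_col=1, min_row=2, max_row=6)
yvalues = Reference(ws, min_col=2, min_row=2, max_row=6)
chart.series.append(Series(yvalues, xvalues, title="Resonance"))

# Assumption: openpyxl >= 2.5, where this attribute is written out as <plotVisOnly/>.
chart.visible_cells_only = True

ws.add_chart(chart, "D2")
```

Saving the workbook with `wb.save(...)` afterwards writes the chart part; Excel then decides on opening which rows are visible.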
In order to make this work with my anaconda installation I first removed openpyxl:
pip uninstall openpyxl
Then I downloaded the latest openpyxl version (>2.5) from the Python Package Index (PyPI), unzipped it and ran:
python setup.py install
This installed the >2.5 version of openpyxl. I double checked this in python by running:
import openpyxl
openpyxl.__version__
Hope this helps someone!
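To double check the generated file as well, the chart XML can be inspected with the standard library alone. This is only a sketch: the helper names `plot_visible_only` and `report` are made up here, and the `xl/charts/chart*.xml` location is the usual layout of an .xlsx archive rather than something guaranteed for every file.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by DrawingML chart parts (the "c:" prefix seen in Excel's XML).
CHART_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"


def plot_visible_only(chart_xml):
    """Return the plotVisOnly flag of one chart part, or None if the tag is absent."""
    root = ET.fromstring(chart_xml)
    node = root.find(".//{%s}plotVisOnly" % CHART_NS)
    if node is None:
        return None
    # The schema default for a missing "val" attribute is true.
    return node.get("val", "1") in ("1", "true")


def report(xlsx_path):
    """Map each chart part in the workbook to its plotVisOnly flag."""
    with zipfile.ZipFile(xlsx_path) as archive:
        return {
            name: plot_visible_only(archive.read(name))
            for name in archive.namelist()
            if name.startswith("xl/charts/chart") and name.endswith(".xml")
        }


sample = (
    '<c:chartSpace xmlns:c="%s"><c:chart>'
    '<c:plotVisOnly val="1"/></c:chart></c:chartSpace>' % CHART_NS
).encode()
print(plot_visible_only(sample))
```

Calling `report('aTest.xlsx')` on the file from the question would list every chart part with its flag, which makes it easy to compare an Excel-made chart with an openpyxl-made one.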
I am struggling to set the alignment for data in excel using python
My python function loads data from excel into a pandas dataframe, calculates some new columns, then adds these columns to the original sheet. This all works well, but I now want to tidy up the result.
I can set italics / bold etc using
sheet['E1:J24'].font.bold = True
sheet['E1:J24'].font.italic = True
But I cannot set the alignment properly. I have tried the following, and several other suggestions I found online, but none of them seems to work.
sheet['E1:J24'].alignment = Alignment(horizontal="center")
Any help would be appreciated.
Update to question,
With further on-line searching I came upon this line of code which successfully adjusts the alignment.
sheet.range(f'$E1:J24').api.HorizontalAlignment = -4152
I think the problem is that I connected to the worksheet using xlwings and then tried to use openpyxl to format it. Jupyter didn't give an error because I had imported 'Alignment' from openpyxl
Note, for alignments use setting as follows
center = -4108
right = -4152
Left = -4131
Not sure where the numbers come from
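For comparison, if the sheet really had been opened with openpyxl instead of xlwings, the styling would look roughly like the sketch below. It assumes an openpyxl worksheet (created in memory here so the example is self-contained). In openpyxl a range lookup returns a tuple of rows, so the style is assigned cell by cell rather than on the range.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment, Font

wb = Workbook()
sheet = wb.active

# In openpyxl a range lookup returns a tuple of rows, each a tuple of cells,
# so styles have to be assigned cell by cell.
for row in sheet["E1:J24"]:
    for cell in row:
        cell.alignment = Alignment(horizontal="center")
        cell.font = Font(bold=True, italic=True)
```

This does not apply to an xlwings `Range`; there the `api.HorizontalAlignment` property shown above is the way to go.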
Use 'VerticalAlignment' and/or 'HorizontalAlignment'.
Import VAlign and HAlign from the xlwings constants to use the names, or just use the Excel codes directly. I have copied them into the comments below for your information.
import xlwings as xw
from xlwings.constants import VAlign, HAlign
### Xlwings constants
"""
VAlign Class
xlVAlignBottom = -4107
xlVAlignCenter = -4108
xlVAlignDistributed = -4117
xlVAlignJustify = -4130
xlVAlignTop = -4160
HAlign Class
xlHAlignCenter = -4108
xlHAlignCenterAcrossSelection = 7
xlHAlignDistributed = -4117
xlHAlignFill = 5
xlHAlignGeneral = 1
xlHAlignJustify = -4130
xlHAlignLeft = -4131
xlHAlignRight = -4152
"""
path = "foo.xlsx"
with xw.App() as app:
    wb = xw.Book(path)
    ws = wb.sheets[0]
    # Align text vertically
    ws.range(1, 1).api.VerticalAlignment = -4160
    ws.range(1, 2).api.VerticalAlignment = VAlign.xlVAlignCenter
    ws.range(1, 3).api.VerticalAlignment = VAlign.xlVAlignBottom
    # Align text horizontally
    ws.range(2, 1).api.HorizontalAlignment = HAlign.xlHAlignLeft
    ws.range(2, 2).api.HorizontalAlignment = HAlign.xlHAlignCenter
    ws.range(2, 3).api.HorizontalAlignment = -4152
    wb.save(path)
    wb.close()
My goal is to add conditional formatting to a dataset; more specifically, I want to make every row blue where the total profit is less than 70 thousand. When I apply the code below to the dataset, Excel shows an alert on opening the file and the formatting is not applied:
The dataset is as follows. 'A1 : M101'
I am using the Pycharm IDE, the latest version of openpyxl (3.0.10), other formatting rules work like number_format and I have reduced the code to the below where the issue shows up. Does anyone know why this issue shows up and how I can fix it or work around it using Python?
import openpyxl
from openpyxl.styles import PatternFill, colors
from openpyxl.styles.differential import DifferentialStyle
from openpyxl.formatting.rule import Rule
work_book = openpyxl.load_workbook('datasets/sales_record.xlsx')
sheet = work_book.active
blue_background = PatternFill(bgColor=colors.BLUE)
diff_style = DifferentialStyle(fill=blue_background)
rule = Rule(type='expression', dxf=diff_style)
rule.formula = ["$1M<70000"]
sheet.conditional_formatting.add(sheet.calculate_dimension(), rule)
work_book.save('workbooks/filename.xlsx')
I have also tried the below but I am unsure if the issue is with the color or some of the other formatting:
a_background = PatternFill(bgColor="FFC7CE", fill_type = "solid")
Your corrupted workbook is due to the formula
rule.formula = ["$1M<70000"]
It may just be a typo transposing the 1 and the M; however, you don't want the header row included anyway, so this should be
rule.formula = ["$M2<70000"]
The cell range set would also offset the formatting. It should exclude the Headers as well, so the start of the range should be 'A2'. Therefore the conditional formatting range should cover 'A2:MXX' where XX is the last row.
The changes to your code shown below will then highlight the whole row from Col A to M where the Total_Profit is less than 70,000.
I would also suggest using a different highlight colour than BLUE since it's too dark to see the text.
import openpyxl
from openpyxl.styles import PatternFill, colors
from openpyxl.styles.differential import DifferentialStyle
from openpyxl.formatting.rule import Rule
work_book = openpyxl.load_workbook('sales_record.xlsx')
sheet = work_book.active
blue_background = PatternFill(bgColor=colors.BLUE)
diff_style = DifferentialStyle(fill=blue_background)
rule = Rule(type='expression', dxf=diff_style)
rule.formula = ["$M2<70000"]
# sheet.conditional_formatting.add(sheet.calculate_dimension(), rule)
cf_range = '$A2:$' + sheet.calculate_dimension().split(':')[1]
sheet.conditional_formatting.add(cf_range, rule)
work_book.save('filename.xlsx')
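The range arithmetic used above can be isolated into two small helpers, sketched here in plain Python. The names `data_range` and `row_formula` are made up for this example, and the sketch assumes the dimension string always has the `A1:M101` form with a colon.

```python
def data_range(dimension, header_rows=1):
    """Turn a sheet dimension such as 'A1:M101' into the range below the header."""
    start, end = dimension.split(":")
    start_col = start.rstrip("0123456789")
    start_row = int(start[len(start_col):])
    return "{}{}:{}".format(start_col, start_row + header_rows, end)


def row_formula(column, first_row, threshold):
    """Build an expression anchored on the column and relative on the row."""
    return "${}{}<{}".format(column, first_row, threshold)


print(data_range("A1:M101"))        # A2:M101
print(row_formula("M", 2, 70000))   # $M2<70000
```

The formula keeps the column absolute (`$M`) and the row relative (`2`), so Excel re-evaluates it for each row of the range while always looking at column M. That is why the first row of the range and the row in the formula have to match.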
I've already read Can Pandas read and modify a single Excel file worksheet (tab) without modifying the rest of the file? but here my question is specific to the layout mentioned hereafter.
How to open an Excel file with Pandas, do some modifications, and save it back:
(1) without removing that there is a Filter on the first row
(2) without modifying the "displayed column width" of the columns as displayed in Excel
(3) without removing the formulas which might be present on some cells
?
Here is what I tried, it's a short example (in reality I do more processing with Pandas):
import pandas as pd
df = pd.read_excel('in.xlsx')
df['AB'] = df['A'].astype(str) + ' ' + df['B'].astype(str) # create a new column from 2 others
del df['Date'] # delete columns
del df['Time']
df.to_excel('out.xlsx', index=False)
With this code, the Filter of the first row is removed and the displayed column width are set to a default, which is not very handy (because we would have to manually set the correct width for all columns).
If you are using a machine that has Excel installed on it, then I highly recommend using the flexible xlwings API. This answers all your questions.
Let's assume I have an Excel file called demo.xlsx in the same directory as my program.
app.py
import xlwings as xw # pip install xlwings
import pandas as pd
wb = xw.Book('demo.xlsx')
This will create a workbook instance and open the file in Excel, so that you can drive it with Python commands.
Let's assume we have the following dataframe that we want to use to replace the ID and Name column:
   new_name
A  John_new
B  Adams_new
C  Mo_new
D  Safia_new
wb.sheets['Sheet1']['A1:B1'].value = df
Finally, you can save and close.
wb.save()
wb.close()
I would recommend xlwings, as it talks to Excel's COM interface (like built-in VBA), so it is more powerful. I have never tested whether it preserves filters or formulas; the official docs may describe ways to do that.
For my own use, I just build everything into python, filtering, formulas, so I don't even touch the excel sheet.
Demo:
# [step 0] boilerplate
import numpy as np
import pandas as pd

df = pd.DataFrame(
    index=pd.date_range("2020-01-01 11:11:11", periods=100, freq="min"),
    columns=list('abc'))
df['a'] = np.random.randn(100)  # 1-D array, one value per row
df['b'] = df['a'] * 2 + 10
# [step 1] google xlwings, and pip/conda install xlwings
# [step 2] open a new excel sheet, no need to save
#          (basically this code will indiscriminately wipe whatever sheet is active on your desktop)
# [step 3] magic, ...and things you can do
import xlwings as xw
wb = xw.books.active
ws = wb.sheets.active
ws.range('A1').current_region.options(index=1).value = df
# I believe this preserves existing formatting, HOWEVER, it will destroy filtering
if 1:
    # showcasing some formatting you can do
    active_window = wb.app.api.ActiveWindow
    active_window.FreezePanes = False
    active_window.SplitColumn = 2  # const_splitcolumn
    active_window.SplitRow = 1
    active_window.FreezePanes = True
    ws.cells.api.Font.Name = 'consolas'
    ws.api.Rows(1).Orientation = 60
    ws.api.Columns(1).Font.Bold = True
    ws.api.Columns(1).Font.ColorIndex = 26
    ws.api.Rows(1).Font.Bold = True
    ws.api.Rows(1).Borders.Weight = 4
    ws.autofit('c')  # 'c' means columns, autofitting columns
    ws.range(1,1).api.AutoFilter(1)
This is a solution for (1), (2), but not (3) from my original question. (If you have an idea for (3), a comment and/or another answer is welcome).
In this solution, we open the input Excel file two times:
once with openpyxl: this is useful to keep the original layout (which seems totally discarded when reading as a pandas dataframe!)
once as a pandas dataframe df to benefit from pandas' great API to manipulate/modify the data itself. Note: data modification is handier with pandas than with openpyxl because we have vectorization, filtering df[df['foo'] == 'bar'], direct access to the columns by name df['foo'], etc.
The following code modifies the input file and keeps the layout: the "Filter" on the first row is not removed and the width of each column is not modified.
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import load_workbook
wb = load_workbook('test.xlsx')  # load as openpyxl workbook; useful to keep the original layout,
                                 # which is discarded in the following dataframe
df = pd.read_excel('test.xlsx')  # load as dataframe (modifications will be easier with pandas API!)
ws = wb.active
df.iloc[1, 1] = 'hello world'  # modify a few things
rows = dataframe_to_rows(df, index=False)
for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
        ws.cell(row=r_idx, column=c_idx, value=value)
wb.save('test2.xlsx')
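One possible direction for (3), offered only as an untested idea and not as a confirmed solution: when writing the rows back, skip every cell that currently holds a formula. The sketch assumes the workbook was loaded without `data_only=True` (so formulas are still present) and that columns keep their positions; the helper name `write_rows_keep_formulas` is made up, and a small in-memory sheet stands in for `test.xlsx`.

```python
from openpyxl import Workbook


def write_rows_keep_formulas(ws, rows, first_row=1):
    """Write values into ws but leave cells that currently hold a formula untouched."""
    for r_idx, row in enumerate(rows, first_row):
        for c_idx, value in enumerate(row, 1):
            cell = ws.cell(row=r_idx, column=c_idx)
            # openpyxl marks formula cells with data_type "f" when the workbook
            # is loaded without data_only=True.
            if cell.data_type == "f":
                continue
            cell.value = value


# Stand-in for load_workbook('test.xlsx'): a sheet with one formula column.
wb = Workbook()
ws = wb.active
ws.append(["A", "B", "Sum"])
ws.append([1, 2, "=A2+B2"])

# Stand-in for dataframe_to_rows(df, index=False); the 3 plays the role of
# the cached value pandas would have read from the formula cell.
new_rows = [["A", "B", "Sum"], [10, 2, 3]]
write_rows_keep_formulas(ws, new_rows)
print(ws["A2"].value, ws["C2"].value)
```

If the pandas step deletes or reorders columns, the positions no longer line up with the formula cells, so this only helps when the sheet layout stays the same.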
I think this is outside the scope of pandas; you must use openpyxl in order to take care of all the formatting, blocked rows, named ranges and so on. The main difference is that you cannot use vectorized computation as in pandas, so you need to introduce some loops.
I have an assignment to do for my boring online class and I couldn't come up with an idea for how to do it. I'm told to calculate the ratio of four columns with this formula: ratio = weight/(height*length*width). But I'm bad at using Microsoft Excel and, ironically, we haven't learnt anything related to it. So I remembered that there is a Python library which works with Excel sheets. How could I easily calculate ratio = Weight/(Height*Width*Length) with openpyxl for every single row in this Excel sheet?
Though I've never used openpyxl library I tried to find a solution to your problem. If the spreadsheet you're working on looks like the one below then you should be able to work with this script.
Sample spreadsheet image
from openpyxl import load_workbook
# Modify filename and sheet name where the data is
workbook_filename = 'workbook.xlsx'
sheet_name = 'Sheet1'
wb = load_workbook(workbook_filename)
ws = wb[sheet_name]
# If the data is stored differently in your file, you have to modify
# this loop to suit your needs
for row in ws.iter_rows(min_row = 2, max_row = 3, max_col = 5):
    row[4].value = row[0].value / (row[1].value * row[2].value * row[3].value)
wb.save('result.xlsx')
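The script above will raise an error on a blank cell or a zero dimension. A small guard function, sketched here under the assumption that the columns are in the order Weight, Height, Width, Length (as in the loop above), could be used in place of the inline division. The name `ratio` and the sample rows are made up for illustration.

```python
def ratio(weight, height, width, length):
    """Return weight / (height * width * length), or None if it cannot be computed."""
    values = (weight, height, width, length)
    if any(v is None for v in values):
        return None  # blank cell in the row
    volume = height * width * length
    if volume == 0:
        return None  # avoid ZeroDivisionError on a zero dimension
    return weight / volume


rows = [(12.0, 2.0, 3.0, 1.0), (5.0, 0.0, 1.0, 1.0), (8.0, None, 2.0, 2.0)]
print([ratio(*r) for r in rows])
```

Inside the loop this would read `row[4].value = ratio(*(c.value for c in row[:4]))`, which leaves the result cell empty for rows that cannot be computed.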
XlsxWriter has a method of adding frozen panes to an excel file:
import xlsxwriter
workbook = xlsxwriter.Workbook('frozen_panes.xlsx')
worksheet1 = workbook.add_worksheet('Panes 1')
worksheet1.freeze_panes(1, 0)
However, I have to use PyExcelerate, and I can't find anything in its docs related to frozen panes. Does PyExcelerate have a similar method which would allow me to add frozen panes?
To whom it may concern:
The solution was to get a worksheet and add a Pane with the option freeze = true.
The class Pane can be seen here:
https://github.com/kz26/PyExcelerate/blob/dev/pyexcelerate/Panes.py
import pyexcelerate
wb = pyexcelerate.Workbook()
ws = wb.new_sheet("sheet name")
# suppose you want to freeze rows 1-2 and columns A-D
rows = 2
columns = 4
ws.panes = pyexcelerate.Panes(columns, rows) # note order versus set_cell_value
wb.save("example_freeze_panes.xlsx")