I am using openpyxl to manipulate a Microsoft Excel Worksheet.
What I want to do is to add a Conditional Formatting Rule that fills the rows with a given colour if the row number is even, leaves the row blank if not.
In Excel this can be done by selecting all the worksheet, creating a new formatting rule with the text =MOD(ROW();2)=0 or =EVEN(ROW()) = ROW().
I tried to implement this behaviour with the following lines of code (considering for example the first 10 rows):
redFill = PatternFill(start_color='EE1111', end_color='EE1111', fill_type='solid')
ws2.conditional_formatting.add('A1:A10', FormulaRule(formula=['MOD(ROW();2) = 0'], stopIfTrue=False, fill=redFill))
My program runs correctly but when I try to open the output Excel file, it tells me that the file contains unreadable content and it asks me if I want to recover the worksheet content. By clicking yes, the worksheet is what I expect but there is no formatting.
What is the correct way to apply such a formatting in openpyxl (possibly to the entire worksheet)?
Unfortunately, the way formulae are handled in conditional formatting is particularly opaque. The best thing to do is to create a file with the relevant conditional format and inspect the relevant file by unzipping it. The rules are stored in the relevant worksheet files and the formats in the styles file.
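As a rough sketch of that inspection step, the snippet below opens an .xlsx file as a zip archive and pulls the conditionalFormatting elements out of each worksheet part. The helper name conditional_format_xml and the small in-memory demo workbook are assumptions made for illustration; in practice you would pass the path of a file saved from Excel.

```python
import io
import re
import zipfile

from openpyxl import Workbook
from openpyxl.formatting.rule import FormulaRule
from openpyxl.styles import PatternFill


def conditional_format_xml(xlsx):
    """Map each worksheet part to its <conditionalFormatting> XML snippets."""
    found = {}
    with zipfile.ZipFile(xlsx) as zf:
        for name in zf.namelist():
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                xml = zf.read(name).decode("utf-8")
                blocks = re.findall(
                    r"<conditionalFormatting.*?</conditionalFormatting>",
                    xml,
                    flags=re.S,
                )
                if blocks:
                    found[name] = blocks
    return found


# Demo workbook (hypothetical); a file saved from Excel works the same way
wb = Workbook()
ws = wb.active
fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid")
ws.conditional_formatting.add("A1:A10", FormulaRule(formula=["ISNUMBER(A1)"], fill=fill))

buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)

rules = conditional_format_xml(buffer)
for part, blocks in rules.items():
    print(part, blocks)
```

The matching formats live in xl/styles.xml inside the same archive and can be read with zf.read in the same way.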
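However, I suspect that the problem may simply be that you are using ";" to separate parameters in the function: you must always use commas for this. Formulae stored in the file always use commas, whatever separator your local copy of Excel displays.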
A sample formula from one of my projects:
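from openpyxl.styles import Font, PatternFill
from openpyxl.styles.differential import DifferentialStyle
from openpyxl.formatting.rule import Rule

green_text = Font(color="006100")
green_fill = PatternFill(bgColor="C6EFCE")
dxf2 = DifferentialStyle(font=green_text, fill=green_fill)
r3 = Rule(type="expression", dxf=dxf2)
r3.formula = ["AND(ISNUMBER(C2), C2>=400)"]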
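Applied to the question, a minimal sketch with a comma-separated formula might look like the following. The helper name add_even_row_fill, the default range and the output file name are assumptions for illustration, not part of the original posts.

```python
import os
import tempfile

from openpyxl import Workbook
from openpyxl.formatting.rule import FormulaRule
from openpyxl.styles import PatternFill


def add_even_row_fill(ws, cell_range="A1:J10", color="EE1111"):
    """Attach a rule that fills every even-numbered row in cell_range."""
    fill = PatternFill(start_color=color, end_color=color, fill_type="solid")
    # Note the comma between the arguments of MOD, not a semicolon
    rule = FormulaRule(formula=["MOD(ROW(),2)=0"], stopIfTrue=False, fill=fill)
    ws.conditional_formatting.add(cell_range, rule)
    return rule


wb = Workbook()
ws = wb.active
rule = add_even_row_fill(ws)

# Written to the temp directory here; any writable path would do
out_path = os.path.join(tempfile.gettempdir(), "even_rows.xlsx")
wb.save(out_path)
```

To cover more of the sheet, pass a wider range as cell_range, for example "A1:Z1000".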
I have an Excel workbook that uses functions like OFFSET, UNIQUE, and FILTER which spill into other cells. I'm using Python to analyze and write some data to the workbook, but after doing so these formulas revert into normal arrays. This means they now take up a fixed number of cells (however many they took up before opening the file in Python) instead of adjusting to fit all of the data. I can revert the change by selecting the formula and hitting enter, but there are so many of these formulas that it's more work to fix them than to just print the data to a text file and paste it into Excel manually. Is there any way to prevent this behavior?
I've been using openpyxl to open and save the workbook, but after encountering this issue also tried xlsxwriter and the dataframe to excel function from pandas. Both of them had the same issue as openpyxl. For context I am on python 3.11 and using the most recent version of these modules. I believe this issue is on the Python side and not the Excel side, so I don't think changing Excel settings will help, but maybe there is something there I missed.
Example:
I've created an empty workbook with two sheets, one called 'main' and one called 'input'. The 'main' sheet will analyze data from the 'input' sheet which will be entered with openpyxl. The data will just be values in the first column.
In cell A1 of the 'main' sheet, enter =OFFSET(input!A1,0,0,COUNTA(input!A:A),1).
This formula will just show a copy of the data. Since there currently isn't any data it gives a #REF! error, so it only takes up one cell.
Now I'll run the following python code to add the numbers 0-9 into the first column of the input sheet:
from openpyxl import load_workbook
wb = load_workbook('workbook.xlsx')
ws = wb['input']
for i in range(10):
    ws.append([i])
wb.save('workbook_2.xlsx')
When opening the new file, cell A1 on the 'main' sheet only has the first value, 0, instead of the range 0-9. When selecting the cell, you can see the formula is now {=OFFSET(input!A1,0,0,COUNTA(input!A:A),1)}. The curly brackets make it an array formula, so it won't spill. By hitting enter in the formula the array is removed and the cell properly spills to the full range.
If I can get this simple example to work, then expanding it to the data I'm using shouldn't be a problem.
I am reading an excel sheet and plucking data from rows containing the given PO.
import pandas as pd
xlsx = pd.ExcelFile('Book2.xlsx')
df = pd.read_excel(xlsx)
PO_arr = ['121121','212121']
for i in PO_arr:
    PO = i
    PO_DATA = df.loc[df['PONUM'] == PO]
    for i in range(1, max(PO_DATA['POLINENUM'].values) +1):
        ...  # loop body not shown in the original post
When I take this Excel sheet straight from its source, my code works fine. But when I cut out only the rows I want and paste them to a new spreadsheet with the exact same formatting and read this new spreadsheet, I have to change PO_DATA to look for an integer instead of a string as such:
PO_DATA = df.loc[df['PONUM'] == int(PO)]
If not, I get an error, and calling PO_DATA returns an empty dataframe.
C:\...\pandas\core\ops\array_ops.py:253: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
res_values = method(rvalues)
I checked the cell formatting in Excel and in both cases, they are formatted as 'General' cells.
What is going on that makes it so when I chop up my spreadsheet, I have to look for an integer and not a string? What do I have to do to make it work for sheets I've created and pasted relevant data into instead of only sheets from the source?
Excel can do some funky formatting when plain copy and paste (Ctrl+C, Ctrl+V) is used.
I am sure you tried these but...
A) Copy with Ctrl+C, then on the new sheet/file press Ctrl+Alt+V, then V, then Enter (Paste Special, values only).
B) Try using the format painter in Excel : Looks like a paintbrush on the home tab - select the properly formatted cells first - double click format painter - move to your new file/sheet - select cells you want the format to conform to.
C) Select your new file/table you pasted into - select purple eraser icon from the top options in excel - clear all formats
Update: I found an old related thread that didn't necessarily answer the question but solved the problem.
You can force pandas to import values as a certain datatype when reading from Excel by using the converters argument of read_excel.
df = pd.read_excel(xlsx, converters={'POLINENUM':int,'PONUM':int})
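The likely cause is that the 'General' cell format says nothing about whether a cell holds text or a number, and pandas picks the column type from the stored values. An alternative to converting to int is to normalise the column to strings once, so the lookup no longer depends on how the cells were stored. A minimal sketch, using a hypothetical in-memory DataFrame in place of the sheet:

```python
import pandas as pd

# Hypothetical stand-in for the pasted sheet: PONUM came back as integers
df = pd.DataFrame(
    {"PONUM": [121121, 212121, 999999], "POLINENUM": [1, 2, 1]}
)

PO_arr = ["121121", "212121"]

# Normalise the key column to strings so it matches the string PO numbers
df["PONUM"] = df["PONUM"].astype(str)

# Select every row whose PO number is in the list
matches = df[df["PONUM"].isin(PO_arr)]
print(matches)
```

The same idea should work at read time by passing str instead of int in the converters mapping shown above, for example converters={'PONUM': str}.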
Hi, I am very new to Python but have been tasked with creating a tool that does the following:
1) opens a csv file
2) splits the data frame up by the values of a single column
3) It will then save those groupings to individual excel workbooks and manipulate the formatting (may add a chart to one of the worksheets based on the newly added Data)
I have found this code, which groups and saves to csv. I can change it to the Excel format, but I'm really struggling to do the formatting and chart bit. Any help would be very appreciated.
gp = df.groupby('CloneID')
for g in gp.groups:
    path = 'CloneID' + str(g) + '.txt'
    gp.get_group(g).to_csv(path)
One easy way to create nicely formatted excel sheets is to pre-format a template, and use openpyxl to fill in the rows as you need.
At a high level, your project should include a template, which will be an xlsx file (excel). If you named your project my_project, for example, the structure of your project should look like this:
my_project
--__init__.py
--templates
----formated_excel.xlsx
--main.py
where templates is a directory, formated_excel.xlsx is the template file, and main.py is your code.
In main.py, the basic logic of your code would work like this:
import os
import openpyxl
TEMPLATE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                        'templates', 'formated_excel.xlsx')
wb = openpyxl.load_workbook(TEMPLATE)
# to use wb[VALUE], your template must have a sheet called VALUE
data_sheet = wb['Data']
# have enumerate start at 2, as in most cases row 1 of a sheet
# is the header
# `data` is assumed to be an iterable of the values to write; it is not defined here
for row, value in enumerate(data, start=2):
    data_sheet[f'A{row}'] = value
wb.save('my_output.xlsx')
This example is a very, very basic explanation of how to use openpyxl.
Note that I've assumed you are using Python 3; if not, you'll have to use the appropriate string formatting when setting the data_sheet row that you are writing to. Openpyxl also has chart support, which you can read up on to help you in formatting your chart.
You haven't provided much detail in exactly what you want to do or the data you are using, so you will have to extend this example to fit your dataset.
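For the split-and-chart part of the question, a rough sketch without a template could look like this. The function name split_to_workbooks, the 'Score' column and the sample data are hypothetical; only 'CloneID' comes from the question, and in practice the DataFrame would come from pd.read_csv.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference


def split_to_workbooks(df, key, value_col, out_dir):
    """Write one workbook per distinct value of `key`; return the paths."""
    paths = []
    for group_value, group_df in df.groupby(key):
        wb = Workbook()
        ws = wb.active
        ws.title = "Data"

        # Header row first, then the rows belonging to this group
        ws.append(list(group_df.columns))
        for row in group_df.itertuples(index=False):
            ws.append(list(row))

        # Bar chart of one numeric column; row 1 supplies the series title
        col = list(group_df.columns).index(value_col) + 1
        chart = BarChart()
        chart.add_data(
            Reference(ws, min_col=col, min_row=1, max_row=ws.max_row),
            titles_from_data=True,
        )
        ws.add_chart(chart, "E2")

        path = os.path.join(out_dir, f"{key}{group_value}.xlsx")
        wb.save(path)
        paths.append(path)
    return paths


# Hypothetical sample data standing in for the csv file
sample = pd.DataFrame({"CloneID": [1, 1, 2], "Score": [10, 20, 30]})
out_dir = tempfile.mkdtemp()
paths = split_to_workbooks(sample, "CloneID", "Score", out_dir)
print(paths)
```

Loading a pre-formatted template with openpyxl.load_workbook instead of creating a blank Workbook would combine this with the template approach described above.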
I found this on a related subject - Excel function to make SQL-like queries on worksheet data?
But I was wondering if there is any way to use an Excel workbook/file like a function in a separate workbook/file? So I have an Excel workbook that has a control page where I can input parameters 1, 2, 3, and based on those parameters the output page will display the corresponding data. I know I can duplicate the output page to show the outputs for all three parameters and link these to my other workbook.
However, is there any way to (in the other excel file) do something like FILENAME_otherfile.function(1).range(A1) or something? to extract the cell A1 with parameter 1 as an input? And in the same file also call FILENAME_otherfile.function(2).range(A1)?
There is a very inelegant way of doing it, but we can keep the interface as pretty as possible.
Open the workbook with Application.Workbooks.Open().
Set it to hidden for aesthetic reasons.
Say you want to manipulate cells on "Sheet1"; then set InputRange and OutputRange (the cells where you'll see the outputs) to the appropriate ranges in the opened workbook.
Change InputRange.Cells(m,n).Value.
Store the result in a variable, e.g.
Dim MyAnswer as Double
MyAnswer = OutputRange.Cells(x,y)
Close the workbook, preferably not saving it (as your programming logic requires) and use the MyAnswer value as your 'function' output.
I am relatively new to coding in Python, and here is my problem:
1- When I update data of an existing .xlsx file (using openpyxl), the outcome is a .xlsx that loses all the previous formatting. I've also tried with .xls (using xlwt and xlrd), but nothing changed.
2- So, I decided to keep this unformatted outcome file and apply all the formatting of a template .xls(x) file.
Is there a straightforward way to preserve the formatting at step 1? If not, how can I implement step 2?
P.S: I've tried to handle styles with xlutils.styles... but I didn't manage to...
Thanks for your help!
You can take the format of a cell with xlrd and set it as the style of a cell you're writing in xlwt. The cell's xf_index attribute is an instance of the XFStyle object and can be dumped directly into the style argument of a write method, like so:
import xlwt,xlrd
readbook = xlrd.open_workbook('book.xls', formatting_info=True)
readsheet = readbook.sheet_by_index(0)
cell = readsheet.cell(0,0)
writebook = xlwt.Workbook()
writesheet = writebook.add_sheet('sheet1')
writesheet.write(0,0,'First Cell', cell.xf_index)
From there you can either iterate to match each cell's format or you can write each format to rows or columns, as shown here: https://github.com/python-excel/xlwt/blob/master/xlwt/examples/row_styles.py
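For .xlsx files, a similar cell-by-cell idea can be sketched with openpyxl by copying the style attributes from a template cell onto the cell you wrote. The helper name copy_cell_style and the two in-memory workbooks are assumptions for illustration; real code would load the template and the unformatted output with load_workbook.

```python
from copy import copy

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill


def copy_cell_style(src, dst):
    """Copy the formatting of one openpyxl cell onto another."""
    if src.has_style:
        dst.font = copy(src.font)
        dst.fill = copy(src.fill)
        dst.border = copy(src.border)
        dst.alignment = copy(src.alignment)
        dst.protection = copy(src.protection)
        dst.number_format = src.number_format


# Hypothetical template sheet carrying the desired formatting
template_wb = Workbook()
template = template_wb.active
template["A1"].font = Font(bold=True, color="006100")
template["A1"].fill = PatternFill(
    start_color="C6EFCE", end_color="C6EFCE", fill_type="solid"
)
template["A1"].number_format = "0.00"

# Hypothetical unformatted output sheet
output_wb = Workbook()
output = output_wb.active
output["A1"] = 3.14159
copy_cell_style(template["A1"], output["A1"])
```

Looping over the used range of the template and calling the helper for each pair of cells would apply the whole layout; column widths and merged cells are not covered by this sketch.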