I'm trying to set conditional formatting in openpyxl to emulate highlighting duplicate values. With this simple code, I should be able to highlight consecutive duplicates (but not the first value in a duplicate sequence).
from pandas import *
data = DataFrame({'a':'a a a b b b c b c a f'.split()})
wb = ExcelWriter('test.xlsx')
data.to_excel(wb)
ws = wb.sheets['Sheet1']
from openpyxl.style import Color, Fill
# Create fill
redFill = Fill()
redFill.start_color.index = 'FFEE1111'
redFill.end_color.index = 'FFEE1111'
redFill.fill_type = Fill.FILL_SOLID
ws.conditional_formatting.addCellIs("B1:B1048576", 'equal', "=R[1]C", True, wb.book, None, None, redFill)
wb.save()
However, when I open it in Excel I get an error related to conditional formatting, and the data is not highlighted as expected. Is openpyxl able to handle R1C1 style referencing?
To highlight sequential duplicate values, the formula you want is
=AND(B1<>"",B2=B1)
applied to a range starting from B2 (i.e., B2:B1048576).
Note - this appears to be broken in the current 1.8.3 branch of openpyxl, but will be fixed shortly in the 1.9 branch.
from openpyxl import Workbook
from openpyxl.style import Color, Fill
wb = Workbook()
ws = wb.active
ws['B1'] = 1
ws['B2'] = 2
ws['B3'] = 3
ws['B4'] = 3
ws['B5'] = 7
ws['B6'] = 4
ws['B7'] = 7
# Create fill
redFill = Fill()
redFill.start_color.index = 'FFEE1111'
redFill.end_color.index = 'FFEE1111'
redFill.fill_type = Fill.FILL_SOLID
dxfId = ws.conditional_formatting.addDxfStyle(wb, None, None, redFill)
ws.conditional_formatting.addCustomRule('B2:B1048576',
{'type': 'expression', 'dxfId': dxfId, 'formula': ['AND(B1<>"",B2=B1)']})
wb.save('test.xlsx')
As a further reference:
If you want to highlight all duplicates:
COUNTIF(B:B,B1)>1
If you want to highlight all duplicates except for the first occurrence:
COUNTIF($B$2:$B2,B2)>1
If you want to highlight sequential duplicates, except for the last one:
COUNTIF(B1:B2,B2)>1
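To see which rows these rules would flag, they can be emulated in plain Python. This is only a rough sketch: the function names are made up for illustration, the comparison is case-sensitive (Excel's COUNTIF is not), and the sample data is the column from the question. The returned numbers are 0-based positions in the list, not Excel row numbers.

```python
from collections import Counter


def all_duplicates(values):
    """Emulates COUNTIF(B:B,B1)>1: every value that occurs more than once."""
    counts = Counter(values)
    return [i for i, v in enumerate(values) if counts[v] > 1]


def duplicates_after_first(values):
    """Emulates COUNTIF($B$2:$B2,B2)>1: every occurrence except the first."""
    seen = set()
    flagged = []
    for i, v in enumerate(values):
        if v in seen:
            flagged.append(i)
        seen.add(v)
    return flagged


def sequential_duplicates(values):
    """Emulates AND(B1<>"",B2=B1): a value equal to the one directly above it."""
    return [i for i in range(1, len(values))
            if values[i - 1] != "" and values[i] == values[i - 1]]


data = 'a a a b b b c b c a f'.split()
print(all_duplicates(data))
print(duplicates_after_first(data))
print(sequential_duplicates(data))
```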
Regarding RC notation: while openpyxl doesn't support Excel's RC notation, conditional formatting will write the formula exactly as provided. Unfortunately, Excel treats R1C1 notation only superficially, as a display flag, and converts all formulas back to their A1 equivalents when saving. For this to work you would therefore need a function that converts R1C1 formulas to their A1 equivalents.
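A rough sketch of what such a conversion could look like for a single reference (not a whole formula). The helper names are hypothetical, and the base cell is assumed to be the top-left cell of the range the rule is applied to:

```python
import re

# Matches one R1C1 reference: R, R5, R[-1] followed by C, C3, C[2]
_R1C1 = re.compile(r"R(\[(-?\d+)\]|(\d+))?C(\[(-?\d+)\]|(\d+))?")


def col_letter(n):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def r1c1_to_a1(ref, base_row, base_col):
    """Convert one R1C1 reference to A1 notation, relative to a base cell."""
    m = _R1C1.fullmatch(ref)
    if m is None:
        raise ValueError("not a single R1C1 reference: %r" % ref)
    row_rel, row_abs, col_rel, col_abs = m.group(2), m.group(3), m.group(5), m.group(6)
    # Absolute parts (R5, C3) become $-anchored; relative parts are offsets from the base
    row = "$" + row_abs if row_abs else str(base_row + int(row_rel or 0))
    col = "$" + col_letter(int(col_abs)) if col_abs else col_letter(base_col + int(col_rel or 0))
    return col + row


# R[1]C seen from B1 is the cell one row down in the same column
print(r1c1_to_a1("R[1]C", base_row=1, base_col=2))
```

A full converter would also have to find the references inside a formula and skip string literals, which this sketch does not attempt.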
Openpyxl doesn't support Excel RC notation.
You could use A1 notation instead which would mean that the equivalent formula is =B2 (I think).
However, you should verify that it actually works in Excel first.
My feeling is that it won't. In general conditional formatting uses absolute cell references $B$2 instead of relative cell references B1.
If it does work then convert your formula to A1 notation and that should work in Openpyxl.
You can't use R1C1 notation directly, and this answer would be a terrible way to format a range of cells, but OpenPyXL does allow you to use row and column numbers.
cell = ws.cell(r, c)
returns the worksheet cell at row r and column c, creating one if needed. Unlike the old xlrd/xlwt modules, row and column indices begin at 1, so you can read r and c directly off of a spreadsheet using the R1C1 reference style. For most purposes, you want to access .value, for example:
ws.cell(2, 3).value = 3
...
v = ws.cell(4, 5).value
It's not nearly as pretty as ws['R2C3'] = 3 or v = ws['R4C5'], but it helps with simple tasks.
Let me describe my issue below:
I've got two Excel worksheets, one containing past data and the other current data. They both have the following structure:
Col_1  Col_2  KEY    Col_3  Etc.
abc    xyz    key_1  foo    ---
def    zyx    key_2  bar    ---
Now, the goal is to check if a value for given key changed between the past and current iteration and if yes, color the given cell's background (in current data worksheet). This check has to be done for all the columns.
As the KEY column is not the very first one, I've decided to use the XLOOKUP function and apply the formatting within a for loop. The full loop looks like this (in this example the KEY column is column C):
dark_blue = writer.book.add_format({'bg_color': '#3A67B8'})
old_sheet = "\'" + "old_" + "sheet_name" + "\'"
for col in range(last_col):
    col_name = xl_col_to_name(col)
    if col_name in unformatted_cols: # Not apply the formatting to certain columns
        continue
    else:
        apply_range = '{0}1:{0}1048576'.format(col_name)
        formula = "XLOOKUP(C1, {1}!C1:C1048576, {1}!{0}1:{0}1048576) <> XLOOKUP(C1, C1:C1048576, {0}1:{0}1048576)".format(col_name, old_sheet)
        active_sheet.conditional_format(apply_range, {'type': 'formula',
                                                      'criteria': formula,
                                                      'format': dark_blue})
Now, my problem is that when I open the output, this conditional formatting doesn't work. If, however, I go to Conditional Formatting -> Manage Rules -> Edit Rule, press OK without editing anything and then apply, it starts working correctly.
Does anyone know how to make this rule work properly without this manual intervention?
All my other conditional formatting rules, though simpler, work exactly as intended.
# This is the formula that I see in Python for the first loop iteration
=XLOOKUP(C1, 'old_sheet_name'!C1:C1048576, 'old_sheet_name'!A1:A1048576) <> XLOOKUP(C1, C1:C1048576, A1:A1048576)
# This formula I see in Excel for the same first column
=XLOOKUP(C1, 'old_sheet_name'!C:C, 'old_sheet_name'!A:A) <> XLOOKUP(C1, C:C, A:A)
The reason that XLOOKUP doesn't work in your formula is that Excel classifies it as a "Future Function", i.e., a function added after the original file format. In order to use it you need to prefix it with _xlfn.
This is explained in the XlsxWriter docs on Formulas added in Excel 2010 and later.
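If the formulas are built in a loop, as in the question, the prefix could be added in one place. The following is a rough sketch only: add_xlfn_prefix and the FUTURE_FUNCTIONS tuple are hypothetical names, the tuple lists just the one function discussed here, and function names inside string literals are not handled.

```python
import re

# Assumption: only XLOOKUP is listed; extend with other future functions as needed
FUTURE_FUNCTIONS = ("XLOOKUP",)


def add_xlfn_prefix(formula, functions=FUTURE_FUNCTIONS):
    """Prefix each listed function call with _xlfn. unless it is already prefixed."""
    names = "|".join(re.escape(name) for name in functions)
    # The lookbehind skips names preceded by a word character or a dot,
    # so an existing _xlfn.XLOOKUP is left untouched
    pattern = re.compile(r"(?<![\w.])(" + names + r")(?=\s*\()")
    return pattern.sub(r"_xlfn.\1", formula)


print(add_xlfn_prefix("XLOOKUP(C1, C:C, A:A) <> XLOOKUP(C1, C:C, B:B)"))
```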
Here is a working example:
import xlsxwriter
workbook = xlsxwriter.Workbook('conditional_format.xlsx')
worksheet1 = workbook.add_worksheet('old_sheet_name')
worksheet2 = workbook.add_worksheet('new_sheet_name')
worksheet1.write(0, 0, 'Foo')
format1 = workbook.add_format({'bg_color': '#C6EFCE',
'font_color': '#006100'})
xlookup_formula = '=_xlfn.XLOOKUP(C1, old_sheet_name!C:C, old_sheet_name!A:A) <> _xlfn.XLOOKUP(C1, C:C, A:A)'
worksheet2.conditional_format('D1:D10',
{'type': 'formula',
'criteria': xlookup_formula,
'format': format1})
workbook.close()
I have solved this problem in Python, but I would like it in VBA so that anybody can just cut/paste it into their workbooks and use it, since most of the people I work alongside are not Python-literate, or even novices in the most liberal sense of the word when it comes to programming.
The columns of interest (see below) are B, C, D. Column B represents levels of separation away from the top order (Column A). If any value in Col B is 1, Col D of that row == A (John). When BX > 1 (X is any row number), DX takes the value of the first row in B that is exactly 1 less than BX. For instance, D4 == B3, but D8 == B7, B9 == B6 and so on.
In python, I solved it like this (I know it's not very elegant):
import pandas as pd
df = pd.read_csv('workbook.csv', header=0)
levels = df['col2'].to_list()
child = df['col3'].to_list()
parents = []
length = len(levels)
index_container = {}  # most recent child seen at each level
for idx in range(length):
    # Record the child at every level (including level 1) so that deeper rows can look it up
    index_container.update({str(levels[idx]): child[idx]})
    if levels[idx] == 1:
        parents.append('John')
    elif levels[idx] > 1:
        parents.append(index_container[str(levels[idx] - 1)])
df['Parents'] = parents
This works great, the problem is I don't know VBA. Reading the docs in the meantime, but not sure how it will go. How do I write this in a VBA script?
We use Office 2019, if that makes a difference.
The following code is rather simple: it creates an array of names. Column B gives the index into the array, column C the name.
The code loops over all rows, fills the name into the array and then looks for the name of the previous index.
Sub xy()
    Const MaxNames = 10
    Dim wb As Workbook, lastRow As Long
    Set wb = Workbooks.Open("workbook.csv")
    Dim names(1 To MaxNames) As String
    With wb.Sheets(1)
        names(1) = .Cells(2, 1)
        lastRow = .Cells(.Rows.Count, 1).End(xlUp).row
        Dim row As Long
        For row = 2 To lastRow
            Dim index As Long
            index = .Cells(row, 2)
            names(index) = .Cells(row, 3)
            If index > 1 Then index = index - 1
            .Cells(row, 4) = names(index)
        Next
    End With
End Sub
Note that I simply open the CSV file as workbook. Depending on the content of your CSV file and the settings in your Excel, this might or might not work. If you have problems (eg a whole line of data is written into one cell), you need to figure out which method of reading the data fits your need - lots of information on SO and elsewhere about that.
If I have the following saved Excel document with 26007.930562 in every cell, where the column names represent the Excel formatting I am using for the given cell:
and I run the following Python code:
from openpyxl.utils import get_column_letter, column_index_from_string
import win32com.client # https://www.youtube.com/watch?v=rh039flfMto
ExcelApp = win32com.client.GetActiveObject('Excel.Application')
wb = ExcelApp.Workbooks('test.xlsx')
ws = wb.Worksheets(1)
excelRange = ws.Range(ws.Cells(1, 1), ws.Cells(2, 4))
listVals = [[*row] for row in excelRange.Value]
print(listVals)
I get the following output:
[['general', 'currency', 'accounting', 'number'], [26007.930562, Decimal('26007.9306'), Decimal('26007.9306'), 26007.930562]]
Notice how there is a loss of precision for the "currency" and "accounting" formats. They get turned into some decimal that rounds off several of the later decimal places.
Is it possible to read in currency and accounting formatted cells while still keeping full precision? If so, how?
This is what I mean when I say "currency formatting":
EDIT:
BigBen's solution works in this example. But if you have dates, Value2 doesn't treat them like dates which causes errors in Python where you intend to treat them like dates. I ended up having to write this instead:
import pywintypes
listVals = [] # https://stackoverflow.com/a/71375004
for rowvalue, rowvalue2 in zip([[*row] for row in excelRange.Value], [[*row] for row in excelRange.Value2]):
    rowlist = []
    for value, value2 in zip(rowvalue, rowvalue2):
        if type(value) == pywintypes.TimeType:
            rowlist.append(value)   # keep dates as date objects from .Value
        else:
            rowlist.append(value2)  # take everything else at full precision from .Value2
    listVals.append(rowlist)
I'm sure there's a faster / more efficient way to do it than that but I don't know what it is.
Use .Value2 instead of .Value:
listVals = [[*row] for row in excelRange.Value2]
Result:
[['general', 'currency', 'accounting', 'number'], [26007.93056, 26007.93056, 26007.93056, 26007.93056]]
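Regarding the date issue raised in the question's edit: Value2 returns date cells as plain serial numbers. If you already know which cells hold dates, one possible alternative to reading the range twice is to convert those serials yourself. The sketch below assumes the workbook uses Excel's default 1900 date system and that the dates are later than February 1900; excel_serial_to_datetime is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Day zero that makes serials after February 1900 line up with real dates
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial date (as returned by .Value2) to a datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)


print(excel_serial_to_datetime(44197))    # whole number: midnight
print(excel_serial_to_datetime(44197.5))  # fractional part is the time of day
```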
sencap.csv has many columns that I don't need, so I keep only some of them before filtering the data for analysis and plotting. In this case the plot will be a pie chart that aggregates energy quantities by energy source. Everything works fine except the condition that restricts sum() to the rows with less than 9.0 MW.
import pandas as pd
import matplotlib.pyplot as plt
aux = pd.read_csv('sencap.csv')
keep_col = ['subsistema','propietario','razon_social', 'estado',
'fecha_servicio_central', 'region_nombre', 'comuna_nombre',
'tipo_final', 'clasificacion', 'tipo_energia', 'potencia_neta_mw',
'ley_ernc', 'medio_generacion', 'distribuidora', 'punto_conexion',
]
c1 = aux['medio_generacion'] == 'PMGD'
c2 = aux['medio_generacion'] == 'PMG'
aux2 = aux[keep_col]
aux3 = aux2[c1 | c2]
for col in ['potencia_neta_mw']:
    aux3[col] = pd.to_numeric(aux3[col].str.replace(',','.'))
c3 = aux3['potencia_neta_mw'] <= 9.0
aux4 = aux3[c3]
df = aux4.groupby(['tipo_final']).sum()
Warning:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
aux3[col] = pd.to_numeric(aux3[col].str.replace(',','.'))
aux3[col] = pd.to_numeric(aux3[col].str.replace(',','.'))
This line is the reason you are getting the warning.
Setting "col" through plain indexing may behave unpredictably, because aux3 itself came from an indexing operation that may have returned either a view or a copy of the original data.
Which one you get depends on the memory layout of the array, about which pandas makes no guarantees.
The pandas documentation advises using .loc instead.
Example:
In: dfmi
Out:
    one          two
  first second first second
0     a      b     c      d
1     e      f     g      h
2     i      j     k      l
3     m      n     o      p
dfmi.loc[:,('one','second')] = value
# becomes
dfmi.loc.__setitem__((slice(None), ('one', 'second')), value)
dfmi['one']['second'] = value
# becomes
dfmi.__getitem__('one').__setitem__('second', value)
In the second case, __getitem__ is unpredictable: it may return a view or a copy of the data, and modifying one is not the same as modifying the other.
A change made to a copy is not reflected in the original data, whereas a change made to a view is.
Note: the warning is there because, even if you get the expected output, the code might still behave unpredictably.
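The difference between the two call chains can be imitated without pandas. The class below is a toy stand-in, not how pandas is implemented: its __getitem__ always returns a copy, which is the case where chained assignment silently does nothing. ToyFrame and set_value are invented names.

```python
class ToyFrame:
    """Toy stand-in for a DataFrame whose __getitem__ always returns a copy."""

    def __init__(self, columns):
        self._columns = {name: dict(cells) for name, cells in columns.items()}

    def __getitem__(self, name):
        # Hands back a copy, so writes to the result never reach self._columns
        return dict(self._columns[name])

    def set_value(self, name, key, value):
        # A single call on the original object, playing the role of .loc[...] = value
        self._columns[name][key] = value


frame = ToyFrame({"one": {"first": "a", "second": "b"}})

frame["one"]["second"] = "X"            # chained: __getitem__, then __setitem__ on the copy
after_chained = frame["one"]["second"]  # the original is unchanged

frame.set_value("one", "second", "X")   # one operation on the original
after_direct = frame["one"]["second"]   # the change is visible

print(after_chained, after_direct)
```

In the question's code, the usual remedy is to take an explicit copy when filtering, e.g. aux3 = aux2[c1 | c2].copy(), so that the later assignment clearly targets an independent DataFrame.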
I would like to use the value calculated in the second "for i in range" loop to calculate a new value in the fourth "for i in range" loop; however, I receive the error: "could not convert string to float: 'E2*37.5'"
How do I refer to the numerical value calculated by sheet['F{}'.format(i)] = 'E{}*37.5'.format(i) instead of the formula/string?
import openpyxl
wb = openpyxl.load_workbook('camdatatest.xlsx', read_only= False, data_only = True)
# Assuming you are working with Sheet1
sheet = wb['Sheet1']
for i in range(2,80):
    sheet['D{}'.format(i)] = '=C{}/3'.format(i)
for i in range(2,80):
    sheet['F{}'.format(i)] = '=E{}*37.5'.format(i)
for i in range(2,80):
    sheet['H{}'.format(i)] = '=D{}*G2*50'.format(i)
for i in range(2,80):
    sheet['I{}'.format(i)].value = float(sheet['F{}'.format(i)].value)/float(sheet['H{}'.format(i)].value)
wb.save('camdatatestoutput.xlsx' , data_only= True)
Unfortunately that's not quite possible, because openpyxl never evaluates formulas.
There are other libraries that may do so. However your problem can be overcome by recognizing that you can use yet another cell reference instead of the calculated value here.
for i in range(2,80):
    sheet['I{}'.format(i)] = '=E{}*37.5/H{}'.format(i,i)
Note that you can't set the value for the Ix cell because you don't actually have a value for the Ex or Hx cells. (well you might have but it's not clear from your question if you do)
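If the inputs in columns C, E and G2 are plain numbers rather than formulas, another option is to do the arithmetic in Python and write the results as values. A minimal sketch under exactly that assumption; compute_row is a hypothetical helper and the sample numbers are invented:

```python
def compute_row(c, e, g2):
    """Mirror the sheet formulas D=C/3, F=E*37.5, H=D*G2*50, I=F/H in Python."""
    d = c / 3
    f = e * 37.5
    h = d * g2 * 50
    return d, f, h, f / h


# Invented inputs standing in for the C, E and G2 cells of one row
d, f, h, i_value = compute_row(c=3, e=2, g2=1)
print(d, f, h, i_value)
```

Each result could then be assigned to the matching cell in place of a formula string.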