xlsxwriter - grouping via set_row() - python

I am trying to set up row groups in xlsxwriter, but I can't get the + symbol to appear at the top of a group. Because grouping is set row by row, I wrote a function that takes the workbook, the worksheet index, a start and end row, and the outline level. Whatever I do, the grouping symbol in Excel never appears in the first row of the group. Oddly, if I use something like start_row + 5 for the collapsed row, I get a second + at that row, but there is still another one at the end. Does anyone know if this is possible?
The examples only show the grouping symbol on the last row of a group.
def set_group(out_wb, ws_index, start_row, end_row, level):
    # added as I used an offset
    start_row = start_row
    end_row = end_row
    out_wb.worksheets()[ws_index].set_row(start_row, None, None, {'level': level, 'collapsed': True})
    for i in range(start_row + 1, end_row):
        out_wb.worksheets()[ws_index].set_row(i, None, None, {'level': level, 'hidden': True})
    return out_wb

You can use the outline_settings() worksheet method and set symbols_below to False:
worksheet.outline_settings(symbols_below=False)

I'm using xlsxwriter version 0.5.5, and in that version the call is
worksheet.outline_settings(outline_below=False)
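As a rough sketch of how this fits with the row-wise grouping from the question: once the symbols sit above the detail rows, the collapsed flag presumably belongs on the summary row on top of the group rather than on a row inside it. The helper outline_row_options below is a hypothetical name, not part of XlsxWriter. It only builds the options for each row, and the commented lines show how they might be passed to set_row().

```python
def outline_row_options(start_row, end_row, level=1, collapsed=True):
    """Build {row_index: options} for a group whose summary row is on top.

    start_row is the summary row that should carry the +/- symbol;
    rows start_row + 1 .. end_row (inclusive) are the detail rows.
    """
    if end_row <= start_row:
        raise ValueError("end_row must be greater than start_row")
    plan = {}
    # The summary row sits one outline level above its detail rows.
    summary = {'level': level - 1} if level > 1 else {}
    if collapsed:
        summary['collapsed'] = True
    plan[start_row] = summary
    for row in range(start_row + 1, end_row + 1):
        options = {'level': level}
        if collapsed:
            options['hidden'] = True
        plan[row] = options
    return plan


plan = outline_row_options(2, 5)

# Applying the plan (assumes `worksheet` is an XlsxWriter worksheet):
# worksheet.outline_settings(symbols_below=False)
# for row, options in plan.items():
#     worksheet.set_row(row, None, None, options)
```

Keeping the plan separate from the set_row() calls makes it easy to print and check which row ends up with the collapsed flag before writing the file.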


xlsxwriter conditional format works only after manually applying it

Let me describe my issue below:
I've got two Excel worksheets, one containing past data and the other current data. They both have the following structure:
Col_1   Col_2   KEY     Col_3   Etc.
abc     xyz     key_1   foo     ---
def     zyx     key_2   bar     ---
Now, the goal is to check whether the value for a given key changed between the past and current iteration and, if so, to color the given cell's background (in the current data worksheet). This check has to be done for all the columns.
As the KEY column is not the very first one, I've decided to use the XLOOKUP function and apply the formatting within a for loop. The full loop looks like this (in this example the KEY column is column C):
dark_blue = writer.book.add_format({'bg_color': '#3A67B8'})
old_sheet = "\'" + "old_" + "sheet_name" + "\'"
for col in range(last_col):
    col_name = xl_col_to_name(col)
    if col_name in unformatted_cols:  # Do not apply the formatting to certain columns
        continue
    else:
        apply_range = '{0}1:{0}1048576'.format(col_name)
        formula = "XLOOKUP(C1, {1}!C1:C1048576, {1}!{0}1:{0}1048576) <> XLOOKUP(C1, C1:C1048576, {0}1:{0}1048576)".format(col_name, old_sheet)
        active_sheet.conditional_format(apply_range, {'type': 'formula',
                                                      'criteria': formula,
                                                      'format': dark_blue})
Now, my problem is that when I open the output, this conditional formatting doesn't work. However, if I go to Conditional Formatting -> Manage Rules -> Edit Rule and, without editing anything, press OK and then Apply, it starts working correctly.
Does anyone know how to make this rule work properly without this manual intervention?
All my other conditional formatting rules, though simpler, work exactly as intended.
# This is the formula that I see in Python for the first loop iteration
=XLOOKUP(C1, 'old_sheet_name'!C1:C1048576, 'old_sheet_name'!A1:A1048576) <> XLOOKUP(C1, C1:C1048576, A1:A1048576)
# This formula I see in Excel for the same first column
=XLOOKUP(C1, 'old_sheet_name'!C:C, 'old_sheet_name'!A:A) <> XLOOKUP(C1, C:C, A:A)
The reason that XLOOKUP doesn't work in your formula is that Excel classifies it as a "Future Function", i.e., a function added after the original file format. In order to use it you need to prefix it with _xlfn.
This is explained in the XlsxWriter docs on Formulas added in Excel 2010 and later.
Here is a working example:
import xlsxwriter
workbook = xlsxwriter.Workbook('conditional_format.xlsx')
worksheet1 = workbook.add_worksheet('old_sheet_name')
worksheet2 = workbook.add_worksheet('new_sheet_name')
worksheet1.write(0, 0, 'Foo')
format1 = workbook.add_format({'bg_color': '#C6EFCE',
                               'font_color': '#006100'})
xlookup_formula = '=_xlfn.XLOOKUP(C1, old_sheet_name!C:C, old_sheet_name!A:A) <> _xlfn.XLOOKUP(C1, C:C, A:A)'
worksheet2.conditional_format('D1:D10',
                              {'type': 'formula',
                               'criteria': xlookup_formula,
                               'format': format1})
workbook.close()
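For the loop in the question, the prefix could also be added while the formula string is built. The following is only a sketch: add_future_prefix and changed_value_rule are made-up helper names, the FUTURE_FUNCTIONS tuple lists just XLOOKUP, and the key column is assumed to be C as in the question.

```python
import re

# Hypothetical subset; extend it with any other "future functions" you use.
FUTURE_FUNCTIONS = ("XLOOKUP",)


def add_future_prefix(formula, names=FUTURE_FUNCTIONS):
    """Prefix calls to the given functions with _xlfn. unless already prefixed."""
    alternatives = "|".join(re.escape(name) for name in names)
    # Match NAME( only when it is not preceded by a word character or a dot,
    # so an existing _xlfn.NAME( is left untouched.
    pattern = r"(?<![\w.])(" + alternatives + r")(?=\()"
    return re.sub(pattern, r"_xlfn.\1", formula)


def changed_value_rule(col_name, old_sheet, key_col="C"):
    """Build the 'value changed since the old sheet' rule for one column."""
    old = "XLOOKUP({k}1, {s}!{k}:{k}, {s}!{c}:{c})".format(
        k=key_col, s=old_sheet, c=col_name)
    new = "XLOOKUP({k}1, {k}:{k}, {c}:{c})".format(k=key_col, c=col_name)
    return add_future_prefix("=" + old + " <> " + new)


rule = changed_value_rule("A", "'old_sheet_name'")
print(rule)
```

The resulting string can then be passed as the 'criteria' value of conditional_format() in the loop.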

Using gspread, trying to add a column at the end of Google Sheet that already exists

Here is the code I am working with.
dfs=dfs[['Reserved']] #the column that I need to insert
dfs=dfs.applymap(str) #json did not accept the nan so needed to convert
sh=gc.open_by_key('KEY') #would open the google sheet
sh_dfs=sh.get_worksheet(0) #getting the worksheet
sh_dfs.insert_rows(dfs.values.tolist()) #inserts the dfs into the new worksheet
Running this code inserts the rows at the first column of the worksheet, but what I am trying to accomplish is adding the column at the very end, in column P.
In your situation, how about the following modification? It first retrieves the maximum column count, converts that column number to a column letter, and then writes the values to the column after the last one.
From:
sh_dfs.insert_rows(dfs.values.tolist())
To:
# Ref: https://stackoverflow.com/a/23862195
def colnum_string(n):
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string
values = sh_dfs.get_all_values()
col = colnum_string(max([len(r) for r in values]) + 1)
sh_dfs.update(col + '1', dfs.values.tolist(), value_input_option='USER_ENTERED')
Note:
If an error such as "exceeds grid limits" occurs, please insert a blank column first.
Reference:
update (the gspread worksheet method used above)
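To see what the column arithmetic above produces, here is a small standalone check that needs no Google Sheets access. next_column_anchor is a hypothetical helper name, and the sample rows are made up. It mirrors the max(len(r)) + 1 logic and returns the A1 cell where the new column would start.

```python
def colnum_string(n):
    """Convert a 1-based column number to its letter (1 -> A, 27 -> AA)."""
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string


def next_column_anchor(values):
    """Return the top cell of the first column after the widest row."""
    width = max((len(row) for row in values), default=0)
    return colnum_string(width + 1) + "1"


# Made-up sheet contents: the widest row has 15 cells (columns A..O).
sample_values = [["x"] * 15, ["y"] * 3]
anchor = next_column_anchor(sample_values)
print(anchor)
```

Since dfs.values.tolist() for a single-column frame is a list of one-element rows, writing it at that anchor fills the new column downwards.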

How to change a certain font color in a string using Xlsxwriter?

I want to change the color of certain text within a string using xlsxwriter.
My thought was to replace the non-colored text with colored text.
But it failed...
The result shows "TypeError: 'Format' object cannot be interpreted as an integer"
It seems that the cell_format argument in replace(wrong, f"{wrong}", cell_format) is being interpreted as an integer.
It's odd, because how else can we change the font color of a single word in a string if we cannot use replace() to do so?
My code:
import xlsxwriter
from functools import partial
def x_in_y(word, inner):
    return inner in word
workbook = xlsxwriter.Workbook('C:\\Users\\Claude\\Desktop\\hello.xlsx')
worksheet = workbook.add_worksheet()
cell_format = workbook.add_format()
cell_format.set_font_color('red')
words = [
    ('pasport','passport'),
    ('limmit','limit'),
    ('putt','put')
]
sentence =['putt dweqrerwr','dfsdf putt','limmit','pasport']
row = 0
for wrong,correct in words:
    filtered_names = filter(partial(x_in_y, inner=wrong), sentence)
    next_elem = next(filtered_names, None)
    if next_elem:
        worksheet.write(row,0, f"Typo: {wrong} 'should be {correct}'")
        worksheet.write(row+1,0,next_elem.replace(wrong, f"{wrong}",cell_format))
    for name in filtered_names:
        worksheet.write(row+2,0,name)
    row += 2
workbook.close()
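The TypeError itself can be reproduced without xlsxwriter: the optional third argument of str.replace() is a count (the maximum number of replacements), so anything passed there has to be an integer. In the sketch below, FakeFormat is only a stand-in class for the real Format object.

```python
class FakeFormat:
    """Stand-in for an xlsxwriter Format object (hypothetical)."""


text = "dfsdf putt"
try:
    # The third positional argument of str.replace is `count`, not a format.
    text.replace("putt", "putt", FakeFormat())
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# What the third argument is really for: limit the number of replacements.
limited = "putt putt".replace("putt", "put", 1)
print(limited)
```

A Python string carries no formatting information, which is why replace() cannot colour part of it.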
I had a similar case at work, and I thought that it was not possible to partially format a string, let alone according to specific conditions like in your case. I saw your post and the reply by the amazing John McNamara, and I decided to give it a try using the rich string method (I really doubt there is another way).
First, let me mention that I was able to achieve it using pandas and xlsxwriter. Second, for loops should be avoided with pandas and xlsxwriter (because the more rows a file has, the longer the program takes to finish), but I was not able to achieve it any other way. You need some error handling here, because str.index() raises a ValueError if the word is not found in the string. Finally, I did not include the case where a cell contains more than one wrong word and all of them need to be formatted.
This is how I would do it:
import pandas as pd
# Create your dataframe
df = pd.DataFrame(data={'A': ["Typo: pasport 'should be passport'", 'pasport',
                              "Typo: limmit 'should be limit'", 'limmit',
                              "Typo: putt 'should be put'", 'putt dweqrerwr',
                              'dfsdf putt']})
# Create a list with the words that are wrong
wrong_words = ['pasport', 'limmit', 'putt']
# Kickstart the xlsxwriter
writer = pd.ExcelWriter('Testing rich strings.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', header=False, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Define the red format and a default format
cell_format_red = workbook.add_format({'font_color': 'red'})
cell_format_default = workbook.add_format({'bold': False})
# Start iterating through the rows and through all of the words in the list
for row in range(0, df.shape[0]):
    for word in wrong_words:
        try:
            # 1st case, wrong word is at the start and there is additional text
            if (df.iloc[row,0].index(word) == 0) \
                    and (len(df.iloc[row,0]) != len(word)):
                worksheet.write_rich_string(row, 0, cell_format_red, word,
                                            cell_format_default,
                                            df.iloc[row,0][len(word):])
            # 2nd case, wrong word is at the middle of the string
            elif (df.iloc[row,0].index(word) > 0) \
                    and (df.iloc[row,0].index(word) != len(df.iloc[row,0])-len(word)) \
                    and ('Typo:' not in df.iloc[row,0]):
                starting_point = df.iloc[row,0].index(word)
                worksheet.write_rich_string(row, 0, cell_format_default,
                                            df.iloc[row,0][0:starting_point],
                                            cell_format_red, word, cell_format_default,
                                            df.iloc[row,0][starting_point+len(word):])
            # 3rd case, wrong word is at the end of the string
            elif (df.iloc[row,0].index(word) > 0) \
                    and (df.iloc[row,0].index(word) == len(df.iloc[row,0])-len(word)):
                starting_point = df.iloc[row,0].index(word)
                worksheet.write_rich_string(row, 0, cell_format_default,
                                            df.iloc[row,0][0:starting_point],
                                            cell_format_red, word)
            # 4th case, wrong word is the only one in the string
            elif (df.iloc[row,0].index(word) == 0) \
                    and (len(df.iloc[row,0]) == len(word)):
                worksheet.write(row, 0, word, cell_format_red)
        except ValueError:
            # str.index() raises ValueError when the word is not in this cell
            continue
# writer.save() was removed in pandas 2.0; close() also writes the file
writer.close()
The final output is identical to your desired output.
I hope that this helps.
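For the case left out above (several wrong words in one cell), one possible approach is to split the text into fragments first and then hand them to write_rich_string(). This is only a sketch: rich_fragments is a made-up helper, it matches plain substrings (so "putt" would also match inside "putter"), and unformatted pieces are passed as bare strings so that they pick up the default cell format.

```python
import re


def rich_fragments(text, wrong_words, highlight):
    """Split text into write_rich_string() arguments.

    Each wrong word is preceded by `highlight`; all other pieces are
    returned as plain strings. Empty pieces are dropped.
    """
    if not wrong_words:
        return [text]
    # Longest words first, so a longer word wins over a shorter prefix.
    ordered = sorted(wrong_words, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(word) for word in ordered) + ")"
    fragments = []
    for piece in re.split(pattern, text):
        if not piece:
            continue
        if piece in wrong_words:
            fragments.extend([highlight, piece])
        else:
            fragments.append(piece)
    return fragments


RED = object()  # placeholder for workbook.add_format({'font_color': 'red'})
demo = rich_fragments("putt and limmit", ["pasport", "limmit", "putt"], RED)

# Writing a cell (assumes `worksheet` and `cell_format_red` exist):
# fragments = rich_fragments(text, wrong_words, cell_format_red)
# if len(fragments) > 2:
#     worksheet.write_rich_string(row, 0, *fragments)
# elif fragments and fragments[0] is cell_format_red:
#     worksheet.write(row, 0, text, cell_format_red)  # whole cell is the word
# else:
#     worksheet.write(row, 0, text)                   # nothing to highlight
```

The length check is there because write_rich_string() expects more than two format/fragment arguments, so a cell consisting only of the wrong word goes through the ordinary write() call instead.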

How to format a range of cells in python using xlsxwriter

How do I apply a format to a range of cells?
What I am doing is looping over the column names in a list from Oracle and formatting the columns as dates where the column name starts with "DT". But I also want the entire data range to have borders.
I would really like to apply the date format to the columns and then separately apply the borders, but the last format applied wins, and the application of the borders overwrites the date formatting on the columns.
Ideally I want to blast the data range with borders, and then apply date formats to the date columns, while retaining the borders.
Can you select a range and then apply formatting, or do range intersections, as you can in VBA?
import datetime
import xlsxwriter
# Generate EXCEL File
xl_filename = "DQ_Valid_Status_Check.xlsx"
workbook = xlsxwriter.Workbook(xl_filename)
# Add a bold format to use to highlight cells.
bold = workbook.add_format({'bold': True})
date_format = workbook.add_format(
    {'num_format': 'dd-mmm-yyyy hh:mm:ss'})
border = workbook.add_format()
border.set_bottom()
border.set_top()
border.set_left()
border.set_right()
worksheet_info = workbook.add_worksheet()
worksheet_info.name = "Information"
worksheet_info.write('A1', 'Report Description:', bold)
worksheet_info.write('B1', 'ARIEL Data Quality Report for Checking Authorisation Status of Marketing Applications')
worksheet_info.write('A2', 'Report Date:', bold)
worksheet_info.write('B2', datetime.datetime.now(), date_format)
worksheet_data = workbook.add_worksheet()
worksheet_data.name = "DQ Report"
worksheet_data.write_row('A1', col_names)
for i in range(len(results)):
    print("result " + str(i) + ' of' + str(len(results)))
    print(results[i])
    worksheet_data.write_row('A' + str(i + 2), results[i])
    #worksheet_data.set_row(i + 2, None, border)
# add borders
for i in range(len(results)):
    worksheet_data.set_row(i + 2, None, border)
# format date columns
for i in range(len(col_names)):
    col_name = col_names[i]
    if col_name.startswith("DT"):
        print(col_name)
        worksheet_data.set_column(i, i, None, date_format)
workbook.close()
According to the FAQ, it is not currently possible to format a range of cells at once, but a future feature might allow this.
You could create Format objects containing multiple format properties and apply your custom format to each cell as you write to it. See "Creating and using a Format Object".
To apply borders to all columns at once you can do something like:
border = workbook.add_format({'border':2})
worksheet_data.set_column(first_col=0, last_col=10, cell_format=border)
And to retain the border format you can modify your date_format to:
date_format = workbook.add_format(
    {'num_format': 'dd-mmm-yyyy hh:mm:ss',
     'border': 2})
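One way to keep the two concerns separate in your own code, even though each cell can only carry a single Format, is to merge the property dictionaries before calling add_format(). The sketch below assumes the "DT" naming rule from the question. column_format_props and the two constant names are invented for illustration, and the column names in the example are made up.

```python
BORDER_PROPS = {'border': 1}
DATE_PROPS = {'num_format': 'dd-mmm-yyyy hh:mm:ss'}


def column_format_props(col_name):
    """Return the merged format properties for one column."""
    props = dict(BORDER_PROPS)      # every data column gets borders
    if col_name.startswith("DT"):   # date columns add the number format
        props.update(DATE_PROPS)
    return props


example = {name: column_format_props(name) for name in ["ID", "DT_CREATED"]}
print(example)

# Using it while writing (assumes `workbook`, `worksheet_data`, `col_names`
# and `results` exist as in the question):
# formats = [workbook.add_format(column_format_props(n)) for n in col_names]
# for r, record in enumerate(results, start=1):
#     for c, value in enumerate(record):
#         worksheet_data.write(r, c, value, formats[c])
```

Creating one Format per column up front and reusing it for every cell keeps the number of Format objects small.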

R1C1 in openpyxl

I'm trying to set conditional formatting in openpyxl to emulate highlighting duplicate values. With this simple code, I should be able to highlight consecutive duplicates (but not the first value in a duplicate sequence).
from pandas import *
data = DataFrame({'a':'a a a b b b c b c a f'.split()})
wb = ExcelWriter('test.xlsx')
data.to_excel(wb)
ws = wb.sheets['Sheet1']
from openpyxl.style import Color, Fill
# Create fill
redFill = Fill()
redFill.start_color.index = 'FFEE1111'
redFill.end_color.index = 'FFEE1111'
redFill.fill_type = Fill.FILL_SOLID
ws.conditional_formatting.addCellIs("B1:B1048576", 'equal', "=R[1]C", True, wb.book, None, None, redFill)
wb.save()
However, when I open it in Excel I get an error related to conditional formatting, and the data is not highlighted as expected. Is openpyxl able to handle R1C1 style referencing?
To highlight sequential duplicate values, the formula you want is
=AND(B1<>"",B2=B1)
with a range starting from B2 (that is, B2:B1048576).
Note - this appears to be broken in the current 1.8.3 branch of openpyxl, but will be fixed shortly in the 1.9 branch.
from openpyxl import Workbook
from openpyxl.style import Color, Fill
wb = Workbook()
ws = wb.active
ws['B1'] = 1
ws['B2'] = 2
ws['B3'] = 3
ws['B4'] = 3
ws['B5'] = 7
ws['B6'] = 4
ws['B7'] = 7
# Create fill
redFill = Fill()
redFill.start_color.index = 'FFEE1111'
redFill.end_color.index = 'FFEE1111'
redFill.fill_type = Fill.FILL_SOLID
dxfId = ws.conditional_formatting.addDxfStyle(wb, None, None, redFill)
ws.conditional_formatting.addCustomRule('B2:B1048576',
    {'type': 'expression', 'dxfId': dxfId, 'formula': ['AND(B1<>"",B2=B1)']})
wb.save('test.xlsx')
As a further reference:
If you want to highlight all duplicates:
COUNTIF(B:B,B1)>1
If you want to highlight all duplicates except for the first occurrence:
COUNTIF($B$2:$B2,B2)>1
If you want to highlight sequential duplicates, except for the last one:
COUNTIF(B1:B2,B2)>1
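To get a feel for which cells some of these rules select, here is a plain-Python imitation run on the sample data from the question. It is only an approximation: the function names are invented, indices are 0-based list positions rather than Excel rows, and the comparison is case-sensitive, whereas Excel's = and COUNTIF are not.

```python
from collections import Counter


def sequential_duplicates(values):
    """Imitates AND(B1<>"",B2=B1): equal to the non-empty value just above."""
    return [i for i in range(1, len(values))
            if values[i - 1] != "" and values[i] == values[i - 1]]


def all_duplicates(values):
    """Imitates COUNTIF(B:B,B1)>1: the value occurs more than once anywhere."""
    counts = Counter(values)
    return [i for i, value in enumerate(values) if counts[value] > 1]


def repeats_after_first(values):
    """Imitates COUNTIF($B$2:$B2,B2)>1: the value was already seen above."""
    seen = set()
    flagged = []
    for i, value in enumerate(values):
        if value in seen:
            flagged.append(i)
        seen.add(value)
    return flagged


data = 'a a a b b b c b c a f'.split()
print(sequential_duplicates(data))
print(all_duplicates(data))
print(repeats_after_first(data))
```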
Regarding RC notation: while openpyxl doesn't support Excel's RC notation, conditional formatting will write the formula as provided. Unfortunately, Excel treats R1C1 notation only superficially, as a display flag, and converts all formulas back to their A1 equivalents when saving, so you'd need a function to convert all R1C1 formulas to their A1 equivalents for this to work.
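A conversion function of that kind might look like the sketch below for a single reference. r1c1_to_a1 and column_letters are hypothetical names, not openpyxl functions. It handles only one reference at a time (not whole formulas), resolves relative offsets against a given base cell, and marks absolute parts with $.

```python
import re

# R or C may be followed by [offset] (relative), a number (absolute), or nothing.
_R1C1 = re.compile(r"R(?:\[(-?\d+)\]|(\d+))?C(?:\[(-?\d+)\]|(\d+))?")


def column_letters(n):
    """Convert a 1-based column number to letters (1 -> A, 28 -> AB)."""
    letters = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(65 + remainder) + letters
    return letters


def r1c1_to_a1(ref, base_row, base_col):
    """Convert one R1C1 reference to A1, relative to the 1-based base cell."""
    match = _R1C1.fullmatch(ref.strip().upper())
    if match is None:
        raise ValueError("not an R1C1 reference: %r" % ref)
    row_rel, row_abs, col_rel, col_abs = match.groups()
    row = int(row_abs) if row_abs else base_row + int(row_rel or 0)
    col = int(col_abs) if col_abs else base_col + int(col_rel or 0)
    if row < 1 or col < 1:
        raise ValueError("reference points outside the sheet: %r" % ref)
    row_text = ("$" if row_abs else "") + str(row)
    col_text = ("$" if col_abs else "") + column_letters(col)
    return col_text + row_text


# The question's R[1]C, evaluated for the first cell of the range (B1):
print(r1c1_to_a1("R[1]C", base_row=1, base_col=2))
```

Because a conditional formatting formula is written relative to the top-left cell of its range, that cell is the natural base for the conversion.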
Openpyxl doesn't support Excel RC notation.
You could use A1 notation instead which would mean that the equivalent formula is =B2 (I think).
However, you should verify that it actually works in Excel first.
My feeling is that it won't. In general conditional formatting uses absolute cell references $B$2 instead of relative cell references B1.
If it does work then convert your formula to A1 notation and that should work in Openpyxl.
You can't use R1C1 notation directly, and this answer would be a terrible way to format a range of cells, but OpenPyXL does allow you to use row and column numbers.
cell = ws.cell(r, c)
returns the worksheet cell at row r and column c, creating one if needed. Unlike the old xlrd/xlwt modules, row and column indices begin at 1, so you can read r and c directly off of a spreadsheet using the R1C1 reference style. For most purposes, you want to access .value, for example:
ws.cell(2, 3).value = 3
...
v = ws.cell(4, 5).value
It's not nearly as pretty as ws['R2C3'] = 3 or v = ws['R4C5'], but it helps with simple tasks.
