I am working on a script to help my boss modify .xlsx files that he gets. I'm trying to insert a VLOOKUP into every cell in a column, but I am running into an issue where some of the letters inside the parentheses are being changed to lowercase.
This is the code I am using:
import openpyxl
wb = openpyxl.load_workbook('wb.xlsx')
ws = wb['Sheet1']
for row in ws['J1:J847']:
    for cell in row:
        cell.value = '=VLOOKUP(A{0}, Collection.A:G,7,0)'.format(cell.row)
wb.save('test.xlsx')
The output in the spreadsheet is:
=VLOOKUP(A1, collection.a:g,6,0)
I need it to look like:
=VLOOKUP(A1, Collection.a:g,6,0)
or even better:
=VLOOKUP(A1, Collection.A:G,6,0)
I have checked to ensure that the string is being formatted correctly. What I find most confusing is that not all of the uppercase characters are being switched. What am I doing wrong and what is happening under the covers to cause something like this?
I had a similar problem, and it was fixed by replacing the period with an exclamation point:
=VLOOKUP(A{0}, Collection!A:G,7,0)
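Applied to the loop from the question, the fix might look like the sketch below. The helper name `vlookup_formula` is hypothetical, and the sheet name `Collection` is taken from the question. The only change is that the sheet reference uses the Excel-style `!` separator instead of `.`.

```python
def vlookup_formula(row, sheet="Collection"):
    """Build the VLOOKUP formula for one row, using '!' between sheet and range."""
    return "=VLOOKUP(A{0}, {1}!A:G,7,0)".format(row, sheet)


# Usage with openpyxl (commented out so the sketch runs without a workbook):
# import openpyxl
# wb = openpyxl.load_workbook('wb.xlsx')
# ws = wb['Sheet1']
# for row in ws['J1:J847']:
#     for cell in row:
#         cell.value = vlookup_formula(cell.row)
# wb.save('test.xlsx')

print(vlookup_formula(1))
```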
Using openpyxl we can properly check if a cell is fully bold/not bold, but we cannot work with richtext so having two words, one bolded and one not, will make the check fail.
This can be done correctly with xlrd, but it doesn't support xlsx files. Converting from xlsx to xls is risky, especially in my use case, since I have a big file with many languages and I think I could lose information.
How can I check whether substrings of a cell are bold inside an xlsx file?
TL;DR: not yet, but upcoming openpyxl v3.1 will be able to satisfy your request.
I took a quick tour through the state of the python-excel ecosystem as of late 2022 with respect to this very feature, which relies on being able to manage Rich Text objects as cell contents:
pylightxl was new to me, so I quickly browsed the project for a bit. But as the name indicates, it is geared towards a small, maintainable feature set. Only cell values are in scope, formatting is intentionally skipped.
xlrd supports rich text, though only legacy .xls, as you pointed out in your question already.
xlsxwriter luckily is able to construct cells containing rich text with mixed formatting (example), but unfortunately only deals with writing files.
Finally, openpyxl currently does not support rich text, but the upcoming release 3.1 will, thanks to merge request !409, which was completed earlier this year.
Example, using openpyxl 3.0.10. Given a sample spreadsheet in which cell B3 holds regular text and cell B4 holds bold text, run:
import openpyxl
test_file = "test_spreadsheet.xlsx"
obj = openpyxl.load_workbook(test_file)
sheet_ref = obj.active
cell1 = sheet_ref.cell(row = 3, column = 2)
cell_font1 = cell1.font
print(cell_font1.b)
cell = sheet_ref.cell(row = 4, column = 2)
cell_font = cell.font
print(cell_font.b)
result:
$ python test.py
False
True
so you could build it yourself:
def is_text_in_cell_bold(cell_obj):
    # True only when the cell's own font is bold; mixed formatting inside the cell is not detected
    return bool(cell_obj.font.b)
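With openpyxl 3.1 or later, a per-substring check might look like the sketch below. It assumes that loading with `rich_text=True` yields cell values that are list-like sequences of plain strings and text blocks carrying `.text` and `.font` attributes. The helper names are hypothetical, and the `SimpleNamespace` objects only stand in for real text blocks so the example runs without a workbook.

```python
from types import SimpleNamespace


def bold_substrings(value):
    """Return the text of every bold run in a rich-text cell value.

    Assumption: a rich-text value is a list whose items are either plain
    strings or blocks with `.text` and `.font`, where `font.b` is truthy
    for bold. Any other value (plain str, number, None) has no runs.
    """
    if not isinstance(value, list):
        return []
    bold = []
    for run in value:
        font = getattr(run, "font", None)  # plain strings have no font
        if font is not None and getattr(font, "b", False):
            bold.append(run.text)
    return bold


def bold_substrings_in_file(path, coordinate):
    """Load a workbook with rich text enabled and inspect one cell."""
    import openpyxl  # imported lazily; needs openpyxl >= 3.1

    wb = openpyxl.load_workbook(path, rich_text=True)
    return bold_substrings(wb.active[coordinate].value)


# Stand-in for a cell containing "plain " followed by a bold "bold".
sample = ["plain ", SimpleNamespace(text="bold", font=SimpleNamespace(b=True))]
print(bold_substrings(sample))
```

A cell that is bold as a whole still reports its formatting through `cell.font.b`, so the two checks complement each other.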
I am trying to find a solution how to substitute the following:
worksheet = writer.sheets['Overview']
worksheet.write_formula('C4', "=MIN('Sheet_147_mB'!C2:C325)")
with something like:
for s in sheet_names:
    worksheet.write_formula(row, col, '=MIN(s +'!C2:C325')')
    row+=1
to iterate through all the existing sheets in the current xlsx book and write the function to the current sheet having an overview.
After spending some hours on this I was not able to find a solution, so it would be highly appreciated if someone could point me in the right direction. Thank you!
You don't give the error message, but it looks like the problem is with your quoting: you can't nest single quotes like this: '=MIN(s +'!C2:C325')', and your quotes aren't in the right places. After fixing those problems, your code looks like this:
for s in sheet_names:
    worksheet.write_formula(row, col, "=MIN('" + s + "'!C2:C325)")
    row += 1
The single quotes are now nested in double quotes (they could also have been escaped, but that's ugly), and the sheet name is enclosed in single quotes, which protects special characters (e.g. spaces).
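If sheet names can themselves contain a single quote, the formula string needs one more step: Excel escapes an embedded single quote by doubling it. The sketch below wraps that in two hypothetical helpers, `quote_sheet_name` and `min_formula`; the range `C2:C325` is the one from the question.

```python
def quote_sheet_name(name):
    """Wrap a sheet name in single quotes, doubling any embedded single quote."""
    return "'" + name.replace("'", "''") + "'"


def min_formula(sheet_name, cell_range="C2:C325"):
    """Build a MIN formula that refers to a range on another sheet."""
    return "=MIN({0}!{1})".format(quote_sheet_name(sheet_name), cell_range)


sheet_names = ["Sheet_147_mB", "Q1 results", "Bob's data"]
for s in sheet_names:
    print(min_formula(s))
```

Inside the loop from the answer, the call would then read `worksheet.write_formula(row, col, min_formula(s))`.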
I have looked into many Stack Overflow questions, but none of them seemed to solve my problem. I am using Python and openpyxl to fill a whole row with red when a certain condition holds. I have all the necessary imports:
from openpyxl.styles import PatternFill, NamedStyle, Color
from openpyxl.styles.colors import RED
And my code is the following :
for cell in sheet[i]:
    cell.style = NamedStyle(fill=PatternFill(patternType='solid',
                                             fill_type='solid',
                                             fgColor=Color(RED)))
When I print the first occurrence of cell, it gives me
<Cell 'Divers'.A4>
which is what I am looking for.
However, the following error comes every time : "Style Normal exists already". There is absolutely no cell formatting or style whatsoever in the rest of the code but the Excel file cells are indeed filled with yellow already.
Any idea how to solve this? Thanks in advance for any help.
If using a NamedStyle, you're required to pass a name.
red_foreground = NamedStyle(
    name="RedForeground",
    fill=PatternFill(
        patternType='solid',
        fill_type='solid',
        fgColor=Color(RED)
    )
)
Since you're assigning this NamedStyle to more than one cell, it makes sense to register it to your workbook.
wb.add_named_style(red_foreground)
Then you can apply it to cells by name, like so:
for cell in sheet[i]:
    cell.style = "RedForeground"
Reference:
Creating NamedStyle
NamedStyle Constructor
I also had this problem and finally found that it was caused by two styles having the same name. This usually happens when you use copy.copy(style). After changing the name of one of them (style.name = 'newname'), it works.
This code handles named styles that already exist:
for index,cur_style in enumerate(excel_workbook._named_styles):
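    if cur_style.name == 'my_new_style':
        # Replace the existing style of the same name and bind the new one
        excel_workbook._named_styles[index] = my_new_style
        my_new_style.bind(excel_workbook)
        break
else:
    # No style with that name was found, so register it normally
    excel_workbook.add_named_style(my_new_style)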
However, in your case you should use a name other than "Normal", because "Normal" is the default named style. Pick another name and the code above will work.
Another way to avoid the traceback is to add the style only when it is not already registered:
if 'Style_A' not in wb.named_styles:
    wb.add_named_style(Style_A)
What I am trying to do: there is a large Excel sheet with a lot of haphazard customer information. I want to sort the email addresses and other data into a set format in a new Excel file.
I can't quite figure out how to match the cell text (which will have some format like address and email squished together, and similar) against the regex and keep only the matched data in a list.
Would really appreciate some help. Thanks
import sys, os, openpyxl
def sort_email_from_xl():
    sheet = sheet_select() #Opens the worksheet
    emailRegex = re.compile(r'''([a-zA-Z0-9._%+-]+#+[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4}))''',re.VERBOSE)
    customeremails = []
    for row in range(0, max_row):
        if cell.text == emailRegex:
            mail = cell.text
            customeremails.append(mail)
    return customeremails
    print(customeremails)
This code should work (I could only test the regex part though):
import sys, os, re, openpyxl

def sort_email_from_xl():
    sheet = sheet_select() #Opens the worksheet
    # Raw string so the backslash escapes reach the regex engine unchanged
    emailRegex = re.compile(r".*?([a-zA-Z0-9\._%+\-]+#[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,4}).*?")
    customeremails = []
    for row in range(0, max_row):
        if emailRegex.match(cell.text):
            mail = emailRegex.match(cell.text).groups()[0]
            cell.text = mail
            customeremails.append(mail)
    print(customeremails)
There were many problems with your code. First about the regex:
the regex was not allowing text around your email address, added that with .*? at start and end
you don't need the re.VERBOSE part as you'd only need it if you want to add inline comments to your regex, see doc
you allowed email addresses with many # in between
you matched the TLD separately, that's unneeded
Now the email regex works for basic usage, but I'd definitely recommend taking a proven email regex from other answers on Stack Overflow.
Then: with emailRegex.match(cell.text) you can check whether cell.text matches your regex, and with emailRegex.match(cell.text).groups()[0] you extract only the matching part. You also had one return statement too many.
For some reason the above code is giving me a NameError: name 'max_row' is not defined
You need to correct the looping through the rows e.g. like documented here
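One way the corrected loop could look is sketched below. It rests on several assumptions: that `@` is the intended separator (the patterns above show `#`), that the worksheet is an openpyxl worksheet so `iter_rows(values_only=True)` is available, and that the function names are free to choose (they are hypothetical). It uses `findall` instead of `match`, so surrounding text needs no `.*?` padding and several addresses in one cell are all collected.

```python
import re

# Assumption: '@' is the intended separator between local part and domain.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}")


def extract_emails(values):
    """Collect every email-like substring from an iterable of cell values."""
    emails = []
    for value in values:
        if isinstance(value, str):  # skip empty cells and numbers
            emails.extend(EMAIL_RE.findall(value))
    return emails


def emails_from_sheet(sheet):
    """Walk all cells of an openpyxl worksheet, row by row."""
    cells = (v for row in sheet.iter_rows(values_only=True) for v in row)
    return extract_emails(cells)


print(extract_emails(["12 Main St john.doe@example.com", None, 42, "a@b.org, c@d.net"]))
```

Iterating over the worksheet itself removes the need for a separate `max_row` variable, which is what raised the NameError.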
After you do:
sheet.write(0, 1, 'whatevs')
Is it still possible to adjust the style of cell 0,1? The reason I am asking is that I loop over a list of errors and would like to color every cell that has an error red.
I could do it when I write the cell but it will make my code a little more complex.
There is no API exposed to do it, you could look at the source code and come up with a way:
rows = ws.get_rows()
# _Row__cells is the name-mangled private attribute of xlwt's Row class
rows[0]._Row__cells[0].xf_idx = styleindex
You can get the styleindex by adding a style you created.
style0 = xlwt.easyxf('font: name Times New Roman, color-index red, bold on',
num_format_str='#,##0.00')
styleindex = wb.add_style(style0)
wb is a Workbook object and ws your Worksheet.
Beware: this isn't meant to be done this way, but I couldn't find another.
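If relying on private attributes feels too fragile, the alternative mentioned in the question, choosing the style at write time, can stay fairly compact. The sketch below is only an illustration: `write_rows`, `RecordingSheet`, and the `error_cells` set of (row, column) pairs are hypothetical names, and the recording class merely stands in for an xlwt worksheet so the example runs without xlwt installed.

```python
def write_rows(sheet, rows, error_cells, error_style, normal_style):
    """Write a grid of values, using error_style for coordinates in error_cells.

    `sheet` is any object with a write(row, col, value, style) method,
    such as an xlwt worksheet.
    """
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            style = error_style if (r, c) in error_cells else normal_style
            sheet.write(r, c, value, style)


class RecordingSheet:
    """Stand-in for a worksheet: records every write call."""

    def __init__(self):
        self.calls = []

    def write(self, row, col, value, style):
        self.calls.append((row, col, value, style))


demo = RecordingSheet()
write_rows(demo, [["ok", "bad"], ["ok", "ok"]], {(0, 1)}, "RED", "PLAIN")
print(demo.calls)

# With xlwt the styles would be real XFStyle objects, for example:
# import xlwt
# wb = xlwt.Workbook()
# ws = wb.add_sheet('Sheet1')
# red = xlwt.easyxf('pattern: pattern solid, fore_colour red')
# write_rows(ws, data, error_cells, red, xlwt.easyxf(''))
```

Collecting the error coordinates first and writing once keeps the styling decision in a single place.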