openpyxl error: 'str' object has no attribute 'BLACK' - python

I am trying to set styles on an Excel spreadsheet using Python's openpyxl module. I keep running into this error:
'str' object has no attribute 'BLACK'
Basically, my code reads a list of known values from a .xlsx file and places them into a Python list. I use that list to check the values in a column of an Access table, to make sure the value in each cell is correct compared to the known values.
Where my script blows up is when I try to set styles using openpyxl. For some reason, the above error comes up. The weird thing is, I'm not even using BLACK anywhere in the styles, and it seems to error out when I try to set the fill. The SearchCursor portion of the script iterates through each row, and it's on the second pass that the script fails. I have a feeling it wants to overwrite something, but I can't figure out what.
import openpyxl, arcpy
from arcpy import env
from openpyxl import Workbook
env.workspace = r"Z:\Access_Tables\Access_Table.gdb"
TableList = []
for row in arcpy.SearchCursor(r"Z:\Domains\Domains.xlsx\DOMAINS$"):
    TableList.append(row.Code)
# Create workbook for report. Openpyxl
workbook = openpyxl.Workbook()
ws = workbook.get_active_sheet()
ws.title = "Test"
workbook.remove_sheet(ws)
# List out the access tables in the workspace
for fc in arcpy.ListFeatureClasses():
    # Processing SOIL Point access table
    if fc == "SOIL":
        # List the column headings from the access table to be applied to the .xlsx table
        fieldnames = [f.name for f in arcpy.ListFields(fc)]
        # Create Sheet. Openpyxl
        new_sheet = workbook.create_sheet(None,fc)
        dictFieldnames = {}
        for num,fname in enumerate(fieldnames):
            dictFieldnames[num] = fname
            # Write to cell, openpyxl
            new_sheet.cell(None,0,num).value = fname
            col_let = openpyxl.cell.get_column_letter(num + 1)
            new_sheet.column_dimensions[col_let].width = len(fname) + 3
        # Process SOIL Field
        if "SOIL" in fieldnames:
            # Set a counter and Loop through each row of the access table
            x = 1
            for row in arcpy.SearchCursor(fc):
                for key, value in dictFieldnames.iteritems():
                    if value == "SOIL":
                        fieldKey = key
                        if not row.SOIL or len(row.SOIL.strip()) == 0:
                            # Openpyxl write. Set fill and color for cell. Write the unique id to the cell.
                            new_sheet.cell(None,x,fieldKey).style.fill.fill_type = openpyxl.style.Fill.FILL_SOLID
                            new_sheet.cell(None,x,fieldKey).style.fill.start_color.index = openpyxl.style.Color = 'FF808000'
                            new_sheet.cell(None,x,fieldKey).value = row.OBJECTID
                            x += 1
                            print 'first'
                        elif len(row.INCLUSION_TYPE) not in range(2,5):
                            # Openpyxl write. Set fill and color for cell. Write the unique id to the cell.
                            new_sheet.cell(None,x,fieldKey).style.fill.fill_type = openpyxl.style.Fill.FILL_SOLID
                            new_sheet.cell(None,x,fieldKey).style.fill.start_color.index = openpyxl.style.Color = 'FF2F4F4F'
                            new_sheet.cell(None,x,fieldKey).value = row.OBJECTID
                            x += 1
                            print 'second'
                        elif row.SOIL.upper() not in [y.upper() for y in TableList]:
                            # Openpyxl write. Set fill and color for cell. Write the unique id to the cell.
                            new_sheet.cell(None,x,fieldKey).style.fill.fill_type = openpyxl.style.Fill.FILL_SOLID
                            new_sheet.cell(None,x,fieldKey).style.fill.start_color.index = openpyxl.style.Color = 'FF00FFFF'
                            new_sheet.cell(None,x,fieldKey).value = row.OBJECTID
                            x += 1
                            print 'third'
                        print x

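The problem is in the lines where you define the colors. The chained assignment binds the string to both targets, so it also replaces openpyxl.style.Color with a plain str. That is presumably why the error only appears on the second pass: the next time openpyxl refers to Color.BLACK internally, Color is a string. Just assign the color to style.fill.start_color.index directly. For example: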
new_sheet.cell(None,x,fieldKey).style.fill.start_color.index = 'FF808000'
instead of:
new_sheet.cell(None,x,fieldKey).style.fill.start_color.index = openpyxl.style.Color = 'FF808000'
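The effect of the chained assignment can be reproduced without openpyxl. The sketch below uses a hypothetical stand-in namespace (`style`, `Color`, and `index` are illustrative names, not openpyxl's own objects) to show that `a = b = value` binds the value to both targets:

```python
import types


class Color:
    """Stand-in for a library class that exposes colour constants."""
    BLACK = 'FF000000'


# Hypothetical stand-in for a module such as openpyxl.style.
style = types.SimpleNamespace(Color=Color)

# Chained assignment: BOTH targets are bound to the string,
# so the class stored in style.Color is replaced.
index = style.Color = 'FF808000'

print(index)                       # FF808000
print(type(style.Color).__name__)  # str

try:
    style.Color.BLACK
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)           # 'str' object has no attribute 'BLACK'
```

Assigning the string to a single target leaves the class untouched, which is why the one-target form above works.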

Related

Is there a way to trigger formatting in excel documents via Python

I am working on an automatic Excel sheet comparison script in Python, but I am having trouble making my changes show up when I open the Excel document. Basically, it does a cell-level comparison and adds a red fill if the cell changed and a green fill if it did not. I am using openpyxl to edit the values and change the fill color. After the code runs, I open the Excel file and see no changes. However, when I click on the cells themselves, I can see that the fill color has changed. I need a solution that makes the formatting appear automatically, without my having to click on each cell. Does anyone have experience with this?
To run, create an Excel file named 'test', then add values to the rows in the first column of Sheet1. Create a second sheet, Sheet2, and add the same values for the first half of the first column, then different values for the second half. Save, close the file, and run the script. Then look for changes.
import pandas as pd
import openpyxl as op
from openpyxl.utils.cell import get_column_letter
from openpyxl.styles import PatternFill

def main():
    old_tab = "Sheet1"
    new_tab = "Sheet2"
    # set up A1:A10 on Sheet1 all to be = 10
    # ... then on Sheet2, have A1:A5 be = 10, and A6:A10 be = 20
    path = './test.xlsx'
    # set up list to track indices that should be highlighted red
    cells_to_highlight_red = []
    red_fill = PatternFill(fill_type=None, start_color='FFFFFFFF', end_color='FF0000')
    # set up list to track indices that should be highlighted green
    cells_to_highlight_green = []
    green_fill = PatternFill(fill_type=None, start_color='FFFFFFFF', end_color='008000')
    old_sheet = pd.read_excel(path, sheet_name=old_tab, data_only=True, header=None).fillna('-')
    new_sheet = pd.read_excel(path, sheet_name=new_tab, data_only=True, header=None).fillna('-')
    # do cell by cell comparison to see if cells have changed
    bool_df = old_sheet.eq(new_sheet)
    # go through each column
    for col_index in range(bool_df.shape[1]):
        # then through each row of the bool_df.
        # ... if the cell is False, that means a change has occurred
        # ... if the cell is not False, so True, that means no
        for row_index, row in enumerate(bool_df.iloc[:,col_index].values.flatten()):
            if row == False:
                col_letter = get_column_letter(col_index+1)
                trg_cell = col_letter + str(row_index+1)
                trg_cell.replace(" ", "")
                # if this is true, then there was no value to begin or end, so do not add to list to track
                if old_sheet.iloc[row_index, col_index] == "-" and new_sheet.iloc[row_index, col_index] == "-":
                    continue
                cells_to_highlight_red.append(trg_cell)
            else:
                col_letter = get_column_letter(col_index+1)
                trg_cell = col_letter + str(row_index+1)
                trg_cell.replace(" ", "")
                # if this is true, then there was no value to begin or end, so do not add to list to track
                if old_sheet.iloc[row_index, col_index] == "-" and new_sheet.iloc[row_index, col_index] == "-":
                    continue
                cells_to_highlight_green.append(trg_cell)
    target_workbook = op.load_workbook( filename=path )
    target_sheet = target_workbook["Sheet2"]
    for trg_col_row in cells_to_highlight_red:
        cell_to_edit = target_sheet[trg_col_row]
        cell_to_edit.fill = red_fill
    for trg_col_row in cells_to_highlight_green:
        cell_to_edit = target_sheet[trg_col_row]
        cell_to_edit.fill = green_fill
    target_workbook.save( path )

main()
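A likely cause, going by the code above: both fills are created with `fill_type=None`, which tells openpyxl to apply no pattern, so the colours are stored but not painted. In addition, a solid fill takes its visible colour from `start_color`, not `end_color`. Below is a minimal sketch, assuming a solid fill is what is wanted; the in-memory workbook and the cell addresses are only for illustration:

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill

# A solid fill paints the cell with start_color (the foreground colour).
red_fill = PatternFill(fill_type='solid', start_color='FFFF0000')
green_fill = PatternFill(fill_type='solid', start_color='FF008000')

wb = Workbook()
ws = wb.active
ws['A1'] = 10
ws['A2'] = 20
ws['A1'].fill = green_fill   # cell that did not change
ws['A2'].fill = red_fill     # cell that changed

print(ws['A2'].fill.fill_type)        # solid
print(ws['A2'].fill.start_color.rgb)  # FFFF0000
```

Separately, `trg_cell.replace(" ", "")` in the question has no effect, because `str.replace` returns a new string instead of modifying the original.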

How can I concatenate multiple rows of excel data into one?

I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.
So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.
I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.
I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.
import openpyxl
from openpyxl import Workbook
path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active
listab = []
rows = ws.max_row
columns = ws.max_column
for i in range (1, rows+1):
    listab.append([])
cellValue = " "
prevCell = " "
for c in range (1, rows+1):
    for r in range(1, columns+1):
        cellValue = ws.cell(row=r, column=c).value
        if cellValue == prevCell:
            listab[r-1].append(prevCell)
        elif cellValue == "NULL":
            listab[r-1].append(prevCell)
        elif cellValue != prevCell:
            listab[r-1].append(cellValue)
        prevCell = cellValue
for r in range(1, rows+1):
    for c in range (1, columns+1):
        j = ws2.cell(row = r, column=c)
        j.value = listab[r-1][c-1]
print(listab)
wb2.save("output.xlsx")
There should be one line with the below information:
ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |
Personally I would go with pandas.
import pandas as pd
#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA", inplace=True)  # Replace NaN values with "NO DATA"
unique_ids = df_data.ods_service_id.unique()
#Storing the DataFrame as a list of dicts, one per row
records_list = df_data.to_dict('records')
keys_to_check = ['service_name', 'service_plan_name', 'CPU', 'RAM', 'NIC', 'DRIVE']
processed = {}
#Go through unique ids
for key in unique_ids:
    processed[key] = {}
    #Get related records
    matching_records = [y for y in records_list if y['ods_service_id'] == key]
    #Loop through records
    for record in matching_records:
        #For each key to check, save in dict if non null
        processed[key]['ods_service_id'] = key
        for detail_key in keys_to_check:
            if record[detail_key] != "NO DATA":
                processed[key][detail_key] = record[detail_key]
                #Note: doesn't handle duplicate values for different keys so far
#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_id', 'service_name', 'service_plan_name', 'CPU', 'RAM', 'NIC', 'DRIVE']]
#Export to Excel
df.to_excel("output.xlsx", sheet_name='Sheet_name_1', index=False)
The above should work, but I wasn't really sure how you wanted to save duplicated records for the same id. Do you want to store them as DRIVE_0, DRIVE_1, DRIVE_2?
EDIT:
df could be exported in a different way. Replace the line below #Export to Excel with the following:
df.to_excel("output.xlsx",sheet_name='Sheet_name_1')
EDIT 2:
With no input data it was hard to see any flaws. Corrected the code above using fake data.
To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.
One approach that would suit would be to use Python dictionaries for each service, updating them row by row.
from openpyxl import Workbook, load_workbook

wb = load_workbook("sample.xlsx")
ws = wb.active
objs = {}
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
    if row[0] not in objs:
        obj = {key: value for key, value in zip(headers, row)}
        objs[obj['ods_service_id']] = obj
    else:  # update the existing dict for this id with values that are not "NULL"
        extra = {key: value for key, value in zip(headers[3:], row[3:]) if value != "NULL"}
        objs[row[0]].update(extra)
# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for obj in objs.values():  # do they need sorting?
    ws2.append([obj[key] for key in headers])
wb2.save("output.xlsx")
Note how you can do everything without using counters.
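The dictionary approach can be tried without a workbook. The sketch below applies the same kind of merge to hypothetical sample rows (all values are invented for illustration, and `"NULL"` is assumed to be stored as a string, as in the question):

```python
headers = ("ods_service_id", "service_name", "service_plan_name",
           "CPU", "RAM", "NIC", "DRIVE")

# Invented rows: one service spread over four lines.
rows = [
    (1, "svc", "plan", "4", "NULL", "NULL", "NULL"),
    (1, "svc", "plan", "NULL", "16GB", "NULL", "NULL"),
    (1, "svc", "plan", "NULL", "NULL", "1", "NULL"),
    (1, "svc", "plan", "NULL", "NULL", "NULL", "500GB"),
]

merged = {}
for row in rows:
    record = dict(zip(headers, row))
    key = record["ods_service_id"]
    if key not in merged:
        merged[key] = record
    else:
        # Overwrite only with values that are not the "NULL" placeholder.
        merged[key].update({k: v for k, v in record.items() if v != "NULL"})

result = [[rec[h] for h in headers] for rec in merged.values()]
print(result)  # [[1, 'svc', 'plan', '4', '16GB', '1', '500GB']]
```

Each row of `result` can then be passed to `ws2.append()` as in the answer above.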

I want to update the last row in an excel spreadsheet, daily. OpenPyXl

The following is an excerpt from a function (the rest of its body is omitted; it is unrelated to this issue and has already been tested).
Objective: Write "val1a" (a dollar value acquired in another part of the function) and "t" to an Excel spreadsheet every day.
Right now, I have them mapped to cells A2 and B2, respectively. I can't figure out how to make them populate the next row each time the function is run (A2:B2, then A3:B3, and so on).
t = date.today()
ts = datetime.time(datetime.now())
wb = load_workbook('val1a.xlsx')
sheet = wb.worksheets[0]
# grab the active worksheet
ws = wb.active
ws['A1'] = 'PRICE'
ws['B1'] = 'DATE'
ws['C1'] = 'FED'
ws['D1'] = 'CTD'
ws['A2'] = val1a
ws['B2'] = t
# Save the file
wb.save('a1 ' + str(t) + ".xlsx")
# how to read values in excel
read1 = ws['A2'].value
ws.append() always writes its values to the next row after the last used row of the worksheet, so calling ws.append([val1a, t]) adds a new row on every run.
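A minimal sketch of that idea, using an in-memory workbook and hypothetical stand-in values for `val1a` and the date. The header row is appended once, and each later call adds one row below the last used row:

```python
from datetime import date
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(['PRICE', 'DATE'])           # header row, written once

# Hypothetical values standing in for val1a on two successive runs.
ws.append([101.5, date(2020, 1, 1)])
ws.append([102.25, date(2020, 1, 2)])

print(ws.max_row)      # 3
print(ws['A3'].value)  # 102.25
```

With an existing file, the rows only accumulate if the workbook is loaded with `load_workbook` and saved back under the same filename; the excerpt in the question saves to a new, dated filename on each run.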

Is there a better way to use OpenPyXL's defined_names feature to return values from an Excel Named Range?

I have an Excel workbook that has a worksheet called 'Professional Staff'. On that sheet, there is a range of cells named 'ProStaff'. I retrieve a list of the values in those cells with this code:
import openpyxl
wb = openpyxl.load_workbook(filename='SOexample.xlsx', read_only=True)
#Get the ProStaff range values
ProStaffRange = wb.defined_names['ProStaff']
#returns a generator of (worksheet title, cell range) tuples
dests = ProStaffRange.destinations
#use generator to create a list of (sheet, cell) tuples
cells = []
for title, coord in dests:
    ws = wb[title]
    cells.append(ws[coord])
#Above was from the OpenPyXL website
#Below is my best attempt to retrieve the values from those cells
cellsStr = []
startChar = '.'
stopChar = '>'
for item in cells[0]:
    itemStr = str(item)
    cellsStr.append( (itemStr.split("'")[1].strip(), itemStr[itemStr.find(startChar)+1:itemStr.find(stopChar)]) )
for item in cellsStr:
    print(wb[item[0]][item[1]].value)
The string manipulation I do takes something like:
(<ReadOnlyCell 'Professional Staff'.A1>,)
and turns it into:
('Professional Staff', 'A1')
It seems to me that there should be a way to work with the ReadOnlyCell items directly in order to retrieve their values, but I haven't been able to figure out how.
Try this, modified from something I saw elsewhere. It works for single-cell named ranges:
from openpyxl import load_workbook
wb = load_workbook('filename.xlsx', data_only=True)
ws = wb['sheet_name']
val = ws[list(wb.defined_names['single_cell_named_range'].destinations)[0][1]].value
print(val)
I'm using Openpyxl 2.5.12.
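For ranges with more than one cell, `ws[coord]` returns a tuple of rows, each of which is a tuple of cell objects, and every cell exposes `.value` directly, so no string parsing should be needed. Below is a sketch with an in-memory worksheet; the names and the coordinate `'A1:A3'` are hypothetical stand-ins for what `destinations` would yield, and read-only cells are assumed to expose `.value` in the same way:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = 'Professional Staff'
for name in ('Alice', 'Bob', 'Carol'):   # invented sample data
    ws.append([name])

# Stand-in for one (title, coord) pair from ProStaffRange.destinations.
title, coord = 'Professional Staff', 'A1:A3'

# Flatten the tuple of rows and read each cell's value directly.
values = [cell.value for row in wb[title][coord] for cell in row]
print(values)  # ['Alice', 'Bob', 'Carol']
```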

Python Openpyxl Writing in a cell

I cannot seem to write any value to an Excel sheet. I open two files at the same time and want to copy a value from file 1 to file 2. It gives this error:
File "C:\Python34\lib\site-packages\openpyxl\writer\dump_worksheet.py", line 214, in removed_method
raise NotImplementedError
Only the line that does the writing gives an error. The function code is as follows:
def data_input(size):
    from openpyxl import load_workbook
    wb1 = load_workbook('150318 load matching_Storage_v4.xlsm',data_only=True)
    wb1s1 = wb1.get_sheet_by_name('Home load options')
    from openpyxl import Workbook
    wb2 = Workbook('Data',write_only=True)
    wb2s1 = wb2.create_sheet(0)
    wb2s1.title = "Consumption"
    wb2s1.cell(row = 1, column = 1).value = 4  # this line gives the error
    # what I have to write, blocked out until I can test whether I can write at all
    '''i = 0
    r = 0
    while i < 8760:
        d = wb2s1.cell(row = r, column = 1)
        d.value = i
        i = i + 0.25
        r += 1'''
    for i in range(4,35040):
        cell_value1 = wb1s1.cell(row = i, column = (12+size)).value
        print(cell_value1)
        # cell_value1 = wb2s1.cell(row = i-3, column = 1)
    wb2.save('Data.xlsx')
I tried all the different ways in the documentation, but nothing has worked so far.
Please help.
Thank you.
You are creating a write-only workbook. As the name suggests, this is designed for streaming data to a workbook, so some operations, such as looking up cells, do not work. To add data you should use the append() method. If you need to add formatting or comments to individual cells, you can include a WriteOnlyCell in the iterable that you pass to append().
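A minimal sketch of the append-based approach, assuming the goal is one column of values on a sheet named "Consumption". The `BytesIO` buffer is an in-memory stand-in for 'Data.xlsx', and the four sample values are invented:

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.cell import WriteOnlyCell
from openpyxl.styles import Font

wb = Workbook(write_only=True)
ws = wb.create_sheet('Consumption')

# Rows are streamed in order with append(); cells cannot be looked up later.
header = WriteOnlyCell(ws, value='hours')
header.font = Font(bold=True)          # formatting goes on a WriteOnlyCell
ws.append([header])
for step in range(4):
    ws.append([step * 0.25])

buffer = BytesIO()                     # stand-in for a file on disk
wb.save(buffer)

# Read the result back to check what was written.
buffer.seek(0)
check = load_workbook(buffer)['Consumption']
values = [row[0].value for row in check.iter_rows()]
print(values)
```

If individual cells must be addressed with `cell(row=..., column=...)`, a regular workbook created with `Workbook()` (without `write_only=True`) supports that.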
