Writing output data into an xlsx sheet's rows in sequence with openpyxl - python

I'm trying to write the value of h.mediacount to a column (C) in Sheet1.
I can't figure out how to move on to the next cell for each new output, i.e. writing h.mediacount to cell C2, then looping and writing the next output to cell C3, and so on.
Here is my code as it stands.
book = load_workbook(path)
sheet = book['Sheet1']
column_name = 'username'
for column_cell in sheet.iter_cols(1, sheet.max_column):
    if column_cell[0].value == column_name:
        B = 0
        for data in column_cell[1:]:
            htag = data.value
            print(htag)
            h = Hashtag.from_name(l.context, htag)
            print(h.mediacount)
Please note the print(htag) and print(h.mediacount) are only there to demonstrate that the code works up to that point.
Update:
I've written this code out, however, it runs indefinitely without any errors, but also without any changes to the sheet. I am unable to see where it's going wrong as there are no errors.
column_name = 'username'
column_name2 = 'hashtags'
for column_cell in sheet.iter_cols(1, sheet.max_column):
    if column_cell[0].value == column_name:
        B = 0
        for data in column_cell[1:]:
            htag = data.value
            h = Hashtag.from_name(l.context, htag)
            if column_cell[0].value == column_name2:
                C = 0
                for cell in column_cell[1:]:
                    cell.value = h.mediacount
book.save('alpha list test.xlsx')
Update 2:
I tried adding print(h.mediacount) before
if column_cell[0].value == column_name2:
and it loops through that flawlessly, so the issue must be in the code underneath that writes to the workbook.
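One way to write each count next to the row it came from is to reuse the row number of the source cell instead of looking for a second column inside the loop. The following is only a sketch under assumptions: `fill_counts` and `fake_media_count` are hypothetical names, and `fake_media_count` stands in for `Hashtag.from_name(l.context, htag).mediacount` so the example runs without network access.

```python
from openpyxl import Workbook


def fake_media_count(tag):
    # Hypothetical stand-in for Hashtag.from_name(l.context, tag).mediacount
    return len(tag)


def fill_counts(sheet, header="username", target_col=3, lookup=fake_media_count):
    """Write lookup(value) into target_col for every value below `header`."""
    for column in sheet.iter_cols(min_col=1, max_col=sheet.max_column):
        if column[0].value != header:
            continue
        for cell in column[1:]:
            if cell.value is None:
                continue  # skip blank rows
            # cell.row keeps the output on the same row as the input
            sheet.cell(row=cell.row, column=target_col, value=lookup(cell.value))
        break  # the header was found, no need to scan further columns


# Small in-memory workbook with made-up data
book = Workbook()
sheet = book.active
sheet.append(["username", "hashtags"])
sheet.append(["python"])
sheet.append(["excel"])
fill_counts(sheet)
```

With a real file, `book.save(...)` would be called once after the loop has finished.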

Related

Openpyxl: We found a problem with some content

I am getting the error message 'We found a problem with some content' when opening a file I generated with openpyxl. The file is generated by concatenating different xlsx files and adding formulas in further cells.
The problem is caused by a formula with an IF condition that I am writing into a cell (the second for loop triggers the Excel error message).
Here is the code:
import openpyxl as op
import glob
# Search for all xlsx files in directory and assign them to variable allfiles
allfiles = glob.glob('*.xlsx')
print('Following files are going to be included into the inventory: ' + str(allfiles))
# Create a workbook with a sheet called 'Input'
risk_inventory = op.load_workbook('./Report/Risikoinventar.xlsx', data_only = False)
input_sheet = risk_inventory['Input']
risk_inventory.remove(input_sheet)
input_sheet = risk_inventory.create_sheet()
input_sheet.title = 'Input'
r_maxrow = input_sheet.max_row + 1
# There is more code here which is not related to the problem
for i in range (2,r_maxrow):
    if input_sheet.cell(row = i, column = 2).value == 'Top-Down':
        input_sheet.cell(row = i, column = 20).value = '=IF(ISTEXT(H{}),0,IF(H{}<=1000000,1,IF(H{}<=2000000,2,IF(H{}<=4000000,3,IF(H{}<=8000000,4,IF(H{}>8000000,5,0))))))'.format(i,i,i,i,i,i)
    elif input_sheet.cell(row = i, column = 2).value == 'Bottom-Up':
        input_sheet.cell(row = i, column = 20).value = '=IF(ISTEXT(H{}),0,IF(H{}<=1000000,1,IF(H{}<=2000000,2,IF(H{}<=4000000,3,IF(H{}<=8000000,4,IF(H{}>8000000,5,0))))))'.format(i,i,i,i,i,i)
for i in range (2,r_maxrow):
    if input_sheet.cell(row = i, column = 2).value == 'Top-Down':
        input_sheet.cell(row = i, column = 21).value = '=IF(K{}="Sehr gering",1,IF(K{}="Gering",2,IF(K{}="Mittel",3,IF(K{}="Hoc",3,IF(K{}="Sehr hoch",3,0))))))'.format(i,i,i,i,i,i)
    elif input_sheet.cell(row = i, column = 2).value == 'Bottom-Up':
        input_sheet.cell(row = i, column = 21).value = '=IF(K{}="Sehr gering",1,IF(K{}="Gering",2,IF(K{}="Mittel",3,IF(K{}="Hoc",3,IF(K{}="Sehr hoch",3,0))))))'.format(i,i,i,i,i,i)
So depending on what information is in cell(row = i, column = 2) I want a specific formula in cell(row = i, column = 21). The first for loop works perfectly; the second for loop causes the error message in Excel and the formulas are not being pasted in.
As you can probably already see, I have only been coding in Python for a week and had never tried coding before.
Many thanks in advance!
I've been having the same issue, and it was due to an incorrectly written formula. I found what was wrong by clicking "View" instead of "Delete" when opening the file.
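Counting the parentheses in the second formula of the question shows five `IF(` openings but six closing parentheses, which is the kind of incorrectly written formula that leads Excel to report a problem. A minimal sketch of a check that could be run before writing a formula to a cell follows; `paren_balance` is a hypothetical helper, not part of openpyxl, and the `{0}` placeholder is used here only to avoid repeating the row argument.

```python
def paren_balance(formula):
    """Return opened minus closed parentheses, ignoring text inside double quotes."""
    depth = 0
    in_string = False
    for ch in formula:
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
    return depth


row = 2
# Formula as written in the question: one closing parenthesis too many
broken = '=IF(K{0}="Sehr gering",1,IF(K{0}="Gering",2,IF(K{0}="Mittel",3,IF(K{0}="Hoc",3,IF(K{0}="Sehr hoch",3,0))))))'.format(row)
# Same formula with five closing parentheses to match the five IF( openings
fixed = '=IF(K{0}="Sehr gering",1,IF(K{0}="Gering",2,IF(K{0}="Mittel",3,IF(K{0}="Hoc",3,IF(K{0}="Sehr hoch",3,0)))))'.format(row)

print(paren_balance(broken), paren_balance(fixed))
```

A non-zero result points at an unbalanced formula; the check says nothing about whether the formula is otherwise valid for Excel.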

How can I concatenate multiple rows of excel data into one?

I'm currently facing an issue where I need to bring all of the data shown in the images below into one line only.
So using Python and Openpyxl, I tried to write a parsing script that reads the line and only copies when values are non-null or non-identical, into a new workbook.
I get out of range errors, and the code does not keep just the data I want. I've spent multiple hours on it, so I thought I would ask here to see if I can get unstuck.
I've read some documentation on Openpyxl and about making lists in python, tried a couple of videos on youtube, but none of them did exactly what I was trying to achieve.
import openpyxl
from openpyxl import Workbook
path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active
listab = []
rows = ws.max_row
columns = ws.max_column
for i in range (1, rows+1):
    listab.append([])
cellValue = " "
prevCell = " "
for c in range (1, rows+1):
    for r in range(1, columns+1):
        cellValue = ws.cell(row=r, column=c).value
        if cellValue == prevCell:
            listab[r-1].append(prevCell)
        elif cellValue == "NULL":
            listab[r-1].append(prevCell)
        elif cellValue != prevCell:
            listab[r-1].append(cellValue)
        prevCell = cellValue
for r in range(1, rows+1):
    for c in range (1, columns+1):
        j = ws2.cell(row = r, column=c)
        j.value = listab[r-1][c-1]
print(listab)
wb2.save("output.xlsx")
There should be one line with the below information:
ods_service_id | service_name| service_plan_name| CPU | RAM | NIC | DRIVE |
Personally I would go with pandas.
import pandas as pd
#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True) ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()
#Storing pd into a list
records_list = df_data.to_dict('records')
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed = {}
#Go through unique ids
for key in unique_ids:
    processed[key] = {}
    #Get related records
    matching_records = [y for y in records_list if y['ods_service_ids'] == key]
    #Loop through records
    for record in matching_records:
        #For each key to check, save in dict if non null
        processed[key]['ods_service_ids'] = key
        for detail_key in keys_to_check:
            if record[detail_key] != "NO DATA" :
                processed[key][detail_key] = record[detail_key]
                ##Note : doesn't handle duplicate values for different keys so far
#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]
#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)
The above should work, but I wasn't really sure how you wanted to save duplicated records for the same id. Are you looking to store them as DRIVE_0, DRIVE_1, DRIVE_2?
EDIT:
df could be exported in a different way. Replace the line below #Export to Excel with the following:
df.to_excel("output.xlsx",sheet_name='Sheet_name_1')
EDIT 2:
With no input data it was hard to see any flaws. I corrected the code above using fake data.
To be honest, I think you've managed to get confused by data structures and come up with something far more complicated than you need.
One approach that would suit would be to use Python dictionaries for each service, updating them row by row.
from openpyxl import Workbook, load_workbook

wb = load_workbook("sample.xlsx")
ws = wb.active
objs = {}
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
    if row[0] not in objs:
        obj = {key:value for key, value in zip(headers, row)}
        objs[obj['ods_service_id']] = obj
    else:  # update the stored dict for this id with values that are not "NULL"
        extra = {key:value for key, value in zip(headers[3:], row[3:]) if value != "NULL"}
        objs[row[0]].update(extra)
# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for obj in objs.values(): # do they need sorting?
    ws2.append([obj[key] for key in headers])
wb2.save("output.xlsx")  # same output file name as in the question
Note how you can do everything without using counters.
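The row-by-row dictionary idea can be tried without any workbook by working on plain tuples, which is the shape `iter_rows(values_only=True)` yields. This is a sketch under assumptions: `merge_rows` is a hypothetical helper, the sample rows are made up, and "first non-null value wins" is only one possible rule for duplicates.

```python
def merge_rows(headers, rows, key="ods_service_id", null="NULL"):
    """Collapse rows sharing the same key, keeping the first non-null value per column."""
    merged = {}
    for row in rows:
        record = dict(zip(headers, row))
        target = merged.setdefault(record[key], {})
        for name, value in record.items():
            # Keep a value only if it carries data and the column is still unset
            if value not in (None, null) and name not in target:
                target[name] = value
    # Columns that never received a value come out as None
    return [[rec.get(name) for name in headers] for rec in merged.values()]


headers = ("ods_service_id", "service_name", "service_plan_name",
           "CPU", "RAM", "NIC", "DRIVE")
# Made-up sample data in the layout described in the question
rows = [
    (1, "db", "small", 2, "NULL", "NULL", "NULL"),
    (1, "db", "small", "NULL", "4GB", "NULL", "NULL"),
    (1, "db", "small", "NULL", "NULL", "eth0", "100GB"),
    (2, "web", "large", 8, "NULL", "NULL", "NULL"),
]
result = merge_rows(headers, rows)
```

Each list in `result` could then be passed to `ws2.append(...)` to build the output sheet.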

Faster search method for the first empty cell in a column using openpyxl (Python 3.5)

I am having a problem searching for the first empty cell in a certain column of a 40k-line .xlsx file. As the search goes further, it becomes slower and slower. Is there a faster or instant way to find the first empty cell in a column?
wb = load_workbook(filename = dest_filename,read_only=True)
sheet_ranges1 = wb[name]
i = 1
x = 0
sam = 0
cc = 0
brgyst =Street+Brgy
entrylist = [TotalNoConfig,TotalNoChannel,Rsl,Mode,RslNo,Year,IssuedDate,Carrier,CaseNo,Site,brgyst,Municipality,Province,Region,Longitude1,Longitude2,Longitude3,Latitude1,Latitude2,Latitude3,ConvertedLong,ConvertedLat,License,Cos,NoS,CallSign,PTSVC,PTSVCCS,Tx,Rx] #The values to be inputted in the entire row after searching the last empty cell in column J
listX1 = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N', 'O','P','Q','T','U','V','R','X','Y','Z','AA','AB','AM','AN','AP','FL'] #The columns in the file
eter = 0
while(x != 1):
    cellS = 'J'+str(i) #until there is no empty cell
    if(sheet_ranges1[cellS].value is None): #if found empty cell, insert the values
        x=1
        book = load_workbook(filename = dest_filename)
        sheet = book[name]
        rangeof = int(len(entrylist))
        while(cc<rangeof):
            cells = listX1[cc]+str(i)
            sheet[cells]= entrylist[cc]
            cc=cc+1
    else:
        x=0
        sam = sam+1
    i=i+1
wb.save(dest_filename)
wb.close()
In read-only mode every cell lookup causes the worksheet to be parsed again, so you should always use ws.iter_rows() for your work.
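A single pass with `iter_rows()` restricted to the column of interest might look like the sketch below. `first_empty_row` is a hypothetical helper, and the workbook is built in memory with made-up data so the example is self-contained; column 10 corresponds to column J from the question.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def first_empty_row(ws, col_idx):
    """Return the 1-based row number of the first empty cell in column col_idx."""
    row_number = 0
    # One pass over a single column; values_only avoids building cell objects
    for row_number, (value,) in enumerate(
        ws.iter_rows(min_col=col_idx, max_col=col_idx, values_only=True), start=1
    ):
        if value is None:
            return row_number
    # No gap found: the first empty cell is just below the last row
    return row_number + 1


# Build a small sample workbook: column J is filled except in row 4
wb = Workbook()
ws = wb.active
for r in range(1, 6):
    ws.cell(row=r, column=1, value=r)
    if r != 4:
        ws.cell(row=r, column=10, value="x")
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

ro_wb = load_workbook(buffer, read_only=True)
found = first_empty_row(ro_wb.active, 10)
ro_wb.close()
```

Since a read-only workbook cannot be modified, the values would still have to be written through a second, normally loaded workbook, as the question already does.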

Pyexcel doesn't manipulate the cells I tell it to

I'm working with pyexcel to automatically open an Excel sheet, manipulate some data in it and save it again.
However, it only carries out the first manipulation and seems to ignore the others.
I access my file with
book = pyexcel.get_book(file_name=file_to_be_manipulated)
where file_to_be_manipulated holds the link to the file.
Then I have my sheets in a tuple like
sheets = ('first_sheet', 'second_sheet', etc.)
and access them via
sheet_name = book[sheets[sheet_index]]
To iterate over the cells I want to manipulate, I access the cells as shown below.
Here everything works: I iterate over the second column as long as there is something in it and 'delete' everything that is in the first two columns.
This works perfectly fine.
row = 5
column = 2
column_to_be_deleted = 0
second_column_to_be_deleted = 1
sheet_name = book[sheets[sheet_index]]
while sheet_name[row,column] != None:
    row_to_be_deleted = row
    second_row_to_be_deleted = row
    sheet_name[row_to_be_deleted, column_to_be_deleted] = ""
    sheet_name[second_row_to_be_deleted, second_column_to_be_deleted] = ""
    row += 1
However, here I just want to change columns 2 and 3 from 'empty' to 'Default' and 'x',
and strangely this doesn't work. The 'delete' in the first column works fine, but the other two manipulations have no effect and I can't figure out why.
row = 5
column = 1
column_to_be_deleted = 0
column_to_set_to_default = 2
column_to_set_to_something = 3
sheet_name = book[sheets[sheet_index]]
while sheet_name[row,column] != None:
    row_to_be_deleted = row
    row_to_set_to_default = row
    row_to_set_to_something = row
    sheet_name[row_to_be_deleted, column_to_be_deleted] = ""
    sheet_name[row_to_set_to_default, column_to_set_to_default] = "Default"
    sheet_name[row_to_set_to_something, column_to_set_to_something] = "x"
    row += 1
It only works if some string is already inside columns 2 and 3.
In the next case I want to change the value of column 11, row 5 to '1' and just delete the first column as in the other examples. The deletion works fine here as well, but the '0' in column 11, row 5 does not change to '1'.
if sheet_index == 13: #ORGANISATIONS SHEET, L6 MUST BE SET TO 1
    row = 5
    column = 1
    column_to_be_deleted = 0
    column_to_set_to_one = 11
    row_to_set_to_one = 5
    sheet_name = book[sheets[sheet_index]]
    sheet_name[row_to_set_to_one, column_to_set_to_one] = "1"
    while sheet_name[row,column] != None:
        row_to_be_deleted = row
        sheet_name[row_to_be_deleted, column_to_be_deleted] = ""
        row += 1
How come? It seems so random to me which command is executed and which is not.
The problem seems to be with how pyexcel looks at Excel sheets.
Pyexcel first determines how big the sheet is from where the last data entry is.
Then it creates an array of that size, and if you try to manipulate data outside this array it doesn't throw an error, but simply doesn't do what it was asked to do.
So if you want to manipulate data in a column where no data is filled in yet, you either have to create this column first (see the pyexcel documentation on Read the Docs for how to do that) or manually put some data in it beforehand.
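The idea of creating the column before writing to it can be illustrated with a plain list of lists. This sketch deliberately does not use pyexcel's API; `set_cell` is a hypothetical helper that only shows the "grow the array first, then assign" pattern the answer describes.

```python
def set_cell(grid, row, col, value, filler=""):
    """Write value at (row, col), growing the grid first if the position lies outside it."""
    width = max(col + 1, max((len(r) for r in grid), default=0))
    # Add missing rows, then pad every row to the common width
    while len(grid) <= row:
        grid.append([])
    for r in grid:
        r.extend([filler] * (width - len(r)))
    grid[row][col] = value


# The data only covers two columns; column index 3 does not exist yet
grid = [["id", "name"], [1, "a"]]
set_cell(grid, 1, 3, "x")
```

After the call every row has four entries, so the write at column index 3 lands inside the array instead of being dropped.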

openpyxl error: 'str' object has no attribute 'BLACK'

I am trying to set styles on an Excel spreadsheet using Python's openpyxl module. I keep running into this error:
'str' object has no attribute 'BLACK'
Basically, my code reads a list of known values from a .xlsx file and places them into a Python list. I use that list to check the values in a column of an Access table, to make sure the value in each cell is correct compared to the known values.
Where my script blows up is when I try to set styles using openpyxl. For some reason, the above error comes up. The weird thing is, I'm not even using BLACK in the styles anywhere, and it seems to error out when I try to set the fill. In the SearchCursor portion of the script, it iterates through each row. It's on the second pass that the script blows up. I have a feeling it wants to overwrite something, but I can't figure out what.
import openpyxl, arcpy
from arcpy import env
from openpyxl import Workbook
env.workspace = r"Z:\Access_Tables\Access_Table.gdb"
TableList = []
for row in arcpy.SearchCursor(r"Z:\Domains\Domains.xlsx\DOMAINS$"):
    TableList.append(row.Code)
# Create workbook for report. Openpyxl
workbook = openpyxl.Workbook()
ws = workbook.get_active_sheet()
ws.title = "Test"
workbook.remove_sheet(ws)
# List out the access tables in the workspace
for fc in arcpy.ListFeatureClasses():
    # Processing SOIL Point access table
    if fc == "SOIL":
        # List the column headings from the access table to be applied to the .xlsx table
        fieldnames = [f.name for f in arcpy.ListFields(fc)]
        # Create Sheet. Openpyxl
        new_sheet = workbook.create_sheet(None,fc)
        dictFieldnames = {}
        for num,fname in enumerate(fieldnames):
            dictFieldnames[num] = fname
            # Write to cell, openpyxl
            new_sheet.cell(None,0,num).value = fname
            col_let = openpyxl.cell.get_column_letter(num + 1)
            new_sheet.column_dimensions[col_let].width = len(fname) + 3
        # Process SOIL Field
        if "SOIL" in fieldnames:
            # Set a counter and Loop through each row of the access table
            x = 1
            for row in arcpy.SearchCursor(fc):
                for key, value in dictFieldnames.iteritems():
                    if value == "SOIL":
                        fieldKey = key
                        if not row.SOIL or len(row.SOIL.strip()) == 0:
                            # Openpyxl write. Set fill and color for cell. Write the unique id to the cell.
                            new_sheet.cell(None,x,fieldKey).style.fill.fill_type = openpyxl.style.Fill.FILL_SOLID
                            new_sheet.cell(None,x,fieldKey).style.fill.start_color.index = openpyxl.style.Color = 'FF808000'
                            new_sheet.cell(None,x,fieldKey).value = row.OBJECTID
                            x += 1
                            print 'first'
                        elif len(row.INCLUSION_TYPE) not in range(2,5):
                            # Openpyxl write. Set fill and color for cell. Write the unique id to the cell.
                            new_sheet.cell(None,x,fieldKey).style.fill.fill_type = openpyxl.style.Fill.FILL_SOLID
                            new_sheet.cell(None,x,fieldKey).style.fill.start_color.index = openpyxl.style.Color = 'FF2F4F4F'
                            new_sheet.cell(None,x,fieldKey).value = row.OBJECTID
                            x += 1
                            print 'second'
                        elif row.SOIL.upper() not in [y.upper() for y in TableList]:
                            # Openpyxl write. Set fill and color for cell. Write the unique id to the cell.
                            new_sheet.cell(None,x,fieldKey).style.fill.fill_type = openpyxl.style.Fill.FILL_SOLID
                            new_sheet.cell(None,x,fieldKey).style.fill.start_color.index = openpyxl.style.Color = 'FF00FFFF'
                            new_sheet.cell(None,x,fieldKey).value = row.OBJECTID
                            x += 1
                            print 'third'
                print x
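The problem is in the lines where you define the colors. The chained assignment also rebinds openpyxl.style.Color to a string, which is most likely why the error only shows up on the second pass: by then Color is a str, and a str has no attribute 'BLACK'. Just assign the color to style.fill.start_color.index directly. For example: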
new_sheet.cell(None,x,fieldKey).style.fill.start_color.index = 'FF808000'
instead of:
new_sheet.cell(None,x,fieldKey).style.fill.start_color.index = openpyxl.style.Color = 'FF808000'
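The effect of the chained assignment can be reproduced without openpyxl. In this sketch `style` and `cell` are hypothetical stand-ins built with `SimpleNamespace` (they are not the library's objects); the point is only that `a = b = value` assigns the same value to both targets.

```python
from types import SimpleNamespace


class Color:
    BLACK = 'FF000000'


# Hypothetical stand-ins for the openpyxl.style module and a cell's color object
style = SimpleNamespace(Color=Color)
cell = SimpleNamespace(index=None)

# Chained assignment: BOTH cell.index and style.Color now hold the string
cell.index = style.Color = 'FF808000'

try:
    style.Color.BLACK  # any later code relying on Color.BLACK now fails
except AttributeError as exc:
    message = str(exc)

print(message)
```

Assigning only to `cell.index` leaves `style.Color` untouched, which is what the corrected line in the answer does.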
