Pyexcel doesn't manipulate the cells I tell it to - python

I'm working with pyexcel to automatically open an Excel sheet, manipulate some data in it and save it again.
However, it only carries out the first manipulation and seems to ignore the others.
I access my file with
book = pyexcel.get_book(file_name=file_to_be_manipulated)
where file_to_be_manipulated holds the path to the file.
Then I have my sheet names in a tuple like
sheets = ('first_sheet', 'second_sheet', etc.)
and access them via
sheet_name = book[sheets[sheet_index]]
I then iterate over the cells I want to manipulate.
In this first case everything works: I iterate over column 2 as long as there is something in it and 'delete' everything in the first two columns (0 and 1).
This works perfectly fine.
row = 5
column = 2
column_to_be_deleted = 0
second_column_to_be_deleted = 1
sheet_name = book[sheets[sheet_index]]
while sheet_name[row,column] != None:
    row_to_be_deleted = row
    second_row_to_be_deleted = row
    sheet_name[row_to_be_deleted, column_to_be_deleted] = ""
    sheet_name[second_row_to_be_deleted, second_column_to_be_deleted] = ""
    row += 1
However, in this case I just want to change columns 2 and 3 from empty to 'Default' and 'x', and strangely this doesn't work.
The 'delete' in the first column works fine, but the other two manipulations have no effect and I can't figure out why.
row = 5
column = 1
column_to_be_deleted = 0
column_to_set_to_default = 2
column_to_set_to_something = 3
sheet_name = book[sheets[sheet_index]]
while sheet_name[row,column] != None:
    row_to_be_deleted = row
    row_to_set_to_default = row
    row_to_set_to_something = row
    sheet_name[row_to_be_deleted, column_to_be_deleted] = ""
    sheet_name[row_to_set_to_default, column_to_set_to_default] = "Default"
    sheet_name[row_to_set_to_something, column_to_set_to_something] = "x"
    row += 1
It only works if columns 2 and 3 already contain some string.
In the next case I want to change the value in column 11, row 5 to '1' and just delete the first column as in the other examples. The deletion works fine here as well, but the '0' in column 11, row 5 won't change to '1'.
if sheet_index == 13: #ORGANISATIONS SHEET, L6 MUST BE SET TO 1
    row = 5
    column = 1
    column_to_be_deleted = 0
    column_to_set_to_one = 11
    row_to_set_to_one = 5
    sheet_name = book[sheets[sheet_index]]
    sheet_name[row_to_set_to_one, column_to_set_to_one] = "1"
    while sheet_name[row,column] != None:
        row_to_be_deleted = row
        sheet_name[row_to_be_deleted, column_to_be_deleted] = ""
        row += 1
Why is that? It seems random to me which command is executed and which is not.

The problem seems to be with how pyexcel reads Excel sheets.
pyexcel first determines how big the sheet is from the position of the last data entry.
It then creates an array of that size, and if you try to manipulate data outside this array it doesn't raise an error; it silently does nothing.
So if you want to manipulate data in a column that contains no data yet, you either have to create this column first (see the pyexcel documentation on Read the Docs) or manually enter some data in it.
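One way to sidestep the size limit is to pad every row to the required width before writing. The following is a minimal sketch that works on a plain list of lists rather than on a pyexcel sheet object; `pad_rows` and the sample data are hypothetical. With pyexcel, such a list could presumably be obtained with `pyexcel.get_array(file_name=...)` and written back with `pyexcel.save_as(array=..., dest_file_name=...)`, but that round trip is an assumption and is not shown here.

```python
def pad_rows(rows, min_width, fill=""):
    """Return a copy of rows in which every row has at least min_width cells."""
    return [list(row) + [fill] * max(0, min_width - len(row)) for row in rows]


# Hypothetical data: columns 0-2 exist, column 3 does not exist yet.
rows = [["id", "name", "type"], [1, "a", ""], [2, "b", ""]]

padded = pad_rows(rows, min_width=4)
for row in padded[1:]:
    row[2] = "Default"  # column that already existed
    row[3] = "x"        # column that only exists after padding

print(padded)
```

Because the rows are padded first, the assignment to column 3 cannot fall outside the array.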

Related

Inserting a new column with openpyxl removes previous column width adjustment?

Below is a snippet from my code that's supposed to create a new E column and insert the current time in E1.
os.chdir(r'C:\Users\daani\Documents\Programmering\automate_online-materials')
if 'ownUpdatedProduce.xlsx' in os.listdir(os.getcwd()):
    print('Updated excel file found. Updating this one')
    mainFileName = 'ownUpdatedProduce.xlsx'
else:
    print('Updated excel file not found. Creating one')
    mainFileName = 'produceSales.xlsx'
wb = openpyxl.load_workbook(mainFileName)
sheet = wb[(wb.sheetnames[0])]
sheet.insert_cols(idx=5, amount=1)
sheet.column_dimensions['E'].width = 19
sheet['E1'] = currentTime
sheet['E1'].font = capitalFont
After that it does some other things to the file and saves it. The only thing that does not work as intended is the E column. The script is meant to insert a new column at the fifth position (E), set its width to 19, and insert the current date in bold every time it is run. It does that perfectly the first time it is run (when the updated file does not exist yet). But on subsequent runs, as it inserts a new column, the previously created ones have their width reset to Excel's default. So whatever was in column E on the first run is now in column F with a much narrower width. I've added a picture of the Excel file after 3 runs to try to clarify what I mean.
As Charlie Clark commented, one needs to back-fill the widths. Apparently, insert_cols in openpyxl does not update the mapping between column letters and widths. Consequently, the widths have to be shifted by one column by hand whenever a column is inserted.
import openpyxl
mainFileName = 'produceSales.xlsx'
wb = openpyxl.load_workbook(mainFileName)
ws = wb.active
col = 5
ws.insert_cols(col,1)
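# Walk right to left so each width is copied before it is overwritten.
# ws.max_column already includes the inserted column, so start there.
for i in range(ws.max_column, col, -1):
    l = openpyxl.utils.cell.get_column_letter(i) # Column that receives the width
    m = openpyxl.utils.cell.get_column_letter(i-1) # Column to its left, which still holds the old width
    ws.column_dimensions[l].width = ws.column_dimensions[m].width
col_letter = openpyxl.utils.cell.get_column_letter(col)
ws.column_dimensions[col_letter].width = 19 # was `sheet`, which is not defined in this snippet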
wb.save(mainFileName)
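The shifting logic can be checked without a workbook. The sketch below models the letter-to-width mapping as a plain dict; `column_letter` and `shift_widths` are hypothetical helpers written for this illustration and are not part of openpyxl.

```python
def column_letter(index):
    """Convert a 1-based column index to its Excel letter (1 -> 'A', 27 -> 'AA')."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def shift_widths(widths, inserted_at, last_column):
    """Return a new letter -> width map after one column is inserted.

    inserted_at is the 1-based index of the new column; last_column is the
    index of the last column after the insertion.
    """
    shifted = dict(widths)
    for i in range(last_column, inserted_at, -1):
        previous, current = column_letter(i - 1), column_letter(i)
        if previous in widths:
            shifted[current] = widths[previous]  # width moves one column right
        else:
            shifted.pop(current, None)           # neighbour had no explicit width
    shifted.pop(column_letter(inserted_at), None)  # new column starts at the default
    return shifted


# Hypothetical widths before inserting a column at position 5 (E).
before = {"A": 10, "E": 19, "F": 25}
after = shift_widths(before, inserted_at=5, last_column=7)
print(after)
```

Reading from the original mapping and writing to a copy means the iteration order no longer matters, which is a slightly more forgiving design than copying in place.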

Is there a way to find the current row of the iteration using 'openpyxl' on Python?

I'm working with an xlsx file that is divided into sections by empty rows, and each section displays its information in a different manner, i.e. with different columns.
So I'm basically trying to find the section that I'm looking for ('Ação') and create a range from the line after it, where the headers are, until the next empty row, so I can create a DataFrame from this range.
When I try to print the index, it returns a tuple containing the values of the row, but I couldn't find a way to get its index (an integer).
from openpyxl import load_workbook
data = '2019/02/07'
symbol = 'EQTL3'
ano = data[0:4]
mes = data[5:7]
dia = data[8:10]
file = "Fundo_{}{}{}.xlsx".format(ano, mes, dia)
wb = load_workbook(filename=file, read_only=False)
ws = wb["Fundo_{}{}{}".format(ano, mes, dia)]
for cell in ws['A']:
    if (cell.value == 'Ação'):
        x = int(cell.coordinate[1:]) + 1
        for index in ws.iter_rows(min_row=x, max_col=ws.max_column, max_row=ws.max_row, values_only=True):
            if (index[0] == None):
                y = ws._current_row
                break
I expect to receive an integer with the index of the last non-empty row.
You can use enumerate for that.
Something like this:
for row_idx, row_of_cells in enumerate(ws.iter_rows(min_row=x, values_only=True), start=1):
    ...  # row_idx counts from 1 within the iterated range; pass start=x to get the worksheet row number
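To make the role of `start` concrete, here is a small sketch that uses a list of tuples as a stand-in for what `ws.iter_rows(min_row=x, values_only=True)` yields. The function name and the sample rows are hypothetical; the point is only that passing the first row number as `start` makes `enumerate` return worksheet row numbers directly.

```python
def first_empty_row(rows, first_row_number):
    """Return the sheet row number of the first row whose first cell is None."""
    for row_number, row in enumerate(rows, start=first_row_number):
        if row[0] is None:
            return row_number
    return None  # no empty row found


# Hypothetical section whose header sits on worksheet row 4.
rows = [
    ("Ticker", "Qty"),  # row 4
    ("EQTL3", 100),     # row 5
    ("ABCD4", 50),      # row 6
    (None, None),       # row 7, the empty separator
    ("Other", 1),       # row 8
]

empty_row = first_empty_row(rows, first_row_number=4)
last_data_row = empty_row - 1
print(empty_row, last_data_row)
```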

Quickly count non empty cells in large excel sheet

I'm trying to determine how much data is missing from a large excel sheet. The following code takes a prohibitive amount of time to complete. I've seen similar questions, but I'm not sure how to translate the answer to this case. Any help would be appreciated!
import openpyxl
wb = openpyxl.load_workbook('C://Users/Alec/Documents/Vertnet master list.xlsx', read_only = True)
sheet = wb.active
lat = 0
loc = 0
ele = 0
a = openpyxl.utils.cell.column_index_from_string('CF')
b = openpyxl.utils.cell.column_index_from_string('BU')
c = openpyxl.utils.cell.column_index_from_string('BX')
print('Workbook loaded')
for x in range(2, sheet.max_row):
    if sheet.cell(row = x, column = a).value:
        lat += 1
    if sheet.cell(row = x, column = b).value:
        loc += 1
    if sheet.cell(row = x, column = c).value:
        ele += 1
    print((x/sheet.max_row) * 100, '%')
print('Latitude: ', lat/sheet.max_row)
print('Location', loc/sheet.max_row)
print('Elevation', ele/sheet.max_row)
If you are simply trying to do the calc on a table on the sheet and not the entire sheet, you could make one adjustment to make it faster.
row = 1
Do Until IsEmpty(Range("A1").Offset(row, 1).Value)
    If Not IsEmpty(Range("B" & row).Value) Then lat = lat + 1
    If Not IsEmpty(Range("C" & row).Value) Then loc = loc + 1
    If Not IsEmpty(Range("D" & row).Value) Then ele = ele + 1
    row = row + 1
Loop
This would take you to the end of your defined table rather than the end of the whole sheet which is 90% of the reason it's taking you so long.
Hope this helps
Your problem is that, despite advice in the documentation to the contrary, you're using your own counters to access cells. In read-only mode each use of ws.cell() forces the worksheet to reparse its XML source. Simply use ws.iter_rows(min_col=b, max_col=a) to get the cells in the columns you're interested in (BU is the leftmost and CF the rightmost of the three, so b is the lower bound and a the upper one).
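A single pass over the rows is enough to collect all three counts. The sketch below uses tuples as a stand-in for the rows that `ws.iter_rows(min_col=73, max_col=84, values_only=True)` would yield (BU is column 73 and CF is column 84, so each row has 12 values). `count_filled`, the offsets and the sample rows are hypothetical.

```python
def count_filled(rows, column_offsets):
    """Count non-empty cells per named column in one pass over the rows."""
    counts = {name: 0 for name in column_offsets}
    total = 0
    for row in rows:
        total += 1
        for name, offset in column_offsets.items():
            if row[offset] not in (None, ""):
                counts[name] += 1
    return counts, total


# Offsets are relative to the first iterated column (BU):
# BU -> 0, BX -> 3, CF -> 11.
offsets = {"location": 0, "elevation": 3, "latitude": 11}

# Hypothetical rows, 12 values each.
rows = [
    ("loc1", None, None, 12.0) + (None,) * 7 + (45.1,),
    (None,) * 12,
    ("loc3",) + (None,) * 11,
]

counts, total = count_filled(rows, offsets)
for name, filled in counts.items():
    print(name, filled / total)
```

Unlike the truthiness test in the question, this check treats a numeric 0 as present data, which is probably what is wanted for elevation and latitude.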

Faster search method on the first empty cell in a column using openpyxl PYTHON 3.5

I am having a problem searching for the first empty cell in a certain column of a 40k-line .xlsx file. The further the search goes, the slower it becomes. Is there a faster or instant way to find the first empty cell in a column?
wb = load_workbook(filename = dest_filename,read_only=True)
sheet_ranges1 = wb[name]
i = 1
x = 0
sam = 0
cc = 0
brgyst =Street+Brgy
entrylist = [TotalNoConfig,TotalNoChannel,Rsl,Mode,RslNo,Year,IssuedDate,Carrier,CaseNo,Site,brgyst,Municipality,Province,Region,Longitude1,Longitude2,Longitude3,Latitude1,Latitude2,Latitude3,ConvertedLong,ConvertedLat,License,Cos,NoS,CallSign,PTSVC,PTSVCCS,Tx,Rx] #The values to be inputted in the entire row after searching the last empty cell in column J
listX1 = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N', 'O','P','Q','T','U','V','R','X','Y','Z','AA','AB','AM','AN','AP','FL'] #The columns in the file
eter = 0
while(x != 1):
    cellS = 'J'+str(i) #until there is no empty cell
    if(sheet_ranges1[cellS].value is None): #if found empty cell, insert the values
        x=1
        book = load_workbook(filename = dest_filename)
        sheet = book[name]
        rangeof = int(len(entrylist))
        while(cc<rangeof):
            cells = listX1[cc]+str(i)
            sheet[cells]= entrylist[cc]
            cc=cc+1
    else:
        x=0
        sam = sam+1
        i=i+1
book.save(dest_filename) # save the writable workbook; one opened with read_only=True cannot be saved
wb.close()
In read-only mode every cell lookup causes the worksheet to be parsed again, so you should always use ws.iter_rows() for your work.
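As a rough sketch of the single-pass idea, the helper below walks one column's values once and stops at the first empty one. It takes any iterable of values; with openpyxl these could presumably come from something like `ws.iter_rows(min_col=10, max_col=10, values_only=True)` (column J), unpacking each one-element tuple. `first_empty_in_column` is a hypothetical name.

```python
def first_empty_in_column(values, start=1):
    """Return the 1-based row number of the first None value.

    If every value is filled, the row just after the data is returned.
    """
    next_row = start
    for value in values:
        if value is None:
            return next_row
        next_row += 1
    return next_row


# Hypothetical contents of column J, top to bottom.
column_j = ["a", "b", None, "d"]
print(first_empty_in_column(column_j))
```

Because the values are consumed sequentially, the cost grows linearly with the number of rows instead of reparsing the sheet for each lookup.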

data validation range Django and xlsxwriter

I have been using Django and xlsxwriter on a project that I am working on. I want to use data_validation in Sheet1 to pull in the lists that I have printed in Sheet2. I get the lists to print, but am not seeing the data_validation in Sheet1 when I open the file. Any insight on what I am doing incorrectly is much appreciated!
wb = xlsxwriter.Workbook(TestCass)
sh_1 = wb.add_worksheet()
sh_2 = wb.add_worksheet()
col = 15
head_col = 0
for header in headers:
    sh_1.write(0,head_col,header)
    sh_2.write(0,head_col,header)
    list_row = 1
    list = listFunction(headerToModelDic[header])
    for entry in list:
        sh_2.write(list_row,col,entry)
        list_row += 1
    sh_1.data_validation(1,col,50,col,{'validate':'list','source':'=Sheet2!$A2:$A9'})
    col += 1
wb.close()
Note: The reason I am not pulling the list directly from the site is because it is too long (longer than 256 characters). Secondly, I ultimately would like the source range in the data validation to take in variables from sheet2, however I cannot get sheet 1 to have any sort of data validation as is so I figured I would start with the absolute values.
It looks like the data ranges are wrong in the example. It appears that you are writing the list data to one column (column index 15, i.e. column P) but the data validation refers to a different range (A2:A9).
Maybe in your full example there is data in that range but in the example above there isn't.
I've modified your example slightly to a non-Django example with some sample data. I've also changed the data validation range to match the written data range:
import xlsxwriter
wb = xlsxwriter.Workbook('test.xlsx')
sh_1 = wb.add_worksheet()
sh_2 = wb.add_worksheet()
col = 15
head_col = 0
headers = ['Header 1']
for header in headers:
    sh_1.write(0,head_col,header)
    sh_2.write(0,head_col,header)
    list_row = 1
    list = [1, 2, 3, 4, 5]
    for entry in list:
        sh_2.write(list_row,col,entry)
        list_row += 1
    sh_1.data_validation(1,col,50,col,
                         {'validate':'list','source':'=Sheet2!$P2:$P6'})
    col += 1
wb.close()
And here is the output: [screenshot of the resulting workbook, not included]
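Since the question mentions wanting the source range to follow variables eventually, here is a small sketch of building that range string from the same zero-based row and column indices that are passed to `write()`. `col_to_name` and `list_source` are hypothetical helpers written for this illustration; xlsxwriter also ships similar helpers in `xlsxwriter.utility` (for example `xl_range_abs`), which may be preferable in real code.

```python
def col_to_name(col):
    """Convert a 0-based column index to its Excel letter (0 -> 'A', 15 -> 'P')."""
    name = ""
    col += 1
    while col > 0:
        col, remainder = divmod(col - 1, 26)
        name = chr(ord("A") + remainder) + name
    return name


def list_source(sheet_name, first_row, last_row, col):
    """Build an absolute single-column range from 0-based row and column indices."""
    letter = col_to_name(col)
    return f"={sheet_name}!${letter}${first_row + 1}:${letter}${last_row + 1}"


# Five entries written to rows 1-5 (0-based) of column 15, as in the example above.
source = list_source("Sheet2", first_row=1, last_row=5, col=15)
print(source)
```

The resulting string can then be passed as the `'source'` value of the data validation options.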
