python xlrd index array out of range in matrix format

I would like to build a dict using a concatenation of column and row headers as the keys. In this format, cell (0,0) is blank.
I want the script to start at the second column heading, concatenate its string value with the value in the second row of the first column, then with the next row, and so forth until there are no more rows. If the cell where that column and row intersect is blank, it should skip making a key. If it is not blank, it should make the key with the corresponding cell value as its value.
So 0.5_20.00 would not be created, and .05_32.00: 9.00 would be created.
Once it reaches the last row, it should move to the third column (column[2]) and do the same thing until there are no more columns.
import xlrd
#make dict from excel workbook, size and frequency
serParams = {}
wb = xlrd.open_workbook(r"S:\Shared\Service_Levels.xlsx")
sh = wb.sheet_by_index(0)
# Get row range value
row_range_value = 0
for i in sh.col(0):
    row_range_value += 1
row_range_value = row_range_value -1
print(row_range_value)
# Get column range value
column_range_value = 0
for i in sh.row(0):
    column_range_value += 1
column_range_value = column_range_value - 1
print(column_range_value)
# build the dict by using concatenated column and row headers as keys and
# corresponding values as key value
for i in range(0,column_range_value,1):
    for j in range(0,row_range_value,1):
        if sh.cell(i+1,j+1).value != '':
            key = str(sh.cell(i+1,j).value) +"_"+str(sh.cell(i,j+1).value)
            serParams[key] = sh.cell(i+1, j+1).value
Maybe the format of my data is causing the index error because the ranges exceed the table data with i+1 and j+1 once the loop reaches the end of the table? I tried to address this by subtracting 1 from the range values but I continue to get the index error.
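One likely cause, judging from the code in the question, is argument order: xlrd's sh.cell(rowx, colx) takes the row index first, but the outer counter i runs over columns and is passed as the row index, while j runs over rows and is passed as the column index. On any table that is not square, one of them runs past the last row or column. Below is a minimal sketch of the intended traversal, assuming the sheet has already been read into a list of rows; build_params is a hypothetical helper name, and keys are formatted as column header, underscore, row header, as the question describes.

```python
def build_params(grid):
    """Map "<column header>_<row header>" to each non-blank cell value.

    grid is a list of rows (for example built from an xlrd sheet with
    [sh.row_values(r) for r in range(sh.nrows)]): grid[0] holds the
    column headers, grid[r][0] holds the row headers, and grid[0][0]
    is the blank corner cell. Hypothetical helper, not part of xlrd.
    """
    params = {}
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    for col in range(1, n_cols):        # skip the row-header column
        for row in range(1, n_rows):    # skip the column-header row
            value = grid[row][col]      # row index first, then column
            if value == "":             # xlrd returns '' for empty/blank cells
                continue
            key = "{}_{}".format(grid[0][col], grid[row][0])
            params[key] = value
    return params


grid = [
    ["", 0.05, 0.5],
    [20.0, 1.0, ""],
    [32.0, 9.0, 4.0],
]
print(build_params(grid))
```

Because both loops are bounded by the grid's own size, they cannot index past the table. xlrd returns numbers as floats, so a header of 20.00 ends up as "20.0" in the key; format the headers explicitly (for example with "{:.2f}") if the exact "0.5_20.00" form matters. Note also that xlrd 2.0 and later no longer open .xlsx files such as Service_Levels.xlsx, so an older xlrd or openpyxl is needed for those.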

Related

Is there a way to find the current row of the iteration using 'openpyxl' on Python?

I'm working with an xlsx file that is divided into sections separated by empty rows, and each section displays its information in a different manner, i.e. with different columns.
So I'm basically trying to find the section I'm looking for ('Ação') and create a range from its next line, where the headers are, until the next empty row, so that I can create a DataFrame from this range.
When I try to print the index, it returns a tuple containing the values of the row, but I couldn't find a way to return its index (an integer).
from openpyxl import load_workbook
data = '2019/02/07'
symbol = 'EQTL3'
ano = data[0:4]
mes = data[5:7]
dia = data[8:10]
file = "Fundo_{}{}{}.xlsx".format(ano, mes, dia)
wb = load_workbook(filename=file, read_only=False)
ws = wb["Fundo_{}{}{}".format(ano, mes, dia)]
for cell in ws['A']:
    if (cell.value == 'Ação'):
        x = int(cell.coordinate[1:]) + 1
        for index in ws.iter_rows(min_row=x, max_col=ws.max_column, max_row=ws.max_row, values_only=True):
            if (index[0] == None):
                y = ws._current_row
                break
I expect to receive an integer value with the index of the last non-empty row.

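You can use enumerate for that, something like this:
for row_idx, row_of_cells in enumerate(ws.iter_rows(min_row=x, values_only=True), start=x):
Passing start=x (instead of start=1) makes row_idx equal to the worksheet row number rather than the position within the section.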
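Putting the enumerate idea together, here is a sketch that finds the 'Ação' section and the last non-empty row after it. It is written against plain row tuples so it can run without a workbook; section_bounds is a hypothetical helper, and it assumes, as in the question, that the section title is in column A, the headers are on the next line, and the section ends at the first row whose column A is empty.

```python
def section_bounds(rows, title, first_row=1):
    """Return (header_row, last_row) for the section titled `title`.

    rows is any iterable of row-value tuples, e.g.
    ws.iter_rows(values_only=True); first_row is the sheet row number
    of the first tuple. Both returned numbers are sheet row numbers;
    either is None if not found. Hypothetical helper, not openpyxl API.
    """
    header_row = None
    last_row = None
    for row_idx, values in enumerate(rows, start=first_row):
        first = values[0] if values else None
        if header_row is None:
            if first == title:
                header_row = row_idx + 1  # headers sit on the next line
            continue
        if first is None:                 # first empty row ends the section
            break
        last_row = row_idx                # last non-empty row seen so far
    return header_row, last_row


rows = [
    ("Outro", None),
    (None, None),
    ("Ação", None),
    ("Ativo", "Qtd"),
    ("EQTL3", 10),
    ("PETR4", 5),
    (None, None),
    ("Opção", None),
]
print(section_bounds(rows, "Ação"))
```

With openpyxl this would be called as section_bounds(ws.iter_rows(values_only=True), 'Ação'); the data rows can then be read with ws.iter_rows(min_row=header_row + 1, max_row=last_row, values_only=True) and passed to pandas.DataFrame, using the header row's values as columns.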

openpyxl: Iterate through all the rows and get the row data in a tuple

How do I iterate through all the rows in an xls sheet and get each row's data as a tuple? At the end of the iteration, I should have a list of tuples, each element in the list being a tuple of one row's data.
For instance, this is the content of my spreadsheet:
testcase_ID input_request request_change
test_1A test/request_1 YES
test_2A test/request_2 NO
test_3A test/request_3 YES
test_4A test/request_4 YES
My final list should be:
[('test_1A', 'test/request_1', 'YES'),
('test_2A', 'test/request_2', 'NO'),
('test_3A', 'test/request_3', 'YES'),
('test_4A', 'test/request_4', 'YES')]
How can I do this in openpyxl?

I think this task would be easier with xlrd. However, if you want to use openpyxl, then assuming that testcase_ID is in column A, input_request in column B, and request_change in column C, something like this might be what you are looking for:
import openpyxl as xl
#Opening xl file
wb = xl.load_workbook('PATH/TO/FILE.xlsx')
#Select your sheet (for this example I chose the active sheet)
ws = wb.active
#Start row, where data begins
row = 2
testcase = '' #this is just so that you can enter the while loop
#Initializing list
final_list = []
#With each iteration we get the value of testcase; if the cell is empty,
#testcase will be None, and when that happens the while loop will stop
while testcase is not None:
    #Getting cell values from columns A, B and C
    #Iterating through rows 2, 3, 4 ...
    testcase = ws['A' + str(row)].value
    in_request = ws['B' + str(row)].value
    req_change = ws['C' + str(row)].value
    #Making tuple
    row_tuple = (testcase, in_request, req_change)
    #Adding tuple to list
    final_list.append(row_tuple)
    #Going to next row
    row += 1
#This is what you return; you don't want the last element
#because it is a tuple of None's
print(final_list[:-1])
If you want to do it with xlrd, this is how I would do it:
import xlrd
#Opening xl file (xlrd 2.0 and later only read .xls files, not .xlsx)
wb = xlrd.open_workbook('PATH/TO/FILE.xls')
#Select your sheet (for this example I chose the first sheet)
#you can also choose by name or something else
ws = wb.sheet_by_index(0)
#Getting number of rows and columns
num_row = ws.nrows
num_col = ws.ncols
#Initializing list
final_list = []
#Iterating over number of rows
for i in range(1,num_row):
    #list of row values
    row_values = []
    #Iterating over number of cols
    for j in range(num_col):
        row_values.append(ws.cell_value(i,j))
    #Making tuple with row values
    row_tuple = tuple(row_values)
    #Adding tuple to list
    final_list.append(row_tuple)
print(final_list)
Notes on xlrd indexing, added at the end for easy reading:
I deleted the if statement: when num_row is 1, the for loop simply never runs.
xlrd indexes rows beginning at 0, so for spreadsheet row 2 we want index 1.
Columns are also zero-indexed (A=0, B=1, C=2...).
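On openpyxl 2.6 or later, iter_rows(values_only=True) already yields each row as a tuple of values, so the cell-by-cell lookups can be dropped. A sketch, assuming the data starts in row 2 and ends at the first row with an empty column A (the same stop condition as the while-loop answer); rows_as_tuples is a hypothetical helper name, and the in-memory workbook only stands in for load_workbook('PATH/TO/FILE.xlsx').

```python
from openpyxl import Workbook


def rows_as_tuples(ws, min_row=2):
    """Return each data row as a tuple of cell values.

    Reading starts at min_row and stops at the first row whose
    column A is empty (hypothetical helper, not part of openpyxl).
    """
    result = []
    for values in ws.iter_rows(min_row=min_row, values_only=True):
        if values[0] is None:  # empty testcase_ID: end of the table
            break
        result.append(values)
    return result


# In-memory stand-in for load_workbook('PATH/TO/FILE.xlsx')
wb = Workbook()
ws = wb.active
ws.append(["testcase_ID", "input_request", "request_change"])
ws.append(["test_1A", "test/request_1", "YES"])
ws.append(["test_2A", "test/request_2", "NO"])
ws.append(["test_3A", "test/request_3", "YES"])

final_list = rows_as_tuples(ws)
print(final_list)
```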

Python: count number of empty cells in excel sheet of data

I'm relatively new to Python, and I'm attempting to count the number of empty cells in an Excel sheet filled with data. To test the program, I've been deleting some values so that the cells are empty. My code is below:
import xlrd
import pandas as pd
import openpyxl
df = pd.read_excel('5train.xls')
workbook = xlrd.open_workbook('5train.xls')
worksheet = workbook.sheet_by_name('5train')
#Task starts here
empty = 0
row_data = worksheet.nrows - 1
row = 0
cell = 0
while row < row_data:
    if worksheet.cell(0, 0).value == xlrd.empty_cell.value:
        empty += 1
        cell += 1
    else:
        pass
    row += 1
print("Number of empty cells in data sheet:", empty)
However, the code will consistently print "Number of empty cells in data sheet: 0" no matter how many cells I empty. Any pointers? Thank you!

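You always check the same cell in your loop:
if worksheet.cell(0, 0).value == xlrd.empty_cell.value:
Only the cell at row 0, column 0 is ever checked. The loop also only advances through rows, never columns, and because row_data is nrows - 1, it stops before the last row.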

You can iterate over each row through the last row that contains data using .get_rows(), then count the empty cells by checking the type of each cell in each row. Checking the cell type, rather than whether c.value is falsy, keeps cells that contain 0 from being counted as empty.
import xlrd
EMPTY_TYPES = (xlrd.XL_CELL_EMPTY, xlrd.XL_CELL_BLANK)
workbook = xlrd.open_workbook('5train.xls')
worksheet = workbook.sheet_by_name('5train')
empty_cells = 0
for row in worksheet.get_rows():
    empty_cells += sum(1 for c in row if c.ctype in EMPTY_TYPES)
If you want to make it a one-liner, you can use:
empty_cells = sum(1 for row in worksheet.get_rows() for c in row if c.ctype in EMPTY_TYPES)
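Since the question already loads the sheet with pandas, another option is to let pandas count the gaps: read_excel turns empty cells into NaN, and isna() flags them. A sketch with count_empty_cells as a hypothetical helper; it counts only the data area, because the first row becomes the column labels, and it does not treat 0 as empty.

```python
import pandas as pd


def count_empty_cells(df):
    """Count missing cells (NaN/None) in the data area of a DataFrame.

    Hypothetical helper: df.isna() marks each missing cell True,
    the first .sum() totals per column, the second totals overall.
    """
    return int(df.isna().sum().sum())


# In practice: df = pd.read_excel('5train.xls', sheet_name='5train')
df = pd.DataFrame({"a": [1, None, 3], "b": [0, 2, None]})
print(count_empty_cells(df))  # 2: the zero in column b is not counted
```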

Openpyxl: How to copy a row after checking if a cell contains specific value

I have a worksheet that is updated every week with thousands of rows, and I need to transfer rows from this worksheet after filtering. I am using the code below to find the cells that have the value I need and then transfer the entire row to another sheet, but after saving the file I get the "IndexError: list index out of range" exception.
The code I use is as follows:
import openpyxl
wb1 = openpyxl.load_workbook('file1.xlsx')
wb2 = openpyxl.load_workbook('file2.xlsx')
ws1 = wb1.active
ws2 = wb2.active
for row in ws1.iter_rows():
    for cell in row:
        if cell.value == 'TrueValue':
            n = 'A' + str(cell.row) + ':' + ('GH' + str(cell.row))
            for row2 in ws1.iter_rows(n):
                ws2.append(row2)
wb2.save("file2.xlsx")
The original code below used to work, but it has to be modified because the resulting files are so large (over 40 MB) that MS Excel cannot open them.
n = 'A3' + ':' + ('GH'+ str(ws1.max_row))
for row in ws1.iter_rows(n):
    ws2.append(row)
Thanks.

I'm not entirely sure what you're trying to do but I suspect the problem is that you have nested your copy loop.
Try the following:
row_nr = 1
for row in ws1:
    for cell in row:
        if cell.value == "TrueValue":
            row_nr = cell.row
            break
    if row_nr > 1:
        break
for row in ws1.iter_rows(min_row=row_nr, max_col=190):
    ws2.append((cell.value for cell in row))

Question: I get the "IndexError: list index out of range" exception.
I get, from ws1.iter_rows(n):
UserWarning: Using a range string is deprecated. Use ws[range_string]
and from ws2.append(row2):
ValueError: Cells cannot be copied from other worksheets
The reason is that row2 holds a tuple of Cell objects instead of a list of values.
Question: ... need to transfer rows from this worksheet after filtering
The following does what you want, for instance:
# If you want to start appending row data at row 2,
# set the private ws2._current_row to 1 by touching cell A1
ws2.cell(row=1, column=1).value = ws2.cell(row=1, column=1).value
# Define the min/max column range to copy
from openpyxl.utils import range_boundaries
min_col, min_row, max_col, max_row = range_boundaries('A:GH')
# Define the cell index (0-based) used to check the value
check = 0  # == column A
for row in ws1.iter_rows():
    if row[check].value == 'TrueValue':
        # Copy the row values
        # The tuple index is 0-based, so subtract 1 from min_col
        ws2.append((cell.value for cell in row[min_col-1:max_col]))
Tested with Python: 3.4.2 - openpyxl: 2.4.1 - LibreOffice: 4.3.3.2

Use a list to hold the value of each column for the particular row, then append that list to your ws2.
...
def iter_rows(ws, n):  # produce the list of values for each row in range n
    for row in ws[n]:  # ws[range_string] replaces the deprecated ws.iter_rows(range_string)
        yield [cell.value for cell in row]
for row in ws1.iter_rows():
    for cell in row:
        if cell.value == 'TrueValue':
            n = 'A' + str(cell.row) + ':' + ('GH' + str(cell.row))
            list_to_append = list(iter_rows(ws1, n))
            for items in list_to_append:
                ws2.append(items)

I was able to solve this with lists for my project.
import openpyxl
#load data file
wb1 = openpyxl.load_workbook('original.xlsx')
sheet1 = wb1.active
print("loaded 1st file")
#new template file
wb2 = openpyxl.load_workbook('blank.xlsx')
sheet2 = wb2.active
print("loaded 2nd file")
header = sheet1[1:1] #grab header row
listH = []
for h in header:
    listH.append(h.value)
sheet2.append(listH)
colOfInterest = 11 # this is my col that contains the value I'm checking against
for rowNum in range(2, sheet1.max_row + 1): #iterate over each row, starting with 2 to skip the header from the original file
    if sheet1.cell(row=rowNum, column=colOfInterest).value is not None: #interested in non-blank values in column 11
        listA = [] # list which will hold my data
        row = sheet1[rowNum:rowNum] #creates a tuple of the row's cells
        #print(str(rowNum)) # for debugging, to show which rows are copied
        for cell in row: # for each cell in the row
            listA.append(cell.value) # add each cell's data as an element in the list
        if listA[10] == 1: # condition1 I'm checking for, by looking up the index in the list
            sheet2.append(listA) # append to sheet2's next available row
        elif listA[10] > 1: # condition2 I'm checking for, by looking up the index in the list
            # do something else and store the resulting row in bar (placeholder)
            sheet2.append(bar) # append to sheet2's next available row
print("saving file...")
wb2.save('result.xlsx') # save file
print("Done!")
Tested with: Python 3.7 openpyxl 2.5.4
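The answers for this question share one idea: append cell values, not Cell objects, to the target sheet. A compact sketch of that filter-and-copy step, with copy_matching_rows as a hypothetical helper; the column to check (check_col, 1-based) and the last column to copy (max_col, e.g. 190 for GH) are parameters, and small in-memory workbooks stand in for file1.xlsx and file2.xlsx.

```python
from openpyxl import Workbook


def copy_matching_rows(src_ws, dst_ws, target, check_col=1, max_col=None):
    """Append to dst_ws the values of every src_ws row whose cell in
    check_col (1-based) equals target. Only columns 1..max_col are
    copied; max_col=None copies up to the last used column.
    Returns the number of rows copied (hypothetical helper)."""
    copied = 0
    for values in src_ws.iter_rows(min_col=1, max_col=max_col, values_only=True):
        if values[check_col - 1] == target:
            dst_ws.append(values)  # plain values, so no "Cells cannot be copied" error
            copied += 1
    return copied


# In-memory stand-ins for file1.xlsx and file2.xlsx
wb1 = Workbook()
ws1 = wb1.active
ws1.append(["id", "status", "note"])
ws1.append([1, "TrueValue", "keep"])
ws1.append([2, "Other", "skip"])
ws1.append([3, "TrueValue", "keep"])

wb2 = Workbook()
ws2 = wb2.active
copied = copy_matching_rows(ws1, ws2, "TrueValue", check_col=2)
result = list(ws2.iter_rows(values_only=True))
print(copied, result)
```

Saving works as before with wb2.save("file2.xlsx"). For a source file in the 40 MB range, opening it with openpyxl.load_workbook('file1.xlsx', read_only=True) should reduce memory use, and iter_rows(values_only=True) is available on read-only sheets as well.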

XLRD Out of Range Error

I have an Excel spreadsheet containing data as follows:
Serial Number SAMPLE ID SAMPLE NAME
value value value
value value value
value value value......
Basically, it is a table of entries, and I do not know how many entries the table will have. I am writing Python code with xlrd to extract the values from Excel. The first thing I want to do is determine the number of entries present, so I use the following piece of code:
kicker = 0
counter = 0
rownum = 5
colnum = 1
while (kicker == 0):
    if sh.cell_value(rowx=rownum, colx=colnum) is None:
        kicker = 1
    else:
        counter = counter + 1
    rownum = rownum + 1
print("done")
The code scans through the values and successfully reads the entries that have a value in the first field. The problem is that when I get to the first row without a value in the first field, xlrd gives me a "list index out of range" error: I read the last valid value, but as soon as I read the first empty cell, it raises the error. How can I determine the number of entries in my "table" without xlrd throwing an out-of-range error?

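You should query nrows instead of using a potentially endless loop. sh.cell_value() returns '' for an empty cell, never None, so the original loop's exit condition is never met and it keeps reading past the last row.
import xlrd
kicker = 0
counter = 0
colnum = 1
for rownum in range(5, sh.nrows):
    if sh.cell_type(rowx=rownum, colx=colnum) in (xlrd.XL_CELL_EMPTY, xlrd.XL_CELL_BLANK):
        kicker = 1
        break  # stop at the first empty cell; remove this to count every non-empty cell
    else:
        counter = counter + 1
print("done")
For testing whether a cell is empty, I looked up "How to detect if a cell is empty when reading Excel files using the xlrd library?".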
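If the goal is just to count how many entries precede the first blank, xlrd can hand over the whole column in one call with sh.col_values(colx, start_rowx=...), which never reads past the last row. A sketch with a hypothetical count_entries helper, assuming (as in the question) that the table starts at row index 5 and the checked field is column index 1:

```python
def count_entries(values):
    """Count consecutive non-empty values, stopping at the first empty one.

    values is a list of cell values such as sh.col_values(1, start_rowx=5);
    xlrd reports empty and blank cells as ''. Hypothetical helper.
    """
    count = 0
    for value in values:
        if value == "":   # first empty cell ends the table
            break
        count += 1
    return count


# e.g. counter = count_entries(sh.col_values(1, start_rowx=5))
print(count_entries(["S1", "S2", "", "S4"]))
```

Comparing against '' rather than relying on truthiness means a cell containing 0.0 still counts as an entry.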
