xlrd & openpyxl fetch wrong cell values (Excel) - python

Need help, please! This seems like a simple task: I need to fetch values from certain spreadsheet cells and sum them up. But I failed even at the first step, fetching them. At first I thought something was wrong with the module (openpyxl is upgraded regularly, so I might have missed something), but the xlrd module produced the same wrong results! Here's the code:
import xlrd, xlwt
wb = xlrd.open_workbook(r"E:\Projects_working (11).xlsx")
sheet = wb.sheet_by_name('Language Process')
for i in range(1, 100):
    cellVal = sheet.cell(i, 14).value  # need to find "5" in column 14
    if type(cellVal) == float and cellVal == 5.0:  # need to read the corresponding
        print(sheet.cell(i, 11).value)  # values in column 11
As a result, instead of an integer (say, 22), the code ends up with a float 42782.61458. (The other values are similar and wrong: 42782.66146, 42781.38542, 42781.42708, etc.)
Originally I used the openpyxl module and added the flag data_only=True when loading the workbook: wb = load_workbook("file.xlsx", data_only=True). That code produces the same results. Without this flag, all I get is strange formulas: =B32+((M32-B32)/2), =B41+((M41-B41)/2), etc. Here's the code that returns these formulas (with no flag):
import openpyxl
wb = openpyxl.load_workbook(r"E:\Projects_working (11).xlsx")
sheet = wb.get_sheet_by_name('Language Process')
for i in range(1, 100):
    cellVal = sheet.cell(row=i, column=14).value
    if type(cellVal) == float and cellVal == 5.0:
        print(sheet.cell(row=i, column=11).value)
And here's a link to the file, just in case: https://docs.google.com/spreadsheets/d/1bFhkEs8JTVWCgZoW5_9lQ1q_T0gtijBhuywr6OVpfGc/edit?usp=sharing

The data you are reading look like Excel's version of raw datetime values. You probably have miscounted the columns (that is, given the wrong column index).
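To check whether the numbers really are dates, one of them can be converted by hand. The sketch below assumes the workbook uses Excel's default 1900 date system, where a serial number counts days from 1899-12-30 and the fractional part is the time of day. `excel_serial_to_datetime` is a hypothetical helper name, not part of xlrd or openpyxl.

```python
from datetime import datetime, timedelta

# Assumption: default 1900 date system; valid for serials from 61 (1900-03-01) on.
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial):
    """Convert an Excel serial date (days since the epoch) to a datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)

# One of the "wrong" values from the question
print(excel_serial_to_datetime(42782.61458).date())  # 2017-02-16
```

The value lands in February 2017, which fits a datetime column (the formulas =B32+((M32-B32)/2) average two such values). Note also that xlrd's sheet.cell(rowx, colx) counts from 0 while openpyxl's sheet.cell(row=..., column=...) counts from 1, so the same index 14 points at different columns in the two snippets.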

Related

How to iterate in excel with python

This is probably super simple, but I am new to Python.
I wrote some code to insert a number into a certain row and column in Excel. That gives me a value in another cell. I would like to iterate, inserting -1000, then -950, then -900, and so on up to +1000, and print the value for every increment.
How is this possible?
This is my code so far:
import xlwings as xw
import pandas as pd
import matplotlib.pyplot as plt
#load the excel file
wb = xw.Book("Datasets/Sektion_20111.xlsm")
#Sheet
sht = wb.sheets["Beregning"]
#dataframe
#Cell with normal force
sht.range("N25").value = (500)
#Print cell with the lower limit of the failure moment (nedre grænse, brudmoment)
print(sht["AV24"].value)
This way it works by creating a new spreadsheet, where cell N25 has the value 1000, and I can read the result from that manually. I would like Python to print all values and all results for me.
How can I do this?
As far as I understand, you're trying to run an Excel macro several times, inserting values from -1000 to +1000 with a step of 50 using a Python script, then get the result for each iteration from a different cell of the sheet.
If that is the case, openpyxl is not able to do that, as stated in this post:
openpyxl how to read formula result after editing input data on the sheet? data_only=True gives me a "None" result
To anyone interested, I solved it with this code:
import xlwings as xw
#load the excel file
wb = xw.Book("Datasets/Sektion_20111.xlsm")
#Sheet
sht = wb.sheets["Beregning"]
#for loop
x = range(-100, 100, 50)
for i in x:
    #Cell with normal force
    sht.range("N25").value = i
    print(sht["AV24"].value)
N25 is the cell to enter info.
AV24 is the cell to print.
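The loop above can be wrapped so that inputs and results are collected together. This is only a sketch: `sweep` is a hypothetical helper, it assumes Excel recalculates the output cell as soon as the input cell is written (as it did for the asker), and it accepts any object with an xlwings-style `range(addr).value` attribute.

```python
def sweep(sheet, input_addr, output_addr, values):
    """Write each value to input_addr and read output_addr back.

    `sheet` is anything with an xlwings-style `range(addr).value`
    attribute; returns a list of (input, output) pairs.
    """
    results = []
    for v in values:
        sheet.range(input_addr).value = v
        results.append((v, sheet.range(output_addr).value))
    return results

# range() excludes its stop value, so 1001 is needed to include +1000.
inputs = range(-1000, 1001, 50)

# With xlwings and the workbook from the question this would be:
# for n, result in sweep(sht, "N25", "AV24", inputs):
#     print(n, result)
```

Returning pairs instead of printing makes it easy to plot or save the results afterwards.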

Data append to list using XLRD

I am able to import the rows of a particular column of a certain sheet into a Python list. But the list looks like a Key:Value formatted list (not the one I need).
Here is my code:
import xlrd
excelList = []
def xcel(path):
    book = xlrd.open_workbook(path)
    impacted_files = book.sheet_by_index(2)
    for row_index in range(2, impacted_files.nrows):
        #if impacted_files.row_values(row_index) == 'LCR':
        excelList.append(impacted_files.cell(row_index, 1))
    print(excelList)
if __name__ == "__main__":
    xcel(path)
The output is like below:
[text:'LCR_ContractualOutflowsMaster.aspx', text:'LCR_CountryMaster.aspx', text:'LCR_CountryMasterChecker.aspx', text:'LCR_EntityMaster.aspx', text:'LCR_EntityMasterChecker.aspx', text:'LCR_EscalationMatrixMaster.aspx',....]
I want the list to have just values. Like this...
['LCR_ContractualOutflowsMaster.aspx', 'LCR_CountryMaster.aspx', 'LCR_CountryMasterChecker.aspx', 'LCR_EntityMaster.aspx', 'LCR_EntityMasterChecker.aspx', 'LCR_EscalationMatrixMaster.aspx',...]
I've tried pandas too (df.value.tolist() method). Yet the output is not what I visualize.
Please suggest a way.
Regards
You are accumulating a list of cells, and what you are seeing is the repr of each cell in your list. Cell objects have three attributes: ctype (an int that identifies the type of the cell's value), value (a Python object holding the cell's value) and xf_index. If you want only the values, then try
excelList.append(impacted_files.cell(row_index, 1).value)
You can read more about cells in the documentation.
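The same fix can be written as a small function that returns the list instead of appending to a global. This is a sketch: `column_values` is a hypothetical helper, and it only assumes the sheet exposes xlrd's `nrows` attribute and `cell(rowx, colx)` method.

```python
def column_values(sheet, col, start_row=0):
    """Return the plain values of one column, skipping the Cell wrappers.

    `sheet` is assumed to expose xlrd's Sheet interface: an `nrows`
    attribute and a `cell(rowx, colx)` method returning objects with `.value`.
    """
    return [sheet.cell(r, col).value for r in range(start_row, sheet.nrows)]
```

For the code in the question this would be called as column_values(impacted_files, 1, start_row=2). xlrd sheets also have a built-in col_values method, so impacted_files.col_values(1, start_rowx=2) should give the same list.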
If you are willing to try one more library, this is how it can be done with openpyxl:
from openpyxl import load_workbook
book = load_workbook(path)
sh = book.worksheets[0]
# Flatten the sheet row by row into a single list of values
print([cell.value for row in sh.iter_rows() for cell in row])

How could calculate the excel data by using openpyxl

I have an assignment to do for my boring online class and I couldn't come up with an idea for how to do it. I'm told to calculate the ratio of four columns with this formula: ratio = weight / (height * length * width). But I'm bad at using Microsoft Excel, and ironically we haven't learnt anything related to it. Then I remembered that there is a Python library which works with Excel sheets. So how could I easily calculate this ratio for every single row in this Excel sheet by using openpyxl?
Though I've never used openpyxl library I tried to find a solution to your problem. If the spreadsheet you're working on looks like the one below then you should be able to work with this script.
Sample spreadsheet image
from openpyxl import load_workbook
# Modify filename and sheet name where the data is
workbook_filename = 'workbook.xlsx'
sheet_name = 'Sheet1'
wb = load_workbook(workbook_filename)
ws = wb[sheet_name]
# If the data is stored differently in your file, you have to modify
# this loop to suit your needs
for row in ws.iter_rows(min_row=2, max_row=3, max_col=5):
    row[4].value = row[0].value / (row[1].value * row[2].value * row[3].value)
wb.save('result.xlsx')
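The division in the loop raises TypeError if a cell is empty (openpyxl returns None) and ZeroDivisionError if a dimension is 0. A minimal sketch of a guarded version follows; `ratio` is a hypothetical helper, and the column order (weight, height, length, width) is assumed to match the script above.

```python
def ratio(weight, height, length, width):
    """weight / (height * length * width), or None if it cannot be computed."""
    dims = (height, length, width)
    if weight is None or any(d is None for d in dims):
        return None  # empty cell
    volume = height * length * width
    if volume == 0:
        return None  # avoid ZeroDivisionError
    return weight / volume

# Plain tuples stand in for the four cell values of each row
rows = [(12.0, 2.0, 3.0, 1.0), (5.0, None, 1.0, 1.0), (8.0, 0, 2.0, 2.0)]
print([ratio(*r) for r in rows])  # [2.0, None, None]
```

Inside the openpyxl loop this could be used as row[4].value = ratio(*(c.value for c in row[:4])).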

xlwings function to find the last row with data

I am trying to find the last row in a column with data, to replace the VBA function: LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
I am trying this, but it pulls in all rows in Excel. How can I get just the last row?
from xlwings import Workbook, Range
wb = Workbook()
print len(Range('A:A'))
Consolidating the answers above, you can do it in one line:
wb.sheet.range(column + last cell value).Get End of section going up[non blank assuming the last cell is blank].row
Example code:
import xlwings as xw
from xlwings import Range, constants
wb = xw.Book(r'path.xlsx')
wb.sheets[0].range('A' + str(wb.sheets[0].cells.last_cell.row)).end('up').row
We can use Range object to find the last row and/or the last column:
import xlwings as xw
# open raw data file
filename_read = 'data_raw.csv'
wb = xw.Book(filename_read)
sht = wb.sheets[0]
# find the numbers of columns and rows in the sheet
num_col = sht.range('A1').end('right').column
num_row = sht.range('A1').end('down').row
# collect data
content_list = sht.range((1,1),(num_row,num_col)).value
print(content_list)
This is very much the same as crazymachu's answer, just wrapped up in a function. Since version 0.9.0 of xlwings you can do this:
import xlwings as xw
def lastRow(idx, workbook, col=1):
    """ Find the last row in the worksheet that contains data.
    idx: Specifies the worksheet to select. Starts counting from zero.
    workbook: Specifies the workbook
    col: The column in which to look for the last cell containing data.
    """
    ws = workbook.sheets[idx]
    lwr_r_cell = ws.cells.last_cell      # lower right cell
    lwr_row = lwr_r_cell.row             # row of the lower right cell
    lwr_cell = ws.range((lwr_row, col))  # change to your specified column
    if lwr_cell.value is None:
        lwr_cell = lwr_cell.end('up')    # go up until you hit a non-empty cell
    return lwr_cell.row
Intuitively, the function starts off by finding the most extreme lower-right cell in the workbook. It then moves across to your selected column and then up until it hits the first non-empty cell.
You could try using Direction by starting at the very bottom and then moving up:
import xlwings
from xlwings.constants import Direction
wb = xlwings.Workbook(r'data.xlsx')
print(wb.active_sheet.xl_sheet.Cells(65536, 1).End(Direction.xlUp).Row)
Try this:
import xlwings as xw
cellsDown = xw.Range('A1').vertical.value
cellsRight = xw.Range('A1').horizontal.value
print len(cellsDown)
print len(cellsRight)
One could use the VBA Find function that is exposed through the api property (use it to find anything with a star, and begin your search from the first cell).
Example:
import xlwings
# s is an xlwings Sheet object
row_cell = s.api.Cells.Find(What="*",
                            After=s.api.Cells(1, 1),
                            LookAt=xlwings.constants.LookAt.xlPart,
                            LookIn=xlwings.constants.FindLookIn.xlFormulas,
                            SearchOrder=xlwings.constants.SearchOrder.xlByRows,
                            SearchDirection=xlwings.constants.SearchDirection.xlPrevious,
                            MatchCase=False)
column_cell = s.api.Cells.Find(What="*",
                               After=s.api.Cells(1, 1),
                               LookAt=xlwings.constants.LookAt.xlPart,
                               LookIn=xlwings.constants.FindLookIn.xlFormulas,
                               SearchOrder=xlwings.constants.SearchOrder.xlByColumns,
                               SearchDirection=xlwings.constants.SearchDirection.xlPrevious,
                               MatchCase=False)
print((row_cell.Row, column_cell.Column))
Other methods outlined here seem to require no empty rows/columns between data.
source: https://gist.github.com/Elijas/2430813d3ad71aebcc0c83dd1f130e33
python 3.6, xlwings 0.11
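Why gaps matter can be shown without Excel. The sketch below assumes the sheet has already been read into a list of rows with None for empty cells (the shape xlwings returns from the .value of a multi-cell range) and looks for the last row and column holding any data. `last_data_cell` is a hypothetical helper, not an xlwings function.

```python
def last_data_cell(grid):
    """Return (last_row, last_col), 1-based, of the non-empty area of `grid`.

    `grid` is a list of rows; empty cells are None. Returns (0, 0) when
    every cell is empty.
    """
    last_row = last_col = 0
    for r, row in enumerate(grid, start=1):
        for c, value in enumerate(row, start=1):
            if value is not None:
                last_row = max(last_row, r)
                last_col = max(last_col, c)
    return last_row, last_col

grid = [
    ["id", "name", None],
    [1, "a", None],
    [None, None, None],   # an empty row that would stop end('down')
    [3, None, "x"],
    [None, None, None],
]
print(last_data_cell(grid))  # (4, 3)
```

A search that walks down from A1 would stop at row 2 here, while scanning every cell (or searching backwards, as Find with xlPrevious does) reaches row 4.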
Solution 1
To find last row with data, you should do some work both horizontally and vertically. You have to go through every column to determine which row is the last row.
import xlwings
workbook_all = xlwings.Book(r'path.xlsx')
objectiveSheet = workbook_all.sheets['some_sheet']
# lastCellContainData(), inspired by Stefan's answer.
def lastCellContainData(objectiveSheet, lastRow=None, lastColumn=None):
    lastRow = objectiveSheet.cells.last_cell.row if lastRow is None else lastRow
    lastColumn = objectiveSheet.cells.last_cell.column if lastColumn is None else lastColumn
    lastRows, lastColumns = [], []
    # + 1 so that the bound itself is included in the scan
    for col in range(1, lastColumn + 1):
        lastRows.append(objectiveSheet.range((lastRow, col)).end('up').row)
    # extract last row of every column, then max(). Or you can compare the next
    # column's last row number to the last column's last row number. Here you get
    # the last row with data, you can also go further get the last column with data:
    for row in range(1, lastRow + 1):
        lastColumns.append(objectiveSheet.range((row, lastColumn)).end('left').column)
    return max(lastRows), max(lastColumns)
lastCellContainData(objectiveSheet, lastRow=5000, lastColumn=300)
I added the lastRow and lastColumn parameters. To make the program more efficient, you can set them according to the approximate shape of the data you're dealing with.
Solution 2
xlwings is well known as a wrapper around pywin32. I don't know whether your situation allows keyboard or mouse input. If so, first switch to the workbook with ctrl+tab, then press ctrl+a to select the region containing data, then call workbook_all.selection.rows.count.
Another way:
When you know roughly where the bottom-right cell of your data is, say AAA10000, just call objectiveSheet.range('A1:'+'AAA10000').current_region.rows.count
Update:
After a while none of the solutions were really intuitive to me, so I decided to compile the following:
Code:
import xlwings as Objxlwings
import xlwings.constants
def Return_RangeLastCell(ObjWS):
    return ObjWS.api.Cells.SpecialCells(xlwings.constants.CellType.xlCellTypeLastCell)
I tried to stay consistent with the way it is called from Excel, to keep it simple.
Then in my main code, I just call it like so:
ObjWS=Objxlwings.Book('Book1.xlsx').sheets["Sheet1"]
print(Return_RangeLastCell(ObjWS).Column)
Interesting solutions. But maybe like this:
print(sheet.used_range.last_cell.row)
@Cody's answer will help under normal circumstances, but if your sheet has hidden rows at the bottom, as in the linked example, it will give the wrong row number.
Let's say your data has 10 rows and row[5:11] are hidden; the actual last row is still 10.
[code a] below will give you 5, and [code b] below will give you 10.
code a:
ws = wb.sheets[your_sheet_name]
last_row = ws.range('A' + str(ws.cells.last_cell.row)).end('up').row # return 5
code b:
ws = wb.sheets[your_sheet_name]
last_row_1 = ws.used_range.last_cell.row # return 10

How to read a particular cell by using "wb = load_workbook('path', True)" in openpyxl

I have written code for reading large Excel files, but my requirement is to read a particular cell, e.g. cell(row, column), in an Excel file when I keep True in wb = load_workbook('Path', True).
Can anybody please help me?
CODE:
from openpyxl import load_workbook
wb = load_workbook('Path', True)
sheet_ranges = wb.get_sheet_by_name(name = 'Global')
for row in sheet_ranges.iter_rows():
    for cell in row:
        print(cell.internal_value)
Since you are using an Optimized Reader, you cannot just access an arbitrary cell using ws.cell(row, column).value:
cell, range, rows, columns methods and properties are disabled
Optimized reader was designed and created specially for reading an unlimited amount of data from an excel file by using iterators.
Basically you should iterate over rows and cells until you get the necessary cell. Here's a simple example:
for r, row in enumerate(sheet_ranges.iter_rows()):
    if r == 10:
        for c, cell in enumerate(row):
            if c == 5:
                print(cell.internal_value)
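The nested loop keeps iterating after the cell has been found. A possible alternative, sketched with the standard library only, uses itertools.islice to skip straight to the wanted row and column and then stop. `nth` and `cell_at` are hypothetical helper names, and plain lists stand in for the rows that iter_rows() yields.

```python
from itertools import islice

def nth(iterable, n):
    """Return item n (0-based) of an iterable, or None if it is too short."""
    return next(islice(iterable, n, None), None)

def cell_at(rows, r, c):
    """Pick cell (r, c), both 0-based, from an iterator of rows without
    consuming anything past row r."""
    row = nth(rows, r)
    return None if row is None else nth(row, c)

rows = iter([["a0", "b0"], ["a1", "b1"], ["a2", "b2"]])
print(cell_at(rows, 1, 1))  # b1
```

With the worksheet from the question, cell_at(sheet_ranges.iter_rows(), 10, 5) should return the same cell object that the loop above prints.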
You can find the answer here.
I recommend you consult the documentation first before asking a question on SO.
In particular, this is pretty much exactly what you want:
d = ws.cell(row = 4, column = 2)
where ws is a worksheet.
