I am trying to read only the cell values in an Excel spreadsheet using Python's openpyxl, but I can only read the formulas.
I have already come across countless questions on Stack Overflow asking this, and they all say to set the flag data_only=True, like this:
import openpyxl
wb = openpyxl.load_workbook(reference_filename, data_only=True)
ws = wb.worksheets[0]
cell_value = ws.cell(7, 1).value
print(cell_value)
However, this still prints only the formula. Why?
I just need the value that is in the cell.
The openpyxl documentation (https://openpyxl.readthedocs.io/en/latest/usage.html?#read-an-existing-workbook) notes that:
data_only controls whether cells with formulae have either the formula
(default) or the value stored the last time Excel read the sheet.
If the worksheet hasn't been opened by Excel previously, it may not have the last-calculated values stored, and therefore openpyxl may not be able to extract them.
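A minimal sketch of that caveat, assuming only that openpyxl is installed: a workbook written by openpyxl itself has never been opened by Excel, so its formula cell has nothing cached and data_only=True returns None. The temporary file name is made up for the demo.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Write a workbook with openpyxl; Excel never touches it, so no results are cached.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")  # hypothetical file name
wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = "=A1*3"  # stored as a formula; openpyxl does not calculate it
wb.save(path)

# Default mode: the formula string comes back.
formula = load_workbook(path).active["A2"].value
# data_only=True: the cached result is returned, but there is none yet.
cached = load_workbook(path, data_only=True).active["A2"].value

print(formula)  # =A1*3
print(cached)   # None
```

Opening demo.xlsx in Excel and saving it once would store the result (6), after which the data_only=True load would return it.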
Do you have to use openpyxl?
Check this out using pandas:
import pandas as pd
df = pd.read_excel('try.xlsx', sheet_name=0, header=None)
cell_value = df.iloc[6, 0]  # row 7, column A (pandas positions start at 0)
print(cell_value)
For me this gives the result, not the formula.
The problem could be in the Excel file. If the sheet you are trying to read is set to 'Show Formulas', openpyxl will read the formula instead of the value. In Excel, go to Formulas -> Formula Auditing, uncheck Show Formulas, save the file and run the Python program again.
I have an .xlsm workbook with two worksheets.
One of them uses a VLOOKUP formula in a cell that looks up a value in the second sheet of the workbook. I just need the date value that the VLOOKUP returns.
What I have tried:
- I used openpyxl to open the existing workbook with data_only=True and keep_vba=True, and the value keeps coming back as '#N/A'.
- I have tried using win32com to open the workbook, refresh it and grab the cell value, but I get a giant negative integer: -217589383.
I am not sure whether this is possible in openpyxl or whether I am using the library incorrectly.
The formula in the cell looks like this: '=VLOOKUP(A34,SSU!$1:$65536,2,FALSE)'; the second sheet is called SSU.
I don't care which Python library I use to get the value this formula returns, as long as I can get it. When I have an .xls file I can get that value easily with xlrd, but unfortunately xlrd doesn't work with files that aren't .xls.
Below is my code sample.
elif check_file_type(site_list_name, ['.xlsx', '.xlsm', '.xltx', '.xltm']) and get_file_name(site_list_name) != prefix_suffix_file_name:
    workbook = load_workbook(site_list_name_path, keep_vba=True, data_only=True)
    worksheet = workbook.active
    print(worksheet['B4'].value)
The printed value is #N/A.
Any suggestions would be greatly appreciated!!
I have an Excel workbook with quite a few formulas, and when I try to upload the workbook into a database, the cells with IFERROR formulas come in as blanks even though they should be strings or numbers. I am new to Python, but I want to create a Python script that reads in the sheet and pastes only the values into a new workbook.
I tried:
import openpyxl as xl
wb1 = xl.load_workbook('file1.xlsx')
ws1 = wb1["Sheet 1"]
wb2 = xl.load_workbook('file2.xlsx')
ws2 = wb2.create_sheet(ws1.title)
for row in ws1:
    for cell in row:
        ws2[cell.coordinate].value = cell.value
wb2.save('path')
The code works to copy the data into a new workbook, but it is pasting the formulas. I just want the values.
As per my earlier comment:
This comes from the openpyxl docs, in the description of the data_only parameter of openpyxl.reader.excel.load_workbook:
data_only (bool) – controls whether cells with formulae have either the formula (default) or the value stored the last time Excel read the sheet
The default is formulas, whereas you want the values, so set it to True:
wb = xl.load_workbook('file1.xlsx', data_only=True)
Should help :)
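Applied to the copy script from the question, a sketch might look like the following (copy_values_only and the file names are hypothetical). It writes into a fresh workbook rather than into file2.xlsx. The demo source is produced by openpyxl, so its formula cell has no cached result and arrives as None; a file last saved by Excel would carry the computed value instead.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def copy_values_only(src_path, dst_path, sheet_name):
    """Copy the cached cell values (not formulas) of one sheet into a new workbook."""
    # data_only=True makes formula cells return their last cached result.
    src_ws = load_workbook(src_path, data_only=True)[sheet_name]

    dst_wb = Workbook()
    dst_ws = dst_wb.active
    dst_ws.title = src_ws.title
    for row in src_ws.iter_rows():
        for cell in row:
            # Same coordinate in the target sheet; the value is plain data or None.
            dst_ws[cell.coordinate] = cell.value
    dst_wb.save(dst_path)


# Demo with a throwaway source file (hypothetical names).
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "file1.xlsx")
dst = os.path.join(tmp, "values_only.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet 1"
ws["A1"] = 10
ws["A2"] = "=A1*2"  # never calculated by Excel, so no cached value exists
wb.save(src)

copy_values_only(src, dst, "Sheet 1")
result = load_workbook(dst)["Sheet 1"]
print(result["A1"].value, result["A2"].value)  # 10 None
```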
I am trying to read an Excel file with pandas, but because it contains formulae, it returns NaN values instead of the cell values.
df = pd.read_excel('Test.xlsx', sheet_name='Sheet1')
@Naga kiran, if you want to see the value instead of the formula, you can add:
wb = load_workbook('empty_book.xlsx', data_only=True)
But openpyxl never evaluates formulae (https://openpyxl.readthedocs.io/en/latest/usage.html#using-formulae).
You need to open empty_book.xlsx in Excel and save it if you want to see the formula result.
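Because openpyxl never calculates anything, one way to tell whether a file still needs a trip through Excel is to load it twice and compare. This is a sketch under the assumption that a None cached value means "not calculated yet"; a formula whose result is an empty string also reads back as None, so treat the list as candidates. The function name is hypothetical.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def formulas_without_cached_values(path):
    """List (sheet, coordinate) pairs whose formula has no cached result stored."""
    formulas = load_workbook(path)                # formula strings
    values = load_workbook(path, data_only=True)  # cached results
    missing = []
    for ws in formulas.worksheets:
        cached_ws = values[ws.title]
        for row in ws.iter_rows():
            for cell in row:
                # data_type "f" marks a formula; None in the data-only copy means
                # nothing was cached (or the formula returned an empty string).
                if cell.data_type == "f" and cached_ws[cell.coordinate].value is None:
                    missing.append((ws.title, cell.coordinate))
    return missing


# Demo: a file written by openpyxl, never opened in Excel (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "never_opened.xlsx")
wb = Workbook()
wb.active["A1"] = 1
wb.active["A2"] = "=A1+1"
wb.save(path)

missing = formulas_without_cached_values(path)
print(missing)  # [('Sheet', 'A2')]
```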
This question is quite stale, but I didn't find another one that explicitly answers the OP's question. Assuming you're opening a .xlsx file and not a .xls, you can do:
import openpyxl
import pandas as pd
workbook = openpyxl.load_workbook(filename=excel_path, data_only=True)
data = pd.read_excel(workbook, sheet_name='Sheet1', engine='openpyxl')
Note that pandas reads .xls files with xlrd, and I'm not sure what the best way to achieve the same thing with that library would be.
I need to read this .xlsm database, and some of the cell values I need are derived from Excel functions. To accomplish this I used:
from openpyxl import load_workbook
wb = load_workbook('file.xlsm', data_only=True, keep_vba=True)
ws = wb['Plan1']
And then, for every cell I wanted to read:
ws.cell(row=row, column=column).value
This works fine for getting the data out. But the problem comes with saving. When I do:
wb.save('file.xlsm')
It saves the file, but all the formulas inside the sheets are lost: with data_only=True each formula cell holds its cached value, and that value is what gets written back.
My dilemma is how to read the displayed cell values on one of the database's sheets without modifying them, write the code's output to a new sheet, and save it.
Read the file once in read-only and data-only mode to look at the values, and a second time without data_only (and with keep_vba=True) so the formulas and VBA are kept. Then save under a different name.
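The two-pass approach can be sketched like this, assuming openpyxl and a known sheet name. write_report, the "Report" sheet and the file names are hypothetical, and the "output" is just a copy of the displayed values standing in for whatever the real code computes. keep_vba=True is used only when the source ends in .xlsm; in that case the destination should also be an .xlsm file.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def write_report(src_path, dst_path, sheet_name, report_title="Report"):
    """Read displayed values, then write them to a new sheet without losing formulas."""
    # Pass 1: read-only + data-only, just to look at the cached values.
    values_wb = load_workbook(src_path, read_only=True, data_only=True)
    rows = list(values_wb[sheet_name].iter_rows(values_only=True))
    values_wb.close()  # read-only workbooks keep the file open until closed

    # Pass 2: normal load so formulas survive; keep VBA only for macro-enabled files.
    keep_vba = src_path.lower().endswith(".xlsm")
    wb = load_workbook(src_path, keep_vba=keep_vba)
    report = wb.create_sheet(report_title)
    for row in rows:
        report.append(row)  # placeholder output: a copy of the displayed values
    wb.save(dst_path)       # different name, so the original file stays untouched


# Demo with throwaway files (hypothetical names).
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "database.xlsx")
dst = os.path.join(tmp, "database_report.xlsx")
wb = Workbook()
wb.active["A1"] = 2
wb.active["A2"] = "=A1*3"
wb.save(src)

write_report(src, dst, "Sheet")
out = load_workbook(dst)
kept_formula = out["Sheet"]["A2"].value
report_value = out["Report"]["A1"].value
print(kept_formula, report_value)  # =A1*3 2
```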
I am having real trouble with this, since cell.value returns the formula used in the cell, and I need to extract the result Excel produces after calculating it.
Thank you.
OK, I think I have found a way around it: apparently, to access cell.internal_value you have to go through iter_rows() on your worksheet first, which yields "RawCell" objects.
for row in ws.iter_rows():
    for cell in row:
        print(cell.internal_value)
As Charlie Clark already suggested, you can set data_only to True when you load your workbook:
from openpyxl import load_workbook
wb = load_workbook("file.xlsx", data_only=True)
sh = wb["Sheet_name"]
print(sh["x10"].value)
From the code it looks like you're using the optimised reader: read_only=True. You can switch between extracting the formula and its result by using the data_only=True flag when opening the workbook.
internal_value was a private attribute that used to refer only to the (untyped) value that Excel uses, i.e. numbers with an epoch in 1900 for dates, as opposed to the Python form. It has been removed from the library since this question was first asked.
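If you do end up with one of those raw serial numbers (for example a date cell formatted as a plain number), openpyxl ships a converter. A small sketch using openpyxl.utils.datetime.from_excel, assuming the default Windows (1900) date system; the serial number is just an example.

```python
from datetime import datetime

from openpyxl.utils.datetime import from_excel

# 44197 is the serial number Excel stores for 1 January 2021 (1900 date system).
serial = 44197
as_python = from_excel(serial)
print(as_python)  # 2021-01-01 00:00:00
```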
You can try the following code. Just provide the Excel file path and the row and column numbers of the cell whose value you need:
from openpyxl import load_workbook
dest_filename = 'excel_file_path'
wb = load_workbook(dest_filename, data_only=True)
ws = wb.active
print(ws.cell(row=row_number, column=column_number).value)
Try to use cell.internal_value instead.
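Use the following in Python to get the actual values with the openpyxl module:
from openpyxl import load_workbook
wb = load_workbook('file.xlsx', data_only=True)  # data_only=True swaps formulas for cached results
ws = wb.active
for row in ws.iter_rows(values_only=True):
    for cell in row:
        print(cell)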
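Putting the flags together, here is a minimal sketch (the helper name and file are hypothetical) that returns every row as a tuple of cached results. values_only=True only controls whether you get Cell objects or plain values; it is data_only=True that replaces formulas with their last computed results, and read_only=True keeps memory use down on large files.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def read_cached_values(path, sheet_name=None):
    """Return every row of a sheet as a tuple of cached values (never formula strings)."""
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.active
        # values_only=True yields plain values instead of Cell objects.
        return list(ws.iter_rows(values_only=True))
    finally:
        wb.close()  # release the file handle held by read-only mode


# Demo: the formula in B1 was never calculated by Excel, so it reads as None.
path = os.path.join(tempfile.mkdtemp(), "values.xlsx")
wb = Workbook()
wb.active["A1"] = 3
wb.active["B1"] = "=A1*2"
wb.save(path)

rows = read_cached_values(path)
print(rows)
```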