I need to read this .xlsm database and some of the cells values I need are derived from Excel functions. To accomplish this I used:
from openpyxl import load_workbook
wb = load_workbook('file.xlsm', data_only=True, keep_vba=True)
ws = wb['Plan1']
And then, for every cell I wanted to read:
ws.cell(row=row, column=column).value
This works fine for getting the data out. But the problem comes with saving. When I do:
wb.save('file.xlsm')
It saves the file, but all the formulas inside the sheets are lost
My dilemma is reading the cell's displayed values on one of the database's sheet without modifying them, writing the code's output in a new sheet and saving it.
Read the file once in read-only and data-only mode to look at the values and another time keeping the VBA around. And save under a different name.
Related
I have an excel workbook that uses functions like OFFSET, UNIQUE, and FILTER which spill into other cells. I'm using python to analyze and write some data to the workbook, but after doing so these formulas revert into normal arrays. This means they now take up a fixed number of cells (however many they took up before opening the file in python) instead of adjusting to fit all of the data. I can revert the change by selecting the formula and hitting enter, but there are many of these formulas it's more work to fix them than to just print the data to a text file and paste it into excel manually. Is there any way to prevent this behavior?
I've been using openpyxl to open and save the workbook, but after encountering this issue also tried xlsxwriter and the dataframe to excel function from pandas. Both of them had the same issue as openpyxl. For context I am on python 3.11 and using the most recent version of these modules. I believe this issue is on the Python side and not the Excel side, so I don't think changing Excel settings will help, but maybe there is something there I missed.
Example:
I've created an empty workbook with two sheets, one called 'main' and one called 'input'. The 'main' sheet will analyze data from the 'input' sheet which will be entered with openpyxl. The data will just be values in the first column.
In cell A1 of the 'main' sheet, enter =OFFSET(input!A1,0,0,COUNTA(input!A:A),1).
This formula will just show a copy of the data. Since there currently isn't any data it gives a #REF! error, so it only takes up one cell.
Now I'll run the following python code to add the numbers 0-9 into the first column of the input sheet:
from openpyxl import load_workbook
wb = load_workbook('workbook.xlsx')
ws = wb['input']
for i in range(10):
ws.append([i])
wb.save('workbook_2.xlsx')
When opening the new file, cell A1 on the 'main' sheet only has the first value, 0, instead of the range 0--9. When selecting the cell, you can see the formula is now {=OFFSET(input!A1,0,0,COUNTA(input!A:A),1)}. The curly brackets make it an array, so it wont spill. By hitting enter in the formula the array is removed and the sheet properly becomes the full range.
If I can get this simple example to work, then expanding it to the data I'm using shouldn't be a problem.
I am trying to read only the cell value in an Excel spread sheet using Python's openpyxl, but I am only able to read the forumulas.
I have already come across countless questions on Stack Overflow that ask this question and they all says to set the flag data_only=True like this:
wb = openpyxl.load_workbook(reference_filename, data_only=True)
ws = wb.worksheets[0]
cell_value = ws.cell(7, 1).value
print(cell_value)
However, this is still only printing the formula.. Why??
I just need the value that is in the cell.
The openpyxl documentation (https://openpyxl.readthedocs.io/en/latest/usage.html?#read-an-existing-workbook)
notes that...
data_only controls whether cells with formulae have either the formula
(default) or the value stored the last time Excel read the sheet.
If the worksheet hasn't been opened by Excel previously, it may not have the last-calculated values stored and therefore openpyxl may not be able to extract it.
Do you have to use openpyxl?
Check this out using pandas:
ws = pd.read_excel('try.xlsx', sheet_name=0, header=None)
cell_value = ws[7][1]
print(cell_value)
For me this gives the result not the formula.
The problem could be on the excel file. If the cell you are trying to read is set as 'Show formula', then the openpyxl will read the formula instead of the value. Go to your excel and Formulas -> Formula Auditing -> Uncheck Show Formulas save the file and run the python program again
The issue I am having is I have a .xlsm workbook with two worksheets.
One of the worksheets using a VLOOKUP macro function in a cell, that looks up a value in the second sheet in the workbook. I just need the date value that it defines from the VLOOKUP.
What I have tried:
-I used Openpyxl to open the existing workbook, using data_only=True,
vba_values=True and the value keeps giving me '#N/A'
-I have tried using win32com to open the workbook, refresh the workbook and grab
the cell value, but I get this giant negative int -217589383
I am not sure if this is possible in openpyxl or if I am not using the library correctly.
The macro in the cell looks like this '=VLOOKUP(A34,SSU!$1:$65536,2,FALSE)', the second sheet is called SSU.
I don't care which Python library I use in order to get the value that this macro calls, so long as I can get it. When I have a file that is .xls I am able to get that value easily using xlrd, but unfortunately, xlrd doesn't work with files that aren't .xls.
Below is my code sample.
elif check_file_type(site_list_name, ['.xlsx', '.xlsm', '.xltx', '.xltm']) and get_file_name(
site_list_name) != prefix_suffix_file_name:
workbook = load_workbook(site_list_name_path, keep_vba=True, data_only=True)
worksheet = workbook.active
print(worksheet['B4'].value)
the print value is #N/A
Any suggestions would be greatly appreciated!!
I'm having troubles writing something that I believe should be relatively easy.
I have a template excel file, that has some visualizations on it with a few spreadsheets. I want to write a scripts that loads the template, inserts an existing dataframe rows to specific cells on each sheet, and saves the new excel file as a new file.
The template already have all the cells designed and the visualization, so i will want to insert this data only without changing the design.
I tried several packages and none of them seemed to work for me.
Thanks for your help! :-)
I have written a package for inserting Pandas DataFrames to Excel sheets (specific rows/cells/columns), it's called pyxcelframe:
https://pypi.org/project/pyxcelframe/
It has very simple and short documentation, and the method you need is insert_frame
So, let's say we have a Pandas DataFrame called df which we have to insert in the Excel file ("MyWorkbook") sheet named "MySheet" from the cell B5, we can just use insert_frame function as follows:
from pyxcelframe import insert_frame
from openpyxl import load_workbook
workbook = load_workbook("MyWorkbook.xlsx")
worksheet = workbook["MySheet"]
insert_frame(worksheet=worksheet,
dataframe=df,
row_range=(5, 0),
col_range=(2, 0))
0 as the value of the second element of row_range or col_range means that there is no ending row or column specified, if you need specific ending row/column you can replace 0 with it.
Sounds like a job for xlwings. You didn't post any test data, but modyfing below to suit your needs should be quite straight-forward.
import xlwings as xw
wb = xw.Book('your_excel_template.xlsx')
wb.sheets['Sheet1'].range('A1').value = df[your_selected_rows]
wb.save('new_file.xlsx')
wb.close()
I'm doing some testing using python-excel modules. I can't seem to find a way to delete a row in an excel sheet using these modules and the internet hasn't offered up a solution. Is there a way to delete a row using one of the python-excel modules?
In my case, I want to open an excel sheet, read the first row, determine if it contains some valid data, if not, then delete it.
Any suggestions are welcome.
xlwt provides as the module name suggests Excel writer (creation rather than modification) funcionality.
xlrd on the other hand provides Excel reader funcionality.
If your source excel file is rather simple (no fancy graphs, pivot tables, etc.), you should proceed this way:
with xlrd module read the contents of the targeted excel file, and then with xlwt module create new excel file which contains the necessary rows.
If you, however are running this on windows platform , you might be able to manipulate Excel directly through Microsoft COM objects, see old book reference.
I was having the same issue but found a walk around:
Use a custom filter process (Reader>Filter1>Filter2>...>Writer) to generate a copy of the source excel file but with a blank column inserted at the front. Let's call this file augmented.xls.
Then, read augmented.xls into a xlrd.Workbook object, rb, using xlrd.open_workbook().
Use xlutils.copy.copy() to convert rb into a xlwt.Workbook object, wb.
Set the value of the first column of each of the to-be-deleted rows as "x" (or other values as a marker) in wb.
Save wb back to augmented.xls.
Use another custom filter process to generate a resulting excel file from augmented.xls by omitting those rows with "x" in the first column and shifting all columns one column left (equivalent to deleting the first column of markers).
Information and examples of defining a filter process can be found in http://www.simplistix.co.uk/presentations/python-excel.pdf
Hope this help in some way.
You can use the library openpyxl. When opening a file it is both for reading and for writing. Then, with a simple function you can achieve that:
from openpyxl import load_workbook
wb = load_workbook(filename)
ws = wb.active()
first_row = ws[1]
# Your code here using first_row
if first_row not valid:
ws.delete_rows(1, amount=1)