Can a pycel object be saved as an Excel Workbook - python

After parsing an Excel file into Python and evaluating the workbook with pycel, can the pycel object be saved as an Excel file that keeps all the original formatting, etc.? That is, only the values need to be updated.

TL;DR
No, you cannot save a pycel object back into Excel.
Why not?
The basic problem is that pycel is based on openpyxl, which is used to read (and, if needed, write) Excel spreadsheets. openpyxl has the computed values of formula cells available for a workbook it has read in, but it does not really allow those computed values to be saved back into a workbook it writes. Saving a different computed value for a formula cell would not make much sense anyway, since the cell's value is recomputed once the workbook is opened in Excel again.
While it is true that pycel has the information available to properly populate a new value when the workbook is written, it evidently is not a use case that was important to the openpyxl authors or contributors.
Please note that the openpyxl maintainers gladly took pull requests to make it run better with pycel. It seems likely they would be open to discussing a PR for writing values into workbooks.
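If losing the formulas is acceptable, one possible workaround is to overwrite each formula cell with its evaluated value and save with openpyxl. This is only a minimal sketch under stated assumptions, not something pycel or openpyxl provide: `cell_address` and `freeze_formulas` are hypothetical helper names, `evaluate` is any callable that maps an address such as "Sheet1!B2" to a value (with pycel this would presumably be `ExcelCompiler(filename=src_path).evaluate`), and sheet names that need quoting are not handled. Note also that openpyxl does not round-trip everything in a workbook; charts and images in particular may be lost.

```python
def cell_address(sheet_title, coordinate):
    """Build a 'Sheet!A1' style address (hypothetical helper; no quoting of sheet names)."""
    return f"{sheet_title}!{coordinate}"


def freeze_formulas(src_path, dst_path, evaluate):
    """Replace every formula cell with evaluate(address) and save a copy.

    Returns the number of cells that were overwritten. The formulas
    themselves are lost in the saved copy; cell styles are kept.
    """
    # Imported here so the helper above is usable without openpyxl installed.
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    workbook = load_workbook(src_path)  # data_only=False: formulas are loaded
    replaced = 0
    for sheet in workbook.worksheets:
        for row in sheet.iter_rows():
            for cell in row:
                if cell.data_type == "f":  # openpyxl marks formula cells with 'f'
                    cell.value = evaluate(cell_address(sheet.title, cell.coordinate))
                    replaced += 1
    workbook.save(dst_path)
    return replaced
```

Writing to a separate `dst_path` keeps the original workbook, with its formulas, untouched.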

Related

How to programmatically copy excel worksheet sheet to a new one

I have created a Python script that parses some data and uses it to generate a few Excel worksheet files (xlsx) (I am not very knowledgeable in Excel, by the way).
The original worksheet I am reading the data from has a main sheet used to fill in lots of info, which is then distributed across many other sheets that use formulas to calculate some results.
My Python script does the processing I want by automatically filling in this main sheet, which then produces the results in the other sheets. I then save the worksheet and everything looks fine.
Now, I want to split those worksheets into individual ones without including the main sheet in any of them. This appears to be pretty challenging.
I first tried using the data_only argument of load_workbook, but I quickly discovered that in order to preserve the values and not the formulas in the spreadsheet (or just get None back), I'd have to manually open and save each one of the files so that a temporary cache is created. That won't really do in my case, since I want the whole thing to be automated.
Of the things I tried, the one that came closest is this piece of code:
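import pandas as pd
import xlwings as xw

# Read the computed values of the 'PRIM' sheet through a running Excel instance
workbook = xw.Book('generatedFiles/generated.xlsx')
sheet1 = workbook.sheets['PRIM'].used_range.value
df = pd.DataFrame(sheet1)
df.to_excel('generatedFiles/fixed-generated.xlsx', index=False, header=False)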
This does indeed generate a spreadsheet with the values, using its pandas dataframe, but the problem is that it doesn't preserve any information about the types of the values. So for example, an integer is being treated as a string when saved.
This messes up the processing done by an external parser that I feed these files to.
Any ideas on how to fix that? What would be the best way to go about doing something like that?
Thanks !

Saving XlsxWriter workbook more than once

I am writing software that manipulates Excel sheets. So far, I've been using xlrd and xlwt to do so, and everything works pretty well.
It opens a sheet (xlrd) and copies select columns to a new workbook (xlwt)
It then opens the newly created workbook to read data (xlrd) and does some math and formatting with the data (which couldn't be done if the file isn't saved once) - (xlwt saves once again)
However, I now want to add charts to my documents, which xlwt does not support. I have found that xlsxwriter does, but it adds another complication to my code: xlsxwriter only has xlsxwriter.close(), which saves AND closes the document.
Does anyone know if there's any workaround for this? Whenever I use xlsxwriter.close(), my workbook object containing the document I'm writing isn't usable anymore.
Fundamentally, there is no reason you need to read twice and save twice. For your current (no charts) process, you can just read the data you need using xlrd; then do all your processing; and write once with xlwt.
Following this workflow, it is a relatively simple matter to replace xlwt with XlsxWriter.
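The read-once, process, write-once workflow could look roughly like the sketch below. It assumes xlrd and XlsxWriter are installed and that the selected column holds only numbers; the function names, the "Data" sheet name, and the running-total step (standing in for "some math") are illustrative choices, not part of the original question.

```python
def read_columns(path, column_indexes, sheet_index=0):
    """Read the selected columns once with xlrd (xlrd 2.x reads .xls files only)."""
    import xlrd  # third-party: pip install xlrd

    sheet = xlrd.open_workbook(path).sheet_by_index(sheet_index)
    return [sheet.col_values(i) for i in column_indexes]


def add_running_total(values):
    """Example processing step done in memory: cumulative sum of a numeric column."""
    total = 0.0
    result = []
    for value in values:
        total += value
        result.append(total)
    return result


def write_report(path, values):
    """Write data, derived data, and a chart, then save exactly once.

    `values` must be a non-empty list of numbers.
    """
    import xlsxwriter  # third-party: pip install XlsxWriter

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet("Data")
    worksheet.write_column(0, 0, values)                     # column A: raw values
    worksheet.write_column(0, 1, add_running_total(values))  # column B: running total

    chart = workbook.add_chart({"type": "line"})
    # [sheet name, first row, first col, last row, last col], all zero-based
    chart.add_series({"values": ["Data", 0, 1, len(values) - 1, 1]})
    worksheet.insert_chart("D2", chart)

    workbook.close()  # the single save; the workbook is not needed afterwards
```

Because all the math happens on plain Python lists before anything is written, there is no need to reopen the output file, so the fact that close() both saves and closes is no longer a problem.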

How to parse cell format data from a .xlsx file using xlrd

I am relatively new to Python and I am trying to read information from an Excel sheet to generate a graph. So far I am using the most current version of the xlrd library (0.9.4) in a nested for loop to grab the value from each cell. However, I am unsure how to access the formatting information for each cell.
For example, if a cell were formatted to display as currency in the Excel file, using the standard sheet.cell(row, column).value from xlrd would only return 5.0 instead of $5.00.
I found here that you can set the formatting_info parameter to true when opening the workbook in order to see some of the format information, however I am primarily using excel 2013 and my excel sheets are being saved by default as .xlsx files. According to this issue on GitHub, support for formatting_info has not yet been implemented for .xlsx files.
Is there any way around using the formatting_info flag, or any other way that I can detect when a format, currency specifically, has been used in order to reflect that in my graphs? I am aware that it is possible to convert .xlsx files to .xls files such as shown here, but I am concerned about information/formatting loss.

How to save in openpyxl without losing formulae?

Because I need to parse and then use the actual data in cells, I open an xlsm in openpyxl with data_only = True.
This has proved very useful. Now, though, I have the same need for an xlsm that contains formulae in cells, and when I save my changes, the formulae are missing from the saved version.
Are data_only = True and formulae mutually exclusive? If not, how can I access the actual value in cells without losing the formulae when I save?
When I say I lose the formulae, it seems that the results of the formulae (sums, concatenations etc.) get preserved. But the actual formulae themselves are no longer displayed when a cell is clicked.
UPDATE:
To confirm whether the formulae were being preserved, I re-opened the saved xlsm, this time with data_only left as False, and checked the value of a cell that had been constructed using a formula. Had the formulae been preserved, opening the xlsm with data_only set to False should have returned the formula. But it returns the actual text value (which is not what I want).
If you want to preserve the integrity of the workbook, i.e. retain the formulae, then you cannot use data_only=True. The documentation makes this very clear.
Part of your question was: Are data_only = True and formulae mutually exclusive?
The answer to that, in openpyxl, is yes.
But this is not intrinsic to Excel. You could have a library like openpyxl which gives you access to both the formulas and their results. This is unlikely to happen, since the maintainer(s) of openpyxl are philosophically opposed to this idea.
So, how you're expected to handle your kind of situation in openpyxl is to load the workbook twice: once with data_only=True just to read the data (which you keep in memory), then load it again as a "different" workbook with data_only=False to get a writable version.
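A minimal sketch of that load-twice pattern, assuming openpyxl is installed. The helper names are hypothetical, and the simple "starts with =" check ignores less common cases such as array formulas.

```python
def is_formula(value):
    """True if a value read with data_only=False looks like a formula string."""
    return isinstance(value, str) and value.startswith("=")


def load_values_and_workbook(path, keep_vba=False):
    """Return (cached_results, writable_workbook) for the file at `path`.

    cached_results maps (sheet title, cell coordinate) to the value Excel
    last stored for each formula cell. It is None for files that Excel has
    never calculated and saved. Pass keep_vba=True for .xlsm files.
    """
    # Imported here so is_formula() is usable without openpyxl installed.
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    # Load 1: formulas intact. This is the only object that gets saved.
    writable = load_workbook(path, keep_vba=keep_vba)
    # Load 2: cached results only. Read from it, never save it.
    values = load_workbook(path, data_only=True)

    cached = {}
    for sheet in writable.worksheets:
        values_sheet = values[sheet.title]
        for row in sheet.iter_rows():
            for cell in row:
                if is_formula(cell.value):
                    key = (sheet.title, cell.coordinate)
                    cached[key] = values_sheet[cell.coordinate].value
    return cached, writable
```

Edits then go into the returned writable workbook, which is saved with its save() method; the data_only workbook is discarded so that its formula-free cells never overwrite the file.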
The "canonical" way of modifying an existing workbook with Python while preserving everything else (including formatting, formulas, charts, macros, etc.) is to use a COM interface (such as PyWin32, or higher-level wrappers like pywinauto or xlwings) to control a running instance of Excel. Of course, this is only possible if you are running on a machine with Excel installed.

How do I write to one sheet in an already existing excel sheet in Python?

I have an Excel file that has four sheets. One sheet, sheet 4, contains data in simple CSV form, and the others read the data from this sheet and make different calculations and graphs. In my Python application I would like to open the Excel file, open sheet 4, and replace the data. I know you technically can't open and edit Excel files however you like with Python, due to the complex file structure of XLS (previous relevant answer), but is there a workaround for this specific case? Remember, the only thing I want to do is open the data sheet, write to it, and ignore the others...
Note: Previous answers to relevant questions have suggested using the copy function in xlutils. But that doesn't work in this case, as the rest of the sheets are rather complex. The graphs, for example, can't be preserved with the copy function.
I used to use pyExcelerator. It certainly did a good job, but I'm not sure whether it is still maintained.
https://pypi.python.org/pypi/pyExcelerator/
hth.
