I have a script to format a bunch of data and then push it into excel, where I can easily scrub the broken data, and do a bit more analysis.
As part of this I'm pushing quite a lot of data to excel, and want excel to do some of the legwork, so I'm putting a certain number of formulae into the sheet.
Most of these ("=AVERAGE(...)", "=A1+3", etc.) work absolutely fine, but when I add the standard deviation ("=STDEV.P(...)") I get a name error when I open the file in Excel 2013.
If I click in the cell within Excel and hit Enter (i.e. don't change anything within the cell), the cell recalculates without the name error, so I'm a bit confused.
Is there anything extra that needs to be done to get this to work?
Has anyone else had any experience of this?
Thanks,
Will
--
I've investigated further and this is the issue:
When saving the formula "STDEV.P" openpyxl saves it as:
"=_xludf.STDEV.P(...)"
which is correct for many formulae, but not this one.
The result should be:
"=_xlfn.STDEV.P(...)"
When I explicitly change the function to the latter, it works as expected.
I'll file a bug report, so hopefully this is done automatically in the future.
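Until that happens, the explicit rename described above can be scripted. This is only a minimal sketch: the helper name add_xlfn_prefix and the NEWER_FUNCTIONS set are my own inventions, not part of openpyxl, and the set is a small hand-picked list that you would need to extend yourself.

```python
import re

# Assumption: a small, hand-picked set of functions that Excel stores with the
# "_xlfn." prefix. It is not exhaustive; extend it for the functions you use.
NEWER_FUNCTIONS = {"STDEV.P", "STDEV.S", "VAR.P", "VAR.S"}

# A function call: an upper-case name (dots allowed) followed by "(".
# The lookbehind skips names that are already prefixed (preceded by ".")
# and avoids matching in the middle of a longer name.
_FUNC_CALL = re.compile(r"(?<![\w.])([A-Z][A-Z0-9.]*)\s*\(")


def add_xlfn_prefix(formula):
    """Return the formula with known newer functions prefixed by "_xlfn."."""
    def repl(match):
        if match.group(1) in NEWER_FUNCTIONS:
            return "_xlfn." + match.group(0)
        return match.group(0)

    return _FUNC_CALL.sub(repl, formula)


print(add_xlfn_prefix("=STDEV.P(A1:A10)"))
```

The result can then be assigned to a cell as usual, e.g. ws["B1"] = add_xlfn_prefix("=STDEV.P(A1:A10)"). Formulae that already carry the prefix are left unchanged, so it is safe to run the helper twice.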
I suspect that there might be a subtle difference in what you think you need to write as the formula and what is actually required. openpyxl itself does nothing with the formula, not even check it. You can investigate this by comparing two files (one from openpyxl, one from Excel) with ostensibly the same formula. The difference might be simple – using "." for decimals and "," as a separator between values even if English isn't the language – or it could be that an additional feature is required: Microsoft has continued to extend the specification over the years.
Once you have some pointers please submit a bug report on the openpyxl issue tracker.
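For the comparison suggested above, an .xlsx file is a zip archive, so the formula text can be read straight out of the worksheet XML with the standard library. This is a rough sketch, not an openpyxl feature: the function names are made up for illustration, and it assumes the sheet of interest is stored as xl/worksheets/sheet1.xml, which is the usual name for the first sheet but is not guaranteed.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML worksheet elements.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def stored_formulas(xlsx_path, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell reference: formula text exactly as stored in the file}.

    Assumption: sheet_part is the archive member holding the worksheet.
    Note that formulae are stored without the leading "=".
    """
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read(sheet_part))
    formulas = {}
    for cell in root.iterfind(".//m:c", NS):
        f = cell.find("m:f", NS)
        if f is not None and f.text:
            formulas[cell.get("r")] = f.text
    return formulas


def diff_formulas(path_a, path_b):
    """Return {cell: (formula in a, formula in b)} for cells that differ."""
    a = stored_formulas(path_a)
    b = stored_formulas(path_b)
    return {
        ref: (a.get(ref), b.get(ref))
        for ref in sorted(set(a) | set(b))
        if a.get(ref) != b.get(ref)
    }
```

Calling diff_formulas("from_openpyxl.xlsx", "from_excel.xlsx") (both file names are placeholders) should list the cells whose stored formula text differs, which is the kind of pointer a bug report needs.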
Related
I have a ROM estimator tool I've made that takes data which is then output to an Excel file with openpyxl. The problem I keep running into is that when I output an XLOOKUP function to my Excel file and go to open it, Excel throws an error on the sheet where the formula is placed. When allowing Excel to fix it, Excel deletes the formula, and I haven't been able to find a workaround.
At first I thought that the function was failing because of the order in which it was placed into the Excel file. The XLOOKUP function populates a column of cells on the first page of the workbook. I attempted to use different functions that more or less did the same thing, but no matter the function I got the same "excel found a problem with one or more formula references in this worksheet". I also attempted to add the formula after the rest of the workbook was populated with information, in case it was a data error where the data it was trying to access didn't exist yet. So I put the formula at the very end of my code, once everything else had been populated, and I still got the same error message.
What doesn't make sense is that if I decide not to populate the cell with the formula that is causing the issue using openpyxl, and instead manually input the formula once the file is open and the rest of the worksheet is populated, it works totally fine. It just seems that when I input the formula with openpyxl I am unable to open the document without Excel removing the formula to fix it.
Let me know if anyone has anything I could try to fix this issue. Right now I input the formula using openpyxl and simply remove the "=", adding it back once I open the file.
I've been looking for ages to find a suitable module to interact with excel, which needs to do the following:
Check a column of cells for an "incorrect" value and change it
Check for empty cells, and if so, replace it
Check a cell value is consistent with the contents of another cell (for example, if called Datasheet, the code in another cell = DS) and if not, change it.
I've looked at openpyxl but I am running Python 3 and I can only seem to find it working for 2.
I've seen a few others but they seem to be mainly focusing on creating a new spreadsheet and simple writing/reading.
The pandas library is amazing for working with Excel files. It can read Excel files easily and you then have access to a lot of tools. You can do all the operations you mentioned above. You can also save your result in the Excel format.
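As a rough illustration of the three checks from the question, here is a minimal sketch. The column names (Type, Code, Status, Owner), the replacement values and the TYPE_TO_CODE mapping are all made up for the example; substitute whatever your sheet actually contains.

```python
import pandas as pd

# Hypothetical mapping from a document type to the code expected in another column.
TYPE_TO_CODE = {"Datasheet": "DS", "Manual": "MN"}


def clean_sheet(df):
    """Apply the three checks to a copy of df and return the cleaned copy."""
    out = df.copy()

    # 1. Replace a known "incorrect" value in a column.
    out["Status"] = out["Status"].replace("N/A", "Unknown")

    # 2. Replace empty cells with a default value.
    out["Owner"] = out["Owner"].fillna("Unassigned")

    # 3. Make "Code" consistent with "Type" wherever the type is known.
    expected = out["Type"].map(TYPE_TO_CODE)
    mismatch = expected.notna() & (out["Code"] != expected)
    out.loc[mismatch, "Code"] = expected[mismatch]

    return out


df = pd.DataFrame({
    "Type": ["Datasheet", "Manual"],
    "Code": ["XX", "MN"],
    "Status": ["N/A", "OK"],
    "Owner": [None, "Ann"],
})
print(clean_sheet(df))
```

With a real workbook you would get the DataFrame from pd.read_excel(...) and write the result back with DataFrame.to_excel(...) instead of building it by hand.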
I was wondering if openpyxl can read and/or write rich text into excel. I am aware that this question was asked once before in 2012 linked below, but I am not sure if this has changed.
As it stands load_workbook() seems to throw away rich text formatting.
As for a specific problem, I need to open, edit, and save a workbook where some cells have both superscripted and normal text in one cell. When I save the workbook, the format of the first character of the cell is applied to the rest of the cell.
Here is the link to the 2012 question:
How do I find the formatting for a subset of text in an Excel document cell
After looking around, it seems like rich text was implemented in openpyxl (based on the issues list on openpyxl's bitbucket):
https://bitbucket.org/openpyxl/openpyxl/issues?q=rich+text
But I am still unclear on how to use it (if I interpreted the issues list correctly at all). If it helps at all, I am not actually editing the contents of these cells; I simply need them not to lose their formatting on save.
Any thoughts would be greatly appreciated.
Thanks!
Best
Formatting below the level of the cell is not supported by openpyxl. To use it you'd have to implement your own code when writing as openpyxl just stores whatever strings it receives. Full read/write support would add a great deal of complexity.
I'm going through the book ThinkStats. http://greenteapress.com/thinkstats/nsfg_data.html
I'd prefer to work with pandas because I'd like to strengthen my skill in that, but I'm having a hard time making out how to open this file.
http://greenteapress.com/thinkstats/nsfg_data.html
The usual pd.read_csv(filename) does not seem to work.
I'm also reading the code provided with the book, but it's a bit difficult to make out for me.
The pandas read_csv function will not work on this data set without some thinking about the data set itself. Indeed, it is neither a comma separated value nor a space separated format.
Instead, it is a kind of home-made format where the number of fields per line is not constant. Besides, the number of spaces between values is not constant either, which is another issue.
In order to better understand the format of the data files, I would recommend you get the code from the author (the link is provided in the book, but it is here: http://greenteapress.com/thinkstats/) and play with it to figure out the format being used.
Provided you have the data file, you can use the survey module
import survey
preg = survey.Pregnancies()
preg.ReadRecords(".")
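If you would rather not depend on the survey module, the description above (a varying number of fields and spaces per line) is what a fixed-width layout with blank fields looks like, so one option is to slice each line by column position. This is only a sketch under that assumption: the field names and column positions below are invented for the example and are not the real NSFG layout, which you would have to take from the book's code or the data documentation.

```python
# Hypothetical layout: (field name, first column, last column, converter).
# Columns are 1-based and inclusive.
FIELDS = [
    ("caseid", 1, 5, int),
    ("weeks", 7, 8, int),
    ("weight", 10, 13, float),
]


def parse_fixed_width(line, fields=FIELDS):
    """Slice one line into a dict; blank fields become None."""
    record = {}
    for name, start, end, cast in fields:
        raw = line[start - 1:end].strip()
        record[name] = cast(raw) if raw else None
    return record


lines = [
    "    1 39  7.5",
    "   12     8.0",  # the "weeks" field is blank on this line
]
for line in lines:
    print(parse_fixed_width(line))
```

Once the real positions are known, the resulting list of dicts can be passed to pd.DataFrame, or the same positions can be given to pd.read_fwf through its colspecs argument.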
Using the Python library openpyxl I am reading an XLSX file created in Excel 2007. It is empty apart from cell A1, which is coloured yellow and has the value "test" written in it. I can easily retrieve the value from that cell; however, when I attempt to determine the fill colour I get the following results:
this_sheet.cell("A1").style.fill.start_color
returns "FFFFFF"
this_sheet.cell("A1").style.fill.end_color
returns "FF0000"
Testing this on other blank cells I get exactly the same results, and trying to retrieve the font style information keeps returning Calibri size 11 (our system default).
Am I going about this all wrong? Is there an alternative method I should be using?
Any help would be greatly appreciated.
Thanks!
Openpyxl is still in development, and styles are not yet completely implemented, thus you can encounter some issues here and there. Don't hesitate to open an issue on the project bug tracker if you want.