I was wondering if openpyxl can read and/or write rich text into excel. I am aware that this question was asked once before in 2012 linked below, but I am not sure if this has changed.
As it stands load_workbook() seems to throw away rich text formatting.
As for a specific problem, I need to open, edit, and save a workbook where some cells have both superscripted and normal text in one cell. When I save the workbook, the format of the first character of the cell is applied to the rest of the cell.
Here is the link to the 2012 question:
How do I find the formatting for a subset of text in an Excel document cell
After looking around, it seems like rich text was implemented in openpyxl (based on the issues list on openpyxl's bitbucket):
https://bitbucket.org/openpyxl/openpyxl/issues?q=rich+text
But I am still unclear on how to use it (if I interpreted the issues list correctly at all). If it helps at all, I am not actually editing the contents of these cells; I simply need them not to lose their formatting on save.
Any thoughts would be greatly appreciated.
Thanks!
Best
Formatting below the level of the cell is not supported by openpyxl. To use it you'd have to implement your own code when writing as openpyxl just stores whatever strings it receives. Full read/write support would add a great deal of complexity.
I have an interest in utilizing the stock quote feature of excel while also creating the file using the xlsxwriter library in Python. I am familiar with how to write and format text using xlsxwriter but I do not see any option to create a file with certain cells already set to have a stock data type. To be clear, the link from microsoft below basically summarizes the manual process I'm looking to have taken care of in the excel sheet before an actual user ever opens up the file.
https://support.microsoft.com/en-us/office/get-a-stock-quote-e5af3212-e024-4d4c-bea0-623cf07fbc54
I am open to other python based solutions to this issue if the general consensus is that xlsxwriter doesn't support this feature. I really appreciate any advice here.
I am the author of XlsxWriter. I just looked into this and these aren't regular Excel formulas. They have a lot of additional metadata and richdata helper files associated with them and even the company names aren't standard string types. So unfortunately these aren't, and probably won't be, supported.
I've been looking for ages to find a suitable module to interact with excel, which needs to do the following:
Check a column of cells for an "incorrect" value and change it
Check for empty cells, and if so, replace it
Check a cell value is consistent with the contents of another cell (for example, if called Datasheet, the code in another cell = DS) and if not, change it.
I've looked at openpyxl but I am running Python 3 and I can only seem to find it working for 2.
I've seen a few others but they seem to be mainly focusing creating a new spreadsheet and simple writing/reading.
The Pandas library is amazing to work with excel files. It can read excel files easily and you then have access to a lot of tools. You can do all the operations you mentioned above. You can also save your result in the excel format.
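As a rough illustration of the three checks from the question, here is a minimal pandas sketch. The column names (doc_type, code, status), the "N/A" marker and the NAME_TO_CODE mapping are all made up for the example; a real sheet would be loaded with pd.read_excel() and written back with DataFrame.to_excel() instead of being built inline.

```python
import pandas as pd

# Hypothetical rule: which code belongs to which document type.
NAME_TO_CODE = {"Datasheet": "DS", "Manual": "MN"}


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three checks to a copy of df and return the copy."""
    out = df.copy()

    # 1) Replace a known "incorrect" value in one column.
    out["status"] = out["status"].replace({"N/A": "unknown"})

    # 2) Fill empty cells (they arrive as missing values) with a default.
    out["status"] = out["status"].fillna("unknown")

    # 3) Make the code column consistent with the doc_type column.
    expected = out["doc_type"].map(NAME_TO_CODE)
    mismatch = expected.notna() & (out["code"] != expected)
    out.loc[mismatch, "code"] = expected[mismatch]
    return out


# Inline sample data standing in for pd.read_excel("book.xlsx").
df = pd.DataFrame({
    "doc_type": ["Datasheet", "Manual", "Datasheet"],
    "code": ["DS", "XX", None],
    "status": ["ok", "N/A", None],
})
cleaned = clean(df)
print(cleaned)
```

Note that a round trip through pandas keeps only the cell values; formatting, formulas and charts in the original workbook are not carried over to the file written by to_excel().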
I have a script to format a bunch of data and then push it into excel, where I can easily scrub the broken data, and do a bit more analysis.
As part of this I'm pushing quite a lot of data to excel, and want excel to do some of the legwork, so I'm putting a certain number of formulae into the sheet.
Most of these ("=AVERAGE(...)" "=A1+3" etc) work absolutely fine, but when I add the standard deviation ("=STDEV.P(...)") I get a name error when I open in excel 2013.
If I click in the cell within excel and hit Enter (i.e. don't change anything within the cell), the cell re-calculates without the name error, so I'm a bit confused.
Is there anything extra that needs to be done to get this to work?
Has anyone else had any experience of this?
Thanks,
Will
--
I've investigated further and this is the issue:
When saving the formula "STDEV.P" openpyxl saves it as:
"=_xludf.STDEV.P(...)"
which is correct for many formulae, but not this one.
The result should be:
"=_xlfn.STDEV.P(...)"
When I explicitly change the function to the latter, it works as expected.
I'll file a bug report, so hopefully this is done automatically in the future.
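Until then, the manual workaround above (writing the "_xlfn." prefix explicitly) could be automated with a small helper applied to each formula string before it is assigned to a cell. This is only a sketch: add_xlfn_prefix and NEEDS_XLFN are hypothetical names, and the set contains just the one function discussed here, so it would need extending for other newer functions.

```python
import re

# Assumption: only the function from this question; extend as needed.
NEEDS_XLFN = frozenset({"STDEV.P"})


def add_xlfn_prefix(formula: str, names=NEEDS_XLFN) -> str:
    """Return formula with "_xlfn." added in front of the listed functions."""
    # Longest names first so a shorter name never shadows a longer one.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # The lookbehind skips names that already follow a word character or a
    # dot (for example an existing "_xlfn." prefix); the lookahead requires
    # an opening parenthesis, i.e. a function call.
    pattern = re.compile(r"(?<![\w.])(" + alternatives + r")(?=\()")
    return pattern.sub(lambda match: "_xlfn." + match.group(1), formula)


print(add_xlfn_prefix("=STDEV.P(A1:A10)"))
```

The helper is plain text substitution, so it would also rewrite a matching name inside a quoted string within the formula.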
I suspect that there might be a subtle difference in what you think you need to write as the formula and what is actually required. openpyxl itself does nothing with the formula, not even check it. You can investigate this by comparing two files (one from openpyxl, one from Excel) with ostensibly the same formula. The difference might be simple – using "." for decimals and "," as a separator between values even if English isn't the language – or it could be that an additional feature is required: Microsoft has continued to extend the specification over the years.
Once you have some pointers please submit a bug report on the openpyxl issue tracker.
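One way to do the comparison suggested above is to read the formulas exactly as they are stored inside each file. An .xlsx file is a zip archive whose worksheets are XML parts, so the standard library is enough. The function name stored_formulas is made up for this sketch, and it assumes the usual xl/worksheets/ layout of the archive.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML elements (c = cell, f = formula).
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def stored_formulas(xlsx):
    """Return {worksheet part: {cell reference: formula text}} for a file.

    xlsx may be a path or a binary file-like object. The formula text is
    returned as stored in the XML, which has no leading "=".
    """
    found = {}
    with zipfile.ZipFile(xlsx) as archive:
        for name in archive.namelist():
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            for cell in root.iter(MAIN_NS + "c"):
                formula = cell.find(MAIN_NS + "f")
                if formula is not None and formula.text:
                    found.setdefault(name, {})[cell.get("r")] = formula.text
    return found
```

Calling it on the file saved by openpyxl and on the same workbook re-saved from Excel, then comparing the two dictionaries, should show whether the stored text differs (for instance in a function prefix).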
I got an excel file that has four sheets. One sheet, sheet 4, contains data in simple CSV and the others read the data of this sheet and make different calculations and graphs. In my python application I would like to open the excel file, open sheet 4, and replace the data. I know you technically can't open and edit excel however you like with Python, due to the complex file structure of XLS (previous relevant answer), but is there a workaround for this specific case? Remember the only thing I want to do is to open the data sheet, write to it, and ignore the others...
Note: Previous answers to relevant questions have suggested using the copy function in xlutils. But that doesn't work in this case, as the rest of the sheets are rather complex. The graphs, for example, can't be preserved with the copy function.
I used to use pyExcelerator. It did certainly a good job, but I'm not sure if it is maintained.
https://pypi.python.org/pypi/pyExcelerator/
hth.
Two-headed question here guys,
First, I've been trying to do some searching for a way to read .xlsx files in python. Does xlrd read .xlsx files now? If not, what's the recommended way to read/write to such a file?
Second, I have two files with similar information. One primary field with scoping subfields (like coordinates (the primary field) -> city -> state -> country). In the older file, the information is given an ID number while the newer file (with records deleted/added) does not have these IDs. In python, I'd 1) open the two files 2) check the primary field of the older file against the primary field of the newer file and merge their information to a new file if they match. Given that it's not too big of a file, I don't mind the O(n^2) complexity. My question is this: is there a well-defined way to do this in VBA or excel? Everything I think of using excel's library seems too slow and I'm not excellent with VBA.
I frequently access excel files through python and xlrd, python and the Excel COM object. For this job, xlrd won't work because it does not support the xlsx format. But no matter, both approaches are overkill for what you are looking for. Simple Excel formulas will deliver what you want, specifically VLOOKUP.
VLOOKUP "looks for a value in the leftmost column of a table, and then returns a value in the same row from the column you specify".
Some advice on VLOOKUP: First, if you want to match on multiple cells, create a "key" cell which concatenates the cells you are interested in (in both workbooks). Second, make sure to set the last argument to VLOOKUP as FALSE because you will only want exact matches.
Regarding performance, excel formulas are often very fast.
Read the help file on VLOOKUP and ask further questions here.
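For comparison with the Python approach outlined in the question, the same idea (a concatenated key plus exact matching) can be sketched with a dictionary, which also avoids the O(n^2) scan. The field names and sample rows below are invented for illustration, and the rows are assumed to be already loaded as dictionaries.

```python
def build_key(record, fields):
    """Concatenate the chosen fields into one lookup key.

    A separator keeps ("ab", "c") and ("a", "bc") from colliding.
    """
    return "|".join(str(record[field]) for field in fields)


def merge_ids(old_rows, new_rows, key_fields, id_field="id"):
    """Copy the ID from old_rows onto each exactly matching new row.

    Rows without a match get None, much as VLOOKUP with FALSE gives #N/A.
    """
    lookup = {build_key(row, key_fields): row[id_field] for row in old_rows}
    return [
        {**row, id_field: lookup.get(build_key(row, key_fields))}
        for row in new_rows
    ]


# Hypothetical sample data.
old = [
    {"id": 1, "coords": "51.5,-0.1", "city": "London"},
    {"id": 2, "coords": "48.9,2.4", "city": "Paris"},
]
new = [
    {"coords": "48.9,2.4", "city": "Paris"},
    {"coords": "52.5,13.4", "city": "Berlin"},
]
merged = merge_ids(old, new, key_fields=["coords", "city"])
print(merged)
```

If two old rows share the same key, the later one wins in this sketch; VLOOKUP with FALSE would instead return the first match.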
Late edit (from Mark Baker's answer): There is now a python solution for xlsx. Openpyxl was created this year by Eric Gazoni to read and write Excel's xlsx format.
I only heard about this project this morning, so I've not had an opportunity to look at it, and have no idea what it's like; but take a look at Eric Gazoni's openpyxl project. The code can be found on bitbucket. The driving force behind this was the ability to read/write xlsx files from Python.
Try http://www.python-excel.org/
My mistake - I missed the .xlsx detail.
I guess it's a question of what's easier: finding or writing a library that handles .xlsx format natively OR saving all the Excel spreadsheets as .xls and getting on with it with the libraries that merely handle the older format.
Adding on the answer of Steven Rubalski:
You might want to be able to have your lookup value in any other than the leftmost column. In those cases the Index and Match functions come in handy.
See: http://www.mrexcel.com/articles/excel-vlookup-index-match.php