Python Excel parsing data with xlrd - python

Fairly simple; I've got the data I want out of the excel file, but can't seem to find anything inside the XLRD readme that explains how to go from this:
xldate:40397.007905092592
number:10000.0
text:u'No'
number:0.1203
number:0.096000000000000002
number:0.126
to their respective python datatypes. Any ideas?

Did you try the documentation? See the section on date functions.

I had the same issue and used the following as a last resort:
def numobj2fl(p):
    return float(str(p).split(":")[1])
for converting the 'number object' to float.
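A less fragile route than parsing the printed text: the `type:value` strings in the question are the printed form of xlrd cell objects, and each cell exposes the raw value as `cell.value` and a numeric type code as `cell.ctype`. Numbers already arrive as Python floats and text as strings, so only dates need converting. For dates, xlrd offers `xlrd.xldate_as_tuple(value, book.datemode)`. The sketch below uses only the standard library so it runs without a workbook; `to_python` is a hypothetical helper, and it assumes the workbook uses the 1900 date system (datemode 0).

```python
from datetime import datetime, timedelta

# Type codes mirroring xlrd's XL_CELL_TEXT, XL_CELL_NUMBER and XL_CELL_DATE.
XL_CELL_TEXT, XL_CELL_NUMBER, XL_CELL_DATE = 1, 2, 3

# Assumption: 1900 date system. Starting at 1899-12-30 absorbs Excel's
# fictitious 29 Feb 1900, so this is only correct for serials >= 61.
EXCEL_EPOCH_1900 = datetime(1899, 12, 30)


def to_python(ctype, value):
    """Convert a (ctype, value) pair, as found in cell.ctype and cell.value."""
    if ctype == XL_CELL_DATE:
        # The integer part counts days, the fraction is the time of day.
        return EXCEL_EPOCH_1900 + timedelta(days=value)
    if ctype == XL_CELL_NUMBER:
        return float(value)
    return value  # text cells already hold a Python string


stamp = to_python(XL_CELL_DATE, 40397.007905092592)
print(stamp.date())                         # 2010-08-07
print(to_python(XL_CELL_NUMBER, 10000.0))   # 10000.0
print(to_python(XL_CELL_TEXT, "No"))        # No
```

With a real workbook, the pairs would come from something like `sheet.cell(row, col)`, reading `.ctype` and `.value` from the returned cell.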

Related

Broken Excel output: Openpyxl formula settings?

I am creating some Excel spreadsheets from pandas DataFrames using the pandas.ExcelWriter().
Issue:
For some string input, this creates broken .xlsx files that need to be repaired. (Excel reports a problem with some content and removes the formula; see the error messages below.)
I assume this happens because Excel interprets the cell content not as a string, but a formula which it cannot parse, e.g. when a string value starts with "="
Question:
When using xlsxwriter as engine, I can solve this issue by setting the argument options = {"strings_to_formulas" : False }
Is there a similar argument for openpyxl?
Troubleshooting:
I found the data_only argument to Workbook, but it only seems to apply to reading files / I cannot get it to work with ExcelWriter().
Not all output values are strings / I'd like to avoid converting all output to str
Could not find an applicable question on here
Any hints are much appreciated, thanks!
Error messages:
We found a problem with some content in 'file.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes
The log after opening says:
[...] summary="Following is a list of removed records:">Removed Records: Formula from /xl/worksheets/sheet1.xml part [...]
Code
import pandas
excelout = pandas.ExcelWriter(output_file, engine = "openpyxl")
df.to_excel(excelout)
excelout.save()
Versions:
pandas #0.24.2
openpyxl #2.5.6
Excel 2016 for Mac (but replicates on Win)
I've struggled with this issue too.
I found a strange solution for formulas.
I had to replace every ; (semicolon) with , (comma) in the formulas.
When I opened the resulting xlsx file in Excel, the error did not appear, and the formula was displayed in Excel with the usual ;.
I spent FAR too long trying to figure out this error.
Turned out I had an extra bracket, so the formula wasn't valid.
I know 99% of people will read this and say "that's not the issue" and move on, but take your formula and paste it into Excel if you can (replacing dynamic values as best you can) and see if Excel accepts it.
If Excel accepts it, move on and look for whatever the other cause is, but if you find it doesn't like the formula, maybe I just saved you a couple of hours....
My command: f'''=IF(ISBLANK(E{row}),FALSE," "))'''
Tiny command, could not understand what was wrong with it. :facepalm:
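A quick way to catch this particular mistake before writing the file is to count brackets in the generated formula string. This is only a rough sketch: `paren_balance` is a hypothetical helper, it checks nothing except parentheses, and it assumes string literals in the formula are delimited by double quotes.

```python
def paren_balance(formula):
    """Return the count of unclosed '(' in a formula, ignoring quoted text.

    0 means balanced, a positive number means brackets were left open,
    and a negative number means a ')' appeared with nothing to close.
    """
    depth = 0
    in_string = False
    for ch in formula:
        if ch == '"':
            # An escaped quote ("") toggles twice, so the state stays correct.
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return depth  # stop at the first unmatched ')'
    return depth


row = 5
broken = f'''=IF(ISBLANK(E{row}),FALSE," "))'''
fixed = f'''=IF(ISBLANK(E{row}),FALSE," ")'''
print(paren_balance(broken))  # -1
print(paren_balance(fixed))   # 0
```

A result of 0 does not prove that Excel will accept the formula; pasting it into Excel, as suggested above, remains the real test.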

What happens exactly in the i/o of json files?

I struggled with the following for a couple of hours yesterday. I figured out a workaround, but I'd like to understand a little more of what's going on in the background and, ideally, I'd like to remove the intermediate file from my code just for the sake of elegance. I'm using python, by the way and files_df starts off as a pandas dataframe.
Can you help me understand why the following code gives me an error?
files_json = files_df.to_json(orient='records')
for file_json in files_json:
    print(file_json)  # do stuff
But this code works?
import json
files_json = files_df.to_json(orient='records')
with open('export_json.json', 'w') as f:
    f.write(files_json)
with open('export_json.json') as data:
    files_json = json.load(data)
for file_json in files_json:
    print(file_json)  # do stuff
Obviously, the export/import is converting the data somehow into a usable format. I would like to understand that a little better and know if there is some option within the pandas files_df.to_json command to perform the same conversion.
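json.load is the opposite of json.dump. to_json returns a single JSON string, so the first loop iterates over its characters; in the second version you export from the pandas DataFrame to a file and then import it again with the standard library, which gives you a list of Python dicts.
Try files_df.to_dict(orient='records') to get that structure directly.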
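If the JSON string is still needed for something else, the intermediate file can also be skipped by parsing the string in memory with `json.loads`. A minimal sketch follows; the string literal is a made-up stand-in for what `files_df.to_json(orient='records')` returns, and the `name` and `size` fields are invented for illustration.

```python
import json

# Stand-in for files_df.to_json(orient='records'): one JSON string.
files_json = '[{"name": "a.txt", "size": 10}, {"name": "b.txt", "size": 20}]'

# Iterating over the string yields single characters, not records.
first_item = next(iter(files_json))
print(repr(first_item))  # '['

# json.loads parses the string into a list of dicts, with no file involved.
records = json.loads(files_json)
for record in records:
    print(record["name"], record["size"])
```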

how to read a data file including "pandas.core.frame, numpy.core.multiarray"

I came across a .df file that is stored in a binary format. When I open it in Vim, I can still see strings such as "pandas.core.frame" and "numpy.core.multiarray", so I guess it is related to Python. However, I know little about Python. I have tried the pandas and numpy modules, but failed to read the file. Could you give me any suggestions? Thank you in advance. Here is the Dropbox link to the DF file: https://www.dropbox.com/s/b22lez3xysvzj7q/flux.df
It looks like a DataFrame stored with pickle; use read_pickle() to read it:
import pandas as pd
df = pd.read_pickle('flux.df')
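The readable module names are what give the format away: pickle records the import path of each object's class as plain text inside the otherwise binary data. The sketch below shows the same effect with a standard-library class instead of a DataFrame, so it runs without pandas; `OrderedDict` is only a stand-in chosen for illustration.

```python
import pickle
from collections import OrderedDict

# Serialize an object whose class lives in the "collections" module.
payload = pickle.dumps(OrderedDict(a=1))

# The module and class names are embedded as readable bytes,
# just like "pandas.core.frame" in the file from the question.
print(b"collections" in payload)  # True
print(b"OrderedDict" in payload)  # True

# Loading the bytes rebuilds the original object.
restored = pickle.loads(payload)
print(restored["a"])  # 1
```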

Reading scientific numbers in xlrd

Pretty simple question but haven't been able to find a good answer.
In Excel, I am generating files that need to be automatically read. They are read by an ID number, but the format I get is setting it as text. When using xlrd, I get this format:
5.5112E+12
When I need it in this format:
5511195414392
What is the best way to achieve this? I would like to avoid using xlwt, but if it is necessary I could use help getting started with that too.
Give this a shot:
import decimal
decimalNotation = decimal.Decimal(scientificNotationValueFromExcel)
I made the following quick program to test it out. The Excel file it is reading from has a single entry in the first cell.
from xlrd import open_workbook
import decimal
# Read the single value stored in the first cell of the first sheet
workbook = open_workbook('test.xlsx')
sheet = workbook.sheet_by_index(0)
value = sheet.cell_value(0, 0)
print(decimal.Decimal(value))
I used the CSV module to figure this out, as it read the cells correctly.
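Whether the full ID can be recovered depends on how the cell is stored. A minimal sketch, assuming the value arrives either as a float (a numeric cell) or as a string that already contains the rounded scientific notation; `id_from_cell` is a hypothetical helper.

```python
from decimal import Decimal


def id_from_cell(value):
    """Render a cell value as a plain digit string (hypothetical helper)."""
    return str(int(Decimal(value)))


# A numeric cell holds the full value as a float, so nothing is lost.
print(id_from_cell(5511195414392.0))  # 5511195414392

# A text cell that literally contains "5.5112E+12" has already lost digits.
print(id_from_cell("5.5112E+12"))     # 5511200000000
```

If the second case applies, no conversion in Python can bring the missing digits back; the cell would have to be stored in Excel as a number or as the full text.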

Is there any way to edit an existing Excel file using Python preserving formulae?

I am trying to edit several excel files (.xls) without changing the rest of the sheet. The only thing close so far that I've found is the xlrd, xlwt, and xlutils modules. The problem with these is it seems that xlrd evaluates formulae when reading, then puts the answer as the value of the cell. Does anybody know of a way to preserve the formulae so I can then use xlwt to write to the file without losing them? I have most of my experience in Python and CLISP, but could pick up another language pretty quick if they have better support. Thanks for any help you can give!
I had the same problem... And eventually found the next module:
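from openpyxl import load_workbook
def write_workbook(path, sheet_name, value):
    # Formulas are loaded as formula strings (data_only defaults to False),
    # so cells that are not touched keep their formulas on save.
    wb = load_workbook(path)
    ws = wb[sheet_name]  # replaces the deprecated get_sheet_by_name()
    ws.cell(row=2, column=1).value = value
    wb.save(path)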
Doing this, my file was saved with all previously inserted formulas preserved.
Hope this helps!
I've used the xlwt.Formula function before to be able to get hyperlinks into a cell. I imagine it will also work with other formulas.
Update: Here's a snippet I found in a project I used it in:
link = xlwt.Formula('HYPERLINK("%s";"View Details")' % url)
sheet.write(row, col, link)
As of now, xlrd doesn't read formulas. It's not that it evaluates them, it simply doesn't read them.
For now, your best bet is to programmatically control a running instance of Excel, either via pywin32 or Visual Basic or VBScript (or some other Microsoft-friendly language which has a COM interface). If you can't run Excel, then you may be able to do something analogous with OpenOffice.org instead.
We've just had this problem and the best we can do is to manually re-write the formulas as text, then convert them to proper formulas on output.
So open Excel and replace =SUM(C5:L5) with "=SUM(C5:L5)", including the quotes. If your formula contains a double quote, replace it with two double quotes to escape it, so = "a" & "b" becomes "= ""a"" & ""b"" "
Then in your Python code, loop over every cell in the source and output sheets and do:
output_sheet.write(row, col, xlwt.ExcelFormula.Formula(source_cell[1:-1]))
We use this SO answer to make a copy of the source sheet to be the output sheet, which even preserves styles, and avoids overwriting the hand written text formulas from above.
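The quoting convention described above can be sketched as a pair of helpers. Both function names are hypothetical, and the sketch covers only the text transformation (wrapping in quotes and doubling inner quotes), not the reading or writing of workbooks.

```python
def formula_to_text(formula):
    """Wrap a formula in double quotes, doubling any inner double quotes."""
    return '"' + formula.replace('"', '""') + '"'


def text_to_formula(text):
    """Reverse formula_to_text: strip the outer quotes, undouble inner ones."""
    return text[1:-1].replace('""', '"')


print(formula_to_text('=SUM(C5:L5)'))   # "=SUM(C5:L5)"
print(formula_to_text('= "a" & "b"'))   # "= ""a"" & ""b"""
print(text_to_formula(formula_to_text('= "a" & "b"')))  # = "a" & "b"
```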
