Fastest way to format a column with openpyxl (Python) - python

I'm using Openpyxl and applying number formatting for a dynamically determined number of columns and rows (based on available data), e.g.
ws.cell(row=i, column=idx + 1).number_format = '_(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(#_)'
It takes a long time to format some of the bigger workbooks.
All I'm trying to accomplish is creating workbooks that treat integers and floats as numbers (either no decimal places or two decimal places), rather than strings, and I want that for all idx columns. I've read that it's possible, presumably related to this: https://openpyxl.readthedocs.io/en/stable/_modules/openpyxl/styles/numbers.html but I'm not sure how to implement this.

If all you are trying to accomplish is to make Excel treat numbers as numbers rather than as strings, you can try converting them to float in Python. This method is almost 50% faster than assigning a format to each cell:
ws.cell(row=i, column=idx + 1).value = float(ws.cell(row=i, column=idx + 1).value)
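As a minimal sketch of that conversion step, assuming the values arrive as strings such as "42" or "3.50": the helper name to_number is hypothetical and not part of openpyxl. It returns an int where possible, a float otherwise, and leaves non-numeric text untouched, so integers are not forced into floats.

```python
def to_number(value):
    """Return an int or float for numeric-looking text, else the value unchanged."""
    # Values that are already numeric need no conversion.
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        text = value.strip()
        # Try int first so that "42" stays a whole number.
        try:
            return int(text)
        except ValueError:
            pass
        try:
            return float(text)
        except ValueError:
            return value  # not numeric: keep the original text
    return value


print(to_number("42"))    # 42
print(to_number("3.50"))  # 3.5
print(to_number("abc"))   # abc
```

The result could then be assigned in the same way as above, e.g. ws.cell(row=i, column=idx + 1).value = to_number(raw), where raw is whatever value you were about to write.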

Related

Save float and int to file simultaneously in Python

I have a numpy array with a few columns containing floats, and I want to add one more column containing only zeros and save the result to a file. For the program that I need to use this file with, the last column should appear as 0 instead of 0.00000e+00. I tried this:
z = np.zeros(len(data[:,0])).astype(int)
new_data = np.column_stack((data,z))
np.savetxt("data_new.dat",new_data)
but it doesn't seem to work, i.e. the zeros appear as floats. One more thing: how can I also specify the number of decimals that the floats should be saved with? Thank you!

Long numbers conversion format

I convert an XML file to CSV using some code and specifications that I have added.
As a result I get a CSV file, and when I open it I see some weird numbers that look something like this
1,25151E+21
Is there any way to eliminate this and show the whole numbers? The code that parses XML to CSV is working fine, so I'm assuming it is an Excel thing.
I don't want to do something manually every time I generate a new CSV file.
Additional
The entire code can be found HERE and I have only long numbers in Quality
for qu in sn.findall('.//Qualify'):
    repeated_values['qualify'] = qu.text
CSV doesn't pass any cell formatting rules to Excel. Hence if you open a CSV that has very large numbers in it, the default cell formatting will likely be Scientific. You can try changing the cell formatting to Number, and if that changes the view to the entire number like you want, consider using XlsxWriter to apply cell formatting to the document while writing to XLSX instead of CSV.
I often end up running a lambda on dataframes with this issue when I bring in csv, fwf, etc, for ETL and back out to XLSX. In my case they are all account numbers, so it's pretty bad when Excel helpfully overrides it to scientific notation.
If you don't mind the long number being a string, you can do this:
import numpy as np
# First I force it to be an int column, as I import everything as objects for unrelated reasons
df.thatlongnumber = df.thatlongnumber.astype(np.int64)
# Then I convert that to a string, assigning the result back to the column
df.thatlongnumber = df.thatlongnumber.apply(lambda x: '{:d}'.format(x))
Let me know if this is useful at all.
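Scientific notation is a pain. What I've used before in situations like this is to cast the value to a float and then apply a format specifier. Something like this should work:
a = "1,25151E+21"
print(f"{float(a.replace(',', '.')):.0f}")
# prints 1251510000000000065536
Note that the trailing ...065536 comes from float rounding, so for numbers this large the printed digits are only approximate.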
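If the exact digits matter, the standard library's decimal module can expand the notation without going through a binary float. A minimal sketch, reusing the sample string from the question; the helper name expand_scientific is hypothetical, and it assumes the comma is a decimal separator as in the example:

```python
from decimal import Decimal


def expand_scientific(text):
    """Expand a string such as '1,25151E+21' into plain digits."""
    # Assumption: the comma is a decimal comma, as in the sample value.
    number = Decimal(text.replace(",", "."))
    # The 'f' format spec writes the Decimal in fixed-point notation.
    return format(number, "f")


print(expand_scientific("1,25151E+21"))  # 1251510000000000000000
```

This only restores the digits that are present in the string. Anything that was already rounded away when the value was written in scientific notation cannot be recovered.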

Subtract 2 dataframe columns and get the result without weird rounding (floating point arithmetic)

I have 2 Pandas dataframes, with thousands of values. I load them from a csv file with Pandas' read_csv function.
I need to subtract a column ("open") of the second one from a column of the first, and I do it like this:
subtraction = shiftedDataset.open - dataset.open
And I get a series with the results.
The problem is that the results come with the weird rounding that comes from floating point arithmetic.
(e.g. a value that should be 0.00003 is -2.999999999997449e-05)
How can I get the correct results? I can manipulate the dataframe before the subtraction or the values after the subtraction, I don't care, but I need to get the best performance possible.
This is scientific notation, and keeping the value as it is will probably be more accurate if you want to do more calculations. If this is purely for display purposes, look at this post
Example:
v = -2.999999999997449e-05
print('%f' % v)
# prints -0.000030
Some of the answers there format the output (turning your value into a string, which might not be what you want), but there's also a pandas setting you can use (also in the same post, scroll down a bit).
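If the values should stay numeric rather than become strings, rounding to a known number of decimal places is another option. The sketch below uses only the standard library and assumes five decimal places are enough for this data. The Decimal values are hypothetical sample prices, not taken from the question. In pandas, the analogous call on the result would presumably be subtraction.round(5).

```python
from decimal import Decimal

# The value quoted in the question.
raw = -2.999999999997449e-05

# Assumption: five decimal places are sufficient for this data.
rounded = round(raw, 5)
print(rounded)  # -3e-05

# Decimal arithmetic avoids the binary rounding entirely, at a speed cost.
a = Decimal("1.10003")  # hypothetical sample price
b = Decimal("1.10000")  # hypothetical sample price
print(a - b)  # 0.00003
```

Rounding floats is fast and usually sufficient. Decimal is exact for decimal inputs but noticeably slower, so it may not suit the performance requirement mentioned above.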

How can I read every field as string in xlwings?

I have an Excel file that I want to convert, but the default type for numbers is float. How can I change it so that xlwings explicitly uses strings and not numbers?
This is how I read the value of a field:
xw.Range(sheet, fieldname ).value
The problem is that numbers like 40 get converted to 40.0 if I create a string from that. I strip it with: str(xw.Range(sheetFronius, fieldname ).value).rstrip('0').rstrip('.') but that is not very helpful and leads to errors, because sometimes the same field can contain either a number or a string. (Not at the same time; the value is chosen from a list.)
With xlwings, if no options are set during read/write operations, cells containing numbers are read in as floats. I scoured the docs, but I don't think you can convert a cell that holds a number to a string via xlwings outright. Fortunately all is not lost...
You could read in the cells as int with xlwings and then convert the int to a string in Python in the usual way. The way to do that is as follows:
xw.Range(sheet, fieldname).options(numbers=int).value
Or you can pack the string conversion into the options upfront and read in your data this way:
xw.Range(sheet, fieldname).options(numbers=lambda x: str(int(x))).value
Good luck!
In my case, the solution was simply to add one extra row below the last row of the raw data:
write any text in the column you want read as str, save, load, and then delete that last row.
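For the mixed case raised in the question, where the same field may hold either a number or text, the conversion can also be done in plain Python after reading. A minimal sketch; the helper name cell_to_str is hypothetical and not part of xlwings, and it assumes empty cells arrive as None:

```python
def cell_to_str(value):
    """Convert a cell value to text without a trailing '.0' on whole numbers."""
    # Assumption: an empty cell is returned as None.
    if value is None:
        return ""
    # Whole-number floats such as 40.0 become "40".
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    # Text and non-integer numbers are converted as they are.
    return str(value)


print(cell_to_str(40.0))   # 40
print(cell_to_str(2.5))    # 2.5
print(cell_to_str("100"))  # 100
```

Unlike the rstrip('0') approach, this leaves text values such as "100" intact instead of stripping their trailing zeros.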

Converting long integers to strings in pandas (to avoid scientific notation)

I want the following records (currently displaying as 3.200000e+18 but actually (hopefully) each a different long integer), created using pd.read_excel(), to be interpreted differently:
ipdb> self.after['class_parent_ref']
class_id
3200000000000515954 3.200000e+18
3200000000000515951 NaN
3200000000000515952 NaN
3200000000000515953 NaN
3200000000000515955 3.200000e+18
3200000000000515956 3.200000e+18
Name: class_parent_ref, dtype: float64
Currently, they seem to 'come out' as scientifically notated strings:
ipdb> self.after['class_parent_ref'].iloc[0]
3.2000000000005161e+18
Worse, though, it's not clear to me that the number has been read correctly from my .xlsx file:
ipdb> self.after['class_parent_ref'].iloc[0] -3.2e+18
516096.0
The number in Excel (the data source) is 3200000000000515952.
This is not about the display, which I know I can change here. It's about keeping the underlying data in the same form it was in when read (so that if/when I write it back to Excel, it'll look the same and so that if I use the data, it'll look like it did in Excel and not Xe+Y). I would definitely accept a string if I could count on it being a string representation of the correct number.
You may notice that the number I want to see is in fact (incidentally) one of the labels. Pandas correctly read those in as strings (perhaps because Excel treated them as strings?) unlike this number which I entered. (Actually though, even when I enter ="3200000000000515952" into the cell in question before redoing the read, I get the same result described above.)
How can I get 3200000000000515952 out of the dataframe? I'm wondering if pandas has a limitation with long integers, but the only thing I've found on it is 1) a little dated, and 2) doesn't look like the same thing I'm facing.
Thank you!
Convert the NaN values in your column to 0, then typecast that column to integer:
df[['class_parent_ref']] = df[['class_parent_ref']].fillna(value = 0)
df['class_parent_ref'] = df['class_parent_ref'].astype(int)
Alternatively, when reading your file, specify keep_default_na = False for pd.read_excel() or na_filter = False for pd.read_csv()
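One caveat: casting to int removes the scientific notation, but it cannot restore digits that were lost when the value was first stored as float64. The sketch below uses only the standard library and the number quoted in the question to show the effect. Keeping the value as text from the start (for example via the dtype or converters arguments of pd.read_excel, provided the cell is stored as text in the workbook) is one possible way around it, though that is an assumption about your data.

```python
# The number from the question, as an exact Python integer.
original = 3200000000000515952

# A float64 has a 53-bit significand, so nearby integers collapse together.
as_float = float(original)
print(int(as_float))             # 3200000000000516096
print(int(as_float) - original)  # 144

# Integers up to 2**53 are always represented exactly as floats.
print(2**53)  # 9007199254740992

# A string round trip keeps every digit.
as_text = str(original)
print(int(as_text) == original)  # True
```

The difference of 516096 seen in the question's own output is consistent with this rounding.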
