Before I started using pandas, I would often use the following format for percentages when writing CSV files:
'%0.2f%%' % (x * 100)
This would be processed by Excel correctly when loading the csv.
Now, I'm trying to use Pandas' to_excel function and using
(simulated * 100.).to_excel(writer, 'Simulated', float_format='%0.2f%%')
and getting a "ValueError: invalid literal for float(): 0.0126%". Without the '%%' it writes fine but is not formatted as percent.
Is there a way to write percentages in Pandas' to_excel?
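The error message suggests that to_excel converts the formatted text back into a float, which a trailing percent sign makes impossible. Here is a minimal sketch of that failure in plain Python, assuming this round trip is what happens internally (the variable names are illustrative only):

```python
x = 0.0126  # a ratio, as in the question

# The CSV-era format produces text such as '1.26%'.
formatted = "%0.2f%%" % (x * 100)

# Text ending in '%' cannot be parsed back into a float.
try:
    float(formatted)
    parsed_ok = True
except ValueError:
    parsed_ok = False

# Without the '%%' the text round-trips, which matches "it writes fine".
plain = float("%0.2f" % (x * 100))
```

So the percent sign has to come from the cell's number format, not from the value itself.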
This question is pretty old at this point. For better solutions, check out how XlsxWriter works with pandas.
You can do the following workaround in order to accomplish this:
df *= 100
df = pandas.DataFrame(df, dtype=str)
df += '%'
ew = pandas.ExcelWriter('test.xlsx')
df.to_excel(ew)
ew.save()
This is the solution I arrived at using pandas with OpenPyXL v2.2. It ensures the cells end up containing numbers, not strings: keep the values as floats and apply the format at the end, cell by cell (warning: not efficient):
xlsx = pd.ExcelWriter(output_path)
df.to_excel(xlsx, "Sheet 1")
sheet = xlsx.book.worksheets[0]
for col in sheet.columns[1:sheet.max_column]:
    for cell in col[1:sheet.max_row]:
        cell.number_format = '0.00%'
        cell.value /= 100  # only if your data is already in percentages; the values have to be fractions
xlsx.save()
See OpenPyXL documentation for more number formats.
Interestingly enough, the docs suggest that OpenPyXL is smart enough to guess percentages from strings formatted as "1.23%", but this doesn't happen for me. I found code in pandas' _Openpyxl1Writer that uses "set_value_explicit" on strings, but nothing of the kind for other versions. This is worth further investigation if somebody wants to get to the bottom of it.
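With more recent pandas and OpenPyXL releases, the same idea might look like the sketch below. It assumes the openpyxl engine is installed, writes to an in-memory buffer in place of a real file, and uses a made-up one-column frame of ratios. Treat it as an illustration, not a recipe verified against every version.

```python
import io

import pandas as pd
from openpyxl import load_workbook

# Hypothetical sample data: ratios, not values pre-multiplied by 100.
df = pd.DataFrame({"simulated": [0.0126, 0.25]})

buffer = io.BytesIO()  # in-memory stand-in for an .xlsx path
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Simulated")
    sheet = writer.sheets["Simulated"]
    # Skip the header row (row 1) and the index column (column 1).
    for row in sheet.iter_rows(min_row=2, min_col=2):
        for cell in row:
            cell.number_format = "0.00%"

# Read the workbook back to confirm the cells are still numeric.
buffer.seek(0)
check_sheet = load_workbook(buffer)["Simulated"]
first_cell = check_sheet.cell(row=2, column=2)
```

The values stay floats in the workbook; only the display format changes.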
The XlsxWriter docs have a helpful example of how to achieve this:
https://xlsxwriter.readthedocs.io/example_pandas_percentage.html
Here's the gist:
writer = pd.ExcelWriter('pandas_percent.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
percent_format = writer.book.add_format({'num_format': '0%'})
# Now apply the number format to the column with index 2.
writer.sheets['Sheet1'].set_column(2, 2, None, percent_format)
writer.save()
Note 1: The column you want to format as a percent must be a ratio float (i.e. do not multiply it by 100).
Note 2: The parameter in the set_column() call that is set to None is the column width. If you want to automatically fit the column width check out this post:
https://stackoverflow.com/a/61617835/13261722.
Note 3: If you want more on the set_column() function you can check out the docs:
https://xlsxwriter.readthedocs.io/worksheet.html?highlight=set_column#set_column
I am trying to round the numbers of a data frame, put it into a table, and then save it as a JPEG so I can text it out daily as a leaderboard. I am able to accomplish everything, but when I create my table with style.background_gradient() it adds a lot of trailing zeros.
I have usually been using the round(table, 0) function, but it doesn't work on this particular table type. Any suggestions would be appreciated! This is the data frame below, before styling.
Once I add the following code it turns it to this
styled = merged.round(2).style.background_gradient()
I would love to get rid of the zeros if possible.
This worked for me:
merged.style.set_precision(2).background_gradient(cmap = 'Blues')
If there are NaN values in a column, its dtype is float and the notebook will display the data with a decimal separator (even if the first decimal is zero). The solution I suggest is to convert the dtype of these columns to 'Int32' or 'Int64' (plain int raises an error):
for col in data.columns:
    if data[col].dtype == 'float64':
        # astype returns a new Series, so assign it back to the column
        data[col] = data[col].astype('Int32')
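A slightly more defensive variant of that loop is sketched below. The sample frame and the whole-number check are assumptions added for illustration; the check is there because converting a float column with fractional values to a nullable integer dtype raises an error.

```python
import pandas as pd

# Hypothetical data: one column of whole numbers with a gap, one with fractions.
data = pd.DataFrame({"count": [1.0, 2.0, None], "ratio": [0.5, 1.5, None]})

for col in data.columns:
    if data[col].dtype == "float64":
        non_missing = data[col].dropna()
        # Convert only when every present value is a whole number.
        if (non_missing % 1 == 0).all():
            data[col] = data[col].astype("Int64")
```

Here "count" becomes a nullable integer column while "ratio" is left as float64.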
I am trying to read a Google Sheet in Python using the gspread library.
The initial authentication setup is done and I am able to read the respective sheet.
However when I do
sheet.get_all_records()
The column containing numeric-like values (e.g. 0001, 0002, 1000) is converted to a numeric field; that is, the leading zeros are truncated. How can I prevent this from happening?
You can prevent gspread from casting values to int by passing the numericise_ignore parameter to the get_all_records() method.
You can disable it for a specific list of indices in the row:
# Disable casting for columns 1, 2 and 4 (1 indexed):
sheet.get_all_records(numericise_ignore=[1, 2, 4])
Or disable it for all values in the row by setting numericise_ignore to 'all':
sheet.get_all_records(numericise_ignore=['all'])
How about this answer? As one of several workarounds, it uses get_all_values() instead of get_all_records(). After the values are retrieved, the list of rows is converted to a list of dictionaries. Please think of this as just one of several possible answers.
Sample script:
values = worksheet.get_all_values()
head = values.pop(0)
result = [{head[i]: col for i, col in enumerate(row)} for row in values]
Reference:
get_all_values()
If this was not the direction you want, I apologize.
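The conversion in the sample script can be tried without a spreadsheet. In this sketch the nested list is a made-up stand-in for what worksheet.get_all_values() would return, assuming it returns every cell as text:

```python
# Hypothetical stand-in for worksheet.get_all_values(): rows of strings.
values = [
    ["id", "name"],
    ["0001", "alpha"],
    ["1000", "beta"],
]

# The first row is the header; the rest are data rows.
head, *rows = values
records = [dict(zip(head, row)) for row in rows]
```

Because nothing is cast to a number, "0001" keeps its leading zeros.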
I would like to export a dataframe to Excel as xls and show numbers with a 1000 separator and 2 decimal places and percentages as % with 2 decimals, e.g. 54356 as 54,356.00 and 0.0345 as 3.45%
When I change the format in python using .map() and .format it displays them correctly in python, but it turns them into a string and when I export them to xls Excel does not recognize them as numbers/percentages.
import pandas as pd
d = {'Percent': [0.01, 0.0345], 'Number': [54464.43, 54356]}
df = pd.DataFrame(data=d)
df['Percent'] = pd.Series(["{0:.2f}%".format(val * 100) for val in df['Percent']], index = df.index)
df['Number'] = df['Number'].map('{:,.2f}'.format)
The data frame looks as expected, but the type of the cells is now str and if I export it to xls (df.to_excel('file.xls')), Excel shows the "The number in this cell is formatted as text or preceded by an apostrophe" warning.
You can edit the style property when you display the DataFrame, but the underlying data is not touched:
df.style.format({
    'Number': '{0:,.2f}'.format,
    'Percent': '{0:.2%}'.format,
})
see: pandas style guide
If I understand your question correctly, your goal is to control how the actual cells in the output Excel file are formatted. To do this, you may want to focus on the file itself rather than on formatting within the pandas DataFrame (i.e. your data is correct; the problem is the way it is displayed in the output).
You may want to try something like this.
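One possible shape for this, sketched with the openpyxl engine and an in-memory buffer: the values stay numeric and each column gets an Excel number format. The format codes, the `formats` mapping and the sheet name are assumptions chosen for illustration, and openpyxl writes .xlsx, not the .xls asked about.

```python
import io

import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame({"Percent": [0.01, 0.0345], "Number": [54464.43, 54356]})

# Assumed Excel number-format codes, keyed by DataFrame column name.
formats = {"Percent": "0.00%", "Number": "#,##0.00"}

buffer = io.BytesIO()  # in-memory stand-in for an .xlsx path
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
    sheet = writer.sheets["Sheet1"]
    # Row 1 holds the header, so data starts at row 2.
    for col_idx, name in enumerate(df.columns, start=1):
        for row_idx in range(2, len(df) + 2):
            sheet.cell(row=row_idx, column=col_idx).number_format = formats[name]

# Reload to confirm the cells hold numbers with the requested formats.
buffer.seek(0)
reloaded = load_workbook(buffer)["Sheet1"]
percent_cell = reloaded.cell(row=3, column=1)
number_cell = reloaded.cell(row=3, column=2)
```

Excel should then show 0.0345 as 3.45% and 54356 as 54,356.00 without the "formatted as text" warning, since no cell contains a string.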
Maybe a lambda would be more helpful:
df['Percent'] = df['Percent'].apply(lambda x: round(x*100, 2))
df['Number'] = df['Number'].apply(lambda x: round(x, 2))
For df["Number"], df.Number = df.Number.apply(lambda x: '{0:,.2f}'.format(x)) will work.
For df["Percent"], first multiply the series by 100: df["Percent"] = df.Percent.multiply(100)
Then use df.Percent = df.Percent.apply(lambda x: "{0:.2f}%".format(x))
Hope this helps.
In the Excel sheet, I have two columns with large numbers.
But when I read the Excel file with read_excel() and display the dataframe,
those two columns are printed in scientific notation with exponents.
How can I get rid of this format?
Thanks
Output in Pandas
The way scientific notation is applied is controlled via pandas' display options:
pd.set_option('display.float_format', '{:.2f}'.format)
df = pd.DataFrame({'Traded Value':[67867869890077.96,78973434444543.44],
'Deals':[789797, 789878]})
print(df)
        Traded Value   Deals
0  67867869890077.96  789797
1  78973434444543.44  789878
If this is simply for presentational purposes, you may convert your
data to strings while formatting them on a column-by-column basis:
df = pd.DataFrame({'Traded Value':[67867869890077.96,78973434444543.44],
'Deals':[789797, 789878]})
df
    Deals  Traded Value
0  789797  6.786787e+13
1  789878  7.897343e+13
df['Deals'] = df['Deals'].apply(lambda x: '{:d}'.format(x))
df['Traded Value'] = df['Traded Value'].apply(lambda x: '{:.2f}'.format(x))
df
    Deals       Traded Value
0  789797  67867869890077.96
1  789878  78973434444543.44
An alternative, more straightforward method would be to put the following line at the top of your code; it formats floats only:
pd.options.display.float_format = '{:.2f}'.format
Try '{:.0f}' with Sergey's approach; it worked for me.
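If changing the option globally is not wanted, the same setting can be applied temporarily. This is a small sketch using pd.option_context with the frame from the answer above; the `rendered` variable is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    "Traded Value": [67867869890077.96, 78973434444543.44],
    "Deals": [789797, 789878],
})

# The display option applies only inside the with-block.
with pd.option_context("display.float_format", "{:.2f}".format):
    rendered = df.to_string()

print(rendered)
```

The column itself stays float64; only the printed text changes.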
The following evaluates to true:
'{:.2f}'.format(2.0) > '{:.2f}'.format(10.0)
I want to write a number to excel with two decimals (2 as 2.00) using dataframe.to_excel, but it gets displayed as text instead of a number in a cell.
Update:
import pandas as pd
data_table = [[1.0,2.0,3.0],[4.0,5.0,6.0]]
dataframe = pd.DataFrame(data_table)
writer = pd.ExcelWriter(fileName, engine='xlsxwriter')
dataframe.to_excel(writer,float_format='%11.2f', index=False)
writer.save()
It still shows 6 instead of 6.00. Maybe Excel strips the trailing zeros when I open it:
1 2 3
4 5 6
to_excel has a float_format option.
Use int() on your result.
Why does str.format return a string? It does not check whether the value is a number; it always produces text, so it should only be used where a string is wanted.
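A short sketch of why the comparison in the question evaluates to true (the variable names are illustrative):

```python
two = "{:.2f}".format(2.0)    # text, not a number
ten = "{:.2f}".format(10.0)

# Strings compare character by character, so '2' versus '1' decides the result.
text_comparison = two > ten

# Converting back to float restores numeric ordering.
numeric_comparison = float(two) > float(ten)
```

The formatted values are plain str objects, which is also why a cell written from them ends up as text.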
EDIT:
Of course, you want that formatting visible in the resulting Excel file.
Use:
DataFrame.to_excel(excel_writer, float_format=float_format, **kwargs)
where **kwargs are your other arguments and float_format is your format string.
This will format the numbers correctly in Excel, provided the values are stored as numbers.