How to round numbers in a style.background_gradient() table - python

I am trying to round the numbers of a data frame, put them into a table, and then save it as a jpeg so I can text it out daily as a leaderboard. I am able to accomplish everything, but when I create my table with style.background_gradient() it adds a lot of trailing zeros.
I usually use the round(table, 0) function, but it doesn't work on this particular table type. Any suggestions would be appreciated! This is the data frame below, pre-styling.
Once I add the following code, it turns into this:
styled = merged.round(2).style.background_gradient()
I would love to get rid of the zeros if possible.

This worked for me:
merged.style.set_precision(2).background_gradient(cmap = 'Blues')
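A note in case the call above raises a warning or error: Styler.set_precision was deprecated in newer pandas (1.3+), and the equivalent there is Styler.format. A minimal sketch, assuming a numeric DataFrame named merged:
# Styler.format(precision=...) replaces the deprecated set_precision in pandas 1.3+
styled = merged.style.format(precision=2).background_gradient(cmap='Blues')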

If there are NaN values in a column, the dtype is float and the notebook will display the data with a decimal part (even if that decimal part is zero). The solution I suggest is to convert the dtype of these columns to 'Int32' or 'Int64' (plain int raises an error because of the NaN values):
for col in data.columns:
    if data[col].dtype == 'float64':
        data[col] = data[col].astype('Int32')
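For illustration, a minimal sketch of what that cast does to a float column that contains NaN (the column name score is made up):
import numpy as np
import pandas as pd

data = pd.DataFrame({"score": [1.0, 2.0, np.nan]})
# The NaN forces float64; 'Int32' is pandas' nullable integer dtype
data["score"] = data["score"].astype("Int32")
print(data.dtypes)  # score    Int32
print(data)         # the missing value prints as <NA>, the rest as whole numbers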

Related

Python iloc slice range from dictionary value

I am trying to use a dictionary value to define the slice ranges for the iloc function, but I keep getting the error -- Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]. The Excel sheet is built for visual information and not in any kind of real table format (not mine so I can't change it), so I have to slice the specific ranges without column labels.
Tried code (this got the error):
cr_dict= {'AA':'[42:43,32:65]', 'BB':'[33:34, 32:65]'}
df = my_df.iloc[cr_dict['AA']]
The result I want would be similar to:
df = my_df.iloc[42:43,32:65]
I know I could change the dictionary and use the following, but it looks convoluted and not as easy to read – is there a better way?
Code
cr_dict = {'AA': [42, 43, 32, 65], 'BB': [33, 34, 32, 65]}
df = my_df.iloc[cr_dict['AA'][0]:cr_dict['AA'][1], cr_dict['AA'][2]:cr_dict['AA'][3]]
Define your dictionaries slightly differently.
cr_dict = {'AA': [42, 43] + list(range(32, 65)),
           'BB': [33, 34] + list(range(32, 65))}
Then you can slice your DataFrame like so:
>>> my_df.iloc[cr_dict["AA"], cr_dict["BB"]].sort_index()
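An alternative, not from the original answer: if the goal really is contiguous ranges rather than arbitrary index lists, you can store slice objects in the dictionary, since .iloc accepts a tuple of slices directly. A minimal sketch:
# Store the row range and column range for each key as slice objects;
# .iloc accepts a (row_slice, column_slice) tuple directly.
cr_dict = {'AA': (slice(42, 43), slice(32, 65)),
           'BB': (slice(33, 34), slice(32, 65))}

df = my_df.iloc[cr_dict['AA']]   # equivalent to my_df.iloc[42:43, 32:65]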

Remove scientific notation floats in a dataframe

I am receiving different series from a source. Some of those series have values that are big numbers (X billions). I then combine all the series into a dataframe with individual columns for each series.
Now, when I print the dataframe, the big numbers in the series are shown in scientific notation. Even printing a series individually shows the numbers in scientific notation.
Dataframe df (multiindex) output is:
                  Values
Item Sub
A    1      1.396567e+12
B    1      2.868929e+12
I have tried this:
pd.set_option('display.float_format', lambda x: '%,.2f' % x)
This doesn't work because:
it converts the display everywhere, and I only need the conversion in that specific dataframe.
it tries to convert all kinds of floats, not just those shown in scientific notation. So even if the float is 89.142, it tries to apply the format, and as there's no digit group to put the ',' in, it shows an error.
Then I tried these:
df.round(2)
This only rounded regular floats from 3 decimals to 2. It didn't do anything to the values shown in scientific notation.
Then I tried:
df.astype(float)
Doesn't do anything visible. Output stayed the same.
How else can I change the scientific notation to normal float digits inside the dataframe? I do not want to create a new list with the converted values; the dataframe itself should show the values in normal terms.
Can you guys please help me find a solution for this?
Thank you.
Try df['column'] = df['column'].astype(str). If that does not work, you should change the type of the numbers to string before creating the pandas dataframe from your data.
I would suggest keeping everything in a float type and adjust the display setting.
For example, I have generated a df with some random numbers.
df = pd.DataFrame({"Item": ["A", "B"], "Sub": [1, 1],
                   "Value": [float(31132314122123.1), float(324231235232315.1)]})
#   Item  Sub         Value
# 0    A    1  3.113231e+13
# 1    B    1  3.242312e+14
If we print df.dtypes, we can see that the Sub values are ints and the Value values are floats.
Item      object
Sub        int64
Value    float64
dtype: object
You can then call pd.options.display.float_format = '{:.1f}'.format to suppress the scientific notation of the floats, while retaining the float format.
#   Item  Sub               Value
# 0    A    1    31132314122123.1
# 1    B    1   324231235232315.1
Item      object
Sub        int64
Value    float64
dtype: object
If you want the scientific notation back, you can call pd.reset_option('display.float_format')
Okay. I found something called option_context for pandas that allows changing the display options just for the particular case / action, using a with statement.
with pd.option_context('display.float_format', '{:.2f}'.format):
    print(df)
So we do not have to reset the options afterwards, and the options stay at their defaults for all other data in the file.
Sadly, though, I could find no way to display different columns in different float formats (for example, one column as currency, comma-separated with 2 decimals, while the next column is a percentage with 2 decimals and no commas).
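For what it's worth, per-column display formats are possible if you format at print time instead of through the global option. A minimal sketch using DataFrame.to_string with a formatters dict (the column names here are made up):
import pandas as pd

# Hypothetical columns: a currency-style value and a ratio to show as a percentage
df = pd.DataFrame({"Revenue": [1396567000000.0, 2868929000000.0],
                   "Growth": [0.1234, 0.0567]})

# One formatter per column, applied only at print time
print(df.to_string(formatters={
    "Revenue": "{:,.2f}".format,
    "Growth": "{:.2%}".format,
}))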

gspread - Getting values as string from numeric like column

I am trying to read a google sheet using python using the gspread library.
The initial authentication settings is done and I am able to read the respective sheet.
However when I do
sheet.get_all_records()
The columns containing numeric-like values (e.g. 0001, 0002, 1000) are converted to numeric fields, that is, the leading zeroes are truncated. How can I prevent this from happening?
You can prevent gspread from casting values to int by passing the numericise_ignore parameter to the get_all_records() method.
You can disable it for a specific list of indices in the row:
# Disable casting for columns 1, 2 and 4 (1 indexed):
sheet.get_all_records(numericise_ignore=[1, 2, 4])
Or, disable it for all of the row values by setting numericise_ignore to ['all']:
sheet.get_all_records(numericise_ignore=['all'])
How about this answer? In it, as one of several workarounds, get_all_values() is used instead of get_all_records(). After the values are retrieved, the array is converted to a list of dicts. Please think of this as just one of several possible answers.
Sample script:
values = worksheet.get_all_values()
head = values.pop(0)
result = [{head[i]: col for i, col in enumerate(row)} for row in values]
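If the end goal is a pandas DataFrame rather than a list of dicts, the same idea works, because get_all_values() returns every cell as a string and the leading zeroes survive. A minimal sketch, assuming the first row holds the headers:
import pandas as pd

values = worksheet.get_all_values()               # every cell comes back as a string
df = pd.DataFrame(values[1:], columns=values[0])  # '0001', '0002', ... keep their leading zeros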
Reference:
get_all_values()
If this was not the direction you want, I apologize.

Pandas adding decimal points when using read_csv

I'm working with some csv files and using pandas to turn them into a dataframe. After that, I use an input to find values to delete.
I'm hung up on one small issue: for some columns it's adding ".0" to the values in the column. It only does this in columns with numbers, so I'm guessing it's reading the column as a float. How do I prevent this from happening?
The part that really confuses me is that it only happens in a few columns, so I can't quite figure out a pattern. I need to chop off the ".0" so I can re-import it, and I feel like it would be easiest to prevent it from happening in the first place.
Thanks!
Here's a sample of my code:
clientid = int(input('What client ID needs to be deleted?'))
df1 = pd.read_csv('Client.csv')
clientclean = df1.loc[df1['PersonalID'] != clientid]
clientclean.to_csv('Client.csv', index=None)
Ideally, I'd like all of the values to be the same as the original csv file, but without the rows with the clientid from the user input.
If PersonalID is the header of the problematic column, try this:
import numpy as np
df1 = pd.read_csv('Client.csv', dtype={'PersonalID': np.int32})
Edit:
Since an integer column cannot hold NaN values, you can try this on each problematic column:
df1[col] = df1[col].fillna(-9999)  # or 0, or any placeholder value you want here
df1[col] = df1[col].astype(int)
You could go through each value, and if it is a number x, subtract int(x) from it; if that difference is 0.0 for every value in a column, convert that column to int. Or, if you're not dealing with any non-integers, you could just convert all values that are numbers to ints.
For an example of the latter (when your original data does not contain any non-integer numbers):
for index, row in df1.iterrows():
    for c, x in enumerate(row):
        if isinstance(x, float):
            df1.iloc[index, c] = int(x)
For an example of the former (if you want to keep non-integer numbers as non-integer numbers, but want to guarantee that integer numbers stay as integers):
import sys

for c, col in enumerate(df1.columns):
    foundNonInt = False
    for r, index in enumerate(df1.index):
        x = df1.iloc[r, c]
        if isinstance(x, float):
            if (x - int(x)) > sys.float_info.epsilon:
                foundNonInt = True
                break
    if not foundNonInt:
        df1.iloc[:, c] = df1.iloc[:, c].astype(int)
Note, the above method is not fool-proof: if, by chance, a column that is meant to hold non-integer numbers happens to contain only values like x.0000000 all the way to the last decimal place, it will be converted to int anyway.
It was a datatype issue.
ALollz's comment led me in the right direction. Pandas was assuming a data type of float, which added the decimal points.
I specified the datatype as object (from Akarius's comment) when using read_csv, which resolved the issue.
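For reference, a minimal sketch of that kind of fix, reading the data as text so pandas never infers a float dtype:
import pandas as pd

# Read every column as text so values like '0001' are left untouched
df1 = pd.read_csv('Client.csv', dtype=str)

# Or pin only the problematic column
df1 = pd.read_csv('Client.csv', dtype={'PersonalID': str})
With string columns, the clientid from input() should also stay a string for the != comparison.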

Writing Percentages in Excel Using Pandas

When writing to csv's before using Pandas, I would often use the following format for percentages:
'%0.2f%%' % (x * 100)
This would be processed by Excel correctly when loading the csv.
Now, I'm trying to use Pandas' to_excel function and using
(simulated * 100.).to_excel(writer, 'Simulated', float_format='%0.2f%%')
and getting a "ValueError: invalid literal for float(): 0.0126%". Without the '%%' it writes fine but is not formatted as percent.
Is there a way to write percentages in Pandas' to_excel?
This question is all pretty old at this point. For better solutions check out xlsxwriter working with pandas.
You can do the following workaround in order to accomplish this:
df *= 100
df = pandas.DataFrame(df, dtype=str)
df += '%'
ew = pandas.ExcelWriter('test.xlsx')
df.to_excel(ew)
ew.save()
This is the solution I arrived at using pandas with OpenPyXL v2.2, and ensuring cells contain numbers at the end, and not strings. Keep values as floats, apply format at the end cell by cell (warning: not efficient):
xlsx = pd.ExcelWriter(output_path)
df.to_excel(xlsx, "Sheet 1")
sheet = xlsx.book.worksheets[0]
for col in sheet.columns[1:sheet.max_column]:
    for cell in col[1:sheet.max_row]:
        cell.number_format = '0.00%'
        cell.value /= 100  # only needed if the data is already in percentage terms; Excel expects fractions
xlsx.save()
See OpenPyXL documentation for more number formats.
Interestingly enough, the docos suggest that OpenPyXL is smart enough to guess percentages from string formatted as "1.23%", but this doesn't happen for me. I found code in Pandas' _Openpyxl1Writer that uses "set_value_explicit" on strings, but nothing of the like for other versions. Worth further investigation if somebody wants to get to the bottom of this.
The XlsxWriter docs have a helpful example of how to achieve this:
https://xlsxwriter.readthedocs.io/example_pandas_percentage.html
Here's the gist:
writer = pd.ExcelWriter('pandas_percent.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
percent_format = writer.book.add_format({'num_format': '0%'})
# Now apply the number format to the column with index 2.
writer.sheets['Sheet1'].set_column(2, 2, None, percent_format)
writer.save()
Note 1: The column you want to format as a percent must be a ratio float (i.e. do not multiply it by 100).
Note 2: The parameter in the set_column() call that is set to None is the column width. If you want to automatically fit the column width check out this post:
https://stackoverflow.com/a/61617835/13261722.
Note 3: If you want more on the set_column() function you can check out the docs:
https://xlsxwriter.readthedocs.io/worksheet.html?highlight=set_column#set_column
