Dealing with decimals with many digits in Pandas - python

pd.set_option('display.max_colwidth', None )
pd.set_option('display.float_format', lambda x: '%.200f' % x)
exData = pd.read_csv('AP11.csv',delimiter=';',float_precision=None)
x = exData.loc[:,['A','B']]
y = exData.loc[:,['C']]
x
my original float on excel is 0.1211101931541032183754113717355410323332315436353654273243543132542237415430173719
what is being displayed is
0.12111019315410319341363987177828676067292690277099609375000000000000000000000000000000000000000000000000000000000...
This is not a display issue: something in pandas rounds my float. I don't want any number rounded, because that will affect my result. The value is originally a string that is converted to a float. I tried int64, but it can't handle numbers this big, so instead I decided to use floats of the form "0.mystring" to avoid getting "inf" displayed in pandas, and now the value gets rounded. Is machine learning limited by these messy variables? Or is there another way to deal with big numbers without rounding them or displaying inf?

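Use decimal instead of float. Just put
from decimal import Decimal
at the top of your code, and create your values from strings rather than from float literals:
x = Decimal('0.121110193154103218375411371735541032333231543635365427324354313254223741543017371')
Passing a float literal would not help, because the literal is already rounded to a 64-bit double (about 15 to 17 significant digits) before Decimal ever sees it. For the same reason, read the CSV column as text (for example with dtype=str in pd.read_csv) and convert each value with Decimal afterwards.
decimal is a standard-library module for decimal arithmetic with user-settable precision; a Decimal built from a string keeps every digit.
Floats are binary approximations, so they can show unexpected rounding: after arithmetic, a result that should have only a few decimal places can end in a run of zeros followed by stray digits. Avoid them when every digit matters.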
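A minimal sketch of the difference, using only the standard library. The in-memory text below is a hypothetical stand-in for AP11.csv (same delimiter, one long value in column A); it is an assumption, not the asker's actual file.

```python
import csv
import io
from decimal import Decimal

# Hypothetical stand-in for AP11.csv: semicolon-delimited, one long value.
raw = (
    "A;B\n"
    "0.1211101931541032183754113717355410323332315436353654273243543132542237415430173719;1\n"
)

rows = list(csv.DictReader(io.StringIO(raw), delimiter=";"))
text = rows[0]["A"]                 # the value is still a plain string here

as_decimal = Decimal(text)          # built from the string: every digit kept
via_float = Decimal(float(text))    # built from a float: already rounded to a double

print(str(as_decimal) == text)      # True
print(via_float == as_decimal)      # False
```

The rounding happens in float(text), not in the display step, which is why changing display options cannot recover the lost digits.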

Related

Large decimal numbers with function math.modf

I am using the function math.modf which separates the integer and decimal part of a number as follows:
decimal_part, integer_part = math.modf(x)
Where x is a decimal number.
An example for a small number is as follows:
x = 1993.0787353515625
decimal_part = 0.0787353515625, integer_part = 1993.0
But when I work with very large numbers the following happens:
x = 6.797731511223558e+44
decimal_part = 0.0, integer_part = 6.797731511223558e+44
In this case it doesn't save the result in the decimal part and appears 0.0. And the same happens for numbers up to 300 digits. But when the number x has at least 360 digits, the following error appears:
OverflowError: int too large to convert to float.
I would like to save the decimal part of large numbers of at least 300 digits without overflowing the register where the decimal part is stored. And I would like to avoid the error in numbers with more than 360 digits: "OverflowError: int too large to convert to float".
How can I solve it?
A float stores only about 15 to 17 significant decimal digits (a 53-bit significand), whatever its magnitude. Let's break this down:
The number 6.797731511223558e+44 is far larger than 2**53, so as a float it is necessarily a whole number. It has no fractional part left, and modf will always return 0.0 for it.
If you provide an integer with 300+ digits, it is still an integer, so the decimal part is still 0.0 and there is no need to call the function. You get the error because modf first converts its argument to float, and an int larger than the biggest float (about 1.8e308) cannot be converted. The conversion is unnecessary anyway, since you already know the result.
On the other hand, if you call the function with a float, there is nothing to convert, so it won't raise the error.
The number 6.797731511223558e+44 should be a number with a decimal part because it is the result of dividing a number by another number. But python doesn't save the decimal result and 0.0 appears. When we introduce small numbers in the function, it saves the decimal part.
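If the large value really comes from dividing one integer by another, one option is to keep the two integers and do the division exactly with fractions.Fraction instead of float. The sketch below assumes non-negative inputs; split_exact is a hypothetical helper name, not part of math or any library.

```python
from fractions import Fraction

def split_exact(numerator, denominator):
    """Return (integer_part, fractional_part) of numerator/denominator, exactly.

    Hypothetical helper; assumes numerator >= 0 and denominator > 0.
    """
    q = Fraction(numerator, denominator)
    integer_part = q.numerator // q.denominator   # arbitrary-size Python int
    fractional_part = q - integer_part            # exact Fraction in [0, 1)
    return integer_part, fractional_part

big = 10**400 + 1                    # far beyond the largest float
integer_part, fractional_part = split_exact(big, 3)

print(len(str(integer_part)))        # 400
print(fractional_part)               # 2/3
```

Because no value is ever converted to float, there is no OverflowError and the fractional part is not lost.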

How do I put more precision in my imported data?

I'm trying to improve the precision by 10 digits for the columns marked in red (Image 1), I tried pd.set_options ('display.precision', 10), but it didn't work. Any ideas? the only line of code I have used is to import data from excel: df = pd.read_excel ('')
It looks like your numbers are actually really big integers, so pandas converts them to 'float' as opposed to integer when they get imported. (they don't fit in standard int64 data type).
Precision applies to decimals, i.e. if you have something like 0.234566, how many decimals to show. But in your case it's not decimals you're looking for, but how many relevant digits to display before cutting off to scientific notation.
Since it's a float data type, you have to control float format, i.e. pd.options.display.float_format
To keep scientific notation, but limit digits to 10,
set display options to this:
pd.options.display.float_format = '{:.10e}'.format
And it will display 10 digits after decimal point before e. You can change 10 to any other digit.
To understand how scientific notation works in python, try this:
print("{:.2e}".format(12345678))
Basically it will format 12345678 limited to 2 digits and will display as 1.23e+07 (which means 1.23 * 10^7)
If you don't want any scientific notation, and just want to display the long integer, use this:
pd.options.display.float_format = '{:.0f}'.format
In this case 0 means show 0 decimals, and it will show the full integer part of the number.
eg:
print("{:.0f}".format(1234567887298739)) returns 1234567887298739. While print("{:.2f}".format(1.234567887298739487)) returns 1.23.
Caveat: a float holds only about 15 to 17 significant digits. A number longer than that was already rounded when it was stored as a float, so any further digits that get displayed are not the original ones, and no format string can bring them back.
Note: In all cases your underlying data stays the same; only the formatting changes.
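To see where the limit in the caveat sits, here is a small sketch in plain Python (no pandas needed). It uses 2**53 + 1, the smallest positive integer that a 64-bit float cannot represent.

```python
exact = 2**53 + 1                 # 9007199254740993, needs 54 bits
as_float = float(exact)           # the nearest double is 2**53

print(exact)                          # 9007199254740993
print("{:.0f}".format(as_float))      # 9007199254740992
print("{:.10e}".format(as_float))     # 9.0071992547e+15
```

The last digit changed when the integer became a float, before any formatting was applied.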
By "improve precision", I assume you mean increase the number of digits that are displayed. In this case, pd.set_option('display.precision', 10) should give you what you want. I have an example as well.
import pandas as pd
c = pd.read_excel("Book1.xlsx")
print(c)
pd.set_option('display.precision', 10)
print()
print(c)
pd.set_option('display.precision', 20)
print()
print(c)
This yields the following.
https://i.stack.imgur.com/XHpdO.png
Note that pandas does keep the actual values in memory; it is just printing fewer digits, in accordance with the default settings. You can verify this by indexing your data frame.
>>> frame = pd.DataFrame([123.456789101112])
>>> frame
            0
0  123.456789
>>> print(frame.iloc[0, 0])
123.456789101112

float number with leading zero at end on python

I want to convert string number to float and keep zeros at the end like this f=float('.270') and f should be 0.270, not 0.27 or '0.270' how I can do it?
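Depending on the application, you should use the decimal library, especially if you are dealing with critical calculations like money:
https://docs.python.org/3.8/library/decimal.html
import decimal
# prec limits the significant digits of arithmetic results;
# a Decimal built from a string keeps its trailing zero regardless
decimal.getcontext().prec = 3
f = decimal.Decimal('0.270')
print(f)
Or, if you only need the trailing zero when displaying the value, simply format it: "%.3f" % f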
Have you considered saving the float as a string rather than a float? If needed for calculations then it can be casted to a float. If you need to have this for significant figures, then this article on rounding numbers in Python should help. It uses the format() method.
I hope this was able to help!

Clean way to convert string to floating point number with specific precision?

I'm trying to convert strings of numbers that come from the output of another program into floating point numbers with two forced decimal places (including trailing zeros).
Right now I'm converting the strings to floats, then separately specifying precision (two decimal places), then converting back to float to do numeral comparisons on later.
# convert to float
float1 = float(output_string[6])
# this doesn't guarantee two decimal places in my output
# eg: -36.55, -36.55, -40.34, -36.55, -35.7 (no trailing zero on the last number)
nice_float = float('{0:.2f}'.format(float1))
# this works but then I later need to convert back into a float
# string->float->string->float is not super clean
nice_string = '{0:.2f}'.format(float1)
Edit for clarity:
I have a problem with the display in that I need that to show exactly two decimal places.
Is there a way to convert a string to a floating point number rounded to two decimal places that's cleaner than my implementation which involves converting a string to a float, then the float back into a formatted string?
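One possible approach, sketched under the assumption that the incoming strings are plain decimal numbers: convert each string once to decimal.Decimal and quantize it to two places. The value then both compares numerically and prints with its trailing zero. The helper name to_two_places and the sample strings are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")

def to_two_places(text):
    """Parse a numeric string and round it to exactly two decimal places."""
    return Decimal(text).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)

# Hypothetical sample of strings from the other program's output.
values = [to_two_places(s) for s in ["-36.55", "-40.339", "-35.7"]]

print([str(v) for v in values])   # ['-36.55', '-40.34', '-35.70']
print(values[2] > values[0])      # True: numeric comparison still works
```

This replaces the string->float->string->float round trip with a single string->Decimal conversion.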

generate random numbers truncated to 2 decimal places

I would like to generate uniformly distributed random numbers between 0 and 0.5, but truncated to 2 decimal places.
without the truncation, I know this is done by
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,1))*0.5
could anyone help me with suggestions on how to generate random numbers up to 2 d.p. only? Thanks!
A float cannot be truncated (or rounded) to 2 decimal digits, because there are many values with 2 decimal digits that just cannot be represented exactly as an IEEE double.
If you really want what you say you want, you need to use a type with exact precision, like Decimal.
Of course there are downsides to doing that—the most obvious one for numpy users being that you will have to use dtype=object, with all of the compactness and performance implications.
But it's the only way to actually do what you asked for.
Most likely, what you actually want to do is either Joran Beasley's answer (leave them untruncated, and just round at print-out time) or something similar to Lauritz V. Thaulow's answer (get the closest approximation you can, then use explicit epsilon checks everywhere).
Alternatively, you can do implicitly fixed-point arithmetic, as David Heffernan suggests in a comment: Generate random integers between 0 and 50, keep them as integers within numpy, and just format them as fixed point decimals and/or convert to Decimal when necessary (e.g., for printing results). This gives you all of the advantages of Decimal without the costs… although it does open an obvious window to create new bugs by forgetting to shift 2 places somewhere.
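A rough sketch of that fixed-point idea, written with the standard library's random module rather than numpy so it runs without extra packages (that swap is an assumption of this sketch, as are the variable names). Integers count hundredths, and Decimal is used only at the end.

```python
import random
from decimal import Decimal

rng = random.Random(123456)

# Integers 0..49 stand for 0.00..0.49 (randrange excludes the upper bound).
hundredths = [rng.randrange(0, 50) for _ in range(50)]

# Shift two decimal places only when an exact decimal value is needed.
exact = [Decimal(n).scaleb(-2) for n in hundredths]

print(hundredths[:5])
print([str(d) for d in exact[:5]])
```

Every element of exact has exactly two decimal places, which a float array cannot guarantee.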
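Floats are never really truncated to 2 decimal places; their string representation can be, though:
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,1))*0.5
# ravel() flattens the (50, 1) array so each val is a scalar;
# %f formats a float (%d is for integers) and rounds to 2 places
print(["%0.2f" % val for val in set.ravel()])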
How about this?
np.random.randint(0, 50, size=(50,1)).astype("float") / 100
That is, create random integers from 0 to 49 (randint excludes the upper bound; use 51 if 0.50 should be possible), and divide by 100.
EDIT:
As made clear in the comments, this will not give you exact two-digit decimals to work with, due to the nature of float representations in memory. It may look like you have the exact float 0.1 in your array, but it definitely isn't exactly 0.1. It is very, very close, though: astype("float") already gives 64-bit doubles.
You can postpone this problem by just keeping the numbers as integers, and remembering that they are to be divided by 100 when you use them.
hundreds = np.random.randint(0, 50, size=(50, 1))
Then at least the roundoff won't happen until at the last minute (or maybe not at all, if the numerator of the equation is a multiple of the denominator).
I managed to find another alternative:
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,2))
for i in range(50):
    for j in range(2):
        set[i,j] = round(set[i,j],2)
