Get displayed precision of floating point digits in pandas - python

I have a dataframe of floating point numbers, and I want to work with what I intuitively see to be their precision, or number of digits past zero:
dd = pd.DataFrame({'x':[12.123456,10.12345,9.1234]})
dd['digits'] = dd['x'].apply(lambda num: num - int(num))
dd['target'] = [6, 5, 4]
           x    digits  target
0  12.123456  0.123456       6
1  10.123450  0.123450       5
2   9.123400  0.123400       4
My solution:
dd['precision'] = dd['x'].astype('str').str.split('.').str[1].str.len()
           x    digits  target  precision
0  12.123456  0.123456       6          6
1  10.123450  0.123450       5          5
2   9.123400  0.123400       4          4
It works, but it's so ugly that I'll have a hard time recalling it in three months when I need it again. Is there a cleaner solution? If not, could someone share some insight the docs don't? What exactly is the data type output by each of these dotted steps? Some of them seem to operate on the Series, whereas others operate on the individual values.
Perhaps this is a property of the series dtype? or value metadata?
EDIT: I need this to run performantly as well. Is it possible to find a vectorized solution? I recall that "object" types in Pandas are pointers, in this case to string data, which sounds like it would make it very difficult to run calculations on more than one value at a time. Therefore, converting to string and accessing its values like:
.astype('str').str...
doesn't seem like the correct approach.
On the other hand, floating-point arithmetic used to count these digits without conversion sounds error-prone as well.
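For reference, the chained expression from the question can be unpacked step by step. Each intermediate is itself a Series; the `.str` accessor applies a string operation element-wise (this assumes `str()` gives the shortest round-trip representation of each float, which is true in Python 3):

```python
import pandas as pd

dd = pd.DataFrame({'x': [12.123456, 10.12345, 9.1234]})

s = dd['x'].astype('str')         # Series of strings, dtype object
parts = s.str.split('.')          # Series of lists, e.g. ['12', '123456']
frac = parts.str[1]               # Series of fractional-part strings, e.g. '123456'
dd['precision'] = frac.str.len()  # Series of integer lengths
```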

Here is another way to do it:
import pandas as pd
df = pd.DataFrame({"x": [12.123456, 10.12345, 9.1234]})
df["precision"] = df["x"].apply(
    lambda x: [i for i in range(pd.options.display.precision + 1) if x == round(x, i)][0]
)
print(df)
# Output
           x  precision
0  12.123456          6
1  10.123450          5
2   9.123400          4
As per Pandas documentation, display.precision is an integer (6 by default) which represents the "floating point output precision in terms of number of places after the decimal, for regular formatting as well as scientific notation".
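If performance matters (per the question's edit), here is a semi-vectorized sketch of the same idea: one rounding pass over the whole column per candidate digit count, instead of a Python-level search per value. It makes the same assumption as the `apply` version, namely that a value with d decimal digits compares equal to itself rounded to d places:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [12.123456, 10.12345, 9.1234]})
x = df["x"].to_numpy()
max_d = pd.options.display.precision  # 6 by default

# One vectorized comparison per digit count d; argmax picks the first
# d for which round(x, d) == x, row by row.
hits = np.stack([x == np.round(x, d) for d in range(max_d + 1)], axis=1)
df["precision"] = hits.argmax(axis=1)
```

The loop runs only max_d + 1 times regardless of the number of rows, so the work per row stays vectorized.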

Related

Why does my program only print the first few characters of e rather than the whole number?

e = str(2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274)
print(e)
Output:
2.718281828459045
Why does the code only print out the first few characters of e instead of the whole string?
A string str has characters, but a number (be it an int or a float) just has a value.
If you do this:
e_first_100 = '2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274'
print(e_first_100)
You'll see all digits printed, because they are just characters in a string, it could have also been the first 100 characters from 'War and Peace' and you would not expect any of that to get lost either.
Since 'e' is not an integer value, you can't use int here, so you'll have to use float, but Python uses a finite number of bits to represent such a number, while there's an infinite number of real numbers. In fact there's an infinite number of values between any two real numbers. So a clever way has to be used to represent at least the ones you use most often, with a limited amount of precision.
You often don't notice the lack of precision, but try something like .1 + .1 + .1 == .3 in Python and you'll see that it can pop up in common situations.
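For example:

```python
print(0.1 + 0.1 + 0.1 == 0.3)  # False
print(0.1 + 0.1 + 0.1)         # 0.30000000000000004
```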
Your computer already has a built-in way to represent these floating point numbers, using either 32 or 64 bits, although many languages (Python included) do offer additional ways of representing floats that aren't part of the way your computer works and allow a bit more precision. By default, Python uses these standard representations of real numbers.
So, if you then do this:
e1 = float(e_first_100)
print(e1)
e2 = 2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274
print(e2)
Both result in a value that, when you print it, looks like:
2.718281828459045
Because that's the precision up to which the number is (more or less) accurately represented.
If you need to use e in a more precise manner, you can use Python's own representation:
from decimal import Decimal
e3 = Decimal(e_first_100)
print(e3)
That looks promising, but even Decimal only has limited precision, although it's better than standard floats:
print(e2 * 3)
print(e3 * Decimal(3))
The difference:
8.154845485377136
8.154845485377135706080862414
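Unlike floats, Decimal's precision is configurable via the context, so you can raise it when the default 28 significant digits aren't enough. A small sketch:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # 50 significant digits instead of the default 28
print(Decimal(1) / Decimal(3))
# 0.33333333333333333333333333333333333333333333333333
```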
To expand on Grismar's answer: you don't see the data because the default string representation of floats cuts off at that point (going further wouldn't be very useful), but while the object is a float, the data is still there.
To get a string with the data, you could provide a fixed precision to some larger amount of digits, for example
e = format(
    2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274,
    ".50f",
)
print(e)
# '2.71828182845904509079559829842764884233474731445312'
which gives us the first 50 digits, but this is of course not particularly useful with floats as the loss of precision picks up the further you go

How do I put more precision in my imported data?

I'm trying to improve the precision by 10 digits for the columns marked in red (Image 1). I tried pd.set_option('display.precision', 10), but it didn't work. Any ideas? The only line of code I have used is to import data from Excel: df = pd.read_excel ('')
It looks like your numbers are actually really big integers, so pandas converts them to float rather than integer on import (they don't fit in the standard int64 data type).
Precision applies to decimals, i.e. if you have something like 0.234566, how many decimals to show. But in your case it's not decimals you're looking for, but how many significant digits to display before cutting off to scientific notation.
Since it's a float data type, you have to control float format, i.e. pd.options.display.float_format
To keep scientific notation, but limit digits to 10,
set display options to this:
pd.options.display.float_format = '{:.10e}'.format
And it will display 10 digits after decimal point before e. You can change 10 to any other digit.
To understand how scientific notation works in python, try this:
print("{:.2e}".format(12345678))
Basically it will format 12345678 limited to 2 digits and will display as 1.23e+07 (which means 1.23 * 10^7)
If you don't want any scientific notation, and just want to display the long integer, use this:
pd.options.display.float_format = '{:.0f}'.format
In this case 0 means show 0 decimals, and it will show the full integer part of the number.
eg:
print("{:.0f}".format(1234567887298739))    # 1234567887298739
print("{:.2f}".format(1.234567887298739487))  # 1.23
Caveat: If the number is too long, it will start rounding and getting confusing after a while. I think 10 digits is OK, but if it's much larger than that, Python can't really represent it exactly and will start rounding and cutting things up... float precision has a system limit too.
Note: In all cases your underlying data stays the same. just the formatting changes.
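A quick sketch of both format options on a made-up large number:

```python
import pandas as pd

df = pd.DataFrame({"x": [12345678901234567890.0]})

pd.options.display.float_format = "{:.10e}".format
print(df)  # scientific notation, 10 digits after the decimal point

pd.options.display.float_format = "{:.0f}".format
print(df)  # full integer part, no decimals, no scientific notation

pd.options.display.float_format = None  # restore the default
```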
By "improve precision", I assume you mean increase the number of digits that are displayed. In this case, pd.set_option('display.precision', 10) should give you what you want. I have an example as well.
import pandas as pd
c = pd.read_excel("Book1.xlsx")
print(c)
pd.set_option('display.precision', 10)
print()
print(c)
pd.set_option('display.precision', 20)
print()
print(c)
This yields the following.
https://i.stack.imgur.com/XHpdO.png
Note that pandas does keep the actual values in memory; it just prints fewer digits in accordance with the default settings. As an example, you can verify this by indexing your data frames.
>>> frame = pd.DataFrame([123.456789101112])
>>> frame
            0
0  123.456789
>>> frame.iloc[0, 0]
123.456789101112

Taking just two decimals without rounding it

Basically, I have a list of float numbers with too many decimals. So when I created a second list with two decimals, Python rounded them. I used the following:
g1= ["%.2f" % i for i in g]
Where g1 is the new list with two decimals, but rounded, and g is the list with float numbers.
How can I make one without rounding them?
I'm a newbie, btw. Thanks!
So, you want to truncate the numbers at the second digit?
Beware that rounding might be the better and more accurate solution anyway.
If you want to truncate the numbers, there are a couple of ways - one of them is to multiply the number by 10 elevated to the number of desired decimal places (100 for 2 places), apply "math.floor", and divide the total back by the same number.
However, as internal floating point arithmetic is not base 10, you'd risk getting more decimal places on the division to scale down.
Another way is to create a string with 3 digits after the "." and drop the last one - that'd be rounding proof.
And again, keep in mind that this converts the numbers to strings - what should be done for presentation purposes only. Also, "%" formatting is quite an old way to format parameters in a string. In modern Python, f-strings are the preferred way:
g1 = [f"{number:.03f}"[:-1] for number in g]
Another, more correct, way is of course to treat numbers as numbers, and not play tricks with adding or removing digits. As noted in the comments, the method above works for numbers like 1.227, which is kept as "1.22", but not for 2.99999, which is rounded to "3.000" and then truncated to "3.00".
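A quick check of that failure mode:

```python
print(f"{1.227:.03f}"[:-1])    # 1.22 -- as intended
print(f"{2.99999:.03f}"[:-1])  # 3.00 -- rounded up before the truncation
```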
Python has the decimal modules, which allows for arbitrary precision of decimal numbers - which includes less precision, if needed, and control of the way Python does the rounding - including rounding towards zero, instead of the nearest number.
Just set the decimal context to the decimal.ROUND_DOWN strategy, and then convert your numbers using either the round built-in (with Decimal, the exact number of digits is guaranteed, unlike round with floating point numbers), or just do the rounding as part of the string formatting. You can also convert your floats to Decimals in the same step:
from decimal import Decimal as D, getcontext, ROUND_DOWN
getcontext().rounding = ROUND_DOWN
g1 = [f"{D(number):.02f}" for number in g]
Again - by doing this, you could as well keep your numbers as Decimal objects, and still be able to perform math operations on them:
g2 = [round(D(number), 2) for number in g]
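With the context set to ROUND_DOWN, formatting truncates instead of rounding, since Decimal's `__format__` honors the current context's rounding mode:

```python
from decimal import Decimal, getcontext, ROUND_DOWN

getcontext().rounding = ROUND_DOWN
print(f"{Decimal(2.99999):.02f}")  # 2.99 -- truncated, not rounded up to 3.00
print(f"{Decimal(1.227):.02f}")    # 1.22
```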
Here is my solution, where we don't even need to convert the numbers to strings to get the desired output:
def format_till_2_decimal(num):
    return int(num * 100) / 100.0
g = [-5.427926, -12.222018, 7.214379, -16.771845, -6.1441464, 10.1383295, 14.740516, 5.9209185, -9.740783, -10.098338]
formatted_g = [format_till_2_decimal(num) for num in g]
print(formatted_g)
Hope this solution helps!!
Here might be the answer you are looking for:
g = [-5.427926, -12.222018, 7.214379, -16.771845, -6.1441464, 10.1383295, 14.740516, 5.9209185, -9.740783, -10.098338]
def trunc(number, ndigits=2):
    parts = str(number).split('.')  # split into integer and fractional parts, e.g. '-5' and '427926'
    truncated_number = '.'.join([parts[0], parts[1][:ndigits]])  # keep only the first ndigits of the fraction, e.g. '-5.42'
    return round(float(truncated_number), 2)  # back to float, rounded to 2 decimals to be safe
g1 = [trunc(i) for i in g]
print(g1)
[-5.42, -12.22, 7.21, -16.77, -6.14, 10.13, 14.74, 5.92, -9.74, -10.09]
Hope this helps.
Actually if David's answer is what you are looking for, it can be done simply as following:
g = [-5.427926, -12.222018, 7.214379, -16.771845, -6.1441464, 10.1383295, 14.740516, 5.9209185, -9.740783, -10.098338]
g1 = [("%.3f" % i)[:-1] for i in g]
Just take 3 decimals, and remove the last chars from the result strings. (You may convert the result to float if you like)

generate random numbers truncated to 2 decimal places

I would like to generate uniformly distributed random numbers between 0 and 0.5, but truncated to 2 decimal places.
without the truncation, I know this is done by
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,1))*0.5
could anyone help me with suggestions on how to generate random numbers up to 2 d.p. only? Thanks!
A float cannot be truncated (or rounded) to 2 decimal digits, because there are many values with 2 decimal digits that just cannot be represented exactly as an IEEE double.
If you really want what you say you want, you need to use a type with exact precision, like Decimal.
Of course there are downsides to doing that—the most obvious one for numpy users being that you will have to use dtype=object, with all of the compactness and performance implications.
But it's the only way to actually do what you asked for.
Most likely, what you actually want to do is either Joran Beasley's answer (leave them untruncated, and just round at print-out time) or something similar to Lauritz V. Thaulow's answer (get the closest approximation you can, then use explicit epsilon checks everywhere).
Alternatively, you can do implicitly fixed-point arithmetic, as David Heffernan suggests in a comment: Generate random integers between 0 and 50, keep them as integers within numpy, and just format them as fixed point decimals and/or convert to Decimal when necessary (e.g., for printing results). This gives you all of the advantages of Decimal without the costs… although it does open an obvious window to create new bugs by forgetting to shift 2 places somewhere.
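A sketch of that fixed-point idea (the helper name is made up for illustration): keep exact integer counts of hundredths inside numpy, and shift the decimal point only when formatting.

```python
import numpy as np

rs = np.random.RandomState(123456)
cents = rs.randint(0, 51, size=5)  # exact integers 0..50, i.e. counts of hundredths

def as_fixed_point(c):
    # Format an integer count of hundredths as an exact two-decimal string.
    return f"{c // 100}.{c % 100:02d}"

print([as_fixed_point(c) for c in cents])
```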
Floats are never truncated to 2 decimal places in memory... however, their string representation may be:
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,1))*0.5
print(["%0.2f" % val for val in set])
How about this?
np.random.randint(0, 50, size=(50,1)).astype("float") / 100
That is, create random integers between 0 and 50, and divide by 100.
EDIT:
As made clear in the comments, this will not give you exact two-digit decimals to work with, due to the nature of float representations in memory. It may look like you have the exact float 0.1 in your array, but it definitely isn't exactly 0.1. It is very, very close, though, and numpy's default 64-bit float ("double") already gets you about as close as the representation allows.
You can postpone this problem by just keeping the numbers as integers, and remember that they're to be divided by 100 when you use them.
hundreds = rs.randint(0, 50, size=(50, 1))
Then at least the roundoff won't happen until at the last minute (or maybe not at all, if the numerator of the equation is a multiple of the denominator).
I managed to find another alternative:
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,2))
for i in range(50):
    for j in range(2):
        set[i, j] = round(set[i, j], 2)
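The same thing without the explicit Python loops, using numpy's vectorized np.round (including the `* 0.5` scaling from the original question):

```python
import numpy as np

rs = np.random.RandomState(123456)
vals = np.round(rs.uniform(size=(50, 2)) * 0.5, 2)  # round the whole array at once
```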

How to store exponential values using python

I am looking for a way to divide a digit by a larger value (2/5000000) and then store that value in a table, but the problem is that when I save that value, only 0 is stored instead of the correct value. I tried float and double precision, but still only 0 is stored. Is there any other way?
Thank you
Remember to operate on floating-point numbers rather than converting after the operation, e.g. 2.0/5000000 instead of 2/5000000 (which is integer division in Python 2 and yields 0).
Also, consider the decimal module if you are looking for more accurate decimals.
You need to use floating point division. To be explicit, you can cast ints to float:
>>> a = 2
>>> b = 5000000
>>> c = a/float(b)
>>> c
4e-07
You can cast either a or b to float.
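In Python 3 the cast is no longer necessary, since / always performs true division:

```python
print(2 / 5000000)   # 4e-07
print(2 // 5000000)  # 0 -- floor division is a separate operator
```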
