Range of python's random.random() from the standard library - python

Does python's random.random() ever return 1.0 or does it only return up until 0.9999..?

>>> help(random.random)
Help on built-in function random:
random(...)
random() -> x in the interval [0, 1).
That means 1 is excluded.

Docs are here: http://docs.python.org/library/random.html
...random(), which generates a random float uniformly in the semi-open range [0.0, 1.0).
So, the return value will be greater than or equal to 0, and less than 1.0.

The other answers already clarified that 1 is not included in the range, but out of curiosity, I decided to look at the source to see precisely how it is calculated.
The CPython source can be found here
/* random_random is the function named genrand_res53 in the original code;
 * generates a random number on [0,1) with 53-bit resolution; note that
 * 9007199254740992 == 2**53; I assume they're spelling "/2**53" as
 * multiply-by-reciprocal in the (likely vain) hope that the compiler will
 * optimize the division away at compile-time. 67108864 is 2**26. In
 * effect, a contains 27 random bits shifted left 26, and b fills in the
 * lower 26 bits of the 53-bit numerator.
 * The orginal code credited Isaku Wada for this algorithm, 2002/01/09.
 */
static PyObject *
random_random(RandomObject *self)
{
    unsigned long a=genrand_int32(self)>>5, b=genrand_int32(self)>>6;
    return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0));
}
So the function effectively generates m/2^53 where 0 <= m < 2^53 is an integer. Since floats have 53 bits of precision normally, this means that on the range [1/2, 1), every possible float is generated. For values closer to 0, it skips some possible float values for efficiency but the generated numbers are uniformly distributed within the range. The largest possible number generated by random.random is precisely
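To check the arithmetic above, here is a small Python sketch that re-evaluates the C expression from random_random(). It assumes IEEE 754 doubles; the helper name res53 and its two 32-bit inputs are hypothetical stand-ins for the two genrand_int32() calls, not part of Python's API.

```python
from decimal import Decimal


def res53(x: int, y: int) -> float:
    """Re-evaluate the C expression from random_random() in Python.

    x and y are hypothetical stand-ins for two 32-bit genrand_int32() outputs.
    """
    a = x >> 5  # top 27 bits of the first word
    b = y >> 6  # top 26 bits of the second word
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)


smallest = res53(0, 0)
largest = res53(0xFFFFFFFF, 0xFFFFFFFF)  # numerator is 2**53 - 1
print(smallest)                    # 0.0
print(largest == 1.0 - 2.0**-53)   # True
print(Decimal(largest))            # exact decimal value of the largest result
```

Every step here is exact in double precision: the numerator never exceeds 2**53 - 1, and multiplying by the reciprocal of a power of two only changes the exponent.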
0.99999999999999988897769753748434595763683319091796875

Python's random.random function returns numbers that are less than, but not equal to, 1.
However, it can return 0.

From the code in Antimony's answer it is easy to see that random.random() never returns exactly 1.0 on platforms that use at least a 53-bit mantissa for calculations involving constants not annotated with 'f' in C. That's the precision IEEE 754 prescribes, and it is standard today.
However, on platforms with lower precision, for example if Python is compiled with -fsingle-precision-constant for use on an embedded platform, adding b to a*67108864.0 can result in rounding up to 2^53 if b is close enough to 2^26 and this would mean that 1.0 is returned. Note that this happens regardless of what precision Python's PyFloat_FromDouble function uses.
One way to test for this would be to check, over a few hundred random numbers, whether the 53rd bit (the lowest bit of the 53-bit numerator) is ever 1. If it is 1 at least once, this proves sufficient precision and you are fine. If not, rounding is the most likely explanation, meaning that random.random() can return 1.0. It's of course possible that you were just unlucky, but you can push the certainty as high as you want by testing more numbers.
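A minimal sketch of that test in Python, assuming (as in the CPython code quoted above) that random.random() returns m/2^53 for an integer m. Multiplying by 2**53 is exact, so the lowest bit of m can be read off directly. The function name is hypothetical.

```python
import random


def numerator_lsb_seen(samples: int = 1000) -> bool:
    """Return True if any sample's 53-bit numerator m (x == m / 2**53) is odd.

    Assumes random.random() produces values of the form m / 2**53.
    """
    for _ in range(samples):
        m = int(random.random() * 2**53)  # exact: scaling by a power of two
        if m & 1:  # lowest bit of the 53-bit numerator
            return True
    return False


print(numerator_lsb_seen())
```

With full 53-bit resolution, the chance that 1000 samples all have an even numerator is 2**-1000, so a False result would be a strong hint that the low bit is being lost.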

Related

Python: Why does 2^-n work for n>52 but not 1+2^-n-1?

I'm pretty new to Python, and I've made a table which calculates T=1+2^-n-1 and C=2^-n, which both give the same values from n=40 to n=52, but for n above 52 I get 0.0 for T, whereas C gives me progressively smaller decimals each time. Why is this?
I think I understand why T becomes 0.0, because of python using binary floating point and because of the machine epsilon value - but I'm slightly confused as to why C doesn't also become 0.0.
import numpy as np
import math
t=np.zeros(21)
c=np.zeros(21)
for n in range(40,61):
    m=n-40
    t[m]=1+2**(-n)-1
    c[m]=2**(-n)
    print (n,t[m],c[m])
The "floating" in floating point means that values are represented by storing a fixed number of leading digits and a scale factor, rather than assuming a fixed scale (which would be fixed point).
2**-53 only takes one (binary) digit to represent (not including the scale), but 1+2**-53 would take 54 to represent exactly. Python floats only have 53 binary digits of precision; 2**-53 can be represented exactly, but 1+2**-53 gets rounded to exactly 1, and subtracting 1 from that gives exactly 0. Thus, we have
>>> 2**-53
1.1102230246251565e-16
>>> 1+(2**-53)-1
0.0
Postscript: you might wonder why 2**-53 displays as a value not equal to the exact mathematical value when I said it was exact. That's due to the float->string conversion logic, which only keeps enough decimal digits to reconstruct the original float (instead of printing a bunch of digits at the end that are usually just noise).
The difference between the two is indeed due to floating-point representation. If you compute 1 + X where X is a very small number, the result has the exponent of 1 (an unbiased exponent of 0), so X must be captured by the mantissa, which stores 52 bits in a double-precision (64-bit) float. Therefore, 1 + 2^(-n) for n > 52 rounds to exactly 1. However, even 2^-100 can be represented exactly in double-precision floating point, so you see C keep decreasing as n grows.
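A short check of both behaviors with plain Python floats (no NumPy needed). It assumes IEEE 754 double precision; math.ulp requires Python 3.9 or later.

```python
from fractions import Fraction
import math

# T = 1 + 2**-n - 1 becomes 0.0 once 1 + 2**-n no longer fits in 53 bits.
first_zero = next(n for n in range(40, 61) if 1 + 2.0**-n - 1 == 0.0)
print(first_zero)                                  # 53

# C = 2**-n needs only one significant bit, so it is stored exactly.
print(Fraction(2.0**-53) == Fraction(1, 2**53))    # True
print(all(2.0**-n > 0.0 for n in range(40, 61)))   # True

# The gap between 1.0 and the next larger float.
print(math.ulp(1.0) == 2.0**-52)                   # True
```

At n = 53 the exact sum 1 + 2**-53 lies exactly halfway between 1.0 and the next float, and round-half-to-even picks 1.0.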

Python equality of floating point divisions

Using Python 3, how does the following return True?
a = 2/3
b = 4/6
print(a == b)
I have an algorithm that requires sorting a list of numbers which are each of the form x/y where x and y are integers. (y != 0).
I was concerned that the numerical precision of the division would result in instability and arbitrary ordering of cases such as the one above; this is an example of relevant comments. But, as per the example and for larger integers as well, it does not seem to be an issue.
Does Python remove the common factor of 2 from the numerator and denominator of b, and retain information that a and b are not just floats?
Python follows the IEEE 754 floating point specification.* (64-bit) IEEE floats are essentially a form of base 2 scientific notation, broken down as follows:
One bit for the sign (positive or negative)
53 bits for the mantissa or significand, including the implied leading one.
11 bits for the exponent.
Multiplying or dividing a floating point value by two, or any power of two, only affects the exponent, and not the mantissa.** As a result, it is normally a fairly "stable" operation by itself, so 2/3 should yield the same result as 4/6. However, IEEE floats still have the following problems:
Most operations are not associative (e.g. (a * b) * c != a * (b * c) in the general case).
More complicated operations are not required to be correctly rounded (however, as Tim Peters points out, division certainly is not a "more complicated" operation and will be correctly rounded).***
Intermediate results are always rounded to 53 bits.
You should be prepared to handle these issues and assume that most mathematically-equivalent floating point expressions will not result in identical values. In Python specifically, you can use math.isclose() to estimate whether two floats are "close enough" to be "probably the same value."
* Actually, this is a lie. Python follows C's double, which nearly always follows IEEE 754 in some fashion, but might deviate from it on sufficiently exotic architectures. In such cases the C standard provides few or no guarantees, so you will have to look to your architecture or compiler's floating point documentation.
** Provided the exponent does not overflow or underflow. If it does, then you will typically land on an appropriately-signed infinity or zero, respectively, or you might underflow to a denormal number depending on architecture and/or how Python was compiled.
*** The exact set of "more complicated" operations varies somewhat because IEEE 754 made a lot of operations optional while still demanding precision. As a result, it is seldom obvious whether a given operation conforms to IEEE 754 or only conforms to the notoriously lax C standard. In some cases, an operation may conform to no standard whatsoever.
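A small illustration of the points above, assuming IEEE 754 doubles: power-of-two scaling leaves the significand alone, while regrouping a sum can change the last bit, which is where math.isclose() helps.

```python
import math

a = 2 / 3
b = 4 / 6
print(a == b)                 # True: both are the correctly rounded value of 2/3

# Scaling by a power of two only changes the exponent (barring overflow/underflow).
print(2 * (1 / 3) == 2 / 3)                          # True
print(math.frexp(1 / 3)[0] == math.frexp(2 / 3)[0])  # True: same significand

# Grouping, however, can change the rounding of a sum.
x, y, z = 0.1, 0.2, 0.3
print((x + y) + z == x + (y + z))              # False
print(math.isclose((x + y) + z, x + (y + z)))  # True
```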
Just noting that so long as integers x and y are exactly representable as Python floats, x / y is - on all current machines - the correctly rounded value of the infinitely precise quotient. That's what the IEEE 754 floating-point standard requires, and all current machines support that.
So the important part in your specific example isn't that the numerator and denominator in b = 4/6 have a factor of (specifically!) 2 in common, it's that (a) they have some factor in common; and, (b) 4 and 6 are both exactly representable as Python floats.
So, for example, it's guaranteed that
(2 * 9892837) / (3 * 9892837) == 2 / 3
is also true, because the infinitely precise value of (2 * 9892837) / (3 * 9892837) is the same as the infinitely precise value of 2/3, and IEEE 754 division acts as if the infinitely precise quotient were computed and then rounded. And you can replace 9892837 with any other non-zero integer, provided that the products remain exactly representable as Python floats.
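As a spot check (not a proof), the sketch below converts numerator and denominator to floats first, which is the situation the answer describes, and keeps the products below 2**53 so the conversions are exact. The ranges and the seed are arbitrary choices.

```python
import random

k = 9892837
print(float(2 * k) / float(3 * k) == 2.0 / 3.0)   # True

# Spot check: numerator and denominator are converted to floats and stay
# below 2**50, so the conversions are exact and IEEE 754 division returns
# the correctly rounded quotient of x/y in both cases.
rng = random.Random(0)
for _ in range(10_000):
    x, y = rng.randrange(1, 2**10), rng.randrange(1, 2**10)
    f = rng.randrange(1, 2**40)
    assert float(x * f) / float(y * f) == float(x) / float(y)
print("no mismatches")
```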
2/3 is the same as 4/6: 4/6 = (2/3)*(2/2), and 2/2 = 1 is the multiplicative identity. The response is correct.

why (0.0006*100000)%10 is 10

When I did (0.0006*100000)%10 and (0.0003*100000)%10 in Python, they returned values just below 10 (9.999999999999993 for the first one), but mathematically the result should be 0.
Similarly, in C++ fmod(0.0003*100000,10) gives the value as 10. Can someone help me understand where I'm going wrong?
The closest IEEE 754 64-bit binary number to 0.0003 is 0.0002999999999999999737189393389513725196593441069126129150390625. The closest representable number to the result of multiplying it by 100000 is 29.999999999999996447286321199499070644378662109375.
There are a number of operations, such as floor and mod, that can make very low significance differences very visible. You need to be careful using them in connection with floating point numbers - remember that, in many cases, you have a very, very close approximation to the infinite precision value, not the infinite precision value itself. The actual value can be slightly high or, as in this case, slightly low.
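To see the exact values involved, Python's decimal.Decimal can print the exact binary value stored in a float. This sketch just reproduces the numbers quoted above, assuming IEEE 754 doubles.

```python
from decimal import Decimal

p3 = 0.0003 * 100000
p6 = 0.0006 * 100000
print(Decimal(0.0003))   # exact stored value of the literal 0.0003
print(Decimal(p3))       # exact product, just below 30
print(p3 % 10, p6 % 10)  # both just below 10 instead of 0
```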
Just to give the obvious answer: 0.0006 and 0.0003 are not representable in a machine double (at least on modern machines). So you didn't actually multiply by those values, but by some value very close. Slightly more, or slightly less, depending on how the compiler rounded them.
May I suggest using the remainder function in C?
It will compute the remainder after rounding the quotient to nearest integer, with exact computation (no rounding error):
remainder = dividend - round(dividend/divisor)*divisor
This way, your result will be in the interval [-divisor/2, +divisor/2].
This will still emphasize the fact that you don't get a float exactly equal to 6/10,000, but maybe in a less surprising way when you expect a zero remainder:
remainder(0.0006*100000,10.0) -> -7.105427357601002e-15
remainder(0.0003*100000,10.0) -> -3.552713678800501e-15
I don't know of such remainder function support in Python, but there seems to be a match in the gnulib-python module (to be verified...)
https://github.com/ghostmansd/gnulib-python/blob/master/modules/remainder
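For what it's worth, Python 3.7 and later provide math.remainder in the standard library, which computes this IEEE 754-style remainder (quotient rounded to the nearest integer, ties to even, result computed exactly). A sketch reproducing the two values above:

```python
import math

# dividend - n * divisor, where n is the integer nearest to dividend / divisor
r6 = math.remainder(0.0006 * 100000, 10.0)
r3 = math.remainder(0.0003 * 100000, 10.0)
print(r6, r3)                              # tiny negative values instead of ~10
print(abs(r6) <= 5.0 and abs(r3) <= 5.0)   # True: within [-divisor/2, divisor/2]
```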
EDIT
Why does it apparently work with every other N/10,000 in [1,9] interval but 3 and 6?
It's not completely luck; these are good properties of IEEE 754 in the default rounding mode (round to nearest, ties to even).
The result of a floating point operation is rounded to nearest floating point value.
Instead of N/D you thus get (N/D+err), where the absolute error err is given by this snippet (I'm more comfortable in Smalltalk, but I'm sure you will find an equivalent in Python):
| d |
d := 10000.
^(1 to: 9) collect: [:n | ((n/d) asFloat asFraction - (n/d)) asFloat]
It gives you something like:
#(4.79217360238593e-21 9.58434720477186e-21 -2.6281060661048628e-20 1.916869440954372e-20 1.0408340855860843e-20 -5.2562121322097256e-20 -7.11236625150491e-21 3.833738881908744e-20 -2.4633073358870662e-20)
Changing the last bit of a floating point significand leads to a small difference named the unit of least precision (ulp), and it might be good to express the error in terms of ulps:
| d |
d := 10000.
^(1 to: 9) collect: [:n | ((n/d) asFloat asFraction - (n/d)) / (n/d) asFloat ulp]
The number of ulps off the exact fraction is thus:
#(0.3536 0.3536 -0.4848 0.3536 0.096 -0.4848 -0.0656 0.3536 -0.2272)
The error is the same for N=1,2,4,8 because they are essentially the same floating point - same significand, just the exponent changes.
It's also the same for N=3 and 6 for the same reason, but very near the maximum error for a single operation, which is 0.5 ulp (unluckily, the exact value lies almost halfway between two floats).
For N=9, the relative error is smaller than for N=1, and for 5 and 7, the error is very small.
Now when we multiply these approximations by 10000, which is exactly representable as a float, (N/D+err)*D is N+D*err, and it's then rounded to the nearest float. If D*err is less than half the distance to the next float, this is rounded to N and the rounding error vanishes.
| d |
d := 10000.
^(1 to: 9) collect: [:n | ((n/d) asFloat asFraction - (n/d)) * d / n asFloat ulp]
OK, we were unlucky for N=3 and 6, the already high rounding error magnitude has become greater than 0.5 ulp:
#(0.2158203125 0.2158203125 -0.591796875 0.2158203125 0.1171875 -0.591796875 -0.080078125 0.2158203125 -0.138671875)
Beware, the distance is not symmetric for exact powers of two, the next float after 1.0 is 1.0+2^-52, but before 1.0 it's 1.0-2^-53.
Nonetheless, what we see here, is that after the second rounding operation, the error did annihilate in four cases, and did cumulate only in a single case (counting only the cases with different significands).
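Since the answer invites a Python equivalent, here is one possible translation of the three Smalltalk snippets, using fractions.Fraction for exact rational arithmetic and math.ulp (Python 3.9+) for the ulp. It is a sketch, not the answerer's code; it should reproduce the lists shown above, and the last line checks which N fail to round-trip through (N/d)*d.

```python
from fractions import Fraction
import math

d = 10000
ns = range(1, 10)

# Exact error of each correctly rounded quotient: float(n/d) - n/d.
errors = [Fraction(n / d) - Fraction(n, d) for n in ns]

# Second snippet: the error measured in ulps of the quotient itself.
ulps_of_quotient = [float(e / Fraction(math.ulp(n / d))) for n, e in zip(ns, errors)]

# Third snippet: D*err measured in ulps of N, i.e. what is left to round
# when the quotient is multiplied back by d.
ulps_after_product = [float(e * d / Fraction(math.ulp(float(n)))) for n, e in zip(ns, errors)]

print(ulps_of_quotient)
print(ulps_after_product)

# Which N do not come back exactly from (N/d)*d?
print([n for n in ns if (n / d) * d != n])   # [3, 6]
```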
We can generalize that result. As long as we do not sum numbers with very different exponents, but just use multiply/divide operations, the error bound can be high after P operations, yet the statistical distribution of cumulated errors has a remarkably narrow peak compared to this bound, and the results are surprisingly good compared with what we regularly read about float imprecision. See my answer to The number of correct decimal digits in a product of doubles with a large number of terms for example.
I just wanted to mention that yes, floats are inexact, but they sometimes do such a decent job that they foster the illusion of exactness. Finding a few outliers like the ones mentioned in this post is then surprising. The sooner the surprise, the smaller the surprise. Ah, if only floats were implemented less carefully, there would be fewer questions in this category...

What is the nature of the round off error here?

Can someone help me unpack what exactly is going on under the hood here?
>>> 1e16 + 1.
1e+16
>>> 1e16 + 1.1
1.0000000000000002e+16
I'm on 64-bit Python 2.7. For the first, I would assume that since a float only has about 15 significant decimal digits of precision, it's just round-off error. The true floating-point answer might be something like
10000000000000000.999999....
And the decimal part just gets lopped off. But the second result makes me question this understanding, and can't 1 be represented exactly? Any thoughts?
[Edit: Just to clarify. I'm not in any way suggesting that the answers are "wrong." Clearly, they're right, because, well they are. I'm just trying to understand why.]
It's just rounding as close as it can.
1e16 in floating hex is 0x4341c37937e08000.
1e16+2 is 0x4341c37937e08001.
At this level of magnitude, the smallest difference you can represent is 2. Adding exactly 1.0 lands exactly halfway between 1e16 and 1e16+2, and IEEE round-half-to-even picks 1e16 (whose significand is even), so the result rounds down. Adding values larger than 1.0 will round up to the next representable value.
10^16 = 0x002386f26fc10000 is exactly representable as a double precision floating point number. The next representable number is 1e16+2. 1e16+1 is correctly rounded to 1e16, and 1e16+1.1 is correctly rounded to 1e16+2. Check the output of this C program:
#include <stdio.h>
#include <math.h>
#include <stdint.h>
#include <inttypes.h>
int main(void)
{
    uint64_t i = 10000000000000000ULL;
    double a = (double)i;
    double b = nextafter(a, 1.0e20);    // next representable number
    printf("I=0x%016" PRIx64 "\n", i);  // 10^16 in hex
    printf("A=%a (%.4f)\n", a, a);      // double representation
    printf("B=%a (%.4f)\n", b, b);      // next double
    return 0;
}
Output:
I=0x002386f26fc10000
A=0x1.1c37937e08p+53 (10000000000000000.0000)
B=0x1.1c37937e08001p+53 (10000000000000002.0000)
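A rough Python counterpart of the C program, assuming Python 3.9+ for math.nextafter; struct is used only to view the raw 64-bit pattern.

```python
import math
import struct

a = 1e16
b = math.nextafter(a, math.inf)  # next representable double above a

print(hex(struct.unpack('<Q', struct.pack('<d', a))[0]))  # 0x4341c37937e08000
print(a.hex(), b.hex())  # 0x1.1c37937e08000p+53 0x1.1c37937e08001p+53
print(b - a)             # 2.0
print(1e16 + 1.0 == a, 1e16 + 1.1 == b)  # True True
```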
Let's decode some floats, and see what's actually going on! I'm going to use Common Lisp, which has a handy function for getting at the significand (a.k.a mantissa) and exponent of a floating-point number without needing to twiddle any bits. All floats used are IEEE double-precision floats.
> (integer-decode-float 1.0d0)
4503599627370496
-52
1
That is, if we consider the value stored in the significand as an integer, it is the maximum power of 2 available (4503599627370496 = 2^52), scaled down (2^-52). (It isn't stored as 1 with an exponent of 0 because it's simpler for the significand to never have zeros on the left, and this allows us to skip representing the leftmost 1 bit and have more precision. Numbers not in this form are called denormal.)
Let's look at 1e16.
> (integer-decode-float 1d16)
5000000000000000
1
1
Here we have the representation (5000000000000000) * 2^1. Note that the significand, despite being a nice round decimal number, is not a power of 2; this is because 1e16 is not a power of 2. Every time you multiply by 10, you are multiplying by 2 and 5; multiplying by 2 is just incrementing the exponent, but multiplying by 5 is an "actual" multiplication, and here we've multiplied by 5 sixteen times.
5000000000000000 = 10001110000110111100100110111111000001000000000000000 (base 2)
Observe that this is a 53-bit binary number, as it should be since double floats have a 53-bit significand.
But the key to understanding the situation is that the exponent is 1. (The exponent being small is an indication that we are getting close to the limits of precision.) This means that the float value is 2^1 = 2 times this significand.
Now, what happens when we try to represent adding 1 to this number? Well, we need to represent 1 at the same scale. But the smallest change we can make in this number is exactly 2, because the least significant bit of the significand has value 2!
That is, if we increment the significand, making the smallest possible change, we get
5000000000000001 = 10001110000110111100100110111111000001000000000000001 (base 2)
and when we apply the exponent, we get 2 * 5000000000000001 = 10000000000000002, which is exactly the value you observed. You can only have either 10000000000000000 or 10000000000000002, and 10000000000000001.1 is closer to the latter.
(Note that the issue here isn't even that decimal numbers aren't exact in binary! There's no binary "repeating decimals" here, and there's plenty of 0 bits on the right end of the significand — it's just that your input neatly falls just beyond the lowest bit.)
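Python has no integer-decode-float, but math.frexp gets close. The helper below is a hypothetical sketch that only handles finite, nonzero, normal doubles; it reproduces the Lisp output above and shows the significand stepping from ...000 to ...001.

```python
import math


def integer_decode(x: float):
    """Rough analogue of Common Lisp's integer-decode-float for finite,
    nonzero, normal doubles: returns (significand, exponent, sign) such that
    x == sign * significand * 2**exponent with a 53-bit significand."""
    m, e = math.frexp(x)               # x == m * 2**e with 0.5 <= |m| < 1
    significand = int(abs(m) * 2**53)  # exact: scaling by a power of two
    return significand, e - 53, -1 if x < 0 else 1


print(integer_decode(1.0))         # (4503599627370496, -52, 1)
print(integer_decode(1e16))        # (5000000000000000, 1, 1)
print(integer_decode(1e16 + 1.1))  # (5000000000000001, 1, 1)
```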
With numpy, you can see the next larger and smaller representable IEEE floating point number:
>>> import numpy as np
>>> huge=1e100
>>> tiny=1e-100
>>> np.nextafter(1e16,huge)
10000000000000002.0
>>> np.nextafter(1e16,tiny)
9999999999999998.0
So:
>>> (np.nextafter(1e16,huge)-np.nextafter(1e16,tiny))/2.0
2.0
And:
>>> 1.1>2.0/2
True
Therefore 1e16 + 1.1 is correctly rounded to the next larger IEEE representable number of 10000000000000002.0
As is:
>>> 1e16+1.0000000000000005
1.0000000000000002e+16
and 1e16-(something slightly larger than 1) is rounded down by 2 to the next smaller IEEE number:
>>> 1e16-1.0000000000000005
9999999999999998.0
Keep in mind that 32-bit vs 64-bit Python is irrelevant; it is the size of the IEEE format used that matters. Also keep in mind that the epsilon value (basically the spread between the next larger and next smaller IEEE values) grows with the magnitude of the number.
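To see the spacing change with magnitude without NumPy, math.ulp (Python 3.9+) returns the gap from a double to the next larger one. A minimal sketch:

```python
import math

# The gap between adjacent doubles doubles every time x crosses a power of two.
for x in (1.0, 1e15, 1e16, 1e17):
    print(x, math.ulp(x))
```

For these inputs the gaps are 2**-52, 0.125, 2.0 and 16.0 respectively.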
You can see this in bits as well:
>>> import struct
>>> def f_to_bits(f): return struct.unpack('<Q', struct.pack('<d', f))[0]
...
>>> def bits_to_f(bits): return struct.unpack('<d', struct.pack('<Q', bits))[0]
...
>>> bits_to_f(f_to_bits(1e16)+1)
1.0000000000000002e+16
>>> bits_to_f(f_to_bits(1e16)-1)
9999999999999998.0

Safest way to convert float to integer in python?

Python's math module contains handy functions like floor & ceil. These functions take a floating point number and return the nearest integer below or above it. However, in Python 2 these functions return the answer as a floating point number (in Python 3 they return an int). For example:
import math
f=math.floor(2.3)
Now f returns:
2.0
What is the safest way to get an integer out of this float, without running the risk of rounding errors (for example if the float is the equivalent of 1.99999) or perhaps I should use another function altogether?
All integers that can be represented by floating point numbers have an exact representation. So you can safely use int on the result. Inexact representations occur only if you are trying to represent a rational number with a denominator that is not a power of two.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
To quote from Wikipedia,
Any integer with absolute value less than or equal to 2^24 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 2^53 can be exactly represented in the double precision format.
Using int(your non-integer number) will nail it (int truncates toward zero).
print(int(2.3))           # "2"
print(int(math.sqrt(5)))  # "2"
You could use the round function. If you pass no second parameter (ndigits, the number of decimal digits to keep), then I think you will get the behavior you want.
IDLE output (Python 3, where round() sends exact halves to the nearest even integer):
>>> round(2.99999999999)
3
>>> round(2.6)
3
>>> round(2.5)
2
>>> round(2.4)
2
Combining two of the previous results, we have:
int(round(some_float))
This converts a float to an integer fairly dependably.
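A quick comparison of the three conversions in Python 3 (where math.floor already returns an int) may help choose between them; note that int() truncates toward zero and round() sends exact halves to the even integer.

```python
import math

x = 2.3
print(math.floor(x), type(math.floor(x)).__name__)  # 2 int
print(int(-2.7), math.floor(-2.7), round(-2.7))     # -2 -3 -3
print(round(2.5), round(3.5))                       # 2 4
```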
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
This post explains why it works in that range.
In a double, you can represent 32-bit integers without any problems. There cannot be any rounding issues. More precisely, doubles can represent all integers between and including 2^53 and -2^53.
Short explanation: A double can store up to 53 binary digits. When you require more, the number is padded with zeroes on the right.
It follows that 53 ones is the largest number that can be stored without padding. Naturally, all (integer) numbers requiring fewer digits can be stored accurately.
Adding one to 111(omitted)111 (53 ones) yields 100...000, a 1 followed by 53 zeroes. Since we can store only 53 digits, the rightmost zero is padding.
This is where 2^53 comes from.
More detail: We need to consider how IEEE-754 floating point works.
  1 bit    11 / 8     52 / 23     # bits double/single precision
[ sign | exponent | mantissa ]
The number is then calculated as follows (excluding special cases that are irrelevant here):
(-1)^sign × 1.mantissa × 2^(exponent - bias)
where bias = 2^(number of exponent bits - 1) - 1, i.e. 1023 and 127 for double/single precision respectively.
Knowing that multiplying by 2^X simply shifts all bits X places to the left, it's easy to see that for any integer, all mantissa bits that end up to the right of the binary point must be zero.
Any integer except zero has the following form in binary:
1x...x where the x-es represent the bits to the right of the MSB (most significant bit).
Because we excluded zero, there will always be an MSB that is one, which is why it's not stored. To store the integer, we must bring it into the aforementioned form: (-1)^sign × 1.mantissa × 2^(exponent - bias).
That's the same as shifting the bits over the binary point until only the MSB is to the left of it. All the bits to the right of the binary point are then stored in the mantissa.
From this, we can see that we can store at most 52 binary digits apart from the MSB.
It follows that the highest number where all bits are explicitly stored is
111(omitted)111. That's 53 ones (52 + implicit 1) in the case of doubles.
For this, we need to set the exponent such that the binary point will be shifted 52 places. If we were to increase the exponent by one, we could not know the digit immediately to the left of the binary point.
111(omitted)111x.
By convention, it's 0. Setting the entire mantissa to zero, we receive the following number:
100(omitted)00x. = 100(omitted)000.
That's a 1 followed by 53 zeroes, 52 stored and 1 added due to the exponent.
It represents 2^53, which marks the boundary (both negative and positive) between which we can accurately represent all integers. If we wanted to add one to 2^53, we would have to set the implicit zero (denoted by the x) to one, but that's impossible.
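A small empirical check of the 2^53 (double) and 2^24 (single) boundaries. The helper names are hypothetical; single precision is emulated by packing with struct's 'f' format, which assumes the platform's float is IEEE 754 binary32.

```python
import struct


def exact_as_double(n: int) -> bool:
    """True if the integer n survives a round trip through a Python float (double)."""
    return int(float(n)) == n


def exact_as_single(n: int) -> bool:
    """True if n survives a round trip through IEEE 754 single precision."""
    f = struct.unpack('<f', struct.pack('<f', float(n)))[0]
    return int(f) == n


print(exact_as_double(2**53), exact_as_double(2**53 + 1))  # True False
print(exact_as_single(2**24), exact_as_single(2**24 + 1))  # True False
print(all(exact_as_double(n) for n in range(2**53 - 1000, 2**53 + 1)))  # True
```

2**53 + 1 lies exactly halfway between 2**53 and 2**53 + 2, and round-half-to-even sends it to 2**53; the same happens with 2**24 + 1 in single precision.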
If you need to convert a string float to an int you can use this method.
Example: '38.0' to 38
In order to convert this to an int you can cast it as a float then an int. This will also work for float strings or integer strings.
>>> int(float('38.0'))
38
>>> int(float('38'))
38
Note: This will strip any digits after the decimal point.
>>> int(float('38.2'))
38
math.floor will always return an integer number and thus int(math.floor(some_float)) will never introduce rounding errors.
The rounding error might already be introduced in math.floor(some_large_float), though, or even when storing a large number in a float in the first place. (Large numbers may lose precision when stored in floats.)
Another code sample to convert a real/float to an integer using variables.
"vel" is a real/float number and is rounded up to the next integer, "newvel".
import math, os, sys, arcpy.da
.
.
with arcpy.da.SearchCursor(densifybkp,[floseg,vel,Length]) as cursor:
    for row in cursor:
        curvel = float(row[1])
        newvel = int(math.ceil(curvel))
Since you're asking for the 'safest' way, I'll provide another answer other than the top answer.
An easy way to make sure you don't lose any precision is to check if the values would be equal after you convert them.
if int(some_value) == some_value:
    some_value = int(some_value)
If the float is 1.0 for example, 1.0 is equal to 1. So the conversion to int will execute. And if the float is 1.1, int(1.1) equates to 1, and 1.1 != 1. So the value will remain a float and you won't lose any precision.
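A variant of the same idea using float.is_integer(), sketched under the assumption that the input is a Python float. Unlike int(some_value) == some_value, it does not raise OverflowError or ValueError for inf or nan. The function name is hypothetical.

```python
import math


def to_int_if_exact(x: float):
    """Return int(x) when x holds an integral value; otherwise return x unchanged.

    float.is_integer() is False for inf and nan, so those pass through as-is.
    """
    return int(x) if x.is_integer() else x


print(to_int_if_exact(1.0), to_int_if_exact(1.1), to_int_if_exact(math.inf))  # 1 1.1 inf
```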
