Python large random integers and the precision of the Mersenne Twister - python

I am wondering what the "precision" of Python's random function means. It is described here:
Almost all module functions depend on the basic function random(),
which generates a random float uniformly in the semi-open range [0.0,
1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1.
(http://docs.python.org/library/random.html?highlight=mersenne%20twister, accessed 20120727)
What interests me is that I can generate very large random integers (long integers) that appear to carry considerably more than 53 bits of precision. For instance (using IPython):
In [1]: from math import factorial as F
In [2]: from random import randint as R
In [3]: R(1, F(900))
Out[3]: 55655511302846458744179265243566263049348396362730789786376014445325896599604354914431619960209388364677180234108513221468671377813842671874148746886513973171423907294544220953849330089822288697383171078250181973489187774341795574648920075697792011317798969959919449394758519496792725695600701199089972009688412593325291810024048811890509220571436407156566269358600296506017343255050788936280200352509087073097532486502694101150248815092174847010359868156616901409331336760344351058867833528749797221612169430654334458578364850198977511061993233818849689759090377347376020160658362459773356292085856906573553086825560047089834757501023094429371408722563891227474029563545206865055657504766128286451181119906678062837368414582707728324415466848186858173236300969443478496634754744888060794778485246692104851885847515244146665974598354436781340057667983223238998674622833320199904840957000014767293658171874973067958145430346745707636676061629278168015549755791407108399231392952706279787486238512258804098030513575025870504347283221015756832157863142353915612138589145084128778032995695113870365505775392647256056048691602676699581153972467494111720212363912926352356346807790816796784781384561736415741104584667536002819103176714157723039428367564698686945824679882523439229215035996634289075127375256728472056511244548311771570743103809147045947583819651257115044154025329883682429231394004470689760531056853018427649916035935302356382633012319775473728455377657692268855776796385819792347680100513177355101630543290088996770992548670273727988974570199179655691444984337837105283447276788151912408533352627494948390016029881755603243934955207024221452181883522004648595373130617729041347013155205217774450836687880723915563507108222768637840614647145898936109917167237397888104669458661404234553707323638883064861414284282190898741067404128885188113697448726481104763682489126524054241797759521120664366845719767486252884585742737830119890190213053751046461419643379561983590174574185268661318409035375114305279020423595250660644954841798619767985549553380200803904976806468796334648515423467654573415304912570341635682203261002606581817207689816015969520503052648773609840260050676394927780076948629298559638703440007364834579712680931643829764810072128419905903786966L
I am wondering in what sense 53 bits is the precision limit of the random function. And concretely, if I ask Python to return pseudo-random numbers between 1 and some very large upper bound, is it not true that all integers in that range have an equal likelihood of being returned?

For random(), Python generates 64 bits of randomness by calling the 32-bit MT19937 core twice, but because the result must be a float in [0.0, 1.0), only 53 of those bits can be kept. That is a limitation of the double-precision floating-point format, not of the generator.
Python random.py is capable of pulling bits from /dev/urandom or Windows CryptGenRandom (if supported), if you use the random.SystemRandom class. Otherwise, it generates larger numbers by pulling successive bits from repeated calls to MT19937.
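For example, a quick sketch of the cryptographic path:
import random

sr = random.SystemRandom()     # draws from os.urandom (/dev/urandom or the OS CSPRNG)
print(sr.randint(1, 10**50))   # a large integer built from OS entropy instead of MT19937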

53 bits is not an inherent limit in the generator, it is the amount that Python returns when you request a float, since floats have 53 bits of precision.
You can get random integers directly using stuff like random.getrandbits.
In more detail, the Mersenne Twister used in CPython generates 32 bits at a time. The module code calls this twice and combines the results to produce a 53-bit float. If you call getrandbits(k), it will call the internal function as many times as necessary to generate k bits. The code for this can be found here.
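For example:
import random

print(random.getrandbits(200))     # a 200-bit integer, built from repeated 32-bit MT19937 outputs
print(random.randint(1, 10**100))  # randint builds arbitrarily large integers the same way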

Related

How to disable python rounding?

I'm looking for a way to disable this:
print(0+1e-20) returns 1e-20, but print(1+1e-20) returns 1.0
I want it to return something like 1+1e-20.
I need it because of this problem:
from numpy import sqrt
def f1(x):
    return 1/((x+1)*sqrt(x))
def f2(x):
    return 1/((x+2)*sqrt(x+1))
def f3(x):
    return f2(x-1)
print(f1(1e-6))
print(f3(1e-6))
print(f1(1e-20))
print(f3(1e-20))
returns
999.9990000010001
999.998999986622
10000000000.0
main.py:10: RuntimeWarning: divide by zero encountered in double_scalars
return 1/((x+2)*sqrt(x+1))
inf
f1 is the original function, f2 is f1 shifted by 1 to the left, and f3 is f2 moved back by 1 to the right. By this logic, f1 and f3 should be equal to each other, but this is not the case.
I know about decimal.Decimal and it doesn't work, because decimal doesn't support some functions, including sin. If you could somehow make Decimal work for all functions, I'd like to know how.
Can't be done. There is no rounding - that would imply that the exact result ever existed, which it did not. Floating-point numbers have limited precision. The easiest way to envision this is an analogy with art and computer screens. Imagine someone making a fabulously detailed painting, and all you have is a 1024x768 screen to view it through. If a microscopic dot is added to the painting, the image on the screen might not change at all. Maybe you need a 4K screen instead.
In Python, the closest representable number after 1.0 is 1.0000000000000002 (*), according to math.nextafter(1.0, math.inf) (Python 3.9+ is required for math.nextafter). 1e-20 and 1 are too different in magnitude, so the result of their addition cannot be represented by a Python floating-point number, which is precise to only about 16 significant decimal digits.
See Is floating point math broken? for an in-depth explanation of the cause.
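A short demonstration of both points:
import math  # math.nextafter needs Python 3.9+

print(math.nextafter(1.0, math.inf))  # 1.0000000000000002, the next representable float after 1.0
print(1.0 + 1e-20 == 1.0)             # True: the tiny addend is lost entirely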
As this answer suggests, there are libraries like mpmath that implement arbitrary-precision arithmetic:
from mpmath import mp, mpf
mp.dps = 25 # set precision to 25 decimal digits
mp.sin(1)
# => mpf('0.8414709848078965066525023183')
mp.sin(1 + mpf('1e-20')) # mpf is a constructor for mpmath floats
# => mpf('0.8414709848078965066579053457')
mpmath floats are sticky; if you add an int and an mpf you get an mpf, so I did not have to write mp.mpf(1). The result is still not exact, but you can select whatever precision is sufficient for your needs. Also, note that the difference between these two results is, again, too small to be representable by Python's floating-point numbers, so if the difference is meaningful to you, you have to keep it in mpmath land:
float(mpf('0.8414709848078965066525023183')) == float(mpf('0.8414709848078965066579053457'))
# => True
(*) This is actually a white lie. The next number after 1.0 is 0x1.0000000000001, or 0b1.0000000000000000000000000000000000000000000000001, but Python doesn't like hexadecimal or binary float literals. 1.0000000000000002 is Python's approximation of that number for your decimal convenience.
As others have stated, in general this can't be done (due to how computers commonly represent numbers).
It's common to work with the precision you've got; ensuring that algorithms are numerically stable can be awkward.
In this case I'd redefine f1 to work on the logarithms of numbers, e.g.:
from numpy import exp, log, log1p

def f1(x):
    # log of the denominator: log(x+1) + log(sqrt(x)) = log1p(x) + log(x)/2
    prod = log1p(x) + log(x) / 2
    return exp(-prod)
You might need to alter other parts of the code to work in log space as well, depending on what you need to do. Note that most statistics algorithms work with log-probabilities, because log space is much more compatible with how computers represent numbers.
f3 is a bit more work due to the subtraction.
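As a quick sanity check (using the rewritten f1 above), the log-space form reproduces the directly computed values and stays finite at tiny x:
print(f1(1e-6))   # ~999.999, matches the direct formula
print(f1(1e-20))  # ~1e10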

Why am I getting different bootstrap results using different algorithms?

I am using two different methods to generate a bootstrap sample:
np.random.seed(335)
y = np.random.normal(0, 1, 5)
b = np.empty(len(y))  # initializes an empty vector
for j in range(len(y)):
    a = np.random.randint(1, len(y))  # draws a random integer from 1 to n, where n is our sample size
    b[j] = y[a-1]  # indices in Python start at zero, the worst part of Python in my opinion
c = np.random.choice(y, size=5)
print(b)
print(c)
and for my output I get different results
[1.04749432 1.71963433 1.71963433 1.71963433 1.71963433]
[-0.25224454 -0.25224454 0.46604474 1.71963433 0.46604474]
I think the answer has something to do with the random number generator, but I'm confused as to the exact reason.
This comes down to the use of different algorithms for randomized selection. There are numerous equivalent ways to select items at random with replacement using a pseudorandom generator (or to generate random variates from any other distribution). In particular, the algorithm for numpy.random.choice need not make use of numpy.random.randint in theory. What matters is that these equivalent ways should produce the same distribution of random variates. In the case of NumPy, look at NumPy's source code.
Another, less important, reason for different results is that the two selection procedures (randint and choice) consume pseudorandom numbers from the underlying generator, and they didn't begin from the same point in its sequence (more precisely, from the same generator state). If we set the seed to the same value before beginning each procedure:
np.random.seed(335)
y = np.random.normal(0, 1, 5)
b = np.empty(len(y))
np.random.seed(999999)  # Seed selection procedure 1
for j in range(len(y)):
    a = np.random.randint(1, len(y))
    b[j] = y[a-1]
np.random.seed(999999)  # Seed selection procedure 2
c = np.random.choice(y, size=5)
print(b)
print(c)
then each procedure will begin with the same pseudorandom numbers. But even so, the two procedures may use different algorithms for random selection, and these differences may still lead to different results.
(However, numpy.random.* functions, such as randint and choice, have become legacy functions as of NumPy 1.17, and their algorithms are expected to remain as they are for backward compatibility reasons. That version didn't deprecate any numpy.random.* functions, however, so they are still available for the time being. See also this question. In newer applications you should make use of the new system introduced in version 1.17, including numpy.random.Generator, if you have that version or later. One advantage of the new system is that the application relies less on global state.)
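For example, a minimal sketch of the newer API (the seed here is arbitrary):
import numpy as np

rng = np.random.default_rng(335)                # a local Generator; no global state
y = rng.normal(0, 1, 5)
resample = rng.choice(y, size=5, replace=True)  # bootstrap resample with replacement
print(resample)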

Find the base of the power and power with given base in python

I have a number N, let's say 5451, and I want to find the base for power 50. How do I do it with Python?
The answer is 1.1877622648368 according to this website (General Root Calculator), but I need to find it with computation.
Second question: if I have the number N = 5451 and I know the base 1.1877622648368, how do I find the power?
Taking the n-th root is the same as raising to the power 1/n. This is very simple to do in Python using the ** operator (exponentiation):
>>> 5451**(1/50)
1.1877622648368031
The second one requires a logarithm, which is the inverse of exponentiation. You want to take the base-1.1877622648368031 logarithm of 5451, which is done with the log function in the math module:
>>> import math
>>> math.log(5451, 1.1877622648368031)
50.00000000000001
As you can see, there's some roundoff error. This is unavoidable when working with limited-precision floating point numbers. Apply the round() function to the result if you know that it must be an integer.
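For example:
>>> round(math.log(5451, 1.1877622648368031))
50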

Does Python have a function to mimic the sequence of C's rand()?

I am looking for a Python function that will mimic the behavior of rand() (and srand()) in C with the following requirements:
I can provide the same epoch time into the Python equivalent of srand() to seed the function.
The equivalent of rand() % 256 should result in the same char value as in C if both were provided the same seed.
So far, I have considered both the random library and NumPy's random library. Instead of providing a random integer from 0 to 32767 as C does, though, both yield a floating-point number from 0 to 1 from their random() functions. When attempting random.randint(0, 32767), I got different results than from my C function.
TL;DR
Is there an existing function/library in Python that follows the same random sequence as C?
You can accomplish this with CDLL from the ctypes module (https://docs.python.org/3/library/ctypes.html):
from ctypes import CDLL

libc = CDLL("libc.so.6")    # load the C standard library (Linux; the name differs on other platforms)
libc.srand(42)              # seed C's rand()
print(libc.rand() % 32768)  # exactly the C sequence, because it is C's rand()
You can't make a Python version of rand and srand functions "follo[w] the same random sequence" of C's rand and srand because the C standard doesn't specify exactly what that sequence is, even if the seed is given. Notably:
rand uses an unspecified pseudorandom number algorithm, and that algorithm can differ between C implementations, including versions of the same standard library.
rand returns values no greater than RAND_MAX, and RAND_MAX can differ between C implementations.
In general, the best way to "sync" PRNGs between two programs in different languages is to implement the same PRNG in both languages. This may be viable assuming you're not using pseudorandom numbers for information security purposes. See also this question: How to sync a PRNG between C#/Unity and Python?
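As a rough sketch of that approach, here is a simple Lehmer-style LCG (illustrative constants, not glibc's rand); a C port using the same constants and seed would produce the same sequence:
class PortableLCG:
    def __init__(self, seed):
        self.state = seed % 2147483647 or 1       # state must stay in 1..2**31-2
    def next(self):
        self.state = (self.state * 48271) % 2147483647
        return self.state

rng = PortableLCG(42)
print([rng.next() % 256 for _ in range(5)])       # identical bytes from an identical C implementation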
You can use random.seed(). The following example should print the same sequence every time it runs:
import random
random.seed(42)
for _ in range(10):
    print(random.randint(0, 32768))
Update: Just saw the last comment by the OP. No, this code won't give you the same sequence as the C code, for the reasons given in the other answers. Two different C implementations won't agree with each other either.

What is the default accuracy for the mpmath function hyperu?

Computation of the Tricomi confluent hypergeometric function can be ill-conditioned when it uses the sum of two 1F1 functions, as they can be nearly equal in size but opposite in sign. The mpmath function "hyperu" uses arbitrary precision internally and produces a result with 35 significant figures in default mode. How many of these digits are reliable? Does it depend on the parameters passed?
import mpmath
x = mpmath.hyperu(a, b + 1, u)
I have just received an email from the main author of mpmath, Fredrik Johansson, confirming that the full 35 digits are usable. He writes "hyperu uses adaptive higher precision internally, so the result should nearly always be accurate to the full precision set by the user".
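One practical way to check this for your own parameters (the values below are placeholders for illustration) is to recompute at higher precision and compare:
import mpmath

a, b, u = 1.5, 2.5, 0.25            # placeholder parameter values
mpmath.mp.dps = 35                  # ask for 35 significant digits
x35 = mpmath.hyperu(a, b + 1, u)
mpmath.mp.dps = 50                  # recompute with extra guard digits
x50 = mpmath.hyperu(a, b + 1, u)
print(x35)
print(x50)                          # the leading 35 digits should agree if x35 is fully accurate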
