Computation of the Tricomi confluent hypergeometric function can be ill-conditioned when it uses the sum of two 1F1 functions, as they can be nearly equal in size but opposite in sign. The mpmath function "hyperu" uses arbitrary precision internally and produces a result with 35 significant figures in default mode. How many of these digits are reliable? Does it depend on the parameters passed?
import mpmath
x = mpmath.hyperu(a, b + 1, u)
I have just received an email from the main author of mpmath, Fredrik Johansson, confirming that the full 35 digits are usable. He writes "hyperu uses adaptive higher precision internally, so the result should nearly always be accurate to the full precision set by the user".
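For reference, a minimal sketch of how hyperu might be called at an explicit working precision (the parameter values here are made up purely for illustration):
import mpmath

mpmath.mp.dps = 35              # work with 35 significant decimal digits
a, b, u = 1.5, 2.0, 0.25        # hypothetical parameters, for illustration only
x = mpmath.hyperu(a, b + 1, u)
print(x)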
I'm looking for a way to disable this:
print(0+1e-20) returns 1e-20, but print(1+1e-20) returns 1.0
I want it to return something like 1+1e-20.
I need it because of this problem:
from numpy import sqrt
def f1(x):
    return 1/((x+1)*sqrt(x))

def f2(x):
    return 1/((x+2)*sqrt(x+1))

def f3(x):
    return f2(x-1)
print(f1(1e-6))
print(f3(1e-6))
print(f1(1e-20))
print(f3(1e-20))
returns
999.9990000010001
999.998999986622
10000000000.0
main.py:10: RuntimeWarning: divide by zero encountered in double_scalars
return 1/((x+2)*sqrt(x+1))
inf
f1 is the original function, f2 is f1 shifted by 1 to the left, and f3 is f2 moved back by 1 to the right. By this logic, f1 and f3 should be equal to each other, but this is not the case.
I know about decimal.Decimal and it doesn't work, because the decimal module doesn't support some functions, including sin. If you could somehow make Decimal work for all functions, I'd like to know how.
Can't be done. There is no rounding to undo here; that would imply an exact result ever existed, which it did not. Floating-point numbers have limited precision. The easiest way to envision this is an analogy with art and computer screens. Imagine someone making a fabulously detailed painting, and all you have is a 1024x768 screen to view it through. If a microscopic dot is added to the painting, the image on the screen might not change at all. Maybe you need a 4K screen instead.
In Python, the closest representable number after 1.0 is 1.0000000000000002 (*), according to math.nextafter(1.0, math.inf) (Python 3.9+ is required for math.nextafter). 1e-20 and 1 are too different in magnitude, so the result of their addition cannot be represented by a Python floating-point number, which is precise to only about 16 significant digits.
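To see this concretely (assuming Python 3.9+ for math.nextafter):
import math

# The closest representable float above 1.0:
print(math.nextafter(1.0, math.inf))   # 1.0000000000000002

# 1e-20 is far smaller than the gap between 1.0 and its neighbour,
# so the addition rounds straight back to 1.0:
print(1 + 1e-20 == 1.0)                # True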
See Is floating point math broken? for an in-depth explanation of the cause.
As this answer suggests, there are libraries like mpmath that implement arbitrary-precision arithmetic:
from mpmath import mp, mpf
mp.dps = 25 # set precision to 25 decimal digits
mp.sin(1)
# => mpf('0.8414709848078965066525023183')
mp.sin(1 + mpf('1e-20')) # mpf is a constructor for mpmath floats
# => mpf('0.8414709848078965066579053457')
mpmath floats are sticky; if you add an int and an mpf you get an mpf, so I did not have to write mp.mpf(1). The result is still not exact, but you can select whatever precision is sufficient for your needs. Also note that the difference between these two results is, again, too small to be representable by Python's floating-point numbers, so if the difference is meaningful to you, you have to keep it in mpmath land:
float(mpf('0.8414709848078965066525023183')) == float(mpf('0.8414709848078965066579053457'))
# => True
(*) This is actually a white lie. The next number after 1.0 is 0x1.0000000000001p0 in hexadecimal float notation, or 0b1.0000000000000000000000000000000000000000000000000001 in binary (that is, 1 + 2**-52), but Python doesn't accept hexadecimal or binary float literals. 1.0000000000000002 is Python's approximation of that number for your decimal convenience.
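Python will, however, happily parse the hexadecimal form through float.fromhex, just not as a literal:
>>> float.fromhex('0x1.0000000000001p0')
1.0000000000000002
>>> (1.0000000000000002).hex()
'0x1.0000000000001p+0'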
As others have stated, in general this can't be done (due to how computers commonly represent numbers).
It's common to work with the precision you've got; ensuring that algorithms are numerically stable can be awkward.
In this case I'd redefine f1 to work on the logarithms of numbers, e.g.:
from numpy import exp, log, log1p

def f1(x):
    prod = log1p(x) + log(x) / 2
    return exp(-prod)
You might need to alter other parts of the code to work in log space as well depending on what you need to do. Note that most stats algorithms work with log-probabilities because it's much more compatible with how computers represent numbers.
f3 is a bit more work due to the subtraction.
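A sketch of what this could look like for these particular functions (the simplification of f3 back to f1's form is done by hand here, and is my own suggestion rather than a general recipe):
from numpy import exp, log, log1p, sqrt

def f1_log(x):
    # log-space version of f1(x) = 1/((x+1)*sqrt(x))
    return exp(-(log1p(x) + log(x) / 2))

def f3_simplified(x):
    # f3(x) = f2(x - 1) = 1/(((x - 1) + 2) * sqrt((x - 1) + 1))
    # simplifies symbolically to 1/((x + 1) * sqrt(x)), so the
    # cancellation-prone (x - 1) + 1 never has to be evaluated.
    return 1 / ((x + 1) * sqrt(x))

print(f1_log(1e-20))         # about 1e10, no spurious warning
print(f3_simplified(1e-20))  # matches f1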
I have a number N, let's say 5451, and I want to find the base for the power 50. How do I do it with Python?
The answer is 1.1877622648368 according to this website (General Root calculator), but I need to find it by computation.
Second question: if I have the number N = 5451 and I know the base 1.1877622648368, how do I find the power?
Taking the n-th root is the same as raising to the power 1/n. This is very simple to do in Python using the ** operator (exponentiation):
>>> 5451**(1/50)
1.1877622648368031
The second one requires a logarithm, which is the inverse of exponentiation. You want to take the base-1.1877622648368031 logarithm of 5451, which is done with the log function in the math module:
>>> import math
>>> math.log(5451, 1.1877622648368031)
50.00000000000001
As you can see, there's some roundoff error. This is unavoidable when working with limited-precision floating point numbers. Apply the round() function to the result if you know that it must be an integer.
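For instance, rounding recovers the exact exponent in the session above:
>>> round(math.log(5451, 1.1877622648368031))
50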
I'm trying to implement Haversine's formula to determine whether a given location's latitude and longitude is within a specified radius. I used a formula detailed here:
Calculate distance between two latitude-longitude points? (Haversine formula)
I am experiencing a math domain error with the following reproducible input. It does not happen all the time, but often enough to make me think I've written incorrect code:
from math import atan2, sqrt, sin, cos
# All long / lat values are in radians and of type float
centerLongitude = -0.0391412861306467
centerLatitude = 0.9334153362515779
inputLatitudeValue = -0.6096173085842176
inputLongitudeValue = 2.4190393564390438
longitudeDelta = inputLongitudeValue - centerLongitude # 2.4581806425696904
latitudeDelta = inputLatitudeValue - centerLatitude # -1.5430326448357956
a = (sin(latitudeDelta / 2) ** 2 + cos(centerLatitude) * cos(centerLongitude)
     * sin(longitudeDelta / 2) ** 2)
# a = 1.0139858858386017
c = 2 * atan2(sqrt(a), sqrt(1 - a)) # Error occurs on this line
# Check whether distance is within our specified radius below
You cannot use sqrt on negative numbers:
>>> sqrt(-1)
ValueError: math domain error
use cmath.sqrt:
>>> import cmath
>>> cmath.sqrt(-1)
1j
in your case:
>>> a = 1.0139858858386017
>>> sqrt(1-a)
ValueError: math domain error
Speaking generally, and said simply... the variable a must be protected. It must never be greater than 1.0, and it must never be less than 0.0, and normally, with clean data, it should be properly in this range.
The problem is with how the common floating-point arithmetic implementation does its approximation and rounding, and how those results can sometimes be out of range or out of domain for a built-in math function. It is not particularly common for a combination of operations to produce a number that is out of domain for a following math function, but when it does occur, it is consistently reproducible (depending on the algorithm) and needs to be accounted for and mitigated, beyond normal theoretical intuition about the algorithm. What we code in computers is subject to how theoretical concepts have been implemented in the software and hardware. On paper, with pencil, we still need guidelines for when and how to round floating-point results. Computer implementations are no different, but sometimes we are blissfully unaware of such things going on under the hood. Popular implementations not only need to know at what precision to round, but also have to decide how and when to approximate and round in the conversion to and from the binary representations that are actually calculated at the machine level.
Regarding the haversine formula, which a am I speaking about? The one in this version of the code (shown for reference):
import math
a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
In this above example, the a is not properly protected. It is a lurking problem waiting to crash the c calculation on the next line under certain conditions that the haversine formula will occasionally encounter.
If some latlon combination of data results in
a = 1.00000000000000000000000000000000000001
the following c calculation will cause an error.
If some latlon combination of data results in
a = -0.00000000000000000000000000000000000000000000001
the following c calculation will cause an error.
It is the floating-point implementation of your language/platform, and its rounding and approximation behaviour, that can cause this rare but real and consistently repeatable, ever-so-slightly out-of-domain condition, which in turn causes the error in an unprotected haversine implementation.
Years ago I did a three-day brute-force test of relative angular distances between 0 and 1, and between 179 and 180, with VERY small step values. The radius of the sphere was one, a unit sphere, so that radius values were irrelevant. Of course, finding approximate distances on the surface of the sphere in any unit other than angular distance would require including the radius of the sphere in those units, but I was testing the haversine logic itself, and a radius of 1 eliminated a complication. Relative angular distances of 0 to 1, or 179 to 180, are the conditions where haversine can have difficulties when implemented with popular floating-point arithmetic (FPA) implementations that convert to and from binary at a low system level, if the a is not protected. Haversine is theoretically supposed to be well conditioned for small angular distances, but a machine or software implementation of FPA is not always precisely cooperative with the ideals of spherical geometry theory. After three days of brute force, the test had logged thousands of lat/lon combinations that crash the unprotected, popularly posted haversine formula, because the a was not protected. You must protect the a. If it goes above 1.0 or below 0.0, even by the very slightest bit, all you need to do is test for that condition and nudge it back into range. Simple.
I protected an a that came out very slightly negative (in other words, if a < 0.0) by reassigning the value 0.0 to it, and I also set a flag that I could inspect later so I would know that a protective action had been necessary for that data.
I protected an a that came out very slightly above 1.0 (in other words, if a > 1.0) by reassigning the value 1.0 to it, again setting a flag so I would know that a protective action had been necessary.
It is a simple two extra lines before the c calculation; you could even squeeze it all onto one line. The protective line(s) of code go after the a calculation and before the c calculation, as in the sketch below. Protect your a.
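A minimal, runnable sketch of that guard, using the inputs from the question above and the reference formula (the flag name is my own addition):
import math

# Inputs taken from the question above (radians)
lat1, lat2 = 0.9334153362515779, -0.6096173085842176
dlat = lat2 - lat1
dlon = 2.4581806425696904

a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2

# "Protect the a": nudge any slightly out-of-range value back into [0.0, 1.0]
a_was_clamped = False
if a < 0.0:
    a, a_was_clamped = 0.0, True
elif a > 1.0:
    a, a_was_clamped = 1.0, True

c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
print(c, a_was_clamped)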
Are you then losing precision with those slight nudges? No more than what the floating-point math with that data is already introducing with its approximations and rounding. It crashed with data that should not have crashed the pure theoretical algorithm, one that doesn't have these rare FPA issues. Simply protect the a, and this should mitigate those errors with haversine in this form. There are alternatives to haversine, but haversine is completely suitable for many uses if one understands where it is well suited. I use it for sky-sphere calculations, where the ellipsoid shape of the earth has nothing to do with anything. Haversine is simple and fast. But remember to protect your a.
After being unsuccessful in using decorators to define the stochastic object of the "logarithm of an exponential random variable", I decided to write the code for this new distribution manually using pymc.stochastic_from_dist. The model that I am trying to implement is available here (the first model):
Now, when I try to sample log(alpha) using MCMC Metropolis with a Normal distribution as the proposal (as stated in the reference as the sampling method), I get the following error:
File "/Library/Python/2.7/site-packages/pymc/distributions.py", line 980, in rdirichlet
return (gammas[0]/gammas[0].sum())[:-1]
FloatingPointError: invalid value encountered in divide
On the occasions when the sampling does not run into this error, the sampling histograms match the ones in this paper. My hierarchical model is:
"""
A Hierarchical Bayesian Model for Bags of Marbles
logalpha ~ logarithm of an exponential distribution with parameter lambd
beta ~ Dirichlet([black and white ball proportions]:vector of 1's)
theta ~ Dirichlet(alpha*beta(vector))
"""
import numpy as np
import pymc
from scipy.stats import expon
lambd=1.
__all__=['alpha','beta','theta','logalpha']
#------------------------------------------------------------
# Set up pyMC model: logExponential
# 1 parameter: (alpha)
def logExp_like(x,explambda):
    """log-likelihood for logExponential"""
    return -lambd*np.exp(x)+x

def rlogexp(explambda, size=None):
    """random variable from logExponential"""
    sample=np.random.exponential(explambda,size)
    logSample=np.log(sample)
    return logSample

logExponential=pymc.stochastic_from_dist('logExponential',logp=logExp_like,
                                         random=rlogexp,
                                         dtype=np.float,
                                         mv=False)
#------------------------------------------------------------
#Defining model parameteres alpha and beta.
beta=pymc.Dirichlet('beta',theta=[1,1])
logalpha=logExponential('logalpha',lambd)
@pymc.deterministic(plot=False)
def multipar(a=logalpha,b=beta):
    out=np.empty(2)
    out[0]=(np.exp(a)*b)
    out[1]=(np.exp(a)*(1-b))
    return out
theta=pymc.Dirichlet('theta',theta=multipar)
And my test sampling code is:
from pymc import Metropolis
from pymc import MCMC
from matplotlib import pyplot as plt
import HBM
import numpy as np
import pymc
import scipy
M=MCMC(HBM)
M.use_step_method(Metropolis,HBM.logalpha, proposal_sd=1.,proposal_distribution='Normal')
M.sample(iter=1000,burn=200)
When I check the values of theta passed to the gamma distribution in line 978 of distributions.py, I see that they are not zero, just very small. So how can I prevent this floating point error?
I found this nugget in their documentation:
The stochastic variable cutoff cannot be smaller than the largest element of D, otherwise D’s density would be zero. The standard Metropolis step method can handle this case without problems; it will propose illegal values occasionally, but these will be rejected.
This leads me to believe that dtype=np.float (which essentially has the same range as float) may not be the way you want to go. The documentation says it needs to be a numpy dtype, but really it just needs to be a type that converts to a numpy dtype object, and in Python 2 (correct me if I'm wrong) the numeric dtypes were fixed-size types, so they amount to the same thing. Maybe utilizing the Decimal module would be an option. That way you can set the level of precision to cover the expected value ranges, and pass it to your extended stochastic method, where it would be converted.
from decimal import Decimal, getcontext
getcontext().prec = 15
dtype=Decimal
I don't know whether this would still be truncated once the numpy library got hold of it, or whether it would respect the inherited level of precision. I have no accurate way of testing this, but give it a try and let me know how it works for you.
Edit: I tested the notion of precision inheritance and it would seem to hold:
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 10
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571429')
>>> np.float(Decimal(1) / Decimal(7))
0.1428571429
>>> getcontext().prec = 15
>>> np.float(Decimal(1) / Decimal(7))
0.142857142857143
>>>
If you do get small numbers, they might simply be too small for a float; this is typically also what the logarithms are there to avoid. What if you use dtype=np.float64?
As you have suggested at the end of your question, the issue is that numbers too small for a float are being cast to 0. One solution could be to tweak the source code a little: replace the division with, for example, np.divide, and in its "where" condition clip values below a given threshold.
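A sketch of that kind of guarded division (the array values, threshold, and fallback here are made up; it would have to be adapted to the actual code in pymc's distributions.py):
import numpy as np

gammas = np.array([[1e-310, 2e-310],   # a row whose sum is effectively zero
                   [0.5, 0.5]])
totals = gammas.sum(axis=1, keepdims=True)

# Only divide where the denominator is safely above a threshold;
# elsewhere keep the value already in `out` (zeros here).
eps = 1e-300
out = np.divide(gammas, totals, out=np.zeros_like(gammas), where=totals > eps)
print(out)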
I am wondering what the "precision" of Python's random function means. It is described here:
Almost all module functions depend on the basic function random(),
which generates a random float uniformly in the semi-open range [0.0,
1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1.
(http://docs.python.org/library/random.html?highlight=mersenne%20twister, accessed 20120727)
What interests me is that I can generate very large random integers (long integers) that appear to have considerably more than 53 bits of precision. For instance (using IPython):
In [1]: from math import factorial as F
In [2]: from random import randint as R
In [3]: R(1, F(900))
Out[3]: 55655511302846458744179265243566263049348396362730789786376014445325896599604354914431619960209388364677180234108513221468671377813842671874148746886513973171423907294544220953849330089822288697383171078250181973489187774341795574648920075697792011317798969959919449394758519496792725695600701199089972009688412593325291810024048811890509220571436407156566269358600296506017343255050788936280200352509087073097532486502694101150248815092174847010359868156616901409331336760344351058867833528749797221612169430654334458578364850198977511061993233818849689759090377347376020160658362459773356292085856906573553086825560047089834757501023094429371408722563891227474029563545206865055657504766128286451181119906678062837368414582707728324415466848186858173236300969443478496634754744888060794778485246692104851885847515244146665974598354436781340057667983223238998674622833320199904840957000014767293658171874973067958145430346745707636676061629278168015549755791407108399231392952706279787486238512258804098030513575025870504347283221015756832157863142353915612138589145084128778032995695113870365505775392647256056048691602676699581153972467494111720212363912926352356346807790816796784781384561736415741104584667536002819103176714157723039428367564698686945824679882523439229215035996634289075127375256728472056511244548311771570743103809147045947583819651257115044154025329883682429231394004470689760531056853018427649916035935302356382633012319775473728455377657692268855776796385819792347680100513177355101630543290088996770992548670273727988974570199179655691444984337837105283447276788151912408533352627494948390016029881755603243934955207024221452181883522004648595373130617729041347013155205217774450836687880723915563507108222768637840614647145898936109917167237397888104669458661404234553707323638883064861414284282190898741067404128885188113697448726481104763682489126524054241797759521120664366845719767486252884585742737830119890190213053751046461419643379561983590174574185268661318409035375114305279020423595250660644954841798619767985549553380200803904976806468796334648515423467654573415304912570341635682203261002606581817207689816015969520503052648773609840260050676394927780076948629298559638703440007364834579712680931643829764810072128419905903786966L
I am wondering in what sense 53 bits is the precision limit of the random function. And concretely, if I ask Python to return pseudo-random numbers between 1 and some very large upper bound, is it not true that all integers in that range have an equal likelihood of being returned?
What Python does for random() is call the 32-bit core of MT19937 twice and combine 27 + 26 of the resulting bits into a float in [0.0, 1.0), so that result is constrained to 53 bits of precision (the limit of the double-precision floating-point format).
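For reference, a small Python sketch of that construction (the genrand32 stand-in here just pulls 32 bits from the module-level generator):
import random

def random_53bit_float(genrand32=lambda: random.getrandbits(32)):
    # CPython combines the top 27 + 26 bits of two 32-bit outputs:
    a = genrand32() >> 5           # 27 bits
    b = genrand32() >> 6           # 26 bits
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)  # i.e. / 2**53

print(random_53bit_float())        # a float in [0.0, 1.0) built from 53 random bits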
Python random.py is capable of pulling bits from /dev/urandom or Windows CryptGenRandom (if supported), if you use the random.SystemRandom class. Otherwise, it generates larger numbers by pulling successive bits from repeated calls to MT19937.
53 bits is not an inherent limit in the generator, it is the amount that Python returns when you request a float, since floats have 53 bits of precision.
You can get random integers directly using stuff like random.getrandbits.
In more detail, the Mersenne Twister used in CPython generates 32 bits at a time. The module code calls this twice and combines the results to generate a 53-bit float. If you call getrandbits(k), it will call the internal function as many times as necessary to generate k bits. The code for this can be found here.
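For example:
import random

# random() packs 53 bits into a float, but the generator itself can
# supply as many bits as you ask for:
print(random.getrandbits(200))     # a 200-bit random integer
print(random.randint(1, 10**100))  # built internally from enough random bits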