I calculate the first derivative using the following code:
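import numpy as np

h = 1e-5  # step size, read as a global by dfdx (this value is only an example)

def f(x):
    f = np.exp(x)
    return f

def dfdx(x):
    # central difference approximation of f'(x)
    Df = (f(x+h)-f(x-h)) / (2*h)
    return Df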
For example, for x == 10 this works fine. But when I set h to around 10E-14 or below, Df starts
to take values that are far away from the expected value f(10) (the derivative of exp is exp itself), and the relative error between the expected value and Df becomes huge.
Why is that? What is happening here?
The evaluation of f(x) has, at best, a rounding error of |f(x)|*mu where mu is the machine constant of the floating point type. The total error of the central difference formula is thus approximately
2*|f(x)|*mu/(2*h) + |f'''(x)|/6 * h^2
In the present case, the exponential function is equal to all of its derivatives, so that the error is proportional to
mu/h + h^2/6
which has a minimum at h = (3*mu)^(1/3), which for the double format with mu=1e-16 is around h=1e-5.
The precision is increased if instead of 2*h the actual difference (x+h)-(x-h) between the evaluation points is used in the denominator. This can be seen in the following loglog plot of the distance to the exact derivative.
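The trade-off between the two error terms can be checked numerically. The following is a minimal sketch rather than part of the original answer: the helper name central_diff and the list of step sizes are chosen for illustration, and np.finfo(float).eps (about 2.2e-16) stands in for mu.

```python
import numpy as np

def central_diff(f, x, h):
    """Central difference approximation of f'(x) with step h."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 10.0
exact = np.exp(x)  # exp is its own derivative

# Step sizes 1e-1, 1e-2, ..., 1e-14
steps = [10.0 ** (-k) for k in range(1, 15)]
rel_errors = [abs(central_diff(np.exp, x, h) - exact) / exact for h in steps]

for h, err in zip(steps, rel_errors):
    print(f"h = {h:.0e}   relative error = {err:.3e}")

# Minimizer of mu/h + h^2/6 from the estimate above
h_opt = (3.0 * np.finfo(float).eps) ** (1.0 / 3.0)
print(f"estimated optimal h = {h_opt:.2e}")
```

The printed relative error should shrink as h decreases towards roughly 1e-5 and grow again for smaller h, where the rounding term mu/h dominates.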
You are probably encountering some numerical instability, as for x = 10 and h =~ 1E-13, the argument for np.exp is very close to 10 whether h is added or subtracted, so small approximation errors in the value of np.exp are scaled significantly by the division with the very small 2 * h.
In addition to the answer by @LutzL, I will add some information from a great book, Numerical Recipes 3rd Edition: The Art of Scientific Computing, chapter 5.7 on numerical derivatives, especially about the choice of the optimal h value for a given x:
Always choose h so that x + h and x differ by an exactly representable number. Funny stuff like 1/3 should be avoided, except when x is equal to something along the lines of 14.3333333.
Round-off error is approximately epsilon * |f(x) / h|, where epsilon is the floating point accuracy; Python represents floating point numbers in double precision, so it is about 1e-16. It may be larger for more complicated functions (where precision errors accumulate further), though that is not your case.
Choice of optimal h: without going into details, it would be sqrt(epsilon) * x for the simple forward case, except when your x is near zero (you will find more information in the book), which is your case. You may want to use higher x values in such cases; a complementary answer is already provided. In the case of f(x+h) - f(x-h) as in your example it would amount to epsilon ** (1/3) * x, so approximately 5e-6 times x, a choice which might be a little difficult in the case of small values like yours. This is quite close (if one can say so bearing in mind floating point arithmetic...) to the practical results posted by @LutzL, though.
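As a rough illustration of the two rules above (an exactly representable step, and a step of about epsilon ** (1/3) * x), here is a small sketch. The function names representable_step and central_diff are made up for this example, and np.finfo(float).eps is used for epsilon.

```python
import numpy as np

def representable_step(x, h):
    """Adjust h so that (x + h) - x is exactly representable."""
    temp = x + h
    return temp - x

def central_diff(f, x, h):
    """Central difference that divides by the actual distance between the points."""
    x_plus = x + h
    x_minus = x - h
    return (f(x_plus) - f(x_minus)) / (x_plus - x_minus)

eps = np.finfo(float).eps
x = 10.0

# Step suggested for the symmetric formula, then made exactly representable
h = representable_step(x, eps ** (1.0 / 3.0) * x)

approx = central_diff(np.exp, x, h)
rel_err = abs(approx - np.exp(x)) / np.exp(x)
print(f"h = {h:.3e}, relative error = {rel_err:.3e}")
```

Dividing by x_plus - x_minus instead of 2*h follows the suggestion in the earlier answer to use the actual difference between the evaluation points.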
You may use derivative formulas other than the symmetric one you are using. You may want to use the forward or backward difference if the function is costly to evaluate and you have already calculated f(x). If your function is cheap to evaluate, you may want to evaluate it more times using a higher order method to make the error smaller (see the five-point stencil on Wikipedia, as linked in the comment to your question).
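For reference, the five-point stencil mentioned above can be sketched as follows. This is only an illustration; the function names and the step h = 1e-3 are assumptions picked for the example, not values from the answer.

```python
import numpy as np

def central_diff(f, x, h):
    """Three-point central difference, truncation error of order h^2."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def five_point_diff(f, x, h):
    """Five-point stencil for f'(x), truncation error of order h^4."""
    return (-f(x + 2 * h) + 8 * f(x + h) - 8 * f(x - h) + f(x - 2 * h)) / (12.0 * h)

x = 10.0
h = 1e-3
exact = np.exp(x)

err_central = abs(central_diff(np.exp, x, h) - exact) / exact
err_five = abs(five_point_diff(np.exp, x, h) - exact) / exact
print(f"central difference: {err_central:.3e}")
print(f"five-point stencil: {err_five:.3e}")
```

Because the truncation error falls off as h^4, the higher order formula can use a larger h, which in turn keeps the rounding term small.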
This Python tutorial explains the reason behind the limited precision. In summary, decimals are ultimately represented in binary and the precision is about 15 to 17 significant decimal digits. So, you are right that it gets fuzzy beyond 10E-14.
Entailed by the fundamental theorem of algebra is the existence of n complex roots for the formula z^n=a where a is a real number, n is a positive integer, and z is a complex number. Some roots will also be real in addition to complex (i.e. a+bi where b=0).
One example where there are multiple real roots is z^2=1 where we obtain z = ±sqrt(1) = ± 1. The solution z = 1 is immediate. The solution z = -1 is obtained by z = sqrt(1) = sqrt(-1 * -1) = I * I = -1, where I is the imaginary unit.
In Python/NumPy (as well as many other programming languages and packages) only a single value is returned. Here are two examples for 5^{1/3}, which has 3 roots.
>>> 5 ** (1 / 3)
1.7099759466766968
>>> import numpy as np
>>> np.power(5, 1/3)
1.7099759466766968
It is not a problem for my use case that only one of the possible roots is returned, but it would be informative to know 'which' root is systematically calculated in the contexts of Python and NumPy. Perhaps there is an (ISO) standard stating which root should be returned, or perhaps there is a commonly-used algorithm that happens to return a specific root. I've imagined a rule such as "the maximum of the real-valued solutions", but I do not know.
Question: When I take an nth root in Python and NumPy, which of the n existing roots do I actually get?
Since typically the identity xᵃ = exp(a⋅log(x)) is used to define the general power, you'll get the root corresponding to the chosen branch cut of the complex logarithm.
With regards to this, the numpy documentation says:
For real-valued input data types, log always returns real output. For each value that cannot be expressed as a real number or infinity, it yields nan and sets the invalid floating point error flag.
For complex-valued input, log is a complex analytical function that has a branch cut [-inf, 0] and is continuous from above on it. log handles the floating-point negative zero as an infinitesimal negative number, conforming to the C99 standard.
So for example, np.power(-1 +0j, 1/3) = 0.5 + 0.866j = np.exp(np.log(-1+0j)/3).
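To see the principal root next to the other roots, here is a short sketch. The helper all_nth_roots is hypothetical (it is not a NumPy function); it just enumerates the n roots from the polar form.

```python
import cmath
import numpy as np

def all_nth_roots(a, n):
    """Return all n complex n-th roots of a, starting with the principal one."""
    r, theta = cmath.polar(complex(a))
    return [cmath.rect(r ** (1.0 / n), (theta + 2.0 * cmath.pi * k) / n)
            for k in range(n)]

# What NumPy returns for a complex input, and the exp/log identity
principal = np.power(-1 + 0j, 1 / 3)
via_log = np.exp(np.log(-1 + 0j) / 3)

roots = all_nth_roots(-1, 3)
for z in roots:
    print(z, "cubed gives", z ** 3)

# For a positive real base, the principal root is the positive real root
real_root = 5 ** (1 / 3)
```

The first entry of roots matches the value returned by np.power; the other two are the same root rotated by multiples of 2π/3.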
I am trying to implement Gensim's most_similar function by hand but calculate the similarity between the query word and just one other word (avoiding the time to calculate it for the query word with all other words). So far I use
cossim = (np.dot(a, b)
          / np.linalg.norm(a)
          / np.linalg.norm(b))
and this is the same as the similarity result between a and b. I find this works almost exactly but that some precision is lost, for example
from gensim.models.word2vec import Word2Vec
import gensim.downloader as api
model_gigaword = api.load("glove-wiki-gigaword-300")
a = 'france'
b = 'chirac'
cossim1 = model_gigaword.most_similar(a)
import numpy as np
cossim2 = (np.dot(model_gigaword[a], model_gigaword[b])
           / np.linalg.norm(model_gigaword[a])
           / np.linalg.norm(model_gigaword[b]))
print(cossim1)
print(cossim2)
Output:
[('french', 0.7344760894775391), ('paris', 0.6580672264099121), ('belgium', 0.620672345161438), ('spain', 0.573593258857727), ('italy', 0.5643460154533386), ('germany', 0.5567398071289062), ('prohertrib', 0.5564222931861877), ('britain', 0.5553334355354309), ('chirac', 0.5362644195556641), ('switzerland', 0.5320892333984375)]
0.53626436
So the most_similar function gives 0.53626441955... (rounds to 0.53626442) and the calculation with numpy gives 0.53626436. Similarly, you can see differences between the values for 'paris' and 'italy' (in similarity compared to 'france'). These differences suggest that the calculation is not being done to full precision (but it is in Gensim). How can I fix it and get the output for a single similarity to higher precision, exactly as it comes from most_similar?
TL/DR - I want to use function('france', 'chirac') and get 0.5362644195556641, not 0.53626436.
Any idea what's going on?
UPDATE: I should clarify, I want to know and replicate how most_similar does the computation, but for only one (a,b) pair. That's my priority, rather than finding out how to improve the precision of my cossim calculation above. I just assumed the two were equivalent.
To increase accuracy you can try the following:
# note: 'float128' is an extended-precision type that is not available on every platform
a = np.array(model_gigaword[a]).astype('float128')
b = np.array(model_gigaword[b]).astype('float128')
cossim = (np.dot(a, b)
          / np.linalg.norm(a)
          / np.linalg.norm(b))
The vectors are likely to use lower-precision floats, and hence there is a loss of precision in the calculations.
However, the results I got are somewhat different to what model_gigaword.most_similar offers for you:
model_gigaword.similarity: 0.5362644
float64: 0.5362644263010196
float128: 0.53626442630101950744
You may want to check what you get on your machine and with your version of Python and gensim.
Because floating-point numbers (like the np.float32-typed values in these vector models) are represented using an imprecise binary approximation, none of the numbers you're working with, or displaying, are the exact decimal numbers you think they are.
The number you're seeing as 0.53626436 isn't exactly that - but some binary floating-point number very close to that number. Similarly, the number you're seeing as 0.5362644195556641 isn't exactly that - but some other binary floating-point number, very close to that.
Further, these tiny imprecisions can mean that mathematical expressions that should under ideal circumstances give identical results to each other, no matter the order-of-evaluation, instead give slightly different results for different orders-of-evaluation. For example, we know that mathematically, a * (b + c) is always equal to ab + ac. However, if a, b, & c are floating-point numbers with limited precision, the results of doing the addition then multiplication, versus doing two multiplications then one addition, might vary - because the interim values would have been approximated slightly differently.
But: for nearly all domains in which these numbers are used, this tiny amount of noise shouldn't make any difference. The right policy is to ignore it, and write code that's robust to this small 'jitter' in extremely-low-significance digits - especially when printing or comparing results.
So really you should only be printing/comparing these numbers to a level of significance where they reliably agree, say, 4 digits after the decimal:
0.53626436
0.5362644195556641
(In fact, your output already makes it look like you may have changed the default level of display-precision in numpy or python, because it wouldn't be typical for the results of most_similar() to display with those 16 digits after the decimal.)
If you really, really wanted, as an exploration, to match the most_similar() results exactly, you could look at its source code. Then, perform the exact same steps, in the exact same order, using the exact same library routines, on your inputs.
(Here's the source for most_similar() in the current gensim-4.0.0beta prerelease: https://github.com/RaRe-Technologies/gensim/blob/4.0.0beta/gensim/models/keyedvectors.py#L690)
But: insisting on such exact correspondence is usually unwise, & creates more-fragile code, given the inherent imprecision in floating-point math.
See also: another answer covering some similar issues, which also points out a way to change the default displayed precision.
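To illustrate the order-of-evaluation point without needing the model, here is a small sketch on random float32 vectors. It is not gensim's actual code path; the two variants (dividing by the norms after the dot product, or normalizing the vectors first) are just two mathematically equivalent orders chosen as an assumption for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(300).astype(np.float32)
b = rng.standard_normal(300).astype(np.float32)

# Order 1: dot product first, then divide by both norms (all in float32)
direct = np.dot(a, b) / np.linalg.norm(a) / np.linalg.norm(b)

# Order 2: normalize each vector to unit length, then take the dot product
unit = np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))

# Reference value computed in float64
a64, b64 = a.astype(np.float64), b.astype(np.float64)
reference = np.dot(a64, b64) / (np.linalg.norm(a64) * np.linalg.norm(b64))

print(repr(direct), repr(unit), repr(reference))
print("agree within tolerance:", bool(np.isclose(direct, unit, atol=1e-5)))
```

The float32 results may or may not differ in their last digits, but both should agree with the float64 reference to well within the tolerance, which is the level at which such values are worth comparing.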
I'm trying to create a function in a network with trainable parameters. In my function I have an exponential that for large tensor values goes to infinity. What would the best way to avoid this be?
The function is as follows:
step1 = Pss-(k*Pvv)
step2 = step1*s
step3 = torch.exp(step2)
step4 = torch.log10(1+step3)
step5 = step4/s
#or equivalently
# train_curve = torch.log10(1+torch.exp((Pss-k*Pvv)*s))/s
If it makes it easier to understand, the basic function is log10(1+e^((x-const)*10))/10. The exponential inside the log gets too big and goes to inf.
I think I might have to normalize my tensor x, and this would mean normalizing the constants and the rest of the function also. Would someone have any thoughts on the best way to go about this?
Thanks so much.
One solution is to use a more stable computation. Notice that log(1 + exp(x)) is approximately equal to x when x is large enough. Intuitively, exp(50) is approximately 5.18e+21, so adding 1 has no effect in the 32-bit floating point arithmetic that PyTorch uses by default. Further verification with an arbitrary precision calculator shows that the error of this approximation at 50 is far below what 32-bit floating point can resolve (about 7 decimal digits).
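Using this information we can implement a simple piecewise function in PyTorch that uses log1p(exp(x)) for values less than 50 and x itself otherwise. The argument of exp is clamped so that the unused branch never overflows: torch.where evaluates both branches, and an inf in the discarded one would turn the gradient into NaN. With that guard the function is autograd compatible.
def log1pexp(x):
    # more stable version of log(1 + exp(x))
    safe_x = torch.clamp(x, max=50.0)  # keeps exp() finite in the branch that torch.where discards
    return torch.where(x < 50, torch.log1p(torch.exp(safe_x)), x)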
This gets us most of the way to a solution since you actually want to evaluate torch.log10(1+torch.exp((Pss-k*Pvv)*s))/s
Now we can use our new log1pexp function to compute this expression without worrying about infinities
(log1pexp((Pss - k*Pvv)*s) / math.log(10)) / s
and mind the conversion from natural log to log base-10 by dividing by log(10).
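The same idea can be checked quickly outside of PyTorch. The sketch below uses NumPy instead (an assumption made to keep the example dependency-light): np.logaddexp(0, x) evaluates log(exp(0) + exp(x)) = log(1 + exp(x)) without overflowing, and the helper name log10_1p_exp is made up for the example.

```python
import math
import numpy as np

def log10_1p_exp(x):
    """Stable log10(1 + exp(x)): natural-log softplus divided by ln(10)."""
    return np.logaddexp(0.0, x) / math.log(10)

x = np.array([-5.0, 0.0, 5.0, 50.0, 1000.0])

# Naive formula: exp(1000) overflows to inf in float64
with np.errstate(over="ignore"):
    naive = np.log10(1.0 + np.exp(x))

stable = log10_1p_exp(x)
print("naive: ", naive)
print("stable:", stable)
```

In PyTorch itself, torch.nn.functional.softplus(x, beta=s) computes log(1 + exp(beta * x)) / beta with a built-in shortcut for large inputs, so it may be worth trying in place of a hand-written piecewise function.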
Why is 1**Inf == 1 ?
I believe it should be NaN, just like Inf-Inf or Inf/Inf.
How is exponentiation implemented on floats in python?
exp(y*log(x)) would get correct result :/
You are right, mathematically, the value of 1^∞ is indeterminate.
However, Python doesn't follow the maths exactly in this case. The documentation for math.pow says:
math.pow(x, y)
Return x raised to the power y. Exceptional cases follow Annex ‘F’ of the C99 standard as far as possible. In particular, pow(1.0, x) and pow(x, 0.0) always return 1.0, even when x is a zero or a NaN.
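The documented special cases are easy to check, along with what the exp(y*log(x)) formula from the question would give instead. A small sketch (the variable names are only for illustration):

```python
import math

inf = float("inf")
nan = float("nan")

via_pow = math.pow(1.0, inf)       # special-cased by C99 Annex F
via_operator = 1.0 ** inf          # the ** operator behaves the same way for floats
nan_to_zero = math.pow(nan, 0.0)   # pow(x, 0.0) is 1.0 even for NaN

# The exp/log formula: log(1.0) is 0.0, and inf * 0.0 is NaN
via_exp_log = math.exp(inf * math.log(1.0))

print(via_pow, via_operator, nan_to_zero, via_exp_log)
```

So the exp(y*log(x)) route does produce NaN here, while pow and ** follow the C99 convention and return 1.0.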
Floating-point arithmetic is not real-number arithmetic. Notions of "correct" informed by real analysis do not necessarily apply to floating-point.
In this case, however, the trouble is just that pow fundamentally represents two similar but distinct functions:
Exponentiation with an integer power, which is naturally a function RxZ --> R (or RxN --> R).
The two-variable complex function given by pow(x,y) = exp(y * log(x)) restricted to the real line.
These functions agree for normal values, but differ in their edge cases at zero, infinity, and along the negative real axis (which is traditionally the branch cut for the second function).
These two functions are sometimes divided up to make the edge cases more reasonable; when that's done the first function is called pown and the second is called powr; as you have noticed pow is a conflation of the two functions, and uses the edge cases for these values that come from pown.
Technically 1^inf is defined as limit(1^x, x->inf). 1^x = 1 for any x >1, so it should be limit(1,x->inf) = 1, not NaN
The following code causes the print statements to be executed:
import numpy as np
import math
foo = np.array([1/math.sqrt(2), 1/math.sqrt(2)], dtype=np.complex_)
total = complex(0, 0)
one = complex(1, 0)
for f in foo:
    total = total + pow(np.abs(f), 2)
if(total != one):
    print str(total) + " vs " + str(one)
    print "NOT EQUAL"
However, my input of [1/math.sqrt(2), 1/math.sqrt(2)] results in the total being one:
(1+0j) vs (1+0j)
NOT EQUAL
Is it something to do with mixing NumPy with Python's complex type?
When using floating point numbers, keep in mind that they are generally not exact, so computations are subject to rounding errors. This follows from the design of floating point arithmetic, which is currently the most practical way to do numerical mathematics on computers with limited resources. You can't compute exactly using floats (and in practice you rarely have an alternative), because the numbers have to be cut off somewhere to fit in a reasonable amount of memory (in most cases at most 64 bits); this cut-off is done by rounding (see below for an example).
To deal correctly with these shortcomings you should never compare two floats for equality, but for closeness. NumPy provides two functions for that: np.isclose for comparing single values (or an item-wise comparison of arrays) and np.allclose for whole arrays. The latter is equivalent to np.all(np.isclose(a, b)), so you get a single value for an array.
>>> np.isclose(np.float32('1.000001'), np.float32('0.999999'))
True
But sometimes the rounding works in our favor and matches our analytical expectation, see for example:
>>> float(1) == np.square(np.sqrt(1))
True
After squaring the value will be reduced in size to fit in the given memory, so in this case it's rounded to what we would expect.
np.isclose and np.allclose have built-in relative and absolute tolerances (you can also pass them as parameters) that are used to compare two values. By default they are rtol=1e-05 and atol=1e-08.
Also, don't mix different packages with their types. If you use Numpy, use Numpy-Types and Numpy-Functions. This will also reduce your rounding errors.
Btw: Rounding errors have even more impact when working with numbers which differ in their exponent widely.
I guess, the same considerations as for real numbers are applicable: never assume they can be equal, but rather close enough:
eps = 0.000001
if abs(a - b) < eps:
    print("Equal")
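Applied to the sum from the question, a tolerance-based comparison could look like the following sketch. It is written for Python 3 and uses np.complex128 for the array; the tolerance passed to math.isclose is an arbitrary choice for the example.

```python
import math
import numpy as np

foo = np.array([1 / math.sqrt(2), 1 / math.sqrt(2)], dtype=np.complex128)

# Sum of squared magnitudes; mathematically this is exactly 1
total = 0.0
for f in foo:
    total += np.abs(f) ** 2

print(repr(total))  # repr shows all significant digits
print("exactly equal:", total == 1.0)
print("np.isclose:   ", bool(np.isclose(total, 1.0)))
print("math.isclose: ", math.isclose(total, 1.0, rel_tol=1e-9))
```

Printing with repr rather than a rounded string makes any difference in the last digits visible, and the two closeness checks give a comparison that does not depend on those digits.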