numpy poly() and roots() aren't reversible?

I expected the poly() and roots() functions to be each other's inverse. However, this isn't quite true:
import numpy as np

# Polynomial coefficients
pol_c = np.poly([-1, 1, 1, 10]) # coefficients of the polynomial with the stated roots
# Roots from the poly equation
root_val = np.roots(pol_c)
# Roots from the poly equation, manually entered as integers
roots_v2 = np.roots([1,-11,9,11,-10])
print(pol_c)
print(root_val)
print(roots_v2)
Gives
[1. -11. 9. 11. -10.]
[10.+0.0000000e+00j -1.+0.0000000e+00j 1.+9.6357437e-09j
1.-9.6357437e-09j]
[10.+0.0000000e+00j -1.+0.0000000e+00j 1.+9.6357437e-09j
1.-9.6357437e-09j]
i.e. the 3rd and 4th roots have a (slightly) imaginary component instead of being exactly real.
My first thought was floating-point error, but given that roots() outputs the same answer for float and int inputs, that seems not to be the case. Plus, I would expect poly() to give non-integer answers if floating-point accuracy were limiting the solve.

The functions are inverses of each other, within some computational errors (which may be complex), and up to reordering of roots.
pol_c = np.poly([-1, 1, 1, 10])
root_val = np.roots(pol_c)
print(np.real_if_close(np.around(root_val, 6)))
prints [10. -1. 1. 1.], which is the same set of roots we started with, in a different order.
Of course, the order need not be the same: the original order of roots is lost when pol_c was formed, and there is no canonical order for the roots of polynomials (which are generally complex) anyway.
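If you want a programmatic check rather than eyeballing the rounded output, one option (a sketch relying on np.allclose's default tolerances to absorb the ~1e-8 residue) is to sort both root sets and compare:
import numpy as np

roots_in = np.array([-1, 1, 1, 10], dtype=complex)
roots_out = np.roots(np.poly(roots_in))

# Sort both sets (roots have no canonical order) and compare within tolerance.
print(np.allclose(np.sort_complex(roots_out), np.sort_complex(roots_in)))  # True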


Finding FFT coefficients from FFT or RFFT in Python

I want to use the sum of the first five FFT coefficients as a feature for a classifier (in Python). I tried a few resources but I can't get a grasp of this concept. For example, I have an array of 10 elements.
a = [1, 2, 3, 4, 1, 1, 1, 1, 1, 1] # let's say it represents discrete values from the x-axis of an accelerometer
if I apply fft in Python to this array, I get the following output:
array([ 16.00000000+0.j , 0.50000000-5.34306783j,
-3.73606798-0.36327126j, 0.50000000+1.98786975j,
0.73606798-1.53884177j, -2.00000000+0.j ,
0.73606798+1.53884177j, 0.50000000-1.98786975j,
-3.73606798+0.36327126j, 0.50000000+5.34306783j])
if I apply rfft (real fft) in Python to this array, I get the following output:
array([ 16. , 0.5 , -5.34306783, -3.73606798,
-0.36327126, 0.5 , 1.98786975, 0.73606798,
-1.53884177, -2. ])
How can I calculate the sum of the first five coefficients from these two outputs?
In case of rfft:
Should it be just the sum of absolute values of the first five values?
Can someone explain the difference between these two outputs? Shouldn't rfft just display the real part of the fft?
rfft efficiently computes the FFT of a real-valued input sequence, whereas fft computes the FFT of a possibly complex-valued input sequence. If the input sequence happens to be purely real, fft will return an equivalent output, within some numerical accuracy and packaging considerations. More specifically, for the packaging, rfft avoids returning the upper half of the spectrum, which is symmetric when computing the FFT of a real-valued input. It also avoids returning the imaginary part of the DC (0 Hz) bin and of the Nyquist frequency (half the sampling rate) bin, since those are always zero for real-valued inputs.
So, the output from fft.fft of your example can be mapped to the following outputs of fft.rfft:
16.00000000+0.j -> rfft[0]
0.50000000-5.34306783j -> rfft[1], rfft[2]
-3.73606798-0.36327126j -> rfft[3], rfft[4]
0.50000000+1.98786975j -> rfft[5], rfft[6]
0.73606798-1.53884177j -> rfft[7], rfft[8]
-2.00000000+0.j -> rfft[9]
How can I calculate the sum of first five coefficients from these two outputs? In case of rfft: should it be just the sum of absolute values of the first five values?
As observed from the different packaging of the outputs, the first 5 complex-valued coefficients of fft.fft correspond to the first 9 floating-point values of the packed rfft output. To compute the sum you will have to sum the real parts and the imaginary parts separately. So, for the sum of the first five coefficients this would give you something like:
# Note: the interleaved real/imag layout shown in the question is what
# scipy.fftpack.rfft returns; numpy's np.fft.rfft returns complex bins instead.
from scipy.fftpack import rfft

A = rfft(a)
sum_re = A[0] + A[1] + A[3] + A[5] + A[7]  # real parts of bins 0..4
sum_im = A[2] + A[4] + A[6] + A[8]         # imaginary parts of bins 1..4 (bin 0 has no imaginary part)
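Equivalently, working from the complex output of np.fft.fft, the first five coefficients are simply the first five entries, so you can sum them, or their magnitudes if that is the feature you actually want, in one step; a minimal sketch:
import numpy as np

a = [1, 2, 3, 4, 1, 1, 1, 1, 1, 1]
A = np.fft.fft(a)

coeff_sum = np.sum(A[:5])        # complex-valued sum of bins 0..4
mag_sum = np.sum(np.abs(A[:5]))  # or: sum of the magnitudes of bins 0..4
print(coeff_sum, mag_sum)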

Numerically stable softmax

Is there a numerically stable way to compute the softmax function below?
I am getting values that become NaNs in my neural network code.
np.exp(x)/np.sum(np.exp(y))
The softmax exp(x)/sum(exp(x)) is actually numerically well-behaved. It has only positive terms, so we needn't worry about loss of significance, and the denominator is at least as large as the numerator, so the result is guaranteed to fall between 0 and 1.
The only accident that might happen is overflow or underflow in the exponentials. Overflow of a single element, or underflow of all elements of x, will render the output more or less useless.
But it is easy to guard against that by using the identity softmax(x) = softmax(x + c), which holds for any scalar c: subtracting max(x) from x leaves a vector that has only non-positive entries, ruling out overflow, and at least one entry that is zero, ruling out a vanishing denominator (underflow in some but not all entries is harmless).
Footnote: theoretically, catastrophic accidents in the sum are possible, but you'd need a ridiculous number of terms. For example, even using 16-bit floats, which can only resolve 3 decimals (compared to 15 decimals of a "normal" 64-bit float), we'd need between 2^1431 (about 6 x 10^430) and 2^1432 terms to get a sum that is off by a factor of two.
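A small sketch illustrating both points, the shift identity and the overflow guard (the function names are mine, not from any library):
import numpy as np

def softmax_naive(x):
    e = np.exp(x)
    return e / e.sum()

def softmax_shifted(x):
    e = np.exp(x - np.max(x))  # exponents are now <= 0, so exp() cannot overflow
    return e / e.sum()

x = np.array([10.0, 20.0, 30.0])
print(np.allclose(softmax_naive(x), softmax_shifted(x)))  # True: softmax(x) == softmax(x + c)

big = np.array([1000.0, 1010.0, 1020.0])
print(softmax_naive(big))    # overflow: [nan nan nan] plus a RuntimeWarning
print(softmax_shifted(big))  # fine: roughly [2.1e-09, 4.5e-05, 9.9995e-01]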
The softmax function is prone to two issues: overflow and underflow.
Overflow: occurs when very large numbers are approximated as infinity.
Underflow: occurs when very small numbers (near zero on the number line) are rounded to zero.
To combat these issues when doing softmax computation, a common trick is to shift the input vector by subtracting the maximum element in it from all elements. For the input vector x, define z such that:
z = x-max(x)
And then take the softmax of the new (stable) vector z
Example:
def stable_softmax(x):
    z = x - max(x)
    numerator = np.exp(z)
    denominator = np.sum(numerator)
    softmax = numerator / denominator
    return softmax
# input vector
In [267]: vec = np.array([1, 2, 3, 4, 5])
In [268]: stable_softmax(vec)
Out[268]: array([ 0.01165623, 0.03168492, 0.08612854, 0.23412166, 0.63640865])
# input vector with really large number, prone to overflow issue
In [269]: vec = np.array([12345, 67890, 99999999])
In [270]: stable_softmax(vec)
Out[270]: array([ 0., 0., 1.])
In the above case, we safely avoided the overflow problem by using stable_softmax()
For more details, see the Numerical Computation chapter of the Deep Learning book.
Extending @kmario23's answer to support 1- or 2-dimensional NumPy arrays or lists. 2D tensors (assuming the first dimension is the batch dimension) are common if you're passing a batch of results through softmax:
import numpy as np
def stable_softmax(x):
    z = x - np.max(x, axis=-1, keepdims=True)
    numerator = np.exp(z)
    denominator = np.sum(numerator, axis=-1, keepdims=True)
    softmax = numerator / denominator
    return softmax
test1 = np.array([12345, 67890, 99999999]) # 1D numpy
test2 = np.array([[12345, 67890, 99999999],   # 2D numpy
                  [123, 678, 88888888]])
test3 = [12345, 67890, 999999999] # 1D list
test4 = [[12345, 67890, 999999999]] # 2D list
print(stable_softmax(test1))
print(stable_softmax(test2))
print(stable_softmax(test3))
print(stable_softmax(test4))
[0. 0. 1.]
[[0. 0. 1.]
[0. 0. 1.]]
[0. 0. 1.]
[[0. 0. 1.]]
There is nothing wrong with calculating the softmax function as it is in your case. The problem seems to come from exploding gradients or similar issues with your training method. Focus on those matters, either by "clipping values" (gradient clipping) or by choosing the right initial distribution of weights.
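For the "clipping values" suggestion, a common form is gradient-norm clipping; a rough sketch (the helper name and threshold are mine, purely illustrative):
import numpy as np

def clip_by_norm(grad, max_norm=5.0):
    # Rescale the gradient if its L2 norm exceeds max_norm, leaving its direction unchanged.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, -40.0])  # norm 50
print(clip_by_norm(g))       # -> [ 3. -4.], norm clipped to 5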

differentiating a polynomial interpolated set of data points

I'm aware that we can use numpy to differentiate polynomials with the following:
import numpy
f = numpy.poly1d([1, 0, 1])
f.deriv()
I've tried interpolating a set of data points and performing deriv() on the resulting polynomial.
from scipy import interpolate
x = [-2,-1,2]
y = [-2,1,-1]
f = interpolate.interp1d(x, y)
f.deriv()
But the object f is of a different type.
Basically, how might I convert f to a numpy polynomial object ready for differentiation?
Thanks a lot!
The issue you're facing here is the way interpolation actually works. Interpolation can at best guess some local function that matches the given points, but except perhaps for some extremely easy cases it will never be exactly the actual underlying function.
That said, you can approximate a function in a given range by a Taylor polynomial to a good degree. For relatively narrow ranges, and with a reasonable guess at the underlying function, this should work sufficiently well for you.
import numpy as np
from scipy import interpolate
x = [-2, -1, 2]
y = [-2, 1, -1]
f = interpolate.interp1d(x, y)
h = interpolate.approximate_taylor_polynomial(f, -1, 2, 2)
h
>>> poly1d([-0.61111111, 1.16666667, -0.22222222])
h.deriv()
>>> poly1d([-1.22222222, 1.16666667])
EDIT I Expanding the original answer for clarification:
I wanted to show that this approach works up to a point. The OP's example above is a really small MWE and the results are therefore less than convincing.
To show that the approximation is fairly close, I'll construct a polynomial and evaluate it over the range [-5, 5]. I'll then use the range [-5, 5] and the returned polynomial values as the interpolation arrays.
I'll approximate the interpolated function with a Taylor series expansion using the best "guesses" I have (since I constructed the original polynomial, this is not really a guess, to be honest).
I'll compare the results in range [-5, 5] from the Taylor expansion with the original polynomial values in the range.
f = np.poly1d([1,0,1])
f([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5])
>>> array([26, 17, 10, 5, 2, 1, 2, 5, 10, 17, 26])
x = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
y = [26, 17, 10, 5, 2, 1, 2, 5, 10, 17, 26]
f = interpolate.interp1d(x, y)
h = interpolate.approximate_taylor_polynomial(f, 0, 2, 5)
h(x)
>>> [26., 17.12, 10.21333333, 5.28, 2.32, 1.33333333, 2.32, 5.28, 10.21333333, 17.12, 26.]
f(x)
>>> [26., 17., 10., 5., 2., 1., 2., 5., 10., 17., 26.]
Here are some examples to show how the guesses get better and better the higher the order of the Taylor expansion you use. Be careful: the manual says the expansion becomes unstable once it reaches order 30.
h = interpolate.approximate_taylor_polynomial(f, 0, 15, 5)
h(x)
>>> [ 25.41043927, 17.18570392, 10.19122784, 5.09107466,
      2.02363911, 1.        , 2.02664952, 5.07915194,
      10.22646919, 17.13871545, 26. ]
h = interpolate.approximate_taylor_polynomial(f, 0, 20, 5)
h(x)
>>> [ 26.        , 17.13481942, 10.10070835, 5.21247548,
      2.13174692, 1.23098041, 2.13174692, 5.21247548,
      10.10070835, 17.13481942, 25.9999999 ]
EDIT II Answers for questions in comments:
It's not a stupid question. I can see that the Taylor series is confusing you. In math they usually show the definition of the Taylor series based on the nth-order derivatives of the original function at the point of expansion, but rarely show other forms, so it can be confusing how to apply it in a broader sense.
In essence it's the same as with derivatives:
f'(x) = lim d->0 [ (f(x+d) - f(x)) / d ]
which in numerical programming we just approximate with:
f'(x) ~ (f(x+d) - f(x)) / d (there are other approximations of the derivative as well)
and that's an OK approximation as long as d remains really small. The Taylor series of a function goes something like this:
0th order: h ~ f(a)
1st order: h ~ f(a) + f'(a)(x-a)
2nd order: h ~ f(a) + f'(a)(x-a) + f''(a)/2 * (x-a)^2 ...
...
so if we now introduce our derivative approximation into the series expansion:
h ~ f(a) + [ (f(a+d) - f(a))/d ] * (x-a) + [ (f(a+d) - 2f(a) + f(a-d))/d^2 ] / 2 * (x-a)^2 + ...
where the first bracket is the finite-difference approximation of the 1st derivative f'(a) and the second bracket approximates the 2nd derivative f''(a).
So now you see why the function needs a point at which it is to be evaluated. This only helps us get rid of the derivatives of the original function, though.
So you see, we don't need to know the original function at all. All we have to be able to do is provide approximate values that the original function would have had around the point of expansion. And that is exactly what interpolation gives us.
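To make the expansion above concrete, here is a small sketch (local_quadratic is my own illustrative helper, not a scipy function) that builds the second-order model from sampled function values alone:
import numpy as np

def local_quadratic(f, a, d=1e-3):
    """Second-order Taylor model of f around a, built only from sampled values of f."""
    f0 = f(a)
    d1 = (f(a + d) - f0) / d                    # forward-difference estimate of f'(a)
    d2 = (f(a + d) - 2 * f0 + f(a - d)) / d**2  # central estimate of f''(a)
    return np.poly1d([d2 / 2, d1, f0])          # polynomial in t = (x - a)

g = local_quadratic(np.cos, 0.0)
print(g)  # roughly -0.5 t^2 + 1, matching cos(x) ~ 1 - x^2/2 near 0 (the tiny linear term is finite-difference noise)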
Interpolation takes in a set of points presumably belonging to some original function and then, based on the behavior of those points, tries to guess which points in between the given ones would most likely lie on the graph of the original function as well. So in essence, interpolation tries to guess the values of a would-be function whose original points we know, within the range of those original points.
OK. But what do we do about the fact that the Taylor expansion goes on to infinity?
We just cut it off:
h ~ 0th + 1st + 2nd + 3rd + ... + nth + P
and we call P the remainder. There are ways of estimating this remainder, some given by Taylor himself iirc.
Ok, but now why do we need a range in our function call?
What we did here is what's called a finite difference method. In essence it is about as simple as that, though in reality things can get a bit more complicated, because you have to show that you can actually do these things without breaking the convergence of the Taylor series. It turns out that you don't break the Taylor series, but only for bounded continuous functions, which means that you can only approximate functions on a certain interval.
Think of it this way: you can approximate a straight line with a Taylor series. Think of it as compounding more and more polynomial orders until their "waviness" cancels out, like sin^2 + cos^2, which is always 1.
But if you stop the series expansion at some order, then suddenly there is nothing stopping the series from diverging again. Because a Taylor series is just one big polynomial, it will eventually head off to plus or minus infinity. Look at the image below: it shows Taylor series approximating the original quadratic function f at the point of expansion 0, over a range of 10 around it, but plotted from -50 to 50.
Of special interest is the 1st-order series, which is just a straight line, as you can see from the formulas above (green). Notice how, as soon as the series cross -10 or 10, they start diverging from the actual function by a lot. In some cases the approximations were similar enough to stay close in value to the original function (e.g. the 2nd-order Taylor series is also a quadratic, which is why it traces the original function very well).
Unfortunately, because we have no prior knowledge about the original function in your case, it's impossible to determine whether some Taylor expansion estimates it perfectly. As far as we know, we only approximated the function around 0; it might contain sine or cosine terms for all we know.
As far as your question about f is concerned:
f is just some dummy function I started from. It's supposed to be np.poly1d([1, 0, 1]), i.e. something like f(x) = x**2 + 1. I don't know where you got 2.1667 + 0.25x - 0.9167x**2 from.
I used f just to create the x and y arrays, so that I could be sure those numbers indeed belong to a function. What would be the point otherwise? I only used it once more at the end, by doing f(x), to show how similar the numbers turn out.
Remember x is an array, and f(x) means "calculate the value of the function f for every member of array x". Nothing more. It's just the value of the function f(x) = x**2 + 1 at the points [-5, -4, ..., 4, 5].
All the other work was just about how to approximate a function by a Taylor expansion when all you have is some fixed data set and no knowledge of the original function. And I showed that if you interpolate between the points and approximate the unknown function with a Taylor expansion, you can reconstruct a function that gives meaningfully similar results on a bounded range of x values.
That approximated function is called h in my snippets and it looks something like:
h = 2.28194274e-08 + 5.37467022e-17 x - 1.98652602e-06 x^2 - 3.65181145e-15 x^3 + 7.38646849e-05 x^4 + 1.02224219e-13 x^5 + ... and so on, up to the 25th order
and to get its derivative in python all you would need to do is
h.deriv()
because its type is poly1d.
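As an aside, if a single global polynomial through the data points is acceptable, a more direct route to something you can call .deriv() on is np.polyfit; a sketch (with three points, a degree-2 fit is the unique interpolating polynomial):
import numpy as np

x = [-2, -1, 2]
y = [-2, 1, -1]

p = np.poly1d(np.polyfit(x, y, 2))  # degree len(x)-1 fit passes through all three points
print(p)          # the interpolating polynomial
print(p.deriv())  # its derivative, again a poly1d
Incidentally, this fit is exactly the 2.1667 + 0.25x - 0.9167x**2 polynomial mentioned in the comments: it is the unique quadratic through the OP's three points.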

Analytical solution for Linear Regression using Python vs. Julia

Using an example from Andrew Ng's class (finding parameters for linear regression using the normal equation):
With Python:
import numpy as np

X = np.array([[1, 2104, 5, 1, 45], [1, 1416, 3, 2, 40], [1, 1534, 3, 2, 30], [1, 852, 2, 1, 36]])
y = np.array([[460], [232], [315], [178]])
θ = ((np.linalg.inv(X.T.dot(X))).dot(X.T)).dot(y)
print(θ)
Result:
[[ 7.49398438e+02]
[ 1.65405273e-01]
[ -4.68750000e+00]
[ -4.79453125e+01]
[ -5.34570312e+00]]
With Julia:
X = [1 2104 5 1 45; 1 1416 3 2 40; 1 1534 3 2 30; 1 852 2 1 36]
y = [460; 232; 315; 178]
θ = ((X' * X)^-1) * X' * y
Result:
5-element Array{Float64,1}:
207.867
0.0693359
134.906
-77.0156
-7.81836
Furthermore, when I multiply X by Julia's θ (but not Python's), I get numbers close to y.
I can't figure out what I am doing wrong. Thanks!
Using X^-1 vs. the pseudoinverse
pinv(X), which corresponds to the pseudoinverse, is more broadly applicable than inv(X), which X^-1 equates to. Neither Julia nor Python does well using inv, but in this case Julia apparently does better.
but if you change the expression to
julia> z=pinv(X'*X)*X'*y
5-element Array{Float64,1}:
188.4
0.386625
-56.1382
-92.9673
-3.73782
you can verify that X*z = y
julia> X*z
4-element Array{Float64,1}:
460.0
232.0
315.0
178.0
A more numerically robust approach in Python, without having to do the matrix algebra yourself, is to use numpy.linalg.lstsq to do the regression:
In [29]: np.linalg.lstsq(X, y)
Out[29]:
(array([[ 188.40031942],
[ 0.3866255 ],
[ -56.13824955],
[ -92.9672536 ],
[ -3.73781915]]),
array([], dtype=float64),
4,
array([ 3.08487554e+03, 1.88409728e+01, 1.37100414e+00,
1.97618336e-01]))
(Compare the solution vector with @waTeim's answer in Julia).
You can see the source of the ill-conditioning by printing the matrix inverse you're calculating:
In [30]: np.linalg.inv(X.T.dot(X))
Out[30]:
array([[ -4.12181049e+13, 1.93633440e+11, -8.76643127e+13,
-3.06844458e+13, 2.28487459e+12],
[ 1.93633440e+11, -9.09646601e+08, 4.11827338e+11,
1.44148665e+11, -1.07338299e+10],
[ -8.76643127e+13, 4.11827338e+11, -1.86447963e+14,
-6.52609055e+13, 4.85956259e+12],
[ -3.06844458e+13, 1.44148665e+11, -6.52609055e+13,
-2.28427584e+13, 1.70095424e+12],
[ 2.28487459e+12, -1.07338299e+10, 4.85956259e+12,
1.70095424e+12, -1.26659193e+11]])
Eeep!
Taking the dot product of this with X.T leads to a catastrophic loss of precision.
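A quick way to quantify this is to look at the rank and the condition number of X'X (a sketch; the exact condition number printed depends on the LAPACK build, but it will be enormous because the 5x5 matrix X'X has rank at most 4):
import numpy as np

X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)

print(np.linalg.matrix_rank(X))  # 4: only four independent rows for five unknowns
print(np.linalg.cond(X.T @ X))   # astronomically large: X'X is singular up to rounding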
Notice that X is a 4x5 matrix, or, in statistical terms, that you have fewer observations than parameters to estimate. Therefore, the least squares problem has infinitely many solutions with the sum of the squared errors exactly equal to zero. In this case, the normal equations don't help you much because the matrix X'X is singular. Instead, you should just find a solution to X*b=y.
Most numerical linear algebra systems are based on the FORTRAN package LAPACK, which uses a pivoted QR factorization for solving the problem X*b=y. Since there are infinitely many solutions, LAPACK picks the solution with the smallest norm. In Julia, you can get this solution simply by writing
float(X)\y
(Unfortunately, the float part is necessary right now, but that will change.)
In exact arithmetic, you should get the same solution as the one above with either of your proposed methods, but the floating-point representation of your problem introduces small rounding errors, and these errors will affect the calculated solution. The effect of the rounding errors on the solution is much larger when using the normal equations compared to using the QR factorization directly on X.
This holds true also in the usual case where X has more rows than columns, so it is often recommended that you avoid the normal equations when solving least squares problems. However, when X has many more rows than columns, the matrix X'X is relatively small. In this case, it will be much faster to solve the problem with the normal equations instead of using the QR factorization. In many statistical problems, the extra numerical error is extremely small compared to the statistical error, so the loss of precision due to the normal equations can simply be ignored.
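For reference, the analogue of Julia's minimum-norm X \ y solution in NumPy is np.linalg.lstsq (shown above) or the pseudoinverse; a quick sketch:
import numpy as np

X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460., 232., 315., 178.])

theta = np.linalg.pinv(X) @ y  # minimum-norm solution of X @ theta = y
print(theta)                   # ~ [188.40, 0.3866, -56.14, -92.97, -3.738]
print(X @ theta)               # reproduces y up to rounding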

Matlab VS Python - eig(A,B) VS sc.linalg.eig(A,B)

I have the following matrices sigma and sigmad:
sigma:
1.9958 0.7250
0.7250 1.3167
sigmad:
4.8889 1.1944
1.1944 4.2361
If I try to solve the generalized eigenvalue problem in Python I obtain:
d,V = sc.linalg.eig(matrix(sigmad),matrix(sigma))
V:
-1 -0.5614
-0.4352 1
If I try to solve the generalized eigenvalue problem in Matlab I obtain:
[V,d]=eig(sigmad,sigma)
V:
-0.5897 -0.5278
-0.2564 0.9400
But the d's do coincide.
Any (nonzero) scalar multiple of an eigenvector will also be an eigenvector; only the direction is meaningful, not the overall normalization. Different routines use different conventions -- often you'll see the magnitude set to 1, or the maximum value set to 1 or -1 -- and some routines don't even bother being internally consistent for performance reasons. Your two different results are multiples of each other:
In [227]: sc = array([[-1., -0.5614], [-0.4352, 1. ]])
In [228]: ml = array([[-.5897, -0.5278], [-0.2564, 0.94]])
In [229]: sc/ml
Out[229]:
array([[ 1.69577751, 1.06366048],
[ 1.69734789, 1.06382979]])
and so they're actually the same eigenvectors. Think of the matrix as an operator which changes a vector: the eigenvectors are the special directions where a vector pointing that way won't be twisted by the matrix, and the eigenvalues are the factors measuring how much the matrix expands or contracts the vector.
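To check that numerically, normalize each column (each eigenvector) to unit length; after that the two results agree up to rounding (a quick sketch):
import numpy as np

sc = np.array([[-1., -0.5614], [-0.4352, 1.]])       # scipy's eigenvectors (columns)
ml = np.array([[-0.5897, -0.5278], [-0.2564, 0.94]]) # Matlab's eigenvectors (columns)

print(sc / np.linalg.norm(sc, axis=0))  # both print roughly the same matrix
print(ml / np.linalg.norm(ml, axis=0))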
