I am trying to multiply a polynomial by a function represented as a numpy array, so that in the end I can have an object that is a function that can be manipulated as a function (take derivatives, etc.). So this is what I have tried:
import numpy as np
from numpy.polynomial.hermite import Hermite as He
N = 15
L = 2
x = np.zeros(N,dtype=float)
for i in range(N):
    x[i] = (i-N//2)*L/N
h = He([0,1,0])*np.exp(-x*x/2)
print(h(x))
print(2*x*np.exp(-x*x/2))
And my result is:
[ 2.72028782e+06 -1.36933903e+07 -1.73242347e+07 -1.17112917e+07
-3.41036609e+06 2.02073199e+06 2.55751492e+06 -1.11607501e-09
-1.76349396e+06 4.85092636e+05 6.89290562e+06 1.37361270e+07
1.48504968e+07 5.00284621e+06 -1.60432564e+07]
[-1.20755633 -1.16183846 -1.06764987 -0.92525704 -0.73849308 -0.51470353
-0.2643068 0. 0.2643068 0.51470353 0.73849308 0.92525704
1.06764987 1.16183846 1.20755633]
Since H_1(x) = 2x, I was expecting the two results to be the same, but they are not. How can I achieve the desired result?
I've taken a look at your code and understand that you want to multiply the Hermite polynomial by the array. The problem is that you need to evaluate the polynomial first and multiply by the exponential afterwards:
import numpy as np
from numpy.polynomial.hermite import Hermite as He
N = 15
L = 2
x = np.zeros(N,dtype=float)
for i in range(N):
    x[i] = (i-N//2)*L/N
h = He([0,1,0])
print(h(x)*np.exp(-x*x/2))
print(2*x*np.exp(-x*x/2))
Which would result in:
[-1.20755633 -1.16183846 -1.06764987 -0.92525704 -0.73849308 -0.51470353
 -0.2643068 0. 0.2643068 0.51470353 0.73849308 0.92525704
 1.06764987 1.16183846 1.20755633]
[-1.20755633 -1.16183846 -1.06764987 -0.92525704 -0.73849308 -0.51470353
 -0.2643068 0. 0.2643068 0.51470353 0.73849308 0.92525704
 1.06764987 1.16183846 1.20755633]
If you still want to keep a reusable function, I would recommend:
def h(i):
    a = He([0,1,0])
    z = a(i)*(np.exp(-i*i/2))
    return z
print(h(x))
print(2*x*np.exp(-x*x/2))
I'm not 100% sure of the internals, but what seems to happen is that when you multiply the Hermite object by the array, the array is treated as the coefficient list of a second Hermite series, so np.exp(-x*x/2) gets folded into the polynomial's coefficients instead of being applied pointwise. Comparing the two objects makes this visible:
Default Hermite
Multiplied Hermite
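As a quick sketch of this (reusing the question's grid for x; the coefficient counts are only there to illustrate the point), you can check what the multiplication actually produced:
import numpy as np
from numpy.polynomial.hermite import Hermite as He

N = 15
L = 2
x = (np.arange(N) - N//2) * L / N        # same grid as in the question

h_bad = He([0, 1, 0]) * np.exp(-x*x/2)   # the array is absorbed as Hermite coefficients
print(He([0, 1, 0]).coef.size)           # 3 coefficients: the plain H_1 series
print(h_bad.coef.size)                   # many more: a high-degree product series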
Hope this helps!
I have something like below:
random_array = np.random.random(10)
scaled_array = random_array/np.sum(random_array)
This gives me a nice array with random floats that sum to 1. However, I am trying to take this a step further and do the following:
For example, fix the 2nd and 5th elements to be 0.04 and 0.09 respectively, and generate all other elements randomly. But the sum of the whole array still needs to be exactly 1.
Taking one more step, I want to provide an upper (lower) bound for all/each element(s). For example, I still want to fix the 4th element to be 0.09 but ALSO want to force ALL elements to be LESS THAN 0.1. (They will still add up to 1 because I have more than 10 elements.)
How can I achieve this?
If you want the values before scaling:
import numpy as np
random_array = np.random.random(10)
random_array[1] = 0.04
random_array[4] = 0.09
scaled_array = random_array/np.sum(random_array)
assert np.isclose(1, scaled_array.sum())
If you want fixed values after scaling:
import numpy as np
random_array = np.random.random(10)
random_array[1] = 0
random_array[4] = 0
scaled_array = (random_array/np.sum(random_array)) * (1.0 - (0.04 + 0.09))
scaled_array[1] = 0.04
scaled_array[4] = 0.09
assert np.isclose(1, scaled_array.sum())
Try the string-cutting (stick-breaking) approach via the Dirichlet distribution:
import numpy as np

N = 7                 # total number of elements in result
d = {2:0.04, 5:0.09}  # dictionary with index as key and fixed value
fixed_sum = 0.
result = np.zeros(N)  # placeholder numpy array
# Put the fixed elements in their place and calculate their sum
for k,v in d.items():
    result[k] = v
    fixed_sum = fixed_sum + v
remaining_sum = 1 - fixed_sum
# Use dirichlet distribution to get elements which sum to 1.
# Multiply with remaining_sum to get elements which sum to "remaining_sum".
remaining_arr = np.random.default_rng().dirichlet(np.ones(N-len(d)))*remaining_sum
# Get the index of result where elements are zero.
zero_indx = np.nonzero(result==0)[0]
# Place the elements of remaining_arr in the result.
result[zero_indx] = remaining_arr
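The question also asks for an upper bound on every element (e.g. 0.1). One simple way to bolt that onto the Dirichlet approach is to redraw until the bound holds. The sketch below is only illustrative: N = 15 and the concentration parameter 5.0 are hypothetical choices, and it assumes the bound is actually satisfiable (as in the question, where there are more than 10 elements); a larger concentration parameter keeps the draws closer to their mean, so rejections become rarer.
import numpy as np

N = 15                     # hypothetical size, large enough for the 0.1 cap to be feasible
d = {2: 0.04, 5: 0.09}     # fixed indices and values, as above
upper = 0.1                # cap for every element

result = np.zeros(N)
for k, v in d.items():
    result[k] = v
remaining_sum = 1 - sum(d.values())

rng = np.random.default_rng()
for _ in range(1000):      # redraw until every free element is below the cap
    remaining_arr = rng.dirichlet(np.full(N - len(d), 5.0)) * remaining_sum
    if np.all(remaining_arr < upper):
        break
else:
    raise RuntimeError("cap not satisfied after 1000 draws; relax the settings")

result[result == 0] = remaining_arr
assert np.isclose(result.sum(), 1) and np.all(result < upper)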
It's known that when the number of variables (p) is larger than the number of samples (n), the least squares estimator is not uniquely defined.
In sklearn I receive these values:
In [30]: lm = LinearRegression().fit(xx,y_train)
In [31]: lm.coef_
Out[31]:
array([[ 0.20092363, -0.14378298, -0.33504391, ..., -0.40695124,
0.08619906, -0.08108713]])
In [32]: xx.shape
Out[32]: (1097, 3419)
I expected the call in In [30] to raise an error. How does sklearn handle p > n in a case like this?
EDIT:
It seems that the b matrix gets zero-padded to the larger size (from the scipy lstsq source):
if n > m:
    # need to extend b matrix as it will be filled with
    # a larger solution matrix
    if len(b1.shape) == 2:
        b2 = np.zeros((n, nrhs), dtype=gelss.dtype)
        b2[:m,:] = b1
    else:
        b2 = np.zeros(n, dtype=gelss.dtype)
        b2[:m] = b1
    b1 = b2
When the linear system is underdetermined, sklearn.linear_model.LinearRegression finds the minimum L2-norm solution, i.e.
argmin_w l2_norm(w) subject to Xw = y
This is always well defined and obtainable by applying the pseudoinverse of X to y, i.e.
w = np.linalg.pinv(X).dot(y)
The specific implementation of scipy.linalg.lstsq that is used by LinearRegression calls get_lapack_funcs(('gelss',), ..., which is precisely a solver that finds the minimum-norm solution via singular value decomposition (provided by LAPACK).
Check out this example
import numpy as np
rng = np.random.RandomState(42)
X = rng.randn(5, 10)
y = rng.randn(5)
from sklearn.linear_model import LinearRegression
lr = LinearRegression(fit_intercept=False)
coef1 = lr.fit(X, y).coef_
coef2 = np.linalg.pinv(X).dot(y)
print(coef1)
print(coef2)
And you will see that coef1 and coef2 coincide (up to floating-point precision). (Note that fit_intercept=False is specified in the constructor of the sklearn estimator, because otherwise it would subtract the mean of each feature before fitting the model, yielding different coefficients.)
I have the following array:
array([[ 0.01454911+0.j, 0.01392502+0.00095922j,
0.00343284+0.00036535j, 0.00094982+0.0019255j ,
0.00204887+0.0039264j , 0.00112154+0.00133549j, 0.00060697+0.j],
[ 0.02179418+0.j, 0.01010125-0.00062646j,
0.00086327+0.00495717j, 0.00204473-0.00584213j,
0.00159394-0.00678094j, 0.00121372-0.0043044j , 0.00040639+0.j]])
I need a way to replace just the imaginary components with random values generated by:
numpy.random.vonmises(mu, kappa, size=size)
The resulting array needs to be in the same form as the first one.
Loop over the rows and set the imaginary parts to whatever values you like. The parameters mu and kappa for the numpy.random.vonmises function need to be defined by you; the values below are only examples.
import numpy as np
data = np.array([[ 0.01454911+0.j, 0.01392502+0.00095922j,
0.00343284+0.00036535j, 0.00094982+0.0019255j ,
0.00204887+0.0039264j , 0.00112154+0.00133549j, 0.00060697+0.j],
[ 0.02179418+0.j, 0.01010125-0.00062646j,
0.00086327+0.00495717j, 0.00204473-0.00584213j,
0.00159394-0.00678094j, 0.00121372-0.0043044j , 0.00040639+0.j]])
mu, kappa = 0.0, 4.0  # example von Mises parameters; replace with your own
def setRandomImag(c):
    # draw a new random imaginary part for every element of the row, in place
    c.imag = np.random.vonmises(mu, kappa, size=c.shape)
    return c
data = [setRandomImag(row) for row in data]
n_epochs = 2
n_freqs = 7
# shape parameters for the array
data2 = np.zeros((n_epochs, n_freqs), dtype=complex)
for i in range(0,n_epochs):
    data2[i] = np.real(data[i]) + np.random.vonmises(mu, kappa) * complex(0,1)
It gives every frequency within an epoch the same imaginary value, because vonmises without a size argument returns a single scalar. Not exactly what I was asking for, but it solves my problem.
Try this approach: store your numbers in two 2-D arrays, one for the real part and one for the imaginary part. Then replace the imaginary part with the randomly chosen numbers and recombine them.
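A minimal sketch of that idea (mu, kappa and the small arr array here are placeholders standing in for the question's parameters and data):
import numpy as np

mu, kappa = 0.0, 4.0                          # hypothetical von Mises parameters
arr = np.array([[0.014+0.0j, 0.013+0.001j],
                [0.021+0.0j, 0.010-0.001j]])  # small stand-in for the question's array

real_part = arr.real                                        # keep the real components
imag_part = np.random.vonmises(mu, kappa, size=arr.shape)   # new imaginary components
arr_new = real_part + 1j * imag_part                        # same shape as the original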
I have obtained the coefficients for the Legendre polynomial that best fits my data. Now I am needing to determine the value of that polynomial at each time-step of my data. I need to do this so that I can subtract the fit from my data. I have looked at the documentation for the Legendre module, and I'm not sure if I just don't understand my options or if there isn't a native tool in place for what I want. If my data-points were evenly spaced, linspace would be a good option, but that's not the case here. Does anyone have a suggestion for what to try?
For those who would like to demand a minimum working example of code, just use a random array, get the coefficients, and tell me from there how you would proceed. The values themselves don't matter. It's the technique that I'm asking about here. Thanks.
To simplify Ahmed's example (the session below assumes IPython in pylab mode, so that np and plot are available):
In [1]: from numpy.polynomial import Polynomial, Legendre
In [2]: p = Polynomial([0.5, 0.3, 0.1])
In [3]: x = np.random.rand(10) * 10
In [4]: y = p(x)
In [5]: pfit = Legendre.fit(x, y, 2)
In [6]: plot(*pfit.linspace())
Out[6]: [<matplotlib.lines.Line2D at 0x7f815364f310>]
In [7]: plot(x, y, 'o')
Out[7]: [<matplotlib.lines.Line2D at 0x7f81535d8bd0>]
The Legendre functions are scaled and offset, as the data should be confined to the interval [-1, 1] to get any advantage over the usual power basis. If you want the coefficients for plain old Legendre functions
In [8]: pfit.convert()
Out[8]: Legendre([ 0.53333333, 0.3 , 0.06666667], [-1., 1.], [-1., 1.])
But that isn't recommended.
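To evaluate the fitted series at the original, unevenly spaced time points and subtract it from the data, the fitted object can simply be called on the original x array (continuing the session above):
y_fit = pfit(x)        # evaluate the fit at the original, uneven x values
residuals = y - y_fit  # subtract the fit from the data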
Once you have a function, you can just generate a numpy array for the timepoints:
>>> import numpy as np
>>> timepoints = [1,3,7,15,16,17,19]
>>> myarray = np.array(timepoints)
>>> def mypolynomial(bins, pfinal): #pfinal is just the estimate of the final array (i'll do quadratic)
...     a,b,c = pfinal # obviously, for a*x^2 + b*x + c
...     return (a*bins**2) + b*bins + c
>>> mypolynomial(myarray, (1,1,0))
array([ 2, 12, 56, 240, 272, 306, 380])
It automatically evaluates it for each timepoint in the numpy array.
Now all you have to do is rewrite mypolynomial to go from the simple quadratic example to a proper Legendre polynomial. Write the function as if it were evaluating a single float; when called on the numpy array it will automatically be evaluated for each value.
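For the Legendre case specifically, numpy already ships an evaluator, so (assuming coeffs holds your fitted Legendre coefficients) a one-liner along these lines should do:
from numpy.polynomial.legendre import legval
# coeffs below stands for your fitted Legendre coefficient array
values = legval(myarray, coeffs)   # evaluates the Legendre series at every timepoint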
EDIT:
Let's say I wanted to generalize this to all standard polynomials:
>>> import numpy as np
>>> timepoints = [1,3,7,15,16,17,19]
>>> myarray = np.array(timepoints)
>>> def mypolynomial(bins, pfinal): #pfinal is just the estimate of the final array (i'll do quadratic)
...     hist = np.zeros(len(bins)) # blank return array, one slot per timepoint
...     for i in range(len(pfinal)):
...         # fixed a typo here, was pfinal[-i] which would give -0 rather than -1, since negative indexing starts at -1, not -0
...         const = pfinal[-i-1] # negative index to go from 0 exponent to highest exponent
...         hist += const*(bins**i)
...     return hist
>>> mypolynomial(myarray, (1,1,0))
array([  2.,  12.,  56., 240., 272., 306., 380.])
EDIT2: Typo fix
EDIT3:
@Ahmed is perfectly right when he states that Horner's rule is good for numerical stability. The implementation here would be as follows:
>>> def horner(coeffs, x):
...     acc = 0
...     for c in coeffs:
...         acc = acc * x + c
...     return acc
>>> horner((1,1,0), myarray)
array([ 2, 12, 56, 240, 272, 306, 380])
Slightly modified to keep the same argument order as before, from the code here:
http://rosettacode.org/wiki/Horner%27s_rule_for_polynomial_evaluation#Python
When you're using a nice library to fit polynomials, the library will in my experience usually have a function to evaluate them. So I think it is useful to know how you're generating these coefficients.
In the example below, I used two functions in numpy, legfit and legval, which made it trivial to both fit and evaluate the Legendre polynomials without any need to invoke Horner's rule or do the bookkeeping yourself. (Though I do use Horner's rule to generate some example data.)
Here's a complete example where I generate some sparse data from a known polynomial, fit a Legendre polynomial to it, evaluate that polynomial on a dense grid, and plot. Note that the fitting and evaluating part takes three lines thanks to the numpy library doing all the heavy lifting.
The resulting figure shows the original sparse data points together with the fitted curve.
import numpy as np
### Setup code
def horner(coeffs, x):
    """Evaluate a polynomial at a point or array."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc
x = np.random.rand(10) * 10
true_coefs = [0.1, 0.3, 0.5]
y = horner(true_coefs, x)
### Fit and evaluate
legendre_coefs = np.polynomial.legendre.legfit(x, y, 2)
new_x = np.linspace(0, 10)
new_y = np.polynomial.legendre.legval(new_x, legendre_coefs)
### Plotting only
try:
    import pylab
    pylab.ion() # turn on interactive plotting
    pylab.figure()
    pylab.plot(x, y, 'o', new_x, new_y, '-')
    pylab.xlabel('x')
    pylab.ylabel('y')
    pylab.title('Fitting Legendre polynomials and evaluating them')
    pylab.legend(['original sparse data', 'fit'])
except:
    print("Can't start plots.")
With numpy or scipy, is there any existing method that will return the endpoints of an interval which contains a specified percent of the values in a 1D array? I realize that this is simple to write myself, but it seems like the kind of thing that might be built in, although I can't find it.
E.g:
>>> import numpy as np
>>> x = np.random.randn(100000)
>>> print(np.bounding_interval(x, 0.68))
Would give approximately (-1, 1)
You can use np.percentile:
In [29]: x = np.random.randn(100000)
In [30]: p = 0.68
In [31]: lo = 50*(1 - p)
In [32]: hi = 50*(1 + p)
In [33]: np.percentile(x, [lo, hi])
Out[33]: array([-0.99206523, 1.0006089 ])
There is also scipy.stats.scoreatpercentile:
In [34]: from scipy.stats import scoreatpercentile
In [35]: scoreatpercentile(x, [lo, hi])
Out[35]: array([-0.99206523,  1.0006089 ])
I don't know of a built-in function to do it, but you can write one using the math package to specify approximate indices like this:
from __future__ import division
import math
import numpy as np
def bound_interval(arr_in, interval):
    lhs = (1 - interval) / 2  # Specify left-hand side chunk to exclude
    rhs = 1 - lhs             # and the right-hand side
    sorted_arr = np.sort(arr_in)
    lower = sorted_arr[int(math.floor(lhs * len(arr_in)))]  # use floor to get an integer index
    upper = sorted_arr[int(math.floor(rhs * len(arr_in)))]
    return (lower, upper)
On your specified array, I got the interval (-0.99072237819851039, 0.98691691784955549). Pretty close to (-1, 1)!