I have data that I want to fit with polynomials. I have 200,000 data points, so I want an efficient algorithm. I want to use the numpy.polynomial package so that I can try different families and degrees of polynomials. Is there some way I can formulate this as a system of equations like Ax=b? Is there a better way to solve this than with scipy.optimize.minimize?
import numpy as np
from scipy.optimize import minimize as mini
x1 = np.random.random(2000)
x2 = np.random.random(2000)
y = 20 * np.sin(x1) + x2 - np.sin(30 * x1 - x2 / 10)
def fitness(x, degree=5):
    poly1 = np.polynomial.polynomial.polyval(x1, x[:degree])
    poly2 = np.polynomial.polynomial.polyval(x2, x[degree:])
    return np.sum((y - (poly1 + poly2)) ** 2)
# It seems like I should be able to solve this as a system of equations
# x = np.linalg.solve(np.concatenate([x1, x2]), y)
# minimize the sum of the squared residuals to find the optimal polynomial coefficients
x = mini(fitness, np.ones(10))
print(fitness(x.x))
Your intuition is right. You can solve this as a system of equations of the form Ax = b.
However:
The system is overdetermined and you want the least-squares solution, so you need to use np.linalg.lstsq instead of np.linalg.solve.
You can't use polyval because you need to separate the coefficients and powers of the independent variable.
This is how to construct the system of equations and solve it:
A = np.stack([x1**0, x1**1, x1**2, x1**3, x1**4, x2**0, x2**1, x2**2, x2**3, x2**4]).T
xx = np.linalg.lstsq(A, y, rcond=None)[0]
print(fitness(xx)) # test the result with original fitness function
Of course you can generalize over the degree:
A = np.stack([x1**p for p in range(degree)] + [x2**p for p in range(degree)]).T
With the example data, the least squares solution runs much faster than the minimize solution (800µs vs 35ms on my laptop). However, A can become quite large, so if memory is an issue minimize might still be an option.
Update:
Without any knowledge about the internals of the polynomial function things become tricky, but it is possible to separate terms and coefficients. Here is a somewhat ugly way to construct the system matrix A from a function like polyval:
def construct_A(valfunc, degree):
    columns1 = []
    columns2 = []
    for p in range(degree):
        # a unit coefficient vector isolates the p-th basis function
        c = np.zeros(degree)
        c[p] = 1
        columns1.append(valfunc(x1, c))
        columns2.append(valfunc(x2, c))
    return np.stack(columns1 + columns2).T
A = construct_A(np.polynomial.polynomial.polyval, 5)
xx = np.linalg.lstsq(A, y, rcond=None)[0]
print(fitness(xx)) # test the result with original fitness function
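As a possible shortcut to the unit-coefficient trick above, the numpy.polynomial modules ship their own design-matrix builders (polyvander, chebvander, legvander and so on). The following is only a sketch under assumptions: the helper name fit_additive and the seeded random data are made up for illustration, and the power and Chebyshev bases are just two example families.

```python
import numpy as np
from numpy.polynomial import polynomial as P, chebyshev as C

# Illustrative data in the same shape as the question (seeded for repeatability)
rng = np.random.default_rng(0)
x1 = rng.random(2000)
x2 = rng.random(2000)
y = 20 * np.sin(x1) + x2 - np.sin(30 * x1 - x2 / 10)

def fit_additive(vander, n_coef):
    """Hypothetical helper: least-squares fit of p1(x1) + p2(x2).

    vander(x, deg) returns one column per basis function (deg + 1 columns).
    """
    A = np.hstack([vander(x1, n_coef - 1), vander(x2, n_coef - 1)])
    coef, _, rank, _ = np.linalg.lstsq(A, y, rcond=None)
    sse = np.sum((y - A @ coef) ** 2)
    return coef, rank, sse

coef_pow, rank_pow, sse_pow = fit_additive(P.polyvander, 5)
coef_cheb, rank_cheb, sse_cheb = fit_additive(C.chebvander, 5)
print(rank_pow, sse_pow, sse_cheb)
```

Both blocks contain a constant column, so the matrix is rank-deficient by one; lstsq copes with that by returning the minimum-norm solution, which is one more reason solve would not work here.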
Related
I would like to solve a linear equation system in numpy in order to check whether a point lines up with a vector or not.
Given are the following equations for a vector2:
point[x] = vector1[x] + λ * vector2[x]
point[y] = vector1[y] + λ * vector2[y]
NumPy's linalg.solve() offers the option to solve two equations of the form:
ax + by = c
by defining the parameters a and b in a numpy.array().
But I can't seem to find a way to deal with equations with one fixed parameter like:
m*x + b = 0
Am I missing a point or do I have to deal with another solution?
Thanks in advance!
Hi I will give it a try to help with this question.
The numpy.linalg.solve documentation says:
Computes the “exact” solution, x, of the well-determined, i.e., full rank, linear matrix equation ax = b.
Note the assumptions made on the matrix!
Lambda the same
If lambda should be the same for the point[x] and point[y] equations, just concatenate all the vectors.
x_new = np.concatenate([x,y])
vec1_new = np.concatenate([vec1_x,vec1_y])
...
This will probably make your system overdetermined, meaning you have more equations than parameters to determine (only one here), which violates the well-determined assumption. My approach would be to go with least squares.
numpy.linalg.lstsq provides a least-squares solver. Its documentation example solves the equation y = mx + c. For your case this is y = point[x], x = vector2[x] and c = vector1[x].
This is copied from the numpy.linalg.lstsq example:
x = np.array([0, 1, 2, 3])
y = np.array([-1, 0.2, 0.9, 2.1])
A = np.vstack([x, np.ones(len(x))]).T # => horizontal stack
m, c = np.linalg.lstsq(A, y, rcond=None)[0]
Lambda different
If the lambdas are different, stack vector2[x] and vector2[y] horizontally and you have [lambda_1, lambda_2] to find. There will probably also be more equations than lambdas, so you will again find a least-squares solution.
Note
Keep in mind that even if you construct your system from a straight line and a fixed lambda, you might need a least-squares approach due to rounding and numerical differences.
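For the shared-lambda case, the idea can be sketched as follows. This is only an illustration: the function name lambda_on_line and the tolerance tol are assumptions, not part of NumPy. The constant vector1 is moved to the right-hand side, which leaves a single-column system in lambda.

```python
import numpy as np

def lambda_on_line(point, vector1, vector2, tol=1e-9):
    """Hypothetical helper: solve point = vector1 + lam * vector2 for one lam.

    Returns lam and whether the point lies on the line within tol.
    """
    A = np.asarray(vector2, dtype=float).reshape(-1, 1)   # one unknown, one column
    b = np.asarray(point, dtype=float) - np.asarray(vector1, dtype=float)
    lam, *_ = np.linalg.lstsq(A, b, rcond=None)
    residual = np.linalg.norm(A @ lam - b)                # zero if the point is on the line
    return lam[0], bool(residual <= tol)

lam_on, on_line = lambda_on_line([3, 5], [1, 1], [1, 2])
lam_off, off_line = lambda_on_line([3, 6], [1, 1], [1, 2])
print(lam_on, on_line, off_line)
```

The residual check is what answers the "does the point line up" question; lambda alone is always returned, even when the point is off the line.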
You can solve your equation 2*x + 4 = 0 with sympy:
from sympy.abc import x
from sympy import Eq, solve
eq = Eq(2 * x + 4, 0)
print(solve(eq))
I'm doing least squares curve fitting with Python and getting decent results, but would like it to be a bit more robust.
I have data from a first order LTI system, more specifically the speed of a motor that is read by a tachymeter. I'm trying to fit the step response of the motors so I can deduce its transfer function.
The speed (v(t)) has the following form:
v(t) = K * (1 - exp(-t/T))
I have some outliers in the data I use, though, and would like to mitigate them. This mostly happens when the speed becomes constant. Say the speed is 10000 units; I sometimes get outliers that are 10000 +/- 400. I wonder how to set my f_scale parameter given that I want my data points to stay within +/- 400 of the "actual" speed (mean). Should I set f_scale to 400 or 800? I'm not sure what exactly I should set there.
Thanks
EDIT: Some data.
I have constructed a minimal example for a curve similar to yours. If you had posted actual data instead of a picture, this would have gone a bit faster. The two key things to understand about robust fitting with least_squares are that you have to use a value for the loss parameter other than linear, and that f_scale is used as a scaling parameter for the loss function.
Basically, from the docs, least_squares tries to
minimize F(x) = 0.5 * sum(rho(f_i(x)**2))
and setting the loss parameter changes rho in the above formula. For loss='linear', rho is just the identity function. When loss='soft_l1', rho(z) = 2 * ((1 + z)**0.5 - 1). f_scale is used to scale the loss function such that rho_(f**2) = C**2 * rho(f**2 / C**2), where C is f_scale. So it doesn't have the same kind of meaning as you are asking for above; it's more like a way of penalising larger errors less.
In this particular case it doesn't appear to make much difference though.
import numpy
import matplotlib.pyplot as plt
import scipy.optimize
tmax = 6000
N = 100
K = 6000
T = 200
smootht = numpy.linspace(0, tmax, 1000)
tm = numpy.linspace(0, tmax, N)
def f(t, K, T):
    return K * (1 - numpy.exp(-t/T))
v = f(smootht, K, T)
vm = f(tm, K, T) + numpy.random.randn(N)*400
def error(pars):
    K, T = pars
    vp = f(tm, K, T)
    return vm - vp
f_scales = [0.01, 1, 100]
plt.scatter(tm, vm)
for f_scale in f_scales:
    r = scipy.optimize.least_squares(error, [10, 10], loss='soft_l1', f_scale=f_scale)
    vp = f(smootht, *r.x)
    plt.plot(smootht, vp, label=f_scale)
plt.legend()
The resulting plot looks like this:
My suggestion is to start by just experimenting with the different loss functions before playing with f_scale.
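To get a feel for what the scaling formula does, it may help to evaluate the scaled soft_l1 loss by hand. This is a sketch only: the function names are made up, and C = 400 is just the figure from the question, not a recommendation. Residuals well inside C are penalised almost quadratically, while residuals well outside C are penalised much less than their square.

```python
import numpy as np

def soft_l1(z):
    # rho(z) for loss='soft_l1'; z is the squared (scaled) residual
    return 2 * ((1 + z) ** 0.5 - 1)

def scaled_loss(f, C):
    # rho_(f**2) = C**2 * rho(f**2 / C**2), with C playing the role of f_scale
    return C ** 2 * soft_l1((f / C) ** 2)

C = 400.0                          # assumed value, taken from the question
small = scaled_loss(40.0, C)       # residual well inside C
large = scaled_loss(4000.0, C)     # residual well outside C

print(small / 40.0 ** 2)           # close to 1: behaves like a plain squared error
print(large / 4000.0 ** 2)         # well below 1: the outlier is down-weighted
```

So f_scale acts as a soft boundary between "inlier" and "outlier" residuals rather than as a hard limit on where data points may lie.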
I have used numpy's polyfit and obtained a very good fit (using a 7th order polynomial) for two arrays, x and y. My relationship is thus;
y(x) = p[0]* x^7 + p[1]*x^6 + p[2]*x^5 + p[3]*x^4 + p[4]*x^3 + p[5]*x^2 + p[6]*x^1 + p[7]
where p is the polynomial array output by polyfit.
Is there a way to reverse this method easily, so I have a solution in the form of,
x(y) = p[0]*y^n + p[1]*y^(n-1) + .... + p[n]*y^0
No, there is no easy way in general. Closed-form solutions are not available for arbitrary polynomials of the seventh order.
Doing the fit in the reverse direction is possible, but only on monotonically varying regions of the original polynomial. If the original polynomial has minima or maxima on the domain you are interested in, then even though y is a function of x, x cannot be a function of y because there is no 1-to-1 relation between them.
If you are (i) OK with redoing the fitting procedure, and (ii) OK with working piecewise on single monotonic regions of your fit at a time, then you could do something like this:
import numpy as np
# generate a random coefficient vector a
degree = 1
a = 2 * np.random.random(degree+1) - 1
# an assumed true polynomial y(x)
def y_of_x(x, coeff_vector):
    """
    Evaluate a polynomial with coeff_vector and degree len(coeff_vector)-1 using Horner's method.
    Coefficients are ordered by increasing degree, from the constant term at coeff_vector[0],
    to the linear term at coeff_vector[1], to the n-th degree term at coeff_vector[n]
    """
    coeff_rev = coeff_vector[::-1]
    b = 0
    for a in coeff_rev:
        b = b * x + a
    return b
# generate some data
my_x = np.arange(-1, 1, 0.01)
my_y = y_of_x(my_x, a)
# verify that polyfit in the "traditional" direction gives the correct result
# [::-1] b/c polyfit returns coeffs in backwards order rel. to y_of_x()
p_test = np.polyfit(my_x, my_y, deg=degree)[::-1]
print(p_test, a)
# fit the data using polyfit but with y as the independent var, x as the dependent var
p = np.polyfit(my_y, my_x, deg=degree)[::-1]
# define x as a function of y
def x_of_y(yy, a):
    return y_of_x(yy, a)
# compare results
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(my_x, my_y, '-b', x_of_y(my_y, p), my_y, '-r')
Note: this code does not check for monotonicity but simply assumes it.
By playing around with the value of degree, you should see that the code only works well for all random values of a when degree=1. It occasionally does OK for other degrees, but not when there are lots of minima / maxima. It never does perfectly for degree > 1 because approximating parabolas with square-root functions doesn't always work, etc.
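Since the reverse fit assumes monotonicity, a small check before swapping the axes might be worthwhile. The sketch below is an assumption about how one could do it (the helper name is_monotonic is made up): it looks at the sign of successive differences of the sampled y values.

```python
import numpy as np

def is_monotonic(values):
    """Hypothetical helper: True if values are strictly increasing or strictly decreasing."""
    d = np.diff(values)
    return bool(np.all(d > 0) or np.all(d < 0))

xs = np.arange(-1, 1, 0.01)
cubic_ok = is_monotonic(xs ** 3)      # x**3 is increasing on the whole interval
parabola_ok = is_monotonic(xs ** 2)   # x**2 has a minimum inside the interval
print(cubic_ok, parabola_ok)
```

If the check fails, the data would first have to be split at the turning points, with each monotonic piece fitted separately.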
I am now trying to learn the ADMM algorithm (Boyd 2010) for LASSO regression.
I found out a very good example on this page.
The matlab code is shown here.
I tried to convert it into python language so that I could develop a better understanding.
Here is the code:
import scipy.io as io
import scipy.sparse as sp
import scipy.linalg as la
import numpy as np
def l1_norm(x):
    return np.sum(np.abs(x))
def l2_norm(x):
    return np.dot(x.ravel().T, x.ravel())
def fast_threshold(x, threshold):
    return np.multiply(np.sign(x), np.fmax(abs(x) - threshold, 0))
def lasso_admm(X, A, gamma):
    c = X.shape[1]
    r = A.shape[1]
    C = io.loadmat("C.mat")["C"]
    L = np.zeros(X.shape)
    rho = 1e-4
    maxIter = 200
    I = sp.eye(r)
    maxRho = 5
    cost = []
    for n in range(maxIter):
        B = la.solve(np.dot(A.T, A) + rho * I, np.dot(A.T, X) + rho * C - L)
        C = fast_threshold(B + L / rho, gamma / rho)
        L = L + rho * (B - C)
        rho = min(maxRho, rho * 1.1)
        cost.append(0.5 * l2_norm(X - np.dot(A, B)) + gamma * l1_norm(B))
    cost = np.array(cost).ravel()
    return B, cost
data = io.loadmat("lasso.mat")
A = data["A"]
X = data["X"]
B, cost = lasso_admm(X, A, gamma)
I have found the loss function did not converge after 100+ iterations. Matrix B did not tend to be sparse, on the other hand, the matlab code worked in different situations.
I have checked with different input data and compared with Matlab outputs, yet I still could not get hints.
Could anybody take a try?
Thank you in advance.
My gut feeling as to why this is not working to your expectations is your la.solve() call. la.solve() assumes that the matrix is full rank and is independent (i.e. invertible). When you use \ in MATLAB, what MATLAB does under the hood is that if the matrix is full rank, the exact inverse is found. However, should the matrix not be this way (i.e. overdetermined or underdetermined), the solution to the system is solved by least-squares instead. I would suggest you modify that call so that you're using lstsq instead of solve. As such, simply replace your la.solve() call with this:
sol = la.lstsq(np.dot(A.T, A) + rho * I, np.dot(A.T, X) + rho * C - L)
B = sol[0]
Note that lstsq returns several outputs in a 4-element tuple, in addition to the solution. The solution of the system is in the first element of this tuple, which is why I did B = sol[0]. Also returned are the sums of squared residuals (second element), the rank (third element) and the singular values of the matrix you are trying to invert when solving (fourth element).
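To make the tuple layout concrete, here is a small sketch on a deliberately singular 2x2 system. The matrix and right-hand side are invented for illustration and have nothing to do with the LASSO data.

```python
import numpy as np
import scipy.linalg as la

# Rank-1 matrix: the second row is twice the first, so it is not invertible
M = np.array([[1.0, 2.0],
              [2.0, 4.0]])
rhs = np.array([1.0, 2.0])   # consistent with M, so exact solutions exist

# lstsq returns (solution, residues, rank, singular values)
x, residues, rank, sv = la.lstsq(M, rhs)

print(x)      # minimum-norm least-squares solution
print(rank)   # effective rank of M
print(sv)     # singular values of M
```

Inspecting rank and the singular values this way is also a quick test of whether the matrix passed to the solver is actually close to singular.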
Also some peculiarities that I have noticed:
One thing that may or may not matter is the random generation of numbers. MATLAB and Python NumPy generate random numbers differently, so this may or may not affect your solution.
In MATLAB, Simon Lucey's code initializes L to be a zero matrix such that L = zeros(size(X));. However, in your Python code, you initialize L to be this way: L = np.zeros(C.shape);. You are using different variables to ascertain the shape of L. Obviously, the code wouldn't work if there was a dimension mismatch, but that's another thing that's different. Not sure if this will affect your solution either.
So far I haven't found anything out of the ordinary, so try that fix and let me know.
I would like to try to compute y=filter(b,a,x,zi) and dy[i]/dx[j] using FFTs rather than in the time domain for possible speedup in a GPU implementation.
I am not sure it's possible, particularly when zi is non-zero. I looked through how scipy.signal.lfilter in scipy and filter in octave are implemented. They are both done directly in the time domain, with scipy using direct form 2 and octave direct form 1 (from looking through code in DLD-FUNCTIONS/filter.cc). I haven't seen anywhere an FFT implementation analogous to fftfilt for FIR filters in MATLAB (i.e. a = [1.]).
I tried doing y = ifft(fft(b) / fft(a) * fft(x)) but this seems to be conceptually wrong. Also, I am not sure how to handle the initial transient zi. Any references, pointer to existing implementation, would be appreciated.
Example code,
import numpy as np
import scipy.signal as sg
import matplotlib.pyplot as plt
# create an IIR lowpass filter
N = 5
b, a = sg.butter(N, .4)
MN = max(len(a), len(b))
# create a random signal to be filtered
T = 100
P = T + MN - 1
x = np.random.randn(T)
zi = np.zeros(MN-1)
# time domain filter
ylf, zo = sg.lfilter(b, a, x, zi=zi)
# frequency domain filter
af = np.fft.fft(a, P)
bf = np.fft.fft(b, P)
xf = np.fft.fft(x, P)
yfft = np.real(np.fft.ifft(bf/af * xf))[:T]
# error
print(np.linalg.norm(yfft - ylf))
# plot, note error is larger at beginning and with larger N
plt.figure(1)
plt.clf()
plt.plot(ylf)
plt.plot(yfft)
You can reduce the error in your existing implementation by replacing P = T + MN - 1 with P = T + 2*MN - 1. This is purely intuitive, but it seems to me that the division of bf and af will require 2*MN terms, due to wraparound.
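The effect of the FFT length can be checked numerically. The sketch below assumes zero initial conditions and uses made-up names (fft_filter, err_short, err_long). Because an IIR impulse response never ends exactly, the circular result always contains some wrapped-around tail, but that tail decays, so a longer zero-padded length P should shrink the mismatch with lfilter.

```python
import numpy as np
import scipy.signal as sg

b, a = sg.butter(5, 0.4)            # same kind of filter as in the question
rng = np.random.default_rng(0)
T = 100
x = rng.standard_normal(T)
y_time = sg.lfilter(b, a, x)        # time-domain reference, zero initial state

def fft_filter(P):
    # Frequency response sampled on a P-point grid, then circular filtering
    H = np.fft.fft(b, P) / np.fft.fft(a, P)
    return np.real(np.fft.ifft(H * np.fft.fft(x, P)))[:T]

err_short = np.linalg.norm(fft_filter(T + len(a) - 1) - y_time)
err_long = np.linalg.norm(fft_filter(8 * T) - y_time)
print(err_short, err_long)
```

The factor 8 is arbitrary; how much padding is enough depends on how quickly the filter's impulse response dies out.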
C.S. Burrus has a pretty terse writeup of how to regard filtering, whether FIR or IIR, in a block oriented way, here. I haven't read it in detail, but I think it gives you the equations you need to implement IIR filtering by convolution, including intermediate states.
I've forgotten what little I knew about FFTs but you could take a look at sedit.py and frequency.py at http://jc.unternet.net/src/ and see if anything there would help.
Try scipy.signal.lfiltic(b, a, y, x=None) to obtain the initial conditions.
Doc text for lfiltic:
Given a linear filter (b,a) and initial conditions on the output y
and the input x, return the initial conditions on the state vector zi
which is used by lfilter to generate the output given the input.
If M=len(b)-1 and N=len(a)-1. Then, the initial conditions are given
in the vectors x and y as
x = {x[-1],x[-2],...,x[-M]}
y = {y[-1],y[-2],...,y[-N]}
If x is not given, its initial conditions are assumed zero.
If either vector is too short, then zeros are added
to achieve the proper length.
The output vector zi contains
zi = {z_0[-1], z_1[-1], ..., z_K-1[-1]} where K=max(M,N).
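As a rough illustration of the doc text, lfiltic can be used to resume filtering part-way through a signal. The sketch below is an assumption about typical usage (the split point k and the variable names are arbitrary): past outputs and inputs are passed most recent first, and the continued output should match filtering the whole signal in one go.

```python
import numpy as np
import scipy.signal as sg

b, a = sg.butter(5, 0.4)
order = max(len(a), len(b)) - 1

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y_full = sg.lfilter(b, a, x)              # reference: filter everything at once

k = 50                                     # arbitrary split point
# Most recent sample first: y[k-1], y[k-2], ... and x[k-1], x[k-2], ...
past_y = y_full[k - order:k][::-1]
past_x = x[k - order:k][::-1]
zi = sg.lfiltic(b, a, past_y, past_x)

# Continue filtering the second half from the reconstructed state
y_tail, zo = sg.lfilter(b, a, x[k:], zi=zi)
print(np.linalg.norm(y_tail - y_full[k:]))
```

The same zi vector is what a block-wise or frequency-domain scheme would need to carry from one block to the next.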