Fully vectorise numpy polyfit


Overview
I am running into performance issues with polyfit because it doesn't appear to accept broadcast arrays. I am aware from this post that the dependent data y can be multidimensional if you use numpy.polynomial.polynomial.polyfit. However, the x argument cannot be multidimensional. Is there any way around this?
Motivation
I need to compute the rate of change of some data. To match with an experiment I want to use the following method: take data y and x, for short sections of data fit a polynomial, then use the fitted coefficient as an estimate of the rate of change.
Illustration
import numpy as np
import matplotlib.pyplot as plt
n = 100
x = np.linspace(0, 10, n)
y = np.sin(x)
window_length = 10
ydot = [np.polyfit(x[j:j+window_length], y[j:j+window_length], 1)[0]
        for j in range(n - window_length)]
# integer division: a float index raises an error in Python 3
x_mids = [x[j + window_length//2] for j in range(n - window_length)]
plt.plot(x, y)
plt.plot(x_mids, ydot)
plt.show()
The blue line is the original data (a sine curve), while the green line is the estimated first derivative (approximately a cosine curve).
The problem
To vectorise this I did the following:
window_length = 10
vert_idx_list = np.arange(0, len(x) - window_length, 1)
hori_idx_list = np.arange(window_length)
A, B = np.meshgrid(hori_idx_list, vert_idx_list)
idx_array = A + B
x_array = x[idx_array]
y_array = y[idx_array]
This expands the two 1D vectors into 2D arrays of shape (n-window_length, window_length). Now I was hoping that polyfit would have an axis argument so I could parallelise the calculation, but no such luck.
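As an aside, the same windowed arrays can be sketched without building an index grid, assuming NumPy 1.20 or later where `sliding_window_view` is available. Note that it yields one more window than the index construction above (n - window_length + 1 rows), because it includes the last full window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n = 100
window_length = 10
x = np.linspace(0, 10, n)
y = np.sin(x)

# Each row is one window; the result is a read-only view, so no data is copied.
x_windows = sliding_window_view(x, window_length)
y_windows = sliding_window_view(y, window_length)

print(x_windows.shape)  # (n - window_length + 1, window_length)
```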
Does anyone have any suggestion for how to do this? I am open to

The way polyfit works is by solving a least-squares problem of the form:
y = [X].a
where y are your dependent coordinates, [X] is the Vandermonde matrix of the corresponding independent coordinates, and a is the vector of fitted coefficients.
In your case you are always computing a 1st degree polynomial approximation, and are actually only interested in the coefficient of the 1st degree term. This has a well-known closed-form solution that you can find in any statistics book, or derive yourself by premultiplying both sides of the above equation by the transpose of [X] to obtain a 2x2 linear system of equations. This all adds up to the value you want to calculate being:
>>> n = 10
>>> x = np.random.random(n)
>>> y = np.random.random(n)
>>> np.polyfit(x, y, 1)[0]
-0.29207474654700277
>>> (n*(x*y).sum() - x.sum()*y.sum()) / (n*(x*x).sum() - x.sum()*x.sum())
-0.29207474654700216
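For reference, the expression above is what the 2x2 normal equations give for a straight line y ≈ a_0 + a_1 x. The symbols a_0 and a_1 are notation introduced here for the intercept and slope; all sums run over the n points.

```latex
\begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}
=
\begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}
\quad\Longrightarrow\quad
a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}
```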
On top of that you have a sliding window running over your data, so you can use something akin to a 1D summed area table as follows:
def sliding_fitted_slope(x, y, win):
    x = np.concatenate(([0], x))
    y = np.concatenate(([0], y))
    Sx = np.cumsum(x)
    Sy = np.cumsum(y)
    Sx2 = np.cumsum(x*x)
    Sxy = np.cumsum(x*y)
    Sx = Sx[win:] - Sx[:-win]
    Sy = Sy[win:] - Sy[:-win]
    Sx2 = Sx2[win:] - Sx2[:-win]
    Sxy = Sxy[win:] - Sxy[:-win]
    return (win*Sxy - Sx*Sy) / (win*Sx2 - Sx*Sx)
With this code you can easily check that (notice I extended the range by 1):
>>> np.allclose(sliding_fitted_slope(x, y, window_length),
...             [np.polyfit(x[j:j+window_length], y[j:j+window_length], 1)[0]
...              for j in range(n - window_length + 1)])
True
And:
%timeit sliding_fitted_slope(x, y, window_length)
10000 loops, best of 3: 34.5 us per loop
%%timeit
[np.polyfit(x[j:j+window_length], y[j:j+window_length], 1)[0]
 for j in range(n - window_length + 1)]
100 loops, best of 3: 10.1 ms per loop
So it is about 300x faster for your sample data.
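The prefix-sum trick that sliding_fitted_slope relies on can be seen in isolation in the minimal sketch below. The helper name `window_sums` is hypothetical and not part of the answer's code.

```python
import numpy as np

def window_sums(values, win):
    """Sum of every length-`win` window, via differences of prefix sums."""
    # prefix[k] holds the sum of the first k values, with prefix[0] == 0.
    prefix = np.concatenate(([0.0], np.cumsum(values)))
    return prefix[win:] - prefix[:-win]

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# Windows [1, 2, 3], [2, 3, 4], [3, 4, 5] sum to 6, 9 and 12.
print(window_sums(values, 3))
```

One design caveat: each window sum is the difference of two running totals, so for very long arrays or x values with a large offset some floating-point precision can be lost compared with fitting each window separately.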

Sorry for answering my own question, but after 20 more minutes of trying to get to grips with it I have the following solution:
ydot = np.polynomial.polynomial.polyfit(x_array[0], y_array.T, 1)[-1]
One confusing part is that np.polyfit returns the coefficients with the highest power first. In np.polynomial.polynomial.polyfit the highest power is last (hence the -1 instead of 0 index).
Another confusion is that we use only the first slice of x (x_array[0]). I think that this is okay because the fitted slope depends only on the differences between the x values, not on their absolute values. Since x is evenly spaced, every window is the first one shifted by a constant, which is like changing the reference x value.
If there is a better way to do this I am still happy to hear about it!
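A quick way to check the shared-x shortcut is to compare it with fits that use each window's own x values. The sketch below assumes evenly spaced x, as in the question. Only the slope is expected to agree, since the intercept does depend on where the window sits.

```python
import numpy as np
from numpy.polynomial import polynomial as P

x = np.linspace(0, 10, 100)
y = np.sin(x)
win = 10

# Windows as rows; every x-window is the first one shifted by a constant.
idx = np.arange(len(x) - win)[:, None] + np.arange(win)
x_array, y_array = x[idx], y[idx]

# One call fits all windows: each column of the 2D y argument is a separate data set.
slopes_shared_x = P.polyfit(x_array[0], y_array.T, 1)[-1]

# Reference: fit each window against its own x values.
slopes_own_x = np.array([P.polyfit(xr, yr, 1)[-1]
                         for xr, yr in zip(x_array, y_array)])

print(np.allclose(slopes_shared_x, slopes_own_x))
```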

Using an alternative method for calculating the rate of change may improve both speed and accuracy.
n = 1000
x = np.linspace(0, 10, n)
y = np.sin(x)
def timingPolyfit(x,y):
    window_length = 10
    vert_idx_list = np.arange(0, len(x) - window_length, 1)
    hori_idx_list = np.arange(window_length)
    A, B = np.meshgrid(hori_idx_list, vert_idx_list)
    idx_array = A + B
    x_array = x[idx_array]
    y_array = y[idx_array]
    ydot = np.polynomial.polynomial.polyfit(x_array[0], y_array.T, 1)[-1]
    # integer division: a float index raises an error in Python 3
    x_mids = [x[j + window_length//2] for j in range(n - window_length)]
    return ydot, x_mids
def timingSimple(x,y):
    # central difference; assumes evenly spaced x
    dy = (y[2:] - y[:-2])/2
    dx = x[1] - x[0]
    dydx = dy/dx
    return dydx, x[1:-1]
y1, x1 = timingPolyfit(x,y)
y2, x2 = timingSimple(x,y)
polyfitError = np.abs(y1 - np.cos(x1))
simpleError = np.abs(y2 - np.cos(x2))
print("polyfit average error: {:.2e}".format(np.average(polyfitError)))
print("simple average error: {:.2e}".format(np.average(simpleError)))
result = %timeit -o timingPolyfit(x,y)
result2 = %timeit -o timingSimple(x,y)
print("simple is {0} times faster".format(result.best / result2.best))
polyfit average error: 3.09e-03
simple average error: 1.09e-05
100 loops, best of 3: 3.2 ms per loop
100000 loops, best of 3: 9.46 µs per loop
simple is 337.995634151131 times faster
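The hand-written central difference in timingSimple can also be sketched with NumPy's built-in `np.gradient`, which additionally returns one-sided estimates at the two end points. This is offered as an equivalent formulation, not as something that was timed here.

```python
import numpy as np

x = np.linspace(0, 10, 1000)
y = np.sin(x)

# Central differences in the interior, one-sided differences at the two ends.
dydx = np.gradient(y, x)

# The interior points match the hand-written central difference.
manual = (y[2:] - y[:-2]) / (2 * (x[1] - x[0]))
print(np.allclose(dydx[1:-1], manual))
```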

Related

Vectorizing for loop with repeated indices in python

I am trying to optimize a snippet that gets called a lot (millions of times) so any type of speed improvement (hopefully removing the for-loop) would be great.
I am computing a correlation function of some j'th particle with all others
C_j(|r-r'|) = sqrt(E((s_j(r')-s_k(r))^2)) averaged over k.
My idea is to have a variable corrfun which bins data into some bins (the r, defined elsewhere). I find what bin of r each s_k belongs to and this is stored in ind. So ind[0] is the index of r (and thus the corrfun) for which the j=0 point corresponds to. Multiple points can fall into the same bin (in fact I want bins to be big enough to contain multiple points) so I sum together all of the (s_j(r')-s_k(r))^2 and then divide by number of points in that bin (stored in variable rw). The code I ended up making for this is the following (np is for numpy):
for k, v in enumerate(ind):
    if j==k:
        continue
    corrfun[v] += (s[k]-s[j])**2
    rw[v] += 1
rw2 = rw
rw2[rw < 1] = 1
corrfun = np.sqrt(np.divide(corrfun, rw2))
Note, the rw2 business was because I want to avoid divide by 0 problems but I do return the rw array and I want to be able to differentiate between the rw=0 and rw=1 elements. Perhaps there is a more elegant solution for this as well.
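One thing to watch in the snippet above: `rw2 = rw` binds a second name to the same array instead of copying it, so `rw2[rw < 1] = 1` also overwrites the zeros in `rw`. A minimal sketch of one way to keep `rw` intact uses `np.maximum`, which a later answer on this page also uses. The sample values are made up for illustration.

```python
import numpy as np

rw = np.array([0.0, 3.0, 0.0, 2.0])       # counts per bin (illustrative values)
corrfun = np.array([0.0, 1.2, 0.0, 0.5])  # summed squared differences per bin

# np.maximum returns a new array, so rw keeps its zeros.
safe_counts = np.maximum(rw, 1)
result = np.sqrt(corrfun / safe_counts)

print(rw)      # empty bins are still reported as 0
print(result)
```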
Is there a way to make the for-loop faster? While I would like to not add the self interaction (j==k) I am even ok with having self interaction if it means I can get significantly faster calculation (length of ind ~ 1E6 so self interaction is probably insignificant anyways).
Thank you!
Ilya
Edit:
Here is the full code. Note, in the full code I am averaging over j as well.
import numpy as np
def twopointcorr(x,y,s,dr):
    width = np.max(x)-np.min(x)
    height = np.max(y)-np.min(y)
    n = len(x)
    maxR = np.sqrt((width/2)**2 + (height/2)**2)
    r = np.arange(0, maxR, dr)
    print(r)
    corrfun = r*0
    rw = r*0
    print(maxR)
    ''' go through all points'''
    for j in range(0, n-1):
        hypot = np.sqrt((x[j]-x)**2+(y[j]-y)**2)
        ind = [np.abs(r-h).argmin() for h in hypot]
        for k, v in enumerate(ind):
            if j==k:
                continue
            corrfun[v] += (s[k]-s[j])**2
            rw[v] += 1
    rw2 = rw
    rw2[rw < 1] = 1
    corrfun = np.sqrt(np.divide(corrfun, rw2))
    return r, corrfun, rw
I debug test it the following way
from twopointcorr import twopointcorr
import numpy as np
import matplotlib.pyplot as plt
import time
n=1000
x = np.random.rand(n)
y = np.random.rand(n)
s = np.random.rand(n)
print('running two point corr function')
start_time = time.time()
r,corrfun,rw = twopointcorr(x,y,s,0.1)
print("--- Execution time is %s seconds ---" % (time.time() - start_time))
fig1=plt.figure()
plt.plot(r, corrfun,'-x')
fig2=plt.figure()
plt.plot(r, rw,'-x')
plt.show()
Again, the main issue is that in the real dataset n~1E6. I can resample to make it smaller, of course, but I would love to actually crank through the dataset.

Here is code that uses broadcasting, hypot, round and bincount to remove all the loops:
def twopointcorr2(x, y, s, dr):
    width = np.max(x)-np.min(x)
    height = np.max(y)-np.min(y)
    n = len(x)
    maxR = np.sqrt((width/2)**2 + (height/2)**2)
    r = np.arange(0, maxR, dr)
    osub = lambda x:np.subtract.outer(x, x)
    ind = np.clip(np.round(np.hypot(osub(x), osub(y)) / dr), 0, len(r)-1).astype(int)
    rw = np.bincount(ind.ravel())
    rw[0] -= len(x)
    corrfun = np.bincount(ind.ravel(), (osub(s)**2).ravel())
    return r, corrfun, rw
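To see what the two `np.bincount` calls in twopointcorr2 do, here is a minimal sketch on made-up data: without weights it counts the samples in each bin, and with weights it sums a value per bin.

```python
import numpy as np

bins = np.array([0, 2, 2, 1, 2])            # bin index of each sample
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # value carried by each sample

counts = np.bincount(bins, minlength=4)              # samples per bin
sums = np.bincount(bins, weights=vals, minlength=4)  # sum of values per bin

print(counts)
print(sums)
```

The `minlength` argument is an addition in this sketch. It pads the result so that it covers every bin even when the highest bins are empty, which may be worth considering if the output has to line up with `r`.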
To compare, I modified your code as follows:
def twopointcorr(x,y,s,dr):
    width = np.max(x)-np.min(x)
    height = np.max(y)-np.min(y)
    n = len(x)
    maxR = np.sqrt((width/2)**2 + (height/2)**2)
    r = np.arange(0, maxR, dr)
    corrfun = r*0
    rw = r*0
    for j in range(0, n):
        hypot = np.sqrt((x[j]-x)**2+(y[j]-y)**2)
        ind = [np.abs(r-h).argmin() for h in hypot]
        for k, v in enumerate(ind):
            if j==k:
                continue
            corrfun[v] += (s[k]-s[j])**2
            rw[v] += 1
    return r, corrfun, rw
and here is the code to check the results:
import numpy as np
n=1000
x = np.random.rand(n)
y = np.random.rand(n)
s = np.random.rand(n)
r1, corrfun1, rw1 = twopointcorr(x,y,s,0.1)
r2, corrfun2, rw2 = twopointcorr2(x,y,s,0.1)
assert np.allclose(r1, r2)
assert np.allclose(corrfun1, corrfun2)
assert np.allclose(rw1, rw2)
and the %timeit results:
%timeit twopointcorr(x,y,s,0.1)
%timeit twopointcorr2(x,y,s,0.1)
outputs:
1 loop, best of 3: 5.16 s per loop
10 loops, best of 3: 134 ms per loop

Your original code on my system runs in about 5.7 seconds. I fully vectorized the inner loop and got it to run in 0.39 seconds. Simply replace your "go through all points" loop with this:
from scipy.spatial.distance import cdist

points = np.column_stack((x,y))
hypots = cdist(points, points)
inds = np.rint(hypots.clip(max=maxR) / dr).astype(int) # np.int was removed in NumPy 1.24
# go through all points
for j in range(n): # n.b. previously n-1, not sure why
    ind = inds[j]
    np.add.at(corrfun, ind, (s - s[j])**2)
    np.add.at(rw, ind, 1)
    rw[ind[j]] -= 1 # subtract self
The first observation was that your hypot code was computing 2D distances, so I replaced that with cdist from SciPy to do it all in a single call. The second was that the inner for loop was slow, and thanks to an insightful comment from @hpaulj I vectorized that as well using np.add.at().
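The reason `np.add.at` is needed rather than a plain `corrfun[ind] += ...` is that the bin indices repeat. A minimal sketch of the difference, on made-up values:

```python
import numpy as np

idx = np.array([0, 1, 1, 1])
vals = np.array([10.0, 1.0, 1.0, 1.0])

buffered = np.zeros(3)
buffered[idx] += vals             # repeated index 1 is written once, not accumulated

unbuffered = np.zeros(3)
np.add.at(unbuffered, idx, vals)  # every occurrence of index 1 is accumulated

print(buffered)
print(unbuffered)
```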
Since you asked how to vectorize the remaining outer loop as well, I did that later. It now takes 0.25 seconds to run, for a total speedup of over 20x. Here's the final code:
from scipy.spatial.distance import cdist

points = np.column_stack((x,y))
hypots = cdist(points, points)
inds = np.rint(hypots.clip(max=maxR) / dr).astype(int) # np.int was removed in NumPy 1.24
sn = np.tile(s, (n,1)) # n copies of s
diffs = (sn - sn.T)**2 # squares of pairwise differences
np.add.at(corrfun, inds, diffs)
rw = np.bincount(inds.flatten(), minlength=len(r))
np.subtract.at(rw, inds.diagonal(), 1) # subtract self
This uses more memory but does produce a substantial speedup vs. the single-loop version above.

OK, so as it turns out, outer products are incredibly memory-expensive. However, using the answers from @HYRY and @JohnZwinck I was able to write code that is still roughly linear in n in memory and runs fast (0.5 seconds for the test case):
import numpy as np
def twopointcorr(x,y,s,dr,maxR=-1):
    width = np.max(x)-np.min(x)
    height = np.max(y)-np.min(y)
    n = len(x)
    if maxR < dr:
        maxR = np.sqrt((width/2)**2 + (height/2)**2)
    r = np.arange(0, maxR+dr, dr)
    corrfun = r*0
    rw = r*0
    for j in range(0, n):
        ind = np.clip(np.round(np.hypot(x[j]-x,y[j]-y) / dr), 0, len(r)-1).astype(int)
        np.add.at(corrfun, ind, (s - s[j])**2)
        np.add.at(rw, ind, 1)
    rw[0] -= n
    corrfun = np.sqrt(np.divide(corrfun, np.maximum(rw,1)))
    r=np.delete(r,-1)
    rw=np.delete(rw,-1)
    corrfun=np.delete(corrfun,-1)
    return r, corrfun, rw

Python baseline correction library

I am currently working with some Raman spectra data, and I am trying to correct the skew in my data caused by fluorescence. Take a look at the graph below:
I am pretty close to achieving what I want. As you can see, I am fitting a polynomial to all my data, whereas I should really just be fitting a polynomial to the local minima.
Ideally I would like a polynomial fit which, when subtracted from my original data, would result in something like this:
Are there any built-in libraries that do this already?
If not, is there a simple algorithm anyone can recommend?

I found an answer to my question, just sharing for everyone who stumbles upon this.
There is an algorithm called "Asymmetric Least Squares Smoothing" by P. Eilers and H. Boelens in 2005. The paper is free and you can find it on google.
def baseline_als(y, lam, p, niter=10):
    L = len(y)
    D = sparse.csc_matrix(np.diff(np.eye(L), 2))
    w = np.ones(L)
    for i in xrange(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z

The following code works on Python 3.6.
This is adapted from the accepted correct answer to avoid the dense matrix diff computation (which can easily cause memory issues) and uses range (not xrange)
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
def baseline_als(y, lam, p, niter=10):
    L = len(y)
    D = sparse.diags([1,-2,1],[0,-1,-2], shape=(L,L-2))
    w = np.ones(L)
    for i in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z

There is a Python library available for baseline correction/removal. It has the ModPoly, IModPoly and ZhangFit algorithms, which return baseline-corrected results when you input the original values as a Python list or pandas Series and specify the polynomial degree.
Install the library with pip install BaselineRemoval. Below is an example:
from BaselineRemoval import BaselineRemoval
input_array=[10,20,1.5,5,2,9,99,25,47]
polynomial_degree=2 #only needed for Modpoly and IModPoly algorithm
baseObj=BaselineRemoval(input_array)
Modpoly_output=baseObj.ModPoly(polynomial_degree)
Imodpoly_output=baseObj.IModPoly(polynomial_degree)
Zhangfit_output=baseObj.ZhangFit()
print('Original input:',input_array)
print('Modpoly base corrected values:',Modpoly_output)
print('IModPoly base corrected values:',Imodpoly_output)
print('ZhangFit base corrected values:',Zhangfit_output)
Original input: [10, 20, 1.5, 5, 2, 9, 99, 25, 47]
Modpoly base corrected values: [-1.98455800e-04 1.61793368e+01 1.08455179e+00 5.21544654e+00
7.20210508e-02 2.15427531e+00 8.44622093e+01 -4.17691125e-03
8.75511661e+00]
IModPoly base corrected values: [-0.84912125 15.13786196 -0.11351367 3.89675187 -1.33134142 0.70220645
82.99739548 -1.44577432 7.37269705]
ZhangFit base corrected values: [ 8.49924691e+00 1.84994576e+01 -3.31739230e-04 3.49854060e+00
4.97412948e-01 7.49628529e+00 9.74951576e+01 2.34940300e+01
4.54929023e+01

Recently, I needed to use this method. The code from answers works well, but it obviously overuses the memory. So, here is my version with optimized memory usage.
def baseline_als_optimized(y, lam, p, niter=10):
    L = len(y)
    D = sparse.diags([1,-2,1],[0,-1,-2], shape=(L,L-2))
    D = lam * D.dot(D.transpose()) # Precompute this term since it does not depend on `w`
    w = np.ones(L)
    W = sparse.spdiags(w, 0, L, L)
    for i in range(niter):
        W.setdiag(w) # Do not create a new matrix, just update diagonal values
        Z = W + D
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z
According to my benchmarks below, it is also about 1.5 times faster.
%%timeit -n 1000 -r 10 y = randn(1000)
baseline_als(y, 10000, 0.05) # function from @jpantina's answer
# 20.5 ms ± 382 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)
%%timeit -n 1000 -r 10 y = randn(1000)
baseline_als_optimized(y, 10000, 0.05)
# 13.3 ms ± 874 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)
NOTE 1: The original article says:
To emphasize the basic simplicity of the algorithm, the number of iterations has been fixed to 10. In practical applications one should check whether the weights show any change; if not, convergence has been attained.
So, it means that the more correct way to stop iteration is to check that ||w_new - w|| < tolerance
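A minimal sketch of such a stopping rule is shown below. The function name `baseline_als_tol`, the `max_iter` and `tol` parameters, and the synthetic test signal are all assumptions made for illustration; the core update is the same as in the functions above.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als_tol(y, lam, p, max_iter=50, tol=1e-6):
    """ALS baseline that stops once the weights stop changing (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2))
    penalty = lam * D.dot(D.transpose())  # does not depend on the weights
    w = np.ones(L)
    for n_iter in range(1, max_iter + 1):
        W = sparse.diags(w, 0, shape=(L, L))
        z = spsolve((W + penalty).tocsc(), w * y)
        w_new = p * (y > z) + (1 - p) * (y < z)
        if np.linalg.norm(w_new - w) < tol:  # weights unchanged: converged
            break
        w = w_new
    return z, n_iter

# Synthetic signal (assumed for illustration): sloped baseline, one peak, slight noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
signal = 2 + t + 10 * np.exp(-((t - 0.5) / 0.02) ** 2) + 0.01 * rng.standard_normal(200)

z, n_iter = baseline_als_tol(signal, lam=1e4, p=0.01)
print(n_iter)
```

Because the weights only take the values p and 1-p (or 0 on an exact tie), the norm of the change drops to exactly zero once the set of points above the baseline stops changing, so the tolerance mainly guards against floating-point comparison issues.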
NOTE 2: Another useful quote (from @glycoaddict's comment) gives an idea of how to choose values of the parameters.
There are two parameters: p for asymmetry and λ for smoothness. Both have to be tuned to the data at hand. We found that generally 0.001 ≤ p ≤ 0.1 is a good choice (for a signal with positive peaks) and 10^2 ≤ λ ≤ 10^9, but exceptions may occur. In any case one should vary λ on a grid that is approximately linear for log λ. Often visual inspection is sufficient to get good parameter values.
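A grid that is linear in log λ can be sketched with `np.logspace`. The number of grid points and the three p values below are arbitrary choices for illustration, taken from within the quoted ranges.

```python
import numpy as np

# One value per decade from 1e2 to 1e9, i.e. evenly spaced in log10(lambda).
lam_grid = np.logspace(2, 9, num=8)

# A few asymmetry values inside the suggested range 0.001 <= p <= 0.1.
p_grid = np.array([0.001, 0.01, 0.1])

for lam in lam_grid:
    for p in p_grid:
        pass  # call the baseline function here and inspect the result visually
```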

I worked on the version of the algorithm referenced by glinka in a previous comment, which is an improvement on the penalized weighted least squares method published in a relatively recent paper. I took Rustam Guliev's code as the basis for this one:
from scipy import sparse
from scipy.sparse import linalg
import numpy as np
from numpy.linalg import norm
def baseline_arPLS(y, ratio=1e-6, lam=100, niter=10, full_output=False):
    L = len(y)
    diag = np.ones(L - 2)
    D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L - 2)
    H = lam * D.dot(D.T) # The transposes are flipped w.r.t the Algorithm on pg. 252
    w = np.ones(L)
    W = sparse.spdiags(w, 0, L, L)
    crit = 1
    count = 0
    while crit > ratio:
        z = linalg.spsolve(W + H, W * y)
        d = y - z
        dn = d[d < 0]
        m = np.mean(dn)
        s = np.std(dn)
        w_new = 1 / (1 + np.exp(2 * (d - (2*s - m))/s))
        crit = norm(w_new - w) / norm(w)
        w = w_new
        W.setdiag(w) # Do not create a new matrix, just update diagonal values
        count += 1
        if count > niter:
            print('Maximum number of iterations exceeded')
            break
    if full_output:
        info = {'num_iter': count, 'stop_criterion': crit}
        return z, d, info
    else:
        return z
In order to test the algorithm, I created a spectrum similar to the one shown in Fig. 3 of the paper, by first generating a simulated spectrum consisting of multiple Gaussian peaks:
def spectra_model(x):
    coeff = np.array([100, 200, 100])
    mean = np.array([300, 750, 800])
    stdv = np.array([15, 30, 15])
    terms = []
    for ind in range(len(coeff)):
        term = coeff[ind] * np.exp(-((x - mean[ind]) / stdv[ind])**2)
        terms.append(term)
    spectra = sum(terms)
    return spectra
x_vals = np.arange(1, 1001)
spectra_sim = spectra_model(x_vals)
Then, I created a third-order interpolating polynomial using 4 points taken directly from the paper:
from scipy.interpolate import CubicSpline
x_poly = np.array([0, 250, 700, 1000])
y_poly = np.array([200, 180, 230, 200])
poly = CubicSpline(x_poly, y_poly)
baseline = poly(x_vals)
noise = np.random.randn(len(x_vals)) * 0.1
spectra_base = spectra_sim + baseline + noise
Finally, I used the baseline correction algorithm to subtract the baseline out of the altered spectra (spectra_base):
_, spectra_arPLS, info = baseline_arPLS(spectra_base, lam=1e4, niter=10,
full_output=True)
The results were (for reference, I compared with the pure ALS implementation by Rustam Guliev, using lam = 1e4 and p = 0.001):

I know this is an old question, but I stumbled upon it a few months ago and implemented the equivalent answer using scipy.sparse routines.
# Baseline removal
def baseline_als(y, lam, p, niter=10):
    s = len(y)
    # assemble difference matrix
    D0 = sparse.eye( s )
    d1 = [np.ones( s-1 ) * -2]
    D1 = sparse.diags( d1, [-1] )
    d2 = [ np.ones( s-2 ) * 1]
    D2 = sparse.diags( d2, [-2] )
    D = D0 + D2 + D1
    w = np.ones( s )
    for i in range( niter ):
        W = sparse.diags( [w], [0] )
        Z = W + lam*D.dot( D.transpose() )
        z = spsolve( Z, w*y )
        w = p * (y > z) + (1-p) * (y < z)
    return z
Cheers,
Pedro.

Correctly annotate a numba function using jit

I started with this code to calculate a simple matrix multiplication. It runs with %timeit in around 7.85s on my machine.
To try to speed this up I tried Cython, which reduced the time to 0.4s. I also want to try the numba jit compiler to see if I can get similar speed-ups (with less effort). But adding the @jit annotation appears to give exactly the same timings (~7.8s). I know it can't figure out the types of the calculate_z_numpy() call but I'm not sure what I can do to coerce it. Any ideas?
from numba import jit
import numpy as np
@jit('f8(c8[:],c8[:],uint)')
def calculate_z_numpy(q, z, maxiter):
    """use vector operations to update all zs and qs to create new output array"""
    output = np.resize(np.array(0, dtype=np.int32), q.shape)
    for iteration in range(maxiter):
        z = z*z + q
        done = np.greater(abs(z), 2.0)
        q = np.where(done, 0+0j, q)
        z = np.where(done, 0+0j, z)
        output = np.where(done, iteration, output)
    return output
def calc_test():
    w = h = 1000
    maxiter = 1000
    # make a list of x and y values which will represent q
    # xx and yy are the co-ordinates, for the default configuration they'll look like:
    # if we have a 1000x1000 plot
    # xx = [-2.13, -2.1242,-2.1184000000000003, ..., 0.7526000000000064, 0.7584000000000064, 0.7642000000000064]
    # yy = [1.3, 1.2948, 1.2895999999999999, ..., -1.2844000000000058, -1.2896000000000059, -1.294800000000006]
    x1, x2, y1, y2 = -2.13, 0.77, -1.3, 1.3
    x_step = (float(x2 - x1) / float(w)) * 2
    y_step = (float(y1 - y2) / float(h)) * 2
    # the np.complex alias was removed in NumPy 1.24; the builtin complex is equivalent
    y = np.arange(y2,y1-y_step,y_step,dtype=complex)
    x = np.arange(x1,x2,x_step)
    q1 = np.empty(y.shape[0],dtype=complex)
    q1.real = x
    q1.imag = y
    # Transpose y
    x_y_square_matrix = x+y[:, np.newaxis] # it is np.complex128
    # convert square matrix to a flatted vector using ravel
    q2 = np.ravel(x_y_square_matrix)
    # create z as a 0+0j array of the same length as q
    # note that it defaults to reals (float64) unless told otherwise
    z = np.zeros(q2.shape, np.complex128)
    output = calculate_z_numpy(q2, z, maxiter)
    print(output)
calc_test()

I figured out how to do this with some help from someone else.
@jit('i4[:](c16[:],c16[:],i4,i4[:])',nopython=True)
def calculate_z_numpy(q, z, maxiter,output):
    """use explicit loops to update all zs and qs in place and fill the output array"""
    for iteration in range(maxiter):
        for i in range(len(z)):
            z[i] = z[i]*z[i] + q[i]  # same update as z = z*z + q in the question
            if abs(z[i]) > 2.0:  # complex values have to be compared by magnitude
                output[i] = iteration
                z[i] = 0+0j
                q[i] = 0+0j
    return output
What I learnt is to use numpy data structures as inputs (for typing), but to use C-like paradigms for looping inside the function.
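The same idea can be sketched in a form that lets Numba infer the types instead of spelling out a signature. The function name `escape_iterations` and the pure-Python fallback for when Numba is not installed are assumptions made for this sketch. The loop applies the z*z + q update and the |z| > 2 test from the question.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without Numba.
    def njit(func):
        return func

@njit
def escape_iterations(q_in, maxiter):
    """Per point, the last iteration at which |z| exceeded 2 (0 if it never did)."""
    q = q_in.copy()          # work on a copy so the caller's array is untouched
    z = np.zeros_like(q)
    output = np.zeros(q.shape[0], dtype=np.int32)
    for iteration in range(maxiter):
        for i in range(q.shape[0]):
            z[i] = z[i] * z[i] + q[i]
            if abs(z[i]) > 2.0:
                output[i] = iteration
                z[i] = 0j    # reset so this point stays at zero afterwards
                q[i] = 0j
    return output

points = np.array([0, 1, 3, -1, 2], dtype=np.complex128)
print(escape_iterations(points, 50))
```

As in the question's code, a result of 0 is ambiguous: it is returned both for points that escape on the very first iteration and for points that never escape.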
This runs in 402ms, which is a touch faster than the Cython code (0.45s), so for fairly minimal work in rewriting the loop explicitly we have a Python version that is (just) faster than C.

Python/Cython: efficient regression

I'm working on an algorithm (written in Python/Cython) that estimates the gradient of each point in noisy data, using a variable window size. It's working very well, but it seems that the algorithm is limited by the regression part. Here is what I use:
cdef double regression(np.ndarray[DTYPE_t, ndim=1] data, np.ndarray[DTYPE_t, ndim=1] time, unsigned int leftlim2, unsigned int rightlim2):
    cdef unsigned int length, j
    cdef double x, y, sumx, sumy, xy, xx, result, a, b, invlen
    length = 0
    sumx = 0
    sumy = 0
    xy = 0
    xx = 0
    for j from leftlim2 <= j < rightlim2:
        x = time[j]
        y = data[j]
        sumx += x
        sumy += y
        xy += x*y
        xx += x*x
    length = rightlim2 - leftlim2
    invlen = 1.0/length
    a = xy-(sumx*sumy)*invlen
    b = xx-(sumx*sumx)*invlen
    result = a/b
    return result
Inputs:
vectors/arrays of the data and time that was measured during an experiment. The data array contains noisy data of, for example, applied force, the time array contains equally spaced time recordings (0.1s, 0.2s, 0.3s, etc.)
the left and right limits of how much data has to be included for the regression, provided as indices (i.e. the data used for regression is given by data[leftlim2:rightlim2])
Output: the slope of a straight line (y = a*x + b) approximating the dataset.
I'm only interested in the slope, not in the intercept, hence the use of a loop rather than regression using matrix-vector multiplications. I was wondering if anyone knows a way to increase the efficiency of the regression, without sacrificing accuracy. Perhaps there's a way to exploit the equal spacing of the time array?

I'm not familiar with Cython syntax, but something like this should speed things up:
def my_regression(data, time, leftlim, rightlim):
    timeslice = time[leftlim:rightlim]
    dataslice = data[leftlim:rightlim]
    sumx = sum(timeslice)
    a = sum(timeslice*dataslice)-sum(dataslice)*sumx/(rightlim-leftlim)
    b = sum(timeslice**2)-sumx**2/(rightlim-leftlim)
    return a/b
Timing results:
n = 1000
data = np.random.random(n)
time = np.arange(n,dtype=float)/n
leftlim = 10
rightlim = 900
%timeit my_regression(data,time,leftlim,rightlim)
>> 10000 loops, best of 3: 74.3 µs per loop
%timeit your_regression(data,time,leftlim,rightlim)
>> 100 loops, best of 3: 2.88 ms per loop
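On the question of exploiting the equal spacing: when the time stamps are exactly evenly spaced, the sums over time have closed forms, so only one weighted sum over the data is needed. The sketch below is an illustration under that assumption. The name `slope_uniform` and the sample interval argument `dt` are hypothetical, and it was not benchmarked against the versions above.

```python
import numpy as np

def slope_uniform(data, dt, left, right):
    """Least-squares slope of data[left:right] sampled every `dt` time units."""
    n = right - left  # needs at least two samples
    # Centred sample positions -(n-1)/2, ..., (n-1)/2 in units of dt.
    k = np.arange(n) - (n - 1) / 2.0
    # sum(k**2) has the closed form n*(n**2 - 1)/12, so no sums over time are needed.
    return 12.0 * np.dot(k, data[left:right]) / (dt * n * (n * n - 1))

rng = np.random.default_rng(0)
t = np.arange(1000) * 0.1
data = 3.0 * t + rng.standard_normal(1000)

print(slope_uniform(data, 0.1, 10, 900))
print(np.polyfit(t[10:900], data[10:900], 1)[0])
```

The weights `k` depend only on the window length, so for a fixed window size they could be computed once and reused for every point.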

Scipy Fast 1-D interpolation without any loop

I have two 2D arrays, x(ni, nj) and y(ni, nj), that I need to interpolate over one axis. I want to interpolate along the last axis for every ni.
I wrote
import numpy as np
from scipy.interpolate import interp1d
z = np.asarray([200,300,400,500,600])
out = []
for i in range(ni):
    f = interp1d(x[i,:], y[i,:], kind='linear')
    out.append(f(z))
out = np.asarray(out)
However, I think this method is inefficient and slow due to loop if array size is too large. What is the fastest way to interpolate multi-dimensional array like this? Is there any way to perform linear and cubic interpolation without loop? Thanks.

The method you propose does have a python loop, so for large values of ni it is going to get slow. That said, unless you are going to have large ni you shouldn't worry much.
I have created sample input data with the following code:
def sample_data(n_i, n_j, z_shape) :
x = np.random.rand(n_i, n_j) * 1000
x.sort()
x[:,0] = 0
x[:, -1] = 1000
y = np.random.rand(n_i, n_j)
z = np.random.rand(*z_shape) * 1000
return x, y, z
And have tested it with these two versions of linear interpolation:
def interp_1(x, y, z) :
    rows, cols = x.shape
    out = np.empty((rows,) + z.shape, dtype=y.dtype)
    for j in range(rows) :
        out[j] =interp1d(x[j], y[j], kind='linear', copy=False)(z)
    return out
def interp_2(x, y, z) :
    rows, cols = x.shape
    row_idx = np.arange(rows).reshape((rows,) + (1,) * z.ndim)
    col_idx = np.argmax(x.reshape(x.shape + (1,) * z.ndim) > z, axis=1) - 1
    ret = y[row_idx, col_idx + 1] - y[row_idx, col_idx]
    ret /= x[row_idx, col_idx + 1] - x[row_idx, col_idx]
    ret *= z - x[row_idx, col_idx]
    ret += y[row_idx, col_idx]
    return ret
interp_1 is an optimized version of your code, following Dave's answer. interp_2 is a vectorized implementation of linear interpolation that avoids any python loop whatsoever. Coding something like this requires a sound understanding of broadcasting and indexing in numpy, and some things are going to be less optimized than what interp1d does. A prime example being finding the bin in which to interpolate a value: interp1d will surely break out of loops early once it finds the bin, the above function is comparing the value to all bins.
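The bin lookup that interp_2 relies on can be seen on a small made-up example. The sketch below compares every z against every knot, takes the first knot strictly greater than z, and checks the result against `np.interp` applied row by row.

```python
import numpy as np

x = np.array([[0.0, 10.0, 20.0, 30.0],
              [0.0,  5.0, 25.0, 30.0]])
y = np.array([[0.0, 1.0, 4.0, 9.0],
              [1.0, 2.0, 0.0, 3.0]])
z = np.array([7.0, 22.0])

# Boolean array of shape (rows, knots, len(z)); argmax along the knot axis
# returns the first knot strictly greater than z, and the bin starts one before it.
col_idx = np.argmax(x[:, :, None] > z, axis=1) - 1
row_idx = np.arange(x.shape[0])[:, None]

x0, x1 = x[row_idx, col_idx], x[row_idx, col_idx + 1]
y0, y1 = y[row_idx, col_idx], y[row_idx, col_idx + 1]
out = y0 + (y1 - y0) * (z - x0) / (x1 - x0)

# Reference: one np.interp call per row.
reference = np.array([np.interp(z, xr, yr) for xr, yr in zip(x, y)])
print(col_idx)
print(np.allclose(out, reference))
```

This lookup assumes every z lies at or above the first knot and strictly below the last knot of each row. Outside that range no suitable knot is found and the computed bin index is wrong.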
So the result is going to be very dependent on what n_i and n_j are, and even how long your array z of values to interpolate is. If n_j is small and n_i is large, you should expect an advantage from interp_2, and from interp_1 if it is the other way around. Smaller z should be an advantage to interp_2, longer ones to interp_1.
I have actually timed both approaches with a variety of n_i and n_j, for z of shape (5,) and (50,), here are the graphs:
So it seems that for z of shape (5,) you should go with interp_2 whenever n_j < 1000, and with interp_1 elsewhere. Not surprisingly, the threshold is different for z of shape (50,), now being around n_j < 100. It seems tempting to conclude that you should stick with your code if n_j * len(z) > 5000, but change it to something like interp_2 above if not, but there is a great deal of extrapolating in that statement! If you want to further experiment yourself, here's the code I used to produce the graphs.
import timeit
import matplotlib.pyplot as plt

n_s = np.logspace(1, 3.3, 25)
int_1 = np.empty((len(n_s),) * 2)
int_2 = np.empty((len(n_s),) * 2)
z_shape = (5,)
for i, n_i in enumerate(n_s) :
    print(int(n_i))
    for j, n_j in enumerate(n_s) :
        x, y, z = sample_data(int(n_i), int(n_j), z_shape)
        int_1[i, j] = min(timeit.repeat('interp_1(x, y, z)',
                                        'from __main__ import interp_1, x, y, z',
                                        repeat=10, number=1))
        int_2[i, j] = min(timeit.repeat('interp_2(x, y, z)',
                                        'from __main__ import interp_2, x, y, z',
                                        repeat=10, number=1))
cs = plt.contour(n_s, n_s, np.transpose(int_1-int_2))
plt.clabel(cs, inline=1, fontsize=10)
plt.xlabel('n_i')
plt.ylabel('n_j')
plt.title('timeit(interp_2) - timeit(interp_1), z.shape=' + str(z_shape))
plt.show()

One optimization is to allocate the result array once like so:
import numpy as np
from scipy.interpolate import interp1d
z = np.asarray([200,300,400,500,600])
out = np.zeros( [ni, len(z)], dtype=np.float32 )
for i in range(ni):
    f = interp1d(x[i,:], y[i,:], kind='linear')
    out[i,:]=f(z)
This will save you some of the memory copying that occurs in the calls to out.append(...) in your implementation.
