I have an array that is not monotonically increasing. I would like to make it monotonically increasing by applying a constant rate wherever the array decreases.
I have created a small example here where the rate is 0.2:
import numpy as np
import matplotlib.pyplot as plt

# Rate
rate = 0.2
# Array to interpolate
arr1 = np.array([0,1,2,3,4,4,4,3,2,2.5,3.5,5.2,7,10,9.5,np.nan,np.nan,np.nan,11.2, 11.4, 12,10,9,9.5,10.2,10.5,10.8,12,12.5,15],dtype=float)
# Line with constant rate at first monotonic decrease (index 6)
xx1 = 6
xr1 = np.arange(0, arr1.shape[0] + 1, dtype=float)
yr1 = rate*xr1 + (arr1[xx1] - rate*xx1)
# Line with constant rate at second monotonic decrease (index 13)
xx2 = 13
xr2 = np.arange(0, arr1.shape[0] + 1, dtype=float)
yr2 = rate*xr2 + (arr1[xx2] - rate*xx2)
# Line with constant rate at third monotonic decrease (index 20)
xx3 = 20
xr3 = np.arange(0, arr1.shape[0] + 1, dtype=float)
yr3 = rate*xr3 + (arr1[xx3] - rate*xx3)
plt.figure()
plt.plot(arr1, '.-', label='Original')
plt.plot(xr1, yr1, label='Const Rate line 1')
plt.plot(xr2, yr2, label='Const Rate line 2')
plt.plot(xr3, yr3, label='Const Rate line 3')
plt.legend()
plt.grid()
The "Original" array is my dataset.
The final result I would like is the blue plus the red dashed line. In the figure I also highlighted the "constant rate" curves.
Since I have very large arrays (millions of records), I would like to avoid for-loops over the entire array.
Thanks a lot to everybody for the help!
Here's a different option: if you are only interested in plotting a monotonically increasing curve from your data, then you can simply skip the unwanted points between two successive increasing points, e.g. between arr1[6] = 4 and arr1[11] = 5.2, by connecting them with a line.
import numpy as np
import matplotlib.pyplot as plt
arr1 = np.array([0,1,2,3,4,4,4,3,2,2.5,3.5,5.2,7,10,9.5,np.nan,np.nan,np.nan,11.2, 11.4, 12,10,9,9.5,10.2,10.5,10.8,12,12.5,15],dtype=float)
mask = (arr1 == np.maximum.accumulate(np.nan_to_num(arr1)))
x = np.arange(len(arr1))
plt.figure()
plt.plot(x, arr1,'.-',label='Original')
plt.plot(x[mask], arr1[mask], 'r-', label='Interp.')
plt.legend()
plt.grid()
arr2 = arr1[1:] - arr1[:-1]
ind = np.where(arr2 < 0)[0] + 1  # indices where the array decreases
for i in ind:
    arr1[i] = arr1[i - 1] + rate
You may first need to replace any np.nan with a value, such as np.amin(arr1).
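For example, one minimal way to do that NaN replacement up front (a sketch, using `np.nanmin` as the fill value):

```python
import numpy as np

arr1 = np.array([1.0, np.nan, 2.0, np.nan, 3.0])
# Replace every NaN with the smallest non-NaN value before interpolating
arr1 = np.where(np.isnan(arr1), np.nanmin(arr1), arr1)
```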
I would like to avoid for-loops over the entire array.
Frankly speaking, it is hard to avoid loops entirely: NumPy is a C library, and under the hood it runs for-loops implemented in C/C++; functions such as np.argwhere or np.all also iterate over the data internally.
Instead, I suggest using at least one explicit loop in Python (the array is iterated over only once):
arr0 = np.zeros_like(arr1)
num = 1
rate = .2
while num < len(arr1):
    if arr1[num] < arr1[num-1] or np.isnan(arr1[num]):
        start = arr1[num-1]
        while num < len(arr1) and (start > arr1[num] or np.isnan(arr1[num])):
            arr0[num] = arr0[num-1] + rate
            num += 1
        continue
    arr0[num] = arr1[num]
    num += 1
Your problem can be expressed in one simple recursive difference equation:
y[n] = max(y[n-1] + 0.2, x[n])
So the direct Python form would be
def func(a):
    out = np.zeros_like(a)
    out[0] = a[0]
    for i in range(1, len(a)):
        out[i] = max(out[i-1] + 0.2, a[i])
    return out
Unfortunately, this equation is recursive and non-linear, so finding a vectorized algorithm may be difficult.
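That said, this particular max-plus recurrence does admit a vectorized form: unrolling y[n] = max(y[n-1] + 0.2, x[n]) gives y[n] = max over all k <= n of (x[k] + 0.2*(n-k)), which is a running maximum of x[k] - 0.2*k plus the ramp 0.2*n. A sketch (my addition, assuming NaNs have been dealt with beforehand):

```python
import numpy as np

def func_vectorized(a, rate=0.2):
    # y[n] = max(y[n-1] + rate, a[n]) unrolls to
    # y[n] = max_{k<=n} (a[k] + rate*(n-k))
    #      = rate*n + max_{k<=n} (a[k] - rate*k)
    ramp = rate * np.arange(len(a))
    return np.maximum.accumulate(a - ramp) + ramp
```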
However, using Numba we can speed up this loop-based algorithm by a factor of 300:
import numba
fastfunc = numba.jit(func)
arr1 = np.random.rand(1000000)
%timeit func(arr1)
# 599 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit fastfunc(arr1)
# 2.22 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
I finally managed to do what I wanted with a while loop.
# data['myvar'] is the original dataset I want to reshape
# data_new is a working copy of data
data['myvar_corrected'] = data['myvar'].values
temp_d = data['myvar'].fillna(0).values * 1.0
dtc = np.maximum.accumulate(temp_d)
data.loc[temp_d < dtc, 'myvar_corrected'] = float('nan')
stay_in_while = True
min_rate = 5/200000/(24*60)
idx_next = 0
while stay_in_while:
    df_temp = data.iloc[idx_next:]
    if df_temp['myvar'].isnull().sum() > 0:
        idx_first_nan = df_temp.reset_index()['myvar_corrected'].isnull().argmax()
        idx_nan_or = (data_new.index.values == df_temp.index.values[idx_first_nan]).argmax()
        x = np.arange(idx_first_nan-1, df_temp.shape[0])
        y0 = df_temp.iloc[idx_first_nan-1]['myvar_corrected']
        rate_curve = min_rate*x + (y0 - min_rate*(idx_first_nan-1))
        damage_m_rate = df_temp.iloc[idx_first_nan-1:]['myvar_corrected'] - rate_curve
        try:
            idx_intercept = (data_new.index.values == damage_m_rate[damage_m_rate > 0].index.values[0]).argmax()
            data_new.iloc[idx_nan_or:idx_intercept]['myvar'] = rate_curve[0:(damage_m_rate.index.values == damage_m_rate[damage_m_rate > 0].index.values[0]).argmax()-1]
            idx_next = idx_intercept + 1
        except IndexError:
            stay_in_while = False
    else:
        stay_in_while = False
# Finally I have my result stored in data_new['myvar']
The picture below shows the result.
Thanks to everybody for the contribution!
The task is to combine two arrays row by row (constructing the permutations) based on the product of the two corresponding vectors. Such as:
Row1_A,
Row2_A,
Row3_A,
Row1_B,
Row2_B,
Row3_B,
The result should be: Row1_A_Row1_B, Row1_A_Row2_B, Row1_A_Row3_B, Row2_A_Row1_B, etc.
Given the following initial arrays:
n_rows = 1000
A = np.random.randint(10, size=(n_rows, 5))
B = np.random.randint(10, size=(n_rows, 5))
P_A = np.random.rand(n_rows, 1)
P_B = np.random.rand(n_rows, 1)
Arrays P_A and P_B are vectors corresponding to the individual arrays, and they contain floats. A combined row should only appear in the final array if the product of the two corresponding values surpasses a certain threshold, for example:
lim = 0.8
I have thought of the following functions or ways to solve this problem, but I would be interested in faster solutions. I am open to using numba or other libraries, but ideally I would like to improve the vectorized solution using numpy.
Method A
def concatenate_per_row(A, B):
    m1, n1 = A.shape
    m2, n2 = B.shape
    out = np.zeros((m1, m2, n1+n2), dtype=A.dtype)
    out[:,:,:n1] = A[:,None,:]
    out[:,:,n1:] = B
    return out.reshape(m1*m2, -1)
%%timeit
A_B = concatenate_per_row(A, B)
P_A_B = (P_A[:, None]*P_B[None, :])
P_A_B = P_A_B.flatten()
idx = P_A_B > lim
A_B = A_B[idx, :]
P_A_B = P_A_B[idx]
37.8 ms ± 660 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method B
%%timeit
A_B = []
P_A_B = []
for i in range(len(P_A)):
    P_A_B_i = P_A[i]*P_B
    idx = np.where(P_A_B_i > lim)[0]
    if len(idx) > 0:
        P_A_B.append(P_A_B_i[idx])
        A_B_i = np.zeros((len(idx), A.shape[1] + B.shape[1]), dtype='int')
        A_B_i[:, :A.shape[1]] = A[i]
        A_B_i[:, A.shape[1]:] = B[idx, :]
        A_B.append(A_B_i)
A_B = np.concatenate(A_B)
P_A_B = np.concatenate(P_A_B)
9.65 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
First of all, there is a more efficient algorithm. Indeed, you can pre-compute the size of the output array, so that the values can be written directly into the final output arrays rather than stored temporarily in lists. To find the size efficiently, you can sort the array P_B and then do a binary search to find the number of values greater than lim/P_A[i,0] for every possible i (P_B*P_A[i,0] > lim is equivalent to P_B > lim/P_A[i,0]). The number of items filtered for each i can be stored temporarily so the filtered items can be looped over quickly.
Moreover, you can use Numba to significantly speed up the computation of the main loop.
Here is the resulting code:
import numba as nb

@nb.njit('(int_[:,::1], int_[:,::1], float64[:,::1], float64[:,::1])')
def compute(A, B, P_A, P_B):
    assert P_A.shape[1] == 1
    assert P_B.shape[1] == 1
    P_B_sorted = np.sort(P_B.reshape(P_B.size))
    # lim is taken from the enclosing scope (a global here)
    counts = len(P_B) - np.searchsorted(P_B_sorted, lim/P_A[:,0], side='right')
    n = np.sum(counts)
    mA, mB = A.shape[1], B.shape[1]
    m = mA + mB
    A_B = np.empty((n, m), dtype=np.int_)
    P_A_B = np.empty((n, 1), dtype=np.float64)
    k = 0
    for i in range(P_A.shape[0]):
        if counts[i] > 0:
            idx = np.where(P_B > lim/P_A[i, 0])[0]
            assert counts[i] == len(idx)
            start, end = k, k + counts[i]
            A_B[start:end, :mA] = A[i, :]
            A_B[start:end, mA:] = B[idx, :]
            P_A_B[start:end, :] = P_B[idx, :] * P_A[i, 0]
            k += counts[i]
    return A_B, P_A_B
Here are performance results on my machine:
Original: 35.6 ms
Optimized original: 18.2 ms
Proposed (with order): 0.9 ms
Proposed (no ordering): 0.3 ms
The algorithm proposed above is 20 times faster than the original optimized algorithm. It can be made even faster. Indeed, if the order of the items does not matter, you can use an argsort to reorder both B and P_B. This enables you to avoid computing idx every time in the hot loop and to select the last elements of B and P_B directly (they are guaranteed to be higher than the threshold, though not in the same order as in the original code). Because the selected items are stored contiguously in memory, this implementation is much faster. In the end, this last implementation is about 60 times faster than the original optimized algorithm. Note that the proposed implementations are significantly faster than the original ones even without Numba.
Here is the implementation that does not care about the order of the items:
@nb.njit('(int_[:,::1], int_[:,::1], float64[:,::1], float64[:,::1])')
def compute(A, B, P_A, P_B):
    assert P_A.shape[1] == 1
    assert P_B.shape[1] == 1
    nA, mA = A.shape
    nB, mB = B.shape
    m = mA + mB
    order = np.argsort(P_B.reshape(nB))
    P_B_sorted = P_B[order, :]
    B_sorted = B[order, :]
    counts = nB - np.searchsorted(P_B_sorted.reshape(nB), lim/P_A[:,0], side='right')
    nRes = np.sum(counts)
    A_B = np.empty((nRes, m), dtype=np.int_)
    P_A_B = np.empty((nRes, 1), dtype=np.float64)
    k = 0
    for i in range(P_A.shape[0]):
        if counts[i] > 0:
            start, end = k, k + counts[i]
            A_B[start:end, :mA] = A[i, :]
            A_B[start:end, mA:] = B_sorted[nB-counts[i]:, :]
            P_A_B[start:end, :] = P_B_sorted[nB-counts[i]:, :] * P_A[i, 0]
            k += counts[i]
    return A_B, P_A_B
Input: a Hermitian matrix \rho_{i,j} with i,j = 0,1,...,d-1
Output: neg = \sum (|W(mu,m)| - W(mu,m)), summed over all mu,m = 0,...,d-1, where
W(mu,m) = \sum_n exp(-4i\pi mu n/d) \rho_{(m-n)%d,(m+n)%d}, with n = 0,...,d-1
Problem: 1) for large d (d > 5000) a direct method (see Snippet 1) is rather slow.
2) making use of np.fft.fft() is much faster, but in its definition the exponent uses 2 rather than 4:
https://docs.scipy.org/doc/numpy/reference/routines.fft.html#module-numpy.fft
Is it possible to improve Snippet 1 using Snippet 2 to speed up the calculation? Maybe one can use a 2D FFT?
Snippet 1:
W = np.zeros([d, d])
neg = 0
for mu in range(d):
    for m in range(d):
        x = 0
        for n in range(d):
            x += np.exp(-4*np.pi*1.0j*mu*n/d)*rho[(m-n)%d, (m+n)%d]
        W[mu, m] = x.real
        neg += np.abs(W[mu, m]) - W[mu, m]
Snippet 2:
# create matrix \rho
psi=np.random.rand(500)
psi=psi/np.linalg.norm(psi) # normalize it
d=len(psi)
rho=np.outer(psi,np.conj(psi))
#
start_time=time.time()
m=1 # for example, use particular m
a=np.array([rho[(m-nn)%d,(m+nn)%d] for nn in range(d)])
ft=np.fft.fft(a)
end_time=time.time()
print(end_time-start_time)
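Building on Snippet 2, here is a sketch (my addition) of how the FFT can cover all mu and m at once: since exp(-4i\pi mu n/d) = exp(-2i\pi (2 mu) n/d), W(mu, m) is the length-d FFT of a[m, :] sampled at frequency (2*mu) % d.

```python
import numpy as np

d = 8
rng = np.random.default_rng(42)
psi = rng.random(d)
psi = psi / np.linalg.norm(psi)
rho = np.outer(psi, np.conj(psi))

# Build a[m, n] = rho[(m-n)%d, (m+n)%d] for all m, n at once
mm = np.arange(d)[:, None]
nn = np.arange(d)[None, :]
a = rho[(mm - nn) % d, (mm + nn) % d]

# exp(-4i*pi*mu*n/d) = exp(-2i*pi*(2*mu)*n/d), so W(mu, m) is the FFT of
# a[m, :] evaluated at frequency (2*mu) % d
ft = np.fft.fft(a, axis=1)
W = ft[:, (2 * np.arange(d)) % d].T.real
neg = np.sum(np.abs(W) - W)
```

This replaces the triple loop of Snippet 1 with one batched FFT, at the cost of building the index matrix a up front.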
Remove the nested loops by exploiting NumPy's array arithmetic.
import numpy as np
def my_sins(x):
    return np.sin(2*x) + np.cos(4*x) + np.sin(3*x)

def dft(x, n=None):
    if n is None:
        n = len(x)
    k = len(x)
    cn = np.sum(x*np.exp(-2*np.pi*1j*np.outer(np.arange(k), np.arange(n))/n), 1)
    return cn
For some sample data
x = np.linspace(0,2*np.pi,1000)
y = my_sins(x)
%timeit dft(y)
On my system this yields:
145 ms ± 953 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
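For reference (my addition, not part of the answer above): with n = len(x) this dft is exactly the standard DFT, so its output can be checked against np.fft.fft, which computes the same thing in O(n log n):

```python
import numpy as np

def my_sins(x):
    return np.sin(2*x) + np.cos(4*x) + np.sin(3*x)

def dft(x, n=None):
    if n is None:
        n = len(x)
    k = len(x)
    # Direct O(n^2) DFT via an outer-product phase matrix
    return np.sum(x*np.exp(-2*np.pi*1j*np.outer(np.arange(k), np.arange(n))/n), 1)

x = np.linspace(0, 2*np.pi, 1000)
y = my_sins(x)
assert np.allclose(dft(y), np.fft.fft(y))
```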
I have several thousand "observations". Each observation consists of location (x,y) and sensor reading (z), see example below.
I would like to fit a bi-linear surface to the x,y, and z data. I am currently doing it with the code-snippet from amroamroamro/gist:
import numpy as np
import scipy.linalg

def bi2Dlinter(xdata, ydata, zdata, gridrez):
    X, Y = np.meshgrid(
        np.linspace(min(xdata), max(xdata), endpoint=True, num=gridrez),
        np.linspace(min(ydata), max(ydata), endpoint=True, num=gridrez))
    A = np.c_[xdata, ydata, np.ones(len(zdata))]
    C, _, _, _ = scipy.linalg.lstsq(A, zdata)
    Z = C[0]*X + C[1]*Y + C[2]
    return Z
My current approach is to cycle through the rows of the DataFrame. (This works great for 1000 observations but is not usable for larger data-sets.)
ZZ = []
for index, row in df2.iterrows():
    x = row['x1'], row['x2'], row['x3'], row['x4'], row['x5']
    y = row['y1'], row['y2'], row['y3'], row['y4'], row['y5']
    z = row['z1'], row['z2'], row['z3'], row['z4'], row['z5']
    ZZ.append(np.median(bi2Dlinter(x, y, z, gridrez)))
df2['ZZ'] = ZZ
I would be surprised if there is not a more efficient way to do this.
Is there a way to vectorize the linear interpolation?
I put the code here which also generates dummy entries.
Thanks
Looping over DataFrames like this is generally not recommended. Instead you should opt to try and vectorize your code as much as possible.
First we create an array for your inputs
x_vals = df2[['x1','x2','x3','x4','x5']].values
y_vals = df2[['y1','y2','y3','y4','y5']].values
z_vals = df2[['z1','z2','z3','z4','z5']].values
Next we need a bi2Dlinter function that handles vector inputs; this involves changing linspace/meshgrid to work on arrays and changing the least-squares step. Normally scipy.linalg functions work on a single array, but as far as I'm aware lstsq does not operate on stacks of arrays. Instead we can use the SVD to replicate the same functionality over an array.
def create_ranges(start, stop, N, endpoint=True):
    if endpoint:
        divisor = N - 1
    else:
        divisor = N
    steps = (1.0/divisor) * (stop - start)
    return steps[:,None]*np.arange(N) + start[:,None]

def linspace_nd(x, y, gridrez):
    a1 = create_ranges(x.min(axis=1), x.max(axis=1), N=gridrez, endpoint=True)
    a2 = create_ranges(y.min(axis=1), y.max(axis=1), N=gridrez, endpoint=True)
    out_shp = a1.shape + (a2.shape[1],)
    Xout = np.broadcast_to(a1[:,None,:], out_shp)
    Yout = np.broadcast_to(a2[:,:,None], out_shp)
    return Xout, Yout

def stacked_lstsq(L, b, rcond=1e-10):
    """
    Solve L x = b, via SVD least squares, cutting off small singular values.
    L is an array of shape (..., M, N) and b of shape (..., M).
    Returns x of shape (..., N).
    """
    u, s, v = np.linalg.svd(L, full_matrices=False)
    s_max = s.max(axis=-1, keepdims=True)
    s_min = rcond*s_max
    inv_s = np.zeros_like(s)
    inv_s[s >= s_min] = 1/s[s >= s_min]
    x = np.einsum('...ji,...j->...i', v,
                  inv_s * np.einsum('...ji,...j->...i', u, b.conj()))
    return np.conj(x, x)

def vectorized_bi2Dlinter(x_vals, y_vals, z_vals, gridrez):
    X, Y = linspace_nd(x_vals, y_vals, gridrez)
    A = np.stack((x_vals, y_vals, np.ones_like(z_vals)), axis=2)
    C = stacked_lstsq(A, z_vals)
    n_bcast = C.shape[0]
    return (C.T[0].reshape((n_bcast,1,1))*X
            + C.T[1].reshape((n_bcast,1,1))*Y
            + C.T[2].reshape((n_bcast,1,1)))
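As a quick sanity check (my addition), the stacked SVD solver can be compared against np.linalg.lstsq applied row by row; the solver is repeated here so the snippet is self-contained:

```python
import numpy as np

def stacked_lstsq(L, b, rcond=1e-10):
    # Batched least squares via SVD, same as the function above
    u, s, v = np.linalg.svd(L, full_matrices=False)
    s_max = s.max(axis=-1, keepdims=True)
    inv_s = np.zeros_like(s)
    inv_s[s >= rcond*s_max] = 1/s[s >= rcond*s_max]
    x = np.einsum('...ji,...j->...i', v,
                  inv_s * np.einsum('...ji,...j->...i', u, b.conj()))
    return np.conj(x, x)

rng = np.random.default_rng(0)
L = rng.random((4, 5, 3))   # 4 independent 5x3 least-squares problems
b = rng.random((4, 5))
x = stacked_lstsq(L, b)     # shape (4, 3), one solution per problem
```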
Upon testing this on data for n=10000 rows, the vectorized function was significantly faster.
%%timeit
ZZ = []
for index, row in df2.iterrows():
    x = row['x1'], row['x2'], row['x3'], row['x4'], row['x5']
    y = row['y1'], row['y2'], row['y3'], row['y4'], row['y5']
    z = row['z1'], row['z2'], row['z3'], row['z4'], row['z5']
    ZZ.append((bi2Dlinter(x, y, z, gridrez)))
df2['ZZ'] = ZZ
Out: 5.52 s ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
res = vectorized_bi2Dlinter(x_vals,y_vals,z_vals,gridrez)
Out: 74.6 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
You should pay careful attention to what's going on in this vectorized function and familiarize yourself with broadcasting in NumPy. I cannot take credit for the first three functions; instead I will link the answers they come from on Stack Overflow so you can get an understanding:
Vectorized NumPy linspace for multiple start and stop values
how to solve many overdetermined systems of linear equations using vectorized codes?
How to use numpy.c_ properly for arrays
I have a function to calculate the log gamma function that I am decorating with numba.njit.
import numpy as np
from numpy import log
from scipy.special import gammaln
from numba import njit
coefs = np.array([
57.1562356658629235, -59.5979603554754912,
14.1360979747417471, -0.491913816097620199,
.339946499848118887e-4, .465236289270485756e-4,
-.983744753048795646e-4, .158088703224912494e-3,
-.210264441724104883e-3, .217439618115212643e-3,
-.164318106536763890e-3, .844182239838527433e-4,
-.261908384015814087e-4, .368991826595316234e-5
])
@njit(fastmath=True)
def gammaln_nr(z):
    """Numerical Recipes 6.1"""
    y = z
    tmp = z + 5.24218750000000000
    tmp = (z + 0.5) * log(tmp) - tmp
    ser = np.ones_like(y) * 0.999999999999997092
    n = coefs.shape[0]
    for j in range(n):
        y = y + 1
        ser = ser + coefs[j] / y
    out = tmp + log(2.5066282746310005 * ser / z)
    return out
When I use gammaln_nr for a large array, say np.linspace(0.001, 100, 10**7), my run time is about 7x slower than SciPy's (see code in the appendix below). However, if I run it for any individual value, my Numba function is always about 2x faster. How is this happening?
z = 11.67
%timeit gammaln_nr(z)
%timeit gammaln(z)
>>> 470 ns ± 29.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> 1.22 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
My intuition is that if my function is faster for one value, it should also be faster for an array of values. Of course this may not be the case, because I don't know whether Numba is using SIMD instructions or some other sort of vectorization, whereas SciPy may be.
Appendix
import time

import matplotlib.pyplot as plt
import seaborn as sns

n_trials = 8
scipy_times = np.zeros(n_trials)
fastats_times = np.zeros(n_trials)
for i in range(n_trials):
    zs = np.linspace(0.001, 100, 10**i)  # evaluate gammaln over this range
    # don't take the first timing - this is just compilation
    start = time.time()
    gammaln_nr(zs)
    end = time.time()
    start = time.time()
    gammaln_nr(zs)
    end = time.time()
    fastats_times[i] = end - start
    start = time.time()
    gammaln(zs)
    end = time.time()
    scipy_times[i] = end - start
fig, ax = plt.subplots(figsize=(12,8))
sns.lineplot(np.logspace(0, n_trials-1, n_trials), fastats_times, label="numba");
sns.lineplot(np.logspace(0, n_trials-1, n_trials), scipy_times, label="scipy");
ax.set(xscale="log");
ax.set_xlabel("Array Size", fontsize=15);
ax.set_ylabel("Execution Time (s)", fontsize=15);
ax.set_title("Execution Time of Log Gamma");
Implementing gammaln in Numba
It can be quite some work to reimplement commonly used functions, not only to reach the same performance, but also to achieve a well-defined level of precision. So the direct way would be to simply wrap a working implementation.
In the case of gammaln, SciPy calls a C implementation of this function. Therefore the speed of the SciPy implementation also depends on the compiler and compiler flags used when compiling the SciPy dependencies.
It is also not very surprising that the performance results for a single value can differ quite a lot from the results for larger arrays. In the first case the calling overhead (including type conversions, input checking, ...) dominates; in the second case the performance of the implementation gets more and more important.
Improving your implementation
Write explicit loops. In Numba, vectorized operations are expanded to loops, and after that Numba tries to fuse the loops. It is often better to write out and fuse these loops manually.
Think about the differences between basic arithmetic implementations. Python always checks for division by zero and raises an exception in such a case, which is very costly. Numba uses this behaviour by default, but you can also switch to NumPy-style error checking, where a division by zero results in NaN. How NaN, Inf and -0/+0 are handled in further calculations is also influenced by the fastmath flag.
Code
import numpy as np
from numpy import log
from scipy.special import gammaln
from numba import njit
import numba as nb
@njit(fastmath=True, error_model='numpy')
def gammaln_nr(z):
    """Numerical Recipes 6.1"""
    # Don't use global variables (they can only be changed by recompiling the function)
    coefs = np.array([
        57.1562356658629235, -59.5979603554754912,
        14.1360979747417471, -0.491913816097620199,
        .339946499848118887e-4, .465236289270485756e-4,
        -.983744753048795646e-4, .158088703224912494e-3,
        -.210264441724104883e-3, .217439618115212643e-3,
        -.164318106536763890e-3, .844182239838527433e-4,
        -.261908384015814087e-4, .368991826595316234e-5])
    out = np.empty(z.shape[0])
    for i in range(z.shape[0]):
        y = z[i]
        tmp = z[i] + 5.24218750000000000
        tmp = (z[i] + 0.5) * np.log(tmp) - tmp
        ser = 0.999999999999997092
        n = coefs.shape[0]
        for j in range(n):
            y = y + 1.
            ser = ser + coefs[j] / y
        out[i] = tmp + log(2.5066282746310005 * ser / z[i])
    return out
@njit(fastmath=True, error_model='numpy', parallel=True)
def gammaln_nr_p(z):
    """Numerical Recipes 6.1"""
    # Don't use global variables (they can only be changed if you recompile the function)
    coefs = np.array([
        57.1562356658629235, -59.5979603554754912,
        14.1360979747417471, -0.491913816097620199,
        .339946499848118887e-4, .465236289270485756e-4,
        -.983744753048795646e-4, .158088703224912494e-3,
        -.210264441724104883e-3, .217439618115212643e-3,
        -.164318106536763890e-3, .844182239838527433e-4,
        -.261908384015814087e-4, .368991826595316234e-5])
    out = np.empty(z.shape[0])
    for i in nb.prange(z.shape[0]):
        y = z[i]
        tmp = z[i] + 5.24218750000000000
        tmp = (z[i] + 0.5) * np.log(tmp) - tmp
        ser = 0.999999999999997092
        n = coefs.shape[0]
        for j in range(n):
            y = y + 1.
            ser = ser + coefs[j] / y
        out[i] = tmp + log(2.5066282746310005 * ser / z[i])
    return out
import matplotlib.pyplot as plt
import seaborn as sns
import time

n_trials = 8
scipy_times = np.zeros(n_trials)
fastats_times = np.zeros(n_trials)
fastats_times_p = np.zeros(n_trials)
for i in range(n_trials):
    zs = np.linspace(0.001, 100, 10**i)  # evaluate gammaln over this range
    # don't take the first timing - this is just compilation
    start = time.time()
    arr_1 = gammaln_nr(zs)
    end = time.time()
    start = time.time()
    arr_1 = gammaln_nr(zs)
    end = time.time()
    fastats_times[i] = end - start
    start = time.time()
    arr_3 = gammaln_nr_p(zs)
    end = time.time()
    fastats_times_p[i] = end - start
    start = time.time()
    arr_3 = gammaln_nr_p(zs)
    end = time.time()
    fastats_times_p[i] = end - start
    start = time.time()
    arr_2 = gammaln(zs)
    end = time.time()
    scipy_times[i] = end - start
    print(np.allclose(arr_1, arr_2))
    print(np.allclose(arr_1, arr_3))
fig, ax = plt.subplots(figsize=(12,8))
sns.lineplot(np.logspace(0, n_trials-1, n_trials), fastats_times, label="numba");
sns.lineplot(np.logspace(0, n_trials-1, n_trials), fastats_times_p, label="numba_parallel");
sns.lineplot(np.logspace(0, n_trials-1, n_trials), scipy_times, label="scipy");
ax.set(xscale="log");
ax.set_xlabel("Array Size", fontsize=15);
ax.set_ylabel("Execution Time (s)", fontsize=15);
ax.set_title("Execution Time of Log Gamma");
fig.show()
I have a 2-D dataset with two columns, x and y. I would like to get the linear regression coefficients and intercept dynamically as new data feeds in. Using scikit-learn, I could calculate over all currently available data like this:
from sklearn.linear_model import LinearRegression
regr = LinearRegression()
x = np.arange(100)
y = np.arange(100) + 10*np.random.random_sample((100,))
regr.fit(x.reshape(-1, 1), y)  # sklearn expects a 2-D feature array
print(regr.coef_)
print(regr.intercept_)
However, I have quite a big dataset (more than 10k rows in total) and I want to calculate the coefficient and intercept as fast as possible whenever new rows come in. Currently, calculating over 10k rows takes about 600 microseconds, and I want to accelerate this process.
Scikit-learn does not appear to have an online update function for its linear regression module. Is there a better way to do this?
I've found solution from this paper: updating simple linear regression. The implementation is as below:
def lr(x_avg, y_avg, Sxy, Sx, n, new_x, new_y):
    """
    x_avg: average of previous x, if no previous sample, set to 0
    y_avg: average of previous y, if no previous sample, set to 0
    Sxy: covariance of previous x and y, if no previous sample, set to 0
    Sx: variance of previous x, if no previous sample, set to 0
    n: number of previous samples
    new_x: new incoming 1-D numpy array x
    new_y: new incoming 1-D numpy array y
    """
    new_n = n + len(new_x)
    new_x_avg = (x_avg*n + np.sum(new_x))/new_n
    new_y_avg = (y_avg*n + np.sum(new_y))/new_n
    if n > 0:
        x_star = (x_avg*np.sqrt(n) + new_x_avg*np.sqrt(new_n))/(np.sqrt(n)+np.sqrt(new_n))
        y_star = (y_avg*np.sqrt(n) + new_y_avg*np.sqrt(new_n))/(np.sqrt(n)+np.sqrt(new_n))
    elif n == 0:
        x_star = new_x_avg
        y_star = new_y_avg
    else:
        raise ValueError
    new_Sx = Sx + np.sum((new_x-x_star)**2)
    new_Sxy = Sxy + np.sum((new_x-x_star).reshape(-1) * (new_y-y_star).reshape(-1))
    beta = new_Sxy/new_Sx
    alpha = new_y_avg - beta * new_x_avg
    return new_Sxy, new_Sx, new_n, alpha, beta, new_x_avg, new_y_avg
Performance comparison:
Scikit learn version that calculate 10k samples altogether.
from sklearn.linear_model import LinearRegression
x = np.arange(10000).reshape(-1,1)
y = np.arange(10000)+100*np.random.random_sample((10000,))
regr = LinearRegression()
%timeit regr.fit(x,y)
# 419 µs ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
My version assumes 9k samples have already been processed:
Sxy, Sx, n, alpha, beta, new_x_avg, new_y_avg = lr(0, 0, 0, 0, 0, x.reshape(-1,1)[:9000], y[:9000])
new_x, new_y = x.reshape(-1,1)[9000:], y[9000:]
%timeit lr(new_x_avg, new_y_avg, Sxy,Sx,n,new_x, new_y)
# 38.7 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
10 times faster, which is expected.
Nice! Thanks for sharing your findings :) Here is an equivalent implementation of this solution written with dot products:
class SimpleLinearRegressor(object):
    def __init__(self):
        self.dots = np.zeros(5)
        self.intercept = None
        self.slope = None

    def update(self, x: np.ndarray, y: np.ndarray):
        self.dots += np.array(
            [
                x.shape[0],
                x.sum(),
                y.sum(),
                np.dot(x, x),
                np.dot(x, y),
            ]
        )
        size, sum_x, sum_y, sum_xx, sum_xy = self.dots
        det = size * sum_xx - sum_x ** 2
        if det > 1e-10:  # determinant may be zero initially
            self.intercept = (sum_xx * sum_y - sum_xy * sum_x) / det
            self.slope = (sum_xy * size - sum_x * sum_y) / det
When working with time series data, we can extend this idea to do sliding window regression with a soft (EMA-like) window.
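The soft-window idea can be sketched like this (my addition; EwmaLinearRegressor is a hypothetical name): decaying the accumulated sufficient statistics by a factor alpha before each update gives exponentially less weight to older samples.

```python
import numpy as np

class EwmaLinearRegressor:
    """Like the dot-product regressor above, but the accumulated statistics
    are decayed by `alpha` on every update (an EMA-like soft window)."""
    def __init__(self, alpha=0.99):
        self.alpha = alpha
        self.dots = np.zeros(5)
        self.intercept = None
        self.slope = None

    def update(self, x: np.ndarray, y: np.ndarray):
        self.dots *= self.alpha   # softly forget old data
        self.dots += np.array([x.shape[0], x.sum(), y.sum(),
                               np.dot(x, x), np.dot(x, y)])
        size, sum_x, sum_y, sum_xx, sum_xy = self.dots
        det = size * sum_xx - sum_x ** 2
        if det > 1e-10:  # determinant may be zero initially
            self.intercept = (sum_xx * sum_y - sum_xy * sum_x) / det
            self.slope = (sum_xy * size - sum_x * sum_y) / det
```

With alpha = 1 this reduces to the plain accumulating regressor; smaller alpha tracks drifting slopes more quickly at the cost of noisier estimates.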
You can use accelerated libraries that implement faster algorithms, in particular
https://github.com/intel/scikit-learn-intelex
For linear regression you would get much better performance.
First install the package:
pip install scikit-learn-intelex
Then add to your Python script:
from sklearnex import patch_sklearn
patch_sklearn()