I'm currently trying to code this beastie in Python (using the numpy libraries):

$$\nabla f(w) = -\sum_{i} \frac{y_i\,x_i\,e^{-y_i w^{T} x_i}}{1 + e^{-y_i w^{T} x_i}} + \lambda w$$

The $\lambda w$ term is supposed to be outside the summation.
Currently, I've coded the problem using a for loop with a running total accumulated outside the loop; however, this approach takes a long time.
My vectors for y, w, and x are very large (think 100,000s of elements). I was wondering whether there is a simple way to vectorize this with matrix operations instead of looping through the vectors one element at a time.
This is my vectorized code:
xty = xtrain.T.dot(ytrain)
e = math.exp(-w_0.T.dot(xty))
gradient = -xty * (e / (1 + e)) - lambda_var * w_0
If I understand your problem correctly, you might just have to bite the bullet and go with the loop:
import numpy as np
wave = 1e3
xs, ys, w = np.arange(1, 4), np.arange(4, 7), np.arange(7, 10)
eps = np.zeros(w.T.shape)
for x, y in zip(xs, ys):
    eps += -y * np.exp(-y * w.T * x) * x / (1 + np.exp(-y * w.T * x))
print(eps + wave * w)
[ 7000. 8000. 9000.]
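That said, if the sum runs over training samples, it can usually be computed in one shot. Here is a sketch, assuming the expression is the gradient of an l2-regularized logistic loss, with the sample vectors stacked as the rows of a matrix X and the labels in a vector y (these names are mine, not from the question):

import numpy as np

def gradient(w, X, y, lam):
    # z_i = y_i * (w . x_i) for all samples at once
    z = y * (X @ w)
    # exp(-z) / (1 + exp(-z)) rewritten as 1 / (1 + exp(z))
    s = 1.0 / (1.0 + np.exp(z))
    # sum_i -y_i * s_i * x_i, plus the regularization term outside the sum
    return -(X.T @ (y * s)) + lam * w

For 100,000s of samples this avoids the Python-level loop entirely.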
I know there are other questions about this topic, so I'm sorry to ask again, but I cannot get it to work since I'm quite new to it.
I have four nested for-loops in which certain algebraic calculations are done (matrix operations, for example). These calculations take too much time to complete, so I was hoping I could speed them up with multiprocessing.
The code is given below. I simulated the ranges and matrix sizes here, but in my real code these ranges are actually used (so it's not strange that it takes so long). You should be able to run it directly by copy-pasting the code.
import numpy as np
from scipy.linalg import fractional_matrix_power
import math
#Lists for the loop (and one value)
x_list = np.arange(0, 32, 1)
y_list = np.arange(0, 32, 1)
a_list = np.arange(0, 501, 1)
b_list = np.arange(0, 501, 1)
c_list = np.arange(0, 64, 1)
d_number = 32
#Matrices
Y = np.arange(2048).reshape(32, 64)
g = np.asmatrix(np.empty([d_number, 1], dtype=np.complex_))
A = np.empty([len(a_list), len(b_list), len(c_list)], dtype=np.complex_)
A_in = np.empty([len(a_list), len(b_list)], dtype=np.complex_)
for ai in range(len(a_list)):
    for bi in range(len(b_list)):
        for ci in range(len(c_list)):
            f_k_i = c_list[ci]
            X_i = np.asmatrix([Y[:, ci]]).T
            for di in range(d_number):
                r = math.sqrt((x_list[di] - a_list[ai])**2 + (y_list[di] - b_list[bi])**2 + 63**2)
                g[di, 0] = np.exp(-2 * np.pi * 1j * f_k_i * (r / 8)) / r  # g is a vector
            A[-bi, -ai, ci] = ((1 / np.linalg.norm(g)**2) * (((g.conj().T * fractional_matrix_power((X_i * X_i.conj().T), (1/5)) * g) / np.linalg.norm(g)**2)**2)).item(0)
        A_in[-bi, -ai] = (1 / len(c_list)) * sum(A[-bi, -ai, :])
What is the best way to approach this? If multiprocessing is the solution, how do I implement it for my case (I couldn't figure that out)?
Thanks in advance.
One way to approach it would be to move the two inside loops into a function taking ai and bi as parameters and returning the indexes and the result. Then use multiprocessing.Pool.imap_unordered() to run the function on ai, bi pairs. Something like this (untested):
def partial_calc(index):
    """
    This function replaces the inner two loops to calculate the value of
    A_in[-bi, -ai]. index is a tuple (ai, bi).
    """
    ai, bi = index
    for ci in range(len(c_list)):
        f_k_i = c_list[ci]
        X_i = np.asmatrix([Y[:, ci]]).T
        for di in range(d_number):
            r = math.sqrt((x_list[di] - a_list[ai])**2 + (y_list[di] - b_list[bi])**2 + 63**2)
            g[di, 0] = np.exp(-2 * np.pi * 1j * f_k_i * (r / 8)) / r  # g is a vector
        A[-bi, -ai, ci] = ((1 / np.linalg.norm(g)**2) * (((g.conj().T * fractional_matrix_power((X_i * X_i.conj().T), (1/5)) * g) / np.linalg.norm(g)**2)**2)).item(0)
    return ai, bi, (1 / len(c_list)) * sum(A[-bi, -ai, :])
import itertools
import multiprocessing

def main():
    with multiprocessing.Pool(None) as p:
        # this replaces the outer two loops
        indices = itertools.product(range(len(a_list)), range(len(b_list)))
        partial_results = p.imap_unordered(partial_calc, indices)
        for ai, bi, value in partial_results:
            A_in[-bi, -ai] = value
    # ... do something with A_in ...

if __name__ == "__main__":
    main()
Or put the inner three loops into the function and generate one "row" for A_in at a time. Profile it both ways and see which is faster.
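A sketch of that variant (untested; it reuses partial_calc and the globals from above):

def row_calc(ai):
    """Run the inner three loops: every bi for one ai, one task per ai."""
    values = np.empty(len(b_list), dtype=np.complex_)
    for bi in range(len(b_list)):
        _, _, values[bi] = partial_calc((ai, bi))
    return ai, values

# In main(), replace imap_unordered(partial_calc, indices) with:
#     for ai, values in p.imap_unordered(row_calc, range(len(a_list))):
#         for bi, value in enumerate(values):
#             A_in[-bi, -ai] = value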
The trick will be making the lists (a_list, b_list, etc.) and the Y matrix available in the worker processes, and the best way to do that depends on their characteristics (constant, quickly/slowly calculated, large/small, etc.).
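One way to handle that, sketched here with hypothetical names (untested): pass the big read-only inputs to each worker once through the Pool initializer, so they are not pickled for every (ai, bi) task; partial_calc would then read them from shared instead of module globals.

import itertools
import multiprocessing

shared = {}  # hypothetical per-worker holder, filled once per process

def init_worker(Y, a_list, b_list, c_list, x_list, y_list, d_number):
    # Runs once in each worker; stores the shared read-only inputs.
    shared.update(Y=Y, a_list=a_list, b_list=b_list, c_list=c_list,
                  x_list=x_list, y_list=y_list, d_number=d_number)

def main():
    indices = itertools.product(range(len(a_list)), range(len(b_list)))
    with multiprocessing.Pool(None, init_worker,
                              (Y, a_list, b_list, c_list,
                               x_list, y_list, d_number)) as p:
        for ai, bi, value in p.imap_unordered(partial_calc, indices):
            A_in[-bi, -ai] = value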
I am trying to evaluate a function at discretized points and store the result in column-major order, like this:
import numpy as np

N = 3
n = N * N
h = 1 / (N + 1)  # step size
h2 = h**2

deltaX = np.zeros(N)
deltaY = np.zeros(N)

def Function(x, y):
    output = -20. * np.pi * np.sin(2 * np.pi * x) * np.sin(4 * np.pi * y)
    return output

## Equally spaced delta:
for i in range(1, N + 1):
    deltaX[i - 1] = i * h
    deltaY[i - 1] = i * h

### Lexicographic row order ###
### Evaluation of function at deltaX and deltaY
feval = np.zeros((n, 1))
How should I approach evaluating the function at the points in deltaX and deltaY?
Good news: your function properly uses numpy operations, so it is completely vectorized. That means you can evaluate it at every element of the input arrays in one call.
The shapes of the inputs don't have to match exactly; they just have to broadcast together, which means that only non-singleton dimensions need to match.
So start by creating the appropriate input arrays. Numpy provides the tools to do this elegantly without looping:
N = 3
h = 1 / (N + 1)
delta_x = np.arange(1., N + 1.) * h
delta_y = np.linspace(h, N * h, N)[:, None]
I deliberately used two different ways to create the coordinate arrays, to serve as an example. In practice, you'd want to use one of the two methods.
The index [:, None] turns delta_y into a column vector: None introduces a new singleton axis. There are any number of other ways to do the same thing, like `delta_y = ....reshape(-1, 1)`.
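A quick illustration of the two equivalent spellings:
>>> import numpy as np
>>> np.arange(3)[:, None].shape
(3, 1)
>>> np.arange(3).reshape(-1, 1).shape
(3, 1)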
And read the docs I linked to for all the functions I used.
Now that you have a column in the y direction and a row in the x, you can call Function as just
val = Function(delta_x, delta_y)
The operation of arranging the 2D matrix val into a 1D array is called raveling. By default it uses the row-major order that numpy uses in memory, also called "C" order. The alternative is to interpret the array in column-major order, as Matlab does; this is called Fortran order, and it requires a copy of the data, since that's not how the elements are laid out in memory.
One way to ravel in Fortran order:
feval = val.ravel(order='F')
An alternative is to transpose and use C order:
feval = val.T.ravel()
The last two lines can be combined, so you end up with 3 lines:
delta_x = h * np.arange(1., N + 1.)
delta_y = h * np.arange(1., N + 1.)[:, None]
feval = Function(delta_x, delta_y).ravel(order='F')
You could make it into a one-liner, but that's pushing it.
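For comparison, here is a sketch of the same evaluation using np.meshgrid, which builds the full 2D coordinate arrays explicitly instead of relying on broadcasting (Function is the one from the question):

import numpy as np

def Function(x, y):
    return -20. * np.pi * np.sin(2 * np.pi * x) * np.sin(4 * np.pi * y)

N = 3
h = 1 / (N + 1)
coords = np.arange(1., N + 1.) * h
# 'xy' indexing matches the layout above: x varies along columns, y along rows
X, Y = np.meshgrid(coords, coords, indexing='xy')
feval = Function(X, Y).ravel(order='F')

Broadcasting avoids materializing X and Y, so it is the cheaper of the two for large N.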
I'm trying to do a particle-in-a-box simulation with no potential field. It took me some time to find out that simple explicit and implicit methods break unitary time evolution, so I resorted to Crank-Nicolson, which is supposed to be unitary. But when I try it, I find that it still is not. I'm not sure what I'm missing. The formulation I used is this:
$$\left(I - \alpha T\right)\psi^{\,n+1} = \left(I + \alpha T\right)\psi^{\,n}, \qquad \alpha = \frac{i\,\Delta t}{2\,\Delta x^2},$$

where $T$ is the tridiagonal Toeplitz matrix for the second derivative wrt $x$ (entries $1, -2, 1$). The system simplifies to $A\psi^{\,n+1} = B\psi^{\,n}$, and the $A$ and $B$ matrices are the tridiagonal matrices

$$A = \operatorname{tridiag}\left(-\alpha,\; 1 + 2\alpha,\; -\alpha\right), \qquad B = \operatorname{tridiag}\left(\alpha,\; 1 - 2\alpha,\; \alpha\right).$$
I just solve this linear system for $\psi^{\,n+1}$ using the sparse module. The math makes sense, and I found the same numerical scheme in some papers, so that led me to believe my code is where the problem is.
Here's my code so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import toeplitz
from scipy.sparse.linalg import spsolve
from scipy import sparse
# Spatial discretisation
N = 100
x = np.linspace(0, 1, N)
dx = x[1] - x[0]
# Time discretisation
K = 10000
t = np.linspace(0, 10, K)
dt = t[1] - t[0]
alpha = (1j * dt) / (2 * (dx ** 2))
A = sparse.csc_matrix(toeplitz([1 + 2 * alpha, -alpha, *np.zeros(N-4)]), dtype=np.cfloat) # 2 less for both boundaries
B = sparse.csc_matrix(toeplitz([1 - 2 * alpha, alpha, *np.zeros(N-4)]), dtype=np.cfloat)
# Initial and boundary conditions (localized gaussian)
psi = np.exp((1j * 50 * x) - (200 * (x - .5) ** 2))
b = B.dot(psi[1:-1])
psi[0], psi[-1] = 0, 0
for index, step in enumerate(t):
    # Within the domain
    psi[1:-1] = spsolve(A, b)
    # Enforce boundaries
    # psi[0], psi[N - 1] = 0, 0
    b = B.dot(psi[1:-1])
# Integrate |psi|^2 to check whether the evolution is unitary
print(np.trapz(np.abs(psi) ** 2, dx=dx))
You are relying on the Toeplitz constructor to produce a symmetric matrix, so that the entries above the diagonal are the same as those below the diagonal. However, the default in scipy.linalg.toeplitz(c, r=None) is not the transpose; the documentation says
"If r is not given, r == conjugate(c) is assumed."
so the resulting matrix is self-adjoint. Since alpha is purely imaginary here, this means the entries above the diagonal have their sign switched.
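A quick way to see this (with a purely imaginary value standing in for the alpha of the question):

import numpy as np
from scipy.linalg import toeplitz

alpha = 0.5j  # purely imaginary, like 1j * dt / (2 * dx**2)
M = toeplitz([1 + 2 * alpha, -alpha])
# Below the diagonal: -alpha as requested. Above: conjugate(-alpha) == +alpha.
assert M[1, 0] == -alpha and M[0, 1] == alpha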
It makes no sense to first construct a dense matrix and then extract a sparse representation. Construct it as a sparse tridiagonal matrix from the start, using scipy.sparse.diags:
A = sparse.diags([(N-3) * [-alpha], (N-2) * [1 + 2 * alpha], (N-3) * [-alpha]], [-1, 0, 1], format="csc")
B = sparse.diags([(N-3) * [alpha], (N-2) * [1 - 2 * alpha], (N-3) * [alpha]], [-1, 0, 1], format="csc")
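With the corrected matrices, a quick check along the lines of the question's loop (a sketch, reusing x, dx, t, and spsolve from the question) should print a norm that no longer drifts:

psi = np.exp((1j * 50 * x) - (200 * (x - .5) ** 2))
psi[0], psi[-1] = 0, 0
for _ in t:
    psi[1:-1] = spsolve(A, B.dot(psi[1:-1]))
# |psi|^2 integrated over the box; constant up to round-off
print(np.trapz(np.abs(psi) ** 2, dx=dx))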
I have two 2D arrays, x(ni, nj) and y(ni, nj), that I need to interpolate over one axis: I want to interpolate along the last axis for every i in range(ni).
I wrote:
import numpy as np
from scipy.interpolate import interp1d

# x and y are the (ni, nj) arrays described above
z = np.asarray([200, 300, 400, 500, 600])
out = []
for i in range(ni):
    f = interp1d(x[i, :], y[i, :], kind='linear')
    out.append(f(z))
out = np.asarray(out)
However, I think this method is inefficient and slow because of the loop when the array size is large. What is the fastest way to interpolate a multi-dimensional array like this? Is there any way to perform linear and cubic interpolation without a loop? Thanks.
The method you propose does have a python loop, so for large values of ni it is going to get slow. That said, unless you are going to have large ni you shouldn't worry much.
I have created sample input data with the following code:
def sample_data(n_i, n_j, z_shape):
    x = np.random.rand(n_i, n_j) * 1000
    x.sort()
    x[:, 0] = 0
    x[:, -1] = 1000
    y = np.random.rand(n_i, n_j)
    z = np.random.rand(*z_shape) * 1000
    return x, y, z
And I have tested it with these two versions of linear interpolation:
def interp_1(x, y, z):
    rows, cols = x.shape
    out = np.empty((rows,) + z.shape, dtype=y.dtype)
    for j in range(rows):
        out[j] = interp1d(x[j], y[j], kind='linear', copy=False)(z)
    return out

def interp_2(x, y, z):
    rows, cols = x.shape
    row_idx = np.arange(rows).reshape((rows,) + (1,) * z.ndim)
    col_idx = np.argmax(x.reshape(x.shape + (1,) * z.ndim) > z, axis=1) - 1
    ret = y[row_idx, col_idx + 1] - y[row_idx, col_idx]
    ret /= x[row_idx, col_idx + 1] - x[row_idx, col_idx]
    ret *= z - x[row_idx, col_idx]
    ret += y[row_idx, col_idx]
    return ret
interp_1 is an optimized version of your code, following Dave's answer. interp_2 is a vectorized implementation of linear interpolation that avoids any Python loop whatsoever. Coding something like this requires a sound understanding of broadcasting and indexing in numpy, and some things are going to be less optimized than what interp1d does. A prime example is finding the bin in which to interpolate a value: interp1d can break out of its search early once it finds the bin, while the function above compares the value to all bins.
So the result is going to be very dependent on what n_i and n_j are, and even how long your array z of values to interpolate is. If n_j is small and n_i is large, you should expect an advantage from interp_2, and from interp_1 if it is the other way around. Smaller z should be an advantage to interp_2, longer ones to interp_1.
I have actually timed both approaches with a variety of n_i and n_j, for z of shape (5,) and (50,); here are the graphs:
So it seems that for z of shape (5,) you should go with interp_2 whenever n_j < 1000, and with interp_1 elsewhere. Not surprisingly, the threshold is different for z of shape (50,), now being around n_j < 100. It seems tempting to conclude that you should stick with your code if n_j * len(z) > 5000 and change to something like interp_2 above if not, but there is a great deal of extrapolating in that statement! If you want to experiment further yourself, here's the code I used to produce the graphs:
import timeit
import numpy as np
import matplotlib.pyplot as plt

n_s = np.logspace(1, 3.3, 25)
int_1 = np.empty((len(n_s),) * 2)
int_2 = np.empty((len(n_s),) * 2)
z_shape = (5,)

for i, n_i in enumerate(n_s):
    print(int(n_i))
    for j, n_j in enumerate(n_s):
        x, y, z = sample_data(int(n_i), int(n_j), z_shape)
        int_1[i, j] = min(timeit.repeat('interp_1(x, y, z)',
                                        'from __main__ import interp_1, x, y, z',
                                        repeat=10, number=1))
        int_2[i, j] = min(timeit.repeat('interp_2(x, y, z)',
                                        'from __main__ import interp_2, x, y, z',
                                        repeat=10, number=1))

cs = plt.contour(n_s, n_s, np.transpose(int_1 - int_2))
plt.clabel(cs, inline=1, fontsize=10)
plt.xlabel('n_i')
plt.ylabel('n_j')
plt.title('timeit(interp_2) - timeit(interp_1), z.shape=' + str(z_shape))
plt.show()
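As a middle ground between the two, the bin lookup in a per-row loop can be done with np.searchsorted, which performs a binary search in compiled code. A sketch, assuming as above that each row of x is sorted and z is 1-D:

def interp_3(x, y, z):
    rows, cols = x.shape
    out = np.empty((rows, len(z)), dtype=y.dtype)
    for j in range(rows):
        # Left edge of the bin holding each z, clipped so that values
        # outside the range are linearly extrapolated from the end bins.
        idx = np.clip(np.searchsorted(x[j], z) - 1, 0, cols - 2)
        t = (z - x[j, idx]) / (x[j, idx + 1] - x[j, idx])
        out[j] = y[j, idx] + t * (y[j, idx + 1] - y[j, idx])
    return out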
One optimization is to allocate the result array once like so:
import numpy as np
from scipy.interpolate import interp1d

z = np.asarray([200, 300, 400, 500, 600])
out = np.zeros([ni, len(z)], dtype=np.float32)
for i in range(ni):
    f = interp1d(x[i, :], y[i, :], kind='linear')
    out[i, :] = f(z)
This will save you some of the memory copying that occurs in your implementation, in the calls to out.append(...) and the final np.asarray(out).
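Another option worth noting, as a sketch (it assumes each row of x is sorted, and it holds boundary values flat instead of raising on out-of-range z the way interp1d does): np.interp performs the linear interpolation itself in compiled code, so the per-row cost is much smaller than constructing an interp1d object for every row.

import numpy as np

z = np.asarray([200, 300, 400, 500, 600])
out = np.empty([ni, len(z)])
for i in range(ni):
    out[i, :] = np.interp(z, x[i, :], y[i, :])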
Short summary: How do I quickly calculate the finite convolution of two arrays?
Problem description
I am trying to obtain the finite convolution of two functions f(x), g(x), defined by

$$(f * g)(x) = \int_0^x f(y)\,g(x - y)\,dy$$
To achieve this, I have taken discrete samples of the functions and turned them into arrays of length steps:
xarray = [x * i / steps for i in range(steps)]  # here x is the length of the sampled interval
farray = [f(x) for x in xarray]
garray = [g(x) for x in xarray]
I then tried to calculate the convolution using the scipy.signal.convolve function. This function gives the same results as the algorithm conv suggested here. However, the results differ considerably from analytical solutions. Modifying the algorithm conv to use the trapezoidal rule gives the desired results.
To illustrate this, I let
f(x) = exp(-x)
g(x) = 2 * exp(-2 * x)
the results are compared in the plot produced by the example code below, where Riemann represents a simple Riemann sum, trapezoidal is the Riemann algorithm modified to use the trapezoidal rule, scipy.signal.convolve is the scipy function, and analytical is the analytical convolution.
Now let g(x) = x^2 * exp(-x); in the corresponding plot, 'ratio' is the ratio of the values obtained from scipy to the analytical values. This demonstrates that the problem cannot be solved by renormalising the integral.
The question
Is it possible to use the speed of scipy but retain the better results of a trapezoidal rule or do I have to write a C extension to achieve the desired results?
An example
Just copy and paste the code below to see the problem I am encountering. The two results can be brought into closer agreement by increasing the steps variable. I believe the problem is due to artefacts from right-hand Riemann sums: the integral is overestimated while the integrand is increasing and approaches the analytical solution again as it is decreasing.
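Concretely, with $f_i = f(i\,\Delta t)$ and $g_i = g(i\,\Delta t)$, the two discretizations of each convolution sample are

$$R_k = \Delta t\sum_{i=0}^{k} f_i\,g_{k-i} \quad\text{(Riemann sum, what scipy.signal.convolve computes)},$$

$$T_k = \Delta t\sum_{i=0}^{k-1}\frac{f_i\,g_{k-i} + f_{i+1}\,g_{k-i-1}}{2} = R_k - \frac{\Delta t}{2}\left(f_0\,g_k + f_k\,g_0\right) \quad\text{(trapezoidal rule)},$$

so the two differ by exactly the half-weight that the trapezoidal rule takes off the endpoint samples.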
EDIT: I have now included the original algorithm as a comparison; it gives the same results as the scipy.signal.convolve function.
import numpy as np
import scipy.signal as signal
import matplotlib.pyplot as plt
import math

def convolveoriginal(x, y):
    '''
    The original algorithm from http://www.physics.rutgers.edu/~masud/computing/WPark_recipes_in_python.html.
    '''
    P, Q, N = len(x), len(y), len(x) + len(y) - 1
    z = []
    for k in range(N):
        t, lower, upper = 0, max(0, k - (Q - 1)), min(P - 1, k)
        for i in range(lower, upper + 1):
            t = t + x[i] * y[k - i]
        z.append(t)
    return np.array(z)  # Modified to include conversion to numpy array

def convolve(y1, y2, dx=None):
    '''
    Compute the finite convolution of two signals of equal length.
    @param y1: First signal.
    @param y2: Second signal.
    @param dx: [optional] Integration step width.
    @note: Based on the algorithm at http://www.physics.rutgers.edu/~masud/computing/WPark_recipes_in_python.html.
    '''
    P = len(y1)  # Determine the length of the signal
    z = []  # Create a list of convolution values
    for k in range(P):
        t = 0
        lower = max(0, k - (P - 1))
        upper = min(P - 1, k)
        for i in range(lower, upper):
            t += (y1[i] * y2[k - i] + y1[i + 1] * y2[k - (i + 1)]) / 2
        z.append(t)
    z = np.array(z)  # Convert to a numpy array
    if dx is not None:  # Is a step width specified?
        z *= dx
    return z

steps = 50  # Number of integration steps
maxtime = 5  # Maximum time
dt = float(maxtime) / steps  # Width of a time step
time = [dt * i for i in range(steps)]  # Array of times
exp1 = [math.exp(-t) for t in time]  # Arrays of function values
exp2 = [2 * math.exp(-2 * t) for t in time]

# Calculate the analytical expression
analytical = [2 * math.exp(-2 * t) * (-1 + math.exp(t)) for t in time]
# Calculate the trapezoidal convolution
trapezoidal = convolve(exp1, exp2, dt)
# Calculate the scipy convolution
sci = signal.convolve(exp1, exp2, mode='full')
# Slice the first half to obtain the causal convolution and multiply by dt
# to account for the step width
sci = sci[0:steps] * dt
# Calculate the convolution using the original Riemann-sum algorithm
riemann = convolveoriginal(exp1, exp2)
riemann = riemann[0:steps] * dt

# Plot
plt.plot(time, analytical, label='analytical')
plt.plot(time, trapezoidal, 'o', label='trapezoidal')
plt.plot(time, riemann, 'o', label='Riemann')
plt.plot(time, sci, '.', label='scipy.signal.convolve')
plt.legend()
plt.show()
Thank you for your time!
Or, for those who prefer numpy to C: it will be slower than the C implementation, but it's just a few lines.
>>> t = np.linspace(0, maxtime-dt, 50)
>>> fx = np.exp(-np.array(t))
>>> gx = 2*np.exp(-2*np.array(t))
>>> analytical = 2 * np.exp(-2 * t) * (-1 + np.exp(t))
This looks like the trapezoidal result in this case (but I didn't check the math):
>>> s2a = signal.convolve(fx[1:], gx, 'full')*dt
>>> s2b = signal.convolve(fx, gx[1:], 'full')*dt
>>> s = (s2a+s2b)/2
>>> s[:10]
array([ 0.17235682, 0.29706872, 0.38433313, 0.44235042, 0.47770012,
0.49564748, 0.50039326, 0.49527721, 0.48294359, 0.46547582])
>>> analytical[:10]
array([ 0. , 0.17221333, 0.29682141, 0.38401317, 0.44198216,
0.47730244, 0.49523485, 0.49997668, 0.49486489, 0.48254154])
largest absolute error:
>>> np.max(np.abs(s[:len(analytical)-1] - analytical[1:]))
0.00041657780840698155
>>> np.argmax(np.abs(s[:len(analytical)-1] - analytical[1:]))
6
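If the question's code is in the same session, one can check this directly (a sketch; note the one-sample shift, since the averaged result starts at t = dt):
# s[k] should equal the question's trapezoidal convolve() at index k + 1
print(np.max(np.abs(s[:steps - 1] - trapezoidal[1:])))  # expected ~1e-16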
Short answer: Write it in C!
Long answer
Using the cookbook about numpy arrays, I rewrote the trapezoidal convolution method in C. In order to use the C code, one requires three files (https://gist.github.com/1626919):
The C code (performancemodule.c).
The setup file to build the code and make it callable from python (performancemodulesetup.py).
The python file that makes use of the C extension (performancetest.py)
The code should run after downloading by doing the following:
Adjust the include path in performancemodule.c.
Run the following:
python performancemodulesetup.py build
python performancetest.py
You may have to copy the library file performancemodule.so or performancemodule.dll into the same directory as performancetest.py.
Results and performance
The results agree neatly with one another.
The performance of the C method is even better than scipy's convolve method. Running 10k convolutions with array length 50 requires
convolve (seconds, microseconds)               81 349969
scipy.signal.convolve (seconds, microseconds)   1 962599
convolve in C (seconds, microseconds)           0  87024
Thus, the C implementation is about 1000 times faster than the python implementation and a bit more than 20 times as fast as the scipy implementation (admittedly, the scipy implementation is more versatile).
EDIT: This does not solve the original question exactly but is sufficient for my purposes.