I'm trying to write an algorithm for my student work, it is working well. However, it takes a long time to calculate, especially with big arrays.
This part of code is slowing down all program.
Shapes: X.shape = mask.shape = logBN.shape = (500,500,1000),
F.shape = (20,20),
A.shape = (481,481),
s2 -- scalar.
How should I change this code to make it faster?
h = F.shape[0]
w = F.shape[1]
q = np.zeros((A.shape[0], A.shape[1], X.shape[2]))
for i in range(A.shape[0]):
for j in range(A.shape[1]):
mask[:,:,:] = 0
mask[i:i + h,j:j + w,:] = 1
q[i,j,:] = ((logBN*(1 - mask)).sum(axis=(0,1)) +
(np.log(norm._pdf((X[i:i + h,j:j + w,:]-F[:,:,np.newaxis])/s2)/s2)).sum(axis=(0,1))
After heavy juggling through algebraic operations of log, exp, power, it all came to this -
# Params
m,n = F.shape[:2]
k1 = 1.0/(s2*np.sqrt(2*np.pi))
k2 = -0.5/s2**2
k3 = np.log(k1)*m*n
out = np.zeros((A.shape[0], A.shape[1], X.shape[2]))
for i in range(A.shape[0]):
for j in range(A.shape[1]):
mask[:] = 1
mask[i:i + h,j:j + w,:] = 0
XF = (X[i:i + h,j:j + w,:]-F[:,:,np.newaxis])
p1 = np.einsum('ijk,ijk->k',logBN,mask)
p2 = k2*np.einsum('ijk,ijk->k',XF,XF)
out[i,j,:] = p1 + p2
out += k3
Few things used were -
1] norm._pdf is basically : norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi). So, we could inline the implementation and optimize those at the script level.
2] The division by scalars won't be efficient, so those were replaced by multiplications by their reciprocals. So, as a pre-processing store their reciprocals before going into the loop.
Just trying to make sense of your inner loop
mask[:,:,:] = 0
mask[i:i + h,j:j + w,:] = 1
q[i,j,:] = ((logBN*(1 - mask)).sum(axis=(0,1)) +
(np.log(norm._pdf((X[i:i + h,j:j + w,:]-F[:,:,np.newaxis])/s2)/s2)).sum(axis=(0,1))
looks like
idx = (slice(i,i+h), slice(j,j_w), slice(None))
mask = np.zeros(X.shape)
mask(idx) = 1
mask = 1 - mask
# alt mask=np.ones(X.shape);mask[idx]=0
term1 = (logBN*mask).sum(axis=(0,1))
term2 = np.log(norm._pdf((X[idx] - F[...,None])/s2)/s2).sum(axis=(0,1))
q[i,j,:] = term1 + term2
So idx and mask define a subarray in A. You are using logBN outside the array; and term inside it. You are summing values on 1st 2 dim, so both term1 and term2 has shape X.shape[2], which you save in q.
That mask/window is 20x20.
As a first cut I'd try to calculate that term2 for all i,j at once. That looks like a typical sliding window problem. I'd also try to express the term1 as a subtraction - the whole logBN minus this window.
Related
This is more of a computational physics problem, and I've asked it on physics stack exchange, but no answers on there. This is, I suppose, a mix of the disciplines on here and there (and maybe even mathematics stack exchange), so finding the right place to post is a task in of itself apparently...
I'm attempting to use Crank-Nicolson scheme to solve the TDSE in 1D. The initial wave is a real Gaussian that has been normalised wrt its probability density. As the solution evolves, a depression grows in the central peak of the real part of the wave, and the imaginary part's central trough is perhaps a bit higher than I expect (image below).
Does this behaviour seem reasonable? I have searched around and not seen questions/figures that are similar. I've tested another person's code from Github and it exhibits the same behaviour, which makes me feel a bit better. But I still think the center peak should just decrease in height and increase in width. The likelihood of me getting a physics-based explanation is relatively low here I'd assume, but a computational-based explanation on errors I may have made is more likely.
I'm happy to give more information, for example my code, or the matrices used in the scheme, etc. Thanks in advance!
Here's a link to GIF of time evolution:
And the part of my code relevant to solving the 1D TDSE:
(pretty much the entire thing except the plotting)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
# Define function for norm.
def normf(dxc, uc, ic):
return sum(dxc * np.square(np.abs(uc[ic, :])))
# Define function for expectation value of position.
def xexpf(dxc, xc, uc, ic):
return sum(dxc * xc * np.square(np.abs(uc[ic, :])))
# Define function for expectation value of squared position.
def xexpsf(dxc, xc, uc, ic):
return sum(dxc * np.square(xc) * np.square(np.abs(uc[ic, :])))
# Define function for standard deviation.
def sdaf(xexpc, xexpsc, ic):
return np.sqrt(xexpsc[ic] - np.square(xexpc[ic]))
# Time t: t0 =< t =< tf. Have N steps at which to evaluate the CN scheme. The
# time interval is dt. decp: variable for plotting to certain number of decimal
# places.
t0 = 0
tf = 20
N = 200
dt = tf / N
t = np.linspace(t0, tf, num = N + 1, endpoint = True)
decp = str(dt)[::-1].find('.')
# Initialise array for filling with norm values at each time step.
norm = np.zeros(len(t))
# Initialise array for expectation value of position.
xexp = np.zeros(len(t))
# Initialise array for expectation value of squared position.
xexps = np.zeros(len(t))
# Initialise array for alternate standard deviation.
sda = np.zeros(len(t))
# Position x: -a =< x =< a. M is an even number. There are M + 1 total discrete
# positions, for the points to be symmetric and centred at x = 0.
a = 100
M = 1200
dx = (2 * a) / M
x = np.linspace(-a, a, num = M + 1, endpoint = True)
# The gaussian function u diffuses over time. sd sets the width of gaussian. u0
# is the initial gaussian at t0.
sd = 1
var = np.power(sd, 2)
mu = 0
u0 = np.sqrt(1 / np.sqrt(np.pi * var)) * np.exp(-np.power(x - mu, 2) / (2 * \
var))
u = np.zeros([len(t), len(x)], dtype = 'complex_')
u[0, :] = u0
# Normalise u.
u[0, :] = u[0, :] / np.sqrt(normf(dx, u, 0))
# Set coefficients of CN scheme.
alpha = dt * -1j / (4 * np.power(dx, 2))
beta = dt * 1j / (4 * np.power(dx, 2))
# Tridiagonal matrices Al and AR. Al to be solved using Thomas algorithm.
Al = np.zeros([len(x), len(x)], dtype = 'complex_')
for i in range (0, M):
Al[i + 1, i] = alpha
Al[i, i] = 1 - (2 * alpha)
Al[i, i + 1] = alpha
# Corner elements for BC's.
Al[M, M], Al[0, 0] = 1 - alpha, 1 - alpha
Ar = np.zeros([len(x), len(x)], dtype = 'complex_')
for i in range (0, M):
Ar[i + 1, i] = beta
Ar[i, i] = 1 - (2 * beta)
Ar[i, i + 1] = beta
# Corner elements for BC's.
Ar[M, M], Ar[0, 0] = 1 - 2*beta, 1 - beta
# Thomas algorithm variables. Following similar naming as in Wiki article.
a = np.diag(Al, -1)
b = np.diag(Al)
c = np.diag(Al, 1)
NT = len(b)
cp = np.zeros(NT - 1, dtype = 'complex_')
for n in range(0, NT - 1):
if n == 0:
cp[n] = c[n] / b[n]
else:
cp[n] = c[n] / (b[n] - (a[n - 1] * cp[n - 1]))
d = np.zeros(NT, dtype = 'complex_')
dp = np.zeros(NT, dtype = 'complex_')
# Iterate over each time step to solve CN method. Maintain boundary
# conditions. Keep track of standard deviation.
for i in range(0, N):
# BC's.
u[i, 0], u[i, M] = 0, 0
# Find RHS.
d = np.dot(Ar, u[i, :])
for n in range(0, NT):
if n == 0:
dp[n] = d[n] / b[n]
else:
dp[n] = (d[n] - (a[n - 1] * dp[n - 1])) / (b[n] - (a[n - 1] * \
cp[n - 1]))
nc = NT - 1
while nc > -1:
if nc == NT - 1:
u[i + 1, nc] = dp[nc]
nc -= 1
else:
u[i + 1, nc] = dp[nc] - (cp[nc] * u[i + 1, nc + 1])
nc -= 1
norm[i] = normf(dx, u, i)
xexp[i] = xexpf(dx, x, u, i)
xexps[i] = xexpsf(dx, x, u, i)
sda[i] = sdaf(xexp, xexps, i)
# Fill in final norm value.
norm[N] = normf(dx, u, N)
# Fill in final position expectation value.
xexp[N] = xexpf(dx, x, u, N)
# Fill in final squared position expectation value.
xexps[N] = xexpsf(dx, x, u, N)
# Fill in final standard deviation value.
sda[N] = sdaf(xexp, xexps, N)
I am slightly new to python and I am trying to convert some code.This is an approximation method. Which isn't important. In my oddev function I get returned
c2[1:modes+1] = v* 1j
ValueError: could not broadcast input array from shape (25) into shape (25,1)
When I do this Matlab I believe it automatically casts it, and will store the complex array. The function is a getting the coefficient from a partial sine transform to do this. At first I tried storing the random matrix which just an array using np.matlib method and this had the same shape but I believe I will lose the real values of the filter when I cast it. How do I store this?
import math
import numpy as np
def quickcontmin(datain):
n = np.shape(datain)[0]
m = math.floor(n / 2)
modes = math.floor(m / 2)
addl = 20
nn = 20 * n
chi = 10 ** -13
def evenhp(xv):
"Even high pass"
n1 = np.shape(xv)[0]
vx = np.array(xv[:-1])
vx = vx[::-1]
c1 = np.append(xv,vx)
c1 = np.fft.fft(c1)
c1[0:modes-1] = 0.0
c1[-1 - modes + 2:-1] = 0.0
evenl = np.real(np.fft.ifft(c1))
even = evenl[0:n1-1]
return even
def evenhpt(xv):
" Transpose of EvenHP"
n1 = np.shape(xv)[0]
xy = np.zeros((n1- 2, 1))
c1 = np.append(xv,xy)
c1 = np.fft.fft(c1)
c1[0:modes-1] = 0.0
c1[-1 - modes + 1:-1] = 0.0
evenl = np.real(np.fft.ifft(c1))
even = evenl[0:n1-1]
even[1:-2] = even[1:-2] + evenl[-1:-1:n1+1]
return even``
def evenlp(xv):
" Low pass cosine filter"
n1 = np.shape(xv)[0]
vx = np.array(xv[:-1])
vx = vx[::-1]
c1 = np.append(xv,vx)
c1 = np.fft.fft(c1)
c1[modes + 1:-1 - modes + 1] = 0.0
evenl = np.real(np.fft.ifft(c1))
even = evenl[0:n1-1]
return even
def oddev(xv):
"Evaluate the sine modes on the grid"
c2 = np.zeros((2 *n - 2, 1))*1j
v = np.array(xv[:])
v1 = v[:-1]
v1 = v[::-1]
c2[1:modes+1] = v* 1j
c2[-1 - modes + 1:-1] = -v1* 1j
evall = np.fft.ifft(c2) * math.sqrt(2 * n - 2)
eva = evall[0:n-1]
return eva
def oddevt(xv):
" Transpose the sine modes on the function OddEv"
c1 = np.array(xv[1:-2])
c1 = np.insert(c1,0.0,0)
c1 = np.append(c1,0.0)
c1 = np.append(c1,xv[-2:-1:2])
c1a = np.divide(np.fft.fft(c1),math.sqrt(2 * n - 2))
fcoef = np.imag(c1a[1:modes])
return fcoef
def eextnd(xv):
"Obtain cosine coefficients and evalue on the refined grid"
vx = np.array(xv[:-1])
vx = vx[::-1]
c1 = np.append(xv,vx)
c1 = np.fft.fft(c1)
cL = np.zeros((2*nn-2,1))
cL[0:modes-1] = c1[0:modes-1]
cL[-1 - modes + 1:-1] = c1[-1 - modes + 1:-1]
evenexL = np.multiply(np.fft.ifft(cL) , (nn - 1) / (n - 1))
evenex = evenexL[0:nn-1]
return evenex
def oextnd(xv):
"Evaluate sine coefficients on the refined grid"
c2 = np.zeros((2 * nn - 2, 1))
c2[0] = 0.0
c2[1:modes + 1] = np.multiply(xv[0:-1],1j)
c2[-1 - modes + 1:-1] = np.multiply(-xv[-1:-1:1],1j)
evall = np.real(np.multiply(np.fft.ifft(c2), math.sqrt(2 * n - 2) * (2 *nn - 2) / (2 * n - 2)))
oox = evall[0:nn-1]
return oox
dc = evenlp(datain)
#L in paper, number of vectors used to sample the columnspace
lll = round(4 * math.log(m )/ math.log(2)) + addl
lll = int(lll)
#The following should be straightforward from the psuedo-code
w=2 * np.random.rand(modes , lll) - 1
p=np.matlib.zeros(shape=(n,lll))
for j in range(lll):
p[:,j] = evenhp(oddev(w[:,j]))
q,r = np.linalg.qr(p , mode='reduced')
z = np.zeros(shape=(modes,lll))
for j in range(lll):
z[:,j]= oddevt(evenhpt(q[:,j]))
un,s,v = np.linalg.svd(z,full_matrices='False')
ds=np.diag(s)
aa=np.extract(np.diag(s)>(chi))
aa[-1] = aa
aa = int(aa)
s = 0 * s
for j in range(aa):
s[j,j] = 1.0 / ds(j)
#find the sine coefficents
b=un*s* v.T* q.T* evenhp(datain)
#Constructing the continuation
exs=oddev(b)
pexs = evenlp(exs)
dataCont=exs-pexs+dc
dataCont[n+1:2*n-2]=-exs[-2:-1:1]-pexs[-2:-1:1]+dc[-2:-1:1]
#Evaluate the continuation on the refined grid
dataRefined=eextnd(dc-exs)+oextnd(b)
return dataRefined, dataCont
n1 = 100
t = np.linspace(0,2*math.pi,n1)
y = np.sin(t)
data = quickcontmin(y)
dc1 = data[1]
dc1 = dc1[0:n1-1]`
Replacing c2[1:modes+1] = v* 1j by c2[1:modes+1, 0] = v* 1j should fix that specific error.
More consistent would be to replace:
v = np.array(xv[:])
v1 = v[:-1]
v1 = v[::-1]
by
v = xv
v1 = v[:-1]
v is already a column vector so you don't need to transform it into a 1d vector when you later need a column vector.
The program needs to compute define integral with a predetermined
accuracy (eps) with the Trapezoidal Rule and my function needs to return:
1.the approximate value of the integral.
2.the number of iterations.
My code:
from math import *
def f1(x):
return (x ** 2 - 1)**(-0.5)
def f2(x):
return (cos(x)/(x + 1))
def integral(f,a,b,eps):
n = 2
x = a
h = (b - a) / n
sum = 0.5 * (f(a) + f(b))
for i in range(n):
sum = sum + f(a + i * h)
sum_2 = h * sum
k = 0
flag = 1
while flag == 1:
n = n * 2
sum = 0
k = k + 1
x = a
h = (b - a) / n
sum = 0.5 * (f(a) + f(b))
for i in range(n):
sum = sum + f(a + i * h)
sum_new = h * sum
if eps > abs(sum_new - sum_2):
t1 = sum_new
t2 = k
return t1, t2
else:
sum_2 = sum_new
x1 = float(input("First-begin: "))
x2 = float(input("First-end: "))
y1 = float(input("Second-begin: "))
y2 = float(input("Second-end: "))
int_1 = integral(f1,x1,y1,1e-6)
int_2 = integral(f2,x2,y2,1e-6)
print(int_1)
print(int_2)
It doesn't work correct. Help, please!
You implemented the math wrong. The error is in the lines
for i in range(n):
sum = sum + f(a + i * h)
range(n) always starts at 0, so in your first iteration you just add the f(a) term again.
If you replace it with
for i in range(1, n):
sum = sum + f(a + i * h)
it works.
Also, you have a ton of redundant code; you basically coded the core of the integration algorithm twice. Try to follow the DRY-principle.
The trapezoidal rule of integration simply says that an approximation to the integral $\int_a^b f(x) dx$ is (b-a) (f(a)+f(b))/2. The error is proportional to (b-a)^2, so that it is possible to have a better estimate using the composite rule, i.e., subdividing the initial interval in a number of shorter intervals.
Is it possible to use shorter intervals and still reuse the function values previously computed, so minimizing the total number of function evaluation?
Yes, it is possible if we divide each interval in two equal parts, so that at stage 0 we use 1 intervals, at stage 1 2 equal intervals and in general, at stage n, we use 2n equal intervals.
Let's start with a simple problem and see if it possible to generalize the procedure…
a, b = 0, 32
L = b-a = 32
by the trapezoidal rule the initial approximation say I0, is given by
I0 = L * (f0+f1)/2
= L * S0
with S0 = (f0+f1)/2; a pictorial representation of the real axis, the coordinates of the interval extremes and the evaluated functions follows
x0 x1
01234567890123456789012345679012
f0 f1
Next, we divide the original interval in two,
L = L/2
x0 x2 x1
01234567890123456789012345679012
f0 f2 f1
and the new approximation, stage n=1, is obtained using two times the trapezoidal rule and applying a bit of algebra
I1 = L * (f0+f2)/2 + L * (f2+f1)/2
= L * [(f0+f1)/2 + f2]
= L * [S0 + S1]
with S1 = f2
Another subdivision, stage n=2, L = L/2 and
x0 x3 x2 x4 x1
012345678901234567890123456789012
f0 f3 f2 f4 f1
I2 = L * [(f0+f3) + (f3+f2) + (f2+f4) + (f4+f1)] / 2
= L * [(f0+f1)/2 + f2 + (f3+f4)]
= L * [S0+S1+S2]
with S2 = f3 + f4.
It is not difficult, given this picture,
x0 x5 x3 x6 x2 x7 x4 x8 x1
012345678901234567890123456789012
f0 f5 f3 f6 f2 f7 f4 f8 f1
to understand that our next approximation can be computed as follows
L = L/2
S3 = f5+f6+f7+f8
I3 = L*[S0+S1+S2+S3]
Now, we have to understand how to compute a generalization of Sn,
n = 1, … — for us, the pseudocode is
L_n = (b-a)/2**n
list_x_n = list(a + L_n + 2*Ln*j for j=0, …, 2**n-1)
Sn = Sum(f(xj) for each xj in list_x_n)
For n = 3, L = (b-a)/8 = 4, we have from the formula above list_x_n = [4, 12, 20, 28], please check with the picture...
Now we are ready to code our algorithm in Python
def trapaezia(f, a, b, tol):
"returns integ(f, (a,b)), estimated error and number of evaluations"
from math import fsum # controls accumulation of rounding errors in sums
L = b - a
S = (f(a)+f(b))/2
I = L*S
n = 1
while True:
L = L/2
new_points = (a+L+j*L for j in range(0, n+n, 2))
delta_S = fsum(f(x) for x in new_points)
new_S = S + delta_S
new_I = L*new_S
# error is estimated using Richardson extrapolation (REP)
err = (new_I - I) * 4/3
if abs(err) > tol:
n = n+n
S, I = new_S, new_I
else:
# we return a better estimate using again REP
return (4*new_I-I)/3, err, n+n+1
If you are curious about Richardson extrapolation, I recommend this document that deals exactly with the application of REP to the trapezoidal rule quadrature algorithm.
If you are curious about math.fsum, the docs don't say too much but the link to the original implementation that also includes an extended explanation of all the issues involved.
I'm trying to approximate the Bessel function with Runge-Kutta. I'm using RK4 here cause I don't know what the equations are for RK2 for a system of ODEs.....Basically it's not working, or at least it's not working very well, I don't know why.
Anyway it's written in python. Here is code:
from __future__ import division
from pylab import*
#This literally just makes a matrix
def listoflists(rows,cols):
return [[0]*cols for i in range(rows)]
def f(x,Jm,Zm,m):
def rm(x):
return (m**2 - x**2)/(x**2)
def q(x):
return 1/x
return rm(x)*Jm-q(x)*Zm
n = 100 #No Idea what to set this to really
m = 1 #Bessel function order; computes up to m (i.e. 0,1,2,...m)
interval = [.01, 10] #Interval in x
dx = (interval[1]-interval[0])/n #Step size
x = zeros(n+1)
Z = listoflists(m+1,n+1) #Matrix: Rows are Function order, Columns are integration step (i.e. function value at xn)
J = listoflists(m+1,n+1)
x[0] = interval[0]
x[n] = interval[1]
#This reproduces all the Runge-Kutta relations if you read 'i' as 'm' and 'j' as 'n'
for i in range(m+1):
#Initial Conditions, i is m
if i == 0:
J[i][0] = 1
Z[i][0] = 0
if i == 1:
J[i][0] = 0
Z[i][0] = 1/2
#Generate each Bessel function, j is n
for j in range(n):
x[j] = x[0] + j*dx
K1 = Z[i][j]
L1 = f(x[j],J[i][j],Z[i][j],i)
K2 = Z[i][j] + L1/2
L2 = f(x[j] + dx/2, J[i][j]+K1/2,Z[i][j]+L1/2,i)
K3 = Z[i][j] + L2/2
L3 = f(x[j] +dx/2, J[i][j] + K2/2, Z[i][j] + L2/2,i)
K4 = Z[i][j] + L3
L4 = f(x[j]+dx,J[i][j]+K3, Z[i][j]+L3,i)
J[i][j+1] = J[i][j]+(dx/6)*(K1+2*K2+2*K3+K4)
Z[i][j+1] = Z[i][j]+(dx/6)*(L1+2*L2+2*L3+L4)
plot(x,J[0][:])
show()
(To close this old question with at least a partial answer.) The immediate error is that the RK4 method is incorrectly implemented. For instance instead of
K2 = Z[i][j] + L1/2
L2 = f(x[j] + dx/2, J[i][j]+K1/2, Z[i][j]+L1/2, i)
it should be
K2 = Z[i][j] + L1*dx/2
L2 = f(x[j] + dx/2, J[i][j]+K1*dx/2, Z[i][j]+L1*dx/2, i)
that is, the slopes have to be scaled by the step size to get the correct update for the variables.
This is a follow up to How to set up and solve simultaneous equations in python but I feel deserves its own reputation points for any answer.
For a fixed integer n, I have a set of 2(n-1) simultaneous equations as follows.
M(p) = 1+((n-p-1)/n)*M(n-1) + (2/n)*N(p-1) + ((p-1)/n)*M(p-1)
N(p) = 1+((n-p-1)/n)*M(n-1) + (p/n)*N(p-1)
M(1) = 1+((n-2)/n)*M(n-1) + (2/n)*N(0)
N(0) = 1+((n-1)/n)*M(n-1)
M(p) is defined for 1 <= p <= n-1. N(p) is defined for 0 <= p <= n-2. Notice also that p is just a constant integer in every equation so the whole system is linear.
Some very nice answers were given for how to set up a system of equations in python. However, the system is sparse and I would like to solve it for large n. How can I use scipy's sparse matrix representation and http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html for example instead?
I wouldn't normally keep beating a dead horse, but it happens that my non-vectorized approach to solving your other question, has some merit when things get big. Because I was actually filling the coefficient matrix one item at a time, it is very easy to translate into COO sparse matrix format, which can efficiently be transformed to CSC and solved. The following does it:
import scipy.sparse
def sps_solve(n) :
# Solution vector is [N[0], N[1], ..., N[n - 2], M[1], M[2], ..., M[n - 1]]
n_pos = lambda p : p
m_pos = lambda p : p + n - 2
data = []
row = []
col = []
# p = 0
# n * N[0] + (1 - n) * M[n-1] = n
row += [n_pos(0), n_pos(0)]
col += [n_pos(0), m_pos(n - 1)]
data += [n, 1 - n]
for p in xrange(1, n - 1) :
# n * M[p] + (1 + p - n) * M[n - 1] - 2 * N[p - 1] +
# (1 - p) * M[p - 1] = n
row += [m_pos(p)] * (4 if p > 1 else 3)
col += ([m_pos(p), m_pos(n - 1), n_pos(p - 1)] +
([m_pos(p - 1)] if p > 1 else []))
data += [n, 1 + p - n , -2] + ([1 - p] if p > 1 else [])
# n * N[p] + (1 + p -n) * M[n - 1] - p * N[p - 1] = n
row += [n_pos(p)] * 3
col += [n_pos(p), m_pos(n - 1), n_pos(p - 1)]
data += [n, 1 + p - n, -p]
if n > 2 :
# p = n - 1
# n * M[n - 1] - 2 * N[n - 2] + (2 - n) * M[n - 2] = n
row += [m_pos(n-1)] * 3
col += [m_pos(n - 1), n_pos(n - 2), m_pos(n - 2)]
data += [n, -2, 2 - n]
else :
# p = 1
# n * M[1] - 2 * N[0] = n
row += [m_pos(n - 1)] * 2
col += [m_pos(n - 1), n_pos(n - 2)]
data += [n, -2]
coeff_mat = scipy.sparse.coo_matrix((data, (row, col))).tocsc()
return scipy.sparse.linalg.spsolve(coeff_mat,
np.ones(2 * (n - 1)) * n)
It is of course much more verbose than building it from vectorized blocks, as TheodorosZelleke does, but an interesting thing happens when you time both approaches:
First, and this is (very) nice, time is scaling linearly in both solutions, as one would expect from using the sparse approach. But the solution I gave in this answer is always faster, more so for larger ns. Just for the fun of it, I also timed TheodorosZelleke's dense approach from the other question, which gives this nice graph showing the different scaling of both types of solutions, and how very early, somewhere around n = 75, the solution here should be your choice:
I don't know enough about scipy.sparse to really figure out why the differences between the two sparse approaches, although I suspect heavily of the use of LIL format sparse matrices. There may be some very marginal performance gain, although a lot of compactness in the code, by turning TheodorosZelleke's answer into COO format. But that is left as an exercise for the OP!
This is a solution using scipy.sparse. Unfortunately the problem is not stated here. So in order to comprehend this solution, future visitors have to first look up the problem under the link provided in the question.
Solution using scipy.sparse:
from scipy.sparse import spdiags, lil_matrix, vstack, hstack
from scipy.sparse.linalg import spsolve
import numpy as np
def solve(n):
nrange = np.arange(n)
diag = np.ones(n-1)
# upper left block
n_to_M = spdiags(-2. * diag, 0, n-1, n-1)
# lower left block
n_to_N = spdiags([n * diag, -nrange[-1:0:-1]], [0, 1], n-1, n-1)
# upper right block
m_to_M = lil_matrix(n_to_N)
m_to_M[1:, 0] = -nrange[1:-1].reshape((n-2, 1))
# lower right block
m_to_N = lil_matrix((n-1, n-1))
m_to_N[:, 0] = -nrange[1:].reshape((n-1, 1))
# build A, combine all blocks
coeff_mat = hstack(
(vstack((n_to_M, n_to_N)),
vstack((m_to_M, m_to_N))))
# const vector, right side of eq.
const = n * np.ones((2 * (n-1),1))
return spsolve(coeff_mat.tocsr(), const).reshape((-1,1))
There's some code that I've looked at before here: http://jkwiens.com/heat-equation-using-finite-difference/ His function implements a finite difference method to solve the heat equation using the scipy sparse matrix package.