I am trying to analyse a wave on a string by solving the wave equation with Python. Here are my requirements for the solution.
1) I model reflective ends by using much larger masses on first and last point on the string -> Large inertia
2)No spring on edges. Then k[0] and k[-1] will be ZERO.
I have problem with indices. In my 2nd loop I get y[i,j-1], k[i-1], y[i-1,j]. In the first iteration the loop will then use y[0,-1], k[-1], y[-1,0]. These are my last points and not my first points. How can I avoid this problem?
What you need is initiating your mass array with one additional element. I mean
...
m = [mi]*N # mass array [!!!] instead of (N-1) [!!!]
...
Idem for your springs
...
k = [ki]*N
...
Consequently, you can keep k[0] equal to 10. since you model reflective ends. You may thus want to comment or drop this line
...
##k[0] = 0
...
For aesthetic considerations you may want to fill the gap at the end of the x-axis. In this case, simply do
N = 201 # Number of mass points
Your code thus becomes
from numpy import *
from matplotlib.pyplot import *
# Variables
N = 201 # Number of mass points
nT = 1200 # Number of time points
mi = 0.02 # mass in kg
m = [mi]*N # mass array
m[-1] = 100 # Large last mass reflective edges
m[0] = 100 # Large first mass reflective edges
ki = 10.#spring
k = [ki]*N
k[-1] = 0
dx = 0.2
kappa = ki*dx
my = mi/dx
c = sqrt(kappa/my) # velocity
dt = 0.04
# 3 vectors
x = arange( N )*dx # x points
t = arange( N )*dt # t points
y = zeros( [N, nT ] )# 2D array
# Loop over initial condition
for i in range(N-1):
y[i,0] = sin((7.*pi*i)/(N-1)) # Initial condition dependent on mass point
# Iterating over time and position to find next position of wave
for j in range(nT-1):
for i in range(N-1):
y[i,j+1] = 2*y[i,j] - y[i,j-1] + (dt**2/m[i])*(k[i-1]*y[i+1,j] -2*k[i-1]*y[i,j] + k[i]*y[i-1,j] )
#check values of edges
print y[:2,j+1],y[-2:,j+1]
# Creates an animation
cla()
ylabel("Amplitude")
xlabel("x")
ylim(-10,10)
plot(x,y[:,j-2])
pause(0.001)
close()
which produces
Following your comment, I think that if you want to see the wave traveling along the string before reflection, you should not initiate conditions everywhere (in space). I mean, e.g., doing
...
# Loop over initial condition
for i in range(N-1):
ci_i = sin(7.*pi*i/(N-1)) # Initial condition dependent on mass point
if np.sign(ci_i*y[i-1,0])<0:
break
else:
y[i,0] = ci_i
...
produces
New attempt after answers:
from numpy import *
from matplotlib.pyplot import *
N = 201
nT = 1200
mi = 0.02
m = [mi]*(N)
m[-1] = 1000
m[0] = 1000
ki = 10.
k = [ki]*N
dx = 0.2
kappa = ki*dx
my = mi/dx
c = sqrt(kappa/my)
dt = 0.04
x = arange( N )*dx
t = arange( N )*dt
y = zeros( [N, nT ] )
for i in range(N-1):
y[i,0] = sin((7.*pi*i)/(N-1))
for j in range(nT-1):
for i in range(N-1):
if j == 0: # if j = 0 then ... y[i,j-1]=y[i,j]
y[i,j+1] = 2*y[i,j] - y[i,j] + (dt**2/m[i])*(k[i-1]*y[i+1,j] -2*k[i-1]*y[i,j] + k[i]*y[i-1,j] )
else:
y[i,j+1] = 2*y[i,j] - y[i,j-1] + (dt**2/m[i])*( k[i-1]*y[i+1,j] -2*k[i-1]*y[i,j] + k[i]*y[i-1,j] )
cla()
ylim(-1,1)
plot(x,y[:,j-2])
pause(0.0001)
ylabel("Amplitude")
xlabel("x")
print len(x), len(t), N,nT
Here is a plot of the new attempt at solution with |amplitude| of anti node equal 1.0. Will this do anything with further solving the issue with indices?
Related
I am simulating 1D heat conduction using brownian motion in python. The question here is to track if the particle pass the inerface of the left or right cell. I have to count it somehow, could you please purpose solution or I should update code concept(rethink model).
Short desciption: Medium is consist of cell, in each cell it has own quantity of particles. The particles are moving from one cell to another. First and last cell has constant quantity of particle (in this case 500 and 0). Result gives the tempertaure profile along x. If we know the number of particles that are pass the interface of the cell (left or right) we might find Heat flux.
Edit: I've made counting of particle pass throught interface (from right or left). But in "theory" the value of Heat Flux should me constant. So, I gues there is the problem with my code(specified code excerpt). Could you please review it. Am I counting right?
Edit code:
import numpy as np
import matplotlib.pyplot as plt
def Cell_dist(a, dx):
res = [[] for i in range(N)]
for i in range(N):
for value in a:
if dx*i < value < dx*i+dx:
res[i].append(value)
return res
L = 0.2 # length of the medium
N = 10 # number of cells
dx = L/N # cell dimension
dt = 1 # time step
dur = 60 # duration
M = 20 # number of particles
ro = 8930 # density
k = 391 # thermal conductivity
C_p = 380 # specific heat capacity
T_0 = 100 # maintained temperature
T_r = 0 # reference temperature
DT = T_0 - T_r # characteristic temperature
a = k / ro / C_p # thermal diffusivity of the copper
dh_r = ro * C_p * dx * DT / M # refernce elementary enthalpy
c = (2*a*dt)**(1/2) # diffusion length
pos = [[] for i in range(N)] # creating cells
for time_step in range(dur):
M_plus = [[] for i in range(N-1)]
M_minus = [[] for i in range(N-1)]
flux = [0] * (N-1)
unirnd = np.random.uniform(0,dx, M).tolist() # uniform random distribution
pos[0] = unirnd # 1st cell BC
pos[-1] = [] # last cell BC at T_r = 0
curr_pos = sorted([x for sublist in pos for x in sublist]) # flatten list of particles (concatenation)
M_t = len(curr_pos) # number of particles at instant time step
pos_tmp = []
# Move each particle
for i in range(M_t):
normal_distr = np.random.default_rng().normal(0,1)
displacement = c*normal_distr
final_pos = curr_pos[i] + displacement
pos_tmp.append(final_pos)
#HERE is the question________________________
if normal_distr > 0:
for i in range(1,len(M_plus)-1):
if i*dx < final_pos < (i+1)*dx:
M_plus[i-1].append(1)
else:
for i in range(len(M_minus)):
if i*dx < final_pos < (i+1)*dx:
M_minus[i].append(1)
#END of the question________________________
for i in range(N-1):
flux[i] = (len(M_plus[i])-len(M_minus[i]))*dh_r/dt/1000
pos_new = Cell_dist(pos_tmp,dx)
pos_new[0] = unirnd
pos = pos_new
walker_number_in_cell = []
for i in range(N):
walker_number_in_cell.append(len(pos[i]))
T_n = []
for num in walker_number_in_cell:
T_n.append(T_r + num*dh_r/ro/C_p/dx)
# __________Plotting FLUX profile__________
x_a = [0]*(N-1)
for i in range(0, N-1):
x_a[i] = "{}".format(i+1)+"_{}".format(i+2)
plt.plot(x_a,flux[0:N-1],'-')
plt.xlabel('Interface')
plt.ylabel('Flux')
plt.show()
I want to get my code to loop so that every time it performs the calculation, it adds basically does a cumulative sum for my variable delta_omega. i.e. for every calculation, it takes the previous values in the delta_omega array, adds them together and uses that value to perform the calculation again and so on. I'm really not sure how to go about this as I want to plot these results too.
import numpy as np
import matplotlib.pyplot as plt
delta_omega = np.linspace(-900*10**6, -100*10**6, m) #Hz - range of frequencies
i = 0
while i<len(delta_omega):
delta = delta_omega[i] - (k*v_cap) + (mu_eff*B)/hbar
p_ee = (s0*L/2) / (1 + s0 + (2*delta/L)**2) #population of the excited state
R = L * p_ee # scattering rate
F = hbar*k*(R) #scattering force on atoms
a = F/m_Rb #acceleration assumed constant
vf_slower = (v_cap**2 - (2*a*z0))**0.5 #velocity at the end of the slower
t_d = 1/a * (v_cap - vf_slower) #time taken during slower
# -------- After slower --------
da = 0.1 #(m) distance from end of slower to the middle of the MOT
vf_MOT = (vf_slower**2 - (2*a*da))**0.5 #(m/s) - velocity of the particles at MOT center
t_a = da/vf_MOT #(s) time taken after slower
r0 = 0.01 #MOT capture radius
vr_max = r0/(t_b+t_d+t_a) #maximum transveral velocity
vz_max = (v_cap**2 + 2*a_max*z0)**0.5 #m/s - maximum axial velocity
# -------- Flux of atoms captured --------
P = 10**(4.312-(4040/T)) #vapour pressure for liquid phase (use 4.857 for solid phase)
A = 5*10**-4 #area of the oven aperture
n = P/(k_b*T) #atomic number density
f_oven = ((n*A)/4) * (2/(np.pi)**0.5) * ((2*k_b*T)/m_Rb)**0.5
f = f_oven * (1 - np.exp(-vr_max**2/vp**2))*(1 - np.exp(-vz_max**2/vp**2))
i+=1
plt.plot(delta_omega, f)
A simple cumulative sum would be defining the variable outside the loop and adding to it
i = 0
x = 0
while i < 10:
x = x + 5 #do your work on the cumulative value here
i += 1
print("cumulative sum: {}".format(x))
so define a variable that will contain the cumulative sum, and every loop, add to it
I am trying to solve the following problem via a Finite Difference Approximation in Python using NumPy:
$u_t = k \, u_{xx}$, on $0 < x < L$ and $t > 0$;
$u(0,t) = u(L,t) = 0$;
$u(x,0) = f(x)$.
I take $u(x,0) = f(x) = x^2$ for my problem.
Programming is not my forte so I need help with the implementation of my code. Here is my code (I'm sorry it is a bit messy, but not too bad I hope):
## This program is to implement a Finite Difference method approximation
## to solve the Heat Equation, u_t = k * u_xx,
## in 1D w/out sources & on a finite interval 0 < x < L. The PDE
## is subject to B.C: u(0,t) = u(L,t) = 0,
## and the I.C: u(x,0) = f(x).
import numpy as np
import matplotlib.pyplot as plt
# definition of initial condition function
def f(x):
return x^2
# parameters
L = 1
T = 10
N = 10
M = 100
s = 0.25
# uniform mesh
x_init = 0
x_end = L
dx = float(x_end - x_init) / N
#x = np.zeros(N+1)
x = np.arange(x_init, x_end, dx)
x[0] = x_init
# time discretization
t_init = 0
t_end = T
dt = float(t_end - t_init) / M
#t = np.zeros(M+1)
t = np.arange(t_init, t_end, dt)
t[0] = t_init
# Boundary Conditions
for m in xrange(0, M):
t[m] = m * dt
# Initial Conditions
for j in xrange(0, N):
x[j] = j * dx
# definition of solution to u_t = k * u_xx
u = np.zeros((N+1, M+1)) # NxM array to store values of the solution
# finite difference scheme
for j in xrange(0, N-1):
u[j][0] = x**2 #initial condition
for m in xrange(0, M):
for j in xrange(1, N-1):
if j == 1:
u[j-1][m] = 0 # Boundary condition
else:
u[j][m+1] = u[j][m] + s * ( u[j+1][m] - #FDM scheme
2 * u[j][m] + u[j-1][m] )
else:
if j == N-1:
u[j+1][m] = 0 # Boundary Condition
print u, t, x
#plt.plot(t, u)
#plt.show()
So the first issue I am having is I am trying to create an array/matrix to store values for the solution. I wanted it to be an NxM matrix, but in my code I made the matrix (N+1)x(M+1) because I kept getting an error that the index was going out of bounds. Anyways how can I make such a matrix using numpy.array so as not to needlessly take up memory by creating a (N+1)x(M+1) matrix filled with zeros?
Second, how can I "access" such an array? The real solution u(x,t) is approximated by u(x[j], t[m]) were j is the jth spatial value, and m is the mth time value. The finite difference scheme is given by:
u(x[j],t[m+1]) = u(x[j],t[m]) + s * ( u(x[j+1],t[m]) - 2 * u(x[j],t[m]) + u(x[j-1],t[m]) )
(See here for the formulation)
I want to be able to implement the Initial Condition u(x[j],t[0]) = x**2 for all values of j = 0,...,N-1. I also need to implement Boundary Conditions u(x[0],t[m]) = 0 = u(x[N],t[m]) for all values of t = 0,...,M. Is the nested loop I created the best way to do this? Originally I tried implementing the I.C. and B.C. under two different for loops which I used to calculate values of the matrices x and t (in my code I still have comments placed where I tried to do this)
I think I am just not using the right notation but I cannot find anywhere in the documentation for NumPy how to "call" such an array so at to iterate through each value in the proposed scheme. Can anyone shed some light on what I am doing wrong?
Any help is very greatly appreciated. This is not homework but rather to understand how to program FDM for Heat Equation because later I will use similar methods to solve the Black-Scholes PDE.
EDIT: So when I run my code on line 60 (the last "else" that I use) I get an error that says invalid syntax, and on line 51 (u[j][0] = x**2 #initial condition) I get an error that reads "setting an array element with a sequence." What does that mean?
I am trying to optimize a snippet that gets called a lot (millions of times) so any type of speed improvement (hopefully removing the for-loop) would be great.
I am computing a correlation function of some j'th particle with all others
C_j(|r-r'|) = sqrt(E((s_j(r')-s_k(r))^2)) averaged over k.
My idea is to have a variable corrfun which bins data into some bins (the r, defined elsewhere). I find what bin of r each s_k belongs to and this is stored in ind. So ind[0] is the index of r (and thus the corrfun) for which the j=0 point corresponds to. Multiple points can fall into the same bin (in fact I want bins to be big enough to contain multiple points) so I sum together all of the (s_j(r')-s_k(r))^2 and then divide by number of points in that bin (stored in variable rw). The code I ended up making for this is the following (np is for numpy):
for k, v in enumerate(ind):
if j==k:
continue
corrfun[v] += (s[k]-s[j])**2
rw[v] += 1
rw2 = rw
rw2[rw < 1] = 1
corrfun = np.sqrt(np.divide(corrfun, rw2))
Note, the rw2 business was because I want to avoid divide by 0 problems but I do return the rw array and I want to be able to differentiate between the rw=0 and rw=1 elements. Perhaps there is a more elegant solution for this as well.
Is there a way to make the for-loop faster? While I would like to not add the self interaction (j==k) I am even ok with having self interaction if it means I can get significantly faster calculation (length of ind ~ 1E6 so self interaction is probably insignificant anyways).
Thank you!
Ilya
Edit:
Here is the full code. Note, in the full code I am averaging over j as well.
import numpy as np
def twopointcorr(x,y,s,dr):
width = np.max(x)-np.min(x)
height = np.max(y)-np.min(y)
n = len(x)
maxR = np.sqrt((width/2)**2 + (height/2)**2)
r = np.arange(0, maxR, dr)
print(r)
corrfun = r*0
rw = r*0
print(maxR)
''' go through all points'''
for j in range(0, n-1):
hypot = np.sqrt((x[j]-x)**2+(y[j]-y)**2)
ind = [np.abs(r-h).argmin() for h in hypot]
for k, v in enumerate(ind):
if j==k:
continue
corrfun[v] += (s[k]-s[j])**2
rw[v] += 1
rw2 = rw
rw2[rw < 1] = 1
corrfun = np.sqrt(np.divide(corrfun, rw2))
return r, corrfun, rw
I debug test it the following way
from twopointcorr import twopointcorr
import numpy as np
import matplotlib.pyplot as plt
import time
n=1000
x = np.random.rand(n)
y = np.random.rand(n)
s = np.random.rand(n)
print('running two point corr functinon')
start_time = time.time()
r,corrfun,rw = twopointcorr(x,y,s,0.1)
print("--- Execution time is %s seconds ---" % (time.time() - start_time))
fig1=plt.figure()
plt.plot(r, corrfun,'-x')
fig2=plt.figure()
plt.plot(r, rw,'-x')
plt.show()
Again, the main issue is that in the real dataset n~1E6. I can resample to make it smaller, of course, but I would love to actually crank through the dataset.
Here is the code that use broadcast, hypot, round, bincount to remove all the loops:
def twopointcorr2(x, y, s, dr):
width = np.max(x)-np.min(x)
height = np.max(y)-np.min(y)
n = len(x)
maxR = np.sqrt((width/2)**2 + (height/2)**2)
r = np.arange(0, maxR, dr)
osub = lambda x:np.subtract.outer(x, x)
ind = np.clip(np.round(np.hypot(osub(x), osub(y)) / dr), 0, len(r)-1).astype(int)
rw = np.bincount(ind.ravel())
rw[0] -= len(x)
corrfun = np.bincount(ind.ravel(), (osub(s)**2).ravel())
return r, corrfun, rw
to compare, I modified your code as follows:
def twopointcorr(x,y,s,dr):
width = np.max(x)-np.min(x)
height = np.max(y)-np.min(y)
n = len(x)
maxR = np.sqrt((width/2)**2 + (height/2)**2)
r = np.arange(0, maxR, dr)
corrfun = r*0
rw = r*0
for j in range(0, n):
hypot = np.sqrt((x[j]-x)**2+(y[j]-y)**2)
ind = [np.abs(r-h).argmin() for h in hypot]
for k, v in enumerate(ind):
if j==k:
continue
corrfun[v] += (s[k]-s[j])**2
rw[v] += 1
return r, corrfun, rw
and here is the code to check the results:
import numpy as np
n=1000
x = np.random.rand(n)
y = np.random.rand(n)
s = np.random.rand(n)
r1, corrfun1, rw1 = twopointcorr(x,y,s,0.1)
r2, corrfun2, rw2 = twopointcorr2(x,y,s,0.1)
assert np.allclose(r1, r2)
assert np.allclose(corrfun1, corrfun2)
assert np.allclose(rw1, rw2)
and the %timeit results:
%timeit twopointcorr(x,y,s,0.1)
%timeit twopointcorr2(x,y,s,0.1)
outputs:
1 loop, best of 3: 5.16 s per loop
10 loops, best of 3: 134 ms per loop
Your original code on my system runs in about 5.7 seconds. I fully vectorized the inner loop and got it to run in 0.39 seconds. Simply replace your "go through all points" loop with this:
points = np.column_stack((x,y))
hypots = scipy.spatial.distance.cdist(points, points)
inds = np.rint(hypots.clip(max=maxR) / dr).astype(np.int)
# go through all points
for j in range(n): # n.b. previously n-1, not sure why
ind = inds[j]
np.add.at(corrfun, ind, (s - s[j])**2)
np.add.at(rw, ind, 1)
rw[ind[j]] -= 1 # subtract self
The first observation was that your hypot code was computing 2D distances, so I replaced that with cdist from SciPy to do it all in a single call. The second was that the inner for loop was slow, and thanks to an insightful comment from #hpaulj I vectorized that as well using np.add.at().
Since you asked how to vectorize the inner loop as well, I did that later. It now takes 0.25 seconds to run, for a total speedup of over 20x. Here's the final code:
points = np.column_stack((x,y))
hypots = scipy.spatial.distance.cdist(points, points)
inds = np.rint(hypots.clip(max=maxR) / dr).astype(np.int)
sn = np.tile(s, (n,1)) # n copies of s
diffs = (sn - sn.T)**2 # squares of pairwise differences
np.add.at(corrfun, inds, diffs)
rw = np.bincount(inds.flatten(), minlength=len(r))
np.subtract.at(rw, inds.diagonal(), 1) # subtract self
This uses more memory but does produce a substantial speedup vs. the single-loop version above.
Ok, so as it turns out outer products are incredibly memory expensive, however, using answers from #HYRY and #JohnZwinck i was able to make code that is still roughly linear in n in memory and computes fast (0.5 seconds for the test case)
import numpy as np
def twopointcorr(x,y,s,dr,maxR=-1):
width = np.max(x)-np.min(x)
height = np.max(y)-np.min(y)
n = len(x)
if maxR < dr:
maxR = np.sqrt((width/2)**2 + (height/2)**2)
r = np.arange(0, maxR+dr, dr)
corrfun = r*0
rw = r*0
for j in range(0, n):
ind = np.clip(np.round(np.hypot(x[j]-x,y[j]-y) / dr), 0, len(r)-1).astype(int)
np.add.at(corrfun, ind, (s - s[j])**2)
np.add.at(rw, ind, 1)
rw[0] -= n
corrfun = np.sqrt(np.divide(corrfun, np.maximum(rw,1)))
r=np.delete(r,-1)
rw=np.delete(rw,-1)
corrfun=np.delete(corrfun,-1)
return r, corrfun, rw
I'm looking for a method for solve the 2D heat equation with python. I have already implemented the finite difference method but is slow motion (to make 100,000 simulations takes 30 minutes). The idea is to create a code in which the end can write,
for t in TIME:
DeltaU=f(U)
U=U+DeltaU*DeltaT
save(U)
How can I do that?
In the first form of my code, I used the 2D method of finite difference, my grill is 5000x250 (x, y). Now I would like to decrease the speed of computing and the idea is to find
DeltaU = f(u)
where U is a heat function. For implementation I used this source http://www.timteatro.net/2010/10/29/performance-python-solving-the-2d-diffusion-equation-with-numpy/ for 2D case, but the run time is more expensive for my necessity. Are there some methods to do this?
Maybe I must to work with the matrix
A=1/dx^2 (2 -1 0 0 ... 0
-1 2 -1 0 ... 0
0 -1 2 -1 ... 0
. .
. .
. .
0 ... -1 2)
but how to make this in the 2D problem? How to inserting Boundary conditions in A?
This is the code for the finite difference that I used:
Lx=5000 # physical length x vector in micron
Ly=250 # physical length y vector in micron
Nx = 100 # number of point of mesh along x direction
Ny = 50 # number of point of mesh along y direction
a = 0.001 # diffusion coefficent
dx = 1/Nx
dy = 1/Ny
dt = (dx**2*dy**2)/(2*a*(dx**2 + dy**2)) # it is 0.04
x = linspace(0.1,Lx, Nx)[np.newaxis] # vector to create mesh
y = linspace(0.1,Ly, Ny)[np.newaxis] # vector to create mesh
I=sqrt(x*y.T) #initial data for heat equation
u=np.ones(([Nx,Ny])) # u is the matrix referred to heat function
steps=100000
for m in range (0,steps):
du=np.zeros(([Nx,Ny]))
for i in range (1,Nx-1):
for j in range(1,Ny-1):
dux = ( u[i+1,j] - 2*u[i,j] + u[i-1, j] ) / dx**2
duy = ( u[i,j+1] - 2*u[i,j] + u[i, j-1] ) / dy**2
du[i,j] = dt*a*(dux+duy)
# Boundary Conditions
t1=(u[:,0]+u[:,1])/2
u[:,0]=t1
u[:,1]=t1
t2=(u[0,:]+u[1,:])/2
u[0,:]=t2
u[1,:]=t2
t3=(u[-1,:]+u[-2,:])/2
u[-1,:]=t3
u[-2,:]=t3
u[:,-1]=1
filename1='data_{:08d}.txt'
if m%100==0:
np.savetxt(filename1.format(m),u,delimiter='\t' )
For elaborate 100000 steps the run time is about 30 minutes. I would to optimize this code (with the idea presented in the initial lines) to have a run time about 5/10 minutes or minus. How can I do it?
There are some simple but tremendous improvements possible.
Just by introducing Dxx, Dyy = 1/(dx*dx), 1/(dy*dy) the runtime drops 25%. By using slices and avoiding for-loops, the code is now 400 times faster.
import numpy as np
def f_for(u):
for m in range(0, steps):
du = np.zeros_like(u)
for i in range(1, Nx-1):
for j in range(1, Ny-1):
dux = (u[i+1, j] - 2*u[i, j] + u[i-1, j]) / dx**2
duy = (u[i, j+1] - 2*u[i, j] + u[i, j-1]) / dy**2
du[i, j] = dt*a*(dux + duy)
return du
def f_slice(u):
du = np.zeros_like(u)
Dxx, Dyy = 1/dx**2, 1/dy**2
i = slice(1, Nx-1)
iw = slice(0, Nx-2)
ie = slice(2, Nx)
j = slice(1, Ny-1)
js = slice(0, Ny-2)
jn = slice(2, Ny)
for m in range(0, steps):
dux = Dxx * (u[ie, j] - 2*u[i, j] + u[iw, j])
duy = Dyy * (u[i, jn] - 2*u[i, j] + u[i, js])
du[i, j] = dt*a*(dux + duy)
return du
Nx = 100 # number of mesh points in the x-direction
Ny = 50 # number of mesh points in the y-direction
a = 0.001 # diffusion coefficent
dx = 1/Nx
dy = 1/Ny
dt = (dx**2*dy**2)/(2*a*(dx**2 + dy**2))
steps = 10000
U = np.ones((Nx, Ny))
%timeit f_for(U)
%timeit f_slice(U)
Have you considered paralellizing your code or using GPU acceleration.
It would help if you ran your code the python profiler (cProfile) so that you can figure out where you bottleneck in runtime is. I'm assuming it's in solving the matrix equation you get to which can be easily sped up by the methods I listed above.
I might be wrong but in your code in the loop you create for the time steps, "m in range(steps)"
in the one line below you continue with;
Du =np.zeros(----).
Not an expert of python but this may be resulting in creating a sparse matrix in the number of steps, in this case 100k times.