I am writing a script in Python 2.7 that will train a neural network. As part of the main script I need a program that solves a 2D heat conduction partial differential equation. Previously I wrote this program in Fortran and then rewrote it in Python. The time required by Fortran is 0.1 s, while Python requires 13 s! That is absolutely unacceptable for me, since in that case the computational time will be determined by the part of the program that solves the PDE, not by the epochs of training of the neural network.
How can I solve this problem?
It seems that I cannot vectorize the matrix, since a new element t[i,j] is calculated using the value t[i-1,j], etc.
Here is the part of the code that is running slowly:
while (norm > eps):
    # old value
    t_old = np.copy(t)
    # new value
    for i in xrange(1,n-1):
        for j in xrange(1,m-1):
            d[i] = 0.0
            a[i+1,j] = (0.5*dx/k[i,j] + 0.5*dx/k[i+1,j])
            a[i-1,j] = (0.5*dx/k[i,j] + 0.5*dx/k[i-1,j])
            a[i,j+1] = (0.5*dy/k[i,j] + 0.5*dy/k[i,j+1])
            a[i,j-1] = (0.5*dy/k[i,j] + 0.5*dy/k[i,j-1])
            a[i,j] = a[i+1,j] + a[i-1,j] + a[i,j+1] + a[i,j-1]
            sum = a[i+1,j]*t[i+1,j] + a[i-1,j]*t[i-1,j] + a[i,j+1]*t[i,j+1] + a[i,j-1]*t[i,j-1] + d[i]
            t[i,j] = ( sum + d[i] ) / a[i,j]
            k[i,j] = k_func(t[i,j])
    # matrix 2nd norm
    norm = np.linalg.norm(t-t_old)
Pure Python optimizations
This won't bring all that much, but it is the easiest.
Eliminate dead code. In the inner loop, d[i] is set to zero, and then it is added to something else in two places. Adding 0 doesn't change anything, so you can remove d[i] altogether.
Calculate things only once. k[i,j], 0.5*dx and 0.5*dy are each used four times. So calculate them once and assign them to local variables.
Remove unnecessary array accesses. In the inner loop, only five elements of the a matrix are calculated and used. So replace those matrix elements by local variables a1 through a5.
The code now looks like this:
# loop-invariant constants, computed once
px = 0.5*dx
py = 0.5*dy
while (norm > eps):
    # old value
    t_old = np.copy(t)
    # new value
    for i in xrange(1,n-1):
        for j in xrange(1,m-1):
            q = k[i,j]
            a1 = (px/q + px/k[i+1,j])
            a2 = (px/q + px/k[i-1,j])
            a3 = (py/q + py/k[i,j+1])
            a4 = (py/q + py/k[i,j-1])
            a5 = a1 + a2 + a3 + a4
            sum = a1*t[i+1,j] + a2*t[i-1,j] + a3*t[i,j+1] + a4*t[i,j-1]
            t[i,j] = sum / a5
            k[i,j] = k_func(t[i,j])
    # matrix 2nd norm
    norm = np.linalg.norm(t-t_old)
Since your example doesn't give complete working code, I cannot measure the effects.
However, looping in Python is relatively inefficient. For good performance in pure Python it is better to use list comprehensions instead of loops. That is because in comprehensions the looping is done in the Python runtime in C, instead of in Python bytecode. But since we're already dealing with numpy arrays here, I will not expand on this.
Recode your algorithm to use numpy instead of loops
The basic idea behind numpy is that it has optimized routines (written in C or Fortran) for array operations. So for operating on arrays you should use numpy functions instead of loops!
Your loop consists mostly of filling a matrix with values derived from another matrix shifted one column or row. For that you could do something like this.
In this example I'll be shifting k one row down:
In [1]: import numpy as np

In [2]: k = np.arange(1, 26).reshape([5, 5])

In [3]: k
Out[3]:
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

In [4]: dx = 0.27

In [5]: 0.5*dx/k[1:,:]
Out[5]:
array([[0.0225    , 0.01928571, 0.016875  , 0.015     , 0.0135    ],
       [0.01227273, 0.01125   , 0.01038462, 0.00964286, 0.009     ],
       [0.0084375 , 0.00794118, 0.0075    , 0.00710526, 0.00675   ],
       [0.00642857, 0.00613636, 0.00586957, 0.005625  , 0.0054    ]])

In [6]: np.insert(0.5*dx/k[1:,:], 0, 0, axis=0)
Out[6]:
array([[0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.0225    , 0.01928571, 0.016875  , 0.015     , 0.0135    ],
       [0.01227273, 0.01125   , 0.01038462, 0.00964286, 0.009     ],
       [0.0084375 , 0.00794118, 0.0075    , 0.00710526, 0.00675   ],
       [0.00642857, 0.00613636, 0.00586957, 0.005625  , 0.0054    ]])
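Putting this together, below is a rough sketch of how the interior update could be vectorized with such shifted views (a sketch, assuming t, k, dx, dy, eps and k_func exist as in the question, and that k_func can be applied elementwise to an array; wrap it with np.vectorize if it cannot). Note that this turns the Gauss-Seidel-style update, where t[i-1,j] is already the new value, into a Jacobi-style update that only uses values from the previous sweep, so the number of iterations until convergence may differ:

import numpy as np

norm = eps + 1.0
while norm > eps:
    t_old = t.copy()
    kc = k[1:-1, 1:-1]
    a_ip = 0.5*dx/kc + 0.5*dx/k[2:, 1:-1]    # coefficient towards i+1
    a_im = 0.5*dx/kc + 0.5*dx/k[:-2, 1:-1]   # coefficient towards i-1
    a_jp = 0.5*dy/kc + 0.5*dy/k[1:-1, 2:]    # coefficient towards j+1
    a_jm = 0.5*dy/kc + 0.5*dy/k[1:-1, :-2]   # coefficient towards j-1
    a_c = a_ip + a_im + a_jp + a_jm
    num = (a_ip*t_old[2:, 1:-1] + a_im*t_old[:-2, 1:-1] +
           a_jp*t_old[1:-1, 2:] + a_jm*t_old[1:-1, :-2])
    t[1:-1, 1:-1] = num / a_c
    k[1:-1, 1:-1] = k_func(t[1:-1, 1:-1])
    # matrix 2nd norm
    norm = np.linalg.norm(t - t_old)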
Related
I am trying to calculate the radius of the lens created by two overlapping spheres. For this I tried both a trigonometric method and a purely algebraic method. I compared the results of the two methods on various data sets and found a small number of contradictions on just some of those data sets. The results are the same in most cases. The problem can be reproduced by the following example (on indices 3-5):
poss = np.array([[[-0.884, -3.45, -0.99 ], [-0.901, -3.43, -0.995]], [[-0.993, -3.44, -0.97 ], [-1.01, -3.46, -1. ]],
[[-0.993, -3.44, -0.97 ], [-0.998, -3.45, -1. ]], [[0.885 , 0.967, -1.02 ], [0.885, 0.964, -1.02] ],
[[-0.252, -3.3 , -0.777], [-0.197, -3.3 , -0.777]], [[0.26 , -1.68, -0.803], [0.288, -1.67, -0.799]],
[[0.599 , 2.04 , -0.857], [0.607 , 2.04 , -0.84 ]], [[0.615 , 2. , -0.833], [0.633, 2. , -0.855]],
[[0.698 , 2.06 , -0.921], [0.679 , 2.06 , -0.914]]])
rad = np.array([[0.0108, 0.0205], [0.0231, 0.0259], [0.0231 , 0.0304], [0.0154, 0.0124], [0.0137, 0.0413],
[0.027 , 0.003 ], [0.0102, 0.022 ], [0.00221, 0.0268], [0.0147, 0.0124]])
# The length of the overlaps; lenses' heights
gap = np.array([-4.57922157e-03, -9.13773714e-03, -2.14843788e-02, -2.48000000e-02, -1.38777878e-17, -2.42861287e-17,
-1.34117058e-02, -5.84659193e-04, -6.85154327e-03])
The functions are:
def trigonometric(r_active, gap):
    r_add = np.add.reduce(r_active, axis=1)
    paired_cent_dis = np.sum((r_add, gap), axis=0)
    intersect_angle_0 = np.arccos(np.clip((r_active[:, 0] ** 2 +
                                           paired_cent_dis ** 2 - r_active[:, 1] ** 2) /
                                          (2 * r_active[:, 0] * paired_cent_dis), -1, 1))
    intersect_plane_rad = r_active[:, 0] * np.sin(intersect_angle_0)
    return intersect_plane_rad

def algebraic(r, gap):
    items_ = np.empty((len(gap), 1), dtype=np.float64)
    for i in range(len(gap)):
        r0, r1 = r[i]
        cur_gap = gap[i]
        paired_cent_dis = r0 + r1 + cur_gap
        intersect_plane_rad = 0.5 * abs((-paired_cent_dis + r0 + r1) *
                                        ( paired_cent_dis + r0 + r1) * (-paired_cent_dis - r0 + r1) *
                                        (-paired_cent_dis + r0 - r1)) ** 0.5 / paired_cent_dis
        items_[i] = intersect_plane_rad
    return items_.ravel()
trigonometric(rad, gap)
algebraic(rad, gap)
The results:
# repr trigonometric:
array([7.59403901e-03, 1.42126146e-02, 2.08670250e-02, 0.00000000e+00,
4.56484128e-10, 0.00000000e+00, 1.01747354e-02, 1.45347671e-03,
8.94740633e-03])
# repr algebraic:
array([7.59403901e-03, 1.42126146e-02, 2.08670250e-02, 4.69938148e-10,
5.34354024e-10, 3.68549655e-10, 1.01747354e-02, 1.45347671e-03,
8.94740633e-03])
As can be seen from the results, the values differ at indices 3, 4, and 5. AFAIK, the two methods do the same job; this is borne out by various data volumes. But such differences can happen at some indices in rare cases. In this example, just the 3rd index is affected by np.clip (in this small example it gets 0 from the trigonometric method, but it gets a nonzero value in my main code!? That nonzero value, too, was different from the value the algebraic method gives for the same index, i.e. 4.69938148e-10). As is obvious in the images, and by focusing on the gap values (which are very small or near the diameter of the smaller sphere), it seems the problem (differences between the results on some contacts) is due to calculation precision or something like that; a minimal numeric check of index 3 is included after the questions below.
The final algebraic result shows that the magnitude of the suspected indices is in a reasonable range (here around 1e-10), and it seems the trigonometric method is misled along the way.
I would be grateful to find out:
where the problem source is,
why the 4th index of the trigonometric result gets a nonzero value, but of a different magnitude, although the 4th and 5th gap values are nearly the same,
and how the trigonometric method could be cured, if it can be.
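To make the precision suspicion concrete, here is a minimal check of index 3 (a sketch using the rad and gap values above): analytically the arccos argument is exactly 1, so in floating point it lands at or within a few ulps of 1.0, and the clipped trigonometric path collapses to 0 (or to a value of order 1e-10), while the algebraic path keeps a tiny nonzero radius.

import numpy as np

r0, r1 = 0.0154, 0.0124                       # rad[3]
d = r0 + r1 + (-0.0248)                       # paired_cent_dis from gap[3], analytically 0.003
arg = (r0**2 + d**2 - r1**2) / (2 * r0 * d)   # analytically exactly 1.0
print(arg)                                    # within a few ulps of 1.0
print(r0 * np.sin(np.arccos(np.clip(arg, -1, 1))))  # 0 or ~1e-10, depending on rounding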
I'm having an issue where I'm adding two 4x1 arrays and the result is a 4x4 array where the first column is repeated 4 times. The result I need is a 4x1 array.
I've initialized an array as such (m = 4): z = np.zeros((m, len(t)))
Later in my code I pass this array into a function as z[:,k+1], so the dimensionality becomes a 4x1 array. (Note that when I print this array to my terminal it shows up as a row vector and not a column vector: [0. 0. 0. 0.]. I'm not sure why this is either.) The array that I'm trying to add to z has the following structure when printed to my terminal:
[[#]
[#]
[#]
[#]]
Clearly the addition is pulling the above array into each element of z instead of adding their respective components together, but I'm not sure why as they should both be column vectors. I'd appreciate any help with this.
EDIT: I have a lot of code so I've included a condensed version that hopefully gets the idea across.
n = 4 # Defines number of states
m = 4 # Defines number of measurements
x = np.zeros((n, len(t)), dtype=np.float64) # Initializes states
z = np.zeros((m, len(t)), dtype=np.float64) # Initializes measurements
u = np.zeros((1, len(t)), dtype=np.float64) # Initializes input
...
C = np.eye(m) # Defines measurement matrix
...
for k in range(len(t)-1):
    ...
    x_ukf[:,k+1], P_ukf[k+1,:,:] = function_call(x_ukf[:,k], z[:,k+1], u[:,k], P_ukf[k,:,:], C, Q, R, T) # Calls UKF function
This then leads to the function where the following occurs (note that measurement_matrix = C (a 4x4 matrix), X is a 4x9 matrix, and W is a 1x9 row vector):
Z = measurement_matrix @ X  # Calculates measurements based on sigma points
zhat = Z @ W.T
...
state_vec = state_vec + K @ (measurement_vec - zhat)  # Updates state estimates
The issue I'm having is with the expression (measurement_vec - zhat). This is where the result should be a 4x1 vector, but I'm getting a 4x4 matrix.
This is numpy broadcasting: adding an array of shape (4,) to an array of shape (4, 1) expands both operands, producing a (4, 4) result. For example:
a, b = np.arange(4), np.arange(8,12)
c = a + b[:,None]
Output:
array([[ 8, 9, 10, 11],
[ 9, 10, 11, 12],
[10, 11, 12, 13],
[11, 12, 13, 14]])
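One common fix (a sketch, assuming zhat comes out of Z @ W.T with shape (4, 1) while measurement_vec has shape (4,)) is to make both operands the same shape before subtracting:

import numpy as np

measurement_vec = np.arange(4.0)        # shape (4,), like z[:, k+1]
zhat = np.arange(8.0, 12.0)[:, None]    # shape (4, 1), like the result of Z @ W.T

print((measurement_vec - zhat).shape)           # (4, 4): broadcasting expands both
print((measurement_vec - zhat.ravel()).shape)   # (4,):  both operands 1-D
print((measurement_vec[:, None] - zhat).shape)  # (4, 1): both operands columns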
I have a tensor as follows:
arr = [[1.5,0.2],[2.3,0.1],[1.3,0.21],[2.2,0.09],[4.4,0.8]]
I would like to collect the small arrays whose first elements differ by less than 0.3 and whose second elements differ by less than 0.03.
For example [1.5,0.2] and [1.3,0.21] should belong to a same category. The difference of their first elements is 0.2<0.3 and second 0.01<0.03.
I want a tensor that looks like this:
arr = {[[1.5,0.2],[1.3,0.21]],[[2.3,0.1],[2.2,0.09]]}
How can I do this in TensorFlow? Eager mode is OK.
I found a way which is a bit ugly and slow:
samples = np.array([[1.5,0.2],[2.3,0.1],[1.3,0.2],[2.2,0.09],[4.4,0.8],[2.3,0.11]],dtype=np.float32)
ini_samples = samples
samples = tf.split(samples,2,1)
a = samples[0]
b = samples[1]
find_match1 = tf.reduce_sum(tf.abs(tf.expand_dims(a,0) - tf.expand_dims(a,1)),2)
a = tf.logical_and(tf.greater(find_match1, tf.zeros_like(find_match1)),tf.less(find_match1, 0.3*tf.ones_like(find_match1)))
find_match2 = tf.reduce_sum(tf.abs(tf.expand_dims(b,0) - tf.expand_dims(b,1)),2)
b = tf.logical_and(tf.greater(find_match2, tf.zeros_like(find_match2)),tf.less(find_match2, 0.03*tf.ones_like(find_match2)))
x,y = tf.unique(tf.reshape(tf.where(tf.logical_or(a,b)),[1,-1])[0])
r = tf.gather(ini_samples, x)
Does tensorflow have more elegant functions?
You cannot get a result composed of "groups" of vectors with different sizes. Instead, you can make a "group id" tensor that classifies each vector into a group according to your criteria. The part that makes this a bit more complicated is that you have to "fuse" groups with common elements, which I think can only be done with a loop. This code does something like that:
import tensorflow as tf
def make_groups(correspondences):
    # Multiply each row by its index
    m = tf.to_int32(correspondences) * tf.range(tf.shape(correspondences)[0])
    # Pick the largest index for each row
    r = tf.reduce_max(m, axis=1)
    # While loop accounts for transitive correspondences
    # (e.g. if A and B go together and B and C go together, then A, B and C go together)
    # The loop makes sure every element gets the largest common group id
    r_prev = -tf.ones_like(r)
    r, _ = tf.while_loop(lambda r, r_prev: tf.reduce_any(tf.not_equal(r, r_prev)),
                         lambda r, r_prev: (tf.gather(r, r), tf.identity(r)),
                         [r, r_prev])
    # Use unique indices to make sequential group ids starting from 0
    return tf.unique(r)[1]
# Test
with tf.Graph().as_default(), tf.Session() as sess:
    arr = tf.constant([[1.5 , 0.2 ],
                       [2.3 , 0.1 ],
                       [1.3 , 0.21],
                       [2.2 , 0.09],
                       [4.4 , 0.8 ],
                       [1.1 , 0.23]])
    a = arr[:, 0]
    b = arr[:, 1]
    cond = (tf.abs(a - a[:, tf.newaxis]) < 0.3) | (tf.abs(b - b[:, tf.newaxis]) < 0.03)
    groups = make_groups(cond)
    print(sess.run(groups))
    # [0 1 0 1 2 0]
So in this case, the groups would be:
[1.5, 0.2], [1.3, 0.21] and [1.1, 0.23]
[2.3, 0.1] and [2.2, 0.09]
[4.4, 0.8]
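If you also need the actual rows of each group rather than just the ids, one possible follow-up is tf.dynamic_partition (a sketch, run inside the same with ... Session() as sess: block; tf.dynamic_partition needs num_partitions as a plain Python int, so the number of groups, 3 here, is hard-coded):

parts = tf.dynamic_partition(arr, groups, num_partitions=3)
print(sess.run(parts))
# [array([[1.5 , 0.2 ], [1.3 , 0.21], [1.1 , 0.23]], dtype=float32),
#  array([[2.3 , 0.1 ], [2.2 , 0.09]], dtype=float32),
#  array([[4.4 , 0.8 ]], dtype=float32)]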
I'm very new to Python and currently trying to replicate plots etc. that I previously made with GrADS. I want to calculate the divergence at each grid box using the u and v wind fields (which are just scaled by specific humidity, q) from a netCDF climate model file.
From endless searching I know I need to use some combination of np.gradient and np.sum, but can't find the right combination. I just know that to do it 'by hand', the calculation would be
divg = dqu/dx + dqv/dy
I know the below is wrong, but it's the best I've got so far...
nc = Dataset(ifile)
q = np.array(nc.variables['hus'][0,:,:])
u = np.array(nc.variables['ua'][0,:,:])
v = np.array(nc.variables['va'][0,:,:])
lon=nc.variables['lon'][:]
lat=nc.variables['lat'][:]
qu = q*u
qv = q*v
dqu/dx, dqu/dy = np.gradient(qu, [dx, dy])
dqv/dx, dqv/dy = np.gradient(qv, [dx, dy])
divg = np.sum(dqu/dx, dqv/dy)
This gives the error 'SyntaxError: can't assign to operator'.
Any help would be much appreciated.
try something like:
dqu_dx, dqu_dy = np.gradient(qu, [dx, dy])
dqv_dx, dqv_dy = np.gradient(qv, [dx, dy])
You cannot assign to an operation in Python; all of these are syntax errors:
a + b = 3
a * b = 7
# or, in your case:
a / b = 9
UPDATE
following Pinetwig's comment: a/b is not a valid identifier name; it is (the return value of) an operator.
Try removing the [dx, dy].
[dqu_dx, dqu_dy] = np.gradient(qu)
[dqv_dx, dqv_dy] = np.gradient(qv)
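For the divergence itself, a rough sketch of the intended calculation (assuming uniform grid spacings dx and dy, and that axis 0 of the fields is latitude/y while axis 1 is longitude/x) would then be:

dqu_dy, dqu_dx = np.gradient(qu, dy, dx)   # one gradient array per axis; spacings as separate scalars
dqv_dy, dqv_dx = np.gradient(qv, dy, dx)
divg = dqu_dx + dqv_dy                     # divg = d(qu)/dx + d(qv)/dy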
Also, a point if you are recreating plots: gradient changed in numpy between 1.8.2 and 1.9. This matters when recreating MATLAB plots in Python, as 1.8.2 used the MATLAB method. I am not sure how this relates to GrADS. Here is the wording for both.
1.8.2
"The gradient is computed using central differences in the interior
and first differences at the boundaries. The returned gradient hence has
the same shape as the input array."
1.9
"The gradient is computed using second order accurate central differences in the interior and either first differences or second order accurate one-sides (forward or backwards) differences at the boundaries. The returned gradient hence has the same shape as the input array."
The gradient function for 1.8.2 is here.
def gradient(f, *varargs):
    """
    Return the gradient of an N-dimensional array.

    The gradient is computed using central differences in the interior
    and first differences at the boundaries. The returned gradient hence has
    the same shape as the input array.

    Parameters
    ----------
    f : array_like
        An N-dimensional array containing samples of a scalar function.
    `*varargs` : scalars
        0, 1, or N scalars specifying the sample distances in each direction,
        that is: `dx`, `dy`, `dz`, ... The default distance is 1.

    Returns
    -------
    gradient : ndarray
        N arrays of the same shape as `f` giving the derivative of `f` with
        respect to each dimension.

    Examples
    --------
    >>> x = np.array([1, 2, 4, 7, 11, 16], dtype=np.float)
    >>> np.gradient(x)
    array([ 1. ,  1.5,  2.5,  3.5,  4.5,  5. ])
    >>> np.gradient(x, 2)
    array([ 0.5 ,  0.75,  1.25,  1.75,  2.25,  2.5 ])

    >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float))
    [array([[ 2.,  2., -1.],
            [ 2.,  2., -1.]]),
     array([[ 1. ,  2.5,  4. ],
            [ 1. ,  1. ,  1. ]])]

    """
    f = np.asanyarray(f)
    N = len(f.shape)  # number of dimensions
    n = len(varargs)
    if n == 0:
        dx = [1.0]*N
    elif n == 1:
        dx = [varargs[0]]*N
    elif n == N:
        dx = list(varargs)
    else:
        raise SyntaxError(
            "invalid number of arguments")

    # use central differences on interior and first differences on endpoints
    outvals = []

    # create slice objects --- initially all are [:, :, ..., :]
    slice1 = [slice(None)]*N
    slice2 = [slice(None)]*N
    slice3 = [slice(None)]*N

    otype = f.dtype.char
    if otype not in ['f', 'd', 'F', 'D', 'm', 'M']:
        otype = 'd'

    # Difference of datetime64 elements results in timedelta64
    if otype == 'M':
        # Need to use the full dtype name because it contains unit information
        otype = f.dtype.name.replace('datetime', 'timedelta')
    elif otype == 'm':
        # Needs to keep the specific units, can't be a general unit
        otype = f.dtype

    for axis in range(N):
        # select out appropriate parts for this dimension
        out = np.empty_like(f, dtype=otype)

        slice1[axis] = slice(1, -1)
        slice2[axis] = slice(2, None)
        slice3[axis] = slice(None, -2)
        # 1D equivalent -- out[1:-1] = (f[2:] - f[:-2])/2.0
        out[slice1] = (f[slice2] - f[slice3])/2.0

        slice1[axis] = 0
        slice2[axis] = 1
        slice3[axis] = 0
        # 1D equivalent -- out[0] = (f[1] - f[0])
        out[slice1] = (f[slice2] - f[slice3])

        slice1[axis] = -1
        slice2[axis] = -1
        slice3[axis] = -2
        # 1D equivalent -- out[-1] = (f[-1] - f[-2])
        out[slice1] = (f[slice2] - f[slice3])

        # divide by step size
        outvals.append(out / dx[axis])

        # reset the slice object in this dimension to ":"
        slice1[axis] = slice(None)
        slice2[axis] = slice(None)
        slice3[axis] = slice(None)

    if N == 1:
        return outvals[0]
    else:
        return outvals
If your grid is Gaussian and the wind names in the file are "u" and "v" you can also calculate divergence directly using cdo:
cdo uv2dv in.nc out.nc
See https://code.mpimet.mpg.de/projects/cdo/embedded/index.html#x1-6850002.13.2 for more details.
I'm trying to evaluate the probabilities of end locations of random walks but I'm having some trouble with the speed of my program. Basically what I'm trying to do is take as an input a dictionary that contains the probabilities for a random walk (e.g. p = {0:0.5, 1:0.2, -1:0.3}, meaning there's a 50% probability X stays at 0, a 20% probability X increases by 1, and a 30% probability X decreases by 1) and then calculate the probabilities for all the possible future states after n iterations.
So for example if p = {0:0.5, 1:0.2, -1:0.3} and n = 2 then it will return {0:0.37, 1:0.2, -1:0.3, 2:0.04, -2:0.09}
if p = {0:0.5, 1:0.2, -1:0.3} and n = 1 then it will return {0:0.5, 1:0.2, -1:0.3}
I have working code, and it runs relatively quickly if n is low and the p dictionary is small, but when n > 500 and the dictionary has around 50 values it takes upwards of 5 minutes to calculate. I'm guessing this is because it only runs on one processor, so I went ahead and modified it to use Python's multiprocessing module (as I read that multithreading doesn't improve parallel computing performance because of the GIL).
My problem is that there is not much improvement with multiprocessing, and I'm not sure whether that's because I'm implementing it wrong or because of the overhead of multiprocessing in Python. I'm just wondering if there's a library somewhere that evaluates all the probabilities of all the possibilities of a random walk when n > 500 in parallel? My next step, if I can't find anything, is to write my own function as an extension in C, but it will be my first time doing it and although I've coded in C before it has been a while.
Original Non MultiProcessed Code
def random_walk_predictor(probabilities_tree, period):
    ret = probabilities_tree
    probabilities_leaves = ret.copy()
    for x in range(period):
        tmp = {}
        for leaf in ret.keys():
            for tree_leaf in probabilities_leaves.keys():
                try:
                    tmp[leaf + tree_leaf] = (ret[leaf] * probabilities_leaves[tree_leaf]) + tmp[leaf + tree_leaf]
                except:
                    tmp[leaf + tree_leaf] = ret[leaf] * probabilities_leaves[tree_leaf]
        ret = tmp
    return ret
MultiProcessed code
from multiprocessing import Manager, Pool
from functools import partial

def probability_calculator(origin, probability, outp, reference):
    for leaf in probability.keys():
        try:
            outp[origin + leaf] = outp[origin + leaf] + (reference[origin] * probability[leaf])
        except KeyError:
            outp[origin + leaf] = reference[origin] * probability[leaf]

def random_walk_predictor(probabilities_leaves, period):
    probabilities_leaves = tree_developer(probabilities_leaves)
    manager = Manager()
    prob_leaves = manager.dict(probabilities_leaves)
    ret = manager.dict({0: 1})
    p = Pool()

    for x in range(period):
        out = manager.dict()
        partial_probability_calculator = partial(probability_calculator, probability=prob_leaves, outp=out, reference=ret.copy())
        p.map(partial_probability_calculator, ret.keys())
        ret = out

    return ret.copy()
There tend to be analytic solutions to exactly solve this kind of problem that look similar to binomial distributions, but I'll assume you're really asking for a computational solution for a more general class of problem.
Rather than using python dictionaries, it's easier to think about this in terms of the underlying mathematical problem. Build a matrix A that describes the probability of going from one state to another. Build a state x that describes the probability of being at a given location at some time.
Because after n transitions you can step at most n steps from the origin (in either direction), your state needs to have 2n+1 rows, and A needs to be square, of size 2n+1 by 2n+1.
For a two timestep problem your transition matrix will be 5x5 and look like:
[[ 0.5 0.2 0. 0. 0. ]
[ 0.3 0.5 0.2 0. 0. ]
[ 0. 0.3 0.5 0.2 0. ]
[ 0. 0. 0.3 0.5 0.2]
[ 0. 0. 0. 0.3 0.5]]
And your state at time 0 will be:
[[ 0.]
[ 0.]
[ 1.]
[ 0.]
[ 0.]]
The one step evolution of the system can be predicted by multiplying A and x.
So at t = 1,
x.T = [[ 0. 0.2 0.5 0.3 0. ]]
and at t = 2,
x.T = [[ 0.04 0.2 0.37 0.3 0.09]]
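A quick dense numpy check of those two steps (just to confirm the numbers above; the sparse version below is what you would actually use for large n):

import numpy as np

A = np.array([[0.5, 0.2, 0. , 0. , 0. ],
              [0.3, 0.5, 0.2, 0. , 0. ],
              [0. , 0.3, 0.5, 0.2, 0. ],
              [0. , 0. , 0.3, 0.5, 0.2],
              [0. , 0. , 0. , 0.3, 0.5]])
x = np.array([[0.], [0.], [1.], [0.], [0.]])

print(A.dot(x).T)         # [[ 0.    0.2   0.5   0.3   0.  ]]
print(A.dot(A.dot(x)).T)  # [[ 0.04  0.2   0.37  0.3   0.09]]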
Because for even modest numbers of timesteps this is potentially going to take a fair bit of storage (A requires n^2 storage), but is very sparse, we can use sparse matrices to reduce our storage (and speed up our calculations). Doing this means A requires approximately 3n elements.
import scipy.sparse as sp
import numpy as np

def random_walk_transition_probability(n, left=0.3, centre=0.5, right=0.2):
    m = 2*n+1
    A = sp.csr_matrix((m, m))
    A += sp.diags(centre*np.ones(m), 0)
    A += sp.diags(left*np.ones(m-1), -1)
    A += sp.diags(right*np.ones(m-1), 1)
    x = np.zeros((m, 1))
    x[n] = 1.0
    for i in xrange(n):
        x = A.dot(x)
    return x

print random_walk_transition_probability(4)
Timings
%timeit random_walk_transition_probability(500)
100 loops, best of 3: 7.12 ms per loop
%timeit random_walk_transition_probability(10000)
1 loops, best of 3: 1.06 s per loop