I'm trying to evaluate the probabilities of the end locations of random walks, but I'm having some trouble with the speed of my program. Basically, what I'm trying to do is take as input a dictionary that contains the step probabilities for a random walk (e.g. p = {0:0.5, 1:0.2, -1:0.3}, meaning there's a 50% probability X stays at 0, a 20% probability X increases by 1, and a 30% probability X decreases by 1) and then calculate the probabilities of all the possible future states after n iterations.
So, for example, if p = {0:0.5, 1:0.2, -1:0.3} and n = 2, it will return {0:0.37, 1:0.2, -1:0.3, 2:0.04, -2:0.09};
if p = {0:0.5, 1:0.2, -1:0.3} and n = 1, it will return {0:0.5, 1:0.2, -1:0.3}.
I have working code, and it runs relatively quickly when n is low and the p dictionary is small, but when n > 500 and the dictionary has around 50 values it takes upwards of 5 minutes to calculate. I'm guessing this is because it runs on only one processor, so I went ahead and modified it to use Python's multiprocessing module (as I read that multithreading doesn't improve parallel computing performance because of the GIL).
My problem is that there is not much improvement with multiprocessing, and I'm not sure whether that's because I'm implementing it wrong or because of the overhead of multiprocessing in Python. Is there a library somewhere that evaluates all the probabilities of all the possible outcomes of a random walk in parallel when n > 500? My next step, if I can't find anything, is to write the function as a C extension, but it will be my first time doing that, and although I've coded in C before it has been a while.
Original non-multiprocessed code
def random_walk_predictor(probabilities_tree, period):
    ret = probabilities_tree
    probabilities_leaves = ret.copy()
    for x in range(period):
        tmp = {}
        for leaf in ret.keys():
            for tree_leaf in probabilities_leaves.keys():
                try:
                    tmp[leaf + tree_leaf] = (ret[leaf] * probabilities_leaves[tree_leaf]) + tmp[leaf + tree_leaf]
                except:
                    tmp[leaf + tree_leaf] = ret[leaf] * probabilities_leaves[tree_leaf]
        ret = tmp
    return ret
Multiprocessed code
from multiprocessing import Manager, Pool
from functools import partial

def probability_calculator(origin, probability, outp, reference):
    for leaf in probability.keys():
        try:
            outp[origin + leaf] = outp[origin + leaf] + (reference[origin] * probability[leaf])
        except KeyError:
            outp[origin + leaf] = reference[origin] * probability[leaf]

def random_walk_predictor(probabilities_leaves, period):
    probabilities_leaves = tree_developer(probabilities_leaves)
    manager = Manager()
    prob_leaves = manager.dict(probabilities_leaves)
    ret = manager.dict({0: 1})
    p = Pool()
    for x in range(period):
        out = manager.dict()
        partial_probability_calculator = partial(probability_calculator, probability=prob_leaves, outp=out, reference=ret.copy())
        p.map(partial_probability_calculator, ret.keys())
        ret = out
    return ret.copy()
There tend to be analytic solutions that exactly solve this kind of problem and look similar to binomial distributions, but I'll assume you're really asking for a computational solution to a more general class of problem.
Rather than using Python dictionaries, it's easier to think about this in terms of the underlying mathematical problem. Build a matrix A that describes the probability of going from one state to another, and a state vector x that describes the probability of being at a given location at some time.
After n transitions you can be at most n steps from the origin (in either direction), so your state needs 2n+1 rows, and A needs to be square, of size 2n+1 by 2n+1.
For a two timestep problem your transition matrix will be 5x5 and look like:
[[ 0.5 0.2 0. 0. 0. ]
[ 0.3 0.5 0.2 0. 0. ]
[ 0. 0.3 0.5 0.2 0. ]
[ 0. 0. 0.3 0.5 0.2]
[ 0. 0. 0. 0.3 0.5]]
And your state at time 0 will be:
[[ 0.]
[ 0.]
[ 1.]
[ 0.]
[ 0.]]
The one step evolution of the system can be predicted by multiplying A and x.
So at t = 1,
x.T = [[ 0. 0.2 0.5 0.3 0. ]]
and at t = 2,
x.T = [[ 0.04 0.2 0.37 0.3 0.09]]
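For this small case, a dense NumPy version of the same idea might look like this (a minimal sketch I added; the answer's actual code further down uses sparse matrices instead):
import numpy as np

# The 5x5 transition matrix from above, and the initial state
# concentrated on the centre entry (the origin).
A = np.array([[0.5, 0.2, 0.0, 0.0, 0.0],
              [0.3, 0.5, 0.2, 0.0, 0.0],
              [0.0, 0.3, 0.5, 0.2, 0.0],
              [0.0, 0.0, 0.3, 0.5, 0.2],
              [0.0, 0.0, 0.0, 0.3, 0.5]])
x = np.zeros((5, 1))
x[2] = 1.0

for _ in range(2):       # two timesteps
    x = A.dot(x)

print(x.T)               # [[0.04 0.2  0.37 0.3  0.09]]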
Even for modest numbers of timesteps this potentially takes a fair bit of storage (A requires n^2 storage), but the matrix is very sparse, so we can use sparse matrices to reduce the storage (and speed up the calculations). Stored this way, A requires approximately 3n elements.
import scipy.sparse as sp
import numpy as np

def random_walk_transition_probability(n, left=0.3, centre=0.5, right=0.2):
    m = 2*n + 1
    # Tridiagonal transition matrix: stay on the diagonal, step to either side on the off-diagonals
    A = sp.csr_matrix((m, m))
    A += sp.diags(centre*np.ones(m), 0)
    A += sp.diags(left*np.ones(m-1), -1)
    A += sp.diags(right*np.ones(m-1), 1)
    # Start at the origin (middle entry) with probability 1
    x = np.zeros((m, 1))
    x[n] = 1.0
    for i in range(n):
        x = A.dot(x)
    return x

print(random_walk_transition_probability(4))
Timings
%timeit random_walk_transition_probability(500)
100 loops, best of 3: 7.12 ms per loop
%timeit random_walk_transition_probability(10000)
1 loops, best of 3: 1.06 s per loop
Related
I am trying to calculate the radius of the lens created by two overlapping spheres. To this end, I tried both a trigonometric method and another method based on pure algebra. I compared the results of these two methods on various data sets and found a small number of contradictions on just some of them; the results are the same in most cases. The problem can be reproduced by the following example (at indices 3-5):
poss = np.array([[[-0.884, -3.45, -0.99 ], [-0.901, -3.43, -0.995]], [[-0.993, -3.44, -0.97 ], [-1.01, -3.46, -1. ]],
[[-0.993, -3.44, -0.97 ], [-0.998, -3.45, -1. ]], [[0.885 , 0.967, -1.02 ], [0.885, 0.964, -1.02] ],
[[-0.252, -3.3 , -0.777], [-0.197, -3.3 , -0.777]], [[0.26 , -1.68, -0.803], [0.288, -1.67, -0.799]],
[[0.599 , 2.04 , -0.857], [0.607 , 2.04 , -0.84 ]], [[0.615 , 2. , -0.833], [0.633, 2. , -0.855]],
[[0.698 , 2.06 , -0.921], [0.679 , 2.06 , -0.914]]])
rad = np.array([[0.0108, 0.0205], [0.0231, 0.0259], [0.0231 , 0.0304], [0.0154, 0.0124], [0.0137, 0.0413],
[0.027 , 0.003 ], [0.0102, 0.022 ], [0.00221, 0.0268], [0.0147, 0.0124]])
# The length of the overlaps; lenses' heights
gap = np.array([-4.57922157e-03, -9.13773714e-03, -2.14843788e-02, -2.48000000e-02, -1.38777878e-17, -2.42861287e-17,
-1.34117058e-02, -5.84659193e-04, -6.85154327e-03])
The functions are:
def trigonometric(r_active, gap):
    r_add = np.add.reduce(r_active, axis=1)
    paired_cent_dis = np.sum((r_add, gap), axis=0)
    intersect_angle_0 = np.arccos(np.clip((r_active[:, 0] ** 2 +
                                           paired_cent_dis ** 2 - r_active[:, 1] ** 2) /
                                          (2 * r_active[:, 0] * paired_cent_dis), -1, 1))
    intersect_plane_rad = r_active[:, 0] * np.sin(intersect_angle_0)
    return intersect_plane_rad

def algebraic(r, gap):
    items_ = np.empty((len(gap), 1), dtype=np.float64)
    for i in range(len(gap)):
        r0, r1 = r[i]
        cur_gap = gap[i]
        paired_cent_dis = r0 + r1 + cur_gap
        intersect_plane_rad = 0.5 * abs((-paired_cent_dis + r0 + r1) *
                                        (paired_cent_dis + r0 + r1) * (-paired_cent_dis - r0 + r1) *
                                        (-paired_cent_dis + r0 - r1)) ** 0.5 / paired_cent_dis
        items_[i] = intersect_plane_rad
    return items_.ravel()
trigonometric(rad, gap)
algebraic(rad, gap)
The results:
# repr trigonometric:
array([7.59403901e-03, 1.42126146e-02, 2.08670250e-02, 0.00000000e+00,
4.56484128e-10, 0.00000000e+00, 1.01747354e-02, 1.45347671e-03,
8.94740633e-03])
# repr algebraic:
array([7.59403901e-03, 1.42126146e-02, 2.08670250e-02, 4.69938148e-10,
5.34354024e-10, 3.68549655e-10, 1.01747354e-02, 1.45347671e-03,
8.94740633e-03])
As can be seen in the results, there are differing values at indices 3, 4, and 5. AFAIK, the two methods do the same job; this has been borne out on various data volumes, yet such differences can appear at some indices in rare cases. In this example, just the 3rd index is affected by np.clip (in this small example that index gets 0 from the trigonometric method, but in my main code it gets a nonzero value!? And that nonzero value also differs from the algebraic result at the same index, i.e. 4.69938148e-10). As is apparent in the images, and by focusing on the gap values (which are very small or close to the diameter of the smaller sphere), it seems the differences between the results on some contacts are due to the precision of the calculations, or something like that.
The final algebraic result shows that the magnitudes at the suspect indices are in a reasonable range (here around 1e-10), and it seems the trigonometric method is being misled somewhere along the way.
I would be grateful to find out:
where the problem comes from,
why the 4th index of the trigonometric result gets a nonzero value, but with a different magnitude, although the 4th and 5th gap values are nearly the same,
and how the trigonometric method could be cured, if it can be.
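For reference, a quick diagnostic sketch (arccos_argument is just a helper I wrote that mirrors the first lines of trigonometric() and uses the rad and gap arrays above) shows how close the arccos argument gets to ±1 for each contact, which is where precision is lost:
import numpy as np

def arccos_argument(r_active, gap):
    # Same quantities as in trigonometric(), but returning the raw argument
    # passed to np.arccos before clipping.
    paired_cent_dis = np.add.reduce(r_active, axis=1) + gap
    return (r_active[:, 0] ** 2 + paired_cent_dis ** 2 - r_active[:, 1] ** 2) / \
           (2 * r_active[:, 0] * paired_cent_dis)

arg = arccos_argument(rad, gap)
print(arg)                 # values at or very near 1 mark the suspect contacts
print(np.abs(arg) > 1)     # True where np.clip changes the value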
I am trying to write a function that returns an np.array of size nx x ny containing a centered Gaussian distribution with mean mu and standard deviation sig. It works in principle, as shown below, but the problem is that the result is not completely symmetric. This is not a problem for larger nx x ny, but for smaller ones it is obvious that something is not quite right in my implementation...
For:
create2dGaussian (1, 1, 5, 5)
It outputs:
[[ 0. 0.2 0.3 0.1 0. ]
[ 0.2 0.9 1. 0.5 0. ]
[ 0.3 1. 1. 0.6 0. ]
[ 0.1 0.5 0.6 0.2 0. ]
[ 0. 0. 0. 0. 0. ]]
... which is not symmetric. For larger nx and ny a 3D plot looks perfectly fine/smooth, but why are the detailed numerics not correct, and how can I fix it?
import numpy as np

def create2dGaussian(mu, sigma, nx, ny):
    x, y = np.meshgrid(np.linspace(-nx/2, +nx/2+1, nx), np.linspace(-ny/2, +ny/2+1, ny))
    d = np.sqrt(x*x + y*y)
    g = np.exp(-((d-mu)**2 / (2.0 * sigma**2)))
    np.set_printoptions(precision=1, suppress=True)
    print(g.shape)
    print(g)
    return g
----- EDIT -----
While the solution described below works for the problem mentioned in the headline (the non-symmetric distribution), this code also has some other issues that are discussed here.
Numpy's linspace is inclusive of both endpoints by default, unlike range, so you don't need to add one to the right edge. I'd also recommend only dividing by floats, just to be safe:
x, y = np.meshgrid(np.linspace(-nx/2.0, +nx/2.0,nx), np.linspace(-ny/2.0, +ny/2.0,ny))
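Applying that change, a corrected version of the function might look like this (a minimal sketch, keeping the rest of the original code unchanged):
import numpy as np

def create2dGaussian(mu, sigma, nx, ny):
    # linspace already includes both endpoints, so no "+1" on the right edge
    x, y = np.meshgrid(np.linspace(-nx / 2.0, nx / 2.0, nx),
                       np.linspace(-ny / 2.0, ny / 2.0, ny))
    d = np.sqrt(x * x + y * y)
    g = np.exp(-((d - mu) ** 2 / (2.0 * sigma ** 2)))
    return g

np.set_printoptions(precision=1, suppress=True)
print(create2dGaussian(1, 1, 5, 5))  # now symmetric about the centre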
The gradient of a symmetric function should have the same derivatives in all dimensions, yet numpy.gradient is providing different components.
Here is a MWE.
import numpy as np
x = (-1,0,1)
y = (-1,0,1)
X,Y = np.meshgrid(x,y)
f = 1/(X*X + Y*Y +1.0)
print(f)
>> [[0.33333333 0.5 0.33333333]
[0.5 1. 0.5 ]
[0.33333333 0.5 0.33333333]]
This has the same values in both dimensions.
But np.gradient(f) gives
[array([[ 0.16666667, 0.5 , 0.16666667],
[ 0. , 0. , 0. ],
[-0.16666667, -0.5 , -0.16666667]]),
array([[ 0.16666667, 0. , -0.16666667],
[ 0.5 , 0. , -0.5 ],
[ 0.16666667, 0. , -0.16666667]])]
Both the components of the gradient are different.
Why so?
What I am missing in interpretation of the output?
Let's walk through this step by step. So first, as correctly mentioned by meowgoesthedog
numpy calculates derivatives in a direction.
Numpy's way of calculating gradients
It's important to note that np.gradient uses central differences, meaning (for simplicity we look at just one direction):
grad_f[i] = (f[i+1] - f[i])/2 + (f[i] - f[i-1])/2 = (f[i+1] - f[i-1])/2
At the boundary numpy calculates (take the min as example)
grad_f[min] = f[min+1] - f[min]
grad_f[max] = f[max] - f[max-1]
In your case the boundary is 0 and 2.
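A quick 1D check of these formulas (a small example I added, using the middle row of f from the question):
import numpy as np

row = np.array([0.5, 1.0, 0.5])   # middle row of f
print(np.gradient(row))
# [ 0.5  0.  -0.5]
# boundary: row[1]-row[0] = 0.5; interior: (row[2]-row[0])/2 = 0.; boundary: row[2]-row[1] = -0.5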
2D case
If you use more than one dimension, we need to take the direction of the derivative into account. np.gradient calculates the derivatives along all axes. Let's reproduce your results:
Let's move along the columns (i.e. vary the row index), so we calculate with row vectors:
f[1,:] - f[0,:]
Output
array([0.16666667, 0.5 , 0.16666667])
which is exactly the first row of the first element of your gradient.
The middle row is calculated with centered differences, therefore:
(f[2,:]-f[1,:])/2 + (f[1,:]-f[0,:])/2
Output
array([0., 0., 0.])
The third row:
f[2,:] - f[1,:]
Output
array([-0.16666667, -0.5 , -0.16666667])
For the other direction just exchange the : and the numbers, and keep in mind that you are now calculating column vectors. For a symmetric function like yours, this leads directly to the transposed derivative.
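For the symmetric example above this can be checked directly (a small verification snippet, not part of the original derivation):
import numpy as np

x = (-1, 0, 1)
X, Y = np.meshgrid(x, x)
f = 1/(X*X + Y*Y + 1.0)

g0, g1 = np.gradient(f)        # components along axis 0 and axis 1
print(np.allclose(g0, g1.T))   # True: for a symmetric f the components are transposes of each other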
3D case
x_ = (-1,0,4)
y_ = (-3,0,1)
z_ = (-1,0,12)
x, y, z = np.meshgrid(x_, y_, z_, indexing='ij')
f = 1/(x**2 + y**2 + z**2 + 1)
np.gradient(f)[1]
Output
array([[[ *2.50000000e-01, 4.09090909e-01, 3.97702165e-04*],
[ 8.33333333e-02, 1.21212121e-01, 1.75554093e-04],
[-8.33333333e-02, -1.66666667e-01, -4.65939801e-05]],
[[ **4.09090909e-01, 9.00000000e-01, 4.03045231e-04**],
[ 1.21212121e-01, 2.00000000e-01, 1.77904287e-04],
[-1.66666667e-01, -5.00000000e-01, -4.72366556e-05]],
[[ ***1.85185185e-02, 2.03619910e-02, 3.28827183e-04***],
[ 7.79727096e-03, 8.54700855e-03, 1.45243282e-04],
[-2.92397661e-03, -3.26797386e-03, -3.83406181e-05]]])
The gradient which is given here is calculated along rows (0 would be along matrices, 1 along rows, 2 along columns).
This can be calculated by
(f[:,1,:] - f[:,0,:])
Output
array([[*2.50000000e-01, 4.09090909e-01, 3.97702165e-04*],
[**4.09090909e-01, 9.00000000e-01, 4.03045231e-04**],
[***1.85185185e-02, 2.03619910e-02, 3.28827183e-04***]])
I added the asterisks so that it becomes clear where to find the corresponding row vectors. Since we calculated the gradient in direction 1, we have to look for row vectors.
If one wants to reproduce the whole gradient, this is done by
np.stack(((f[:,1,:] - f[:,0,:]), (f[:,2,:] - f[:,0,:])/2, (f[:,2,:] - f[:,1,:])), axis=1)
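This reconstruction can be checked against numpy directly (a small verification I added):
import numpy as np

x_, y_, z_ = (-1, 0, 4), (-3, 0, 1), (-1, 0, 12)
x, y, z = np.meshgrid(x_, y_, z_, indexing='ij')
f = 1/(x**2 + y**2 + z**2 + 1)

manual = np.stack(((f[:,1,:] - f[:,0,:]),
                   (f[:,2,:] - f[:,0,:])/2,
                   (f[:,2,:] - f[:,1,:])), axis=1)
print(np.allclose(manual, np.gradient(f)[1]))   # True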
n-dim case
We can generalize what we learned here to calculate the gradient of an arbitrary function along a given axis.
def grad_along_axis(f, ax):
    f_grad_ind = []
    for i in range(f.shape[ax]):
        if i == 0:
            f_grad_ind.append(np.take(f, i+1, ax) - np.take(f, i, ax))
        elif i == f.shape[ax] - 1:
            f_grad_ind.append(np.take(f, i, ax) - np.take(f, i-1, ax))
        else:
            f_grad_ind.append((np.take(f, i+1, ax) - np.take(f, i-1, ax))/2)
    f_grad = np.stack(f_grad_ind, axis=ax)
    return f_grad
where
np.take(f, i, ax) = f[:,...,i,...,:]
and i is at index ax.
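As a quick sanity check (my addition), grad_along_axis as defined above reproduces numpy's result for the 3D example:
import numpy as np

x, y, z = np.meshgrid((-1, 0, 4), (-3, 0, 1), (-1, 0, 12), indexing='ij')
f = 1/(x**2 + y**2 + z**2 + 1)
print(np.allclose(grad_along_axis(f, 1), np.gradient(f)[1]))   # True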
Usually gradients and Jacobians are operators on functions.
If you need the gradient of f = 1/(X*X + Y*Y + 1.0), then you have to compute it symbolically, or estimate it with numerical methods that use that function.
I do not know what the gradient of a constant 3D array is; numpy.gradient is a one-dimensional concept.
Python has the sympy package that can automatically compute jacobians symbolically.
If by second order derivative of a scalar 3d field you mean a laplacian then you can estimate that with a standard 4 point stencil.
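For instance, a symbolic gradient with sympy might look like this (a minimal sketch, assuming sympy is available):
import sympy as sp

X, Y = sp.symbols('X Y')
f = 1/(X*X + Y*Y + 1)
grad = [sp.diff(f, v) for v in (X, Y)]
print(grad)   # [-2*X/(X**2 + Y**2 + 1)**2, -2*Y/(X**2 + Y**2 + 1)**2]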
I'd like to compare each value x of an array with a rolling window of the n previous values. More precisely I'd like to see at which percentile this new value x would be, if we added it to the previous window:
import numpy as np

A = np.array([1, 4, 9, 28, 28.5, 2, 283, 3.2, 7, 15])
print(A)
n = 4  # window width
for i in range(len(A)-n):
    W = A[i:i+n]   # the window of the n previous values
    x = A[i+n]     # the new value
    q = sum(W <= x) * 1.0 / n
    print('Value:', x, ' Window before this value:', W, ' Quantile:', q)
[ 1. 4. 9. 28. 28.5 2. 283. 3.2 7. 15. ]
Value: 28.5 Window before this value: [ 1. 4. 9. 28.] Quantile: 1.0
Value: 2.0 Window before this value: [ 4. 9. 28. 28.5] Quantile: 0.0
Value: 283.0 Window before this value: [ 9. 28. 28.5 2. ] Quantile: 1.0
Value: 3.2 Window before this value: [ 28. 28.5 2. 283. ] Quantile: 0.25
Value: 7.0 Window before this value: [ 28.5 2. 283. 3.2] Quantile: 0.5
Value: 15.0 Window before this value: [ 2. 283. 3.2 7. ] Quantile: 0.75
Question: What is the name of this computation? Is there a clever numpy way to compute this more efficiently on arrays of millions of items (with n that can be ~5000)?
Note: here is a simulation for 1M items and n=5000 but it would take ~ 2 hours:
import numpy as np
A = np.random.random(1000*1000) # the following is not very interesting with a [0,1]
n = 5000 # uniform random variable, but anyway...
Q = np.zeros(len(A)-n)
for i in range(len(Q)):
Q[i] = sum(A[i:i+n] <= A[i+n]) * 1.0 / n
if i % 100 == 0:
print "%.2f %% already done. " % (i * 100.0 / len(A))
print Q
Note: this is not similar to How to compute moving (or rolling, if you will) percentile/quantile for a 1d array in numpy?
Your code is so slow because you're using Python's own sum() instead of numpy.sum() or numpy.ndarray.sum(); Python's sum() has to convert all the raw values to Python objects before doing the calculation, which is really slow. Just by changing sum(...) to np.sum(...) or (...).sum(), the runtime drops to under 20 seconds.
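A minimal version of the loop from the question with just that change (sketched in Python 3 syntax):
import numpy as np

A = np.random.random(1000*1000)
n = 5000
Q = np.zeros(len(A) - n)
for i in range(len(Q)):
    # np.sum works on the boolean array directly instead of boxing each element
    Q[i] = np.sum(A[i:i+n] <= A[i+n]) / n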
You can use np.lib.stride_tricks.as_strided, as in the accepted answer of the question you linked. With the first example you give, it is pretty easy to understand:
A = np.array([1, 4, 9, 28, 28.5, 2, 283, 3.2, 7, 15])
n = 4
print(np.lib.stride_tricks.as_strided(A, shape=(n, A.size-n),
                                      strides=(A.itemsize, A.itemsize)))
# you get A.size-n columns, each holding the n rolling elements before one value
array([[  1. ,   4. ,   9. ,  28. ,  28.5,   2. ],
       [  4. ,   9. ,  28. ,  28.5,   2. , 283. ],
       [  9. ,  28. ,  28.5,   2. , 283. ,   3.2],
       [ 28. ,  28.5,   2. , 283. ,   3.2,   7. ]])
Now to do the calculation, you can compare this array to A[n:], sum over the rows and divide by n:
print((np.lib.stride_tricks.as_strided(A, shape=(n, A.size-n),
                                       strides=(A.itemsize, A.itemsize))
       <= A[n:]).sum(0) / (1.*n))
[1.   0.   1.   0.25 0.5  0.75]  # same answer
Now the problem is the size of your data (several million values and n around 5000); I'm not sure you can use this method directly on it. One way around that is to chunk the data. Let's define a function:
def compare_strides(arr, n):
    return (np.lib.stride_tricks.as_strided(arr, shape=(n, arr.size-n),
                                            strides=(arr.itemsize, arr.itemsize))
            <= arr[n:]).sum(0)
and do the chunking with np.concatenate, not forgetting to divide by n:
nb_chunk = 1000  # this number depends on the capacity of your computer,
                 # not sure how to optimize it
Q = np.concatenate([compare_strides(A[chunk*nb_chunk:(chunk+1)*nb_chunk+n], n)
                    for chunk in range(0, A[n:].size//nb_chunk+1)]) / (1.*n)
I can't run the 1M / 5000 test, but on a 5000 / 100 one, see the difference in timeit:
A = np.random.random(5000)
n = 100
%%timeit
Q = np.zeros(len(A)-n)
for i in range(len(Q)):
    Q[i] = sum(A[i:i+n] <= A[i+n]) * 1.0 / n
#1 loop, best of 3: 6.75 s per loop
%%timeit
nb_chunk = 100
Q1 = np.concatenate([compare_strides(A[chunk*nb_chunk:(chunk+1)*nb_chunk+n], n)
                     for chunk in range(0, A[n:].size//nb_chunk+1)]) / (1.*n)
#100 loops, best of 3: 7.84 ms per loop
# check for equality
print((Q == Q1).all())
Out[33]: True
See the difference in time, from 6750 ms down to 7.84 ms. Hope it works on bigger data.
Using np.sum instead of sum was already mentioned, so my only remaining suggestion is to additionally consider pandas and its rolling window function, to which you can apply any arbitrary function:
import numpy as np
import pandas as pd

A = np.random.random(1000*1000)
df = pd.DataFrame(A)
n = 5000

def fct(x):
    return np.sum(x[:-1] <= x[-1]) * 1.0 / (len(x)-1)

percentiles = df.rolling(n+1).apply(fct)
print(percentiles)
Additional benchmark: comparison between this solution and this solution:
import numpy as np, time

A = np.random.random(1000*1000)
n = 5000

def compare_strides(arr, n):
    return (np.lib.stride_tricks.as_strided(arr, shape=(n, arr.size-n), strides=(arr.itemsize, arr.itemsize)) <= arr[n:]).sum(0)

# Test #1: with strides ===> 11.0 seconds
t0 = time.time()
nb_chunk = 10*1000
Q = np.concatenate([compare_strides(A[chunk*nb_chunk:(chunk+1)*nb_chunk+n], n) for chunk in range(0, A[n:].size//nb_chunk+1)]) / (1.*n)
print(time.time() - t0, Q)

# Test #2: with just np.sum ===> 18.0 seconds
t0 = time.time()
Q2 = np.zeros(len(A)-n)
for i in range(len(Q2)):
    Q2[i] = np.sum(A[i:i+n] <= A[i+n])
Q2 *= 1.0 / n  # here the multiplication is vectorized; moving it into the loop as np.sum(A[i:i+n] <= A[i+n]) * 1.0 / n is 6 seconds slower
print(time.time() - t0, Q2)

print(all(Q == Q2))
There's also another (better) way, with numba and the @jit decorator. Then it is much faster: only 5.4 seconds!
from numba import jit
import numpy as np

@jit  # if you remove this line, it is much slower (similar to Test #2 above)
def doit():
    A = np.random.random(1000*1000)
    n = 5000
    Q2 = np.zeros(len(A)-n)
    for i in range(len(Q2)):
        Q2[i] = np.sum(A[i:i+n] <= A[i+n])
    Q2 *= 1.0/n
    print(Q2)

doit()
When adding numba parallelization, it's even faster: 1.8 seconds!
import numpy as np
from numba import jit, prange

@jit(parallel=True)
def doit(A, Q, n):
    for i in prange(len(Q)):
        Q[i] = np.sum(A[i:i+n] <= A[i+n])

A = np.random.random(1000*1000)
n = 5000
Q = np.zeros(len(A)-n)
doit(A, Q, n)
You can use np.quantile instead of sum(A[i:i+n] <= A[i+n]) * 1.0 / n. That may be as good as it gets; I'm not sure there really is a better approach for your question.
I am now working on the calculation shown below. I want to update the value of each element based on its adjacent elements. I am currently using two for loops, but the calculation is very slow since there are several outer iterations. Is there any way to speed up this calculation?
for i in range(1, nx+1):
    for j in range(1, ny+1):
        p[i,j] = a*p[i-1,j] + b*p[i+1,j] + c*p[i,j-1] + d*p[i,j+1]
a, b, c, d are constants, and p is a numpy array.
Sample input:
import numpy as np

p = np.ones((5,5))
for i in range(1,4):
    for j in range(1,4):
        p[i,j] = p[i-1,j] + p[i+1,j] + 2*p[i,j+1] + 2*p[i,j-1]
print(p)
The final output should be:
[[ 1. 1. 1. 1. 1.]
[ 1. 6. 16. 36. 1.]
[ 1. 11. 41. 121. 1.]
[ 1. 16. 76. 276. 1.]
[ 1. 1. 1. 1. 1.]]
I don't have enough rep to comment, and this doesn't fully answer the question, but if you are using NumPy you should definitely look at array broadcasting. It's hard to tell exactly what your code is doing, but using broadcasting should make it a lot easier to update the full matrix instead of value by value; a minimal illustration follows.
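For instance, a sliced/broadcast update of the whole interior in one shot might look like this (a minimal sketch using the sample input's coefficients; note that it reads only the old values, so it is not identical to the sequential in-place loop in the question):
import numpy as np

p = np.ones((5, 5))
a, b, c, d = 1, 1, 2, 2   # coefficients from the sample input

q = p.copy()
# every interior cell computed from the old array in one vectorized expression
q[1:-1, 1:-1] = (a*p[:-2, 1:-1] + b*p[2:, 1:-1] +
                 c*p[1:-1, :-2] + d*p[1:-1, 2:])
print(q)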
We can at least get rid of one nested loop using np.cumsum. In favorable conditions (a large number of columns) this can give a 30-fold speedup. Sample run:
results equal True
original 31.644793 ms
optimized 0.861980 ms
Code:
import numpy as np

n, m = 50, 600
a, b, c, d = np.random.random((4,))
P = np.random.random((n, m))

def f_OP(P):
    p = P.copy()
    for i in range(1, n-1):
        for j in range(1, m-1):
            p[i,j] = a*p[i-1,j] + b*p[i+1,j] + c*p[i,j-1] + d*p[i,j+1]
    return p

def f_pp(P):
    p = P.copy()
    # contributions from the not-yet-updated neighbours (below and to the right)
    pp = d*p[1:-1, 2:] + b*p[2:, 1:-1]
    # boundary contributions from the fixed first row and first column
    pp[0] += a*p[0, 1:-1]
    pp[:, 0] += c*p[1:-1, 0]
    # solve the first-order recurrence along each row with a scaled cumsum
    x = np.full((m-2,), c)
    x[0] = 1
    x = np.cumprod(x)[::-1]
    pp = np.cumsum(pp * x, axis=1)
    # fold in the dependence on the already-updated row above (the a*p[i-1,j] term)
    for i in range(1, n-2):
        pp[i] += a * np.cumsum(pp[i-1])
    p[1:-1, 1:-1] = pp / x
    return p

print('results equal', np.allclose(f_OP(P), f_pp(P)))

from timeit import timeit
kwds = dict(globals=globals(), number=10)
print('original {:10.6f} ms'.format(timeit('f_OP(P)', **kwds)*100))
print('optimized {:10.6f} ms'.format(timeit('f_pp(P)', **kwds)*100))