Efficiently Doing Diffusion on a 2d map in Python - python

I'm pretty new to Python, so I'm doing a project in it. Part of it includes a diffusion across a map. I'm implementing it by going through and making the current tile equal to .2 * the sum of its neighbors n,w,s,e. If I was doing this in C, I'd just do a double for loop that loops through an array doing arr[i*width + j] = arr of j+1, j-1, i+i, i-1 the neighbors) and have several different arrays that I'd do the same thing for (different qualities of the map I'd be changing). However, I'm not sure if this is really the fastest way in Python. Some people I have asked suggest stuff like numPy, but the width probably won't be more than ~200 (so 40-50k elements max) and I wasn't sure if the overhead is worth it. I don't really know any builtin functions to do what I want. Any advice?
edit: This will be very dense i.e. every spot is going to have a non-trivial calculation

This is quite simple to arrange with NumPy. The function np.roll returns a copy of the array, "rolled" in a specified direction.
For example, given the array x,
x=np.arange(9).reshape(3,3)
# array([[0, 1, 2],
# [3, 4, 5],
# [6, 7, 8]])
you can roll the columns to the right with
np.roll(x,shift=1,axis=1)
# array([[2, 0, 1],
# [5, 3, 4],
# [8, 6, 7]])
Using np.roll, boundaries are wrapped like on a torus. If you do not want wrapped boundaries, you could pad the array with an edge of zeros, and reset the edge to zero before every iteration.
import numpy as np
def diffusion(arr):
while True:
arr+=0.2*np.roll(arr,shift=1,axis=1) # right
arr+=0.2*np.roll(arr,shift=-1,axis=1) # left
arr+=0.2*np.roll(arr,shift=1,axis=0) # down
arr+=0.2*np.roll(arr,shift=-1,axis=0) # up
yield arr
N=5
initial=np.random.random((N,N))
for state in diffusion(initial):
print(state)
raw_input()

Use convolution.
from numpy import *
from scipy.signal import convolve2d
mapArr=array(map)
kernel=array([[0 , 0.2, 0],
[0.2, 0, 0.2],
[0 , 0.2, 0]])
diffused=convolve2d(mapArr,kernel,boundary='wrap')
Is this for the ants challenge? If so, in the ants context, convolve2d worked ~20 times faster than the loop, in my implementation.

This modification to unutbu's code maintains constant the global sum of the array while diffuses the values of it:
import numpy as np
def diffuse(arr, d):
contrib = (arr * d)
w = contrib / 8.0
r = arr - contrib
N = np.roll(w, shift=-1, axis=0)
S = np.roll(w, shift=1, axis=0)
E = np.roll(w, shift=1, axis=1)
W = np.roll(w, shift=-1, axis=1)
NW = np.roll(N, shift=-1, axis=1)
NE = np.roll(N, shift=1, axis=1)
SW = np.roll(S, shift=-1, axis=1)
SE = np.roll(S, shift=1, axis=1)
diffused = r + N + S + E + W + NW + NE + SW + SE
return diffused

Related

selecting random elements from each column of numpy array

I have an n row, m column numpy array, and would like to create a new k x m array by selecting k random elements from each column of the array. I wrote the following python function to do this, but would like to implement something more efficient and faster:
def sample_array_cols(MyMatrix, nelements):
vmat = []
TempMat = MyMatrix.T
for v in TempMat:
v = np.ndarray.tolist(v)
subv = random.sample(v, nelements)
vmat = vmat + [subv]
return(np.array(vmat).T)
One question is whether there's a way to loop over each column without transposing the array (and then transposing back). More importantly, is there some way to map the random sample onto each column that would be faster than having a for loop over all columns? I don't have that much experience with numpy objects, but I would guess that there should be something analogous to apply/mapply in R that would work?
One alternative is to randomly generate the indices first, and then use take_along_axis to map them to the original array:
arr = np.random.randn(1000, 5000) # arbitrary
k = 10 # arbitrary
n, m = arr.shape
idx = np.random.randint(0, n, (k, m))
new = np.take_along_axis(arr, idx, axis=0)
Output (shape):
in [215]: new.shape
out[215]: (10, 500) # (k x m)
To sample each column without replacement just like your original solution
import numpy as np
matrix = np.arange(4*3).reshape(4,3)
matrix
Output
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
k = 2
np.take_along_axis(matrix, np.random.rand(*matrix.shape).argsort(axis=0)[:k], axis=0)
Output
array([[ 9, 1, 2],
[ 3, 4, 11]])
I would
Pre-allocate the result array, and fill in columns, and
Use numpy index based indexing
def sample_array_cols(matrix, n_result):
(n,m) = matrix.shape
vmat = numpy.array([n_result, m], dtype= matrix.dtype)
for c in range(m):
random_indices = numpy.random.randint(0, n, n_result)
vmat[:,c] = matrix[random_indices, c]
return vmat
Not quite fully vectorized, but better than building up a list, and the code scans just like your description.

Numba-compatible implementation of np.tile?

I'm working on some code for dehazing images, based on this paper, and I started with an abandoned Py2.7 implementation. Since then, particularly with Numba, I've made some real performance improvements (important since I'll have to run this on 8K images).
I'm pretty convinced my last significant performance bottleneck is in performing the box filter step (I've already shaved off almost a minute per image, but this last slow step is ~30s/image), and I'm close to getting it to run as nopython in Numba:
#njit # Row dependencies means can't be parallel
def yCumSum(a):
"""
Numba based computation of y-direction
cumulative sum. Can't be parallel!
"""
out = np.empty_like(a)
out[0, :] = a[0, :]
for i in prange(1, a.shape[0]):
out[i, :] = a[i, :] + out[i - 1, :]
return out
#njit(parallel= True)
def xCumSum(a):
"""
Numba-based parallel computation
of X-direction cumulative sum
"""
out = np.empty_like(a)
for i in prange(a.shape[0]):
out[i, :] = np.cumsum(a[i, :])
return out
#jit
def _boxFilter(m, r, gpu= hasGPU):
if gpu:
m = cp.asnumpy(m)
out = __boxfilter__(m, r)
if gpu:
return cp.asarray(out)
return out
#jit(fastmath= True)
def __boxfilter__(m, r):
"""
Fast box filtering implementation, O(1) time.
Parameters
----------
m: a 2-D matrix data normalized to [0.0, 1.0]
r: radius of the window considered
Return
-----------
The filtered matrix m'.
"""
#H: height, W: width
H, W = m.shape
#the output matrix m'
mp = np.empty(m.shape)
#cumulative sum over y axis
ySum = yCumSum(m) #np.cumsum(m, axis=0)
#copy the accumulated values of the windows in y
mp[0:r+1,: ] = ySum[r:(2*r)+1,: ]
#differences in y axis
mp[r+1:H-r,: ] = ySum[(2*r)+1:,: ] - ySum[ :H-(2*r)-1,: ]
mp[(-r):,: ] = np.tile(ySum[-1,: ], (r, 1)) - ySum[H-(2*r)-1:H-r-1,: ]
#cumulative sum over x axis
xSum = xCumSum(mp) #np.cumsum(mp, axis=1)
#copy the accumulated values of the windows in x
mp[:, 0:r+1] = xSum[:, r:(2*r)+1]
#difference over x axis
mp[:, r+1:W-r] = xSum[:, (2*r)+1: ] - xSum[:, :W-(2*r)-1]
mp[:, -r: ] = np.tile(xSum[:, -1][:, None], (1, r)) - xSum[:, W-(2*r)-1:W-r-1]
return mp
There's plenty to do around the edges, but if I can get the tile operation as a nopython call, I can nopython the whole boxfilter step and get a big performance boost. I'm not super inclined to do something really really specific as I'd love to reuse this code elsewhere, but I wouldn't particularly object to it being limited to a 2D scope. For whatever reason I'm just staring at this and not really sure where to start.
np.tile is a bit too complicated to reimplement in full, but unless I'm misreading it looks like you only need to take a vector and then repeat it along a different axis r times.
A Numba-compatible way to do this is to write
y = x.repeat(r).reshape((-1, r))
Then x will be repeated r times along the second dimension, so that y[i, j] == x[i].
Example:
In [2]: x = np.arange(5)
In [3]: x.repeat(3).reshape((-1, 3))
Out[3]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4]])
If you want x to be repeated along the first dimension instead, just take the transpose y.T.

Matrix QR factorization algorithms

I've been trying to visualize QR decomposition in a step by step fashion, but I'm not getting expected results. I'm new to numpy so it'd be nice if any expert eye could spot what I might be missing:
import numpy as np
from scipy import linalg
A = np.array([[12, -51, 4],
[6, 167, -68],
[-4, 24, -41]])
#Givens
v = np.array([12, 6])
vnorm = np.linalg.norm(v)
W_12 = np.array([[v[0]/vnorm, v[1]/vnorm, 0],
[-v[1]/vnorm, v[0]/vnorm, 0],
[0, 0, 1]])
W_12 * A #this should return a matrix such that [1,0] = 0
#gram-schmidt
A[:,0]
v = np.linalg.norm(A[:,0]) * np.array([1, 0, 0])
u = (A[:,0] - v)
u = u / np.linalg.norm(u)
W1 = np.eye(3) - 2 * np.outer(u, u.transpose())
W1 * A #this matrix's first column should look like [a, 0, 0]
any help clarifying the fact that this intermediate results don't show the properties that they are supposed to will be greatly received
NumPy is designed to work with homogeneous multi-dimensional arrays, it is not specifically a linear algebra package. So by design, the * operator is element-wise multiplication, not the matrix product.
If you want to get the matrix product, there are a few ways:
You can create np.matrix objects, rather than np.ndarray objects, for which the * operator is the matrix product.
You can also use the # operator, as in W_12 # A, which is the matrix product.
Or you can use np.dot(W_12, A) or W_12.dot(A), which computes the dot product.
Any one of these, using the data you give, returns the following for Givens rotation:
>>> np.dot(W_12 A)[1, 0]
-2.2204460492503131e-16
And this for the Gram-Schmidt step:
>>> (W1.dot(A))[:, 0]
array([ 1.40000000e+01, -4.44089210e-16, 4.44089210e-16])

Python NumPy: How to fill a matrix using an equation

I wish to initialise a matrix A, using the equation A_i,j = f(i,j) for some f (It's not important what this is).
How can I do so concisely avoiding a situation where I have two for loops?
numpy.fromfunction fits the bill here.
Example from doc:
>>> import numpy as np
>>> np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4]])
One could also get the indexes of your array with numpy.indices and then apply the function f in a vectorized fashion,
import numpy as np
shape = 1000, 1000
Xi, Yj = np.indices(shape)
A = (2*Xi + 3*Yj).astype(np.int) # or any other function f(Xi, Yj)

Roll rows of a matrix independently

I have a matrix (2d numpy ndarray, to be precise):
A = np.array([[4, 0, 0],
[1, 2, 3],
[0, 0, 5]])
And I want to roll each row of A independently, according to roll values in another array:
r = np.array([2, 0, -1])
That is, I want to do this:
print np.array([np.roll(row, x) for row,x in zip(A, r)])
[[0 0 4]
[1 2 3]
[0 5 0]]
Is there a way to do this efficiently? Perhaps using fancy indexing tricks?
Sure you can do it using advanced indexing, whether it is the fastest way probably depends on your array size (if your rows are large it may not be):
rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
# Use always a negative shift, so that column_indices are valid.
# (could also use module operation)
r[r < 0] += A.shape[1]
column_indices = column_indices - r[:, np.newaxis]
result = A[rows, column_indices]
numpy.lib.stride_tricks.as_strided stricks (abbrev pun intended) again!
Speaking of fancy indexing tricks, there's the infamous - np.lib.stride_tricks.as_strided. The idea/trick would be to get a sliced portion starting from the first column until the second last one and concatenate at the end. This ensures that we can stride in the forward direction as needed to leverage np.lib.stride_tricks.as_strided and thus avoid the need of actually rolling back. That's the whole idea!
Now, in terms of actual implementation we would use scikit-image's view_as_windows to elegantly use np.lib.stride_tricks.as_strided under the hoods. Thus, the final implementation would be -
from skimage.util.shape import view_as_windows as viewW
def strided_indexing_roll(a, r):
# Concatenate with sliced to cover all rolls
a_ext = np.concatenate((a,a[:,:-1]),axis=1)
# Get sliding windows; use advanced-indexing to select appropriate ones
n = a.shape[1]
return viewW(a_ext,(1,n))[np.arange(len(r)), (n-r)%n,0]
Here's a sample run -
In [327]: A = np.array([[4, 0, 0],
...: [1, 2, 3],
...: [0, 0, 5]])
In [328]: r = np.array([2, 0, -1])
In [329]: strided_indexing_roll(A, r)
Out[329]:
array([[0, 0, 4],
[1, 2, 3],
[0, 5, 0]])
Benchmarking
# #seberg's solution
def advindexing_roll(A, r):
rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
r[r < 0] += A.shape[1]
column_indices = column_indices - r[:,np.newaxis]
return A[rows, column_indices]
Let's do some benchmarking on an array with large number of rows and columns -
In [324]: np.random.seed(0)
...: a = np.random.rand(10000,1000)
...: r = np.random.randint(-1000,1000,(10000))
# #seberg's solution
In [325]: %timeit advindexing_roll(a, r)
10 loops, best of 3: 71.3 ms per loop
# Solution from this post
In [326]: %timeit strided_indexing_roll(a, r)
10 loops, best of 3: 44 ms per loop
In case you want more general solution (dealing with any shape and with any axis), I modified #seberg's solution:
def indep_roll(arr, shifts, axis=1):
"""Apply an independent roll for each dimensions of a single axis.
Parameters
----------
arr : np.ndarray
Array of any shape.
shifts : np.ndarray
How many shifting to use for each dimension. Shape: `(arr.shape[axis],)`.
axis : int
Axis along which elements are shifted.
"""
arr = np.swapaxes(arr,axis,-1)
all_idcs = np.ogrid[[slice(0,n) for n in arr.shape]]
# Convert to a positive shift
shifts[shifts < 0] += arr.shape[-1]
all_idcs[-1] = all_idcs[-1] - shifts[:, np.newaxis]
result = arr[tuple(all_idcs)]
arr = np.swapaxes(result,-1,axis)
return arr
I implement a pure numpy.lib.stride_tricks.as_strided solution as follows
from numpy.lib.stride_tricks import as_strided
def custom_roll(arr, r_tup):
m = np.asarray(r_tup)
arr_roll = arr[:, [*range(arr.shape[1]),*range(arr.shape[1]-1)]].copy() #need `copy`
strd_0, strd_1 = arr_roll.strides
n = arr.shape[1]
result = as_strided(arr_roll, (*arr.shape, n), (strd_0 ,strd_1, strd_1))
return result[np.arange(arr.shape[0]), (n-m)%n]
A = np.array([[4, 0, 0],
[1, 2, 3],
[0, 0, 5]])
r = np.array([2, 0, -1])
out = custom_roll(A, r)
Out[789]:
array([[0, 0, 4],
[1, 2, 3],
[0, 5, 0]])
By using a fast fourrier transform we can apply a transformation in the frequency domain and then use the inverse fast fourrier transform to obtain the row shift.
So this is a pure numpy solution that take only one line:
import numpy as np
from numpy.fft import fft, ifft
# The row shift function using the fast fourrier transform
# rshift(A,r) where A is a 2D array, r the row shift vector
def rshift(A,r):
return np.real(ifft(fft(A,axis=1)*np.exp(2*1j*np.pi/A.shape[1]*r[:,None]*np.r_[0:A.shape[1]][None,:]),axis=1).round())
This will apply a left shift, but we can simply negate the exponential exponant to turn the function into a right shift function:
ifft(fft(...)*np.exp(-2*1j...)
It can be used like that:
# Example:
A = np.array([[1,2,3,4],
[1,2,3,4],
[1,2,3,4]])
r = np.array([1,-1,3])
print(rshift(A,r))
Building on divakar's excellent answer, you can apply this logic to 3D array easily (which was the problematic that brought me here in the first place). Here's an example - basically flatten your data, roll it & reshape it after::
def applyroll_30(cube, threshold=25, offset=500):
flattened_cube = cube.copy().reshape(cube.shape[0]*cube.shape[1], cube.shape[2])
roll_matrix = calc_roll_matrix_flattened(flattened_cube, threshold, offset)
rolled_cube = strided_indexing_roll(flattened_cube, roll_matrix, cube_shape=cube.shape)
rolled_cube = triggered_cube.reshape(cube.shape[0], cube.shape[1], cube.shape[2])
return rolled_cube
def calc_roll_matrix_flattened(cube_flattened, threshold, offset):
""" Calculates the number of position along time axis we need to shift
elements in order to trig the data.
We return a 1D numpy array of shape (X*Y, time) elements
"""
# armax(...) finds the position in the cube (3d) where we are above threshold
roll_matrix = np.argmax(cube_flattened > threshold, axis=1) + offset
# ensure we don't have index out of bound
roll_matrix[roll_matrix>cube_flattened.shape[1]] = cube_flattened.shape[1]
return roll_matrix
def strided_indexing_roll(cube_flattened, roll_matrix_flattened, cube_shape):
# Concatenate with sliced to cover all rolls
# otherwise we shift in the wrong direction for my application
roll_matrix_flattened = -1 * roll_matrix_flattened
a_ext = np.concatenate((cube_flattened, cube_flattened[:, :-1]), axis=1)
# Get sliding windows; use advanced-indexing to select appropriate ones
n = cube_flattened.shape[1]
result = viewW(a_ext,(1,n))[np.arange(len(roll_matrix_flattened)), (n - roll_matrix_flattened) % n, 0]
result = result.reshape(cube_shape)
return result
Divakar's answer doesn't do justice to how much more efficient this is on large cube of data. I've timed it on a 400x400x2000 data formatted as int8. An equivalent for-loop does ~5.5seconds, Seberg's answer ~3.0seconds and strided_indexing.... ~0.5second.

Categories