Related
I have a 2D array of 5*5 like this:
>>> np.random.seed(100)
>>> a = np.random.randint(0,100, (5,5))
>>> a
array([[ 8, 24, 67, 87, 79],
[48, 10, 94, 52, 98],
[53, 66, 98, 14, 34],
[24, 15, 60, 58, 16],
[ 9, 93, 86, 2, 27]])
if I have an initial position, is there any way to quickly and easily get the values of its four neighbors around it? The method I'm using now is a bit cumbersome:
Suppose the current position is [x, y] (if x=2, y=3 then the value in the array is 14,)οΌthen the position above it is [x-1, y], the bottom is [x+1, y], the left side is [y-1, x], and the right side is [y+1, x]. I use the following four lines of code to get the values of neighbors.
curr_val = a[2,3]
up_val = a[2+1, 3]
bott_val = a[2-1, 3]
left_val = a[2, 3+1]
right_val = a[2, 3-1]
So my question is is there a more convenient function in numpy that can do this and even query the values of four neighbors at once?
You can also use:
mask = np.array([[0, 1, 0],
[1, 0, 1],
[0, 1, 0]]).astype(bool)
a[i-1:i+2, j-1:j+2][mask]
output:
array([53, 93, 94, 86])
This is not the shortest method, but a flexible way could be to use a mask and a convolution to build this mask.
The advantage is that you can use any mask easily, just change the kernel.
from scipy.signal import convolve2d
kernel = [[0,1,0], # define points to pick around the target
[1,0,1],
[0,1,0]]
mask = np.zeros_like(a, dtype=bool) # build empty mask
mask[x,y] = True # set target(s)
# boolean indexing
a[convolve2d(mask, kernel, mode='same').astype(bool)]
output: array([52, 98, 34, 58])
The fastest way is this one taking usec to compute. Some times shortest is not the best. This one is very simple to understand and has no package dependencies.
This also works for edge-cases.
def neighbors(matrix: np.ndarray, x: int, y: int):
x_len, y_len = np.array(matrix.shape) - 1
nbr = []
if x > x_len or y > y_len:
return nbr
if x != 0:
nbr.append(matrix[x-1][y])
if y != 0:
nbr.append(matrix[x][y-1])
if x != x_len:
nbr.append(matrix[x+1][y])
if y != y_len:
nbr.append(matrix[x][y+1])
return nbr
I have 2 2D-arrays. I am trying to convolve along the axis 1. np.convolve doesn't provide the axis argument. The answer here, convolves 1 2D-array with a 1D array using np.apply_along_axis. But it cannot be directly applied to my use case. The question here doesn't have an answer.
MWE is as follows.
import numpy as np
a = np.random.randint(0, 5, (2, 5))
"""
a=
array([[4, 2, 0, 4, 3],
[2, 2, 2, 3, 1]])
"""
b = np.random.randint(0, 5, (2, 2))
"""
b=
array([[4, 3],
[4, 0]])
"""
# What I want
c = np.convolve(a, b, axis=1) # axis is not supported as an argument
"""
c=
array([[16, 20, 6, 16, 24, 9],
[ 8, 8, 8, 12, 4, 0]])
"""
I know I can do it using np.fft.fft, but it seems like an unnecessary step to get a simple thing done. Is there a simple way to do this? Thanks.
Why not just do a list comprehension with zip?
>>> np.array([np.convolve(x, y) for x, y in zip(a, b)])
array([[16, 20, 6, 16, 24, 9],
[ 8, 8, 8, 12, 4, 0]])
>>>
Or with scipy.signal.convolve2d:
>>> from scipy.signal import convolve2d
>>> convolve2d(a, b)[[0, 2]]
array([[16, 20, 6, 16, 24, 9],
[ 8, 8, 8, 12, 4, 0]])
>>>
One possibility could be to manually go the way to the Fourier spectrum, and back:
n = np.max([a.shape, b.shape]) + 1
np.abs(np.fft.ifft(np.fft.fft(a, n=n) * np.fft.fft(b, n=n))).astype(int)
# array([[16, 20, 6, 16, 24, 9],
# [ 8, 8, 8, 12, 4, 0]])
Would it be considered too ugly to loop over the orthogonal dimension? That would not add much overhead unless the main dimension is very short. Creating the output array ahead of time ensures that no memory needs to be copied about.
def convolvesecond(a, b):
N1, L1 = a.shape
N2, L2 = b.shape
if N1 != N2:
raise ValueError("Not compatible")
c = np.zeros((N1, L1 + L2 - 1), dtype=a.dtype)
for n in range(N1):
c[n,:] = np.convolve(a[n,:], b[n,:], 'full')
return c
For the generic case (convolving along the k-th axis of a pair of multidimensional arrays), I would resort to a pair of helper functions I always keep on hand to convert multidimensional problems to the basic 2d case:
def semiflatten(x, d=0):
'''SEMIFLATTEN - Permute and reshape an array to convenient matrix form
y, s = SEMIFLATTEN(x, d) permutes and reshapes the arbitrary array X so
that input dimension D (default: 0) becomes the second dimension of the
output, and all other dimensions (if any) are combined into the first
dimension of the output. The output is always 2-D, even if the input is
only 1-D.
If D<0, dimensions are counted from the end.
Return value S can be used to invert the operation using SEMIUNFLATTEN.
This is useful to facilitate looping over arrays with unknown shape.'''
x = np.array(x)
shp = x.shape
ndims = x.ndim
if d<0:
d = ndims + d
perm = list(range(ndims))
perm.pop(d)
perm.append(d)
y = np.transpose(x, perm)
# Y has the original D-th axis last, preceded by the other axes, in order
rest = np.array(shp, int)[perm[:-1]]
y = np.reshape(y, [np.prod(rest), y.shape[-1]])
return y, (d, rest)
def semiunflatten(y, s):
'''SEMIUNFLATTEN - Reverse the operation of SEMIFLATTEN
x = SEMIUNFLATTEN(y, s), where Y, S are as returned from SEMIFLATTEN,
reverses the reshaping and permutation.'''
d, rest = s
x = np.reshape(y, np.append(rest, y.shape[-1]))
perm = list(range(x.ndim))
perm.pop()
perm.insert(d, x.ndim-1)
x = np.transpose(x, perm)
return x
(Note that reshape and transpose do not create copies, so these functions are extremely fast.)
With those, the generic form can be written as:
def convolvealong(a, b, axis=-1):
a, S1 = semiflatten(a, axis)
b, S2 = semiflatten(b, axis)
c = convolvesecond(a, b)
return semiunflatten(c, S1)
I wish to create a variable array of numbers in numpy while skipping a chunk of numbers. For instance, If I have the variables:
m = 5
k = 3
num = 50
I want to create a linearly spaced numpy array starting at num and ending at num - k, skip k numbers and continue the array generation. Then repeat this process m times. For example, the above would yield:
np.array([50, 49, 48, 47, 44, 43, 42, 41, 38, 37, 36, 35, 32, 31, 30, 29, 26, 25, 24, 23])
How can I accomplish this via Numpy?
You can try:
import numpy as np
m = 5
k = 3
num = 50
np.hstack([np.arange(num - 2*i*k, num - (2*i+1)*k - 1, -1) for i in range(m)])
It gives:
array([50, 49, 48, 47, 44, 43, 42, 41, 38, 37, 36, 35, 32, 31, 30, 29, 26,
25, 24, 23])
Edit:
#JanChristophTerasa posted an answer (now deleted) that avoided Python loops by masking some elements of an array obtained using np.arange(). Here is a solution inspired by that idea. It works much faster than the above one:
import numpy as np
m = 5
k = 3
num = 50
x = np.arange(num, num - 2*k*m , -1).reshape(-1, 2*k)
x[:, :k+1].ravel()
We can use a mask and np.tile:
def mask_and_tile(m=5, k=3, num=50):
a = np.arange(num, num - 2 * m * k, -1) # create numbers
mask = np.ones(k * 2, dtype=bool) # create mask
mask[k+1:] = False # set appropriate elements to False
mask = np.tile(mask, m) # repeat mask m times
result = a[mask] # mask our numbers
return result
Or we can use a mask and just toggle the appropriate element:
def mask(m=5, k=3, num=50):
a = np.arange(num, num - 2 * m * k, -1) # create numbers
mask = np.ones_like(a, dtype=bool).reshape(-1, k)
mask[1::2] = False
mask[1::2, 0] = True
result = a[mask.flatten()]
return result
This will work fine:
import numpy as np
m = 5
k = 3
num = 50
h=0
x = np.array([])
for i in range(m):
x = np.append(x, range(num-h,num-h-k-1,-1))
h+=2*k
print(x)
Output
[50. 49. 48. 47. 44. 43. 42. 41. 38. 37. 36. 35. 32. 31. 30. 29. 26. 25.
24. 23.]
One way of doing this is making a 2D grid and calculating each number based on its position in the grid, then flattening it to a 1D array.
import numpy as np
num=50
m=5
k=3
# coordinates in a grid of width k+1 and height m
y, x = np.mgrid[:m, :k+1]
# a=[[50-0, 50-1, 50-2, 50-3], [50-0-2*3*1, 50-1-2*3*1, ...], [50-0-2*3*2...]...]
a = num - x - 2 * k * y
print(a.ravel())
I have a numpy array:
A = np.array([8, 2, 33, 4, 3, 6])
What I want is to create another array B where each element is the pairwise max of 2 consecutive pairs in A, so I get:
B = np.array([8, 33, 33, 4, 6])
Any ideas on how to implement?
Any ideas on how to implement this for more then 2 elements? (same thing but for consecutive n elements)
Edit:
The answers gave me a way to solve this question, but for the n-size window case, is there a more efficient way that does not require loops?
Edit2:
Turns out that the question is equivalent for asking how to perform 1d max-pooling of a list with a window of size n.
Does anyone know how to implement this efficiently?
One solution to the pairwise problem is using the np.maximum function and array slicing:
B = np.maximum(A[:-1], A[1:])
A loop-free solution is to use max on the windows created by skimage.util.view_as_windows:
list(map(max, view_as_windows(A, (2,))))
[8, 33, 33, 4, 6]
Copy/pastable example:
import numpy as np
from skimage.util import view_as_windows
A = np.array([8, 2, 33, 4, 3, 6])
list(map(max, view_as_windows(A, (2,))))
Here is an approach specifically taylored for larger windows. It is O(1) in window size and O(n) in data size.
I've done a pure numpy and a pythran implementation.
How do we achieve O(1) in window size? We use a "sawtooth" trick: If w is the window width we group the data into lots of w and for each group we do the cumulative maximum from left to right and from right to left. The elements of any in-between window distribute over two groups and the maxima of the intersections are among the cumulative maxima we have computed earlier. So we need a total of 3 comparisons per data point.
benchit (thanks #Divakar) for w=100; my functions are pp (numpy) and winmax (pythran):
For small window size w=5 the picture is more even. Interestingly, pythran still has a huge edge even for very small sizes. They must be doing something right to mimimze call overhead.
python code:
cummax = np.maximum.accumulate
def pp(a,w):
N = a.size//w
if a.size-w+1 > N*w:
out = np.empty(a.size-w+1,a.dtype)
out[:-1] = cummax(a[w*N-1::-1].reshape(N,w),axis=1).ravel()[:w-a.size-1:-1]
out[-1] = a[w*N:].max()
else:
out = cummax(a[w*N-1::-1].reshape(N,w),axis=1).ravel()[:w-a.size-2:-1]
out[1:N*w-w+1] = np.maximum(out[1:N*w-w+1],
cummax(a[w:w*N].reshape(N-1,w),axis=1).ravel())
out[N*w-w+1:] = np.maximum(out[N*w-w+1:],cummax(a[N*w:]))
return out
pythran version; compile with pythran -O3 <filename.py>; this creates a compiled module which you can import:
import numpy as np
# pythran export winmax(float[:],int)
# pythran export winmax(int[:],int)
def winmax(data,winsz):
N = data.size//winsz
if N < 1:
raise ValueError
out = np.empty(data.size-winsz+1,data.dtype)
nxt = winsz
for j in range(winsz,data.size):
if j == nxt:
nxt += winsz
out[j+1-winsz] = data[j]
else:
out[j+1-winsz] = out[j-winsz] if out[j-winsz]>data[j] else data[j]
running = data[-winsz:N*winsz].max()
nxt -= winsz << (nxt > data.size)
for j in range(data.size-winsz,0,-1):
if j == nxt:
nxt -= winsz
running = data[j-1]
else:
running = data[j] if data[j] > running else running
out[j] = out[j] if out[j] > running else running
out[0] = data[0] if data[0] > running else running
return out
In this Q&A, we are basically asking for sliding max values. This has been explored before - Max in a sliding window in NumPy array. Since, we are looking to be efficient, we can look further. One of those would be numba and here are two final variants I ended up with that leverage parallel directive that boosts performance over a without version :
import numpy as np
from numba import njit, prange
#njit(parallel=True)
def numba1(a, W):
L = len(a)-W+1
out = np.empty(L, dtype=a.dtype)
v = np.iinfo(a.dtype).min
for i in prange(L):
max1 = v
for j in range(W):
cur = a[i + j]
if cur>max1:
max1 = cur
out[i] = max1
return out
#njit(parallel=True)
def numba2(a, W):
L = len(a)-W+1
out = np.empty(L, dtype=a.dtype)
for i in prange(L):
for j in range(W):
cur = a[i + j]
if cur>out[i]:
out[i] = cur
return out
From the earlier linked Q&A, the equivalent SciPy version would be -
from scipy.ndimage.filters import maximum_filter1d
def scipy_max_filter1d(a, W):
L = len(a)-W+1
hW = W//2 # Half window size
return maximum_filter1d(a,size=W)[hW:hW+L]
Benchmarking
Other posted working approaches for generic window arg :
from skimage.util import view_as_windows
def rolling(a, window):
shape = (a.size - window + 1, window)
strides = (a.itemsize, a.itemsize)
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
# #mathfux's soln
def npmax_strided(a,n):
return np.max(rolling(a, n), axis=1)
# #Nicolas Gervais's soln
def mapmax_strided(a, W):
return list(map(max, view_as_windows(a,W)))
cummax = np.maximum.accumulate
def pp(a,w):
N = a.size//w
if a.size-w+1 > N*w:
out = np.empty(a.size-w+1,a.dtype)
out[:-1] = cummax(a[w*N-1::-1].reshape(N,w),axis=1).ravel()[:w-a.size-1:-1]
out[-1] = a[w*N:].max()
else:
out = cummax(a[w*N-1::-1].reshape(N,w),axis=1).ravel()[:w-a.size-2:-1]
out[1:N*w-w+1] = np.maximum(out[1:N*w-w+1],
cummax(a[w:w*N].reshape(N-1,w),axis=1).ravel())
out[N*w-w+1:] = np.maximum(out[N*w-w+1:],cummax(a[N*w:]))
return out
Using benchit package (few benchmarking tools packaged together; disclaimer: I am its author) to benchmark proposed solutions.
import benchit
funcs = [mapmax_strided, npmax_strided, numba1, numba2, scipy_max_filter1d, pp]
in_ = {(n,W):(np.random.randint(0,100,n),W) for n in 10**np.arange(2,6) for W in [2, 10, 20, 50, 100]}
t = benchit.timings(funcs, in_, multivar=True, input_name=['Array-length', 'Window-length'])
t.plot(logx=True, sp_ncols=1, save='timings.png')
So, numba ones are great for window sizes lower than 10, at which there's no clear winner and on larger window sizes pp wins with SciPy one at second spot.
In case there are consecutive n items, extended solution requires looping:
np.maximum(*[A[i:len(A)-n+i+1] for i in range(n)])
In order to avoid it you can use stride tricks and convert A to array of n-length blocks:
def rolling(a, window):
shape = (a.size - window + 1, window)
strides = (a.itemsize, a.itemsize)
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
For example:
>>> rolling(A, 3)
array([[ 8, 2, 8],
[ 2, 8, 33],
[ 8, 33, 33],
[33, 33, 4]])
After it's done you can kill it with np.max(rolling(A, n), axis=1).
Though, despite its elegance, neither this solution nor first one were not efficient because we apply repeatedly maximum on adjacent blocks that differs by two items only.
a recursive solution, for all of n
import numpy as np
import sys
def recursive(a: np.ndarray, n: int, b=None, level=2):
if n <= 0 or n > len(a):
raise ValueError(f'len(a):{len(a)} n:{n}')
if n == 1:
return a
if len(a) == n:
return np.max(a)
b = np.maximum(a[:-1], a[1:]) if b is None else np.maximum(a[level - 1:], b)
if n == level:
return b
return recursive(a, n, b[:-1], level + 1)
test_data = np.array([8, 2, 33, 4, 3, 6])
for test_n in range(1, len(test_data) + 2):
try:
print(recursive(test_data, n=test_n))
except ValueError as e:
sys.stderr.write(str(e))
output
[ 8 2 33 4 3 6]
[ 8 33 33 4 6]
[33 33 33 6]
[33 33 33]
[33 33]
33
len(a):6 n:7
about recursive function
You can observe the following data, and then you will know how to write the recursive function.
"""
np.array([8, 2, 33, 4, 3, 6])
n=2: (8, 2), (2, 33), (33, 4), (4, 3), (3, 6) => [8, 33, 33, 4, 6] => B' = [8, 33, 33, 4]
n=3: (8, 2, 33), (2, 33, 4), (33, 4, 3), (4, 3, 6) => B' [33, 4, 3, 6] => np.maximum([8, 33, 33, 4], [33, 4, 3, 6]) => 33, 33, 33, 6
...
"""
Using Pandas:
A = pd.Series([8, 2, 33, 4, 3, 6])
res = pd.concat([A,A.shift(-1)],axis=1).max(axis=1,skipna=False).dropna()
>>res
0 8.0
1 33.0
2 33.0
3 4.0
4 6.0
Or using numpy:
np.vstack([A[1:],A[:-1]]).max(axis=0)
I am looking for a vectorized method to apply a function returning a 2-dimensional array to each row of a 2-dimensional array and produce a 3-dimensional array.
More specifically, I have a function that takes a vector of length p and returns a 2-dimensional array (m by n). The following is a stylized version of my function:
import numpy as np
def test_func(x, m, n):
# this function is just an example and does not do anything useful.
# but, the dimensions of input and output is what I want to convey.
np.random.seed(x.sum())
return np.random.randint(5, size=(m, n))
I have a t by p 2-dimensional input data:
t = 5
p = 6
input_data = np.arange(t*p).reshape(t, p)
input_data
Out[403]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]])
I want to apply test_func to each row of the input_data. Since test_func returns a matrix, I expect to create a 3-dimensional (t by m by n) array. I can produce my desired result with the following code:
output_data = np.array([test_func(x, m=3, n=2) for x in input_data])
output_data
Out[405]:
array([[[0, 4],
[0, 4],
[3, 3],
[1, 0]],
[[1, 0],
[1, 0],
[4, 1],
[2, 4]],
[[3, 3],
[3, 0],
[1, 4],
[0, 2]],
[[2, 4],
[2, 1],
[3, 2],
[3, 1]],
[[3, 4],
[4, 3],
[0, 3],
[3, 0]]])
However, this code does not seem to be the most optimal code. It has an explicit for which reduces the speed and it uses an intermediary list which unnecessarily allocates extra memory. So, I like to find a vectorized solution. My best guess was the following code, but it does not work.
output = np.apply_along_axis(test_func, m=3, n=2, axis=1, arr=input_data)
Traceback (most recent call last):
File "<ipython-input-406-5bef44da348f>", line 1, in <module>
output = np.apply_along_axis(test_func, m=3, n=2, axis=1, arr=input_data)
File "C:\Anaconda\lib\site-packages\numpy\lib\shape_base.py", line 117, in apply_along_axis
outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (3,2) into shape (3)
Would you please suggest an efficient way to this problem.
UPDATE
Below is the actual function that I want to apply. It performs Multidimensional Classical Scaling. The objective of the question was not to optimize the internal workings of the function, but to find a generalize method for vectorizing the function apply. But, in the spirit of full disclosure I put the actual function here. Note that this function only works if p == m*(m-1)/2
def mds_classical_scaling(v, m, n):
# create a symmetric distance matrix from the elements in vector v
D = np.zeros((m, m))
D[np.triu_indices(4, k=1)] = v
D = (D + D.T)
# Transform the symmetric matrix
A = -0.5 * (D**2)
# Create centering matrix
H = np.eye(m) - np.ones((m, m))/m
# Doubly center A and store in B
B = H*A*H
# B should be positive definite otherwise the function
# would not work.
mu, V = eig(B)
#index of largest eigen values
ndx = (-mu).argsort()
# calculate the point configuration from largest eigen values
# and corresponding eigen vectors
Mu1 = diag(mu[ndx][:n])
V1 = V[:, ndx[:n]]
X = V1*sqrt(Mu1)
return X
Any performance boost I get from vectorization is negligible comparing to the actual function. The main reason was learning:)
ali_m's comment is spot-on: for serious speed gains, you should be more specific about what the function does.
That being said, if you still want to use np.apply_along_axis to get a (possibly) small speed-boost, then consider (after rereading that function's docstring) that you can easily
wrap your function to produce 1D arrays,
use np.apply_along_axis with that wrapper and
reshape the resulting array:
def test_func_wrapper(*args, **kwargs):
return test_func(*args, **kwargs).ravel()
output = np.apply_along_axis(test_func_wrapper, m=3, n=2, axis=1, arr=input_data)
np.allclose(output.reshape(5,3, -1), output_data)
# output: True
Note that this is a generic way to speed up such loops. You'll probably get better performance if you use functionality more specific to the actual problem.