Repeat Numpy array by a sliding window - python

From the following array of shape (6, 3):
>>> arr
[
[1, 0, 1],
[0, 0, 2],
[1, 2, 0],
[0, 1, 3],
[2, 2, 1],
[2, 0, 2]
]
I'd like to repeat the values according to a sliding window of n=4, giving a new array of shape (6-n+1, n, 3):
>>> new_arr
[
[
[1, 0, 1],
[0, 0, 2],
[1, 2, 0],
[0, 1, 3]
],
[
[0, 0, 2],
[1, 2, 0],
[0, 1, 3],
[2, 2, 1]
],
[
[1, 2, 0],
[0, 1, 3],
[2, 2, 1],
[2, 0, 2]
]
]
It is relatively straightforward using a loop, but it gets extremely slow with several million values (instead of 6 in this example) in the initial array.
Is there a faster way to get to new_arr using Numpy primitives?

You can use NumPy, specifically this function (only NumPy >= 1.20.0):
from numpy.lib.stride_tricks import sliding_window_view
new_arr = sliding_window_view(arr, (n, arr.shape[1])).squeeze()
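A minimal sketch of the same idea, using the array and n from the question. It indexes the singleton axis explicitly instead of calling squeeze(), on the assumption that you want to keep the leading window axis even when only one window fits. The second variant (sliding along axis 0 only) is an alternative, not part of the original answer. Both results are read-only views of arr.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

arr = np.array([[1, 0, 1],
                [0, 0, 2],
                [1, 2, 0],
                [0, 1, 3],
                [2, 2, 1],
                [2, 0, 2]])
n = 4

# Window over both axes: shape (3, 1, n, 3); drop the singleton axis by index.
new_arr = sliding_window_view(arr, (n, arr.shape[1]))[:, 0]

# Alternative: slide along axis 0 only. The window axis is appended last,
# giving shape (3, 3, n), so swap the last two axes.
alt = sliding_window_view(arr, n, axis=0).transpose(0, 2, 1)

print(new_arr.shape)  # (3, 4, 3), i.e. (6 - n + 1, n, 3)
```

Because no data is copied, this should stay cheap for millions of rows, as long as later steps do not force a full copy of the windows.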

Related

How can I store index pairs using True values from a boolean-like square symmetric numpy array?

I have a NumPy array with integer values 1 or 0 (they can be cast to booleans if necessary). The array is square and symmetric (see note below) and I want a list of the indices where a 1 appears:
Note that array[i][j] == array[j][i] and array[i][i] == 0 by design. Also I cannot have any duplicates.
import numpy as np
array = np.array([
[0, 0, 1, 0, 1, 0, 1],
[0, 0, 1, 1, 0, 1, 0],
[1, 1, 0, 0, 0, 0, 1],
[0, 1, 0, 0, 1, 1, 0],
[1, 0, 0, 1, 0, 0, 1],
[0, 1, 0, 1, 0, 0, 0],
[1, 0, 1, 0, 1, 0, 0]
])
I would like a result that is like this (order of each sub-list is not important, nor is the order of each element within the sub-list):
[
[0, 2],
[0, 4],
[0, 6],
[1, 2],
[1, 3],
[1, 5],
[2, 6],
[3, 4],
[3, 5],
[4, 6]
]
I would prefer not to loop over all index pairs with the condition j < i, because my array can be large, although I am aware that this is a possibility. Here is an example using two for loops:
import pandas as pd
result = []
for i in range(array.shape[0]):
    for j in range(i):
        if array[i][j]:
            result.append([i, j])
print(pd.DataFrame(result).sort_values(1).values)
# using dataframes and arrays for formatting but looking for
# 'result' which is a list
# Returns (same as above but columns are the opposite way round):
[[2 0]
[4 0]
[6 0]
[2 1]
[3 1]
[5 1]
[6 2]
[4 3]
[5 3]
[6 4]]
idx = np.argwhere(array)
idx = idx[idx[:,0]<idx[:,1]]
Another way:
idx = np.argwhere(np.triu(array))
output:
[[0 2]
[0 4]
[0 6]
[1 2]
[1 3]
[1 5]
[2 6]
[3 4]
[3 5]
[4 6]]
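Since the question asks for a plain list, here is a small sketch of the np.triu variant on the question's array, converted with tolist(). Passing k=1 is my addition: it excludes the diagonal explicitly, which changes nothing here because the diagonal is zero by design.

```python
import numpy as np

array = np.array([
    [0, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0],
])

# k=1 keeps only entries strictly above the diagonal, so each
# symmetric pair (i, j) / (j, i) is reported once, with i < j.
pairs = np.argwhere(np.triu(array, k=1)).tolist()
print(pairs)
```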
Comparison:
# @bousof's solution
def method1(array):
    return np.vstack(np.where(np.logical_and(array, np.diff(np.ogrid[:array.shape[0],:array.shape[0]])[0]>=0))).transpose()[:,::-1]
# Also mentioned by @hpaulj
def method2(array):
    return np.argwhere(np.triu(array))
def method3(array):
    idx = np.argwhere(array)
    return idx[idx[:,0]<idx[:,1]]
# The original method in the question by the OP (d-man)
def method4(array):
    result = []
    for i in range(array.shape[0]):
        for j in range(i):
            if array[i][j]:
                result.append([i, j])
    return result
# Suggested by @bousof in the comments
def method5(array):
    return np.vstack(np.where(np.triu(array))).transpose()
inputs = [np.random.randint(0,2,(n,n)) for n in [10,100,1000,10000]]
It seems that method1, method2 and method5 are slightly faster for large arrays, while method3 is faster for small ones.
In [249]: arr = np.array([
...: [0, 0, 1, 0, 1, 0, 1],
...: [0, 0, 1, 1, 0, 1, 0],
...: [1, 1, 0, 0, 0, 0, 1],
...: [0, 1, 0, 0, 1, 1, 0],
...: [1, 0, 0, 1, 0, 0, 1],
...: [0, 1, 0, 1, 0, 0, 0],
...: [1, 0, 1, 0, 1, 0, 0]
...: ])
The most common way of getting the indices of the non-zero (True) elements is with np.nonzero (aka np.where):
In [250]: idx = np.nonzero(arr)
In [251]: idx
Out[251]:
(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6]),
array([2, 4, 6, 2, 3, 5, 0, 1, 6, 1, 4, 5, 0, 3, 6, 1, 3, 0, 2, 4]))
This is a tuple - 2 arrays for a 2d array. It can be used directly to index the array (or anything like it): arr[idx] will give all 1s.
Apply np.transpose to that and get an array of 'pairs':
In [252]: np.argwhere(arr)
Out[252]:
array([[0, 2],
[0, 4],
[0, 6],
[1, 2],
[1, 3],
[1, 5],
[2, 0],
[2, 1],
[2, 6],
[3, 1],
[3, 4],
[3, 5],
[4, 0],
[4, 3],
[4, 6],
[5, 1],
[5, 3],
[6, 0],
[6, 2],
[6, 4]])
Using such an array to index arr is harder - requiring a loop and conversion to tuple.
To weed out the symmetric duplicates we could make a tri-lower array:
In [253]: np.tril(arr)
Out[253]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0],
[0, 1, 0, 1, 0, 0, 0],
[1, 0, 1, 0, 1, 0, 0]])
In [254]: np.argwhere(np.tril(arr))
Out[254]:
array([[2, 0],
[2, 1],
[3, 1],
[4, 0],
[4, 3],
[5, 1],
[5, 3],
[6, 0],
[6, 2],
[6, 4]])
You can use numpy.where:
>>> np.vstack(np.where(np.logical_and(array, np.diff(np.ogrid[:array.shape[0],:array.shape[0]])[0]<=0))).transpose()
array([[2, 0],
[2, 1],
[3, 1],
[4, 0],
[4, 3],
[5, 1],
[5, 3],
[6, 0],
[6, 2],
[6, 4]])
np.diff(np.ogrid[:array.shape[0],:array.shape[0]])[0]<=0 is true only on the lower part of the matrix. If the order is important, you can get the same order as in the question using:
>>> np.vstack(np.where(np.logical_and(array, np.diff(np.ogrid[:array.shape[0],:array.shape[0]])[0]>=0))).transpose()[:,::-1]
array([[2, 0],
[4, 0],
[6, 0],
[2, 1],
[3, 1],
[5, 1],
[6, 2],
[4, 3],
[5, 3],
[6, 4]])
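The mask in this answer can also be written with plain broadcasting, which may be easier to read. This is a sketch assuming the np.diff(np.ogrid[...]) expression is meant to compute "column index minus row index"; I have not checked how np.diff treats the two differently shaped ogrid arrays on every NumPy version, so the explicit comparison below avoids that question.

```python
import numpy as np

array = np.array([
    [0, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0],
])

n = array.shape[0]
i, j = np.ogrid[:n, :n]        # i has shape (n, 1), j has shape (1, n)
lower = j < i                  # True strictly below the diagonal

# Keep only the non-zero entries that lie in the lower triangle.
pairs = np.argwhere((array != 0) & lower)
print(pairs)
```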

Most efficient way to get sorted indices based on two numpy arrays

How can I get the sorted indices of a numpy array (distance), only considering certain indices from another numpy array (val)?
For example, consider the two numpy arrays val and distance below:
val = np.array([[10, 0, 0, 0, 0],
[0, 0, 10, 0, 10],
[0, 10, 10, 0, 0],
[0, 0, 0, 10, 0],
[0, 0, 0, 0, 0]])
distance = np.array([[4, 3, 2, 3, 4],
[3, 2, 1, 2, 3],
[2, 1, 0, 1, 2],
[3, 2, 1, 2, 3],
[4, 3, 2, 3, 4]])
The distances where val == 10 are 4, 1, 3, 1, 0, 2. I would like to get these sorted as 0, 1, 1, 2, 3, 4 and return the respective indices into the distance array.
Returning something like:
(array([2, 1, 2, 3, 1, 0], dtype=int64), array([2, 2, 1, 3, 4, 0], dtype=int64))
or:
(array([2, 2, 1, 3, 1, 0], dtype=int64), array([2, 1, 2, 3, 4, 0], dtype=int64))
since the second and third elements both have distance 1, so I guess their indices are interchangeable.
I tried combinations of np.where, np.argsort, np.argpartition and np.unravel_index, but can't seem to get it working right.
Here's one way with masking -
In [20]: mask = val==10
In [21]: np.argwhere(mask)[distance[mask].argsort()]
Out[21]:
array([[2, 2],
[1, 2],
[2, 1],
[3, 3],
[1, 4],
[0, 0]])
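If the tuple-of-arrays form from the question is preferred (it can index distance directly), the same masking idea can be sketched as follows. The kind="stable" argument is my addition, so that tied distances keep their row-major order; the variable names rows, cols, order and sorted_idx are illustrative.

```python
import numpy as np

val = np.array([[10, 0, 0, 0, 0],
                [0, 0, 10, 0, 10],
                [0, 10, 10, 0, 0],
                [0, 0, 0, 10, 0],
                [0, 0, 0, 0, 0]])
distance = np.array([[4, 3, 2, 3, 4],
                     [3, 2, 1, 2, 3],
                     [2, 1, 0, 1, 2],
                     [3, 2, 1, 2, 3],
                     [4, 3, 2, 3, 4]])

mask = val == 10
rows, cols = np.nonzero(mask)          # row-major order, same as distance[mask]
order = np.argsort(distance[mask], kind="stable")

# Tuple of (row indices, column indices), sorted by distance.
sorted_idx = (rows[order], cols[order])
print(sorted_idx)
print(distance[sorted_idx])            # distances in ascending order
```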

Inserting an array of arrays as the last column

I have an array A:
array([[1, 2, 3],
[1, 1, 1],
[2, 2, 2]])
and an array B:
array([[1, 0],
[1, 0],
[0, 1]])
I want to make array B as the last column of array A, so I want the result array (let's call it C) to look like this:
array([[1, 2, 3, [1, 0]],
[1, 1, 1, [1, 0]],
[2, 2, 2, [0, 1]]])
I tried: np.insert(a,-1,b,axis=1) , but this gave me an error:
ValueError: could not broadcast input array from shape (2,3) into shape (3,3)
Maybe that's what you're looking for:
import numpy as np
a = np.array([[1, 2, 3],
[1, 1, 1],
[2, 2, 2]])
b = np.array([[1, 0],
[1, 0],
[0, 1]])
np.hstack([a,b])
Which results in:
array([[1, 2, 3, 1, 0],
[1, 1, 1, 1, 0],
[2, 2, 2, 0, 1]])
print(list(zip(*(list(zip(*a.tolist())) + [b.tolist()]))))
although it won't be a numpy array afterwards
>>> a
array([[1, 2, 3],
[1, 1, 1],
[2, 2, 2]])
>>> b
array([[1, 0],
[1, 0],
[0, 1]])
>>> list(zip(*(list(zip(*a.tolist())) + [b.tolist()])))
[(1, 2, 3, [1, 0]), (1, 1, 1, [1, 0]), (2, 2, 2, [0, 1])]
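If the result really has to be a NumPy array whose last column holds lists, one option is an object-dtype array. This is only a sketch: object arrays lose most of NumPy's vectorized speed, so np.hstack is usually the better fit when a flat numeric result is acceptable. The name c follows the question.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [1, 1, 1],
              [2, 2, 2]])
b = np.array([[1, 0],
              [1, 0],
              [0, 1]])

# One extra column, dtype=object so a cell can hold a Python list.
c = np.empty((a.shape[0], a.shape[1] + 1), dtype=object)
c[:, :-1] = a

# Assign cell by cell; assigning the whole column at once would make
# NumPy try to broadcast the (3, 2) data into a (3,) column.
for i, row in enumerate(b):
    c[i, -1] = row.tolist()

print(c)
```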

How do I set cell values in `np.array()` based on condition?

I have a numpy array and a list of valid values in that array:
import numpy as np
arr = np.array([[1,2,0], [2,2,0], [4,1,0], [4,1,0], [3,2,0], ... ])
valid = [1,4]
Is there a nice Pythonic way to set all array values that are not in the list of valid values to zero, and to do it in-place? After this operation, the array should look like this:
[[1,0,0], [0,0,0], [4,1,0], [4,1,0], [0,0,0], ... ]
The following creates a copy of the array in memory, which is bad for large arrays:
arr = np.vectorize(lambda x: x if x in valid else 0)(arr)
It bugs me that, for now, I loop over each array element and set it to zero if it is not in the valid list.
Edit: I found an answer suggesting there is no in-place function to achieve this. Also stop changing my whitespaces. It's easier to see the changes in arr with them.
You can use np.place for an in-situ update -
np.place(arr,~np.in1d(arr,valid),0)
Sample run -
In [66]: arr
Out[66]:
array([[1, 2, 0],
[2, 2, 0],
[4, 1, 0],
[4, 1, 0],
[3, 2, 0]])
In [67]: np.place(arr,~np.in1d(arr,valid),0)
In [68]: arr
Out[68]:
array([[1, 0, 0],
[0, 0, 0],
[4, 1, 0],
[4, 1, 0],
[0, 0, 0]])
Along the same lines, np.put could also be used -
np.put(arr,np.where(~np.in1d(arr,valid))[0],0)
Sample run -
In [70]: arr
Out[70]:
array([[1, 2, 0],
[2, 2, 0],
[4, 1, 0],
[4, 1, 0],
[3, 2, 0]])
In [71]: np.put(arr,np.where(~np.in1d(arr,valid))[0],0)
In [72]: arr
Out[72]:
array([[1, 0, 0],
[0, 0, 0],
[4, 1, 0],
[4, 1, 0],
[0, 0, 0]])
Indexing with booleans would work too:
>>> arr = np.array([[1, 2, 0], [2, 2, 0], [4, 1, 0], [4, 1, 0], [3, 2, 0]])
>>> arr[~np.in1d(arr, valid).reshape(arr.shape)] = 0
>>> arr
array([[1, 0, 0],
[0, 0, 0],
[4, 1, 0],
[4, 1, 0],
[0, 0, 0]])
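The boolean-indexing variant can also be sketched with np.isin, which returns a mask with the same shape as arr, so no reshape is needed. Note that this still allocates a temporary boolean mask (one byte per element); only the assignment into arr happens in place.

```python
import numpy as np

arr = np.array([[1, 2, 0], [2, 2, 0], [4, 1, 0], [4, 1, 0], [3, 2, 0]])
valid = [1, 4]

# np.isin keeps the shape of arr, so the mask can index it directly.
arr[~np.isin(arr, valid)] = 0
print(arr)
```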

Numpy.select from 3D array

Suppose I have the following numpy arrays:
>>> a
array([[0, 0, 2],
[2, 0, 1],
[2, 2, 1]])
>>> b
array([[2, 2, 0],
[2, 0, 2],
[1, 1, 2]])
that I then stack depth-wise (along a third axis):
c = np.dstack((a, b))
resulting in:
>>> c
array([[[0, 2],
[0, 2],
[2, 0]],
[[2, 2],
[0, 0],
[1, 2]],
[[2, 1],
[2, 1],
[1, 2]]])
From this I wish to check, for each pair along the third dimension of c, which combination from a list of classes is present, and then number it according to the index of the matching list entry. I've tried the following, but it is not working. The algorithm is simple enough with double for-loops, but because c is very large, it is prohibitively slow.
classes=[(0,0),(2,1),(2,2)]
out=np.select( [h==c for h in classes], range(len(classes)), default=-1)
My desired output would be
out = [[-1,-1,-1],
[3, 1,-1],
[2, 2,-1]]
How about this:
(np.array([np.array(h)[...,:] == c for h in classes]).all(axis = -1) *
(2 + np.arange(len(classes)))[:, None, None]).max(axis=0) - 1
It returns what you actually need:
array([[-1, -1, -1],
[ 3, 1, -1],
[ 2, 2, -1]])
You can test the a and b arrays separately like this:
clsa = (0, 2, 2)
clsb = (0, 1, 2)
np.select([(ca == a) & (cb == b) for ca, cb in zip(clsa, clsb)], range(3), default=-1)
which gets your desired result (except that it returns 0, 1, 2 instead of 1, 2, 3).
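For comparison, the np.select attempt from the question can be made to work on the stacked array c directly. The sketch below assumes the problem with the original attempt is that h == c yields one boolean per component, with shape (3, 3, 2), rather than one per cell; reducing with all(axis=-1) gives one condition per class. The 1-based choices match the desired output in the question.

```python
import numpy as np

a = np.array([[0, 0, 2],
              [2, 0, 1],
              [2, 2, 1]])
b = np.array([[2, 2, 0],
              [2, 0, 2],
              [1, 1, 2]])
c = np.dstack((a, b))                      # shape (3, 3, 2)
classes = [(0, 0), (2, 1), (2, 2)]

# One boolean (3, 3) condition per class: True where both components match.
conditions = [(c == np.array(h)).all(axis=-1) for h in classes]
choices = list(range(1, len(classes) + 1))  # 1-based class numbers

out = np.select(conditions, choices, default=-1)
print(out)
```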
Here is another way to get what you want, thought I would post it in case it's useful to anyone.
import numpy as np
a = np.array([[0, 0, 2],
[2, 0, 1],
[2, 2, 1]])
b = np.array([[2, 2, 0],
[2, 0, 2],
[1, 1, 2]])
classes=[(0,0),(2,1),(2,2)]
c = np.empty(a.shape, dtype=[('a', a.dtype), ('b', b.dtype)])
c['a'] = a
c['b'] = b
classes = np.array(classes, dtype=c.dtype)
classes.sort()
out = classes.searchsorted(c)
out = np.where(c == classes[out], out+1, -1)
print(out)
#array([[-1, -1, -1]
#       [ 3,  1, -1]
#       [ 2,  2, -1]])
