Python: multidimensional array masking

What is the equivalent Pythonic implementation of the following simple piece of MATLAB code?
Matlab:
B = 2D array of integers as indices [1...100]
A = 2D array of numbers: [10x10]
A[B] = 0
This works well: for example, for B[i] = 42 it sets the element at row 2 of column 5 (MATLAB linear indices are 1-based and column-major).
In Python the same code raises an out-of-bounds error, which is logical. We are looking for a Pythonic way to translate the MATLAB code above.
Please also consider the problem for higher dimensions such as:
B = 2D array of integers as indices [1...3000]
C = 3D array of numbers: [10x10x30]
C[B] = 0
One way we thought of is to convert the elements of the index array from absolute positions to (i, j) pairs. That is, position 42 becomes divmod(42, 10)[::-1], i.e. (2, 4). This gives an n x 2 set of ii, jj index vectors that can be used to index A easily.
We thought there might be a better way in Python, one that is also efficient for higher dimensions.

You can use .ravel() on the array (A) before indexing it, and then .reshape() after.
Alternatively, since you know A.shape, you can use np.unravel_index on the other array (B) before indexing.
Example 1:
>>> import numpy as np
>>> A = np.ones((5,5), dtype=int)
>>> B = [1, 3, 7, 23]
>>> A
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
>>> A_ = A.ravel()
>>> A_[B] = 0
>>> A_.reshape(A.shape)
array([[1, 0, 1, 0, 1],
[1, 1, 0, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 0, 1]])
Example 2:
>>> b_row, b_col = np.vstack([np.unravel_index(b, A.shape) for b in B]).T
>>> A[b_row, b_col] = 0
>>> A
array([[1, 0, 1, 0, 1],
[1, 1, 0, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 0, 1]])
Discovered later: you can use numpy.put
>>> import numpy as np
>>> A = np.ones((5,5), dtype=int)
>>> B = [1, 3, 7, 23]
>>> A.put(B, [0]*len(B))
>>> A
array([[1, 0, 1, 0, 1],
[1, 1, 0, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 0, 1]])
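As a further sketch of the np.unravel_index route: the function accepts a whole array of flat indices, so the list comprehension in Example 2 is not strictly needed, and the same call works for the 3D case from the question. The last part assumes the indices really come from MATLAB (1-based, column-major); the names C, flat, M and B_matlab are made up for illustration.

```python
import numpy as np

A = np.ones((5, 5), dtype=int)
B = np.array([1, 3, 7, 23])          # 0-based, row-major flat indices

# unravel_index takes the whole index array at once and returns
# one index array per axis
rows, cols = np.unravel_index(B, A.shape)
A[rows, cols] = 0

# Same idea in 3D: the returned tuple can be used directly as an index
C = np.ones((10, 10, 30), dtype=int)
flat = np.array([0, 42, 2999])
C[np.unravel_index(flat, C.shape)] = 0

# Assumption: B_matlab holds MATLAB linear indices (1-based, column-major),
# so shift by one and ask for Fortran order
M = np.ones((10, 10), dtype=int)
B_matlab = np.array([42])
M[np.unravel_index(B_matlab - 1, M.shape, order='F')] = 0

print(A)
print(np.argwhere(M == 0))  # 0-based (row, col) of the element that was cleared
```

With order='F', MATLAB index 42 in a 10x10 array lands on 0-based position (1, 4), which is row 2, column 5 in MATLAB terms, as described in the question.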

Related

Extract upper diagonal of a block matrix in numpy - like np.triu_idx, but for block matrices

I want to extract off-block-diagonal elements from a block-diagonal matrix, i.e.
import numpy as np
import scipy as sp
import scipy.linalg  # make sure sp.linalg is loaded (needed on older SciPy versions)
A = np.array([
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]
])
B = np.array([
[2, 2],
[2, 2]
])
C = np.array([
[3]
])
D = sp.linalg.block_diag(A, B, C)
print(D)
>>> array([[1, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[0, 0, 0, 2, 2, 0],
[0, 0, 0, 2, 2, 0],
[0, 0, 0, 0, 0, 3]])
So I need to extract the elements above the diagonal that do not belong to any block, i.e. the ones that are zero in D.
How can I achieve that?
Below is a straightforward loop-based solution. Can it be improved by avoiding the loops?
Taking the upper triangle and then taking the non-zero values is not what I want, because that would not give me indices into the original block matrix D, only into its upper triangle.
def block_triu_indices(block_sizes=None):
    n = np.sum(block_sizes)
    blocks = []
    for block_size in block_sizes:
        blocks.append(np.ones((block_size, block_size)))
    A = sp.linalg.block_diag(*blocks)
    row_idx = []
    col_idx = []
    for i in range(n):
        for j in range(i+1, n):
            if A[i,j]==0:
                row_idx.append(i)
                col_idx.append(j)
    return (np.array(row_idx), np.array(col_idx))
block_triu_idx = block_triu_indices([3, 2, 1])
print(block_triu_idx)
>>> (array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 4]),
array([3, 4, 5, 3, 4, 5, 3, 4, 5, 5, 5]))
print(D[block_triu_idx])
>>> array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
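One possible loop-free variant, offered as a sketch rather than a definitive answer: label every row/column with the number of the block it belongs to, take all strictly upper-triangular index pairs with np.triu_indices, and keep the pairs whose labels differ. The function name block_triu_indices_vectorized is made up for this example.

```python
import numpy as np

def block_triu_indices_vectorized(block_sizes):
    """Indices above the diagonal that fall outside every diagonal block."""
    block_sizes = np.asarray(block_sizes)
    # labels[i] is the number of the block that row/column i belongs to,
    # e.g. [3, 2, 1] -> [0, 0, 0, 1, 1, 2]
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    # All (i, j) pairs with i < j, in row-major order
    rows, cols = np.triu_indices(labels.size, k=1)
    # A pair lies outside the blocks when its row and column labels differ
    keep = labels[rows] != labels[cols]
    return rows[keep], cols[keep]

rows, cols = block_triu_indices_vectorized([3, 2, 1])
print(rows)
print(cols)
```

This avoids building the dense block matrix, but it still materialises all n*(n-1)/2 upper-triangular pairs before filtering, so memory grows quadratically with the matrix size.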

How to change 1d array to 2d array under certain conditions

Is there a way through python numpy operations to produce the following result?
Input 1d array :
[3, 0, 0, 2, 2, 1]
Output 2d array :
3 0 0 2 2 1
0 0 0 2 2 1
0 0 0 2 2 1
2 2 2 2 2 1
2 2 2 2 2 1
1 1 1 1 1 1
In addition to jeromie’s brilliant answer, here is a version that also supports unordered arrays:
indexes = np.arange(len(arr))
idx = np.maximum(indexes[None,:], indexes[:, None])
arr[idx]
Assuming the input array always contains increasing values, you can use np.maximum(arr[None,:], arr[:, None]). This computes the maximum of arr[i] and arr[j] for every location (i, j) of the output array, thanks to NumPy broadcasting. If the input does not always contain increasing values, then the output needs to be better defined.
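To make the difference between the two broadcasting approaches concrete, here is a small sketch. It wraps the index-based version in a helper (the name max_index_matrix is made up here), runs it on the unordered input from the question, and checks that it agrees with the value-based np.maximum version when the input is non-decreasing.

```python
import numpy as np

def max_index_matrix(arr):
    """out[i, j] = arr[max(i, j)], built with broadcasting."""
    arr = np.asarray(arr)
    i = np.arange(arr.size)
    # (n, 1) against (1, n) broadcasts to an (n, n) grid of max(i, j)
    idx = np.maximum(i[:, None], i[None, :])
    return arr[idx]

# Unordered input from the question: indexing by max(i, j) still works
unordered = max_index_matrix([3, 0, 0, 2, 2, 1])
print(unordered)

# For non-decreasing input, taking the maximum of the values is equivalent
ordered = np.array([0, 0, 0, 1, 1, 2])
by_index = max_index_matrix(ordered)
by_value = np.maximum(ordered[None, :], ordered[:, None])
print(np.array_equal(by_index, by_value))
```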
Adir's solution is brilliant and should be accepted.
EDIT: original answer --
It seems that you want to essentially make a matrix by "rotating" a vector like a wiper blade from the top left corner. This produces that pattern:
def wiper_blade_matrix(x):
    n = len(x)
    z = np.zeros((n, n), x.dtype)
    for k in range(-(n - 1), n):
        z[np.where(np.eye(n, k=k))] += x[abs(k):]
    return z
Usage:
In [4]: wiper_blade_matrix(np.array([0, 0, 0, 1, 1, 2]))
Out[4]:
array([[0, 0, 0, 1, 1, 2],
[0, 0, 0, 1, 1, 2],
[0, 0, 0, 1, 1, 2],
[1, 1, 1, 1, 1, 2],
[1, 1, 1, 1, 1, 2],
[2, 2, 2, 2, 2, 2]])
In [5]: wiper_blade_matrix(np.array([0, 0, 0, 5, 1, 2]))
Out[5]:
array([[0, 0, 0, 5, 1, 2],
[0, 0, 0, 5, 1, 2],
[0, 0, 0, 5, 1, 2],
[5, 5, 5, 5, 1, 2],
[1, 1, 1, 1, 1, 2],
[2, 2, 2, 2, 2, 2]])
In [6]: wiper_blade_matrix(np.array([3, 0, 0, 2, 2, 1]))
Out[6]:
array([[3, 0, 0, 2, 2, 1],
[0, 0, 0, 2, 2, 1],
[0, 0, 0, 2, 2, 1],
[2, 2, 2, 2, 2, 1],
[2, 2, 2, 2, 2, 1],
[1, 1, 1, 1, 1, 1]])

Flood fill NumPy Array `numpy.ndarray`, i. e. assign new value to element and change neighboring elements of the same value, too

Another, similar post called Flood Fill in Python is a very general question on flood fill, and the answer only contains a broad pseudocode example. I'm looking for an explicit solution with numpy or scipy.
Let's take this array for example:
a = np.array([
[0, 1, 1, 1, 1, 0],
[0, 0, 1, 2, 1, 1],
[0, 1, 1, 1, 1, 0]
])
For selecting element 0, 0 and flood fill with value 3, I'd expect:
[
[3, 1, 1, 1, 1, 0],
[3, 3, 1, 2, 1, 1],
[3, 1, 1, 1, 1, 0]
]
For selecting element 0, 1 and flood fill with value 3, I'd expect:
[
[0, 3, 3, 3, 3, 0],
[0, 0, 3, 2, 3, 3],
[0, 3, 3, 3, 3, 0]
]
For selecting element 0, 5 and flood fill with value 3, I'd expect:
[
[0, 1, 1, 1, 1, 3],
[0, 0, 1, 2, 1, 1],
[0, 1, 1, 1, 1, 0]
]
This should be a fairly basic operation, no? Which numpy or scipy method am I overlooking?
Approach #1
The scikit-image module offers a built-in for this, skimage.segmentation.flood_fill -
from skimage.segmentation import flood_fill
flood_fill(image, (x, y), newval)
Sample runs -
In [17]: a
Out[17]:
array([[0, 1, 1, 1, 1, 0],
[0, 0, 1, 2, 1, 1],
[0, 1, 1, 1, 1, 0]])
In [18]: flood_fill(a, (0, 0), 3)
Out[18]:
array([[3, 1, 1, 1, 1, 0],
[3, 3, 1, 2, 1, 1],
[3, 1, 1, 1, 1, 0]])
In [19]: flood_fill(a, (0, 1), 3)
Out[19]:
array([[0, 3, 3, 3, 3, 0],
[0, 0, 3, 2, 3, 3],
[0, 3, 3, 3, 3, 0]])
In [20]: flood_fill(a, (0, 5), 3)
Out[20]:
array([[0, 1, 1, 1, 1, 3],
[0, 0, 1, 2, 1, 1],
[0, 1, 1, 1, 1, 0]])
Approach #2
We can use skimage.measure.label with some array-masking -
from skimage.measure import label
def floodfill_by_xy(a,xy,newval):
    x,y = xy
    l = label(a==a[x,y])
    a[l==l[x,y]] = newval
    return a
To use the SciPy-based label function, scipy.ndimage.label, it would mostly be the same (the older scipy.ndimage.measurements namespace is deprecated) -
from scipy.ndimage import label
def floodfill_by_xy_scipy(a,xy,newval):
    x,y = xy
    l = label(a==a[x,y])[0]
    a[l==l[x,y]] = newval
    return a
Note: these functions edit the input array in place.
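If neither scikit-image nor SciPy is available, a plain breadth-first search over the array is one way to get the same behaviour. The sketch below is not taken from either library: the name flood_fill_bfs is made up, it assumes a 2D array and 4-connectivity (no diagonal neighbours), and it returns a copy instead of editing in place.

```python
from collections import deque

import numpy as np

def flood_fill_bfs(a, start, newval):
    """Return a copy of 2D array `a` with the 4-connected region of equal
    values around `start` replaced by `newval`."""
    out = a.copy()
    old = out[start]
    if old == newval:
        return out                      # nothing to do, avoids an endless loop
    n_rows, n_cols = out.shape
    out[start] = newval
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbours (up, down, left, right)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < n_rows and 0 <= nc < n_cols and out[nr, nc] == old:
                out[nr, nc] = newval
                queue.append((nr, nc))
    return out

a = np.array([
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 2, 1, 1],
    [0, 1, 1, 1, 1, 0],
])
print(flood_fill_bfs(a, (0, 0), 3))
print(flood_fill_bfs(a, (0, 1), 3))
print(flood_fill_bfs(a, (0, 5), 3))
```

The Python-level loop will be slower than the library routines on large arrays, so this is mainly useful for small inputs or as a reference for what the label-based versions compute.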

What does csr_matrix.sort_indices do?

I make a csr_matrix in the following way:
>>> A = sparse.csr_matrix([[0, 1, 0],
[1, 0, 1],
[0, 1, 0]])
>>> A[2,:] = np.array([-1, -2, -3])
>>> A.indptr
Out[12]: array([0, 1, 3, 6], dtype=int32)
>>> A.indices
Out[13]: array([1, 0, 2, 0, 2, 1], dtype=int32)
>>> A.data
Out[14]: array([ 1, 1, 1, -1, -3, -2], dtype=int64)
Now I want to interchange the last two elements in the indices and data arrays, so I try:
>>> A.sort_indices()
This does not do anything to my matrix however. The manual for this function only states that it sorts the indices.
What does this function do? In which condition can you see a difference?
How can I sort the indices and data arrays, such that for each row the indices are sorted?
As stated in the documentation, A.sort_indices() sorts the indices in-place. But there is a cache: if A.has_sorted_indices is True, it won't do anything (the cache was introduced in 0.7.0).
So, in order to see a difference, you need to manually set A.has_sorted_indices to False.
>>> A.has_sorted_indices, A.indices
(True, array([1, 0, 2, 0, 2, 1], dtype=int32))
>>> A.sort_indices()
>>> A.has_sorted_indices, A.indices
(True, array([1, 0, 2, 0, 2, 1], dtype=int32))
>>> A.has_sorted_indices = False
>>> A.sort_indices()
>>> A.has_sorted_indices, A.indices
(True, array([1, 0, 2, 0, 1, 2], dtype=int32))
Note that, unlike what OP has indicated, as of SciPy 0.19.0 running A[2, :] = [-1, -2, -3] no longer produces an out-of-order index (this should have been fixed in 0.14.0). On the other hand, this operation produces a warning:
SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
Anyway, we could easily produce out-of-order indices in other ways, e.g. by matrix multiplication:
>>> B = scipy.sparse.csr_matrix([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
>>> C = B*B
>>> C.has_sorted_indices, C.indices
(0, array([2, 0, 1, 2, 0], dtype=int32))
>>> C.sort_indices()
>>> C.has_sorted_indices, C.indices
(True, array([0, 2, 1, 0, 2], dtype=int32))
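Another way to see sort_indices() do something, sketched here under the assumption that the (data, indices, indptr) constructor stores the arrays without reordering them, is to build the matrix directly from the unsorted arrays shown in the question:

```python
import numpy as np
from scipy import sparse

# Raw CSR arrays from the question; row 2 has column indices [0, 2, 1]
data = np.array([1, 1, 1, -1, -3, -2])
indices = np.array([1, 0, 2, 0, 2, 1])
indptr = np.array([0, 1, 3, 6])

A = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
dense_before = A.toarray()
was_sorted = bool(A.has_sorted_indices)   # checked against the actual indices

A.sort_indices()                          # sorts indices and data row by row

print(was_sorted, bool(A.has_sorted_indices))
print(A.indices)
print(A.data)
print(np.array_equal(dense_before, A.toarray()))  # the matrix itself is unchanged
```

Sorting only reorders the stored entries within each row; the matrix the arrays represent stays the same.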

making an array of n columns where each successive row increases by one

In numpy, I would like to be able to input n for rows and m for columns and end with the array that looks like:
[(0,0,0,0),
(1,1,1,1),
(2,2,2,2)]
So that would be a 3x4. Each column is just a copy of the previous one and the row increases by one each time. As an example:
the input would be 4, then 6, and the output would be an array
[(0,0,0,0,0,0),
(1,1,1,1,1,1),
(2,2,2,2,2,2),
(3,3,3,3,3,3)]
4 rows and 6 columns where the row increases by one each time. Thanks for your time.
So many possibilities...
In [51]: n = 4
In [52]: m = 6
In [53]: np.tile(np.arange(n), (m, 1)).T
Out[53]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
In [54]: np.repeat(np.arange(n).reshape(-1,1), m, axis=1)
Out[54]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
In [55]: np.outer(np.arange(n), np.ones(m, dtype=int))
Out[55]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
Here's one more. The neat trick here is that the values are not duplicated--only memory for the single sequence [0, 1, 2, ..., n-1] is allocated.
In [67]: from numpy.lib.stride_tricks import as_strided
In [68]: seq = np.arange(n)
In [69]: rep = as_strided(seq, shape=(n,m), strides=(seq.strides[0],0))
In [70]: rep
Out[70]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
Be careful with the as_strided function. If you don't get the arguments right, you can crash Python.
To see that seq has not been copied, change seq in place, and then check rep:
In [71]: seq[1] = 99
In [72]: rep
Out[72]:
array([[ 0, 0, 0, 0, 0, 0],
[99, 99, 99, 99, 99, 99],
[ 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3]])
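A gentler way to get a similar zero-copy view is np.broadcast_to, which works out the strides for you and returns a read-only array, so there is less room for the kind of mistake that as_strided allows. A minimal sketch with the same n and m as above:

```python
import numpy as np

n, m = 4, 6
seq = np.arange(n)

# Broadcast the (n, 1) column to (n, m) without copying any data
rep = np.broadcast_to(seq[:, None], (n, m))

print(rep.shape)
print(rep.flags.writeable)        # the view is read-only
print(np.shares_memory(rep, seq)) # still backed by seq's memory

# Changes to seq show through, just like with as_strided
seq[1] = 99
print(rep)
```

If a writable array is needed afterwards, calling rep.copy() produces an ordinary, fully allocated array.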
import numpy as np
def foo(n, m):
    return np.array([np.arange(n)] * m).T
Natively (no Python lists):
import numpy as np
rows, columns = 4, 6
np.arange(rows).reshape(-1, 1).repeat(columns, axis=1)
#>>> array([[0, 0, 0, 0, 0, 0],
#>>> [1, 1, 1, 1, 1, 1],
#>>> [2, 2, 2, 2, 2, 2],
#>>> [3, 3, 3, 3, 3, 3]])
You can easily do this using built-in Python functions. The program counts from 0 to 3, converting each number to a string and repeating the string 6 times.
print([6*str(n) for n in range(0,4)])
Here is the output.
ks-MacBook-Pro:~ kyle$ pbpaste | python
['000000', '111111', '222222', '333333']
One more for fun:
np.zeros((n, m), dtype=int) + np.arange(n, dtype=int)[:,None]
As has been mentioned, there are many ways to do this.
Here's what I'd do:
import numpy as np
def makearray(m, n):
    A = np.empty((m,n))
    A.T[:] = np.arange(m)
    return A
Here's an amusing alternative that will work if you aren't going to be changing the contents of the array.
It should save some memory.
Be careful though because this doesn't allocate a full array, it will have multiple entries pointing to the same memory address.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def makearray(m, n):
    A = np.arange(m)
    return as_strided(A, strides=(A.strides[0],0), shape=(m,n))
In either case, as I have written them, a 3x4 array can be created by makearray(3, 4)
Using count from the built-in module itertools:
>>> from itertools import count
>>> rows = 4
>>> columns = 6
>>> cnt = count()
>>> [[next(cnt)]*columns for i in range(rows)]
[[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2], [3, 3, 3, 3, 3, 3]]
You can simply do:
>>> nc=5
>>> nr=4
>>> [[k]*nc for k in range(nr)]
[[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3]]
Several other possibilities using a (n,1) array
a = np.arange(n)[:,None]  # or np.arange(n).reshape(-1,1)
a*np.ones((m),dtype=int)
a[:,np.zeros((m),dtype=int)]
If used with a (m,) array, just leave it (n,1), and let broadcasting expand it for you.
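As a small illustration of that last point, the sketch below keeps the column as an (n, 1) array and lets broadcasting pair it with an (m,) array during arithmetic. The values in row are made up for the example.

```python
import numpy as np

n, m = 3, 4
col = np.arange(n)[:, None]          # shape (n, 1): [[0], [1], [2]]
row = np.array([10, 20, 30, 40])     # shape (m,), hypothetical data

# (n, 1) combined with (m,) broadcasts to (n, m); no explicit tiling needed
total = col + row
print(total.shape)
print(total)
```

The full (n, m) array of repeated row numbers is never built explicitly here; only the result of the operation is allocated.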
