Expand numpy array of indices into a matrix - python

I have a numpy array of N integers ranging from 0 to M inclusive. I wish to treat them as indexes into an Nx(M+1) matrix that contains a 1 in every position indicated by the array and a 0 everywhere else. For example, given N=4 and M=2, I have the following array
[1, 0, 2, 1]
I want to get this matrix
[0 1 0]
[1 0 0]
[0 0 1]
[0 1 0]
i.e. the row 0 has a one in column 1, row 1 has a 1 in column 0, etc.
How do I make this transformation in numpy?

This requires multi-dimensional array indexing.
import numpy as np
a = np.array([1, 0, 2, 1])
z = np.zeros((4, 3), dtype=int)  # one row per index, one column per possible value
z[np.arange(a.size), a] = 1  # row i gets a 1 in column a[i]
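The same indexing trick can be wrapped in a small helper so the shape is not hard-coded. This is only a sketch: the function name `one_hot` and the `num_columns` parameter are illustrative and not part of the original answer, and the helper assumes the input is a 1D array of non-negative integers.

```python
import numpy as np

def one_hot(indices, num_columns=None):
    """Return a 2D 0/1 array with a single 1 per row at the given column index."""
    indices = np.asarray(indices)
    if num_columns is None:
        # Assumption: the largest index present determines the width
        num_columns = indices.max() + 1
    out = np.zeros((indices.size, num_columns), dtype=int)
    # Pair row i with column indices[i] and set those positions to 1
    out[np.arange(indices.size), indices] = 1
    return out

result = one_hot([1, 0, 2, 1])
print(result)
```

Passing `num_columns` explicitly is useful when the largest possible index does not occur in the data.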

Related

All combinations of a numpy 2d array filled with 0s and 1s

Given K, I need to have all the possible combinations of K x 2 numpy matrices so that in each matrix there are all 0s except for two 1s in different rows and columns.
Something like this for K = 5:
[[1,0],[0,1],[0,0],[0,0],[0,0]]
[[1,0],[0,0],[0,1],[0,0],[0,0]]
[[1,0],[0,0],[0,0],[0,1],[0,0]]
[[1,0],[0,0],[0,0],[0,0],[0,1]]
[[0,0],[1,0],[0,1],[0,0],[0,0]]
[[0,0],[1,0],[0,0],[0,1],[0,0]]
... and so on
So the resulting array should have shape K x 2 x (K*(K-1)/2).
I want to avoid loops since they are not efficient when K is big enough (in my specific case K = 300).
I can't think of an elegant solution but here's a not-so-elegant pure numpy one:
import numpy as np
def combination_matrices(K):
    # get combination indices
    i, j = np.indices((K, K))
    comb_indices = np.transpose((i < j).nonzero())  # (num_combs, 2) array of row pairs with i < j
    num_combs = comb_indices.shape[0]  # K*(K-1)/2
    # create a matrix of the desired shape, first axis enumerates combinations
    matrices = np.zeros((num_combs, K, 2), dtype=int)
    # broadcasting assignment of ones
    comb_range, col_index = np.ogrid[:num_combs, :2]
    matrices[comb_range, comb_indices, col_index] = 1
    return matrices
This first uses the indices of a (K, K)-shaped array to find the index pairs for every combination (these are the indices that encode the upper triangle of the array, excluding the diagonal). Then we use a slightly tricky broadcasting assignment (heavy fancy indexing) to set each corresponding element of the pre-allocated output array to 1.
Note that I put the K*(K-1)/2-sized axis first, because this makes the most sense in numpy with C-contiguous memory layout. This way when you take the matrix for combination index 3, arr[3, ...] will be a contiguous chunk of memory of shape (K, 2) that's fast to work with in vectorised operations.
The output for K = 4:
[[[1 0]
[0 1]
[0 0]
[0 0]]
[[1 0]
[0 0]
[0 1]
[0 0]]
[[1 0]
[0 0]
[0 0]
[0 1]]
[[0 0]
[1 0]
[0 1]
[0 0]]
[[0 0]
[1 0]
[0 0]
[0 1]]
[[0 0]
[0 0]
[1 0]
[0 1]]]
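The upper-triangle index pairs described above can also be obtained directly with np.triu_indices, which avoids building the full (K, K) index grids. The following is a sketch of that variant, not the answer's original code; the function name `combination_matrices_triu` is made up for illustration.

```python
import numpy as np

def combination_matrices_triu(K):
    # All row pairs (rows_a, rows_b) with rows_a < rows_b, in row-major order
    rows_a, rows_b = np.triu_indices(K, k=1)
    num_combs = rows_a.size  # K*(K-1)/2
    matrices = np.zeros((num_combs, K, 2), dtype=int)
    comb = np.arange(num_combs)
    matrices[comb, rows_a, 0] = 1  # first 1 goes in column 0
    matrices[comb, rows_b, 1] = 1  # second 1 goes in column 1
    return matrices

mats = combination_matrices_triu(4)
print(mats.shape)
```

For K = 4 this should produce the same six matrices, in the same order, as the output listed above.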
This is an oddly specific question but an interesting problem; I'd love to know what the context is.
You are looking for all permutations of a multiset, which Python's itertools doesn't currently support. So the simplest solution is to use the multiset tools of the sympy library.
The following code took about 2.5 minutes to run on my machine, which is fairly fast for a single thread. You're looking at 179700 unique permutations for K=300.
(I took inspiration from https://stackoverflow.com/a/40289807/10739860)
from collections import Counter
from math import factorial, prod
import numpy as np
from sympy.utilities.iterables import multiset_permutations
from tqdm import tqdm
def No_multiset_permutations(multiset: list) -> int:
    """Calculates the No. possible permutations given a multiset.
    See: https://en.wikipedia.org/wiki/Permutation#Permutations_of_multisets
    :param multiset: List representing a multiset.
    """
    value_counts = Counter(multiset).values()
    denominator = prod([factorial(val) for val in value_counts])
    # Integer division keeps the result exact for large factorials
    return factorial(len(multiset)) // denominator
def multiset_Kx2_permutations(K: int) -> np.ndarray:
    """This will generate all possible unique Kx2 permutations of an array
    with size K where two values are 1 and the rest are 0.
    :param K: The size of the array.
    """
    # Construct number multiset, e.g. K=5 gives [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    numbers = [1, 1] + [0] * (K - 1) * 2
    # Use sympy's multiset_permutations to get a multiset permutation generator
    generator = multiset_permutations(numbers)
    # Calculate the No. possible permutations
    number_of_perms = No_multiset_permutations(numbers)
    # Get all permutations, bonus progress bar is included :)
    unique_perms = [next(generator) for _ in tqdm(range(number_of_perms))]
    # Reshape each permutation to Kx2
    unique_perms = np.array(unique_perms, dtype=np.int8)
    return unique_perms.reshape(-1, K, 2)
if __name__ == "__main__":
    solution = multiset_Kx2_permutations(300)
Another possibility (with rearranged axes for clearer output):
from itertools import combinations
import numpy as np
k = 4
x = list(combinations(range(k), 2))
out = np.zeros((n := len(x), k, 2), dtype=int)
out[np.c_[:n], x, [0, 1]] = 1
print(out)
It gives:
[[[1 0]
[0 1]
[0 0]
[0 0]]
[[1 0]
[0 0]
[0 1]
[0 0]]
[[1 0]
[0 0]
[0 0]
[0 1]]
[[0 0]
[1 0]
[0 1]
[0 0]]
[[0 0]
[1 0]
[0 0]
[0 1]]
[[0 0]
[0 0]
[1 0]
[0 1]]]

Numpy: get only elements on odd or even diagonal offsets in a matrix, change the rest to zeros

I have a 2D (square) matrix, for example it can be like this:
1 2 3
4 5 6
7 8 9
I want to get only the elements on the odd or even diagonal offsets of it, and let the rest be zeros. For example, with even diagonal offsets (%2 = 0), the resulting matrix is:
1 0 3
0 5 0
7 0 9
Explanation: the main diagonal has offset 0 and contains 1 5 9. The diagonals at offsets +1 and -1 contain 2, 6 and 4, 8, so they are changed to zeros. Repeat the process until we reach the last diagonal.
And with odd diagonal index, the resulting matrix is:
0 2 0
4 0 6
0 8 0
I looked at np.diag(np.diag(x)), but it only keeps the main diagonal and sets the rest to zeros. How can I extend it to odd/even offsets?
I can also use PyTorch.
I would do it the following way using numpy:
import numpy as np
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
masktile = np.array([[True,False],[False,True]])
mask = np.tile(masktile, (2,2)) # this must be at least as big as arr
arr0 = np.where(mask[:arr.shape[0],:arr.shape[1]], arr, 0)
arr1 = np.where(mask[:arr.shape[0],:arr.shape[1]], 0, arr)
print(arr0)
print(arr1)
output:
[[1 0 3]
[0 5 0]
[7 0 9]]
[[0 2 0]
[4 0 6]
[0 8 0]]
Explanation: I create a mask, an array of Trues and Falses, that decides whether a given element remains or is replaced by 0. I build a single tile, feed it into np.tile to get a "chessboard" of sufficient size, then use a part of it of the appropriate size together with np.where to replace the selected elements with 0.
You can use the offset argument of diagonal in PyTorch like so:
import torch
offset = 0  # 0 zeroes the even diagonals, 1 zeroes the odd ones
x = torch.arange(1,10).view(3,3)
for i in range(offset, x.shape[0], 2):
    x.diagonal(i).fill_(0)
    x.diagonal(-i).fill_(0)
# offset = 0 (Even)
tensor([[0, 2, 0],
[4, 0, 6],
[0, 8, 0]])
# offset = 1 (Odd)
tensor([[1, 0, 3],
[0, 5, 0],
[7, 0, 9]])
You can also use np.indices. If the row and column indexes % 2 are equal, you are on an "even" diagonal. If they are not equal, you are on an "odd" diagonal.
>>> arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> rows, cols = np.indices(arr.shape)
Set the even diagonals to 0 (working on a copy so arr stays intact):
>>> even_zeroed = arr.copy()
>>> even_zeroed[(rows%2)==(cols%2)] = 0
>>> even_zeroed
array([[0, 2, 0],
       [4, 0, 6],
       [0, 8, 0]])
Set the odd diagonals to 0 (again starting from the original arr):
>>> odd_zeroed = arr.copy()
>>> odd_zeroed[(rows%2)!=(cols%2)] = 0
>>> odd_zeroed
array([[1, 0, 3],
       [0, 5, 0],
       [7, 0, 9]])
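The parity test above can be wrapped in a function that returns a new array instead of modifying the input. This is a minimal sketch; the name `keep_diagonals` and its `parity` argument are illustrative only. It uses the difference of the row and column indexes, which has the same parity as the diagonal offset.

```python
import numpy as np

def keep_diagonals(arr, parity):
    """Keep elements on diagonals whose offset % 2 == parity, zero the rest."""
    rows, cols = np.indices(arr.shape)
    # rows - cols is (minus) the diagonal offset; NumPy's % is non-negative here
    mask = (rows - cols) % 2 == parity
    return np.where(mask, arr, 0)

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
even = keep_diagonals(arr, 0)
odd = keep_diagonals(arr, 1)
print(even)
print(odd)
```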
np.diag has an offset parameter (named k), for example:
import numpy as np
a = np.ones((3,3))
print(a - np.diag(np.diag(a, 1), 1))
[[1. 0. 1.]
[1. 1. 0.]
[1. 1. 1.]]
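Building on the offset parameter, one way to answer the original question with np.diag alone is to loop over every offset of the chosen parity and add the corresponding diagonal back into an empty matrix. This is a sketch assuming a square input; the helper name `diagonals_by_offset` is hypothetical.

```python
import numpy as np

def diagonals_by_offset(x, parity):
    """Sum the diagonals of square matrix x whose offset k satisfies k % 2 == parity."""
    n = x.shape[0]
    out = np.zeros_like(x)
    for k in range(-(n - 1), n):
        if k % 2 == parity:
            # np.diag(x, k) extracts diagonal k; np.diag(v, k) places it back at offset k
            out += np.diag(np.diag(x, k), k)
    return out

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
even_part = diagonals_by_offset(x, 0)
odd_part = diagonals_by_offset(x, 1)
print(even_part)
print(odd_part)
```

The loop runs over 2n-1 diagonals, so for large matrices a boolean mask is likely to be faster.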

I want to compare two numpy arrays and create a third array

Like the title says, I want to compare two sitk arrays that have 1s and 0s as elements, and create a 3rd array that has 1s for where both arrays have 1 and 0s for any other cases. The arrays are the same size and are 3 dimensional, but is there a more efficient way to do this than iterating through them with nested for-loops?
import numpy as np
a = np.random.randint(low=0, high=2, size=(2,3,4), dtype=int)  # np.int no longer exists in recent NumPy
print(a)
b = np.random.randint(low=0, high=2, size=(2,3,4), dtype=int)
print(b)
c = np.logical_and(a,b).astype(int)
print(c)
Is that what you're looking for?
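For arrays that contain only 0s and 1s, the logical AND can also be written as a bitwise AND or an element-wise product. The sketch below checks that all three agree; the seeded generator and the variable names are only there to make the example reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, purely for reproducibility
a = rng.integers(0, 2, size=(2, 3, 4))
b = rng.integers(0, 2, size=(2, 3, 4))

c_logical = np.logical_and(a, b).astype(int)  # boolean result cast back to 0/1
c_bitwise = a & b                             # bitwise AND, valid because values are 0 or 1
c_product = a * b                             # product is 1 only where both are 1

print(np.array_equal(c_logical, c_bitwise))
print(np.array_equal(c_logical, c_product))
```

All of these are vectorised, so no nested loops over the three dimensions are needed.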
import numpy as np
arr_shape = (1,4,3)
a = np.random.randint(low=0,high=2, size=arr_shape)
print(a)
b = np.random.randint(low=0,high=2, size=arr_shape)
print(b)
# the new array. subtract a and b and get the absolute value, then invert.
# note: this gives 1 wherever a and b are equal, including where both are 0
d = (~abs(b - a).astype(bool)).astype(int)
print(d)
output:
[[[1 1 0]
[1 0 0]
[0 1 1]
[1 1 0]]]
[[[0 1 0]
[0 1 0]
[1 0 0]
[0 0 1]]]
array([[[0, 1, 1],
[0, 0, 1],
[0, 0, 0],
[0, 0, 0]]])
If you have SimpleITK Images, you can use the And function.
import SimpleITK as sitk
result = sitk.And(image1, image2)
This is a functional version of the AndImageFilter. You can read the docs for the class here:
https://simpleitk.org/doxygen/latest/html/classitk_1_1simple_1_1AndImageFilter.html

How to center the nonzero values within 2D numpy array?

I'd like to locate all the nonzero values within a 2D numpy array and move them so that the image is centered. I do not want to pad the array because I need to keep it the same shape. For example:
my_array = np.array([[1, 1, 0, 0], [0, 0, 2, 4], [0, 0, 0, 0], [0, 0, 0, 0]])
# center...
>>> [[0 0 0 0]
[0 1 1 0]
[0 2 4 0]
[0 0 0 0]]
But in reality the arrays I need to center are much larger (like 200x200, 403x403, etc, and they are all square). I think np.nonzero and np.roll might come in handy, but am not sure of the best way to use these for my large arrays.
The combination of nonzero and roll can be used for this purpose. For example, if k=0 in the loop shown below, then np.any will identify the rows that are not identically zero. The first and last such rows are noted, and the shift along the axis is computed so that after the shift, (first+last)/2 will move to the middle row of the array. Then the same is done for columns.
import numpy as np
my_array = np.array([[1, 1, 0, 0], [2, 4, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
print(my_array) # before
for k in range(2):
    nonempty = np.nonzero(np.any(my_array, axis=1-k))[0]
    first, last = nonempty.min(), nonempty.max()
    shift = (my_array.shape[k] - first - last)//2
    my_array = np.roll(my_array, shift, axis=k)
print(my_array) # after
Before:
[[1 1 0 0]
[2 4 0 0]
[0 0 0 0]
[0 0 0 0]]
After:
[[0 0 0 0]
[0 1 1 0]
[0 2 4 0]
[0 0 0 0]]
Alternative: np.count_nonzero can be used in place of np.any, which makes it possible to set a threshold for the number of nonzero pixels that are deemed "enough" to qualify a row as part of the image.
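The thresholded variant mentioned above might look like the following sketch. The function name `center_nonzero` and the `min_count` parameter are assumptions made for illustration; with `min_count=1` it should behave like the np.any version.

```python
import numpy as np

def center_nonzero(arr, min_count=1):
    """Roll a 2D array so the rows/columns with enough nonzero entries are centered."""
    out = arr
    for axis in range(2):
        # Count nonzero entries per row (axis=0) or per column (axis=1)
        counts = np.count_nonzero(out, axis=1 - axis)
        occupied = np.nonzero(counts >= min_count)[0]
        if occupied.size == 0:
            continue  # nothing qualifies along this axis, leave it unshifted
        first, last = occupied[0], occupied[-1]
        shift = (out.shape[axis] - first - last) // 2
        out = np.roll(out, shift, axis=axis)
    return out

my_array = np.array([[1, 1, 0, 0], [2, 4, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
centered = center_nonzero(my_array)
print(centered)
```

Because np.roll wraps around, the shape is preserved and no padding is needed.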

First row of numpy.ones is still populated after referencing another matrix

I have a matrix 'A' whose values are shown below. After creating a matrix 'B' of ones using numpy.ones and assigning the values from 'A' to 'B' by indexing 'i' rows and 'j' columns, the resulting 'B' matrix is retaining the first row of ones from the original 'B' matrix. I'm not sure why this is happening with the code provided below.
The code and the resulting 'B' matrix from the command line are shown below:
import numpy as np
A = np.matrix([[8,8,8,7,7,6,8,2],
[8,8,7,7,7,6,6,7],
[1,8,8,7,7,6,6,6],
[1,1,8,7,7,6,7,7],
[1,1,1,1,8,7,7,6],
[1,1,2,1,8,7,7,6],
[2,2,2,1,1,8,7,7],
[2,1,2,1,1,8,8,7]])
B = np.ones((8,8),dtype=int)
for i in np.arange(1,9):
    for j in np.arange(1,9):
        B[i:j] = A[i:j]
C = np.zeros((6,6),dtype=int)
print(C)
D = np.matrix([[1,1,2,3,3,2,2,1],
[1,2,1,2,3,3,3,2],
[1,1,2,1,1,2,2,3],
[2,2,3,2,2,2,1,3],
[1,2,2,3,2,3,1,3],
[1,2,3,3,2,3,2,3],
[1,2,2,3,2,3,1,2],
[2,2,3,2,2,3,2,2]])
print(D)
for k in np.arange(2,8):
    for l in np.arange(2,8):
        B[k,l] # point in middle
        b = B[(k-1),(l-1)]
        if b == 8:
            # Matrix C is smaller than Matrix B
            C[(k-1),(l-1)] = C[(k-1),(l-1)] + 1*D[(k-1),(l-1)]
#Output for Matrix B
B=
[1,1,1,1,1,1,1,1],
[8,8,7,7,7,6,6,7],
[1,8,8,7,7,6,6,6],
[1,1,8,7,7,6,7,7],
[1,1,1,1,8,7,7,6],
[1,1,2,1,8,7,7,6],
[2,2,2,1,1,8,7,7],
[2,1,2,1,1,8,8,7]
Python starts counting at 0, so your code should work fine if you replace np.arange(1,9) with np.arange(9)
In [11]: np.arange(1,9)
Out[11]: array([1, 2, 3, 4, 5, 6, 7, 8])
In [12]: np.arange(9)
Out[12]: array([0, 1, 2, 3, 4, 5, 6, 7, 8])
As stated above: python indices start at 0.
To iterate over indices (of a matrix, say), you should use the builtin function 'range' and not 'numpy.arange'. arange returns an ndarray, while range returns a lazy range object in Python 3.
The syntax 'B[i:j]' does not refer to the element at row i and column j of an array B. Rather, it means: all rows of B starting at row i and going up to (but not including) row j (if B has fewer than j rows, it returns everything up to and including the last row). The element at position i, j is in fact 'B[i,j]'.
The indexing syntax of python / numpy is quite powerful and performant.
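The difference between the slice 'B[i:j]' and the element 'B[i,j]' can be seen on a small array. This is a sketch with made-up values, not the matrices from the question.

```python
import numpy as np

B = np.arange(16).reshape(4, 4)

rows = B[1:3]     # rows 1 and 2, all columns: a (2, 4) block
elem = B[1, 3]    # the single element at row 1, column 3
empty = B[3:1]    # start >= stop selects no rows at all
beyond = B[2:10]  # a stop past the end is clipped to the last row

print(rows.shape, elem, empty.shape, beyond.shape)
```

Assigning to an empty slice such as `B[3:1]` does nothing, which is why many iterations of the question's double loop have no effect.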
For one thing, as others have mentioned, NumPy uses 0-based indexing. But even once you fix that, this is not what you want to use:
for i in np.arange(9):
    for j in np.arange(9):
        B[i:j] = A[i:j]
The : indicates slicing, so i:j means "all items from the i-th up to the j-th, excluding the j-th itself." So your code is copying every row over several times, which is not a very efficient way of doing things.
You probably wanted to use a comma instead:
for i in np.arange(8): # Notice the range only goes up to 8
    for j in np.arange(8): # ditto
        B[i, j] = A[i, j]
This will work, but is also pretty wasteful performancewise when using NumPy. A much faster approach is to simply ask for:
B[:] = A
First, here is what I think you are trying to do, with minimal corrections and comments added to your code:
import numpy as np
A = np.matrix([[8,8,8,7,7,6,8,2],
[8,8,7,7,7,6,6,7],
[1,8,8,7,7,6,6,6],
[1,1,8,7,7,6,7,7],
[1,1,1,1,8,7,7,6],
[1,1,2,1,8,7,7,6],
[2,2,2,1,1,8,7,7],
[2,1,2,1,1,8,8,7]])
B = np.ones((8,8),dtype=int)
for i in np.arange(1,9): # i = 1..8
    for j in np.arange(1,9): # j = 1..8, but A[8,j] and A[j,8] do not exist,
        # if you insist on 1-based indices, numpy still expects 0..n-1,
        # so you'll have to subtract 1 from each index to use them
        B[i-1,j-1] = A[i-1,j-1]
C = np.zeros((6,6),dtype=int)
D = np.matrix([[1,1,2,3,3,2,2,1],
[1,2,1,2,3,3,3,2],
[1,1,2,1,1,2,2,3],
[2,2,3,2,2,2,1,3],
[1,2,2,3,2,3,1,3],
[1,2,3,3,2,3,2,3],
[1,2,2,3,2,3,1,2],
[2,2,3,2,2,3,2,2]])
for k in np.arange(2,8): # k = 2..7
    for l in np.arange(2,8): # l = 2..7 ; matrix B has indices 0..7, so if you want inner points, you'll need 1..6
        b = B[k-1,l-1] # so this is correct, gives you the inner matrix
        if b == 8: # here b is a value in the matrix, not the index, careful not to mix those
            # Matrix C is smaller than Matrix B ; yes C has indices from 0..5 for k and l
            # so to address C you'll need to subtract 2 from the k,l that you defined in the for loop
            C[k-2,l-2] = C[k-2,l-2] + 1*D[k-1,l-1]
print(C)
output:
[[2 0 0 0 0 0]
[1 2 0 0 0 0]
[0 3 0 0 0 0]
[0 0 0 2 0 0]
[0 0 0 2 0 0]
[0 0 0 0 3 0]]
But there are more elegant ways to do it. In particular, look up slicing and numpy conditional array arithmetic. All of the below should be much faster than Python loops too (numpy loops are written in C).
B=np.copy(A) #if you need a copy of A, this is the way
# one quick way to make a matrix that's 1 wherever A==8, and is smaller
# (this originally used scipy.stats.threshold, which has been removed from recent SciPy releases; np.where gives the same result)
B1=np.where(A==8, 1, 0) # make a matrix with ones where there is an 8
B1=B1[1:-1,1:-1]
print(B1)
#another quick way to make a matrix that's 1 wherever A==8
B2 = np.zeros((8,8),dtype=int)
B2[A==8]=1
B2=B2[1:-1,1:-1]
print(B2)
# the following would obviously work with either B1 or B2 (which are the same)
print(np.multiply(B2,D[1:-1,1:-1]))
Output:
[[1 0 0 0 0 0]
[1 1 0 0 0 0]
[0 1 0 0 0 0]
[0 0 0 1 0 0]
[0 0 0 1 0 0]
[0 0 0 0 1 0]]
[[1 0 0 0 0 0]
[1 1 0 0 0 0]
[0 1 0 0 0 0]
[0 0 0 1 0 0]
[0 0 0 1 0 0]
[0 0 0 0 1 0]]
[[2 0 0 0 0 0]
[1 2 0 0 0 0]
[0 3 0 0 0 0]
[0 0 0 2 0 0]
[0 0 0 2 0 0]
[0 0 0 0 3 0]]
A cleaner way, in my opinion, of writing the C loop is:
for k in range(1,7):
    for l in range(1,7):
        if B[k,l]==8:
            C[k-1, l-1] += D[k,l]
That inner block of B (and D) can be selected with slices, B[1:7, 1:7] or B[1:-1, 1:-1].
A and D are defined as np.matrix. Since we aren't doing matrix multiplications here (no dot products), that can create problems. For example, I was puzzled by the following result, where * performs a matrix product rather than an element-wise one:
In [27]: (B[1:-1,1:-1]==8)*D[1:-1,1:-1]
Out[27]:
matrix([[2, 1, 2, 3, 3, 3],
[3, 3, 3, 4, 5, 5],
[1, 2, 1, 1, 2, 2],
[2, 2, 3, 2, 3, 1],
[2, 2, 3, 2, 3, 1],
[2, 3, 3, 2, 3, 2]])
What I expected (and matches the loop C) is:
In [28]: (B[1:-1,1:-1]==8)*D.A[1:-1,1:-1]
Out[28]:
array([[2, 0, 0, 0, 0, 0],
[1, 2, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0],
[0, 0, 0, 2, 0, 0],
[0, 0, 0, 2, 0, 0],
[0, 0, 0, 0, 3, 0]])
B = A.copy() still leaves B as matrix. B=A.A returns an np.ndarray. (as does np.copy(A))
D.A is the array equivalent of D. B[1:-1,1:-1]==8 is boolean, but when used in the multiplication context it is effectively 0s and 1s.
But if we want to stick with np.matrix then I'd suggest using the element by element multiply function:
In [46]: np.multiply((A[1:-1,1:-1]==8), D[1:-1,1:-1])
Out[46]:
matrix([[2, 0, 0, 0, 0, 0],
[1, 2, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0],
[0, 0, 0, 2, 0, 0],
[0, 0, 0, 2, 0, 0],
[0, 0, 0, 0, 3, 0]])
or just multiply the full matrices, and select the inner block after:
In [47]: np.multiply((A==8), D)[1:-1, 1:-1]
Out[47]:
matrix([[2, 0, 0, 0, 0, 0],
[1, 2, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0],
[0, 0, 0, 2, 0, 0],
[0, 0, 0, 2, 0, 0],
[0, 0, 0, 0, 3, 0]])
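If A and D are plain np.ndarray objects instead of np.matrix, the whole computation reduces to one np.where call on the inner blocks. The following is a sketch under that assumption, reusing the values from the question; it is not code from any of the answers.

```python
import numpy as np

A = np.array([[8,8,8,7,7,6,8,2],
              [8,8,7,7,7,6,6,7],
              [1,8,8,7,7,6,6,6],
              [1,1,8,7,7,6,7,7],
              [1,1,1,1,8,7,7,6],
              [1,1,2,1,8,7,7,6],
              [2,2,2,1,1,8,7,7],
              [2,1,2,1,1,8,8,7]])
D = np.array([[1,1,2,3,3,2,2,1],
              [1,2,1,2,3,3,3,2],
              [1,1,2,1,1,2,2,3],
              [2,2,3,2,2,2,1,3],
              [1,2,2,3,2,3,1,3],
              [1,2,3,3,2,3,2,3],
              [1,2,2,3,2,3,1,2],
              [2,2,3,2,2,3,2,2]])

# Take D where the inner block of A equals 8, and 0 everywhere else
C = np.where(A[1:-1, 1:-1] == 8, D[1:-1, 1:-1], 0)
print(C)
```

With ndarrays, `*` is element-wise, so `(A[1:-1, 1:-1] == 8) * D[1:-1, 1:-1]` would give the same result without the np.matrix surprise described above.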
