Convert a list to a numpy mask array - python

Given a list like indice = [1, 0, 2] and a dimension m = 3, I want to get a mask array like this:
>>> import numpy as np
>>> mask_array = np.array([ [1, 1, 0], [1, 0, 0], [1, 1, 1] ])
>>> mask_array
array([[1, 1, 0],
       [1, 0, 0],
       [1, 1, 1]])
Since m = 3, mask_array has 3 columns (axis 1); the number of rows equals the length of indice.
The rule for converting indice to mask_array is: in each row, set to 1 every position whose index is less than or equal to the corresponding entry of indice. For example, indice[0] = 1, so the first row is [1, 1, 0], given that the dimension is 3.
In NumPy, are there any APIs which can be used to do this?

Sure, just use broadcasting with arange(m). Make sure to use an np.array for the indices, not a list:
>>> indice = [1, 0, 2]
>>> m = 3
>>> np.arange(m) <= np.array(indice)[..., None]
array([[ True, True, False],
[ True, False, False],
[ True, True, True]])
Note, the [..., None] just reshapes the indices array so that the broadcasting works like we want, like this:
>>> indices = np.array(indice)
>>> indices
array([1, 0, 2])
>>> indices[...,None]
array([[1],
[0],
[2]])
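If the 0/1 integer array from the question is needed rather than booleans, the same comparison can be wrapped in a small helper. This is only a sketch: the function name indices_to_mask is made up for illustration and is not a NumPy API.

```python
import numpy as np

def indices_to_mask(indice, m):
    """Hypothetical helper: row i has ones at positions 0..indice[i]."""
    # Shape (len(indice), 1), so it broadcasts against arange(m) of shape (m,).
    column = np.asarray(indice)[:, None]
    # The comparison yields booleans; astype(int) turns them into 0/1.
    return (np.arange(m) <= column).astype(int)

mask_array = indices_to_mask([1, 0, 2], 3)
print(mask_array)
```

Dropping the astype(int) call gives the boolean version shown in the answer above.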

Related

2d numpy mask not working as expected

I'm trying to turn a 2x3 numpy array into a 2x2 array by removing select indexes.
I think I can do this with a mask array with true/false values.
Given
[ 1, 2, 3],
[ 4, 1, 6]
I want to remove one element from each row to give me:
[ 2, 3],
[ 4, 6]
However this method isn't working quite like I would expect:
import numpy as np
in_array = np.array([
[ 1, 2, 3],
[ 4, 1, 6]
])
mask = np.array([
[False, True, True],
[True, False, True]
])
print(in_array[mask])
Gives me:
[2 3 4 6]
Which is not what I want. Any ideas?
The only thing 'wrong' with that is the shape: it is 1d rather than 2d. But what if your mask was
mask = np.array([
[False, True, False],
[True, False, True]
])
1 value in the first row, 2 in second. It couldn't return that as a 2d array, could it?
So the default behavior when masking like this is to return a 1d, or raveled result.
Boolean indexing like this is effectively a where indexing:
In [19]: np.where(mask)
Out[19]: (array([0, 0, 1, 1], dtype=int32), array([1, 2, 0, 2], dtype=int32))
In [20]: in_array[_]
Out[20]: array([2, 3, 4, 6])
It finds the elements of the mask which are true, and then selects the corresponding elements of the in_array.
Maybe the transpose of where is easier to visualize:
In [21]: np.argwhere(mask)
Out[21]:
array([[0, 1],
[0, 2],
[1, 0],
[1, 2]], dtype=int32)
and indexing iteratively:
In [23]: for ij in np.argwhere(mask):
    ...:     print(in_array[tuple(ij)])
    ...:
2
3
4
6
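When every row of the mask keeps the same number of elements, as in the original question, the raveled result can simply be reshaped back to 2d. A minimal sketch under that assumption (the check on kept_per_row is added here for illustration):

```python
import numpy as np

in_array = np.array([[1, 2, 3],
                     [4, 1, 6]])
mask = np.array([[False, True, True],
                 [True, False, True]])

# Assumption: each row of the mask selects the same number of elements;
# otherwise no rectangular 2d result exists.
kept_per_row = mask.sum(axis=1)
assert (kept_per_row == kept_per_row[0]).all()

# Boolean indexing returns the selected values in row-major order,
# so reshaping with one row per input row restores the 2d layout.
result = in_array[mask].reshape(in_array.shape[0], -1)
print(result)
```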

How to reduce multiple arrays to one?

Ignoring my data structure and just given a list of multiple numpy arrays (all arrays have the exact same size/shape):
list[0] = [[0, 2, 0],
[1, 3, 0]]
list[1] = [[0, 0, 0],
[0, 0, 3]]
list[2] = [[5, 0, 0],
[0, 0, 0]]
I want to reduce this list of arrays to only one array. Zero means there is no value. For every entry the only given value should be taken. There are no overlapping values. There is always only one assigned value in one array for a given position.
result: [[5, 2, 0]
[1, 3, 3]]
In my case I have a dictionary with tuples as keys and arrays as values. The arrays are Boolean arrays. Every dict entry represents a special channel and at one specific position only one dict entry has the value True.
I now want to replace all True values by the dictionary keys and reduce this dictionary down to only one array.
For example (near my real data):
dict { (9, 2, 6): [[False, True],
[True, False]]
(1, 5, 8): [[True, False],
[False, True]] }
result: [[(1, 5, 8),(9, 2, 6)]
[(9, 2, 6),(1, 5, 8)]]
How could this be done with list comprehension, a numpy function and/or map & reduce?
First try:
At first I thought I could just turn my numpy arrays into 0 & 1 (.astype(np.float32)) and then just multiply these arrays with my key:
values_filled = [key_tuple * value_array for key_tuple, value_array in dict.items()]
And then to just sum over all arrays:
final = reduce(lambda right, left: right + left, values_filled)
But this obviously doesn't work since my keys are tuples of values and not just numbers.
What I am trying to achieve is the opposite of the following operation:
{color: np.all(mask, axis=-1) for (color, mask) in
((color, segmentation == color) for color in colors) if mask.max()}
With this operation I take a segmented image and create a dictionary with predefined colors. The numpy arrays have True at every position where the color is equal to the color in the image at that position / equal to the key of the dictionary.
Now I want to reduce this dictionary back to an image (there were changes to the dictionary).
The first part of your question just requires an array sum:
In [167]: alist = [[[0, 2, 0],
...: [1, 3, 0]],[[0, 0, 0],
...: [0, 0, 3]],[[5, 0, 0],
...: [0, 0, 0]]]
...:
In [168]: alist
Out[168]: [[[0, 2, 0], [1, 3, 0]], [[0, 0, 0], [0, 0, 3]], [[5, 0, 0], [0, 0, 0]]]
In [169]: np.array(alist).shape
Out[169]: (3, 2, 3)
In [170]: np.array(alist).sum(axis=0)
Out[170]:
array([[5, 2, 0],
[1, 3, 3]])
That takes advantage of the fact that 0 doesn't affect the sum, and there aren't any overlapping values.
You apparently have a second question involving a dictionary of boolean arrays or masks. Assuming that's related to the first question, then you just need a way of translating those masks into the list of arrays (or lists) as given in the first.
Starting with the dictionary, we'll need to iterate over the keys (or items). We can use the same summing. After a little experimentation I decided I,J=np.where(v) was the easiest way of mapping the boolean mask on to the target array:
In [200]: dd={ (9, 2, 6): [[False, True],
...: [True, False]],
...: (1, 5, 8): [[True, False],
...: [False, True]] }
...:
In [201]: arr = np.zeros((2,2,3),int)
In [202]: for k,v in dd.items():
     ...:     I,J = np.where(v)
     ...:     arr[I,J,:] += k
     ...:
In [203]: arr
Out[203]:
array([[[1, 5, 8],
[9, 2, 6]],
[[9, 2, 6],
[1, 5, 8]]])
For the last iteration:
In [204]: k
Out[204]: (1, 5, 8)
In [205]: v
Out[205]: [[True, False], [False, True]]
In [206]: I,J=np.where(v)
In [207]: I,J
Out[207]: (array([0, 1]), array([0, 1]))
In [208]: arr[I,J,:]
Out[208]:
array([[1, 5, 8],
[1, 5, 8]])
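Because the masks never overlap, the same result can also be obtained by assigning through each boolean mask directly instead of using np.where and +=. This is a variation on the loop above, not the method used in this answer, and the helper name masks_to_image is hypothetical:

```python
import numpy as np

def masks_to_image(masks):
    """Hypothetical helper: paint each key tuple where its mask is True."""
    first_key, first_mask = next(iter(masks.items()))
    shape = np.asarray(first_mask).shape
    # One extra trailing axis holds the components of the key tuple.
    image = np.zeros(shape + (len(first_key),), dtype=int)
    for key, mask in masks.items():
        # image[mask] has shape (n_true, len(key)); the key broadcasts over it.
        image[np.asarray(mask, dtype=bool)] = key
    return image

dd = {(9, 2, 6): [[False, True],
                  [True, False]],
      (1, 5, 8): [[True, False],
                  [False, True]]}
image = masks_to_image(dd)
print(image)
```

Positions not covered by any mask stay at zero, which matches the "zero means no value" convention from the question.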
This can be done without anything fancy, just good old for loops and list comprehension, and enumerate. I'm sure there's a better one liner out there or a library that can cover it, but here's a vanilla Python solution :
d = { (9, 2, 6): [[False, True],[True, False]],(1, 5, 8): [[True, False],[False, True]] }
new_list = []
for k,v in d.items():
    if new_list:
        for i, each in enumerate(v):
            x = [k if z else new_list[i][j] for j,z in enumerate(each)]
            new_list[i] = x
    else:
        for each in v:
            new_list.append([k if x else x for x in each])
print(new_list) # [[(1, 5, 8), (9, 2, 6)], [(9, 2, 6), (1, 5, 8)]]
P.S. also thank you for showing your effort.
Another numpy approach, in case you want it to be a normal numpy array afterwards:
import numpy as np
d = {(9, 2, 6): [[False, True],
[True, False]],
(1, 5, 8): [[True, False],
[False, True]]}
# The built-in sum is used because calling np.sum on a generator is deprecated.
x = sum(np.reshape(k, (1, 1, -1)) * np.array(v)[..., None] for k, v in d.items())
# x = sum(np.array(k)[None, None, :] * np.array(v)[..., None] for k, v in d.items()) # Alternative way
print(x)
# array([[[1, 5, 8],
# [9, 2, 6]],
# [[9, 2, 6],
# [1, 5, 8]]])
np.all(x == np.array([[(1, 5, 8),(9, 2, 6)], [(9, 2, 6),(1, 5, 8)]]))
# True
This basically uses the approach you outlined in your question: multiplying the truth mask by the value. I just accounted for the fact that the value's content forms another (third) dimension and used NumPy's broadcasting to achieve this.
Here is a Numpythonic approach:
arr = np.dstack((lst1, lst2, lst3)) # Stack the arrays along a new last (third) axis
mask = arr.astype(bool) # create a mask of the places that are nonzero
mask2 = (~mask).all(axis=-1) # another mask that marks positions where all arrays are zero
mask[mask2] = [True, False, False] # Mark one arbitrary item as True at each all-zero position
result = arr[mask].reshape(lst1.shape) # get the desired items and reshape
Demo:
In [135]: arr = np.dstack((lst1, lst2, lst3))
In [136]: arr
Out[136]:
array([[[0, 0, 5],
[2, 0, 0],
[0, 0, 0]],
[[1, 0, 0],
[3, 0, 0],
[0, 3, 0]]])
In [137]: mask = arr.astype(bool)
In [138]: mask2 = (~mask).all(axis=-1)
In [139]: mask[mask2] = [True, False, False]
In [140]: arr[mask].reshape(lst1.shape)
Out[140]:
array([[5, 2, 0],
[1, 3, 3]])

Use numpy.argwhere to obtain the matching values in an np.array

I'd like to use np.argwhere() to obtain the values in an np.array.
For example:
z = np.arange(9).reshape(3,3)
[[0 1 2]
[3 4 5]
[6 7 8]]
zi = np.argwhere(z % 3 == 0)
[[0 0]
[1 0]
[2 0]]
I want this array: [0, 3, 6] and did this:
t = [z[tuple(i)] for i in zi] # -> [0, 3, 6]
I assume there is an easier way.
Why not simply use masking here:
z[z % 3 == 0]
For your sample matrix, this will generate:
>>> z[z % 3 == 0]
array([0, 3, 6])
If you index with a boolean array of the same shape, you get a 1d array of the elements at the positions where the boolean array is True.
This is also more efficient, since the filtering happens at the NumPy level (whereas a list comprehension runs at the Python interpreter level).
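As a quick sanity check, the boolean mask and the argwhere-based list comprehension from the question can be compared directly. A minimal sketch; the variable names via_mask and via_argwhere are only illustrative:

```python
import numpy as np

z = np.arange(9).reshape(3, 3)
condition = z % 3 == 0

# Boolean masking: a single indexing operation inside NumPy.
via_mask = z[condition]

# The question's approach: one Python-level lookup per matching index.
via_argwhere = np.array([z[tuple(i)] for i in np.argwhere(condition)])

same = np.array_equal(via_mask, via_argwhere)
print(via_mask, same)
```

Both approaches visit the matches in the same row-major order, so the results line up element by element.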
Source for argwhere
def argwhere(a):
    """
    Find the indices of array elements that are non-zero, grouped by element.
    ...
    """
    return transpose(nonzero(a))
np.where, when called with only a condition, is the same as np.nonzero.
In [902]: z=np.arange(9).reshape(3,3)
In [903]: z%3==0
Out[903]:
array([[ True, False, False],
[ True, False, False],
[ True, False, False]], dtype=bool)
In [904]: np.nonzero(z%3==0)
Out[904]: (array([0, 1, 2], dtype=int32), array([0, 0, 0], dtype=int32))
In [905]: np.transpose(np.nonzero(z%3==0))
Out[905]:
array([[0, 0],
[1, 0],
[2, 0]], dtype=int32)
In [906]: z[[0,1,2], [0,0,0]]
Out[906]: array([0, 3, 6])
z[np.nonzero(z%3==0)] is equivalent to using I,J as indexing arrays:
In [907]: I,J =np.nonzero(z%3==0)
In [908]: I
Out[908]: array([0, 1, 2], dtype=int32)
In [909]: J
Out[909]: array([0, 0, 0], dtype=int32)
In [910]: z[I,J]
Out[910]: array([0, 3, 6])

How to get a value from every column in a Numpy matrix

I'd like to get the index of a value for every column in a matrix M. For example:
M = matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In pseudocode, I'd like to do something like this:
for col in M:
    idx = numpy.where(M[col]==0) # Only for columns!
and have idx be 0, 4, 0 for each column.
I have tried to use where, but I don't understand the return value, which is a tuple of matrices.
The tuple of matrices is a collection of items suited for indexing. The output will have the shape of the indexing matrices (or arrays), and each item in the output will be selected from the original array using the first array as the index of the first dimension, the second as the index of the second dimension, and so on. In other words, this:
>>> numpy.where(M == 0)
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
>>> row, col = numpy.where(M == 0)
>>> M[row, col]
matrix([[0, 0, 0]])
>>> M[numpy.where(M == 0)] = 1000
>>> M
matrix([[1000, 1, 1000],
[ 4, 2, 4],
[ 3, 4, 1],
[ 1, 3, 2],
[ 2, 1000, 3]])
The sequence may be what's confusing you. It proceeds in flattened order -- so M[0,2] appears second, not third. If you need to reorder them, you could do this:
>>> row[0,col.argsort()]
matrix([[0, 4, 0]])
You also might be better off using arrays instead of matrices. That way you can manipulate the shape of the arrays, which is often useful! Also note ajcr's transpose-based trick, which is probably preferable to using argsort.
Finally, there is also a nonzero method that does the same thing as where in this case. Using the transpose trick now:
>>> (M == 0).T.nonzero()
(matrix([[0, 1, 2]]), matrix([[0, 4, 0]]))
As an alternative to np.where, you could perhaps use np.argwhere to return an array of indexes where the array meets the condition:
>>> np.argwhere(M == 0)
array([[[0, 0]],
[[0, 2]],
[[4, 1]]])
This gives each index where the condition was met, in the format [row, column].
If you'd prefer the format of this output array to be grouped by column rather than row, (that is, [column, row]), just use the method on the transpose of the array:
>>> np.argwhere(M.T == 0).squeeze()
array([[0, 0],
[1, 4],
[2, 0]])
I also used np.squeeze here to get rid of axis 1, so that we are left with a 2D array. The sequence you want is the second column, i.e. np.argwhere(M.T == 0).squeeze()[:, 1].
The result of where(M == 0) would look something like this:
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
The first matrix tells you the rows where the 0s are, and the second matrix tells you the columns where the 0s are.
Out[4]:
matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In [5]: np.where(M == 0)
Out[5]: (matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
In [6]: M[0,0]
Out[6]: 0
In [7]: M[0,2] #0th row 2nd column
Out[7]: 0
In [8]: M[4,1] #4th row 1st column
Out[8]: 0
This isn't anything new on what's been already suggested, but a one-line solution is:
>>> np.where(np.array(M.T)==0)[-1]
array([0, 4, 0])
(I agree that NumPy matrix objects are more trouble than they're worth).
>>> M = np.array([[0, 1, 0],
... [4, 2, 4],
... [3, 4, 1],
... [1, 3, 2],
... [2, 0, 3]])
>>> [np.where(M[:,i]==0)[0][0] for i in range(M.shape[1])]
[0, 4, 0]
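A loop-free alternative, not taken from the answers above, is to call argmax on the boolean array: along each column, argmax returns the position of the first True. This is a sketch assuming a plain ndarray; note that argmax also returns 0 for a column with no zero at all, so the has_zero check is included to tell those cases apart.

```python
import numpy as np

M = np.array([[0, 1, 0],
              [4, 2, 4],
              [3, 4, 1],
              [1, 3, 2],
              [2, 0, 3]])

is_zero = (M == 0)
# argmax over axis 0 gives, per column, the row of the first True.
first_zero_row = is_zero.argmax(axis=0)
# Guard: a column without any zero would also report 0 above.
has_zero = is_zero.any(axis=0)
print(first_zero_row, has_zero)
```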

Instantiate a matrix with x zeros and the rest ones

I would like to be able to quickly instantiate a matrix where the first few (variable number of) cells in a row are 0, and the rest are ones.
Imagine we want a 4x3 matrix (4 rows, 3 columns).
I have instantiated the matrix first as all ones:
ones = np.ones([4,3])
Then imagine we have an array that announces how many leading zeros there are:
arr = np.array([2,1,3,0]) # first row has 2 zeroes, second row 1 zero, etc
Required result:
array([[0, 0, 1],
[0, 1, 1],
[0, 0, 0],
[1, 1, 1]])
Obviously this can be done in the opposite way as well, but I'd consider the approach where 1 is a default value, and zeros would be replaced.
What would be the best way to avoid some silly loop?
Here's one way. n is the number of columns in the result. The number of rows is determined by len(arr).
In [29]: n = 5
In [30]: arr = np.array([1, 2, 3, 0, 3])
In [31]: (np.arange(n) >= arr[:, np.newaxis]).astype(int)
Out[31]:
array([[0, 1, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 0, 1, 1],
[1, 1, 1, 1, 1],
[0, 0, 0, 1, 1]])
There are two parts to the explanation of how this works. First, how do we create a row with m zeros and n-m ones? For that, we use np.arange to create a row with values [0, 1, ..., n-1]:
In [35]: n
Out[35]: 5
In [36]: np.arange(n)
Out[36]: array([0, 1, 2, 3, 4])
Next, compare that array to m:
In [37]: m = 2
In [38]: np.arange(n) >= m
Out[38]: array([False, False, True, True, True], dtype=bool)
That gives an array of boolean values; the first m values are False and the rest are True. By casting those values to integers, we get an array of 0s and 1s:
In [39]: (np.arange(n) >= m).astype(int)
Out[39]: array([0, 0, 1, 1, 1])
To perform this over an array of m values (your arr), we use broadcasting; this is the second key idea of the explanation.
Note what arr[:, np.newaxis] gives:
In [40]: arr
Out[40]: array([1, 2, 3, 0, 3])
In [41]: arr[:, np.newaxis]
Out[41]:
array([[1],
[2],
[3],
[0],
[3]])
That is, arr[:, np.newaxis] reshapes arr into a 2-d array with shape (5, 1). (arr.reshape(-1, 1) could have been used instead.) Now when we compare this to np.arange(n) (a 1-d array with length n), broadcasting kicks in:
In [42]: np.arange(n) >= arr[:, np.newaxis]
Out[42]:
array([[False, True, True, True, True],
[False, False, True, True, True],
[False, False, False, True, True],
[ True, True, True, True, True],
[False, False, False, True, True]], dtype=bool)
As @RogerFan points out in his comment, this is basically an outer product of the arguments, using the >= operation.
A final cast to type int gives the desired result:
In [43]: (np.arange(n) >= arr[:, np.newaxis]).astype(int)
Out[43]:
array([[0, 1, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 0, 1, 1],
[1, 1, 1, 1, 1],
[0, 0, 0, 1, 1]])
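The question preferred starting from an all-ones matrix and replacing the leading cells with zeros. The same broadcast comparison can serve as an assignment mask for that. A minimal sketch using the question's data; the names n_cols and result are chosen here for illustration:

```python
import numpy as np

n_cols = 3
arr = np.array([2, 1, 3, 0])  # number of leading zeros in each row

# Start with ones as the default value, as described in the question.
result = np.ones((len(arr), n_cols), dtype=int)

# Column index < leading-zero count marks the cells to overwrite.
result[np.arange(n_cols) < arr[:, np.newaxis]] = 0
print(result)
```

This modifies the ones array in place, which may be convenient if the matrix already exists for other reasons.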
Not as concise as I wanted (I was experimenting with mask_indices), but this will also do the work:
>>> n = 3
>>> zeros = [2, 1, 3, 0]
>>> numpy.array([[0] * zeros[i] + [1]*(n - zeros[i]) for i in range(len(zeros))])
array([[0, 0, 1],
[0, 1, 1],
[0, 0, 0],
[1, 1, 1]])
>>>
It works simply: for each row, it repeats the one-element lists [0] and [1] the required number of times and concatenates them, building the array row by row.
