Finding median of masked ndarrays representing images - python

I have 5 grayscale images in the form of 288x288 ndarrays. The values in each ndarray are just numpy.float32 numbers ranging from 0.0 to 255.0. For each ndarray, I've created a numpy.ma.MaskedArray object as follows:
def bool_row(row):
    return [value == 183. for value in row]
mask = [bool_row(row) for row in nd_array_1]
masked_array_1 = ma.masked_array(nd_array_1, mask=mask)
The value 183. represents "garbage" in the image. All 5 images have a bit of "garbage" in them. I want to take the median of the masked images, where taking the median for each point should ignore any masked values. The result would be the correct image with no garbage.
When I try:
ma.median([masked_array_1, masked_array_2, masked_array_3, masked_array_4, masked_array_5], axis=0)
I get what seems to be the median except instead of ignoring masked values, it treats them as 183., so the result just has the superimposed garbage from all the pictures. When I just take the median of two masked images:
ma.median([masked_array_1, masked_array_2], axis=0)
It looks like it started to do the right thing, but then placed the value of 183. even where both masked arrays contain a MaskedConstant.
I could do something like the following, but I feel there's probably a way to make ma.median just behave as expected:
unmasked_array_12 = ma.median([masked_array_1, masked_array_2], axis=0)
mask = [bool_row(row) for row in unmasked_array_12]
masked_array_12 = ma.masked_array(unmasked_array_12, mask=mask)
unmasked_array_123 = ma.median([masked_array_12, masked_array_3], axis=0)
mask = [bool_row(row) for row in unmasked_array_123]
masked_array_123 = ma.masked_array(unmasked_array_123, mask=mask)
...
How do I make ma.median work as expected without resorting to the above unpleasantness?

I suspect the problem is in how ma.median handles a non-array argument. It might be converting a list to a plain numpy array, without checking the types of the elements of the list.
Consider the following example with 1-D arrays:
In [64]: a = ma.array([1, 2, -10, 3, -10, -10], mask=[0,0,1,0,1,1])
In [65]: b = ma.array([1, 2, -10, -10, 4, -10], mask=[0,0,1,1,0,1])
In [66]: a
Out[66]:
masked_array(data = [1 2 -- 3 -- --],
mask = [False False True False True True],
fill_value = 999999)
In [67]: b
Out[67]:
masked_array(data = [1 2 -- -- 4 --],
mask = [False False True True False True],
fill_value = 999999)
The following are not correct--it appears to ignore the masks:
In [68]: ma.median([a, b])
Out[68]: -4.5
In [69]: ma.median([a, b], axis=0)
Out[69]:
masked_array(data = [ 1. 2. -10. -3.5 -3. -10. ],
mask = False,
fill_value = 1e+20)
However, if I first create a new masked array using ma.array, ma.median handles it correctly:
In [70]: c = ma.array([a, b])
In [71]: c
Out[71]:
masked_array(data =
[[1 2 -- 3 -- --]
[1 2 -- -- 4 --]],
mask =
[[False False True False True True]
[False False True True False True]],
fill_value = 999999)
In [72]: ma.median(c)
Out[72]: 2.0
In [73]: ma.median(c, axis=0)
Out[73]:
masked_array(data = [1.0 2.0 -- 3.0 4.0 --],
mask = [False False True False False True],
fill_value = 1e+20)
So to fix your problem, it might be as simple as replacing this:
ma.median([masked_array_1, masked_array_2, masked_array_3, masked_array_4, masked_array_5], axis=0)
with this:
stacked = ma.array([masked_array_1, masked_array_2, masked_array_3, masked_array_4, masked_array_5])
ma.median(stacked, axis=0)
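To make the fix concrete, here is a minimal end-to-end sketch on tiny 2x2 "images". The sample arrays and the names imgs, masked and result are made up for illustration, and ma.masked_equal is used in place of the question's hand-built mask on the assumption that 183. is the only garbage value:

```python
import numpy as np
import numpy.ma as ma

# Hypothetical 2x2 stand-ins for the 288x288 images; 183. marks garbage.
imgs = [
    np.array([[10., 183.], [30., 40.]], dtype=np.float32),
    np.array([[10., 20.], [183., 40.]], dtype=np.float32),
    np.array([[183., 20.], [30., 183.]], dtype=np.float32),
]

# Mask every pixel equal to the garbage value.
masked = [ma.masked_equal(img, 183.0) for img in imgs]

# Build one 3-D masked array first, so the per-image masks are kept.
stacked = ma.array(masked)

# Median across the image axis, ignoring masked pixels.
result = ma.median(stacked, axis=0)
print(result)
```

Each pixel here is garbage in exactly one image, so the median is taken over the two remaining values.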

You can use the following to drop all of the 183 values while calculating the median (note the != comparison; "is not" tests object identity, not value):
masked_arrays=[masked_array_1, masked_array_2, masked_array_3]
no_junk_arrays=[[x for x in masked_array if x != 183] for masked_array in masked_arrays]
ma.median(no_junk_arrays)
For example
>>> masked_array_1 = [1,183,4]
>>> masked_array_2 = [1,183,2]
>>> masked_array_3 = [2,183,2]
>>> masked_arrays=[masked_array_1,masked_array_2,masked_array_3]
>>> no_junk_arrays=[[x for x in masked_array if x != 183] for masked_array in masked_arrays]
>>> no_junk_arrays
[[1, 4], [1, 2], [2, 2]]

I'm sure it can be done if you find the clever sequence of numpy functions to invoke. But it can also be done naively:
def merge(a1, a2):
    result = []
    for x, y in zip(a1, a2):
        # fall back to the second array wherever the first holds garbage
        if x == 183:
            x = y
        result.append(x)
    return result
array_1 = [1, 183, 2]
array_2 = [1, 183, 183]
array_3 = [183, 4, 2]
print(merge(merge(array_1, array_2), array_3))
If the result runs really too slowly, you can try the same code on PyPy instead of CPython.
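For what it's worth, the same merge can probably be written with np.where, which avoids the Python-level loop. This is only a sketch of the idea above, assuming 183 is the sole garbage value; GARBAGE and merged are names chosen here for illustration:

```python
from functools import reduce

import numpy as np

GARBAGE = 183  # sentinel value taken from the question


def merge(a1, a2):
    """Keep a1 where it is valid, otherwise fall back to a2."""
    a1 = np.asarray(a1)
    a2 = np.asarray(a2)
    return np.where(a1 == GARBAGE, a2, a1)


arrays = [[1, 183, 2], [1, 183, 183], [183, 4, 2]]

# Fold the arrays left to right, exactly like merge(merge(a, b), c).
merged = reduce(merge, arrays)
print(merged)
```

Like the loop version, this picks the first valid value per position rather than a median.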

If what you are after is fetching the non-183 value for every pixel, you could do something along the lines of:
stacked_imgs = np.dstack((img1, img2, img3))
mask = stacked_imgs == 183
# Find the first False, i.e. non-183 entry, along stack axis
index = np.argmin(mask, axis=-1)
# For each pixel, pick the layer selected by index
correct_image = np.take_along_axis(stacked_imgs, index[..., None], axis=-1)[..., 0]
If all non-183 entries for a given pixel are always the same, this will give you the result you are after.

Related

Using np.where for nested lists

I am trying to use np.where() function with nested lists.
I would like to find an index with a given condition of the first layer of the nested list.
For example, if I have the following code
arr = [[1,1], [2,2],[3,3]]
a = np.where(arr == [2,2])
then ideally I would like code to return 'a' as 1.
Since [2,2] is in index 1 of the nested list.
However, I am just getting an empty array back as a result.
Of course, I can make it work easily by implementing external for loop such as
for n in range(len(arr)):
    if arr[n] == [2,2]:
        a = n
but I would like to implement this simply within the function np.where(write the entire code here).
Is there a way to do this?
Well, you can write your own function to do so. You'll need to:
Find every row equal to what you are looking for
Get the indices of the found rows (you can use where)
numpy comparison
You can use a comparison operator to see whether each row satisfies the condition. Such as:
np_arr = np.array(
[1, 2, 3, 4, 5]
)
print(np_arr < 3)
This will return a boolean array in which each element is True where the condition is satisfied and False otherwise:
[ True True False False False]
For a 2D array you'll get a 2D boolean array:
to_find = np.array([2, 2])
np_arr = np.array(
[
[1, 1],
[2, 2],
[3, 3],
[2, 2]
]
)
print(np_arr == to_find)
The result is:
[[False False]
[ True True]
[False False]
[ True True]]
Now we are looking for rows in which all values are True, so we can use the all method of ndarray. It takes the axis along which to test: axis=0, axis=1, or none for the whole array. We want to test within each row, which is axis=1:
to_find = np.array([2, 2])
np_arr = np.array(
[
[1, 1],
[2, 2],
[3, 3],
[2, 2]
]
)
print((np_arr == to_find).all(axis=1))
The result is:
[False True False True]
Get indices of Trues
At the end you are looking for indices where the values are True:
np.where((np_arr == to_find).all(axis=1))
The result would be:
(array([1, 3]),)
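Putting the steps above together, a small helper might look like the sketch below. The name find_rows is hypothetical, and it assumes a rectangular nested list whose rows have the same length as the row being searched for:

```python
import numpy as np


def find_rows(arr, row):
    """Return the indices of the rows of arr that equal row."""
    arr = np.asarray(arr)
    row = np.asarray(row)
    # Compare element-wise, then require every column of a row to match.
    return np.where((arr == row).all(axis=1))[0]


arr = [[1, 1], [2, 2], [3, 3], [2, 2]]
matches = find_rows(arr, [2, 2])
print(matches)
```

If no row matches, the result is an empty index array rather than an error, which differs from list.index.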
The best solution is that mentioned by @Michael Szczesny, but using np.where you can do this too:
a = np.where(np.array(arr) == [2, 2])[0]
resulted_ind = np.where(np.bincount(a) == 2)[0] # --> [1]
numpy runs in Python, so you can use both the basic Python lists and numpy arrays (which are more like MATLAB matrices)
A list of lists:
In [43]: alist = [[1,1], [2,2],[3,3]]
A list has an index method, which tests against each element of the list (elements here are 2 element lists):
In [44]: alist.index([2,2])
Out[44]: 1
In [45]: alist.index([2,3])
Traceback (most recent call last):
Input In [45] in <cell line: 1>
alist.index([2,3])
ValueError: [2, 3] is not in list
alist==[2,2] returns False, because the list is not the same as the [2,2] list.
If we make an array from that list:
In [46]: arr = np.array(alist)
In [47]: arr
Out[47]:
array([[1, 1],
[2, 2],
[3, 3]])
we can do an == test - but it compares numeric elements.
In [48]: arr == np.array([2,2])
Out[48]:
array([[False, False],
[ True, True],
[False, False]])
Underlying this comparison is the concept of broadcasting, which allows it to compare a (3,2) array with a (2,) array (a 2d with a 1d). Here it's trivial, but it can be much more complicated.
To find rows where all values are True, use:
In [50]: (arr == np.array([2,2])).all(axis=1)
Out[50]: array([False, True, False])
and where finds the True in that array (the result is a tuple with 1 array):
In [51]: np.where(_)
Out[51]: (array([1]),)
In Octave the equivalent is:
>> arr = [[1,1];[2,2];[3,3]]
arr =
1 1
2 2
3 3
>> all(arr == [2,2],2)
ans =
0
1
0
>> find(all(arr == [2,2],2))
ans = 2

Create a mask from labels to compute loss with numpy

I'm having trouble creating a mask without using a for loop.
I've got a numpy array of size N with my labels and I want to create a mask of size NxN where mask[i, j] = True if and only if y[i] == y[j].
I've managed to do so by using a for loop :
mask = np.asarray([np.where(y==y[k], 1, 0) for k in range(len(y))])
But I'm working on a GPU and this greatly increases the compute time. How can I do it without looping?
This might get you started:
n = 3
a = np.arange(n)
np.equal.outer(a, a)
# this is the same as
a[:,None] == a
Output:
array([[ True, False, False],
[False, True, False],
[False, False, True]])
This is basically comparing the elements from a cartesian product. a[0] == a[1], a[1] == a[1], a[1] == a[2] and so forth, which is why the diagonal values are True when using np.arange.
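Applied to actual labels, the same broadcasting trick could look like this. The label values in y are invented for illustration, and the loop version from the question is included only to check that both give the same mask:

```python
import numpy as np

# Hypothetical label vector of size N = 5.
y = np.array([0, 1, 0, 2, 1])

# Compare a column vector against a row vector: mask[i, j] is True
# if and only if y[i] == y[j].
mask = y[:, None] == y[None, :]

# Loop-based version from the question, for comparison.
loop_mask = np.asarray(
    [np.where(y == y[k], 1, 0) for k in range(len(y))]
).astype(bool)

print(mask)
print((mask == loop_mask).all())
```

The result has shape (N, N) and is symmetric, with True on the diagonal.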
You can use np.repeat and .T
a and b are just arbitrary data - the labels in your case.
import numpy as np
size = 4
a = np.arange(size)[:, None]
b = a.T
b[0, 2] = 1
c = np.repeat(a.T, repeats=size, axis=0)
d = np.repeat(b, repeats=size, axis=0).T
print(c)
print(d)
e = np.equal(c, d)
print(e)
out:
[[0 1 1 3]
[0 1 1 3]
[0 1 1 3]
[0 1 1 3]]
[[0 0 0 0]
[1 1 1 1]
[1 1 1 1]
[3 3 3 3]]
[[ True False False False]
[False True True False]
[False True True False]
[False False False True]]
For problems like these, np.indices is your friend:
dims = (len(y), len(y))
inds = np.indices(dims)
# inds[0] holds the row index and inds[1] the column index of each cell
mask = y[inds[0]] == y[inds[1]]
edit:
Kevin's more specific solution is more concise and almost certainly faster than this method.

Numpy matrix comparison to several criteria

I'm working on comparing values in a numpy matrix.
Initially I wanted to check if any of the values in the matrix m were smaller than X, so I used:
(m<(X)).any()
Which worked fine, but now I would like it to ignore all 0 values in the matrix, so in essence to tell me if any values in the matrix m are in that range 0 < m < X.
I've figured out a way to do this with a while loop, but was hoping that there might be a similar function to the one above that could do the trick?
Many Thanks
Much like here, you can do
np.where(np.logical_and(0<a,a<6))
And it will give you two arrays, which tell you the locations in your matrix.
(array([0, 0, 1, 1, 1], dtype=int32),
array([1, 2, 0, 1, 2], dtype=int32))
Unlike the above, you have an n-dimensional array, and the output of that may not be as useful as using a masked array
b=np.ma.masked_where(np.logical_or(a<=0,a>=6),a)
b
Out[40]:
masked_array(data =
[[-- 1 2]
[3 4 5]
[-- -- --]],
mask =
[[ True False False]
[False False False]
[ True True True]],
fill_value = 999999)
Since that can give you a more useful array that preserves location.
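If all you need is the yes/no answer from the question (is any value strictly between 0 and X?), combining two comparisons before calling any() should be enough. This is a sketch; any_in_open_range and the sample matrices are hypothetical:

```python
import numpy as np


def any_in_open_range(m, upper):
    """True if any element of m satisfies 0 < element < upper."""
    m = np.asarray(m)
    # & combines the two boolean arrays element-wise.
    return bool(((m > 0) & (m < upper)).any())


m = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
zeros_and_large = np.array([[0, 0], [7, 9]])

print(any_in_open_range(m, 6))
print(any_in_open_range(zeros_and_large, 6))
```

The parentheses around each comparison matter, because & binds more tightly than < and >.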

numpy array slicing to avoid for loop

I am using numpy to do some calculations. In the following code:
assert(len(A.shape) == 2) # A is a 2D ndarray
d1, d2 = A.shape
# initialize G with the same dimensions as A, filled with zeros
G = np.zeros_like(A)
# assign the last column of A to the last column of G
G[:, d2-1] = A[:, d2-1]
# columns 0 .. d2-2 of G hold the average of the same columns of A, subject to the condition on B
for iW in range(d2-1):
    n = 0
    sum = 0.0
    for i in range(d1):
        if B[i, 0] != iW and B[i, 1] == 0:
            sum += A[i, iW]
            n += 1
    for i in range(d1):
        if B[i, 0] != iW and B[i, 1] == 0:
            G[i, iW] = sum / (1.0 * n)
return G
Is there an easier way using "slicing" or "boolean array"?
Thanks!
In case you want G to have the same dimensionality as A and then change the appropriate elements of G, the following code should work:
# create G as a copy of A, otherwise you might change A by changing G
G = A.copy()
# getting the mask for all columns except the last one
m = (B[:,0][:,None] != np.arange(d2-1)[None,:]) & (B[:,1]==0)[:,None]
# getting a matrix with those elements of A which fulfills the conditions
C = np.where(m,A[:,:d2-1],0).astype(float)
# get the 'modified' average you use
avg = np.sum(C,axis=0)/np.sum(m.astype(int),axis=0)
# change the appropriate elements in all the columns except the last one
G[:,:-1] = np.where(m,avg,A[:,:d2-1])
After fiddling a long time and finding bugs ... I ended up with this code. I checked it against several random matrices A and specific choices of B
A = np.random.randint(100,size=(5,10))
B = np.column_stack(([4,2,1,3,4],np.zeros(5)))
and so far your and my result were in agreement.
Here's a start, focusing on the first inner loop:
In [35]: A=np.arange(12).reshape(3,4)
In [36]: B=np.array([[0,0],[1,0],[2,0]])
In [37]: sum=0
In [38]: for i in range(3):
   ....:     if B[i,0]!=iW and B[i,1]==0:
   ....:         sum += A[i,iW]
   ....:         print(i,A[i,iW])
   ....:
1 4
2 8
In [39]: A[(B[:,0]!=iW)&(B[:,1]==0),iW].sum()
Out[39]: 12
I had to provide my own sample data to test this.
The 2nd loop has the same condition (B[:,0]!=iW)&(B[:,1]==0), and should work in the same way.
As one of the comments said, the dimensions of G look funny. To make things work with my sample, let's make it a zeros array. It looks like you are assigning the mean of a subset of A (sum/n) to selected elements of G.
In [52]: G=np.zeros_like(A)
In [53]: G[I,iW]=A[I,iW].mean()
Assuming n, the number of terms summed for each iW, varies, it may be difficult to compress the outer loop into a vectorized step. If n were the same, you could pull out the subset of A that matches the condition, e.g. A1, take the mean on one axis, and assign the values to G. With different numbers of terms in the sums, you still have to loop.
It just occurred to me that masked arrays might work. Mask off the terms of A that don't meet the condition, and then take the mean.
In [91]: I=(B[:,[0]]!=np.arange(4))&(B[:,[1]]==0)
In [92]: I
Out[92]:
array([[False, True, True, True],
[ True, False, True, True],
[ True, True, False, True]], dtype=bool)
In [93]: A1=np.ma.masked_array(A, ~I)
In [94]: A1
Out[94]:
masked_array(data =
[[-- 1 2 3]
[4 -- 6 7]
[8 9 -- 11]],
mask =
[[ True False False False]
[False True False False]
[False False True False]],
fill_value = 999999)
In [95]: A1.mean(0)
Out[95]:
masked_array(data = [6.0 5.0 4.0 7.0],
mask = [False False False False],
fill_value = 1e+20)
Or with plonser's where:
In [111]: np.where(I,A,0).sum(0)/I.sum(0)
Out[111]: array([ 6., 5., 4., 7.])
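Pulling the pieces together, here is a sketch of a complete masked-array version, checked against a straightforward loop on the sample data above. The function names are hypothetical, and G is created as a float array (an assumption, so the means are not truncated when A holds integers):

```python
import numpy as np


def column_fill_loop(A, B):
    """Reference version: loop over the columns except the last."""
    d1, d2 = A.shape
    G = np.zeros_like(A, dtype=float)
    G[:, -1] = A[:, -1]
    for iW in range(d2 - 1):
        rows = (B[:, 0] != iW) & (B[:, 1] == 0)
        if rows.any():
            G[rows, iW] = A[rows, iW].mean()
    return G


def column_fill_masked(A, B):
    """Same result without the explicit loop, using a masked mean."""
    d1, d2 = A.shape
    cols = np.arange(d2 - 1)
    # keep[i, iW] is True where row i contributes to column iW.
    keep = (B[:, [0]] != cols) & (B[:, [1]] == 0)
    # Mask out the non-contributing entries, then average each column.
    means = np.ma.masked_array(A[:, :-1], mask=~keep).mean(axis=0)
    G = np.zeros_like(A, dtype=float)
    G[:, -1] = A[:, -1]
    # Columns with no contributing rows come back masked; fill with 0.
    G[:, :-1] = np.where(keep, means.filled(0.0), 0.0)
    return G


A = np.arange(12).reshape(3, 4)
B = np.array([[0, 0], [1, 0], [2, 0]])
print(column_fill_masked(A, B))
```

Entries that do not meet the condition stay at 0 in both versions, matching the zero-initialised G in the question.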

Optimised call to retrieve indices of the masked elements of a masked array?

I have a masked array:
a = np.arange(7)
a = np.ma.masked_greater(a,4)
a then contains
masked_array(data = [0 1 2 3 4 -- --],
mask = [False False False False False True True],
fill_value = 999999)
What I'm looking for now is an efficient way to retrieve an array that lists the index of each masked element, i.e.
res = [5, 6]
without looping through the mask like so:
res = []
for idx, data in enumerate(np.ma.getmaskarray(a)):
    if data:
        res.append(idx)
>>> a
masked_array(data = [0 1 2 3 4 -- --],
mask = [False False False False False True True],
fill_value = 999999)
>>> np.where(np.ma.getmaskarray(a))
(array([5, 6]),)
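For a 1-D array, np.flatnonzero returns the indices directly as a plain array instead of a one-element tuple, which may be more convenient. A small sketch, with masked_idx and unmasked_idx as illustrative names:

```python
import numpy as np

a = np.ma.masked_greater(np.arange(7), 4)

# getmaskarray always returns a full boolean array, even if nothing is masked.
mask = np.ma.getmaskarray(a)

masked_idx = np.flatnonzero(mask)      # indices of masked elements
unmasked_idx = np.flatnonzero(~mask)   # indices of valid elements

print(masked_idx)
print(unmasked_idx)
```

For arrays with more than one dimension, flatnonzero gives indices into the flattened array, so np.where or np.nonzero is likely the better fit there.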
