Basic NumPy data comparison - python

I have an array of N-dimensional values arranged in a 2D array. Something like:
import numpy as np
data = np.array([[[1,2],[3,4]],[[5,6],[1,2]]])
I also have a single value x that I want to compare against each data point, and I want to get a 2D array of boolean values showing whether my data is equal to x.
x = np.array([1,2])
If I do:
data == x
I get
# array([[[ True,  True],
#         [False, False]],
#
#        [[False, False],
#         [ True,  True]]], dtype=bool)
I could easily combine these to get the result I want. However, I don't want to iterate over each of these slices, especially when data.shape[2] is larger. What I am looking for is a direct way of getting:
array([[ True, False],
       [False,  True]])
Any ideas for this seemingly easy task?

Well, (data == x).all(axis=-1) gives you what you want. It's still constructing a 3-d array of results and iterating over it, but at least that iteration isn't at Python-level, so it should be reasonably fast.
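A minimal sketch of that suggestion, using the arrays from the question:

```python
import numpy as np

data = np.array([[[1, 2], [3, 4]], [[5, 6], [1, 2]]])
x = np.array([1, 2])

# Broadcasting compares x against every innermost pair, then
# all(axis=-1) collapses the last axis: True only where the
# whole N-dimensional value matches.
result = (data == x).all(axis=-1)
print(result)
# [[ True False]
#  [False  True]]
```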

Related

Numpy: chained boolean indexing not properly updating boolean array without using np.where

I'm writing a simple root finder in NumPy that is designed to operate on entire NumPy arrays simultaneously. The basic idea of the solver is that the array not_converged is the size of all the data points, while newly_converged is a different array that is the size of the subset of not_converged that is True in that iteration.
I'd like to update the values in the overall not_converged array to be False when entries in the newly_converged array become True, but doing so using Boolean indexing doesn't seem to work as I'd expect:
not_converged = np.array([True, False, True])
# A previously not converged point has now converged
newly_converged = np.array([True, False])
# Now update the converged state
not_converged[not_converged][newly_converged] = False
print(not_converged) # should be [False False True] but the array hasn't been updated
I can get this to work if I use np.where though:
not_converged = np.array([True, False, True])
# A previously not converged point has now converged
newly_converged = np.array([True, False])
# Now update the converged state
not_converged[not_converged] = np.where(newly_converged, False, True)
print(not_converged) # Correctly updates the state
But I would have thought that the two methods above would yield identical results. Any idea why only the second method works?
The np.where() version works because it performs a single boolean-indexed assignment; it is equivalent to the indexing below. Note that not_converged[not_converged][newly_converged] is chained indexing: the first boolean index returns a copy of not_converged, not a view into it, so the assignment only modifies that temporary copy and nothing in the original should change.
not_converged = np.array([True, False, True])
# A previously not converged point has now converged
newly_converged = np.array([True, False])
# Now update the converged state
not_converged[not_converged] = np.logical_not(newly_converged)
print(not_converged) # [False False True]: the state is correctly updated
This question is similar to one asked before; hpaulj's comments (1, 2) explain the reason well. You can do this as:
sample = not_converged[not_converged]
sample[newly_converged] = False
not_converged[not_converged] = sample
or, more simply, use ~ to invert newly_converged instead of np.logical_not:
not_converged[not_converged] = ~newly_converged
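To make the copy-versus-view distinction concrete, here is a small sketch contrasting the two assignment patterns from the question:

```python
import numpy as np

not_converged = np.array([True, False, True])
newly_converged = np.array([True, False])

# Chained indexing: the first boolean index already returns a copy,
# so the assignment below mutates that temporary copy only.
temp = not_converged[not_converged]   # copy, not a view
temp[newly_converged] = False
print(not_converged)                  # unchanged: [ True False  True]

# A single boolean-indexed assignment calls __setitem__ on the
# original array, so it writes through to not_converged.
not_converged[not_converged] = ~newly_converged
print(not_converged)                  # [False False  True]
```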

Subtracting image ellipses from each other using numpy masks

I am trying to calculate the area within a specific region by using ellipse-shaped masks and taking the mean of the values inside each mask. Like this:
This is the original eye image:
What I want to do is calculate the areas of the sclera and iris separately. My approach is to generate three masks: one for just the iris, a second for the entire eye, and a third for the sclera, obtained by subtracting the iris mask from the entire-eye mask.
The problem is that my mask function returns boolean values. This is what I was trying to do:
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

img = Image.open(r'imgpath')

h_1 = 16
k_1 = 31
a_1 = 7
b_1 = 17

# ellipse membership test
def _in_ellipse(x, y, h, k, a, b):
    z = ((x - h)**2) / a**2 + ((y - k)**2) / b**2
    return z < 1

in_ellipse = np.vectorize(_in_ellipse)

img = np.asarray(img)
mask = in_ellipse(*np.indices(img.shape), h_1, k_1, a_1, b_1)

# Visualize the mask size
plt.imshow(mask)
plt.show()

# See if it's inside the boundaries
plt.imshow(np.where(mask, img, np.nan))
plt.show()

mask_mean = np.nanmean(np.where(mask, img, np.nan))
Before calculating the mean values, I want to get the mean value of the sclera alone. My attempt was to subtract the two areas, but the ellipse function does not return pixel values as I expected; it returns boolean values:
mask:
array([[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]])
From what I understand, to generate the sclera_mask from eye_mask and iris_mask, both of which are of boolean type, your idea of subtraction translates to a logical, element-wise XOR operation on the two masks:
sclera_mask = np.logical_xor(eye_mask, iris_mask)
More on this in the docs: numpy.logical_xor
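A small self-contained sketch of the idea, using made-up rectangular masks as stand-ins for the fitted ellipses:

```python
import numpy as np

# Hypothetical 5x5 masks standing in for the fitted ellipses.
eye_mask = np.zeros((5, 5), dtype=bool)
eye_mask[1:4, 1:4] = True        # whole eye region
iris_mask = np.zeros((5, 5), dtype=bool)
iris_mask[2, 2] = True           # iris sits inside the eye

# XOR keeps pixels that are in exactly one mask; since the iris
# lies inside the eye, this is eye minus iris: the sclera.
sclera_mask = np.logical_xor(eye_mask, iris_mask)

# Mean intensity over just the sclera pixels.
img = np.arange(25, dtype=float).reshape(5, 5)
sclera_mean = img[sclera_mask].mean()
```

Because the iris mask is contained in the eye mask, `eye_mask & ~iris_mask` gives the same result and may read more directly as "eye but not iris".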

Logical comparison of a symmetric matrix and its transpose

I was trying to test numerically if the multiplication of a matrix and its transpose really generates a symmetric square matrix.
Below is the code I used:
mat = np.array([[1,2,3],[1,0,1],[1,1,1],[2,3,5]])
mat2 = np.linalg.inv(np.matmul(np.transpose(mat),mat))
mat2
array([[ 1.42857143,  0.42857143, -0.85714286],
       [ 0.42857143,  1.92857143, -1.35714286],
       [-0.85714286, -1.35714286,  1.21428571]])
mat2 appears to be symmetric.
However, the result of the below code made me confused:
np.transpose(mat2) == mat2
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True]])
But when I did the same procedure with mat, the result was as I expected:
np.transpose(np.matmul(np.transpose(mat),mat)) == np.matmul(np.transpose(mat),mat)
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
Is this related to a computational issue? If so, how can I show that the off-diagonal elements are identical?
Comparing mat and mat.T, you're comparing integers to integers and there's no problem.
mat2 is floating point, which is prone to subtle errors. When you print out mat2, you're seeing the truncated version of the full digits. Look at the difference between mat2 and mat2.T:
>>> mat2 - mat2.T
array([[ 0.00000000e+00,  1.11022302e-16, -1.11022302e-16],
       [-1.11022302e-16,  0.00000000e+00,  2.22044605e-16],
       [ 1.11022302e-16, -2.22044605e-16,  0.00000000e+00]])
The differences are on the order of 0.0000000000000001 (1e-16), meaning that the matrices are equal "for all intents and purposes" but not exactly equal. There are two ways to go from here. You can either accept that numerical precision is limited and use something like numpy.allclose for your equality tests, which allows for some small error:
>>> np.allclose(mat2, mat2.T)
True
Or, if you really insist on your matrices being symmetric, you can enforce it with something like this:
>>> mat3 = (mat2 + mat2.T)/2
>>> mat3 == mat3.T
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
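The whole answer condensed into one runnable sketch:

```python
import numpy as np

mat = np.array([[1, 2, 3], [1, 0, 1], [1, 1, 1], [2, 3, 5]])
mat2 = np.linalg.inv(mat.T @ mat)

# Exact comparison typically fails: off-diagonal pairs
# differ by rounding errors on the order of 1e-16.
print((mat2 == mat2.T).all())

# Tolerance-based comparison treats them as equal.
print(np.allclose(mat2, mat2.T))   # True

# Averaging with the transpose enforces exact symmetry,
# because floating-point addition is commutative.
mat3 = (mat2 + mat2.T) / 2
print((mat3 == mat3.T).all())      # True
```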

Scipy NNLS using mask

I am performing non-negative least squares using scipy. A trivial example would be as follows:
import numpy as np
from scipy.optimize import nnls
A = np.array([[60, 70, 120, 60],[60, 90, 120, 70]], dtype='float32')
b = np.array([6, 5])
x, res = nnls(A, b)
Now, I have a situation where some entries in A or b can be missing (np.NaN). Something like,
A_2 = A.copy()
A_2[0,2] = np.NaN
Of course, running NNLS on A_2, b will not work, as scipy does not accept an inf or NaN.
How can we perform NNLS while masking the missing entries out of the computation? Effectively, this should translate to
Minimize |(A_2 @ x - b)[mask]|
where mask can be defined as:
mask = ~np.isnan(A_2)
In general, entries can be missing from both A and b.
Possibly helpful:
[1] How to include constraint to Scipy NNLS function solution so that it sums to 1
I think you can compute the mask first (determine which points you want included) and then perform NNLS. Given the mask
In []: mask
Out[]:
array([[ True,  True, False,  True],
       [ True,  True,  True,  True]], dtype=bool)
you can verify whether to include a point by checking if all values in a column are True using np.all along the first axis.
In []: np.all(mask, axis=0)
Out[]: array([ True, True, False, True], dtype=bool)
This can then be used as a column mask for A.
In []: nnls(A_2[:,np.all(mask, axis=0)], b)
Out[]: (array([ 0.09166667,  0.        ,  0.        ]), 0.7071067811865482)
The same idea can be used for b to construct a row mask.
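Putting the column-masking idea together into one sketch (here only A_2 has NaNs; columns dropped from the fit are assigned zero in the full-length solution):

```python
import numpy as np
from scipy.optimize import nnls

A = np.array([[60, 70, 120, 60], [60, 90, 120, 70]], dtype='float64')
b = np.array([6.0, 5.0])

A_2 = A.copy()
A_2[0, 2] = np.nan

# Keep only the columns with no missing entries.
col_mask = np.all(~np.isnan(A_2), axis=0)
x_sub, res = nnls(A_2[:, col_mask], b)

# Expand back to the full-length solution; dropped columns get 0.
x = np.zeros(A.shape[1])
x[col_mask] = x_sub
print(x, res)
```

Note that a dropped column simply does not participate in the fit, so this matches the masked objective only under the assumption that the corresponding coefficient is zero.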

is Numpy's masked array memory efficient?

I was wondering: are numpy's masked arrays able to store a compact representation of the available values? In other words, if I have a numpy array with no set values, will it be stored in memory with negligible size?
Actually, this is not just a casual question, but I need such memory optimization for an application I am developing.
No, a masked array is not more compact.
In [344]: m = np.ma.masked_array([1,2,3,4],[1,0,0,1])
In [345]: m
Out[345]:
masked_array(data = [-- 2 3 --],
             mask = [ True False False  True],
       fill_value = 999999)
In [346]: m.data
Out[346]: array([1, 2, 3, 4])
In [347]: m.mask
Out[347]: array([ True, False, False, True], dtype=bool)
It contains both the original (full) array, and a mask. The mask may be a scalar, or it may be a boolean array with the same shape as the data.
scipy.sparse stores just the nonzero values of an array, though the space savings depends on the storage format and the sparsity. So you might simulate your masking with sparsity. Or you could take ideas from that representation.
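A quick sketch of that sparse alternative, assuming the "missing" entries can be represented as zeros:

```python
import numpy as np
from scipy import sparse

# A mostly-empty array stored densely pays for every element.
dense = np.zeros((1000, 1000))
dense[42, 7] = 3.5

# COO format stores only the nonzero entries (value plus coordinates).
s = sparse.coo_matrix(dense)
sparse_bytes = s.data.nbytes + s.row.nbytes + s.col.nbytes

print(dense.nbytes)    # 8_000_000 bytes for the dense array
print(sparse_bytes)    # a handful of bytes for the one stored value
```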
What do you plan to do with these arrays? Just access items, or do calculations?
Masked arrays are most useful for data that is mostly good, with a modest number of 'bad' values. For example, real-life data series with occasional glitches, or monthly data padded to 31 days. Masking lets you keep the data in a rectangular arrangement, and still calculate things like the mean and sum without using the masked values.
