I am performing non-negative least squares using scipy. A trivial example would be as follows:
import numpy as np
from scipy.optimize import nnls
A = np.array([[60, 70, 120, 60],[60, 90, 120, 70]], dtype='float32')
b = np.array([6, 5])
x, res = nnls(A, b)
Now, I have a situation where some entries in A or b can be missing (np.nan). Something like,
A_2 = A.copy()
A_2[0, 2] = np.nan
Of course, running NNLS on A_2, b will not work, since SciPy cannot handle inf or NaN values.
How can we perform NNLS while masking the missing entries out of the computation? Effectively, this should translate to
Minimize |(A_2 @ x - b)[mask]|
where mask can be defined as:
mask = ~np.isnan(A_2)
In general, entries can be missing from both A and b.
Possibly helpful:
[1] How to include constraint to Scipy NNLS function solution so that it sums to 1
I think you can compute the mask first (determine which points you want included) and then perform NNLS. Given the mask
In []: mask
Out[]:
array([[ True, True, False, True],
[ True, True, True, True]], dtype=bool)
you can verify whether to include a point by checking if all values in a column are True using np.all along the first axis.
In []: np.all(mask, axis=0)
Out[]: array([ True, True, False, True], dtype=bool)
This can then be used as a column mask for A.
In []: nnls(A_2[:,np.all(mask, axis=0)], b)
Out[]: (array([ 0.09166667, 0. , 0. ]), 0.7071067811865482)
The same idea can be used for b to construct a row mask.
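Putting the row and column masks together, a minimal sketch of the full procedure might look as follows (the scatter-back step at the end is my addition, assuming dropped columns should get coefficient zero):

```python
import numpy as np
from scipy.optimize import nnls

A = np.array([[60, 70, 120, 60], [60, 90, 120, 70]], dtype='float64')
b = np.array([6.0, 5.0])
A[0, 2] = np.nan  # simulate a missing entry

# Drop rows where b is missing, and columns of A with any missing entry.
row_mask = ~np.isnan(b)
col_mask = ~np.isnan(A).any(axis=0)

x_sub, res = nnls(A[np.ix_(row_mask, col_mask)], b[row_mask])

# Scatter the solution back into a full-length x; dropped columns get 0.
x = np.zeros(A.shape[1])
x[col_mask] = x_sub
```

Note that dropping a whole column whenever it contains a NaN is a pragmatic choice; it discards partially observed columns rather than imputing them.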
Related
I'm writing a simple root finder in NumPy that is designed to operate on entire NumPy arrays simultaneously. The basic idea of the solver is that the array not_converged is the size of all the data points, while newly_converged is a different array that is the size of the subset of not_converged that is True in that iteration.
I'd like to update the values in the overall not_converged array to be False when entries in the newly_converged array become True, but doing so using Boolean indexing doesn't seem to work as I'd expect:
not_converged = np.array([True, False, True])
# A previously not converged point has now converged
newly_converged = np.array([True, False])
# Now update the converged state
not_converged[not_converged][newly_converged] = False
print(not_converged) # should be [False False True] but the array hasn't been updated
I can get this to work if I use np.where though:
not_converged = np.array([True, False, True])
# A previously not converged point has now converged
newly_converged = np.array([True, False])
# Now update the converged state
not_converged[not_converged] = np.where(newly_converged, False, True)
print(not_converged) # Correctly updates the state
But I would have thought that the two methods above would yield identical results. Any idea why only the second method works?
That np.where() call is equivalent to the following indexing.
Note that not_converged[not_converged][newly_converged] is not a view into not_converged, it's a copy, so nothing should change.
not_converged = np.array([True, False, True])
# A previously not converged point has now converged
newly_converged = np.array([True, False])
# Now update the converged state
not_converged[not_converged] = np.logical_not(newly_converged)
print(not_converged) # correctly prints [False False True]
This question is similar to one asked before; as hpaulj's comments (1, 2) explain the reason well, you can do this as:
sample = not_converged[not_converged]
sample[newly_converged] = False
not_converged[not_converged] = sample
or, more simply, use ~ to invert newly_converged instead of np.logical_not:
not_converged[not_converged] = ~newly_converged
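To see the copy behaviour directly, you can check that the chained indexing never touches the original array, while a single boolean-index assignment writes through to it:

```python
import numpy as np

not_converged = np.array([True, False, True])
newly_converged = np.array([True, False])

# Chained indexing: the first not_converged[...] returns a COPY,
# so the assignment modifies that temporary array, not the original.
not_converged[not_converged][newly_converged] = False
print(not_converged)  # still [ True False  True]

# A single boolean-index assignment writes into the original array.
not_converged[not_converged] = ~newly_converged
print(not_converged)  # [False False  True]
```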
I am trying to calculate the area within a specific region by using ellipse-shaped masks and taking the mean of the values inside each mask. Like this:
This is the original eye image:
What I want to do is calculate the area of the sclera and the iris separately. My plan is to generate three masks: one for the iris alone, a second for the entire eye, and a third obtained by subtracting the iris mask from the eye mask, which gives the sclera mask.
The problem is that my mask function returns boolean values. This is what I was trying to do:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

img = Image.open(r'imgpath')

# ellipse parameters: centre (h, k), semi-axes a and b
h_1 = 16
k_1 = 31
a_1 = 7
b_1 = 17

# ellipse membership test for a single point
def _in_ellipse(x, y, h, k, a, b):
    z = ((x - h) ** 2) / a ** 2 + ((y - k) ** 2) / b ** 2
    return z < 1

in_ellipse = np.vectorize(_in_ellipse)

img = np.asarray(img)
mask = in_ellipse(*np.indices(img.shape), h_1, k_1, a_1, b_1)

# Visualize the mask size
plt.imshow(mask)
plt.show()

# See if it's inside the boundaries
plt.imshow(np.where(mask, img, np.nan))
plt.show()

mask_mean = np.nanmean(np.where(mask, img, np.nan))
Before calculating the mean values, I want to get the mean value of the sclera alone. My attempt was to subtract the two areas, but the ellipse function does not return pixel values as I expected; it returns boolean values:
mask:
array([[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]])
From what I understand, to generate the sclera_mask from eye_mask and iris_mask, both of which are of boolean type, your idea of subtraction translates to a logical, element-wise XOR operation on the two masks:
sclera_mask = np.logical_xor(eye_mask, iris_mask)
More on this in the docs: numpy.logical_xor
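A minimal sketch of the whole pipeline (the ellipse parameters and the random stand-in image are hypothetical, and the array-based in_ellipse here replaces np.vectorize for speed):

```python
import numpy as np

def in_ellipse(shape, h, k, a, b):
    """Boolean mask of the points inside the ellipse centred at (h, k)."""
    y, x = np.indices(shape)
    return ((y - h) ** 2) / a ** 2 + ((x - k) ** 2) / b ** 2 < 1

img = np.random.rand(64, 64)  # stand-in for the grayscale eye image

eye_mask = in_ellipse(img.shape, 32, 32, 20, 28)   # whole eye
iris_mask = in_ellipse(img.shape, 32, 32, 10, 10)  # iris only

# "Subtraction" of the boolean masks is an element-wise XOR
# (valid here because the iris ellipse lies inside the eye ellipse).
sclera_mask = np.logical_xor(eye_mask, iris_mask)

# Boolean indexing avoids the NaN trick entirely.
sclera_mean = img[sclera_mask].mean()
```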
I was trying to test numerically if the multiplication of a matrix and its transpose really generates a symmetric square matrix.
Below is the code I used:
mat = np.array([[1,2,3],[1,0,1],[1,1,1],[2,3,5]])
mat2 = np.linalg.inv(np.matmul(np.transpose(mat),mat))
mat2
array([[ 1.42857143, 0.42857143, -0.85714286],
[ 0.42857143, 1.92857143, -1.35714286],
[-0.85714286, -1.35714286, 1.21428571]])
mat2 appears to be symmetric.
However, the result of the below code made me confused:
np.transpose(mat2) == mat2
array([[ True, False, False],
[False, True, False],
[False, False, True]])
But when I did the same procedure with mat, the result was as I expected:
np.transpose(np.matmul(np.transpose(mat),mat)) == np.matmul(np.transpose(mat),mat)
array([[ True, True, True],
[ True, True, True],
[ True, True, True]])
Is this related to a computational issue? If so, how can I show that the off-diagonal elements are identical?
Comparing mat and mat.T, you're comparing integers to integers and there's no problem.
mat2 is floating point, which is prone to subtle errors. When you print out mat2, you're seeing the truncated version of the full digits. Look at the difference between mat2 and mat2.T:
>>> mat2 - mat2.T
array([[ 0.00000000e+00, 1.11022302e-16, -1.11022302e-16],
[-1.11022302e-16, 0.00000000e+00, 2.22044605e-16],
[ 1.11022302e-16, -2.22044605e-16, 0.00000000e+00]])
The differences are on the order of 0.0000000000000001, meaning the matrices are equal "for all intents and purposes", but not exactly equal. There are two ways to go from here. You can either accept that numerical precision is limited and use something like numpy.allclose for your equality tests, which allows for some small error:
>>> np.allclose(mat2, mat2.T)
True
Or, if you really insist on your matrices being symmetric, you can enforce it with something like this:
>>> mat3 = (mat2 + mat2.T)/2
>>> mat3 == mat3.T
array([[ True, True, True],
[ True, True, True],
[ True, True, True]])
I have a multidimensional numpy array where the elements are either True or False values:
import numpy as np
#just making a toy array grid to show what I want to do
grid = np.ones((4,4),dtype = 'bool')
grid[0,0]=False
grid[-1,-1]=False
#now grid has a few false values but is a 4x4 filled with mostly true values
Now I need to generate another array M, where the value at each site M[i,j] depends on grid[i:i+2,j:j+2] as in
M = np.empty((4, 4))  # elements to be filled
# here is the part I want to clean up
for ii in range(4):
    for jj in range(4):
        # details here are unimportant. It's just that M[ii,jj] depends on
        # multiple elements of grid in some way
        if ii + 2 <= 4 and jj + 2 <= 4:
            M[ii, jj] = np.all(grid[ii:ii+2, jj:jj+2])
        else:
            M[ii, jj] = False
Is there some way to fill the array M using elements from grid without double loops?
Approach #1
Here's one approach with 2D convolution -
from scipy.signal import convolve2d as conv2
out = (conv2(grid,np.ones((2,2),dtype=int),'valid')==4).astype(int)
Sample run -
In [118]: grid
Out[118]:
array([[False, True, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, False]], dtype=bool)
In [119]: (conv2(grid,np.ones((2,2),dtype=int),'valid')==4).astype(int)
Out[119]:
array([[0, 1, 1],
[1, 1, 1],
[1, 1, 0]])
Please note that the last row and last column of the expected output would be all zeros in the initialized output array. This is because of the sliding nature of the windows: they cannot extend past the last row and column of grid.
Approach #2
Here's another with 2D uniform filter -
from scipy.ndimage import uniform_filter as unif2d
out = unif2d(grid,size=2).astype(int)[1:,1:]
Approach #3
Here's another with a 4D sliding windowed view -
from skimage.util import view_as_windows as viewW
out = viewW(grid,(2,2)).all(axis=(2,3)).astype(int)
With that all(axis=(2,3)), we check along both dimensions of every window that all of its elements are True.
Runtime test
In [122]: grid = np.random.rand(5000,5000)>0.1
In [123]: %timeit (conv2(grid,np.ones((2,2),dtype=int),'valid')==4).astype(int)
1 loops, best of 3: 520 ms per loop
In [124]: %timeit unif2d(grid,size=2).astype(int)[1:,1:]
1 loops, best of 3: 210 ms per loop
In [125]: %timeit viewW(grid,(2,2)).all(axis=(2,3)).astype(int)
1 loops, best of 3: 614 ms per loop
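On recent NumPy (>= 1.20), the same windowed view is available without scikit-image via np.lib.stride_tricks.sliding_window_view; a minimal sketch on the toy grid from the question:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

grid = np.ones((4, 4), dtype=bool)
grid[0, 0] = False
grid[-1, -1] = False

# Shape (3, 3, 2, 2): every 2x2 window of grid, as a zero-copy view.
windows = sliding_window_view(grid, (2, 2))

# A window maps to 1 only if all four of its elements are True.
out = windows.all(axis=(2, 3)).astype(int)
print(out)
# [[0 1 1]
#  [1 1 1]
#  [1 1 0]]
```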
I have an array of N-dimensional values arranged in a 2D array. Something like:
import numpy as np
data = np.array([[[1,2],[3,4]],[[5,6],[1,2]]])
I also have a single value x that I want to compare against each data point, and I want to get a 2D array of boolean values showing whether my data is equal to x.
x = np.array([1,2])
If I do:
data == x
I get
# array([[[ True, True],
# [False, False]],
#
# [[False, False],
# [ True, True]]], dtype=bool)
I could easily combine these to get the result I want. However, I don't want to iterate over each of these slices, especially when data.shape[2] is larger. What I am looking for is a direct way of getting:
array([[ True, False],
[False, True]])
Any ideas for this seemingly easy task?
Well, (data == x).all(axis=-1) gives you what you want. It's still constructing a 3-d array of results and iterating over it, but at least that iteration isn't at Python-level, so it should be reasonably fast.
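For reference, a quick check of that one-liner on the arrays from the question:

```python
import numpy as np

data = np.array([[[1, 2], [3, 4]], [[5, 6], [1, 2]]])
x = np.array([1, 2])

# Broadcasting compares x against the last axis of data; all(axis=-1)
# then collapses that axis, leaving one boolean per 2-vector.
result = (data == x).all(axis=-1)
print(result)
# [[ True False]
#  [False  True]]
```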