Finding 2D boolean patterns in larger boolean tensors/arrays - python

I am looking for a way to find a 2D pattern in a MxNxR tensor/array with pytorch or numpy.
For instance, to see if a dictionary of tensor of boolean pattern (e.g. {6x6 : freq}) exist in a larger boolean tensor (e.g. 3x256x256).
Then I want to update my patterns and frequencies of the dictionary.
I was hoping that there was a pytorchi way of doing it, instead of having loops over it, or have an optimized loop for doing it.
As far as I know, torch.where works when we have a scalar value. I’m not sure how should I do, if I have a tensor of 6x6 instead of a value.
I looked into Finding Patterns in a Numpy Array , but I don't think that it's feasible to follow it for a 2D pattern.

I'm thinking maybe you can pull this off using convolutions. Let's imagine you have an input made up of 0 and 1. Here we will take a minimal example with an u=input of 3x3 and a 2x2 pattern:
>>> x = torch.tensor([[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.]])
And the pattern would be:
>>> pattern = torch.tensor([[1., 0.],
[0., 1.]])
Here the pattern can be found in the upper left corner of the input.
We perform a convolution with nn.functional.conv2d with 1 - pattern as the kernel.
>>> img, mask = x[None, None], pattern[None, None]
>>> M = F.conv2d(img, 1 - mask)
tensor([[[[0., 1.],
[2., 0.]]]])
There is a match if and only if the result is equal to the number of 1s in the pattern:
>>> M == mask.sum(dim=(2,3)))
tensor([[[[ True, False],
[False, False]]]])
You can deduce the frequencies from this final boolean mask. You can extend this method to multiple patterns by adding in kernels in your convolution.

Related

testing: compare numpy arrays while allowing a certain mismatch

I have two numpy arrays containing integers which I'm comparing with numpy.testing.assert_array_equal. The arrays are "equal enough", i.e. a few elements differ but given the size of my arrays, that's OK (in this specific case). But of course the test fails:
AssertionError:
Arrays are not equal
(mismatch 0.0010541406645359075%)
x: array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],...
y: array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],...
----------------------------------------------------------------------
Ran 1 test in 0.658s
FAILED (failures=1)
Of course one might argue that the (long-term) clean solution to this would be to adapt the reference solution or whatnot, but what I'd prefer is to simply allow for some mismatch without the test failing. I would have hoped for assert_array_equal to have an option for this, but this is not the case.
I've written a function which allows me to do exactly what I want, so the problem might be considered solved, but I'm just wondering whether there is a better, more elegant way to do this. Also, the approach of parsing the error string feels pretty hacky, but I haven't found a better way to get the mismatch percentage value.
def assert_array_equal_tolerant(arr1,arr2,threshold):
"""Compare equality of two arrays while allowing a certain mismatch.
Arguments:
- arr1, arr2: Arrays to compare.
- threshold: Mismatch (in percent) above which the test fails.
"""
try:
np.testing.assert_array_equal(arr1,arr2)
except AssertionError as e:
for arg in e.args[0].split("\n"):
match = re.search(r'mismatch ([0-9.]+)%',arg)
if match:
mismatch = float(match.group(1))
break
else:
raise
if mismatch > threshold:
raise
Just to be clear: I'm not talking about assert_array_almost_equal, and using it is also not feasible, because the errors are not small, they might be huge for a single element, but are confined to a very small number of elements.
You could try (if they are integers) to check for the number of elements that are not equal without regular expressions
unequal_pos = np.where(arr1 != arr2)
len(unequal_pos[0]) # gives you the number of elements that are not equal.
I don't know if you consider this more elegant.
Since the result of np.where can be used as index you can get the elements that do not match with
arr1[unequal_pos]
So you can do pretty much every test you like with that result. Depends on how you want to define the mismatch either by number of different elements or difference between the elements or something even fancier.
Here's a crude comparison, but it seems to be in the spirit of what numpy.testing.assert_array_equal does:
In [71]: x=np.arange(100).reshape(10,10)
In [72]: y=np.arange(100).reshape(10,10)
In [73]: y[(5,7),(3,5)]=(3,5)
In [74]: np.sum(np.abs(x-y)>1)
Out[74]: 2
In [80]: np.sum(x!=y)
Out[80]: 2
count_nonzero is a faster counter (because it is used frequently in other numpy code to allocate space)
In [90]: np.count_nonzero(x!=y)
Out[90]: 2
The function that you are using does:
assert_array_compare(operator.__eq__, x, y, err_msg=err_msg)
np.testing.utils.assert_array_compare is a longish function, but most of it has to do with testing shape, and handling nan and inf. Otherwise it comes down to doing
x==y
and doing a count on the number of mismatches, and generating the err_msg. Note that the err_msg can be customized, so parsing it could simplified.
If you know the shapes match, and you aren't worried about nan like values, then just filtering the numeric difference should work just fine.

Normalise 2D Numpy Array: Zero Mean Unit Variance

I have a 2D Numpy array, in which I want to normalise each column to zero mean and unit variance. Since I'm primarily used to C++, the method in which I'm doing is to use loops to iterate over elements in a column and do the necessary operations, followed by repeating this for all columns. I wanted to know about a pythonic way to do so.
Let class_input_data be my 2D array. I can get the column mean as:
column_mean = numpy.sum(class_input_data, axis = 0)/class_input_data.shape[0]
I then subtract the mean from all columns by:
class_input_data = class_input_data - column_mean
By now, the data should be zero mean. However, the value of:
numpy.sum(class_input_data, axis = 0)
isn't equal to 0, implying that I have done something wrong in my normalisation. By isn't equal to 0, I don't mean very small numbers which can be attributed to floating point inaccuracies.
Something like:
import numpy as np
eg_array = 5 + (np.random.randn(10, 10) * 2)
normed = (eg_array - eg_array.mean(axis=0)) / eg_array.std(axis=0)
normed.mean(axis=0)
Out[14]:
array([ 1.16573418e-16, -7.77156117e-17, -1.77635684e-16,
9.43689571e-17, -2.22044605e-17, -6.09234885e-16,
-2.22044605e-16, -4.44089210e-17, -7.10542736e-16,
4.21884749e-16])
normed.std(axis=0)
Out[15]: array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

Eigenvectors computed with numpy's eigh and svd do not match

Consider singular value decomposition M=USV*. Then the eigenvalue decomposition of M* M gives M* M= V (S* S) V*=VS* U* USV*. I wish to verify this equality with numpy by showing that the eigenvectors returned by eigh function are the same as those returned by svd function:
import numpy as np
np.random.seed(42)
# create mean centered data
A=np.random.randn(50,20)
M= A-np.array(A.mean(0),ndmin=2)
# svd
U1,S1,V1=np.linalg.svd(M)
S1=np.square(S1)
V1=V1.T
# eig
S2,V2=np.linalg.eigh(np.dot(M.T,M))
indx=np.argsort(S2)[::-1]
S2=S2[indx]
V2=V2[:,indx]
# both Vs are in orthonormal form
assert np.all(np.isclose(np.linalg.norm(V1,axis=1), np.ones(V1.shape[0])))
assert np.all(np.isclose(np.linalg.norm(V1,axis=0), np.ones(V1.shape[1])))
assert np.all(np.isclose(np.linalg.norm(V2,axis=1), np.ones(V2.shape[0])))
assert np.all(np.isclose(np.linalg.norm(V2,axis=0), np.ones(V2.shape[1])))
assert np.all(np.isclose(S1,S2))
assert np.all(np.isclose(V1,V2))
The last assertion fails. Why?
Just play with small numbers to debug your problem.
Start with A=np.random.randn(3,2) instead of your much larger matrix with size (50,20)
In my random case, I find that
v1 = array([[-0.33872745, 0.94088454],
[-0.94088454, -0.33872745]])
and for v2:
v2 = array([[ 0.33872745, -0.94088454],
[ 0.94088454, 0.33872745]])
they only differ for a sign, and obviously, even if normalized to have unit module, the vector can differ for a sign.
Now if you try the trick
assert np.all(np.isclose(V1,-1*V2))
for your original big matrix, it fails... again, this is OK. What happens is that some vectors have been multiplied by -1, some others haven't.
A correct way to check for equality between the vectors is:
assert allclose(abs((V1*V2).sum(0)),1.)
and indeed, to get a feeling of how this works you can print this quantity:
(V1*V2).sum(0)
that indeed is either +1 or -1 depending on the vector:
array([ 1., -1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., -1., 1., 1., 1., -1., -1.])
EDIT: This will happen in most cases, especially if starting from a random matrix. Notice however that this test will likely fail if one or more eigenvalues has an eigenspace of dimension larger than 1, as pointed out by #Sven Marnach in his comment below:
There might be other differences than just vectors multiplied by -1.
If any of the eigenvalues has a multi-dimensional eigenspace, you
might get an arbitrary orthonormal basis of that eigenspace, and to
such bases might be rotated against each other by an arbitraty
unitarian matrix

How to convert a 2d bitmap into points with x-y coordinates in python - and fast

I am working on a motion detection project using opencv/python.
I have reached the point of creating an image bitmap/array where motion that has been detected is represented by 1's in the array, 0's where there is no motion (by subtracting one picture from a video feed from another after a short lag).
I wish to determine my moving targets (where there may be more than 1) by using Numpy's (or opencv's) kmeans algorithm.
However, these algorithms require an input of an array of x,y coordinates to be used.
What's the fastest and most efficient method to convert a 2d image bitmap array to an array of x y coordinates? (i.e. we're talking image processing here, so slow Python code is undesirable).
Numpy's nonzero will do this:
Here's an example from the docs:
>>> x = np.eye(3)
>>> x
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
>>> np.nonzero(x)
(array([0, 1, 2]), array([0, 1, 2]))
That is, the output is the indices of the non-zero elements in the array.

expanding (adding a row or column) a scipy.sparse matrix

Suppose I have a NxN matrix M (lil_matrix or csr_matrix) from scipy.sparse, and I want to make it (N+1)xN where M_modified[i,j] = M[i,j] for 0 <= i < N (and all j) and M[N,j] = 0 for all j. Basically, I want to add a row of zeros to the bottom of M and preserve the remainder of the matrix. Is there a way to do this without copying the data?
Scipy doesn't have a way to do this without copying the data but you can do it yourself by changing the attributes that define the sparse matrix.
There are 4 attributes that make up the csr_matrix:
data: An array containing the actual values in the matrix
indices: An array containing the column index corresponding to each value in data
indptr: An array that specifies the index before the first value in data for each row. If the row is empty then the index is the same as the previous column.
shape: A tuple containing the shape of the matrix
If you are simply adding a row of zeros to the bottom all you have to do is change the shape and indptr for your matrix.
x = np.ones((3,5))
x = csr_matrix(x)
x.toarray()
>> array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])
# reshape is not implemented for csr_matrix but you can cheat and do it yourself.
x._shape = (4,5)
# Update indptr to let it know we added a row with nothing in it. So just append the last
# value in indptr to the end.
# note that you are still copying the indptr array
x.indptr = np.hstack((x.indptr,x.indptr[-1]))
x.toarray()
array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 0., 0., 0., 0., 0.]])
Here is a function to handle the more general case of vstacking any 2 csr_matrices. You still end up copying the underlying numpy arrays but it is still significantly faster than the scipy vstack method.
def csr_vappend(a,b):
""" Takes in 2 csr_matrices and appends the second one to the bottom of the first one.
Much faster than scipy.sparse.vstack but assumes the type to be csr and overwrites
the first matrix instead of copying it. The data, indices, and indptr still get copied."""
a.data = np.hstack((a.data,b.data))
a.indices = np.hstack((a.indices,b.indices))
a.indptr = np.hstack((a.indptr,(b.indptr + a.nnz)[1:]))
a._shape = (a.shape[0]+b.shape[0],b.shape[1])
return a
Not sure if you're still looking for a solution, but maybe others can look into hstack and vstack - http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.hstack.html. I think we can define a csr_matrix for the single additional row and then vstack it with the previous matrix.
I don't think that there is any way to really escape from doing the copying. Both of those types of sparse matrices store their data as Numpy arrays (in the data and indices attributes for csr and in the data and rows attributes for lil) internally and Numpy arrays can't be extended.
Update with more information:
LIL does stand for LInked List, but the current implementation doesn't quite live up to the name. The Numpy arrays used for data and rows are both of type object. Each of the objects in these arrays are actually Python lists (an empty list when all values are zero in a row). Python lists aren't exactly linked lists, but they are kind of close and quite frankly a better choice due to O(1) look-up. Personally, I don't immediately see the point of using a Numpy array of objects here rather than just a Python list. You could fairly easily change the current lil implementation to use Python lists instead which would allow you to add a row without copying the whole matrix.

Categories