I am trying to compare two 3D numpy arrays to calculate their similarity. I found these two posts, which I am trying to stitch together into something useful:
Comparing NumPy Arrays for Similarity
Subtracting numpy arrays of different shape efficiently
To make a long story short, I have two arrays created from 3D point clouds, so they are filled with 3D coordinates, but because the 3D objects are different, the arrays have different lengths.
If requested, I can post some sample arrays, but they have 1000+ points, so that would be a lot of text to post.
Here is what I am trying to do now. You can get array1 and array2 data here: https://pastebin.com/WbNvRUwG (array2 starts at line 1858).
import numpy as np

array1 = [long np array with 3D coordinates]
array2 = [long np array with 3D coordinates]
array1_original = array1.copy()
if len(array1) < len(array2):
    array1, array2 = array2, array1
array_difference = np.subtract(array1, array2[:, None])  # [:, None] comes from the second link; it lets arrays of different lengths be subtracted
array_abs_difference = np.absolute(array_difference)
array_total_difference = np.sum(array_abs_difference)
similarity = 1 - (array_total_difference / np.sum(array1_original))
My array differences are fine and represent what I want, so the most similar arrays have small differences, but when I do the sum of array1_original it comes out way smaller than my differences and therefore my similarity score becomes negative.
I also tried to calculate the difference from an array filled with zeros to array1_original, but it comes out about the same.
Can anyone tell me why np.sum(array1_original) would not be bigger than np.sum(array_abs_difference)?
The NumPy comparison ended up being too slow, so I just used Open3D instead. It works for me.
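Why the sum comes out larger: `array2[:, None]` does not pad the shorter array to the same length. Broadcasting subtracts every point of one cloud from every point of the other, so `array_difference` has shape `(len(array2), len(array1), 3)` and its sum grows with the product of the two lengths, while `np.sum(array1_original)` only adds up `len(array1) * 3` coordinates. A minimal sketch of a size-independent alternative, assuming SciPy is available, is a symmetric mean nearest-neighbour (Chamfer-style) distance. This is a swapped-in technique, not the Open3D route used above; `mean_nn_distance` is a hypothetical helper, the random clouds are stand-in data, and the result is a distance (0 for identical clouds), not a 0-1 similarity.

```python
import numpy as np
from scipy.spatial import cKDTree


def mean_nn_distance(a, b):
    """Symmetric mean nearest-neighbour distance between clouds a (N, 3) and b (M, 3).

    Hypothetical helper: a Chamfer-style distance, not the original poster's method.
    """
    d_ab, _ = cKDTree(b).query(a)  # distance from each point of a to its closest point in b
    d_ba, _ = cKDTree(a).query(b)  # distance from each point of b to its closest point in a
    return 0.5 * (d_ab.mean() + d_ba.mean())


rng = np.random.default_rng(0)
array1 = rng.random((1200, 3))  # hypothetical stand-in point clouds of different lengths
array2 = rng.random((1000, 3))

# The broadcast difference compares every point with every other point:
print(np.subtract(array1, array2[:, None]).shape)  # (1000, 1200, 3)
print(mean_nn_distance(array1, array2))
```

A KD-tree keeps the cost close to O(N log M) instead of building the full N x M difference array.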
Related
I want to know how different two numpy matrices are. Matrix1 and Matrix2 could be very similar, e.g. 80% the same values but shifted... I attached images of two otherwise identical arrays that differ in a small sequence of values in the top right.
from skimage.util import compare_images
#matrix1 & matrix2 are numpy arrays
compare_images(matrix1, matrix2, method='diff')
This gives me a first comparison, but what about two numpy matrices where one is, for example, shifted left by a couple of columns?
import matplotlib.pyplot as plt
from scipy.signal import correlate2d
corr = correlate2d(matrix1, matrix2)
plt.figure(figsize=(10,10))
plt.imshow(corr)
plt.grid(False)
plt.show()
This plots the correlation, and it seems like a nice method, but I do not understand how the results are displayed, since the differences are in the top right of the images.
Otherwise:
picture1_norm = picture1/np.sqrt(np.sum(picture1**2))
picture2_norm = picture2/np.sqrt(np.sum(picture2**2))
print(np.sum(picture2_norm*picture1_norm))
This returns a similarity value in the range 0-1 (for non-negative inputs), for example 0.9942.
What could be a good method?
Correlation between two matrices is a legitimate measure of how similar they are. If both contain the same values, the (normalized) correlation will be 1, and your (max?) value of 0.9942 is already very close to that.
Regarding the translational (in-)variance of your result, have a closer look at the mode argument of scipy.signal.correlate2d, which defines how to handle differing sizes along both axes of your matrices and how far to slide one matrix over the other when calculating the correlation.
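A minimal sketch of how to read the `correlate2d` output, assuming both matrices have the same shape: in the default `mode="full"`, zero shift sits at index `(matrix2.shape[0] - 1, matrix2.shape[1] - 1)`, so the position of the peak relative to that index is the offset between the matrices, and the peak value divided by the two norms is a shift-tolerant similarity (at most 1). Subtracting the means first keeps a constant background from dominating the peak. `estimate_shift` is a hypothetical helper name and the random matrices are stand-in data.

```python
import numpy as np
from scipy.signal import correlate2d


def estimate_shift(matrix1, matrix2):
    """Return ((row_shift, col_shift), score) such that matrix1 ~ matrix2 shifted by that offset."""
    a = matrix1 - matrix1.mean()  # remove the mean so a flat background does not dominate
    b = matrix2 - matrix2.mean()
    corr = correlate2d(a, b, mode="full")
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # In 'full' mode, zero shift sits at (rows of b - 1, cols of b - 1)
    shift = (int(peak[0]) - (b.shape[0] - 1), int(peak[1]) - (b.shape[1] - 1))
    score = corr[peak] / (np.linalg.norm(a) * np.linalg.norm(b))  # normalized peak, at most 1
    return shift, score


rng = np.random.default_rng(0)
matrix2 = rng.standard_normal((20, 20))
matrix1 = np.roll(matrix2, -2, axis=1)  # matrix2 shifted left by two columns
print(estimate_shift(matrix1, matrix2))
```

A negative column shift means matrix1 is shifted left relative to matrix2. The bright spot in the plotted correlation image is this peak; its distance from the image centre is the shift.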
I have a 3D numpy array points of dimensions [10000x3000x128], where the first dimension is the number of frames, the second is the number of points in each frame, and the third is a 128-element feature vector associated with each point. I want to efficiently filter the points in each frame using a boolean 2D mask of dimensions [10000x3000] and, for each selected point, also keep the related 128-dim feature vector. Moreover, the output must still be a 3D array, not a merged 2D array, and I would like to avoid any for loop if possible.
What I'm currently doing is:
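import numpy as np

# example of points: (frames, points per frame, features); a small stand-in for (10000, 3000, 128)
points = np.random.rand(10, 30, 128)
# fg, bg = 2D boolean np.arrays of shape (frames, points per frame); hypothetical example masks
fg = np.random.rand(10, 30) > 0.5
bg = ~fg
# init empty lists
fg_points, bg_points = [], []
for i in range(points.shape[0]):
    fg_mask_tmp, bg_mask_tmp = fg[i], bg[i]
    fg_points.append(points[i, fg_mask_tmp, :])
    bg_points.append(points[i, bg_mask_tmp, :])
# the number of points per frame varies, so NumPy >= 1.24 needs dtype=object for this ragged result
fg_features, bg_features = np.array(fg_points, dtype=object), np.array(bg_points, dtype=object)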
But this is quite a naive solution that can surely be improved in a more NumPy-like way.
I also tried other solutions, such as:
fg_features = points[fg,:]
But this does not preserve the dimensions of the array: it merges the first two dimensions, since the number of filtered points can vary from frame to frame.
Another solution I tried was to enlarge the 2D masks by appending a last dimension of size 128 filled with True, but without success.
Does anyone know an efficient solution?
Thank you in advance for any help!
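A NumPy array cannot hold a different number of points per frame, so a true 3D result needs padding. A minimal sketch, assuming padding plus a validity mask is acceptable for the later analysis: a stable `argsort` of the inverted mask moves each frame's selected points to the front in their original order, `np.take_along_axis` gathers them together with their feature vectors without a Python loop, and the result is trimmed to the largest per-frame count. `select_padded` and `fill_value` are hypothetical names, and the tiny arrays are stand-in data. For the full [10000x3000x128] array the intermediates are large, so processing the frames in chunks may be necessary.

```python
import numpy as np


def select_padded(points, mask, fill_value=0.0):
    """Gather the masked points of every frame into a dense, padded 3D array.

    points: (n_frames, n_points, n_features); mask: boolean (n_frames, n_points).
    Returns (selected, valid) where selected has shape (n_frames, max_count, n_features)
    and valid[f, k] tells whether selected[f, k] is a real point or padding.
    """
    counts = mask.sum(axis=1)  # number of selected points per frame
    max_count = int(counts.max()) if counts.size else 0
    # False sorts before True, so argsort(~mask) puts selected points first (stable keeps order)
    order = np.argsort(~mask, axis=1, kind="stable")[:, :max_count]
    selected = np.take_along_axis(points, order[:, :, None], axis=1)
    valid = np.arange(max_count)[None, :] < counts[:, None]
    selected = np.where(valid[:, :, None], selected, fill_value)  # blank out the padding slots
    return selected, valid


points = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)  # tiny stand-in for (10000, 3000, 128)
fg = np.array([[True, False, True, False],
               [False, False, False, True]])
fg_features, fg_valid = select_padded(points, fg)
print(fg_features.shape)  # (2, 2, 3)
print(fg_valid)
```

The same call with bg gives the background points; fg_valid can later be used to ignore the padding slots, for example via a masked array.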
Note: I'm using numpy
import numpy as np
Given 4 arrays of the same (but arbitrary) shape, I am trying to write a function that forms 2x2 matrices from each corresponding element of the arrays, finds the eigenvalues, and returns two arrays of the same shape as the original four whose elements are the eigenvalues (i.e. array1 holds all the first eigenvalues and array2 holds all the second eigenvalues).
I tried doing the following, but unsurprisingly, it gives me an error that says the array is not square.
temp = np.linalg.eig([[m1, m2],[m3, m4]])[0]
I suppose I can make an empty temp variable in the same shape,
temp = np.zeros_like(m1)
and go over each element of the original arrays and repeat the process. My problem is that I want this generalised for arrays of any arbitrary shape (need not be one dimensional). I would guess that finding the shape of the arrays and designing loops to go over each element would not be a very good way of doing it. How do I do this efficiently?
Construct a 2x2x... array:
temp = np.array([[m1, m2], [m3, m4]])
Move the first two dimensions to the end for a ...x2x2 array:
temp = np.moveaxis(temp, (0, 1), (-2, -1))
Call np.linalg.eigvals (which broadcasts) for a ...x2 array of eigenvalues:
eigvals = np.linalg.eigvals(temp)
And split this into an array of first eigenvalues and an array of second eigenvalues:
eigvals1, eigvals2 = eigvals[..., 0], eigvals[..., 1]
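The same idea packaged as functions, as a sketch: np.stack can build the ...x2x2 array directly, which avoids the axis shuffling, and for the 2x2 case the eigenvalues also follow from the characteristic polynomial, λ = tr/2 ± sqrt((tr/2)^2 - det), which gives a fully elementwise alternative. Both function names are hypothetical. Eigenvalues of a real non-symmetric matrix can be complex, and np.linalg.eigvals does not guarantee any particular order of the two eigenvalues.

```python
import numpy as np


def eigvals_2x2_linalg(m1, m2, m3, m4):
    """Elementwise eigenvalues of [[m1, m2], [m3, m4]] via np.linalg.eigvals."""
    mats = np.stack([np.stack([m1, m2], axis=-1),
                     np.stack([m3, m4], axis=-1)], axis=-2)  # shape (..., 2, 2)
    eig = np.linalg.eigvals(mats)  # shape (..., 2), order not guaranteed
    return eig[..., 0], eig[..., 1]


def eigvals_2x2_closed_form(m1, m2, m3, m4):
    """Same eigenvalues from the characteristic polynomial of a 2x2 matrix."""
    half_trace = (np.asarray(m1) + m4) / 2
    det = np.asarray(m1) * m4 - np.asarray(m2) * m3
    disc = np.sqrt((half_trace ** 2 - det).astype(complex))  # complex sqrt handles negative discriminants
    return half_trace + disc, half_trace - disc
```

The closed form returns the larger-real-part root first only when the discriminant is real and non-negative; take .real when all eigenvalues are known to be real.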
I have several N-dimensional arrays of different shapes and want to combine them into a new (N+1)-dimensional array, where the new axis has a length corresponding to the number of initial N-d arrays.
This answer is sufficient if the original arrays are all the same shape; however, it does not work if they have different shapes.
I don't really want to reshape the arrays to a congruent size and fill with empty elements due to the subsequent analysis I need to perform on the final array.
Specifically, I have four 4D arrays. One of the things I want to do with the resulting 5D array is plot parts of the four arrays on the same matplotlib figure. Obviously I could plot each one separately; however, I will soon have more than four 4D arrays, so I am looking for a dynamic solution.
While I was writing this, Sven gave the same answer in the comments...
Put the arrays in a python list in the following manner:
list_5d = []
list_5d.append(array_4d_1)
list_5d.append(array_4d_2)
...
Then you can iterate over them:
for array_4d in list_5d:
    pass  # plot the 4D array on the figure
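A minimal sketch of that loop with matplotlib, assuming that what gets plotted from each 4D array is a 1D slice along its last axis. The slice arr[0, 0, 0, :] and the array shapes are hypothetical stand-ins. Because the arrays stay separate in the list, their shapes never need to match, and adding a fifth array only means appending it to the list.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 4D arrays with different shapes
list_5d = [rng.random((2, 3, 4, 50)),
           rng.random((5, 1, 2, 80)),
           rng.random((3, 3, 3, 20))]

fig, ax = plt.subplots()
for i, array_4d in enumerate(list_5d):
    # Plot one 1D slice of each array on the same axes
    ax.plot(array_4d[0, 0, 0, :], label=f"array {i} {array_4d.shape}")
ax.legend()
# fig.savefig(...) or plt.show() with an interactive backend to view the figure
```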
I have a 2-D array of values and need to mask certain elements of that array (with indices taken from a list of ~ 100k tuple-pairs) before drawing random samples from the remaining elements without replacement.
I need something that is both quite fast/efficient (hopefully avoiding for loops) and has a small memory footprint because in practice the master array is ~ 20000 x 20000.
For now I'd be content with something like (for illustration):
xys=[(1,2),(3,4),(6,9),(7,3)]
gxx,gyy=numpy.mgrid[0:100,0:100]
mask = numpy.where((gxx,gyy) not in set(xys)) # The bit I can't get right
# Now sample the masked array
draws=numpy.random.choice(master_array[mask].flatten(),size=40,replace=False)
Fortunately for now I don't need the x,y coordinates of the drawn fluxes - but bonus points if you know an efficient way to do this all in one step (i.e. it would be acceptable for me to identify those coordinates first and then use them to fetch the corresponding master_array values; the illustration above is a shortcut).
Thanks!
Linked questions:
Numpy mask based on if a value is in some other list
Mask numpy array based on index
Implementation of numpy in1d for 2D arrays?
You can do it efficiently using a sparse COO matrix:
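import numpy
from scipy import sparse

master_array = numpy.random.rand(100, 100)  # hypothetical example data
xys = [(1, 2), (3, 4), (6, 9), (7, 3)]
coords = tuple(zip(*xys))  # (rows, cols); tuple() is needed in Python 3, where zip returns an iterator
mask = sparse.coo_matrix((numpy.ones(len(coords[0])), coords), shape=master_array.shape, dtype=bool)
# toarray() densifies the mask; ~ selects every element that is not in xys
draws = numpy.random.choice(master_array[~mask.toarray()], size=10, replace=False)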
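For the full ~20000 x 20000 array, mask.toarray() materializes a dense boolean array (about 400 MB), and master_array[~mask] then copies almost all of the values. When only a few draws are needed and the excluded cells are a small fraction of the array, a lighter alternative is rejection sampling, a swapped-in technique rather than the answer's: draw random flat indices and discard excluded or already-drawn ones. Each accepted index is uniform over the remaining allowed cells, so the result is a sample without replacement, and it also yields the coordinates asked about in the bonus. `sample_unmasked` is a hypothetical helper, the data are stand-ins, and the loop slows down sharply if most cells are excluded.

```python
import numpy as np


def sample_unmasked(shape, excluded, size, rng=None):
    """Draw `size` distinct (row, col) coordinates uniformly, skipping the `excluded` pairs."""
    rng = np.random.default_rng() if rng is None else rng
    n_total = shape[0] * shape[1]
    rows, cols = np.asarray(excluded, dtype=np.intp).reshape(-1, 2).T
    banned = set(np.ravel_multi_index((rows, cols), shape).tolist())  # flat indices to skip
    if n_total - len(banned) < size:
        raise ValueError("not enough unmasked cells to draw from")
    chosen, seen = [], set()
    while len(chosen) < size:
        # Draw candidates in batches; reject excluded cells and repeats
        for idx in rng.integers(0, n_total, size=2 * size).tolist():
            if idx not in banned and idx not in seen:
                seen.add(idx)
                chosen.append(idx)
                if len(chosen) == size:
                    break
    return np.unravel_index(np.array(chosen, dtype=np.intp), shape)


master_array = np.random.default_rng(0).random((100, 100))  # hypothetical data
xys = [(1, 2), (3, 4), (6, 9), (7, 3)]
rows, cols = sample_unmasked(master_array.shape, xys, size=40, rng=np.random.default_rng(1))
draws = master_array[rows, cols]  # the sampled values; rows/cols are their coordinates
```

Memory use is dominated by the set of excluded flat indices (~100k entries), independent of the size of master_array.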