I have two arrays: New with shape (5100,) and A with shape (5100, 5100). I am multiplying these two arrays and measuring the time this operation takes, which turns out to be 0.409 seconds. Is there a more time-efficient way to multiply such large arrays?
Pe=A*New
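A minimal timing sketch (random placeholder data; the out= variant is just one option worth trying, since the broadcasted product already runs in compiled code and is essentially memory-bound):

import numpy as np
import timeit

# Placeholder data with the shapes from the question.
New = np.random.rand(5100)
A = np.random.rand(5100, 5100)

# Broadcasted elementwise product, as in the question.
print(timeit.timeit(lambda: A * New, number=10) / 10)

# Reusing a preallocated output buffer avoids one large
# allocation per call; the multiply itself stays memory-bound.
Pe = np.empty_like(A)
print(timeit.timeit(lambda: np.multiply(A, New, out=Pe), number=10) / 10)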
I have two arrays: R3_mod with shape (21, 21), containing many zeros, and P2 with shape (21,), also containing many zeros. I am getting the inverse of R3_mod using np.linalg.pinv() and then multiplying it by P2, as shown below. Is there a more efficient way to invert such arrays and then multiply?
Since the arrays are too big to post here, you can access them at: https://drive.google.com/drive/u/0/folders/1NjEiNoneMaCbmbmObEs2GCNIb08NFIy3
import numpy as np
X = np.linalg.pinv(R3_mod).dot(P2)
Assuming that the matrix R3_mod is indeed invertible, I think it's best to use np.linalg.inv instead of np.linalg.pinv.
inv computes the inverse of the matrix directly, whereas pinv (short for pseudo-inverse, see https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse) computes the matrix A' that minimizes |AA' - I|. If the input matrix is invertible, pinv should return the same result as inv.
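A minimal sketch of the swap, using random placeholder data in place of the linked files; the np.linalg.solve line is my own addition, shown because it solves the same system without forming any inverse, which is usually faster and more accurate for a single right-hand side:

import numpy as np

# Placeholder for the linked data: diagonally dominant, hence invertible.
R3_mod = np.random.rand(21, 21) + 21 * np.eye(21)
P2 = np.random.rand(21)

X_pinv = np.linalg.pinv(R3_mod).dot(P2)  # original approach
X_inv = np.linalg.inv(R3_mod).dot(P2)    # direct inverse, as suggested above

# My own addition: solve R3_mod @ X = P2 without forming any inverse.
X_solve = np.linalg.solve(R3_mod, P2)

print(np.allclose(X_pinv, X_inv), np.allclose(X_inv, X_solve))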
I am trying to compare two 3D numpy arrays to calculate their similarity. I have found these two posts, which I am trying to stitch together into something useful.
Comparing NumPy Arrays for Similarity
Subtracting numpy arrays of different shape efficiently
To make a long story short, I have two arrays created from 3D point clouds, so they are filled with 3D coordinates; because the 3D objects are different, the arrays have different lengths.
If requested, I can post some sample arrays, but each has over 1,000 points, so that would be a lot of text to post.
Here is what I am trying to do now. You can get array1 and array2 data here: https://pastebin.com/WbNvRUwG (array2 starts at line 1858).
array1 = [long np array with 3D coordinates]
array2 = [long np array with 3D coordinates]
array1_original = array1.copy()

if len(array1) < len(array2):
    array1, array2 = array2, array1

# The [:,None] is from the second link; it broadcasts the arrays
# to a common shape so they can be subtracted
array_difference = np.subtract(array1, array2[:,None])
array_abs_difference = np.absolute(array_difference)
array_total_difference = np.sum(array_abs_difference)

similarity = 1 - (array_total_difference / np.sum(array1_original))
My array differences are fine and represent what I want, so the most similar arrays have small differences, but when I take the sum of array1_original it comes out far smaller than my summed differences, and therefore my similarity score becomes negative.
I also tried calculating the difference between an array filled with zeros and array1_original, but it comes out about the same.
Can anyone tell me why np.sum(array1_original) would not be bigger than np.sum(array_abs_difference)?
The numpy comparison ended up being too slow, so I just used Open3D instead. It works for me.
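The self-answer does not include code, but here is a rough sketch of what an Open3D-based comparison might look like (these are real Open3D APIs, though how exactly the author used them is my assumption):

import numpy as np
import open3d as o3d

# array1, array2: (N, 3) float arrays of point coordinates from the question.
pcd1 = o3d.geometry.PointCloud()
pcd1.points = o3d.utility.Vector3dVector(array1)
pcd2 = o3d.geometry.PointCloud()
pcd2.points = o3d.utility.Vector3dVector(array2)

# Distance from each point in pcd1 to its nearest neighbour in pcd2;
# a small mean distance indicates similar clouds.
distances = np.asarray(pcd1.compute_point_cloud_distance(pcd2))
print(distances.mean())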
I have a NumPy array vectors = np.random.randn(rows, cols). I want to find differences between its rows according to some other array diffs, which is sparse and "2-hot": each of its rows contains a 1 in the column corresponding to the first row of vectors and a -1 in the column corresponding to the second row. Perhaps an example will make it clearer:
diffs = np.array([[ 1,  0, -1],
                  [ 1, -1,  0]])
then I can compute the row differences simply with diffs @ vectors.
Unfortunately this is slow when diffs is 10_000x1000 and vectors is 1000x15_000. I can get a speedup by using scipy.sparse: sparse.csr_matrix(diffs) @ vectors, but even this takes ~300 ms.
Possibly this is simply as fast as it gets, but part of me wonders whether matrix multiplication is the wisest choice for this task.
What's more, I need to take the absolute value afterwards, so I'm really doing np.abs(sparse.csr_matrix(diffs) @ vectors), which adds ~200 ms for a grand total of ~500 ms.
I can compute the row differences by simply diffs @ vectors.
This is very inefficient. A matrix multiplication runs in O(n*m*k) time for an (n,m) matrix multiplied by an (m,k) one. In your case there are only two non-zero values per row, and you do not actually need to multiply by 1 or -1, so your problem can be computed in O(n*k) time (i.e. m times faster).
Unfortunately this is slow for diffs of 10_000x1000 and vectors 1000x15_000. I can get a speedup by using scipy.sparse.
The issue is that the input data representation is inefficient. When diffs is an array of size (10_000, 1000), it is not reasonable to use a dense matrix that is ~1000 times bigger than needed, nor a sparse matrix that is not optimized for rows holding only two non-zero values (especially 1 and -1). Instead, store the positions of the non-zero values in a 2D array called sel_rows of shape (2, n), where the first row contains the locations of the 1s and the second row the locations of the -1s in the diffs 2D array. You can then extract the rows of vectors with, for example, vectors[sel_rows[0]], and perform the final operation with vectors[sel_rows[0,:]] - vectors[sel_rows[1,:]], as sketched below. This approach should be drastically faster than a dense matrix product, and it may be a bit faster than a sparse one depending on the target machine.
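A minimal sketch of that conversion and the fancy-indexed difference (assuming each row of diffs contains exactly one 1 and one -1, as in the example):

import numpy as np

# Build sel_rows from diffs: row 0 holds the column index of the 1,
# row 1 the column index of the -1, for each row of diffs.
sel_rows = np.vstack([np.where(diffs == 1)[1],
                      np.where(diffs == -1)[1]])

# Row differences without any matrix product.
res = vectors[sel_rows[0, :]] - vectors[sel_rows[1, :]]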
While the above solution is simple, it creates multiple temporary arrays that are not cache-friendly, and your output array alone takes 10_000 * 15_000 * 8 = 1.1 GiB (which is quite huge). You can use Numba to remove the temporary arrays and improve performance; multiple threads can improve it even further. Here is an untested implementation:
import numba as nb
import numpy as np

# Here `diffs` holds the index pairs, i.e. sel_rows transposed to shape
# (n, 2): column 0 is the row index of the +1, column 1 that of the -1.
@nb.njit('(int64[:,::1], float64[:,::1])', parallel=True)
def compute(diffs, vectors):
    n, k = diffs.shape[0], vectors.shape[1]
    assert diffs.shape[1] == 2
    res = np.empty((n, k))
    for i in nb.prange(n):
        a, b = diffs[i, 0], diffs[i, 1]
        for j in range(k):
            # Apply abs() here if needed, so as to avoid
            # creating new temporary arrays
            res[i, j] = vectors[a, j] - vectors[b, j]
    return res
The code above should be nearly optimal: it should be memory-bound and able to saturate the memory bandwidth. Note that writing such huge arrays to memory takes some time, as does reading the input array (twice). On x86-64 platforms, a basic implementation has to move about 4.4 GiB of data from/to RAM, so on a mainstream PC with 20 GiB/s of RAM bandwidth this takes about 220 ms. In that light, the sparse matrix result was not so bad for a sequential implementation.
If this is not enough for you, you can use single-precision floating-point numbers instead of double-precision (twice as fast). You could also use a low-level C/C++ implementation to reduce the memory bandwidth used (thanks to non-temporal store instructions: ~30% faster). There is not much more that can be done.
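For completeness, a hypothetical way to tie the two pieces together (the pairs variable and its layout are my assumption, chosen to match the compiled signature above):

# Stack sel_rows into the contiguous (n, 2) int64 layout that the
# compiled signature of compute() expects.
pairs = np.ascontiguousarray(sel_rows.T, dtype=np.int64)
res = compute(pairs, vectors)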
I have a 50-dimensional array whose dimensions are 255 x 255 x 255 x ... (50 times) ... x 255, for a total of 255^50 floating-point numbers. It is out of scope to even think of fitting this in RAM. Moreover, I need to take a 50-dimensional Fast Fourier Transform (DFT) of this array. I can't do it in Python on an ordinary PC, and I can't even imagine doing it on a GPU, so I am guessing I have to use hard disk storage, but even that is far too small. I don't need this in real time; I can afford days for it to run. I have no clue what sort of machine I need, or whether this is even possible. I would appreciate your advice: supercomputers, grids, or anything else, even if it is costly; I am not worried about the investment.
If you found enough universes to save your data in, here is what you could do:
The Fourier transform is separable; that means calculating the DFT along each axis, one after the other, gives the same result as calculating the n-dimensional DFT:
for i in range(C.ndim):
    C[...] = numpy.fft.fft(C, axis=i)
Double-checking that the result is correct using a 2D array (because numpy.fft.fft2 gives us a 2D FFT to compare against):
import numpy

A = numpy.random.rand(*[16] * 2)
B = numpy.fft.fft2(A)

# numpy.complex128 replaces the numpy.complex alias removed in NumPy 1.24
C = A.astype(numpy.complex128)  # output array for the separable FFT
for i in range(C.ndim):
    C[...] = numpy.fft.fft(C, axis=i)

numpy.allclose(C, B)  # True
I am concatenating data to a numpy array like this:
xdata_test = np.concatenate((xdata_test,additional_X))
This is done a thousand times. The arrays have dtype float32, and their sizes are shown below:
xdata_test.shape : (x1,40,24,24) (x1 : [500~10500])
additional_X.shape : (x2,40,24,24) (x2 : [0 ~ 500])
The problem is that when x1 is larger than ~2000-3000, the concatenation takes a lot longer.
A plot of concatenation time versus the size of the x2 dimension (not reproduced here) shows the slowdown clearly.
Is this a memory issue or a basic characteristic of numpy?
As far as I understand numpy, the stack and concatenate functions are not extremely efficient, and for good reason: numpy tries to keep array memory contiguous for efficiency (see this link about contiguous arrays in numpy).
That means that every concatenate operation has to copy the whole data every time. When I need to concatenate a bunch of elements together, I tend to do this:
l = []
for additional_X in ...:
    l.append(additional_X)
xdata_test = np.concatenate(l)
That way, the costly operation of moving the whole data is done only once.
NB: I would be interested in the speed improvement this gives you.
If you have the arrays you want to concatenate available in advance, I would suggest creating a new array with the total shape and filling it with the small arrays rather than concatenating, as every concatenation operation needs to copy the whole data to a new contiguous block of memory.
First, calculate the total size of the first axis:
max_x = 0
for arr in list_of_arrays:
    max_x += arr.shape[0]
Second, create the end container:
final_data = np.empty((max_x,) + xdata_test.shape[1:], dtype=xdata_test.dtype)
which here evaluates to (max_x, 40, 24, 24), with the trailing dimensions and the dtype taken dynamically from xdata_test.
Last, fill the numpy array:
curr_x = 0
for arr in list_of_arrays:
    final_data[curr_x:curr_x+arr.shape[0]] = arr
    curr_x += arr.shape[0]
The loop above copies each array into its previously assigned rows of the larger array.
This way, each of the N arrays is copied exactly once, to its final destination, rather than creating a temporary array for each concatenation.
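A self-contained sketch contrasting the two strategies (the shapes are small stand-ins for the ones in the question, and the helper names are mine):

import numpy as np
import timeit

# Small stand-ins for the chunks in the question.
chunks = [np.ones((10, 40, 24, 24), dtype=np.float32) for _ in range(50)]

def concat_repeatedly(chunks):
    # O(N^2) copying: every concatenate re-copies everything so far.
    out = chunks[0]
    for c in chunks[1:]:
        out = np.concatenate((out, c))
    return out

def fill_preallocated(chunks):
    # O(N) copying: each chunk is written once into its final slot.
    max_x = sum(c.shape[0] for c in chunks)
    out = np.empty((max_x,) + chunks[0].shape[1:], dtype=chunks[0].dtype)
    curr_x = 0
    for c in chunks:
        out[curr_x:curr_x + c.shape[0]] = c
        curr_x += c.shape[0]
    return out

assert np.array_equal(concat_repeatedly(chunks), fill_preallocated(chunks))
print(timeit.timeit(lambda: concat_repeatedly(chunks), number=3))
print(timeit.timeit(lambda: fill_preallocated(chunks), number=3))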