In a situation like the one below, how do I vstack the two matrices?
import numpy as np
a = np.array([[3,3,3],[3,3,3],[3,3,3]])
b = np.array([[2,2],[2,2],[2,2]])
a = np.vstack([a, b])
Output:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2
The output I would like would look like this:
a = array([[[3, 3, 3],
            [3, 3, 3],
            [3, 3, 3]],
           [[2, 2],
            [2, 2],
            [2, 2]]])
My goal is then to loop over the content of the stacked matrices, index each matrix and call a function on a specific row.
for matrix in a:
    row = matrix[1]
    print(row)
Output:
[3, 3, 3]
[2, 2]
Be careful with those "Numpy is faster" claims. If you already have arrays, and make full use of array methods, numpy is indeed faster. But if you start with lists, or have to use Python level iteration (as you do in Pack...), the numpy version might well be slower.
Just doing a time test on the Pack step:
In [12]: timeit Pack_Matrices_with_NaN([a,b,c],5)
221 µs ± 9.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Compare that with fetching row 1 of each array with a simple list comprehension:
In [13]: [row[1] for row in [a,b,c]]
Out[13]: [array([3., 3., 3.]), array([2., 2.]), array([4., 4., 4., 4.])]
In [14]: timeit [row[1] for row in [a,b,c]]
808 ns ± 2.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
200 µs compared to less than 1 µs!
And timing your Unpack:
In [21]: [Unpack_Matrix_with_NaN(packed_matrices.reshape(3,3,5),i)[1,:] for i in range(3)]
...:
Out[21]: [array([3., 3., 3.]), array([2., 2.]), array([4., 4., 4., 4.])]
In [22]: timeit [Unpack_Matrix_with_NaN(packed_matrices.reshape(3,3,5),i)[1,:] for i in range(3)]
199 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
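As a minimal sketch of the list-based alternative timed above (the names `matrices` and `rows` are illustrative, not from the question): matrices of different widths cannot form a regular 3-D array, but a plain Python list holds them as they are, and each one can still be indexed with NumPy.

```python
import numpy as np

a = np.array([[3, 3, 3], [3, 3, 3], [3, 3, 3]])
b = np.array([[2, 2], [2, 2], [2, 2]])

# A plain list keeps each matrix at its own shape; no padding is needed.
matrices = [a, b]

# Pick row 1 of every matrix with an ordinary comprehension.
rows = [m[1] for m in matrices]
for row in rows:
    print(row)
```

Any per-row function can then be called on each element of `rows` in the same loop.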
I was able to solve this using only NumPy. As NumPy is often significantly faster than plain Python lists (https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347), I wanted to share my answer as it might be useful to others.
I started by padding with np.nan to make the two arrays the same shape.
import numpy as np
a = np.array([[3,3,3],[3,3,3],[3,3,3]]).astype(float)
b = np.array([[2,2],[2,2],[2,2]]).astype(float)
# Extend each row of b with NaN so that it reaches the same shape as a
b = np.insert(b, 2, np.nan, axis=1)
# Now vstack the arrays
a = np.vstack([[a], [b]])
print(a)
Output:
[[[ 3. 3. 3.]
[ 3. 3. 3.]
[ 3. 3. 3.]]
[[ 2. 2. nan]
[ 2. 2. nan]
[ 2. 2. nan]]]
Then I wrote a function to unpack each array in a, and remove the nan.
def Unpack_Matrix_with_NaN(Matrix_with_nan, matrix_of_interest):
    for first_row in Matrix_with_nan[matrix_of_interest,:1]:
        # find shape of matrix row without nan
        first_row_without_nan = first_row[~np.isnan(first_row)]
        shape = first_row_without_nan.shape[0]
        matrix_without_nan = np.arange(shape)
        for row in Matrix_with_nan[matrix_of_interest]:
            row_without_nan = row[~np.isnan(row)]
            matrix_without_nan = np.vstack([matrix_without_nan, row_without_nan])
        # Remove vector specifying shape
        matrix_without_nan = matrix_without_nan[1:]
        return matrix_without_nan
I could then loop through the matrices, find my desired row, and print it.
Matrix_with_nan = a
for matrix in range(len(Matrix_with_nan)):
    matrix_of_interest = Unpack_Matrix_with_NaN(a, matrix)
    row = matrix_of_interest[1]
    print(row)
Output:
[3. 3. 3.]
[2. 2.]
I also made a function to pack matrices when more than one nan needs to be added per row:
import numpy as np
a = np.array([[3,3,3],[3,3,3],[3,3,3]]).astype(float)
b = np.array([[2,2],[2,2],[2,2]]).astype(float)
c = np.array([[4,4,4,4],[4,4,4,4],[4,4,4,4]]).astype(float)
# Extend each row of every array with NaN to reach the same width
def Pack_Matrices_with_NaN(List_of_matrices, Matrix_size):
    Matrix_with_nan = np.arange(Matrix_size)
    for array in List_of_matrices:
        start_position = len(array[0])
        for x in range(start_position,Matrix_size):
            array = np.insert(array, (x), np.nan, axis=1)
        Matrix_with_nan = np.vstack([Matrix_with_nan, array])
    Matrix_with_nan = Matrix_with_nan[1:]
    return Matrix_with_nan
arrays = [a,b,c]
packed_matrices = Pack_Matrices_with_NaN(arrays, 5)
print(packed_matrices)
Output:
[[ 3. 3. 3. nan nan]
[ 3. 3. 3. nan nan]
[ 3. 3. 3. nan nan]
[ 2. 2. nan nan nan]
[ 2. 2. nan nan nan]
[ 2. 2. nan nan nan]
[ 4. 4. 4. 4. nan]
[ 4. 4. 4. 4. nan]
[ 4. 4. 4. 4. nan]]
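A possible way to do the same padding without repeated np.insert and np.vstack calls is to allocate the NaN-filled result once and copy each matrix into its slot. This is only a sketch: `pad_with_nan` and `unpad` are hypothetical helper names, the result is 3-D (one slab per matrix) rather than the 2-D stack printed above, and it assumes all matrices have the same number of rows and contain no all-NaN columns of their own.

```python
import numpy as np

def pad_with_nan(matrices, width):
    """Stack 2-D float arrays with equal row counts into one NaN-padded 3-D array."""
    n_rows = matrices[0].shape[0]
    out = np.full((len(matrices), n_rows, width), np.nan)
    for i, m in enumerate(matrices):
        # Copy the matrix into the left part of its slab; the rest stays NaN.
        out[i, :, :m.shape[1]] = m
    return out

def unpad(padded, i):
    """Return matrix i with its all-NaN padding columns removed."""
    m = padded[i]
    keep = ~np.isnan(m).all(axis=0)   # columns that hold at least one real value
    return m[:, keep]

a = np.full((3, 3), 3.0)
b = np.full((3, 2), 2.0)
c = np.full((3, 4), 4.0)

packed = pad_with_nan([a, b, c], 5)
print(packed.shape)
print(unpad(packed, 1)[1])
```

The Python loop here runs once per matrix rather than once per inserted column, which is where most of the time in the original packing step goes.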
I have been tasked with applying a "1-2-1" filter to a numpy array and returning an array of the filtered data, without using for loops, while loops or list comprehensions.
The "1-2-1" filter maps each point of data to the average of itself twice and its neighbors. For example, if at some point the data contained ...1, 4, 3... then after applying the "1-2-1" filter the 4 would be replaced with (1 + 4 + 4 + 3) / 4 = 12 / 4 = 3.
For example, the numpy array [1, 1, 4, 3, 2] would, after the filter is applied, produce the numpy array [1. 1.75 3. 3. 2. ]
Since the end points of the data do not have two neighbors the 1-2-1 filter is only applied to the internal len(data) - 2 points, leaving the end points unchanged.
Essentially, I need to access the values before and after a given point in a vectorized numpy operation, for an array that could be of any length. As much as I have googled, I cannot work out how.
Pandas solution
import pandas as pd
l = [1, 1, 4, 3, 2]
s = pd.Series(l)
>>> s.rolling(3, center=True).sum().add(s).div(4).fillna(s).values
array([1. , 1.75, 3. , 3. , 2. ])
Step by step:
>>> s.rolling(3, center=True).sum().values
array([nan, 6., 8., 9., nan])
>>> s.rolling(3, center=True).sum().add(s).values
array([nan, 7., 12., 12., nan])
>>> s.rolling(3, center=True).sum().add(s).div(4).values
array([ nan, 1.75, 3. , 3. , nan])
>>> s.rolling(3, center=True).sum().add(s).div(4).fillna(s).values
array([1. , 1.75, 3. , 3. , 2. ])
Numpy solution
import numpy as np
a = np.array(l)
>>> np.concatenate([a[:1], np.convolve(a, [1, 2, 1], mode="valid") / 4, a[-1:]])
Step by step:
>>> np.convolve(a, [1, 2, 1], mode="valid")
array([ 7, 12, 12])
>>> np.convolve(a, [1, 2, 1], mode="valid") / 4
array([1.75, 3. , 3. ])
>>> np.concatenate([a[:1], np.convolve(a, [1, 2, 1], mode="valid") / 4, a[-1:]])
array([1. , 1.75, 3. , 3. , 2. ])
Performance
%timeit s.rolling(3, center=True).sum().add(s).div(4).fillna(s).values
706 µs ± 4.62 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.concatenate([a[:1], np.convolve(a, [1, 2, 1], mode="valid") / 4, a[-1:]])
10.8 µs ± 22.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
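The "values before and after a given point" can also be reached with three shifted slices of the same array. The sketch below is one way to write that; `filter_121` is a hypothetical name, and converting the input to float is an assumption made so that integer data does not truncate.

```python
import numpy as np

def filter_121(data):
    """Apply the 1-2-1 filter to the interior points, leaving the end points unchanged."""
    data = np.asarray(data, dtype=float)
    out = data.copy()
    # data[:-2] is the left neighbour, data[1:-1] the point itself,
    # data[2:] the right neighbour of every interior point.
    out[1:-1] = (data[:-2] + 2 * data[1:-1] + data[2:]) / 4
    return out

print(filter_121([1, 1, 4, 3, 2]))
```

For inputs shorter than three elements all three slices are empty, so the data comes back unchanged.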
You can try something like this:
import numpy as np
from scipy.linalg import circulant
x = np.array([1, 1, 4, 3, 2])
val = np.array([1, 2, 1])
offsets = np.array([0, 1, 2])
col0 = np.zeros(len(x))
col0[offsets] = val
C = circulant(col0).T[:-(len(val) - 1)]
print(C)
C is essentially a circulant matrix which looks like this:
array([[1., 2., 1., 0., 0.],
[0., 1., 2., 1., 0.],
[0., 0., 1., 2., 1.]])
Now you can simply compute the filtered output as follows:
y = (C @ x) / 4
print(y)
# array([1.75, 3. , 3. ])
I have tried the code below to multiply each float element of matrix a that is less than one by an integer, but it is not working. On the other hand, it works properly for a matrix whose elements are not floats, i.e. if you define a = np.arange(9).reshape(3,3) then it works.
import numpy as np
a = np.linspace(0,1,9).reshape(3,3)
print(a)
print('new matrix')
for x in np.nditer(a, op_flags = ['readwrite']):
    if x in range(0,1):
        x[...] = 100*x
print(a)
In [130]: a = np.linspace(0,1,9).reshape(3,3)
In [131]: a
Out[131]:
array([[0. , 0.125, 0.25 ],
[0.375, 0.5 , 0.625],
[0.75 , 0.875, 1. ]])
I don't usually recommend using nditer to iterate through an array. It's hard to use right, and rarely, if ever, improves speed. I'm not sure who or what is prompting people to use it. Its docs could use a stronger speed disclaimer.
Anyway, let's examine what's happening.
In [136]: for x in np.nditer(a, op_flags = ['readwrite']):
     ...:     print(type(x), x, x.shape)
     ...:     if x in range(0,1):
     ...:         x[...] = 100*x
     ...:         print('mul')
     ...:
<class 'numpy.ndarray'> 0.0 ()
mul
<class 'numpy.ndarray'> 0.125 ()
<class 'numpy.ndarray'> 0.25 ()
<class 'numpy.ndarray'> 0.375 ()
<class 'numpy.ndarray'> 0.5 ()
<class 'numpy.ndarray'> 0.625 ()
<class 'numpy.ndarray'> 0.75 ()
<class 'numpy.ndarray'> 0.875 ()
<class 'numpy.ndarray'> 1.0 ()
nditer runs through every element of the array (not rows), producing a 0d view each time (shape ()). Only one of those elements is 0, so only that one gets multiplied by 100. None of the others are in range(0,1): that range contains only the integer 0, so the test is False for everything else.
So the iteration is working, at least as coded, if not as you intend.
Using a = np.arange(9).reshape(3,3) doesn't change anything: only the 0 is in range(0,1).
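The membership test can be checked with plain Python numbers, independently of NumPy. The short sketch below shows what range(0,1) contains and what an interval test would look like instead; the variable `x` is just an illustrative value.

```python
# range(0, 1) contains exactly one value: the integer 0.
print(list(range(0, 1)))      # [0]
print(0 in range(0, 1))       # True
print(0.0 in range(0, 1))     # True, because 0.0 == 0
print(0.5 in range(0, 1))     # False: 0.5 equals no element of the range

# A numeric interval test needs comparisons instead.
x = 0.5
print(0 <= x < 1)             # True
```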
===
Change the if test:
In [146]: a = np.linspace(0,1,9).reshape(3,3)
In [147]: a
Out[147]:
array([[0. , 0.125, 0.25 ],
[0.375, 0.5 , 0.625],
[0.75 , 0.875, 1. ]])
In [148]: for x in np.nditer(a, op_flags = ['readwrite']):
     ...:     if x<1:
     ...:         x[...] = 100*x
     ...:         print('mul')
     ...:
mul
...
mul
In [149]: a
Out[149]:
array([[ 0. , 12.5, 25. ],
[37.5, 50. , 62.5],
[75. , 87.5, 1. ]])
An alternative to nditer is a flat iteration. In some ways that's messier since it requires an enumerate if we want to modify the original values:
In [150]: a = np.linspace(0,1,9).reshape(3,3)
In [151]: for i,v in enumerate(a.flat):
     ...:     if v<1:
     ...:         a.flat[i] *= 100
     ...:
In [152]: a
Out[152]:
array([[ 0. , 12.5, 25. ],
[37.5, 50. , 62.5],
[75. , 87.5, 1. ]])
But despite some claims in the nditer docs, nditer isn't faster:
In [153]: %%timeit a=np.linspace(0,1,9).reshape(3,3)
     ...: for i,v in enumerate(a.flat):
     ...:     if v<1:
     ...:         a.flat[i] *= 100
5.4 µs ± 186 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [154]:
In [154]: %%timeit a=np.linspace(0,1,9).reshape(3,3)
     ...: for x in np.nditer(a, op_flags = ['readwrite']):
     ...:     if x<1:
     ...:         x[...] = 100*x
34.4 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
===
But normally you shouldn't be iterating on an array. A whole-array, vectorized, approach is:
In [157]: mask = a<1
In [158]: mask
Out[158]:
array([[ True, True, True],
[ True, True, True],
[ True, True, False]])
In [159]: a[mask] *= 100
In [160]: a
Out[160]:
array([[ 0. , 12.5, 25. ],
[37.5, 50. , 62.5],
[75. , 87.5, 1. ]])
In [161]: %%timeit a=np.linspace(0,1,9).reshape(3,3)
...: a[a<1] *= 100
12.5 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Ouch! that's slower than the flat enumerate - for this small example. For a much larger a this will do much better.
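If the original array should stay untouched, the same mask can be used with np.where, which builds a new array instead of modifying a in place. A minimal sketch (the name `scaled` is illustrative):

```python
import numpy as np

a = np.linspace(0, 1, 9).reshape(3, 3)

# Where the condition holds take a * 100, elsewhere keep the original value.
scaled = np.where(a < 1, a * 100, a)
print(scaled)
```

Note that np.where evaluates `a * 100` for the whole array before selecting, so it trades a little extra work for not mutating `a`.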
You can apply the range "manually" in a boolean filter to multiply only the elements you are targeting:
import numpy as np
a = np.linspace(0,1,9).reshape(3,3)
print(a)
print('new matrix')
a[(a>0) & (a<1)] *= 100
print(a)
[[0. 0.125 0.25 ]
[0.375 0.5 0.625]
[0.75 0.875 1. ]]
new matrix
[[ 0. 12.5 25. ]
[37.5 50. 62.5]
[75. 87.5 1. ]]
Note that your linspace only ever generates one value at or above the upper bound (the last one), so you might as well multiply everything by 100 and then reassign the last value to 1.
I would like to add float coordinates to a numpy array by splitting the intensity based on the centre of mass of the coordinate to neighbouring pixels.
As an example with integers:
import numpy as np
arr = np.zeros((5, 5), dtype=float)
coord = [2, 2]
arr[coord[0], coord[1]] = 1
arr
>>> array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])
However I would like to distribute the intensity across neighbouring pixels when coord is float data, eg. coord = [2.2, 1.7].
I have considered using a gaussian, eg:
grid = np.meshgrid(*[np.arange(i) for i in arr.shape], indexing='ij')
out = np.exp(-np.dstack([(grid[i]-c)**2 for i, c in enumerate(coord)]).sum(axis=-1) / 0.5**2)
which gives good results, but becomes slow for 3d data and thousands of points.
Any advice or ideas would be appreciated, thanks.
Based on @rpoleski's suggestion, take a local region and apply weighting by distance. This is a good idea, although the implementation I have does not maintain the original centre of mass of the coordinates, for example:
from scipy.ndimage import center_of_mass
coord = [2.2, 1.7]
# get region coords
grid = np.meshgrid(*[range(2) for i in coord], indexing='ij')
# Euclidean distance between each region pixel and the fractional part of coord
delta = np.linalg.norm(np.dstack([g-(c%1) for g, c, in zip(grid, coord)]), axis=-1)
value = 3 # pixel value of original coord
# create final array by 1/delta, ie. closer is weighted more
# normalise by sum of 1/delta
out = value * (1/delta) / (1/delta).sum()
out.sum()
>>> 3.0 # as expected
# but
center_of_mass(out)
>>> (0.34, 0.63) # should be (0.2, 0.7) in this case, ie. from coord
Any ideas?
Here is a simple (and hence most probably fast enough) solution that keeps the center of mass and has sum = 1:
import numpy as np
from scipy.ndimage import center_of_mass

arr = np.zeros((5, 5), dtype=float)
coord = [2.2, 1.7]
# the four pixels surrounding coord
indexes = np.array([[x, y] for x in [int(coord[0]), int(coord[0])+1] for y in [int(coord[1]), int(coord[1])+1]])
# weight each pixel by the inverse of the product of its per-axis distances to coord
values = [1. / (abs(coord[0]-index[0]) * abs(coord[1]-index[1])) for index in indexes]
sum_values = sum(values)
for (value, index) in zip(values, indexes):
    arr[index[0], index[1]] = value / sum_values
print(arr)
print(center_of_mass(arr))
which results in:
[[0. 0. 0. 0. 0. ]
[0. 0. 0. 0. 0. ]
[0. 0.24 0.56 0. 0. ]
[0. 0.06 0.14 0. 0. ]
[0. 0. 0. 0. 0. ]]
(2.2, 1.7)
Note: I'm using taxicab distances - they're good for center of mass calculations.
For anyone needing this functionality, and thanks to @rpoleski, I came up with this, which uses Numba to speed up the calculation.
import numba
import numpy as np

@numba.njit
def _add_floats_to_array_2d(coords, arr, values):
    """
    Distribute float values around neighbouring pixels in array whilst maintaining center of mass.
    Uses Manhattan (taxicab) distances for center of mass calculation.
    This function uses numba to speed up the calculation but is limited to exactly 2D.

    Parameters
    ----------
    coords: (N, ndim) ndarray
        Floats to distribute into array.
    arr: ndim ndarray
        Floats will be distributed into this array.
        Array is modified in place.
    values: (N,) arraylike
        The total value of each coord to distribute into arr.
    """
    # offsets of the 2x2 neighbourhood, laid out so that indices_local[j, k] == [j, k]
    indices_local = np.array([[[0, 0], [0, 1]], [[1, 0], [1, 1]]])
    for i, c in enumerate(coords):
        temp_abs = np.abs(indices_local - np.remainder(c, 1))
        temp = 1.0 / (temp_abs[..., 0] * temp_abs[..., 1])
        # handle perfect integers
        for j in range(temp.shape[0]):
            for k in range(temp.shape[1]):
                if np.isinf(temp[j, k]):
                    temp[j, k] = 0
        arr[int(c[0]) : int(c[0]) + 2, int(c[1]) : int(c[1]) + 2] += (
            values[i] * temp / temp.sum()
        )
Some testing:
arr = np.zeros((256, 256))
coords = np.random.rand(10000, 2) * arr.shape[0]
values = np.ones(len(coords))
%timeit arr[tuple(coords.astype(int).T)] = values
>>> 106 µs ± 4.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit _add_floats_to_array_2d(coords, arr, values)
>>> 13.5 ms ± 546 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In fact it is better to compare to an unbuffered function such as np.add.at, as the first test will overwrite any previous values instead of accumulating:
%timeit np.add.at(arr, tuple(coords.astype(int).T), values)
>>> 1.23 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
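The normalised inverse-distance weights used above work out to be the ordinary bilinear weights (1 - distance along each axis), which need no division and no special case for integer coordinates. The sketch below uses that form and vectorises over all points with np.add.at. `splat_bilinear` is a hypothetical name, and the code assumes every coordinate lies within [0, shape - 1] on each axis; points outside that range would be clipped onto the border and shift the centre of mass.

```python
import numpy as np

def splat_bilinear(arr, coords, values):
    """Add each value to the 2x2 pixels around its float coordinate (in place)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    base = np.floor(coords).astype(int)   # top-left pixel of each 2x2 block
    frac = coords - base                  # fractional offsets in [0, 1)
    for dr in (0, 1):
        for dc in (0, 1):
            # Bilinear weight: (1 - distance) along each axis.
            w_r = frac[:, 0] if dr else 1.0 - frac[:, 0]
            w_c = frac[:, 1] if dc else 1.0 - frac[:, 1]
            # Clip so that points on the last row/column stay in bounds;
            # their far-side weight is zero when the coordinate is an integer.
            r = np.clip(base[:, 0] + dr, 0, arr.shape[0] - 1)
            c = np.clip(base[:, 1] + dc, 0, arr.shape[1] - 1)
            # np.add.at accumulates correctly when indices repeat.
            np.add.at(arr, (r, c), values * w_r * w_c)
    return arr

arr = np.zeros((5, 5))
splat_bilinear(arr, [[2.2, 1.7]], [1.0])
print(arr.round(2))
```

The only Python-level loop is over the four corners of the neighbourhood, so the cost per point is carried by NumPy; whether this beats the Numba version on real data would need to be timed.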
For example, I have a matrix A of shape (3,2,2), e.g.
[
[[1,1],[1,1]],
[[2,2],[2,2]],
[[3,3],[3,3]]
]
and matrix B of shape (2,2), e.g.
[[1, 1], [0,1]]
I would like to achieve c of shape (3,2,2) like:
c = np.zeros((3,2,2))
for i in range(len(A)):
    c[i] = np.dot(B, A[i,:,:])
which gives
[[[2. 2.]
[1. 1.]]
[[4. 4.]
[2. 2.]]
[[6. 6.]
[3. 3.]]]
What is the most efficient way to achieve this?
Thanks.
Use np.tensordot and then swap axes. So, use one of these -
np.tensordot(B,A,axes=((1),(1))).swapaxes(0,1)
np.tensordot(A,B,axes=((1),(1))).swapaxes(1,2)
We can also reshape A to 2D after swapping axes, use 2D matrix multiplication with np.dot, then reshape and swap axes back, to maybe gain a marginal performance boost.
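The reshape route described above could look roughly like the sketch below, checked against the loop from the question. The `@` (matmul) and np.einsum lines are additional alternatives that are not part of this answer; the variable names are illustrative.

```python
import numpy as np

A = np.array([[[1, 1], [1, 1]],
              [[2, 2], [2, 2]],
              [[3, 3], [3, 3]]])
B = np.array([[1, 1], [0, 1]])

# Loop version from the question, used as the reference result.
expected = np.stack([B.dot(A[i]) for i in range(len(A))])

# Reshape route: move the contracted axis to the front, flatten the rest,
# do one 2-D product, then undo the reshape and the axis swap.
n, k, r = A.shape
c_reshape = (
    B.dot(A.swapaxes(0, 1).reshape(k, n * r))
    .reshape(B.shape[0], n, r)
    .swapaxes(0, 1)
)

# matmul broadcasts B over the leading axis of A.
c_matmul = B @ A

# einsum spells the contraction out explicitly.
c_einsum = np.einsum('jk,ikl->ijl', B, A)

print(c_matmul)
```

All three produce the same (3,2,2) result as the loop; which one is fastest depends on the array sizes and would need to be timed.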
Timings -
# Original approach
def orgapp(A,B):
    m = A.shape[0]
    n = B.shape[0]
    r = A.shape[2]
    c = np.zeros((m,n,r))
    for i in range(len(A)):
        c[i] = np.dot(B, A[i,:,:])
    return c
In [91]: n = 10000
...: A = np.random.rand(n,2,2)
...: B = np.random.rand(2,2)
In [92]: %timeit orgapp(A,B)
100 loops, best of 3: 12.2 ms per loop
In [93]: %timeit np.tensordot(B,A,axes=((1),(1))).swapaxes(0,1)
1000 loops, best of 3: 191 µs per loop
In [94]: %timeit np.tensordot(A,B,axes=((1),(1))).swapaxes(1,2)
1000 loops, best of 3: 208 µs per loop
# @Bitwise's solution
In [95]: %timeit np.flip(np.dot(A,B).transpose((0,2,1)),1)
1000 loops, best of 3: 697 µs per loop
Another solution:
np.flip(np.dot(A,B).transpose((0,2,1)),1)
I'm searching for an efficient way to create a matrix of occurrences from two arrays that contain indexes: one represents the row indexes in this matrix, the other the column indexes.
eg. I have:
#matrix will be size 4x3 in this example
#array of rows idxs, with values from 0 to 3
[0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3]
#array of columns idxs, with values from 0 to 2
[0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2]
And need to create a matrix of occurrences like:
[[1 0 0]
[0 2 0]
[0 1 2]
[2 1 5]]
I can create an array of one-hot vectors in a simple form, but can't get it to work when there is more than one occurrence:
n_rows = 4
n_columns = 3
#data
rows = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])
#empty matrix
new_matrix = np.zeros([n_rows, n_columns])
#adding 1 for each [row, column] occurrence:
new_matrix[rows, columns] += 1
print(new_matrix)
Which returns:
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 1. 1.]
[ 1. 1. 1.]]
It seems like indexing and adding a value like this doesn't work when an index occurs more than once, although indexing for printing seems to work just fine:
print(new_matrix[rows, :])
Output:
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 1. 0.]
[ 0. 1. 1.]
[ 0. 1. 1.]
[ 0. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
So maybe I'm missing something there? Or can't this be done, and I need to search for another way to do it?
Use np.add.at, specifying a tuple of indices:
>>> np.add.at(new_matrix, (rows, columns), 1)
>>> new_matrix
array([[ 1., 0., 0.],
[ 0., 2., 0.],
[ 0., 1., 2.],
[ 2., 1., 5.]])
np.add.at operates on the array in-place, adding 1 to each position as many times as it appears in the (rows, columns) tuple.
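The difference from plain fancy-index assignment is easiest to see on a small 1-D case. In this sketch (the array names are illustrative) index 0 appears twice:

```python
import numpy as np

idx = np.array([0, 0, 2])

buffered = np.zeros(3)
buffered[idx] += 1              # index 0 appears twice but is incremented once
print(buffered)                 # [1. 0. 1.]

unbuffered = np.zeros(3)
np.add.at(unbuffered, idx, 1)   # every occurrence is applied
print(unbuffered)               # [2. 0. 1.]
```

With `+=`, the right-hand side is computed once from the original values and then written back, so repeated indices overwrite each other instead of accumulating.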
Approach #1
We can convert those pairs to linear indices and then use np.bincount -
def bincount_app(rows, columns, n_rows, n_columns):
    # Get linear index equivalent (row-major, so the stride is n_columns)
    lidx = n_columns*rows + columns
    # Use binned count on the linear indices
    return np.bincount(lidx, minlength=n_rows*n_columns).reshape(n_rows,n_columns)
Sample run -
In [242]: n_rows = 4
...: n_columns = 3
...:
...: rows = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
...: columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])
In [243]: bincount_app(rows, columns, n_rows, n_columns)
Out[243]:
array([[1, 0, 0],
[0, 2, 0],
[0, 1, 2],
[2, 1, 5]])
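NumPy can also compute the linear index of each (row, column) pair directly from the output shape with np.ravel_multi_index, which avoids writing the stride by hand. A sketch of that variant, with `bincount_ravel` as a hypothetical name:

```python
import numpy as np

def bincount_ravel(rows, columns, n_rows, n_columns):
    """Count (row, column) occurrences via row-major linear indices."""
    shape = (n_rows, n_columns)
    # Linear index of every pair in an array of the given shape.
    lidx = np.ravel_multi_index((rows, columns), shape)
    return np.bincount(lidx, minlength=n_rows * n_columns).reshape(shape)

rows = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])
counts = bincount_ravel(rows, columns, 4, 3)
print(counts)
```

By default np.ravel_multi_index raises an error for out-of-range indices, which catches a mismatch between the index arrays and the requested shape early.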
Approach #2
Alternatively, we can sort the linear indices and get the counts using slicing to have our second approach, like so -
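def mask_diff_app(rows, columns, n_rows, n_columns):
    # Linear index equivalent (row-major, so the stride is n_columns)
    lidx = n_columns*rows + columns
    lidx.sort()
    # True at the start of every run of equal indices, plus a closing True
    mask = np.concatenate(([True],lidx[1:] != lidx[:-1],[True]))
    # Run lengths are the occurrence counts
    count = np.diff(np.flatnonzero(mask))
    new_matrix = np.zeros([n_rows, n_columns],dtype=int)
    new_matrix.flat[lidx[mask[:-1]]] = count
    return new_matrix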
Approach #3
This seems like a straight-forward one with sparse matrix csr_matrix as well, as it does accumulation on its own for repeated indices. The benefit is the memory efficiency, given that it's a sparse matrix, which would be noticeable if you are filling a small number of places in the output and a sparse matrix output is okay.
The implementation would look something like this -
from scipy.sparse import csr_matrix
def sparse_matrix_app(rows, columns, n_rows, n_columns):
    out_shp = (n_rows, n_columns)
    data = np.ones(len(rows),dtype=int)
    return csr_matrix((data, (rows, columns)), shape=out_shp)
If you need a regular/dense array, simply do -
sparse_matrix_app(rows, columns, n_rows, n_columns).toarray()
Sample output -
In [319]: sparse_matrix_app(rows, columns, n_rows, n_columns).toarray()
Out[319]:
array([[1, 0, 0],
[0, 2, 0],
[0, 1, 2],
[2, 1, 5]])
Benchmarking
Other approach(es) -
# @cᴏʟᴅsᴘᴇᴇᴅ's soln
def add_at_app(rows, columns, n_rows, n_columns):
    new_matrix = np.zeros([n_rows, n_columns],dtype=int)
    np.add.at(new_matrix, (rows, columns), 1)
    return new_matrix
Timings
Case #1 : Output array of shape (1000, 1000) and no. of indices = 10k
In [307]: # Setup
...: n_rows = 1000
...: n_columns = 1000
...: rows = np.random.randint(0,1000,(10000))
...: columns = np.random.randint(0,1000,(10000))
In [308]: %timeit add_at_app(rows, columns, n_rows, n_columns)
...: %timeit bincount_app(rows, columns, n_rows, n_columns)
...: %timeit mask_diff_app(rows, columns, n_rows, n_columns)
...: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns)
1000 loops, best of 3: 1.05 ms per loop
1000 loops, best of 3: 424 µs per loop
1000 loops, best of 3: 1.05 ms per loop
1000 loops, best of 3: 1.41 ms per loop
Case #2 : Output array of shape (1000, 1000) and no. of indices = 100k
In [309]: # Setup
...: n_rows = 1000
...: n_columns = 1000
...: rows = np.random.randint(0,1000,(100000))
...: columns = np.random.randint(0,1000,(100000))
In [310]: %timeit add_at_app(rows, columns, n_rows, n_columns)
...: %timeit bincount_app(rows, columns, n_rows, n_columns)
...: %timeit mask_diff_app(rows, columns, n_rows, n_columns)
...: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns)
100 loops, best of 3: 11.4 ms per loop
1000 loops, best of 3: 1.27 ms per loop
100 loops, best of 3: 7.44 ms per loop
10 loops, best of 3: 20.4 ms per loop
Case #3 : Sparse-ness in output
As stated earlier, for the sparse method to work better, we would need sparse-ness. Such a case would be like this -
In [314]: # Setup
...: n_rows = 5000
...: n_columns = 5000
...: rows = np.random.randint(0,5000,(1000))
...: columns = np.random.randint(0,5000,(1000))
In [315]: %timeit add_at_app(rows, columns, n_rows, n_columns)
...: %timeit bincount_app(rows, columns, n_rows, n_columns)
...: %timeit mask_diff_app(rows, columns, n_rows, n_columns)
...: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns)
100 loops, best of 3: 11.7 ms per loop
100 loops, best of 3: 11.1 ms per loop
100 loops, best of 3: 11.1 ms per loop
1000 loops, best of 3: 269 µs per loop
If you need a dense array, we lose the memory efficiency and hence the performance advantage as well -
In [317]: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns).toarray()
100 loops, best of 3: 11.7 ms per loop