In a situation like the one below, how do I vstack the two matrices?
import numpy as np
a = np.array([[3,3,3],[3,3,3],[3,3,3]])
b = np.array([[2,2],[2,2],[2,2]])
a = np.vstack([a, b])
Output:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2
The output I would like would look like this:
a = array([[[3, 3, 3],
            [3, 3, 3],
            [3, 3, 3]],
           [[2, 2],
            [2, 2],
            [2, 2]]])
My goal is then to loop over the stacked matrices, index each matrix, and call a function on a specific row.
for matrix in a:
    row = matrix[1]
    print(row)
Output:
[3, 3, 3]
[2, 2]
Be careful with those "NumPy is faster" claims. If you already have arrays and make full use of array methods, NumPy is indeed faster. But if you start with lists, or have to use Python-level iteration (as you do in Pack_Matrices_with_NaN), the NumPy version may well be slower.
Just doing a time test on the Pack step:
In [12]: timeit Pack_Matrices_with_NaN([a,b,c],5)
221 µs ± 9.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Compare that with fetching the first row of each array with a simple list comprehension:
In [13]: [row[1] for row in [a,b,c]]
Out[13]: [array([3., 3., 3.]), array([2., 2.]), array([4., 4., 4., 4.])]
In [14]: timeit [row[1] for row in [a,b,c]]
808 ns ± 2.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
200 µs compared to less than 1 µs!
And timing your Unpack:
In [21]: [Unpack_Matrix_with_NaN(packed_matrices.reshape(3,3,5), i)[1,:] for i in range(3)]
Out[21]: [array([3., 3., 3.]), array([2., 2.]), array([4., 4., 4., 4.])]
In [22]: timeit [Unpack_Matrix_with_NaN(packed_matrices.reshape(3,3,5), i)[1,:] for i in range(3)]
199 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
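If you really do want a single NumPy container for ragged arrays, an object-dtype array is an option. Here is a minimal sketch; note the elements are still ordinary Python references, so this buys you no vectorized speed:
import numpy as np

a = np.array([[3, 3, 3], [3, 3, 3], [3, 3, 3]])
b = np.array([[2, 2], [2, 2], [2, 2]])
c = np.array([[4, 4, 4, 4], [4, 4, 4, 4], [4, 4, 4, 4]])

# Preallocate an object array and fill it; this sidesteps the
# shape-inference error that np.array([a, b, c]) can raise.
ragged = np.empty(3, dtype=object)
ragged[:] = [a, b, c]

for matrix in ragged:
    print(matrix[1])   # second row of each matrix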
I was able to solve this using only NumPy. As NumPy can be significantly faster than Python lists (https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347), I wanted to share my answer, as it might be useful to others.
I started by adding np.nan to make the two arrays the same shape.
import numpy as np
a = np.array([[3,3,3],[3,3,3],[3,3,3]]).astype(float)
b = np.array([[2,2],[2,2],[2,2]]).astype(float)
# Extend each vector in array with Nan to reach same shape
b = np.insert(b, 2, np.nan, axis=1)
# Now vstack the arrays
a = np.vstack([[a], [b]])
print(a)
Output:
[[[ 3.  3.  3.]
  [ 3.  3.  3.]
  [ 3.  3.  3.]]

 [[ 2.  2. nan]
  [ 2.  2. nan]
  [ 2.  2. nan]]]
Then I wrote a function to unpack each array in a, and remove the nan.
def Unpack_Matrix_with_NaN(Matrix_with_nan, matrix_of_interest):
    for first_row in Matrix_with_nan[matrix_of_interest, :1]:
        # find shape of matrix row without nan
        first_row_without_nan = first_row[~np.isnan(first_row)]
        shape = first_row_without_nan.shape[0]
        matrix_without_nan = np.arange(shape)
    for row in Matrix_with_nan[matrix_of_interest]:
        row_without_nan = row[~np.isnan(row)]
        matrix_without_nan = np.vstack([matrix_without_nan, row_without_nan])
    # Remove vector specifying shape
    matrix_without_nan = matrix_without_nan[1:]
    return matrix_without_nan
I could then loop through the matrices, find my desired row, and print it.
Matrix_with_nan = a
for matrix in range(len(Matrix_with_nan)):
    matrix_of_interest = Unpack_Matrix_with_NaN(a, matrix)
    row = matrix_of_interest[1]
    print(row)
Output:
[3. 3. 3.]
[2. 2.]
I also made a function to pack matrices when more than one nan needs to be added per row:
import numpy as np
a = np.array([[3,3,3],[3,3,3],[3,3,3]]).astype(float)
b = np.array([[2,2],[2,2],[2,2]]).astype(float)
c = np.array([[4,4,4,4],[4,4,4,4],[4,4,4,4]]).astype(float)
# Extend each vector in array with Nan to reach same shape
def Pack_Matrices_with_NaN(List_of_matrices, Matrix_size):
    Matrix_with_nan = np.arange(Matrix_size)
    for array in List_of_matrices:
        start_position = len(array[0])
        for x in range(start_position, Matrix_size):
            array = np.insert(array, (x), np.nan, axis=1)
        Matrix_with_nan = np.vstack([Matrix_with_nan, array])
    # Remove the np.arange seed row
    Matrix_with_nan = Matrix_with_nan[1:]
    return Matrix_with_nan
arrays = [a,b,c]
packed_matrices = Pack_Matrices_with_NaN(arrays, 5)
print(packed_matrices)
Output:
[[ 3.  3.  3. nan nan]
 [ 3.  3.  3. nan nan]
 [ 3.  3.  3. nan nan]
 [ 2.  2. nan nan nan]
 [ 2.  2. nan nan nan]
 [ 2.  2. nan nan nan]
 [ 4.  4.  4.  4. nan]
 [ 4.  4.  4.  4. nan]
 [ 4.  4.  4.  4. nan]]
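For what it's worth, the padding step can also be written with np.pad, which avoids the repeated np.insert calls. A minimal sketch, assuming float inputs (so NaN is representable); the name pack_with_pad is just for illustration:
def pack_with_pad(list_of_matrices, matrix_size):
    # pad each matrix on the right with NaN up to matrix_size columns
    padded = [np.pad(m, ((0, 0), (0, matrix_size - m.shape[1])),
                     constant_values=np.nan)
              for m in list_of_matrices]
    return np.vstack(padded)

print(pack_with_pad([a, b, c], 5))   # same output as above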
I have tried the code below to multiply the float elements of matrix a that are less than one by an integer, but it isn't working. On the other hand, it works properly for a matrix whose elements are not floats, i.e. if you define a = np.arange(9).reshape(3,3), then it works.
import numpy as np
a = np.linspace(0,1,9).reshape(3,3)
print(a)
print('new matrix')
for x in np.nditer(a, op_flags = ['readwrite']):
    if x in range(0,1):
        x[...] = 100*x
print(a)
In [130]: a = np.linspace(0,1,9).reshape(3,3)
In [131]: a
Out[131]:
array([[0.   , 0.125, 0.25 ],
       [0.375, 0.5  , 0.625],
       [0.75 , 0.875, 1.   ]])
I don't usually recommend using nditer to iterate through an array. It's hard to use right, and rarely, if ever, improves speed. I'm not sure who or what is prompting people to use it. Its docs could use a stronger speed disclaimer.
Anyway, let's examine what's happening.
In [136]: for x in np.nditer(a, op_flags = ['readwrite']):
     ...:     print(type(x), x, x.shape)
     ...:     if x in range(0,1):
     ...:         x[...] = 100*x
     ...:         print('mul')
     ...:
<class 'numpy.ndarray'> 0.0 ()
mul
<class 'numpy.ndarray'> 0.125 ()
<class 'numpy.ndarray'> 0.25 ()
<class 'numpy.ndarray'> 0.375 ()
<class 'numpy.ndarray'> 0.5 ()
<class 'numpy.ndarray'> 0.625 ()
<class 'numpy.ndarray'> 0.75 ()
<class 'numpy.ndarray'> 0.875 ()
<class 'numpy.ndarray'> 1.0 ()
nditer runs through every element of the array (not its rows), producing a 0d view each time (shape ()). Only one of those elements is 0, so only that one is multiplied by 100. None of the others are in range(0,1): only 0 is in range(0,1); for everything else the test is False.
So the iteration does work, at least as coded, if not as you intend.
Using a = np.arange(9).reshape(3,3) doesn't change anything; only the 0 is in range(0,1).
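A quick interpreter check confirms that membership test:
>>> 0 in range(0, 1)
True
>>> 0.5 in range(0, 1)
False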
===
Change the if test:
In [146]: a = np.linspace(0,1,9).reshape(3,3)
In [147]: a
Out[147]:
array([[0.   , 0.125, 0.25 ],
       [0.375, 0.5  , 0.625],
       [0.75 , 0.875, 1.   ]])
In [148]: for x in np.nditer(a, op_flags = ['readwrite']):
     ...:     if x<1:
     ...:         x[...] = 100*x
     ...:         print('mul')
     ...:
mul
...
mul
In [149]: a
Out[149]:
array([[ 0. , 12.5, 25. ],
       [37.5, 50. , 62.5],
       [75. , 87.5,  1. ]])
An alternative to nditer is flat iteration. In some ways that's messier, since it requires an enumerate if we want to modify the original values:
In [150]: a = np.linspace(0,1,9).reshape(3,3)
In [151]: for i,v in enumerate(a.flat):
     ...:     if v<1:
     ...:         a.flat[i] *= 100
     ...:
In [152]: a
Out[152]:
array([[ 0. , 12.5, 25. ],
       [37.5, 50. , 62.5],
       [75. , 87.5,  1. ]])
But despite some claims in the nditer docs, it isn't faster:
In [153]: %%timeit a=np.linspace(0,1,9).reshape(3,3)
     ...: for i,v in enumerate(a.flat):
     ...:     if v<1:
     ...:         a.flat[i] *= 100
5.4 µs ± 186 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [154]: %%timeit a=np.linspace(0,1,9).reshape(3,3)
     ...: for x in np.nditer(a, op_flags = ['readwrite']):
     ...:     if x<1:
     ...:         x[...] = 100*x
34.4 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
===
But normally you shouldn't be iterating on an array. A whole-array, vectorized, approach is:
In [157]: mask = a<1
In [158]: mask
Out[158]:
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True, False]])
In [159]: a[mask] *= 100
In [160]: a
Out[160]:
array([[ 0. , 12.5, 25. ],
       [37.5, 50. , 62.5],
       [75. , 87.5,  1. ]])
In [161]: %%timeit a=np.linspace(0,1,9).reshape(3,3)
...: a[a<1] *= 100
12.5 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Ouch! That's slower than the flat enumerate for this small example. For a much larger a, though, the whole-array approach will do much better.
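An equivalent whole-array form, if making a new array instead of modifying in place is acceptable, is np.where:
a = np.linspace(0,1,9).reshape(3,3)
a = np.where(a < 1, a * 100, a)   # multiply only where the condition holds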
You can apply the range "manually" in a boolean filter to multiply only the elements you are targeting:
import numpy as np
a = np.linspace(0,1,9).reshape(3,3)
print(a)
print('new matrix')
a[(a>0) & (a<1)] *= 100
print(a)
[[0.    0.125 0.25 ]
 [0.375 0.5   0.625]
 [0.75  0.875 1.   ]]
new matrix
[[ 0.   12.5  25. ]
 [37.5  50.   62.5]
 [75.   87.5   1. ]]
Note that your linspace only ever generates one value outside the range (the last one, 1.0), and the first value, 0, is unchanged by the multiplication anyway, so you might as well multiply everything and reassign the last value to 1.
I'm searching for an efficient way to create a matrix of occurrences from two arrays that contain indexes: one represents the row indexes of this matrix, the other the column ones.
eg. I have:
#matrix will be size 4x3 in this example
#array of rows idxs, with values from 0 to 3
[0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3]
#array of columns idxs, with values from 0 to 2
[0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2]
And need to create a matrix of occurrences like:
[[1 0 0]
 [0 2 0]
 [0 1 2]
 [2 1 5]]
I can create an array of one-hot vectors in a simple form, but can't get it to work when there is more than one occurrence:
n_rows = 4
n_columns = 3
#data
rows = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])
#empty matrix
new_matrix = np.zeros([n_rows, n_columns])
#adding 1 for each [row, column] occurrence:
new_matrix[rows, columns] += 1
print(new_matrix)
Which returns:
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  1.]
 [ 1.  1.  1.]]
It seems like indexing and adding a value like this doesn't work when an index occurs more than once, although plain indexed printing seems to work just fine:
print(new_matrix[rows, :])
Output:
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  1.]
 [ 0.  1.  1.]
 [ 0.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
So maybe I'm missing something there? Or can this not be done, and do I need to search for another way to do it?
Use np.add.at, specifying a tuple of indices:
>>> np.add.at(new_matrix, (rows, columns), 1)
>>> new_matrix
array([[ 1.,  0.,  0.],
       [ 0.,  2.,  0.],
       [ 0.,  1.,  2.],
       [ 2.,  1.,  5.]])
np.add.at operates on the array in place, adding 1 as many times as each index appears in the (rows, columns) tuple.
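The reason plain fancy-indexed += only adds once is that the operation is buffered: duplicate indices resolve to a single write. A tiny demonstration:
>>> x = np.zeros(3)
>>> x[[0, 0, 1]] += 1            # duplicate index 0 is counted only once
>>> x
array([1., 1., 0.])
>>> y = np.zeros(3)
>>> np.add.at(y, [0, 0, 1], 1)   # unbuffered: duplicates accumulate
>>> y
array([2., 1., 0.])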
Approach #1
We can convert those pairs to linear indices and then use np.bincount -
def bincount_app(rows, columns, n_rows, n_columns):
    # Get linear index equivalent (using n_columns rather than
    # columns.max()+1 keeps this correct even when the highest
    # column index never appears in the data)
    lidx = n_columns*rows + columns
    # Use binned count on the linear indices
    return np.bincount(lidx, minlength=n_rows*n_columns).reshape(n_rows, n_columns)
Sample run -
In [242]: n_rows = 4
...: n_columns = 3
...:
...: rows = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3])
...: columns = np.array([0, 1, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 2, 2])
In [243]: bincount_app(rows, columns, n_rows, n_columns)
Out[243]:
array([[1, 0, 0],
       [0, 2, 0],
       [0, 1, 2],
       [2, 1, 5]])
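The linear-index step can equivalently be written with np.ravel_multi_index, which also validates the indices against the output shape; the name bincount_rmi_app is just for illustration:
def bincount_rmi_app(rows, columns, n_rows, n_columns):
    # convert (row, col) pairs to flat indices into an (n_rows, n_columns) grid
    lidx = np.ravel_multi_index((rows, columns), (n_rows, n_columns))
    return np.bincount(lidx, minlength=n_rows*n_columns).reshape(n_rows, n_columns)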
Approach #2
Alternatively, we can sort the linear indices and get the counts using slicing and np.diff for our second approach, like so -
def mask_diff_app(rows, columns, n_rows, n_columns):
    lidx = n_columns*rows + columns   # linear indices, as in Approach #1
    lidx.sort()
    mask = np.concatenate(([True], lidx[1:] != lidx[:-1], [True]))
    count = np.diff(np.flatnonzero(mask))
    new_matrix = np.zeros([n_rows, n_columns], dtype=int)
    new_matrix.flat[lidx[mask[:-1]]] = count
    return new_matrix
Approach #3
This seems like a straightforward one with the sparse matrix csr_matrix as well, since it does the accumulation on its own for repeated indices. The benefit is memory efficiency, given that it's a sparse matrix; that would be noticeable if you are filling a small number of places in the output and a sparse matrix output is okay.
The implementation would look something like this -
from scipy.sparse import csr_matrix
def sparse_matrix_app(rows, columns, n_rows, n_columns):
    out_shp = (n_rows, n_columns)
    data = np.ones(len(rows), dtype=int)
    return csr_matrix((data, (rows, columns)), shape=out_shp)
If you need a regular/dense array, simply do -
sparse_matrix_app(rows, columns, n_rows, n_columns).toarray()
Sample output -
In [319]: sparse_matrix_app(rows, columns, n_rows, n_columns).toarray()
Out[319]:
array([[1, 0, 0],
       [0, 2, 0],
       [0, 1, 2],
       [2, 1, 5]])
Benchmarking
Other approach(es) -
# @cᴏʟᴅsᴘᴇᴇᴅ's soln
def add_at_app(rows, columns, n_rows, n_columns):
    new_matrix = np.zeros([n_rows, n_columns], dtype=int)
    np.add.at(new_matrix, (rows, columns), 1)
    return new_matrix
Timings
Case #1 : Output array of shape (1000, 1000) and no. of indices = 10k
In [307]: # Setup
...: n_rows = 1000
...: n_columns = 1000
...: rows = np.random.randint(0,1000,(10000))
...: columns = np.random.randint(0,1000,(10000))
In [308]: %timeit add_at_app(rows, columns, n_rows, n_columns)
...: %timeit bincount_app(rows, columns, n_rows, n_columns)
...: %timeit mask_diff_app(rows, columns, n_rows, n_columns)
...: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns)
1000 loops, best of 3: 1.05 ms per loop
1000 loops, best of 3: 424 µs per loop
1000 loops, best of 3: 1.05 ms per loop
1000 loops, best of 3: 1.41 ms per loop
Case #2 : Output array of shape (1000, 1000) and no. of indices = 100k
In [309]: # Setup
...: n_rows = 1000
...: n_columns = 1000
...: rows = np.random.randint(0,1000,(100000))
...: columns = np.random.randint(0,1000,(100000))
In [310]: %timeit add_at_app(rows, columns, n_rows, n_columns)
...: %timeit bincount_app(rows, columns, n_rows, n_columns)
...: %timeit mask_diff_app(rows, columns, n_rows, n_columns)
...: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns)
100 loops, best of 3: 11.4 ms per loop
1000 loops, best of 3: 1.27 ms per loop
100 loops, best of 3: 7.44 ms per loop
10 loops, best of 3: 20.4 ms per loop
Case #3 : Sparse-ness in output
As stated earlier, for the sparse method to work better, we would need sparse-ness. Such a case would be like this -
In [314]: # Setup
...: n_rows = 5000
...: n_columns = 5000
...: rows = np.random.randint(0,5000,(1000))
...: columns = np.random.randint(0,5000,(1000))
In [315]: %timeit add_at_app(rows, columns, n_rows, n_columns)
...: %timeit bincount_app(rows, columns, n_rows, n_columns)
...: %timeit mask_diff_app(rows, columns, n_rows, n_columns)
...: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns)
100 loops, best of 3: 11.7 ms per loop
100 loops, best of 3: 11.1 ms per loop
100 loops, best of 3: 11.1 ms per loop
1000 loops, best of 3: 269 µs per loop
If you need a dense array, we lose the memory efficiency, and hence the performance advantage as well -
In [317]: %timeit sparse_matrix_app(rows, columns, n_rows, n_columns).toarray()
100 loops, best of 3: 11.7 ms per loop
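For completeness, np.histogram2d can also produce this count matrix directly if the bin edges are set to the integer boundaries. A minimal sketch (it returns floats, and is usually slower than the bincount route):
H, _, _ = np.histogram2d(rows, columns,
                         bins=(n_rows, n_columns),
                         range=[[0, n_rows], [0, n_columns]])
# H now equals the occurrence matrix, as float64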
I have a (w,h) np array in 2D. I want to add a third dimension with a size greater than 1 and copy the array's values along it. I was hoping broadcasting would do it, but it can't. This is how I'm doing it:
arr = np.expand_dims(arr, axis=2)
arr = np.concatenate((arr, arr, arr), axis=2)
Is there a faster way to do so?
You can push all dims forward, introducing a singleton dim/new axis as the last dim to create a 3D array and then repeat three times along that one with np.repeat, like so -
arr3D = np.repeat(arr[...,None],3,axis=2)
Here's another approach using np.tile -
arr3D = np.tile(arr[...,None],3)
Another approach that works:
x_train = np.stack((x_train,) * 3, axis=-1)
This is especially helpful for converting a gray single-channel matrix into a 3-channel matrix.
import numpy as np
import matplotlib.pyplot as plt

img3 = np.zeros((gray.shape[0], gray.shape[1], 3))
img3[:,:,0] = gray
img3[:,:,1] = gray
img3[:,:,2] = gray
fig = plt.figure(figsize=(15,15))
plt.imshow(img3)
Another simple approach is to multiply by an array of ones: broadcasting will essentially copy the values across the new dimension:
a=np.random.randn(4,4) #a.shape = (4,4)
a = np.expand_dims(a,-1) #a.shape = (4,4,1)
a = a*np.ones((1,1,3))
a.shape #(4, 4, 3)
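If a read-only view is enough, np.broadcast_to gives the same shape with no copy at all (append .copy() if you need to write to the result):
a = np.random.randn(4, 4)
view = np.broadcast_to(a[..., None], (4, 4, 3))   # zero-copy, read-only view
view.shape   # (4, 4, 3)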
I'd suggest using the barebones numpy.concatenate(), simply because the piece of code below shows that it's the fastest among all the other suggested answers:
# sample 2D array to work with
In [51]: arr = np.random.random_sample((12, 34))
# promote the array `arr` to 3D and then concatenate along `axis 2`
In [52]: arr3D = np.concatenate([arr[..., np.newaxis]]*3, axis=2)
# verify for desired shape
In [53]: arr3D.shape
Out[53]: (12, 34, 3)
You can see the timings below to convince yourself (ordered best to worst):
In [42]: %timeit -n 100000 np.concatenate([arr[..., np.newaxis]]*3, axis=2)
1.94 µs ± 32.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [43]: %timeit -n 100000 np.repeat(arr[..., np.newaxis], 3, axis=2)
4.38 µs ± 46.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [44]: %timeit -n 100000 np.dstack([arr]*3)
5.1 µs ± 57.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [49]: %timeit -n 100000 np.stack([arr]*3, -1)
5.12 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [46]: %timeit -n 100000 np.tile(arr[..., np.newaxis], 3)
7.13 µs ± 85.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Having said that, if you're looking for the shortest piece of code, then you can use:
# wrap your 2D array in an iterable and then multiply it by the needed depth
arr3D = np.dstack([arr]*3)
# verify shape
print(arr3D.shape)
(12, 34, 3)
This would work. (I don't think it's a recommended way :-) but it may be the closest to what you had in mind.)
np.array([img, img, img]).transpose(1,2,0)
Just stack the target (img) as many times as you want (3 here), and move the channel axis (size 3) to the last position.
Not sure if I understood correctly, but broadcasting seems to work for me in this case:
>>> a = numpy.array([[1,2], [3,4]])
>>> c = numpy.zeros((4, 2, 2))
>>> c[0] = a
>>> c[1:] = a+1
>>> c
array([[[ 1.,  2.],
        [ 3.,  4.]],

       [[ 2.,  3.],
        [ 4.,  5.]],

       [[ 2.,  3.],
        [ 4.,  5.]],

       [[ 2.,  3.],
        [ 4.,  5.]]])
I wanted to interleave the rows of two numpy arrays of the same size.
I came up with this solution.
import numpy

# A and B are same-shaped arrays
A = numpy.ones((4,3))
B = numpy.zeros_like(A)
# list() is needed on Python 3, where zip returns an iterator
C = numpy.array(list(zip(A, B))).reshape(A.shape[0]*2, A.shape[1])
print(C)
Outputs
[[ 1.  1.  1.]
 [ 0.  0.  0.]
 [ 1.  1.  1.]
 [ 0.  0.  0.]
 [ 1.  1.  1.]
 [ 0.  0.  0.]
 [ 1.  1.  1.]
 [ 0.  0.  0.]]
Is there a cleaner, faster, better, numpy-only way?
It is maybe a bit clearer to do:
A = np.ones((4,3))
B = np.zeros_like(A)
C = np.empty((A.shape[0]+B.shape[0],A.shape[1]))
C[::2,:] = A
C[1::2,:] = B
and it's probably a bit faster as well, I'm guessing.
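The same slicing idea generalizes to any number of same-shaped arrays; a quick sketch:
arrays = [A, B]              # any list of same-shaped 2D arrays
n = len(arrays)
C = np.empty((n * arrays[0].shape[0], arrays[0].shape[1]))
for i, arr in enumerate(arrays):
    C[i::n, :] = arr         # every n-th row, offset by i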
I find the following approach using numpy.hstack() quite readable:
import numpy as np
a = np.ones((2,3))
b = np.zeros_like(a)
c = np.hstack([a, b]).reshape(4, 3)
print(c)
Output:
[[ 1.  1.  1.]
 [ 0.  0.  0.]
 [ 1.  1.  1.]
 [ 0.  0.  0.]]
It is easy to generalize this to a list of arrays of the same shape:
arrays = [a, b, c,...]
shape = (len(arrays)*a.shape[0], a.shape[1])
interleaved_array = np.hstack(arrays).reshape(shape)
It seems to be a bit slower than the accepted answer of @JoshAdel on small arrays, but equally fast or faster on large arrays:
a = np.random.random((3,100))
b = np.random.random((3,100))
%%timeit
...: C = np.empty((a.shape[0]+b.shape[0],a.shape[1]))
...: C[::2,:] = a
...: C[1::2,:] = b
...:
The slowest run took 9.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.3 µs per loop
%timeit c = np.hstack([a,b]).reshape(2*a.shape[0], a.shape[1])
The slowest run took 5.06 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.1 µs per loop
a = np.random.random((4,1000000))
b = np.random.random((4,1000000))
%%timeit
...: C = np.empty((a.shape[0]+b.shape[0],a.shape[1]))
...: C[::2,:] = a
...: C[1::2,:] = b
...:
10 loops, best of 3: 23.2 ms per loop
%timeit c = np.hstack([a,b]).reshape(2*a.shape[0], a.shape[1])
10 loops, best of 3: 21.3 ms per loop
You can stack, transpose, and reshape:
numpy.dstack((A, B)).transpose(0, 2, 1).reshape(A.shape[0]*2, A.shape[1])
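For example, with the A and B from the question this gives the same interleaved result:
A = numpy.ones((4,3))
B = numpy.zeros_like(A)
C = numpy.dstack((A, B)).transpose(0, 2, 1).reshape(A.shape[0]*2, A.shape[1])
# rows of A and B now alternate, as in the accepted answer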