Numpy Broadcasting - python

What happens when I perform this operation in NumPy?
a = np.ones([500,1])
b = np.ones([5000,])/2
c = a + b
# a.shape (500,1)
# b.shape (5000, )
# c.shape (500, 5000)
I'm having a hard time figuring out what is actually happening in this broadcast.

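NumPy aligns shapes starting from the trailing dimension and treats a missing leading dimension as length 1, so a 1-dimensional array of shape (5000,) behaves like a row vector of shape (1, 5000). Your sum is therefore between shapes (500, 1) and (1, 5000). Each length-1 axis is stretched to match the other operand, so the result has shape (500, 5000), like an outer sum.
Since this is not obvious from the code, it is clearer to add the extra dimensions explicitly: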
>>> np.arange(5)[:, None] + np.arange(8)[None, :]
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 1, 2, 3, 4, 5, 6, 7, 8],
[ 2, 3, 4, 5, 6, 7, 8, 9],
[ 3, 4, 5, 6, 7, 8, 9, 10],
[ 4, 5, 6, 7, 8, 9, 10, 11]])
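The shape rule can also be checked directly on the arrays from the question. This is a minimal sketch, assuming NumPy 1.20 or later (for np.broadcast_shapes); the names result_shape and c_explicit are only illustrative.

```python
import numpy as np

a = np.ones((500, 1))
b = np.ones((5000,)) / 2

# Shapes are aligned from the trailing axis; the missing leading axis of b
# is treated as length 1, so (5000,) behaves like (1, 5000).
result_shape = np.broadcast_shapes(a.shape, b.shape)

c = a + b                    # implicit broadcasting
c_explicit = a + b[None, :]  # same operation with the new axis spelled out

print(result_shape)  # (500, 5000)
print(c.shape)       # (500, 5000)
```

Both forms compute the same array; the explicit one just makes the intended orientation of b visible to the reader.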

Related

How to reshape an array using np tile

I have an array X with shape (2, 5) as follows:
0, 6, 7, 9, 1
2, 4, 6, 2, 7
I'd like to reshape it to repeat each row n times as follows (example uses n = 3):
0, 6, 7, 9, 1
0, 6, 7, 9, 1
0, 6, 7, 9, 1
2, 4, 6, 2, 7
2, 4, 6, 2, 7
2, 4, 6, 2, 7
I have tried to use np.tile as follows, but it repeats the whole array rather than each row, as shown below:
np.tile(X, (3, 5))
0, 6, 7, 9, 1
2, 4, 6, 2, 7
0, 6, 7, 9, 1
2, 4, 6, 2, 7
0, 6, 7, 9, 1
2, 4, 6, 2, 7
How might I efficiently create the desired output?
If a is the main array:
a = np.array([0, 6, 7, 9, 1, 2, 4, 6, 2, 7])
we can do this by first reshaping it to the desired shape and then using np.repeat:
b = a.reshape(2, 5)
final = np.repeat(b, 3, axis=0)
It can be done with np.tile too, but that needs extra operations to reorder the rows, as shown below, so np.repeat is the better choice.
test = np.tile(b, (3, 1))
final = np.concatenate((test[::2], test[1::2]))
For complex repeats, I'd use np.kron instead:
np.kron(x, np.ones((2, 1), dtype=int))
For something relatively simple,
np.repeat(x, 2, axis=0)
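To compare the three approaches side by side, here is a small sketch using the X and n = 3 from the question. The variable names (by_repeat, by_tile, by_kron) are made up for illustration.

```python
import numpy as np

X = np.array([[0, 6, 7, 9, 1],
              [2, 4, 6, 2, 7]])
n = 3

# np.repeat duplicates each row n times in place
by_repeat = np.repeat(X, n, axis=0)

# np.tile stacks whole copies of X, so the rows come out interleaved
# and have to be regrouped afterwards (this regrouping assumes 2 rows)
tiled = np.tile(X, (n, 1))
by_tile = np.concatenate((tiled[::2], tiled[1::2]))

# np.kron with a column of ones also repeats each row n times
by_kron = np.kron(X, np.ones((n, 1), dtype=int))

print(by_repeat)
```

All three produce the same (6, 5) array here; np.repeat states the intent most directly.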

Select Multiple slices from Numpy array at once

I want to implement a vectorized SGD algorithm and would like to generate multiple mini batches at once.
Suppose data = np.arange(0, 100), miniBatchSize=10, n_miniBatches=10 and indices = np.random.randint(0, n_miniBatches, 5) (5 mini batches). What I would like to achieve is
miniBatches = np.zeros((5, miniBatchSize))
for i in range(5):
    miniBatches[i] = data[indices[i]: indices[i] + miniBatchSize]
Is there any way to avoid the for loop?
Thanks!
It can be done using stride tricks:
from numpy.lib.stride_tricks import as_strided
a = as_strided(data[:n_miniBatches], shape=(miniBatchSize, n_miniBatches), strides=2*data.strides, writeable=False)
miniBatches = a[:, indices].T
# E.g. indices = array([0, 7, 1, 0, 0])
Output:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
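An alternative that avoids stride tricks is to build the gather indices by broadcasting and use ordinary integer-array indexing. This is only a sketch: the snake_case names are illustrative, the start offsets are the example values quoted above, and it assumes every start offset plus the batch size stays within the length of data.

```python
import numpy as np

data = np.arange(100)
mini_batch_size = 10
indices = np.array([0, 7, 1, 0, 0])  # example start offsets

# Row i holds indices[i], indices[i] + 1, ..., indices[i] + mini_batch_size - 1
gather = indices[:, None] + np.arange(mini_batch_size)

# Integer-array indexing returns a new (5, 10) array (a copy, not a view)
mini_batches = data[gather]

print(mini_batches.shape)  # (5, 10)
```

Unlike as_strided, this cannot read outside the buffer: an out-of-range offset raises an IndexError instead.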

NumPy/PyTorch extract subsets of images

In NumPy, given a stack of large images A of shape (N, hl, wl), and coordinates x of shape (N,) and y of shape (N,), I want to get smaller images of shape (N, 16, 16).
In a for loop it would look like this:
B = numpy.zeros((N, 16, 16))
for i in range(N):
    B[i, :, :] = A[i, y[i]:y[i]+16, x[i]:x[i]+16]
But can I do this just with indexing?
Bonus question: Will this indexing also work in PyTorch? If not, how can I implement this there?
In NumPy, slicing is very simple, and the same logic works in PyTorch. For example:
imgs = np.random.normal(size=(16,24,24))
imgs[:,0:12,0:12].shape
imgs_tensor = torch.from_numpy(imgs)
imgs_tensor[:,0:12,0:12].size()
where the first : in the slicing selects all the images in the batch. The second and third slices select the height and width ranges. Note that this applies the same window to every image.
This is pretty simple with view_as_windows from scikit-image, which gives the sliding-window views as a 6D array whose fourth axis is a singleton. We then use advanced indexing to select the windows we want, using the y and x indices to index into the second and third axes of the windowed array, which gives us B.
Hence, the implementation would be -
from skimage.util.shape import view_as_windows
BSZ = 16, 16 # Blocksize
A6D = view_as_windows(A,(1,BSZ[0],BSZ[1]))
B_out = A6D[np.arange(N),y,x,0]
Explanation
To show other readers what is really going on with the problem, here's a sample run on a smaller dataset with a block size of (2, 2) -
1) Input array (3D) :
In [78]: A
Out[78]:
array([[[ 5, 5, 3, 5, 3, 8],
[ 5, *2, 6, 2, 2, 4],
[ 4, 3, 4, 9, 3, 8],
[ 6, 3, 3, 10, 4, 5],
[10, 2, 5, 7, 6, 7],
[ 5, 4, 2, 5, 2, 10]],
[[ 4, 9, 8, 4, 9, 8],
[ 7, 10, 8, 2, 10, 9],
[10, *9, 3, 2, 4, 7],
[ 5, 10, 8, 3, 5, 4],
[ 6, 8, 2, 4, 10, 4],
[ 2, 8, 6, 2, 7, 5]],
[[ *4, 8, 7, 2, 9, 9],
[ 2, 10, 2, 3, 8, 8],
[10, 7, 5, 8, 2, 10],
[ 7, 4, 10, 9, 6, 9],
[ 3, 4, 9, 9, 10, 3],
[ 6, 4, 10, 2, 6, 3]]])
2) y and x indices to index into the second and third axes :
In [79]: y
Out[79]: array([1, 2, 0])
In [80]: x
Out[80]: array([1, 1, 0])
3) Finally the desired output, which is one block from each 2D slice along the first axis, with its starting point (top-left corner) at (y, x) on that 2D slice. Refer to the asterisks in A for those -
In [81]: B
Out[81]:
array([[[ 2, 6],
[ 3, 4]],
[[ 9, 3],
[10, 8]],
[[ 4, 8],
[ 2, 10]]])
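The same per-image blocks can also be gathered with plain NumPy advanced indexing, without scikit-image. The following is a sketch under assumptions: the array sizes and random test data are made up for illustration, and every (y, x) start is assumed to leave room for a full block.

```python
import numpy as np

rng = np.random.default_rng(0)
N, hl, wl, bs = 4, 24, 24, 16            # illustrative sizes; bs is the block size
A = rng.normal(size=(N, hl, wl))
y = rng.integers(0, hl - bs + 1, size=N)  # top row of each block
x = rng.integers(0, wl - bs + 1, size=N)  # left column of each block

offsets = np.arange(bs)
rows = (y[:, None] + offsets)[:, :, None]  # shape (N, bs, 1)
cols = (x[:, None] + offsets)[:, None, :]  # shape (N, 1, bs)
imgs = np.arange(N)[:, None, None]         # shape (N, 1, 1)

# The three index arrays broadcast to (N, bs, bs), one block per image
B = A[imgs, rows, cols]

print(B.shape)  # (4, 16, 16)
```

PyTorch tensors also support indexing with broadcast integer tensors, so the same index arithmetic is expected to carry over, although that is not tested here.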
This is an implementation of extract_glimpse for PyTorch, similar to tf.image.extract_glimpse. It should satisfy your need:
https://github.com/jimmysue/xvision/blob/main/xvision/ops/extract_glimpse.py#L14

How to find zero elements in a sparse matrix

I know that scipy.sparse.find(A) returns 3 arrays I,J,V each of them containing the rows, columns, and values of the nonzero elements respectively.
What I want is a way to do the same (except the V array) for all zero elements without having to iterate through the matrix, since it's too large.
Make a small sparse matrix with 10% density:
In [1]: from scipy import sparse
In [2]: M = sparse.random(10,10,.1)
In [3]: M
Out[3]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in COOrdinate format>
The 10 nonzero values:
In [5]: sparse.find(M)
Out[5]:
(array([6, 4, 1, 2, 3, 0, 1, 6, 9, 6], dtype=int32),
array([1, 2, 3, 3, 3, 4, 4, 4, 5, 8], dtype=int32),
array([ 0.91828586, 0.29763717, 0.12771201, 0.24986069, 0.14674883,
0.56018409, 0.28643427, 0.11654358, 0.8784731 , 0.13253971]))
If, out of the 100 elements of the matrix, 10 are nonzero, then 90 elements are zero. Do you really want the indices of all of those?
where or nonzero on the dense equivalent gives the same indices:
In [6]: A = M.toarray() # dense
In [7]: np.where(A)
Out[7]:
(array([0, 1, 1, 2, 3, 4, 6, 6, 6, 9], dtype=int32),
array([4, 3, 4, 3, 3, 2, 1, 4, 8, 5], dtype=int32))
And the indices of the 90 zero values:
In [8]: np.where(A==0)
Out[8]:
(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9], dtype=int32),
array([0, 1, 2, 3, 5, 6, 7, 8, 9, 0, 1, 2, 5, 6, 7, 8, 9, 0, 1, 2, 4, 5, 6,
7, 8, 9, 0, 1, 2, 4, 5, 6, 7, 8, 9, 0, 1, 3, 4, 5, 6, 7, 8, 9, 0, 1,
2, 3, 4, 5, 6, 7, 8, 9, 0, 2, 3, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 6, 7, 8, 9], dtype=int32))
That's 2 arrays of shape (90,), 180 integers, as opposed to the 100 values in the dense array itself. If your sparse matrix is too large to convert to dense, it will be too large to produce all the zero indices (assuming reasonable sparsity).
The print(M) shows the same triplets as the find. The attributes of the coo format also give the nonzero indices:
In [13]: M.row
Out[13]: array([6, 6, 3, 4, 1, 6, 9, 2, 1, 0], dtype=int32)
In [14]: M.col
Out[14]: array([1, 4, 3, 2, 3, 8, 5, 3, 4, 4], dtype=int32)
(Sometimes manipulation of a matrix can set values to 0 without removing them from the attributes. So find/nonzero takes an added step to remove those, if any.)
We could apply find to M==0 as well - but sparse will give us a warning.
In [15]: sparse.find(M==0)
/usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py:213: SparseEfficiencyWarning: Comparing a sparse matrix with 0 using == is inefficient, try using != instead.
", try using != instead.", SparseEfficiencyWarning)
It's the same thing that I've been warning about - the large size of this set. The resulting arrays are the same as in Out[8].
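Assuming you have a scipy sparse array (called your_sparse_array here):
from itertools import product
from scipy.sparse import find
I, J, _ = find(your_sparse_array)
# Use a set: in Python 3, zip() is a one-shot iterator and would be exhausted by the first membership test
nonzero = set(zip(I.tolist(), J.tolist()))
nrows, ncols = your_sparse_array.shape
for a, b in product(range(nrows), range(ncols)):
    if (a, b) not in nonzero:
        print(a, b)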
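If the zero coordinates are really needed, they can be computed from the nonzero coordinates without a Python loop and without building the dense matrix, although the result is still as large as the number of zeros. A sketch, assuming a small random matrix for illustration (the seed and variable names are arbitrary):

```python
import numpy as np
from scipy import sparse

M = sparse.random(10, 10, density=0.1, format="coo", random_state=0)
I, J, _ = sparse.find(M)

n_rows, n_cols = M.shape

# Convert (row, col) pairs of the nonzeros to flat row-major positions
nonzero_flat = np.ravel_multi_index((I, J), M.shape)

# Every flat position that is not a nonzero is a zero
zero_flat = np.setdiff1d(np.arange(n_rows * n_cols), nonzero_flat)

# Back to (row, col) coordinates, in row-major order
zero_rows, zero_cols = np.unravel_index(zero_flat, M.shape)

print(len(zero_rows))
```

This still allocates an array of n_rows * n_cols positions, so the earlier warning about size applies to very large matrices.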

Skip every nth index of numpy array

In order to do K-fold validation, I would like to slice a NumPy array such that a view of the original array is made, but with every nth element removed.
For example:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
If n = 4 then the result would be
[1, 2, 4, 5, 6, 8, 9]
Note: the numpy requirement is due to this being used for a machine learning assignment where the dependencies are fixed.
Approach #1 with modulus
a[np.mod(np.arange(a.size),4)!=0]
Sample run -
In [255]: a
Out[255]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [256]: a[np.mod(np.arange(a.size),4)!=0]
Out[256]: array([1, 2, 3, 5, 6, 7, 9])
Approach #2 with masking : Requirement as a view
Considering the view requirement, if the idea is to save memory, we could store the equivalent boolean mask, which occupies 8 times less memory than the default 64-bit integer array on a Linux system. Thus, such a mask-based approach would look like this -
# Create mask
mask = np.ones(a.size, dtype=bool)
mask[::4] = 0
Here's the memory requirement stat -
In [311]: mask.itemsize
Out[311]: 1
In [312]: a.itemsize
Out[312]: 8
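Then, we could assign through the mask with boolean indexing. Note that reading a[mask] returns a copy rather than a view; only assignment through the mask modifies the original array -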
In [313]: a
Out[313]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [314]: a[mask] = 10
In [315]: a
Out[315]: array([ 0, 10, 10, 10, 4, 10, 10, 10, 8, 10])
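Whether a masked selection shares memory with the original array can be checked with np.shares_memory. A small sketch on the same example (the names selected, is_view and a_after_copy_write are illustrative):

```python
import numpy as np

a = np.arange(10)
mask = np.ones(a.size, dtype=bool)
mask[::4] = False

selected = a[mask]                       # reading through a mask copies the data
is_view = np.shares_memory(a, selected)

selected[:] = -1                         # changes the copy only
a_after_copy_write = a.copy()            # snapshot: a is still 0..9 here

a[mask] = 10                             # assignment through the mask writes into a

print(is_view)  # False
print(a)        # [ 0 10 10 10  4 10 10 10  8 10]
```

So the mask saves memory as a compact description of the selection, but each read through it materializes a new array.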
Approach #3 with NumPy array strides : Requirement as a view
You can use np.lib.stride_tricks.as_strided to create such a view, provided the length of the input array is a multiple of n. If it's not a multiple, it would still work, but it is not safe practice, as we would be reading beyond the memory allocated for the input array. Please note that the view thus created would be 2D.
Thus, an implementation to get such a view would be -
def skipped_view(a, n):
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    return strided(a, shape=((a.size+n-1)//n, n), strides=(n*s, s))[:, 1:]
Sample run -
In [50]: a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) # Input array
In [51]: a_out = skipped_view(a, 4)
In [52]: a_out
Out[52]:
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11]])
In [53]: a_out[:] = 100 # Let's prove output is a view indeed
In [54]: a
Out[54]: array([ 0, 100, 100, 100, 4, 100, 100, 100, 8, 100, 100, 100])
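A similar 2D view can be obtained without stride tricks by slicing and reshaping, which never reads outside the array. This is a sketch with a hypothetical helper name (skipped_view_reshape); it assumes a 1-D contiguous input, and it drops any trailing partial group when the length is not a multiple of n.

```python
import numpy as np

def skipped_view_reshape(a, n):
    # Keep only whole groups of n elements, then drop the first of each group
    usable = (a.size // n) * n
    return a[:usable].reshape(-1, n)[:, 1:]

a = np.arange(12)
out = skipped_view_reshape(a, 4)
shares = np.shares_memory(a, out)  # True when out is a view of a

out[:] = 100                       # writes go through to a
print(a)  # [  0 100 100 100   4 100 100 100   8 100 100 100]
```

The trade-off is that leftover elements at the end are not part of the view, whereas the as_strided version includes them at the cost of touching memory past the end of the array.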
numpy.delete :
In [18]: arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [19]: arr = np.delete(arr, np.arange(0, arr.size, 4))
In [20]: arr
Out[20]: array([1, 2, 3, 5, 6, 7, 9])
The slickest answer that I found uses del on a plain Python list, with i being the nth index which you want to skip:
del list[i-1::i]
Example:
In [1]: a = list([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [2]: del a[4-1::4]
In [3]: print(a)
Out[3]: [0, 1, 2, 4, 5, 6, 8, 9]
If you also want to skip the first value, use a[1:].
