How does numpy indexing work? - python

I was checking indexing on a numpy array but got confused by the case below. Why do I get a different output when I convert the list to an array? What am I doing wrong?
In [124]: a = np.arange(12).reshape(3, 4)
In [125]: a
Out[125]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [126]: j = [[0, 1], [1, 2]]
In [127]: a[j]
Out[127]: array([1, 6])
In [128]: a[np.array(j)]
Out[128]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]])

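There's a bit of undocumented backward-compatibility handling: if the selection object is a single short non-ndarray sequence containing embedded sequences, the sequence is handled as if it were a tuple. That means this:
a[j]
is treated as
a[[0, 1], [1, 2]]
instead of as a[np.array(j)], which is what the docs would have you expect. NumPy 1.15 deprecated this special case (it emits a FutureWarning), and NumPy 1.23 removed it, so on current versions a[j] behaves like a[np.array(j)].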
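Since the result of a[j] depends on the NumPy version, a safer habit is to state the intent explicitly. A minimal sketch using the same a and j as the question: tuple(j) means "one index list per axis", while np.array(j) means "a single integer array indexing axis 0".

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
j = [[0, 1], [1, 2]]

# One index list per axis: picks the elements a[0, 1] and a[1, 2]
pairs = a[tuple(j)]

# A single (2, 2) integer array indexing axis 0: picks whole rows, shape (2, 2, 4)
rows = a[np.array(j)]

print(pairs)
print(rows.shape)
```

Both forms give the same answer on every NumPy version, so they avoid the legacy special case entirely.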


How to merge multiple NumPy arrays into a single array

I want to merge multiple 2d NumPy arrays of shapes, let's say, (r,a), (r,b), (r,c), ..., (r,z) into a single 2d array of shape (r, a+b+c+...+z).
I tried np.hstack, but it seems to need the same shape, and np.concatenate seems to operate only on a tuple of arrays.
You can use np.concatenate or np.hstack. Here is an example:
>>> a = np.arange(15).reshape(5,3)
>>> a
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
>>> b = np.arange(10).reshape(5,2)
>>> b
array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
>>> np.concatenate((a,b), axis =1)
array([[ 0, 1, 2, 0, 1],
[ 3, 4, 5, 2, 3],
[ 6, 7, 8, 4, 5],
[ 9, 10, 11, 6, 7],
[12, 13, 14, 8, 9]])
>>> np.hstack((a,b))
array([[ 0, 1, 2, 0, 1],
[ 3, 4, 5, 2, 3],
[ 6, 7, 8, 4, 5],
[ 9, 10, 11, 6, 7],
[12, 13, 14, 8, 9]])
Hope it helps
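np.hstack and np.concatenate only require the arrays to agree along every axis except the one being joined, so inputs of shape (r, a), (r, b), ..., (r, z) can be merged in one call by passing a list of any length. A minimal sketch; the shapes and values below are made up for illustration:

```python
import numpy as np

r = 4
# Hypothetical inputs: same number of rows, different numbers of columns
arrays = [np.ones((r, 3)), np.zeros((r, 2)), np.full((r, 5), 7.0)]

# Only the row counts must match; the column counts may differ
merged = np.hstack(arrays)               # shape (r, 3 + 2 + 5)
same = np.concatenate(arrays, axis=1)    # equivalent for 2-D inputs

print(merged.shape)
```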
I am new to numpy, but I think it's not possible. The precondition is
"The arrays must have the same shape along all but the second axis, except 1-D arrays which can be any length."
Actually, one of my functions was returning a scipy.sparse.csr.csr_matrix, and I was converting it into an np.array along with lists returned by another function so that I could merge them all, but the sparse matrix was converted into
array(<73194x17 sparse matrix of type '' with 203371 stored elements in Compressed Sparse Row format>, dtype=object)
which was not compatible with np.hstack.
Sorry for the inconvenience.
I solved it by using SciPy's hstack function instead of numpy.hstack.
Thank you, everyone, for responding.
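A minimal sketch of the fix described above, assuming SciPy is installed. Wrapping a sparse matrix in np.array gives the 0-d object array shown earlier rather than a dense 2-D array, so either keep everything sparse with scipy.sparse.hstack or densify the sparse part first with .toarray(). The input matrices here are hypothetical.

```python
import numpy as np
from scipy import sparse

# Hypothetical inputs: a sparse block and a dense block with the same row count
X_sparse = sparse.csr_matrix(np.eye(5))
D = np.arange(10).reshape(5, 2)

# Option 1: stay sparse; wrap the dense part so every block is a sparse matrix
merged_sparse = sparse.hstack([X_sparse, sparse.csr_matrix(D)], format="csr")

# Option 2: densify the sparse part and use numpy (only if it fits in memory)
merged_dense = np.hstack([X_sparse.toarray(), D])

print(merged_sparse.shape, merged_dense.shape)
```

Option 1 is the one to prefer for something like a 73194x17 matrix with few stored elements, since it never materializes the zeros.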

Numpy - Minimum memory usage when slicing images?

I have a memory usage problem in Python but haven't been able to find a satisfying solution yet.
The problem is quite simple:
I have a collection of images as numpy arrays of shape (n_samples, size_image). I need to slice each image in the same way and feed these slices to a classification algorithm all at once.
How do you take numpy array slices without duplicating data in memory?
Naively, since slices are simple "views" of the original data, I assume there must be a way to do the slicing without copying data in memory.
The problem becomes critical when dealing with large datasets such as the MNIST handwritten digits dataset.
I have tried to find a solution using numpy.lib.stride_tricks.as_strided but struggle to get it to work on collections of images.
A similar toy problem would be to slice the scikit handwritten digits in a memory-friendly way.
from sklearn.datasets import load_digits
digits = load_digits()
X = digits.data
X has shape (1797, 64), i.e. each picture is an 8x8 image.
With a window size of 6x6, this gives (8-6+1)*(8-6+1) = 9 slices of size 36 per image, resulting in an array sliced_X of shape (16173, 36) (1797 * 9 = 16173).
Now the question is: how do you get from X to sliced_X without using too much memory?
I would start off by assuming that the input array has shape (M,n1,n2) (if it doesn't, we can always reshape it). Here's an implementation of a sliding-window view into it, with block size (b1,b2) and an output array of shape (M,n1-b1+1,n2-b2+1,b1,b2) -
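import numpy as np

def strided_lastaxis(a, blocksize):
    # a has shape (M, n1, n2); blocksize is (b1, b2)
    blocksize = tuple(blocksize)
    d0, d1, d2 = a.shape
    s0, s1, s2 = a.strides
    strided = np.lib.stride_tricks.as_strided
    # output shape: (M, n1-b1+1, n2-b2+1, b1, b2)
    out_shp = (d0,) + tuple(np.array([d1, d2]) - blocksize + 1) + blocksize
    # window offsets reuse the row/column strides, so no data is copied
    return strided(a, out_shp, (s0, s1, s2, s1, s2))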
Being a view, it doesn't occupy any additional memory, so we are doing okay on memory. But keep in mind that we shouldn't reshape it, as that would force a memory copy.
Here's a sample run, with a manual check of the output -
Set up the input and get the output:
In [72]: a = np.random.randint(0,9,(2, 6, 6))
In [73]: out = strided_lastaxis(a, blocksize=(4,4))
In [74]: np.may_share_memory(a, out) # Verify this is a view
Out[74]: True
In [75]: a
Out[75]:
array([[[1, 7, 3, 5, 6, 3],
[3, 2, 3, 0, 1, 5],
[6, 3, 5, 5, 3, 5],
[0, 7, 0, 8, 2, 4],
[0, 3, 7, 3, 4, 4],
[0, 1, 0, 8, 8, 1]],
[[4, 1, 4, 5, 0, 8],
[0, 6, 5, 6, 6, 7],
[6, 3, 1, 8, 6, 0],
[0, 1, 1, 7, 6, 8],
[6, 3, 3, 1, 6, 1],
[0, 0, 2, 4, 8, 3]]])
In [76]: out.shape
Out[76]: (2, 3, 3, 4, 4)
Output values :
In [77]: out[0,0,0]
Out[77]:
array([[1, 7, 3, 5],
[3, 2, 3, 0],
[6, 3, 5, 5],
[0, 7, 0, 8]])
In [78]: out[0,0,1]
Out[78]:
array([[7, 3, 5, 6],
[2, 3, 0, 1],
[3, 5, 5, 3],
[7, 0, 8, 2]])
In [79]: out[0,0,2]
Out[79]:
array([[3, 5, 6, 3],
[3, 0, 1, 5],
[5, 5, 3, 5],
[0, 8, 2, 4]]) # ............
In [80]: out[1,2,2] # last block
Out[80]:
array([[1, 8, 6, 0],
[1, 7, 6, 8],
[3, 1, 6, 1],
[2, 4, 8, 3]])
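On NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view builds the same (M, n1-b1+1, n2-b2+1, b1, b2) windowed view with bounds checking, which is less error-prone than calling as_strided by hand. The sketch below applies it to data shaped like the digits example; X here is synthetic stand-in data, not the real load_digits() output. It also shows why the final (16173, 36) layout cannot stay a view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Synthetic stand-in for load_digits().data: 1797 flattened 8x8 images
X = np.arange(1797 * 64, dtype=np.float64).reshape(1797, 64)

imgs = X.reshape(-1, 8, 8)                              # still a view of X
windows = sliding_window_view(imgs, (6, 6), axis=(1, 2))
# windows has shape (1797, 3, 3, 6, 6) and shares X's buffer
print(windows.shape, np.may_share_memory(windows, X))

# Merging the (6, 6) window axes into 36 cannot be expressed with strides,
# because a window row steps 8 elements in X, so this reshape copies
sliced_X = windows.reshape(-1, 36)
print(sliced_X.shape, np.may_share_memory(sliced_X, X))
```

If the classifier can consume the 5-D view directly (or batch over the first axis), the full copy can be avoided; flattening to (16173, 36) always allocates.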

Easiest way to create a matrix with pre-determined dimension and values

I have a matrix with dimensions (2,5) and a vector of values to fill into that matrix. What is the best way to do this? I can think of three methods, but I have trouble using np.empty with fill, and np.full, without loops:
x=np.array(range(0,10))
mat=x.reshape(2,5)
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
mat=np.empty((2,5))
newMat=mat.fill(x) # Error: The x has to be scalar
mat=np.full((2,5),x) # Error: The x has to be scalar
full and fill are for setting all elements to the same value:
In [557]: np.full((2,5),10)
Out[557]:
array([[10, 10, 10, 10, 10],
[10, 10, 10, 10, 10]])
Assigning an array into an existing (2,5) array arr works, provided the shapes match (in the broadcasting sense):
In [558]: arr[...] = x.reshape(2,5) # make source the same shape as target
In [559]: arr
Out[559]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
In [560]: arr.flat = x # make target same shape as source
In [561]: arr
Out[561]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
arr.flat and arr.ravel() are equivalent. Well, not quite:
In [562]: arr.flat = x.reshape(2,5) # don't need the [:] with flat #wim
In [563]: arr
Out[563]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
In [564]: arr.ravel()[:] = x.reshape(2,5)
ValueError: could not broadcast input array from shape (2,5) into shape (10)
In [565]: arr.ravel()[:] = x.reshape(2,5).flat
flat works with a source of any shape, even one that requires replication:
In [570]: arr.flat = [1,2,3]
In [571]: arr
Out[571]:
array([[1, 2, 3, 1, 2],
[3, 1, 2, 3, 1]])
More broadcasted inputs
In [572]: arr[...] = np.ones((2,1))
In [573]: arr
Out[573]:
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
In [574]: arr[...] = np.arange(5)
In [575]: arr
Out[575]:
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
An example of the problem Eric mentioned. The ravel (or other reshape) of a transpose is (often) a copy. So writing to that does not modify the original.
In [578]: arr.T.ravel()[:]=10
In [579]: arr
Out[579]:
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
In [580]: arr.T.flat=10
In [581]: arr
Out[581]:
array([[10, 10, 10, 10, 10],
[10, 10, 10, 10, 10]])
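np.shares_memory makes the difference visible. A minimal sketch on a fresh (2, 5) array (not the arr from the session above):

```python
import numpy as np

arr = np.zeros((2, 5), dtype=int)

# ravel() of a C-contiguous array returns a view of the same buffer
view = arr.ravel()
# ravel() of the transpose needs a different element order, so it returns a copy
copy = arr.T.ravel()
print(np.shares_memory(arr, view), np.shares_memory(arr, copy))

copy[:] = 10        # changes only the copy; arr is still all zeros here
arr.T.flat = 10     # .flat writes through to arr's own buffer
print(arr)
```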
ndarray.flat returns an object which can modify the contents of the array by direct assignment:
>>> array = np.empty((2,5), dtype=int)
>>> vals = range(10)
>>> array.flat = vals
>>> array
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
If that seems kind of magical to you, then read about the descriptor protocol.
Warning: assigning to flat does not raise exceptions for size mismatch. If there are not enough values on the right hand side of the assignment, the data will be rolled/repeated. If there are too many values, only the first few will be used.
If you want a 10x2 matrix of 5:
np.ones((10,2))*5
If you have a list of values and just want them in a particular shape:
datavalues = [1,2,3,4,5,6,7,8,9,10]
np.reshape(datavalues,(2,5))
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]])
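The approaches above, side by side, as a minimal sketch (x is the same 0..9 vector as in the question). It also uses the fact that np.full accepts an array_like fill value and broadcasts it to the target shape, so a source that already has shape (2, 5) works where the 1-D x of length 10 does not:

```python
import numpy as np

x = np.arange(10)

# 1) Reshape the source (a view when possible)
m1 = x.reshape(2, 5)

# 2) Assign into a preallocated array; the source must broadcast to (2, 5)
m2 = np.empty((2, 5), dtype=x.dtype)
m2[...] = x.reshape(2, 5)

# 3) Assign through .flat; the source is consumed in C order
m3 = np.empty((2, 5), dtype=x.dtype)
m3.flat = x

# 4) np.full broadcasts an array fill value to the requested shape
m4 = np.full((2, 5), x.reshape(2, 5))

# .flat does not check sizes: a short source repeats, a long one is truncated
short = np.zeros((2, 5), dtype=int)
short.flat = [1, 2, 3]
long_ = np.zeros((2, 5), dtype=int)
long_.flat = np.arange(100)
```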

Efficient way of making a list of pairs from an array in Numpy

I have a numpy array x (with (n,4) shape) of integers like:
[[0 1 2 3],
[1 2 7 9],
[2 1 5 2],
...]
I want to transform the array into an array of pairs:
[0,1]
[0,2]
[0,3]
[1,2]
...
So the first element forms a pair with each of the other elements in the same row. I already have a for-loop solution:
y=np.array([[x[j,0],x[j,i]] for j in range(0,n) for i in range(1,4)],dtype=int)
but since looping over a numpy array is not efficient, I tried slicing instead. I can do the slicing for every column as:
y[1]=np.array([x[:,0],x[:,1]]).T
# [[0,1],[1,2],[2,1],...]
I can repeat this for all columns. My questions are:
How can I append y[2] to y[1],... such that the shape is (N,2)?
If the number of columns is not small (it is 4 in this example), how can I find y[i] elegantly?
What are the alternative ways to achieve the final array?
The cleanest way of doing this I can think of would be:
>>> x = np.arange(12).reshape(3, 4)
>>> x
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> n = x.shape[1] - 1
>>> y = np.repeat(x, (n,)+(1,)*n, axis=1)
>>> y
array([[ 0, 0, 0, 1, 2, 3],
[ 4, 4, 4, 5, 6, 7],
[ 8, 8, 8, 9, 10, 11]])
>>> y.reshape(-1, 2, n).transpose(0, 2, 1).reshape(-1, 2)
array([[ 0, 1],
[ 0, 2],
[ 0, 3],
[ 4, 5],
[ 4, 6],
[ 4, 7],
[ 8, 9],
[ 8, 10],
[ 8, 11]])
This will make two copies of the data, so it will not be the most efficient method. That would probably be something like:
>>> y = np.empty((x.shape[0], n, 2), dtype=x.dtype)
>>> y[..., 0] = x[:, 0, None]
>>> y[..., 1] = x[:, 1:]
>>> y.shape = (-1, 2)
>>> y
array([[ 0, 1],
[ 0, 2],
[ 0, 3],
[ 4, 5],
[ 4, 6],
[ 4, 7],
[ 8, 9],
[ 8, 10],
[ 8, 11]])
Like Jaimie, I first tried a repeat of the 1st column followed by reshaping, but then decided it was simpler to make 2 intermediary arrays, and hstack them:
x=np.array([[0,1,2,3],[1,2,7,9],[2,1,5,2]])
m,n=x.shape
x1=x[:,0].repeat(n-1)[:,None]
x2=x[:,1:].reshape(-1,1)
np.hstack([x1,x2])
producing
array([[0, 1],
[0, 2],
[0, 3],
[1, 2],
[1, 7],
[1, 9],
[2, 1],
[2, 5],
[2, 2]])
There probably are other ways of doing this sort of rearrangement. The result will copy the original data in one way or other. My guess is that as long as you are using compiled functions like reshape and repeat, the time differences won't be significant.
Suppose the numpy array is
arr = np.array([[0, 1, 2, 3],
[1, 2, 7, 9],
[2, 1, 5, 2]])
You can get the array of pairs as
import itertools
import numpy as np
m, n = arr.shape
new_arr = np.array([pair for i in range(m)
                    for pair in itertools.product(arr[i, 0:1], arr[i, 1:n])])
The output would be
array([[0, 1],
[0, 2],
[0, 3],
[1, 2],
[1, 7],
[1, 9],
[2, 1],
[2, 5],
[2, 2]])
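For any number of columns, another loop-free variant is to broadcast the first column against the remaining ones and stack them along a new last axis. A minimal sketch using the same example array; this is an alternative formulation, not one of the answers above:

```python
import numpy as np

x = np.array([[0, 1, 2, 3],
              [1, 2, 7, 9],
              [2, 1, 5, 2]])
m, n = x.shape

# First column repeated (as a broadcast view) once per partner column
firsts = np.broadcast_to(x[:, :1], (m, n - 1))
# Pair each first value with its partners -> shape (m, n-1, 2), then flatten rows
pairs = np.stack((firsts, x[:, 1:]), axis=-1).reshape(-1, 2)

print(pairs)
```

np.stack writes the result into one new array, and the final reshape of that contiguous array is a view, so the data is copied once.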

Row-wise scaling with Numpy

I have an array H of dimension MxN and an array A of dimension M. I want to scale the rows of H by the entries of A. I do it this way, taking advantage of NumPy's element-wise behaviour:
H = numpy.swapaxes(H, 0, 1)
H /= A
H = numpy.swapaxes(H, 0, 1)
It works, but the two swapaxes operations are not very elegant, and I feel there is a more elegant and concise way to achieve the result without creating temporaries. How can I do that?
I think you can simply use H/A[:,None]:
In [71]: (H.swapaxes(0, 1) / A).swapaxes(0, 1)
Out[71]:
array([[ 8.91065496e-01, -1.30548362e-01, 1.70357901e+00],
[ 5.06027691e-02, 3.59913305e-01, -4.27484490e-03],
[ 4.72868136e-01, 2.04351398e+00, 2.67527572e+00],
[ 7.87239835e+00, -2.13484271e+02, -2.44764975e+02]])
In [72]: H/A[:,None]
Out[72]:
array([[ 8.91065496e-01, -1.30548362e-01, 1.70357901e+00],
[ 5.06027691e-02, 3.59913305e-01, -4.27484490e-03],
[ 4.72868136e-01, 2.04351398e+00, 2.67527572e+00],
[ 7.87239835e+00, -2.13484271e+02, -2.44764975e+02]])
because None (or np.newaxis) adds a new axis to A, turning its shape (M,) into (M, 1), which broadcasts across the columns of H:
In [73]: A
Out[73]: array([ 1.1845468 , 1.30376536, -0.44912446, 0.04675434])
In [74]: A[:,None]
Out[74]:
array([[ 1.1845468 ],
[ 1.30376536],
[-0.44912446],
[ 0.04675434]])
You just need to reshape A so that it will broadcast properly:
A = A.reshape((-1, 1))
so:
In [21]: M
Out[21]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20]])
In [22]: A
Out[22]: array([1, 2, 3, 4, 5, 6, 7])
In [23]: M // A.reshape((-1, 1))  # M and A are ints; the output below is floor division (use / for float results)
Out[23]:
array([[0, 1, 2],
[1, 2, 2],
[2, 2, 2],
[2, 2, 2],
[2, 2, 2],
[2, 2, 2],
[2, 2, 2]])
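To address the "without creating temporaries" part of the question, the broadcast divisor can be combined with in-place division. A minimal sketch with hypothetical random H and A (it assumes H has a float dtype, since in-place true division into an integer array fails); A[:, None] is only a view, and H /= ... writes the result back into H instead of allocating a new M x N array:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3))    # hypothetical M x N data
A = rng.standard_normal(4)         # one scale factor per row

# Reference result using the double swapaxes approach from the question
expected = (H.swapaxes(0, 1) / A).swapaxes(0, 1)

# A[:, None] has shape (M, 1) and broadcasts across the N columns;
# augmented assignment stores the quotient directly into H
H /= A[:, None]

# Equivalent explicit spelling: np.divide(H, A[:, None], out=H)
```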
