Unable to expand numpy array - python

I have an array containing information about images. It contains information about 21495 images in an array named 'shuffled'.
np.shape(shuffled) = (21495, 1)
np.shape(shuffled[0]) = (1,)
np.shape(shuffled[0][0]) = (128, 128, 3) # (These are the image dimensions, with 3 channels of RGB)
How do I convert this array to an array of shape (21495, 128, 128, 3) to feed to my model?

There are 2 ways that I can think of:
One is to use numpy's vstack() function, but it gets quite slow as the array grows.
Another way (which I use) is to take an empty list and keep appending the images array to that list using .append(), then finally convert that list to a numpy array.

Try
np.stack(shuffled[:,0])
stack, a form of concatenate, joins a list (or array) of arrays on a new initial dimension. We need to get rid of the size 1 dimension first.
In [23]: arr = np.empty((4,1),object)
In [24]: for i in range(4): arr[i,0] = np.arange(i,i+6).reshape(2,3)
In [25]: arr
Out[25]:
array([[array([[0, 1, 2],
[3, 4, 5]])],
[array([[1, 2, 3],
[4, 5, 6]])],
[array([[2, 3, 4],
[5, 6, 7]])],
[array([[3, 4, 5],
[6, 7, 8]])]], dtype=object)
In [26]: arr.shape
Out[26]: (4, 1)
In [27]: arr[0,0].shape
Out[27]: (2, 3)
In [28]: np.stack(arr[:,0])
Out[28]:
array([[[0, 1, 2],
[3, 4, 5]],
[[1, 2, 3],
[4, 5, 6]],
[[2, 3, 4],
[5, 6, 7]],
[[3, 4, 5],
[6, 7, 8]]])
In [29]: _.shape
Out[29]: (4, 2, 3)
But beware: if the subarrays differ in shape (say one or two are grayscale rather than 3-channel), this won't work; np.stack raises a ValueError.
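A minimal sketch of checking for that shape problem before stacking. The array shuffled_demo is a small hypothetical stand-in for the asker's shuffled array, not their real data:

```python
import numpy as np

# Hypothetical stand-in for `shuffled`: an object array of shape (5, 1)
# whose cells each hold a small "image" of shape (4, 4, 3).
shuffled_demo = np.empty((5, 1), dtype=object)
for i in range(5):
    shuffled_demo[i, 0] = np.full((4, 4, 3), i, dtype=np.uint8)

# Collect the distinct image shapes; np.stack needs exactly one.
shapes = {img.shape for img in shuffled_demo[:, 0]}
if len(shapes) != 1:
    raise ValueError(f"cannot stack images with differing shapes: {shapes}")

batch = np.stack(shuffled_demo[:, 0])
print(batch.shape)  # (5, 4, 4, 3)
```

If the check fails, the odd images would have to be converted or dropped first; which of the two is right depends on the dataset.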

Related

Understanding Numpy dimensions of arrays

I started looking into NumPy using the book 'Python for Data Analysis'. Why is the array dimension for arr2d "2" instead of "3"? And why is the dimension for arr3d "3" instead of "2"?
I thought the dimension of an array was based on the number of rows. Or does this not apply to higher-dimensional arrays?
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d.shape
Output: (3, 3)
arr2d.ndim
Output: 2
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d.shape
Output: (2, 2, 3)
arr3d.ndim
Output: 3
The dimension of an array is not based on the number of rows.
It is based on how deeply the brackets [] are nested in the data you pass to np.array().
For example:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d opens with 2 brackets ([[) and closes with 2 brackets (]]), so it is a 2D array of shape (3, 3), i.e. 3 rows and 3 columns.
Similarly:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d opens with 3 brackets ([[[) and closes with 3 brackets (]]]), so it is a 3D array of shape (2, 2, 3), i.e. 2 arrays of 2 rows and 3 columns each.
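A short check of this, using the two arrays from the question. The extra array taller is only an illustration and is not part of the original post:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for name, arr in (("arr2d", arr2d), ("arr3d", arr3d)):
    # ndim counts axes, so it always equals the length of the shape tuple.
    print(name, arr.shape, arr.ndim, len(arr.shape))

# Adding a row changes shape[0] but leaves the number of dimensions alone.
taller = np.vstack([arr2d, [[10, 11, 12]]])
print(taller.shape, taller.ndim)  # (4, 3) 2
```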
Numpy stores its ndarrays as contiguous blocks of memory. Each element is stored in a sequential manner every n bytes after the previous.
(images referenced from this excellent SO post)
So if your 3D array looks like this -
np.arange(0,16).reshape(2,2,4)
#array([[[ 0, 1, 2, 3],
# [ 4, 5, 6, 7]],
#
# [[ 8, 9, 10, 11],
# [12, 13, 14, 15]]])
Then in memory it is stored as one flat sequence in row-major order: 0, 1, 2, ..., 15.
When retrieving an element (or a block of elements), NumPy calculates how many strides (of 8 bytes each) it needs to traverse to get the next element in that direction/axis. So, for the above example, for axis=2 it has to traverse 8 bytes (depending on the datatype) but for axis=1 it has to traverse 8*4 bytes, and axis=0 it needs 8*8 bytes.
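The byte counts above can be read straight off the strides attribute. This sketch pins the dtype to int64 so that every element is 8 bytes regardless of platform (that choice is an assumption made for the example):

```python
import numpy as np

# dtype fixed to int64 so each element is 8 bytes on every platform
arr = np.arange(16, dtype=np.int64).reshape(2, 2, 4)
print(arr.itemsize)  # 8
print(arr.strides)   # (64, 32, 8)

# Byte offset of arr[i, j, k] from the start of the buffer
# is the sum of index * stride over all axes.
i, j, k = 1, 0, 2
offset = i * arr.strides[0] + j * arr.strides[1] + k * arr.strides[2]
print(offset // arr.itemsize, arr[i, j, k])  # 10 10
```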
With this in mind, let's understand what dimensions are in numpy.
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr2d.shape, arr3d.shape)
(3, 3) (2, 2, 3)
These can be considered a 2D matrix and a 3D tensor respectively. Here is an intuitive diagram showing what this looks like.
A 1D numpy array (ndim=1) is a vector, 2D is a matrix, and 3D is a rank-3 tensor, which can be imagined as a cube. The number of values it can store is equal to array.shape[0] * array.shape[1] * array.shape[2], which in your second case is 2*2*3.
Vector (n,) -> (axis0,) #elements
Matrix (m,n) -> (axis0, axis1) #rows, columns
Tensor3 (l,m,n) -> (axis0, axis1, axis2)
Tensor4 (l,m,n,o) -> (axis0, axis1, axis2, axis3)

How to roll two arrays of different dimensions into a one-dimensional array in Python

I have two arrays (a, b) with different m x n dimensions.
I need to know how I can roll these two arrays into a single one-dimensional array.
I called flatten() on both a and b and then combined them, but what I get is a list containing two one-dimensional arrays (a, b):
a = np.array([[1,2,3,4],[3,4,5,6],[4,5,6,7]]) #3x4 array
b = np.array([ [1,2],[2,3],[3,4],[4,5],[5,6]]) #5x2 array
result = [a.flatten(),b.flatten()]
print(result)
[array([1, 2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7]), array([1, 2, 2, 3, ... 5, 6])]
In MATLAB, I would do it like this:
res = [a(:);b(:)]
Also, how can I retrieve a and b back from the result?
Use ravel + concatenate:
>>> np.concatenate((a.ravel(), b.ravel()))
array([1, 2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6])
ravel returns a 1D view of each array where possible (otherwise a copy), so it is a cheap operation. concatenate joins the flattened arrays together, returning a new array.
As an aside, if you want to be able to retrieve these arrays back, you'll need to store their shapes in some variable.
i = a.shape
j = b.shape
res = np.concatenate((a.ravel(), b.ravel()))
Later, to retrieve a and b from res,
a = res[:np.prod(i)].reshape(i)
b = res[np.prod(i):].reshape(j)
a
array([[1, 2, 3, 4],
[3, 4, 5, 6],
[4, 5, 6, 7]])
b
array([[1, 2],
[2, 3],
[3, 4],
[4, 5],
[5, 6]])
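The same round trip generalises to any number of arrays. Below is a sketch with two hypothetical helpers, pack and unpack (these names are made up for illustration and are not NumPy functions):

```python
import numpy as np

def pack(arrays):
    """Flatten arrays into one 1D array and remember their shapes."""
    shapes = [arr.shape for arr in arrays]
    flat = np.concatenate([arr.ravel() for arr in arrays])
    return flat, shapes

def unpack(flat, shapes):
    """Inverse of pack: cut `flat` at the recorded sizes and reshape."""
    sizes = [int(np.prod(shape)) for shape in shapes]
    # np.split takes the cut positions, i.e. the running totals of the sizes
    pieces = np.split(flat, np.cumsum(sizes)[:-1])
    return [piece.reshape(shape) for piece, shape in zip(pieces, shapes)]

a = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [4, 5, 6, 7]])  # 3x4
b = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])    # 5x2

flat, shapes = pack([a, b])
a2, b2 = unpack(flat, shapes)
print(flat.shape)  # (22,)
```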
How about changing the middle line to:
result = np.concatenate([a.flatten(), b.flatten()])
Or even more simply (if you know there's always exactly 2 arrays):
result = np.append(a, b)  # with no axis given, np.append flattens both inputs

Put numpy arrays split with np.split() back together

I have split a numpy array like so:
x = np.random.randn(10,3)
x_split = np.split(x,5)
which splits x equally into five numpy arrays each with shape (2,3) and puts them in a list. What is the best way to combine a subset of these back together (e.g. x_split[:k] and x_split[k+1:]) so that the resulting shape is similar to the original x i.e. (something,3)?
I found that for k > 0 this is possible if you do:
np.vstack((np.vstack(x_split[:k]),np.vstack(x_split[k+1:])))
but this does not work when k = 0 as x_split[:0] = [] so there must be a better and cleaner way. The error message I get when k = 0 is:
ValueError: need at least one array to concatenate
The comment by Paul Panzer is right on target, but since NumPy now gently discourages vstack, here is the concatenate version:
x = np.random.randn(10, 3)
x_split = np.split(x, 5, axis=0)
k = 0
np.concatenate(x_split[:k] + x_split[k+1:], axis=0)
Note the explicit axis argument passed both times (it has to be the same); this makes it easy to adapt the code to work for other axes if needed. E.g.,
x_split = np.split(x, 3, axis=1)
k = 0
np.concatenate(x_split[:k] + x_split[k+1:], axis=1)
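Wrapped in a small function, the same idea can be checked for every k, including k = 0. The name drop_chunk is hypothetical, and x is filled with np.arange rather than random values so the result is easy to verify:

```python
import numpy as np

def drop_chunk(chunks, k, axis=0):
    """Join every chunk except chunks[k] along `axis`."""
    # Adding an empty list slice is harmless, so k = 0 works too.
    return np.concatenate(chunks[:k] + chunks[k + 1:], axis=axis)

x = np.arange(30).reshape(10, 3)
x_split = np.split(x, 5, axis=0)

for k in range(5):
    print(k, drop_chunk(x_split, k).shape)  # always (8, 3)
```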
np.r_ can turn several slices into a list of indices.
In [20]: np.r_[0:3, 4:5]
Out[20]: array([0, 1, 2, 4])
In [21]: np.vstack([xsp[i] for i in _])
Out[21]:
array([[9, 7, 5],
[6, 4, 3],
[9, 8, 0],
[1, 2, 2],
[3, 3, 0],
[8, 1, 4],
[2, 2, 5],
[4, 4, 5]])
In [22]: np.r_[0:0, 1:5]
Out[22]: array([1, 2, 3, 4])
In [23]: np.vstack([xsp[i] for i in _])
Out[23]:
array([[9, 8, 0],
[1, 2, 2],
[3, 3, 0],
[8, 1, 4],
[3, 2, 0],
[0, 3, 8],
[2, 2, 5],
[4, 4, 5]])
Internally np.r_ has a lot of ifs and loops to handle the slices and their boundaries, but it hides it all from us.
If the xsp (your x_split) was an array, we could do xsp[np.r_[...]], but since it is a list we have to iterate. Well we could also hide that iteration with an operator.itemgetter object.
In [26]: operator.itemgetter(*Out[22])
Out[26]: operator.itemgetter(1, 2, 3, 4)
In [27]: np.vstack(operator.itemgetter(*Out[22])(xsp))
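For reference, a self-contained version of the itemgetter approach. Here x is built with np.arange (an assumption, chosen so the output is predictable) instead of the random data used in the session above:

```python
import operator
import numpy as np

x = np.arange(30).reshape(10, 3)
xsp = np.split(x, 5)          # list of five (2, 3) arrays

idx = np.r_[0:0, 1:5]         # array([1, 2, 3, 4]); the empty slice adds nothing
# itemgetter returns a tuple when given two or more indices;
# with a single index it would return the bare item instead.
picked = operator.itemgetter(*idx)(xsp)
result = np.vstack(picked)
print(result.shape)  # (8, 3)
```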

How to use numpy as_strided (from np.lib.stride_tricks) correctly?

I'm trying to reshape a numpy array using numpy.lib.stride_tricks. This is the guide I'm following: https://stackoverflow.com/a/2487551/4909087
My use case is very similar, with the difference being that I need strides of 3.
Given this array:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
I'd like to get:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
Here's what I tried:
import numpy as np
as_strided = np.lib.stride_tricks.as_strided
a = np.arange(1, 10)
as_strided(a, (len(a) - 2, 3), (3, 3))
array([[ 1, 2199023255552, 131072],
[ 2199023255552, 131072, 216172782113783808],
[ 131072, 216172782113783808, 12884901888],
[216172782113783808, 12884901888, 768],
[ 12884901888, 768, 1125899906842624],
[ 768, 1125899906842624, 67108864],
[ 1125899906842624, 67108864, 4]])
I was pretty sure I'd followed the example to a T, but evidently not. Where am I going wrong?
The accepted answer (and discussion) is good, but for the benefit of readers who don't want to run their own test case, I'll try to illustrate what's going on:
In [374]: a = np.arange(1,10)
In [375]: as_strided = np.lib.stride_tricks.as_strided
In [376]: a.shape
Out[376]: (9,)
In [377]: a.strides
Out[377]: (4,)
For a contiguous 1d array, strides is the size of the element, here 4 bytes, an int32. To go from one element to the next it steps forward 4 bytes.
What the OP tried:
In [380]: as_strided(a, shape=(7,3), strides=(3,3))
Out[380]:
array([[ 1, 512, 196608],
[ 512, 196608, 67108864],
[ 196608, 67108864, 4],
[ 67108864, 4, 1280],
[ 4, 1280, 393216],
[ 1280, 393216, 117440512],
[ 393216, 117440512, 7]])
This is stepping by 3 bytes, crossing int32 boundaries, and giving mostly unintelligible numbers. It might make more sense if the dtype had been bytes or uint8.
Instead using a.strides*2 (tuple replication), or (4,4) we get the desired array:
In [381]: as_strided(a, shape=(7,3), strides=(4,4))
Out[381]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
Columns and rows both step one element, resulting in a 1 step moving window. We could have also set shape=(3,7), 3 windows 7 elements long.
In [382]: _.strides
Out[382]: (4, 4)
Changing strides to (8,4) steps 2 elements for each window.
In [383]: as_strided(a, shape=(7,3), strides=(8,4))
Out[383]:
array([[ 1, 2, 3],
[ 3, 4, 5],
[ 5, 6, 7],
[ 7, 8, 9],
[ 9, 25, -1316948568],
[-1316948568, 184787224, -1420192452],
[-1420192452, 0, 0]])
But the shape is off, showing us bytes past the end of the original data buffer. That could be dangerous (we don't know if those bytes belong to some other object or array). With this size of array we don't get a full set of 2 step windows.
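One way to avoid reading past the buffer is to derive the number of rows from the array length instead of hard-coding it. A sketch for 1D input only; strided_windows is a hypothetical helper name:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def strided_windows(a, window, step):
    """All full windows of length `window`, starting every `step` elements (1D input)."""
    # Count only the windows that fit entirely inside the buffer.
    rows = max((len(a) - window) // step + 1, 0)
    s = a.strides[0]  # bytes between neighbouring elements
    return as_strided(a, shape=(rows, window), strides=(step * s, s),
                      writeable=False)

a = np.arange(1, 10)
print(strided_windows(a, 3, 2))
# [[1 2 3]
#  [3 4 5]
#  [5 6 7]
#  [7 8 9]]
```

Using a.strides[0] rather than a literal 4 keeps the function correct when the integer size differs between platforms.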
Now step 3 elements for each row (3*4, 4):
In [384]: as_strided(a, shape=(3,3), strides=(12,4))
Out[384]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [385]: a.reshape(3,3).strides
Out[385]: (12, 4)
This is the same shape and strides as a 3x3 reshape.
We can set negative stride values and 0 values. In fact, negative-step slicing along a dimension with a positive stride will give a negative stride, and broadcasting works by setting 0 strides:
In [399]: np.broadcast_to(a, (2,9))
Out[399]:
array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
[1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [400]: _.strides
Out[400]: (0, 4)
In [401]: a.reshape(3,3)[::-1,:]
Out[401]:
array([[7, 8, 9],
[4, 5, 6],
[1, 2, 3]])
In [402]: _.strides
Out[402]: (-12, 4)
However, negative strides require adjusting which element of the original array is the first element of the view, and as_strided has no parameter for that.
I have no idea why you think you need strides of 3. The strides you need are the distance in bytes between one element of a and the next, which you can get using a.strides:
as_strided(a, (len(a) - 2, 3), a.strides*2)
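As an alternative not used in the original answers: NumPy 1.20 and later ship sliding_window_view, which works out shape and strides itself and returns a read-only view. A sketch, assuming such a NumPy version is installed:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(1, 10)
windows = sliding_window_view(a, 3)  # read-only view, no manual byte arithmetic
print(windows.shape)  # (7, 3)
print(windows[::2])   # every second window: rows starting at 1, 3, 5, 7
```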
I was trying to do a similar operation and ran into the same problem.
In your case, as stated in this comment, the problems were:
You were not taking into account the size of each element in memory (int32 = 4 bytes, which can be checked using a.dtype.itemsize).
You did not specify the row stride correctly: since each window advances by only one element, it should also be 4 bytes.
I made myself a function based on this answer, in which I compute the segmentation of a given array, using a window of n-elements and specifying the number of elements to overlap (given by window - number_of_elements_to_skip).
I share it here in case someone else needs it, since it took me a while to figure out how stride_tricks work:
def window_signal(signal, window, overlap):
    """
    Windowing function for data segmentation.
    Parameters:
    ------------
    signal: ndarray
        The signal to segment.
    window: int
        Window length, in samples.
    overlap: int
        Number of samples to overlap
    Returns:
    --------
    nd-array
        A view of the signal array with shape (rows, window),
        where rows = (N-window)//(window-overlap) + 1
    """
    N = signal.reshape(-1).shape[0]
    if (window == overlap):
        rows = N//window
        overlap = 0
    else:
        rows = (N-window)//(window-overlap) + 1
        miss = (N-window)%(window-overlap)
        if(miss != 0):
            print('Windowing led to the loss of ', miss, ' samples.')
    item_size = signal.dtype.itemsize
    strides = (window - overlap) * item_size
    return np.lib.stride_tricks.as_strided(signal, shape=(rows, window),
                                           strides=(strides, item_size))
The solution for this case is, according to your code:
as_strided(a, (len(a) - 2, 3), (4, 4))
Alternatively, using the function window_signal:
window_signal(a, 3, 2)
Both return as output the following array:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])

Losing dimensions of a numpy array

I have a numpy array that consists of lists each containing more lists. I have been trying to figure out a smart and fast way to collapse the dimensions of these list using numpy, but without any luck.
What I have looks like this:
>>> np.shape(projected)
(13,)
>>> for i in range(len(projected)):
...     print(np.shape(projected[i]))
(130, 3200)
(137, 3200)
.
.
(307, 3200)
(196, 3200)
What I am trying to get is a list that contains all the sub-lists and would be 130+137+..+307+196 long. I have tried using np.reshape() but it gives an error: ValueError: total size of new array must be unchanged
np.reshape(projected,(total_number_of_lists, 3200))
>> ValueError: total size of new array must be unchanged
I have been fiddling around with np.vstack but to no avail. Any help that does not contain a for loop and an .append() would be highly appreciated.
It seems you can just use np.concatenate along the first axis (axis=0), like so -
np.concatenate(projected,0)
Sample run -
In [226]: # Small random input list
...: projected = [[[3,4,1],[5,3,0]],
...: [[0,2,7],[8,2,8],[7,3,6],[1,9,0],[4,2,6]],
...: [[0,2,7],[8,2,8],[7,3,6]]]
In [227]: # Print nested lists shapes
...: for i in range(len(projected)):
...:     print(np.shape(projected[i]))
...:
(2, 3)
(5, 3)
(3, 3)
In [228]: np.concatenate(projected,0)
Out[228]:
array([[3, 4, 1],
[5, 3, 0],
[0, 2, 7],
[8, 2, 8],
[7, 3, 6],
[1, 9, 0],
[4, 2, 6],
[0, 2, 7],
[8, 2, 8],
[7, 3, 6]])
In [232]: np.concatenate(projected,0).shape
Out[232]: (10, 3)
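A sketch closer to the question's setup, where projected is a 1D object array of 2D blocks with differing row counts. The sizes and random data here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
row_counts = [2, 5, 3]  # hypothetical block heights

# 1D object array whose cells hold (n, 4) blocks, like `projected` in the question
projected = np.empty(len(row_counts), dtype=object)
for i, n in enumerate(row_counts):
    projected[i] = rng.random((n, 4))

stacked = np.concatenate(projected, axis=0)
print(stacked.shape)  # (10, 4)

# Start/stop rows of each original block inside `stacked`
bounds = np.cumsum([0] + row_counts)
block1 = stacked[bounds[1]:bounds[2]]  # rows that came from projected[1]
```

np.reshape fails on such an array because it sees only the outer elements (3 here, 13 in the question), not the rows inside them.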
