NumPy using the reshape function to reshape an array [duplicate] - python

I have an array: [1, 2, 3, 4, 5, 6]. I would like to use the numpy.reshape() function so that I end up with this array:
[[1, 4],
[2, 5],
[3, 6]
]
I'm not sure how to do this. I keep ending up with this, which is not what I want:
[[1, 2],
[3, 4],
[5, 6]
]

These do the same thing:
In [57]: np.reshape([1,2,3,4,5,6], (3,2), order='F')
Out[57]:
array([[1, 4],
[2, 5],
[3, 6]])
In [58]: np.reshape([1,2,3,4,5,6], (2,3)).T
Out[58]:
array([[1, 4],
[2, 5],
[3, 6]])
Normally values are 'read' across the rows in Python/numpy. This is called row-major or 'C' order. Reading down the columns is 'F' order, for FORTRAN, and is common in MATLAB, which has Fortran roots.
If you take the 'F' order, make a new copy and string it out, you'll get a different order:
In [59]: np.reshape([1,2,3,4,5,6], (3,2), order='F').copy().ravel()
Out[59]: array([1, 4, 2, 5, 3, 6])
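As a minimal sketch of the read/write order described above (the variable names are only illustrative), the same six values can be filled row by row or column by column, and then read back out in either order:

```python
import numpy as np

a = np.arange(1, 7)                    # [1 2 3 4 5 6]

c_order = a.reshape(3, 2)              # fill row by row (default, order='C')
f_order = a.reshape(3, 2, order='F')   # fill column by column

print(c_order.tolist())  # [[1, 2], [3, 4], [5, 6]]
print(f_order.tolist())  # [[1, 4], [2, 5], [3, 6]]

# Reading the F-filled array back in F order recovers the original sequence,
# while the default C-order read gives the interleaved one.
print(f_order.ravel(order='F').tolist())  # [1, 2, 3, 4, 5, 6]
print(f_order.ravel().tolist())           # [1, 4, 2, 5, 3, 6]
```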

You can set the order in np.reshape; in your case, use 'F'. See the docs for details.
>>> arr
array([1, 2, 3, 4, 5, 6])
>>> arr.reshape(-1, 2, order = 'F')
array([[1, 4],
[2, 5],
[3, 6]])

The reason that you are getting that particular result is that arrays are normally allocated in C order. That means that reshaping by itself is not sufficient. You have to tell numpy to change the order of the axes when it steps along the array. Any number of operations will allow you to do that:
Set the axis order to F. F is for Fortran, which, like MATLAB, conventionally uses column-major order:
a.reshape(3, 2, order='F')
Swap the axes after reshaping:
np.swapaxes(a.reshape(2, 3), 0, 1)
Transpose the result:
a.reshape(2, 3).T
Roll the second axis forward:
np.rollaxis(a.reshape(2, 3), 1)
Notice that all but the first case require you to reshape to the transpose.
You can even manually arrange the data
np.stack((a[:3], a[3:]), axis=1)
Note that this will make many unnecessary copies. If you want the data copied, just do
a.reshape(3, 2, order='F').copy()
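A quick way to compare the alternatives is to check each against the target array and ask whether it still shares memory with the input. This is only a sketch: the dictionary and its labels are made up for illustration, and np.moveaxis is used here as the newer counterpart of np.rollaxis.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
expected = np.array([[1, 4], [2, 5], [3, 6]])

candidates = {
    "order='F'": a.reshape(3, 2, order='F'),
    "swapaxes": np.swapaxes(a.reshape(2, 3), 0, 1),
    "transpose": a.reshape(2, 3).T,
    "moveaxis": np.moveaxis(a.reshape(2, 3), 1, 0),
    "stack": np.stack((a[:3], a[3:]), axis=1),
}

for name, result in candidates.items():
    # shares_memory reports whether the result is a view on `a` or a fresh copy
    print(name, np.array_equal(result, expected), np.shares_memory(result, a))
```

The transposed reshape is a view on the original data, whereas np.stack always builds a new array.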

Related

Stack several 2D arrays to produce a 3D array

I have 4 numpy arrays, each of shape (5,5). I would like to stack them such that I get a new array of shape (5,5,4). I tried using:
N = np.stack((a, b, c, d))
but, as I am new to using numpy, I cannot understand why that is giving a shape of (4, 5, 5) instead of (5, 5, 4). Is there another method I should be using? dstack works but changes my arrays, I think it transposes them.
For example, 4 arrays
[[1,2]
[3,4]]
[[1,2]
[3,4]]
[[1,2]
[3,4]]
[[1,2]
[3,4]]
when stacked I am expecting:
[[[1,2]
[3,4]]
[[1,2]
[3,4]]
[[1,2]
[3,4]]
[[1,2]
[3,4]]]
This is working as expected with stack but would give a shape of (4,2,2) instead of (2,2,4). From my understanding, shape is (rows, columns, depth). Am I wrong in this?
I believe you could concatenate the arrays, and reshape into a 3D array as:
l = [a,b,c,d]
np.concatenate(l).reshape(len(l), *a.shape)
Or if you want to avoid creating that list and know the number of arrays beforehand:
np.concatenate((a,b,c,d)).reshape(4, *a.shape)
Checking on the shared example:
a = [[1, 2], [3, 4]]
d = c = b = a
np.concatenate((a,b,c,d)).reshape(4, *np.array(a).shape)
array([[[1, 2],
[3, 4]],
[[1, 2],
[3, 4]],
[[1, 2],
[3, 4]],
[[1, 2],
[3, 4]]])
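For what it's worth, the concatenate-and-reshape approach above appears to produce the same result as np.stack with its default axis. A small check, using made-up arrays with distinct values so that the ordering is visible:

```python
import numpy as np

a = np.arange(1, 5).reshape(2, 2)
b, c, d = a + 10, a + 20, a + 30   # distinct values so the ordering is visible

via_concat = np.concatenate((a, b, c, d)).reshape(4, *a.shape)
via_stack = np.stack((a, b, c, d))  # new axis 0 by default

print(via_concat.shape)                       # (4, 2, 2)
print(np.array_equal(via_concat, via_stack))  # True
```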
In [10]: arr = np.arange(1,5).reshape(2,2)
In [11]: np.stack((arr,arr,arr))
Out[11]:
array([[[1, 2],
[3, 4]],
[[1, 2],
[3, 4]],
[[1, 2],
[3, 4]]])
In [12]: _.shape
Out[12]: (3, 2, 2)
By default, stack joins the arrays on a new first axis, so the result has the same shape as np.array((arr,arr,arr)).
Given an axis parameter, it can join them on a different axis:
In [13]: np.stack((arr,arr,arr), axis=2)
Out[13]:
array([[[1, 1, 1],
[2, 2, 2]],
[[3, 3, 3],
[4, 4, 4]]])
In [14]: _.shape
Out[14]: (2, 2, 3)
np.dstack does the same thing, where d stands for 'depth'.
The last dimension (here 3) is displayed as the innermost columns.
Selecting one 'channel' produces a 2d array:
In [17]: np.stack((arr,arr,arr), axis=2)[:,:,0]
Out[17]:
array([[1, 2],
[3, 4]])
For 3 dimensions, the first dimension is often called blocks or planes, the middle one rows, and the last one columns. Those names are conveniences that help us visualize the action, but they have no inherent meaning in numpy. For images the last dimension is often called colors or channels, and has size 3 or 4. A 4d array of images could be described as
(batches, height, width, color)
But the actual meanings depend on how you are processing the array.
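Applied to the (5, 5, 4) shape asked about in the question, a sketch could look like this. The four input arrays are hypothetical stand-ins, built so that each one is distinguishable:

```python
import numpy as np

# Hypothetical stand-ins for the four (5, 5) arrays in the question
a, b, c, d = (np.arange(25).reshape(5, 5) + 100 * k for k in range(4))

first = np.stack((a, b, c, d))           # new axis in front
last = np.stack((a, b, c, d), axis=-1)   # new axis at the end

print(first.shape)  # (4, 5, 5)
print(last.shape)   # (5, 5, 4)

# The inputs are not transposed, only indexed differently:
print(np.array_equal(first[2], c))       # True
print(np.array_equal(last[..., 2], c))   # True

# An existing (4, 5, 5) array can be rearranged into the same layout
print(np.array_equal(np.moveaxis(first, 0, -1), last))  # True
```

The printed form of `last` groups values from all four inputs together in the innermost brackets, which may be why dstack looked like it "changed" the arrays.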

NumPy - Expand and Repeat

Is there a way to "expand" an array and repeat the last element to fill the expansion?
Another post talks about expansion and padding with 0 but I wish to repeat the last value as the pad.
Say I have an array:
[[1, 2],
[3, 4],
[0, 0]]
And I need to insert [5, 6, 6] to replace the [0, 0]. Obviously NumPy wouldn't allow this, but can I reshape/expand to:
[[1, 2, 2],
[3, 4, 4],
[5, 6, 6]]
I'm reading through a file where the number of values may vary in length, but I need the array to be of the same shape. One way to do this is to read through the file first and find the maximum length, then read it again and populate, but the file is 10GB+ so I would prefer to do it in a single pass by "expanding" and backfilling with repeats.
Looks like what you require is numpy.pad using the edge mode. From the doc:
‘edge’
Pads with the edge values of array.
Example code:
>>> ar = np.array([[1,2], [4,5]])
>>> ar
array([[1, 2],
[4, 5]])
>>> np.pad(ar, [(0, 0), (0, 4)], mode="edge")
array([[1, 2, 2, 2, 2, 2],
[4, 5, 5, 5, 5, 5]])
The first tuple, (0, 0), specifies no padding on the first axis, while the second, (0, 4), adds 0 columns on the left and 4 on the right of the second axis.
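For the array in the question, a minimal sketch might widen the array only when an incoming row is longer than the current width, then assign the row. The names `new_row` and `extra` are illustrative, and the row index is hard-coded for the example:

```python
import numpy as np

arr = np.array([[1, 2],
                [3, 4],
                [0, 0]])
new_row = [5, 6, 6]

extra = len(new_row) - arr.shape[1]   # columns to add on the right
if extra > 0:
    # repeat each row's last value into the new columns
    arr = np.pad(arr, [(0, 0), (0, extra)], mode="edge")

arr[2] = new_row
print(arr.tolist())  # [[1, 2, 2], [3, 4, 4], [5, 6, 6]]
```

Note that np.pad returns a new array each time, so with a very large file it is probably worth widening by more than one column at a time.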

How to apply a function on jagged Numpy arrays (unequal row lengths) without using np.apply_along_axis()?

I'm trying to speed up a process, and I think this might be possible using numpy's apply_along_axis. The problem is that not all my rows have the same length.
When I do:
a = np.array([[1, 2, 3],
[2, 3, 4],
[4, 5, 6]])
b = np.apply_along_axis(sum, 1, a)
print(b)
This works fine. But I would like to do something similar to (please note that the first row has 4 elements and the rest have 3):
a = np.array([[1, 2, 3, 4],
[2, 3, 4],
[4, 5, 6]])
b = np.apply_along_axis(sum, 1, a)
print(b)
But this fails because:
numpy.AxisError: axis 1 is out of bounds for array of dimension 1
I've looked around and the only 'solution' I've found is to add zeros to make all the arrays the same length, which would probably defeat the purpose of performance improvement.
Is there any way to use np.apply_along_axis on a non-regular shaped numpy array?
You can transform your initial array of iterable-objects to ndarray by padding them with zeros in a vectorized manner:
import numpy as np
a = np.array([[1, 2, 3, 4],
[2, 3, 4],
[4, 5, 6]], dtype=object) # recent NumPy versions require dtype=object for ragged input
max_len = len(max(a, key = lambda x: len(x))) # max length of iterable-objects contained in array
cust_func = np.vectorize(pyfunc=lambda x: np.pad(array=x,
pad_width=(0,max_len),
mode='constant',
constant_values=(0,0))[:max_len], otypes=[list])
a_pad = np.stack(cust_func(a))
output:
array([[1, 2, 3, 4],
[2, 3, 4, 0],
[4, 5, 6, 0]])
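If the function is a reduction such as a sum, padding can be avoided altogether. The following is a sketch of a different technique from the one above: it flattens the rows and uses np.add.reduceat with the start index of each row. It assumes every row has at least one element, since reduceat treats empty segments differently.

```python
import numpy as np

rows = [[1, 2, 3, 4],
        [2, 3, 4],
        [4, 5, 6]]

flat = np.concatenate(rows)                  # all values in one 1-D array
lengths = np.array([len(r) for r in rows])
starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))  # start index of each row

row_sums = np.add.reduceat(flat, starts)     # sum each segment of `flat`
print(row_sums.tolist())  # [10, 9, 15]
```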
It depends.
Do you know the size of the vectors before or are you appending to a list?
see e.g. http://stackoverflow.com/a/58085045/7919597
You could for example pad the arrays
import numpy as np
a1 = [1, 2, 3, 4]
a2 = [2, 3, 4, np.nan] # pad with nan
a3 = [4, 5, 6, np.nan] # pad with nan
b = np.stack([a1, a2, a3], axis=0)
print(b)
# you can apply the normal numpy operations on
# arrays with nan, they usually just result in a nan
# in a resulting array
c = np.diff(b, axis=-1)
print(c)
Afterwards you can apply a moving window on each row over the columns.
Have a look at https://stackoverflow.com/a/22621523/7919597 which is only 1d, but can give you an idea of how it could work.
It is possible to use a 2d array with only one row as kernel (shape e.g. (1, 3)) with scipy.signal.convolve2d and use the idea above.
This is a workaround to get a "row-wise 1D convolution":
from scipy import signal
krnl = np.array([[0, 1, 0]])
d = signal.convolve2d(c, krnl, mode='same')
print(d)
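For the row sums asked about in the question, the NaN-padded array can be combined with NumPy's NaN-aware reductions, which skip the padding instead of propagating it. A small sketch:

```python
import numpy as np

b = np.array([[1, 2, 3, 4],
              [2, 3, 4, np.nan],
              [4, 5, 6, np.nan]])

print(b.sum(axis=1).tolist())         # [10.0, nan, nan]
print(np.nansum(b, axis=1).tolist())  # [10.0, 9.0, 15.0]
```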

No fortran order in numpy.array

I see no fortran order in:
import numpy as np
In [143]: np.array([[1,2],[3,4]],order='F')
Out[143]:
array([[1, 2],
[3, 4]])
But in the following it works:
In [139]: np.reshape(np.arange(9),newshape=(3,3),order='F')
Out[139]:
array([[0, 3, 6],
[1, 4, 7],
[2, 5, 8]])
So what am I doing wrong in the first one?
When you call numpy.array to create an array from an existing Python object, it will give you an object with whatever shape that the original Python object has. So,
np.array([[1,2],[3,4]], ...)
Will always give you,
np.array([[1, 2],
[3, 4]])
Which is exactly what you typed in, so it should not come as a surprise. Fortran order and C order do not describe the shape of the data, they describe the memory layout. When you print out an object, NumPy doesn't show you what the memory layout is, it only shows you the shape.
You can witness that the array truly is stored in Fortran order when you flatten it with the "K" order, which keeps the original order of the elements:
>>> a = np.array([[1,2],[3,4]], order="F")
>>> a.flatten(order="K")
array([1, 3, 2, 4])
This is what truly distinguishes Fortran from C order: the memory layout. Most NumPy functions do not force you to consider memory layout; instead, different layouts are handled transparently.
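Another way to see the layout without flattening is to inspect the array's flags and strides. In this sketch the dtype is fixed to int64 so that the stride values are predictable; the names `c` and `f` are just for illustration.

```python
import numpy as np

c = np.array([[1, 2], [3, 4]], dtype=np.int64)             # default C order
f = np.array([[1, 2], [3, 4]], dtype=np.int64, order="F")  # Fortran order

print(np.array_equal(c, f))                               # True: same shape and values
print(c.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])   # True True
print(c.strides, f.strides)                               # (16, 8) (8, 16)
```

The strides give the number of bytes to step along each axis: in C order the last axis is the one with adjacent elements, in Fortran order the first.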
It sounds like what you want is to transpose, reversing the axis order. This can be done simply:
>>> b = np.transpose(a)
>>> b
array([[1, 3],
[2, 4]])
This does not create a new array, but a new view of the same array:
>>> b.base is a
True
If you want the data to have the memory layout 1 2 3 4 and have a Fortran order view of that [[1, 3], [2, 4]], the efficient way to do this is to store the existing array with C order and then transpose it, which results in a Fortran-order array with the desired contents and requires no extra copies.
>>> a = np.array([[1, 2], [3, 4]]).transpose()
>>> a.flatten(order="K")
array([1, 2, 3, 4])
>>> a
array([[1, 3],
[2, 4]])
If you store the original with Fortran order, the transposition will result in C order, so you don't want that (or maybe all you care about is the transposition, and memory order is not important?). In either case, the array will look the same in NumPy.
>>> a = np.array([[1, 2], [3, 4]], order="F").transpose()
>>> a.flatten(order="K")
array([1, 3, 2, 4])
>>> a
array([[1, 3],
[2, 4]])
Your two means of constructing the 2D array are not at all equivalent. In the first, you specified the structure of the array. In the second, you formed an array and then reshaped to your liking.
>>> np.reshape([1,2,3,4],newshape=(2,2),order='F')
array([[1, 3],
[2, 4]])
Again, for comparison, even if you ask for the reshape and format change to FORTRAN, you'll get your specified structure:
>>> np.reshape([[1,2],[3,4]],newshape=(2,2),order='F')
array([[1, 2],
[3, 4]])
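One way to make sense of this: the order argument of reshape controls both how elements are read from the input and how they are placed into the output. With a 2x2 input and a 2x2 target the two cancel out, but the effect shows up as soon as the shape changes. A small sketch:

```python
import numpy as np

nested = [[1, 2], [3, 4]]

# Same shape in and out: reading and writing in F order cancel each other
print(np.reshape(nested, (2, 2), order='F').tolist())        # [[1, 2], [3, 4]]

# Flattening exposes the column-by-column read
print(np.reshape(nested, 4, order='F').tolist())             # [1, 3, 2, 4]

# Starting from a flat list, the values are written column by column
print(np.reshape([1, 2, 3, 4], (2, 2), order='F').tolist())  # [[1, 3], [2, 4]]
```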

numpy: how to construct a matrix of vectors from vector of matrix

I'm new to numpy,
so, with numpy, is it possible to use a vector of matrices to get a matrix of vectors?
for example:
matrix1(
[
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]
])
matrix2(
[
[2, 4, 6],
[2, 4, 6],
[2, 4, 6]
])
-->
matrix(
[
[array('1 2'), array('2 4'), array('3 6')],
[array('1 2'), array('2 4'), array('3 6')],
[array('1 2'), array('2 4'), array('3 6')]
])
I'm new to numpy, so I'm not sure if it is allowed to put any thing in numpy's matrix or just numbers.
And it's not easy to get answer from google with descriptions like "matrix of vectors and vectors of matrix"
numpy doesn't have a concept of "vector" separate from "matrix." It does have distinct concepts of "matrix" and "array," but most people avoid the matrix representation entirely. If you use arrays, the concepts of "vector," "matrix," and "tensor" are all subsumed under the general concept of an array's "shape" attribute.
In this worldview, vectors and matrices are both 2-dimensional arrays, distinguished only by their shape. Row vectors are arrays with the shape (1, n), while column vectors are arrays with the shape (n, 1). Matrices are arrays with the shape (n, m). 1-dimensional arrays can behave like vectors sometimes, depending on context, but often you'll find that you won't get what you want unless you "upgrade" them.
With all that in mind, here's one possible answer to your question. First, we create a 1-d array:
>>> a1d = numpy.array([1, 2, 3])
>>> a1d
array([1, 2, 3])
Now we reshape it to create a column vector. The -1 here tells numpy to figure out the right size given the input.
>>> vcol = a1d.reshape((-1, 1))
>>> vcol
array([[1],
[2],
[3]])
Observe the doubled brackets at the beginning and ending of this. That's a subtle cue that this is a 2-d array, even though one dimension has a size of just 1.
We can do the same thing, swapping the dimensions, to get a row. Note again the doubled brackets.
>>> vrow = a1d.reshape((1, -1))
>>> vrow
array([[1, 2, 3]])
You can tell that these are 2-d arrays, because a 1-d array would have only one value in its shape tuple:
>>> a1d.shape
(3,)
>>> vcol.shape
(3, 1)
>>> vrow.shape
(1, 3)
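An equivalent way to add the extra dimension, shown here only as an alternative sketch, is to index with np.newaxis:

```python
import numpy as np

a1d = np.array([1, 2, 3])

vcol = a1d[:, np.newaxis]   # same result as a1d.reshape((-1, 1))
vrow = a1d[np.newaxis, :]   # same result as a1d.reshape((1, -1))

print(a1d.ndim, vcol.ndim, vrow.ndim)  # 1 2 2
print(vcol.shape, vrow.shape)          # (3, 1) (1, 3)
```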
To build a matrix from column vectors we can use hstack. There are lots of other methods that may be faster, but this is a good starting point. Here, note that [vcol] is not a numpy object, but an ordinary python list, so [vcol] * 3 means the same thing as [vcol, vcol, vcol].
>>> mat = numpy.hstack([vcol] * 3)
>>> mat
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
And vstack gives us the same thing from row vectors.
>>> mat2 = numpy.vstack([vrow] * 3)
>>> mat2
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
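As an illustration of the "other methods" mentioned above, np.tile and np.broadcast_to can build the same matrices. This is a sketch rather than a benchmark; which is faster will depend on the sizes involved.

```python
import numpy as np

vcol = np.array([[1], [2], [3]])
vrow = vcol.T

mat = np.tile(vcol, (1, 3))            # repeat the column 3 times side by side
mat2 = np.tile(vrow, (3, 1))           # repeat the row 3 times downwards
view = np.broadcast_to(vcol, (3, 3))   # read-only view, no data copied

print(mat.tolist())   # [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(mat2.tolist())  # [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
print(np.array_equal(view, mat), view.flags.writeable)  # True False
```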
It's unlikely that any other interpretation of "construct a matrix of vectors from vector of matrix" will generate something you actually want in numpy!
Since you mention wanting to do linear algebra, here are a couple of operations that are possible. This assumes you're using a recent-enough version of python to use the new @ operator, which provides an unambiguous inline notation for matrix multiplication of arrays (see footnote 1).
For arrays, multiplication is always element-wise. But sometimes there is broadcasting. For values with the same shape, it's plain element-wise multiplication:
>>> vrow * vrow
array([[1, 4, 9]])
>>> vcol * vcol
array([[1],
[4],
[9]])
When values have different shapes, they are broadcast together if possible to produce a sensible result:
>>> vrow * vcol
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
>>> vcol * vrow
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
Broadcasting works in the way you'd expect for other shapes as well:
>>> vrow * mat
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
>>> vcol * mat
array([[1, 1, 1],
[4, 4, 4],
[9, 9, 9]])
If you want a dot product, you have to use the @ operator:
>>> vrow @ vcol
array([[14]])
Note that unlike the * operator, this is not symmetric:
>>> vcol @ vrow
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
This can be a bit confusing at first, because this looks the same as vrow * vcol, but don't be fooled. * will produce the same result regardless of argument order. Finally, for a matrix-vector product:
>>> mat @ vcol
array([[ 6],
[12],
[18]])
Observe again the difference between @ and *:
>>> mat * vcol
array([[1, 1, 1],
[4, 4, 4],
[9, 9, 9]])
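To tie the operators together, here is a short self-contained check that the matrix-multiplication operator (written @ in Python source) agrees with np.matmul and np.dot for these 2-d arrays, while * broadcasts element-wise instead:

```python
import numpy as np

vrow = np.array([[1, 2, 3]])
vcol = vrow.T
mat = np.hstack([vcol] * 3)

print((vrow @ vcol).tolist())                            # [[14]]
print(np.array_equal(mat @ vcol, np.matmul(mat, vcol)))  # True
print(np.array_equal(mat @ vcol, np.dot(mat, vcol)))     # True

# Matrix product vs element-wise product give different shapes here
print((mat @ vcol).shape, (mat * vcol).shape)            # (3, 1) (3, 3)
```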
1. Sadly, this only exists as of Python 3.5. If you need to work with an earlier version, all the same advice applies, except that instead of using inline notation for a @ b, you have to use np.dot(a, b). numpy's matrix type overrides * to behave like @... but then you can't do element-wise multiplication or broadcasting the same way! So even if you have an earlier version, I don't recommend using the matrix type.
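Returning to the example in the question, one further reading that fits plain arrays is a 3-d array whose last axis holds the pair of values taken from the two matrices. This is a sketch under that assumption, with `pairs` as an illustrative name:

```python
import numpy as np

matrix1 = np.array([[1, 2, 3]] * 3)
matrix2 = np.array([[2, 4, 6]] * 3)

# Stack along a new last axis: pairs[i, j] == [matrix1[i, j], matrix2[i, j]]
pairs = np.stack((matrix1, matrix2), axis=-1)

print(pairs.shape)           # (3, 3, 2)
print(pairs[0, 1].tolist())  # [2, 4]
print(pairs[2, 2].tolist())  # [3, 6]
```

Each `pairs[i, j]` is then an ordinary length-2 array, so no object-dtype matrix is needed.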
