numpy.shape gives inconsistent responses - why? - python

Why does the program
import numpy as np
c = np.array([1,2])
print(c.shape)
d = np.array([[1],[2]]).transpose()
print(d.shape)
give
(2,)
(1,2)
as its output? Shouldn't it be
(1,2)
(1,2)
instead? I got this in both python 2.7.3 and python 3.2.3

When you invoke the .shape attribute of a ndarray, you get a tuple with as many elements as dimensions of your array. The length, ie, the number of rows, is the first dimension (shape[0])
You start with an array : c=np.array([1,2]). That's a plain 1D array, so its shape will be a 1-element tuple, and shape[0] is the number of elements, so c.shape = (2,)
Consider c=np.array([[1,2]]). That's a 2D array, with 1 row. The first and only row is [1,2], that gives us two columns. Therefore, c.shape=(1,2) and len(c)=1
Consider c=np.array([[1,],[2,]]). Another 2D array, with 2 rows, 1 column: c.shape=(2,1) and len(c)=2.
Consider d=np.array([[1,],[2,]]).transpose(): this array is the same as np.array([[1,2]]), therefore its shape is (1,2).
Another useful attribute is .size: that's the number of elements across all dimensions, and you have for an array c c.size = np.product(c.shape).
More information on the shape in the documentation.

len(c.shape) is the "depth" of the array.
For c, the array is just a list (a vector), the depth is 1.
For d, the array is a list of lists, the depth is 2.
Note:
c.transpose()
# array([1, 2])
which is not d, so this behaviour is not inconsistent.
dt = d.transpose()
# array([[1],
# [2]])
dt.shape # (2,1)

Quick Fix: check the .ndim property - if its 2, then the .shape property will work as you expect.
Reason Why: if the .ndim property is 2, then numpy reports a shape value that agrees with the convention. If the .ndim property is 1, then numpy just reports shape in a different way.
More talking: When you pass np.array a lists of lists, the .shape property will agree with standard notions of the dimensions of a matrix: (rows, columns).
If you pass np.array just a list, then numpy doesn't think it has a matrix on its hands, and reports the shape in a different way.
The question is: does numpy think it has a matrix, or does it think it has something else on its hands.

transpose does not change the number of dimensions of the array. If c.ndim == 1, c.transpose() == c. Try:
c = np.array([1,2])
print c.shape
print c.T.shape
c = np.atleast_2d(c)
print c.shape
print c.T.shape

Coming from Matlab, I also find it difficult that a single-dimensional array is not organized as (row_count, colum_count)
My function had to respond consistently on a single-dimensional ndarray like [x1, x2, x3] or a list of arrays [[x1, x2, x3], [x1, x2, x3], [x1, x2, x3]].
This worked for me:
dim = np.shape(subtract_matrix)[-1]
Picking the last dimension.

Related

Numpy shape function for single dimension array

I've started to learn NumPy, when I create an array and then invoke the .shape function, I understand how it works for most cases. However, the result does not make sense to me for a single-dimensional array. Can someone please explain the outcome?
array = np.array([4,5,6])
print(array.shape)
The outcome is (3,)
Output tulpe of ints in the "np.shape" function gives the lengths of the corresponding array dimension, and this tuple will be (n,m), in which n and m indicate row and columns, respectively. For a single dimension array, this Tuple will be just (n,), in which n indicates the number of array elements.

How to visualize/connect vectors, matrices and representations in Python and numpy arrays?

I am having trouble visualizing scalars, vectors and matrices as how they are written in a math/physics class to how they would be represented in plain Python and numpy and their corresponding notions of dimensions, axes and shapes.
If I have a scalar, say 5
>>> b = np.array(5)
>>> np.ndim(b)
0
I have 0 dimensions for 5 but what are the axes here? There are functions for ndim and shape but not axes.
For a vector like this:
we say that we have 2 dimensions in physics/math class because it represents a 2D vector but it looks like numpy uses a different notion of this.
Why is it that ndim gives 1 and shape gives what the dimension is?
>>> c = np.array([1,-3])
>>> c
array([ 1, -3])
>>> c.ndim
1
>>> c.shape
(2,)
np.ndim gives 1 then?
I have looked at this tutorial on axes but haven't been able to get how the axes then apply here.
How would you represent the vector above in Python and numpy? Would this be [1, -3] in Python or [[1], [-3]]? How about in numpy? Would it be np.array([1, -3]) or np.array([[1], [-3]])? Which I tend to write, for my eyes' sake, as
np.array([
[1],
[-3]
])
Other than vectors, how would this matrix be represented in both plain Python and numpy? The documentation states that we need to use np.arrays instead.
It is no longer recommended to use this class, even for linear algebra. Instead use regular arrays. The class may be removed in the future.
When it comes to multidimensional arrays, how would I represent/visualize multiple values for all the points in a 3D cube? Say we have a Rubik's cube and each of the sub cubes has a temperature and a color represented with red, green and blue so 4 values for each cube?
A scalar is not an array, so it has 0 dimensions.
np.array([1,-3]) is a 1D array, so c.shape returns a tuple with only one element (2,), just the first dimension and it's telling you there is only 1 dimension and 2 elements in that dimension.
You are correct np.array([[1], [-3]]) is the vector you have in 2. c.shape gives (2,1) meaning there are 2 rows and 1 column. c.ndim gives 2 since there are 2 dimensions x and y. It's a 2D/planar array
For 3., you would create it as np.array([[1,2,3], [4,5,6], [7,8,9]]). shape returns (3,3) meaning 3 rows and 3 columns. ndim returns 2 because it's still a 2D/planar array.
A ndarray has a shape, a tuple. ndim is the length of that tuple, and may be 0. The array has ndim axes (sometimes called dimensions).
np.array(5)
has shape (), 0 ndim and no axes.
np.array([1,2,3,4])
has (4,) shape, and 1 axis. It can be reshaped to (4,1), or (1,4) or (2,2) or even (2,1,2) or (1,4,1).
Your A can be created with
A = np.arange(1,10).reshape(3,3)
That's a 9 element 1d array reshaped to (3,3)
numpy arrays have a print display, with [] marking dimensional nesting. A.tolist() produces a list with 3 elements, each a 3 element list.
Rows, columns, planes are useful ways of talking about arrays, but are not a formal part of their definition.

Vstack of two arrays with same number of rows gives an error

I have a numpy array of shape (29, 10) and a list of 29 elements and I want to end up with an array of shape (29,11)
I am basically converting the list to a numpy array and trying to vstack, but it complain about dimensions not being the same.
Toy example
a = np.zeros((29,10))
a.shape
(29,10)
b = np.array(['A']*29)
b.shape
(29,)
np.vstack((a, b))
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Dimensions do actually match, why am I getting this error and how can I solve it?
I think you are looking for np.hstack.
np.hstack((a, b.reshape(-1,1)))
Moreover b must be 2-dimensional, that's why I used a reshape.
The problem is that you want to append a 1D array to a 2D array.
Also, for the dimension you've given for b, you are probably looking for hstack.
Try this:
a = np.zeros((29,10))
a.shape
(29,10)
b = np.array(['A']*29)[:,None] #to ensure 2D structure
b.shape
(29,1)
np.hstack((a, b))
If you do want to vertically stack, you'd need this:
a = np.zeros((29,10))
a.shape
(29,10)
b = np.array(['A']*10)[None,:] #to ensure 2D structure
b.shape
(1,10)
np.vstack((a, b))

List of simple arrays with pyplot.plot

I have some trouble to understand how pyplot.plot works.
I take a simple example: I want to plot pyplot.plot(lst2, lst2) where lst2 is a list.
The difficulty comes from the fact that each element of lst2 is an array of shape (1,1). If the elements were floating and not array, there would be no problems.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
V2 = np.array([[1]])
W2 = np.array([[2]])
print('The shape of V2 is', V2.shape)
print('The shape of W2 is', W2.shape)
lst2 = [V2, W2]
plt.plot(lst2, lst2)
plt.show
Below is the end of the error message I got:
~\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in _xy_from_xy(self,x, y)
245 if x.ndim > 2 or y.ndim > 2:
246 raise ValueError("x and y can be no greater than 2-D, but have "
--> 247 "shapes {} and {}".format(x.shape, y.shape))
248
249 if x.ndim == 1:
ValueError: x and y can be no greater than 2-D, but have shapes (2, 1, 1) and (2, 1, 1)
What surprised me in the error message is the mention of an array of dimension (2,1,1). It seems like the array np.array([V2,W2]) is built when we call pyplot.plot.
My question is then what happens behind the scenes when we call pyplot.plot(x,y) with x and y list? It seems like an array with the elements of x is built (and same for y). And these arrays must have maximum 2 axis. Am I correct?
I know that if I use numpy.squeeze on V2 and W2, it would work. But I would like to understand what it happening inside pyplot.plot in the example I gave.
Take a closer look at what you're doing:
V2 = np.array([[1]])
W2 = np.array([[2]])
lst2 = [V2, W2]
plt.plot(lst2, lst2)
For some odd reason you're defining your arrays to be of shape (1,1) by using a nested pair of brackets. When you construct lst2, you stack your arrays along a new leading dimension. This has nothing do with pyplot, this is numpy.
Numpy arrays are rectangular, and they are compatible with lists of lists of ... of lists. The level of nesting determines the number of dimensions of an array. Look at a simple 2d example:
>>> M = np.arange(2*3).reshape(2,3)
>>> print(repr(M))
array([[0, 1, 2],
[3, 4, 5]])
You can for all intents and purposes think of this 2x3 matrix as two row vectors. M[0] is the same as M[0,:] and is the first row, M[1] is the same as M[1,:] is the second row. You could then also construct this array from the two rows in the following way:
row1 = [0, 1, 2]
row2 = [3, 4, 5]
lst = [row1, row2]
np.array(lst)
My point is that we took two flat lists of length 3 (which are compatible with 1d numpy arrays of shape (3,)), and concatenated them in a list. The result was compatible with a 2d array of shape (2,3). The "2" is due to the fact that we put 2 lists into lst, and the "3" is due to the fact that both lists had a length of 3.
So, when you create lst2 above, you're doing something that is equivalent to this:
lst2 = [ [[1]], [[2]] ]
You put two nested sublists into an array-compatible list, and both sublists are compatible with shape (1,1). This implies that you'll end up with a 3d array (in accordance with the fact that you have three opening brackets at the deepest level of nesting), with shape (2,1,1). Again the 2 comes from the fact that you have two arrays inside, and the trailing dimensions come from the contents.
The real question is what you're trying to do. For one, your data shouldn't really be of shape (1,1). In the most straightforward application of pyplot.plot you have 1d datasets: one for the x and one for the y coordinates of your plot. For this you can use a simple (flat) list or 1d array for both x and y. What matters is that they are of the same length.
Then when you plot the two against each other, you pass the x coordinates first, then the y coordinates second. You presumably meant something like
plt.plot(V2,W2)
In which case you'd pass 2d arrays to plot, and you wouldn't see the error caused by passing a 3d-array-like. However, the behaviour of pyplot.plot is non-trivial for 2d inputs (columns of both datasets will get plotted against one another), and you have to make sure that you really want to pass 2d arrays as inputs. But you almost never want to pass the same object as the first two arguments to pyplot.plot.

Divide an array of arrays by an array of scalars

If I have an array, A, with shape (n, m, o) and an array, B, with shape (n, m), is there a way to divide each array at A[n, m] by the scalar at B[n, m] without a list comprehension?
>>> A.shape
(4,173,1469)
>>> B.shape
(4,173)
>>> # Better way to do:
>>> np.array([[A[i, j] / B[i, j] for j in range(len(B[i]))] for i in range(len(B))])
The problem with a list comprehension is that it is slow, it doesn't return an array (so you have to np.array(_) it, which makes it even slower), it is hard to read, and the whole point of numpy was to move loops from Python to C++ or Fortran.
If A was of shape (n) and B was a scalar (of shape ( )), then this would be trivial: A / B, but this property does not scale with dimensions
>>> A / B
ValueError: operands could not be broadcast together with shapes (4,173,1469) (4,173)
I am looking for a fast way to do this (preferably not by tiling B to an array of shape (n, m, o), and preferably using native numpy tools).
You are absolutely right, there is a better way, I think you are getting the spirit of numpy.
The solution in your case is that you have to add a new dimension to B that consists of one entry in that dimension:
so if your A is of shape (n,m,o) your B has to be of shape (n,m,1) and then you can use the native broadcasting to get your operation "A/B" done.
You can just add that dimension to be by adding a "newaxis" to B there.
import numpy as np
A = np.ones(10,5,3)
B = np.ones(10,5)
Result = A/B[:,:,np.newaxis]
B[:,:,np.newaxis] --> this will turn B into an array of shape of (10,5,1)
From here, the rules of broadcasting are:
When operating on two arrays, NumPy compares their shapes
element-wise. It starts with the trailing dimensions, and works its
way forward. Two dimensions are compatible when
they are equal, or
one of them is 1
Your dimensions are n,m,o and n,m so not compatible.
The / division operator will work using broadcasting if you use:
o,n,m divided by n,m
n,m,o divided by n,m,1

Categories