I have some trouble understanding how pyplot.plot works.
Take a simple example: I want to call pyplot.plot(lst2, lst2) where lst2 is a list.
The difficulty comes from the fact that each element of lst2 is an array of shape (1,1). If the elements were floats and not arrays, there would be no problem.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
V2 = np.array([[1]])
W2 = np.array([[2]])
print('The shape of V2 is', V2.shape)
print('The shape of W2 is', W2.shape)
lst2 = [V2, W2]
plt.plot(lst2, lst2)
plt.show()
Below is the end of the error message I got:
~\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in _xy_from_xy(self, x, y)
    245     if x.ndim > 2 or y.ndim > 2:
    246         raise ValueError("x and y can be no greater than 2-D, but have "
--> 247                          "shapes {} and {}".format(x.shape, y.shape))
    248
    249     if x.ndim == 1:
ValueError: x and y can be no greater than 2-D, but have shapes (2, 1, 1) and (2, 1, 1)
What surprised me in the error message is the mention of an array of shape (2,1,1). It seems like the array np.array([V2, W2]) is built when we call pyplot.plot.
My question is then: what happens behind the scenes when we call pyplot.plot(x, y) with x and y lists? It seems like an array is built from the elements of x (and likewise for y), and these arrays must have at most 2 axes. Am I correct?
I know that if I used numpy.squeeze on V2 and W2 it would work. But I would like to understand what is happening inside pyplot.plot in the example I gave.
Take a closer look at what you're doing:
V2 = np.array([[1]])
W2 = np.array([[2]])
lst2 = [V2, W2]
plt.plot(lst2, lst2)
For some odd reason you're defining your arrays to be of shape (1,1) by using a nested pair of brackets. When you construct lst2, you stack your arrays along a new leading dimension. This has nothing to do with pyplot; this is numpy.
Numpy arrays are rectangular, and they are compatible with lists of lists of ... of lists. The level of nesting determines the number of dimensions of an array. Look at a simple 2d example:
>>> M = np.arange(2*3).reshape(2,3)
>>> print(repr(M))
array([[0, 1, 2],
       [3, 4, 5]])
You can, for all intents and purposes, think of this 2x3 matrix as two row vectors. M[0] is the same as M[0,:] and is the first row; M[1] is the same as M[1,:] and is the second row. You could then also construct this array from the two rows in the following way:
row1 = [0, 1, 2]
row2 = [3, 4, 5]
lst = [row1, row2]
np.array(lst)  # array([[0, 1, 2], [3, 4, 5]]), shape (2, 3)
My point is that we took two flat lists of length 3 (which are compatible with 1d numpy arrays of shape (3,)), and concatenated them in a list. The result was compatible with a 2d array of shape (2,3). The "2" is due to the fact that we put 2 lists into lst, and the "3" is due to the fact that both lists had a length of 3.
So, when you create lst2 above, you're doing something that is equivalent to this:
lst2 = [ [[1]], [[2]] ]
You put two nested sublists into an array-compatible list, and both sublists are compatible with shape (1,1). This implies that you'll end up with a 3d array (in accordance with the fact that you have three opening brackets at the deepest level of nesting), with shape (2,1,1). Again the 2 comes from the fact that you have two arrays inside, and the trailing dimensions come from the contents.
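A quick check confirms this (a minimal sketch using the arrays from the question):
import numpy as np

V2 = np.array([[1]])  # shape (1, 1)
W2 = np.array([[2]])  # shape (1, 1)
lst2 = [V2, W2]

# Stacking two (1, 1) arrays along a new leading axis yields a 3d array;
# this is effectively the array matplotlib builds before raising the error.
print(np.array(lst2).shape)  # (2, 1, 1)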
The real question is what you're trying to do. For one, your data shouldn't really be of shape (1,1). In the most straightforward application of pyplot.plot you have 1d datasets: one for the x and one for the y coordinates of your plot. For this you can use a simple (flat) list or 1d array for both x and y. What matters is that they are of the same length.
Then when you plot the two against each other, you pass the x coordinates first and the y coordinates second. You presumably meant something like
plt.plot(V2,W2)
In that case you'd pass 2d arrays to plot, and you wouldn't see the error caused by passing a 3d-array-like. However, the behaviour of pyplot.plot is non-trivial for 2d inputs (columns of both datasets get plotted against one another), so you have to make sure that you really want to pass 2d arrays as inputs. And you almost never want to pass the same object as the first two arguments to pyplot.plot.
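If the goal is simply to plot the two values against each other, one option (a sketch, not the only fix) is to flatten the data before plotting:
import numpy as np
import matplotlib.pyplot as plt

V2 = np.array([[1]])
W2 = np.array([[2]])

# np.ravel flattens the list of (1, 1) arrays into a 1d array of shape (2,),
# the kind of input pyplot.plot handles most naturally.
x = np.ravel([V2, W2])  # array([1, 2])
plt.plot(x, x)
plt.show()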
I want to extract parts of a numpy ndarray based on arrays of index positions for some of the dimensions. Let me show this with an example.
Example data
dummy = np.random.rand(5,2,100)
X = np.array([[0,1],[4,1],[2,0]])
dummy is the original ndarray with dimensionality 5x2x100. This dimensionality is arbitrary; it could just as well be 5x2x4x100.
X is a matrix of index values, here X[:,0] are the indices of the first dimension of dummy, X[:,1] those of the second dimension. The number of columns in X is always the number of dimensions in dummy minus 1.
Example output
I want to extract an ndarray of the following form for this example
[
dummy[0,1,:],
dummy[4,1,:],
dummy[2,0,:]
]
Complications
If the number of dimensions in dummy were fixed, this could just be done with dummy[X[:,0], X[:,1], :]. Sadly the dimensionality can differ, e.g. dummy could be a 5x2x4x6x100 ndarray and X correspondingly would then be 3x4. My attempts at dealing with this have not yielded the desired result:
dummy[X,:] yields a 3x2x2x100 ndarray for this example, the same as dummy[X].
Iteratively reducing dummy by doing something like dummy = dummy[X[:,i],:], with i an iterator over the number of columns of X, also does not reduce the ndarray in the example past 3x2x100.
I have a feeling that this should be pretty simple with numpy indexing, but I guess my search for a solution was missing the right terms for this.
Does anyone have a solution to this?
I will try to add some explanation to Michael Szczesny's answer.
First, notice that if you have an np.array with n dimensions and pass m indices where m < n, it is the same as using : for the dimensions >= m. In your case, for example:
dummy[(0, 0)] == dummy[0, 0, :]
Given that, note that you can also pass an array as an index. Thus:
dummy[([0, 1], [0, 0])]
It would be the same as:
np.array([dummy[(0,0)], dummy[(1,0)]])
You can validate that using:
dummy[([0, 1], [0, 0])] == np.array([dummy[(0,0)], dummy[(1,0)]])
Finally, notice that:
(*X.T,)
# (array([0, 4, 2]), array([1, 1, 0]))
Here you are getting the index arrays for each dimension, and then you will get:
[
dummy[0,1],
dummy[4,1],
dummy[2,0]
]
Which is the same as:
[
dummy[0,1,:],
dummy[4,1,:],
dummy[2,0,:]
]
Edit: instead of using (*X.T,) you can use tuple(X.T), which to me makes more sense.
As Michael Szczesny wrote, the best solution is dummy[(*X.T,)].
Since X[:,0] holds the indices of the first dimension of dummy and X[:,1] the indices of the second dimension, if you transpose X (X.T) you'll have the indices of the first dimension of dummy as X.T[0] and the indices of the second dimension as X.T[1].
Now to slice dummy as you want, you can specify the indices of the first and of the second dimension in this way:
dummy[(first_dim_indices, second_dim_indices)], i.e. dummy[(X.T[0], X.T[1])]
To simplify the code (and since you don't want to transpose the X matrix twice), you can unpack X.T into a tuple as (*X.T,), so writing dummy[(*X.T,)] is the same as writing dummy[(X.T[0], X.T[1])].
This notation is also useful if you have a variable number of dimensions to slice through, because you will unpack from X.T as many rows as there are dimensions to slice in dummy. For example, suppose you want to retrieve a 1D array from dummy given the following indices:
first_dim: (0, 4, 2)
second_dim: (1, 1, 0)
third_dim: (9, 8, 7)
You can specify the indices of the 3 dimensions as X = np.array([[0,1,9],[4,1,8],[2,0,7]]) and dummy[(*X.T,)] is still valid.
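A minimal end-to-end sketch, using the example data from the question, that verifies the equivalence:
import numpy as np

dummy = np.random.rand(5, 2, 100)
X = np.array([[0, 1], [4, 1], [2, 0]])

# Unpacking the columns of X gives one index array per dimension.
result = dummy[(*X.T,)]
print(result.shape)  # (3, 100)

# Same as indexing each row of X explicitly:
expected = np.array([dummy[0, 1, :], dummy[4, 1, :], dummy[2, 0, :]])
print(np.array_equal(result, expected))  # True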
I want to slice a multidimensional numpy array (>2 dimensions) along 2 of its axes using index slicing. What are the rules for where each of its original dimensions end up?
To illustrate my problem, let me provide an example. Say we have a 4D array:
import numpy as np
a = np.arange(2*3*4*5).reshape(2,3,4,5)
I'll create a tuple of indices using numpy.where, for slicing along axes 1 and 3:
mask = np.where(np.random.rand(3,5) > 0.5)
This will pick out random slices from my array a. Let's say it returned tuples of length 7.
To preserve the remaining dimensions I will use slice(None) objects:
b = a[(slice(None), mask[0], slice(None), mask[1])]
This changed the shape:
>>> a.shape
(2, 3, 4, 5)
>>> b.shape
(7, 2, 4)
The axes that were untouched (i.e. sliced using the slice(None) object) appear to have been preserved, whereas the sliced axes are destroyed and the resulting axis is moved to the front.
However, this is not always the case. When I apply a mask to axes 1 and 2:
mask2 = np.where(np.random.rand(3,4) > 0.5)
c = a[(slice(None), mask2[0], mask2[1], slice(None))]
I observe the following (numpy.where has returned tuples of length 7 again):
>>> c.shape
(2, 7, 5)
The axis resulting from the axes that have been destroyed by the slicing did not move to the front this time.
My guess is that it is related to whether the sliced axes are adjacent or not, but I want to know from what rules this behavior emerges.
https://docs.scipy.org/doc/numpy-1.15.4/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
Your where masks, applied to a 2d array, produce 1d index arrays of shape (7,): the positions where the condition is true. You phrase that as 'destroying' a pair of axes.
In the second case that 7 can be placed between the 2 and 5.
But in the first case it's ambiguous because of the slice in the middle (the non-adjacency); the fallback rule is to put it first and order the slices after. In other words, instead of trying to choose between a (2,7,4) and a (2,4,7) order, it chooses (7,2,4).
The ambiguity is clear in this case, and the default is reasonable. It's more complicated when one or more of the dimensions is eliminated by a scalar index.
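A short sketch of both cases side by side (index values chosen to stay in bounds; only the shapes matter here):
import numpy as np

a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
idx = np.array([0, 1, 2])  # three index pairs, like a length-3 where result

# Advanced indices on adjacent axes 1 and 2: the result axis stays in place.
print(a[:, idx, idx, :].shape)  # (2, 3, 5)

# Advanced indices on axes 1 and 3, separated by a slice: the result axis
# moves to the front.
print(a[:, idx, :, idx].shape)  # (3, 2, 4)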
Note: I'm using numpy
import numpy as np
Given 4 arrays of the same (but arbitrary) shape, I am trying to write a function that forms 2x2 matrices from each corresponding element of the arrays, finds the eigenvalues, and returns two arrays of the same shape as the original four, with their elements being the eigenvalues (i.e. the resulting arrays would have the same shape as the input, with array1 holding all the first eigenvalues and array2 holding all the second eigenvalues).
I tried doing the following, but unsurprisingly, it gives me an error that says the array is not square.
temp = np.linalg.eig([[m1, m2],[m3, m4]])[0]
I suppose I could make an empty temp variable of the same shape,
temp = np.zeros_like(m1)
and go over each element of the original arrays and repeat the process. My problem is that I want this generalised for arrays of any arbitrary shape (they need not be one-dimensional). I would guess that finding the shape of the arrays and designing loops to go over each element would not be a very good way of doing it. How do I do this efficiently?
Construct a 2x2x... array:
temp = np.array([[m1, m2], [m3, m4]])
Move the first two dimensions to the end for a ...x2x2 array:
for _ in range(2):
    temp = np.rollaxis(temp, 0, temp.ndim)
Call np.linalg.eigvals (which broadcasts) for a ...x2 array of eigenvalues:
eigvals = np.linalg.eigvals(temp)
And split this into an array of first eigenvalues and an array of second eigenvalues:
eigvals1, eigvals2 = eigvals[..., 0], eigvals[..., 1]
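Putting the steps together, a minimal sketch with random inputs of an arbitrary shape (here (3, 4)); np.moveaxis is used as a one-call equivalent of the two rollaxis calls above:
import numpy as np

shape = (3, 4)  # any shape works
m1, m2, m3, m4 = (np.random.rand(*shape) for _ in range(4))

# Build the 2x2x(3,4) array, then move the two matrix axes to the end.
temp = np.moveaxis(np.array([[m1, m2], [m3, m4]]), (0, 1), (-2, -1))

# eigvals broadcasts over the leading axes, giving shape (3, 4, 2).
eigvals = np.linalg.eigvals(temp)
eigvals1, eigvals2 = eigvals[..., 0], eigvals[..., 1]
print(eigvals1.shape, eigvals2.shape)  # (3, 4) (3, 4)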
Consider the following simple example:
X = numpy.zeros([10, 4]) # 2D array
x = numpy.arange(0,10) # 1D array
X[:,0] = x # WORKS
X[:,0:1] = x # returns ERROR:
# ValueError: could not broadcast input array from shape (10) into shape (10,1)
X[:,0:1] = (x.reshape(-1, 1)) # WORKS
Can someone explain why numpy has vectors of shape (N,) rather than (N,1)?
What is the best way to cast a 1D array into a 2D array?
Why do I need this?
Because I have code that inserts a result x into a 2D array X, and the size of x changes from time to time, so I have X[:, idx1:idx2] = x, which works if x is 2D too, but not if x is 1D.
Do you really need to be able to handle both 1D and 2D inputs with the same function? If you know the input is going to be 1D, use
X[:, i] = x
If you know the input is going to be 2D, use
X[:, start:end] = x
If you don't know the input dimensions, I recommend switching between one line and the other with an if, as sketched below, though there might be some indexing trick I'm not aware of that would handle both identically.
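A minimal sketch of that dispatch (the column positions are placeholders for whatever idx1:idx2 range you are filling):
import numpy as np

X = np.zeros((10, 4))
x = np.arange(10)  # may arrive as shape (10,) or as shape (10, k)

if x.ndim == 1:
    X[:, 0] = x              # 1D input: assign a single column
else:
    X[:, 0:x.shape[1]] = x   # 2D input: assign a block of columns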
Your x has shape (N,) rather than shape (N, 1) (or (1, N)) because numpy isn't built for just matrix math. ndarrays are n-dimensional; they support efficient, consistent vectorized operations for any non-negative number of dimensions (including 0). While this may occasionally make matrix operations a bit less concise (especially in the case of dot for matrix multiplication), it produces more generally applicable code for when your data is naturally 1-dimensional or 3-, 4-, or n-dimensional.
I think you already have the answer included in your question. Numpy allows arrays to be of any dimensionality (while, afaik, Matlab prefers two dimensions where possible), so you need to be careful with this (and always distinguish between (n,) and (n,1)). By giving a single number as one of the indices (like the 0 in the 3rd line), you reduce the dimensionality by one. By giving a range as one of the indices (like the 0:1 in the 4th line), you don't reduce the dimensionality.
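A quick illustration of the difference:
import numpy as np

X = np.zeros((10, 4))
print(X[:, 0].shape)    # (10,)   - an integer index drops that dimension
print(X[:, 0:1].shape)  # (10, 1) - a range keeps it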
Line 3 makes perfect sense to me, and that is how I would assign to the 2-D array.
Here are two tricks that make the code a little shorter.
X = numpy.zeros([10, 4]) # 2D array
x = numpy.arange(0,10) # 1D array
X.T[:1, :] = x          # first row of the transposed view, i.e. the first column of X
X[:, 2:3] = x[:, None]  # x[:, None] adds a trailing axis: shape (10,) becomes (10, 1)
Why does the program
import numpy as np
c = np.array([1,2])
print(c.shape)
d = np.array([[1],[2]]).transpose()
print(d.shape)
give
(2,)
(1,2)
as its output? Shouldn't it be
(1,2)
(1,2)
instead? I got this in both python 2.7.3 and python 3.2.3
When you invoke the .shape attribute of an ndarray, you get a tuple with as many elements as dimensions in your array. The length, i.e. the number of rows, is the first dimension (shape[0]).
You start with the array c = np.array([1,2]). That's a plain 1D array, so its shape is a 1-element tuple, and shape[0] is the number of elements, so c.shape == (2,).
Consider c = np.array([[1,2]]). That's a 2D array with 1 row. The first and only row is [1,2], which gives us two columns. Therefore c.shape == (1,2) and len(c) == 1.
Consider c = np.array([[1,],[2,]]). Another 2D array, with 2 rows and 1 column: c.shape == (2,1) and len(c) == 2.
Consider d = np.array([[1,],[2,]]).transpose(): this array is the same as np.array([[1,2]]), therefore its shape is (1,2).
Another useful attribute is .size: that's the number of elements across all dimensions; for an array c, c.size == np.prod(c.shape).
More information on the shape in the documentation.
len(c.shape) is the "depth" of the array.
For c, the array is just a list (a vector), the depth is 1.
For d, the array is a list of lists, the depth is 2.
Note:
c.transpose()
# array([1, 2])
which is not d, so this behaviour is not inconsistent.
dt = d.transpose()
# array([[1],
#        [2]])
dt.shape # (2,1)
Quick fix: check the .ndim property. If it's 2, then the .shape property will work as you expect.
Reason why: if the .ndim property is 2, then numpy reports a shape value that agrees with the (rows, columns) convention. If the .ndim property is 1, then numpy just reports shape in a different way.
More talking: when you pass np.array a list of lists, the .shape property will agree with standard notions of the dimensions of a matrix: (rows, columns).
If you pass np.array just a list, then numpy doesn't think it has a matrix on its hands, and reports the shape in a different way.
The question is: does numpy think it has a matrix, or does it think it has something else on its hands?
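For example (a minimal sketch of that check):
import numpy as np

c = np.array([1, 2])    # plain list: ndim is 1, shape reported as (2,)
d = np.array([[1, 2]])  # list of lists: ndim is 2, shape is (rows, columns)
print(c.ndim, c.shape)  # 1 (2,)
print(d.ndim, d.shape)  # 2 (1, 2)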
transpose does not change the number of dimensions of the array. If c.ndim == 1, c.transpose() == c. Try:
c = np.array([1,2])
print(c.shape)
print(c.T.shape)
c = np.atleast_2d(c)
print(c.shape)
print(c.T.shape)
Coming from Matlab, I also found it difficult that a single-dimensional array is not organized as (row_count, column_count).
My function had to respond consistently to either a single-dimensional ndarray like [x1, x2, x3] or a list of arrays [[x1, x2, x3], [x1, x2, x3], [x1, x2, x3]].
This worked for me:
dim = np.shape(subtract_matrix)[-1]
Picking the last dimension.
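For example, with hypothetical inputs of both kinds:
import numpy as np

flat = np.array([1.0, 2.0, 3.0])           # single-dimensional ndarray
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # list of arrays
print(np.shape(flat)[-1])  # 3
print(np.shape(rows)[-1])  # 3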