Stacking and adding up numpy arrays - python

I have an array g.
g = np.array([])
I have some loops through which I need to build it with the following structure in python:
[
 [[1, 4],
  [2, 5],
  [3, 6]],
 [[7, 10],
  [8, 11],
  [9, 12]],
 ...
]
i.e. any number of rows (let's say 10), but with each entry consisting of a 3x2 array.
After initializing g at the top, I'm doing this:
curr_g = np.array([])
for y, w in zip(z.T, weights.T):
temp_g = sm.WLS(y, X, w).fit()
# temp_g.params produces a (3L,) array
# curr_g is where I plan to end up with a 3x2 array
curr_g = np.hstack((temp_g.params, curr_g))
g = np.hstack((temp_g.params, g))
I thought that when I use hstack with two 3x1 arrays, then I'll end up with one single 3x2 array. But what's happening is that after the stacking, curr_g just goes from (3L,) to (6L,)...
Also, once I've got a 3x2 array, how do I stack 3x2 arrays on top of each other?

You are correct in saying that "when I use hstack with two 3x1 arrays, then I'll end up with one single 3x2 array":
import numpy as np
params = np.array([1, 2, 3]).reshape(3, 1)
curr_g = np.array([4, 5, 6]).reshape(3, 1)
print(np.hstack((params, curr_g)).shape)  # (3, 2)
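Most likely you get an array with shape (6,) because temp_g.params and curr_g are both 1-D arrays with shape (3,), not (3,1), and hstack joins 1-D arrays end to end. If this is the case, you're better off with np.column_stack((temp_g.params, curr_g)), which turns each 1-D input into a column before joining them.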
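To see the difference concretely, here is a small sketch comparing hstack and column_stack on 1-D inputs. The arrays p1 and p2 are hypothetical stand-ins for two temp_g.params results.

```python
import numpy as np

# Hypothetical stand-ins for two temp_g.params results, each 1-D with shape (3,)
p1 = np.array([1.0, 2.0, 3.0])
p2 = np.array([4.0, 5.0, 6.0])

# hstack joins 1-D arrays end to end along their only axis
flat = np.hstack((p1, p2))
print(flat.shape)   # (6,)

# column_stack first turns each 1-D array into a (3, 1) column, then joins them side by side
pair = np.column_stack((p1, p2))
print(pair.shape)   # (3, 2)
print(pair)
```

Note that column_stack needs both inputs to have the same length, so in this approach curr_g should not start out as np.array([]) (shape (0,)); stacking a (3,) array with it raises a ValueError.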
As for the last point, first initialize your big array g to the right size (np.zeros takes the shape as a tuple, whereas np.array((N,3,2)) would just create the 1-D array [N, 3, 2]):
g = np.zeros((N, 3, 2))
and then fill it in the for loop:
for j, (y, w) in enumerate(zip(z.T, weights.T)):
    # calculate temp_g and curr_g
    g[j] = np.column_stack((temp_g.params, curr_g))
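A sketch of two ways to end up with the (N, 3, 2) array: preallocate with np.zeros and fill g[j], or collect the 3x2 blocks in a Python list and stack them once with np.stack at the end. N and the random vectors below are hypothetical placeholders for the number of entries and the WLS results.

```python
import numpy as np

N = 10  # hypothetical number of 3x2 entries
rng = np.random.default_rng(0)

# Option 1: preallocate the full (N, 3, 2) array and fill one entry per iteration
g = np.zeros((N, 3, 2))
for j in range(N):
    # Placeholders for two (3,) parameter vectors coming from the fits
    params_a = rng.normal(size=3)
    params_b = rng.normal(size=3)
    g[j] = np.column_stack((params_a, params_b))

# Option 2: collect the 3x2 blocks in a list and stack them along a new first axis
blocks = [np.column_stack((rng.normal(size=3), rng.normal(size=3))) for _ in range(N)]
g2 = np.stack(blocks)

print(g.shape)   # (10, 3, 2)
print(g2.shape)  # (10, 3, 2)
```

Appending to a list is cheap, while calling np.hstack or np.vstack on a growing array inside the loop copies the whole array on every iteration, so either option above is usually preferable.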

Related

Numpy docs: How to multiply 2 arrays of different sizes together?

The NumPy docs claim you can multiply arrays of different lengths together, but it is not working for me. I'm definitely misinterpreting what they are saying, but there's no example to go with the text. From the docs here:
Therefore, I created some code to try it out but I'm getting an error that says ValueError: operands could not be broadcast together with shapes (4,1) (3,1). Same error if I try this with shapes (4,) and (3,).
import numpy as np
a = np.array([[1.0],
              [1.0],
              [1.0],
              [1.0]])
print(a.shape)  # (4, 1)
b = np.array([[2.0],
              [2.0],
              [2.0]])
print(b.shape)  # (3, 1)
a*b  # raises the ValueError above
You can multiply arrays together if, in every axis, the two lengths are equal or one of them is 1.
In your example the arrays have shapes 4x1 and 3x1, so to multiply them you need to transpose one of them:
a = np.array([[1.0],
[1.0],
[1.0],
[1.0]])
print(a.shape)
b = np.array([[2.0],
[2.0],
[2.0]])
print(b.shape)
a*b.T
Now the shapes are 4x1 and 1x3: in each axis one of the two lengths is 1, so the arrays broadcast and the result has shape 4x3.
Here is the text immediately preceding that passage in the same document (emphasis mine):
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when
they are equal, or
one of them is 1
If these conditions are not met, a ValueError: operands could not be broadcast together exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the size that is not 1 along each axis of the inputs.
Arrays do not need to have the same number of dimensions. For example, if you have a 256x256x3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules, shows that they are compatible:
Image (3d array): 256 x 256 x 3
Scale (1d array): 3
Result (3d array): 256 x 256 x 3
When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other.
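The RGB example from the quoted documentation can be checked directly. A sketch, using an all-ones placeholder image and arbitrary scale factors chosen only for illustration:

```python
import numpy as np

image = np.ones((256, 256, 3))        # placeholder RGB image, every value 1.0
scale = np.array([0.5, 1.0, 2.0])     # one factor per color channel, shape (3,)

# The trailing axes line up (3 vs 3); the missing leading axes of `scale`
# are treated as size 1, so `scale` is repeated over every pixel.
result = image * scale
print(result.shape)      # (256, 256, 3)
print(result[0, 0])      # the three scale factors, since every pixel was 1.0
```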
Now, let's try applying this logic to the example data.
A (2d array): 4 x 1
B (2d array): 3 x 1
Start with the trailing dimension: both lengths are 1, so that axis is compatible. Now look at the first dimension: the lengths are 4 and 3. Is 4 equal to 3? No. Is either of those equal to 1? No. Therefore, the conditions are not met. We cannot broadcast along the first dimension of the array because there is no rule that tells us how to match up 4 values against 3. If it were 4 values against 4, or 3 against 3, we could pair them up directly. If it were 4 against 1, or 1 against 3, we could "broadcast" by repeating the single value. Neither case applies here.
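The rule applied above can be written out as a short function. This is a sketch for illustration only; the name broadcast_result_shape is hypothetical. NumPy itself exposes the same check as np.broadcast_shapes (NumPy 1.20 and later).

```python
import numpy as np

def broadcast_result_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule to two shapes.

    Aligns the shapes on the right (missing leading dimensions count as 1),
    then checks each pair. Raises ValueError when a pair is neither equal
    nor contains a 1.
    """
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same length
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(result)

print(broadcast_result_shape((256, 256, 3), (3,)))  # (256, 256, 3)
print(broadcast_result_shape((4, 1), (1, 3)))       # (4, 3)
try:
    broadcast_result_shape((4, 1), (3, 1))
except ValueError as exc:
    print("fails:", exc)   # fails: incompatible dimensions 4 and 3
```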
We could, however, multiply if either of the arrays were transposed:
A.T (2d array): 1 x 4
B (2d array): 3 x 1
A (2d array): 4 x 1
B.T (2d array): 1 x 3
Verifying this is left as an exercise for the reader.
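Taking up the exercise, a quick check with arrays of the shapes from the question:

```python
import numpy as np

A = np.ones((4, 1))        # 4 x 1, as in the question
B = np.full((3, 1), 2.0)   # 3 x 1, as in the question

# A.T is 1 x 4 and B is 3 x 1: every axis pair contains a 1, so the result is 3 x 4
print((A.T * B).shape)     # (3, 4)

# A is 4 x 1 and B.T is 1 x 3: the result is 4 x 3
print((A * B.T).shape)     # (4, 3)

# Without a transpose, the first axes (4 vs 3) clash
try:
    A * B
except ValueError as exc:
    print(exc)
```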

How to visualize/connect vectors, matrices and representations in Python and numpy arrays?

I am having trouble connecting scalars, vectors and matrices as they are written in a math/physics class to how they would be represented in plain Python and numpy, and to numpy's notions of dimensions, axes and shapes.
If I have a scalar, say 5
>>> b = np.array(5)
>>> np.ndim(b)
0
I have 0 dimensions for 5 but what are the axes here? There are functions for ndim and shape but not axes.
For a vector like this:
we say it has 2 dimensions in a physics/math class because it represents a 2D vector, but numpy seems to use a different notion of dimension.
Why is it that ndim gives 1 and shape gives what the dimension is?
>>> c = np.array([1,-3])
>>> c
array([ 1, -3])
>>> c.ndim
1
>>> c.shape
(2,)
Why does np.ndim give 1, then?
I have looked at this tutorial on axes but still can't see how axes apply here.
How would you represent the vector above in Python and numpy? Would this be [1, -3] in Python or [[1], [-3]]? How about in numpy? Would it be np.array([1, -3]) or np.array([[1], [-3]])? Which I tend to write, for my eyes' sake, as
np.array([
[1],
[-3]
])
Beyond vectors, how would this matrix be represented in plain Python and in numpy? The np.matrix documentation says we should use regular arrays instead:
It is no longer recommended to use this class, even for linear algebra. Instead use regular arrays. The class may be removed in the future.
When it comes to multidimensional arrays, how would I represent/visualize multiple values for all the points in a 3D cube? Say we have a Rubik's cube where each sub-cube has a temperature and a color given as red, green and blue, so 4 values per sub-cube?
A scalar is not a sequence, so np.array(5) is a 0-dimensional array: its shape is () and it has no axes.
np.array([1,-3]) is a 1D array, so c.shape returns a tuple with a single element, (2,): there is only 1 dimension (axis), and it holds 2 elements.
You are correct that np.array([[1], [-3]]) is the column vector from your second point. Its shape is (2,1), meaning 2 rows and 1 column, and its ndim is 2 because it has two axes (rows and columns). It's a 2D array.
For your third point, you would create the matrix as np.array([[1,2,3], [4,5,6], [7,8,9]]). Its shape is (3,3), meaning 3 rows and 3 columns, and its ndim is 2 because it's still a 2D array.
An ndarray has a shape, which is a tuple. ndim is the length of that tuple, and may be 0. The array has ndim axes (sometimes called dimensions).
np.array(5)
has shape (), 0 ndim and no axes.
np.array([1,2,3,4])
has shape (4,) and 1 axis. It can be reshaped to (4,1), or (1,4) or (2,2) or even (2,1,2) or (1,4,1).
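A quick sketch of the shapes mentioned above; in every case ndim equals len(shape):

```python
import numpy as np

s = np.array(5)
print(s.shape, s.ndim)          # () 0

v = np.array([1, 2, 3, 4])
print(v.shape, v.ndim)          # (4,) 1

# The same four values viewed with different shapes (the sizes must multiply to 4)
for shape in [(4, 1), (1, 4), (2, 2), (2, 1, 2), (1, 4, 1)]:
    r = v.reshape(shape)
    print(shape, r.ndim, r.shape == shape)
```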
Your A can be created with
A = np.arange(1,10).reshape(3,3)
That's a 9 element 1d array reshaped to (3,3)
When printed, numpy arrays use [] to mark the nesting of dimensions. A.tolist() produces a list with 3 elements, each a 3 element list.
Rows, columns, planes are useful ways of talking about arrays, but are not a formal part of their definition.
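The Rubik's cube from the question fits the same model: one axis per spatial direction plus one axis for the values stored at each sub-cube. A sketch, assuming the four values are ordered as temperature, red, green, blue (an arbitrary choice for illustration):

```python
import numpy as np

# Axes 0, 1, 2: position of the sub-cube along x, y, z (3 each)
# Axis 3: the 4 values per sub-cube, in the assumed order [temperature, R, G, B]
cube = np.zeros((3, 3, 3, 4))
print(cube.shape, cube.ndim)       # (3, 3, 3, 4) 4

# Set the corner sub-cube at (0, 0, 0): 21.5 degrees, pure red
cube[0, 0, 0] = [21.5, 255, 0, 0]

temperature = cube[..., 0]         # shape (3, 3, 3): one temperature per sub-cube
colors = cube[..., 1:]             # shape (3, 3, 3, 3): one RGB triple per sub-cube
print(temperature.shape, colors.shape)
```

Keeping temperature and color in two separate arrays of shapes (3, 3, 3) and (3, 3, 3, 3) is an equally valid design; a single 4D array is convenient when all values share a dtype.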

List of simple arrays with pyplot.plot

I have some trouble understanding how pyplot.plot works.
Take a simple example: I want to call pyplot.plot(lst2, lst2), where lst2 is a list.
The difficulty comes from the fact that each element of lst2 is an array of shape (1,1). If the elements were floats rather than arrays, there would be no problem.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
V2 = np.array([[1]])
W2 = np.array([[2]])
print('The shape of V2 is', V2.shape)
print('The shape of W2 is', W2.shape)
lst2 = [V2, W2]
plt.plot(lst2, lst2)
plt.show()
Below is the end of the error message I got:
~\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in _xy_from_xy(self,x, y)
245 if x.ndim > 2 or y.ndim > 2:
246 raise ValueError("x and y can be no greater than 2-D, but have "
--> 247 "shapes {} and {}".format(x.shape, y.shape))
248
249 if x.ndim == 1:
ValueError: x and y can be no greater than 2-D, but have shapes (2, 1, 1) and (2, 1, 1)
What surprised me in the error message is the mention of an array of shape (2,1,1). It seems that the array np.array([V2,W2]) is built when we call pyplot.plot.
So my question is: what happens behind the scenes when we call pyplot.plot(x,y) with x and y being lists? It seems that an array is built from the elements of x (and likewise for y), and that these arrays must have at most 2 axes. Am I correct?
I know that if I use numpy.squeeze on V2 and W2, it works. But I would like to understand what is happening inside pyplot.plot in the example I gave.
Take a closer look at what you're doing:
V2 = np.array([[1]])
W2 = np.array([[2]])
lst2 = [V2, W2]
plt.plot(lst2, lst2)
For some odd reason you're defining your arrays to be of shape (1,1) by using a nested pair of brackets. When lst2 is turned into an array, your arrays get stacked along a new leading dimension. This has nothing to do with pyplot; this is numpy.
Numpy arrays are rectangular, and they are compatible with lists of lists of ... of lists. The level of nesting determines the number of dimensions of an array. Look at a simple 2d example:
>>> M = np.arange(2*3).reshape(2,3)
>>> print(repr(M))
array([[0, 1, 2],
[3, 4, 5]])
You can for all intents and purposes think of this 2x3 matrix as two row vectors. M[0] is the same as M[0,:] and is the first row; M[1] is the same as M[1,:] and is the second row. You could then also construct this array from the two rows in the following way:
row1 = [0, 1, 2]
row2 = [3, 4, 5]
lst = [row1, row2]
np.array(lst)
My point is that we took two flat lists of length 3 (which are compatible with 1d numpy arrays of shape (3,)) and put them together in a list. The result was compatible with a 2d array of shape (2,3). The "2" is due to the fact that we put 2 lists into lst, and the "3" is due to the fact that both lists had a length of 3.
So, when you create lst2 above, you're doing something that is equivalent to this:
lst2 = [ [[1]], [[2]] ]
You put two nested sublists into an array-compatible list, and both sublists are compatible with shape (1,1). This implies that you'll end up with a 3d array (in accordance with the fact that you have three opening brackets at the deepest level of nesting), with shape (2,1,1). Again the 2 comes from the fact that you have two arrays inside, and the trailing dimensions come from the contents.
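You can reproduce the shape from the error message with NumPy alone, and flatten the data into the 1-D form that plot handles most simply. A sketch; the plotting call is left as a comment so the snippet runs without matplotlib:

```python
import numpy as np

V2 = np.array([[1]])
W2 = np.array([[2]])
lst2 = [V2, W2]

# Converting the list stacks the two (1, 1) arrays along a new leading axis
print(np.array(lst2).shape)      # (2, 1, 1)

# Flatten to a 1-D array of two values
x = np.array(lst2).ravel()
print(x.shape, x)                # (2,) [1 2]
# e.g. plt.plot(x, y) with a matching 1-D y no longer hits the "no greater than 2-D" check
```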
The real question is what you're trying to do. For one, your data shouldn't really be of shape (1,1). In the most straightforward application of pyplot.plot you have 1d datasets: one for the x and one for the y coordinates of your plot. For this you can use a simple (flat) list or 1d array for both x and y. What matters is that they are of the same length.
Then when you plot the two against each other, you pass the x coordinates first, then the y coordinates second. You presumably meant something like
plt.plot(V2,W2)
In which case you'd pass 2d arrays to plot, and you wouldn't see the error caused by passing a 3d-array-like. However, the behaviour of pyplot.plot is non-trivial for 2d inputs (columns of both datasets will get plotted against one another), and you have to make sure that you really want to pass 2d arrays as inputs. But you almost never want to pass the same object as the first two arguments to pyplot.plot.

Advanced indexing is returning an array with the wrong shape

I've been using this reference to understand advanced indexing. One specific example is as follows:
Example
Suppose x.shape is (10,20,30) and ind is a (2,3,4)-shaped indexing intp array, then result = x[...,ind,:] has shape (10,2,3,4,30) because the (20,)-shaped subspace has been replaced with a (2,3,4)-shaped broadcasted indexing subspace. If we let i, j, k loop over the (2,3,4)-shaped subspace then result[...,i,j,k,:] = x[...,ind[i,j,k],:]. This example produces the same result as x.take(ind, axis=-2).
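The documentation example can be checked with small random data (the values are arbitrary; only the shapes and the take equivalence matter):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 20, 30))
ind = rng.integers(0, 20, size=(2, 3, 4))   # valid indices into the axis of length 20

result = x[..., ind, :]
print(result.shape)                          # (10, 2, 3, 4, 30)

# The (20,) axis is replaced by ind's (2, 3, 4) shape, which is what take does on axis -2
print(np.array_equal(result, x.take(ind, axis=-2)))   # True

# Spot-check the element-wise definition for one (i, j, k)
i, j, k = 1, 2, 3
print(np.array_equal(result[..., i, j, k, :], x[..., ind[i, j, k], :]))  # True
```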
I've been trying to understand this for a while, and to help I've written a little script that produces some arrays. I have:
Indexing arrays
i => 12 x 25
j => 12 x 25
k => 12 x 1
Input array
x => 2 x 3 x 4 x 4
Output Array
Cols => 2 x 12 x 25
The code I use to make Cols is as follows:
cols = x[:, k, i, j]
From my understanding of the example, cols should actually have shape (2 x 12 x 1 x 12 x 25 x 12 x 25). I got there as follows:
Its original dimensions are 2 x 3 x 4 x 4
The 2 is unchanged but all other dimensions are altered
The 3 is replaced with k, a 12 x 1 array
The first 4 is replaced by i, a 12 x 25 array
The second 4 is replaced by j, also a 12 x 25 array
Clearly I'm misunderstanding something here, where am I going wrong?
This does what you want:
import numpy as np
i=np.random.randint(0,4,(12,25))
j=np.random.randint(0,4,(12,25))
k=np.random.randint(0,3,(12,1))
x=np.random.randint(1,11,(2,3,4,4))
x1 = x[:,k,:,:][:,:,:,i,:][:,:,:,:,:,j]
x1.shape
(2, 12, 1, 12, 25, 12, 25)
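Why doesn't the original method work that way? When several advanced (integer array) indices appear in the same indexing operation, NumPy does not apply each one to its own axis independently: it broadcasts them against each other and uses them together, so all the axes they index are replaced by a single subspace with the broadcast shape. For instance, your original shape: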
x.shape
(2,3,4,4)
Could be interpreted many ways. What you want is for each axis to be independent, but it is just as valid to interpret it as 6 (4,4) matrices or 2 (3,4,4) tensors. So when indexing by [...,i,j] you could interpret i to be over the third axis and j over the fourth, or i,j together to be over the last two axes. NumPy always takes the second interpretation:
x[...,i,j].shape
(2,3,12,25)
You can also interpret x as 8 (3,4) matrices, which is what happens when you do:
x[:,k,i,:].shape
(2,12,25,4)
Notice that it has also broadcast your (12,1) k array to (12,25) in order to match i for indexing. You can confirm that broadcasting is happening by using .squeeze() on k, which gives shape (12,) and can no longer broadcast against (12,25):
x[:,k.squeeze(),i,:]
Traceback (most recent call last):
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (12,) (12,25)
Your original x[:, k, i, j] interprets x as 2 (3,4,4) tensors and does both: it broadcasts k to (12,25) and then indexes the last three dimensions with a set of three (12,25) indexing arrays, reducing all three as a unit. That is why cols has shape (2,12,25).
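To make this joint behaviour concrete, the following sketch rebuilds cols = x[:, k, i, j] with explicit loops, using random index arrays with the shapes from the question: k, i and j are broadcast to a common (12, 25) shape, and each output position (p, q) picks a single element from the last three axes.

```python
import numpy as np

rng = np.random.default_rng(0)
i = rng.integers(0, 4, size=(12, 25))
j = rng.integers(0, 4, size=(12, 25))
k = rng.integers(0, 3, size=(12, 1))
x = rng.integers(1, 11, size=(2, 3, 4, 4))

cols = x[:, k, i, j]
print(cols.shape)   # (2, 12, 25)

# k, i and j are broadcast together to a common (12, 25) shape ...
kb, ib, jb = np.broadcast_arrays(k, i, j)

# ... and each (p, q) selects one element from the last three axes together
manual = np.empty((2, 12, 25), dtype=x.dtype)
for a in range(2):
    for p in range(12):
        for q in range(25):
            manual[a, p, q] = x[a, kb[p, q], ib[p, q], jb[p, q]]

print(np.array_equal(cols, manual))   # True
```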
You can override this behavior somewhat using np.ix_, but all the arguments of np.ix_ have to be 1d, so you're out of luck there without flattening and reshaping, which sort of defeats the purpose here, but also works:
x2 = x[np.ix_(np.arange(x.shape[0]), k.flat, i.flat, j.flat)].reshape((x.shape[0], ) + k.shape + i.shape + j.shape)
x2.shape
(2, 12, 1, 12, 25, 12, 25)
np.all(x1 == x2)
True

shape of Vector in numpy

I am confused by the fact that
a = np.array([1,2])
np.array_equal(a.T, a) # True
and also
I = np.array([[1,0],[0,1]])
np.dot(a, I) # works
np.dot(I, a) # also works
Is the shape of the vector (or array) in this case 1×2 or 2×1?
The vector a has shape (2,), not 1 × 2 nor 2 × 1: it is neither a column nor a row vector. That is why transposition has no effect, since transposition by default reverses the order of the axes, and a has only one axis.
Numpy is very lenient about what kinds of arrays can be multiplied using dot:
it is a sum product over the last axis of a and the second-to-last of b
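A short sketch of the shapes involved. If you need an explicit row or column vector, you can add the extra axis yourself with np.newaxis (or reshape), after which transposition and dot behave like the matrix versions:

```python
import numpy as np

a = np.array([1, 2])
I = np.array([[1, 0], [0, 1]])

print(a.shape, a.T.shape)              # (2,) (2,)  transposing a 1-D array changes nothing
print(np.dot(a, I).shape)              # (2,)
print(np.dot(I, a).shape)              # (2,)

row = a[np.newaxis, :]                 # explicit 1 x 2 row vector
col = a[:, np.newaxis]                 # explicit 2 x 1 column vector
print(row.shape, col.shape)            # (1, 2) (2, 1)
print(row.T.shape)                     # (2, 1)  now transposition changes the shape

print(np.dot(row, I).shape)            # (1, 2)
print(np.dot(I, col).shape)            # (2, 1)
# np.dot(I, row) raises ValueError: the last axis of I (2) does not match
# the second-to-last axis of row (1)
```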
