Concatenate matrixes to tensor - python

I have two (or sometimes more) matrixes, which I want to combine to a tensor. The matrixes e.g. have the shape (100, 400) and when they are combined, they should have the dimensions (2, 100, 400).
How do I do that? I tried it the same way I created matrixes from vectors, but that didn't work:
tensor = numpy.concatenate(list_of_matrixes, axis=0)

Probably you want
tensor = np.array(list_of_matrices)

np.array([...]) just loves to combine the inputs into a new array along a new axis. In fact, it takes some effort to prevent that. :)
To use concatenate you need to add an axis to your arrays. axis=0 means 'join on the current 1st axis', so it would produce a (200,400) array.
np.concatenate([arr1[None,...], arr2[None,...]], axis=0)
would do the trick, or more generally
np.concatenate([arr[None,...] for arr in list_arr], axis=0)
If you look at the code for dstack, hstack, vstack you'll see that they do this sort of dimension adjustment before passing the task to concatenate.
The np.array solution is easy, but the concatenate solution is a good learning opportunity.
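The approaches above can be compared side by side. This is only a minimal sketch: the array contents and the variable names are made up for illustration, and np.stack is an additional NumPy function, not mentioned in the answer, that inserts the new axis for you.

```python
import numpy as np

# Two example matrices with the shape from the question (contents are arbitrary).
list_of_matrices = [np.zeros((100, 400)), np.ones((100, 400))]

# np.array builds a new leading axis from the list.
via_array = np.array(list_of_matrices)

# concatenate joins along an existing axis, so each matrix first gets a new axis 0.
via_concat = np.concatenate([m[None, ...] for m in list_of_matrices], axis=0)

# np.stack inserts the new axis itself.
via_stack = np.stack(list_of_matrices, axis=0)

# Plain concatenate without the extra axis joins the rows instead.
joined_rows = np.concatenate(list_of_matrices, axis=0)

print(via_array.shape, via_concat.shape, via_stack.shape)  # (2, 100, 400) three times
print(joined_rows.shape)                                   # (200, 400)
```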

Related

How to apply numpy functions on a slice?

I have a numpy array of shape (100, 30, 3). I wanted to apply a function to transform the second dimension (N=30) based on the slice from third dimension.
For example, consider I am doing a machine learning and my shape is (Samples, 1D Pixels, Color Channels). Now I want to apply np.log on the 2nd color channel. Something like np.log(x, axis=1, slice_axis=2, slice_index=1) to apply log on (:,:,1). How?
For applying operations like np.log in-place, you can use the out parameter. For the problem you mentioned, np.log(x[:, : ,1], out=x[:, : ,1]).
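A small sketch of the in-place approach, assuming a float array with strictly positive values (np.log needs that to give finite results). The data and the name `original` are hypothetical and only there to check the result.

```python
import numpy as np

# Hypothetical data: 4 samples, 5 pixels, 3 colour channels, all strictly positive floats.
x = np.arange(1.0, 61.0).reshape(4, 5, 3)
original = x.copy()

# x[:, :, 1] is a view into x, so writing the result into it modifies x itself.
np.log(x[:, :, 1], out=x[:, :, 1])

# Only channel 1 changed; channels 0 and 2 are untouched.
print(np.allclose(x[:, :, 1], np.log(original[:, :, 1])))  # True
print(np.array_equal(x[:, :, 0], original[:, :, 0]))       # True
```

If in-place operation is not a requirement, the plain assignment x[:, :, 1] = np.log(x[:, :, 1]) should give the same values.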

Summing 3d numpy array over plane, not axis

I have a 3d numpy array (nx5x5). I want to sum each of the n slices together. So the new shape will be (nx1x1), where each n is just the sum of an individual 5x5 array. Can I do this in numpy without using a loop? np.sum has its axis arguments, but they reshape the array into the wrong shape. I think I may need to call np.sum twice? But I'm having trouble thinking about how to do this. Anybody know the answer?
Here are three different ways of doing it:
Use a tuple for axis:
a.sum(axis=(1, 2))
Reshape properly to merge the axes you want to sum over:
a.reshape(a.shape[0], -1).sum(axis=1)
Use multiple sums:
a.sum(-1).sum(-1)
OR
a.sum(1).sum(-1)
etc.
"np.sum has its axis arguments, but they reshape the array into the wrong shape"
Summing is a reduction operation and it makes sense that after reducing in a specific axis (by summing all elements in that axis) that particular dimension is removed. If you don't want that you can pass the optional keepdims argument.
values = np.random.randn(3,5,5)
values.sum(axis=(1,2), keepdims=True)
Think I figured this out, for anyone who is running into the same issue. It turns out you can select two axes from np.sum, making a plane instead of a line. So:
np.sum(a, (1,2))
Does the trick.
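To check that the variants listed in the answers agree, here is a short sketch on a small made-up array (n = 3; the variable names are illustrative only):

```python
import numpy as np

a = np.arange(3 * 5 * 5).reshape(3, 5, 5)  # n = 3 slices of 5x5

by_tuple = a.sum(axis=(1, 2))                        # tuple of axes
by_reshape = a.reshape(a.shape[0], -1).sum(axis=1)   # merge the last two axes, then sum
by_two_sums = a.sum(-1).sum(-1)                      # sum twice over the last axis
kept = a.sum(axis=(1, 2), keepdims=True)             # keep the reduced axes as size 1

print(by_tuple)    # [ 300  925 1550]
print(kept.shape)  # (3, 1, 1)
```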

Numpy arrays of different dimension concatenate error

I have five Numpy arrays of shapes:
(2577, 42)
(2580, 100)
(2580, 236)
(2580, 8)
(2580, 37)
All of them concatenate fine except the (2577, 42) one. When I include it, I get an error:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2580 and the array at index 4 has size 2577
The code I am using:
dataset = np.concatenate((onehot_b, num_v, onehot_s, onehot_c, onehot_s), axis=1)
Is there a way to fix this?
The error is pretty clear. You cannot concatenate arrays whose sizes differ along an axis other than the one you are joining on. One possible way out is to convert the numpy arrays to lists and append all list lines to your dataset.
Numpy does not allow non-rectangular arrays, meaning that all sub-arrays must have the same size along every axis you are not joining on. In your case, 2577 and 2580 are sizes along axis=0, which is not the axis you are stacking over, so they have to be equal. If you can change all of the arrays to have the same first dimension, you can use concatenate. If you insist on stacking them as they are, another way is to stack the arrays themselves rather than their contents:
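dataset = np.asarray([onehot_b, num_v, onehot_s, onehot_c, onehot_s], dtype=object)
This will create an array of arrays for you (a 1-D object array). Recent NumPy versions require dtype=object for ragged input like this.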
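One possible reading of "change all of them to have the same first dimension" is sketched below. It rests on a strong assumption that is not in the question: that the extra rows can simply be dropped and the remaining rows still line up across arrays. The small arrays and the names `arrays`, `n_rows` and `trimmed` are hypothetical stand-ins.

```python
import numpy as np

# Small stand-ins for the arrays in the question: one has fewer rows than the others.
arrays = [np.zeros((7, 4)), np.ones((10, 3)), np.ones((10, 2))]

# Assumption: trailing rows can be discarded so that every array has the same row count.
n_rows = min(arr.shape[0] for arr in arrays)
trimmed = [arr[:n_rows] for arr in arrays]

# With equal row counts, joining along axis=1 (columns) works.
dataset = np.concatenate(trimmed, axis=1)
print(dataset.shape)  # (7, 9)
```

Whether trimming is appropriate depends on why the row counts differ in the first place, so it is worth checking how the shorter array was produced before discarding data.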

Numpy dot product problems

A=np.array([
[1,2],
[3,4]
])
B=np.ones(2)
A is clearly of shape 2X2
How does numpy allow me to compute the dot product np.dot(A,B)?
[1,2] (dot) [1,1]
[3,4]
B has to have dimensions of 2X1 for a dot product or rather this
[1,2] (dot) [1]
[3,4] [1]
This is a very silly question, but I am not able to figure out where I am going wrong here.
Earlier I used to think that np.ones(2) would give me this:
[1]
[1]
But it gives me this:
[1,1]
I'm copying part of an answer I wrote earlier today:
You should resist the urge to think of numpy arrays as having rows
and columns, but instead consider them as having dimensions and
shape. This is an important point which differentiates np.array and np.matrix:
x = np.array([1, 2, 3])
print(x.ndim, x.shape) # 1 (3,)
y = np.matrix([1, 2, 3])
print(y.ndim, y.shape) # 2 (1, 3)
An n-D array can only use n integer(s) to represent its shape.
Therefore, a 1-D array only uses 1 integer to specify its shape.
In practice, combining calculations between 1-D and 2-D arrays is not
a problem for numpy, and syntactically clean since the @ matrix
operation was introduced in Python 3.5. Therefore, there is rarely a
need to resort to np.matrix in order to satisfy the urge to see
expected row and column counts.
This behavior is by design. The NumPy docs state:
If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
Most of the rules for vector and matrix shapes relating to the dot product exist mainly so that one coherent method scales up to higher tensor orders. They matter much less when dealing with 1st-order (vector) and 2nd-order (matrix) tensors, and those orders are what the vast majority of numpy users need.
As a result, @ and np.dot are optimized (both mathematically and in input parsing) for those orders, always summing over the last axis of the first argument and the second-to-last axis (if applicable) of the second. The "if applicable" is a sort of idiot-proofing to ensure the output is what is expected in the vast majority of cases, even if the shapes don't technically fit.
Those of us who use higher-order tensors, meanwhile, are relegated to np.tensordot or np.einsum, which come complete with all the niggling little rules about dimension matching.
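The shapes involved can be checked directly. The sketch below reuses A and B from the question; the names `v`, `col` and `via_einsum` are illustrative, and the einsum subscripts are just one way of spelling the same sum over the last axis of A.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.ones(2)              # shape (2,): neither a row nor a column

v = np.dot(A, B)            # sums over the last axis of A and the only axis of B
col = A @ B.reshape(2, 1)   # an explicit (2, 1) column gives a (2, 1) result
via_einsum = np.einsum("ij,j->i", A, B)  # the same sum written out explicitly

print(B.shape, v.shape, col.shape)  # (2,) (2,) (2, 1)
print(v)                            # [3. 7.]
```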

Confusion in array operation in numpy

I generally use MATLAB and Octave, and I recently switched to Python and numpy.
In numpy when I define an array like this
>>> a = np.array([[2,3],[4,5]])
it works great and size of the array is
>>> a.shape
(2, 2)
which is also the same as in MATLAB.
But when I extract the entire first column and look at its size
>>> b = a[:,0]
>>> b.shape
(2,)
I get size (2,). What is this? I expected the size to be (2,1). Perhaps I misunderstood the basic concept. Can anyone clarify this for me?
A 1D numpy array* is literally 1D - it has no size in any second dimension, whereas in MATLAB, a '1D' array is actually 2D, with a size of 1 in its second dimension.
If you want your array to have size 1 in its second dimension you can use its .reshape() method:
a = np.zeros(5,)
print(a.shape)
# (5,)
# explicitly reshape to (5, 1)
print(a.reshape(5, 1).shape)
# (5, 1)
# or use -1 in the first dimension, so that its size in that dimension is
# inferred from its total length
print(a.reshape(-1, 1).shape)
# (5, 1)
Edit
As Akavall pointed out, I should also mention np.newaxis as another method for adding a new axis to an array. Although I personally find it a bit less intuitive, one advantage of np.newaxis over .reshape() is that it allows you to add multiple new axes in an arbitrary order without explicitly specifying the shape of the output array, which is not possible with the .reshape(-1, ...) trick:
a = np.zeros((3, 4, 5))
print(a[np.newaxis, :, np.newaxis, ..., np.newaxis].shape)
# (1, 3, 1, 4, 5, 1)
np.newaxis is just an alias of None, so you could do the same thing a bit more compactly using a[None, :, None, ..., None].
* An np.matrix, on the other hand, is always 2D, and will give you the indexing behavior you are familiar with from MATLAB:
a = np.matrix([[2, 3], [4, 5]])
print(a[:, 0].shape)
# (2, 1)
For more info on the differences between arrays and matrices, see here.
Typing help(np.shape) gives some insight into what is going on here. For starters, you can get a 2-D array back by typing:
b = np.array([a[:,0]])
Note that this has shape (1, 2); append .T to get the (2, 1) column you expected.
Basically numpy defines things a little differently than MATLAB. In the numpy environment, a vector only has one dimension, and an array is a vector of vectors, so it can have more. In your first example, your array is a vector of two vectors, i.e.:
a = np.array([vec1, vec2])
So a has two dimensions, and in your example the number of elements in both dimensions is the same, 2. Your array is therefore 2 by 2. When you take a slice out of this, you are reducing the number of dimensions that you have by one. In other words, you are taking a vector out of your array, and that vector only has one dimension, which also has 2 elements, but that's it. Your vector is now 2 by _. There is nothing in the second spot because the vector is not defined there.
You could think of it in terms of spaces too. Your first array is in the space R^(2x2) and your second vector is in the space R^(2). This means that the array is defined on a different (and bigger) space than the vector.
That was a lot to basically say that you took a slice out of your array, and unlike MATLAB, numpy does not represent vectors (1 dimensional) in the same way as it does arrays (2 or more dimensions).
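To make the difference concrete, here is a brief sketch using the array from the question. It shows a few standard indexing forms that keep the result 2-D; the variable names are illustrative only.

```python
import numpy as np

a = np.array([[2, 3], [4, 5]])

dropped = a[:, 0]             # an integer index removes that axis
kept_slice = a[:, 0:1]        # a length-1 slice keeps the axis
kept_list = a[:, [0]]         # indexing with a list also keeps the axis
kept_none = a[:, 0][:, None]  # or add the axis back with None / np.newaxis

print(dropped.shape)     # (2,)
print(kept_slice.shape)  # (2, 1)
print(kept_list.shape)   # (2, 1)
print(kept_none.shape)   # (2, 1)
```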
