MATLAB/Python array difference?

MATLAB/Python array difference? - python

I got the following:
redIdx: a 2x1 matrix with values (289485, 289486).
image: 366x791x3 uint8 matrix (an image).
image2: zeros matrix with the same sape as image.
In MATLAB, if I do image2(redIdx) it returns a 2x1 matrix with values (0,0) and if I do image(redIdx) it returns a 2x1 matrix with values (94, 83).
But in Python, if I do image2[redIdx] or image[redIdx], it returns the next error: index 2879485 is out of bounds for axis 0 with size 366.
How can I get the same result as MATLAB?

MATLAB, when indexing an array with a single index (as opposed to multiple ones) uses linear indexing. Python, in the same situation, uses the index to index into the first dimension, returning a slice. The fact that redIdx contains multiple values is irrelevant, it's a 1D indexing operation.
To replicate linear indexing in Python, you can flatten the array, then index:
image.flatten('K')[redIdx]
This Q&A shows how to compute indices from the single linear index, which would be a more complex alternative to the above.

Related

Applying function on multiple dimensions of higher dimensional array

Suppose you have a higher dimensional array (3 or greater) which is composed of a series of 2d images. If this array is called x, then a 2d image will be represented as x[0,0,:,:]. Now what I want to do is apply a function that takes in a 2d image and outputs a scalar, on this higher dimensional array so that I would convert the dimension of the original array to one that is 2 dimensions lower. How would I do such a thing?
In other words, what is the faster numpy way of doing this: np.array([[f(x[i,j,:,:]) for i in range(x.shape[0])] for j in range(x.shape[1])]) for a list of axes and some function f that takes in an array.
I've looked at numpy.apply_along_axis but that only acts on a 1d array and the shape must be identical. numpy.apply_on_axes also doesn't work since it doesn't reduce the amount of dimensions which are given to the function (it gives my function a 4d array, not a 2d array which I can work with). numpy.vectorize doesn't work because it doesn't ever apply on more than one element at once.

Transpose a 1-dimensional array in Numpy without casting to matrix

My goal is to to turn a row vector into a column vector and vice versa. The documentation for numpy.ndarray.transpose says:
For a 1-D array, this has no effect. (To change between column and row vectors, first cast the 1-D array into a matrix object.)
However, when I try this:
my_array = np.array([1,2,3])
my_array_T = np.transpose(np.matrix(myArray))
I do get the wanted result, albeit in matrix form (matrix([[66],[640],[44]])), but I also get this warning:
PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
my_array_T = np.transpose(np.matrix(my_array))
How can I properly transpose an ndarray then?

A 1D array is itself once transposed, contrary to Matlab where a 1D array doesn't exist and is at least 2D.
What you want is to reshape it:
my_array.reshape(-1, 1)
Or:
my_array.reshape(1, -1)
Depending on what kind of vector you want (column or row vector).
The -1 is a broadcast-like, using all possible elements, and the 1 creates the second required dimension.

If your array is my_array and you want to convert it to a column vector you can do:
my_array.reshape(-1, 1)
For a row vector you can use
my_array.reshape(1, -1)
Both of these can also be transposed and that would work as expected.

IIUC, use reshape
my_array.reshape(my_array.size, -1)

Fastest way to mask rows of a 2D numpy array given a boolean vector of same length?

I have a numpy boolean vector of shape 1 x N, and an 2d array with shape 160 x N. What is a fast way of subsetting the columns of the 2d array such that for each index of the boolean vector that has True in it, the column is kept, and for each index of the boolean vector that has False in it, the column is discarded?
If you call the vector mask and the array features, i've found the following to be far too slow: np.array([f[mask] for f in features])
Is there a better way? I feel like there has to be, right?

You can try this,
new_array = 2d_array[:,bool_array==True]
So depending on the axes you can select which one you want to remove. In case you get a 1-d array, then you can just reshape it and get the required array. This method will be faster also.

What does (n,) mean in the context of numpy and vectors?

I've tried searching StackOverflow, googling, and even using symbolhound to do character searches, but was unable to find an answer. Specifically, I'm confused about Ch. 1 of Nielsen's Neural Networks and Deep Learning, where he says "It is assumed that the input a is an (n, 1) Numpy ndarray, not a (n,) vector."
At first I thought (n,) referred to the orientation of the array - so it might refer to a one-column vector as opposed to a vector with only one row. But then I don't see why we need (n,) and (n, 1) both - they seem to say the same thing. I know I'm misunderstanding something but am unsure.
For reference a refers to a vector of activations that will be input to a given layer of a neural network, before being transformed by the weights and biases to produce the output vector of activations for the next layer.
EDIT: This question equivocates between a "one-column vector" (there's no such thing) and a "one-column matrix" (does actually exist). Same for "one-row vector" and "one-row matrix".
A vector is only a list of numbers, or (equivalently) a list of scalar transformations on the basis vectors of a vector space. A vector might look like a matrix when we write it out, if it only has one row (or one column). Confusingly, we will sometimes refer to a "vector of activations" but actually mean "a single-row matrix of activation values transposed so that it is a single-column."
Be aware that in neither case are we discussing a one-dimensional vector, which would be a vector defined by only one number (unless, trivially, n==1, in which case the concept of a "column" or "row" distinction would be meaningless).

In numpy an array can have a number of different dimensions, 0, 1, 2 etc.
The typical 2d array has dimension (n,m) (this is a Python tuple). We tend to describe this as having n rows, m columns. So a (n,1) array has just 1 column, and a (1,m) has 1 row.
But because an array may have just 1 dimension, it is possible to have a shape (n,) (Python notation for a 1 element tuple: see here for more).
For many purposes (n,), (1,n), (n,1) arrays are equivalent (also (1,n,1,1) (4d)). They all have n terms, and can be reshaped to each other.
But sometimes that extra 1 dimension matters. A (1,m) array can multiply a (n,1) array to produce a (n,m) array. A (n,1) array can be indexed like a (n,m), with 2 indices, x[:,0] where as a (n,) only accepts x[0].
MATLAB matrices are always 2d (or higher). So people transfering ideas from MATLAB tend to expect 2 dimensions. There is a np.matrix subclass that supposed to imitate that.
For numpy programmers the distinctions between vector, row vector, column vector, matrix are loose and relatively unimportant. Or the use is derived from the application rather than from numpy itself. I think that's what's happening with this network book - the notation and expectations come from outside of numpy.
See as well this answer for how to interpret the shapes with respect to the data stored in ndarrays. It also provides insight on how to use .reshape: https://stackoverflow.com/a/22074424/3277902

(n,) is a tuple of length 1, whose only element is n. (The syntax isn't (n) because that's just n instead of making a tuple.)
If an array has shape (n,), that means it's a 1-dimensional array with a length of n along its only dimension. It's not a row vector or a column vector; it doesn't have rows or columns. It's just a vector.

Axis elimination

I'm having a trouble understanding the concept of Axis elimination in numpy. Suppose I have the following 2D matrix:
A =
1 2 3
3 4 5
6 7 8
Ok I understand that sum(A, axis=0) will sum each column down and will give a 1D array with 3 elements. I also understand that sum(A, axis=1) will sum each row.
But my trouble is when I read that axis=0 eliminates the 0th axis and axis=1 eliminates the 1th axis. Also sometime people mention "reduce" instead of "eliminate". I'm unable to understand what does that eliminate. For example sum(A, axis=0) will sum each column from top to bottom, but I don't see elimination or reduction here. What's the point? The same also for sum(A,axis=1).
AND how is it for higher dimensions?
p.s. I always confused between matrix dimensions and array dimensions. I wished that people who write the numpy documentation makes this distinction very clear.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.reduce.html
Reduces a‘s dimension by one, by applying ufunc along one axis.
For example, add.reduce() is equivalent to sum().
In numpy, the base class is ndarray - a multidimensional array (can 0d, 1d, or more)
http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html
Matrix is a subclass of array
http://docs.scipy.org/doc/numpy/reference/arrays.classes.html
Matrix objects are always two-dimensional
The history of the numpy Matrix is old, but basically it's meant to resemble the MATLAB matrix object. In the original MATLAB nearly everything was a matrix, which was always 2d. Later they generalized it to allow more dimensions. But it can't have fewer dimensions. MATLAB does have 'vectors', but they are just matrices with one dimension being 1 (row vector versus column vector).
'axis elimination' is not a common term when working with numpy. It could, conceivably, refer to any of several ways that reduce the number of dimensions of an array. Reduction, as in sum(), is one. Indexing is another: a[:,0,:]. Reshaping can also change the number of dimensions. np.squeeze is another.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.