I'm currently learning about broadcasting in Numpy and in the book I'm reading (Python for Data Analysis by Wes McKinney the author has mentioned the following example to "demean" a two-dimensional array:
import numpy as np
arr = np.random.randn(4, 3)
print(arr.mean(0))
demeaned = arr - arr.mean(0)
print(demeaned)
print(demeand.mean(0))
Which effectively causes the array demeaned to have a mean of 0.
I had the idea to apply this to an image-like, three-dimensional array:
import numpy as np
arr = np.random.randint(0, 256, (400,400,3))
demeaned = arr - arr.mean(2)
Which of course failed, because according to the broadcasting rule, the trailing dimensions have to match, and that's not the case here:
print(arr.shape) # (400, 400, 3)
print(arr.mean(2).shape) # (400, 400)
Now, i have gotten it to work mostly, by substracting the mean from every single index in the third dimension of the array:
demeaned = np.ones(arr.shape)
for i in range(3):
demeaned[...,i] = arr[...,i] - means
print(demeaned.mean(0))
At this point, the returned values are very close to zero and i think, that's a precision error. Am i actually right with this thought or is there another caveat, that i missed?
Also, this doesn't seam to be the cleanest, most 'numpy'-way to achieve what i wanted to achieve. Is there a function or a principle that i can make use of to improve the code?
As of numpy version 1.7.0, np.mean, and several other functions, accept a tuple in their axis parameter. This means that you can perform the operation on the planes of the image all at once:
m = arr.mean(axis=(0, 1))
This mean will have shape (3,), with one element for each plane of the image.
If you want to subtract the means of each pixel individually, you have to remember that broadcasting aligns shape tuples on the right edge. That means that you need to insert an extra dimension:
n = arr.mean(axis=2)
n = n.reshape(*n.shape, 1)
Or
n = arr.mean(axis=2)[..., None]
Try np.apply_along_axis().
np.apply_along_axis(lambda x: x - np.mean(x), 2, arr)
Output: you get the array of the same shape where each cell is demeaned in the dimension you want (the second parameter, here it is 2).
Related
Let us have a numpy array (float) with shape equal to (36, 2, 400, 400). Let us say the 400 by 400 represents an image. Then for each pixel I would like to find the two values (second dimension) which are when taking the norm over the second dimension, the lowest with respect to the first dimension. So I end up with an array of shape (2, 400, 400).
With np.argmin(np.linalg.norm(array, axis=1), axis=0) I am able to get the index for each of those 2 by 400 by 400 pixels which is almost what I want. But now I want to use this number to slice the original array in the first dimension so I am left with an array of shape (2, 400, 400).
What I can do is loop over all indices and construct the result pixel by pixel, but I am convinced there is a smarter way. Can anyone help me with a smarter way?
A minimal reproducible example as requested where distances is the array:
shape = (400, 400)
centers = np.random.randint(400, size=(36, 2))
distances = np.array([np.indices(shape) - np.array(center)[:, None, None] for center in centers])
nearest_center_index = np.argmin(np.linalg.norm(distances, axis=1), axis=0)
print(distances.shape)
print(nearest_center_index.shape)
plt.imshow(nearest_center_index)
out:
(36, 2, 400, 400)
(400, 400)
I was able, with help from the comments, to produce a somewhat ugly answer, which helped me futher to understand the problem. Let me elaborate. What is possible to do is to flatten the image and argmin results and then use advanced indexing with argmin and indices over the image to produce the results.
flatten_indices = nearest_center_index.reshape(400**2)
image_indices = range(400**2)
results = distances.reshape(36, 2, 400**2)[flatten_indices, :, image_indices].reshape(400, 400, 2).swapaxes(0, 2)
However, I think it happens a lot that you have indices that are shaped as a subset of the dimensions and have values containing indices of another dimension. I would expect a generic method to slice this.
Thus let us have and array with n dimensions with shape = (x1, x2, ..., xn) and let us say we have a array representing indices for a dimension, e.g., xi, that has shape which is a subset of the shape of the original array and not containing xi. Then I would expect a method to slice this array.
The function I was looking for is numpy.take_along_axis().
For the specific example the only thing needed to be done is making sure the nearest_center_index (output of argmin) has equal amount of dimensions as the to be sliced array. In the example this can be achieved by passing keepdims=True to both norm and argmin which then can be directly used as the second argument of the numpy function. The third argument should be the xi axis (in the example axis 0).
Without passing the keepdims=True, following the exact example, the stated objective can be achieved by:
result = np.take_along_axis(distances, nearest_center_index[None,None,:,:], 0)[0]
I have a ndarray of shape (68, 64, 64) called 'prediction'. These dimensions correspond to image_number, height, width. For each image, I have a tuple of length two that contains coordinates that corresponds to a particular location in each 64x64 image, for example (12, 45). I can stack these coordinates into another Numpy ndarray of shape (68,2) called 'locations'.
How can I construct a slice object or construct the necessary advanced indexing indices to access these locations without using a loop? Looking for help on the syntax. Using pure Numpy matrixes without loops is the goal.
Working loop structure
Import numpy as np
# example code with just ones...The real arrays have 'real' data.
prediction = np.ones((68,64,64), dtype='float32')
locations = np.ones((68,2), dtype='uint32')
selected_location_values = np.empty(prediction.shape[0], dtype='float32')
for index, (image, coordinates) in enumerate(zip(prediction, locations)):
selected_locations_values[index] = image[coordinates]
Desired approach
selected_location_values = np.empty(prediction.shape[0], dtype='float32')
correct_indexing = some_function_here(locations). # ?????
selected_locations_values = predictions[correct_indexing]
A straightforward indexing should work:
img = np.arange(locations.shape[0])
r = locations[:, 0]
c = locations[:, 1]
selected_locations_values = predictions[img, r, c]
Fancy indexing works by selecting elements of the indexed array that correspond to the shape of the broadcasted indices. In this case, the indices are quite straightforward. You just need the range to tell you what image each location corresponds to.
I have a function in Python that returns a numpy.mat of shape (100, 1). I am calling this function 4 times in a loop and would like to take the resulting 4 matricies and create a matrix of shape (100, 4). I have looked for sometime at numpy.append, numpy.concatenate, and numpy.insert but have not been able to get this working.
Here is a short SSCCE of my issue
zeros = np.zeros(shape=(100, 4))
for i in range(1, 5):
np.append(zeros, np.empty(shape=(100, 1)))
print(zeros)
Where zeros should results in a matrix of shape (100, 4) with "junk" values from each of the calls to numpy.empty and not all 0..
Do something along these lines -
zeros = np.zeros(shape=(100, 4))
for i in range(1, 5):
data = np.random.rand(100,1) # func that returns (100,1) shaped array
zeros[:,i-1] = data.ravel()
In place of ravel(), we could also use : data[:,0] or np.squeeze(data), basic idea is to feed a 1D array there, because the LHS zeros[:,i-1] expects a 1D array there.
As an alternative, inside the loop, we could also do -
zeros[:,[i-1]] = data
Thus, with that list of column index [i-1] instead of i-1, we are keeping the dimensions into which data is to be assigned (keeps as 2D) and that allows us to feed in data, which is also 2D without any change.
Consider the following simple example:
X = numpy.zeros([10, 4]) # 2D array
x = numpy.arange(0,10) # 1D array
X[:,0] = x # WORKS
X[:,0:1] = x # returns ERROR:
# ValueError: could not broadcast input array from shape (10) into shape (10,1)
X[:,0:1] = (x.reshape(-1, 1)) # WORKS
Can someone explain why numpy has vectors of shape (N,) rather than (N,1) ?
What is the best way to do the casting from 1D array into 2D array?
Why do I need this?
Because I have a code which inserts result x into a 2D array X and the size of x changes from time to time so I have X[:, idx1:idx2] = x which works if x is 2D too but not if x is 1D.
Do you really need to be able to handle both 1D and 2D inputs with the same function? If you know the input is going to be 1D, use
X[:, i] = x
If you know the input is going to be 2D, use
X[:, start:end] = x
If you don't know the input dimensions, I recommend switching between one line or the other with an if, though there might be some indexing trick I'm not aware of that would handle both identically.
Your x has shape (N,) rather than shape (N, 1) (or (1, N)) because numpy isn't built for just matrix math. ndarrays are n-dimensional; they support efficient, consistent vectorized operations for any non-negative number of dimensions (including 0). While this may occasionally make matrix operations a bit less concise (especially in the case of dot for matrix multiplication), it produces more generally applicable code for when your data is naturally 1-dimensional or 3-, 4-, or n-dimensional.
I think you have the answer already included in your question. Numpy allows the arrays be of any dimensionality (while afaik Matlab prefers two dimensions where possible), so you need to be correct with this (and always distinguish between (n,) and (n,1)). By giving one number as one of the indices (like 0 in 3rd row), you reduce the dimensionality by one. By giving a range as one of the indices (like 0:1 in 4th row), you don't reduce the dimensionality.
Line 3 makes perfect sense for me and I would assign to the 2-D array this way.
Here are two tricks that make the code a little shorter.
X = numpy.zeros([10, 4]) # 2D array
x = numpy.arange(0,10) # 1D array
X.T[:1, :] = x
X[:, 2:3] = x[:, None]
I generally use MATLAB and Octave, and i recently switching to python numpy.
In numpy when I define an array like this
>>> a = np.array([[2,3],[4,5]])
it works great and size of the array is
>>> a.shape
(2, 2)
which is also same as MATLAB
But when i extract the first entire column and see the size
>>> b = a[:,0]
>>> b.shape
(2,)
I get size (2,), what is this? I expect the size to be (2,1). Perhaps i misunderstood the basic concept. Can anyone make me clear about this??
A 1D numpy array* is literally 1D - it has no size in any second dimension, whereas in MATLAB, a '1D' array is actually 2D, with a size of 1 in its second dimension.
If you want your array to have size 1 in its second dimension you can use its .reshape() method:
a = np.zeros(5,)
print(a.shape)
# (5,)
# explicitly reshape to (5, 1)
print(a.reshape(5, 1).shape)
# (5, 1)
# or use -1 in the first dimension, so that its size in that dimension is
# inferred from its total length
print(a.reshape(-1, 1).shape)
# (5, 1)
Edit
As Akavall pointed out, I should also mention np.newaxis as another method for adding a new axis to an array. Although I personally find it a bit less intuitive, one advantage of np.newaxis over .reshape() is that it allows you to add multiple new axes in an arbitrary order without explicitly specifying the shape of the output array, which is not possible with the .reshape(-1, ...) trick:
a = np.zeros((3, 4, 5))
print(a[np.newaxis, :, np.newaxis, ..., np.newaxis].shape)
# (1, 3, 1, 4, 5, 1)
np.newaxis is just an alias of None, so you could do the same thing a bit more compactly using a[None, :, None, ..., None].
* An np.matrix, on the other hand, is always 2D, and will give you the indexing behavior you are familiar with from MATLAB:
a = np.matrix([[2, 3], [4, 5]])
print(a[:, 0].shape)
# (2, 1)
For more info on the differences between arrays and matrices, see here.
Typing help(np.shape) gives some insight in to what is going on here. For starters, you can get the output you expect by typing:
b = np.array([a[:,0]])
Basically numpy defines things a little differently than MATLAB. In the numpy environment, a vector only has one dimension, and an array is a vector of vectors, so it can have more. In your first example, your array is a vector of two vectors, i.e.:
a = np.array([[vec1], [vec2]])
So a has two dimensions, and in your example the number of elements in both dimensions is the same, 2. Your array is therefore 2 by 2. When you take a slice out of this, you are reducing the number of dimensions that you have by one. In other words, you are taking a vector out of your array, and that vector only has one dimension, which also has 2 elements, but that's it. Your vector is now 2 by _. There is nothing in the second spot because the vector is not defined there.
You could think of it in terms of spaces too. Your first array is in the space R^(2x2) and your second vector is in the space R^(2). This means that the array is defined on a different (and bigger) space than the vector.
That was a lot to basically say that you took a slice out of your array, and unlike MATLAB, numpy does not represent vectors (1 dimensional) in the same way as it does arrays (2 or more dimensions).