I have a numpy array of shape (206, 482, 3). I wanted to pick the 1st channel, so I used name_of_array[:][:][0], but apparently that doesn't select the 1st channel.
I think name_of_array[:,:,0] picks the 1st channel, but I don't understand why. Why is name_of_array[:][:][0] != name_of_array[:,:,0]?
It's important to understand what each piece does. To see this, break the expression up left to right. Perhaps rewriting it will make it clearer:
x[:][:][0] -> ( ( x[:] )[:] )[0] # Both are valid and equivalent Python syntax
So basically, we apply [:] to x, then [:] to that result, then [0] to this final result. What does x[:] do? It just returns all of x (a copy for a list, a view for a NumPy array)! Thus
( (x[:])[:] )[0] == ( (x)[:] )[0] == (x[:])[0] == x[0]
This is, of course, not what you expected. On the other hand,
x[:, :, 0]
returns, in one step, column 0 of all rows of all frames (I'm treating the index as [frame, row, col]).
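A quick check makes this concrete (a minimal sketch; the small array here stands in for the original (206, 482, 3) image):
import numpy as np

x = np.arange(24).reshape(2, 3, 4)  # stand-in for the image array

print(np.array_equal(x[:][:][0], x[0]))  # True: the two [:] are no-ops
print(x[:][:][0].shape)  # (3, 4) -- the first "frame", not a channel
print(x[:, :, 0].shape)  # (2, 3) -- index 0 along the last (channel) axis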
The short answer: because that's the syntax (see the NumPy basic indexing docs).
arr[:] == arr            # a full slice along the first axis: the whole array
arr[:][:] == arr         # a full slice of a full slice: still the whole array
arr[:][:][0] == arr[0]   # the two [:] changed nothing, so this is just arr[0]
vs
arr[:, :, 0]             # all of the 1st dim, all of the 2nd dim, index 0 of the 3rd
One way to figure things like this out yourself is to make a simplified example and experiment (heeding How to debug small programs):
import numpy as np
res = np.arange(4 * 3 * 2).reshape(4,3,2)
print(":,:,:")
print(res[:, :, :])
print("\n1:2,1:2,:")
print(res[1:2, 1:2, :])
print("\n:,:,0")
print(res[:, :, 0])
print("\n:,:,1")
print(res[:, :, 1])
Output:
# :,:,: == all of it
[[[ 0 1]
[ 2 3]
[ 4 5]]
[[ 6 7]
[ 8 9]
[10 11]]
[[12 13]
[14 15]
[16 17]]
[[18 19]
[20 21]
[22 23]]]
# 1:2,1:2,:
[[[8 9]]]
# :,:,0
[[ 0 2 4]
[ 6 8 10]
[12 14 16]
[18 20 22]]
# :,:,1
[[ 1 3 5]
[ 7 9 11]
[13 15 17]
[19 21 23]]
There are lots of questions about numpy slicing on SO, some of which are worth studying to deepen your knowledge (these were suggested as likely duplicates, but they do not address this particular confusion):
Numpy extract submatrix
Selecting specific rows and columns from NumPy array
Below is the output of the numpy.ix_() function. What is this output used for? Its structure is quite unusual.
>>> import numpy as np
>>> gfg = np.ix_([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16], [21, 22, 23, 24, 25, 26], [31, 32, 33, 34, 35, 36])
>>> gfg
(array([[[[1]]],
[[[2]]],
[[[3]]],
[[[4]]],
[[[5]]],
[[[6]]]]),
array([[[[11]],
[[12]],
[[13]],
[[14]],
[[15]],
[[16]]]]),
array([[[[21],
[22],
[23],
[24],
[25],
[26]]]]),
array([[[[31, 32, 33, 34, 35, 36]]]]))
According to numpy doc:
Construct an open mesh from multiple sequences.
This function takes N 1-D sequences and returns N outputs with N dimensions each, such that the shape is 1 in all but one dimension and the dimension with the non-unit shape value cycles through all N dimensions.
Using ix_ one can quickly construct index arrays that will index the cross product. a[np.ix_([1,3],[2,5])] returns the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].
numpy.ix_()'s main use is to construct an open mesh so that we can use it to select specific indices from an array (a specific sub-array). An easy example makes this clear:
Say you have a 2D array of shape (5,5), and you would like to select the sub-array formed by rows 1 and 3 and columns 0 and 3. You can use np.ix_ to create the index mesh and select the sub-array, as in the example below:
import numpy as np

a = np.arange(5 * 5).reshape(5, 5)
print(a)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
sub_indices = np.ix_([1, 3], [0, 3])
print(sub_indices)
(array([[1],
        [3]]), array([[0, 3]]))
print(a[sub_indices])
[[ 5  8]
 [15 18]]
which is basically the selected sub-array from a that is in rows array([[1],[3]]) and columns array([[0, 3]]):
col 0 col 3
| |
v v
[[ 0 1 2 3 4]
[ 5 6 7 8 9] <- row 1
[10 11 12 13 14]
[15 16 17 18 19] <- row 3
[20 21 22 23 24]]
Please note that the N arrays np.ix_ returns (one per 1-D input sequence) come back in axis order: the first indexes rows, the second columns, the third depth, and so on. That is why, in the example above, array([[1],[3]]) selects rows and array([[0, 3]]) selects columns; the same holds for the example the OP provided in the question. The reason is the way NumPy's advanced indexing broadcasts index arrays across dimensions.
It's basically used to create N index arrays (or masks), each one referring to a different dimension.
For example, if I have a 3D np.ndarray and I want to select only some of its entries, I can use numpy.ix_ to create three index arrays with shapes (N,1,1), (1,N,1) and (1,1,N), each containing the indices for one of the three axes.
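Here is a minimal sketch of that 3D case (the array and index values are just illustrative):
import numpy as np

a = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# three broadcastable index arrays, shapes (2,1,1), (1,2,1) and (1,1,2)
idx = np.ix_([0, 2], [1, 3], [0, 4])
print([i.shape for i in idx])  # [(2, 1, 1), (1, 2, 1), (1, 1, 2)]

sub = a[idx]  # the 2x2x2 "cross product" sub-array
print(sub.shape)  # (2, 2, 2)
print(sub[0, 0, 0] == a[0, 1, 0])  # True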
Take a look at the examples on the numpy documentation page; they're self-explanatory.
This function isn't commonly used.
I think it appears in some algebra operations, like the cross product and its generalisations.
I have a 3D numpy array arr of shape (x, y, z); the shape is (10000, 99, 2) in this example.
I.e. we have 10000 instances of 99 x 2 two-dimensional arrays.
I would like to sort the whole array by the values along the z axis, i.e. ranking the 99 values within each column, for each instance.
Is there an easy way to do this with vectorisation? I'm aware I could loop over 10000 iterations, sorting each 2D array like below and combining the results into a 3D output.
np.unique(arr[:,0], return_inverse=True)
np.unique(arr[:,1], return_inverse=True)
Given I have 10000 outer instances, I am however interested in avoiding loops and sorting all 10000 values in a more efficient manner.
I am not sure I fully understand the sorting you describe, but you can try:
np.sort(arr, axis=1)
An example 3D input:
import numpy as np

rng_seed = 42  # control reproducibility
rng = np.random.RandomState(rng_seed)
arr = rng.randint(0, 40, 20).reshape(2, 5, 2)
The input looks like:
[[[38 28]
[14 7]
[20 38]
[18 22]
[10 10]]
[[23 35]
[39 23]
[ 2 21]
[ 1 23]
[29 37]]]
Applying:
arr1 = np.sort(arr, axis=1)
print(arr1)
This gives you the array sorted within each column, for each instance:
[[[10 7]
[14 10]
[18 22]
[20 28]
[38 38]]
[[ 1 21]
[ 2 23]
[23 23]
[29 35]
[39 37]]]
If you want the rank of each value instead, try:
arr_rank = arr.argsort(axis=1)
print(arr_rank)
The output is:
[[[4 1]
[1 4]
[3 3]
[2 0]
[0 2]]
[[3 2]
[2 1]
[0 3]
[4 0]
[1 4]]]
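If you later want to apply those ranks back to the data (e.g. to recover the sorted array, or to reorder another array of the same shape in the same way), np.take_along_axis pairs naturally with argsort:
# reapplying the ranks reproduces np.sort(arr, axis=1)
sorted_again = np.take_along_axis(arr, arr_rank, axis=1)
print(np.array_equal(sorted_again, np.sort(arr, axis=1)))  # True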
I have a question about how to apply a function to vectors in a 3D numpy array.
My problem is the following: let's say I have an array like this one:
a = np.arange(24)
a = a.reshape([4,3,2])
I want to apply a function to all following vectors to modify them:
[0 6], [1 7], [2 8], [4 10], [3 9] ...
What is the best method to use? As my array is quite big, looping over two of the three dimensions is quite slow...
Thanks in advance!
You can use the function np.apply_along_axis. From the docs:
Apply a function to 1-D slices along the given axis.
For example:
>>> import numpy as np
>>> a = np.arange(24)
>>> a = a.reshape([4,3,2])
>>>
>>> def my_func(a):
...     print("vector: " + str(a))
...     return sum(a) // len(a)  # floor division matches Python 2's integer /
...
>>> np.apply_along_axis(my_func, 0, a)
vector: [ 0 6 12 18]
vector: [ 1 7 13 19]
vector: [ 2 8 14 20]
vector: [ 3 9 15 21]
vector: [ 4 10 16 22]
vector: [ 5 11 17 23]
array([[ 9, 10],
[11, 12],
[13, 14]])
In the example above I've used the 0th axis. If you need n axes, you can execute this function n times, once per axis.
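One caveat: np.apply_along_axis calls the Python function once per 1-D slice, so it is essentially a loop in disguise. When a vectorised equivalent exists it is usually much faster; for the integer mean above, for example:
>>> a.sum(axis=0) // a.shape[0]  # vectorised equivalent of my_func along axis 0
array([[ 9, 10],
       [11, 12],
       [13, 14]])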
I have a large 3D np.ndarray of data that represents a physical variable sampled over a volume on a regular grid (i.e. the value in array[0,0,0] represents the value at physical coords (0,0,0)).
I would like to move to a finer grid spacing by interpolating the data on the coarse grid. At the moment I'm using scipy's griddata linear interpolation, but it's pretty slow (~90 s for a 20x20x20 array). It's a bit over-engineered for my purposes, since it allows random sampling of the volume data. Is there anything out there that can take advantage of my regularly spaced data and of the fact that there is only a limited set of specific points I want to interpolate to?
Sure! There are two options that do different things but both exploit the regularly-gridded nature of the original data.
The first is scipy.ndimage.zoom. If you just want to produce a denser regular grid based on interpolating the original data, this is the way to go.
The second is scipy.ndimage.map_coordinates. If you'd like to interpolate a few (or many) arbitrary points in your data, but still exploit the regularly-gridded nature of the original data (e.g. no quadtree required), it's the way to go.
"Zooming" an array (scipy.ndimage.zoom)
As a quick example (this uses cubic interpolation, the default order=3; use order=1 for bilinear, order=0 for nearest, etc.):
import numpy as np
import scipy.ndimage as ndimage
data = np.arange(9).reshape(3,3)
print('Original:\n', data)
print('Zoomed by 2x:\n', ndimage.zoom(data, 2))
This yields:
Original:
[[0 1 2]
[3 4 5]
[6 7 8]]
Zoomed by 2x:
[[0 0 1 1 2 2]
[1 1 1 2 2 3]
[2 2 3 3 4 4]
[4 4 5 5 6 6]
[5 6 6 7 7 7]
[6 6 7 7 8 8]]
This also works for 3D (and nD) arrays. However, be aware that if you zoom by 2x, for example, you'll zoom along all axes.
data = np.arange(27).reshape(3,3,3)
print('Original:\n', data)
print('Zoomed by 2x gives an array of shape:', ndimage.zoom(data, 2).shape)
This yields:
Original:
[[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 9 10 11]
[12 13 14]
[15 16 17]]
[[18 19 20]
[21 22 23]
[24 25 26]]]
Zoomed by 2x gives an array of shape: (6, 6, 6)
If you have something like a 3-band, RGB image that you'd like to zoom only spatially, you can do this by specifying a sequence of zoom factors, one per axis:
print('Zoomed by 2x along the last two axes:')
print(ndimage.zoom(data, (1, 2, 2)))
This yields:
Zoomed by 2x along the last two axes:
[[[ 0 0 1 1 2 2]
[ 1 1 1 2 2 3]
[ 2 2 3 3 4 4]
[ 4 4 5 5 6 6]
[ 5 6 6 7 7 7]
[ 6 6 7 7 8 8]]
[[ 9 9 10 10 11 11]
[10 10 10 11 11 12]
[11 11 12 12 13 13]
[13 13 14 14 15 15]
[14 15 15 16 16 16]
[15 15 16 16 17 17]]
[[18 18 19 19 20 20]
[19 19 19 20 20 21]
[20 20 21 21 22 22]
[22 22 23 23 24 24]
[23 24 24 25 25 25]
[24 24 25 25 26 26]]]
Arbitrary interpolation of regularly-gridded data using map_coordinates
The first thing to understand about map_coordinates is that it operates in pixel coordinates (i.e. just like you'd index the array, but the values can be floats). From your description, this is exactly what you want, but it often confuses people. For example, if you have x, y, z "real-world" coordinates, you'll need to transform them into index-based "pixel" coordinates.
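As a minimal sketch of that transform, assuming a uniform grid described by a (hypothetical) origin and spacing per axis:
import numpy as np

origin = np.array([0.0, 0.0, 0.0])   # physical coords of data[0, 0, 0] (assumed)
spacing = np.array([0.5, 0.5, 0.5])  # physical step between samples (assumed)

def world_to_pixel(points):
    # (N, 3) real-world coordinates -> fractional pixel coordinates
    return (np.asarray(points) - origin) / spacing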
At any rate, let's say we wanted to interpolate the value in the original array at position 1.2, 0.3, 1.4.
If you're thinking of this in terms of the earlier RGB image case, the first coordinate corresponds to the "band", the second to the "row" and the last to the "column". What order corresponds to what depends entirely on how you decide to structure your data, but I'm going to use these as "z, y, x" coordinates, as it makes the comparison to the printed array easier to visualize.
import numpy as np
import scipy.ndimage as ndimage
data = np.arange(27).reshape(3,3,3)
print('Original:\n', data)
print('Sampled at 1.2, 0.3, 1.4:')
print(ndimage.map_coordinates(data, [[1.2], [0.3], [1.4]]))
This yields:
Original:
[[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 9 10 11]
[12 13 14]
[15 16 17]]
[[18 19 20]
[21 22 23]
[24 25 26]]]
Sampled at 1.2, 0.3, 1.4:
[14]
Once again, this is cubic interpolation by default. Use the order kwarg to control the type of interpolation.
It's worth noting here that all of scipy.ndimage's operations preserve the dtype of the original array. If you want floating-point results, you'll need to cast the original array to float:
In [74]: ndimage.map_coordinates(data.astype(float), [[1.2], [0.3], [1.4]])
Out[74]: array([ 13.5965])
Another thing you may notice is that the interpolated-coordinates format is rather cumbersome for a single point (e.g. it expects a 3xN array instead of an Nx3 array). However, it's arguably nicer when you have sequences of coordinates. For example, consider sampling along a line that passes through the "cube" of data:
xi = np.linspace(0, 2, 10)
yi = 0.8 * xi
zi = 1.2 * xi
print(ndimage.map_coordinates(data, [zi, yi, xi]))
This yields:
[ 0 1 4 8 12 17 21 24 0 0]
This is also a good place to mention how boundary conditions are handled. By default, anything outside of the array is set to 0, which is why the last two values in the sequence are 0 (zi > 2 for the last two elements).
If we wanted the points outside the array to be, say, -999 (we can't use nan here, as this is an integer array; if you want nan, you'll need to cast to floats):
In [75]: ndimage.map_coordinates(data, [zi, yi, xi], cval=-999)
Out[75]: array([ 0, 1, 4, 8, 12, 17, 21, 24, -999, -999])
If we wanted it to return the nearest value for points outside the array, we'd do:
In [76]: ndimage.map_coordinates(data, [zi, yi, xi], mode='nearest')
Out[76]: array([ 0, 1, 4, 8, 12, 17, 21, 24, 25, 25])
You can also use "reflect" and "wrap" as boundary modes, in addition to "nearest" and the default "constant". These are fairly self-explanatory, but try experimenting a bit if you're confused.
For example, let's interpolate a line along the first row of the first band in the array that extends for twice the distance of the array:
xi = np.linspace(0, 5, 10)
yi, zi = np.zeros_like(xi), np.zeros_like(xi)
The defaults give:
In [77]: ndimage.map_coordinates(data, [zi, yi, xi])
Out[77]: array([0, 0, 1, 2, 0, 0, 0, 0, 0, 0])
Compare this to:
In [78]: ndimage.map_coordinates(data, [zi, yi, xi], mode='reflect')
Out[78]: array([0, 0, 1, 2, 2, 1, 2, 1, 0, 0])
In [78]: ndimage.map_coordinates(data, [zi, yi, xi], mode='wrap')
Out[78]: array([0, 0, 1, 2, 0, 1, 1, 2, 0, 1])
Hopefully that clarifies things a bit!
Great answer by Joe. Based on his suggestion, I created the regulargrid package (https://pypi.python.org/pypi/regulargrid/, source at https://github.com/JohannesBuchner/regulargrid)
It provides support for n-dimensional Cartesian grids (as needed here) via the very fast scipy.ndimage.map_coordinates for arbitrary coordinate scales.
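For completeness, newer SciPy versions also provide scipy.interpolate.RegularGridInterpolator, which handles regular (including unevenly spaced) rectilinear grids directly; a small sketch, reusing the 3x3x3 example data from above:
import numpy as np
from scipy.interpolate import RegularGridInterpolator

data = np.arange(27, dtype=float).reshape(3, 3, 3)
axes = (np.arange(3), np.arange(3), np.arange(3))  # coordinate values along each axis

interp = RegularGridInterpolator(axes, data)  # linear interpolation by default
print(interp([[1.2, 0.3, 1.4]]))  # [13.1]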