Say you have a 3D array as follows:
a = np.random.uniform(0,10,(3,4,4))
a
Out[167]:
array([[[6.11382489, 5.33572952, 2.6994938 , 5.32924568],
[0.02494179, 9.5813176 , 3.78090323, 7.73698908],
[0.4559432 , 3.14531716, 4.18929635, 9.44256735],
[7.05641989, 0.51355523, 6.61806454, 1.3124488 ]],
[[9.79806021, 6.9343234 , 3.96018673, 8.97424501],
[3.25146771, 5.06744849, 6.05870707, 2.27286515],
[4.66656429, 6.92791142, 7.1623226 , 5.34108811],
[6.09831564, 9.52367529, 8.27257007, 8.01510805]],
[[5.62545596, 9.01048599, 6.76713644, 7.71836144],
[5.59842752, 0.34003062, 8.07114444, 8.5382837 ],
[0.20420194, 6.39088367, 4.97895935, 4.26247875],
[1.2701483 , 8.35244104, 2.69965027, 8.39305974]]])
Is there a way to get the minimum values in the slices along axis=0 as one array efficiently?
So in this case I would specify axis=0 (i.e. the axis with dimension length=3) and return the minimum values: (0.02494179, 2.27286515, 0.20420194).
I feel like this is a simple problem but I can't seem to get it to work, so any help on the matter would be greatly appreciated!
If I got it right, you just have to apply "min" twice
for instance:
>>> np.random.seed(1) #reproduce the same results
>>> a = np.random.randint(0,10,(3,2,4)) #using int is easier to understand
Out[4]:
array([[[5, 8, 9, 5],
[0, 0, 1, 7]],
[[6, 9, 2, 4],
[5, 2, 4, 2]],
[[4, 7, 7, 9],
[1, 7, 0, 6]]])
>>> a.min(axis=0).min(axis=0)
Out[5]: array([0, 0, 0, 2])
It is the first time I post an answer, I hope I did okay.
Related
Solution: #QuangHoang's first comment namely np.linalg.norm(arr,axis=1).
I would like to apply Numpy's linalg.norm function column wise to sub-arrays of a 3D array by using ranges (or indices?), similar in functionality to what ufunc.reduceat does.
Given the following array:
import numpy as np
In []: arr = np.array([[0,1,2,3], [2,2,3,4], [3,2,5,6],
[1,7,1,9], [1,4,8,6], [2,3,5,8],
[2,5,7,3], [2,3,4,6], [2,5,3,2]]).reshape(3,3,4)
Out []: array([[[0, 1, 2, 3],
[2, 2, 3, 4],
[3, 2, 5, 6]],
[[1, 7, 1, 9],
[1, 4, 8, 6],
[2, 3, 5, 8]],
[[2, 5, 7, 3],
[2, 3, 4, 6],
[2, 5, 3, 2]]])
I would like to apply linalg.norm column wise to the three sub-arrays separately i.e. for the first column it would be linalg.norm([0, 2, 3]), linalg.norm([1, 1, 2]) and linalg.norm([2, 2, 2]), for the second linalg.norm([1, 2, 2]), linalg.norm([7, 4, 3]) and linalg.norm([5, 3, 5]) etc. resulting in a 2D vector with shape (3,4) containing the results of the linalg.norm calls.
Doing this with a 2D array is straightforward by specifying the axis:
import numpy.linalg as npla
In []: npla.norm(np.array([[0,1,2,3], [2,2,3,4], [3,2,5,6]]), axis=0)
Out []: array([3.60555128, 3. , 6.164414 , 7.81024968])
But I don't understand how to do that for each sub-array separately. I believe that reduceat with a ufunc like add allows to set indices and ranges. Would something similar be possible here but with linalg.norm?
Edit 1:
I followed #hpaulj's advice to look at the code used for add.reduce. Getting a better understanding of the method I was able to search more precisely and I found np.apply_along_axis which is exactly what I was looking for:
In []: np.apply_along_axis(npla.norm, 1, arr)
Out []: array([[ 3.60555128, 3. , 6.164414 , 7.81024968],
[ 2.44948974, 8.60232527, 9.48683298, 13.45362405],
[ 3.46410162, 7.68114575, 8.60232527, 7. ]])
However, this method is very slow. Is there a way to use linalg.nrom in a vectorized manner instead?
Edit 2:
#QuangHoang's first comment is actually the correct answer I was looking for. I misunderstood the method which is why I misunderstood their comment. Specifying the axis in the linalg.norm call is what is required here:
np.linalg.norm(arr,axis=1)
I am very confused when it comes to the logic of the NumPy axis argument. In some cases it affects the row when axis = 0 and in some cases it affects the columns when axis = 0. Example:
a = np.array([[1,3,6,7,4],[3,2,5,9,1]])
array([[1,3,6,7,4],
[3,2,5,9,1]])
np.sort(a, axis = 0) #This sorts the columns
array([[1, 2, 5, 7, 1],
[3, 3, 6, 9, 4]])
np.sort(a, axis=1) #This sorts the rows
array([[1, 3, 4, 6, 7],
[1, 2, 3, 5, 9]])
#####################################################################
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
np.delete(arr,obj = 1, axis = 0) # This deletes the row
array([[ 1, 2, 3, 4],
[ 9, 10, 11, 12]])
np.delete(arr,obj = 1, axis = 1) #This deletes the column
array([[ 1, 3, 4],
[ 5, 7, 8],
[ 9, 11, 12]])
If there is some logic here that I am missing I would love to learn it.
It's perhaps simplest to remember it as 0=down and 1=across.
This means:
Use axis=0 to apply a method down each column, or to the row labels (the index).
Use axis=1 to apply a method across each row, or to the column labels.
Here's a picture to show the parts of a DataFrame that each axis refers to:
It's also useful to remember that Pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:
Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]
So, concerning the method in the question, np.sort(axis=1), seems to be correctly defined. It takes the mean of entries horizontally across columns, that is, along each individual row. On the other hand, np.sort(axis=0) would be an operation acting vertically downwards across rows.
Similarly, np.delete(name, axis=1) refers to an action on column labels, because they intuitively go across the horizontal axis. Specifying axis=0 would make the method act on rows instead.
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
# array([[ 1, 2, 3, 4],
# [ 5, 6, 7, 8],
# [ 9, 10, 11, 12]])
arr has 2 dimensions, use the empty slice : to select the first and second axis arr[:,:]. From the documentation of np.delete regarding the second parameter obj:
obj : slice, int or array of ints
Indicate indices of sub-arrays to remove along the specified axis.
If we want to delete obj=1 from axis=0 we are effectively removing arr[[1],:] from arr
arr[[1],:] # array([[5, 6, 7, 8]])
With the same intuition, we can remove obj=1 from axis=1
arr[:,[1]] # array([[ 2],
# [ 6],
# [10]])
When sorting the array arr above along axis=0 we are comparing the following elements:
# array([[1, 2, 5, 7, 1]])
# array([[5, 6, 7, 8]])
# array([[ 9, 10, 11, 12]])
The array is already sorted in this case but the comparison is done between two rows. For example array([[5, 6, 7, 8]]) is compared with array([[ 9, 10, 11, 12]]) by doing an element-wise comparison.
Sorting the array on axis=1 we are comparing the following elements
# array([[1], array([[ 2], array([[ 3], array([[ 4],
# [5], [ 6], [ 7], [ 8],
# [9]]) [10]]) [11]]) [12]])
Notice the difference of axis usage between np.delete and np.sort. np.delete will remove the complete row/column while np.sort will use the complete row/column for comparison.
The [::n] indexing option in numpy provides a very useful way to index every nth item in a list. However, is it possible to use this feature to extract multiple values, e.g. every other pair of values?
For example:
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
And I want to extract every other pair of values i.e. I want to return
a[0, 1, 4, 5, 8, 9,]
Of course the index could be built using loops or something, but I wonder if there's a faster way to use ::-style indexing in numpy but also specifying the width of the pattern to take every nth iteration of.
Thanks
With length of array being a multiple of the window size -
In [29]: W = 2 # window-size
In [30]: a.reshape(-1,W)[::2].ravel()
Out[30]: array([0, 1, 4, 5, 8, 9])
Explanation with breaking-down-the-steps -
# Reshape to split into W-sized groups
In [43]: a.reshape(-1,W)
Out[43]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
# Use stepsize to select every other pair starting from the first one
In [44]: a.reshape(-1,W)[::2]
Out[44]:
array([[0, 1],
[4, 5],
[8, 9]])
# Flatten for desired output
In [45]: a.reshape(-1,W)[::2].ravel()
Out[45]: array([0, 1, 4, 5, 8, 9])
If you are okay with 2D output, skip the last step as that still be a view into the input and virtually free on runtime. Let's verify the view-part -
In [47]: np.shares_memory(a,a.reshape(-1,W)[::2])
Out[47]: True
For generic case of not necessarily a multiple, we can use a masking based one -
In [64]: a[(np.arange(len(a))%(2*W))<W]
Out[64]: array([0, 1, 4, 5, 8, 9])
You can do that reshaping the array into a nx3 matrix, then slice up the first two elements for each row and finally flatten up the reshaped array:
a.reshape((-1,3))[:,:2].flatten()
resulting in:
array([ 0, 1, 3, 4, 6, 7, 9, 10])
I've run into this interaction with arrays that I'm a little confused. I can work around it, but for my own understanding, I'd like to know what is going on.
Essentially, I have a datafile that I'm trying to tailor so I can run this as an input for some code I've already written. This involves some calculations on some columns, rows, etc. In particular, I also need to rearrange some elements, where the original array isn't being modified as I expect it would.
import numpy as np
ex_data = np.arange(12).reshape(4,3)
ex_data[2,0] = 0 #Constructing some fake data
ex_data[ex_data[:,0] == 0][:,1] = 3
print ex_data
Basically, I look in a column of interest, collect all the rows where that column contains a parameter value of interest and just reassigning values.
With the snippet of code above, I would expect ex_data to have it's column 1 elements, conditional if it's column 0 element is equal to 0, to be assigned a value of 3. However what I'm seeing is that there is no effect at all.
>>> ex_data
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 0, 7, 8],
[ 9, 10, 11]])
In another case, if I don't 'slice', my 'sliced' data file, then the reassignment goes on as normal.
ex_data[ex_data[:,0] == 0] = 3
print ex_data
Here I'd expect my entire row, conditional to where column 0 is equal to 0, be populated with 3. This is what you see.
>>> ex_data
array([[ 3, 3, 3],
[ 3, 4, 5],
[ 3, 3, 3],
[ 9, 10, 11]])
Can anyone explain the interaction?
In [368]: ex_data
Out[368]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 0, 7, 8],
[ 9, 10, 11]])
The column 0 test:
In [369]: ex_data[:,0]==0
Out[369]: array([ True, False, True, False])
That boolean mask can be applied to the rows as:
In [370]: ex_data[ex_data[:,0]==0,0]
Out[370]: array([0, 0]) # the 0's you expected
In [371]: ex_data[ex_data[:,0]==0,1]
Out[371]: array([1, 7]) # the col 1 values you want to replace
In [372]: ex_data[ex_data[:,0]==0,1] = 3
In [373]: ex_data
Out[373]:
array([[ 0, 3, 2],
[ 3, 4, 5],
[ 0, 3, 8],
[ 9, 10, 11]])
The indexing you tried:
In [374]: ex_data[ex_data[:,0]==0]
Out[374]:
array([[0, 3, 2],
[0, 3, 8]])
produces a copy. Assigning ...[:,1]=3 just changes that copy, not the original array. Fortunately in this case, it is easy to use
ex_data[ex_data[:,0]==0,1]
instead of
ex_data[ex_data[:,0]==0][:,1]
I want to flip the first and second values of arrays in an array. A naive solution is to loop through the array. What is the right way of doing this?
import numpy as np
contour = np.array([[1, 4],
[3, 2]])
flipped_contour = np.empty((0,2))
for point in contour:
x_y_fipped = np.array([point[1], point[0]])
flipped_contour = np.vstack((flipped_contour, x_y_fipped))
print(flipped_contour)
[[4. 1.]
[2. 3.]]
Use the aptly named np.flip:
np.flip(contour, axis=1)
Or,
np.fliplr(contour)
array([[4, 1],
[2, 3]])
You can use numpy indexing:
contour[:, ::-1]
In addition to COLDSPEED's answer, if we only want to swap the first and second column only, not to flip the entire array:
contour[:, :2] = contour[:, 1::-1]
Here contour[:, 1::-1] is the array formed by first two columns of the array contour, in the reverse order. It then is assigned to the first two columns (contour[:, :2]). Now the first two column are swapped.
In general, to swap the ith and jth columns, do the following:
contour[:, [i, j]] = contour[:, [j, i]]
Here are two non-inplace ways of swapping the first two columns:
>>> a = np.arange(15).reshape(3, 5)
>>> a[:, np.r_[1:-1:-1, 2:5]]
array([[ 1, 0, 2, 3, 4],
[ 6, 5, 7, 8, 9],
[11, 10, 12, 13, 14]])
or
>>> np.c_[a[:, 1::-1], a[:, 2:]]
array([[ 1, 0, 2, 3, 4],
[ 6, 5, 7, 8, 9],
[11, 10, 12, 13, 14]])
>>> your_array[indices_to_flip] = np.flip(your_array[indices_to_flip], axis=1)