I am very confused by the logic of the NumPy axis argument. In some cases axis = 0 affects the rows, and in other cases axis = 0 affects the columns. Example:
a = np.array([[1,3,6,7,4],[3,2,5,9,1]])
array([[1,3,6,7,4],
[3,2,5,9,1]])
np.sort(a, axis = 0) #This sorts the columns
array([[1, 2, 5, 7, 1],
[3, 3, 6, 9, 4]])
np.sort(a, axis=1) #This sorts the rows
array([[1, 3, 4, 6, 7],
[1, 2, 3, 5, 9]])
#####################################################################
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
np.delete(arr,obj = 1, axis = 0) # This deletes the row
array([[ 1, 2, 3, 4],
[ 9, 10, 11, 12]])
np.delete(arr,obj = 1, axis = 1) #This deletes the column
array([[ 1, 3, 4],
[ 5, 7, 8],
[ 9, 11, 12]])
If there is some logic here that I am missing I would love to learn it.
It's perhaps simplest to remember it as 0=down and 1=across.
This means:
Use axis=0 to apply a method down each column, or to the row labels (the index).
Use axis=1 to apply a method across each row, or to the column labels.
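To make this concrete, here is a minimal sketch using the array a from the question (np.sum stands in for any reducing method):
a = np.array([[1,3,6,7,4],[3,2,5,9,1]])
a.sum(axis=0)   # runs down each column  -> array([ 4,  5, 11, 16,  5])
a.sum(axis=1)   # runs across each row   -> array([21, 20])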
It's also useful to remember that Pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:
Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]
So the method in the question, np.sort(a, axis=1), is behaving as defined: it sorts entries horizontally across columns, that is, along each individual row. On the other hand, np.sort(a, axis=0) is an operation acting vertically downwards across rows, i.e. it sorts each column.
Similarly, np.delete(arr, obj, axis=1) acts on columns, because they run across the horizontal axis. Specifying axis=0 makes the method act on rows instead.
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
# array([[ 1, 2, 3, 4],
# [ 5, 6, 7, 8],
# [ 9, 10, 11, 12]])
arr has 2 dimensions; we can use the empty slice : to select along the first and the second axis, as in arr[:,:]. From the documentation of np.delete regarding the second parameter obj:
obj : slice, int or array of ints
Indicate indices of sub-arrays to remove along the specified axis.
If we want to delete obj=1 from axis=0 we are effectively removing arr[[1],:] from arr
arr[[1],:] # array([[5, 6, 7, 8]])
With the same intuition, we can remove obj=1 from axis=1
arr[:,[1]] # array([[ 2],
# [ 6],
# [10]])
When sorting the array arr above along axis=0 we are comparing the following elements:
# array([[ 1,  2,  3,  4]])
# array([[5, 6, 7, 8]])
# array([[ 9, 10, 11, 12]])
The array is already sorted in this case but the comparison is done between two rows. For example array([[5, 6, 7, 8]]) is compared with array([[ 9, 10, 11, 12]]) by doing an element-wise comparison.
Sorting the array on axis=1 we are comparing the following elements
# array([[1], array([[ 2], array([[ 3], array([[ 4],
# [5], [ 6], [ 7], [ 8],
# [9]]) [10]]) [11]]) [12]])
Notice the difference of axis usage between np.delete and np.sort. np.delete will remove the complete row/column while np.sort will use the complete row/column for comparison.
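As a quick illustration of that difference, run both functions on the same arr with the same axis (a small sketch, result shapes noted in the comments):
arr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
np.delete(arr, 1, axis=0)   # removes the whole row at index 1 -> shape (2, 4)
np.sort(arr, axis=0)        # sorts every column; whole rows are compared element-wise -> shape (3, 4)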
Suppose I have a 2D NumPy array values. I want to add a new column to it. The new column should be values[:, 19] but lagged by one sample (first element equal to zero). It could be returned as np.append([0], values[0:-2:1, 19]). Following "Numpy concatenate 2D arrays with 1D array", I tried:
temp = np.append([0], [values[1:-2:1, 19]])
values = np.append(dataset.values, temp[:, None], axis=1)
but I get:
ValueError: all the input array dimensions except for the concatenation axis
must match exactly
I tried using c_ too as:
temp = np.append([0], [values[1:-2:1, 19]])
values = np.c_[values, temp]
but the effect is the same. How can this concatenation be done? I think the problem is in the orientation of temp - it is treated as a row instead of a column, so there is an issue with dimensions. In Octave, ' (the transpose operator) would do the trick. Maybe there is a similar solution in NumPy?
Anyway, thank you for your time.
Best regards,
Max
In [76]: values = np.arange(16).reshape(4,4)
In [77]: temp = np.concatenate(([0], values[1:,-1]))
In [78]: values
Out[78]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [79]: temp
Out[79]: array([ 0, 7, 11, 15])
This use of concatenate to make temp is similar to your use of append (which actually uses concatenate).
Sounds like you want to join values and temp in this way:
In [80]: np.concatenate((values, temp[:,None]),axis=1)
Out[80]:
array([[ 0, 1, 2, 3, 0],
[ 4, 5, 6, 7, 7],
[ 8, 9, 10, 11, 11],
[12, 13, 14, 15, 15]])
Again I prefer using concatenate directly.
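Applied to the setup described in the question, a sketch along the same lines (assuming values is the existing 2D array with at least 20 columns; note the [:-1] slice, so the lagged column has as many rows as values):
lagged = np.concatenate(([0], values[:-1, 19]))             # column 19 shifted down by one, first element 0
values = np.concatenate((values, lagged[:, None]), axis=1)  # lagged[:, None] turns shape (n,) into (n, 1)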
You need to make sure every input is a 2D array first. You can then use vstack, or hstack with reshaping, to get the final array you want:
a = np.array([[1, 2, 3],[4, 5, 6]])
b = np.array([[7, 8, 9]])
c = np.vstack([a, b])  # stack the two blocks vertically
print(c)
c = np.hstack([x.reshape(1, -1) for x in [a, b]]).reshape(-1, 3)  # flatten each block to a row, concatenate, reshape back to 3 columns
print(c)
Either way, the output is:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Hope I understood the question correctly
The following example illustrates my question clearly.
Suppose there is an array arr:
>>import numpy as np
>>from skimage.util.shape import view_as_blocks
>>arr=np.array([[1,2,3,4,5,6,7,8],[1,2,3,4,5,6,7,8],[9,10,11,12,13,14,15,16],[17,18,19,20,21,22,23,24]])
>>arr
array([[ 1, 2, 3, 4, 5, 6, 7, 8],
[ 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16],
[17, 18, 19, 20, 21, 22, 23, 24]])
I segmented this array into 2x2 blocks using:
>>img= view_as_blocks(arr, block_shape=(2,2))
>>img
array([[[[ 1, 2],
[ 1, 2]],
[[ 3, 4],
[ 3, 4]],
[[ 5, 6],
[ 5, 6]],
[[ 7, 8],
[ 7, 8]]],
[[[ 9, 10],
[17, 18]],
[[11, 12],
[19, 20]],
[[13, 14],
[21, 22]],
[[15, 16],
[23, 24]]]])
I have another array "cor":
>>cor
(array([0, 1, 1], dtype=int64), array([2, 1, 3], dtype=int64))
In "cor" the 1st array ([0,1,1]) gives the coordinates of rows and 2nd array ([2,1,3]) gives the coordinates of corresponding columns in sequential order.
Now my work is to access segments of img whose positional coordinates are [0,2],[1,1]and [1,3] (taken from "cor". x from 1st array and corresponding y from 2nd array) automatically by reading "cor".
In the above example
img[0,2] = [[ 5,  6],
            [ 5,  6]]
img[1,1] = [[11, 12],
            [19, 20]]
img[1,3] = [[15, 16],
            [23, 24]]
Then find the mean value of each segment separately,
i.e. the means are: img[0,2] -> 5.5, img[1,1] -> 15.5, img[1,3] -> 19.5.
Now, check whether these mean values are less than the mean value of the whole array "img".
Here, the mean value of img is 10.5, hence only the mean value of img[0,2] is less than 10.5.
Therefore, finally return the coordinate of segment img[0,2], i.e. [0,2], as output (in sequential order if more such segments exist in a bigger array).
##expected output for above example:
[0,2]
We simply need to index with cor and perform those mean computations (along last two axes) and check -
# Convert to array format
In [229]: cor = np.asarray(cor)
# Index into `img` with tuple version of `cor`, so that we get all the
# blocks in one go and then compute mean along last two axes i.e. 1,2.
# Then compare against global mean - `img.mean()` to give us a valid
# mask. Then index into columns of `cor` with it, to give us a slice of
# valid `cor`. Finally transpose, so that we get per row valid indices set.
In [254]: cor[:,img[tuple(cor)].mean((1,2))<img.mean()].T
Out[254]: array([[0, 2]])
Another way to set it up would be to split up the indices -
In [235]: r,c = cor
In [236]: v = img[r,c].mean((1,2))<img.mean() # or img[cor].mean((1,2))<img.mean()
In [237]: r[v],c[v]
Out[237]: (array([0]), array([2]))
Same as first approach, with the only difference of using split indices to index into cor and getting the final indices.
Or a compact version -
In [274]: np.asarray(cor).T[img[cor].mean((1,2))<img.mean()]
Out[274]: array([[0, 2]])
In this solution, we are directly feeding in the original tuple version of cor, the rest being the same as approach #1.
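For reference, here is a self-contained sketch of the compact approach, built from the arrays in the question (it assumes scikit-image is installed):
import numpy as np
from skimage.util.shape import view_as_blocks

arr = np.array([[1, 2, 3, 4, 5, 6, 7, 8],
                [1, 2, 3, 4, 5, 6, 7, 8],
                [9, 10, 11, 12, 13, 14, 15, 16],
                [17, 18, 19, 20, 21, 22, 23, 24]])
img = view_as_blocks(arr, block_shape=(2, 2))
cor = (np.array([0, 1, 1]), np.array([2, 1, 3]))

# keep only the block coordinates whose block mean is below the global mean
out = np.asarray(cor).T[img[cor].mean((1, 2)) < img.mean()]
print(out)   # [[0 2]]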
I want to split a NumPy array into two subarrays, where the splitting point is based on a column index, i.e., a vertical split. For instance, if I generate a NumPy array of shape [10,16] and I want to split it at the 11th column (column index 10), then I should get one subarray of shape [10,10] and another covering columns 10 through 15, i.e. of shape [10,6]. Therefore, I am following numpy.hsplit here, but it seems it only does even splits (the subarrays need to be equal). I want to be able to:
Split any NumPy array vertically, no matter what the sizes of the subarrays are.
Extract both subarrays.
To simulate my request, the following is my code:
import numpy as np
C = [[1,2,3,4],[5,6,7,8],[9,10,11,12], [13,14,15,16]]
C = np.asarray(C)
C = np.hsplit(C, 3)
print(C)
As you can see, np.hsplit(C, 3) doesn't work unless the split produces equal-sized subarrays. And even if I use np.hsplit(C, 2), I don't know how to extract both subarrays into separate NumPy arrays.
To achieve my goals, how can I modify this code?
Use array indexing.
C[:,:3] # All rows, columns 0 to 2
Out[29]:
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11],
[13, 14, 15]])
C[:,3:] # All rows, columns 3 to the end (here only column 3).
Out[30]:
array([[ 4],
[ 8],
[12],
[16]])
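To keep both pieces as separate arrays, you can simply assign the two slices (a small sketch using the same C):
left, right = C[:, :3], C[:, 3:]   # left has shape (4, 3), right has shape (4, 1)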
You need to specify the split indices as a list:
import numpy as np
C = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
C = np.asarray(C)
C = np.hsplit(C, [3])
print(C)
Output
[array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11],
[13, 14, 15]]), array([[ 4],
[ 8],
[12],
[16]])]
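Since np.hsplit returns a list, you can also unpack the two pieces directly instead of reassigning C (a small usage sketch on the original 4x4 array):
left, right = np.hsplit(C, [3])   # left: columns 0-2, shape (4, 3); right: column 3, shape (4, 1)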
I'm having a problem with np.append.
I'm trying to duplicate the last column of 20x361 matrix n_list_converted by using the code below:
n_last = []
n_last = n_list_converted[:, -1]
n_lists = np.append(n_list_converted, n_last, axis=1)
But I get the error:
ValueError: all the input arrays must have same number of dimensions
However, I've checked the matrix dimensions by doing
print(n_last.shape, type(n_last), n_list_converted.shape, type(n_list_converted))
and I get
(20L,) (20L, 361L)
so the dimensions match? Where is the mistake?
If I start with a 3x4 array, and concatenate a 3x1 array, with axis 1, I get a 3x5 array:
In [911]: x = np.arange(12).reshape(3,4)
In [912]: np.concatenate([x,x[:,-1:]], axis=1)
Out[912]:
array([[ 0, 1, 2, 3, 3],
[ 4, 5, 6, 7, 7],
[ 8, 9, 10, 11, 11]])
In [913]: x.shape,x[:,-1:].shape
Out[913]: ((3, 4), (3, 1))
Note that both inputs to concatenate have 2 dimensions.
Omit the :, and x[:,-1] has shape (3,) - it is 1d, and hence the error:
In [914]: np.concatenate([x,x[:,-1]], axis=1)
...
ValueError: all the input arrays must have same number of dimensions
The code for np.append is (in this case where axis is specified)
return concatenate((arr, values), axis=axis)
So with a slight change of syntax, append works. Instead of a list, it takes 2 arguments. It imitates the list append syntax, but should not be confused with that list method.
In [916]: np.append(x, x[:,-1:], axis=1)
Out[916]:
array([[ 0, 1, 2, 3, 3],
[ 4, 5, 6, 7, 7],
[ 8, 9, 10, 11, 11]])
np.hstack first makes sure all inputs are atleast_1d, and then does concatenate:
return np.concatenate([np.atleast_1d(a) for a in arrs], 1)
So it requires the same x[:,-1:] input. Essentially the same action.
np.column_stack also does a concatenate on axis 1. But first it passes 1d inputs through
array(arr, copy=False, subok=True, ndmin=2).T
This is a general way of turning that (3,) array into a (3,1) array.
In [922]: np.array(x[:,-1], copy=False, subok=True, ndmin=2).T
Out[922]:
array([[ 3],
[ 7],
[11]])
In [923]: np.column_stack([x,x[:,-1]])
Out[923]:
array([[ 0, 1, 2, 3, 3],
[ 4, 5, 6, 7, 7],
[ 8, 9, 10, 11, 11]])
All these 'stacks' can be convenient, but in the long run, it's important to understand dimensions and the base np.concatenate. Also know how to look up the code for functions like this. I use the ipython ?? magic a lot.
And in time tests, np.concatenate is noticeably faster - with a small array like this, the extra layers of function calls make a big time difference.
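If you want to compare on your own machine, a quick IPython sketch (timings will vary with array size and NumPy version):
x = np.arange(12).reshape(3, 4)
%timeit np.concatenate([x, x[:, -1:]], axis=1)
%timeit np.append(x, x[:, -1:], axis=1)
%timeit np.column_stack([x, x[:, -1]])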
(n,) and (n,1) are not the same shape. Try turning the vector into a column by using the [:, None] notation:
n_lists = np.append(n_list_converted, n_last[:, None], axis=1)
Alternatively, when extracting n_last you can use
n_last = n_list_converted[:, -1:]
to get a (20, 1) array.
The reason you get the error is that an array of length n (shape (n,)) is different from an "n by 1" matrix (shape (n, 1)).
I recommend using hstack() and vstack() instead.
Like this:
import numpy as np
a = np.arange(32).reshape(4,8) # 4 rows 8 columns matrix.
b = a[:,-1:] # last column of that matrix.
result = np.hstack((a,b)) # stack them horizontally like this:
#array([[ 0, 1, 2, 3, 4, 5, 6, 7, 7],
# [ 8, 9, 10, 11, 12, 13, 14, 15, 15],
# [16, 17, 18, 19, 20, 21, 22, 23, 23],
# [24, 25, 26, 27, 28, 29, 30, 31, 31]])
Notice the repeated "7, 15, 23, 31" column.
Also, notice that I used a[:,-1:] instead of a[:,-1]. My version generates a column:
array([[7],
[15],
[23],
[31]])
Instead of a row array([7,15,23,31])
Edit: append() is much slower. Read this answer.
You can also promote a 1D array of shape (n,) to a 2D array of shape (1, n) by enclosing it in brackets [ ].
e.g. Instead of np.append(b,a,axis=0) use np.append(b,[a],axis=0)
a=[1,2]
b=[[5,6],[7,8]]
np.append(b,[a],axis=0)
returns
array([[5, 6],
[7, 8],
[1, 2]])
I normally use np.row_stack((ndarray_1, ndarray_2, ..., ndarray_nth))
Assuming your ndarrays have the same number of columns, this should work for you (note that row_stack adds n_last as a new row, not as a new column):
n_last = []
n_last = n_list_converted[:, -1]
n_lists = np.row_stack((n_list_converted, n_last))