Related
a = np.array(list(range(16).reshape((4,4))
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
Say I want the middle square. It'd seem reasonable to do this:
a[[1,2],[1,2]]
but I get this:
array([5, 10])
This works, but seems inelegant:
a[[1,2],:][:,[1,2]]
array([[5, 6],
[9, 10]])
So my questions are:
Why is it this way? What premises are required to make the implemented way sensible?
Is there a canonical way to select along more than one index at once?
I think you can read more details on advanced indexing. Basically, when you slice the array by lists/arrays, the arrays will be broadcast and iterate together.
In your case, you can do:
idx = np.array([1,3])
a[idx,idx[:,None]]
Or as in the doc above:
a[np.ix_(idx, idx)]
Output:
array([[ 5, 13],
[ 7, 15]])
You can do both slicing operations at once instead of creating a view and indexing that again:
import numpy as np
a = np.arange(16).reshape((4, 4))
# preferred if possible
print(a[1:3, 1:3])
# [[ 5 6]
# [ 9 10]]
# otherwise add a second dimension to the first index to make it broadcastable
index1 = np.asarray([1, 2])
index2 = np.asarray([1, 2])
print(a[index1[:, None], index2])
# [[ 5 6]
# [ 9 10]]
You could use multiple np.take to select indices from multiple axes
a = np.arange(16).reshape((4, 4))
idx = np.array([1,2])
np.take(np.take(a, idx, axis=1), idx, axis=0)
Or (slightly more readable)
a.take(idx, axis=1).take(idx, axis=0)
Output:
array([[ 5, 6],
[ 9, 10]])
np.take also allows you to conveniently wrap around out-of-bound indices and such.
I've been trying to look up how np.diag_indices work, and for examples of them, however the documentation for it is a bit light. I know this creates a diagonal array through your matrix, however I want to change the diagonal array (I was thinking of using a loop to change its dimensions or something along those lines).
I.E.
say we have a 3x2 matrix:
[[1 2]
[3 4]
[5 6]]
Now if I use np.diag_indices it will form a diagonal array starting at (0,0) and goes through (1,1).
[1 4]
However, I'd like this diagonal array to then shift one down. So now it starts at (0,1) and goes through (1,2).
[3 6]
However there are only 2 arguments for np.diag_indices, neither of which from the looks of it enable me to do this. Am I using the wrong tool to try and achieve this? If so, what tools can I use to create a changing diagonal array that goes through my matrix? (I'm looking for something that will also work on larger matrices like a 200x50).
The code for diag_indices is simple, so simple that I've never used it:
idx = arange(n)
return (idx,) * ndim
In [68]: np.diag_indices(4,2)
Out[68]: (array([0, 1, 2, 3]), array([0, 1, 2, 3]))
It just returns a tuple of arrays, the arange repeated n times. It's useful for indexing the main diagonal of a square matrix, e.g.
In [69]: arr = np.arange(16).reshape(4,4)
In [70]: arr
Out[70]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [71]: arr[np.diag_indices(4,2)]
Out[71]: array([ 0, 5, 10, 15])
The application is straight forward indexing with two arrays that match in shape.
It works on other shapes - if they are big enogh.
np.diag applied to the same array does the same thing:
In [72]: np.diag(arr)
Out[72]: array([ 0, 5, 10, 15])
but it also allows for offset:
In [73]: np.diag(arr, 1)
Out[73]: array([ 1, 6, 11])
===
Indexing with diag_indices does allow us to change that diagonal:
In [78]: arr[np.diag_indices(4,2)] += 10
In [79]: arr
Out[79]:
array([[10, 1, 2, 3],
[ 4, 15, 6, 7],
[ 8, 9, 20, 11],
[12, 13, 14, 25]])
====
But we don't have to use diag_indices to generate the desired indexing arrays:
In [80]: arr = np.arange(1,7).reshape(3,2)
In [81]: arr
Out[81]:
array([[1, 2],
[3, 4],
[5, 6]])
selecting values from 1st 2 rows, and columns:
In [82]: arr[np.arange(2), np.arange(2)]
Out[82]: array([1, 4])
In [83]: arr[np.arange(2), np.arange(2)] += 10
In [84]: arr
Out[84]:
array([[11, 2],
[ 3, 14],
[ 5, 6]])
and for a difference selection of rows:
In [85]: arr[np.arange(1,3), np.arange(2)] += 20
In [86]: arr
Out[86]:
array([[11, 2],
[23, 14],
[ 5, 26]])
The relevant documentation section on advanced indexing with integer arrays: https://numpy.org/doc/stable/reference/arrays.indexing.html#purely-integer-array-indexing
Suppose I have a 2D NumPy array values. I want to add new column to it. New column should be values[:, 19] but lagged by one sample (first element equals to zero). It could be returned as np.append([0], values[0:-2:1, 19]). I tried: Numpy concatenate 2D arrays with 1D array
temp = np.append([0], [values[1:-2:1, 19]])
values = np.append(dataset.values, temp[:, None], axis=1)
but I get:
ValueError: all the input array dimensions except for the concatenation axis
must match exactly
I tried using c_ too as:
temp = np.append([0], [values[1:-2:1, 19]])
values = np.c_[values, temp]
but effect is the same. How this concatenation could be made. I think problem is in temp orientation - it is treated as a row instead of column, so there is an issue with dimensions. In Octave ' (transpose operator) would do the trick. Maybe there is similiar solution in NumPy?
Anyway, thank you for you time.
Best regards,
Max
In [76]: values = np.arange(16).reshape(4,4)
In [77]: temp = np.concatenate(([0], values[1:,-1]))
In [78]: values
Out[78]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [79]: temp
Out[79]: array([ 0, 7, 11, 15])
This use of concatenate to make temp is similar to your use of append (which actually uses concatenate).
Sounds like you want to join values and temp in this way:
In [80]: np.concatenate((values, temp[:,None]),axis=1)
Out[80]:
array([[ 0, 1, 2, 3, 0],
[ 4, 5, 6, 7, 7],
[ 8, 9, 10, 11, 11],
[12, 13, 14, 15, 15]])
Again I prefer using concatenate directly.
You need to convert the 1D array to 2D as shown. You can then use vstack or hstack with reshaping to get the final array you want as shown:
a = np.array([[1, 2, 3],[4, 5, 6]])
b = np.array([[7, 8, 9]])
c = np.vstack([ele for ele in [a, b]])
print(c)
c = np.hstack([a.reshape(1,-1) for a in [a,b]]).reshape(-1,3)
print(c)
Either way, the output is:
[[1 2 3] [4 5 6] [7 8 9]]
Hope I understood the question correctly
The following example illustartes my question clearly :
suppose their is an array 'arr'
>>import numpy as np
>>from skimage.util.shape import view_as_blocks
>>arr=np.array([[1,2,3,4,5,6,7,8],[1,2,3,4,5,6,7,8],[9,10,11,12,13,14,15,16],[17,18,19,20,21,22,23,24]])
>>arr
array([[ 1, 2, 3, 4, 5, 6, 7, 8],
[ 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16],
[17, 18, 19, 20, 21, 22, 23, 24]])
I segmented this array in to 2*2 blocks using :
>>img= view_as_blocks(arr, block_shape=(2,2))
>>img
array([[[[ 1, 2],
[ 1, 2]],
[[ 3, 4],
[ 3, 4]],
[[ 5, 6],
[ 5, 6]],
[[ 7, 8],
[ 7, 8]]],
[[[ 9, 10],
[17, 18]],
[[11, 12],
[19, 20]],
[[13, 14],
[21, 22]],
[[15, 16],
[23, 24]]]])
I have an other array "cor"
>>cor
(array([0, 1, 1], dtype=int64), array([2, 1, 3], dtype=int64))
In "cor" the 1st array ([0,1,1]) gives the coordinates of rows and 2nd array ([2,1,3]) gives the coordinates of corresponding columns in sequential order.
Now my work is to access segments of img whose positional coordinates are [0,2],[1,1]and [1,3] (taken from "cor". x from 1st array and corresponding y from 2nd array) automatically by reading "cor".
In the above example
img[0,2]= [[ 5, 6], img[1,1]= [[11, 12], img[1,3]=[[15, 16],
[ 5, 6]], [19, 20]] [23, 24]]
then find the mean value of each segment seperately.
ie. img[0,2]=5.5 img[1,1]=15.5 img[1,3]=19.5
Now, check if its mean values are less than the mean vlaue of whole array "img".
Here, mean value of img is 10.5. hence only mean value of img[0,2] is less than 10.5.
Therefore finally return coordinate of segment img[0,2] ie [0,2] as output in sequential order if more segments exists in any other big array.
##expected output for above example:
[0,2]
We simply need to index with cor and perform those mean computations (along last two axes) and check -
# Convert to array format
In [229]: cor = np.asarray(cor)
# Index into `img` with tuple version of `cor`, so that we get all the
# blocks in one go and then compute mean along last two axes i.e. 1,2.
# Then compare against global mean - `img.mean()` to give us a valid
# mask. Then index into columns of `cor with it, to give us a slice of
# valid `cor`. Finally transpose, so that we get per row valid indices set.
In [254]: cor[:,img[tuple(cor)].mean((1,2))<img.mean()].T
Out[254]: array([[0, 2]])
Another way to set it up, would be to split up the indices -
In [235]: r,c = cor
In [236]: v = img[r,c].mean((1,2))<img.mean() # or img[cor].mean((1,2))<img.mean()
In [237]: r[v],c[v]
Out[237]: (array([0]), array([2]))
Same as first approach, with the only difference of using split indices to index into cor and getting the final indices.
Or a compact version -
In [274]: np.asarray(cor).T[img[cor].mean((1,2))<img.mean()]
Out[274]: array([[0, 2]])
In this solution, we are directly feeding in the original tuple version of cor, rest being same as approach#1.
This is an indirect indexing problem.
It can be solved with a list comprehension.
The question is whether, or, how to solve it within numpy,
When
data.shape is (T,N)
and
c.shape is (T,K)
and each element of c is an int between 0 and N-1 inclusive, that is,
each element of c is intended to refer to a column number from data.
The goal is to obtain out where
out.shape = (T,K)
And for each i in 0..(T-1)
the row out[i] = [ data[i, c[i,0]] , ... , data[i, c[i,K-1]] ]
Concrete example:
data = np.array([\
[ 0, 1, 2],\
[ 3, 4, 5],\
[ 6, 7, 8],\
[ 9, 10, 11],\
[12, 13, 14]])
c = np.array([
[0, 2],\
[1, 2],\
[0, 0],\
[1, 1],\
[2, 2]])
out should be out = [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]]
The first row of out is [0,2] because the columns chosen are given by c's row 0, they are 0 and 2, and data[0] at columns 0 and 2 are 0 and 2.
The second row of out is [4,5] because the columns chosen are given by c's row 1, they are 1 and 2, and data[1] at columns 1 and 2 is 4 and 5.
Numpy fancy indexing doesn't seem to solve this in an obvious way because indexing data with c (e.g. data[c], np.take(data,c,axis=1) ) always produces a 3 dimensional array.
A list comprehension can solve it:
out = [ [data[rowidx,i1],data[rowidx,i2]] for (rowidx, (i1,i2)) in enumerate(c) ]
if K is 2 I suppose this is marginally OK. If K is variable, this is not so good.
The list comprehension has to be rewritten for each value K, because it unrolls the columns picked out of data by each row of c. It also violates DRY.
Is there a solution based entirely in numpy?
You can avoid loops with np.choose:
In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
data = np.array([\
[ 0, 1, 2],\
[ 3, 4, 5],\
[ 6, 7, 8],\
[ 9, 10, 11],\
[12, 13, 14]])
c = np.array([
[0, 2],\
[1, 2],\
[0, 0],\
[1, 1],\
[2, 2]])
--
In [2]: np.choose(c, data.T[:,:,np.newaxis])
Out[2]:
array([[ 0, 2],
[ 4, 5],
[ 6, 6],
[10, 10],
[14, 14]])
Here's one possible route to a general solution...
Create masks for data to select the values for each column of out. For example, the first mask could be achieved by writing:
>>> np.arange(3) == np.vstack(c[:,0])
array([[ True, False, False],
[False, True, False],
[ True, False, False],
[False, True, False],
[False, False, True]], dtype=bool)
>>> data[_]
array([ 2, 5, 6, 10, 14])
The mask to get the values for the second column of out: np.arange(3) == np.vstack(c[:,1]).
So, to get the out array...
>>> mask0 = np.arange(3) == np.vstack(c[:,0])
>>> mask1 = np.arange(3) == np.vstack(c[:,1])
>>> np.vstack((data[mask0], data[mask1])).T
array([[ 0, 2],
[ 4, 5],
[ 6, 6],
[10, 10],
[14, 14]])
Edit: Given arbitrary array widths K and N you could use a loop to create the masks, so the general construction of the out array might simply look like this:
np.vstack([data[np.arange(N) == np.vstack(c[:,i])] for i in range(K)]).T
Edit 2: A slightly neater solution (though still relying on a loop) is:
np.vstack([data[i][c[i]] for i in range(T)])