Slicing multi-dimensional array with another array - python

Edited with a clearer example, and included the solution.
I'd like to slice an arbitrary-dimensional array, pinning the first n dimensions and keeping the remaining dimensions. In addition, I'd like to be able to store the n pinned indices in a variable. For example:
import numpy as np

Q = np.arange(24).reshape(2, 3, 4)  # array to be sliced
# array([[[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]],
#        [[12, 13, 14, 15],
#         [16, 17, 18, 19],
#         [20, 21, 22, 23]]])

Q[0, 1, ...]  # this is what I want, done manually
# array([4, 5, 6, 7])

# but programmatically:
s = np.array([0, 1])
Q[s, ...]  # this doesn't do what I want: it uses both s[0] and s[1] along the 0th dimension of Q
# array([[[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]],
#        [[12, 13, 14, 15],
#         [16, 17, 18, 19],
#         [20, 21, 22, 23]]])

np.take(Q, s)  # this unravels the indices and takes the s[i]th elements of the flattened Q
# array([0, 1])

Q[tuple(s)]  # this works! Thank you kwin
# array([4, 5, 6, 7])
Is there a clean way to do this?

You could do this:
Q[tuple(s)]
This yields array([4, 5, 6, 7]). (Note that np.take(Q, s) is not equivalent: with no axis argument it flattens Q and takes the elements at flat positions s, giving array([0, 1]), exactly as you observed.)
I'm afraid I don't have a great intuition for exactly why the tuple version of s works differently from indexing with s itself. The other thing I intuitively tried is Q[*s], but that's a syntax error (at least before Python 3.11, where PEP 646 made starred expressions inside subscripts legal).
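For the record, the rule behind this (standard NumPy behavior, not from the original answer): a tuple index means one index per axis, while a list or array triggers advanced (fancy) indexing along the first axis. A minimal sketch:

import numpy as np

Q = np.arange(24).reshape(2, 3, 4)
s = np.array([0, 1])

# A tuple supplies one index per axis: Q[(0, 1)] is the same as Q[0, 1]
print(Q[tuple(s)])       # [4 5 6 7]

# A list/array is advanced indexing along axis 0: it selects
# Q[0] and Q[1] and stacks them
print(Q[list(s)].shape)  # (2, 3, 4)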

I am not sure what output you want, but there are several things you can do.
If you want Q[0] and Q[1] stacked along a new first axis (which, for this Q and s, reproduces the whole array):
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],
       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
then Q[list(s)] should work. np.array([Q[i] for i in s]) also works.
If you want the output to be:
array([4, 5, 6, 7])
then, as @kwinkunks mentioned, use Q[tuple(s)].

Related

Statsmodels contingency table nd array 2 x 2 x k, cannot reshape

Consider the below list of 2x2 tables and CMH (Cochran–Mantel–Haenszel) test results. We are trying to determine whether each specific centre was associated with the success of the treatment [data from Agresti, Categorical Data Analysis, second edition]:
tables= [
[[11, 10], [25, 27]],
[[16, 22], [4, 10]],
[[14, 7], [5, 12]],
[[2, 1], [14, 16]],
[[6, 0], [11, 12]],
[[1, 0], [10, 10]],
[[1, 1], [4, 8]],
[[4, 6], [2, 1]]]
cmh = sm.stats.contingency_tables.StratifiedTable(tables = tables)
print(cmh.test_null_odds())
pvalue ~ 0.012
statistic ~ 6.38
The tables parameter of StratifiedTable can also take a numpy array of shape 2 x 2 x k, where each slice along k is one of the contingency tables.
I've been unable to wrap my head around the reshaping, given that the list of lists above more intuitively has shape 8 x 2 x 2 (at least for me).
Any thoughts on how to re-run this same test with an nd array?
UPDATE: As suggested in the comment below, I've tried to reshape my tables variable to a 2 x 2 x k nd array with a transpose. The following TypeError is raised when running the same test:
TypeError: No loop matching the specified signature and casting was found for ufunc true_divide
Note: in R, the following array returns the desired output:
data = array(c(11, 10, 25, 27, 16, 22, 4, 10,
               14, 7, 5, 12, 2, 1, 14, 16,
               6, 0, 11, 12, 1, 0, 10, 10,
               1, 1, 4, 8, 4, 6, 2, 1),
             c(2, 2, 8))
mantelhaen.test(data, correct=F)
Just referencing @Josef's comment as the answer; I had missed / not accounted for a dtype conversion:
Your example worked for me with the transpose, .T. It looks like you have a separate problem with the dtype. Use float: tables = np.asarray(tables).T.astype(float). This was recently fixed: github.com/statsmodels/statsmodels/pull/7279
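Putting the comment's fix together as a runnable sketch (assuming the conventional import statsmodels.api as sm; the expected values are the ones quoted in the question):

import numpy as np
import statsmodels.api as sm

tables = [
    [[11, 10], [25, 27]],
    [[16, 22], [4, 10]],
    [[14, 7], [5, 12]],
    [[2, 1], [14, 16]],
    [[6, 0], [11, 12]],
    [[1, 0], [10, 10]],
    [[1, 1], [4, 8]],
    [[4, 6], [2, 1]]]

# Transpose to the 2 x 2 x k layout StratifiedTable accepts, and cast to
# float to avoid the integer true_divide TypeError.
data = np.asarray(tables).T.astype(float)

cmh = sm.stats.contingency_tables.StratifiedTable(tables=data)
print(cmh.test_null_odds())  # pvalue ~ 0.012, statistic ~ 6.38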

Using Python, how to access and do arithmetic operations on n segments of an array if their coordinates are available?

The following example illustrates my question clearly. Suppose there is an array arr:
>>> import numpy as np
>>> from skimage.util.shape import view_as_blocks
>>> arr = np.array([[1,2,3,4,5,6,7,8],[1,2,3,4,5,6,7,8],[9,10,11,12,13,14,15,16],[17,18,19,20,21,22,23,24]])
>>> arr
array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16],
       [17, 18, 19, 20, 21, 22, 23, 24]])
I segmented this array into 2x2 blocks using:
>>> img = view_as_blocks(arr, block_shape=(2,2))
>>> img
array([[[[ 1,  2],
         [ 1,  2]],
        [[ 3,  4],
         [ 3,  4]],
        [[ 5,  6],
         [ 5,  6]],
        [[ 7,  8],
         [ 7,  8]]],
       [[[ 9, 10],
         [17, 18]],
        [[11, 12],
         [19, 20]],
        [[13, 14],
         [21, 22]],
        [[15, 16],
         [23, 24]]]])
I have another array, cor:
>>> cor
(array([0, 1, 1], dtype=int64), array([2, 1, 3], dtype=int64))
In cor, the 1st array ([0, 1, 1]) gives the row coordinates and the 2nd array ([2, 1, 3]) gives the corresponding column coordinates, in sequential order.
Now my task is to access the segments of img whose positional coordinates are [0,2], [1,1] and [1,3] (taken from cor: x from the 1st array and the corresponding y from the 2nd array), automatically, by reading cor.
In the above example:
img[0,2] = [[ 5,  6],    img[1,1] = [[11, 12],    img[1,3] = [[15, 16],
            [ 5,  6]]                [19, 20]]               [23, 24]]
Then find the mean value of each segment separately, i.e.:
mean of img[0,2] = 5.5, mean of img[1,1] = 15.5, mean of img[1,3] = 19.5
Now, check whether each segment's mean value is less than the mean value of the whole array img.
Here, the mean value of img is 10.5, hence only the mean value of img[0,2] is less than 10.5.
Therefore, finally return the coordinates of segment img[0,2], i.e. [0,2], as output (in sequential order if more such segments exist in another, bigger array).
Expected output for the above example:
[0, 2]
We simply need to index with cor, perform those mean computations (along the last two axes) and compare:
# Convert to array format
In [229]: cor = np.asarray(cor)

# Index into `img` with the tuple version of `cor`, so that we get all the
# blocks in one go, then compute the mean along the last two axes, i.e. 1, 2.
# Compare against the global mean, `img.mean()`, to get a valid mask.
# Then index into the columns of `cor` with it, to get a slice of valid
# `cor`. Finally transpose, so that we get a set of valid indices per row.
In [254]: cor[:, img[tuple(cor)].mean((1,2)) < img.mean()].T
Out[254]: array([[0, 2]])
Another way to set it up would be to split up the indices:
In [235]: r, c = cor
In [236]: v = img[r, c].mean((1,2)) < img.mean()  # or img[cor].mean((1,2)) < img.mean() with the original tuple cor
In [237]: r[v], c[v]
Out[237]: (array([0]), array([2]))
This is the same as the first approach, the only difference being that we use the split indices to index into cor and get the final indices.
Or a compact version:
In [274]: np.asarray(cor).T[img[cor].mean((1,2)) < img.mean()]
Out[274]: array([[0, 2]])
In this solution, we feed in the original tuple version of cor directly; the rest is the same as approach #1.
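For reference, here is the compact version wrapped into one self-contained script using the question's data (a sketch; output per the expected result above):

import numpy as np
from skimage.util.shape import view_as_blocks

arr = np.array([[ 1,  2,  3,  4,  5,  6,  7,  8],
                [ 1,  2,  3,  4,  5,  6,  7,  8],
                [ 9, 10, 11, 12, 13, 14, 15, 16],
                [17, 18, 19, 20, 21, 22, 23, 24]])
img = view_as_blocks(arr, block_shape=(2, 2))
cor = (np.array([0, 1, 1]), np.array([2, 1, 3]))

# Mean of each selected block (axes 1 and 2 are the within-block axes),
# compared against the global mean of the whole array
mask = img[cor].mean(axis=(1, 2)) < img.mean()
print(np.asarray(cor).T[mask])  # [[0 2]]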

Connecting two numpy array channels

I have two numpy arrays, for example:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[11, 12, 13], [14, 15, 16], [17, 18, 19]])
These are channels of the same image. I would like to get the "connected" channels array in as pythonic a way as possible. Wanted outcome:
c = [[[1,11],[2,12],[3,13]],
[[4,14],[5,15],[6,16]],
[[7,17],[8,18],[9,19]]]
What I've tried: I created an array of the same size and looped over both source arrays to connect them.
c = [[None] * len(a[0]) for _ in range(len(a))]  # pre-created container, as described
for x in range(len(a)):
    for y in range(len(a[x])):
        c[x][y] = [a[x][y], b[x][y]]
What I need: I would love to find a more efficient, modular and pythonic way of implementing this.
You can use np.stack on the second axis:
>>> np.stack((a, b), axis=2)
array([[[ 1, 11],
        [ 2, 12],
        [ 3, 13]],
       [[ 4, 14],
        [ 5, 15],
        [ 6, 16]],
       [[ 7, 17],
        [ 8, 18],
        [ 9, 19]]])
Checking that it's the same as your c array:
>>> c = np.array([[[1, 11], [2, 12], [3, 13]],
...               [[4, 14], [5, 15], [6, 16]],
...               [[7, 17], [8, 18], [9, 19]]])
>>> (c == np.stack((a, b), axis=2)).all()
True
This is dstack. You mention this is an image, and from the docs:
This is a simple way to stack 2D arrays (images) into a single 3D array for processing.
>>> np.dstack((a, b))
array([[[ 1, 11],
        [ 2, 12],
        [ 3, 13]],
       [[ 4, 14],
        [ 5, 15],
        [ 6, 16]],
       [[ 7, 17],
        [ 8, 18],
        [ 9, 19]]])
Minor note: the docs also state that concatenate and stack should be preferred, as they are more general.
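As a small sketch of that generality (my addition, not from the original answer): np.stack((a, b), axis=-1) produces the same result here and works for inputs of any dimensionality, since it always inserts a new axis at the requested position:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[11, 12, 13], [14, 15, 16], [17, 18, 19]])

# dstack stacks along a third axis; stack(axis=-1) adds a new last axis
print(np.array_equal(np.dstack((a, b)), np.stack((a, b), axis=-1)))  # True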

Find cumsum of subarrays split by indices for numpy array efficiently

Given an array 'array' and a set of indices 'indices', how do I find the cumulative sum of the sub-arrays formed by splitting the array along those indices in a vectorized manner?
To clarify, suppose I have:
>>> array = np.arange(20)
>>> array
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> indices = np.array([3, 8, 14])
The operation should output:
array([ 0,  1,  3,  3,  7, 12, 18, 25,  8, 17, 27, 38, 50, 63, 14, 29, 45, 62, 80, 99])
Please note that the array is very big (100,000 elements), so I need a vectorized answer; using any loops would slow it down considerably.
Also, if I had the same problem, but a 2D array and corresponding indices, and I need to do the same thing for each row in the array, how would I do it?
For the 2D version:
>>> array = np.arange(12).reshape((3,4))
>>> array
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> indices = np.array([[2], [1, 3], [1, 2]], dtype=object)
The output would be:
array([[ 0,  1,  3,  3],
       [ 4,  9,  6, 13],
       [ 8, 17, 10, 11]])
To clarify: Every row will be split.
You can differentiate the cumulatively summed array at the indices positions to create a boundary-like effect at those places, such that when the differentiated array is cumulatively summed again, we get the indices-stopped cumulatively summed output. This might feel a bit contrived at first look, but stick with it, try it on other samples, and hopefully it will make sense! The idea is very similar to the one applied in this other MATLAB solution. Following that philosophy, here's one approach using numpy.diff along with cumulative summation -
# Get linear indices
n = array.shape[1]
lidx = np.hstack([i*n + np.array(item) for i, item in enumerate(indices)])

# Get successive differentiations
diffs = array.cumsum(1).ravel()[lidx] - array.ravel()[lidx]

# Get the previous group's offsetted summations for each row at all
# indices positions across the entire 2D array
# (note: floor division, so this also works under Python 3)
_, idx = np.unique(lidx // n, return_index=True)
offsetted_diffs = np.diff(np.append(0, diffs))
offsetted_diffs[idx] = diffs[idx]

# Get a copy of the input array and subtract the previous group's offsetted
# summations at the indices. Then do a cumulative sum, which creates a
# boundary-like effect with those offsets at the indices positions.
arrayc = array.copy()
arrayc.ravel()[lidx] -= offsetted_diffs
out = arrayc.cumsum(1)
This is an almost vectorized solution: almost, because even though we calculate the linear indices in a loop, that is not the computationally intensive part, so its effect on the total runtime should be minimal. Also, you can replace arrayc with array if you don't mind mutating the input, to save on memory.
Sample input, output -
In [75]: array
Out[75]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23]])

In [76]: indices
Out[76]: array([[3, 6], [4, 7], [5]], dtype=object)

In [77]: out
Out[77]:
array([[ 0,  1,  3,  3,  7, 12,  6, 13],
       [ 8, 17, 27, 38, 12, 25, 39, 15],
       [16, 33, 51, 70, 90, 21, 43, 66]])
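For the 1D case in the question, the same subtract-the-running-offset idea can be written very compactly (my own sketch, not part of the original answer): take one global cumulative sum, then subtract from each element the total accumulated before its segment started.

import numpy as np

array = np.arange(20)
indices = np.array([3, 8, 14])

csum = array.cumsum()
# cumulative total just before each split point; 0 for the first segment
offsets = np.concatenate(([0], csum[indices - 1]))
# length of each segment, so each offset can be repeated over its segment
lengths = np.diff(np.concatenate(([0], indices, [array.size])))
out = csum - np.repeat(offsets, lengths)
print(out)
# [ 0  1  3  3  7 12 18 25  8 17 27 38 50 63 14 29 45 62 80 99]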
You can use np.split to split your array along the indices, then apply np.cumsum to the sub-arrays with the Python built-in map, and finally use np.hstack to merge the result back into one array:
>>> np.hstack(map(np.cumsum, np.split(array, indices)))
array([ 0,  1,  3,  3,  7, 12, 18, 25,  8, 17, 27, 38, 50, 63, 14, 29, 45,
       62, 80, 99])
(Under Python 3, map returns an iterator and recent NumPy versions want a sequence here, so wrap it: np.hstack(list(map(np.cumsum, np.split(array, indices)))).)
Note that since map is a built-in function implemented in C inside the Python interpreter, it performs better than a regular loop.
Here is an alternative for 2D arrays:
>>> def func(array, indices):
...     return np.hstack(map(np.cumsum, np.split(array, indices)))
...
>>> array
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> indices
array([[2], [1, 3], [1, 2]], dtype=object)
>>> np.array([func(arr, ind) for arr, ind in zip(array, indices)])
array([[ 0,  1,  2,  5],
       [ 4,  5, 11,  7],
       [ 8,  9, 10, 21]])
(zip(array, indices) pairs each row with its split indices; the original answer built an object array with np.array((array, indices)).T, which newer NumPy versions reject for ragged input.)
Note that your expected output is not based on the way np.split works.
If you want such results, you need to add 1 to your indices:
>>> indices = np.array([[3], [2, 4], [2, 3]], dtype=object)
>>> np.array([func(arr, ind) for arr, ind in zip(array, indices)])
array([[ 0,  1,  3,  3],
       [ 4,  9,  6, 13],
       [ 8, 17, 10, 11]])
In response to a comment saying there is no performance difference between using a generator expression and the map function, I ran a benchmark which demonstrates the difference:
# Use map
~$ python -m timeit --setup "import numpy as np;array = np.arange(20);indices = np.array([3, 8, 14])" "np.hstack(map(np.cumsum,np.split(array,indices)))"
10000 loops, best of 3: 72.1 usec per loop
# Use generator expression
~$ python -m timeit --setup "import numpy as np;array = np.arange(20);indices = np.array([3, 8, 14])" "np.hstack(np.cumsum(a) for a in np.split(array,indices))"
10000 loops, best of 3: 81.2 usec per loop
Note that this doesn't mean that using map, which loops at C speed, makes the whole code run at C speed: the function passed as the first argument is still called once per item of the iterable, and those calls take time.

When getting an ROI from a numpy array (opencv image) why does img[y0:y1, x0:x1] seem to use an inconsistent range of indices?

OpenCV uses numpy arrays to store image data. In this question and accepted answer I was told that to access a subregion of interest in an image, I could use the form roi = img[y0:y1, x0:x1].
I am confused, because when I create a numpy array in the terminal and test, I don't seem to get this behavior. Below, I want to get the ROI [[6, 7], [11, 12]], where y0 = index 1, y1 = index 2, x0 = index 0, and x1 = index 1.
Why then do I get what I want only with arr[1:3, 0:2]? I expected to get it with arr[1:2, 0:1].
It seems that when I slice an n-by-n ndarray as ndarray[a:b, c:d], a and c lie in the expected index range 0..n-1, but b and d range over 1..n.
In your posted example, numpy and cv2 are working as expected. Indexing or slicing in numpy, just as in Python in general, is 0-based and of the form [a, b), i.e. not including b.
Recreate your example:
>>> import numpy as np
>>> arr = np.arange(1,26).reshape(5,5)
>>> arr
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])
So the statement arr[1:2, 0:1] means get the value(s) at row=1 (row 1 up to but not including 2) and column=0 (we expect 6):
>>> arr[1:2, 0:1]
array([[6]])
Similarly for arr[1:3, 0:2] we expect rows 1,2 and columns 0,1:
>>> arr[1:3, 0:2]
array([[ 6,  7],
       [11, 12]])
So if what you want is the region [[a, b], [c, d]] to include b and d, what you really need is:
[[a, b+1], [c, d+1]]
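Applied to the example above (a small sketch): the ROI with inclusive corners (y0, x0) = (1, 0) and (y1, x1) = (2, 1) is obtained by adding 1 to the end indices:

import numpy as np

arr = np.arange(1, 26).reshape(5, 5)
y0, y1 = 1, 2  # inclusive row range
x0, x1 = 0, 1  # inclusive column range

roi = arr[y0:y1 + 1, x0:x1 + 1]
print(roi)
# [[ 6  7]
#  [11 12]]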
Further examples:
Suppose you need all columns but just rows 0 and 1:
>>> arr[:2, :]
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])
Here arr[:2, :] means all rows up to, but not including 2, followed by all columns :.
Suppose you want every other column, starting at column index 0 (and all rows):
>>> arr[:, ::2]
array([[ 1,  3,  5],
       [ 6,  8, 10],
       [11, 13, 15],
       [16, 18, 20],
       [21, 23, 25]])
where ::2 follows the start:stop:step notation (where stop is not inclusive).
