Extracting multiple sets of rows/ columns from a 2D numpy array

Extracting multiple sets of rows/ columns from a 2D numpy array - python

I have a 2D numpy array from which I want to extract multiple sets of rows/ columns.
# img is 2D array
img = np.arange(25).reshape(5,5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
I know the syntax to extract one set of row/ column. The following will extract the first 4 rows and the 3rd and 4th column as shown below
img[0:4, 2:4]
array([[ 2, 3],
[ 7, 8],
[12, 13],
[17, 18]])
However, what is the syntax if I want to extract multiple sets of rows and/or columns? I tried the following but it leads to an invalid syntax error
img[[0,2:4],2]
The output that I am looking for from the above command is
array([[ 2],
[12],
[17]])
I tried searching for this but it only leads to results for one set of rows/ columns or extracting discrete rows/ columns which I know how to do, like using np.ix.
For context, the 2D array that I am actually dealing with has the dimensions ~800X1200, and from this array I want to extract multiple ranges of rows and columns in one go. So something like img[[0:100, 120:210, 400, 500:600], [1:450, 500:550, 600, 700:950]].

IIUC, you can use numpy.r_ to generate the indices from the slice:
img[np.r_[0,2:4][:,None],2]
output:
array([[ 2],
[12],
[17]])
intermediates:
np.r_[0,2:4]
# array([0, 2, 3])
np.r_[0,2:4][:,None] # variant: np.c_[np.r_[0,2:4]]
# array([[0],
# [2],
# [3]])

You can create your slices with numpy.r_:
np.r_[0:2, 4]
# array([0,1,4])
Then you can get the specific rows and columns as follows:
rows = np.r_[0:2, 4]
cols = np.r_[0, 2:4]
img[rows][:, cols]
# array([[ 0, 2, 3],
# [ 5, 7, 8],
# [20, 22, 23]])

Related

Stats Model contingency table nd array 2 x 2 x k , cannot reshape

Consider below list of 2x2 tables and CMH(Cochran–Mantel–Haenszel) test results. We are trying to determine if each specific centre was accociated with the sucess of the treatment [Data from Agresti, Categorical Data Analysis, second edition]
tables= [
[[11, 10], [25, 27]],
[[16, 22], [4, 10]],
[[14, 7], [5, 12]],
[[2, 1], [14, 16]],
[[6, 0], [11, 12]],
[[1, 0], [10, 10]],
[[1, 1], [4, 8]],
[[4, 6], [2, 1]]]
cmh = sm.stats.contingency_tables.StratifiedTable(tables = tables)
print(cmh.test_null_odds())
pvalue ~ 0.012
statistic ~ 6.38
The tables parameters in StratifiedTable can also take a numpy array shape 2 x 2 x k, where k is a slice return each of the contingency tables.
I've been unable to wrap my head around the array reshaping, this based on the above 8, 2, 2 shape the list of lists can more intuitively offer (at least for me).
Any toughts on how to re run this same test with a nd array?
UPDATE: I've tried to reshape my tables var in numpy as suggested in comment below to a nd array 2 x 2 x k , with a transpose. The below TypeError is rasied when running the same test with
TypeError: No loop matching the specified signature and casting was found for ufunc true_divide
Note: in R the following matrix would return the desired output
data = array (c(11, 10, 25, 27, 16, 22, 4, 10,
14, 7, 5, 12, 2, 1, 14, 16,
6, 0, 11, 12, 1, 0, 10, 10,
1, 1, 4, 8, 4, 6, 2, 1),
c(2,2,8))
mantelhaen.test(data, correct=F)

Just referencing #Josef comment as the answer. I missed/ not accounted for a dtype convertion.
Your example worked for me with the transpose, .T. It looks like you have a separate problem with the dtype. Use float: tables = np.asarray(tables).T.astype(float) This was recently fixed github.com/statsmodels/statsmodels/pull/7279

Concatenate NumPy 2D array with column (1D array)

Suppose I have a 2D NumPy array values. I want to add new column to it. New column should be values[:, 19] but lagged by one sample (first element equals to zero). It could be returned as np.append([0], values[0:-2:1, 19]). I tried: Numpy concatenate 2D arrays with 1D array
temp = np.append([0], [values[1:-2:1, 19]])
values = np.append(dataset.values, temp[:, None], axis=1)
but I get:
ValueError: all the input array dimensions except for the concatenation axis
must match exactly
I tried using c_ too as:
temp = np.append([0], [values[1:-2:1, 19]])
values = np.c_[values, temp]
but effect is the same. How this concatenation could be made. I think problem is in temp orientation - it is treated as a row instead of column, so there is an issue with dimensions. In Octave ' (transpose operator) would do the trick. Maybe there is similiar solution in NumPy?
Anyway, thank you for you time.
Best regards,
Max

In [76]: values = np.arange(16).reshape(4,4)
In [77]: temp = np.concatenate(([0], values[1:,-1]))
In [78]: values
Out[78]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [79]: temp
Out[79]: array([ 0, 7, 11, 15])
This use of concatenate to make temp is similar to your use of append (which actually uses concatenate).
Sounds like you want to join values and temp in this way:
In [80]: np.concatenate((values, temp[:,None]),axis=1)
Out[80]:
array([[ 0, 1, 2, 3, 0],
[ 4, 5, 6, 7, 7],
[ 8, 9, 10, 11, 11],
[12, 13, 14, 15, 15]])
Again I prefer using concatenate directly.

You need to convert the 1D array to 2D as shown. You can then use vstack or hstack with reshaping to get the final array you want as shown:
a = np.array([[1, 2, 3],[4, 5, 6]])
b = np.array([[7, 8, 9]])
c = np.vstack([ele for ele in [a, b]])
print(c)
c = np.hstack([a.reshape(1,-1) for a in [a,b]]).reshape(-1,3)
print(c)
Either way, the output is:
[[1 2 3] [4 5 6] [7 8 9]]
Hope I understood the question correctly

Using python how to access and do arthematic operations on 'n' no.of segments in an array if their coordinates are available?

The following example illustartes my question clearly :
suppose their is an array 'arr'
>>import numpy as np
>>from skimage.util.shape import view_as_blocks
>>arr=np.array([[1,2,3,4,5,6,7,8],[1,2,3,4,5,6,7,8],[9,10,11,12,13,14,15,16],[17,18,19,20,21,22,23,24]])
>>arr
array([[ 1, 2, 3, 4, 5, 6, 7, 8],
[ 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16],
[17, 18, 19, 20, 21, 22, 23, 24]])
I segmented this array in to 2*2 blocks using :
>>img= view_as_blocks(arr, block_shape=(2,2))
>>img
array([[[[ 1, 2],
[ 1, 2]],
[[ 3, 4],
[ 3, 4]],
[[ 5, 6],
[ 5, 6]],
[[ 7, 8],
[ 7, 8]]],
[[[ 9, 10],
[17, 18]],
[[11, 12],
[19, 20]],
[[13, 14],
[21, 22]],
[[15, 16],
[23, 24]]]])
I have an other array "cor"
>>cor
(array([0, 1, 1], dtype=int64), array([2, 1, 3], dtype=int64))
In "cor" the 1st array ([0,1,1]) gives the coordinates of rows and 2nd array ([2,1,3]) gives the coordinates of corresponding columns in sequential order.
Now my work is to access segments of img whose positional coordinates are [0,2],[1,1]and [1,3] (taken from "cor". x from 1st array and corresponding y from 2nd array) automatically by reading "cor".
In the above example
img[0,2]= [[ 5, 6], img[1,1]= [[11, 12], img[1,3]=[[15, 16],
[ 5, 6]], [19, 20]] [23, 24]]
then find the mean value of each segment seperately.
ie. img[0,2]=5.5 img[1,1]=15.5 img[1,3]=19.5
Now, check if its mean values are less than the mean vlaue of whole array "img".
Here, mean value of img is 10.5. hence only mean value of img[0,2] is less than 10.5.
Therefore finally return coordinate of segment img[0,2] ie [0,2] as output in sequential order if more segments exists in any other big array.
##expected output for above example:
[0,2]

We simply need to index with cor and perform those mean computations (along last two axes) and check -
# Convert to array format
In [229]: cor = np.asarray(cor)
# Index into `img` with tuple version of `cor`, so that we get all the
# blocks in one go and then compute mean along last two axes i.e. 1,2.
# Then compare against global mean - `img.mean()` to give us a valid
# mask. Then index into columns of `cor with it, to give us a slice of
# valid `cor`. Finally transpose, so that we get per row valid indices set.
In [254]: cor[:,img[tuple(cor)].mean((1,2))<img.mean()].T
Out[254]: array([[0, 2]])
Another way to set it up, would be to split up the indices -
In [235]: r,c = cor
In [236]: v = img[r,c].mean((1,2))<img.mean() # or img[cor].mean((1,2))<img.mean()
In [237]: r[v],c[v]
Out[237]: (array([0]), array([2]))
Same as first approach, with the only difference of using split indices to index into cor and getting the final indices.
Or a compact version -
In [274]: np.asarray(cor).T[img[cor].mean((1,2))<img.mean()]
Out[274]: array([[0, 2]])
In this solution, we are directly feeding in the original tuple version of cor, rest being same as approach#1.

How to split numpy array vertically from any column index

I want to split a numpy array into two subarrays where the splitting point is based on a column id, i.e., vertical split. For instance, if I generate a numpy array of shape [10,16] and I want to create two subarrays by splitting it from the column's index 11, then I should get one subarray of size [10,10] and the other one is from [10,15]. Therefore, I am following numpy.hsplit here but it seems it only does an even split (the subarrays need to be equal). I want to be able to:
Split any numpy array vertically, no matter what is the size of subarrays.
Extract both subarrays.
To simulate my request, the following is my code:
import numpy as np
C = [[1,2,3,4],[5,6,7,8],[9,10,11,12], [13,14,15,16]]
C = np.asarray(C)
C = np.hsplit(C, 3)
print(C)
As you can see, np.hsplit(C, 3) doesn't work unless the splitting generates similar subarrays. Even if I did np.hsplit(C, 2), I don't know how to extract both subarrays into separate numpy arrays.
To achieve my goals, how can I modify this code?

Use the array indexing.
C[:,:3] # All rows , columns 0 to 2
Out[29]:
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11],
[13, 14, 15]])
C[:,3:] # All rows column 3 (to end in this case also 3).
Out[30]:
array([[ 4],
[ 8],
[12],
[16]])

You need to specify the indices as list:
import numpy as np
C = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
C = np.asarray(C)
C = np.hsplit(C, [3])
print(C)
Output
[array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11],
[13, 14, 15]]), array([[ 4],
[ 8],
[12],
[16]])]

When getting an ROI from a numpy array (opencv image) why does img[y0:y1, x0:x1] seem to use an inconsistent range of indicies?

OpenCV uses numpy arrays in order to store image data. In this question and accepted answer I was told that to access a subregion of interest in an image, I could use the form roi = img[y0:y1, x0:x1].
I am confused because when I create an numpy array in the terminal and test, I don't seem to be getting this behavior. Below I want to get the roi [[6,7], [11,12]], where y0 = index 1, y1 = index 2, and x0 = index 0, x1 = index 1.
Why then do I get what I want only with arr[1:3, 0:2]? I expected to get it with arr[1:2, 0:1].
It seems that when I slice an n-by-n ndarray[a:b, c:d], a and c are the expected range of indicies 0..n-1, but b and d are indicies ranging 1..n.

In your posted example numpy and cv2 are working as expected. Indexing or Slicing in numpy, just as in python in general, is 0 based and of the form [a, b), i.e. not including b.
Recreate your example:
>>> import numpy as np
>>> arr = np.arange(1,26).reshape(5,5)
>>> arr
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
So the statement arr[1:2, 0:1] means get the value(s) at row=1 (row 1 up to but not including 2) and column=0 (we expect 6):
>>> arr[1:2, 0:1]
array([[6]])
Similarly for arr[1:3, 0:2] we expect rows 1,2 and columns 0,1:
>>> arr[1:3, 0:2]
array([[ 6, 7],
[11, 12]])
So if what you want is the region [[a, b], [c, d]] to include b and d, what you really need is:
[[a, b+1], [c, d+1]]
Further examples:
Suppose you need all columns but just rows 0 and 1:
>>> arr[:2, :]
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]])
Here arr[:2, :] means all rows up to, but not including 2, followed by all columns :.
Suppose you want every other column, starting at column index 0 (and all rows):
>>> arr[:, ::2]
array([[ 1, 3, 5],
[ 6, 8, 10],
[11, 13, 15],
[16, 18, 20],
[21, 23, 25]])
where ::2 follows the start:stop:step notation (where stop is not inclusive).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Extracting multiple sets of rows/ columns from a 2D numpy array - python

IIUC, you can use numpy.r_ to generate the indices from the slice: img[np.r_[0,2:4][:,None],2] output: array([[ 2], [12], [17]]) intermediates: np.r_[0,2:4] # array([0, 2, 3]) np.r_[0,2:4][:,None] # variant: np.c_[np.r_[0,2:4]] # array([[0], # [2], # [3]])

You can create your slices with numpy.r_: np.r_[0:2, 4] # array([0,1,4]) Then you can get the specific rows and columns as follows: rows = np.r_[0:2, 4] cols = np.r_[0, 2:4] img[rows][:, cols] # array([[ 0, 2, 3], # [ 5, 7, 8], # [20, 22, 23]])

Related

Stats Model contingency table nd array 2 x 2 x k , cannot reshape

Concatenate NumPy 2D array with column (1D array)

Using python how to access and do arthematic operations on 'n' no.of segments in an array if their coordinates are available?

How to split numpy array vertically from any column index

When getting an ROI from a numpy array (opencv image) why does img[y0:y1, x0:x1] seem to use an inconsistent range of indicies?

Categories

Resources