Finding indices of runs of integers in an array - python

I have a numpy array with integers as entries. I need to find the indices corresponding to runs of consecutive integers in the array. For example, if my array is a = [1, 2, 3, 6, 7, 8], then the indices I want are [[0, 1, 2], [3, 4, 5]]. I have tried the following piece of code, which uses a loop.
import numpy as np

idx = np.array([0, 1, 2, 3, 8, 9, 10, 11, 12, 2, 3, 4, 5, 6, 55, 56, 89, 90, 91, 92])
idx_cpy = idx.copy()  # plain assignment would only alias idx, which the loop consumes
indices = []
while len(idx) > 0:
    # longest candidate run of consecutive integers starting at idx[0]
    sorted_idx = np.arange(idx[0], idx.max() + 1)[:len(idx)]
    bool_equal = np.equal(sorted_idx, idx)
    true_idx = np.argwhere(bool_equal)[:, 0]
    try:
        # offset the run by the largest index recorded so far
        indices.append(np.array(indices[-1]).max() + 1 + true_idx)
    except IndexError:
        indices.append(true_idx)
    idx = idx[true_idx.max() + 1:]
"""
indices =
[array([0, 1, 2, 3], dtype=int64),
array([4, 5, 6, 7, 8], dtype=int64),
array([ 9, 10, 11, 12, 13], dtype=int64),
array([14, 15], dtype=int64),
array([16, 17, 18, 19], dtype=int64)]
"""
Although this works as expected, the variable idx in my actual code is very long, so this takes a long time to complete. Is there a vectorized way of doing this? Thanks.

You can find the "turning points" with np.diff by checking where the difference is not 1. To include the end points, pass prepend and append so that the difference at the boundaries also isn't 1 and they get counted too. Then a list comprehension with np.arange gives the final result (shown here on your longer idx array):
>>> turnings, = np.where(np.diff(idx, prepend=idx[0], append=idx[-1]) != 1)
>>> turnings
array([ 0, 4, 9, 14, 16, 20], dtype=int64)
>>> result = [np.arange(pre, nex) for pre, nex in zip(turnings, turnings[1:])]
>>> result
[array([0, 1, 2, 3], dtype=int64),
array([4, 5, 6, 7, 8], dtype=int64),
array([ 9, 10, 11, 12, 13], dtype=int64),
array([14, 15], dtype=int64),
array([16, 17, 18, 19], dtype=int64)]
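The same grouping can also be produced with a single np.split call, splitting a range of positions at the turning points; a minimal sketch of that variant:
>>> np.split(np.arange(len(idx)), np.where(np.diff(idx) != 1)[0] + 1)  # same five runs as above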

Related

Select non-consecutive row and column indices from 2d numpy array

I have an array a:
import numpy as np
a = np.arange(5 * 5).reshape(5, 5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
and want to select the last two columns of rows one and two, and the first two columns of rows three and four.
The result should look like this
array([[3, 4, 10, 11],
[8, 9, 15, 16]])
How to do that in one go without indexing twice and concatenation?
I tried using take
a.take([[0,1,2,3], [3,4,0,1]])
array([[0, 1, 2, 3],
[3, 4, 0, 1]])
then ix_
a[np.ix_([0,1,2,3], [3,4,0,1])]
array([[ 3, 4, 0, 1],
[ 8, 9, 5, 6],
[13, 14, 10, 11],
[18, 19, 15, 16]])
and r_
a[np.r_[0:2, 2:4], np.r_[3:5, 0:2]]
array([ 3, 9, 10, 16])
and a combination of ix_ and r_
a[np.ix_([0,1,2,3], np.r_[3:4, 0:1])]
array([[ 3, 0],
[ 8, 5],
[13, 10],
[18, 15]])
Using integer advanced indexing, you can do something like this
index_rows = np.array([
[0, 0, 2, 2],
[1, 1, 3, 3],
])
index_cols = np.array([
[-2, -1, 0, 1],
[-2, -1, 0, 1],
])
a[index_rows, index_cols]
where you just select directly what elements you want.
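For reference, a quick check of that indexing (negative column indices count from the end, so -2 and -1 pick the last two columns):

import numpy as np

a = np.arange(5 * 5).reshape(5, 5)
index_rows = np.array([[0, 0, 2, 2],
                       [1, 1, 3, 3]])
index_cols = np.array([[-2, -1, 0, 1],
                       [-2, -1, 0, 1]])
print(a[index_rows, index_cols])
# [[ 3  4 10 11]
#  [ 8  9 15 16]]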

Partition array into N random chunks of different sizes with Numpy

With numpy.array_split, you can split an array into equal-size chunks. Is there a way to split it into chunks based on a list?
How do I split this array into 4 chunks, with each chunk determined by the size of the chunk given in chunk_size, and consisting of random values from the array?
import numpy as np
np.random.seed(13)
a = np.arange(20)
chunk_size = [10, 5, 3, 2]
dist = [np.random.choice(a, c) for c in chunk_size]
print(dist)
but I get multiple duplications, as expected:
[array([18, 16, 10, 16, 6, 2, 12, 3, 2, 14]),
array([ 5, 13, 10, 9, 11]), array([ 2, 0, 19]), array([19, 11])]
For example,
16 is contained twice in the first chunk
10 is contained in both the first and second chunks
With np.split, this is the answer I get (it interprets chunk_size as a list of split indices, not chunk sizes):
>>> for s in np.split(a, chunk_size):
... print(s.shape)
...
(10,)
(0,)
(0,)
(0,)
(18,)
Using np.random.choice with replace=False still gives duplicate elements across chunks:
import numpy as np
np.random.seed(13)
a = np.arange(20)
chunk_size = [10, 5, 3, 2]
dist = [np.random.choice(a, c, replace=False) for c in chunk_size]
print(dist)
While each chunk now contains no duplicates, this does not prevent an element such as 7 from appearing in both the first and second chunks:
[array([11, 12, 0, 1, 8, 5, 7, 15, 14, 13]),
array([16, 7, 13, 9, 19]), array([1, 4, 2]), array([15, 12])]
One way to ensure that every element of a is contained in exactly one chunk would be to create a random permutation of a first and then split it with np.split.
In order to get an array of splitting indices for np.split from chunk_size you can use np.cumsum.
Example
>>> import numpy as np
>>> np.random.seed(13)
>>> a = np.arange(20)
>>> b = np.random.permutation(a)
>>> b
array([11, 12, 0, 1, 8, 5, 7, 15, 14, 13,
3, 17, 9, 4, 2, 6, 19, 10, 16, 18])
>>> chunk_size = [10, 5, 3, 2]
>>> np.cumsum(chunk_size)
array([10, 15, 18, 20])
>>> np.split(b, np.cumsum(chunk_size))
[array([11, 12, 0, 1, 8, 5, 7, 15, 14, 13]),
array([ 3, 17, 9, 4, 2]), array([ 6, 19, 10]), array([16, 18]),
array([], dtype=int64)]
You could avoid the trailing empty array by omitting the last value in chunk_size, as it is implied by the size of a and the sum of the previous values:
>>> np.split(b, np.cumsum(chunk_size[:-1])) # [10, 5, 3] -- 2 is implied
[array([11, 12, 0, 1, 8, 5, 7, 15, 14, 13]),
array([ 3, 17, 9, 4, 2]), array([ 6, 19, 10]), array([16, 18])]
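If you need this more than once, the two steps can be wrapped in a small helper; a minimal sketch (the name random_chunks and the use of numpy's newer default_rng generator are my choices, not from the answer above), assuming sum(chunk_size) == len(a):

import numpy as np

def random_chunks(a, chunk_size, seed=None):
    # Shuffle the elements, then split at the cumulative sizes, so
    # every element of a ends up in exactly one chunk.
    rng = np.random.default_rng(seed)
    b = rng.permutation(a)
    return np.split(b, np.cumsum(chunk_size[:-1]))

chunks = random_chunks(np.arange(20), [10, 5, 3, 2], seed=13)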
Thanks to Divakar:
import numpy as np
np.random.seed(13)
dist = np.arange(0, 3286, 1)
chunk_size = [975, 708, 515, 343, 269, 228, 77, 57, 42, 33, 11, 9, 7, 4, 3, 1, 1, 1, 1, 1]
dist = [np.random.choice(dist, c, replace=False) for c in chunk_size]

Matrix to Vector with python/numpy

Numpy's ravel works well if I need to create a vector by reading by rows or by columns. However, I would like to transform a matrix to a 1d array by using a zigzag method that is often used in image processing. This is an example with initial matrix A and final result B:
A = np.array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
B = np.array([ 0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15])
Is there an existing function that could help me with that? If not, can you give me some hints on how to solve this problem? P.S. The matrix A is NxN.
I've been using numpy for several years, and I've never seen such a function.
Here's one way you could do it (not necessarily the most efficient):
In [47]: a
Out[47]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [48]: np.concatenate([np.diagonal(a[::-1,:], k)[::(2*(k % 2)-1)] for k in range(1-a.shape[0], a.shape[0])])
Out[48]: array([ 0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15])
Breaking down the one-liner into separate steps:
a[::-1, :] reverses the rows:
In [59]: a[::-1, :]
Out[59]:
array([[12, 13, 14, 15],
[ 8, 9, 10, 11],
[ 4, 5, 6, 7],
[ 0, 1, 2, 3]])
(This could also be written a[::-1] or np.flipud(a).)
np.diagonal(a, k) extracts the kth diagonal, where k=0 is the main diagonal. So, for example,
In [65]: np.diagonal(a[::-1, :], -3)
Out[65]: array([0])
In [66]: np.diagonal(a[::-1, :], -2)
Out[66]: array([4, 1])
In [67]: np.diagonal(a[::-1, :], 0)
Out[67]: array([12, 9, 6, 3])
In [68]: np.diagonal(a[::-1, :], 2)
Out[68]: array([14, 11])
In the list comprehension, k gives the diagonal to be extracted. We want to reverse the elements in every other diagonal. The expression 2*(k % 2) - 1 gives the values 1, -1, 1, ... as k varies from -3 to 3. Indexing with [::1] leaves the order of the array being indexed unchanged, and indexing with [::-1] reverses the order of the array. So np.diagonal(a[::-1, :], k)[::(2*(k % 2)-1)] gives the kth diagonal, but with every other diagonal reversed:
In [71]: [np.diagonal(a[::-1,:], k)[::(2*(k % 2)-1)] for k in range(1-a.shape[0], a.shape[0])]
Out[71]:
[array([0]),
array([1, 4]),
array([8, 5, 2]),
array([ 3, 6, 9, 12]),
array([13, 10, 7]),
array([11, 14]),
array([15])]
np.concatenate() puts them all into a single array:
In [72]: np.concatenate([np.diagonal(a[::-1,:], k)[::(2*(k % 2)-1)] for k in range(1-a.shape[0], a.shape[0])])
Out[72]: array([ 0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15])
I found discussion of the zigzag scan for MATLAB, but not much for numpy. One project appears to use a hardcoded indexing array for 8x8 blocks:
https://github.com/lot9s/lfv-compression/blob/master/scripts/our_mpeg/zigzag.py
ZIG = np.array([[ 0,  1,  5,  6, 14, 15, 27, 28],
                [ 2,  4,  7, 13, 16, 26, 29, 42],
                [ 3,  8, 12, 17, 25, 30, 41, 43],
                [ 9, 11, 18, 24, 31, 40, 44, 53],
                [10, 19, 23, 32, 39, 45, 52, 54],
                [20, 22, 33, 38, 46, 51, 55, 60],
                [21, 34, 37, 47, 50, 56, 59, 61],
                [35, 36, 48, 49, 57, 58, 62, 63]])
Apparently it's used in JPEG and MPEG compression.
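Since the hardcoded table only covers 8x8, you could generate the same lookup array for any block size with the diagonal trick above; a minimal sketch (the helper name zigzag_indices is my own):

import numpy as np

def zigzag_indices(n):
    # Flat indices of an n x n grid, visited in the zigzag order
    # produced by the diagonal traversal shown earlier.
    a = np.arange(n * n).reshape(n, n)
    order = np.concatenate([
        np.diagonal(a[::-1, :], k)[::(2 * (k % 2) - 1)]
        for k in range(1 - n, n)
    ])
    # Invert the permutation: zig[i, j] = position of (i, j) in the scan.
    zig = np.empty(n * n, dtype=int)
    zig[order] = np.arange(n * n)
    return zig.reshape(n, n)

zigzag_indices(8) should reproduce the 8x8 table above.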

Indexing a 2d array with a 3d array in numpy

I have two arrays.
"a", a 2d numpy array.
import numpy.random as npr
a = array([[5,6,7,8,9],[10,11,12,14,15]])
array([[ 5, 6, 7, 8, 9],
[10, 11, 12, 14, 15]])
"idx", a 3d numpy array constituting three index variants I want to use to index "a".
idx = npr.randint(5, size=(nsamp,shape(a)[0], shape(a)[1]))
array([[[1, 2, 1, 3, 4],
[2, 0, 2, 0, 1]],
[[0, 0, 3, 2, 0],
[1, 3, 2, 0, 3]],
[[2, 1, 0, 1, 4],
[1, 1, 0, 1, 0]]])
Now I want to index "a" three times with the indices in "idx" to obtain an object as follows:
array([[[6, 7, 6, 8, 9],
[12, 10, 12, 10, 11]],
[[5, 5, 8, 7, 5],
[11, 14, 12, 10, 14]],
[[7, 6, 5, 6, 9],
[11, 11, 10, 11, 10]]])
The naive "a[idx]" does not work. Any ideas as to how to do this? (I use Python 3.4 and numpy 1.9)
You can use choose to make the selection from a:
>>> np.choose(idx, a.T[:,:,np.newaxis])
array([[[ 6, 7, 6, 8, 9],
[12, 10, 12, 10, 11]],
[[ 5, 5, 8, 7, 5],
[11, 14, 12, 10, 14]],
[[ 7, 6, 5, 6, 9],
[11, 11, 10, 11, 10]]])
As you can see, a has to be reshaped from an array with shape (2, 5) to an array with shape (5, 2, 1) first. This is essentially so that it is broadcastable with idx, which has shape (3, 2, 5).
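To see the shapes involved in that broadcast, a quick check:
>>> a.T[:, :, np.newaxis].shape
(5, 2, 1)
>>> idx.shape
(3, 2, 5)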
(I learned this method from @immerrr's answer here: https://stackoverflow.com/a/26225395/3923281)
You can use the take array method. Note that without an axis argument, take indexes the flattened array, so the column indices in idx first have to be offset by the start of each row:
import numpy
a = numpy.array([[5, 6, 7, 8, 9], [10, 11, 12, 14, 15]])
idx = numpy.random.randint(5, size=(3, a.shape[0], a.shape[1]))
# flat index = row * ncols + col
flat_idx = idx + numpy.arange(a.shape[0])[:, None] * a.shape[1]
print(a.take(flat_idx))
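Alternatively, plain integer (fancy) indexing with a broadcast row index gives the same result without flattening; a minimal sketch:
import numpy
a = numpy.array([[5, 6, 7, 8, 9], [10, 11, 12, 14, 15]])
idx = numpy.random.randint(5, size=(3, a.shape[0], a.shape[1]))
rows = numpy.arange(a.shape[0])[None, :, None]  # shape (1, 2, 1) broadcasts against idx
print(a[rows, idx])  # result[i, j, k] == a[j, idx[i, j, k]]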

Obtain a subset of 2-D indices using numpy

I have 2-D data containing bad values (0 indicates bad). My goal is to replace each bad value with its nearest neighbor that isn't bad.
SciPy's NearestNDInterpolator seems like a nice way to do this. In the 2-D case, it accepts a (number of points) x 2 array of indices and a (number of points) x 1 array of corresponding values to interpolate from.
So, I need to get a subset of the indices and values: those that are "good." The code below achieves this, but coordinates = array(list(ndindex(n_y, n_x))) and where(values != 0)[0] are both messy. Is there a cleaner way to do this?
# n_y and n_x are the number of points along each dimension.
coordinates = array(list(ndindex(n_y, n_x)))
values = data.flatten()
nonzero_ind = where(values != 0)[0]
nonzero_coordinates = coordinates[nonzero_ind, :]
nonzero_values = values[nonzero_ind]
Thanks.
nonzero_coordinates = np.argwhere(data != 0)
nonzero_values = np.extract(data, data)
or simply:
nonzero_values = data[data!=0]
I initially rather missed the obvious nonzero_values method, but thanks to @askewchan in the comments for that.
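For completeness, feeding these into SciPy's NearestNDInterpolator to actually fill the bad values might look like this (a sketch; the small data array is hypothetical):
import numpy as np
from scipy.interpolate import NearestNDInterpolator

data = np.array([[1., 2., 0.],
                 [4., 0., 6.],
                 [7., 8., 9.]])  # zeros mark bad values

good_coords = np.argwhere(data != 0)   # (n_points, 2) index pairs
good_values = data[data != 0]          # (n_points,) values
interp = NearestNDInterpolator(good_coords, good_values)

bad_coords = np.argwhere(data == 0)
data[tuple(bad_coords.T)] = interp(bad_coords)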
So, I need to get a subset of the indices and values: those that are "good."
If you've created a "mask" of the bad indices, you can take the negation of that mask with ~ and then find the indices from the mask using np.where. For example:
import numpy as np
# Sample array
Z = np.random.random(size=(5,5))
# Use whatever criteria you have to mark the bad indices
bad_mask = Z<.2
good_mask = ~bad_mask
good_idx = np.where(good_mask)
print(good_mask)
print(good_idx)
Gives, as an example:
[[ True  True  True  True False]
 [ True False False  True  True]
 [ True False  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]]
(array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]), array([0, 1, 2, 3, 0, 3, 4, 0, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]))
Another way to approach this problem altogether is to just run your array through an image filter that would automatically 'close' those holes. There is such a filter in scipy.ndimage called grey_closing:
>>> from scipy import ndimage
>>> a = np.arange(1,26).reshape(5,5)
>>> a[2,2] = 0
>>> a
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 0, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
>>> ndimage.grey_closing(a, size=2)
array([[ 7, 7, 8, 9, 10],
[ 7, 7, 8, 9, 10],
[12, 12, 12, 14, 15],
[17, 17, 18, 19, 20],
[22, 22, 23, 24, 25]])
But this has unfortunate edge effects (which you can change a bit with the mode parameter). To avoid these, you could just take the new values where the original array was 0 and keep the original values elsewhere:
>>> np.where(a, a, ndimage.grey_closing(a, size=2))
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 12, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
Alternatively, you could use scikit-image:
>>> from skimage.morphology import closing, square
>>> a = np.arange(1,10, dtype=np.uint8).reshape(3,3)
>>> a[1,1] = 0
>>> a
array([[1, 2, 3],
[4, 0, 6],
[7, 8, 9]], dtype=uint8)
>>> closing(a, square(2))
array([[1, 2, 3],
[4, 4, 6],
[7, 8, 9]], dtype=uint8)
>>> a
array([[1, 2, 3],
[4, 0, 6],
[7, 8, 9]], dtype=uint8)
Pass a as the output array and the closing is done in place:
>>> closing(a, square(2), a)
>>> a
array([[1, 2, 3],
[4, 4, 6],
[7, 8, 9]], dtype=uint8)
Use a larger square (or any shape from skimage.morphology) if you have big gaps of zeros. The disadvantage of this (aside from the dependency) is that it seems to work only with uint8.
