Indexing numpy.ndarrays periodically - python

I am trying to access (read/write) numpy.ndarrays periodically. In other words, if my_array has shape (10, 10) and I index it with
my_array[10, 10] or access_function(my_array, 10, 10)
I should get the element
my_array[0, 0].
I need both read and write access to the element returned by periodic indexing.
Does anyone know how to do this without making a shifted copy of the original array?

I think this does what you want, though there may be a more elegant approach that I'm not aware of. It's probably possible to write a general function for an nD array, but this one handles 2D only. As you said, it uses modular arithmetic.
import numpy as np
def access(shape, ixr, ixc):
    """ Returns a selection. """
    return np.s_[ixr % shape[0], ixc % shape[1]]
arr = np.arange(100)
arr.shape = 10,10
arr[ access(arr.shape, 45, 87) ]
# 57
arr[access(arr.shape, 45, 87)] = 100
arr
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
# [ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
# [ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
# [ 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
# [ 50, 51, 52, 53, 54, 55, 56, 100, 58, 59],
# [ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
# [ 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
# [ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
# [ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
Edit - Generic nD version
def access(shape, *args):
    if len(shape) != len(args):
        error = 'Inconsistent number of dimensions: {} & number of indices: {} in coords.'
        raise IndexError(error.format(len(shape), len(args)))
    res = []
    for limit, ix in zip(shape, args):
        res.append(ix % limit)
    return tuple(res)
Usage/Test
a = np.arange(24)
a.shape = 2,3,4
a[access(a.shape, 5, 6, 7)]
# 15
a[access(a.shape, 5,6,7) ] = 100
a
# array([[[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]],
# [[ 12, 13, 14, 100],
# [ 16, 17, 18, 19],
# [ 20, 21, 22, 23]]])
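As a possible alternative to the modular arithmetic above, NumPy's np.ravel_multi_index accepts mode='wrap', which wraps each index around its axis and returns a flat index. The following is only a minimal sketch; the helper names periodic_get and periodic_set are made up for illustration and are not part of NumPy.

```python
import numpy as np

def periodic_get(arr, *index):
    """Read arr at a multi-index that wraps around every axis."""
    flat = np.ravel_multi_index(index, arr.shape, mode='wrap')
    return arr.flat[flat]

def periodic_set(arr, value, *index):
    """Write value into arr at a wrapped multi-index, in place."""
    flat = np.ravel_multi_index(index, arr.shape, mode='wrap')
    arr.flat[flat] = value

arr = np.arange(100).reshape(10, 10)
print(periodic_get(arr, 45, 87))   # same element as arr[5, 7]
periodic_set(arr, -1, 10, 10)      # writes to arr[0, 0]
print(arr[0, 0])
```

Both helpers work on the original array through arr.flat, so no shifted copy is created.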

Related

Defining loop for discrete values in Python

The code deletes multiple sets of rows and columns. Is it possible to loop over a list of discrete values (here 2, 4, 8) instead of writing mask[2] = 0, mask[4] = 0, mask[8] = 0 separately?
import numpy as np
x = np.arange(1,101).reshape(10,10)
#print([x])
mask = np.ones(x.shape[0], bool)
mask[2] = 0
mask[4] = 0
mask[8] = 0
print([x[mask,:][:,mask]])
The current and desired outputs should be the same:
[array([[ 1, 2, 4, 6, 7, 8, 10],
[ 11, 12, 14, 16, 17, 18, 20],
[ 31, 32, 34, 36, 37, 38, 40],
[ 51, 52, 54, 56, 57, 58, 60],
[ 61, 62, 64, 66, 67, 68, 70],
[ 71, 72, 74, 76, 77, 78, 80],
[ 91, 92, 94, 96, 97, 98, 100]])]
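One way this could be done without an explicit loop is to assign to all listed positions at once with an integer list as the index. This is a sketch under that assumption; the name drop is introduced here for illustration only.

```python
import numpy as np

x = np.arange(1, 101).reshape(10, 10)
drop = [2, 4, 8]                      # indices to remove from rows and columns

mask = np.ones(x.shape[0], dtype=bool)
mask[drop] = False                    # one assignment covers every listed index

result = x[np.ix_(mask, mask)]        # keep rows and columns where mask is True
print(result)
```

np.ix_ accepts boolean masks, so the selection is equivalent to x[mask,:][:,mask] from the question.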

How to transpose every n rows into columns in NumPy?

I have a numpy array of shape (1000000,).
I would like every n=1000 rows to become columns.
The resulting shape should be (1000, 1000)
How can I do this with NumPy? np.transpose() doesn't seem to do what I want.
I don't want to use a for loop for performance reasons.
You can use reshape with the order='F' parameter:
Example with a (100,) 1D array converted to (10,10) 2D array:
a = np.arange(100)  # array([0, 1, 2, ..., 98, 99])
b = a.reshape((10,10), order='F')
Output:
>>> b
array([[ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90],
[ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91],
[ 2, 12, 22, 32, 42, 52, 62, 72, 82, 92],
[ 3, 13, 23, 33, 43, 53, 63, 73, 83, 93],
[ 4, 14, 24, 34, 44, 54, 64, 74, 84, 94],
[ 5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
[ 6, 16, 26, 36, 46, 56, 66, 76, 86, 96],
[ 7, 17, 27, 37, 47, 57, 67, 77, 87, 97],
[ 8, 18, 28, 38, 48, 58, 68, 78, 88, 98],
[ 9, 19, 29, 39, 49, 59, 69, 79, 89, 99]])
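To see why np.transpose alone did not help, it may be useful to compare the order='F' reshape with a row-major reshape followed by a transpose. The sketch below uses a small n for readability; the variable names are illustrative.

```python
import numpy as np

n = 10
a = np.arange(n * n)

b_fortran = a.reshape((n, n), order='F')   # fill column by column
b_transposed = a.reshape(n, n).T           # fill row by row, then swap axes

print(np.array_equal(b_fortran, b_transposed))
```

The 1D array has only one axis, so transposing it directly changes nothing; the reshape has to come first.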

Numpy array split by pairs of irregular (start, stop)

I have a NumPy array with x and y values of points. I have another array that contains pairs of start and end indices. Originally this data was in a pandas DataFrame, but since it has over 60 million items, selecting with loc was very slow. Is there a fast NumPy method to split this?
import numpy as np
xy_array = np.arange(100).reshape(2,-1)
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
split_paris = [[0, 10], [10, 13], [13, 17], [20, 22]]
expected_result = [
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]],
[[10, 11, 12], [60, 61, 62]],
[[13, 14, 15, 16], [63, 64, 65, 66]],
[[20, 21], [70, 71]]
]
Update:
The next pair does not always start where the previous one ended.
This will do it:
import numpy as np
xy_array = np.array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
split_paris = [[0, 10], [10, 13], [13, 17]]
expected_result = [xy_array[:, x:y] for x, y in split_paris]
expected_result
#[array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]]), array([[10, 11, 12],
# [60, 61, 62]]), array([[13, 14, 15, 16],
# [63, 64, 65, 66]])]
This uses basic slicing of the form array[rows, columns]: the : takes all rows, and x:y takes the columns from x up to (but not including) y.
You can also use the np.array_split function provided by NumPy, which accepts either a number of sections or a list of split indices:
>>> x = np.arange(8.0)
>>> np.array_split(x, 3)
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7.])]
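Note that the np.array_split call above cuts the array into near-equal sections, so pairs with gaps between them are probably easier to handle with the slicing comprehension. A minimal sketch follows, using the question's data and its pair list under the corrected spelling split_pairs. Because basic slices are views, the comprehension should not copy the point data, which matters at 60 million items.

```python
import numpy as np

xy_array = np.arange(100).reshape(2, -1)
split_pairs = [[0, 10], [10, 13], [13, 17], [20, 22]]  # gap between 17 and 20

chunks = [xy_array[:, start:stop] for start, stop in split_pairs]

# Basic slicing returns views, so no point data is copied.
print(all(np.shares_memory(chunk, xy_array) for chunk in chunks))
print(chunks[-1])
```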

NumPy nearest value along axis of multidimensional array

I'd like to create a function, that returns the nearest value in the array along a specified axis to a given value.
To get the index of the nearest value I use the following code where arr is a multidimensional array and value is the value to look for:
def nearest_index( arr, value, axis=None ):
    return ( np.abs( arr - value ) ).argmin( axis=axis )
But I struggle with using the result of this function to get the values from the array.
It is easy with 1D-arrays:
In [14]: arr_1 = np.random.randint( 10, 100, size=( 10, ) )
In [15]: arr_1
Out[15]: array([67, 49, 90, 29, 60, 80, 31, 55, 29, 10])
In [16]: nearest_index( arr_1, 50 )
Out[16]: 1
In [17]: arr_1[nearest_index( arr_1, 50 )]
Out[17]: 49
or with flattened arrays:
In [25]: arr_3 = np.random.randint( 10, 100, size=( 2, 3, 4, ) )
In [26]: arr_3
Out[26]:
array([[[85, 51, 74, 79],
[63, 42, 27, 75],
[89, 68, 80, 63]],
[[85, 72, 74, 16],
[85, 22, 47, 78],
[44, 70, 98, 34]]])
In [27]: idx_flat = nearest_index( arr_3, 50, axis=None )
In [28]: idx_flat
Out[28]: 1
In [29]: idx = np.unravel_index( idx_flat, arr_3.shape )
In [30]: idx
Out[30]: (0, 0, 1)
In [31]: arr_3[idx]
Out[31]: 51
How can I create a function that returns the values along the defined axis?
I tried the solution for this Question, but I only got it working for axis=-1.
Note that it is fine if only the first occurrence is returned when multiple elements in the array are equally near the expected value.
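For a multi-dimensional array, we need to use advanced indexing. So, for a generic n-dim array and a specified axis, we could do something like this -
def argmin_values_along_axis(arr, value, axis):
    argmin_idx = np.abs(arr - value).argmin(axis=axis)
    shp = arr.shape
    # open-mesh index arrays, one per axis
    indx = list(np.ix_(*[np.arange(i) for i in shp]))
    # replace the index along `axis` with the argmin positions
    indx[axis] = np.expand_dims(argmin_idx, axis=axis)
    # index with a tuple (recent NumPy does not accept a list here) and drop only the reduced axis
    return np.squeeze(arr[tuple(indx)], axis=axis)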
Sample runs -
In [203]: arr_3 = np.random.randint( 10, 100, size=( 2, 3, 4, ) )
In [204]: arr_3
Out[204]:
array([[[94, 55, 26, 51],
[82, 66, 80, 66],
[96, 54, 93, 57]],
[[59, 28, 95, 56],
[47, 48, 17, 77],
[15, 57, 57, 25]]])
In [205]: argmin_values_along_axis(arr_3, value=50, axis=0)
Out[205]:
array([[59, 55, 26, 51],
[47, 48, 80, 66],
[15, 54, 57, 57]])
In [206]: argmin_values_along_axis(arr_3, value=50, axis=1)
Out[206]:
array([[82, 54, 26, 51],
[47, 48, 57, 56]])
In [207]: argmin_values_along_axis(arr_3, value=50, axis=2)
Out[207]:
array([[51, 66, 54],
[56, 48, 57]])
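On NumPy 1.15 or later, the same lookup can probably be written more compactly with np.take_along_axis. This is a sketch rather than the answer's method; the function name nearest_values_along_axis is made up here, and the sample array is the one from the runs above.

```python
import numpy as np

def nearest_values_along_axis(arr, value, axis):
    """Return, along `axis`, the element of arr closest to value."""
    idx = np.abs(arr - value).argmin(axis=axis)
    idx = np.expand_dims(idx, axis=axis)          # restore the reduced axis
    picked = np.take_along_axis(arr, idx, axis=axis)
    return np.squeeze(picked, axis=axis)

arr_3 = np.array([[[94, 55, 26, 51],
                   [82, 66, 80, 66],
                   [96, 54, 93, 57]],
                  [[59, 28, 95, 56],
                   [47, 48, 17, 77],
                   [15, 57, 57, 25]]])

print(nearest_values_along_axis(arr_3, 50, axis=0))
print(nearest_values_along_axis(arr_3, 50, axis=2))
```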
Well it works for me.
def nearest_index(arr, value, axis=None):
    return np.argmin(np.abs( arr - value ), axis=axis)
>>> X
array([[76, 94, 56, 93, 28, 0, 44, 50, 89, 93],
[80, 99, 29, 98, 39, 27, 55, 70, 19, 76],
[87, 7, 28, 78, 47, 95, 34, 97, 66, 27],
[75, 78, 82, 30, 15, 0, 2, 25, 58, 69],
[31, 2, 34, 1, 56, 7, 87, 78, 32, 77],
[89, 80, 76, 97, 49, 18, 62, 35, 94, 41],
[ 2, 44, 83, 3, 64, 4, 49, 93, 46, 8],
[51, 63, 45, 57, 77, 90, 93, 4, 26, 81],
[43, 92, 22, 98, 93, 36, 46, 25, 35, 36],
[30, 14, 42, 91, 86, 14, 78, 9, 37, 19]])
>>> X[nearest_index(X, 2, axis=0), np.arange(10)]
array([ 2, 2, 22, 1, 15, 0, 2, 4, 19, 8])
>>> X[np.arange(10), nearest_index(X, 2, axis=1)]
array([ 0, 19, 7, 2, 2, 18, 2, 4, 22, 9])

Indexing a numpy array using a numpy array of slices

(Edit: I wrote a solution based on hpaulj's answer; see the code at the bottom of this post.)
I wrote a function that subdivides an n-dimensional array into smaller ones such that each of the subdivisions has at most max_chunk_size elements in total.
Since I need to subdivide many arrays of the same shape and then perform operations on the corresponding chunks, the function doesn't operate on the data itself; instead it creates an array of "indexers", i.e. an array of (slice(x1, x2), slice(y1, y2), ...) objects (see the code below). With these indexers I can retrieve subdivisions by calling the_array[indexer[i]] (see examples below).
Also, the array of these indexers has the same number of dimensions as the input, and the divisions are aligned along the corresponding axes, i.e. blocks the_array[indexer[i,j,k]] and the_array[indexer[i+1,j,k]] are adjacent along axis 0, etc.
I was expecting that I should also be able to concatenate these blocks by calling the_array[indexer[i:i+2,j,k]] and that the_array[indexer] would return just the_array; however, such calls result in an error:
IndexError: arrays used as indices must be of integer (or boolean) type
Is there a simple way around this error?
Here's the code:
import numpy as np
import itertools
from functools import reduce  # reduce is not a builtin in Python 3
def subdivide(shape, max_chunk_size=500000):
    shape = np.array(shape).astype(float)
    total_size = shape.prod()
    # calculate maximum slice shape:
    slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)
    # create a list of slices for each dimension:
    slices = [[slice(left, min(right, n))
               for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
              for n, step_size in zip(shape.astype(int), slice_shape)]
    # the np.object alias was removed in NumPy 1.24; the builtin object works instead
    result = np.empty(reduce(lambda a, b: a * len(b), slices, 1), dtype=object)
    for i, el in enumerate(itertools.product(*slices)):
        result[i] = el
    result.shape = np.ceil(shape / slice_shape).astype(int)
    return result
Here's an example usage:
>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> slices = subdivide(ar.shape, 16)
>>> slices
array([[(slice(0, 2, None), slice(0, 6, None)),
(slice(0, 2, None), slice(6, 12, None)),
(slice(0, 2, None), slice(12, 15, None))],
[(slice(2, 4, None), slice(0, 6, None)),
(slice(2, 4, None), slice(6, 12, None)),
(slice(2, 4, None), slice(12, 15, None))],
[(slice(4, 6, None), slice(0, 6, None)),
(slice(4, 6, None), slice(6, 12, None)),
(slice(4, 6, None), slice(12, 15, None))]], dtype=object)
>>> ar[slices[1,0]]
array([[30, 31, 32, 33, 34, 35],
[45, 46, 47, 48, 49, 50]])
>>> ar[slices[0,2]]
array([[12, 13, 14],
[27, 28, 29]])
>>> ar[slices[2,1]]
array([[66, 67, 68, 69, 70, 71],
[81, 82, 83, 84, 85, 86]])
>>> ar[slices[:2,1:3]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: arrays used as indices must be of integer (or boolean) type
Here's a solution based on hpaulj's answer:
import numpy as np
import itertools
class Subdivision():
    def __init__(self, shape, max_chunk_size=500000):
        shape = np.array(shape).astype(float)
        total_size = shape.prod()
        # calculate maximum slice shape:
        slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)
        # create a list of slices for each dimension:
        slices = [[slice(left, min(right, n))
                   for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
                  for n, step_size in zip(shape.astype(int), slice_shape)]
        # the np.object alias was removed in NumPy 1.24; the builtin object works instead
        self.slices = \
            np.array(list(itertools.product(*slices)), \
                     dtype=object).reshape(tuple(np.ceil(shape / slice_shape).astype(int)) + (len(shape),))
    def __getitem__(self, args):
        if type(args) != tuple: args = (args,)
        # turn integer index into equivalent slice
        args = tuple(slice(arg, arg + 1 if arg != -1 else None) if type(arg) == int else arg for arg in args)
        # select the slices
        # always select all elements from the last axis (which contains slices for each data dimension)
        slices = self.slices[args + ((slice(None),) if Ellipsis in args else (Ellipsis, slice(None)))]
        return np.ix_(*tuple(np.r_[tuple(slices[tuple([0] * i + [slice(None)] + \
                                                      [0] * (len(slices.shape) - 2 - i) + [i])])] \
                             for i in range(len(slices.shape) - 1)))
Example usage:
>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> subdiv = Subdivision(ar.shape, 16)
>>> ar[subdiv[...]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> ar[subdiv[0]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> ar[subdiv[:2,1]]
array([[ 6, 7, 8, 9, 10, 11],
[21, 22, 23, 24, 25, 26],
[36, 37, 38, 39, 40, 41],
[51, 52, 53, 54, 55, 56]])
>>> ar[subdiv[2,:3]]
array([[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> ar[subdiv[...,:2]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]])
Your slices produce 2x6 and 2x3 arrays.
In [36]: subslice=slices[:2,1:3]
In [37]: subslice[0,0]
Out[37]: array([slice(0, 2, None), slice(6, 12, None)], dtype=object)
In [38]: ar[tuple(subslice[0,0])]
Out[38]:
array([[ 6, 7, 8, 9, 10, 11],
[21, 22, 23, 24, 25, 26]])
My numpy version expects me to turn the subslice into a tuple. This is the same as
ar[slice(0,2), slice(6,12)]
ar[:2, 6:12]
That's just the basic syntax of indexing and slicing. ar is 2d, so ar[(i,j)] requires a 2 element tuple - of slices, lists, arrays, or integers. It won't work with an array of slice objects.
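The difference between a tuple of slices and an array of slices can be checked directly. The sketch below is an illustration only; the names index and as_array are introduced here.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

index = (slice(0, 2), slice(6, 12))          # what ar[:2, 6:12] passes to __getitem__
same = np.array_equal(ar[index], ar[:2, 6:12])

as_array = np.array(index, dtype=object)     # object array holding the same slices
try:
    ar[as_array]
    array_index_ok = True
except IndexError:
    array_index_ok = False                   # object arrays are rejected as indices

print(same, array_index_ok)
```

Converting the object array back with tuple(as_array) restores ordinary slicing.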
However, it is possible to concatenate the results into a larger array. That can be done after indexing, or the slices can be converted into indexing lists.
np.bmat, for example, concatenates a 2d arrangement of arrays:
In [42]: np.bmat([[ar[tuple(subslice[0,0])], ar[tuple(subslice[0,1])]],
[ar[tuple(subslice[1,0])],ar[tuple(subslice[1,1])]]])
Out[42]:
matrix([[ 6, 7, 8, 9, 10, 11, 12, 13, 14],
[21, 22, 23, 24, 25, 26, 27, 28, 29],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[51, 52, 53, 54, 55, 56, 57, 58, 59]])
You could generalize this. It just uses hstack and vstack on the nested lists. The result is np.matrix but can be converted back to array.
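As a variation on the np.bmat call, np.block (available since NumPy 1.13) assembles the same nested arrangement but returns a plain ndarray, so no conversion back from np.matrix is needed. A minimal sketch, with the slice lists written out by hand rather than taken from subdivide:

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)
row_slices = [slice(0, 2), slice(2, 4)]      # first two row blocks
col_slices = [slice(6, 12), slice(12, 15)]   # last two column blocks

# nested list of blocks, laid out the way they should be joined
blocks = [[ar[r, c] for c in col_slices] for r in row_slices]
joined = np.block(blocks)

print(type(joined).__name__, joined.shape)
```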
The other approach is to use tools like np.arange, np.r_, np.ix_ to create index arrays. It'll take some playing around to generate an example.
To combine the [0,0] and [0,1] subslices:
In [64]: j = np.r_[subslice[0,0,1],subslice[0,1,1]]
In [65]: i = np.r_[subslice[0,0,0]]
In [66]: i,j
Out[66]: (array([0, 1]), array([ 6, 7, 8, 9, 10, 11, 12, 13, 14]))
In [68]: ix = np.ix_(i,j)
In [69]: ix
Out[69]:
(array([[0],
[1]]), array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14]]))
In [70]: ar[ix]
Out[70]:
array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14],
[21, 22, 23, 24, 25, 26, 27, 28, 29]])
Or with i = np.r_[subslice[0,0,0], subslice[1,0,0]], ar[np.ix_(i,j)] produces the 4x9 array.
