Related
The question is as follows:
x = np.arange(100)
Write Python code to split the following array at these intervals: 10, 25, 45, 75, 95
I have used the split function but am unable to split at these specific intervals. Can anyone suggest another method, or am I doing it wrong?
Here's both the manual way and the numpy way with split.
# Manual method
x = np.arange(100)
split_indices = [10, 25, 45, 75, 95]
split_arrays = []
for i, j in zip([0]+split_indices[:-1], split_indices):
    split_arrays.append(x[i:j])
print(split_arrays)
# Numpy method
split_arrays_np = np.split(x, split_indices)
print(split_arrays_np)
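And the result (np.split additionally returns a final chunk, array([95, 96, 97, 98, 99]), because it also keeps everything after the last index):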
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]),
array([25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]),
array([45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]),
array([75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94])
]
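A quick check of the chunk count may help here. This is a minimal sketch, assuming the same x and split_indices as above; the name intervals is only illustrative. np.split with k indices returns k + 1 sub-arrays, so the tail has to be dropped if only the five intervals ending at 95 are wanted.

```python
import numpy as np

x = np.arange(100)
split_indices = [10, 25, 45, 75, 95]

chunks = np.split(x, split_indices)
print(len(chunks))          # 6
print(chunks[-1].tolist())  # [95, 96, 97, 98, 99]

# Drop the tail if only the intervals up to index 95 are wanted.
intervals = chunks[:-1]
print([len(c) for c in intervals])  # [10, 15, 20, 30, 20]
```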
I have a numpy array of shape (1000000,).
I would like every n=1000 rows to become columns.
The resulting shape should be (1000, 1000)
How can I do this with NumPy? np.transpose() doesn't seem to do what I want.
I don't want to use a for loop for performance reasons.
You can use reshape with the order='F' parameter:
Example with a (100,) 1D array converted to (10,10) 2D array:
a = np.arange(100)  # array([0, 1, 2, ..., 98, 99])
b = a.reshape((10,10), order='F')
Output:
>>> b
array([[ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90],
[ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91],
[ 2, 12, 22, 32, 42, 52, 62, 72, 82, 92],
[ 3, 13, 23, 33, 43, 53, 63, 73, 83, 93],
[ 4, 14, 24, 34, 44, 54, 64, 74, 84, 94],
[ 5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
[ 6, 16, 26, 36, 46, 56, 66, 76, 86, 96],
[ 7, 17, 27, 37, 47, 57, 67, 77, 87, 97],
[ 8, 18, 28, 38, 48, 58, 68, 78, 88, 98],
[ 9, 19, 29, 39, 49, 59, 69, 79, 89, 99]])
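To see what order='F' is doing, it can help to compare it with a row-major reshape followed by a transpose. The sketch below only checks the equivalence on small arrays; the variable c is an illustrative non-square case and not part of the original question.

```python
import numpy as np

a = np.arange(100)
b = a.reshape((10, 10), order='F')

# Column j of b holds the j-th run of 10 consecutive elements of a.
assert np.array_equal(b[:, 3], a[30:40])

# A row-major reshape followed by a transpose gives the same values.
assert np.array_equal(b, a.reshape(10, 10).T)

# Non-square case: swap the dimensions before transposing.
c = np.arange(12)
assert np.array_equal(c.reshape((3, 4), order='F'), c.reshape(4, 3).T)
```

Neither form needs a Python loop, so either should be suitable for the (1000000,) case.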
I am looking for a way to reshape the following 1d-numpy array:
# dimensions
n = 2 # int : 1 ... N
h = 2 # int : 1 ... N
m = n*(2*h+1)
input_data = np.arange(0,(n*(2*h+1))**2)
The expected output should be reshaped into (2*h+1)**2 blocks of shape (n,n) such as:
input_data.reshape(((2*h+1)**2,n,n))
>>> array([[[ 0 1]
[ 2 3]]
[[ 4 5]
[ 6 7]]
...
[[92 93]
[94 95]]
[[96 97]
[98 99]]]
These blocks finally need to be reshaped into a (m,m) matrix so that they are stacked in rows of 2*h+1 blocks:
>>> array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
...
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
My problem is that I can't seem to find proper axis permutations after the first reshape into (n,n) blocks. I have looked at several answers such as this one but in vain.
As the real dimensions n and h are considerably larger and this operation takes place in an iterative process, I am looking for an efficient reshaping operation.
I don't think you can do this with reshape and transpose alone (although I'd love to be proven wrong). Using np.block works, but it's a bit messy:
np.block([list(i) for i in input_data.reshape( (2*h+1), (2*h+1), n, n )])
array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
[20, 21, 24, 25, 28, 29, 32, 33, 36, 37],
[22, 23, 26, 27, 30, 31, 34, 35, 38, 39],
[40, 41, 44, 45, 48, 49, 52, 53, 56, 57],
[42, 43, 46, 47, 50, 51, 54, 55, 58, 59],
[60, 61, 64, 65, 68, 69, 72, 73, 76, 77],
[62, 63, 66, 67, 70, 71, 74, 75, 78, 79],
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
EDIT: Never mind, you can do without np.block:
input_data.reshape( (2*h+1), (2*h+1), n, n).transpose(0, 2, 1, 3).reshape(m, m)
array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
[20, 21, 24, 25, 28, 29, 32, 33, 36, 37],
[22, 23, 26, 27, 30, 31, 34, 35, 38, 39],
[40, 41, 44, 45, 48, 49, 52, 53, 56, 57],
[42, 43, 46, 47, 50, 51, 54, 55, 58, 59],
[60, 61, 64, 65, 68, 69, 72, 73, 76, 77],
[62, 63, 66, 67, 70, 71, 74, 75, 78, 79],
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
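For other values of n and h, the same reshape/transpose chain can be written in terms of the question's variables and checked against the np.block version. This is a sketch; the names k, blocks, result and expected are introduced here for illustration.

```python
import numpy as np

n, h = 2, 2
k = 2 * h + 1            # blocks per row and per column
m = n * k
input_data = np.arange(m ** 2)

# Axes: (block_row, block_col, row_in_block, col_in_block)
blocks = input_data.reshape(k, k, n, n)

# Move row_in_block next to block_row, then merge the axis pairs.
result = blocks.transpose(0, 2, 1, 3).reshape(m, m)

# Cross-check against the np.block construction.
expected = np.block([list(row) for row in blocks])
assert np.array_equal(result, expected)
print(result[0].tolist())  # [0, 1, 4, 5, 8, 9, 12, 13, 16, 17]
```

The final reshape after the transpose has to copy the data, since the transposed layout is no longer contiguous.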
I have a NumPy array with the x and y values of points, and another array containing pairs of start and end indices. Originally this data was in a pandas DataFrame, but with over 60 million items, selecting with loc was very slow. Is there a fast NumPy way to split this?
import numpy as np
xy_array = np.arange(100).reshape(2,-1)
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
split_pairs = [[0, 10], [10, 13], [13, 17], [20, 22]]
expected_result = [
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]],
[[10, 11, 12], [60, 61, 62]],
[[13, 14, 15, 16], [63, 64, 65, 66]],
[[20, 21], [70, 71]]
]
Update:
The next pair does not always start where the previous one ends.
This will do it:
import numpy as np
xy_array = np.array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
split_pairs = [[0, 10], [10, 13], [13, 17]]
expected_result = [xy_array[:, x:y] for x, y in split_pairs]
expected_result
#[array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]]), array([[10, 11, 12],
# [60, 61, 62]]), array([[13, 14, 15, 16],
# [63, 64, 65, 66]])]
This uses basic slicing of the form array[rows, columns]: the : selects all rows, and x:y selects the columns from x up to, but not including, y.
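The same comprehension also handles pairs that leave gaps, such as the [20, 22] pair from the question. Here is a small sketch; pairs and pieces are illustrative names. Because basic slicing returns views, no data is copied, which may matter at the 60-million-item scale mentioned in the question.

```python
import numpy as np

xy_array = np.arange(100).reshape(2, -1)
pairs = [[0, 10], [10, 13], [13, 17], [20, 22]]  # last pair leaves a gap

pieces = [xy_array[:, start:stop] for start, stop in pairs]

print(pieces[3].tolist())                      # [[20, 21], [70, 71]]
print(np.shares_memory(pieces[0], xy_array))   # True: a view, not a copy
```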
You can also use the np.array_split function. Given an integer, it splits the array into that many nearly equal parts; it also accepts a list of split indices if you want specific ranges.
>>> x = np.arange(8.0)
>>> np.array_split(x, 3)
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7.])]
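For specific ranges, np.array_split also takes a list of indices as its second argument. A brief sketch, reusing the interval values from the first question above:

```python
import numpy as np

x = np.arange(100)

# A list of indices splits at exactly those positions.
parts = np.array_split(x, [10, 25, 45, 75, 95])
print([len(p) for p in parts])  # [10, 15, 20, 30, 20, 5]

# An integer asks for that many nearly equal parts.
thirds = np.array_split(np.arange(8.0), 3)
print([len(p) for p in thirds])  # [3, 3, 2]
```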
(Edit: I wrote a solution based on hpaulj's answer, see the code at the bottom of this post)
I wrote a function that subdivides an n-dimensional array into smaller ones such that each of the subdivisions has at most max_chunk_size elements in total.
Since I need to subdivide many arrays of the same shape and then perform operations on the corresponding chunks, it doesn't operate on the data itself but instead creates an array of "indexers", i.e. an array of (slice(x1, x2), slice(y1, y2), ...) objects (see the code below). With these indexers I can retrieve subdivisions by calling the_array[indexer[i]] (see examples below).
Also, the array of these indexers has the same number of dimensions as the input, and the divisions are aligned along the corresponding axes, i.e. blocks the_array[indexer[i,j,k]] and the_array[indexer[i+1,j,k]] are adjacent along the 0-axis, etc.
I was expecting that I should also be able to concatenate these blocks by calling the_array[indexer[i:i+2,j,k]] and that the_array[indexer] would return just the_array, however such calls result in an error:
IndexError: arrays used as indices must be of integer (or boolean) type
Is there a simple way around this error?
Here's the code:
import numpy as np
import itertools
from functools import reduce  # reduce is not a builtin in Python 3

def subdivide(shape, max_chunk_size=500000):
    shape = np.array(shape).astype(float)
    total_size = shape.prod()
    # calculate maximum slice shape:
    slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)
    # create a list of slices for each dimension:
    slices = [[slice(left, min(right, n))
               for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
              for n, step_size in zip(shape.astype(int), slice_shape)]
    # np.object was removed in NumPy 1.24; the builtin object is equivalent
    result = np.empty(reduce(lambda a, b: a * len(b), slices, 1), dtype=object)
    for i, el in enumerate(itertools.product(*slices)):
        result[i] = el
    result.shape = np.ceil(shape / slice_shape).astype(int)
    return result
Here's an example usage:
>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> slices = subdivide(ar.shape, 16)
>>> slices
array([[(slice(0, 2, None), slice(0, 6, None)),
(slice(0, 2, None), slice(6, 12, None)),
(slice(0, 2, None), slice(12, 15, None))],
[(slice(2, 4, None), slice(0, 6, None)),
(slice(2, 4, None), slice(6, 12, None)),
(slice(2, 4, None), slice(12, 15, None))],
[(slice(4, 6, None), slice(0, 6, None)),
(slice(4, 6, None), slice(6, 12, None)),
(slice(4, 6, None), slice(12, 15, None))]], dtype=object)
>>> ar[slices[1,0]]
array([[30, 31, 32, 33, 34, 35],
[45, 46, 47, 48, 49, 50]])
>>> ar[slices[0,2]]
array([[12, 13, 14],
[27, 28, 29]])
>>> ar[slices[2,1]]
array([[66, 67, 68, 69, 70, 71],
[81, 82, 83, 84, 85, 86]])
>>> ar[slices[:2,1:3]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: arrays used as indices must be of integer (or boolean) type
Here's a solution based on hpaulj's answer:
import numpy as np
import itertools

class Subdivision():
    def __init__(self, shape, max_chunk_size=500000):
        shape = np.array(shape).astype(float)
        total_size = shape.prod()
        # calculate maximum slice shape:
        slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)
        # create a list of slices for each dimension:
        slices = [[slice(left, min(right, n))
                   for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
                  for n, step_size in zip(shape.astype(int), slice_shape)]
        # grid of blocks plus a trailing axis holding one slice per data dimension
        # (np.object was removed in NumPy 1.24; the builtin object is equivalent)
        self.slices = np.array(list(itertools.product(*slices)),
                               dtype=object).reshape(tuple(np.ceil(shape / slice_shape).astype(int)) + (len(shape),))

    def __getitem__(self, args):
        if type(args) != tuple:
            args = (args,)
        # turn integer index into equivalent slice
        args = tuple(slice(arg, arg + 1 if arg != -1 else None) if type(arg) == int else arg for arg in args)
        # select the slices
        # always select all elements from the last axis (which contains slices for each data dimension)
        slices = self.slices[args + ((slice(None),) if Ellipsis in args else (Ellipsis, slice(None)))]
        return np.ix_(*tuple(np.r_[tuple(slices[tuple([0] * i + [slice(None)] +
                                                     [0] * (len(slices.shape) - 2 - i) + [i])])]
                             for i in range(len(slices.shape) - 1)))
Example usage:
>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> subdiv = Subdivision(ar.shape, 16)
>>> ar[subdiv[...]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> ar[subdiv[0]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> ar[subdiv[:2,1]]
array([[ 6, 7, 8, 9, 10, 11],
[21, 22, 23, 24, 25, 26],
[36, 37, 38, 39, 40, 41],
[51, 52, 53, 54, 55, 56]])
>>> ar[subdiv[2,:3]]
array([[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> ar[subdiv[...,:2]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]])
Your slices produce 2x6 and 2x3 arrays.
In [36]: subslice=slices[:2,1:3]
In [37]: subslice[0,0]
Out[37]: array([slice(0, 2, None), slice(6, 12, None)], dtype=object)
In [38]: ar[tuple(subslice[0,0])]
Out[38]:
array([[ 6, 7, 8, 9, 10, 11],
[21, 22, 23, 24, 25, 26]])
My numpy version expects me to turn the subslice into a tuple. This is the same as
ar[slice(0,2), slice(6,12)]
ar[:2, 6:12]
That's just the basic syntax of indexing and slicing. ar is 2d, so ar[(i,j)] requires a 2 element tuple - of slices, lists, arrays, or integers. It won't work with an array of slice objects.
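As a small illustration of that point, the sketch below stores two slice objects in an object array (the name stored is hypothetical) and converts them to a tuple before indexing. It assumes the same 6x15 ar as in the question.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# One row of an object array holding a slice per data dimension.
stored = np.empty((1, 2), dtype=object)
stored[0, 0] = slice(0, 2)
stored[0, 1] = slice(6, 12)

# Converting the row to a tuple makes it a valid 2d index.
index = tuple(stored[0])
assert np.array_equal(ar[index], ar[:2, 6:12])
print(ar[index].shape)  # (2, 6)
```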
However, it is possible to concatenate the results into a larger array. That can be done after indexing, or the slices can be converted into index lists.
np.bmat, for example, concatenates a 2d arrangement of arrays:
In [42]: np.bmat([[ar[tuple(subslice[0,0])], ar[tuple(subslice[0,1])]],
[ar[tuple(subslice[1,0])],ar[tuple(subslice[1,1])]]])
Out[42]:
matrix([[ 6, 7, 8, 9, 10, 11, 12, 13, 14],
[21, 22, 23, 24, 25, 26, 27, 28, 29],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[51, 52, 53, 54, 55, 56, 57, 58, 59]])
You could generalize this. It just uses hstack and vstack on the nested lists. The result is np.matrix but can be converted back to array.
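If a plain ndarray is preferred over np.matrix, np.block takes the same kind of nested list. A minimal sketch, using the four blocks from the example above written out as explicit slices; grid and joined are illustrative names:

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# 2x2 arrangement of neighbouring blocks.
grid = [[ar[0:2, 6:12], ar[0:2, 12:15]],
        [ar[2:4, 6:12], ar[2:4, 12:15]]]

joined = np.block(grid)
print(type(joined).__name__, joined.shape)  # ndarray (4, 9)
assert np.array_equal(joined, ar[:4, 6:15])
```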
The other approach is to use tools like np.arange, np.r_, and np.ix_ to create index arrays. It'll take some playing around to generate an example.
To combine the [0,0] and [0,1] subslices:
In [64]: j = np.r_[subslice[0,0,1],subslice[0,1,1]]
In [65]: i = np.r_[subslice[0,0,0]]
In [66]: i,j
Out[66]: (array([0, 1]), array([ 6, 7, 8, 9, 10, 11, 12, 13, 14]))
In [68]: ix = np.ix_(i,j)
In [69]: ix
Out[69]:
(array([[0],
[1]]), array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14]]))
In [70]: ar[ix]
Out[70]:
array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14],
[21, 22, 23, 24, 25, 26, 27, 28, 29]])
Or with i = np.r_[subslice[0,0,0], subslice[1,0,0]], ar[np.ix_(i,j)] produces the 4x9 array.
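Put together as a standalone snippet, the 4x9 case might look like the sketch below. The slices are written out explicitly here instead of being read from subslice, which is an assumption made only to keep the example self-contained.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# np.r_ expands each slice into its indices and concatenates them.
i = np.r_[slice(0, 2), slice(2, 4)]      # rows 0..3
j = np.r_[slice(6, 12), slice(12, 15)]   # columns 6..14

out = ar[np.ix_(i, j)]
print(out.shape)  # (4, 9)
assert np.array_equal(out, ar[:4, 6:15])
```

Unlike basic slicing, indexing with these integer arrays returns a copy of the selected data.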