The question is as follows:
x = np.arange(100)
Write Python code to split the following array at these intervals: 10, 25, 45, 75, 95
I have used the split function but am unable to split at these specific intervals. Can anyone suggest another method, or am I doing it wrong?
Here are both the manual way and the NumPy way with split.
import numpy as np
# Manual method
x = np.arange(100)
split_indices = [10, 25, 45, 75, 95]
split_arrays = []
# Pair each start index with the next split index: (0, 10), (10, 25), ...
for i, j in zip([0] + split_indices[:-1], split_indices):
    split_arrays.append(x[i:j])
print(split_arrays)
# Numpy method
split_arrays_np = np.split(x, split_indices)
print(split_arrays_np)
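The manual method prints the five arrays below; np.split returns the same five plus a sixth, array([95, 96, 97, 98, 99]), holding the elements after the last index.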
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]),
array([25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]),
array([45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]),
array([75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94])
]
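As a quick check of how np.split treats the tail after the last index, here is a minimal sketch. The names pieces, lengths and without_tail are only illustrative and not part of the original answer.

```python
import numpy as np

x = np.arange(100)
split_indices = [10, 25, 45, 75, 95]

# np.split cuts *at* each index, so N indices produce N + 1 pieces.
pieces = np.split(x, split_indices)
lengths = [len(p) for p in pieces]  # [10, 15, 20, 30, 20, 5]

# Drop the last piece if only the ranges up to index 95 are wanted.
without_tail = pieces[:-1]
```

If the exercise expects exactly five pieces, slicing off the tail like this is one option; otherwise the sixth piece is simply the remainder of the array.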
I'm using the following method to convert from one range of numbers to another:
def pos(self, value):
    return int(math.floor(self.nitems * ((value - self.vmin) / float(self.vrange))))
The problem is that it does not do this consistently. For example, with
min-max/range : 10-100/90
nitems : 100
i.e.:
10-100 => 0-100
I get gaps; missing: -1,19,29,39,49,....,99,100
In [66]: np.array([ne.pos(x) for x in range(100)])
Out[66]: array([-12, -10, -9, -8, -7, -6, -5, -4, -3, -2, 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24,
25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61,
62, 63, 64, 65, 66, 67, 68, 70, 71, 72, 73, 74, 75, 76, 77, 78, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91, 92, 93, 94, 95, 96, 97,
98])
Where is the error in my formula? Is there a consistent way to do this for any range?
Is it because of the rounding?
It has to "land" on every value in my target range, and it has to be sequential.
Hmm... now that I think about it, maybe this is not possible when the input range is smaller than the output range, unless I allow real numbers?
It seems so: when I try with floats there are no gaps:
In [82]: np.array([ne.pos(x) for x in np.linspace(0,101,150)])
Out[82]: array([-12, -11, -10, -9, -9, -8, -7, -6, -6, -5, -4, -3, -3, -2, -1, 0, 0, 1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 12, 12,
13, 14, 15, 16, 16, 17, 18, 19, 19, 20, 21, 22, 22, 23, 24, 25, 25, 26, 27, 28, 28, 29, 30, 31, 31, 32, 33, 34, 34, 35, 36, 37, 37,
38, 39, 40, 40, 41, 42, 43, 43, 44, 45, 46, 46, 47, 48, 49, 49, 50, 51, 52, 52, 53, 54, 55, 55, 56, 57, 58, 58, 59, 60, 61, 61, 62,
63, 64, 64, 65, 66, 67, 67, 68, 69, 70, 70, 71, 72, 73, 73, 74, 75, 76, 77, 77, 78, 79, 80, 80, 81, 82, 83, 83, 84, 85, 86, 86, 87,
88, 89, 89, 90, 91, 92, 92, 93, 94, 95, 95, 96, 97, 98, 98, 99, 100, 101])
It seems as though you are over-complicating this, if all you're trying to do is create a second range of integers that has the same length as the first range but a different starting point.
a = range(10, 100)
len(a) == 90
list(a) == [10, 11, 12, ..., 98, 99]
b = range(0, len(a))
len(b) == 90
list(b) == [0, 1, 2, ..., 88, 89]
c = range(20, 20 + len(a))
len(c) == 90
list(c) == [20, 21, 22, ..., 108, 109]
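To see why the formula in the question skips values, the following sketch reproduces it as a standalone function with the question's example parameters (vmin=10, vrange=90, nitems=100) as defaults; the function signature and the variable names are assumptions made for illustration. With only 90 integer inputs and a step of 100/90 per input, some of the 100 target values can never be produced.

```python
import math

def pos(value, vmin=10, vrange=90, nitems=100):
    # Standalone version of the question's method (hypothetical signature).
    return int(math.floor(nitems * ((value - vmin) / float(vrange))))

inputs = range(10, 100)                   # 90 integer inputs
hit = {pos(v) for v in inputs}            # at most 90 distinct outputs
missing = sorted(set(range(100)) - hit)   # targets in 0..99 never produced

# Shifting by the offset keeps the step at exactly 1, so nothing is skipped.
shifted = [v - 10 for v in inputs]
```

Here missing comes out as every tenth value starting at 9, which matches the gaps visible in the question's output. Keeping the output range the same length as the input range, as in the answer above, avoids the problem entirely.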
I am looking for a way to reshape the following 1d-numpy array:
# dimensions
n = 2 # int : 1 ... N
h = 2 # int : 1 ... N
m = n*(2*h+1)
input_data = np.arange(0,(n*(2*h+1))**2)
The expected output should be reshaped into (2*h+1)**2 blocks of shape (n,n) such as:
>>> input_data.reshape(((2*h+1)**2,n,n))
array([[[ 0 1]
[ 2 3]]
[[ 4 5]
[ 6 7]]
...
[[92 93]
[94 95]]
[[96 97]
[98 99]]]
These blocks finally need to be reshaped into a (m,m) matrix so that they are stacked in rows of 2*h+1 blocks:
array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
...
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
My problem is that I can't seem to find the proper axis permutation after the first reshape into (n,n) blocks. I have looked at several answers such as this one, but in vain.
As the real dimensions n and h are much larger and this operation takes place in an iterative process, I am looking for an efficient reshaping operation.
I don't think you can do this with reshape and transpose alone (although I'd love to be proven wrong). Using np.block works, but it's a bit messy:
np.block([list(i) for i in input_data.reshape( (2*h+1), (2*h+1), n, n )])
array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
[20, 21, 24, 25, 28, 29, 32, 33, 36, 37],
[22, 23, 26, 27, 30, 31, 34, 35, 38, 39],
[40, 41, 44, 45, 48, 49, 52, 53, 56, 57],
[42, 43, 46, 47, 50, 51, 54, 55, 58, 59],
[60, 61, 64, 65, 68, 69, 72, 73, 76, 77],
[62, 63, 66, 67, 70, 71, 74, 75, 78, 79],
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
EDIT: Never mind, you can do it without np.block:
input_data.reshape( (2*h+1), (2*h+1), n, n).transpose(0, 2, 1, 3).reshape(m, m)
array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
[20, 21, 24, 25, 28, 29, 32, 33, 36, 37],
[22, 23, 26, 27, 30, 31, 34, 35, 38, 39],
[40, 41, 44, 45, 48, 49, 52, 53, 56, 57],
[42, 43, 46, 47, 50, 51, 54, 55, 58, 59],
[60, 61, 64, 65, 68, 69, 72, 73, 76, 77],
[62, 63, 66, 67, 70, 71, 74, 75, 78, 79],
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
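The reshape/transpose approach can be wrapped for arbitrary n and h. This is a minimal sketch; the function name blocks_to_matrix and the local name k are hypothetical and not from the original answer.

```python
import numpy as np

def blocks_to_matrix(flat, n, h):
    """Arrange (2*h+1)**2 consecutive (n, n) blocks into an (m, m) matrix."""
    k = 2 * h + 1
    # Axes after reshape: (block row, block col, row in block, col in block).
    # Swapping the two middle axes groups the in-block row with the block row,
    # so the final reshape lays the blocks side by side.
    return flat.reshape(k, k, n, n).transpose(0, 2, 1, 3).reshape(k * n, k * n)

n, h = 2, 2
input_data = np.arange((n * (2 * h + 1)) ** 2)
result = blocks_to_matrix(input_data, n, h)

# Cross-check against the np.block version from the answer.
reference = np.block([list(i) for i in input_data.reshape(2 * h + 1, 2 * h + 1, n, n)])
```

The first reshape and the transpose do not copy data; the final reshape of the transposed array generally has to copy, which is still a single vectorized pass rather than a Python loop.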
I have a NumPy array with the x and y values of points. I have another array which contains pairs of start and end indices. Originally this data was in a pandas DataFrame, but since it had over 60 million items, indexing with loc was very slow. Is there a fast NumPy method to split this?
import numpy as np
xy_array = np.arange(100).reshape(2,-1)
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
split_pairs = [[0, 10], [10, 13], [13, 17], [20, 22]]
expected_result = [
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]],
[[10, 11, 12], [60, 61, 62]],
[[13, 14, 15, 16], [63, 64, 65, 66]],
[[20, 21], [70, 71]]
]
Update:
It is not always the case that the next pair starts at the end of the previous one.
This will do it:
import numpy as np
xy_array = np.array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
split_pairs = [[0, 10], [10, 13], [13, 17]]
expected_result = [xy_array[:, x:y] for x, y in split_pairs]
expected_result
#[array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]]), array([[10, 11, 12],
# [60, 61, 62]]), array([[13, 14, 15, 16],
# [63, 64, 65, 66]])]
This uses basic index slicing of the form array[rows, columns]: the : selects all rows, and x:y selects the columns from x up to (but not including) y.
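The same slicing also handles the non-contiguous pair from the question's update. A minimal sketch, using the question's data and the full list of pairs (the names split_pairs and pieces are illustrative):

```python
import numpy as np

xy_array = np.arange(100).reshape(2, -1)
# Includes the pair [20, 22], which does not start where the previous one ends.
split_pairs = np.array([[0, 10], [10, 13], [13, 17], [20, 22]])

# Each slice takes all rows and the columns start..stop-1.
pieces = [xy_array[:, start:stop] for start, stop in split_pairs]

# Basic slicing returns views, so no point data is copied.
is_view = np.shares_memory(pieces[0], xy_array)
```

Because each piece is a view, the cost per pair is small and independent of the slice length; the Python-level loop over the pairs is the only remaining overhead.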
You can also use np.array_split, which accepts either a number of sections or a list of split indices:
>>> x = np.arange(8.0)
>>> np.array_split(x, 3)
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7.])]
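For the start/end pairs in the question, one possible way to use the split functions is to flatten the pairs into a single index list and keep every second piece. This is only a sketch and it assumes the pairs are sorted and do not overlap; the names split_pairs, all_pieces and wanted are illustrative.

```python
import numpy as np

xy_array = np.arange(100).reshape(2, -1)
split_pairs = np.array([[0, 10], [10, 13], [13, 17], [20, 22]])

# Flattened indices: [0, 10, 10, 13, 13, 17, 20, 22].
# np.split cuts the columns at each index, so the pieces alternate between
# "gap before a pair" (possibly empty) and "the pair itself".
all_pieces = np.split(xy_array, split_pairs.ravel(), axis=1)

# Pieces 1, 3, 5, ... are the requested [start, stop) ranges.
wanted = all_pieces[1::2]
```

If the pairs can overlap or arrive unsorted, this trick no longer applies and plain slicing per pair is the safer choice.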
I'm experimenting in ipython3, where I created an array of arrays:
In [105]: counts_array
Out[105]:
array([array([ 17, 59, 320, ..., 1, 7, 0], dtype=uint32),
array([ 30, 71, 390, ..., 12, 20, 6], dtype=uint32),
array([ 7, 145, 214, ..., 4, 12, 0], dtype=uint32),
array([ 23, 346, 381, ..., 15, 19, 5], dtype=uint32),
array([ 51, 78, 270, ..., 3, 0, 2], dtype=uint32),
array([212, 149, 511, ..., 19, 31, 8], dtype=uint32)], dtype=object)
In [106]: counts_array.shape
Out[106]: (6,)
In [107]: counts_array[0].shape
Out[107]: (1590,)
I would like to obtain a plain shape=(6, 1590), dtype=uint32 array from this monster I created.
How can I do that?
You can use np.vstack -
np.vstack(counts_array)
Another way with np.concatenate -
np.concatenate(counts_array).reshape(len(counts_array),-1)
Sample run -
In [23]: a
Out[23]:
array([array([68, 92, 84, 35, 14, 71, 55, 40, 21, 41]),
array([30, 90, 52, 64, 86, 68, 61, 85, 26, 98]),
array([98, 64, 23, 49, 13, 17, 52, 96, 97, 19]),
array([54, 26, 25, 22, 95, 77, 20, 73, 22, 80]),
array([15, 84, 91, 54, 25, 21, 37, 19, 25, 25]),
array([87, 17, 49, 74, 11, 34, 27, 23, 22, 83])], dtype=object)
In [24]: np.vstack(a)
Out[24]:
array([[68, 92, 84, 35, 14, 71, 55, 40, 21, 41],
[30, 90, 52, 64, 86, 68, 61, 85, 26, 98],
[98, 64, 23, 49, 13, 17, 52, 96, 97, 19],
[54, 26, 25, 22, 95, 77, 20, 73, 22, 80],
[15, 84, 91, 54, 25, 21, 37, 19, 25, 25],
[87, 17, 49, 74, 11, 34, 27, 23, 22, 83]])
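To confirm that the dtype of the sub-arrays survives the stacking, here is a small self-contained sketch. The way the object array is built (np.empty with dtype=object, filled in a loop) and the sample values are assumptions for illustration; the question does not show how counts_array was created.

```python
import numpy as np

# Build a 1-D object array whose elements are equal-length uint32 arrays.
rows = [np.arange(5, dtype=np.uint32) + i for i in range(3)]
counts_array = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    counts_array[i] = row

# vstack treats the object array as a sequence of rows.
stacked = np.vstack(counts_array)
```

The result is a regular 2-D array whose dtype is taken from the sub-arrays rather than from the object container. This only works when all sub-arrays have the same length.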
After various experiments, it turns out that the following simple syntax just works:
numpy.array([sub_array for sub_array in counts_array])
My first working version was unnecessarily complicated:
numpy.array([[*sub_array] for sub_array in counts_array], dtype=numpy.uint32)
Have you considered numpy.vstack()?
I use it very often for this kind of operation.