I have a NumPy array with the x and y values of points, and another array containing pairs of start and end indices. The data was originally in a pandas DataFrame, but with over 60 million items, selecting with loc was very slow. Is there a fast NumPy way to split the array by these index pairs?
import numpy as np
xy_array = np.arange(100).reshape(2,-1)
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
split_paris = [[0, 10], [10, 13], [13, 17], [20, 22]]
expected_result = [
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]],
[[10, 11, 12], [60, 61, 62]],
[[13, 14, 15, 16], [63, 64, 65, 66]],
[[20, 21], [70, 71]]
]
Update:
The next pair does not always start where the previous one ends.
This will do it:
import numpy as np
xy_array = np.array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
split_paris = [[0, 10], [10, 13], [13, 17]]
expected_result = [xy_array[:, x:y] for x, y in split_paris]
expected_result
#[array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]]), array([[10, 11, 12],
# [60, 61, 62]]), array([[13, 14, 15, 16],
# [63, 64, 65, 66]])]
This uses basic index slicing of the form array[rows, columns]: the : takes all rows, and x:y takes the columns from x up to, but not including, y.
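For inputs as large as the 60 million items mentioned in the question, it may help that basic slices are views rather than copies. Here is a minimal sketch, assuming the pairs are half-open [start, end) column ranges; the names pairs and pieces are illustrative and not from the original post:

```python
import numpy as np

xy_array = np.arange(100).reshape(2, -1)

# Pairs need not be contiguous: note the gap between 17 and 20.
pairs = np.array([[0, 10], [10, 13], [13, 17], [20, 22]])

# Each basic slice is a view into xy_array, so no point data is copied.
pieces = [xy_array[:, start:end] for start, end in pairs]

print(pieces[3])
print(all(np.shares_memory(p, xy_array) for p in pieces))
```

Because the pieces are views, writing into one of them also changes xy_array; call .copy() on a piece if independent data is needed.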
You can also use NumPy's np.array_split function. Given an integer, it splits the array into that many nearly equal parts; to cut at specific positions, pass a list of indices instead.
>>> x = np.arange(8.0)
>>> np.array_split(x, 3)
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7.])]
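To cut at chosen positions rather than into equal parts, np.split (and np.array_split) also accept a list of indices. The sketch below assumes the pairs are contiguous, so only the boundaries 10, 13 and 17 are needed; it would not reproduce a pair such as [20, 22] that leaves a gap:

```python
import numpy as np

xy_array = np.arange(100).reshape(2, -1)

# Split along the column axis at the given boundaries.
# This yields columns 0:10, 10:13, 13:17 and the remainder 17:50.
pieces = np.split(xy_array, [10, 13, 17], axis=1)

for piece in pieces:
    print(piece.shape)
```

The last piece is the leftover tail; drop it with pieces[:-1] if only the listed ranges are wanted.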
The question is as follows:
x = np.arange(100)
Write Python code to split the following array at these intervals: 10, 25, 45, 75, 95
I tried the split function but could not get it to split at these specific intervals. Can anyone suggest another method, or am I doing it wrong?
Here's both the manual way and the numpy way with split.
import numpy as np
# Manual method
x = np.arange(100)
split_indices = [10, 25, 45, 75, 95]
split_arrays = []
for i, j in zip([0]+split_indices[:-1], split_indices):
    split_arrays.append(x[i:j])
print(split_arrays)
# Numpy method
split_arrays_np = np.split(x, split_indices)
print(split_arrays_np)
And the result is (for both, except that np.split also returns the remaining elements 95 to 99 as a sixth array):
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]),
array([25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]),
array([45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]),
array([75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94])
]
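One detail worth checking: np.split returns one more piece than there are indices, because it also returns everything after the last index. Here is a small sketch that makes the manual loop match by appending len(x) as a final boundary; starts and stops are illustrative names:

```python
import numpy as np

x = np.arange(100)
split_indices = [10, 25, 45, 75, 95]

# Pair each start with the next boundary; len(x) closes the final piece.
starts = [0] + split_indices
stops = split_indices + [len(x)]
manual = [x[i:j] for i, j in zip(starts, stops)]

numpy_way = np.split(x, split_indices)

# Compare piece by piece, since the pieces have different lengths.
same = len(manual) == len(numpy_way) and all(
    np.array_equal(a, b) for a, b in zip(manual, numpy_way)
)
print(len(numpy_way), same)
```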
I have a numpy array of shape (1000000,).
I would like every n=1000 rows to become columns.
The resulting shape should be (1000, 1000)
How can I do this with NumPy? np.transpose() doesn't seem to do what I want.
I don't want to use a for loop for performance reasons.
You can use reshape with the order='F' parameter:
Example with a (100,) 1D array converted to (10,10) 2D array:
a = np.arange(100)  # array([0, 1, 2, ..., 98, 99])
b = a.reshape((10,10), order='F')
Output:
>>> b
array([[ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90],
[ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91],
[ 2, 12, 22, 32, 42, 52, 62, 72, 82, 92],
[ 3, 13, 23, 33, 43, 53, 63, 73, 83, 93],
[ 4, 14, 24, 34, 44, 54, 64, 74, 84, 94],
[ 5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
[ 6, 16, 26, 36, 46, 56, 66, 76, 86, 96],
[ 7, 17, 27, 37, 47, 57, 67, 77, 87, 97],
[ 8, 18, 28, 38, 48, 58, 68, 78, 88, 98],
[ 9, 19, 29, 39, 49, 59, 69, 79, 89, 99]])
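The same layout can also be reached with a row-major reshape followed by a transpose. The sketch below uses a small non-square case to make the shapes easier to follow; n and the 12-element array are stand-ins for the question's n=1000 and (1000000,) array:

```python
import numpy as np

n = 3               # every n consecutive elements become one column
a = np.arange(12)   # small stand-in for the (1000000,) array

# Column-major fill: the first n elements go down the first column.
by_order = a.reshape((n, -1), order='F')

# Row-major reshape into rows of length n, then swap the axes.
by_transpose = a.reshape(-1, n).T

print(by_order)
print(np.array_equal(by_order, by_transpose))
```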
I have a class like this
class A:
    def __init__(self):
        self.top_left = (1,2)
        self.arr = np.reshape(np.arange(100), (10,10))
    def __setitem__(self, key, val):
        # shifted() is the helper I am asking about; it does not exist yet
        self.arr[shifted(key, self.top_left)] = val
I want all the row indices that appear in key to be shifted by 1 and all the column indices that appear in key to be shifted by 2. Is this possible?
Edit:
Consider a = A() and a.arr to be
[[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
Now when I set a[0,0] = 5, a.arr changes at index (1,2). Because it gets shifted by (1,2).
Again if I set a[3:6, 3:6] = np.ones((3,3)) then a.arr looks like this:
[[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 1, 1, 1, 48, 49],
[50, 51, 52, 53, 54, 1, 1, 1, 58, 59],
[60, 61, 62, 63, 64, 1, 1, 1, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
because every index in the key gets shifted by (1,2).
Edit 2:
Currently I store the values in a separate array and then write that whole array back into arr.
self.arr2[key] = value
self.arr[self.top_left[0] : self.top_left[0] + self.shape[0],
         self.top_left[1] : self.top_left[1] + self.shape[1],
] = self.arr2
self.shape is shape of the editable window in a.arr
NumPy arrays are indexed with built-in Python slices, integers, or tuples of them.
The shifter method decides which kind of index was passed and shifts it accordingly.
import numpy as np
class A:
    def __init__(self):
        self.top_left = (1,2)
        self.arr = np.reshape(np.arange(100), (10,10))
    def __setitem__(self, key, val):
        self.arr[self.shifter(key)] = val
    def shifter(self, key):
        if isinstance(key[0], slice):
            shift_func = self.shifted_slice
        else:
            shift_func = self.shifted_point
        return shift_func(key)
    def shifted_slice(self, key):
        row_slice, col_slice = key
        row_offset, col_offset = self.top_left
        return slice(row_slice.start + row_offset, row_slice.stop + row_offset), \
               slice(col_slice.start + col_offset, col_slice.stop + col_offset)
    def shifted_point(self, key):
        row_num, col_num = key
        row_offset, col_offset = self.top_left
        return row_num + row_offset, \
               col_num + col_offset
a = A()
a[0, 0] = 5
a[3:6, 3:6] = np.ones((3,3))
print(a.arr)
Outputs:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 5, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 1, 1, 1, 48, 49],
[50, 51, 52, 53, 54, 1, 1, 1, 58, 59],
[60, 61, 62, 63, 64, 1, 1, 1, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
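The shifter above assumes both parts of the key are of the same kind and that slices have an explicit start and stop. A possible alternative that avoids index arithmetic altogether is to keep a view of the shifted region and index into that. This is only a sketch: the class name WindowedArray and the window attribute are illustrative, and it assumes the editable region runs from top_left to the end of the array.

```python
import numpy as np


class WindowedArray:
    def __init__(self, top_left=(1, 2)):
        self.top_left = top_left
        self.arr = np.reshape(np.arange(100), (10, 10))
        row_offset, col_offset = top_left
        # Basic slicing returns a view, so writes to self.window
        # land directly in self.arr at the shifted position.
        self.window = self.arr[row_offset:, col_offset:]

    def __setitem__(self, key, val):
        # Any key NumPy accepts works here: ints, slices, mixed tuples.
        self.window[key] = val


a = WindowedArray()
a[0, 0] = 5                      # writes a.arr[1, 2]
a[3:6, 3:6] = np.ones((3, 3))    # writes a.arr[4:7, 5:8]
a[:1, -1] = -7                   # open-ended slice plus negative index
print(a.arr)
```

Note that negative indices are then relative to the window, not to the full array, and that the view goes stale if self.arr is later rebound to a different array.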
I'm using the following method to convert from one Range to another Range of numbers
def pos(self, value) :
    return int( math.floor(self.nitems * ((value - self.vmin)/float(self.vrange)) ) )
The problem is that it does not map consistently. For example, with
min-max/range : 10-100/90
nitems : 100
i.e.:
10-100 => 0-100
I get gaps; missing: -1,19,29,39,49,....,99,100
In [66]: np.array([ne.pos(x) for x in range(100)])
Out[66]: array([-12, -10, -9, -8, -7, -6, -5, -4, -3, -2, 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24,
25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61,
62, 63, 64, 65, 66, 67, 68, 70, 71, 72, 73, 74, 75, 76, 77, 78, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91, 92, 93, 94, 95, 96, 97,
98])
Where is the error in my formula? Is there a consistent way to do this for any range?
Is it because of the rounding?
It has to "land" on every value in my target range, and it has to be sequential.
Hmm... now that I think about it, maybe this is not possible when the input range is smaller than the output range, unless I allow real numbers?
It seems so; when I try with floats there are no gaps:
In [82]: np.array([ne.pos(x) for x in np.linspace(0,101,150)])
Out[82]: array([-12, -11, -10, -9, -9, -8, -7, -6, -6, -5, -4, -3, -3, -2, -1, 0, 0, 1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 12, 12,
13, 14, 15, 16, 16, 17, 18, 19, 19, 20, 21, 22, 22, 23, 24, 25, 25, 26, 27, 28, 28, 29, 30, 31, 31, 32, 33, 34, 34, 35, 36, 37, 37,
38, 39, 40, 40, 41, 42, 43, 43, 44, 45, 46, 46, 47, 48, 49, 49, 50, 51, 52, 52, 53, 54, 55, 55, 56, 57, 58, 58, 59, 60, 61, 61, 62,
63, 64, 64, 65, 66, 67, 67, 68, 69, 70, 70, 71, 72, 73, 73, 74, 75, 76, 77, 77, 78, 79, 80, 80, 81, 82, 83, 83, 84, 85, 86, 86, 87,
88, 89, 89, 90, 91, 92, 92, 93, 94, 95, 95, 96, 97, 98, 98, 99, 100, 101])
It seems as though you are over-complicating this, if all you're trying to do is create a second range of integers that is the same length but different starting point as the first range.
a = range(10, 100)
len(a) == 90
list(a) == [10, 11, 12, ..., 98, 99]
b = range(0, len(a))
len(b) == 90
list(b) == [0, 1, 2, ..., 88, 89]
c = range(20, 20 + len(a))
len(c) == 90
list(c) == [20, 21, 22, ..., 108, 109]
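The suspicion raised in the question can be checked with a counting argument: 91 integer inputs (10 through 100) cannot land on all 101 integer targets (0 through 100), whatever the rounding. Here is a minimal sketch of that check. The function remap is a hypothetical stand-in, not the asker's pos method, and it uses integer floor division to sidestep floating-point rounding:

```python
def remap(value, src_min, src_max, dst_min, dst_max):
    """Linearly map an integer from [src_min, src_max] onto
    [dst_min, dst_max], rounding down."""
    span_src = src_max - src_min
    span_dst = dst_max - dst_min
    return dst_min + (value - src_min) * span_dst // span_src


# Every target value that some integer input maps to.
hits = {remap(v, 10, 100, 0, 100) for v in range(10, 101)}

# Targets that no integer input can reach.
missing = sorted(set(range(0, 101)) - hits)

print(len(hits), missing)
```

With a source range at least as wide as the target range, the same check should report no missing values.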
I am looking for a way to reshape the following 1d-numpy array:
# dimensions
n = 2 # int : 1 ... N
h = 2 # int : 1 ... N
m = n*(2*h+1)
input_data = np.arange(0,(n*(2*h+1))**2)
The expected output should be reshaped into (2*h+1)**2 blocks of shape (n,n) such as:
input_data.reshape(((2*h+1)**2,n,n))
>>> array([[[ 0 1]
[ 2 3]]
[[ 4 5]
[ 6 7]]
...
[[92 93]
[94 95]]
[[96 97]
[98 99]]]
These blocks finally need to be reshaped into a (m,m) matrix so that they are stacked in rows of 2*h+1 blocks:
>>> array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
...
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
My problem is that I can't seem to find proper axis permutations after the first reshape into (n,n) blocks. I have looked at several answers such as this one but in vain.
Since the real dimensions n and h are considerably larger and this operation takes place in an iterative process, I am looking for an efficient reshaping operation.
I don't think you can do this with reshape and transpose alone (although I'd love to be proven wrong). Using np.block works, but it's a bit messy:
np.block([list(i) for i in input_data.reshape( (2*h+1), (2*h+1), n, n )])
array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
[20, 21, 24, 25, 28, 29, 32, 33, 36, 37],
[22, 23, 26, 27, 30, 31, 34, 35, 38, 39],
[40, 41, 44, 45, 48, 49, 52, 53, 56, 57],
[42, 43, 46, 47, 50, 51, 54, 55, 58, 59],
[60, 61, 64, 65, 68, 69, 72, 73, 76, 77],
[62, 63, 66, 67, 70, 71, 74, 75, 78, 79],
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
EDIT: Never mind, you can do without np.block:
input_data.reshape( (2*h+1), (2*h+1), n, n).transpose(0, 2, 1, 3).reshape(m, m)
array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
[20, 21, 24, 25, 28, 29, 32, 33, 36, 37],
[22, 23, 26, 27, 30, 31, 34, 35, 38, 39],
[40, 41, 44, 45, 48, 49, 52, 53, 56, 57],
[42, 43, 46, 47, 50, 51, 54, 55, 58, 59],
[60, 61, 64, 65, 68, 69, 72, 73, 76, 77],
[62, 63, 66, 67, 70, 71, 74, 75, 78, 79],
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
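For general n and h, the reshape/transpose chain can be wrapped in a small function. This is a sketch under the question's definitions; the function name blocks_to_matrix and the variable k are illustrative:

```python
import numpy as np


def blocks_to_matrix(flat, n, h):
    """Arrange (2*h+1)**2 consecutive (n, n) blocks into an (m, m)
    matrix, with 2*h+1 blocks per block-row."""
    k = 2 * h + 1
    m = n * k
    # Axes after the first reshape: (block_row, block_col, row, col).
    blocks = flat.reshape(k, k, n, n)
    # Reorder to (block_row, row, block_col, col) so that merging
    # adjacent axes gives the matrix rows and columns.
    return blocks.transpose(0, 2, 1, 3).reshape(m, m)


n, h = 2, 2
input_data = np.arange((n * (2 * h + 1)) ** 2)
result = blocks_to_matrix(input_data, n, h)
print(result[:2])
```

The transpose itself is a view, but the final reshape generally has to copy, since the transposed data is no longer contiguous in memory.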