Related
I have a numpy array of shape (1000000,).
I would like every n=1000 rows to become columns.
The resulting shape should be (1000, 1000)
How can I do this with NumPy? np.transpose() doesn't seem to do what I want.
I don't want to use a for loop for performance reasons.
You can use reshape with the order='F' parameter:
Example with a (100,) 1D array converted to (10,10) 2D array:
a = np.arange(100). # array([0, 1, 2, ..., 98, 99])
b = a.reshape((10,10), order='F')
Output:
>>> b
array([[ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90],
[ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91],
[ 2, 12, 22, 32, 42, 52, 62, 72, 82, 92],
[ 3, 13, 23, 33, 43, 53, 63, 73, 83, 93],
[ 4, 14, 24, 34, 44, 54, 64, 74, 84, 94],
[ 5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
[ 6, 16, 26, 36, 46, 56, 66, 76, 86, 96],
[ 7, 17, 27, 37, 47, 57, 67, 77, 87, 97],
[ 8, 18, 28, 38, 48, 58, 68, 78, 88, 98],
[ 9, 19, 29, 39, 49, 59, 69, 79, 89, 99]])
I have a class like this
class A:
def __init__(self):
self.top_left = (1,2)
self.arr = np.reshape(np.arange(100), (10,10))
def __setitem__(self, key, val):
return self.arr[shifted(key, self.top_left)] = val
I want all the row indices appear in key to be shifted by 1 and all the column indices appear in key shifted by 2. Is it possible?
Edit:
Consider a = A() and a.arr to be
[[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
Now when I set a[0,0] = 5, a.arr changes at index (1,2). Because it gets shifted by (1,2).
Again if I set a[3:6, 3:6] = np.ones((3,3)) then a.arr looks like this:
[[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 1, 1, 1, 48, 49],
[50, 51, 52, 53, 54, 1, 1, 1, 58, 59],
[60, 61, 62, 63, 64, 1, 1, 1, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
because all the index in the key, gets shifted by (1,2).
Edit 2:
Currently I am storing the values in a separate array. And then putting this whole array, back to arr.
self.arr2[key] = value
self.arr[self.top_left[1] : self.top_left[1] + self.shape[1],
self.top_left[0] : self.top_left[0] + self.shape[1],
] = self.arr2
self.shape is shape of the editable window in a.arr
Numpy array operate on builtin python slice or tuple.
shifter function decides what kind of index you passed.
import numpy as np
class A:
def __init__(self):
self.top_left = (1,2)
self.arr = np.reshape(np.arange(100), (10,10))
def __setitem__(self, key, val):
self.arr[self.shifter(key)] = val
def shifter(self, key):
if isinstance(key[0], slice):
shift_func = self.shifted_slice
else:
shift_func = self.shifted_point
return shift_func(key)
def shifted_slice(self, key):
row_slice, col_slice = key
row_offset, col_offset = self.top_left
return slice(row_slice.start + row_offset, row_slice.stop + row_offset), \
slice(col_slice.start + col_offset, col_slice.stop + col_offset)
def shifted_point(self, key):
row_num, col_num = key
row_offset, col_offset = self.top_left
return row_num + row_offset, \
col_num + col_offset
a = A()
a[0, 0] = 5
a[3:6, 3:6] = np.ones((3,3))
print(a.arr)
Outputs:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 5, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 1, 1, 1, 48, 49],
[50, 51, 52, 53, 54, 1, 1, 1, 58, 59],
[60, 61, 62, 63, 64, 1, 1, 1, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
I am trying to access (read/write) numpy.ndarrays periodically. In other words, if I have my_array with the shape of 10*10 and I use the access operator with the inputs:
my_arrray[10, 10] or acess_function(my_array, 10, 10)
I can have access to element
my_array[0, 0].
I want to have read/write ability at my returned element of periodically indexed array.
Can anyone how to do it without making a shifted copy of my original array?
I think this does what you want but I'm not sure whether there's something more elegant that exists. It's probably possible to write a general function for an Nd array but this does 2D only. As you said it uses modular arithmetic.
import numpy as np
def access(shape, ixr, ixc):
""" Returns a selection. """
return np.s_[ixr % shape[0], ixc % shape[1]]
arr = np.arange(100)
arr.shape = 10,10
arr[ access(arr.shape, 45, 87) ]
# 57
arr[access(arr.shape, 45, 87)] = 100
In [18]: arr
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
# [ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
# [ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
# [ 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
# [ 50, 51, 52, 53, 54, 55, 56, **100**, 58, 59],
# [ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
# [ 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
# [ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
# [ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
Edit - Generic nD version
def access(shape, *args):
if len(shape) != len(args):
error = 'Inconsistent number of dimemsions: {} & number of indices: {} in coords.'
raise IndexError( error.format(len(shape), len(args)))
res = []
for limit, ix in zip(shape, args):
res.append(ix % limit)
return tuple(res)
Usage/Test
a = np.arange(24)
a.shape = 2,3,4
a[access(a.shape, 5, 6, 7)]
# 15
a[access(a.shape, 5,6,7) ] = 100
a
# array([[[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]],
# [[ 12, 13, 14, 100],
# [ 16, 17, 18, 19],
# [ 20, 21, 22, 23]]])
I have an numpy array with x and y values of points. I have another array which contains pairs of start and end indices. Originally this data was in pandas DataFrame, but since it was over 60 millions items, the loc algorithm was very slow. Is there any numpy fast method to split this?
import numpy as np
xy_array = np.arange(100).reshape(2,-1)
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
split_paris = [[0, 10], [10, 13], [13, 17], [20, 22]]
expected_result = [
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]],
[[10, 11, 12], [60, 61, 62]],
[[13, 14, 15, 16], [63, 64, 65, 66]],
[[20, 21], [70, 71]]
]
Update:
It is not always the case that, next pair will start from end of previous.
This will do it:
import numpy as np
xy_array = np.array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
split_paris = [[0, 10], [10, 13], [13, 17]]
expected_result = [xy_array[:, x:y] for x, y in split_paris]
expected_result
#[array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]]), array([[10, 11, 12],
# [60, 61, 62]]), array([[13, 14, 15, 16],
# [63, 64, 65, 66]])]
It is using index slicing basically working in sense array[rows, columns] having : take all rows and x:y taking columns from x to y.
you can always use the np.array_split function provided by numpy. and use the ranges you want
x = np.arange(8.0)
>>> np.array_split(x, 3)
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7.])]
Given an array image which might be a 2D, 3D or 4D, but preferable nD array, I want to extract a contiguous part of the array around a point with a list denoting how I extend in along all axis and pad the array with a pad_value if the extensions is out of the image.
I came up with this:
def extract_patch_around_point(image, loc, extend, pad_value=0):
offsets_low = []
offsets_high = []
for i, x in enumerate(loc):
offset_low = -np.min([x - extend[i], 0])
offsets_low.append(offset_low)
offset_high = np.max([x + extend[i] - image.shape[1] + 1, 0])
offsets_high.append(offset_high)
upper_patch_offsets = []
lower_image_offsets = []
upper_image_offsets = []
for i in range(image.ndim):
upper_patch_offset = 2*extend[i] + 1 - offsets_high[i]
upper_patch_offsets.append(upper_patch_offset)
image_offset_low = loc[i] - extend[i] + offsets_low[i]
image_offset_high = np.min([loc[i] + extend[i] + 1, image.shape[i]])
lower_image_offsets.append(image_offset_low)
upper_image_offsets.append(image_offset_high)
patch = pad_value*np.ones(2*np.array(extend) + 1)
# This is ugly
A = np.ix_(range(offsets_low[0], upper_patch_offsets[0]),
range(offsets_low[1], upper_patch_offsets[1]))
B = np.ix_(range(lower_image_offsets[0], upper_image_offsets[0]),
range(lower_image_offsets[1], upper_image_offsets[1]))
patch[A] = image[B]
return patch
Currently it only works in 2D because of the indexing trick with A, B etc. I do not want to check for the number of dimensions and use a different indexing scheme. How can I make this independent on image.ndim?
Based on my understanding of the requirements, I would suggest a zeros-padded version and then using slice notation to keep it generic on number of dimensions, like so -
def extract_patch_around_point(image, loc, extend, pad_value=0):
extend = np.asarray(extend)
image_ext_shp = image.shape + 2*np.array(extend)
image_ext = np.full(image_ext_shp, pad_value)
insert_idx = [slice(i,-i) for i in extend]
image_ext[insert_idx] = image
region_idx = [slice(i,j) for i,j in zip(loc,extend*2+1+loc)]
return image_ext[region_idx]
Sample runs -
2D case :
In [229]: np.random.seed(1234)
...: image = np.random.randint(11,99,(13,8))
...: loc = (5,3)
...: extend = np.array([2,4])
...:
In [230]: image
Out[230]:
array([[58, 94, 49, 64, 87, 35, 26, 60],
[34, 37, 41, 54, 41, 37, 69, 80],
[91, 84, 58, 61, 87, 48, 45, 49],
[78, 22, 11, 86, 91, 14, 13, 30],
[23, 76, 86, 92, 25, 82, 71, 57],
[39, 92, 98, 24, 23, 80, 42, 95],
[56, 27, 52, 83, 67, 81, 67, 97],
[55, 94, 58, 60, 29, 96, 57, 48],
[49, 18, 78, 16, 58, 58, 26, 45],
[21, 39, 15, 93, 66, 89, 34, 61],
[73, 66, 95, 11, 44, 32, 82, 79],
[92, 63, 75, 96, 52, 12, 25, 14],
[41, 23, 84, 30, 37, 79, 75, 33]])
In [231]: image[loc]
Out[231]: 24
In [232]: out = extract_patch_around_point(image, loc, extend, pad_value=0)
In [233]: out
Out[233]:
array([[ 0, 78, 22, 11, 86, 91, 14, 13, 30],
[ 0, 23, 76, 86, 92, 25, 82, 71, 57],
[ 0, 39, 92, 98, 24, 23, 80, 42, 95], <-- At middle
[ 0, 56, 27, 52, 83, 67, 81, 67, 97],
[ 0, 55, 94, 58, 60, 29, 96, 57, 48]])
^
3D case :
In [234]: np.random.seed(1234)
...: image = np.random.randint(11,99,(13,5,8))
...: loc = (5,2,3)
...: extend = np.array([1,2,4])
...:
In [235]: image[loc]
Out[235]: 82
In [236]: out = extract_patch_around_point(image, loc, extend, pad_value=0)
In [237]: out.shape
Out[237]: (3, 5, 9)
In [238]: out
Out[238]:
array([[[ 0, 23, 87, 19, 58, 98, 36, 32, 33],
[ 0, 56, 30, 52, 58, 47, 50, 28, 50],
[ 0, 70, 93, 48, 98, 49, 19, 65, 28],
[ 0, 52, 58, 30, 54, 55, 46, 53, 31],
[ 0, 37, 34, 13, 76, 38, 89, 79, 71]],
[[ 0, 14, 92, 58, 72, 74, 43, 24, 67],
[ 0, 59, 69, 46, 68, 71, 94, 20, 71],
[ 0, 61, 62, 60, 82, 92, 15, 14, 57], <-- At middle
[ 0, 58, 74, 95, 16, 94, 83, 83, 74],
[ 0, 67, 25, 92, 71, 19, 52, 44, 80]],
[[ 0, 74, 28, 12, 12, 13, 62, 88, 63],
[ 0, 25, 58, 86, 76, 40, 20, 91, 61],
[ 0, 28, 42, 85, 22, 45, 64, 35, 66],
[ 0, 64, 34, 69, 27, 17, 92, 89, 68],
[ 0, 15, 57, 86, 17, 98, 29, 59, 50]]])
^
Here is a simple working example that demonstrates how to iteratively "wittle down" your input matrix to obtain the patch around a point in nDims:
import numpy as np
# Givens. Matrix to be sliced, point around which to slice,
# and the padding around the given point
matrix = np.random.normal(size=[5,5,5])
loc = (3,3,3)
padding = 2
# If one knows the dimensionality, the slice can be obtained easily
ans1 = matrix[loc[0] - padding:loc[0] + 1,
loc[1] - padding:loc[1] + 1,
loc[2] - padding:loc[2] + 1]
# If one does not know the dimensionality, the slice can be
# obtained iteratively
ans2 = matrix
for i in range(matrix.ndim):
# Compute slice for the particular axis
s = slice(loc[i] - padding, loc[i] + 1, 1)
# Move particular axis to front, slice it, then move it back
ans2 = np.moveaxis(np.moveaxis(ans2, i, 0)[s], 0, i)
# Assert the two answers are equal
np.testing.assert_array_equal(ans1, ans2)
This example does not take into account slicing beyond the existing dimensions, but that exception can be easily caught in the loop.