Slice subarray from numpy array by list of indices - python

I have a 2D numpy array input_array and two lists of indices (x_coords and y_coords). I'd like to slice a 3x3 subarray for each x, y pair, centered around the x, y coordinates. The end result will be an array of 3x3 subarrays where the number of subarrays is equal to the number of coordinate pairs I have.
Preferably this should avoid for loops. Currently I use a modification of the Game of Life strides from the scipy cookbook:
http://wiki.scipy.org/Cookbook/GameOfLifeStrides
shape = (input_array.shape[0] - 2, input_array.shape[1] - 2, 3, 3)
strides = input_array.strides + input_array.strides
strided = np.lib.stride_tricks.as_strided(input_array, shape=shape, strides=strides).\
    reshape(shape[0]*shape[1], shape[2], shape[3])
This creates a view of the original array as a (flattened) array of all possible 3x3 subarrays. I then convert the x,y coordinate pairs to be able to select the subarrays I want from strided:
coords = x_coords - 1 + (y_coords - 1)*shape[1]
sub_arrays = strided[coords]
Although this works perfectly fine, I do feel it is a bit cumbersome. Is there a more direct approach? Also, in the future I would like to extend this to the 3D case: slicing n x 3 x 3 subarrays from an n x m x k array. That might also be possible using strides, but so far I haven't been able to make it work in 3D.

Here is a method that uses array broadcasting:
x = np.random.randint(1, 63, 10)
y = np.random.randint(1, 63, 10)
dy, dx = [grid.astype(int) for grid in np.mgrid[-1:1:3j, -1:1:3j]]
Y = dy[None, :, :] + y[:, None, None]
X = dx[None, :, :] + x[:, None, None]
Then you can use a[Y, X] to select blocks from a. Here is some example code:
img = np.zeros((64, 64))
img[Y, X] = 1
Here is the graph plotted by pyplot.imshow():
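For the original problem, the same index arrays can be used to pull blocks out rather than set them: a[Y, X] returns one 3x3 subarray per coordinate pair. A minimal shape check, reusing the img from the example above (an illustration, not part of the original answer):
sub_arrays = img[Y, X]
print(sub_arrays.shape)  # (10, 3, 3): one 3x3 block centred on each (y, x) pair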

A very straightforward solution would be a list comprehension with itertools.product:
import itertools
sub_arrays = [input_array[x-1:x+2, y-1:y+2]
              for x, y in itertools.product(x_coords, y_coords)]
This creates all possible tuples of coordinates and then slices the 3x3 arrays from the input_array.
But this is sort of a for loop. And you will have to take care that x_coords and y_coords are not on the border of the matrix.
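If x_coords and y_coords are meant to be paired one-to-one (one centre per pair, as the question describes) rather than combined, zip would be the drop-in replacement. A small sketch under that assumption:
sub_arrays = [input_array[x-1:x+2, y-1:y+2]
              for x, y in zip(x_coords, y_coords)]  # one 3x3 slice per (x, y) pair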

Related

Fast way to do consecutive one-to-all calculations on Numpy arrays without a for-loop?

I'm working on an optimization problem, but to avoid getting into the details, I'm going to provide a simple example of a bug that's been giving me headaches for a few days.
Say I have a 2D numpy array with observed x-y coordinates:
import numpy as np
from scipy.spatial import distance
x = np.array([[1, 2], [2, 3], [4, 5], [5, 6]])
I also have a list of x-y coordinates to compare to these points (y):
y = np.array([[11, 13], [12, 14]])
I have a function that takes the sum of manhattan differences between a value of x and all of the values in y:
def find_sum(ref_row, comp_rows):
    modeled_counts = []
    y = ref_row * len(comp_rows)
    res = list(map(distance.cityblock, ref_row, comp_rows))
    modeled_counts.append(sum(res))
    return sum(modeled_counts)
Essentially, what I would like to do is find the sum of Manhattan distances between each item in x and all of the items in y (so, for each item in x, find the sum of the Manhattan distances between that (x, y) pair and every (x, y) pair in y).
I've tried this out with the following line of code:
z = list(map(find_sum, x, y))
However, z is of length 2 (like y), and not 4 like x. Is there a way to ensure that z is the result of consecutive one-to-all calculations? That is, I'd like to calculate the sum of all of the manhattan differences between x[0] and every set in y, and so on and so forth, so the length of z should be equal to the length of x.
Is there a simple way to do this without a for loop? My data is rather large (~ 4 million rows), so I'd really appreciate fast solutions. I'm fairly new to Python programming, so any explanations about why the solution works and is fast would be appreciated as well, but definitely isn't required!
Thanks!
This solution implements the distance in numpy, as I think it is a good example of broadcasting, which is a very useful thing to know if you need to use arrays and matrices.
By definition of the Manhattan distance, you need to evaluate the sum of the absolute values of the differences between the columns. However, the first column of x, x[:, 0], has shape (4,) and the first column of y, y[:, 0], has shape (2,), so they are not compatible for subtraction: the broadcasting rules say that shapes are compared starting from the trailing dimensions, and two dimensions are compatible when they are equal or one of them is 1. Sadly, neither holds for your columns.
However, you can add a new dimension of value 1 using np.newaxis, so
x[:, 0]
is array([1, 2, 4, 5]), but
x[:, 0, np.newaxis]
is
array([[1],
       [2],
       [4],
       [5]])
and its shape is (4, 1). Now, a matrix of shape (4, 1) subtracted by an array of shape (2,) results in a matrix of shape (4, 2), by numpy's broadcasting rules:
4 x 1
    2
= 4 x 2
You can obtain the differences for each column:
first_column_difference = x[:, 0, np.newaxis] - y[:, 0]
second_column_difference = x[:, 1, np.newaxis] - y[:, 1]
and evaluate the sum of their absolute values:
np.abs(first_column_difference) + np.abs(second_column_difference)
which results in a (4, 2) matrix. Now, you want to sum the values for each row, so that you have 4 values:
np.sum(np.abs(first_column_difference) + np.abs(second_column_difference), axis=1)
which results in array([44, 40, 32, 28]). The rule is simple: the axis parameter eliminates that dimension from the result, therefore using axis=1 on a (4, 2) matrix generates 4 values; axis=0 would generate 2 values.
So, this will solve your problem:
x = np.array([[1, 2], [2, 3], [4, 5], [5, 6]])
y = np.array([[11, 13], [12, 14]])
first_column_difference = x[:, 0, np.newaxis] - y[:, 0]
second_column_difference = x[:, 1, np.newaxis] - y[:, 1]
z = np.abs(first_column_difference) + np.abs(second_column_difference)
print(np.sum(z, axis=1))
You can also skip the intermediate steps for each column and evaluate everything at once (it is a little bit harder to understand, so I prefer the method described above to explain what is happening):
print(np.abs(x[:, np.newaxis] - y).sum(axis=(1, 2)))
It is the general case for an n-dimensional Manhattan distance: if x is (u, n) and y is (v, n), it generates u values by broadcasting (u, 1, n) against (v, n) to get (u, v, n), then applying sum to eliminate the second and third axes.
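As a quick sanity check (my own addition, not part of the answer), the broadcast version can be compared against scipy's pairwise Manhattan distances:
import numpy as np
from scipy.spatial.distance import cdist
x = np.array([[1, 2], [2, 3], [4, 5], [5, 6]])
y = np.array([[11, 13], [12, 14]])
broadcast_sums = np.abs(x[:, np.newaxis] - y).sum(axis=(1, 2))
cdist_sums = cdist(x, y, metric='cityblock').sum(axis=1)  # same row-wise Manhattan sums
print(np.array_equal(broadcast_sums, cdist_sums))         # True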
Here is how you can do it using numpy broadcasting, with a simplified explanation.
Adjust Shape For Broadcasting
import numpy as np
start_points = np.array([[1,2], [2,3], [4,5], [5,6]])
dest_points = np.array([[11,13], [12, 14]])
## using np.newaxis as an index adds a new dimension at that position
## : gives all the elements along that dimension
start_points = start_points[np.newaxis, :, :]
dest_points = dest_points[:, np.newaxis, :]
## Now let's check the shape of the point arrays
print('start_points.shape: ', start_points.shape) # (1, 4, 2)
print('dest_points.shape', dest_points.shape) # (2, 1, 2)
Let's try to understand:
the last element of the shape represents the x and y of a point, size 2
we can think of start_points as having 1 row and 4 columns of points
we can think of dest_points as having 2 rows and 1 column of points
We can think of start_points and dest_points as matrices, or tables of points, of size (1x4) and (2x1)
We clearly see that the sizes are not compatible. What will happen if we perform an arithmetic
operation between them? Here is where a smart part of numpy, called broadcasting, comes in.
It will repeat the rows of start_points to match those of dest_points, making a (2x4) matrix
It will repeat the columns of dest_points to match those of start_points, making a (2x4) matrix
The result is an arithmetic operation between every pair of elements in start_points and dest_points
Calculate the distance
diff_x_y = start_points - dest_points
print(diff_x_y.shape) # (2, 4, 2)
abs_diff_x_y = np.abs(start_points - dest_points)
man_distance = np.sum(abs_diff_x_y, axis=2)
print('man_distance:\n', man_distance)
sum_distance = np.sum(man_distance, axis=0)
print('sum_distance:\n', sum_distance)
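For the example points above, this should print the following (values checked by hand: man_distance has one row per destination point, sum_distance one total per start point):
man_distance:
 [[21 19 15 13]
 [23 21 17 15]]
sum_distance:
 [44 40 32 28]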
Oneliner
start_points = np.array([[1,2], [2,3], [4,5], [5,6]])
dest_points = np.array([[11,13], [12, 14]])
np.sum(np.abs(start_points[np.newaxis, :, :] - dest_points[:, np.newaxis, :]), axis=(0,2))
Here is a more detailed explanation of broadcasting, if you want to understand it further.
With so many rows you can make substantial savings by using a smart algorithm. Let us for simplicity assume there is just one dimension; once we have established the algorithm, getting back to the general case is a simple matter of summing over coordinates.
The naive algorithm is O(mn) where m,n are the sizes of sets X,Y. Our algorithm is O((m+n)log(m+n)) so it scales much better.
We first have to sort the union of X and Y by coordinate and then form the cumsum over Y. Next, we find for each x in X the number YbefX of y in Y to its left and use it to look up the corresponding cumsum item YbefXval. The summed distances to all y to the left of x are YbefX times the coordinate of x, minus YbefXval; the distances to all y to the right are the sum of all y coordinates, minus YbefXval, minus (n - YbefX) times the coordinate of x.
Where does the saving come from? Sorting coordinates enables us to recycle the summations we have done before, instead of starting each time from scratch. This uses the fact that up to a sign we always sum the same y coordinates and going from left to right the signs flip one by one.
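To make the idea concrete, here is a minimal 1D sketch of the same recycling trick (my own illustration with made-up names and data, using searchsorted and a cumsum instead of the masked bookkeeping in the full implementation below):
import numpy as np

def summed_l1_1d(x, y):
    # For each x, the sum of |x - y| over all y, in O((m + n) log n).
    y_sorted = np.sort(y)
    csum = np.concatenate(([0], np.cumsum(y_sorted)))  # csum[k] = sum of the k smallest y
    k = np.searchsorted(y_sorted, x)                    # number of y to the left of each x
    left = k * x - csum[k]                              # summed distances to the y left of x
    right = (csum[-1] - csum[k]) - (len(y) - k) * x     # summed distances to the y right of x
    return left + right

x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([11.0, 13.0, 12.0, 14.0])
print(summed_l1_1d(x, y))                  # [46. 42. 34. 30.]
print(np.abs(x[:, None] - y).sum(axis=1))  # same values, computed naively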
Code:
import numpy as np
from scipy.spatial.distance import cdist
from timeit import timeit

def pp(X, Y):
    (m, k), (n, k) = X.shape, Y.shape
    XY = np.concatenate([X.T, Y.T], 1)
    idx = XY.argsort(1)
    Xmsk = idx < m
    Ymsk = ~Xmsk
    Xidx = np.arange(k)[:, None], idx[Xmsk].reshape(k, m)
    Yidx = np.arange(k)[:, None], idx[Ymsk].reshape(k, n)
    YbefX = Ymsk.cumsum(1)[Xmsk].reshape(k, m)
    YbefXval = XY[Yidx].cumsum(1)[np.arange(k)[:, None], YbefX - 1]
    YbefXval[YbefX == 0] = 0
    XY[Xidx] = ((2 * YbefX - n) * XY[Xidx]) - 2 * YbefXval + Y.sum(0)[:, None]
    return XY[:, :m].sum(0)

def summed_cdist(X, Y):
    return cdist(X, Y, "minkowski", p=1).sum(1)

# demo
m, n, k = 1000, 500, 10
X, Y = np.random.randn(m, k), np.random.randn(n, k)
print("same result:", np.allclose(pp(X, Y), summed_cdist(X, Y)))
print("sort :", timeit(lambda: pp(X, Y), number=1000), "ms")
print("scipy cdist:", timeit(lambda: summed_cdist(X, Y), number=100) * 10, "ms")
Sample run, comparing the smart "sort" algorithm to the naive algorithm implemented with the cdist library function:
same result: True
sort : 1.4447695480193943 ms
scipy cdist: 36.41934019047767 ms

4D array into 2D array and again back to 4D

I have the poses of humans (X, Y, Z values of joints like left elbow, right knee, etc.) in a video, saved in a 4D numpy array.
Example: the poses are saved in an array of shape (3, 103, 25, 2), meaning:
3 (number of coordinates), 103 (number of frames), 25 (number of joints), 2 (number of persons).
Now I want to change the view angle of this observation, i.e. I want to apply a rotation matrix to all the joint position values.
As of now, I'm:
iterating through the number of persons
converting each person into a 2D array
multiplying the 2D array with a 3x3 rotation matrix
reshaping the rotated 2D array into a 3D array
appending the 3D arrays
seq = np.random.rand(3, 103, 25, 2)
rot_mat = np.random.rand(3, 3)
rotated_seq = np.zeros(seq.shape)
for i in range(seq.shape[3]):  # iterating through persons
    person = seq[:, :, :, i]
    joint_values = np.reshape(person, (3, -1))
    rotated_joint_values = np.dot(joint_values.T, rot_mat).T
    rotated_person = np.reshape(rotated_joint_values, person.shape)
    rotated_seq[:, :, :, i] = rotated_person
My question: is there any way to do this without using the for loop?
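As a loop-free sketch (my own illustration, not from the thread, assuming the same seq and rot_mat as above): the per-person loop computes rot_mat.T @ joint_values, which is a single contraction over the coordinate axis, so einsum or tensordot can do it in one call:
rotated_seq_vec = np.einsum('ji,jklm->iklm', rot_mat, seq)  # sum over the coordinate axis j
# equivalently: np.tensordot(rot_mat.T, seq, axes=1)
print(np.allclose(rotated_seq, rotated_seq_vec))            # True: matches the loop version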

Unravel 1D list back to 3D array

Basically, is there a way to transform a 1D list that has been "flattened" through the numpy.ravel() function back to its original 3D form? I know the dimensions, and one might ask why I don't just use the original 3D array in the first place instead of converting it, but there are reasons for that.
I just need to know if I can actually create the same 3D array from a 1D array that was created by using numpy.ravel() on the 3D array.
Basically the 3D array was created like this:
import numpy as np
nx = 50
ny = 40
nz = 150
x = np.linspace(1, 51, nx)
y = np.linspace(1, 41, ny)
z = np.linspace(1, 151, nz)
x_bc = x[:, np.newaxis, np.newaxis]
y_bc = y[np.newaxis, :, np.newaxis]
z_bc = z[np.newaxis, np.newaxis, :]
arr = x_bc + y_bc + z_bc
And nope, I can't just do this to get it back, since calculations have been done on it in the meantime, and it was converted to a 1D array along the way as well. So the data in this array is not the same as the data I actually want to convert back.
Just reshape it back to the original shape?
raveled = np.ravel(arr)
new_arr = raveled.reshape(*arr.shape)
Does numpy.reshape do what you want?
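A tiny round-trip check (with a small made-up array, as an illustration): ravel flattens in C order by default, and reshape with the original shape reads the values back in the same order.
import numpy as np
arr = np.arange(24).reshape(2, 3, 4)
flat = np.ravel(arr)              # C-order flattening
back = flat.reshape(arr.shape)    # reads the values back in the same order
print(np.array_equal(arr, back))  # True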

Attach matrix to inside of matrix in numpy

Suppose there is a 4D array of shape (3, 4, 6, 1) and a 2D array of shape (6, 4); I want to attach the 2D array to the left side of each 2D block matrix[0:3][0:4].
I can only do this with a for loop.
for i in range(0, cols):
    for j in range(0, rows):
        x = np.append(a[i][j], b, axis=1)
I tried to make the 2D array 4D and use np.append, but I still don't know how to make the 2D array 4D, like (3, 4, 6, 4).
If I understand your requirements correctly, one simple way would be:
out = np.empty((3, 4, 6, 5))
out[..., :1] = a
out[..., 1:] = b
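A quick check of the shapes, plus an equivalent one-liner with concatenate and broadcast_to (a sketch with random data, assuming a has shape (3, 4, 6, 1) and b has shape (6, 4) as in the question):
import numpy as np
a = np.random.rand(3, 4, 6, 1)
b = np.random.rand(6, 4)
out = np.empty((3, 4, 6, 5))
out[..., :1] = a  # a fills the first slot of the last axis
out[..., 1:] = b  # b is broadcast across the first two axes
out2 = np.concatenate([a, np.broadcast_to(b, (3, 4, 6, 4))], axis=-1)
print(out.shape, np.allclose(out, out2))  # (3, 4, 6, 5) True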

Reshape from flattened indices in Python

I have an image of size M*N whose pixel coordinates have been flattened to a 1D array according to a space-filling curve (i.e. not a classical rasterization, where I could have used reshape).
I thus process my 1D array (flattened image) and would then like to reshape it back to an M*N array (its initial size).
So far, I have done this with a for-loop:
for i in range(img_flat.size):
    img_res[x[i], y[i]] = img_flat[i]
x and y being the pixel coordinates along my scan path.
However, I am wondering how to do this in a single line of code.
If x and y are 1D numpy arrays of length n, img_flat also has length n, and img_res is a 2D numpy array of shape (h, w) such that h*w = n, then:
img_res[x, y] = img_flat
Should suffice
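A tiny round-trip check of this (a sketch with a made-up 3x4 image and a random permutation standing in for the space-filling curve):
import numpy as np
M, N = 3, 4
img = np.arange(M * N).reshape(M, N)
order = np.random.permutation(M * N)    # stand-in for the space-filling curve
x, y = np.unravel_index(order, (M, N))  # pixel coordinates along the scan path
img_flat = img[x, y]                    # flatten the image along that path
img_res = np.empty((M, N), dtype=img.dtype)
img_res[x, y] = img_flat                # single-line inverse
print(np.array_equal(img, img_res))     # True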
In fact, it was easy:
vec = np.arange(0, seg.size, dtype=np.uint)
img_res[x[vec], y[vec]] = seg[vec]
