I need to carry out some operation on a subset of an NxN array. I have the center of the sub-array, x and y, and its size.
So I can easily do:
subset = data[y-size:y+size,x-size:x+size]
And this is fine.
My question is whether the same can be done without writing an explicit loop when x and y are both 1D arrays of positions.
Thanks!
Using a simple example of a 5x5 array and setting size=1 we can get:
import numpy as np
data = np.arange(25).reshape((5,5))
size = 1
x = np.array([1,4])
y = np.array([1,4])
subsets = [data[j-size:j+size,i-size:i+size] for i in x for j in y]
print(subsets)
Which returns a list of numpy arrays:
[array([[0, 1],[5, 6]]),
array([[15, 16],[20, 21]]),
array([[3, 4],[8, 9]]),
array([[18, 19],[23, 24]])]
Which I hope is what you are looking for.
To get the list of subsets, assuming you have the lists of positions xList and yList, this will do the trick:
subsetList = [ data[y-size:y+size,x-size:x+size] for x,y in zip(xList,yList) ]
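If a fully vectorized version is wanted, one possible approach is to build the row and column indices with broadcasting and let NumPy's integer (fancy) indexing gather all windows at once. The following is a minimal sketch, assuming every window lies inside the array and that x and y are paired element by element (as in the zip version above); the names offsets, rows and cols are only illustrative.

```python
import numpy as np

data = np.arange(25).reshape((5, 5))
size = 1
x = np.array([1, 4])   # column centres
y = np.array([1, 4])   # row centres

# Offsets reproduce the half-open slice  c-size : c+size
offsets = np.arange(-size, size)

# rows has shape (n, 2*size, 1), cols has shape (n, 1, 2*size);
# indexing with both broadcasts them to (n, 2*size, 2*size)
rows = y[:, None, None] + offsets[None, :, None]
cols = x[:, None, None] + offsets[None, None, :]

subsets = data[rows, cols]   # one 2D window per (x, y) pair
print(subsets.shape)
```

The result is a single 3D array (a copy, not a view), so an operation can then be applied to all windows at once, for example subsets.sum(axis=(1, 2)).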
Related
The problem I am trying to solve is as follows. I am given a matrix of arbitrary dimension whose entries are indices into a list, and the list itself. I would like to get back a matrix with the indices replaced by the corresponding list elements. I can't figure out how to do that in a vectorized way:
i.e. if z = [[0,1], [1,0]] and list = [20,10], I'd want [[20,10], [10,20]] returned.
When both are NumPy arrays, you can index one with the other directly:
import numpy as np
z = np.array([[0, 1], [1, 0]])
a = np.array([20, 10])
output = a[z]
print(output)
# [[20 10]
# [10 20]]
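If the inputs arrive as plain Python lists, as in the question, they can be converted first. This is a small sketch; the name values is only used here to avoid shadowing the built-in list.

```python
import numpy as np

z = [[0, 1], [1, 0]]   # index matrix as a nested Python list
values = [20, 10]      # plain Python list of elements

# Convert both to arrays, then use integer (fancy) indexing
output = np.asarray(values)[np.asarray(z)]

# np.take accepts array-likes and gives the same result
same = np.take(values, z)

print(output)
```

The output has the same shape as z, whatever its number of dimensions.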
I have written this piece of code:
data = np.array([[3,6], [5,9], [4, 8]])
orig_x, orig_y = np.split(data, 2, axis=1)
x = np.array([3, 4])
y = np.zeros((len(x)))
for i in range(len(x)):
    y[i] = orig_y[np.where(orig_x == x[i])[0]]
So basically, I have a 2D NumPy array. I split it into two 1D arrays orig_x and orig_y, one storing values of the x-axis and the other values of the y-axis.
I also have another 1D NumPy array, which has some of the values that exist in the orig_x array. I want to find the y-axis values for each value in the x array. I created this method, using a simple loop, but it is extremely slow since I'm using it with thousands of values.
Do you have a better idea? Maybe by using a NumPy function?
Note: Also a better title for this question can be made. Sorry :(
You could create a mask over which values you want from the x column and then use this mask to select values from the y column.
data = np.array([[3,6], [5,9], [4, 8]])
# the values you want to lookup on the x-axis
x = np.array([3, 4])
mask = np.isin(data[:,0], x)
data[mask,1]
Output:
array([6, 8])
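The key function here is np.isin. It is roughly equivalent to broadcasting the x column of data against x, doing an element-wise comparison, and then reducing over the values of x. Note that the selected values come back in the row order of data, not in the order of x. The explicit version looks like this: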
mask = data[:,0,None] == x
y_mask = np.logical_or.reduce(mask, axis=1)
data[y_mask, 1]
Output:
array([6, 8])
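If the y values are needed in the same order as x, one alternative is to sort the x column once and use np.searchsorted. This is a sketch under two assumptions: the values in the x column are unique, and every value in x actually occurs there (a missing value would silently return a wrong neighbour or raise an IndexError).

```python
import numpy as np

data = np.array([[3, 6], [5, 9], [4, 8]])
x = np.array([4, 3])   # lookup values, deliberately not in data order

# Sort the x column once; `order` maps sorted positions back to rows
order = np.argsort(data[:, 0])
sorted_x = data[order, 0]

# Position of each lookup value inside the sorted x column
pos = np.searchsorted(sorted_x, x)

# Pick the matching y values, preserving the order of x
y = data[order[pos], 1]
print(y)
```

Here the result follows x (8 for 4, then 6 for 3), whereas the mask-based version always follows the rows of data.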
I'm not 100% sure I understood the problem correctly, but I think the following should work:
>>> rows, cols = np.where(orig_x == x)
>>> y = orig_y[rows[np.argsort(cols)]].ravel()
>>> y
array([6, 8])
It assumes that all the values in orig_x are unique, but since your code example has the same restriction, I considered it a given.
What about a lookup table?
import numpy as np
data = np.array([[3,6], [5,9], [4, 8]])
orig_x, orig_y = np.split(data, 2, axis=1)
x = np.array([3, 4])
y = np.zeros((len(x)))
You can pack a dict for lookup:
lookup = {i: j for i, j in zip(orig_x.ravel(), orig_y.ravel())}
And just map this into a new array:
np.fromiter(map(lambda i: lookup.get(i, np.nan), x), dtype=int, count=len(x))
array([6, 8])
If orig_x & orig_y are your smaller data structures this will probably be most efficient.
EDIT - It's occurred to me that if your values are integers, the default np.nan won't work with dtype=int, because NaN cannot be stored in an integer array. If you may look up a value that isn't in your orig_x array, you should pick a default that makes sense for your application, or use a float dtype.
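To illustrate the point about missing keys, here is a sketch of the same lookup with a float result, so that np.nan can mark values that are not found. The lookup value 7 is made up for the example and is not part of the original data.

```python
import numpy as np

data = np.array([[3, 6], [5, 9], [4, 8]])

# Build the lookup table from plain Python ints
lookup = dict(zip(data[:, 0].tolist(), data[:, 1].tolist()))

x = np.array([3, 7, 4])   # 7 is not present in the x column

# dtype=float lets np.nan mark the missing entry
y = np.fromiter(
    (lookup.get(int(i), np.nan) for i in x),
    dtype=float,
    count=len(x),
)
print(y)
```

If the result has to stay an integer array, a sentinel such as -1 could be used as the default instead, provided it cannot be confused with a real value.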
I am working with satellite images stored as GeoTIFF files, which I read as arrays. Let's say I have multiple images with blank areas (because of clouds or other elements), and I collect those arrays in a list.
from rasterio.plot import show
import matplotlib.pyplot as plt
LST = [array1, array2]
f, ax = plt.subplots(1, 2, figsize=[20,20])
show(lst_paris2, cmap='hot_r', vmin=vmin, ax=ax[0])
show(lst_paris3, cmap='hot_r', vmin=vmin1, ax=ax[1])
I would like to compute, for each pixel (i.e. cell i,j of the array), the mean as numpy.mean(LST) and the percentiles as numpy.percentile(LST, [5,50,95]), ignoring the zero values.
Your LST variable seems to be a list of two lists/arrays. It would help to use np.hstack to make a single np.array from them. Then you can do the calculations as @midtownguru stated in the comments.
import numpy as np
array1 = [1,0,2,0,3,0,4,0,5,0,6,0,7,0,8,0,9,0,10]
array2 = [15,16,17,18,19]
# Stack two arrays to make a single np.array
LST = np.hstack([array1, array2])
print(LST.shape)
# (24,)
# Now you can calculate mean, percentile etc. without 0's
np.mean(LST[LST != 0])
# 9.333... (140 / 15)
np.percentile(LST[LST != 0], [5, 85])
# array([ 1.7, 16.9])
You can index with the result of the .nonzero() method to drop the zeros:
import numpy as np
lst = np.asarray(LST)
np.percentile(lst[lst.nonzero()], [5, 50, 95])
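Both snippets above pool all pixels into a single statistic. For a per-pixel result, as the question asks, one possible approach is to stack the images along a new axis, replace the zeros with NaN, and use the NaN-aware reductions. This is a sketch, assuming all images have the same shape and that 0 always means "no data"; the tiny 2x2 arrays stand in for real rasters.

```python
import numpy as np

# Two small stand-in "images"; 0 marks a blank pixel
array1 = np.array([[1.0, 0.0], [3.0, 4.0]])
array2 = np.array([[3.0, 2.0], [0.0, 8.0]])

# Shape (n_images, rows, cols)
stack = np.stack([array1, array2], axis=0)

# Turn zeros into NaN so the nan-aware functions skip them
masked = np.where(stack == 0, np.nan, stack)

# Reduce over the image axis: one value per pixel
pixel_mean = np.nanmean(masked, axis=0)                     # shape (rows, cols)
pixel_pct = np.nanpercentile(masked, [5, 50, 95], axis=0)   # shape (3, rows, cols)

print(pixel_mean)
```

A pixel that is zero in every image has no valid data; for such a pixel these functions return NaN and emit a RuntimeWarning.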
Let's say I have a 2D numpy array 'x' and I want to create a new array 'y' with only certain columns from x.
Is there an easy solution for this?
I was trying to write a function that iterates through each column of an array, and then appends every 3rd column to a new array.
def grab_features(x, starting = 0, every = 3, rowlength = 16):
    import numpy as np
    import pandas as pd
    y = np.empty([rowlength,1])
    for i in range(starting, np.size(x, 1), every):
        y = np.append(y, np.reshape(x[:, i], (rowlength, 1)), axis=0)
    return y
I didn't get any errors, but instead the function returned a long 1 dimensional array of float numbers. I was hoping for an array of the same type as x, just with 1/3 of the columns.
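You can use the slice syntax i:j:k, where i is the starting index, j is the stopping index, and k is the step size. The example below applies it to the first axis, so it selects every third row; to select every third column instead, apply the same slice to the second axis, e.g. x[:, ::3].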
import numpy as np
array = np.array([[1,2],
[3,4],
[5,6],
[7,8],
[9,10],
[11,12]])
print(array[::3])
[[1 2]
[7 8]]
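Since the question is about columns, here is the same slice applied to the second axis. This is a sketch with a made-up 16x9 integer array; starting and every mirror the parameters of the function in the question.

```python
import numpy as np

x = np.arange(16 * 9).reshape(16, 9)   # 16 rows, 9 columns
starting, every = 0, 3

# Keep all rows, take every third column beginning at `starting`
y = x[:, starting::every]

print(y.shape)
print(y[0])
```

The result keeps the dtype of x and is a view rather than a copy, so add .copy() if y will be modified independently.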
I have a 2D numpy array input_array and two lists of indices (x_coords and y_coords). I'd like to slice a 3x3 subarray for each x,y pair, centered on the x,y coordinates. The end result will be an array of 3x3 subarrays, where the number of subarrays is equal to the number of coordinate pairs I have.
Preferably by avoiding for loops. Currently I use a modification of game of life strides from the scipy cookbook:
http://wiki.scipy.org/Cookbook/GameOfLifeStrides
shape = (input_array.shape[0] - 2, input_array.shape[1] - 2, 3, 3)
strides = input_array.strides + input_array.strides
strided = np.lib.stride_tricks.as_strided(input_array, shape=shape, strides=strides).\
    reshape(shape[0]*shape[1], shape[2], shape[3])
This creates a view of the original array as a (flattened) array of all possible 3x3 subarrays. I then convert the x,y coordinate pairs to be able to select the subarrays I want from strided:
coords = x_coords - 1 + (y_coords - 1)*shape[1]
sub_arrays = strided[coords]
Although this works perfectly fine, I do feel it is a bit cumbersome. Is there a more direct approach to do this? Also, in the future I would like to extend this to the 3D case: slicing n x 3 x 3 subarrays from an n x m x k array. It might also be possible using strides, but so far I haven't been able to make it work in 3D.
Here is a method that uses array broadcasting:
x = np.random.randint(1, 63, 10)
y = np.random.randint(1, 63, 10)
dy, dx = [grid.astype(int) for grid in np.mgrid[-1:1:3j, -1:1:3j]]
Y = dy[None, :, :] + y[:, None, None]
X = dx[None, :, :] + x[:, None, None]
Then you can use a[Y, X] to select the blocks from an array a. Here is some example code:
img = np.zeros((64, 64))
img[Y, X] = 1
[Figure: img plotted with pyplot.imshow()]
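The example above writes into the selected blocks. Reading them works the same way and returns one 3x3 block per coordinate pair. This is a short sketch with fixed coordinates instead of random ones, assuming no coordinate lies on the border of the array.

```python
import numpy as np

a = np.arange(64 * 64).reshape(64, 64)
x = np.array([1, 10, 62])   # column centres
y = np.array([1, 20, 62])   # row centres

# Integer offset grids of shape (3, 3): dy varies along rows, dx along columns
dy, dx = np.mgrid[-1:2, -1:2]

# Broadcasting (n, 1, 1) against (3, 3) gives index arrays of shape (n, 3, 3)
blocks = a[y[:, None, None] + dy, x[:, None, None] + dx]

print(blocks.shape)
```

Each blocks[k] equals the slice a[y[k]-1:y[k]+2, x[k]-1:x[k]+2].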
A very straightforward solution would be a list comprehension and itertools.product:
import itertools
sub_arrays = [input_array[x-1:x+2, y-1:y+2]
              for x, y in itertools.product(x_coords, y_coords)]
This creates all possible combinations of the x and y coordinates and then slices the 3x3 arrays from input_array. If you only want the given (x, y) pairs, iterate over zip(x_coords, y_coords) instead.
But this is still a sort of for loop, and you have to make sure that x_coords and y_coords are not on the border of the matrix.
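Another option that avoids both the loop and the manual stride arithmetic is np.lib.stride_tricks.sliding_window_view, which is available in NumPy 1.20 and later. This is a sketch, assuming the coordinates are NumPy arrays, that x indexes columns and y indexes rows (as in the question's coords formula), and that no coordinate lies on the border. The 3D part treats the first axis as a stack of images, which is one reading of the n x m x k case in the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

input_array = np.arange(36).reshape(6, 6)
x_coords = np.array([1, 4])   # column centres
y_coords = np.array([2, 3])   # row centres

# View of every 3x3 window; windows[i, j] is input_array[i:i+3, j:j+3]
windows = sliding_window_view(input_array, (3, 3))   # shape (4, 4, 3, 3)

# A window centred on (y, x) starts at (y - 1, x - 1)
sub_arrays = windows[y_coords - 1, x_coords - 1]     # shape (2, 3, 3)

# 3D case: slide only over the last two axes of a stack of images
volume = np.arange(2 * 6 * 6).reshape(2, 6, 6)
vol_windows = sliding_window_view(volume, (3, 3), axis=(1, 2))   # (2, 4, 4, 3, 3)
vol_sub = vol_windows[:, y_coords - 1, x_coords - 1]             # (2, 2, 3, 3)

print(sub_arrays.shape, vol_sub.shape)
```

The window view itself costs no extra memory; only the final fancy-indexing step copies the selected blocks.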