I have a list of matrices, and some of them are taller (.shape[0]) than the others. I want to make them all equal in height: find the height of the tallest matrix and pad the rest so that the content of each matrix stays in the middle (if the difference is odd, add one more row to the bottom than to the top). This is my code so far:
def equalize_heights(matrices, maxHeight):
    newMatrices = []
    matricesNum = len(matrices)
    for i in xrange(matricesNum):
        matrixHeight = matrices[i].shape[0]
        if (matrixHeight == maxHeight):
            newMatrices.append(matrices[i])
        else:
            addToTop = (maxHeight-matrixHeight)/2
            addToBottom = (maxHeight-matrixHeight)/2 + ((maxHeight-matrixHeight)%2)
Now the matrices that are not as tall as the biggest one should have 'addToTop' rows (filled with 0's) added to the top of the matrix and 'addToBottom' rows added to the bottom.
I think I'm supposed to use the numpy.pad function, but I don't understand how exactly.
Keep in mind that np.pad pads every dimension (unless you give it per-axis pad widths), not just the height. Consider using np.concatenate instead. Also note that you don't need to pass the maximum height - your function can just calculate that itself.
E.g.:
import numpy as np
matrices = [np.array([[1,2], [1,2]]), np.array([[1,2], [1,2], [1,2]])]
def equalize_heights(matrices):
    max_height = matrices[0].shape[0]
    for matrix in matrices[1:]:
        max_height = max(max_height, matrix.shape[0])

    for idx, matrix in enumerate(matrices):
        matrices[idx] = np.concatenate((
            matrix,
            np.zeros((max_height - matrix.shape[0], matrix.shape[1]))
        ))
Note that this will not center your matrices the way you wanted, but that shouldn't be too difficult. (Put three arrays in the tuple to concatenate, rather than two.)
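For instance, here is a minimal sketch of the centered variant (the name equalize_heights_centered is just for illustration), splitting the zero rows between the top and the bottom, with the extra row going to the bottom when the difference is odd:

import numpy as np

def equalize_heights_centered(matrices):  # illustrative name, not from the question
    max_height = max(m.shape[0] for m in matrices)
    for idx, matrix in enumerate(matrices):
        diff = max_height - matrix.shape[0]
        top, bottom = diff // 2, diff - diff // 2  # odd difference: one extra row at the bottom
        matrices[idx] = np.concatenate((
            np.zeros((top, matrix.shape[1])),
            matrix,
            np.zeros((bottom, matrix.shape[1])),
        ))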
I have a 4-D numpy array, with the first dimension representing the number of images in a data set, the second and third being the (equal) width and height, and the 4th being the number of channels (3). For example let's say I have 4 color images that are 28*28, so my image data looks like this:
X = np.reshape(np.arange(4*28*28*3), (4,28,28,3))
I would like to select a random 16x16 (width x height) crop of each of the 4 images. Critically, I want the crop to be different per image, i.e. I want to generate 4 random (x_offset, y_offset) pairs. In the end I want access to an array of shape (4, 16, 16, 3).
If I were to write this in a for loop it would look something like this:
x = np.random.randint(0,12,4)
y = np.random.randint(0,12,4)
for i in range(X.shape[0]):
    cropped_image = X[i, x[i]:x[i]+16, y[i]:y[i]+16, :]
    # Add cropped image to a list or something
But I'd like to do it as efficiently as possible and I'm wondering if there's a way to do it with strides and fancy indexing. I've seen the answers to this question, but can't quite wrap my head around how I might combine something like stride_tricks with random starting points for the strides on the second and third (width and height) axes.
Leverage a strides-based method for efficient patch extraction
We can use scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, to get sliding windows that are merely views into the input array and hence incur no extra memory overhead and are virtually free. We could of course use np.lib.stride_tricks.as_strided directly, but the setup work required is hard to manage, especially on arrays with higher dimensions. If scikit-image is not available, we can use its source code, which works standalone.
Explanation of the usage of view_as_windows
The idea with view_as_windows is that we feed in the window_shape arg as a tuple whose length equals the number of dimensions of the input array. The axes along which we need to slide get their respective window lengths, and the rest get 1s. This creates an array of views with singleton dims/axes, i.e. axes of length 1 corresponding to the 1s in the window_shape arg. For those axes we can index into the zeroth element to get a squeezed version of the sliding windows.
Thus, we would have a solution, like so -
# Get sliding windows
from skimage.util.shape import view_as_windows
w = view_as_windows(X, (1,16,16,1))[...,0,:,:,0]
# Index and get our specific windows
out = w[np.arange(X.shape[0]),x,y]
# If you need those in the same format as in the posted loopy code
out = out.transpose(0,2,3,1)
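As a quick sanity check (a sketch reusing the x and y offset arrays defined in the question), the vectorized result should match the loopy version exactly:

# Compare against the loop from the question
loopy = np.stack([X[i, x[i]:x[i]+16, y[i]:y[i]+16, :] for i in range(X.shape[0])])
assert np.array_equal(out, loopy)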
I have multiple 2D arrays saved in a list called image_concat. This list will be composed of over a hundred of these arrays, but for now I'm just trying to make my code run for a list with only two of them. These arrays all have different shapes, and I would like to find the largest x-dimension and largest y-dimension out of all the arrays, and then pad all the other ones with enough zeros around the edges so that in the end, they all have the same shape. Note that the largest x-dimension and largest y-dimension might belong to separate arrays, or they might belong to the same one. What I have tried writing so far is not successfully changing the shape of the smaller array for some reason. But I also think that some issues will arise even after changing the shapes, since some arrays might be off by one in the end due to elements in the shape being even or odd.
import astropy
import numpy as np
import math
import matplotlib.pyplot as plt
from astropy.utils.data import download_file
from astropy.io import fits
images = ['http://irsa.ipac.caltech.edu/ibe/data/wise/allsky/4band_p1bm_frm/9a/02729a/148/02729a148-w2-int-1b.fits?center=89.353536,37.643864deg&size=0.6deg', 'http://irsa.ipac.caltech.edu/ibe/data/wise/allsky/4band_p1bm_frm/2a/03652a/123/03652a123-w4-int-1b.fits?center=294.772333,-19.747157deg&size=0.6deg']
image_list = []
for url in images:
    image_list.append(download_file(url, cache=True))
image_concat = [fits.getdata(image) for image in image_list]
# See shapes in the beginning
print(np.shape(image_concat[0]))
print(np.shape(image_concat[1]))
def pad(image_concat):
    # Identify largest x and y dimensions
    xdims, ydims = np.zeros(len(image_concat)), np.zeros(len(image_concat))
    for i in range(len(xdims)):
        x, y = np.shape(image_concat[i])
        xdims[i] = x
        ydims[i] = y

    x_max = int(np.max(xdims))
    y_max = int(np.max(ydims))

    # Pad all arrays except the largest dimensions
    for A in image_concat:
        x_len, y_len = np.shape(A)
        print(math.ceil((y_max-y_len)/2))
        print(math.ceil((x_max-x_len)/2))
        np.pad(A, ((math.ceil((y_max-y_len)/2), math.ceil((y_max-y_len)/2)),
                   (math.ceil((x_max-x_len)/2), math.ceil((x_max-x_len)/2))),
               'constant', constant_values=0)
    return image_concat
image_concat = pad(image_concat)
# See shapes afterwards (they haven't changed for some reason)
print(np.shape(image_concat[0]))
print(np.shape(image_concat[1]))
I can't understand why the shape isn't changing in this case. And also, is there a way to easily generalize this so that it will work on many arrays regardless of whether they have even or odd dimensions?
np.pad doesn't modify the array in-place; it returns a padded array. So you'd need to do image_concat[i] = np.pad(...), where i is the index of A.
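A minimal sketch of how the corrected function could look (not your exact code), assigning the result back into the list and splitting odd differences between the two sides:

import numpy as np

def pad(image_concat):
    # np.shape returns (rows, cols) for a 2D array
    row_max = max(A.shape[0] for A in image_concat)
    col_max = max(A.shape[1] for A in image_concat)
    for i, A in enumerate(image_concat):
        row_len, col_len = A.shape
        top = (row_max - row_len) // 2
        bottom = (row_max - row_len) - top   # extra row goes to the bottom for odd differences
        left = (col_max - col_len) // 2
        right = (col_max - col_len) - left
        # np.pad returns a new array, so assign it back
        image_concat[i] = np.pad(A, ((top, bottom), (left, right)),
                                 'constant', constant_values=0)
    return image_concat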
I have a "cube" of 3D data where there is some peak in the column, or first dimension. The index of the peak may shift depending what row is examined. The third dimension may do something a bit more complicated, but for now can be thought of as just scaling things by some linear function.
I would like to find the index of the max along the first dimension, subject to the constraint that for each row, the z index is chosen such that the column peak will be closest to 0.5.
Here's a sample image that is a plane in row,column with a fixed z:
These arrays will at times be large -- say, 21x11x200 float64s, so I would like to vectorize this calculation. Written with a for loop, it looks like this:
cols, rows, zs = data.shape
for i in range(rows):
    # for each field point, make an intermediate array that is 2D with focus,frequency dimensions
    arr = data[:,i,:]

    # compute the thru-focus max and find the peak closest to 0.5
    maxs = np.max(arr, axis=0)
    max_manip = np.abs(maxs-0.5)
    freq_idx = np.argmin(max_manip)

    # take the thru-focus slice that peaks closest to 0.5
    arr2 = data[:,i,freq_idx]
    focus_idx = np.argmax(arr2)
    print(focus_idx)
My issue is that I do not know how to roll these calculations up into a vector operation. I would appreciate any help, thanks!
We just need to use the axis param with the relevant reductions (max, argmin, argmax), and that leads us to a vectorized solution, like so -
# Get freq indices along all rows in one go
idx = np.abs(data.max(0)-0.5).argmin(1)
# Index into data with those and get the argmax indices
out = data[:,np.arange(data.shape[1]), idx].argmax(0)
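As a quick sanity check (a sketch with arbitrary random data; any (cols, rows, zs) array will do), this should reproduce the indices printed by the loop:

import numpy as np

data = np.random.rand(21, 11, 200)  # arbitrary example data
idx = np.abs(data.max(0) - 0.5).argmin(1)
out = data[:, np.arange(data.shape[1]), idx].argmax(0)

# Same result with the loop from the question
expected = [np.argmax(data[:, i, np.argmin(np.abs(data[:, i, :].max(axis=0) - 0.5))])
            for i in range(data.shape[1])]
assert np.array_equal(out, expected)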
I want to take a chunk of an array (say 500 values) and perform some operation on it, such as the sum of squares of those 500 values, and then proceed with the next 500 values of the same array.
How should I implement this? Would a blackman window be useful in this case or is another approach more suitable?
It depends on several criteria:
1. Is the number of elements per operation an integer divisor of your array length?
2. Is the number of elements a significant fraction of your array length?
If 1. is True, then you can reshape your array and use reduce-functions like .sum(axis=axis), which is probably the most performant way. See P. Camilleri's answer for this case.
However, if 1. is False, the second question becomes important. If you answer "yes" to 2., then you can just use a for-loop over the array, because the Python loop overhead is not significant for loops with few iterations:
width = 500
for i in range(0, arr.size, width):
    print(arr[i:i+width])  # do your operation here!
However, if your answer is "no" to both 1. and 2., you should probably use a convolution filter (see scipy.ndimage.filters) and then pick only the interesting elements:
width = 10
result = some_filter(arr)
# take only the elements starting at the window's half-width and use "width" as the stepsize
result = result[(width - 1) // 2::width]
For example the sum of squares:
import numpy as np
arr = np.random.randint(0, 10, (25))
arr_squared = arr ** 2
width = 10
for i in range(0, arr_squared.size, width):
    print(arr_squared[i:i+width].sum())
# 267, 329, 170
or using a convolution:
from scipy.ndimage import convolve
convolve(arr_squared, np.ones(width), mode='constant')[4::10]
# array([267, 329, 170])
Assuming your array a is 1D and its length is a multiple of 500, a simple np.sum(a.reshape((-1, 500)) ** 2, axis=1) would suffice. If you want a more complicated operation, please edit your question accordingly.
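For illustration, a tiny sketch on toy data (arbitrary numbers, with chunks of 3 instead of 500):

import numpy as np

a = np.arange(6)                           # [0, 1, 2, 3, 4, 5]
np.sum(a.reshape((-1, 3)) ** 2, axis=1)    # -> array([ 5, 50])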
I have an array of values and would like to create a matrix from that, where each row is my starting point vector multiplied by a sample from a (normal) distribution.
The number of rows of this matrix will then vary in dependence from the number of samples I want.
%pylab
my_vec = array([1,2,3])
my_rand_vec = my_vec*randn(100)
The last command does not work because the array shapes do not match.
I could think of using a for loop, but I am trying to leverage array operations.
Try this
my_rand_vec = my_vec[None,:]*randn(100)[:,None]
For small numbers I get, for example:
import numpy as np
my_vec = np.array([1,2,3])
my_rand_vec = my_vec[None,:]*np.random.randn(5)[:,None]
my_rand_vec
# array([[ 0.45422416, 0.90844831, 1.36267247],
# [-0.80639766, -1.61279531, -2.41919297],
# [ 0.34203295, 0.6840659 , 1.02609885],
# [-0.55246431, -1.10492863, -1.65739294],
# [-0.83023829, -1.66047658, -2.49071486]])
Your solution my_vec*randn(100) does not work because * is element-wise multiplication, which only works if both arrays have identical shapes.
What you have to do is add an additional dimension using [None,:] and [:,None] so that numpy's broadcasting works.
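The shapes involved look like this (just an illustration of the broadcasting):

my_vec[None, :]                # shape (1, 3)
np.random.randn(5)[:, None]    # shape (5, 1)
# the element-wise product broadcasts to shape (5, 3): one sample per row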
As a side note, I would recommend not using pylab. Instead, use import ... as ... to include modules, as pointed out here.
It is the outer product of vectors:
my_rand_vec = numpy.outer(randn(100), my_vec)
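As a small check (a sketch with 5 samples instead of 100), this produces the same array as the broadcasting approach above:

import numpy as np

my_vec = np.array([1, 2, 3])
samples = np.random.randn(5)
assert np.allclose(np.outer(samples, my_vec), my_vec[None, :] * samples[:, None])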
You can pass the dimensions of the array you require to numpy.random.randn:
my_rand_vec = my_vec*np.random.randn(100,3)
To multiply each vector by the same random number, you need to add an extra axis:
my_rand_vec = my_vec*np.random.randn(100)[:,np.newaxis]