I have a batch of image tensors of shape BxCxHxW and a shift tensor of shape Bx2, where B is the batch size.
As far as I know, torch.roll(input, shifts, dims=None) expects the shifts parameter to be an int or a tuple of ints.
So if I want to use that function to roll every image in my batch, here is my current implementation:
for i in range(batch_size):
    images[i] = images[i].roll(shifts[i].tolist(), (1, 2))
I know this is inefficient, so I am wondering whether there is an alternative that lets the roll function take the shifts parameter as a batch, so that I could write something like:
images = images.roll(shifts, (2, 3)).
Thank you.
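A possible vectorized alternative (a sketch, not part of the original post): build per-image modular row/column index grids from shifts and pull out the rolled pixels with advanced indexing, which reproduces torch.roll's wrap-around behaviour for the whole batch at once. The sizes below are made up for illustration.

import torch

B, C, H, W = 4, 3, 8, 8                          # assumed example sizes
images = torch.randn(B, C, H, W)
shifts = torch.randint(-5, 5, (B, 2))            # per-image (shift_h, shift_w)

b = torch.arange(B).view(B, 1, 1, 1)
c = torch.arange(C).view(1, C, 1, 1)
r = (torch.arange(H).view(1, 1, H, 1) - shifts[:, 0].view(B, 1, 1, 1)) % H
w = (torch.arange(W).view(1, 1, 1, W) - shifts[:, 1].view(B, 1, 1, 1)) % W
rolled = images[b, c, r, w]                      # (B, C, H, W)

# sanity check against the per-image loop
for i in range(B):
    assert torch.equal(rolled[i], images[i].roll(shifts[i].tolist(), (1, 2)))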
I have (for the most part) gigapixel images that I have divided into 512x512 patches. I feed each 512x512 3-channel patch into a frozen ResNet18 network for feature extraction and end up with a 1D tensor of length 512. Finally, I concatenate all of these 512-dimensional tensors and end up with an Nx512 intermediate representation, where N is the number of patches in the gigapixel image.
Since my original gigapixel images are not all the same size, these intermediate representations range from 17x512 to 6000x512, and I am using the following strategy to feed them to my model. However, I would prefer a more standardized PyTorch approach (for 2D images with 3 channels we could simply use a torchvision transform -- not here).
feature_path = 'features.pt'
features = torch.load(feature_path, map_location=lambda storage, loc: storage)
if features.shape[0] <= median_num_patches:
    a = torch.zeros((median_num_patches - features.shape[0], 512))  # zero-pad to length median_num_patches
    embeddings = torch.cat((features, a), axis=0)
    sample['image'] = embeddings
else:
    random_indices = torch.randint(features.shape[0], (median_num_patches,))  # max size: 6000 patches in an image
    sample['image'] = features[random_indices, :]
^ As mentioned earlier, the 2D intermediate representation (Nx512) is created in an offline process and saved in features.pt files.
The above solution first finds the median size of the 2D intermediate representations, based on the number of patches in each gigapixel image. It then checks whether the current 2D intermediate representation in the batch is smaller than the median; if so, it zero-fills that representation up to the median size. If the representation is larger than the median, it samples the median number of patches from it.
I am looking for a better solution than the current one. Perhaps something without sampling or zero-filling and without loss of data. Thanks for any possible lead.
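One more standardized option (a sketch, not part of the original post; collate_variable_patches is an invented name): let a DataLoader collate_fn pad each batch to its own longest sequence with torch.nn.utils.rnn.pad_sequence and return a boolean mask, so nothing is sampled away or padded to a fixed median. The model would then need to consume the mask (e.g. masked or attention-style pooling over patches).

import torch
from torch.nn.utils.rnn import pad_sequence

def collate_variable_patches(batch):
    # batch: list of samples, each with 'image' of shape (N_i, 512)
    feats = [item['image'] for item in batch]
    lengths = torch.tensor([f.shape[0] for f in feats])
    padded = pad_sequence(feats, batch_first=True)                    # (B, max_N, 512)
    mask = torch.arange(padded.shape[1])[None, :] < lengths[:, None]  # (B, max_N), True = real patch
    return padded, mask

# loader = DataLoader(dataset, batch_size=8, collate_fn=collate_variable_patches)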
I have a PyTorch tensor T with shape (batch_size, window_size, filters, 3, 3), and I would like to pool the tensor by trace. Specifically, I would like to obtain a tensor T_pooled of size (batch_size, window_size//2, filters, 3, 3) by comparing the traces of paired frames. For example, if window_size=4, we would compare the traces of the 3x3 matrices T[i,0,k] and T[i,1,k] and select the one with the smaller trace as T_pooled[i,0,k]. Similarly, we would compare T[i,2,k] and T[i,3,k] to obtain T_pooled[i,1,k].
This can be done by looping over i and k, but that is very slow and inefficient. Is there a way to vectorize this pooling operation to speed it up?
Edit:
Here is what I have tried so far. It uses list comprehension and for loops. It takes approximately 2.5s to run on a tensor of size (128,120,22,3,3).
def TPL_Pairwise(x):
    x_pooled = torch.zeros(x.shape[0], x.shape[1]//2, x.shape[2], x.shape[3], x.shape[4])
    # compute tensorized trace
    trace = torch.einsum('ijkll->ijkl', x).sum(-1)
    for i in range(x.shape[0]):
        for j in range(x.shape[2]):
            keep = [x[i, k, j] if trace[i, k, j] <= trace[i, k+1, j] else x[i, k+1, j]
                    for k in range(0, x.shape[1], 2)]
            x_pooled[i, :, j] = torch.stack(keep)
    return x_pooled
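For what it's worth, one way to vectorize this (a sketch, assuming window_size is even, which the pairing already requires): reshape the window axis into pairs, compute every trace at once with einsum, and select within each pair via torch.where.

import torch

def tpl_pairwise_vectorized(x):
    # x: (batch, window, filters, 3, 3); window is assumed to be even
    B, W, F, H, _ = x.shape
    pairs = x.reshape(B, W // 2, 2, F, H, H)           # group consecutive frames into pairs
    traces = torch.einsum('abcdee->abcd', pairs)       # per-matrix trace, shape (B, W//2, 2, F)
    keep_first = (traces[:, :, 0] <= traces[:, :, 1])[:, :, :, None, None]
    return torch.where(keep_first, pairs[:, :, 0], pairs[:, :, 1])

# x = torch.randn(128, 120, 22, 3, 3)
# assert torch.allclose(tpl_pairwise_vectorized(x), TPL_Pairwise(x))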
I have an image dataset that I retrieved by means of tf.data.Dataset.list_files().
In my .map() function, I read and decode images, like below:
def map_function(filepath):
    image = tf.io.read_file(filename=filepath)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [IMAGE_WIDTH, IMAGE_HEIGHT])
    return image
If I use the following, it works:
dataset = tf.data.Dataset.list_files(file_pattern=...)
dataset = dataset.map(map_function)
for image in dataset.as_numpy_iterator():
    # Correctly outputs the numpy array, no error is displayed/encountered
    print(image)
However, if I use the following, it throws an error:
dataset = tf.data.Dataset.list_files(file_pattern=...)
dataset = dataset.batch(32).map(map_function)
for image in dataset.as_numpy_iterator():
    # Error is displayed
    print(image)
ValueError: Shape must be rank 0 but is rank 1 for 'ReadFile' (op:
'ReadFile') with input shapes: [?].
Now, according to this: https://www.tensorflow.org/guide/data_performance#vectorizing_mapping, the code should not fail, and the preprocessing step should be optimized (batch processing vs. per-element processing).
Where is the mistake in my code?
*** If I use map().batch() instead, it works fine.
The error occurs because map_function expects unbatched elements, but in the second example you give it batched elements.
The example in https://www.tensorflow.org/guide/data_performance is being tricky by defining an increment function which can apply to both batched and unbatched elements, since adding 1 to a batched element like [1, 2, 3] will result in [2, 3, 4].
def increment(x):
    return x + 1
To use vectorization, you would need to write a vectorized_map_function, which takes in a vector of unbatched elements, applies the map function to each element in the vector, then returns a vector of the results.
In your case though, I don't think vectorizing will have a noticeable impact, since the cost of reading and decoding files is much higher than the overhead of calling a function. Vectorization is most impactful when the map function is extremely cheap, to the point where the time spent on function invocation is comparable to the time spent doing actual work in the map function.
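If you do want to keep .batch() before .map(), one workaround (a sketch, not from the guide) is to wrap the per-element function with tf.map_fn so it iterates over the batch of file paths. Note that this still reads and decodes one file at a time, so it mainly avoids the rank error rather than giving a real vectorization speedup.

import tensorflow as tf

def batched_map_function(filepaths):
    # filepaths has shape (batch_size,); map_function expects a scalar string
    return tf.map_fn(map_function, filepaths, fn_output_signature=tf.float32)  # dtype=tf.float32 on older TF

dataset = tf.data.Dataset.list_files(file_pattern=...)  # as in the question
dataset = dataset.batch(32).map(batched_map_function)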
In the following code, what does torch.cat really do? I know it concatenates the batch contained in the sample, but why do we have to do that, and what does concatenating really mean?
# memory is just a list of events
def sample(self, batch_size):
    samples = zip(*random.sample(self.memory, batch_size))
    return map(lambda x: Variable(torch.cat(x, 0)))
torch.cat concatenates, as the name suggests, along a specified dimension.
Example from documentation will tell you everything you need to know:
x = torch.randn(2, 3)  # shape (2, 3)
catted = torch.cat((x, x, x), dim=0)  # shape (6, 3), i.e. three copies of x stacked on top of each other
Remember that the tensors you concatenate need to have the same shape in every dimension except the one along which you are concatenating.
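A quick illustration of that rule with two fresh tensors:

a = torch.randn(2, 3)
b = torch.randn(5, 3)
torch.cat((a, b), dim=0).shape  # torch.Size([7, 3]) -- non-concatenated dims match
# torch.cat((a, torch.randn(2, 4)), dim=0)  # would raise a RuntimeError: sizes differ in dim 1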
In your example, though, it doesn't do anything and isn't even viable as written, since map is missing its second argument (the iterable to apply the function to), see here.
Assuming you would do this mapping instead:
map(lambda x: Variable(torch.cat(x,0)), samples)
It would create a new tensor of shape [len(samples), x_dim_1, x_dim_2, ...], provided all samples have the same shape in every dimension except 0.
Still, it is a pretty convoluted example and definitely shouldn't be done like that (torch.autograd.Variable is deprecated, see here); this should be enough:
# assuming random.sample returns either `list` or `tuple`
def sample(self, batch_size):
    return torch.cat(random.sample(self.memory, batch_size), dim=0)
I have a 4-D numpy array, with the first dimension representing the number of images in a data set, the second and third being the (equal) width and height, and the fourth being the number of channels (3). For example, let's say I have 4 color images that are 28x28, so my image data looks like this:
X = np.reshape(np.arange(4*28*28*3), (4,28,28,3))
I would like to select a random 16x16 (width x height) crop of each of the 4 images. Critically, I want the crop to be different per image, i.e. I want to generate 4 random (x_offset, y_offset) pairs. In the end I want access to an array of shape (4, 16, 16, 3).
If I were to write this in a for loop it would look something like this:
x = np.random.randint(0,12,4)
y = np.random.randint(0,12,4)
for i in range(X.shape[0]):
    cropped_image = X[i, x[i]:x[i]+16, y[i]:y[i]+16, :]
    # Add cropped image to a list or something
But I'd like to do it as efficiently as possible and I'm wondering if there's a way to do it with strides and fancy indexing. I've seen the answers to this question, but can't quite wrap my head around how I might combine something like stride_tricks with random starting points for the strides on the second and third (width and height) axes.
Leverage a strides-based method for efficient patch extraction
We can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, to get sliding windows that are merely views into the input array and hence incur no extra memory overhead and are virtually free! We could of course use np.lib.stride_tricks.as_strided directly, but the setup work required is hard to manage, especially on arrays with higher dimensions. If scikit-image is not available, we can use its source code directly, which works standalone.
Explanation on usage of view_as_windows
The idea with view_as_windows is that we feed in the window_shape arg as a tuple with the same length as the number of dimensions of the input array whose sliding windows are needed. The axes along which we need to slide get their respective window lengths, and the rest get 1s. This creates an array of views with singleton dims/axes, i.e. axes of length 1, corresponding to the 1s in the window_shape arg. So for those axes we index into the zeroth element to get a squeezed version of the sliding windows.
Thus, we would have a solution, like so -
# Get sliding windows
from skimage.util.shape import view_as_windows
w = view_as_windows(X, (1,16,16,1))[...,0,:,:,0]
# Index and get our specific windows
out = w[np.arange(X.shape[0]),x,y]
# If you need those in the same format as in the posted loopy code
out = out.transpose(0,2,3,1)
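As a quick sanity check (assuming X, x, y and out as defined above), the strided result matches the loopy version exactly; the windows themselves are views, and only the final fancy indexing materializes the selected crops.

import numpy as np

ref = np.stack([X[i, x[i]:x[i]+16, y[i]:y[i]+16, :] for i in range(X.shape[0])])
assert np.array_equal(out, ref)  # same (4, 16, 16, 3) crops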