Currently I have the following problem: I get an array of image data (a large vector). I don't know the size of the image, only the formula for its dimensions (2^n x 2^m), and I don't know the encoding of the image (JPEG, J2K, lossy 12-bit JPEG or similar). One array whose encoding I do know looks like:
[-1000, -888, -884, -883, -884, -886,...-850, -852, -854, -854]
Here I can simply reshape it into the form I want (in this case the side length is the square root of the vector length) and afterwards convert it into an image I can view with
pixel_values = numpy.asarray(pixel_values).reshape(512, 512)
pl2 = pylab.imshow(pixel_values, cmap=pylab.cm.bone)
But now I have another array:
[65534, 57344, 4, 0, 0, 0, 65534, 57344, 7652, 1, 20479, 20991, 10496, 0,...35286, 23076, 34407, 36383, 56252, 65370, 217]
Here I cannot use the square root or anything similar (I only know that the images are always of shape 2^n x 2^m), and I don't know how to transform this data into a real image I can view. How can I find out the encoding of this data and the size in Python?
To determine the size of the image, I don't think there is a better approach than simple trial and error. First we determine the image sizes compatible with the expression (2^n, 2^m):
import numpy as np

vect_len = len(pixel_values)
min_size = 256  # e.g. minimal acceptable size for one of the dimensions
npm = np.log2(vect_len)  # this is n + m

if not npm % 1:
    # n + m is an integer
    for n in range(1, int(npm)):
        p = int(npm) - n
        if 2**n < min_size or 2**p < min_size:
            continue
        print((2**n, 2**p))
# (256, 2048)
# (512, 1024)
# (1024, 512)
# (2048, 256)
Then, for each of the possible image sizes, we reshape the pixel_values array and plot the result until it looks right. If it is a colour image, there would also be a third dimension of size 3 for RGB channels.
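For instance, a sketch of that visual check, reusing pixel_values and the pylab import from the question (the candidate shapes are the ones printed above):

import numpy as np
import pylab

# Candidate (height, width) pairs found above
candidate_shapes = [(256, 2048), (512, 1024), (1024, 512), (2048, 256)]

for shape in candidate_shapes:
    img = np.asarray(pixel_values).reshape(shape)
    pylab.figure()
    pylab.imshow(img, cmap=pylab.cm.bone)
    pylab.title(str(shape))  # inspect each plot and keep the one that looks right
pylab.show()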
If you can visualise your image simply by reshaping the input vector, that means the vector directly contains the values for each pixel and we don't care about the encoding of the image (it was already decoded). Indeed, a .jpeg file stores the discrete cosine transform (DCT) coefficients, J2K stores wavelet transform coefficients, and so on. This is not something you want to get into; the approach is to use the appropriate library for each format.
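If the second array is not decoded pixel data but the raw bytes of an encoded file stored as 16-bit words (which values such as 65534 and 57344 might hint at), one speculative way to probe it is to rebuild the byte stream and let a decoder library such as Pillow try to identify it. This is only a sketch under that assumption:

import io
import numpy as np
from PIL import Image

# Assumption: pixel_values holds the bytes of an encoded image packed as uint16 words
raw = np.asarray(pixel_values, dtype=np.uint16).tobytes()

try:
    img = Image.open(io.BytesIO(raw))
    print(img.format, img.size)  # e.g. 'JPEG', (width, height) if Pillow recognises the stream
except Exception as exc:
    print("Could not identify the stream:", exc)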
Related
I have (for the most part) gigapixel images that I have divided into 512x512 patches. I feed each 512x512 image with 3 channels into a frozen ResNet18 network for feature extraction and end up with a 1D tensor of length 512 per patch. Eventually I concatenate all these per-patch tensors and end up with an Nx512 intermediate representation, where N is the number of patches in the gigapixel image.
Since my original gigapixel images are not all the same size, the intermediate representations range from 17x512 to 6000x512, so I am using the following strategy to feed them to my model. However, I would prefer a more standardized method in PyTorch (for 2D images with 3 channels we could easily use a torch transform, but not here).
import torch

feature_path = 'features.pt'
features = torch.load(feature_path, map_location=lambda storage, loc: storage)

if features.shape[0] <= median_num_patches:
    a = torch.zeros((median_num_patches - features.shape[0], 512))  # zero-pad to length median_num_patches
    embeddings = torch.cat((features, a), axis=0)
    sample['image'] = embeddings
else:
    random_indices = torch.randint(features.shape[0], (median_num_patches, ))  # max size: 6000 patches in an image
    sample['image'] = features[random_indices, :]
^ As mentioned earlier, the 2D intermediate representations (Nx512) are created in an offline process and saved in features.pt files.
The above solution first finds the median size of the 2D intermediate representations (based on the number of patches in each gigapixel image). It then checks whether the current representation in the batch is smaller than the median and, if so, zero-fills it up to the median size. If the representation is larger than the median, it samples the median number of patches from it.
I am looking for a better solution than the current one. Perhaps something without sampling or zero-filling and without loss of data. Thanks for any possible lead.
The specific problem I am trying to solve is:
I have a binary image (a binary map) that I want to generate a heatmap (density map) for. My idea is to take the 2D array of this image, let's say it is 12x12,
a = np.random.randint(20, size=(12, 12))
and index and process it with a fixed-size submatrix (let's say 3x3); for every submatrix, a pixel percentage value will be calculated (nonzero pixels / total pixels).
submatrix = a[0:3, 0:3]
pixel_density = np.count_nonzero(submatrix) / submatrix.size
At the end, all the percentage values will make up a new 2D array (a smaller, 4x4 density array) that represents the density estimation of the original image. Lower resolution is fine because the data it will be compared to has a lower resolution as well.
I am not sure how to do that with numpy, especially the indexing part. Also, if there is a better way to generate a heatmap like that, please let me know as well.
Thank you!
Maybe a 2-D convolution? Basically this will sweep the b matrix (which is just 1s below) across the a matrix, so it performs the summation you were looking for. This link has a visual demo of convolution near the bottom.
import numpy as np
from scipy import signal
a = np.random.randint(2, size=(12, 12))
b = np.ones((4,4))
signal.convolve2d(a,b, 'valid') / b.sum()
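If you want the non-overlapping 3x3 blocks from the question (so the result is the 4x4 density array), one option is to compute the same 'valid' convolution with a 3x3 kernel and then keep every third window. A sketch, assuming the block size divides the image size evenly:

import numpy as np
from scipy import signal

a = np.random.randint(2, size=(12, 12))  # binary example image
b = np.ones((3, 3))                      # block size from the question

dens = signal.convolve2d(a, b, 'valid') / b.sum()  # overlapping window densities, shape (10, 10)
block_density = dens[::3, ::3]                     # every 3rd window -> non-overlapping blocks, shape (4, 4)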
I have a 4-D numpy array, with the first dimension representing the number of images in a data set, the second and third being the (equal) width and height, and the 4th being the number of channels (3). For example let's say I have 4 color images that are 28*28, so my image data looks like this:
X = np.reshape(np.arange(4*28*28*3), (4,28,28,3))
I would like to select a random 16*16 width x height crop of each of the 4 images. Critically, I want the crop to be different per-image, i.e I want to generate 4 random (x_offset, y_offset) pairs. In the end I want access to an array of shape (4, 16, 16, 3).
If I were to write this in a for loop it would look something like this:
x = np.random.randint(0, 13, 4)  # valid offsets are 0..(28-16)=12
y = np.random.randint(0, 13, 4)
for i in range(X.shape[0]):
    cropped_image = X[i, x[i]:x[i]+16, y[i]:y[i]+16, :]
    # Add cropped image to a list or something
But I'd like to do it as efficiently as possible and I'm wondering if there's a way to do it with strides and fancy indexing. I've seen the answers to this question, but can't quite wrap my head around how I might combine something like stride_tricks with random starting points for the strides on the second and third (width and height) axes.
Leverage a strides-based method for efficient patch extraction
We can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, to get sliding windows that are merely views into the input array and hence incur no extra memory overhead, making them virtually free! We could use np.lib.stride_tricks.as_strided directly, but the required setup work is hard to manage, especially on arrays with higher dimensions. If scikit-image is not available, we can use its source code directly, as it works standalone.
Explanation on usage of view_as_windows
The idea with view_as_windows is that we feed the input arg window_shape as a tuple with the same length as the number of dimensions of the input array whose sliding windows are needed. The axes along which we need to slide get their respective window lengths, and the rest get 1s. This creates an array of views with singleton dims/axes, i.e. axes of length 1 corresponding to the 1s in the window_shape arg. For those axes we then index into the zeroth element to get a squeezed version of the sliding windows.
Thus, we would have a solution, like so -
# Get sliding windows
from skimage.util.shape import view_as_windows
w = view_as_windows(X, (1,16,16,1))[...,0,:,:,0]
# Index and get our specific windows
out = w[np.arange(X.shape[0]),x,y]
# If you need those in the same format as in the posted loopy code
out = out.transpose(0,2,3,1)
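For reference, a minimal end-to-end run with the shapes from the question (a sketch assuming scikit-image is installed; it just confirms the output shape):

import numpy as np
from skimage.util.shape import view_as_windows

X = np.reshape(np.arange(4*28*28*3), (4, 28, 28, 3))
x = np.random.randint(0, 13, 4)  # random top-left offsets, 0..12
y = np.random.randint(0, 13, 4)

w = view_as_windows(X, (1, 16, 16, 1))[..., 0, :, :, 0]
out = w[np.arange(X.shape[0]), x, y].transpose(0, 2, 3, 1)
print(out.shape)  # (4, 16, 16, 3)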
I have an image I've read from file with shape (m,n,3) (i.e. it has 3 channels). I also have a matrix to convert the color space with dimensions (3,3). I've already arrived at a few different ways of applying this matrix to each vector in the image; for example,
np.einsum('ij,...j',transform,image)
appears to make for the same results as the following (far slower) implementation.
def convert(im: np.ndarray, transform: np.ndarray) -> np.ndarray:
    """ Convert an image array to another colorspace """
    dimensions = len(im.shape)
    axes = im.shape[:dimensions - 1]
    # Create a new array (respecting mutability)
    new_ = np.empty(im.shape)
    for coordinate in np.ndindex(axes):
        pixel = im[coordinate]
        pixel_prime = transform @ pixel
        new_[coordinate] = pixel_prime
    return new_
However, I found that the following is even more efficient while testing on the example image with line_profiler.
np.moveaxis(np.tensordot(transform, X, axes=((-1),(-1))), 0, 2)
The problem I'm having here is using just a np.tensordot, i.e. removing the need for np.moveaxis. I've spent a few hours attempting to find a solution (I'm guessing it resides in choosing the correct axes), so I thought I'd ask others for help.
You can do it concisely with tensordot if you make image the first argument:
np.tensordot(image, transform, axes=(-1, 1))
You can get better performance from einsum by using the argument optimize=True (requires numpy 1.12 or later):
np.einsum('ij,...j', transform, image, optimize=True)
Or (as Paul Panzer pointed out in a comment), you can simply use matrix multiplication:
image @ transform.T
They all take about the same time on my computer.
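As a quick sanity check that the three variants agree (just a sketch with arbitrary array sizes):

import numpy as np

image = np.random.rand(480, 640, 3)  # arbitrary example image
transform = np.random.rand(3, 3)     # arbitrary colour-space matrix

a = np.tensordot(image, transform, axes=(-1, 1))
b = np.einsum('ij,...j', transform, image, optimize=True)
c = image @ transform.T

print(np.allclose(a, b) and np.allclose(b, c))  # True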
I'm saving a binary OpenCV Mat to a HDF5 file.
In OpenCV, Mat data is stored interleaved in memory, with the channel as the fastest-varying index, then the x-coordinate, then the y-coordinate, so an address access looks like:
address = M.data + M.step[0]*y + M.step[1]*x + ch
where M.step[0] = NUM_X*NUM_CH and M.step[1] = NUM_CH.
The problem I experience is that Matlab and Python interpret the data in the wrong way.
Although the dimensions of the read data are set correctly (channel, x, y), when I look into the data storage I see that e.g. numpy reads the data backwards: first y is incremented, then x, and lastly the channel number. That means it assumes a planar configuration of the channel data, while it is actually interleaved, which results in images being displayed incorrectly.
Is there a way to tell numpy/Matlab to change the data access, without reordering the data?
Thanks in advance.
Edit:
I store everything in a rank 3 dataset in the hdf5 file, where dimension 1 is channel, dimension 2 is x-coordinate and dimension 3 is y-coordinate.
If I read that dataset and process it with OpenCV in C++, the correct image is displayed. OpenCV in Python doesn't work because of the error: (-206) Unrecognized or unsupported array type in function cvGetMat
I could solve the question in Python by changing the shape and strides of the array, which had been calculated in the wrong way:
If I had a 3*1280*720 uint8 image, with 3 being the channel number, 1280 the x-coordinate and 720 the y-coordinate, I would have to assign the shape so that data.shape = (720, 1280, 3) and change the strides to data.strides = (3*1280, 3, 1).
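A minimal sketch of that reinterpretation using numpy's stride tricks, assuming the HDF5 dataset has been read into a flat uint8 buffer of 3*1280*720 interleaved values:

import numpy as np

# Assumption: 'flat' stands in for the raw interleaved buffer read from the HDF5 dataset
flat = np.zeros(3 * 1280 * 720, dtype=np.uint8)

img = np.lib.stride_tricks.as_strided(
    flat,
    shape=(720, 1280, 3),        # y, x, channel
    strides=(3 * 1280, 3, 1),    # bytes per step in y, x, channel (uint8 => 1 byte per element)
)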
This link explains how numpy arrays work in memory:
Numpy doc