How to extract individual JPEG images from an HDF5 file - python

I have a big HDF5 file containing images and their corresponding ground truth density maps.
I want to feed them into the CRSNet network, which requires the images as separate files.
How can I achieve that? Thank you very much.
-- Basic info: I have an HDF5 file with two keys, "images" and "density_maps". Their shapes are (300, 380, 676, 1),
where 300 is the number of images and 380 and 676 are the height and width respectively.
-- What I need to feed into the CRSNet network are the images (jpg) together with their corresponding HDF5 files. Their shape would be (572, 945).
Thanks a lot for any comment and discussion!

For starters, a quick clarification on h5py and HDF5. h5py is a Python package to read and write HDF5 files. You can also read HDF5 files with the PyTables package (and with other languages: C, C++, Fortran).
I'm not entirely sure what you mean by "the images (jpg) with their corresponding h5py (HDF5) files." As I understand it, all of your data is in 1 HDF5 file. Also, I don't understand what you mean by "The shape of them would be (572, 945)." That is different from the image data, right? Please update your post to clarify these items.
It's relatively easy to extract data from a dataset. This is how you can get the "images" as NumPy arrays and use cv2 to write them as individual jpg files. See the code below:
import h5py
import cv2

with h5py.File('yourfile.h5', 'r') as h5f:
    for i in range(h5f['images'].shape[0]):
        img_arr = h5f['images'][i, :]  # slice notation gets [i, :, :, :]
        cv2.imwrite(f'test_img_{i:03}.jpg', img_arr)
Before you start coding, are you sure you need the images as individual image files, or individual image data (usually NumPy arrays)? I ask because the first step in most CNN processes is reading the images and converting them to arrays for downstream processing. You already have the arrays in the HDF5 file. All you may need to do is read each array and save to the appropriate data structure for CRSNet to process them. For example, here is the code to create a list of arrays (used by TensorFlow and Keras):
import h5py

image_list = []
with h5py.File('yourfile.h5', 'r') as h5f:
    for i in range(h5f['images'].shape[0]):
        image_list.append(h5f['images'][i, :])  # gets slice [i, :, :, :]
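If CRSNet really does expect one small HDF5 file per image for the density maps (which is how I read "with their corresponding HDF5 files"), the same loop pattern can write those out too. This is only a sketch under that assumption; the output file names and the 'density' dataset name are placeholders, not something CRSNet is confirmed to require:
import h5py

with h5py.File('yourfile.h5', 'r') as h5f:
    for i in range(h5f['density_maps'].shape[0]):
        dmap = h5f['density_maps'][i, :, :, 0]  # drop the trailing channel axis -> (380, 676)
        # hypothetical per-image output file and dataset name
        with h5py.File(f'test_img_{i:03}.h5', 'w') as out:
            out.create_dataset('density', data=dmap)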

Related

Loading WSI slide images to python and converting images to array

I am new to WSI image processing and have been trying to load images in Python. I was successful in loading slide images, but they are not getting converted to an array for further processing. Any help is appreciated. Following is the code I used:
filenames = os.listdir("C:/Users/DELL/Downloads/dataset1")
X_train= np.array(filenames)
Following is the output I get instead of an array of numbers representing an image:
'TCGA-BA-4074-01A-01-BS1.9c51e4d0-cb30-412a-995a-97ac4f860a87.svs'
You should use specialized libraries for reading WSI images. Check these:
slideio
openslide
Keep in mind that WSI slides are normally too large to load into memory at the original resolution. You can load them partially or at lower resolutions; both libraries support such functionality (see the sketch below).
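As an illustration, here is a minimal sketch using openslide, assuming the .svs file name from the question; the thumbnail size, tile size, and coordinates are arbitrary choices for the example:
import numpy as np
import openslide

slide = openslide.OpenSlide("TCGA-BA-4074-01A-01-BS1.9c51e4d0-cb30-412a-995a-97ac4f860a87.svs")
print(slide.level_dimensions)  # (width, height) of each resolution level

# Option 1: a low-resolution overview that easily fits in memory
thumb = np.array(slide.get_thumbnail((2048, 2048)))  # PIL image -> NumPy array

# Option 2: a 512x512 tile at full resolution, starting at level-0 coordinates (0, 0)
tile = np.array(slide.read_region((0, 0), 0, (512, 512)).convert("RGB"))

print(thumb.shape, tile.shape)
slide.close()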

Easy way to horizontally flip images in dataset with json labels?

I'm using Tensorflow 2.0 with Python to train an image classifier. I'm using the file model_main_tf2.py to train the model, and have a dataset of images for training and testing. The images were annotated using the LabelMe tool in Python, which allows me to create polygon masks for a Mask RCNN.
What I would like to do is generate duplicates of all the training and test images by flipping them horizontally. I can already do this easily in Python, but I also want to flip the JSON files that LabelMe generates, to save me from re-annotating the flipped images. Is there a tool that allows me to do this?
Thanks
Since this question is under the Python tag - I assume you want this to be done in Python. Flipping can be done in numpy, PIL, or opencv (your choice).
import numpy as np
from PIL import Image

image = np.array(Image.open('example.jpg'))  # some image loaded as a numpy array
print(type(image))
# <class 'numpy.ndarray'>

flipped_image_h = np.flip(image, axis=1)  # flip horizontally (np.fliplr also works)
flipped_image_v = np.flip(image, axis=0)  # flip vertically (np.flipud also works)

# Save the flipped image
Image.fromarray(flipped_image_h).save('example_flipped_h.jpg')
See the numpy docs for more info.
I don't think there is an explicit way to do it. You just need to write code that opens your JSON files and makes the changes yourself; a sketch follows below.
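As a rough sketch, assuming the usual LabelMe JSON layout with an imageWidth field and a shapes list of polygon points (check your own files, since the exact keys can vary with the LabelMe version), a horizontal flip just mirrors each x coordinate:
import json

def flip_labelme_horizontally(in_path, out_path, flipped_image_path):
    # Mirror polygon x coordinates in a LabelMe JSON file (key names assumed)
    with open(in_path) as f:
        data = json.load(f)

    width = data['imageWidth']  # assumed to be present in the annotation
    for shape in data['shapes']:
        shape['points'] = [[width - x, y] for x, y in shape['points']]

    data['imagePath'] = flipped_image_path  # point at the flipped image file
    data['imageData'] = None                # drop any embedded image data

    with open(out_path, 'w') as f:
        json.dump(data, f, indent=2)

flip_labelme_horizontally('img001.json', 'img001_flipped.json', 'img001_flipped.jpg')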

Save Image dataset into CSV

I have an image dataset consisting of 90k images of size [64, 64, 3].
I have done some preprocessing on the images, which takes a lot of time if I have to redo it from scratch.
Now, how do I store these images, as a NumPy array of shape [90000, 64, 64, 3], into a CSV file as integers, along with their labels?
Is there any other way (another file type) to store this data?
P.S.: I tried np.savetxt, but when I read back the data I get strings with dots, and a lot of the values are lost.
Thank you.
Found it!!
We can use np.save() to save the array in .npy format and load the file back using np.load(). Also, multiple NumPy arrays can be saved using np.savez() and np.savez_compressed(), which store them in .npz format (uncompressed and compressed, respectively).
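For instance, a minimal sketch (the array contents and file names are just placeholders):
import numpy as np

images = np.zeros((90000, 64, 64, 3), dtype=np.uint8)  # your preprocessed images
labels = np.zeros(90000, dtype=np.int64)               # your labels

# Single array -> .npy
np.save('images.npy', images)
images_back = np.load('images.npy')

# Several arrays in one file -> .npz (compressed variant shown)
np.savez_compressed('dataset.npz', images=images, labels=labels)
data = np.load('dataset.npz')
images_back, labels_back = data['images'], data['labels']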

Transferring training data from matlab to tensorflow

I have used MATLAB's Image Labeller App to create PixelLabelData for 500 images. So, I have the original images and class labels for each image. This information is stored in a gTruth file which is in .mat format. I want to use this dataset for training a U-Net in TensorFlow (Google Colab).
I could not carry out the training in MATLAB because of system limitations (insufficient RAM and no GPU). However, I have read that we can import training data from MATLAB for use in Colab. So, I uploaded the original image set, the labelled pixels, and the corresponding mat file (gTruth.mat) to Google Drive and then mounted the drive in the Colab environment. But I don't know how to proceed with the .mat file in Colab.
The pixelLabelTrainingData function will allow you to obtain two separate datastores for the input and pixel labeled images.
[imds,pxds] = pixelLabelTrainingData(gTruth);
https://www.mathworks.com/help/vision/ref/pixellabeltrainingdata.html
Given those, you can write each image and labeled image to parallel directories, using the same naming convention, with imwrite and the image file format of your choice.
inputImageDir = 'pathOfYourChoice';
count = 0;
while hasdata(imds)
    img = read(imds);
    fname = sprintf('img%d.png', count);
    name = fullfile(inputImageDir, fname);
    imwrite(img, name);
    count = count + 1;  % increment so each image gets a unique file name
end
From there, you should be able to use standard tensorflow tooling (e.g. Dataset) to read in directories of images.
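On the Colab side, a rough sketch of reading such parallel directories with tf.data (the directory paths, file pattern, and decode settings are assumptions about how the files were written):
import tensorflow as tf

IMG_DIR = '/content/drive/MyDrive/images'  # hypothetical mounted-Drive paths
LBL_DIR = '/content/drive/MyDrive/labels'

def load_pair(img_path, lbl_path):
    image = tf.image.decode_png(tf.io.read_file(img_path), channels=3)
    label = tf.image.decode_png(tf.io.read_file(lbl_path), channels=1)
    return tf.cast(image, tf.float32) / 255.0, label

# Matching file names in both directories (img0.png, img1.png, ...)
img_files = tf.data.Dataset.list_files(IMG_DIR + '/*.png', shuffle=False)
lbl_files = tf.data.Dataset.list_files(LBL_DIR + '/*.png', shuffle=False)

dataset = (tf.data.Dataset.zip((img_files, lbl_files))
           .map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(500)
           .batch(8)
           .prefetch(tf.data.AUTOTUNE))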

Custom file structure to save multiple images in python

I am experimenting with packaging of data, and since most of my data is stored as images, graphs, and other similar formats, I was planning to find a more efficient way to store these images.
I did read about saving them in a DB as blobs, and some others are more inclined to save them in the file system; but what I would like is for the images not to be visible outside the application. This is essential because when I run analysis on instruments, I am not interested in showing users all the images, only the ones related to their particular instrument.
Plus, it is convenient to pack data into one single file, compared to a folder with 20-30 images in it.
I was thinking of storing the images in a custom structure, a sort of bin file, using Python, unless there is something that already covers that functionality. In my search I didn't notice any specific structure for saving images; the most common solutions were either a folder in the file system or the DB approach.
If you can convert your images to raster arrays, you can store them in an HDF5 file: Add raster image to HDF5 file using h5py
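For example, a minimal sketch that packs a folder of images into a single HDF5 file and reads one back (the folder path and dataset names are hypothetical):
import glob
import os

import h5py
import numpy as np
from PIL import Image

# Pack: one dataset per image, so images of different sizes can coexist
with h5py.File('instrument_data.h5', 'w') as h5f:
    for path in glob.glob('analysis_output/*.png'):  # hypothetical folder
        arr = np.array(Image.open(path))
        h5f.create_dataset(os.path.basename(path), data=arr, compression='gzip')

# Unpack: read a single image back by name
with h5py.File('instrument_data.h5', 'r') as h5f:
    img = h5f['plot_01.png'][:]  # hypothetical dataset name
    Image.fromarray(img).show()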
