Keras ImageDataGenerator from directory - python

I want to train a network from data that I have in a Dataframe. I have image paths and classes.
There are many image classes so splitting them into directories is not an option.
Besides flow_from_directory, I only found the flow function, which takes a NumPy array and the classes as arguments. However, the data will not even fit in RAM.
Is there any solution using Keras?
Edit 1:
Issues referenced on GitHub:
https://github.com/keras-team/keras/issues/3295
https://github.com/keras-team/keras/issues/3338
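Keras's ImageDataGenerator.flow_from_dataframe() covers exactly this case (paths and classes in a DataFrame, streamed in batches). The core idea behind it, streaming batches from disk instead of holding everything in RAM, can be sketched in a few lines; load_image below is a hypothetical per-file loader (e.g. a thin wrapper around PIL.Image.open returning a numpy array), not a Keras API:

```python
import numpy as np

# Minimal sketch of streaming batches from disk instead of materialising the
# whole dataset in RAM. `load_image` is a hypothetical per-file loader.
def batches(paths, classes, batch_size, load_image):
    for lo in range(0, len(paths), batch_size):
        x = np.stack([load_image(p) for p in paths[lo:lo + batch_size]])
        y = np.array(classes[lo:lo + batch_size])
        yield x, y
```

Only one batch of images is ever in memory at a time, which is what makes this approach work for datasets larger than RAM.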

Related

`flow_from_dataframe()` custom pre-processing

I'm trying to use the Keras ImageDataGenerator.flow_from_dataframe() method to generate image data on the fly, as the dataset I'm working on is too large to load into memory in one go.
The source image data files are DICOM files, which are not supported by the flow_from_dataframe() method.
Is it possible to (easily) extend flow_from_dataframe() to handle DICOM (or other unsupported) images/input?
Perhaps a custom pre-processing function could be run on each unsupported file, returning a normalised (windowed/photometric corrected) numpy array, then allowing the ImageDataGenerator instance to proceed.
I could edit the source on my own installation, but a general solution that works on vanilla Keras is preferred, to ensure portability to other platforms (especially Kaggle)!
A solution to this can be found on the Keras GitHub issue tracker/feature requests: https://github.com/keras-team/keras/issues/13665
A custom data generator can be created using keras.utils.Sequence as the superclass.
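A minimal sketch of such a generator follows. load_image is a hypothetical loader you would supply; for DICOM it might wrap pydicom.dcmread(path).pixel_array plus windowing/normalisation. A plain class with the same __len__/__getitem__ interface is used here so the batching logic is clear on its own; in practice you would subclass keras.utils.Sequence and pass the instance to model.fit:

```python
import math
import numpy as np

# Sketch of a batch generator for unsupported formats (e.g. DICOM).
# In practice this would subclass keras.utils.Sequence; the interface
# (__len__ = batches per epoch, __getitem__ = one batch) is the same.
class DicomSequence:
    def __init__(self, paths, labels, batch_size, load_image):
        self.paths = paths
        self.labels = labels
        self.batch_size = batch_size
        self.load_image = load_image  # hypothetical per-file loader

    def __len__(self):
        # number of batches per epoch
        return math.ceil(len(self.paths) / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        x = np.stack([self.load_image(p) for p in self.paths[lo:hi]])
        y = np.array(self.labels[lo:hi])
        return x, y
```

Because each batch is loaded on demand, this stays within memory limits and keeps the preprocessing (windowing, photometric correction) in one place.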

How to create additional training images with Keras preprocessing layers?

I am following the official Tensorflow/Keras docs on image classification, in particular the section on image augmentation. There it says:
Data augmentation takes the approach of generating additional training data from your existing examples by augmenting them using random transformations that yield believable-looking images. This helps expose the model to more aspects of the data and generalize better.
So my understanding of this is that - for example if I have not many training images - I want to generate additional training data by creating new, augmented images in addition to the existing training images.
Then in the Keras docs linked above it is shown how some preprocessing layers from the layers.experimental.preprocessing module are added as the first layers of the Sequential model in the example. So in theory that makes sense: those new preprocessing layers augment the input data (= images) before they "enter" the real TF model.
However, as quoted above, what I thought we want to do is create additional images, i.e. new images on top of the existing training images. But how would such a set of preprocessing layers in the model create additional images? Wouldn't they simply (randomly) augment the existing training images before they enter the model, rather than create new, additional images?
It is creating additional images, but that doesn't necessarily mean that it will create new jpg files.
If this is what you're trying to do, ImageDataGenerator can do that, with the save_to_dir argument.
Wouldn't they simply (randomly) augment the existing training images before they enter the model, rather than create new, additional images?
Yes, it creates new images, but it doesn't create new files on your machine. If you also want the augmented images written to disk, pass a directory of your choice to save_to_dir (here 'augmented' is just an example path):
ImageDataGenerator.flow_from_directory(directory, target_size=(256, 256),
                                       save_to_dir='augmented', save_prefix='aug',
                                       save_format='png')
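The on-the-fly behaviour can be illustrated without Keras at all: each epoch a random transform is applied, so the model sees a different variant of the same source image. This is a sketch using numpy in place of the preprocessing layers:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(img):
    # mimics layers.experimental.preprocessing.RandomFlip("horizontal"):
    # flip left-right with probability 0.5
    return img[:, ::-1] if rng.random() < 0.5 else img

img = np.arange(6).reshape(2, 3)
# each "epoch", the model sees a (possibly different) variant of the same
# source image; no new files are written -- the variants exist only in memory
epoch_views = [random_flip(img) for _ in range(4)]
```

Over many epochs the model effectively trains on far more distinct images than exist on disk, which is what the docs mean by "generating additional training data".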

How to use flow_from_directory function when we want implement cross validation

I have a small image dataset, so I used the ImageDataGenerator class in Keras to augment it. I put my dataset in a folder so I could take advantage of the flow_from_directory function to load and use the images.
Now I have to implement k-fold cross validation on my code, and I don't know how to manage my dataset, since the name of each image is its label.
Does anybody have an idea how to handle this situation?
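One common approach, sketched here with plain numpy (no Keras or scikit-learn): parse the label out of each filename, shuffle, and split the indices into k folds; each fold's (filename, label) table can then be fed to ImageDataGenerator.flow_from_dataframe instead of flow_from_directory. The filename pattern below is hypothetical:

```python
import numpy as np

# Hypothetical filename pattern: the part before "_" is the class label.
filenames = ["cat_001.jpg", "dog_001.jpg", "cat_002.jpg",
             "dog_002.jpg", "cat_003.jpg", "dog_003.jpg"]
labels = [f.split("_")[0] for f in filenames]

k = 3
rng = np.random.default_rng(42)
indices = rng.permutation(len(filenames))
folds = np.array_split(indices, k)

for i in range(k):
    val_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    # build per-fold tables of (filename, label) pairs from these indices
    # and pass each to ImageDataGenerator.flow_from_dataframe, fitting a
    # fresh model per fold
```

For unbalanced classes, a stratified split (e.g. sklearn.model_selection.StratifiedKFold over the parsed labels) would be the usual refinement.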

Preparing Video data for classification Keras

I am unable to decide how to feed video data to a Keras model. I'd like to use a DataGenerator for this case, like ImageDataGenerator. From this answer I gather that ImageDataGenerator would not be suitable for this.
I have looked at this GitHub repo for a VideoGenerator in Keras, which uses .npy files in directories. But the downside is that data augmentation is absent at the moment. How do I go about accomplishing this?
Is there no way I can use ImageDataGenerator?
Suppose I split all videos into frames and then load directories of .jpg files instead; how would that fare?
If I write a custom data generator using this data generator tutorial, how do I arrange this partition dict? My data consists of .avi files.
You might find this example helpful.
First you create your tensor of timestep data and then you reshape it for the LSTM network.
Assuming that your dataset consists of sorted frames laid out as follows:
data/
    class0/
        img001.jpg
        img002.jpg
        ...
    class1/
        img001.jpg
        img002.jpg
        ...
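The "create your tensor of timestep data and reshape it for the LSTM" step can be sketched as follows. Loading the actual .jpg frames (e.g. with PIL or OpenCV) is assumed, so dummy arrays stand in:

```python
import numpy as np

# Sketch: turn an ordered list of frame arrays into LSTM-ready clips of
# shape (num_clips, timesteps, height, width, channels).
def frames_to_clips(frames, timesteps):
    n = len(frames) // timesteps              # drop any incomplete tail clip
    clips = np.stack(frames[: n * timesteps])
    return clips.reshape(n, timesteps, *frames[0].shape)

frames = [np.zeros((32, 32, 3)) for _ in range(10)]
clips = frames_to_clips(frames, timesteps=4)  # clips.shape == (2, 4, 32, 32, 3)
```

Per-frame augmentation can be applied before stacking, as long as the same transform is used for every frame of a clip so the temporal sequence stays consistent.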

How to import data into Tensorflow?

I am new to Tensorflow and to implementing deep learning. I have a dataset of images (images of the same object).
I want to train a Neural Network model using python and Tensorflow for object detection.
I am trying to import the data into TensorFlow, but I am not sure of the right way to do it.
Most of the tutorials available online use public datasets (e.g. MNIST), whose import is straightforward but not helpful when I need to use my own data.
Is there a procedure or tutorial that I can follow?
There are many ways to import images for training. You can use TensorFlow directly, but the images will be imported as TensorFlow objects, which you won't be able to visualize until you run the session.
My favorite tool for importing images is skimage.io.imread. The imported images have shape (height, width, channels).
Alternatively, you can use the image I/O tools from scipy.misc (note these are deprecated in recent SciPy versions).
To resize images, you can use skimage.transform.resize.
Before training, you will need to normalize all the images to have values between 0 and 1. To do that, simply divide the pixel values by 255.
The next step is to one-hot encode your labels into arrays of 0s and 1s.
Then you can build and train your CNN.
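The normalisation and one-hot encoding steps above can be made concrete with a small sketch (a dummy uint8 array stands in for an image loaded with skimage.io.imread):

```python
import numpy as np

# A dummy uint8 array stands in for an image loaded with skimage.io.imread.
img = np.array([[0, 255], [128, 64]], dtype=np.uint8)
scaled = img.astype(np.float32) / 255.0      # pixel values now in [0, 1]

labels = np.array([0, 2, 1])                 # integer class ids
num_classes = 3
one_hot = np.eye(num_classes)[labels]        # one row of 0s and 1s per label
```

The np.eye indexing trick is a compact stand-in for keras.utils.to_categorical, which does the same thing.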
You could create a data directory containing one subdirectory per image class containing the respective image files and use flow_from_directory of tf.keras.preprocessing.image.ImageDataGenerator.
A tutorial on how to use this can be found in the Keras Blog.
