In Keras, if I want to use my trained LSTM model to predict on multiple new instances that are independent of the training data, does the input array need to include the same number of time steps used in training? And, if so, can I expect the shape of the input array for model.predict to be the same as that of the training data, i.e. [number of samples to be predicted on, their timesteps, their features]?
Thank you :)
You need to distinguish between the 'sample' or 'batch' axis and the time steps and features dimensions.
The number of samples is variable - you can train (fit) your model on thousands of samples, and make a prediction for a single sample.
The time steps and feature dimensions have to be identical for fit and predict - that is because the weights of the input layer (and the rest of the network) have fixed dimensions.
In that respect, an LSTM is not that much different from a DNN.
There are cases (e.g. one-to-many models) in which the application is different, but the formal design (i.e. input shape, output shape) is the same.
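A minimal sketch to illustrate this (toy shapes and random data; only the point about matching timesteps/features matters):

import numpy as np
from tensorflow import keras

timesteps, features = 10, 3                            # must be the same for fit and predict
model = keras.Sequential([
    keras.layers.LSTM(8, input_shape=(timesteps, features)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x_train = np.random.rand(1000, timesteps, features)    # 1000 training samples
y_train = np.random.rand(1000, 1)
model.fit(x_train, y_train, epochs=1, verbose=0)

x_new = np.random.rand(5, timesteps, features)         # 5 new, independent samples
preds = model.predict(x_new)                           # shape (5, 1)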
I'm trying to build a model that uses an MLP for feature extraction and dimension reduction. The model should transform the data from 204 dimensions to 80 dimensions. The proposed model is as follows:
A 512-unit dense layer taking the original data (204 dimensions) as input
A 256-unit dense layer taking the 512-dimensional output as input
An 80-unit dense layer taking the 256-dimensional output as input
The proposed number of training epochs is 1, and the output of the MLP is used as the input to further models (such as LR, SVM, etc.).
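In Keras terms, the stack described above would look roughly like this (a sketch only; the ReLU activations are just placeholder choices I picked for illustration):

from tensorflow import keras
from tensorflow.keras import layers

mlp = keras.Sequential([
    layers.Dense(512, activation="relu", input_shape=(204,)),  # 204 -> 512
    layers.Dense(256, activation="relu"),                      # 512 -> 256
    layers.Dense(80, activation="relu"),                       # 256 -> 80 reduced features
])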
My question is: when training the MLP, what loss function should I set? Is MSE loss OK, or should I use other loss functions? Thanks!
What would you be training this MLP on? (what would be the target 80-dimensional "Y"?)
MLPs learn features at the same time as the rest of the model is trained. For example, if you wanted an MLP that does linear regression and learns a set of 80-dimensional features, you could create something like this:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.models.Sequential()
# 80-dimensional "feature" layer; MY_ACTIVATION is a placeholder for your chosen activation
model.add(layers.Dense(80, input_dim=512, activation=MY_ACTIVATION))
model.add(layers.Dense(1))  # regression head producing Y
model.compile(loss="mean_squared_error")
In the last layer, the network will learn to find the "best" weights and biases to capture Y as a function of the 80 features extracted. These features are in turn a function of X - a function the network learns by adjusting for how well these features are able to capture Y (this is backpropagation).
So creating an MLP just to learn features doesn't make sense without a problem statement for what these features are supposed to do.
As such I would recommend using something like Principal Component Analysis or Singular Value Decomposition. These project the data onto the k-dimensional space that captures the most variance (information) in the data.
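For example, a minimal sketch of PCA-based reduction to 80 dimensions (assuming scikit-learn; the random array X is only a stand-in for your 204-dimensional data):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 204)       # stand-in for your (n_samples, 204) data
pca = PCA(n_components=80)          # keep the 80 directions with the most variance
X_reduced = pca.fit_transform(X)    # shape (1000, 80)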
I have individually trained the same neural network architecture on a large number of different datasets (on the order of 100s) to learn a unique non-linear function for each, i.e. I have basically learned a set of weights that describes the function for each dataset.
Now, I want to use these sets of weights as a pre-trained layer in another optimization problem. I know how to load a single saved model and employ it as a layer. However, what I will be doing is a group-wise optimization across the 100s of different datasets, where I have pre-trained weights for each (from above).
So the setup is a batch of x datasets, each with n data points in d dimensions, i.e. the input data is of shape [X, N, D]. There is a series of layers which act on all of this data, and when it gets to the "pre-trained" layer I wish to use different pre-trained weights for each dataset, i.e. [0,:,:] uses the weights learned from dataset 0 above, [1,:,:] the weights learned from dataset 1, and so on.
I then need to combine the output of all of this together, as the loss function for this group-wise optimization is based on the variance across all datasets. So I don't believe I can trivially evaluate one set, calculate the loss, change the weights, rinse and repeat, and sum up at the end.
I doubt it is feasible to have massive duplicate branches, where I have x copies of the pre-trained NN layers, as the pre-trained architecture is already quite complex.
Is it possible to use a split layer, then a for-loop-type approach in which I change the weights and pass the correct portion of data through, and then merge all the outputs? Or is there a better way of tackling this?
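For concreteness, here is a rough sketch of the split-then-route-then-merge idea (everything here is a toy stand-in: the shapes, the shared Dense layer, and the make_submodel() helper standing in for my actual pre-trained models, which would really be loaded with keras.models.load_model):

import tensorflow as tf
from tensorflow import keras

# Hypothetical stand-ins for the pre-trained per-dataset models.
def make_submodel():
    m = keras.Sequential([keras.layers.Dense(8, activation="relu", input_shape=(100, 16)),
                          keras.layers.Dense(1)])
    m.trainable = False                                     # freeze the per-dataset weights
    return m

pretrained_models = [make_submodel() for _ in range(3)]    # X = 3 datasets

full_input = keras.Input(shape=(100, 4), batch_size=3)     # [X=3, N=100, D=4]
shared = keras.layers.Dense(16, activation="relu")(full_input)   # layers acting on all datasets

# Split along the dataset axis and route each slice through its own pre-trained model.
per_dataset_outputs = []
for i, sub_model in enumerate(pretrained_models):
    slice_i = keras.layers.Lambda(lambda t, i=i: t[i:i + 1])(shared)   # shape (1, N, 16)
    per_dataset_outputs.append(sub_model(slice_i))

# Merge everything so a variance-across-datasets loss can see all outputs at once.
merged = keras.layers.Lambda(lambda ts: tf.concat(ts, axis=0))(per_dataset_outputs)
group_model = keras.Model(full_input, merged)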
Any help much appreciated.
Say I have a set of image data for training: 20 input images and 20 output images, with image size 512*512. First I prepare the training data as "train_image_input" (size 20*512*512) and "train_image_output" (size 20*512*512), then I run the code below in Keras,
model.fit(train_image_input, train_image_output, epochs=3, batch_size=5)
I would like to confirm the definition of a "batch" when the data are images. In the above example, does "batch_size=5" mean that
5 images (data size 5*512*512) are taken into training at a time? or
5 columns within a single image (data size 5*512) are taken into training at a time?
I have read this article: https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
and the description below confuses me about the definition of a sample/batch when the data are images:
What Is a Sample?
A sample is a single row of data.
It contains inputs that are fed into the algorithm and an output that is used to compare to the prediction and calculate an error.
A training dataset is comprised of many rows of data, e.g. many samples. A sample may also be called an instance, an observation, an input vector, or a feature vector.
Now that we know what a sample is, let’s define a batch.
What Is a Batch?
The batch size is a hyperparameter that defines the number of samples to work through before updating the internal model parameters.
Furthermore, if I set "batch_size=30", which is larger than the number of images, there is no error during code execution, so should I conclude that the second interpretation (data size 5*512) is correct?
Thanks.
The batch size defines the number of samples that will be propagated through the network.
For instance, let's say you have 1050 training samples and you want to set up a batch_size equal to 100. The algorithm takes the first 100 samples (1st to 100th) from the training dataset and trains the network. Next, it takes the second 100 samples (101st to 200th) and trains the network again. We can keep doing this procedure until we have propagated all samples through the network. A problem might arise with the last set of samples: in our example we have 1050, which is not divisible by 100 without a remainder. The simplest solution is just to take the final 50 samples and train the network on them.
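To tie this back to the image example above, a quick sanity check (assuming NumPy arrays shaped as in the question):

import numpy as np

train_image_input = np.zeros((20, 512, 512))   # 20 samples, each one a 512*512 image
batch_size = 5
# One sample is one whole image, so each training step sees an array of shape (5, 512, 512).
print(train_image_input[:batch_size].shape)    # (5, 512, 512)
# Number of batches per epoch: ceil(20 / 5) = 4.
# With batch_size=30 (more than the 20 available samples), Keras simply trains on a single
# batch of all 20 images, which is why no error is raised.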
I'm reviewing the Keras CNN example located here, and I see that the positive and negative sentiment training samples in the input data are randomly shuffled. I was wondering if the CNN is sensitive to the ordering of the training data.
For clarity: if my y_train were of shape 100x1, in which indices 0-50 were all positive sentiments and 50-100 were negative sentiments, would the results be any different compared to when every even index has positive sentiment and every odd index has negative?
Theoretically, if the last half of the samples in the final epoch were only positive, your model might end up with a slight bias towards the positive. This is why Keras' fit() function has a shuffle option: it shuffles the training samples every epoch to ensure there is no such bias and your model gets to train on different batches, looking at your problem from many different angles. Unless you have a reason to believe you should not be doing this, I'd definitely recommend it.
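For reference, shuffle is already enabled by default in fit(), but you can set it explicitly (a toy call; x_train, y_train and model are assumed to exist):

model.fit(x_train, y_train, epochs=10, batch_size=32, shuffle=True)  # shuffle=True is the default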
Shuffling the data when training a neural network in batches might be crucial for the performance of your model. A more detailed discussion on this topic is presented here on the data science stackexchange.
I just want to add that shuffling is in general also beneficial for the evaluation of your model, for example when you do cross-validation. In each train-test fold you want to have random samples, so that you can be sure that your model generalizes well.
Say we train a multilayer NN in TensorFlow for a regression task (i.e. the multi-input, multi-output case). Then we get new instances, apply the trained model, and of course obtain the corresponding outputs. Is there a way to backpropagate the outputs and reconstruct the inputs in TensorFlow in an easy/efficient manner? What I am thinking is to then use the difference between the original and the reconstructed inputs of the new instances as a QC measure, i.e. if the reconstructed inputs are not close enough to the originals then we have a problem, etc. I hope I am making myself clear.
No, unfortunately you cannot take a trained model and recover the corresponding input from an output. The reason for this is that there are infinitely many possible inputs for each output.
Furthermore, backpropagation is not passing an output backwards through the network. It is the process of determining to what extent each parameter in the model contributes to the loss function. It will not give you the inputs to the hidden layers, only the extent to which the weights affected your result.
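To illustrate what backpropagation actually gives you, here is a minimal sketch using tf.GradientTape (toy model and data): the result is a set of gradients of the loss with respect to the weights, not a reconstructed input.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(2),
])
x = tf.random.normal((8, 3))     # 8 instances with 3 input features
y = tf.random.normal((8, 2))     # 2 target outputs per instance

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))     # MSE loss
grads = tape.gradient(loss, model.trainable_variables)
# grads contains one tensor per weight/bias: how the loss changes with each parameter,
# not anything that would let you recover x from the outputs.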