I've been working with LSTMs for a while and I think I've grasped the main concepts. I've been experimenting with Keras to get a better idea of how LSTMs work, so I decided to train a neural network to classify the MNIST dataset.
I know that when I train an LSTM I should give it a tensor as input (number of samples, time steps, features). I reshaped each image from 28x28 to a single vector of 784 elements (1x784), and then I made the input_shape = (60000, 1, 784). Then I tried to change the number of time steps, and my new input_shape became (60000, 16, 49).
What I don't understand is why when I change the number of time steps the feature vector changes from 784 to 49. I think I don't really understand the concept of time steps in an LSTM. Could you please explain it better? Possibly referring to this particular case?
Furthermore, when I increase the number of time steps the accuracy is lower. Why is that? Shouldn't it be higher?
Thank you.
edit
from __future__ import print_function
import numpy as np
import struct
from keras.models import Sequential
from keras.layers import Dense, LSTM, Activation
from keras.utils import np_utils
train_im = open('train-images-idx3-ubyte','rb')
train_la = open('train-labels-idx1-ubyte','rb')
test_im = open('t10k-images-idx3-ubyte','rb')
test_la = open('t10k-labels-idx1-ubyte','rb')
##training images and labels
magic,num_ima = struct.unpack('>II', train_im.read(8))
rows,columns = struct.unpack('>II', train_im.read(8))
img = np.fromfile(train_im,dtype=np.uint8).reshape(rows*columns, num_ima) #784*60000
magic_l, num_l = struct.unpack('>II', train_la.read(8))
lab = np.fromfile(train_la, dtype=np.int8) #1*60000
## test images and labels
magic, num_test = struct.unpack('>II', test_im.read(8))
rows,columns = struct.unpack('>II', test_im.read(8))
img_test = np.fromfile(test_im,dtype=np.uint8).reshape(rows*columns, num_test) #784x10000
magic_l, num_l = struct.unpack('>II', test_la.read(8))
lab_test = np.fromfile(test_la, dtype=np.int8) #1*10000
batch = 50
epoch=15
hidden_units = 10
classes = 1
a, b = img.T.shape[0:]
img = img.reshape(img.T.shape[0],-1,784)
img_test = img_test.reshape(img_test.T.shape[0],-1,784)
lab = np_utils.to_categorical(lab, 10)
lab_test = np_utils.to_categorical(lab_test, 10)
print(img.shape[0:])
model = Sequential()
model.add(LSTM(40,input_shape =img.shape[1:], batch_size = batch))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer = 'RMSprop', loss='mean_squared_error', metrics = ['accuracy'])
model.fit(img, lab, batch_size = batch,epochs=epoch,verbose=1)
scores = model.evaluate(img_test, lab_test, batch_size=batch)
predictions = model.predict(img_test, batch_size = batch)
print('LSTM test score:', scores[0])
print('LSTM test accuracy:', scores[1])
edit 2
Thank you very much. When I do so, I get the following error:
ValueError: Input arrays should have the same number of samples as target arrays. Found 3750 input samples and 60000 target samples.
I know that I should reshape the output as well but I don't know what shape it should have.
Timesteps represent states in time, like frames extracted from a video. The input passed to the LSTM should have the shape (num_samples, timesteps, input_dim). If you want 16 timesteps you should reshape your data to (num_samples // timesteps, timesteps, input_dim):
img=img.reshape(3750,16,784)
So with your batch_size=50, it will pass 50*16 images at a time.
Right now, because you keep num_samples constant at 60000, the reshape splits your input_dim instead (784 becomes 16x49).
edit:
The target array will have the same number of samples as the input, i.e. 3750 in your case. All the time steps in a sequence will share the same label. You have to decide what you want to do with those MNIST sequences: your current model classifies those sequences (not individual digits) into 10 classes.
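For instance, continuing the reshape above, one arbitrary way to make the shapes line up (purely illustrative: it treats every group of 16 images as one sequence and, as an arbitrary choice, labels it with the label of its first image) would be:
timesteps = 16
img = img.reshape(-1, timesteps, 784)             # (3750, 16, 784)
lab = np_utils.to_categorical(lab, 10)            # (60000, 10), as in your code
lab = lab.reshape(-1, timesteps, 10)[:, 0, :]     # (3750, 10): arbitrary choice, label of the first image in each group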
Related
I'm building an RNN to analyze motion capture (MoCap) data using TensorFlow, Pandas, and Keras.
About my data:
Data is obtained through pandas.read_csv and has a shape of
(832, 165)
Each row denotes a whole frame of data in a movement sequence (832 frames)
Each column denotes the rotational data for a joint (165 joints total)
I'm attempting to feed the data in one row at a time. The output should be the next frame in the movement sequence. I keep running into different types of errors when running model.fit.
I've attached a series of photos representing my different attempts to make the model work. If someone could provide some guidance as to why it's not working and how to fix it, I'd greatly appreciate it.
As a side note, each version of my code is different. I'm okay with using any of them as long as it ends up working, so when providing feedback please identify which version of my code you're referring to.
Uses tf.data.Dataset as input
Version 1 Code / Output
Version 2 Code / Output
Version 3: [Code] [Output]
Uses pandas arrays for input and target
Version 4 Code / Output
Version 5 Code / Output
Using Code 4 as a basis for troubleshooting, I noticed that you are passing incompatible shapes to the layers.
The line model.add(keras.layers.InputLayer(input_shape = (N_TIMESTEPS, N_FEATURES))) expects each sample of your data to have the shape (N_TIMESTEPS, N_FEATURES).
Your data, however, has shape (832, 165), i.e. N_SAMPLES on the first axis and N_FEATURES on the second; the N_TIMESTEPS axis is missing.
First, you should create a modified dataset that will generate a shape of (N_SAMPLES, N_TIMESTEPS, N_FEATURES).
Here is an example to generate a dummy dataset:
data = tf.random.normal((N_SAMPLES, N_TIMESTEPS, N_FEATURES))
target = tf.random.normal((N_SAMPLES, N_TIMESTEPS, N_FEATURES))
The N_TIMESTEPS in your data is important in LSTM as it determines how many TIME_STEPS to consider per update.
Here is the complete code used to simulate successful execution in Google Colab.
%tensorflow_version 2.x # To ensure latest Tensorflow version in Google Colab
import tensorflow as tf
import tensorflow.keras as keras
print(tf.__version__) # Tensorflow 2.2.0-rc3
BATCH_SIZE = 1
N_TIMESTEPS = 10
#Data is obtained through pandas.read_csv and has a shape of (832, 165)
#Each row denotes a whole frame of data in a movement sequence (832 frames)
#Each column denotes the rotational data for a joint (165 joints total)
# N_SAMPLES = data.values.shape[0]
# N_FEATURES = data.values.shape[1]
N_SAMPLES = 832
N_FEATURES = 165
def get_compiled_model():
    model = keras.Sequential()
    model.add(keras.layers.InputLayer(input_shape = (N_TIMESTEPS, N_FEATURES)))
    model.add(keras.layers.LSTM(35, activation = 'relu', return_sequences = True))
    model.add(keras.layers.LSTM(35, activation = 'relu', return_sequences = True))
    model.add(keras.layers.Dense(165, activation = 'tanh'))
    model.compile(optimizer = 'adam',
                  loss = 'mse',
                  metrics = ['accuracy'])
    return model
model = get_compiled_model()
model.summary()
data = tf.random.normal((N_SAMPLES, N_TIMESTEPS, N_FEATURES))
target = tf.random.normal((N_SAMPLES, N_TIMESTEPS, N_FEATURES))
model.fit(data, target, epochs = 15, batch_size = BATCH_SIZE, shuffle = False)
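If you want to train on your real MoCap frames instead of random tensors, here is a possible sketch (an assumption about what you want: each window of N_TIMESTEPS consecutive frames predicts the same window shifted one frame into the future; data is the DataFrame you load with pandas.read_csv):
import numpy as np
frames = data.values.astype('float32')                            # (832, 165)
windows, targets = [], []
for start in range(len(frames) - N_TIMESTEPS):
    windows.append(frames[start : start + N_TIMESTEPS])           # N_TIMESTEPS input frames
    targets.append(frames[start + 1 : start + N_TIMESTEPS + 1])   # the same window shifted by one frame
x = np.stack(windows)                                             # (822, N_TIMESTEPS, 165)
y = np.stack(targets)                                             # (822, N_TIMESTEPS, 165)
model.fit(x, y, epochs = 15, batch_size = BATCH_SIZE, shuffle = False)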
Hope this helps you.
You can read more in the TensorFlow Keras guide on RNNs at this link.
I load data from a CSV file with 20+6 columns (features and labels). I'm trying to run my data through a convolutional neural network in PyTorch. I get an error saying it expects a 3D input while I'm giving it a 1D input. I am using Conv1d.
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import pandas as pd
from torch.utils.data import Dataset,DataLoader
from sklearn.model_selection import train_test_split
#Read Data
data=pd.read_csv('Data.csv')
Features=data[data.columns[0:20]]
Labels=data[data.columns[20:]]
#Split Data
X_train, X_test, y_train, y_test = train_test_split( Features, Labels, test_size=0.33, shuffle=True)
#Create Tensors
train_in=torch.tensor(X_train.values)
train_out=torch.tensor(y_train.values)
test_in=torch.tensor(X_test.values)
test_out=torch.tensor(y_test.values)
#Model CNN
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv1d(20, 40, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv1d(40, 60, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=2)
        )
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(60, 30)
        self.fc2 = nn.Linear(30, 15)
        self.fc3 = nn.Linear(15, 6)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.drop_out(out)
        out = self.fc1(out)
        out = self.fc2(out)
        out = self.fc3(out)
        return out
Epochs=10
N_labels=len(Labels.columns)
N_features=len(Features.columns)
batch_size=100
learning_rate=0.001
#TRAIN MODEL
model = CNN()
#LOSS AND OPTIMIZER
criterion = torch.nn.SmoothL1Loss()
optimizer = torch.optim.Adam(model.parameters(),lr=learning_rate)
#TRAIN MODEL
model.train()
idx=0
for i in train_in:
    y = model(i)
    loss = criterion(y, train_out[idx])
    idx += 1
    loss.backward()
    optimizer.step()
How do I write the training and evaluation loop? All the examples I see on the internet use images, and they also use a DataLoader.
Conv1d takes as input a tensor with 3 dimensions (N, C, L), where N is the batch size, C is the number of channels and L is the length of the 1D data. In your case it seems like one sample has 20 entries and you have one channel. You also have a batch_size variable, but it is not used in the code you posted.
nn.Conv1d(20,40,kernel_size=5,stride=1,padding=2)
This line creates a convolution which takes an input with 20 channels (you have 1) and outputs 40 channels. So you have to change the 20 to a 1, and you might want to change the 40 to something smaller. Since convolutions are applied across the whole input (controlled by stride, padding and kernel size), there is no need to specify the size of a sample.
Also, you might want to add some logic to build minibatches. Right now it seems like you feed every sample by itself. Maybe read a bit about dataset classes and data loaders in PyTorch; a sketch follows below.
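For illustration, here is a minimal, self-contained sketch of a training and evaluation loop that batches with a DataLoader and feeds Conv1d the (N, C, L) layout it expects. It uses dummy data and a deliberately simplified model, so the layer sizes are assumptions, not your exact setup:
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

N_FEATURES, N_LABELS = 20, 6
X = torch.randn(500, N_FEATURES)                  # 500 dummy samples, 20 features each
Y = torch.randn(500, N_LABELS)                    # 6 regression targets per sample
loader = DataLoader(TensorDataset(X, Y), batch_size=100, shuffle=True)

model = nn.Sequential(                            # one input channel, then flatten and regress
    nn.Conv1d(1, 8, kernel_size=5, stride=1, padding=2),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * N_FEATURES, N_LABELS),
)
criterion = nn.SmoothL1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

model.train()
for epoch in range(10):
    for xb, yb in loader:
        xb = xb.unsqueeze(1)                      # (N, 20) -> (N, 1, 20): add the channel dimension
        optimizer.zero_grad()                     # clear gradients from the previous step
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

model.eval()
with torch.no_grad():                             # evaluation pass: no gradients needed
    print(criterion(model(X.unsqueeze(1)), Y).item())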
I'm trying to come up with a Keras model that would do binary classification on image sequences. I split my dataset into train (1200 samples), valid (320 samples) and test (764 samples) partitions.
In the first step, I used InceptionV3 to extract features from my dataset consisting of image sequences. I saved those sequences of extracted features to disk.
The next step is to feed those sequences as input data to my LSTM based model.
Here is a minimal workable example that compiles:
import numpy as np
import copy
import random
from keras.layers.recurrent import LSTM
from keras.layers import Dense, Flatten, Dropout
from keras.models import Sequential
from keras.optimizers import Adam
def get_sequence(sequence_no, sample_type):
    # Load sequence from disk according to 'sequence_no' and 'sample_type'
    # eg. 'sample_' + sample_type + '_' + sequence_no
    return np.zeros([100, 2048])  # added only for demo purposes, so it compiles

def frame_generator(batch_size, generator_type, labels):
    # Define list of sample indexes
    sample_list = []
    for i in range(0, len(labels)):
        sample_list.append(i)
    # sample_list = [0, 1, 2, 3 ...]
    while 1:
        X, y = [], []
        # Generate batch_size samples
        for _ in range(batch_size):
            # Reset to be safe
            sequence = None
            # Get a random sample
            random_sample = random.choice(sample_list)
            # Get sequence from disk
            sequence = get_sequence(random_sample, generator_type)
            X.append(sequence)
            y.append(labels[random_sample])
        yield np.array(X), np.array(y)
# Data mimicking
train_samples_num = 1200
valid_samples_num = 320
labels_train = np.random.randint(0, 2, size=(train_samples_num), dtype='uint8')
labels_valid = np.random.randint(0, 2, size=(valid_samples_num), dtype='uint8')
batch_size = 32
train_generator = frame_generator(batch_size, 'train', labels_train)
val_generator = frame_generator(batch_size, 'valid', labels_valid)
# Model
model = Sequential()
model.add(LSTM(2048, input_shape=(100, 2048), dropout = 0.5))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
model.summary()
# Training
model.fit_generator(
    generator=train_generator,
    steps_per_epoch=len(labels_train) // batch_size,
    epochs=10,
    validation_data=val_generator,
    validation_steps=len(labels_valid) // batch_size)
This example was compiled with:
python 3.6.8
keras 2.2.4
tensorflow 1.13.1
This works well: I ran 3 training sessions on this model (same setup, same train/valid/test partitioning) and the mean test accuracy was 96.9%.
Then I started to ask myself whether it's a good idea to always choose a completely random sample from the sample list inside the frame_generator() function, specifically the line random_sample = random.choice(sample_list). Picking samples completely at random means that some samples may be used far more often than others during a training session, and some may never be used at all. This made me think that the model would generalize less in this setup compared to one where it sees all the samples equally often during training.
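As a rough back-of-the-envelope check of that concern (a toy snippet, separate from the training code above): drawing 1200 indices with replacement from a pool of 1200 typically leaves roughly a third of the pool untouched.
import random
pool = range(1200)                                    # one index per training sample
drawn = {random.choice(pool) for _ in range(1200)}    # roughly one epoch's worth of random.choice draws
print(len(drawn))                                     # typically around 760 of the 1200 indices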
As a fix, I applied the following changes to frame_generator():
Make a backup copy of the sample list, then remove each sample from the sample list after using it. When the sample list becomes empty, replace it with its backup copy. Do this for the whole training session. This ensures that all of the samples are seen by the model during training with almost the same frequency.
This is the new version of frame_generator(); the 4 added lines are marked with # ADDED:
def frame_generator(batch_size, generator_type, labels):
    # Define list of sample indexes
    sample_list = []
    for i in range(0, len(labels)):
        sample_list.append(i)
    # sample_list = [0, 1, 2, 3 ...]
    sample_list_backup = copy.deepcopy(sample_list)  # ADDED
    while 1:
        X, y = [], []
        # Generate batch_size samples
        for _ in range(batch_size):
            # Reset to be safe
            sequence = None
            # Get a random sample
            random_sample = random.choice(sample_list)
            sample_list.remove(random_sample)  # ADDED
            if len(sample_list) == 0:  # ADDED
                sample_list = copy.deepcopy(sample_list_backup)  # ADDED
            # Get sequence from disk
            sequence = get_sequence(random_sample, generator_type)
            X.append(sequence)
            y.append(labels[random_sample])
        yield np.array(X), np.array(y)
What I expected:
Given that the model should now be able to generalize a bit better, because all the samples are seen during training with equal frequency, I expected the test accuracy to increase slightly.
What happened:
I ran 3 training sessions on this model (same train/valid/test partitioning) and the mean test accuracy this time was 96.2%. That is a 0.7% decrease in accuracy from the first setup, so it seems that the model now generalizes worse.
Loss plots for each run:
Question:
Why does more randomness in frame_generator() result in better test set accuracy?
Seems rather counterintuitive to me.
My input data consists of 10 samples, each of which has 200 time steps, while each time step is described by a vector of 30 dimensions.
In addition, each time step has a 3-dimensional one-hot vector describing the action taken at that particular time step. With that said, I am trying to build a model which gets fed all previous actions and then predicts which action would be best to take next.
I tried to get this working with tflearn and tensorflow but with limited success so far.
Simple sample code:
import numpy as np
import operator
import tflearn
from tflearn import regression
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.embedding_ops import embedding
from tflearn.layers.recurrent import bidirectional_rnn, BasicLSTMCell
from tflearn.data_utils import to_categorical, pad_sequences
SAMPLES = 10
TIME_STEPS = 200
DATA_DIMENSIONS = 30
LABEL_CLASSES = 3
x = []
y = []
# Generate fake data.
for i in range(SAMPLES):
    sequences = []
    outputs = []
    for i in range(TIME_STEPS):
        d = []
        for i in range(DATA_DIMENSIONS):
            d.append(1)
        sequences.append(d)
        outputs.append([0, 0, 1])
    x.append(sequences)
    y.append(outputs)
print("X1:", len(x), ", X2:", len(x[0]), ", X3:", len(x[0][0]))
print("Y1:", len(y), ", Y2:", len(y[0]), ", Y3:", len(y[0][0]))
# Define model
net = tflearn.input_data([None, TIME_STEPS, DATA_DIMENSIONS], name='input')
net = tflearn.lstm(net, 128, dropout=0.8, return_seq=True)
net = tflearn.fully_connected(net, LABEL_CLASSES, activation='softmax')
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy', name='targets')
model = tflearn.DNN(net)
# Fit model.
model.fit({'input': x}, {'targets': y},
          n_epoch=1,
          snapshot_step=1000,
          show_metric=True, run_id='test', batch_size=32)
Error
ValueError: Cannot feed value of shape (10, 200, 3) for Tensor
'targets/Y:0', which has shape '(?, 3)'
As far as I understand, the input_data should be correct. However, the output data is apparently wrong; at least, TensorFlow throws an error. That is probably because my model expects one label per sample rather than one label per time step.
Can I even achieve my goal with an LSTM, and if so, how do I have to set up my model?
Thanks,
Robert
As the error suggests, there is a shape mismatch between the expected size of your targets tensor, and the one of the data you actually provide for it. Let us break it down.
From what I understand, you have a labeled action for every timestep of your sequences. This means that the labels you provide should have a shape of (10, 200, 3). This seems to be the case from the error message. Good.
So we now know the error comes from what the network generates.
=================
Input data -> (10, 200, 30)
LSTM -> (10, 128) (because return_seq=False)
FullyConnected -> (10, 3).
=================
That explains the second part of the error message: your network indeed produces an output with shape (10, 3), which does not match the shape of your labels.
I think you missed the return_seq argument of the LSTM. As is usually the case with RNN implementations, there is a parameter telling the layer whether to return outputs for the whole sequence or only for the last timestep. Here, by default, it is the second option, which is why you don't get an output with the expected shape. Use return_seq=True.
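If it helps, here is the same idea expressed in Keras rather than tflearn, as a minimal sketch of the shape flow only (return the full sequence, then classify every timestep):
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(200, 30)))   # one output per timestep
model.add(TimeDistributed(Dense(3, activation='softmax')))           # classify each of the 200 steps
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()   # per-sample output shape: (200, 3), matching labels of shape (10, 200, 3)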
I have some trouble understanding LSTM models in TensorFlow.
I use tflearn as a wrapper, as it does all the initialization and other higher-level stuff automatically. For simplicity, let's consider this example program. Up to line 42, net = tflearn.input_data([None, 200]), it's pretty clear what happens: you load a dataset into variables and pad it to a standard length (in this case, 200). Both the input variables and the 2 classes are, in this case, converted to one-hot vectors.
How does the LSTM take the input? Across how many samples does it predict the output?
What does net = tflearn.embedding(net, input_dim=20000, output_dim=128) represent?
My goal is to replicate the activity recognition setup from the paper. For example, I would like to feed a 4096-dimensional vector as input to the LSTM, take 16 of such vectors, and then produce the classification result. I think the code would look like this, but I don't know how the input to the LSTM should be given.
from __future__ import division, print_function, absolute_import
import tflearn
from tflearn.data_utils import to_categorical, pad_sequences
from tflearn.datasets import imdb
train, val = something.load_data()
trainX, trainY = train #each X sample is a (16,4096) nd float64
valX, valY = val #each Y is a one hot vector of 101 classes.
net = tflearn.input_data([None, 16,4096])
net = tflearn.embedding(net, input_dim=4096, output_dim=256)
net = tflearn.lstm(net, 256)
net = tflearn.dropout(net, 0.5)
net = tflearn.lstm(net, 256)
net = tflearn.dropout(net, 0.5)
net = tflearn.fully_connected(net, 101, activation='softmax')
net = tflearn.regression(net, optimizer='adam',
loss='categorical_crossentropy')
model = tflearn.DNN(net, clip_gradients=0., tensorboard_verbose=3)
model.fit(trainX, trainY, validation_set=(testX, testY), show_metric=True,
batch_size=128,n_epoch=2,snapshot_epoch=True)
Basically, the LSTM takes the size of your vector for one cell:
lstm = rnn_cell.BasicLSTMCell(lstm_size, forget_bias=1.0)
Then, how many time steps do you want to feed? That is determined by the vector you feed in: the number of arrays in X_split decides the number of time steps:
X_split = tf.split(0, time_step_size, X)
outputs, states = rnn.rnn(lstm, X_split, initial_state=init_state)
In your example, I guess the lstm_size is 256, since it's the vector size of one word. The time_step_size would be the max word count in your training/test sentences.
Please see this example: https://github.com/nlintz/TensorFlow-Tutorials/blob/master/07_lstm.py