I load data from a CSV file with 20+6 columns (features and labels). I'm trying to run my data through a convolutional neural network in PyTorch, but I get an error saying it expects a 3D input and I'm giving it a 1D input. I am using Conv1d.
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import pandas as pd
from torch.utils.data import Dataset,DataLoader
from sklearn.model_selection import train_test_split
#Read Data
data=pd.read_csv('Data.csv')
Features=data[data.columns[0:20]]
Labels=data[data.columns[20:]]
#Split Data
X_train, X_test, y_train, y_test = train_test_split( Features, Labels, test_size=0.33, shuffle=True)
#Create Tensors
train_in=torch.tensor(X_train.values)
train_out=torch.tensor(y_train.values)
test_in=torch.tensor(X_test.values)
test_out=torch.tensor(y_test.values)
#Model CNN
class CNN(nn.Module):
    def __init__(self):
        super(CNN,self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv1d(20,40,kernel_size=5,stride=1,padding=2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2,stride=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv1d(40,60,kernel_size=5,stride=1,padding=2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2,stride=2)
        )
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(60,30)
        self.fc2 = nn.Linear(30,15)
        self.fc3 = nn.Linear(15,6)

    def forward(self,x):
        out=self.layer1(x)
        out=self.layer2(out)
        out=self.drop_out(out)
        out=self.fc1(out)
        out=self.fc2(out)
        out=self.fc3(out)
        return out
Epochs=10
N_labels=len(Labels.columns)
N_features=len(Features.columns)
batch_size=100
learning_rate=0.001
#TRAIN MODEL
model = CNN()
#LOSS AND OPTIMIZER
criterion = torch.nn.SmoothL1Loss()
optimizer = torch.optim.Adam(model.parameters(),lr=learning_rate)
#TRAIN MODEL
model.train()
idx=0
for i in train_in:
    y=model(i)
    loss=criterion(y,train_out[idx])
    idx+=1
    loss.backward()
    optimizer.step()
How do I write the training and eval loop? All the examples I see on the internet use images, and they also use DataLoader.
Conv1d takes as input a tensor with 3 dimensions (N, C, L), where N is the batch size, C is the number of channels and L is the length of the 1D data. In your case it seems like one sample has 20 entries and you have one channel. You have a batch_size variable, but it is not used in the code you posted.
nn.Conv1d(20,40,kernel_size=5,stride=1,padding=2)
This line creates a convolution that takes an input with 20 channels (you have 1) and outputs 40 channels. So you have to change the 20 to a 1, and you might want to change the 40 to something smaller. Since convolutions are applied across the whole input (controlled by stride, padding and kernel size), there is no need to specify the size of a sample.
Also, you might want to add some logic to build mini-batches. Right now it seems like you just want to input every sample by itself. Maybe read a bit about Dataset classes and DataLoaders in PyTorch; a minimal sketch follows.
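Here is a rough sketch of a training and eval loop building on the code above. It assumes you have already changed the first Conv1d to take 1 input channel and adjusted (flattened) the linear part so the model outputs 6 values per sample, as described above. Wrapping the tensors in a TensorDataset/DataLoader gives you mini-batches for free:
from torch.utils.data import TensorDataset, DataLoader

# Each row becomes a 1-channel "signal" of length 20: (N, 20) -> (N, 1, 20)
train_ds = TensorDataset(train_in.float().unsqueeze(1), train_out.float())
test_ds = TensorDataset(test_in.float().unsqueeze(1), test_out.float())
train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=batch_size)

for epoch in range(Epochs):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()            # clear gradients from the previous step
        pred = model(xb)                 # xb has shape (batch, 1, 20)
        loss = criterion(pred, yb)
        loss.backward()
        optimizer.step()

    # Evaluation: eval mode (disables Dropout) and no gradient tracking
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for xb, yb in test_loader:
            total += criterion(model(xb), yb).item() * xb.size(0)
            n += xb.size(0)
    print('epoch', epoch, 'mean test loss:', total / n)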
I'm trying to implement a small prototype of the GCN model using the StellarGraph library. I have my StellarGraph graph object ready, and I'm trying to solve a multi-class, multi-label classification problem. This means I'm trying to predict more than one column (19 exactly), where each column is encoded as either 0 or 1.
Here is what I've done:
from sklearn.model_selection import train_test_split
from stellargraph.mapper import FullBatchNodeGenerator
train_subjects, test_subjects = train_test_split(nodelist, test_size = .25)
generator = FullBatchNodeGenerator(graph, method="gcn")
from stellargraph.layer import GCN
train_gen = generator.flow(train_subjects['ID'], train_subjects.drop(['ID'], axis = 1))
gcn = GCN(layer_sizes=[16, 16], activations=["relu", "relu"], generator=generator, dropout=0.5)
from tensorflow.keras import layers, optimizers, losses, metrics, Model
x_inp, x_out = gcn.in_out_tensors()
predictions = layers.Dense(units = 1, activation="sigmoid")(x_out)
from tensorflow.keras.metrics import Precision as Precision
model = Model(inputs=x_inp, outputs=predictions)
model.compile(
    optimizer=optimizers.Adam(learning_rate=0.01),
    loss=losses.categorical_crossentropy,
    metrics=[Precision()])
val_gen = generator.flow(test_subjects['ID'], test_subjects.drop(['ID'], axis = 1))
from tensorflow.keras.callbacks import EarlyStopping
es_callback = EarlyStopping(monitor="val_precision", patience=200, restore_best_weights=True)
history = model.fit(
    train_gen,
    epochs=200,
    validation_data=val_gen,
    verbose=2,
    shuffle=False,
    callbacks=[es_callback])
I have 271045 edges and 16354 nodes in total, including 12265 training nodes. The issue I'm getting is a shape mismatch from Keras, which reads as follows. I suspect it's due to passing multiple columns as target columns; I tried the model using only one column (class) and it worked perfectly.
InvalidArgumentError: Incompatible shapes: [1,12265] vs. [1,233035]
[[node LogicalAnd_1 (defined at tmp/ipykernel_52/2745570431.py:7) ]] [Op:__inference_train_function_1405]
It's worth mentioning that 233035 = 12265 (number of train nodes) times 19 (number of classes). Any idea what is going wrong here?
I figured out the problem.
It was a newbie mistake: I initialized the Dense classification layer with 1 unit instead of 19 (the number of classes).
I just needed to fix that line to:
predictions = layers.Dense(units = 19, activation="sigmoid")(x_out)
Have a nice day!
I want to load in a pretrained model and start testing on images.
This is the code that I thought would work:
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K
base_model = InceptionV3(weights='imagenet', include_top=False)
from __future__ import absolute_import, division, print_function
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
#Preprocessing
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
train_images = train_images / 255.0
test_images = test_images / 255.0
#Preprocessing
test_loss, test_acc = base_model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)
Instead it says: "You must compile a model before training/testing"
Looking here https://keras.io/applications/ at InceptionV3: they seem to be compiling and fitting the model after importing it. Why do they do this?
The InceptionV3 model was trained on very different images in comparison to Fashion MNIST. What you are seeing in the tutorial is an instance of transfer learning. Roughly, in transfer learning you can split the model into a feature extraction module and a classification module. The goal of the convolutional and pooling layers is to automate the feature extraction, so that we can produce an ideal transformation from raw image pixels to a representative set of features that describes the images well.
These features are then fed to a classification module, whose goal is to take them and actually do the classification. That's the purpose of the dense layers attached after the convolutional and pooling layers. Also note that the InceptionV3 model was trained on ImageNet images, which have 1000 classes. In order to successfully apply this ImageNet-trained model to the Fashion MNIST dataset, you will need to retrain the Dense layers so that the conv and pooling layers provide the features extracted from the images and the Dense layers perform the classification on them. Therefore, set include_top=False as you have done, but you'll also have to attach some Dense layers and retrain those. Also make sure you give the last layer 10 outputs, since the Fashion MNIST dataset has 10 classes.
However, one gotcha is that InceptionV3 takes in 299 x 299 sized images while Fashion MNIST images are 28 x 28. You will need to resize the images, and also artificially pad them in the third dimension so that they are RGB. Because going from 28 x 28 to 299 x 299 requires a factor-of-10 increase in both dimensions, resizing the images up to that resolution probably will not look good perceptually. InceptionV3 does let you change the expected input image size, but the smallest allowed size is unfortunately 75 x 75, so we'll use that and resize up to 75 x 75. To resize the images, you can use scikit-image's resize method from skimage.transform. Also, if you plan on using InceptionV3, you'll need to preprocess the input images the same way they were preprocessed when the network was originally trained.
Therefore:
from __future__ import absolute_import, division, print_function
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.applications.inception_v3 import preprocess_input # New
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(75, 75, 3))
# Now add some Dense Layers - let's also add in a Global Average Pooling layer too
# as a better way to "flatten"
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a softmax layer -- 10 classes
predictions = Dense(10, activation='softmax')(x)
# Create new model
model = Model(inputs=base_model.input, outputs=predictions)
# Make sure we set the convolutional layers and pooling layers so that they're not trainable
for layer in base_model.layers:
    layer.trainable = False
#Preprocessing
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
train_images = train_images.astype(np.float64) / 255.0 # Change
test_images = test_images.astype(np.float64) / 255.0 # Change
# Preprocessing the images
from skimage.transform import resize
train_images_preprocess = np.zeros((train_images.shape[0], 75, 75, 3), dtype=np.float32)
for i, img in enumerate(train_images):
    img_resize = resize(img, (75, 75), anti_aliasing=True)
    img_resize = preprocess_input(img_resize).astype(np.float32)
    train_images_preprocess[i] = np.dstack([img_resize, img_resize, img_resize])
del train_images
test_images_preprocess = np.zeros((test_images.shape[0], 75, 75, 3), dtype=np.float32)
for i, img in enumerate(test_images):
    img_resize = resize(img, (75, 75), anti_aliasing=True)
    img_resize = preprocess_input(img_resize).astype(np.float32)
    test_images_preprocess[i] = np.dstack([img_resize, img_resize, img_resize])
del test_images
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train it
model.fit(train_images_preprocess, train_labels, epochs=15)
# Now evaluate the model - note that we're evaluating on the new model, not the old one
test_loss, test_acc = model.evaluate(test_images_preprocess, test_labels)
print('Test accuracy:', test_acc)
Take note that I had to change the expected input shape of the data for the InceptionV3 base model so that it's 75 x 75 x 3, with 3 channels because it expects colour images. Also, I had to convert your data to floating-point before dividing by 255, or the data would still be unsigned 8-bit integers so the only values would be either 0 or 1, thus significantly decreasing your accuracy. I also created new arrays that store the RGB versions of the images, which are not only resized to 75 x 75 but also preprocessed using the same method InceptionV3 applies to its images prior to training.
One other thing I should mention is that we need to set the layers prior to the Dense layers to be non-trainable so that we don't train on them. We want to use those layers to provide the feature descriptors for the images that get pumped into the Dense layers to do the classification. Finally, take note that the labels for the training and test data are enumerated from 0 - 9. Therefore, the loss function you need is the sparse categorical cross-entropy one, which is designed to take single-valued labels; the categorical cross-entropy loss function expects one-hot encodings.
We then compile the model to set it up for training, train it, and finally evaluate the accuracy on the test data. This will of course require some tuning, especially for the number of Dense layers you want and the number of epochs to choose for training.
Warning
The resizing of the images and the creation of new arrays for them will take some time, as we're looping over 60000 training images and 10000 test images respectively. You'll need to be patient here. In order to conserve memory, I delete the original training and test images from memory to make room for the preprocessed images.
Ending Note
Because the Fashion MNIST dataset has considerably fewer degrees of freedom than ImageNet, you can get away with a high-accuracy model using fewer layers than normal. The ImageNet database consists of images with varying levels of distortion, object orientations, placements and sizes. If you constructed a model consisting of just a few conv and pool layers, combined with a flattening and a couple of dense layers, not only would it take less time to train, but you would get a decently performing model.
Most pre-trained image classification models are pre-trained on the ImageNet dataset, so you're loading the parameter weights from that training when you call base_model = InceptionV3(weights='imagenet', include_top=False). The include_top=False parameter actually chops off the prediction layer from the model, which you are expected to add on and train on your own dataset, Fashion MNIST in this case.
The transfer learning approach doesn't completely get rid of all training, but makes it so you only need to fine tune the model based on your dataset's specific data. Since the model has already learned how to recognize basic and even somewhat complex shapes by training on ImageNet, now it just needs to be trained to recognize what certain combinations of shapes mean in the context of your data.
That being said, I believe you should still be able to call model.predict(x) on some preprocessed image, x, if you change include_top=False to include_top=True, although the model will try to classify the image into one of ImageNet's 1000 classes and not into one of Fashion MNIST's classes.
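As a rough sketch of that predict-only route (the image path below is just a placeholder, and the predictions will be ImageNet classes, not Fashion MNIST ones):
from keras.applications.inception_v3 import InceptionV3, preprocess_input, decode_predictions
from keras.preprocessing import image
import numpy as np

full_model = InceptionV3(weights='imagenet', include_top=True)   # full classifier, expects 299 x 299 x 3 input

img = image.load_img('some_image.jpg', target_size=(299, 299))   # placeholder path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = full_model.predict(x)
print(decode_predictions(preds, top=5))   # top-5 ImageNet classes with probabilities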
The example you are pointing to in the Keras documentation does not do the same thing you want to do: they fit the model in order to perform transfer learning.
You seem to want to just load a pretrained model and then evaluate its loss/accuracy on some dataset. The problem is that in order to call model.evaluate, you first need to define a loss and metrics (including accuracy), and for that you need to call model.compile(loss = ..., metrics = ..., optimizer = ...), just because it is the only Keras call that sets the loss and metrics of a model.
If for some reason you don't want to do that, you could just compute y_pred = model.predict on your dataset, and use any Python implementation of the losses and metrics you want on y_true and y_pred. This does not require compiling the model, since you externalize the evaluation.
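As a rough sketch of the two options, assuming a model whose output layer already matches your labels (which, as noted above, is not the case for base_model with include_top=False):
# Option 1: compile just to register a loss and metrics, then evaluate
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
test_loss, test_acc = model.evaluate(test_images, test_labels)

# Option 2: skip compiling and score the predictions yourself
import numpy as np
y_pred = model.predict(test_images)
test_acc = np.mean(np.argmax(y_pred, axis=1) == test_labels)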
I've been working with LSTMs for a while and I think I have grasped the main concepts. I have been playing with the Keras environment to get a better idea of how LSTMs work, so I decided to train a neural network to classify the MNIST dataset.
I know that when I train an LSTM I should give it a tensor as input (number of samples, time steps, features). I reshaped each image from 28x28 to a single vector of 784 elements (1x784), so the input shape is (60000, 1, 784). Eventually I tried to change the number of time steps, and my new input shape became (60000, 16, 49).
What I don't understand is why, when I change the number of time steps, the feature vector changes from 784 to 49. I think I don't really understand the concept of time steps in an LSTM. Could you please explain it better, possibly referring to this particular case?
Furthermore, when I increase the number of time steps the precision gets lower. Why is that? Shouldn't it be higher?
Thank you.
edit
from __future__ import print_function
import numpy as np
import struct
from keras.models import Sequential
from keras.layers import Dense, LSTM, Activation
from keras.utils import np_utils
train_im = open('train-images-idx3-ubyte','rb')
train_la = open('train-labels-idx1-ubyte','rb')
test_im = open('t10k-images-idx3-ubyte','rb')
test_la = open('t10k-labels-idx1-ubyte','rb')
##training images and labels
magic,num_ima = struct.unpack('>II', train_im.read(8))
rows,columns = struct.unpack('>II', train_im.read(8))
img = np.fromfile(train_im,dtype=np.uint8).reshape(rows*columns, num_ima) #784*60000
magic_l, num_l = struct.unpack('>II', train_la.read(8))
lab = np.fromfile(train_la, dtype=np.int8) #1*60000
## test images and labels
magic, num_test = struct.unpack('>II', test_im.read(8))
rows,columns = struct.unpack('>II', test_im.read(8))
img_test = np.fromfile(test_im,dtype=np.uint8).reshape(rows*columns, num_test) #784x10000
magic_l, num_l = struct.unpack('>II', test_la.read(8))
lab_test = np.fromfile(test_la, dtype=np.int8) #1*10000
batch = 50
epoch=15
hidden_units = 10
classes = 1
a, b = img.T.shape[0:]
img = img.reshape(img.T.shape[0],-1,784)
img_test = img_test.reshape(img_test.T.shape[0],-1,784)
lab = np_utils.to_categorical(lab, 10)
lab_test = np_utils.to_categorical(lab_test, 10)
print(img.shape[0:])
model = Sequential()
model.add(LSTM(40,input_shape =img.shape[1:], batch_size = batch))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer = 'RMSprop', loss='mean_squared_error', metrics = ['accuracy'])
model.fit(img, lab, batch_size = batch,epochs=epoch,verbose=1)
scores = model.evaluate(img_test, lab_test, batch_size=batch)
predictions = model.predict(img_test, batch_size = batch)
print('LSTM test score:', scores[0])
print('LSTM test accuracy:', scores[1])
edit 2
Thank you very much. When I do so, I get the following error:
ValueError: Input arrays should have the same number of samples as target arrays. Found 3750 input samples and 60000 target samples.
I know that I should reshape the output as well but I don't know what shape it should have.
Timesteps represent states in time, like frames extracted from a video. The input passed to the LSTM should have the shape (num_samples, timesteps, input_dim). If you want 16 timesteps, you should reshape your data as (num_samples//timesteps, timesteps, input_dim):
img=img.reshape(3750,16,784)
So with your batch_size=50, it will pass 50*16 images at a time.
Right now, since you keep num_samples constant, it splits your input_dim instead.
edit:
The target array will have num_samples as its first dimension, i.e. 3750 in your case. All the time steps in a sequence will share the same label. You have to decide what you are going to do with those MNIST sequences: your current model classifies the sequences (not individual digits) into 10 classes.
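As a rough sketch of how to make the shapes line up (which label you assign to each 16-image sequence is a modelling decision; purely for illustration, this keeps the label of the last image in each sequence):
timesteps = 16
# Group every 16 consecutive images into one sequence
img = img.reshape(-1, timesteps, 784)                       # (3750, 16, 784)
img_test = img_test.reshape(-1, timesteps, 784)             # (625, 16, 784)
# One one-hot target per sequence, taken from the last image
lab = lab.reshape(-1, timesteps, 10)[:, -1, :]              # (3750, 10)
lab_test = lab_test.reshape(-1, timesteps, 10)[:, -1, :]    # (625, 10)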
I want to get the loss values as the model trains on each instance.
history = model.fit(..)
For example, the code above returns the loss values for each epoch, not for each mini-batch or instance.
What is the best way to do this? Any suggestions?
There is exactly what you are looking for at the end of this official Keras documentation page: https://keras.io/callbacks/#callback
Here is the code to create a custom callback
class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))
model = Sequential()
model.add(Dense(10, input_dim=784, kernel_initializer='uniform'))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
history = LossHistory()
model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0, callbacks=[history])
print(history.losses)
# outputs
'''
[0.66047596406559383, 0.3547245744908703, ..., 0.25953155204159617, 0.25901699725311789]
'''
If you want to get loss values for each batch, you might want to call model.train_on_batch inside a generator. It's hard to provide a complete example without knowing your dataset, but you would have to break your dataset into batches and feed them one by one:
def make_batches(...):
    ...
batches = make_batches(...)
batch_losses = [model.train_on_batch(x, y) for x, y in batches]
It's a bit more complicated with single instances. You can, of course, train on 1-sized batches, though it will most likely thrash your optimiser (by maximising gradient variance) and significantly degrade performance. Besides, since loss functions are evaluated outside of Python's domain, there is no direct way to hijack the computation without tinkering with the C/C++ and CUDA sources. Even then, the backend itself evaluates the loss batch-wise (benefitting from highly vectorised matrix operations), therefore you will severely degrade performance by forcing it to evaluate the loss on each instance. Simply put, hacking the backend will only (probably) help you reduce GPU memory transfers (as compared to training on 1-sized batches from the Python interface).
If you really want to get per-instance scores, I would recommend you train on batches and evaluate on instances (this way you will avoid issues with high variance and reduce expensive gradient computations, since gradients are only estimated during training):
def make_batches(batchsize, x, y):
    ...
batchsize = n
batches = make_batches(n, ...)
batch_instances = [make_batches(1, x, y) for x, y in batches]
losses = [
    (model.train_on_batch(*batch), [model.test_on_batch(*inst) for inst in instances])
    for batch, instances in zip(batches, batch_instances)
]
One solution is to calculate the loss between the training targets and the predictions on the training input. In the case of loss = mean_squared_error and three-dimensional outputs (i.e. image width x height x channels):
model.fit(train_in,train_out,...)
pred = model.predict(train_in)
loss = np.add.reduce(np.square(train_out-pred),axis=(1,2,3)) # this computes the total squared error for each sample
loss = loss / ( pred.shape[1]*pred.shape[2]*pred.shape[3]) # this computes the mean over each sample's entries
np.savetxt("loss.txt",loss) # This line saves the data to file
After combining resources from here and here I came up with the following code. Maybe it will help you. The idea is that you can override the Callback class from Keras and then use the on_batch_end method to check the loss value from the logs that Keras automatically supplies to that method.
Here is working code for an NN with that particular function built in. Maybe you can start from here:
import numpy as np
import pandas as pd
import seaborn as sns
import os
import matplotlib.pyplot as plt
import time
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import Callback
# fix random seed for reproducibility
seed = 155
np.random.seed(seed)
# load pima indians dataset
# download directly from website
dataset = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data",
                      header=None).values
X_train, X_test, Y_train, Y_test = train_test_split(dataset[:,0:8], dataset[:,8], test_size=0.25, random_state=87)
class NBatchLogger(Callback):
    def __init__(self, display=100):
        '''
        display: Number of batches to wait before outputting loss
        '''
        self.seen = 0
        self.display = display

    def on_batch_end(self, batch, logs={}):
        self.seen += logs.get('size', 0)
        if self.seen % self.display == 0:
            print('\n{0}/{1} - Batch Loss: {2}'.format(self.seen, self.params['samples'],
                                                       logs.get('loss')))
out_batch = NBatchLogger(display=1000)
np.random.seed(seed)
my_first_nn = Sequential() # create model
my_first_nn.add(Dense(5, input_dim=8, activation='relu')) # hidden layer
my_first_nn.add(Dense(1, activation='sigmoid')) # output layer
my_first_nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
my_first_nn_fitted = my_first_nn.fit(X_train, Y_train, epochs=1000, verbose=0, batch_size=128,
                                     callbacks=[out_batch], initial_epoch=0)
Please let me know if you wanted something like this.
I have some trouble understanding LSTM models in TensorFlow.
I use tflearn as a wrapper, as it does all the initialization and other higher-level stuff automatically. For simplicity, let's consider this example program. Up to line 42, net = tflearn.input_data([None, 200]), it's pretty clear what happens. You load a dataset into variables and make it a standard length (in this case, 200). Both the input variables and the 2 classes are, in this case, converted to one-hot vectors.
How does the LSTM take the input? Across how many samples does it predict the output?
What does net = tflearn.embedding(net, input_dim=20000, output_dim=128) represent?
My goal is to replicate the activity recognition setup from the paper. For example, I would like to input a 4096-dimensional vector to the LSTM, and the idea is to take 16 such vectors and then produce the classification result. I think the code would look like this, but I don't know how the input to the LSTM should be given.
from __future__ import division, print_function, absolute_import
import tflearn
from tflearn.data_utils import to_categorical, pad_sequences
from tflearn.datasets import imdb
train, val = something.load_data()
trainX, trainY = train #each X sample is a (16,4096) nd float64
valX, valY = val #each Y is a one hot vector of 101 classes.
net = tflearn.input_data([None, 16,4096])
net = tflearn.embedding(net, input_dim=4096, output_dim=256)
net = tflearn.lstm(net, 256)
net = tflearn.dropout(net, 0.5)
net = tflearn.lstm(net, 256)
net = tflearn.dropout(net, 0.5)
net = tflearn.fully_connected(net, 101, activation='softmax')
net = tflearn.regression(net, optimizer='adam',
                         loss='categorical_crossentropy')
model = tflearn.DNN(net, clip_gradients=0., tensorboard_verbose=3)
model.fit(trainX, trainY, validation_set=(valX, valY), show_metric=True,
          batch_size=128, n_epoch=2, snapshot_epoch=True)
Basically, the LSTM takes the size of your vector for one cell:
lstm = rnn_cell.BasicLSTMCell(lstm_size, forget_bias=1.0)
Then, how many time steps do you want to feed? That depends on the vector you feed in: the number of arrays in X_split decides the number of time steps:
X_split = tf.split(0, time_step_size, X)
outputs, states = rnn.rnn(lstm, X_split, initial_state=init_state)
In your example, I guess the lstm_size is 256, since it's the vector size of one word. The time_step_size would be the max word count in your training/test sentences.
Please see this example: https://github.com/nlintz/TensorFlow-Tutorials/blob/master/07_lstm.py
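Translating that back to the tflearn code in the question, a rough, untested sketch for sequences of 16 pre-extracted 4096-dimensional features might look like the following. Note the embedding layer is dropped (tflearn.embedding expects integer token ids, not dense float vectors), and the first LSTM is given return_seq=True so it can feed the second one:
net = tflearn.input_data([None, 16, 4096])      # (batch, timesteps, input_dim)
net = tflearn.lstm(net, 256, return_seq=True)   # emit all 16 hidden states
net = tflearn.dropout(net, 0.5)
net = tflearn.lstm(net, 256)                    # keep only the final state
net = tflearn.dropout(net, 0.5)
net = tflearn.fully_connected(net, 101, activation='softmax')
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy')

model = tflearn.DNN(net, tensorboard_verbose=3)
model.fit(trainX, trainY, validation_set=(valX, valY), show_metric=True,
          batch_size=128, n_epoch=2)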