I have created a simple machine learning model to predict the product of two given numbers. I followed a YouTube tutorial to learn the basics and am trying to apply them to this simple idea.
My model has three dense layers: input, hidden, and output. The input and hidden layers both used the 'relu' activation, which gave me a loss of NaN during model fit, so I changed one of them to sigmoid; now the loss is reported as 0.00000+e... something.
I don't know what is wrong. Can anyone please point out what I am doing or assuming wrong?
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
x = np.array(df['X'])
y = np.array(df['Y'])
s = np.array(df['S'])
def build_model():
    model = keras.Sequential()
    inputLayer = layers.Dense(64, activation='sigmoid', input_shape=[2])
    hiddenLayer = layers.Dense(64, activation='relu')
    outputLayer = layers.Dense(1)
    model.add(inputLayer)
    model.add(hiddenLayer)
    model.add(outputLayer)
    model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['accuracy'])
    return model
model = build_model()
print(model.summary())
EPOCHS = 1000
# I didn't know how to provide multiple inputs to my model for
# training so I checked stackoverflow here
# https://stackoverflow.com/questions/55233377/keras-sequential-model-with-multiple-inputs?noredirect=1&lq=1
merged_array = np.stack([x, y], axis=1)
history = model.fit(merged_array, s, epochs=EPOCHS, validation_split = 0.2, verbose=2)
print(history)
print(model.predict([[2,3],]))
Disclaimer: I am a beginner and I have just started using keras and python for the first time in my life.
It does work for smaller numbers with ReLU activation.
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
x = np.random.randint(0, 10, 1000)
y = np.random.randint(0, 10, 1000)
s = x*y
def build_model():
    model = keras.Sequential()
    model.add(layers.Dense(64, activation='relu', input_shape=[2]))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1))
    # learning_rate replaces the deprecated lr argument
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01),
                  loss='mean_squared_error')
    return model
model = build_model()
merged_array = np.stack([x, y], axis=1)
history = model.fit(merged_array, s, epochs=250,
                    validation_split=0.2)
test_input = [2, 3]
print('\n{} x {} ='.format(*test_input),
      np.round(model.predict([test_input])[0][0]).astype(int))
2 x 3 = 6
SGD also works, but it requires standardizing or normalizing the inputs, which kind of defeats the purpose of your task, so I switched to Adam above. Here is the SGD version, with the inputs divided by 10:
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
x = np.random.randint(0, 10, 1000)
y = np.random.randint(0, 10, 1000)
s = x*y
x = x/10
y = y/10
def build_model():
    model = keras.Sequential()
    model.add(layers.Dense(64, activation='relu', input_shape=[2]))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1))
    model.compile(optimizer=keras.optimizers.SGD(0.001), loss='mean_squared_error')
    return model
model = build_model()
merged_array = np.stack([x, y], axis=1)
history = model.fit(merged_array, s, epochs=250,
                    validation_split=0.2, batch_size=16)
test_input = [2/10, 3/10]
print('\n{} x {} ='.format(*map(lambda l: int(l*10), test_input)),
      np.round(model.predict([test_input])[0][0]).astype(int))
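Why plain SGD struggles with unscaled inputs can be seen on a much smaller problem. The sketch below is not the Keras model from the question; it is a hypothetical one-weight linear fit trained with full-batch gradient descent, and the names (gradient_descent_losses, x_raw) and the chosen learning rate are assumptions made for illustration. With inputs in the hundreds the updates overshoot until the loss overflows, which is one plausible route to the NaN loss described in the question; with inputs scaled to [0, 1) the same learning rate converges.

```python
import numpy as np

def gradient_descent_losses(x, y, lr=0.1, steps=200):
    """Full-batch gradient descent on MSE for the model y ~ w*x + b.

    Returns the loss recorded before each update.
    """
    w, b = 0.0, 0.0
    losses = []
    with np.errstate(all="ignore"):  # silence overflow warnings when training diverges
        for _ in range(steps):
            err = w * x + b - y
            losses.append(float(np.mean(err ** 2)))
            w -= lr * 2.0 * np.mean(err * x)  # d(MSE)/dw
            b -= lr * 2.0 * np.mean(err)      # d(MSE)/db
    return losses

rng = np.random.default_rng(0)
x_raw = rng.integers(0, 1000, size=500).astype(float)
y = 3.0 * x_raw + 1.0

raw_losses = gradient_descent_losses(x_raw, y)              # inputs in [0, 1000)
scaled_losses = gradient_descent_losses(x_raw / 1000.0, y)  # inputs in [0, 1)

print("raw inputs, final loss is finite:   ", np.isfinite(raw_losses[-1]))
print("scaled inputs, final loss is finite:", np.isfinite(scaled_losses[-1]))
```

Adaptive optimizers such as Adam rescale each update, which is consistent with the first version above working without any input scaling.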
I noticed a couple of issues with your model:
Your input layer is not an input. You do not need a designated input layer in this case. The argument input_shape=[2] is sufficient to add a proper input layer before this layer.
You do not set a batch size in the fit function. Batches are usually small subsets of your training and validation sets (commonly powers of two such as 4, 8, 16, 32, ...). During training, the weights are not adjusted after every single sample; instead, backpropagation runs over a whole batch at once, which makes training faster. Since each input is just two floating-point numbers (I assume), you can choose a really large batch size such as 1024 or higher. The batch size is one of the so-called hyperparameters, which affect your overall training success.
history = model.fit(merged_array, s, batch_size=1024, epochs=EPOCHS, validation_split=0.2, verbose=2)
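To make the effect of the batch size concrete, the sketch below counts how many weight updates one epoch performs. The helper name updates_per_epoch and the 1000-sample data set are assumptions for illustration, and the way the validation share is rounded is a simplification. When no batch_size is passed, Keras falls back to 32.

```python
import math

def updates_per_epoch(n_samples, batch_size, validation_split=0.0):
    """Number of weight updates in one epoch for a given batch size."""
    # Samples left for training after holding out the validation share.
    n_train = math.floor(n_samples * (1.0 - validation_split))
    # The last batch may be smaller, so round up.
    return math.ceil(n_train / batch_size)

for batch_size in (1, 32, 1024):
    print(batch_size, updates_per_epoch(1000, batch_size, validation_split=0.2))
# 1 800
# 32 25
# 1024 1
```

Fewer, larger updates per epoch run faster, but each epoch then moves the weights less often, so the number of epochs or the learning rate may need adjusting.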
During training you track the "accuracy" metric. As you are working on a regression problem, this does not help you estimate your model's performance (accuracy is used for classification problems). You can leave it out.
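A small numeric illustration of that point, using made-up predictions. Accuracy is modelled here simply as the fraction of exact matches, which is an assumption; the rule Keras uses to resolve the 'accuracy' string differs in detail, but the conclusion is the same. An error-based metric such as mean absolute error says much more about a regression model.

```python
import numpy as np

# Hypothetical targets (products) and model outputs that are close but not exact.
y_true = np.array([6.0, 12.0, 20.0, 35.0])
y_pred = np.array([5.9, 12.2, 19.7, 35.4])

# Fraction of predictions that hit the target exactly.
exact_match_accuracy = float(np.mean(y_pred == y_true))
# Average size of the error, in the same units as the target.
mean_absolute_error = float(np.mean(np.abs(y_pred - y_true)))

print("accuracy:", exact_match_accuracy)
print("MAE:     ", round(mean_absolute_error, 3))
```

In Keras, passing metrics=['mae'] to compile reports the second quantity during training.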
I cannot give you more specific advice without knowing more about the data you are using: how many data points you have and what kind of numbers you want to multiply (bounded between 0 and 10, floats or integers, ...).
Hope this helps so far (;
I am writing a neural network in Keras. I want to modify the loss function so that I can use an array of per-parameter values (with the same shape as the gradient array) as an additional term in the cost function.
To be precise, I'd like to use the variance of the gradients from past training. Parameters that have a high gradient variance (let's call it h) are assumed to be the parameters that hold the features.
When training new features, I would like the cost function to change mainly the parameters whose h value is as small as possible. For this I have to modify the cost function per parameter like this:
Loss (parameter) = Standard_loss (y, y_pred) + h * (parameter - old parameter) ** 2
I would very much like to ask for an answer.
Here is an excerpt from my code:
from keras import models
from keras import layers
from keras.datasets import mnist
import tensorflow as tf
import matplotlib.pyplot as plt
from keras import backend as K
# I import the CIFAR-10 dataset
from tensorflow.keras.datasets import cifar10
from keras.utils import to_categorical
(train_X, train_y), (test_X, test_y) = cifar10.load_data()
train_y = to_categorical(train_y, num_classes=10, dtype='float32')
test_y = to_categorical(test_y, num_classes=10, dtype='float32')
train_X = K.cast(train_X, dtype='float32')
test_X = K.cast(test_X, dtype='float32')
def get_model():
    model = models.Sequential()
    model.add(layers.Conv2D(1, 5, (1,1), input_shape=(32,32,3,), padding='same'))
    model.add(layers.MaxPooling2D())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(4, 5, (2,2), padding='same'))
    model.add(layers.MaxPooling2D())
    model.add(layers.ReLU())
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='sigmoid'))
    model.add(layers.Dense(10, activation='linear'))
    model.add(layers.Softmax())
    print(model.summary())
    return model
model = get_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_X, train_y, epochs=50, validation_split=0.2)
weights = model.get_weights()
Unfortunately, I don't know how to take the gradient from the weights :/
I want to get a table of gradients for each parameter for a single training example. I do not mean the total gradient of the cost function, as described elsewhere on the internet.
From what I can see, the cost function is modifiable, but it only takes y_pred and y_true. How could I input something that corresponds to the weights (but it is not a weight)?
Thanks in advance!
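The arithmetic of the penalty described above can be sketched outside Keras. This is only an illustration under stated assumptions: a linear model with squared error stands in for the CNN, h is taken as the variance of the per-example gradients across examples, and the names per_example_grads and penalized_loss are hypothetical. It does not show how to wire the term into a Keras loss.

```python
import numpy as np

def per_example_grads(w, X, y):
    """Gradient of each example's squared error (x_i . w - y_i)^2 w.r.t. w; shape (n, d)."""
    residual = X @ w - y
    return 2.0 * residual[:, None] * X

def penalized_loss(w, w_old, h, X, y):
    """Standard MSE plus the penalty h * (parameter - old parameter) ** 2, summed over parameters."""
    standard = np.mean((X @ w - y) ** 2)
    penalty = np.sum(h * (w - w_old) ** 2)
    return standard + penalty

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w_old = rng.normal(size=3)

grads = per_example_grads(w_old, X, y)  # one row of gradients per training example
h = grads.var(axis=0)                   # one variance value per parameter

w_new = w_old + 0.1
print("gradient table shape:", grads.shape)
print("loss at old weights: ", penalized_loss(w_old, w_old, h, X, y))
print("loss at new weights: ", penalized_loss(w_new, w_old, h, X, y))
```

Because the penalty depends on the weights rather than on y_true and y_pred, it has to reach the loss through something other than the two loss arguments, which is the difficulty raised in the question.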
I have made a simple NN that computes XNOR from the two binary values in the input layer.
I have a NumPy array of all the possible combinations with their labels.
Code :
from keras.models import Sequential
from keras.layers import Dense
import numpy
data = numpy.array([[0.,0.,1.],[0.,1.,0.],[1.,0.,0.],[1.,1.,1.]])
train = data[:,:-1] # Taking The same and All data for training
test = data[:,:-1]
train_l = data[:,-1]
test_l = data[:,-1]
train_label = []
test_label = []
for i in train_l:
    train_label.append([i])
for i in test_l:
    test_label.append([i]) # Just made Labels Single element...
train_label = numpy.array(train_label)
test_label = numpy.array(test_label) # Numpy Conversion
model = Sequential()
model.add(Dense(2,input_dim = 2,activation = 'relu'))
model.add(Dense(2,activation = 'relu'))
model.add(Dense(1,activation = 'relu'))
model.compile(loss = "binary_crossentropy" , metrics = ['accuracy'], optimizer = 'adam')
model.fit(train,train_label, epochs = 10, verbose=2)
model.predict_classes(test)
Even when I use the same dataset for training and testing, it doesn't predict properly.
Where did I go wrong?
I have taken whole dataset deliberately as it wasn't predicting with 2 values...
Your architecture is just too simple for this function. If you use the architecture below and train for 100 epochs, you'll get accuracy = 1.
model = Sequential()
model.add(Dense(20,input_dim = 2,activation = 'relu'))
model.add(Dense(20,activation = 'relu'))
model.add(Dense(1,activation = 'sigmoid'))
UPD:
Why a simple model doesn't work that well?
One reason is that with a ReLU activation, if a neuron's pre-activation becomes negative on every data point, its gradient becomes zero and its weights stop training. You have few neurons to start with, and if some of them "die" this way, the remaining neurons may not be enough to approximate the function.
Another problem is that fewer neurons make it more likely for a model to get stuck in a local minimum.
However, you are right that theoretically, just a few neurons should be enough.
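As a check on that claim, the weights below were chosen by hand rather than learned, and they reproduce XNOR exactly with two hidden ReLU units. One assumption differs from the models in this thread: the output unit here is linear, not sigmoid, to keep the arithmetic readable.

```python
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
xnor = np.array([1., 0., 0., 1.])

# Hidden layer: both units compute x1 + x2, the second one shifted by -1.
W1 = np.array([[1., 1.],
               [1., 1.]])
b1 = np.array([0., -1.])
# Output layer: 1 - h1 + 2 * h2
W2 = np.array([-1., 2.])
b2 = 1.0

hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU
output = hidden @ W2 + b2

print(output)
```

The existence of such a solution does not mean gradient descent will find it from a random start, which is the point of the two problems listed above.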
The model below works even with just one layer. I've replaced ReLU with LeakyReLU to remedy the first problem. It works most of the time, but sometimes gets stuck in a local minimum.
from keras.layers import LeakyReLU
from keras.optimizers import Adam
model = Sequential()
model.add(Dense(2,input_dim = 2,activation = LeakyReLU(alpha=0.3)))
model.add(Dense(1,activation = 'sigmoid'))
optimizer = Adam(learning_rate=0.01)
model.compile(loss = "binary_crossentropy" , metrics = ['accuracy'], optimizer=optimizer)
model.fit(train,train_label, epochs = 500, verbose=2)
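The "dead" neuron effect and the LeakyReLU remedy can be seen directly from the activation derivatives. In this sketch the weights and bias are hypothetical values picked so that the neuron is negative on all four XNOR inputs; the slope 0.3 matches the alpha used above.

```python
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

# A hypothetical neuron whose pre-activation is negative on every data point.
w = np.array([-1.0, -1.0])
b = -0.5
z = X @ w + b

# Derivative of the activation with respect to z, per data point.
relu_grad = (z > 0).astype(float)
leaky_relu_grad = np.where(z > 0, 1.0, 0.3)

print("pre-activations:", z)
print("ReLU grad:      ", relu_grad)        # all zeros, so no weight update reaches w or b
print("LeakyReLU grad: ", leaky_relu_grad)  # non-zero, so the neuron can still recover
```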
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv('insurance.csv')
X = df.drop(['sex', 'children', 'smoker', 'region'], axis = 1)
X = X.values
y = df['charges']
y = y.values.reshape(1331,1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 75)
from keras import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(5, activation = 'sigmoid'))
model.add(Dense(4, activation = 'sigmoid'))
model.add(Dense(1, activation = 'sigmoid'))
from keras import optimizers
sgd = optimizers.SGD(lr=0.1)
model.compile(sgd, 'mse')
model.fit(X_train, y_train, 32, 100, shuffle=False)
This is my code. The data I am feeding in is all numerical, and I have tried different hyperparameters, but nothing seems to work.
Any help would be much appreciated.
I just don't know what is going wrong over here.
If you are indeed in a regression setting (as implied by your choice of loss, MSE) and not in a classification one, the basic mistake in your code is the activation of your last layer, which should be linear:
model.add(Dense(1, activation = 'linear'))
Of course, several other things can be going wrong with your approach, including the architecture of the model itself (there is no "guarantee" that whatever architecture you throw at your data will produce decent results, and your model looks too simple) and the activation functions of the other layers (today we usually start with relu), but it is impossible to say more without knowing your data.
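The reason a sigmoid output cannot work here is that it is bounded to (0, 1), while the regression target is not. The sketch below uses made-up charge values (an assumption; the real insurance.csv values are not shown in the question) to compute the lowest MSE any sigmoid output could reach.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical regression targets on the scale of insurance charges.
charges = np.array([1200.0, 5000.0, 30000.0])

# However large the logit, a sigmoid output never exceeds 1.
outputs = sigmoid(np.array([-5.0, 0.0, 50.0]))

# Best case for a sigmoid output: it predicts 1.0 for every sample.
best_sigmoid_mse = float(np.mean((1.0 - charges) ** 2))

print("largest sigmoid output:", outputs.max())
print("lowest reachable MSE:  ", best_sigmoid_mse)
```

A linear output has no such ceiling. Scaling the targets into [0, 1] would be another way around the bound, but that is a different design choice from the one recommended above.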
I understand that the features extracted from an auto-encoder can be fed into an mlp for classification or regression purpose. This is something that I did earlier.
But what if I have 2 auto-encoders? Can I extract the features from the bottleneck layers of both auto-encoders and feed them into an MLP that performs classification based on these features? If yes, then how? I am not sure how to concatenate these two feature sets. I tried numpy.hstack(), which gives me an 'unhashable slice' error, whereas tf.concat() gives me the error 'Input tensors to a Model must be Keras tensors.' The bottleneck layers of the two auto-encoders are of dimension (None, 100) each, so if I stack them horizontally I should get (None, 200). The hidden layer of the MLP may contain some (num_hidden=100) neurons. Could anyone please help?
x1 = autoencoder1.get_layer('encoder2').output
x2 = autoencoder2.get_layer('encoder2').output
#inp = np.hstack((x1, x2))
inp = tf.concat([x1, x2], 1)
x = tf.concat([x1, x2], 1)
h = Dense(num_hidden, activation='relu', name='hidden')(x)
y = Dense(1, activation='sigmoid', name='prediction')(h)
mymlp = Model(inputs=inp, outputs=y)
# Compile model
mymlp.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train model
mymlp.fit(x_train, y_train, epochs=20, batch_size=8)
Updated as per @twolffpiggott's suggestion:
from keras.layers import Input, Dense, Dropout
from keras import layers
from keras.models import Model
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import numpy as np
x1 = Data1
x2 = Data2
y = Data3
num_neurons1 = x1.shape[1]
num_neurons2 = x2.shape[1]
# Train-test split
x1_train, x1_test, x2_train, x2_test, y_train, y_test = train_test_split(x1, x2, y, test_size=0.2)
# scale data within [0-1] range
scalar = MinMaxScaler()
x1_train = scalar.fit_transform(x1_train)
x1_test = scalar.transform(x1_test)
x2_train = scalar.fit_transform(x2_train)
x2_test = scalar.transform(x2_test)
x_train = np.concatenate([x1_train, x2_train], axis =-1)
x_test = np.concatenate([x1_test, x2_test], axis =-1)
# Auto-encoder1
encoding_dim1 = 500
encoding_dim2 = 100
input_data = Input(shape=(num_neurons1,))
encoded = Dense(encoding_dim1, activation='relu', name='encoder1')(input_data)
encoded1 = Dense(encoding_dim2, activation='relu', name='encoder2')(encoded)
decoded = Dense(encoding_dim2, activation='relu', name='decoder1')(encoded1)
decoded = Dense(num_neurons1, activation='sigmoid', name='decoder2')(decoded)
# this model maps an input to its reconstruction
autoencoder1 = Model(inputs=input_data, outputs=decoded)
autoencoder1.compile(optimizer='sgd', loss='mse')
# training
autoencoder1.fit(x1_train, x1_train,
                 epochs=100,
                 batch_size=8,
                 shuffle=True,
                 validation_data=(x1_test, x1_test))
# Auto-encoder2
encoding_dim1 = 500
encoding_dim2 = 100
input_data = Input(shape=(num_neurons2,))
encoded = Dense(encoding_dim1, activation='relu', name='encoder1')(input_data)
encoded2 = Dense(encoding_dim2, activation='relu', name='encoder2')(encoded)
decoded = Dense(encoding_dim2, activation='relu', name='decoder1')(encoded2)
decoded = Dense(num_neurons2, activation='sigmoid', name='decoder2')(decoded)
# this model maps an input to its reconstruction
autoencoder2 = Model(inputs=input_data, outputs=decoded)
autoencoder2.compile(optimizer='sgd', loss='mse')
# training
autoencoder2.fit(x2_train, x2_train,
                 epochs=100,
                 batch_size=8,
                 shuffle=True,
                 validation_data=(x2_test, x2_test))
# MLP
num_hidden = 100
encoded1.trainable = False
encoded2.trainable = False
encoded1 = autoencoder1(autoencoder1.inputs)
encoded2 = autoencoder2(autoencoder2.inputs)
concatenated = layers.concatenate([encoded1, encoded2], axis=-1)
x = Dropout(0.2)(concatenated)
h = Dense(num_hidden, activation='relu', name='hidden')(x)
h = Dropout(0.5)(h)
y = Dense(1, activation='sigmoid', name='prediction')(h)
myMLP = Model(inputs=[autoencoder1.inputs, autoencoder2.inputs], outputs=y)
# Compile model
myMLP.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Training
myMLP.fit(x_train, y_train, epochs=200, batch_size=8)
# Testing
myMLP.predict(x_test)
giving me an error: unhashable type: 'list' from the line:
myMLP = Model(inputs=[autoencoder1.inputs, autoencoder2.inputs], outputs=y)
The problem is that you're mixing numpy arrays with keras tensors. This won't work.
There are two approaches.
Predict numpy arrays from each autoencoder, concat the arrays, send them to the third model
Connect all models, probably make the autoencoders untrainable, fit with one input for each autoencoder.
Personally, I'd go for the first (assuming the autoencoders are already trained and don't need to change).
First approach
numpyOutputFromAuto1 = autoencoder1.predict(numpyInputs1)
numpyOutputFromAuto2 = autoencoder2.predict(numpyInputs2)
inputDataForThird = np.concatenate([numpyOutputFromAuto1,numpyOutputFromAuto2],axis=-1)
inputTensorForMlp = Input(inputDataForThird.shape[1:])
h = Dense(num_hidden, activation='relu', name='hidden')(inputTensorForMlp)
y = Dense(1, activation='sigmoid', name='prediction')(h)
mymlp = Model(inputs=inputTensorForMlp, outputs=y)
....
mymlp.fit(inputDataForThird ,someY)
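The concatenation step of the first approach works on plain arrays, so it can be checked without any model. In this sketch, random arrays stand in for the two predict outputs (an assumption; the real arrays would come from the encoders), each with 100 features per sample as in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 5

# Stand-ins for the numpy outputs of the two autoencoders' bottlenecks.
features1 = rng.random((n_samples, 100))
features2 = rng.random((n_samples, 100))

# Join along the feature axis so each row holds both feature sets.
combined = np.concatenate([features1, features2], axis=-1)

print(combined.shape)  # (5, 200)
```

For 2-D arrays np.hstack gives the same result; it only fails when it is handed Keras tensors instead of numpy arrays.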
Second Approach
This is a little more complicated, and at first I don't see much reason to do this. (But of course there may be cases where it's a good choice)
Now we're totally forgetting numpy and working with keras tensors.
Creating the mlp on its own (good if you will use it later without the autoencoders):
inputTensorForMlp = Input(input_shape_compatible_with_concatenated_encoder_outputs)
x = Dropout(0.2)(inputTensorForMlp)
h = Dense(num_hidden, activation='relu', name='hidden')(x)
h = Dropout(0.5)(h)
y = Dense(1, activation='sigmoid', name='prediction')(h)
myMLP = Model(inputs=inputTensorForMlp, outputs=y)
We probably want the bottleneck features of the autoencoders, right? If you happened to create the autoencoders properly with: encoder model, decoder model, join both, then it's easier to use just the encoder model. Else:
encodedOutput1 = autoencoder1.layers[bottleneckLayer].output  # or encoder1.output
encodedOutput2 = autoencoder2.layers[bottleneckLayer].output  # or encoder2.output
Creating a joined model. The concatenation must use a keras layer (we're working with keras tensors):
concatenated = Concatenate()([encodedOutput1,encodedOutput2])
output = myMLP(concatenated)
joinedModel = Model([autoencoder1.input,autoencoder2.input],output)
I'd also go with Daniel's first approach (for simplicity and efficiency), but if you're interested in the second; for instance if you're interested in running the network end-to-end, you'd approach it like this:
# make autoencoders not trainable
autoencoder1.trainable = False
autoencoder2.trainable = False
encoded1 = autoencoder1(kerasInputs1)
encoded2 = autoencoder2(kerasInputs2)
concatenated = layers.concatenate([encoded1, encoded2], axis=-1)
h = Dense(num_hidden, activation='relu', name='hidden')(concatenated)
y = Dense(1, activation='sigmoid', name='prediction')(h)
myMLP = Model([input_data1, input_data2], y)
myMLP.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Training
myMLP.fit([x1_train, x2_train], y_train, epochs=200, batch_size=8)
# Testing
myMLP.predict([x1_test, x2_test])
Key edits
The weights of both autoencoders should be frozen end-to-end (otherwise early-stage gradient updates from the randomly initialized MLP will likely result in the loss of much of their learning).
The autoencoder input layers should be assigned to separate variables input_data1 and input_data2, one per autoencoder (instead of both to input_data). autoencoder1.inputs returns a list of tensors rather than a single tensor, so [autoencoder1.inputs, autoencoder2.inputs] is a list of lists; this is the source of the unhashable type: 'list' exception, and replacing it with [input_data1, input_data2] solves the issue.
When fitting the MLP for the end-to-end model, the input should be a list of x1_train and x2_train rather than the concatenated inputs. Same when predicting.
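The "unhashable type: 'list'" message can be reproduced in plain Python. This sketch does not reproduce Keras internals; strings stand in for tensors, and the helper can_hash_all is hypothetical. It only shows that a list of lists fails wherever each element needs to be hashed, while a flat list of individual objects does not.

```python
# Strings stand in for Keras tensors; each .inputs attribute is modelled as a list.
inputs1 = ["input_tensor_1"]
inputs2 = ["input_tensor_2"]

nested = [inputs1, inputs2]  # shape of [autoencoder1.inputs, autoencoder2.inputs]
flat = inputs1 + inputs2     # shape of [input_data1, input_data2]

def can_hash_all(items):
    """Return True if every element can be hashed (e.g. used as a dict key)."""
    try:
        for item in items:
            hash(item)
        return True
    except TypeError:
        return False

print(can_hash_all(nested))  # False
print(can_hash_all(flat))    # True
```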
I am trying to train a deep neural network using transfer learning in Keras with TensorFlow. There are different ways to do that. If your data set is small, you can afford to compute features with the pre-trained model for the entire data set once and then use those features to train and test a small network; this is good because you don't need to recompute the features for each batch at each epoch. However, if the data set is large, it is infeasible to compute features for all of it up front, so in that case we use ImageDataGenerator, flow_from_directory and fit_generator. The features are then computed for each batch at each epoch, which makes things much slower. I was assuming that both approaches produce similar results in terms of accuracy and loss. The problem is that I took a small data set, tried both approaches, and got completely different results. I would appreciate it if someone could tell me whether something is wrong in the provided code and/or why I am getting different results.
Approach when having large data-set:
from keras.applications.inception_v3 import InceptionV3,preprocess_input
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense
datagen= ImageDataGenerator(preprocessing_function=preprocess_input)
train_generator = datagen.flow_from_directory('data/train',
                                              class_mode='categorical',
                                              batch_size=64,...)
valid_generator = datagen.flow_from_directory('data/valid',
                                              class_mode='categorical',
                                              batch_size=64,...)
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = Conv2D(filters = 128 , kernel_size = (2,2)) (x)
x = MaxPooling2D()(x)
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',...)
model.fit_generator(generator = train_generator,
                    steps_per_epoch = len(train_generator),
                    validation_data = valid_generator ,
                    validation_steps = len(valid_generator),
                    ...)
Approach when having small data-set:
from keras.applications.inception_v3 import InceptionV3,preprocess_input
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense
from keras.utils import np_utils
base_model = InceptionV3(weights='imagenet', include_top=False)
train_features = base_model.predict(preprocess_input(train_data))
valid_features = base_model.predict(preprocess_input(valid_data))
model = Sequential()
model.add(Conv2D(filters = 128 , kernel_size = (2,2),
                 input_shape=(train_features.shape[1],
                              train_features.shape[2],
                              train_features.shape[3])))
model.add(MaxPooling2D())
model.add(GlobalAveragePooling2D())
model.add(Dense(1024, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',...)
model.fit(train_features, np_utils.to_categorical(y_train,2),
          validation_data=(valid_features, np_utils.to_categorical(y_valid,2)),
          batch_size=64,...)
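The assumption behind the question, that a frozen feature extractor yields the same features whether it runs once over the whole data set or batch by batch, can be checked on a toy stand-in. Here a fixed random projection followed by ReLU plays the role of the frozen base model; this is a hypothetical substitute for InceptionV3, not the real network.

```python
import numpy as np

rng = np.random.default_rng(0)
frozen_weights = rng.normal(size=(16, 4))  # never updated, like a frozen base model

def frozen_base(batch):
    """Toy frozen feature extractor: fixed linear map followed by ReLU."""
    return np.maximum(0.0, batch @ frozen_weights)

data = rng.normal(size=(10, 16))

# Small-data route: compute all features once and cache them.
cached = frozen_base(data)

# Large-data route: recompute features batch by batch, in the same order.
batched = np.concatenate([frozen_base(data[i:i + 4]) for i in range(0, 10, 4)])

print(np.allclose(cached, batched))
```

If the features agree like this, a gap between the two approaches would have to come from something else in the pipeline, for example differences in preprocessing or in the order in which samples and labels are paired; that is a guess to investigate, not a diagnosis of the code above.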