I am learning Python using the Keras high-level neural networks library.
I made a simple neural network with only one neuron, for a linear classification problem with 2 inputs and one output.
The network works well, but if I compute the prediction myself (to demonstrate how the NN works), using the network's weights, there is a small difference between my own "implementation" of the neuron's firing and Keras's. E.g.:
Using predict() method on trained network:
testset = np.array([[5],[1]])
prediction = model.predict(testset.transpose())
print prediction
The result is:
[[ 0.22708023]]
Calculating the result myself:
# get the weights of the neural network
for layer in model.layers:
weights = layer.get_weights()
# the math of the prediction
prediction_calc = testset[0]*weights[0][0]+testset[1]*weights[0][1]+1*weights[1]
print prediction_calc
The result is:
[ 0.22708024]
Why is there this small difference between the two values? The neural network shouldn't be doing anything more than what I do when computing the prediction_calc variable.
I think it must have something to do with casting between variable types. If I print the weights variable, I see it is a list of float32 arrays:
[array([[-0.07256483],
[ 0.02924729]], dtype=float32), array([ 0.56065708], dtype=float32)]
I can't figure out where this difference comes from; it should be a simple rounding error, but I don't know how to avoid it.
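For illustration, here is a minimal sketch of what I suspect (the variable names are mine, and it assumes the model and weights shown above): if the manual computation is kept entirely in float32, as Keras computes internally, I would expect it to match model.predict(), whereas NumPy's default promotion to float64 gives the slightly different value.

# hedged sketch: keep every operand in float32, as Keras does internally
testset_f32 = testset.astype(np.float32)
prediction_f32 = (testset_f32[0] * weights[0][0]
                  + testset_f32[1] * weights[0][1]
                  + weights[1])
print(prediction_f32)        # expected to agree with model.predict() to the last digit
print(prediction_f32.dtype)  # float32, whereas the original prediction_calc is float64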
For reference, here is the whole code:
import numpy as np
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# load and transpose input data set
inputset = np.loadtxt("learningsets/hyperplane2d_INPUTS.csv", delimiter=",")
inputset = inputset.transpose()
# load output data set
outputset = np.loadtxt("learningsets/hyperplane2d_OUTPUTS.csv", delimiter=",")
outputset = outputset
# build the simple NN
from keras.models import Sequential
model = Sequential()
from keras.layers import Dense
# The neuron
# Dense: 2 inputs, 1 outputs . Linear activation
model.add(Dense(output_dim=1, input_dim=2, activation="linear"))
model.compile(loss='mean_squared_error', metrics=['accuracy'], optimizer='sgd')
model.fit(inputset, outputset, nb_epoch=20, batch_size=10)
# calculate prediction for one input
testset = np.array([[5],[1]])
predictions = model.predict(testset.transpose())
print predictions
# get the weights of the neural network
for layer in model.layers:
weights = layer.get_weights()
# the math of the prediction
prediction_calc = testset[0]*weights[0][0]+testset[1]*weights[0][1]+1*weights[1]
print prediction_calc
Related
I am working on a CNN model using the TensorFlow framework in Google Colab. I am unable to extract the latent vectors from the convolutional layers; I want to extract the output of the convolutional layers, i.e. the layers before the fully connected layer.
I have tried the following code:
a = dropout()(classifier_model.output)
print(a)
I am unable to understand the solution suggested at the link Stackoverflow solution to print the value of tensorflow object after applying a-conv-pool-layer.
Does anyone have any suggestions?
You can use the get_layer method of the Model class to get a layer by its name. Find below an example with a dummy 1D CNN and a binary classifier:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv1D, ReLU, MaxPool1D, Flatten, Dense
from tensorflow.keras.models import Model

timesteps = 100
nfeatures = 2
# build the model using the functional API
# example of a 1D CNN inspired by your stack overflow link, but using a model instead of successive *raw* layers
# the values of the Conv1D filters and kernels are different
input = Input((timesteps, nfeatures))
p = Conv1D(filters=16, kernel_size=10)(input)
p = ReLU()(p)
p = MaxPool1D(pool_size=2)(p)
p = Conv1D(filters=32, kernel_size=10)(p)
p = ReLU()(p)
p = MaxPool1D(pool_size=2)(p)
p = Conv1D(filters=64, kernel_size=10)(p)
p = ReLU()(p)
p = MaxPool1D(pool_size=2, name='conv1Dfeat')(p) # give a name to the CNN output
# fully connected part
p = Flatten()(p)
p = Dense(10)(p)
# could add a dropout layer to ease optimization
finaloutput = Dense(1, activation='sigmoid')(p)
# full model
model = Model(inputs=input, outputs=finaloutput)
# compile network, i.e. define optimizer, loss and metrics
model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
You need to train the model using the fit method with some data. Then you can get the output of the layer whose name is conv1Dfeat (the last layer of the convolutional part) by defining a new model:
modelCNN = Model(inputs=input, outputs=model.get_layer('conv1Dfeat').output)
modelCNN.summary()
If you want to get the output of the convolutional part, say for a single numpy input array of shape (timesteps, nfeatures), you can use the predict method of the Model class on batched data:
data = np.random.normal(size=(timesteps, nfeatures)) # dummy data
data_tf = tf.expand_dims(data, axis=0) # convert to TF tensor and add batch dimension at the same time
cnn_out_np = modelCNN.predict(data_tf)
cnn_out_np = np.squeeze(cnn_out_np, axis=0) # remove batch dimension
print(cnn_out_np.shape)
(4, 64)
I have 70k data points of dimension 25,000 that I'm trying to feed into a neural network to classify them into 20 different classes. The system runs out of memory before I can do anything, so I came up with this idea: I divide the set of features into 5 chunks of 5,000, feed each chunk of dimension 5,000 into its own neural network, and take the average of the 5 predictions as the final result.
Here is how I separated the data:
datas_feature1=datas[:,0:5000]
datas_feature2=datas[:,5000:10000]
datas_feature3=datas[:,10000:15000]
datas_feature4=datas[:,15000:20000]
datas_feature5=datas[:,20000:25000]
Then I feed each of them into a neural network:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

model1 = Sequential()
model1.add(layers.Dense(300, activation = "relu", input_shape=(5000,)))
# Hidden - Layers
model1.add(layers.Dropout(0.4, noise_shape=None, seed=None))
model1.add(layers.Dense(20, activation = "softmax"))
model1.summary()
model1.compile(loss="categorical_crossentropy",
optimizer="adam",
metrics=['accuracy'])
model1.fit( np.array(vectorized_training1), np.array(y_train_neralnettr),
batch_size=2000,
epochs=10,
verbose=1,
validation_data=(np.array(vectorized_validation1), np.array(y_validation_neralnet)))
predict1= model1.predict(np.array(vectorized_validation1))
I have the same code for the model2 neural network trained on the feature2 dataset, and so on.
In the end, I take the average of the predictions.
Here is my question: the prediction from a neural network gives me a one-hot-like vector rather than probabilities, so taking the average of the predictions is going to return a vector of 1s and 0s. How can I change this so that each prediction actually gives me a probability of belonging to each of the 20 classes?
Is my method worth trying?
Can you give me some references?
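For reference, here is a minimal sketch of the averaging step I have in mind (assuming models model1 to model5 trained on the five feature slices, each ending in a 20-way softmax; the variable names follow the pattern above but are otherwise hypothetical):

import numpy as np

# hedged sketch: average the per-model class probabilities, then pick a class
probs = [
    model1.predict(np.array(vectorized_validation1)),
    model2.predict(np.array(vectorized_validation2)),
    model3.predict(np.array(vectorized_validation3)),
    model4.predict(np.array(vectorized_validation4)),
    model5.predict(np.array(vectorized_validation5)),
]
avg_probs = np.mean(probs, axis=0)               # shape (n_samples, 20), still probabilities
predicted_class = np.argmax(avg_probs, axis=1)   # one class index per sample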
The problem is the following. I have a categorical prediction task with a vocabulary size of 25K. For one of them (input vocab 10K, output dim, i.e. embedding size, 50), I want to introduce a trainable weight matrix for a matrix multiplication between the input embedding (shape (1, 50)) and the weights (shape (50, 128)), with no bias, and the resulting vector score is an input for the prediction task along with other features.
The crux is that, I think, the trainable weight matrix varies for each input if I simply add it in. I want this weight matrix to be common across all inputs.
I should clarify: by input here I mean training examples. So all examples would learn some example-specific embedding and be multiplied by a shared weight matrix.
After every so many epochs, I intend to do a batch update to learn these common weights (or use other target variables to do multiple-output prediction).
LSTM? Is that something I should look into here?
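Roughly, this is a minimal sketch of the setup I am describing (assuming tf.keras; the layer names and the downstream head are hypothetical, only the 10K vocabulary, the 50-dim embedding and the 50x128 no-bias projection come from above):

from tensorflow.keras.layers import Input, Embedding, Flatten, Dense
from tensorflow.keras.models import Model

vocab_size = 10000   # input vocabulary
emb_dim = 50         # per-example embedding size
proj_dim = 128       # size of the shared projection

inp = Input(shape=(1,))                    # one token id per example
emb = Embedding(vocab_size, emb_dim)(inp)  # example-specific embedding, shape (batch, 1, 50)
emb = Flatten()(emb)                       # shape (batch, 50)
shared_proj = Dense(proj_dim, use_bias=False, name='shared_projection')  # the common 50x128 matrix
score = shared_proj(emb)                   # shape (batch, 128)
out = Dense(1)(score)                      # stand-in for the downstream prediction head
model = Model(inp, out)
model.compile('adam', 'mse')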
With the exception of an Embedding layer, layers apply to all examples in the batch.
Take as an example a very simple network:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(4,))
h1 = Dense(2, activation='relu', use_bias=False)(inp)
out = Dense(1)(h1)
model = Model(inp, out)
This is a simple network with 1 input layer, 1 hidden layer and an output layer. Taking the hidden layer as an example: this layer has a weight matrix of shape (4, 2). At each iteration the input data, which is a matrix of shape (batch_size, 4), is multiplied by the hidden layer weights (the feed-forward phase), so the h1 activations are computed for all samples in the batch at once. The loss is also computed per batch, and the output layer produces a shape of (batch_size, 1). Since all the batch samples go through the forward phase with the same weights, those same shared weights receive the backprop gradient updates.
When dealing with text, the problem is often specified as predicting a specific label from a sequence of words. This is modelled with a shape of (batch_size, sequence_length, word_index). Let's take a very basic example:
from tensorflow import keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
sequence_length = 80
emb_vec_size = 100
vocab_size = 10_000
def make_model():
    inp = Input(shape=(sequence_length, 1))
    emb = Embedding(vocab_size, emb_vec_size)(inp)
    emb = Reshape((sequence_length, emb_vec_size))(emb)
    h1 = Dense(64)(emb)
    recurrent = LSTM(32)(h1)
    output = Dense(1)(recurrent)
    model = Model(inp, output)
    model.compile('adam', 'mse')
    return model
model = make_model()
model.summary()
You can copy and paste this into Colab and see the summary.
What this example does is:
Transform a sequence of word indices into a sequence of word-embedding vectors.
Apply a Dense layer, called h1, to every example in the batch (and every element in the sequence); this layer reduces the dimensionality of the embedding vectors. It is not a typical component of a text-processing network (on its own), but it seemed to match your question.
Use a recurrent layer to reduce the sequence to a single vector per example.
Predict a single label from that "sentence" vector.
If I understand the problem correctly, you can reuse layers or even whole models inside another model.
Here is an example with a Dense layer. Let's say you have 10 inputs:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
# defining 10 inputs in a List with (X,) shape
inputs = [Input(shape=(X,), name='input_{}'.format(k)) for k in range(10)]
# defining a common Dense layer
D = Dense(64, name='one_layer_to_rule_them_all')
nets = [D(inp) for inp in inputs]
model = Model(inputs = inputs, outputs = nets)
model.compile(optimizer='adam', loss='categorical_crossentropy')
This code will not work if the inputs have different shapes, because the first call to D fixes its properties. In this example the outputs are set directly to nets, but of course you can concatenate, stack, or do whatever you want, as in the sketch below.
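For instance, here is a small self-contained sketch of the concatenation variant (X = 8 and the single-output head are assumptions made just for this example):

from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

X = 8  # assumed common input size for this sketch

inputs = [Input(shape=(X,), name='input_{}'.format(k)) for k in range(10)]
D = Dense(64, name='one_layer_to_rule_them_all')  # shared layer, same weights for every input
nets = [D(inp) for inp in inputs]

merged = Concatenate()(nets)   # shape (batch, 10 * 64)
output = Dense(1)(merged)      # a single head on top of the merged features
model = Model(inputs=inputs, outputs=output)
model.compile(optimizer='adam', loss='mse')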
Now if you have some trainable model you can use it instead of the D:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
# defining 10 inputs in a List with (X,) shape
inputs = [Input(shape=(X,), name='input_{}'.format(k)) for k in range(10)]
# defining a shared model with the same weights for all inputs
nets = [special_model(inp) for inp in inputs]
model = Model(inputs = inputs, outputs = nets)
model.compile(optimizer='adam', loss='categorical_crossentropy')
The weights of this model are shared among all inputs.
I'm new to Keras and the deep learning field. I want to build a dense vector for each document in my data, so I built a simple autoencoder using the Keras library.
The input data are normalized using Word2vec with an embedding size of 200, and all features are between -1 and 1. I prepared a 3D tensor that contains 137 samples (the number of documents) with 469 columns (the maximum number of words), and the third dimension is the embedding size. I used the mse loss function and GRU as the recurrent layer. I am getting the same vector for all documents as the autoencoder's prediction output, while the loss starts at a very low value and becomes constant after a few epochs.
I tried different numbers of epochs but got the same result. I also tried changing the batch size, but nothing changed. Can anyone help me find the problem, please?
from tensorflow.keras.layers import Input, GRU, Dense, RepeatVector, TimeDistributed
from tensorflow.keras.models import Model

input = Input(shape=(469,200))
encoder = GRU(120,activation='sigmoid',dropout=0.2)(input)
neck = Dense(20)(encoder)
decoder1 = RepeatVector(469)(neck)
decoder1 = GRU(120,return_sequences=True,activation='sigmoid',dropout=0.2)(decoder1)
decoder1 = TimeDistributed(Dense(200,activation='tanh'))(decoder1)
model = Model(inputs=input, outputs=decoder1)
model.compile(optimizer='adam', loss='mse')
history = model.fit(x_train, x_train,validation_data=(x_test,x_test) ,epochs=10, batch_size=8)
This is the input data "x_train".
print(model.predict(x_train)) returns these values (the same vector for every sample):
Why does "model.predict(x_train)" return the same vector for all 137 samples?
Thank you in advance.
I have some trouble understanding LSTM models in TensorFlow.
I use tflearn as a wrapper, as it does all the initialization and other higher-level things automatically. For simplicity, let's consider this example program. Up to line 42, net = tflearn.input_data([None, 200]), it's pretty clear what happens: you load a dataset into variables and pad it to a standard length (in this case, 200). Both the input variables and the 2 classes are, in this case, converted to one-hot vectors.
How does the LSTM take the input? Across how many samples does it predict the output?
What does net = tflearn.embedding(net, input_dim=20000, output_dim=128) represent?
My goal is to replicate the activity-recognition setup from the paper. For example, I would like to input a 4096-dimensional vector to the LSTM, the idea being to take 16 such vectors and then produce the classification result. I think the code would look like this, but I don't know how the input to the LSTM should be given.
from __future__ import division, print_function, absolute_import
import tflearn
from tflearn.data_utils import to_categorical, pad_sequences
from tflearn.datasets import imdb
train, val = something.load_data()
trainX, trainY = train #each X sample is a (16,4096) nd float64
valX, valY = val #each Y is a one hot vector of 101 classes.
net = tflearn.input_data([None, 16,4096])
net = tflearn.embedding(net, input_dim=4096, output_dim=256)
net = tflearn.lstm(net, 256)
net = tflearn.dropout(net, 0.5)
net = tflearn.lstm(net, 256)
net = tflearn.dropout(net, 0.5)
net = tflearn.fully_connected(net, 101, activation='softmax')
net = tflearn.regression(net, optimizer='adam',
loss='categorical_crossentropy')
model = tflearn.DNN(net, clip_gradients=0., tensorboard_verbose=3)
model.fit(trainX, trainY, validation_set=(valX, valY), show_metric=True,
          batch_size=128, n_epoch=2, snapshot_epoch=True)
Basically, the LSTM takes the size of your vector as the size of one cell:
lstm = rnn_cell.BasicLSTMCell(lstm_size, forget_bias=1.0)
Then, how many time steps do you want to feed? That depends on the vector you feed in: the number of arrays in X_split determines the number of time steps:
X_split = tf.split(0, time_step_size, X)
outputs, states = rnn.rnn(lstm, X_split, initial_state=init_state)
In your example, I guess lstm_size is 256, since that is the vector size of one word. time_step_size would then be the maximum word count in your training/test sentences.
Please see this example: https://github.com/nlintz/TensorFlow-Tutorials/blob/master/07_lstm.py
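The snippets above use an older low-level TensorFlow API. As a separate illustration of the same shape bookkeeping (this is tf.keras, not the tflearn code from the question, and the stacked-LSTM layout is only an assumption based on it):

from tensorflow.keras.layers import Input, LSTM, Dropout, Dense
from tensorflow.keras.models import Model

# 16 time steps per example, each a 4096-dimensional feature vector
inp = Input(shape=(16, 4096))
x = LSTM(256, return_sequences=True)(inp)  # first LSTM keeps the sequence of 16 steps
x = Dropout(0.5)(x)
x = LSTM(256)(x)                           # second LSTM reduces it to one vector per example
x = Dropout(0.5)(x)
out = Dense(101, activation='softmax')(x)  # 101 activity classes, as in the question
model = Model(inp, out)
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit(trainX, trainY, ...) with trainX of shape (n_samples, 16, 4096)

Note that no embedding layer is needed here: embeddings map integer word indices to vectors, while the 4096-dimensional features are already continuous.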