I have tried the example with Keras, but it did not use an LSTM. My model uses an LSTM in TensorFlow, and I want to predict the output as class labels, the way the Keras model does with predict_classes.
The Tensorflow model I am trying is something like this:
seq_len=10
n_steps = seq_len-1
n_inputs = x_train.shape[2]
n_neurons = 50
n_outputs = y_train.shape[1]
n_layers = 2
learning_rate = 0.0001
batch_size =100
n_epochs = 1000
train_set_size = x_train.shape[0]
test_set_size = x_test.shape[0]
tf.reset_default_graph()
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_outputs])
layers = [tf.contrib.rnn.LSTMCell(num_units=n_neurons,activation=tf.nn.sigmoid, use_peepholes = True) for layer in range(n_layers)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurons])
stacked_outputs = tf.layers.dense(stacked_rnn_outputs, n_outputs)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])
outputs = outputs[:,n_steps-1,:]
loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
I am encoding the labels with sklearn's LabelEncoder as:
encoder_train = LabelEncoder()
encoder_train.fit(y_train)
encoded_Y_train = encoder_train.transform(y_train)
y_train = np_utils.to_categorical(encoded_Y_train)
This converts the labels to a one-hot (binary) matrix.
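For reference, the encoding step can be sketched with NumPy alone. This is only an illustration of the idea, not the sklearn/Keras code used above: the label strings are hypothetical, and np.unique stands in for LabelEncoder because both assign integer ids in sorted label order.

```python
import numpy as np

# Hypothetical string labels, used only for illustration.
labels = np.array(["bat", "ball", "ball", "glove"])

# np.unique sorts the distinct labels and returns each label's index,
# which mirrors what LabelEncoder.fit followed by transform produces.
classes, encoded = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i,
# which is the layout to_categorical produces for integer class ids.
one_hot = np.eye(len(classes), dtype=np.float32)[encoded]

print(classes)
print(encoded)
print(one_hot)
```

Each row of one_hot has a single 1 in the column of its class id, which is the format shown in the "actual" output below.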
When I tried to predict the output I got the following:
actual==> [[0. 0. 1.]
[1. 0. 0.]
[1. 0. 0.]
[0. 0. 1.]
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]
[0. 1. 0.]
[0. 1. 0.]]
predicted==> [[0.3112209 0.3690182 0.31357136]
[0.31085992 0.36959863 0.31448898]
[0.31073445 0.3703295 0.31469804]
[0.31177694 0.37011752 0.3145326 ]
[0.31220382 0.3692756 0.31515726]
[0.31232828 0.36947766 0.3149037 ]
[0.31190437 0.36756667 0.31323162]
[0.31339088 0.36542615 0.310322 ]
[0.31598282 0.36328828 0.30711085]]
What I expected was the class label according to the encoding, as the Keras model produces. See the following:
predictions = model.predict_classes(X_test, verbose=True)
print("REAL VALUES:",reverse_category(Y_test,axis=1))
print("PRED VALUES:",predictions)
print("REAL COLORS:")
print(encoder.inverse_transform(reverse_category(Y_test,axis=1)))
print("PREDICTED COLORS:")
print(encoder.inverse_transform(predictions))
The output is something like the following:
REAL VALUES: [1 1 1 ... 1 2 1]
PRED VALUES: [2 1 1 ... 1 2 2]
REAL COLORS:
['ball' 'ball' 'ball' ... 'ball' 'bat' 'ball']
PREDICTED COLORS:
['bat' 'ball' 'ball' ... 'ball' 'bat' 'bat']
Please let me know what I can change in the TensorFlow model to get results that follow the encoding.
I am using TensorFlow 1.12.0 on Windows 10.
You are trying to map the predicted class probabilities back to class labels. Each row in the list of output predictions contains the three predicted class probabilities. Use np.argmax to obtain the one with the highest predicted probability in order to map to the predicted class label:
import numpy as np
predictions = [[0.3112209, 0.3690182, 0.31357136],
[0.31085992, 0.36959863, 0.31448898],
[0.31073445, 0.3703295, 0.31469804],
[0.31177694, 0.37011752, 0.3145326 ],
[0.31220382, 0.3692756, 0.31515726],
[0.31232828, 0.36947766, 0.3149037 ],
[0.31190437, 0.36756667, 0.31323162],
[0.31339088, 0.36542615, 0.310322 ],
[0.31598282, 0.36328828, 0.30711085]]
np.argmax(predictions, axis=1)
Gives:
array([1, 1, 1, 1, 1, 1, 1, 1, 1])
In this case, class 1 is predicted 9 times.
As noted in the comments: this is exactly what Keras does under the hood, as you'll see in the source code.
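The same argmax step also recovers the integer class ids from the one-hot targets, so predictions and targets can be compared directly. The sketch below copies the "actual" matrix from the question; the accuracy calculation and the variable names are additions for illustration.

```python
import numpy as np

# One-hot targets copied from the "actual" output in the question.
actual = np.array([[0, 0, 1], [1, 0, 0], [1, 0, 0],
                   [0, 0, 1], [1, 0, 0], [1, 0, 0],
                   [1, 0, 0], [0, 1, 0], [0, 1, 0]], dtype=np.float32)

# Class ids from the argmax call above: class 1 for all nine rows.
predicted_ids = np.ones(9, dtype=np.int64)

# argmax along axis 1 turns each one-hot row back into its class id.
actual_ids = np.argmax(actual, axis=1)

# Fraction of rows where the predicted id matches the true id.
accuracy = np.mean(predicted_ids == actual_ids)

print(actual_ids)
print(accuracy)

# With the encoder fitted in the question, the original string labels
# would come from encoder_train.inverse_transform(predicted_ids).
```

Because the predicted probabilities are all close to 1/3, the constant prediction may simply mean the model has not learned to separate the classes yet.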
I have a GRU model
GRU = keras.models.Sequential([keras.layers.GRU(32),
keras.layers.Dense(32, activation= 'relu'),
keras.layers.Dense(1, activation=None)])
GRU.compile(loss="mae", optimizer="adam")
resultsGRU = GRU.fit_generator(generator = train, validation_data = train, epochs = 3, verbose= 1, shuffle = False)
If I convert the training data to a NumPy array, I can see that it contains no zeros or NaN values (I also dropped NaN values beforehand):
trainArray= np.array(train)
print(trainArray)
I copied only a part of array, just so you can see the values:
[[array([[[-0.86286026, 0.51805955, 1.0427724 , ..., 0.27464896,
0.08823532, -1.1183959 ],
[-0.3186916 , 0.00295895, 0.740636 , ..., 0.27464896,
0.08823532, -1.1304985 ],
[-0.31057638, 0.00295895, 0.5593542 , ..., 0.27464896,
-0.5521559 , -1.1183959 ],
...,
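A printed excerpt cannot rule out bad values elsewhere in the data, so a programmatic check may be more reliable. The sketch below uses a hypothetical helper, count_bad_values, on a small array built from a few of the numbers printed above; for a generator, it would have to be applied to the inputs and targets of each batch.

```python
import numpy as np

def count_bad_values(values):
    """Count NaN, infinite, and exactly-zero entries in a numeric array."""
    arr = np.asarray(values, dtype=np.float64)
    return {
        "nan": int(np.isnan(arr).sum()),
        "inf": int(np.isinf(arr).sum()),
        "zero": int((arr == 0).sum()),
    }

# A few values taken from the printed excerpt (illustrative only).
batch = np.array([[-0.86286026, 0.51805955, 1.0427724],
                  [-0.3186916, 0.00295895, 0.740636]])
print(count_bad_values(batch))

# A deliberately corrupted copy shows that the check detects a NaN.
corrupted = batch.copy()
corrupted[1, 2] = np.nan
print(count_bad_values(corrupted))
```

Running the same check on the test inputs and on the model's predictions could help narrow down where the zeros or NaN values first appear.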
If I print resultsGRU
print(resultsGRU.history.values())
I get
dict_values([[0.597104012966156, 0.5652544498443604, 0.5574262142181396], [0.6241905093193054, 0.6183988451957703, 0.6134349703788757]])
Then I call predict, but every returned value is 0:
predictGRU = GRU.predict(test)
print(predictGRU)
[0. 0. 0. ... 0. 0. 0.]
I then save this model and use it in an API, where the predicted values are NaN.
What is the problem here? How do I get the model to predict a different, reasonable value?
I also use metrics later on
print(metrics.mean_absolute_error(test, predictGRU))
print(metrics.mean_squared_error(test, predictGRU))
print(metrics.explained_variance_score(test, predictGRU))
And I get normal numbers
0.6471065
0.50334525
0.23076766729354858
I don't know how to fix this on my own.
My initial data is:
[[ 8375.5 0. 8374.14285714 8374.14285714]
[ 8354.5 0. 8383.39285714 8371.52380952]
...
[11060. 0. 11055.21428571 11032.53702732]
[11076.5 0. 11061.60714286 11038.39875701]]
I create MinMax scaler to transform data to values from 0 to 1
scaler = MinMaxScaler(feature_range = (0, 1))
T = scaler.fit_transform(T)
So data now is:
[[0.5186697 , 0. , 0.46812344, 0.46950912],
[0.5161844 , 0. , 0.46935928, 0.46915412],
...,
[0.72264636, 0. , 0.6767292 , 0.6807525 ],
[0.7198651 , 0. , 0.6785377 , 0.6833385 ]]
I do some magic to prepare this data for LSTM layer and this is the result:
X_train variable of shape (6989, 4, 200)
[[[0.5186697 0. 0.46812344 ... 0. 0.45496237 0.45219505]
[0.48742527 0. 0.45273864 ... 0. 0.43144143 0.431924 ]
[0.4800284 0. 0.43054438 ... 0. 0.425362 0.4326681 ]
[0.5007989 0. 0.4290794 ... 0. 0.4696839 0.47831726]]
...
[[0.61240304 0. 0.57254803 ... 0. 0.5749577 0.57792616]
[0.61139715 0. 0.5746571 ... 0. 0.5971378 0.6017289 ]
[0.6365465 0. 0.59772 ... 0. 0.62671924 0.63145673]
[0.65719867 0. 0.62684333 ... 0. 0.6757128 0.6772785 ]]]
I process the data using this model with Dense(1) layer at the end:
model = Sequential()
model.add(LSTM(units = 50, activation = 'relu', #return_sequences = True,
input_shape = (X_train.shape[1], window_size)))
model.add(Dropout(0.2))
model.add(Dense(1, activation = 'linear'))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
When I set return_sequences to False, the predictions have shape (6989, 1), and calling scaler.inverse_transform(train_predict) with this scaler raises an error:
ValueError: non-broadcastable output operand with shape (6989,1) doesn't match the broadcast shape (6989,4)
When I set return_sequences to True, the predictions have shape (6989, 4, 1), and inverse_transform raises a different error:
ValueError: Found array with dim 3. None expected <= 2.
============
I think I know why I get these errors: the scaler expects an array of shape (6989, 4). But how can I transform this data so that I can call inverse_transform?
How can I inverse_transform data of shape (6989, 1)?
How can I inverse_transform data of shape (6989, 4, 1)?
Is it doable? Can my scaler be used, or should I create a new one? Can you suggest something? What am I missing?
I will appreciate any help, thanks!
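One possible approach, sketched here with NumPy only, is to invert the min-max scaling for the single predicted column by hand. With feature_range = (0, 1) the forward transform is x_scaled = (x - min) / (max - min) per column, so the inverse is x = x_scaled * (max - min) + min. The sketch assumes column 0 is the prediction target, which the question does not state; inverse_minmax is a hypothetical helper. A fitted MinMaxScaler exposes the same per-column values as data_min_ and data_max_.

```python
import numpy as np

# Rows copied from the raw data in the question.
T = np.array([[8375.5, 0.0, 8374.14285714, 8374.14285714],
              [8354.5, 0.0, 8383.39285714, 8371.52380952],
              [11060.0, 0.0, 11055.21428571, 11032.53702732],
              [11076.5, 0.0, 11061.60714286, 11038.39875701]])

target_col = 0  # ASSUMPTION: the model predicts column 0
col_min = T[:, target_col].min()
col_max = T[:, target_col].max()

def inverse_minmax(scaled, col_min, col_max):
    """Undo (x - min) / (max - min) scaling; works for any array shape."""
    return np.asarray(scaled) * (col_max - col_min) + col_min

# Stand-in for model output of shape (n, 1): the scaled target itself.
scaled_target = (T[:, target_col] - col_min) / (col_max - col_min)
train_predict = scaled_target.reshape(-1, 1)
restored = inverse_minmax(train_predict, col_min, col_max)

# Stand-in for output of shape (n, 4, 1), as with return_sequences=True.
seq_predict = np.repeat(train_predict[:, None, :], 4, axis=1)
restored_seq = inverse_minmax(seq_predict, col_min, col_max)

print(restored.ravel())
print(restored_seq.shape)
```

Because the arithmetic is elementwise, the same helper handles both the (n, 1) and the (n, 4, 1) shapes without reshaping to the four-column layout the scaler was fitted on.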
I was trying to optimize my code when I encountered this strange behaviour with the following model in Keras:
# random_minibatches of form (state, action, reward, state_next, done)
random_minibatch = random.sample(list_of_samples, batch_size)
# A state is a list in the form of [x, y]
next_states = [temp[3] for temp in random_minibatch]
# Reshape the next_states for the model
next_states = np.reshape(next_states, [-1, 2])
next_states_preds = model.predict(next_states)
for i, (_, _, _, state_next, _) in enumerate(random_minibatch):
    state_next = np.reshape(state_next, [1, 2])
    pred = model.predict(state_next)
    print("inputs: {} ; {}".format(next_states[i], state_next))
    print(pred)
    print(next_states_preds[i])
    print("amax: {} ; {}".format(np.amax(pred), np.amax(next_states_preds[i])))
    print()
and a simple model:
model = Sequential()
model.add(layers.Dense(16, activation="relu", input_dim=2))
model.add(layers.Dense(32, activation="relu"))
model.add(layers.Dense(8))
model.compile(loss="mse", optimizer=Adam(lr=0.00025))
next_states is a list of lists in the form of [[x1, y1], [x2, y2], ...]
and state_next is a list in the form of [x, y]
As you can see, next_states contains every state_next from the for-loop, so the model receives the same inputs both times. The only difference is that the first time I pass the whole list of lists to the model, and the second time I pass the lists one by one.
My problem is that I get different outputs for the same input.
An example of the printed output would be:
inputs: [39 -7] ; [39 -7]
[0. 0. 0. 0. 0. 5.457102 0. 0.]
[[0. 0. 0. 0. 0. 5.4571013 0. 0.]]
amax: 5.457101345062256 ; 5.457101821899414
So at this point I'm not sure whether I misunderstood something or made a mistake somewhere. I would be very glad if someone could help me with this strange behaviour.
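The two amax values printed above differ only in the last digits, which is consistent with float32 rounding: batched and single-sample predictions may run the same arithmetic in a different order. That cause is an assumption, not something the post confirms, but the size of the gap can be checked directly. The sketch below compares the two printed values against the spacing between adjacent float32 numbers.

```python
import numpy as np

# The two amax values printed in the example output.
a = np.float32(5.457101345062256)
b = np.float32(5.457101821899414)

# Absolute difference between the two results.
diff = abs(float(b) - float(a))

# Distance from a to the next representable float32 value.
ulp = float(np.spacing(a))

# Tolerance-based comparison instead of exact equality.
same = bool(np.isclose(a, b, rtol=1e-5))

print(diff, ulp)
print(same)
```

If the gap is on the order of one float32 step, comparing the outputs with np.allclose rather than exact equality is usually the appropriate test.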
I am testing a Keras layer. I built a simple dense layer whose input has shape (10, 2) with all values equal to 1, and I use zero_initial_state to initialize the layer weights. However, I cannot understand the output of the dense layer; it seems to compute the final outputs with something unknown. My code is:
batch_size = 10
time_steps = 30
label_num = 2
units = 5
batch_data = tf.ones((batch_size, label_num))
dense_layer = Dense(units)
output = dense_layer(batch_data)
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    print('__________________output_____________________')
    print(sess.run(output))
I print the initial kernel and bias:
____________________self.kernel____________________
[[-0.6072792 0.87520194 -0.5916964 -0.28233814 0.37042332]
[ 0.24503589 -0.8950937 -0.7122175 0.67322683 0.9035703 ]]
____________________self.bias____________________
[0. 0. 0. 0. 0.]
I think the final output should be:
[[-0.3622433 -0.01989174 -1.3039138 0.3908887 1.2739936 ]
[-0.3622433 -0.01989174 -1.3039138 0.3908887 1.2739936 ]
[-0.3622433 -0.01989174 -1.3039138 0.3908887 1.2739936 ]
[-0.3622433 -0.01989174 -1.3039138 0.3908887 1.2739936 ]
....
However, the final output is:
[[-0.25280607 1.0728977 -0.6096982 1.1957564 0.82103825]
[-0.25280607 1.0728977 -0.6096982 1.1957564 0.82103825]
[-0.25280607 1.0728977 -0.6096982 1.1957564 0.82103825]
The activation is None. Why does the Keras dense layer produce this output?
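The expected values can be reproduced with NumPy from the printed kernel and bias, which suggests the expectation itself is right for that kernel. One possible explanation, not confirmed by the post, is that the kernel was printed after a separate initialization, so it held different random values than the ones used to compute the output. The sketch below only redoes the arithmetic of a dense layer without activation, output = input @ kernel + bias.

```python
import numpy as np

# Kernel and bias copied from the printed values above.
kernel = np.array(
    [[-0.6072792, 0.87520194, -0.5916964, -0.28233814, 0.37042332],
     [0.24503589, -0.8950937, -0.7122175, 0.67322683, 0.9035703]],
    dtype=np.float32)
bias = np.zeros(5, dtype=np.float32)

# All-ones input of shape (batch_size, label_num) = (10, 2).
batch_data = np.ones((10, 2), dtype=np.float32)

# Dense layer with no activation: with an all-ones input, each output
# row is the column-wise sum of the kernel plus the bias.
output = batch_data @ kernel + bias

print(output[0])

# In the TF1 session, fetching both tensors in one call, for example
# sess.run([dense_layer.kernel, output]), would guarantee that the printed
# kernel is the one that produced the output.
```

The first row matches the expected output listed above, so the mismatch most likely lies in which kernel values were in effect when the output was computed.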
I have the following code based on the MNIST example. It is modified in two ways:
1) I'm not using a one-hot-vector, so I simply use tf.equal(y, y_)
2) My results are binary: either 0 or 1
import tensorflow as tf
import numpy as np
# get the data
train_data, train_results = get_data(2000, 2014)
test_data, test_results = get_data(2014, 2015)
# setup a session
sess = tf.Session()
x_len = len(train_data[0])
y_len = len(train_results[0])
# make placeholders for inputs and outputs
x = tf.placeholder(tf.float32, shape=[None, x_len])
y_ = tf.placeholder(tf.float32, shape=[None, y_len])
# create the weights and bias
W = tf.Variable(tf.zeros([x_len, 1]))
b = tf.Variable(tf.zeros([1]))
# initialize everything
sess.run(tf.initialize_all_variables())
# create the "equation" for y in terms of x
y_prime = tf.matmul(x, W) + b
y = tf.nn.softmax(y_prime)
# construct the error function
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(y_prime, y_)
# setup the training algorithm
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
# train the thing
for i in range(1000):
    rand_rows = np.random.choice(train_data.shape[0], 100, replace=False)
    _, w_out, b_out, ce_out = sess.run([train_step, W, b, cross_entropy], feed_dict={x: train_data[rand_rows, :], y_: train_results[rand_rows, :]})
    print("%d: %s %s %s" % (i, str(w_out), str(b_out), str(ce_out)))
# compute how many times it was correct
correct_prediction = tf.equal(y, y_)
# find the accuracy of the predictions
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: test_data, y_: test_results}))
for i in range(0, len(test_data)):
    res = sess.run(y, {x: [test_data[i]]})
    print("RES: " + str(res) + " ACT: " + str(test_results[i]))
The accuracy is always 0.5 (because my test data has about as many 1s as 0s). The values of W and b always seem to increase, probably because the values of cross_entropy are always a vector of all zeros.
When I try and use this model for prediction, the predictions are always 1:
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
What am I doing wrong here?
You seem to be predicting a single scalar, rather than a vector. The softmax op produces a vector-valued prediction for each example. This vector must always sum to 1. When the vector only contains one element, that element must always be 1. If you want to use a softmax for this problem, you could use [1, 0] as the output target where you are currently using [0] and use [0, 1] where you are currently using [1]. Another option is you could keep using just one number, but change the output layer to sigmoid instead of softmax, and change the cost function to be the sigmoid-based cost function as well.
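The point about a one-element softmax can be checked numerically. The NumPy sketch below is an illustration with made-up logits, not the TensorFlow code from the question: it shows that softmax over a single logit is always 1, and that a two-class softmax over [0, z] gives the same probability as a sigmoid of z. In TensorFlow, the sigmoid-based cost mentioned above is presumably tf.nn.sigmoid_cross_entropy_with_logits.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sigmoid(logits):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-logits))

# Made-up logits, one per example, as produced by a [x_len, 1] weight matrix.
logits = np.array([[-3.0], [0.0], [4.0]])

# Softmax over one element: every row normalizes to exactly 1.
single = softmax(logits)

# Two-class softmax over [0, z]: the second column equals sigmoid(z).
two_class = softmax(np.hstack([np.zeros_like(logits), logits]))

# Sigmoid on the single logit, then a 0.5 threshold for a hard 0/1 label.
probs = sigmoid(logits)
hard_labels = (probs > 0.5).astype(np.float32)

print(single.ravel())
print(two_class[:, 1], probs.ravel())
print(hard_labels.ravel())
```

With the sigmoid output, the accuracy check would also need a thresholded prediction such as hard_labels, since comparing raw probabilities to 0/1 targets with an equality test rarely matches.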