I've coded this machine learning algorithm, but it returned a weird array. I want to input 2 numbers and have them classified into one of the classes found in Y. How do I make a prediction using this model?
import numpy as np # multivariate classification
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
X =np.array(
[[3, 7],
[3, 6],
[3, 7.2],
[6, 8],
[7, 7.5],
[7.9, 7.5]])
Y =np.array([1, 1, 1, 2, 3, 3])
model = Sequential([
Dense(units = 25, activation = "relu"),
Dense(units = 15, activation = "relu"),
Dense(units = 10, activation = "softmax"),])
from tensorflow.keras.losses import SparseCategoricalCrossentropy
model.compile(loss = SparseCategoricalCrossentropy())
model.fit(X, Y, epochs = 100)
I tried this code:
Xpred = [[3,7.8]]
prediction = model.predict(Xpred, verbose = 1)
print(prediction)
and it returned:
[[3.4789115e-02 8.4235787e-01 7.6775238e-02 1.9370530e-02 1.0821970e-02
4.8491983e-03 4.7121649e-03 7.4993627e-04 2.9366722e-04 5.2804169e-03]]
I'm new to Stack Overflow and ML, so please let me know how I could improve, or if you have any reading material or resources for ML you could share!
There's a lot to understand here, and I suggest that you work through some more tutorials on classification and follow the steps closely (the Keras documentation is quite good for this), but I'll attempt to talk you through enough to understand what you're seeing and get your basic example working.
The array of floating point numbers you get at the end is an array of probabilities, one for each class. There are 10 probabilities because you set the number of units in the output layer to 10, even though you only have 3 classes in your data. I'm guessing that you just want a classification for your new set of features ([3, 7.8]), so you take the class with the highest probability. In this case you can see just by inspection that the predicted class is 1, because the highest probability, 8.4235787e-01, is at index 1 (counting from 0), but in general you can get this using np.argmax on a numpy array.
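For example, applying np.argmax to the exact probabilities printed in the question (copied here as a hard-coded array, so no model is needed) picks out index 1:

```python
import numpy as np

# The probabilities printed in the question, one per output unit.
prediction = np.array([[3.4789115e-02, 8.4235787e-01, 7.6775238e-02, 1.9370530e-02,
                        1.0821970e-02, 4.8491983e-03, 4.7121649e-03, 7.4993627e-04,
                        2.9366722e-04, 5.2804169e-03]])

# argmax over the last axis gives the index of the most probable class per sample.
predicted_class = np.argmax(prediction, axis=-1)
print(predicted_class)  # [1]
```

Using axis=-1 keeps this working when you predict several samples at once: you get one class index per row.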
Steps to get your code working the way you expect:
1. Set the number of units in the output layer to the number of classes (3).
2. Number the classes starting from 0, because SparseCategoricalCrossentropy expects integer labels from 0 to (number of classes - 1). In general you can use integer encoding for this task (there are tools for this in Keras and other ML libraries, so look into those), but since the data is very small here we can just do it by hand by replacing 1 with 0, 2 with 1, and 3 with 2.
3. Apply np.argmax to the arrays you get from model.predict to get the predicted class.
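If you would rather not relabel by hand, np.unique with return_inverse=True is one NumPy way to do the integer encoding (scikit-learn's LabelEncoder is another option). A minimal sketch using the original labels from the question:

```python
import numpy as np

Y = np.array([1, 1, 1, 2, 3, 3])  # original labels from the question

# np.unique returns the sorted distinct labels and, with return_inverse=True,
# the position of each original label within that sorted array.
classes, Y_encoded = np.unique(Y, return_inverse=True)
print(classes)    # [1 2 3]
print(Y_encoded)  # [0 0 0 1 2 2]

# After prediction, map a 0-based class index back to the original label.
predicted_index = 0
print(classes[predicted_index])  # 1
```

Keeping the classes array around means you can always translate the model's 0-based output back to the labels you started with.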
The code ends up looking like this:
import numpy as np # multivariate classification
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
X =np.array(
[[3, 7],
[3, 6],
[3, 7.2],
[6, 8],
[7, 7.5],
[7.9, 7.5]])
Y =np.array([0, 0, 0, 1, 2, 2])
model = Sequential([
Dense(units = 25, activation = "relu"),
Dense(units = 15, activation = "relu"),
Dense(units = 3, activation = "softmax")
])
from tensorflow.keras.losses import SparseCategoricalCrossentropy
model.compile(loss = SparseCategoricalCrossentropy())
model.fit(X, Y, epochs = 100)
for prediction in model.predict(np.array([[3, 7.8]])):
print(prediction)
print(np.argmax(prediction))
The final part of the output is:
[0.916569 0.07700075 0.00643022]
0
So the predicted class is 0 (or 1 based on the original data you posted), which is what we'd expect based on inspection of the training data and new data.
Related
Not really sure whether I'm stupid or not, but shouldn't the values produced by BatchNormalization end up between -1 and 1? There have already been a lot of discussions about Keras BatchNormalization, and I couldn't find what I was looking for. I became suspicious one day and tried several test scenarios, but none of them produced what I was expecting. I even tried Google Colab to rule out version problems.
EDIT:
So, the question was rather stupid. However, I was more interested in the initial state, which is why I set the learning rate ("lr") so low and ran only one epoch.
btw:
tf.__version__
>>> 2.4.1
Simple test case:
import tensorflow as tf
import numpy as np
# a = (np.arange(25, dtype=np.float32)/50).reshape(1, 5, 5, 1)
a = np.arange(25, dtype=np.float32).reshape(1, 5, 5, 1)
inputs = tf.keras.layers.Input(shape=[5,5,1])
initializer = tf.random_normal_initializer(1.0, 0.002)
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
model = tf.keras.Sequential()
model.add(inputs)
model.add(tf.keras.layers.Conv2D(1, 4, strides=2, padding='same', kernel_initializer=initializer, use_bias=False)) # not really necessary
model.add(tf.keras.layers.BatchNormalization(momentum=0.99, epsilon=0.001, center=True, scale=True))
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.000000000001), loss=loss_fn, metrics=['accuracy'])
model.fit(a, a[:, 1:4, 1:4, :], epochs=1, batch_size=1)
print(model(a), 0)
>>> tf.Tensor(
[[[[ 8.615232]
[14.495497]
[ 8.131738]]
[[26.24201 ]
[38.98827 ]
[20.710234]]
[[17.929565]
[25.93689 ]
[13.535995]]]], shape=(1, 3, 3, 1), dtype=float32) 0
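Short answer: no.
You should not expect BatchNormalization to give values between -1 and 1. Even the normalized values are not bounded to [-1, 1]: normalization gives each feature zero mean and unit variance, so individual values can still be larger than 1 in magnitude. After that, the learned parameters $\gamma$ and $\beta$ can rescale and shift the values again.
Two things happen in the BatchNormalization layer:
1. Normalization using the mean and standard deviation: $\hat{z} = \frac{z - \mu}{\sqrt{\sigma^2 + \epsilon}}$
2. Scaling and shifting with the learned parameters: $z_{out} = \gamma \hat{z} + \beta$
The layer therefore has 4 sets of weights: $\gamma$ and $\beta$ are trainable, while the moving mean and moving variance are non-trainable.
In your test case there is one more effect. Calling model(a) runs the layer in inference mode, where it normalizes with the moving mean and moving variance instead of the statistics of the current batch. These start at 0 and 1, and with momentum=0.99 a single training step moves them only 1% of the way toward the batch statistics, so the output is far from standardized. Calling model(a, training=True) would use the batch statistics instead.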
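A small NumPy sketch of the two steps for a single feature (a simplification: Keras computes the statistics per channel over the batch and spatial axes). It shows that a standardized value can land well outside [-1, 1], and what inference mode does with the initial moving statistics. The function name batch_norm and the sample data are made up for illustration.

```python
import numpy as np

def batch_norm(z, gamma=1.0, beta=0.0, mean=None, var=None, eps=1e-3):
    """Batch normalization of a single feature (illustrative helper).

    If mean/var are None, the batch statistics are used (training mode);
    otherwise the given moving statistics are used (inference mode).
    """
    if mean is None:
        mean = z.mean()
    if var is None:
        var = z.var()
    z_hat = (z - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * z_hat + beta              # learned scale and shift

z = np.array([0.0, 0.0, 0.0, 0.0, 10.0])

# Training mode: the values are standardized, yet the outlier lands near 2.0.
train_out = batch_norm(z)
print(train_out)

# Inference mode with Keras' initial moving statistics (mean 0, variance 1):
# the values are essentially unchanged.
infer_out = batch_norm(z, mean=0.0, var=1.0)
print(infer_out)
```

With gamma=1 and beta=0 (their initial values), only the normalization step matters, which is roughly the situation after one epoch at a tiny learning rate.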
Tensorflow 2.0.0
python 3.7.6
I'm trying to solve a problem that given a number of inputs it returns multiple classifications in one output. For example, given the input [1, 0, 3000, 200, 250, ... 0, 1], it can return an output that would look like this [0, 1, 0, 1, 1, 1, ... 0, 0]. There are about 319 possible labels, and there can be 1 to 7 of them active in one output as above. In addition, I also want to know quantity, so the output could look like [1, 0, 2, 0, 3, 1, 2, 0, ..., 1].
After some reading, I came up with the following model
...
# This is how I'm normalizing the inputs
train_input = np.array(tf.keras.utils.normalize(input, axis = 1, order = 2).tolist())
test_input = np.array(tf.keras.utils.normalize(test_input, axis = 1, order = 2).tolist())
model = keras.Sequential([
keras.layers.Dense(230, input_shape=(26,)),
keras.layers.Dense(319, activation='relu'),
keras.layers.Dense(319, activation='linear'),
])
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
model.fit(train_input, train_output, validation_data=(test_input, test_output), epochs = 50)
...
I read that since I want multiple labels in one output, the activation function for the output layer should be 'linear'. Also, since I want quantities as an output the loss function should be 'mean square error'. My data set is about 16k.
This network did not perform well. The highest accuracy it could reach, after some tweaks, is about 15%. I figured I was being too ambitious by trying to get quantities and classifications in the same output. I decided to create 7 different, similar networks and make the output look more like a traditional classification (only one label active per output, e.g. [0, 0, 1, 0, 0]). That model looks like this:
...
train_input = np.array(tf.keras.utils.normalize(input, axis = 0, order = 2).tolist())
test_input = np.array(tf.keras.utils.normalize(test_input, axis = 0, order = 2).tolist())
model = keras.Sequential([
keras.layers.Dense(230, input_shape=(26,)),
keras.layers.Dense(319, activation='relu'),
keras.layers.Dense(319, activation='softmax'),
])
model.compile(optimizer=keras.optimizers.Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_input, train_output, validation_data=(test_input, test_output), epochs = 20)
This network, with not much optimization, can achieve about 60% (this is not great, but it's a much better start). However, this is not ideal, as it would not show me quantity, and since the 7 outputs are related, it seems better to have them in one network.
All this being said, I would like some guidance to improve my original network. What architecture would work better? Is it too much to get a network to classify through quantity? Should I continue the 7 network approach? Any useful readings/tips?
Thanks in advance!
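One detail that may be worth checking: the first model normalizes with axis=1 and the second with axis=0, which are different operations. As far as I know, tf.keras.utils.normalize divides by the L2 norm along the given axis; the NumPy function below (l2_normalize, a hypothetical stand-in, not the Keras source) reproduces that idea on made-up data resembling the example input. With axis=1 each sample is scaled to unit length, so large features such as 3000 dominate and small ones such as 0/1 flags shrink toward zero; with axis=0 each feature column is scaled separately.

```python
import numpy as np

def l2_normalize(x, axis):
    """Divide x by its L2 norm along `axis` (zero norms are treated as 1)."""
    norms = np.atleast_1d(np.linalg.norm(x, 2, axis))
    norms[norms == 0] = 1
    return x / np.expand_dims(norms, axis)

# Hypothetical data: 3 samples, 2 features on very different scales.
x = np.array([[1.0, 3000.0],
              [0.0, 200.0],
              [1.0, 250.0]])

per_sample = l2_normalize(x, axis=1)   # each row (sample) gets unit length
per_feature = l2_normalize(x, axis=0)  # each column (feature) gets unit length
print(per_sample)
print(per_feature)
```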
I'm new to Keras. I am trying to implement this model https://www.aclweb.org/anthology/D15-1167 for document classification, and I want to use an LSTM to get sentence representations. I have trained word vector representations separately with the skip-gram model on my dataset. Now, after splitting each document into sentences, each sentence into words, and converting each word to the corresponding integer in the dictionary, I have something like this for each document:
[[54,32,13],[21,43,2]...[28,1,9]]
I should feed each sentence to an LSTM to get a sentence vector, and after that feed each sentence vector to a different LSTM on the higher layer in order to get a document representation, and then apply classification to it. My problem is in the first layer: how should I feed each sentence simultaneously to each LSTM (so that at each time step each LSTM is applied to a word vector from each sentence)?
Edit: I just used TimeDistributed and it seems to work, although I am not sure it does what I want. I used the TimeDistributed wrapper over the embedding layer and then over the first LSTM layer. This is the model that I have implemented (a very simple one):
model.add(tf.keras.layers.TimeDistributed(embeding_layer))
model.add(tf.keras.layers.TimeDistributed(layers.LSTM(50, activation='relu')))
model.add(layers.LSTM(50, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Is my interpretation of the network correct?
my interpretation :
My input to the embedding layer is (documents, sentences, words). I padded each document to have 30 sentences, and I also padded the sentences to have 200 words. I have 20000 documents, so my input shape is (20000, 30, 200). After feeding it to the network, it first goes through the embedding layer, which produces a vector of length 300 for each word. So after applying the embedding layer to the first document, with shape (1, 30, 200), I get (1, 30, 200, 300), which is the input for the TimeDistributed LSTM. Then TimeDistributed makes 30 copies of the LSTM layer with shared weights, where each LSTM outputs a sentence vector, and then the next LSTM is applied to these 30 sentence vectors. Am I right?
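The shape flow described above can be traced without Keras. The NumPy sketch below is only an illustration: a plain tanh RNN stands in for the LSTM, the vocabulary size and weights are made up, and it is not how Keras implements TimeDistributed internally. The key point is that one set of sentence-level weights is applied to every sentence, which is what "30 copies with shared weights" amounts to.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes from the question, except only 2 documents and a made-up vocabulary size.
n_docs, n_sent, n_words = 2, 30, 200
vocab, emb_dim, hidden = 1000, 300, 50

token_ids = rng.integers(1, vocab, size=(n_docs, n_sent, n_words))  # (2, 30, 200)
embedding = rng.normal(0, 0.01, size=(vocab, emb_dim))

# Sentence-level RNN weights: ONE set, shared by all 30 sentences.
Wx1 = rng.normal(0, 0.01, size=(emb_dim, hidden))
Wh1 = rng.normal(0, 0.01, size=(hidden, hidden))
# Document-level RNN weights and a sigmoid output unit.
Wx2 = rng.normal(0, 0.01, size=(hidden, hidden))
Wh2 = rng.normal(0, 0.01, size=(hidden, hidden))
w_out = rng.normal(0, 0.01, size=(hidden, 1))

def simple_rnn(seq, Wx, Wh):
    """Run a tanh RNN over seq (timesteps, features); return the last state."""
    h = np.zeros(Wh.shape[0])
    for x_t in seq:
        h = np.tanh(x_t @ Wx + h @ Wh)
    return h

embedded = embedding[token_ids]  # (2, 30, 200, 300): one vector per word

# Like TimeDistributed(RNN): the same function and weights applied to every sentence.
sentence_vecs = np.stack([
    np.stack([simple_rnn(sentence, Wx1, Wh1) for sentence in doc])
    for doc in embedded
])  # (2, 30, 50)

# The second RNN reads the 30 sentence vectors of each document.
doc_vecs = np.stack([simple_rnn(doc, Wx2, Wh2) for doc in sentence_vecs])  # (2, 50)

probs = 1.0 / (1.0 + np.exp(-(doc_vecs @ w_out)))  # (2, 1), like Dense(1, sigmoid)
print(embedded.shape, sentence_vecs.shape, doc_vecs.shape, probs.shape)
```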
The example below might be what you are looking for, or at least point you in the right direction. It's a bit experimental on my part, but I believe it has the right structure. It was created in Google Colab with TensorFlow 2.0. The first section is provided to make the processing reproducible, but the rest illustrates the general idea of using a TimeDistributed layer along with masking and padding. BTW, I believe this is a similar idea to what @El Sheikh (first comment above) was suggesting. Note: I used a SimpleRNN here, but I believe the idea applies to LSTMs as well. I hope this helps get you moving in the right direction.
%tensorflow_version 2.x
import numpy as np
import tensorflow as tf
import random as rn
# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(42)
# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
rn.seed(12345)
# Force TensorFlow to use single thread.
# Multiple threads are a potential source of non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/
session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1,
inter_op_parallelism_threads=1)
# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/set_random_seed
tf.compat.v1.set_random_seed(1234)
sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
tf.compat.v1.keras.backend.set_session(sess)
# The code above here is provided to make the below reproducible each time you
# run.
#
# Main code follows:
from tensorflow import keras
from tensorflow.keras import layers
# Input structure
# Sentence1 ..... SentenceM
# Word11 Word21 Word31 ..... Wordn11 Word11 .... WordnM1
# Word12 Word22 Word32 Wordn12 Word12 WordnM2
# Word13 Word23 Word33 Wordn13 Word13 WordnM3
# example parameters
word_vec_dimension = 3 # dimension of the embedding
sentence_representation = 4 # dimensionality of sentence vector
#
# This represents a single test document.
# Each row is a sentence and the words are represented by 3-dimensional
# integer vectors.
#
raw_inputs = [ [ [1, 5, 7], [2, 6, 7] ],
[ [9, 6, 3], [1, 8, 2], [4, 5, 9], [8, 2, 1] ],
[ [1, 6, 2], [4, 2, 9] ],
[ [2, 6, 2], [8, 2, 9] ],
[ [3, 6, 2], [2, 2, 9], [1, 6, 2] ],
]
print(raw_inputs)
# Create the model
#
# Allow for variable number of words per sentence and variable number of
# sentences:
# Input shape(num_samples, [SentenceCount], [WordCount], word_vector_dim)
#
# Note: Using None for Sentence Count, and None for Word count to allow
# for variable sequences length in both these dimensions.
#
inputs = keras.Input(shape=(None, None, word_vec_dimension), name='inputlayer')
x = tf.keras.layers.Masking(mask_value=0.0)(inputs) # Force RNNs to ignore timesteps with zero vectors.
x = tf.keras.layers.TimeDistributed(layers.SimpleRNN(sentence_representation,
use_bias=False,
activation=None),
name='TD1')(x)
outputs = x
# more layers here if needed:
model = tf.keras.Model(inputs=inputs, outputs=outputs, name='Sentiment')
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
model.summary()
# Set up fitting calls
import numpy as np
# document 1
x_train = raw_inputs # use the dummy document for testing
# Set zeros in locations where there is no data to indicate mask to RNN's so
# they ignore that timestep.
padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(x_train,
padding='post')
print(x_train)
# Insert a dummy dimension 1 to represent the sample dimension.
padded_inputs = np.expand_dims(padded_inputs,axis=0)/1.0 # Make float type
print(padded_inputs)
print(padded_inputs.shape)
y_train = np.array([[ 1.0, 2.0, 3.0, 4.0 ]])
print(y_train.shape)
# Train model:
model.fit(padded_inputs,y_train,epochs=1)
print('get_weights:')
print(model.get_layer(name='TD1').get_weights())
print('get_predictions:')
print(model.predict(padded_inputs))
I am very new to Keras and to machine learning in general, but here is what I want. I have a list of inputs (1 value for every input node) and a list of targets (1 value for every output node).
input_list = [1, 0, 1, 0, 1, 0] # maybe longer
wanted_output_list = [1, 0, 0, 0] # also maybe longer
And now I want to give these as input to train my neural network:
# create model
model = Sequential()
# get number of columns in training data
n_cols = 6
# add model layers
model.add(Dense(6, activation='relu', input_shape= (n_cols,)))
model.add(Dense(2, activation='relu'))
model.add(Dense(2, activation='relu'))
model.add(Dense(3))
# compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# train model
model.fit(input_list, wanted_output_list, validation_split=0.2, epochs=30)
However I get this error:
ValueError: Error when checking input: expected dense_1_input to have shape (6,) but got array with shape (1,)
Can anyone please tell me why and how I can fix this?
When defining your model, you specified a model that accepts an input with 6 features and outputs a vector with 3 components. Your training data, however, is not shaped correctly (nor are your labels, by the way). You should shape your data the way you have defined your model. In this case, that means that each sample of your training data is a vector with 6 components, and each label is a vector with 3 components.
Keras expects a numpy array (or a list of arrays, when the model has multiple inputs) as training data; see the documentation:
x : Input data. It could be:
A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs).
So, per your model definition, you could shape your training data the following way:
import numpy as np
# your data, in this case 2 datapoints
X = np.array([[1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1]])
# the corresponding labels
y = np.array([[1, 0, 0], [0, 1, 0]])
And then train your model by calling fit.
model.fit(X, y, epochs=30)
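The original error comes from the first axis being read as the sample axis. As far as I can tell, a flat list of 6 values looks like 6 samples with 1 feature each (older Keras versions expand a 1-D array to shape (6, 1)), hence "got array with shape (1,)". A quick NumPy check of the shapes involved:

```python
import numpy as np

input_list = [1, 0, 1, 0, 1, 0]
flat = np.array(input_list)
print(flat.shape)  # (6,): one axis only, read as 6 samples rather than 6 features

# A single sample with 6 features needs a leading batch axis.
X_single = flat.reshape(1, -1)
print(X_single.shape)  # (1, 6)

# Equivalent: wrap the list in another list.
print(np.array([input_list]).shape)  # (1, 6)
```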
If you are actually asking about having multiple inputs and outputs, as opposed to multiple features (not the same thing), then this cannot be done using the Sequential API, and you must use the Keras Functional API instead. There you can define multiple inputs and outputs and then pass those as lists, as you suggested.
My answer on this topic from last week may also be of assistance.
Please convert your input features and classes to arrays, and set the number of nodes in the last layer to the number of classes in y.
Refer to the code below:
X = np.array([[1, 0, 1, 0, 1, 0]])
y = np.array([[1, 0, 0,0]])
model = Sequential()
n_cols = 6
model.add(Dense(6, activation='relu', input_shape= (n_cols,)))
model.add(Dense(2, activation='relu'))
model.add(Dense(2, activation='relu'))
model.add(Dense(4))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X, y, epochs=30)
I have been reading about Keras RNN models (LSTMs and GRUs), and authors seem to largely focus on language data or univariate time series that use training instances composed of previous time steps. The data I have is a bit different.
I have 20 variables measured every year for 10 years for 100,000 persons as input data, and the 20 variables measured for year 11 as output data. What I would like to do is predict the value of one of the variables (not the other 19) for the 11th year.
I have my data structured as X.shape = [persons, years, variables] = [100000, 10, 20] and Y.shape = [persons, variable] = [100000, 1]. Below is my Python code for a LSTM model.
## LSTM model.
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(128, activation = 'tanh',
input_shape = (X.shape[1], X.shape[2])))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X, Y, epochs = 25, batch_size = 128)
I have four (related) questions, please:
1. Have I coded the Keras model correctly for the data structure I have? The performance I get from a fully-connected network (using flattened data) and from LSTM, GRU, and 1D CNN models is nearly identical, and I don't know if I have made an error in Keras or if a recurrent model is simply not helpful in this case.
2. Should I have Y as a series with shape Y.shape = [persons, years] = [100000, 11], rather than including the variable in X, which would then have shape X.shape = [persons, years, variables] = [100000, 10, 19]? If so, how can I get the RNN to output the predicted sequence? When I use return_sequences = True, Keras returns an error.
3. Is this the best way to predict with the data I have? Are there better option choices available in the Keras RNN models, or even other models?
4. How could I simulate data resembling the data structure I have so that an RNN model would outperform a fully-connected network?
UPDATE:
I have tried a simulation, with what I hope is a very simple case where an RNN should be expected to outperform a FNN.
While the LSTM tends to outperform the FNN when both have fewer hidden units (4), the performance becomes identical with more hidden units (8+). Can anyone think of a better simulation where an RNN would be expected to outperform an FNN with a similar data structure?
from keras import models
from keras import layers
from keras.layers import Dense, LSTM
import numpy as np
import matplotlib.pyplot as plt
The code below simulates data for 10,000 instances, 10 time steps, and 2 variables. If the second variable has a 0 in the very first time step, then Y is the value of the first variable for the very last time step multiplied by 3. If the second variable has a 1 in the very first time step, then Y is the value of the first variable for the very last time step multiplied by 9.
My hope was that the RNN would keep the value of the second variable at the very first time step in memory and use it to know which value (3 or 9) to multiply the first variable by at the very last time step.
## Simulate data.
instances = 10000
sequences = 10
X = np.zeros((instances, sequences * 2))
X[:int(instances / 2), 1] = 1
for i in range(instances):
for j in range(0, sequences * 2, 2):
X[i, j] = np.random.random()
Y = np.zeros((instances, 1))
for i in range(len(Y)):
if X[i, 1] == 0:
Y[i] = X[i, -2] * 3
if X[i, 1] == 1:
Y[i] = X[i, -2] * 9
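A vectorized equivalent of the simulation above, plus a check of how the later reshape X.reshape(X.shape[0], X.shape[1] // 2, 2) lays out the interleaved columns. This sketch swaps np.random.random for a seeded np.random.default_rng generator (an assumption, for reproducibility); otherwise it follows the same rules.

```python
import numpy as np

rng = np.random.default_rng(0)
instances, sequences = 10000, 10

# Columns alternate: even = first variable, odd = second variable.
X = np.zeros((instances, sequences * 2))
X[:, 0::2] = rng.random((instances, sequences))  # first variable at every step
X[: instances // 2, 1] = 1                       # flag: second variable, first step

# Multiplier 9 when the flag is 1, otherwise 3, applied to the last value of variable 1.
Y = (np.where(X[:, 1] == 1, 9.0, 3.0) * X[:, -2]).reshape(-1, 1)

# Same reshape as the LSTM code: (instances, time steps, variables).
X_lstm = X.reshape(X.shape[0], X.shape[1] // 2, 2)
flag = X_lstm[:, 0, 1]   # second variable at the first time step
last = X_lstm[:, -1, 0]  # first variable at the last time step
print(X_lstm.shape)
```

The reshape places columns 2t and 2t+1 at time step t, so the flag ends up at the first time step and the value to multiply at the last one, which is the long-range dependency the experiment is meant to test.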
Below is code for a FNN:
## Densely connected model.
# Define model.
network_dense = models.Sequential()
network_dense.add(layers.Dense(4, activation = 'relu',
input_shape = (X.shape[1],)))
network_dense.add(Dense(1, activation = None))
# Compile model.
network_dense.compile(optimizer = 'rmsprop', loss = 'mean_absolute_error')
# Fit model.
history_dense = network_dense.fit(X, Y, epochs = 100, batch_size = 256, verbose = False)
plt.scatter(Y[X[:, 1] == 0, :], network_dense.predict(X[X[:, 1] == 0, :]), alpha = 0.1)
plt.plot([0, 3], [0, 3], color = 'black', linewidth = 2)
plt.title('FNN, Second Variable has a 0 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y[X[:, 1] == 1, :], network_dense.predict(X[X[:, 1] == 1, :]), alpha = 0.1)
plt.plot([0, 9], [0, 9], color = 'black', linewidth = 2)
plt.title('FNN, Second Variable has a 1 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
Below is code for a LSTM:
## Structure X data for LSTM.
X_lstm = X.reshape(X.shape[0], X.shape[1] // 2, 2)
X_lstm.shape
## LSTM model.
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(4, activation = 'relu',
input_shape = (X_lstm.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'rmsprop', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X_lstm, Y, epochs = 100, batch_size = 256, verbose = False)
plt.scatter(Y[X[:, 1] == 0, :], network_lstm.predict(X_lstm[X[:, 1] == 0, :]), alpha = 0.1)
plt.plot([0, 3], [0, 3], color = 'black', linewidth = 2)
plt.title('LSTM, Second Variable has a 0 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y[X[:, 1] == 1, :], network_lstm.predict(X_lstm[X[:, 1] == 1, :]), alpha = 0.1)
plt.plot([0, 9], [0, 9], color = 'black', linewidth = 2)
plt.title('LSTM, Second Variable has a 1 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
Yes, the code is correct for what you are trying to do. The 10 years are the time window used to predict the following year, so that should be the number of time steps fed into your model for each of the 20 variables. The sample size of 100,000 observations is not relevant to the input shape of your model.
The way that you originally shaped the dependent variable Y is correct: you are predicting a window of 1 year for 1 variable, and you have 100,000 observations. The keyword argument return_sequences=True causes an error here because it makes the LSTM return an output for every one of the 10 time steps, so the Dense layer after it outputs shape (persons, 10, 1), which no longer matches Y. Set this parameter to True when you stack multiple LSTM layers and the layer in question is followed by another LSTM layer.
I wish I could offer some guidance to 3 but without actually having your dataset I don't know if it's possible to answer this with any sort of certainty.
I will say that LSTMs were designed to address what is known as the long-term dependency problem present in regular RNNs. What this problem boils down to is that as the gap grows between when the relevant information was observed and the point where that information would be useful, a standard RNN has a harder time learning the relationship between them. Think of predicting a stock price based on 3 days of activity vs. an entire year.
This leads into number 4. If I use the term 'resembling' loosely and stretch your time window further out, to say 50 years as opposed to 10, the advantages of using an LSTM should become more apparent. That said, I'm sure someone more experienced will be able to offer a better answer, and I look forward to seeing it.
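A toy illustration of why the LSTM cell helps with long gaps, written in NumPy with hand-picked weights (an assumption for illustration, not trained values; the gate order is also this sketch's own choice). With the forget gate held open and the input gate closed, the additive cell update keeps a stored value almost intact over 50 steps, while a plain tanh recurrence with a recurrent weight of 0.5 shrinks it toward zero. This is an intuition aid, not a proof that an LSTM will win on any given dataset.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases.
    Gate order (input, forget, candidate, output) is an assumption of this sketch.
    """
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])          # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much of the old cell to keep
    g = np.tanh(z[2 * H:3 * H])  # candidate values to write
    o = sigmoid(z[3 * H:4 * H])  # output gate
    c = f * c + i * g            # additive update of the cell state
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
D, H, steps = 2, 1, 50
W = np.zeros((4 * H, D))                # zero input weights: inputs are ignored here
U = np.zeros((4 * H, H))
b = np.array([-10.0, 10.0, 0.0, 0.0])   # input gate ~closed, forget gate ~open

h, c = np.zeros(H), np.array([1.0])     # a value "remembered" from the first step
for _ in range(steps):
    h, c = lstm_step(rng.random(D), h, c, W, U, b)

# A plain tanh recurrence with weight 0.5 and no input, for contrast.
h_rnn = 1.0
for _ in range(steps):
    h_rnn = np.tanh(0.5 * h_rnn)

print(c, h_rnn)  # c stays close to 1.0; h_rnn has decayed to nearly 0
```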
I found this page helpful for understanding LSTMs:
https://colah.github.io/posts/2015-08-Understanding-LSTMs/