We have already the trained model but we can't write the part of the program that makes the predictions. We can open the picture but we can't process it with TensorFlow.
Any help would be appreciated. Here is what we already have.
from __future__ import absolute_import, division, print_function, unicode_literals
# Install TensorFlow
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='tanh'), #relu, softmax, tanh
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation='tanh')
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
You can use the model object's .predict() method as follows:
import numpy as np
predictions = model.predict([x_test]) # Make prediction
print(np.argmax(predictions[1000])) # Print out the number
Anyway I advice you to study a nice, simple example of how to use tensorflow for classifying handwritten digits by Abdelhakim Ouafi. I'm including the main parts here for future readers for the case, if the link would be available:
Description of the MNIST Handwritten Digit.
The MNIST Handwritten Digit is a dataset for evaluating machine learning and deep learning models on the handwritten digit classification problem, it is a dataset of 60,000 small square 28×28 pixel grayscale images of handwritten single digits between 0 and 9.
Import the TensorFlow library
import tensorflow as tf # Import tensorflow library
import matplotlib.pyplot as plt # Import matplotlib library
Create a variable named mnist, and set it to an object of the MNIST dataset from the Keras library and we’re gonna unpack it to a training dataset (x_train, y_train) and testing dataset (x_test, y_test):
mnist = tf.keras.datasets.mnist # Object of the MNIST dataset
(x_train, y_train),(x_test, y_test) = mnist.load_data() # Load data
Preprocess the data
To make sure that our data was imported correctly, we are going to plot the first image from the training dataset using matplotlib:
plt.imshow(x_train[0], cmap="gray") # Import the image
plt.show() # Plot the image
Before we feed the data into the neural network we need to normalize it by scaling the pixels value in a range from 0 to 1 instead of being from 0 to 255 and that make the neural network needs less computational power:
# Normalize the train dataset
x_train = tf.keras.utils.normalize(x_train, axis=1)
# Normalize the test dataset
x_test = tf.keras.utils.normalize(x_test, axis=1)
Build the model
Now, we are going to build the model or in other words the neural network that will train and learn how to classify these images.
It worth noting that the layers are the most important thing in building an artificial neural network since it will extract the features of the data.
First and foremost, we start by creating a model object that lets you add the different layers.
Second, we are going to flatten the data which is the image pixels in this case. So the images are 28×28 dimensional we need to make it 1×784 dimensional so the input layer of the neural network can read it or deal with it. This is an important concept you need to know.
Third, we define input and a hidden layer with 128 neurons and an activation function which is the relu function.
And the Last thing we create the output layer with 10 neurons and a softmax activation function that will transform the score returned by the model to a value so it will be interpreted by humans.
#Build the model object
model = tf.keras.models.Sequential()
# Add the Flatten Layer
model.add(tf.keras.layers.Flatten())
# Build the input and the hidden layers
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
# Build the output layer
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
Compile the model
Since we finished building the neural network we need to compile the model by adding some few parameters that will tell the neural network how to start the training process.
First, we add the optimizer which will create or in other word update the parameter of the neural network to fit our data.
Second, the loss function that will tell you the performance of your model.
Third, the Metrics which give indicative tests of the quality of the model.
# Compile the model
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
Train the model
We are ready to train our model, we call the fit subpackage and feed it with the training data and the labeled data that correspond to the training dataset and how many epoch should run or how many times should make a guess.
model.fit(x=x_train, y=y_train, epochs=5) # Start training process
Evaluate the model
Let’s see how the model performs after the training process has finished.
# Evaluate the model performance
test_loss, test_acc = model.evaluate(x=x_test, y=y_test)
# Print out the model accuracy
print('\nTest accuracy:', test_acc)
Evaluating the Model Performance
It shows that the neural network has reached 97.39% accuracy which is pretty good since we train the model just with 5 epochs.
Make predictions
Now, we will start making a prediction by importing the test dataset images.
predictions = model.predict([x_test]) # Make prediction
We are going to make a prediction for numbers or images that the model has never seen before.
For instance, we try to predict the number that corresponds to the image number 1000 in the test dataset:
print(np.argmax(predictions[1000])) # Print out the number
As you see, the prediction is number nine but how we can make sure that this prediction was true? well, we need to plot the image number 1000 in the test dataset using matplotlib:
plt.imshow(x_test[1000], cmap="gray") # Import the image
plt.show() # Show the image
Related
I tried to create stacking regressor to predict multiple output with SVR and Neural network as estimators and final estimator is linear regression.
print(X_train.shape) #(73, 39)
print(y_train.shape) #(73, 13)
print(X_test.shape) #(19, 39)
print(y_test.shape) #(19, 13)
def build_nn():
ann = Sequential()
ann.add(Dense(40, input_dim=X_train.shape[1], activation='relu', name="Hidden_Layer_1"))
ann.add(Dense(y_train.shape[1], activation='sigmoid', name='Output_Layer'))
ann.compile( loss='mse', optimizer= 'adam', metrics = 'mse')
return ann
keras_reg = KerasRegressor(model = build_nn,optimizer="adam",optimizer__learning_rate=0.001,epochs=100,verbose=0)
stacker = StackingRegressor(estimators=[('svr',SVR()),('ann',keras_reg)], final_estimator= LinearRegression())
reg = MultiOutputRegressor(estimator=stacker)
model = reg.fit(X_train,y_train)
I am able to 'fit' the model. However, I got below problem when trying to predict.
prediction = reg.predict(X_test)
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 19 and the array at index 1 has size 247
Imo the point here is the following. On one side, NN models do support multi-output regression tasks on their own, which might be solved defining an output layer similar to the one you built, namely with a number of nodes equal to the number of outputs (though, with respect to your construction, I would specify a linear activation with activation=None rather than a sigmoid activation).
def build_nn():
ann = Sequential()
ann.add(Dense(40, input_dim=X_train.shape[1], activation='relu', name="Hidden_Layer_1"))
ann.add(Dense(y_train.shape[1], name='Output_Layer'))
ann.compile(loss='mse', optimizer= 'adam', metrics = 'mse')
return ann
On the other side, here, you're trying to solve your multi-output regression task by calling the MultiOutputRegressor constructor on a StackingRegressor instance, i.e. by explicitly training one regression model per output, the regression model being the combination of multiple regression models.
The issue arises from the concatenation of the predictions of the StackingRegressor base estimators and from their different shapes, in particular. Indeed:
the predictions of the MultiOutputRegressor instance are demanded to the StackingRegressor as you can see in https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/multioutput.py#L234
in turn, in a StackingRegressor the predictions of each individual estimator are stacked together and used as input to a final_estimator to compute the prediction. .predict() is called on final_estimator in https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/ensemble/_stacking.py#L267 (and in particular, you can see that it is taking the transformed X as input).
the transformed X is the result of the concatenation of the predictions of the StackingRegressor base estimators, as you can see in https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/ensemble/_stacking.py#L67.
This said, among the StackingRegressor base estimators you have an SVR() model which is designed not to be able to natively solve multi-output regression tasks and a KerasRegressor neural network which, defined as you did, is meant to be able to solve a multi-output regression task without delegating to MultiOutputRegressor. Therefore, what happens in _concatenate_predictions is that dimensionally-inconsistent predictions arise from SVR() (1D array of shape (19,)=(n_samples,) eventually reshaped into a (19,1) array) and from the KerasRegressor (2D array of shape (19,13)=(n_samples,n_outputs) eventually flattened and reshaped into a (19*13,1)=(247,1) array). This reflects the fact that letting your neural network output layer have a number of nodes equal to the number of outputs cannot fit into a StackingRegressor with another base estimator which should be necessarily extended via MultiOutputRegressor to be able to solve a multi-output regression task.
Therefore, for me, if you want to keep the same "architecture", you should let your neural network have an output layer with a single node so that its predictions can be concatenated with the ones from the SVR model and accessible to the StackingRegressor final_estimator and eventually delegate to MultiOutputRegressor.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
import tensorflow as tf
import tensorflow.keras
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from scikeras.wrappers import KerasRegressor
from sklearn.ensemble import StackingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
X, y = make_regression(n_samples=92, n_features=39, n_informative=39, n_targets=13, random_state=42)
print(X.shape, y.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
def build_nn():
ann = Sequential()
ann.add(Dense(40, input_dim=X_train.shape[1], activation='relu', name="Hidden_Layer_1"))
ann.add(Dense(1, name='Output_Layer'))
ann.compile(loss='mse', optimizer= 'adam', metrics = 'mse')
return ann
keras_reg = KerasRegressor(model = build_nn, optimizer="adam",
optimizer__learning_rate=0.001, epochs=100, verbose=0)
stacker = StackingRegressor(estimators=[('svr', SVR()), ('ann', keras_reg)], final_estimator = LinearRegression())
reg = MultiOutputRegressor(estimator=stacker)
reg.fit(X_train,y_train)
predictions = reg.predict(X_test)
I want to use SGD optimizer in tf.keras.
But SGD detail said
Gradient descent (with momentum) optimizer.
Dose it mean SGD doesn't support "Randomly shuffle examples in the data set phase"?
I checked the SGD source,
It seems that there is no random shuffle method.
My understanding about SGD is applying gradient descent for random sample.
But it does only gradient descent with momentum and nesterov.
Does the batch-size which I defined in code represent SGD random shuffle phase?
If so, it does randomly shuffle but never use same dataset, doesn't it?
Is my understanding correct?
I wrote code about batch as below.
(x_train, y_train)).shuffle(10000).batch(32)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
I'm not sure if it's what you are looking for, but try using the tf.data.Dataset for your Dataset. For example, for mnist you can easily create the dataset variable, shuffle the samples and divide in batches:
shuffle_buffer_size = 100
batch_size = 10
train, test = tf.keras.datasets.fashion_mnist.load_data()
images, labels = train
images = images/255
dataset = tf.data.Dataset.from_tensor_slices((images, labels))
dataset.shuffle(shuffle_buffer_size).batch(batch_size)
You can have a look at the tutorial about datasets: td.data
Good morning, I'm new in machine learning and neural networks. I am trying to build a fully connected neural network to solve a regression problem. The dataset is composed by 18 features and 1 label, and all of these are physical quantities.
You can find the code below. I upload the figure of the loss function evolution along the epochs (you can find it below). I am not sure if there is overfitting. Someone can explain me why there is or not overfitting?
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
import keras
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping
from keras import optimizers
from sklearn.metrics import r2_score
from keras import regularizers
from keras import backend
from tensorflow.keras import regularizers
from keras.regularizers import l2
# =============================================================================
# Scelgo il test size
# =============================================================================
test_size = 0.2
dataset = pd.read_csv('DataSet.csv', decimal=',', delimiter = ";")
label = dataset.iloc[:,-1]
features = dataset.drop(columns = ['Label'])
y_max_pre_normalize = max(label)
y_min_pre_normalize = min(label)
def denormalize(y):
final_value = y*(y_max_pre_normalize-y_min_pre_normalize)+y_min_pre_normalize
return final_value
# =============================================================================
# Split
# =============================================================================
X_train1, X_test1, y_train1, y_test1 = train_test_split(features, label, test_size = test_size, shuffle = True)
y_test2 = y_test1.to_frame()
y_train2 = y_train1.to_frame()
# =============================================================================
# Normalizzo
# =============================================================================
scaler1 = preprocessing.MinMaxScaler()
scaler2 = preprocessing.MinMaxScaler()
X_train = scaler1.fit_transform(X_train1)
X_test = scaler2.fit_transform(X_test1)
scaler3 = preprocessing.MinMaxScaler()
scaler4 = preprocessing.MinMaxScaler()
y_train = scaler3.fit_transform(y_train2)
y_test = scaler4.fit_transform(y_test2)
# =============================================================================
# Creo la rete
# =============================================================================
optimizer = tf.keras.optimizers.Adam(lr=0.001)
model = Sequential()
model.add(Dense(60, input_shape = (X_train.shape[1],), activation = 'relu',kernel_initializer='glorot_uniform'))
model.add(Dropout(0.2))
model.add(Dense(60, activation = 'relu',kernel_initializer='glorot_uniform'))
model.add(Dropout(0.2))
model.add(Dense(60, activation = 'relu',kernel_initializer='glorot_uniform'))
model.add(Dense(1,activation = 'linear',kernel_initializer='glorot_uniform'))
model.compile(loss = 'mse', optimizer = optimizer, metrics = ['mse'])
history = model.fit(X_train, y_train, epochs = 100,
validation_split = 0.1, shuffle=True, batch_size=250
)
history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
y_train_pred = denormalize(y_train_pred)
y_test_pred = denormalize(y_test_pred)
plt.figure()
plt.plot((y_test1),(y_test_pred),'.', color='darkviolet', alpha=1, marker='o', markersize = 2, markeredgecolor = 'black', markeredgewidth = 0.1)
plt.plot((np.array((-0.1,7))),(np.array((-0.1,7))),'-', color='magenta')
plt.xlabel('True')
plt.ylabel('Predicted')
plt.title('Test')
plt.figure()
plt.plot((y_train1),(y_train_pred),'.', color='darkviolet', alpha=1, marker='o', markersize = 2, markeredgecolor = 'black', markeredgewidth = 0.1)
plt.plot((np.array((-0.1,7))),(np.array((-0.1,7))),'-', color='magenta')
plt.xlabel('True')
plt.ylabel('Predicted')
plt.title('Train')
plt.figure()
plt.plot(loss_values,'b',label = 'training loss')
plt.plot(val_loss_values,'r',label = 'val training loss')
plt.xlabel('Epochs')
plt.ylabel('Loss Function')
plt.legend()
print("\n\nThe R2 score on the test set is:\t{:0.3f}".format(r2_score(y_test_pred, y_test1)))
print("The R2 score on the train set is:\t{:0.3f}".format(r2_score(y_train_pred, y_train1)))
from sklearn import metrics
# Measure MSE error.
score = metrics.mean_squared_error(y_test_pred,y_test1)
print("\n\nFinal score test (MSE): %0.4f" %(score))
score1 = metrics.mean_squared_error(y_train_pred,y_train1)
print("Final score train (MSE): %0.4f" %(score1))
score2 = np.sqrt(metrics.mean_squared_error(y_test_pred,y_test1))
print(f"Final score test (RMSE): %0.4f" %(score2))
score3 = np.sqrt(metrics.mean_squared_error(y_train_pred,y_train1))
print(f"Final score train (RMSE): %0.4f" %(score3))
EDIT:
I tried alse to do feature importances and to raise n_epochs, these are the results:
Feature Importance:
No Feature Importace:
Looks like you don't have overfitting! Your training and validation curves are descending together and converging. The clearest sign you could get of overfitting would be a deviation between these two curves, something like this:
Since your two curves are descending and are not diverging, it indicates your NN training is healthy.
HOWEVER! Your validation curve is suspiciously below the training curve. This hints a possible data leakage (train and test data have been mixed somehow). More info on a nice an short blog post. In general, you should split the data before any other preprocessing (normalizing, augmentation, shuffling, etc...).
Other causes for this could be some type of regularization (dropout, BN, etc..) that is active while computing the training accuracy and it's deactivated when computing the Validation/Test accuracy.
Overfitting is, when the model does not generalize to other data than the training data. When this happen you will have a very (!) low training loss but a high validation loss. You can think of it this way: if you have N points you can fit a N-1 polynomial such that you have a zero training loss (your model hits all your training points perfectly). But, if you apply that model to some other data, it will most likely produce a very high error (see the image below). Here the red line is our model and the green is the true data (+ noice), and you can see in the last picture we get zero training error. In the first, our model is too simple (high train/high validation error), the second is good (low train/low valuidation error) the third and last is too complex i.e overfitting (very low train/high validation error).
Neural network can work in the same way, so by looking at your training vs validation error, you can conclude if it overfits or not
No, this is not overfitting as your validation loss isn´t increasing.
Nevertheless, if I were you I would be a little bit skeptical. Try to train your model for even more epochs and watch out for the validation loss.
What you definitely should do, is to observe the following:
- are there duplicates or near-duplicates in the data (creates information leakage from train to test validation split)
- are there features that have a causal connection to the target variable
Edit:
Usually, you have some random component in a real-world dataset, so that rules that are observed in train data aren´t 100% true for validation data.
Your plot shows that the validation loss is even more decreasing as train loss decreases. Usually, you get to some point in training, where the rules you observe in train data are too specific to describe the whole data. That´s when overfitting begins. Hence, it is weird, that your validation loss doesn´t increase again.
Please check whether your validation loss approaches zero when you´re training for more epochs. If it´s the case I would check your database very carefully.
Let´s assume, that there is a kind of information leakage from the train set to the validation set (through duplicate records for example). Your model would change the weights to describe very specific rules. When applying your model to new data it would fail miserably since the observed connections are not really general.
Another common data problem is, that features may have an inversed causality.
The thing that validation loss is generally lower than train error is probably depending on dropout and regularization, since it´s applied while training but not for predicting/testing.
I put some emphasis on this because a tiny bug or an error in the data can "fuck up" your whole model.
i am trying to build an MLP model that takes a dataset consists of 9 columns
this is a sample (patient number, time in mill/sec., normalization of X Y and Z, kurtosis, skewness, pitch, roll and yaw, label) respectively.
1,15,-0.248010047716,0.00378335508419,-0.0152548459993,-86.3738760481,0.872322164158,-3.51314800063,0
1,31,-0.248010047716,0.00378335508419,-0.0152548459993,-86.3738760481,0.872322164158,-3.51314800063,0
1,46,-0.267422664673,0.0051143782875,-0.0191247001961,-85.7662354031,1.0928406847,-4.08015176908,0
1,62,-0.267422664673,0.0051143782875,-0.0191247001961,-85.7662354031,1.0928406847,-4.08015176908,0
and this is my code, there is no error in my code but the results with and without features are the same .. so i am asking if i used the right way to fed those features into the model.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
import pandas as pd
import itertools
import math
np.random.seed(7)
train = np.loadtxt("featwithsignalsTRAIN.txt", delimiter=",")
test = np.loadtxt("featwithsignalsTEST.txt", delimiter=",")
x_train = train[:,[2,3,4,5,6,7]]
x_test = test[:,[2,3,4,5,6,7]]
y_train = train[:,8]
y_test = test[:,8]
model = Sequential()
model.add(Dense(500, input_dim=6, activation='relu'))
model.add(Dense(300, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy' , optimizer='adam', metrics=['accuracy'])
# Fit the model
batch_size = 128
epochs = 10
hist = model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=2,
)
avg = np.mean(hist.history['acc'])
print('The Average Testing Accuracy is', avg)
##Evaluate the model
score=model.evaluate(x_test, y_test, verbose=2)
print(score)
There is nothing wrong with your model, but it's possible that your model doesn't learn anything useful. It could be that you are using a learning too high or too small, that you need more epochs, or that simply your features are not useful.
Here are some advices :
You can directly add a validation set to your fit method, which will compute the same metrics on this set at the end of each epoch and will allow you to see if your model learn something useful or if it's just overfitting on the training set without having to wait for the model to finish its training. (make sure you use verbose = 1 or 2 to see the training process).
model.fit( ... , validation_data = (x_test , y_test) , ...)
I see that you used the history callback. A good practice is to see how the accuracy is changing from an epoch to another instead of taking the mean. This allows you to see if your network is effectively learning something. A network rarely converge on the firsts epochs.
Do you have an idea of the 'usefulness' of your feature ? You can get an idea of that by performing an exploratory analysis before creating your model or by fitting a more 'conventional' model (linear regression, decision trees, random forest ...). It's Highly recommended before fitting a neural network, and this also allows you to compare different types of models and to see if you realy need to use neural networks.
If you are sure that your features would at least perform better than a random guess, try playing with the learning rate. A high learnign rate could cause the model to overshoot the minimum, and a learning rate too small could cause the model to learn very slowly or to get stuck in a local minima. You could also try to tune the number of epochs.
I am trying to understand how LSTM RNNs work and how they can be implemented in Keras in order to be able to solve a binary classification problem. My code and the dataset i use are visible below. When i compilr the code i get an error TypeError: __init__() got multiple values for keyword argument 'input_dim', Can anybody help?
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers.embeddings import Embedding
from keras.layers import Dense
from sklearn.cross_validation import train_test_split
import numpy
from sklearn.preprocessing import StandardScaler # data normalization
seed = 7
numpy.random.seed(seed)
dataset = numpy.loadtxt("sorted output.csv", delimiter=",")
X = dataset[:,0:4]
scaler = StandardScaler(copy=True, with_mean=True, with_std=True ) #data normalization
X = scaler.fit_transform(X) #data normalization
Y = dataset[:4]
# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed)
# create model
model = Sequential()
model.add(Embedding(12,input_dim=4,init='uniform',activation='relu'))
model.add(Dense(4, init='uniform', activation='relu'))
model.add(LSTM(100))
model.add(Dense(1, init='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test,y_test), nb_epoch=150, batch_size=10)
Looks like two separate questions here.
Regarding how to use LSTMs / Keras, there are some good tutorials around. Try this one which also describes a binary classification problem. If you have a specific issue or area that you don't understand, let me know.
Regarding the file opening issue, perhaps the whitespace in the filename is causing an issue. Check out this answer to see if it helps.
This is in fact a case where the error message you are getting is perfectly to-the-point. (I wish this would always be the case with Python and Keras...)
Keras' Embedding layer constructor has this signature:
keras.layers.embeddings.Embedding(input_dim, output_dim, ...)
However, you are constructing it using:
Embedding(12,input_dim=4,...)
So figure out which is the input and output dimension, respectively, and fix your parameter order and names. Based on the table you included in the question, I'm guessing 4 is your input dimension and 12 is your output dimension; then it'd be Embedding(input_dim=4, output_dim=12, ...).