I am currently building a neural network to predict features such as temperature. So the output for this could be a positive or negative value. I am normalizing my input data and using the tanh activation function in each hidden layer.
Should I use a linear activation function for the output layer to get an unbounded continuous output, OR should I use tanh for the output layer and then inverse-normalize the output? Could someone explain this? I don't think my understanding of it is correct.
You are actually headed in the right direction.
Option 1:
Normalize the temperatures first and then feed them to the model. Say your temperature ranges over [-100, 100]; convert it to the scale [-1, 1] and use this scaled version as your target variable.
At prediction time, just inverse-transform the output and you will get your desired result.
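For example, here is a minimal sketch of Option 1 (the temperature values and variable names are just placeholders), using scikit-learn's MinMaxScaler to map the targets into [-1, 1] and back:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# placeholder training targets: temperatures in the range [-100, 100]
y_raw = np.array([[-73.0], [-5.0], [12.0], [38.0], [99.0]])

scaler = MinMaxScaler(feature_range=(-1, 1))
y_scaled = scaler.fit_transform(y_raw)   # train the tanh-output network on y_scaled

# ... fit the model on (X, y_scaled) ...

y_pred_scaled = np.array([[0.25]])       # example model output in [-1, 1]
y_pred = scaler.inverse_transform(y_pred_scaled)   # back to real temperature units
print(y_pred)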
Option 2:
Build a regression-style neural network and don't apply any activation function to the output layer (i.e. there are no bounds on the values; they can be positive or negative).
In this case you are not required to normalize the target values.
Sample NN spec:
Input layer ==> one neuron per feature
Hidden layers ==> relu/selu as the activation function; the number of neurons/hidden layers is up to you
Output layer ==> 1 neuron, no activation function required (see the sketch below)
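A rough Keras sketch of the spec above (the feature count and hidden-layer sizes are assumptions for illustration, not part of the original answer):
import tensorflow as tf

n_features = 8  # placeholder: one input neuron per feature

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(n_features,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)  # no activation: the output can be positive or negative
])
model.compile(optimizer='adam', loss='mean_squared_error')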
I wrote a very basic TensorFlow model where I want to predict a number:
import tensorflow as tf
import numpy as np
def HW_numbers(x):
    y = (2 * x) + 1
    return y
x = np.array([1.0,2.0,3.0,4.0,5.0,6.0,7.0], dtype=float)
y = np.array(HW_numbers(x))
model = tf.keras.models.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(x, y, epochs=30)
print(model.predict([10.0]))
The above code works fine. But if I add an activation function to the Dense layer, the prediction becomes weird. I have tried 'relu', 'sigmoid', 'tanh', etc.
My question is, why is that? What exactly is the activation function doing in that single layer that messes up the prediction?
I am using TensorFlow 2.0.
Currently, you are learning a linear function. Since it can be described by a single neuron, a single neuron is all you need to learn it. An activation function, on the other hand, is there:
to learn and make sense of something really complicated: non-linear, complex functional mappings between the inputs and the response variable. It introduces non-linear properties to our network. Its main purpose is to convert the input signal of a node in an ANN into an output signal. That output signal is then used as an input in the next layer of the stack.
Hence, as you have just a single neuron here (a special case), you do not need to pass the value on to a next layer; in other words, the input, hidden, and output layers are all merged into one. So the activation function is not helpful in your case, unless you want to make a decision based on the output of the neuron.
Your network consists of just one neuron. So what it does with no activation function is multiply your input by the neuron's weight (plus a bias). This weight will eventually converge to something around 2.1.
But with relu as the activation function, only positive values are propagated through your network. So if your neuron's weight is initialized to a negative number, the output will always be zero, and so will the gradient, which means the weight can never recover. With relu you therefore have roughly a 50:50 chance of getting good results.
With the activation functions tanh and sigmoid, the output of the neuron is limited to [-1, 1] and [0, 1] respectively, so your output can never exceed one, while the targets go up to 15.
So for such a small neural network, these activation functions don't match the problem.
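To see the bounded-output effect concretely, here is a small sketch (reusing the question's data and hyperparameters) of the same single-neuron model with a tanh activation; since tanh can never output more than 1, it cannot fit y = 2x + 1:
import numpy as np
import tensorflow as tf

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], dtype=float)
y = 2 * x + 1

bounded = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units=1, activation='tanh', input_shape=[1])
])
bounded.compile(optimizer='sgd', loss='mean_squared_error')
bounded.fit(x, y, epochs=30, verbose=0)

# the true value is 21, but the tanh output is capped at 1
print(bounded.predict(np.array([10.0])))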
Python 3.7, TensorFlow
I am experimenting with Time series forecasting in TensorFlow.
I understand that the second line creates an LSTM RNN, i.e. a Recurrent Neural Network of type Long Short-Term Memory.
Why do we need to add a Dense(1) layer at the end?
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
The tutorial for Dense() says:
Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Could you rephrase or elaborate on the need for Dense() here?
The following line
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
creates an LSTM layer which transforms the input sequence (each step of size #features) into a latent representation of size 32. You want to predict a single value, so you need to convert this 32-dimensional latent representation into a single value. Hence, you add the following line
single_step_model.add(tf.keras.layers.Dense(1))
which adds a Dense layer (a fully-connected layer) with one output neuron, which produces a single value. Look at it as a way to transform an intermediate result of higher dimensionality into the final result.
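As a quick illustration of the shapes involved (the input dimensions below are made up, assuming x_train_single has shape (samples, timesteps, features)):
import numpy as np
import tensorflow as tf

x_train_single = np.random.rand(1000, 120, 3).astype("float32")  # dummy data

single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))

single_step_model.summary()
# LSTM output shape:  (None, 32)  -> one 32-dimensional latent vector per sample
# Dense output shape: (None, 1)   -> the single predicted value per sample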
Well, in the tutorial you are following (Time series forecasting), they are trying to forecast temperature (6 hours ahead), for which they are using an LSTM followed by a Dense layer.
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
A Dense layer is nothing but a regular fully-connected NN layer. In this case you are bringing the output dimensionality down to 1, which should correspond (not necessarily linearly) to the temperature you are trying to predict. There are other layers you could use as well; check out Keras Layers.
If you are confused about the input and output shapes of the LSTM, check out I/O Shape.
I have a model that I loaded using Keras. I need to be able to find individual feature maps (print the values of each feature map). I was able to print the weights. Following is my code:
for layer in model.layers:
    g = layer.get_config()
    h = layer.get_weights()
    print(g)
    print(h)
The model consists of one conv layer with 384 neurons in total: the first 128 have filter size 3, the next 128 have filter size 4, and the last 128 have filter size 5. Then there are relu and maxpool layers, and the result is fed into a softmax layer. I want to be able to find the outputs (the values, not the shapes) of the conv, relu, and maxpool layers. I have seen code online but I'm unable to work out how to map it to my situation.
If you are looking for a way to find the activation (i.e. feature map or output) of a layer given one or more input samples, you can simply define a backend function that takes the input array(s) and gives the activation(s) as its output. Here is an example for illustration (i.e. you may need to adapt it to your needs and your model architecture):
from keras import backend as K

# define a function to get the activations of all layers
outputs = [layer.output for layer in model.layers]
active_func = K.function([model.input], outputs)

# you can use it like this
activations = active_func([my_input_array])
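If you are on TF 2.x / tf.keras, one possible alternative (not from the original answer; my_input_array is again a placeholder for your input batch) is to build a new Model that exposes every intermediate layer output:
import tensorflow as tf

# a new model mapping the original input to the output of every layer
activation_model = tf.keras.Model(
    inputs=model.input,
    outputs=[layer.output for layer in model.layers]
)

activations = activation_model.predict(my_input_array)
for layer, act in zip(model.layers, activations):
    print(layer.name, act.shape)
    print(act)  # the actual feature-map values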
I have this doubt when I fit a neural network for a regression problem. I preprocessed the predictors (features) of my train and test data using the Imputer and scale utilities from sklearn.preprocessing, but I did not preprocess the class/target of my train or test data.
In the architecture of my neural network, all the layers have relu as the activation function except the last layer, which has the sigmoid function. I chose the sigmoid function for the last layer because the values of the predictions are between 0 and 1.
tl;dr: In summary, my question is: should I de-process the output of my neural net? If I don't use the sigmoid function, the output values can be < 0 or > 1. In this case, how should I do it?
Thanks
Usually, if you are doing regression you should use a 'linear' activation in the last layer. A sigmoid function will 'favor' values close to 0 and 1, so it would be harder for your model to output intermediate values.
If the distribution of your targets is Gaussian or uniform, I would go with a linear output layer. De-processing shouldn't be necessary unless you have very large targets.
I need to implement a custom cost function with coefficients that depend on the input values of the individual training samples. Is this possible with Keras?
Background for the problem: I'm trying to implement the fairly simple neural network topology shown here on the 3rd page (heavy link).
Update:
Simplified example: given two input neurons IN1 and IN2, some hidden neurons, and three output neurons OUT1, OUT2, and OUT3, calculate the cost function for each training entry separately as
COST = IN1 * OUT1 + IN2 * IN2 * OUT2 + OUT3
(in other words, I need to get access to all input and output values to use different combinations of them)
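One possible way to get that kind of access in tf.keras (TF 2.x) is to compute the per-sample cost inside the model graph and register it with model.add_loss; this is only a sketch under assumptions (the hidden-layer size and optimizer are arbitrary), not a confirmed answer from the thread:
import tensorflow as tf

inputs = tf.keras.Input(shape=(2,))            # IN1, IN2
hidden = tf.keras.layers.Dense(16, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(3)(hidden)     # OUT1, OUT2, OUT3

in1, in2 = inputs[:, 0], inputs[:, 1]
out1, out2, out3 = outputs[:, 0], outputs[:, 1], outputs[:, 2]

model = tf.keras.Model(inputs, outputs)
# COST = IN1 * OUT1 + IN2 * IN2 * OUT2 + OUT3, averaged over the batch
model.add_loss(tf.reduce_mean(in1 * out1 + in2 * in2 * out2 + out3))

model.compile(optimizer='adam')   # no separate loss needed; fit with just the inputs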