I've been trying to understand LSTM inputs for a while now and I think I understand but I keep getting confused on how to implement them.
This is what I think, please correct me if I am wrong.
When specifying an LSTM, you specify the number of cells and the input shape (I've been having issues with the input shape). The Number of cells specifies how many cells should look at the data given and does not affect the required input shape. The input shape (When Stateful) goes by batch size, Timesteps in a batch, and features in a time step. A stateful LSTM retains it's Internal States until reset. Is this right?
If so I'm having so much confusion trying to specify the input shape for my network. This is because I'm trying to upgrade a current network and I cant figure out how and where to specify the input shape without an error.
The way I'm trying to upgrade it is initially I have a CNN going to a dense layer. I'm trying to change it such that it adds an LSTM that takes the CNN's flattened 1D output as one batch and one time step with features dependent on the size of the CNN's Output. Then concatenates its output with the CNN's output (The LSTM's Input) then feeds into the dense layer. Thus it now behaves like LSTM with a skip connection. The issue that I cant seem to understand is when and how to specify the LSTM layer's Input_shape as it has NO INPUT_SHAPE parameter for the functional API? Or maybe I'm just super confused, Everyone uses different API's going over different examples and it gets super confusing what is and isn't specified and how.
Thank you, even if you just help with one of the two parts.
TLDR:
Do I understand LSTM Parameters correctly?
How and when do I specify LSTM Input_shapes if at all?
LSTM units argument means dimensions of LSTM matrices and output shape.
With Functional API you can specify input shape for the very first layer only. If your LSTM layer follows CNN - then its input shape is determined automatically as CNN output.
Related
Im new to keras and i was wondering if I could do some work regarding text classification using neural networks.
So i went ahead and got a data set regarding spam or ham and I vectorized the data using tfidf and converted the labels to a numpy array using the to_categorical() and managed to split my data into train and test each of which is a numpy array having around 7k columns.
This the code i used.
model.add(Dense(8,input_dim=7082,activation='relu'))
model.add(Dense(8,input_dim=7082))
model.add(Dense(2,activation='softmax'))
model.compile(loss="categorical_crossentropy",optimizer="adam",metrics=['accuracy'])
I dont know if im doing something totally wrong. Could someone point me in the right direction as to what i should change.
The error thrown:
Error when checking input: expected dense_35_input to have 2 dimensions, but got array with shape ()
Dense layers doesn't seem to have any input_dim parameter according to the documentation.
input_shape is a tuple and must be used in the first layer of your model. It refers to the shape of the input data.
units refers to the dimension of the output space, that is the shape of each output element processed by the dense layer.
In your case, if your input data has is of dimensionality 7082, this should work:
model.add(Dense(8,input_shape=(7082,),activation='relu'))
model.add(Dense(2,activation='softmax'))
model.compile(loss="categorical_crossentropy",optimizer="adam",metrics=['accuracy'])
Let's assume i want to make the following layer in a neural network: Instead of having a square convolutional filter that moves over some image, I want the shape of the filter to be some other shape, say a rectangle, circle, triangle, etc (this is of course a silly example; the real case I have in mind is something different). How would I implement such a layer in TensorFlow?
I found that one can define custom layers in Keras by extending tf.keras.layers.Layer, but the documentation is quite limited without many examples. A python implementation of a convolutional layer by for example extending the tf.keras.layer.Layer would probably help as well, but it seems that the convolutional layers are implemented in C. Does this mean that I have to implement my custom layer in C to get any reasonable speed or would Python TensorFlow operations be enough?
Edit: Perhaps it is enough if I can just define a tensor of weights, but where I can customize entries in the tensor that are identically zero and some weights showing up in multiple places in this tensor, then I should be able to by hand build a convolutional layer and other layers. How would I do this, and also include these variables in training?
Edit2: Let me add some more clarifications. We can take the example of building a 5x5 convolutional layer with one output channel from scratch. If the input is say 10x10 (plus padding so output is also 10x10)), I would imagine doing this by creating a matrix of size 100x100. Then I would fill in the 25 weights in the correct locations in this matrix (so some entries are zero, and some entries are equal, ie all 25 weights will show up in many locations in this matrix). I then multiply the input with this matrix to get an output. So my question would be twofold: 1. How do I do this in TensorFlow? 2. Would this be very inefficient and is some other approach recommended (assuming that I want to later customize what this filter looks like and thus the standard conv2d is not good enough).
Edit3: It seems doable by using sparse tensors and assigning values via a previously defined tf.Variable. However I don't know if this approach will suffer from performance issues.
Just use regular conv. layers with square filters, and zero out some values after each weight update:
g = tf.get_default_graph()
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
conv1_filter = g.get_tensor_by_name('conv1:0')
sess.run(tf.assign(conv1_filter, tf.multiply(conv1_filter, my_mask)))
where my_mask is a binary tensor (of the same shape and type as your filters) that matches the desired pattern.
EDIT: if you're not familiar with tensorflow, you might get confused about using the code above. I recommend looking at this example, and specifically at the way the model is constructed (if you do it like this you can access first layer filters as 'conv1/weights'). Also, I recommend switching to PyTorch :)
I want to extract four intermediate layers of my model. Here is how I setup my function:
K.function([yolo_model.layers[0].input, yolo_model.layers[4].input, K.learning_phase()],
[yolo_model.layers[1].layers[17].output,
yolo_model.layers[1].layers[27].output,
yolo_model.layers[1].layers[43].output,
yolo_model.layers[1].layers[69].output])
I always got the error that tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'input_3' with dtype float and shape [?,416,416,3]
It seems my input has dimension or type error but I used this input for my model train_on_batch or predict and it worked. I got the same error even when I passed np.zeros((1,416,416,3)) into it. Another wired thing is I don't have input_3 since my model only takes two inputs. I have no idea where input_3 tensor comes from.
Thanks in advance if anyone can give me some hints.
I found out where the problem is. My model is composed of one inner model and several layers. When I build the function, my input is from outer model but output is from inner model, which causes disconnection between input and output. I just changed the original input to the input layer of inner model and it works.
I am using the tf.nn.dynamic_rnn class to create an LSTM. I have trained this model on some data, and now I want to inspect what are the values of the hidden states of this trained LSTM at each time step when I provide it some input.
After some digging around on SO and on TensorFlow's GitHub page, I saw that some people mentioned that I should write my own LSTM cell that returns whatever I want printed as part of the output of the LSTM. However, this does not seem straight forward to me since the hidden states and the output of the LSTM do not have the same shapes.
My output tensor from the LSTM has shape [16, 1] and the hidden state is a tensor of shape [16, 16]. Concatenating them results in a tensor of shape [16, 17]. When I tried to return it, I get an error saying that some TensorFlow op required a tensor of shape [16,1].
Does anyone know an easier work around to this situation? I was wondering if it is possible to use tf.Print to just print the required tensors.
Okay, so the issue was that I was modifying the output but wasn't updating the output_size of the LSTM itself. Hence the error. It works perfectly fine now. However, I still find this method to be extremely annoying. Not accepting my own answer with the hope that somebody will have a cleaner solution.
I'm just starting with deep learning, and I've been told that Keras would be the best library for beginners.
Before that, for the sake of learning, I built a simple feed forward network using only numpy so I could get the feel of it.
In this case, the shape of the weight matrix was (len(X[0]), num_neurons). The number of features and the number of neurons. And it worked.
Now, I'm trying to build a simple RNN using Keras. My data has 7 features and the size of the layer would be 128.
But if I do something like model.add(Dense(128, input_dim=(7, 128)))it says it's wrong.
So I have no idea what this input_dim should be.
My data has 5330 data points and 7 features (shape is (5330, 7)).
Can someone tell me what the input_dim should be and why?
Thank you.
The input_dim is just the shape of the input you pass to this layer. So:
input_dim = 7
There are other options, such as:
input_shape=(7,) -- This argument uses tuples instead of integers, good when your input has more than one dimension
batch_input_shape=(batch_size,7) -- This is not usually necessary, but you use it in cases you need a fixed batch size (there are a few layer configurations that demand that)
Now, the size of the output in a Dense layer is the units argument. Which is 128 in your case and should be equal to num_neurons.