How does TensorFlow train RNNs? - python

I just read through and ran the code here: https://tomaxent.com/2017/04/26/LSTM-by-Example-using-Tensorflow-Text-Generate/
(this guy rips off the following medium.com article, but I can't access medium.com from my work computer): https://medium.com/towards-data-science/lstm-by-example-using-tensorflow-feb0c1968537
From my previous reading, it is my understanding that to train RNNs, we have to 'unwrap' them into feed-forward networks (FFNs) for a certain number of steps (along with an extra input for the "x at time t"), and set it up so that all the weights in the FFN that correspond to a single weight in the RNN are equal.
I'm looking at the code and I don't see any 'unwrapping' step, or even a variable indicating the number of steps for which we want to unwrap.
Is there another way to train an RNN? Am I just missing the line in the code where that variable is defined?

If I am not mistaken, there is no separate 'unwrapping' step. We generally "unroll" an RNN in order to understand how it works, one time step at a time.
Now, coming to the TensorFlow implementation: I found the repo MuhammedBuyukkinaci/TensorFlow-Text-Generator to be very useful, and it should clear up most of your doubts.
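For reference, in code along the lines of that tutorial the unrolling happens when the graph is built: with the TF 1.x static_rnn API, the number of time steps (n_input in the linked tutorial, if I remember it correctly) fixes how many copies of the cell are added to the graph, all sharing one set of weight variables, while dynamic_rnn achieves the same with a symbolic loop instead of explicit copies. A minimal sketch, assuming TensorFlow 1.x (layer sizes and vocabulary size are illustrative):
import tensorflow as tf
from tensorflow.contrib import rnn  # TF 1.x API

n_input = 3       # number of time steps the network is unrolled for
n_hidden = 512    # LSTM units (illustrative)
vocab_size = 112  # illustrative vocabulary size

x = tf.placeholder(tf.float32, [None, n_input, 1])

# Split [batch, n_input, 1] into a list of n_input tensors, one per time step;
# this list is the "unrolled" input sequence.
x_steps = tf.unstack(x, n_input, axis=1)

lstm_cell = rnn.BasicLSTMCell(n_hidden)

# static_rnn wires n_input copies of the cell into the graph; every copy reuses
# the same weight variables, which is exactly the weight-tying you described.
outputs, states = rnn.static_rnn(lstm_cell, x_steps, dtype=tf.float32)

# Only the last step's output feeds the layer that predicts the next word.
W = tf.Variable(tf.random_normal([n_hidden, vocab_size]))
b = tf.Variable(tf.random_normal([vocab_size]))
logits = tf.matmul(outputs[-1], W) + b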
Other Useful Links:
Tensorflow-RNN
Basic_Rnn_Cell
Static_RNN Cell

Related

Neural Network Regression - Considering a dynamic state

I am using Tensorflow to solve a regression problem with known dynamic components, that is, we are aware that the (singular) label at time t depends on some dynamic state of the environment, but this feature is unknown. The initial attempt to solve the problem via simple regression has, understandably, failed, confirming our assumption that there is some kind of dynamic influence by a feature we have no access to.
However, the state of the environment at time t should be reflected somewhere in the features and labels (and in particular their interplay) known at times t0-n, where n > 0. Unfortunately, because of the nature of the problem, the output at time t heavily depends on the input at time t, about as much as it depends on the dynamic state of the environment. I am worried that this renders the approach I wanted to try ineffective - time series forecasting, in my understanding, would consider features from previous timesteps, but no inputs on the current timestep. Additionally, I know labels from previous timesteps, but not at the time at which I want to make my prediction.
Here is a table to illustrate the problem:
t        input        output
0        x(t=0)       y(t=0)
...      ...          ...
t0-1     x(t=t0-1)    y(t=t0-1)
t0       x(t=t0)      y(t=t0)=?
How can I use all the information at my disposal to predict the value of y(t=t0), using x(t=t0) (where x is the array of input features) and a defined window of features and labels at previous timesteps?
Is there an established method for solving a problem like this, either using a neural net or perhaps even a different model?
Does this problem require a combination of methods, and if so, which ones might be suitable for tackling it?
The final model is meant to be deployed and continue working for future time windows as well. We know the size of the relevant time window to be roughly 100 time steps into the past.
The kind of problem I have described is, as I have since learned, linked to so-called exogenous variables. In my case, I require something called NNARX, which is similar to the ARMAX model at its core, but (as a neural net) can take non-linearity into account.
The general idea is to introduce an LSTM layer that acts as an encoder for the historical input; its output is then combined with another input layer carrying the exogenous variables. Both are joined at the so-called decoder, i.e. the rest of the NN architecture.
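A minimal Keras sketch of that layout, assuming a history window of 100 steps as mentioned above (the feature counts and layer sizes are illustrative, not taken from the actual problem):
from tensorflow import keras
from tensorflow.keras import layers

window = 100   # relevant history length, as stated above
n_feat = 8     # features (and past labels) per historical step -- illustrative
n_exo = 8      # features known at prediction time t0 -- illustrative

# Encoder: an LSTM summarizes the historical window into a fixed-size vector.
hist_in = keras.Input(shape=(window, n_feat), name="history")
encoded = layers.LSTM(64)(hist_in)

# Exogenous input: the features available at time t0 itself.
exo_in = keras.Input(shape=(n_exo,), name="current_features")

# "Decoder": the rest of the network, fed by both representations.
merged = layers.concatenate([encoded, exo_in])
hidden = layers.Dense(64, activation="relu")(merged)
y_hat = layers.Dense(1, name="y_t0")(hidden)

model = keras.Model(inputs=[hist_in, exo_in], outputs=y_hat)
model.compile(optimizer="adam", loss="mse")
The encoder output and the current features are concatenated, so the prediction for y(t=t0) can draw on both the inferred dynamic state and x(t=t0).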

Keras - how can LSTM for time series be so accurate?

So I'm starting to test LSTMs for time series prediction, and I've found a few different notebooks to use with my own data (here's one example).
What they all have in common is that they predict one timestep into the future, and do a really good job of matching the test data. I tried forcing an outlier in there, and the prediction almost perfectly matched it.
What's going on here? There's no way the model can learn this from the pattern of the data since it's a made up point, but supposedly by looking at the previous time steps this model will "know" an outlier is coming next? I must be missing something, because it predicts the data with an outlier just as well as the data without an outlier...
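One common way to sanity-check results like this is to compare the model against a naive persistence baseline, i.e. predicting that the next value equals the current one; a one-step-ahead prediction that appears to track an injected outlier is often doing little more than echoing the previous point. A small sketch with made-up data (the series and the outlier position are arbitrary):
import numpy as np

series = np.sin(np.linspace(0, 20, 200))  # stand-in for the test series
series[150] = 5.0                         # the "forced" outlier

# Persistence baseline: predict y(t+1) to be y(t).
persistence_pred = series[:-1]
target = series[1:]

mae = np.mean(np.abs(persistence_pred - target))
print("persistence MAE: %.4f" % mae)
# Plotted against the target, this shifted copy also appears to "catch" the
# outlier (one step late); compare your LSTM's curve against this baseline.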

How do I feed data into my neural network?

I've coded a simple neural network for XOR in python. While there is loads of information online about how to program this, there isn't much on how to feed the data through it. I've tested the change in weights after one cycle for inputs [1,1] to compare my results with my lecture slides and it's 100% the same, so I believe the code works. I can train the network for that same input, but when I change the input (and corresponding target) every cycle the error doesn't go down.
Should I keep changing the weights and inputs after every cycle, or should I run through all the possible inputs first, get an average error, and then change the weights? (But the weight changes depend on the output, so which output would I use then?)
I can share my code, if needed, but I'm pretty certain it's correct.
Please give me some advice? Thank you in advance.
So, you're saying you implemented a neural network on your own?
Well, in that case: each neuron in the input layer gets assigned one feature of a given row; then you iterate through each layer and each neuron in that layer and compute as instructed.
I'm sure you are familiar with the back-propagation algorithm, so you'll know when to stop.
Once you're done with that row, do the same for the next row: assign its features to the input neurons and run the iterations again.
Once you're done with all records, that's an epoch.
I hope that answers your question.
Also, I would recommend trying out Keras; it's easy to use and a good tool to get experience with.
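To make the per-row (online) update concrete, here is a minimal NumPy sketch. It is not your code, just one standard layout with 4 hidden units, sigmoid activations and a fixed learning rate; it usually converges on XOR, though an unlucky initialization can occasionally get stuck:
import numpy as np

# XOR training set: each cycle feeds one row into the input neurons.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))   # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(5000):           # one epoch = one pass over all four rows
    for xi, yi in zip(X, y):        # weights are updated after every single row
        h = sigmoid(xi @ W1 + b1)             # forward pass, hidden layer
        out = sigmoid(h @ W2 + b2)            # forward pass, output
        d_out = (out - yi) * out * (1 - out)  # backprop: output delta
        d_h = (d_out @ W2.T) * h * (1 - h)    # backprop: hidden deltas
        W2 -= lr * np.outer(h, d_out)
        b2 -= lr * d_out
        W1 -= lr * np.outer(xi, d_h)
        b1 -= lr * d_h

print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))  # close to [0, 1, 1, 0]
For the batch alternative you asked about, you would sum (or average) the gradient contributions from all four rows within one epoch and apply a single weight update afterwards, rather than averaging the raw errors.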

What does the optional argument "constants" do in the Keras recurrent layers?

I'm learning to build a customized sequence-to-sequence model with Keras, and have been reading code that other people wrote, for example here. I got confused by the constants in the call method. There is the Keras "Note on passing external constants to RNNs", but I'm having trouble understanding what the constants do to the model.
I did go through the attention model and the pointer network papers, but maybe I've missed something.
Any reference to the modeling details would be appreciated! Thanks in advance.
Okay, just as a reference in case someone else stumbles across this question: I went through the code in the recurrent.py file. I think get_constants is fetching the dropout mask and the recurrent dropout mask, then concatenating them with the [h, c] states (the order of these four elements is required in the LSTM step method). After that it doesn't matter anymore to the original LSTM cell, but you can add your own 'constants' (in the sense that they won't be learned) to pass from one timestep to the next. All constants are appended to the returned [h, c] states implicitly. In Keon's example, the fifth position of the returned state is the input sequence, and it can be referenced at every timestep via states[-1].
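For anyone who wants to see the same mechanism through the public Keras API rather than the recurrent.py internals: you pass constants through the RNN wrapper's call, and if your cell's call signature accepts a constants argument, the same tensors (never learned or updated) are handed to it at every time step. A stripped-down sketch with tf.keras; the cell and the way it uses the constant are made up for illustration:
import tensorflow as tf
from tensorflow.keras import layers

class ConstantAwareCell(layers.Layer):
    # Minimal RNN cell whose step function also sees a per-sequence constant.
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.state_size = units

    def build(self, input_shape):
        if isinstance(input_shape, list):   # [step_input_shape, constant_shape]
            input_shape = input_shape[0]
        self.kernel = self.add_weight(shape=(input_shape[-1], self.units),
                                      initializer="glorot_uniform", name="kernel")
        self.recurrent_kernel = self.add_weight(shape=(self.units, self.units),
                                                initializer="orthogonal",
                                                name="recurrent_kernel")

    def call(self, inputs, states, constants=None):
        prev = states[0]
        context = constants[0]  # same tensor at every time step, never updated
        h = tf.tanh(tf.matmul(inputs, self.kernel)
                    + tf.matmul(prev, self.recurrent_kernel)
                    + context)
        return h, [h]

units = 16
seq_in = layers.Input(shape=(10, 8))      # ordinary per-time-step input
const_in = layers.Input(shape=(units,))   # e.g. an encoder summary to attend over

out = layers.RNN(ConstantAwareCell(units))(seq_in, constants=const_in)
model = tf.keras.Model([seq_in, const_in], out)
This is essentially what attention-style decoders use the mechanism for: the encoded source sequence is passed as a constant so that every decoding step can look at it.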

Tensorflow: how do I extract/export variable values at every iteration of training?

I have been playing around with some neural networks on Tensorflow and I wanted to make a visualization of the neural network's learning process.
To do so, I intend to extract the following variables into text/JSON/CSV: the pre-activation result before the 1st layer, and the activation, bias and weight values, for testing and training, for each layer and for all time steps. I am looking for a generalizable solution so that I don't have to modify my source code (or at least not more than one or two lines) when applying visualization to future networks. Ideally I could run some function from another Python program to read any Python/TF code and extract the variables described above. So far I have considered the following solutions:
1) Use tf.summary and the FileWriter to save a serialized protocol buffer, then find a way to go from protocol buffer to JSON format. Unfortunately this would not fit the bill, as it requires me to modify too much inner code.
2) Perhaps use https://www.tensorflow.org/api_docs/python/tf/train/export_meta_graph, although I am not sure how to implement it, given that my TF foundations are not quite there yet.
3) I have also found this solution:
W_val, b_val= sess.run([W, b])
np.savetxt("W1.csv", W_val, delimiter=",")
np.savetxt("b1.csv", b_val, delimiter=",")
But the problem is that it only saves the final values of the weights and biases, whereas I am looking to save their values at all timesteps of training.
If anyone has any suggestions on how to tackle this problem or any guidance I would appreciate it.
Many thanks
for step in range(num_train_steps):
    _, weight_values, bias_values = sess.run([your_train_op, weight, bias])
    # save weight_values and bias_values, e.g. one file per step as in your option 3
    np.savetxt("W1_step%d.csv" % step, weight_values, delimiter=",")
    np.savetxt("b1_step%d.csv" % step, bias_values, delimiter=",")
Doing it with tf.summary is probably a good idea. You could then visualize it all in TensorBoard, much like some of the tutorials and the Inception retraining code do.
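If you go the summaries route, a rough sketch (TF 1.x; weight, bias, sess, your_train_op and num_train_steps stand in for your own objects) could look like this:
import tensorflow as tf

# Histogram summaries for the tensors you want to track over training.
tf.summary.histogram("weights", weight)
tf.summary.histogram("biases", bias)
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter("logs", sess.graph)

for step in range(num_train_steps):
    _, summary = sess.run([your_train_op, merged])
    writer.add_summary(summary, step)
# The resulting events file can be browsed in TensorBoard, or read back with
# tf.train.summary_iterator if you want to export the raw values yourself.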
Alternatively you could perform fetches within your sess.run() call to grab whatever tensors you like at every step (i.e. every run call).
I have pasted below a response I gave to a similar question about extracting the cross entropy:
When you do your session run call (e.g. res = sess.run(...)), you can put in a fetch for your cross entropy variable.
For example, let's say you have a complicated sess.run() call that gets some predictions, but you also want your cross entropy; then you may have code that looks like this:
feeds = {x_data: x, y_data: y}
fetches = [y_result, cross_entropy]
res = sess.run(fetches=fetches, feed_dict=feeds)
predictions = res[0]  # your first fetch parameter
xent = res[1]         # your second fetch parameter
Fetches within the run call allow you to "fetch" tensors from your graph.
You should be able to do the above, but with a list of whatever you want instead of cross entropy. I use it to fetch both my summaries and intermediate accuracy values.
