How can I use Long Short-term Memory (LSTM) to predict a future value x(t+1) (out of sample prediction) based on a historical dataset. I read and tried many web tutorials for forecasting and prediction using lstm, but still far away from the point. What's the exact procedure to do this prediction? Is it just as simple as shifting the target array (n)steps where n is the number of future predicts and do the prediction operation? or there's another techniques?
please help or leave a suggestion.
Can you provide the framework you are using? tensorflow? pytorch? which web tutorials specifically?
Assuming you are going tensorflow, you can copy and paste code from one of these, test that it works on the provided dataset, then modify the input encoding functions to fit your dataset, then run on your dataset.
https://github.com/llSourcell/How-to-Predict-Stock-Prices-Easily-Demo (best)
https://github.com/sebastianheinz/stockprediction
https://github.com/talolard/MarketVectors/blob/master/preparedata.ipynb (you will have to replace fc layers with lstm, and fiddle with inputs)
In general procedure is something like (assuming tensorflow):
Download Dataset
Create a function to load batches of data
Create a function to encode batch of data (normalization, other transforms)
Create LSTM layer to recieve series of inputs.
Create output layer (usually fully connected) to take last lstm state and predict output of your desired size.
Create a tf session to wire everything together, and hit run.
Some questions to ask conceptually about which network use:
How many inputs to how many outputs - see this excellent http://cs231n.stanford.edu/slides/2016/winter1516_lecture10.pdf by Karpathy
How far back do you consider the stock prices eg {t-100... t} or {t-10 ...t} which may dictate size of hidden layers.
What other information do you think is relevant to the model? does stock A influence stock B? in which case you may have 2 lstms outputing a state to your fully connected layer...
Related
I'm new to machine learning, and I've been given a task where I'm asked to extract features from a data set with continuous data using representation learning (for example a stacked autoencoder).
Then I'm to combine these extracted features with the original features of the dataset and then use a feature selection technique to determine my final set of features that goes into my prediction model.
Could anyone point me to some resources or demos or sample code of how I could get started on this? I'm very confused on where to begin on this and would love some advice!
Okay, say you have an input of (1000 instances and 30 features). What I would do based on what you told us is:
Train an autoencoder, a neural network that compresses the input and then decompresses it, which has as a target your original input. The compressed representation lies in the latent space and encapsulates information about the input which is not directly accessible by humans. Now you may find such networks in tensorflow or pytorch. Tensorflow is easier and more straightforward so it could be better for you. I will provide this link (https://keras.io/examples/generative/vae/) for a variational autoencoder that may do the job for you. This has Conv2D layers so it performs really well for image data, but you can play around with the architecture. I cannot tell u more because you did not provide more info for your dataset. However, the important thing is the following:
After your autoencoder is trained properly and you need to make sure of it, (it adequately reconstructs the input) then you need to extract the aforementioned latent inputs (you will find more in the link). Now, that will be let's say 16 numbers but you can play with it. These 16 numbers were built to preserve info regarding your input. You said you wanted to combine these numbers with your input so might as well do that and end up with 46 input features. Now the feature selection part has to do with selecting the input features that are more useful for your model. That is not very interesting, you may find more information (https://towardsdatascience.com/feature-selection-techniques-in-machine-learning-with-python-f24e7da3f36e) and one way to select features is by training many models with different feature subsets. Remember, techniques such as PCA are for feature extraction not selection. I cannot provide any demo that does the whole thing but there are sources that can help. Remember, your autoencoder is supposed to return 16 numbers for each training example. Your autoencoder is trained only on your train data, with your train data as targets.
I am working on LSTM RNN on stock prediction sample data. It seems the RNN is not giving more weights to
recent data.The weights is equally shared in different time steps in RNN. Do we have an option to increase the weight for recent data? (With any parameters in LSTM OR RNN).
Can you correct me or give some more inputs on this.
Thanks in advance.
This is why most time-series models now have an attention mechanism. As the attention mechanism is better at learning the relevant time steps. It is also why some people are now using the transformer. RNN/LSTM don't learn long range dependencies well. For instance the DA-RNN paper states
In the first stage, we introduce an in- put attention mechanism to adaptively extract rel- evant driving series (a.k.a., input features) at each time step by referring to the previous encoder hid- den state. In the second stage, we use a temporal attention mechanism to select relevant encoder hid- den states across all time steps.
The key word being across all time steps. You can find implementations of several attention/transformer based models here (disclaimer I'm the maintainer of this framework).
I'm building a Sequential NN model in Keras for binary classification. The training data has about 600,000 rows and 2,000 features, so every epoch and every layer is very time consuming. I believe many of the features are not relevant to the model, and can be dropped altogether, to make the model thinner, so it it would be faster to work with.
I run a simple model with one hidden-layer of 200 neurons. How can I tell which of the features (which are actually the nodes in the input layer) are meaningless, so I could drop them from the data set and re run the model without them?
There is a very big topic in machine learning called feature selection. Though, neural networks are considered to automatically choose the best features for the problem, to an extent, by using their weights, to either consider more or less some of them. Neural networks also need a lot of experience to be tuned correctly. I would definitely suggest you to increase the layers of the network, because you have a lot of data and features and use l1 regularisation, in order to get sparse weights and exclude most of the features. Also, these information are indicative, since I do not know anything about your dataset and your network architecture. At last, I would suggest you to study more about the basics of machine learning and then continue learning about neural networks, before practicing with real data.
Is there a way to learn unsupervised features from set of images. Similar to word2vec or doc2vec, where neural network is learnt and given new document we get its features.
Expecting similar to this example shows that it can load learnt nn-model and predict features for new images.
Is there any simple example how to implement cnn over images and get their features back will help !!
Suppose in this example
If I want to get cnn features for all X_train and X_test ... is there any way?
Also, if we can get weights per layer per image, we can stack them and use as features. In that case is there a way to get the same.
Using those features for unsupervised task would be easier, if we consider them as vectors.
If I correctly understood your question, this task is quite common in a deep learning field. In case of images what I consider the best is a convolutional autoencoder. You may read about this architecture e.g. here
http://people.idsia.ch/~ciresan/data/icann2011.pdf
Previous version of Keras supported this architecture as one of core layers, though from version 1.0 I noticed that it disappeared from documentation. But - it's still quite easy to build it from a scratch :)
In noimage cases there are also another approaches like e.g. Restricted Boltzmann Machines.
UPDATE :
When it comes to what sort of activations are the best for obtaining new features from neural network activations - from my personal experience - it depends on the size of the net which you use. If you use a network which last layer is wide (has a lot of nodes) it might be useful to get only last layer (due to number of parameters if you want to consider also previous layers - it may harm the performance of learning). But - if (like in case of some MNIST networks) your last layer is not sufficient for this task - you may try using also previous layers activation or even all net activity. To be honest - I'm not expecting much of improvement in this case - but you may try. I think that you should use both approaches - starting from taking only last layer activations - and then trying to check the behaviour of your new classifier when you add activations from previous layers.
What I will strongly advise to you is also getting some insights from what sort of features network is learning - using T-SNE embeddings of it activations. In many cases I found it useful - e.g. checking if the size of a layer is sufficient. Using T-SNE you may check if the features obtained from last layer are good discriminators of your classes. It may also give you good insights about your data and what neural networks are really learning (alongside with amazing visualizations :) )
I have installed the tensorflow and follow the tutorial here
https://www.tensorflow.org/versions/0.6.0/tutorials/mnist/tf/index.html#tensorflow-mechanics-101
and build it successfully, I can get the evaluation result for the same size dataset, like 1000X784 for training set, and 1000X784 for testing set.
but what if i want to test one data, 1X784, and find out what's the output, using the algorithm trained above.
I am now to tensorflow, and new to Machine Learning, I hope that I have described my self.
It's not clear to me which part you're having trouble with, but I think what you're asking is how to use batch size 1000 for training, but only predict on a single input. I assume you already know how to predict on batches of size 1000.
If the first dimension of your model's input placeholder, which is usually the batch size, is set to be None, the size is inferred when you provide an input. So, if you change the 1000 to be None, you should then be able to pass an input of size 1 by 784 to make predictions.
The solution that you found to feed a 1*784 is a great solution to just get a quick feed back , however in bigger networks which they need a lot of time (around hours) for training your solution is not feasible.
Tensorflow they have a new feature it's name is Tensorflow serving which you give it a train model then you interact with your model as a client.
Here is their website for more information : https://github.com/tensorflow/serving