I have a basic question. I want to know whether it is possible to use Keras (e.g. the functional API) to specify a neural network model, and then use the Keras optimization routines to train the network outside of the Keras training process. In other words, use Keras purely to specify a neural network, pull the weights out and put them into a loss function (outside the Keras APIs if necessary) then use one of the built-in optimization routines purely to minimize the loss function over a single batch of data (no multiple epochs etc. or anything beyond minimization of the loss function with regard to one set of data at this point).
My reason for wanting to do this is that I would like to use a dynamic optimization process that changes from batch iteration to batch iteration, which seems difficult to implement entirely within the Keras APIs.
I have recently completed a "pretty good" TF2/keras model for image recog using a number of layers, SGD optimization AND starting with a MobileNetv2 pre-trained model.
I could tweak this forever: adding/removing layers, different optimization algorithms, learning rate, momentum, various dataset augmentations, etc. And I haven't even considered starting with other pre-trained models. I changed the optimizer from SGD to Adam (which should be better, right?) and it was slightly less accurate.
So, how do I converge on a better pre-trained model, parameters, values? Is it just trial-and-error? It takes about 45min to train my model (10 epochs), which seems forever when I'm tweaking so many variables.
I think I could write a python framework to plug in various training attributes and then just let it run for a couple days.
I don't know if this is a suitable SO question or not.
This problem is called hyperparameter tuning (or hyperparameter optimization). You can do this manually or by using a search technique such as grid search over all your parameters.
There are also more advanced techniques that use Bayesian optimization to automate this process.
One common and established tool for doing hyperparameter optimization in the ML community is called hyperopt.
https://github.com/hyperopt/hyperopt
Hyperopt is a Python library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions.
Also, since you tagged Keras in the question, there is a tool called auto keras which also searches for hyperparameters https://autokeras.com/
Auto-Keras provides functions to automatically search for architecture and hyperparameters of deep learning models.
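As a rough illustration of the grid search mentioned above, here is a minimal sketch in plain Python. The `train_and_evaluate` function is a hypothetical stand-in for a real training run (it is a toy formula, not a model), and the parameter names and values in `search_space` are assumptions chosen only for the example.

```python
import itertools


def train_and_evaluate(params):
    """Hypothetical stand-in for a real training run.

    A real version would build the model, train it, and return a
    validation score. This toy formula peaks at learning_rate=0.01,
    momentum=0.9 with the "sgd" optimizer.
    """
    lr_penalty = abs(params["learning_rate"] - 0.01) * 10
    momentum_penalty = abs(params["momentum"] - 0.9)
    optimizer_bonus = 0.01 if params["optimizer"] == "sgd" else 0.0
    return 0.9 - lr_penalty - momentum_penalty + optimizer_bonus


def grid_search(search_space, objective):
    """Evaluate every combination in search_space and keep the best one."""
    names = sorted(search_space)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Assumed search space, for illustration only.
search_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "momentum": [0.0, 0.5, 0.9],
    "optimizer": ["sgd", "adam"],
}
best_params, best_score = grid_search(search_space, train_and_evaluate)
print(best_params, best_score)
```

With a 45-minute training run, the 18 combinations above would already take over 13 hours, which is why random search or the Bayesian approaches in the tools above are often preferred once the search space grows.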
R's package 'forecast' has a function nnetar, which uses feed-forward neural networks with a single hidden layer to predict in time series.
Now I am using Python to do a similar analysis. I want to use a neural network that does not need to be as complex as deep learning. Maybe 2 layers and a couple of nodes are good enough for my case.
So, does Python have a simple neural network model that can be used for time series like nnetar? If not, how should I deal with this problem?
Any NN model that uses 1 or more hidden layers is a multi-layer perceptron model, and for that case it is trivial to make it extendable to N layers. So any library that you pick will support it. My guess is that you are avoiding a complex library like PyTorch/TensorFlow because of its size.
Tensorflow does have TF-Lite which can work for smaller IOT devices.
Sklearn does have MLPRegressor that can train NNs if that is more to your liking.
You can always write your own model. There are plenty of examples of this that use numpy and are plenty fast for CPU computation. (A single-hidden-layer NN, I am guessing, will be more memory bound than computation bound.)
Use another ML algorithm. Single-hidden-layer NNs will not perform nearly as well as other simpler algorithms.
If there are other reasons for not using a standard library like tensorflow/pytorch then you should mention them.
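Whichever library is chosen, the time series first has to be turned into a supervised problem, since nnetar-style models use lagged values of the series as inputs. A minimal sketch of that step in plain Python follows; `make_lagged` and `n_lags` are hypothetical names introduced here, not part of any library.

```python
def make_lagged(series, n_lags):
    """Turn a 1-D series into (X, y).

    Each row of X holds the n_lags values that precede the
    corresponding target in y.
    """
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t])
    return X, y


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, y = make_lagged(series, n_lags=2)
print(X)  # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
print(y)  # [3.0, 4.0, 5.0, 6.0]
```

The resulting `X` and `y` could then be passed to a regressor such as sklearn's MLPRegressor with a single small hidden layer. To forecast several steps ahead, each prediction would be appended to the series and fed back in as a new lag.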
I am trying to write some custom layers in Keras. The ultimate goal is that certain parameters (updated according to a fixed formula after each batch of data is optimized over in the training process) be passed to the loss function. I do not believe it is possible to use dynamic loss functions in Keras, but I should be able to pass these parameters to the loss function using multiple inputs and a custom layer.
I want to know whether it is possible to create a layer in Keras having parameters that are not trainable (and not optimized over at all in the training process), but instead updated according to a fixed formula at the end of each batch optimization in the training process.
The simplest example I can give: instead of optimizing a generic cost function (like cross-entropy), I want to optimize something proportional to the cross-entropy (c*cross_entropy). After one batch of data is processed in the training procedure, I want to set, for example, c = 1.2*c, and have this used as the c value for the next batch of data.
(This should be more or less useless in this case as a positive constant times the loss function shouldn't affect the minima but it's fairly close to what I actually need to do).
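To make the intended behaviour concrete, here is a framework-agnostic sketch in plain Python, not Keras code. It assumes a one-weight logistic model `p = sigmoid(w * x)` and hypothetical names (`train`, `growth`, `history`). The point is only that `c` is never touched by the optimizer and is updated by a fixed rule after each batch. In Keras, a similar effect is commonly attempted with a non-trainable variable that a callback updates in `on_train_batch_end`, but that is not shown here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(p, label, eps=1e-12):
    """Binary cross-entropy for one prediction p and a 0/1 label."""
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def train(batches, w=0.0, c=1.0, growth=1.2, lr=0.1):
    """One gradient step per batch on c * mean cross-entropy.

    c is not optimized; it is multiplied by `growth` after each batch.
    """
    history = []
    for batch in batches:
        preds = [sigmoid(w * x) for x, _ in batch]
        loss = c * sum(
            cross_entropy(p, label) for p, (_, label) in zip(preds, batch)
        ) / len(batch)
        # Gradient of c * mean cross-entropy with respect to w.
        grad = c * sum(
            (p - label) * x for p, (x, label) in zip(preds, batch)
        ) / len(batch)
        w -= lr * grad
        history.append((c, loss))  # record the c used for this batch
        c *= growth  # fixed-formula update, applied to the next batch
    return w, history


# Toy batches of (x, label) pairs, assumed for illustration.
batches = [
    [(1.0, 1), (-1.0, 0)],
    [(2.0, 1), (-2.0, 0)],
    [(0.5, 1), (-0.5, 0)],
]
final_w, history = train(batches)
print([c for c, _ in history])
```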
I am trying to convert a Keras model to a TPU model in Google Colab, but this model has another model inside it.
Take a look at the code:
https://colab.research.google.com/drive/1EmIrheKnrNYNNHPp0J7EBjw2WjsPXFVJ
This is a modified version of one of the examples in the google tpu documentation:
https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/fashion_mnist.ipynb
If the sub_model is converted and used directly it works, but if the sub-model is inside another model it does not work. I need this sub-model type of network because I am trying to train a GAN that has 2 networks inside (gan=generator+discriminator), so if this test works it will probably work with the GAN too.
I have tried several things:
Convert to tpu the model without converting the sub model, in that case when training starts an error is prompted related to the inputs of the sub model.
Convert both the model and sub model to tpu, in that case an error is prompted when converting the "parent" model, the exception only says at the end "layers".
Convert only the sub model to tpu, in that case no error is prompted but the training is not accelerated by the tpu and it is extremely slow like if no conversion to tpu was made at all.
Using fixed batch size or not, both have the same result, the model does not work.
Any ideas? Thanks a lot.
Divide the problem into parts. First, use only the sub-model on the TPU. Then put something simple in place of the sub-model and use the parent model on the TPU. If this does not work, create something very simple with a structure similar to models you are sure are working, and then add things step by step until you reach the complex model you want to use on the TPU.
I am struggling with such things myself. At the very beginning, using MNIST, I trained the model and took the coefficients out, rewrote ReLU, dense, dropout and the NN matrices myself, and ran the model using numpy, then cupy, then pyopencl. Then I replaced the functions with my own raw CUDA C and OpenCL functions, so that by going deeper and simpler I could find what was wrong when something did not work. Finally I wrote my own genetic selective training algorithm and learned a lot.
Most importantly, it gave me the opportunity to try some crazy ideas for training, modelling, manipulating and making sense of NN coefficients.
The problem, in my opinion, is that TF, Keras etc. are too high level. With optimizers and solvers there is too much that is unknown. Even the neural networks are not under your control. GANs are problematic to train: they do not converge every time and take days to train most of the time. Even if training succeeds, you have no idea how it converged. Most of the tricks and techniques that protect you from vanishing gradients are not mathematically backed, yet they nevertheless work amazingly well. (?!?)
**Go simpler and deeper, and add complexity step by step. Follow a practice in which you comprehend as much as you can.** It will cost some time and energy, but you will benefit from it tremendously, in my opinion.
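As a small illustration of the "rewrite it yourself" approach described above, here is a sketch of a dense layer and ReLU forward pass in plain Python. The weights and inputs are made-up numbers, not coefficients from any trained model. The `weights[i][j]` layout (input i to output j) is chosen to mirror the (inputs, units) shape of a Keras Dense kernel.

```python
def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: weights[i][j] connects input i to output j."""
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        for j in range(len(biases))
    ]


# Made-up coefficients for illustration; in practice these would be
# copied out of a trained model.
weights = [[1.0, -1.0], [0.5, 2.0]]
biases = [0.0, -4.0]
hidden = relu(dense([2.0, 1.0], weights, biases))
print(hidden)  # [2.5, 0.0]
```

Comparing the output of such a hand-written layer with the framework's output for the same input is a cheap way to check that the coefficients were extracted correctly before moving on to numpy or GPU versions.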
I'm attempting to make multiple sequential predictions from a tensorflow network, but performance seems very poor (~500ms per prediction for a 2-layer 8x8 convolutional network) even for a CPU. I suspect that part of the problem is that it appears to be reloading the network parameters every time. Each call to classifier.predict in the code below results in the following line of output - which I therefore see hundreds of times.
INFO:tensorflow:Restoring parameters from /tmp/model_data/model.ckpt-102001
How can I reuse the checkpoint that is already loaded?
(I can't do batch predictions here because the output of the network is a move to play in a game, which then needs to be applied to the current state before feeding in the new game state.)
Here's the loop that's doing the predictions.
def rollout(classifier, state):
    while not state.terminated:
        predict_input_fn = tf.estimator.inputs.numpy_input_fn(x={"x": state.as_nn_input()}, shuffle=False)
        prediction = next(classifier.predict(input_fn=predict_input_fn))
        index = np.random.choice(NUM_ACTIONS, p=prediction["probabilities"]) # Select a move according to the network's output probabilities
        state.apply_move(index)
classifier is a tf.estimator.Estimator created with...
classifier = tf.estimator.Estimator(
    model_fn=cnn_model_fn, model_dir=os.path.join(tempfile.gettempdir(), 'model_data'))
The Estimator API is a high-level API.
The tf.estimator framework makes it easy to construct and train
machine learning models via its high-level Estimator API. Estimator
offers classes you can instantiate to quickly configure common model
types such as regressors and classifiers.
The Estimator API abstracts away lots of the complexity of TensorFlow, but loses some generality in the process. Having read the code, it's clear that there's no way to run multiple sequential predictions without reloading the model each time. The low-level TensorFlow APIs allow this behaviour. But...
Keras is a high-level framework that supports this use case. Simply define the model and then call predict repeatedly.
def rollout(model, state):
    while not state.terminated:
        # predict returns one row of probabilities per input sample
        predictions = model.predict(state.as_nn_input())
        for prediction in predictions:
            index = np.random.choice(bt.ACTIONS, p=prediction)
            state.apply_move(index)
Unscientific benchmarking shows that this is ~100x faster.