I've created a script to train a Keras neural net and have run it successfully on my machine (at the end of training, validation accuracy is roughly 0.8). However, when I try to run the exact same code (on the same data) on a Google Cloud VM instance I get drastically worse results (~0.2 validation accuracy).
Git status confirms that the repo on the VM is up to date with master (same as my local machine), and I have verified that its versions of tf and keras are up to date (and the same as on my local machine). I've also set the numpy and tensorflow random seeds before importing Keras.
Has anyone run into a problem like this before? I'm at a loss for what could be causing it... the only difference I can think of is that my machine is running Python 3.6 whereas the VM is running Python 2.7. Could that account for the vast difference in training results?
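For reference, the seed-setting described above typically looks like this (a sketch, assuming the TF 1.x API where tf.set_random_seed sets the graph-level seed; the seed value is illustrative):

```python
import random
import numpy as np
import tensorflow as tf

SEED = 42  # illustrative value
random.seed(SEED)
np.random.seed(SEED)
tf.set_random_seed(SEED)  # graph-level seed in TF 1.x

# Import Keras only after the seeds are set.
from keras.models import Sequential
```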
I found a buggy interaction between Keras and the Estimator API in tensorflow 1.10 (current gcloud version), but not in >=1.11 (what I was using locally).
Not sure if it applies to you (do you use Keras+Estimator, and tensorflow >=1.11 locally?)
I filed a bug report here: https://github.com/tensorflow/tensorflow/issues/24299
Working on my local computer, I've created a Tensorflow object detector. I have exported the model (which I've tested using the checkpoints) to a protobuf file, as well as several other formats (TF Lite, TF.js, etc.). I now need to transfer this trained model to another computer that doesn't have the Object Detection API or the other things I needed to build the model.
Do I need all these dependencies on the new machine? Or, does the protobuf file contain everything that the machine will need? The new machine only has the basic anaconda environment packages as well as tensorflow.
Protobuf files most commonly contain both the model and the weights, so in theory you can load your model on any machine with TensorFlow.
The only problems I can think of are custom layers/losses/optimizers, which the file does not fully capture, and your data pre/post-processing code, which lives outside the model entirely.
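For instance, a frozen inference graph exported by the Object Detection API can be loaded with nothing but TensorFlow itself (a TF 1.x sketch; the path is illustrative, and the tensor names are the ones those exports conventionally use):

```python
import numpy as np
import tensorflow as tf

# Read the frozen graph protobuf (architecture + weights in one file).
with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

with tf.Session(graph=graph) as sess:
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    boxes = graph.get_tensor_by_name('detection_boxes:0')
    scores = graph.get_tensor_by_name('detection_scores:0')
    # Feed a batch of uint8 images of shape [1, height, width, 3].
    image = np.zeros((1, 300, 300, 3), dtype=np.uint8)  # dummy input
    detected_boxes, detected_scores = sess.run(
        [boxes, scores], feed_dict={image_tensor: image})
```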
I have Python code which uses Keras with the TensorFlow backend. My system can't train this model due to low memory, so I want to make use of Amazon SageMaker.
However, all the tutorials I find are about deploying your model in Docker containers. My model isn't trained yet, and I want to train it on Amazon SageMaker.
Is there a way to do this?
EDIT: Also, can I turn my Python code into a script and run it on AWS SageMaker?
SageMaker lets users bring their own training scripts and train their algorithms on SageMaker using one of the pre-built containers for frameworks like TensorFlow, MXNet, and PyTorch.
Please take a look at https://github.com/aws/amazon-sagemaker-examples/blob/master/frameworks/tensorflow/get_started_mnist_train.ipynb
It walks through how you can bring in your TensorFlow training script and train it using SageMaker.
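For example, a sketch of that script-mode flow with the SageMaker Python SDK (the role ARN, instance type, versions, and S3 paths are placeholders to adapt):

```python
from sagemaker.tensorflow import TensorFlow

# Wrap your existing Keras/TensorFlow training script in a SageMaker estimator.
estimator = TensorFlow(
    entry_point='train.py',                # your training script
    role='arn:aws:iam::111122223333:role/SageMakerRole',  # hypothetical IAM role
    instance_count=1,
    instance_type='ml.p3.2xlarge',         # a GPU instance with more memory than a laptop
    framework_version='2.4',               # match your local TF version
    py_version='py37',
)

# Launch the training job; SageMaker provisions the container and hardware.
estimator.fit({'training': 's3://my-bucket/training-data'})  # hypothetical S3 URI
```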
There are several other examples in the repository which will help you answer other questions you might have as you progress on with your SageMaker journey.
I'm trying to use a custom Keras model, trained with tensorflow-gpu on my desktop, with Python on a mobile phone (Android); I need to run it with Python on the phone as well. I looked up TensorFlow Lite, but that appears to target Java.
Is there any lite (Python) version of TensorFlow, some kind of barebones package that's just set up for making predictions from a TensorFlow/Keras model file? I'm trying to save space, so a solution under 50 MB would be desirable.
Thanks
TensorFlow Serving was built for the specific purpose of serving pre-trained models. I'm not sure if it runs on Android (or how difficult it would be to make it run there), or what its compiled footprint is, i.e. whether it comes in under 50 MB. If you can make it work, please do report back here!
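On the TensorFlow Lite side the question mentions, there is also a small inference-only Python package, tflite_runtime. A sketch of using it, assuming the model has already been converted to a .tflite file (the path is illustrative):

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter

# Load a model previously converted to .tflite.
interpreter = Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input matching the model's expected shape and dtype.
x = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], x)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]['index'])
print(prediction)
```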
I am testing the new Tensorflow Object Detection API in Python, and I succeeded in installing it on Windows using Docker. However, my trained model (Faster RCNN ResNet101 COCO) takes up to 15 seconds to make a prediction (with very good accuracy, though), probably because I only use Tensorflow CPU.
My three questions are:
Considering the latency, where is the problem? I heard Faster RCNN was a good model for low-latency visual detection; is it because of the CPU-only execution?
With such latency, is it possible to do efficient real-time video processing using tensorflow GPU, or should I use a more popular model like YOLO?
The popular means of using tensorflow GPU in Docker is nvidia-docker, but it is not supported on Windows. Should I continue looking for a Docker (or conda) solution for local prediction, or should I deploy my model directly to a virtual instance with a GPU (I am comfortable with Google Cloud Platform)?
Any advice and/or good practice concerning real-time video processing with Tensorflow is very welcome!
Considering the latency, where is the problem? I heard Faster RCNN was a good model for low-latency visual detection; is it because of the CPU-only execution?
Of course, it's because you are running on the CPU.
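A quick way to confirm whether TensorFlow actually sees a GPU (a sketch; the exact call depends on your TensorFlow version):

```python
import tensorflow as tf

# TF 1.x: True if TensorFlow can see a usable GPU device.
print(tf.test.is_gpu_available())

# You can also list every device TensorFlow has registered.
from tensorflow.python.client import device_lib
print([d.name for d in device_lib.list_local_devices()])
```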
With such latency, is it possible to do efficient real-time video processing using tensorflow GPU, or should I use a more popular model like YOLO?
YOLO is fast, but when I once used it for face detection the accuracy was not that great. Still, it's a good alternative.
The popular means of using tensorflow GPU in Docker is nvidia-docker, but it is not supported on Windows. Should I continue looking for a Docker (or conda) solution for local prediction, or should I deploy my model directly to a virtual instance with a GPU (I am comfortable with Google Cloud Platform)?
I think you can still use your local GPU on Windows, as Tensorflow supports GPU in Python.
And here is an example that does just that. It has a client which can read a webcam or IP-cam stream, and a server that uses the Tensorflow Python GPU version with a ready-to-use pre-trained model for predictions.
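In the same spirit, here is a hedged sketch of such a client/server split (the endpoint, port, model file, and preprocessing are illustrative placeholders, not the actual code of that example):

```python
# server.py -- loads the model once and serves predictions over HTTP.
import cv2
import numpy as np
import tensorflow as tf
from flask import Flask, request, jsonify

app = Flask(__name__)
model = tf.keras.models.load_model('my_model.h5')  # hypothetical model file

@app.route('/predict', methods=['POST'])
def predict():
    # Decode the JPEG frame sent by the client.
    frame = cv2.imdecode(np.frombuffer(request.data, np.uint8), cv2.IMREAD_COLOR)
    batch = np.expand_dims(frame, axis=0).astype(np.float32)  # real preprocessing will vary
    return jsonify(predictions=model.predict(batch).tolist())

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

```python
# client.py -- reads webcam (or IP-cam) frames and posts them to the server.
import cv2
import requests

cap = cv2.VideoCapture(0)  # 0 = default webcam; an IP-cam URL also works
while True:
    ok, frame = cap.read()
    if not ok:
        break
    _, jpeg = cv2.imencode('.jpg', frame)
    resp = requests.post('http://server-host:5000/predict', data=jpeg.tobytes())
    print(resp.json())
```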
Unfortunately, TensorFlow does not support tensorflow-serving on Windows. Also, as you said, nvidia-docker is not supported on Windows. Bash on Windows has no GPU support either. So I think this is the only easy way to go for now.
From the Google tutorials we know how to train a model in TensorFlow. But what is the best way to save a trained model and then serve predictions through a basic, minimal Python API on a production server?
My question is basically about TensorFlow best practices for saving the model and serving predictions on a live server without compromising speed or running into memory issues, since the API server will be running in the background forever.
A small snippet of Python code would be appreciated.
TensorFlow Serving is a high-performance, open-source serving system for machine learning models, designed for production environments and optimized for TensorFlow. The initial release contains a C++ server and Python client examples based on gRPC.
To get started quickly, check out the tutorial.
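For the requested snippet, here is a minimal sketch of one common path: export a versioned SavedModel, point TensorFlow Serving at it, and query the REST endpoint that newer Serving versions expose (the model name, paths, and input are illustrative):

```python
import tensorflow as tf

# 1) Export the trained model as a versioned SavedModel directory.
model = tf.keras.models.load_model('my_model.h5')  # hypothetical trained model
tf.saved_model.save(model, 'export/my_model/1')    # Serving expects a numeric version dir

# 2) Start the model server (shell, not Python), e.g.:
#    tensorflow_model_server --rest_api_port=8501 \
#        --model_name=my_model --model_base_path=/abs/path/export/my_model

# 3) Query it from any Python client.
import requests
resp = requests.post('http://localhost:8501/v1/models/my_model:predict',
                     json={'instances': [[1.0, 2.0, 3.0]]})  # shape must match the model
print(resp.json())
```

Because the server process owns the model and keeps it resident in memory, your API stays lightweight and the per-request cost is just serialization plus inference, which addresses the speed and memory concern for a long-running service.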