When does ONNX decide to include a Gather op?

Obscure question, I suppose, but I have a PyTorch model that I've just built, and it's failing to convert to CoreML because ONNX has added Gather ops. The complete model is actually an amalgamation of two separate models, intended to improve performance by keeping the processing on the GPU/Metal for as long as possible.
Building this "composite" model required me to create a couple of slices, of the form x = y[:, 0], and I'm wondering if these might be the reason for the Gather ops?
I do realize I can create a custom layer, but I've just been through a horrible fiasco with custom layers in CoreML that wasted many, many hours and got me nowhere, so I'm trying to find another way around the problem.
If finding a way around those slices would prevent ONNX from adding Gather, I'd be willing to search for a solution.
Any thoughts appreciated.
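One way to test the suspicion is to export a tiny module containing only the slice and inspect the op types in the resulting graph; an integer index like y[:, 0] typically lowers to Gather, whereas a range slice followed by a squeeze typically lowers to Slice/Squeeze instead (this can vary by opset version). A minimal sketch, with made-up module names and shapes rather than my actual model:

    import torch
    import onnx

    class IndexHead(torch.nn.Module):
        def forward(self, y):
            return y[:, 0]               # integer index: usually exported as Gather

    class RangeHead(torch.nn.Module):
        def forward(self, y):
            return y[:, 0:1].squeeze(1)  # range slice + squeeze: usually Slice + Squeeze

    for name, module in [("index", IndexHead()), ("range", RangeHead())]:
        path = name + ".onnx"
        torch.onnx.export(module, torch.randn(1, 4, 8), path)
        ops = sorted({node.op_type for node in onnx.load(path).graph.node})
        print(name, ops)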

Related

How to understand/debug/visualize U-Net segmentation results

I am training a U-Net architecture for a segmentation task, in Python using Keras. I have now run into an issue that I am trying to understand:
I have two very similar images from a microscopy image series (these are consecutive images), where my current U-Net model performs very well on one but extremely poorly on the immediately following one. To the eye there is little difference between the two, and the histograms also look very much alike. For some measurements the model performs well across the whole frame range, but then this issue appears again for others.
I am using data augmentation during training (histogram stretching, affine transformations, noise addition) and I am surprised that the model is still so brittle.
Since the U-Net is still mostly a black-box to me, I want to find out steps I can take to better understand the issue and then adjust the training/model accordingly.
I know there are ways to visualize what individual layers learn (e.g. as discussed in F. Chollet's book, see here), and I should be able to apply these to a U-Net, which is fully convolutional.
However, these kinds of methods are practically always discussed in the context of classification networks - not semantic segmentation.
So my question is:
Is this the best/most direct approach to reach an understanding of how U-Net models attain a segmentation result? If not, what are better ways to understand/debug U-Nets?
I suggest you use the U-Net container on NGC https://ngc.nvidia.com/catalog/resources/nvidia:unet_industrial_for_tensorflow
I also suggest you read this: Mixed Precision Training: https://arxiv.org/abs/1710.03740
https://developer.nvidia.com/blog/mixed-precision-training-deep-neural-networks/
Let me know how you are progressing, and if there is any public repo I'd be happy to have a look.
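On the visualization route mentioned in the question, the Chollet-style activation inspection carries over to a fully convolutional U-Net: build a second model that outputs an intermediate layer and compare its feature maps on the frame that works and the frame that fails. A minimal sketch, assuming a trained Keras model named unet and a layer name picked from unet.summary():

    import numpy as np
    import matplotlib.pyplot as plt
    from tensorflow import keras

    def show_activations(unet, image, layer_name, max_channels=16):
        # Model that maps the U-Net input to one intermediate layer's output.
        probe = keras.Model(inputs=unet.input,
                            outputs=unet.get_layer(layer_name).output)
        # image: (H, W, C); add a batch dimension before predicting.
        activations = probe.predict(image[np.newaxis, ...])[0]
        n = min(max_channels, activations.shape[-1])
        for i in range(n):
            plt.subplot(4, (n + 3) // 4, i + 1)
            plt.imshow(activations[..., i], cmap="viridis")
            plt.axis("off")
        plt.show()

Running this on the good/bad frame pair for a few encoder and decoder layers can show where the two responses start to diverge.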

Can I extend a Tensorflow Estimator to also return the explanation values?

I have a model built by roughly following the tutorial provided for tf.estimator.BoostedTreesClassifier in the docs. I then exported it using the tf.Estimator.export_saved_model method as described in the SavedModels from Estimators section of the SavedModel docs. This loads into TensorFlow Serving and answers gRPC and REST requests.
I'd now like to include the explanation factors along with any predictions, or, less ideally, expose them as a second signature on the exported model. tf.estimator._BoostedTreesBase.experimental_predict_with_explanations already implements an appropriate algorithm, as described in the Local Interpretability section of the docs.
I thought it would be possible to 'extend' the existing estimator in a way that would let me expose this method as another served signature. I've thought of several approaches, but only tried the first two so far:
I've Tried
Change which signatures export_saved_model exports
This didn't go very far. The exposed signatures are a little dynamic, but seem to be limited to the train, predict or eval options defined by tensorflow_core.python.saved_model.model_utils.mode_keys.KerasModeKeys.
Just use an eval_savedmodel?
I briefly thought Eval might be what I was looking for, and followed some of the getting started guide for TensorFlow Model Analysis. The further I go on this path the more it seems like the main difference with an Eval model is how the data is loaded, and that isn't what I want to change.
Subclass the estimator
There are extra caveats with exporting subclassed models. And on top of that an Estimator isn't a Model. It's a model with extra metadata around inputs, outputs and configuration, so I am not clear if a subclassed estimator would even be exportable in the same way a Keras Model is.
I abandoned this subclassing approach without writing much code.
Pull the BoostedTrees Model out of the Estimator
I am not savvy enough to arrange a BoostedTrees model myself using the low-level primitives. The code in the Estimator that sets it up looks fairly complex. It would be nice to leverage that work, but it seems that the Estimator deals in model_fns, which change depending on the train/predict/eval mode, and it isn't clear what their relationship to a Keras Model is.
I wrote a little code for this, but also gave up on it quickly.
What Next?
Given the above dead ends, which angle should I be pursuing further?
Both the low-level export API, and the low-level model building API look like they could get me closer to a solution. The gap between setting up an Estimator, and re-creating one using either API seems fairly wide.
Is it possible I could continue using the existing Estimator, but use the low-level export API to create something with an "interpret" signature that calls through to experimental_predict_with_explanations? Or even "predict and interpret" in a single step? Which tutorial will put me on that path?
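While figuring out the export side, the explanation path can at least be exercised offline against the existing estimator; that does not solve serving, but it shows what an "interpret" signature would need to return. A minimal sketch, assuming est is the trained BoostedTreesClassifier and input_fn is its prediction input function (key names follow the Local Interpretability docs and may differ by version):

    # Yields one dict per example: the usual prediction keys plus the bias term
    # and the per-feature directional feature contributions (DFCs).
    for pred in est.experimental_predict_with_explanations(input_fn):
        print(pred["probabilities"], pred["bias"], pred["dfc"])
        break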

Project organization with Tensorflow.keras. Should one subclass tf.keras.Model?

I'm using Tensorflow 1.14 and the tf.keras API to build a number (>10) of different neural networks. (I'm also interested in the answers to this question using Tensorflow 2.) I'm wondering how I should organize my project.
I convert the keras models into estimators using tf.keras.estimator.model_to_estimator and Tensorboard for visualization. I'm also sometimes using model.summary(). Each of my models has a number (>20) of hyperparameters and takes as input one of three types of input data. I sometimes use hyperparameter optimization, such that I often manually delete models and use tf.keras.backend.clear_session() before trying the next set of hyperparameters.
Currently I'm using functions that take hyperparameters as arguments and return the respective compiled keras model to be turned into an estimator. I use three different "Main_Datatype.py" scripts to train models for the three different input data types. All data is loaded from .tfrecord files and there is an input function for each data type, which is used by all estimators taking that type of data as input. I switch between models (i.e. functions returning a model) in the Main scripts. I also have some building blocks that are part of more than one model, for which I use helper functions returning them, piecing together the final result using the Keras functional API.
The slight incompatibilities of the different models are beginning to confuse me, and I've decided to organise the project using classes. I'm planning to make a class for each model that keeps track of hyperparameters and the correct naming of each model and its model directory. However, I'm wondering if there are established or recommended ways to do this in Tensorflow.
Question: Should I be subclassing tf.keras.Model instead of using functions to build models or python classes that encapsulate them? Would subclassing keras.Model break (or require much work to enable) any of the functionality that I use with keras estimators and tensorboard? I've seen many issues people have with using custom Model classes and am somewhat reluctant to put in the work only to find that it doesn't work for me. Do you have other suggestions how to better organize my project?
Thank you very much in advance.
Subclass only if you absolutely need to. I personally prefer the following order of implementation; if the complexity of the model you are designing cannot be achieved using the first two options, then of course subclassing is the only option left.
tf.keras Sequential API
tf.keras Functional API
Subclass tf.keras.Model
Seems like a reasonable thing to do; see the custom layers and models guide (https://www.tensorflow.org/guide/keras/custom_layers_and_models) and the tf.keras.Model API docs (https://www.tensorflow.org/api_docs/python/tf/keras/Model).
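To make that ordering concrete: the Functional API already covers the factory-function pattern described in the question without subclassing, and the resulting model still works with model.summary(), TensorBoard and model_to_estimator. A minimal sketch with made-up hyperparameter names:

    import tensorflow as tf

    def build_model(hparams):
        # Plain Functional-API factory; the hyperparameters drive the architecture.
        inputs = tf.keras.Input(shape=(hparams["input_dim"],), name="features")
        x = inputs
        for units in hparams["hidden_units"]:
            x = tf.keras.layers.Dense(units, activation="relu")(x)
        outputs = tf.keras.layers.Dense(hparams["num_classes"], activation="softmax")(x)
        model = tf.keras.Model(inputs, outputs, name=hparams["name"])
        model.compile(optimizer=tf.keras.optimizers.Adam(hparams["learning_rate"]),
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    model = build_model({"name": "mlp_a", "input_dim": 64, "hidden_units": [128, 64],
                         "num_classes": 10, "learning_rate": 1e-3})
    model.summary()
    estimator = tf.keras.estimator.model_to_estimator(keras_model=model)

A wrapper class that stores the hyperparameter dict and the model directory can sit on top of such factory functions without touching Keras internals.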

Reading multiple images, process them to one image and feed through model

Is there a way to build the following graph in Tensorflow:
Load some N images (N can vary for each set) using TF Queues and TF Image Readers.
Process these images to get fixed size image and prepare batches.
Feed these batches through the CNN model
Some questions/info:
I am trying to build the data loading part in TF instead of Python functions and feed_dict. I assume TF data loading can train the model faster compared to Python and feed_dict. Is that right?
Building the graph for small N (N<5) is easy: define separate nodes for each of the N images and process them. (working)
Can I use TF "while_loop" to build such functionality to read N images?
Does Keras support such functionality?
Thanks for your suggestions.
I just did this last week! It was awesome, I learned a ton about tensorflow using things like tf.map_fn, and tf.cond. And it worked.
This week I just refactored my code to eliminate it all, because it was a bad idea.
Issues I ran into:
Doing preprocessing in tensorflow is messy to debug. Doing proper TDD will definitely benefit you here, but it's still not going to be particularly pretty or easy to debug.
You should be offloading the preprocessing to the CPU and leaving the GPU (assuming you're using one) to do training. A better approach is to just have a queue and load it from a thread/class that's dedicated to your preprocessing task. And doing the work in numpy/scikit/scikit-image is going to be easier to configure and test.
I thought I was so smart, corralling all my code into a single model. But the complexity of the preprocessing meant my model was really hard to iterate on, and it turned into rigid code quickly. For example, when I added my test-set evaluation, the preprocessing requirement was slightly different. Suddenly I had to add large sections of conditional code to my model and it got ugly quickly.
That being said, my preprocessing steps were maybe more complex than yours. If you're sticking to simple things where you can just apply some of the simple image preprocessing steps it might still be easier for you to go this approach.
To answer your questions specifically:
Queues won't give any benefit over feed_dict that I know of. You still have the problem of moving data from a TF queue on the CPU into GPU memory on each iteration, same as feed_dict does. Watch this thread if you care about that topic; GPU queues are coming: https://github.com/tensorflow/tensorflow/issues/7679
You should just dequeue_many from the queue and process them as a batch. If you need to do something to each individual image, just use tf.map_fn, which will remove the first dimension and pass individual 3D images to your specified function (see the sketch after this list). But heed my warning above when you go this route - you'll probably be happier just doing this in a separate thread.
Already answered in #2: use tf.map_fn to iterate over multiple images in a batch. It's pretty easy to use, actually.
I don't know Keras.
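To illustrate point 2, here is a minimal sketch of per-image preprocessing inside the graph with tf.map_fn; the crop size is a placeholder and the names follow TF 2.x, but the idea is the same with the TF 1.x queue pipeline the question describes:

    import tensorflow as tf

    def preprocess_one(image):
        # Per-image work: convert to float and take one random 224x224 crop
        # (assumes the incoming image is at least that large).
        image = tf.image.convert_image_dtype(image, tf.float32)
        return tf.image.random_crop(image, size=[224, 224, 3])

    def preprocess_batch(images):
        # images: [batch, H, W, 3]. tf.map_fn strips the batch dimension, applies
        # preprocess_one to each 3-D image, and stacks the results back up.
        return tf.map_fn(preprocess_one, images, dtype=tf.float32)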

How to find the most important features learned during Deep Learning using CNN?

I followed the tutorial given at this site, which detailed how to perform text classification on the movie dataset using a CNN. It used the movie review dataset to predict positive and negative reviews.
My question is: is there any way to find the most important learned features from the model? Does Tensorflow/Theano have any support for this?
Thanks!
A word of warning: if you can trace the classification back to specific input features, it's quite possible that CNN is the wrong ML paradigm for your application. Most text processing uses RNN, bag-of-words, bi-grams, and other simple linear combinations.
The structure of a CNN is generally antithetical to identifying the importance of individual features. Because of the various non-linear layers, it is rarely possible to pick out any one feature as important; rather, the combinations of inputs form small structures of inference, which then convolve to form more complex structures, until the final output is driven by a series of neighbor relationships, cut-offs, poolings, and other items.
This is why back-propagation is so important to running CNNs: the causation chain does not reverse cleanly. Otherwise, we'd reduce the process to a simple linear NN with one hidden layer.
If you want to analyze what's happening, try visualizing your intermediate layers. There are various modules to help with that; for instance, try a search for "+theano +visualize +CNN -news" (the last is to remove the high-traffic references to Cable News Network). There are plenty of examples in image processing; we won't know how much it might help your text processing, until you try it.
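Beyond inspecting intermediate layers, one common way to attribute a text prediction back to individual words is gradient saliency on the embedded input. This is not from the linked tutorial and uses TF 2.x Keras rather than Theano; the vocabulary size, layers and shapes below are placeholders:

    import tensorflow as tf

    vocab_size, seq_len, emb_dim = 5000, 100, 64

    # Embedding kept separate from the rest of the network so we can
    # differentiate with respect to the embedded input.
    embedding = tf.keras.layers.Embedding(vocab_size, emb_dim)
    head = tf.keras.Sequential([
        tf.keras.layers.Conv1D(128, 5, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

    def word_saliency(token_ids):
        # token_ids: (1, seq_len) int32 tensor for one review.
        embedded = embedding(token_ids)          # (1, seq_len, emb_dim)
        with tf.GradientTape() as tape:
            tape.watch(embedded)
            score = head(embedded)               # positive-review probability
        grads = tape.gradient(score, embedded)
        # Gradient magnitude per position ~ how strongly each word moves the score.
        return tf.norm(grads, axis=-1)[0]

    saliency = word_saliency(
        tf.random.uniform((1, seq_len), maxval=vocab_size, dtype=tf.int32))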
