Which higher-layer abstraction to use for TensorFlow - Python

I am looking for higher-layer abstractions for my deep learning project, and I have a few doubts.
I am really confused about which is more actively maintained: tflearn (docs) or tensorflow.contrib.learn. They are separate projects, both actively contributed to on GitHub, and I could not find out why people are working this way: same goal, similar name, but developed separately.
As if that were not enough, we also have skflow. Why does this exist as a separate project? It also aims to mimic scikit-learn-like functionality for deep learning (just like tflearn does).
More and more of these keep appearing. Which one should I choose, and which one will still be maintained in the future?
Any ideas?
PS: I know this might get closed, but I would definitely like some answers first. Those who want it closed, please take care to drop a reason/hint/link in the comments.

What about Keras (https://keras.io/)? It is easy to use, yet you can do pretty much everything you want with it. It uses either Theano or TensorFlow as its backend. Kaggle contests are often solved using Keras (e.g. https://github.com/EdwardTyantov/ultrasound-nerve-segmentation).
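To give you a feel for the API, here is a minimal sketch of a small Keras model; the layer sizes, data shapes, and Keras 2-style calls are placeholders, not a recommendation for your project:

# Minimal Keras sketch: a small fully-connected binary classifier.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Dummy data just to show the fit/predict calls.
X = np.random.rand(100, 20)
y = np.random.randint(0, 2, size=(100, 1))
model.fit(X, y, epochs=2, batch_size=32)
preds = model.predict(X)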
Edit:
Because you did not specify Python as a hard requirement, I would also recommend MatConvNet if you are looking for more abstraction.

Related

Tensorflow - executing a model in production as efficiently as possible

I have a semantic-segmentation model created using Keras.
Now I want to use it in production where I need to execute the model on a large folder with 10k-100k images a few times a day. This takes several hours, so every improvement is helpful.
I'm wondering what is the correct way to use it in production. I currently just use model.predict() on a created Sequence. But everywhere I look I see all kinds of different libraries or technologies that seem relevant.
TensorFlow Serving, converting to C, different libraries by Intel, and others.
I'm wondering what is the bottom-line recommended way to execute a model as production-grade and as efficiently as possible.
I'm not sure this has a canonical answer — as with many things there are lots of tradeoffs between different choices — but I'll attempt to give an answer.
I've been pretty happy using TensorFlow Serving to do model deployment, with separate services doing the business logic calling those models and doing something with the predictions. That provides a small boost because there won't be as much resource contention — the TensorFlow Serving instances do nothing but run the models. We have them deployed via Kubernetes, and that makes grouping a cluster of TensorFlow Serving instances very easy if you want to scale horizontally as well for more throughput.
You're unlikely to get meaningful improvements by messing around the edges with things like making sure the TensorFlow Serving deployment is compiled with the right flags to use all of Intel's vector instructions. The big boost is running everything in fast C++ code. The one (probably very obvious) way to boost performance is to run the inference on a GPU rather than a CPU. That's going to scale more or less the way you'd expect: the more powerful the GPU, the faster the inference is going to be.
There are probably more involved things you could do to eke out more single-percentage-point gains. But this strikes a pretty good balance of speed with flexibility. It's definitely a little bit more finicky to have this separate service architecture: if you're not doing something too complicated, it might be easier (if quite a bit slower) to use your models "as-is" in a Python script rather than going to the trouble of setting up TensorFlow Serving. On the other hand, the speedup is pretty significant, and it's pretty easy to manage. On the other end of the spectrum, I have no idea what crazy things you could do to eke out more marginal performance gains, but instinct tells me that they're going to be pretty exotic, and therefore pretty difficult to maintain.
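For reference, here is a minimal sketch of calling a TensorFlow Serving instance over its REST API once it is up; the host, port, model name, and input shape are placeholders for whatever your deployment actually uses:

# Sketch: querying TensorFlow Serving's REST predict endpoint.
# "localhost:8501" and "my_segmentation_model" are placeholders.
import json
import numpy as np
import requests

batch = np.random.rand(4, 256, 256, 3).tolist()   # stand-in for preprocessed images
payload = json.dumps({"instances": batch})
resp = requests.post(
    "http://localhost:8501/v1/models/my_segmentation_model:predict",
    data=payload)
predictions = resp.json()["predictions"]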
This is difficult to answer, but I would consider the following orthogonal aspects:
Is it possible to run the model at a lower resolution? If so, resize the images before running the model; this should give you roughly X**2 times the speed, where X is the downsampling factor you use (a minimal sketch follows after these points).
Production models are often executed on remote machines, so understanding your remote machine's configuration is very important. If you only have CPU machines, options like OpenVINO typically give more speed-up than native TensorFlow. If you have GPU machines, options like TensorRT can also help. The actual speed-up is very difficult to estimate, but I would say at least 2x.
Upload/download JPEG images instead of PNG or BMP. This should greatly reduce your communication time.
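For the first point, a minimal sketch of downscaling before prediction, assuming a loaded Keras model and TensorFlow 2's image ops (the factor, batch size, and shapes are placeholders):

# Sketch: resize images down by a factor before prediction to trade accuracy for speed.
# "model" is assumed to be an already-loaded Keras model.
import tensorflow as tf

def predict_downscaled(model, images, factor=2):
    # images: array of shape (batch, height, width, channels)
    h, w = images.shape[1] // factor, images.shape[2] // factor
    small = tf.image.resize(images, (h, w))
    return model.predict(small, batch_size=32)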

Does TensorFlow 2 support multiprocessing on different CPU cores?

I am stuck implementing the Asynchronous Advantage Actor-Critic (A3C) algorithm using TensorFlow 2.
Problem Definition:
For the A3C implementation, I have to create a number of workers (as many as there are CPU cores) and a master. Each worker, as well as the master, creates its own copy of a shared CNN model. The problem arises when each worker has to optimize the master's CNN and synchronize its own weights with the master's. I implemented this with multithreading without any problems, but once multiprocessing comes in, Python can serialize neither the weights nor the CNN itself to pass them between the workers and the master.
Others' experience:
While googling for a way to cope with this problem I noticed differing opinions (and almost all the Q&As were about TF 1). Some people believe TensorFlow doesn't support multiprocessing, so they either moved to PyTorch or just used multithreading. Others proposed the Ray library.
First of all, I want to know whether a multiprocessing approach like A3C is even possible with TF 2.
If it is, I would appreciate it if someone could share similar work with me.
I'm running into the exact same issue myself. I did find a resource (see link below) that uses TF1.X and multiprocessing for A3C. In general, they use Queues to share the model weights.
Personally, I'm curious if there's an easier or better way to use multiprocessing for A3C. I found it quite hard to replicate their approach, so if you find another method, please share!
https://github.com/hongzimao/a3c/blob/master/train.py
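As a rough illustration of the Queue idea under TF 2 / Keras (not taken from the linked repo; the model architecture and shapes are placeholders), the trick is to build the model inside each process and pass only the NumPy weight lists through the queues, never the model object itself:

# Sketch: share Keras weights between processes via multiprocessing queues.
import multiprocessing as mp

def build_model():
    # Import TF inside each process so every process gets its own runtime.
    import tensorflow as tf
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(4),
    ])

def worker(weight_queue, result_queue):
    model = build_model()
    model.set_weights(weight_queue.get())   # list of NumPy arrays from the master
    # ... compute local gradients / updated weights here ...
    result_queue.put(model.get_weights())   # send plain NumPy arrays back

if __name__ == "__main__":
    mp.set_start_method("spawn")            # avoid forking an initialized TF runtime
    weight_q, result_q = mp.Queue(), mp.Queue()
    p = mp.Process(target=worker, args=(weight_q, result_q))
    p.start()
    master = build_model()
    weight_q.put(master.get_weights())      # weights are picklable NumPy arrays
    master.set_weights(result_q.get())
    p.join()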

Inspecting the values of constant or variable tensors during debug

I implemented a model in TensorFlow (Python) that I previously programmed in C++ using Eigen, where it worked as expected. But the model is not working as expected in Python, and it's probably because I am defining tensors incorrectly or I am mixing up dimensions.
I am trying to get a feel for the problems by using Visual Studio's (2017) debugger (if a different IDE is better for this then I'm all ears, but I would prefer to stick with VS), but tensors do not evaluate to anything - and I can understand this because the tensor defines an operation and not a data object (well it only produces a data object after calling a session.run).
However, constant and variable tensors - and any other tensors built solely on top of such tensors - come with predefined data. So hey, why not be able to inspect the value through the debugging UI?
So my question: is there a way to inspect the data with some extension?
For example, if I was working in C++ and with Eigen, I can use Eigen.natvis as described here. Anything similar for TensorFlow? It's not just a matter of seeing the evaluated value, either. It would be nice to see things like the shape, etc... while debugging.
I would also be open to other debugging techniques of TensorFlow code, if anyone has a good suggestion.
TensorFlow includes tfdbg, a debugger for TensorFlow models, where you can step through each execution step, check values, stop on NaN, etc. See the programmer's guide TensorFlow Debugger and The Debugger Dashboard for more information.
tfdbg can be a bit cumbersome to set up and use, though. A quick alternative to check intermediate values is to use tf.Print operations. TensorFlow includes a few other debugging operations that you may find useful to check for some basic things.
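For example, a minimal sketch of wiring tf.Print into a TF 1.x graph (the tensors here are placeholders):

# Sketch: tf.Print is an identity op that logs the listed tensors to stderr
# every time it is evaluated (TF 1.x).
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)
y = tf.Print(y, [tf.shape(y), y], message="y shape and values: ")

with tf.Session() as sess:
    sess.run(y)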
EDIT: Another tool that can be useful is eager execution. This allows you to use TensorFlow operations as if they were regular Python operations (they return the result of the operation instead of the graph object), so it is a good way to check if some particular code does what you expect.
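A small sketch of that, assuming TF 1.x with eager execution enabled (in TF 2.x eager is the default and the enable call is unnecessary):

# Sketch: with eager execution, ops return concrete values you can inspect
# directly in the debugger or with print().
import tensorflow as tf
tf.enable_eager_execution()

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)
print(y.shape)    # the static shape, (2, 2)
print(y.numpy())  # the actual values as a NumPy array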

Using 2 GPUs at the same time for different trainings - TensorFlow/Python

Python 2.7
TensorFlow
Ubuntu
Hello my dear friends, I have a question. Is it possible to use multiple GPUs to train different models at the same time? It's not a problem to use them for a single training run with simple code changes, but what if I want to run two trainings with different parameters at the same time?
I am sorry if this question is easy; I am new to programming and TensorFlow. In the guide
Using GPUs I tried to find an answer to this, but as far as I can tell it only covers a single process.
It looks possible. For your models, you use "with tf.device(...)" and specify different GPUs for different models, then run both programs and see what happens. I tried this with a simple program on different GPUs and it ran. Let us know what output you get.
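A minimal sketch of the idea, assuming two separate TF 1.x training scripts (the script names and graph are placeholders); you can either pin each script's graph with tf.device or restrict each process to one GPU with the CUDA_VISIBLE_DEVICES environment variable:

# train_model_a.py -- pin this training to the first GPU.
import tensorflow as tf

with tf.device('/gpu:0'):
    x = tf.placeholder(tf.float32, [None, 10])
    w = tf.Variable(tf.random_normal([10, 1]))
    y = tf.matmul(x, w)
    # ... build the loss and optimizer for model A here ...

# A second script (e.g. train_model_b.py) would use '/gpu:1' instead.
# Alternatively, leave the code unchanged and launch each script with a
# restricted device list:
#   CUDA_VISIBLE_DEVICES=0 python train_model_a.py
#   CUDA_VISIBLE_DEVICES=1 python train_model_b.py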

Can I program Nvidia's CUDA using only Python or do I have to learn C?

I guess the question speaks for itself. I'm interested in doing some serious computations but am not a programmer by trade. I can string enough python together to get done what I want. But can I write a program in python and have the GPU execute it using CUDA? Or do I have to use some mix of python and C?
The examples on Klockner's (sp) "pyCUDA" webpage had a mix of both python and C, so I'm not sure what the answer is.
If anyone wants to chime in about Opencl, feel free. I heard about this CUDA business only a couple of weeks ago and didn't know you could use your video cards like this.
You should take a look at CUDAmat and Theano. Both are approaches to writing code that executes on the GPU without really having to know much about GPU programming.
I believe that, with PyCUDA, your computational kernels will always have to be written as "CUDA C Code". PyCUDA takes charge of a lot of otherwise-tedious book-keeping, but does not build computational CUDA kernels from Python code.
pyopencl offers an interesting alternative to PyCUDA. It is described as a "sister project" to PyCUDA. It is a complete wrapper around OpenCL's API.
As far as I understand, OpenCL has the advantage of running on GPUs beyond Nvidia's.
Great answers already, but another option is Clyther. It will let you write OpenCL programs without even using C, by compiling a subset of Python into OpenCL kernels.
A promising library is Copperhead (alternative link): you just need to decorate the function that you want to run on the GPU, and you can then opt it in or out to see whether the CPU or the GPU is faster for that function.
There is a good, basic set of math constructs with compute kernels already written that can be accessed through pyCUDA's cumath module. If you want to do more involved or specific/custom stuff you will have to write a touch of C in the kernel definition, but the nice thing about pyCUDA is that it will do the heavy C-lifting for you; it does a lot of meta-programming on the back-end so you don't have to worry about serious C programming, just the little pieces. One of the examples given is a Map/Reduce kernel to calculate the dot product:
import numpy as np
import pycuda.autoinit  # sets up the CUDA context
from pycuda.reduction import ReductionKernel

dot_krnl = ReductionKernel(np.float32, neutral="0",
                           reduce_expr="a+b",
                           map_expr="x[i]*y[i]",
                           arguments="float *x, float *y")
The little snippets of code inside each of those arguments are C lines, but it actually writes the program for you. The ReductionKernel is a custom kernel type for map/reduce-style functions, but there are different types. The examples portion of the official PyCUDA documentation goes into more detail.
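For completeness, a sketch of how you would call that kernel on GPU arrays, reusing the dot_krnl defined above (the array sizes are arbitrary):

# Sketch: invoke the reduction kernel on two GPU arrays and pull the scalar back.
import numpy as np
import pycuda.gpuarray as gpuarray

x = gpuarray.to_gpu(np.random.randn(400).astype(np.float32))
y = gpuarray.to_gpu(np.random.randn(400).astype(np.float32))
result = dot_krnl(x, y).get()  # compare with np.dot(x.get(), y.get())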
Good luck!
The scikits.cuda package could be a better option, given that it doesn't require any low-level knowledge or C code for operations that can be represented as NumPy array manipulations.
I was wondering the same thing and did a few searches. I found the article linked below, which seems to answer your question. However, you asked this back in 2014 and the Nvidia article does not have a date.
https://developer.nvidia.com/how-to-cuda-python
The video goes through the setup, an initial example and, quite importantly, profiling. However, I do not know whether you can implement all of the usual general compute patterns. I would think you can, because as far as I could tell there are no such limitations in NumPy.
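The approach described there is based on Numba. As a rough illustration of the style (not taken from the article; it assumes Numba and a CUDA-capable GPU are available), an element-wise kernel written entirely in Python might look like this:

# Sketch: a CUDA kernel written in pure Python with Numba.
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)          # absolute index of this thread
    if i < x.size:
        out[i] = x[i] + y[i]

n = 100000
x = np.arange(n, dtype=np.float32)
y = 2 * x
out = np.zeros_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
add_kernel[blocks, threads_per_block](x, y, out)  # arrays are copied to/from the GPU
print(out[:5])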
