Inspecting the values of constant or variable tensors during debug - python

I implemented a model in TensorFlow (Python) that I previously programmed in C++ using Eigen, where it worked as expected. But the model is not working as expected in Python, and it's probably because I am defining tensors incorrectly or I am mixing up dimensions.
I am trying to get a feel for the problems using the Visual Studio 2017 debugger (if a different IDE is better for this, I'm all ears, but I would prefer to stick with VS), but tensors do not evaluate to anything. I can understand this, because a tensor defines an operation rather than a data object (it only produces a data object after a call to session.run).
However, constant and variable tensors - and any other tensors built solely on top of such tensors - come with predefined data. So hey, why not be able to inspect the value through the debugging UI?
So my question: is there a way to inspect the data with some extension?
For example, if I were working in C++ with Eigen, I could use Eigen.natvis as described here. Is there anything similar for TensorFlow? It's not just a matter of seeing the evaluated value, either. It would be nice to see things like the shape, etc., while debugging.
I would also be open to other debugging techniques of TensorFlow code, if anyone has a good suggestion.

TensorFlow includes tfdbg, a debugger for TensorFlow models, where you can step through each execution step, check values, stop on NaN, etc. See the programmer's guide TensorFlow Debugger and The Debugger Dashboard for more information.
tfdbg can be a bit cumbersome to set up and use, though. A quick alternative to check intermediate values is to use tf.Print operations. TensorFlow includes a few other debugging operations that you may find useful to check for some basic things.
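The TF1-era tf.Print op has since been replaced by tf.print in TensorFlow 2.x. As a minimal sketch, assuming TF 2.x, here is tf.print used alongside two of the tf.debugging ops (check_numerics and assert_rank) inside a tf.function; the scaled_sum function and its inputs are purely illustrative:

```python
import tensorflow as tf


@tf.function
def scaled_sum(x):
    y = x * 2.0
    # tf.print is part of the graph, so it prints the runtime value of y
    # (to stderr by default) every time the function executes.
    tf.print("y =", y, "shape =", tf.shape(y))
    # Raises InvalidArgumentError at run time if y contains NaN or Inf.
    y = tf.debugging.check_numerics(y, message="y has NaN/Inf")
    # Checks the rank; here the static shape is known, so it is checked at trace time.
    tf.debugging.assert_rank(y, 2)
    return tf.reduce_sum(y)


x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
total = scaled_sum(x)
print("total:", float(total))

# Feeding a NaN shows check_numerics stopping execution.
try:
    scaled_sum(tf.constant([[float("nan"), 1.0], [2.0, 3.0]]))
    nan_caught = False
except tf.errors.InvalidArgumentError:
    nan_caught = True
    print("check_numerics caught a NaN")
```

tf.print runs inside the graph, so it still reports values in places where a debugger breakpoint would only show a symbolic tensor.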
EDIT: Another tool that can be useful is eager execution. This allows you to use TensorFlow operations as if they were regular Python operations (they return the result of the operation instead of the graph object), so it is a good way to check if some particular code does what you expect.
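A minimal sketch of eager inspection, assuming TensorFlow 2.x (where eager execution is the default; tf.config.run_functions_eagerly needs 2.3 or later). The variable names and values are illustrative only. Because every op returns a concrete EagerTensor, whose repr includes its shape, dtype and values, a debugger's watch window should be able to show them directly:

```python
import numpy as np
import tensorflow as tf

# TF 2.x executes eagerly by default: every op returns a concrete EagerTensor.
w = tf.Variable([[1.0, 2.0], [3.0, 4.0]], name="w")
b = tf.constant([0.5, -0.5], name="b")

y = tf.linalg.matvec(w, tf.constant([1.0, 1.0])) + b

# Shape, dtype and values are available immediately, with no session.run.
print("shape:", y.shape, "dtype:", y.dtype.name)
print("value:", y.numpy())

# Code inside @tf.function is traced into a graph, so its intermediate
# tensors are symbolic. Forcing eager execution turns them back into
# EagerTensors, so a breakpoint inside the function can inspect them.
tf.config.run_functions_eagerly(True)


@tf.function
def forward(x):
    h = tf.linalg.matvec(w, x) + b
    h_value = h.numpy()  # only valid because functions now run eagerly
    print("inside forward:", h_value)
    return h


out = forward(tf.constant([1.0, 1.0]))
tf.config.run_functions_eagerly(False)  # restore normal graph tracing
```

Switch run_functions_eagerly back off when done: while it is on, every tf.function runs as plain Python without graph optimizations.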

Related

Is there an alternative to tf.py_function() for custom Python code?

I have started using TensorFlow 2.0 and have a little uncertainty with regard to one aspect.
Suppose I have this use case: while ingesting data with the tf.data.Dataset I want to apply some specific augmentation operations upon some images. However, the external libraries that I am using require that the image is a numpy array, not a tensor.
When using tf.data.Dataset.from_tensor_slices(), the flowing data needs to be of type Tensor. Concrete example:
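import tensorflow as tf

def my_function(tensor_image):
    # Fails when Dataset.map traces this function: .numpy() is unavailable in graph mode
    print(tensor_image.numpy())
    return tensor_image

data = tf.data.Dataset.from_tensor_slices(tensor_images).map(my_function)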
The code above does not work; it fails with a 'Tensor' object has no attribute 'numpy' error.
I have read the TensorFlow 2.0 documentation, which states that to use arbitrary Python logic, one should use either tf.py_function or only TensorFlow primitives, as discussed in:
How to convert "tensor" to "numpy" array in tensorflow?
My question is the following: is there another way to use arbitrary Python code in a function, with a custom decorator or some other approach that is easier than tf.py_function?
Honestly, it seems to me that there must be a more elegant way than passing the data to a tf.py_function, converting it to a NumPy array, performing operations A, B, C, D, and then converting it back to a tensor to yield the result.
There is no other way of doing it, because tf.data.Datasets are still (and, I suppose, always will be, for performance reasons) executed in graph mode, so you cannot use anything outside of the tf.* methods, which TensorFlow can easily convert to its graph representation.
Using tf.py_function is the only way to mix Python execution (and thus use any Python library) with graph execution when using a tf.data.Dataset object. This is unlike ordinary TensorFlow 2.0 code, which runs eagerly by default and therefore allows this mixed execution naturally.
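To make this concrete, a minimal sketch assuming TensorFlow 2.x: the NumPy-only step runs inside tf.py_function (np.fliplr is a stand-in for the external augmentation library, which is an assumption), and for comparison the same flip is done with the pure-TensorFlow op tf.image.flip_left_right, which stays in the graph. The tiny image array is made up for the demo:

```python
import numpy as np
import tensorflow as tf


def numpy_flip(image):
    # Inside tf.py_function the argument is an eager tensor, so .numpy() works.
    # np.fliplr stands in for any library that needs a NumPy array.
    arr = image.numpy()
    return np.fliplr(arr).astype(np.float32)


def augment_with_numpy(image):
    out = tf.py_function(func=numpy_flip, inp=[image], Tout=tf.float32)
    # tf.py_function cannot infer the output shape, so restore it explicitly.
    out.set_shape(image.shape)
    return out


def augment_with_tf_ops(image):
    # Pure TensorFlow alternative: no round trip through Python per element.
    return tf.image.flip_left_right(image)


# Two tiny 3x4 single-channel "images", laid out as (N, H, W, C).
images = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4, 1)

numpy_results = [
    x.numpy()
    for x in tf.data.Dataset.from_tensor_slices(images).map(augment_with_numpy)
]
tf_results = [
    x.numpy()
    for x in tf.data.Dataset.from_tensor_slices(images).map(augment_with_tf_ops)
]
```

The py_function version works with any Python library, but every element has to pass through the Python interpreter; when a tf.* equivalent exists, it is usually the faster choice.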

How to use TFF api's for custom usage?

I have carefully read and studied the TFF guide and API pages, but I am confused about some details.
For example, to wrap/decorate a TF/Python function, I can use these two APIs:
1. tff.tf_computation()
2. tff.federated_computation()
I cannot find what the differences between them are, or when I am allowed to use each one, especially if I want to use algorithms other than FedAvg or FedSgd. I wonder if you know:
How can they be used to manipulate inputs? Do they work on #CLIENT or #SERVER?
How can I use them for anything other than the output of tff.federated_mean or tff.federated_sum, whose value ends up on the server?
How can I access the details of data and metrics at #CLIENT and #SERVER?
Why should we invoke tff.tf_computation() from tff.federated_computation()? In this link, there was no explanation of them.
Do these APIs (e.g. tff.federated_mean or tff.federated_sum) modify the output elements of each #CLIENT and bring them to the #SERVER?
Could anyone help me understand the intuition behind the concept?
A possible rule of thumb about the different function decorators:
tff.tf_computation is for wrapping TF logic. Think "tensors in, tensors out": this should be very similar to the usage of tf.function, where the parameters and return values are tensors, or nested structures of tensors. TFF intrinsics (e.g. tff.federated_mean) cannot be used inside a tff.tf_computation, and tff.tf_computations cannot call tff.federated_computations. The type signature is always unplaced.
tff.federated_computation should be used to wrap TFF programming abstractions. Think "tensors here, tensors there": Inside this context, a tff.tf_computation can be applied to tff.Values and tff.Values can be communicated to other placements using the intrinsics. The type signature can accept federated types (i.e. types with placements).
For your list of questions:
Both can work on values placed at CLIENTS or SERVER. For example, a tff.tf_computation called my_comp can be applied to a value v of type {int32}@CLIENTS with tff.federated_map(my_comp, v), which will run my_comp on each client.
tff.federated_map() supports applying a computation pointwise (across clients) to data not on the server. You can manipulate the metrics on each client using tff.federated_map. TFF isn't intended for separate options on different clients; the abstractions do not support addressing individuals. You may be able to simulate this in Python, see Operations performed on the communications between the server and clients.
The values of placed data can be inspected in simulation simply by returning them from a tff.Computation, and invoking that computation. The values should be available in the Python environment.
tff.tf_computations should be invokable from anywhere, if there is documentation that says otherwise please point to it. I believe what was intended to highlight is that a tff.federated_computation may invoke a tff.tf_computation, but not vice versa.
The tutorials (Federated Learning for Image Classification and Federated Learning for Text Generation) show examples of printing out the metrics in simulation. You may also be interested in the answer to how to print local outputs in tensorflow federated?
tff.tf_computations can be executed directly if desired. This avoids the federated part of TFF entirely and simply delegates to TensorFlow. To apply the computation to federated values and use it in combination with federated intrinsics, it must be called inside a tff.federated_computation.

Why couldn't Julia superset python?

The Julia language's syntax looks very similar to Python's, while the concept of a class (if one should even call it that) is closer to what you use in C. There were many reasons why the creators chose to differ from Python with respect to OOP. Still, would it have been so hard (compared with creating Julia in the first place, which is impressive) to find some canonical way to translate Python into Julia and thus gain access to all the Python libraries?
Yes. The design of Python makes it fundamentally difficult to optimize at compile-time (i.e. before you run the code). It is simply false that Julia is fast BECAUSE of its JIT. Rather, Julia is designed with its type system and multiple dispatch in mind so that the compiler can know all of the necessary details to compile "the same code you would have written in C". That's what makes it fast: the type system. It makes a few trade-offs that allow it to, in "type-stable" functions, fully deduce the type of every variable, know what the memory layout of each type should be (including parametric types, so Vector{Float64} has a memory layout which is determined by the type and its parameter and which inlines Float64 values like a NumPy array, except this is generalized in a way that your own struct types get the same efficiency), and compile a version of the code specifically for the types which are seen.
There are many ways in which this is at odds with Python. For example, if the number of fields in a struct could change, then the memory layout could not be determined and thus these optimizations could not occur at "compile-time". Julia was painstakingly designed to make sure that it would have type inferrability, and it uses that to generate code which is fully typed and to remove all runtime checks (in type-stable functions; when a function is not type-stable, the types of the variables become dynamic rather than static and it slows down to Python-like speeds). In this sense, Julia actually isn't even optimized yet: all of its performance comes "for free" given the design of its type system. Python/MATLAB/R have to try really hard to optimize at runtime because they don't have the capability to make these deductions. In fact, those languages are "better optimized" right now in terms of runtime optimizations, but no one has really worked on runtime optimizations in Julia yet, because in most performance-sensitive cases you can get it all at compile time.
So then, what about Numba? Numba tries to take the route that Julia takes but with Python code, limiting what can be done so that it can get type-stable code and compile that efficiently. However, this means a few things. First of all, it's not compatible with all Python code or libraries. But more importantly, since Python is not a language built around its type system, the tools for controlling the code at the level of types are much reduced. So Numba doesn't have parametric vectors and generic code which auto-specializes via multiple dispatch, because these aren't features of the language. That also means it cannot make full use of the design, which limits how much it can do. It can handle the "use only floating point arrays" stuff just fine, but it gets limited if you want one piece of code to produce efficient code for "any number type, even ones I don't know about". Julia, by design, does this automatically.
So at the core, Julia and Python are extremely different languages. It can be hard to see because Julia's syntax is close to Python's syntax, but they do not work the same at all.
This is a short summary of what I have described in a few blog posts. These go into more detail and show you how Julia is actually generating efficient code, how it gives you a generic "Python looking style" but doing so with full inferrability all the way down, and what the tradeoffs are.
How type-stability plus multiple dispatch gives performance:
http://ucidatascienceinitiative.github.io/IntroToJulia/Html/WhyJulia
http://www.stochasticlifestyle.com/7-julia-gotchas-handle/
How the type system allows for highly performant generic designs:
http://www.stochasticlifestyle.com/type-dispatch-design-post-object-oriented-programming-julia/

which higher layer abstraction to use for tensorflow

I am looking for higher layer abstractions for my deep learning project.
A few doubts have come up lately.
I am really confused about which is more actively maintained: tflearn (docs) or tensorflow.contrib.learn. The projects are different and both are actively contributed to on GitHub. I could not find out why people are working this way: same goal, same name, but different implementations.
As if that were not enough, we also have skflow. Why is this a separate project? It aims to mimic scikit-learn-like functionality for deep learning (just like tflearn does).
More and more are coming; which one should I choose, and which one will be maintained in the future?
Any ideas?
PS: I know this might get closed, but I would definitely like some answers first. Those wanting it closed, please drop a reason/hint/link in the comments.
What about Keras (https://keras.io/)? It is easy to use, yet you can do pretty much everything you want with it. It uses either Theano or TensorFlow as its backend. Kaggle contests are often solved using Keras (e.g. https://github.com/EdwardTyantov/ultrasound-nerve-segmentation).
Edit:
Because you did not specify Python, I would also recommend MatConvNet (for MATLAB) if you are looking for more abstraction.

Automatic CudaMat conversion in Python

I'm looking into speeding up my python code, which is all matrix math, using some form of CUDA. Currently my code is using Python and Numpy, so it seems like it shouldn't be too difficult to rewrite it using something like either PyCUDA or CudaMat.
However, on my first attempt using CudaMat, I realized I had to rearrange a lot of the equations in order to keep the operations all on the GPU. This included the creation of many temporary variables so I could store the results of the operations.
I understand why this is necessary, but it turns what were once easy-to-read equations into something of a mess that is difficult to inspect for correctness. Additionally, I would like to be able to modify the equations easily later on, which isn't practical in their converted form.
The package Theano manages to do this by first creating a symbolic representation of the operations, then compiling them to CUDA. However, after trying Theano out for a bit, I was frustrated by how opaque everything was. For example, just getting the actual value of myvar.shape[0] is made difficult, since the tree doesn't get evaluated until much later. I would also much prefer to avoid a framework in which my code must conform to a library that acts invisibly in place of Numpy.
Thus, what I would really like is something much simpler. I don't want automatic differentiation (there are other packages like OpenOpt that can do that if I require it), or optimization of the tree, but just a conversion from standard Numpy notation to CudaMat/PyCUDA/somethingCUDA. In fact, I want to be able to have it evaluate to just Numpy without any CUDA code for testing.
I'm currently considering writing this myself, but before even considering such a venture, I wanted to see if anyone knows of similar projects or a good starting place. The only other project I know of that might be close to this is SymPy, but I don't know how easy it would be to adapt for this purpose.
My current idea would be to create an array class that looks like the Numpy array class. Its only function would be to build a tree. At any time, that symbolic array could be converted to a Numpy array and evaluated (there would be one-to-one parity between the two). Alternatively, the tree could be traversed to generate CudaMat commands. If optimizations are required, they can be done at that stage (e.g. re-ordering of operations, creation of temporary variables, etc.) without getting in the way of inspecting what's going on.
Any thoughts/comments/etc. on this would be greatly appreciated!
Update
A use case might look something like this (where sym is the hypothetical module), for example when calculating a gradient:
W = sym.array(np.random.random(size=(numVisible, numHidden)))
delta_o = -(x - z)
delta_h = sym.dot(delta_o, W)*h*(1.0-h)
grad_W = sym.dot(X.T, delta_h)
In this case, grad_W would actually just be a tree containing the operations that need to be done. If you wanted to evaluate the expression normally (i.e. via Numpy), you could do:
npGrad_W = grad_W.asNumpy()
which would just execute the Numpy commands that the tree represents. If, on the other hand, you wanted to use CUDA, you would do:
cudaGrad_W = grad_W.asCUDA()
which would convert the tree into expressions that can be executed via CUDA (this could happen in a couple of different ways).
That way it should be trivial to: (1) test grad_W.asNumpy() == grad_W.asCUDA(), and (2) convert your pre-existing code to use CUDA.
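The tree-building idea can be sketched in a few dozen lines of NumPy-only Python. This is a minimal, hypothetical sketch: SymArray, array, dot and asNumpy are illustrative names following the question, not an existing library. It only implements the NumPy backend; an asCUDA method would be a second traversal of the same tree that emits CudaMat or PyCUDA calls instead.

```python
import numpy as np


class SymArray:
    """Hypothetical lazy array: each operation adds a node to an expression tree."""

    # Tell NumPy not to handle ufuncs for this type, so ndarray (op) SymArray
    # falls back to SymArray's reflected operators instead of computing eagerly.
    __array_ufunc__ = None

    def __init__(self, op, args=(), value=None):
        self.op = op        # "leaf", "add", "sub", "mul", "neg", "dot" or "T"
        self.args = args    # child SymArray nodes
        self.value = value  # concrete data, only used by leaves

    # Each overload returns a new node; no arithmetic happens here.
    def __add__(self, other): return SymArray("add", (self, _wrap(other)))
    def __radd__(self, other): return SymArray("add", (_wrap(other), self))
    def __sub__(self, other): return SymArray("sub", (self, _wrap(other)))
    def __rsub__(self, other): return SymArray("sub", (_wrap(other), self))
    def __mul__(self, other): return SymArray("mul", (self, _wrap(other)))
    def __rmul__(self, other): return SymArray("mul", (_wrap(other), self))
    def __neg__(self): return SymArray("neg", (self,))

    @property
    def T(self):
        return SymArray("T", (self,))

    def asNumpy(self):
        """Reference backend: evaluate the tree bottom-up with plain NumPy."""
        if self.op == "leaf":
            return self.value
        children = [child.asNumpy() for child in self.args]
        return _NUMPY_OPS[self.op](*children)


# Maps each tree node type to the NumPy function that evaluates it.
_NUMPY_OPS = {
    "add": np.add,
    "sub": np.subtract,
    "mul": np.multiply,
    "neg": np.negative,
    "dot": np.dot,
    "T": np.transpose,
}


def _wrap(obj):
    """Turn scalars/ndarrays into leaf nodes; pass SymArray through unchanged."""
    return obj if isinstance(obj, SymArray) else SymArray("leaf", value=obj)


def array(data):
    """Hypothetical counterpart of the question's sym.array."""
    return SymArray("leaf", value=np.asarray(data))


def dot(a, b):
    """Hypothetical counterpart of the question's sym.dot."""
    return SymArray("dot", (_wrap(a), _wrap(b)))


# The question's gradient example (X is taken to be x here, for the demo only).
rng = np.random.default_rng(0)
numVisible, numHidden, batch = 4, 3, 5
W_np = rng.random((numVisible, numHidden))
x_np = rng.random((batch, numVisible))
z_np = rng.random((batch, numVisible))
h_np = rng.random((batch, numHidden))

W, x, z, h = array(W_np), array(x_np), array(z_np), array(h_np)
X = x
delta_o = -(x - z)
delta_h = dot(delta_o, W) * h * (1.0 - h)
grad_W = dot(X.T, delta_h)  # still just a tree; nothing computed yet

npGrad_W = grad_W.asNumpy()
print(npGrad_W.shape)
```

Because nothing is computed until asNumpy() is called, the same tree can be handed to any other backend, which is what makes the asNumpy() vs asCUDA() comparison cheap to set up. The __array_ufunc__ = None line lets plain NumPy arrays on the left of an operator still build a tree node rather than computing eagerly.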
Have you looked at the GPUArray portion of PyCUDA?
http://documen.tician.de/pycuda/array.html
While I haven't used it myself, it seems like it would be what you're looking for. In particular, check out the "Single-pass Custom Expression Evaluation" section near the bottom of that page.
