Tensorflow flatten vs numpy flatten function effect on machine learning training - python

I am starting with deep learning stuff using keras and tensorflow. At very first stage i am stuck with a doubt. when I use tf.contrib.layers.flatten (Api 1.8) for flattening a image (could be multichannel as well).
How is this different than using flatten function from numpy?
How does this affect the training. I can see the tf.contrib.layers.flatten is taking longer time than numpy flatten. Is it doing something more?
This is a very close question but here the accepted answer includes Theano and does not solve my doubts exactly.
Example:
Lets say i have a training data of (10000,2,96,96) shape. Now I need the output to be in (10000,18432) shape. I can do this using tensorflow flatten or by using numpy flatten like
X_reshaped = X_train.reshape(*X_train.shape[:1], -2)
what difference does it make in training and which is the best practice?

The biggest difference between np.flatten and tf.layers.flatten (or tf.contrib.layers.flatten) is that numpy operations are applicable only to static nd arrays, while tensorflow operations can work with dynamic tensors. Dynamic in this case means that the exact shape will be known only at runtime (either training or testing).
So my recommendation is pretty simple:
If the input data is static numpy array, e.g. in pre-processing, use np.flatten. This avoids unnecessary overhead and returns numpy array as well.
If the data is already a tensor, use any of the flatten ops provided by tensorflow. Between those, tf.layers.flatten is better choice since tf.layers API is more stable than tf.contrib.*.

Use numpy directly on your data, without participation of a neural network. This is for preprocessing and postprocessing only
Use TF or Keras layers inside models if this operation is needed for some reason in the model. This will assure model connectivity and proper backpropagation
Models are symbolic graphs meant to create Neural Networks that can be trained. There will be a proper connection and backpropagation will work properly when you have a graph connected from input to output.
If you don't intend to create a network, don't use a TF layer. If your goal just to flatten an array, you don't need a neural network.
Now if inside a model you need to change the format of the data without losing connection and backpropagation, then go for the flatten layer.

The flatten function in numpy does a complete array flattening, meaning that you end up with a single axis of data (1 dimension only).
For example,
import numpy as np
a = np.arange(20).reshape((5,4))
print(a)
print(a.flatten().shape)
In the previous example, you end up with a 1-d array of 20 elements.
In tensorflow, the flatten layer (tf.layers.flatten) preserves the batch axis (axis 0). In the previous example, with tensorflow, you would still have a shape of (5,4).
In any case, there is no effect on training if you use flatten in an equivalent way. However, you should avoid using numpy when working with tensorflow, since almost all numpy operations have their tensorflow counterparts. Tensorflow and numpy rely on different runtime libraries and combining both could be runtime inefficient.
Moreover, avoid using contrib package layers, when they already exist in the main package (use tf.layers.flatten instead of tf.contrib.layers.flatten).
For a more general performance comparison between numpy and tensorflow, have a look at this question: Tensorflow vs. Numpy Performance

Difference
When you use tensorflow flatten, it gets added as an operation (op) in the graph. It can operate only on tensors. Numpy on the other hand works on actual numpy arrays. The usage is completely different.
Usage
You would use tensorflow op if this is an operation in the training process such as resizing before feeding to the next layer.
You would use numpy op when you want to operate on actual value at that time, like reshaping for calculating accuracy at the end of training step.
So if you had a task of
tensor A -> reshape -> matrix_mul
If you use tensorflow for reshape, you can directly run the matrix_mul from session.
If you use numpy however, you'd have to run the operation in two stages (two session calls).
You calculate tensor A
You reshape it in numpy.
Run the matrix_mul by "feeding" in reshaped array.
Performance
I haven't benchmarked anything but I'd say for just a reshape operation as standalone, numpy would be faster (ignoring gpu ) , but in a process where reshape is an intermediate op, tensorflow should be faster.

Related

Can I use pytorch .backward function without having created the input forward tensors first?

I have been trying to understand RNNs better and am creating an RNN from scratch myself using numpy. I am at the point where I have calculated a Loss but it was suggested to me that rather than do the gradient descent and weight matrix updates myself, I use pytorch .backward function. I started to read some of the documentation and posts here about how it works and it seems like it will calculate the gradients where a torch tensor has requires_grad=True in the function call.
So it seems that unless create a torch tensor, I am not able to use the .backward. When I try to do this on the loss scalar, I get a 'numpy.float64' object has no attribute 'backward' error. I just wanted to confirm. Thank you!
Yes, this will only work on PyTorch Tensors.
If the tensors are on CPU, they are basically numpy arrays wrapped into PyTorch Tensors API (i.e., running .numpy() on such a tensor returns exactly the data, it can modified etc.)

Print whole tensors in Keras using print_tensor

I am using print_tensor to see the values of tensors in a custom loss function. However, this only prints the first three values! Like this:
softmax = [[-0.245408952 -0.0407191925 -0.0813238621]...]
In Tensorflow you can control this using the summarize parameter in tf.Print (not applicable in my case), but print_tensor for Keras has no such parameters, so how can I change this behaviour?
So far I have tried:
Reading the documentation (and, briefly, the code) for print_tensor, but I could not find any parameters
Very naively setting numpy to write whole arrays
import numpy
numpy.set_printoptions(threshold=numpy.nan)

TensorFlow embedding_lookup_sparse optimization

I have some embedding_vectors and I need to use the following new_embeddings:
new_embeddings = tf.nn.embedding_lookup_sparse(
params=embedding_vectors,
sp_ids=some_ids,
sp_weights=None,
)
The problem is that some_ids is really big and remarkably sparsed but constant for the given data 2-D tensor. My pipeline includes the evaluation of its indices, values and shape which I use directly with the sparse_placeholder in training loop to feed up the some_ids placeholder.
Unfortunately it is very slow. It seems that in every training step the some_ids are converted to dense tensor which seems really unnecessary and strange. Am I right about this convertion and is there any alternative for embedding_lookup_sparse?
I find tf.sparse_tensor_dense_matmul() is mush faster than tf.nn.embedding_lookup_sparse().

How to Switch from Keras Tensortype to numpy array for a custom layer?

So I have a custom layer, that does not have any weights.
In a fist step, I tried to implement the functions manipulating the input tensors in Kers. But I did not succeed because of many reasons. My second approach was to implement the functions with numpy operations, since the custom layer I am implementing does not have any weights, from my understanding, I would say, that I could use numpy operarations, as I don't need backpropagation, since there are no weights, right? And then, I would just convert the output of my layer to a tensor with:
Keras.backend.variable(value = output)
So the main idea is to implement a custom layer, that takes tensors, convert them to numpy arrays, operate on them with numpy operations, then convert the output to a tensor.
The problem is that I seem not to be able to use .eval() in order to convert the input tensors of my layer into numpy arrays, so that they could be manipulated with numpy operations.
Can anybody tell, how I can get around this problem ?
As mentioned by Daniel Möller in the comments, Keras needs to be able to backpropagate through your layer, in order the calculate the gradients for the previous layers. Your layer needs to be differentiable for this reason.
For the same reason you can only use Keras operations, as those can be automatically differentiated with autograd. If your layer is something simple, have a look at the Lambda layer where you can implement custom layers quickly.
As an aside, Keras backend functions should cover a lot of use cases, so if you're stuck with writing your layer through those, you might want to post a another question here.
Hope this helps.

Data Augmentation using GPU in Theano

I am new in Theano and Deep Learning, I am running my experiments in Theano but I would like to reduce the time I spend per epoch by doing data augmentation directly using the GPU.
Unfortunately I can not use PyCuda, so I would like to know if is possible to do basic Data Augmentation using Theano. For example Translation or Rotation in images, meanwhile I am using scipy functions in CPU using Numpy but it is quite slow.
If the data augmentation is part of your computation graph, and can be executed on GPU, it will naturally be executed on the GPU. So the question narrows down to "is it possible to do common data augmentation tasks using Theano tensor operations on the GPU".
If the transformations you want to apply are just translations, you can just use theano.tensor.roll followed by some masking. If you want the rotations as well, take a look at this implementation of spatial transformer network. In particular take a look at the _transform function, it takes as an input a matrix theta that has a 2x3 transformation (left 2x2 is rotation, and right 1x2 is translation) one per sample and the actual samples, and applies the rotation and translation to those samples. I didn't confirm that what it does is optimized for the GPU (i.e. it could be that the bottleneck of that function is executed on the CPU, which will make it not appropriate for your use case), but it's a good starting point.

Categories