I am using print_tensor to see the values of tensors in a custom loss function. However, this only prints the first three values! Like this:
softmax = [[-0.245408952 -0.0407191925 -0.0813238621]...]
In Tensorflow you can control this using the summarize parameter in tf.Print (not applicable in my case), but print_tensor for Keras has no such parameters, so how can I change this behaviour?
So far I have tried:
Reading the documentation (and, briefly, the code) for print_tensor, but I could not find any parameters
Very naively setting numpy to write whole arrays
import numpy
numpy.set_printoptions(threshold=numpy.nan)
Related
Note: I already solved my issue, but I'm posting the question in case others have it too and because I don't understand how I solved it.
I was building a Named Entity Classifier (sequence labelling model) in Keras with Tensorflow backend. When I tried to fit the model, I got this error (which, amazingly, returns only 4 Google results):
"If your data is in the form of symbolic tensors, you should specify the `steps_per_epoch` argument (instead of the batch_size argument, because symbolic tensors are expected to produce batches of input data)."
This stackoverflow post discussed the issue, and someone suggested to the op:
one of your data tensors that is being used by Fit() is a symbolic tensor. The one hot label function returns a symbolic tensor. Try something like:
label_onehot = tf.Session().run(K.one_hot(label, 5))
Then I read on this (not related) site:
The Wolfram System also has powerful algorithms to manipulate algebraic combinations of expressions representing [...] arrays. These expressions are called symbolic arrays or symbolic tensors.
These two sources made me think symbolic arrays (at least in TensorFlow) might be something more like arrays of functions that are yet to be evaluated, rather than actual values.
So, using %whos to view all my variables, I saw that my X and Y data were tensors (rather than arrays, like I normally use for my models). The data/info column had quite a complicated description for them, but I lost it once I solved my issue and I can't work out how to get back to the state where I was getting the error.
In any case, I know I solved the problem by changing my data pre-processing so that the X and y data (i.e. X_train and y_train) were of type <class 'numpy.ndarray'> and of dimensions (num sents, max len) for X_train and (num_sents, max len, 1) for y_train (the 1 is necessary because my final layer expects 3D input). Now the model works fine. But I'm still wondering, what are these symbolic tensors and how/why is using steps per epoch instead of batch size supposed to help? I tried that too initially but had no luck.
This can be solved bu using the eval() or numpy() function of your tensors.
Check:
How can I convert a tensor into a numpy array in TensorFlow?
I am starting with deep learning stuff using keras and tensorflow. At very first stage i am stuck with a doubt. when I use tf.contrib.layers.flatten (Api 1.8) for flattening a image (could be multichannel as well).
How is this different than using flatten function from numpy?
How does this affect the training. I can see the tf.contrib.layers.flatten is taking longer time than numpy flatten. Is it doing something more?
This is a very close question but here the accepted answer includes Theano and does not solve my doubts exactly.
Example:
Lets say i have a training data of (10000,2,96,96) shape. Now I need the output to be in (10000,18432) shape. I can do this using tensorflow flatten or by using numpy flatten like
X_reshaped = X_train.reshape(*X_train.shape[:1], -2)
what difference does it make in training and which is the best practice?
The biggest difference between np.flatten and tf.layers.flatten (or tf.contrib.layers.flatten) is that numpy operations are applicable only to static nd arrays, while tensorflow operations can work with dynamic tensors. Dynamic in this case means that the exact shape will be known only at runtime (either training or testing).
So my recommendation is pretty simple:
If the input data is static numpy array, e.g. in pre-processing, use np.flatten. This avoids unnecessary overhead and returns numpy array as well.
If the data is already a tensor, use any of the flatten ops provided by tensorflow. Between those, tf.layers.flatten is better choice since tf.layers API is more stable than tf.contrib.*.
Use numpy directly on your data, without participation of a neural network. This is for preprocessing and postprocessing only
Use TF or Keras layers inside models if this operation is needed for some reason in the model. This will assure model connectivity and proper backpropagation
Models are symbolic graphs meant to create Neural Networks that can be trained. There will be a proper connection and backpropagation will work properly when you have a graph connected from input to output.
If you don't intend to create a network, don't use a TF layer. If your goal just to flatten an array, you don't need a neural network.
Now if inside a model you need to change the format of the data without losing connection and backpropagation, then go for the flatten layer.
The flatten function in numpy does a complete array flattening, meaning that you end up with a single axis of data (1 dimension only).
For example,
import numpy as np
a = np.arange(20).reshape((5,4))
print(a)
print(a.flatten().shape)
In the previous example, you end up with a 1-d array of 20 elements.
In tensorflow, the flatten layer (tf.layers.flatten) preserves the batch axis (axis 0). In the previous example, with tensorflow, you would still have a shape of (5,4).
In any case, there is no effect on training if you use flatten in an equivalent way. However, you should avoid using numpy when working with tensorflow, since almost all numpy operations have their tensorflow counterparts. Tensorflow and numpy rely on different runtime libraries and combining both could be runtime inefficient.
Moreover, avoid using contrib package layers, when they already exist in the main package (use tf.layers.flatten instead of tf.contrib.layers.flatten).
For a more general performance comparison between numpy and tensorflow, have a look at this question: Tensorflow vs. Numpy Performance
Difference
When you use tensorflow flatten, it gets added as an operation (op) in the graph. It can operate only on tensors. Numpy on the other hand works on actual numpy arrays. The usage is completely different.
Usage
You would use tensorflow op if this is an operation in the training process such as resizing before feeding to the next layer.
You would use numpy op when you want to operate on actual value at that time, like reshaping for calculating accuracy at the end of training step.
So if you had a task of
tensor A -> reshape -> matrix_mul
If you use tensorflow for reshape, you can directly run the matrix_mul from session.
If you use numpy however, you'd have to run the operation in two stages (two session calls).
You calculate tensor A
You reshape it in numpy.
Run the matrix_mul by "feeding" in reshaped array.
Performance
I haven't benchmarked anything but I'd say for just a reshape operation as standalone, numpy would be faster (ignoring gpu ) , but in a process where reshape is an intermediate op, tensorflow should be faster.
So I have a custom layer, that does not have any weights.
In a fist step, I tried to implement the functions manipulating the input tensors in Kers. But I did not succeed because of many reasons. My second approach was to implement the functions with numpy operations, since the custom layer I am implementing does not have any weights, from my understanding, I would say, that I could use numpy operarations, as I don't need backpropagation, since there are no weights, right? And then, I would just convert the output of my layer to a tensor with:
Keras.backend.variable(value = output)
So the main idea is to implement a custom layer, that takes tensors, convert them to numpy arrays, operate on them with numpy operations, then convert the output to a tensor.
The problem is that I seem not to be able to use .eval() in order to convert the input tensors of my layer into numpy arrays, so that they could be manipulated with numpy operations.
Can anybody tell, how I can get around this problem ?
As mentioned by Daniel Möller in the comments, Keras needs to be able to backpropagate through your layer, in order the calculate the gradients for the previous layers. Your layer needs to be differentiable for this reason.
For the same reason you can only use Keras operations, as those can be automatically differentiated with autograd. If your layer is something simple, have a look at the Lambda layer where you can implement custom layers quickly.
As an aside, Keras backend functions should cover a lot of use cases, so if you're stuck with writing your layer through those, you might want to post a another question here.
Hope this helps.
When I read TensorFlow codes, I see people specify placeholders for the input arguments of the functions and then feed the input data in a session.run. A trivial example can be like:
def sigmoid(z):
x = tf.placeholder(tf.float32, name='x')
sigmoid = tf.sigmoid(x)
with tf.Session() as session:
result = session.run(sigmoid, feed_dict={x:z})
return result
I wonder why don't they directly feed the z into the tf.sigmoid(z) and get rid of the placeholder x?
If this is a best practice, what is the reason behind it?
In your example method sigmoid, you basically built a small computation graph (see below) and run it with session.run (in the same method). Yes, it does not add any benefit to use a place-holder in your case.
However, usually people just built the computation graph (and execute the graph with data later). But at the time of building the graph, the data is not needed. That's why we use a place-holder to hold the place of data. Or in other words, it allows us to create our computing operations without needing any data.
Also this should explain why we want to use tf.placehoder instead of tf.Variable for holding training data. In short:
tf.Variable is for trainable parameters of the model.
tf.placeholder is for training data which does not change as model trains.
No initial values are needed for placeholders.
The first dimension of data through feeding could be None thus supporting any batch_size.
Suppose I have two placeholder quantities in tensorflow: placeholder_1 and placeholder_2. Essentially I would like the following computational functionality: "if placeholder_1 is defined (ie is given a value in the feed_dict of sess.run()), compute X as f(placeholder_1), otherwise, compute X as g(placeholder_2)." Think of X as being a hidden layer in a neural network that can optionally be computed in these two different ways. Eventually I would use X to produce an output, and I'd like to backpropagate error to the parameters of f or g depending on which placeholder I used.
One could accomplish this using the tf.where(condition, x, y) function if there was a way to make the condition "placeholder_1 has a value", but after looking through the tensorflow documentation on booleans and asserts I couldn't find anything that looked applicable.
Any ideas? I have a vague idea of how I could accomplish this basically by copying part of the network, sharing parameters and syncing the networks after updates, but I'm hoping for a cleaner way to do it.
You can create a third placeholder variable of type boolean to select which branch to use and feed that in at run time.
The logic behind it is that since you are feeding in the placholders at runtime anyways you can determine outside of tensorflow which placeholders will be fed.