I recently found a proof-of-concept implementation that prepares features in a one-hot encoding using numpy.zeros:
data = np.zeros((len(raw_data), n_input, vocab_size), dtype=np.uint8)
As can be seen above, the one-hot entries are typed as np.uint8.
After inspecting the model, I realized that the input placeholder of the TensorFlow model is defined as tf.float32:
x = tf.placeholder(tf.float32, [None, n_input, vocab_size], name="onehotin")
My particular question:
How does TensorFlow deal with this "mismatch" of input types? Are those values (0/1) correctly interpreted or cast by TensorFlow? If so, is this mentioned anywhere in the docs? After googling I could not find an answer. It should be mentioned that the model runs and the values seem plausible. However, typing the input numpy features as np.float32 would require a significant amount of additional memory.
Relevance:
A model that runs but is trained on wrongly interpreted data would behave differently once the input pipeline is adapted or the model is rolled out to production.
TensorFlow supports dtype conversion like that.
In operations such as x + 1, the value 1 goes through the tf.convert_to_tensor function, which takes care of validation and conversion. The function is sometimes called explicitly, and it is also invoked under the hood; when its dtype argument is set, the value is automatically converted to that type.
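A minimal sketch of that conversion (TF 1.x-style API; the array name here is made up for illustration):
import numpy as np
import tensorflow as tf

ones_u8 = np.ones((2, 3), dtype=np.uint8)             # 0/1 features stored as uint8
t = tf.convert_to_tensor(ones_u8, dtype=tf.float32)   # validated and cast here
print(t.dtype)  # <dtype: 'float32'>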
When you feed the array into a placeholder like that:
session.run(..., feed_dict={x: data})
... the data is explicitly converted to a numpy array of the right type via an np.asarray call. See the source code in python/client/session.py. Note that this method may reallocate the buffer when the dtype is different, and that's exactly what happens in your case. So your memory optimization doesn't quite work as you expect: the temporary 32-bit data is allocated internally.
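To make this concrete, here is a minimal sketch (TF 1.x; the shapes are made up for illustration):
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 4], name="onehotin")
total = tf.reduce_sum(x)

data = np.zeros((3, 4), dtype=np.uint8)   # 1 byte per element
data[0, 1] = 1

with tf.Session() as sess:
    # This works: the uint8 array is converted internally, roughly like
    # np.asarray(data, dtype=np.float32), i.e. a temporary float32 copy
    # (4 bytes per element) is allocated for the duration of the call.
    print(sess.run(total, feed_dict={x: data}))   # 1.0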
Note: I already solved my issue, but I'm posting the question in case others have it too and because I don't understand how I solved it.
I was building a Named Entity Classifier (a sequence-labelling model) in Keras with the TensorFlow backend. When I tried to fit the model, I got this error (which, amazingly, returns only 4 Google results):
"If your data is in the form of symbolic tensors, you should specify the `steps_per_epoch` argument (instead of the batch_size argument, because symbolic tensors are expected to produce batches of input data)."
This Stack Overflow post discussed the issue, and someone suggested to the OP:
one of your data tensors that is being used by Fit() is a symbolic tensor. The one hot label function returns a symbolic tensor. Try something like:
label_onehot = tf.Session().run(K.one_hot(label, 5))
Then I read on this (not related) site:
The Wolfram System also has powerful algorithms to manipulate algebraic combinations of expressions representing [...] arrays. These expressions are called symbolic arrays or symbolic tensors.
These two sources made me think symbolic arrays (at least in TensorFlow) might be something more like arrays of functions that are yet to be evaluated, rather than actual values.
So, using %whos to view all my variables, I saw that my X and Y data were tensors (rather than arrays, like I normally use for my models). The data/info column had quite a complicated description for them, but I lost it once I solved my issue and I can't work out how to get back to the state where I was getting the error.
In any case, I know I solved the problem by changing my data pre-processing so that the X and y data (i.e. X_train and y_train) were of type <class 'numpy.ndarray'> and of dimensions (num_sents, max_len) for X_train and (num_sents, max_len, 1) for y_train (the 1 is necessary because my final layer expects 3D input). Now the model works fine. But I'm still wondering: what are these symbolic tensors, and how/why is using steps_per_epoch instead of batch_size supposed to help? I tried that too initially but had no luck.
This can be solved by using the eval() or numpy() method of your tensors.
Check:
How can I convert a tensor into a numpy array in TensorFlow?
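A minimal sketch of both options, assuming a trivial tf.one_hot tensor (eval() is the graph-mode/TF 1.x route, numpy() the eager/TF 2.x route):
import tensorflow as tf

t = tf.one_hot([0, 2, 1], depth=3)      # a symbolic tensor in graph mode

# Graph mode (TF 1.x): evaluate the symbolic tensor inside a session.
with tf.Session() as sess:
    arr = t.eval(session=sess)          # equivalently: sess.run(t)
print(arr)

# Eager mode (TF 2.x): the tensor already holds a value.
# arr = t.numpy()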
I have some embedding_vectors and I need to use the following new_embeddings:
new_embeddings = tf.nn.embedding_lookup_sparse(
    params=embedding_vectors,
    sp_ids=some_ids,
    sp_weights=None,
)
The problem is that some_ids is a really big and remarkably sparse, but constant, 2-D tensor for the given data. My pipeline includes the evaluation of its indices, values and shape, which I use directly with a sparse_placeholder in the training loop to feed the some_ids placeholder.
Unfortunately it is very slow. It seems that in every training step some_ids is converted to a dense tensor, which seems really unnecessary and strange. Am I right about this conversion, and is there any alternative to embedding_lookup_sparse?
I find tf.sparse_tensor_dense_matmul() is much faster than tf.nn.embedding_lookup_sparse().
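A hedged sketch of that replacement (TF 1.x; the ids, weights and sizes below are made up): the SparseTensor plays the role of sp_ids/sp_weights, with one row per example, and the matmul sums the corresponding embedding rows.
import tensorflow as tf

vocab_size, emb_dim = 1000, 16
embedding_vectors = tf.get_variable("emb", shape=[vocab_size, emb_dim])

# Row 0 sums the embeddings of ids 3 and 7; row 1 uses only id 42.
sp_ids = tf.SparseTensor(indices=[[0, 3], [0, 7], [1, 42]],
                         values=[1.0, 1.0, 1.0],
                         dense_shape=[2, vocab_size])

new_embeddings = tf.sparse_tensor_dense_matmul(sp_ids, embedding_vectors)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(new_embeddings).shape)   # (2, 16)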
My dataset consists of sentences. Each sentence has a variable length and is initially encoded as a sequence of vocabulary indexes, i.e. a tensor of shape [sentence_len]. The batch size is also variable.
I have grouped sentences of similar lengths into buckets and padded where necessary, to bring each sentence in a bucket to the same length.
How could I deal with having both an unknown sentence length AND batch size?
My data provider can tell me what the sentence length is at every batch, but I don't know how to feed that in: the graph is already built at that point. The input is represented with a placeholder x = tf.placeholder(tf.int32, shape=[batch_size, sentence_length], name='x'). I can turn batch_size or sentence_length into None, but not both.
UPDATE: in fact, interestingly, I can set both to None, but then I get the warning "Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory." Note: the next layer is an embedding_lookup.
I'm not sure what this means and how to avoid it. I assume it has something to do with using tf.gather later, which I need to use.
Alternatively is there any other way to achieve what I need?
Thank you.
Unfortunately there is no workaround here unless you pass a tf.Variable() (which is not possible in your case) as the parameter of tf.nn.embedding_lookup()/tf.gather().
This is happening because, when you declare them with a placeholder of shape [None, None], the tf.IndexedSlices() coming from the tf.gather() function becomes a sparse tensor of unknown shape.
I have already done projects that hit this warning. What I can tell you is that if there is a tf.nn.dynamic_rnn() after the embedding_lookup, then set the swap_memory parameter of tf.nn.dynamic_rnn() to True. Also, to avoid an OOM or Resource Exhausted error, make the batch size smaller (test different batch sizes).
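A minimal sketch of that setup (TF 1.x; the vocabulary and layer sizes are made up):
import tensorflow as tf

vocab_size = 10000
x = tf.placeholder(tf.int32, shape=[None, None], name='x')    # batch and length unknown

embeddings = tf.get_variable("embeddings", [vocab_size, 128])
inputs = tf.nn.embedding_lookup(embeddings, x)                 # gradient here is an IndexedSlices

cell = tf.nn.rnn_cell.LSTMCell(256)
outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                   swap_memory=True,           # swap activations to host memory
                                   dtype=tf.float32)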
There are already some good explanations of this. Please refer to the following Stack Overflow question:
Tensorflow dense gradient explanation?
When I read TensorFlow code, I see people specify placeholders for the input arguments of functions and then feed the input data in a session.run. A trivial example looks like this:
def sigmoid(z):
    x = tf.placeholder(tf.float32, name='x')
    sigmoid = tf.sigmoid(x)
    with tf.Session() as session:
        result = session.run(sigmoid, feed_dict={x: z})
    return result
I wonder: why don't they directly feed z into tf.sigmoid(z) and get rid of the placeholder x?
If this is a best practice, what is the reason behind it?
In your example method sigmoid, you basically build a small computation graph and run it with session.run (in the same method). Yes, using a placeholder does not add any benefit in your case.
However, usually people just build the computation graph first and execute the graph with data later. At the time of building the graph, the data is not needed; that's why we use a placeholder to hold the place of the data. In other words, it allows us to create our computing operations without needing any data.
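A minimal sketch of that "build once, feed later" pattern, reusing the sigmoid example:
import tensorflow as tf

# Build the graph once, with no data.
x = tf.placeholder(tf.float32, name='x')
y = tf.sigmoid(x)

# Later, run the same graph on as many inputs as you like.
with tf.Session() as session:
    print(session.run(y, feed_dict={x: 0.0}))          # 0.5
    print(session.run(y, feed_dict={x: [1.0, 2.0]}))   # [0.7310586 0.880797 ]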
Also, this should explain why we want to use tf.placeholder instead of tf.Variable for holding training data. In short:
tf.Variable is for trainable parameters of the model.
tf.placeholder is for training data, which does not change as the model trains.
No initial values are needed for placeholders.
The first dimension of a placeholder can be None, thus supporting any batch_size when feeding data (see the sketch below).
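A minimal sketch of that last point (the shapes are made up): the same placeholder accepts batches of different sizes.
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 3])   # batch size left open
mean = tf.reduce_mean(x)

with tf.Session() as sess:
    sess.run(mean, feed_dict={x: np.zeros((8, 3), np.float32)})    # batch of 8
    sess.run(mean, feed_dict={x: np.zeros((32, 3), np.float32)})   # batch of 32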
I am working on neural networks and have often faced problems with shapes. TensorFlow provides the keyword None so that we don't have to worry about the size of the tensor.
Is there any disadvantage to using None in place of a known numeric value for the shape?
Method 1:
input_placeholder = tf.placeholder(tf.float32, [None, None])
Method 2:
input_placeholder = tf.placeholder(tf.float32, [64, 100])
Will it make any difference when running the code?
TensorFlow's tf.placeholder() tensors do not require a fixed shape to be passed to them. This allows you to pass inputs of different shapes in later tf.Session.run() calls.
So your code will work just fine.
It doesn't have any disadvantage as such, because TensorFlow doesn't allocate any memory when you create a placeholder. When you feed the placeholder in the call to tf.Session.run(), TensorFlow allocates appropriately sized memory for the input tensors.
If you use input_placeholder in further operations in your code, defining it with None, i.e. an unconstrained shape, can cause TensorFlow to perform some checks related to the shape of the tensors while executing those ops dynamically during the Session.run() call. This is because, while building the graph, TensorFlow doesn't know the exact shape of your input.
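A small illustration of the difference (made-up shapes): with a fully specified shape the size is known while the graph is built; with None it is only resolved at run time.
import tensorflow as tf

fixed = tf.placeholder(tf.float32, [64, 100])
dynamic = tf.placeholder(tf.float32, [None, None])

print(fixed.get_shape())     # (64, 100) -- known at graph-construction time
print(dynamic.get_shape())   # (?, ?)    -- resolved only during Session.run()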