My dataset consists of sentences. Each sentence has a variable length and is initially encoded as a sequence of vocabulary indexes, i.e. a tensor of shape [sentence_len]. The batch size is also variable.
I have grouped sentences of similar lengths into buckets and padded where necessary, to bring each sentence in a bucket to the same length.
How could I deal with having both an unknown sentence length AND batch size?
My data provider tells me the sentence length at every batch, but I don't know how to feed that in: the graph is already built at that point. The input is represented by a placeholder x = tf.placeholder(tf.int32, shape=[batch_size, sentence_length], name='x'). I can set batch_size or sentence_length to None, but not both.
UPDATE: in fact, interestingly, I can set both to None, but then I get the warning "Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory." Note: the next layer is an embedding_lookup.
I'm not sure what this means or how to avoid it. I assume it has something to do with using tf.gather later, which I need to use.
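For reference, a minimal sketch of the setup that produces the warning (the vocabulary size and embedding dimension are made-up values):

import tensorflow as tf

# Both dimensions dynamic: batch size and sentence length unknown at build time.
x = tf.placeholder(tf.int32, shape=[None, None], name='x')

# Hypothetical embedding table; 10000 and 128 stand in for the real sizes.
embeddings = tf.get_variable('embeddings', shape=[10000, 128])
embedded = tf.nn.embedding_lookup(embeddings, x)

# The gradient w.r.t. `embeddings` is a tf.IndexedSlices (sparse); any op that
# needs it as a dense tensor of unknown shape triggers the warning.
loss = tf.reduce_sum(embedded)
grads = tf.gradients(loss, [embeddings])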
Alternatively, is there any other way to achieve what I need?
Thank you.
Unfortunately there is no workaround here unless you pass a tf.Variable() (which is not possible in your case) as the params argument of tf.nn.embedding_lookup()/tf.gather().
This happens because, when the input is declared as a placeholder of shape [None, None], the gradient coming out of tf.gather() is a tf.IndexedSlices() object, i.e. a sparse tensor, and converting it to a dense tensor of unknown shape produces the warning.
I have already worked on projects that hit this warning. What I can tell you is that if a tf.nn.dynamic_rnn() follows the embedding_lookup, set the swap_memory parameter of tf.nn.dynamic_rnn() to True. Also, to avoid OOM or Resource Exhausted errors, reduce the batch size (experiment with different batch sizes).
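For illustration, a sketch of that suggestion (the cell size and other hyperparameters are made up):

import tensorflow as tf

x = tf.placeholder(tf.int32, shape=[None, None], name='x')
embeddings = tf.get_variable('embeddings', shape=[10000, 128])
embedded = tf.nn.embedding_lookup(embeddings, x)

cell = tf.nn.rnn_cell.LSTMCell(num_units=256)
# swap_memory=True lets the internal while_loop swap tensors between GPU and
# CPU memory, which reduces the risk of OOM on long sequences.
outputs, state = tf.nn.dynamic_rnn(cell, embedded, dtype=tf.float32, swap_memory=True)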
There are already some good explanations of this. Please refer to the following Stack Overflow question:
Tensorflow dense gradient explanation?
Related
Keras layers can be reused, i.e. if I have l = keras.layers.Dense(5) I can apply it multiple times to different tensors, like t1 = l(t1); t2 = l(t2).
Is there anything similar in TensorFlow without using Keras?
Why do I need it: I work in non-eager mode and want to create a static .pb graph file. Suppose I have a function f(t) that is huge and long-running and applies transformations to a tensor t. Inside the graph it creates a large sub-graph of operations with tensors flowing along its paths. If I call it for every input t, a new copy of that sub-graph is formed each time: duplicates that differ only in their inputs. Instead, I want to reuse the same sub-graph and direct different tensors into it as inputs. Reuse also matters for build time: calling a huge function to form the same structure for every possible input tensor is slow.
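To illustrate, this is roughly the pattern I want. I came across tf.make_template, which appears to share the variables across calls (each call still adds ops, but they point at the same weights); I am not sure it fully avoids rebuilding the sub-graph (the shapes below are made up):

import tensorflow as tf

def f(t):
    # Stand-in for the huge transformation; creates variables internally.
    w = tf.get_variable('w', shape=[5, 5])
    return tf.matmul(t, w)

# Variables are created on the first call and shared on subsequent calls.
f_shared = tf.make_template('f_shared', f)

t1 = tf.placeholder(tf.float32, [None, 5])
t2 = tf.placeholder(tf.float32, [None, 5])
out1 = f_shared(t1)  # creates w
out2 = f_shared(t2)  # reuses the same w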
Another important reason for reusing the same operation is that the same weights and heavy parameters can then serve many calls on many inputs. It is sometimes necessary for the weights to be shared across all inputs in order for the neural network to train correctly.
The real reason for reusing is not only to save the space occupied by the graph, but also that the number of calls to f(t) may depend on the input. Suppose we have a keras.layers.Input(...) placeholder as input. Its 0-th (batch) dimension is always None (unknown) at graph construction time; the real value is only known when actual data is fed through sess.run(...). When data is fed, I want to perform as many transformations (calls to f(t)) as the size of the batch dimension, in other words call f(t) for every sub-tensor in the batch. E.g. for a batch of images I want to call f(t) for every single image, so the number of calls to f(t) differs between batch sizes. How do I achieve this? Could it be achieved through tf.while_loop, and if so, how do I use a while loop in my case?
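For concreteness, here is the kind of call I am after. I suspect tf.map_fn, which wraps tf.while_loop under the hood, might be it, but I am not sure (f and the image shape are made up):

import tensorflow as tf

def f(t):
    # Stand-in for the per-image transformation.
    return tf.reduce_mean(t)

images = tf.placeholder(tf.float32, [None, 224, 224, 3])  # batch size unknown

# The body of f is built once and executed once per batch element, so the
# number of iterations can depend on the runtime batch size.
per_image = tf.map_fn(f, images, dtype=tf.float32)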
I am trying to build a speech recognition system, which is a sequence-to-sequence model, but I am confused about how to feed the extracted features (fbank with dimension 40) to the LSTM. From what I have found, there are several ways to feed the data into an LSTM, but I am not sure I fully understand them. I would be very thankful if someone could tell me whether I am correct in the following cases.
Case 1:
In the convenient format [Batch_Size, Time_Step, Feature_Dim], if I select [1, None, 40], the length of each sequence (utterance) can vary? If so, in this case I do not need to pad each sequence, am I right?
Case 2:
If all input sequences are padded to the same length, the Batch_Size can be any value, like 64, 128, etc.?
Finally, one more question: am I right that the Time_Step within each batch must be the same?
I would be very thankful if someone could clear up my doubts or give me some suggestions.
It depends on how your system is built: is it trained end-to-end, or do you use hand-engineered features such as MFCC? One more note: a main advantage of RNNs is precisely that they accept variable-length input.
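To make the cases concrete, a sketch in TF1 style (everything except the 40-dimensional fbank features is a made-up value):

import tensorflow as tf

# Case 1: batch size 1, variable time steps -- no padding needed.
x_single = tf.placeholder(tf.float32, [1, None, 40])

# Case 2: padded sequences, any batch size; within one batch the Time_Step
# dimension must be the same, since the tensor is rectangular.
x_batch = tf.placeholder(tf.float32, [None, None, 40])
seq_len = tf.placeholder(tf.int32, [None])  # true length of each utterance

cell = tf.nn.rnn_cell.LSTMCell(num_units=128)
# sequence_length makes the RNN stop at each utterance's true length, so the
# padding does not pollute the final states.
outputs, state = tf.nn.dynamic_rnn(cell, x_batch, sequence_length=seq_len, dtype=tf.float32)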
Note: I already solved my issue, but I'm posting the question in case others have it too and because I don't understand how I solved it.
I was building a Named Entity Classifier (sequence labelling model) in Keras with Tensorflow backend. When I tried to fit the model, I got this error (which, amazingly, returns only 4 Google results):
"If your data is in the form of symbolic tensors, you should specify the `steps_per_epoch` argument (instead of the batch_size argument, because symbolic tensors are expected to produce batches of input data)."
This Stack Overflow post discussed the issue, and someone suggested to the OP:
one of your data tensors that is being used by Fit() is a symbolic tensor. The one hot label function returns a symbolic tensor. Try something like:
label_onehot = tf.Session().run(K.one_hot(label, 5))
Then I read on this (not related) site:
The Wolfram System also has powerful algorithms to manipulate algebraic combinations of expressions representing [...] arrays. These expressions are called symbolic arrays or symbolic tensors.
These two sources made me think symbolic arrays (at least in TensorFlow) might be something more like arrays of functions that are yet to be evaluated, rather than actual values.
So, using %whos to view all my variables, I saw that my X and Y data were tensors (rather than arrays, like I normally use for my models). The data/info column had quite a complicated description for them, but I lost it once I solved my issue and I can't work out how to get back to the state where I was getting the error.
In any case, I know I solved the problem by changing my data pre-processing so that the X and y data (i.e. X_train and y_train) were of type <class 'numpy.ndarray'> and of dimensions (num sents, max len) for X_train and (num_sents, max len, 1) for y_train (the 1 is necessary because my final layer expects 3D input). Now the model works fine. But I'm still wondering, what are these symbolic tensors and how/why is using steps per epoch instead of batch size supposed to help? I tried that too initially but had no luck.
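For reference, a minimal sketch of the shapes that worked for me (the sizes are made up):

import numpy as np

num_sents, max_len = 1000, 50  # made-up sizes

# Plain numpy arrays instead of symbolic tensors: fit() can slice them into
# batches itself, so the batch_size argument works as usual.
X_train = np.zeros((num_sents, max_len), dtype=np.int32)
y_train = np.zeros((num_sents, max_len, 1), dtype=np.int32)

# model.fit(X_train, y_train, batch_size=32, epochs=5)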
This can be solved by using the eval() or numpy() method of your tensors.
Check:
How can I convert a tensor into a numpy array in TensorFlow?
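For instance, a sketch assuming TF1-style graph mode (in TF2 eager mode you can call .numpy() on the tensor directly):

import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])

with tf.Session() as sess:
    arr = t.eval()  # or sess.run(t); arr is now a numpy array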
I read in a related question that Keras custom loss functions have to return one scalar per batch item.
I wrote a loss function that outputs a single scalar for the whole batch, and the network seems to converge. However, I am not able to find any documentation on this, or an explanation of what exactly happens in the code. Is there broadcasting done somewhere? What happens if I add sample weights? Does someone have a pointer to where the magic happens?
Thanks!
In general you can often use a scalar in place of a vector, and it will be interpreted as a vector filled with that value (e.g. 1 is interpreted as [1, 1, 1, 1]).
So if the result of your loss function for a batch is x, it is interpreted as if you were saying that loss for each item in the batch is x.
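As a sketch of the two shapes (using the Keras backend; how Keras reduces and weights the result afterwards is an implementation detail I am inferring, not quoting from the docs):

from tensorflow.keras import backend as K

# Per-sample loss: shape (batch_size,) -- what the docs describe.
def per_sample_loss(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)

# Scalar loss: shape () -- also accepted; effectively treated as if every
# sample in the batch had this same loss value, so per-sample weights lose
# their ability to differentiate between samples.
def scalar_loss(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true))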
I recently found a proof-of-concept implementation which prepares features as a one-hot encoding using numpy.zeros:
data = np.zeros((len(raw_data), n_input, vocab_size), dtype=np.uint8)
As can be seen above, the one-hot entries are typed as np.uint8.
After inspecting the model, I realized that the input placeholder of the tensorflow model is defined as tf.float32:
x = tf.placeholder(tf.float32, [None, n_input, vocab_size], name="onehotin")
My particular question:
How does TensorFlow deal with this "mismatch" of input types? Are those values (0/1) correctly interpreted or cast by TensorFlow? If so, is this mentioned anywhere in the docs? After googling, I could not find an answer. It should be mentioned that the model runs and the values seem plausible. However, typing the input numpy features as np.float32 would require a significant amount of additional memory.
Relevance:
A model that runs but is trained on incorrectly interpreted values would behave differently once the input pipeline is adapted or the model is rolled out to production.
TensorFlow supports dtype conversion like that.
In operations such as x + 1, the value 1 goes through the tf.convert_to_tensor function, which takes care of validation and conversion. The function is also called under the hood in many places, and when its dtype argument is set, the value is automatically converted to that type.
When you feed the array into a placeholder like that:
session.run(..., feed_dict={x: data})
... the data is explicitly converted to a numpy array of the right type via an np.asarray call. See the source code in python/client/session.py. Note that this method may reallocate the buffer when the dtype differs, and that is exactly what happens in your case. So your memory optimization doesn't quite work as you expect: the temporary 32-bit data is allocated internally.
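A quick sketch of what this means in practice (the shapes are made up):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 3])
data = np.zeros((4, 3), dtype=np.uint8)

with tf.Session() as sess:
    # Works: the uint8 array is converted via np.asarray(data, dtype=np.float32)
    # inside session.run, allocating a temporary float32 copy.
    out = sess.run(x, feed_dict={x: data})
    print(out.dtype)  # float32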