I have a sequence prediction problem in which, given the last n items in a sequence, I need to predict the next item. There are N = 60k sequences, and for each sequence I have n = 6 events and want to predict the next (7th) event. The dataset looks like this:
seq_inputs = [
["AA1", "BB3", "CC4","DD5","AA2", "CC8", "CC11"], #CC11 is target
["FF1", "DD3", "FF6","KK8","AA5", "CC8", "AA2"] #AA2 will be target
..
..
] # there are 60,000 of them, i.e. len(seq_inputs) = 60000
What I have done so far:
Up till now, I have cast it as a next-word prediction problem and used an Embedding layer plus an LSTM.
First I tokenize the sequences and convert them to numeric form using the Keras tokenizer's texts_to_sequences().
From each numeric sequence, I take the last item as the target and the first six as the input (just a dummy example below):
seq_inputs = [
[1, 10, 200, 5, 3, 90 ],
[95, 15, 4,11,78, 43]
..
..
]
targets = [40,3, ... , ... ]
And I convert the targets to categorical (one-hot) form:
targets = to_categorical(targets, num_classes=vocabulary_size)
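For reference, here is a minimal sketch of the preprocessing just described (the exact code is not in my snippet; I am assuming the tensorflow.keras Tokenizer API, and the variable names are only illustrative):
# Minimal sketch of the tokenize-and-split step described above.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical

texts = [" ".join(events) for events in seq_inputs]      # seq_inputs as shown at the top

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
encoded = np.array(tokenizer.texts_to_sequences(texts))  # shape (60000, 7)

train_inputs = encoded[:, :-1]                           # first six events
targets = encoded[:, -1]                                 # seventh event is the target
vocabulary_size = len(tokenizer.word_index) + 1
train_targets = to_categorical(targets, num_classes=vocabulary_size)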
So I feed the inputs into an Embedding + LSTM model:
model = Sequential()
model.add(Embedding(vocabulary_size, 32, input_length=seq_len)) #seq_length
model.add(LSTM(80,return_sequences=True))
..
..
..
model.fit(train_inputs,train_targets,epochs=50,verbose=1,batch_size=32)
Currently I get very poor test performance, and I feel I am not using the LSTM well for this sequential task. So I want to convert this into a many-to-one sequence problem with a single feature, where each sequence has 6 time steps with one feature each. For example:
Inputs (6 time steps and one feature for each of the 60k sequences):
seq = [[[ 1],
[10],
[200],
[5],
[3],
[90],
],
[[ 95],
[15],
[4],
[11],
[78],
[43],
],
...
...
... #60,000 of them
]
target:
targets = [40,3, ... , ... ]
Question: How can I modify this network, especially the embedding layer, to take input of this shape, where each sequence/row is 6 time steps with a single feature only? Also, is my understanding of 6 time steps and one feature correct, and if so, what else do I need to modify in the network? A rough sketch of the conversion I have in mind follows.
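Here is that rough sketch, reusing the names from the preprocessing sketch above: reshape the tokenized inputs to (60000, 6, 1) and feed them to an LSTM without an Embedding layer (I am not sure this is the right approach, which is exactly what I am asking):
# Sketch of the many-to-one, single-feature version I have in mind.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

X = np.array(train_inputs, dtype='float32').reshape(-1, 6, 1)  # (60000, 6, 1): 6 timesteps, 1 feature

model = Sequential()
model.add(LSTM(80, input_shape=(6, 1)))            # raw token ids fed directly, no Embedding
model.add(Dense(vocabulary_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(X, train_targets, epochs=50, batch_size=32)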
Please see the Python code below; I put comments in the code where I felt emphasis is required.
import keras
import numpy
def build_model():
model = keras.models.Sequential()
model.add(keras.layers.LSTM(3, input_shape = (3, 1), activation = 'elu'))# Number of LSTM cells in this layer = 3.
return model
def build_data():
inputs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
outputs = [10, 11, 12, 13, 14, 15, 16, 17, 18]
inputs = numpy.array(inputs)
outputs = numpy.array(outputs)
inputs = inputs.reshape(3, 3, 1)# Number of samples = 3, number of input vectors (timesteps) per sample = 3, size of each input vector = 1.
outputs = outputs.reshape(3, 3)# Number of target samples = 3, Number of outputs per target sample = 3.
return inputs, outputs
def train():
model = build_model()
model.summary()
model.compile(optimizer= 'adam', loss='mean_absolute_error', metrics=['accuracy'])
x, y = build_data()
model.fit(x, y, batch_size = 1, epochs = 4000)
model.save("LSTM_testModel")
def apply():
model = keras.models.load_model("LSTM_testModel")
input = [[[7], [8], [9]]]
input = numpy.array(input)
print(model.predict(input))
def main():
train()
main()
My understanding is that for each input sample there are 3 input vectors. Each input vector goes to an LSTM cell. i.e. For sample 1, input vector 1 goes to LSTM cell 1, input vector 2 goes to LSTM cell 2 and so on.
Looking at tutorials on the internet, I've seen that the number of LSTM cells is much greater than the number of input vectors e.g. 300 LSTM cells.
So say for example I have 3 input vectors per sample what input goes to the 297 remaining LSTM cells?
I tried compiling the model with 2 LSTM cells and it still accepted the 3 input vectors per sample, although I had to change the target outputs in the training data to accommodate this (change the dimensions). So what happened to the third input vector of each sample? Is it ignored?
I believe the image I am referring to (from http://karpathy.github.io/2015/05/21/rnn-effectiveness/, not reproduced here) shows that each input vector (of an arbitrary scenario) is mapped to a specific RNN cell. I may be misinterpreting it.
I will try to answer some of your questions and then consolidate the information provided in the comments for completeness, for your benefit as well as the community's.
As Matias mentioned in the comments, irrespective of whether the number of inputs is greater or smaller than the number of units/neurons, they are connected like a fully connected network (the original answer illustrated this with an image that is not reproduced here).
To understand how RNN/LSTM work internally, let's assume we have
Number of Input Features => 3 => F1, F2 and F3
Number of Timesteps => 2 => 0 and 1
Number of Hidden Layers => 1
Number of Neurons in each Hidden Layer => 5
Then what actually happens inside was illustrated in the original answer with screenshots, which are not reproduced here.
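As a substitute, a quick toy shape check (my own sketch, using the dimensions assumed above) shows that the output size depends only on the number of units, not on the number of timesteps:
# 2 timesteps, 3 features per timestep, 5 units in the single hidden LSTM layer.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(5, input_shape=(2, 3)),
])
model.summary()   # final output shape is (None, 5): one value per unit, independent of the 2 timesteps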
You also asked about words being assigned to LSTM cells. I am not sure which link you are referring to or whether it is correct, but in simple terms (the words would actually be replaced by embedding vectors), the original answer showed how an LSTM handles text in another screenshot, likewise not reproduced here.
For more information, please refer to the excellent explanations by OverLordGoldDragon and Daniel Moller.
Hope this helps. Happy Learning!
I'm new to Keras. I am trying to implement this model https://www.aclweb.org/anthology/D15-1167 for document classification, and I want to use an LSTM to get sentence representations. I have trained word vector representations separately with a skip-gram model on my dataset. Now, after splitting each document into sentences, each sentence into words, and converting each word to its corresponding integer in the dictionary, I have something like this for each document:
[[54,32,13],[21,43,2]...[28,1,9]]
I should feed each sentence to an LSTM to get a sentence vector, then feed the sentence vectors to a different LSTM at a higher layer to get a document representation, and then apply classification to it. My problem is in the first layer: how should I feed the sentences to the LSTMs simultaneously (so that at each time step each LSTM is applied to a word vector from its sentence)?
Edit: I just used TimeDistributed and it seems to work, although I am not sure it does what I want. I used the TimeDistributed wrapper over the embedding layer and then over the first LSTM layer. This is the (very simple) model that I have implemented:
model.add(tf.keras.layers.TimeDistributed(embeding_layer))
model.add(tf.keras.layers.TimeDistributed(layers.LSTM(50, activation='relu')))
model.add(layers.LSTM(50, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Is my interpretation of the network correct?
My interpretation:
My input to the embedding layer is (documents, sentences, words). I padded each document to 30 sentences and each sentence to 200 words. I have 20,000 documents, so my input shape is (20000, 30, 200). After feeding it to the network, it first goes through the embedding layer, which produces a 300-dimensional vector for each word. So after applying the embedding layer to the first document with shape (1, 30, 200), I get (1, 30, 200, 300), which is the input for the TimeDistributed LSTM. TimeDistributed then makes 30 copies of the LSTM layer with shared weights, where each LSTM outputs a sentence vector, and the next LSTM is applied to these 30 sentence vectors. Am I right?
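To check my interpretation, this is a small shape-check sketch I would run, mirroring my model above (the vocabulary size here is just a placeholder):
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim = 50000, 300   # vocab_size is a placeholder
max_sents, max_words = 30, 200

inp = tf.keras.Input(shape=(max_sents, max_words))                        # (batch, 30, 200) word ids
x = layers.TimeDistributed(layers.Embedding(vocab_size, embed_dim))(inp)  # (batch, 30, 200, 300)
x = layers.TimeDistributed(layers.LSTM(50))(x)                            # (batch, 30, 50): one vector per sentence
x = layers.LSTM(50)(x)                                                    # (batch, 50): document vector
out = layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inp, out)
model.summary()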
The example below might be what you are looking for, or at least point you in the right direction. It's a bit experimental on my part, but I believe it has the right structure. It was created in Google Colab with TensorFlow 2.0. The first section is provided to make the processing reproducible, but the rest illustrates the general idea of using a TimeDistributed layer along with masking and padding. BTW, I believe this is a similar idea to what @El Sheikh (first comment above) was suggesting. Note: I used a SimpleRNN here, but I believe the idea applies to LSTMs as well. I hope this helps get you moving in the right direction.
%tensorflow_version 2.x
import numpy as np
import tensorflow as tf
import random as rn
# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(42)
# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
rn.seed(12345)
# Force TensorFlow to use single thread.
# Multiple threads are a potential source of non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/
session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1,
inter_op_parallelism_threads=1)
# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/set_random_seed
tf.compat.v1.set_random_seed(1234)
sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
tf.compat.v1.keras.backend.set_session(sess)
# The code above here is provided to make the below reproducible each time you
# run.
#
# Main code follows:
from tensorflow import keras
from tensorflow.keras import layers
# Input structure
# Sentence1 ..... SentenceM
# Word11 Word21 Word31 ..... Wordn11 Word11 .... WordnM1
# Word12 Word22 Word32 Wordn12 Word12 WordnM2
# Word13 Word23 Word33 Wordn13 Word13 WordnM3
# example parameters
word_vec_dimension = 3 # dimension of the embedding
sentence_representation = 4 # dimensionality of sentence vector
#
# This represents a single test document.
# Each row is a sentence and the words are represented by 3 dimensionsal
# integer vectors.
#
raw_inputs = [ [ [1, 5, 7], [2, 6, 7] ],
[ [9, 6, 3], [1, 8, 2], [4, 5, 9], [8, 2, 1] ],
[ [1, 6, 2], [4, 2, 9] ],
[ [2, 6, 2], [8, 2, 9] ],
[ [3, 6, 2], [2, 2, 9], [1, 6, 2] ],
]
print(raw_inputs)
# Create the model
#
# Allow for variable number of words per sentence and variable number of
# sentences:
# Input shape(num_samples, [SentenceCount], [WordCount], word_vector_dim)
#
# Note: Using None for Sentence Count, and None for Word count to allow
# for variable sequences length in both these dimensions.
#
inputs = keras.Input(shape=(None, None, word_vec_dimension), name='inputlayer')
x = tf.keras.layers.Masking(mask_value=0.0)(inputs) # Force RNNs to ignore timesteps with zero vectors.
x = tf.keras.layers.TimeDistributed(layers.SimpleRNN(sentence_representation,
use_bias=False,
activation=None),
name='TD1')(x)
outputs = x
# more layers here if needed:
model = tf.keras.Model(inputs=inputs, outputs=outputs, name='Sentiment')
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
model.summary()
# Set up fitting calls
import numpy as np
# document 1
x_train = raw_inputs # use the dummy document for testing
# Set zeros in locations where there is no data to indicate mask to RNN's so
# they ignore that timestep.
padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(x_train,
padding='post')
print(x_train)
# Insert a dummy dimension 1 to represent the sample dimension.
padded_inputs = np.expand_dims(padded_inputs,axis=0)/1.0 # Make float type
print(padded_inputs)
print(padded_inputs.shape)
y_train = np.array([[ 1.0, 2.0, 3.0, 4.0 ]])
print(y_train.shape)
# Train model:
model.fit(padded_inputs,y_train,epochs=1)
print('get_weights:')
print(model.get_layer(name='TD1').get_weights())
print('get_predictions:')
print(model.predict(padded_inputs))
Context:
I am currently working on time series prediction using Keras with Tensorflow backend and, therefore, studied the tutorial provided here.
Following this tutorial, I came to the point where the generator for the fit_generator() method is described.
The output this generator generates is as follows (left sample, right target):
[[[10. 15.]
[20. 25.]]] => [[30. 35.]] -> Batch no. 1: 2 Samples | 1 Target
---------------------------------------------
[[[20. 25.]
[30. 35.]]] => [[40. 45.]] -> Batch no. 2: 2 Samples | 1 Target
---------------------------------------------
[[[30. 35.]
[40. 45.]]] => [[50. 55.]] -> Batch no. 3: 2 Samples | 1 Target
---------------------------------------------
[[[40. 45.]
[50. 55.]]] => [[60. 65.]] -> Batch no. 4: 2 Samples | 1 Target
---------------------------------------------
[[[50. 55.]
[60. 65.]]] => [[70. 75.]] -> Batch no. 5: 2 Samples | 1 Target
---------------------------------------------
[[[60. 65.]
[70. 75.]]] => [[80. 85.]] -> Batch no. 6: 2 Samples | 1 Target
---------------------------------------------
[[[70. 75.]
[80. 85.]]] => [[90. 95.]] -> Batch no. 7: 2 Samples | 1 Target
---------------------------------------------
[[[80. 85.]
[90. 95.]]] => [[100. 105.]] -> Batch no. 8: 2 Samples | 1 Target
The tutorial uses the TimeseriesGenerator, but for my question it is secondary whether a custom generator or this class is used.
Regarding the data, we have 8 steps_per_epoch and a sample of shape (8, 1, 2, 2).
The generator is fed to a Recurrent Neural Network, implemented by an LSTM.
My questions
fit_generator() only allows a single target per batch, as output by the TimeseriesGenerator.
When I first read about the option of batches for fit(), I thought that I could have multiple samples and a corresponding number of targets (which are processed batchwise, meaning row by row). But this is not allowed by fit_generator() and, therefore, obviously false.
This would look for example like:
[[[10. 15. 20. 25.]]] => [[30. 35.]]
[[[20. 25. 30. 35.]]] => [[40. 45.]]
|-> Batch no. 1: 2 Samples | 2 Targets
---------------------------------------------
[[[30. 35. 40. 45.]]] => [[50. 55.]]
[[[40. 45. 50. 55.]]] => [[60. 65.]]
|-> Batch no. 2: 2 Samples | 2 Targets
---------------------------------------------
...
Secondly, I thought that, for example, [10, 15] and [20, 25] were used as input for the RNN consecutively for the target [30, 35], meaning that this is analogous to inputting [10, 15, 20, 25]. Since the output from the RNN differs with the second approach (I tested it), this also has to be a wrong conclusion.
Hence, my questions are:
Why is only one target per batch allowed (I know there are some workarounds, but there has to be a reason)?
How may I understand the calculation of one batch? Meaning, how is some input like [[[40, 45], [50, 55]]] => [[60, 65]] processed, and why is it not analogous to [[[40, 45, 50, 55]]] => [[60, 65]]?
Edit according to @today's answer
Since there is some misunderstanding about my definition of samples and targets, I follow what I understand Keras is trying to tell me when it says:
ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 2 target samples.
This error occurs, when I create for example a batch which looks like:
#This is just a single batch - Multiple batches would be fed to fit_generator()
(array([[[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]]]),
array([[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]]))
This is supposed to be a single batch containing two time-sequences of length 5 (5 consecutive data points / time-steps), whose targets are also two corresponding sequences. [ 5, 6, 7, 8, 9] is the target of [0, 1, 2, 3, 4] and [10, 11, 12, 13, 14] is the corresponding target of [5, 6, 7, 8, 9].
The sample-shape in this would be shape(number_of_batches, number_of_elements_per_batch, sequence_size) and the target-shape shape(number_of_elements_per_batch, sequence_size).
Keras sees 2 target samples (in the ValueError) because I provide 3D samples as input and 2D targets as output (maybe I just don't get how to provide 3D targets).
Anyhow, according to @today's answer/comments, Keras interprets this as two timesteps and five features. Regarding my first question (where I still see a sequence as the target for my sequence, as in this edit example), I am looking for information on how/whether I can achieve this and what such a batch would look like (as I tried to visualize in the question).
Short answers:
Why is only one target per batch allowed (I know there are some workarounds, but there has to be a reason)?
That's not the case at all. There is no restriction on the number of target samples in a batch. The only requirement is that you should have the same number of input and target samples in each batch. Read the long answer for further clarification.
How may I understand the calculation of one batch? Meaning, how is some input like [[[40, 45], [50, 55]]] => [[60, 65]] processed and why is it not analogous to [[[40, 45, 50, 55]]] => [[60, 65]]?
The first one is a multi-variate timeseries (i.e. each timestep has more than one feature), and the second one is a uni-variate timeseries (i.e. each timestep has one feature). So they are not equivalent. Read the long answer for further clarification.
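Before the long answer, a quick shape check (a small sketch I am adding for illustration) makes the distinction concrete:
import numpy as np

mv = np.array([[[40, 45], [50, 55]]])      # (1, 2, 2): one sample, 2 timesteps, 2 features
uv = np.array([[[40], [45], [50], [55]]])  # (1, 4, 1): one sample, 4 timesteps, 1 feature
print(mv.shape, uv.shape)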
Long answer:
I'll give the answer I mentioned in the comments section and try to elaborate on it using examples:
I think you are mixing up samples, timesteps, features and targets. Let me describe how I understand it: in the first example you provided, it seems that each input sample consists of 2 timesteps, e.g. [10, 15] and [20, 25], where each timestep consists of two features, e.g. 10 and 15 or 20 and 25. Further, the corresponding target consists of one timestep, e.g. [30, 35], which also has two features. In other words, each input sample in a batch must have a corresponding target. However, the shape of each input sample and its corresponding target may not necessarily be the same.
For example, consider a model where both its input and output are timeseries. If we denote the shape of each input sample as (input_num_timesteps, input_num_features) and the shape of each target (i.e. output) array as (output_num_timesteps, output_num_features), we would have the following cases:
1) The number of input and output timesteps are the same (i.e. input_num_timesteps == output_num_timesteps). Just as an example, the following model could achieve this:
from keras import layers
from keras import models
inp = layers.Input(shape=(input_num_timesteps, input_num_features))
# a stack of RNN layers on top of each other (this is optional)
x = layers.LSTM(..., return_sequences=True)(inp)
# ...
x = layers.LSTM(..., return_sequences=True)(x)
# a final RNN layer that has `output_num_features` unit
out = layers.LSTM(output_num_features, return_sequences=True)(x)
model = models.Model(inp, out)
2) The number of input and output timesteps are different (i.e. input_num_timesteps != output_num_timesteps). This is usually achieved by first encoding the input timeseries into a vector using a stack of one or more LSTM layers, and then repeating that vector output_num_timesteps times to get a timeseries of the desired length. For the repeat operation, we can easily use the RepeatVector layer in Keras. Again, just as an example, the following model could achieve this:
from keras import layers
from keras import models
inp = layers.Input(shape=(input_num_timesteps, input_num_features))
# a stack of RNN layers on top of each other (this is optional)
x = layers.LSTM(..., return_sequences=True)(inp)
# ...
x = layers.LSTM(...)(x) # The last layer ONLY returns the last output of RNN (i.e. return_sequences=False)
# repeat `x` as needed (i.e. as the number of timesteps in output timseries)
x = layers.RepeatVector(output_num_timesteps)(x)
# a stack of RNN layers on top of each other (this is optional)
x = layers.LSTM(..., return_sequences=True)(x)
# ...
out = layers.LSTM(output_num_features, return_sequences=True)(x)
model = models.Model(inp, out)
As a special case, if the number of output timesteps is 1 (e.g. the network is trying to predict the next timestep given the last t timesteps), we may not need to use repeat and instead we can just use a Dense layer (in this case the output shape of the model would be (None, output_num_features), and not (None, 1, output_num_features)):
inp = layers.Input(shape=(input_num_timesteps, input_num_features))
# a stack of RNN layers on top of each other (this is optional)
x = layers.LSTM(..., return_sequences=True)(inp)
# ...
x = layers.LSTM(...)(x) # The last layer ONLY returns the last output of RNN (i.e. return_sequences=False)
out = layers.Dense(output_num_features, activation=...)(x)
model = models.Model(inp, out)
Note that the architectures provided above are just for illustration, and you may need to tune or adapt them, e.g. by adding more layers such as Dense layer, based on your use case and the problem you are trying to solve.
Update: The problem is that you are not paying enough attention when reading my comments and answer, as well as the error raised by Keras. The error clearly states that:
... Found 1 input samples and 2 target samples.
So, after reading this carefully, if I were you I would say to myself: "OK, Keras thinks that the input batch has 1 input sample, but I think I am providing two samples!! Since I am a very good person(!), I think it's much more likely that I am wrong than Keras, so let's find out what I am doing wrong!" A simple and quick check is to just examine the shape of the input array:
>>> np.array([[[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]]]).shape
(1,2,5)
"Oh, it says (1,2,5)! So that means one sample which has two timesteps and each timestep has five features!!! So I was wrong into thinking that this array consists of two samples of length 5 where each timestep is of length 1!! So what should I do now???" Well, you can fix it, step-by-step:
# step 1: I want a numpy array
s1 = np.array([])
# step 2: I want it to have two samples
s2 = np.array([
[],
[]
])
# step 3: I want each sample to have 5 timesteps of length 1 in them
s3 = np.array([
[
[0], [1], [2], [3], [4]
],
[
[5], [6], [7], [8], [9]
]
])
>>> s3.shape
(2, 5, 1)
Voila! We did it! That was the input array; now check the target array. It must have two target samples of length 5, each with one feature, i.e. a shape of (2, 5, 1):
>>> np.array([[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]]).shape
(2,5)
Almost! The last dimension (i.e. 1) is missing (NOTE: depending on the architecture of your model you may or may not need that last axis). So we can use the step-by-step approach above to find our mistake, or alternatively we can be a bit clever and just add an axis to the end:
>>> t = np.array([[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> t = np.expand_dims(t, axis=-1)
>>> t.shape
(2, 5, 1)
Sorry, I can't explain it better than this! But in any case, when you see that something (i.e. shape of input/target arrays) is repeated over and over in my comments and my answer, assume that it must be something important and should be checked.
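To tie the corrected shapes back to case 1 above, here is a minimal end-to-end toy sketch (my own example, with arbitrary layer sizes, not from the original post):
import numpy as np
from keras import layers, models

# Inputs and targets with shape (2, 5, 1): 2 samples, 5 timesteps, 1 feature.
x = np.array([[[0], [1], [2], [3], [4]],
              [[5], [6], [7], [8], [9]]], dtype='float32')
y = np.array([[[5], [6], [7], [8], [9]],
              [[10], [11], [12], [13], [14]]], dtype='float32')

inp = layers.Input(shape=(5, 1))
h = layers.LSTM(16, return_sequences=True)(inp)
out = layers.LSTM(1, return_sequences=True)(h)   # one output feature per timestep
model = models.Model(inp, out)
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=1, verbose=0)
print(model.predict(x).shape)   # (2, 5, 1)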
I need help implementing a sequence classifier prediction using Keras APIs.
So far, I've managed to get the data in a specific format that I believe should be suited for input into Keras, however, I am still failing to understand the exact specifics on what parameters I need to change.
Here is the situation:
What I have is a number of targets, with each target representing an independent event.
Each target contains a varying number of detections. For example, one target might contain 33 detections, while another might contain 54. Each detection is just a single value between 0 and 1. The original dataset has a shape of (# samples, # detections).
I want to be able to input the sequence of these detections into an LSTM to classify the overall target into one of two classes, for ALL targets.
So far, I've prepended 0s to the detection sequences so that they are equal in length. Now the dataset has a shape of (# samples, 77 (the max detections across all targets)).
Then I create time steps with an arbitrary window size of 7. The dataset now has shape (# samples, 77 - window + 1 = 71, 7).
In case this isn't quite clear, each sequence has been turned from one long sequence
[1, 2, 3, ... 77]
into 71 sequences of 7 that look like:
[[1, 2, 3, 4, 5, 6, 7],
[2, 3, 4, 5, 6, 7, 8],
...,
[71, 72, 73, 74, 75, 76, 77]]
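For reference, this is roughly how I build that padded, windowed array (`detections` is a placeholder name for my list of variable-length detection sequences; the real code differs slightly):
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

window = 7
padded = pad_sequences(detections, maxlen=77, padding='pre', dtype='float32')  # (n_samples, 77), zeros prepended
train_new = np.stack(
    [padded[:, i:i + window] for i in range(padded.shape[1] - window + 1)],
    axis=1)
print(train_new.shape)  # (n_samples, 71, 7)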
Now that my data is in the format (# samples, # windows per sample, window), what tweaks should I make in order to obtain one classification output per sample?
I've looked at the Keras documentation on TimeDistributed and LSTM layers, at the blog posts on MachineLearningMastery, and at other forum posts, but I couldn't quite understand enough to figure out how to use the API for my specific case.
Here's what I have so far:
train_new.shape
output: (31179, 71, 7)
model = Sequential()
model.add(LSTM(100, input_shape=(71, 7), return_sequences=True))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=['accuracy'])
model.fit(train_new, label_train, validation_data=(test, label_test), epochs=3, batch_size=128)
Returns:
ValueError: Error when checking target: expected time_distributed_3 to have 3 dimensions, but got array with shape (31179, 1)
Any direction or guidance would be greatly appreciated.
Thanks for your time!
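For completeness, here is a hedged sketch of the kind of tweak I am asking about (whether it is the right fix is exactly my question): removing the TimeDistributed wrapper so the LSTM returns only its final output, giving one prediction per sample:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(100, input_shape=(71, 7)))   # return_sequences defaults to False: one vector per sample
model.add(Dense(1, activation='sigmoid'))   # single probability per sample, matching targets of shape (31179, 1)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])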
What is the best way to handle sparse vectors of size (about) 30,000, where all indices are zero except one index with value one (a one-hot vector)?
In my dataset I have a sequence of values that I convert to one one-hot vector for each value. Here is what I currently do:
# Create some queues to read data from .csv files
...
# Parse example(/line) from the data file
example = tf.decode_csv(value, record_defaults=record_defaults)
# example now looks like (e.g) [[5], [1], [4], [38], [571], [9]]
# [5] indicates the length of the sequence
# 1, 4, 38, 571 is the input sequence
# 4, 38, 571, 9 is the target sequence
# Create 1-HOT vectors for each value in the sequence
sequence_length = example[0]
one_hots = example[1:]
one_hots = tf.reshape(one_hots, [-1])
one_hots = tf.one_hot(one_hots, depth=n_classes)
# Grab the first values as the input features and the last values as target
features = one_hots[:-1]
targets = one_hots[1:]
...
# The sequence_length, features and targets are added to a list
# and the list is sent into a batch with tf.train_batch_join(...).
# So now I can get batches and feed into my RNN
...
This works, but I am convinced that it could be done in a more efficient way. I looked at SparseTensor, but I could not figure out how to create SparseTensors from the example tensor I get from tf.decode_csv. And I read somewhere that it is best to parse the data after it is retrieved as a batch; is this still true?
Here is a pastebin of the full code. From line 32 onward is my current way of creating one-hot vectors.
Instead of converting your inputs to sparse one-hot vectors, it is preferable to use tf.nn.embedding_lookup, which simply selects the relevant rows of the matrix you would otherwise multiply by. This is equivalent to multiplying the matrix by the one-hot vector.
Here is a usage example:
# Legacy TensorFlow 1.x-style example (sessions and initialize_all_variables).
import numpy as np
import tensorflow as tf

embed_dim = 3
vocab_size = 10
E = np.random.rand(vocab_size, embed_dim)
print(E)

embeddings = tf.Variable(E)
examples = tf.Variable(np.array([4, 5, 2, 9]).astype('int32'))
examples_embedded = tf.nn.embedding_lookup(embeddings, examples)

s = tf.InteractiveSession()
s.run(tf.initialize_all_variables())
print('')
print(examples_embedded.eval())
Also see this example in the im2txt project for how to feed this kind of data to RNNs (the line saying seq_embeddings = tf.nn.embedding_lookup(embedding_map, self.input_seqs)).
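To see the equivalence concretely, here is a small check (my own sketch, written against the eager TF 2 API rather than the session-based API above):
import numpy as np
import tensorflow as tf

embed_dim, vocab_size = 3, 10
E = tf.constant(np.random.rand(vocab_size, embed_dim), dtype=tf.float32)
ids = tf.constant([4, 5, 2, 9])

via_lookup = tf.nn.embedding_lookup(E, ids)              # select rows directly
via_one_hot = tf.matmul(tf.one_hot(ids, vocab_size), E)  # explicit one-hot multiplication

print(np.allclose(via_lookup.numpy(), via_one_hot.numpy()))  # True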