I am teaching myself the tf.data API. I am using the MNIST dataset for binary classification. The training x and y data are zipped together into the full train_dataset. Chained onto this zip call is the batch() dataset method, so the data is batched with a batch size of 128. Since my training set size is 11623, with batch size 128 I will have 91 batches. The size of the last batch will be 103, which is fine since this is an LSTM. Additionally, I am using dropout. When I compute the batch accuracy, I turn dropout off.
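The batch count quoted above can be checked with a few lines of plain Python. This is only a sketch of the arithmetic; the helper name batch_layout is made up for illustration and is not part of tf.data.

```python
import math

def batch_layout(num_examples, batch_size):
    """Return (number of batches, size of the final batch)."""
    num_batches = math.ceil(num_examples / batch_size)
    last_batch = num_examples - (num_batches - 1) * batch_size
    return num_batches, last_batch

num_batches, last_batch = batch_layout(11623, 128)
print(num_batches, last_batch)  # 91 103
```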
The full code is given below:
#Ignore the warnings
import warnings
import pandas as pd
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (8,7)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/")
Xtrain = mnist.train.images[mnist.train.labels < 2]
ytrain = mnist.train.labels[mnist.train.labels < 2]
#Data parameters
num_inputs = 28
num_steps = 28  # each 784-pixel image is reshaped to 28 steps of 28 inputs
num_classes = 2
# create the training dataset
Xtrain = tf.data.Dataset.from_tensor_slices(Xtrain).map(lambda x: tf.reshape(x,(num_steps, num_inputs)))
# apply a one-hot transformation to each label for use in the neural network
ytrain = tf.data.Dataset.from_tensor_slices(ytrain).map(lambda z: tf.one_hot(z, num_classes))
# zip the x and y training data together and batch and Prefetch data for faster consumption
train_dataset = tf.data.Dataset.zip((Xtrain, ytrain)).batch(128).prefetch(128)
iterator = tf.data.Iterator.from_structure(train_dataset.output_types,train_dataset.output_shapes)
X, y = iterator.get_next()
training_init_op = iterator.make_initializer(train_dataset)
#### model is here ####
#Network parameters
num_epochs = 2
batch_size = 128
output_keep_var = 0.5
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Training cycle
    for epoch in range(0, num_epochs):
        sess.run(training_init_op)  # point the iterator at the start of train_dataset
        num_batch = 0
        print ("Epoch: ", epoch)
        avg_cost = 0.
        avg_accuracy =0
        total_batch = int(11623 / batch_size + 1)
        while True:
            try:
                _, miniBatchCost = sess.run([trainer, loss], feed_dict={output_keep_prob: output_keep_var})
                miniBatchAccuracy = sess.run(accuracy, feed_dict={output_keep_prob: 1.0})
                print('Batch %d: loss = %.2f, acc = %.2f' % (num_batch, miniBatchCost, miniBatchAccuracy * 100))
                num_batch +=1
            except tf.errors.OutOfRangeError:
                break
When I run this code, it seems it is working and printing:
Batch 0: loss = 0.67276, acc = 0.94531
Batch 1: loss = 0.65672, acc = 0.92969
Batch 2: loss = 0.65927, acc = 0.89062
Batch 3: loss = 0.63996, acc = 0.99219
Batch 4: loss = 0.63693, acc = 0.99219
Batch 5: loss = 0.62714, acc = 0.9765
Batch 39: loss = 0.16812, acc = 0.98438
Batch 40: loss = 0.10677, acc = 0.96875
Batch 41: loss = 0.11704, acc = 0.99219
Batch 42: loss = 0.10592, acc = 0.98438
Batch 43: loss = 0.09682, acc = 0.97656
Batch 44: loss = 0.16449, acc = 1.00000
However, as one can easily see, something is wrong. Only 45 batches are printed, not 91, and I do not know why. I have tried many things and I think I am missing something.
I could use the repeat() function, but I do not want to, because it would put redundant observations into the last batch; I want the LSTM to handle the smaller last batch instead.
This is an annoying pitfall when defining a model based directly on the get_next() output of a tf.data iterator. In your loop, you have two sess.run calls, both of which will advance the iterator by one step. This means each loop iteration actually consumes two batches (and also your loss and accuracy calculations are computed on different batches).
Not entirely sure if there is a "canonical" way of fixing this, but you could either:
1. compute the accuracy in the same run call as the cost/training step. This would mean that the accuracy calculation is also affected by the dropout mask, but since it's an approximate value based on only one batch, that shouldn't be a huge issue.
2. define your model based on a placeholder instead, and in each loop iteration run the get_next op itself, then feed the resulting numpy arrays (i.e. the batch) into the loss/accuracy computations.
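To make the batch count concrete, here is a plain-Python analogy rather than TensorFlow code: a generator stands in for the tf.data iterator, next() stands in for a sess.run call on anything derived from get_next(), and StopIteration plays the role of tf.errors.OutOfRangeError. The function names are made up for illustration.

```python
def batches(num_batches):
    """Stand-in for a tf.data iterator: yields each batch index once."""
    for index in range(num_batches):
        yield index

def count_printed(num_batches, runs_per_iteration):
    """Count the loop iterations that reach the print statement when each
    iteration pulls `runs_per_iteration` batches from the iterator."""
    iterator = batches(num_batches)
    printed = 0
    while True:
        try:
            for _ in range(runs_per_iteration):
                next(iterator)  # every run call advances the iterator
            printed += 1
        except StopIteration:  # analogous to tf.errors.OutOfRangeError
            break
    return printed

print(count_printed(91, 2))  # 45
print(count_printed(91, 1))  # 91
```

With two pulls per iteration, the 46th iteration still consumes the last batch for training but fails on the second pull before anything is printed, which matches the output stopping at "Batch 44".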
Suppose you have a network that has so far used feed_dict to inject data into the graph. Every few epochs, I evaluate the training and test loss by feeding a batch from either dataset to my graph.
Now, for performance reasons, I decided to use an input pipeline. Take a look at this dummy example:
import tensorflow as tf
import numpy as np
dataset_size = 200
batch_size= 5
dimension = 4
# create some training dataset
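# assumed: random normal data, as in the initializable-iterator snippet further down
dataset = tf.data.Dataset.\
    from_tensor_slices(np.random.normal(2.0, size=(dataset_size, dimension)).astype(np.float32))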
dataset = dataset.batch(batch_size) # take batches
iterator = dataset.make_initializable_iterator()
x = tf.cast(iterator.get_next(),tf.float32)
w = tf.Variable(np.random.normal(size=(1,dimension)).astype(np.float32))
loss_func = lambda x,w: tf.reduce_mean(tf.square(x-w)) # notice that the loss function is a mean!
loss = loss_func(x,w) # this is the loss that will be minimized
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(iterator.initializer)
    # train one epoch
    for i in range(dataset_size//batch_size):
        # the training step will update the weights based on ONE batch of examples each step
        loss1,_ = sess.run([loss,train_op])
        print('train step {:d}. batch loss {:f}.'.format(i,loss1))
        # I want to print the loss from another dataset (test set) here
Printing the loss of the training data is no problem, but how do I do this for another dataset? When using feed_dict, I simply took a batch from that set and fed it as the value for x.
There are several things you can do for that. One simple option could be something like having two datasets and iterators and use tf.cond to switch between them. However, the more powerful way of doing it is to use an iterator that supports this directly. See the guide on how to create iterators for a description of the various iterator types. For example, using a reinitializable iterator you could have something like this:
import tensorflow as tf
import numpy as np
dataset_size = 200
dataset_test_size = 20
batch_size= 5
dimension = 4
# create some training dataset
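# assumed: random normal data, as in the initializable-iterator snippet further down
dataset = tf.data.Dataset.\
    from_tensor_slices(np.random.normal(2.0, size=(dataset_size, dimension)).astype(np.float32))
dataset = dataset.batch(batch_size) # take batches
# create some test dataset
dataset_test = tf.data.Dataset.\
    from_tensor_slices(np.random.normal(2.0, size=(dataset_test_size, dimension)).astype(np.float32))
dataset_test = dataset_test.batch(batch_size) # take batches
iterator = tf.data.Iterator.from_structure(dataset.output_types,
                                           dataset.output_shapes)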
dataset_init_op = iterator.make_initializer(dataset)
dataset_test_init_op = iterator.make_initializer(dataset_test)
x = tf.cast(iterator.get_next(),tf.float32)
w = tf.Variable(np.random.normal(size=(1,dimension)).astype(np.float32))
loss_func = lambda x,w: tf.reduce_mean(tf.square(x-w)) # notice that the loss function is a mean!
loss = loss_func(x,w) # this is the loss that will be minimized
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # point the shared iterator at the training data
    sess.run(dataset_init_op)
    # train one epoch
    for i in range(dataset_size//batch_size):
        # the training step will update the weights based on ONE batch of examples each step
        loss1,_ = sess.run([loss,train_op])
        print('train step {:d}. batch loss {:f}.'.format(i,loss1))
    # print test loss
    sess.run(dataset_test_init_op)
    for i in range(dataset_test_size//batch_size):
        loss1 = sess.run(loss)
        print('test step {:d}. batch loss {:f}.'.format(i,loss1))
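As a rough mental model of the reinitializable iterator, the following plain-Python toy (not TensorFlow; the class and method names are invented for this sketch) keeps one get_next entry point and re-points it at a different dataset each time it is initialized:

```python
class ReinitializableIterator:
    """Toy stand-in: a single get_next() whose data source can be swapped,
    loosely mirroring running iterator.make_initializer(dataset)."""

    def __init__(self):
        self._source = iter(())

    def initialize(self, dataset):
        # Re-pointing always restarts from the first element of `dataset`.
        self._source = iter(dataset)

    def get_next(self):
        return next(self._source)

train_batches = [[1, 2], [3, 4]]
test_batches = [[10, 20]]

it = ReinitializableIterator()
it.initialize(train_batches)
first_train = it.get_next()
it.initialize(test_batches)    # switch to the test data
first_test = it.get_next()
it.initialize(train_batches)   # switching back restarts the training data
restarted = it.get_next()
print(first_train, first_test, restarted)  # [1, 2] [10, 20] [1, 2]
```

The last line illustrates a consequence worth keeping in mind: going back to the training set starts it over from its first batch rather than resuming where it stopped.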
You can do something similar with a feedable iterator, depending on what you find more convenient, and I suppose even with an initializable iterator, for example making a boolean dataset that then you map to some data with tf.cond, although that would not be a very natural way to do it.
Here is how you can do it with an initializable iterator, actually in a cleaner way than what I was initially thinking, so maybe you actually like this more:
import tensorflow as tf
import numpy as np
dataset_size = 200
dataset_test_size = 20
batch_size= 5
dimension = 4
# create data
data = tf.constant(np.random.normal(2.0,size=(dataset_size,dimension)), tf.float32)
data_test = tf.constant(np.random.normal(2.0,size=(dataset_test_size,dimension)), tf.float32)
# choose data
testing = tf.placeholder_with_default(False, ())
current_data = tf.cond(testing, lambda: data_test, lambda: data)
# create dataset
dataset = tf.data.Dataset.from_tensor_slices(current_data)
dataset = dataset.batch(batch_size)
# create iterator
iterator = dataset.make_initializable_iterator()
x = tf.cast(iterator.get_next(),tf.float32)
w = tf.Variable(np.random.normal(size=(1,dimension)).astype(np.float32))
loss_func = lambda x,w: tf.reduce_mean(tf.square(x-w)) # notice that the loss function is a mean!
loss = loss_func(x,w) # this is the loss that will be minimized
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # testing defaults to False, so this initializes the iterator on the training data
    sess.run(iterator.initializer)
    # train one epoch
    for i in range(dataset_size//batch_size):
        # the training step will update the weights based on ONE batch of examples each step
        loss1,_ = sess.run([loss,train_op])
        print('train step {:d}. batch loss {:f}.'.format(i,loss1))
    # print test loss
    sess.run(iterator.initializer, feed_dict={testing: True})
    for i in range(dataset_test_size//batch_size):
        loss1 = sess.run(loss)
        print('test step {:d}. batch loss {:f}.'.format(i,loss1))
import tensorflow as tf
hidden_layer1_node= 2
hidden_layer2_node= 1
X = tf.placeholder('float',[8,3])
Y = tf.placeholder('float',[8,1])
#neural model
def neural_model(x):
    # the 'bias' entries are used below; their zero initial values are an assumption
    layer1_weight = {'weight':tf.Variable(tf.random_normal([3,hidden_layer1_node])),
                     'bias':tf.Variable(tf.zeros([hidden_layer1_node]))}
    layer2_weight = {'weight':tf.Variable(tf.random_normal([2,hidden_layer2_node])),
                     'bias':tf.Variable(tf.zeros([hidden_layer2_node]))}
    zl1 = tf.add(tf.matmul(x,layer1_weight['weight']), layer1_weight['bias'])
    prediction1 = tf.sigmoid(zl1)
    zl2 = tf.add(tf.matmul(prediction1,layer2_weight['weight']), layer2_weight['bias'])
    return tf.sigmoid(zl2)
prediction = neural_model(X)
#cost function
def cost_function():
loss = tf.reduce_mean(-1*((Y*tf.log(prediction))+((1-Y)*tf.log(1.0-prediction))))
return loss
loss = cost_function()
training = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
#training stage
train_x = [[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]]
train_y = [[0],[1],[1],[0],[1],[0],[0],[1]]
epoch = 10
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(epoch):
        for _ in range(5000):
            sess.run(training, feed_dict={X:train_x,Y:train_y})
Given the network model above (assuming one understands it), after it has been trained, how could I pass in tensors of shape not just [8,3] but also [1,3], such as [0,0,1]? I guess I'm rephrasing my question.
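TensorFlow 1.x does not let you change a graph once it is built, so placeholders declared with the fixed shapes [8,3] and [8,1] only accept inputs of exactly those shapes. The batch dimension does not have to be fixed, though: declaring the input as tf.placeholder('float', [None, 3]) lets the same trained graph accept a [1,3] input such as [[0,0,1]]. To distinguish between training and testing, you can also use shared variables as explained here: https://www.tensorflow.org/guide/variables#sharing_variables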
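A NumPy sketch of the same 3-2-1 network (not the TensorFlow graph itself; the random weights stand in for trained ones) shows that the weight shapes depend only on the number of features, not on how many rows are passed in. In TensorFlow 1.x the corresponding step is to leave the first placeholder dimension as None, as in shape=(None, feature_size) used elsewhere on this page.

```python
import numpy as np

rng = np.random.default_rng(0)
# Same layer sizes as the question: 3 inputs -> 2 hidden units -> 1 output.
w1, b1 = rng.normal(size=(3, 2)), np.zeros(2)
w2, b2 = rng.normal(size=(2, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Forward pass; x has shape (batch, 3) for any batch size."""
    hidden = sigmoid(x @ w1 + b1)
    return sigmoid(hidden @ w2 + b2)

train_x = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
                    [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=np.float32)
single = np.array([[0.0, 0.0, 1.0]])

print(forward(train_x).shape)  # (8, 1)
print(forward(single).shape)   # (1, 1)
```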
I am new to tensorflow and I am trying to implement a simple feed-forward network for regression, just for learning purposes. The complete executable code is as follows.
The regression mean squared error is around 6, which is quite large. It is a little unexpected because the function to regress is linear and simple 2*x+y, and I expect a better performance.
I am asking for help to check if I did anything wrong in the code. I carefully checked the matrix dimensions so that should be good, but it is possible that I misunderstand something so the network or the session is not properly configured (like, should I run the training session multiple times, instead of just one time (the code below enclosed by #TRAINING#)? I see in some examples they input data piece by piece, and run the training progressively. I run the training just one time and input all data).
If the code is good, maybe this is a modeling issue, but I really don't expect to use a complicated network for such a simple regression.
import tensorflow as tf
import numpy as np
from sklearn.metrics import mean_squared_error
# inputs are points from a 100x100 grid in domain [-2,2]x[-2,2], total 10000 points
lsp = np.linspace(-2,2,100)
gridx,gridy = np.meshgrid(lsp,lsp)
inputs = np.dstack((gridx,gridy))
inputs = inputs.reshape(-1,inputs.shape[-1]) # reshapes the grid into a 10000x2 matrix
feature_size = inputs.shape[1] # feature_size is 2, features are the 2D coordinates of each point
input_size = inputs.shape[0] # input_size is 10000
# a simple function f(x)=2*x[0]+x[1] to regress
f = lambda x: 2 * x[0] + x[1]
label_size = 1
labels = f(inputs.transpose()).reshape(-1,1) # reshapes labels as a column vector
ph_inputs = tf.placeholder(tf.float32, shape=(None, feature_size), name='inputs')
ph_labels = tf.placeholder(tf.float32, shape=(None, label_size), name='labels')
# just one hidden layer with 16 units
hid1_size = 16
w1 = tf.Variable(tf.random_normal([hid1_size, feature_size], stddev=0.01), name='w1')
b1 = tf.Variable(tf.random_normal([hid1_size, label_size]), name='b1')
y1 = tf.nn.relu(tf.add(tf.matmul(w1, tf.transpose(ph_inputs)), b1))
# the output layer
wo = tf.Variable(tf.random_normal([label_size, hid1_size], stddev=0.01), name='wo')
bo = tf.Variable(tf.random_normal([label_size, label_size]), name='bo')
yo = tf.transpose(tf.add(tf.matmul(wo, y1), bo))
# defines optimizer and predictor
lr = tf.placeholder(tf.float32, shape=(), name='learning_rate')
loss = tf.losses.mean_squared_error(ph_labels,yo)
optimizer = tf.train.GradientDescentOptimizer(lr).minimize(loss)
predictor = tf.identity(yo)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
#TRAINING#
_, c = sess.run([optimizer, loss], feed_dict={lr:0.05, ph_inputs: inputs, ph_labels: labels})
#TRAINING#
# gets the regression results
predictions = np.zeros((input_size,1))
for i in range(input_size):
    predictions[i] = sess.run(predictor, feed_dict={ph_inputs: inputs[i, None]}).squeeze()
# prints regression MSE
print(mean_squared_error(predictions, labels))
You're right, you understood the problem by yourself.
The problem is, in fact, that you're running the optimization step only once. You are therefore doing a single update of your network parameters, so the cost barely decreases.
I just changed the training session of your code in order to make it work as expected (100 training steps):
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(100):
    _, c = sess.run(
        [optimizer, loss],
        feed_dict={
            lr: 0.05,
            ph_inputs: inputs,
            ph_labels: labels
        })
    print("Train step {} loss value {}".format(i, c))
and at the end of the training step I got:
Train step 99 loss value 0.04462708160281181
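The effect of repeating the update can be reproduced without TensorFlow. The sketch below is an illustration under simplifying assumptions: it swaps the hidden-layer network for a plain linear model and runs full-batch gradient descent in NumPy on the same grid and target as the question, comparing one update with 100.

```python
import numpy as np

# Same data as the question: a 100x100 grid on [-2, 2] x [-2, 2], target 2*x + y.
lsp = np.linspace(-2, 2, 100)
gridx, gridy = np.meshgrid(lsp, lsp)
inputs = np.dstack((gridx, gridy)).reshape(-1, 2)
labels = (2 * inputs[:, 0] + inputs[:, 1]).reshape(-1, 1)

def train(num_steps, lr=0.05):
    """Full-batch gradient descent on a linear model; returns the final MSE."""
    w = np.zeros((2, 1))
    b = np.zeros((1, 1))
    for _ in range(num_steps):
        error = inputs @ w + b - labels
        # Gradients of mean(error**2) with respect to w and b.
        gradient_w = 2 * (inputs.T @ error) / len(inputs)
        gradient_b = 2 * error.mean()
        w -= lr * gradient_w
        b -= lr * gradient_b
    return float(np.mean((inputs @ w + b - labels) ** 2))

print("MSE after 1 step:   ", train(1))
print("MSE after 100 steps:", train(100))
```

After a single step the error is still close to its starting value, while 100 steps bring it close to zero for this simplified model.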
I've been working on this neural network with the intent to predict TBA (time based availability) of simulated windmill parks based on certain attributes. The neural network runs just fine, and gives me some predictions, however I'm not quite satisfied with the results. It fails to notice some very obvious correlations that I can clearly see by myself. Here is my current code:
# Import
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
maxi = 0.96
mini = 0.7
# Make data a np.array
data = pd.read_csv('datafile_ML_no_avg.csv')
data = data.values
# Shuffle the data
shuffle_indices = np.random.permutation(np.arange(len(data)))
data = data[shuffle_indices]
# Training and test data
data_train = data[0:int(len(data)*0.8),:]
data_test = data[int(len(data)*0.8):int(len(data)),:]
# Scale data (fit the scaler on the training data only, then apply it to both sets)
scaler = MinMaxScaler(feature_range=(mini, maxi))
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)
# Build X and y
X_train = data_train[:, 0:5]
y_train = data_train[:, 6:7]
X_test = data_test[:, 0:5]
y_test = data_test[:, 6:7]
# Number of input features in training data
n_args = X_train.shape[1]
multi = int(8)
# Neurons
n_neurons_1 = 8*multi
n_neurons_2 = 4*multi
n_neurons_3 = 2*multi
n_neurons_4 = 1*multi
# Session
net = tf.InteractiveSession()
# Placeholder
X = tf.placeholder(dtype=tf.float32, shape=[None, n_args])
Y = tf.placeholder(dtype=tf.float32, shape=[None,1])
# Initializers
sigma = 1
weight_initializer = tf.variance_scaling_initializer(mode="fan_avg",
                                                     distribution="uniform", scale=sigma)
bias_initializer = tf.zeros_initializer()
# Hidden weights
W_hidden_1 = tf.Variable(weight_initializer([n_args, n_neurons_1]))
bias_hidden_1 = tf.Variable(bias_initializer([n_neurons_1]))
W_hidden_2 = tf.Variable(weight_initializer([n_neurons_1, n_neurons_2]))
bias_hidden_2 = tf.Variable(bias_initializer([n_neurons_2]))
W_hidden_3 = tf.Variable(weight_initializer([n_neurons_2, n_neurons_3]))
bias_hidden_3 = tf.Variable(bias_initializer([n_neurons_3]))
W_hidden_4 = tf.Variable(weight_initializer([n_neurons_3, n_neurons_4]))
bias_hidden_4 = tf.Variable(bias_initializer([n_neurons_4]))
# Output weights
W_out = tf.Variable(weight_initializer([n_neurons_4, 1]))
bias_out = tf.Variable(bias_initializer([1]))
# Hidden layer
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, W_hidden_2),
                             bias_hidden_2))
hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, W_hidden_3),
                             bias_hidden_3))
hidden_4 = tf.nn.relu(tf.add(tf.matmul(hidden_3, W_hidden_4),
                             bias_hidden_4))
# Output layer (transpose!)
out = tf.transpose(tf.add(tf.matmul(hidden_4, W_out), bias_out))
# Cost function
mse = tf.reduce_mean(tf.squared_difference(out, Y))
# Optimizer
opt = tf.train.AdamOptimizer().minimize(mse)
# Init
net.run(tf.global_variables_initializer())
# Fit neural net
batch_size = 10
mse_train = []
mse_test = []
# Run
epochs = 10
for e in range(epochs):
    # Shuffle training data
    shuffle_indices = np.random.permutation(np.arange(len(y_train)))
    X_train = X_train[shuffle_indices]
    y_train = y_train[shuffle_indices]
    # Minibatch training
    for i in range(0, len(y_train) // batch_size):
        start = i * batch_size
        batch_x = X_train[start:start + batch_size]
        batch_y = y_train[start:start + batch_size]
        # Run optimizer with batch
        net.run(opt, feed_dict={X: batch_x, Y: batch_y})
        # Show progress
        if np.mod(i, 50) == 0:
            mse_train.append(net.run(mse, feed_dict={X: X_train, Y: y_train}))
            mse_test.append(net.run(mse, feed_dict={X: X_test, Y: y_test}))
            pred = net.run(out, feed_dict={X: X_test})
I have tried tweaking the number of hidden layers, the number of nodes per layer and the number of epochs, and I have tried different activation functions and optimizers. However, I am quite new to neural networks, so there might be something very obvious that I'm missing.
Thanks in advance to anyone who managed to read through all of that.
It would be much easier if you shared a small dataset that illustrates the problem. However, I will state some of the issues with non-standard datasets and how to overcome them.
Possible solutions
Regularization and validation-based optimization - these are methods that are always worth trying when looking for some extra accuracy. See dropout methods here (original paper), and some overview here.
Unbalanced data - Sometimes the categories/events in a time series behave like anomalies, or are simply unbalanced. If you read a book, words like "the" or "it" will appear far more often than "warehouse". This can become a problem if your main task is to detect the word "warehouse" and you train your network (even LSTMs) in the traditional way. A way to overcome this is to balance the samples (creating balanced datasets) or to give more weight to low-frequency categories.
Model structure - Sometimes fully connected layers are not enough. See computer vision problems, for instance, where we train using convolution layers. The convolution and pooling layers enforce structure on the model, which is suitable for images. This is also a form of regularization, since those layers have fewer parameters. In time-series problems, convolutions are also possible and turn out to work just fine. See the example in Conditional Time Series Forecasting with Convolution Neural Networks.
The above suggestions are presented in the order I would suggest to try.
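For the "give more weight to low-frequency categories" suggestion, one common recipe is inverse-frequency weighting. The sketch below is only an illustration in plain Python: the helper name and the toy word list are made up, and the resulting weights would still have to be plugged into whatever weighted loss your framework offers.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * count), so that rare
    classes contribute more to a weighted loss."""
    counts = Counter(labels)
    total = len(labels)
    return {label: total / (len(counts) * count) for label, count in counts.items()}

words = ["the"] * 8 + ["it"] * 3 + ["warehouse"]
weights = inverse_frequency_weights(words)
for word, weight in weights.items():
    print(word, round(weight, 3))
```

With this scheme the single "warehouse" sample carries as much total weight as all eight occurrences of "the".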
Good luck!
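One further thing that may be worth checking in the question's code (an observation about shapes, not something the answer above raises): out is transposed to shape [1, N] while Y is declared with shape [None, 1]. Element-wise TensorFlow ops such as tf.squared_difference broadcast the way NumPy does, so the difference would become an N x N matrix comparing every prediction with every target. A small NumPy sketch of that effect, with made-up numbers:

```python
import numpy as np

predictions = np.array([[0.8, 0.9, 0.7]])   # shape (1, 3), like a transposed output
targets = np.array([[0.8], [0.9], [0.7]])   # shape (3, 1), like Y with shape [None, 1]

diff = predictions - targets
mismatched_mse = np.mean(diff ** 2)
aligned_mse = np.mean((predictions.T - targets) ** 2)

print(diff.shape)       # (3, 3): every prediction against every target
print(mismatched_mse)   # non-zero although the predictions equal the targets
print(aligned_mse)      # 0.0 once both arrays have shape (3, 1)
```

If this applies to your setup, dropping the transpose (or declaring Y with shape [None]) would make the loss compare each prediction only with its own target.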
My input data is as follows (the columns are AT, V, AP, RH and PE, the names used in the code below):
AT     V      AP       RH     PE
14.96  41.76  1024.07  73.17  463.26
25.18  62.96  1020.04  59.08  444.37
5.11   39.4   1012.16  92.14  488.56
20.86  57.32  1010.24  76.64  446.48
10.82  37.5   1009.23  96.62  473.9
26.27  59.44  1012.23  58.77  443.67
15.89  43.96  1014.02  75.24  467.35
9.48   44.71  1019.12  66.43  478.42
14.64  45     1021.78  41.25  475.98
I am working in Python with the TensorFlow library.
So far I have a linear model, which works fine for 4 inputs and 1 output. This is basically a regression problem.
For example: after training my neural network with sufficient data (say, about 10000 rows), if I pass the values 45, 30, 25, 32 as inputs, it returns the value 46 as output.
I basically have two queries:
1. In my code I use the parameters training_epochs, learning_rate, etc. I currently set training_epochs to 10000. When I test my neural network by passing four input values, I get an output of about 471.25, while I expect 460. But if I set training_epochs to 20000 instead of 10000, I get an output of 120.5, which is not at all close to the actual value of 460. Can you please explain how one can choose the values of training_epochs and learning_rate (or any other parameter) in my code so that I get good accuracy?
2. My neural network currently works only for linear data and only for 1 output. If I want 3 inputs and 2 outputs and also a non-linear model, what changes can I make in my code?
I am posting my code below:
import tensorflow as tf
import numpy as np
import pandas as pd
#import matplotlib.pyplot as plt
rng = np.random
# In[180]:
# Parameters
learning_rate = 0.01
training_epochs = 10000
display_step = 1000
# In[171]:
# Read data from CSV
df = pd.read_csv(r"H:\MiniThessis\Sample.csv")  # raw string so the backslashes are kept literally
# In[173]:
# Seperating out dependent & independent variable
train_x = df[['AT','V','AP','RH']]
train_y = df[['PE']]
# .values replaces DataFrame.as_matrix(), which newer pandas versions no longer provide
trainx = train_x.values.astype(np.float32)
trainy = train_y.values.astype(np.float32)
# In[174]:
n_input = 4
n_classes = 1
n_hidden_1 = 5
n_samples = 9569
# tf Graph Input
#Inserts a placeholder for a tensor that will be always fed.
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
# Set model weights
W_h1 = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
W_out = tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
b_h1 = tf.Variable(tf.random_normal([n_hidden_1]))
b_out = tf.Variable(tf.random_normal([n_classes]))
# In[175]:
# Construct a linear model
layer_1 = tf.matmul(x, W_h1) + b_h1
layer_1 = tf.nn.relu(layer_1)
out_layer = tf.matmul(layer_1, W_out) + b_out
# In[176]:
# Mean squared error
cost = tf.reduce_sum(tf.pow(out_layer-y, 2))/(2*n_samples)
# Adam optimizer
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
# In[177]:
# Initializing the variables
init = tf.global_variables_initializer()
# In[181]:
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Fit all training data
    for epoch in range(training_epochs):
        _, c = sess.run([optimizer, cost], feed_dict={x: trainx,y: trainy})
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c))
    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={x: trainx,y: trainy})
    correct_prediction = tf.equal(tf.argmax(out_layer, 1), tf.argmax(y, 1))
    best = sess.run([out_layer], feed_dict=
1. You can adjust the following lines:
# In general, biases are initialized either to zeros or to a non-zero constant, not from a Gaussian
b_h1 = tf.Variable(tf.zeros([n_hidden_1]))
b_out = tf.Variable(tf.zeros([n_classes]))
# MSE error (reduce_mean already divides by the number of samples)
cost = tf.reduce_mean(tf.pow(out_layer-y, 2))
Also, feed the data as mini-batches; the optimizer you are using is tuned for mini-batch optimization, so feeding the data as a whole doesn't give the best performance.
2. For multiple outputs you only need to change n_classes and, if the outputs are class labels rather than continuous values, the cost function (tf.nn.softmax_cross_entropy_with_logits). Also, the model you defined here isn't linear, because you are using the non-linear activation function tf.nn.relu.
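The mini-batch feeding suggested in point 1 could look roughly like the following. This is a plain-Python sketch with an invented helper name and toy data; in the question's code, each yielded pair would be passed as feed_dict={x: batch_x, y: batch_y} inside the epoch loop.

```python
import random

def minibatches(features, targets, batch_size, seed=0):
    """Yield shuffled (features, targets) mini-batches that cover the data once."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        chosen = order[start:start + batch_size]
        yield [features[i] for i in chosen], [targets[i] for i in chosen]

# Toy data: 10 rows with 2 features each, target = 2 * first feature.
rows = [[float(i), float(i + 1)] for i in range(10)]
outputs = [float(2 * i) for i in range(10)]

sizes = [len(batch_x) for batch_x, _ in minibatches(rows, outputs, batch_size=4)]
print(sizes)  # [4, 4, 2]
```

Using a different seed per epoch would reshuffle the rows each time, which is the usual practice for mini-batch training.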