How does dimensions for placeholders work for tensorflow? - python

So suppose I have x_train and y_train where they are arrays and each element of that array a data point (in an array form)(so x_train would be in the form of x_train[i][j]). so x_train[0] represents 1st data point in the training set (in an array form) and suppose I want to create a simple regression
so I coded this
input = tf.placeholder(tf.float32, shape=[len(data[0]),None])
target = tf.placeholder(tf.flaot32, shape=[len(data[0]),None])
network = tf.layers.Dense(10, tf.keras.activations.relu)(input)
network = tf.layers.BatchNormalization()(network)
network = tf.layers.Dense(10,tf.keras.activations.relu)(network)
network = tf.layers.BatchNormalization()(network)
network = tf.layers.Dense(10,tf.keras.activations.linear)(network)
cost = tf.reduce_mean((target - network)**2)
optimizer = tf.train.AdamOptimizer().minimize(cost)
with tf.Session() as sess:
for epoch in range(1000):
_, val = sess.run([optimizer,cost], feed_dict={input: x_train, target: y_train})
print(val)
But is this correct? I'm not sure if the dimensions for the placeholders even match. When I try to run this code,
I get the error message
ValueError: The last dimension of the inputs to `Dense` should be defined. Found `None`.
So what I tried was to interchange the position of the dimensions' size for placeholders, so
the changed placeholders were
input = tf.placeholder(tf.float32, shape=[None,len(data[0])])
target = tf.placeholder(tf.float32, shape=[None,len(data[0])])
But with these, I then get the error message
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value dense/bias
[[{{node dense/bias/read}}]]

I was able to solve the above issue by performing np.expand_dims() on x_train & y_train at axis=0 and initializing batch_norm and network parameters with sess.run(tf.global_variable_initializer()) before optimizing the model.
Note: The presence of None in the first dimension of the shape of placeholder is alright as it allows TensorFlow to train models when batch_size is unknown (the same is true even for other dimensions of placeholder's shape). The error is due to mismatch in input and placeholder dimensions. Your inputs (x_train & y_train) were probably one-dimensional tensors while the placeholders either needed two-dimensional ones or one-dimensional vectors reshaped to two-dimensions.
Please find my below implementation for the same and a matplotlib plot that verifies the implementation:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
data = [[1,2,3,4,5,6,7,8,9,10],
[11,12,13,14,15,16,17,18,19,20]]
x_train = data[0]
y_train = data[1]
x_train = np.expand_dims(x_train, 0)
y_train = np.expand_dims(y_train, 0)
input = tf.placeholder(tf.float32, shape=[None, len(data[0])])
target = tf.placeholder(tf.float32, shape=[None, len(data[1])])
network = tf.layers.Dense(10, tf.keras.activations.relu)(input)
network = tf.layers.BatchNormalization()(network)
network = tf.layers.Dense(10,tf.keras.activations.relu)(network)
network = tf.layers.BatchNormalization()(network)
network = tf.layers.Dense(10,tf.keras.activations.linear)(network)
cost = tf.reduce_mean((target - network)**2)
optimizer = tf.train.AdamOptimizer().minimize(cost)
costs = []
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for epoch in range(1000):
_, val = sess.run([optimizer,cost], feed_dict={input: x_train, target: y_train})
costs.append(val)
print(val)
fig, ax = plt.subplots(figsize=(11, 8))
ax.plot(range(1000), costs)
ax.set_title("Costs vs epochs")
ax.set_xlabel("Epoch")
ax.set_ylabel("Avg. val. accuracy")
Here's the plot of costs vs epochs:
Costs vs Epochs
Additionally, to test the network on new data (say) x_test = [[21,22,23,24,25,26,27,28,29,30]], you could use below code:
y_pred = sess.run(network,feed_dict={input: x_test})
PS: Ensure you use the same Tensorflow Session sess created above to run the inference (unless you're not saving and loading the model checkpoint)

Related

RuntimeError: mat1 dim 1 must match mat2 dim 0

I am still grappling with PyTorch, having played with Keras for a while (which feels a lot more intuitive).
Anyway - I have the nn.linear model code below, which works fine for just one input feature, where:
inputDim = 1
I am now trying to expand the same code to include 2 features, and so I have included another column in my feature dataframe and also set:
inputDim = 2
However, when I run the code, I get the dreaded error:
RuntimeError: mat1 dim 1 must match mat2 dim 0
This error references line 63, which is:
outputs = model(inputs)
I have gone through several other posts here relating to this dimensionality error, but I still can't see what is wrong with my code. Any help would be appreciated.
The full code looks like this:
import numpy as np
import pandas as pd
import torch
from torch.autograd import Variable
import matplotlib.pyplot as plt
device = 'cuda' if torch.cuda.is_available() else 'cpu'
df = pd.read_csv('Adjusted Close - BAC-UBS-WFC.csv')
x = df[['BAC', 'UBS']]
y = df['WFC']
# number_of_features = x.shape[1]
# print(number_of_features)
x_train = np.array(x, dtype=np.float32)
x_train = x_train.reshape(-1, 1)
y_train = np.array(y, dtype=np.float32)
y_train = y_train.reshape(-1, 1)
class linearRegression(torch.nn.Module):
def __init__(self, inputSize, outputSize):
super(linearRegression, self).__init__()
self.linear = torch.nn.Linear(inputSize, outputSize)
def forward(self, x):
out = self.linear(x)
return out
inputDim = 2
outputDim = 1
learningRate = 0.01
epochs = 500
# Model instantiation
torch.manual_seed(42)
model = linearRegression(inputDim, outputDim)
if torch.cuda.is_available(): model.cuda()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate)
# Model training
loss_series = []
for epoch in range(epochs):
# Converting inputs and labels to Variable
inputs = Variable(torch.from_numpy(x_train).cuda())
labels = Variable(torch.from_numpy(y_train).cuda())
# Clear gradient buffers because we don't want any gradient from previous epoch to carry forward, dont want to cummulate gradients
optimizer.zero_grad()
# get output from the model, given the inputs
outputs = model(inputs)
# get loss for the predicted output
loss = criterion(outputs, labels)
loss_series.append(loss.item())
print(loss)
# get gradients w.r.t to parameters
loss.backward()
# update parameters
optimizer.step()
print('epoch {}, loss {}'.format(epoch, loss.item()))
# Calculate predictions on training data
with torch.no_grad(): # we don't need gradients in the testing phase
predicted = model(Variable(torch.from_numpy(x_train).cuda())).cpu().data.numpy()
General advice: For errors with dimension, it usually helps to print out dimensions at each step of the computation.
Most likely in this specific case, you have made mistake in reshaping the input with this x_train = x_train.reshape(-1, 1)
Your input is (N,1) but NN expects (N,2).

Getting InvalidArgumentError in softmax_cross_entropy_with_logits

I'm pretty new to tensorflow and trying to do some experiments with the Iris dataset. I created following model function (MWE):
def model_fn(features, labels, mode):
net = tf.feature_column.input_layer(features, [tf.feature_column.numeric_column(key=key) for key in FEATURE_NAMES])
logits = tf.layers.dense(inputs=net, units=3)
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(
loss=loss,
global_step=tf.train.get_global_step())
return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
Unfortunately I get the following error:
InvalidArgumentError: Input to reshape is a tensor with 256 values, but the requested shape has 1
[[Node: Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](softmax_cross_entropy_with_logits_sg, Reshape/shape)]]
Seems to be some problem with the shapes of the tensors. However both logits and labels have an equal shape of (256, 3) - as it is required by the documentation. Also both tensors have type float32.
Just for the sake of completeness, here is the input function for the estimator:
import pandas as pd
import tensorflow as tf
import numpy as np
IRIS_DATA = "data/iris.csv"
FEATURE_NAMES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
CLASS_NAME = ["class"]
COLUMNS = FEATURE_NAMES + CLASS_NAME
# read dataset
iris = pd.read_csv(IRIS_DATA, header=None, names=COLUMNS)
# encode classes
iris["class"] = iris["class"].astype('category').cat.codes
# train test split
np.random.seed(1)
msk = np.random.rand(len(iris)) < 0.8
train = iris[msk]
test = iris[~msk]
def iris_input_fn(batch_size=256, mode="TRAIN"):
def prepare_input(data=None):
#do mean normaization across all samples
mu = np.mean(data)
sigma = np.std(data)
data = data - mu
data = data / sigma
is_nan = np.isnan(data)
is_inf = np.isinf(data)
if np.any(is_nan) or np.any(is_inf):
print('data is not well-formed : is_nan {n}, is_inf: {i}'.format(n= np.any(is_nan), i=np.any(is_inf)))
data = transform_data(data)
return data
def transform_data(data):
data = data.astype(np.float32)
return data
def load_data():
global train
trn_all_data=train.iloc[:,:-1]
trn_all_labels=train.iloc[:,-1]
return (trn_all_data.astype(np.float32),
trn_all_labels.astype(np.int32))
data, labels = load_data()
data = prepare_input(data)
labels = tf.one_hot(labels, depth=3)
labels = tf.cast(labels, tf.float32)
dataset = tf.data.Dataset.from_tensor_slices((data.to_dict(orient="list"), labels))
dataset = dataset.shuffle(1000).repeat().batch(batch_size)
return dataset.make_one_shot_iterator().get_next()
Dataset from UCI repo
Solved the problem by replacing the loss function from nn module:
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
by the loss function of losses module
loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)
or by
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
loss which is fed to the minimize method of GradientDescentOptimizer needed to be a scalar. A single value for the whole batch.
Problem was, I computed the softmax cross entropy for each element in the batch, which resulted in a tensor containing 256 (batch size) cross entropy values, and tried to feed this in the minimize method. Therefore the error message
Input to reshape is a tensor with 256 values, but the requested shape has 1

linear regression by tensorflow gets noticeable mean square error

I am new to tensorflow and I am trying to implement a simple feed-forward network for regression, just for learning purposes. The complete executable code is as follows.
The regression mean squared error is around 6, which is quite large. It is a little unexpected because the function to regress is linear and simple 2*x+y, and I expect a better performance.
I am asking for help to check if I did anything wrong in the code. I carefully checked the matrix dimensions so that should be good, but it is possible that I misunderstand something so the network or the session is not properly configured (like, should I run the training session multiple times, instead of just one time (the code below enclosed by #TRAINING#)? I see in some examples they input data piece by piece, and run the training progressively. I run the training just one time and input all data).
If the code is good, maybe this is a modeling issue, but I really don't expect to use a complicated network for such a simple regression.
import tensorflow as tf
import numpy as np
from sklearn.metrics import mean_squared_error
# inputs are points from a 100x100 grid in domain [-2,2]x[-2,2], total 10000 points
lsp = np.linspace(-2,2,100)
gridx,gridy = np.meshgrid(lsp,lsp)
inputs = np.dstack((gridx,gridy))
inputs = inputs.reshape(-1,inputs.shape[-1]) # reshpaes the grid into a 10000x2 matrix
feature_size = inputs.shape[1] # feature_size is 2, features are the 2D coordinates of each point
input_size = inputs.shape[0] # input_size is 10000
# a simple function f(x)=2*x[0]+x[1] to regress
f = lambda x: 2 * x[0] + x[1]
label_size = 1
labels = f(inputs.transpose()).reshape(-1,1) # reshapes labels as a column vector
ph_inputs = tf.placeholder(tf.float32, shape=(None, feature_size), name='inputs')
ph_labels = tf.placeholder(tf.float32, shape=(None, label_size), name='labels')
# just one hidden layer with 16 units
hid1_size = 16
w1 = tf.Variable(tf.random_normal([hid1_size, feature_size], stddev=0.01), name='w1')
b1 = tf.Variable(tf.random_normal([hid1_size, label_size]), name='b1')
y1 = tf.nn.relu(tf.add(tf.matmul(w1, tf.transpose(ph_inputs)), b1))
# the output layer
wo = tf.Variable(tf.random_normal([label_size, hid1_size], stddev=0.01), name='wo')
bo = tf.Variable(tf.random_normal([label_size, label_size]), name='bo')
yo = tf.transpose(tf.add(tf.matmul(wo, y1), bo))
# defines optimizer and predictor
lr = tf.placeholder(tf.float32, shape=(), name='learning_rate')
loss = tf.losses.mean_squared_error(ph_labels,yo)
optimizer = tf.train.GradientDescentOptimizer(lr).minimize(loss)
predictor = tf.identity(yo)
# TRAINING
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
_, c = sess.run([optimizer, loss], feed_dict={lr:0.05, ph_inputs: inputs, ph_labels: labels})
# TRAINING
# gets the regression results
predictions = np.zeros((input_size,1))
for i in range(input_size):
predictions[i] = sess.run(predictor, feed_dict={ph_inputs: inputs[i, None]}).squeeze()
# prints regression MSE
print(mean_squared_error(predictions, labels))
You're right, you understood the problem by yourself.
The problem is, in fact, that you're running the optimization step only one time. Hence you're doing one single update step of your network parameter and therefore the cost won't decrease.
I just changed the training session of your code in order to make it work as expected (100 training steps):
# TRAINING
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(100):
_, c = sess.run(
[optimizer, loss],
feed_dict={
lr: 0.05,
ph_inputs: inputs,
ph_labels: labels
})
print("Train step {} loss value {}".format(i, c))
# TRAINING
and at the end of the training step I go:
Train step 99 loss value 0.04462708160281181
0.044106700712455045

Fine-tuning a neural network in tensorflow

I've been working on this neural network with the intent to predict TBA (time based availability) of simulated windmill parks based on certain attributes. The neural network runs just fine, and gives me some predictions, however I'm not quite satisfied with the results. It fails to notice some very obvious correlations that I can clearly see by myself. Here is my current code:
`# Import
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
maxi = 0.96
mini = 0.7
# Make data a np.array
data = pd.read_csv('datafile_ML_no_avg.csv')
data = data.values
# Shuffle the data
shuffle_indices = np.random.permutation(np.arange(len(data)))
data = data[shuffle_indices]
# Training and test data
data_train = data[0:int(len(data)*0.8),:]
data_test = data[int(len(data)*0.8):int(len(data)),:]
# Scale data
scaler = MinMaxScaler(feature_range=(mini, maxi))
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)
# Build X and y
X_train = data_train[:, 0:5]
y_train = data_train[:, 6:7]
X_test = data_test[:, 0:5]
y_test = data_test[:, 6:7]
# Number of stocks in training data
n_args = X_train.shape[1]
multi = int(8)
# Neurons
n_neurons_1 = 8*multi
n_neurons_2 = 4*multi
n_neurons_3 = 2*multi
n_neurons_4 = 1*multi
# Session
net = tf.InteractiveSession()
# Placeholder
X = tf.placeholder(dtype=tf.float32, shape=[None, n_args])
Y = tf.placeholder(dtype=tf.float32, shape=[None,1])
# Initialize1s
sigma = 1
weight_initializer = tf.variance_scaling_initializer(mode="fan_avg",
distribution="uniform", scale=sigma)
bias_initializer = tf.zeros_initializer()
# Hidden weights
W_hidden_1 = tf.Variable(weight_initializer([n_args, n_neurons_1]))
bias_hidden_1 = tf.Variable(bias_initializer([n_neurons_1]))
W_hidden_2 = tf.Variable(weight_initializer([n_neurons_1, n_neurons_2]))
bias_hidden_2 = tf.Variable(bias_initializer([n_neurons_2]))
W_hidden_3 = tf.Variable(weight_initializer([n_neurons_2, n_neurons_3]))
bias_hidden_3 = tf.Variable(bias_initializer([n_neurons_3]))
W_hidden_4 = tf.Variable(weight_initializer([n_neurons_3, n_neurons_4]))
bias_hidden_4 = tf.Variable(bias_initializer([n_neurons_4]))
# Output weights
W_out = tf.Variable(weight_initializer([n_neurons_4, 1]))
bias_out = tf.Variable(bias_initializer([1]))
# Hidden layer
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, W_hidden_2),
bias_hidden_2))
hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, W_hidden_3),
bias_hidden_3))
hidden_4 = tf.nn.relu(tf.add(tf.matmul(hidden_3, W_hidden_4),
bias_hidden_4))
# Output layer (transpose!)
out = tf.transpose(tf.add(tf.matmul(hidden_4, W_out), bias_out))
# Cost function
mse = tf.reduce_mean(tf.squared_difference(out, Y))
# Optimizer
opt = tf.train.AdamOptimizer().minimize(mse)
# Init
net.run(tf.global_variables_initializer())
# Fit neural net
batch_size = 10
mse_train = []
mse_test = []
# Run
epochs = 10
for e in range(epochs):
# Shuffle training data
shuffle_indices = np.random.permutation(np.arange(len(y_train)))
X_train = X_train[shuffle_indices]
y_train = y_train[shuffle_indices]
# Minibatch training
for i in range(0, len(y_train) // batch_size):
start = i * batch_size
batch_x = X_train[start:start + batch_size]
batch_y = y_train[start:start + batch_size]
# Run optimizer with batch
net.run(opt, feed_dict={X: batch_x, Y: batch_y})
# Show progress
if np.mod(i, 50) == 0:
mse_train.append(net.run(mse, feed_dict={X: X_train, Y: y_train}))
mse_test.append(net.run(mse, feed_dict={X: X_test, Y: y_test}))
pred = net.run(out, feed_dict={X: X_test})
print(pred)`
Have tried to tweak around with the number of hidden layers, number of nodes per layer, number of epochs to run and trying different activation functions and optimizers. However, I am quite new to neural networks, so there might be something very obvious that I'm missing.
Thanks in advance to anyone who managed to read through all of that.
It will make is much easier you you will share a small dataset that illustrate the problem. However, I will state some of the issues with non-standards datasets and how to overcome them.
Possible solutions
Regularization and validation-based optimization - are methods that are always good to try when looking for some extra-accuracy. See dropout methods here (original paper), and some overview here.
Unbalanced data - Sometimes of the time series categories/events behave like anomalies, or just in unbalanced ways. If you read a book, words like the or it will appear much more times than warehouse or such. This can become a problem if your main task is to detect the word warehouse and you train your network (even lstms) in traditional ways. A way to overcome this problem is by balancing the samples (creating balanced datasets) or to give more weight to low-frequent categories.
Model structure - sometimes fully connected layers are not enough. See computer vision problems for instance, where we train using convolution layers. The convolution and pooling layers enforce structure on the model, which is suitable for images. This is also some sort of regulation, since we have less parameters in those layers. In time-series problems, convolutions are also possible and turns out that works just fine. See example in Conditional Time Series Forecasting with Convolution Neural Networks.
The above suggestions are presented in the order I would suggest to try.
Good luck!

How to input char data to tensorflow rnn cell?

This is my model:
rnn_cell = tf.contrib.rnn.BasicRNNCell(512)
m_rnn_cell = tf.contrib.rnn.MultiRNNCell([rnn_cell]*3, state_is_tuple = False)
prediction, state = tf.nn.dynamic_rnn(m_rnn_cell, X, dtype=tf.float32)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
error = tf.reduce_mean((labels - prediction)**2
train = tf.train.GradientDescentOptimizer(learning_rate).minimize(error)
X is the placeholder I use to input data. I want to train it using a few English sentences. How do I do that? How do I shape my data, labels and placeholders? Can you also provide the code to train it?

Categories