For future use, I wanted to test a multivariate multilayer perceptron.
To test it, I wrote a simple Python program.
Here's the code.
import tensorflow as tf
import pandas as pd
import numpy as np
import random
input = []
result = []
for i in range(0, 10000):
    x = random.random() * 100
    y = random.random() * 100
    input.append([x, y])
    result.append(x * y)
input = np.array(input, dtype=float)
result = np.array(result, dtype=float)
activation_func = "relu"
unit_count = 256
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, input_dim=2),
    tf.keras.layers.Dense(unit_count, activation=activation_func),
    tf.keras.layers.Dense(unit_count, activation=activation_func),
    tf.keras.layers.Dense(unit_count, activation=activation_func),
    tf.keras.layers.Dense(unit_count, activation=activation_func),
    tf.keras.layers.Dense(1)])
model.compile(optimizer="adam",loss="mse")
model.fit(input,result,epochs=10)
predict_input = np.array([[7, 3], [5, 4], [8, 8]])
print(model.predict(predict_input))
I tried this code, and the results were not good: the loss seems to stop decreasing at some point.
I also tried with smaller x and y values, but that made the model inaccurate for larger numbers.
I've changed the activation function, added more dense layers, and increased the number of units, but it didn't get better.
Neural networks are not able to adapt themselves (without additional training) to a different domain; this means you should train on a domain and run inference on that same domain.
With images, we often just rescale the inputs from [0, 255] to [-1, 1] and let the network learn from values in that range (and during inference we always rescale the input values into [-1, 1] as well).
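(As a quick illustration of that rescaling, here is a minimal sketch assuming uint8 pixel values; it is not part of the fix below.)
import numpy as np

# hypothetical batch of uint8 images in [0, 255]
images = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# scale to [-1, 1]; the same rescaling is applied again at inference time
scaled = images.astype(np.float32) / 127.5 - 1.0
print(scaled.min(), scaled.max())  # values now lie in [-1, 1]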
To solve your task, you should bring the problem into a restricted domain.
In practice, if you're only interested in training a model to multiply positive numbers, you can squash them into the [0, 1] range, since the multiplication of two values in this range always gives an output value in the same range.
I slightly modified your code and added some comments.
import random
import numpy as np
import pandas as pd
import tensorflow as tf
input = []
result = []
# We want to train our network to work in a fixed domain
# the [0,1] range.
# Let's also increase the training set -> more data is always better
for i in range(0, 100000):
    x = random.random()
    y = random.random()
    input.append([x, y])
    result.append(x * y)
input = np.array(input, dtype=float)
result = np.array(result, dtype=float)
activation_func = "relu"
unit_count = 256
# no need for tons of layers
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(unit_count, input_dim=2, activation=activation_func),
        tf.keras.layers.Dense(unit_count, activation=activation_func),
        tf.keras.layers.Dense(1, use_bias=False),
    ]
)
model.compile(optimizer="adam", loss="mse")
model.fit(input, result, epochs=10)
# Bring our input values into the [0, 1] range
max_value = 10
predict_input = np.array([[7, 3], [5, 4], [8, 8]]) / max_value
print(predict_input)
# Back to the original domain.
# Multiplying by max_value**2 is required because dividing both
# inputs by max_value divides their product by max_value**2.
print(model.predict(predict_input) * max_value ** 2)
Example output:
[[0.7 0.3]
[0.5 0.4]
[0.8 0.8]]
[[21.04468 ]
[20.028284]
[64.05521 ]]
I am trying to create a bare-minimum PyTorch example, purely for learning purposes. I found that my PyTorch code works fine with a really small training set, but as soon as I increase the input data size it stops working. This seems very counterintuitive; ideally, a bigger training set should give better results.
[I have intentionally not used the object-oriented paradigm, as I am trying to learn the core functionality first and keep the code to a bare minimum.]
import numpy as np
import torch
x_train = np.float32(np.random.rand(25,1)*10)
#Synthesize training data; we will verify the weights and bias later with the trained model
def synthesize_output(input):
    return (1.29 * input[0] + 13)
y_train = np.array([synthesize_output(row) for row in x_train]).reshape(-1,1)
X_train = torch.from_numpy(x_train)
Y_train = torch.from_numpy(y_train)
learning_rate = 0.001
# Initialize Weights and Bias to random starting values
w = torch.rand(1, 1, requires_grad=True)
b = torch.rand(1, 1, requires_grad=True)
for iter in range(1, 4001):
    # forward pass: predict values
    y_pred = X_train.mm(w).clamp(min=0).add(b)
    # find loss
    loss = (y_pred - Y_train).pow(2).sum()
    # backward pass for computing gradients
    loss.backward()
    # Just printing the loss to see how it changes over the iterations
    if (iter % 100) == 0:
        print(f"Iter: {iter}, Loss={loss}")
    # Manually updating weights
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        w.grad.zero_()
        b.grad.zero_()
#finally check the weight and bias
print(f"Weights: {w} \n\nBias: {b}")
The above code works as-is, but as soon as I increase (just double) the data size, it stops working.
x_train = np.float32(np.random.rand(50,1)*10)
Unlike the code above, the basic sklearn sample I created seems to work fine even with a much larger input dataset.
import numpy as np
from sklearn.linear_model import LinearRegression
x_train = np.float32(np.random.rand(2000000,1)*10)
def synthesize_output(input):
    return (1.29 * input[0] + 13)
y_train = np.array([synthesize_output(row) for row in x_train]).reshape(-1,1)
lm = LinearRegression()
lm.fit(x_train, y_train)
#finally check the weight and constant
lm.score(x_train, y_train)
print(f"Weight: {lm.coef_}")
print(f"Bias: {lm.intercept_}")
Why is PyTorch not able to handle large input data like sklearn?
When I use a very large training data size (> 5000), the loss goes to NaN.
How do we typically work around this problem?
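A minimal sketch of one common workaround, shown here only for illustration and not taken from the original post: average the loss instead of summing it, so the gradient magnitude stays independent of the dataset size (the clamp is also dropped so the model stays purely linear).
import numpy as np
import torch

x_train = np.float32(np.random.rand(5000, 1) * 10)
y_train = 1.29 * x_train + 13          # same synthetic rule as above

X_train = torch.from_numpy(x_train)
Y_train = torch.from_numpy(y_train)

w = torch.rand(1, 1, requires_grad=True)
b = torch.rand(1, 1, requires_grad=True)
learning_rate = 0.001

for it in range(1, 4001):
    y_pred = X_train.mm(w).add(b)
    # mean() instead of sum(): the gradient no longer grows with the dataset size
    loss = (y_pred - Y_train).pow(2).mean()
    loss.backward()
    if it % 500 == 0:
        print(f"Iter: {it}, Loss={loss.item()}")  # loss decreases instead of going to NaN
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(f"Weights: {w}\nBias: {b}")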
I am using Ubuntu 19.04 (Disco Dingo), Python 3.7.3, and TensorFlow 1.14.0.
I noticed that the number of outputs given by the tensorflow.keras.Sequential.predict function is different than the number of inputs. Furthermore, it appears that there is no relation between the inputs and outputs.
Example:
import tensorflow as tf
import math
import numpy as np
import json
# We will train the model to recognize an XOR
x = [ [0,0], [0,1], [1,0], [1,1] ]
y = [ 0, 1, 1, 0 ]
xt = tf.cast(x, tf.float64)
yt = tf.cast(y, tf.float64)
# This model should be more than enough to learn an XOR
L0 = tf.keras.layers.Dense(2)
L1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
L2 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
L3 = tf.keras.layers.Dense(2, activation=tf.nn.softmax)
model = tf.keras.Sequential([L0,L1,L2,L3])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)
model.fit(
    x=xt,
    y=yt,
    batch_size=32,
    epochs=1000,  # Try to overfit data
    shuffle=False,
    steps_per_epoch=math.ceil(len(x)/32)
)
# While it is training, the loss drops to near zero
# and the accuracy goes to 100%.
# The large number of epochs and the small number of training examples
# should mean that the network is overtrained.
print("testing")
for i in range(len(y)):
    m = tf.cast([x[i]], tf.float64)
    # m should be the ith training example
    values = model.predict(m, steps=1)
    best = np.argmax(values[0])
    print(x[i], y[i], best)
The output I always get is:
(input, correct answer, predicted answer)
[0, 0] 0 0
[0, 1] 1 0
[1, 0] 1 0
[1, 1] 0 0
or
[0, 0] 0 1
[0, 1] 1 1
[1, 0] 1 1
[1, 1] 0 1
So even though I thought the network would be overtrained, and even though the program said the accuracy was 100% and the loss was virtually zero, the output looks as though the network hadn't trained at all.
Stranger yet is when I replace the testing section with the following:
print("testing")
m = tf.cast([], tf.float64)
values = model.predict(m, steps=1)
print(values)
I would think that this would return an empty array or throw an exception. Instead it gives:
[[0.9979249 0.00207507]
[0.10981816 0.89018184]
[0.10981816 0.89018184]
[0.9932179 0.0067821 ]]
This corresponds to [0,1,1,0]
So even though it was given nothing to predict on, it still gives out predictions for something. And it appears as though the predictions match up with what we would expect from sending the entire training set into the predict method.
Replacing the testing section again:
print("testing")
m = tf.cast([[0,0]], tf.float64)
# [0,0] is the first training example
# the output should be something close to [[1.0,0.0]]
values = model.predict(m, steps=1)
for j in range(len(values)):
    print(values[j])
exit()
I get:
[0.9112452 0.08875483]
[0.00552484 0.9944752 ]
[0.00555605 0.99444395]
[0.9112452 0.08875483]
This corresponds to [0,1,1,0]
So asking it to predict on zero inputs gives out 4 predictions, and asking it to predict on one input also gives out 4 predictions. Furthermore, the predictions look like what we would expect if we put the entire training set into the predict function.
Any ideas as to what's going on? How do I get my network to give exactly one prediction for each input given?
Providing the solution here (in the answer section), even though it is present in the comment section, for the benefit of the community.
Upgrading TensorFlow from 1.14.0 to >= 2.0 resolved the issue.
After upgrading, the test section works as expected:
m = tf.cast([[0,0]], tf.float64)
# [0,0] is the first training example
# the output should be something close to [[1.0,0.0]]
values = model.predict(m, steps=1)
for j in range(len(values)):
    print(values[j])
exit()
Output:
[0.9921625 0.00783745]
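(A small follow-up sketch, not from the original answer, reusing x, y, and model from the question above: passing the whole batch at once now yields exactly one prediction per input.)
import numpy as np

batch = np.array(x, dtype=np.float64)    # shape (4, 2): all four training examples
values = model.predict(batch)            # shape (4, 2): one row of class probabilities per input
predictions = np.argmax(values, axis=1)  # one predicted class per input
for inp, label, pred in zip(x, y, predictions):
    print(inp, label, pred)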
I'm trying to come up with a Keras model that would do binary classification on image sequences. I split my dataset into train (1200 samples), valid (320 samples) and test (764 samples) partitions.
In the first step, I used InceptionV3 to extract features from my dataset consisting of image sequences. I saved those sequences of extracted features to disk.
The next step is to feed those sequences as input data to my LSTM based model.
Here is a minimal workable example that compiles:
import numpy as np
import copy
import random
from keras.layers.recurrent import LSTM
from keras.layers import Dense, Flatten, Dropout
from keras.models import Sequential
from keras.optimizers import Adam
def get_sequence(sequence_no, sample_type):
    # Load sequence from disk according to 'sequence_no' and 'sample_type'
    # e.g. 'sample_' + sample_type + '_' + sequence_no
    return np.zeros([100, 2048])  # added only for demo purposes, so it compiles

def frame_generator(batch_size, generator_type, labels):
    # Define list of sample indexes
    sample_list = []
    for i in range(0, len(labels)):
        sample_list.append(i)
    # sample_list = [0, 1, 2, 3 ...]

    while 1:
        X, y = [], []
        # Generate batch_size samples
        for _ in range(batch_size):
            # Reset to be safe
            sequence = None
            # Get a random sample
            random_sample = random.choice(sample_list)
            # Get sequence from disk
            sequence = get_sequence(random_sample, generator_type)
            X.append(sequence)
            y.append(labels[random_sample])
        yield np.array(X), np.array(y)
# Data mimicking
train_samples_num = 1200
valid_samples_num = 320
labels_train = np.random.randint(0, 2, size=(train_samples_num), dtype='uint8')
labels_valid = np.random.randint(0, 2, size=(valid_samples_num), dtype='uint8')
batch_size = 32
train_generator = frame_generator(batch_size, 'train', labels_train)
val_generator = frame_generator(batch_size, 'valid', labels_valid)
# Model
model = Sequential()
model.add(LSTM(2048, input_shape=(100, 2048), dropout = 0.5))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
model.summary()
# Training
model.fit_generator(
    generator=train_generator,
    steps_per_epoch=len(labels_train) // batch_size,
    epochs=10,
    validation_data=val_generator,
    validation_steps=len(labels_valid) // batch_size)
This example was compiled with:
python 3.6.8
keras 2.2.4
tensorflow 1.13.1
This works well: I ran 3 training sessions on this model (same setup, same train/valid/test partitioning), and the mean test accuracy was 96.9%.
Then I started to ask myself whether it's a good idea to always choose a completely random sample from the sample list inside the frame_generator() function, specifically the line
random_sample = random.choice(sample_list). Picking samples completely at random means that some samples may be used far more often than others during training, and some may never be used at all. This made me think that the model would generalize less in this setup compared to one where it sees all samples equally often during training.
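(As a rough, illustrative estimate of that concern, not from the original post: in a single pass of len(labels) draws with replacement, roughly a third of the samples are never picked.)
import random

n = 1200                                            # number of training samples
draws = [random.randrange(n) for _ in range(n)]     # one "epoch" worth of random choices
missed = n - len(set(draws))
print(f"{missed} of {n} samples ({missed / n:.0%}) never drawn in this pass")
# In expectation, (1 - 1/n)**n, i.e. about 37%, of the samples are skipped per pass.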
As a fix, I applied the following changes to frame_generator():
Make a backup copy of the sample list, then remove each sample from the list after using it. When the sample list becomes empty, replace it with its backup copy, and repeat this for the whole training session. This ensures that all of the samples are seen by the model during training with almost the same frequency.
This is the new version of frame_generator(), with the 4 added lines marked with # ADDED:
def frame_generator(batch_size, generator_type, labels):
    # Define list of sample indexes
    sample_list = []
    for i in range(0, len(labels)):
        sample_list.append(i)
    # sample_list = [0, 1, 2, 3 ...]
    sample_list_backup = copy.deepcopy(sample_list)  # ADDED

    while 1:
        X, y = [], []
        # Generate batch_size samples
        for _ in range(batch_size):
            # Reset to be safe
            sequence = None
            # Get a random sample
            random_sample = random.choice(sample_list)
            sample_list.remove(random_sample)  # ADDED
            if len(sample_list) == 0:  # ADDED
                sample_list = copy.deepcopy(sample_list_backup)  # ADDED
            # Get sequence from disk
            sequence = get_sequence(random_sample, generator_type)
            X.append(sequence)
            y.append(labels[random_sample])
        yield np.array(X), np.array(y)
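(For comparison, a simpler sketch of the same "every sample once per pass" idea, shuffling the index list instead of repeatedly calling remove(); this reuses get_sequence, random, and np from above and is not the code used in the experiments below.)
def shuffled_frame_generator(batch_size, generator_type, labels):
    sample_list = list(range(len(labels)))
    while True:
        random.shuffle(sample_list)  # new order each pass; every sample appears exactly once
        for start in range(0, len(sample_list) - batch_size + 1, batch_size):
            batch = sample_list[start:start + batch_size]
            # leftover samples that do not fill a whole batch are re-shuffled into the next pass
            X = [get_sequence(i, generator_type) for i in batch]
            y = [labels[i] for i in batch]
            yield np.array(X), np.array(y)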
What I expected:
Given that the model should now generalize a bit better, since all samples are seen during training with equal frequency, I expected the test accuracy to increase slightly.
What happened:
I ran 3 training sessions on this model (same train/valid/test partitioning), and the mean test accuracy this time was 96.2%. That is a 0.7% decrease in accuracy from the first setup, so it seems that the model now generalizes worse.
Loss plots for each run:
Question:
Why does more randomness in frame_generator() result in better test set accuracy?
Seems rather counterintuitive to me.
This is an example of using an Elman recurrent neural network from the Neurolab Python library:
import neurolab as nl
import numpy as np
# Create train samples
i1 = np.sin(np.arange(0, 20))
i2 = np.sin(np.arange(0, 20)) * 2
t1 = np.ones([1, 20])
t2 = np.ones([1, 20]) * 2
input = np.array([i1, i2, i1, i2]).reshape(20 * 4, 1)
target = np.array([t1, t2, t1, t2]).reshape(20 * 4, 1)
# Create network with 2 layers
net = nl.net.newelm([[-2, 2]], [10, 1], [nl.trans.TanSig(), nl.trans.PureLin()])
# Set initialized functions and init
net.layers[0].initf = nl.init.InitRand([-0.1, 0.1], 'wb')
net.layers[1].initf= nl.init.InitRand([-0.1, 0.1], 'wb')
net.init()
# Train network
error = net.train(input, target, epochs=500, show=100, goal=0.01)
# Simulate network
output = net.sim(input)
# Plot result
import pylab as pl
pl.subplot(211)
pl.plot(error)
pl.xlabel('Epoch number')
pl.ylabel('Train error (default MSE)')
pl.subplot(212)
pl.plot(target.reshape(80))
pl.plot(output.reshape(80))
pl.legend(['train target', 'net output'])
pl.show()
In this example, the training input is built by concatenating the length-20 sequences i1 and i2 (each repeated twice), and the targets are merged in the same way; the network is then trained on these merged arrays.
First of all, it doesn't seem to match the schema that I got from here:
My main question is:
I have to train the network with arbitrary lengths of inputs and outputs, like these:
Arbitrary length inputs to fixed length outputs
Fixed length inputs to arbitrary length outputs
Arbitrary length inputs to arbitrary length outputs
At this point you may be thinking: "Your answer is long short-term memory networks."
I know, but Neurolab is easy to use because of its good features; in particular, it is exceptionally Pythonic. So I'm set on using the Neurolab library for my problem. But if you suggest another library like Neurolab with better LSTM functionality, I will accept it.
Finally, how can I rearrange this example for arbitrary lengths of inputs and outputs?
I don't have the best understanding of RNNs and LSTMs, so please be thorough in your explanation.
Looking back at this question of mine after a long time, I can see it was asked by someone who lacked an understanding of neural networks.
Matrix multiplication is the basic math at the heart of neural networks. You cannot simply change the shape of the input matrix, because doing so changes the shape of the product and breaks consistency across the dataset.
Neural networks are always trained with fixed-length inputs and outputs. Here is a very simple neural network implementation that uses nothing but NumPy's dot product for the feedforward pass:
import numpy as np
# sigmoid function
def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
# input dataset
X = np.array([ [0,0,1],
[0,1,1],
[1,0,1],
[1,1,1] ])
# output dataset
y = np.array([[0,0,1,1]]).T
# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)
# initialize weights randomly with mean 0
syn0 = 2*np.random.random((3,1)) - 1
for iter in range(10000):
    # forward propagation
    l0 = X
    l1 = nonlin(np.dot(l0, syn0))

    # how much did we miss?
    l1_error = y - l1

    # multiply how much we missed by the
    # slope of the sigmoid at the values in l1
    l1_delta = l1_error * nonlin(l1, True)

    # update weights
    syn0 += np.dot(l0.T, l1_delta)

print("Output After Training:")
print(l1)
credit: http://iamtrask.github.io/2015/07/12/basic-python-network/
I am trying to use PyBrain to output RGB values. The input layer takes an array of RGB values, and all hidden layers are linear layers. I would have expected the network to output RGB values; however, the output of this network turns out to be an array of values that are nowhere near the range 0:255.
The data consists of about 25 different .jpg images of a bull; each image is a flattened array of length 575280. I was hoping the network would converge on an image that resembles a bull.
import numpy as np
from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, GaussianLayer, TanhLayer
from pybrain.structure import FullConnection, BiasUnit
import testabull
bull_x = 510
bull_y = 398
bull_flat = 575280
n = FeedForwardNetwork()
bias_unit = BiasUnit()
in_layer = LinearLayer(bull_flat)
hidden_A = LinearLayer(5)
hidden_B = LinearLayer(10)
out_layer = LinearLayer(bull_flat)
n.addInputModule(in_layer)
n.addModule(hidden_A)
n.addModule(hidden_B)
n.addOutputModule(out_layer)
n.addModule(bias_unit)
in_to_hidden = FullConnection(in_layer, hidden_A)
hidden_to_hidden = FullConnection(hidden_A, hidden_B)
hidden_to_out = FullConnection(hidden_B, out_layer)
bias_to_hidden = FullConnection(hidden_B, out_layer)
n.addConnection(in_to_hidden)
n.addConnection(hidden_to_hidden)
n.addConnection(bias_to_hidden)
n.addConnection(hidden_to_out)
n.sortModules()
bull_img_array = testabull.crop_the_bull_images('../../imgs/thebull/')
trainable_array = [] ## an array of flattened images
for im in bull_img_array:
    flat_im = np.array(im).flatten()
    trainable_array.append(flat_im)

print(n)
print(n.activate(trainable_array[0]))

output = None
for a in trainable_array:
    output = n.activate(a)

print(output, len(output))
If anyone has any tips, I would be very grateful.
First off, there are two issues here. One: you need to scale your outputs to between 0 and 255. You can do this with a transformation afterwards, taking the max and min values and rescaling the outputs into the [0, 255] range.
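A minimal NumPy sketch of that rescaling (assuming output is the activation array from the question code):
import numpy as np

out = np.asarray(output, dtype=float)
# map min..max of the raw output onto 0..255
rgb = (out - out.min()) / (out.max() - out.min()) * 255.0
rgb = np.clip(np.rint(rgb), 0, 255).astype(np.uint8)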
Two: this network will likely not learn what you'd like it to, because your hidden layers use linear layers. This is not very useful, as the weights themselves already form a linear transformation, so you'll essentially end up with a linear function. See ftp://ftp.sas.com/pub/neural/FAQ2.html#A_act
I would recommend using a SigmoidLayer for your hidden layers; this of course squashes the values between 0 and 1. You can correct this at the output by multiplying by 255, either via a fixed layer or by transforming the values afterwards.
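A sketch of that change to the network definition, reusing the names and imports from the question code (sigmoid hidden layers, with the output rescaled into [0, 255] afterwards as shown above):
n = FeedForwardNetwork()
in_layer = LinearLayer(bull_flat)
hidden_A = SigmoidLayer(5)     # sigmoid instead of linear
hidden_B = SigmoidLayer(10)    # sigmoid instead of linear
out_layer = LinearLayer(bull_flat)

n.addInputModule(in_layer)
n.addModule(hidden_A)
n.addModule(hidden_B)
n.addOutputModule(out_layer)

n.addConnection(FullConnection(in_layer, hidden_A))
n.addConnection(FullConnection(hidden_A, hidden_B))
n.addConnection(FullConnection(hidden_B, out_layer))
n.sortModules()

# after n.activate(...), rescale the raw output into [0, 255] as in the snippet above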