I apologize for the poor question title, but I'm not quite sure how to phrase it. Here's the problem I'm trying to solve: I have two NNs working off of the same input dataset in my code. One of them is a traditional network, while the other is used to limit the acceptable range of the first. This works by using a tf.where() statement, which works fine in most cases, such as this toy example:
pcts = tf.constant([0.04,0.06,0.06,0.06,0.06,0.06,0.06,0.04,0.04,0.04])
legal_actions = tf.where(pcts>=0.05, tf.ones_like(pcts), tf.zeros_like(pcts))
This gives the correct result: legal_actions = [0,1,1,1,1,1,1,0,0,0]
I can then multiply this by the output of my first network to limit its Q values to only those of the legal actions. In a case like the above this works great.
However, it is also possible that my original vector looks something like this, with low values in the middle of the high values: pcts= [0.04,0.06,0.06,0.04,0.04,0.06,0.06,0.04,0.04,0.04]
Using the same code as above my legal_actions comes out as this: legal_actions = [0,1,1,0,0,1,1,0,0,0]
Based on my code, this is correct; however, I'd like to include any zeros in the middle as part of my legal_actions. In other words, I'd like this second example to give the same result as the first. In basic TF this is easy to do in several different ways, such as in this reproducible example (it's also easy to do with sparse tensors):
import tensorflow as tf
pcts= tf.placeholder(tf.float32, shape=(10,))
legal_actions = tf.where(pcts>=0.05, tf.ones_like(pcts), tf.zeros_like(pcts))
mask = tf.where(tf.greater(legal_actions,0))
legals = tf.cast(tf.range(tf.reduce_min(mask),tf.reduce_max(mask)+1),tf.int64)
oh = tf.one_hot(legals,10)
oh = tf.reduce_sum(oh,0)
with tf.Session() as sess:
    print(sess.run(oh,feed_dict={pcts:[0.04,0.06,0.06,0.04,0.04,0.06,0.06,0.04,0.04,0.04]}))
The problem that I'm running into is when I try to apply this to my actual code which is reading in batches from a file. I can't figure out a way to fill in the "gaps" in my tensor without the range function and/or I can't figure out how to make the range function work with batches (it will only make one range at a time, not one per batch, as near as I can tell). Any suggestions on how to either make what I'm working on work or how to solve the problem a completely different way would be appreciated.
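One way to get a per-row range without calling a range function once per example is to build a single index vector [0, 1, ..., n-1] and compare it, via broadcasting, against each row's first and last legal index. The sketch below does this in NumPy so it can be run and checked on its own; the function name fill_gaps_by_index is hypothetical. Translating it to TensorFlow should be possible with tf.argmax (on the mask cast to a numeric type), tf.reverse and tf.range, but that translation is an untested assumption.

```python
import numpy as np

def fill_gaps_by_index(mask):
    """Mark every position between the first and last True entry of each row."""
    mask = np.asarray(mask, dtype=bool)
    n = mask.shape[-1]
    positions = np.arange(n)                               # [0, 1, ..., n-1]
    first = np.argmax(mask, axis=-1)                       # first True per row
    last = n - 1 - np.argmax(mask[..., ::-1], axis=-1)     # last True per row
    inside = (positions >= first[..., None]) & (positions <= last[..., None])
    # rows without any True would otherwise be marked at index 0
    return (inside & mask.any(axis=-1, keepdims=True)).astype(np.float32)

pcts = np.array([[0.04, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.04, 0.04, 0.04],
                 [0.04, 0.06, 0.06, 0.04, 0.04, 0.06, 0.06, 0.04, 0.04, 0.04]])
legal = fill_gaps_by_index(pcts >= 0.05)
print(legal)  # both rows: [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```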
Try this code:
import tensorflow as tf
pcts = tf.random.uniform((2,3,4))
a = pcts>=0.5
shape = tf.shape(pcts)[-1]
a = tf.reshape(a, (-1, shape))
a = tf.cast(a, dtype=tf.float32)
def rng(t):
    # running maximum from the left and from the right; tf.maximum works on tensors
    left = tf.scan(lambda a, x: tf.maximum(a, x), t)
    right = tf.scan(lambda a, x: tf.maximum(a, x), t, reverse=True)
    return tf.minimum(left, right)
a = tf.map_fn(lambda x: rng(x), a)
a = tf.reshape(a, (tf.shape(pcts)))
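The scan in rng() computes a running maximum from the left and one from the right; their elementwise minimum is 1 exactly between the first and last 1 of a row. Because this is a cumulative operation along the last axis, it can also be applied to a whole batch at once without map_fn. A NumPy sketch of the same idea, using np.maximum.accumulate in place of tf.scan (fill_gaps_cummax is a hypothetical name):

```python
import numpy as np

def fill_gaps_cummax(mask):
    """Same idea as rng(): min(running max from the left, running max from the right)."""
    a = np.asarray(mask, dtype=np.float32)
    left = np.maximum.accumulate(a, axis=-1)                          # 1 from the first 1 onward
    right = np.maximum.accumulate(a[..., ::-1], axis=-1)[..., ::-1]   # 1 up to the last 1
    return np.minimum(left, right)

batch = np.array([[0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
                  [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
print(fill_gaps_cummax(batch))
```

In TensorFlow, a comparable batched version would likely use tf.math.cumsum along the last axis (once normally and once with reverse=True), test each result for > 0, and combine the two masks with a logical AND; this is a suggestion rather than tested code.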
Some questions and tutorials, such as these:
Why is an iterable object not an iterator?
Generator "TypeError: 'generator' object is not an iterator"
suggest that a data generator for Keras should be a class with __iter__ and __next__ methods.
Other tutorials, such as these:
https://keunwoochoi.wordpress.com/2017/08/24/tip-fit_generator-in-keras-how-to-parallelise-correctly/
https://www.altumintelligence.com/articles/a/Time-Series-Prediction-Using-LSTM-Deep-Neural-Networks
use a normal Python function with a yield statement to provide data. I successfully used yield in an LSTM network by following the second tutorial above, but I cannot use a normal yield function in a convolutional network; I get the following error in fit_generator:
'method' object is not an iterator
I haven't tried using the __next__ method, but others who got the above error were advised to use the __next__ method (EDIT: working after a fix suggested by Daniel Möller). Can someone please clarify which technique to use when, and what the difference is between a function that "yields" the next sample and a class with __iter__ and __next__?
My working code using yield:
https://github.com/KashyapCKotak/Multidimensional-Stock-Price-Prediction/blob/master/StockTF1_4Sequential.ipynb
My current data generator function using yield (EDIT: working after a fix suggested by Daniel Möller):
def train_images_generator(self):
    for epoch in range(0, self.epochs):
        print("Current Epoch:",epoch)
        cnt = 0
        if epoch > 2000:
            learning_rate = 1e-5
        for ind in np.random.permutation(len(self.train_ids)):
            print("provided image with id:",ind)
            #get the input image and target/ground truth image based on ind
            raw = rawpy.imread(in_path)
            input_images = np.expand_dims(pack_raw(raw), axis=0) * ratio # pack the bayer image in 4 channels of RGBG
            gt_raw = rawpy.imread(gt_path)
            im = gt_raw.postprocess(use_camera_wb=True,
                                    half_size=False,
                                    no_auto_bright=True, output_bps=16)
            gt_images = np.expand_dims(np.float32(im / 65535.0),axis=0) # divide by 65535 to normalise (scale between 0 and 1)
            # crop
            H = input_images.shape[1] # get the image height (number of rows)
            W = input_images.shape[2] # get the image width (number of columns)
            xx = np.random.randint(0, W - ps) # get a random number in W-ps (W-512)
            yy = np.random.randint(0, H - ps) # get a random number in H-ps (H-512)
            input_patch = input_images[:, yy:yy + ps, xx:xx + ps, :]
            gt_patch = gt_images[:, yy * 2:yy * 2 + ps * 2, xx * 2:xx * 2 + ps * 2, :]
            if np.random.randint(2) == 1: # random flip for rows
                input_patch = np.flip(input_patch, axis=1)
                gt_patch = np.flip(gt_patch, axis=1)
            if np.random.randint(2) == 1: # random flip for columns
                input_patch = np.flip(input_patch, axis=2)
                gt_patch = np.flip(gt_patch, axis=2)
            if np.random.randint(2) == 1: # random transpose
                input_patch = np.transpose(input_patch, (0, 2, 1, 3))
                gt_patch = np.transpose(gt_patch, (0, 2, 1, 3))
            input_patch = np.minimum(input_patch, 1.0)
            yield (input_patch,gt_patch)
How I use it:
model.fit_generator(
    generator=data.train_images_generator(),
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    callbacks=callbacks,
    max_queue_size=50
    #workers=0
)
Looking carefully at the word 'method', I see you are not "calling" your generator (you are not creating it).
You are passing just the function/method.
Suppose you have:
def generator(...):
    ...
    yield x, y
Instead of something like:
model.fit_generator(generator)
You should do something like:
model.fit_generator(generator(...))
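The "'method' object is not an iterator" error can be reproduced in plain Python without Keras. A generator function only produces an iterator when it is called, and a class with __iter__ and __next__ is a hand-written version of the same iterator protocol. A minimal sketch (batch_generator and BatchIterator are hypothetical names), with both looping forever, as Keras expects from fit_generator:

```python
def batch_generator(data, batch_size):
    """Generator function: calling it returns a generator object (an iterator)."""
    while True:  # Keras-style generators are expected to loop forever
        for start in range(0, len(data), batch_size):
            yield data[start:start + batch_size]

class BatchIterator:
    """Equivalent hand-written iterator using __iter__ and __next__."""
    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size
        self.start = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.start >= len(self.data):
            self.start = 0  # wrap around instead of raising StopIteration
        batch = self.data[self.start:self.start + self.batch_size]
        self.start += self.batch_size
        return batch

data = list(range(5))
gen = batch_generator(data, 2)         # called: this is an iterator
it = BatchIterator(data, 2)
print([next(gen) for _ in range(4)])   # [[0, 1], [2, 3], [4], [0, 1]]
print([next(it) for _ in range(4)])    # same batches

try:
    next(batch_generator)              # not called: a plain function object
except TypeError as err:
    print(err)                         # 'function' object is not an iterator
```

Passing a bound method such as data.train_images_generator without the parentheses fails the same way, only with 'method' in the message.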
Generator or Sequence
What is the difference between using a generator (a function with yield) and a keras.utils.Sequence?
When using a generator, training will follow the exact loop order, and it will not know when to finish.
With a generator:
Cannot shuffle batches because it will always follow the order of the loop
Must specify steps_per_epoch because Keras cannot know when the generator has finished (generators for Keras must be infinite)
If using multiprocessing, the system may not handle the batches correctly because it's impossible to know which process will start or finish before the others.
With a Sequence:
You control the length of the sequence, so Keras knows the number of batches automatically
You control the indexing of the batches, so Keras can shuffle batches.
You can take whatever batch you want how many times you want (you are not forced to take batches in sequence)
Multiprocessing can use the indices to make sure the batches are not going to be mixed in the end.
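A minimal sketch of the Sequence side, assuming the keras.utils.Sequence protocol of __len__ and __getitem__ (plus the optional on_epoch_end hook). The class below does not inherit from keras.utils.Sequence so that it runs without Keras installed; in real code you would subclass it. ArraySequence and its parameters are hypothetical.

```python
import numpy as np

class ArraySequence:
    """Index-based batch provider following the Sequence protocol.
    In real code this would subclass keras.utils.Sequence."""

    def __init__(self, x, y, batch_size, shuffle=True, seed=0):
        self.x, self.y = np.asarray(x), np.asarray(y)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.order = np.arange(len(self.x))
        self.on_epoch_end()

    def __len__(self):
        # number of batches per epoch, so no steps_per_epoch is needed
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        # any batch can be requested, in any order, by any worker
        rows = self.order[idx * self.batch_size:(idx + 1) * self.batch_size]
        return self.x[rows], self.y[rows]

    def on_epoch_end(self):
        # reshuffle the sample order between epochs
        if self.shuffle:
            self.rng.shuffle(self.order)

seq = ArraySequence(np.arange(10), np.arange(10) * 2, batch_size=4, shuffle=False)
print(len(seq))   # 3
xb, yb = seq[2]   # the last, partial batch
print(xb, yb)     # [8 9] [16 18]
```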
I am new to DL and Keras, and I am currently trying to implement a Sobel-filter-based custom loss function in Keras.
The idea is to calculate the mean squared error between a Sobel-filtered prediction and a Sobel-filtered ground-truth image.
So far, my custom loss function looks like this:
from scipy import ndimage
def mse_sobel(y_true, y_pred):
    for i in range (0, y_true.shape[0]):
        dx_true = ndimage.sobel(y_true[i,:,:,:], 1)
        dy_true = ndimage.sobel(y_true[i,:,:,:], 2)
        mag_true[i,:,:,:] = np.hypot(dx_true, dy_true)
        mag_true[i,:,:,:] *= 1.0 / np.max(mag_true[i,:,:,:])
        dx_pred = ndimage.sobel(y_pred[i,:,:,:], 1)
        dy_pred = ndimage.sobel(y_pred[i,:,:,:], 2)
        mag_pred[i,:,:,:] = np.hypot(dx_pred, dy_pred)
        mag_pred[i,:,:,:] *= 1.0 / np.max(mag_pred[i,:,:,:])
    return(K.mean(K.square(mag_pred - mag_true), axis=-1))
Using this loss function leads to this error:
in mse_sobel
for i in range (0, y_true.shape[0]):
TypeError: __index__ returned non-int (type NoneType)
Using the debugger, I found that y_true.shape[0] is None, which is fine (the batch size is not known when the loss is built). But when I replace it with, for example, 1, so that the loop reads for i in range (0,1):, another error occurs:
in sobel
axis = _ni_support._check_axis(axis, input.ndim)
in _check_axis
raise ValueError('invalid axis')
ValueError: invalid axis
I am not sure why the axis is reported as invalid.
Can anyone help me figure out how to implement that loss function?
Thank you very much for your help!
Losses must be made with tensor operations, using the keras backend, or tensorflow/theano/cntk functions. This is the only way to keep backpropagation. Using numpy, scipy etc. breaks the graph.
Let's import the keras backend:
import keras.backend as K
Defining the filters:
#this contains both X and Y sobel filters in the format (3,3,1,2)
#size is 3 x 3, it considers 1 input channel and has two output channels: X and Y
sobelFilter = K.variable([[[[1., 1.]], [[0., 2.]],[[-1., 1.]]],
                          [[[2., 0.]], [[0., 0.]],[[-2., 0.]]],
                          [[[1., -1.]], [[0., -2.]],[[-1., -1.]]]])
Here is a function that repeats the filters for each input channel, in case your images are RGB or have more than one channel. It simply replicates the Sobel filters for each input channel, giving a shape of (3,3,inputChannels,2):
def expandedSobel(inputTensor):
    #this considers data_format = 'channels_last'
    inputChannels = K.reshape(K.ones_like(inputTensor[0,0,0,:]),(1,1,-1,1))
    #if you're using 'channels_first', use inputTensor[0,:,0,0] above
    return sobelFilter * inputChannels
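To see what expandedSobel produces, the same filter array can be inspected with NumPy, whose broadcasting rules match the element-wise multiplication used here. This sketch is only a shape check on a plain array, assuming an RGB input (3 channels):

```python
import numpy as np

# Same (3, 3, 1, 2) layout as sobelFilter: [..., 0] is the X kernel, [..., 1] the Y kernel
sobel = np.array([[[[1., 1.]], [[0., 2.]], [[-1., 1.]]],
                  [[[2., 0.]], [[0., 0.]], [[-2., 0.]]],
                  [[[1., -1.]], [[0., -2.]], [[-1., -1.]]]])

print(sobel[:, :, 0, 0])   # X kernel: [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
print(sobel[:, :, 0, 1])   # Y kernel: [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

# expandedSobel multiplies by a (1, 1, C, 1) tensor of ones; broadcasting copies
# both kernels into every input channel (C = 3 for RGB here)
ones = np.ones((1, 1, 3, 1))
expanded = sobel * ones
print(expanded.shape)      # (3, 3, 3, 2)
```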
And this is the loss function:
def sobelLoss(yTrue,yPred):
    #get the sobel filter repeated for each input channel
    filt = expandedSobel(yTrue)
    #calculate the sobel filters for yTrue and yPred
    #this generates twice the number of input channels
    #a X and Y channel for each input channel
    sobelTrue = K.depthwise_conv2d(yTrue,filt)
    sobelPred = K.depthwise_conv2d(yPred,filt)
    #now you just apply the mse:
    return K.mean(K.square(sobelTrue - sobelPred))
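Because sobelLoss is symbolic, it can help to have a plain NumPy reference to compare against on small arrays. The sketch below is not differentiable and must not be used as a Keras loss; it is only for sanity checks. It assumes 'valid' padding (which I believe is the K.depthwise_conv2d default) and assumes the depthwise output places input channel c with kernel m at output index 2*c + m, as TensorFlow documents for depthwise convolution. All names are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]])
SOBEL_Y = SOBEL_X.T   # [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

def correlate_valid(img, kernel):
    """2-D cross-correlation with 'valid' padding (no kernel flip), as conv layers do."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * img[i:i + out.shape[0], j:j + out.shape[1]]
    return out

def sobel_responses(batch):
    """batch: (N, H, W, C) -> (N, H-2, W-2, 2*C), X and Y interleaved per channel."""
    n, h, w, c = batch.shape
    out = np.empty((n, h - 2, w - 2, 2 * c))
    for b in range(n):
        for ch in range(c):
            out[b, :, :, 2 * ch] = correlate_valid(batch[b, :, :, ch], SOBEL_X)
            out[b, :, :, 2 * ch + 1] = correlate_valid(batch[b, :, :, ch], SOBEL_Y)
    return out

def sobel_loss_reference(y_true, y_pred):
    return np.mean(np.square(sobel_responses(y_true) - sobel_responses(y_pred)))

edge = np.zeros((1, 5, 5, 1))
edge[0, :, 2:, 0] = 1.0                        # vertical edge between columns 1 and 2
print(sobel_responses(edge)[0, :, :, 0])       # each row: -4, -4, 0
print(sobel_loss_reference(edge, edge + 0.5))  # a constant offset has no edges
```

Since each Sobel kernel sums to zero, a constant brightness offset yields a zero response, so the loss between an image and a uniformly shifted copy is 0: this loss only compares edges.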
Apply this loss in the model:
model.compile(loss=sobelLoss, optimizer = ....)
In my experience, computing the combined Sobel magnitude sqrt(X² + Y²) gives terrible results, and the resulting images look like chessboards. But if you do want it:
def squareSobelLoss(yTrue,yPred):
    #same beginning as the other loss
    filt = expandedSobel(yTrue)
    squareSobelTrue = K.square(K.depthwise_conv2d(yTrue,filt))
    squareSobelPred = K.square(K.depthwise_conv2d(yPred,filt))
    #here, since we've got 6 output channels (for an RGB image)
    #let's reorganize in order to easily sum X² and Y²: change (h,w,6) to (h,w,3,2)
    #caution: this method of reshaping only works in tensorflow
    #if you do need this in other backends, let me know
    newShape = K.shape(squareSobelTrue)
    newShape = K.concatenate([newShape[:-1],
                              newShape[-1:]//2,
                              K.variable([2],dtype='int32')])
    #sum the last axis (the one that is 2 above, representing X² and Y²)
    squareSobelTrue = K.sum(K.reshape(squareSobelTrue,newShape),axis=-1)
    squareSobelPred = K.sum(K.reshape(squareSobelPred,newShape),axis=-1)
    #since both previous values are already squared, maybe we shouldn't square them again?
    #but you can apply the K.sqrt() in both, and then make the difference,
    #and then another square, it's up to you...
    return K.mean(K.abs(squareSobelTrue - squareSobelPred))
I want to apply a filter to a tensor and remove values that do not meet my criteria. For example, let's say I have a tensor that looks like this:
softmax_tensor = [[ 0.05 , 0.05, 0.2, 0.7], [ 0.25 , 0.25, 0.3, 0.2 ]]
Right now, the classifier picks the argmax of the tensors to predict:
predictions = [[3],[2]]
But this isn't exactly what I want, because I lose information about the confidence of that prediction. I would rather make no prediction than an incorrect one. So what I would like to do is return filtered tensors like so:
new_softmax_tensor = [[ 0.05 , 0.05, 0.2, 0.7]]
new_predictions = [[3]]
If this were plain Python, I'd have no trouble:
new_softmax_tensor = []
new_predictions = []
for idx,listItem in enumerate(softmax_tensor):
    # get the two highest values and see if they are far enough apart
    M = max(listItem)
    M2 = max(n for n in listItem if n!=M)
    if M - M2 > 0.3: # just making up a criterion here (largest minus second largest)
        new_softmax_tensor.append(listItem)
        new_predictions.append(predictions[idx])
But given that TensorFlow works on tensors, I'm not sure how to do this, and if I did, would it break the computation graph?
A previous SO post suggested using tf.gather_nd, but in that scenario they already had a tensor that they wanted to filter on. I've also looked at tf.cond but still don't understand it. I imagine many other people would benefit from this exact solution.
Thanks all.
There are two things I would do to solve your problem:
First, I would return the value of the softmax tensor. Get a reference to it (keep one when you create it, or find it again in the appropriate tensor collection), evaluate it with sess.run([softmaxtensor,prediction],feed_dict=..), and then work with the result in Python as much as you like.
Second, if you want to stay within the graph, I would use the built-in tf.where(), which works much like the np.where function from NumPy (see its documentation).
Ok. I've got it sorted out now. Here is a working example.
import tensorflow as tf
#Set dummy example tensor
original_softmax_tensor = tf.Variable([
    [0.4,0.2,0.2,0.9,0.1],
    [0.5,0.2,0.2,0.9,0.1],
    [0.6,0.2,0.2,0.1,0.99],
    [0.1,0.8,0.2,0.09,0.99]
    ],name='original_softmax_tensor')
#Set dummy prediction tensor
original_predictions = tf.Variable([3,3,4,4],name='original_predictions')
#Now create a place to store my new variables
new_softmax_tensor = original_softmax_tensor
new_predictions = original_predictions
#set my cutoff variable
min_diff = tf.constant(0.3)
#initialize
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op) #execute init_op
    #There's probably a better way to do this, but I had to do this hack to get
    # the difference between the top 2 scores
    tmp_diff1, _ = tf.nn.top_k(original_softmax_tensor,k=2,sorted=True)
    tmp_diff2, _ = tf.nn.top_k(original_softmax_tensor,k=1,sorted=True)
    #subtracting the max scores from both, makes the largest one '0'
    actual_diff = tf.subtract(tmp_diff2,tmp_diff1)
    #The max value for each will be the actual value of interest
    actual_diff = tf.reduce_max(actual_diff,axis=1)
    #Create a boolean tensor that says to keep or not
    cond_result = actual_diff > min_diff
    #Keep only the values I want
    new_predictions = tf.boolean_mask(original_predictions,cond_result)
    new_softmax_tensor = tf.boolean_mask(new_softmax_tensor,cond_result)
    print(new_predictions.eval())
    print(new_softmax_tensor.eval())
    # return these if this is in a function
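The same filtering can be written more directly: take the two largest scores per row, keep rows whose margin exceeds the cutoff, and apply that boolean mask to both arrays. A NumPy sketch of that logic (filter_confident is a hypothetical name). In TensorFlow, tf.nn.top_k(..., k=2) returns the two largest values in descending order, so values[:, 0] - values[:, 1] should give the margin without the subtract/reduce_max workaround, and tf.boolean_mask applies the mask as in the answer above.

```python
import numpy as np

def filter_confident(probs, preds, min_diff):
    """Keep only rows whose top-1 and top-2 scores differ by more than min_diff."""
    probs = np.asarray(probs)
    top2 = np.sort(probs, axis=1)[:, -2:]   # [second largest, largest] per row
    margin = top2[:, 1] - top2[:, 0]
    keep = margin > min_diff                # boolean mask, one entry per row
    return probs[keep], np.asarray(preds)[keep]

scores = [[0.4, 0.2, 0.2, 0.9, 0.1],
          [0.5, 0.2, 0.2, 0.9, 0.1],
          [0.6, 0.2, 0.2, 0.1, 0.99],
          [0.1, 0.8, 0.2, 0.09, 0.99]]
kept_scores, kept_preds = filter_confident(scores, [3, 3, 4, 4], 0.3)
print(kept_preds)   # [3 3 4]
```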
I am trying to implement this algorithm to find the intercept and slope for a single variable:
Here is my Python code to update the intercept and slope, but it is not converging. The RSS increases with each iteration instead of decreasing, and after some iterations it becomes infinite. I can't find any error in my implementation of the algorithm. How can I solve this problem? I have attached the CSV file too.
Here is the code.
import pandas as pd
import numpy as np
#Defining gradient_decend
#This function takes the X values, Y values and the vector of w0 (intercept), w1 (slope)
#INPUT FEATURES = X (sq. feet of house size)
#TARGET VALUE = Y (price of house)
#W=np.array([w0,w1]).reshape(2,1)
#W=[w0,
#   w1]
def gradient_decend(X,Y,W):
    intercept=W[0][0]
    slope=W[1][0]
    #Here I will get a list
    #the list looks like this
    #gd=[sum(y-(intercept+slope*x)),
    #    sum((y-(intercept+slope*x))*x)]
    gd=[sum(y-(intercept+slope*x) for x,y in zip(X,Y)),
        sum(((y-(intercept+slope*x))*x) for x,y in zip(X,Y))]
    return np.array(gd).reshape(2,1)
#Defining residual sum of squares
def RSS(X,Y,W):
    return sum((y-(W[0][0]+W[1][0]*x))**2 for x,y in zip(X,Y))
#Reading Training Data
training_data=pd.read_csv("kc_house_train_data.csv")
#Defining fixed parameters
#Learning Rate
n=0.0001
iteration=1500
#Intercept
w0=0
#Slope
w1=0
#Creating 2,1 vector of w0,w1 parameters
W=np.array([w0,w1]).reshape(2,1)
#Running gradient descent
for i in range(iteration):
    W=W+((2*n)* (gradient_decend(training_data["sqft_living"],training_data["price"],W)))
    print(RSS(training_data["sqft_living"],training_data["price"],W))
Here is the CSV file.
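One likely reason the RSS grows is the step size rather than the gradient formula. The update W + 2n * gd, with gd = [sum(y - pred), sum((y - pred) * x)], is a correct descent step on RSS, but RSS is quadratic in the weights, and gradient descent on a quadratic only converges from every starting point when the step is below 1 / (largest eigenvalue of X^T X), where X includes the intercept column. With features in the thousands, such as square footage, that bound is many orders of magnitude below n = 0.0001. The sketch below reproduces the effect on hypothetical toy data (not the course CSV); rss and run_gd are hypothetical names.

```python
import numpy as np

def rss(X, y, w):
    r = y - X @ w
    return float(r @ r)

def run_gd(X, y, step, iterations):
    """Same update as the question: w <- w + 2*step * X^T (y - X w)."""
    w = np.zeros(X.shape[1])
    history = [rss(X, y, w)]
    for _ in range(iterations):
        w = w + 2 * step * (X.T @ (y - X @ w))
        history.append(rss(X, y, w))
    return w, history

# Hypothetical toy data on a square-footage-like scale
sqft = np.array([1000.0, 1500.0, 2000.0, 3000.0])
price = 50000.0 + 200.0 * sqft
X = np.column_stack([np.ones_like(sqft), sqft])   # intercept column + feature

# Stability bound for this update on a quadratic objective
limit = 1.0 / np.linalg.eigvalsh(X.T @ X).max()
print(limit)                                      # roughly 6e-8 for this data

_, too_big = run_gd(X, price, 1e-4, 10)           # the question's learning rate
_, small = run_gd(X, price, 0.5 * limit, 10)
print(too_big[0] < too_big[-1])                   # True: RSS explodes
print(small[0] > small[-1])                       # True: RSS decreases
```

This is consistent with the working solution further down needing a step_size as tiny as 7e-12 on the raw house data; rescaling the feature is the other common way to raise the bound.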
Firstly, I find that when writing machine learning code, it's best NOT to use complex list comprehensions, because anything that you can iterate over:
is easier to read if written with normal loops and indentation, and/or
can be done with NumPy broadcasting.
And using proper variable names can help you better understand the code. Using Xs, Ys, Ws as shorthand is nice only if you're good at math. Personally, I don't use them in code, especially when writing in Python. From import this: explicit is better than implicit.
My rule of thumb is to remember that if I write code I can't read 1 week later, it's bad code.
First, let's decide on the input parameters for gradient descent. You will need:
feature_matrix (The X matrix, type: numpy.array, a matrix of N * D size, where N is the no. of rows/datapoints and D is the no. of columns/features)
output (The Y vector, type: numpy.array, a vector of size N)
initial_weights (type: numpy.array, a vector of size D).
Additionally, to check for convergence you will need:
step_size (the magnitude of change when iterating through to change the weights; type: float, usually a small number)
tolerance (the criterion for stopping the iterations: when the gradient magnitude is smaller than the tolerance, assume that your weights have converged; type: float, usually a small number but much bigger than the step size).
Now to the code.
def regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance):
    converged = False # Set a boolean to check for convergence
    weights = np.array(initial_weights) # make sure it's a numpy array
    while not converged:
        # compute the predictions based on feature_matrix and weights.
        # iterate through the row and find the single scalar predicted
        # value for each weight * column.
        # hint: a dot product can solve this easily
        predictions = [??? for row in feature_matrix]
        # compute the errors as predictions - output
        errors = predictions - output
        gradient_sum_squares = 0 # initialize the gradient sum of squares
        # while we haven't reached the tolerance yet, update each feature's weight
        for i in range(len(weights)): # loop over each weight
            # Recall that feature_matrix[:, i] is the feature column associated with weights[i]
            # compute the derivative for weight[i]:
            # Hint: the derivative is = 2 * dot product of feature_column and errors.
            derivative = 2 * ????
            # add the squared value of the derivative to the gradient magnitude (for assessing convergence)
            gradient_sum_squares += (derivative * derivative)
            # subtract the step size times the derivative from the current weight
            weights[i] -= (step_size * derivative)
        # compute the square-root of the gradient sum of squares to get the gradient magnitude:
        gradient_magnitude = ???
        # Then check whether the magnitude is lower than the tolerance.
        if ???:
            converged = True
    # Once the while loop breaks, return the weights.
    return(weights)
I hope the extended pseudo-code helps you better understand the gradient descent. I won't fill in the ??? so as to not spoil your homework.
Note that your RSS code is also unreadable and unmaintainable. It's easier to do just:
>>> import numpy as np
>>> prediction = np.array([1,2,3])
>>> output = np.array([1,1,5])
>>> residual = output - prediction
>>> RSS = sum(residual * residual)
>>> RSS
5
Going through numpy basics will go a long way to machine learning and matrix-vector manipulation without going nuts with iterations: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.html
I have solved my own problem!
Here is the solved way.
import numpy as np
import pandas as pd
import math
from sys import stdout
#function takes the pandas dataframe, the input features list and the target column name
def get_numpy_data(data, features, output):
    #Adding a constant column with value 1 in the dataframe.
    data['constant'] = 1
    #Adding the name of the constant column in the feature list.
    features = ['constant'] + features
    #Creating the feature matrix (selecting columns and converting to a matrix).
    #DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it.
    features_matrix=data[features].to_numpy()
    #Target column is converted to a numpy array
    output_array=np.array(data[output])
    return(features_matrix, output_array)
def predict_outcome(feature_matrix, weights):
    weights=np.array(weights)
    predictions = np.dot(feature_matrix, weights)
    return predictions
def errors(output,predictions):
    errors=predictions-output
    return errors
def feature_derivative(errors, feature):
    derivative=np.dot(2,np.dot(feature,errors))
    return derivative
def regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance):
    converged = False
    #Initial weights are converted to a numpy array
    weights = np.array(initial_weights)
    while not converged:
        # compute the predictions based on feature_matrix and weights:
        predictions=predict_outcome(feature_matrix,weights)
        # compute the errors as predictions - output:
        error=errors(output,predictions)
        gradient_sum_squares = 0 # initialize the gradient sum of squares
        # while not converged, update each weight individually:
        for i in range(len(weights)):
            # Recall that feature_matrix[:, i] is the feature column associated with weights[i]
            feature=feature_matrix[:, i]
            # compute the derivative for weight[i]:
            #predict=predict_outcome(feature,weights[i])
            #err=errors(output,predict)
            deriv=feature_derivative(error,feature)
            # add the squared derivative to the gradient magnitude
            gradient_sum_squares=gradient_sum_squares+(deriv**2)
            # update the weight based on step size and derivative:
            weights[i]=weights[i] - np.dot(step_size,deriv)
        gradient_magnitude = math.sqrt(gradient_sum_squares)
        stdout.write("\r%d" % int(gradient_magnitude))
        stdout.flush()
        if gradient_magnitude < tolerance:
            converged = True
    return(weights)
#Example of Implementation
#Importing Training and Testing Data
# train_data=pd.read_csv("kc_house_train_data.csv")
# test_data=pd.read_csv("kc_house_test_data.csv")
# simple_features = ['sqft_living', 'sqft_living15']
# my_output= 'price'
# (simple_feature_matrix, output) = get_numpy_data(train_data, simple_features, my_output)
# initial_weights = np.array([-100000., 1., 1.])
# step_size = 7e-12
# tolerance = 2.5e7
# simple_weights = regression_gradient_descent(simple_feature_matrix, output,initial_weights, step_size,tolerance)
# print(simple_weights)
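Since error is computed once per pass and every feature_derivative call uses it, updating weights[i] one at a time is the same as a single matrix update with the full gradient 2 * X^T (Xw - y). A vectorized sketch of the same loop, with a hypothetical max_iter safety cap added, checked against np.linalg.lstsq on hypothetical toy data:

```python
import numpy as np

def gradient_descent_vectorized(feature_matrix, output, initial_weights,
                                step_size, tolerance, max_iter=100000):
    """Vectorized form of regression_gradient_descent (same update and stopping rule)."""
    weights = np.array(initial_weights, dtype=float)
    for _ in range(max_iter):
        errors = feature_matrix @ weights - output       # predictions - output
        gradient = 2 * feature_matrix.T @ errors         # every feature_derivative at once
        weights -= step_size * gradient
        if np.linalg.norm(gradient) < tolerance:         # gradient magnitude
            break
    return weights

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x                                        # hypothetical toy data
H = np.column_stack([np.ones_like(x), x])                # 'constant' column + feature
w = gradient_descent_vectorized(H, y, [0.0, 0.0], step_size=0.01, tolerance=1e-6)
w_exact, *_ = np.linalg.lstsq(H, y, rcond=None)
print(w, w_exact)                                        # both close to [1, 2]
```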
It is so simple
def mean(values):
    return sum(values)/float(len(values))
def variance(values, mean):
    return sum([(x-mean)**2 for x in values])
def covariance(x, mean_x, y, mean_y):
    covar = 0.0
    for i in range(len(x)):
        covar+=(x[i]-mean_x) * (y[i]-mean_y)
    return covar
def coefficients(dataset):
    x = []
    y = []
    for line in dataset:
        xi, yi = map(float, line.split(','))
        x.append(xi)
        y.append(yi)
    dataset.close()
    x_mean, y_mean = mean(x), mean(y)
    b1 = covariance(x, x_mean, y, y_mean)/variance(x, x_mean)
    b0 = y_mean-b1*x_mean
    return [b0, b1]
dataset = open('trainingdata.txt')
b0, b1 = coefficients(dataset)
n=float(input())  # raw_input() in Python 2
print(b0+b1*n)
Reference: www.machinelearningmastery.com/implement-simple-linear-regression-scratch-python/
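Note that this answer does not use gradient descent at all: it computes the closed-form least-squares solution b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x). Its variance and covariance both omit the 1/n factor, which cancels in the ratio. A NumPy version of the same formula (simple_linear_regression is a hypothetical name), compared with np.polyfit on hypothetical toy data:

```python
import numpy as np

def simple_linear_regression(x, y):
    """Closed-form least squares for one feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    b1 = (dx @ dy) / (dx @ dx)   # the 1/n factors of cov and var cancel
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

x = [1.0, 2.0, 4.0, 5.0]
y = [2.0, 3.0, 7.0, 8.0]                  # hypothetical toy data
b0, b1 = simple_linear_regression(x, y)
slope, intercept = np.polyfit(x, y, 1)    # polyfit returns the highest degree first
print(b0, b1)
print(intercept, slope)                   # should match up to rounding
```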