I'm reading the Object Detection API source code and I wonder how to use TF-Slim to train a model.
More specifically, when we use TensorFlow to train a model, we usually have something like this:
parameters = model(X_train, Y_train, X_test, Y_test)
# Returns: parameters -- parameters learnt by the model.
# They can then be used to predict.
And to predict the result, we use something like:
y_image_prediction = predict(my_image, parameters)
But in the file trainer.py, we don't have anything like the above; we only get:
slim.learning.train(
    train_tensor,
    logdir=train_dir,
    master=master,
    is_chief=is_chief,
    session_config=session_config,
    startup_delay_steps=train_config.startup_delay_steps,
    init_fn=init_fn,
    summary_op=summary_op,
    number_of_steps=(
        train_config.num_steps if train_config.num_steps else None),
    save_summaries_secs=120,
    sync_optimizer=sync_optimizer,
    saver=saver)
And there is no return value from this slim.learning.train function. So I wonder what the purpose of slim.learning.train is, and how we get the parameters that can then be used to predict the result.
HERE is the source code of trainer.py.
The train function does not return a value because it modifies the actual parameters of the model in place. It does that by running train_tensor, which is "a Tensor that, when executed, will apply the gradients and return the loss value", as written in the function documentation.
The tensor the documentation talks about is what you get when you tell an optimizer to minimize some cost function. It is opt_op in the following example:
# Example from the optimizer documentation (tf.train namespace in TF 1.x):
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
opt_op = opt.minimize(cost)
Find more in the optimizer documentation.
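To get the learned parameters back for prediction, the checkpoints that slim.learning.train periodically writes to its logdir can be restored. Below is a minimal TF 1.x sketch of that idea; build_my_model, train_dir and my_image are placeholders for your own model-building function, training directory and input, not names from the Object Detection API.
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Rebuild the same inference graph that was used for training.
    image_input = tf.placeholder(tf.float32, shape=[None, None, None, 3])
    predictions = build_my_model(image_input)   # hypothetical model-building function
    saver = tf.train.Saver()

with tf.Session(graph=graph) as sess:
    # slim.learning.train saves checkpoints into its logdir (train_dir above).
    checkpoint = tf.train.latest_checkpoint(train_dir)
    saver.restore(sess, checkpoint)
    y_image_prediction = sess.run(predictions, feed_dict={image_input: my_image})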
I trained a set of LinearRegression models using the following GridSearchCV
# Imports assumed from context (sklearn + pickle); house_df is the training dataframe.
import pickle
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.feature_selection import SelectKBest, f_regression, mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import GridSearchCV

MAX_COLUMNS = list(range(2, len(house_df.columns)))

X = house_df.drop(columns=['SalePrice'])
y = house_df.loc[:, 'SalePrice']
column_list = MAX_COLUMNS

# Box-Cox transform the target
reg_strategy = TransformedTargetRegressor()
bcox_transformer = PowerTransformer(method='box-cox')

model_pipeline = Pipeline([("std_scaler", StandardScaler()),
                           ('feature_selector', SelectKBest()),
                           ('regress', reg_strategy)])

parameter_grid = [{'feature_selector__k': column_list,
                   'feature_selector__score_func': [f_regression, mutual_info_regression],
                   'regress__regressor': [LinearRegression()],
                   'regress__regressor__fit_intercept': [True],
                   'regress__transformer': [None, bcox_transformer]}]

score_types = {'MSE': 'neg_mean_squared_error', 'r2': 'r2'}

gs = GridSearchCV(estimator=model_pipeline, param_grid=parameter_grid,
                  scoring=score_types, refit='MSE', cv=5, n_jobs=5, verbose=1)
gs.fit(X, y)

PATH = './datasets/processed_data/'
gridsearch_result_filename = 'pfY_np10_nt2_rfS_ct0_8_st1_orY_ccY_LR1_GS.pkl'
full_path = PATH + gridsearch_result_filename

with open(full_path, 'wb') as file:
    pickle.dump(gs, file)
I then load the trained GridSearch and can make predictions using the best estimator as follows:
with open(MODEL_PATH, 'rb') as file:
    gs_results = pickle.load(file)

predictions = gs_results.predict(test_df)
The problem I am facing is that, since the Box-Cox transform was applied during the grid search, all of my predictions are in the Box-Cox-transformed domain (huge values).
I need to use the PowerTransformer's inverse_transform() method on my predictions, but I am not sure how to access it.
I can get the entire pipeline for the best estimator like this
gs_results.best_estimator_
I can then access the TransformedTargetRegressor inside the pipeline like this:
Taking a step further, we get all the way to the PowerTransformer inside the TransformedTargetRegressor like this:
After making it here, I foolishly thought I had made it to where I needed to be, and that I simply needed to use the PowerTransformer's inverse_transform method to get predictions back in the original units. However, much to my disappointment, an error is thrown:
The error seems pretty clear, telling me I cannot use the inverse_transform method because the PowerTransformer has not been fit.
This is where I am stumped. It doesn't make sense to say the PowerTransformer has not been fit, when clearly it was fit during the GridSearch process.
This makes me think I am simply accessing the PowerTransformer incorrectly, which is my current question.
Based on the set up above, does anyone know the correct way to take the inverse transform of my predictions so they are in the original units rather than the Box-Cox distributions units?
I have been banging my head against the wall for this and have searched all over for a similar question. Thank you so much in advance!
-Braden
Much like here, the attribute transformer is the unfitted initialization attribute; you need the fitted transformer_ attribute.
However, I'm not sure why predict doesn't already do what you want; the documentation for TransformedTargetRegressor.predict says
Predict using the base regressor, applying inverse.
The regressor is used to predict and the inverse_func or inverse_transform is applied before returning the prediction.
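So a minimal sketch, assuming the step names from the question's pipeline (the 'regress' step holds the TransformedTargetRegressor), would be:
# The fitted TransformedTargetRegressor sits inside the best pipeline.
ttr = gs_results.best_estimator_.named_steps['regress']

# transformer (no underscore) is the unfitted constructor argument;
# transformer_ (with underscore) is the PowerTransformer fitted during the grid search.
fitted_pt = ttr.transformer_

# Usually this manual step is unnecessary: predict() already applies the inverse
# transform, so these values should already be in the original SalePrice units.
predictions = gs_results.predict(test_df)
If you do need to invert values manually, note that PowerTransformer.inverse_transform expects a 2-D array, e.g. values.reshape(-1, 1).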
I have an object detection model implemented in tensorflow.keras (version 1.15). I am trying to implement a modified (hybrid) loss function in my model. Basically, I need a few variables defined in my loss function because I am processing the y_true and y_pred provided to my classification loss function (a focal loss, to be exact), so I naturally resorted to implementing my ops inside the loss function.
I have defined a wrapper class to initialize my variables:
class LossWrapper(object):
    def __init__(self, num_centers):
        ...
        self.total_loss = tf.Variable(0, dtype=tf.float32)

    def loss_function(self, y_true, y_pred):
        ...
        self.total_loss = self.total_loss + ...
I am getting an error:
tensorflow.python.framework.errors_impl.FailedPreconditionError:
Attempting to use uninitialized value Variable
Using self.total_loss_cosine = tf.zeros(1)[0] instead, I get a similar message:
tensorflow.python.framework.errors_impl.InvalidArgumentError:
Retval[0] does not have value
I came to the conclusion that no matter how or where I define my variable (I have tried inside the __init__ function and in the main function body), I get an error about attempting to use some uninitialized variable.
I am starting to think that I cannot initialize variables inside my loss function and should probably implement them as a typical block outside it. Is this the case? Is the loss function basically separated from the rest of the network, so that the typical initialization does not work as expected?
Some remarks:
The loss function seems to work flawlessly in eager execution mode, where the initialization issue obviously does not exist.
In eager execution mode the type of y_true seems to be np.array and not tf.Tensor (or tf.EagerTensor at least). Does this mean that y_true and y_pred are actually propagated as numpy arrays in general, meaning that this part is detached from the network? (I have tested this in eager execution only, though.)
I want to use the external optimizer interface within TensorFlow in order to use Newton-type optimizers, as tf.train only has first-order gradient descent optimizers. At the same time, I want to build my network using tf.keras.layers, as it is way easier than using tf.Variable when building large, complex networks. I will show my issue with the following simple 1D linear regression example:
import tensorflow as tf
from tensorflow.keras import backend as K
import numpy as np

# generate data
no = 100
data_x = np.linspace(0, 1, no)
data_y = 2 * data_x + 2 + np.random.uniform(-0.5, 0.5, no)
data_y = data_y.reshape(no, 1)
data_x = data_x.reshape(no, 1)

# Make model using keras layers and train
x = tf.placeholder(dtype=tf.float32, shape=[None, 1])
y = tf.placeholder(dtype=tf.float32, shape=[None, 1])
output = tf.keras.layers.Dense(1, activation=None)(x)
loss = tf.losses.mean_squared_error(data_y, output)
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, method="L-BFGS-B")

sess = K.get_session()
sess.run(tf.global_variables_initializer())

tf_dict = {x: data_x, y: data_y}
optimizer.minimize(sess, feed_dict=tf_dict, fetches=[loss],
                   loss_callback=lambda x: print("Loss:", x))
When running this, the loss does not change at all. When using any other optimizer from tf.train, it works fine. Also, when using tf.layers.Dense() instead of tf.keras.layers.Dense(), it does work with the ScipyOptimizerInterface. So the real question is: what is the difference between tf.keras.layers.Dense() and tf.layers.Dense()? I saw that the variables created by tf.layers.Dense() are of type tf.float32_ref, while the variables created by tf.keras.layers.Dense() are of type tf.float32. As far as I know, _ref indicates that the tensor is mutable. So maybe that's the issue? But then again, any other optimizer from tf.train works fine with keras layers.
Thanks
After a lot of digging I was able to find a possible explanation.
ScipyOptimizerInterface uses feed_dicts to simulate the updates of your variables during the optimization process. It only does an assign operation at the very end. In contrast, tf.train optimizers always do assign operations. The code of ScipyOptimizerInterface is not that complex so you can verify this easily.
Now the problem is that assigning variables with feed_dict works mostly by accident. Here is a link where I learnt about this. In other words, assigning variables via feed dict, which is what ScipyOptimizerInterface does, is a hacky way of doing updates.
Now this hack mostly works, except when it does not. tf.keras.layers.Dense uses ResourceVariables to model the weights of the model. This is an improved version of simple Variables that has cleaner read/write semantics. The problem is that under the new semantics the feed dict update happens after the loss calculation. The link above gives some explanations.
Now, tf.layers is currently a thin wrapper around tf.keras.layers, so I am not sure why it would work. Maybe there is some compatibility check somewhere in the code.
The solutions to address this are somewhat simple:
Either avoid using components that use ResourceVariables. This can be kind of difficult (see the small example after this list).
Patch ScipyOptimizerInterface to always do assignments for variables. This is relatively easy, since all the required code is in one file.
There was also some effort to make the interface work with eager execution (which uses ResourceVariables by default). Check out this link.
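For instance, for the first option, one small change (simply re-using the workaround the question itself reports as working) would be:
# Per the question, tf.layers.Dense produced float32_ref variables, which worked
# with the feed_dict-based updates of ScipyOptimizerInterface.
output = tf.layers.Dense(1, activation=None)(x)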
I think the problem is with the line
output = tf.keras.layers.Dense(1, activation=None)(x)
In this format, output is not a layer but rather the output of a layer, which might be preventing the wrapper from collecting the weights and biases of the layer and feeding them to the optimizer. Try writing it in two lines, e.g.:
output = tf.keras.layers.Dense(1, activation=None)
res = output(x)
If you want to keep the original format, then you might have to manually collect all trainables and feed them to the optimizer via the var_list option:
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, var_list = [Trainables], method="L-BFGS-B")
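For example (a sketch assuming TF 1.x graph mode), the trainables could be collected with tf.trainable_variables():
optimizer = tf.contrib.opt.ScipyOptimizerInterface(
    loss, var_list=tf.trainable_variables(), method="L-BFGS-B")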
Hope this helps.
I am training a CNN-LSTM model using Keras, and after the training was done, I tried to evaluate the model on the testing data like I did when I fine-tuned my CNN; however, an error appears this time.
After training was done, I tried the following piece of code to evaluate on my testing set:
x, y = zip(*(testgenerator[i] for i in range(len(testgenerator))))
x_test, y_test = np.vstack(x), np.vstack(y)
loss, acc = Bi_LSTM.evaluate(x_test, y_test, batch_size=9)
print("Accuracy: " ,acc)
print("Loss: ", loss)
I have used this code before to evaluate my fine tuned model and it had no issue, but now I get the following error:
TypeError: object of type 'generator' has no len()
I have tried a few solutions online, like using len(list(generator)), but they did not work. Is it because I am using a custom generator? How can I evaluate the model in this case?
I think this line is the problem
x, y = zip(*(testgenerator[i] for i in range(len(testgenerator))))
because you call len() on a generator object.
The solution may be to just create a counter, increment it, and use it as the index in testgenerator[i].
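Rather than indexing, a rough alternative (a sketch that assumes the custom generator yields a finite number of (x, y) batches instead of looping forever) is to iterate over it directly, so len() is never needed:
import numpy as np

x_batches, y_batches = [], []
for x_batch, y_batch in testgenerator:   # plain iteration, no len() or indexing
    x_batches.append(x_batch)
    y_batches.append(y_batch)

x_test, y_test = np.vstack(x_batches), np.vstack(y_batches)
loss, acc = Bi_LSTM.evaluate(x_test, y_test, batch_size=9)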
The way I solved this is by using a different method; in this case I do not need to extract values for x and y:
# Note: evaluate_generator takes no batch_size argument (the batch size comes from the
# generator itself); for a plain Python generator, a steps argument is also required.
loss, acc = Bi_LSTM.evaluate_generator(testgenerator)
Using the following with TF 0.9.0rc0 on 60,000 (train) and 26,000 (test) or so records, with 145 coded columns (1, 0), trying to predict 1 or 0 for class identification:
classifier_TensorFlow = learn.TensorFlowDNNClassifier(hidden_units=[10, 20, 10],n_classes=2, steps=100)
classifier_TensorFlow.fit(X_train, y_train.ravel())
I get:
WARNING:tensorflow:TensorFlowDNNClassifier class is deprecated. Please consider using DNNClassifier as an alternative.
Out[34]:TensorFlowDNNClassifier(steps=100, batch_size=32)
And then good results quite fast:
score = metrics.accuracy_score(y_test, classifier_TensorFlow.predict(X_test))
print('Accuracy: {0:f}'.format(score))
Accuracy: 0.923121
And:
print (metrics.confusion_matrix(y_test, X_pred_class))
[[23996 103]
[ 1992 15]]
But when I try to use the new suggested method:
classifier_TensorFlow = learn.DNNClassifier(hidden_units=[10, 20, 10],n_classes=2)
it hangs with no completion, and it would not take the "steps" parameter. I get no error messages or output, so there is not much to go on... Any ideas or hints? The documentation is a bit "light"?
I don't think it is a bug; from the source code of DNNClassifier, I can tell that its usage differs from TensorFlowDNNClassifier's. The constructor of DNNClassifier doesn't have the steps param:
def __init__(self,
             hidden_units,
             feature_columns=None,
             model_dir=None,
             n_classes=2,
             weight_column_name=None,
             optimizer=None,
             activation_fn=nn.relu,
             dropout=None,
             config=None)
As you can see here. Instead, the fit() method that DNNClassifier inherits from BaseEstimator now has the steps param; notice that the same happens with batch_size:
def fit(self, x=None, y=None, input_fn=None, steps=None, batch_size=None,
        monitors=None):
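So, based on that signature, the step count from the question would now be passed to fit() itself, for example:
classifier_TensorFlow = learn.DNNClassifier(hidden_units=[10, 20, 10], n_classes=2)
classifier_TensorFlow.fit(X_train, y_train.ravel(), steps=100)
(The second answer below notes that steps now behaves differently from the old parameter, and that max_steps may be closer to the original behaviour.)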
For the "it hangs with no completion?", in the doc of the fit() method of BaseEstimator it is explained that if steps is None (as the value by default), the model will train forever.
I still don't get why I would like to train a model forever. My guesses are that creators think this way is better for the classifier if we want to have early stopping on validation data, but as I said is only my guess.
As you could see DNNClassifier doesn't give any feedback as the deprecated
TensorFlowDNNClassifier, it is supposed that the feedback can be setup with the 'config' param that is present in the constructor of DNNClassifier. So you should pass a RunConfig object as config, and in the params of this object you should set the verbose param, unfortunately I tried to set it so I can see the progress of the loss, but didn't get so lucky.
I recommend you to take a look at the latest post of Yuan Tang in his blog here, one of the creators of the skflow, aka tf learn.
I just had a similar issue. @Ismael's answer is correct; I just wanted to add that, although classifier.fit() now has the steps parameter, it behaves differently: it doesn't abort training earlier. There is another parameter called max_steps that behaves like the original steps parameter of TensorFlowDNNClassifier.
In short just use the max_steps parameter on fit() like this:
classifier = skflow.DNNClassifier(...)
classifier.fit(X_train, y_train, max_steps=3000)