Loaded model gives different predictions than the saved model - python

I am trying to save a model and load it into a different session, but I am having prediction inconsistencies, and I would appreciate any help that can be offered. So here is what I did...
First, after running the model, I used this code to save the model:
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
joblib.dump(clf, "models.pkl")
and then, to load the file in a different Colaboratory notebook, I used
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
loaded_model = joblib.load('models.pkl')
Then this is the code I used to process a single image for testing:
import cv2
import numpy as np
img_toArray = cv2.imread("/content/ESD/ESD/folder1/img1.png")
new_array = cv2.resize(img_toArray, (220, 220))
new_array = np.array(new_array).reshape(1, 145200)  # 220 * 220 * 3 = 145200
but this results in an output of array([4]) with every image I test, and I am not sure why.
I have also tried to reload the entire dataset again and separate the labels from the features (the image), and use train_test_split to dedicate 90% of the dataset for testing, and when I run the features (images) to test with, through the block of code:
loaded_model.predict(np.array(xTest[whatEverNumber]).reshape(1,145200))
I get the right predictions. So I am confused as to what I am doing wrong, because in both examples, I am processing the images in basically the same way, and then separating the images and running them through the same prediction method. So I would appreciate any help in figuring out what I did wrong.
Extra information that may prove beneficial: I am using colaboratory and my model is an sklearn SVM that runs through a cross_validation_predict, cross_validation_predict, and finally an SVM fit function.
Thank you in advance!

Is loaded_model always trained on the same data? You might be encountering this problem because the model is fitted on different chunks (folds) of your dataset and you are saving only the fit from the last iteration. In that case, each time you rerun the training, the model learns from different data (given by each fold) and returns different predictions. This applies if the model fitting happens inside your cross-validation loop. May I ask what type of train-test split you used? Was it shuffled?
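A minimal sketch of that idea, assuming a scikit-learn SVC and synthetic data standing in for the flattened image features (the dataset, the temporary file location, and the variable names are illustrative, not taken from the question). Cross-validation is used only to estimate performance, and the model that gets saved is fitted once, outside any fold loop, on the full training set. The reloaded model should then reproduce the in-memory predictions exactly.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the flattened image features (assumption).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Cross-validation only estimates performance: it fits temporary clones
# of the estimator and leaves `clf` itself unfitted.
clf = SVC(kernel="rbf", gamma="scale")
cv_predictions = cross_val_predict(clf, X_train, y_train, cv=5)
cv_accuracy = float(np.mean(cv_predictions == y_train))

# Fit the final model once, on all of the training data, and save that one.
clf.fit(X_train, y_train)
model_path = os.path.join(tempfile.mkdtemp(), "models.pkl")
joblib.dump(clf, model_path)

# In the other session: load the file and predict on a single sample.
loaded_model = joblib.load(model_path)
single_sample = X_test[0].reshape(1, -1)
same_single = np.array_equal(
    loaded_model.predict(single_sample), clf.predict(single_sample)
)
same_all = np.array_equal(loaded_model.predict(X_test), clf.predict(X_test))
print(f"CV accuracy: {cv_accuracy:.2f}, identical predictions: {same_all}")
```

If the reloaded model agrees with the in-memory one on rows taken from the dataset but not on a freshly read image, it may be worth comparing how that image is prepared against how the training rows were built.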

Related

How to be certain that an xgboost training code creates the separately given xgboost model in a pickle file?

I was given a pickle file that has an xgboost model in it. I also have a model training code. I need to check if that saved model was created by this training code. What should I do?
I tried to get some predictions for a test dataset from both the saved model and the model created by the training code. But the predictions are not a 100% match. Is looking at the model parameters enough? I'm a bit stuck, couldn't find a solution. Thanks in advance.

Pycharm: Is there a way to run a snippet of code without running the entire file?

I am teaching myself to code Convolutional Neural Networks. In particular I am looking at the "Dogs vs. Cats" challenge (https://medium.com/#mrgarg.rajat/kaggle-dogs-vs-cats-challenge-complete-step-by-step-guide-part-2-e9ee4967b9). I am using PyCharm.
In PyCharm, is there a way of using the trained model to make a prediction on the test data without having to run the entire file each time (and thus retrain the model each time)? Additionally, is there a way to skip the part of the script that prepares the data for input into the CNN? In a similar manner, does PyCharm store variables, so that I can print individual variables after the script has been run?
Would it be better if I used a different IDE?
You can use sklearn joblib to save the trained model as a pickle and use it later for predictions.
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
# Save the model as a pickle in a file
joblib.dump(knn, 'filename.pkl')
# Load the model from the file
knn_from_joblib = joblib.load('filename.pkl')
# Use the loaded model to make predictions
knn_from_joblib.predict(X_test)

python tsne.transform does not exist?

I am trying to transform two datasets: x_train and x_test using tsne. I assume the way to do this is to fit tsne to x_train, and then transform x_test and x_train. But, I am not able to transform any of the datasets.
from sklearn.manifold import TSNE
tsne = TSNE(random_state = 420, n_components=2, verbose=1, perplexity=5, n_iter=350).fit(x_train)
I assume that tsne has been fitted to x_train.
But, when I do this:
x_train_tse = tsne.transform(x_subset)
I get:
AttributeError: 'TSNE' object has no attribute 'transform'
Any help will be appreciated. (I know I could do fit_transform, but wouldn't I get the same error on x_test?)
Judging by the documentation of sklearn, TSNE simply does not have any transform method.
Also, TSNE is an unsupervised method for dimensionality reduction/visualization, so it does not really work with a train/test split. You simply take all of your data and use fit_transform to compute the embedding and plot it.
EDIT - It is actually not possible to learn a transformation and reuse it on different data (i.e. train and test), as t-SNE does not learn a mapping function to a lower-dimensional space, but rather runs an iterative procedure on a subspace to find an equilibrium that minimizes a loss/distance ON SOME DATA.
Therefore, if you want to preprocess and reduce the dimensionality of both a train and a test dataset, the way to go is PCA/SVD or autoencoders. t-SNE will only help you with unsupervised tasks :)
As the accepted answer says, there is no separate transform method and it probably wouldn't work in a train/test setting.
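As a rough illustration of the PCA route, here is a minimal sketch using scikit-learn's PCA on synthetic data (the random data and the split sizes are assumptions made only for the example). Unlike TSNE, PCA learns a projection during fit, so the same fitted object can transform both sets.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real features (assumption).
rng = np.random.default_rng(420)
X = rng.normal(size=(100, 10))
x_train, x_test = train_test_split(X, test_size=0.2, random_state=420)

# Fit the projection on the training set only ...
pca = PCA(n_components=2).fit(x_train)

# ... then reuse the same fitted projection on both sets.
x_train_2d = pca.transform(x_train)
x_test_2d = pca.transform(x_test)
print(x_train_2d.shape, x_test_2d.shape)
```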
However, you can still use TSNE without information leakage.
Training Time
Calculate the TSNE embedding per record on the training set and use it as a feature in the classification algorithm.
Testing Time
Append your testing data to your training data and fit_transform the TSNE on the combined set. Now continue processing your test set, using the TSNE embedding as a feature on those records.
Does this cause information leakage? No.
Inference Time
New records arrive e.g. as images or table rows.
Add the new row(s) to the training table and calculate the TSNE embedding (i.e. where the new sample sits in the space relative to your training samples). Perform any other processing and run your prediction against the row.
It works fine. Sometimes, we worry too much about the train/test split because of Kaggle etc. But the main question is whether your method can be replicated at inference time, with the same expected accuracy, for live use. In this case, yes it can!
The only drawback is that you need your training database available at inference time, and depending on its size, the preprocessing might be costly.
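The inference-time step described above could look roughly like this. It is only a sketch with scikit-learn's TSNE on synthetic data; train_rows and new_rows are hypothetical names for the stored training table and the newly arrived records.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-ins (assumption): stored training table and new records.
rng = np.random.default_rng(0)
train_rows = rng.normal(size=(60, 8))
new_rows = rng.normal(size=(5, 8))

# Append the new rows to the training table and embed everything together,
# since TSNE offers fit_transform but no separate transform.
combined = np.vstack([train_rows, new_rows])
embedding = TSNE(n_components=2, perplexity=5, random_state=420).fit_transform(combined)

# Split the embedding back into the training part and the new part.
train_embedding = embedding[: len(train_rows)]
new_embedding = embedding[len(train_rows):]
print(train_embedding.shape, new_embedding.shape)
```

Note that every fit_transform call recomputes the coordinates of all rows, including the training ones, which is why the sketch returns both slices.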
Check out openTSNE [1]. It has all you need.
You can also save the trained model using pickle.dump for example.
[1]: https://opentsne.readthedocs.io/en/latest/index.html

Predict() on Keras always gives different results even if the NN and the dataset are the same

I have my model and a fixed dataset on which I do the train_test_split twice: once for getting train and test sets and the second time for getting a validation set too.
I have to reuse the same network, on the same data, twice in two different modules but every time I do that I get different results.
Is there a way to fix it?
I have fixed the weights and set random_state = 42 to eliminate every form of randomness, but it still does not seem to be enough.
The optimizer I used is Adam and the loss function is the mean absolute error.
Do you train and evaluate (predict) the model in the same script and process?
Please check the official guide on how to obtain reproducible results using Keras during development.
In addition you can try to save and load your model (in another file) to check the predictions.

How can I save my trained SVM model to retrieve it later for time saving in python?

I'm new to Python and working on machine learning. I have trained a LinearSVC from sklearn.svm, and training takes quite a long time, mostly because of stemming (7-8 minutes). I want to know if it is possible to save the trained model to a file that can be loaded back into Python as-is when the application runs, to avoid repeating the training on every run of the application.
My answer:
Pickle or joblib can be used to save a trained model.
For reference, check out the link given below.
Reference Link
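A minimal sketch of the pickle route, assuming the text features come from a TfidfVectorizer chained with LinearSVC in a scikit-learn pipeline (the toy texts, labels, and file location are made up for illustration, and the stemming step from the question is left out). Saving the whole pipeline rather than only the classifier means the fitted vectorizer is restored together with the model.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data (assumption); the real application would load its own.
texts = [
    "the team won the match",
    "a late goal decided the game",
    "the striker scored twice",
    "stocks fell sharply today",
    "the market closed higher",
    "investors sold their shares",
]
labels = ["sport", "sport", "sport", "finance", "finance", "finance"]

# Train once (the slow step) ...
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

# ... save the fitted pipeline to disk ...
model_path = os.path.join(tempfile.mkdtemp(), "linear_svc.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# ... and on later runs load it instead of retraining.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

sample = ["the match ended in a draw"]
same = list(restored.predict(sample)) == list(model.predict(sample))
print(same)
```

Pickle files can execute arbitrary code when loaded, so only load files you created or trust. Loading is also safest with the same scikit-learn version that was used for saving.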
