I was given a pickle file that has an xgboost model in it. I also have a model training code. I need to check if that saved model was created by this training code. What should I do?
I tried to get some predictions for a test dataset from both the saved model and the model created by the training code. But the predictions are not a 100% match. Is looking at the model parameters enough? I'm a bit stuck, couldn't find a solution. Thanks in advance.
I am teaching myself to code Convolutional Neural Networks. In particular I am looking at the "Dogs vs. Cats" challenge (https://medium.com/#mrgarg.rajat/kaggle-dogs-vs-cats-challenge-complete-step-by-step-guide-part-2-e9ee4967b9). I am using PyCharm.
In PyCharm, is there a way of using the trained model to make a prediction on the test data without having to run the entire file each time (and thus retrain the model each time)? Additionally, is there a way to skip the part of the script that prepares the data for input into the CNN? In a similar manner, does PyCharm store variables- can I print individual variables after the script has been run.
Would it be better if I used a different IDLE?
You can use sklearn joblib to save the trained model as a pickle and use it later for predictions.
from sklearn.externals import joblib
# Save the model as a pickle in a file
joblib.dump(knn, 'filename.pkl')
# Load the model from the file
knn_from_joblib = joblib.load('filename.pkl')
# Use the loaded model to make predictions
knn_from_joblib.predict(X_test)
I am trying to transform two datasets: x_train and x_test using tsne. I assume the way to do this is to fit tsne to x_train, and then transform x_test and x_train. But, I am not able to transform any of the datasets.
tsne = TSNE(random_state = 420, n_components=2, verbose=1, perplexity=5, n_iter=350).fit(x_train)
I assume that tsne has been fitted to x_train.
But, when I do this:
x_train_tse = tsne.transform(x_subset)
I get:
AttributeError: 'TSNE' object has no attribute 'transform'
Any help will be appreciated. (I know I could do fit_transform, but wouldn't I get the same error on x_test?)
Judging by the documentation of sklearn, TSNE simply does not have any transform method.
Also, TSNE is an unsupervised method for dimesionality reduction/visualization, so it does not really work with a TRAIN and TEST. You simply take all of your data and use fit_transform to have the transformation and plot it.
EDIT - It is actually not possible to learn a transformation and reuse it on different data (i.e. Train and Test), as T-sne does not learn a mapping function on a lower dimensional space, but rather runs an iterative procedure on a subspace to find an equilibrium that minimizes a loss/distance ON SOME DATA.
Therefore if you want to preprocess and reduce dimensionality of both a Train and Test datasets, the way to go is PCA/SVD or Autoencoders. T-Sne will only help you for unsupervised tasks :)
As the accepted answer says, there is no separate transform method and it probably wouldn't work in a a train/test setting.
However, you can still use TSNE without information leakage.
Training Time
Calculate the TSNE per record on the training set and use it as a feature in classification algorithm.
Testing Time
Append your training and testing data and fit_transform the TSNE. Now continue on processing your test set, using the TSNE as a feature on those records.
Does this cause information leakage? No.
Inference Time
New records arrive e.g. as images or table rows.
Add the new row(s) to the training table, calculate TSNE (i.e. where the new sample sits in the space relative to your trained samples). Perform any other processing and run your prediction against the row.
It works fine. Sometimes, we worry too much about train/test split because of Kaggle etc. But the main thing is can your method be replicated at inference time and with the same expected accuracy for live use. In this case, yes it can!
Only drawback is you need your training database available at inference time and depending on size, the preprocessing might be costly.
Check the openTSNE1 out. It has all you need.
You can also save the trained model using pickle.dump for example.
[1]: https://opentsne.readthedocs.io/en/latest/index.html
I have my model and a fixed dataset on which I do the train_test_split twice: once for getting train and test sets and the second time for getting a validation set too.
I have to reuse the same network, on the same data, twice in two different modules but every time I do that I get different results.
Is there a way to fix it?
I have the weights fixed and random_state = 42 so to eliminate every form of randomness but still it does not seem enough.
The optimizer I used is Adam and the loss function is the mean absolute error.
Do you train and evaluate (predict) the model in the same script and process?
Please check the official guide how to obtain reproducible results using keras during development.
In addition you can try to save and load your model (in another file) to check the predictions.
I'm new to python and working on machine learning. I have trained LinearSVC from sklearn.svm, and training takes quite a long time, mostly because of stemming (7-8 minutes), I want to know if it is possible to save model results as some extension that can be fed as it is back to python when running the application, just to save the time of the training happening in every run of the application..
My Answer:-
Pickle or Joblib is used to save a trained model
For your reference, check it out the link given below.
Reference Link