How to use a model that is already trained in SKLearn?

Rather than having my model retrain every time I run my code, I just want to test how the classifier responds to certain inputs. Is there a way in SKLearn to "export" my classifier, save it somewhere, and then use it to predict over and over in the future?

Yes. You can serialize your model and save it to a file.
This is documented here.
Keep in mind that there may be problems if you reload a model that was trained with a different version of scikit-learn; usually you will see a warning in that case.
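For example, a minimal sketch of the save/load round trip with joblib (the estimator and file name here are just placeholders):

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train once and persist the fitted estimator to disk.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(clf, 'classifier.joblib')

# In a later run, reload and predict without retraining.
clf = joblib.load('classifier.joblib')
print(clf.predict(X[:5]))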

Related

Connecting untrained python predictive model to backend

I already have a predictive model written in Python; however, currently it is executed by hand and operates on a single data file. I am hoping to generalize the model so that it can read in different datasets from my backend, each time effectively producing a different model, since we are using different data for training as well. How would I be able to add the model to my backend?
Store the model as a pickle and load it from your backend when you need it, analogous to how you read in your training data.
But you might want to check out MLflow for an integrated model-handling solution. It is possible to run it on-premises. With MLflow you can easily implement a proper ML lifecycle: you can store your training stats and keep the history of your trained models.
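As a rough sketch of what that could look like (metric names and paths here are placeholders; assumes the mlflow package and a fitted scikit-learn estimator, and that the exact log_model signature may vary by MLflow version):

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Log the fitted model plus training stats so every retrain is versioned.
with mlflow.start_run() as run:
    mlflow.log_metric("train_accuracy", clf.score(X, y))
    mlflow.sklearn.log_model(clf, "model")

# Later, the backend can load that exact model from the tracking store.
reloaded = mlflow.sklearn.load_model(f"runs:/{run.info.run_id}/model")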

Pre-trained XGBoost model does not reproduce results if loaded from file

I use XGBClassifier (XGBoost version 1.6.0) for a simple binary classification model in Google Colab. I save the model to a file for further use. Within the same session, the model loaded from the file reproduces results well on the validation set. But if the session is over and I connect to Colab from scratch, the same model from the same file shows much worse results on the same validation set and needs to be trained again to reproduce them.
Tried three different ways to save and load the model:

native:
xgb_model.save_model('xgb_native_save.model')

joblib:
joblib.dump(xgb_model, 'xgb_joblib.model')

pickle:
with open('xgb_pickle.pkl', 'wb') as f:
    pickle.dump(xgb_model, f)
Same result with all three methods: the results on the validation set are not even close to those the model showed before saving to file.
random_state is fixed.
Any thoughts on where the problem might be?
I see the same issue with tree-based models for binary classification (XGBoost, LightGBM, ...).
When you restart the kernel and load the saved model, the order of the features inside the booster changes.
This is how I solved it (XGBoost): reorder the prediction columns to match the feature order stored in the booster:

# The booster remembers the feature names/order it was trained with.
lst_vars_in_model = model.get_booster().feature_names
# Select the columns in that exact order before predicting.
model.predict_proba(df[lst_vars_in_model])
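For completeness, a minimal sketch of the native save/load round trip with that column-reordering fix applied (the toy data and file name are placeholders):

import numpy as np
import pandas as pd
import xgboost as xgb

# Toy data standing in for the real training/validation sets.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=['f0', 'f1', 'f2'])
y = (df['f0'] + df['f1'] > 0).astype(int)

model = xgb.XGBClassifier(n_estimators=10, random_state=0).fit(df, y)
model.save_model('xgb_native_save.json')

# In a fresh session: load, then align columns with the booster's order.
loaded = xgb.XGBClassifier()
loaded.load_model('xgb_native_save.json')
cols = loaded.get_booster().feature_names
proba = loaded.predict_proba(df[cols])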

word-embedding: Convert supervised model into unsupervised model

I want to load a pre-trained embedding to initialize my own unsupervised FastText model and retrain it with my dataset.
The trained embedding file I have loads fine with gensim.models.KeyedVectors.load_word2vec_format('model.txt'). But when I try:
FastText.load_fasttext_format('model.txt') I get: NotImplementedError: Supervised fastText models are not supported.
Is there any way to convert supervised KeyedVectors to unsupervised FastText? And if possible, is it a bad idea?
I know there is a great difference between supervised and unsupervised models, but I really want to try to use/convert this one and retrain it. I haven't found a trained unsupervised model to load for my case (it's a Portuguese dataset), and the best model I have found is that one.
If your model.txt file loads OK with KeyedVectors.load_word2vec_format('model.txt'), then that's just a simple set of word-vectors. (That is, not a 'supervised' model.)
However, Gensim's FastText doesn't support preloading a simple set of vectors for further training - for continued training, it needs a full FastText model, either from Facebook's binary format or from a prior Gensim FastText model's .save().
(The fact that trying to load a plain-vectors file generates that error suggests the load_fasttext_format() method is misinterpreting it as some other kind of binary FastText model it doesn't support.)
Update after comment below:
Of course you can mutate a model however you like, including ways not officially supported by Gensim. Whether that's helpful is another matter.
You can create an FT model with a compatible/overlapping vocabulary, load the old word-vectors separately, then copy each prior vector over to replace the corresponding (randomly-initialized) vectors in the new model; a rough sketch appears at the end of this answer. (Note that the property that affects further training is actually ftModel.wv.vectors_vocab, the trained-up full-word vectors, not .vectors, which is composited from full words & n-grams.)
But the tradeoffs of such an ad-hoc strategy are many. The ngrams would still start random. Taking some prior model's just-word vectors isn't quite the same as a FastText model's full-words-to-be-later-mixed-with-ngrams.
You'd want to make sure your new model's sense of word-frequencies is meaningful, as those affect further training - but that data isn't usually available with a plain-text prior word-vector set. (You could plausibly synthesize a good-enough set of frequencies by assuming a Zipf distribution.)
Your further training might get a "running start" from such initialization - but that wouldn't necessarily mean the end-vectors remain comparable to the starting ones. (All positions may be arbitrarily changed by the volume of newer training, progressively diluting away most of the prior influence.)
So: you'd be in an improvised/experimental setup, somewhat far from usual FastText practices and thus where you'd want to re-verify lots of assumptions, and rigorously evaluate if those extra steps/approximations are actually improving things.
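A rough sketch of that vector-copy strategy, assuming Gensim 4.x (the toy corpus stands in for your real Portuguese sentences, and 'model.txt' is the question's vector file):

from gensim.models import FastText, KeyedVectors

# The plain word-vectors you already have.
kv = KeyedVectors.load_word2vec_format('model.txt')

# A new FastText model whose vocab (and word frequencies) come from your corpus.
sentences = [['ola', 'mundo'], ['bom', 'dia', 'mundo']]  # placeholder corpus
ft = FastText(vector_size=kv.vector_size)
ft.build_vocab(corpus_iterable=sentences)

# Copy each overlapping prior vector over the randomly-initialized one.
# Note: vectors_vocab (full-word vectors) is what further training adjusts.
for word in ft.wv.index_to_key:
    if word in kv:
        ft.wv.vectors_vocab[ft.wv.get_index(word)] = kv[word]

ft.train(corpus_iterable=sentences, total_examples=ft.corpus_count,
         epochs=ft.epochs)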

How can I save my trained SVM model to retrieve it later for time saving in python?

I'm new to Python and working on machine learning. I have trained LinearSVC from sklearn.svm, and training takes quite a long time, mostly because of stemming (7-8 minutes). I want to know if it is possible to save the model results in some format that can be fed back to Python as-is when running the application, just to save the time of training on every run of the application.
Pickle or joblib can be used to save a trained model.
For your reference, check out the link given below.
Reference Link
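For instance, a minimal sketch with pickle, saving the whole pipeline so the slow text preprocessing is persisted too (data and file name are placeholders):

import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ['good movie', 'bad movie', 'great film', 'awful film']
labels = [1, 0, 1, 0]

# Pickle the vectorizer + classifier together so reloads skip retraining.
pipe = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(texts, labels)
with open('svm_pipeline.pkl', 'wb') as f:
    pickle.dump(pipe, f)

# In a later run of the application: load and predict immediately.
with open('svm_pipeline.pkl', 'rb') as f:
    pipe = pickle.load(f)
print(pipe.predict(['great movie']))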

Tensorflow: How can I restore model for training? (Python)

I want to train a CNN for 20000 steps. At the 100th step I want to save all variables, and after that I want to re-run my code, restoring the model and starting from the 100th step. I am trying to make it work with the TensorFlow documentation: https://www.tensorflow.org/versions/r0.10/how_tos/variables/index.html but I can't. Any help?
I'm stuck on something similar, but maybe this link can help you. I'm new to TensorFlow, but I think you can't restore and fit without needing to train your model again.
This functionality is still unstable and the documentation is outdated, so it is confusing. What worked for me (this was a suggestion from people at Google who work directly on TensorFlow) was to use the model_dir parameter in the constructor of my models before training; this tells TensorFlow where to store your model. After training, you just instantiate a model again using the same model_dir, and it will restore the model from the files and checkpoints generated.
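A minimal sketch of that model_dir pattern using the TF 1.x Estimator API (the toy input_fn and path here are placeholders for the real CNN pipeline):

import numpy as np
import tensorflow as tf  # assumes the TF 1.x Estimator API

# Toy input_fn standing in for the real CNN input pipeline.
def input_fn():
    x = np.random.rand(32, 4).astype(np.float32)
    y = np.random.randint(0, 3, size=32)
    return {'x': tf.constant(x)}, tf.constant(y)

feature_cols = [tf.feature_column.numeric_column('x', shape=[4])]

# First run: train 100 steps; checkpoints land in model_dir automatically.
est = tf.estimator.DNNClassifier(hidden_units=[16], n_classes=3,
                                 feature_columns=feature_cols,
                                 model_dir='/tmp/my_model')
est.train(input_fn=input_fn, steps=100)

# A later run with the same model_dir restores the latest checkpoint,
# so training resumes from step 100 instead of starting over.
est = tf.estimator.DNNClassifier(hidden_units=[16], n_classes=3,
                                 feature_columns=feature_cols,
                                 model_dir='/tmp/my_model')
est.train(input_fn=input_fn, steps=19900)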
