Save and reuse ML model - python

First of all, let me introduce myself. I am a young researcher interested in machine learning. I created a model and trained, tested, and validated it. Now I would like to know whether there is a way to save my trained model.
I would also like to know whether the model is saved in its trained state.
Finally, is there a way to use the saved (and trained) model with new data without having to train the model again?
I work with Python!

Welcome to the community.
Yes, you can save the trained model and reuse it later. There are several ways to do so, and I will introduce a couple of them here. However, check which library you used to build your model and use the method suited to that library.
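Pickle: Pickle is the standard way of serializing objects in Python.
import pickle
# Save the trained model; the with-block closes the file afterwards
with open(filename, 'wb') as f:
    pickle.dump(model, f)
# Load it back later
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)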
Joblib: Joblib is part of the SciPy ecosystem and provides utilities for pipelining Python jobs.
import joblib
joblib.dump(model, filename)
loaded_model = joblib.load(filename)
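On the question of whether the model is saved in its trained state: a pickle round trip stores the object as it is at the moment of the dump, learned parameters included. The sketch below shows this with a toy stand-in for a model (a plain dict of class centroids, with hypothetical helpers `fit_centroids` and `predict`), chosen only so that it runs with the standard library alone. It is not a real library model, but the dump and load calls have the same shape as the ones above.

```python
import os
import pickle
import tempfile


def fit_centroids(points, labels):
    """'Train' a toy classifier by averaging the 1-D points of each class."""
    sums, counts = {}, {}
    for x, label in zip(points, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


# Train once on toy data (values are made up for illustration).
model = fit_centroids([1.0, 2.0, 9.0, 11.0], ["low", "low", "high", "high"])

# Save the trained model to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later: load it and predict on new data without retraining.
with open(path, "rb") as f:
    loaded_model = pickle.load(f)

new_prediction = predict(loaded_model, 8.0)
```

The loaded object carries the same learned values as the one that was dumped, so `fit_centroids` never has to run again. Only unpickle files you trust, since loading a pickle can execute arbitrary code.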
Finally, as suggested by others, if you used a library such as TensorFlow to build and train your model, note that it has its own extensive support for saving and loading models. Please check the following information:
TensorFlow Save and Load model

There may be a better way to do this, but this is how I have done it before, with Python.
So you have an ML model that you have trained. That model is basically just a set of parameters. Depending on what module you are using, you can save those parameters in a file, and import them to regenerate your model later.
Perhaps a simpler way is to save the model object entirely in a file using pickling.
https://docs.python.org/3/library/pickle.html
You can dump the object into a file, and load it back when you want to run it again.
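The first approach mentioned above, saving only the parameters and regenerating the model from them, can be sketched as follows. `LinearModel`, `get_params`, and the file name `params.json` are hypothetical names for illustration; a real library will have its own way to export and restore parameters.

```python
import json
import os
import tempfile


class LinearModel:
    """Toy model y = slope * x + intercept (hypothetical, for illustration)."""

    def __init__(self, slope=0.0, intercept=0.0):
        self.slope = slope
        self.intercept = intercept

    def fit(self, xs, ys):
        # Closed-form least squares for a single feature.
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = cov_xy / var_x
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, x):
        return self.slope * x + self.intercept

    def get_params(self):
        return {"slope": self.slope, "intercept": self.intercept}


# Train once, then write only the learned parameters to disk.
trained = LinearModel().fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
params_path = os.path.join(tempfile.mkdtemp(), "params.json")
with open(params_path, "w") as f:
    json.dump(trained.get_params(), f)

# Later: rebuild an equivalent model from the file, with no training step.
with open(params_path) as f:
    restored = LinearModel(**json.load(f))
```

Compared with pickling the whole object, a parameter file is human-readable and does not depend on the class being importable under the same name, but you have to know how to rebuild the model around the parameters.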

Related

Is there IO functionality to store trained models in kedro?

In the IO section of the kedro API docs I could not find any functionality for storing trained models (e.g. .pkl, .joblib, ONNX, PMML). Have I missed something?
There is a pickle dataset in kedro.io that you can use to save trained models, or anything else that is serialisable and that you want to pickle (models being a common case). It accepts a backend that defaults to pickle but can be set to joblib if you want to use joblib instead.
I'm just going to quickly note that Kedro is moving to kedro.extras.datasets for its datasets and moving away from having non-core datasets in kedro.io. You might want to look at kedro.extras.datasets and in Kedro 0.16 onwards pickle.PickleDataSet with joblib support.
The Kedro spaceflights tutorial in the documentation actually saves the trained linear regression model using the pickle dataset if you want to see an example of it. The relevant section is here.
There is PickleDataSet in https://kedro.readthedocs.io/en/latest/kedro.extras.datasets.pickle.PickleDataSet.html and joblib support in PickleDataSet is in the next release (see https://github.com/quantumblacklabs/kedro/blob/develop/RELEASE.md)

Load tensorflow SavedModel in Rstudio trained in Google Cloud ML

I trained a model in Google Cloud ML and saved it in the SavedModel format. I've attached the directory for the saved model below.
https://drive.google.com/drive/folders/18ivhz3dqdkvSQY-dZ32TRWGGW5JIjJJ1?usp=sharing
I am trying to load the model into R using the following code, but it returns <tensorflow.python.training.tracking.tracking.AutoTrackable> with an object size of 552 bytes, which is definitely not correct. If anyone can properly load the model, I would love to know how you did it. I assume it can also be loaded into Python, so that could work too. The model was trained on GPU; I am not sure which TensorFlow version was used. Thank you very much!
library(keras)
list.files("/path/to/inceptdual400OG")
og400<-load_model_tf("/path/to/inceptdual400OG")
Since the shared model is no longer available (the link says it is in the trash folder) and the question does not specify it, I can't tell which framework you used to save the model in the first place. I suggest trying the Keras load function or the TensorFlow load function, depending on which type of saved model file you have.
Bear in mind that you should set the argument compile = FALSE if the model is already compiled.
If you trained your model with tf>=2.0, remember to load the latest versions of the libraries, because mismatched TensorFlow and Keras versions can be incompatible; the output of rsconnect::appDependencies() is also worth checking.

How to manually register a sci-kit model with TRAINS python auto-magical experiment manager?

I'm working mostly with scikit-learn. As far as I understand, the TRAINS auto-magic doesn't catch scikit-learn model store/load automatically.
How do I manually register the model after I have 'pickled' it?
For Example:
import pickle
with open("model.pkl", "wb") as file:
    pickle.dump(my_model, file)
Assuming you are referring to the TRAINS experiment manager, https://github.com/allegroai/trains (of which I'm one of the maintainers), you can register the file like this:
from trains import Task, OutputModel
OutputModel(Task.current_task()).update_weights(weights_filename="model.pkl")
Or, if you have information you want to store together with the pickled model file, you can do:
from trains import Task, OutputModel
model_parameters = {'threshold': 0.123}
OutputModel(Task.current_task(), config_dict=model_parameters).update_weights(weights_filename="model.pkl")
Now you should see an output model registered with the experiment in the UI. The model contains a link to the pickle file, together with the configuration dictionary.

How to load the pre-trained doc2vec model and use its vectors

Does anyone know which function I should use if I want to use the pre-trained doc2vec models from this repository: https://github.com/jhlau/doc2vec?
I know we can use KeyedVectors.load_word2vec_format() to load the word vectors from pre-trained word2vec models, but is there a similar function in gensim to load pre-trained doc2vec models as well?
Thanks a lot.
When a model like Doc2Vec is saved with gensim's native save(), it can be reloaded with the native load() method:
from gensim.models.doc2vec import Doc2Vec
model = Doc2Vec.load(filename)
Note that large internal arrays may have been saved alongside the main filename, in other filenames with extra extensions – and all those files must be kept together to re-load a fully-functional model. (You still need to specify only the main save file, and the auxiliary files will be discovered at expected names alongside it in the same directory.)
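If you need to move such a model, one way to keep the files together is to copy everything that shares the main filename as a prefix. This is a minimal sketch under that assumption; the helper `copy_model_with_companions` and the placeholder file names are hypothetical, and the actual names of the auxiliary files depend on the gensim version that saved the model.

```python
import glob
import os
import shutil
import tempfile


def copy_model_with_companions(main_path, dest_dir):
    """Copy main_path plus every sibling file whose name starts with it."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    # glob.escape guards against special characters in the directory name.
    for path in sorted(glob.glob(glob.escape(main_path) + "*")):
        if os.path.isfile(path):
            shutil.copy2(path, dest_dir)
            copied.append(os.path.basename(path))
    return copied


# Demo with empty placeholder files (names are made up for illustration).
src_dir = tempfile.mkdtemp()
for name in ["doc2vec.bin", "doc2vec.bin.vectors.npy", "unrelated.txt"]:
    open(os.path.join(src_dir, name), "wb").close()

dest_dir = os.path.join(tempfile.mkdtemp(), "moved")
copied = copy_model_with_companions(os.path.join(src_dir, "doc2vec.bin"), dest_dir)
```

A plain prefix match can also pick up unrelated files that happen to start with the same name, so check the copied list before deleting the originals.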
You may have other issues trying to use those pre-trained models. In particular:
as noted in the linked page, the author used a custom variant of gensim that forked off about 2 years ago; the files might not load in standard gensim, or in later gensim versions
it's not completely clear what parameters were used to train those models (though I suppose if you succeed in loading them you could see them as properties in the model), and how much meta-optimization was used for which purposes, and whether those purposes will match your own project
if the parameters are as shown in one of the repo files, train_model.py, some are inconsistent with best practices (min_count=1 is usually bad for Doc2Vec) or with the apparent model size (a mere 1.4GB model couldn't hold 300-dimensional vectors for all of the millions of documents or word-tokens in 2015 Wikipedia)
I would highly recommend training your own model, on a corpus you understand, with recent code, and using metaparameters optimized for your own purposes.
Try this:
import gensim.models as g
model="model_folder/doc2vec.bin" #point to downloaded pre-trained doc2vec model
#load model
m = g.Doc2Vec.load(model)

How to load a pre-trained Word2vec MODEL File and reuse it?

I want to use a pre-trained word2vec model, but I don't know how to load it in python.
This file is a MODEL file (703 MB).
It can be downloaded here:
http://devmount.github.io/GermanWordEmbeddings/
Just for loading:
import gensim
# Load pre-trained Word2Vec model.
model = gensim.models.Word2Vec.load("modelName.model")
Now you can train the model as usual. Also, if you want to be able to save it and retrain it multiple times, here's what you should do:
model.train(...)  # insert the proper parameters here
"""
If you don't plan to train the model any further, calling
init_sims will make the model much more memory-efficient
If `replace` is set, forget the original vectors and only keep the normalized
ones = saves lots of memory!
replace=True if you want to reuse the model
"""
model.init_sims(replace=True)
# save the model for later use
# for loading, call Word2Vec.load()
model.save("modelName.model")
Use KeyedVectors to load the pre-trained model.
from gensim.models import KeyedVectors
word2vec_path = 'path/GoogleNews-vectors-negative300.bin.gz'
w2v_model = KeyedVectors.load_word2vec_format(word2vec_path, binary=True)
I used the same model in my code and since I couldn't load it, I asked the author about it. His answer was that the model has to be loaded in binary format:
gensim.models.KeyedVectors.load_word2vec_format(w2v_path, binary=True)
This worked for me, and I think it should work for you, too.
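To see why binary=True matters, it may help to look at what a file in the word2vec binary layout contains: a text header with the vocabulary size and vector dimension, then for each word its bytes, a space, and the vector as raw 32-bit floats. The sketch below writes and reads that layout with the standard library only. It assumes little-endian floats as written by the original word2vec tool, the helper names and the two sample words are made up, and in practice you would let gensim's load_word2vec_format do this for you.

```python
import io
import struct


def write_word2vec_binary(stream, vectors):
    """Write {word: [floats]} in the word2vec binary layout."""
    dim = len(next(iter(vectors.values())))
    stream.write(f"{len(vectors)} {dim}\n".encode("utf-8"))
    for word, vec in vectors.items():
        stream.write(word.encode("utf-8") + b" ")
        stream.write(struct.pack(f"<{dim}f", *vec))
        stream.write(b"\n")


def read_word2vec_binary(stream):
    """Read the layout written above back into {word: [floats]}."""
    vocab_size, dim = map(int, stream.readline().decode("utf-8").split())
    vectors = {}
    for _ in range(vocab_size):
        word = bytearray()
        while True:
            ch = stream.read(1)
            if ch == b" ":
                break
            if ch == b"":
                raise EOFError("unexpected end of file")
            if ch != b"\n":  # skip the newline that ends the previous entry
                word.extend(ch)
        raw = stream.read(4 * dim)  # dim float32 values, 4 bytes each
        vectors[word.decode("utf-8")] = list(struct.unpack(f"<{dim}f", raw))
    return vectors


# Round trip through an in-memory buffer with two made-up entries.
buffer = io.BytesIO()
write_word2vec_binary(buffer, {"hund": [0.5, -1.0], "katze": [0.25, 2.0]})
buffer.seek(0)
loaded = read_word2vec_binary(buffer)
```

Because the vectors are stored as raw bytes rather than text, a reader that expects the text variant of the format (or gensim's native save format) cannot parse such a file, which is consistent with the loading failure described above.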
I ran into the same issue, so I downloaded GoogleNews-vectors-negative300 from Kaggle. I saved and extracted the file on my desktop. Then I ran this code in Python and it worked well:
from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format(r'C:/Users/juana/descktop/archive/GoogleNews-vectors-negative300.bin', binary=True)
