How to use pickle to save sklearn model - python

I want to dump and load my Sklearn trained model using Pickle. How to do that?

Save:
import pickle
with open("model.pkl", "wb") as f:
pickle.dump(model, f)
Load:
with open("model.pkl", "rb") as f:
model = pickle.load(f)
In the specific case of scikit-learn, it may be better to use joblib’s
replacement of pickle (dump & load), which is more efficient on
objects that carry large numpy arrays internally as is often the case
for fitted scikit-learn estimators:
Save:
import joblib
joblib.dump(model, "model.joblib")
Load:
model = joblib.load("model.joblib")

Using pickle is same across all machine learning models irrespective of type i.e. clustering, regression etc.
To save your model in dump is used where 'wb' means write binary.
pickle.dump(model, open(filename, 'wb')) #Saving the model
To load the saved model wherever need load is used where 'rb' means read binary.
model = pickle.load(open(filename, 'rb')) #To load saved model from local directory
Here model is kmeans and filename is any local file, so use accordingly.

One can also use joblib
from joblib import dump, load
dump(model, model_save_path)

Related

Problem loading ML model saved using joblib/pickle

I saved a jupyter notebook .pynb file to .pickle format using joblib.
My ML model is built using pandas, numpy and the statsmodels python library.
I saved the fitted model to a variable called fitted_model and here is how I used joblib:
from sklearn.externals import joblib
# Save RL_Model to file in the current working directory
joblib_file = "joblib_RL_Model.pkl"
joblib.dump(fitted_model, joblib_file)
I get this as output:
['joblib_RL_Model.pkl']
But when I try to load from file, in a new jupyter notebook, using:
# Load from file
joblib_file = "joblib_RL_Model.pkl"
joblib_LR_model = joblib.load(joblib_file)
joblib_LR_model
I only get this back:
<statsmodels.tsa.holtwinters.HoltWintersResultsWrapper at 0xa1a8a0ba8>
and no model. I was expecting to see the model load there and see the graph outputs as per original notebook.
Use with open, it is better because, it automatically open and close file. Also with proper mode.
with open('joblib_RL_Model.pkl', 'wb') as f:
pickle.dump(fitted_model, f)
with open('joblib_RL_Model.pkl', 'rb') as f:
joblib_LR_model = pickle.load(f)
And my implementation in Colab is here. Check it.
you can use more quantifiable package which is pickle default package for python to save models
you can use the the following function for the saving of ML Models
import pickle
def save_model(model):
pickle.dump(model, open("model.pkl", "wb"))
template for function would be
import pickle
def save_model(model):
pickle.dump(model, open(PATH_AND_FILE_NAME_TO_BE_SAVED, "wb"))
to load the model when saved it from pickle library you can follow the following function
def load_model(path):
return pickle.load(open(path, 'rb'))
Where path is the path and name to file where model is saved to.
Note:
This would only work for basic ML Models and PyTorch Models, it would not work for Tensorflow based models where you need to use
model.save(PATH_TO_MODEL_AND_NAME)
where model is of type tensorflow.keras.models

How to save pyspark model in to pickle file

How to save pyspark model in to pickle file
final_data=output_fixed.select('features','CreditabilityIndex')
test=final_data.randomSplit([0.7,0.3])
dtc=DecisionTreeClassifier(labelCol='CreditabilityIndex',featuresCol='features')
dtc_model=dtc.fit(train)
You can save the model using save() method where spark is a SparkContext object: docs
dtc_model.save(spark, "/path/to/file")
You could save your models in this fashion too -
lr = LogisticRegression(labelCol="label", featuresCol="features")
lr_model = lr.fit(train2)
lr_model.save("abc.model")
###This is how you can load it back -
sameModel = LogisticRegressionModel.load("abc.model")
PS - It will be saved in the location of your code file. However, sometimes you may not be able to see the actual file. But it will be saved for you to load again. So nothing to worry about.

Is there an equivalent of R's save.image() function in Python?

When working in R, one has the ability to save the entire "workspace", variables, output, etc... to an image file using "save.image()". Is there something equivalent in Python?
Thank you.
I am not familiar with r, but pickle offers functionality to save and load (variables, objects, types, etc...) in a pickle file. In this way you can save any details needed for a later session. I'm unsure if pickle offers a specific way to save all data associated with the current session or if you would be required to manually locate and save. Hope this helps!
import pickle
my_obj = Object()
my_var = (1,"some_data")
filename = "my_dir\my_file.pickle"
with open(filename, ‘wb’) as f: #save data
pickle.dump((my_obj, my_var), f)
with open(filename, ‘rb’) as f: #load data next time
my_saved_obj, my_saved_var = pickle.load(f)

Unable to load CIFAR-10 dataset: Invalid load key '\x1f'

I'm currently playing around with some neural networks in TensorFlow - I decided to try working with the CIFAR-10 dataset. I downloaded the "CIFAR-10 python" dataset from the website: https://www.cs.toronto.edu/~kriz/cifar.html.
In Python, I also tried directly copying the code that is provided to load the data:
def unpickle(file):
import pickle
with open(file, 'rb') as fo:
dict = pickle.load(fo, encoding='bytes')
return dict
However, when I run this, I end up with the following error: _pickle.UnpicklingError: invalid load key, '\x1f'. I've also tried opening the file using the gzip module (with gzip.open(file, 'rb') as fo:), but this didn't work either.
Is the dataset simply bad, or this an issue with code? If the dataset's bad, where can I obtain the proper dataset for CIFAR-10?
Extract your *.gz file and use this code
from six.moves import cPickle
f = open("path/data_batch_1", 'rb')
datadict = cPickle.load(f,encoding='latin1')
f.close()
X = datadict["data"]
Y = datadict['labels']
Just extract your tar.gz file, you will get a folder of data_batch_1, data_batch_2, ...
After that just use, the code provided to load data into your project :
def unpickle(file):
import pickle
with open(file, 'rb') as fo:
dict = pickle.load(fo, encoding='bytes')
return dict
dict = unpickle('data_batch_1')
It seems like that you need to unzip *gz file and then unzip *tar file to get a folder of data_batches. Afterwards you could apply pickle.load() on these batches.
I was facing the same problem using jupyter(vscode) and python3.8/3.7. I have tried to edit the source cifar.py cifar10.py but without success.
the solution for me was run these two lines of code in separate normal .py file:
from tensorflow.keras.datasets import cifar10
cifar10.load_data()
after that it worked fine on Jupyter.
Try this:
import pickle
import _pickle as cPickle
import gzip
with gzip.open(path_of_your_cpickle_file, 'rb') as f:
var = cPickle.load(f)
Try in this way
import pickle
import gzip
with gzip.open(path, "rb") as f:
loaded = pickle.load(f, encoding='bytes')
it works for me

StratifiedKFold: shuffle and random_state

I've done the following for cross-validation purpose:
from sklearn.cross_validation import StratifiedKFold
n_folds = 5
SKFolds = list(StratifiedKFold(ytrain, n_folds, shuffle=True))
I'm just thinking about one detail: I'd like to have the same final results if somebody (my teacher for instance!) run again the code. However, I forgot to specify the random_state parameter! And I unfortunately can't start again from the begining because my models need a very long time to be fitted and it's quite finished.
My question is the following: is it possible to find what was the random_state which leads to my SKFolds? (my notebook is still opened so maybe the information can be found somewhere?). Or can I do something like save my SKFolds in a csv file and then load it when I will restart my notebook to be sure that I will have the same split on my folds?
Thanks for your help!
You can save SKFolds object with pickle and then you will just have to load it and use it as is.
import cPickle as pickle
# To save the object
pickle.dump( SKFolds , open( "skfolds.p", "wb" ) )
# To load the object
SKFolds = pickle.load( open( "skfolds.p", "rb" ) )

Categories