I am trying to learn an ordinary least squares model using Python's statsmodels library, as described here.
sm.OLS.fit() returns the learned model. Is there a way to save it to the file and reload it? My training data is huge and it takes around half a minute to learn the model. So I was wondering if any save/load capability exists in OLS model.
I tried the repr() method on the model object but it does not return any useful information.
The models and results instances all have a save and load method, so you don't need to use the pickle module directly.
Edit to add an example:
import statsmodels.api as sm
data = sm.datasets.longley.load_pandas()
data.exog['constant'] = 1
results = sm.OLS(data.endog, data.exog).fit()
results.save("longley_results.pickle")
# we should probably add a generic load to the main namespace
from statsmodels.regression.linear_model import OLSResults
new_results = OLSResults.load("longley_results.pickle")
# or more generally
from statsmodels.iolib.smpickle import load_pickle
new_results = load_pickle("longley_results.pickle")
Edit 2 We've now added a load method to main statsmodels API in master, so you can just do
new_results = sm.load('longley_results.pickle')
I've installed the statsmodels library and found that you can save the values using the pickle module in python.
Models and results are pickleable via save/load, optionally saving the model data.
[source]
As an example:
Given that you have the results saved in the variable results:
To save the file:
import pickle
with open('learned_model.pkl','w') as f:
pickle.dump(results,f)
To read the file:
import pickle
with open('learned_model.pkl','r') as f:
model_results = pickle.load(f)
Related
I get a model from Sagemaker of type:
<class 'xgboost.core.Booster'>
I can score this locally which is great but some google searches have shown that it may not be possible to do "standard" things like this taken from here:
plt.barh(boston.feature_names, xgb.feature_importances_)
Is it possible to tranform xgboost.core.Booster to XGBRegressor? Maybe one could use the save_raw method looking at this? Thanks!
So far I tried:
xgb_reg = xgb.XGBRegressor()
xgb_reg._Boster = model
xgb_reg.feature_importances_
but this reults in:
NotFittedError: need to call fit or load_model beforehand
Something along those lines appears to work fine:
local_model_path = "model.tar.gz"
with tarfile.open(local_model_path) as tar:
tar.extractall()
model = xgb.XGBRegressor()
model.load_model(model_file_name)
model can then be used as usual - model.tar.gz is an artifcat coming from sagemaker.
I'm going through the documentation of the sktime package. One thing I just cannot find is the feature importance (that we'd get with sklearn models) or model summary (like the one we can obtain from statsmodels). Is it something that is just not implemented yet?
It seems that this functionality is implemented for models like AutoETS or AutoARIMA.
from matplotlib import pyplot as plt
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.base import ForecastingHorizon
y = load_airline()
y_train,y_test = temporal_train_test_split(y)
fh = ForecastingHorizon(y_test.index, is_relative=False)
from sktime.forecasting.ets import AutoETS
model = AutoETS(trend='add',seasonal='mul',sp=12)
model.fit(y_train,fh=y_test.index)
model.summary()
I wonder if these summaries are accessible from instances like ForecastingPipeline.
Ok, I was able to solve it myself. I'm really glad the functionality is there!
The source code for ForecastingPipeline indicates that an instance of this class has an attribute steps_ - it holds the fitted instance of the model in a pipeline.
from sktime.forecasting.compose import ForecastingPipeline
model = ForecastingPipeline(steps=[
("forecaster", AutoETS(sp=1))])
model.fit(y_train)
model.steps_[-1][1].summary() # model.steps[-1][1].summary() would throw an error
The output of model.steps_ is [('forecaster', AutoETS())] (as mentioned before AutoETS() is in this case already fitted).
I'm trying to build a SpaCy pipeline using multiple components. My current pipeline only has two components at the moment, one entity ruler, and another custom component.
The way I build it is like this:
class EntityLookupComponent:
def __call__(self, doc: Doc) -> Doc:
print("Just testing")
return doc
#Language.factory("entity_lookup_component")
def my_component(nlp, name):
return EntityLookupComponent(nlp)
def main(patterns_path: Path, output_path: Path):
"""Build the spaCy model and output it to disk"""
# Ensure output_path directory exists
if not Path(os.path.dirname(output_path)).is_dir():
os.makedirs(os.path.dirname(output_path))
nlp = English()
nlp.add_pipe("entity_ruler").from_disk(patterns_path)
nlp.add_pipe("entity_lookup_component", name="entity_lookup", last=True)
print(nlp.pipe_names)
nlp.to_disk('./test')
with open(output_path, "wb") as output_file:
pickle.dump(nlp, output_file)
Outputting the pipe_names gives me: ['entity_ruler', 'entity_lookup'].
However, when I then try to load the model and test, by doing:
nlp = spacy.load("en_core_web_lg", disable=["ner"])
nlp.add_pipe("entity_ruler", source=spacy.load("./test"))
It's instantly throwing me the following error:
ValueError: [E002] Can't find factory for 'entity_lookup_component' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator `#Language.component` (for function components) or `#Language.factory` (for class components).
This only happens after I added the entity_lookup_component. This component was supposed to use a lookup table, to add some metadata to existing entities.
At the place where you load the model, you need to have access to the code that defined the custom component. So if your file that defines the custom component is custom.py, you can put import custom at the top of the file where you're loading your pipeline and it should work.
Also see the docs on saving and loading custom components.
I have a Python function call like so:
import torchvision
model = torchvision.models.resnet18(pretrained=configs.use_trained_models)
Which works fine.
If I attempt to make it dynamic:
import torchvision
model_name = 'resnet18'
model = torchvision.models[model_name](pretrained=configs.use_trained_models)
then it fails with:
TypeError: 'module' object is not subscriptable
Which makes sense since model is a module which exports a bunch of things, including the resnet functions:
# __init__.py for the "models" module
...
from .resnet import *
...
How can I call this function dynamically without knowing ahead of time its name (other than that I get a string with the function name)?
You can use the getattr function:
import torchvision
model_name = 'resnet18'
model = getattr(torchvision.models, model_name)(pretrained=configs.use_trained_models)
This essentially is along the lines the same as the dot notation just in function form accepting a string to retrieve the attribute/method.
The new APIs since Aug 2022 are as follows:
# returns list of available model names
torchvision.models.list_models()
# returns specified model with pretrained common weights
torchvision.models.get_model("alexnet", weights="DEFAULT")
# returns specified model with pretrained=False
torchvision.models.get_model("alexnet", weights=None)
# returns specified model with specified pretrained weights
torchvision.models.get_model("alexnet", weights=ResNet50_Weights.IMAGENET1K_V2)
Reference:
https://pytorch.org/blog/easily-list-and-initialize-models-with-new-apis-in-torchvision/
I am trying to save my configuration for a Keras model. I would like to be able to read the configuration from the file to be able to reproduce the training.
Before implementing a custom metric in a function I could just do it the way shown below without the mean_pred. Now I am running into the problem TypeError: Object of type 'function' is not JSON serializable.
Here I read that it is possible to get the function name as string by custom_metric_name = mean_pred.__name__. I would like to not only be able to save the name, but to be able to save a reference to the function if possible.
Perhaps I should as mentioned here also think about not just storing my configuration in the .py file but using ConfigObj. Unless this would solve my current problem I would implement this later.
Minimum working example of problem:
import keras.backend as K
import json
def mean_pred(y_true, y_pred):
return K.mean(y_pred)
config = {'epochs':500,
'loss':{'class':'categorical_crossentropy'},
'optimizer':'Adam',
'metrics':{'class':['accuracy', mean_pred]}
}
# Do the training etc...
config_filename = 'config.txt'
with open(config_filename, 'w') as f:
f.write(json.dumps(config))
Greatly appreciate help with this problem as well as other approaches to saving my configuration in the best way possible.
To solve my problem I saved the name of the function as a string in the config file and then extracted the function from a dictionary to use it as metrics in the model. One could additionally use: 'class':['accuracy', mean_pred.__name__] to save the name of the function as a string in the config.
This does also work for multiple custom functions and for more keys to metrics (eg. define metrics for 'reg' like 'class' when doing regression and classification).
import keras.backend as K
import json
from collections import defaultdict
def mean_pred(y_true, y_pred):
return K.mean(y_pred)
config = {'epochs':500,
'loss':{'class':'categorical_crossentropy'},
'optimizer':'Adam',
'metrics':{'class':['accuracy', 'mean_pred']}
}
custom_metrics= {'mean_pred':mean_pred}
metrics = defaultdict(list)
for metric_type, metric_functions in config['metrics'].items():
for function in metric_functions:
if function in custom_metrics.keys():
metrics[metric_type].append(custom_metrics[function])
else:
metrics[metric_type].append(function)
# Do the training, use metrics
config_filename = 'config.txt'
with open(config_filename, 'w') as f:
f.write(json.dumps(config))