Can't load ML models in Python

I've built a model in Python and saved it with joblib from the sklearn.externals package:
from sklearn.externals import joblib
joblib.dump(rf_Prob_F, 'Model.pkl')
When I try to call the model with the following command, an error appears:
from sklearn.externals import joblib
rf_Prob_F = joblib.load(rf_Prob_F, 'Model.pkl')
NameError: name 'rf_Prob_F' is not defined
What am I missing?
Thank you for your help!

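As written in the documentation for joblib.load(), you need only the name of the file as an argument. Your call passes rf_Prob_F as the first argument before that name has been assigned, which is what raises the NameError: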
rf_Prob_F = joblib.load('Model.pkl')
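The full save and load round trip can be sketched as follows. This is a minimal illustration, not the asker's actual setup: it assumes the standalone joblib package is installed (recent scikit-learn releases no longer include sklearn.externals.joblib), and it uses a plain dictionary as a stand-in for the trained model and a temporary folder for the file.

```python
import os
import tempfile

import joblib  # standalone package, assumed installed (pip install joblib)

# Stand-in for the trained model; any picklable object works the same way.
model = {"n_estimators": 100, "max_depth": 5}

# Save to a temporary folder so the sketch leaves no files in the working directory.
path = os.path.join(tempfile.mkdtemp(), "Model.pkl")
joblib.dump(model, path)      # dump(object, filename)

restored = joblib.load(path)  # load(filename): the file name is the only required argument
print(restored == model)      # True
```

Note that the variable on the left of joblib.load() is created by the assignment, so it does not need to exist beforehand.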

Related

model, model_name, None if model_only else prep_pipe, verbose, **kwargs NameError: name 'prep_pipe' is not defined

I have a model in pickle format. When I try to save that model locally I get this error. The model was trained with PyCaret.
Can someone tell me what is wrong?
from asyncore import read
import logging
import pickle
import mlflow
from numpy import save
from setuptools import setup
import wrapper
import os
import sqlite3
import sqlalchemy
import sys
from mlflow.models.signature import infer_signature
from pycaret.classification import save_model
class MlflowModelService:
    def saveModel(self,model,variant,readable_model_id,preprocess_file_path=None):
        print('inside storeModel of model service.......')
        readable_model_id = readable_model_id.replace("/","__$__")
        model_name = "Original-Model"
        with mlflow.start_run() as active_run:#mlflow work starts
            active_run = mlflow.active_run()
            #mlflow.keras.save_model(model,model_name) #Save a scikit-learn model to a path on the local file system
            #print(model , model_name)
            #exp_clf101 = setup(data = dataset, target = 'result', use_gpu=False, silent=True)
            save_model(model , model_name)#ERROR ON THIS LINE
            pyfunc_model_uri = self.logModel(readable_model_id,model_name,preprocess_file_path)
            self.registerModel(pyfunc_model_uri,readable_model_id)

HuggingFace SciBert AutoModelForMaskedLM cannot be imported

I am trying to use the pretrained SciBERT model (https://huggingface.co/allenai/scibert_scivocab_uncased) from Huggingface to evaluate masked words in scientific/biomedical text for bias using CrowS-Pairs (https://github.com/nyu-mll/crows-pairs/). The CrowS-Pairs code works great with the built in models like BERT.
I modified metric.py to add an option for using the SciBERT model:
import os
import csv
import json
import math
import torch
import argparse
import difflib
import logging
import numpy as np
import pandas as pd
from transformers import BertTokenizer, BertForMaskedLM
from transformers import AlbertTokenizer, AlbertForMaskedLM
from transformers import RobertaTokenizer, RobertaForMaskedLM
from transformers import AutoTokenizer, AutoModelForMaskedLM
and get the following error
2021-06-21 17:11:38.626413: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
File "metric.py", line 15, in <module>
from transformers import AutoTokenizer, AutoModelForMaskedLM
ImportError: cannot import name 'AutoModelForMaskedLM' from 'transformers' (/usr/local/lib/python3.7/dist-packages/transformers/__init__.py)
Later in the Python file, the AutoTokenizer and AutoModelForMaskedLM are defined as
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForMaskedLM.from_pretrained("allenai/scibert_scivocab_uncased")
Libraries
huggingface-hub-0.0.8
sacremoses-0.0.45
tokenizers-0.10.3
transformers-4.7.0
The error occurs with and without GPU support.
Try this:
tokenizer = BertTokenizer.from_pretrained("allenai/scibert_scivocab_uncased", do_lower_case=True)
model = BertForMaskedLM.from_pretrained("allenai/scibert_scivocab_uncased")
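The traceback shows the import resolving to /usr/local/lib/python3.7/dist-packages/transformers, so one thing worth checking is whether that copy really is the 4.7.0 listed under Libraries. The sketch below is a generic diagnostic, not part of the original answer: installed_version is a hypothetical helper name, and it assumes Python 3.8 or newer for importlib.metadata (on 3.7, printing transformers.__version__ gives the same information).

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version seen by the running interpreter, or None if the package is missing.
print(installed_version("transformers"))
```

If the reported version differs from what pip installed, the notebook kernel is probably using a different environment or needs a restart after the upgrade.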

ImportError: cannot import name 'AutoModelWithLMHead' from 'transformers'

This is literally all the code that I am trying to run:
from transformers import AutoModelWithLMHead, AutoTokenizer
import torch
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelWithLMHead.from_pretrained("microsoft/DialoGPT-small")
I am getting this error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-14-aad2e7a08a74> in <module>
----> 1 from transformers import AutoModelWithLMHead, AutoTokenizer
2 import torch
3
4 tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
5 model = AutoModelWithLMHead.from_pretrained("microsoft/DialoGPT-small")
ImportError: cannot import name 'AutoModelWithLMHead' from 'transformers' (c:\python38\lib\site-packages\transformers\__init__.py)
What do I do about it?
I solved it! Apparently AutoModelWithLMHead was removed in my version.
Now you need to use AutoModelForCausalLM for causal language models, AutoModelForMaskedLM for masked language models and AutoModelForSeq2SeqLM for encoder-decoder models.
So in my case code looks like this:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
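If the same script has to run against several library versions, one option is to look the class up by name and fall back when it is missing. This is only a sketch: first_available is a hypothetical helper, and a SimpleNamespace stands in for a transformers build that lacks AutoModelWithLMHead so the example runs without the library installed.

```python
from types import SimpleNamespace


def first_available(module, *class_names):
    """Return the first attribute named in class_names that exists on module."""
    for name in class_names:
        candidate = getattr(module, name, None)
        if candidate is not None:
            return candidate
    raise ImportError(f"none of {class_names} found in {module!r}")


# Stand-in for a transformers build that only offers the newer class name.
fake_transformers = SimpleNamespace(AutoModelForCausalLM="causal-lm-class")

chosen = first_available(fake_transformers, "AutoModelForCausalLM", "AutoModelWithLMHead")
print(chosen)  # causal-lm-class
```

With the real library, the call would be first_available(transformers, "AutoModelForCausalLM", "AutoModelWithLMHead") after import transformers.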

Download sklearn datasets behind a proxy

I installed sklearn in my environment and am now running it in a Jupyter notebook on Windows.
How can I avoid the error:
URLError: urlopen error [Errno 11004] getaddrinfo failed
I am running the following code:
import sklearn
import sklearn.ensemble
import sklearn.metrics
from sklearn.datasets import fetch_20newsgroups
categories = ['alt.atheism', 'soc.religion.christian']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
which raises the error on this line:
----> 3 newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
I am behind a proxy on my work computer. Is there any way to avoid this error and still use the sample datasets?
According to source code, scikit-learn will download the file from:
https://ndownloader.figshare.com/files/5975967
I am assuming that you cannot reach this location from behind the proxy.
Can you access the dataset by some other means? If yes, then you can download it manually and keep it at the location:
~/scikit_learn_data/
Here ~ refers to the user home folder. You can use the following code to know the default location of that folder according to your system.
from sklearn.datasets import get_data_home
print(get_data_home())
Update: Once the archive is in place, use the following script to convert it into the form in which scikit-learn keeps its caches:
import codecs, os, pickle, tarfile, shutil
from sklearn.datasets import load_files
# tarfile.open() and open() do not expand '~', so resolve the home folder explicitly
data_folder = os.path.expanduser('~/scikit_learn_data/')
target_folder = data_folder+'20news_home/'
# unpack the manually downloaded archive into a temporary folder
tarfile.open(data_folder+'20newsbydate.tar.gz', "r:gz").extractall(path=target_folder)
cache = dict(train=load_files(target_folder+'20news-bydate-train', encoding='latin1'),
             test=load_files(target_folder+'20news-bydate-test', encoding='latin1'))
# pickle and compress both subsets into the cache file
compressed_content = codecs.encode(pickle.dumps(cache), 'zlib_codec')
with open(data_folder+'20news-bydate_py3.pkz', 'wb') as f:
    f.write(compressed_content)
shutil.rmtree(target_folder)
Scikit-learn always checks whether the dataset exists locally, in the location above, before attempting to download it from the internet.
After that you can run the import normally.
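If the proxy does allow outbound HTTP(S) traffic, another thing to try is pointing Python at it through the standard proxy environment variables, which urllib reads. The sketch below rests on two assumptions: that your scikit-learn version downloads through urllib, and that the proxy address is known. The URL http://proxy.example.com:8080 is a placeholder and use_proxy is a hypothetical helper name.

```python
import os
import urllib.request


def use_proxy(proxy_url):
    """Set the proxy environment variables and return the proxies urllib will pick up."""
    os.environ["http_proxy"] = proxy_url
    os.environ["https_proxy"] = proxy_url
    return urllib.request.getproxies()


# Placeholder address: replace with your organisation's proxy host and port.
proxies = use_proxy("http://proxy.example.com:8080")
print(proxies.get("https"))  # http://proxy.example.com:8080
```

The variables need to be set before calling fetch_20newsgroups in the same Python process. Proxies that require authentication or custom certificates may need more than this.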

NameError: name 'gensim' is not defined

I've imported all the packages I need
from gensim import corpora
from gensim import models
from gensim.models import LdaModel
from gensim.models import TfidfModel
from gensim.models import CoherenceModel
and then I need to run the LdaMallet model so I import them like this
from gensim.models.wrappers import LdaMallet
When I run the code below, I get a NameError:
mallet_path = 'mallet-2.0.8/bin/mallet' # update this path
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path,corpus=corpus, num_topics=20, id2word=dictionary)
Error occurred:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-22-1c656d4f8c21> in <module>()
1 mallet_path = 'mallet-2.0.8/bin/mallet' # update this path
2
----> 3 ldamallet = gensim.models.wrappers.LdaMallet(mallet_path,corpus=corpus, num_topics=20, id2word=dictionary)
NameError: name 'gensim' is not defined
I thought I've imported all the things that I need, and the lda model ran well before I tried to use mallet. So what's the problem?
Because you have this import:
from gensim import models
you would need to refer to wrappers in your code as models.wrappers, etc., not gensim.models.wrappers.
But you're also doing this:
from gensim.models.wrappers import LdaMallet
so you can just refer to LdaMallet directly, as in:
ldamallet = LdaMallet(mallet_path,corpus=corpus, num_topics=20, id2word=dictionary)
Note that I left out the gensim.models.wrappers. here; you don't need it.
Just use LdaMallet(mallet_path,corpus=corpus, num_topics=20, id2word=dictionary) directly, because you have already imported the class from gensim.models.wrappers.
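The underlying rule is that each form of import binds a different name in your namespace. The sketch below shows this with the standard library's os.path as a stand-in for gensim.models, so it runs without gensim installed; names_bound_by is a hypothetical helper written for this illustration.

```python
def names_bound_by(import_statement):
    """Run one import statement in an empty namespace and list the names it binds."""
    namespace = {}
    exec(import_statement, namespace)
    # exec() adds __builtins__ on its own, so leave it out of the result.
    return sorted(name for name in namespace if name != "__builtins__")


print(names_bound_by("from os import path"))  # ['path']
print(names_bound_by("import os.path"))       # ['os']
```

In the same way, from gensim import models binds only models, so the name gensim stays undefined unless there is also a plain import gensim.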
