I have trained an HMM model to add punctuation to Arabic text, and I want to save it so that I don't have to repeat the training phase every time I pass a text to the model for tagging. I use pickle for this task, as I have seen in tutorials. I do exactly what they do, but it fails and gives me this error:
Traceback (most recent call last):
File "C:\Python27\file_pun_tag.py", line 205, in <module>
hmm_tagger("test_file.txt")
File "C:\Python27\file_pun_tag.py", line 179, in hmm_tagger
hmm = pickle.load(saved_model)
File "C:\Python27\lib\pickle.py", line 1378, in load
return Unpickler(file).load()
File "C:\Python27\lib\pickle.py", line 858, in load
dispatch[key](self)
File "C:\Python27\lib\pickle.py", line 1133, in load_reduce
value = func(*args)
TypeError: __init__() takes at least 3 arguments (2 given)
I tried several solutions, but none of them worked for me.
Here is the code where I save my model. It works correctly, saving the model and creating "hmm.pickle":
import codecs
import pickle
import nltk
from nltk.probability import LidstoneProbDist

f = codecs.open("train_sents_hmm.txt", "r", "utf_8")
train_sents = f.readlines()
f.close()
# load_pun is my own helper defined elsewhere in the script
labelled_sequences, tag_set, symbols = load_pun(train_sents)
trainer = nltk.HiddenMarkovModelTrainer(tag_set, symbols)
hmm = trainer.train_supervised(labelled_sequences,
                               estimator=lambda fd, bins: LidstoneProbDist(fd, 0.1, bins))

# save object
save_model = open("hmm.pickle", "wb")
pickle.dump(hmm, save_model, -1)
save_model.close()
And here is the code where I try to load the model after saving it; this is where it gives me the error:
saved_model = open("hmm.pickle", "rb")
hmm = pickle.load(saved_model)
saved_model.close()
I have a Detectron2 model trained to identify specific items on a backend server, and I would like to make it available on iOS devices by converting it to a Core ML model using coremltools v6.1. I used the export_model.py script provided by Facebook to create a TorchScript model, but when I try to convert that to Core ML I get a KeyError.
import coremltools as ct

def save_core_ml_package(scripted_model):
    # Using image_input in the inputs parameter:
    # Convert to Core ML neural network using the Unified Conversion API.
    h = 224
    w = 224
    ctmodel = ct.convert(scripted_model,
                         inputs=[ct.ImageType(shape=(1, 3, h, w),
                                              color_layout=ct.colorlayout.RGB)])
    # Save the converted model.
    ctmodel.save("newmodel.mlmodel")
I get the following error:
Support for converting Torch Script Models is experimental. If possible you should use a traced model for conversion.
Traceback (most recent call last):
File "/usr/repo/URCV/src/Python/pytorch_to_torchscript.py", line 101, in <module>
save_trace_to_core_ml_package(test_model, outdir=outdir)
File "/usr/repo/URCV/src/Python/pytorch_to_torchscript.py", line 46, in save_trace_to_core_ml_package
ctmodel = ct.convert(
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/_converters_entry.py", line 444, in convert
mlmodel = mil_convert(
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 190, in mil_convert
return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 217, in _mil_convert
proto, mil_program = mil_convert_to_proto(
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 282, in mil_convert_to_proto
prog = frontend_converter(model, **kwargs)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 112, in __call__
return load(*args, **kwargs)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 56, in load
converter = TorchConverter(torchscript, inputs, outputs, cut_at_symbols, specification_version)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 160, in __init__
raw_graph, params_dict = self._expand_and_optimize_ir(self.torchscript)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 486, in _expand_and_optimize_ir
graph, params_dict = TorchConverter._jit_pass_lower_graph(graph, torchscript)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 431, in _jit_pass_lower_graph
_lower_graph_block(graph)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 410, in _lower_graph_block
module = getattr(node_to_module_map[_input], attr_name)
KeyError: images.2 defined in (%images.2 : __torch__.detectron2.structures.image_list.ImageList = prim::CreateObject()
)
From the error message it looks like you are using a TorchScript model:
Support for converting Torch Script Models is experimental. If
possible you should use a traced model for conversion.
If possible, try to use a traced model instead, e.g.:
dummy_input = torch.randn(batch, channels, width, height)
traceable_model = torch.jit.trace(model, dummy_input)
followed by your original code:
ct.convert(traceable_model,...
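Putting the two pieces together, a minimal end-to-end sketch (the dummy shape is illustrative, and this assumes the model accepts a plain image tensor; Detectron2 models typically take structured inputs, so they often need a small wrapper module before they can be traced):
import torch
import coremltools as ct

# Illustrative example input; match it to what your model actually expects.
dummy_input = torch.randn(1, 3, 224, 224)

# Tracing records the ops executed on the example input, instead of scripting.
traceable_model = torch.jit.trace(model, dummy_input)

ctmodel = ct.convert(traceable_model,
                     inputs=[ct.ImageType(shape=(1, 3, 224, 224),
                                          color_layout=ct.colorlayout.RGB)])
ctmodel.save("newmodel.mlmodel")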
I am running a Python program using PyTorch. I use my own dataset, not torch.utils.data.Dataset. I load the data from a pickle file produced by feature extraction, but the following error appears:
Traceback (most recent call last):
File "C:\Users\hp\Downloads\efficient_densenet_pytorch-master\demo-emotion.py", line 326, in <module>
fire.Fire(demo)
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\fire\core.py", line 138, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\fire\core.py", line 468, in _Fire
target=component.__name__)
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\fire\core.py", line 672, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "C:\Users\hp\Downloads\efficient_densenet_pytorch-master\demo-emotion.py", line 304, in demo
train(model,train_set1, valid_set=valid_set, test_set=test1, save=save, n_epochs=n_epochs,batch_size=batch_size,seed=seed)
File "C:\Users\hp\Downloads\efficient_densenet_pytorch-master\demo-emotion.py", line 172, in train
n_epochs=n_epochs,
File "C:\Users\hp\Downloads\efficient_densenet_pytorch-master\demo-emotion.py", line 37, in train_epoch
loader=np.asarray(list(loader))
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
data = self._next_data()
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\dataloader.py", line 385, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\dataset.py", line 257, in __getitem__
return self.dataset[self.indices[idx]]
TypeError: 'DataLoader' object is not subscriptable
The code is:
train_set1 = Owndata()
train1, test1 = train_set1.get_splits()
# prepare data loaders
train_dl = torch.utils.data.DataLoader(train1, batch_size=32, shuffle=True)
test_dl = torch.utils.data.DataLoader(test1, batch_size=1024, shuffle=False)
test_set1 = Owndata()
'''print('test_set# ',test_set)'''
if valid_size:
    valid_set = Owndata()
    indices = torch.randperm(len(train_set1))
    train_indices = indices[:len(indices) - valid_size]
    valid_indices = indices[len(indices) - valid_size:]
    train_set1 = torch.utils.data.Subset(train_dl, train_indices)
    valid_set = torch.utils.data.Subset(valid_set, valid_indices)
else:
    valid_set = None
model = DenseNet(
    growth_rate=growth_rate,
    block_config=block_config,
    num_classes=10,
    small_inputs=True,
    efficient=efficient,
)
train(model, train_set1, valid_set=valid_set, test_set=test1, save=save,
      n_epochs=n_epochs, batch_size=batch_size, seed=seed)
Any help is appreciated! Thanks a lot in advance!!
It is not the line you posted that raises the error; it comes from inside the train function, which you are not showing (see the last frames of the traceback).
You are confusing two things:
torch.utils.data.Dataset is indexable (dataset[5] works fine, for example). It is a simple object that defines how to get a single sample of data.
torch.utils.data.DataLoader is not indexable, only iterable; it usually returns batches of data from the above Dataset and can work in parallel using num_workers. It is what you are trying to index, while you should use the dataset for that (see the sketch below).
Please see PyTorch documentation about data to get a better grasp on how those work.
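A minimal sketch of the distinction (the dataset class here is a hypothetical stand-in for your Owndata):
import torch
from torch.utils.data import Dataset, DataLoader, Subset

class ToyData(Dataset):  # hypothetical stand-in for the asker's Owndata
    def __init__(self):
        self.x = torch.randn(100, 3)
        self.y = torch.randint(0, 10, (100,))
    def __len__(self):
        return len(self.x)
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

dataset = ToyData()
sample = dataset[5]                        # indexing a Dataset is fine

indices = torch.randperm(len(dataset)).tolist()
train_set = Subset(dataset, indices[:80])  # Subset must wrap a Dataset, never a DataLoader

loader = DataLoader(train_set, batch_size=32, shuffle=True)
for xb, yb in loader:                      # a DataLoader is only iterable
    pass
# loader[0]  # would raise TypeError: 'DataLoader' object is not subscriptable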
I'm trying to do a simple text classification project with Transformers. I want to use the pipeline feature added in v2.3, but there is little to no documentation.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from transformers import (FlaubertForSequenceClassification, FlaubertTokenizer,
                          TextClassificationPipeline)

data = pd.read_csv("data.csv")
FLAUBERT_NAME = "flaubert-base-cased"

encoder = LabelEncoder()
target = encoder.fit_transform(data["category"])
y = target
X = data["text"]

model = FlaubertForSequenceClassification.from_pretrained(FLAUBERT_NAME)
tokenizer = FlaubertTokenizer.from_pretrained(FLAUBERT_NAME)
pipe = TextClassificationPipeline(model, tokenizer, device=-1)  # device=-1 -> use only CPU
print("Test #1: pipe('Bonjour le monde')=", pipe(['Bonjour le monde']))
Traceback (most recent call last):
File "C:/Users/PLHT09191/Documents/work/dev/Classif_Annonces/src/classif_annonce.py", line 33, in <module>
model = FlaubertForSequenceClassification.from_pretrained(FLAUBERT_NAME)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_utils.py", line 463, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_flaubert.py", line 343, in __init__
super(FlaubertForSequenceClassification, self).__init__(config)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_xlm.py", line 733, in __init__
self.transformer = XLMModel(config)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_xlm.py", line 382, in __init__
self.ffns.append(TransformerFFN(self.dim, self.hidden_dim, self.dim, config=config))
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_xlm.py", line 203, in __init__
self.lin2 = nn.Linear(dim_hidden, out_dim)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\torch\nn\modules\linear.py", line 72, in __init__
self.weight = Parameter(torch.Tensor(out_features, in_features))
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 9437184 bytes. Buy new RAM!
Process finished with exit code 1
How can I use my pipeline with my X and y data?
I want to import data from a text file and build a vector space representation of the words:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(input="file")
f = open('D:\\test\\17.txt')
bag_of_words = vectorizer.fit(f)
bag_of_words = vectorizer.transform(f)
print(bag_of_words)
But I get this error:
Traceback (most recent call last):
File "D:\test\test.py", line 5, in <module>
bag_of_words = vectorizer.fit(f)
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 776, in fit
self.fit_transform(raw_documents)
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 804, in fit_transform
self.fixed_vocabulary_)
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 739, in _count_vocab
for feature in analyze(doc):
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 236, in <lambda>
tokenize(preprocess(self.decode(doc))), stop_words)
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 110, in decode
doc = doc.read()
AttributeError: 'str' object has no attribute 'read'
Any ideas?
The vectorizer.fit method expects an iterable of file or string objects (not a single file object), hence you should call vectorizer.fit([f]).
In addition, you cannot reuse f in the second call to vectorizer.transform, because the file has already been read to the end by that point. What you probably want to do is the following:
vectorizer = CountVectorizer(input="file")
f = open('D:\\test\\17.txt')
bag_of_words = vectorizer.fit_transform([f])
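If you do want separate fit and transform calls, a sketch of one option is to rewind the file object in between:
vectorizer = CountVectorizer(input="file")
f = open('D:\\test\\17.txt')
vectorizer.fit([f])
f.seek(0)  # rewind so transform can read the file again
bag_of_words = vectorizer.transform([f])
f.close()
print(bag_of_words)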
I want to save and load a fitted Random Forest Classifier, but I get an error.
from sklearn.ensemble import RandomForestClassifier
from sklearn.externals import joblib  # on modern scikit-learn, use: import joblib

forest = RandomForestClassifier(n_estimators=100, max_features=mf_val)
forest = forest.fit(L1[0:100], L2[0:100])
joblib.dump(forest, 'screening_forest/screening_forest.pkl')
forest2 = joblib.load('screening_forest/screening_forest.pkl')
The error is:
File "C:\Users\mkolarek\Documents\other\TrackerResultAnalysis\ScreeningClassif
ier\ScreeningClassifier.py", line 67, in <module>
forest2 = joblib.load('screening_forest/screening_forest.pkl')
File "C:\Python27\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py",
line 425, in load
obj = unpickler.load()
File "C:\Python27\lib\pickle.py", line 858, in load
dispatch[key](self)
File "C:\Python27\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py",
line 285, in load_build
Unpickler.load_build(self)
File "C:\Python27\lib\pickle.py", line 1217, in load_build
setstate(state)
File "_tree.pyx", line 2280, in sklearn.tree._tree.Tree.__setstate__ (sklearn\
tree\_tree.c:18350)
ValueError: Did not recognise loaded array layout
Do I have to initialize forest2 or something?
I solved it with cPickle instead:
import cPickle

with open('screening_forest/screening_forest.pickle', 'wb') as f:
    cPickle.dump(forest, f)

with open('screening_forest/screening_forest.pickle', 'rb') as f:
    forest2 = cPickle.load(f)
but a joblib solution could be useful as well.
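For the joblib route, the round trip itself is just dump and load, as sketched below; note that joblib will not create a missing target directory, and that errors like "Did not recognise loaded array layout" commonly come from dumping and loading with different scikit-learn versions or architectures, so keep those consistent:
import os
from sklearn.externals import joblib  # on modern scikit-learn, use: import joblib

if not os.path.isdir('screening_forest'):
    os.makedirs('screening_forest')  # joblib does not create missing directories
joblib.dump(forest, 'screening_forest/screening_forest.pkl', compress=3)
forest2 = joblib.load('screening_forest/screening_forest.pkl')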
Here is another method that you can try:
import pickle
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(data, labels)

model_file = 'model.pkl'
pickle.dump(model, open(model_file, 'wb'))

# Reload the model from the saved file
loaded_model = pickle.load(open(model_file, 'rb'))
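The reloaded estimator can then be used exactly like the original one, e.g.:
predictions = loaded_model.predict(data)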