I have trained an HMM model to add punctuation to Arabic text, and I want to save it so that I don't have to repeat the training phase every time I pass a text to the model for tagging. I use pickle for this task, as I have seen in tutorials. I do exactly what they do, but it fails and gives me this error:
Traceback (most recent call last):
File "C:\Python27\file_pun_tag.py", line 205, in <module>
hmm_tagger("test_file.txt")
File "C:\Python27\file_pun_tag.py", line 179, in hmm_tagger
hmm = pickle.load(saved_model)
File "C:\Python27\lib\pickle.py", line 1378, in load
return Unpickler(file).load()
File "C:\Python27\lib\pickle.py", line 858, in load
dispatch[key](self)
File "C:\Python27\lib\pickle.py", line 1133, in load_reduce
value = func(*args)
TypeError: __init__() takes at least 3 arguments (2 given)
I tried several solutions, but none of them worked for me.
Here is the code where I save my model. It works correctly, saving the model and creating "hmm.pickle":
import codecs
import pickle
import nltk
from nltk.probability import LidstoneProbDist

f = codecs.open("train_sents_hmm.txt", "r", "utf_8")
train_sents = f.readlines()
f.close()
# load_pun is my own helper defined elsewhere in the script
labelled_sequences, tag_set, symbols = load_pun(train_sents)
trainer = nltk.HiddenMarkovModelTrainer(tag_set, symbols)
hmm = trainer.train_supervised(labelled_sequences,
                               estimator=lambda fd, bins: LidstoneProbDist(fd, 0.1, bins))

# save object
save_model = open("hmm.pickle", "wb")
pickle.dump(hmm, save_model, -1)
save_model.close()
And here is the code where I try to load the model after saving it; this is where it gives me the error:
saved_model = open("hmm.pickle", "rb")
hmm = pickle.load(saved_model)
saved_model.close()
I have a Detectron2 model trained to identify specific items on a backend server, and I would like to make it available on iOS devices by converting it to a Core ML model using coremltools v6.1. I used the export_model.py script provided by Facebook to create a TorchScript model, but when I try to convert that to Core ML I get a KeyError.
import coremltools as ct

def save_core_ml_package(scripted_model):
    # Using image_input in the inputs parameter:
    # Convert to Core ML neural network using the Unified Conversion API.
    h = 224
    w = 224
    ctmodel = ct.convert(scripted_model,
                         inputs=[ct.ImageType(shape=(1, 3, h, w),
                                              color_layout=ct.colorlayout.RGB)])
    # Save the converted model.
    ctmodel.save("newmodel.mlmodel")
I get the following error:
Support for converting Torch Script Models is experimental. If possible you should use a traced model for conversion.
Traceback (most recent call last):
File "/usr/repo/URCV/src/Python/pytorch_to_torchscript.py", line 101, in <module>
save_trace_to_core_ml_package(test_model, outdir=outdir)
File "/usr/repo/URCV/src/Python/pytorch_to_torchscript.py", line 46, in save_trace_to_core_ml_package
ctmodel = ct.convert(
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/_converters_entry.py", line 444, in convert
mlmodel = mil_convert(
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 190, in mil_convert
return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 217, in _mil_convert
proto, mil_program = mil_convert_to_proto(
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 282, in mil_convert_to_proto
prog = frontend_converter(model, **kwargs)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 112, in __call__
return load(*args, **kwargs)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 56, in load
converter = TorchConverter(torchscript, inputs, outputs, cut_at_symbols, specification_version)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 160, in __init__
raw_graph, params_dict = self._expand_and_optimize_ir(self.torchscript)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 486, in _expand_and_optimize_ir
graph, params_dict = TorchConverter._jit_pass_lower_graph(graph, torchscript)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 431, in _jit_pass_lower_graph
_lower_graph_block(graph)
File "/opt/python-venv/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 410, in _lower_graph_block
module = getattr(node_to_module_map[_input], attr_name)
KeyError: images.2 defined in (%images.2 : __torch__.detectron2.structures.image_list.ImageList = prim::CreateObject()
)
From the error message it looks like you are using a TorchScript model:
Support for converting Torch Script Models is experimental. If
possible you should use a traced model for conversion.
If possible, try to use a traced model instead, e.g.:
dummy_input = torch.randn(batch, channels, width, height)
traceable_model = torch.jit.trace(model, dummy_input)
followed by your original code:
ct.convert(traceable_model,...
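Putting the two pieces together, a minimal end-to-end sketch (the dummy shape is illustrative, and this assumes the model accepts a plain image tensor; Detectron2 models typically take structured inputs, so they often need a small wrapper module before they can be traced):
import torch
import coremltools as ct

# Illustrative example input; match it to what your model actually expects.
dummy_input = torch.randn(1, 3, 224, 224)

# Tracing records the ops executed on the example input, instead of scripting.
traceable_model = torch.jit.trace(model, dummy_input)

ctmodel = ct.convert(traceable_model,
                     inputs=[ct.ImageType(shape=(1, 3, 224, 224),
                                          color_layout=ct.colorlayout.RGB)])
ctmodel.save("newmodel.mlmodel")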
I am running a Python program using PyTorch. I use my own dataset, not torch.utils.data.Dataset. I load the data from a pickle file produced by feature extraction, but the following error appears:
Traceback (most recent call last):
File "C:\Users\hp\Downloads\efficient_densenet_pytorch-master\demo-emotion.py", line 326, in <module>
fire.Fire(demo)
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\fire\core.py", line 138, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\fire\core.py", line 468, in _Fire
target=component.__name__)
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\fire\core.py", line 672, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "C:\Users\hp\Downloads\efficient_densenet_pytorch-master\demo-emotion.py", line 304, in demo
train(model,train_set1, valid_set=valid_set, test_set=test1, save=save, n_epochs=n_epochs,batch_size=batch_size,seed=seed)
File "C:\Users\hp\Downloads\efficient_densenet_pytorch-master\demo-emotion.py", line 172, in train
n_epochs=n_epochs,
File "C:\Users\hp\Downloads\efficient_densenet_pytorch-master\demo-emotion.py", line 37, in train_epoch
loader=np.asarray(list(loader))
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
data = self._next_data()
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\dataloader.py", line 385, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\hp\Anaconda3\envs\tf-gpu\lib\site-packages\torch\utils\data\dataset.py", line 257, in __getitem__
return self.dataset[self.indices[idx]]
TypeError: 'DataLoader' object is not subscriptable
The code is:
train_set1 = Owndata()
train1, test1 = train_set1.get_splits()
# prepare data loaders
train_dl = torch.utils.data.DataLoader(train1, batch_size=32, shuffle=True)
test_dl = torch.utils.data.DataLoader(test1, batch_size=1024, shuffle=False)
test_set1 = Owndata()
'''print('test_set# ',test_set)'''
if valid_size:
    valid_set = Owndata()
    indices = torch.randperm(len(train_set1))
    train_indices = indices[:len(indices) - valid_size]
    valid_indices = indices[len(indices) - valid_size:]
    train_set1 = torch.utils.data.Subset(train_dl, train_indices)
    valid_set = torch.utils.data.Subset(valid_set, valid_indices)
else:
    valid_set = None
model = DenseNet(
    growth_rate=growth_rate,
    block_config=block_config,
    num_classes=10,
    small_inputs=True,
    efficient=efficient,
)
train(model, train_set1, valid_set=valid_set, test_set=test1, save=save,
      n_epochs=n_epochs, batch_size=batch_size, seed=seed)
Any help is appreciated! Thanks a lot in advance!!
It is not the line you posted that raises the error; it comes from inside the train function, which you are not showing (see the last frames of the traceback).
You are confusing two things:
torch.utils.data.Dataset is indexable (dataset[5] works fine, for example). It is a simple object that defines how to get a single sample of data.
torch.utils.data.DataLoader is not indexable, only iterable; it usually returns batches of data from the above Dataset and can work in parallel using num_workers. It is what you are trying to index, while you should use the dataset for that (see the sketch below).
Please see PyTorch documentation about data to get a better grasp on how those work.
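A minimal sketch of the distinction (the dataset class here is a hypothetical stand-in for your Owndata):
import torch
from torch.utils.data import Dataset, DataLoader, Subset

class ToyData(Dataset):  # hypothetical stand-in for the asker's Owndata
    def __init__(self):
        self.x = torch.randn(100, 3)
        self.y = torch.randint(0, 10, (100,))
    def __len__(self):
        return len(self.x)
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

dataset = ToyData()
sample = dataset[5]                        # indexing a Dataset is fine

indices = torch.randperm(len(dataset)).tolist()
train_set = Subset(dataset, indices[:80])  # Subset must wrap a Dataset, never a DataLoader

loader = DataLoader(train_set, batch_size=32, shuffle=True)
for xb, yb in loader:                      # a DataLoader is only iterable
    pass
# loader[0]  # would raise TypeError: 'DataLoader' object is not subscriptable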
I'm trying to do a simple text classification project with Transformers. I want to use the pipeline feature added in v2.3, but there is little to no documentation.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from transformers import (FlaubertForSequenceClassification, FlaubertTokenizer,
                          TextClassificationPipeline)

data = pd.read_csv("data.csv")
FLAUBERT_NAME = "flaubert-base-cased"

encoder = LabelEncoder()
target = encoder.fit_transform(data["category"])
y = target
X = data["text"]

model = FlaubertForSequenceClassification.from_pretrained(FLAUBERT_NAME)
tokenizer = FlaubertTokenizer.from_pretrained(FLAUBERT_NAME)
pipe = TextClassificationPipeline(model, tokenizer, device=-1)  # device=-1 -> use only CPU
print("Test #1: pipe('Bonjour le monde')=", pipe(['Bonjour le monde']))
Traceback (most recent call last):
File "C:/Users/PLHT09191/Documents/work/dev/Classif_Annonces/src/classif_annonce.py", line 33, in <module>
model = FlaubertForSequenceClassification.from_pretrained(FLAUBERT_NAME)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_utils.py", line 463, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_flaubert.py", line 343, in __init__
super(FlaubertForSequenceClassification, self).__init__(config)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_xlm.py", line 733, in __init__
self.transformer = XLMModel(config)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_xlm.py", line 382, in __init__
self.ffns.append(TransformerFFN(self.dim, self.hidden_dim, self.dim, config=config))
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_xlm.py", line 203, in __init__
self.lin2 = nn.Linear(dim_hidden, out_dim)
File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\torch\nn\modules\linear.py", line 72, in __init__
self.weight = Parameter(torch.Tensor(out_features, in_features))
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 9437184 bytes. Buy new RAM!
Process finished with exit code 1
How can I use my pipeline with my X and y data?
I want to import data from a text file and build a vector space representation of the words:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(input="file")
f = open('D:\\test\\17.txt')
bag_of_words = vectorizer.fit(f)
bag_of_words = vectorizer.transform(f)
print(bag_of_words)
But I get this error:
Traceback (most recent call last):
File "D:\test\test.py", line 5, in <module>
bag_of_words = vectorizer.fit(f)
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 776, in fit
self.fit_transform(raw_documents)
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 804, in fit_transform
self.fixed_vocabulary_)
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 739, in _count_vocab
for feature in analyze(doc):
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 236, in <lambda>
tokenize(preprocess(self.decode(doc))), stop_words)
File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 110, in decode
doc = doc.read()
AttributeError: 'str' object has no attribute 'read'
Any ideas?
The vectorizer.fit method expects an iterable of file or string objects (not a single file object), hence you should call vectorizer.fit([f]).
In addition, you cannot reuse f in the second call to vectorizer.transform, because the file has already been read to the end by that point. What you probably want to do is the following:
vectorizer = CountVectorizer(input="file")
f = open('D:\\test\\17.txt')
bag_of_words = vectorizer.fit_transform([f])
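If you do want separate fit and transform calls, a sketch of one option is to rewind the file object in between:
vectorizer = CountVectorizer(input="file")
f = open('D:\\test\\17.txt')
vectorizer.fit([f])
f.seek(0)  # rewind so transform can read the file again
bag_of_words = vectorizer.transform([f])
f.close()
print(bag_of_words)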
I want to save and load a fitted Random Forest Classifier, but I get an error.
from sklearn.ensemble import RandomForestClassifier
from sklearn.externals import joblib  # on modern scikit-learn, use: import joblib

forest = RandomForestClassifier(n_estimators=100, max_features=mf_val)
forest = forest.fit(L1[0:100], L2[0:100])
joblib.dump(forest, 'screening_forest/screening_forest.pkl')
forest2 = joblib.load('screening_forest/screening_forest.pkl')
The error is:
File "C:\Users\mkolarek\Documents\other\TrackerResultAnalysis\ScreeningClassif
ier\ScreeningClassifier.py", line 67, in <module>
forest2 = joblib.load('screening_forest/screening_forest.pkl')
File "C:\Python27\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py",
line 425, in load
obj = unpickler.load()
File "C:\Python27\lib\pickle.py", line 858, in load
dispatch[key](self)
File "C:\Python27\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py",
line 285, in load_build
Unpickler.load_build(self)
File "C:\Python27\lib\pickle.py", line 1217, in load_build
setstate(state)
File "_tree.pyx", line 2280, in sklearn.tree._tree.Tree.__setstate__ (sklearn\
tree\_tree.c:18350)
ValueError: Did not recognise loaded array layout
Do I have to initialize forest2 or something?
I solved it with cPickle instead:
import cPickle

with open('screening_forest/screening_forest.pickle', 'wb') as f:
    cPickle.dump(forest, f)

with open('screening_forest/screening_forest.pickle', 'rb') as f:
    forest2 = cPickle.load(f)
but a joblib solution could be useful as well.
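For the joblib route, the round trip itself is just dump and load, as sketched below; note that joblib will not create a missing target directory, and that errors like "Did not recognise loaded array layout" commonly come from dumping and loading with different scikit-learn versions or architectures, so keep those consistent:
import os
from sklearn.externals import joblib  # on modern scikit-learn, use: import joblib

if not os.path.isdir('screening_forest'):
    os.makedirs('screening_forest')  # joblib does not create missing directories
joblib.dump(forest, 'screening_forest/screening_forest.pkl', compress=3)
forest2 = joblib.load('screening_forest/screening_forest.pkl')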
Here is another method that you can try:
import pickle
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(data, labels)

model_file = 'model.pkl'
pickle.dump(model, open(model_file, 'wb'))

# Reload the model from the saved file
loaded_model = pickle.load(open(model_file, 'rb'))
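The reloaded estimator can then be used exactly like the original one, e.g.:
predictions = loaded_model.predict(data)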