I have written the code for an entity extraction model using BERT, but when I run the train.py file I get a ValueError.
This is the structure of my code with the configuration file in VSCode; I have downloaded the BERT models from here.
Error
>> (myenv) PS D:\Transformers\bert-entity-extraction> python src/train.py
Configuration Complete!
Traceback (most recent call last):
File "src/train.py", line 83, in <module>
model = EntityModel(num_tag = num_tag, num_pos = num_pos)
File "D:\Transformers\bert-entity-extraction\src\model.py", line 25, in __init__
self.bert = transformers.BertModel.from_pretrained(config.BASE_MODEL_PATH)
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\modeling_utils.py", line 1080, in from_pretrained
**kwargs,
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\configuration_utils.py", line 427, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\configuration_utils.py", line 492, in get_config_dict
user_agent=user_agent,
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\file_utils.py", line 1289, in cached_path
raise ValueError(f"unable to parse {url_or_filename} as a URL or as a local path")
ValueError: unable to parse D:\Transformers\bert-entity-extraction\input\bert-base-uncased_L-12_H-768_A-12\config.json as a URL or as a local path
How to fix this?
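A hedged note on the ValueError itself: transformers raises it in cached_path when the given location is neither a URL nor an existing file, i.e. config.json was not found under the model path. As the answer to the similar M-BERT question further down explains, from_pretrained needs either a model id from huggingface.co or a local directory containing a config.json. A minimal sanity check, assuming config.BASE_MODEL_PATH points at the downloaded model directory (Google's original TF checkpoints ship a bert_config.json rather than a config.json, so the file may need renaming or the model may need to be re-downloaded in the Transformers layout):
import os
import transformers

# Mirrors config.BASE_MODEL_PATH from the traceback.
BASE_MODEL_PATH = r"D:\Transformers\bert-entity-extraction\input\bert-base-uncased_L-12_H-768_A-12"

# Both checks must pass before from_pretrained can succeed:
print(os.path.isdir(BASE_MODEL_PATH))
print(os.path.isfile(os.path.join(BASE_MODEL_PATH, "config.json")))

bert = transformers.BertModel.from_pretrained(BASE_MODEL_PATH)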
Related
I am trying to use pyresparser in PyCharm but it is showing this error:
File "C:\Users\DELL\PycharmProjects\pythonProject1\main.py", line 71, in prediction_result
data = ResumeParser(cv_path).get_extracted_data()
File "C:\Users\DELL\AppData\Local\Programs\Python\Python310\lib\site-packages\pyresparser\resume_parser.py", line 25, in __init__
nlp = spacy.load('en_core_web_sm')
File "C:\Users\DELL\AppData\Local\Programs\Python\Python310\lib\site-packages\spacy\__init__.py", line 51, in load
return util.load_model(
File "C:\Users\DELL\AppData\Local\Programs\Python\Python310\lib\site-packages\spacy\util.py", line 427, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
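A hedged note: E050 usually just means the en_core_web_sm package is not installed in the interpreter PyCharm is running (Python 3.10 under AppData, per the traceback). Assuming that is the case, spaCy's documented fix is to install the model into that same environment:
# Run with the exact interpreter PyCharm uses:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load('en_core_web_sm')  # should now resolve the installed package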
I am having a weird bug: the submitted custom job fails because it cannot find the bucket I defined for the training output, although I can see that it exists. This is the error I am getting:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/.local/lib/python3.7/site-packages/trainer/task.py", line 508, in <module>
train_loss = train(scheduler, optimizer)
File "/root/.local/lib/python3.7/site-packages/trainer/task.py", line 390, in train
torch.save(model.state_dict(), args.model_dir)
File "/opt/conda/lib/python3.7/site-packages/torch/serialization.py", line 369, in save
with _open_file_like(f, 'wb') as opened_file:
File "/opt/conda/lib/python3.7/site-packages/torch/serialization.py", line 230, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/opt/conda/lib/python3.7/site-packages/torch/serialization.py", line 211, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'gs://machine-learning-us-central1/test_04_30_15_12'
The bucket and directory in the error, gs://machine-learning-us-central1/test_04_30_15_12, do exist. They are also being created before the actual training.
It works when I do not use command-line arguments but plain Python code. Thus, I assume I have a bug within the parser I use for the command-line arguments.
Parser:
parser = argparse.ArgumentParser()
parser.add_argument('--model-dir', dest='model_dir',
                    help='Model dir.')
parser.add_argument('--model-name', dest='model_name',
                    help='Name of the model',
                    default='model.pt')
args = parser.parse_args()
How I store the training output:
def save_model(args_dir, args_name):
    """Saves the model to Google Cloud Storage

    Args:
        args: contains name for saved model.
    """
    bucket = storage.Client().bucket(ROOT_BUCKET)
    blob = bucket.blob(args_name)
    blob.upload_from_filename(args_dir)

# within the training run
torch.save(model.state_dict(), args.model_name)
save_model(args.model_name, args.model_dir)
The model dir seems to be correctly defined, as I can store files in it, just not via the parser.
Or could there be an issue with PyTorch and how I save the model?
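For reference, a minimal sketch of the save-locally-then-upload pattern the code above is aiming for. torch.save ultimately calls open() on its target (visible in the traceback), so it needs a real local path, never a gs:// URL. Two things worth double-checking in the snippet above: the call save_model(args.model_name, args.model_dir) crosses the parameter names of save_model(args_dir, args_name), and bucket.blob() expects a blob name such as 'test_04_30_15_12/model.pt', not a full gs:// URL. The bucket and blob names below are illustrative:
import torch
import torch.nn as nn
from google.cloud import storage

ROOT_BUCKET = 'machine-learning-us-central1'  # illustrative bucket name

def save_model_to_gcs(local_path, blob_name):
    # The storage client handles the gs:// side; we only upload a local file.
    bucket = storage.Client().bucket(ROOT_BUCKET)
    bucket.blob(blob_name).upload_from_filename(local_path)

model = nn.Linear(4, 2)  # stand-in for the real model

torch.save(model.state_dict(), 'model.pt')  # local path only
save_model_to_gcs('model.pt', 'test_04_30_15_12/model.pt')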
I am trying to use the Hugging Face Transformers API to load a locally downloaded M-BERT model, but it is throwing an exception.
I cloned this repo: https://huggingface.co/bert-base-multilingual-cased
bert = TFBertModel.from_pretrained("input/bert-base-multilingual-cased")
The directory structure is:
But I am getting this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 1277, in from_pretrained
missing_keys, unexpected_keys = load_tf_weights(model, resolved_archive_file, load_weight_prefix)
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 467, in load_tf_weights
with h5py.File(resolved_archive_file, "r") as f:
File "/usr/local/lib/python3.7/dist-packages/h5py/_hl/files.py", line 408, in __init__
swmr=swmr)
File "/usr/local/lib/python3.7/dist-packages/h5py/_hl/files.py", line 173, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 81, in <module>
__main__()
File "train.py", line 59, in __main__
model = create_model(num_classes)
File "/content/drive/My Drive/msc-project/code/model.py", line 26, in create_model
bert = TFBertModel.from_pretrained("input/bert-base-multilingual-cased")
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 1280, in from_pretrained
"Unable to load weights from h5 file. "
OSError: Unable to load weights from h5 file. If you tried to load a TF 2.0 model from a PyTorch checkpoint, please set from_pt=True.
Where am I going wrong?
Need help!
Thanks in advance.
As was already pointed out in the comments, your from_pretrained param should be either the id of a model hosted on huggingface.co or a local path:
A path to a directory containing model weights saved using
save_pretrained(), e.g., ./my_model_directory/.
See documentation
Looking at your stack trace, it seems like your code is run inside
/content/drive/My Drive/msc-project/code/model.py, so unless your model is in
/content/drive/My Drive/msc-project/code/input/bert-base-multilingual-cased/ it won't load.
I would also set the path to be similar to the documentation example, i.e.:
bert = TFBertModel.from_pretrained("./input/bert-base-multilingual-cased/")
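One more thing worth checking, since the repo contents aren't shown: the h5py error "file signature not found" means tf_model.h5 exists but is not a valid HDF5 file, which typically happens when git-lfs did not pull the large files during the clone and left small pointer files behind. If the directory effectively only holds the PyTorch weights, the OSError's own hint applies:
from transformers import TFBertModel

# Build the TF model from the PyTorch checkpoint, per the error message:
bert = TFBertModel.from_pretrained('./input/bert-base-multilingual-cased/', from_pt=True)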
I would like to create a binary of my Python code that contains spaCy.
# main.py
import spacy
import en_core_web_sm

def main() -> None:
    nlp = spacy.load("en_core_web_sm")
    # nlp = en_core_web_sm.load()
    doc = nlp("This is an example")
    print([(w.text, w.pos_) for w in doc])

if __name__ == "__main__":
    main()
Besides my code, I created two PyInstaller hooks, as described here.
To create the binary, I use the following command: pyinstaller main.py --additional-hooks-dir=..
On execution of the binary, I get the following error message:
Traceback (most recent call last):
File "main.py", line 19, in <module>
main()
File "main.py", line 12, in main
nlp = spacy.load("en_core_web_sm")
File "spacy/__init__.py", line 47, in load
File "spacy/util.py", line 329, in load_model
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
If I use nlp = en_core_web_sm.load() instead of nlp = spacy.load("en_core_web_sm") to load the spaCy model, I get the following error:
Traceback (most recent call last):
File "main.py", line 19, in <module>
main()
File "main.py", line 13, in main
nlp = en_core_web_sm.load()
File "en_core_web_sm/__init__.py", line 10, in load
File "spacy/util.py", line 514, in load_model_from_init_py
File "spacy/util.py", line 389, in load_model_from_path
File "spacy/util.py", line 426, in load_model_from_config
File "spacy/language.py", line 1662, in from_config
File "spacy/language.py", line 768, in add_pipe
File "spacy/language.py", line 659, in create_pipe
File "thinc/config.py", line 722, in resolve
File "thinc/config.py", line 771, in _make
File "thinc/config.py", line 826, in _fill
File "thinc/config.py", line 825, in _fill
File "thinc/config.py", line 1016, in make_promise_schema
File "spacy/util.py", line 137, in get
catalogue.RegistryError: [E893] Could not find function 'spacy.Tok2Vec.v1' in function registry 'architectures'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.
I had this same issue. After the error message you posted above, did you see an "Available names: ..." message? This message suggested that spacy.Tok2Vec.v2 was available but not v1. I was able to edit the config file for en_core_web_sm (for me at dist\<name>\en_core_web_sm\en_core_web_sm-3.0.0\config.cfg) and change all references from spacy.Tok2Vec.v1 to spacy.Tok2Vec.v2. I also had to do this for spacy.MaxoutWindowEncoder.v1. It's still a mystery to me why I'm having the issue only in the PyInstaller distributable and not in my non-compiled script.
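If that edit has to be reapplied after every rebuild, it can be scripted; a minimal sketch, with the config path hypothetical (following the dist layout described above):
from pathlib import Path

# Hypothetical path; replace 'main' with your app's dist folder name.
cfg = Path(r'dist\main\en_core_web_sm\en_core_web_sm-3.0.0\config.cfg')

text = cfg.read_text(encoding='utf8')
text = text.replace('spacy.Tok2Vec.v1', 'spacy.Tok2Vec.v2')
text = text.replace('spacy.MaxoutWindowEncoder.v1', 'spacy.MaxoutWindowEncoder.v2')
cfg.write_text(text, encoding='utf8')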
I encountered the same issue and solved it by copying the spacy-legacy package to the compiled destination directory.
You can also hook it up via PyInstaller, but I did not really try that.
I hope my answer helps.
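For the hook route mentioned above, a sketch of what an extra hook could look like; hook-spacy.py is a hypothetical file placed in the --additional-hooks-dir from the question, and it assumes the missing registry functions (spacy.Tok2Vec.v1 and friends) are provided by the spacy-legacy package:
# hook-spacy.py
from PyInstaller.utils.hooks import collect_submodules

# Bundle spacy_legacy so its registered architectures are importable
# inside the frozen app.
hiddenimports = collect_submodules('spacy_legacy')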
I am trying to serialize/deserialize spaCy documents (setup is Windows 7, Anaconda) and am getting errors. I haven't been able to find any explanations. Here is a snippet of code and the error it generates:
import spacy
nlp = spacy.load('en')
text = 'This is a test.'
doc = nlp(text)
fout = 'test.spacy' # <-- according to the API for Doc.to_disk(), this needs to be a directory (but for me, spaCy writes a file)
doc.to_disk(fout)
doc.from_disk(fout)
Traceback (most recent call last):
File "<ipython-input-7-aa22bf1b9689>", line 1, in <module>
doc.from_disk(fout)
File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk
File "doc.pyx", line 806, in spacy.tokens.doc.Doc.from_bytes
ValueError: [E033] Cannot load into non-empty Doc of length 5.
I have also tried creating a new Doc object and loading from that, as shown in the example ("Example: Saving and loading a document") in the spaCy docs, which results in a different error:
from spacy.tokens import Doc
from spacy.vocab import Vocab
new_doc = Doc(Vocab()).from_disk(fout)
Traceback (most recent call last):
File "<ipython-input-16-4d99a1199f43>", line 1, in <module>
Doc(Vocab()).from_disk(fout)
File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk
File "doc.pyx", line 838, in spacy.tokens.doc.Doc.from_bytes
File "stringsource", line 646, in View.MemoryView.memoryview_cwrapper
File "stringsource", line 347, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
EDIT:
As pointed out in the replies, the path provided should be a directory. However, the first code snippet creates a file. Changing this to a non-existent directory path doesn't help, as spaCy still creates a file. Attempting to write to an existing directory causes an error too:
fout = 'data'
doc.to_disk(fout)
Traceback (most recent call last):
File "<ipython-input-8-6c30638f4750>", line 1, in <module>
doc.to_disk(fout)
File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
opener=self._opener)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
return self._accessor.open(self, flags, mode)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
return strfunc(str(pathobj), *args)
PermissionError: [Errno 13] Permission denied: 'data'
Python has no problem writing at this location via standard file operations (open/read/write).
Trying with a Path object yields the same results:
from pathlib import Path
import os
fout = Path(os.path.join(os.getcwd(), 'data'))
doc.to_disk(fout)
Traceback (most recent call last):
File "<ipython-input-17-6c30638f4750>", line 1, in <module>
doc.to_disk(fout)
File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
opener=self._opener)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
return self._accessor.open(self, flags, mode)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
return strfunc(str(pathobj), *args)
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Username\\workspace\\data'
Any ideas why this might be happening?
The argument to doc.to_disk(fout) must be
a path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects.
as the spaCy documentation states at https://spacy.io/api/doc.
Try changing fout to a directory; it might do the trick.
EDIT:
Examples from the spaCy documentation:
for doc.to_disk:
doc.to_disk('/path/to/doc')
and for doc.from_disk:
from spacy.tokens import Doc
from spacy.vocab import Vocab
doc = Doc(Vocab()).from_disk('/path/to/doc')
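If the directory semantics of to_disk/from_disk keep getting in the way, the byte-level API they delegate to (from_bytes shows up in the tracebacks above) works with ordinary file I/O; a sketch using the same 'en' model as the question:
import spacy
from spacy.tokens import Doc

nlp = spacy.load('en')
doc = nlp('This is a test.')

# Serialize to bytes and write them with plain file operations:
with open('test.spacy', 'wb') as f:
    f.write(doc.to_bytes())

# Deserialize into a fresh, empty Doc; E033 above fired because the
# target Doc already contained tokens.
with open('test.spacy', 'rb') as f:
    new_doc = Doc(nlp.vocab).from_bytes(f.read())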