Unable to load spacy English model - 'WindowsPath' object has no attribute 'read' - python

I installed spacy using pip and then downloaded the English model using
$ python -m spacy download en
which after downloading gave me the message
You can now load the model via spacy.load('en')
Using IPython,
import spacy
nlp=spacy.load('en')
AttributeError Traceback (most recent call last)
<ipython-input-5-a32b6d2b36d8> in <module>()
----> 1 nlp=spacy.load('en')
C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\__init__.pyc in load(name, **overrides)
13 from .deprecated import resolve_load_name
14 name = resolve_load_name(name, **overrides)
---> 15 return util.load_model(name, **overrides)
16
17
C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\util.pyc in load_model(name, **overrides)
102 if isinstance(name, basestring_):
103 if name in set([d.name for d in data_path.iterdir()]): # in data dir / shortcut
--> 104 return load_model_from_link(name, **overrides)
105 if is_package(name): # installed as package
106 return load_model_from_package(name, **overrides)
C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\util.pyc in load_model_from_link(name, **overrides)
121 "Can't load '%s'. If you're using a shortcut link, make sure it "
122 "points to a valid model package (not just a data directory)." % name)
--> 123 return cls.load(**overrides)
124
125
C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\data\en\__init__.pyc in load(**overrides)
10
11 def load(**overrides):
---> 12 return load_model_from_init_py(__file__, **overrides)
C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\util.pyc in load_model_from_init_py(init_file, **overrides)
165 if not model_path.exists():
166 raise ValueError("Can't find model directory: %s" % path2str(data_path))
--> 167 return load_model_from_path(data_path, meta, **overrides)
168
169
C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\util.pyc in load_model_from_path(model_path, meta, **overrides)
148 component = nlp.create_pipe(name, config=config)
149 nlp.add_pipe(component, name=name)
--> 150 return nlp.from_disk(model_path)
151
152
C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\language.pyc in from_disk(self, path, disable)
571 if not (path / 'vocab').exists():
572 exclude['vocab'] = True
--> 573 util.from_disk(path, deserializers, exclude)
574 return self
575
C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\util.pyc in from_disk(path, readers, exclude)
495 for key, reader in readers.items():
496 if key not in exclude:
--> 497 reader(path / key)
498 return path
499
C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\language.pyc in <lambda>(p)
558 path = util.ensure_path(path)
559 deserializers = OrderedDict((
--> 560 ('vocab', lambda p: self.vocab.from_disk(p)),
561 ('tokenizer', lambda p: self.tokenizer.from_disk(p, vocab=False)),
562 ('meta.json', lambda p: p.open('w').write(json_dumps(self.meta)))
vocab.pyx in spacy.vocab.Vocab.from_disk()
vectors.pyx in spacy.vectors.Vectors.from_disk()
C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\util.pyc in from_disk(path, readers, exclude)
495 for key, reader in readers.items():
496 if key not in exclude:
--> 497 reader(path / key)
498 return path
499
vectors.pyx in spacy.vectors.Vectors.from_disk.load_keys()
C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\numpy\lib\npyio.pyc in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
389 _ZIP_PREFIX = asbytes('PK\x03\x04')
390 N = len(format.MAGIC_PREFIX)
--> 391 magic = fid.read(N)
392 fid.seek(-N, 1) # back-up
393 if magic.startswith(_ZIP_PREFIX):
AttributeError: 'WindowsPath' object has no attribute 'read'
I have the English model files (en_core_web_sm) downloaded to the working directory. Am I missing something? Do I need to set a path variable? Any help is much appreciated, thanks!

If anybody else receives this error: I opened this as an issue with spaCy's developers on GitHub. They suggested using Python 3.6 instead of 2.7 for the moment, as there is no alternate workaround to the problem. The next spaCy release should include the bugfix (I'm told).

Yes, there is a glitch involving language downloads in Anaconda environments. Here is the pending pull request:
https://github.com/explosion/spaCy/pull/1792

Related

unable to load the model [duplicate]

I want to load FaceNet in Keras but I am getting errors.
The model facenet_keras.h5 is ready but I can't load it.
You can get facenet_keras.h5 from this link:
https://drive.google.com/drive/folders/1pwQ3H4aJ8a6yyJHZkTwtjcL4wYWQb7bn
My tensorflow version is:
tensorflow.__version__
'2.2.0'
and when I try to load the model:
from tensorflow.keras.models import load_model
load_model('facenet_keras.h5')
I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-2a20f38e8217> in <module>
----> 1 load_model('facenet_keras.h5')
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py in load_model(filepath, custom_objects, compile)
182 if (h5py is not None and (
183 isinstance(filepath, h5py.File) or h5py.is_hdf5(filepath))):
--> 184 return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
185
186 if sys.version_info >= (3, 4) and isinstance(filepath, pathlib.Path):
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py in load_model_from_hdf5(filepath, custom_objects, compile)
175 raise ValueError('No model found in config file.')
176 model_config = json.loads(model_config.decode('utf-8'))
--> 177 model = model_config_lib.model_from_config(model_config,
178 custom_objects=custom_objects)
179
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/model_config.py in model_from_config(config, custom_objects)
53 '`Sequential.from_config(config)`?')
54 from tensorflow.python.keras.layers import deserialize # pylint: disable=g-import-not-at-top
---> 55 return deserialize(config, custom_objects=custom_objects)
56
57
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/layers/serialization.py in deserialize(config, custom_objects)
103 config['class_name'] = _DESERIALIZATION_TABLE[layer_class_name]
104
--> 105 return deserialize_keras_object(
106 config,
107 module_objects=globs,
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/utils/generic_utils.py in deserialize_keras_object(identifier, module_objects, custom_objects, printable_module_name)
367
368 if 'custom_objects' in arg_spec.args:
--> 369 return cls.from_config(
370 cls_config,
371 custom_objects=dict(
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/network.py in from_config(cls, config, custom_objects)
984 ValueError: In case of improperly formatted config dict.
985 """
--> 986 input_tensors, output_tensors, created_layers = reconstruct_from_config(
987 config, custom_objects)
988 model = cls(inputs=input_tensors, outputs=output_tensors,
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/network.py in reconstruct_from_config(config, custom_objects, created_layers)
2017 # First, we create all layers and enqueue nodes to be processed
2018 for layer_data in config['layers']:
-> 2019 process_layer(layer_data)
2020 # Then we process nodes in order of layer depth.
2021 # Nodes that cannot yet be processed (if the inbound node
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/network.py in process_layer(layer_data)
1999 from tensorflow.python.keras.layers import deserialize as deserialize_layer # pylint: disable=g-import-not-at-top
2000
-> 2001 layer = deserialize_layer(layer_data, custom_objects=custom_objects)
2002 created_layers[layer_name] = layer
2003
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/layers/serialization.py in deserialize(config, custom_objects)
103 config['class_name'] = _DESERIALIZATION_TABLE[layer_class_name]
104
--> 105 return deserialize_keras_object(
106 config,
107 module_objects=globs,
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/utils/generic_utils.py in deserialize_keras_object(identifier, module_objects, custom_objects, printable_module_name)
367
368 if 'custom_objects' in arg_spec.args:
--> 369 return cls.from_config(
370 cls_config,
371 custom_objects=dict(
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/layers/core.py in from_config(cls, config, custom_objects)
988 def from_config(cls, config, custom_objects=None):
989 config = config.copy()
--> 990 function = cls._parse_function_from_config(
991 config, custom_objects, 'function', 'module', 'function_type')
992
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/layers/core.py in _parse_function_from_config(cls, config, custom_objects, func_attr_name, module_attr_name, func_type_attr_name)
1040 elif function_type == 'lambda':
1041 # Unsafe deserialization from bytecode
-> 1042 function = generic_utils.func_load(
1043 config[func_attr_name], globs=globs)
1044 elif function_type == 'raw':
~/.local/lib/python3.8/site-packages/tensorflow/python/keras/utils/generic_utils.py in func_load(code, defaults, closure, globs)
469 except (UnicodeEncodeError, binascii.Error):
470 raw_code = code.encode('raw_unicode_escape')
--> 471 code = marshal.loads(raw_code)
472 if globs is None:
473 globs = globals()
ValueError: bad marshal data (unknown type code)
thank you.
The possible solutions to this error are:
The model might have been built and saved in Python 2.x while you are using Python 3.x. The solution is to use the same Python version with which the model was built and saved.
Use the same version of Keras (and possibly TensorFlow) on which your model was built and saved.
The saved model might contain custom objects. If so, you need to load the model using code like:
new_model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
If you can recreate the architecture (i.e. you have the original code used to generate it), you can instantiate the model from that code and then use model.load_weights('your_model_file.hdf5') to load in the weights. This isn't an option if you don't have the code used to create the original architecture.
For more details, please refer to this GitHub issue. For more details on saving and loading a model with custom objects, please refer to this TensorFlow documentation and this Stack Overflow answer.
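A minimal sketch of the recreate-the-architecture option above (build_facenet is a hypothetical placeholder for your own model-building code, not something from the original answer):
import tensorflow as tf

def build_facenet():
    # Placeholder: rebuild the FaceNet architecture here with the same code
    # that originally produced facenet_keras.h5 and return a tf.keras.Model.
    raise NotImplementedError

model = build_facenet()
# Loading only the weights skips deserializing the saved config (and the
# marshalled Lambda functions that trigger the "bad marshal data" error).
model.load_weights('facenet_keras.h5')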
I changed the Python version (3.10 to 3.7) and that solved it for me.

estimator.fit hangs on sagemaker on local mode

I am trying to train a PyTorch model using SageMaker in local mode, but whenever I call estimator.fit the code hangs indefinitely and I have to interrupt the notebook kernel. This happens both on my local machine and in SageMaker Studio. But when I use EC2, the training runs normally.
Here is the call to the estimator, and the stack trace once I interrupt the kernel:
import sagemaker
from sagemaker.pytorch import PyTorch

bucket = "bucket-name"
role = sagemaker.get_execution_role()
training_input_path = f"s3://{bucket}/dataset/path"

sagemaker_session = sagemaker.LocalSession()
sagemaker_session.config = {"local": {"local_code": True}}
output_path = "file://."

estimator = PyTorch(
    entry_point="train.py",
    source_dir="src",
    hyperparameters={"max-epochs": 1},
    framework_version="1.8",
    py_version="py3",
    instance_count=1,
    instance_type="local",
    role=role,
    output_path=output_path,
    sagemaker_session=sagemaker_session,
)
estimator.fit({"training": training_input_path})
Stack trace:
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-9-35cdd6021288> in <module>
----> 1 estimator.fit({"training": training_input_path})
/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name, experiment_config)
678 self._prepare_for_training(job_name=job_name)
679
--> 680 self.latest_training_job = _TrainingJob.start_new(self, inputs, experiment_config)
681 self.jobs.append(self.latest_training_job)
682 if wait:
/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in start_new(cls, estimator, inputs, experiment_config)
1450 """
1451 train_args = cls._get_train_args(estimator, inputs, experiment_config)
-> 1452 estimator.sagemaker_session.train(**train_args)
1453
1454 return cls(estimator.sagemaker_session, estimator._current_job_name)
/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in train(self, input_mode, input_config, role, job_name, output_config, resource_config, vpc_config, hyperparameters, stop_condition, tags, metric_definitions, enable_network_isolation, image_uri, algorithm_arn, encrypt_inter_container_traffic, use_spot_instances, checkpoint_s3_uri, checkpoint_local_path, experiment_config, debugger_rule_configs, debugger_hook_config, tensorboard_output_config, enable_sagemaker_metrics, profiler_rule_configs, profiler_config, environment, retry_strategy)
572 LOGGER.info("Creating training-job with name: %s", job_name)
573 LOGGER.debug("train request: %s", json.dumps(train_request, indent=4))
--> 574 self.sagemaker_client.create_training_job(**train_request)
575
576 def _get_train_request( # noqa: C901
/opt/conda/lib/python3.7/site-packages/sagemaker/local/local_session.py in create_training_job(self, TrainingJobName, AlgorithmSpecification, OutputDataConfig, ResourceConfig, InputDataConfig, **kwargs)
184 hyperparameters = kwargs["HyperParameters"] if "HyperParameters" in kwargs else {}
185 logger.info("Starting training job")
--> 186 training_job.start(InputDataConfig, OutputDataConfig, hyperparameters, TrainingJobName)
187
188 LocalSagemakerClient._training_jobs[TrainingJobName] = training_job
/opt/conda/lib/python3.7/site-packages/sagemaker/local/entities.py in start(self, input_data_config, output_data_config, hyperparameters, job_name)
219
220 self.model_artifacts = self.container.train(
--> 221 input_data_config, output_data_config, hyperparameters, job_name
222 )
223 self.end_time = datetime.datetime.now()
/opt/conda/lib/python3.7/site-packages/sagemaker/local/image.py in train(self, input_data_config, output_data_config, hyperparameters, job_name)
200 data_dir = self._create_tmp_folder()
201 volumes = self._prepare_training_volumes(
--> 202 data_dir, input_data_config, output_data_config, hyperparameters
203 )
204 # If local, source directory needs to be updated to mounted /opt/ml/code path
/opt/conda/lib/python3.7/site-packages/sagemaker/local/image.py in _prepare_training_volumes(self, data_dir, input_data_config, output_data_config, hyperparameters)
487 os.mkdir(channel_dir)
488
--> 489 data_source = sagemaker.local.data.get_data_source_instance(uri, self.sagemaker_session)
490 volumes.append(_Volume(data_source.get_root_dir(), channel=channel_name))
491
/opt/conda/lib/python3.7/site-packages/sagemaker/local/data.py in get_data_source_instance(data_source, sagemaker_session)
52 return LocalFileDataSource(parsed_uri.netloc + parsed_uri.path)
53 if parsed_uri.scheme == "s3":
---> 54 return S3DataSource(parsed_uri.netloc, parsed_uri.path, sagemaker_session)
55 raise ValueError(
56 "data_source must be either file or s3. parsed_uri.scheme: {}".format(parsed_uri.scheme)
/opt/conda/lib/python3.7/site-packages/sagemaker/local/data.py in __init__(self, bucket, prefix, sagemaker_session)
183 working_dir = "/private{}".format(working_dir)
184
--> 185 sagemaker.utils.download_folder(bucket, prefix, working_dir, sagemaker_session)
186 self.files = LocalFileDataSource(working_dir)
187
/opt/conda/lib/python3.7/site-packages/sagemaker/utils.py in download_folder(bucket_name, prefix, target, sagemaker_session)
286 raise
287
--> 288 _download_files_under_prefix(bucket_name, prefix, target, s3)
289
290
/opt/conda/lib/python3.7/site-packages/sagemaker/utils.py in _download_files_under_prefix(bucket_name, prefix, target, s3)
314 if exc.errno != errno.EEXIST:
315 raise
--> 316 obj.download_file(file_path)
317
318
/opt/conda/lib/python3.7/site-packages/boto3/s3/inject.py in object_download_file(self, Filename, ExtraArgs, Callback, Config)
313 return self.meta.client.download_file(
314 Bucket=self.bucket_name, Key=self.key, Filename=Filename,
--> 315 ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
316
317
/opt/conda/lib/python3.7/site-packages/boto3/s3/inject.py in download_file(self, Bucket, Key, Filename, ExtraArgs, Callback, Config)
171 return transfer.download_file(
172 bucket=Bucket, key=Key, filename=Filename,
--> 173 extra_args=ExtraArgs, callback=Callback)
174
175
/opt/conda/lib/python3.7/site-packages/boto3/s3/transfer.py in download_file(self, bucket, key, filename, extra_args, callback)
305 bucket, key, filename, extra_args, subscribers)
306 try:
--> 307 future.result()
308 # This is for backwards compatibility where when retries are
309 # exceeded we need to throw the same error from boto3 instead of
/opt/conda/lib/python3.7/site-packages/s3transfer/futures.py in result(self)
107 except KeyboardInterrupt as e:
108 self.cancel()
--> 109 raise e
110
111 def cancel(self):
/opt/conda/lib/python3.7/site-packages/s3transfer/futures.py in result(self)
104 # however if a KeyboardInterrupt is raised we want want to exit
105 # out of this and propogate the exception.
--> 106 return self._coordinator.result()
107 except KeyboardInterrupt as e:
108 self.cancel()
/opt/conda/lib/python3.7/site-packages/s3transfer/futures.py in result(self)
258 # possible value integer value, which is on the scale of billions of
259 # years...
--> 260 self._done_event.wait(MAXINT)
261
262 # Once done waiting, raise an exception if present or return the
/opt/conda/lib/python3.7/threading.py in wait(self, timeout)
550 signaled = self._flag
551 if not signaled:
--> 552 signaled = self._cond.wait(timeout)
553 return signaled
554
/opt/conda/lib/python3.7/threading.py in wait(self, timeout)
294 try: # restore state no matter what (e.g., KeyboardInterrupt)
295 if timeout is None:
--> 296 waiter.acquire()
297 gotit = True
298 else:
KeyboardInterrupt:
SageMaker Studio does not natively support local mode. Studio apps are themselves Docker containers and would therefore require privileged access to be able to build and run Docker containers.
As an alternative solution, you can create a remote Docker host on an EC2 instance and set up Docker on your Studio app. There is quite a bit of networking and package installation involved, but the solution will enable you to use full Docker functionality. Additionally, as of version 2.80.0 of the SageMaker Python SDK, local mode is supported when you are using a remote Docker host.
sdocker, the SageMaker Studio Docker CLI extension (see this repo), can simplify deploying the above solution in two simple steps (it only works for a Studio Domain in VPCOnly mode), and it has an easy-to-follow example here.
UPDATE:
There is now a UI extension (see repo) which can make the experience much smoother and easier to manage.
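If you do wire up a remote Docker host by hand, a minimal sketch of pointing the notebook at it could look like the following; the address is a placeholder, and DOCKER_HOST is the standard Docker client environment variable rather than a SageMaker-specific API:
import os
import sagemaker

# SageMaker local mode shells out to the Docker client, which honours
# DOCKER_HOST. The address below is a placeholder; in practice the connection
# is usually tunnelled over SSH rather than exposed as plain TCP.
os.environ["DOCKER_HOST"] = "tcp://<remote-docker-host>:2375"

sagemaker_session = sagemaker.LocalSession()
sagemaker_session.config = {"local": {"local_code": True}}
# ...then create and fit the PyTorch estimator exactly as in the question.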

UnicodeDecodeError: 'utf-8' codec can't decode byte Error in script

import sys
sys.path.append(r'E:\MLDS\resparser')  # raw string so the backslashes are not treated as escape sequences

from resume_parser import ResumeParser

data = ResumeParser(filename).get_extracted_data()
This threw the following output, and I am not able to understand what is causing it, because the same script runs smoothly on a different computer with a similar installation.
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-52-d48f50eae8f2> in <module>
4 from resume_parser import ResumeParser
5
----> 6 data = ResumeParser(filename).get_extracted_data()
E:\MLDS\resparser\resume_parser.py in __init__(self, resume, skills_file, custom_regex)
13 def __init__(self, resume, skills_file=None, custom_regex=None):
14 nlp = spacy.load("en_core_web_sm")
---> 15 custom_nlp = spacy.load(os.path.dirname(os.path.abspath(__file__)))
16 self.__skills_file = skills_file
17 self.__custom_regex = custom_regex
C:\Miniconda-38\envs\env3.8\lib\site-packages\spacy\__init__.py in load(name, **overrides)
28 if depr_path not in (True, False, None):
29 warnings.warn(Warnings.W001.format(path=depr_path), DeprecationWarning)
---> 30 return util.load_model(name, **overrides)
31
32
C:\Miniconda-38\envs\env3.8\lib\site-packages\spacy\util.py in load_model(name, **overrides)
170 return load_model_from_package(name, **overrides)
171 if Path(name).exists(): # path to model data directory
--> 172 return load_model_from_path(Path(name), **overrides)
173 elif hasattr(name, "exists"): # Path or Path-like to model data
174 return load_model_from_path(name, **overrides)
C:\Miniconda-38\envs\env3.8\lib\site-packages\spacy\util.py in load_model_from_path(model_path, meta, **overrides)
220 component = nlp.create_pipe(factory, config=config)
221 nlp.add_pipe(component, name=name)
--> 222 return nlp.from_disk(model_path, exclude=disable)
223
224
C:\Miniconda-38\envs\env3.8\lib\site-packages\spacy\language.py in from_disk(self, path, exclude, disable)
972 # Convert to list here in case exclude is (default) tuple
973 exclude = list(exclude) + ["vocab"]
--> 974 util.from_disk(path, deserializers, exclude)
975 self._path = path
976 return self
C:\Miniconda-38\envs\env3.8\lib\site-packages\spacy\util.py in from_disk(path, readers, exclude)
688 # Split to support file names like meta.json
689 if key.split(".")[0] not in exclude:
--> 690 reader(path / key)
691 return path
692
C:\Miniconda-38\envs\env3.8\lib\site-packages\spacy\language.py in <lambda>(p)
958 deserializers["meta.json"] = deserialize_meta
959 deserializers["vocab"] = deserialize_vocab
--> 960 deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(
961 p, exclude=["vocab"]
962 )
tokenizer.pyx in spacy.tokenizer.Tokenizer.from_disk()
tokenizer.pyx in spacy.tokenizer.Tokenizer.from_bytes()
C:\Miniconda-38\envs\env3.8\lib\site-packages\spacy\util.py in from_bytes(bytes_data, setters, exclude)
664
665 def from_bytes(bytes_data, setters, exclude):
--> 666 msg = srsly.msgpack_loads(bytes_data)
667 for key, setter in setters.items():
668 # Split to support file names like meta.json
C:\Miniconda-38\envs\env3.8\lib\site-packages\srsly\_msgpack_api.py in msgpack_loads(data, use_list)
27 # msgpack-python docs suggest disabling gc before unpacking large messages
28 gc.disable()
---> 29 msg = msgpack.loads(data, raw=False, use_list=use_list)
30 gc.enable()
31 return msg
C:\Miniconda-38\envs\env3.8\lib\site-packages\srsly\msgpack\__init__.py in unpackb(packed, **kwargs)
58 object_hook = kwargs.get('object_hook')
59 kwargs['object_hook'] = functools.partial(_decode_numpy, chain=object_hook)
---> 60 return _unpackb(packed, **kwargs)
61
62
C:\Miniconda-38\envs\env3.8\lib\site-packages\srsly\msgpack\_unpacker.pyx in srsly.msgpack._unpacker.unpackb()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte
I tried changing the spaCy model to _lg, but to no avail. I also ran python -m spacy validate to check the models, and the output was the following:
TYPE NAME MODEL VERSION
package en-core-web-sm en_core_web_sm 2.3.1 ✔
package en-core-web-lg en_core_web_lg 2.3.1 ✔
I would appreciate any help that could point me in the right direction.

Unable to load english in spacy in windows

I have successfully installed spaCy on Windows, but when I load the model in a Jupyter notebook I get the error
ValueError: could not broadcast input array from shape (96) into shape (128)
I checked the validation of the packages in the Jupyter terminal as follows:
python -m spacy validate
✔ Loaded compatibility table
====================== Installed models (spaCy v2.1.4) ======================
ℹ spaCy installation:
C:\Users\iKhan\AppData\Roaming\Python\Python36\site-packages\spacy
TYPE NAME MODEL VERSION
package en-core-web-sm en_core_web_sm 2.1.0 ✔
package en-core-web-md en_core_web_md 2.1.0 ✔
package en-core-web-lg en_core_web_lg 2.1.0 ✔
I try to import it in the Jupyter notebook:
import spacy
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])
Error
ValueError: could not broadcast input array from shape (96) into shape (128)
The following is the complete traceback of the error:
ValueError Traceback (most recent call last)
<ipython-input-91-2ffa9ed657fb> in <module>
7 import en_core_web_sm
8 #nlp = en_core_web_sm.load()
----> 9 nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])
~\Anaconda3\envs\nlp_course\lib\site-packages\spacy\__init__.py in load(name, **overrides)
19 if depr_path not in (True, False, None):
20 deprecation_warning(Warnings.W001.format(path=depr_path))
---> 21 return util.load_model(name, **overrides)
22
23
~\Anaconda3\envs\nlp_course\lib\site-packages\spacy\util.py in load_model(name, **overrides)
112 return load_model_from_link(name, **overrides)
113 if is_package(name): # installed as package
--> 114 return load_model_from_package(name, **overrides)
115 if Path(name).exists(): # path to model data directory
116 return load_model_from_path(Path(name), **overrides)
~\Anaconda3\envs\nlp_course\lib\site-packages\spacy\util.py in load_model_from_package(name, **overrides)
133 """Load a model from an installed package."""
134 cls = importlib.import_module(name)
--> 135 return cls.load(**overrides)
136
137
~\Anaconda3\envs\nlp_course\lib\site-packages\en_core_web_sm\__init__.py in load(**overrides)
10
11 def load(**overrides):
---> 12 return load_model_from_init_py(__file__, **overrides)
~\Anaconda3\envs\nlp_course\lib\site-packages\spacy\util.py in load_model_from_init_py(init_file, **overrides)
171 if not model_path.exists():
172 raise IOError(Errors.E052.format(path=path2str(data_path)))
--> 173 return load_model_from_path(data_path, meta, **overrides)
174
175
~\Anaconda3\envs\nlp_course\lib\site-packages\spacy\util.py in load_model_from_path(model_path, meta, **overrides)
154 component = nlp.create_pipe(name, config=config)
155 nlp.add_pipe(component, name=name)
--> 156 return nlp.from_disk(model_path)
157
158
~\Anaconda3\envs\nlp_course\lib\site-packages\spacy\language.py in from_disk(self, path, disable)
645 if not (path / 'vocab').exists():
646 exclude['vocab'] = True
--> 647 util.from_disk(path, deserializers, exclude)
648 self._path = path
649 return self
~\Anaconda3\envs\nlp_course\lib\site-packages\spacy\util.py in from_disk(path, readers, exclude)
509 for key, reader in readers.items():
510 if key not in exclude:
--> 511 reader(path / key)
512 return path
513
~\Anaconda3\envs\nlp_course\lib\site-packages\spacy\language.py in <lambda>(p, proc)
641 if not hasattr(proc, 'to_disk'):
642 continue
--> 643 deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
644 exclude = {p: False for p in disable}
645 if not (path / 'vocab').exists():
pipeline.pyx in spacy.pipeline.Tagger.from_disk()
~\Anaconda3\envs\nlp_course\lib\site-packages\spacy\util.py in from_disk(path, readers, exclude)
509 for key, reader in readers.items():
510 if key not in exclude:
--> 511 reader(path / key)
512 return path
513
pipeline.pyx in spacy.pipeline.Tagger.from_disk.load_model()
pipeline.pyx in spacy.pipeline.Tagger.from_disk.load_model()
~\Anaconda3\envs\nlp_course\lib\site-packages\thinc\neural\_classes\model.py in from_bytes(self, bytes_data)
350 def from_bytes(self, bytes_data):
351 data = srsly.msgpack_loads(bytes_data)
--> 352 weights = data[b"weights"]
353 queue = [self]
354 i = 0
~\Anaconda3\envs\nlp_course\lib\site-packages\thinc\neural\util.py in copy_array(dst, src, casting, where)
68
69 def require_gpu():
---> 70 from ._classes.model import Model
71 from .ops import CupyOps
72
ValueError: could not broadcast input array from shape (96) into shape (128)

pytorch RuntimeError: CUDA error: device-side assert triggered

I have a notebook on Google Colab that fails with the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
93 exception = e
---> 94 raise e
95 finally: cb_handler.on_train_end(exception)
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
83 xb, yb = cb_handler.on_batch_begin(xb, yb)
---> 84 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)
85 if cb_handler.on_batch_end(loss): break
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
24 if opt is not None:
---> 25 loss = cb_handler.on_backward_begin(loss)
26 loss.backward()
/usr/local/lib/python3.6/dist-packages/fastai/callback.py in on_backward_begin(self, loss)
223 for cb in self.callbacks:
--> 224 a = cb.on_backward_begin(**self.state_dict)
225 if a is not None: self.state_dict['last_loss'] = a
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in on_backward_begin(self, smooth_loss, **kwargs)
266 if self.pbar is not None and hasattr(self.pbar,'child'):
--> 267 self.pbar.child.comment = f'{smooth_loss:.4f}'
268
/usr/local/lib/python3.6/dist-packages/torch/tensor.py in __format__(self, format_spec)
377 if self.dim() == 0:
--> 378 return self.item().__format__(format_spec)
379 return object.__format__(self, format_spec)
RuntimeError: CUDA error: device-side assert triggered
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
<ipython-input-33-dd390b1c8108> in <module>()
----> 1 lr_find(learn)
2 learn.recorder.plot()
/usr/local/lib/python3.6/dist-packages/fastai/train.py in lr_find(learn, start_lr, end_lr, num_it, stop_div, **kwargs)
26 cb = LRFinder(learn, start_lr, end_lr, num_it, stop_div)
27 a = int(np.ceil(num_it/len(learn.data.train_dl)))
---> 28 learn.fit(a, start_lr, callbacks=[cb], **kwargs)
29
30 def to_fp16(learn:Learner, loss_scale:float=512., flat_master:bool=False)->Learner:
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
160 callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
161 fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 162 callbacks=self.callbacks+callbacks)
163
164 def create_opt(self, lr:Floats, wd:Floats=0.)->None:
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
93 exception = e
94 raise e
---> 95 finally: cb_handler.on_train_end(exception)
96
97 loss_func_name2activ = {'cross_entropy_loss': partial(F.softmax, dim=1), 'nll_loss': torch.exp, 'poisson_nll_loss': torch.exp,
/usr/local/lib/python3.6/dist-packages/fastai/callback.py in on_train_end(self, exception)
254 def on_train_end(self, exception:Union[bool,Exception])->None:
255 "Handle end of training, `exception` is an `Exception` or False if no exceptions during training."
--> 256 self('train_end', exception=exception)
257
258 class AverageMetric(Callback):
/usr/local/lib/python3.6/dist-packages/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
185 "Call through to all of the `CallbakHandler` functions."
186 if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
--> 187 return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
188
189 def on_train_begin(self, epochs:int, pbar:PBar, metrics:MetricFuncList)->None:
/usr/local/lib/python3.6/dist-packages/fastai/callback.py in <listcomp>(.0)
185 "Call through to all of the `CallbakHandler` functions."
186 if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
--> 187 return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
188
189 def on_train_begin(self, epochs:int, pbar:PBar, metrics:MetricFuncList)->None:
/usr/local/lib/python3.6/dist-packages/fastai/callbacks/lr_finder.py in on_train_end(self, **kwargs)
45 # restore the valid_dl we turned of on `__init__`
46 self.data.valid_dl = self.valid_dl
---> 47 self.learn.load('tmp')
48 if hasattr(self.learn.model, 'reset'): self.learn.model.reset()
49 print('LR Finder complete, type {learner_name}.recorder.plot() to see the graph.')
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in load(self, name, device)
202 "Load model `name` from `self.model_dir` using `device`, defaulting to `self.data.device`."
203 if device is None: device = self.data.device
--> 204 self.model.load_state_dict(torch.load(self.path/self.model_dir/f'{name}.pth', map_location=device))
205 return self
206
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in load(f, map_location, pickle_module)
356 f = open(f, 'rb')
357 try:
--> 358 return _load(f, map_location, pickle_module)
359 finally:
360 if new_fd:
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _load(f, map_location, pickle_module)
527 unpickler = pickle_module.Unpickler(f)
528 unpickler.persistent_load = persistent_load
--> 529 result = unpickler.load()
530
531 deserialized_storage_keys = pickle_module.load(f)
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in persistent_load(saved_id)
493 if root_key not in deserialized_objects:
494 deserialized_objects[root_key] = restore_location(
--> 495 data_type(size), location)
496 storage = deserialized_objects[root_key]
497 if view_metadata is not None:
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in restore_location(storage, location)
376 elif isinstance(map_location, torch.device):
377 def restore_location(storage, location):
--> 378 return default_restore_location(storage, str(map_location))
379 else:
380 def restore_location(storage, location):
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in default_restore_location(storage, location)
102 def default_restore_location(storage, location):
103 for _, _, fn in _package_registry:
--> 104 result = fn(storage, location)
105 if result is not None:
106 return result
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _cuda_deserialize(obj, location)
84 'to an existing device.'.format(
85 device, torch.cuda.device_count()))
---> 86 return obj.cuda(device)
87
88
/usr/local/lib/python3.6/dist-packages/torch/_utils.py in _cuda(self, device, non_blocking, **kwargs)
74 else:
75 new_type = getattr(torch.cuda, self.__class__.__name__)
---> 76 return new_type(self.size()).copy_(self, non_blocking)
77
78
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorCopy.cpp:20
There is no information about the real cause. I tried to get a proper stack trace by forcing CUDA launches to run synchronously (as suggested here), using a cell like this:
!export CUDA_LAUNCH_BLOCKING=1
But this does not seem to work; I still get the same error.
Is there another way that works with Google Colab?
Be sure that your target values run from zero to the number of classes minus 1. For example, if you have 100 classes, your targets should be 0 to 99.
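A quick sketch of how to check and remap the labels (generic PyTorch, not fastai-specific; the labels tensor is just an example):
import torch

labels = torch.tensor([3, 7, 12, 99, 12])  # example raw targets

# Targets must lie in [0, num_classes - 1]; check the range first.
print(labels.min().item(), labels.max().item())

# If they don't (e.g. ids 1..100 or arbitrary codes), remap them to 0..N-1.
classes = labels.unique(sorted=True)
id_to_index = {c.item(): i for i, c in enumerate(classes)}
labels = torch.tensor([id_to_index[int(x)] for x in labels])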
!export FOO=blah is usually not useful to run in a notebook because ! means run the following command in a sub-shell, so the effect of the statement is gone by the time the ! returns.
You might have more success by storing your python code in a file and then executing that file in a subshell:
In one cell:
%%writefile foo.py
[...your code...]
In the next cell:
!export CUDA_LAUNCH_BLOCKING=1; python3 foo.py
(or s/python3/python2/ if you're writing py2)
Switch the Hardware Accelerator type to "None" under Runtime -> Change runtime type. This should give you a more meaningful error message.
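To see why this helps, here is a small sketch in plain PyTorch (not the asker's fastai code): the same out-of-range target that only produces a vague device-side assert on the GPU fails with an explicit out-of-bounds error on the CPU.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)            # 10 classes
targets = torch.tensor([0, 3, 9, 10])  # 10 is out of range for 10 classes

# On CPU this raises a clear "Target 10 is out of bounds" error instead of
# the opaque CUDA device-side assert.
F.cross_entropy(logits, targets)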
The proper way to set environment variables in Google Colab is to use the os module:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
Using the os module will allow you to set whatever environment variables you need. Setting CUDA_LAUNCH_BLOCKING this way enables proper CUDA tracebacks in Google Colab.
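One caveat (an assumption about ordering rather than something stated above): the variable should be in place before CUDA initializes, so set it at the very top of the notebook, before importing torch or running any GPU code:
import os
# Set this before any CUDA work happens (safest: the very first cell),
# otherwise the setting may not take effect for the current runtime.
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

import torch  # import torch only after the variable is set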
