Error when loading a custom model with spacy - python

I'm trying to load a custom model called 'ru2' into spacy (for npl processing).
it can be found there: https://github.com/buriy/spacy-ru
The problem is when I call the function
nlp = spacy.load('ru2')
doc = nlp(text)
I see the error
C:\ProgramData\Anaconda3\lib\importlib\_bootstrap.py:205: RuntimeWarning: spacy.tokens.span.Span size changed, may indicate binary incompatibility. Expected 72 from C header, got 80 from PyObject
return f(*args, **kwds)
Traceback (most recent call last):
File "C://.../nlp/src/ie/main.py", line 125, in <module>
main(examp_dict['Poroshenko'])
File "C://.../nlp/src/ie/main.py", line 92, in main
nlp = spacy.load('ru2')
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\__init__.py", line 27, in load
return util.load_model(name, **overrides)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 133, in load_model
return load_model_from_path(Path(name), **overrides)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 173, in load_model_from_path
return nlp.from_disk(model_path)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 791, in from_disk
util.from_disk(path, deserializers, exclude)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 630, in from_disk
reader(path / key)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 781, in <lambda>
deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(p, exclude=["vocab"])
File "tokenizer.pyx", line 391, in spacy.tokenizer.Tokenizer.from_disk
File "tokenizer.pyx", line 432, in spacy.tokenizer.Tokenizer.from_bytes
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 606, in from_bytes
msg = srsly.msgpack_loads(bytes_data)
File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\_msgpack_api.py", line 29, in msgpack_loads
msg = msgpack.loads(data, raw=False, use_list=use_list)
File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\msgpack\__init__.py", line 60, in unpackb
return _unpackb(packed, **kwargs)
File "_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
TypeError: unhashable type: 'list'
I was searching for similar questions in the Internet:
https://github.com/explosion/spaCy/issues/2715
https://spacy.io/usage#unhashable-list
But non of those solutions work for me.
I use
msgpack==0.5.6 (even downgraded as suggested in the link above)
spacy==2.1.4

Here is from https://spacy.io/usage#troubleshooting
If you’re training models, writing them to disk, and versioning them with git, you might encounter this error when trying to load them in a Windows environment. This happens because a default install of Git for Windows is configured to automatically convert Unix-style end-of-line characters (LF) to Windows-style ones (CRLF) during file checkout (and the reverse when committing). While that’s mostly fine for text files, a trained model written to disk has some binary files that should not go through this conversion. When they do, you get the error above. You can fix it by either changing your core.autocrlf setting to "false", or by committing a .gitattributes file] to your repository to tell git on which files or folders it shouldn’t do LF-to-CRLF conversion, with an entry like path/to/spacy/model/** -text. After you’ve done either of these, clone your repository again.

It might be because the version number of SpaCy used to generate your model is not the same as the version of SpaCy you have installed. (I don't know of course, just mentioning it in case it helps.)

Adding to the answer above, another quick fix would be to manually download the zip from the repository.

Related

Error while running haystack models in CPU [duplicate]

2023-01-25 08:21:21,659 - ERROR - Traceback (most recent call last):
File "/home/xyzUser/project/queue_handler/document_queue_listner.py", line 148, in __process_and_acknowledge
pipeline_result = self.__process_document_type(message, pipeline_input)
File "/home/xyzUser/project/queue_handler/document_queue_listner.py", line 194, in __process_document_type
pipeline_result = bill_parser_pipeline.process(pipeline_input)
File "/home/xyzUser/project/main/billparser/__init__.py", line 18, in process
bill_extractor_model = MachineGeneratedBillExtractorModel()
File "/home/xyzUser/project/main/billparser/models/qa_model.py", line 25, in __new__
cls.__model = TransformersReader(model_name_or_path=cls.__model_path, use_gpu=False)
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/haystack/nodes/base.py", line 48, in wrapper_exportable_to_yaml
init_func(self, *args, **kwargs)
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/haystack/nodes/reader/transformers.py", line 93, in __init__
self.model = pipeline(
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/transformers/pipelines/__init__.py", line 542, in pipeline
return task_class(model=model, framework=framework, task=task, **kwargs)
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/transformers/pipelines/question_answering.py", line 125, in __init__
super().__init__(
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/transformers/pipelines/base.py", line 691, in __init__
self.device = device if framework == "tf" else torch.device("cpu" if device < 0 else f"cuda:{device}")
TypeError: '<' not supported between instances of 'torch.device' and 'int'
This is the error message i got after installing a requirement.txt file from my project. I think it is related to torch but also dont know how to fix it. I am new to hugging face transformers and dont know if it is a version issue.
This was a bug with the transformers package for a number of versions prior to v4.22.0, given that particular line of code does not discern between the type of the device argument could be a torch.device before comparing that with an int. Tracing through git blame, we can find that this specific change made in changeset 9d4a45509ab include the much needed if isinstance(device, torch.device): provided by line 764 in the resulting file, which will ensure this error won't happen. Checking the tags above will show that the release for v4.22.0 and after should include this particular fix. As a refresher, to update a specific package, activate the environment, and issue the following:
pip install -U transformers
Alternatively with a specific version, e.g.:
pip install -U transformers==4.22.0

Parse postgresql -pycparser.plyparser.ParseError before: pgwin32_signal_event

I need to parse an open-source project Postgresql using pycparser.
While parsing its source-code the following error arises:
Traceback (most recent call last):
File "examples\using_cpp_libc.py", line 48, in <module>
getAllFiles(projectName)
File "examples\using_cpp_libc.py", line 29, in getAllFiles
ast = parse_file(dirName+'\\'+fname, use_cpp = True, cpp_path = 'cpp',
cpp_args = [r'-nostdinc',r'-Iutils/fake_libc_include',r'-
Iprojects/postgresql/src/include'])
File "G:\python\pycparser-master\pycparser\__init__.py", line 92, in
parse_file
return parser.parse(text, filename)
File "G:\python\pycparser-master\pycparser\c_parser.py", line 152, in parse
debug=debuglevel)
File "G:\python\pycparser-master\pycparser\ply\yacc.py", line 334, in parse
return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
File "G:\python\pycparser-master\pycparser\ply\yacc.py", line 1204, in
parseopt_notrack
tok = call_errorfunc(self.errorfunc, errtoken, self)
File "G:\python\pycparser-master\pycparser\ply\yacc.py", line 193, in
call_errorfunc
r = errorfunc(token)
File "G:\python\pycparser-master\pycparser\c_parser.py", line 1838, in
p_error
column=self.clex.find_tok_column(p)))
File "G:\python\pycparser-master\pycparser\plyparser.py", line 67, in
_parse_error
raise ParseError("%s: %s" % (coord, msg))
pycparser.plyparser.ParseError:
projects/postgresql/src/include/pg_config_os.h:366:15: before:
pgwin32_signal_event
I am using postgresql-9.6.9, build it using visual studio express 2017 on windows 10 (64-bit)
The blog post you quoted in the comment is the canonical resource. Parsing large C projects is not easy - they have their own quirks - so it takes work. I doubt it's resolvable within the confines of a Stack Overflow question.
You need to start tackling the issues one by one - for example look at the pgwin32_signal_event token in pg_config_os.h - why can't it be parsed? Perhaps its type is unparsable? Was it defined? Could it be added to a "fake" header, etc. Unfortunately, there's no easy way to do this except working through the issues one by one.
Be sure to preprocess the file you're parsing first, dumping the full preprocessed version into a single .c file - this gets all the types into a single file you can work with.

Google App Engine dev_appserver.py: watcher_ignore_re flag "is not JSON serializable"

Why I run the dev_appserver.py with the option watcher_ignore_re, I get an error message that the regex is not JSON serializable.
Is this a bug with the development server? Am I using this command improperly? The command and callstack is printed below.
C:\Users\mes65\Documents\MyProject>"C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin\dev_appserver.py" ^
--watcher_ignore_re="(.*\.git|.*\.idea|tmp\.py)" ^
"C:\Users\mes65\Documents\MyProject"
WARNING 2018-06-06 09:28:59,161 appinfo.py:1622] lxml version "2.3" is deprecated, use one of: "3.7.3"
INFO 2018-06-06 09:28:59,187 devappserver2.py:120] Skipping SDK update check.
Traceback (most recent call last):
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\dev_appserver.py", line 96, in <module>
_run_file(__file__, globals())
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\dev_appserver.py", line 90, in _run_file
execfile(_PATHS.script_file(script_name), globals_)
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 454, in <module>
main()
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 442, in main
dev_server.start(options)
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 163, in start
bool(ssl_certificate_paths), options)
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\metrics.py", line 166, in Start
self._cmd_args = json.dumps(vars(cmd_args)) if cmd_args else None
File "C:\Python27\lib\json\__init__.py", line 244, in dumps
return _default_encoder.encode(obj)
File "C:\Python27\lib\json\encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Python27\lib\json\encoder.py", line 270, in iterencode
return _iterencode(o, 0)
File "C:\Python27\lib\json\encoder.py", line 184, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <_sre.SRE_Pattern object at 0x00000000063C2188> is not JSON serializable
It looks like it is an issue with the google analytics code built into dev_appserver2 (google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\devappserver2.py on or around line 316). It wants to send all of your command line options to google analytics. If you remove the analytics client id by adding the command line option --google_analytics_client_id= (note: '=' without any following value) the appserver won't call the google analytics code where it is trying to JSON serialize an SRE object and failing. However, since you are on Windows I find that the --watcher_ignore_re does not work anyway even when you get past this issue.
There is a comment in file_watcher.py
TODO: b/33178251 - Add watcher_ignore_re support for windows.
I also faced with this usability problem on Windows and was really disappointed. I tried to find some workarounds but I hadn't found any appropriate way.
In the end, I decided to make my own implementation of support watcher_ignore_re for Windows. I put required changes in my Github repo
If describe them in several words:
Add _watcher_ignore_re, _skip_files_re properties and its setters
Add import statement from google.appengine.tools.devappserver2 import watcher_common and use it in newly created def _path_ginored
Filter additional_changes before adding them to watcher changed files
For resolving mentioned problem with not serializable regex attribute we should drop them from the serialized dictionary. Fix for this is added as consequent commit and can be checked at metrics.py:185-193.
I hope it helps other guys enjoy developing on GAE on Windows :)

How can I compile Bootstrap 4 scss with python webassets?

I'm trying to simply compile Bootstrap 4 with python webassets and having zero success. For now I'm just trying to do this within the bootstrap/scss directory so path issues are less of a big deal. Within this directory I have added a main.scss file with one line:
#import "bootstrap.scss";
I have a script called test_scss.py that looks like this:
from webassets import Bundle, Environment
my_env = Environment(directory='.', url='/')
css = Bundle('main.scss', filters='scss', output='all.css')
my_env.register('css_all', css)
print(my_env['css_all'].urls())
When I run this command, I get an error trace like this:
Traceback (most recent call last):
File "./test_scss.py", line 11, in <module>
print(my_env['css_all'].urls())
File "/Users/benlindsay/miniconda/lib/python3.6/site-packages/webassets/bundle.py", line 806, in urls
urls.extend(bundle._urls(new_ctx, extra_filters, *args, **kwargs))
File "/Users/benlindsay/miniconda/lib/python3.6/site-packages/webassets/bundle.py", line 765, in _urls
*args, **kwargs)
File "/Users/benlindsay/miniconda/lib/python3.6/site-packages/webassets/bundle.py", line 619, in _build
force, disable_cache=disable_cache, extra_filters=extra_filters)
File "/Users/benlindsay/miniconda/lib/python3.6/site-packages/webassets/bundle.py", line 543, in _merge_and_apply
kwargs=item_data)
File "/Users/benlindsay/miniconda/lib/python3.6/site-packages/webassets/merge.py", line 276, in apply
return self._wrap_cache(key, func)
File "/Users/benlindsay/miniconda/lib/python3.6/site-packages/webassets/merge.py", line 218, in _wrap_cache
content = func().getvalue()
File "/Users/benlindsay/miniconda/lib/python3.6/site-packages/webassets/merge.py", line 251, in func
getattr(filter, type)(data, out, **kwargs_final)
File "/Users/benlindsay/miniconda/lib/python3.6/site-packages/webassets/filter/sass.py", line 196, in input
self._apply_sass(_in, out, os.path.dirname(source_path))
File "/Users/benlindsay/miniconda/lib/python3.6/site-packages/webassets/filter/sass.py", line 190, in _apply_sass
return self.subprocess(args, out, _in, cwd=child_cwd)
File "/Users/benlindsay/miniconda/lib/python3.6/site-packages/webassets/filter/__init__.py", line 527, in subprocess
proc.returncode, stdout, stderr))
webassets.exceptions.FilterError: scss: subprocess returned a non-success result code: 65, stdout=b'',
stderr=b'DEPRECATION WARNING: Importing from the current working directory will
not be automatic in future versions of Sass. To avoid future errors, you can add it
to your environment explicitly by setting `SASS_PATH=.`, by using the -I command
line option, or by changing your Sass configuration options.
Error: Invalid CSS after "...lor}: #{$value}": expected "{", was ";"
on line 4 of /Users/benlindsay/scratch/python/webassets/test-2/bootstrap/scss/_root.scss
from line 11 of /Users/benlindsay/scratch/python/webassets/test-2/bootstrap/scss/bootstrap.scss
from line 1 of standard input
Use --trace for backtrace.
If I follow the instructions and set environment variable SASS_PATH=., that gets rid of that part of the error message, but I still get the error
Error: Invalid CSS after "...lor}: #{$value}": expected "{", was ";"
on line 4 of /Users/benlindsay/scratch/python/webassets/test-2/bootstrap/scss/_root.scss
from line 11 of /Users/benlindsay/scratch/python/webassets/test-2/bootstrap/scss/bootstrap.scss
from line 1 of standard input
Use --trace for backtrace.
I don't know SCSS syntax well yet, but I'd bet a lot of money this is me doing something wrong and not an error in the Bootstrap SCSS. Any thoughts of what I'm doing wrong would be greatly appreciated.
Turns out it actually kind of was a problem on Bootstrap's end. See https://github.com/sass/sass/issues/2383, specifically the quote:
This is a bug in our implementation—the parser shouldn't crash—but those Bootstrap styles aren't valid for Sass 3.5 as written.
Anyway, I just needed to update to the latest version of Ruby Sass (which apparently the webassets module depends on) and that fixed it.

Unable to ping managed nodes using ansible-2.0

I downloaded the ansible-2.0.0-0.2.alpha2.tar.gz and installed it on my control machine. However now I'm not able to ping any of my machines. Previously using v1.9.2 i could communicate with them. Now it gives the following error:
Unexpected Exception: lstat() argument 1 must be encoded string without NULL bytes, not str
the full traceback was:
Traceback (most recent call last):
File "/usr/bin/ansible", line 79, in
sys.exit(cli.run())
File "/usr/lib/python2.6/site-packages/ansible/cli/adhoc.py", line 111, in run
inventory = Inventory(loader=loader, variable_manager=variable_manager, host_list=self.options.inventory)
File "/usr/lib/python2.6/site-packages/ansible/inventory/init.py", line 77, in init
self.parse_inventory(host_list)
File "/usr/lib/python2.6/site-packages/ansible/inventory/init.py", line 133, in parse_inventory
host.vars = combine_vars(host.vars, self.get_host_variables(host.name))
File "/usr/lib/python2.6/site-packages/ansible/inventory/init.py", line 499, in get_host_variables
self.vars_per_host[hostname] = self.get_host_variables(hostname, vault_password=vault_password)
File "/usr/lib/python2.6/site-packages/ansible/inventory/__init.py", line 529, in get_host_variables
vars = combine_vars(vars, self.get_host_vars(host))
File "/usr/lib/python2.6/site-packages/ansible/inventory/__init_.py", line 653, in get_host_vars
return self.get_hostgroup_vars(host=host, group=None, new_pb_basedir=new_pb_basedir)
File "/usr/lib/python2.6/site-packages/ansible/inventory/__init_.py", line 702, in _get_hostgroup_vars
base_path = os.path.realpath(os.path.join(basedir, "host_vars/%s" % host.name))
File "/usr/lib64/python2.6/posixpath.py", line 365, in realpath
if islink(component):
File "/usr/lib64/python2.6/posixpath.py", line 132, in islink
st = os.lstat(path)
TypeError: lstat() argument 1 must be encoded string without NULL bytes, not str
Any help would be appreciated.
This is a known bug due to some Unicode changes made to the playbook parser in 2.0. Several versions of Python shipped with a version of shlex.split() that fails horribly on Unicode input- you likely have one of them installed. The bug has been worked around and will be included in the next drop. See https://github.com/ansible/ansible/issues/12257

Categories