Create binary of spaCy with PyInstaller - python

I would like to create a binary of my Python code, which uses spaCy.
# main.py
import spacy
import en_core_web_sm


def main() -> None:
    nlp = spacy.load("en_core_web_sm")
    # nlp = en_core_web_sm.load()
    doc = nlp("This is an example")
    print([(w.text, w.pos_) for w in doc])


if __name__ == "__main__":
    main()
Besides my code, I created two PyInstaller hooks, as described here.
To create the binary, I use the following command: pyinstaller main.py --additional-hooks-dir=..
When I run the binary, I get the following error message:
Traceback (most recent call last):
File "main.py", line 19, in <module>
main()
File "main.py", line 12, in main
nlp = spacy.load("en_core_web_sm")
File "spacy/__init__.py", line 47, in load
File "spacy/util.py", line 329, in load_model
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
If I use nlp = en_core_web_sm.load() instead of nlp = spacy.load("en_core_web_sm") to load the spaCy model, I get the following error:
Traceback (most recent call last):
File "main.py", line 19, in <module>
main()
File "main.py", line 13, in main
nlp = en_core_web_sm.load()
File "en_core_web_sm/__init__.py", line 10, in load
File "spacy/util.py", line 514, in load_model_from_init_py
File "spacy/util.py", line 389, in load_model_from_path
File "spacy/util.py", line 426, in load_model_from_config
File "spacy/language.py", line 1662, in from_config
File "spacy/language.py", line 768, in add_pipe
File "spacy/language.py", line 659, in create_pipe
File "thinc/config.py", line 722, in resolve
File "thinc/config.py", line 771, in _make
File "thinc/config.py", line 826, in _fill
File "thinc/config.py", line 825, in _fill
File "thinc/config.py", line 1016, in make_promise_schema
File "spacy/util.py", line 137, in get
catalogue.RegistryError: [E893] Could not find function 'spacy.Tok2Vec.v1' in function registry 'architectures'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.

I had the same issue. After the error message you posted above, did you see an "Available names: ..." message? For me, that message showed that spacy.Tok2Vec.v2 was available but v1 was not. I was able to edit the config file for en_core_web_sm (for me at dist\<name>\en_core_web_sm\en_core_web_sm-3.0.0\config.cfg) and change all references from spacy.Tok2Vec.v1 to spacy.Tok2Vec.v2. I also had to do the same for spacy.MaxoutWindowEncoder.v1. It's still a mystery to me why the issue appears only in the PyInstaller distributable and not when I run the uncompiled script.
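The manual config edit described above can be scripted. The sketch below is a hypothetical helper, not part of the answer: it renames only the two registry entries the answer mentions (extend the table if your "Available names" message points at others), keeps a config.cfg.bak backup, and takes the config.cfg path on the command line. The script name in the usage comment is made up.

```python
import shutil
import sys
from pathlib import Path

# Registry names the answer above reports renaming from v1 to v2.
RENAMES = {
    "spacy.Tok2Vec.v1": "spacy.Tok2Vec.v2",
    "spacy.MaxoutWindowEncoder.v1": "spacy.MaxoutWindowEncoder.v2",
}


def patch_config_text(text: str) -> str:
    """Return text with every listed v1 architecture name replaced by v2."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text


def patch_config_file(path: Path) -> None:
    """Rewrite config.cfg in place, keeping the original as config.cfg.bak."""
    shutil.copyfile(path, path.with_name(path.name + ".bak"))
    original = path.read_text(encoding="utf-8")
    path.write_text(patch_config_text(original), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical usage:
    #   python patch_config.py dist\<name>\en_core_web_sm\en_core_web_sm-3.0.0\config.cfg
    for argument in sys.argv[1:]:
        patch_config_file(Path(argument))
```

Plain string replacement is enough here because the architecture names appear verbatim as quoted values in config.cfg.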

I encountered the same issue and solved it by copying the spacy-legacy package into the compiled destination directory.
You could probably also make PyInstaller collect it with a hook, but I have not tried that.
I hope my answer helps.
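Collecting spacy-legacy through a hook, as suggested above, might look like the untested sketch below. Several things are assumptions: the file name hook-spacy.py (it must sit in the directory passed to --additional-hooks-dir and may need merging with hooks you already have), and the explanation that spaCy discovers spacy-legacy's v1 architectures through package entry points and checks for en_core_web_sm in spacy.load() through package metadata. Both live in *.dist-info folders, which is why copy_metadata is used. That would be consistent with both errors in the question, but it has not been verified here.

```python
# hook-spacy.py (hypothetical name): PyInstaller runs this hook when it
# encounters "import spacy" during analysis.
from PyInstaller.utils.hooks import collect_data_files, collect_submodules, copy_metadata

hiddenimports = []
datas = []

# Bundle every submodule and data file (e.g. the model's config.cfg and weights).
for package in ("spacy", "spacy_legacy", "en_core_web_sm"):
    hiddenimports += collect_submodules(package)
    datas += collect_data_files(package)

# Bundle the distribution metadata (*.dist-info), assumed to be needed for
# entry-point registration and for spacy.load("en_core_web_sm").
for distribution in ("spacy", "spacy-legacy", "en_core_web_sm"):
    datas += copy_metadata(distribution)
```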

Related

Using pytube on PyCharm

I have a script that worked fine when I created it in April 2021, but now it gives me the following error. I'm not very experienced in coding, so any help would be great.
What I'm trying to do is simply download a song from YouTube as an MP4. I can see that the error says there is something wrong with the imported pytube module, but I am not skilled enough to see what it is.
I'm using macOS 12.1, PyCharm 2020.3, and Python 3.9.
Script:
import pytube
url = str('https://www.youtube.com/watch?v=gJLIiF15wjQ')
youtube = pytube.YouTube(url)
video = youtube.streams.get_by_itag(140)
video.download(output_path='/Users/clarajacobsen/Documents/TrueFIR/Klub100/Songs/', filename='test')
Error:
Traceback (most recent call last):
File "/Users/user/Documents/Folder1/venv/test.py", line 8, in <module>
video = youtube.streams.get_by_itag(140)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/__main__.py", line 292, in streams
return StreamQuery(self.fmt_streams)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/__main__.py", line 177, in fmt_streams
extract.apply_signature(stream_manifest, self.vid_info, self.js)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/extract.py", line 409, in apply_signature
cipher = Cipher(js=js)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/cipher.py", line 43, in __init__
self.throttling_plan = get_throttling_plan(js)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/cipher.py", line 387, in get_throttling_plan
raw_code = get_throttling_function_code(js)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/cipher.py", line 301, in get_throttling_function_code
code_lines_list = find_object_from_startpoint(js, match.span()[1]).split('\n')
AttributeError: 'NoneType' object has no attribute 'span'
After trying solution 1 suggested by Sarim, I get this error in PyCharm:
Traceback (most recent call last):
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/__main__.py", line 177, in fmt_streams
extract.apply_signature(stream_manifest, self.vid_info, self.js)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/extract.py", line 409, in apply_signature
cipher = Cipher(js=js)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/cipher.py", line 29, in __init__
self.transform_plan: List[str] = get_transform_plan(js)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/cipher.py", line 197, in get_transform_plan
return regex_search(pattern, js, group=1).split(";")
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/helpers.py", line 129, in regex_search
raise RegexMatchError(caller="regex_search", pattern=pattern)
pytube.exceptions.RegexMatchError: regex_search: could not find match for iha=function\(\w\){[a-z=\.\(\"\)]*;(.*);(?:.+)}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/user/Documents/Folder1/venv/test.py", line 5, in <module>
video = youtube.streams.get_by_itag(140)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/__main__.py", line 292, in streams
return StreamQuery(self.fmt_streams)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/__main__.py", line 184, in fmt_streams
extract.apply_signature(stream_manifest, self.vid_info, self.js)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/extract.py", line 409, in apply_signature
cipher = Cipher(js=js)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/cipher.py", line 29, in __init__
self.transform_plan: List[str] = get_transform_plan(js)
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/cipher.py", line 197, in get_transform_plan
return regex_search(pattern, js, group=1).split(";")
File "/Users/user/Documents/Folder1/venv/lib/python3.9/site-packages/pytube/helpers.py", line 129, in regex_search
raise RegexMatchError(caller="regex_search", pattern=pattern)
pytube.exceptions.RegexMatchError: regex_search: could not find match for iha=function\(\w\){[a-z=\.\(\"\)]*;(.*);(?:.+)}
When I run it in Google Colab, I get:
/usr/local/lib/python3.7/dist-packages/pytube/cipher.py in get_throttling_function_code(js)
299
300 # Extract the code within curly braces for the function itself, and merge any split lines
--> 301 code_lines_list = find_object_from_startpoint(js, match.span()[1]).split('\n')
302 joined_lines = "".join(code_lines_list)
303
AttributeError: 'NoneType' object has no attribute 'span'
This fix does not depend on your operating system or Python version. Follow these steps:
I used Colab for this; if you are using Google Colab, you can test it there.
Install pytube with !pip install pytube (the leading ! runs it as a shell command from a notebook cell).
After installing pytube, shut down the kernel and the application you are using, whether VS Code, Jupyter Notebook, or Colab.
Then start the environment again, import pytube, and run your code.
It should run now.
If it still gives you the same error as before:
Go to the directory where pytube is installed, open the folder named "pytube", and open "cipher.py".
Go to line 293, which reads: name = re.escape(get_throttling_function_name(js))
Replace that line with: name = "iha"
Then close all kernels and files you are running the code in, and restart them completely.
One of these two solutions should work for you; the first one is the one that worked for me.
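Finding where pytube is installed, the first step of the second solution, can be done from Python itself. The sketch below is not part of the original answer: it uses importlib.util.find_spec to locate pytube/cipher.py and prints every line that contains get_throttling_function_name(js), because the exact line number (293 above) may differ between pytube releases. Editing the file is still a manual step. The helper names module_path and find_lines are made up for illustration.

```python
import importlib.util
from pathlib import Path
from typing import List, Tuple


def module_path(dotted_name: str) -> Path:
    """Locate the source file of an importable module (parent packages get imported)."""
    spec = importlib.util.find_spec(dotted_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"No module named {dotted_name!r}")
    return Path(spec.origin)


def find_lines(path: Path, needle: str) -> List[Tuple[int, str]]:
    """Return (line number, text) for each line of path that contains needle."""
    with path.open(encoding="utf-8") as source:
        return [
            (number, line.rstrip("\n"))
            for number, line in enumerate(source, start=1)
            if needle in line
        ]


if __name__ == "__main__":
    try:
        cipher_py = module_path("pytube.cipher")
    except ModuleNotFoundError:
        print("pytube is not installed in this environment")
    else:
        print(cipher_py)
        # Search for the text instead of trusting a fixed line number.
        for number, line in find_lines(cipher_py, "get_throttling_function_name(js)"):
            print(number, line)
```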
As the error tells us, you have a NoneType object where pytube expects a regular-expression match: in get_throttling_function_code (cipher.py, line 301), match is None, so match.span() fails. Did you check whether the YouTube link, or anything on that video page that pytube relies on, has changed?

VSCode BERT ValueError: Unable to access local path

I have written the code for an entity extraction model using BERT, but when I run train.py I get a ValueError.
This is the structure of my code and configuration file in VS Code. I downloaded the BERT models from here.
Error
>> (myenv) PS D:\Transformers\bert-entity-extraction> python src/train.py
Configuration Complete!
Traceback (most recent call last):
File "src/train.py", line 83, in <module>
model = EntityModel(num_tag = num_tag, num_pos = num_pos)
File "D:\Transformers\bert-entity-extraction\src\model.py", line 25, in __init__
self.bert = transformers.BertModel.from_pretrained(config.BASE_MODEL_PATH)
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\modeling_utils.py", line 1080, in from_pretrained
**kwargs,
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\configuration_utils.py", line 427, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\configuration_utils.py", line 492, in get_config_dict
user_agent=user_agent,
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\file_utils.py", line 1289, in cached_path
raise ValueError(f"unable to parse {url_or_filename} as a URL or as a local path")
ValueError: unable to parse D:\Transformers\bert-entity-extraction\input\bert-base-uncased_L-12_H-768_A-12\config.json as a URL or as a local path
How can I fix this?
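The ValueError shows that transformers appended config.json to BASE_MODEL_PATH and found no such file. A quick check is to list the directory before calling from_pretrained. The sketch below is only a diagnostic and assumes config.json is the file that matters. If the folder is Google's original TensorFlow release (the _L-12_H-768_A-12 naming suggests so, but that is an assumption), it typically ships bert_config.json rather than config.json. The helper name describe_model_dir is made up.

```python
from pathlib import Path
from typing import Dict, List, Union

# For a local directory, from_pretrained() looks for config.json inside it,
# as the path in the ValueError above shows.
EXPECTED_FILES = ("config.json",)


def describe_model_dir(model_dir: Union[str, Path]) -> Dict[str, object]:
    """Report whether model_dir exists, what it contains and which expected files are missing."""
    directory = Path(model_dir)
    is_dir = directory.is_dir()
    present: List[str] = sorted(p.name for p in directory.iterdir()) if is_dir else []
    missing = [name for name in EXPECTED_FILES if name not in present]
    return {"is_dir": is_dir, "present": present, "missing": missing}


if __name__ == "__main__":
    # Hypothetical usage: pass the same value as config.BASE_MODEL_PATH.
    print(describe_model_dir(r"D:\Transformers\bert-entity-extraction\input\bert-base-uncased_L-12_H-768_A-12"))
```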

FMUException: Failed to setup the experiment

I have an FMU that was created in GT-Suite, and I am trying to work with it in Python.
I have followed the JModelica tutorials:
from pyfmi import load_fmu
model = load_fmu('myFMU.fmu')
res = model.simulate(final_time=10)
My FMU loads, but when I run the model.simulate step, it throws an error:
Traceback (most recent call last):
File "<ipython-input-3-4812da4bb52b>", line 1, in <module>
res = model.simulate(final_time=10)
File "src\pyfmi\fmi.pyx", line 6981, in pyfmi.fmi.FMUModelCS2.simulate
File "src\pyfmi\fmi.pyx", line 304, in pyfmi.fmi.ModelBase._exec_simulate_algorithm
File "src\pyfmi\fmi.pyx", line 298, in pyfmi.fmi.ModelBase._exec_simulate_algorithm
File "C:\Users\chinn\Anaconda3\envs\test_env\lib\site-packages\pyfmi\fmi_algorithm_drivers.py", line 761, in __init__
self.model.setup_experiment(start_time=start_time, stop_time_defined=self.options["stop_time_defined"], stop_time=final_time)
File "src\pyfmi\fmi.pyx", line 4292, in pyfmi.fmi.FMUModelBase2.setup_experiment
FMUException: Failed to setup the experiment.
I have tried running it in multiple environments on my PC but get the same error. I have searched extensively but couldn't find anything. Can someone help me resolve this issue?
The FMU was probably not exported with the correct license setting.

spaCy: errors attempting to load serialized Doc

I am trying to serialize/deserialize spaCy documents (my setup is Windows 7 with Anaconda) and am getting errors. I haven't been able to find any explanation. Here is a snippet of code and the error it generates:
import spacy
nlp = spacy.load('en')
text = 'This is a test.'
doc = nlp(text)
fout = 'test.spacy' # <-- according to the API for Doc.to_disk(), this needs to be a directory (but for me, spaCy writes a file)
doc.to_disk(fout)
doc.from_disk(fout)
Traceback (most recent call last):
File "<ipython-input-7-aa22bf1b9689>", line 1, in <module>
doc.from_disk(fout)
File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk
File "doc.pyx", line 806, in spacy.tokens.doc.Doc.from_bytes
ValueError: [E033] Cannot load into non-empty Doc of length 5.
I have also tried creating a new Doc object and loading into it, as shown in the example ("Example: Saving and loading a document") in the spaCy docs, which results in a different error:
from spacy.tokens import Doc
from spacy.vocab import Vocab
new_doc = Doc(Vocab()).from_disk(fout)
Traceback (most recent call last):
File "<ipython-input-16-4d99a1199f43>", line 1, in <module>
Doc(Vocab()).from_disk(fout)
File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk
File "doc.pyx", line 838, in spacy.tokens.doc.Doc.from_bytes
File "stringsource", line 646, in View.MemoryView.memoryview_cwrapper
File "stringsource", line 347, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
EDIT:
As pointed out in the replies, the path provided should be a directory. However, the first code snippet creates a file. Changing it to a nonexistent directory path doesn't help, as spaCy still creates a file. Attempting to write to an existing directory causes an error too:
fout = 'data'
doc.to_disk(fout)
Traceback (most recent call last):
File "<ipython-input-8-6c30638f4750>", line 1, in <module>
doc.to_disk(fout)
File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
opener=self._opener)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
return self._accessor.open(self, flags, mode)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
return strfunc(str(pathobj), *args)
PermissionError: [Errno 13] Permission denied: 'data'
Python has no problem writing at this location via standard file operations (open/read/write).
Trying with a Path object yields the same results:
from pathlib import Path
import os
fout = Path(os.path.join(os.getcwd(), 'data'))
doc.to_disk(fout)
Traceback (most recent call last):
File "<ipython-input-17-6c30638f4750>", line 1, in <module>
doc.to_disk(fout)
File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
opener=self._opener)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
return self._accessor.open(self, flags, mode)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
return strfunc(str(pathobj), *args)
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Username\\workspace\\data'
Any ideas why this might be happening?
The argument to doc.to_disk(fout) must be "a path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects.", as the spaCy documentation states at https://spacy.io/api/doc
Try changing fout to a directory; that might do the trick.
EDIT:
Examples from the spaCy documentation:
For doc.to_disk:
doc.to_disk('/path/to/doc')
And for doc.from_disk:
from spacy.tokens import Doc
from spacy.vocab import Vocab
doc = Doc(Vocab()).from_disk('/path/to/doc')
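A minimal round trip that avoids both errors in the question might look like the sketch below. It assumes a spaCy version whose Doc.to_disk() writes a single file at the given path, which matches what the question observed and would also explain the PermissionError when fout names an existing directory. It uses spacy.blank("en") instead of the 'en' model only so it runs without a download; that substitution is an assumption, not part of the question.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import Doc

# Blank English pipeline: tokenizer only, so no model download is needed.
nlp = spacy.blank("en")
doc = nlp("This is a test.")

with tempfile.TemporaryDirectory() as tmp:
    # to_disk() writes one file at this path.
    path = Path(tmp) / "test.spacy"
    doc.to_disk(path)
    # Load into a new, empty Doc that shares the pipeline's vocab; calling
    # from_disk() on the already-populated doc is what raised E033.
    restored = Doc(nlp.vocab).from_disk(path)

print([token.text for token in restored])
```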

What does this Python message mean?

ho-fe3fdd00-12:~ Sam$ easy_install BeautifulSoup
Traceback (most recent call last):
File "/usr/bin/easy_install", line 8, in <module>
load_entry_point('setuptools==0.6c7', 'console_scripts', 'easy_install')()
File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/setuptools/command/easy_install.py", line 1670, in main
with_ei_usage(lambda:
File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/setuptools/command/easy_install.py", line 1659, in with_ei_usage
return f()
File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/setuptools/command/easy_install.py", line 1674, in <lambda>
distclass=DistributionWithoutHelpCommands, **kw
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/core.py", line 125, in setup
dist.parse_config_files()
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py", line 373, in parse_config_files
parser.read(filename)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/ConfigParser.py", line 267, in read
self._read(fp, filename)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/ConfigParser.py", line 462, in _read
raise MissingSectionHeaderError(fpname, lineno, line)
ConfigParser.MissingSectionHeaderError: File contains no section headers.
file: /Users/Sam/.pydistutils.cfg, line: 1
'install_lib = ~/Library/Python/$py_version_short/site-packages\n'
I am trying to install BeautifulSoup.
The first two lines of ~/.pydistutils.cfg are:
install_lib = ~/Library/Python/$py_version_short/site-packages
install_scripts = ~/bin
BeautifulSoup is a pure Python module, which you can install by grabbing the BeautifulSoup.py file (e.g. from inside the standard .tar.gz distribution) and putting it somewhere on your Python path, e.g. inside /Users/Sam/Library/Python/2.5/site-packages, if the paths mentioned in the error message are accurate.
There is no need for fussy, error-prone installers that just overcomplicate the issue.
The configuration file ~/.pydistutils.cfg has a syntax error: it has no section header, which is what MissingSectionHeaderError reports.
Try adding this line at the top of ~/.pydistutils.cfg:
[easy_install]
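The error comes from the standard-library config parser, which requires every option to sit under a [section] header. The sketch below reproduces it with Python 3's configparser (the traceback in the question shows Python 2.5's ConfigParser raising the same MissingSectionHeaderError) and shows that prepending the header suggested above makes the file parse. It does not check which section name distutils or easy_install actually reads these options from; the header used simply follows the answer.

```python
import configparser

# The two lines from ~/.pydistutils.cfg quoted in the question.
OPTIONS = (
    "install_lib = ~/Library/Python/$py_version_short/site-packages\n"
    "install_scripts = ~/bin\n"
)


def parse(text: str) -> configparser.ConfigParser:
    # interpolation=None disables %-style substitution, so values are returned as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


if __name__ == "__main__":
    try:
        parse(OPTIONS)
    except configparser.MissingSectionHeaderError as err:
        print("Without a header:", type(err).__name__)

    fixed = parse("[easy_install]\n" + OPTIONS)
    print("With a header:", fixed.get("easy_install", "install_scripts"))
```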
