I'm trying to get TextBlob up and running for some teammates on a Unix server. It appears to work fine when I run scripts that use TextBlob as root, but when I try on the new account I created I get the following error:
**********************************************************************
Resource u'tokenizers/punkt/english.pickle' not found. Please
use the NLTK Downloader to obtain the resource: >>>
nltk.download()
Searched in:
- '/home/USERNAME/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''
**********************************************************************
Traceback (most recent call last):
File "sampleClassifier.py", line 25, in <module>
cl = NaiveBayesClassifier(train)
File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 192, in __init__
self.train_features = [(self.extract_features(d), c) for d, c in self.train_set]
File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 169, in extract_features
return self.feature_extractor(text, self.train_set)
File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 81, in basic_extractor
word_features = _get_words_from_dataset(train_set)
File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 63, in _get_words_from_dataset
return set(all_words)
File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 62, in <genexpr>
all_words = chain.from_iterable(tokenize(words) for words, _ in dataset)
File "/usr/local/lib/python2.7/dist-packages/textblob/classifiers.py", line 59, in tokenize
return word_tokenize(words, include_punc=False)
File "/usr/local/lib/python2.7/dist-packages/textblob/tokenizers.py", line 72, in word_tokenize
for sentence in sent_tokenize(text))
File "/usr/local/lib/python2.7/dist-packages/textblob/base.py", line 64, in itokenize
return (t for t in self.tokenize(text, *args, **kwargs))
File "/usr/local/lib/python2.7/dist-packages/textblob/decorators.py", line 38, in decorated
raise MissingCorpusError()
textblob.exceptions.MissingCorpusError:
Looks like you are missing some required data for this feature.
To download the necessary data, simply run
python -m textblob.download_corpora
or use the NLTK downloader to download the missing data: http://nltk.org/data.html
If this doesn't fix the problem, file an issue at https://github.com/sloria/TextBlob/issues.
The machine we're working with is very small, so I can't overwhelm it by downloading the corpora several times for different users. Does anyone know how I might fix this issue? I already have the corpora installed for root, but I don't know where the packages are or how to find them.
Following the instructions in the docs should work. Try setting the NLTK_DATA environment variable and see if it works for the new user.
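For example, a minimal sketch of a shared install, assuming a directory that is already on the search list in the error message (punkt is the resource named there; TextBlob needs a few more corpora, which python -m textblob.download_corpora would normally fetch), and /path/to/shared/nltk_data as a placeholder:
# Download the data once, as root, into a directory every account searches:
sudo python -m nltk.downloader -d /usr/share/nltk_data punkt
# Alternatively, keep the data elsewhere and tell NLTK where it is,
# e.g. in each user's ~/.bashrc:
export NLTK_DATA=/path/to/shared/nltk_data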
2023-01-25 08:21:21,659 - ERROR - Traceback (most recent call last):
File "/home/xyzUser/project/queue_handler/document_queue_listner.py", line 148, in __process_and_acknowledge
pipeline_result = self.__process_document_type(message, pipeline_input)
File "/home/xyzUser/project/queue_handler/document_queue_listner.py", line 194, in __process_document_type
pipeline_result = bill_parser_pipeline.process(pipeline_input)
File "/home/xyzUser/project/main/billparser/__init__.py", line 18, in process
bill_extractor_model = MachineGeneratedBillExtractorModel()
File "/home/xyzUser/project/main/billparser/models/qa_model.py", line 25, in __new__
cls.__model = TransformersReader(model_name_or_path=cls.__model_path, use_gpu=False)
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/haystack/nodes/base.py", line 48, in wrapper_exportable_to_yaml
init_func(self, *args, **kwargs)
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/haystack/nodes/reader/transformers.py", line 93, in __init__
self.model = pipeline(
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/transformers/pipelines/__init__.py", line 542, in pipeline
return task_class(model=model, framework=framework, task=task, **kwargs)
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/transformers/pipelines/question_answering.py", line 125, in __init__
super().__init__(
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/transformers/pipelines/base.py", line 691, in __init__
self.device = device if framework == "tf" else torch.device("cpu" if device < 0 else f"cuda:{device}")
TypeError: '<' not supported between instances of 'torch.device' and 'int'
This is the error message I got after installing a requirements.txt file from my project. I think it is related to torch, but I don't know how to fix it. I am new to Hugging Face transformers and don't know if it is a version issue.
This was a bug in the transformers package in versions prior to v4.22.0: that line of code does not account for the possibility that the device argument is already a torch.device before comparing it with an int. Tracing through git blame, the change made in changeset 9d4a45509ab adds the much-needed if isinstance(device, torch.device): check (line 764 of the resulting file), which ensures this error can no longer happen. Checking the tags that contain that changeset shows that releases v4.22.0 and later include the fix. To update the package, activate the environment and issue the following:
pip install -U transformers
Alternatively, pin a specific version, e.g.:
pip install -U transformers==4.22.0
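To confirm which version ended up installed after the upgrade (a quick sanity check, nothing more):
python -c "import transformers; print(transformers.__version__)"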
I'm trying to load a custom model called 'ru2' into spaCy (for NLP processing).
It can be found here: https://github.com/buriy/spacy-ru
The problem is that when I call
nlp = spacy.load('ru2')
doc = nlp(text)
I see the error
C:\ProgramData\Anaconda3\lib\importlib\_bootstrap.py:205: RuntimeWarning: spacy.tokens.span.Span size changed, may indicate binary incompatibility. Expected 72 from C header, got 80 from PyObject
return f(*args, **kwds)
Traceback (most recent call last):
File "C://.../nlp/src/ie/main.py", line 125, in <module>
main(examp_dict['Poroshenko'])
File "C://.../nlp/src/ie/main.py", line 92, in main
nlp = spacy.load('ru2')
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\__init__.py", line 27, in load
return util.load_model(name, **overrides)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 133, in load_model
return load_model_from_path(Path(name), **overrides)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 173, in load_model_from_path
return nlp.from_disk(model_path)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 791, in from_disk
util.from_disk(path, deserializers, exclude)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 630, in from_disk
reader(path / key)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 781, in <lambda>
deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(p, exclude=["vocab"])
File "tokenizer.pyx", line 391, in spacy.tokenizer.Tokenizer.from_disk
File "tokenizer.pyx", line 432, in spacy.tokenizer.Tokenizer.from_bytes
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 606, in from_bytes
msg = srsly.msgpack_loads(bytes_data)
File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\_msgpack_api.py", line 29, in msgpack_loads
msg = msgpack.loads(data, raw=False, use_list=use_list)
File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\msgpack\__init__.py", line 60, in unpackb
return _unpackb(packed, **kwargs)
File "_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
TypeError: unhashable type: 'list'
I searched for similar questions on the Internet:
https://github.com/explosion/spaCy/issues/2715
https://spacy.io/usage#unhashable-list
But none of those solutions work for me.
I use
msgpack==0.5.6 (even downgraded as suggested in the link above)
spacy==2.1.4
Here is an excerpt from https://spacy.io/usage#troubleshooting:
If you’re training models, writing them to disk, and versioning them with git, you might encounter this error when trying to load them in a Windows environment. This happens because a default install of Git for Windows is configured to automatically convert Unix-style end-of-line characters (LF) to Windows-style ones (CRLF) during file checkout (and the reverse when committing). While that’s mostly fine for text files, a trained model written to disk has some binary files that should not go through this conversion. When they do, you get the error above. You can fix it by either changing your core.autocrlf setting to "false", or by committing a .gitattributes file to your repository to tell git on which files or folders it shouldn’t do LF-to-CRLF conversion, with an entry like path/to/spacy/model/** -text. After you’ve done either of these, clone your repository again.
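Concretely, that means something like the following (the model path in the .gitattributes entry is a placeholder for wherever the ru2 model directory sits in your repository):
git config core.autocrlf false
# or, instead, mark the model's files as binary so git never converts their line endings:
echo "path/to/ru2/model/** -text" >> .gitattributes
# then clone the repository again so the files are checked out untouched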
It might be because the version number of SpaCy used to generate your model is not the same as the version of SpaCy you have installed. (I don't know of course, just mentioning it in case it helps.)
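A quick way to check that (a sketch only; 'ru2' is assumed here to be the model directory you pass to spacy.load, and meta.json is the metadata file a spaCy model directory ships with):
import json
import spacy

print(spacy.__version__)                      # the spaCy version you have installed
with open('ru2/meta.json') as f:              # metadata inside the model directory
    print(json.load(f).get('spacy_version'))  # the spaCy version the model was built for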
Adding to the answer above, another quick fix would be to manually download the zip from the repository.
I need to parse the open-source project PostgreSQL using pycparser.
While parsing its source code, the following error arises:
Traceback (most recent call last):
File "examples\using_cpp_libc.py", line 48, in <module>
getAllFiles(projectName)
File "examples\using_cpp_libc.py", line 29, in getAllFiles
ast = parse_file(dirName+'\\'+fname, use_cpp = True, cpp_path = 'cpp', cpp_args = [r'-nostdinc',r'-Iutils/fake_libc_include',r'-Iprojects/postgresql/src/include'])
File "G:\python\pycparser-master\pycparser\__init__.py", line 92, in parse_file
return parser.parse(text, filename)
File "G:\python\pycparser-master\pycparser\c_parser.py", line 152, in parse
debug=debuglevel)
File "G:\python\pycparser-master\pycparser\ply\yacc.py", line 334, in parse
return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
File "G:\python\pycparser-master\pycparser\ply\yacc.py", line 1204, in parseopt_notrack
tok = call_errorfunc(self.errorfunc, errtoken, self)
File "G:\python\pycparser-master\pycparser\ply\yacc.py", line 193, in call_errorfunc
r = errorfunc(token)
File "G:\python\pycparser-master\pycparser\c_parser.py", line 1838, in p_error
column=self.clex.find_tok_column(p)))
File "G:\python\pycparser-master\pycparser\plyparser.py", line 67, in _parse_error
raise ParseError("%s: %s" % (coord, msg))
pycparser.plyparser.ParseError: projects/postgresql/src/include/pg_config_os.h:366:15: before: pgwin32_signal_event
I am using postgresql-9.6.9, built with Visual Studio Express 2017 on Windows 10 (64-bit).
The blog post you quoted in the comment is the canonical resource. Parsing large C projects is not easy - they have their own quirks - so it takes work. I doubt it's resolvable within the confines of a Stack Overflow question.
You need to start tackling the issues one by one - for example, look at the pgwin32_signal_event token in pg_config_os.h: why can't it be parsed? Perhaps its type is unparsable? Was it defined? Could it be added to a "fake" header? Unfortunately, there's no easy way to do this except working through the issues as they come up.
Be sure to preprocess the file you're parsing first, dumping the full preprocessed version into a single .c file - this gets all the types into a single file you can work with.
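As a rough illustration of that last point, a minimal sketch (the source file, include paths, and output name below are placeholders, not a known-good PostgreSQL setup):
# Sketch only: preprocess a single translation unit into one .c file,
# then parse the already-expanded file with pycparser.
import subprocess
from pycparser import parse_file

src = 'projects/postgresql/src/backend/tcop/postgres.c'   # placeholder input file
pre = 'postgres_preprocessed.c'

# Run the preprocessor once, dumping everything into a single file.
with open(pre, 'w') as out:
    subprocess.check_call(
        ['cpp', '-nostdinc',
         '-Iutils/fake_libc_include',
         '-Iprojects/postgresql/src/include',
         src],
        stdout=out)

# Parse the already-preprocessed file directly (no use_cpp needed at this point).
ast = parse_file(pre, use_cpp=False)
ast.show()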
I am trying to convert a JaCoCo coverage report to Cobertura format (since Shippable only supports Cobertura). This guy claims to have a tool to convert JaCoCo to Cobertura; however, when running his script I get the following error:
Traceback (most recent call last):
File "cover2cover.py", line 151, in <module>
jacoco2cobertura(filename, source_root)
File "cover2cover.py", line 139, in jacoco2cobertura
convert_root(root, into, source_root)
File "cover2cover.py", line 127, in convert_root
packages.append(convert_package(package))
File "cover2cover.py", line 113, in convert_package
c_classes.append(convert_class(j_class, j_package))
File "cover2cover.py", line 100, in convert_class
c_methods.append(convert_method(j_method, j_method_lines))
File "cover2cover.py", line 85, in convert_method
convert_lines(j_lines, c_method)
File "cover2cover.py", line 33, in convert_lines
for jline in j_lines:
File "cover2cover.py", line 23, in method_lines
larger = list(int(jm.attrib['line']) for jm in jmethods if int(jm.attrib['line']) > start_line)
File "cover2cover.py", line 23, in <genexpr>
larger = list(int(jm.attrib['line']) for jm in jmethods if int(jm.attrib['line']) > start_line)
KeyError: 'line'
I know nothing about python, so any help would be appreciated.
I don't know Python either, but I know that Python 2 and Python 3 have significant differences. Perhaps you ran into that?
I was able to run the script ok with this version:
$> python --version
Python 2.7.11
To ensure I got the script without any download, browser, or line-ending issues, I cloned the git repo:
$> git clone https://github.com/rix0rrr/cover2cover.git
Then the script ran first try on my jacoco XML file.
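Separately, the KeyError itself says that some <method> elements in the JaCoCo XML carry no line attribute, which is exactly what the failing generator expression indexes. A small illustration of the pattern (this is not the actual cover2cover.py code):
# Illustration only: a <method> without a 'line' attribute reproduces the
# KeyError, and a membership check avoids it.
import xml.etree.ElementTree as ET

jmethods = ET.fromstring(
    '<methods><method name="a" line="10"/><method name="b"/></methods>'
).findall('method')
start_line = 5

# The unguarded form (as in the traceback) raises KeyError: 'line' on <method name="b">:
#   larger = [int(jm.attrib['line']) for jm in jmethods if int(jm.attrib['line']) > start_line]

# Guarded version: skip methods that carry no line information.
larger = [int(jm.attrib['line']) for jm in jmethods
          if 'line' in jm.attrib and int(jm.attrib['line']) > start_line]
print(larger)  # [10]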
I am having trouble getting MaltParser to work with Python NLTK.
Here is my code so far:
import os
import nltk

os.environ["MALT_PARSER"] = "C:/Python34/maltparser-1.8.1"
os.environ["MALTPARSERHOME"] = "C:/Python34/maltparser-1.8.1"
parser8 = nltk.parse.malt.MaltParser(
    working_dir="C:/Python34/maltparser-1.8.1", mco="engmalt.poly-1.7",
    additional_java_args=['-Xmx512m'])
txt = "This is a test sentence"
parser8.raw_parse(txt)
I have downloaded a pre-trained model and selected it for use.
This is the response I get:
runfile('C:/Anaconda/Lib/site-packages/nltk/malt2.py', wdir='C:/Anaconda/Lib/site-packages/nltk')
Traceback (most recent call last):
File "<ipython-input-38-73069e4ee673>", line 1, in <module>
runfile('C:/Anaconda/Lib/site-packages/nltk/malt2.py', wdir='C:/Anaconda/Lib/site-packages/nltk')
File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "C:/Anaconda/Lib/site-packages/nltk/malt2.py", line 14, in <module>
parser8.raw_parse(txt)
File "C:\Anaconda\lib\site-packages\nltk\parse\malt.py", line 139, in raw_parse
return self.parse(words, verbose)
File "C:\Anaconda\lib\site-packages\nltk\parse\malt.py", line 126, in parse
return self.parse_sents([sentence], verbose)[0]
File "C:\Anaconda\lib\site-packages\nltk\parse\malt.py", line 114, in parse_sents
return self.tagged_parse_sents(tagged_sentences, verbose)
File "C:\Anaconda\lib\site-packages\nltk\parse\malt.py", line 194, in tagged_parse_sents
"code %d" % (' '.join(cmd), ret))
Exception: MaltParser parsing (java -Xmx512m -jar C:/Python34/maltparser-1.8.1\malt.jar -w C:/Python34/maltparser-1.8.1 -c engmalt.poly-1.7.mco -i C:\Python34\maltparser-1.8.1\malt_input.conllqgpbye -o C:\Python34\maltparser-1.8.1\malt_output.conllib1nx0 -m parse) failed with exit code 2
I have followed all the advice on this post How to use malt parser in python nltk.
Specifically:
- I downloaded the latest version of MaltParser.
- Using pip, I uninstalled and reinstalled NLTK to get the latest version, which includes the addition in malt.py that allows 'additional_java_args' to be passed as a parameter.
- I renamed the jar file to 'malt.jar'.
- I set environment variables pointing both MALT_PARSER and MALTPARSERHOME to the working directory.
- I've tried both the linear and poly pre-trained models.
- The code for malt.py can be found here: http://www.nltk.org/_modules/nltk/parse/malt.html
If there isn't an apparent solution, how can I continue to debug this myself?
It seems that there's some slash (/) versus backslash (\) inconsistency in the paths shown in the exception, but nothing I do seems to fix it.
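One way to keep debugging this yourself (a suggestion, not a confirmed fix) is to re-run the exact java command from the exception outside of NLTK, so MaltParser's own error output is visible. The flags below are copied from the exception message; the input/output file names are placeholders for the temp files NLTK generates:
# Sketch: run the failing command directly and print MaltParser's own output.
import subprocess

cmd = ['java', '-Xmx512m', '-jar', 'C:/Python34/maltparser-1.8.1/malt.jar',
       '-w', 'C:/Python34/maltparser-1.8.1',
       '-c', 'engmalt.poly-1.7.mco',
       '-i', 'C:/Python34/maltparser-1.8.1/malt_input.conll',   # placeholder
       '-o', 'C:/Python34/maltparser-1.8.1/malt_output.conll',  # placeholder
       '-m', 'parse']
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
print(proc.returncode)
print(out.decode())
print(err.decode())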