Updating scikit-learn: 'SVC' object has no attribute '_probA'? - python

We updated to Python 3.8.2 and are getting an error with scikit-learn:
Traceback (most recent call last):
File "manage.py", line 16, in <module>
execute_from_command_line(sys.argv)
File "/home/ubuntu/myWebApp/.venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
utility.execute()
File "/home/ubuntu/myWebApp/.venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 375, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/ubuntu/myWebApp/.venv/lib/python3.8/site-packages/django/core/management/base.py", line 316, in run_from_argv
self.execute(*args, **cmd_options)
File "/home/ubuntu/myWebApp/.venv/lib/python3.8/site-packages/django/core/management/base.py", line 353, in execute
output = self.handle(*args, **options)
File "/home/ubuntu/myWebApp/server_modules/rss_ml_score/management/commands/rssmlscore.py", line 22, in handle
run.build_and_predict(days=options['days'], rescore=options['rescore'])
File "/home/ubuntu/myWebApp/server_modules/rss_ml_score/utils/run.py", line 96, in build_and_predict
predict_all(filename)
File "/home/ubuntu/myWebApp/server_modules/rss_ml_score/models/predict_model.py", line 135, in predict_all
voting_predicted_hard, voting_predicted_soft = predict_from_multiple_estimator(fitted_estimators, X_predict_list,
File "/home/ubuntu/myWebApp/server_modules/rss_ml_score/models/train_model.py", line 66, in predict_from_multiple_estimator
pred_prob1 = np.asarray([clf.predict_proba(X)
File "/home/ubuntu/myWebApp/server_modules/rss_ml_score/models/train_model.py", line 66, in <listcomp>
pred_prob1 = np.asarray([clf.predict_proba(X)
File "/home/ubuntu/myWebApp/.venv/lib/python3.8/site-packages/sklearn/svm/_base.py", line 662, in _predict_proba
if self.probA_.size == 0 or self.probB_.size == 0:
File "/home/ubuntu/myWebApp/.venv/lib/python3.8/site-packages/sklearn/svm/_base.py", line 759, in probA_
return self._probA
AttributeError: 'SVC' object has no attribute '_probA'
Do I need to use another library in addition to scikit-learn to access _probA?
Update in response to a comment:
The line of code that's throwing the error is:
pred_prob1 = np.asarray([clf.predict_proba(X)
                         for clf, X in zip(estimators, X_list)])
...that calls this line in _base.py:
def _predict_proba(self, X):
    X = self._validate_for_predict(X)
    if self.probA_.size == 0 or self.probB_.size == 0:
...which calls this line, also in _base.py:
@property
def probA_(self):
    return self._probA
...which throws the error:
AttributeError: 'SVC' object has no attribute '_probA'
All this has worked fine for many months, but is not currently working, even after updating to the latest scikit-learn.

It turned out that I had to stay with the same version of scikit-learn that was used to train the models we currently have (scikit-learn==0.21.2). Later versions of scikit-learn don't work with our existing code and models. If we want to upgrade scikit-learn, we have to retrain our models with the new version.
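A minimal guard along these lines can fail fast at load time instead of deep inside predict_proba; the model filename and the use of joblib are assumptions for illustration, not taken from the post:

# Sketch: refuse to load a pickled model under a mismatched scikit-learn.
# The model filename and joblib usage are assumptions, not from the post.
import joblib
import sklearn

TRAINED_WITH = "0.21.2"  # version the estimators were pickled under

if sklearn.__version__ != TRAINED_WITH:
    raise RuntimeError(
        f"Estimators were trained with scikit-learn {TRAINED_WITH}, "
        f"but {sklearn.__version__} is installed; pin the version or retrain."
    )

clf = joblib.load("svc_model.joblib")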

I also had this error and tried many things to resolve it. In the end it came down to a version-compatibility issue: whatever version was used to build the model, the same version should be used to load it.

Related

Create binary of spaCy with PyInstaller

I would like to create a binary of my Python code, which uses spaCy.
# main.py
import spacy
import en_core_web_sm

def main() -> None:
    nlp = spacy.load("en_core_web_sm")
    # nlp = en_core_web_sm.load()
    doc = nlp("This is an example")
    print([(w.text, w.pos_) for w in doc])

if __name__ == "__main__":
    main()
Besides my code, I created two PyInstaller hooks, as described here.
To create the binary I use the following command: pyinstaller main.py --additional-hooks-dir=..
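The linked hook description is not reproduced in the post; for context, a typical pair of hooks along these lines (file names and contents are assumptions, not the poster's actual hooks) might look like:

# hook-spacy.py -- assumed sketch, not the poster's actual hook
from PyInstaller.utils.hooks import collect_all

# Bundle spaCy's data files, binaries, and submodules into the executable.
datas, binaries, hiddenimports = collect_all("spacy")

# hook-en_core_web_sm.py -- assumed sketch, not the poster's actual hook
from PyInstaller.utils.hooks import collect_all

# Bundle the packaged model so spacy.load("en_core_web_sm") can find it.
datas, binaries, hiddenimports = collect_all("en_core_web_sm")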
On the execution of the binary I get the following error message:
Traceback (most recent call last):
File "main.py", line 19, in <module>
main()
File "main.py", line 12, in main
nlp = spacy.load("en_core_web_sm")
File "spacy/__init__.py", line 47, in load
File "spacy/util.py", line 329, in load_model
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
If I use nlp = en_core_web_sm.load() instead of nlp = spacy.load("en_core_web_sm") to load the spaCy model, I get the following error:
Traceback (most recent call last):
File "main.py", line 19, in <module>
main()
File "main.py", line 13, in main
nlp = en_core_web_sm.load()
File "en_core_web_sm/__init__.py", line 10, in load
File "spacy/util.py", line 514, in load_model_from_init_py
File "spacy/util.py", line 389, in load_model_from_path
File "spacy/util.py", line 426, in load_model_from_config
File "spacy/language.py", line 1662, in from_config
File "spacy/language.py", line 768, in add_pipe
File "spacy/language.py", line 659, in create_pipe
File "thinc/config.py", line 722, in resolve
File "thinc/config.py", line 771, in _make
File "thinc/config.py", line 826, in _fill
File "thinc/config.py", line 825, in _fill
File "thinc/config.py", line 1016, in make_promise_schema
File "spacy/util.py", line 137, in get
catalogue.RegistryError: [E893] Could not find function 'spacy.Tok2Vec.v1' in function registry 'architectures'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.
I had this same issue. After the error message you posted above, did you see an "Available names: ..." message? That message suggested that spacy.Tok2Vec.v2 was available but not v1. I was able to edit the config file for en_core_web_sm (for me at dist\<name>\en_core_web_sm\en_core_web_sm-3.0.0\config.cfg) and change all references from spacy.Tok2Vec.v1 to spacy.Tok2Vec.v2. I also had to do this for spacy.MaxoutWindowEncoder.v1. It's still a mystery to me why I'm having the issue only in the PyInstaller distributable and not in my non-compiled script.
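If you prefer to script the edit described above, a minimal sketch might be (the dist path is a placeholder; adjust it to your build output):

# Sketch of the manual fix above: bump the v1 architecture names to v2 in
# the bundled model config. The path is a placeholder for your build output.
from pathlib import Path

cfg = Path(r"dist\<name>\en_core_web_sm\en_core_web_sm-3.0.0\config.cfg")
text = cfg.read_text(encoding="utf8")
for old, new in [("spacy.Tok2Vec.v1", "spacy.Tok2Vec.v2"),
                 ("spacy.MaxoutWindowEncoder.v1", "spacy.MaxoutWindowEncoder.v2")]:
    text = text.replace(old, new)
cfg.write_text(text, encoding="utf8")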
I encountered the same issue and solved it by copying the spacy-legacy package to the compiled destination directory.
You could also hook it up via PyInstaller, but I did not really try that.
I hope my answer helps.
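If you would rather let PyInstaller collect it, an additional hook in the same style might work; this is untested by the answerer, so treat it as an assumption:

# hook-spacy_legacy.py -- hypothetical hook, not verified by the answerer
from PyInstaller.utils.hooks import collect_all

# spacy-legacy provides the v1 registry functions (e.g. spacy.Tok2Vec.v1)
# that the en_core_web_sm config references.
datas, binaries, hiddenimports = collect_all("spacy_legacy")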

flake8 internal error in regular expression engine

I'm trying to run flake8 on a Docker Django build, set up as described here (tutorial page).
When building the Docker image I get an error from flake8, which is run from a docker-compose file like so:
$ flake8 --ignore=E501,F401 .
multiprocessing.pool.RemoteTraceback:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **
File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 48, in
return list(map(*
File "/usr/local/lib/python3.8/site-packages/flake8/checker.py", line 666, in
return checker.run_checks()
File "/usr/local/lib/python3.8/site-packages/flake8/checker.py", line 598, in run_checks
self.run_ast_checks()
File "/usr/local/lib/python3.8/site-packages/flake8/checker.py", line 495, in run_ast_checks
checker = self.run_check(plugin, tree=ast)
File "/usr/local/lib/python3.8/site-packages/flake8/checker.py", line 426, in run_check
self.processor.keyword_arguments_for(
File "/usr/local/lib/python3.8/site-packages/flake8/processor.py", line 241, in keyword_arguments_for
arguments[param] = getattr(self, param)
File "/usr/local/lib/python3.8/site-packages/flake8/processor.py", line 119, in file_tokens
self._file_tokens = list(
File "/usr/local/lib/python3.8/tokenize.py", line 525, in _tokenize
pseudomatch = _compile(PseudoToken).match(line, pos)
RuntimeError: internal error in regular expression engine
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/flake8", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/flake8/main/cli.py", line 18, in main
app.run(argv)
File "/usr/local/lib/python3.8/site-packages/flake8/main/application.py", line 393, in run
self._run(argv)
File "/usr/local/lib/python3.8/site-packages/flake8/main/application.py", line 381, in _run
self.run_checks()
File "/usr/local/lib/python3.8/site-packages/flake8/main/application.py", line 300, in run_checks
self.file_checker_manager.run()
File "/usr/local/lib/python3.8/site-packages/flake8/checker.py", line 329, in run
self.run_parallel()
File "/usr/local/lib/python3.8/site-packages/flake8/checker.py", line 293, in run_parallel
for ret in pool_map:
File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 448, in <genexpr>
return (item for chunk in result for item in chunk)
File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 865, in next
raise value
RuntimeError: internal error in regular expression engine
When I run flake8 with the --verbose flag, I get an error like this:
Fatal Python error: deletion of interned string failed
Python runtime state: initialized
KeyError: 'FILENAME_RE'
raised from tokenize.py.
Does anyone know how to solve this?
Additional Data:
Running docker-compose v1.25.4 on a Raspberry Pi 3 with Buster Lite.
Python 3.8.2 was compiled and installed from source with the flag --enable-loadable-sqlite-extensions.
Thanks for helping!
If you don't care about flake8, then just delete that line; it will work. I had the same problem.
I know this isn't a proper answer; I would have added a comment instead if I could.
I encountered a very similar error profile after a refactor left a lot of data, which formerly lived at the top of the project, down in what became the package folder. There were 33621 files, including .venv, .nox, .pytest_cache, and .coverage.
File "/usr/lib/python3.8/multiprocessing/pool.py", line 448, in <genexpr>
return (item for chunk in result for item in chunk)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
raise value
IndexError: string index out of range
If you see a similar signature (the parallel processing engine being overwhelmed in some way and throwing an exception), you may want to review your project directory structure and make sure that you did not put this kind of data in your source directories.
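In flake8's own terms, excluding those directories from the run (directory names taken from this answer, combined with the flags from the question) would look like:
$ flake8 --exclude=.venv,.nox,.pytest_cache --ignore=E501,F401 .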

log metric from inside PythonScriptStep to parent PipelineRun

SDK version: 1.0.43
To minimize clicking and compare accuracy between PipelineRuns, I'd like to log a metric from inside a PythonScriptStep to the parent PipelineRun. I thought I could do this like:
from azureml.core import Run
run = Run.get_context()
foo = 0.80
run.parent.log("accuracy", foo)
however I get this error.
Traceback (most recent call last):
File "get_metrics.py", line 62, in <module>
run.parent.log("geo_mean", top3_runs)
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/core/run.py", line 459, in parent
return None if parent_run_id is None else get_run(self.experiment, parent_run_id)
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/core/run.py", line 1713, in get_run
return next(runs)
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/core/run.py", line 297, in _rehydrate_runs
yield factory(experiment, run_dto)
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/pipeline/core/run.py", line 325, in _from_dto
return PipelineRun(experiment=experiment, run_id=run_dto.run_id)
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/pipeline/core/run.py", line 74, in __init__
service_endpoint=_service_endpoint)
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/pipeline/core/_graph_context.py", line 46, in __init__
service_endpoint=service_endpoint)
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/pipeline/core/_aeva_provider.py", line 118, in create_provider
service_endpoint=service_endpoint)
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/pipeline/core/_aeva_provider.py", line 133, in create_service_caller
service_endpoint = _AevaWorkflowProvider.get_endpoint_url(workspace, experiment_name)
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/pipeline/core/_aeva_provider.py", line 153, in get_endpoint_url
workspace_name=workspace.name, workspace_id=workspace._workspace_id)
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/core/workspace.py", line 749, in _workspace_id
self.get_details()
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/core/workspace.py", line 594, in get_details
self._subscription_id)
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/_project/_commands.py", line 507, in show_workspace
AzureMachineLearningWorkspaces, subscription_id).workspaces,
File "/azureml-envs/azureml_ffecfef6fbfa1d89f72d5af22e52c081/lib/python3.6/site-packages/azureml/core/authentication.py", line 112, in _get_service_client
all_subscription_list, tenant_id = self._get_all_subscription_ids()
TypeError: 'NoneType' object is not iterable
Update
On further investigation, I tried just printing the run's parent attribute with the line below and got the same traceback:
print("print run parent attribute", run.parent)
Calling the get_properties() method returns the output below. I'm guessing that azureml just uses the azureml.pipelinerunid property for the pipeline tree hierarchy, and that the parent attribute has been left free for user-defined hierarchies (a workaround sketch based on this guess follows the properties).
{
"azureml.runsource": "azureml.StepRun",
"ContentSnapshotId": "45bdecd3-1c43-48da-af5c-c95823c407e0",
"StepType": "PythonScriptStep",
"ComputeTargetType": "AmlCompute",
"azureml.pipelinerunid": "e523d575-c373-46d2-a4bc-1717f5e34ec2",
"_azureml.ComputeTargetType": "batchai",
"AzureML.DerivedImageName": "azureml/azureml_dfd7f4f952ace529f986fe919909c3ec"
}
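Building on that guess, one workaround sketch is to rehydrate the parent run by ID from the step run's properties instead of going through run.parent; this is purely an assumption based on the azureml.pipelinerunid property shown above:

# Hypothetical workaround: look up the parent PipelineRun via the
# azureml.pipelinerunid property instead of run.parent.
from azureml.core import Run

run = Run.get_context()
parent_id = run.get_properties().get("azureml.pipelinerunid")
parent = Run(run.experiment, parent_id)
parent.log("accuracy", 0.80)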
Please upgrade your SDK to the latest version. It seems this issue was fixed sometime after 1.0.43.
Was run defined? Can you try ...
run = Run.get_context()
run.parent.log('metric1', 0.80)
I'm unable to repro on 1.0.60 (latest) or 1.0.43 (as in the post).
I can do run.parent.log()
run = Run.get_context()
run.parent.log('metric1', 0.80)
in my step
and then I can do
r = Run(ws.experiments["expName"], runid)
r.get_metrics()
and can see the metric on the pipeline run.
Unclear if I'm missing something.
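For completeness, here are the two halves of that check as one self-contained sketch; the workspace config, experiment name, and run ID are placeholders, not values from the post:

# In the step script: log a metric to the parent pipeline run.
from azureml.core import Run

run = Run.get_context()
run.parent.log("metric1", 0.80)

# From a client session, fetch the metric back off the pipeline run.
# Placeholders: workspace config on disk, experiment name, pipeline run ID.
from azureml.core import Run, Workspace

ws = Workspace.from_config()
r = Run(ws.experiments["expName"], "<pipeline-run-id>")
print(r.get_metrics())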

torchfile.T7ReaderException

I'm working on Windows 10. I get this issue:
DATAPATH: ../data/coco/test/val_captions.t7
Traceback (most recent call last):
File "main.py", line 77, in <module>
algo.sample(datapath, cfg.STAGE)
File "D:\documenti\Monica\StackGAN-Pytorch\code\trainer.py", line 243, in sample
t_file = torchfile.load(datapath)
File "C:\Users\Utente\venv\lib\site-packages\torchfile.py", line 424, in load
return reader.read_obj()
File "C:\Users\Utente\venv\lib\site-packages\torchfile.py", line 386, in read_obj
v = self.read_obj()
File "C:\Users\Utente\venv\lib\site-packages\torchfile.py", line 386, in read_obj
v = self.read_obj()
File "C:\Users\Utente\venv\lib\site-packages\torchfile.py", line 414, in read_obj
"unknown object type / typeidx: {}".format(typeidx))
torchfile.T7ReaderException: unknown object type / typeidx: -1112529805
Can anyone help me?
I installed Python 3.6.8, pytorch, torchfile, torchvision, etc., and I'm working in a virtualenv.

OSError: Result too large

I'm playing around with Scapy but I can't get it to work. I tried different code samples, but all gave me the same output:
Traceback (most recent call last):
File "<module1>", line 7, in <module>
File "C:\Python26\lib\site-packages\scapy\sendrecv.py", line 357, in srp
s = conf.L2socket(iface=iface, filter=filter, nofilter=nofilter, type=type)
File "C:\Python26\lib\site-packages\scapy\arch\pcapdnet.py", line 313, in __init__
self.outs = dnet.eth(iface)
File "dnet.pyx", line 112, in dnet.eth.__init__
OSError: Result too large
I'm using Python 2.6 with all of Scapy's dependencies installed.
How can I fix this?
