XGBoostError: Unicode-3 is not supported - python

I am trying to load an XGBClassifier in my Streamlit app from a pickle file.
When I load it and try to predict on new input values, it throws this error:
XGBoostError: [11:25:40] c:\users\administrator\workspace\xgboost-win64_release_1.6.0\src\data\array_interface.h:462: Unicode-3 is not supported.
The entire traceback is:
2022-07-02 11:25:40.046 Uncaught app exception
Traceback (most recent call last):
File "C:\Users\\Anaconda3\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 554, in _run_script
exec(code, module.__dict__)
File "temp.py", line 250, in <module>
st.write(clf.predict(feat_list))
File "C:\Users\\Anaconda3\lib\site-packages\xgboost\sklearn.py", line 1434, in predict
class_probs = super().predict(
File "C:\Users\\Anaconda3\lib\site-packages\xgboost\sklearn.py", line 1049, in predict
predts = self.get_booster().inplace_predict(
File "C:\Users\\Anaconda3\lib\site-packages\xgboost\core.py", line 2102, in inplace_predict
_check_call(
File "C:\Users\\Anaconda3\lib\site-packages\xgboost\core.py", line 203, in _check_call
raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [11:25:40] c:\users\administrator\workspace\xgboost-win64_release_1.6.0\src\data\array_interface.h:462: Unicode-3 is not supported.
I load the model this way:
clf = pickle.load(open('xgb.pkl', "rb"))
Or
clf = xgboost.XGBClassifier(tree_method ="hist", enable_categorical=True)
clf.load_model("model.json")
And I predict using:
clf.predict(feat_list)

I had a similar problem that came with the same XGBoostError. In my case the cause was the dtype of the ndarray, which needed to be object.
Assuming that your feat_list is a numpy.ndarray and that you create it like this:
feat_list = np.array(features)
adding dtype=object:
feat_list = np.array(features, dtype=object)
should do the trick.
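A minimal, self-contained sketch of that workaround (the throwaway model and feature values here are placeholders, and exact dtype handling can vary between XGBoost versions):

import numpy as np
from xgboost import XGBClassifier

# Train a tiny throwaway model just so the example runs end to end.
X = np.random.rand(20, 3)
y = np.random.randint(0, 2, 20)
clf = XGBClassifier(n_estimators=5)
clf.fit(X, y)

# If any feature arrives as a string (common with Streamlit text inputs),
# NumPy infers a unicode dtype such as '<U32', which XGBoost's array
# interface rejects with "Unicode-3 is not supported".
features = [0.1, "0.2", 0.3]
bad = np.array([features])                 # dtype('<U32') -> XGBoostError
good = np.array([features], dtype=object)  # keeps the original Python objects
print(clf.predict(good))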

Related

Calling a pre-trained model from huggingface results in AttributeError

I'm trying to set up a pre-trained model from huggingface. This is the start of my code:
from transformers import pipeline, TFAutoModelForSequenceClassification
model_name = "microsoft/deberta-base-mnli"
text_classifier = pipeline("text-classification", model=model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2, ignore_mismatched_sizes=True, from_pt=True)
But already for this last line I get the following error:
Traceback (most recent call last):
File "/home/liv/PycharmProjects/ZSTC_for_MHC/models/baselines/single_task.py", line 29, in <module>
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2, ignore_mismatched_sizes=True,
File "/home/liv/PycharmProjects/ZSTC_for_MHC/.venv/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 446, in from_pretrained
return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
File "/home/liv/PycharmProjects/ZSTC_for_MHC/.venv/lib/python3.9/site-packages/transformers/modeling_tf_utils.py", line 1972, in from_pretrained
return load_pytorch_checkpoint_in_tf2_model(model, resolved_archive_file, allow_missing_keys=True)
File "/home/liv/PycharmProjects/ZSTC_for_MHC/.venv/lib/python3.9/site-packages/transformers/modeling_tf_pytorch_utils.py", line 122, in load_pytorch_checkpoint_in_tf2_model
logger.info(f"PyTorch checkpoint contains {sum(t.numel() for t in pt_state_dict.values()):,} parameters")
File "/home/liv/PycharmProjects/ZSTC_for_MHC/.venv/lib/python3.9/site-packages/transformers/modeling_tf_pytorch_utils.py", line 122, in <genexpr>
logger.info(f"PyTorch checkpoint contains {sum(t.numel() for t in pt_state_dict.values()):,} parameters")
AttributeError: 'dict' object has no attribute 'numel'
I'm not calling numel myself, and I don't even know which dictionary is meant. Furthermore, I want to use TensorFlow for this, so I don't know whether it is normal that a PyTorch error shows up. Does anyone know how to fix this? Google didn't turn up any solutions.
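For what it's worth, the failing line in the traceback just sums t.numel() over the checkpoint's values, so the error means at least one value in the loaded state dict is itself a dict rather than a tensor. A minimal reproduction of that mechanism (the keys here are hypothetical):

import torch

# A well-formed state dict maps parameter names to tensors;
# a nested dict among the values breaks the parameter count.
pt_state_dict = {
    "weight": torch.zeros(2, 2),
    "nested": {"bias": torch.zeros(2)},  # hypothetical nested entry
}
total = sum(t.numel() for t in pt_state_dict.values())
# AttributeError: 'dict' object has no attribute 'numel'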

VSCode BERT ValueError: Unable to access local path

I have written the code for an entity extraction model using BERT, but when I run the train.py file I get a ValueError.
This is the structure of my code with the configuration file in VSCode; I have downloaded the BERT models from here.
Error
(myenv) PS D:\Transformers\bert-entity-extraction> python src/train.py
Configuration Complete!
Traceback (most recent call last):
File "src/train.py", line 83, in <module>
model = EntityModel(num_tag = num_tag, num_pos = num_pos)
File "D:\Transformers\bert-entity-extraction\src\model.py", line 25, in __init__
self.bert = transformers.BertModel.from_pretrained(config.BASE_MODEL_PATH)
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\modeling_utils.py", line 1080, in from_pretrained
**kwargs,
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\configuration_utils.py", line 427, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\configuration_utils.py", line 492, in get_config_dict
user_agent=user_agent,
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\file_utils.py", line 1289, in cached_path
raise ValueError(f"unable to parse {url_or_filename} as a URL or as a local path")
ValueError: unable to parse D:\Transformers\bert-entity-extraction\input\bert-base-uncased_L-12_H-768_A-12\config.json as a URL or as a local path
How to fix this?
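For context, cached_path in this version of transformers raises exactly this ValueError when its argument is neither a URL nor an existing local file, so the first thing to verify is that config.json actually exists at that path. A quick check, reusing the path from the traceback:

import os

base = r"D:\Transformers\bert-entity-extraction\input\bert-base-uncased_L-12_H-768_A-12"
print(os.path.isfile(os.path.join(base, "config.json")))
# False would explain the error: Google's original BERT checkpoint zips ship
# a bert_config.json, not the config.json that transformers looks for.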

Cannot load BERT from local disk

I am trying to use the Hugging Face transformers API to load a locally downloaded M-BERT model, but it is throwing an exception.
I cloned this repo: https://huggingface.co/bert-base-multilingual-cased
bert = TFBertModel.from_pretrained("input/bert-base-multilingual-cased")
The directory structure is:
But I am getting this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 1277, in from_pretrained
missing_keys, unexpected_keys = load_tf_weights(model, resolved_archive_file, load_weight_prefix)
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 467, in load_tf_weights
with h5py.File(resolved_archive_file, "r") as f:
File "/usr/local/lib/python3.7/dist-packages/h5py/_hl/files.py", line 408, in __init__
swmr=swmr)
File "/usr/local/lib/python3.7/dist-packages/h5py/_hl/files.py", line 173, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 81, in <module>
__main__()
File "train.py", line 59, in __main__
model = create_model(num_classes)
File "/content/drive/My Drive/msc-project/code/model.py", line 26, in create_model
bert = TFBertModel.from_pretrained("input/bert-base-multilingual-cased")
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 1280, in from_pretrained
"Unable to load weights from h5 file. "
OSError: Unable to load weights from h5 file. If you tried to load a TF 2.0 model from a PyTorch checkpoint, please set from_pt=True.
Where am I going wrong?
Thanks in advance.
As was already pointed out in the comments, your from_pretrained param should be either the id of a model hosted on huggingface.co or a local path:
A path to a directory containing model weights saved using
save_pretrained(), e.g., ./my_model_directory/.
See documentation
Looking at your stacktrace, your code seems to run inside:
/content/drive/My Drive/msc-project/code/model.py, so unless your model is in:
/content/drive/My Drive/msc-project/code/input/bert-base-multilingual-cased/ it won't load.
I would also set the path to match the documentation example, i.e.:
bert = TFBertModel.from_pretrained("./input/bert-base-multilingual-cased/")
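The second OSError in the traceback also says it outright: if the cloned folder only contains the PyTorch weights (pytorch_model.bin) and no tf_model.h5, TFBertModel needs from_pt=True. A short sketch combining both suggestions (the directory contents here are an assumption):

import os
from transformers import TFBertModel

model_dir = "./input/bert-base-multilingual-cased/"
print(os.listdir(model_dir))  # look for tf_model.h5 vs. pytorch_model.bin

# If only the PyTorch checkpoint is present, convert it on the fly:
bert = TFBertModel.from_pretrained(model_dir, from_pt=True)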

Key Error When Creating Training Data in TensorFlow Object Detection

I keep getting this error with my new training data. I have tried the example data and that works, but when I use my own data it raises a KeyError. The only difference between my data and the example data is that mine has more classes.
Full Error:
Traceback (most recent call last):
File "object_detection/create_tf_record.py", line 185, in <module>
tf.app.run()
File "C:\Users\edupt\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "object_detection/create_tf_record.py", line 180, in main
image_dir, train_examples)
File "object_detection/create_tf_record.py", line 152, in create_tf_record
tf_example = dict_to_tf_example(data, label_map_dict, image_dir)
File "object_detection/create_tf_record.py", line 97, in dict_to_tf_example
classes.append(label_map_dict[class_name])
KeyError: '300424' <---------- THAT IS THE NAME OF ONE OF THE CLASSES
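The traceback shows the lookup label_map_dict[class_name] failing, which means a class name read from your annotations has no matching entry in the label map (.pbtxt). A hedged sketch of the mismatch, with hypothetical label map contents:

# label_map_dict is built from the .pbtxt label map file.
label_map_dict = {"cat": 1, "dog": 2}  # hypothetical entries
class_name = "300424"                  # class name parsed from an annotation
# label_map_dict[class_name]          # -> KeyError: '300424'

# Guarding the lookup turns the crash into an actionable message:
if class_name not in label_map_dict:
    raise ValueError(f"class {class_name!r} is missing from the label map")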

What is this error raised by multiprocessing of scikit-learn model training?

When I train my models with multiprocessing support, the following error occurs after a while:
Failed to save <type 'numpy.ndarray'> to .npy file:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/numpy_pickle.py", line 271, in save
obj, filename = self._write_array(obj, filename)
File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/numpy_pickle.py", line 231, in _write_array
self.np.save(filename, array)
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 491, in save
pickle_kwargs=pickle_kwargs)
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/format.py", line 584, in write_array
array.tofile(fp)
IOError: 495340 requested and 502 written
It seems there is an IOError due to a lack of disk space, but when I check with df -h, every drive has enough space.
How could I solve this problem?
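One hedged guess worth checking: the failed write ("495340 requested and 502 written") comes from joblib spilling an ndarray to a temporary file, and joblib's temp folder can sit on a small filesystem (for example /dev/shm or a cramped /tmp) that fills up even while df -h shows the main drives as fine. joblib honors the JOBLIB_TEMP_FOLDER environment variable, so redirecting it to a roomier location is a cheap experiment:

import os

# Hypothetical path; set this before the parallel training starts.
os.environ["JOBLIB_TEMP_FOLDER"] = "/path/with/plenty/of/space"

Exporting the variable in the shell before launching the script works just as well.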
