Getting started: Huggingface Model Cards - python

I just recently started looking into the huggingface transformers library.
When I tried to get started using the model card code from e.g. this community model:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
However, I got the following error:
Traceback (most recent call last):
File "test.py", line 2, in <module>
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
File "/Users/Lukas/miniconda3/envs/nlp/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 124, in from_pretrained
"'xlm', 'roberta', 'ctrl'".format(pretrained_model_name_or_path))
ValueError: Unrecognized model identifier in emilyalsentzer/Bio_ClinicalBERT. Should contains one of 'bert', 'openai-gpt', 'gpt2', 'transfo-xl', 'xlnet', 'xlm', 'roberta', 'ctrl'
If I try a different tokenizer such as "baykenney/bert-base-gpt2detector-topp92" I get the following error:
OSError: Model name 'baykenney/bert-base-gpt2detector-topp92' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed 'baykenney/bert-base-gpt2detector-topp92' was a path or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
Did I miss anything to get started? I feel like the model cards indicate that these three lines of code should be enough to get started.
I am using Python 3.7, transformers library version 2.1.1 and pytorch 1.5.

Please update your transformers library to at least 2.4.0. You should create a new conda environment and install all your packages directly from PyPI with pip to get the most recent version (currently 2.11.0).
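After recreating the environment, a quick way to confirm the fix is to check the installed version and re-run the model card snippet (a minimal sketch; the exact version string depends on when you install):
import transformers
print(transformers.__version__)  # per the answer above, community models need >= 2.4.0

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")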


Hugging Face: NameError: name 'sentences' is not defined

I am following this tutorial: https://huggingface.co/transformers/training.html - though I am coming across an error, and I think the tutorial is missing an import, but I do not know which.
These are my current imports:
# Transformers installation
! pip install transformers
# To install from source instead of the last release, comment the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/transformers.git
! pip install datasets transformers
from transformers import pipeline
Current code:
from datasets import load_dataset
raw_datasets = load_dataset("imdb")
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
inputs = tokenizer(sentences, padding="max_length", truncation=True)
The error:
NameError Traceback (most recent call last)
<ipython-input-9-5a234f114e2e> in <module>()
----> 1 inputs = tokenizer(sentences, padding="max_length", truncation=True)
NameError: name 'sentences' is not defined
This error occurs because you have not declared sentences. You now need to either
access the raw data:
k = raw_datasets['train']
sentences = k['text']
or create the variable yourself:
sentences = ["Hello I'm a single sentence",
             "And another sentence",
             "And the very very last one"]
"As we saw in Preprocessing data, we can prepare the text inputs for the model with the following command (this is an example, not a command you can execute)"
The error states that you do not have a variable called sentences in scope. I believe the tutorial presumes you already have a list of sentences and are tokenizing it.
Have a look at the documentation: the first argument can be a string, a list of strings, or a list of lists of strings.
__call__(text: Union[str, List[str], List[List[str]]], ...)
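Putting the answers together, a minimal end-to-end sketch (assuming the imdb dataset from the tutorial; tokenizing the full training split may be slow):
from datasets import load_dataset
from transformers import AutoTokenizer

raw_datasets = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# Define `sentences` before tokenizing, e.g. from the training split:
sentences = raw_datasets["train"]["text"]
inputs = tokenizer(sentences, padding="max_length", truncation=True)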

gensim - fasttext - Why `load_facebook_vectors` doesn't work?

I've tried to load pre-trained FastText vectors from fasttext - wiki word vectors.
My code is below, and it works well.
from gensim.models import FastText
model = FastText.load_fasttext_format('./wiki.en/wiki.en.bin')
but, the warning message is a little annoying.
gensim_fasttext_pretrained_vector.py:13: DeprecationWarning: Call to deprecated `load_fasttext_format` (use load_facebook_vectors (to use pretrained embeddings)
The message says load_fasttext_format will be deprecated, so it would be better to use load_facebook_vectors.
So I decided to change the code. My changed code is below.
from gensim.models import FastText
model = FastText.load_facebook_vectors('./wiki.en/wiki.en.bin')
But an error occurred; the error message looks like this.
Traceback (most recent call last):
File "gensim_fasttext_pretrained_vector.py", line 13, in <module>
model = FastText.load_facebook_vectors('./wiki.en/wiki.en.bin')
AttributeError: type object 'FastText' has no attribute 'load_facebook_vectors'
I couldn't understand why this happens.
I just changed what the message said, but it doesn't work.
If you know anything about this, please let me know.
As always, thanks for your help.
You're almost there; you need to change two things:
First of all, it's fasttext with all lowercase letters, not FastText: load_facebook_vectors is a function of the gensim.models.fasttext module, not a method of the FastText class.
Second of all, load_facebook_vectors just takes the path to the .bin file. (The gensim docs wrap their examples in datapath from gensim.test.utils, but that helper resolves names inside gensim's bundled test-data directory, so for your own file pass the path directly.)
So, you should do it like so:
from gensim.models import fasttext
wv = fasttext.load_facebook_vectors('./wiki.en/wiki.en.bin')
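If the load succeeds, wv behaves like gensim's keyed vectors, so a quick sanity check might look like this (a sketch; assumes the .bin file loaded correctly):
print(wv['hello'].shape)                 # embedding vector for a single word
print(wv.most_similar('hello', topn=3))  # nearest neighbours in the vector space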

error PlaceholderWithDefault when converting Tensorflow model to CoreML model

I am trying to convert a TensorFlow model I trained with TensorFlow for Poets to a CoreML model so I can run it on my iPhone. But when I tried to convert it using this Python script:
import tfcoreml as tf_converter

tf_model_path = 'retrained_graph.pb'
mlmodel_path = 'mobilenet_v1_1.0_224.mlmodel'

mlmodel = tf_converter.convert(
    tf_model_path=tf_model_path,
    mlmodel_path=mlmodel_path,
    output_feature_names=['MobilenetV1/Predictions/Softmax:0'],
    input_name_shape_dict={'input:0': [1, 224, 224, 3]},
    image_input_names=['input:0'],
    red_bias=-1,
    green_bias=-1,
    blue_bias=-1,
    image_scale=2.0/255.0)
It gives me this error:
dyld: warning, LC_RPATH $ORIGIN/../../_solib_darwin_x86_64/_U_S_Stensorflow_Spython_C_Upywrap_Utensorflow_Uinternal.so___Utensorflow in /Library/Python/2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so being ignored in restricted program because it is a relative path
2018-01-04 19:47:30.977648: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX
Traceback (most recent call last):
File "co.py", line 15, in <module>
image_scale = 2.0/255.0)
File "/Library/Python/2.7/site-packages/tfcoreml/_tf_coreml_converter.py", line 478, in convert
predicted_probabilities_output=predicted_probabilities_output)
File "/Library/Python/2.7/site-packages/tfcoreml/_tf_coreml_converter.py", line 143, in _convert_pb_to_mlmodel
_check_unsupported_ops(OPS, output_feature_names)
File "/Library/Python/2.7/site-packages/tfcoreml/_tf_coreml_converter.py", line 111, in _check_unsupported_ops
','.join(unsupported_op_types)))
NotImplementedError: Unsupported Ops of type: PlaceholderWithDefault
I am using a Mac with macOS Sierra.
Hope someone can help.
Greetings Sieuwe
EDIT:
I eventually got it working. I am not 100% sure what fixed it, but it was probably something to do with the fact that I have 2 Python versions. What I did was:
Uninstalling tensorflow and tfcoreml from pip and from pip3.
Installing tfcoreml and tensorflow with pip, not pip3.
Uninstalling and reinstalling numpy with pip (this gave me some errors, but I eventually got it uninstalled).
If it still won't work, maybe try to build tfcoreml and tensorflow from source.
I haven't used tfcoreml yet, but the error "Unsupported Ops of type: PlaceholderWithDefault" means that your TF graph uses an operation that is not supported by the converter.
If you look at the list of supported ops at https://github.com/tf-coreml/tf-coreml you'll see that Placeholder is supported but not PlaceholderWithDefault.
In the TensorFlow for Poets 2 tutorial there is a step that lets you optimize the model for mobile. I'm not 100% sure but this might replace the PlaceholderWithDefault with a regular Placeholder. It's worth doing anyway. :-)
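If you want to try that optimization step programmatically, here is a rough sketch using TF 1.x's optimize_for_inference library (the node names are taken from the converter call above and may differ in your graph; the codelab itself may use a different tool):
import tensorflow as tf
from tensorflow.python.tools import optimize_for_inference_lib

# Load the frozen graph produced by TensorFlow for Poets.
graph_def = tf.GraphDef()
with tf.gfile.GFile('retrained_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Strip training-only ops; this can replace PlaceholderWithDefault inputs
# with plain Placeholder nodes.
optimized = optimize_for_inference_lib.optimize_for_inference(
    graph_def,
    ['input'],                            # input node names (no ':0' suffix)
    ['MobilenetV1/Predictions/Softmax'],  # output node names
    tf.float32.as_datatype_enum)

with tf.gfile.GFile('optimized_graph.pb', 'wb') as f:
    f.write(optimized.SerializeToString())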
I see that you've submitted a GitHub ticket for this issue as well: https://github.com/tf-coreml/tf-coreml/issues/99. I had the same issue you saw, but by following the responder's advice I got it to work. Can you update your answer if that helped?

Dlib, Python, face detection with neural network

Whenever I try to load the trained model of the CNN-based face detector in dlib, I get this error:
detector = dlib.simple_object_detector('mmod_human_face_detector.dat')
Traceback (most recent call last):
File "/home/hasans/Desktop/1/face_recognition1/face_detector.py", line 51, in <module>
detector = dlib.simple_object_detector('mmod_human_face_detector.dat')
RuntimeError: Unsupported version found when deserializing a scan_fhog_pyramid object
How do I get rid of this error?
I don't know where you got your code from, but the CNN-based face detector is used differently, as shown in this official demo.
Init looks like:
cnn_face_detection_model = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')
(I used it successfully)
Warning: the Python wrapper needed for this was only recently added (18.8.17) and as of now (3 days later) is only available within git, not in any official release!
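For completeness, a short detection sketch modeled on dlib's official demo (the image path here is hypothetical, and load_rgb_image needs a recent dlib):
import dlib

cnn_face_detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')
img = dlib.load_rgb_image('face.jpg')  # hypothetical input image
dets = cnn_face_detector(img, 1)       # 1 = number of upsampling passes
for d in dets:
    print(d.rect, d.confidence)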
The DNN-based face detector is now officially available for use in Python in dlib's latest release, 19.6. Everyone can download it from
https://dlib.net
Cheers!

Fasttext for Python - module 'fasttext' has no attribute 'load_model'

Please forgive my newbness here, but fasttext is not working for me in Python. I am using anaconda running Python 3.6. My code is as follows (just an example):
import fasttext
model = fasttext.load_model('/home/sproc/share/fastText/model.bin')
print(model.words)
This returns the following error:
Traceback (most recent call last):
File "/media/sf_VBoxShare/LiClipseWorkspace/test/testpack/fasttext.py", line 1, in <module>
import fasttext
File "/media/sf_VBoxShare/LiClipseWorkspace/test/testpack/fasttext.py", line 3, in <module>
model = fasttext.load_model('/home/sproc/share/fastText/model.bin')
AttributeError: module 'fasttext' has no attribute 'load_model'
It does the same thing with cbow and skipgram when trying to create word vectors. I checked the __init__.py file from the .../site-packages/fasttext directory and it imports said attributes, but they are not part of the model.py module. I'm guessing this has something to do with the shared object file, but I am not sure. Any help is greatly appreciated.
Here is a solution that worked for me when I got the error you are getting.
Import FastText:
from gensim.models.wrappers import FastText
Load the binary:
model = FastText.load_fasttext_format('wiki.simple.bin')
Rename your Python file.
Don't name it fasttext.py. If you name it like this, what you import with import fasttext will be your own file. (Note the traceback above: the failing script is itself called fasttext.py.)
You can rename it to fast_text.py or something else.
If you install the fastText package instead of the old fasttext, then
import fastText
model = fastText.load_model('/home/sproc/share/fastText/model.bin')
should work as expected.
@spencerktm30 I recommend using pyfasttext instead of fasttext, which is no longer active and has a lot of bugs: link to pyfasttext.
Actually, I faced a similar issue when trying to load a C++ pre-trained model and I had to switch to pyfasttext to get it to work.
So this should hopefully work for you:
>>> from pyfasttext import FastText
>>> model = FastText('/home/sproc/share/fastText/model.bin')
Rename the file from fasttext.py to another name; it will work.
Apparently there are different fasttext Python libraries out there!
fasttext != fasttext-win
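Since several answers point at the same cause, a quick diagnostic sketch to confirm which module import fasttext actually resolves to (run it from an interactive session or a differently named script):
import fasttext
print(fasttext.__file__)
# If this prints your own script's path instead of .../site-packages/fasttext/...,
# your local file is shadowing the installed package; rename it.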
