I've tried to load pre-trained FastText vectors from fastext - wiki word vectors.
My code is below, and it works well.
from gensim.models import FastText
model = FastText.load_fasttext_format('./wiki.en/wiki.en.bin')
but, the warning message is a little annoying.
gensim_fasttext_pretrained_vector.py:13: DeprecationWarning: Call to deprecated `load_fasttext_format` (use load_facebook_vectors (to use pretrained embeddings)
The message said, load_fasttext_format will be deprecated so, it will be better to use load_facebook_vectors.
So I decided to changed the code. and My changed code is like below.
from gensim.models import FastText
model = FastText.load_facebook_vectors('./wiki.en/wiki.en.bin')
But, the error occurred, the error message is like this.
Traceback (most recent call last):
File "gensim_fasttext_pretrained_vector.py", line 13, in <module>
model = FastText.load_facebook_vectors('./wiki.en/wiki.en.bin')
AttributeError: type object 'FastText' has no attribute 'load_facebook_vectors'
I couldn't understand why these thing happen.
I just change what the messages said, but it doesn't work.
If you know anything about this, please let me know.
Always, thanks for you guys help.
You're almost there, you need to change two things:
First of all, it's fasttext all lowercase letters, not Fasttext.
Second of all, to use load_facebook_vectors, you need first to create a datapath object before using it.
So, you should do like so:
from gensim.models import fasttext
from gensim.test.utils import datapath
wv = fasttext.load_facebook_vectors(datapath("./wiki.en/wiki.en.bin"))
Related
I am following this tutorial here: https://huggingface.co/transformers/training.html - though, I am coming across an error, and I think the tutorial is missing an import, but i do not know which.
These are my current imports:
# Transformers installation
! pip install transformers
# To install from source instead of the last release, comment the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/transformers.git
! pip install datasets transformers
from transformers import pipeline
Current code:
from datasets import load_dataset
raw_datasets = load_dataset("imdb")
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
inputs = tokenizer(sentences, padding="max_length", truncation=True)
The error:
NameError Traceback (most recent call last)
<ipython-input-9-5a234f114e2e> in <module>()
----> 1 inputs = tokenizer(sentences, padding="max_length", truncation=True)
NameError: name 'sentences' is not defined
This error is because you have not declared sentences. Now you need to
access raw data using:
k = raw_datasets['train']
sentences = k['text']
create a variable
sentences = ["Hello I'm a single sentence",
"And another sentence",
"And the very very last one"]
"As we saw in Preprocessing data, we can prepare the text inputs for the model with the following command (this is an example, not a command you can execute)"
The error states that you do not have a variable called sentences in the scope. I believe the tutorial presumes you already have a list of sentences and are tokenizing it.
Have a look at the documentation The first argument can be either a string or list of string or list of list of strings.
__call__(text: Union[str, List[str], List[List[str]]],...)
I just recently started looking into the huggingface transformer library.
When I tried to get started using the model card code at e.g. community model
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
However, I got the following error:
Traceback (most recent call last):
File "test.py", line 2, in <module>
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
File "/Users/Lukas/miniconda3/envs/nlp/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 124, in from_pretrained
"'xlm', 'roberta', 'ctrl'".format(pretrained_model_name_or_path))
ValueError: Unrecognized model identifier in emilyalsentzer/Bio_ClinicalBERT. Should contains one of 'bert', 'openai-gpt', 'gpt2', 'transfo-xl', 'xlnet', 'xlm', 'roberta', 'ctrl'
If I try a different tokenizer such as "baykenney/bert-base-gpt2detector-topp92" I get the following error:
OSError: Model name 'baykenney/bert-base-gpt2detector-topp92' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed 'baykenney/bert-base-gpt2detector-topp92' was a path or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
Did I miss anything to get started? I feel like the model cards indicate that these three lines of code should should be enough to get started.
I am using Python 3.7 and the transformer library version 2.1.1 and pytorch 1.5.
Please update your transformers library to at least 2.4.0. You should create a new conda environment and install all your packages directly from pypi with pip to get the most recent version (currently 2.11.0).
It seems I need to use freeze_graph.py (unless I"m mistaken)
'https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py'
This is the graph I want to extract:
C:\tmp\output_graph.pb
I'm looking at freeze_graph.py but I"m having trouble understanding some of the
arguments of the script AND I'm ending in an import error.
--input_graph = the graph I want to extract.
--input_checkpoint ? I don't think I need this
--input binary I don't think I need this as my inputs were image file values
--output_node_names The labels?
----input_meta_graph not sure what this is.
I don't see an output argument for a text.file or pickle file that will
hold all weights.
I tried running the following line from command prompt:
python3.6 C:\Users\Moondra\tensorflow\tensorflow\python\tools\freeze_graph.py --input-graph=C:\tmp\output_graph.pb
However, I'm getting an import error:
from tensorflow.python.tools import saved_model_util
importError: cannot import name 'saved_model_utils'
saved_model_utils seems to be in the same folder, so not sure what the problem is.
Thank you.
When ever i tried to load the trained model of
cnn based face_detectorin dlib.i got this error.
detector = dlib.simple_object_detector('mmod_human_face_detector.dat')
Traceback (most recent call last):
File "/home/hasans/Desktop/1/face_recognition1/face_detector.py", line 51, in <module>
detector = dlib.simple_object_detector('mmod_human_face_detector.dat')
RuntimeError: Unsupported version found when deserializing a scan_fhog_pyramid object</br>
how to get rid of this error?
.
I don't know where you got your code from, but the cnn-based face-detector is used differently, as given in this official demo.
Init looks like:
cnn_face_detection_model = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')
(I used it successfully)
Warning: the python-wrapper needed for this was only recently added (18.8.17) and as of now (3 days later) is only available within git, not any official release!
DNN based face detector is now officially available for using in python in its latest release 19.6.Everyone can download it from
https://dlib.net
cheers !!!
Please forgive my newbness here, but fasttext is not working for me on python. I am using anaconda running python 3.6. My code is as follows(just an example):
import fasttext
model = fasttext.load_model('/home/sproc/share/fastText/model.bin')
print(model.words)
This returns the following error:
Traceback (most recent call last):
File "/media/sf_VBoxShare/LiClipseWorkspace/test/testpack/fasttext.py", line 1, in <module>
import fasttext
File "/media/sf_VBoxShare/LiClipseWorkspace/test/testpack/fasttext.py", line 3, in <module>
model = fasttext.load_model('/home/sproc/share/fastText/model.bin')
AttributeError: module 'fasttext' has no attribute 'load_model'
Does the same thing with cbow and skipgram when trying to create word vectors. I check the init.py file from the .../site-packages/fasttext directory and it imports said attributes, but they are not part of the model.py module. I'm guessing this has something to do with the shared object file but I am not sure. Any help is greatly appreciated.
Here is a solution that worked for me when I got the error you are getting;
Import FastText
from gensim.models.wrappers import FastText
Load the binary
model=FastText.load_fasttext_format('wiki.simple.bin')
Rename your python file .
Don't name it as fasttext.py .If your name it like this , what you import by "import fasttext.py " will be your own file.
You can rename it as 'fast_text.py' or something else .
If you install fastText package instead of the old fasttext, then
import fastText
model = fastText.load_model('/home/sproc/share/fastText/model.bin')
should work as expected.
#spencerktm30 I recommend you using pyfasttext instead of fasttext which is no longer active and it has a lot of bugs. link to pyfasttext
Actually, I faced similar issue when trying to load a C++ pre trained model and I had to switch to using pyfasttext to get it to work.
So this should hopefully work for you:
>>> from pyfasttext import FastText
>>> model = FastText('/home/sproc/share/fastText/model.bin')
Rename the file from fasttext.py to another name, it will work.
Apparently there are different fasttext python libraries out there!
fasttext != fasttext-win