Please forgive my newbness here, but fasttext is not working for me in Python. I am using Anaconda with Python 3.6. My code is as follows (just an example):
import fasttext
model = fasttext.load_model('/home/sproc/share/fastText/model.bin')
print(model.words)
This returns the following error:
Traceback (most recent call last):
File "/media/sf_VBoxShare/LiClipseWorkspace/test/testpack/fasttext.py", line 1, in <module>
import fasttext
File "/media/sf_VBoxShare/LiClipseWorkspace/test/testpack/fasttext.py", line 3, in <module>
model = fasttext.load_model('/home/sproc/share/fastText/model.bin')
AttributeError: module 'fasttext' has no attribute 'load_model'
The same thing happens with cbow and skipgram when trying to create word vectors. I checked the __init__.py file in the .../site-packages/fasttext directory, and it imports said attributes, but they are not part of the model.py module. I'm guessing this has something to do with the shared object file, but I am not sure. Any help is greatly appreciated.
Here is a solution that worked for me when I got the error you are getting:
Import FastText:
from gensim.models.wrappers import FastText
Load the binary:
model = FastText.load_fasttext_format('wiki.simple.bin')
Rename your Python file.
Don't name it fasttext.py. If you name it like this, `import fasttext` will import your own file rather than the library.
You can rename it to 'fast_text.py' or something else.
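If you're not sure which module Python actually picked up, a quick check (plain Python, nothing project-specific):
import fasttext
print(fasttext.__file__)  # if this prints your own script's path, your file name is shadowing the library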
If you install the fastText package instead of the old fasttext, then
import fastText
model = fastText.load_model('/home/sproc/share/fastText/model.bin')
should work as expected.
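Assuming the model loads, a quick sanity check (get_words should be available in the official fastText bindings; the slice is just illustrative):
print(model.get_words()[:10])  # first few entries of the model's vocabulary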
@spencerktm30 I recommend using pyfasttext instead of fasttext, which is no longer actively maintained and has a lot of bugs: link to pyfasttext
Actually, I faced a similar issue when trying to load a pre-trained C++ model, and I had to switch to pyfasttext to get it to work.
So this should hopefully work for you:
>>> from pyfasttext import FastText
>>> model = FastText('/home/sproc/share/fastText/model.bin')
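Queries then work directly on the model object (nearest_neighbors is part of the pyfasttext API; the word and k here are only examples):
>>> model.nearest_neighbors('dog', k=5)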
Rename the file from fasttext.py to another name and it will work.
Apparently there are different fasttext Python libraries out there!
fasttext != fasttext-win
Related
I just recently started looking into the Hugging Face transformers library.
When I tried to get started using the model card code from, e.g., this community model:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
However, I got the following error:
Traceback (most recent call last):
File "test.py", line 2, in <module>
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
File "/Users/Lukas/miniconda3/envs/nlp/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 124, in from_pretrained
"'xlm', 'roberta', 'ctrl'".format(pretrained_model_name_or_path))
ValueError: Unrecognized model identifier in emilyalsentzer/Bio_ClinicalBERT. Should contains one of 'bert', 'openai-gpt', 'gpt2', 'transfo-xl', 'xlnet', 'xlm', 'roberta', 'ctrl'
If I try a different tokenizer such as "baykenney/bert-base-gpt2detector-topp92" I get the following error:
OSError: Model name 'baykenney/bert-base-gpt2detector-topp92' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed 'baykenney/bert-base-gpt2detector-topp92' was a path or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
Did I miss anything to get started? I feel like the model cards indicate that these three lines of code should be enough to get started.
I am using Python 3.7, transformers 2.1.1, and PyTorch 1.5.
Please update your transformers library to at least 2.4.0. You should create a new conda environment and install all your packages directly from PyPI with pip to get the most recent version (currently 2.11.0).
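To confirm which version you ended up with (__version__ is a standard attribute of the package):
import transformers
print(transformers.__version__)  # the community models above need at least 2.4.0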
I've tried to load the pre-trained FastText vectors from fastText - wiki word vectors.
My code is below, and it works well.
from gensim.models import FastText
model = FastText.load_fasttext_format('./wiki.en/wiki.en.bin')
but the warning message is a little annoying.
gensim_fasttext_pretrained_vector.py:13: DeprecationWarning: Call to deprecated `load_fasttext_format` (use load_facebook_vectors (to use pretrained embeddings)
The message says load_fasttext_format will be deprecated, so it will be better to use load_facebook_vectors.
So I decided to change the code. My changed code is below.
from gensim.models import FastText
model = FastText.load_facebook_vectors('./wiki.en/wiki.en.bin')
But, the error occurred, the error message is like this.
Traceback (most recent call last):
File "gensim_fasttext_pretrained_vector.py", line 13, in <module>
model = FastText.load_facebook_vectors('./wiki.en/wiki.en.bin')
AttributeError: type object 'FastText' has no attribute 'load_facebook_vectors'
I can't understand why this happens.
I just changed what the message said to change, but it doesn't work.
If you know anything about this, please let me know.
As always, thanks for your help.
You're almost there; you need to change two things:
First of all, load_facebook_vectors lives in the gensim.models.fasttext module (all lowercase), not on the FastText class.
Second of all, load_facebook_vectors takes the path to the .bin file directly; the datapath helper you may see in gensim's docs is only for files bundled with gensim's own test data, so for your own file you just pass the path.
So, you should do it like so:
from gensim.models import fasttext
wv = fasttext.load_facebook_vectors('./wiki.en/wiki.en.bin')
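Once the vectors load, the usual KeyedVectors queries work on wv (the word here is just an example):
print(wv['king'][:5])                   # first 5 dimensions of one word's vector
print(wv.most_similar('king', topn=3))  # nearest neighbours by cosine similarity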
I have the following code, and I made sure the file's extension and name are correct. However, I still get the error shown below.
I saw another person ask a similar question here on Stack Overflow and read the answer, but it did not help me:
Failed to load a .bin.gz pre trained word2vec
Any suggestions on how to fix this?
Input:
import gensim
word2vec_path = "GoogleNews-vectors-negative300.bin.gz"
word2vec = gensim.models.KeyedVectors.load_word2vec_format(word2vec_path, binary=True)
Output:
OSError: Not a gzipped file (b've')
The problem is that the file you've downloaded is not a gzip file. If you check its size, it may be only a few KB (that is what happened to me when I downloaded it from a GitHub link, because the real file is stored with git-lfs and you only get the small text pointer file). The b've' in the error is consistent with that: a git-lfs pointer starts with the text 'version ...', while a real gzip file starts with the magic bytes 1f 8b.
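A quick way to check what you actually downloaded:
with open('GoogleNews-vectors-negative300.bin.gz', 'rb') as f:
    head = f.read(2)
print(head == b'\x1f\x8b')  # True for real gzip; b've' is the start of a git-lfs text pointer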
Here is an alternate solution to resolve this issue:
Download the model using the below command on your terminal:
wget -c "https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz"
Then, load the model as you would using gensim (you can pass the .bin.gz directly, since gensim decompresses it on the fly, so there is no need to gunzip first):
from gensim import models
w = models.KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin.gz', binary=True)
Hope this helps you!!
Try this (note the import: the loader comes from gensim, not tensorflow; gensim can usually stream straight from an https URL, though downloading the file first is safer):
from gensim import models
word2vec_path = 'https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz'
word2vec = models.KeyedVectors.load_word2vec_format(word2vec_path, binary=True)
It seems I need to use freeze_graph.py (unless I'm mistaken):
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py
This is the graph I want to extract:
C:\tmp\output_graph.pb
I'm looking at freeze_graph.py, but I'm having trouble understanding some of the arguments of the script, and I'm running into an import error.
--input_graph: the graph I want to extract.
--input_checkpoint: I don't think I need this.
--input_binary: I don't think I need this, as my inputs were image file values.
--output_node_names: the labels?
--input_meta_graph: not sure what this is.
I don't see an output argument for a text file or pickle file that will hold all the weights.
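For reference, freeze_graph.py is a thin wrapper around a Python function that takes these same arguments; here is a hedged sketch of a typical call, with placeholder paths and a placeholder output node name, assuming TF 1.x:
from tensorflow.python.tools import freeze_graph

freeze_graph.freeze_graph(
    input_graph='C:/tmp/input_graph.pb',    # GraphDef to freeze (placeholder path)
    input_saver='',                         # no separate Saver definition
    input_binary=True,                      # the .pb is binary, not pbtxt
    input_checkpoint='C:/tmp/model.ckpt',   # checkpoint whose variables get baked in
    output_node_names='final_result',       # placeholder; must match your graph's output op
    restore_op_name='save/restore_all',
    filename_tensor_name='save/Const:0',
    output_graph='C:/tmp/frozen_graph.pb',  # the frozen result, weights included
    clear_devices=True,
    initializer_nodes='')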
I tried running the following line from command prompt:
python3.6 C:\Users\Moondra\tensorflow\tensorflow\python\tools\freeze_graph.py --input-graph=C:\tmp\output_graph.pb
However, I'm getting an import error:
from tensorflow.python.tools import saved_model_util
ImportError: cannot import name 'saved_model_utils'
saved_model_utils seems to be in the same folder, so I'm not sure what the problem is.
Thank you.
Whenever I try to load the trained model of the CNN-based face detector in dlib, I get this error:
detector = dlib.simple_object_detector('mmod_human_face_detector.dat')
Traceback (most recent call last):
File "/home/hasans/Desktop/1/face_recognition1/face_detector.py", line 51, in <module>
detector = dlib.simple_object_detector('mmod_human_face_detector.dat')
RuntimeError: Unsupported version found when deserializing a scan_fhog_pyramid object
How do I get rid of this error?
I don't know where you got your code from, but the CNN-based face detector is used differently, as shown in this official demo.
Initialization looks like:
cnn_face_detection_model = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')
(I used it successfully)
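A minimal usage sketch (hedged: load_rgb_image exists in newer dlib releases; with older ones, any RGB numpy array, e.g. from scikit-image, works; 'face.jpg' is a placeholder):
import dlib

cnn_detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')
img = dlib.load_rgb_image('face.jpg')  # placeholder image path
dets = cnn_detector(img, 1)            # 1 = upsample the image once before detecting
for d in dets:
    print(d.rect, d.confidence)        # each detection is an mmod_rectangle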
Warning: the Python wrapper needed for this was only recently added (18.8.17) and, as of now (3 days later), is only available from git, not from any official release!
The DNN-based face detector is now officially available for use in Python in the latest release, 19.6. Everyone can download it from
https://dlib.net
Cheers!