cannot import name 'ESMForMaskedLM' from 'transformers' on Google colab - python

I am fine-tuning Facebook's ESM transformer with a FASTA file of sequences. However, when running the cell I get ImportError: cannot import name 'ESMForMaskedLM' from 'transformers'. I have been following the Hugging Face model page, but I haven't managed to make the import work. I am using Google Colab. Help is much appreciated.
The code:
!pip install transformers
from transformers import ESMForMaskedLM, ESMTokenizer, pipeline
tokenizer = ESMTokenizer.from_pretrained("facebook/esm-1b", do_lower_case=False)
model = ESMForMaskedLM.from_pretrained("facebook/esm-1b")
unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
unmasker('QERLKSIVRILE<mask>SLGYNIVAT')
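The error usually means that the installed transformers release does not expose a class with that name at all. As a hedged sketch (not a confirmed fix): upgrading transformers and importing the ESM classes under the spelling used in recent releases (EsmForMaskedLM / EsmTokenizer) is worth trying; note that the exact checkpoint id on the Hub may also differ from facebook/esm-1b.
!pip install -U transformers
# restart the Colab runtime after upgrading, then:
import transformers
print(transformers.__version__)  # ESM support requires a reasonably recent release
from transformers import EsmForMaskedLM, EsmTokenizer, pipeline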

Related

tokenizer.push_to_hub(repo_name) is not working

I'm trying to push my tokenizer to my Hugging Face repo; it consists of the model's vocab.json (I'm making a speech recognition model).
My code:
vocab_dict["|"] = vocab_dict[" "]
del vocab_dict[" "]
vocab_dict["[UNK]"] = len(vocab_dict)
vocab_dict["[PAD]"] = len(vocab_dict)
len(vocab_dict)
import json
with open('vocab.json', 'w') as vocab_file:
    json.dump(vocab_dict, vocab_file)
from transformers import Wav2Vec2CTCTokenizer
tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("./", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
from huggingface_hub import login
login('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')
repo_name = "Foxasdf/ArabicTextToSpeech"
add_to_git_credential=True
tokenizer.push_to_hub(repo_name)
The tokenizer.push_to_hub(repo_name) call is giving me this error:
TypeError: create_repo() got an unexpected keyword argument 'organization'
I have logged in to my Hugging Face account using
from huggingface_hub import notebook_login
notebook_login()
but the error is still the same.
Here's a link to my Colab notebook; you can see the full code and the error there: https://colab.research.google.com/drive/11tkQ85SfaT6U_1PXDNwk0Q6qogw2r2sw?hl=ar&hl=en&authuser=0#scrollTo=WkbZ_Wcidq8Z
I have the same problem. It is somehow associated with the transformers version - I have 4.6. When I change the environment to one with transformers 4.11.3, the problem is that the code tries to clone a repository that I have not yet created, and I get the error "Remote repository not found ...".
I checked further and it looks like an issue with the version of the huggingface_hub library - when it is downgraded to 0.10.1 it should work.
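A hedged sketch of that workaround in a Colab cell (assuming the rest of the notebook stays unchanged):
!pip install huggingface_hub==0.10.1
# restart the runtime, then log in again and retry the push
tokenizer.push_to_hub(repo_name)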

Problem building tensorflow model from huggingface weights

I need to work with the pretrained BERT model ('dbmdz/bert-base-italian-xxl-cased') from Hugging Face with TensorFlow (at this link).
After reading this on the website,
Currently only PyTorch-Transformers compatible weights are available. If you need access to TensorFlow checkpoints, please raise an issue!
I raised the issue and was promptly given a download link to an archive containing the following files:
$ ls bert-base-italian-xxl-cased/
config.json model.ckpt.index vocab.txt
model.ckpt.data-00000-of-00001 model.ckpt.meta
I'm now trying to load the model and work with it, but everything I've tried has failed.
I tried following this suggestion from a Hugging Face discussion thread:
bert_folder = str(Config.MODELS_CONFIG.BERT_CHECKPOINT_DIR) # folder in which I have the files extracted from the archive
from transformers import BertConfig, TFBertModel
config = BertConfig.from_pretrained(bert_folder) # this gets loaded correctly
After this point I tried several combinations to load the model, always unsuccessfully.
E.g.:
model = TFBertModel.from_pretrained("../../models/pretrained/bert-base-italian-xxl-cased/model.ckpt.index", config=config)
model = TFBertModel.from_pretrained("../../models/pretrained/bert-base-italian-xxl-cased/model.ckpt.index", config=config, from_pt=True)
model = TFBertModel.from_pretrained("../../models/pretrained/bert-base-italian-xxl-cased", config=config, local_files_only=True)
Every attempt results in this error:
404 Client Error: Not Found for url: https://huggingface.co/models/pretrained/bert-base-italian-xxl-cased/model.ckpt.index/resolve/main/tf_model.h5
...
...
OSError: Can't load weights for '../../models/pretrained/bert-base-italian-xxl-cased/model.ckpt.index'. Make sure that:
- '../../models/pretrained/bert-base-italian-xxl-cased/model.ckpt.index' is a correct model identifier listed on 'https://huggingface.co/models'
- or '../../models/pretrained/bert-base-italian-xxl-cased/model.ckpt.index' is the correct path to a directory containing a file named one of tf_model.h5, pytorch_model.bin.
So my question is: how can I load this pre-trained BERT model from those files and use it in TensorFlow?
You can try the following snippet to load dbmdz/bert-base-italian-xxl-cased in TensorFlow.
from transformers import AutoTokenizer, TFBertModel
model_name = "dbmdz/bert-base-italian-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFBertModel.from_pretrained(model_name)
If you want to load from the given TensorFlow checkpoint, you could try something like this:
model = TFBertModel.from_pretrained("../../models/pretrained/bert-base-italian-xxl-cased/model.ckpt.index", config=config, from_tf=True)
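Once either variant loads, a minimal usage sketch (the sentence is just an illustrative example) would be:
inputs = tokenizer("Ciao, come stai?", return_tensors="tf")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)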

I can't create the body of ResNet18 with fastai

I'm trying to build the body of ResNet18 with this code:
from fastai.vision.data import create_body
from fastai.vision import models
from torchvision.models.resnet import resnet18
from fastai.vision.models.unet import DynamicUnet
import torch
def build_res_unet(n_input=1, n_output=2, size=256):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    body = create_body(resnet18, n_in=n_input, pretrained=True, cut=-2)
    net_G = DynamicUnet(body, n_output, (size, size)).to(device)
    return net_G

net_G = build_res_unet(n_input=1, n_output=2, size=256)
but I keep getting an error:
TypeError: create_body() got an unexpected keyword argument 'n_in'
But in the fastai docs the n_in parameter is present.
How can I create the body? Am I missing something?
I tested the code on my local machine and it runs perfectly; maybe there is some problem on Google Colab. I will update this answer if I find a way to make it run on Colab.
EDIT: I solved the problem by adding !pip install fastai==2.4 on Google Colab; the fastai version preinstalled on Colab was very old.
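In a Colab cell that amounts to pinning the fastai version before running the code above (restart the runtime after installing so the new version is picked up):
!pip install fastai==2.4
import fastai
print(fastai.__version__)  # should now report 2.4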

Getting 'Create Version failed. Bad model detected with error' on AI platform when trying to create a custom model on Google Cloud AI platform

I am trying to deploy a custom model on AI Platform. I have followed the steps mentioned in the Google documentation: https://cloud.google.com/ai-platform/prediction/docs/deploying-models#global-endpoint.
The saved model is stored in Google Cloud Storage and was trained with Python 3.7.
These are the gcloud commands used to deploy:
gcloud ai-platform models create title_topic_custom \
--regions=europe-west1 --enable_logging
MODEL_DIR="gs://ai_platform_custom/SavedModel"
VERSION_NAME="V3"
MODEL_NAME="title_topic_custom"
CUSTOM_CODE_PATH="gs://ai_platform_custom/SavedModel/my_custom_code-0.1.tar.gz"
PREDICTOR_CLASS="predictor.py.MyPredictor"
gcloud beta ai-platform versions create $VERSION_NAME \
--model=$MODEL_NAME \
--origin=$MODEL_DIR \
--runtime-version=2.1 \
--python-version=3.7 \
--machine-type=mls1-c1-m2 \
--package-uris=$CUSTOM_CODE_PATH \
--prediction-class=$PREDICTOR_CLASS
I get the following error after executing those commands:
Using endpoint [https://ml.googleapis.com/]
Creating version (this might take a few minutes)......failed.
ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: "There was a problem processing the user code: predictor.py.MyPredictor cannot be found. Please make sure (1) prediction_class is the fully qualified function name, and (2) it uses the correct package name as provided by the package_uris: ['gs://ai_platform_custom/SavedModel/my_custom_code-0.1.tar.gz'] (Error code: 4)"
The predictor code is as follows:
%%writefile predictor.py
import os
import spacy
import numpy as np
import joblib
import tensorflow as tf
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
class MyPredictor(object):
    def __init__(self, model, topic_encoder):
        self._model = model
        self._nlp = spacy.load('en_core_web_sm')
        self._stopwords = stopwords.words('english')
        self._topic_encoder = topic_encoder

    def predict(self, instances, **kwargs):
        inputs = np.asarray(instances)
        inputs_t = [' '.join([i for i in x.split() if i not in self._stopwords]) for x in inputs]
        preprocessed_inputs = [' '.join([i.lemma_ for i in self._nlp(x)]) for x in inputs_t]
        outputs = self._model.predict(preprocessed_inputs)
        return [self._topic_encoder[key] for key in np.argmax(outputs, axis=1)]

    @classmethod
    def from_path(cls, model_dir):
        model_path = os.path.join(model_dir)
        model = tf.keras.models.load_model(model_path)
        topic_encoder = {0:'topic1', 1:'topic2', 3:'topic3'}
        return cls(model, topic_encoder)
This is the setup file:
from setuptools import setup
setup(
    name='my-custom-code',
    version='0.1',
    install_requires=['nltk', 'spacy', 'joblib'],
    scripts=['predictor.py'])
Any workarounds?
Try using
PREDICTOR_CLASS="predictor.MyPredictor" instead of "predictor.py.MyPredictor". The prediction class must be the fully qualified name: the module (without the .py extension) followed by the class name.
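Applied to the commands from the question, only the PREDICTOR_CLASS variable changes:
PREDICTOR_CLASS="predictor.MyPredictor"
gcloud beta ai-platform versions create $VERSION_NAME \
--model=$MODEL_NAME \
--origin=$MODEL_DIR \
--runtime-version=2.1 \
--python-version=3.7 \
--machine-type=mls1-c1-m2 \
--package-uris=$CUSTOM_CODE_PATH \
--prediction-class=$PREDICTOR_CLASS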

ValueError while deploying tensorflow model to Amazon SageMaker

I want to deploy my trained TensorFlow model to Amazon SageMaker. I am following the official guide here: https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/ to deploy my model using a Jupyter notebook.
But when I run this code:
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.t2.medium')
It gives me the following error message:
ValueError: Error hosting endpoint sagemaker-tensorflow-2019-08-07-22-57-59-547: Failed Reason: The image '520713654638.dkr.ecr.us-west-1.amazonaws.com/sagemaker-tensorflow:1.12-cpu-py3 ' does not exist.
I think the tutorial does not tell me to create an image, and I do not know what to do.
import boto3, re
from sagemaker import get_execution_role
role = get_execution_role()
# make a tar ball of the model data files
import tarfile
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('export', recursive=True)
# create a new s3 bucket and upload the tarball to it
import sagemaker
sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')
from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role=role,
                                  framework_version='1.12',
                                  entry_point='train.py',
                                  py_version='py3')
%%time
# here I fail to deploy the model and get the error message
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')
https://github.com/aws/sagemaker-python-sdk/issues/912#issuecomment-510226311
As mentioned in the issue:
Python 3 isn't supported using the TensorFlowModel object, as the container uses the TensorFlow serving api library in conjunction with the GRPC client to handle making inferences, however the TensorFlow serving api isn't supported in Python 3 officially, so there are only Python 2 versions of the containers when using the TensorFlowModel object.
If you need Python 3 then you will need to use the Model object defined in #2 above. The inference script format will change if you need to handle pre and post processing. https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing.
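A minimal sketch of that alternative, assuming the 1.x SageMaker Python SDK, where the TensorFlow Serving based container is exposed as sagemaker.tensorflow.serving.Model (unlike the TensorFlowModel class used above, it supports Python 3 models):
from sagemaker.tensorflow.serving import Model
# reuses the role and sagemaker_session objects defined in the question's code
sagemaker_model = Model(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                        role=role,
                        framework_version='1.12')
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')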
