BERT tokenizer and model without transformers library

I'm trying to use both BertTokenizer and BertForTokenClassification offline. The goal of the project is to use these pretrained models and apply them to my own dataset for an NLP task. Due to network and security limitations, I am not able to install the transformers library from HuggingFace. I am only able to use PyTorch.
At this point, I have downloaded and saved the following bert-base-uncased files from the HuggingFace website to a local directory:
config.json
pytorch_model.bin
vocab.txt
I've used the transformers library before, so I'm familiar with initializing the models from local files using something like BertTokenizer.from_pretrained('/path/to/local'). However, since I'm not able to install the package and call the model classes, I don't know how to use these downloaded local files in a similar manner. How do I use these local files to use BertTokenizer and BertForTokenClassification?
I've been instructed to use the following link to implement this: https://pytorch.org/tutorials/beginner/saving_loading_models.html
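One possible starting point for the tokenizer half of this question is sketched below, assuming the standard WordPiece scheme used by bert-base-uncased: lowercase the text, strip accents, split on whitespace and punctuation, then greedily match the longest sub-word found in vocab.txt. The function names (load_vocab, basic_tokenize, wordpiece, encode) are hypothetical helpers written for this illustration, not part of any library, and the sketch omits details of the real BertTokenizer such as CJK handling, truncation, and padding.

```python
import unicodedata


def load_vocab(vocab_path):
    """Map each token in vocab.txt (one token per line) to its line index."""
    with open(vocab_path, encoding="utf-8") as f:
        return {line.rstrip("\n"): idx for idx, line in enumerate(f)}


def _is_punctuation(ch):
    """Treat ASCII symbols and Unicode punctuation as separate tokens."""
    cp = ord(ch)
    if 33 <= cp <= 47 or 58 <= cp <= 64 or 91 <= cp <= 96 or 123 <= cp <= 126:
        return True
    return unicodedata.category(ch).startswith("P")


def basic_tokenize(text):
    """Lowercase, strip accents, and split on whitespace and punctuation."""
    text = unicodedata.normalize("NFD", text.lower())
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    words, current = [], []
    for ch in text:
        if ch.isspace() or _is_punctuation(ch):
            if current:
                words.append("".join(current))
                current = []
            if not ch.isspace():
                words.append(ch)  # punctuation becomes its own token
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def wordpiece(word, vocab, unk="[UNK]", max_chars=100):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    if len(word) > max_chars:
        return [unk]
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def encode(text, vocab):
    """Return (tokens, ids) for one sentence, wrapped in [CLS] ... [SEP]."""
    tokens = ["[CLS]"]
    for word in basic_tokenize(text):
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens, [vocab[t] for t in tokens]
```

With the downloaded files, usage would look like vocab = load_vocab('/path/to/local/vocab.txt') followed by tokens, ids = encode('some text', vocab). For the model half, the downloaded .bin weights file is a regular PyTorch checkpoint, so torch.load(path, map_location='cpu') should return a dictionary mapping parameter names to tensors. Without transformers, the BERT architecture and the token-classification head would have to be re-implemented as torch.nn modules whose parameter names match that dictionary before calling load_state_dict, which is the pattern the linked PyTorch tutorial describes.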

Related

How do you locally load model.tar.gz file from Sagemaker?

I'm new to SageMaker and I trained a classifier model with the built-in XGBoost algorithm. It saved a "model.tar.gz" to S3. I downloaded the file because I was planning to deploy the model elsewhere. To experiment, I started by loading the file locally. I tried this code:
import pickle as pkl
import tarfile

# Unpack the SageMaker archive, then unpickle the model file it contains
with tarfile.open('model.tar.gz', 'r:gz') as t:
    t.extractall()
with open('xgboost-model', 'rb') as f:
    model = pkl.load(f)
But it's only giving me this error
XGBoostError: [13:32:18] /opt/concourse/worker/volumes/live/7a2b9f41-3287-451b-6691-43e9a6c0910f/volume/xgboost-split_1619728204606/work/src/learner.cc:922: Check failed: header == serialisation_header_:
If you are loading a serialized model (like pickle in Python) generated by older
XGBoost, please export the model by calling `Booster.save_model` from that version
first, then load it back in current version. There's a simple script for helping
the process.
So I tried using the Booster.save_model function in the SageMaker notebook, but it doesn't work, nor does pickling the trained model.
I also tried this code
import xgboost as xgb
model = xgb.Booster()
model.load_model('xgboost-model')
but it's giving me this error
XGBoostError: std::bad_alloc
Any help would be greatly appreciated.

I found the answer to my question. Apparently, the SageMaker environment uses an old build of XGBoost, around version 0.9. The XGBoost team makes frequent changes to the library, and AWS had not kept up with them.
That said, I was able to run the code below after downgrading XGBoost in my local environment from 1.7 to 0.9, and it works like a charm.
import pickle as pkl
import tarfile
with tarfile.open('model.tar.gz', 'r:gz') as t:
    t.extractall()
with open('xgboost-model', 'rb') as f:
    model = pkl.load(f)
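As a small addition to the extraction step, here is a hedged sketch of unpacking the archive into a chosen directory while refusing entries that would land outside it. The helper name extract_model_archive is hypothetical; only the standard-library os and tarfile modules are used.

```python
import os
import tarfile


def extract_model_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return the extracted file paths."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getmembers()
        for member in members:
            # Reject entries such as ../../etc/passwd that escape dest_dir
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest_root)
    return [os.path.join(dest_root, m.name) for m in members if m.isfile()]
```

The returned list should contain the path to the xgboost-model file, which can then be opened in binary mode and loaded with an XGBoost version that matches the one used for training.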

How do I use YOLOv5 without the internet for kaggle competition

I am attempting to use YOLOv5 in my Python code for a Kaggle competition. The catch is that the competition does not allow internet access. I have already loaded the yolov5 file into the Kaggle notebook, but I am unsure how to call it in my code.
I've tried loading it in the following way but keep getting errors.
import torch
torch.save('yolov5x6.pt', 'yolov5')
yolov5x6_model = torch.load('yolov5')

YOLOv5 is a later model in the YOLO family. The original YOLO was created by pjreddie and implemented in Darknet, a neural network framework written in C; YOLOv5 itself is implemented in PyTorch.
It is an object detection model that can be trained to recognise objects in images or videos.
If you just want to detect everyday objects, you can run inference on images or videos using Python with the trained weights and config file. You will find these files under the pretrained checkpoints section at the following link.
https://github.com/ultralytics/yolov5
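For the offline case in the question, one approach is to point torch.hub.load at a local copy of the repository instead of GitHub. The sketch below assumes a copy of the ultralytics/yolov5 repo (containing hubconf.py) and the yolov5x6.pt weights are both already present on disk, and that the repo's Python dependencies are installed. The function name and both paths are hypothetical.

```python
import os


def load_local_yolov5(repo_dir, weights_path):
    """Load YOLOv5 from a local repo copy and a local weights file."""
    if not os.path.isdir(repo_dir):
        raise FileNotFoundError(f"YOLOv5 repo not found: {repo_dir}")
    if not os.path.isfile(weights_path):
        raise FileNotFoundError(f"weights file not found: {weights_path}")
    import torch  # imported here so the path checks above run first

    # source="local" makes torch.hub read hubconf.py from repo_dir
    # instead of fetching the repository from GitHub
    return torch.hub.load(repo_dir, "custom", path=weights_path, source="local")
```

A call might look like model = load_local_yolov5('/kaggle/input/yolov5', '/kaggle/input/yolov5/yolov5x6.pt'), where both paths are placeholders for wherever the files were uploaded.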

converting pretrained tensorflow models for tensorflow serving

I'm trying to use TensorFlow Serving. However, none of the pretrained models available for download (such as those from the TF detection zoo) have any files in the saved_models/variables directory that the serving model requires.
How do you create the files required in the saved_models/variables directory using the pretrained models available from the detection model zoo?
There is some information from the official documentation, but it doesn't cover my use case of converting a pretrained model to be served.
I've also tried the TensorFlow Serving examples. However, most of the existing documentation uses the ResNet implementation as an example, and the pretrained ResNet model has been removed by TensorFlow. This is the link that the tutorials use; note that there's no direct link to download the models. As an aside, the Python examples in the TensorFlow Serving repo don't work with TensorFlow 2.0.
It appears that this link may be useful in the conversion: https://github.com/tensorflow/models/issues/1988

OK, as of the time of writing, the object detection tutorials only support TensorFlow 1.12.0.
It's a little difficult to do this because there are several layers involved, but you need to:
Clone the TensorFlow model zoo repository.
Patch models/research/object_detection/exporter.py according to these instructions. Alternatively, you can apply this patch, which implements those instructions.
Follow the object detection installation instructions as found here in your cloned repo. It's important to both follow the protobuf compilation steps AND update your Python path for the slim libraries.
Follow the instructions for exporting a trained model for inference. The important detail is that the downloaded model comes with three model.ckpt files, and the filename to pass to the exporting script is the base filename that the three share. So if the three files are /path/to/model.ckpt.data-00000-of-00001, /path/to/model.ckpt.meta, and /path/to/model.ckpt.index, the parameter to pass to the script is: /path/to/model.ckpt
Enjoy your new model!
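The checkpoint naming rule in the last step can be sketched in a few lines of Python. The helper checkpoint_prefix is hypothetical and only illustrates how the shared base filename is derived from the three files; it does not call the export script.

```python
def checkpoint_prefix(paths):
    """Return the base filename shared by a checkpoint's .data/.meta/.index files."""
    prefixes = set()
    for path in paths:
        for suffix in (".meta", ".index"):
            if path.endswith(suffix):
                prefixes.add(path[: -len(suffix)])
                break
        else:
            # Data shards look like <prefix>.data-00000-of-00001
            head, sep, _ = path.rpartition(".data-")
            if not sep:
                raise ValueError(f"not a checkpoint file: {path}")
            prefixes.add(head)
    if len(prefixes) != 1:
        raise ValueError(f"files do not share one prefix: {sorted(prefixes)}")
    return prefixes.pop()
```

For the three example files above, the function returns /path/to/model.ckpt, which is the value to hand to the exporting script.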

Can't choose framework for segmentation_models library in python

I am trying to build a ResNet34 model using segmentation_models(sm) library in python. The sm library uses keras framework by default when importing, but I am working with tf.keras to build my datasets used for training and testing.
The documentation says that to change the default framework I should either set the environment variable SM_FRAMEWORK=tf.keras before importing the library (which I tried, but it didn't work) or call the set_framework method (which doesn't show up in the autocomplete suggestions and is reported as nonexistent when I try to execute it).
Is there any other way to overcome this problem?

I trained my model on Colab and used "%env SM_FRAMEWORK=tf.keras" to set the framework to tf.keras, and it worked perfectly.
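Outside a notebook, where the %env magic is not available, the same effect can probably be had by setting the variable through os.environ. The sketch below assumes, as the question's reading of the documentation suggests, that segmentation_models reads SM_FRAMEWORK once at import time, so the assignment has to happen before the first import in the process.

```python
import os

# Must run before segmentation_models is imported for the first time;
# setting it after the import has no effect on the already-loaded module.
os.environ["SM_FRAMEWORK"] = "tf.keras"

try:
    import segmentation_models as sm
except ImportError:
    sm = None  # library (or its TensorFlow dependency) is not installed here
```

If the variable seemed to have no effect, a likely cause is that the library had already been imported earlier in the same session, in which case restarting the kernel before setting it may help.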

Loading Torch7 trained models (.t7) in PyTorch

I am using the Torch7 library for implementing neural networks. Mostly, I rely on pre-trained models. In Lua I use the torch.load function to load a model saved as a Torch .t7 file. I am curious about switching to PyTorch (http://pytorch.org) and I read the documents. I couldn't find any information regarding the mechanisms to load a pre-trained model. The only relevant information I was able to find is this page: http://pytorch.org/docs/torch.html
But the function torch.load described in the page seems to load a file saved with pickle. If someone has additional information on loading .t7 models in PyTorch, please share it here.

The correct function is load_lua:
from torch.utils.serialization import load_lua
x = load_lua('x.t7')

As of PyTorch 1.0, torch.utils.serialization has been completely removed, so load_lua can no longer be used to import Lua Torch models into current PyTorch. Instead, I would suggest installing PyTorch 0.4.1 through pip in a conda environment (so that you can remove it afterwards) and using this repo to convert your Lua Torch model to a PyTorch model, not just a torch.nn.legacy model that you cannot use for training. Then use PyTorch 1.xx to do whatever you need with it. You can also train your converted Lua Torch models in PyTorch this way :)
