How do you locally load a model.tar.gz file from SageMaker? - python

I'm new to SageMaker and I trained a classifier model with the built-in XGBoost. It saved a "model.tar.gz" to an S3 bucket. I downloaded the file because I was planning to deploy the model elsewhere. To experiment, I started by loading the file locally first. I tried this code:
import pickle as pkl
import tarfile

t = tarfile.open('model.tar.gz', 'r:gz')
t.extractall()
with open('xgboost-model', 'rb') as f:
    model = pkl.load(f)
But it only gives me this error:
XGBoostError: [13:32:18] /opt/concourse/worker/volumes/live/7a2b9f41-3287-451b-6691-43e9a6c0910f/volume/xgboost-split_1619728204606/work/src/learner.cc:922: Check failed: header == serialisation_header_:
If you are loading a serialized model (like pickle in Python) generated by older
XGBoost, please export the model by calling `Booster.save_model` from that version
first, then load it back in current version. There's a simple script for helping
the process.
So I tried using the Booster.save_model function in the SageMaker notebook, but that doesn't work, and neither does pickling the trained model.
I also tried this code:
import xgboost as xgb

model = xgb.Booster()
model.load_model('xgboost-model')
but it gives me this error:
XGBoostError: std::bad_alloc
Any help would be greatly appreciated.

Found the answer to my question. Apparently, the SageMaker environment is using an old build of XGBoost, around version 0.90. As the XGBoost team makes constant upgrades and changes to the library, AWS has been unable to keep up with it.
That said, I was able to run the code below by downgrading the XGBoost library in my environment from 1.7 to 0.90, and it works like a charm.
t = tarfile.open('model.tar.gz', 'r:gz')
t.extractall()
with open('xgboost-model', 'rb') as f:
    model = pkl.load(f)
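If you later need the model under a newer XGBoost, the error message above already describes the supported migration path: unpickle the model under the old version, re-export it with Booster.save_model, then load that file with load_model under the new version. A minimal sketch (file names are illustrative; run each half under the matching XGBoost install):

# Step 1: run in an environment with xgboost==0.90 (the version that wrote the pickle).
import pickle as pkl

with open('xgboost-model', 'rb') as f:
    booster = pkl.load(f)
booster.save_model('xgboost-model.bin')  # version-portable binary format

# Step 2: run in an environment with a newer xgboost (e.g. 1.x).
import xgboost as xgb

model = xgb.Booster()
model.load_model('xgboost-model.bin')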

Related

Speech prediction does not appear

I'm trying to run the Speech Recognition Project by Udacity (you can check the whole program here) on Google Colab, but I have a problem because Google Colab no longer supports TensorFlow 1.x. It ran smoothly without any issues before, but now it doesn't.
Here are some things I have tried so far:
Scene 1
Using tensorflow.compat.v1 as instructed in the TF2 migration guide from TensorFlow:
it worked when training the model
but failed when running prediction
Scene 2
I also tried converting the whole project using !tf_upgrade_v2, but there was no change (the .py and .ipynb files still have the same code)
Scene 3
Installing specific versions of tensorflow (1.15), keras (2.0.5), and h5py (2.7.0) in Colab:
this worked too, but when I run prediction the model gives no transcription
I really need help figuring out how to fix this so I can execute the project.
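For reference, a minimal sketch of the Scene 1 approach (all names below are generic placeholders, not from the Udacity project): disabling v2 behavior so TF1-style graph/session code runs on a TF2 Colab runtime:

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # restore TF1 graph/session semantics

# Hypothetical TF1-style inference standing in for the project's predict step.
graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 161], name='features')
    logits = tf.layers.dense(x, 29, name='char_logits')

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(logits, feed_dict={x: np.zeros((1, 161), dtype=np.float32)})
    print(out.shape)  # (1, 29)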

Is there a way to save a model using Tensorflow 2.6.0 so that it can be loaded back using Tensorflow 2.3.0?

I began writing my code with Tensorflow 2.6.0, but now I need to downgrade it to 2.3.0 since it needs to run on a GPU that doesn't support CUDA 11.
Some of the neural networks in my code are loaded from json files using the keras.models.model_from_json function, which apparently behaves differently in these two tf versions. While it works fine in 2.6.0, I get the following error in 2.3.0:
AttributeError: module 'tensorflow.python.framework.ops' has no attribute '_TensorLike'
Since it loads correctly with the more recent version of tf, I'm trying to save it in another format in the hope of being able to load it with tf 2.3.0. I tried both the keras SavedModel and HDF5 approaches, as well as the tf.saved_model.load approach:
When I try to load the models as SavedModels, I get OSError: Unable to open file Permission denied; it seems that keras is expecting a file rather than a folder;
Using either the HDF5 method or the tf.saved_model module ends with AttributeError: 'str' object has no attribute 'decode' when trying to load the models.
And I'm stuck here. The aforementioned methods work without any issues when I load the models back using Tensorflow 2.6.0, so I'm sure I saved them correctly.
Any help from someone more knowledgeable than me would be appreciated.
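One detail worth checking (an assumption on my part, not stated in the question): the 'str' object has no attribute 'decode' error when loading HDF5 models under older TF/Keras is commonly caused by h5py 3.x, and pinning h5py below 3 in the TF 2.3.0 environment often resolves it. A sketch, assuming that diagnosis applies:

# In the TF 2.3.0 environment, first run: pip install "h5py<3.0.0"
import tensorflow as tf

# 'model.h5' is a hypothetical name for a model saved under TF 2.6.0
# with model.save('model.h5').
model = tf.keras.models.load_model('model.h5')
model.summary()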

BERT tokenizer and model without transformers library

I'm trying to use both BertTokenizer and BertForTokenClassification offline. The goal of the project is to use these pretrained models and apply them to my own dataset for an NLP task. Due to network and security limitations, I am not able to install the transformers library from HuggingFace. I am only able to use PyTorch.
At this point, I have downloaded and saved the following bert-base-uncased files from the HuggingFace website to a local directory:
config.json
pytorch_model.bin
vocab.txt
I've used the transformers library before, so I'm familiar with initializing the models from local files using something like BertTokenizer.from_pretrained('/path/to/local'). However, since I'm not able to install the package and call the model classes, I don't know how to use these downloaded local files in a similar manner. How do I use these local files to use BertTokenizer and BertForTokenClassification?
I've been instructed to use the following link to implement this: https://pytorch.org/tutorials/beginner/saving_loading_models.html
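In the spirit of that tutorial, the downloaded files can at least be loaded with plain PyTorch; mapping the tensors onto a from-scratch BERT implementation is the remaining (and larger) task, which isn't shown here. A minimal sketch, assuming the standard bert-base-uncased file layout:

import torch

# pytorch_model.bin is a pickled state_dict keyed by parameter name.
state_dict = torch.load('pytorch_model.bin', map_location='cpu')
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))

# vocab.txt holds one WordPiece token per line; the id is the line index.
with open('vocab.txt', encoding='utf-8') as f:
    vocab = {token.rstrip('\n'): i for i, token in enumerate(f)}
print(vocab['[CLS]'], vocab['[SEP]'])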

converting pretrained tensorflow models for tensorflow serving

I'm trying to use tensorflow serving. However, none of the pretrained models available for download (like those from the TF detection zoo) have any files in the saved_models/variables directory that is required by the serving model.
How do you create the files required in the saved_models/variables directory using the pretrained models available from the detection model zoo?
There is some information from the official documentation, but it doesn't cover my use case of converting a pretrained model to be served.
I've also tried the tensorflow serving examples. However, most of the existing documentation uses the ResNet implementation as an example, and the pretrained model for ResNet has been removed by Tensorflow. This is the link that the tutorials use; note that there's no direct link to download the models. As an aside, but an additional funsy, the python examples in the Tensorflow Serving repo don't work with Tensorflow 2.0.
It appears that this link may be useful in the conversion: https://github.com/tensorflow/models/issues/1988
Ok, as of the time of writing the object detection tutorials only support tensorflow 1.12.0.
It's a little difficult to do this because it's so multitiered, but you need to:
clone the tensorflow open model zoo
patch models/research/object_detection/exporter.py according to these instructions. Alternatively, you can use this patch, which applies the aforementioned instructions.
Follow the object detection installation instructions, as found here, in your cloned repo. It's important both to follow the protobuf compilation steps AND to update your python path for the slim libraries.
Follow the instructions for exporting a trained model for inference (see the export sketch after this list). The important part is that the downloaded model comes with three model.ckpt files, and the argument that needs to be passed to the exporting script is the shared base name of those three files. So if the three files are /path/to/model.ckpt.data-00000-of-00001, /path/to/model.ckpt.meta, and /path/to/model.ckpt.index, the parameter to pass to the script is /path/to/model.ckpt
Enjoy your new model!
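For concreteness, a sketch of the export step; the flags follow the object detection repo's export script, and every path below is a placeholder:

# Run from the cloned models/research directory.
python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path /path/to/pipeline.config \
    --trained_checkpoint_prefix /path/to/model.ckpt \
    --output_directory /path/to/exported_model
# After the exporter.py patch, the output directory should contain a
# saved_model/ folder with the variables/ files that Serving expects.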

Can't choose framework for segmentation_models library in python

I am trying to build a ResNet34 model using segmentation_models(sm) library in python. The sm library uses keras framework by default when importing, but I am working with tf.keras to build my datasets used for training and testing.
The documentation says that in order to change the default framework I should either set the environment variable SM_FRAMEWORK=tf.keras before importing (which I tried, but it didn't work) or use the set_framework method (which doesn't show up in the suggestions, and says it doesn't exist when I try to execute it).
Is there any other way to overcome this problem?
I trained a model on Colab and used "%env SM_FRAMEWORK=tf.keras" to set the environment variable to tf.keras, and it worked perfectly.
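Outside Colab magics, the equivalent is to set the variable in Python before the import; set_framework is the documented in-code alternative. A sketch, assuming a segmentation_models version that ships set_framework:

import os

# Must be set before segmentation_models is imported.
os.environ['SM_FRAMEWORK'] = 'tf.keras'

import segmentation_models as sm

# Documented in-code alternative:
# sm.set_framework('tf.keras')

model = sm.Unet('resnet34', encoder_weights='imagenet')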
