What is the difference between xgboost and sagemaker.xgboost - python

The question is really clear. I'm currently learning the AWS world, and this question is eating my head up: what is the difference between import xgboost and import sagemaker.xgboost?
On SageMaker I can work with the normal XGBoost library, and I know I can select different EC2 instance types with sagemaker.xgboost. But apart from that, what is the difference?
Are there any big differences?

Using model training as an example task: sagemaker.xgboost provides the ability to create Amazon SageMaker training jobs (and related AWS resources) in an environment that has the XGBoost library installed. So import xgboost gives you the modules for writing a training script that actually trains a model, whereas import sagemaker.xgboost gives you the modules for launching that training task on SageMaker.
The same applies for other tasks (e.g. predictions).
SageMaker XGBoost documentation: https://sagemaker.readthedocs.io/en/stable/frameworks/xgboost/using_xgboost.html#use-the-open-source-xgboost-algorithm
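A minimal sketch of the contrast (the script name, role ARN, S3 path, and version below are placeholders, not values from the question):

# Open-source library: you run the training yourself, wherever this code executes.
import numpy as np
import xgboost as xgb

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # toy data
booster = xgb.train({"objective": "binary:logistic"}, xgb.DMatrix(X, label=y))

# SageMaker SDK: you describe a training *job*; SageMaker runs a separate script
# (one that itself does `import xgboost`) on the EC2 instances you choose.
from sagemaker.xgboost import XGBoost

estimator = XGBoost(
    entry_point="train.py",  # hypothetical training script
    framework_version="1.7-1",
    instance_type="ml.m5.xlarge",  # this is where the EC2 type selection happens
    instance_count=1,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
)
estimator.fit({"train": "s3://my-bucket/train/"})  # placeholder S3 input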

Related

Create Python model in classic Azure Machine Learning

I read in the Azure ML documentation that they support a Create Python Model module, but when I go to my experiments and search for that module, it doesn't exist.
Can anyone show me how to create my own model in classic Azure ML? I want to implement SGDClassifier, which is only supported in the sklearn library.
(https://learn.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/create-python-model)
Machine Learning Studio (classic) does not have a Create Python Model module, whereas you can get it in the newer Azure Machine Learning studio.
Also, Machine Learning Studio (classic) is about to be deprecated, so I would recommend using the newer Machine Learning resource, which has an advanced and simple UX design with many advantages.
You can find SGDClassifier in sklearn.linear_model
import pandas as pd  # optional, for loading tabular data
from sklearn.linear_model import SGDClassifier
For more information on SGDClassifier, you can refer to sklearn.linear_model.SGDClassifier — scikit-learn 0.24.2 documentation and python-examples/SGDClassifier_example.py at master · WilliamQLiu/python-examples · GitHub
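As a minimal, self-contained sketch of basic usage (the iris dataset here is just a stand-in):

# Minimal SGDClassifier example on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

X, y = load_iris(return_X_y=True)
clf = SGDClassifier(max_iter=1000, tol=1e-3)  # hinge (linear SVM) loss by default
clf.fit(X, y)
print(clf.predict(X[:5]))  # predicted class labels for the first five rows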

How to run your own Python code on Amazon SageMaker

I have Python code that uses Keras with the TensorFlow backend. My system can't train this model due to low memory, so I want to make use of Amazon SageMaker.
However, all the tutorials I find are about deploying an already-trained model in Docker containers. My model isn't trained yet, and I want to train it on Amazon SageMaker.
Is there a way to do this?
EDIT: Also, can I make a script of my Python code and run it on AWS SageMaker?
SageMaker provides the capability for users to bring their own custom training scripts and run them on SageMaker using one of the pre-built containers for frameworks like TensorFlow, MXNet, and PyTorch.
Please take a look at https://github.com/aws/amazon-sagemaker-examples/blob/master/frameworks/tensorflow/get_started_mnist_train.ipynb
It walks through how you can bring in your own TensorFlow training script and train the model using SageMaker.
There are several other examples in the repository which will help you answer other questions you might have as you progress on with your SageMaker journey.
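In outline, the "script mode" approach looks like this (a sketch only; train.py, the role ARN, the S3 path, and the versions are placeholders for your own values):

# Sketch: your existing Keras code lives in train.py, and this launcher
# asks SageMaker to run it on a bigger instance.
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="train.py",  # your existing Keras/TensorFlow training script
    framework_version="2.11",
    py_version="py39",
    instance_type="ml.p3.2xlarge",  # GPU instance with far more memory than a laptop
    instance_count=1,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
)
estimator.fit({"training": "s3://my-bucket/training-data/"})  # placeholder S3 input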

Preprocessing data for Sagemaker Inference Pipeline with Blazingtext

I'm trying to figure out the best way to preprocess my input data for my inference endpoint on Amazon SageMaker. I'm using the BlazingText algorithm.
I'm not really sure of the best way forward, and I would be thankful for any pointers.
I currently train my model using a Jupyter notebook in SageMaker, and that works wonderfully, but the problem is that I use NLTK to clean my data (Swedish stopwords, stemming, etc.):
import nltk
nltk.download('punkt')      # tokenizer models
nltk.download('stopwords')  # stopword lists
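For concreteness, the cleaning step might look roughly like this (a hypothetical sketch; the actual notebook code isn't shown):

# Hypothetical version of the Swedish stopword-removal + stemming step.
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize

stemmer = SnowballStemmer("swedish")
swedish_stopwords = set(stopwords.words("swedish"))

def clean(text):
    tokens = word_tokenize(text, language="swedish")
    return " ".join(stemmer.stem(t) for t in tokens if t.lower() not in swedish_stopwords)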
So the question really is: how do I get the same preprocessing logic into the inference endpoint?
I have a couple of thoughts about how to proceed:
1. Build a Docker container with the Python libs & data installed, with the sole purpose of preprocessing the data, then use this container in the inference pipeline.
2. Supply the Python libs and script to an existing container, the same way you can supply external libs to a notebook.
3. Build a custom fastText container with the libs I need and run it outside of SageMaker.
4. Will probably work, but feels like a "hack": build a Lambda function that has the proper Python libs & data installed and calls the SageMaker endpoint. I'm worried about cold-start delays, as the prediction traffic volume will be low.
I would like to go with the first option, but I'm struggling a bit to understand whether there is a Docker image that I could build from and add my dependencies to, or whether I need to build something from the ground up. For instance, would the image sagemaker-sparkml-serving:2.2 be a good candidate?
But maybe there is a better way all around?

With AWS SageMaker, is it possible to deploy a pre-trained model using the sagemaker SDK?

I'm trying to avoid migrating an existing model training process to SageMaker and avoid creating a custom Docker container to host our trained model.
My hope was to inject our existing, trained model into the pre-built scikit-learn container that AWS provides via the sagemaker-python-sdk. All of the examples that I have found require training the model first, which creates the model/model configuration in SageMaker. This is then deployed with the deploy method.
Is it possible to provide a trained model to the deploy method and have it hosted in the pre-built scikit learn container that AWS provides?
For reference, the examples I've seen follow this order of operations:
1. Creating an instance of sagemaker.sklearn.estimator.SKLearn and providing a training script
2. Calling the fit method on it
3. This creates the model/model configuration in SageMaker
4. Calling the deploy method on the SKLearn instance, which automagically takes the model created in steps 2/3 and deploys it in the pre-built scikit-learn container as an HTTPS endpoint
Yes, you can import existing models to SageMaker.
For scikit-learn, you would use the SKLearnModel() object to load the model from S3 and create it in SageMaker. Then, you can deploy it as usual.
https://sagemaker.readthedocs.io/en/latest/sagemaker.sklearn.html
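A minimal sketch of that flow (the S3 path, script name, role ARN, and framework version are placeholders):

# Sketch: wrap an already-trained scikit-learn model and deploy it to an endpoint.
from sagemaker.sklearn import SKLearnModel

model = SKLearnModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder: tarball with your trained model
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    entry_point="inference.py",  # hypothetical script defining model_fn (and optionally predict_fn)
    framework_version="1.2-1",
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")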
Here's a full example based on MXNet that will point you in the right direction:
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/mxnet_onnx_superresolution/mxnet_onnx.ipynb
Struggled with the same use case for a couple days.
We used the sagemaker.model.Model class and sagemaker.pipeline.PipelineModel.
Outlined our solution here: How to handle custom transformation/inference and requirements in SageMaker endpoints
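In outline, that approach looks something like this (a sketch only; the image URIs, S3 paths, and role ARN are placeholders):

# Sketch: chain a preprocessing container and a model container behind one endpoint.
from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel

preprocessor = Model(
    image_uri="<preprocessing-container-image-uri>",  # placeholder
    model_data="s3://my-bucket/preprocess-artifacts.tar.gz",  # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerRole",
)
predictor_model = Model(
    image_uri="<inference-container-image-uri>",  # placeholder
    model_data="s3://my-bucket/model.tar.gz",  # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerRole",
)

pipeline = PipelineModel(
    models=[preprocessor, predictor_model],  # requests flow through the containers in order
    role="arn:aws:iam::123456789012:role/SageMakerRole",
)
pipeline.deploy(initial_instance_count=1, instance_type="ml.m5.large")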

Does tensorflow-serving or hosted Google ML allow for data preprocessing with 3rd-party libs when making online predictions? (Python 3)

I have many TensorFlow models which make use of 3rd-party libraries (e.g. Gensim) to preprocess data prior to training and evaluation. This same preprocessing needs to happen when querying the model to make predictions.
If using either tensorflow-serving or the hosted Google ML solution, can I bundle 3rd party libs and a custom preprocessing step along with the model, and have either of the two serving solutions run it? Or, if I want to use 3rd party libraries, do I have to preprocess the data client-side? I have not come across any examples of this.
Just to be explicit - I know you can do server-side preprocessing using tensorflow's libs, I'm specifically interested in the 3rd-party case.
As far as the ML Engine is concerned, I don't see how this would be possible. Models deployed there need to be in the SavedModel format, which doesn't include any Python files, for example, in which you could run custom processing. In contrast, the training job that creates the model can include custom dependencies.
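To illustrate the constraint (a sketch only, using the TF 2.x API for brevity): a SavedModel carries TensorFlow graph operations, so any server-side preprocessing must be expressed as TF ops rather than as calls into a library like Gensim.

# Sketch: the only "preprocessing" a SavedModel can carry is TensorFlow ops.
import tensorflow as tf

class ModelWithPreprocessing(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def serve(self, text):
        # TF-native preprocessing (e.g. lowercasing) can be baked into the graph;
        # a Gensim call could not be serialized here.
        return tf.strings.lower(text)

tf.saved_model.save(ModelWithPreprocessing(), "export/1")  # placeholder export path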
