How can I use a PMML model in a PySpark script? - python

I have an XGBoost model that was trained in pure Python and converted to PMML format. Now I need to use this model in a PySpark script, but I am out of ideas on how to do it. Are there methods that allow importing a PMML model in Python and using it for prediction? Thanks for any suggestions.
BR,
Vladimir

Spark does not support importing from PMML directly. While I have not encountered a PySpark PMML importer, there is one for Java (https://github.com/jpmml/jpmml-evaluator-spark). What you can do is wrap the Java (or Scala) code so you can access it from Python (e.g. see http://aseigneurin.github.io/2016/09/01/spark-calling-scala-code-from-pyspark.html).

You could use PyPMML-Spark to import PMML in a PySpark script, for example:
from pypmml_spark import ScoreModel
model = ScoreModel.fromFile('the/pmml/file/path')
score_df = model.transform(df)

Related

How to use scikit learn model from inside sagemaker 'model.tar.gz' file?

New to SageMaker.
I trained a "linear-learner" classification model using the SageMaker API, and it saved a "model.tar.gz" file in my S3 path. From what I understand, SageMaker just used an image of a scikit-learn logistic regression model.
I'd like to gain access to the model object itself, so I unpacked the "model.tar.gz" file, only to find another file called "model_algo-1" with no extension.
Can anyone tell me how I can find the "real" modeling object without using the inference/Endpoint deploy API provided by SageMaker? There are some things I want to look at manually.
Thanks,
Craig
Linear-Learner is a built-in algorithm written using MXNet, and the binary is also MXNet compatible. You can't use this model outside of SageMaker, as there is no open source implementation of it.

How to convert LIGHTGBM to PMML?

I would like to know if there is a way to convert a LightGBM model to PMML. Starting from the lightgbm.basic.Booster object, I would like to know how to convert it to a PMML or MOJO/POJO object. If that is not possible, I would like to know whether I can save the LightGBM model as a pickle and then convert it to a PMML (or MOJO/POJO) object.
For now, there are at least two ways to create PMML from LightGBM, sklearn2pmml and Nyoka, but neither can create PMML from a trained Booster.
To create PMML, we need to use the Scikit-learn API, such as LGBMClassifier and Pipeline. Both packages create PMML in almost the same way. The detailed usage is described here for sklearn2pmml and here for Nyoka, and both are pretty simple.
Nyoka just uses the normal Scikit-learn API for training, but sklearn2pmml requires Java to be installed and PMMLPipeline to be used during training, so if you are using Python and sklearn, Nyoka may be the better choice.
It would be nice if there were a way to create PMML directly from a trained Booster, or to convert a Booster to an LGBMClassifier and then create PMML. However, there are no other packages that create PMML from a Booster directly and, according to this, there is no official way to convert a Booster to an LGBMClassifier.
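The sklearn2pmml route described above could look roughly like the following. This is a minimal sketch, assuming lightgbm, scikit-learn, sklearn2pmml and a Java runtime are installed; the function name export_lightgbm_to_pmml and the default output file name are hypothetical.

```python
def export_lightgbm_to_pmml(X, y, pmml_path="lightgbm_model.pmml"):
    """Train an LGBMClassifier inside a PMMLPipeline and write it out as PMML.

    X and y are the training features and labels (for example a pandas
    DataFrame and Series). Returns the path of the written PMML file.
    """
    # Imported lazily so that defining the function does not require the packages.
    from lightgbm import LGBMClassifier
    from sklearn2pmml import sklearn2pmml
    from sklearn2pmml.pipeline import PMMLPipeline

    # The model has to be trained through the Scikit-learn API inside the
    # pipeline; an already trained Booster cannot be passed in here.
    pipeline = PMMLPipeline([("classifier", LGBMClassifier())])
    pipeline.fit(X, y)

    # The conversion itself is done by a Java backend, hence the Java requirement.
    sklearn2pmml(pipeline, pmml_path)
    return pmml_path
```

Called as export_lightgbm_to_pmml(X_train, y_train), this would write the PMML file to the working directory, assuming the conversion succeeds.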

Load and use XGBoost PMML or XGBoost .rds model in Python sklearn without losing its dependencies/nature

I have an XGBoost model that was built in R, and I want to use the same model in Python for predictions without losing its nature/embedded dependencies.
I've tried rpy2 and sklearn-pmml-model, but neither seems to help with XGBoost models.
Appreciate any help.
You could use PyPMML to evaluate XGBoost PMML in Python, for example:
from pypmml import Model
model = Model.fromFile('the/pmml/model/file/path')
result = model.predict(data)
The data could be a dict, a pandas Series or DataFrame, or a JSON string.
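To illustrate the input forms mentioned above, here is a minimal sketch that builds the same record as a dict and as a JSON string. The feature names and values are hypothetical (a real model defines its own input fields), and the score helper is only an illustrative wrapper around the two PyPMML calls shown above.

```python
import json

# Hypothetical feature names and values; use the input fields of your own model.
record = {"sepal_length": 5.1, "sepal_width": 3.5}

# The same record serialized as a JSON string.
record_json = json.dumps(record)


def score(pmml_path, data):
    """Load a PMML file with PyPMML and score one of the accepted input forms."""
    # Imported lazily; requires the pypmml package to be installed.
    from pypmml import Model

    model = Model.fromFile(pmml_path)
    return model.predict(data)
```

Either score('the/pmml/model/file/path', record) or score('the/pmml/model/file/path', record_json) should then work; a pandas Series or DataFrame can be passed in the same way.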

Importing PMML models into Python (Scikit-learn)

There seem to be a few options for exporting PMML models out of scikit-learn, such as sklearn2pmml, but a lot less information on going in the other direction. My case is an XGBoost model previously built in R and saved to PMML using r2pmml, which I would like to use in Python. Scikit-learn normally uses pickle to save/load models, but is it also possible to import models into scikit-learn using PMML?
You can't connect different specialized representations (such as R and Scikit-Learn native data structures) over a generalized representation (such as PMML). You may have better luck trying to translate R data structures to Scikit-Learn data structures directly.
XGBoost is really an exception to the above rule, because its R and Scikit-Learn implementations are just thin wrappers around the native XGBoost library. Inside a trained R XGBoost object there's a blob raw, which is the model in its native XGBoost representation. Save it to a file, and load in Python using the xgb.Booster.load_model(fname) method.
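A minimal sketch of the Python side of this approach, assuming the native model was first written to a file from R (for example with the R package's xgb.save function) and that the xgboost Python package is installed. The function name load_native_xgboost_model is hypothetical.

```python
import os


def load_native_xgboost_model(model_path):
    """Load a model file in native XGBoost format and return an xgboost.Booster."""
    # Fail early with a clear error if the exported file is not where we expect it.
    if not os.path.isfile(model_path):
        raise FileNotFoundError(model_path)

    # Imported lazily; requires the xgboost package to be installed.
    import xgboost as xgb

    booster = xgb.Booster()
    booster.load_model(model_path)
    return booster
```

Predictions can then be made with booster.predict(xgb.DMatrix(X)), where X holds the features in the same column order that was used for training in R.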
If you know that you need to deploy the XGBoost model in Scikit-Learn, then why do you train it in R?

Apply PMML predictor model in python

KNIME has generated a PMML model for me. I now want to apply this model in a Python process. What is the right way to do this?
In more depth: I am developing a Django student attendance system. The application is already mature enough that I have time to implement an 'I'm feeling lucky' button to automatically fill in an attendance form. This is where PMML comes in. KNIME has generated a PMML model that predicts student attendance. Also, thanks to Django for being so productive that I have time for this great work ;)
In the end I wrote my own code. Feel free to contribute or fork it:
https://github.com/ctrl-alt-d/lightpmmlpredictor
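To give a sense of what a hand-written scorer involves, here is a minimal sketch using only the standard library. It is not the code from the repository above: it handles only a PMML RegressionModel with numeric predictors, and the embedded PMML fragment is stripped down (no Header, DataDictionary or MiningSchema), with made-up field names and coefficients.

```python
import xml.etree.ElementTree as ET

# Illustrative, stripped-down PMML; field names and coefficients are made up.
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.0">
      <NumericPredictor name="classes_per_week" coefficient="2.0"/>
      <NumericPredictor name="days_absent" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def local_name(tag):
    """Strip the XML namespace so the code works across PMML versions."""
    return tag.rsplit("}", 1)[-1]


def predict_linear(pmml_text, record):
    """Evaluate the first RegressionTable: intercept + sum(coefficient * value)."""
    root = ET.fromstring(pmml_text)
    table = next(el for el in root.iter() if local_name(el.tag) == "RegressionTable")
    total = float(table.get("intercept", "0"))
    for el in table:
        if local_name(el.tag) == "NumericPredictor":
            total += float(el.get("coefficient")) * float(record[el.get("name")])
    return total


prediction = predict_linear(PMML_TEXT, {"classes_per_week": 3, "days_absent": 4})
print(prediction)
```

Tree ensembles, categorical predictors, transformations and missing-value handling all need considerably more code, which is the main reason to prefer an existing PMML library for anything beyond a simple model.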
The code for Augustus, to score PMML models in Python, is at https://code.google.com/p/augustus/
You could use PyPMML to apply PMML in Python, for example:
from pypmml import Model
model = Model.fromFile('the/pmml/file/path')
result = model.predict(data)
The data could be a dict, a JSON string, or a pandas Series or DataFrame.
If you use PMML in PySpark, you could use PyPMML-Spark, for example:
from pypmml_spark import ScoreModel
model = ScoreModel.fromFile('the/pmml/file/path')
score_df = model.transform(df)
Here df is a PySpark DataFrame.
For more info about other PMML libraries, feel free to see:
https://github.com/autodeployai
