KNIME has generated a PMML model for me. Now I want to apply this model in a Python process. What is the right way to do this?
In more depth: I am developing a Django student attendance system. The application is already mature enough that I have time to implement an 'I'm feeling lucky' button that automatically fills in an attendance form. This is where PMML comes in: KNIME has generated a PMML model that predicts student attendance. Also, thanks to Django for being so productive that I have time for this fun work ;)
Finally, I have written my own code. Feel free to contribute or fork it:
https://github.com/ctrl-alt-d/lightpmmlpredictor
The code for Augustus, to score PMML models in Python, is at https://code.google.com/p/augustus/
You could use PyPMML to apply PMML in Python, for example:
from pypmml import Model
model = Model.fromFile('the/pmml/file/path')
result = model.predict(data)
The data can be a dict, a JSON string, or a Pandas Series or DataFrame.
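For instance, a minimal sketch with a dict input (the field names below are hypothetical; use the input fields declared in your PMML file):
# hypothetical field names - replace with your model's declared input fields
result = model.predict({'sepal_length': 5.1, 'sepal_width': 3.5,
                        'petal_length': 1.4, 'petal_width': 0.2})
print(result)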
If you use PMML in PySpark, you could use PyPMML-Spark, for example:
from pypmml_spark import ScoreModel
model = ScoreModel.fromFile('the/pmml/file/path')
score_df = model.transform(df)
Here df is a PySpark DataFrame.
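As a minimal sketch of how that input DataFrame might be built (the app name and CSV path are placeholders):
from pyspark.sql import SparkSession

# placeholder input file; column names must match the PMML input fields
spark = SparkSession.builder.appName('pmml-scoring').getOrCreate()
df = spark.read.csv('input/data.csv', header=True, inferSchema=True)

score_df = model.transform(df)  # appends the model's output columns
score_df.show()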
For more info about other PMML libraries, feel free to see:
https://github.com/autodeployai
New to SageMaker.
I trained a "linear-learner" classification model using the SageMaker API, and it saved a "model.tar.gz" file in my S3 path. From what I understand, SageMaker just used an image of a scikit-learn logistic regression model.
Finally, I'd like to gain access to the model object itself, so I unpacked the "model.tar.gz" file, only to find another file called "model_algo-1" with no extension.
Can anyone tell me how I can find the "real" modeling object without using the inference/endpoint deploy API provided by SageMaker? There are some things I want to look at manually.
Thanks,
Craig
Linear-Learner is a built-in algorithm written using MXNet, and the binary is MXNet-compatible. You can't use this model outside of SageMaker, as there is no open-source implementation for it.
First of all, let me introduce myself. I am a young researcher and I am interested in machine learning. I created a model, then trained, tested and validated it. Now I would like to know if there is a way to save my trained model.
I am also interested in knowing whether the model is saved in its trained state.
Finally, is there a way to use the saved (and trained) model on new data without having to train the model again?
I work with Python!
Welcome to the community.
Yes, you may save the trained model and reuse it later. There are several ways to do so, and I will introduce a couple of them here. However, note which library you used to build your model, and use a method suited to that library.
Pickle: Pickle is the standard way of serializing objects in Python.
import pickle

# save the trained model to disk, then load it back later without retraining
pickle.dump(model, open(filename, 'wb'))
loaded_model = pickle.load(open(filename, 'rb'))
Joblib: Joblib is part of the SciPy ecosystem and provides utilities for pipelining Python jobs.
import joblib

# save and reload the trained model; efficient for models carrying large NumPy arrays
joblib.dump(model, filename)
loaded_model = joblib.load(filename)
Finally, as suggested by others, if you used a library such as TensorFlow to build and train your model, note that it has extensive built-in facilities for saving and loading models. Please check the following information:
TensorFlow Save and Load model
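For example, a minimal sketch of the Keras save/load cycle (assuming a TensorFlow 2.x Keras model; the file name and new_data are placeholders):
import tensorflow as tf

# save the trained Keras model to disk
model.save('my_model.keras')

# later: load it back and predict without retraining
loaded_model = tf.keras.models.load_model('my_model.keras')
predictions = loaded_model.predict(new_data)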
There may be a better way to do this, but this is how I have done it before, in Python.
So you have an ML model that you have trained. That model is basically just a set of parameters. Depending on which module you are using, you can save those parameters to a file and import them to regenerate your model later.
Perhaps a simpler way is to save the model object in its entirety to a file using pickling.
https://docs.python.org/3/library/pickle.html
You can dump the object into a file, and load it back when you want to run it again.
I want to use additional data to 'update' an already trained Light Gradient Boosting Model (LGBM). Is there a way to do that?
I am looking for an approach that uses the Sklearn API and thus can be used in a pipeline.
An LGBM model in Python can be fitted both with the original model API and with the Sklearn API.
I couldn't find any examples of using the Sklearn API for continuous learning.
Regardless of that, you can fit a model either way, and the result remains compatible with the .train() function from the original API.
It can be saved with save_model() or with joblib.dump().
This does not affect its compatibility with the Sklearn Pipeline() - it is perfectly compatible.
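As an illustration, a minimal sketch of continued learning through the Sklearn API, using LightGBM's init_model parameter (X_old, y_old, X_new, y_new are placeholders for your data):
import lightgbm as lgb

# initial fit on the original data
model = lgb.LGBMClassifier(n_estimators=100)
model.fit(X_old, y_old)

# continue training on additional data, starting from the already-fitted booster
model.fit(X_new, y_new, init_model=model.booster_)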
I have a PMML file, generated from Python, containing a random forest classifier, and I need to test the model again in Python. Kindly let me know how to import the PMML file back into Python so that I can test the model on a new dataset.
I have tried the titanium package, but it failed with an error because of a PMML version issue.
The expected output is the model's predicted values, so that I can verify the model's accuracy.
You could use PyPMML to load PMML in Python, then make predictions on a new dataset, e.g.
from pypmml import Model
model = Model.fromFile('the/pmml/file/path')
result = model.predict(data)
The data can be a dict, a JSON string, or a Pandas Series or DataFrame.
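For example, a minimal sketch scoring a whole test set from a Pandas DataFrame (the file names are placeholders; the columns must match the PMML input fields):
import pandas as pd
from pypmml import Model

# placeholder file names
model = Model.fromFile('random_forest.pmml')
test_df = pd.read_csv('test_data.csv')

result = model.predict(test_df)  # returns a DataFrame with the predicted values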
I have an XGBoost model that was trained in pure Python and converted to PMML format. Now I need to use this model in a PySpark script, but I am out of ideas on how to realize it. Are there methods that allow importing a PMML model in Python and using it for prediction? Thanks for any suggestions.
BR,
Vladimir
Spark does not support importing PMML directly. While I have not encountered a PySpark PMML importer, there is one for Java (https://github.com/jpmml/jpmml-evaluator-spark). What you can do is wrap the Java (or Scala) code so you can access it from Python (e.g. see http://aseigneurin.github.io/2016/09/01/spark-calling-scala-code-from-pyspark.html).
You could use PyPMML-Spark to import PMML in a PySpark script, for example:
from pypmml_spark import ScoreModel
model = ScoreModel.fromFile('the/pmml/file/path')
score_df = model.transform(df)