I once created and trained a GeneralizedLinearModel in Matlab and saved it to my drive. I regularly load this model in Matlab and call '.predict' on some data. Now I would like to load the saved model (with its specific coefficients) into Python and run '.predict' there the same way as in Matlab.
Does anyone know how to do that or alternatively how to implement the prediction given the coefficients?
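If exporting the coefficients from Matlab is an option, here is a minimal sketch of doing the prediction by hand in Python. It assumes a GLM with a logit link and that the coefficients were written to a CSV file with the intercept first; the file name, layout, and link function are all assumptions, not anything the toolboxes prescribe:

# minimal sketch: GLM prediction from exported coefficients (assumed logit link)
import numpy as np

coeffs = np.loadtxt('glm_coefficients.csv', delimiter=',')  # assumed layout: [intercept, b1, b2, ...]
intercept, betas = coeffs[0], coeffs[1:]

def glm_predict(X):
    # linear predictor eta = intercept + X @ betas
    eta = intercept + X @ betas
    # inverse link; for a logit link this is the logistic function,
    # for an identity link it would simply be eta
    return 1.0 / (1.0 + np.exp(-eta))

X_new = np.random.rand(5, len(betas))  # stand-in for your real data
print(glm_predict(X_new))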
I am currently working on a record linkage program (identifying data sets that describe the same real-world entity). For this, I am using the Python Record Linkage Toolkit (https://recordlinkage.readthedocs.io/en/latest/ref-classifiers.html#classifiers) and the provided ECMClassifier. I get correct results, but right now I need to train the classifier again every time I run my script. The relevant lines of code are:
ecm = recordlinkage.ECMClassifier(binarize=0.4)
matches = ecm.fit_predict(comparison_vectors)
Now, my question is: can I just save the classifier after training and reload it the next time I run the script? That way I could save training time and maybe train the classifier a bit more each time.
Saving ML models is nothing new, and there are easy Python packages for this, such as pickle5 (as described here: https://www.projectpro.io/recipes/save-trained-model-in-python).
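For what it's worth, a minimal sketch of that idea with the standard pickle module might look like the following. The file name is just an example, and whether ECMClassifier pickles cleanly is an assumption you would need to verify:

import os
import pickle
import recordlinkage

MODEL_PATH = 'ecm_classifier.pkl'  # example path

if os.path.exists(MODEL_PATH):
    # reuse the previously trained classifier
    with open(MODEL_PATH, 'rb') as f:
        ecm = pickle.load(f)
    matches = ecm.predict(comparison_vectors)
else:
    # train from scratch and save for next time
    ecm = recordlinkage.ECMClassifier(binarize=0.4)
    matches = ecm.fit_predict(comparison_vectors)
    with open(MODEL_PATH, 'wb') as f:
        pickle.dump(ecm, f)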
What I'm concerned about is:
Does the classifier object itself change/learn with each use, or does everything happen only inside the fit_predict function, so that the progress cannot be saved?
Is it a problem that training and prediction happen in one method? Is it not useful to save the classifier once both steps have already been carried out?
Are there other things to consider when saving/reloading a classifier that do not fit the default way pickle saves and loads objects?
I am using Python version: 3.8.13.
Thanks in advance!
I am trying to create a large numpy array, say
S=0.5
a=np.random.normal(size=(100000,10000))
x=np.maximum(S-a,1)
#This is just an example. The calculation is more complicated than this.
But it is too large for memory. After creating this array, I also need to manipulate it and use it as training data for machine learning (e.g. xgboost or CART).
So my questions are:
1. How would I create such a big array without getting a memory error, in a way that still lets me do calculations on it? Could you recommend some packages or links I could learn from?
2. Suppose this array is already saved in a file. How do I load it and then train my model without causing a memory error?
I have read https://pythonspeed.com/articles/mmap-vs-zarr-hdf5/ but it didn't say how to write the data to disk.
Could anyone help, please? Thanks a lot.
Dask can help with large NumPy arrays, but it does not support every function of the NumPy API.
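As a rough illustration (the chunk sizes and the zarr output path are just examples, and writing to zarr requires the zarr package), the array from the question could be built lazily and streamed to disk roughly like this:

import dask.array as da

S = 0.5
# lazily defined array, computed chunk by chunk instead of all at once
a = da.random.normal(size=(100000, 10000), chunks=(10000, 1000))
x = da.maximum(S - a, 1)

# stream the result to disk instead of materializing it in RAM
x.to_zarr('x.zarr', overwrite=True)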
Since you mentioned in your question that your final target is to use the data for training a machine learning model, let's look at the problem from the other end.
Assuming that you somehow managed to load the data into memory, how do you plan to pass it to the underlying ML models? Most of the available classical ML models work on NumPy arrays, so even if you manage to load the data in some other format, you cannot pass it to the ML model for training unless that representation is a NumPy array.
If your data is sparse, you can store it in sparse arrays (e.g. scipy.sparse), and some classical models can handle sparse input directly.
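A tiny illustration of that idea (the matrix size, density, and choice of estimator are just examples):

from scipy import sparse
from sklearn.linear_model import LogisticRegression
import numpy as np

# a mostly-zero matrix stored in CSR form takes a fraction of the dense memory
X = sparse.random(100000, 100, density=0.01, format='csr', random_state=0)
y = np.random.randint(0, 2, size=100000)

clf = LogisticRegression(max_iter=200).fit(X, y)  # accepts scipy sparse input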
This is the general situation when the data is too large to fit into memory: ideally, you should look at ML models that can be trained one batch at a time. That way you can load a batch of data, train on it, and move on to the next batch. Any ML model that can be trained with a gradient descent algorithm can be trained batch by batch. Deep learning models are trained with gradient descent, so they all work on one batch of data at a time.
So if you decide to use deep learning models, you will normally end up writing a data loader that loads one batch of data at a time.
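A rough sketch of that pattern outside of deep learning, using scikit-learn's partial_fit and a memory-mapped file on disk (the file names, dtype, shape, and batch size are all assumptions):

import numpy as np
from sklearn.linear_model import SGDClassifier

# memory-mapped views of large arrays stored on disk; only the slices
# that are accessed get loaded into RAM
X = np.memmap('big_features.dat', dtype='float32', mode='r', shape=(100000, 10000))
y = np.memmap('labels.dat', dtype='int64', mode='r', shape=(100000,))

clf = SGDClassifier()
batch_size = 5000
for start in range(0, X.shape[0], batch_size):
    stop = start + batch_size
    # incrementally update the model one batch at a time
    clf.partial_fit(X[start:stop], y[start:stop], classes=[0, 1])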
If you do not want to use batch-based training models, the bottom line is this: since your final target is to train an ML model, first find out which data representation the ML model you want to use expects, and then solve the problem of fitting your data into that format. It would be a waste of time and effort to figure out how to fit the data into memory, only to realize that your ML model cannot work with that representation.
I'm having trouble figuring out how to use a support vector machine trained in Weka for real-time processing with Python.
For example, when you train a backpropagation network in Matlab, you can extract the weights and biases and use them to replicate the network in other programs (e.g. Python) for feed-forward prediction.
Thanks for your suggestions.
Assuming you want to continue using Python and Weka, the easiest way is to just call the Weka command line using subprocess (see https://docs.python.org/2/library/subprocess.html). You can then train and save your models and use them as needed. See this reference: https://weka.wikispaces.com/Saving+and+loading+models
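A rough sketch of what those subprocess calls could look like. The class path, classifier, and file names are all examples; the intent is that Weka's generic -t/-d options train and save a model, while -l/-T/-p load the saved model and print predictions for a new ARFF file:

import subprocess

# train a classifier on train.arff and save it to j48.model (example names)
subprocess.check_call([
    'java', '-cp', 'weka.jar',
    'weka.classifiers.trees.J48',
    '-t', 'train.arff',
    '-d', 'j48.model',
])

# later: load the saved model and print predictions for new data
output = subprocess.check_output([
    'java', '-cp', 'weka.jar',
    'weka.classifiers.trees.J48',
    '-l', 'j48.model',
    '-T', 'new_data.arff',
    '-p', '0',
])
print(output)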
What I have read in the tutorials is that you create your data, then write the model using protobuf, and then write the solver file. Finally you train the model and get the generated model file. All of this is done through the command line. Now there are two questions:
1) Suppose I have the generated model. Now, how do I load a new image that is not in the test folder and perform a forward pass? Should this be done through the command line or from some language (C++, Python)?
2) I guess the above is one way of doing it. What is the best way to train the classifier (from the command line or through code), and how do I use the generated model file (after training) in my own code?
I want to interface Caffe with my code, but I am not able to find a short step-by-step tutorial on any dataset, say MNIST; the model doesn't need to be as complicated as LeNet, a simple fully connected layer will also do. Can anyone tell me how to just write simple code in C++ or Python and train any dataset from scratch?
Sample C++/Python code for training a classifier with Caffe and using it to predict new data would also be appreciated.
Training is best done using the command line. See this tutorial.
Once you have trained a model and have a myModel.caffemodel file (a binary file storing the weights of the different layers) and a deploy.prototxt file (a text file describing your net), you can use the Python interface to classify images.
You can run the Python script classify.py to classify image(s) from the command line. This script wraps around classifier.py, a Python object that holds a trained net and lets you perform forward passes in Python.
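In code, a minimal sketch of that forward pass could look like the following. The image size, scaling, channel order, and file names are assumptions you would adapt to your own net:

import caffe

# load the trained net described by deploy.prototxt together with the learned weights
net = caffe.Classifier('deploy.prototxt', 'myModel.caffemodel',
                       image_dims=(256, 256), raw_scale=255,
                       channel_swap=(2, 1, 0))

img = caffe.io.load_image('new_image.jpg')  # an image not in the test folder
probs = net.predict([img])                  # forward pass, returns class probabilities
print(probs[0].argmax())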
How do I obtain the weights of an SVM in OpenCV 2.4.6 for Python 2.7.5?
I need this to calculate the primal form of my cv2.SVM() so I can feed it to cv2.HOGDescriptor().setSVMDetector.
I found this and this SO post useful, but it seems like SVM.decision_func is protected and I cannot access this variable to obtain the weights.
Are there any other ways to do this in Python+OpenCV?
You can use the save method to save the SVM to disk.
If you only need the weights, then I think you either need to modify the code or look more closely at the format in which the OpenCV SVM is saved.
Here is some info:
OpenCv SVM output file format
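If you do go that route, a rough sketch of the reconstruction step follows. It assumes a linear kernel and that you have already parsed the support vectors, alphas, and rho out of the file that svm.save('svm.xml') produced (for example with xml.etree.ElementTree); the sign convention below is the commonly cited one and may need flipping depending on how your classes were labelled:

import numpy as np

def primal_from_dual(support_vectors, alpha, rho):
    # for a linear kernel the decision function is
    #   f(x) = sum_i alpha_i * <sv_i, x> - rho
    # so the primal weight vector is a weighted sum of the support vectors;
    # the minus sign matches the convention usually used with HOGDescriptor
    w = -np.dot(alpha, support_vectors)
    # setSVMDetector expects the weight vector followed by the bias term
    return np.append(w, rho).astype(np.float32)

# hog = cv2.HOGDescriptor()
# hog.setSVMDetector(primal_from_dual(support_vectors, alpha, rho))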