Has anyone succeeded in speeding up scikit-learn models using numba and jit compilaition. The specific models I am looking at are regression models such as Logistic Regressions.
I am able to use numba to optimize the functions I write using sklearn models, but the model functions themselves are not affected by this and are not optimized, thus not providing a notable increase in speed. Is there are way to optimize the sklearn functions?
Any info about this would be much appreciated.
Scikit-learn makes heavy use of numpy, most of which is written in C and already compiled (hence not eligible for JIT optimization).
Further, the LogisticRegression model is essentially LinearSVC with the appropriate loss function. I could be slightly wrong about that, but in any case, it uses LIBLINEAR to do the solving, which is again a compiled C library.
The makers of scikit-learn also make heavy use of one of the python-to-compiled systems, Pyrex I think, which again results in optimized machine compiled code ineligible for JIT compilation.
The #numba.vectorize for arrays and #numba.guvectorise for matricies, are decorators may help since they work to combine loop operations. They generate so called "ufunc"s which achieve this goal, but instead of having to manually write the c code yourself it generates it from the python input.
See: http://numba.pydata.org/numba-doc/dev/user/vectorize.html
Related
I'm trying to perform an evaluation of total floating-point operations (FLOPs) of a neural network.
My problem is the following. I'm using a sigmoid function. My question is how to eval the FLOPs of the exponential function. I'm using Tensorflow which relies on NumPy for the exp function.
I tried to dig into the Numpy code but didn't find the implementation ... I saw some subjects here talking about fast implementation of exponential but it doesn't really help.
My guess is that it would use a Taylor implementation or Chebychev.
Do you have any clue about this? And if so an estimation of the amount of FLOPs. I tried to find some references as well on Google but nothing really standardized ...
Thank you a lot for your answers.
I looked into it for a bit and what i found is that numpy indeed uses the C implementation as seen here.
Tensorflow though doesnt use nmpy implementation, instead it uses the scalar_logistics_opfunction from the C++ library called Eigen. The source for that can be found here.
I've been running some large logistic regression models in SAS, which take 4+ hours to converge. Recently however I acquired access to a Hadoop cluster and can use Python to fit the same models much faster (something more like 10-15 minutes).
Problematically, I have some complete/quasi-complete separation of data points in my data which results in failure to converge; I was using the FIRTH command in SAS to produce robust parameter estimates despite that, but there seems to be no equivalent option for Python, either in sklearn or statsmodels (I'm mostly using the latter).
Is there another way to get around this problem in Python?
AFAIK, there is no Firth penalization available in Python. Statsmodels has an open issue but nobody is working on it at the moment.
As alternative it would be possible to use a different kind of penalization, e.g. as available in sklearn or maybe statsmodels.
The other option is to change the observed response variable. Firth can be implemented by augmenting the dataset. However, I don't know of any recipe or prototype for this in Python.
https://github.com/statsmodels/statsmodels/issues/3561
Statsmodels has ongoing work on penalization but currently the emphasis is on feature/variable selection (elastic net, SCAD) and quadratic penalization for generalized additive models GAM, especially for splines.
Firth uses data dependent penalization which does not fit the generic penalization framework where the penalization structure is a data independent "prior".
Conditional likelihood is another way to work around perfect separation. This is in a Statsmodels PR that is basically ready to use:
https://github.com/statsmodels/statsmodels/pull/5304
I am interested in the performance of Pyomo to generate an OR model with a huge number of constraints and variables (about 10e6). I am currently using GAMS to launch the optimizations but I would like to use the different python features and therefore use Pyomo to generate the model.
I made some tests and apparently when I write a model, the python methods used to define the constraints are called each time the constraint is instanciated. Before going further in my implementation, I would like to know if there exists a way to create directly a block of constraints based on numpy array data ? From my point of view, constructing constraints by block may be more efficient for large models.
Do you think it is possible to obtain performance comparable to GAMS or other AML languages with pyomo or other python modelling library ?
Thanks in advance for your help !
While you can use NumPy data when creating Pyomo constraints, you cannot currently create blocks of constraints in a single NumPy-style command with Pyomo. Fow what it's worth, I don't believe that you can in languages like AMPL or GAMS, either. While Pyomo may eventually support users defining constraints using matrix and vector operations, it is not likely that that interface would avoid generating the individual constraints, as the solver interfaces (e.g., NL, LP, MPS files) are all "flat" representations that explicit represent individual constraints. This is because Pyomo needs to explicitly generate representations of the algebra (i.e., the expressions) to send out to the solvers. In contrast, NumPy only has to calculate the result: it gets its efficiency by creating the data in a C/C++ backend (i.e., not in Python), relying on low-level BLAS operations to compute the results efficiently, and only bringing the result back to Python.
As far as performance and scalability goes, I have generated raw models with over 13e6 variables and 21e6 constraints. That said, Pyomo was designed for flexibility and extensibility over speed. Runtimes in Pyomo can be an order of magnitude slower than AMPL when using cPython (although that can shrink to within a factor of 4 or 5 using pypy). At least historically, AMPL has been faster than GAMS, so the gap between Pyomo and GAMS should be smaller.
I was also wondering the same when I came across this piece of code from Jonas Hörsch and Tom Brown and it was very useful to me:
https://github.com/FRESNA/PyPSA/blob/master/pypsa/opt.py
They define classes to define constraints more efficiently than the original Pyomo parser do. I did some tests on a large model that I have and it reduced the generation time considerably.
You can build big linear (LP) and mix-integer (MILP) optimization problems in Python with the open-source tool Linopy. Linopy promises a speedup of times 4-6 and a memory reduction of roughly 50% reaching roughly Julia JUMP performance. See the benchmark:
The tool is part of the PyPSA ecosystem. This tool is the next-level version of the PyPSA 'opt.py' developments that Jon Cardodo mentioned. It has roughly the same speed, same performance but better usability -- reported by developers.
I am new to machine learning and Python. I am trying to understand when to use the functions in sklearn.linear_model (linearregression and logisticregression) and when to implement my own code for the same. Any suggestions or references will be highly appreciated.
regards
Souvik
I recommend you use as much as possible the functions given by sklearn or another ML library (I like TensorFlow). That's because it's very difficult to get the performance of any library. They are calculating in a low level of the operating system, meanwhile the common users don't implement the computational actions outside the python environment.
Moreover, python by itself is not very efficient in relation to the data structures. For example,a simple array is implemented as a LinkedList. The ML libraries use Numpy to their computations to get a better performance.
I've looked through the documentation over at scikit learn and I haven't seen a straightforward way to replace the matrix-vector product evaluations during propagation with a custom evaluation call.
Is there a way to do this that's already part of the API...or are there any tricks that would allow me to inject a custom matrix vector product evaluator?
In short - no, it is not possible. Mostly because some aritchmetical operations are not even performed in python when you use scikit-learn - they are actually performed by C-based extensions (like libsvm library). You could monkey patch .dot of numpy to do what you want, but you do not have any guarantee that scikit-learn will still work, since it performs some operations using numpy and others using C-extensions.