I am using one of the new MacBooks with the M1 chip. I cannot use SciPy natively; the only way to use it would be through Rosetta, but for other reasons I cannot do that right now.
The ONLY thing I need from SciPy is scipy.linalg.expm. Is there an alternative library that provides an implementation of expm that is equally (or almost equally) fast?
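For context, a NumPy-only fallback is conceivable for the diagonalizable case, but it is only a sketch and not a substitute for SciPy's Padé-based expm (the helper name expm_eig below is made up for illustration):

import numpy as np

def expm_eig(a):
    # Matrix exponential via eigendecomposition: exp(A) = V diag(exp(w)) V^-1.
    # Only valid for diagonalizable A; accuracy degrades for nearly defective matrices.
    w, v = np.linalg.eig(a)
    return (v * np.exp(w)) @ np.linalg.inv(v)

a = np.array([[0.0, 1.0], [-1.0, 0.0]])  # generator of a plane rotation
print(expm_eig(a).real)                  # ~ [[cos 1, sin 1], [-sin 1, cos 1]]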
Has anybody managed to run SciPy functions on CUDA GPUs? As far as I can tell from the literature, SciPy has no explicit support for GPU computing.
I would like to speed up code that uses some scipy.optimize functions by running it on CUDA GPUs. If anybody knows of another library for nonlinear optimization that runs on CUDA GPUs, I would very much appreciate you sharing it here.
Numba might be what you are looking for; it allows you to run Python code on GPUs. Another good resource might be this tutorial.
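For context, a minimal Numba CUDA kernel looks roughly like this (a sketch only; it assumes a CUDA-capable GPU with the numba package installed, and the kernel and array names are made up):

import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    # Each thread handles one element of the arrays.
    i = cuda.grid(1)
    if i < x.size:
        out[i] = x[i] + y[i]

x = np.arange(1000000, dtype=np.float32)
y = 2 * x
out = np.zeros_like(x)
threads = 256
blocks = (x.size + threads - 1) // threads
add_kernel[blocks, threads](x, y, out)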
I am facing a similar problem right now and came across CuPy. It supports CUDA and offers an interface that serves as a drop-in replacement for NumPy and SciPy.
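A rough sketch of the drop-in idea (it assumes a working CUDA setup, and I have not benchmarked it):

import numpy as np
import cupy as cp

x_cpu = np.random.rand(1000, 1000)
x_gpu = cp.asarray(x_cpu)          # move the data to the GPU
u, s, vt = cp.linalg.svd(x_gpu)    # same call signature as numpy.linalg.svd
s_cpu = cp.asnumpy(s)              # bring the result back to the host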
I have run into similar problems and am now using tensorflow_probability's optimizer capabilities. The biggest problem is that TensorFlow ops often cannot be written as direct translations of NumPy functions, so there is a pretty steep learning curve. But it is very fast once your code is written.
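For example, a minimal L-BFGS call with tensorflow_probability might look roughly like this (just a sketch; the quadratic objective is made up and none of the options are tuned):

import tensorflow as tf
import tensorflow_probability as tfp

def objective(x):
    # Simple quadratic with minimum at x = [2, 2, 2].
    return tf.reduce_sum((x - 2.0) ** 2)

def value_and_gradient(x):
    return tfp.math.value_and_gradient(objective, x)

result = tfp.optimizer.lbfgs_minimize(
    value_and_gradient, initial_position=tf.zeros(3))
print(result.position.numpy())  # should be close to [2., 2., 2.]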
I am writing a python script that needs to make a distribution fit against some generated data.
I found that this is possible using SciPy, or other packages that have SciPy as a dependency; however, due to administrative constraints, I am unable to install SciPy's dependencies (such as BLAS) on the machine where the script will run.
Is there a way to perform distribution fitting in Python without using SciPy or packages depending on it?
EDIT: as asked in a comment, what I want to do is perform an Anderson-Darling test for normality.
The alternatives I found so far (but had to disregard):
statsmodels: has SciPy as a dependency
R and MATLAB Python APIs: require setting up external software, which poses the same problem for me as SciPy
Fitting the normal distribution only requires calculating mean and standard deviation.
The Anderson-Darling test only requires numpy, or could alternatively be rewritten using list comprehensions. The critical values for the AD test are tabulated or based on a simple approximation formula. It does not use any of the difficult parts of scipy such as optimize or special.
So I think it should not be too difficult to translate either the scipy.stats or the statsmodels version into pure Python, or into code with only numpy as a dependency.
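As a rough illustration of how little is actually needed, here is a numpy-only sketch of the A² statistic for the normality case. The small-sample correction and critical values are the commonly tabulated ones for mean and variance estimated from the data; double-check them against scipy.stats.anderson or the literature before relying on them:

import math
import numpy as np

def anderson_darling_normal(x):
    # Anderson-Darling statistic for normality, mean/std estimated from the data.
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)
    # Standard normal CDF via erf, so no scipy.special is needed.
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    i = np.arange(1, n + 1)
    a2 = -n - np.mean((2 * i - 1) * (np.log(cdf) + np.log(1.0 - cdf[::-1])))
    # Small-sample correction and critical values (15%, 10%, 5%, 2.5%, 1% levels).
    a2_star = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    critical = [0.576, 0.656, 0.787, 0.918, 1.092]
    return a2, a2_star, critical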
Both SciPy and Numpy have built in functions for singular value decomposition (SVD). The commands are basically scipy.linalg.svd and numpy.linalg.svd. What is the difference between these two? Is any of them better than the other one?
According to the FAQ page, the scipy.linalg submodule provides a more complete wrapper for the Fortran LAPACK library, whereas numpy.linalg tries to be buildable independently of LAPACK.
I did some benchmarks of the different implementations of the svd functions and found scipy.linalg.svd to be faster than its numpy counterpart.
However, jax's wrapped numpy, i.e. jax.numpy.linalg.svd, was even faster.
The full notebook for the benchmarks is available here.
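If you want to reproduce this kind of comparison yourself, a minimal timing sketch could look like the following (matrix size and repetition count are picked arbitrarily, and jax is only included if it is installed):

import timeit
import numpy as np
import scipy.linalg

a = np.random.rand(500, 500)

t_np = timeit.timeit(lambda: np.linalg.svd(a), number=10)
t_sp = timeit.timeit(lambda: scipy.linalg.svd(a), number=10)
print(f"numpy.linalg.svd: {t_np:.3f} s   scipy.linalg.svd: {t_sp:.3f} s")

# jax.numpy.linalg.svd, if jax is available; block_until_ready() forces the
# asynchronous computation to finish before the timer stops.
try:
    import jax.numpy as jnp
    aj = jnp.asarray(a)
    t_jax = timeit.timeit(lambda: jnp.linalg.svd(aj)[1].block_until_ready(), number=10)
    print(f"jax.numpy.linalg.svd: {t_jax:.3f} s")
except ImportError:
    pass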
Apart from the error checking, the actual work seems to be done within LAPACK, both with numpy and scipy.
Without having done any benchmarking, I guess the performance should be identical.
I've been using spatial.cKDTree in scipy to calculate distances between points. It has always run very quickly (~1 s) for my typical data sets (finding distances for ~1000 points to an array of ~1e6 points).
I'm running this code in Python 2.7.6 on a computer with Ubuntu 14.10. Up until this morning, I had managed most Python packages with apt-get, including scipy and numpy. I wanted up-to-date versions of a few packages though, so I decided to remove the packages installed in /usr/lib/python2.7/ by apt-get, and re-installed all packages with pip install (taking care of scipy dependencies like liblapack-dev with apt-get, as necessary). Everything installed and is importable without a problem.
>>> import scipy
>>> import cython
>>> scipy.__version__
'0.16.0'
>>> cython.__version__
'0.22.1'
Now, running spatial.cKDTree on the same size data sets is going really slowly. I'm seeing run time of ~500 s rather than ~1 s. I'm having trouble figuring out what is going on.
Any suggestions as to what I might have done in installing using pip rather than apt-get that would have caused scipy.spatial.cKDTree to run so slowly?
In 0.16.x I added options to build the cKDTree with median or sliding-midpoint rules, as well as options to choose whether to recompute the bounding hyperrectangle at each node in the kd-tree. The defaults are based on experience with the performance of scipy.spatial.cKDTree and sklearn.neighbors.KDTree. In some contrived cases (data that are highly stretched along one dimension) they can have a negative impact, but usually they should be faster. Experiment with building the cKDTree with balanced_tree=False and/or compact_nodes=False. Setting both to False gives you the same behavior as 0.15.x. Unfortunately, it is difficult to pick defaults that make everyone happy, because the performance depends on the data.
Also note that with balanced_tree=True we compute medians by quickselect when the kd-tree is constructed. If the data for some reason is pre-sorted, it will be very slow. In this case it will help to shuffle the rows of the input data. Or you can set balanced_tree=False to avoid the partial quicksorts.
There is also a new option to multithread the nearest-neighbor query. Try calling query with n_jobs=-1 and see if it helps.
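Concretely, the options mentioned above are passed like this (a sketch with made-up data; n_jobs is the keyword in the versions discussed here, newer SciPy releases call it workers):

import numpy as np
from scipy.spatial import cKDTree

data = np.random.rand(1000000, 3)
queries = np.random.rand(1000, 3)

np.random.shuffle(data)  # shuffling the rows helps if the data happens to be pre-sorted

# Same construction behavior as 0.15.x:
tree = cKDTree(data, balanced_tree=False, compact_nodes=False)

# Multithreaded nearest-neighbor query:
dist, idx = tree.query(queries, k=1, n_jobs=-1)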
Update June 2020:
SciPy 1.5.0 will use a new algorithm (an introselect-based partial sort from the C++ STL) which solves the problems reported here.
In the next release of SciPy, balanced kd-trees will be created with introselect instead of quickselect, which is much faster on structured datasets. If you use cKDTree on a structured data set such as an image or a grid, you can look forward to a major boost in performance. It is already available if you build SciPy from its master branch on GitHub.
Is there a good (small and light) alternative to numpy for python, to do linear algebra?
I only need matrices (multiplication, addition), inverses, transposes and such.
Why?
I am tired of trying to install numpy/scipy - it is such a pain to get it to work - it never seems to install correctly (especially since I have two machines, one Linux and one Windows), no matter what I do: compile it or install from pre-built binaries. How hard is it to make a "normal" installer that just works?
I'm surprised nobody has mentioned SymPy, which is written entirely in Python and does not require compilation the way NumPy does.
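A small, purely illustrative example of the kind of matrix work the question asks about:

from sympy import Matrix

a = Matrix([[1, 2], [3, 4]])
b = Matrix([[0, 1], [1, 0]])

print(a + b)      # addition
print(a * b)      # multiplication
print(a.T)        # transpose
print(a.det())    # determinant
print(a.inv())    # inverse (exact, using rationals)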
There is also tinynumpy, which is a pure python alternative to Numpy with limited features.
Given your question, I decided to factor out the matrix code from the project where I was using it and put it in a publicly accessible place -
So, this is basically a pure-Python, ad-hoc implementation of a Matrix class that can perform addition, multiplication, matrix determinants and matrix inversion - it should be of some use -
Since it is pure Python and not concerned with performance at all, it is unsuitable for any serious calculation - but it is good enough for playing around with matrices interactively, or wherever matrix algebra is far from being the critical part of the code.
The repository is here,
https://bitbucket.org/jsbueno/toymatrix/
And you can download it straight from here:
https://bitbucket.org/jsbueno/toymatrix/downloads/toymatrix_0.1.tar.gz
I hear you, I have been there as well. Numpy/scipy are really wonderful libraries and it is a pity that installation problems so often get in the way of their use.
Also, as far as I understand, there are not very many good (easier-to-use) options either. The only possibly easier solution I know about is the "Yet Another Matrix Module" (see the NumericAndScientific/Libraries listing on python.org). I am not aware of the status of this library (stability, speed, etc.). The likelihood is that in the long run your needs will outgrow any simple library and you will end up installing numpy anyway.
Another notable downside of using any other library is that your code will potentially be incompatible with numpy, which happens to be the de facto standard library for linear algebra in Python. Note also that numpy has been heavily optimized - speed is something you are not guaranteed to get with other libraries.
I would really just put more effort on solving the installation/setup problems. The alternatives are potentially much worse.
Have you ever tried anaconda? https://www.anaconda.com/download
It should allow you to install those packages easily.
conda install -c conda-forge scipy
conda install -c conda-forge numpy
Apart from offering an easy way to install them on Linux/macOS/Windows, you also get virtual environment management.
I sometimes have this problem. Not sure if this will work for you, but I often install a package under my own account, then try to run it in an IDE (Komodo in my case) and it doesn't work - like in your case, it says it cannot find it. The way I solve this is to use sudo -i to get a root shell and then install the package from there.
If that does not work, can you update your question with a bit more info about the type of system you're using (Linux, Mac, Windows), your Python/NumPy versions, and how you're accessing them? That will make it easier to help.
For people who still have the problem: try Portable Python:
http://portablepython.com/wiki/Download/
Have a look at tinynumpy, tinyarray and SymPy:
https://github.com/wadetb/tinynumpy
https://github.com/kwant-project/tinyarray
https://docs.sympy.org/latest/index.html