I'm trying to perform Fitted Value Iteration (FVI) in python (involving approximating a 5 dimensional function using piecewise linear interpolation).
scipy.interpolate.griddata works perfectly for this. However, I need to call the interpolation routine several thousand times (since FVI is a MC based algorithm).
So basically, the set of points where the function is known is static (and large - say 32k), but the points i need to approximate (which are small perturbations of the original set) is very large (32k x 5000 say).
Is there an implementation of what scipy.interpolate.griddata does that's been ported to CUDA?
alternatively, is there a way to speed up the calculation somehow?
Thanks.
For piece-wise linear interpolation, the docs say that scipy.interpolate.griddata uses the methods of scipy.interpolate.LinearNDInterpolator, which in turn uses qhull to do a Delaunay tesellation of the input points, then performs standard barycentric interpolation, where for each point you have to determine inside which hypertetrahedron each point is, then use its barycentric coordinates as the interpolation weights for the hypertetrahedron node values.
The tesellation is probably hard to parallelize, but you can access the CPU version with scipy.spatial.Delaunay. The other two steps are easily parallelized, although I don't know of any freely available implementation.
If your known-function points are on a regular grid, the method described here is specially easy to implement in CUDA, and I have worked with actual implementations of it, albeit none publicly available.
So I am afraid you are going to have to do most of the work yourself...
Related
Is there any place with a brief description of each of the algorithms for the parameter method in the minimize function of the lmfit package? Both there and in the documentation of SciPy there is no explanation about the details of each algorithm. Right now I know I can choose between them but I don't know which one to choose...
My current problem
I am using lmfit in Python to minimize a function. I want to minimize the function within a finite and predefined range where the function has the following characteristics:
It is almost zero everywhere, which makes it to be numerically identical to zero almost everywhere.
It has a very, very sharp peak in some point.
The peak can be anywhere within the region.
This makes many minimization algorithms to not work. Right now I am using a combination of the brute force method (method="brute") to find a point close to the peak and then feed this value to the Nelder-Mead algorithm (method="nelder") to finally perform the minimization. It is working approximately 50 % of the times, and the other 50 % of the times it fails to find the minimum. I wonder if there are better algorithms for cases like this one...
I think it is a fair point that docs for lmfit (such as https://lmfit.github.io/lmfit-py/fitting.html#fit-methods-table) and scipy.optimize (such as https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html#optimization-scipy-optimize) do not give detailed mathematical descriptions of the algorithms.
Then again, most of the docs for scipy, numpy, and related libraries describe how to use the methods, but do not describe in much mathematical detail how the algorithms work.
In fairness, the different optimization algorithms share many features and the differences between them can get pretty technical. All of these methods try to minimize some metric (often called "cost" or "residual") by changing the values of parameters for the supplied function.
It sort of takes a text book (or at least a Wikipedia page) to establish the concepts and mathematical terms used for these methods, and then a paper (or at least a Wikipedia page) to describe how each method differs from the others. So, I think the basic answer would be to look up the different methods.
Given a set of 2D points, I would like to fit the optimal spline to this data with a given number of internal knots.
I have seen that we can use scipy's LSQUnivariateSpline to specify the number and position of knots, however it does not allow us to only specify the number of knots.
From the UnivariateSpline documentation, it seems implied that they have a method for fitting the spline with a given number of knots, as the documentation for the smoothing factor s states (emphasis mine):
Positive smoothing factor used to choose the number of knots. Number
of knots will be increased until the smoothing condition is satisfied...
So while I could go about this in a kind of backwards way and search through smoothing factors until it yields a spline with the desired number of knots, this seems to be a rather ridiculous way to approach this from a computational efficiency standpoint. Two extra search steps are happening just to cancel each other out and obtain a result that was already computed directly at the start.
I've searched around but haven't found a function to access this spline interpolation with a given number of knots directly. I'm not sure if I've missed something simple, or if it's hidden deeper down somewhere and/or not available in the API.
Note: a scipy solution is not required, any python libraries or handcrafted python code is fine (I am using scipy here just because that's where all of my searches about spline interpolation in python have landed me).
Unfortunately, it looks like the UnivariateSpline constructor passes off the computational work to the function dfitpack.curf0, which is implemented in Fortran.
Therefore, although the documentation indicates that the smoothing requirement is met by adjusting the number of knots, there is no way to directly access the function which fits a spline given a number of knots from the python API.
In light of this, it looks like one may need to look to another library or write the algorithm oneself, if avoiding the roundabout double search method is desired. However, in many cases, it may be acceptable to simply run a binary search for the desired number of knots by adjusting the smoothing parameter.
Scipy does not have smoothing splines with a fixed number of knots. You either provide your knots, or let FITPACK select it via the smoothing condition knob.
I need to find the diameter of the points cloud (two points with maximum distance between them) in 3-dimensional space. As a temporary solution, right now I'm just iterating through all possible pairs and comparing the distance between them, which is a very slow, O(n^2) solution.
I believe it can be done in O(n log n). It's a fairly easy task in 2D (just find the convex hull and then apply the rotating calipers algorithm), but in 3D I can't imagine how to use rotating calipers, since there is no way to order the points.
Is there any simple way to do it (or ready-to-use implementation in python or C/C++)?
PS: There are similar questions on StackOverflow, but the answers that I found only refers to Rotating Calipers (or similar) algorithms, which works fine in 2D situation but not really clear how to implement in 3D (or higher dimensionals).
While O(n log n) expected time algorithms exist in 3d, they seem tricky to implement (while staying competitive to brute-force O(n^2) algorithms).
An algorithm is described in Har-Peled 2001. The authors provide a source code than can optionally be used for optimal computation. I was not able to download the latest version, the "old" version could be enough for your purpose, or you might want to contact the authors for the code.
An alternative approach is presented in Malandain & Boissonnat 2002 and the authors provide code. Altough this algorithm is presented as approximate in higher dimensions, it could fit your purpose. Note that their code provide an implementation of Har-Peled's method for exact computation that you might also check.
In any case, in a real-world usage you should always check that your algorithm remains competitive with respect to the naïve O(n^2) approach.
So, I have the following data I've plotted in Python.
The data is input for a forcing term in a system of differential equations I am working with. Thus, I need to fit a continuous function to this data so I will not have to deal with stability issues that could come with discontinuities of a step-wise function. Unfortunately, it's a pretty large data set.
I am trying to end up with a fitted function that is possible and not too tedious to translate into Stan, the language that I am coding the differential equations in, so was preferring something in piece-wise polynomial form with a maximum of just a few pieces that I can manually code.
I started off with polyfit from numpy, which was not very good. Using UnivariateSpline from scipy gave me a decent fit, but it did not give me something that looked tractable for translation into Stan. Hence, I was looking for suggestions into other fits I could try that would return functions that are more easily translatable into other languages? Looking at the shape of my data, is there a periodic spline fit that could be useful?
The UnivariateSpline object has get_knots and get_coeffs methods. They give you the knots and coefficients of the fit in the b-spline basis.
An alternative, equivalent, way is to use splrep for fitting (and splev for evaluations).
To convert to a piecewise polynomial representation, use PPoly.from_spline (check the docs for the latter for the exact format)
If what you want is a Fourier space representation, you can use leastsq or least_squares. It'd be essential to provide sensible starting values for NLSQ fit parameters. At least I'd start from e.g. max-to-max distance estimate for the period and max-to-min estimate for the amplitude.
As always with non-linear fitting, YMMV, however.
From the direction field, it seems that a fit involving the sum of or composition of multiple sinusoidal functions might be it.
Ex: sin(cos(2x)), sin(x)+2cos(x), etc.
I would use Wolfram Alpha, Mathematica, or Matlab to create direction fields.
I'm trying to use KernelPCA for reducing the dimensionality of a dataset to 2D (both for visualization purposes and for further data analysis).
I experimented computing KernelPCA using a RBF kernel at various values of Gamma, but the result is unstable:
(each frame is a slightly different value of Gamma, where Gamma is varying continuously from 0 to 1)
Looks like it is not deterministic.
Is there a way to stabilize it/make it deterministic?
Code used to generate transformed data:
def pca(X, gamma1):
kpca = KernelPCA(kernel="rbf", fit_inverse_transform=True, gamma=gamma1)
X_kpca = kpca.fit_transform(X)
#X_back = kpca.inverse_transform(X_kpca)
return X_kpca
KernelPCA should be deterministic and evolve continuously with gamma. It is different from RBFSampler that does have built-in randomness in order to provide an efficient (more scalable) approximation of the RBF kernel.
However what can change in KernelPCA is the order of the principal components: in scikit-learn they are returned sorted in order of descending eigenvalue, so if you have 2 eigenvalues close to each other it could be that the order changes with gamma.
My guess (from the gif) is that this is what is happening here: the axes along which you are plotting are not constant so your data seems to jump around.
Could you provide the code you used to produce the gif?
I'm guessing it is a plot of the data points along the 2 first principal components but it would help to see how you produced it.
You could try to further inspect it by looking at the values of kpca.alphas_ (the eigenvectors) for each value of gamma.
Hope this makes sense.
EDIT: As you noted it looks like the points are reflected against the axis, the most plausible explanation is that one of the eigenvector flips sign (note this does not affect the eigenvalue).
I put in a simple gist to reproduce the issue (you'll need a Jupyter notebook to run it). You can see the sign-flipping when you change the value of gamma.
As a complement note that this kind of discrepancy happens only because you fit several times the KernelPCA object several times. Once you settled with a particular gamma value and you've fit kpca once you can call transform several times and get consistent results.
For the classical PCA the docs mention that:
Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
I don't know about the behavior of a single KernelPCA object that you would fit several times (I did not find anything relevant in the docs).
It does not apply to your case though as you have to fit the object with several gamma values.
So... I can't give you a definitive answer on why KernelPCA is not deterministic. The behavior resembles the differences I've observed between the results of PCA and RandomizedPCA. PCA is deterministic, but RandomizedPCA is not, and sometimes the eigenvectors are flipped in sign relative to the PCA eigenvectors.
That leads me to my vague idea of how you might get more deterministic results....maybe. Use RBFSampler with a fixed seed:
def pca(X, gamma1):
kernvals = RBFSampler(gamma=gamma1, random_state=0).fit_transform(X)
kpca = PCA().fit_transform(X)
X_kpca = kpca.fit_transform(X)
return X_kpca