I need to approximate (i.e. fit with a Gaussian function) 2- or 3-dimensional data sets using Python, but I have only found interpolation methods. Has anyone heard of a library that can do that?
I am trying to implement a discrete wavelet transform (DWT) in 3D, and I have found the MATLAB equivalent, wavedec3. Does anyone know if there is a Python equivalent I can use rather than going ahead and writing my own?
I used pywt's wavedecn(array, wavelet, level) and it does what I wanted: a multi-level discrete wavelet transform on a 3D array.
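For anyone searching later, a minimal sketch of that call (the wavelet name, level, and array shape below are just example choices):

    import numpy as np
    import pywt

    # A small 3D volume to decompose (random data just for illustration).
    volume = np.random.rand(64, 64, 64)

    # Multi-level n-dimensional DWT: returns [approximation, {level-N details}, ..., {level-1 details}]
    coeffs = pywt.wavedecn(volume, wavelet='db1', level=2)

    # Reconstruct the volume from the coefficients to verify the round trip.
    reconstructed = pywt.waverecn(coeffs, wavelet='db1')
    print(np.allclose(volume, reconstructed))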
I am looking for a Scala implementation of Python's sklearn.preprocessing.QuantileTransformer class. There doesn't seem to be a single class that implements the entire functionality in Scala.
The Python implementation has 3 major parts:
1) Compute quantiles for the given data and percentile array using numpy.percentile(). If a quantile lies between two input data points, linear interpolation is used. The closest I can find in Scala is in breeze, which has a percentile() function. (Observation: DataFrame.stat.approxQuantile() does not perform the linear interpolation and thus can't be used here.)
2) Use numpy.interp() to convert the input range of values to a given range. E.g. if the input data range is 1-100, it can be converted to any given range, say 0-1. Again, this uses linear interpolation when an input value falls between two quantiles. The closest I can find in Scala is the breeze.interpolation class.
3) Calculate the inverse CDF using scipy.stats.norm.ppf() (numpy itself has no ppf()). I believe for this I can use the NormalDistribution class, as one answer below suggests, or the StandardScaler class.
Anything better to make the coding short and simple?
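For reference, here is a minimal Python sketch of the three steps I am trying to port (using scipy.stats.norm.ppf for step 3; the data and percentile grid are just examples):

    import numpy as np
    from scipy import stats

    data = np.array([3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.3])   # example 1D input
    quantile_levels = np.linspace(0, 100, 11)               # percentile grid, e.g. deciles

    # 1) Quantiles of the data, with linear interpolation between data points.
    quantiles = np.percentile(data, quantile_levels)

    # 2) Map each value onto [0, 1] by interpolating its position among the quantiles.
    uniform = np.interp(data, quantiles, quantile_levels / 100.0)

    # 3) Inverse CDF of the standard normal to get normally distributed output.
    normal_output = stats.norm.ppf(np.clip(uniform, 1e-7, 1 - 1e-7))
    print(normal_output)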
The Apache Commons Math library has a NormalDistribution class, which has an inverseCumulativeProbability method that calculates the specified quantile value. That should suit your purposes.
I used the sklearn clustering algorithm DBSCAN to get clusters of my data.
Data: non-geometrical objects based on hexadecimal strings
I used a simple distance function to create a distance matrix as input for DBSCAN, resulting in the expected clusters.
Question: Is it possible to create a plot of these cluster results like in the demo?
I haven't found a solution through searching.
I need to graphically demonstrate the similarities of the objects and clusters to each other.
Since I am using Python for everything in this project, I would appreciate a solution in Python.
I don't use Python, so I cannot give you example code.
If your data isn't 2-dimensional, you can try to find a good 2-dimensional approximation using multidimensional scaling (MDS).
Essentially, it takes an input distance matrix (which should satisfy the triangle inequality, and ideally be derived from a Euclidean distance in some vector space, although you can often get good results even when this does not strictly hold). It then tries to find the 2-dimensional data set that best preserves those distances.
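On the Python side, the idea would look roughly like this with scikit-learn's MDS on a precomputed dissimilarity matrix (a sketch only; the matrix and labels below are random placeholders standing in for the DBSCAN inputs and outputs):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import MDS

    # Placeholder symmetric distance matrix and DBSCAN labels.
    distance_matrix = np.random.rand(20, 20)
    distance_matrix = (distance_matrix + distance_matrix.T) / 2.0
    np.fill_diagonal(distance_matrix, 0.0)
    labels = np.random.randint(0, 3, size=20)

    # MDS with a precomputed dissimilarity matrix finds 2D coordinates
    # whose pairwise distances approximate the input distances.
    coords = MDS(n_components=2, dissimilarity='precomputed').fit_transform(distance_matrix)

    plt.scatter(coords[:, 0], coords[:, 1], c=labels)
    plt.show()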
I'm trying to perform Fitted Value Iteration (FVI) in Python (this involves approximating a 5-dimensional function using piecewise linear interpolation).
scipy.interpolate.griddata works perfectly for this. However, I need to call the interpolation routine several thousand times (since FVI is a Monte Carlo based algorithm).
So basically, the set of points where the function is known is static (and large, say 32k), but the set of points I need to approximate (small perturbations of the original set) is very large (say 32k x 5000).
Is there an implementation of what scipy.interpolate.griddata does that's been ported to CUDA?
Alternatively, is there a way to speed up the calculation somehow?
Thanks.
For piecewise linear interpolation, the docs say that scipy.interpolate.griddata uses the methods of scipy.interpolate.LinearNDInterpolator, which in turn uses qhull to compute a Delaunay tessellation of the input points and then performs standard barycentric interpolation: for each query point you determine inside which hypertetrahedron it lies, then use its barycentric coordinates as the interpolation weights for that hypertetrahedron's node values.
The tessellation is probably hard to parallelize, but you can access the CPU version through scipy.spatial.Delaunay. The other two steps are easy to parallelize, although I don't know of any freely available implementation.
If your known-function points are on a regular grid, the method described here is especially easy to implement in CUDA; I have worked with actual implementations of it, albeit none publicly available.
So I am afraid you are going to have to do most of the work yourself...
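One CPU-side speed-up worth sketching (not a CUDA port): since griddata re-tessellates on every call, you can compute the Delaunay tessellation once and reuse it via LinearNDInterpolator across all your query batches. The sizes below are shrunk placeholders for the 32k x 5 problem:

    import numpy as np
    from scipy.spatial import Delaunay
    from scipy.interpolate import LinearNDInterpolator

    # Known points and their function values (small random placeholders;
    # the real problem is ~32k points in 5D).
    points = np.random.rand(200, 5)
    values = np.random.rand(200)

    # Build the tessellation once; this is the expensive, hard-to-parallelize step.
    tri = Delaunay(points)
    interp = LinearNDInterpolator(tri, values)

    # Reuse the same interpolator for every batch of perturbed query points,
    # instead of calling griddata (which re-tessellates) on each call.
    for _ in range(10):
        queries = points + 0.001 * np.random.randn(*points.shape)
        result = interp(queries)   # NaN for queries outside the convex hull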
Given a 1D array of values, what is the simplest way to figure out the best-fit bimodal distribution, where each 'mode' is a normal distribution? In other words, how can you find the combination of two normal distributions that best reproduces the 1D array of values?
Specifically, I'm interested in implementing this in python, but answers don't have to be language specific.
Thanks!
What you are trying to do is fit a Gaussian mixture model. The standard approach to solving this is Expectation Maximization; scipy SVN includes a machine-learning section called scikits with an EM implementation. I use it a fair bit.
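The same EM-based fit is available today in scikit-learn; a minimal sketch with sklearn.mixture.GaussianMixture (the sample data below is just a placeholder for your 1D array):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Example 1D data drawn from two normals.
    values = np.concatenate([np.random.normal(-2.0, 0.5, 300),
                             np.random.normal(3.0, 1.0, 700)])

    # EM fit of a two-component Gaussian mixture; scikit-learn expects a 2D array.
    gmm = GaussianMixture(n_components=2).fit(values.reshape(-1, 1))

    print(gmm.weights_)                        # mixing proportions of the two modes
    print(gmm.means_.ravel())                  # estimated means
    print(np.sqrt(gmm.covariances_.ravel()))   # estimated standard deviations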
I suggest using the awesome scipy package.
It provides a few methods for optimisation.
There's a big fat caveat with simply applying a pre-defined least-squares fit or something along those lines (see the curve_fit sketch after the list below).
Here are a few problems you will run into:
Noise larger than the second peak, or than both peaks.
Partial peak - your data is cut off at one of the borders.
Sampling - the peaks are narrower than your sampling resolution.
It isn't actually normal - you'll still get some result...
Overlap - if the peaks overlap, you'll often find that one peak is fitted correctly but the second approaches zero...
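With those caveats in mind, a minimal least-squares sketch using scipy.optimize.curve_fit on a histogram of the values (the data and initial guesses are placeholders, and the initial guesses matter a lot for the overlap and noise cases above):

    import numpy as np
    from scipy.optimize import curve_fit

    def two_gaussians(x, a1, mu1, sigma1, a2, mu2, sigma2):
        """Sum of two Gaussian peaks."""
        return (a1 * np.exp(-(x - mu1) ** 2 / (2 * sigma1 ** 2)) +
                a2 * np.exp(-(x - mu2) ** 2 / (2 * sigma2 ** 2)))

    # Histogram the 1D values so we have (x, y) pairs to fit against.
    data = np.concatenate([np.random.normal(-1.0, 0.4, 500),
                           np.random.normal(2.0, 0.8, 500)])
    counts, edges = np.histogram(data, bins=50)
    centers = (edges[:-1] + edges[1:]) / 2.0

    # Rough initial guesses: amplitudes, means, and widths of the two peaks.
    p0 = [counts.max(), -1.0, 0.5, counts.max(), 2.0, 0.5]
    params, _ = curve_fit(two_gaussians, centers, counts, p0=p0)
    print(params)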