Show uncertainty about isocontour in Matplotlib - python

I'm currently working with uncertainty visualization, using Python / Matplotlib, and I would like to be able to display the uncertainty about an isocontour in 2D. I will try to be more specific:
As input, I have N realizations of a scalar field.
These realizations are simulations of the same experiment.
For each of these N realizations, I have an isocontour.
Given that, I know that I can display the contour using the plt.contour function.
However, what I would like to do is to map each possible contour (by level) to a color map showing its probability of happening, something like its positional uncertainty. I.e, each one of the contour levels would be mapped to a color to represent its probability of happening.
It is possible to do that somehow? I don't know how I could verify if, for each contour at level b through n realizations, they are the same or not.

Related

Python distribution statistics on scatter plot style data

I'm trying to get statistics on a distribution but all the libraries I've seen require the input to be in histogram style. That is, with a huge long array of numbers like what plt.hist wants as an input.
I have the bar chart equivalent, i.e. 2 arrays; one with the x-axis centre points, and one with y-axis values for the corresponding value of each point. The plot looks like this:
My question is how can I apply statistics such as mean, range, skewness and kurtosis on this dataset. The numbers are not always integers. It seems very inefficient to force python to make a histogram style array with, for example, 180x 0.125's, 570x 0.25's e.t.c. as in the figure above.
Doing mean on the current array I have will give me the average frequency of all sizes, i.e. plotting a horizontal line on the figure above. I'd like a vertical line to show the average, as if it were a distribution.
Feels like there should be an easy solution! Thanks in advance.

Python adaptive histogram widths

I am currently working on a project where I have to bin up to 10-dimensional data. This works totally fine with numpy.histogramdd, however with one have a serious obstacle:
My parameter space is pretty large, but only a fraction is actually inhabited by data (say, maybe a few % or so...). In these regions, the data is quite rich, so I would like to use relatively small bin widths. The problem here, however, is that the RAM usage totally explodes. I see usage of 20GB+ for only 5 dimensions which is already absolutely not practical. I tried defining the grid myself, but the problem persists...
My idea would be to manually specify the bin edges, where I just use very large bin widths for empty regions in the data space. Only in regions where I actually have data, I would need to go to a finer scale.
I was wondering if anyone here knows of such an implementation already which works in arbitrary numbers of dimensions.
thanks 😊
I think you should first remap your data, then create the histogram, and then interpret the histogram knowing the values have been transformed. One possibility would be to tweak the histogram tick labels so that they display mapped values.
One possible way of doing it, for example, would be:
Sort one dimension of data as an unidimensional array;
Integrate this array, so you have a cumulative distribution;
Find the steepest part of this distribution, and choose a horizontal interval corresponding to a "good" bin size for the peak of your histogram - that is, a size that gives you good resolution;
Find the size of this same interval along the vertical axis. That will give you a bin size to apply along the vertical axis;
Create the bins using the vertical span of that bin - that is, "draw" horizontal, equidistant lines to create your bins, instead of the most common way of drawing vertical ones;
That way, you'll have lots of bins where data is more dense, and lesser bins where data is more sparse.
Two things to consider:
The mapping function is the cumulative distribution of the sorted values along that dimension. This can be quite arbitrary. If the distribution resembles some well known algebraic function, you could define it mathematically and use it to perform a two-way transform between actual value data and "adaptive" histogram data;
This applies to only one dimension. Care must be taken as how this would work if the histograms from multiple dimensions are to be combined.

Tricontour with triangle values

I would like to produce a tricontour plot similar to these with matplotlib. The difference between these examples and my situation is that I don't have the values of my function in the grid points: they are defined in my triangles (e.g. in the centroid of each triangle).
I would like to plot the result of a finite volume simulation, where the values are defined for each control volume, not for each grid point.
I suppose one simple solution would be to average the values at each grid point. I would like to know if there are any more direct solutions.
Maybe not exactly what you are looking for, but tripcolor function is designed for this use case (value defined at triangle centroid)
See for instance:
http://matplotlib.org/examples/pylab_examples/tripcolor_demo.html

How interpolate 3D coordinates

I have data points in x,y,z format. They form a point cloud of a closed manifold. How can I interpolate them using R-Project or Python? (Like polynomial splines)
It depends on what the points originally represented. Just having an array of points is generally not enough to derive the original manifold from. You need to know which points go together.
The most common low-level boundary representation ("brep") is a bunch of triangles. This is e.g. what OpenGL and Directx get as input. I've written a Python software that can convert triangular meshes in STL format to e.g. a PDF image. Maybe you can adapt that to for your purpose. Interpolating a triangle is usually not necessary, but rather trivail to do. Create three new points each halfway between two original point. These three points form an inner triangle, and the rest of the surface forms three triangles. So with this you have transformed one triangle into four triangles.
If the points are control points for spline surface patches (like NURBS, or Bézier surfaces), you have to know which points together form a patch. Since these are parametric surfaces, once you know the control points, all the points on the surface can be determined. Below is the function for a Bézier surface. The parameters u and v are the the parametric coordinates of the surface. They run from 0 to 1 along two adjecent edges of the patch. The control points are k_ij.
The B functions are weight functions for each control point;
Suppose you want to approximate a Bézier surface by a grid of 10x10 points. To do that you have to evaluate the function p for u and v running from 0 to 1 in 10 steps (generating the steps is easily done with numpy.linspace).
For each (u,v) pair, p returns a 3D point.
If you want to visualise these points, you could use mplot3d from matplotlib.
By "compact manifold" do you mean a lower dimensional function like a trajectory or a surface that is embedded in 3d? You have several alternatives for the surface-problem in R depending on how "parametric" or "non-parametric" you want to be. Regression splines of various sorts could be applied within the framework of estimating mean f(x,y) and if these values were "tightly" spaced you may get a relatively accurate and simple summary estimate. There are several non-parametric methods such as found in packages 'locfit', 'akima' and 'mgcv'. (I'm not really sure how I would go about statistically estimating a 1-d manifold in 3-space.)
Edit: But if I did want to see a 3D distribution and get an idea of whether is was a parametric curve or trajectory, I would reach for package:rgl and just plot it in a rotatable 3D frame.
If you are instead trying to form the convex hull (for which the word interpolate is probably the wrong choice), then I know there are 2-d solutions and suspect that searching would find 3-d solutions as well. Constructing the right search strategy will depend on specifics whose absence the 2 comments so far reflects. I'm speculating that attempting to model lower and higher order statistics like the 1st and 99th percentile as a function of (x,y) could be attempted if you wanted to use a regression effort to create boundaries. There is a quantile regression package, 'rq' by Roger Koenker that is well supported.

Matplotlib density plot with distinct lines

I would like to create a density plot in a 2D parameter space. However, the sample consists of distinct solutions which form lines in the parameter space such that putting everything into a matrix and using imshow is not desirable because of the pixelation artefacts (Figure 1).
I have tried to plot each distinct solution as a line with opacity set to the probability it corresponds to but the blending of the different lines does not seem to be additive (i.e. the location where all lines overlap is not black). See Figure 2.
Opacity is not additive; if you overlap two objects with opacity 0.5 you won't get a region of black (opacity 1.0).
Additionally, the point at which all your lines overlap may be smaller than 1 pixel in size, which will allow the surrounding colors to bleed in due to antialiasing.
I think the pixelated version is the most accurate solution, unless you feel like rendering all the different intersecting regions as shapes and coloring them manually.

Categories