I would like to create a density plot in a 2D parameter space. However, the sample consists of distinct solutions which form lines in the parameter space such that putting everything into a matrix and using imshow is not desirable because of the pixelation artefacts (Figure 1).
I have tried plotting each distinct solution as a line with its opacity set to the probability it corresponds to, but the blending of the different lines does not seem to be additive (i.e. the location where all lines overlap is not black). See Figure 2.
Opacity is not additive; if you overlap two objects with opacity 0.5 you won't get a region of black (opacity 1.0).
Additionally, the point at which all your lines overlap may be smaller than 1 pixel in size, which will allow the surrounding colors to bleed in due to antialiasing.
I think the pixelated version is the most accurate solution, unless you feel like rendering all the different intersecting regions as shapes and coloring them manually.
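The non-additive behaviour above follows from standard "over" alpha compositing, which is what renderers like matplotlib's Agg backend use. A minimal sketch of the arithmetic:

```python
# Standard "over" alpha compositing: stacking a layer with opacity a2
# on top of a layer with opacity a1 yields less than a1 + a2.
def combined_alpha(a1, a2):
    return a1 + a2 * (1.0 - a1)

print(combined_alpha(0.5, 0.5))  # 0.75 -- two half-opaque layers are not opaque
```

So no matter how many 0.5-opacity lines you stack, the result only approaches full opacity asymptotically and never reaches black.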
I'm using Python, and I have some data which can be projected on to points within an equilateral triangle whose side lengths sum to 1.
I'm not sure if there is an easy way to visualise a plot like this with Matplotlib or similar libraries, or if I'm just going to have to use a drawing package from scratch to make it happen. Any pointers gratefully received. Thanks!
If all you want to do is plot a few dots on a graph, you can in fact use Matplotlib's scatter plots for this:
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html
Using plt.xlim(min, max) and plt.ylim(min, max), you can set the limits of the graph manually to fit your values (this might not be necessary, though).
You can draw lines on the graph for the triangle shape (if you need that): How to draw a line with matplotlib?
If you don't need a scale, you can even remove that to have a blank canvas: Matplotlib plots: removing axis, legends and white spaces
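Putting the pieces above together, here is a minimal sketch with made-up barycentric data (each row sums to 1): project onto the corners of an equilateral triangle, scatter the points, draw the triangle outline, and hide the axes:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this for interactive use
import matplotlib.pyplot as plt

# Hypothetical data: each row (a, b, c) is a point with a + b + c = 1
pts = np.array([[0.2, 0.3, 0.5],
                [0.6, 0.2, 0.2],
                [1/3, 1/3, 1/3]])

# Corners of a unit equilateral triangle
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])

xy = pts @ corners  # barycentric -> Cartesian projection

fig, ax = plt.subplots()
ax.scatter(xy[:, 0], xy[:, 1])
outline = np.vstack([corners, corners[0]])   # close the triangle
ax.plot(outline[:, 0], outline[:, 1], 'k-')
ax.set_axis_off()  # blank canvas, no scale
ax.set_aspect('equal')
```

The centroid point (1/3, 1/3, 1/3) lands in the middle of the triangle, as expected.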
I am plotting volumetric distributions with x, y and z axes (particle size, particle shape, and volume (%) as a colour scale on the z-axis). Upon exporting the figure as SVG, white space appears between the plotted rectangles. I therefore tried setting the edgecolor of the PatchCollection. This works; however, the edge colours have a different clim, so the colour of the edges differs from that of the patches themselves (see example figure). Code for setting the PatchCollection:
p = PatchCollection(rects, cmap=matplotlib.cm.viridis)
colors = curem  # colour of rectangles based on volume of the size-shape bins of curem
p.set_array(np.array(colors))
p.set_edgecolor(matplotlib.cm.viridis(np.array(colors)))
p.set_clim([0, math.ceil(np.max(ems[ii]))])  # set maximum to max Rsqclass rounded up to next 1/5th
axs[zz].add_collection(p)  # add all the rectangles to the figure
So how do I go about setting the color limits for the edges? Thanks in advance, Hans
After trying out several of Diziet Asahi's proposed solutions, I found that for this specific case, the following works:
p.set_edgecolor('face') # edgecolor equal to facecolor
p.set_linewidth(0.5)
The edgecolor is now equal to the facecolor. A very small linewidth (e.g. the suggested 0.000000000001) does not work: white lines still appear when the SVG is viewed in Microsoft Internet Explorer or Inkscape. A linewidth of 0.5 gets rid of the white space. However, it should be noted that the required linewidth depends on the level of zoom. E.g., at 400% zoom a linewidth of 0.1 is sufficient, while at 30% zoom a linewidth of 0.5 is still insufficient.
One further note: the above solution is specific to svg vector graphics. When stored as .pdf, the solution that Diziet linked to actually does work (set linewidth to an infinitesimally small value).
There seems to be a general problem with undesired line strokes showing up in vector backends, as per this issue on matplotlib's GitHub.
A proposed temporary fix is to use very small, but non-zero linewidths:
either pass linewidths=0.000000000001 to the constructor
p = PatchCollection(rects,cmap=matplotlib.cm.viridis, linewidths=0.000000000001)
or use
p.set_linewidth(0.000000000001)
Consider a Manhattan plot for a genome-wide association study. The density of dots at the bottom of the plot is very high -- individual points are no longer visible. I'd like to skip plotting points that completely overlap with other points (even though their (x, y) coordinates are not identical) to reduce the plotting time and the size of the exported PDF. Any recipes for achieving this? Collision detection? Subsampling?
I'd like to use matplotlib, though this requirement is optional. Ideally, the output should be visually identical to the "full" plot.
Some background info on the plot type:
https://en.wikipedia.org/wiki/Manhattan_plot
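One way the overlap-skipping idea could be sketched (an assumption, not an established recipe): snap every point to a hypothetical output pixel grid and keep only one point per occupied cell. At that resolution the result is visually indistinguishable from the full plot, but far fewer points reach the renderer:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for GWAS data: a million positions with p-value-like heights
x = rng.uniform(0.0, 1.0, 1_000_000)
y = rng.exponential(1.0, 1_000_000)

# Assumed output resolution in pixels
px_w, px_h = 2000, 800

# Index of the pixel cell each point falls into
ix = np.floor((x - x.min()) / (x.max() - x.min()) * px_w).astype(np.int64)
iy = np.floor((y - y.min()) / (y.max() - y.min()) * px_h).astype(np.int64)

# Keep the first point in each occupied cell
keep = np.unique(ix * (px_h + 1) + iy, return_index=True)[1]
x_small, y_small = x[keep], y[keep]  # plot these instead of the full arrays
```

The number of surviving points is bounded by the number of pixel cells, independent of how many raw points you start with.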
I am currently working on a project where I have to bin up to 10-dimensional data. This works fine with numpy.histogramdd; however, I have hit one serious obstacle:
My parameter space is pretty large, but only a fraction of it is actually inhabited by data (maybe a few per cent or so...). In those regions the data is quite rich, so I would like to use relatively small bin widths. The problem, however, is that the RAM usage totally explodes: I see 20 GB+ for only 5 dimensions, which is already completely impractical. I tried defining the grid myself, but the problem persists...
My idea would be to manually specify the bin edges, using very large bin widths for empty regions of the data space. Only in regions where I actually have data would I need to go to a finer scale.
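A minimal sketch of that idea with made-up 2-D data (the values and edge positions are illustrative assumptions): numpy.histogramdd accepts explicit, nonuniform edges per axis, so fine bins can be confined to the populated region while a single coarse bin covers each empty flank:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sample: a dense cluster near the origin inside a huge domain
data = rng.normal(0.0, 0.05, size=(10_000, 2))

# Fine edges only where the data lives, one coarse bin for the empty space
fine = np.linspace(-0.2, 0.2, 41)                # 0.01-wide bins in the dense region
edges = np.concatenate([[-10.0], fine, [10.0]])  # 43 edges -> 42 bins per axis

hist, _ = np.histogramdd(data, bins=[edges, edges])
print(hist.shape)  # (42, 42) instead of the (2000, 2000) a uniform 0.01 grid would need
```

The memory footprint scales with the product of the per-axis bin counts, so shrinking that count per dimension pays off exponentially in 10 dimensions.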
I was wondering if anyone here knows of such an implementation already which works in arbitrary numbers of dimensions.
thanks 😊
I think you should first remap your data, then create the histogram, and then interpret the histogram knowing the values have been transformed. One possibility would be to tweak the histogram tick labels so that they display mapped values.
One possible way of doing it, for example, would be:
Sort one dimension of data as an unidimensional array;
Integrate this array, so you have a cumulative distribution;
Find the steepest part of this distribution, and choose a horizontal interval corresponding to a "good" bin size for the peak of your histogram - that is, a size that gives you good resolution;
Find the size of this same interval along the vertical axis. That will give you a bin size to apply along the vertical axis;
Create the bins using the vertical span of that bin - that is, "draw" horizontal, equidistant lines to create your bins, instead of the most common way of drawing vertical ones;
That way, you'll have lots of bins where the data is denser, and fewer bins where it is sparser.
Two things to consider:
The mapping function is the cumulative distribution of the sorted values along that dimension. This can be quite arbitrary. If the distribution resembles some well known algebraic function, you could define it mathematically and use it to perform a two-way transform between actual value data and "adaptive" histogram data;
This applies to only one dimension. Care must be taken as how this would work if the histograms from multiple dimensions are to be combined.
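The steps above can be sketched in one dimension (with made-up data). Equidistant cuts along the cumulative distribution are exactly quantiles, so numpy.quantile gives the "horizontal, equidistant lines" directly:

```python
import numpy as np

rng = np.random.default_rng(0)
# One dimension of the data: a dense peak plus a sparse background
data = np.concatenate([rng.normal(0.0, 0.1, 10_000),
                       rng.uniform(-5.0, 5.0, 500)])

n_bins = 20
# Equidistant cuts along the cumulative distribution = quantiles,
# i.e. the "horizontal, equidistant lines" from the recipe above
edges = np.quantile(data, np.linspace(0.0, 1.0, n_bins + 1))

counts, _ = np.histogram(data, bins=edges)
# Each bin now holds roughly the same number of points: narrow bins at
# the peak, wide bins in the sparse tails
```

The resulting edges are narrow where the distribution is steep and wide where it is flat, which is the adaptive binning described above.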
I am using Python to plot points. The plot shows the relationship between area and the number of points of interest (POIs) in that area. I have 3000 area values and 3000 POI counts.
Now the plot looks like this:
The problem is that, at the lower left, the points severely overlap each other, so it is hard to extract much information. Most areas are not that big, and they don't have many POIs.
I want to make a plot with little overlapping. I am wondering whether I can use an unevenly distributed axis or a histogram to make a better plot. Can anyone help me?
I would suggest using a logarithmic scale for the y axis. You can either use pyplot.semilogy(...) or pyplot.yscale('log') (http://matplotlib.org/api/pyplot_api.html).
Note that points where area <= 0 will not be rendered.
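A minimal sketch with synthetic, skewed data standing in for the area/POI values (I have also made the x axis logarithmic and the markers small and translucent, which are additional assumptions beyond the answer above):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical area / POI-count data, heavily skewed toward small values
area = rng.lognormal(0.0, 1.5, 3000)
pois = np.rint(rng.lognormal(1.0, 1.0, 3000)) + 1

fig, ax = plt.subplots()
ax.scatter(area, pois, s=5, alpha=0.3)  # small, translucent markers
ax.set_yscale('log')  # spreads out the crowded low end
ax.set_xscale('log')  # a log x axis often helps for skewed area data too
```

With both axes logarithmic, the cluster of small areas with few POIs spreads across the plot instead of piling up in one corner.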
I think we have two major choices here: first, adjusting this plot; second, displaying your data in another type of plot.
For the first option, I would suggest clipping the boundaries. You have plenty of space around the borders; if you limit the plot to the data's extent, it will scale better. On top of that, you can plot the points with smaller dots, so that they overlap less.
The second option would be to display the data in a different view, such as a histogram. This might give better insight into the distribution of your data among bins, but it is a completely different type of view compared to the original plot.
I would suggest first adjusting the plot by limiting its boundaries to the data points, so that the plot area has enough room to scale the data, and trying histograms later. But as I mentioned, these are two different approaches and give different insights about your data.
For adjusting you might try this:
x1, x2, y1, y2 = plt.axis()  # current axis limits
plt.axis((x1, x2, y1, y2))   # shrink these values to clip the empty space
You would need to adjust the axis variables yourself; as written, this is a no-op. Note that there are almost certainly better options (for instance plt.autoscale(tight=True)), but this was the first thing that came to mind.