Consider a Manhattan plot for a genome-wide association study. The density of dots at the bottom of the plot is very high -- individual points are no longer visible. I'd like to skip plotting the points that completely overlap with other points (even though their x,y is not identical) to reduce the plotting time and the size of the exported PDF. Any recipes for achieving this? Collision detection? Subsampling?
I'd like to use matplotlib, though this requirement is optional. Ideally, the output should be visually identical to the "full" plot.
Some background info on the plot type:
https://en.wikipedia.org/wiki/Manhattan_plot
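One way to sketch the "skip fully overlapping points" idea is to snap every point to a pixel-sized grid and keep a single point per occupied cell. The helper name thin_to_pixel_grid, the grid size and the synthetic data below are assumptions made for illustration, not part of the question. The result only approximates the full plot, and the grid cells should be no larger than the marker size.

```python
import numpy as np

def thin_to_pixel_grid(x, y, width_px=2000, height_px=1000):
    """Keep one point per occupied cell of a width_px x height_px grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Guard against a zero span when all values on an axis are equal.
    x_span = (x.max() - x.min()) or 1.0
    y_span = (y.max() - y.min()) or 1.0
    # Map each coordinate to an integer cell index along its axis.
    col = np.floor((x - x.min()) / x_span * (width_px - 1)).astype(np.int64)
    row = np.floor((y - y.min()) / y_span * (height_px - 1)).astype(np.int64)
    # Combine row and column into one key and keep the first point per key.
    _, keep = np.unique(row * width_px + col, return_index=True)
    keep.sort()  # preserve the original point order
    return x[keep], y[keep]

# Synthetic stand-in for a Manhattan plot: positions and -log10(p) values.
rng = np.random.default_rng(0)
pos = rng.uniform(0, 3e9, 200_000)
neg_log_p = rng.exponential(1.0, 200_000)
x_thin, y_thin = thin_to_pixel_grid(pos, neg_log_p, 400, 200)
```

The thinned arrays can then be passed to plt.scatter as usual. If file size is the main concern, passing rasterized=True to scatter is another option, since it embeds the points as a bitmap inside the PDF while keeping axes and text as vectors.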
I need to draw additional graphics on top of plotly go.Box traces, therefore I need to know X and Y coordinates for boxplot rectangle vertices. So far the only solution I came up with is basically recalculating everything (quartiles; X positions based on boxgap, boxgroupgap, etc.), then manually setting the y-axis range to know where everything will end up on the plot. This seems very cumbersome.
Is there a way in python to get the coordinates of go.Box boxplot elements, especially the grouped boxplots with categorical x-axis? As far as I understand these coordinates are calculated in JS frontend -- maybe there is some trick to get them back with Dash using callbacks?
I'm using Python, and I have some data which can be projected on to points within an equilateral triangle whose side lengths sum to 1.
I'm not sure if there is an easy way to visualise a plot like this with Matplotlib or similar libraries, or if I'm just going to have to use a drawing package from scratch to make it happen. Any pointers gratefully received. Thanks!
If all you want to do is plot a few dots on a graph, you can in fact use Matplotlib's scatter plots for this:
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html
Using plt.xlim(min, max) and plt.ylim(min, max), you can set the limits of the graph manually to fit your values (this might not be necessary, though).
You can draw lines on the graph for the triangle shape (if you need that): see "How to draw a line with matplotlib?"
If you don't need a scale, you can even remove it to get a blank canvas: see "Matplotlib plots: removing axis, legends and white spaces".
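The suggestions above can be combined into a short sketch. It assumes each data point is given as three non-negative weights that sum to 1 (barycentric coordinates), which is one reading of the question. The helper name to_xy and the sample data are hypothetical.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Corners of an equilateral triangle with unit side length.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])

def to_xy(weights):
    """Map rows of three non-negative weights to points inside the triangle."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)  # normalise each row to sum to 1
    return w @ corners                    # weighted average of the corners

# Hypothetical sample data: each row is one point.
data = np.array([[0.2, 0.3, 0.5], [0.6, 0.2, 0.2], [1 / 3, 1 / 3, 1 / 3]])
xy = to_xy(data)

fig, ax = plt.subplots()
outline = np.vstack([corners, corners[:1]])  # repeat first corner to close the shape
ax.plot(outline[:, 0], outline[:, 1], color="black")
ax.scatter(xy[:, 0], xy[:, 1])
ax.set_aspect("equal")  # keep the triangle equilateral on screen
ax.axis("off")          # hide the scale for a blank canvas
```

Call fig.savefig(...) or plt.show() afterwards to see the result.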
I am plotting a distribution of variables that are outputs from two different versions of a program. They look very similar (this is great because they should!) and I am showing their ratio in the same figure but on a different axis. My goal is to show the ratio as a scatter plot but with a horizontal line at y=1.0 to show 100% agreement. The issue I am having is even if I plot the line first and then the scatter, my scatter points still show underneath the line plot. (Please see the image linked below.) You can see the scatter in black underneath the line plot in red, even though I call the plot function first. Any recommendations? Thank you!
Distribution of two variables with ratio plot underneath
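A likely cause is that matplotlib stacks artists by their zorder rather than by call order, and by default lines get a higher zorder than scatter collections. A minimal sketch of setting zorder explicitly, using made-up ratio data rather than the data from the question:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Synthetic ratio values scattered around 1.0.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
ratio = 1.0 + rng.normal(0, 0.02, x.size)

fig, ax = plt.subplots()
# Lower zorder is drawn first, so the line ends up underneath the points.
line = ax.axhline(1.0, color="red", zorder=1)
points = ax.scatter(x, ratio, color="black", zorder=2)
```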
I want to plot boxplots on top of the scattered points like this.
I know I have to bin the data into intervals first but I couldn't find the function that does all of this. Sample x and y data are saved here as .npy.
I would look into using matplotlib. Boxes can be drawn as such:
https://matplotlib.org/gallery/pyplots/boxplot_demo_pyplot.html?highlight=boxplot
and scatter plots can also be drawn as such: https://matplotlib.org/gallery/lines_bars_and_markers/scatter_demo2.html?highlight=scatter
There is a search functionality on their site, along with plenty of documentation on how to utilize their library.
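As for your specific question, you can specify zorder when drawing most things in matplotlib, and you could use that to put your boxplots on top. If no zorder is given, matplotlib uses a default that depends on the artist type (patches and scatter collections sit below lines, which sit below text), and only artists with the same zorder are drawn in the order they were added. Boxplots are built from lines by default, so drawing the scatter plot and then the boxplots should already put the boxes on top, as in your diagram above; setting zorder explicitly removes any doubt.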
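A minimal sketch of the binning step plus the zorder suggestion, using synthetic data in place of the .npy files from the question. The number of bins and the variable names are assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Synthetic stand-in for the x and y arrays.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 500)
y = 2 * x + rng.normal(0, 3, 500)

edges = np.linspace(0, 10, 6)              # 5 equal-width bins
centers = 0.5 * (edges[:-1] + edges[1:])
# Using only the interior edges gives bin indices 0..4.
bin_idx = np.digitize(x, edges[1:-1])
groups = [y[bin_idx == i] for i in range(len(centers))]

fig, ax = plt.subplots()
ax.scatter(x, y, s=5, color="lightgray", zorder=1)
# One box per bin, placed at the bin centre and drawn above the points.
bp = ax.boxplot(groups, positions=centers, widths=0.8 * np.diff(edges),
                manage_ticks=False, zorder=3)
```

Here manage_ticks=False keeps the numeric x-axis of the scatter plot instead of replacing the ticks with one label per box.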
I would like to create a density plot in a 2D parameter space. However, the sample consists of distinct solutions which form lines in the parameter space such that putting everything into a matrix and using imshow is not desirable because of the pixelation artefacts (Figure 1).
I have tried to plot each distinct solution as a line with opacity set to the probability it corresponds to but the blending of the different lines does not seem to be additive (i.e. the location where all lines overlap is not black). See Figure 2.
Opacity is not additive; if you overlap two objects with opacity 0.5 you won't get a region of black (opacity 1.0).
Additionally, the point at which all your lines overlap may be smaller than 1 pixel in size, which will allow the surrounding colors to bleed in due to antialiasing.
I think the pixelated version is the most accurate solution, unless you feel like rendering all the different intersecting regions as shapes and coloring them manually.
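To make the non-additive blending concrete, here is a small sketch of how opacities combine when layers are stacked, assuming the renderer uses standard "over" alpha compositing. The function name stacked_alpha is made up for illustration.

```python
def stacked_alpha(alphas):
    """Combined opacity of layers drawn on top of each other."""
    transparent = 1.0
    for a in alphas:
        # Each layer lets through a fraction (1 - a) of what is behind it.
        transparent *= 1.0 - a
    return 1.0 - transparent

two_layers = stacked_alpha([0.5, 0.5])   # 0.75, not 1.0
ten_layers = stacked_alpha([0.5] * 10)   # close to 1.0 but still below it
print(two_layers, ten_layers)
```

Under this model the combined opacity only reaches 1.0 if at least one layer is fully opaque, which is why overlapping semi-transparent lines never sum to solid black.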