Add labels ONLY to SELECTED data points in seaborn scatter plot - python

I have created a seaborn scatter plot and added a trendline to it. I have some datapoints that fall very far away from the trendline (see the ones highlighted in yellow) so I'd like to add data labels only to these points, NOT to all the datapoints in the graph.
Does anyone know what's the best way to do this?
So far I've found answers to "how to add labels to ALL data points" (see this link) but this is not my case.

In the accepted answer to the question that you reference you can see that the way they add labels to all data points is by looping over the data points and calling .text(x, y, string) on the axes. You can find the documentation for this method here (seaborn is implemented on top of matplotlib). You'll have to call this method for the selected points.
In your specific case I don't know exactly what formula you want to use to find your outliers but to literally get the ones beyond the limits of the yellow rectangle that you've drawn you could try the following:
for x, y in zip(xarr, yarr):
    if x < 5 and y > 5.5:
        ax.text(x+0.01, y, 'outlier', horizontalalignment='left', size='medium', color='black')
Where xarr is your x-values, yarr your y-values and ax the returned axes from your call to seaborn.
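If "far from the trendline" is the real criterion, one option is to fit the line yourself and label only the points whose residual is large. This is a rough sketch, not the asker's method. The `label_outliers` helper, the 2-standard-deviation cutoff and the synthetic data are all assumptions made for illustration. It uses plain matplotlib, but the same `ax.text` calls should work on the Axes returned by `sns.regplot` or `sns.scatterplot`.

```python
import numpy as np
import matplotlib.pyplot as plt

def label_outliers(ax, x, y, labels, n_std=2.0):
    """Annotate points whose residual from a fitted straight line
    exceeds n_std standard deviations of all residuals (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)      # least-squares straight line
    residuals = y - (slope * x + intercept)
    mask = np.abs(residuals) > n_std * residuals.std()
    for xi, yi, text in zip(x[mask], y[mask], np.asarray(labels)[mask]):
        ax.text(xi + 0.05, yi, text, horizontalalignment='left',
                size='medium', color='black')
    return mask

# Synthetic data (assumption): a linear trend with two points pushed off it
rng = np.random.default_rng(0)
xarr = np.linspace(0, 10, 30)
yarr = 0.5 * xarr + rng.normal(scale=0.2, size=xarr.size)
yarr[[4, 20]] += 3.0
names = [f"pt{i}" for i in range(xarr.size)]

fig, ax = plt.subplots()
ax.scatter(xarr, yarr)
flagged = label_outliers(ax, xarr, yarr, names)
```

Call `plt.show()` afterwards to see the figure. The returned boolean mask can also be reused, for example to colour the flagged points differently.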

Related

Random (false data) lines appearing in contourf plot at certain # of levels

I'm trying to use matplotlib and contourf to generate some filled (polar) contour plots of velocity data. I have some data (MeanVel_Z_Run16_np) I am plotting on theta (Th_Run16) and r (R_Run16), as shown here:
fig,ax = plt.subplots(subplot_kw={'projection':'polar'})
levels = np.linspace(-2.5,4,15)
cplot = ax.contourf(Th_Run16,R_Run16,MeanVel_Z_Run16_np,levels,cmap='plasma')
ax.set_rmax(80)
ax.set_rticks([15,30,45,60])
rlabels = ax.get_ymajorticklabels()
for label in rlabels:
    label.set_color('#E6E6FA')
cbar = plt.colorbar(cplot,pad=0.1,ticks=[0,3,6,9,12,15])
cbar.set_label(r'$V_{Z}$ [m/s]')
plt.show()
This generates the following plot:
Velocity plot with 15 levels:
Which looks great (and accurate), outside of that random straight orange line roughly between 90deg and 180deg. I know that this is not real data because I plotted this in MATLAB and it did not appear there. Furthermore, I have realized it appears to relate to the number of contour levels I use. For example, if I bump this code up to 30 levels instead of 15, the result changes significantly, with odd triangular regions of uniform value:
Velocity plot with 30 levels:
Does anyone know what might be going on here? How can I get contourf to just plot my data without these strange misrepresentations? I would like to use 15 contour levels at least. Thank you.

Contour Plot of Binary Data (0 or 1)

I have x values, y values, and z values. The z values are either 0 or 1, essentially indicating whether an (x,y) pair is a threat (1) or not a threat (0).
I have been trying to plot a 2D contour plot using the matplotlib contourf. This seems to have been interpolating between my z values, which I don't want. So, I did a bit of searching and found that I could use pcolormesh to better plot binary data. However, I am still having some issues.
First, the colorbar of my pcolormesh plot doesn't show two distinct colors (white or red). Instead, it shows a full spectrum from white to red. See the attached plot for what I mean. How do I change this so that the colorbar only shows two colors, for 0 and 1? Second, is there a way to draw a grid of squares into the contour plot so that it is more clear for which x and y intervals the 0s and 1s are occurring. Third, my code calls for minorticks. However, these do not show up in the plot. Why?
The code I use is shown here. The vels and ms for x and y can really be anything, and threat_bin is just the corresponding 0 or 1 values for all the (vels, ms) pairs:
fig=plt.figure(figsize=(6,5))
ax2=fig.add_subplot(111)
from matplotlib import cm
XX,YY=np.meshgrid(vels, ms)
cp=ax2.pcolormesh(XX/1000.0,YY,threat_bin, cmap=cm.Reds)
ax2.minorticks_on()
ax2.set_ylabel('Initial Meteoroid Mass (kg)')
ax2.set_xlabel('Initial Meteoroid Velocity (km/s)')
ax2.set_yscale('log')
fig.colorbar(cp, ticks=[0,1], label='Threat Binary')
plt.show()
Please be simple with your recommendations, and let me know the code I should include or change with respect to what I have at the moment.
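For the two-colour colorbar and the grid of squares, one possible approach is a discrete colormap plus cell edge lines. This is a sketch under assumptions, not a tested answer to this exact dataset. The edge arrays and the random `threat_bin` below are placeholders, and passing cell edges (one more than the number of cells in each direction) is a choice made here so the squares line up on a log axis.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, BoundaryNorm

# Placeholder data (assumption): cell edges and a random 0/1 grid
vel_edges = np.linspace(10.0, 30.0, 6)     # km/s, 5 columns
mass_edges = np.logspace(0, 4, 5)          # kg, 4 rows
rng = np.random.default_rng(1)
threat_bin = rng.integers(0, 2, size=(4, 5))

cmap = ListedColormap(['white', 'red'])          # exactly two colours
norm = BoundaryNorm([-0.5, 0.5, 1.5], cmap.N)    # 0 -> white, 1 -> red

fig, ax2 = plt.subplots(figsize=(6, 5))
cp = ax2.pcolormesh(vel_edges, mass_edges, threat_bin, cmap=cmap, norm=norm,
                    edgecolors='grey', linewidth=0.5)   # outlines each cell
ax2.set_yscale('log')
ax2.set_xlabel('Initial Meteoroid Velocity (km/s)')
ax2.set_ylabel('Initial Meteoroid Mass (kg)')
cbar = fig.colorbar(cp, ticks=[0, 1], label='Threat Binary')
```

With `BoundaryNorm` the colorbar is split into two solid blocks instead of a gradient, and the `edgecolors` argument draws the outline of every cell.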

Plot two datasets at same position based on their index

I'm trying to plot two datasets (called Height and Temperature) on different y axes.
Both datasets have the same length.
Both datasets are linked together by a third dataset, RH.
I have tried to use matplotlib to plot the data using twinx(), but I am struggling to align both datasets on the same plot.
Here is the plot I want to align.
The horizontal black line on the figure marks the 0°C level, as found from Height, and was used to test whether both datasets would be aligned when plotted. They are not. There is a noticeable difference between the black line and the 0°C tick from Temperature.
Rather than the two y axes changing independently from each other I would like to plot each index from Height and Temperature at the same y position on the plot.
Here is the code that I used to create the plot:
#Define number of subplots sharing y axis
f, ax1 = plt.subplots()
ax1.minorticks_on()
ax1.grid(which='major',axis='both',c='grey')
#Set axis parameters
ax1.set_ylabel('Height $(km)$')
ax1.set_ylim([np.nanmin(Height), np.nanmax(Height)])
#Plot RH
ax1.plot(RH, Height, label='Original', lw=0.5)
ax1.set_xlabel(r'RH $(\%)$')
ax2 = ax1.twinx()
ax2.plot(RH, Temperature, label='Original', lw=0.5, c='black')
ax2.set_ylabel(r'Temperature ($^\circ$C)')
ax2.set_ylim([np.nanmin(Temperature), np.nanmax(Temperature)])
Any help on this would be amazing. Thanks.
Maybe the atmosphere is wrong. :)
It sounds like you are trying to align the two y axes at particular values. Why are you doing this? The relationship of Height vs. Temperature is non-linear, so I think you are setting the stage for a confusing graph. Any particular line you plot can only be interpreted against one vertical axis.
If needed, I think you will be forced to "do some math" on the limits of the y axes. This link may be helpful:
align scales
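To make the "do some math" part concrete, here is a small sketch that computes limits for the second axis so that one chosen value (say 0°C) sits at the same fractional height as a chosen value on the first axis. The `aligned_limits` name and the numbers in the comments are hypothetical. It only pins a single value, so everywhere else the two axes still disagree, as noted above.

```python
def aligned_limits(ref_limits, ref_value, data_limits, value):
    """Return (low, high) limits that contain data_limits and place `value`
    at the same fractional position that ref_value has within ref_limits."""
    ref_lo, ref_hi = ref_limits
    lo, hi = data_limits
    frac = (ref_value - ref_lo) / (ref_hi - ref_lo)
    if not 0.0 < frac < 1.0 or not lo <= value <= hi:
        raise ValueError("reference and target values must lie inside their ranges")
    # Smallest span that still keeps both ends of the data visible
    span = max((value - lo) / frac, (hi - value) / (1.0 - frac))
    return value - frac * span, value + (1.0 - frac) * span

# Example with made-up numbers: height axis 0..10 km, freezing level at 4 km,
# temperatures between -30 and 20 degrees C
new_lo, new_hi = aligned_limits((0.0, 10.0), 4.0, (-30.0, 20.0), 0.0)

# Possible use with the axes from the question (not run here):
# ax2.set_ylim(*aligned_limits(ax1.get_ylim(), freezing_height,
#                              (np.nanmin(Temperature), np.nanmax(Temperature)), 0.0))
```

If temperature falls with height in your data, you may also need to pass the resulting limits in reversed order to `set_ylim` so that the second axis runs the same way as the first.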

Aspect ratio in semi-log plot with Matplotlib

When I plot a function in matplotlib, the plot is framed by a rectangle. I want the ratio of the width to the height of this rectangle to be the golden mean, i.e. dx/dy = 1.618033...
If the x and y scales are linear, I found this solution using Google:
import numpy as np
import matplotlib.pyplot as pl
golden_mean = (np.sqrt(5)-1.0)/2.0
dy=pl.gca().get_ylim()[1]-pl.gca().get_ylim()[0]
dx=pl.gca().get_xlim()[1]-pl.gca().get_xlim()[0]
pl.gca().set_aspect((dx/dy)*golden_mean,adjustable='box')
If it is a log-log plot I came up with this solution
dy=np.abs(np.log10(pl.gca().get_ylim()[1])-np.log10(pl.gca().get_ylim()[0]))
dx=np.abs(np.log10(pl.gca().get_xlim()[1])-np.log10(pl.gca().get_xlim()[0]))
pl.gca().set_aspect((dx/dy)*golden_mean,adjustable='box')
However, for a semi-log plot, when I call set_aspect, I get
UserWarning: aspect is not supported for Axes with xscale=log, yscale=linear
Can anyone think of a work-around for this?
The simplest solution would be to take the log of your data and then use the lin-lin method.
You can then label the axes so that it looks like a normal log plot.
ticks = np.arange(min_logx, max_logx, 1)
ticklabels = [r"$10^{}$".format(tick) for tick in ticks]
pl.yticks(ticks, ticklabels)
If an exponent has more than one character (values of 1e10 and above, or negative exponents), you will need three pairs of braces: two pairs that become the literal LaTeX braces and one for the .format() placeholder.
ticklabels = [r"$10^{{{}}}$".format(tick) for tick in ticks]
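A quick way to see the difference between the two format strings is to print what they produce. The tick values here are arbitrary examples.

```python
ticks = [9, 10, 12]

single = [r"$10^{}$".format(t) for t in ticks]
triple = [r"$10^{{{}}}$".format(t) for t in ticks]

print(single)  # ['$10^9$', '$10^10$', '$10^12$']
print(triple)  # ['$10^{9}$', '$10^{10}$', '$10^{12}$']
```

Without the extra braces, only the first character after `^` is treated as the superscript, so `$10^10$` does not render as ten to the tenth.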
Edit:
If you also want the intermediate ticks within each decade (2, 3, ..., 9 times each power of ten), use the minor ticks as well.
They need to be located at log10(2), log10(3), ..., log10(9), then log10(20), log10(30), ... and so on.
You can create and set them with:
minor_ticks = []
for i in range(min_exponent, max_exponent):
    for j in range(2, 10):
        minor_ticks.append(i + np.log10(j))
pl.gca().set_yticks(minor_ticks, minor=True)
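Putting the pieces of this answer together, a self-contained sketch might look like the following. The example data and the exponent range 0 to 5 are assumptions chosen so that the result is easy to check.

```python
import numpy as np
import matplotlib.pyplot as plt

golden_mean = (np.sqrt(5) - 1.0) / 2.0

# Example data (assumption): exponential growth, so log10(y) is a straight line
x = np.linspace(0.0, 5.0, 50)
y = 10.0 ** x

fig, ax = plt.subplots()
ax.plot(x, np.log10(y))            # plot the logged data on linear axes

# Major ticks at integer exponents, labelled as powers of ten
min_exp, max_exp = 0, 5
major = np.arange(min_exp, max_exp + 1)
ax.set_yticks(major)
ax.set_yticklabels([r"$10^{{{}}}$".format(e) for e in major])

# Minor ticks at log10(2..9) within each decade
minor = [e + np.log10(j) for e in range(min_exp, max_exp) for j in range(2, 10)]
ax.set_yticks(minor, minor=True)

# Same aspect recipe as in the lin-lin case
ax.set_xlim(x.min(), x.max())
ax.set_ylim(min_exp, max_exp)
dx = ax.get_xlim()[1] - ax.get_xlim()[0]
dy = ax.get_ylim()[1] - ax.get_ylim()[0]
ax.set_aspect((dx / dy) * golden_mean, adjustable='box')
```

Call `plt.show()` to view it. On Matplotlib 3.3 or later, `ax.set_box_aspect(golden_mean)` may be a simpler route, since it fixes the shape of the axes box independently of the axis scales. That is a different technique from the one described in this answer.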

How can I draw a graph or plot with 4 quadrants using Python matplotlib?

My objective is to draw a graph with 4 quadrants and plot points in it, with the x-axis and the y-axis both running from 1 to 9. How can I do this in matplotlib? Also, how can I divide a quadrant into several sectors?
From the question, it sounds like you want a single graph with several delineated regions with a specific xy range. This is pretty straightforward to do. You can always just draw lines on the plot to delineate the regions of interest. Here is a quick example based on your stated objectives:
import matplotlib.pyplot as plt
plt.figure()
# Set x-axis range
plt.xlim((1,9))
# Set y-axis range
plt.ylim((1,9))
# Draw lines to split quadrants
plt.plot([5,5],[1,9], linewidth=4, color='red' )
plt.plot([1,9],[5,5], linewidth=4, color='red' )
plt.title('Quadrant plot')
# Draw some sub-regions in upper left quadrant
plt.plot([3,3],[5,9], linewidth=2, color='blue')
plt.plot([1,5],[7,7], linewidth=2, color='blue')
plt.show()
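A variation on the same idea, in case you also want to know which quadrant each point falls in: `axhline` and `axvline` draw the dividers across the whole axes, and a small helper classifies the points. The `quadrant` function, the sample points and the colours are assumptions for illustration.

```python
import matplotlib.pyplot as plt

def quadrant(x, y, cx=5, cy=5):
    """Return 1-4, numbered counter-clockwise from the upper right,
    for a point relative to the centre (cx, cy)."""
    if x >= cx:
        return 1 if y >= cy else 4
    return 2 if y >= cy else 3

points = [(2, 8), (7, 7), (3, 2), (8, 3)]   # sample points (assumption)
colors = {1: 'green', 2: 'blue', 3: 'orange', 4: 'purple'}

fig, ax = plt.subplots()
ax.set_xlim(1, 9)
ax.set_ylim(1, 9)
ax.axvline(5, color='red', linewidth=4)     # vertical divider
ax.axhline(5, color='red', linewidth=4)     # horizontal divider
for px, py in points:
    ax.scatter(px, py, color=colors[quadrant(px, py)], zorder=3)
ax.set_title('Quadrant plot')
```

Add `plt.show()` at the end to display it. Points that lie exactly on a divider are assigned to the upper or right-hand side here, which is an arbitrary choice.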
I would take a look at the AxesGrid toolkit:
http://matplotlib.sourceforge.net/mpl_toolkits/axes_grid/index.html
Perhaps the middle image at the top of this page is something along the lines of what you are looking for. There are examples on the following page in the API documentation that should be a good starting point:
http://matplotlib.sourceforge.net/mpl_toolkits/axes_grid/users/overview.html
Without an example of what you want to do exactly it is difficult to give you the best advice.
You need subplot; see this example:
http://matplotlib.sourceforge.net/examples/pylab_examples/subplot_toolbar.html
