Zoom in on points in a scatter plot - python

I have two sets of data points:
s_pos = [8.8333, 12.8033, 27.4410, 30.4982, 42.8710, 46.0770, .......]
mux = [0.604598, 0.840701, 1.556915, 1.731411, 2.575856, 3.158237, ........]
I made a scatter plot as follows:
import matplotlib.pyplot as plt
a = s_pos
b = mux
plt.scatter(a, b, s=1, c='r')
plt.show()
a and b each have 620 data points. Because there are so many, I can't clearly see the individual points.
Is there any way to zoom in on a specific part of the plot so I can check the individual points without removing any data?

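If you just want to focus the view on one region, set the axis limits to the range of the data points you are interested in; see plt.xlim and plt.ylim. All the data stays in the plot, only the visible window changes.
Otherwise, an interactive matplotlib window has a zoom tool in its toolbar.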
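A minimal sketch of the axis-limit approach. The data here is synthetic (the full s_pos and mux lists are not shown in the question), and the choice of window, points 100 to 120, is only an illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the 620 (s_pos, mux) pairs; not the asker's real data.
rng = np.random.default_rng(0)
a = np.sort(rng.uniform(0, 2000, 620))
b = np.cumsum(rng.uniform(0.05, 0.3, 620))

fig, ax = plt.subplots()
ax.scatter(a, b, s=1, c='r')

# Hypothetical window: the x-range spanned by points 100..120.
x_lo, x_hi = a[100], a[120]
in_window = (a >= x_lo) & (a <= x_hi)

# Restrict the view only; all 620 points are still part of the plot.
ax.set_xlim(x_lo, x_hi)
ax.set_ylim(b[in_window].min(), b[in_window].max())
```

Call plt.show() afterwards to display the figure. With only about twenty points in view, a larger marker size than s=1 is probably easier to read.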

Related

Random (false data) lines appearing in contourf plot at certain # of levels

I'm trying to use matplotlib and contourf to generate some filled (polar) contour plots of velocity data. I have some data (MeanVel_Z_Run16_np) I am plotting on theta (Th_Run16) and r (R_Run16), as shown here:
fig,ax = plt.subplots(subplot_kw={'projection':'polar'})
levels = np.linspace(-2.5,4,15)
cplot = ax.contourf(Th_Run16,R_Run16,MeanVel_Z_Run16_np,levels,cmap='plasma')
ax.set_rmax(80)
ax.set_rticks([15,30,45,60])
rlabels = ax.get_ymajorticklabels()
for label in rlabels:
    label.set_color('#E6E6FA')
cbar = plt.colorbar(cplot,pad=0.1,ticks=[0,3,6,9,12,15])
cbar.set_label(r'$V_{Z}$ [m/s]')
plt.show()
This generates the following plot:
[image: velocity plot with 15 levels]
Which looks great (and accurate), outside of that random straight orange line roughly between 90deg and 180deg. I know that this is not real data because I plotted this in MATLAB and it did not appear there. Furthermore, I have realized it appears to relate to the number of contour levels I use. For example, if I bump this code up to 30 levels instead of 15, the result changes significantly, with odd triangular regions of uniform value:
[image: velocity plot with 30 levels]
Does anyone know what might be going on here? How can I get contourf to just plot my data without these strange misrepresentations? I would like to use 15 contour levels at least. Thank you.

Contour Plot of Binary Data (0 or 1)

I have x values, y values, and z values. The z values are either 0 or 1, essentially indicating whether an (x,y) pair is a threat (1) or not a threat (0).
I have been trying to plot a 2D contour plot using the matplotlib contourf. This seems to have been interpolating between my z values, which I don't want. So, I did a bit of searching and found that I could use pcolormesh to better plot binary data. However, I am still having some issues.
First, the colorbar of my pcolormesh plot doesn't show two distinct colors (white or red). Instead, it shows a full spectrum from white to red. See the attached plot for what I mean. How do I change this so that the colorbar only shows two colors, for 0 and 1? Second, is there a way to draw a grid of squares into the contour plot so that it is more clear for which x and y intervals the 0s and 1s are occurring. Third, my code calls for minorticks. However, these do not show up in the plot. Why?
The code I use is shown here. The vels and ms arrays for x and y can really be anything, and threat_bin is just the corresponding 0 or 1 values for all the (vels, ms) pairs:
fig=plt.figure(figsize=(6,5))
ax2=fig.add_subplot(111)
from matplotlib import cm
XX,YY=np.meshgrid(vels, ms)
cp=ax2.pcolormesh(XX/1000.0,YY,threat_bin, cmap=cm.Reds)
ax2.minorticks_on()
ax2.set_ylabel('Initial Meteoroid Mass (kg)')
ax2.set_xlabel('Initial Meteoroid Velocity (km/s)')
ax2.set_yscale('log')
fig.colorbar(cp, ticks=[0,1], label='Threat Binary')
plt.show()
Please be simple with your recommendations, and let me know the code I should include or change with respect to what I have at the moment.
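One possible way to approach the first two points (a two-colour colorbar and visible cell outlines) is a two-entry ListedColormap combined with a BoundaryNorm, plus edgecolors on pcolormesh. This is only a sketch under assumptions: the cell edges and the 0/1 values below are made up, since the real vels, ms and threat_bin are not given, and it does not address the minor-tick question.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm, ListedColormap

# Hypothetical cell edges; the real vels and ms are not shown in the question.
vel_edges = np.linspace(11.0, 72.0, 9)   # km/s, 9 edges -> 8 cells
mass_edges = np.logspace(0, 6, 7)        # kg, 7 edges -> 6 cells

# Hypothetical 0/1 data, one value per cell (rows follow mass, columns follow velocity).
rng = np.random.default_rng(0)
threat_bin = rng.integers(0, 2, size=(6, 8))

# Exactly two colours: values in [-0.5, 0.5) map to white, [0.5, 1.5) to red.
cmap = ListedColormap(['white', 'red'])
norm = BoundaryNorm([-0.5, 0.5, 1.5], cmap.N)

fig, ax2 = plt.subplots(figsize=(6, 5))
cp = ax2.pcolormesh(vel_edges, mass_edges, threat_bin, cmap=cmap, norm=norm,
                    edgecolors='k', linewidth=0.5)  # outline every cell
ax2.set_yscale('log')
ax2.set_xlabel('Initial Meteoroid Velocity (km/s)')
ax2.set_ylabel('Initial Meteoroid Mass (kg)')
fig.colorbar(cp, ticks=[0, 1], label='Threat Binary')
```

Passing cell edges (one more than the number of cells in each direction) keeps every cell boundary positive, which matters once the y-axis is switched to a log scale.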

Creating a pseudo color plot with a linear and nonlinear axis and computing values based on the center of grid values

I have the equation: z(x,y)=1+x^(2/3)y^(-3/4)
I would like to calculate values of z for x=[0,100] and y=[10^1,10^4]. I will do this for 100 points in each axis direction, so my grid will be 100x100 points. In the x-direction I want the points spaced linearly. In the y-direction I want the points spaced logarithmically.
Were I to need these values I could easily go through the following:
import numpy as np
x = np.linspace(0, 100, 100)
y = np.logspace(1, 4, 100)
z = np.zeros((len(x), len(y)))
for i in range(len(x)):
    for j in range(len(y)):
        z[i, j] = 1 + x[i]**(2/3) * y[j]**(-3/4)
The problem for me comes with visualizing these results. I know that I would need to create a grid of points. I feel my options are to create a meshgrid with the values and then use pcolor.
My issue here is that the values at the center of each block do not coincide with the calculated values. In the x-direction I could fix this by shifting the x-vector by half of dx (the step between successive values). I'm not so sure how I would do this for the y-axis. Furthermore, if I wanted to compute values for each of the y-direction values, including the end points, they would not all show up.
In the final visualization I would like to have the y-axis as a log scale and the x-axis as a linear scale. I would also like the tick marks to fall in the center of the cells, correlating with the correct value. Can someone point me to the correct plotting functions for this? I have to resolve the issue using pcolor or pcolormesh.
Should you require more details, please let me know.
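In current matplotlib, you can use pcolormesh with shading='nearest', and it will center the blocks on the values:
import numpy as np
import matplotlib.pyplot as plt
y_plot = np.log10(y)  # plot against log10(y) so the log-spaced values are evenly spaced
z[5, 5] = 0 # to make it more evident
# z was filled as z[i, j] with i indexing x and j indexing y;
# pcolormesh expects rows along y and columns along x, hence the transpose
plt.pcolormesh(x, y_plot, z.T, shading="nearest")
plt.colorbar()
ax = plt.gca()
ax.set_xticks(x)
ax.set_yticks(y_plot)
plt.axvline(x[5])
plt.axhline(y_plot[5])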
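For reference, a self-contained sketch of the same idea that builds z by broadcasting instead of a double loop, directly in the (rows = y, columns = x) layout that pcolormesh expects. The decade tick labels are an illustrative choice, not part of the original answer:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 100, 100)   # linear spacing in x
y = np.logspace(1, 4, 100)     # logarithmic spacing in y

# Broadcasting: x varies along columns, y along rows, so z has shape (len(y), len(x)).
z = 1 + x[np.newaxis, :] ** (2 / 3) * y[:, np.newaxis] ** (-3 / 4)

fig, ax = plt.subplots()
# Plot against log10(y): the log-spaced values become evenly spaced,
# and shading='nearest' centers each cell on its (x, log10(y)) value.
mesh = ax.pcolormesh(x, np.log10(y), z, shading="nearest")
fig.colorbar(mesh, ax=ax)

# Label the decades so the axis reads like a log axis (illustrative choice).
ax.set_yticks([1, 2, 3, 4])
ax.set_yticklabels([r"$10^1$", r"$10^2$", r"$10^3$", r"$10^4$"])
```

Because the y coordinate is log10(y) on a linear axis, the cell centers sit exactly on the computed values, including both end points.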

Add labels ONLY to SELECTED data points in seaborn scatter plot

I have created a seaborn scatter plot and added a trendline to it. I have some datapoints that fall very far away from the trendline (see the ones highlighted in yellow) so I'd like to add data labels only to these points, NOT to all the datapoints in the graph.
Does anyone know what's the best way to do this?
So far I've found answers to "how to add labels to ALL data points" (see this link) but this is not my case.
In the accepted answer to the question that you reference you can see that the way they add labels to all data points is by looping over the data points and calling .text(x, y, string) on the axes. You can find the documentation for this method here (seaborn is implemented on top of matplotlib). You'll have to call this method for the selected points.
In your specific case I don't know exactly what formula you want to use to find your outliers but to literally get the ones beyond the limits of the yellow rectangle that you've drawn you could try the following:
for x, y in zip(xarr, yarr):
    if x < 5 and y > 5.5:
        ax.text(x + 0.01, y, 'outlier', horizontalalignment='left', size='medium', color='black')
Here xarr holds your x-values, yarr your y-values, and ax is the Axes returned by your call to seaborn.
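If you would rather pick the points by their distance from the trendline than by a hand-drawn rectangle, something like the following might work. It is a sketch under assumptions: the data is synthetic, the trendline is a plain least-squares line from np.polyfit, and the "3 standard deviations" cut-off is an arbitrary choice. It uses matplotlib directly; the Axes returned by seaborn accepts the same ax.text calls.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data with three planted outliers (not the asker's data).
rng = np.random.default_rng(1)
xarr = rng.uniform(0, 10, 200)
yarr = 0.5 * xarr + rng.normal(0, 0.3, 200)
yarr[:3] += 4.0

# Straight-line trend and each point's vertical distance from it.
slope, intercept = np.polyfit(xarr, yarr, 1)
residuals = yarr - (slope * xarr + intercept)

# Assumed rule: label points more than 3 standard deviations off the line.
is_outlier = np.abs(residuals) > 3 * residuals.std()

fig, ax = plt.subplots()
ax.scatter(xarr, yarr, s=10)
xs = np.array([xarr.min(), xarr.max()])
ax.plot(xs, slope * xs + intercept, color='k')

# Label only the selected points.
for i in np.flatnonzero(is_outlier):
    ax.text(xarr[i] + 0.1, yarr[i], f'point {i}',
            horizontalalignment='left', size='medium', color='black')
```

Any boolean mask can replace is_outlier, for example the rectangle condition (xarr < 5) & (yarr > 5.5) from the loop above.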

Matplotlib markers which plot and render fast

I'm using matplotlib to plot 5 sets of approx. 400,000 data points each. Although each set of points is plotted in a different color, I need different markers for people reading the graph on black and white print-outs. The issue I'm facing is that almost all of the possible markers available in the documentation at http://matplotlib.org/api/markers_api.html take too much time to plot and render while displaying. I could only find two markers which plot and render quickly, these are '-' and '--'. Here's my code:
plt.plot(series1,'--',label='Label 1',lw=5)
plt.plot(series2,'-',label='Label 2',lw=5)
plt.plot(series3,'^',label='Label 3',lw=5)
plt.plot(series4,'*',label='Label 4',lw=5)
plt.plot(series5,'_',label='Label 5',lw=5)
I tried multiple markers. Series 1 and series 2 plot quickly and render in no time. But series 3, 4, and 5 take forever to plot and AGES to display.
I'm not able to figure out the reason behind this. Does someone know of more markers that plot and render quickly?
The first two ('--' and '-') are linestyles, not markers. That's why they are rendered faster.
It doesn't make sense to plot ~400,000 markers. You won't be able to see all of them... However, what you could do is to only plot a subset of the points.
So add the line with all your data (even though you could probably subsample that too) and then add a second "line" with only the markers.
For that you need an "x" vector, which you can subsample too:
import numpy as np
import matplotlib.pyplot as plt
# define the number of markers you want
nrmarkers = 100
# define an x-vector
x = np.arange(len(series3))
# calculate the subsampling step size (at least 1, so the slice step is never zero)
subsample = max(1, len(series3) // nrmarkers)
# plot the line
plt.plot(x, series3, color='g', label='Label 3', lw=5)
# plot the markers (using every `subsample`-th data point)
plt.plot(x[::subsample], series3[::subsample], color='g',
         lw=5, linestyle='', marker='*')
# similar procedure for series4 and series5
Note: The code is written from scratch and not tested
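As an alternative to slicing by hand, matplotlib lines also accept a markevery argument, which draws the full line but places a marker only on every N-th point. A minimal sketch, with a synthetic random walk standing in for series3 and 100 markers as an arbitrary target:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for one of the ~400,000-point series.
rng = np.random.default_rng(2)
series3 = np.cumsum(rng.normal(size=400_000))

nrmarkers = 100                              # roughly how many markers to show
step = max(1, len(series3) // nrmarkers)     # marker spacing in data points

fig, ax = plt.subplots()
# One call: the line uses all points, the marker appears on every `step`-th point.
(line,) = ax.plot(series3, color='g', marker='*', markevery=step,
                  label='Label 3')
ax.legend()
```

This keeps the line and its markers in a single artist, so the legend entry shows both the colour and the marker.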
