I am currently trying to find out on what basis matplotlib sets its automatic plot limits.
The question arose when I plotted some x_values against some y_values.
For the x_values the following holds: min(x_values) = -801.01 and max(x_values) = 798.80. The limits set by matplotlib are (-1000, 800).
As the data is almost symmetrical around 0, I would like it to be plotted symmetrically around 0. Is there any way I can tell matplotlib to automatically center the plot? Also, matplotlib seems to set the "resolution" of its limits to 200 in this case, which seems a bit high to me.
Of course I could set limits manually, but I want to avoid that if possible.
PS: I don't know if it matters, but I plot the values elsewhere and later add the Line2D object to the figure.
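For reference, the kind of manual centering I would like to avoid looks roughly like this (a minimal sketch with dummy data standing in for my real x_values and y_values):

import numpy as np
import matplotlib.pyplot as plt

# dummy data standing in for my real values (min ~ -801.01, max ~ 798.80)
x_values = np.linspace(-801.01, 798.80, 200)
y_values = np.sin(x_values / 100.0)

fig, ax = plt.subplots()
ax.plot(x_values, y_values)

# manual workaround: force the x-limits to be symmetric around 0
half_range = max(abs(x_values.min()), abs(x_values.max()))
ax.set_xlim(-half_range, half_range)
plt.show()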
[edited as per comments 05-mar-22]
I found the problem: the marker point was not scaled to the same units as the x-axis data. I was overthinking it. Thanks for suggesting I try a simple example to recreate the issue. I shouldn't have gotten overwhelmed by the matplotlib man pages. Sorry everyone!
I'm relatively new to pyplot. I'm having trouble plotting a marker in absolute data coordinates, because at times pyplot puts the x-axis into relative coordinates about a center value.
In other words, when the x-data min and max are numerically far apart, everything plots as expected. If the x-axis min and max are very close together, pyplot switches to a relative axis mode.
For example, 1.59e3 to 1.61e3 plots as expected. But when the data is from 1.599e3 to 1.601e3 (example), the axis switches from absolute numbers to a relative axis with 0 in the middle, and a label underneath saying 1.60000000e3.
This is fine, and looks great too. But when I try to add a marker in absolute data coordinates, i.e. centred on a data point, the axis goes haywire and runs from 0 to 1.6e9, with the data itself plotted at around 0. The marker ends up in the right place, though.
I see documentation for matplotlib transforms, but it seems to apply only to MATLAB? Or I'm confused (the most likely issue). Maybe my search-fu is out of whack. How can I deal with the switch to relative axes?
[Example relative axes][1]
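In case it is useful context: the "relative axis" is matplotlib's offset notation on the tick labels, and it can be switched off so the labels stay in absolute numbers. My actual bug turned out to be the unit mismatch mentioned above, but a minimal sketch of keeping the axis absolute would be (values just mimic the 1.599e3 to 1.601e3 example):

import numpy as np
import matplotlib.pyplot as plt

# data in a narrow band, which normally triggers offset notation on the x-axis
x = np.linspace(1.599e3, 1.601e3, 50)
y = np.linspace(0.0, 1.0, 50)

fig, ax = plt.subplots()
ax.plot(x, y)

# marker given in absolute data coordinates (same units as x)
ax.plot(x[25], y[25], 'ro')

# turn off the "relative" (offset) tick labels on the x-axis
ax.ticklabel_format(axis='x', useOffset=False)
plt.show()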
The following code gives me a very nice violinplot (and boxplot within).
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
foo = np.random.rand(100)
sns.violinplot(foo)
plt.boxplot(foo)
plt.show()
So far so good. However, when I look at foo, the variable does not contain any negative values. The seaborn plot seems misleading here. The normal matplotlib boxplot gives something closer to what I would expect.
How can I make violinplots with a better fit (not showing false negative values)?
As the comments note, this is a consequence (I'm not sure I'd call it an "artifact") of the assumptions underlying gaussian KDE. As has been mentioned, this is somewhat unavoidable, and if your data don't meet those assumptions, you might be better off just using a boxplot, which shows only points that exist in the actual data.
However, in your response you ask whether it could be fit "tighter", which could mean a few things.
One answer might be to change the bandwidth of the smoothing kernel. You do that with the bw argument, which is actually a scale factor; the bandwidth that will be used is bw * data.std():
data = np.random.rand(100)
sns.violinplot(y=data, bw=.1)
Another answer might be to truncate the violin at the extremes of the datapoints. The KDE will still be fit with densities that extend past the bounds of your data, but the tails will not be shown. You do that with the cut parameter, which specifies how many units of bandwidth past the extreme values the density should be drawn. To truncate, set it to 0:
sns.violinplot(y=data, cut=0)
By the way, the API for violinplot is going to change in 0.6, and I'm using the development version here, but both the bw and cut arguments exist in the current released version and behave more or less the same way.
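Putting both suggestions together, a minimal self-contained sketch (random data stands in for foo from the question):

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

data = np.random.rand(100)  # all values in [0, 1), like foo above

# narrower smoothing kernel (bw scales data.std()) and
# no density drawn past the observed extremes (cut=0)
sns.violinplot(y=data, bw=.1, cut=0)
plt.show()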
Is there a way to let matplotlib know to recompute the optimal bounds of a plot?
My problem is that I am manually computing a bunch of boxplots and putting them at various locations in a plot. By the end, some boxplots extend beyond the plot frame. I could hard-code some xlim and ylim values for now, but I want a more general solution.
What I was thinking was a feature where you say "ok plt I am done plotting, now please adjust the bounds so that all my data is nicely within the bounds".
Is this possible?
EDIT:
The answer is yes.
Follow-up question: Can this be done for the ticks as well?
You want to use matplotlib's automatic axis scaling. You can do this either with axes.axis using the "auto" argument or with axes.set_autoscale_on:
ax.axis('auto')
ax.set_autoscale_on(True)
If you want to auto-scale only the x or y axis, you can use set_autoscaley_on or set_autoscalex_on.
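A minimal runnable sketch of the idea (the Line2D added by hand stands in for your manually placed boxplots):

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

fig, ax = plt.subplots()

# an artist added by hand, extending well past the default view limits
ax.add_line(Line2D([0, 50], [0, 25]))

ax.axis('auto')            # turn automatic scaling back on for both axes
ax.set_autoscale_on(True)  # the same switch, set explicitly
ax.autoscale_view()        # recompute the view limits from the data limits

plt.show()

Depending on how your artists were added, you may also need ax.relim() before autoscale_view() so that the data limits themselves get refreshed.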
In a standard 3D Python plot, each data point is, by default, represented as a sphere. For the data I'm plotting, the z-axis is very sensitive while the x and y axes are very general, so is there a way to make each point on the scatter plot spread out in the x and y directions as it normally would with, for example, s=500, but not spread at all along the z-axis? Ideally this would look like a set of stacked discs rather than overlapping spheres.
Any ideas? I'm relatively new to Python and I don't know if there's a way to make custom data points like this with a scatter plot.
I was actually able to do this using the matplotlib.patches module, creating a patch for every data point and then making it whatever shape I wanted with the help of mpl_toolkits.mplot3d.art3d.
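For anyone curious, the approach looked roughly like this (a simplified sketch, not my exact code; the data and disc radius are made up):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
import mpl_toolkits.mplot3d.art3d as art3d

# made-up data: x and y are coarse, z is the sensitive axis
xs, ys, zs = np.random.rand(3, 20)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

for x, y, z in zip(xs, ys, zs):
    disc = Circle((x, y), radius=0.05, alpha=0.5)  # a flat disc in the x-y plane
    ax.add_patch(disc)
    art3d.pathpatch_2d_to_3d(disc, z=z, zdir='z')  # lift it to its z value

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_zlim(0, 1)
plt.show()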
You might look for something called "jittering". Take a look at
Matplotlib: avoiding overlapping datapoints in a "scatter/dot/beeswarm" plot
It works by adding random noise to your data.
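A minimal sketch of what jittering could look like here (made-up data; the noise goes only into x and y, and z stays exact):

import numpy as np
import matplotlib.pyplot as plt

# made-up data: z is the sensitive axis, x and y take only a few coarse values
x = np.random.randint(0, 5, 200).astype(float)
y = np.random.randint(0, 5, 200).astype(float)
z = np.random.rand(200)

# jitter: small random noise in x and y only
x += np.random.uniform(-0.15, 0.15, x.size)
y += np.random.uniform(-0.15, 0.15, y.size)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x, y, z, s=20)
plt.show()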
Another way might be to reduce the variance of the data on your z-axis (e.g. by applying a log function) or to adjust the axis scale. You could do that with ax.set_zscale("log"). It is documented here: http://matplotlib.org/mpl_toolkits/mplot3d/api.html#mpl_toolkits.mplot3d.axes3d.Axes3D.set_zscale
I'm using this code to plot a cumulative frequency plot:
plot = ocum.plot(x='index', y='cdf', yticks=np.arange(0.0, 1.05, 0.1))
plot.set_xlabel("Data usage")
plot.set_ylabel("CDF")
fig = plot.get_figure()
fig.savefig("overall.png")
It appears as follows and is very crowded around the initial part. This is due to my data spread. How can I make it clearer? (uploading to postimg because I don't have enough reputation points)
http://postimg.org/image/ii5z4czld/
I hope that I understood what you want: give more space to the visualization of the "CDF" development for smaller "data usage" values, right? Typically, you would achieve this by changing your x-axis scale from linear to logarithmic. Head over to Plot logarithmic axes with matplotlib in python to see different ways to achieve that. The simplest might be, in your case, to replace plot() with semilogx().
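A minimal sketch of that idea, with dummy skewed data standing in for your ocum DataFrame (I'm assuming the 'index' and 'cdf' columns from your snippet):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# dummy stand-in for ocum: heavily skewed "data usage" values and their CDF
usage = np.sort(np.random.lognormal(mean=3.0, sigma=1.5, size=1000))
ocum = pd.DataFrame({'index': usage,
                     'cdf': np.arange(1, len(usage) + 1) / len(usage)})

# logx=True switches the x-axis to a logarithmic scale (same idea as semilogx())
plot = ocum.plot(x='index', y='cdf', yticks=np.arange(0.0, 1.05, 0.1), logx=True)
plot.set_xlabel("Data usage")
plot.set_ylabel("CDF")
plot.get_figure().savefig("overall.png")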