Matplotlib: Avoid congestion in X axis - python

I'm using this code to plot a cumulative frequency plot:
plot = ocum.plot(x='index', y='cdf', yticks=np.arange(0.0, 1.05, 0.1))
plot.set_xlabel("Data usage")
plot.set_ylabel("CDF")
fig = plot.get_figure()
fig.savefig("overall.png")
The plot appears as follows and is very crowded around the initial part. This is due to the spread of my data. How can I make it clearer? (Uploading to postimg because I don't have enough reputation points.)
http://postimg.org/image/ii5z4czld/

I hope I understood what you want: to give more space to the development of the "CDF" for smaller "data usage" values, right? Typically, you achieve this by changing the X axis scale from linear to logarithmic. See the question "Plot logarithmic axes with matplotlib in python" for different ways to do that. The simplest in your case might be to replace plot() with semilogx().
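As a minimal sketch of that suggestion, here is a CDF drawn on a logarithmic X axis with plain matplotlib. The data is synthetic (a hypothetical log-normal "data usage" sample, not the asker's data), and the variable names are made up for illustration. Since the question calls plot() on what looks like a pandas DataFrame, passing logx=True to that call should be an equivalent route.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical, heavily skewed sample standing in for the "data usage" column.
rng = np.random.default_rng(0)
usage = np.sort(rng.lognormal(mean=3.0, sigma=2.0, size=1000))
cdf = np.arange(1, usage.size + 1) / usage.size  # empirical CDF values

fig, ax = plt.subplots()
ax.semilogx(usage, cdf)  # same as ax.plot(...) followed by ax.set_xscale("log")
ax.set_yticks(np.arange(0.0, 1.05, 0.1))
ax.set_xlabel("Data usage")
ax.set_ylabel("CDF")
```

On the log scale the small values are spread over several decades instead of being squeezed against the left edge.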

Related

Python/Seaborn: What does the inside horizontal distribution of the data points mean, or is it random?

It seems that the horizontal distribution of the data points inside each category is almost random every time you plot (using Seaborn). Is this for ease of readability, or does it serve some other meaningful purpose?
I am using Python 3.0 and Seaborn provided dataset called 'tips' for this question.
import seaborn as sns
tips = sns.load_dataset("tips")
After running the same code below twice, I see differences in how the points are distributed inside each category. Here is the code, which you can run a couple of times:
ax = sns.stripplot(x="day", y="total_bill", data=tips, alpha=.55,
palette='Set1', jitter=True, linewidth=1 )
Now, if you look at the plots (after running it twice, for example), you will notice that the distribution of the points is not the same between the two plots:
Please explain why the points are not distributed identically in two separate runs. Also, looking at the points on the horizontal scale: is there a reason why (for example) one red point is further left than another red point, or is it simply for readability?
Thank you in advance!
After a bit more research, I believe that the horizontal distribution of the data points is random but uniform (thank you @ImportanceOfBeingErnest for pointing to the code). So, to answer my own question: there is no hidden meaning in the horizontal distribution. The horizontal spread exists purely for visibility, and it changes between runs or stays the same depending on whether a random seed is set.
I think both displays are identical along the vertical axis, i.e. both distributions are equal since they represent the same scatter plot of a given dataset. The slight visual differences are in the position along the horizontal (categorical day) axis. They come from the jitter option (jitter=True), which adds a small random horizontal offset to each point around the vertical line of its category (day). Jitter helps to distinguish points with the same total_bill value, which would otherwise be superimposed, so the difference is only there for readability.
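The idea of jitter can be sketched with NumPy alone. This is an illustration of the concept, not Seaborn's actual implementation: the function name, the width value, and the seed parameter are all hypothetical.

```python
import numpy as np

def jittered_positions(category_index, n_points, width=0.1, seed=None):
    """Return x positions for n_points belonging to one category.

    Each point is shifted by a uniform random offset in [-width, width]
    around the category's position; the y values are never touched.
    """
    rng = np.random.default_rng(seed)
    return category_index + rng.uniform(-width, width, size=n_points)

a = jittered_positions(0, 5, seed=42)  # seeded: reproducible
b = jittered_positions(0, 5, seed=42)  # same seed, same offsets
c = jittered_positions(0, 5)           # unseeded: different on every run
```

With the same seed the offsets repeat exactly, which matches the observation above that the horizontal layout changes or stays the same depending on whether a seed is set. Assuming Seaborn draws its jitter from NumPy's global random state, calling np.random.seed(...) before stripplot should give the same effect there.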

Re-adjusting (automatically) limits on plot in matplotlib

Is there a way to let matplotlib know to recompute the optimal bounds of a plot?
My problem is that I am manually computing a bunch of boxplots and putting them at various locations in a plot. In the end, some boxplots extend beyond the plot frame. I could hard-code some xlim and ylim values for now, but I want a more general solution.
What I was thinking was a feature where you say "ok plt I am done plotting, now please adjust the bounds so that all my data is nicely within the bounds".
Is this possible?
EDIT:
The answer is yes.
Follow-up question: Can this be done for the ticks as well?
You want to use matplotlib's automatic axis scaling. You can do this with either axes.axis with the "auto" input or axes.set_autoscale_on, which takes a boolean argument:
ax.axis('auto')
ax.set_autoscale_on(True)
If you want to auto-scale only the x or y axis, you can use set_autoscaley_on or set_autoscalex_on.
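A minimal sketch of the "I am done plotting, now adjust the bounds" step, assuming the limits were fixed manually earlier. It uses ax.relim() and ax.autoscale(), which are not mentioned in the answer above but are standard Axes methods; the line data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
ax.set_xlim(0, 1)  # setting limits by hand switches autoscaling off
ax.set_ylim(0, 1)

ax.plot([0, 5], [0, 10])  # this line extends beyond the fixed frame
before = (ax.get_xlim(), ax.get_ylim())

ax.relim()      # recompute the data limits from the artists on the axes
ax.autoscale()  # re-enable autoscaling and update the view limits
after = (ax.get_xlim(), ax.get_ylim())
```

After the last two calls the view limits cover all plotted data again, including the default margins.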

Python: how to plot points with little overlapping

I am using python to plot points. The plot shows relationship between area and the # of points of interest (POIs) in this area. I have 3000 area values and 3000 # of POI values.
Now the plot looks like this:
The problem is that in the lower left corner the points overlap each other so severely that it is hard to get any information out of the plot. Most areas are not that big and they don't have many POIs.
I want to make a plot with little overlapping. I am wondering whether I can use unevenly distributed axis or use histogram to make a beautiful plot. Can anyone help me?
I would suggest using a logarithmic scale for the y axis. You can either use pyplot.semilogy(...) or pyplot.yscale('log') (http://matplotlib.org/api/pyplot_api.html).
Note that points where area <= 0 will not be rendered.
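A small sketch of the logarithmic Y axis on a scatter plot, with the non-positive values filtered out explicitly rather than dropped silently. The data is synthetic, and placing the area on the Y axis is an assumption based on the note above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the 3000 POI counts and area values.
rng = np.random.default_rng(1)
pois = rng.poisson(3, size=3000)
area = rng.lognormal(mean=0.0, sigma=1.5, size=3000)

positive = area > 0  # a log axis cannot show values <= 0, so mask them
fig, ax = plt.subplots()
ax.scatter(pois[positive], area[positive], s=5)
ax.set_yscale("log")  # Axes-level equivalent of pyplot.yscale('log')
```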
I think we have two major choices here. First adjusting this plot, and second choosing to display your data in another type of plot.
For the first option, I would suggest clipping the boundaries. You have plenty of empty space around the borders. If you limit the plot to the boundaries of the data, your data will scale better. On top of that, you may choose to plot the points with smaller dots, so that they overlap less.
The second option is to display the data in a different view, such as a histogram. This might give better insight into the distribution of your data among different bins. But it would be a completely different type of view compared to the former plot.
I would suggest first adjusting the plot by limiting its boundaries to the data points, so that the plot area has enough space to scale the data, and trying histograms later. But as I mentioned, these are two different things and give different insights about your data.
For adjusting you might try this:
x1,x2,y1,y2 = plt.axis()
plt.axis((x1,x2,y1,y2))
You would probably need to make minor adjustments to the axis variables before passing them back. Note that there are definitely better options than this, but it was the first thing that came to my mind.
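One way to fill in those adjustments is to read the limits from the data itself and to shrink the markers. This is only a sketch on synthetic data; the marker size, the transparency, and putting the area on the X axis are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: mostly small areas with few POIs.
rng = np.random.default_rng(2)
area = rng.lognormal(mean=0.0, sigma=1.0, size=3000)
pois = rng.poisson(area)

fig, ax = plt.subplots()
ax.scatter(area, pois, s=4, alpha=0.4)  # small, semi-transparent dots overlap less

# Clip the frame to the data range instead of keeping the default margins.
ax.axis((area.min(), area.max(), pois.min(), pois.max()))
```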

Making Dendrogram Bins Thicker in matplotlib?

I have a linkage matrix of about size 10,000 that I've plotted using scipy.cluster.hierarchy. The default rendering is poor -- as expected, given the size of the input -- because the bins are way too narrow to discern any meaningful structure in the dendrogram. How can I force the bins to be further apart so I can see the data better? I realize this will require the image to be huge, but that's OK.
I'm aware of dendrogram's truncate functionality. I will likely end up using it, but I'd like to get a look at the full data in a presentation I can grok visually before I start truncating.
Here's the rendering as it appears now. Increasing the image size using figsize does not appear to help, nor does xtick.major.pad.
import pylab
import scipy.cluster.hierarchy as sch

# Y is the precomputed linkage matrix
fig = pylab.figure(figsize=(10, 10))
Z = sch.dendrogram(Y, leaf_rotation=90)
fig.show()
fig.savefig('dendrogram.jpg')
Thank you for your help in advance!

contourf result differs when switching axes

I am plotting a contourmap. When first plotting I noticed I had my axes wrong. So I switched the axes and noticed that the structure of both plots is different. On the first plot the axes and assignments are correct, but the structure is messy. On the second plot it is the other way around.
Since it's a square matrix I don't see why there should be a sampling issue.
Transposing the matrix of z-values or the meshgrid of x and y does not help either. Whichever way I plot x and y correctly, it keeps looking messy.
Does anybody have more ideas I could try, or know what might solve it?
The problem was the sampling. Although the arrays have the same size, the step size in the plot is not equal for the x and y axes.
