I'm trying to plot input data as a histogram on a log-log scale (to quickly see whether it could fit a power law), but I'm having trouble getting the output I want. I'm using Python, more specifically the matplotlib and numpy libraries:
import sys
import numpy as N
import matplotlib.pyplot as plt
thebins = N.linspace(min_data.min(), min_data.max(), int(sys.argv[-1]))
thebins = N.log(thebins)
bar_min = plt.hist(min_data, bins=thebins, alpha=0.40, label=['Minimal Distance'], log=True)
min_data is my 1D data array. The first two lines create the bins and put them on a log scale. The final line fills the histogram, with a log scale on the y axis.
The graphical output is:
It may seem fussy, but I'm not satisfied with having bins of different thickness: it seems to me that the data is harder to read, or can even be misread, because of that. Not all log-log histograms have bins of different widths, and I'm convinced equal-width bins can be done within Python. Do you have an idea of how to change my code to get there?
Thank you in advance ;)
Should have been a no-brainer: I only had to take the log of my data for the x axis, and then build the histogram passing the argument "log=True" for the y axis.
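A minimal sketch of that fix, under stated assumptions: the Pareto sample below is a synthetic stand-in for min_data, and the choice of 30 bins and base-10 logs is arbitrary. Because the bins are spaced linearly in log space, they all have the same width on screen.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in for min_data (assumption): heavy-tailed, strictly positive.
rng = np.random.default_rng(0)
min_data = rng.pareto(2.0, size=10_000) + 1.0

# Take the log of the data for the x axis; this requires strictly positive values.
log_data = np.log10(min_data)

fig, ax = plt.subplots()
# Equal-width bins in log space; log=True puts the counts (y axis) on a log scale.
counts, edges, _ = ax.hist(log_data, bins=30, alpha=0.40,
                           label="Minimal Distance", log=True)
ax.set_xlabel("log10(minimal distance)")
ax.set_ylabel("count")
ax.legend()
```

The x axis then shows log10 of the values rather than the values themselves, so tick labels need to be read accordingly. Call fig.savefig(...) or plt.show() to view the result.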
I am doing three plots:
Line plot, Boxplot and Histogram.
None of these plots really needs hoverinfo, and none of them needs to plot every point of the data.
As you can see, the plots are very simple. However, with huge data (30 million observations) the resulting HTML weighs 5 MB, which is a lot because there are 100 more plots like this.
At the moment I have made some optimizations...
When saving to html I put these parameters:
fig.to_html( include_plotlyjs="cdn", full_html=False)
This reduces the plot size a lot, but it is not enough.
I have also tried, in the line plot, specifying the parameters line = {"simplify":True} and hoverinfo = "skip". However, the file size is almost the same.
Any help/ workaround is appreciated
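One possible workaround, sketched here as an assumption rather than a confirmed fix: since the plots do not need every observation, the data could be reduced with numpy before any figure is built, so the HTML only carries a few dozen numbers per plot. The function names summarize and decimate, the bin count, and the step size are all hypothetical choices for illustration.

```python
import numpy as np

def summarize(values, n_bins=50):
    """Reduce a large 1-D array to histogram counts and box-plot quartiles."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])  # one x position per bin
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {
        "centers": centers, "counts": counts,
        "min": values.min(), "q1": q1, "median": median,
        "q3": q3, "max": values.max(),
    }

def decimate(values, step):
    """Keep every `step`-th point, e.g. for a line plot of a long series."""
    return np.asarray(values)[::step]

# Small stand-in array (assumption); the real data would have ~30 million rows.
data = np.arange(1000)
summary = summarize(data)
thin_line = decimate(data, step=100)
```

The bin centers and counts could then be drawn as a bar trace in place of a histogram of the raw data, and the thinned series as the line plot. Plain decimation can hide short spikes, so whether it is acceptable depends on the data.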
I'm plotting a matrix with contourf. The matrix is 883x883, and the problem is that the axes in the plot go from 0 to 883, but I would like them to show other values: more exactly, I'd like them to go from -20 to 20. How can I set that? I am very new to Python, so I'd appreciate your help.
When you use contourf, you can provide the location of your data points using the optional X and Y arguments. This will only work as expected if your data is structured, meaning that you can generate a grid made of rectangles whose nodes represent the locations of your data points. If this is not the case, then I would suggest building a triangulation and providing it to tricontourf.
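A minimal sketch of the structured case, assuming the 883x883 matrix covers a regular grid from -20 to 20 on both axes. The Gaussian used for Z is placeholder data, not the asker's matrix.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

n = 883
# Coordinates of the grid nodes: n evenly spaced values from -20 to 20.
x = np.linspace(-20, 20, n)
y = np.linspace(-20, 20, n)
X, Y = np.meshgrid(x, y)

# Placeholder matrix (assumption); replace with the real 883x883 data.
Z = np.exp(-(X**2 + Y**2) / 50.0)

fig, ax = plt.subplots()
cs = ax.contourf(X, Y, Z)  # X and Y give the location of each matrix entry
fig.colorbar(cs, ax=ax)
```

With X and Y supplied, the axis ticks follow the coordinates rather than the matrix indices.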
In a standard 3D python plot, each data point is, by default, represented as a sphere in 3D. For the data I'm plotting, the z-axis is very sensitive, while the x and y axes are very general, so is there a way to make each point on the scatter plot spread out over the x and y direction as it normally would with, for example, s=500, but not spread at all along the z-axis? Ideally this would look like a set of stacked discs, rather than overlapping spheres.
Any ideas? I'm relatively new to python and I don't know if there's a way to make custom data points like this with a scatter plot.
I actually was able to do this using the matplotlib.patches library, creating a patch for every data point, and then making it whatever shape I wanted with the help of mpl_toolkits.mplot3d.art3d.
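A minimal sketch of that approach, assuming circular discs lying flat in the x-y plane. The random points, the radius, and the axis limits are illustrative values only.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
from mpl_toolkits.mplot3d import art3d

# Illustrative data (assumption): 20 random points.
rng = np.random.default_rng(0)
xs, ys = rng.uniform(0, 10, 20), rng.uniform(0, 10, 20)
zs = rng.uniform(0, 1, 20)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

for x, y, z in zip(xs, ys, zs):
    # A flat 2D circle, wide in x and y ...
    disc = Circle((x, y), radius=0.8, alpha=0.5)
    ax.add_patch(disc)
    # ... placed at height z, so it has no extent along the z axis.
    art3d.pathpatch_2d_to_3d(disc, z=z, zdir="z")

# Set the limits explicitly to cover the data and the disc radius.
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.set_zlim(0, 1)
```

The radius here is in data units, unlike the s argument of scatter, which is in points squared.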
You might look for something called "jittering". Take a look at
Matplotlib: avoiding overlapping datapoints in a "scatter/dot/beeswarm" plot
It works by adding random noise to your data.
Another way might be to reduce the variance of the data on your z-axis (e.g. applying a log-function) or adjusting the scale. You could do that with ax.set_zscale("log"). It is documented here http://matplotlib.org/mpl_toolkits/mplot3d/api.html#mpl_toolkits.mplot3d.axes3d.Axes3D.set_zscale
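Jittering itself can be sketched in a few lines, assuming Gaussian noise is acceptable. The function name jitter and the noise scale are hypothetical choices, and the scale has to be tuned to the spacing of the data.

```python
import numpy as np

def jitter(values, scale, rng=None):
    """Return a copy of `values` with zero-mean Gaussian noise added."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    return values + rng.normal(0.0, scale, size=values.shape)

# Example: points sharing the same x positions get spread apart slightly.
x = np.array([1.0, 1.0, 1.0, 2.0, 2.0])
x_jittered = jitter(x, scale=0.05, rng=np.random.default_rng(0))
```

Here the noise would go on the x and y coordinates only, since the z axis is the sensitive one and should stay exact.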
I'm using this code to plot a cumulative frequency plot:
plot = ocum.plot(x='index', y='cdf', yticks=np.arange(0.0, 1.05, 0.1))
plot.set_xlabel("Data usage")
plot.set_ylabel("CDF")
fig = plot.get_figure()
fig.savefig("overall.png")
The plot appears as follows and is very crowded in the initial part. This is due to the spread of my data. How can I make it clearer? (Uploading to postimg because I don't have enough reputation points.)
http://postimg.org/image/ii5z4czld/
I hope that I understood what you want: give more space to the visualization of the "CDF" development for smaller "data usage" values, right? Typically, you would achieve this by changing your X axis scale from linear to logarithmic. Head over to Plot logarithmic axes with matplotlib in python for seeing different ways to achieve that. The simplest might be, in your case, to replace plot() with semilogx().
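A minimal sketch of the semilogx() variant, using plain matplotlib and synthetic data. The log-normal sample stands in for the asker's data usage values, and the empirical CDF is computed here only for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in (assumption): widely spread, strictly positive usage values.
rng = np.random.default_rng(0)
usage = np.sort(rng.lognormal(mean=3.0, sigma=2.0, size=5_000))
# Empirical CDF: fraction of observations at or below each sorted value.
cdf = np.arange(1, usage.size + 1) / usage.size

fig, ax = plt.subplots()
ax.semilogx(usage, cdf)  # logarithmic x axis, linear y axis
ax.set_yticks(np.arange(0.0, 1.05, 0.1))
ax.set_xlabel("Data usage")
ax.set_ylabel("CDF")
```

When plotting directly from a pandas DataFrame as in the question, passing logx=True to plot() should have a similar effect. A log axis cannot show zero or negative values, so those would need to be handled separately.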
I have a linkage matrix of about size 10,000 that I've plotted using scipy.cluster.hierarchy. The default rendering is poor -- as expected, given the size of the input -- because the bins are way too narrow to discern any meaningful structure in the dendrogram. How can I force the bins to be further apart so I can see the data better? I realize this will require the image to be huge, but that's OK.
I'm aware of dendrogram's truncate functionality. I will likely end up using it, but I'd like to get a look at the full data in a presentation I can grok visually before I start truncating.
Here's the rendering as it appears now. Increasing the image size using figsize does not appear to help, nor does xtick.major.pad.
import pylab
import scipy.cluster.hierarchy as sch
fig = pylab.figure(figsize=(10, 10))
Z = sch.dendrogram(Y, leaf_rotation=90)  # Y is the linkage matrix
fig.show()
fig.savefig('dendrogram.jpg')
Thank you for your help in advance!