Seaborn: I just want a log scale - python

I'm using seaborn to plot some biology data.
I just want to plot the distribution of one gene against another (expression in ~300 patients), and that has all worked fine and dandy with graph = sns.jointplot(x='Gene1',y='Gene2',data=data,kind='reg')
I like that the graph gives me a nice linear fit and a PearsonR and a P value.
All I want is to plot my data on a log scale, which is the way that such gene data is usually represented.
I've looked at a few solutions online, but they all get rid of my PearsonR value or my linear fit or they just don't look as good. I'm new to this, but it seems like graphing on a log scale shouldn't be too much trouble.
Any comments or solutions?
Thanks!
Edit: In response to comments, I've gotten closer to my answer. I now have a plot (shown below), but I need a line of fit and to do some statistics. Working on that now, but any answers/suggestions in the meantime are more than welcome.

mybins=np.logspace(0, np.log(100), 100)
g = sns.JointGrid(data1, data2, data, xlim=[.5, 1000000],
                  ylim=[.1, 10000000])
g.plot_marginals(sns.distplot, color='blue', bins=mybins)
g = g.plot(sns.regplot, sns.distplot)
g = g.annotate(stats.pearsonr)
ax = g.ax_joint
ax.set_xscale('log')
ax.set_yscale('log')
g.ax_marg_x.set_xscale('log')
g.ax_marg_y.set_yscale('log')
This worked just fine. In the end, I decided to just convert my table values into log(x), since that made the graph easier to scale and visualize in the short run.
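The log-transform route mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: the expression values are synthetic, the helper names `log10_fit` and `plot_log_joint` are hypothetical, and every value must be strictly positive for the logarithm to be defined. The fit and the Pearson r are computed with NumPy on the log10 values; `kind="reg"` then draws the regression on the transformed columns.

```python
import numpy as np


def log10_fit(x, y):
    """Fit log10(y) = slope * log10(x) + intercept; also return Pearson r in log space."""
    lx, ly = np.log10(x), np.log10(y)
    slope, intercept = np.polyfit(lx, ly, 1)
    r = np.corrcoef(lx, ly)[0, 1]
    return slope, intercept, r


def plot_log_joint(x, y):
    """Draw a regression jointplot of the log10-transformed values."""
    # Imported here so the numeric part above runs without plotting libraries.
    import pandas as pd
    import seaborn as sns

    logged = pd.DataFrame({"log10(Gene1)": np.log10(x), "log10(Gene2)": np.log10(y)})
    return sns.jointplot(x="log10(Gene1)", y="log10(Gene2)", data=logged, kind="reg")


# Synthetic stand-in for ~300 patients (not real data).
rng = np.random.default_rng(0)
gene1 = 10 ** rng.uniform(0, 4, size=300)
gene2 = gene1 ** 0.8 * 10 ** rng.normal(0, 0.2, size=300)

slope, intercept, r = log10_fit(gene1, gene2)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

Calling `plot_log_joint(gene1, gene2)` would then produce the figure; the axes show log10 units rather than log-scaled raw values.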

Related

Sawtooth look in violin plot [duplicate]

The following code gives me a very nice violinplot (and boxplot within).
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
foo = np.random.rand(100)
sns.violinplot(foo)
plt.boxplot(foo)
plt.show()
So far so good. However, when I look at foo, the variable does not contain any negative values. The seaborn plot seems misleading here. The normal matplotlib boxplot gives something closer to what I would expect.
How can I make violinplots with a better fit (not showing false negative values)?
As the comments note, this is a consequence (I'm not sure I'd call it an "artifact") of the assumptions underlying gaussian KDE. As has been mentioned, this is somewhat unavoidable, and if your data don't meet those assumptions, you might be better off just using a boxplot, which shows only points that exist in the actual data.
However, in your response you ask about whether it could be fit "tighter", which could mean a few things.
One answer might be to change the bandwidth of the smoothing kernel. You do that with the bw argument, which is actually a scale factor; the bandwidth that will be used is bw * data.std():
data = np.random.rand(100)
sns.violinplot(y=data, bw=.1)
Another answer might be to truncate the violin at the extremes of the datapoints. The KDE will still be fit with densities that extend past the bounds of your data, but the tails will not be shown. You do that with the cut parameter, which specifies how many units of bandwidth past the extreme values the density should be drawn. To truncate, set it to 0:
sns.violinplot(y=data, cut=0)
By the way, the API for violinplot is going to change in 0.6, and I'm using the development version here, but both the bw and cut arguments exist in the current released version and behave more or less the same way.
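To make the two parameters concrete, here is a rough NumPy sketch of a Gaussian KDE. It is not seaborn's implementation; the helper names `gaussian_kde_density` and `support_grid` are hypothetical, and it assumes, as described above, that the bandwidth is bw * data.std() and that cut extends the drawn range by that many bandwidths past the extreme values.

```python
import numpy as np


def gaussian_kde_density(data, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `data` on `grid`."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernel_sum = np.exp(-0.5 * z ** 2).sum(axis=1)
    return kernel_sum / (len(data) * bandwidth * np.sqrt(2 * np.pi))


def support_grid(data, bandwidth, cut, n=200):
    """Points at which the density is drawn: `cut` bandwidths past the extremes."""
    return np.linspace(data.min() - cut * bandwidth, data.max() + cut * bandwidth, n)


rng = np.random.default_rng(0)
data = rng.random(100)              # all values lie in [0, 1)
bandwidth = 0.1 * data.std(ddof=1)  # mirrors "bw * data.std()" with bw=.1

grid_cut2 = support_grid(data, bandwidth, cut=2)  # extends past the data
grid_cut0 = support_grid(data, bandwidth, cut=0)  # stops at the data extremes

# The estimate is still positive just below the smallest observation,
# which is why an untruncated violin can reach into impossible values.
below_min = np.array([data.min() - bandwidth])
density_below_min = gaussian_kde_density(data, below_min, bandwidth)[0]
```

A smaller bandwidth shrinks how far the tails reach, while cut=0 only hides them; the fitted density itself is unchanged.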

Python Matplotlib nonlinear scaling in contour plot

I'm having some trouble visualizing a certain dataset that I have in a contour plot. The issue is that I have a bunch of datapoints (X,Y,Z) for which the Z values range from about 2 to 0, where a lot of the interesting features are located in the 0 to 0.3 range. Using a normal scaling, they are very difficult to see, as illustrated in this image:
Now, I have thought about what else to do. Of course there is logarithmic scaling, but then I first need to think about some sort of mapping, and I am not 100% sure how one would do that. Inspired by this question one could think of a mapping of the type scaling(x) = Log(x/min)/Log(max/min) which worked reasonably well in that question.
Also interesting was the follow-up discussed here, where they used some sort of ArcSinh scaling function. That seemed to enlarge the small features quite well, proportionally to the whole.
So my question is twofold, I suppose.
How would one scale the data in my contour plot in such a way that the small amplitude features do not get blown away by the outliers?
Would you do it using either of the methods mentioned above, or using something completely different?
I am rather new to python and I am constantly amazed by all the things that are already out there, so I am sure there might be a built in way that is better than anything I mentioned above.
For completeness I uploaded the datafile here (the upload site is robustfiles.com, which a quick google search told me is a trustworthy website to share things like these)
I plotted the above with
data = np.load(r"D:\SavedData\ThreeQubitRess44SpecHighResNormalFreqs.npy")  # raw string so the backslashes are not read as escapes
fig, (ax1) = plt.subplots(1,figsize=(16,16))
cs = ax1.contourf(X, Y, data, 210, alpha=1,cmap='jet')
fig.colorbar(cs, ax=ax1, shrink=0.9)
ax1.set_title("Freq vs B")
ax1.set_ylabel('Frequency (GHz)'); ax1.set_xlabel('B (arb.)')
Excellent question.
Don't scale the data. You'll be looking for compromises in ranges with many scaling functions.
Instead, use a custom colormap. That way, you won't have to remap your actual data and can easily customize the visualization of the regions you'd like to highlight. Another example can be found in the scipy cookbook and there are quite a few more on the internet.
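As a rough illustration of the custom-colormap idea, the sketch below builds a colormap whose anchor colors are crowded into the lowest 15% of the range, which corresponds to Z below 0.3 when the data spans 0 to 2. The data is a synthetic stand-in, and the colormap name and the choice of colors are assumptions made for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

# Hypothetical stand-in for the (X, Y, Z) data: values between 0 and 2,
# with faint structure below 0.3 and one strong peak.
x = np.linspace(-3, 3, 200)
y = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(x, y)
Z = 0.15 * (1 + np.sin(3 * X) * np.cos(3 * Y)) + 1.7 * np.exp(-4 * (X ** 2 + Y ** 2))

# Anchor positions are fractions of the color range: 0.15 corresponds to
# Z = 0.3 for levels spanning 0..2, so most color variation sits down there.
cmap = LinearSegmentedColormap.from_list(
    "low_range_emphasis",
    [(0.0, "black"), (0.05, "blue"), (0.10, "cyan"), (0.15, "yellow"), (1.0, "red")],
)

fig, ax = plt.subplots(figsize=(6, 5))
cs = ax.contourf(X, Y, Z, levels=np.linspace(0, 2, 201), cmap=cmap)
fig.colorbar(cs, ax=ax, shrink=0.9)
# Use fig.savefig(...) or plt.show() to view the result.
```

The data array is left untouched, so the colorbar still reads in the original units; only the mapping from value to color is nonlinear.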
Another option is to break the plot into 2 separate regions by breaking the axis like so

Matplotlib: Avoid congestion in X axis

I'm using this code to plot a cumulative frequency plot:
plot = ocum.plot(x='index', y='cdf', yticks=np.arange(0.0, 1.05, 0.1))
plot.set_xlabel("Data usage")
plot.set_ylabel("CDF")
fig = plot.get_figure()
fig.savefig("overall.png")
The plot appears as follows and is very crowded around the initial part. This is due to the spread of my data. How can I make it clearer? (uploading to postimg because I don't have enough reputation points)
http://postimg.org/image/ii5z4czld/
I hope that I understood what you want: give more space to the visualization of the "CDF" development for smaller "data usage" values, right? Typically, you would achieve this by changing your X axis scale from linear to logarithmic. Head over to Plot logarithmic axes with matplotlib in python for seeing different ways to achieve that. The simplest might be, in your case, to replace plot() with semilogx().
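A minimal sketch of the logarithmic x axis with plain matplotlib, using synthetic skewed values in place of the original data (the variable names `usage` and `cdf` are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, heavily skewed "data usage" values (hypothetical).
rng = np.random.default_rng(0)
usage = np.sort(rng.lognormal(mean=3.0, sigma=2.0, size=1000))
cdf = np.arange(1, usage.size + 1) / usage.size  # empirical CDF

fig, ax = plt.subplots()
ax.semilogx(usage, cdf)  # same as plot(), but with a log-scaled x axis
ax.set_xlabel("Data usage")
ax.set_ylabel("CDF")
ax.set_yticks(np.arange(0.0, 1.05, 0.1))
```

Since the plot in the question comes from pandas, it may be simpler there to pass `logx=True` to `DataFrame.plot`, or to call `set_xscale('log')` on the axes it returns.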

Plotting an histogram in log log scale with identical bar thickness

I'm trying to plot input data in a histogram in log-log scale (to quickly view if this could fit a power law), but I'm having trouble getting the output I want. I'm using Python and more specifically the matplotlib/numpy libraries:
thebins = N.linspace(min_data.min(),min_data.max(),int(sys.argv[len(sys.argv)-1]))
thebins = N.log(thebins)
bar_min = plt.hist(min_data,bins=thebins,alpha=0.40,label=['Minimal Distance'],log=True)
min_data is my 1d data array; the first two lines create the bins and then put them on a log scale. The final line fills the bins/histogram with a log y scale.
The graphical output is:
It may seem fussy, but I'm not satisfied with having bins of different thickness; it seems to me that the data is harder to read or can even be misread from that. Not all log-log histograms have bins of the same width, and I'm convinced it can be done within Python; do you have an idea of how to change my code to get there?
Thank you in advance ;)
Should have been a no-brainer: I only had to take the log of my data for the x axis, and then build the histogram passing the argument "log=True" for the y axis.
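The fix described above might look like the sketch below, shown next to an alternative that keeps the raw data and uses logarithmically spaced bin edges instead. The sample is synthetic, the bin count of 30 is arbitrary, and the second option is an addition rather than something taken from the post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic heavy-tailed sample standing in for min_data (hypothetical).
rng = np.random.default_rng(0)
min_data = rng.pareto(a=2.0, size=5000) + 1.0  # strictly positive

# Option 1 (the approach described above): histogram of log10(data),
# with log=True giving the logarithmic y axis.
log_data = np.log10(min_data)
fig, ax = plt.subplots()
counts, edges, _ = ax.hist(log_data, bins=30, alpha=0.4, log=True,
                           label="Minimal Distance")
ax.set_xlabel("log10(distance)")

# Option 2 (an alternative, not from the post): keep the raw data but use
# geometrically spaced edges, which look equally wide on a log x axis.
log_edges = np.geomspace(min_data.min(), min_data.max(), 31)
fig2, ax2 = plt.subplots()
ax2.hist(min_data, bins=log_edges, alpha=0.4, log=True)
ax2.set_xscale("log")
```

Option 1 labels the axis in log units; option 2 keeps the original units on the ticks.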

matplotlib: plot a histogram from data

I have data (a spectrum) that I want to plot as a histogram.
I import the data and spectrum.shape shows me (1024,) as the format,
however plt.hist does not plot the data correctly.
If I use plt.bar(...) it works just fine, but for aesthetic reasons (I want to use the "stepfilled" histogram design) I have to employ plt.hist, which offers this option.
I really don't know what to do.
Here is my code:
import matplotlib.pyplot as plt
import numpy as np
spectrum = np.loadtxt('3000.mp', skiprows=53)
y1=spectrum[:]
num_bins = 1024
diagram = plt.hist(y1, num_bins, alpha=0.5)
plt.xlabel(r"TOF / $\mu$s")  # raw string so that \m is not treated as an escape
plt.ylabel("# ions")
plt.show()
I hope for your help.
I am also interested in this answer. Would you share how you got the stepfilled design with bars?
Myself I am looking for something like this:
(this image comes from http://astroplotlib.stsci.edu/page_histograms.htm)
But I do not manage to generate it easily with a spectrum as an input.
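One possible way to get there, offered as a sketch rather than a confirmed answer from this thread: plt.hist counts how often each value occurs in its input, so passing a spectrum that is already a list of counts per channel histograms the count values themselves. Passing the channel positions as the data and the spectrum as weights should reproduce the spectrum with the "stepfilled" style. The spectrum below is synthetic and the variable names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical spectrum: 1024 channels of counts (stand-in for the loaded file).
rng = np.random.default_rng(0)
channels = np.arange(1024)
expected = 50 * np.exp(-((channels - 400) / 60.0) ** 2) + 2
spectrum = rng.poisson(expected).astype(float)

# One bin per channel, centred on the channel index.
edges = np.arange(1025) - 0.5

fig, ax = plt.subplots()
# Each channel is one "sample" whose weight is the number of counts in it.
counts, _, _ = ax.hist(channels, bins=edges, weights=spectrum,
                       histtype="stepfilled", alpha=0.5)
ax.set_xlabel("Channel")
ax.set_ylabel("# ions")
```

If the channels map to time of flight, the same call works with the time values as data and the matching bin edges.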
