how to rotate a seaborn lineplot

how to rotate a seaborn lineplot - python

How can I rotate a seaborn.lineplot so that the result will be as a function of y and not a function of x.
For example, this code:
import pandas as pd
import seaborn as sns
df = pd.DataFrame([[0,1],[0,2],[0,1.5],[1,1],[1,5]], columns=['group','val'])
sns.lineplot(x='group',y='val',data=df)
Create this figure:
But is there a way to rotate the figure in 90° ? so that in the X we will have "val" and in Y we will have "group" and the std will go from left to right and not from bottom to up.
Thanks
EDIT: I've opened a ticket in seaborn to ask for this feature: https://github.com/mwaskom/seaborn/issues/1661

Per the seaborn docs on lineplot, the dataframe passed to data must be
Tidy (“long-form”) dataframe where each column is a variable and each row is an observation.
Which seems to imply there is no way to force the axes to switch, even by manipulating the data. If there is a way to do that I haven't found it - I'm sure there is a more elegant way to do this, but one way you could go about it is to do it by hand so to speak. Something like this would do the trick
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame([[0,1],[0,2],[0,1.5],[1,1],[1,5]], columns=['group','val'])
group = df['group'].tolist()
val = df['val'].tolist()
yl = list()
yu = list()
avg = list()
ii = 0
while ii < len(group): #Loop through all the groups
g = group[ii]
y0 = val[ii]
y1 = val[ii]
s = 0
jj = ii
while (jj < len(group) and group[jj] == g):
s += val[jj]
#This takes the min and max, but could easily take the standard deviation
if val[jj] > y1:
y1 = val[jj]
if val[jj] < y0:
y0 = val[jj]
jj += 1
avg.append(s/(jj - ii))
ii = jj
yl.append(y0)
yu.append(y1)
x = np.linspace(min(group), max(group), len(yl))
plt.ylabel(df.columns[0])
plt.xlabel(df.columns[1])
plt.plot(avg, x, color="#5a9edd", linestyle="-", linewidth=1.5)
plt.fill_betweenx(x, yl, yu, alpha=0.3)
This will give you the following plot:
For brevity this uses the minimum and maximum from each group to give the error band, but that can be easily changed to standard error or standard deviation as needed.

Consider what you'd do if not using seaborn. You would calculate the mean and standard deviation and plot those as a function of the group. Now it is quite straight forward to exchange x and y for a plot(x,y): plot(y,x). For the filled region, you can use fill_betweenx instead of fill_between.
Below the two cases for comparisson.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([[0,1],[0,2],[0,1.5],[1,1],[1,5]], columns=['group','val'])
mean = df.groupby("group").mean()
std = df.groupby("group").std()
fig, (ax, ax2) = plt.subplots(ncols=2)
ax.plot(mean.index, mean["val"].values)
ax.fill_between(mean.index, (mean-std)["val"].values, (mean+std)["val"].values, alpha=.5)
ax.set(xlabel="group", ylabel="val")
ax2.plot(mean["val"].values, mean.index)
ax2.fill_betweenx(mean.index, (mean-std)["val"].values, (mean+std)["val"].values, alpha=.5)
ax2.set(ylabel="group", xlabel="val")
fig.tight_layout()
plt.show()

Related

How can I force seaborn or matplotlib to always render isolated values in heatmaps?

I am analyzing long series of events using heatmaps. Values of the column are most of the time 0 but occasionally are 1 unfortunately the rendering behaviour often hide the 1 occurrences because of the 0 surrounding them. I have tried to use antialiased=False but it did not solve the problem:
This code reproduce the issue:
import numpy as np
import pandas as pd
import seaborn as sns
d = pd.DataFrame(np.zeros((2000, 4)))
for i in range(4):
for j in [34,223,56,666]:
d[i][j] = 1
axS = sns.heatmap(d,antialiased=False)
There should be 4 lines instead only one is visible. Of course, if I stretch the plot I have better results but still some values are hidden.
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (10,30)
axB = sns.heatmap(d,antialiased=False)
I would like to force the rendering of isolated values. Is there any way to get this behaviour?
P.S. I need heatmaps because I compare multiple variables with float values, so spy for instance is not a good option for me.

If these events are infrequent, you could reduce the resolution of the vertical axis by aggregating the values into bins.
import numpy as np
import pandas as pd
import seaborn as sns
d = pd.DataFrame(np.zeros((2000, 4)))
for i in range(4):
for j in [34,223,56,666]:
d[i][j] = 1
x1 = d.index.min() - 1e-9
x2 = d.index.max()
bin_width = 50
bin_edge = np.arange(x1, x2 + bin_width, bin_width)
bin_center = np.arange(x1 + bin_width/2, x2, bin_width)
index_binned = pd.cut(d.index, bins=bin_edge, labels=bin_center)
d = d.join(pd.Series(index_binned, name="index_binned"))
d_binned = d.groupby('index_binned').sum()
sns.heatmap(data=d_binned, antialiased=False)
Output:

sns.relplot(): Set only one-side limits for facet plot

let's assume I have the following data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.DataFrame(dict(x=[1,2,3]*4,
y=list(range(12)),
row_separator = ["a"]*6 + ["b"]*6,
col_separator = (["a"]*3 + ["b"]*3) * 2
)
)
data
I'd like to have a simple line plot of x and y for each combination of row_separator and col_separator, and I want to do this using sns.relplot(). y-scales are different, which is fine, but I want to have the same lower bound 0 for all sub-plots.
With plt.subplots, it's like this:
fig, axs = plt.subplots(2,2)
for i in [0,1]:
for j in [0,1]:
ind = (data.row_separator == data.row_separator.unique()[i]) &\
(data.col_separator == data.col_separator.unique()[j])
axs[i,j].plot(data.loc[ind,"x"], data.loc[ind,"y"])
axs[i,j].set_ylim(bottom=0)
With seaborn, I'm struggling how to do this. (My) Basis would be
g = sns.relplot(data=data,
x="x",
y="y",
col="row_separator",
row="col_separator",
kind="line"
)
I would expect facet_kws={"ylim":(0, None)} to do what I want, but it isn't (it's rather limiting all plots to 0 and 1).

Plotting probability density function in Python

I want to plot two probability density functions (pdf) based on values of a certain column in a dataframe. The first one for all the values that correspond to rows with target label = 0 and second one where target label = 1.
My attempt is below, but as you can see the curves do not look like a pdf (the max value is 0 and they are not confined to X axis in range 0-1 and 5-6. I assume I can get something close by playing around with bw factor, but I am looking for a one-liner that just figures out right params and plots a pdf(including figuiring out the right X-axis start/end to use). Is there any such built in function that does this. If not, would appreciate some pointers on how to build something like this.
#matplotloib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.neighbors import KernelDensity
values = np.random.rand(10)
values_shift5 = np.random.rand(10) + 5
df = pd.DataFrame({'values' : values, 'label' : np.zeros(10)})
df = pd.concat([df, pd.DataFrame({'values' : values_shift5, 'label' : np.ones(10)})])
kde_label_0 = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(df[df.label == 0]['values'].values.reshape(-1, 1))
kde_label_1 = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(df[df.label == 1]['values'].values.reshape(-1, 1))
X_plot = np.linspace(0, 10, 50).reshape(-1, 1)
log_density_0 = kde_label_0.score_samples(X_plot)
log_density_1 = kde_label_1.score_samples(X_plot)
plt.plot(X_plot, log_density_0, label='Label 0')
plt.plot(X_plot, log_density_1, label='Label 1')
plt.legend()
plt.show()

How to get the full width at half maximum (FWHM) from kdeplot

I have used seaborn's kdeplot on some data.
import seaborn as sns
import numpy as np
sns.kdeplot(np.random.rand(100))
Is it possible to return the fwhm from the curve created?
And if not, is there another way to calculate it?

You can extract the generated kde curve from the ax. Then get the maximum y value and search the x positions nearest to the half max:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
ax = sns.kdeplot(np.random.rand(100))
kde_curve = ax.lines[0]
x = kde_curve.get_xdata()
y = kde_curve.get_ydata()
halfmax = y.max() / 2
maxpos = y.argmax()
leftpos = (np.abs(y[:maxpos] - halfmax)).argmin()
rightpos = (np.abs(y[maxpos:] - halfmax)).argmin() + maxpos
fullwidthathalfmax = x[rightpos] - x[leftpos]
ax.hlines(halfmax, x[leftpos], x[rightpos], color='crimson', ls=':')
ax.text(x[maxpos], halfmax, f'{fullwidthathalfmax:.3f}\n', color='crimson', ha='center', va='center')
ax.set_ylim(ymin=0)
plt.show()
Note that you can also calculate a kde curve from scipy.stats.gaussian_kde if you don't need the plotted version. In that case, the code could look like:
import numpy as np
from scipy.stats import gaussian_kde
data = np.random.rand(100)
kde = gaussian_kde(data)
x = np.linspace(data.min(), data.max(), 1000)
y = kde(x)
halfmax = y.max() / 2
maxpos = y.argmax()
leftpos = (np.abs(y[:maxpos] - halfmax)).argmin()
rightpos = (np.abs(y[maxpos:] - halfmax)).argmin() + maxpos
fullwidthathalfmax = x[rightpos] - x[leftpos]
print(fullwidthathalfmax)

I don't believe there's a way to return the fwhm from the random dataplot without writing the code to calculate it.
Take into account some example data:
import numpy as np
arr_x = np.linspace(norm.ppf(0.00001), norm.ppf(0.99999), 10000)
arr_y = norm.pdf(arr_x)
Find the minimum and maximum points and calculate difference.
difference = max(arr_y) - min(arr_y)
Find the half max (in this case it is half min)
HM = difference / 2
Find the nearest data point to HM:
nearest = (np.abs(arr_y - HM)).argmin()
Calculate the distance between nearest and min to get the HWHM, then mult by 2 to get the FWHM.

How to add axis offset in matplotlib plot?

I'm drawing several point plots in seaborn on the same graph. The x-axis is ordinal, not numerical; the ordinal values are the same for each point plot. I would like to shift each plot a bit to the side, the way pointplot(dodge=...) parameter does within multiple lines within a single plot, but in this case for multiple different plots drawn on top of each other. How can I do that?
Ideally, I'd like a technique that works for any matplotlib plot, not just seaborn specifically. Adding an offset to the data won't work easily, since the data is not numerical.
Example that shows the plots overlapping and making them hard to read (dodge within each plot works okay)
import pandas as pd
import seaborn as sns
df1 = pd.DataFrame({'x':list('ffffssss'), 'y':[1,2,3,4,5,6,7,8], 'h':list('abababab')})
df2 = df1.copy()
df2['y'] = df2['y']+0.5
sns.pointplot(data=df1, x='x', y='y', hue='h', ci='sd', errwidth=2, capsize=0.05, dodge=0.1, markers='<')
sns.pointplot(data=df2, x='x', y='y', hue='h', ci='sd', errwidth=2, capsize=0.05, dodge=0.1, markers='>')
I could use something other than seaborn, but the automatic confidence / error bars are very convenient so I'd prefer to stick with seaborn here.

Answering this for the most general case first.
A dodge can be implemented by shifting the artists in the figure by some amount. It might be useful to use points as units of that shift. E.g. you may want to shift your markers on the plot by 5 points.
This shift can be accomplished by adding a translation to the data transform of the artist. Here I propose a ScaledTranslation.
Now to keep this most general, one may write a function which takes the plotting method, the axes and the data as input, and in addition some dodge to apply, e.g.
draw_dodge(ax.errorbar, X, y, yerr =y/4., ax=ax, dodge=d, marker="d" )
The full functional code:
import matplotlib.pyplot as plt
from matplotlib import transforms
import numpy as np
import pandas as pd
def draw_dodge(*args, **kwargs):
func = args[0]
dodge = kwargs.pop("dodge", 0)
ax = kwargs.pop("ax", plt.gca())
trans = ax.transData + transforms.ScaledTranslation(dodge/72., 0,
ax.figure.dpi_scale_trans)
artist = func(*args[1:], **kwargs)
def iterate(artist):
if hasattr(artist, '__iter__'):
for obj in artist:
iterate(obj)
else:
artist.set_transform(trans)
iterate(artist)
return artist
X = ["a", "b"]
Y = np.array([[1,2],[2,2],[3,2],[1,4]])
Dodge = np.arange(len(Y),dtype=float)*10
Dodge -= Dodge.mean()
fig, ax = plt.subplots()
for y,d in zip(Y,Dodge):
draw_dodge(ax.errorbar, X, y, yerr =y/4., ax=ax, dodge=d, marker="d" )
ax.margins(x=0.4)
plt.show()
You may use this with ax.plot, ax.scatter etc. However not with any of the seaborn functions, because they don't return any useful artist to work with.
Now for the case in question, the remaining problem is to get the data in a useful format. One option would be the following.
df1 = pd.DataFrame({'x':list('ffffssss'),
'y':[1,2,3,4,5,6,7,8],
'h':list('abababab')})
df2 = df1.copy()
df2['y'] = df2['y']+0.5
N = len(np.unique(df1["x"].values))*len([df1,df2])
Dodge = np.linspace(-N,N,N)/N*10
fig, ax = plt.subplots()
k = 0
for df in [df1,df2]:
for (n, grp) in df.groupby("h"):
x = grp.groupby("x").mean()
std = grp.groupby("x").std()
draw_dodge(ax.errorbar, x.index, x.values,
yerr =std.values.flatten(), ax=ax,
dodge=Dodge[k], marker="o", label=n)
k+=1
ax.legend()
ax.margins(x=0.4)
plt.show()

You can use linspace to easily shift your graphs to where you want them to start and end. The function also makes it very easy to scale the graph so they would be visually the same width
import numpy as np
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.pyplot as plt
start_offset = 3
end_offset = start_offset
y1 = np.random.randint(0, 10, 20) ##y1 has 20 random ints from 0 to 10
y2 = np.random.randint(0, 10, 10) ##y2 has 10 random ints from 0 to 10
x1 = np.linspace(0, 20, y1.size) ##create a number of steps from 0 to 20 equal to y1 array size-1
x2 = np.linspace(0, 20, y2.size)
plt.plot(x1, y1)
plt.plot(x2, y2)
plt.show()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

how to rotate a seaborn lineplot - python

Related

How can I force seaborn or matplotlib to always render isolated values in heatmaps?

sns.relplot(): Set only one-side limits for facet plot

Plotting probability density function in Python

How to get the full width at half maximum (FWHM) from kdeplot

How to add axis offset in matplotlib plot?

Categories

Resources