I am analyzing long series of events using heatmaps. The values in each column are 0 most of the time but occasionally 1. Unfortunately, the rendering often hides the 1 occurrences because of the 0s surrounding them. I have tried antialiased=False, but it did not solve the problem.
This code reproduces the issue:
import numpy as np
import pandas as pd
import seaborn as sns
d = pd.DataFrame(np.zeros((2000, 4)))
for i in range(4):
    for j in [34, 223, 56, 666]:
        d.loc[j, i] = 1  # .loc avoids chained assignment
axS = sns.heatmap(d, antialiased=False)
There should be 4 lines, but only one is visible. Of course, if I stretch the plot I get better results, but some values are still hidden.
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (10,30)
axB = sns.heatmap(d,antialiased=False)
I would like to force the rendering of isolated values. Is there any way to get this behaviour?
P.S. I need heatmaps because I compare multiple variables with float values, so spy for instance is not a good option for me.
If these events are infrequent, you could reduce the resolution of the vertical axis by aggregating the values into bins.
import numpy as np
import pandas as pd
import seaborn as sns
d = pd.DataFrame(np.zeros((2000, 4)))
for i in range(4):
    for j in [34, 223, 56, 666]:
        d.loc[j, i] = 1  # .loc avoids chained assignment
x1 = d.index.min() - 1e-9
x2 = d.index.max()
bin_width = 50
bin_edge = np.arange(x1, x2 + bin_width, bin_width)
bin_center = np.arange(x1 + bin_width/2, x2, bin_width)
index_binned = pd.cut(d.index, bins=bin_edge, labels=bin_center)
d = d.join(pd.Series(index_binned, name="index_binned"))
d_binned = d.groupby('index_binned').sum()
sns.heatmap(data=d_binned, antialiased=False)
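A variation on the binning above, offered as a sketch rather than a definitive fix: summing works for 0/1 event flags, but since the question mentions float-valued variables, taking the maximum of each bin may be preferable because it keeps every cell within the original value range. The integer-division grouping key and the choice of max are assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(np.zeros((2000, 4)))
for i in range(4):
    for j in [34, 223, 56, 666]:
        d.loc[j, i] = 1

bin_width = 50  # rows per bin (same value as in the answer above)

# Rows 0-49 fall into bin 0, rows 50-99 into bin 1, and so on.
bin_id = np.asarray(d.index) // bin_width

# max() keeps an isolated 1 visible however many zeros share its bin.
d_binned = d.groupby(bin_id).max()
```

d_binned can then be passed to sns.heatmap in the same way as before.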
Let's assume I have the following data:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.DataFrame(dict(x=[1, 2, 3] * 4,
                         y=list(range(12)),
                         row_separator=["a"] * 6 + ["b"] * 6,
                         col_separator=(["a"] * 3 + ["b"] * 3) * 2
                         )
                    )
data
I'd like to have a simple line plot of x and y for each combination of row_separator and col_separator, and I want to do this using sns.relplot(). y-scales are different, which is fine, but I want to have the same lower bound 0 for all sub-plots.
With plt.subplots, it's like this:
fig, axs = plt.subplots(2, 2)
for i in [0, 1]:
    for j in [0, 1]:
        ind = (data.row_separator == data.row_separator.unique()[i]) &\
              (data.col_separator == data.col_separator.unique()[j])
        axs[i, j].plot(data.loc[ind, "x"], data.loc[ind, "y"])
        axs[i, j].set_ylim(bottom=0)
With seaborn, I'm struggling to do this. My starting point would be:
g = sns.relplot(data=data,
                x="x",
                y="y",
                col="row_separator",
                row="col_separator",
                kind="line"
                )
I would expect facet_kws={"ylim":(0, None)} to do what I want, but it doesn't (instead, it limits all plots to the range 0 to 1).
I want to plot two probability density functions (pdf) based on the values of a certain column in a dataframe: the first for all the values in rows with target label = 0, and the second for rows with target label = 1.
My attempt is below, but as you can see, the curves do not look like a pdf (the maximum value is 0, and they are not confined to the ranges 0-1 and 5-6 on the X axis). I assume I can get something close by playing around with the bw factor, but I am looking for a one-liner that figures out the right parameters and plots a pdf (including the right X-axis start and end). Is there a built-in function that does this? If not, I would appreciate some pointers on how to build something like this.
# %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.neighbors import KernelDensity
values = np.random.rand(10)
values_shift5 = np.random.rand(10) + 5
df = pd.DataFrame({'values' : values, 'label' : np.zeros(10)})
df = pd.concat([df, pd.DataFrame({'values' : values_shift5, 'label' : np.ones(10)})])
kde_label_0 = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(df[df.label == 0]['values'].values.reshape(-1, 1))
kde_label_1 = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(df[df.label == 1]['values'].values.reshape(-1, 1))
X_plot = np.linspace(0, 10, 50).reshape(-1, 1)
log_density_0 = kde_label_0.score_samples(X_plot)
log_density_1 = kde_label_1.score_samples(X_plot)
plt.plot(X_plot, log_density_0, label='Label 0')
plt.plot(X_plot, log_density_1, label='Label 1')
plt.legend()
plt.show()
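One likely reason the curves above do not look like a pdf: KernelDensity.score_samples returns the log of the density, so the values need to be passed through np.exp before plotting. As a separate sketch, not the approach used in the question, scipy.stats.gaussian_kde picks a bandwidth automatically (Scott's rule by default) and returns the density directly. The helper name kde_pdf and the padding of three kernel widths are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
values_0 = rng.random(10)        # stand-in for rows with label == 0
values_1 = rng.random(10) + 5    # stand-in for rows with label == 1


def kde_pdf(samples, n_points=200, pad=3.0):
    """Return (x, density) for a Gaussian KDE of a 1-D sample.

    kde_pdf is a hypothetical helper; the x range is the data range
    extended by `pad` kernel standard deviations on each side.
    """
    samples = np.asarray(samples, dtype=float)
    kde = gaussian_kde(samples)                  # bandwidth via Scott's rule
    kernel_std = kde.factor * samples.std(ddof=1)
    x = np.linspace(samples.min() - pad * kernel_std,
                    samples.max() + pad * kernel_std, n_points)
    return x, kde(x)


x0, pdf0 = kde_pdf(values_0)
x1, pdf1 = kde_pdf(values_1)
```

The two curves can then be drawn with plt.plot(x0, pdf0, label='Label 0') and plt.plot(x1, pdf1, label='Label 1').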
I have used seaborn's kdeplot on some data.
import seaborn as sns
import numpy as np
sns.kdeplot(np.random.rand(100))
Is it possible to return the fwhm from the curve created?
And if not, is there another way to calculate it?
You can extract the generated kde curve from the ax. Then get the maximum y value and search the x positions nearest to the half max:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
ax = sns.kdeplot(np.random.rand(100))
kde_curve = ax.lines[0]
x = kde_curve.get_xdata()
y = kde_curve.get_ydata()
halfmax = y.max() / 2
maxpos = y.argmax()
leftpos = (np.abs(y[:maxpos] - halfmax)).argmin()
rightpos = (np.abs(y[maxpos:] - halfmax)).argmin() + maxpos
fullwidthathalfmax = x[rightpos] - x[leftpos]
ax.hlines(halfmax, x[leftpos], x[rightpos], color='crimson', ls=':')
ax.text(x[maxpos], halfmax, f'{fullwidthathalfmax:.3f}\n', color='crimson', ha='center', va='center')
ax.set_ylim(bottom=0)
plt.show()
Note that you can also calculate a kde curve from scipy.stats.gaussian_kde if you don't need the plotted version. In that case, the code could look like:
import numpy as np
from scipy.stats import gaussian_kde
data = np.random.rand(100)
kde = gaussian_kde(data)
x = np.linspace(data.min(), data.max(), 1000)
y = kde(x)
halfmax = y.max() / 2
maxpos = y.argmax()
leftpos = (np.abs(y[:maxpos] - halfmax)).argmin()
rightpos = (np.abs(y[maxpos:] - halfmax)).argmin() + maxpos
fullwidthathalfmax = x[rightpos] - x[leftpos]
print(fullwidthathalfmax)
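The nearest-sample search above is limited by the spacing of the x grid. A possible refinement, sketched here under the assumption of a single peak, linearly interpolates the two half-max crossings. The function name fwhm_interp is hypothetical.

```python
import numpy as np


def fwhm_interp(x, y):
    """Full width at half maximum via linear interpolation.

    Assumes a single peak; if the curve never drops below half max on
    one side, the end of the x range is used for that side.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    half = y.max() / 2
    above = np.flatnonzero(y >= half)   # samples at or above half max
    left, right = above[0], above[-1]

    def crossing(i_below, i_above):
        # x position where the segment between two samples hits `half`
        frac = (half - y[i_below]) / (y[i_above] - y[i_below])
        return x[i_below] + frac * (x[i_above] - x[i_below])

    x_left = crossing(left - 1, left) if left > 0 else x[0]
    x_right = crossing(right + 1, right) if right < len(y) - 1 else x[-1]
    return x_right - x_left


# Check against a Gaussian with sigma = 1, where FWHM = 2*sqrt(2*ln 2).
x = np.linspace(-5, 5, 2001)
y = np.exp(-x**2 / 2)
width = fwhm_interp(x, y)
```

The same function can be applied to the x and y arrays extracted from the kdeplot line or computed with gaussian_kde.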
I don't believe there's a way to get the FWHM directly from the plot without writing code to calculate it.
Consider some example data:
import numpy as np
from scipy.stats import norm
arr_x = np.linspace(norm.ppf(0.00001), norm.ppf(0.99999), 10000)
arr_y = norm.pdf(arr_x)
Find the minimum and maximum values and calculate the difference:
difference = max(arr_y) - min(arr_y)
Find the half max (in this case it is half min):
HM = difference / 2
Find the nearest data point to HM:
nearest = (np.abs(arr_y - HM)).argmin()
Calculate the distance along x between the nearest point and the peak to get the HWHM, then multiply by 2 to get the FWHM.
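The last step is described but not shown in code. A minimal sketch, assuming a single symmetric peak with a baseline near zero, and reading the reference point as the peak position (the names peak, HWHM and FWHM are introduced here for illustration):

```python
import numpy as np
from scipy.stats import norm

arr_x = np.linspace(norm.ppf(0.00001), norm.ppf(0.99999), 10000)
arr_y = norm.pdf(arr_x)

difference = arr_y.max() - arr_y.min()
HM = difference / 2

# Index of the sample whose y value is closest to the half max.
nearest = (np.abs(arr_y - HM)).argmin()

# Horizontal distance from that sample to the peak is the half width.
peak = arr_y.argmax()
HWHM = abs(arr_x[peak] - arr_x[nearest])
FWHM = 2 * HWHM
```

For a standard normal distribution the result should be close to 2*sqrt(2*ln 2), roughly 2.355, up to the resolution of the x grid.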
I'm drawing several point plots in seaborn on the same graph. The x-axis is ordinal, not numerical; the ordinal values are the same for each point plot. I would like to shift each plot a bit to the side, the way pointplot(dodge=...) parameter does within multiple lines within a single plot, but in this case for multiple different plots drawn on top of each other. How can I do that?
Ideally, I'd like a technique that works for any matplotlib plot, not just seaborn specifically. Adding an offset to the data won't work easily, since the data is not numerical.
Example that shows the plots overlapping, which makes them hard to read (dodge within each plot works okay):
import pandas as pd
import seaborn as sns
df1 = pd.DataFrame({'x':list('ffffssss'), 'y':[1,2,3,4,5,6,7,8], 'h':list('abababab')})
df2 = df1.copy()
df2['y'] = df2['y']+0.5
sns.pointplot(data=df1, x='x', y='y', hue='h', ci='sd', errwidth=2, capsize=0.05, dodge=0.1, markers='<')
sns.pointplot(data=df2, x='x', y='y', hue='h', ci='sd', errwidth=2, capsize=0.05, dodge=0.1, markers='>')
I could use something other than seaborn, but the automatic confidence / error bars are very convenient so I'd prefer to stick with seaborn here.
Answering this for the most general case first.
A dodge can be implemented by shifting the artists in the figure by some amount. It might be useful to use points as units of that shift. E.g. you may want to shift your markers on the plot by 5 points.
This shift can be accomplished by adding a translation to the data transform of the artist. Here I propose a ScaledTranslation.
Now to keep this most general, one may write a function which takes the plotting method, the axes and the data as input, and in addition some dodge to apply, e.g.
draw_dodge(ax.errorbar, X, y, yerr =y/4., ax=ax, dodge=d, marker="d" )
The full functional code:
import matplotlib.pyplot as plt
from matplotlib import transforms
import numpy as np
import pandas as pd
def draw_dodge(*args, **kwargs):
    # First positional argument is the plotting method, e.g. ax.errorbar
    func = args[0]
    dodge = kwargs.pop("dodge", 0)
    ax = kwargs.pop("ax", plt.gca())
    # Shift by `dodge` points (1/72 inch) on top of the data transform
    trans = ax.transData + transforms.ScaledTranslation(dodge/72., 0,
                                   ax.figure.dpi_scale_trans)
    artist = func(*args[1:], **kwargs)
    def iterate(artist):
        # Containers (e.g. from errorbar) hold nested artists
        if hasattr(artist, '__iter__'):
            for obj in artist:
                iterate(obj)
        else:
            artist.set_transform(trans)
    iterate(artist)
    return artist
X = ["a", "b"]
Y = np.array([[1,2],[2,2],[3,2],[1,4]])
Dodge = np.arange(len(Y),dtype=float)*10
Dodge -= Dodge.mean()
fig, ax = plt.subplots()
for y, d in zip(Y, Dodge):
    draw_dodge(ax.errorbar, X, y, yerr=y/4., ax=ax, dodge=d, marker="d")
ax.margins(x=0.4)
plt.show()
You may use this with ax.plot, ax.scatter etc. However not with any of the seaborn functions, because they don't return any useful artist to work with.
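To see what the ScaledTranslation in draw_dodge does, the following sketch compares where a data point lands in display (pixel) coordinates with and without the offset. A dodge given in points is divided by 72 to get inches, and dpi_scale_trans turns inches into pixels. The figure dpi of 100 and the 5-point dodge are arbitrary values chosen for illustration.

```python
from matplotlib import transforms
from matplotlib.figure import Figure

fig = Figure(dpi=100)          # no pyplot needed for this check
ax = fig.subplots()

dodge = 5                      # horizontal shift in points (1 pt = 1/72 in)
offset = transforms.ScaledTranslation(dodge / 72., 0, fig.dpi_scale_trans)
trans = ax.transData + offset

point = (0.5, 0.5)             # an arbitrary point in data coordinates
plain = ax.transData.transform(point)
shifted = trans.transform(point)

# Difference in pixels: dodge / 72 * dpi horizontally, nothing vertically.
shift_px = shifted - plain
```

Because the shift is applied after the data transform, it stays the same size on screen regardless of the axis limits or of whether the x values are categorical.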
Now for the case in question, the remaining problem is to get the data in a useful format. One option would be the following.
df1 = pd.DataFrame({'x':list('ffffssss'),
'y':[1,2,3,4,5,6,7,8],
'h':list('abababab')})
df2 = df1.copy()
df2['y'] = df2['y']+0.5
N = len(np.unique(df1["x"].values))*len([df1,df2])
Dodge = np.linspace(-N,N,N)/N*10
fig, ax = plt.subplots()
k = 0
for df in [df1, df2]:
    for (n, grp) in df.groupby("h"):
        # select "y" so the non-numeric column "h" is not aggregated
        x = grp.groupby("x")["y"].mean()
        std = grp.groupby("x")["y"].std()
        draw_dodge(ax.errorbar, x.index, x.values,
                   yerr=std.values, ax=ax,
                   dodge=Dodge[k], marker="o", label=n)
        k += 1
ax.legend()
ax.margins(x=0.4)
plt.show()
You can use linspace to shift your graphs to wherever you want them to start and end. It also makes it easy to scale the graphs so that they are visually the same width:
import numpy as np
import matplotlib.pyplot as plt
start_offset = 3
end_offset = start_offset
y1 = np.random.randint(0, 10, 20)  # 20 random ints in [0, 10)
y2 = np.random.randint(0, 10, 10)  # 10 random ints in [0, 10)
x1 = np.linspace(0, 20, y1.size)   # y1.size evenly spaced x positions from 0 to 20
x2 = np.linspace(0, 20, y2.size)
plt.plot(x1, y1)
plt.plot(x2, y2)
plt.show()
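For the categorical axis in the question, the same offset idea could be applied by plotting against numeric positions and labelling the ticks afterwards. This is a sketch only: the per-category means below are computed by hand from df1 and df2 in the question (ignoring hue), and the offset of 0.1 on either side is an arbitrary choice.

```python
import numpy as np
from matplotlib.figure import Figure

categories = ["f", "s"]
# Mean of y per category, ignoring hue (df2 is df1 shifted up by 0.5).
series = {"df1": np.array([2.5, 6.5]), "df2": np.array([3.0, 7.0])}

# One integer position per category, plus a small offset per series.
positions = np.arange(len(categories), dtype=float)
offsets = np.linspace(-0.1, 0.1, len(series))

fig = Figure()
ax = fig.subplots()
for (name, y), off in zip(series.items(), offsets):
    ax.plot(positions + off, y, marker="o", label=name)

# Put the category names back on the un-shifted positions.
ax.set_xticks(positions)
ax.set_xticklabels(categories)
ax.legend()
```

Unlike the transform-based dodge, the offset here is in data units, so its on-screen size changes with the axis limits.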