How to get the full width at half maximum (FWHM) from kdeplot

I have used seaborn's kdeplot on some data.
import seaborn as sns
import numpy as np
sns.kdeplot(np.random.rand(100))
Is it possible to return the fwhm from the curve created?
And if not, is there another way to calculate it?

You can extract the generated kde curve from the ax. Then get the maximum y value and search the x positions nearest to the half max:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
ax = sns.kdeplot(np.random.rand(100))
kde_curve = ax.lines[0]
x = kde_curve.get_xdata()
y = kde_curve.get_ydata()
halfmax = y.max() / 2
maxpos = y.argmax()
leftpos = (np.abs(y[:maxpos] - halfmax)).argmin()
rightpos = (np.abs(y[maxpos:] - halfmax)).argmin() + maxpos
fullwidthathalfmax = x[rightpos] - x[leftpos]
ax.hlines(halfmax, x[leftpos], x[rightpos], color='crimson', ls=':')
ax.text(x[maxpos], halfmax, f'{fullwidthathalfmax:.3f}\n', color='crimson', ha='center', va='center')
ax.set_ylim(ymin=0)
plt.show()
Note that you can also calculate a kde curve from scipy.stats.gaussian_kde if you don't need the plotted version. In that case, the code could look like:
import numpy as np
from scipy.stats import gaussian_kde
data = np.random.rand(100)
kde = gaussian_kde(data)
x = np.linspace(data.min(), data.max(), 1000)
y = kde(x)
halfmax = y.max() / 2
maxpos = y.argmax()
leftpos = (np.abs(y[:maxpos] - halfmax)).argmin()
rightpos = (np.abs(y[maxpos:] - halfmax)).argmin() + maxpos
fullwidthathalfmax = x[rightpos] - x[leftpos]
print(fullwidthathalfmax)
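The nearest-sample search above is limited by the grid spacing, and it returns an edge point without warning if the curve never falls to half maximum inside the sampled range (which may happen when x only spans data.min() to data.max()). Below is a minimal sketch of a variant that linearly interpolates the two crossings. `fwhm_interpolated` is a hypothetical helper name, not part of seaborn or SciPy, and it assumes a single-peaked curve:

```python
import numpy as np

def fwhm_interpolated(x, y):
    """Estimate the FWHM of a single-peaked curve (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    half = y.max() / 2
    peak = y.argmax()

    # Walk outwards from the peak while the curve stays at or above half max.
    left = peak
    while left > 0 and y[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(y) - 1 and y[right + 1] >= half:
        right += 1

    if left == 0 or right == len(y) - 1:
        raise ValueError("curve does not drop below half maximum on both sides")

    # Linear interpolation between the samples that bracket each crossing.
    x_left = np.interp(half, [y[left - 1], y[left]], [x[left - 1], x[left]])
    x_right = np.interp(half, [y[right + 1], y[right]], [x[right + 1], x[right]])
    return x_right - x_left

# Demo on a unit Gaussian, whose true FWHM is 2*sqrt(2*ln 2).
x = np.linspace(-5, 5, 201)
y = np.exp(-x**2 / 2)
print(fwhm_interpolated(x, y))
```

If the ValueError is raised for a KDE curve, widening the x range passed to the KDE (a few bandwidths beyond the data) is one possible remedy.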

I don't believe kdeplot can return the FWHM directly; you have to write the code to calculate it.
Take into account some example data:
import numpy as np
from scipy.stats import norm
arr_x = np.linspace(norm.ppf(0.00001), norm.ppf(0.99999), 10000)
arr_y = norm.pdf(arr_x)
Find the minimum and maximum points and calculate difference.
difference = max(arr_y) - min(arr_y)
Find the half max (half of that difference, since the minimum is close to zero here):
HM = difference / 2
Find the nearest data point to HM:
nearest = (np.abs(arr_y - HM)).argmin()
Calculate the distance along x between the nearest point and the peak (the position of the maximum) to get the HWHM, then multiply by 2 to get the FWHM.
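The last step is only described in words, so here is a rough sketch of it. To stay dependency-free it builds the normal pdf with NumPy instead of scipy.stats.norm, and the grid limits of ±4.5 are an arbitrary assumption standing in for the norm.ppf bounds used above:

```python
import numpy as np

# Standard normal pdf sampled on a symmetric grid (limits are an assumption).
arr_x = np.linspace(-4.5, 4.5, 10001)
arr_y = np.exp(-arr_x**2 / 2) / np.sqrt(2 * np.pi)

# Half of the peak height (the minimum of the curve is close to zero here).
HM = (arr_y.max() - arr_y.min()) / 2

peak = arr_y.argmax()                    # index of the maximum
nearest = np.abs(arr_y - HM).argmin()    # index of the sample closest to half max

# Half width is the x distance from the peak to that sample; double it for the FWHM.
HWHM = abs(arr_x[nearest] - arr_x[peak])
FWHM = 2 * HWHM
print(FWHM)
```

For a normal distribution the result should be close to 2*sqrt(2*ln 2) times the standard deviation, which makes a convenient sanity check.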

Related

How to plot heatmap onto mplsoccer pitch?

Wondering how I can plot a seaborn plot onto a different matplotlib plot. Currently I have two plots (one a heatmap, the other a soccer pitch), but when I plot the heatmap onto the pitch, I get the results below. (Plotting the pitch onto the heatmap isn't pretty either.) Any ideas how to fix it?
Note: Plots don't need a colorbar and the grid structure isn't required either. Just care about the heatmap covering the entire space of the pitch. Thanks!
import pandas as pd
import numpy as np
from mplsoccer import Pitch
import seaborn as sns
nmf_shot_W = pd.read_csv('https://raw.githubusercontent.com/lucas-nelson-uiuc/datasets/main/nmf_show_W.csv').iloc[:, 1:]
nmf_shot_ThierryHenry = pd.read_csv('https://raw.githubusercontent.com/lucas-nelson-uiuc/datasets/main/nmf_show_Hth.csv')['Thierry Henry']
pitch = Pitch(pitch_type='statsbomb', line_zorder=2,
pitch_color='#22312b', line_color='#efefef')
dfdfdf = np.array(np.matmul(nmf_shot_W, nmf_shot_ThierryHenry)).reshape((24,25))
g_ax = sns.heatmap(dfdfdf)
pitch.draw(ax=g_ax)
Current output:
Desired output:

Use the built-in pitch.heatmap:
pitch.heatmap expects a stats dictionary of binned data, bin mesh, and bin centers:
stats (dict) – The keys are statistic (the calculated statistic), x_grid and y_grid (the bin's edges), and cx and cy (the bin centers).
In the mplsoccer heatmap demos, they construct this stats object using pitch.bin_statistic because they have raw data. However, you already have binned data ("calculated statistic"), so reconstruct the stats object manually by building the mesh and centers:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mplsoccer import Pitch
nmf_shot_W = pd.read_csv('71878281/nmf_show_W.csv', index_col=0)
nmf_shot_ThierryHenry = pd.read_csv('71878281/nmf_show_Hth.csv')['Thierry Henry']
statistic = np.dot(nmf_shot_W, nmf_shot_ThierryHenry.to_numpy()).reshape((24, 25))
# construct stats object from binned data, bin mesh, and bin centers
y, x = statistic.shape
x_grid = np.linspace(0, 120, x + 1)
y_grid = np.linspace(0, 80, y + 1)
cx = x_grid[:-1] + 0.5 * (x_grid[1] - x_grid[0])
cy = y_grid[:-1] + 0.5 * (y_grid[1] - y_grid[0])
stats = dict(statistic=statistic, x_grid=x_grid, y_grid=y_grid, cx=cx, cy=cy)
# use pitch.draw and pitch.heatmap as per mplsoccer demo
pitch = Pitch(pitch_type='statsbomb', line_zorder=2, pitch_color='#22312b', line_color='#efefef')
fig, ax = pitch.draw(figsize=(6.6, 4.125))
pcm = pitch.heatmap(stats, ax=ax, cmap='plasma')
cbar = fig.colorbar(pcm, ax=ax, shrink=0.6)
cbar.outline.set_edgecolor('#efefef')
cbar.ax.yaxis.set_tick_params(color='#efefef')
plt.setp(plt.getp(cbar.ax.axes, 'yticklabels'), color='#efefef')
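The stats construction above can be wrapped in a small function if it is needed for several players. This is only a sketch: `stats_from_binned` is a hypothetical name (not an mplsoccer function), and the 120 by 80 extent is taken from the linspace calls above.

```python
import numpy as np

def stats_from_binned(statistic, x_max=120.0, y_max=80.0):
    """Build a stats dict from already-binned data (hypothetical helper)."""
    statistic = np.asarray(statistic)
    n_rows, n_cols = statistic.shape
    # Bin edges: one more edge than bins along each axis.
    x_grid = np.linspace(0, x_max, n_cols + 1)
    y_grid = np.linspace(0, y_max, n_rows + 1)
    # Bin centers: midpoints of consecutive edges.
    cx = (x_grid[:-1] + x_grid[1:]) / 2
    cy = (y_grid[:-1] + y_grid[1:]) / 2
    return dict(statistic=statistic, x_grid=x_grid, y_grid=y_grid, cx=cx, cy=cy)

# Placeholder 24 x 25 array standing in for the reshaped NMF product.
stats = stats_from_binned(np.arange(24 * 25).reshape(24, 25))
print(stats["x_grid"].shape, stats["cx"].shape)
```

The returned dict has the same keys as the one built by hand above, so it would be passed to pitch.heatmap in the same way.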

how to rotate a seaborn lineplot

How can I rotate a seaborn.lineplot so that the result is a function of y rather than a function of x?
For example, this code:
import pandas as pd
import seaborn as sns
df = pd.DataFrame([[0,1],[0,2],[0,1.5],[1,1],[1,5]], columns=['group','val'])
sns.lineplot(x='group',y='val',data=df)
creates this figure:
But is there a way to rotate the figure by 90°, so that the X axis shows "val", the Y axis shows "group", and the std band runs from left to right instead of from bottom to top?
Thanks
EDIT: I've opened a ticket in seaborn to ask for this feature: https://github.com/mwaskom/seaborn/issues/1661

Per the seaborn docs on lineplot, the dataframe passed to data must be
Tidy (“long-form”) dataframe where each column is a variable and each row is an observation.
This seems to imply there is no way to force the axes to switch, even by manipulating the data. If there is a way to do that, I haven't found it. There is probably a more elegant approach, but one option is to do it by hand, so to speak. Something like this would do the trick:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame([[0,1],[0,2],[0,1.5],[1,1],[1,5]], columns=['group','val'])
group = df['group'].tolist()
val = df['val'].tolist()
yl = list()
yu = list()
avg = list()
ii = 0
while ii < len(group): #Loop through all the groups
    g = group[ii]
    y0 = val[ii]
    y1 = val[ii]
    s = 0
    jj = ii
    while (jj < len(group) and group[jj] == g):
        s += val[jj]
        #This takes the min and max, but could easily take the standard deviation
        if val[jj] > y1:
            y1 = val[jj]
        if val[jj] < y0:
            y0 = val[jj]
        jj += 1
    avg.append(s/(jj - ii))
    ii = jj
    yl.append(y0)
    yu.append(y1)
x = np.linspace(min(group), max(group), len(yl))
plt.ylabel(df.columns[0])
plt.xlabel(df.columns[1])
plt.plot(avg, x, color="#5a9edd", linestyle="-", linewidth=1.5)
plt.fill_betweenx(x, yl, yu, alpha=0.3)
This will give you the following plot:
For brevity this uses the minimum and maximum from each group to give the error band, but that can be easily changed to standard error or standard deviation as needed.

Consider what you'd do if not using seaborn. You would calculate the mean and standard deviation and plot those as a function of the group. Now it is quite straightforward to exchange x and y in a plot: plot(x, y) becomes plot(y, x). For the filled region, you can use fill_betweenx instead of fill_between.
Below are the two cases for comparison.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([[0,1],[0,2],[0,1.5],[1,1],[1,5]], columns=['group','val'])
mean = df.groupby("group").mean()
std = df.groupby("group").std()
fig, (ax, ax2) = plt.subplots(ncols=2)
ax.plot(mean.index, mean["val"].values)
ax.fill_between(mean.index, (mean-std)["val"].values, (mean+std)["val"].values, alpha=.5)
ax.set(xlabel="group", ylabel="val")
ax2.plot(mean["val"].values, mean.index)
ax2.fill_betweenx(mean.index, (mean-std)["val"].values, (mean+std)["val"].values, alpha=.5)
ax2.set(ylabel="group", xlabel="val")
fig.tight_layout()
plt.show()

Histogram manipulation to remove unwanted data

How do I remove data from a histogram in python under a certain frequency count?
Say I have 10 bins, the first bin has a count of 4, the second has 2, the third has 1, fourth has 5, etc...
Now I want to get rid of the data that has a count of 2 or less. So the second bin would go to zero, as would the third.
Example:
import numpy as np
import matplotlib.pyplot as plt
gaussian_numbers = np.random.randn(1000)
plt.hist(gaussian_numbers, bins=12)
plt.title("Gaussian Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
fig = plt.gcf()
Gives:
and I want to get rid of the bins with fewer than a frequency of say 'X' (could be frequency = 100 for example).
want:
thank you.

Use np.histogram to create the histogram.
Then use np.where. Given only a condition, it returns the indices where the condition holds, which you can use to index your histogram (the boolean mask hist <= freq works directly as well).
import numpy as np
import matplotlib.pyplot as plt
gaussian_numbers = np.random.randn(1000)
# Get histogram
hist, bins = np.histogram(gaussian_numbers, bins=12)
# Threshold frequency
freq = 100
# Zero out low values
hist[np.where(hist <= freq)] = 0
# Plot
width = 0.7 * (bins[1] - bins[0])
center = (bins[:-1] + bins[1:]) / 2
plt.bar(center, hist, align='center', width=width)
plt.title("Gaussian Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
(Plot part inspired from here.)
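The code above zeroes the bar heights but leaves the underlying samples untouched. If the goal is to drop the samples themselves, one possible approach (an assumption about the intent, not something the original answer does) is to look up each sample's bin with np.digitize and keep only those in well-populated bins:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

counts, edges = np.histogram(data, bins=12)
threshold = 100

# Bin index of every sample. np.digitize is 1-based, and the maximum value
# lands one past the last bin, so shift by one and clip into range.
idx = np.clip(np.digitize(data, edges) - 1, 0, len(counts) - 1)

# Keep only samples whose bin count exceeds the threshold.
keep = counts[idx] > threshold
filtered = data[keep]
print(filtered.size, "of", data.size, "samples kept")
```

Plotting filtered with the same edges (plt.hist(filtered, bins=edges)) should then leave the sparse bins empty.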

What is the source of discrepancy in 2D interpolated spectrogram with matplotlib?

I am trying to interpolate a spectrogram obtained from matplotlib using scipy's interp2d function, but somehow fail to get the same spectrogram. The data is available here
The actual spectrogram is:
And interpolated spectrogram is:
The code looks okay, but even then something is wrong. The code used is:
from __future__ import division
from matplotlib import ticker as mtick
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
import numpy as np
from bisect import bisect
from scipy import interpolate
from matplotlib.ticker import MaxNLocator
data = np.genfromtxt('spectrogram.dat', skiprows = 2, delimiter = ',')
pressure = data[:, 1] * 0.065
time = data[:, 0]
cax = plt.specgram(pressure * 100000, NFFT = 256, Fs = 50000, noverlap=4, cmap=plt.cm.gist_heat, zorder = 1)
f = interpolate.interp2d(cax[2], cax[1], cax[0], kind='cubic')
xnew = np.linspace(cax[2][0], cax[2][-1], 100)
ynew = np.linspace(cax[1][0], cax[1][-1], 100)
znew = 10 * np.log10(f(xnew, ynew))
fig = plt.figure(figsize=(6, 3.2))
ax = fig.add_subplot(111)
ax.set_title('colorMap')
plt.pcolormesh(xnew, ynew, znew, cmap=plt.cm.gist_heat)
# plt.colorbar()
plt.title('Interpolated spectrogram')
plt.colorbar(orientation='vertical')
plt.savefig('interp_spectrogram.pdf')
How to interpolate a spectrogram correctly with Python?

The key to your solution is in this warning, which you may or may not have seen:
RuntimeWarning: invalid value encountered in log10
znew = 10 * np.log10(f(xnew, ynew))
If your data is actually a power whose log you'd like to view explicitly as decibel power, take the log first, before fitting to the spline:
spectrum, freqs, t, im = cax
dB = 10*np.log10(spectrum)
#f = interpolate.interp2d(t, freqs, dB, kind='cubic') # docs for this recommend next line
f = interpolate.RectBivariateSpline(t, freqs, dB.T) # but this uses xy not ij, hence the .T
xnew = np.linspace(t[0], t[-1], 10*len(t))
ynew = np.linspace(freqs[0], freqs[-1], 10*len(freqs)) # was it wider spaced than freqs on purpose?
znew = f(xnew, ynew).T
Then plotting as you have:
Previous answer:
If you just want to plot on logscale, use matplotlib.colors.LogNorm
from matplotlib import colors
znew = f(xnew, ynew) # Don't take the log here
plt.figure(figsize=(6, 3.2))
plt.pcolormesh(xnew, ynew, znew, cmap=plt.cm.gist_heat, norm=colors.LogNorm())
And that looks like this:
Of course that still has gaps where its value is negative when plotted on a log scale. What your data means to you when the value is negative should dictate how you fill this in. One simple solution is to just set those values to the smallest positive value and they'd fill in as black:
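A minimal sketch of that replacement step, using a small made-up array in place of the interpolated znew:

```python
import numpy as np

# Stand-in for f(xnew, ynew); the cubic interpolation can overshoot below zero.
znew = np.array([[4.0, -1.0],
                 [0.0, 0.25]])

# Smallest strictly positive value in the array.
smallest_positive = znew[znew > 0].min()

# Replace every non-positive entry so the log scale is defined everywhere.
znew_filled = np.where(znew > 0, znew, smallest_positive)
print(znew_filled)
```

The filled array can then be passed to pcolormesh with norm=colors.LogNorm() in place of znew.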

Matplotlib How to get length line

I use Matplotlib with Python 2.7.6 and my code
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import spline
x = np.array([1,2,3])
y = np.array([1,5,15])
x_smooth = np.linspace(x.min(), x.max(), 100)
y_smooth = spline(x, y, x_smooth)
plt.plot(x_smooth, y_smooth)
plt.show()
When I run it, it shows this image:
How can I get the length of the spline?

import math
distance = 0
for count in range(1,len(y_smooth)):
    distance += math.sqrt(math.pow(x_smooth[count]-x_smooth[count-1],2) + math.pow(y_smooth[count]-y_smooth[count-1],2))
This sums the lengths of the straight segments between consecutive points. In Cartesian geometry the distance between two points
p1(x,y), p2(a,b)
is calculated as
[p1p2] = sqrt((a-x)^2 + (b-y)^2)
I'm sure there is a more elegant way to do this, probably in a one-liner
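One candidate for that one-liner is a vectorized NumPy version of the same segment sum. `polyline_length` is a hypothetical helper name, and the demo uses a straight line of known length rather than the spline from the question:

```python
import numpy as np

def polyline_length(x, y):
    """Sum of straight segment lengths between consecutive points."""
    return float(np.sum(np.hypot(np.diff(x), np.diff(y))))

# Straight line from (0, 0) to (3, 4), sampled at 100 points: length 5.
x_demo = np.linspace(0, 3, 100)
y_demo = np.linspace(0, 4, 100)
print(polyline_length(x_demo, y_demo))
```

Calling polyline_length(x_smooth, y_smooth) would give the same approximation as the loop above; using more sample points in x_smooth brings it closer to the true curve length.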

We Keep Coding
