histogram and KernelDensity in log scale - python

In plt.hist I can define logarithmic bins which show up as linearly spaced in log scale (see image).
Is there a way to do the same with the bandwidth of sklearn.neighbors.KernelDensity?
Choosing a single number for bandwidth in KernelDensity, and plotting in logscale as below, gives non-equally spaced kernel densities.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import KernelDensity
def kernel_density_histo(sig, band=.1):
    X = np.linspace(np.min(sig)*0.9, np.max(sig)*1.1, 1000)[:, np.newaxis]
    kde = KernelDensity(kernel='gaussian', bandwidth=band).fit(sig[:, np.newaxis])
    dens = np.exp(kde.score_samples(X))
    plt.figure('kernel_density_histo', clear=True)
    plt.semilogx(X,dens, label='kernel')
sig = np.random.lognormal(size=100)
kernel_density_histo(sig)
# np.logspace expects base-10 exponents, so use log10 rather than the natural log
plt.hist(sig, bins=np.logspace(np.log10(np.min(sig)), np.log10(np.max(sig)), 30), density=True, rwidth=0.8, label='hist')
plt.legend()
plt.show()
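One possible approach (a sketch, not taken from the original thread) is to estimate the density of log(sig) instead of sig. A constant bandwidth in log space then plays the same role as logarithmic bins. The code below assumes strictly positive data; the helper name log_kde and the bandwidth of 0.2 (in natural-log units) are illustrative choices. The density is mapped back to the original variable with the change-of-variables factor 1/x.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def log_kde(sig, grid, band=0.2):
    """Density of `sig` on `grid` from a Gaussian KDE fitted in log space.
    Hypothetical helper; assumes all values are > 0."""
    log_sig = np.log(sig)[:, np.newaxis]
    kde = KernelDensity(kernel='gaussian', bandwidth=band).fit(log_sig)
    log_dens = kde.score_samples(np.log(grid)[:, np.newaxis])  # log f_Y(ln x)
    return np.exp(log_dens) / grid                             # f_X(x) = f_Y(ln x) / x

rng = np.random.default_rng(0)
sig = rng.lognormal(size=100)
# log-spaced evaluation grid, extended a little beyond the data range
grid = np.logspace(np.log10(sig.min() / 3), np.log10(sig.max() * 3), 1000)
dens = log_kde(sig, grid)

# trapezoidal check that the estimate integrates to roughly 1
area = np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(grid))
```

Plotting grid against dens with plt.semilogx next to the log-binned histogram (density=True) should then show kernels of equal width on the log axis.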

Related

Rotate PSD plot in Python by 90 degrees

I have generated a Power Spectral Density (PSD) plot using the command
plt.psd(x,512,fs)
I am attempting to duplicate this plot from a paper:
I am able to get the spectrogram and the PSD graph. I however need to get the PSD rotated 90 degrees counter clockwise to show up properly. Can you assist me in rotating the PSD graph 90 degrees counterclockwise? Thanks!
Here is the code that I have so far:
import matplotlib.pyplot as plt
from matplotlib import transforms
import numpy as np
from numpy.fft import fft, rfft
from scipy.io import wavfile
from scipy import signal
import librosa
import librosa.display
from matplotlib.gridspec import GridSpec
input_file = (r'G:/File.wav')
fs, x = wavfile.read(input_file)
nperseg = 1025
noverlap = nperseg - 1
f, t, Sxx = signal.spectrogram(x, fs,
                               nperseg=nperseg,
                               noverlap=noverlap,
                               window='hann')
def format_axes(fig):
    for i, ax in enumerate(fig.axes):
        ax.tick_params(labelbottom=False, labelleft=False)
fig = plt.figure(constrained_layout=True)
gs = GridSpec(6, 5, figure=fig)
ax1 = plt.subplot(gs.new_subplotspec((0, 1), colspan=4))
ax2 = plt.subplot(gs.new_subplotspec((1, 0), rowspan=4))
plt.psd(x, 512, fs) # How to rotate this plot 90 counterclockwise?
plt.ylabel("")
plt.xlabel("")
# plt.xlim(0, t)
fig.suptitle("Sound Analysis")
format_axes(fig)
plt.show()
I would suggest outputting the values for the power spectrum and the frequencies in order to manually create the rotated plot.
For instance, let us consider a random array x consisting of 10,000 samples, sampled at Fs=1,000:
import matplotlib.pyplot as plt
import numpy as np
x=np.random.random(10000)
fs=1000
Pxx, freq = plt.psd(x, 512, fs)
This snippet returns the following image:
In order to create the rotated plot, just use plot:
plt.plot(10*np.log10(Pxx),freq)
plt.xlabel("Power Spectral Density (dB/Hz)")
plt.ylabel('Frequency')
This will return:
EDIT: please keep in mind that the function psd outputs Pxx, but what you need to plot is 10*np.log10(Pxx). As stated on the psd help page: for plotting, the power is plotted as 10log10(Pxx) for decibels, though Pxx itself is returned.
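As a variation on the approach above, the values can be computed without drawing the unrotated plot first. The sketch below assumes matplotlib.mlab.psd (the function plt.psd relies on) with the same NFFT and Fs, and selects the off-screen Agg backend so that it runs without a display. Inverting the x-axis is an added assumption about what "counterclockwise" should look like: power then grows to the left while frequency grows upward.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import mlab

rng = np.random.default_rng(0)
x = rng.random(10000)
fs = 1000

# mlab.psd returns the power spectrum and the frequencies without plotting anything
Pxx, freq = mlab.psd(x, NFFT=512, Fs=fs)

fig, ax = plt.subplots()
ax.plot(10 * np.log10(Pxx), freq)  # power (dB) on x, frequency on y
ax.invert_xaxis()                  # counterclockwise rotation: power increases to the left
ax.set_xlabel("Power Spectral Density (dB/Hz)")
ax.set_ylabel("Frequency")
```

In the GridSpec layout from the question, the same plot and invert_xaxis calls can be made on ax2 instead of a fresh axes.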

Trying to interpolate the output of a histogram function in Python

What I am trying to do is to play around with some random distribution. I don't want it to be normal. But for the time being normal is easier.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
ws=norm.rvs(4.0, 1.5, size=100)
# `normed` is no longer accepted by np.histogram; density=True alone is enough
density, bins = np.histogram(ws, 50, density=True)
unity_density = density / density.sum()
fig, ((ax1, ax2)) = plt.subplots(nrows=1, ncols=2, sharex=True, figsize=(12,6))
widths = bins[1:] - bins[:-1]  # positive bin widths
ax1.bar(bins[:-1], unity_density, width=widths, align='edge')
ax2.bar(bins[:-1], unity_density.cumsum(), width=widths, align='edge')
fig.tight_layout()
Then I can visualize the CDF as a set of points.
density1=unity_density.cumsum()
x=bins[:-1]
y=density1
plt.plot(x, density1, 'o')
So what I have been trying to do is to use the np.interp function on the output of np.histogram in order to obtain a smooth curve representing the CDF and extracting the percent points to plot them. Ideally, I need to try to do it all both manually and using ppf function from scipy.
I always struggled with statistics as an undergraduate. I am in grad school now and try to put myself through as many exercises like this as possible in order to get a deeper understanding of what is happening. I've reached a point of desperation with this task.
Thank you!
One way to get smoother results is to use more samples. With 10^5 samples and 100 bins I get the following images:
ws = norm.rvs(loc=4.0, scale=1.5, size=100000)
density, bins = np.histogram(ws, bins=100, density=True)
In general, you could use SciPy's interpolation module to smooth your CDF.
For 100 samples and a smoothing factor of s=0.01 I get:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import splev, splrep
density1 = unity_density.cumsum()
x = bins[:-1]
y = density1
# Interpolation
spl = splrep(x, y, s=0.01, per=False)
x2 = np.linspace(x[0], x[-1], 200)
y2 = splev(x2, spl)
# Plotting
fig, ax = plt.subplots()
plt.plot(x, density1, 'o')
plt.plot(x2, y2, 'r-')
The third possibility is to calculate the CDF analytically. If you generate the noise yourself with a NumPy/SciPy function, an implementation of the CDF is usually already available; otherwise you should find it on Wikipedia. If your samples come from measurements, that is of course a different story.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = np.linspace(-2, 10)
y = norm(loc=4.0, scale=1.5).cdf(x)
ax.plot(x, y, 'bo-')
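The manual route with np.interp that the question asks about can be sketched as follows. This is an illustration under assumptions, not part of the original answer: the helper names ecdf and eppf are made up, and the CDF is taken to be piecewise linear between histogram bin edges. The percent-point function is obtained by swapping the roles of x and y in the interpolation.

```python
import numpy as np

rng = np.random.default_rng(0)
ws = rng.normal(loc=4.0, scale=1.5, size=10000)

density, bins = np.histogram(ws, bins=50, density=True)
# CDF at the bin edges: 0 at the left-most edge, 1 at the right-most edge
cdf = np.concatenate([[0.0], np.cumsum(density * np.diff(bins))])

def ecdf(x):
    """Interpolated CDF (piecewise linear between bin edges)."""
    return np.interp(x, bins, cdf)

def eppf(q):
    """Interpolated percent-point function: x and y swapped.
    Empty bins create flat stretches in `cdf`, where the inverse is ambiguous."""
    return np.interp(q, cdf, bins)

quartiles = eppf([0.25, 0.5, 0.75])
```

The values in quartiles can be compared with norm(loc=4.0, scale=1.5).ppf([0.25, 0.5, 0.75]) from scipy.stats; the two should get closer as the sample size grows.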

Poor estimation of the probability density function (pdf) close to boundaries

When estimating the pdf of values that are in [0, 1] using stats.kde.gaussian_kde, then if the values are uniformly distributed, stats.kde.gaussian_kde gives very poor results close to the boundaries (close to 0 and close to 1). See the code and picture below.
Is there any way to deal with this poor estimation close to the boundaries?
import numpy as np
import random
import scipy.stats as stats
import matplotlib.pyplot as plt
X = [random.uniform(0,1) for _ in range(10000)]
linsp = np.linspace(0, 1, 1000)
nparam_density = stats.gaussian_kde(X)
nparam_density = nparam_density(linsp)
fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(X, bins=30, density=True)
ax.plot(linsp, nparam_density, 'r-', label='non-parametric density (smoothed by Gaussian kernel)')
ax.legend(loc='best')
fig.savefig('pdf.png')
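One common workaround for data on a bounded interval (not part of the original post) is reflection: mirror the samples at both boundaries, fit the KDE to the enlarged sample, and multiply the result by 3 inside [0, 1]. The sketch below assumes scipy.stats.gaussian_kde; the helper name reflected_kde is hypothetical, and the bandwidth is set so that the kernel width matches the one a plain fit would have chosen.

```python
import numpy as np
from scipy.stats import gaussian_kde

def reflected_kde(samples, grid, lower=0.0, upper=1.0):
    """Gaussian KDE on [lower, upper] with reflection at both boundaries.
    Hypothetical helper; `grid` must lie inside [lower, upper]."""
    samples = np.asarray(samples, dtype=float)
    plain = gaussian_kde(samples)
    h = plain.factor * samples.std(ddof=1)  # kernel standard deviation of the plain fit
    augmented = np.concatenate([samples, 2 * lower - samples, 2 * upper - samples])
    # a scalar bw_method is used as the factor that multiplies the data std
    kde = gaussian_kde(augmented, bw_method=h / augmented.std(ddof=1))
    return 3.0 * kde(grid)  # the enlarged sample is 3x the size of the original

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 10000)
linsp = np.linspace(0, 1, 101)
plain_density = gaussian_kde(X)(linsp)
reflected_density = reflected_kde(X, linsp)
```

For uniform data the plain estimate drops to about half the true density at 0 and 1, while the reflected estimate stays close to 1 across the interval. Reflection forces a zero slope at the boundaries, so it suits densities that are roughly flat there.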

Python - Matplotlib: normalize axis when plotting a Probability Density Function

I'm using Python and some of its extensions to compute and plot the probability density function. I manage to plot it, at least in its basic form, but I can't get the axes scaled the way I want.
import decimal
import numpy as np
import scipy.stats as stats
import pylab as pl
import matplotlib.pyplot as plt
from decimal import *
from scipy.stats import norm
lines=[]
fig, ax = plt.subplots(1, 1)
mean, var, skew, kurt = norm.stats(moments='mvsk')
#Here I delete some lines aimed to fill the list with values
Long = len(lines)
Maxim = max(lines) #MaxValue
Minim = min(lines) #MinValue
av = np.mean(lines) #Average
StDev = np.std(lines) #Standard Dev.
x = np.linspace(Minim, Maxim, Long)
ax.plot(x, norm.pdf(x, av, StDev),'r-', lw=3, alpha=0.9, label='norm pdf')
weights = np.ones_like(lines)/len(lines)
ax.hist(lines, weights = weights, normed=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()
The result is
While I would like to have it expressed
- In the x-axis centered in 0 and related to the standard deviation
- In the y-axis, related to the histogram and the %s (normalized to 1)
For the x-axis as the image below
And like this last image for the y-axis
I've managed to scale the y-axis of a histogram by plotting it on its own with weights = weights, but I can't do it here. I include it in the code, but it has no effect in this case.
Any help would be appreciated
The y-axis is normalized so that the area under the curve is one.
Adding equal weights for every data point makes no sense if you normalize anyway with normed=True (spelled density=True in current Matplotlib, where normed no longer exists).
First you need to shift your data to 0:
lines = np.asarray(lines) - np.mean(lines)
Then plot it.
This should be a working minimal example:
import numpy as np
from numpy.random import normal
import matplotlib.pyplot as plt
from scipy.stats import norm
# gaussian distributed random numbers with mu =4 and sigma=2
x = normal(4, 2, 10000)
mean = np.mean(x)
sigma = np.std(x)
x -= mean
x_plot = np.linspace(min(x), max(x), 1000)
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.hist(x, bins=50, density=True, label="data")
# x has been shifted to zero mean, so the pdf must be centered at 0 as well
ax.plot(x_plot, norm.pdf(x_plot, 0, sigma), 'r-', label="pdf")
ax.legend(loc='best')
x_ticks = np.arange(-4*sigma, 4.1*sigma, sigma)
x_labels = [r"${} \sigma$".format(i) for i in range(-4,5)]
ax.set_xticks(x_ticks)
ax.set_xticklabels(x_labels)
plt.show()
The output image is this:
You also have too many imports.
You import decimal twice, once even with *.
numpy and pyplot are already included in pylab. Also, why import the whole scipy.stats and then import just norm from it again?
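If the y-axis should instead show the fraction of samples per bin (so that the bar heights sum to 1), one option is to use the weights without any density normalization and to scale the pdf by the bin width. The following is a minimal sketch under that assumption, using np.histogram so the numbers can be checked; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(4, 2, 10000)

z = (x - x.mean()) / x.std()             # x-axis in units of sigma, centered on 0
weights = np.full(z.size, 1.0 / z.size)  # each sample contributes 1/N
heights, edges = np.histogram(z, bins=50, weights=weights)

centers = 0.5 * (edges[1:] + edges[:-1])
width = edges[1] - edges[0]
# standard normal pdf times the bin width = expected fraction of samples per bin
expected = np.exp(-0.5 * centers**2) / np.sqrt(2 * np.pi) * width
```

Calling ax.hist(z, bins=50, weights=weights) and ax.plot(centers, expected, 'r-') draws the same quantities; the x ticks then sit directly at integer multiples of sigma.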

How can I find the right gaussian curve given some data?

I have code that draws from a gaussian in 1D:
import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import gauss
# Beginning in one dimension:
mean = 0; Var = 1; N = 1000
scatter = np.random.normal(mean,np.sqrt(Var),N)
scatter = np.sort(scatter)
mu,sigma = norm.fit(scatter)
I obtain mu and sigma using norm.fit()
Now I'd like to obtain my parameters using
xdata = np.linspace(-5,5,N)
pop, pcov = curve_fit(gauss.gauss_1d,xdata,scatter)
The problem is I don't know how to map my scattered points (drawn from a 1D gaussian) to the x-line in order to use curve_fit.
Also, suppose I simply use mu and sigma as obtained earlier.
I plot using:
n, bins, patches = plt.hist(scatter,50,facecolor='green')
# mlab.normpdf has been removed from Matplotlib; scipy's norm.pdf is the equivalent
y = 2*max(n)*norm.pdf(bins,mu,sigma)
l = plt.plot(bins,y,'r--')
plt.xlabel('x-coord')
plt.ylabel('Occurrences')
plt.grid(True)
plt.show()
But I have to guess the amplitude as 2*max(n). It works but it's not robust. How can I find the amplitude without guessing?
To avoid guessing the amplitude, call hist() with density=True (formerly normed=True). The histogram then has unit area, so its amplitude matches the normal pdf (norm.pdf, formerly mlab.normpdf) directly.
For a curve fit, I suggest using the cumulative distribution rather than the density: each sample has a height of 1/N, and these heights successively sum to 1. This has the advantage that you don't need to group the samples into bins.
import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
# Beginning in one dimension:
mean = 0; Var = 1; N = 100
scatter = np.random.normal(mean,np.sqrt(Var),N)
scatter = np.sort(scatter)
mu1,sigma1 = norm.fit(scatter) # classical fit
scat_sum = np.cumsum(np.ones(scatter.shape))/N # cumulative samples
[mu2,sigma2],Cx = curve_fit(norm.cdf, scatter, scat_sum, p0=[0,1]) # curve fit
print(u"norm.fit(): µ1= {:+.4f}, σ1={:.4f}".format(mu1, sigma1))
print(u"curve_fit(): µ2= {:+.4f}, σ2={:.4f}".format(mu2, sigma2))
fg = plt.figure(1); fg.clf()
ax = fg.add_subplot(1, 1, 1)
t = np.linspace(-4,4, 1000)
ax.plot(t, norm.cdf(t, mu1, sigma1), alpha=.5, label="norm.fit()")
ax.plot(t, norm.cdf(t, mu2, sigma2), alpha=.5, label="curve_fit()")
ax.step(scatter, scat_sum, 'x-', where='post', alpha=.5, label="Samples")
ax.legend(loc="best")
ax.grid(True)
ax.set_xlabel("$x$")
ax.set_ylabel("Cumulative Probability Density")
ax.set_title("Fit to Normal Distribution")
fg.canvas.draw()
plt.show()
prints
norm.fit(): µ1= +0.1534, σ1=1.0203
curve_fit(): µ2= +0.1135, σ2=1.0444
and plots
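If the histogram should keep raw occurrences on the y-axis, the amplitude follows from the sample size and the bin width, since the expected count in a bin is roughly N times the bin width times the pdf. A short sketch of that arithmetic, assuming equal-width bins (variable names are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 1000
scatter = rng.normal(0.0, 1.0, N)
mu, sigma = norm.fit(scatter)

counts, edges = np.histogram(scatter, bins=50)  # raw occurrences, as plt.hist returns them
centers = 0.5 * (edges[1:] + edges[:-1])
width = edges[1] - edges[0]

# expected occurrences per bin = N * bin width * pdf, so no amplitude guess is needed
expected = N * width * norm.pdf(centers, mu, sigma)
```

Plotting centers against expected over plt.hist(scatter, 50) replaces the hand-tuned factor 2*max(n) from the question.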
