Nonnegative matrix factorization is lauded for generating sparse basis sets. However, when I run sklearn.decomposition.NMF the factors are not sparse. Older versions of NMF had a 'degree of sparseness' parameter beta. Newer versions do not, but I want my basis matrix W to actually be sparse. What can I do? (Code to reproduce problem is below).
I have toyed around with increasing various regularization parameters (e.g., alpha), but am not getting anything very sparse (like in the Lee and Seung (1999) paper) when I apply it to the Olivetti faces dataset. The components still basically end up looking like eigenfaces.
My NMF output (not very sparse):
Lee and Seung NMF paper output basis columns (looks sparse to me):
Code to reproduce my problem:
from sklearn.datasets import fetch_olivetti_faces
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import NMF

faces, _ = fetch_olivetti_faces(return_X_y=True)

# run nmf on the faces data set
num_nmf_components = 50
estimator = NMF(num_nmf_components,
                init='nndsvd',
                tol=5e-3,
                max_iter=1000,
                alpha_W=0.01,
                l1_ratio=0)
H = estimator.fit_transform(faces)
W = estimator.components_

# plot the basis faces
n_row, n_col = 6, 4  # how many faces to plot
image_shape = (64, 64)
n_samples, n_features = faces.shape
plt.figure(figsize=(10, 12))
for face_id, face in enumerate(W[:n_row*n_col]):
    plt.subplot(n_row, n_col, face_id + 1)
    plt.imshow(face.reshape(image_shape), cmap='gray')
    plt.axis('off')
plt.tight_layout()
Is there some combination of parameters with sklearn.decomposition.NMF() that lets you dial in sparseness? I have played with different combinations of alpha_W and l1_ratio, and even tweaked the number of components, but I still end up with eigenface-looking things.
There are a couple of things going on here that we need to disentangle. First, what happened to sparseness? Second, how do you generate sparse faces using the sklearn function?
Where did the sparseness go?
The sklearn.decomposition.NMF function went through a major change from versions 0.16 to 0.19. There are multiple ways to implement nonnegative matrix factorization.
Before 0.16, NMF used projected gradient descent as described in Hoyer 2004, and included a sparseness parameter (which as OP noted let you adjust the sparseness of the resulting W basis).
Because of various limitations outlined in this extremely thorough issue at sklearn's github repo, it was decided to move on to two additional methods:
Release 0.16: coordinate descent (PR here which was in version 0.16)
Release 0.19: multiplicative update (PR here which was in version 0.19)
This was a pretty major undertaking, and the upshot is we now have a great deal more freedom in terms of error functions, initialization, and regularization. You can read about that at the issue. The objective function is now:
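For reference, here is a schematic version of the regularized objective from the docs (the exact version-dependent scaling factors are omitted), writing $\rho$ for l1_ratio:
$$L(W, H) = \tfrac{1}{2}\|X - WH\|_{F}^{2} + \alpha_W \big(\rho \|W\|_{1} + \tfrac{1}{2}(1-\rho)\|W\|_{F}^{2}\big) + \alpha_H \big(\rho \|H\|_{1} + \tfrac{1}{2}(1-\rho)\|H\|_{F}^{2}\big)$$
where $\|\cdot\|_1$ is applied element-wise and $\|\cdot\|_F$ is the Frobenius norm.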
You can read more details/explanation at the docs, but to note a few things relevant to the question:
The solver param takes 'mu' for multiplicative update or 'cd' for coordinate descent. The older projected gradient descent method (with the sparseness parameter) is deprecated.
As you can see in the objective function, there are weights for regularizing W and H (alpha_W and alpha_H, respectively). In theory, if you want to rein in W, you should increase alpha_W.
You can regularize using the L1 or L2 norm, and the ratio between the two is set by l1_ratio. The larger you make l1_ratio, the more you weight the L1 norm over the L2 norm. Note: the L1 norm tends to produce sparser parameter sets, while the L2 norm tends to produce sets of small parameter values, so in theory, if you want sparseness, set your l1_ratio high.
How to generate sparse faces?
Examining the objective function suggests what to do: crank up alpha_W and l1_ratio. But also note that the Lee and Seung paper used multiplicative update (mu), so if you want to reproduce their results, I would recommend setting solver='mu', setting alpha_W and l1_ratio high, and seeing what happens.
In the OP's question, they implicitly used the cd solver (which is the default), and set alpha_W=0.01 and l1_ratio=0, which I wouldn't necessarily expect to create a sparse basis set.
But things are actually not that simple. I tried some initial runs of coordinate descent with high l1_ratio and alpha_W and found very low sparseness. So to quantify some of this, I did a grid search, and used a sparseness measure.
Quantifying sparseness is itself a cottage industry (e.g., see this post, and the paper cited there). I used Hoyer's measure of sparsity, adapted from the one used in the nimfa package:
def sparseness_hoyer(x):
    """
    The sparseness of array x is a real number in [0, 1], where sparser array
    has value closer to 1. Sparseness is 1 iff the vector contains a single
    nonzero component and is equal to 0 iff all components of the vector are
    the same

    modified from Hoyer 2004: [sqrt(n)-L1/L2]/[sqrt(n)-1]
    adapted from nimfa package: https://nimfa.biolab.si/
    """
    from math import sqrt  # faster than numpy sqrt
    eps = np.finfo(x.dtype).eps if 'int' not in str(x.dtype) else 1e-9
    n = x.size
    # measure is meant for nmf: things get weird for negative values
    if np.min(x) < 0:
        x -= np.min(x)
    # patch for array of zeros
    if np.allclose(x, np.zeros(x.shape), atol=1e-6):
        return 0.0
    L1 = abs(x).sum()
    L2 = sqrt(np.multiply(x, x).sum())
    sparseness_num = sqrt(n) - (L1 + eps) / (L2 + eps)
    sparseness_den = sqrt(n) - 1
    return sparseness_num / sparseness_den
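As a quick sanity check of the measure (a toy example, not from the original post), a constant vector should score about 0 and a one-hot vector should score 1:
x_flat = np.ones(100)
x_peaked = np.zeros(100)
x_peaked[0] = 1.0
print(sparseness_hoyer(x_flat))    # ~0.0
print(sparseness_hoyer(x_peaked))  # 1.0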
What this measure actually quantifies is sort of complicated, but roughly: a sparse image is one with only a few pixels active, while a non-sparse image has lots of pixels active. If we run PCA on the faces example from the OP, we can see the sparseness values are low, around 0.04, for the eigenfaces:
Sparsifying using coordinate descent?
If we run NMF with the params used in the OP (coordinate descent, with low alpha_W and l1_ratio, except with 200 components), the sparseness values are again low:
If you look at the histogram of sparseness values this is verified:
Different, but not super impressive, compared with PCA.
I next did a grid search through alpha_W and l1_ratio space, varying them between 0 and 1 (at 0.1 step increments). I found that sparsity was not maximized when they were 1. Surprisingly, and contrary to theoretical expectations, I found that sparsity was only high when l1_ratio was 0, and it dropped off precipitously above 0. Within this slice of parameters, sparsity was maximized when alpha_W was 0.9:
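For concreteness, the search was along these lines (a rough sketch rather than the exact code I ran; it reuses faces and the sparseness_hoyer function from above, and is slow because it refits NMF 121 times):
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import NMF

faces, _ = fetch_olivetti_faces(return_X_y=True)
mean_sparseness = {}
for alpha_W in np.arange(0, 1.01, 0.1):
    for l1_ratio in np.arange(0, 1.01, 0.1):
        estimator = NMF(n_components=200, init='nndsvd', solver='cd',
                        max_iter=200, alpha_W=alpha_W, l1_ratio=l1_ratio)
        estimator.fit(faces)
        W = estimator.components_
        mean_sparseness[(alpha_W, l1_ratio)] = np.mean([sparseness_hoyer(w) for w in W])
print(max(mean_sparseness, key=mean_sparseness.get))  # best (alpha_W, l1_ratio) pair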
Intuitively, this is a huge improvement. There is still a lot of variation in the distribution of sparseness values, but they are much higher:
However, maybe in order to replicate the Lee and Seung results, and better control sparseness, we should be using multiplicative update (which is what they used). Let's try that next.
Sparsifying using multiplicative update
For the next attempt, I used multiplicative update, and this behaved much more as expected, with sparse, parts-based representations emerging:
You can see the drastic difference, and this is reflected in the histogram of sparseness values:
Note the code to generate this is below.
One final interesting thing to note: the sparseness values with this method seem to increase with the component number. I plotted sparseness as a function of component, and this is (roughly) borne out, and was borne out consistently over all my runs of the algorithm:
I have not seen this discussed elsewhere, so thought I'd mention it.
Code to generate sparse representation of faces using the mu NMF algorithm:
from sklearn.datasets import fetch_olivetti_faces
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import NMF

faces, _ = fetch_olivetti_faces(return_X_y=True)

num_nmf_components = 200
alph_W = 0.9    # cd: .9, mu: .9
L1_ratio = 0.9  # cd: 0, mu: 0.9

try:
    del estimator
except NameError:
    print("first run")

estimator = NMF(num_nmf_components,
                init='nndsvdar',  # nndsvd
                solver='mu',
                max_iter=50,
                alpha_W=alph_W,
                alpha_H=0,
                l1_ratio=L1_ratio,
                shuffle=True)
H = estimator.fit_transform(faces)
W = estimator.components_

# plot the basis faces
n_row, n_col = 5, 7  # how many faces to plot
image_shape = (64, 64)
n_samples, n_features = faces.shape
plt.figure(figsize=(10, 12))
for face_id, face in enumerate(W[:n_row*n_col]):
    plt.subplot(n_row, n_col, face_id + 1)
    face_sparseness = sparseness_hoyer(face)
    plt.imshow(face.reshape(image_shape), cmap='gray')
    plt.title(f"{face_sparseness: 0.2f}")
    plt.axis('off')
plt.suptitle('NMF', fontsize=16, y=1)
plt.tight_layout()
I'm confused by sklearn's PCA (here is the documentation) and its relation to Singular Value Decomposition (SVD).
In Wikipedia we have,
The full principal components decomposition of X can therefore be given as $T = XW$,
where $W$ is a p-by-p matrix of weights whose columns are the eigenvectors of $X^T X$. The transpose of $W$ is sometimes called the whitening or sphering transformation.
Later once it explains the relationship with SVD, we have:
$X = U \Sigma W^T$
So I assume that the matrix W embeds samples into latent space (which makes sense given the dimensions of the matrices), and that using the transform method of sklearn's PCA class should give the same result as multiplying the observation matrix by W. However, I checked them and they don't match.
Is there anything wrong that I'm missing or there's a bug in the code?
import numpy as np
from sklearn.decomposition import PCA

x = np.random.rand(200).reshape(20, 10)
x = x - x.mean(axis=0)
u, s, vh = np.linalg.svd(x, full_matrices=False)

pca = PCA().fit(x)

# transformed version based on WIKI: t = x @ vh.T = u @ np.diag(s)
t_svd1 = x @ vh.T
t_svd2 = u @ np.diag(s)

# the pca transform
t_pca = pca.transform(x)

print(np.abs(t_svd1 - t_pca).max())  # should be a small value, but it's not :(
print(np.abs(t_svd2 - t_pca).max())  # should be a small value, but it's not :(
There is a difference between the theoretical Wikipedia description and the practical sklearn implementation, but it is not a bug, merely a stability and reproducibility enhancement.
You have pretty much nailed the exact implementation of PCA; however, in order to fully reproduce the computation, the sklearn developers added one more enforcement to their implementation. The problem stems from the non-deterministic nature of SVD, i.e. the SVD does not have a unique solution. This can easily be seen from your equation as well: setting $U_s = -U$ and $W_s = -W$, then $U_s$ and $W_s$ also satisfy:
$X = U_s \Sigma W_s^T$
More importantly, this also holds when switching the signs of individual columns of U and W: if we reverse the signs of the k-th column of both U and W, the equality still holds. You can read more about this issue, for example, here: https://prod-ng.sandia.gov/techlib-noauth/access-control.cgi/2007/076422.pdf.
The PCA implementation deals with this problem by enforcing that the highest-absolute-value loadings are always positive; specifically, the function sklearn.utils.extmath.svd_flip is used. This way, no matter which signs the resulting vectors get from the non-deterministic np.linalg.svd, the loading values in absolute terms remain the same, i.e. the signs of the matrices become deterministic.
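For illustration, the same sign convention can be applied by calling svd_flip directly (a minimal sketch; depending on the sklearn version the default sign convention may differ slightly, but with defaults it should match the manual sign flipping in the snippet below):
import numpy as np
from sklearn.utils.extmath import svd_flip

np.random.seed(41)
x = np.random.rand(200).reshape(20, 10)
x = x - x.mean(axis=0)
u, s, vh = np.linalg.svd(x, full_matrices=False)
u, vh = svd_flip(u, vh)  # deterministic signs, mirroring what PCA does internally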
Thus in order for your code to have the same result as the PCA implementation:
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(41)
x = np.random.rand(200).reshape(20, 10)
x = x - x.mean(axis=0)
u, s, vh = np.linalg.svd(x, full_matrices=False)

max_abs_cols = np.argmax(np.abs(u), axis=0)
signs = np.sign(u[max_abs_cols, range(u.shape[1])])
u *= signs
vh *= signs.reshape(-1, 1)

pca = PCA().fit(x)

# transformed version based on WIKI: t = x @ vh.T = u @ np.diag(s)
t_svd1 = x @ vh.T
t_svd2 = u @ np.diag(s)

# the pca transform
t_pca = pca.transform(x)

print(np.abs(t_svd1 - t_pca).max())  # pretty small value :)
print(np.abs(t_svd2 - t_pca).max())  # pretty small value :)
I have some 2D data (GPS data) with clusters (stop locations) that I know resemble Gaussians with a characteristic standard deviation (proportional to the inherent noise of GPS samples). The figure below visualizes a sample that I expect has two such clusters. The image is 25 meters wide and 13 meters tall.
The sklearn module has a function sklearn.mixture.GaussianMixture which allows you to fit a mixture of Gaussians to data. The function has a parameter, covariance_type, that enables you to assume different things about the shape of the Gaussians. You can, for example, assume them to be uniform using the 'tied' argument.
However, it does not appear directly possible to require that the covariance matrices remain constant. From the sklearn source code it seems trivial to make a modification that enables this, but it feels a bit excessive to make a pull request with such an update (also, I don't want to accidentally add bugs to sklearn). Is there a better way to fit a mixture to data where the covariance matrix of each Gaussian is fixed?
I want to assume that the SD should remain constant at around 3 meters for each component, since that is roughly the noise level of my GPS samples.
It is simple enough to write your own implementation of the EM algorithm. It would also give you a good intuition of the process. I assume that the covariance is known and that the prior probabilities of the components are equal, and fit only the means.
The class would look like this (in Python 3):
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal


class FixedCovMixture:
    """ The model to estimate gaussian mixture with fixed covariance matrix. """
    def __init__(self, n_components, cov, max_iter=100, random_state=None, tol=1e-10):
        self.n_components = n_components
        self.cov = cov
        self.random_state = random_state
        self.max_iter = max_iter
        self.tol = tol

    def fit(self, X):
        # initialize the process:
        np.random.seed(self.random_state)
        n_obs, n_features = X.shape
        self.mean_ = X[np.random.choice(n_obs, size=self.n_components)]
        # make EM loop until convergence
        i = 0
        for i in range(self.max_iter):
            new_centers = self.updated_centers(X)
            if np.sum(np.abs(new_centers - self.mean_)) < self.tol:
                break
            else:
                self.mean_ = new_centers
        self.n_iter_ = i

    def updated_centers(self, X):
        """ A single iteration """
        # E-step: estimate probability of each cluster given cluster centers
        cluster_posterior = self.predict_proba(X)
        # M-step: update cluster centers as weighted average of observations
        weights = (cluster_posterior.T / cluster_posterior.sum(axis=1)).T
        new_centers = np.dot(weights, X)
        return new_centers

    def predict_proba(self, X):
        likelihood = np.stack([multivariate_normal.pdf(X, mean=center, cov=self.cov)
                               for center in self.mean_])
        cluster_posterior = (likelihood / likelihood.sum(axis=0))
        return cluster_posterior

    def predict(self, X):
        return np.argmax(self.predict_proba(X), axis=0)
On data like yours, the model converges quickly:
np.random.seed(1)
X = np.random.normal(size=(100,2), scale=3)
X[50:] += (10, 5)
model = FixedCovMixture(2, cov=[[3,0],[0,3]], random_state=1)
model.fit(X)
print(model.n_iter_, 'iterations')
print(model.mean_)
plt.scatter(X[:,0], X[:,1], s=10, c=model.predict(X))
plt.scatter(model.mean_[:,0], model.mean_[:,1], s=100, c='k')
plt.axis('equal')
plt.show();
and output
11 iterations
[[9.92301067 4.62282807]
[0.09413883 0.03527411]]
You can see that the estimated centers ((9.9, 4.6) and (0.09, 0.03)) are close to the true centers ((10, 5) and (0, 0)).
I think the best option would be to "roll your own" GMM model by defining a new scikit-learn class that inherits from GaussianMixture and overrides the methods to get the behavior you want. This way you have your own implementation and you don't have to change the scikit-learn code (or create a pull request).
Another option that might work is to look at the Bayesian version of GMM in scikit-learn. You might be able to set the prior for the covariance matrix so that the covariance is fixed. It seems to use the Wishart distribution as a prior for the covariance. However I'm not familiar enough with this distribution to help you out more.
First, you can use the 'spherical' option, which gives you a single variance value for each component. This way you can check yourself: if the estimated variances are too different from one another, then something went wrong.
In case you want to preset the variance, your problem degenerates to finding only the best centers for your components. You can do that with k-means, for example. If you don't know the number of components, you can sweep over all reasonable values (like 1 to 20) and evaluate the decrease in fitting error. Or you can optimize your own EM function to find the centers and the number of components simultaneously.
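A rough sketch of both suggestions (hedged; it uses made-up data in the same spirit as the other answer's example, and assumes k=2 is known):
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

np.random.seed(1)
X = np.random.normal(size=(100, 2), scale=3)
X[50:] += (10, 5)

# 1) sanity check with 'spherical': one variance per component
gm = GaussianMixture(n_components=2, covariance_type='spherical').fit(X)
print(gm.covariances_)  # the per-component variances should be roughly similar

# 2) with the variance preset, only the centers need estimating; k-means does that directly
centers = KMeans(n_clusters=2, n_init=10).fit(X).cluster_centers_
print(centers)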
I have to create 13 white Gaussian noise signals which are completely decorrelated from each other.
I've been told that PCA can achieve this, so I searched for information and tools I can use in Python.
I use the PCA module from sklearn to perform PCA. The following is my code.
import numpy as np
from sklearn.decomposition import PCA

n = 13      # number of completely decorrelated noises
ms = 10000  # duration of noise in milliseconds
fs = 44100  # sampling rate

x = np.random.randn(int(np.ceil(fs*ms/1000)), n)

# calculate the correlation between any two noises
for i in range(n):
    for j in range(n):
        omega = np.corrcoef(x[:, i], x[:, j])[0, 1]
        print(omega)

# perform PCA
pca = PCA(n_components=n)
pca.fit(x)
y = pca.transform(x)

for i in range(n):
    for j in range(n):
        omega_new = np.corrcoef(y[:, i], y[:, j])[0, 1]
        print(omega_new)
The correlation coefficients before PCA are around 0.0005~0.0014, and are reduced to about 1e-16 after performing PCA.
I don't know about PCA very well, so I'm not sure whether I did it right.
In addition, after performing the PCA transformation, are those new data sets still Gaussian white noises? I will normalize each noise so that its maximum amplitude is 0.999 before writing them into wave files. Do I still get 13 Gaussian white noises with similar average power?
I might be attacking a strawman, but here's an attack on a much reduced problem: if I average two Gaussian noises, do I get a Gaussian noise?
If we isolate the new noise, it is undoubtedly Gaussian. If we assume precise calculations (no floating point error), I believe there is no way the new noise could be distinguished from a freshly generated noise.
However, if we look at it in relation to one or both of the noises we averaged, it becomes obvious that it's their average.
I'm not sure exactly how PCA works, but the transformation also seems to be linear in nature.
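As a rough check of that linearity point (a sketch, not a proof): the columns of the PCA output are linear combinations of jointly Gaussian inputs, so they should themselves still look Gaussian, for example under a normality test:
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

x = np.random.randn(441000, 13)            # 13 white Gaussian noises, as in the question
y = PCA(n_components=13).fit_transform(x)
print(stats.normaltest(y, axis=0).pvalue)  # p-values should not be systematically small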
TBH, I don't know enough about PCA to comment on your situation, but I'm hoping that further edits would help extend this answer to fit your question.
I've been working on getting a hierarchical model of some psychophysical behavioral data up and running in pymc3. I'm incredibly impressed with things overall, but after trying to get up to speed with Theano and pymc3, I have a model that mostly works but has a couple of problems.
The code is built to fit a parameterized version of a Weibull to seven sets of data. Each trial is modeled as a binary Bernoulli outcome, while the thresholds (the output, thact) serve as the y values used to fit a Gaussian function for height, width, and elevation (a, c, and d of a typical Gaussian).
Using the parameterized Weibull seems to be working nicely, and is now hierarchical for the slope of the Weibull while the thresholds are fit separately for each chunk of data. However - the output I'm getting from k and y_est leads me to believe they may not be the correct size, and unlike the probability distributions, it doesn't look like I can specify shape (unless there's a theano way to do this that I haven't found - though from what I've read specifying shape in theano is tricky).
Ultimately, I'd like to use y_est to estimate the gaussian height or width, however the output right now results in an incredible mess that I think originates with size problems in y_est and k. Any help would be fantastic - the code below should simulate some data and is followed by the model. The model does a nice job fitting each individual threshold and getting the slopes, but falls apart when dealing with the rest.
Thanks for having a look - I'm super impressed with pymc3 so far!
EDIT: Okay, so the shape output by y_est.tag.test_value.shape looks like this
y_est.tag.test_value.shape
(101, 7)
k.tag.test_value.shape
(7,)
I think this is where I'm running into trouble, though it may just be poorly constructed on my part. k has the right shape (one k value per unique_xval). y_est is outputting an entire set of data (101x7) instead of a single estimate (one y_est per unique_xval) for each difficulty level. Is there some way to specify that y_est gets specific subsets of df_y_vals to control this?
#Import necessary modules and define our weibull function
import numpy as np
import pylab as pl
from scipy.stats import bernoulli
#x stimulus intensity
#g chance (0.5 for 2AFC)
# m slope
# t threshold
# a performance level defining threshold
def weib(x, g, a, m, t):
    k = -np.log(((1-a)/(1-g))**(1/t))
    return 1 - (1-g)*np.exp(-(k*x/t)**m)
#Output values from weibull function
xit=101
xvals=np.linspace(0.05,1,xit)
out_weib=weib(xvals, 0.5, 0.8, 3, 0.6)
#Okay, fitting the perfect output of a Weibull should be easy, contaminate with some noise
#Slope of 3, threshold of 0.6
#How about 5% noise!
noise=0.05*np.random.randn(np.size(out_weib))
out=out_weib+noise
#Let's make this more like a typical experiment -
#i.e. no percent correct, just one or zero
#Randomly pick based on the probability at each point whether they got the trial right or wrong
trial=np.zeros_like(out)
for i in np.arange(out.size):
    p = out_weib[i]
    trial[i] = bernoulli.rvs(p)
#Iterate for 6 sets of data, similar slope (from a normal dist), different thresh (output from gaussian)
#Gauss parameters=
true_gauss_height = 0.3
true_gauss_width = 0.01
true_gauss_elevation = 0.2
#What thresholds will we get then? 6 discrete points along that gaussian, from 0 to 180 degree mask
x_points=[0, 30, 60, 90, 120, 150, 180]
x_points=np.asarray(x_points)
gauss_points=true_gauss_height*np.exp(- ((x_points**2)/2*true_gauss_width**2))+true_gauss_elevation
import pymc as pm2
import pymc3 as pm
import pandas as pd
slopes=pm2.rnormal(3, 3, size=7)
out_weib=np.zeros([xvals.size,x_points.size])
for i in np.arange(x_points.size):
    out_weib[:, i] = weib(xvals, 0.5, 0.8, slopes[i], gauss_points[i])
#Let's make this more like a typical experiment - i.e. no percent correct, just one or zero
#Randomly pick based on the probability at each point whether they got the trial right or wrong
trials=np.zeros_like(out_weib)
for i in np.arange(len(trials)):
    for ii in np.arange(gauss_points.size):
        p = out_weib[i, ii]
        trials[i, ii] = bernoulli.rvs(p)
#Let's make that data into a DataFrame for pymc3
y_vals=np.tile(xvals, [7, 1])
df_correct = pd.DataFrame(trials, columns=x_points)
df_y_vals = pd.DataFrame(y_vals.T, columns=x_points)
unique_xvals=x_points
import theano as th
with pm.Model() as hierarchical_model:
    # Hyperpriors for group node
    mu_slope = pm.Normal('mu_slope', mu=3, sd=1)
    sigma_slope = pm.Uniform('sigma_slope', lower=0.1, upper=2)

    #Priors for the overall gaussian function - 3 params, the height of the gaussian
    #Width, and elevation
    gauss_width = pm.HalfNormal('gauss_width', sd=1)
    gauss_elevation = pm.HalfNormal('gauss_elevation', sd=1)

    slope = pm.Normal('slope', mu=mu_slope, sd=sigma_slope, shape=unique_xvals.size)
    thresh = pm.Uniform('thresh', upper=1, lower=0.1, shape=unique_xvals.size)

    k = -th.tensor.log(((1-0.8)/(1-0.5))**(1/thresh))
    y_est = 1-(1-0.5)*th.tensor.exp(-(k*df_y_vals/thresh)**slope)

    #We want our model to predict either height or width...height would be easier.
    #Our Gaussian function has y values estimated by y_est as the 82% thresholds
    #and Xvals based on where each of those psychometrics were taken.
    #height_est=pm.Deterministic('height_est', (y_est/(th.tensor.exp((-unique_xvals**2)/2*gauss_width)))+gauss_elevation)
    height_est = pm.Deterministic('height_est', (y_est-gauss_elevation)*th.tensor.exp((unique_xvals**2)/2*gauss_width**2))

    #Define likelihood as Bernoulli for each binary trial
    likelihood = pm.Bernoulli('likelihood', p=y_est, shape=unique_xvals.size, observed=df_correct)

    #Find start
    start = pm.find_MAP()
    step = pm.NUTS(state=start)

    #Do MCMC
    trace = pm.sample(5000, step, njobs=1, progressbar=True)  # draw 5000 posterior samples using NUTS sampling
I'm not sure exactly what you want to do when you say "Is there some way to specify that y_est gets specific subsets of df_y_vals to control this". Can you describe, for each y_est value, which values of df_y_vals you are supposed to use? What's the shape of df_y_vals? What's the shape of y_est supposed to be? (7,)?
I suspect what you want is to index into df_y_vals using numpy advanced indexing, which works the same in PyMC as in numpy. It's hard to say exactly without more information.
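For example, here is the kind of pattern I mean (a hypothetical sketch; idx is made up here to map each of the 101x7 observations to one of the 7 thresholds, and the names are illustrative only):
import numpy as np
import pymc3 as pm

idx = np.repeat(np.arange(7), 101)  # one entry per observation, values 0..6
with pm.Model():
    thresh = pm.Uniform('thresh', lower=0.1, upper=1, shape=7)
    thresh_per_obs = thresh[idx]    # numpy-style advanced indexing on a PyMC3 variable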
I need to perform a convolution using a Gaussian; however, the width of the Gaussian needs to change. I'm not doing traditional signal processing, but instead I need to take my perfect Probability Density Function (PDF) and "smear" it, based on the resolution of my equipment.
For instance, suppose my PDF starts out as a spike/delta-function. I'll model this as a very narrow Gaussian. After being run through my equipment, it will be smeared out according to some Gaussian resolution. I can calculate this using the scipy.signal convolution functions.
import numpy as np
import matplotlib.pylab as plt
import scipy.signal as signal
import scipy.stats as stats
# Create the initial function. I model a spike
# as an arbitrarily narrow Gaussian
mu = 1.0 # Centroid
sig=0.001 # Width
original_pdf = stats.norm(mu,sig)
x = np.linspace(0.0,2.0,1000)
y = original_pdf.pdf(x)
plt.plot(x,y,label='original')
# Create the ``smearing" function to convolve with the
# original function.
# I use a Gaussian, centered at 0.0 (no bias) and
# width of 0.5
mu_conv = 0.0 # Centroid
sigma_conv = 0.5 # Width
convolving_term = stats.norm(mu_conv,sigma_conv)
xconv = np.linspace(-5,5,1000)
yconv = convolving_term.pdf(xconv)
convolved_pdf = signal.convolve(y/y.sum(),yconv,mode='same')
plt.plot(x,convolved_pdf,label='convolved')
plt.ylim(0,1.2*max(convolved_pdf))
plt.legend()
plt.show()
This all works no problem. But now suppose my original PDF is not a spike, but some broader function. For example, a Gaussian with sigma=1.0. And now suppose my resolution actually varies over x: at x=0.5, the smearing function is a Gaussian with sigma_conv=0.5, but at x=1.5, the smearing function is a Gaussian with sigma_conv=1.5. And suppose I know the functional form of the x-dependence of my smearing Gaussian. Naively, I thought I would change the line above to
convolving_term = stats.norm(mu_conv,lambda x: 0.2*x + 0.1)
But that doesn't work, because the norm function expects a value for the width, not a function. In some sense, I need my convolving function to be a 2D array, where I have a different smearing Gaussian for each point in my original PDF, which remains a 1D array.
So is there a way to do this with functions already defined in Python? I have some code to do this that I wrote myself....but I want to make sure I've not just re-invented the wheel.
Thanks in advance!
Matt
Question, in brief:
How do you convolve with a non-stationary kernel, for example a Gaussian that changes width at different locations in the data, and does Python have an existing tool for this?
Answer, sort-of:
It's difficult to prove a negative, but I do not think that a function to perform a convolution with a non-stationary kernel exists in scipy or numpy. Anyway, as you describe it, it can't really be vectorized well, so you may as well do a loop or write some custom C code.
One trick that might work for you: instead of changing the kernel size with position, stretch the data by the inverse scale (i.e., at places where you'd want the Gaussian width to be 0.5x the base width, stretch the data to 2x). This way, you can do a single warping operation on the data, a standard convolution with a fixed-width Gaussian, and then unwarp the data back to the original scale.
The advantages of this approach are that it's very easy to write, and is completely vectorized, and therefore probably fairly fast to run.
Warping the data (using, say, an interpolation method) will cause some loss of accuracy, but if you choose things so that the data is always expanded and not reduced in your initial warping operation, the losses should be minimal.
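Here is a rough sketch of that warp / fixed-width convolve / unwarp idea (hedged: it assumes the sigma(x) = 0.2*x + 0.1 form from the question, uses an arbitrary starting PDF, and ignores normalization/Jacobian effects and edge behavior):
import numpy as np
from scipy.ndimage import gaussian_filter1d

x = np.linspace(0.0, 2.0, 1000)
y = np.exp(-0.5*((x - 1.0)/0.2)**2)   # some starting PDF on the x grid
sigma_x = 0.2*x + 0.1                 # desired local kernel width
sigma_base = sigma_x.min()            # the single width we will actually convolve with

# warp: build a coordinate u(x) with du/dx = sigma_base/sigma(x), so a Gaussian of
# width sigma_base in u corresponds (locally) to a width of sigma(x) back in x
dx = x[1] - x[0]
u = np.cumsum(sigma_base/sigma_x)*dx
u_grid = np.linspace(u[0], u[-1], x.size)
y_warped = np.interp(u_grid, u, y)

# fixed-width convolution in the warped coordinate (sigma given in samples)
du = u_grid[1] - u_grid[0]
y_smeared_warped = gaussian_filter1d(y_warped, sigma=sigma_base/du)

# unwarp back onto the original x grid
y_smeared = np.interp(u, u_grid, y_smeared_warped)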