I have a mixture of three Gaussians and would like to compute the gradient of the log-density using PyTorch or TensorFlow. How can I do that?
from numpy import eye, log
from scipy.stats import multivariate_normal as MVN
μs = [[0, 0], [2, 0], [0, 2]] # Means
Σs = [eye(2), eye(2), eye(2)] # Covariance Matrices
cs = [1 / 3] * 3 # Mixture coefficients
MVNs = [MVN(μ, Σ) for (μ, Σ) in zip(μs, Σs)] # List of Gaussians
log_density = lambda x: log((sum([c * MVN.pdf(x) for (c, MVN) in zip(cs, MVNs)])))
Essentially I would like to compute the gradient of log_density. I tried using autograd.grad but it fails because of the array assignment.
Attempted PyTorch Solution
from torch import tensor, eye, sqrt, zeros, log, exp
from torch.distributions import MultivariateNormal as MVN
μs = [tensor([0, 0]), tensor([2, 0]), tensor([0, 2])] # Means
Σs = [eye(2), eye(2), eye(2)] # Covariance Matrices
cs = [1 / 3] * 3 # Mixture coefficients
MVNs = [MVN(μ, Σ) for (μ, Σ) in zip(μs, Σs)] # List of Gaussians
log_density = lambda x: log((sum([c * exp(MVN.log_prob(x)) for (c, MVN) in zip(cs, MVNs)])))
Attempted Autograd Solution (won't work)
from numpy import eye, log, zeros
from scipy.stats import multivariate_normal as MVN
from autograd import grad
μs = [[0, 0], [2, 0], [0, 2]] # Means
Σs = [eye(2), eye(2), eye(2)] # Covariance Matrices
cs = [1 / 3] * 3 # Mixture coefficients
MVNs = [MVN(μ, Σ) for (μ, Σ) in zip(μs, Σs)] # List of Gaussians
log_density = lambda x: log((sum([c * MVN.pdf(x) for (c, MVN) in zip(cs, MVNs)])))
gradient = grad(log_density)
# If you try using this gradient function you get an error
gradient(zeros(2))
The error I get is
ValueError: setting an array element with a sequence.
Naive Autograd Solution
There is, of course, a bad Autograd solution that won't scale well. For instance
from autograd.numpy import log, eye, zeros, array
from autograd.scipy.stats import multivariate_normal as MVN
from autograd import grad
μs = [[0, 0], [2, 0], [0, 2]] # Means
Σs = [eye(2), eye(2), eye(2)] # Covariance Matrices
cs = [1 / 3] * 3 # Mixture coefficients
def log_density(x):
    return log((1/3) * MVN.pdf(x, zeros(2), eye(2)) + (1/3) * MVN.pdf(x, array([2, 0]), eye(2)) + (1/3) * MVN.pdf(x, array([0, 2]), eye(2)))
grad(log_density)(zeros(2)) # Works!
You can do
from torch import tensor, eye, sqrt, zeros, log, exp
from torch.distributions import MultivariateNormal as MVN
μs = [tensor([0, 0]), tensor([2, 0]), tensor([0, 2])] # Means
Σs = [eye(2), eye(2), eye(2)] # Covariance Matrices
cs = [1 / 3] * 3 # Mixture coefficients
MVNs = [MVN(μ, Σ) for (μ, Σ) in zip(μs, Σs)] # List of Gaussians
x = tensor((0.0,0.0), requires_grad=True)
log_density = log((sum([c * exp(MVN.log_prob(x)) for (c, MVN) in zip(cs, MVNs)])))
log_density.backward()
print(x.grad)
which will print the gradient at (0.0, 0.0). However, since PyTorch builds the computation graph dynamically, I could not find an easy way to calculate the gradient at another point without rebuilding the computation graph (a minimal sketch of that rebuild-per-call pattern is below). You could also try TensorFlow, which gives you more control over the computation graph and allows you to construct a graph for the gradient computation.
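For completeness, a minimal sketch of that rebuild-per-call pattern (the distribution setup is repeated so the snippet is self-contained; torch.autograd.grad returns a tuple, so we take its first element):
import torch
from torch import tensor, eye, log, exp
from torch.autograd import grad
from torch.distributions import MultivariateNormal as MVN

μs = [tensor([0.0, 0.0]), tensor([2.0, 0.0]), tensor([0.0, 2.0])]  # Means
Σs = [eye(2), eye(2), eye(2)]                                      # Covariance matrices
cs = [1 / 3] * 3                                                   # Mixture coefficients
MVNs = [MVN(μ, Σ) for (μ, Σ) in zip(μs, Σs)]

def log_density_grad(point):
    # The graph is rebuilt on every call, which is the overhead mentioned above.
    x = tensor(point, requires_grad=True)
    log_density = log(sum(c * exp(mvn.log_prob(x)) for c, mvn in zip(cs, MVNs)))
    return grad(log_density, x)[0]

print(log_density_grad([0.0, 0.0]))  # gradient at (0, 0)
print(log_density_grad([1.0, 0.0]))  # gradient at (1, 0)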
Edit: With TensorFlow you could do something like
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
import tensorflow_probability as tfp
@tf.function
def mygrad(x):
    print("building graph")
    us = tf.stack([tf.constant([0.0, 0.0]), tf.constant([2., 0.]), tf.constant([0., 2.])])
    covs = tf.stack([tf.eye(2), tf.eye(2), tf.eye(2)])
    cs = tf.constant([1 / 3] * 3)
    with tf.GradientTape() as gt:
        gt.watch(x)
        # The identity covariances are their own Cholesky factors, so they can be
        # passed directly as the scale_tril argument of MultivariateNormalTriL.
        log_density = tf.math.log(tf.math.reduce_sum(tfp.distributions.MultivariateNormalTriL(us, covs).prob(x) * cs))
    return gt.gradient(log_density, x)
print(mygrad(tf.constant([0.0,0.0])).numpy()) #gradient at 0.0,0.0
print(mygrad(tf.constant([1.0,0.0])).numpy()) #gradient at 1.0,0.0
Essentially, you do automatic differentiation with tf.GradientTape and capture the computation graph in a tf.function; since the traced graph is reused, "building graph" is only printed on the first call. There is more background information in the very extensive TensorFlow API documentation.
Related
I'm trying to apply the Expectation Maximization Algorithm (EM) to a Gaussian Mixture Model (GMM) using Python and NumPy. The PDF document I am basing my implementation on can be found here.
Below are the equations:
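For reference, these are the standard EM updates for a Gaussian mixture, which the code below implements:

$$w_{ik} = \frac{\pi_k\,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j}\pi_j\,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)}, \qquad \pi_k = \frac{1}{N}\sum_{i=1}^{N} w_{ik},$$

$$\mu_k = \frac{\sum_{i=1}^{N} w_{ik}\,x_i}{\sum_{i=1}^{N} w_{ik}}, \qquad \Sigma_k = \frac{\sum_{i=1}^{N} w_{ik}\,(x_i - \mu_k)(x_i - \mu_k)^T}{\sum_{i=1}^{N} w_{ik}}.$$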
When applying the algorithm, I get the means of both the first and the second cluster equal to:
array([[2.50832195],
[2.51546208]])
When the actual vector means for the first and second cluster are, respectively:
array([[0],
[0]])
and:
array([[5],
[5]])
The same thing happens with the covariance matrices; I get:
array([[7.05168736, 6.17098629],
[6.17098629, 7.23009494]])
When it should be:
array([[1, 0],
[0, 1]])
for both clusters.
Here is the code:
np.random.seed(1)
# first cluster
X_11 = np.random.normal(0, 1, 1000)
X_21 = np.random.normal(0, 1, 1000)
# second cluster
X_12 = np.random.normal(5, 1, 1000)
X_22 = np.random.normal(5, 1, 1000)
X_1 = np.concatenate((X_11,X_12), axis=None)
X_2 = np.concatenate((X_21,X_22), axis=None)
# data matrix of k x n dimensions (2 x 2000 dimensions)
X = np.concatenate((np.array([X_1]),np.array([X_2])), axis=0)
# multivariate normal distribution function gives n x 1 vector (2000 x 1 vector)
def normal_distribution(x, mu, sigma):
    mvnd = []
    for i in range(np.shape(x)[1]):
        gd = (2*np.pi)**(-2/2) * np.linalg.det(sigma)**(-1/2) * np.exp((-1/2) * np.dot(np.dot((x[:,i:i+1]-mu).T, np.linalg.inv(sigma)), (x[:,i:i+1]-mu)))
        mvnd.append(gd)
    return np.reshape(np.array(mvnd), (np.shape(x)[1], 1))
# Initialized parameters
sigma_1 = np.array([[10, 0],
[0, 10]])
sigma_2 = np.array([[10, 0],
[0, 10]])
mu_1 = np.array([[10],
[10]])
mu_2 = np.array([[10],
[10]])
pi_1 = 0.5
pi_2 = 0.5
Sigma_1 = np.empty([2000, 2, 2])
Sigma_2 = np.empty([2000, 2, 2])
for i in range(10):
    # E-step:
    w_i1 = (pi_1*normal_distribution(X, mu_1, sigma_1))/(pi_1*normal_distribution(X, mu_1, sigma_1) + pi_2*normal_distribution(X, mu_2, sigma_2))
    w_i2 = (pi_2*normal_distribution(X, mu_2, sigma_2))/(pi_1*normal_distribution(X, mu_1, sigma_1) + pi_2*normal_distribution(X, mu_2, sigma_2))
    # M-step:
    pi_1 = np.sum(w_i1)/2000
    pi_2 = np.sum(w_i2)/2000
    mu_1 = np.array([(1/(np.sum(w_i1)))*np.sum(w_i1.T*X, axis=1)]).T
    mu_2 = np.array([(1/(np.sum(w_i2)))*np.sum(w_i2.T*X, axis=1)]).T
    for i in range(2000):
        Sigma_1[i:i+1, :, :] = w_i1[i:i+1,:]*np.dot((X[:,i:i+1]-mu_1), (X[:,i:i+1]-mu_1).T)
        Sigma_2[i:i+1, :, :] = w_i2[i:i+1,:]*np.dot((X[:,i:i+1]-mu_2), (X[:,i:i+1]-mu_2).T)
    sigma_1 = (1/(np.sum(w_i1)))*np.sum(Sigma_1, axis=0)
    sigma_2 = (1/(np.sum(w_i2)))*np.sum(Sigma_2, axis=0)
I would really appreciate it if someone could point out the mistake in my code or in my misunderstanding of the algorithm.
I was reading the following statement about how convolution is equivariant with respect to translation from the Deep Learning Book.
Let g be a function mapping one image function to another image function, such that I' = g(I) is the image function with I'(x, y) = I(x − 1, y). This shifts every pixel of I one unit to the right. If we apply this transformation to I, then apply convolution, the result will be the same as if we applied convolution to I', then applied the transformation g to the output.
For the last line I bolded, they are applying convolution to I', but shouldn't this be I? I' is the translated image. Otherwise it would effectively be saying:
f(g(I)) = g( f(g(I)) )
where f is convolution & g is translation.
I am trying to verify the same thing myself in Python, using a 3D kernel whose depth equals that of the image, as would be the case in a convolution layer for a color image (here, a house).
Here is my code for applying a translation & then convolution to an image.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import scipy
import scipy.ndimage
I = scipy.ndimage.imread('pics/house.jpg')
def convolution(A, B):
    return np.sum( np.multiply(A, B) )
k = np.array([[[0,1,-1],[1,-1,0],[0,0,0]], [[-1,0,-1],[1,-1,0],[1,0,0]], [[1,-1,0],[1,0,1],[-1,0,1]]]) #kernel
## Translation
translated = 100
new_I = np.zeros( (I.shape[0]-translated, I.shape[1], I.shape[2]) )
for i in range(translated, I.shape[0]):
    for j in range(I.shape[1]):
        for l in range(I.shape[2]):
            new_I[i-translated,j,l] = I[i,j,l]
## Convolution
conv = np.zeros( (int((new_I.shape[0]-3)/2), int((new_I.shape[1]-3)/2) ) )
for i in range( conv.shape[0] ):
    for j in range(conv.shape[1]):
        conv[i, j] = convolution(new_I[2*i:2*i+3, 2*j:2*j+3, :], k)
scipy.misc.imsave('pics/convoled_image_2nd.png', conv)
I get the following output:
Now, I switch the convolution and Translation steps:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import scipy
import scipy.ndimage
I = scipy.ndimage.imread('pics/house.jpg')
def convolution(A, B):
    return np.sum( np.multiply(A, B) )
k = np.array([[[0,1,-1],[1,-1,0],[0,0,0]], [[-1,0,-1],[1,-1,0],[1,0,0]], [[1,-1,0],[1,0,1],[-1,0,1]]]) #kernel
## Convolution
conv = np.zeros( (int((I.shape[0]-3)/2), int((I.shape[1]-3)/2) ) )
for i in range( conv.shape[0] ):
    for j in range(conv.shape[1]):
        conv[i, j] = convolution(I[2*i:2*i+3, 2*j:2*j+3, :], k)
## Translation
translated = 100
new_I = np.zeros( (conv.shape[0]-translated, conv.shape[1]) )
for i in range(translated, conv.shape[0]):
    for j in range(conv.shape[1]):
        new_I[i-translated,j] = conv[i,j]
scipy.misc.imsave('pics/conv_trans_image.png', new_I)
And now I get the following output:
Shouldn't they be the same, according to the book? What am I doing wrong?
Just as the book says, the linearity properties of convolution and translation guarantee that their order is interchangeable, except for boundary effects.
For instance:
import numpy as np
from scipy import misc, ndimage, signal
def translate(img, dx):
    img_t = np.zeros_like(img)
    if dx == 0:   img_t[:, :] = img[:, :]
    elif dx > 0:  img_t[:, dx:] = img[:, :-dx]
    else:         img_t[:, :dx] = img[:, -dx:]
    return img_t

def convolution(img, k):
    return np.sum([signal.convolve2d(img[:, :, c], k[:, :, c])
                   for c in range(img.shape[2])], axis=0)
img = ndimage.imread('house.jpg')
k = np.array([
[[ 0, 1, -1], [1, -1, 0], [ 0, 0, 0]],
[[-1, 0, -1], [1, -1, 0], [ 1, 0, 0]],
[[ 1, -1, 0], [1, 0, 1], [-1, 0, 1]]])
ct = translate(convolution(img, k), 100)
tc = convolution(translate(img, 100), k)
misc.imsave('conv_then_trans.png', ct)
misc.imsave('trans_then_conv.png', tc)
if np.all(ct[2:-2, 2:-2] == tc[2:-2, 2:-2]):
    print('Equal!')
Prints:
Equal!
The problem is that you're overtranslating in the second example. After you shrink the image 2x, try translating by 50 instead.
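To make that bookkeeping concrete, here is a small self-contained sketch (a random array and kernel stand in for the image): with the stride-2 convolution from the question, shifting the input by t rows corresponds to shifting the output by t/2 rows.
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((200, 200, 3))   # stand-in for the image
k = rng.random((3, 3, 3))         # stand-in for the 3x3x3 kernel

def strided_conv(I, k, stride=2):
    out = np.zeros(((I.shape[0] - 3) // stride, (I.shape[1] - 3) // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[stride*i:stride*i+3, stride*j:stride*j+3, :] * k)
    return out

t = 100
a = strided_conv(img[t:], k)        # translate by t, then stride-2 convolution
b = strided_conv(img, k)[t // 2:]   # stride-2 convolution, then translate by t/2
print(np.allclose(a, b))            # True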
I've been trying to visualize QR decomposition in a step-by-step fashion, but I'm not getting the expected results. I'm new to NumPy, so it would be nice if an expert eye could spot what I might be missing:
import numpy as np
from scipy import linalg
A = np.array([[12, -51, 4],
[6, 167, -68],
[-4, 24, -41]])
#Givens
v = np.array([12, 6])
vnorm = np.linalg.norm(v)
W_12 = np.array([[v[0]/vnorm, v[1]/vnorm, 0],
[-v[1]/vnorm, v[0]/vnorm, 0],
[0, 0, 1]])
W_12 * A #this should return a matrix such that [1,0] = 0
#gram-schmidt
A[:,0]
v = np.linalg.norm(A[:,0]) * np.array([1, 0, 0])
u = (A[:,0] - v)
u = u / np.linalg.norm(u)
W1 = np.eye(3) - 2 * np.outer(u, u.transpose())
W1 * A #this matrix's first column should look like [a, 0, 0]
Any help clarifying why these intermediate results don't show the properties they are supposed to would be greatly appreciated.
NumPy is designed to work with homogeneous multi-dimensional arrays; it is not specifically a linear algebra package. So, by design, the * operator is element-wise multiplication, not the matrix product.
If you want to get the matrix product, there are a few ways:
You can create np.matrix objects, rather than np.ndarray objects, for which the * operator is the matrix product.
You can also use the @ operator, as in W_12 @ A, which is the matrix product.
Or you can use np.dot(W_12, A) or W_12.dot(A), which computes the dot product.
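To see the difference concretely, here is a small self-contained sketch that rebuilds the question's W_12 and A:
import numpy as np

A = np.array([[12, -51, 4], [6, 167, -68], [-4, 24, -41]])
v = A[:2, 0].astype(float)                 # [12, 6]
c, s = v / np.linalg.norm(v)
W_12 = np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

print((W_12 * A)[1, 0])   # element-wise: W_12[1, 0] * A[1, 0] = -(6/sqrt(180)) * 6 ≈ -2.68
print((W_12 @ A)[1, 0])   # matrix product: ≈ 0, the entry the Givens rotation is meant to zero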
Any one of these, using the data you give, returns the following for Givens rotation:
>>> np.dot(W_12, A)[1, 0]
-2.2204460492503131e-16
And this for the Gram-Schmidt step:
>>> (W1.dot(A))[:, 0]
array([ 1.40000000e+01, -4.44089210e-16, 4.44089210e-16])
How could I smooth the x[1,3] and x[3,2] elements of the array,
x = np.array([[0,0,0,0,0],[0,0,0,1,0],[0,0,0,0,0],[0,0,1,0,0],[0,0,0,0,0]])
with two two-dimensional Gaussian functions of width 1 and 2, respectively? In essence, I need a function that allows me to smooth single "point-like" array elements with Gaussians of differing widths, so that I end up with an array of smoothly varying values.
I am a little confused by the question you asked and the comments you have posted. It seems to me that you want to use scipy.ndimage.filters.gaussian_filter, but I don't understand what you mean by:
[...] gaussian functions with different sigma values to each pixel. [...]
In fact, since you use a 2-dimensional array x, the Gaussian filter will have 2 parameters. The rule is: one sigma value per dimension, rather than one sigma value per pixel.
Here is a short example:
import matplotlib.pyplot as pl
import numpy as np
import scipy as sp
import scipy.ndimage
n = 200 # width/height of the array
m = 1000 # number of points
sigma_y = 3.0
sigma_x = 2.0
# Create input array
x = np.zeros((n, n))
i = np.random.choice(range(0, n * n), size=m)
x[i // n, i % n] = 1.0
# Plot input array
pl.imshow(x, cmap='Blues', interpolation='nearest')
pl.xlabel("$x$")
pl.ylabel("$y$")
pl.savefig("array.png")
# Apply gaussian filter
sigma = [sigma_y, sigma_x]
y = sp.ndimage.filters.gaussian_filter(x, sigma, mode='constant')
# Display filtered array
pl.imshow(y, cmap='Blues', interpolation='nearest')
pl.xlabel("$x$")
pl.ylabel("$y$")
pl.title(r"$\sigma_x = " + str(sigma_x) + r"\quad \sigma_y = " + str(sigma_y) + "$")
pl.savefig("smooth_array_" + str(sigma_x) + "_" + str(sigma_y) + ".png")
Here is the initial array:
Here are some results for different values of sigma_x and sigma_y:
This makes it possible to properly account for the influence of the second parameter of scipy.ndimage.filters.gaussian_filter.
However, according to the previous quote, you might be more interested in the assignment of different weights to each pixel. In this case, scipy.ndimage.filters.convolve is the function you are looking for. Here is the corresponding example:
import matplotlib.pyplot as pl
import numpy as np
import scipy as sp
import scipy.ndimage
# Arbitrary weights
weights = np.array([[0, 0, 1, 0, 0],
[0, 2, 4, 2, 0],
[1, 4, 8, 4, 1],
[0, 2, 4, 2, 0],
[0, 0, 1, 0, 0]],
dtype=float)
weights = weights / np.sum(weights[:])
y = sp.ndimage.filters.convolve(x, weights, mode='constant')
# Display filtered array
pl.imshow(y, cmap='Blues', interpolation='nearest')
pl.xlabel("$x$")
pl.ylabel("$y$")
pl.savefig("smooth_array.png")
And the corresponding result:
I hope this will help you.
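If the goal really is a different width for each of the two points in the original 5×5 example, one possible sketch (exploiting the linearity of the filter) is to smooth each point separately and add the results:
import numpy as np
from scipy.ndimage import gaussian_filter

x1 = np.zeros((5, 5)); x1[1, 3] = 1.0   # point to be smoothed with sigma = 1
x2 = np.zeros((5, 5)); x2[3, 2] = 1.0   # point to be smoothed with sigma = 2

# Filter each point with its own width and add the results.
y = gaussian_filter(x1, sigma=1, mode='constant') + gaussian_filter(x2, sigma=2, mode='constant')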
I am trying to write Python code that will return a Jacobian matrix. After installing numdifftools and running the in-built function numdifftools.Jacobian() I get this:
<numdifftools.core.Jacobian object at 0x1032fe2d0>
All the examples I find online return this result for me. Is there a command I'm missing, or am I misinterpreting how this function works?
# To approximate a solution for a series of
# overdetermined, non-linear equations.
# Find the point of intersection
import scipy
import numpy as np
import numdifftools as nd
# The matrix A, equations stored in rows in the form:
# [x^2, y^2, x, y]
def MatrixA():
    return np.matrix([ [1, 1, 0, 0],
                       [1, 1, 4, -2],
                       [0, 0, 4, -2],
                       [4, 0, 22, -9],
                       [5, 0, 0, 1]
                     ])
# The matrix B, the answers of the equations in Matrix A
def MatrixB():
    return np.matrix([ [16],
                       [6],
                       [-13],
                       [31.5204],
                       [1.288]
                     ])
#Using the Moore-Penrose method to solve
#an overdetermined set of equations
def MoorePenrose(A):
    Ans = A.getT() * A
    Ans = Ans.getI()
    Ans = Ans * A.getT()
    return Ans
# Linearise the equations by using the Newton method
# This will generate the best possible linear version
# of the nonlinear system.
def Linearise(A):
    return nd.Jacobian(A)
#=============================================
# Program Main() {
#=============================================
#Read in A matrix of equations
A = MatrixA()
#Read in B matrix of solutions (RHS of A Matrices equations)
B = MatrixB()
#Solution =>
#Linearise Matrix A
A = Linearise(A)
print(A)
#A = Moore-Penrose pseudoinverse of A
A = MoorePenrose(A)
#Unknowns Matrix X = A * B
A = A * B
# Print out the unknowns Matrix.
print(A)
#=============================================
# } Main End;
#=============================================
Investigating the same question for a multiple-output function of multiple variables, I made this simple example demonstrating the use of numdifftools' Jacobian function.
Please note the use of a numpy array, not a list, to define the function's multiple outputs.
import numpy as np
import numdifftools as nd
# Define your function
# Can be R^n -> R^n as long as you use numpy arrays as output
def f(x):
    return np.array([x[0],x[1]])
# Define your Jacobian function
f_jacob = nd.Jacobian(f)
# Use your Jacobian function at any point you want.
# In our case, for a R² -> R² function, it returns a 2x2 matrix with all partial derivatives of all outputs wrt all inputs
print(f_jacob([1,2]))
Execution returns
[[1. 0.]
[0. 1.]]
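For a less trivial function the same pattern applies; here is a small illustrative sketch (the function g below is made up for the example) whose numerical Jacobian should match the analytic one:
import numpy as np
import numdifftools as nd

def g(x):
    # R^2 -> R^2 with a non-constant Jacobian
    return np.array([x[0] ** 2 * x[1], 5 * x[0] + np.sin(x[1])])

g_jacob = nd.Jacobian(g)
# Analytically the Jacobian is [[2*x0*x1, x0**2], [5, cos(x1)]],
# so at (1, 2) it should be approximately [[4, 1], [5, cos(2)]].
print(g_jacob([1.0, 2.0]))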