I'm trying to implement non-negative matrix factorization, using the Kullback-Leibler divergence as the cost function. The algorithm is described in: http://hebb.mit.edu/people/seung/papers/nmfconverge.pdf. Below is my Python/NumPy implementation, with an example matrix to run it on.
In a nutshell, the algorithm is supposed to learn matrices W(n by r) and H(r by m) such that V(n by m) is approximately WH. You start with random values in W and H, and by following the update rules described in the Seung and Lee paper, you're supposed to get closer and closer to good approximations for W and H.
The algorithm is proven to monotonically reduce the divergence measure, but that's not what happens in my implementation. Instead, it settles into an alternation between two divergence values. If you look at W and H, you can see that the resulting factorization is not particularly good.
I've wondered whether to use the updated or old H when calculating the update for W. I tried it both ways, and it doesn't change the behavior of the implementation.
I've checked my implementation against the paper a bunch of times, and I don't see what I'm doing wrong. Can anyone shed some light on the issue?
import numpy as np

def update(V, W, H, r, n, m):
    n, m = V.shape
    WH = W.dot(H)

    # equation (5)
    H_coeff = np.zeros(H.shape)
    for a in range(r):
        for mu in range(m):
            for i in range(n):
                H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
            H_coeff[a, mu] /= sum(W)[a]
    H = H * H_coeff

    W_coeff = np.zeros(W.shape)
    for i in range(n):
        for a in range(r):
            for mu in range(m):
                W_coeff[i, a] += H[a, mu] * V[i, mu] / WH[i, mu]
            W_coeff[i, a] /= sum(H.T)[a]
    W = W * W_coeff
    return W, H

def factor(V, r, iterations=100):
    n, m = V.shape
    avg_V = sum(sum(V))/n/m
    W = np.random.random(n*r).reshape(n, r)*avg_V
    H = np.random.random(r*m).reshape(r, m)*avg_V
    for i in range(iterations):
        WH = W.dot(H)
        divergence = sum(sum(V * np.log(V/WH) - V + WH))  # equation (3)
        print("At iteration " + str(i) + ", the Kullback-Leibler divergence is", divergence)
        W, H = update(V, W, H, r, n, m)
    return W, H
V = np.arange(0.01,1.01,0.01).reshape(10,10)
W, H = factor(V, 6)
How to eliminate the alternation effect:
The very last line of the Proof of Theorem 2 reads,
By reversing the roles of H and W, the update rule for W can similarly
be shown to be nonincreasing.
Thus we can surmise that updating H can be done independently of updating W. That means after updating H:
H = H * H_coeff
we should also update the intermediate value WH before updating W:
WH = W.dot(H)
W = W * W_coeff
Both updates decrease the divergence.
Try it: Just stick WH = W.dot(H) before the computation for W_coeff, and the alternation effect goes away.
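As a concrete sketch, here is the question's loop-based update function with just that one extra line added (nothing else changed):
def update(V, W, H, r, n, m):
    n, m = V.shape
    WH = W.dot(H)

    # equation (5)
    H_coeff = np.zeros(H.shape)
    for a in range(r):
        for mu in range(m):
            for i in range(n):
                H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
            H_coeff[a, mu] /= sum(W)[a]
    H = H * H_coeff

    WH = W.dot(H)   # recompute the product with the updated H before updating W

    W_coeff = np.zeros(W.shape)
    for i in range(n):
        for a in range(r):
            for mu in range(m):
                W_coeff[i, a] += H[a, mu] * V[i, mu] / WH[i, mu]
            W_coeff[i, a] /= sum(H.T)[a]
    W = W * W_coeff
    return W, H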
Simplifying the code:
When dealing with NumPy arrays, use their mean and sum methods, and avoid using the Python sum function:
avg_V = sum(sum(V))/n/m
can be written as
avg_V = V.mean()
and
divergence = sum(sum(V * np.log(V/WH) - V + WH)) # equation (3)
can be written as
divergence = ((V * np.log(V_over_WH)) - V + WH).sum()
Avoid the Python builtin sum function because it is slower than the NumPy sum method, and it is not as versatile: it does not allow you to specify the axis on which to sum. (We managed to eliminate two calls to Python's sum with one call to NumPy's sum or mean method.)
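As a small illustration of the axis argument (not part of the original code):
import numpy as np

A = np.arange(6).reshape(2, 3)
print(A.sum())          # 15: sum over all elements
print(A.sum(axis=0))    # [3 5 7]: column sums
print(A.sum(axis=1))    # [ 3 12]: row sums
print(sum(A))           # [3 5 7]: the builtin just iterates over the rows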
Eliminate the triple for-loop:
But a bigger improvement in both speed and readability can be had by replacing
H_coeff = np.zeros(H.shape)
for a in range(r):
    for mu in range(m):
        for i in range(n):
            H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
        H_coeff[a, mu] /= sum(W)[a]
H = H * H_coeff
with
V_over_WH = V/WH
H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
Explanation:
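For reference, the equation (5) update rule for H, which the triple loop implements, is

H_{a\mu} \leftarrow H_{a\mu} \frac{\sum_i W_{ia} V_{i\mu} / (WH)_{i\mu}}{\sum_k W_{ka}}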
If you look at the equation 5 update rule for H, first notice that indices for V and (W H) are identical. So you can replace V / (W H) with
V_over_WH = V/WH
Next, note that in the numerator we are summing over the index i, which is the first index in both W and V_over_WH. We can express that as matrix multiplication:
np.dot(V_over_WH.T, W).T
And the denominator is simply:
W.sum(axis=0).T
If we divide the numerator by the denominator
(np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
we get a matrix indexed by the two remaining indices, alpha and mu, in that order. That is the same as the indices for H. So we want to multiply H by this ratio element-wise. Perfect. NumPy multiplies arrays element-wise by default.
Thus, we can express the entire update rule for H as
H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
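If you want to convince yourself that the vectorized expressions match the loops (for both the H and the W coefficients), here is a standalone check on small random arrays; the variable names are just for this test:
import numpy as np

np.random.seed(0)
n, m, r = 5, 7, 3
V = np.random.rand(n, m) + 0.1
W = np.random.rand(n, r) + 0.1
H = np.random.rand(r, m) + 0.1
WH = W.dot(H)
V_over_WH = V / WH

# loop version of the H coefficient (equation (5))
H_coeff = np.zeros(H.shape)
for a in range(r):
    for mu in range(m):
        for i in range(n):
            H_coeff[a, mu] += W[i, a] * V_over_WH[i, mu]
        H_coeff[a, mu] /= W[:, a].sum()
H_vectorized = (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
print(np.allclose(H_coeff, H_vectorized))  # True

# the analogous check for the W coefficient
W_coeff = np.zeros(W.shape)
for i in range(n):
    for a in range(r):
        for mu in range(m):
            W_coeff[i, a] += H[a, mu] * V_over_WH[i, mu]
        W_coeff[i, a] /= H[a, :].sum()
W_vectorized = np.dot(V_over_WH, H.T) / H.sum(axis=1)
print(np.allclose(W_coeff, W_vectorized))  # True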
So, putting it all together:
import numpy as np
np.random.seed(1)

def update(V, W, H, WH, V_over_WH):
    # equation (5)
    H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
    WH = W.dot(H)
    V_over_WH = V / WH

    W *= np.dot(V_over_WH, H.T) / H.sum(axis=1)
    WH = W.dot(H)
    V_over_WH = V / WH
    return W, H, WH, V_over_WH

def factor(V, r, iterations=100):
    n, m = V.shape
    avg_V = V.mean()
    W = np.random.random(n * r).reshape(n, r) * avg_V
    H = np.random.random(r * m).reshape(r, m) * avg_V
    WH = W.dot(H)
    V_over_WH = V / WH
    for i in range(iterations):
        W, H, WH, V_over_WH = update(V, W, H, WH, V_over_WH)
        # equation (3)
        divergence = ((V * np.log(V_over_WH)) - V + WH).sum()
        print("At iteration {i}, the Kullback-Leibler divergence is {d}".format(
            i=i, d=divergence))
    return W, H

V = np.arange(0.01, 1.01, 0.01).reshape(10, 10)
# V = np.arange(1, 101).reshape(10, 10).astype('float')
W, H = factor(V, 6)
I have a few lines of code that don't converge. If anyone has an idea why, I would greatly appreciate it. The original equation is written in def f(x, y, b, m), and I need to find the parameters b and m.
import numpy as np

np.random.seed(42)
x = np.random.normal(0, 5, 100)
y = 50 + 2 * x + np.random.normal(0, 2, len(x))

def f(x, y, b, m):
    return (1/len(x))*np.sum((y - (b + m*x))**2)  # it is supposed to be a sum operator

def dfb(x, y, b, m):  # partial derivative with respect to b
    return b - m*np.mean(x)+np.mean(y)

def dfm(x, y, b, m):  # partial derivative with respect to m
    return np.sum(x*y - b*x - m*x**2)

b0 = np.mean(y)
m0 = 0
alpha = 0.0001
beta = 0.0001
epsilon = 0.01

while True:
    b = b0 - alpha * dfb(x, y, b0, m0)
    m = m0 - alpha * dfm(x, y, b0, m0)
    if np.sum(np.abs(m-m0)) <= epsilon and np.sum(np.abs(b-b0)) <= epsilon:
        break
    else:
        m0 = m
        b0 = b
print(m, f(x, y, b, m))
Both derivatives got some signs mixed up:
def dfb(x, y, b, m):  # partial derivative with respect to b
    # return b - m*np.mean(x)+np.mean(y)
    #          ^             ^------ these signs are incorrect
    return b + m*np.mean(x) - np.mean(y)

def dfm(x, y, b, m):  # partial derivative with respect to m
    #      v------ this should be negative
    return -np.sum(x*y - b*x - m*x**2)
In fact, these derivatives are still missing some constants:
dfb should be multiplied by 2
dfm should be multiplied by 2/len(x)
I imagine that's not too bad because the gradient is scaled by alpha anyway, but it could make the speed of convergence worse.
If you do use the correct derivatives, your code will converge after one iteration:
def dfb(x, y, b, m):  # partial derivative with respect to b
    return 2 * (b + m * np.mean(x) - np.mean(y))

def dfm(x, y, b, m):  # partial derivative with respect to m
    # Used `mean` here since (2/len(x)) * np.sum(...)
    # is the same as 2 * np.mean(...)
    return -2 * np.mean(x * y - b * x - m * x**2)
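If you want to double-check the signs, a quick finite-difference test (a standalone sketch, not part of the original code) confirms the corrected derivatives against the numerical gradient of f:
import numpy as np

np.random.seed(42)
x = np.random.normal(0, 5, 100)
y = 50 + 2 * x + np.random.normal(0, 2, len(x))

def f(x, y, b, m):
    return (1/len(x)) * np.sum((y - (b + m*x))**2)

def dfb(x, y, b, m):
    return 2 * (b + m * np.mean(x) - np.mean(y))

def dfm(x, y, b, m):
    return -2 * np.mean(x * y - b * x - m * x**2)

# compare against central finite differences at an arbitrary point
b, m, eps = 10.0, 1.0, 1e-6
num_db = (f(x, y, b + eps, m) - f(x, y, b - eps, m)) / (2 * eps)
num_dm = (f(x, y, b, m + eps) - f(x, y, b, m - eps)) / (2 * eps)
print(np.isclose(dfb(x, y, b, m), num_db))  # True
print(np.isclose(dfm(x, y, b, m), num_dm))  # True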
I am trying to implement ifft2 by computing the individual dot products instead of matrix multiplication (I understand it is computationally extremely intensive). The image constructed from the individual dot-product implementation is upside-down compared to the matrix-multiplication-wise ifft2. In the code below, rA is the data.
Code with matrix multiplication:
def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1j / N)
    W = np.power(omega, i * j) / N
    return W

def forPyhton(v1, v2):
    weight = np.dot(v2, v1)
    return weight

rA = slice_kspace[5, :, :]
slice7 = np.fft.ifft2(rA)
slice7 = np.fft.fftshift(slice7)
slices7Abs = np.abs(slice7) + 1e-9

dftMtxM = np.conj(DFT_matrix(len(rA)))
dftMtxN = np.conj(DFT_matrix(len(rA[1])))
mA = dftMtxM @ rA @ dftMtxN
Individual dot-product-wise implementation:
def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1j / N)
    W = np.power(omega, i * j) / N
    return W

def forPyhton(v1, v2):
    weight = np.dot(v2, v1)
    return weight

rA = slice_kspace[5, :, :]
slice7 = np.fft.ifft2(rA)
slice7 = np.fft.fftshift(slice7)
slices7Abs = np.abs(slice7) + 1e-9

dftMtxM = np.conj(DFT_matrix(len(rA)))
dftMtxN = np.conj(DFT_matrix(len(rA[1])))
# mA = dftMtxM @ rA @ dftMtxN
result = []
for i in range(0, rA.shape[0]):
    row1 = []
    for j in range(0, dftMtxN.shape[1]):
        scaleWeight = forPyhton(rA[i, :], dftMtxN[:, j])
        row1.append(scaleWeight)
    result.append(row1)
result = np.asarray(result)

mA = []
for i in range(0, dftMtxM.shape[0]):
    row2 = []
    for j in range(0, result.shape[1]):
        scaleWeight = forPyhton(dftMtxM[i, :], np.array(result[:, j]))
        row2.append(scaleWeight)
    mA.append(row2)
mm = np.amax(np.abs(mA))
mA = np.fft.fftshift(mA)
mAabs = np.abs(mA) + 1e-9
The plotting of slices7Abs and mAabs is done by:
plt.figure(3)
plt.imshow(slices7Abs,cmap='gray',origin='lower')
plt.figure(4)
plt.imshow(mAabs,cmap='gray',origin='lower')
plt.show()
Figure 3 and figure 4 are the same for the first method, i.e. the matrix-wise multiplication, but for the second case, the individual dot-product-wise implementation, the image in figure 4 is upside-down. Any idea why the image is upside down?
In NMF we have to iteratively update the matrices W and H such that their product WH begins to approximate V.
import numpy as np
import matplotlib.pyplot as plt

# specify the rank
r = 4
print('r:', r)

# Generate some synthetic data to create V0 from Wtrue and Htrue.
# Our challenge is then to find V ~ V0 by iterative learning,
# starting from random values of W and H.
Wtrue = np.array([[1, 2],
                  [3, 4],
                  [5, 6],
                  [7, 8]])
Htrue = np.array([[ 9, 11, 13],
                  [10, 12, 14]])
# Wtrue = np.random.rand(40,5) # nxr, i: 1 -> n, a: 1 -> r
# Htrue = np.random.rand(5,10) # rxm, i: 1 -> r, mu: 1 -> m
V0 = Wtrue @ Htrue  # converge: V ~ WH
print('V0:\n', V0)

# Dimensions of data (n rows and m cols)
n, m = V0.shape
print('n(row), m(col):', V0.shape)

# Normalise columnwise
V = np.zeros(shape=(n, m))
for i in range(m):
    V[:, i] = V0[:, i] / np.max(V0[:, i])
print('V:\n', V)

# Initialise W as n rows and r cols
W = np.random.rand(n, r)
# Initialise H as r rows and m cols
H = np.random.rand(r, m)
print('wH:\n', W @ H)
print('W:\n', W)
print('H:\n', H)

# Maximum number of iterations
maxIter = 8
f = np.zeros(shape=(maxIter, 1))

# Initial error
f[0] = np.linalg.norm(V - W @ H, ord='fro')
print('Initial error: ', np.log(np.linalg.norm(V - W @ H, ord='fro')))

print('Learning weights...')
for iter in range(maxIter-1):
    # Update W
    for i in range(n):
        for a in range(r):
            S = V[i, :] / (W @ H)[i, :] @ H[a, :]
            W[i, a] = W[i, a] * S
    # Update H
    for a in range(r):
        for u in range(m):
            T = V[:, u] / (W @ H)[:, u] @ W[:, a]
            H[a, u] = H[a, u] * T
    # Measure Error
    f[iter+1] = np.linalg.norm(V - W @ H, ord='fro')

fig, ax = plt.subplots(figsize=(5, 4))
ax.set_title('Convergence of NMF')
ax.set_xlabel('Iteration')
ax.set_ylabel('log(Error)')
ax.plot(np.arange(maxIter), np.log(f), c='m')
ax.grid(True)

print('Final error: ', np.log(np.linalg.norm(V - W @ H, ord='fro')))
plt.savefig('images/NMF_convergence_r32.png')
The problem is that my solution stops reducing the error value after a small number of iterations. The correct code should keep reducing the error as the number of iterations increases.
This is where the problem most likely lies:
# Update W
for i in range(n):
    for a in range(r):
        S = V[i, :] / (W @ H)[i, :] @ H[a, :]
        W[i, a] = W[i, a] * S
# Update H
for a in range(r):
    for u in range(m):
        T = V[:, u] / (W @ H)[:, u] @ W[:, a]
        H[a, u] = H[a, u] * T
Here's the plot I'm getting:
Appreciate any thoughts.
I came up with a quick fix after reading chapter 10 of Programming Collective Intelligence by Toby Segaran.
First make sure you run from numpy import *; then, inside the main loop:
# ...
for iter in range(maxIter-1):
    # Update W
    wn = (V @ transpose(H))
    wd = (W @ H @ transpose(H))
    W = matrix(array(W) * array(wn) / array(wd))
    # Update H
    hn = (transpose(W) * V)
    hd = (transpose(W) * W * H)
    H = matrix(array(H) * array(hn) / array(hd))
    # Measure Error
    # ...
This has given me the correct plot:
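For what it's worth, the same multiplicative updates work with plain NumPy arrays and the @ operator, so you can avoid np.matrix entirely. A self-contained sketch on placeholder data (the variable names here are only for the demo):
import numpy as np

np.random.seed(0)
V = np.random.rand(4, 3)            # placeholder data to factor
r, maxIter = 2, 200
W = np.random.rand(V.shape[0], r)
H = np.random.rand(r, V.shape[1])

err = np.zeros(maxIter)
for it in range(maxIter):
    W *= (V @ H.T) / (W @ H @ H.T)  # multiplicative update for W
    H *= (W.T @ V) / (W.T @ W @ H)  # multiplicative update for H
    err[it] = np.linalg.norm(V - W @ H, ord='fro')
print(err[0], err[-1])              # the error drops steadily over the iterations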
I have this interesting problem where I want to calculate the sum over the element-wise product of three matrices, i.e. \sum_{i,j,k} p_{ijk} c_{ijk} f_{ijk}(x,y,z).
While calculating p_{ijk} and c_{ijk} can be done a priori, my problem is with f_{ijk}(x,y,z). The elements of this matrix are multivariate polynomials which depend on the matrix indices, so numpy.vectorize cannot be trivially applied. My best bet at tackling the issue would be to treat (i,j,k) as additional variables, so that numpy.vectorize is then applied to a 6-dimensional instead of a 3-dimensional input. However, I am not sure if more efficient or alternative ways exist.
This is a simple way to implement that formula efficiently:
import numpy as np
np.random.seed(0)
l, m, n = 4, 5, 6
x, y, z = np.random.rand(3)
p = np.random.rand(l, m, n)
c = np.random.rand(l, m, n)
i, j, k = map(np.arange, (l, m, n))
xi = (x ** (l - i)) * (x ** l)
yj = (y ** (m - j)) * (y ** m)
zk = (z ** (n - k)) * (z ** n)
res = np.einsum('ijk,ijk,i,j,k->', p, c, xi, yj, zk)
print(res)
# 0.0007208482648476157
Or even slightly more compact:
import numpy as np
np.random.seed(0)
l, m, n = 4, 5, 6
x, y, z = np.random.rand(3)
p = np.random.rand(l, m, n)
c = np.random.rand(l, m, n)
t = map(lambda v, s: (v ** (s - np.arange(s))) * (v ** s), (x, y, z), (l, m, n))
res = np.einsum('ijk,ijk,i,j,k->', p, c, *t)
print(res)
# 0.0007208482648476157
Using np.einsum you minimize the need for intermediate arrays, so it should be faster than making f first (which you could get e.g. as f = np.einsum('i,j,k->ijk', xi, yj, zk)), multiplying p, c and f, and then summing the result.
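As a sanity check that the einsum route agrees with the more literal approach (build f first, multiply, then sum), you can append this to the first snippet above, reusing its variables:
f = np.einsum('i,j,k->ijk', xi, yj, zk)  # f[i,j,k] = xi[i] * yj[j] * zk[k]
res_literal = (p * c * f).sum()
print(np.allclose(res, res_literal))     # True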
So I have this 3x3 G matrix (not shown here; it's irrelevant to my problem) that I created using the two variables u (a vector, x - y) and the scalar k. x_j = (x_1(j), x_2(j), x_3(j)) and y_j = (y_1(j), y_2(j), y_3(j)). alpha_j is a 3x3 matrix. The A matrix is a block-diagonal matrix of size 3n x 3n. I am having trouble with the W matrix. How do I code a matrix of size 3n x 3n, where the (i,j)-th block is the 3x3 matrix given by alpha_i * G_ij * alpha_j? I am lost.
My alpha_j matrix also seems to be having some trouble. The loop keeps throwing me the error "only length-1 arrays can be converted to Python scalars." Please help.
import numpy as np
import numpy.matlib

def W(x, y, k, alpha, A):
    u = x - y
    n = x.shape[0]
    W = np.zeros((3*n, 3*n))
    for i in range(0, n-1):
        for j in range(0, n-1):
            # u = -np.array([[x[i,0] - x[j,0]], [x[i,1] - x[j,1]], [0]]) ??
            W[i][j] = (alpha_j(alpha, A) * G(u, k) * alpha_j(alpha, A))
        W[i][i] = np.zeros((n, n))
    return W

def alpha_j(a, A):
    alph = np.array([[0,0,0],[0,0,0],[0,0,0]], complex)
    rho = np.random.rand(3,1)
    for i in range(0, 2):
        for j in range(0, 2):
            alph[i][j] = (rho[i] * a * A[i][j])
    return alph
#-------------------------------------------------------------------
x1 = np.array([[1], [2], [0]])
y1 = np.array([[4], [5], [0]])
# SYSTEM PARAMETERS
# incoming Wave angle
theta = 0 # can range from [0, 2pi)
# susceptibility
chi = 10 + 1j
# wavelength
lam = 0.5 # microns (values between .4-.7)
# frequency
k = (2 * np.pi)/lam # 1/microns
# volume
V_0 = (0.05)**3 # microns^3
# incoming wave vector
K = k * np.array([[0], [np.sin(theta)], [np.cos(theta)]])
# polarization vector
vecinc = np.array([[1], [0], [0]]) # (can choose any vector perpendicular to K)
# for the fixed alpha case
alpha = (V_0 * 3 * chi)/(chi + 3)
# 3 x 3 matrix
A = np.matlib.identity(3) # could be any symmetric matrix,
#-------------------------------------------------------------------
# TEST FUNCTIONS
test = G((x1-y1), k)
print(test)
w = W(x1, y1, k, alpha, A)
print(w)
Sometimes my W loop throws me the error "can't set an array element with a sequence." But I need to set each array element in this arbitrary matrix W to the 3x3 matrix created by multiplying alpha by G...
To your question of how to create a new array with a block for each element, the following should do the trick:
G = np.random.random([3, 3])
result = np.zeros([9, 9])
num_blocks = 3
a = np.random.random([3, 3])
b = np.random.random([3, 3])

for i in range(G.shape[0]):
    for j in range(G.shape[1]):
        block_result = a * G[i, j] * b
        for k in range(num_blocks):
            for l in range(num_blocks):
                result[3*i + k, 3*j + l] = block_result[k, l]
You should be able to generalize from there. I hope I've understood correctly.
EDIT: It looks like I haven't understood correctly. I'm leaving it in hopes it spurs you to an answer. The general idea is to generate ranges of indices to operate on, and then just operate on them directly. Slicing might be helpful, too.
Ah, you asked how to create a diagonal filled with blocks. In that case:
num_diagonal_blocks = 3  # for example
for block_dim in range(num_diagonal_blocks):
    # do your block calculation, e.g. block_result = ...
    for k in range(G.shape[0]):
        for l in range(G.shape[1]):
            result[3*block_dim + k, 3*block_dim + l] = block_result[k, l]  # assign to element of block
I think that's nearly it.
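Following up on the slicing hint: assigning each 3x3 block through a slice avoids the "setting an array element with a sequence" error entirely. Here is a self-contained sketch with placeholder blocks (the real G isn't shown in the question, so random 3x3 matrices stand in for alpha_i and G_ij):
import numpy as np

n = 4                                                    # number of points
alphas = [np.random.rand(3, 3) for _ in range(n)]        # placeholder alpha_i blocks
G_blocks = [[np.random.rand(3, 3) for _ in range(n)]     # placeholder G_ij blocks
            for _ in range(n)]

W_mat = np.zeros((3*n, 3*n))
for i in range(n):
    for j in range(n):
        if i == j:
            continue                                     # leave the diagonal blocks zero, as in the question
        block = alphas[i] @ G_blocks[i][j] @ alphas[j]   # 3x3 block for position (i, j)
        W_mat[3*i:3*(i+1), 3*j:3*(j+1)] = block          # assign the whole block at once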