Machine Learning - implementing a Gradient Descent in Python from Octave code

I am trying to implement in Python, from scratch, a gradient descent function that I have already implemented and that works in GNU Octave. Unfortunately I am stuck: I fiddled with it for a while and checked the NumPy documentation, but so far no luck.
I am aware of libraries such as scikit-learn, however my purpose is to learn to code such a function from scratch. Perhaps I am going about it the wrong way.
Below you will find all the code necessary to reproduce the error.
Thanks in advance for your help.
Actual result: the test fails with the error: "ValueError: matmul: Input operand 0 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)"
Expected result: an array with the values [5.2148, -0.5733]
Function gradientDescent() in Octave:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        theta = theta - (alpha/m)*X'*(X*theta-y);
        J_history(iter) = computeCost(X, y, theta);
    end
end
Function gradient_descent() in Python:
from numpy import zeros

def compute_cost(X, y, theta):
    m = len(y)
    ans = (X.T @ theta).T - y
    J = (ans @ ans.T) / (2 * m)
    return J[0, 0]

def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    J_history = zeros((num_iters, 1), dtype=int)
    for iter in range(num_iters):
        theta = theta - (alpha / m) @ X.T @ (X @ theta - y)
        J_history[iter] = compute_cost(X, y, theta)
    return theta
The test file, test_ml_utils.py:
import unittest
import numpy as np
from ml.ml_utils import compute_cost, gradient_descent

class TestGradientDescent(unittest.TestCase):
    # TODO: implement tests for Gradient Descent function
    # [theta J_hist] = gradientDescent([1 5; 1 2; 1 4; 1 5],[1 6 4 2]',[0 0]',0.01,1000);
    def test_gradient_descent_00(self):
        X = np.array([[1, 5], [1, 2], [1, 4], [1, 5]])
        y = np.array([1, 6, 4, 2])
        theta = np.zeros(2)
        alpha = 0.01
        num_iter = 1000
        r_theta = np.array([5.2148, -0.5733])
        result = gradient_descent(X, y, theta, alpha, num_iter)
        self.assertEqual((round(result, 4), r_theta), 'Result is wrong!')

if __name__ == '__main__':
    unittest.main()

The __matmul__ operator @ in Python binds more tightly than -. That means in (alpha / m) @ X.T you're trying to do matrix multiplication with the operands (alpha / m), which is a scalar, and X.T, which is a matrix. See the operator precedence table in the Python documentation.
In the Octave code, (alpha/m)*X' is doing scalar multiplication, not matrix multiplication, so if you want that same behavior in Python, use * rather than @. This is because Octave overloads the * operator to perform scalar multiplication if one operand is a scalar, but matrix multiplication if both operands are matrices.
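For instance, a minimal corrected sketch of the two functions (keeping the question's signatures and assuming, as in the unit test, that theta and y are 1-D arrays):

import numpy as np

def compute_cost(X, y, theta):
    # mean squared error cost for 1-D theta and y
    m = len(y)
    residual = X @ theta - y  # matrix-vector product
    return (residual @ residual) / (2 * m)

def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    J_history = np.zeros(num_iters)  # float dtype, so costs are not truncated to int
    for i in range(num_iters):
        # scalar factor uses *, array products use @
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        J_history[i] = compute_cost(X, y, theta)
    return theta

With the data from the unit test, 1000 iterations at alpha = 0.01 should reproduce the expected [5.2148, -0.5733].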

Adding to Adam's answer (which is correct regarding the error you're getting): more generally, this code is hard to follow for a reader without some sort of hint (whether programmatic or in the form of a comment) about the dimensions the different variables are supposed to take.
E.g., there is a hint in the code that y is likely to be 2-dimensional, yet you're using len to get its size. Just as an example of how this might fail silently, consider this:
>>> y = numpy.array([[1,2,3,4,5]])
>>> len( y )
1
whereas presumably you want
>>> numpy.shape( y )
(1, 5)
or
>>> numpy.size( y )
5
I note in your unit test that you're passing a rank-1 vector instead of a rank-2 one, so it turns out y is 1-D and combines with the 2-D X via broadcasting. Your code therefore works despite the logic implied, but in the absence of making such things explicit, this is a runtime error waiting to happen.
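One way to make the expectations explicit at the top of the function is a couple of shape assertions (a sketch; the messages are hypothetical choices of mine):

def gradient_descent(X, y, theta, alpha, num_iters):
    m, n = X.shape  # m training examples, n features
    assert y.shape == (m,), 'y must be 1-D with one entry per training example'
    assert theta.shape == (n,), 'theta must be 1-D with one entry per feature'
    # ... rest of the function as before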

Related

Computing covariance matrix of complex array with defined function is not matching while comparing with np.cov

I am trying to write a simple covariance matrix function in Python.
import numpy as np

def manual_covariance(x):
    mean = x.mean(axis=1)
    print(x.shape[1])
    cov = np.zeros((len(x), len(x)), dtype='complex64')
    for i in range(len(mean)):
        for k in range(len(mean)):
            s = 0
            for j in range(len(x[1])):  # 5 Col
                s += np.dot((x[i][j] - mean[i]), (x[k][j] - mean[i]))
            cov[i, k] = s / ((x.shape[1]) - 1)
    return cov
With this function if I compute the covariance of:
A = np.array([[1, 2], [1, 5]])
man_cov = manual_covariance(A)
num_cov = np.cov(A)
my answer matches np.cov() and there is no problem. But when I use complex numbers instead, my answer does not match np.cov():
A = np.array([[1+1j, 1+2j], [1+4j, 5+5j]])
man_cov = manual_covariance(A)
num_cov = np.cov(A)
Manual result:
[[-0.5+0.j -0.5+2.j]
[-0.5+2.j 7.5+4.j]]
Numpy cov result:
[[0.5+0.j 0.5+2.j]
[0.5-2.j 8.5+0.j]]
I have tried printing every statement, to check where it can go wrong, but I am not able to find a fault.
It is because the dot product of two complex vectors z1 and z2 is defined as z1 · z2*, where * denotes conjugation. If you use s += np.dot((x[i][j] - mean[i]), np.conj(x[k][j] - mean[i])) you should get the correct result, where we have used NumPy's conjugate function (np.conj).
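Putting the fix into the question's function gives the following sketch (it also uses mean[k] in the second factor, which is what the covariance formula intends; with mean[i] the result happens to be the same only because each row's deviations sum to zero):

import numpy as np

def manual_covariance(x):
    mean = x.mean(axis=1)
    cov = np.zeros((len(x), len(x)), dtype='complex64')
    for i in range(len(mean)):
        for k in range(len(mean)):
            s = 0
            for j in range(x.shape[1]):
                # conjugate the second factor, as in z1 . z2*
                s += (x[i, j] - mean[i]) * np.conj(x[k, j] - mean[k])
            cov[i, k] = s / (x.shape[1] - 1)
    return cov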

How to apply crank-nicolson method in python to a wave equation like schrodinger's

I'm trying to do a particle in a box simulation with no potential field. It took me some time to find out that simple explicit and implicit methods break unitary time evolution, so I resorted to Crank-Nicolson, which is supposed to be unitary. But when I try it I find that it still is not. I'm not sure what I'm missing. The formulation I used is this:

(I - alpha*T) psi(t+dt) = (I + alpha*T) psi(t)

where T is the tridiagonal Toeplitz matrix for the second derivative wrt x and alpha = (i*dt) / (2*dx^2). The system simplifies to A psi(t+dt) = B psi(t), where A and B are tridiagonal matrices with 1+2*alpha (respectively 1-2*alpha) on the main diagonal and -alpha (respectively alpha) on the off-diagonals. I just solve this linear system for psi(t+dt) using the sparse module. The math makes sense and I found the same numeric scheme in some papers, so that led me to believe my code is where the problem is.
Here's my code so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import toeplitz
from scipy.sparse.linalg import spsolve
from scipy import sparse

# Spatial discretisation
N = 100
x = np.linspace(0, 1, N)
dx = x[1] - x[0]

# Time discretisation
K = 10000
t = np.linspace(0, 10, K)
dt = t[1] - t[0]

alpha = (1j * dt) / (2 * (dx ** 2))
A = sparse.csc_matrix(toeplitz([1 + 2 * alpha, -alpha, *np.zeros(N - 4)]), dtype=np.cfloat)  # 2 less for both boundaries
B = sparse.csc_matrix(toeplitz([1 - 2 * alpha, alpha, *np.zeros(N - 4)]), dtype=np.cfloat)

# Initial and boundary conditions (localized gaussian)
psi = np.exp((1j * 50 * x) - (200 * (x - .5) ** 2))
b = B.dot(psi[1:-1])
psi[0], psi[-1] = 0, 0

for index, step in enumerate(t):
    # Within the domain
    psi[1:-1] = spsolve(A, b)
    # Enforce boundaries
    # psi[0], psi[N - 1] = 0, 0
    b = B.dot(psi[1:-1])

# Square integration to show if it's unitary
print(np.trapz(np.abs(psi) ** 2, dx=dx))
You are relying on the toeplitz constructor to produce a symmetric matrix, so that the entries below the diagonal are the same as those above it. However, the documentation for scipy.linalg.toeplitz(c, r=None) says not "transpose", but
"If r is not given, r == conjugate(c) is assumed."
so that the resulting matrix is self-adjoint (Hermitian). For your complex first column this means that the entries above the diagonal have their sign switched.
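A quick illustration of the pitfall with a hypothetical 2x2 example:

from scipy.linalg import toeplitz

M = toeplitz([1 + 2j, -2j])  # only the first column is given
# M[1, 0] is -2j as requested, but M[0, 1] is conj(-2j) = +2j,
# because the implied first row is conjugate(c).
# Passing the first row explicitly avoids the conjugation:
M = toeplitz([1 + 2j, -2j], [1 + 2j, -2j])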
It also makes no sense to first construct a dense matrix and then extract a sparse representation from it. Construct the matrices as sparse tridiagonal matrices from the start, using scipy.sparse.diags:
A = sparse.diags([(N - 3) * [-alpha], (N - 2) * [1 + 2 * alpha], (N - 3) * [-alpha]], [-1, 0, 1], format="csc")
B = sparse.diags([(N - 3) * [alpha], (N - 2) * [1 - 2 * alpha], (N - 3) * [alpha]], [-1, 0, 1], format="csc")
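Equivalently, since scipy.sparse.diags broadcasts scalar diagonal values when a shape is given, a slightly shorter sketch:

A = sparse.diags([-alpha, 1 + 2 * alpha, -alpha], [-1, 0, 1], shape=(N - 2, N - 2), format="csc")
B = sparse.diags([alpha, 1 - 2 * alpha, alpha], [-1, 0, 1], shape=(N - 2, N - 2), format="csc")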

scipy's sparse eigs yielding wrong eigenvectors

I am a bit confused regarding the following issue:
I am computing fixed points of a quantum channel, which means I want to compute the leading eigenvector of a specific matrix. The matrix has dimension n^2 x n^2 and is defined in such a way that its leading eigenvector, reshaped to an n x n matrix, is a positive matrix (self-adjoint with positive eigenvalues).
If I do this with scipy.sparse.linalg.eigs, however, I get wrong results. The exact computation (using scipy.linalg.eig) works fine. I tried playing with the arguments k and ncv of the solver, but didn't get it working properly unless I set k=n**2, in which case eigs just falls back to eig. This, however, won't work in the case I actually have in mind, where the channel (super_op in the script below) is encoded as a LinearOperator. So I rely on using eigs :/
Anybody any idea how to get this run properly?
Thanks to everybody in advance!
import numpy as np
from numpy.random import rand
from numpy import tensordot as td
from scipy.sparse.linalg import eigs
from scipy.linalg import eig
n = 16
d = 3
kraus_op = .5 - rand(n, d, n) + 1j * (.5 - rand(n, d, n))
super_op = td(kraus_op, kraus_op.conj(), [[1], [1]]).transpose(0, 2, 1, 3)
########
# Sparse
########
vals, vecs = eigs(super_op.reshape(n**2, n**2), k=n*(n-1), which='LM')
rho = vecs[:,0].reshape(n, n)
print('is self adjoint: ', np.allclose(rho, rho.conj().T))
super_op_times_rho = td(super_op, rho, [[2, 3], [0, 1]])
print('super_op(rho) == lambda * rho :', np.allclose(rho, super_op_times_rho/vals[0]))
########
# Exact
########
vals, vecs = eig(super_op.reshape(n**2, n**2))
rho = vecs[:,0].reshape(n, n)
print('is self adjoint: ', np.allclose(rho, rho.conj().T))
super_op_times_rho = td(super_op, rho, [[2, 3], [0, 1]])
print('super_op(rho) == lambda * rho :', np.allclose(rho, super_op_times_rho/vals[0]))
the result is:
is self adjoint: False
super_op(rho) == lambda * rho : True
is self adjoint: True
super_op(rho) == lambda * rho : True
For completeness:
Python 3.5.2
numpy 1.16.1
scipy 1.2.1
After all I found the solution, with some help from my colleagues:
While eig gives the eigenvectors (1) sorted by eigenvalue magnitude and (2) normalised such that the first entry is real, eigs uses a different ordering, which need not match that of eig, and it also does not regularize the complex phase of the eigenvectors. Correcting the phase is easily done by dividing the vector by the phase of its first entry (this makes the first diagonal element of the reshaped matrix real, which removes the freedom in choosing the complex phase of an eigenvector and makes Hermiticity possible).
So the corrected code snippet for the sparse case is:
vals, vecs = eigs(super_op.reshape(n**2, n**2), k=n*(n-1), which='LM')
# find the index corresponding to the largest-magnitude eigenvalue
arg = np.argmax(np.abs(vals))
rho = vecs[:, arg].reshape(n, n)
# regularize the complex phase so that rho[0, 0] is real
rho *= np.abs(rho[0, 0]) / rho[0, 0]
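With the phase regularised, the self-adjointness check from the question should now pass:

print('is self adjoint: ', np.allclose(rho, rho.conj().T))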

How to write a function, that generates a vector recursively in Python?

How can I write a recursive function to generate a vector X of size (1,n) as follows, where X_i is the i-th entry:
X_1 = Z_1 * E_1
X_i = max{B_(1,i) * X_1, ... , B_((i-1),i) * X_(i-1), Z_i} * E_i, i = 2,...,n,
where
Z = np.random.normal(0, 1, size=n)
E = np.random.lognormal(0, 1, size=n)
B = np.random.uniform(0, 1, (n, n))
I do not have any experience with recursive functions, which is why I cannot present any code I tried to solve this with.
If you're working with numpy, then use all the power of numpy, not just the random module ;)
And if you work with vectors, then forget about recursion and use numpy's vectorised operations. For example, np.max gives you the maximum over an axis, np.maximum the element-wise maximum of two arrays, and * element-wise multiplication. You also have np.prod for the product of array elements over a given axis... Those are just examples that might fit your problem well. For the full documentation, see https://docs.scipy.org/doc/numpy/
I got it; one does not need recursion, as @meowgoesthedog stated in the first comment.
import numpy as np

s = 1000  # sample size
n = 5
Z = np.random.normal(0, 1, size=(s, n))
B = np.random.uniform(0, 1, (n, n))
E = np.random.lognormal(0, 1, size=(s, n))

X = np.zeros((s, n))
X[:, 0] = Z[:, 0] * E[:, 0]
for k in range(s):
    for l in range(1, n):
        X[k, l] = max(np.max(X[k, :l] * B[:l, l]), Z[k, l]) * E[k, l]
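Following the vectorisation advice in the earlier answer, the loop over the s samples can be removed as well (a sketch using the same Z, B and E as above):

X = np.zeros((s, n))
X[:, 0] = Z[:, 0] * E[:, 0]
for l in range(1, n):
    # broadcast B[:l, l] across all s samples and take the row-wise max
    X[:, l] = np.maximum(np.max(X[:, :l] * B[:l, l], axis=1), Z[:, l]) * E[:, l]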

Conflict between vectorization/broadcasting and solving an ODE with solve_ivp

Using a NumPy array and vectorization, I'm trying to create a population of n different individuals, with each individual having three properties: alpha, beta, and phenotype (the phenotype being calculated as the steady state of a differential equation that involves alpha and beta). So, I want each individual to have its own phenotype.
However, my code produces the same phenotype for every individual. Moreover, this unwanted behavior only occurs if there happen to be exactly n entries in solve_ivp's y0 array (which here is [0, 1]) -- otherwise, a broadcasting error is produced:
ValueError: operands could not be broadcast together with shapes (2,) (3,)
Here's the code:
import numpy as np
from scipy.integrate import solve_ivp

def create_population(n):
    """creates a population of n individuals"""
    pop = np.zeros(n, dtype=[('alpha', '<f8'), ('beta', '<f8'), ('phenotype', '<f8')])
    pop['alpha'] = np.random.randn(n)
    pop['beta'] = np.random.randn(n) + 5

    def phenotype(n):
        """creates the phenotype"""
        def pheno_ode(t_ode, y):
            """defines the ode for the phenotype"""
            dydt = 0.123 - y + pop['alpha'] * (y ** pop['beta'] / (1 + y ** pop['beta']))
            return dydt
        t_end = 1e06
        sol = solve_ivp(pheno_ode, [0, t_end], [0, 1], method='BDF')
        return sol.y[0][-1]  # last entry is assumed to be the steady state

    pop['phenotype'] = phenotype(n)
    return pop

popul = create_population(3)
print(popul)
In contrast, if the phenotype is calculated from alpha and beta via a "simple" equation, then vectorization works fine:
def phenotype(n):
    """creates the phenotype"""
    phenotype_simple = 2 * pop['alpha'] + pop['beta']
    return phenotype_simple
There are two problems that I can see:
First, you have the initial condition for the ODE set to [0, 1]. This sets the size of the solution vector in solve_ivp to 2, regardless of the value of n. However, the arrays pop['alpha'] and pop['beta'] have length n, and in your script you call create_population with n set to 3. So you have a mismatch in the array shapes in the formula for dydt: y has length 2, but pop['alpha'] and pop['beta'] have length 3. That causes the error that you see.
You can fix this by using, say, np.ones(n) instead of [0, 1] as the initial condition in your call to solve_ivp.
The second problem is in the statement return sol.y[0][-1] in the function phenotype(n). sol.y has shape (n, num_points), where num_points is the number of points computed by solve_ivp. So sol.y[0] is just the first component of the solution, and sol.y[0][-1] is the last value of the solution for the first component. It is a scalar, so when you execute pop['phenotype'] = phenotype(n), you are assigning the same value (the steady state of the first component) to all the phenotypes.
The return statement should be return sol.y[:, -1]. That returns the last column of the solution array (i.e. all the steady state phenotypes).
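Putting both fixes together, a sketch of the corrected inner function (keeping the names from the question):

def phenotype(n):
    """creates one phenotype per individual"""
    def pheno_ode(t_ode, y):
        """defines the ode for the phenotype; y has one component per individual"""
        return 0.123 - y + pop['alpha'] * (y ** pop['beta'] / (1 + y ** pop['beta']))
    t_end = 1e06
    sol = solve_ivp(pheno_ode, [0, t_end], np.ones(n), method='BDF')
    return sol.y[:, -1]  # last column: the steady state of every component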
