scipy optimize one iteration at a time - python

I want to control the objective of my optimization as a function of the number of iterations. In my real problem, I have a complicated regularization term that I want to control using the iteration number.
Is it possible to call a scipy optimizer one iteration at a time, or at least to be able to access the iteration number in the objective function?
Here is an example showing my best attempt so far:
from scipy.optimize import fmin_slsqp
from scipy.optimize import minimize as mini
import numpy as np
# define objective function
# x is the design input
# iteration is the iteration number
# the idea is that I want to control a regularization term using the iteration number
def objective(x, iteration):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2 + 10 * np.sum(x ** 2) / iteration

x = np.ones(2) * 5
for ii in range(20):
    x = fmin_slsqp(objective, x, iter=1, args=(ii,), iprint=0)
    if ii == 5: print('at iteration 5, I expect to get ~ [0, 0], but I get', x)

truex = mini(objective, np.ones(2) * 5, args=(200,)).x
print('the final result is ', x, 'instead of the correct answer, which is close to [1, 1] (', truex, ')')
output:
at iteration 5, I expect to get ~ [0, 0], but I get [5. 5.]
the final result is [5. 5.] instead of the correct answer, [1, 1] ([0.88613989 0.78485145])

No, I don't think scipy offers this option.
Interestingly, pytorch does. See this example of optimizing one iteration at a time:
import numpy as np
import torch

# define the rosenbrock function
a = 1
b = 5
def f(x):
    return (a - x[0]) ** 2 + b * (x[1] - x[0] ** 2) ** 2

# create a stochastic version of the rosenbrock function
def f_rand(x):
    return f(x) * np.random.uniform(0.5, 1.5)

x = np.array([0.1, 0.1])
x0 = x.copy()

x_tensor = torch.tensor(x0, requires_grad=True)
learning_rate = 0.1  # assumed value; the step size for Adam
optimizer = torch.optim.Adam([x_tensor], lr=learning_rate)

def closure():
    # evaluate the (stochastic) loss and populate gradients for one step
    optimizer.zero_grad()
    loss = f_rand(x_tensor)
    loss.backward()
    return loss

# optimize one iteration at a time
for ii in range(200):
    optimizer.step(closure)

print('optimal solution found: ', x_tensor, f(x_tensor))
If you really need to use scipy, you can make a class to count iterations, though you should be careful when mixing this with an algorithm that is approximating the inverse hessian matrix.
from scipy.optimize import fmin_slsqp
from scipy.optimize import minimize as mini
import numpy as np
# define objective function
# x is the design input
# iteration is the iteration number
# the idea is that I want to control a regularization term using the iteration number
def objective(x):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2 + 10 * np.sum(x ** 2)

class myclass:
    def __init__(self):
        self.iteration = 0

    def call(self, x):
        self.iteration += 1
        return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2 + 10 * np.sum(x ** 2) / self.iteration

x = np.ones(2) * 5
obj = myclass()
x = fmin_slsqp(obj.call, x, iprint=0)
truex = mini(objective, np.ones(2) * 5).x
print('the final result is ', x, ', which is not the correct answer, and is not close to [1, 1] (', truex, ')')
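For completeness, a further sketch of the same counting idea: scipy.optimize.minimize also accepts a callback that is invoked once per optimizer iteration, so the counter can track iterations rather than objective evaluations (SLSQP typically evaluates the objective several times per iteration, e.g. for finite-difference gradients). The same caveat about changing the objective mid-optimization applies.
import numpy as np
from scipy.optimize import minimize

class IterationCounter:
    def __init__(self):
        self.iteration = 1  # start at 1 so the regularization term is always defined

    def callback(self, xk):
        # called by minimize once per completed iteration, not once per function evaluation
        self.iteration += 1

    def objective(self, x):
        return ((1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
                + 10 * np.sum(x ** 2) / self.iteration)

counter = IterationCounter()
res = minimize(counter.objective, np.ones(2) * 5, method='SLSQP',
               callback=counter.callback)
print(res.x, counter.iteration)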

Related

Numpy - vectorize the bivariate poisson pmf equation

I'm trying to write a function to evaluate the probability mass function for the bivariate poisson distribution.
This is easy when all of the parameters (x, y, theta1, theta2, theta0) are scalars, but tricky to scale up without loops to allow these parameters to be vectors. I need it to scale such that, for:
theta0 being a scalar - the "correlation parameter" in the equation
theta1 and theta2 having length l
x, y both having length n
the output array would have shape (l, n, n). For example, a slice [j, :, :] from the output array would look like:
I think I've figured out the first part (the constant, before the summation):
import numpy as np
from scipy.special import factorial
def constant(theta1, theta2, theta0, x, y):
    exponential_part = np.exp(-(theta1 + theta2 + theta0)).reshape(-1, 1, 1)
    x = np.tile(x, (len(x), 1)).transpose()
    y = np.tile(y, (len(y), 1))
    double_factorial = (np.power(np.array(theta1).reshape(-1, 1, 1), x) / factorial(x)) * \
                       (np.power(np.array(theta2).reshape(-1, 1, 1), y) / factorial(y))
    return exponential_part * double_factorial
But I'm struggling with the summation part. How can I vectorize a summation where the limits depend on variable arrays?
I think I have this figured out, based on the approach that #w-m suggests: calculate every possible summation term that could appear, up to the maximum x or y value present, then use a mask to discard the ones you don't want. Assuming your x and y values run from 0 to N in consecutive order, this computes up to three times more terms than are strictly required, but that cost is offset by being able to vectorize.
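To make the masking idea concrete before the full implementation below, here is a toy sketch (made-up values) of summing an index i from 0 up to a per-element limit without a Python loop:
import numpy as np

# per-element upper limits for the summation index i (min(x, y) in the real problem)
sum_limit = np.array([[0, 2],
                      [1, 3]])

# one layer per candidate value of i, broadcast against the limits
i = np.arange(sum_limit.max() + 1).reshape(-1, 1, 1)      # shape (4, 1, 1)
terms = (i + 1.0) * np.ones_like(sum_limit, dtype=float)  # stand-in per-term values, shape (4, 2, 2)
mask = i <= sum_limit                                      # True only where the term belongs in the sum

print((terms * mask).sum(axis=0))
# [[ 1.  6.]
#  [ 3. 10.]]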
Reference implementation
I wrote this by first writing a pure-Python reference implementation, which just implements your problem using loops. With 4 nested loops, it's not exactly fast, but it's handy to have while testing the numpy version.
import numpy as np
from scipy.special import factorial, comb
import operator as op
from functools import reduce
def choose(n, r):
    # https://stackoverflow.com/a/4941932/530160
    r = min(r, n - r)
    numer = reduce(op.mul, range(n, n - r, -1), 1)
    denom = reduce(op.mul, range(1, r + 1), 1)
    return numer // denom  # or / in Python 2

def reference_impl_constant(s_theta1, s_theta2, s_theta0, s_x, s_y):
    # Cast to float to prevent overflow
    s_theta1 = float(s_theta1)
    s_theta2 = float(s_theta2)
    s_theta0 = float(s_theta0)
    s_x = float(s_x)
    s_y = float(s_y)
    term1 = np.exp(-(s_theta1 + s_theta2 + s_theta0))
    term2 = (s_theta1 ** s_x / factorial(s_x))
    term3 = (s_theta2 ** s_y / factorial(s_y))
    assert term1 >= 0
    assert term2 >= 0
    assert term3 >= 0
    return term1 * term2 * term3

def reference_impl_constant_loop(theta1, theta2, theta0, x, y):
    theta_len = theta1.shape[0]
    xy_len = x.shape[0]
    constant_array = np.zeros((theta_len, xy_len, xy_len))
    for i in range(theta_len):
        for j in range(xy_len):
            for k in range(xy_len):
                s_theta1 = theta1[i]
                s_theta2 = theta2[i]
                s_theta0 = theta0
                s_x = x[j]
                s_y = y[k]
                constant_term = reference_impl_constant(s_theta1, s_theta2, s_theta0, s_x, s_y)
                assert constant_term >= 0
                constant_array[i, j, k] = constant_term
    return constant_array

def reference_impl_summation(s_theta1, s_theta2, s_theta0, s_x, s_y):
    sum_ = 0
    for i in range(min(s_x, s_y) + 1):
        sum_ += choose(s_x, i) * choose(s_y, i) * factorial(i) * ((s_theta0 / s_theta1 / s_theta2) ** i)
    assert sum_ >= 0
    return sum_

def reference_impl_summation_loop(theta1, theta2, theta0, x, y):
    theta_len = theta1.shape[0]
    xy_len = x.shape[0]
    summation_array = np.zeros((theta_len, xy_len, xy_len))
    for i in range(theta_len):
        for j in range(xy_len):
            for k in range(xy_len):
                s_theta1 = theta1[i]
                s_theta2 = theta2[i]
                s_theta0 = theta0
                s_x = x[j]
                s_y = y[k]
                summation_term = reference_impl_summation(s_theta1, s_theta2, s_theta0, s_x, s_y)
                assert summation_term >= 0
                summation_array[i, j, k] = summation_term
    return summation_array

def reference_impl(theta1, theta2, theta0, x, y):
    # all array inputs must be 1D
    assert len(theta1.shape) == 1
    assert len(theta2.shape) == 1
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    # theta vectors must have same length
    theta_len = theta1.shape[0]
    assert theta2.shape[0] == theta_len
    # x and y must have same length
    xy_len = x.shape[0]
    assert y.shape[0] == xy_len
    # theta0 is scalar
    assert isinstance(theta0, (int, float))
    constant_array = reference_impl_constant_loop(theta1, theta2, theta0, x, y)
    summation_array = reference_impl_summation_loop(theta1, theta2, theta0, x, y)
    output = constant_array * summation_array
    return output
Numpy implementation
I split the implementation of this across two functions.
The fast_constant() function calculates everything to the left of the summation symbol. The fast_summation() function calculates everything inside the summation symbol.
import numpy as np
from scipy.special import factorial, comb
def fast_summation(theta1, theta2, theta0, x, y):
    x = np.tile(x, (len(x), 1)).transpose()
    y = np.tile(y, (len(y), 1))
    sum_limit = np.minimum(x, y)
    max_sum_limit = np.max(sum_limit)
    i = np.arange(max_sum_limit + 1).reshape(-1, 1, 1)
    summation_mask = (i <= sum_limit)
    theta_ratio = (theta0 / (theta1 * theta2)).reshape(-1, 1, 1, 1)
    theta_to_power = np.power(theta_ratio, i)
    terms = comb(x, i) * comb(y, i) * factorial(i) * theta_to_power
    # mask out terms which aren't part of the sum
    terms *= summation_mask
    # axis 0 is theta
    # axis 1 is i
    # axis 2 & 3 are x and y
    # so sum across axis 1
    terms = terms.sum(axis=1)
    return terms

def fast_constant(theta1, theta2, theta0, x, y):
    theta1 = theta1.astype('float64')
    theta2 = theta2.astype('float64')
    exponential_part = np.exp(-(theta1 + theta2 + theta0)).reshape(-1, 1, 1)
    # x and y must be 1D
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    # x and y must have same shape
    assert x.shape == y.shape
    x_len, y_len = x.shape[0], y.shape[0]
    x = x.reshape((x_len, 1))
    y = y.reshape((1, y_len))
    double_factorial = (np.power(np.array(theta1).reshape(-1, 1, 1), x) / factorial(x)) * \
                       (np.power(np.array(theta2).reshape(-1, 1, 1), y) / factorial(y))
    return exponential_part * double_factorial

def fast_impl(theta1, theta2, theta0, x, y):
    return fast_summation(theta1, theta2, theta0, x, y) * fast_constant(theta1, theta2, theta0, x, y)
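A quick sanity check of the vectorized version against the loop-based reference, on small made-up inputs:
theta1 = np.array([1.0, 2.0])
theta2 = np.array([1.5, 2.5])
theta0 = 0.5
x = np.arange(5)
y = np.arange(5)

ref = reference_impl(theta1, theta2, theta0, x, y)
fast = fast_impl(theta1, theta2, theta0, x, y)
print(ref.shape, fast.shape)   # (2, 5, 5) (2, 5, 5)
print(np.allclose(ref, fast))  # True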
Benchmarking
Assuming that X and Y range from 0 to 20, and that theta is centered somewhere inside that range, I get the result that the numpy version is roughly 280 times faster than the pure python reference.
Numerical stability
I'm unsure how numerically stable this is. For example, when I center theta at 100, I get a floating-point overflow. Typically, when computing an expression which has lots of choose and factorial expressions inside it, you'll use some mathematical equivalent which results in smaller intermediate sums. In this case I have so little understanding of the math that I don't know how you'd do that.
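For what it's worth, the usual trick is to do the arithmetic in log space: each factorial and binomial coefficient becomes a difference of scipy.special.gammaln terms, and the sum over i becomes a logsumexp, so no intermediate value ever overflows. A scalar sketch of the idea (assuming theta0 > 0; not benchmarked or vectorized here):
import numpy as np
from scipy.special import gammaln, logsumexp

def log_pmf_scalar(theta1, theta2, theta0, x, y):
    # log of the constant part: exp(-(t1 + t2 + t0)) * t1**x / x! * t2**y / y!
    log_const = (-(theta1 + theta2 + theta0)
                 + x * np.log(theta1) - gammaln(x + 1)
                 + y * np.log(theta2) - gammaln(y + 1))
    # log of each summation term: C(x, i) * C(y, i) * i! * (t0 / (t1 * t2))**i
    i = np.arange(min(x, y) + 1)
    log_binom_x = gammaln(x + 1) - gammaln(i + 1) - gammaln(x - i + 1)
    log_binom_y = gammaln(y + 1) - gammaln(i + 1) - gammaln(y - i + 1)
    log_terms = (log_binom_x + log_binom_y + gammaln(i + 1)
                 + i * np.log(theta0 / (theta1 * theta2)))
    return log_const + logsumexp(log_terms)

# exponentiate only at the very end
print(np.exp(log_pmf_scalar(100.0, 100.0, 1.0, 120, 110)))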

Network Cost Function code Python Implementation

I was implementing Andrew Ng's ML course in Python, and in week 5, exercise 4, I was referring to some code. What I didn't understand was the need to use np.trace() in the final output; I'm having trouble visualising the matrices.
import numpy as np
from scipy.optimize import minimize
import scipy.io
import matplotlib.pyplot as plt
data_dict = scipy.io.loadmat('ex4_orig_octave/ex4data1.mat')
X = data_dict['X']
y = data_dict['y'].ravel()
M = X.shape[0]
N = X.shape[1]
L = 26 # = number of nodes in the hidden layer (including bias node)
K = len(np.unique(y))
X = np.hstack((np.ones((M, 1)), X))
Y = np.zeros((M, K), dtype='uint8')
for i, row in enumerate(Y):
    Y[i, y[i] - 1] = 1
weights_dict = scipy.io.loadmat('ex4_orig_octave/ex4weights.mat')
theta_1 = weights_dict['Theta1']
theta_2 = weights_dict['Theta2']
nn_params_saved = np.concatenate((theta_1.flatten(), theta_2.flatten()))
def nn_cost_function(nn_params, X, Y, M, N, L, K):
    """Python version of nnCostFunction.m after completing 'Part 1'."""
    # Unroll the parameter vector.
    theta_1 = nn_params[:(L - 1) * (N + 1)].reshape(L - 1, N + 1)
    theta_2 = nn_params[(L - 1) * (N + 1):].reshape(K, L)
    # Calculate activations in the second layer.
    a_2 = sigmoid(theta_1.dot(X.T))
    # Add the second layer's bias node.
    a_2_p = np.vstack((np.ones(M), a_2))
    # Calculate the activation of the third layer.
    a_3 = sigmoid(theta_2.dot(a_2_p))
    # Calculate the cost function.
    cost = 1 / M * np.trace(- Y.dot(np.log(a_3)) - (1 - Y).dot(np.log(1 - a_3)))
    return cost

cost_saved = nn_cost_function(nn_params_saved, X, Y, M, N, L, K)
print('Cost at parameters (loaded from ex4weights): %.6f' % cost_saved)
print('(this value should be about 0.287629)')
The operation 1/M * np.trace() is calculating the average cost over a batch of size M.
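One way to see why the trace shows up: the (i, i) entry of Y.dot(np.log(a_3)) is the cost contribution of training example i, so the trace adds up the per-example costs, and in general np.trace(A.dot(B)) equals np.sum(A * B.T). A quick numerical check of that identity:
import numpy as np

M, K = 5000, 10
Y = np.random.uniform(size=(M, K))   # stand-in for the one-hot labels, shape (M, K)
A = np.random.uniform(size=(K, M))   # stand-in for a_3, shape (K, M)

lhs = np.trace(Y.dot(np.log(A)))     # builds a full (M, M) product and keeps only its diagonal
rhs = np.sum(Y * np.log(A).T)        # the same sum without the (M, M) intermediate
print(np.isclose(lhs, rhs))          # True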
A bit less readable, but significantly faster, is
np.sum(np.sum(np.multiply(Y, np.log(a_3.T)), axis=1), axis=0)
if Y.shape == (M, K) and a_3.shape == (K, M):
import numpy as np
import timeit

Y = lambda: np.random.uniform(size=(5000, 10))   # (M, K)
a3 = lambda: np.random.uniform(size=(10, 5000))  # (K, M)

timeit.timeit('import numpy as np; np.trace(Y().dot(a3()))', number=10, globals=globals())
# 0.5633535870001651
timeit.timeit('import numpy as np; np.sum(np.sum(np.multiply(Y(),a3().T),axis=1),axis=0)', number=10, globals=globals())
# 0.013223066000136896

Poisson (npr) Size Alteration Returns ValueError (wrt arbitrary paths and array creation)

If I sample a non-central chi-square distribution using a Poisson distribution, I am unable to alter the size and can only input the mean, "nc / 2" (I must set size = 1 or it also returns the same error):
n = np.random.poisson(nc / 2, 1) # generates a random variable from the poisson distribution with
# mean: non-centrality parameter / 2
x[t] = c * mp.nsum(lambda i: np.random.standard_normal() ** 2, [0, v + 2 * n])
If I attempt to increase the size to the number of simulations being run
n = np.random.poisson(nc / 2, simulations)
where simulations = 10000, I receive:
"ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
Running the code with 1 simulation produces one desired result, and every run produces another random path.
Graph created under 10,000 simulations with size = one
However, I need the graph to be composed of paths determined by each iteration of the simulation. Under a different condition, the non-central chi-square distribution is determined by the code:
x[t] = c * ((np.random.standard_normal(simulations) + nc ** 0.5) ** 2 + mp.nsum(
lambda i: np.random.standard_normal(simulations) ** 2, [0, v - 1]))
which does produce the desired result
Graph produced by the line of code above
How can I obtain a different path for x[t] despite not being able to change the size of the Poisson distribution (i.e. not have the same path for each of the 10,000 simulations)?
If required:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import stats
import mpmath as mp
T = 1
beta = 1.5
x0 = 0.05
q = 0
mu = x0 - q
alpha = - (2 - beta) * mu
sigma0 = 0.1
sigma = (2 - beta) * sigma0
b = - (1 - beta) / (2 * mu) * sigma ** 2
simulations = 10000
M = 50
dt = T / M
def srd_sampled_nxc2():
    x = np.zeros((M + 1, simulations))
    x[0] = x0
    for t in range(1, M + 1):
        v = 4 * b * alpha / sigma ** 2
        c = (sigma ** 2 * (1 - np.exp(-alpha * dt))) / (4 * alpha)
        nc = np.exp(-alpha * dt) / c * x[t - 1]  # the non-centrality parameter lambda
        if v > 1:
            x[t] = c * ((np.random.standard_normal(simulations) + nc ** 0.5) ** 2 + mp.nsum(
                lambda i: np.random.standard_normal(simulations) ** 2, [0, v - 1]))
        else:
            n = np.random.poisson(nc / 2, 1)
            x[t] = c * mp.nsum(lambda i: np.random.standard_normal() ** 2, [0, v + 2 * n])
    return x
x1 = srd_sampled_nxc2()
plt.figure(figsize=(10, 6))
plt.plot(x1[:, :10], lw=1)
plt.xlabel('time')
plt.ylabel('index')
plt.show()
I've realized that a beta greater than 1 makes v negative and nc very large. There was nothing to fill the array with, because no distribution could be created while v stayed negative. My impression is that b must be made positive, which resolves the negative v and allows the program to run.
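For reference, that ValueError is NumPy's standard complaint whenever an array with more than one element ends up somewhere a single True/False value is required, which is presumably what happens when the array-valued n is used inside the scalar summation bounds of mp.nsum. A minimal reproduction of the message:
import numpy as np

n = np.random.poisson(2.0, 10000)  # an array of 10,000 draws
if n + 3 > 5:                      # n + 3 > 5 is an array of booleans, not a single bool
    pass
# ValueError: The truth value of an array with more than one element is ambiguous.
# Use a.any() or a.all()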

How to fix fitting with scipy.optimize.minimize for derivation condition

I am trying to fit a function cav = p(T, x) with the constraint that the derivative of cav with respect to x at constant T is always positive, i.e. dp/dx (at constant T) > 0. The data for x, T and p come from Excel sheets; z are the coefficients I am trying to find.
I've used the solution from Fitting with constraints on derivative Python as a template. Here is my code as it stands, followed by the error message it produces:
import pandas as pd
import os
from scipy.optimize import minimize
import numpy as np
df = pd.read_excel(os.path.join(os.path.dirname(__file__), "./data.xlsx"))
T = np.array(df['T'], dtype=float)
x = np.array(df['x'], dtype=float)
p = np.array(df['p'], dtype=float)
p_s = 67
def cav(z, T, x):  # my function
    return x * p_s + x * (1 - x) * (z[0] + z[1] * T + z[2] * T ** 2 + z[3] * x + z[4] * x * T + z[5] * x * T ** 2) * p_s

def resid(p, T, x):
    return ((p - cav(T, x)) ** 2).sum()

def constr(z):
    return np.gradient(cav(z, x, T))

con1 = {'type': 'ineq', 'fun': constr}
z0 = np.array([0, 0, 0, 0, 0, 0], dtype=float)
res = minimize(resid, z0, args=(p, T, x), method='cobyla', options={'maxiter': 50000}, constraints=con1)
And the Error:
TypeError: resid() takes 3 positional arguments but 4 were given
I don't understand exactly what I have to pass as arguments to the three functions. Thanks for any help!
The error occurs because minimize always passes the current coefficient vector (your z) as the first argument to resid, followed by everything in args. With args=(p, T, x) that makes four arguments in total, but resid only accepts three.
There is a second issue: inside resid you call cav(T, x), but cav takes three arguments and never receives the coefficient vector at all. The simplest consistent fix is to let resid take the coefficients first and keep all three data arrays in args:
def resid(z, p, T, x):
    return ((p - cav(z, T, x)) ** 2).sum()

res = minimize(resid, z0, args=(p, T, x), method='cobyla', options={'maxiter': 50000}, constraints=con1)
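One more thing worth double-checking, separate from the error above: constr evaluates cav(z, x, T), while cav is defined as cav(z, T, x), so T and x are swapped inside the constraint. A sketch with the arguments in the defined order (assuming np.gradient of the fitted values is the intended stand-in for dp/dx > 0):
def constr(z):
    # evaluate the fitted surface with the arguments in cav's defined order
    return np.gradient(cav(z, T, x))

con1 = {'type': 'ineq', 'fun': constr}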

Simple Neural Network from scratch using NumPy

I added learning rate and momentum to a neural network implementation from scratch I found at: https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6
However I had a few questions about my implementation:
Is it correct? Any suggested improvements? It generally appears to produce adequate results, but outside advice is much appreciated.
With a learning rate < 0.5 or momentum > 0.9, the network tends to get stuck in a local optimum where the loss is ~1. I assume this is because the step size isn't big enough to escape it, but is there a way to overcome this? Or is it inherent to the nature of the data being solved and unavoidable?
import numpy as np
import matplotlib.pyplot as plt
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sig = 1 / (1 + np.exp(-x))
    return sig * (1 - sig)

class NeuralNetwork:
    def __init__(self, x, y):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1], 4)
        self.weights2 = np.random.rand(4, 1)
        self.y = y
        self.output = np.zeros(self.y.shape)
        self.v_dw1 = 0
        self.v_dw2 = 0
        self.alpha = 0.5
        self.beta = 0.5

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self, alpha, beta):
        # application of the chain rule to find the derivative of the loss function with respect to weights2 and weights1
        d_weights2 = np.dot(self.layer1.T, (2 * (self.y - self.output) * sigmoid_derivative(self.output)))
        d_weights1 = np.dot(self.input.T, (np.dot(2 * (self.y - self.output) *
                                                  sigmoid_derivative(self.output), self.weights2.T) *
                                           sigmoid_derivative(self.layer1)))
        # adding effect of momentum
        self.v_dw1 = (beta * self.v_dw1) + ((1 - beta) * d_weights1)
        self.v_dw2 = (beta * self.v_dw2) + ((1 - beta) * d_weights2)
        # update the weights with the derivative (slope) of the loss function
        self.weights1 = self.weights1 + (self.v_dw1 * alpha)
        self.weights2 = self.weights2 + (self.v_dw2 * alpha)

if __name__ == "__main__":
    X = np.array([[0, 0, 1],
                  [0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 1]])
    y = np.array([[0], [1], [1], [0]])
    nn = NeuralNetwork(X, y)
    total_loss = []
    for i in range(10000):
        nn.feedforward()
        nn.backprop(nn.alpha, nn.beta)
        total_loss.append(sum((nn.y - nn.output) ** 2))
    iteration_num = list(range(10000))
    plt.plot(iteration_num, total_loss)
    plt.show()
    print(nn.output)
First, in your sigmoid_derivative(x) the input is already the output of a sigmoid, but you apply the sigmoid again before computing the derivative. That is the first problem; it should be:
return x * (1 - x)
Second, you are not using any bias. How do you know your decision boundary would cross the origin of the problem's hypothesis space? You need to add a bias term (a minimal sketch follows this answer).
And lastly, I think your derivatives are not correct. You can refer to Andrew Ng's Deep Learning course 1, week 2 on coursera.org for the general formulas for computing backpropagation in neural networks, to make sure you are doing it right.
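A minimal sketch of the bias point (one common option, not the only one): give each layer an explicit bias vector that is added before the sigmoid and broadcast across the batch. In the class above this would mean extra parameters b1 and b2 with their own momentum terms, whose gradients are the corresponding error signals summed over the batch.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)

weights1 = rng.random((3, 4)); b1 = np.zeros((1, 4))  # hidden-layer weights and bias
weights2 = rng.random((4, 1)); b2 = np.zeros((1, 1))  # output-layer weights and bias

layer1 = sigmoid(X @ weights1 + b1)  # b1 broadcasts over the 4 samples
output = sigmoid(layer1 @ weights2 + b2)
print(output.shape)                  # (4, 1)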
