In a linear program I am minimizing the distance between weighted input vectors and a target vector. I used SciPy to compute values for the weights I need. Currently they are between zero and one, but I'd like a weight to be exactly zero whenever it would otherwise be smaller than, say, 0.2, so each x_i should be either 0 or in [0.2, 1]. I was pointed to mixed integer linear programming, but I still can't find an approach for my problem. How can I fix this?
TL;DR: I want to use (0, 0) or (0.2, 1) as bounds for each x. How do I implement this?
Here is my SciPy code:
import numpy as np
from scipy.optimize import minimize

# minimize the distance between weighted input vectors and a target vector
def milp_objective_function(weights):
    scaled_matrix = input_matrix * weights[:, np.newaxis]  # scale each input vector (row) by its weight
    sum_vector = scaled_matrix.sum(axis=0)                 # sum the weighted input vectors
    difference_vector = sum_vector - target_vector
    return np.sqrt(difference_vector.dot(difference_vector))  # distance between sum_vector and target_vector

# sum of weights should equal 100%
def milp_constraint(weights):
    return sum(weights) - 1

def main():
    # bounds should be 0 or [.2, 1] -> mixed integer linear programming?
    weight_bounds = tuple((0, 1) for _ in input_matrix)
    # random guess, will implement later
    initial_guess = milp_guess_weights()
    constraint_obj = {'type': 'eq', 'fun': milp_constraint}
    result = minimize(milp_objective_function, x0=initial_guess,
                      bounds=weight_bounds, constraints=constraint_obj)
Variables that are in {0} ∪ [L,U] are called semi-continuous variables. Advanced MIP solvers have built-in support for these types of variables.
Note that scipy.optimize.minimize is not a MIP solver and has no support for integer variables.
I also want to note that if your MIP solver does not support semi-continuous variables, you can simulate them with binary variables δ(i):

L ⋅ δ(i) ≤ x(i) ≤ U ⋅ δ(i)
δ(i) ∈ {0,1}
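For illustration, here is a minimal sketch of that binary-variable trick using PuLP. This is not the original poster's code: it assumes input_matrix is an n x m array with the input vectors as rows and target_vector has length m, and it swaps the Euclidean objective for the L1 norm (linearized with auxiliary variables t[j]) so that the whole problem stays linear, which a MILP requires.

import numpy as np
import pulp

n, m = input_matrix.shape  # n input vectors of dimension m (assumption)
L, U = 0.2, 1.0

prob = pulp.LpProblem("semi_continuous_weights", pulp.LpMinimize)
x = [pulp.LpVariable(f"x{i}", lowBound=0, upBound=U) for i in range(n)]
d = [pulp.LpVariable(f"d{i}", cat="Binary") for i in range(n)]
t = [pulp.LpVariable(f"t{j}", lowBound=0) for j in range(m)]  # |residual_j|

prob += pulp.lpSum(t)  # objective: L1 distance to the target

for i in range(n):     # x[i] is semi-continuous: x[i] = 0 or L <= x[i] <= U
    prob += x[i] <= U * d[i]
    prob += x[i] >= L * d[i]
prob += pulp.lpSum(x) == 1  # weights sum to 100%

for j in range(m):     # linearize |sum_i input_matrix[i, j]*x[i] - target[j]|
    residual = pulp.lpSum(float(input_matrix[i, j]) * x[i] for i in range(n)) - float(target_vector[j])
    prob += t[j] >= residual
    prob += t[j] >= -residual

prob.solve()
weights = np.array([v.value() for v in x])

If you need the Euclidean distance specifically, you would want a solver that handles quadratic objectives with integer variables (e.g. a MIQP solver); the structure of the semi-continuity constraints stays the same.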
I am trying to implement both the explicit and implicit Euler methods to approximate a solution for the following ODE:
dx/dt = -kx, where k = cos(2 pi t), and x(0) = 1
Euler's methods use finite differencing to approximate a derivative:
dx/dt = (x(t+dt) - x(t)) / dt
The forward method explicitly calculates x(t+dt) from the previous solution x(t):
x(t+dt) = x(t) + f(x(t), t)*dt
The backward method is implicit, and finds the solution x(t+dt) by solving an equation involving both the current state of the system x(t) and the later one x(t+dt):
x(t) = x(t+dt) - f(x(t+dt), t+dt)*dt
My code for approximating a solution to dx/dt = -kx, x(0) = 1 and plotting it alongside the actual solution is given below:
### Import Necessary Packages
import matplotlib.pyplot as plt
import numpy as np
%config InlineBackend.figure_format = 'retina'
plt.rcParams['figure.figsize'] = (10.0, 6.0)

### Defining Basic Data
t0 = 0                      # initial t
tf = 4*np.pi                # final t
N = 1000                    # number of grid points
t = np.linspace(t0, tf, N)  # vector of t values from t0 to tf
dt = t[1] - t[0]            # time step size
x0 = 1                      # initial x
f = lambda x, t: -np.cos(2*np.pi*t)*x  # f(x,t), the RHS of the ODE

### Define a Function for Euler's Forward Method ###
def ForwardEuler(f, x0, t):
    x = np.zeros(N)  # array for x values
    x[0] = x0
    for i in range(N-1):  # start at 0 so that x[1] is also filled in
        x[i+1] = x[i] + f(x[i], t[i])*dt
    return x

# Plot Solution
forward = ForwardEuler(f, x0, t)
actual = np.exp(-(1/(2*np.pi))*np.sin(2*np.pi*t))  # exact solution
plt.plot(t, actual, 'r-', t, forward, 'b-')
plt.legend(['Actual', 'Forward Euler'])
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.title("Solution to $x'=-kx$, $x(0)=1$")
plt.grid(True); plt.show()
My question lies in how to adapt the for-loop section of the code to implement the backward Euler method instead of the forward one. I am having trouble with this because the update equation requires you to already know x[i+1] in order to compute x[i+1].
I believe the backward for-loop would be something like what is given below, but I am unsure:
def BackwardEuler(f, x0, t):
    x[0] = x0
    for i in range(1, N-1):
        x[i] = x[i+1] - f(x[i+1], t[i+1])*dt
    return x
I have found very few resources online and am at a complete loss. Any help on this would be appreciated! Thank you!
Usually, for Backward Euler and the Trapezoidal Rule, you write the expression as an equation (or a system of equations), then solve it (i.e. find its zeros). The zeros represent the value of x[i+1].
For example, for Backward Euler, the system is:
x[i+1] = x[i] + (f(x[i+1],t[i+1]))*dt
Which you can rewrite as:
x[i+1] - x[i] - dt*f(x[i+1], t[i+1]) = 0
The values x[i] and t[i+1] are known. The only unknown is x[i+1]. You can solve this system numerically (using something like fsolve), and the solution would be your x[i+1]. It is possible, of course, that you get more than one solution. You have to select the one that fits your problem (i.e. x cannot be an imaginary number, or x cannot be negative, etc.)
The same technique can be applied for Trapezoidal Rule, with the system being:
x[i+1] - x[i] - (f(x[i],t[i]) + f(x[i+1],t[i+1]))*(dt/2) = 0
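To make that concrete, here is a minimal sketch of Backward Euler using scipy.optimize.fsolve. This is my own illustration, not the asker's code; it assumes f, x0, and the time grid t from the question's code above.

import numpy as np
from scipy.optimize import fsolve

def BackwardEuler(f, x0, t):
    x = np.zeros(len(t))
    x[0] = x0
    for i in range(len(t) - 1):
        dt = t[i+1] - t[i]
        # solve x[i+1] - x[i] - dt*f(x[i+1], t[i+1]) = 0 for x[i+1]
        g = lambda z: z - x[i] - dt*f(z, t[i+1])
        # use the forward Euler step as the initial guess for the root finder
        x[i+1] = fsolve(g, x[i] + dt*f(x[i], t[i]))[0]
    return x

As a side note, for this particular ODE f is linear in x, so the implicit equation can also be solved in closed form: x[i+1] = x[i] / (1 + dt*np.cos(2*np.pi*t[i+1])).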
PS: Check out Computational Science Stack Exchange. It is more suitable for questions related to numerical and computational methods.
I have a matrix A of size m x m
and while using SciPy, I need to define the inequality constraint in Python that the items of this matrix must satisfy:
[equation image missing; from the answer below, the constraint involves the pairwise sums a[s][t] + a[t][s]]
I managed to define the bounds -1 <= a[i][j] <= 1 in Python as follows:
# -1 <= a[i][j] <= 1
bnds_a_s_t = [(-1,1) for _ in range(np.size(A))]
But I don't know how to define the inequality constraint. Should I use a for loop with two pointers and append the constraints to a list of inequality constraints? In that case, what should the inequality function return? In the equations, M is a list of size m.
If you want to use this in a nonlinear program, you can build a matrix that satisfies the constraint by construction. Starting with an arbitrary matrix A, you can use, for instance:
def constrained(A):
    return np.tanh(A - A.T)
The matrix a = A - A.T satisfies a[t,s] + a[s,t] = 0. Since tanh is odd, i.e. tanh(-x) = -tanh(x), this identity is preserved for a = tanh(A - A.T); furthermore, -1 <= tanh(x) <= 1. So constrained(A) gives what you need.
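A quick numerical check of those two properties (my own sketch, using a hypothetical 3x3 example):

import numpy as np

def constrained(A):
    return np.tanh(A - A.T)

A = np.random.randn(3, 3)       # arbitrary starting matrix
a = constrained(A)
print(np.allclose(a + a.T, 0))  # True: a[s,t] + a[t,s] == 0
print((np.abs(a) <= 1).all())   # True: every entry lies in [-1, 1]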
I've seen another Stack Overflow thread discussing the various implementations for calculating the Euclidean norm, and I'm having trouble seeing why/how a particular implementation works.
The code is found in an implementation of the MMD metric: https://github.com/josipd/torch-two-sample/blob/master/torch_two_sample/statistics_diff.py
Here is some beginning boilerplate:
import torch
sample_1, sample_2 = torch.ones((10,2)), torch.zeros((10,2))
Then the next part is where we pick up from the code above. I'm unsure why the samples are being concatenated together:
sample_12 = torch.cat((sample_1, sample_2), 0)
distances = pdist(sample_12, sample_12, norm=2)
and are then passed to the pdist function:
def pdist(sample_1, sample_2, norm=2, eps=1e-5):
    r"""Compute the matrix of all squared pairwise distances.

    Arguments
    ---------
    sample_1 : torch.Tensor or Variable
        The first sample, should be of shape ``(n_1, d)``.
    sample_2 : torch.Tensor or Variable
        The second sample, should be of shape ``(n_2, d)``.
    norm : float
        The l_p norm to be used.

    Returns
    -------
    torch.Tensor or Variable
        Matrix of shape (n_1, n_2). The [i, j]-th entry is equal to
        ``|| sample_1[i, :] - sample_2[j, :] ||_p``."""
Here we get to the meat of the calculation:
    n_1, n_2 = sample_1.size(0), sample_2.size(0)
    norm = float(norm)
    if norm == 2.:
        norms_1 = torch.sum(sample_1**2, dim=1, keepdim=True)
        norms_2 = torch.sum(sample_2**2, dim=1, keepdim=True)
        norms = (norms_1.expand(n_1, n_2) +
                 norms_2.transpose(0, 1).expand(n_1, n_2))
        distances_squared = norms - 2 * sample_1.mm(sample_2.t())
        return torch.sqrt(eps + torch.abs(distances_squared))
I am at a loss for why the Euclidean norm would be calculated this way. Any insight would be greatly appreciated.
Let's walk through this block of code step by step. The definition of Euclidean distance, i.e., the L2 norm, is

||u - v||_2 = sqrt( sum_k (u_k - v_k)^2 )

and squaring it gives the identity ||u - v||^2 = ||u||^2 + ||v||^2 - 2 u·v, which is exactly what the code exploits.

Let's consider the simplest case: we have two samples, a and b, each of shape (2, 2). Sample a has two vectors [a00, a01] and [a10, a11]; same for sample b. Let's first calculate the norms:
n1, n2 = a.size(0), b.size(0) # here both n1 and n2 have the value 2
norm1 = torch.sum(a**2, dim=1)
norm2 = torch.sum(b**2, dim=1)
Now we get

norm1 = [a00^2 + a01^2, a10^2 + a11^2]
norm2 = [b00^2 + b01^2, b10^2 + b11^2]

Next, we have norms_1.expand(n_1, n_2), which repeats norm1 along the columns, and norms_2.transpose(0, 1).expand(n_1, n_2), which repeats norm2 along the rows. Note that b is transposed. The sum of the two gives

norms[i][j] = ||a_i||^2 + ||b_j||^2
Then sample_1.mm(sample_2.t()) is the matrix product of the two samples; its [i, j]-th entry is the dot product a_i · b_j.
Therefore, after the operation

distances_squared = norms - 2 * sample_1.mm(sample_2.t())

you get

distances_squared[i][j] = ||a_i||^2 + ||b_j||^2 - 2 (a_i · b_j) = ||a_i - b_j||^2

In the end, the last step is taking the square root of every element in the matrix (eps is added inside the square root for numerical stability near zero).
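As a sanity check of that identity (my own sketch, assuming the pdist function quoted above is in scope), the result should match a direct pairwise computation such as torch.cdist:

import torch

a, b = torch.randn(4, 3), torch.randn(5, 3)
direct = torch.cdist(a, b, p=2)     # explicit pairwise Euclidean distances
via_identity = pdist(a, b, norm=2)  # the norms - 2ab trick discussed above
print(torch.allclose(direct, via_identity, atol=1e-2))  # True up to eps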
I was going through the code for the SVM loss and its derivative. I understood the loss, but I cannot understand how the gradient is computed in a vectorized manner:
def svm_loss_vectorized(W, X, y, reg):
    loss = 0.0
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    num_train = X.shape[0]
    scores = X.dot(W)
    yi_scores = scores[np.arange(scores.shape[0]), y]
    margins = np.maximum(0, scores - np.matrix(yi_scores).T + 1)
    margins[np.arange(num_train), y] = 0
    loss = np.mean(np.sum(margins, axis=1))
    loss += 0.5 * reg * np.sum(W * W)
I understand everything up to here. Beyond this point, I cannot understand why we sum row-wise over the binary matrix and place the negated sum at the correct-class entries:
    binary = margins
    binary[margins > 0] = 1
    row_sum = np.sum(binary, axis=1)
    binary[np.arange(num_train), y] = -row_sum.T
    dW = np.dot(X.T, binary)
    # Average
    dW /= num_train
    # Regularize
    dW += reg*W
    return loss, dW
Let us recap the scenario and the loss function first, so we are on the same page:
Given are P sample points in N-dimensional space in the form of a PxN matrix X, so the points are the rows of this matrix. Each point in X is assigned to one out of M categories. These are given as a vector Y of length P that has integer values between 0 and M-1.
The goal is to predict the classes of all points using M linear classifiers (one for each category) given in the form of an NxM weight matrix W, so the classifiers are the columns of W. To predict the categories of all samples X, the scalar products between all points and all weight vectors are formed, which is the same as matrix multiplying X and W, yielding a score matrix Y0 arranged such that its rows are ordered like the elements of Y; each row corresponds to one sample. The predicted category for each sample is simply the one with the largest score.
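For concreteness, a minimal sketch of those shapes (hypothetical numbers, not from the question):

import numpy as np

P, N, M = 5, 4, 3               # samples, features, categories
X = np.random.randn(P, N)       # sample points as rows
Y = np.random.randint(0, M, P)  # true category of each sample
W = np.random.randn(N, M)       # classifiers as columns
Y0 = X.dot(W)                   # P x M score matrix
predicted = Y0.argmax(axis=1)   # category with the largest score per sample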
There are no bias terms so I presume there is some kind of symmetry or zero mean assumption.
Now, to find a good set of weights, we want a loss function that is small for good predictions and large for bad predictions, and that lets us do gradient descent. One of the most straightforward ways is to punish, for each sample i, each score that is larger than the score of the correct category for that sample, and to let the penalty grow linearly with the difference (the actual code additionally adds a margin of 1 before comparing, which does not change the structure of the derivative). So if we write A[i] for the set of categories j that score more than the correct category, Y0[i, j] > Y0[i, Y[i]], the loss for sample i could be written as
sum_{j in A[i]} (Y0[i, j] - Y0[i, Y[i]])
or equivalently if we write #A[i] for the number of elements in A[i]
(sum_{j in A[i]} Y0[i, j]) - #A[i] Y0[i, Y[i]]
The partial derivatives with respect to the scores are thus simply

                  | -#A[i]  if j == Y[i]
dloss/dY0[i,j] =  {  1      if j in A[i]
                  |  0      otherwise
which is precisely what the first four lines you say you don't understand compute.
The next line applies the chain rule dloss/dW = dloss/dY0 dY0/dW.
It remains to divide by the number of samples to get a per-sample loss, and to add the derivative of the regularization term, which is easy since the regularization is just a componentwise quadratic function.
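If you want to convince yourself that the vectorized gradient is right, a numerical check along these lines works (my own sketch; it assumes svm_loss_vectorized as defined in the question):

import numpy as np

np.random.seed(0)
X = np.random.randn(5, 4)        # 5 samples, 4 features
y = np.random.randint(0, 3, 5)   # 3 categories
W = np.random.randn(4, 3)
reg, h = 0.1, 1e-6

loss, dW = svm_loss_vectorized(W, X, y, reg)
numeric = np.zeros_like(W)
for idx in np.ndindex(*W.shape):  # central finite differences
    Wp, Wm = W.copy(), W.copy()
    Wp[idx] += h
    Wm[idx] -= h
    numeric[idx] = (svm_loss_vectorized(Wp, X, y, reg)[0] -
                    svm_loss_vectorized(Wm, X, y, reg)[0]) / (2*h)
print(np.max(np.abs(numeric - dW)))  # should be tiny, e.g. ~1e-8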
Personally, I found it much easier to understand the whole gradient calculation by looking at the analytic derivation of the loss function in more detail. To extend the given answer, I would like to point to the derivatives of the loss function with respect to the weights, where Δ is the margin (1 in your code), as follows:

1) Loss gradient w.r.t. w_yi (the correct class):

∇_{w_yi} L_i = -( Σ_{j ≠ y_i} 1(w_j·x_i - w_yi·x_i + Δ > 0) ) x_i

Hence, we count the cases where w_j does not meet the margin requirement and sum those cases up. This negative sum is then placed at the position of the correct class w_yi. (We later need to multiply this value with x_i; this is what your code does in line 5.)
2) Loss gradient w.r.t. w_j (the incorrect classes):

∇_{w_j} L_i = 1(w_j·x_i - w_yi·x_i + Δ > 0) x_i

where 1 is the indicator function: 1 if the condition is true, else 0.
In other words, "programmatically" we need to apply equation (2) to all cases where the margin requirement is not met, and add the negative sum of all unmet requirements to the true-class column, as in (1).

So what the first four lines of your code do is determine the cases where the margin is not met and add the negative sum of those cases to the correct-class column. In the fifth line, you do the final step, multiplying by the x_i's, which completes the gradient calculations as in (1) and (2).

I hope this makes it easier to understand; let me know if anything remains unclear. (source)
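To make the correspondence concrete, here is a hypothetical non-vectorized reference that applies equations (1) and (2) literally, sample by sample; its output should match the dW from the vectorized code in the question:

import numpy as np

def svm_grad_naive(W, X, y, reg):
    # apply equations (1) and (2) one sample and one class at a time
    num_train = X.shape[0]
    dW = np.zeros(W.shape)
    for i in range(num_train):
        scores = X[i].dot(W)
        for j in range(W.shape[1]):
            if j == y[i]:
                continue
            if scores[j] - scores[y[i]] + 1 > 0:  # margin requirement not met
                dW[:, j] += X[i]       # equation (2): push incorrect class up
                dW[:, y[i]] -= X[i]    # contributes to the negative sum in (1)
    return dW / num_train + reg * W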
I want to numerically solve integrals that contain white noise.
Mathematically, white noise can be described by a variable X(t) that is a random process with zero time average, Avg[X(t)] = 0, and a delta correlation function, Avg[X(t)X(t')] = δ(t - t').
A simple example would be to calculate the integral over X(t) from t=0 to t=1. On average this is of course zero, but what I need are different realizations of this integral.
The problem is that this does not work with scipy.integrate.quad().
Are there any packages for python that deal with stochastic integrals?
This is a good starting point for numerical SDE methods: http://math.gmu.edu/~tsauer/pre/sde.pdf.
Here is a simple NumPy-based solver for the stochastic differential equation dX_t = a(t, X_t) dt + b(t, X_t) dW_t, which I wrote for a class project last year. It is based on the forward Euler method for ordinary differential equations (here, the Euler-Maruyama method), and in practice it is fairly widely used for solving SDEs.
import numpy as np

def euler_maruyama(a, b, x0, t):
    N = len(t)
    x = np.zeros((N, len(x0)))
    x[0] = x0
    for i in range(N-1):
        dt = t[i+1] - t[i]
        dWt = np.random.normal(0, np.sqrt(dt))  # increment with variance dt
        x[i+1] = x[i] + a(t[i], x[i])*dt + b(t[i], x[i])*dWt
    return x
Essentially, at each timestep, the deterministic part of the function is integrated using forward Euler, while the stochastic part is integrated by generating a normal random variable dWt with mean 0 and variance dt (note that np.random.normal takes the standard deviation, hence the sqrt(dt)) and integrating the stochastic part with respect to it.
The reason we generate dWt like this comes from the definition of Brownian motion. In particular, if $W$ is a Brownian motion, then $W_t - W_s$ is normally distributed with mean 0 and variance $t-s$, so dWt is a discretization of the change in $W$ over a small time interval.
This is the docstring of the function above:
Parameters
----------
a : callable a(t, X_t)
    t is scalar time and X_t is vector position
b : callable b(t, X_t)
    where t is scalar time and X_t is vector position
x0 : ndarray
    the initial position
t : ndarray
    list of times at which to evaluate trajectory

Returns
-------
x : ndarray
    positions of trajectory at each time in t
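Tying this back to the original question: a realization of the integral of white noise over [0, 1] is just W_1, which you get by running the solver with zero drift and a unit noise coefficient. A short sketch with my own parameter choices:

import numpy as np

t = np.linspace(0, 1, 1000)
a = lambda t, x: np.zeros_like(x)  # no deterministic drift
b = lambda t, x: np.ones_like(x)   # unit noise coefficient: dX_t = dW_t
realization = euler_maruyama(a, b, np.array([0.0]), t)[-1, 0]
print(realization)  # over many runs these values are ~ N(0, 1)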