Using Scipy minimize (scipy.optimize.minimize) with a large equality constraint matrix - python

I need to minimize a function of, say, five variables (x[0] to x[4]).
The scalar function to be minimized is given by X'*H*X. The objective function would look similar to this:
def objfun(x):
    H = 0.1*np.ones([5,5])
    f = np.dot(np.transpose(x), np.dot(H, x))[0][0]
    return f
Which would return a single scalar value.
The question is, how do I implement the constraint equations given by:
A*X - b = 0
Where A and b are subject to change in each run. A random example would be:
A =
array([[ 1, 2, 3, 4, 5],
[ 2, 1, 3, 4, 5],
[-1, 2, 3, 0, 0],
[ 0, -5, 6, 3, 2],
[-3, 5, 6, 2, 8]])
B =
array([[ 0],
[ 2],
[ 3],
[-2],
[-7]])
A and B cannot be hard-coded into a constraint function as they may be different in each run. There are no bounds on the variables and the optimization method need not be specified.
EDIT
I realized that having 5 constraint equations for an optimization problem with 5 variables gives a unique solution just by solving the equations.
So how about a case where A may be defined as:
A =
array([[ 1, 2, 3, 4, 5],
[ 2, 1, 3, 4, 5],
[-1, 2, 3, 0, 0],
[ 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0]])
B =
array([[ 0],
[ 2],
[ 3],
[ 0],
[ 0]])
So we have a 5 variable optimization problem with 3 linear constraints.

You could try using the scipy.optimize.fmin_cobyla function. I don't know the numerical details, so you should check it with values for which you know the expected answer and see if it works for your needs. Play with the tolerance arguments rhoend and rhobeg and see if you get an expected answer. A sample program could be something like:
import numpy as np
import scipy.optimize

A = np.array([[ 1,  2,  3,  4,  5],
              [ 2,  1,  3,  4,  5],
              [-1,  2,  3,  0,  0],
              [ 0,  0,  0,  0,  0],
              [ 0,  0,  0,  0,  0]])
B = np.array([[0],
              [2],
              [3],
              [0],
              [0]])

def objfun(x):
    H = 0.1*np.ones([5,5])
    f = np.dot(np.transpose(x), np.dot(H, x))
    return f

def constr1(x):
    """ The constraint is satisfied when return value >= 0 """
    sol = np.dot(A, x)
    if np.allclose(sol, B.ravel()):
        return 0.01
    else:
        # Return the negative distance between the expected solution
        # and the actual solution for a somehow meaningful value
        return -np.linalg.norm(B.ravel() - sol)

scipy.optimize.fmin_cobyla(objfun, [0.0, 0.0, 0.0, 0.0, 0.0], [constr1])
#np.linalg.solve(A, B)
Please note that this particular example doesn't have a solution; try it with something that does. I am not completely sure that the constraint function is properly defined, so try to find something that works well for you. You should try to provide an initial guess that is an actual solution instead of [0.0, 0.0, 0.0, 0.0, 0.0] for better results.
Check the official documentation for more details: http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_cobyla.html#scipy.optimize.fmin_cobyla
Edit: Also, depending on what kind of solution you are looking for, you could probably form a better constraint function, maybe allowing values that are within a certain tolerance distance of the expected solution even if not completely exact, and returning a value higher than 0 the closer they are to the tolerance instead of always 0.01, etc.
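As a rough illustration of that idea, here is a sketch of a tolerance-based constraint; this is my reading of the suggestion rather than the original answer's code, and the tol value is arbitrary and would need tuning for your problem:
def constr1(x, tol=1e-6):
    # >= 0 only when the residual of A x = B is within `tol`;
    # increasingly negative the further x is from feasibility
    residual = np.linalg.norm(np.dot(A, x) - B.ravel())
    return tol - residual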

The NLopt doc mentions a neat general method:
all solutions of Ax = b have the form xany + nullspace(A) z,
where xany is one solution and dim(z) < dim(x).
So minimize f( xany + nullspace(A) z ) over unconstrained z.
For example, in 3d, the constraint x0 + x1 + x2 = 1 has the nullspace matrix

    [  1   0 ]
    [ -1   1 ]
    [  0  -1 ]

which maps [z0, z1] -> [z0, -z0 + z1, -z1], whose components sum to 0.
("Some care is required in numerically computing the nullspace ...")

Related

Is there a vectorized way to apply a transformation matrix to a group of vectors?

Imagine you have a group of vectors, e.g. in the form of a trajectory. Is there a vectorized way of applying the transformation matrix to all data points at once, or are you stuck with a for-loop? Here is some sample code:
import numpy as np

angle = np.deg2rad(90)
rotM = np.array(
    [
        [np.cos(angle), -np.sin(angle), 0],
        [np.sin(angle),  np.cos(angle), 0],
        [0,              0,             1],
    ]
)
# trajectory with columns t, x, y, z
trajectory = np.array(
    [
        [1, 1, 0, 0],
        [2, 2, 1, 0],
        [3, 3, 2, 0],
        [4, 4, 3, 1],
        [5, 6, 4, 2],
        [6, 9, 5, 3],
    ]
)
# transform coordinates
for i in range(len(trajectory)):
    trajectory[i][1:] = np.dot(rotM, trajectory[i][1:])
All I found so far is numpy.linalg.multi_dot, and these two posts (one, two), none of which seem to apply to my case.
For this case, use broadcasting along with np.matmul/@. You can multiply a 3x3 matrix by an Nx3x1 array of vectors, dropping the trailing length-1 axis before assigning back:
trajectory[:, 1:] = (rotM @ trajectory[:, 1:, None])[..., 0]
A cleaner and more flexible solution might be to use scipy.spatial.transform.Rotation objects instead of hand-crafting the matrix yourself:
from scipy.spatial.transform import Rotation

rotM = Rotation.from_euler('z', angle)
trajectory[:, 1:] = rotM.apply(trajectory[:, 1:])
No need to add shim dimensions in this case.
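For completeness, a self-contained sketch (assuming a recent SciPy where Rotation.as_matrix is available) that checks the broadcast product against the original loop; a float trajectory is used here so the rotated coordinates are not truncated to integers:
import numpy as np
from scipy.spatial.transform import Rotation

angle = np.deg2rad(90)
rotM = Rotation.from_euler('z', angle).as_matrix()

trajectory = np.array([
    [1, 1, 0, 0],
    [2, 2, 1, 0],
    [3, 3, 2, 0],
    [4, 4, 3, 1],
    [5, 6, 4, 2],
    [6, 9, 5, 3],
], dtype=float)

# batched matrix-vector product: (3, 3) @ (N, 3, 1) -> (N, 3, 1)
batched = (rotM @ trajectory[:, 1:, None])[..., 0]

# loop version from the question, for comparison
looped = np.array([rotM @ row[1:] for row in trajectory])

print(np.allclose(batched, looped))  # True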

How to find linearly independent vectors belonging to the null space of a non-square matrix? (Python)

I have a non-square matrix, and a method to determine the null space of the matrix (found from this thread: How to find the Null Space of a matrix in Python using numpy?), but I have a few problems with taking this solution.
For one, I'm not sure if the values I have are correct, since I'm not too sure what I'm looking for.
Secondly, I need to find two linearly independent vectors from this null space, but I do not know the next step from here to determine this.
Finally, I need to determine whether any of the columns of the matrix are linearly independent in R3 and R4.
Any help would be greatly appreciated.
Code:
import numpy as np
from scipy import linalg

a = np.matrix(
    [
        [ 3,  2, -1,  4],
        [ 1,  0,  2,  3],
        [-2, -2,  3, -1]
    ])

def null(A, eps=1e-15):
    u, s, vh = linalg.svd(A)
    null_mask = (s <= eps)
    null_space = np.compress(null_mask, vh, axis=0)
    return np.transpose(null_space)

print(null(a))
Output:
[[ 0.8290113 ]
[-0.2330726 ]
[ 0.24969281]
[-0.44279897]]
I'm assuming since the output is anything other than an empty matrix [] that there's something special about this matrix, I just don't know what it means.
I would recommend using sympy in this case:
from sympy import Matrix
a = Matrix([
    [ 3,  2, -1,  4],
    [ 1,  0,  2,  3],
    [-2, -2,  3, -1]
])
print(a.nullspace())
Output:
[Matrix([
[ -2],
[7/2],
[ 1],
[ 0]]),
Matrix([
[ -3],
[5/2],
[ 0],
[ 1]])]
You can easily check that the result indeed belongs to the nullspace by verifying that it is mapped to 0 when multiplied by your matrix a:
n1, n2 = a.nullspace()
print(a*n1, a*n2)
results in:
Matrix([[0], [0], [0]]) Matrix([[0], [0], [0]])
Finally, to get the linearly independent columns of your matrix in R3 you can use the function columnspace, which returns a list of column vectors that span the columnspace of the matrix:
print(a.columnspace())
results in
[Matrix([
[ 3],
[ 1],
[-2]]), Matrix([
[ 2],
[ 0],
[-2]])]
which are the first two columns of the matrix.
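If you would rather stay with numpy/scipy instead of sympy, scipy.linalg.null_space returns an orthonormal basis of the null space; its two columns are linearly independent, although they span the same space as the sympy vectors rather than matching them entry for entry. A minimal sketch:
import numpy as np
from scipy.linalg import null_space

a = np.array([
    [ 3,  2, -1,  4],
    [ 1,  0,  2,  3],
    [-2, -2,  3, -1]
])

ns = null_space(a)             # shape (4, 2): two linearly independent vectors
print(np.allclose(a @ ns, 0))  # True: each column is mapped to zero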

Interpreting (and comparing) output from numpy.correlate

I have looked at this question but it hasn't really given me any answers.
Essentially, how can I determine if a strong correlation exists or not using np.correlate? I expect the same output as I get from matlab's xcorr with the coeff option, which I can understand (1 is a strong correlation at lag l and 0 is no correlation at lag l), but np.correlate produces values greater than 1, even when the input vectors have been normalised between 0 and 1.
Example input
import numpy as np
x = np.random.rand(10)
y = np.random.rand(10)
np.correlate(x, y, 'full')
This gives the following output:
array([ 0.15711279, 0.24562736, 0.48078652, 0.69477838, 1.07376669,
1.28020871, 1.39717118, 1.78545567, 1.85084435, 1.89776181,
1.92940874, 2.05102884, 1.35671247, 1.54329503, 0.8892999 ,
0.67574802, 0.90464743, 0.20475408, 0.33001517])
How can I tell what is a strong correlation and what is weak if I don't know what the maximum possible correlation value is?
Another example:
In [10]: x = [0,1,2,1,0,0]
In [11]: y = [0,0,1,2,1,0]
In [12]: np.correlate(x, y, 'full')
Out[12]: array([0, 0, 1, 4, 6, 4, 1, 0, 0, 0, 0])
Edit: This was a badly asked question, but the marked answer does answer what was asked. I think it is important to note what I have found whilst digging around in this area: you cannot compare outputs from cross-correlation. In other words, it would not be valid to use the outputs from cross-correlation to say signal x is better correlated to signal y than to signal z. Cross-correlation does not provide this kind of information.
numpy.correlate is under-documented. I think that we can make sense of it, though. Let's start with your sample case:
>>> import numpy as np
>>> x = [0,1,2,1,0,0]
>>> y = [0,0,1,2,1,0]
>>> np.correlate(x, y, 'full')
array([0, 0, 1, 4, 6, 4, 1, 0, 0, 0, 0])
Those numbers are the cross-correlations for each of the possible lags. To make that more clear, let's put the lag numbers above the correlations:
>>> np.concatenate((np.arange(-5, 6)[None,...], np.correlate(x, y, 'full')[None,...]), axis=0)
array([[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[ 0, 0, 1, 4, 6, 4, 1, 0, 0, 0, 0]])
Here, we can see that the cross-correlation reaches its peak at a lag of -1. If you look at x and y above, that makes sense: if one shifts y to the left by one place, it matches x exactly.
To verify this, let's try again, this time shifting y further:
>>> y = [0, 0, 0, 0, 1, 2]
>>> np.concatenate((np.arange(-5, 6)[None,...], np.correlate(x, y, 'full')[None,...]), axis=0)
array([[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[ 0, 2, 5, 4, 1, 0, 0, 0, 0, 0, 0]])
Now, the correlation peaks at a lag of -3, meaning that the best match between x and y occurs when y is shifted to the left by 3 places.
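Building on that lag table, here is a short sketch (same x and y as the first example) that recovers the peak lag programmatically; for 'full' mode the lags run from -(len(y) - 1) to len(x) - 1:
import numpy as np

x = [0, 1, 2, 1, 0, 0]
y = [0, 0, 1, 2, 1, 0]

corr = np.correlate(x, y, 'full')
lags = np.arange(-(len(y) - 1), len(x))  # lag axis for 'full' mode
print(lags[np.argmax(corr)])             # -1: shifting y left by one best matches x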

Scikit-learn χ² (chi-squared) statistic and corresponding contingency table

In the docs for the chi-squared univariate feature selection function of scikit-learn http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html, it states
This score can be used to select the n_features features with the highest values for the χ² (chi-square) statistic from X, which must contain booleans or frequencies (e.g., term counts in document classification), relative to the classes.
I am struggling to understand what the corresponding contingency table would look like, especially in the case of frequency features.
For example, consider the below dataset with boolean features and targets:
import numpy as np
>>> X = np.random.randint(2, size=50).reshape(10, 5)
array([[1, 0, 0, 0, 1],
[1, 1, 0, 1, 1],
[1, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 1, 1, 1],
[0, 1, 1, 0, 0],
[1, 0, 1, 1, 1],
[1, 1, 1, 1, 0]])
>>> y = np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 1, 1, 0, 1])
To construct the contingency table with respect to the first feature, we can do this (excuse my PEP8 violation)
import scipy as sp
>>> contingency_table = sp.sparse.coo_matrix(
... (np.ones_like(y), (X[:, 0], y)),
... shape=(np.unique(X[:, 0]).shape[0], np.unique(y).shape[0])).A
array([[1, 2],
[3, 4]])
So now I can calculate the chi-squared statistic and its p-values
>>> sp.stats.chi2_contingency(contingency_table)
(0.17857142857142855,
0.67260381744151676,
1,
array([[ 1.2, 1.8],
[ 2.8, 4.2]]))
And this ought to be consistent with scikit-learn's chi2
from sklearn.feature_selection import chi2
>>> chi2_, pval = chi2(X, y)
>>> chi2_[0], pval[0]
(0.023809523809523787, 0.87737055606414338)
...Nope. Have I misinterpreted something?
Also, what does the contingency table look like in the case of frequencies? I assumed it would be something like
contingency_table = sp.sparse.coo_matrix(
    (np.ones_like(y), (X[:, 0], y)),
    shape=(X[:, 0].max() + 1, np.unique(y).shape[0])).A
But the corresponding table of expected frequencies will most likely have several zero elements.
Edit:
To clarify further, consider the first feature X[:, 0] that is, say, gender and the targets y, say, handedness.
From this we get the cross tabulation
                 Right-handed   Left-handed (!right-handed)
Male                        1                              2
Female (!male)              3                              4
And we can assess the significance of the difference between the two proportions using the chi-squared test, setting the expected frequency of each cell to (row total x column total) / n.
sklearn.feature_selection.chi2 does this directly without resorting to explicitly computing the table and obtains the scores using a more efficient procedure that is equivalent to scipy.stats.chisquare.
After explicitly enumerating the table shown above, I wanted to verify it is consistent with chi2 when applying scipy.stats.chi2_contingency and to my dismay, it isn't. I'd like to ask why it isn't.
Consider a column x of X. sklearn.feature_selection.chi2 tests whether
the frequencies of the y values where x is 1 agree with the frequencies of y in
the full population. (@larsman's answer shows how you can reproduce the calculation with numpy and scipy.) This is not the same as the standard 2x2 contingency table
analysis of x and y. In a 2x2 contingency table analysis, the frequencies of y
where x is 0 also contribute to the test.
Suppose we form the contingency table for x and y:
| y=0 y=1
----+---------
x=0 | a b
x=1 | c d
Let n = a + b + c + d. This is the number of samples (i.e. same as len(x) and len(y)).
Let nx = c + d. This is the number of occurrences of 1 in x.
Let py1 = (b + d)/n. This is the fraction of the full population where y is 1.
sklearn.feature_selection.chi2 performs a chi2 test on [c, d] using the expected
values [(1-py1)*nx, py1*nx]. This is not the same as the standard contingency table
analysis of a 2x2 table.
Here's an extreme example. Suppose the 2x2 contingency table for x and y is
| y=0 y=1
----+----------
x=0 | 8 8
x=1 | 20 188
The sklearn calculation produces a chi2 score of 1.58, with a p-value of 0.208.
The contingency table analysis of scipy.stats.chi2_contingency gives a chi2 score of 18.6, with a p-value of 1.60e-5.
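A short sketch reproducing both numbers with scipy, using the table from the example above:
import numpy as np
from scipy.stats import chisquare, chi2_contingency

table = np.array([[  8,   8],
                  [ 20, 188]])

# what sklearn.feature_selection.chi2 effectively does for this feature
n = table.sum()
nx = table[1].sum()               # number of samples with x = 1
py1 = table[:, 1].sum() / n       # fraction of the population with y = 1
score, p = chisquare(table[1], f_exp=[(1 - py1) * nx, py1 * nx])
print(score, p)                   # ~1.58, ~0.208

# standard 2x2 contingency-table analysis
score2, p2, dof, expected = chi2_contingency(table)
print(score2, p2)                 # ~18.6, ~1.6e-5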
Given your data,
>>> X = array([[1, 0, 0, 0, 1],
... [1, 1, 0, 1, 1],
... [1, 0, 0, 0, 0],
... [0, 0, 0, 0, 0],
... [0, 0, 0, 0, 1],
... [1, 0, 0, 0, 1],
... [1, 0, 1, 1, 1],
... [0, 1, 1, 0, 0],
... [1, 0, 1, 1, 1],
... [1, 1, 1, 1, 0]])
>>> y = array([1, 0, 0, 0, 1, 1, 1, 1, 0, 1])
this is what feature_selection.chi2 computes:
>>> Y = np.vstack([1 - y, y])
>>> observed = np.dot(Y, X)
>>> observed
array([[3, 1, 1, 2, 2],
[4, 2, 3, 2, 4]])
These are the observed feature frequencies, per class, i.e. the contingency table. Then the expected values:
>>> feature_count = X.sum(axis=0)
>>> class_prob = Y.mean(axis=1)
>>> expected = np.dot(feature_count.reshape(-1, 1), class_prob.reshape(1, -1)).T
>>> expected
array([[ 2.8, 1.2, 1.6, 1.6, 2.4],
[ 4.2, 1.8, 2.4, 2.4, 3.6]])
Finally, it runs a χ² test:
>>> from scipy.stats import chisquare
>>> score, pval = chisquare(observed, expected)
>>> score
array([ 0.02380952, 0.05555556, 0.375 , 0.16666667, 0.11111111])
>>> pval
array([ 0.87737056, 0.81366372, 0.54029137, 0.6830914 , 0.73888268])
The scores are the relevant bit: they're used to sort the features by discriminative power. Note that you get one score and one p-value per feature.

Quickly rarefy a matrix in Numpy/Python

I need to (quickly) rarefy a matrix.
Rarefaction - transform abundance matrices to even sampling depth.
In this example, each row is a sample and the sampling depth is the sum of the row. I want to randomly sample (with replacement) the matrix by min(rowsums(matrix)) samples.
Suppose I have a matrix:
>>> m = [ [0, 9, 0],
... [0, 3, 3],
... [0, 4, 4] ]
The rarefaction function goes row by row randomly sampling with replacement min(rowsums(matrix)) times (which is 6 in this case).
>>> rf = rarefaction(m)
>>> rf
[ [0, 6, 0], # sum = 6
[0, 3, 3], # sum = 6
[0, 3, 3] ] # sum = 6
The results are random but the row sums are always the same.
>>> rf = rarefaction(m)
>>> rf
[ [0, 6, 0], # sum = 6
[0, 2, 4], # sum = 6
[0, 4, 2], ] # sum = 6
PyCogent has a function that does this row by row; however, it is very slow on large matrices.
I have a feeling that there is a function in Numpy that can do this but I'm not sure what it would be called.
import numpy as np
from numpy.random import RandomState

def rarefaction(M, seed=0):
    prng = RandomState(seed)          # reproducible results
    noccur = np.sum(M, axis=1)        # number of occurrences for each sample
    nvar = M.shape[1]                 # number of variables
    depth = np.min(noccur)            # sampling depth
    Mrarefied = np.empty_like(M)
    for i in range(M.shape[0]):       # for each sample
        p = M[i] / float(noccur[i])   # relative frequency / probability
        choice = prng.choice(nvar, depth, p=p)
        Mrarefied[i] = np.bincount(choice, minlength=nvar)
    return Mrarefied
Example:
>>> M = np.array([[0, 9, 0], [0, 3, 3], [0, 4, 4]])
>>> M
array([[0, 9, 0],
[0, 3, 3],
[0, 4, 4]])
>>> rarefaction(M)
array([[0, 6, 0],
[0, 2, 4],
[0, 3, 3]])
>>> rarefaction(M, seed=1)
array([[0, 6, 0],
[0, 4, 2],
[0, 3, 3]])
>>> rarefaction(M, seed=2)
array([[0, 6, 0],
[0, 3, 3],
[0, 3, 3]])
I think the question is not entirely clear. I suppose the rarefaction matrix gives you the number of samples you take from each coefficient of your original matrix?
Looking at the code in your link, there might be potential to speed it up. Operate on transposed matrices and rewrite the code from your link to work on columns instead of rows, because that would let your processor cache the sampled values better, i.e. there are fewer jumps in memory.
The rest I would do the same way, using numpy (which does not necessarily mean it is the most efficient way).
If you need it faster, you can try coding the function in C++ and including it in your Python with scipy.weave. In C++ I would go through every row and build a lookup table of positions that are > 0, generate min(rowsums(matrix)) integers within a range equal to the number of items in the lookup table, accumulate how often each position in the lookup table was drawn, and then put those numbers back into the right positions in the array. That code should literally be just a few lines.
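As a rough Python sketch of that lookup-table idea (staying in numpy rather than C++, with the made-up helper name rarefaction_lookup; the table gets one entry per counted item so that sampling stays proportional to abundance):
import numpy as np

def rarefaction_lookup(M, seed=0):
    prng = np.random.RandomState(seed)
    M = np.asarray(M)
    depth = M.sum(axis=1).min()       # min(rowsums(matrix))
    out = np.zeros_like(M)
    for i, row in enumerate(M):
        # lookup table: column index repeated once per counted item
        lookup = np.repeat(np.arange(M.shape[1]), row)
        drawn = prng.choice(lookup, size=depth, replace=True)
        out[i] = np.bincount(drawn, minlength=M.shape[1])
    return out

print(rarefaction_lookup([[0, 9, 0], [0, 3, 3], [0, 4, 4]]))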
