Solving linear equations in Python (not working using linalg.solve)

Probably this is a very beginner question; I am new to Python and to this kind of operation, but I would appreciate any help.
I am trying to solve a system of linear equations in Python, using numpy.linalg:
x + y + z = 2
2x - 6y - z = -1
3x - 2z = 8
I tried to use the linalg.solve() function:
import numpy as np
from numpy import linalg
from numpy.linalg import LinAlgError

a = np.array([[1, 2, 3], [1, -6, 0], [1, -1, -2]])
b = np.array([2, -1, 8])
try:
    x1 = linalg.solve(a, b)
except LinAlgError:
    x1 = linalg.lstsq(a, b)[0]
print(x1)
print(np.dot(a, x1) == b)
Here is the output
[ 5.38709677 1.06451613 -1.83870968]
[ True True True]
However, these values don't satisfy the equations if we substitute them for x, y, z.
I tried another approach
x = np.dot(np.linalg.inv(a), b)
print(x)
print(np.dot(a,x) == b)
I received the same output as before:
[ 5.38709677 1.06451613 -1.83870968]
This method only worked when I changed the places of b and np.linalg.inv(a); now it gives the correct output.
x = np.dot(b, np.linalg.inv(a))
print(x)
print(np.dot(x,a) == b)
This result does work in the equations:
[ 2. 1. -1.]
So, here is the question: could anyone please explain why I cannot get [ 2. 1. -1.] using linalg.solve?

The matrix a describing the left-hand side of the equation is set up the wrong way around;
np.linalg.solve(a.T, b)
does what you want to do with the given a.
That your second approach does the job boils down to the following fact: for any 2-dimensional ndarray a and any 1-dimensional ndarray x for which the shapes match, np.dot(a, x) will equal np.dot(x, a.T). This is the case since, viewed as matrix multiplications, the two operations treat x as a column and a row vector respectively, and for any matrix $A$ and any vector $x$, $Ax = (x^TA^T)^T$.
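A quick numerical check of both points, using the question's arrays:
import numpy as np

a = np.array([[1, 2, 3], [1, -6, 0], [1, -1, -2]])
b = np.array([2, -1, 8])

print(np.linalg.solve(a.T, b))  # [ 2.  1. -1.]

# The identity behind the second approach:
x = np.array([2., 1., -1.])
print(np.allclose(np.dot(a, x), np.dot(x, a.T)))  # True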

Looks like your a does not match the equations you wrote at the beginning of your question. You have
a = np.array([[1,2,3],[1,-6,0],[1,-1,-2]])
Which corresponds to:
x + 2y + 3z
x - 6y
x - y - 2z
Which isn't what you put at the top of your question.
Note that the [True, True, True] output is showing you this: the dot product reproduces b exactly, which means linalg.solve did correctly solve the system defined by the a and b in your code.
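For completeness, setting up one row per equation, in the order written in the question, gives the expected result directly:
import numpy as np

# One row per equation, coefficients of x, y, z:
#   x +  y +  z =  2
#  2x - 6y -  z = -1
#  3x + 0y - 2z =  8
a = np.array([[1, 1, 1],
              [2, -6, -1],
              [3, 0, -2]])
b = np.array([2, -1, 8])
print(np.linalg.solve(a, b))  # [ 2.  1. -1.]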

Related

Using Python to use Euler's formula into matrix approximation

I am trying to use Python and NumPy to apply Euler's formula e^(iπ), represented in matrix form as e^A where
A = [0 -π]
    [π  0]
and then apply the Maclaurin series for the exponential function e^x,
SUMMATION(n=0, infinity) x^n/n! = 1 + x + x^2/2! + x^3/3! + ...
So I am trying to compute an approximation matrix S^(N+1) and print the matrix and its four entries.
I have tried emulating Euler's and Maclaurin's series; I think the final approximation matrix will be reached when N = 20, but currently my values do not add up. I am also trying to use np.linalg.norm to compute a 2-norm as well.
import math
import numpy as np

n = 0
A = np.eye(2)
A = math.pi * np.rot90(A)
A[0,1] = -A[0,1]
A

mac_series = 0
while n < 120:
    print(n)
    n += 1
    mac_series = (A**n) / (math.factorial(n))
    print("\n", mac_series)

np.linalg.norm(mac_series)
The main problem here is that you are confusing A**3 with A@A@A.
Just look at case n=0.
A**0
# array([[1., 1.],
#        [1., 1.]])
I am pretty sure you were expecting A⁰ to be the identity (that is the only way that this encoding of x+iy ⇔ np.array([[x,-y],[y,x]]) makes sense).
In numpy, you have np.linalg.matrix_power for that (or you could just accumulate the powers yourself):
sum(np.linalg.matrix_power(A,i) / math.factorial(i) for i in range(20))
is
array([[-1.00000000e+00,  5.28918267e-10],
       [-5.28918267e-10, -1.00000000e+00]])
for example. That is pretty surely what you were expecting (it is the matrix that represents the real number -1 in the same encoding, and the whole point of Euler's identity is that e^(iπ) = -1).
By comparison,
sum(A**i / math.factorial(i) for i in range(20))
returns
array([[ 1.        ,  0.04321392],
       [23.14069263,  1.        ]])
Which is just the Maclaurin series computed for each of the four elements of the matrix separately. In other words, since your matrix is [[0,-π],[π,0]], you are evaluating the Maclaurin series of [[e⁰, exp(-π)], [exp(π), e⁰]]. And it works: e⁰ = 1, obviously; exp(π) is 23.140692632779267, so we got a very good approximation; and exp(-π) is its reciprocal, 0.04321391826377226, also well approximated.
So it works, just not at all to do what you obviously intend to do: prove Euler's identity in matrix form, that is, compute exp(iπ), not just exp(π).
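(For reference, that elementwise table is exactly what np.exp produces, since np.exp is applied entry by entry:)
import math
import numpy as np

A = np.array([[0.0, -math.pi],
              [math.pi, 0.0]])
print(np.exp(A))  # elementwise exponential, not a matrix exponential
# [[ 1.          0.04321392]
#  [23.14069263  1.        ]]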
Without matrix_power, and with code closer to your initial code, you could do
n = 0
mac_series = 0
Apowern = np.eye(2)  # A⁰ = Id for now
while n < 20:
    print(n)
    mac_series += Apowern / math.factorial(n)
    Apowern = Apowern @ A  # @ is the matrix multiplication operator
    n += 1
Note that I've also moved n += 1, which was misplaced in your code. You were stacking Aⁿ⁺¹/(n+1)!, not Aⁿ/n! (in other words, your sum was missing the A⁰/0! = Id term).
With this, I get the expected result
>>> mac_series
array([[-1.00000000e+00,  5.28918724e-10],
       [-5.28918724e-10, -1.00000000e+00]])
Last problem, more subtle: you may have noticed that I do only 20 iterations, not 120. That is because after 20 you start to have a numerical problem: Apowern (or np.linalg.matrix_power(A,n); it is the same problem for both methods) becomes too big. Since it is divided by n! when accumulated, that doesn't prevent convergence mathematically, but it does prevent numeric convergence; in practice, after a while, the values in Apowern blow up numerically.
So we should not have a big matrix divided by a big number; we should instead iterate on things that stay small enough. Like this, for example:
n = 0
mac_series = 0
NthTerm = np.eye(2)  # NthTerm holds Aⁿ/n!; A⁰/0! = Id for now
while n < 120:  # 120 is no longer a problem
    print(n)
    mac_series += NthTerm
    n += 1
    NthTerm = (NthTerm @ A) / n  # if NthTerm was Aⁿ/n!,
    # it now becomes Aⁿ/n! @ A/(n+1) = Aⁿ⁺¹/(n+1)!
Result
>>> mac_series
array([[-1.00000000e+00, -2.34844612e-16],
       [ 2.34844612e-16, -1.00000000e+00]])
tl;dr
You have 4 problems:
- The one already mentioned by Roy: you are not accumulating the Aⁿ/n! terms, just replacing them, eventually keeping only the last one. In other words, you need += instead of =.
- A**n is not Aⁿ. It is just A with each element raised to the power n. Said otherwise, [[x,-y],[y,x]]**n is not [[x,-y],[y,x]]ⁿ; it is [[xⁿ,(-y)ⁿ],[yⁿ,xⁿ]]. So you end up computing [[e⁰, 1/e^π], [e^π, e⁰]] ≈ [[1, 0.0432], [23.14, 1]], which is irrelevant here.
- n += 1 is misplaced.
- Aⁿ becomes huge numerically. Even though you intend to divide it by an even huger n!, so it poses no problem theoretically/mathematically, it does numerically, since the intermediate result is too big for the computer.
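As a sanity check (this assumes SciPy is available, which the question does not mention): scipy.linalg.expm computes the matrix exponential directly and agrees with the corrected series above.
import math
import numpy as np
from scipy.linalg import expm  # matrix exponential, not elementwise exp

A = np.array([[0.0, -math.pi],
              [math.pi, 0.0]])
print(expm(A))  # ≈ [[-1, 0], [0, -1]], i.e. e^(iπ) = -1 in matrix form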

Enumerate solutions in Gurobi

I have an LP problem without any objective, i.e. it looks like Ax <= B. The feasible set potentially contains infinitely many solutions. Is there any way to enumerate different solutions which are reasonably different from each other?
For now, I am using this code, which chooses a random objective function in the hope that it will produce different solutions.
import gurobipy as gp
import numpy as np

def solve(A, B):
    model = gp.Model()
    model.Params.OutputFlag = False
    x = model.addVars(A.shape[1]).values()
    for a, b in zip(A, B):
        expr = gp.LinExpr(b)
        expr.addTerms(a, x)
        model.addConstr(expr <= 0)
    expr = gp.LinExpr()
    for x_ in x:
        if np.random.random() < 0.5:
            expr.add(x_)
        else:
            expr.add(-x_)
    model.setObjective(expr, gp.GRB.MAXIMIZE)
    model.optimize()
    return np.array([x_.x for x_ in x])

n_constr = 6
n_var = 5
A = np.random.random((n_constr, n_var)) * 2 - 1
B = np.random.random((n_constr,)) * 2 - 1
for i in range(3):
    print(solve(A, B))
One sample output
[ 1.59465412 0. 0. -0.77579453 0. ]
[-1.42381457 0. 0. -7.70035252 -8.55823707]
[1.8797086 0. 0. 7.24494007 4.43847791]
Is there any elegant Gurobi-specific solution?
Your approach is of course a perfectly valid method to sample different solutions. One other possible way is to enumerate all basic feasible solutions (corner points). Not so easy however. There is an interesting trick for this: use binary variables to encode the basis (0=non-basic,1=basic) and then use the Gurobi solution pool to enumerate all feasible integer solutions. That will give you all basic feasible solutions. See link for details.
I think it depends on what you mean by reasonably different. Erwin's suggested approach works well if you want solutions at a gap of 1. If you want solutions with a gap of 0.5, define a dummy variable Z = 2X and set Z as an integer and X as continuous. You can then use the solution pool in Gurobi to enumerate any number of solutions (specified using the PoolSolutions parameter). Similarly, you can vary the factor k in Z = kX to get solutions at some other gap. You don't necessarily have to set any objective when using a solution pool; in that case, do not define the PoolGap parameter.
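A minimal sketch of the solution-pool mechanics described above (the toy model, bounds, and variable count here are invented for illustration; only the Pool* parameters and the Xn attribute are the point):
import gurobipy as gp

model = gp.Model()
model.Params.OutputFlag = False
x = model.addVars(3, lb=-10, ub=10, vtype=gp.GRB.INTEGER)
model.addConstr(x[0] + x[1] + x[2] <= 5)  # stand-in for Ax <= B
model.Params.PoolSearchMode = 2           # systematically search for multiple solutions
model.Params.PoolSolutions = 10           # keep up to 10 of them
model.optimize()
for k in range(model.SolCount):
    model.Params.SolutionNumber = k       # select the k-th pool solution
    print([v.Xn for v in x.values()])     # Xn = variable value in that solution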

Using python, how can I optimize a system of equations and get integer-only solutions? [duplicate]

I'm trying to solve an overdetermined linear system of equations with numpy. Currently, I'm doing something like this (as a simple example):
a = np.array([[1,0], [0,1], [-1,1]])
b = np.array([1,1,0])
print np.linalg.lstsq(a,b)[0]
[ 1. 1.]
This works, but uses floats. Is there any way to solve the system over integers only? I've tried something along the lines of
print map(int, np.linalg.lstsq(a,b)[0])
[0, 1]
in order to convert the solution to an array of ints, expecting [1, 1], but clearly I'm missing something. Could anyone point me in the right direction?
You should use specialized integer problem solvers (note that integer problems are not even simple to solve in general). openopt is a package that, for example, should provide good wrappers for integer quadratic optimization such as you are doing. Trying to use plain linear algebra will simply not give you the correct solution directly.
Your problem can be written as a quadratic program, but an integer one, so use openopt or some other module for that. Since it is a very simple, unconstrained one, maybe there is some other approach. But for starters, it is not the simple problem it looks like at first, and there are programs in openopt etc. ready to solve this kind of thing efficiently.
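openopt's API has changed over the years, so rather than guess at it, here is a neutral brute-force sketch of the same integer least-squares problem on the question's example; a real integer solver would replace the enumeration loop:
import itertools
import numpy as np

a = np.array([[1, 0], [0, 1], [-1, 1]])
b = np.array([1, 1, 0])

# Enumerate integer candidates in a small box and keep the best residual.
best = min((np.linalg.norm(a @ np.array(x) - b), x)
           for x in itertools.product(range(-3, 4), repeat=2))
print(best)  # (0.0, (1, 1)) -- here the integer optimum is exact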
You are looking at a system of linear Diophantine equations. A quick Google search comes up with Systems of Linear Diophantine Equations by Felix Lazebnik. In that paper, the author considers the following question:
Given a system of linear equations Ax = b, where A = a(i,j) is an m × n matrix
with integer entries, and b is an m × 1 column vector with integer components, does the system
have an integer solution, i.e. an n × 1 solution vector x with integer components?
When you convert to int, the decimal part of each element gets truncated rather than rounded, so a value just below 1 becomes 0:
a = np.array([[1,0], [0,1], [-1,1]])
b = np.array([1,1,0])
x = np.linalg.lstsq(a,b)[0]
Result:
>>> x
array([ 1., 1.])
>>> x[0]
0.99999999999999967
>>> x[1]
1.0000000000000002
>>> x.astype(int)
array([0, 1])
>>> map(int, x)
[0, 1]
>>> np.array([1.,1.]).astype(int) # works fine here
array([1, 1])
I may be misunderstanding your problem, but I think you just need a combination of round and then astype(int)?
E.g.
a = np.array([[1,0], [0,1], [-1,1]])
b = np.array([1,1,0])
x = np.linalg.lstsq(a,b)[0]
print x.round().astype(int)
+1 to seberg; here is a counterexample to illustrate that you should not just round the least-squares solution.
(Sorry, it's MATLAB style, but you'll easily pythonize it.)
a =
     3     0
     0     3
     1     1

b =
     2.71
    11.7
     0.5

x = a\b =
    0.5121
    3.5088

round(x) =
     1
     4

norm(a*round(x) - b) = 4.5193
norm(a*[0;4] - b)    = 4.4367
norm(a*[1;3] - b)    = 4.4299
The rounded least-squares solution [1;4] has a larger residual than either [0;4] or [1;3], so elementwise rounding does not give the best integer solution.
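Since the answer invites pythonizing, here is a direct NumPy port of the same counterexample:
import numpy as np

a = np.array([[3., 0.], [0., 3.], [1., 1.]])
b = np.array([2.71, 11.7, 0.5])
x = np.linalg.lstsq(a, b, rcond=None)[0]           # ≈ [0.5121, 3.5088]
print(np.linalg.norm(a @ np.round(x) - b))         # ≈ 4.5193
print(np.linalg.norm(a @ np.array([0., 4.]) - b))  # ≈ 4.4367
print(np.linalg.norm(a @ np.array([1., 3.]) - b))  # ≈ 4.4299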
I needed to do this and ended up porting a PHP program written by Keith Matthews, which you can find on http://www.numbertheory.org/php/php.html, into Python. I initially used Numpy arrays but ran into problems with integer overflows so switched to Sympy matrices, which use arbitrary precision numerical representations.
The code is released on GitHub at https://github.com/tclose/Diophantine under the MIT licence, so feel free to use it and let me know if you run into any problems (sorry it isn't better documented). The master branch uses Sympy but you can access the original Numpy implementation in the 'numpy' branch, which seems to work okay for reasonably sparse systems.
If you do end up using it for a scientific publication please cite Keith's papers (and maybe add a link to the GitHub repo).
My approach is to find a non-integer solution first, then scale it up to an integer one:
from fractions import Fraction
from functools import reduce
from math import gcd  # fractions.gcd was removed in Python 3.9

import numpy as np

def lcm(a, b):
    return a * b // gcd(a, b)

def common_int(*numbers):
    # Approximate each float by a fraction, bring everything over a common
    # denominator, then divide out the common factor.
    fractions = [Fraction(n).limit_denominator() for n in numbers[0]]
    multiple = reduce(lcm, [f.denominator for f in fractions])
    ints = [int(f * multiple) for f in fractions]
    divisor = reduce(gcd, ints)
    return [n // divisor for n in ints]

sol = np.linalg.solve(np.array([[1, 2, 3], [2, 1, 0], [2, 1, 4]]),
                      np.array([1., 1., 1.]))  # system of equations
# [0.3333333, 0.3333333, 0.]
common_int(sol)
# [1, 1, 0]
There is a method called block Lanczos, which can find your answer over a finite field. There are block Lanczos solvers available for this specific problem.

Additional information on numpy.einsum()

I am trying to understand the numpy.einsum() function, but the documentation and this answer from Stack Overflow still leave me with some questions.
Let's take the Einstein sum and the matrices defined in the answer.
A = np.array([0, 1, 2])
B = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
np.einsum('i,ij->i', A, B)
So, based on my understanding of the Einstein summation, I would translate this function to the notation (A_i*B_ij), so I would obtain:
j = 1 : A_1*B_11 + A_2*B_21 + A_3*B_31
j = 2 : A_1*B_12 + A_2*B_22 + A_3*B_32
and so on until j = 4. This gives
j = 1 : 0 + 4 + 16
j = 2 : 0 + 5 + 18
which would be the Einstein summation according to my understanding. Instead, the function does not perform the overall sum but stores the separate terms in a matrix where we can spot the results of the (A_i * B_ij):
 0  0  0  0
 4  5  6  7
16 18 20 22
How is this actually controlled by the function? I feel this is controlled by the output labels, as mentioned in the documentation:
The output can be controlled by specifying output subscript labels as
well. This specifies the label order, and allows summing to be
disallowed or forced when desired
so somehow I assume that putting ->i disables summing of the inner sums. But how does this work exactly? This is not clear to me. Putting ->j produces the actual Einstein sum as expected.
It seems your understanding of the Einstein summation is not correct. The subscript operations you've written out have the multiplication correct, but the summation is over the wrong axis.
Think about what this means: np.einsum('i,ij->i', A, B).
- A has shape (i,) and B has shape (i, j).
- Multiply every column of B by A.
- Sum over the second axis of B, i.e., over the axis labeled j.
This gives an output of shape (i,) == (3,), while your subscript notation gives an output of shape (j,) == (4,). You're summing over the wrong axis.
More detail:
Remember that the multiplication always happens first. The left-hand subscripts tell the np.einsum function which rows/columns/etc of the input arrays are to be multiplied with one another. The output of this step always has the same shape as the highest-dimensional input array. I.e., at this point, a hypothetical "intermediate" array has shape (3, 4) == B.shape.
After multiplication, there is summation. This is controlled by which subscripts are omitted from the right-hand side. In this case, j is omitted, which means to sum along axis 1 of the array. (You're summing along axis 0.)
If you instead wrote: np.einsum('i,ij->ij', A, B), there would be no summation, as no subscripts are omitted. Thus you'd get the array you've got at the end of your question.
Here are a couple of examples:
Ex 1:
No omitted subscripts, so no summation. Just multiply columns of B by A. This is the last array you've written out.
>>> (np.einsum('i,ij->ij', A, B) == (A[:, None] * B)).all()
True
Ex 2:
Same as the example. Multiply columns, then sum across the output's columns.
>>> (np.einsum('i,ij->i', A, B) == (A[:, None] * B).sum(axis=-1)).all()
True
Ex 3:
The sum as you've written it above. Multiply columns, then sum across the output's rows.
>>> (np.einsum('i,ij->j', A, B) == (A[:, None] * B).sum(axis=0)).all()
True
Ex 4:
Note that we can omit all axes at the end, to just get the total sum across the whole array.
>>> np.einsum('i,ij->', A, B)
98
Ex 5:
Note that the summation really happens because we repeated the input label 'i'. If we instead use different labels for each axis of the input arrays, we can compute things similar to Kronecker products:
>>> np.einsum('i,jk', A, B).shape
(3, 3, 4)
EDIT
The NumPy implementation of the Einstein sum differs a bit from the traditional definition. Technically, the Einstein sum doesn't have the idea of "output labels". Those are always implied by the repeated input labels.
From the docs: "Whenever a label is repeated, it is summed." So, traditionally, we'd write something like np.einsum('i,ij', A, B). This is equivalent to np.einsum('i,ij->j', A, B). The i is repeated, so it is summed, leaving only the axis labeled j. You can think about the sum in which we specify no output labels as being the same as specifying only the labels that are not repeated in the input. That is, the label 'i,ij' is the same as 'i,ij->j'.
The output labels are an extension or augmentation implemented in NumPy, which allow the caller to force summation or to enforce no summation on an axis. From the docs: "The output can be controlled by specifying output subscript labels as well. This specifies the label order, and allows summing to be disallowed or forced when desired."
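A quick check of that equivalence, reusing the A and B from the question:
import numpy as np

A = np.array([0, 1, 2])
B = np.arange(12).reshape(3, 4)

# Implicit output: the repeated label i is summed away, leaving j.
print(np.array_equal(np.einsum('i,ij', A, B),
                     np.einsum('i,ij->j', A, B)))  # True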

numpy: unexpected result when dividing a vertical array by one of its own elements

So, I created a vertical numpy array, used the /= operator, and the output seems to be incorrect.
Basically, if x is a vector and s a scalar, I would expect x /= s to divide every entry of x by s. However, I couldn't make much sense of the output: the operator is only applied to some of the entries of x, and I am not sure how they are chosen.
In [8]: np.__version__
Out[8]: '1.10.4'
In [9]: x = np.random.rand(5,1)
In [10]: x
Out[10]:
array([[ 0.47577008],
       [ 0.66127875],
       [ 0.49337183],
       [ 0.47195985],
       [ 0.82384023]])  ####
In [11]: x /= x[2]
In [12]: x
Out[12]:
array([[ 0.96432356],
       [ 1.3403253 ],
       [ 1.        ],
       [ 0.95660073],
       [ 0.82384023]])  #### this entry is not changed.
Your value of x[2] changes to 1 midway through the evaluation. You need to make a copy of the value and then divide each element by it; either assign it to another variable or use copy, i.e.
from copy import copy
x /= copy(x[2])
To understand why we need to do this, let's look under the hood at what is happening.
In [9]: x = np.random.rand(5,1)
Here we define x as an array, but what isn't exactly clear is that each element of this array is technically an array too. This is the important distinction: we are not dealing with fixed values but with numpy array objects, so in the next line:
In [11]: x /= x[2]
We end up essentially 'looking up' the value of x[2], which returns a one-element array; and because this lookup happens during the in-place operation, the value can change partway through.
A cleaner solution would be to flatten the array to 1d, so that x[2] will now equal 0.49337183 instead of array([ 0.49337183]).
So before we do x /= x[2] we can call x = x.flatten().
Or better yet, keep it 1d from the start: x = np.random.rand(5).
As for the reason x[3] changes and x[4] does not, the only really helpful answer I can give is that the division does not happen in order; complex buffering, timey-wimey stuff.
In theory this only happens for odd-sized vectors, but if you do:
x = np.random.rand(5,1)
a = x[2]*1
x /= a
it will work, because x[2]*1 allocates a new array instead of a view into x.
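To summarize the fix in one place (a minimal sketch; .copy() is the idiomatic NumPy spelling of the copy-before-divide idea):
import numpy as np

x = np.random.rand(5, 1)
pivot = x[2].copy()  # x[2] is a view into x; .copy() detaches it
x /= pivot           # every entry is now divided by the original x[2]
print(x[2])          # -> [ 1.]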
