Python SciPy linprog optimization fails with status 3

Trying to minimize a simple linear function with linprog. The coefficients are the elements of arr2 multiplied by -1. There are only inequality constraints, one pair per variable: -1 <= x1 <= 1, -2 <= x2 <= 2, and so on.
If I choose not to specify bounds in linprog:
from scipy.optimize import linprog
import numpy as np
import pandas as pd
numdim = 28
arr1 = np.ones(numdim)
arr1 = - arr1
arr2 = np.array([
19.53,
128.97,
3538,
931.8,
0.1825,
150.88,
10315,
0.8109,
3.9475,
3022,
31.77,
10323,
110.93,
220,
2219.5,
119.2,
703.6,
616,
338,
84.67,
151.13,
111.28,
29.515,
29.67,
158800,
167.15,
0.06802,
1179
])
constr_a = []
for i in range(numdim):
    constr_default = np.zeros(numdim)
    constr_default[i] = 1
    constr_a.append(constr_default)
for i in range(numdim):
    constr_default = np.zeros(numdim)
    constr_default[i] = -1
    constr_a.append(constr_default)
constr_a = np.asarray(constr_a)
constr_b = np.arange(1, 2*numdim + 1, 1)
constr_b[numdim:] = constr_b[:numdim]
print linprog(np.transpose(arr1 * arr2), constr_a, constr_b, bounds=(None, None))
I get the following result:
fun: -4327476.2887400016
message: 'Optimization failed. The problem appears to be unbounded.'
status: 3
I've tried changing the last line to:
print linprog(np.transpose(arr1 * arr2), constr_a, constr_b, bounds=(-1000, 1000))
The numbers used as bounds here are arbitrary. The output is:
fun: -4327476.2887400296
message: 'Optimization terminated successfully.'
status: 0
which gives a slightly different result and the desired status.
My question is: am I misusing the library, and if so, how? Which answer is correct? I expected this code to work without specifying the bounds parameter; I cannot use a single bounds pair because these simple constraints are different for each variable.
I use Python 2.7 and SciPy 0.17.1. Many thanks in advance.
Update
constr_a should be a matrix according to the documentation (https://docs.scipy.org/doc/scipy/reference/optimize.linprog-simplex.html), and it actually is one in the code. To make sure the syntax is correct, we can reduce the number of dimensions to 2:
from scipy.optimize import linprog
import numpy as np
import pandas as pd
numdim = 2
arr1 = np.ones(numdim)
arr1 = - arr1
arr2 = np.array([
19.53,
128.97
])
constr_a = []
for i in range(numdim):
    constr_default = np.zeros(numdim)
    constr_default[i] = 1
    constr_a.append(constr_default)
for i in range(numdim):
    constr_default = np.zeros(numdim)
    constr_default[i] = -1
    constr_a.append(constr_default)
constr_a = np.asarray(constr_a)
constr_b = np.arange(1, 2*numdim + 1, 1)
constr_b[numdim:] = constr_b[:numdim]
print constr_a
print constr_b
print linprog(np.transpose(arr1 * arr2), constr_a, constr_b, bounds=(None, None))
and this will work.

The constr_a list is not properly formed. It is an array of arrays instead of an array of scalars. This might be leading to an improper lower bound, causing the optimization to fail.
Perhaps
constr_a.append(constr_default)
should be
constr_a.append(constr_default[i])
Inspect both bound arrays to make sure they have the proper form and values.
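As a side note (not part of the original answer): linprog's bounds parameter also accepts a sequence of (min, max) pairs, one per variable, so per-variable box constraints can be expressed directly without building constr_a at all. A minimal sketch using the 2-variable case from the question:
from scipy.optimize import linprog
import numpy as np

numdim = 2
c = -np.array([19.53, 128.97])  # arr1 * arr2 from the question
# one (lower, upper) pair per variable: -1 <= x1 <= 1, -2 <= x2 <= 2
var_bounds = [(-(i + 1), i + 1) for i in range(numdim)]
print(linprog(c, bounds=var_bounds))  # status 0, same optimum as the constraint-matrix version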

SymPy lambdify gives wrong result, while *.subs gives the accurate one

Sorry to bother you with this. I have a serious issue and I'm on the clock to solve it, so here is my question.
I lambdify a quantity, but the result differs from the .subs result; sometimes it's way off, or it's a NaN where in reality there is a real number (found by subs).
Here is a small MWE where you can see the issue. Thanks in advance for your time!
import sympy as sy
import numpy as np
##STACK
# some quantities needed before you see the problem
r = sy.Symbol('r', real=True)
th = sy.Symbol('th', real=True)
e_c = 1e51
lf0 = 100
A = 1.6726e-24
# here are some quantities I define leading up to the problem
lfac = lf0+2
rd = 4*3.14/4/sy.pi/A/lfac**2
xi = r/rd #rescaled r
#now to the problem:
#QUANTITY
lfxi = xi**(-3)*(lfac+1)/2*(sy.sqrt( 1 + 4*lfac/(lfac+1)*xi**(3) + (2*xi**(3)/(lfac+1))**2) -1)
#RESULT WITH SUBS
print(lfxi.subs({th:1.00,r:1.00}).evalf())
#RESULT WITH LAMBDIFY
lfxi_l = sy.lambdify((r,th),lfxi)
print(lfxi_l(0.01, 1.00))
# gives 0
The issue is that your mpmath precision needs to be set higher! By default mpmath uses prec=53 and dps=15, but your expression requires much higher precision than that:
# print(lfxi)
3.0256512324559e+62*(sqrt(1.09235114769539e-125*pi**6*r**6 + 6.74235013645028e-61*pi**3*r**3 + 1) - 1)/(pi**3*r**3)
...
from mpmath import mp
lfxi_l = sy.lambdify((r,th),lfxi, modules=["mpmath"])
mp.dps = 125
print(lfxi_l(1.00,1.00))
# 101.999... result
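To see why double precision fails here (an illustration, not part of the original answer): the expression effectively computes sqrt(1 + eps) - 1 with eps many orders of magnitude below 1 (compare the 1e-61-scale coefficient in the printed expression), and 1 + eps rounds to exactly 1.0 in 53-bit floats. mpmath with enough digits resolves it, and the identity sqrt(1 + x) - 1 == expm1(0.5*log1p(x)) avoids the cancellation even in plain doubles:
import math
from mpmath import mp, mpf, sqrt

eps = 1e-61
print(math.sqrt(1 + eps) - 1)              # 0.0: 1 + eps rounds to 1.0 in double precision
mp.dps = 80
print(sqrt(1 + mpf('1e-61')) - 1)          # ~5.0e-62: resolved with 80 significant digits
print(math.expm1(0.5 * math.log1p(eps)))   # ~5.0e-62 even in plain double precision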
Changing a couple of the constants to "modest" values:
In [89]: e_c=1; A=1
The different methods produce essentially the same thing:
In [91]: lfxi.subs({th:1.00,r:1.00}).evalf()
Out[91]: 1.00000000461176
In [92]: lfxi_l = sy.lambdify((r,th),lfxi)
In [93]: lfxi_l(1.0,1.00)
Out[93]: 1.000000004611762
In [94]: lfxi_m = sy.lambdify((r,th),lfxi, modules=["mpmath"])
In [95]: lfxi_m(1.0,1.00)
Out[95]: mpf('1.0000000046117619')

Efficient way to get all numpy slices for different ranges

I want to slice the same numpy array (data_ar) multiple times, finding the values in a different range each time:
data_ar shape: (203,)
range_ar shape: (1000,)
I implemented it with a for loop, but it takes way too long since I have a lot of data arrays:
# create results array
results_ar = np.zeros(shape=(1000,), dtype=object)
i = 0
for r in range_ar:
    results_ar[i] = data_ar[(data_ar >= (r - delta)) & (data_ar < (r + delta))].values
    i += 1
So, for example:
data_ar = [1, 3, 4, 6, 10, 12]
range_ar = [7, 4, 2]
delta = 3
expected output (note results_ar has shape=(3,) and dtype=object; each element is an array):
results_ar = [[6, 10],
              [1, 3, 4, 6],
              [1, 3, 4]]
Any ideas on how to tackle this?
You can use numba to speed up the computations.
import numpy as np
import numba
from numba.typed import List
import timeit
data_ar = np.array([1,3,4,6,10,12])
range_ar = np.array([7,4,2])
delta = 3
def foo(data_ar, range_ar):
    results_ar = list()
    for i in range_ar:
        results_ar.append(data_ar[(data_ar >= (i - delta)) & (data_ar < (i + delta))])
    return results_ar
print(timeit.timeit(lambda: foo(data_ar, range_ar)))

@numba.njit(parallel=True, fastmath=True)
def foo(data_ar, range_ar):
    results_ar = List()  # numba's typed list
    for i in range_ar:
        results_ar.append(data_ar[(data_ar >= (i - delta)) & (data_ar < (i + delta))])
    return results_ar
print(timeit.timeit(lambda: foo(data_ar, range_ar)))
15.53519330600102
1.6557575029946747
That is roughly a 9.4x speedup (the one-time JIT compilation cost is amortized over timeit's many iterations).
Alternatively, you can use np.searchsorted, which requires data_ar to be sorted (as it is in the example):
data_ar = np.array([1, 3, 4, 6, 10, 12])
range_ar = np.array([7, 4, 2])
delta = 3
# one (lower, upper) row per range value
bounds = range_ar[:, None] + delta * np.array([-1, 1])
# searchsorted maps each bound to an insertion index, so each row becomes a slice
result = [data_ar[slice(*row)] for row in np.searchsorted(data_ar, bounds)]
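On the example data this yields [array([4, 6]), array([1, 3, 4, 6]), array([1, 3, 4])], which matches the >=/< condition of the original loop (the question's expected output handles the boundaries slightly differently).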

Using numpy digitize output in scipy minimize problem

I am trying to minimize the quadratic weighted kappa function using scipy's fmin_powell optimizer.
The two functions digitize_train and digitize_train2 give 100% EXACTLY the same results.
However, when I try to use these functions with scipy minimize, the second method fails.
I have been trying to debug the problem for hours; to my surprise, despite the two functions being exactly the same, the numpy digitize version fails in the fmin_powell minimization.
How can I fix the error?
Question
How to use numpy.digitize in scipy fmin_powell?
SETUP
# imports
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.optimize import fmin_powell
from sklearn import metrics
# data
train_labels = [1,1,8,7,6,5,3,2,4,4]
train_preds = [0.1,1.2,8.9, 7.6, 5.5, 5.5, 2.99, 2.4, 3.5, 4.0]
guess_lst = (1.5,2.9,3.1,4.5,5.5,6.1,7.1)
# functions
# here I am trying to convert real numbers (-inf to +inf) to integers 1 to 8
def digitize_train(train_preds, guess_lst):
    (x1, x2, x3, x4, x5, x6, x7) = list(guess_lst)
    res = []
    for y in list(train_preds):
        if y < x1:
            res.append(1)
        elif y < x2:
            res.append(2)
        elif y < x3:
            res.append(3)
        elif y < x4:
            res.append(4)
        elif y < x5:
            res.append(5)
        elif y < x6:
            res.append(6)
        elif y < x7:
            res.append(7)
        else:
            res.append(8)
    return res

def digitize_train2(train_preds, guess_lst):
    return np.digitize(train_preds, guess_lst) + 1
# compare two functions
df = pd.DataFrame({'train_labels': train_labels,
                   'train_preds': train_preds,
                   'method_1': digitize_train(train_preds, guess_lst),
                   'method_2': digitize_train2(train_preds, guess_lst)})
df
**NOTE: The two functions give exactly the same results.**
Method 1: without numpy digitize runs fine
# using fmin_powel for method 1
def get_offsets_minimizing_train_preds_kappa(guess_lst):
    res = digitize_train(train_preds, guess_lst)
    return -metrics.cohen_kappa_score(train_labels, res, weights='quadratic')
offsets = fmin_powell(get_offsets_minimizing_train_preds_kappa, guess_lst, disp=True)
print(offsets)
Method 2: using numpy digitize fails
# using fmin_powell for method 2
def get_offsets_minimizing_train_preds_kappa2(guess_lst):
    res = digitize_train2(train_preds, guess_lst)
    return -metrics.cohen_kappa_score(train_labels, res, weights='quadratic')
offsets = fmin_powell(get_offsets_minimizing_train_preds_kappa2, guess_lst, disp=True)
print(offsets)
How to use numpy digitize method?
Update
As per the suggestions, I tried pandas cut, but it still gives an error:
ValueError: bins must increase monotonically.
# using fmin_powell for method 3
def get_offsets_minimizing_train_preds_kappa3(guess_lst):
    res = pd.cut(train_preds, bins=[-np.inf] + list(guess_lst) + [np.inf],
                 right=False)
    res = pd.Series(res).cat.codes + 1
    res = res.to_numpy()
    return -metrics.cohen_kappa_score(train_labels, res, weights='quadratic')
offsets = fmin_powell(get_offsets_minimizing_train_preds_kappa3, guess_lst, disp=True)
print(offsets)
It seems that during the minimization process the values in guess_lst are no longer monotonically increasing; one workaround is to pass a sorted copy of guess_lst to digitize, like:
def digitize_train2(train_preds, guess_lst):
    return np.digitize(train_preds, sorted(guess_lst)) + 1
and you get
# using fmin_powell for method 2
def get_offsets_minimizing_train_preds_kappa2(guess_lst):
    res = digitize_train2(train_preds, guess_lst)
    return -metrics.cohen_kappa_score(train_labels, res, weights='quadratic')
offsets = fmin_powell(get_offsets_minimizing_train_preds_kappa2, guess_lst, disp=True)
print(offsets)
Optimization terminated successfully.
Current function value: -0.990792
Iterations: 2
Function evaluations: 400
[1.5 2.7015062 3.1 4.50379942 4.72643334 8.12463415
7.13652301]
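The same workaround should also fix the pd.cut variant from the update, since pd.cut imposes the same monotonic-bins requirement. A sketch (an assumption, not part of the original answer; it presumes the optimizer never produces duplicate edge values):
# using fmin_powell for method 3, with sorted bin edges
def get_offsets_minimizing_train_preds_kappa3(guess_lst):
    bins = [-np.inf] + sorted(guess_lst) + [np.inf]  # sort so the bin edges stay monotonic
    res = pd.cut(train_preds, bins=bins, right=False)
    res = pd.Series(res).cat.codes + 1
    return -metrics.cohen_kappa_score(train_labels, res.to_numpy(), weights='quadratic')
offsets = fmin_powell(get_offsets_minimizing_train_preds_kappa3, guess_lst, disp=True)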

NLopt minimize eigenvalue, Python

I have matrices whose elements can be defined as arithmetic expressions, and I have written Python code to optimise parameters in these expressions in order to minimize particular eigenvalues of the matrix. I have used scipy to do this, but was wondering whether it is possible with NLopt, as I would like to try a few more of the algorithms it offers (derivative-free variants).
In scipy I would do something like this:
import numpy as np
from scipy.linalg import eig
from scipy.optimize import minimize
def my_func(x):
    y, w = x
    arr = np.array([[y + w, -2], [-2, w - 2*(w + y)]])
    ev, ew = eig(arr)
    return ev[0]
x0 = np.array([10, 3.45]) # Initial guess
minimize(my_func, x0)
In NLopt I have tried this:
import numpy as np
from scipy.linalg import eig
import nlopt
def my_func(x, grad):
    arr = np.array([[x[0] + x[1], -2], [-2, x[1] - 2*(x[1] + x[0])]])
    ev, ew = eig(arr)
    return ev[0]
opt = nlopt.opt(nlopt.LN_BOBYQA, 2)
opt.set_lower_bounds([1.0,1.0])
opt.set_min_objective(my_func)
opt.set_xtol_rel(1e-7)
x = opt.optimize([10.0, 3.5])
minf = opt.last_optimum_value()
print "optimum at ", x[0],x[1]
print "minimum value = ", minf
print "result code = ", opt.last_optimize_result()
This returns:
ValueError: nlopt invalid argument
Is NLopt able to process this problem?
my_func should return a double; the posted sample returns a complex value:
print(type(ev[0]))
# <class 'numpy.complex128'>
ev[0]
# (13.607794065928395+0j)
A corrected version of my_func:
def my_func(x, grad):
    arr = np.array([[x[0] + x[1], -2], [-2, x[1] - 2*(x[1] + x[0])]])
    ev, ew = eig(arr)
    return ev[0].real
The updated sample returns:
optimum at [ 1. 1.]
minimum value = 2.7015621187164243
result code = 4
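A side note beyond the original answer: the example matrix is symmetric, so numpy.linalg.eigvalsh could be used instead of eig. It returns real eigenvalues in ascending order, which removes the need for .real and makes the targeted eigenvalue unambiguous (eig guarantees no ordering). A sketch; the [-1] index picks the largest eigenvalue, which appears to be the one the posted run minimized:
import numpy as np
import nlopt

def my_func(x, grad):
    # symmetric matrix: eigvalsh returns real eigenvalues, sorted ascending
    arr = np.array([[x[0] + x[1], -2], [-2, x[1] - 2*(x[1] + x[0])]])
    return np.linalg.eigvalsh(arr)[-1]

opt = nlopt.opt(nlopt.LN_BOBYQA, 2)
opt.set_lower_bounds([1.0, 1.0])
opt.set_min_objective(my_func)
opt.set_xtol_rel(1e-7)
x = opt.optimize([10.0, 3.5])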

Interval containing specified percent of values

With numpy or scipy, is there any existing method that will return the endpoints of an interval which contains a specified percent of the values in a 1D array? I realize that this is simple to write myself, but it seems like the kind of thing that might be built in, although I can't find it.
E.g:
>>> import numpy as np
>>> x = np.random.randn(100000)
>>> print(np.bounding_interval(x, 0.68))
Would give approximately (-1, 1)
You can use np.percentile; centering the interval leaves 50*(1 - p) percent of the values in each tail:
In [29]: x = np.random.randn(100000)
In [30]: p = 0.68
In [31]: lo = 50*(1 - p)
In [32]: hi = 50*(1 + p)
In [33]: np.percentile(x, [lo, hi])
Out[33]: array([-0.99206523, 1.0006089 ])
There is also scipy.stats.scoreatpercentile:
In [34]: scoreatpercentile(x, [lo, hi])
Out[34]: array([-0.99206523, 1.0006089 ])
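np.quantile gives the same result on a 0-1 scale (a minimal equivalent sketch, not part of the original answers):
import numpy as np

x = np.random.randn(100000)
p = 0.68
# central interval: (1 - p)/2 of the mass in each tail
lo, hi = np.quantile(x, [(1 - p) / 2, (1 + p) / 2])
print(lo, hi)  # approximately (-1, 1) for standard normal data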
I don't know of a built-in function to do it, but you can write one yourself, using math.floor to turn the fractional positions into indices, like this:
from __future__ import division
import math
import numpy as np

def bound_interval(arr_in, interval):
    lhs = (1 - interval) / 2  # fraction to exclude on the left-hand side
    rhs = 1 - lhs             # and on the right-hand side
    arr_sorted = np.sort(arr_in)
    lower = arr_sorted[int(math.floor(lhs * len(arr_in)))]  # floor to get an integer index
    upper = arr_sorted[int(math.floor(rhs * len(arr_in)))]
    return (lower, upper)
On your specified array, I got the interval (-0.99072237819851039, 0.98691691784955549). Pretty close to (-1, 1)!
