Minimization of an equation using Python

I have four vectors.
x  = [0.4, -0.3, 0.9]
y1 = [0.3, 1, 0]
y2 = [1, -0.9, 0.5]
y3 = [0.6, 0.01, 0.8]
I need to minimize the following equation:
L(a, b, g) = ||x - a*y1 - b*y2 - g*y3||
where 0 <= a, b, g <= 1. I have tried to use scipy.minimize, but I could not understand how it can be applied to this equation. Is there any library for optimization that I can use, or is there any easier way to do this in Python?
My ultimate goal is to find the values of a, b, g between 0 and 1 that give the minimum value, given these four vectors as input.

Edit 0: I fixed the problem by using a Bounds instance. The array x in the output below is what you are looking for. Here is the answer:
      fun: 0.34189582276366093
 hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 6.91014296e-01,  3.49720253e-07, -2.88657986e-07])
  message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 40
      nit: 8
   status: 0
  success: True
        x: array([0.        , 0.15928136, 0.79907217])
I worked on it a little bit. I got stuck with an error, but I feel like I am on the right track. Here is the code.
import numpy as np
from scipy.optimize import Bounds, minimize

def cost_function(ini):
    # ini holds the current guess for (a, b, g)
    x = np.array([0.4, -0.3, 0.9])
    y1 = np.array([0.3, 1, 0])
    y2 = np.array([1, -0.9, 0.5])
    y3 = np.array([0.6, 0.01, 0.8])
    # Euclidean norm of the residual x - a*y1 - b*y2 - g*y3
    L = np.linalg.norm(x - ini[0]*y1 - ini[1]*y2 - ini[2]*y3)
    return L

ini = np.random.rand(3)   # random start in [0, 1)^3
min_b = np.zeros(3)
max_b = np.ones(3)
bnds = Bounds(min_b, max_b)
print(minimize(cost_function, x0=ini, bounds=bnds))
However, I am getting the error ValueError: length of x0 != length of bounds, even though the lengths are equal. I could not find a solution; maybe you can. Good luck, and let me know if you find something that works!
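In case it helps, here is a minimal sketch (my own consolidation, not the original poster's final code) that sidesteps the Bounds/x0 length check by passing the bounds as a plain sequence of (min, max) pairs, which older SciPy versions also accept, and naming the method explicitly. The vectors and cost function are the ones from the question.
import numpy as np
from scipy.optimize import minimize

x = np.array([0.4, -0.3, 0.9])
y1 = np.array([0.3, 1, 0])
y2 = np.array([1, -0.9, 0.5])
y3 = np.array([0.6, 0.01, 0.8])

def cost_function(ini):
    # Norm of the residual x - a*y1 - b*y2 - g*y3 for ini = (a, b, g)
    return np.linalg.norm(x - ini[0]*y1 - ini[1]*y2 - ini[2]*y3)

# One (min, max) pair per variable.
res = minimize(cost_function, x0=np.random.rand(3),
               method='L-BFGS-B', bounds=[(0, 1)] * 3)
print(res.x, res.fun)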


Issue with scipy minimizer and equation

I am trying to minimize a simple expression with the scipy minimizer but, oddly, it seems the minimizer does not even try: it sends me back a really bad result straight away.
The expression has two variables that I'd like to optimize over; here is the code I use:
from scipy.stats import poisson
import scipy.optimize

def objective_function(guess):
    x = guess[0]
    y = guess[1]
    return (poisson.pmf(1, x) * poisson.pmf(2, y) - 1/9.4
            + poisson.pmf(1, x) * poisson.pmf(3, y) - 1/14)

initialGuess = [0.0, 0.0]
scipy.optimize.minimize(objective_function, initialGuess)
and here is the result I get from the minimizer:
      fun: -0.1778115501519757
 hess_inv: array([[1, 0],
                  [0, 1]])
      jac: array([0., 0.])
  message: 'Optimization terminated successfully.'
     nfev: 3
      nit: 0
     njev: 1
   status: 0
  success: True
        x: array([0., 0.])
Trying on my side, I can clearly see that this is not even close to the best answer; [1, 1.5], for example, returns -0.03.
Is there something big I am missing with the optimizer from scipy?
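One plausible reading of this output (my own note, not from the thread): at the initial guess [0, 0] every pmf factor is zero, so the gradient is exactly [0., 0.] (see jac above) and the solver declares convergence after nit: 0 iterations. Note also that the pmf products are nonnegative, so the objective can never drop below -1/9.4 - 1/14, which is about -0.1778; in that sense [0, 0] already attains the lower bound. A sketch with a different start, keeping both rates nonnegative:
from scipy.stats import poisson
import scipy.optimize

def objective_function(guess):
    x, y = guess
    return (poisson.pmf(1, x) * poisson.pmf(2, y) - 1/9.4
            + poisson.pmf(1, x) * poisson.pmf(3, y) - 1/14)

# Starting away from the stationary point [0, 0] lets the solver iterate,
# but the best it can do is drive the nonnegative pmf products toward zero.
result = scipy.optimize.minimize(objective_function, [1.0, 1.0],
                                 bounds=[(0, None), (0, None)])
print(result.x, result.fun)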

Keep receiving Too many indices for array for interpolation

interp - Program to interpolate data using Lagrange polynomial
I am not able to get the for loop at the end of the code below to complete. I don't see anything wrong with it, since I use np.empty(nplot) to create the 1D array for xi, yet for some reason the loop won't fill those values.
def intrpf(xi, x, y):
    """Function to interpolate between data points
    using Lagrange polynomial (quadratic)
    Inputs
      x    Vector of x coordinates of data points (3 values)
      y    Vector of y coordinates of data points (3 values)
      xi   The x value where interpolation is computed
    Output
      yi   The interpolation polynomial evaluated at xi
    """
    #* Calculate yi = p(xi) using Lagrange polynomial
    yi = ( (xi-x[1])*(xi-x[2])/((x[0]-x[1])*(x[0]-x[2])) * y[0]
         + (xi-x[0])*(xi-x[2])/((x[1]-x[0])*(x[1]-x[2])) * y[1]
         + (xi-x[0])*(xi-x[1])/((x[2]-x[0])*(x[2]-x[1])) * y[2] )
    return yi
#* Initialize the data points to be fit by quadratic
x = np.empty(3)
y = np.empty(3)
print('Enter data points as x,y pairs (e.g., [1, 2]')
for i in range(3):
    temp = np.array(input('Enter data point: '))
    x[i] = temp[0]
    y[i] = temp[1]

#* Establish the range of interpolation (from x_min to x_max)
xr = np.array(input('Enter range of x values as [x_min, x_max]: '))
I'm getting stuck on this part; the setup looks right to me, but "too many indices for array" appears on xi[i] within the for loop.
#* Find yi for the desired interpolation values xi using
#  the function intrpf
nplot = 100   # Number of points for interpolation curve
xi = np.empty(nplot)
yi = np.empty(nplot)
for i in range(nplot):
    xi[i] = xr[0] + (xr[1]-xr[0]) * i / float(nplot)
    yi[i] = intrpf(xi[i], x, y)   # Use intrpf function to interpolate
From the docs of np.array:
Parameters:
object : array_like
An array, any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence.
This means array should receive something like a list in order to do the conversion, while input returns a string. So what Python is trying to do here, at the end of the day, is something like
np.array('[1, 2]')
which produces a 0-dimensional array wrapping a single string, not a numeric array.
While it might be tempting to do something like
np.array(eval(input()))
you should never do this: it is unsafe, because it allows the user to execute arbitrary code in your program. If you really need that kind of input, I would suggest something like
np.array(list(map(int, input('Enter data point: ')
                       .replace('[', '')
                       .replace(']', '')
                       .split(','))))
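A further option, beyond what this answer suggests: the standard library's ast.literal_eval is the usual safe replacement for eval here, since it parses only Python literals such as [1, 2]:
import ast
import numpy as np

# literal_eval turns the string '[1, 2]' into the list [1, 2], but
# rejects arbitrary expressions, unlike eval.
temp = np.array(ast.literal_eval(input('Enter data point: ')))
print(temp[0], temp[1])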
The error occurs with your data input lines:
Enter data points as x,y pairs (e.g., [1, 2]
Enter data point: [1,2]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-6-8d648ad8c9e4> in <module>
22 for i in range(3):
23 temp = np.array(input('Enter data point: '))
---> 24 x[i] = temp[0]
25 y[i] = temp[1]
26
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
The code doesn't even get to the "I choose np.empty(nplot) to create the 1D array for xi, and for some reason the loop won't fill those values" part.
When asking for help, give full and accurate information about the error.
If I change the input lines to:
...: x = np.empty(3)
...: y = np.empty(3)
...: print('Enter data points as x,y pairs')
...: for i in range(3):
...:     temp = input('Enter data point: ').split()
...:     x[i] = temp[0]
...:     y[i] = temp[1]
...:
...: #* Establish the range of interpolation (from x_min to x_max)
...: xr = np.array(input('Enter range of x values as x_min, x_max: ').split(), float)
Enter data points as x,y pairs
Enter data point: 1 2
Enter data point: 3 4
Enter data point: 5 6
Enter range of x values as x_min, x_max: 0 4
In [9]: x
Out[9]: array([1., 3., 5.])
In [10]: y
Out[10]: array([2., 4., 6.])
In [11]: xr
Out[11]: array([0., 4.])
Getting array values via user input is not ideal, but this at least works. input (in Py3) does not evaluate the input; it just returns a string. I split it (on whitespace by default) and then assign the values to an array. x is defined as a float array, so x[i] = temp[0] takes care of converting the string to float. Similarly, the xr line makes a float array from the string inputs. This input style is not very robust; wrong input can easily raise an error.
===
The rest of the code runs with this input:
In [12]: nplot = 100   # Number of points for interpolation curve
    ...: xi = np.empty(nplot)
    ...: yi = np.empty(nplot)
    ...: for i in range(nplot):
    ...:     xi[i] = xr[0] + (xr[1]-xr[0]) * i / float(nplot)
    ...:     yi[i] = intrpf(xi[i], x, y)   # Use intrpf function to interpolate
    ...:
In [13]: xi
Out[13]:
array([0. , 0.04, 0.08, 0.12, 0.16, 0.2 , 0.24, 0.28, 0.32, 0.36, 0.4 ,
0.44, 0.48, 0.52, 0.56, 0.6 , 0.64, 0.68, 0.72, 0.76, 0.8 , 0.84,
...
3.52, 3.56, 3.6 , 3.64, 3.68, 3.72, 3.76, 3.8 , 3.84, 3.88, 3.92,
3.96])
In [14]: yi
Out[14]:
array([1. , 1.04, 1.08, 1.12, 1.16, 1.2 , 1.24, 1.28, 1.32, 1.36, 1.4 ,
1.44, 1.48, 1.52, 1.56, 1.6 , 1.64, 1.68, 1.72, 1.76, 1.8 , 1.84,
....
4.52, 4.56, 4.6 , 4.64, 4.68, 4.72, 4.76, 4.8 , 4.84, 4.88, 4.92,
4.96])
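A small aside, not from the original answers: since intrpf uses only elementwise arithmetic, the filling loop can be replaced by one vectorized call. This sketch assumes the intrpf function and the sample inputs from the session above:
import numpy as np

x = np.array([1., 3., 5.])
y = np.array([2., 4., 6.])
xr = np.array([0., 4.])

nplot = 100
# Same grid as the loop: nplot evenly spaced points from xr[0] (inclusive)
# up to, but not including, xr[1].
xi = xr[0] + (xr[1] - xr[0]) * np.arange(nplot) / nplot
yi = intrpf(xi, x, y)   # the arithmetic in intrpf broadcasts over all of xi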

Pythonic way to remove elements from Numpy array closer than threshold

What is the best way to remove the minimal number of elements from a sorted Numpy array so that the minimal distance among the remaining elements is always bigger than a certain threshold?
For example, if the threshold is 1, the sequence [0.1, 0.5, 1.1, 2.5, 3.] becomes [0.1, 1.1, 2.5]. The 0.5 is removed because it is too close to 0.1, but 1.1 is preserved because it is far enough from 0.1.
My current code:
import numpy as np

MIN_DISTANCE = 1
a = np.array([0.1, 0.5, 1.1, 2.5, 3.])

for i in range(len(a) - 1):
    if a[i+1] - a[i] < MIN_DISTANCE:
        a[i+1] = a[i]

a = np.unique(a)
a
array([0.1, 1.1, 2.5])
Is there a more efficient way to do so?
Note that my question is similar to "Remove values from numpy array closer to each other" but not exactly the same.
You could use numpy.ufunc.accumulate to iterate through adjacent pairs of the array instead of the for loop.
The numpy.add.accumulate example or itertools.accumulate probably shows best what it does.
Along with numpy.frompyfunc, your condition can be applied as a ufunc (universal function).
Code (with an extended array to cross-check some additional cases; it works with your array as well):
import numpy as np

MIN_DISTANCE = 1
a = np.array([0.1, 0.5, 0.6, 0.7, 1.1, 2.5, 3., 4., 6., 6.1])
print("original: \n" + str(a))

def my_py_function(arr1, arr2):
    # Propagate the last kept value forward when the gap is too small.
    if arr2 - arr1 < MIN_DISTANCE:
        arr2 = arr1
    return arr2

my_np_function = np.frompyfunc(my_py_function, 2, 1)
my_np_function.accumulate(a, dtype=object, out=a).astype(float)
print("complete: \n" + str(a))

a = np.unique(a)
print("unique: \n" + str(a))
Result:
original:
[0.1 0.5 0.6 0.7 1.1 2.5 3.  4.  6.  6.1]
complete:
[0.1 0.1 0.1 0.1 1.1 2.5 2.5 4.  6.  6. ]
unique:
[0.1 1.1 2.5 4.  6. ]
Concerning execution time, timeit shows a turnaround at an array length of about 20:
your code is relatively faster for your array length of 5,
whereas for array lengths well above 20 the accumulate option speeds up considerably (roughly 35% less time at array length 300).
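For comparison, here is a plain greedy pass written as a standalone function; this is my own sketch of the same keep-if-far-enough rule, not code from the thread, and it avoids mutating the input array:
import numpy as np

def filter_min_distance(a, min_distance):
    # Greedily keep elements of the sorted array `a` that are at least
    # `min_distance` away from the last element that was kept.
    kept = [a[0]]
    for value in a[1:]:
        if value - kept[-1] >= min_distance:
            kept.append(value)
    return np.array(kept)

print(filter_min_distance(np.array([0.1, 0.5, 1.1, 2.5, 3.]), 1))
# [0.1 1.1 2.5]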

Normalization VS. numpy way to normalize?

I'm supposed to normalize an array. I've read about normalization and came across this formula:
x_norm = (x - min(x)) / (max(x) - min(x))
I wrote the following function for it:
def normalize_list(list):
    max_value = max(list)
    min_value = min(list)
    for i in range(0, len(list)):
        list[i] = (list[i] - min_value) / (max_value - min_value)
That is supposed to normalize an array of elements.
Then I have come across this: https://stackoverflow.com/a/21031303/6209399
Which says you can normalize an array by simply doing this:
def normalize_list_numpy(list):
    normalized_list = list / np.linalg.norm(list)
    return normalized_list
If I normalize this test array test_array = [1, 2, 3, 4, 5, 6, 7, 8, 9] with my own function and with the numpy method, I get these answers:
My own function: [0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
The numpy way: [0.059234887775909233, 0.11846977555181847, 0.17770466332772769, 0.23693955110363693, 0.29617443887954614, 0.35540932665545538, 0.41464421443136462, 0.47387910220727386, 0.5331139899831830
Why do the functions give different answers? Are there other ways to normalize an array of data? What does numpy.linalg.norm(list) do? What am I getting wrong?
There are different types of normalization; you are using min-max normalization. The min-max normalization from scikit-learn is as follows.
import numpy as np
from sklearn.preprocessing import minmax_scale

# your function
def normalize_list(list_normal):
    max_value = max(list_normal)
    min_value = min(list_normal)
    for i in range(len(list_normal)):
        list_normal[i] = (list_normal[i] - min_value) / (max_value - min_value)
    return list_normal

# scikit-learn version
def normalize_list_numpy(list_numpy):
    normalized_list = minmax_scale(list_numpy)
    return normalized_list

test_array = [1, 2, 3, 4, 5, 6, 7, 8, 9]
test_array_numpy = np.array(test_array)
print(normalize_list(test_array))
print(normalize_list_numpy(test_array_numpy))
Output:
[0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
[0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
MinMaxScaler uses exactly your formula for normalization/scaling:
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.minmax_scale.html
@OuuGiii: Note that it is not a good idea to use Python built-in function names as variable names. list() is a Python builtin, so its use as a variable name should be avoided.
The question/answer that you reference doesn't explicitly relate your own formula to the np.linalg.norm(list) version that you use here.
One NumPy solution would be this:
import numpy as np

def normalize(x):
    x = np.asarray(x)
    return (x - x.min()) / np.ptp(x)

print(normalize(test_array))
# [ 0.     0.125  0.25   0.375  0.5    0.625  0.75   0.875  1.   ]
Here np.ptp is peak-to-peak, i.e.
Range of values (maximum - minimum) along an axis.
This approach scales the values to the interval [0, 1], as pointed out by @phg.
The more traditional definition of normalization would be to scale to zero mean and unit variance:
x = np.asarray(test_array)
res = (x - x.mean()) / x.std()
print(res.mean(), res.std())
# 0.0 1.0
Or use sklearn.preprocessing.scale as a pre-canned function.
Using test_array / np.linalg.norm(test_array) creates a result that is of unit length; you'll see that np.linalg.norm(test_array / np.linalg.norm(test_array)) equals 1. So you're talking about two different concepts here, one from statistics and the other from linear algebra.
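A quick check of that claim, using the test_array from the question:
import numpy as np

test_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
unit = test_array / np.linalg.norm(test_array)

# The L2-normalized vector has unit Euclidean length.
print(np.linalg.norm(unit))   # 1.0, up to floating-point rounding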
The power of NumPy is its broadcasting, which lets you do vectorized array operations without explicit looping. So you do not need to write a function with an explicit for loop, which is slow and time-consuming, especially if your dataset is big.
The pythonic way of doing min-max normalization is
test_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
normalized_test_array = (test_array - min(test_array)) / (max(test_array) - min(test_array))
# output: [0.    0.125 0.25  0.375 0.5   0.625 0.75  0.875 1.   ]

calculation of residuals with numpy lstsq

I have x,y data:
import numpy as np

x = np.array([2.5, 1.25, 0.625, 0.3125, 0.15625, 0.078125])
y = np.array([2448636., 1232116., 617889., 310678., 154454., 78338.])
X = np.vstack((x, np.zeros(len(x))))
popt, res, rank, val = np.linalg.lstsq(X.T, y)
popt, res, rank, val
Gives me:
(array([ 981270.29919414, 0. ]),
array([], dtype=float64),
1,
array([ 2.88639894, 0. ]))
Why are the residuals zero? If I add ones instead of zeros, the residuals are calculated:
X = np.vstack((x, np.ones(len(x))))   # ones instead of zeros
popt, res, rank, val = np.linalg.lstsq(X.T, y)
popt, res, rank, val
(array([ 978897.28500355, 4016.82089552]),
array([ 42727293.12864216]),
2,
array([ 3.49623683, 1.45176681]))
Additionally, if I calculate the sum of squared residuals in Excel, I get 9261214 if the intercept is set to zero and 5478137 if ones are added to x.
lstsq is going to have a tough time fitting to that column of zeros: any value of the corresponding parameter (presumably the intercept) will do. The zero column also makes the matrix rank-deficient (note the rank of 1 in your output), and per the NumPy docs lstsq only returns the residual sum of squares when the matrix has full column rank, which is why res comes back empty.
To fix the intercept at 0, if that's what you need to do, just send the x array on its own, but make sure it's the right shape for lstsq:
In [214]: popt,res,rank,val = np.linalg.lstsq(np.atleast_2d(x).T,y)
In [215]: popt
Out[215]: array([ 981270.29919414])
In [216]: res
Out[216]: array([ 92621214.2278382])
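As a cross-check (my addition, not part of the original answer), the residual that lstsq reports is just the sum of squared errors of the fit, which you can reproduce by hand:
import numpy as np

x = np.array([2.5, 1.25, 0.625, 0.3125, 0.15625, 0.078125])
y = np.array([2448636., 1232116., 617889., 310678., 154454., 78338.])

A = np.atleast_2d(x).T
popt, res, rank, val = np.linalg.lstsq(A, y, rcond=None)

# Sum of squared residuals computed manually; should match `res`.
manual = np.sum((A @ popt - y) ** 2)
print(res, manual)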
