Optimize Variable From A Function In Python - python

I'm used to using Excel for this kind of problem but I'm trying my hand at Python for now.
Basically I have two sets of arrays, one constant, and the other's values come from a user-defined function.
This is the function, simple enough.
import scipy.stats as sp
def calculate_probability(spread, std_dev):
    return sp.norm.sf(0.5, spread, std_dev)
I have two arrays of data, one with entries that run through the calculate_probability function (these are the spreads), and the other a set of constants called expected_probabilities.
spreads = [10.5, 9.5, 10, 8.5]
expected_probabilities = [0.8091, 0.7785, 0.7708, 0.7692]
The below function is what I am seeking to optimise.
import numpy as np
def calculate_mse(std_dev):
    spread_inputs = np.array(spreads)
    model_probabilities = calculate_probability(spread_inputs,std_dev)
    subtracted_vector = np.subtract(model_probabilities,expected_probabilities)
    vector_powered = np.power(subtracted_vector,2)
    mse_sum = np.sum(vector_powered)
    return mse_sum/len(spreads)
I would like to find a value of std_dev such that function calculate_mse returns as close to zero as possible. This is very easy in Excel using solver but I am not sure how to do it in Python. What is the best way?
EDIT: I've changed my calculate_mse function so that it only takes a standard deviation as a parameter to be optimised. I've tried to return Andrew's answer in an API format using flask but I've run into some issues:
class Minimize(Resource):
    std_dev_guess = 12.0 # might have a better guess than zeros
    result = minimize(calculate_mse, std_dev_guess)
    def get(self):
        return {'data': result},200
api.add_resource(Minimize,'/minimize')
This is the error:
NameError: name 'result' is not defined
I guess something is wrong with the input?

I'd suggest using scipy's optimization library. From there you have a couple of options; the easiest from your current setup is the minimize function. minimize itself has a large number of options, from the Nelder-Mead simplex method to BFGS (the default for unconstrained problems) and COBYLA.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
import numpy as np
from scipy.optimize import minimize
n_params = 4 # based on your code so far
spreads_guess = np.zeros(n_params) # might have a better guess than zeros
result = minimize(calculate_mse, spreads_guess)
Give it a shot and if you have extra questions I can edit the answer and elaborate as needed.
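Since the edited calculate_mse takes a single number (std_dev), a one-dimensional search is enough. The following is a minimal, standard-library-only sketch of that idea, not scipy's implementation: it rebuilds the normal survival function from math.erfc and minimises the MSE with a hand-written golden-section search. The helper names normal_sf and golden_section_minimize and the search bracket [1, 50] are assumptions made for illustration; in scipy, scipy.optimize.minimize_scalar plays the same role.

```python
import math

spreads = [10.5, 9.5, 10, 8.5]
expected_probabilities = [0.8091, 0.7785, 0.7708, 0.7692]

def normal_sf(x, mean, std_dev):
    # Survival function P(X > x) of a normal distribution, via erfc.
    return 0.5 * math.erfc((x - mean) / (std_dev * math.sqrt(2.0)))

def calculate_mse(std_dev):
    # Mean squared error between model and expected probabilities.
    errors = [normal_sf(0.5, s, std_dev) - p
              for s, p in zip(spreads, expected_probabilities)]
    return sum(e * e for e in errors) / len(errors)

def golden_section_minimize(f, low, high, tol=1e-8):
    # Shrink [low, high] around a minimum of f. This assumes f has a
    # single minimum inside the bracket (an assumption, not checked).
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = high - inv_phi * (high - low)
    d = low + inv_phi * (high - low)
    while high - low > tol:
        if f(c) < f(d):
            high = d
        else:
            low = c
        c = high - inv_phi * (high - low)
        d = low + inv_phi * (high - low)
    return (low + high) / 2.0

best_std_dev = golden_section_minimize(calculate_mse, 1.0, 50.0)
print(best_std_dev, calculate_mse(best_std_dev))
```

The bracket matters: a standard deviation must be positive, so the lower bound is kept away from zero.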
Here are just a couple of suggestions to clean up your code.
class Minimize(Resource):
    def _calculate_probability(self, spread, std_dev):
        return sp.norm.sf(0.5, spread, scale=std_dev)
    def _calculate_mse(self, std_dev):
        spread_inputs = np.array(self.spreads)
        model_probabilities = self._calculate_probability(spread_inputs, std_dev)
        mse = np.sum((model_probabilities - self.expected_probabilities)**2) / len(spread_inputs)
        print(mse)
        return mse
    def __init__(self, expected_probabilities, spreads, std_dev_guess):
        self.std_dev_guess = std_dev_guess
        self.spreads = spreads
        self.expected_probabilities = expected_probabilities
        self.result = None
    def solve(self):
        self.result = minimize(self._calculate_mse, self.std_dev_guess, method='BFGS')
    def get(self):
        return {'data': self.result}, 200
# run something like
spreads = [10.5, 9.5, 10, 8.5]
expected_probabilities = [0.8091, 0.7785, 0.7708, 0.7692]
minimizer = Minimize(expected_probabilities, spreads, 10.)
print(minimizer.get()) # returns none since it hasn't been run yet, up to you how to handle this
minimizer.solve()
print(minimizer.get())
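The NameError in the question comes from Python's scoping rules: names assigned in a class body become class attributes, and a method cannot see them as bare names. A minimal sketch with two hypothetical classes (Broken and Working, no Flask involved) shows the difference:

```python
class Broken:
    result = 42  # class attribute, created when the class body runs

    def get(self):
        # Bare `result` is looked up in local, global and builtin scope,
        # never in the class body, so this raises NameError.
        return {'data': result}, 200


class Working:
    result = 42

    def get(self):
        # Attribute access through the instance finds the class attribute.
        return {'data': self.result}, 200


try:
    Broken().get()
except NameError as exc:
    print('NameError:', exc)

print(Working().get())
```

This is why the cleaned-up class stores the value as self.result and reads it back through self.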

Related

Simulating different data with decorators in python

I am trying to force myself to understand how decorators work and how I might use them to run a function multiple times.
I am trying to simulate datasets with three variables, but they vary on their sample size and whether the sampling was conditional or not.
So I create the population distribution that I am sampling from:
from numpy.random import normal, negative_binomial, binomial
import pandas as pd
population_N = 100000
data = pd.DataFrame({
    "Variable A": normal(0, 1, population_N),
    "Variable B": negative_binomial(1, 0.5, population_N),
    "Variable C": binomial(1, 0.5, population_N)
})
Rather than doing the following:
sample_20 = data.sample(20)
sample_50 = data.sample(50)
condition = data["Variable B"] != 0
sample_20_non_random = data[condition].sample(20)
sample_50_non_random = data[condition].sample(50)
I wanted to simplify things and make it more efficient. So I started with a super simple function where I can pass whether or not the sample will be random or not.
def simple_function(data_frame, type = "random"):
    if (type == "random"):
        sample = data_frame.sample(sample_size)
    else:
        condition = data_frame["Variable B"] != 0
        sample = data_frame[condition].sample(sample_size)
    return sample
But, I want to do this for more than one sample size. So I thought that rather than writing a for-loop that can be slow, I could maybe just use a decorator. I also have tried but have failed to understand their logic, so I thought this could be good practice to try to understand them better.
import functools
def decorator(cache = {}, **case):
    def inner(function):
        function_name = function.__name__
        if function_name not in cache:
            cache[function_name] = function
        @functools.wraps(function)
        def wrapped_function(**kwargs):
            if cache[function_name] != function:
                cache[function_name](**case)
            else:
                function(**case)
        return wrapped_function
    return inner
@decorator(sample_size = [20, 50])
def sample(data_frame, type = "random"):
    if (type == "random"):
        sample = data_frame.sample(sample_size)
    else:
        condition = data_frame["Variable B"] != 0
        sample = data_frame[condition].sample(sample_size)
    return sample
I guess what I am not understanding is how the inheritance of the arguments works and how that then affects the iteration over the function in the decorator.
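One way to read the goal above: a decorator factory stores the list of sample sizes, and the wrapper calls the original function once per size, passing sample_size as a keyword argument. The sketch below assumes that interpretation. The name for_each_sample_size is hypothetical, and a plain list with random.sample stands in for the DataFrame so that the example needs only the standard library. Note that the loop still exists; the decorator only hides it.

```python
import functools
import random

def for_each_sample_size(sample_sizes):
    # Decorator factory: remembers the sizes to iterate over.
    def inner(function):
        @functools.wraps(function)
        def wrapped_function(*args, **kwargs):
            # Call the original function once per size and collect the
            # results in a dict keyed by sample size.
            return {size: function(*args, sample_size=size, **kwargs)
                    for size in sample_sizes}
        return wrapped_function
    return inner

@for_each_sample_size([20, 50])
def sample(values, sample_size, type="random"):
    if type == "random":
        return random.sample(values, sample_size)
    # Non-random case: keep only the non-zero values before sampling.
    non_zero = [v for v in values if v != 0]
    return random.sample(non_zero, sample_size)

population = list(range(1000))
samples = sample(population)
print({size: len(drawn) for size, drawn in samples.items()})
```

The arguments given to the decorator (the sizes) are captured by the closure when the function is decorated, while the arguments given at call time (the data and type) arrive through *args and **kwargs of the wrapper.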

How to specify derivative for SymPy custom function?

I am using sympy to help automate the process of finding equations of motion for some systems using the Euler Lagrange method. What would really make this easy is if I could define a function q and specify its time derivative qd --> d/dt(q) = qd. Likewise I'd like to specify d/dt(qd) = qdd. This is helpful because as part of the process of finding the equations of motion, I need to take derivatives (with respect to time, q, and qd) of expressions that are functions of q and qd. In the end I'll end up with an equation in terms of q, qd, and qdd and I'd like to be able to either print this neatly or use lambdify to convert this to a neat numpy function for use in a simulation.
Currently I've accomplished this in a very roundabout and annoying way by defining q as a function and qd as the derivative of that function:
q = sympy.Function('q', real=True)(t)
q_diff = diff(q,t)
This is fine for most of the process but then I end up with a messy expression filled with "Derivative (q, t)" and "Derivative (Derivative(q, t), t)" which is hard to wrangle into a neat printing format and difficult to turn into a numpy function using lambdify. My current solution is thus to use the subs function and replace q_diff and diff(q_diff, t) with sympy symbols qd and qdd respectively, which cleans things up and makes manipulating the expression much easier. This seems like a bad hack though and takes tons of time to do for more complicated equations with lots of state variables.
What I'd like is to define a function, q, with a specific value for the time derivative. I'd pass in that value when creating the function, and sympy could treat it like a generic Function object but would use whatever I'd given it for the time derivative instead of just saying "Derivative(q, t)". I'd like something like this:
qdd = sympy.symbols('qdd')
qd = my_func(name='qd', time_deriv=qdd)
q = my_func(name='q', time_deriv=qd)
diff(q, t)
>>> qd
diff(q**2, t)
>>> 2*q*qd
diff(diff(q**2, t))
>>> 2*q*qdd + 2*qd**2
expr = q*qd**2
expr.subs(q, 5)
>>> 5*qd**2
Something like that, where I could still use the subs command and lambdify command to substitute numeric values for q and qd, would be extremely helpful. I've been trying to do this but I don't understand enough of how the base sympy.Function class works to get this going. This is what I have right now:
class func(sp.Function):
    def __init__(self, name, deriv):
        self.deriv = deriv
        self.name = name
    def diff(self, *args, **kwargs):
        return self.deriv
    def fdiff(self, argindex=1):
        assert argindex == 1
        return self.deriv
This code so far does not really work, I don't know how to specify that specifically the time derivative of q is qd. Right now all derivatives of q are returning q?
I don't know if this is just a really bad solution, if I should be avoiding this issue entirely, or if there's already a clean way to solve this. Any advice would be very appreciated.

How to set an instance variable within a decorator?

I have a class which calculates salary components as shown below.
def normalize(func):
    from functools import wraps
    @wraps(func)
    def wrapper(instance, *args, **kwargs):
        allowanceToCheck = func(instance)
        if instance.remainingAmount <= 0:
            allowanceToCheck = 0.0
        elif allowanceToCheck > instance.remainingAmount:
            allowanceToCheck = instance.remainingAmount
        instance.remainingAmount = instance.remainingAmount - allowanceToCheck
        return allowanceToCheck
    return wrapper
class SalaryBreakUpRule(object):
    grossPay = 0.0
    remainingAmount = 0.0
    @property
    def basic(self):
        # calculates the basic pay according to predefined salary slabs.
        basic = 6600 # Defaulting to 6600 for now.
        self.remainingAmount = self.grossPay - basic
        return basic
    @property
    @normalize
    def dearnessAllowance(self):
        return self.basic * 0.2
    @property
    @normalize
    def houseRentAllowance(self):
        return self.basic * 0.4
    def calculateBreakUps(self, value = 0.0):
        self.grossPay = value
        return {
            'basic' : self.basic,
            'da' : self.dearnessAllowance,
            'hra' : self.houseRentAllowance
        }
Before calculating each allowance, I need to check if the total of all allowances does not exceed the grossPay i.e my total salary. I have written a decorator which wraps each allowance calculating method and does the above said requirement. For example,
* an employee having a salary of Rs.6700
* basic = 6,600 (according to slab)
* dearnessAllowance = 100 (cos 20% of basic is more than remaining amount)
* houseRentAllowance = 0.0 (cos 40% of basic is more than remaining amount)
But unfortunately it did not work. First allowance is calculated correctly, but all other allowances are being given the same value as first allowance. i.e houseRentAllowance will have 100 instead of 0.0 as given above.
The problem I have found is, the line of code
instance.remainingAmount = instance.remainingAmount - allowanceToCheck
in the decorator where I am trying to set a variable of the class does not work.
Is there any way I can fix this issue?
You've made SalaryBreakUpRule.basic into a property, and that property has side effects! So every time your other methods reference self.basic, it recalculates and resets self.remainingAmount to its original value, self.grossPay - basic.
Properties with this kind of side effects are bad design. I hope you see why now. Even after you fix this, accessing your other properties in different order will give you different results. Property accessors should not have durable side effects. More generally: Setters must set, getters must get. Properties look like simple variables, so they must behave accordingly or you'll never be able to understand or debug your code again.
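A minimal sketch of one side-effect-free alternative, assuming the same slab value (6600) and percentages as in the question: the remaining amount is a local variable inside calculateBreakUps, so no property mutates the instance. The helper name _cap and the constructor argument are hypothetical, and this is only one possible restructuring.

```python
class SalaryBreakUpRule(object):
    def __init__(self, grossPay=0.0):
        self.grossPay = grossPay

    @property
    def basic(self):
        # Pure getter: defaulting to 6600 as in the question.
        return 6600

    @staticmethod
    def _cap(amount, remaining):
        # Clamp an allowance to whatever is still available (never below 0).
        return max(0.0, min(amount, remaining))

    def calculateBreakUps(self):
        # `remaining` is local state, so repeated calls give the same answer.
        remaining = self.grossPay - self.basic
        da = self._cap(self.basic * 0.2, remaining)
        remaining -= da
        hra = self._cap(self.basic * 0.4, remaining)
        remaining -= hra
        return {'basic': self.basic, 'da': da, 'hra': hra}


print(SalaryBreakUpRule(6700).calculateBreakUps())
```

The order in which allowances are capped is now explicit in one method instead of depending on which property happens to be accessed first.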

Using SCIPY.OPTIMIZE.FMIN_CG to extract Weibull distribution parameters

I am attempting to extract Weibull distribution parameters (shape 'k' and scale 'lambda') that satisfy a certain mean and variance. In this example, the mean is 4 and the variance is 8. It is a 2-unknowns and 2-equations type of problem.
Since this algorithm works with Excel 2010's GRG Solver, I am certain it is about the way I am framing the problem, or potentially, the libraries I am using. I am not overly familiar with optimization libraries, so please let me know where the error is.
Below is the script:
from scipy.optimize import fmin, fmin_cg
import math
def weibull_mu(k, lmda): #Formula can be found on wikipedia
    return lmda*math.gamma(1+1/k)
def weibull_var(k, lmda): #Formula can be found on wikipedia
    return lmda**2*math.gamma(1+2/k)-weibull_mu(k, lmda)**2
def min_function(arggs):
    actual_mean = 4 # specific to this example
    actual_var = 8 # specific to this example
    k = arggs[0]
    lmda = arggs[1]
    output = [weibull_mu(k, lmda)-(actual_mean)]
    output.append(weibull_var(k, lmda)-(actual_var)**2-(actual_mean)**2)
    return output
print(fmin(min_function, [1,1]))
This script gives me the following error:
[...]
File "C:\Program Files\Python27\lib\site-packages\scipy\optimize\optimize.py", line 278, in fmin
fsim[0] = func(x0)
ValueError: setting an array element with a sequence.
As far as I can tell, min_function returns a list of two values, but fmin and fmin_cg expect the objective function to return a scalar, if I am not mistaken.
If you are searching for the root of the two-equation problem, I suppose it is better to apply the root function instead. As far as I have been able to find out, scipy does not provide any general optimizers for vector functions.
I managed to get it to work thanks to Anders Gustafsson's comment (thank you). This script now works if one returns only a scalar (in this case I used something along the lines of least-squares). Also, bounds were added by changing the optimization function to "fmin_l_bfgs_b" (again, thanks to Anders Gustafsson).
I only changed the min_function definition relative to the question.
from scipy.optimize import fmin_l_bfgs_b
import math
def weibull_mu(k, lmda):
    return lmda*math.gamma(1+1/k)
def weibull_var(k, lmda):
    return lmda**2*math.gamma(1+2/k)-weibull_mu(k, lmda)**2
def min_function(arggs):
    actual_mean = 4. # specific to this example
    actual_var = 8. # specific to this example
    k = arggs[0]
    lmda = arggs[1]
    extracted_var = weibull_var(k, lmda)
    extracted_mean = weibull_mu(k, lmda)
    output = (extracted_var - actual_var)**2 + (extracted_mean - actual_mean)**2
    return output
best_guess = [1., 1.] # initial guess for (k, lmda), same starting point as in the question
print(fmin_l_bfgs_b(min_function, best_guess, approx_grad = True, bounds = [(.0000001,None),(.0000001,None)], disp = False))
Note: Please feel free to use this script for your own or professional use.
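As an independent cross-check that needs no optimizer, note that the ratio variance / mean**2 of a Weibull distribution depends only on k: it equals gamma(1+2/k) / gamma(1+1/k)**2 - 1. The following standard-library sketch uses a different technique from the answer above: it solves that one-dimensional equation for k by bisection and then recovers lmda from the mean. The names squared_cv and solve_weibull and the bracket [0.1, 10] for k are assumptions made for illustration.

```python
import math

def weibull_mu(k, lmda):
    return lmda * math.gamma(1 + 1.0 / k)

def weibull_var(k, lmda):
    return lmda**2 * math.gamma(1 + 2.0 / k) - weibull_mu(k, lmda)**2

def squared_cv(k):
    # variance / mean**2 for a Weibull distribution; independent of lmda.
    return math.gamma(1 + 2.0 / k) / math.gamma(1 + 1.0 / k)**2 - 1

def solve_weibull(mean, var, k_low=0.1, k_high=10.0, tol=1e-12):
    target = var / mean**2
    # squared_cv decreases as k grows, so bisect until the bracket is tiny.
    while k_high - k_low > tol:
        k_mid = (k_low + k_high) / 2.0
        if squared_cv(k_mid) > target:
            k_low = k_mid
        else:
            k_high = k_mid
    k = (k_low + k_high) / 2.0
    # With k fixed, the mean equation gives lmda directly.
    lmda = mean / math.gamma(1 + 1.0 / k)
    return k, lmda

k, lmda = solve_weibull(4.0, 8.0)
print(k, lmda, weibull_mu(k, lmda), weibull_var(k, lmda))
```

The printed mean and variance should reproduce the targets, which makes it easy to compare against whatever the optimizer returns.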

homogenization the functions can be compiled into a calculate networks?

Inside a network, information (a package) can be passed between different nodes (hosts); by modifying its content, each host can give it a different meaning. The final package depends on the input of the hosts along its given route through the network.
Now I want to implement a calculation network model that can do small jobs by following different calculation paths.
Prototype:
def a(p): return p + 1
def b(p): return p + 2
def c(p): return p + 3
def d(p): return p + 4
def e(p): return p + 5
def link(p, r):
    p1 = p
    for x in r:
        p1 = x(p1)
    return p1
p = 100
route = [a,c,d]
result = link(p,route)
#========
target_result = 108
if result == target_result:
    pass # route is OK
I think finally I need something like this:
p with [init_payload, expected_target, passed_path, actual_calculated_result]
|
\/
[CHAOS of possible of functions networks]
|
\/
px [a,a,b,c,e] # ok this path is ok and match the target
Here are my questions; I hope you can help:
Can p determine the route(s) by inspecting the functions and the estimated result?
(1.1) For example, if there is a node x() on the route:
def x(p): return p / 0 # I suppose it can pass the compile
can p somehow know that this path is not good and avoid selecting it?
(1.2) Another point of confusion: if p is a self-defined class whose payload is essentially a string, and it is carried along the path [a,c,d], can p know that a() requires an int and avoid selecting this node?
Same as 1.2: when generating the path, can I avoid oops like this?
import random
def a(p): return p + 1
def b(p): return p + 2
def x(p): return p.append(1)
def y(p): return p.append(2)
full_node_list = [a,b,x,y]
path = random.sample(full_node_list, 2) # oops: x,y will be trouble for an int-type p, and a,b will be trouble for a list-type p
Please also consider the case where the path is a list of lambda functions.
PS: As the whole model is not very clear in my mind, any guidance and direction will be appreciated.
THANKS!
You could test each function first with a set of sample data; any function which returns consistently unusable values might then be discarded.
def isGoodFn(f):
    testData = [1,2,3,8,38,73,159] # random test input
    goodEnough = 0.8 * len(testData) # need 80% pass rate
    try:
        good = 0
        for i in testData:
            if type(f(i)) is int:
                good += 1
        return good >= goodEnough
    except:
        return False
If you know nothing about what the functions do, you will have to essentially do a full breadth-first tree search with error-checking at each node to discard bad results. If you have more than a few functions this will get very large very quickly. If you can guarantee some of the functions' behavior, you might be able to greatly reduce the search space - but this would be domain-specific, requiring more exact knowledge of the problem.
If you had a heuristic measure for how far each result is from your desired result, you could do a directed search to find good answers much more quickly - but such a heuristic would depend on knowing the overall form of the functions (a distance heuristic for multiplicative functions would be very different than one for additive functions, etc).
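The breadth-first search with error checking described above might look like the following sketch. It reuses the a..e node functions from the question; find_route, max_length and the bad node x are hypothetical names added for illustration. A node that raises, or that returns something other than an int, is pruned as soon as it fails.

```python
from collections import deque

def a(p): return p + 1
def b(p): return p + 2
def c(p): return p + 3
def d(p): return p + 4
def e(p): return p + 5
def x(p): return p / 0   # hypothetical bad node: always raises

def find_route(start, target, nodes, max_length=4):
    # Each queue entry is (current value, route taken so far).
    queue = deque([(start, [])])
    while queue:
        value, route = queue.popleft()
        if value == target:
            return route
        if len(route) == max_length:
            continue  # do not extend routes beyond the length limit
        for node in nodes:
            try:
                new_value = node(value)
            except Exception:
                continue  # this node cannot handle the value; prune it
            if isinstance(new_value, int):
                queue.append((new_value, route + [node]))
    return None  # no route of at most max_length reaches the target

route = find_route(100, 108, [a, b, c, d, e, x])
print([f.__name__ for f in route])
```

Because the queue is processed level by level, the first route found is one of the shortest. The number of candidate routes grows exponentially with max_length, which is the blow-up mentioned above.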
Your functions can raise TypeError if they are not satisfied with the data types they receive. You can then catch this exception and see whether you are passing an appropriate type. You can also catch any other exception type. But trying to call the functions and catching the exceptions can be quite slow.
You could also organize your functions into different sets depending on the argument type.
functions = { list : [some functions taking a list], int : [some functions taking an int]}
...
x = choose_function(functions[type(p)])
p = x(p)
I'm somewhat confused as to what you're trying to do, but: p cannot "know about" the functions until it is run through them. By design, Python functions don't specify what type of data they operate on: e.g. a*5 is valid whether a is a string, a list, an integer or a float.
If there are some functions that might not be able to operate on p, then you could catch exceptions, for example in your link function:
def link(p, r):
    try:
        for x in r:
            p = x(p)
    except (ZeroDivisionError, AttributeError): # List whatever errors you want to catch
        return None
    return p
