Trouble with least squares in Python

I am working on a project analyzing data and am trying to use a least squares method (built-in) to do so. I found a tutorial that provided code as an example and it works fine:
# imports the tutorial snippet relies on
from numpy import arange, pi, random, sin
from scipy.optimize import leastsq
x = arange(0, 6e-2, 6e-2/30)
A, k, theta = 10, 1.0/3e-2, pi/6
y_true = A*sin(2*pi*k*x+theta)
y_meas = y_true+2*random.randn(len(x))
def residuals(p, y, x):
    A, k, theta = p
    print("A type" + str(type(A)))
    print("k type" + str(type(k)))
    print("theta type" + str(type(theta)))
    print("x type" + str(type(x)))
    err = y - A*sin(2*pi*k*x+theta)
    return err
def peval(x, p):
    return p[0]*sin(2*pi*p[1]*x+p[2])
p0 = [8,1/2.3e-2,pi/3]
plsq = leastsq(residuals, p0, args=(y_meas, x))
print(plsq[0])
However, when I try transferring this to my own code, it keeps throwing errors. I have been working on this for a while and have managed to eliminate, I think, all of the type mismatch issues that plagued me early on. As far as I can tell, the two pieces of code are now nearly identical, but I am getting the error 'unsupported operand type(s)' and can't figure out what to do next. Here is the section of my code that pertains to this question:
if (ls is not None):
    from scipy.optimize import leastsq
    p0 = [8, 1/2.3e-2,pi/3]
    def residuals(p, y, x):
        A,k,theta = p
        if (type(x) is list):
            x = asarray(x)
        err = y - A*sin(2*pi*k*x+theta) #Point of error
        return err
    def peval(x, p):
        return p[0]*sin(2*pi*p[1]*x+p[2])
    plsq = leastsq(residuals, p0, args=(listRelativeCount, listTime))
    plsq_0 = peval(listTime, plsq[0])
Where listTime is the x-values of the data found in listRelativeCount. I have marked the line where the code is currently failing. Any help would be appreciated as I have been stuck on this problem for over a month.

Three things are happening in the line you called #Point of error: You are multiplying values, adding values and applying the sin() function. "Unsupported operand type" means something is wrong in one of these operations. It means you need to verify the types of the operands, and also make sure you know what function is being applied.
Are you sure you know the types (and dtypes, for ndarrays) of all the operands, including pi, x, theta and A?
Are you sure which sin function you are using? math.sin is not the same as np.sin, and they accept different operands.
Multiplying a list by a scalar (if your listTime variable is really a list) does something completely different from multiplying an ndarray by a scalar.
If it's unclear which operation is causing the error, try breaking up the expression:
err1 = 2*pi*k
err2 = err1*x
err3 = err2 + theta
err4 = sin(err3)
err5 = A*err4
err = y - err5
This ought to clarify what operation throws the exception.
This is an example of why it's often a better idea to use explicit package names, like np.sin() rather than sin().
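To make the list-versus-ndarray point concrete, here is a small sketch. The data in x_list is made up for illustration and is not the asker's listTime; the parameter values are borrowed from the tutorial snippet in the question. It only shows how the operand types change what each operation does, and it is not a diagnosis of the exact failure in the question.

```python
import math
import numpy as np

x_list = [0.0, 0.5, 1.0]                  # a plain Python list (hypothetical data)
x_arr = np.asarray(x_list, dtype=float)   # the same values as an ndarray

# int * list repeats the list instead of scaling its elements
repeated = 2 * x_list

# float * list is not defined at all and raises TypeError
try:
    2.5 * x_list
    float_times_list_failed = False
except TypeError:
    float_times_list_failed = True

# math.sin only accepts a single real number, so a list is rejected
try:
    math.sin(x_list)
    math_sin_failed = False
except TypeError:
    math_sin_failed = True

# With ndarrays and np.sin, every step of the model is elementwise
A, k, theta = 10.0, 1.0 / 3e-2, np.pi / 6
model = A * np.sin(2 * np.pi * k * x_arr + theta)
```

Converting both the x and y data with np.asarray before calling leastsq, and using the same converted arrays in peval, keeps every operation in the residual elementwise.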

Related

Iteratively generate restriction of multivariate function using Python

I'm trying to write a function that generates the restrictions of a function g at a given point p.
For example, let's say g(x, y, z) = 2x + 3y + z and p = (5, 10, 15). I'm trying to create a function that would return [lambda x : g(x, 10, 15), lambda y: g(5, y, 15), lambda z: g(5, 10, z)]. In other words, I want to take my multivariate function and return a list of univariate functions.
I wrote some Python to describe what I want, but I'm having trouble figuring out how to pass the right inputs from p into the lambda properly.
def restriction_generator(g, p):
    restrictions = []
    for i in range(len(p)):
        restriction = lambda x : g(p[0], p[1], ..., p[i-1], p[x], p[i+1], .... p[-1])
        restrictions.append(restriction)
    return restrictions
Purpose: I wrote a short function to estimate the derivative of a univariate function, and I'm trying to extend it to compute the gradient of a multivariate function by computing the derivative of each restriction function in the list returned by restriction_generator.
Apologies if this question has been asked before. I couldn't find anything after some searching, but I'm having trouble articulating my problem without all of this extra context. Another title for this question would probably be more appropriate.

Since @bandicoot12 requested some more solutions, I will try to fix up your proposed code. I'm not familiar with the ... notation, but I think this slight change should work:
def restriction_generator(g, p):
    restrictions = []
    for i in range(len(p)):
        # i=i binds the current index now; a plain closure would only see the final i
        restriction = lambda x, i=i: g(*p[:i], x, *p[i+1:])
        restrictions.append(restriction)
    return restrictions
Although I am not familiar with the ... notation, if I had to guess, your original code doesn't work because it probably always passes in p[0]. Maybe it can be fixed by changing p[0], p[1], ..., p[i-1] to p[0], ..., p[i-1].
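One pitfall worth checking with lambdas built in a loop like this is late binding: a lambda looks up i when it is called, not when it is created, so without extra care every restriction ends up using the last index. The sketch below uses the g and p from the question; late_bound is a hypothetical helper that exists only to show the problem.

```python
def g(x, y, z):
    return 2 * x + 3 * y + z

def late_bound(g, p):
    """Deliberately flawed: every lambda shares the same loop variable i."""
    restrictions = []
    for i in range(len(p)):
        restrictions.append(lambda x: g(*p[:i], x, *p[i + 1:]))
    return restrictions

def restriction_generator(g, p):
    """Freeze the index per lambda with a default argument."""
    p = tuple(p)  # private copy, so later changes to the caller's p have no effect
    restrictions = []
    for i in range(len(p)):
        restrictions.append(lambda x, i=i: g(*p[:i], x, *p[i + 1:]))
    return restrictions

p = (5, 10, 15)
flawed = [r(0) for r in late_bound(g, p)]            # all three vary z
fixed = [r(0) for r in restriction_generator(g, p)]  # vary x, y, z in turn
```

Here flawed comes out as [40, 40, 40], because all three lambdas replace the last coordinate, while fixed is [45, 25, 40].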

Try something like this:
def foo(fun, p, i):
    def bar(x):
        args = list(p)  # work on a copy so the caller's p is never modified
        args[i] = x
        return fun(*args)
    return bar
and
def restriction_generator(g, p):
    restrictions = []
    for i in range(len(p)):
        restrictions.append(foo(g, p, i))
    return restrictions

np.linspace vs range in Bokeh

I'm a coding newcomer and I'm trying to work with Bokeh. Newcomer to StackOverflow too, so please tell me if I did something wrong here.
I'm playing with this example from Bokeh website and I ran into a problem. When the x values are set, as in the example, using np.linspace, I'm able to use the interact and play with the update function. But, if I change x to a list, using range(), then I get this error: TypeError: can only concatenate list (not "float") to list. As I understand it, the problem lies in "x + phi", since x is a list and phi is a float.
I get that it's not possible to concatenate a list with a float, but why is it only when I use a numpy.ndarray that Python understands that I want to modify the function that controls the y values?
Here is the code (I'm using Jupyter Notebook):
x = np.linspace(0,10,1000)
y = np.sin(x)
p = figure(title="example", plot_height=300, plot_width=600, y_range=(-5,5))
r = p.line(x, y)
def update(f, w=1, A=1, phi=0):
    if f == "sin": func = np.sin
    elif f == "cos": func = np.cos
    elif f == "tan": func = np.tan
    r.data_source.data["y"] = A * func(w * x + phi)
    push_notebook()
show(p, notebook_handle=True)
interact(update, f=["sin", "cos", "tan"], w=(0,100), A=(1,5), phi=(0,20, 0.1))

Yes. Please compare the NumPy documentation with the documentation for lists: https://docs.python.org/3.6/tutorial/datastructures.html
You can also play with the following code:
from numpy import linspace
a = linspace(2, 3, num=5)
b = list(range(5))  # in Python 3, range() is not a list, so convert it explicitly
print(type(a), a)
print(type(b), b)
print()
print("array + array:", a + a)  # elementwise sum
print("list + list:", b + b)  # concatenation
print(a + 3.14159)
print(b + 2.718) # will fail as in your example, because it is a list
My advice is not to mix lists and arrays unless there is a good reason to do so. I personally often cast function arguments to arrays if necessary:
import numpy
def f(an_array):
    an_array = numpy.array(an_array)
    # continue knowing that it is an array now,
    # being aware that I make a copy of an_array at this point
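Applied to the expression from the question, the cast could look like the sketch below. phase and phase_safe are hypothetical names used only here; they mirror the w * x + phi part of update. With an integer w, the list is first repeated and then the float addition fails, which matches the error message quoted in the question.

```python
import numpy as np

def phase(x, w=1, phi=0.5):
    # no conversion: the behaviour depends entirely on the type of x
    return w * x + phi

def phase_safe(x, w=1, phi=0.5):
    # np.asarray converts lists (and ranges) and leaves existing arrays uncopied
    x = np.asarray(x, dtype=float)
    return w * x + phi

x_list = list(range(5))

try:
    phase(x_list)
    message = ""
except TypeError as exc:
    message = str(exc)   # list + float is attempted as a concatenation

result = phase_safe(x_list)   # elementwise: each value shifted by phi
```

np.asarray is used instead of np.array here so that an input that is already a float array is not copied again.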

Python: Piecewise function integration error: "TypeError: cannot determine truth value of ..."

This code runs correctly:
import sympy as sp
t, ton = sp.symbols('t ton')
def xon (ton, t):
    return (t-ton)/5
xonInt = sp.integrate (xon(ton, t),t)
print(xonInt)
But when the function becomes piecewise, e.g.:
import sympy as sp
t, ton = sp.symbols('t ton')
def xon (ton, t):
    if ton <= t:
        return (t-ton)/5
    else:
        return 0
xonInt = sp.integrate (xon(ton, t),t)
print(xonInt)
I get the following error:
File "//anaconda/lib/python2.7/site-packages/sympy/core/relational.py", line 103, in __nonzero__
    raise TypeError("cannot determine truth value of\n%s" % self)
TypeError: cannot determine truth value of
ton <= t
As far as I understand, the error is due to the fact that both ton and t can be positive and negative. Is it correct? If I set positive integration limits for t the error doesn't disappear. How can I calculate the integral for the given piecewise function?
UPDATE: The updated version of the function, which works:
import sympy as sp
t = sp.symbols('t')
ton = sp.symbols('ton')
xon = sp.Piecewise((((t-ton)/5), ton <= t), (0, True))
xonInt = sp.integrate (xon,t)
print(xonInt)

Piecewise Class
You need to use the sympy Piecewise class.
As suggested in the comments:
Piecewise(((t - ton)/5, ton <= t), (0, True))
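As a rough illustration of why the plain if statement fails and how the Piecewise version behaves, here is a sketch. The numbers ton = 1 and the interval from 0 to 3 are arbitrary choices for the example, and declaring the symbols as real is my own assumption, not something from the question.

```python
import sympy as sp

t, ton = sp.symbols('t ton', real=True)

# A symbolic relation has no definite truth value, so `if ton <= t:` cannot work
try:
    bool(ton <= t)
    truth_value_failed = False
except TypeError:
    truth_value_failed = True

# Piecewise keeps the condition symbolic instead of evaluating it immediately
xon = sp.Piecewise(((t - ton) / 5, ton <= t), (0, True))

value_after = xon.subs({ton: 1, t: 3})    # branch (t - ton)/5 applies
value_before = xon.subs({ton: 1, t: 0})   # fallback branch applies

# Definite integral for ton = 1: only the stretch from t = 1 to t = 3 contributes
area = sp.integrate(xon.subs(ton, 1), (t, 0, 3))
```

Substituting a concrete ton before integrating keeps the result simple; integrating with a symbolic ton returns another piecewise expression.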

Pythonic way to manage arbitrary amount of variables, used for equation solving.

This is a bit difficult to explain without a direct example. So let's put the very simplistic ideal-gas law as example. For an ideal gas under normal circumstances the following equation holds:
PV = RT
This means that if we know 3 of the 4 variables (pressure, volume, specific gas constant and temperature) we can solve for the other one.
How would I put this inside an object? I want to have an object where I can just insert 3 of the variables, and then it calculates the 4th. I wonder if this can be achieved through properties?
My current best guess is to insert it like:
class gasProperties(object):
    def __init__(self, P=None, V=None, R=None, T=None):
        self.setFlowParams(P, V, R, T)
    def setFlowParams(self, P=None, V=None, R=None, T=None):
        if P is None:
            self._P = R*T/V
            self._V = V
            self._R = R
            self._T = T
        elif V is None:
            self._V = R*T/P
            self._P = P
            self._R = R
            self._T = T
        #etc
Though this is quite cumbersome, and error prone (I have to add checks to see that exactly one of the parameters is set to "None").
Is there a better, cleaner way?
I see this "problem" come up quite often in various forms, and especially once the number of variables grows (adding density, Reynolds number and viscosity to the mix) the number of different if-statements grows quickly. (For example, if I have 8 variables and any 5 determine the system uniquely, I would need 8 nCr 5 = 56 if-statements.)

Using sympy, you can create a class for each of your equations. Create the symbols of the equation with ω, π = sp.symbols('ω π') etc., the equation itself and then use function f() to do the rest:
import sympy as sp
# Create all symbols.
P, V, n, R, T = sp.symbols('P V n R T')
# Create all equations
IDEAL_GAS_EQUATION = P*V - n*R*T
def f(x, values_dct, eq_lst):
    """
    Solves equations in eq_lst for x, substitutes values from values_dct,
    and returns value of x.
    :param x: Sympy symbol
    :param values_dct: Dict with sympy symbols as keys, and numbers as values.
    """
    lst = []
    lst += eq_lst
    for i, j in values_dct.items():
        lst.append(sp.Eq(i, j))
    try:
        return sp.solve(lst)[0][x]
    except IndexError:
        print('This equation has no solutions.')
To try this out... :
vals = {P: 2, n: 3, R: 1, T:4}
r = f(V, values_dct=vals, eq_lst=[IDEAL_GAS_EQUATION, ])
print(r) # Prints 6
If you do not provide enough parameters through values_dct, you'll get a result like 3*T/2; checking its type() gives <class 'sympy.core.mul.Mul'>.
If you do provide all parameters, the result is 6 and its type is <class 'sympy.core.numbers.Integer'>, so you can raise exceptions or do whatever you need. You could also convert it to an int with int() (this raises an error if the result is 3*T/2 instead of 6, so you can test it that way too).
Alternatively, you can simply check whether more than one value in values_dct is None.
To combine multiple equations, for example PV=nRT and P=2m, you can create the extra symbol m like the previous symbols and assign 2m to the new equation name MY_EQ_2, then insert it in the eq_lst of the function:
m = sp.symbols('m')
MY_EQ_2 = P - 2 * m
vals = {n: 3, R: 1, T:4}
r = f(V, values_dct=vals, eq_lst=[IDEAL_GAS_EQUATION, MY_EQ_2])
print(r) # Prints 6/m

A basic solution using sympy, and kwargs to check what information the user has provided:
from sympy.solvers import solve
from sympy import Symbol
def solve_gas_properties(**kwargs):
    properties = []
    missing = None
    for letter in 'PVRT':
        if letter in kwargs:
            properties.append(kwargs[letter])
        elif missing is not None:
            raise ValueError("Expected 3 out of 4 arguments.")
        else:
            missing = Symbol(letter)
            properties.append(missing)
    if missing is None:
        raise ValueError("Expected 3 out of 4 arguments.")
    P, V, R, T = properties
    return solve(P * V - R * T, missing)
print(solve_gas_properties(P=3, V=2, R=1)) # returns [6], the solution for T
This could then be converted into a class method, drawing on class properties instead of keyword arguments, if you want to store and manipulate the different values in the system.
The above can also be rewritten as:
def gas_properties(**kwargs):
    missing = [Symbol(letter) for letter in 'PVRT' if letter not in kwargs]
    if len(missing) != 1:
        raise ValueError("Expected 3 out of 4 arguments.")
    missing = missing[0]
    P, V, R, T = [kwargs.get(letter, missing) for letter in 'PVRT']
    return solve(P * V - R * T, missing)

One solution could be the use of a dictionary to store variable names and their values. This allows you to easily add other variables at any time. Also, you can check that exactly one variable has value "None" by counting the number of "None" items in your dictionary.
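A minimal sketch of that dictionary idea, without sympy, might look like this. The function name solve_ideal_gas and the table of rearranged formulas are my own illustration; the rearrangements are written out by hand for PV = RT, so this approach does not remove the need for one formula per unknown.

```python
def solve_ideal_gas(values):
    """Fill in the single missing entry of a dict with keys P, V, R, T.

    Exactly one value must be None; returns (name, computed_value).
    """
    missing = [name for name, value in values.items() if value is None]
    if len(missing) != 1:
        raise ValueError("Expected exactly one unknown, got %d." % len(missing))

    # One rearrangement of P*V = R*T per possible unknown
    formulas = {
        "P": lambda v: v["R"] * v["T"] / v["V"],
        "V": lambda v: v["R"] * v["T"] / v["P"],
        "R": lambda v: v["P"] * v["V"] / v["T"],
        "T": lambda v: v["P"] * v["V"] / v["R"],
    }
    name = missing[0]
    return name, formulas[name](values)

name, value = solve_ideal_gas({"P": 3.0, "V": 2.0, "R": 1.0, "T": None})

# Too many unknowns are rejected by counting the None entries
try:
    solve_ideal_gas({"P": None, "V": None, "R": 1.0, "T": 4.0})
    rejected = False
except ValueError:
    rejected = True
```

Adding a variable means adding a key and its formulas; the None count check stays the same.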

My approach would be fairly simple:
class GasProperties(object):
    def __init__(self, P=None, V=None, R=None, T=None):
        self.setFlowParams(P, V, R, T)
    def setFlowParams(self, P=None, V=None, R=None, T=None):
        if sum(1 for arg in (P, V, R, T) if arg is None) != 1:
            raise ValueError("Expected 3 out of 4 arguments.")
        self._P = P
        self._V = V
        self._R = R
        self._T = T
    @property
    def P(self):
        return self._P if self._P is not None else self._R*self._T/self._V
You similarly define properties for V, R and T.

This approach allows you to set up object's attributes:
def setFlowParams(self, P=None, V=None, R=None, T=None):
    given = dict(locals())  # snapshot of the arguments, taken before other locals exist
    params = self.setFlowParams.__code__.co_varnames[1:5]  # ('P', 'V', 'R', 'T')
    if sum(given[param] is None for param in params) > 1:
        raise ValueError("3 arguments required")
    for param in params:
        setattr(self, '_'+param, given[param])
In addition, you need to define getters for attributes with formulas. Like this:
@property
def P(self):
    if self._P is None:
        self._P = self._R*self._T/self._V
    return self._P
Or calculate all values in setFlowParams.

Numerical methods
You might want to do this without sympy, as an exercise for example, with numerical root finding. The beauty of this method is that it works for an extremely wide range of equations, even ones sympy would have trouble with. Everybody I know was taught this at university in a bachelor's maths course*; unfortunately, not many can apply it in practice.
So first we need a root finder. You can find code examples on Wikipedia and on the net at large; this is fairly well-known stuff. Many math packages have these built in; see for example scipy.optimize for good root finders. I'm going to use the secant method for ease of implementation (in this case I don't really need iterations, but I'll use a generic version anyway in case you want to use some other formulas).
"""Equation solving with numeric root finding using vanilla python 2.7"""
def secant_rootfind(f, a, incr=0.1, accuracy=1e-15):
    """ secant root finding method """
    b = a + incr
    while abs(f(b)) > accuracy:
        a, b = (b, b - f(b) * (b - a)/(f(b) - f(a)))
    return b
class gasProperties(object):
    def __init__(self, P=None,V=None,n=None,T=None):
        self.vars = [P, V, n, 8.314, T]
        unknowns = 0
        for i,v in enumerate(self.vars):
            if v is None :
                self._unknown_=i
                unknowns += 1
        if unknowns > 1:
            raise ValueError("too many unknowns")
    def equation(self, a):
        # store the trial value in the unknown's slot, then evaluate the residual
        self.vars[self._unknown_] = a
        P, V, n, R, T = self.vars
        return P*V - n*R*T # = 0
    def __str__(self):
        return str((
            "P = %f\nV = %f\nn = %f\n"+
            "R = %f\nT = %f ")%tuple(self.vars))
    def solve(self):
        secant_rootfind(self.equation, 0.2)
        print(str(self))
if __name__=="__main__": # run tests
    gasProperties(P=1013.25, V=1., T=273.15).solve()
    print("--- test2---")
    gasProperties( V=1,n = 0.446175, T=273.15).solve()
The benefit of root finding is that it would still work even if your formula were not this simple, so any number of formulas can be handled with no more code than writing out the formulation. This is generally a very useful skill to have. Sympy is good, but symbolic math is not always easily solvable.
The root solver is easily extendable to vector and multi-equation cases, even matrix solving. The ready-made scipy functions built for optimization already do this by default.
Here are some more resources:
Some numerical methods in python
* most were introduced at minimum to the Newton–Raphson method
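For reference, here is a standalone sketch of a secant iteration that returns the root and stops on the step size instead of the residual. The function name secant, the tolerance and the iteration cap are my own choices for illustration, not part of the code above; the gas values are the ones from the first test case.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant iteration; returns an approximate root of f.

    Stops when two successive iterates agree to a relative tolerance.
    """
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            break  # flat secant line, the next step would divide by zero
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
        if abs(x1 - x0) <= tol * max(1.0, abs(x1)):
            return x1
    raise RuntimeError("secant method did not converge")

# Ideal gas law P*V = n*R*T, solved for n with the other values fixed
P, V, R, T = 1013.25, 1.0, 8.314, 273.15
n = secant(lambda n: P * V - n * R * T, 0.2, 0.3)
```

Because the equation is linear in n, the first secant step already lands on the root and the second only confirms it. A step-size criterion is used here because a fixed residual threshold as small as 1e-15 can be below floating-point rounding for values of this magnitude.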

Vectorized Partial Derivative of Multivariate Function in Python

There was a phenomenal answer posted by alko for computing a partial derivative of a multivariate function numerically in this thread.
I have a follow-up question now about enhancing this function to accept an array of input values. I have some code where I'm looping through a big long list of n-dimensional points, calculating the partial derivatives with respect to each variable, and this becomes quite computationally expensive.
It's easy enough to vectorize the function in question with np.vectorize, but it causes issues with the partial_derivative wrapper:
from scipy.misc import derivative
import numpy as np
def foo(x, y):
    return(x**2 + y**3)
def partial_derivative(func, var=0, point=[]):
    args = point[:]
    def wraps(x):
        args[var] = x
        return func(*args)
    return derivative(wraps, point[var], dx=1e-6)
vfoo = np.vectorize(foo)
>>> foo(3,1)
10
>>> vfoo([3,3], [1,1])
array([10,10])
>>> partial_derivative(foo,0,[3,1])
6.0
>>> partial_derivative(vfoo,0,[[3,3], [1,1]])
TypeError: can only concatenate list (not "float") to list
The last line should ideally return [6.0, 6.0]. In this case the two arrays supplied to the vectorized function vfoo are essentially zipped up pairwise, so ([3,3], [1,1]) gets transformed into two points, [3,1] and [3,1]. This seems to get mangled when it gets passed to the function wraps. The point that it ends up passing to the function derivative is [3,3]. In addition, there's obviously the TypeError thrown up.
Does anyone have any recommendations or suggestions? Has anyone ever had a need to do something similar?
Edit
Sometimes I think posting on SO is just what it takes to break a mental block. I think I've got it working for anyone who might be interested:
vfoo = np.vectorize(foo)
foo(3,1)
X = np.array([3,3])
Y = np.array([1,1])
vfoo(X, Y)
partial_derivative(foo,0,[3,1])
partial_derivative(vfoo,0,[X, Y])
And the last line now returns array([ 6., 6.])
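For comparison, a vectorized partial derivative can also be sketched with a plain central difference and NumPy broadcasting, without scipy.misc.derivative or np.vectorize. This is a swapped-in technique, not the wrapper from the linked answer; the step size dx is an assumption that works for this smooth example and may need tuning for other functions.

```python
import numpy as np

def foo(x, y):
    return x**2 + y**3

def partial_derivative(func, var, point, dx=1e-6):
    """Central-difference estimate of d(func)/d(arg number var).

    Each entry of point may be a scalar or an array of coordinates.
    """
    args = [np.asarray(a, dtype=float) for a in point]

    def shifted(h):
        moved = list(args)            # new list each call, args stays untouched
        moved[var] = args[var] + h    # new array, the input is not modified
        return func(*moved)

    return (shifted(dx) - shifted(-dx)) / (2 * dx)

X = [3, 3]
Y = [1, 1]
d_dx = partial_derivative(foo, 0, [X, Y])   # analytic value is 2*x = 6
d_dy = partial_derivative(foo, 1, [X, Y])   # analytic value is 3*y**2 = 3
```

Since foo only uses arithmetic that NumPy applies elementwise, all points are evaluated in two calls per variable.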

I have a small problem with args[var] = x: it might permanently change args[var], because the values are passed by reference, no matter how small your change is. So you might not get the exact answer you are looking for. Here is an example:
In[67]: a = np.arange(9).reshape(3,3)
In[68]: b = a[:]
In[69]: b[0,0]=42
In[70]: a
Out[70]:
array([[42, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]])
You need to fix it, e.g.:
def wraps(x):
    tmp = args[var]
    args[var] = x
    ret= func(*args)
    args[var] = tmp
    return ret
Also, you can use numdifftools. They seem to know what they are doing. This will do all the partial derivatives:
import numpy as np
import numdifftools as nd
def partial_function(f___,input,pos,value):
    tmp = input[pos]
    input[pos] = value
    ret = f___(*input)
    input[pos] = tmp
    return ret
def partial_derivative(f,input):
    ret = np.empty(len(input))
    for i in range(len(input)):
        fg = lambda x:partial_function(f,input,i,x)
        ret[i] = nd.Derivative(fg)(input[i])
    return ret
if __name__ == "__main__":
    f = lambda x,y: x*x*x+y*y
    input = np.array([1.0,1.0])
    print ('partial_derivative of f() at: '+str(input))
    print (partial_derivative(f,input))
Finally: if you want your function to take an array of the parameters, e.g.:
f = lambda x: x[0]*x[0]*x[0]+x[1]*x[1]
then replace the respective line with (removed the '*')
ret = f___(input)
