Is there any tool for analyzing the influence of variables? - python

When you write a program with lots of code, it's difficult to find out which values have a big influence on your final result.
In my case I have a few differential equations which I solve with odeint.
It would take a lot of time to find out which values have a big influence on my result (velocity).
Is there any tool in Python to analyze your values, or does someone have an idea?
Thanks for your help.
[Edit]
MathBio: "In general a sensitivity analysis is what you would do."
@MathBio I read a few blogs about SALib now (SALib Guide) and tried to write an "easier" test program to solve differential equations.
Below you see my program.
I get this error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
execfile(filename, namespace)
File "C:\Python27\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/Tim_s/Desktop/Workspace/test/TrySALib.py", line 44, in <module>
Y = Odefunk(param_values)
File "C:/Users/Tim_s/Desktop/Workspace/test/TrySALib.py", line 24, in Odefunk
dT=odeint(dTdt,T0,t,args=(P,))
File "C:\Python27\lib\site-packages\scipy\integrate\odepack.py", line 148, in odeint
ixpr, mxstep, mxhnil, mxordn, mxords)
File "C:/Users/Tim_s/Desktop/Workspace/test/TrySALib.py", line 17, in dTdt
dT[0]=P[0]*(T[1]-T[0])+P[2]
IndexError: tuple index out of range
Here is the code:
from SALib.sample import saltelli
from SALib.analyze import sobol
import numpy as np
from pylab import *
from scipy.integrate import odeint

Tu = 20.
t = linspace(0, 180, 90)

def dTdt(T, t, P):  # the ODE system
    dT = zeros(2)
    dT[0] = P[0]*(T[1] - T[0]) + P[2]
    dT[1] = P[1]*(Tu - T[1]) + P[0]*(T[0] - T[1])
    return dT

T0 = [Tu, Tu]

def Odefunk(values):
    for P in enumerate(values):
        dT = odeint(dTdt, T0, t, args=(P,))
    return dT

# Define the model inputs
problem = {
    'num_vars': 3,
    'names': ['P0', 'P1', 'P2'],
    'bounds': [[0.1, 0.2],
               [0.01, 0.02],
               [0.5, 1]]
}

# Generate samples
param_values = saltelli.sample(problem, 1000, calc_second_order=True)

# Run model (example)
Y = Odefunk(param_values)

# Perform analysis
Si = sobol.analyze(problem, Y, print_to_console=False)

# Print the first-order sensitivity indices
print Si['S1']

You really should include your ODEs, so we can see the parameters and the initial condition. Code would be nice also.
In general a sensitivity analysis is what you would do. Performing a nondimensionalization is standard also. Look up these techniques and try to implement them to see how varying a parameter a small amount will affect your solution.
I suggest you look up these concepts and include your code. You should try the first steps yourself, and then I'd be happy to help with anything technical once you've clearly made an effort. Best wishes.
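A note on the traceback above (not from the original answer): enumerate(values) yields (index, row) tuples, so inside dTdt the parameter P is a 2-tuple and P[2] raises the IndexError. A minimal sketch of a corrected loop, assuming as an arbitrary choice that the final temperature of the first state variable is the scalar output SALib analyzes:

import numpy as np
from scipy.integrate import odeint
from SALib.sample import saltelli
from SALib.analyze import sobol

Tu = 20.
t = np.linspace(0, 180, 90)
T0 = [Tu, Tu]

def dTdt(T, t, P):  # same ODE system as in the question
    return [P[0]*(T[1] - T[0]) + P[2],
            P[1]*(Tu - T[1]) + P[0]*(T[0] - T[1])]

problem = {
    'num_vars': 3,
    'names': ['P0', 'P1', 'P2'],
    'bounds': [[0.1, 0.2], [0.01, 0.02], [0.5, 1]]
}

param_values = saltelli.sample(problem, 1000, calc_second_order=True)

# sobol.analyze expects one scalar output per parameter sample, so
# unpack the (index, row) pairs and reduce each ODE solution to a number
Y = np.empty(param_values.shape[0])
for i, P in enumerate(param_values):
    sol = odeint(dTdt, T0, t, args=(P,))
    Y[i] = sol[-1, 0]  # arbitrary choice: final temperature of T[0]

Si = sobol.analyze(problem, Y, print_to_console=False)
print(Si['S1'])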

Related

What is wrong with this shapely error message

I have a dictionary called Rsize which has numbers as keys and lists as values. The dictionary looks like this:
{10: [0.6621485767296484, 0.6610747762560114, 0.659607022086639, 0.6567761845867727, 0.6535392433801197, 0.6485977028504701, 0.6393024556394106, 0.6223866436257335, 0.5999232392636733, 0.5418403536642005, 0.4961461379219235, 0.4280278015788386, 0.35462315989740956, 0.2863017237662875, 0.2312185739351389, 0.18306363413831017], 12: [0.6638977494825118, 0.663295576452323, 0.662262804664348, 0.6610413916318628, 0.6590939627030634, 0.655212304186114, 0.6492141689834672, 0.6380632834031537, 0.6096663492242224, 0.5647498006858608, 0.4983281599318278, 0.3961350546063216, 0.32119092575707087, 0.2257230704567207, 0.1816695139119151, 0.14363448808684576], 14: [0.6649598494971014, 0.6644370245269158, 0.6638578972784479, 0.6630511299276417, 0.6615070373022596, 0.6596206155163766, 0.6560628158033714, 0.6487119276511941, 0.6343385358239866, 0.5792725000508062, 0.49799837531709923, 0.42482204326408324, 0.26633662071414366, 0.2028085235063155, 0.12411214668987203, 0.09336935548451253]}
The keys are 10, 12, and 14. I have used each list for plotting and want to find their pairwise intersection points. I have written the following script for that, using shapely's intersection function to detect the intersection points.
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import LineString
Rsize={10: [0.6621485767296484, 0.6610747762560114, 0.659607022086639, 0.6567761845867727, 0.6535392433801197, 0.6485977028504701, 0.6393024556394106, 0.6223866436257335, 0.5999232392636733, 0.5418403536642005, 0.4961461379219235, 0.4280278015788386, 0.35462315989740956, 0.2863017237662875, 0.2312185739351389, 0.18306363413831017], 12: [0.6638977494825118, 0.663295576452323, 0.662262804664348, 0.6610413916318628, 0.6590939627030634, 0.655212304186114, 0.6492141689834672, 0.6380632834031537, 0.6096663492242224, 0.5647498006858608, 0.4983281599318278, 0.3961350546063216, 0.32119092575707087, 0.2257230704567207, 0.1816695139119151, 0.14363448808684576], 14: [0.6649598494971014, 0.6644370245269158, 0.6638578972784479, 0.6630511299276417, 0.6615070373022596, 0.6596206155163766, 0.6560628158033714, 0.6487119276511941, 0.6343385358239866, 0.5792725000508062, 0.49799837531709923, 0.42482204326408324, 0.26633662071414366, 0.2028085235063155, 0.12411214668987203, 0.09336935548451253]}
listkT = np.arange(4.0,4.8,0.05)
print(Rsize[10])
plt.figure(figsize=(18, 10))
plt.title('Binder cumulant for critical point')
plt.plot(listkT, Rsize[10], '-', label='Lattice size 10')
plt.plot(listkT, Rsize[12], '-', label='Lattice size 12')
plt.plot(listkT, Rsize[14], '-', label='Lattice size 14')
plt.legend()
plt.show()
curve_10=LineString(np.column_stack((listkT, Rsize[10])))
curve_12=LineString(np.column_stack((listkT, Rsize[12])))
curve_14=LineString(np.column_stack((listkT, Rsize[14])))
intersection12 = curve_10.intersection(curve_12)
intersection14 = curve_10.intersection(curve_14)
plt.plot(*LineString(intersection12).xy, 'o')
plt.plot(*LineString(intersection14).xy, 'o')
x12, y = LineString(intersection12).xy
x14, y = LineString(intersection14).xy
print(np.intersect1d(x12, x14))
print(x12,x14)
But shapely throws an AssertionError.
File "C:\Users\Endeavour\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:\Users\Endeavour\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "E:/Project/Codes/3D.py", line 118, in <module>
plt.plot(*LineString(intersection12).xy, 'o')
File "C:\Users\Endeavour\Anaconda3\lib\site-packages\shapely\geometry\linestring.py", line 48, in __init__
self._set_coords(coordinates)
File "C:\Users\Endeavour\Anaconda3\lib\site-packages\shapely\geometry\linestring.py", line 97, in _set_coords
ret = geos_linestring_from_py(coordinates)
File "shapely/speedups/_speedups.pyx", line 87, in shapely.speedups._speedups.geos_linestring_from_py
AssertionError
The plots are drawn correctly by matplotlib though.
I am using shapely for the first time, with no prior experience in it. Any help will be much appreciated. Thank you.
Note: the final goal is to get the intersection of the 3 curves. If no intersection is found, the point where they come closest is good enough. Any suggestion or library function to find that would be of great help.
Thank you in advance.
Following the assertion error, I checked shapely/speedups/_speedups.pyx, line 87. The geos_linestring_from_py function expects you to pass either a LineString or a LinearRing. When I print your intersection12 and intersection14 I get:
POINT (4.503201814825258 0.4917840919384173)
POINT (4.51830999373466 0.4712012116887737)
So you are passing a Point instance to create a LineString, which causes the AssertionError.
Aside from the error you have, your approach is also wrong because it assumes that (1) there will be multiple intersections between two curves, and (2) there will be one absolute point where three curves intersect. If you zoom into your plot, you can see that neither is the case.
In the zoomed-in plot, the red circle corresponds to your intersection12 and the purple one to intersection14. If you are looking for an approximate solution, taking the mean of these points may help here, but for more complex curves with multiple intersections per pair it is not recommended either.
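Since both intersections are Point objects, here is a minimal sketch (not from the original answer) that plots them and takes their mean as the approximate common point; Point exposes .x and .y directly, so no LineString wrapper is needed:

plt.plot(intersection12.x, intersection12.y, 'o')
plt.plot(intersection14.x, intersection14.y, 'o')

# approximate "common" point of the three curves: mean of the pairwise intersections
x_approx = (intersection12.x + intersection14.x) / 2
y_approx = (intersection12.y + intersection14.y) / 2
print(x_approx, y_approx)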

Solve non-negative least squares problem "xA=b" [duplicate]

This question already has an answer here:
Constrained linear least-squares for xA=b in matlab
(1 answer)
Closed 5 years ago.
I want to find the non-negative least squares solution for "xA=b". I'm happy for answers to be in Python, Matlab or R.
A is a 6×10 matrix, and b is an 8192×10 matrix.
I found some functions: least_squares and nnls in Python, and lsqnonneg in Matlab.
nnls and lsqnonneg only solve Ax=b.
My implementation of least_squares gives me an error:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from scipy.optimize import least_squares

spec = pd.read_csv('spec.csv', sep=',', header=None)
y = pd.read_csv('y.csv', sep=',', header=None)
spec = np.array(spec).T
y = np.array(y)
spec = spec[(0,1,2,3,4,5,6,9),:]
y = y[(0,1,2,3,4,5,6,9),:]
print(spec.shape, y.shape)

def fun(a, x, y):
    return a*x - y

a0 = np.ones((8192, 6))
a = least_squares(fun, a0, args=(y.T[:,0], spec.T[:,0]),
                  bounds=([np.zeros((8192,6)),
                           np.ones((8192,6))*np.inf]))
runfile('C:/Users/Documents/lsq.py', wdir='C:/Users/Documents')
(8, 8192) (8, 6)
Traceback (most recent call last):
File "", line 1, in
runfile('C:/Users/wangm/Documents/lsq.py', wdir='C:/Users/Documents')
File "C:\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
execfile(filename, namespace)
File "C:\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Documents/lsq.py", line 30, in
np.ones((8192,6))*np.inf]))
File "C:\Anaconda3\lib\site-packages\scipy\optimize_lsq\least_squares.py", line 742, in least_squares
raise ValueError("x0 must have at most 1 dimension.")
ValueError: x0 must have at most 1 dimension.
This is such a common matrix problem that you can do it in one character in Matlab using mrdivide.
From the docs:
mrdivide, /: Solve systems of linear equations xA = B for x
% Option 1, shorthand:
x = B/A;
% Option 2, longhand:
x = mrdivide(B,A);
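Since the question also welcomes Python, here is a hedged sketch of keeping the non-negativity constraint there: xA = b transposes to A.T x.T = b.T, so scipy.optimize.nnls (which solves Ax = b with x >= 0) can be applied once per row of b. The random A and b are placeholders for the real data. Note that mrdivide returns the unconstrained least-squares solution, while this loop also enforces x >= 0.

import numpy as np
from scipy.optimize import nnls

A = np.random.rand(6, 10)     # placeholder for the real 6x10 matrix
b = np.random.rand(8192, 10)  # placeholder for the real 8192x10 matrix

# xA = b  <=>  A.T @ x.T = b.T, so solve one small NNLS problem per row of b
x = np.empty((b.shape[0], A.shape[0]))  # solution x is 8192x6
for i in range(b.shape[0]):
    x[i], _ = nnls(A.T, b[i])

print(x.shape)  # (8192, 6)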

mnlogit regression, singular matrix error

My regression model using statsmodels in Python works with 48,065 lines of data, but while adding new data I have tracked down one line that produces a singular matrix error. Answers to similar questions suggest missing data, but I have checked and there is nothing visibly irregular in the error-prone row that is causing me major issues. Does anyone know if this is an error in my code, or a solution to fix it? I'm out of ideas.
Data2.csv - http://www.sharecsv.com/s/8ff31545056b8864f2ad26ef2fe38a09/Data2.csv
import pandas as pd
import statsmodels.formula.api as smf
data = pd.read_csv("Data2.csv")
formula = 'is_success ~ goal_angle + goal_distance + np_distance + fp_distance + is_fast_attack + is_header + prev_tb + is_rebound + is_penalty + prev_cross + is_tb2 + is_own_goal + is_cutback + asst_dist'
model = smf.mnlogit(formula, data=data, missing='drop').fit()
CSV Line producing error: 0,0,0,0,0,0,0,1,22.94476,16.877204,13.484806,20.924627,0,0,11.765203
The error, with the problematic line included in the model:
runfile('C:/Users/User1/Desktop/Model Check.py', wdir='C:/Users/User1/Desktop')
Optimization terminated successfully.
Current function value: 0.264334
Iterations 20
Traceback (most recent call last):
File "<ipython-input-76-eace3b458e24>", line 1, in <module>
runfile('C:/Users/User1/Desktop/xG_xA Model Check.py', wdir='C:/Users/User1/Desktop')
File "C:\Users\User1\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
execfile(filename, namespace)
File "C:\Users\User1\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/User1/Desktop/xG_xA Model Check.py", line 6, in <module>
model = smf.mnlogit(formula, data=data, missing='drop').fit()
File "C:\Users\User1\Anaconda2\lib\site-packages\statsmodels\discrete\discrete_model.py", line 587, in fit
disp=disp, callback=callback, **kwargs)
File "C:\Users\User1\Anaconda2\lib\site-packages\statsmodels\base\model.py", line 434, in fit
Hinv = np.linalg.inv(-retvals['Hessian']) / nobs
File "C:\Users\User1\Anaconda2\lib\site-packages\numpy\linalg\linalg.py", line 526, in inv
ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
File "C:\Users\User1\Anaconda2\lib\site-packages\numpy\linalg\linalg.py", line 90, in _raise_linalgerror_singular
raise LinAlgError("Singular matrix")
LinAlgError: Singular matrix
As far as I can see:
The problem is the variable is_own_goal, because all observations where this is 1 also have the dependent variable is_success equal to 1. That means there is no variation in the outcome: is_own_goal already implies a success.
As a consequence, we cannot estimate a coefficient for is_own_goal; the coefficient is not identified by the data. The variance of the coefficient would be infinite, and inverting the Hessian to get the covariance of the parameter estimates fails because the Hessian is singular.
Given floating point precision, with some computational noise the Hessian might be invertible and the singular matrix exception would not show up, which, I guess, is the reason it works with some but not all observations.
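A quick way to confirm this kind of perfect prediction (my suggestion, not part of the original answer) is to cross-tabulate the suspect regressor against the outcome:

import pandas as pd

data = pd.read_csv("Data2.csv")
# if every row with is_own_goal == 1 also has is_success == 1,
# the (1, 0) cell of this table will be zero
print(pd.crosstab(data['is_own_goal'], data['is_success']))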
BTW: If the dependent variable, endog, is binary, then Logit is more appropriate, even though MNLogit has it as a special case.
BTW: Penalized estimation would be another way to force an estimate even in singular cases, although the coefficient would still not be identified by the data and be just a consequence of the penalization.
In this example,
mod = smf.logit(formula, data=data, missing='drop').fit_regularized()
works for me. This is L1 penalization. In statsmodels 0.8, there is also elastic net penalization for GLM which has Binomial (i.e. Logit) as a family.

PYMC MAP Fit problems

I use PyMC to implement a multinomial-Dirichlet pair. I want to compute the MAP estimate of the model for all the instances that we have.
The issue I face is that once MAP.fit() is run, the prior distribution is changed. Thus, for every new instance, I need a new prior distribution, which should be fine. However, I keep seeing this error:
Traceback (most recent call last):
File "/Users/xingweiy/Project/StarRating/TimePlot/BayesianPrediction/DiricheletMultinomialStarRating.py", line 41, in <module>
prediction = predict.predict(input,prior)
File "/Users/xingweiy/Project/StarRating/TimePlot/BayesianPrediction/predict.py", line 12, in predict
likelihood = pm.Categorical('rating',prior,value = exp_data,observed = True)
File "/Library/Python/2.7/site-packages/pymc-2.3.4-py2.7-macosx-10.9-intel.egg/pymc/distributions.py", line 3170, in __init__
verbose=verbose, **kwds)
File "/Library/Python/2.7/site-packages/pymc-2.3.4-py2.7-macosx-10.9-intel.egg/pymc/PyMCObjects.py", line 772, in __init__
if not isinstance(self.logp, float):
File "/Library/Python/2.7/site-packages/pymc-2.3.4-py2.7-macosx-10.9-intel.egg/pymc/PyMCObjects.py", line 929, in get_logp
raise ZeroProbability(self.errmsg)
pymc.Node.ZeroProbability: Stochastic rating's value is outside its support,
or it forbids its parents' current values.
Here is the code:
import numpy as np
import pymc as pm
from pymc import MAP, Model

alpha = np.array([0.1, 0.1, 0.1, 0.1, 0.1])
prior = pm.Dirichlet('prior', alpha)
exp_data = np.array(input)
likelihood = pm.Categorical('rating', prior, value=exp_data, observed=True)
MaximumPosterior = inf.inference(prior, likelihood, exp_data)  # 'inf' is the asker's own module

def inference(prior, likelihood, observation):
    model = Model({'likelihood': likelihood, 'prior': prior})
    M = MAP(model)
    M.fit()
    result = M.prior.value
    result = np.append(result, 1 - np.sum(M.prior.value))
    return result
I think it is a bug in the pymc package. Is there any way to do MAP without changing the prior distribution?
Thanks
The answer in the link below solved my issue:
https://groups.google.com/forum/#!topic/pymc/uYQSGW4acf8
Basically, the Dirichlet distribution generates probabilities that are close to 0.
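To illustrate (a sketch of mine, not from the linked thread): with alpha = 0.1 the Dirichlet concentrates its mass near the corners of the simplex, so sampled category probabilities are often essentially zero, and observing a category whose probability is zero drives the Categorical log-probability to -inf:

import numpy as np

np.random.seed(0)  # arbitrary seed, for reproducibility only
p = np.random.dirichlet([0.1] * 5)
print(p)  # typically one entry near 1 and the rest near 0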

Python fsolve does not take array of floats. How to implement it?

I used fsolve to find the zeros of an example sine function, and it worked great. However, I wanted to do the same with a dataset: two lists of floats, later converted to arrays with numpy.asarray(), containing the (x, y) values, namely 't' and 'ys'.
Although I found some related questions, I failed to implement the code provided in them, as I try to show here. Our arrays of interest are stored in a 2D list (data[i][j]), where 'i' selects a variable (e.g. data[0] == t == time == x values) and 'j' indexes the values of that variable along the x axis (e.g. data[1] == Force). Keep in mind that each data[i] is an array of floats.
Could you offer example code that takes two inputs (the two arrays mentioned) and returns their intersection points with a defined function (e.g. y = 0)?
I include some testing I made regarding the other related question (@HYRY's answer).
I do not think it is relevant, but I'm using Spyder through Anaconda.
Thanks in advance!
"""
Following the answer provided by #HYRY in the 'related questions' (see link above).
At this point of the code, the variable 'data' has already been defined as stated before.
"""
from scipy.optimize import fsolve
def tfun(x):
return data[0][x]
def yfun(x):
return data[14][x]
def findIntersection(fun1, fun2, x0):
return [fsolve(lambda x:fun1(x)-fun2(x, y), x0) for y in range(1, 10)]
print findIntersection(tfun, yfun, 0)
This returns the following error:
File "E:/Data/Anaconda/[...]/00-Latest/fsolvestacktest001.py", line 36, in tfun
return data[0][x]
IndexError: arrays used as indices must be of integer (or boolean) type
The full output is as follows:
Traceback (most recent call last):
File "<ipython-input-16-105803b235a9>", line 1, in <module>
runfile('E:/Data/Anaconda/[...]/00-Latest/fsolvestacktest001.py', wdir='E:/Data/Anaconda/[...]/00-Latest')
File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "E:/Data/Anaconda/[...]/00-Latest/fsolvestacktest001.py", line 44, in <module>
print findIntersection(tfun, yfun, 0)
File "E:/Data/Anaconda/[...]/00-Latest/fsolvestacktest001.py", line 42, in findIntersection
return [fsolve(lambda x:fun1(x)-fun2(x, y), x0) for y in range(1, 10)]
File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 140, in fsolve
res = _root_hybr(func, x0, args, jac=fprime, **options)
File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 209, in _root_hybr
ml, mu, epsfcn, factor, diag)
File "E:/Data/Anaconda/[...]/00-Latest/fsolvestacktest001.py", line 42, in <lambda>
return [fsolve(lambda x:fun1(x)-fun2(x, y), x0) for y in range(1, 10)]
File "E:/Data/Anaconda/[...]/00-Latest/fsolvestacktest001.py", line 36, in tfun
return data[0][x]
IndexError: arrays used as indices must be of integer (or boolean) type
You can 'convert' datasets (arrays) to continuous functions by means of interpolation. scipy.interpolate.interp1d is a factory that provides you with the resulting function, which you can then use with your root-finding algorithm.
--edit-- an example for computing an intersection of sin and cos from 20 samples (I've used cubic spline interpolation, as piecewise linear gives warnings about the smoothness):
>>> import numpy, scipy.optimize, scipy.interpolate
>>> x = numpy.linspace(0,2*numpy.pi, 20)
>>> x
array([ 0. , 0.33069396, 0.66138793, 0.99208189, 1.32277585,
1.65346982, 1.98416378, 2.31485774, 2.64555171, 2.97624567,
3.30693964, 3.6376336 , 3.96832756, 4.29902153, 4.62971549,
4.96040945, 5.29110342, 5.62179738, 5.95249134, 6.28318531])
>>> y1sampled = numpy.sin(x)
>>> y2sampled = numpy.cos(x)
>>> y1int = scipy.interpolate.interp1d(x,y1sampled,kind='cubic')
>>> y2int = scipy.interpolate.interp1d(x,y2sampled,kind='cubic')
>>> scipy.optimize.fsolve(lambda x: y1int(x) - y2int(x), numpy.pi)
array([ 3.9269884])
>>> scipy.optimize.fsolve(lambda x: numpy.sin(x) - numpy.cos(x), numpy.pi)
array([ 3.92699082])
Note that interpolation will give you 'guesses' about what data should be between the sampling points. No way to tell how good these guesses are. (but for my example, you can see it's a pretty good estimation)
