I am using a fit function to calculate values used by an application, in a manner similar to the example below:
import numpy as np
from numpy import random
x = range(10)
y = random.standard_normal(10)
w = random.standard_normal(10)/10
w = 1/w
p,cov = np.polynomial.polynomial.polyfit(x=x,y=y,deg=1,w=w,full=True)
fun = np.polynomial.polynomial.Polynomial(p)
new_x = 20
new_y = fun(new_x)
#y_1_sigma_uncertainty = ???
Is there a way to use the covariance matrix to calculate an uncertainty associated with values calculated by fun? Is there another way to go about this? I have done quite a bit of searching, but I am probably not asking the question correctly. I am not a stats person so I am hoping my example is useful in clarifying what I am trying to ask.
Thanks,
gl
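For comparison, a minimal sketch of one common approach. Note this uses np.polyfit with cov=True, which returns an actual coefficient covariance matrix (the full=True output of polynomial.polyfit above returns diagnostic information instead), and then propagates that covariance to a 1-sigma uncertainty at new_x:

import numpy as np

x = np.arange(10.0)
y = np.random.standard_normal(10)
w = 1 / (np.random.standard_normal(10) / 10)

# np.polyfit orders coefficients highest degree first: y = p[0]*x + p[1]
p, cov = np.polyfit(x, y, deg=1, w=w, cov=True)

new_x = 20.0
new_y = np.polyval(p, new_x)

# first-order error propagation: var(new_y) = J @ cov @ J with J = d(new_y)/d(p)
J = np.array([new_x, 1.0])
y_1_sigma_uncertainty = np.sqrt(J @ cov @ J)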
I am trying to find the minimum of a function g(alpha) and, more importantly, the value of alpha at (or close to) this minimum.
The code I use is the following: it creates a function f and the vectors D, avec, and grad, and uses them to build the function g(alpha) whose minimum, together with the corresponding alpha value, I want to find.
The problem is that after applying solve from the sympy library I don't get a numerical value for alpha. Instead, I get the following error:
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'
The code:
import numpy as np
from scipy.optimize import fmin
from sympy import Symbol, solve
from scipy import interpolate
Emax = 10
bins = 200
x = np.linspace(1, Emax, num=Emax, dtype=int)  # create grid of indexes
y = np.linspace(1, bins, num=bins, dtype=int)
z = np.random.rand(bins, Emax) # random matrix
f = interpolate.interp2d(x,y,z, kind='cubic') # make the matrix continious
D= np.zeros(bins)
D = 2*f(1.5, y) # create vector
avec = np.array([4.0, 16.0])
grad = np.array([1e-5,1e-5])
g = lambda alpha: np.sum(np.square(np.subtract(D, (avec[0]-alpha*grad[0])*f((avec[1]-alpha*grad[1]), y))))
oo = fmin(g, (0.0))
alfa = Symbol("alfa")
slv = solve((np.sum(np.square(np.subtract(D, (avec[0]-alfa*grad[0])*f((avec[1]-alfa*grad[1]), y)))) - oo), alfa)
I know that this solution may not be the best for this problem. I'm new to Python, so if you have any better suggestions for how to find alpha here, please tell me.
I think you are confused about what sympy does. sympy is a module for solving and printing analytical (symbolic) equations. You do not need that package at all for this task.
You actually do find the minimum of g here; fmin returns the minimising argument, and you store this result in oo.
So basically, delete the last two lines starting alfa = ... and slv = ..., and just add print(oo). oo is the value you are looking for: the value of alpha which minimises the function g.
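For illustration, a minimal sketch with a toy quadratic (not the g from the question), showing that fmin returns the minimising argument rather than the minimum value:

import numpy as np
from scipy.optimize import fmin

g = lambda alpha: np.sum((alpha - 3.0)**2) + 1.0   # toy function, minimum at alpha = 3
oo = fmin(g, 0.0)                                  # returns the minimising argument
print(oo[0])       # ~3.0, the alpha value
print(g(oo[0]))    # ~1.0, the minimum of g, if that is also needed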
I have two random variables and I need to calculate some characteristics for them precisely.
https://math.stackexchange.com/questions/3052308/calculated-covariance-corr-coefficient-confirmation?noredirect=1#
I already did this in Java but I want to confirm my answers with at least one more tool.
Could anyone good at Python / probability give me some guidance on how to calculate these 6 values in Python? I guess it is really simple, but I am not very confident in Python.
I looked at the documentation of the numpy cov function, but I have difficulty understanding it.
The best solution is to use the functions from numpy:
import numpy as np

# weighted means
e_X = np.average(X_values, weights=X_weights)
e_Y = np.average(Y_values, weights=Y_weights)

# weighted variances
varX = np.average((X_values - e_X)**2, weights=X_weights)
varY = np.average((Y_values - e_Y)**2, weights=Y_weights)

# np.cov and np.corrcoef return 2x2 matrices; the off-diagonal
# element [0, 1] is the covariance / correlation coefficient
cov_XY = np.cov(X_values, Y_values)[0, 1]
corrcoef_XY = np.corrcoef(X_values, Y_values)[0, 1]
I would like to write some code that helps me assess how good some fits are. I have a 3D matrix. The z dimension is a fit to some data at point i, j of the matrix. I would like to assess if this fit is good by comparing the fit at point i, j to the fits of its nearest neighbours (in the x,y dimension). If the fits of the neighbours are similar to the fit at that point then I would like to keep the fit. I hope that makes sense.
What that boils down to is: is there a good way to have a rolling window across the x, y dimensions that calculates the Pearson's r (along the z dimension) between the window's central point and all the other points in the window, and takes the mean (or even the number of points with r greater than some constant)?
I can only think of how to do this in a very long-winded, inefficient way at the moment. For some background information, I am fitting these data with a Fourier series. Ultimately I want to use this technique to assess the minimum number of waves to use in the Fourier fits at each point.
Thanks in advance
Niall
This is my solution, but it's not very efficient. (By the way, there was another dimension of the data I didn't bother telling you about in the question.) Has anyone got any suggestions for more efficient ways to do this?
Thanks again
import numpy as np
from scipy.stats import pearsonr
from bottleneck import nanmean

def calc_corr_of_neighbours(data, win_shape):
    rs = np.empty(data.shape[1:])
    thisrs = np.empty(win_shape)
    win_data = np.empty(win_shape)
    dA = int(win_shape[0]/2)
    dB = int(win_shape[1]/2)
    maxA = data.shape[2]
    maxB = data.shape[3]
    for i in np.ndindex(rs.shape):
        stA = max(i[1]-dA, 0)
        endA = min(i[1]+dA, maxA)
        stB = max(i[2]-dB, 0)
        endB = min(i[2]+dB, maxB)
        win_data = data[:, i[0], stA:endA, stB:endB]
        thisrs.fill(np.nan)
        for j in np.ndindex(win_data.shape[1:]):
            thisrs[j] = pearsonr(data[:, i[0], i[1], i[2]], win_data[:, j[0], j[1]])[0]
        rs[i] = nanmean(thisrs)
    return rs
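For reference, a hypothetical usage sketch (the shapes are invented to match the 4-D layout the function assumes: fit samples along the first axis, then the extra dimension, then the two spatial dimensions):

# hypothetical usage: fits of length 50, an extra dimension of size 2,
# a 20 x 20 spatial grid, and a 3 x 3 neighbourhood
data = np.random.randn(50, 2, 20, 20)
rs = calc_corr_of_neighbours(data, (3, 3))
print(rs.shape)   # (2, 20, 20)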
I would like to try to compute y=filter(b,a,x,zi) and dy[i]/dx[j] using FFTs rather than in the time domain for possible speedup in a GPU implementation.
I am not sure it's possible, particularly when zi is non-zero. I looked at how scipy.signal.lfilter and Octave's filter are implemented. They are both done directly in the time domain, with scipy using direct form 2 and Octave direct form 1 (from looking through the code in DLD-FUNCTIONS/filter.cc). I haven't seen an FFT implementation anywhere analogous to fftfilt for FIR filters in MATLAB (i.e. a = [1.]).
I tried doing y = ifft(fft(b) / fft(a) * fft(x)) but this seems to be conceptually wrong. Also, I am not sure how to handle the initial transient zi. Any references, pointer to existing implementation, would be appreciated.
Example code,
import numpy as np
import scipy.signal as sg
import matplotlib.pyplot as plt
# create an IIR lowpass filter
N = 5
b, a = sg.butter(N, .4)
MN = max(len(a), len(b))
# create a random signal to be filtered
T = 100
P = T + MN - 1
x = np.random.randn(T)
zi = np.zeros(MN-1)
# time domain filter
ylf, zo = sg.lfilter(b, a, x, zi=zi)
# frequency domain filter
af = np.fft.fft(a, P)
bf = np.fft.fft(b, P)
xf = np.fft.fft(x, P)
yfft = np.real(np.fft.ifft(bf/af * xf))[:T]
# error
print(np.linalg.norm(yfft - ylf))
# plot, note error is larger at beginning and with larger N
plt.figure(1)
plt.clf()
plt.plot(ylf)
plt.plot(yfft)
You can reduce the error in your existing implementation by replacing P = T + MN - 1 with P = T + 2*MN - 1. This is purely intuitive, but it seems to me that the division of bf by af will require 2*MN terms, due to wraparound.
C.S. Burrus has a pretty terse writeup of how to regard filtering, whether FIR or IIR, in a block-oriented way, here. I haven't read it in detail, but I think it gives you the equations you need to implement IIR filtering by convolution, including intermediate states.
I've forgotten what little I knew about FFTs but you could take a look at sedit.py and frequency.py at http://jc.unternet.net/src/ and see if anything there would help.
Try scipy.signal.lfiltic(b, a, y, x=None) to obtain the initial conditions.
Doc text for lfiltic:
Given a linear filter (b,a) and initial conditions on the output y
and the input x, return the initial conditions on the state vector zi
which is used by lfilter to generate the output given the input.
If M = len(b)-1 and N = len(a)-1, then the initial conditions are given
in the vectors x and y as
x = {x[-1],x[-2],...,x[-M]}
y = {y[-1],y[-2],...,y[-N]}
If x is not given, its initial conditions are assumed zero.
If either vector is too short, then zeros are added
to achieve the proper length.
The output vector zi contains
zi = {z_0[-1], z_1[-1], ..., z_K-1[-1]} where K=max(M,N).
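For example, a minimal sketch (the filter and history values are hypothetical) of building zi with lfiltic and passing it to lfilter:

import numpy as np
import scipy.signal as sg

b, a = sg.butter(3, 0.4)

# hypothetical "history": the last few inputs/outputs before this block
x_past = np.array([0.20, -0.10, 0.05])   # x[-1], x[-2], x[-3]
y_past = np.array([0.30, 0.10, -0.20])   # y[-1], y[-2], y[-3]

zi = sg.lfiltic(b, a, y_past, x=x_past)  # initial state vector for lfilter

x_new = np.random.randn(100)
y_new, zf = sg.lfilter(b, a, x_new, zi=zi)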
Is there any good library to calculate linear least squares OLS (Ordinary Least Squares) in python?
Thanks.
Edit:
Thanks for the SciKits and Scipy.
#ars: Can X be a matrix? An example:
y(1) = a(1)*x(11) + a(2)*x(12) + a(3)*x(13)
y(2) = a(1)*x(21) + a(2)*x(22) + a(3)*x(23)
...........................................
y(n) = a(1)*x(n1) + a(2)*x(n2) + a(3)*x(n3)
Then how do I pass the parameters for Y and X matrices in your example?
Also, I don't have much background in algebra; I would appreciate it if you could point me to a good tutorial for this kind of problem.
Thanks much.
Try the statsmodels package. Here's a quick example:
import pylab
import numpy as np
import statsmodels.api as sm
x = np.arange(-10, 10)
y = 2*x + np.random.normal(size=len(x))
# model matrix with intercept
X = sm.add_constant(x)
# least squares fit
model = sm.OLS(y, X)
fit = model.fit()
print(fit.summary())
pylab.scatter(x, y)
pylab.plot(x, fit.fittedvalues)
Update In response to the updated question, yes it works with matrices. Note that the code above has the x data in array form, but we build a matrix X (capital X) to pass to OLS. The add_constant function simply builds the matrix with a first column initialized to ones for the intercept. In your case, you would simply pass your X matrix without needing that intermediate step and it would work.
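For instance, a minimal sketch for the multi-column case (hypothetical data; each row of X holds x(i1), x(i2), x(i3), so OLS estimates a(1), a(2), a(3) directly):

import numpy as np
import statsmodels.api as sm

n = 50
X = np.random.randn(n, 3)                  # design matrix, one row per observation
a_true = np.array([1.0, -2.0, 0.5])
y = X @ a_true + 0.1 * np.random.randn(n)

fit = sm.OLS(y, X).fit()                   # no add_constant if you don't want an intercept
print(fit.params)                          # estimates of a(1), a(2), a(3)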
Have you looked at SciPy? I don't know if it does that, but I would imagine it does.
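For what it's worth, a minimal sketch with scipy.linalg.lstsq (the data are hypothetical; numpy.linalg.lstsq works similarly):

import numpy as np
from scipy.linalg import lstsq

X = np.random.randn(50, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(50)

a, resid, rank, sv = lstsq(X, y)   # least-squares solution of X @ a ~ y
print(a)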