RMS value of a function - python

Now the full code and my question:
I would like to estimate the random fluctuations of the function v, so I would like to calculate its RMS value:
import numpy as np
import matplotlib.pyplot as plt

def HHmodel(I, length, area):
    v = []
    m = []
    h = []
    z = []
    n = []
    squares = []
    vsquare = (-60)*(-60)
    sumsquares = 0
    rms = []
    a = []
    dt = 0.05
    t = np.linspace(0, 100, length)
    # constants
    Cm = area  # microFarad
    ENa = 50   # milliVolt
    EK = -77   # milliVolt
    El = -54   # milliVolt
    g_Na = 120*area   # mS cm-2
    g_K = 36*area     # mS cm-2
    g_l = 0.03*area   # mS cm-2

    def alphaN(v):
        return 0.01*(v+50)/(1-np.exp(-(v+50)/10))

    def betaN(v):
        return 0.125*np.exp(-(v+60)/80)

    def alphaM(v):
        return 0.1*(v+35)/(1-np.exp(-(v+35)/10))

    def betaM(v):
        return 4.0*np.exp(-0.0556*(v+60))

    def alphaH(v):
        return 0.07*np.exp(-0.05*(v+60))

    def betaH(v):
        return 1/(1+np.exp(-(0.1)*(v+30)))

    # initialize the voltage and the channels:
    v.append(-60)
    rms.append(1)
    m0 = alphaM(v[0])/(alphaM(v[0])+betaM(v[0]))
    n0 = alphaN(v[0])/(alphaN(v[0])+betaN(v[0]))
    h0 = alphaH(v[0])/(alphaH(v[0])+betaH(v[0]))
    #t.append(0)
    m.append(m0)
    n.append(n0)
    h.append(h0)

    # solving the ODE using Euler's method:
    for i in range(1, len(t)):
        m.append(m[i-1] + dt*((alphaM(v[i-1])*(1-m[i-1])) - betaM(v[i-1])*m[i-1]))
        n.append(n[i-1] + dt*((alphaN(v[i-1])*(1-n[i-1])) - betaN(v[i-1])*n[i-1]))
        h.append(h[i-1] + dt*((alphaH(v[i-1])*(1-h[i-1])) - betaH(v[i-1])*h[i-1]))
        gNa = g_Na*h[i-1]*(m[i-1])**3
        gK = g_K*n[i-1]**4
        gl = g_l
        INa = gNa*(v[i-1]-ENa)
        IK = gK*(v[i-1]-EK)
        Il = gl*(v[i-1]-El)
        v.append(v[i-1] + dt*((1/Cm)*(I[i-1]-(INa+IK+Il))))
        #v.append(v[i-1] + dt*((1/Cm)*(I-(INa+IK+Il))))
    meansquare = np.sqrt((np.square(v).sum()))
    return v, area, meansquare
spikeEvents = []   # timing each spike
length = 1000*5    # the time period
fluctuations = []
output = []
for j in range(1, 10):
    barcode = np.zeros(length)
    noisyI = np.random.normal(0, 9, length)
    area = 1.0 + 0.1*j
    res = HHmodel(noisyI, length, area)
    output.append(res[2])
print('Done.')
The goal is that the fluctuations of v should increase in some way with the size of the area; I was thinking of the RMS amplitude as a reasonable measure of this.
BR
edit:
for i in range(1, len(t)):
    m.append(m[i-1] + dt*((alphaM(v[i-1])*(1-m[i-1])) - betaM(v[i-1])*m[i-1]))
    n.append(n[i-1] + dt*((alphaN(v[i-1])*(1-n[i-1])) - betaN(v[i-1])*n[i-1]))
    h.append(h[i-1] + dt*((alphaH(v[i-1])*(1-h[i-1])) - betaH(v[i-1])*h[i-1]))
    gNa = g_Na*h[i-1]*(m[i-1])**3
    gK = g_K*n[i-1]**4
    gl = g_l
    INa = gNa*(v[i-1]-ENa)
    IK = gK*(v[i-1]-EK)
    Il = gl*(v[i-1]-El)
    v.append(v[i-1] + dt*((1/Cm)*(I[i-1]-(INa+IK+Il))))
    z.append(v[i-1] - np.mean(v))
    #v.append(v[i-1] + dt*((1/Cm)*(I-(INa+IK+Il))))
mean = sum(np.square(v))/len(v)
squared_diffs = [(item-mean)**2 for item in v]
ms_diff = sum(squared_diffs)/len(squared_diffs)
rms_diff = np.sqrt(ms_diff)
return v, area, rms_diff
edit2:
Plot for j in range(1, 10): blue is the RMS value as calculated in edit 1, yellow is 1/sqrt(j).
edit3:
Plot for j in range(1, 100): but the "size" of the fluctuations should increase, not decrease and settle around some value.

A few minor notes:
So, basically your "function" v is a one-timestep discrete evaluation of some function rather than a true function, but that's not really relevant here.
As indicated by comments above, you should calculate v for all timesteps and aggregate the squared values, then sum them outside of the loop and normalize by dividing by len(v).
It is also unclear why, in iteration i, you calculate v[i] but the value you square is v[i-1]. You should use the same index within the same loop iteration, or you will likely end up missing an element.
I would say that the reason the result is not useful is that the root mean square is rarely applied to a function's raw outputs (RMS in that case is just a less useful mean that gives extra weight to outliers); rather, RMS is generally applied to the error or deviation of those outputs. The RMS error or deviation tells you how far, in the function's original units, the function values typically lie from their average. Note that this is only really an important metric if you expect the value of v to be constant.
Given all this, it's hard to say from your question what your intention is and what you're actually trying to do with this information, so I will guess that what you really care about is how much the value of v varies from its mean. In that case, you can use the RMS deviation from the mean value of v, calculated like this:
for i in range(1, len(t)):
    pass  # calculate v[i] here, omitted for simplicity
# get the mean value of v
mean = sum(v)/len(v)
# you want the squared difference from the mean, not the value itself
squared_diffs = [(item - mean)**2 for item in v]
# get the mean squared diff
ms_diff = sum(squared_diffs)/len(squared_diffs)
# return the root of the mean squared diff
rms_diff = np.sqrt(ms_diff)
return v, area, rms_diff
Again, this is only useful if you expect the outputs of v to be roughly constant. If not, you would fit a different model (linear, quadratic, etc.) to the function and then calculate the RMS error against that fit. The question would be much clearer if you indicated the goal of this calculation.
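As a side note (a minimal sketch added here, not part of the original answer): once v is converted to a NumPy array, this RMS deviation from the mean is a one-liner, and it is exactly the population standard deviation of v:
import numpy as np

v = np.asarray(v, dtype=float)                  # v as returned by HHmodel
rms_diff = np.sqrt(np.mean((v - v.mean())**2))  # RMS deviation from the mean
rms_diff = np.std(v)                            # equivalent shortcut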

Related

Matrix inversion using Neumann Series giving funny loss function

According to (Stewart, 1998), the inverse of an invertible matrix A can be approximated by the series A^{-1} = \sum_{n=0}^{\infty} (I - A)^{n}
I tried implementing an algorithm to approximate a simple matrix's inverse, but the loss function shows funny results. Please look at the code below. More info about the Neumann series can be found here and here.
Here is my code.
import numpy as np
import matplotlib.pyplot as plt

A = np.array([[1, 0, 2], [3, 1, -2], [-5, -1, 9]])

class Neumann_inversion():
    def __init__(self, A, rank):
        self.A = A
        self.rank = rank
        self.eye = np.eye(len(A))
        self.loss = []
        self.loss2 = []
        self.A_hat = np.zeros((3, 3), dtype=float)
        #self.loss.append(np.linalg.norm(np.linalg.inv(self.A)-self.A_hat))

    def approximate(self):
        # self.A_hat = None
        n = 0
        L = (self.eye - self.A)
        while n < self.rank:
            self.A_hat += np.linalg.matrix_power(L, n)
            loss = np.linalg.norm(np.linalg.inv(self.A) - self.A_hat)
            self.loss.append(loss)
            n += 1
        plt.plot(self.loss)
        plt.ylabel('Loss')
        plt.xlabel('rank')
        # ax.axis('scaled')
        return

Matrix = Neumann_inversion(A, 200)
Matrix.approximate()
The formula is valid only if $(I - A)^n$ tends to zero as $n$ increases, i.e. all the eigenvalues of $I - A$ must have magnitude less than 1. So your matrix must satisfy
np.all(np.abs(np.linalg.eigvals(np.eye(len(A)) - A)) < 1)
Try
Neumann_inversion(A/10, 200).approximate()
and you can take the loss seriously :)
The origin of the formula has something to do with
(1 - x)(1 + x + x^2 + ... + x^n) = 1 - x^(n+1)
Here x plays the role of I - A, so 1 - x plays the role of A. If, and only if, all the eigenvalues of I - A have magnitude less than 1, the term x^(n+1) will be close to zero, so the sum will be approximately the inverse of (1 - x), i.e. of A.
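A minimal sketch of this check (my addition, reusing the matrix A from the question): scale A so that the spectral radius of I - A drops below 1, then compare the truncated series against np.linalg.inv:
import numpy as np

A = np.array([[1, 0, 2], [3, 1, -2], [-5, -1, 9]], dtype=float) / 10
L = np.eye(len(A)) - A

# the series converges only if the spectral radius of I - A is below 1
print(np.max(np.abs(np.linalg.eigvals(L))))       # roughly 0.95 here

A_hat = sum(np.linalg.matrix_power(L, n) for n in range(200))
print(np.linalg.norm(A_hat - np.linalg.inv(A)))   # small residual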

Why is minimize_scalar not minimizing correctly?

I am a new Python user, so bear with me if this question is obvious.
I am trying to find the value of lmbda that minimizes the following function, given a fixed vector Z and scalar sigma:
def sure_sft(z, lmbda, sigma):
    indicator = np.abs(z) <= lmbda
    minimum = np.minimum(z**2, lmbda**2)
    return -sigma**2*np.sum(indicator) + np.sum(minimum)
When I pass in values of lmbda manually, I find that the function produces the correct value of sure_sft. However, when I try to use the following code to find the value of lmbda that minimizes sure_sft:
minimize_scalar(lambda lmbda: sure_sft(Z, lmbda, sigma))
it gives me an incorrect value for sure_sft (-8.6731 for lmbda = 0.4916). If I pass in 0.4916 manually to sure_sft, I obtain -7.99809 instead. What am I doing incorrectly? I would appreciate any advice!
EDIT: I've pasted my code below. The data is from: https://web.stanford.edu/~chadj/HallJones400.asc
import pandas as pd
import numpy as np
from scipy.optimize import minimize_scalar

# FUNCTIONS
# Calculate orthogonal projection of g onto f
def proj(f, g):
    return (np.dot(f, g) / np.dot(f, f)) * f

def gs(X):
    # Copy of X -- will be used to store the orthogonalization
    F = np.copy(X)
    # Orthogonalize the design matrix
    for i in range(1, X.shape[1]):   # Iterate over columns of X
        for j in range(i):           # Iterate over columns less than the current one
            F[:, i] -= proj(F[:, j], X[:, i])  # Subtract projection of x_i onto f_j for all j < i from F_i
    # Normalize each column to have unit length
    norm_F = ((F**2).mean(axis=0))**0.5  # Row vector with the square root of the mean of squares of each column
    W = F/norm_F  # Normalize
    return W

# SURE for soft-thresholding
def sure_sft(z, lmbda, sigma):
    indicator = np.abs(z) <= lmbda
    minimum = np.minimum(z**2, lmbda**2)
    return -sigma**2*np.sum(indicator) + np.sum(minimum)

# Import data.
data_raw = pd.read_csv("hall_jones1999.csv")
# Drop missing observations.
data = data_raw.dropna(subset=['logYL', 'Latitude'])
Y = data['logYL']
Y = np.array(Y)
N = Y.size
# Create the design matrix.
design = np.empty([data['Latitude'].size, 15])
design[:, 0] = 1
for j in range(1, 15):
    design[:, j] = data['Latitude']**j
K = design.shape[1]
# Use Gram-Schmidt on the design matrix.
W = gs(design)
Z = np.dot(W.T, Y)/N
# MLE
mu_mle = np.dot(W, Z)
# Soft-thresholding
# Use the MLE residuals to calculate sigma for the SURE calculation
sigma = np.sqrt(np.sum((Y - mu_mle)**2)/(N - K))
# Write SURE as a function of lmbda
sure = lambda lmbda: sure_sft(Z, lmbda, sigma)
# Find the SURE-minimizing lmbda
lmbda = minimize_scalar(sure).x
min_sure = minimize_scalar(sure).fun  # -8.673172212265738
# Compare to manually inputting the minimizing lambda into sure_sft
act_sure1 = sure_sft(Z, 0.49167598, sigma)   # -7.998060514873529
act_sure2 = sure_sft(Z, 0.491675989, sigma)  # -8.673172212306728
You're actually not doing anything wrong. I just tested out the code and confirmed that lmbda has a value of 0.4916759890416824 at the end of the script. You can confirm this for yourself by adding the following lines to the bottom of your script:
print(lmbda)
print(sure_sft(Z, lmbda, sigma))
When you run your script, you should then see:
0.4916759890416824
-8.673158394698172
The only thing I can figure is that somehow the routine you were using to print out lmbda was set up to only print a fixed number of digits of floating point numbers, or somehow the printout was otherwise truncated.
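For what it's worth (an illustration added here, with made-up numbers): sure_sft contains the indicator np.abs(z) <= lmbda, so the objective jumps by sigma**2 whenever lmbda crosses one of the |z| values. That is why truncating the minimizer from 0.491675989... to 0.49167598 changes the result noticeably:
import numpy as np

def sure_sft(z, lmbda, sigma):
    indicator = np.abs(z) <= lmbda
    minimum = np.minimum(z**2, lmbda**2)
    return -sigma**2*np.sum(indicator) + np.sum(minimum)

z = np.array([0.491675989, 1.3, -0.2])   # hypothetical data with one |z| at the minimizer
sigma = 1.0
print(sure_sft(z, 0.49167598, sigma))    # indicator misses z[0]
print(sure_sft(z, 0.491675989, sigma))   # indicator includes z[0]; differs by about sigma**2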

Average True Range and Exponential Moving Average Functions on PandasDataSeries needed

I am stuck while calculating the Average True Range [ATR] of a Series.
ATR is basically an Exponential Moving Average of the True Range [TR].
TR is nothing but the MAX of:
Method 1: Current High less the current Low
Method 2: Current High less the previous Close (absolute value)
Method 3: Current Low less the previous Close (absolute value)
In Pandas we don't have a built-in EMA function. Rather, we have EWMA, which is an exponentially weighted moving average.
If someone can help with calculating the EMA, that will also be good enough.
def ATR(df, n):
    df['H-L'] = abs(df['High'] - df['Low'])
    df['H-PC'] = abs(df['High'] - df['Close'].shift(1))
    df['L-PC'] = abs(df['Low'] - df['Close'].shift(1))
    df['TR'] = df[['H-L', 'H-PC', 'L-PC']].max(axis=1)
    df['ATR_' + str(n)] = pd.ewma(df['TR'], span=n, min_periods=n)
    return df
The above code doesn't give an error, but it doesn't give correct values either. I compared it with manually calculated ATR values on the same data series in Excel, and the values were different.
The ATR Excel formula:
Current ATR = [(Prior ATR x 13) + Current TR] / 14
- Multiply the previous 14-day ATR by 13.
- Add the most recent day's TR value.
- Divide the total by 14
This is the data series I used as a sample:
start='2016-1-1'
end='2016-10-30'
auro=web.DataReader('AUROPHARMA.NS','yahoo',start,end)
You do need to use ewma
See here: An exponential moving average (EMA) is a type of moving average that is similar to a simple moving average, except that more weight is given to the latest data.
Read more: Exponential Moving Average (EMA) http://www.investopedia.com/terms/e/ema.asp#ixzz4ishZbOGx
I don't think your Excel formula is right... Here is a manual way to calculate an EMA in Python:
def exponential_average(values, window):
    weights = np.exp(np.linspace(-1., 0., window))
    weights /= weights.sum()
    a = np.convolve(values, weights)[:len(values)]
    a[:window] = a[window]
    return a
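A side note from editing (a sketch, not part of either answer): pd.ewma was removed in later pandas releases; the .ewm() accessor is the replacement, and with adjust=False and alpha=1/n its recursion matches the Wilder/Excel formula from the question, ATR_t = ((n-1)*ATR_{t-1} + TR_t)/n. Only the seeding of the first value differs (Wilder commonly seeds with a simple n-period average), so the earliest values may not match Excel exactly.
import pandas as pd

def ATR_wilder(df, n=14):
    # True Range: the largest of the three candidate ranges
    h_l = df['High'] - df['Low']
    h_pc = (df['High'] - df['Close'].shift(1)).abs()
    l_pc = (df['Low'] - df['Close'].shift(1)).abs()
    tr = pd.concat([h_l, h_pc, l_pc], axis=1).max(axis=1)
    # Wilder smoothing: an EMA with alpha = 1/n and adjust=False
    df['ATR_' + str(n)] = tr.ewm(alpha=1.0/n, adjust=False, min_periods=n).mean()
    return df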
scipy.signal.lfilter could help you.
scipy.signal.lfilter(b, a, x, axis=-1,zi=None)
The filter function is implemented as a direct II transposed structure. This means that the filter implements:
a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] + ... + b[M]*x[n-M]
- a[1]*y[n-1] - ... - a[N]*y[n-N]
If we normalize the above formula, we obtain the following one:
y[n] = b'[0]*x[n] + b'[1]*x[n-1] + ... + b'[M]*x[n-M]
                  - a'[1]*y[n-1] - ... - a'[N]*y[n-N]
where b'[i] = b[i]/a[0], i = 0,1,...,M; a'[j] = a[j]/a[0],j = 1,2,...,N
and a'[0] = 1
Exponential Moving Average formula:
y[n] = alpha*x[n] + (1-alpha)*y[n-1]
So to apply scipy.signal.lfilter, by the formula above we can set a and b as below:
a[0] = 1, a[1] = -(1-alpha)
b[0] = alpha
My implementation is below; I hope it helps you.
import numpy as np
import scipy.signal as sig

def ema(values, window_size):
    alpha = 2. / (window_size + 1)
    a = np.array([1, alpha - 1.])
    b = np.array([alpha])
    zi = sig.lfilter_zi(b, a)
    y, _ = sig.lfilter(b, a, values, zi=zi)
    return y
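A possible way to plug this into the ATR calculation from the question (my own usage sketch; it assumes the H-L, H-PC and L-PC columns created by the ATR() function above):
df['TR'] = df[['H-L', 'H-PC', 'L-PC']].max(axis=1)
df['ATR_14'] = ema(df['TR'].values, 14)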

MLE for a Polya Distribution

I'm working on programming an MLE for the Polya distribution using scipy. The Nelder-Mead method works, but I get a "Desired error not necessarily achieved due to precision loss." error when running BFGS. The Nelder-Mead method seems too slow for my needs (I have a lot of fairly big data, say 1000 tables, in some cases as big as 10x10000). I've tried using the check_grad function and the result is smallish on the example below (order 10^-2), so I'm not sure if that means there's a bug in the gradient of the log likelihood or the likelihood is just very strongly peaked. For what it's worth, I've stared quite hard at my code and I can't see the issue. Here's some example code to recreate the problem:
# setup some data
import numpy as np
import pandas as pd
from numpy import exp
from numpy.random import dirichlet, multinomial
from scipy.optimize import minimize, check_grad
from scipy.special import gammaln, psi

alpha = [10, 30, 50]
p = pd.DataFrame(dirichlet(alpha, 200))
data = p.apply(lambda x: multinomial(500, x), 1)
a = np.array(data.mean(0))

# optimize
result = minimize(lambda a: -1*llike(data, exp(a)),
                  x0=np.log(a),
                  method='Nelder-Mead')
x0 = result.x
result = minimize(lambda a: -1*llike(data, exp(a)),
                  x0=x0,
                  jac=lambda a: -1*gradient_llike(data, np.exp(a)),
                  method='BFGS')
exp(result.x)  # should be close to alpha

# uh oh, let's check that this is right.
check_grad(func=lambda a: -1*llike(data, a), grad=lambda a: -1*gradient_llike(data, a), x0=alpha)
Here's the code for my functions
def log_polya(Z, alpha):
    """
    Z is a vector of counts
    https://en.wikipedia.org/wiki/Dirichlet-multinomial_distribution
    http://mimno.infosci.cornell.edu/info6150/exercises/polya.pdf
    """
    if not isinstance(alpha, np.ndarray):
        alpha = np.array(alpha)
    if not isinstance(Z, np.ndarray):
        Z = np.array(Z)
    # Concentration parameter
    A = sum(alpha)
    # Number of datapoints
    N = sum(Z)
    return gammaln(A) - gammaln(N+A) + sum(gammaln(Z+alpha) - gammaln(alpha))

def llike(data, alpha):
    return sum(data.apply(log_polya, 1, alpha=alpha))

def log_polya_derivative(Z, alpha):
    if not isinstance(alpha, np.ndarray):
        alpha = np.array(alpha)
    if not isinstance(Z, np.ndarray):
        Z = np.array(Z)
    if 0. in Z+alpha:
        Warning("invalid prior parameter, nans should be produced")
    # Concentration parameter
    A = sum(alpha)
    # Number of datapoints
    N = sum(Z)
    K = len(Z)
    return np.array([psi(A) - psi(N+A) + psi(Z[i]+alpha[i]) - psi(alpha[i]) for i in xrange(K)])

def gradient_llike(data, alpha):
    return np.array(data.apply(log_polya_derivative, 1, alpha=alpha).sum(0))
UPDATE: Still curious about this, but for those interested in a working implementation for this problem, the following code for the Minka fixed-point algorithm seems to work well (i.e. it quickly recovers values that are close to the true Dirichlet parameter).
def minka_mle_polya(data):
    """
    http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet/minka-dirichlet.pdf
    """
    data = np.array(data)
    K = np.shape(data)[1]
    alpha = np.array(data.mean(0))
    alpha_new = np.ndarray((K))
    precision = 10
    while precision > 10**-5:
        for k in range(K):
            A = sum(alpha)
            N = data.sum(1)
            numerator = sum(
                psi(data[:, k] + alpha[k]) - psi(alpha[k])
            )
            denominator = sum(
                psi(N + A) - psi(A)
            )
            alpha_new[k] = alpha[k]*numerator/denominator
        precision = sum(abs(alpha_new - alpha))
        alpha_old = np.array(alpha)
        alpha = np.array(alpha_new)
        print "Gap", precision
    return alpha  # added so the fitted parameters can actually be retrieved

Fourier's fit coefficients

Lately I have been working on fitting a Fourier series to a periodic signal, to retrieve the amplitude and the phase of each component via least squares, so I modified the code of this file for it:
import math
import numpy as np

# period of the signal
per = 1.0
w = 2.0*np.pi/per
# number of fourier components.
nf = 5
fp = open("file.cat", "r")
# m1 is the number of unknown coefficients.
m1 = 2*nf + 1
# Create empty matrices.
x = np.zeros((m1, m1))
y = np.zeros((m1, 1))
xi = [0.0]*m1
# Read (time, value) from each line of the file.
for line in fp:
    t = float(line.split()[0])
    yi = float(line.split()[1])
    xi[0] = 1.0
    for k in range(1, nf+1):
        xi[2*k-1] = np.sin(k*w*t)
        xi[2*k] = np.cos(k*w*t)
    for j in range(m1):
        for k in range(m1):
            x[j, k] += xi[j]*xi[k]
        y[j] += yi*xi[j]
fp.close()
# Copy to big matrices.
X = np.mat(x.copy())
Y = np.mat(y.copy())
# Invert X and multiply by Y to get coefficients.
A = X.I*Y
A0 = A[0]
# Solution is A0 + Sum[ Amp*sin(k*wt + phi) ]
print "a[0] = %f" % A[0]
for k in range(1, nf+1):
    amp = math.sqrt(A[2*k-1]**2 + A[2*k]**2)
    phs = math.atan2(A[2*k], A[2*k-1])
    print "amp[%d] = %f phi = %f" % (k, amp, phs)
But the plot shows this (without the points, of course):
and it should show something like this:
Can somebody tell me how to compute the phase and the amplitude in another, simpler way? A guide maybe; I would be very grateful.
Cheers!
P.S. I will attach the FILE that I used, just because :)
EDITED
The error was with an index :(
First, I defined the vector with the values:
amp = np.array([np.sqrt((A[2*k-1])**2 + (A[2*k])**2) for k in range(1,nf+1)])
phs = np.array([math.atan2(A[2*k],A[2*k-1]) for k in range(1,nf+1)])
and then, to build the signal, I defined:
def term(t): return np.array([amp[k]*np.sin(k*w*t + phs[k]) for k in range(len(amp))])
Signal = np.array([A0+sum(term(phase[i])) for i in range(len(mag))])
but within the np.sin(), k should be k+1, because the index starts at 0 ·__·
def term(t): return np.array([amp[k]*np.sin((k+1)*w*t + phs[k]) for k in range(len(amp))])
plt.plot(phase,Signal,'r-',lw=3)
and that is all.
Thanks Marco Tompitak for the help!!
You're specifying the wrong period for the signal:
#period of the signal
per=0.178556
This gives you the resulting Fourier fit, indeed with a maximum period of ~0.17. The problem is that this number specifies the longest period present in your Fourier series. The function only has components with period 0.17 or shorter. Apparently you are expecting a fit with period ~1, so it can never approximate that properly. You should specify per=1.0. There's nothing wrong with the algorithm; a quick writeup of a similar algorithm in Mathematica gives the same output and plausible results.
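As an editorial addition (a sketch only, assuming the same file.cat with time/value columns and the corrected period per = 1.0): the normal equations accumulated by hand above can be replaced by np.linalg.lstsq, and the amplitudes and phases then fall straight out of the sine/cosine coefficients:
import numpy as np

data = np.loadtxt("file.cat")
t, yv = data[:, 0], data[:, 1]

per = 1.0
w = 2.0*np.pi/per
nf = 5

# design matrix: [1, sin(w t), cos(w t), sin(2 w t), cos(2 w t), ...]
cols = [np.ones_like(t)]
for k in range(1, nf + 1):
    cols.append(np.sin(k*w*t))
    cols.append(np.cos(k*w*t))
X = np.column_stack(cols)

coef = np.linalg.lstsq(X, yv, rcond=None)[0]
a0 = coef[0]
amp = np.hypot(coef[1::2], coef[2::2])     # sqrt(s_k**2 + c_k**2)
phs = np.arctan2(coef[2::2], coef[1::2])   # phase of harmonic k, as in atan2(cos-coef, sin-coef)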
