How to plot confidence intervals for stattools ccf function? - python

I am computing the cross-correlation function using ccf from statsmodels. It works fine except I can't see how to also plot the confidence intervals. I notice that acf seems to have much more functionality. Here is a toy example just to have something to see:
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.tsa.stattools as stattools
def create(n):
    x = np.zeros(n)
    for i in range(1, n):
        if np.random.rand() < 0.9:
            if np.random.rand() < 0.5:
                x[i] = x[i-1] + 1
        else:
            x[i] = np.random.randint(0,100)
    return x
x = create(4000)
y = create(4000)
plt.plot(stattools.ccf(x, y)[:100])
This gives:

Unfortunately, the statsmodels cross-correlation function (ccf) used here does not return a confidence interval. In R, ccf() would also draw the confidence interval.
Here we need to calculate the confidence interval ourselves and plot it afterwards. The approximate 95% bound is ±2 / np.sqrt(n), where n is the number of observations (4000 here), not the number of lags plotted. For basic information on confidence intervals for cross-correlation, refer to:
Stats StackExchange answer by Rob Hyndman: https://stats.stackexchange.com/a/3128/43304
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.tsa.stattools as stattools
def create(n):
    x = np.zeros(n)
    for i in range(1, n):
        if np.random.rand() < 0.9:
            if np.random.rand() < 0.5:
                x[i] = x[i-1] + 1
        else:
            x[i] = np.random.randint(0,100)
    return x
x = create(4000)
y = create(4000)
n = len(x)  # number of observations, not the number of lags plotted
sl = 2 / np.sqrt(n)  # approximate 95% bound
# draw the bounds as horizontal lines across the whole lag axis
plt.axhline(sl, color='r')
plt.axhline(-sl, color='r')
plt.plot(stattools.ccf(x, y)[:100])
This leads to the following plot with the additional red lines:
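The bound can also be checked numerically. The following is a minimal sketch, assuming two independent white-noise series; cross_correlation and white_noise_bound are hypothetical helpers written in plain NumPy for illustration, not the statsmodels implementation. Under that assumption roughly 95% of the sample cross-correlations should fall inside ±z / sqrt(n). The toy series from the question are strongly autocorrelated, so for them the same bound is only a rough guide.

```python
import numpy as np
from statistics import NormalDist


def cross_correlation(x, y, max_lag):
    """Sample cross-correlation of x[t+k] with y[t] for k = 0 .. max_lag-1."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = len(x)
    denom = n * np.std(x) * np.std(y)
    return np.array([np.sum(x[k:] * y[:n - k]) / denom for k in range(max_lag)])


def white_noise_bound(n, level=0.95):
    """Two-sided bound z / sqrt(n), valid for independent white-noise series."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z / np.sqrt(n)


rng = np.random.default_rng(0)
n = 4000
x = rng.standard_normal(n)
y = rng.standard_normal(n)

r = cross_correlation(x, y, 100)
bound = white_noise_bound(n)
inside = np.mean(np.abs(r) <= bound)  # fraction of lags inside the bound
print(bound, inside)
```

The 2 / np.sqrt(n) used in the answer is the same bound with z rounded up from about 1.96 to 2.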

Related

Distribution plot with wrong total value

I have made a distribution plot with code below:
from numpy import sqrt, pi, exp
import numpy as np
import matplotlib.pyplot as plt
sigma = 4.1
x = np.linspace(-6*sigma, 6*sigma, 200)
def distr(n):
    def g(x):
        return (1/(sigma*sqrt(2*pi)))*exp(-0.5*(x/sigma)**2)
    FxSum = 0
    a = list()
    for i in range(n):
        # divide into 200 parts and sum one by one
        numb = g(-6*sigma + (12*sigma*i)/n)
        FxSum += numb
        a.append(FxSum)
    return a
plt.plot(x, distr(len(x)))
plt.show()
This is, of course, a way of getting the result without using hist(), cdf() or any other options from Python libraries.
Why is the total sum not 1? It shouldn't depend on sigma, for example.
Almost right, but to integrate you have to multiply the function value g(x) by the width of each tiny interval, dx = 12*sigma/n. Each product is a small area, and those areas are what you sum up:
from numpy import sqrt, pi, exp
import numpy as np
import matplotlib.pyplot as plt
sigma = 4.1
x = np.linspace(-6*sigma, 6*sigma, 200)
def distr(n):
    def g(x):
        return (1/(sigma*sqrt(2*pi)))*exp(-0.5*(x/sigma)**2)
    dx = 12*sigma/n  # width of one interval
    FxSum = 0
    a = list()
    for i in range(n):
        # divide into n parts and sum the areas one by one
        numb = g(-6*sigma + (12*sigma*i)/n) * dx
        FxSum += numb
        a.append(FxSum)
    return a
plt.plot(x, distr(len(x)))
plt.show()
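The same running sum can be written without an explicit loop. This is a minimal sketch, assuming the same left-endpoint grid as the loop above; gaussian_cdf_riemann is a hypothetical helper name. np.cumsum accumulates the areas g(x) * dx, and the last value should be close to 1 whatever sigma is.

```python
import numpy as np


def gaussian_cdf_riemann(sigma, n=200):
    """Left Riemann sum of the N(0, sigma) density over [-6*sigma, 6*sigma)."""
    # endpoint=False reproduces the points -6*sigma + 12*sigma*i/n, i = 0..n-1
    x, dx = np.linspace(-6 * sigma, 6 * sigma, n, endpoint=False, retstep=True)
    pdf = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return x, np.cumsum(pdf * dx)  # running area = approximate CDF


x, cdf = gaussian_cdf_riemann(4.1)
print(cdf[-1])  # total area, close to 1
```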

do same calculation over and over in loop with changing variable each time

I want to run this code with several x values and collect all the outputs in a list. On the first run x should be 1, on the next 2, then 3, and so on. Is there an easy way to implement this in my code?
EDIT: The loop works now that I have added:
for x in range(1, max_value):
Is there a way to make a list of the degrees-of-freedom output from each loop iteration?
https://imgur.com/eQxHzHZ
import numpy as np
import math
from scipy.stats import skew, kurtosis, kurtosistest
import matplotlib.pyplot as plt
from scipy.stats import norm,t
import pandas as pd
data = pd.read_excel(r"filename.xlsx",sheet_name,skiprows=x+5,usecols="C")
ret = np.array(data.values)
from scipy.stats import skew, kurtosis
X = np.random.randn(10000000)
print(skew(X))
print(kurtosis(X, fisher=False))
# N(x; mu, sig) best fit (finding: mu, stdev)
mu_norm, sig_norm = norm.fit(ret)
dx = 0.0001 # resolution
x = np.arange(-0.1, 0.1, dx)
pdf = norm.pdf(x, mu_norm, sig_norm)
print("Integral norm.pdf(x; mu_norm, sig_norm) dx = %.2f" % (np.sum(pdf*dx)))
print("Sample mean = %.5f" % mu_norm)
print("Sample stdev = %.5f" % sig_norm)
print()
df = pd.DataFrame(ret)
# Student t best fit (finding: nu)
x = t.fit(ret)
nu, mu_t, sig_t = x
pdf2 = t.pdf(x, nu, mu_t, sig_t)
print("Integral t.pdf(x; mu, sig) dx = %.2f" % (np.sum(pdf2*dx)))
print("nu = %.2f" % nu)
print()
You can use a for loop:
for x in range(n):
    f(x)
This will call the function f on x with x=0, x=1, all the way to x=n-1.
Put the whole code in a for loop that increments x each time.
for x in range(1, max_value):
    # do stuff
    # add value to a list
print(your_list)
Side note: consider putting all your imports at the beginning, before the rest of the script.
EDIT x2: as x is overwritten, do
my_list = []
for my_var in range(1, max_value):
    x = my_var
    # do stuff with x
    # add value to a list
    my_list.append(x)
print(my_list)
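To collect the fitted degrees of freedom instead of the loop variable, append nu inside the loop. This is only a sketch under stated assumptions: the Excel file is replaced by synthetic Student t data, and slicing returns[x + 5:] stands in for skiprows=x+5. The names returns, window and nu_values are made up for the illustration.

```python
import numpy as np
from scipy.stats import t

# Synthetic stand-in for the column read from the spreadsheet (assumption).
rng = np.random.default_rng(0)
returns = t.rvs(df=5, loc=0.0, scale=0.01, size=2000, random_state=rng)

max_value = 4
nu_values = []
for x in range(1, max_value):
    window = returns[x + 5:]          # mimics skipping x + 5 rows
    nu, mu_t, sig_t = t.fit(window)   # keep the fit result out of x
    nu_values.append(nu)              # one degrees-of-freedom value per pass

print(nu_values)
```

Unpacking the fit result directly into nu, mu_t, sig_t avoids overwriting the loop variable x.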

The Birthday paradox - how to plot

from __future__ import division, print_function
from numpy.random import randint
import random
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
def bday(c):
    trials = 5000
    count = 0
    for trial in range(trials):
        year = [0]*365
        l = False
        for i in range(c):
            # day index 0..364; the upper bound of randint is exclusive
            bdayp = randint(0,365)
            year[bdayp] = year[bdayp] + 1
            if year[bdayp] > 1:
                l = True
        if l == True:
            count = count + 1
    prob = count / trials
    return prob
for i in range(2,41):
    a = bday(i)
    print(i,a)
As you can see, I generate the number of people in the class along with the probability that they share a birthday. How can I plot this so that I have n (number of people) on the x-axis and probability on the y-axis using matplotlib.pyplot?
Thanks.
I've linked in the comments the proper documentation to your problem. For the sake of you finding your own solution, perhaps looking at the following might make more sense of how to go about your problem:
def func(x):
    return x * 10
x = []
y = []
for i in range(10):
    x.append(i)
    y.append(func(i))
plt.plot(x, y)
The above can also be achieved by doing the following:
def func(x):
    return x * 10
x = np.arange(10)
plt.plot(x, func(x))
Here is the documentation for np.arange; both will plot the following:
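Applied to the birthday question, the same pattern might look like the sketch below. It is not the asker's code: bday is rewritten here in vectorised NumPy as an assumption (sorting each simulated class and looking for equal neighbours), and the rng argument is added only to make the run repeatable.

```python
import numpy as np
import matplotlib.pyplot as plt


def bday(c, trials=5000, rng=None):
    """Estimate the probability that at least two of c people share a birthday."""
    rng = np.random.default_rng() if rng is None else rng
    days = rng.integers(0, 365, size=(trials, c))  # one row per simulated class
    days.sort(axis=1)
    # after sorting, a shared birthday shows up as two equal neighbours
    shared = (np.diff(days, axis=1) == 0).any(axis=1)
    return shared.mean()


rng = np.random.default_rng(0)
people = list(range(2, 41))
probs = [bday(c, rng=rng) for c in people]

if __name__ == "__main__":
    plt.plot(people, probs, marker="o")
    plt.xlabel("number of people")
    plt.ylabel("probability of a shared birthday")
    plt.show()
```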

efficient discrete bayes filter for localization

I'm trying to implement a discrete Bayes filter (i.e. histogram filter) for robot localization as described in 'Probabilistic Robotics' by Thrun, Burgard, and Fox. The model is a robot that moves in one dimension, so the state vector is just the position and velocity. At each time step there is a random acceleration with a mean of zero and a variance of one.
I think that my implementation is accurate, but it runs slowly. I'm looping over indices of my probability density map which seems very inefficient, but I don't see how to vectorize this algorithm correctly. Any suggestions?
My code:
import matplotlib.pyplot as plt
import numpy as np
from math import pi, exp, sqrt
import matplotlib.cm as cm
N = 31
numsteps = 5
# set up grid
x = np.linspace(-10, 10, N)
v = np.linspace(-10, 10, N)
sp = (x[1] - x[0])/2
X, V = np.meshgrid(x, v)
# initial probability distribution is a delta function at origin
p0 = np.zeros(X.shape)
p0[N//2, N//2] = 1  # integer division so the index is an int
plt.ion()
plt.imshow(p0, cmap = cm.Greys_r, interpolation='none')
plt.draw()
acc_var = 1 # Variance of random acceleration
for ii in range(numsteps):
    print('Step %d' % ii)
    p1 = np.zeros(X.shape)
    for (i, j), p in np.ndenumerate(p0): # outer loop over configuration space
        for (k, l), _ in np.ndenumerate(X):
            # new position is old position plus velocity
            if X[i,j] > X[k,l] + V[k,l] - sp and X[i,j] <= X[k,l] + V[k,l] + sp:
                # Bayesian update with random acceleration
                p1[i,j] = p1[i,j] + exp(-0.5 * (V[i,j] - V[k,l])**2 / acc_var) / sqrt(2*pi*acc_var) * p0[k,l]
    p0 = p1
    plt.imshow(p1, cmap = cm.Greys_r, interpolation='none')
    plt.draw()
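One possible way to vectorise the update is sketched below. It relies on the default np.meshgrid layout, where X[i, j] = x[j] and V[i, j] = v[i], so the position test depends only on (j, k, l) and the Gaussian weight only on (i, k). bayes_step is a hypothetical helper name, and the intermediate boolean array needs on the order of N**3 entries, which is fine for N = 31 but may not be for much larger grids.

```python
import numpy as np


def bayes_step(p0, x, v, acc_var=1.0):
    """One prediction step of the histogram filter, written with broadcasting."""
    sp = (x[1] - x[0]) / 2
    # predicted position of every source cell (k, l): x[l] + v[k]
    pred = x[None, :] + v[:, None]
    xj = x[:, None, None]
    # M[j, k, l] is True when x[j] lies in the bin around pred[k, l]
    M = (xj > pred[None, :, :] - sp) & (xj <= pred[None, :, :] + sp)
    # G[i, k]: Gaussian weight for a velocity change from v[k] to v[i]
    dv = v[:, None] - v[None, :]
    G = np.exp(-0.5 * dv ** 2 / acc_var) / np.sqrt(2 * np.pi * acc_var)
    # sum over the source cells (k, l) in one call
    return np.einsum("ik,jkl,kl->ij", G, M.astype(float), p0)


N = 31
x = np.linspace(-10, 10, N)
v = np.linspace(-10, 10, N)
p0 = np.zeros((N, N))
p0[N // 2, N // 2] = 1
for step in range(5):
    p0 = bayes_step(p0, x, v)
print(p0.sum())
```

Like the loop version, this sketch does not renormalise the distribution after each step.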

How to specify upper and lower limits when using numpy.random.normal

I want to be able to pick values from a normal distribution that only ever fall between 0 and 1. In some cases I want to be able to basically just return a completely random distribution, and in other cases I want to return values that fall in the shape of a gaussian.
At the moment I am using the following function:
import random
def blockedgauss(mu,sigma):
    while True:
        numb = random.gauss(mu,sigma)
        if (numb > 0 and numb < 1):
            break
    return numb
It picks a value from a normal distribution, then discards it if it falls outside of the range 0 to 1, but I feel like there must be a better way of doing this.
It sounds like you want a truncated normal distribution.
Using scipy, you could use scipy.stats.truncnorm to generate random variates from such a distribution:
import matplotlib.pyplot as plt
import scipy.stats as stats
lower, upper = 3.5, 6
mu, sigma = 5, 0.7
X = stats.truncnorm(
    (lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma)
N = stats.norm(loc=mu, scale=sigma)
fig, ax = plt.subplots(2, sharex=True)
# density=True replaces the normed=True argument removed from newer matplotlib
ax[0].hist(X.rvs(10000), density=True)
ax[1].hist(N.rvs(10000), density=True)
plt.show()
The top figure shows the truncated normal distribution, the lower figure shows the normal distribution with the same mean mu and standard deviation sigma.
I came across this post while searching for a way to return a series of values sampled from a normal distribution truncated between zero and 1 (i.e. probabilities). To help anyone else who has the same problem, I just wanted to note that scipy.stats.truncnorm has the built-in capability ".rvs".
So, if you wanted 100,000 samples with a mean of 0.5 and standard deviation of 0.1:
import scipy.stats
lower = 0
upper = 1
mu = 0.5
sigma = 0.1
N = 100000
samples = scipy.stats.truncnorm.rvs(
    (lower-mu)/sigma, (upper-mu)/sigma, loc=mu, scale=sigma, size=N)
This gives a behavior very similar to numpy.random.normal, but within the bounds desired. Using the built-in will be substantially faster than looping to gather samples, especially for large values of N.
I have put together an example script. It shows how to use the API to generate samples with known parameters and to compute the PDF and CDF. I also attached an image of the result.
# load libraries
import matplotlib.pyplot as plt
import scipy.stats as stats
# lower, upper, mu, and sigma are the four parameters
lower, upper = 0.5, 1
mu, sigma = 0.6, 0.1
# instantiate an object X using the above four parameters
X = stats.truncnorm((lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma)
# generate 1000 samples
samples = X.rvs(1000)
# compute the PDF at the sample points
pdf_probs = stats.truncnorm.pdf(samples, (lower-mu)/sigma, (upper-mu)/sigma, mu, sigma)
# compute the CDF at the sample points
cdf_probs = stats.truncnorm.cdf(samples, (lower-mu)/sigma, (upper-mu)/sigma, mu, sigma)
# make a histogram of the samples
plt.hist(samples, bins=50, density=True, alpha=0.3, label='histogram')
# plot the PDF curve
plt.plot(samples[samples.argsort()], pdf_probs[samples.argsort()], linewidth=2.3, label='PDF curve')
# plot the CDF curve
plt.plot(samples[samples.argsort()], cdf_probs[samples.argsort()], linewidth=2.3, label='CDF curve')
# legend
plt.legend(loc='best')
In case anybody wants a solution using numpy only, here is a simple implementation using a normal function and a clip (the MacGyver's approach):
import numpy as np
def truncated_normal(mean, stddev, minval, maxval):
    return np.clip(np.random.normal(mean, stddev), minval, maxval)
EDIT: do NOT use this! This is how you should not do it. For instance,
a = truncated_normal(np.zeros(10000), 1, -10, 10)
may look like it works, but
b = truncated_normal(np.zeros(10000), 100, -1, 1)
will definitely not draw a truncated normal, because clipping moves every out-of-range sample onto the bounds -1 and 1, as you can see in the following histogram:
Sorry for that, hope nobody got hurt! I guess the lesson is: don't try to emulate MacGyver when coding...
Cheers,
Andres
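The failure can also be seen without a histogram. A small sketch, using a seeded generator (an assumption made only for repeatability), counts how many clipped samples sit exactly on one of the bounds:

```python
import numpy as np

rng = np.random.default_rng(0)
minval, maxval = -1.0, 1.0

# Same idea as the clip-based function above, with a very wide normal.
clipped = np.clip(rng.normal(0.0, 100.0, size=10000), minval, maxval)

# Fraction of samples that were pushed exactly onto a bound.
at_bounds = np.mean((clipped == minval) | (clipped == maxval))
print(at_bounds)
```

With a standard deviation of 100 and bounds of ±1, nearly all of the mass ends up in two spikes at the bounds, whereas a truncated normal has no point masses there.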
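I have tested some solutions using numpy. Through trial and error, I found that the ± variation divided by 3 is a good guess for the standard deviation: about 99.7% of normal samples fall within three standard deviations of the mean, so only a few values land outside the range.
Here are some examples: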
The basics
import numpy as np
import matplotlib.pyplot as plt
val_min = 1000
val_max = 2000
variation = (val_max - val_min)/2
std_dev = variation/3
mean = (val_max + val_min)/2
dist_normal = np.random.normal(mean, std_dev, 1000)
print('Normal distribution\n\tMin: {0:.2f}, Max: {1:.2f}'
      .format(dist_normal.min(), dist_normal.max()))
plt.hist(dist_normal, bins=30)
plt.show()
A comparative case
import numpy as np
import matplotlib.pyplot as plt
val_min = 1400
val_max = 2800
variation = (val_max - val_min)/2
std_dev = variation/3
mean = (val_max + val_min)/2
fig, ax = plt.subplots(3, 3)
plt.suptitle("Histogram examples by Davidson Lima (github.com/davidsonlima)",
             fontweight='bold')
i = 0
j = 0
pos = 1
while (i < 3):
    while (j < 3):
        dist_normal = np.random.normal(mean, std_dev, 1000)
        max_min = 'Min: {0:.2f}, Max: {1:.2f}'.format(dist_normal.min(), dist_normal.max())
        ax[i, j].hist(dist_normal, bins=30, label='Dist' + str(pos))
        ax[i, j].set_title('Normal distribution ' + str(pos))
        ax[i, j].legend()
        ax[i, j].text(mean, 0, max_min, horizontalalignment='center', color='white',
                      bbox={'facecolor': 'red', 'alpha': 0.5})
        print('Normal distribution {0}\n\tMin: {1:.2f}, Max: {2:.2f}'
              .format(pos, dist_normal.min(), dist_normal.max()))
        j += 1
        pos += 1
    j = 0
    i += 1
plt.show()
If someone has a better approach with numpy, please comment below.
The parametrization of truncnorm is complicated, so here is a function that translates the parametrization to something more intuitive:
from scipy.stats import truncnorm
def get_truncated_normal(mean=0, sd=1, low=0, upp=10):
    return truncnorm(
        (low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)
How to use it?
Instantiate the generator with the parameters: mean, standard deviation, and truncation range:
>>> X = get_truncated_normal(mean=8, sd=2, low=1, upp=10)
Then, you can use X to generate a value:
>>> X.rvs()
6.0491227353928894
Or, a numpy array with N generated values:
>>> X.rvs(10)
array([ 7.70231607, 6.7005871 , 7.15203887, 6.06768994, 7.25153472,
5.41384242, 7.75200702, 5.5725888 , 7.38512757, 7.47567455])
A Visual Example
Here is the plot of three different truncated normal distributions:
X1 = get_truncated_normal(mean=2, sd=1, low=1, upp=10)
X2 = get_truncated_normal(mean=5.5, sd=1, low=1, upp=10)
X3 = get_truncated_normal(mean=8, sd=1, low=1, upp=10)
import matplotlib.pyplot as plt
fig, ax = plt.subplots(3, sharex=True)
# density=True replaces the normed=True argument removed from newer matplotlib
ax[0].hist(X1.rvs(10000), density=True)
ax[1].hist(X2.rvs(10000), density=True)
ax[2].hist(X3.rvs(10000), density=True)
plt.show()
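One thing worth checking with this helper: mean and sd describe the underlying, untruncated normal, so the mean of the truncated distribution can differ from the mean you pass in. A short sketch, repeating the helper so the snippet runs on its own (random_state=0 is only there for repeatability):

```python
from scipy.stats import truncnorm


def get_truncated_normal(mean=0, sd=1, low=0, upp=10):
    return truncnorm(
        (low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)


X = get_truncated_normal(mean=8, sd=2, low=1, upp=10)
samples = X.rvs(10000, random_state=0)

# The upper bound cuts off more probability mass than the lower one,
# so the truncated mean is pulled below the nominal mean of 8.
print(X.mean(), samples.min(), samples.max())
```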
Alternatively, you can normalize the data and then rescale it to the range you need. Sorry, this is my first post and I don't know how to embed pictures directly; the function was shown in an image.
I developed a simple function for creating a list of values in a range using numpy.random.normal and some extra code.
import numpy as np
from random import shuffle
def truncnormal(meanv, sd, minv, maxv, n):
    finallist = []
    initiallist = []
    while len(finallist) < n:
        initiallist = list(np.random.normal(meanv, sd, n))
        initiallist.sort()
        indexmin = 0
        indexmax = 0
        # count the values below the lower limit
        for item in initiallist:
            if item < minv:
                indexmin = indexmin + 1
            else:
                break
        # count the values above the upper limit
        for item in initiallist[::-1]:
            if item > maxv:
                indexmax = indexmax + 1
            else:
                break
        # an explicit end index also works when indexmax is 0
        finallist = finallist + initiallist[indexmin:len(initiallist) - indexmax]
    shuffle(finallist)
    finallist = finallist[:n]
    print(len(finallist), min(finallist), max(finallist))
    return finallist
values = truncnormal(10, 3, 8, 11, 10000)
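The same rejection idea can be written more compactly with a boolean mask instead of sorting and counting. This is a sketch, not a replacement for scipy.stats.truncnorm: truncated_normal_rejection is a hypothetical name, and the loop can run for a very long time if the interval lies far out in the tails, where almost every draw is rejected.

```python
import numpy as np


def truncated_normal_rejection(mean, sd, low, high, n, rng=None):
    """Draw n normal samples, keeping only those inside [low, high]."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty(0)
    while out.size < n:
        draw = rng.normal(mean, sd, size=n)
        keep = draw[(draw >= low) & (draw <= high)]  # reject out-of-range values
        out = np.concatenate([out, keep])
    return out[:n]


samples = truncated_normal_rejection(10, 3, 8, 11, 10000,
                                     rng=np.random.default_rng(0))
print(samples.size, samples.min(), samples.max())
```

Because the accepted values are never sorted, no shuffle is needed at the end.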
