I'm using the Savitzky-Golay filter to smooth my data. I ran into a dataset whose values fall very quickly to values close to 0.0. The following figure shows the original data:
After applying the Savitzky-Golay filter, I get the following:
We can see that it plots values below 0.0, which is wrong.
Here is the function I use:
def savitzky_golay(y, window_size, order, deriv=0, rate=1):
    r"""Smooth (and optionally differentiate) data with a Savitzky-Golay filter.

    The Savitzky-Golay filter removes high-frequency noise from data.
    It has the advantage of preserving the original shape and
    features of the signal better than other types of filtering
    approaches, such as moving-average techniques.

    Parameters
    ----------
    y : array_like, shape (N,)
        the values of the time history of the signal.
    window_size : int
        the length of the window. Must be an odd integer number.
    order : int
        the order of the polynomial used in the filtering.
        Must be less than `window_size` - 1.
    deriv : int
        the order of the derivative to compute (default = 0 means only smoothing)

    Returns
    -------
    ys : ndarray, shape (N,)
        the smoothed signal (or its n-th derivative).

    Notes
    -----
    The Savitzky-Golay filter is a type of low-pass filter, particularly
    suited for smoothing noisy data. The main idea behind this
    approach is to make for each point a least-squares fit with a
    polynomial of high order over an odd-sized window centered at
    the point.

    Examples
    --------
    t = np.linspace(-4, 4, 500)
    y = np.exp(-t**2) + np.random.normal(0, 0.05, t.shape)
    ysg = savitzky_golay(y, window_size=31, order=4)
    import matplotlib.pyplot as plt
    plt.plot(t, y, label='Noisy signal')
    plt.plot(t, np.exp(-t**2), 'k', lw=1.5, label='Original signal')
    plt.plot(t, ysg, 'r', label='Filtered signal')
    plt.legend()
    plt.show()

    References
    ----------
    .. [1] A. Savitzky, M. J. E. Golay, Smoothing and Differentiation of
       Data by Simplified Least Squares Procedures. Analytical
       Chemistry, 1964, 36 (8), pp 1627-1639.
    .. [2] Numerical Recipes 3rd Edition: The Art of Scientific Computing
       W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery
       Cambridge University Press ISBN-13: 9780521880688
    """
    import numpy as np
    from math import factorial

    try:
        window_size = abs(int(window_size))
        order = abs(int(order))
    except ValueError:
        raise ValueError("window_size and order have to be of type int")
    if window_size % 2 != 1 or window_size < 1:
        raise TypeError("window_size must be a positive odd number")
    if window_size < order + 2:
        raise TypeError("window_size is too small for the polynomial's order")
    order_range = range(order + 1)
    half_window = (window_size - 1) // 2
    # precompute coefficients
    b = np.mat([[k**i for i in order_range] for k in range(-half_window, half_window + 1)])
    m = np.linalg.pinv(b).A[deriv] * rate**deriv * factorial(deriv)
    # pad the signal at the extremes with
    # values taken from the signal itself
    firstvals = y[0] - np.abs(y[1:half_window + 1][::-1] - y[0])
    lastvals = y[-1] + np.abs(y[-half_window - 1:-1][::-1] - y[-1])
    y = np.concatenate((firstvals, y, lastvals))
    return np.convolve(m[::-1], y, mode='valid')
Does anybody know why and how to fix it?
Such a thing is typical for several interpolating/smoothing techniques that are based on polynomials. (I did not check whether you actually have a 1/x-like function here.) Approximating a function like 1/x will probably give you overshooting to negative values (how often and how much depends on the order of the polynomials you use: an n-th order polynomial can give you up to n-1 sign changes).
A "fix" would be to use optimize.curve_fit() instead of a filter and define your function as something like
def f(x, a, b, c, d):
    return a/(x-b)**c + d
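For illustration, a minimal sketch of such a fit with scipy.optimize.curve_fit; the synthetic x_data/y_data, the initial guess, and the bounds are assumptions for the example, not the question's actual data:
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c, d):
    return a/(x-b)**c + d

# Synthetic stand-in for the original data: a 1/x-like decay plus a little noise
x_data = np.linspace(1.0, 10.0, 200)
y_data = 1.0/x_data + np.random.normal(0, 0.01, x_data.shape)

# The bounds keep (x - b) positive so the power stays well defined during the fit
popt, pcov = curve_fit(f, x_data, y_data, p0=(1.0, 0.0, 1.0, 0.0),
                       bounds=([-10, -10, 0.1, -10], [10, 0.5, 5, 10]))
y_fit = f(x_data, *popt)   # smooth fitted curve following the 1/x-like shape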
Related
I am trying to reproduce figure 5.6 (attached) from the textbook "Modeling Infectious Diseases in Humans and Animals" (Keeling 2008) to verify whether my implementation of a seasonally forced SEIR epidemiological model is correct. The official program from the textbook that implements seasonal forcing indicates that large values of Beta 1 can lead to numerical errors, but since the figure uses Beta 1 values that reportedly did not lead to numerical errors, in principle this should not be the cause of the problem. My implementation correctly produces the graphs in row 0, column 0 and row 1, column 0 of figure 5.6, but there is no output in my figure for the remaining cells because the numerical solution for the fraction infected (see code at bottom) produces 0, and ln(0) --> -inf.
I do receive the following warnings:
ODEintWarning: Excess work done on this call
C:\Users\jared\AppData\Local\Temp\ipykernel_24972\2802449019.py:68: RuntimeWarning: divide by zero encountered in log
  infected = np.log(odeint(
C:\Users\jared\AppData\Local\Temp\ipykernel_24972\2802449019.py:68: RuntimeWarning: invalid value encountered in log
  infected = np.log(odeint(
Here is the textbook figure:
My figure within the same time range (990 - 1000 years). Natural log taken of fraction infected:
My figure but with a shorter time range (0 - 100 years). Natural log taken of fraction infected. The numerical solution for the infected population seems to fail between the 5 and 20 year mark for most of the seasonal parameters (Beta 1 and R0):
My figure with a shorter time range as above, but with no natural log taken of fraction infected.
Code to reproduce my figure:
# Code to minimally reproduce figure
import itertools
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
def seir(y, t, mu, sigma, gamma, omega, beta_zero, beta_one):
    """System of diff eqs for epidemiological model.

    SEIR stands for susceptibles, exposed, infectious, and
    recovered populations.

    References:
        [SEIR Python Program from Textbook](http://homepages.warwick.ac.uk/~masfz/ModelingInfectiousDiseases/Chapter2/Program_2.6/Program_2_6.py)
        [Seasonally Forced SIR Program from Textbook](http://homepages.warwick.ac.uk/~masfz/ModelingInfectiousDiseases/Chapter5/Program_5.1/Program_5_1.py)
    """
    s, e, i = y
    beta = beta_zero * (1 + beta_one * np.cos(omega * t))
    sdot = mu - (beta*i + mu)*s
    edot = beta*s*i - (mu + sigma)*e
    idot = sigma*e - (mu + gamma)*i
    return sdot, edot, idot
def solve_beta_zero(basic_reproductive_rate, gamma):
    """Defined in the last paragraph of pg. 159 of textbook Keeling 2008."""
    return gamma * basic_reproductive_rate
# Model parameters (see Figure 5.6 description)
mu = 0.02 / 365
sigma = 1/8
gamma = 1/5
omega = 2 * np.pi / 365 # frequency of oscillations per year
# Seasonal forcing parameters
r0s = [17, 10, 3]
b1s = [0.02, 0.1, 0.225]
# Permutes params to get tuples matching row i column j params in figure
# e.g., [(0.02, 17), (0.02, 10) ... ]
seasonal_params = [p for p in itertools.product(*(b1s, r0s))]
# Initial Conditions: I assume these are proportions of some total population
s0 = 6e-2
e0 = i0 = 1e-3
initial_conditions = [s0, e0, i0]
# Timesteps
nyears = 1000
days_per_year = 365
ndays = nyears * days_per_year
timesteps = np.arange(1, ndays+1, 1)
# Range to slice data to reproduce my figures
# NOTE: Change the min slice or max slice for different ranges
min_slice = 990 # or 0
max_slice = 1000 # or 100
sliced = slice(min_slice * days_per_year, max_slice * days_per_year)
x_ticks = timesteps[sliced]/days_per_year
# Define figure
nrows = 3
ncols = 3
fig, ax = plt.subplots(nrows, ncols, sharex=True, figsize=(15, 8))
# Iterate through parameters and recreate figure
for i in range(nrows):
    for j in range(ncols):
        # Get seasonal parameters for this subplot
        beta_one = seasonal_params[i * nrows + j][0]
        basic_reproductive_rate = seasonal_params[i * nrows + j][1]

        # Compute beta zero given the reproductive rate
        beta_zero = solve_beta_zero(
            basic_reproductive_rate=basic_reproductive_rate,
            gamma=gamma)

        # Numerically solve the model, extract only the infected solutions,
        # slice those solutions to the desired time range, and then natural
        # log scale them
        solutions = odeint(
            seir,
            initial_conditions,
            timesteps,
            args=(mu, sigma, gamma, omega, beta_zero, beta_one))
        infected_solutions = solutions[:, 2]
        log_infected = np.log(infected_solutions[sliced])

        # NOTE: To inspect results without natural log, uncomment the
        # below line
        # log_infected = infected_solutions[sliced]

        # DEBUG: For shape and parameter printing
        # print(
        #     infected_solutions.shape, 'R0=', basic_reproductive_rate, 'B1=', beta_one)

        # Plot results
        ax[i,j].plot(x_ticks, log_infected)

        # label subplot
        ax[i,j].title.set_text(rf'$(R_0=${basic_reproductive_rate}, $\beta_{1}=${beta_one})')
fig.supylabel('NaturalLog(Fraction Infected)')
fig.supxlabel('Time (years)')
Disclaimer:
My short term solution is to simply change the list of seasonal parameters to values that will produce data for that range, and this adequately illustrates the effects of seasonal forcing. The point is to reproduce the figure, though, and if the author was able to do it, others should be able to as well.
Your first (and possibly main) problem is one of scale. This diagnosis is also consistent with the observations in your later experiments. The system is such that if it is started with positive values, it should stay within positive values. Negative values can only be reached if the step errors of the numerical method are too large.
As you can see in the original graphs, the range of values goes from exp(-7) ~= 9e-4 to exp(-12) ~= 6e-6. The value of the absolute tolerance should force at least 3 digits to be exact, so atol = 1e-10 or smaller. The relative tolerance should be adapted similarly. Viewing all components together shows that the first component has values around exp(-2.5) ~= 5e-2, so per-component tolerances should provide better results. The corresponding call is
solutions = odeint(
    seir,
    initial_conditions,
    timesteps,
    args=(mu, sigma, gamma, omega, beta_zero, beta_one),
    atol=[1e-9, 1e-13, 1e-13], rtol=1e-11)
With these parameters I get the plots below
The first row and first column are as in the cited graphic, the others look different.
As a test and a general method to integrate in the range of small positive solutions, reformulate for the integration of the logarithms of the components. This can be done with a simple wrapper
def seir_log(log_y, t, *args):
    y = np.exp(log_y)
    dy = np.array(seir(y, t, *args))
    return dy/y  # = d(log(y))
Now the expected values have a scale of 1 to 10, so the tolerances are no longer as critical; the default tolerances should be sufficient, but it is better to work with documented tolerances.
log_solution = odeint(
    seir_log,
    np.log(initial_conditions),
    timesteps,
    args=(mu, sigma, gamma, omega, beta_zero, beta_one),
    atol=1e-8, rtol=1e-9)
log_infected = log_solution[sliced, 2]
The bottom-left plot is still sensitive to atol; with 1e-7 one gets a more wavy picture. Bounding the step size with hmax=5 also stabilizes that (see the sketch at the end of this answer). With the code as above, the plots are
The center plot is still different than the reference. It might be that there are different stable cycles.
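For reference, a minimal sketch of the step-size bound mentioned above: the same call as before, just with hmax added (5 is the value quoted in the text).
log_solution = odeint(
    seir_log,
    np.log(initial_conditions),
    timesteps,
    args=(mu, sigma, gamma, omega, beta_zero, beta_one),
    atol=1e-8, rtol=1e-9,
    hmax=5)   # cap the internal step size at 5 (days) to stabilize the bottom-left plot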
I am interested in integrating in Fourier space after using scipy to take an FFT of some data. I have been following along with this Stack Exchange post, "numerical integration in Fourier space with numpy.fft", but it does not properly integrate a few test cases I have been working with. I have added a few lines to address this issue but still am not recovering the correct integrals. Below is the code I have been using to integrate my test cases. At the top of the code are the 3 test cases I have been using.
import numpy as np
import scipy.special as sp
from scipy.fft import fft, ifft, fftfreq
import matplotlib.pyplot as plt
#set number of points in array
Ns = 2**16
#create array in space
x = np.linspace(-np.pi, np.pi, Ns)
#test case 1 from stack exchange post
# y = np.exp(-x**2) # function f(x)
# ys = np.exp(-x**2) * (-2 *x) # derivative f'(x)
#test case 2
# y = np.exp(-x**2) * x - 1/2 *np.sqrt(np.pi)*sp.erf(x)
# ys = np.exp(-x**2) * -2*x**2
#test case 3
y = np.sin(x**2) + (1/4)*np.exp(x)
ys = 1/4*(np.exp(x) + 8*x*np.cos(x**2))
#find spacing in space array
ss = x[1]-x[0]
#define fft integration function
def fft_int(N, s, dydt):
    #create frequency array
    f = fftfreq(N, s)
    # integration step ignoring divide by 0 errors
    Fys = fft(dydt)
    with np.errstate(divide="ignore", invalid="ignore"):
        modFys = Fys / (2*np.pi*1j*f)
    #set DC term to 0, was a nan since we divided by 0
    modFys[0] = 0
    #take inverse fft and subtract by integration constant
    fourier = ifft(modFys)
    fourier = fourier - fourier[0]
    #tilt correction if function doesn't approach 0 at its ends
    tilt = np.sum(dydt)*s*(np.arange(0, N)/(N-1) - 1/2)
    fourier = fourier + tilt
    return fourier
Test case 1 is from the Stack Exchange post linked above. If you copy-paste the code from the top answer and plot it, you'll get something like this:
with the solid blue line being the fft integration method and the dashed orange as the analytic solution. I account for this offset with the following line of code:
fourier = fourier-fourier[0]
since I don't believe the code was setting the constant of integration.
Next for test case 2 I get a plot like this:
again with the solid blue line being the fft integration method and the dashed orange as the analytic solution. I account for this tilt in the solution using the following lines of code
tilt = np.sum(dydt)*s*(np.arange(0,N)/(N-1) - 1/2)
fourier = fourier + tilt
Finally we arrive at test case 3. Which results in the following plot:
again with the solid blue line being the fft integration method and the dashed orange the analytic solution. This is where I'm stuck: this offset has appeared again and I'm not sure why.
TLDR: How do I correctly integrate a function in Fourier space using scipy.fft?
The tilt component makes no sense. It fixes one function, but it's not a generic solution of the problem.
The problem is that the FFT induces periodicity in the signal, meaning you compute the integral of a different function. Multiplying the FFT of the signal by 1/(2*np.pi*1j*f) is equivalent to a circular convolution of the signal with ifft(1/(2*np.pi*1j*f)). "Circular" is the key here. This is just a boundary problem.
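A small self-contained check of that equivalence (the array x and the filter h here are arbitrary illustrations, not part of the question's code):
import numpy as np
from scipy.fft import fft, ifft, fftfreq

N, s = 64, 0.1
rng = np.random.default_rng(0)
x = rng.standard_normal(N)

f = fftfreq(N, s)
H = np.zeros(N, dtype=complex)
H[1:] = 1 / (2*np.pi*1j*f[1:])     # the "integration" filter with the DC bin zeroed

a = ifft(fft(x) * H)               # multiplication in the frequency domain
h = ifft(H)                        # the equivalent impulse response
b = np.zeros(N, dtype=complex)
for k in range(N):                 # direct circular convolution of x with h
    for m in range(N):
        b[k] += x[m] * h[(k - m) % N]
print(np.allclose(a, b))           # True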
Padding the function with zeros is one way to attempt to fix this:
import numpy as np
import scipy.special as sp
from scipy.fft import fft, ifft, fftfreq
import matplotlib.pyplot as plt
def fft_int(s, dydt, N=0):
    dydt_padded = np.pad(dydt, (0, N))
    f = fftfreq(dydt_padded.shape[0], s)
    F = fft(dydt_padded)
    with np.errstate(divide="ignore", invalid="ignore"):
        F = F / (2*np.pi*1j*f)
    F[0] = 0
    y_padded = np.real(ifft(F))
    y = y_padded[0:dydt.shape[0]]
    return y - np.mean(y)
N = 2**16
x = np.linspace(-np.pi, np.pi, N)
s = x[1] - x[0]
# Test case 3
y = np.sin(x**2) + (1/4)*np.exp(x)
dy = 1/4*(np.exp(x) + 8*x*np.cos(x**2))
plt.plot(y - np.mean(y))
plt.plot(fft_int(s, dy))
plt.plot(fft_int(s, dy, N))
plt.plot(fft_int(s, dy, 10*N))
plt.show()
(Blue is expected output, computed solution without padding is orange, and with increasing amount of padding, green and red.)
Here I've solved the "offset" problem by plotting all functions with their mean removed. Setting the DC component to 0 is equal to subtracting the mean. But after cropping off the padding the mean changes, so fft_int subtracts the mean again after cropping.
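(As a quick self-contained check of that claim, separate from the answer's code: zeroing the DC bin and inverting returns the signal minus its mean.)
import numpy as np
from scipy.fft import fft, ifft

x = np.random.rand(64)
F = fft(x)
F[0] = 0                                             # zero the DC bin
print(np.allclose(np.real(ifft(F)), x - x.mean()))   # True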
Anyway, note how we get an increasingly better approximation as the padding increases. To get the exact result, one would need an infinite amount of padding, which of course is unrealistic.
Test case #1 doesn't need padding because the function reaches zero at the edges of the sampled domain. We can impose such behavior on the other cases too; in discrete Fourier analysis this is called windowing. It would look something like this:
def fft_int(s, dydt):
    dydt_windowed = dydt * np.hanning(dydt.shape[0])
    f = fftfreq(dydt.shape[0], s)
    F = fft(dydt_windowed)
    with np.errstate(divide="ignore", invalid="ignore"):
        F = F / (2*np.pi*1j*f)
    F[0] = 0
    y = np.real(ifft(F))
    return y
However, here we get correct integration results only in the middle of the domain, with increasingly suppressed values towards the ends. So this is not a practical solution either.
My conclusion is that no, this is not possible to do. It is much easier to compute the integral with np.cumsum:
yp = np.cumsum(dy) * s
plt.plot(y - np.mean(y))
plt.plot(yp - np.mean(yp))
plt.show()
(not showing output: the two plots overlap perfectly.)
Background: I observe a sample of a variable z that is the sum of two independent and identically distributed variables x and y. I'm trying to recover the distribution of x and y (call it f) from the distribution of z (call it g), under the assumption that f is symmetric about zero. According to Horowitz and Markatou (1996), the Fourier transform of f equals sqrt(|G|), where G is the Fourier transform of g. (This follows because g is the self-convolution f * f, so G = F^2, and the symmetry of f about zero makes F real, hence |F| = sqrt(|G|).)
Example:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde, laplace
# sample size
size = 10000
# Number of points to perform FFT on
N = 501
# Scale of the laplace rvs
scale = 3.0
# Test deconvolution
laplace_f = laplace(scale=scale)
x = laplace_f.rvs(size=size)
y = laplace_f.rvs(size=size)
z = x + y
t = np.linspace(-4 * scale, 4 * scale, size)
laplace_pdf = laplace_f.pdf(t)
t2 = np.linspace(-4 * scale, 4 * scale, N)
# Get density from z. Kind of cheating using gaussian
z_density = gaussian_kde(z)(t2)
z_density = (z_density + z_density[::-1]) / 2
z_density_half = z_density[:((N - 1) // 2) + 1]
ft_z_density = np.fft.hfft(z_density_half)
inv_fz_density = np.fft.ihfft(np.sqrt(np.abs(ft_z_density)))
inv_fz_density = np.r_[inv_fz_density, inv_fz_density[::-1][:-1]]
f_deconv_shifted = np.real(np.fft.fftshift(inv_fz_density))
f_deconv = np.real(inv_fz_density)
# Normalize to be a pdf
f_deconv_shifted /= f_deconv_shifted.mean()
f_deconv /= f_deconv.mean()
# Plot
plt.subplot(221)
plt.plot(t, laplace_pdf)
plt.title('laplace pdf')
plt.subplot(222)
plt.plot(t2, z_density)
plt.title("z density")
plt.subplot(223)
plt.plot(t2, f_deconv_shifted)
plt.title('Deconvolved with shift')
plt.subplot(224)
plt.plot(t2, f_deconv)
plt.title('Deconvolved without shift')
plt.tight_layout()
plt.show()
Which results in
Issue: there's clearly something wrong here. I don't think I should need the shift, yet the shifted pdf seems to be closer to the truth. I suspect it has something to do with the domain of the IFFT changing with the sqrt(abs()) operation, but I'm really not sure.
The FFT is defined such that the times associated to the input samples are t=0..N-1. That is, the origin is in the first sample. The same is true for the output, the associated frequencies are k=0..N-1.
Your distribution is symmetric about zero, but the FFT knows nothing about your t values (you cannot pass them to the FFT function). Given the t values implied by the FFT definition, your distribution is actually shifted, and that shift adds a phase component in the frequency domain. By using hfft instead of fft you ignore that phase, which effectively shifts your input signal so that it becomes symmetric about the origin as defined by the FFT (not your origin).
fftshift shifts the signal resulting from the IFFT such that the origin is back where you want it to be. I recommend that you use ifftshift before calling hfft, just to ensure that your signal is actually symmetric as expected by that function. I don't know if it will make a difference; it depends on how this function is implemented.
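For concreteness, one way to apply that recommendation, reusing z_density and N from the question (a sketch only, I haven't verified whether it changes the result):
import numpy as np

# Move the origin (t = 0, the middle of t2) to index 0, as hfft expects
z_density_centered = np.fft.ifftshift(z_density)
# Keep the origin sample plus the positive-t half as the hfft input
z_density_half = z_density_centered[:((N - 1) // 2) + 1]
ft_z_density = np.fft.hfft(z_density_half)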
I have two numpy arrays, one is an array of x values and the other an array of y values and together they give me the empirical cdf. E.g.:
plt.plot(xvalues, yvalues)
plt.show()
I assume the data needs to be smoothed somehow in order to give a smooth pdf.
I would like to plot the pdf. How can I do that?
The raw data is at: http://dpaste.com/1HVK5DR .
There are two main problems: your data seems to be quite noisy, and it is not equally spaced: the points at the low end are sampled quite densely, while the points at the high end are sampled quite sparsely. This can cause numerical issues.
So first I suggest resampling the data using a linear interpolation to get equally spaced samples. (Note that all the snippets appended to each other form the content of one Python file.)
import matplotlib.pyplot as plt
import numpy as np
from data import xvalues, yvalues #load data from file
print("#datapoints: {}".format(len(xvalues)))
#don't use every point if your computer is not very fast
xv = np.array(xvalues)[::5]
yv = np.array(yvalues)[::5]
#interpolate to have evenly space data
xi = np.linspace(xv.min(), xv.max(), 400)
yi = np.interp(xi, xv, yv)
Then, to smooth the data, I suggest performing an RBF regression (i.e., using an "RBF network"). The idea is fitting a curve of the form
c(t) = sum a(i) * phi(t - x(i)) #(not part of the program)
where phi is some radial basis function. (In theory we could use any function.) To have a very smooth result I choose a very smooth function, e.g. a Gaussian phi(x) = exp(-x^2/sigma^2); the code below uses the similarly smooth inverse-quadratic kernel phi(x) = 1/(1 + (x/sigma)^2). Here sigma is yet to be determined. The x(i) are just some nodes that we can define. If we have a smooth function, we just need a few nodes; the number of nodes also determines how much computation needs to be done. The a(i) are the coefficients we can optimize to get the best fit. In this case I just use a least-squares approach.
Note that IF we can write a function in the form above, it is very easy to compute its derivative; it is just
c'(t) = sum a(i) * phi'(t - x(i)) #(not part of the program)
where phi' is the derivative of phi.
Regarding sigma: It is usually a good idea to choose it as a multiple of the step between the nodes we chose. The greater we choose sigma, the smoother the resulting function gets.
#set up rbf network
rbf_nodes = xv[::50][None, :]#use a subset of the x-values as rbf nodes
print("#rbfs: {}".format(rbf_nodes.shape[1]))
#estimate width of kernels:
sigma = 20 #greater = smoother, this is the primary parameter to play with
sigma *= np.max(np.abs(rbf_nodes[0,1:]-rbf_nodes[0,:-1]))
# kernel & derivative
rbf = lambda r:1/(1+(r/sigma)**2)
Drbf = lambda r: -2*r*sigma**2/(sigma**2 + r**2)**2
#compute coefficients of rbf network
r = np.abs(xi[:, None]-rbf_nodes)
A = rbf(r)
coeffs = np.linalg.lstsq(A, yi, rcond=None)[0]
print(coeffs)
#evaluate rbf network
N=1000
xe = np.linspace(xi.min(), xi.max(), N)
Ae = rbf(xe[:, None] - rbf_nodes)
ye = Ae @ coeffs
#evaluate derivative
N=1000
xd = np.linspace(xi.min(), xi.max(), N)
Bd = Drbf(xe[:, None] - rbf_nodes)
yd = Bd @ coeffs
fig,ax = plt.subplots()
ax2 = ax.twinx()
ax.plot(xv, yv, '-')
ax.plot(xi, yi, '-')
ax.plot(xe, ye, ':')
ax2.plot(xd, yd, '-')
fig.savefig('graph.png')
print('done')
You need the derivative to go from CDF to PDF
PDF(x) = d CDF(x)/ dx
With NumPy, you could use gradient
pdf = np.gradient(yvalues, xvalues)
plt.plot(xvalues, pdf)
plt.show()
or manual differential
pdf = np.diff(yvalues)/np.diff(xvalues)
l = np.asarray(xvalues[:-1])
r = np.asarray(xvalues[1:])
plt.plot((l+r)/2.0, pdf) # points in the middle of interval
plt.show()
Both produce something like the following:
I am interested in generating an array (or pandas Series) of length N that will exhibit a specific autocorrelation at lag 1. Ideally, I want to specify the mean and variance as well, and have the data drawn from a (multi)normal distribution. But most importantly, I want to specify the autocorrelation. How do I do this with numpy or scikit-learn?
Just to be explicit and precise, this is the autocorrelation I want to control:
numpy.corrcoef(x[0:len(x) - 1], x[1:])[0][1]
If you are interested only in the auto-correlation at lag one, you can generate an auto-regressive process of order one with the parameter equal to the desired auto-correlation; this property is mentioned on the Wikipedia page, but it's not hard to prove it.
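For reference, a quick sketch of that proof (standard AR(1) algebra; corr plays the role of the AR coefficient phi). For the stationary AR(1) process x[t] = c + phi * x[t-1] + e[t] with e[t] ~ N(0, sigma_e^2):

    Cov(x[t], x[t-1]) = Cov(c + phi*x[t-1] + e[t], x[t-1]) = phi * Var(x)
    =>  Corr(x[t], x[t-1]) = phi

    E[x]   = c / (1 - phi)             =>  c = mu * (1 - phi)
    Var(x) = sigma_e^2 / (1 - phi^2)   =>  sigma_e = sigma * sqrt(1 - phi^2)

The last two lines are exactly how c and sigma_e are chosen in the code below.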
Here is some sample code:
import numpy as np
def sample_signal(n_samples, corr, mu=0, sigma=1):
    assert 0 < corr < 1, "Auto-correlation must be between 0 and 1"

    # Find out the offset `c` and the std of the white noise `sigma_e`
    # that produce a signal with the desired mean and variance.
    # See https://en.wikipedia.org/wiki/Autoregressive_model
    # under section "Example: An AR(1) process".
    c = mu * (1 - corr)
    sigma_e = np.sqrt((sigma ** 2) * (1 - corr ** 2))

    # Sample the auto-regressive process.
    signal = [c + np.random.normal(0, sigma_e)]
    for _ in range(1, n_samples):
        signal.append(c + corr * signal[-1] + np.random.normal(0, sigma_e))

    return np.array(signal)
def compute_corr_lag_1(signal):
    return np.corrcoef(signal[:-1], signal[1:])[0][1]
# Examples.
print(compute_corr_lag_1(sample_signal(5000, 0.5)))
print(np.mean(sample_signal(5000, 0.5, mu=2)))
print(np.std(sample_signal(5000, 0.5, sigma=3)))
The parameter corr lets you set the desired auto-correlation at lag one and the optional parameters, mu and sigma, let you control the mean and standard deviation of the generated signal.