I want to add some random noise to a 100-bin signal that I am simulating in Python, to make it more realistic.
On a basic level, my first thought was to go bin by bin and just generate a random number within a certain range and add or subtract this from the signal.
I was hoping (as this is Python) that there might be a more intelligent way to do this via numpy or something. (I suppose that, ideally, a number drawn from a Gaussian distribution and added to each bin would be better.)
Thank you in advance for any replies.
I'm just at the stage of planning my code, so I don't have anything to show. I was just thinking that there might be a more sophisticated way of generating the noise.
In terms of output, if I had 10 bins with the following values:
Bin 1: 1
Bin 2: 4
Bin 3: 9
Bin 4: 16
Bin 5: 25
Bin 6: 25
Bin 7: 16
Bin 8: 9
Bin 9: 4
Bin 10: 1
I just wondered if there was a pre-defined function that could add noise to give me something like:
Bin 1: 1.13
Bin 2: 4.21
Bin 3: 8.79
Bin 4: 16.08
Bin 5: 24.97
Bin 6: 25.14
Bin 7: 16.22
Bin 8: 8.90
Bin 9: 4.02
Bin 10: 0.91
If not, I will just go bin-by-bin and add a number selected from a Gaussian distribution to each one.
Thank you.
It's actually a signal from a radio telescope that I am simulating. I want to be able to eventually choose the signal-to-noise ratio of my simulation.
You can generate a noise array and add it to your signal:
import numpy as np
noise = np.random.normal(0,1,100)
# 0 is the mean of the normal distribution you are choosing from
# 1 is the standard deviation of the normal distribution
# 100 is the number of elements you get in array noise
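A minimal sketch putting that together (the signal here is just a stand-in for your simulated 100-bin array):
import numpy as np
signal = np.linspace(1, 25, 100)               # stand-in for your simulated 100-bin signal
noise = np.random.normal(0, 1, signal.shape)   # zero-mean Gaussian noise, sigma = 1
noisy_signal = signal + noise                  # vectorized: no bin-by-bin loop needed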
For those trying to make the connection between SNR and a normal random variable generated by numpy:
SNR = P_signal / P_noise [1], where it's important to keep in mind that P is average power.
Or in dB:
SNR_dB = 10 * log10(SNR) = P_signal_dB - P_noise_dB [2]
In this case, we already have a signal and we want to generate noise to give us a desired SNR.
While noise can come in different flavors depending on what you are modeling, a good start (especially for this radio telescope example) is Additive White Gaussian Noise (AWGN). As stated in the previous answers, to model AWGN you need to add a zero-mean Gaussian random variable to your original signal. The variance of that random variable will affect the average noise power.
For a Gaussian random variable X, the average power E[X^2], also known as the second moment, is
E[X^2] = mu^2 + sigma^2 [3]
So for white noise, mu = 0, and the average power is then equal to the variance sigma^2.
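As a quick numerical sanity check of [3] (my own sketch, not part of the original answer), the mean power of zero-mean Gaussian samples should approach the variance:
import numpy as np
sigma = 2.0
samples = np.random.normal(0, sigma, 1_000_000)
print(np.mean(samples ** 2))   # close to sigma**2 = 4.0 for zero-mean noise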
When modeling this in Python, you can either:
1. Calculate variance based on a desired SNR and a set of existing measurements, which would work if you expect your measurements to have fairly consistent amplitude values.
2. Alternatively, you could set noise power to a known level to match something like receiver noise. Receiver noise could be measured by pointing the telescope into free space and calculating average power.
Either way, it's important to make sure that you add noise to your signal and take averages in the linear space and not in dB units.
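As a small convenience, here is a pair of helper functions for moving between the two spaces (my own helpers, not from any library):
import numpy as np

def db_to_linear(db):
    # dB -> linear power (Watts)
    return 10 ** (db / 10)

def linear_to_db(watts):
    # linear power (Watts) -> dB
    return 10 * np.log10(watts)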
Here's some code to generate a signal and plot voltage, power in Watts, and power in dB:
# Signal Generation
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(1, 100, 1000)
x_volts = 10*np.sin(t/(2*np.pi))
plt.subplot(3,1,1)
plt.plot(t, x_volts)
plt.title('Signal')
plt.ylabel('Voltage (V)')
plt.xlabel('Time (s)')
plt.show()
x_watts = x_volts ** 2
plt.subplot(3,1,2)
plt.plot(t, x_watts)
plt.title('Signal Power')
plt.ylabel('Power (W)')
plt.xlabel('Time (s)')
plt.show()
x_db = 10 * np.log10(x_watts)
plt.subplot(3,1,3)
plt.plot(t, x_db)
plt.title('Signal Power in dB')
plt.ylabel('Power (dB)')
plt.xlabel('Time (s)')
plt.show()
Here's an example for adding AWGN based on a desired SNR:
# Adding noise using target SNR
# Set a target SNR
target_snr_db = 20
# Calculate signal power and convert to dB
sig_avg_watts = np.mean(x_watts)
sig_avg_db = 10 * np.log10(sig_avg_watts)
# Calculate noise according to [2] then convert to watts
noise_avg_db = sig_avg_db - target_snr_db
noise_avg_watts = 10 ** (noise_avg_db / 10)
# Generate a sample of white noise
mean_noise = 0
noise_volts = np.random.normal(mean_noise, np.sqrt(noise_avg_watts), len(x_watts))
# Noise up the original signal
y_volts = x_volts + noise_volts
# Plot signal with noise
plt.subplot(2,1,1)
plt.plot(t, y_volts)
plt.title('Signal with noise')
plt.ylabel('Voltage (V)')
plt.xlabel('Time (s)')
plt.show()
# Plot in dB
y_watts = y_volts ** 2
y_db = 10 * np.log10(y_watts)
plt.subplot(2,1,2)
plt.plot(t, y_db)
plt.title('Signal with noise (dB)')
plt.ylabel('Power (dB)')
plt.xlabel('Time (s)')
plt.show()
And here's an example for adding AWGN based on a known noise power:
# Adding noise using a target noise power
# Set a target channel noise power to something very noisy
target_noise_db = 10
# Convert to linear Watt units
target_noise_watts = 10 ** (target_noise_db / 10)
# Generate noise samples
mean_noise = 0
noise_volts = np.random.normal(mean_noise, np.sqrt(target_noise_watts), len(x_watts))
# Noise up the original signal (again) and plot
y_volts = x_volts + noise_volts
# Plot signal with noise
plt.subplot(2,1,1)
plt.plot(t, y_volts)
plt.title('Signal with noise')
plt.ylabel('Voltage (V)')
plt.xlabel('Time (s)')
plt.show()
# Plot in dB
y_watts = y_volts ** 2
y_db = 10 * np.log10(y_watts)
plt.subplot(2,1,2)
plt.plot(t, y_db)
plt.title('Signal with noise (dB)')
plt.ylabel('Power (dB)')
plt.xlabel('Time (s)')
plt.show()
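To make the SNR-based recipe above reusable, here is a sketch of a wrapper (my own helper, assuming a 1-D voltage array as input):
import numpy as np

def add_awgn(x_volts, target_snr_db):
    # Average signal power in linear (Watt) units
    sig_avg_watts = np.mean(x_volts ** 2)
    # Noise power needed for the requested SNR, per [2]
    noise_avg_watts = sig_avg_watts / (10 ** (target_snr_db / 10))
    # Zero-mean AWGN with that power (variance equals power for zero mean)
    noise_volts = np.random.normal(0, np.sqrt(noise_avg_watts), len(x_volts))
    return x_volts + noise_volts

y_volts = add_awgn(x_volts, target_snr_db=20)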
... And for those who - like me - are very early in their numpy learning curve:
import numpy as np
pure = np.linspace(-1, 1, 100)
noise = np.random.normal(0, 1, 100)
signal = pure + noise
For those who want to add noise to a multi-dimensional dataset loaded within a pandas dataframe or even a numpy ndarray, here's an example:
import pandas as pd
# create a sample dataset with dimension (2,2)
# in your case you need to replace this with
# clean_signal = pd.read_csv("your_data.csv")
clean_signal = pd.DataFrame([[1,2],[3,4]], columns=list('AB'), dtype=float)
print(clean_signal)
"""
print output:
A B
0 1.0 2.0
1 3.0 4.0
"""
import numpy as np
mu, sigma = 0, 0.1
# creating a noise with the same dimension as the dataset (2,2)
noise = np.random.normal(mu, sigma, [2,2])
print(noise)
"""
print output:
array([[-0.11114313, 0.25927152],
[ 0.06701506, -0.09364186]])
"""
signal = clean_signal + noise
print(signal)
"""
print output:
A B
0 0.888857 2.259272
1 3.067015 3.906358
"""
AWGN Similar to Matlab Function
import math
import numpy as np

def awgn(sinal, regsnr=54):
    # average signal power
    sigpower = sum([math.pow(abs(sinal[i]), 2) for i in range(len(sinal))])
    sigpower = sigpower/len(sinal)
    # noise power for the requested SNR (regsnr, in dB)
    noisepower = sigpower/(math.pow(10, regsnr/10))
    # AWGN must be *Gaussian*: the original used np.random.uniform(-1, 1),
    # which is neither Gaussian nor unit-variance, so the noise power was off
    noise = math.sqrt(noisepower)*(np.random.normal(0, 1, size=len(sinal)))
    return noise
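A usage sketch (unlike Matlab's awgn, which returns the noisy signal, this function returns only the noise, so you add it yourself; the 5 Hz test signal is just an illustration):
t = np.linspace(0, 1, 1000)
x = np.sin(2*np.pi*5*t)
x_noisy = x + awgn(x)   # signal with noise at the default 54 dB SNR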
In real life you usually want to simulate a signal with white noise: you add to your signal random points drawn from a normal (Gaussian) distribution. If we are talking about a device whose sensitivity is given in unit/SQRT(Hz), then you need to derive the standard deviation of your points from it. Here I give a function "white_noise" that does this for you, and the rest of the code is a demonstration and a check that it does what it should.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
"""
parameters:
rho - spectral noise density unit/SQRT(Hz)
sr - sample rate
n - no of points
mu - mean value, optional
returns:
n points of noise signal with spectral noise density of rho
"""
def white_noise(rho, sr, n, mu=0):
    sigma = rho * np.sqrt(sr/2)
    noise = np.random.normal(mu, sigma, n)
    return noise
rho = 1
sr = 1000
n = 1000
period = n/sr
time = np.linspace(0, period, n)
signal_pure = 100*np.sin(2*np.pi*13*time)
noise = white_noise(rho, sr, n)
signal_with_noise = signal_pure + noise
f, psd = signal.periodogram(signal_with_noise, sr)
print("Mean spectral noise density = ",np.sqrt(np.mean(psd[50:])), "arb.u/SQRT(Hz)")
plt.plot(time, signal_with_noise)
plt.plot(time, signal_pure)
plt.xlabel("time (s)")
plt.ylabel("signal (arb.u.)")
plt.show()
plt.semilogy(f[1:], np.sqrt(psd[1:]))
plt.xlabel("frequency (Hz)")
plt.ylabel("psd (arb.u./SQRT(Hz))")
#plt.axvline(13, ls="dashed", color="g")
plt.axhline(rho, ls="dashed", color="r")
plt.show()
Awesome answers from Akavall and Noel (that's what worked for me). Also, I saw some comments about different distributions. A solution that I also tried was to run tests over my variable and find which distribution it was closer to.
numpy.random has different distributions that can be used, as can be seen in its documentation:
documentation numpy.random
As an example from a different distribution (adapted from Noel's answer):
import numpy as np
pure = np.linspace(-1, 1, 100)
noise = np.random.lognormal(0, 1, 100)
signal = pure + noise
print(pure[:10])
print(signal[:10])
I hope this can help someone looking for this specific branch from the original question.
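For that "find which distribution fits best" step, here is a rough sketch using scipy.stats.kstest (my own example with hypothetical lognormal data; the candidate list is arbitrary):
import numpy as np
from scipy import stats

data = np.random.lognormal(0, 1, 1000)          # hypothetical measurements
for name in ['norm', 'lognorm', 'expon']:
    params = getattr(stats, name).fit(data)     # fit the candidate distribution
    stat, p = stats.kstest(data, name, args=params)
    print(name, p)                              # a higher p-value suggests a better fit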
You can try this:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-5.0, 5.0, 0.1)
y = np.power(x,2)
noise = 2 * np.random.normal(size=x.size)
ydata = y + noise
plt.plot(x, ydata, 'bo')
plt.plot(x,y, 'r')
plt.ylabel('y data')
plt.xlabel('x data')
plt.show()
Awesome answers above. I recently had a need to generate simulated data and this is what I ended up using. Sharing in case it is helpful to others as well:
import logging
__name__ = "DataSimulator"
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
import numpy as np
import pandas as pd
def generate_simulated_data(add_anomalies: bool = True, random_state: int = 42):
    rnd_state = np.random.RandomState(random_state)
    time = np.linspace(0, 200, num=2000)
    pure = 20*np.sin(time/(2*np.pi))
    # concatenate on the second axis; this will allow us to mix different data
    # distributions
    data = np.c_[pure]
    mu = np.mean(data)
    sd = np.std(data)
    logger.info(f"Data shape : {data.shape}. mu: {mu} with sd: {sd}")
    data_df = pd.DataFrame(data, columns=['Value'])
    data_df['Index'] = data_df.index.values
    # Adding gaussian jitter
    jitter = 0.3*rnd_state.normal(mu, sd, size=data_df.shape[0])
    data_df['with_jitter'] = data_df['Value'] + jitter
    indexes_further_away = None
    if add_anomalies:
        # As per the 68-95-99.7 rule (also known as the empirical rule),
        # mu +/- 2*sd covers 95.4% of the dataset. Since anomalies are
        # considered to be rare and typically within 5-10% of the data,
        # this filtering technique might work for us
        # (https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule)
        indexes_further_away = np.where(np.abs(data_df['with_jitter']) > (mu + 2*sd))[0]
        logger.info(f"Number of points further away : {len(indexes_further_away)}. "
                    f"Indexes: {indexes_further_away}")
        # Generate a point uniformly and embed it into the dataset
        random = rnd_state.uniform(0, 5, 1)
        data_df.loc[indexes_further_away, 'with_jitter'] += \
            random * data_df.loc[indexes_further_away, 'with_jitter']
    return data_df, indexes_further_away
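A usage sketch:
data_df, anomaly_indexes = generate_simulated_data(add_anomalies=True)
print(data_df.head())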
Related
I have a data distribution that I want to fit a Poisson distribution to. My data looks like this:
I try to fit :
mu = herd_size["COW_NUM"].mean()
ax=sns.displot(data=herd_size["COW_NUM"], kde=True)
ax.set(xlabel='Size',title='Herd size distribution & poisson distribution')
plt.plot(np.arange(0, 2000, 80), [st.poisson.pmf(np.arange(i, i+80), mu).sum()*len(herd_size["COW_NUM"])
for i in np.arange(0, 2000, 80)], color='red')
# every bin contains approximately 80 observations
plt.show()
but I get something that is not at the same scale:
UPDATE
I tried to apply a negative binomial distribution with the code:
n=len(herd_size["COW_NUM"])
p =herd_size["COW_NUM"].mean()/(herd_size["COW_NUM"].mean()+2)
ax=sns.displot(data=herd_size["COW_NUM"], kde=True)
ax.set(xlabel='Size',title='Herd size distribution & geometry distribution')
plt.plot(np.arange(0, 2000, 80), [st.nbinom.pmf(np.arange(i, i+80), n,p).sum()*len(herd_size["COW_NUM"])
for i in np.arange(0, 2000, 80)], color='red')
# every bin contains approximately 80 observations
plt.show()
but I got this:
For what you need to plot, it might be easier to provide the bins to make your histogram:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import poisson
herd_size = pd.DataFrame({'COW_NUM':np.random.poisson(200,2000)})
binwidth = 10
xstart = 150
xend = 280
bins = np.arange(xstart,xend,binwidth)
o = sns.histplot(data=herd_size["COW_NUM"], kde=True,bins = bins)
Then calculate your mean and total number:
mu = herd_size["COW_NUM"].mean()
n = len(herd_size)
The expected frequency in each bin is the difference between the cdf at the bin's right and left edges:
plt.plot(bins + binwidth/2 , n*(poisson.cdf(bins+binwidth,mu) - poisson.cdf(bins,mu)), color='red')
Your data is overdispersed: for a Poisson you don't expect the data to be so spread out. So what you need to do is fit a gamma or a negative binomial instead, for example:
from scipy.stats import nbinom
herd_size = pd.DataFrame({'COW_NUM':nbinom.rvs(n=2,p=0.1,loc=240,size=2000)})
binwidth = 50
xstart = 0
xend = 2000
bins = np.arange(xstart,xend,binwidth)
herd_size = pd.DataFrame({'COW_NUM':nbinom.rvs(n=1,p=0.004,size=2000)})
Var = herd_size["COW_NUM"].var()
mu = herd_size["COW_NUM"].mean()
p = (mu/Var)
r = mu**2 / (Var-mu)
n = len(herd_size)
o = sns.histplot(data=herd_size["COW_NUM"], kde=True,bins=bins)
plt.plot(bins + binwidth/2 ,
n*(nbinom.cdf(bins+binwidth,r,p) - nbinom.cdf(bins,r,p)),
color='red')
Your plot is (at least approximately) correct, the problem is with modeling your data as Poisson. As lambda grows large the Poisson looks more and more like a normal distribution — see this plot from Wikipedia. A Poisson distribution has its variance equal to its mean, so with a mean of around ~240 you have a standard deviation of ~15.5. The net result is that outcomes for a Poisson(240) should overwhelmingly fall between 210 and 270, which is what your red plot shows. Try fitting a different distribution to your data.
I just spotted StupidWolf's answer. Other than using a mean of 200 rather than 240, his histogram shows the same behavior described above.
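A quick way to see the overdispersion (my own sketch, assuming herd_size as above): for a Poisson the sample mean and variance should be roughly equal.
mu = herd_size["COW_NUM"].mean()
var = herd_size["COW_NUM"].var()
print(mu, var)   # var much larger than mu indicates overdispersion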
I'm completely new to Python. Could someone show me how I can write a random number generator which samples from the Levy distribution? I've written the function for the distribution, but I'm confused about how to proceed further!
I want to use the random numbers generated from this distribution to simulate a 2D random walk.
I'm aware that from scipy.stats I can use the Levy class, but I want to write the sampler myself.
import numpy as np
import matplotlib.pyplot as plt
# Levy distribution
"""
f(x) = 1/sqrt(2*pi*x^3) * exp(-1/(2x))
"""
def levy(x):
    return 1 / np.sqrt(2*np.pi*x**3) * np.exp(-1/(2*x))
N = 50
foo = levy(N)
@pjs's code looks ok to me, but there is a discrepancy between his code and what SciPy thinks about Levy; basically, the sampling is different from the PDF.
Code, Python 3.8 Windows 10 x64
import numpy as np
from scipy.stats import levy
from scipy.stats import norm
import matplotlib.pyplot as plt
rng = np.random.default_rng(312345)
# Arguments
# u: a uniform[0,1) random number
# c: scale parameter for Levy distribution (defaults to 1)
# mu: location parameter (offset) for Levy (defaults to 0)
def my_levy(u, c = 1.0, mu = 0.0):
    return mu + c / (2.0 * (norm.ppf(1.0 - u))**2)
fig, ax = plt.subplots()
rnge=(0, 20.0)
x = np.linspace(rnge[0], rnge[1], 1001)
N = 200000
q = np.empty(N)
for k in range(0, N):
    u = rng.random()
    q[k] = my_levy(u)
nrm = levy.cdf(rnge[1])
ax.plot(x, levy.pdf(x)/nrm, 'r-', lw=5, alpha=0.6, label='levy pdf')
ax.hist(q, bins=100, range=rnge, density=True, alpha=0.2)
plt.show()
This produces the following graph:
UPDATE
Well, I tried to use a home-made PDF; same output, same problem.
# replace levy.pdf(x) with PDF(x)
def PDF(x):
    return np.where(x <= 0.0, 0.0, 1.0 / np.sqrt(2*np.pi*x**3) * np.exp(-1./(2.*x)))
UPDATE II
After applying @pjs's corrected sampling routine, the sampling and the PDF align perfectly. New graph:
Here's a straightforward implementation of the generating algorithm for the Levy distribution found on Wikipedia:
import random
from scipy.stats import norm
# Arguments
# u: a uniform[0,1) random number
# c: scale parameter for Levy distribution (defaults to 1)
# mu: location parameter (offset) for Levy (defaults to 0)
def my_levy(u, c = 1.0, mu = 0.0):
    return mu + c / (2 * norm.ppf(1.0 - u)**2)
# Generate a handful of samples
for _ in range(10):
    print(my_levy(random.random()))
I don't normally use Python, so please suggest improvements.
ADDENDUM
Kudos to Severin Pappadeux for the work in his response. I had already noted that a simpler answer would be to take the inverse of a squared Gaussian, but Advaita had asked for an explicit function of U ~ Uniform(0,1) so I didn't pursue that. It turns out that I should have. The Wikipedia article mentions that approach, but without the scale factor of 2 in the denominator. When I take the 2 out of the implementation of Wikipedia's generating algorithm, i.e. change the implementation to
def my_levy(u, c = 1.0, mu = 0.0):
    return mu + c / (norm.ppf(1.0 - u)**2)
the resulting histogram aligns beautifully with the normalized plot of the pdf. (Note - I've now also edited the incorrect Wikipedia entry to correct the formula.)
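As a quick check (my addition, not part of the original answer), the corrected sampler can be tested against scipy's levy distribution with a Kolmogorov-Smirnov test:
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(42)
samples = np.array([my_levy(u) for u in rng.random(10_000)])
print(kstest(samples, 'levy'))   # a large p-value is consistent with Levy(0, 1)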
Problem
The paper I am reading now defines a new metric, and the authors claim some advantages over previous metrics. They verify their claim with some synthetic data, which looks like the following:
The implementation of their metric is pretty straightforward. However, I am not sure how they create this kind of synthetic data.
What I Have Done
This looks like a Gaussian where x lies only within certain intervals. I tried the following code, but did not get anything similar to the plot presented in the paper.
import numpy as np
def generate_gaussian(size=1000, lb=-0.1, up=0.1):
    data = np.random.randn(5000)
    data = data[(data <= up) & (data >= lb)][:size]
    return data
np.random.seed(1234)
base = generate_gaussian()
background_pos = base + 0.3
background_neg = base + 0.7
Now I am wondering if the authors created these data using some special distribution (other than Gaussian) that I do not know about.
Numpy has a numpy.random.normal that draws random samples from a normal (Gaussian) distribution.
import numpy as np
import matplotlib.pyplot as plt
sigma = 0.05
s0 = np.random.normal(0.2, sigma, 5000)
s1 = np.random.normal(0.6, sigma, 5000)
plt.hist(s0, 300, density=True, color="b")
plt.hist(s1, 300, density=True, color="r")
plt.xlim(0, 1)
plt.show()
You can change the values of mu (the mean) and sigma (the standard deviation) to alter the distribution:
mu = 0.55
sigma = 0.1
dist = np.random.normal(mu, sigma, 5000)
You have cut off the data at +/- 0.1. A normalised Gaussian distribution only 'looks Gaussian' if you look over a range of approximately +/- 3. Try this:
import numpy as np
def generate_gaussian(size=1000, lb=-3, up=3):
    data = np.random.randn(5000)
    data = data[(data <= up) & (data >= lb)][:size]
    return data
np.random.seed(1234)
base = generate_gaussian()
background_pos = base + 5
background_neg = base + 15
You can use scipy.stats.norm (info).
import libraries
>>> from scipy.stats import norm
>>> from matplotlib import pyplot
plot
>>> pyplot.hist(norm.rvs(loc=1, scale=0.5, size=10000), bins=30, alpha=0.5, label='norm_1')
>>> pyplot.hist(norm.rvs(loc=5, scale=0.5, size=10000), bins=30, alpha=0.5, label='norm_2')
>>> pyplot.legend()
>>> pyplot.show()
Clarification:
A normal distribution is defined by its mean (loc, the distribution center) and standard deviation (scale, a measure of the distribution's dispersion or width). rvs generates random samples from the desired normal distribution, of size size. For example, the next code generates 4 random elements from a normal distribution (mean = 1, SD = 1):
>>> norm.rvs(loc=1, scale=1, size=4)
array([ 0.52154255, 1.40873701, 1.55959291, -0.01730568])
I am trying to plot a normal distribution curve using Python. First I did it manually by using the normal probability density function, and then I found there's an existing function pdf in scipy under the stats module. However, the results I get are quite different.
Below is the example that I tried:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
mean = 5
std_dev = 2
num_dist = 50
# Draw random samples from a normal (Gaussian) distribution
normalDist_dataset = np.random.normal(mean, std_dev, num_dist)
# Sort these values.
normalDist_dataset = sorted(normalDist_dataset)
# Create the bins and histogram
plt.figure(figsize=(15,7))
count, bins, ignored = plt.hist(normalDist_dataset, num_dist, density=True)
new_mean = np.mean(normalDist_dataset)
new_std = np.std(normalDist_dataset)
normal_curve1 = stats.norm.pdf(normalDist_dataset, new_mean, new_std)
normal_curve2 = (1/(new_std *np.sqrt(2*np.pi))) * (np.exp(-(bins - new_mean)**2 / (2 * new_std**2)))
plt.plot(normalDist_dataset, normal_curve1, linewidth=4, linestyle='dashed')
plt.plot(bins, normal_curve2, linewidth=4, color='y')
The result shows how the two curves I get are very different from each other.
My guess is that it has something to do with the bins, or that pdf behaves differently than the usual formula. I have used the same new mean and standard deviation for both plots. So, how do I change my code to match what stats.norm.pdf is doing?
I don't know yet which curve is correct.
The plot function simply connects the dots with line segments. Your bins do not have enough points to show a smooth curve. Possible solution:
....
normal_curve1 = stats.norm.pdf(normalDist_dataset, new_mean, new_std)
bins = normalDist_dataset # Add this line
normal_curve2 = (1/(new_std *np.sqrt(2*np.pi))) * (np.exp(-(bins - new_mean)**2 / (2 * new_std**2)))
....
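Alternatively (my own sketch, reusing the names from the question), evaluate both curves on a dense grid so neither depends on the bin edges or the sample count:
x = np.linspace(new_mean - 4*new_std, new_mean + 4*new_std, 500)
curve_scipy = stats.norm.pdf(x, new_mean, new_std)
curve_manual = (1/(new_std*np.sqrt(2*np.pi))) * np.exp(-(x - new_mean)**2 / (2*new_std**2))
plt.plot(x, curve_scipy, linewidth=4, linestyle='dashed')
plt.plot(x, curve_manual, linewidth=2, color='y')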
So after my two last questions I come to my actual problem. Maybe somebody finds the error in my theoretical procedure or I did something wrong in programming.
I am implementing a bandpass filter in Python using scipy.signal (using the firwin function). My original signal consists of two frequencies (w_1=600Hz, w_2=800Hz). There might be a lot more frequencies, which is why I need a bandpass filter.
In this case I want to filter the frequency band around 600 Hz, so I took 600 +/- 20 Hz as cutoff frequencies. When I implemented the filter and reproduced the signal in the time domain using lfilter, the right frequency was filtered.
To get rid of the phase shift I plotted the frequency response using scipy.signal.freqz, with the return h of firwin as numerator and 1 as the predefined denominator.
As described in the documentation of freqz, I plotted the phase (== angle in the doc) as well, and was able to read off the phase shift at 600 Hz for the filtered signal from the frequency response plot.
So the phase delay t_p is
t_p = -Theta(w) / w
Unfortunately, when I add this phase delay to the time data of my filtered signal, it does not have the same phase as the original 600 Hz signal.
I added the code. It is weird: before eliminating some parts of the code to keep it minimal, the filtered signal started at the correct amplitude; now it is even worse.
################################################################################
#
# Filtering test
#
################################################################################
#
from math import *
import numpy as np
from scipy import signal
from scipy.signal import firwin, lfilter, lti
from scipy.signal import freqz
import matplotlib.pyplot as plt
import matplotlib.colors as colors
################################################################################
# Nb of frequencies in the original signal
nfrq = 2
F = [60,80]
################################################################################
# Sampling:
nitper = 16
nper = 50.
fmin = np.min(F)
fmax = np.max(F)
T0 = 1./fmin
dt = 1./fmax/nitper
#sampling frequency
fs = 1./dt
nyq_rate= fs/2
nitpermin = nitper*fmax/fmin
Nit = int(nper*nitpermin+1)
tps = np.linspace(0.,nper*T0,Nit)
dtf = fs/Nit
################################################################################
# Build analytic signal
# s = completeSignal(F,Nit,tps)
scomplete = np.zeros((Nit))
omg1 = 2.*pi*F[0]
omg2 = 2.*pi*F[1]
scomplete=scomplete+np.sin(omg1*tps)+np.sin(omg2*tps)
#ssingle = singleSignals(nfrq,F,Nit,tps)
ssingle=np.zeros((nfrq,Nit))
ssingle[0,:]=ssingle[0,:]+np.sin(omg1*tps)
ssingle[1,:]=ssingle[1,:]+np.sin(omg2*tps)
################################################################################
## Construction of the desired bandpass filter
lowcut = (60-2) # desired cutoff frequencies
highcut = (60+2)
ntaps = 451 # the higher and closer the signal frequencies, the more taps for the filter are required
taps_hamming = firwin(ntaps,[lowcut/nyq_rate, highcut/nyq_rate], pass_zero=False)
# Use lfilter to get the filtered signal
filtered_signal = lfilter(taps_hamming, 1, scomplete)
# The phase delay of the filtered signal
delay = ((ntaps-1)/2)/fs
plt.figure(1, figsize=(12, 9))
# Plot the signals
plt.plot(tps, scomplete,label="Original signal with %s freq" % nfrq)
plt.plot(tps-delay, filtered_signal,label="Filtered signal %s freq " % F[0])
plt.plot(tps, ssingle[0,:],label="original signal %s Hz" % F[0])
plt.grid(True)
plt.legend()
plt.xlim(0,1)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
# Plot the frequency responses of the filter.
plt.figure(2, figsize=(12, 9))
plt.clf()
# First plot the desired ideal response as a green(ish) rectangle.
rect = plt.Rectangle((lowcut, 0), highcut - lowcut, 5.0,facecolor="#60ff60", alpha=0.2,label="ideal filter")
plt.gca().add_patch(rect)
# actual filter
w, h = freqz(taps_hamming, 1, worN=1000)
plt.plot((fs * 0.5 / np.pi) * w, abs(h), label="designed Hamming window filter")
plt.xlim(0,2*F[1])
plt.ylim(0, 1)
plt.grid(True)
plt.legend()
plt.xlabel('Frequency (Hz)')
plt.ylabel('Gain')
plt.title('Frequency response of FIR filter, %d taps' % ntaps)
plt.show()
The delay of your FIR filter is simply 0.5*(n - 1)/fs, where n is the number of filter coefficients (i.e. "taps") and fs is the sample rate. Your implementation of this delay is fine.
The problem is that your array of time values tps is not correct. Take a look at 1.0/(tps[1] - tps[0]); you'll see that it does not equal fs.
Change this:
tps = np.linspace(0.,nper*T0,Nit)
to, for example, this:
T = Nit / fs
tps = np.linspace(0., T, Nit, endpoint=False)
and your plots of the original and filtered 60 Hz signals will line up beautifully.
For another example, see http://wiki.scipy.org/Cookbook/FIRFilter.
In the script there, the delay is calculated on line 86. Below this, the delay is used to plot the original signal aligned with the filtered signal.
Note: The cookbook example uses scipy.signal.lfilter to apply the filter. A more efficient approach is to use numpy.convolve.
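For example (a sketch using the arrays from the question), lfilter with a denominator of 1 is equivalent to a truncated full convolution:
# same output as lfilter(taps_hamming, 1, scomplete)
filtered_signal = np.convolve(scomplete, taps_hamming)[:len(scomplete)]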
Seems like you may have had this answered already, but I believe that this is what the filtfilt function is used for. Basically, it does both a forward sweep and a backward sweep through your data, thus reversing the phase shift introduced by the initial filtering. Might be worth looking into.
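A minimal sketch with the filter from the question:
from scipy.signal import filtfilt
# forward-backward filtering gives zero phase shift, so no delay compensation is needed
y_zero_phase = filtfilt(taps_hamming, 1, scomplete)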