How to discretize a signal? - python

If I have a function like below:
G(s) = C/(s-p), where s = jw and C and p are constants.
Also, the available frequency is wa= 100000 rad/s. How can I discretize the signal at ∆w = 0.0001wa in Python?

Use numpy.arange to accomplish this:
import numpy as np
wa = 100000
# np.arange will generate every discrete value given the start, end and the step value
discrete_wa = np.arange(0, wa, 0.0001*wa)
# let's say you have previously defined your function
g_s = [your_function(value) for value in discrete_wa]
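As a rough sketch of the last step, the list comprehension can be replaced by a vectorized evaluation of G(jw) on the whole frequency grid. The values of C and p below are placeholders chosen only for illustration; they do not come from the question.

```python
import numpy as np

wa = 100000.0            # available frequency in rad/s (from the question)
dw = 0.0001 * wa         # frequency step
C, p = 1.0, -500.0       # hypothetical constants, assumed for this sketch

w = np.arange(0, wa, dw)         # 0, 10, 20, ... (wa itself is excluded)
G = C / (1j * w - p)             # G(s) evaluated at s = jw for every grid point

magnitude = np.abs(G)            # |G(jw)|
phase = np.angle(G)              # arg G(jw) in radians
```

np.arange excludes the end point; if wa itself should be part of the grid, np.linspace(0, wa, 10001) is one way to include it.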

Related

Storing Values from One Array into Another Larger Array

I am trying to create a range of signals of different frequencies. I am finding it difficult to store amplitude vs time in a storage matrix for each frequency from 0 to 50 Hz. For example, for a frequency of 20 Hz, I want to store the amplitude vs time for that frequency, then for 21 Hz I want to store the amplitude vs time for that frequency, and so on, until I have all of them in one large matrix. I am getting confused by the indexing and syntax at this point, so any help is welcome!
import numpy as np
max_freq = 50
s_frequency = np.arange(0,51,0.1)
fs = 200
time = np.arange(0,5-(1/fs),(1/fs))
x = np.empty((len(time)), dtype=np.float32)
i = 0
j = 0
full_array = np.empty((len(s_frequency),len(time),len(time)), dtype=np.float32)
amplitude = np.zeros(999)
for f1 in s_frequency:
    i = 0
    for t in time:
        amplitude[i] = np.sin(2*np.pi*f1*t)
        i = i + 1
    full_array[i] = ([time], [amplitude])
I have also tried the following:
import numpy as np
max_freq = 50
s_frequency = np.arange(0,50.1,0.1)
fs = 200
time = np.arange(0,5-(1-fs),(1/fs))
#full_array = np.sin(2*np.pi*np.outer(s_frequency,time))
full_array = np.empty((len(s_frequency),len(time), len(time)), dtype=np.float32)
for f1 in s_frequency:
    array = []
    for i, t in enumerate(time):
        amplitude = np.sin(2*np.pi*f1*t)
        array.insert(i,amplitude)
    full_array[i] = [time, array]
Not 100% sure what you're trying to do, but it seems like you're trying to initialize a 2-dimensional grid (i.e. a matrix) where you have a dimension for time and one for frequency. Here is what I would do:
import numpy as np
max_freq = 50
s_frequency = np.arange(0,51,0.1)
fs = 200
time = np.arange(0,5-(1/fs),(1/fs))
full_array = np.sin(2*np.pi*np.outer(s_frequency,time))
No explicit for-loops or index handling are needed. np.outer() gives you a 2D grid (i.e. a matrix) of frequency times time. All that is left is to compute the sine of 2π times each grid value. Conveniently, NumPy functions accept arrays as input, so we can simply call np.sin(2*np.pi*np.outer(s_frequency,time)).
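A small sketch of how the resulting matrix can be indexed. It assumes integer frequencies from 0 to 50 Hz and a time axis built from a sample count (both simplified from the question, so that the row index equals the frequency in Hz):

```python
import numpy as np

fs = 200                                  # sampling rate in Hz (from the question)
time = np.arange(5 * fs) / fs             # 5 seconds of samples: 0, 1/fs, 2/fs, ...
s_frequency = np.arange(0, 51)            # assumed: 0, 1, ..., 50 Hz

# One row per frequency, one column per time sample.
full_array = np.sin(2 * np.pi * np.outer(s_frequency, time))

# With integer frequencies, row 20 is the 20 Hz signal.
signal_20hz = full_array[20]
```

Building the time axis from a sample count avoids the off-by-one surprises that np.arange can produce with a floating-point step.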
Not sure what x and j are for in your code, or why full_array should be 3-dimensional. Would you like to include a spatial component as well?
By the way, a construct like this:
i = 0
for t in time:
    amplitude[i] = np.sin(2*np.pi*f1*t)
    i = i + 1
can easily be avoided in Python, thanks to Python's built-in enumerate() function. It would then look like this:
for i, t in enumerate(time):
    amplitude[i] = np.sin(2*np.pi*f1*t)
which does essentially the same thing, but you don't have to explicitly create the index i = 0 and manually increment it in every iteration with i = i + 1.

R `summary` function closest equivalent in python

Is there any kind of helper to find the min, max (and ideally standard deviation) of each dimension in a multidimensional array within numpy? I'm looking for something like the summary() function in R.
My data is essentially a huge 2D array (list of lists), in which the sublists contain n dimensional values. E.g. currently I have data with 3 dimensional attributes x,y,z:
a = np.random.rand(100,3)
For each of those dimensions (x,y,z) I want to know the min, max, mean, and std.
I know one can loop through the axes and measure these values, e.g.:
for i in range(a.shape[-1]):
    vals = a[:,i]
    print(np.min(vals), np.max(vals), np.std(vals))
I find myself writing the code to do that almost every time I have a new dataset. Any way to expedite this operation would be hugely helpful!
Without pandas:
from scipy import stats
import numpy as np
a = np.random.rand(100,3)
summary = stats.describe(a, axis = 0)
print(summary.mean)
print(summary.minmax)
...
Using pandas:
import pandas as pd
summary_across_rows = pd.DataFrame(a).describe() # across axis=0
print(summary_across_rows)
                0           1           2
count  100.000000  100.000000  100.000000
mean     0.495204    0.573827    0.476202
std      0.275131    0.246189    0.271626
min      0.005202    0.037195    0.023595
25%      0.295210    0.399358    0.258712
50%      0.512023    0.562181    0.417322
75%      0.710216    0.790970    0.712047
max      0.998371    0.997717    0.980840
Note: for the summary across the other dimension you need:
summary_across_columns = pd.DataFrame(a.T).describe() # across axis=1
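If only a handful of statistics are needed, one possible shortcut is DataFrame.agg with a list of function names. This is a sketch; the column names x, y, z are assumed here to match the attributes mentioned in the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.random((100, 3))

# Column names are assumed for illustration.
df = pd.DataFrame(a, columns=["x", "y", "z"])

# One row per statistic, one column per dimension.
stats_table = df.agg(["min", "max", "mean", "std"])
print(stats_table)
```

Note that pandas computes std with ddof=1 (sample standard deviation), whereas np.std defaults to ddof=0.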
Without pandas:
from scipy import stats
stats.describe(lst)
stats.scoreatpercentile(lst,(5,10,50,90,95))
Here is an example:
from scipy import stats
import numpy as np
stdev = 10
mu = 10
a=stdev*np.random.randn(100)+mu
stats.describe(a)
[OUT1]: DescribeResult(nobs=100, minmax=(-13.180682481878286, 40.6109521437826), mean=10.352380786199149, variance=103.27168865119998, skewness=0.13852516641657087, kurtosis=0.2691915766145532)
stats.scoreatpercentile(a,(5,10,50,90,95))
[OUT2]: array([-7.21731609, -3.22696662, 10.39364637, 21.78527621, 24.20685179])

Is there a way to get Pandas ewm to function on fixed windows?

I am trying to use the pandas ewm function to calculate exponentially weighted moving averages. However, I've noticed that information seems to carry through the entire time series. This means that every data point's moving average depends on a different number of previous data points, so the ewm function is mathematically different at every data point.
I think someone here had a similar question:
Does Pandas calculate ewm wrong?
But I tried their method, and I am not getting the functionality I want.
def EMA(arr, window):
    sma = arr.rolling(window=window, min_periods=window).mean()[:window]
    rest = arr[window:]
    return pd.concat([sma, rest]).ewm(com=window, adjust=False).mean()
a = pd.DataFrame([x for x in range(100)])
print(list(EMA(a, 10)[0])[-1])
print(list(EMA(a[50:], 10)[0])[-1])
In this example, I have an array of 0 through 99. I calculate moving averages on this array and on the sub-array of 50 through 99. The last moving average should be the same, since I am using only a window of 10. But when I run this code I get two different values, indicating that ewm is indeed dependent on the entire series.
IIUC, you are asking for ewm in a rolling window, which means, every 10 rows return a single number. If that is the case, then we can use a stride trick:
Edit: update function works on series only
# Note: the first positional parameter of ewm is com, so alpha is passed as com here.
def EMA(arr, window=10, alpha=0.5):
    ret = pd.Series(index=arr.index, name=arr.name, dtype='float64')
    arr = np.array(arr)
    l = len(arr)
    stride = arr.strides[0]
    ret.iloc[window-1:] = (pd.DataFrame(np.lib.stride_tricks.as_strided(arr,
                                                                        (l-window+1,window),
                                                                        (stride,stride)))
                           .T.ewm(alpha)
                           .mean()
                           .iloc[-1]
                           .values
                           )
    return ret
Test:
a = pd.Series([x for x in range(100)])
EMA(a).tail(2)
# 98 97.500169
# 99 98.500169
# Name: 9, dtype: float64
EMA(a[50:]).tail(2)
# 98 97.500169
# 99 98.500169
# Name: 9, dtype: float64
EMA(a, 2).tail(2)
# 98 97.75
# 99 98.75
# dtype: float64
Test on random data:
a = pd.Series(np.random.uniform(0,1,10000))
fig, ax = plt.subplots(figsize=(12,6))
a.plot(ax=ax)
EMA(a,alpha=0.99, window=2).plot(ax=ax)
EMA(a,alpha=0.99, window=1500).plot(ax=ax)
plt.show()
Output: we can see that the larger window (green) is less volatile than the smaller window (orange).
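For comparison, here is a sketch of the same fixed-window idea written with explicit weights and numpy's sliding_window_view (available from NumPy 1.20) instead of as_strided. It is not the function above: the helper name fixed_window_ewm is hypothetical, and alpha is used directly as the smoothing factor, with the newest sample in each window getting weight 1 and older samples decaying by (1 - alpha).

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def fixed_window_ewm(values, window=10, alpha=0.5):
    """Exponentially weighted mean over each trailing window (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # One row per window; each row holds `window` consecutive samples, oldest first.
    windows = sliding_window_view(values, window)
    # Newest sample gets weight 1, the one before it (1 - alpha), and so on.
    weights = (1 - alpha) ** np.arange(window)[::-1]
    return windows @ weights / weights.sum()

x = np.arange(100.0)
full = fixed_window_ewm(x)        # windows over the whole series
tail = fixed_window_ewm(x[50:])   # windows over the second half only

# Reference: pandas' adjusted ewm restricted to the last 10 samples.
reference = pd.Series(x[-10:]).ewm(alpha=0.5).mean().iloc[-1]
```

Because each output value only sees its own window, the last value of full and the last value of tail agree, which is the property asked for in the question.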
This can be achieved by working with the formula for exponential smoothing and cancelling the lagged terms. The formula can be found on the ewm documentation page.
The following code demonstrates that no memory is left after the adjustment. For every point, the fixed window of information used is L=1000. The factor f should be included if you want the equivalent of the adjust=True version (for adjust=False, just drop the f factor).
srs1=pd.Series(np.random.normal(size=100000))
alpha=0.02
em1=srs1.ewm(alpha=alpha,adjust=False).mean()
L=1000
f=1-(1-alpha)**np.clip(np.arange(em1.shape[0]),0,L)
em1_=(em1-em1.shift(L)*(1-alpha)**L)/f
S=1001
em2=srs1[S:].ewm(alpha=alpha,adjust=False).mean()
f=1-(1-alpha)**np.clip(np.arange(em2.shape[0]),0,L)
em2_=(em2-em2.shift(L)*(1-alpha)**L)/f
print((em2_[:10000]-em1_[S:S+10000]).abs().max())
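To see why the cancellation works, the adjusted value at one position can be compared with a weighted mean computed directly over the last L samples. This is a small sketch with assumed values for alpha, L and the test position t, not part of the code above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=500))
alpha, L = 0.1, 20            # assumed values for this sketch

em = x.ewm(alpha=alpha, adjust=False).mean()
# Subtract the contribution of everything older than L samples, then renormalize.
windowed = (em - em.shift(L) * (1 - alpha) ** L) / (1 - (1 - alpha) ** L)

# Direct computation at one position t: weights (1 - alpha)**lag for lag 0..L-1.
t = 300
weights = (1 - alpha) ** np.arange(L)
window = x.iloc[t - L + 1 : t + 1].to_numpy()[::-1]   # lag 0 (newest) first
direct = np.dot(weights, window) / weights.sum()
```

Unrolling the recursion em[t] = alpha*x[t] + (1-alpha)*em[t-1] for L steps leaves a remainder of (1-alpha)**L * em[t-L], which is exactly the term being subtracted.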
This seems to be possible in pandas 1.5 with a mix of rolling, and win_type:
pd.Series.rolling(window=10, win_type='exponential').mean(tau=0.5, center=10, sym=False)
I use a non-symmetric exponential window centered at the window size so that the exponential function decays towards the past.
This yields the same results as the EMA function provided by Quang Hoang.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
def EMA(arr, window=10, alpha=0.5):
    ret = pd.Series(index=arr.index, name=arr.name, dtype='float64')
    arr = np.array(arr)
    l = len(arr)
    stride = arr.strides[0]
    ret.iloc[window-1:] = (pd.DataFrame(np.lib.stride_tricks.as_strided(arr,
                                                                        (l-window+1,window),
                                                                        (stride,stride)))
                           .T.ewm(alpha)
                           .mean()
                           .iloc[-1]
                           .values
                           )
    return ret
a = pd.Series([x for x in range(100)])
custom=EMA(a)
builtin= a.rolling(window=10, win_type='exponential').mean(tau=0.5, center=10, sym=False)
custom=custom.plot.line(label="Custom EMA")
builtin.plot.line(label="Built-in EMA")
plt.legend()

Compute rolling z-score in pandas dataframe

Is there an open source function to compute a moving z-score like https://turi.com/products/create/docs/generated/graphlab.toolkits.anomaly_detection.moving_zscore.create.html? I have access to pandas rolling_std for computing the std, but want to see if it can be extended to compute rolling z-scores.
rolling.apply with a custom function is significantly slower than using builtin rolling functions (such as mean and std). Therefore, compute the rolling z-score from the rolling mean and rolling std:
def zscore(x, window):
    r = x.rolling(window=window)
    m = r.mean().shift(1)
    s = r.std(ddof=0).shift(1)
    z = (x-m)/s
    return z
According to the definition given on this page the rolling z-score depends on the rolling mean and std just prior to the current point. The shift(1) is used above to achieve this effect.
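A tiny worked example may make the effect of shift(1) concrete. The series values below are made up for illustration; the last point is compared against the mean and std of the three points before it, computed by hand.

```python
import numpy as np
import pandas as pd

def zscore(x, window):
    r = x.rolling(window=window)
    m = r.mean().shift(1)          # mean of the window ending one step earlier
    s = r.std(ddof=0).shift(1)     # std of the window ending one step earlier
    return (x - m) / s

x = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])   # made-up data
z = zscore(x, window=3)

# By hand for the last point: compare 10 with the previous window [2, 3, 4].
prior = x.iloc[1:4]
by_hand = (x.iloc[4] - prior.mean()) / prior.std(ddof=0)
```

The first three entries of z are NaN, because no complete prior window exists for them.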
Below, even for a small Series (of length 100), zscore is over 5x faster than using rolling.apply. Since rolling.apply(zscore_func) calls zscore_func once for each rolling window in essentially a Python loop, the advantage of using the Cythonized r.mean() and r.std() functions becomes even more apparent as the size of the loop increases.
Thus, as the length of the Series increases, the speed advantage of zscore increases.
In [58]: %timeit zscore(x, N)
1000 loops, best of 3: 903 µs per loop
In [59]: %timeit zscore_using_apply(x, N)
100 loops, best of 3: 4.84 ms per loop
This is the setup used for the benchmark:
import numpy as np
import pandas as pd
np.random.seed(2017)
def zscore(x, window):
    r = x.rolling(window=window)
    m = r.mean().shift(1)
    s = r.std(ddof=0).shift(1)
    z = (x-m)/s
    return z
def zscore_using_apply(x, window):
    def zscore_func(x):
        return (x[-1] - x[:-1].mean())/x[:-1].std(ddof=0)
    # raw=True passes each window as a NumPy array, so x[-1] is positional
    return x.rolling(window=window+1).apply(zscore_func, raw=True)
N = 5
x = pd.Series((np.random.random(100) - 0.5).cumsum())
result = zscore(x, N)
alt = zscore_using_apply(x, N)
assert not ((result - alt).abs() > 1e-8).any()
You should use native functions of pandas:
# Compute rolling zscore for column ="COL" and window=window
col_mean = df["COL"].rolling(window=window).mean()
col_std = df["COL"].rolling(window=window).std()
df["COL_ZSCORE"] = (df["COL"] - col_mean)/col_std
def zscore(arr, window):
    x = arr.rolling(window = 1).mean()
    u = arr.rolling(window = window).mean()
    o = arr.rolling(window = window).std()
    return (x-u)/o
df['zscore'] = zscore(df['value'],window)
Let us say you have a data frame called data. Then you run the following code:
data_zscore=data.apply(lambda x: (x-x.expanding().mean())/x.expanding().std())
Please note that the first row will always have NaN values as it doesn't have a standard deviation.
This can be solved in a single line of code. Given that s is the input series and wlen is the window length:
zscore = s.sub(s.rolling(wlen).mean()).div(s.rolling(wlen).std())
If you need to shift the mean and std it can still be done:
zscore = s.sub(s.rolling(wlen).mean().shift()).div(s.rolling(wlen).std().shift())

Moving average of an array in Python

I have an array where discrete sinewave values are recorded and stored. I want to find the max and min of the waveform. Since the sinewave data is recorded voltages using a DAQ, there will be some noise, so I want to do a weighted average. Assuming self.yArray contains my sinewave values, here is my code so far:
filterarray = []
filtersize = 2
length = len(self.yArray)
for x in range (0, length-(filtersize+1)):
    for y in range (0,filtersize):
        summation = sum(self.yArray[x+y])
        ave = summation/filtersize
    filterarray.append(ave)
My issue seems to be in the second for loop, where depending on my averaging window size (filtersize), I want to sum up the values in the window to take the average of them. I receive an error saying:
summation = sum(self.yArray[x+y])
TypeError: 'float' object is not iterable
I am an EE with very little experience in programming, so any help would be greatly appreciated!
The other answers correctly describe your error, but this type of problem really calls out for using numpy. Numpy will run faster, be more memory efficient, and is more expressive and convenient for this type of problem. Here's an example:
import numpy as np
import matplotlib.pyplot as plt
# make a sine wave with noise
times = np.arange(0, 10*np.pi, .01)
noise = .1*np.random.ranf(len(times))
wfm = np.sin(times) + noise
# smoothing it with a running average in one line using a convolution
# using a convolution, you could also easily smooth with other filters
# like a Gaussian, etc.
n_ave = 20
smoothed = np.convolve(wfm, np.ones(n_ave)/n_ave, mode='same')
plt.plot(times, wfm, times, -.5+smoothed)
plt.show()
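Since the original goal was to find the max and min of the waveform, one possible follow-up is to smooth with mode='valid', which keeps only the positions where the averaging window fully overlaps the data and so avoids the attenuated values at both edges. This is a sketch that uses a seeded random generator instead of np.random.ranf.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sine wave with uniform noise in [0, 0.1), as in the example above.
times = np.arange(0, 10 * np.pi, 0.01)
wfm = np.sin(times) + 0.1 * rng.random(len(times))

n_ave = 20
kernel = np.ones(n_ave) / n_ave
# 'valid' returns len(wfm) - n_ave + 1 points, all computed from full windows.
smoothed = np.convolve(wfm, kernel, mode="valid")

v_max, v_min = smoothed.max(), smoothed.min()
print(v_max, v_min)
```

The uniform noise used here has a mean of about 0.05, so both extremes come out shifted upwards by roughly that amount.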
If you don't want to use numpy, it should also be noted that there's a logical error in your program that results in the TypeError. The problem is that in the line
summation = sum(self.yArray[x+y])
you're using sum within the loop where you're also calculating the sum. So you need to either use sum without the loop, or loop through the array and add up all the elements, but not both (and it's doing both, i.e. applying sum to the indexed array element, that leads to the error in the first place). That is, here are two solutions:
filterarray = []
filtersize = 2
length = len(self.yArray)
for x in range (0, length-(filtersize+1)):
    summation = sum(self.yArray[x:x+filtersize]) # sum over section of array
    ave = summation/filtersize
    filterarray.append(ave)
or
filterarray = []
filtersize = 2
length = len(self.yArray)
for x in range (0, length-(filtersize+1)):
    summation = 0.
    for y in range (0,filtersize):
        summation += self.yArray[x+y]  # accumulate the window total
    ave = summation/filtersize
    filterarray.append(ave)
self.yArray[x+y] is returning a single item out of the self.yArray list. If you are trying to get a subset of the yArray, you can use the slice operator instead:
summation = sum(self.yArray[x:x+filtersize])
to return an iterable that the sum builtin can use.
A bit more information about python slices can be found here (scroll down to the "Sequences" section): http://docs.python.org/2/reference/datamodel.html#the-standard-type-hierarchy
You could use numpy, like:
import numpy
filtersize = 2
ysums = numpy.cumsum(numpy.array(self.yArray, dtype=float))
ylags = numpy.roll(ysums, filtersize)
ylags[0:filtersize] = 0.0
moving_avg = (ysums - ylags) / filtersize
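A variant of the cumulative-sum idea that returns only full windows can be sketched as follows. The helper name moving_average is hypothetical; prepending a zero to the cumulative sums makes every window sum a simple difference of two entries.

```python
import numpy as np

def moving_average(values, filtersize):
    """Mean of every full window of length `filtersize` (hypothetical helper)."""
    # sums[k] holds the total of the first k values, with sums[0] == 0.
    sums = np.cumsum(np.insert(np.asarray(values, dtype=float), 0, 0.0))
    # Window total = difference of cumulative sums `filtersize` apart.
    return (sums[filtersize:] - sums[:-filtersize]) / filtersize

result = moving_average([1, 2, 3, 4, 5], 2)
print(result)
```

The result has len(values) - filtersize + 1 entries, one per complete window.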
Your original code attempts to call sum on the float value stored at yArray[x+y], where x+y is evaluating to some integer representing the index of that float value.
Try:
summation = sum(self.yArray[x:x+filtersize])
Indeed numpy is the way to go. One of the nice features of python is list comprehensions, allowing you to do away with the typical nested for loop constructs. Here goes an example, for your particular problem...
import numpy as np
step=2
res=[np.sum(myarr[i:i+step],dtype=float)/step for i in range(len(myarr)-step+1)]
