SciPy discrete cosine transform (DCT) power in non-existing frequencies


I am trying to transform a simple cosine signal using the Discrete Cosine Transform (DCT) scipy.fft.dct; however, there seems to be an issue, as there is power at frequencies that should not exist.
Suppose a domain from zero to one, both endpoints included, for the cosine function:
import numpy as np
x = np.linspace(0, 1, 8, endpoint = True)
f = np.cos(1 * np.pi * x)
This simple signal contains a single frequency, so I expect significant power at only one frequency of the DCT:
import scipy.fft
f_FT = scipy.fft.dct(f, type = 1, norm = "ortho")
I select DCT type I according to the Wikipedia classification (which is also referenced in SciPy's documentation) because the endpoints are included and the signal is even about both boundaries. But this yields:
array([ 3.35699888e-16, 2.09223516e+00, -1.48359792e-17, 2.21406462e-01,
-1.92867730e-16, 2.21406462e-01, 1.18687834e-16, 1.56558011e-01])
Thus, there is still significant energy at k=3pi, 5pi and 7pi (output indices 3, 5 and 7).
Am I doing something wrong? As written above, I expect power only at k=1pi. The Discrete Sine Transform (DST) does not show this kind of problem: there, I find power only at the frequencies that I generate.
Thank you in advance for your help.

Update - origin of problem found
I found that the problem is caused by norm = "ortho". Apparently, the library modifies the first and last points of the input signal before the transform (the documentation states: "If norm='ortho', x[0] and x[N-1] are multiplied by a scaling factor of sqrt(2)") to make sure that Parseval's theorem still holds afterwards. However, the power in the different modes then no longer corresponds to the original signal.
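A minimal sketch of what the quoted documentation sentence implies, assuming the scipy.fft module (not the older scipy.fftpack): the orthonormal DCT-I can be rebuilt from the unnormalised one by multiplying the boundary inputs by sqrt(2), dividing the boundary outputs by sqrt(2) and applying an overall factor 1/sqrt(2(N-1)). That overall factor is my own derivation (it is what makes the matrix orthonormal), so treat it as an assumption that the check below verifies numerically; the function name dct1_ortho_by_hand is hypothetical.

```python
import numpy as np
import scipy.fft


def dct1_ortho_by_hand(x):
    """Rebuild scipy.fft.dct(x, type=1, norm="ortho") from the unnormalised DCT-I."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    # Boundary input samples are multiplied by sqrt(2): this is the
    # modification of the input signal described in the documentation.
    x[0] *= np.sqrt(2)
    x[-1] *= np.sqrt(2)
    # Unnormalised ("backward") DCT-I.
    y = scipy.fft.dct(x, type=1, norm="backward")
    # Boundary output coefficients are divided by sqrt(2).
    y[0] /= np.sqrt(2)
    y[-1] /= np.sqrt(2)
    # Overall factor (assumed, see above) that makes the transform orthonormal.
    return y / np.sqrt(2 * (n - 1))


f = np.cos(np.pi * np.linspace(0, 1, 8))
print(np.allclose(dct1_ortho_by_hand(f), scipy.fft.dct(f, type=1, norm="ortho")))  # True
```

As I understand it, the unnormalised DCT-I weights the end samples half as much as the interior ones, which is exactly the weighting under which the cosine modes are orthogonal; after the sqrt(2) pre-scaling that balance is gone, which would explain the leakage into k=3, 5, 7.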
Solution
This modification of the original signal is confusing, so I propose the following to anybody who also wants Parseval's theorem to hold while still knowing in which modes the original input signal has power:
f_DCT = scipy.fft.dct(f, type = 1, norm = "backward")
# Apply manual normalisation similar to: norm = "ortho"
# See documentation of SciPy DCT (y[k]).
scaling_factors = np.zeros(np.shape(f_DCT))
scaling_factors[1:-1] = 0.5 * (np.sqrt(2) / np.sqrt(len(f_DCT) -1))
scaling_factors[0] = 0.5 * (1 / np.sqrt(len(f_DCT) -1))
scaling_factors[-1] = 0.5 * (1 / np.sqrt(len(f_DCT) -1))
f_DCT = f_DCT * scaling_factors
del scaling_factors
# Now f_DCT is scaled as expected for norm = "ortho"
# To check Parseval's theorem, one must scale the weight of the first
# and last data point because of the specific type I of the DCT.
# See documentation of SciPy DCT (x[0], x[N-1]).
scaling_factors = np.ones(np.shape(f))
scaling_factors[0] = 1 / np.sqrt(2)
scaling_factors[-1] = 1 / np.sqrt(2)
# Compute the signal weighted properly for checking Parseval's theorem (PT).
f_PT = f * scaling_factors
del scaling_factors
# Note that there is now energy in only a single mode (at k=1pi):
print(repr(f_DCT))
# array([ 2.93737402e-16, 1.87082869e+00, -2.31984713e-17, 1.78912008e-16,
# -1.78912008e-16, 2.31984713e-17, 0.00000000e+00, -1.25887458e-16])
# Also, Parseval's theorem holds:
print(np.sum(f_PT * f_PT)) # 3.499999999999999
print(np.sum(f_DCT * f_DCT)) # 3.499999999999999
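Reading the two scaling steps above together, they appear to be equivalent to applying the orthonormal DCT-I directly to the boundary-weighted signal f_PT. This is my own observation from the SciPy formulas, not something the documentation states, so the sketch below only checks it numerically for the signal from the question; the name shortcut is hypothetical.

```python
import numpy as np
import scipy.fft

N = 8
x = np.linspace(0, 1, N, endpoint=True)
f = np.cos(1 * np.pi * x)

# Manual normalisation of the "backward" DCT-I, as in the answer above.
scale = np.full(N, 0.5 * np.sqrt(2) / np.sqrt(N - 1))
scale[0] = scale[-1] = 0.5 / np.sqrt(N - 1)
f_DCT = scipy.fft.dct(f, type=1, norm="backward") * scale

# Boundary weights 1/sqrt(2), as used for f_PT above.
w = np.ones(N)
w[0] = w[-1] = 1 / np.sqrt(2)
f_PT = f * w

# Orthonormal DCT-I of the weighted signal.
shortcut = scipy.fft.dct(f_PT, type=1, norm="ortho")

print(np.allclose(f_DCT, shortcut))              # do the two routes agree?
print(np.sum(f_PT ** 2), np.sum(shortcut ** 2))  # Parseval check, both about 3.5
```

If the two routes agree, the whole manual normalisation collapses to one weighting step plus a single call with norm="ortho".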

Related

Integrate a 2D vectorfield-array (reversing np.gradient)

I have the following problem:
I want to integrate a 2D array, so basically reverse a gradient operator.
Assume I have a very simple array as follows:
import numpy as np
shape = (60, 60)
sampling = 1
k_mesh = np.meshgrid(np.fft.fftfreq(shape[0], sampling), np.fft.fftfreq(shape[1], sampling))
Then I construct my vector field as a complex-valued array (x-component = real part, y-component = imaginary part):
k = k_mesh[0] + 1j * k_mesh[1]
So, for example, the real part looks like this: [plot of the real part of k]
Now i take the gradient:
k_grad = np.gradient(k, sampling)
I then use Fourier transforms to reverse it, using the following function:
def freq_array(shape, sampling):
    # radial spatial frequency |k| on the FFT grid; sampling = (dy, dx)
    f_freq_1d_y = np.fft.fftfreq(shape[0], sampling[0])
    f_freq_1d_x = np.fft.fftfreq(shape[1], sampling[1])
    f_freq_mesh = np.meshgrid(f_freq_1d_x, f_freq_1d_y)
    f_freq = np.hypot(f_freq_mesh[0], f_freq_mesh[1])
    return f_freq
def int_2d_fourier(arr, sampling):
    # arr = [d/dy, d/dx], i.e. the list returned by np.gradient
    if np.isscalar(sampling):
        sampling = (sampling, sampling)
    shape = arr[0].shape
    freqs = freq_array(shape, sampling)
    # avoid division by zero at the zero-frequency (DC) term
    k_sq = np.where(freqs != 0, freqs**2, 0.0001)
    # k[0] = x-frequencies (along axis 1), k[1] = y-frequencies (along axis 0)
    k = np.meshgrid(np.fft.fftfreq(shape[1], sampling[1]), np.fft.fftfreq(shape[0], sampling[0]))
    v_int_x = np.real(np.fft.ifft2((np.fft.fft2(arr[1]) * k[0]) / (2*np.pi * 1j * k_sq)))
    v_int_y = np.real(np.fft.ifft2((np.fft.fft2(arr[0]) * k[1]) / (2*np.pi * 1j * k_sq)))
    v_int_fs = v_int_x + v_int_y
    return v_int_fs
k_int = int_2d_fourier(k_grad, sampling)
Unfortunately, the result is not very accurate at the positions where k changes abruptly, as can be seen in the plot below, which displays a horizontal line profile of k and k_int.
Any ideas how to improve the accuracy? Is there a way to make it exactly the same?

I actually found a solution. The integration itself yields very accurate results.
However, the gradient function from numpy calculates second order accurate central differences, which means that the gradient itself already is an approximation.
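To illustrate that np.gradient is itself an approximation, here is a small sketch with my own test function (sin on [0, 2pi], not from the question): for interior points, the central-difference error should shrink by roughly a factor of 4 when the grid spacing is halved, which is what second-order accuracy means.

```python
import numpy as np


def max_gradient_error(n):
    """Largest interior error of np.gradient(sin) against the exact derivative cos."""
    x = np.linspace(0, 2 * np.pi, n)
    h = x[1] - x[0]
    # np.gradient uses central differences inside the array and (by default)
    # first-order one-sided differences at the two end points, so skip those.
    err = np.abs(np.gradient(np.sin(x), h) - np.cos(x))[1:-1]
    return err.max()


e_coarse = max_gradient_error(101)
e_fine = max_gradient_error(201)   # grid spacing halved
print(e_coarse, e_fine, e_coarse / e_fine)
```

At a jump such as the one in k, no finite-difference scheme can represent the derivative well, so the error there is far larger than for this smooth function.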
When you replace the problem above with an analytical formula such as a 2D Gaussian, one can calculate the derivative analytically. When integrating this analytically derived function, the error is on the order of 10^-10 (depending on the width of the Gaussian, which can lead to aliasing effects).
So long story short: The integration function proposed above works as intended!
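A self-contained sketch of the Gaussian check described in this answer, assuming periodic boundaries (the Gaussian is negligible at the grid edges) and the Fourier relation d/dx <-> 2*pi*i*kx. The function name integrate_gradient_fft and the grid parameters are my own choices. The zero-frequency term (the mean of f) cannot be recovered from a gradient, so it is removed before comparing.

```python
import numpy as np


def integrate_gradient_fft(gy, gx, dy=1.0, dx=1.0):
    """Recover f (up to an additive constant) from periodic df/dy and df/dx."""
    ny, nx = gx.shape
    ky = np.fft.fftfreq(ny, dy)[:, np.newaxis]   # frequencies along axis 0
    kx = np.fft.fftfreq(nx, dx)[np.newaxis, :]   # frequencies along axis 1
    k_sq = kx ** 2 + ky ** 2
    k_sq[0, 0] = 1.0                             # avoid 0/0 at the DC term
    # d/dx <-> 2*pi*i*kx, so F = (kx*Gx + ky*Gy) / (2*pi*i*|k|^2)
    F = (kx * np.fft.fft2(gx) + ky * np.fft.fft2(gy)) / (2j * np.pi * k_sq)
    F[0, 0] = 0.0                                # the mean is not recoverable
    return np.real(np.fft.ifft2(F))


# Analytic test case: a 2D Gaussian and its exact partial derivatives.
n, d, sigma = 128, 0.1, 0.8
coords = (np.arange(n) - n // 2) * d
y, x = np.meshgrid(coords, coords, indexing="ij")
f = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
gx = -x / sigma ** 2 * f
gy = -y / sigma ** 2 * f

f_rec = integrate_gradient_fft(gy, gx, d, d)
err = np.max(np.abs((f - f.mean()) - f_rec))
print(err)

# Same integration, but fed with np.gradient instead of the exact derivatives.
gy_num, gx_num = np.gradient(f, d)
err_num = np.max(np.abs((f - f.mean()) - integrate_gradient_fft(gy_num, gx_num, d, d)))
print(err_num)   # larger: the finite-difference gradient is only an approximation
```

Comparing err and err_num separates the two error sources: the FFT integration itself and the approximate gradient that feeds it.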

compare two time series (simulation results)

I want to do unit testing of simulation models, and for that I run a simulation once and store the results (a time series) as a reference in a csv file (see an example here). Now, when I change my model, I run the simulation again, store the new results as a csv file as well and then compare the results.
The results are usually not 100% identical, an example plot is shown below:
The reference results are plotted in black and the new results are plotted in green.
The difference of the two is plotted in the second plot, in blue.
As can be seen, at a step the difference can become arbitrarily high, while everywhere else the difference is almost zero.
Therefore, I would prefer to use a different algorithm for the comparison than just subtracting the two, but I can only describe my idea graphically:
When plotting the reference line twice, first in a light color with a high line width and then again in a dark color and a small line width, then it will look like it has a pink tube around the centerline.
Note that during a step that tube will not only be in the direction of the ordinate axis, but also in the direction of the abscissa.
When doing my comparison, I want to know whether the green line stays within the pink tube.
Now comes my question: I do not want to compare the two time series using a graph, but using a python script. There must be something like this already, but I cannot find it because I am missing the right vocabulary, I believe. Any ideas? Is something like that in numpy, scipy, or similar? Or would I have to write the comparison myself?
Additional question: When the script says the two series are not sufficiently similar, I would like to plot it as described above (using matplotlib), but the line width has to be defined somehow in other units than what I usually use to define line width.

I would assume here that your problem can be simplified: your function has to be close to another function (e.g. the center of the tube) with the very same support points, and a certain number of discontinuities are allowed.
Then I would implement a different discretization of the function comparison than the typical one used for the L^2 norm (see, for example, some reference here).
Basically, in the continuous case, the L^2 norm relaxes the constraint of the two functions being close everywhere, and allows them to differ on a finite number of points, called singularities.
This works because the integral is computed over infinitely many points, and a finite number of points does not make a difference there.
However, since there are no continuous functions here, only their discretizations, the naive approach will not work, because any singularity can contribute significantly to the final integral value.
Therefore, what you could do is perform a point-by-point check of whether the two functions are close (within some tolerance) and allow at most num_exceptions points to be off.
import numpy as np
def is_close_except(arr1, arr2, num_exceptions=0.01, **kwargs):
    # if float, interpret it as a fraction of the number of points
    if isinstance(num_exceptions, float):
        num_exceptions = int(len(arr1) * num_exceptions)
    num = len(arr1) - np.sum(np.isclose(arr1, arr2, **kwargs))
    return num <= num_exceptions
By contrast, the standard L^2 norm discretization would lead to something like this integrated (and normalized) metric:
import numpy as np
def is_close_l2(arr1, arr2, **kwargs):
    norm1 = np.sum(arr1 ** 2)
    norm2 = np.sum(arr2 ** 2)
    norm = np.sum((arr1 - arr2) ** 2)
    return np.isclose(2 * norm / (norm1 + norm2), 0.0, **kwargs)
This, however, will fail for arbitrarily large peaks, unless you set such a large tolerance that basically anything counts as "being close".
Note that kwargs can be used to pass additional tolerance constraints (e.g. atol, rtol) or other options to np.isclose().
As a test, you could run:
import numpy as np
import numpy.random
np.random.seed(0)
num = 1000
snr = 100
n_peaks = 5
x = np.linspace(-10, 10, num)
# generate ground truth
y = np.sin(x)
# distributed noise
y2 = y + np.random.random(num) / snr
# distributed noise + peaks
y3 = y + np.random.random(num) / snr
peak_positions = [np.random.randint(num) for _ in range(n_peaks)]
for i in peak_positions:
    y3[i] += np.random.random() * snr
# for distributed noise, both work with a 1/snr tolerance
is_close_l2(y, y2, atol=1/snr)
# output: True
is_close_except(y, y2, atol=1/snr)
# output: True
# for peak noise, since n_peaks < num_exceptions, this works
is_close_except(y, y3, atol=1/snr)
# output: True
# and if you allow 0 exceptions, then it fails, as expected
is_close_except(y, y3, num_exceptions=0, atol=1/snr)
# output: False
# for peak noise, this fails because the contribution from the peaks
# in the integral is much larger than the contribution from the rest
is_close_l2(y, y3, atol=1/snr)
# output: False
There are other approaches to this problem involving higher mathematics (e.g. Fourier or Wavelet transforms), but I would stick to the simplest.
EDIT (updated):
However, the working assumption may not hold, or you may not like it, for example because the two functions have different sampling or are described by non-injective relations.
In that case, you can follow the center of the tube using (x, y) data, then calculate the Euclidean distance from the target (the tube center) and check that this distance is point-wise smaller than the maximum allowed (the tube size):
import numpy as np
# assume it is something with shape (N, 2) meaning (x, y)
target = ...
# assume it is something with shape (M, 2) meaning again (x, y)
trajectory = ...
# calculate the minimum distance between each point
# of the trajectory and the target
def is_close_trajectory(trajectory, target, max_dist):
    dist = np.zeros(trajectory.shape[0])
    for i in range(len(dist)):
        dist[i] = np.min(np.sqrt(
            (target[:, 0] - trajectory[i, 0]) ** 2 +
            (target[:, 1] - trajectory[i, 1]) ** 2))
    return np.all(dist < max_dist)
# same as above but faster and more memory-hungry:
# the (N, M) distance matrix is reduced over the target axis (axis=0)
def is_close_trajectory2(trajectory, target, max_dist):
    dist = np.min(np.sqrt(
        (target[:, np.newaxis, 0] - trajectory[np.newaxis, :, 0]) ** 2 +
        (target[:, np.newaxis, 1] - trajectory[np.newaxis, :, 1]) ** 2),
        axis=0)
    return np.all(dist < max_dist)
The price of this flexibility is that this will be a significantly slower or memory-hungry function.
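A small usage sketch of the trajectory idea with a made-up test case: a unit step and the same step delayed by 0.05 time units. A pointwise check fails at the step, while the nearest-point distance stays within the tube. Because the Euclidean distance mixes the time and value axes, this assumes both axes are already scaled so that a single tolerance makes sense for both; the helper name min_distances is hypothetical.

```python
import numpy as np


def min_distances(trajectory, target):
    """For each (x, y) point of `trajectory`, the distance to the nearest `target` point."""
    diff = trajectory[:, np.newaxis, :] - target[np.newaxis, :, :]  # shape (M, N, 2)
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


t = np.linspace(0, 10, 1001)                                   # sample spacing 0.01
target = np.column_stack([t, (t >= 5.0).astype(float)])        # reference step
trajectory = np.column_stack([t, (t >= 5.05).astype(float)])   # delayed step

max_dist = 0.1
pointwise_ok = np.all(np.abs(trajectory[:, 1] - target[:, 1]) < max_dist)
tube_ok = np.all(min_distances(trajectory, target) < max_dist)
print(pointwise_ok, tube_ok)   # False True
```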

Assuming you have your list of results in the form we discussed in the comments already loaded:
from random import randint
import numpy
l1 = [(i,randint(0,99)) for i in range(10)]
l2 = [(i,randint(0,99)) for i in range(10)]
# I generate some random lists e.g:
# [(0, 46), (1, 33), (2, 85), (3, 63), (4, 63), (5, 76), (6, 85), (7, 83), (8, 25), (9, 72)]
# where the first element is the time and the second a value
print(l1)
# Then I just evaluate for each time step the difference between the values
differences = [abs(x[0][1]-x[1][1]) for x in zip(l1,l2)]
print(differences)
# And I can just print the maximum difference and its index:
print(max(differences))
print(differences.index(max(differences)))
With this data, if you define your "tube" to be, for example, 10 wide, you can simply check whether the maximum difference you find is greater than your threshold to decide whether the two functions are similar enough.

You will have to remove outliers from your dataset first if you need to skip a random spike.
You could also try the following:
from tslearn.metrics import dtw
print(dtw(arr1, arr2) * 100 / len(arr1)) # DTW distance scaled by the array length
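If installing tslearn is not an option, the classic dynamic time warping recurrence is short enough to write directly. This is a plain O(n*m) sketch; the name dtw_distance is hypothetical. As far as I know, tslearn's dtw also returns the square root of the accumulated squared differences along the optimal path, but check its documentation before comparing numbers between the two. The example uses a made-up step that arrives two samples late.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = smallest accumulated squared difference aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, m])


reference = np.r_[np.zeros(50), np.ones(50)]
delayed = np.r_[np.zeros(52), np.ones(48)]    # same step, two samples later
print(np.linalg.norm(reference - delayed))    # pointwise (Euclidean) distance
print(dtw_distance(reference, delayed))       # the warping absorbs the delay
```

The warping absorbs the time shift at the step completely, which is the behaviour the "tube" in the question asks for along the time axis.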

Bit late to the game, but I encountered the same conundrum recently, and this seems to be the only question on the site discussing this particular problem.
A basic solution is to use time and amplitude tolerance values to create a 'bounding box' style envelope (similar to your pink tube) around the data.
I'm sure there are more elegant ways to do this, but a very crudely coded brute force example would be something like the following using pandas:
import pandas as pd
data = pd.DataFrame()
data['benchmark'] = [0.1, 0.2, 0.3] # or whatever you pull from your expected value data set
data['under_test'] = [0.2, 0.3, 0.1] # or whatever you pull from your simulation results data set
sample_rate = 20 # or whatever the data sample rate is
st = int(round(0.05 * sample_rate)) # shift tolerance (in samples) adjusted to time series sample rate
# it must be an integer so we can use the standard
# series shift functions and whatnot
at = 0.05 # amplitude tolerance
bounding = pd.DataFrame(index=data.index)
# if we didn't care about time shifts, the following two would be sufficient
# (i.e. if the data didn't have severe discontinuities between samples)
bounding['top'] = data['benchmark'] + at
bounding['bottom'] = data['benchmark'] - at
# if you want to be able to tolerate large discontinuities
# the bounds can be widened along the time axis to accommodate large jumps
bounding['bottomleft'] = data['benchmark'].shift(-st) - at
bounding['topleft'] = data['benchmark'].shift(-st) + at
bounding['topright'] = data['benchmark'].shift(st) + at
bounding['bottomright'] = data['benchmark'].shift(st) - at
# minimums and maximums give us a rough (but hopefully good enough) envelope
# these can be plotted as a parametric replacement of the 'pink tube' of line width
data['min'] = bounding.min(axis=1)
data['max'] = bounding.max(axis=1)
# see if the test data falls inside the envelope
data['pass/fail'] = data['under_test'].between(data['min'], data['max'])
# You now have a machine-readable column of booleans
# indicating which data points are outside the envelope
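A possible variant of the same envelope, sketched with pandas rolling windows: instead of only the shifts 0 and +/-st, it takes the minimum and maximum of the benchmark over every shift in [-st, st] samples. The function name envelope_check and the test data are hypothetical. As in the version above, the envelope is a per-sample min/max hull, so it can be slightly looser than a true "tube".

```python
import pandas as pd


def envelope_check(benchmark, under_test, shift_tol, amp_tol):
    """Return (pass/fail Series, lower bound, upper bound).

    shift_tol is in samples (an integer), amp_tol in the units of the data.
    """
    bench = pd.Series(benchmark, dtype=float)
    test = pd.Series(under_test, dtype=float, index=bench.index)
    window = 2 * shift_tol + 1
    # centred window: covers benchmark[i - shift_tol] ... benchmark[i + shift_tol]
    rolling = bench.rolling(window, center=True, min_periods=1)
    lower = rolling.min() - amp_tol
    upper = rolling.max() + amp_tol
    return test.between(lower, upper), lower, upper


benchmark = [0, 0, 0, 0, 1, 1, 1, 1]
delayed = [0, 0, 0, 0, 0, 1, 1, 1]   # the step arrives one sample late
ok, lower, upper = envelope_check(benchmark, delayed, shift_tol=1, amp_tol=0.05)
print(ok.all())                      # passes with a one-sample shift tolerance
ok0, _, _ = envelope_check(benchmark, delayed, shift_tol=0, amp_tol=0.05)
print(ok0.all())                     # fails without any shift tolerance
```

The returned lower and upper Series can also be plotted (for example as a filled band) to draw the tube in data units rather than as a line width.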

Get reverse cumulative density function with NumPy?

I am interested in a particular density, and I need to sample it "regularly" in a way that represents its shape (not randomly).
Formally, f is my density function and F is the corresponding cumulative distribution function (F' = f), whose inverse function rF = F^-1 exists. I am interested in casting a regular sample from [0, 1] into my variable domain through F^-1. Something like:
import numpy as np
uniform_sample = np.linspace(0., 1., 256 + 2)[1:-1] # source sample
shaped_sample = rF(uniform_sample) # this is what I want to get
Is there a dedicated way to do this with numpy, or should I do this by hand? Here is the 'by hand' way for exponential law:
l = 5. # exponential parameter
# f = lambda x: l * np.exp(-l * x) # density function, not used
# F = lambda x: 1 - np.exp(-l * x) # cumulative density function, not used either
rF = lambda y: np.log(1. / (1. - y)) / l # reverse `F^-1` function
# What I need is:
shaped_sample = rF(uniform_sample)
I know that, in theory, rF is internally used for drawing random samples when np.random.exponential is called, for example (a uniform, random sample from [0, 1] is transformed by rF to get the actual result). So my guess is that numpy.random does know the rF function for each distribution it offers.
How do I access it? Does numpy provide functions like:
np.random.<any_numpy_distribution>.rF
or
np.random.get_reverse_F(<any_custom_density_function>)
.. or should I derive / approximate them myself?

scipy has probability distribution objects for all (I think) of the probability distributions in numpy.random.
http://docs.scipy.org/doc/scipy/reference/stats.html
They all have a ppf() method that does what you want.
http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.ppf.html
In your example:
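import numpy as np
import scipy.stats as st
uniform_sample = np.linspace(0., 1., 256 + 2)[1:-1] # source sample, as in the question
l = 5. # exponential parameter (rate)
dist = st.expon(loc=0., scale=1. / l) # distribution object provided by scipy; scipy uses scale = 1/rate
f = dist.pdf # probability density function
F = dist.cdf # cumulative distribution function
rF = dist.ppf # percent point function: inverse `F^-1` function
shaped_sample = rF(uniform_sample)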
# and much more!
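A quick sanity check, assuming l is the rate of the exponential law as in the question's hand-written rF: SciPy parameterises expon by scale = 1 / rate, and with that setting ppf should coincide with the closed-form inverse and undo cdf.

```python
import numpy as np
import scipy.stats as st

l = 5.0                                               # rate, as in the question
uniform_sample = np.linspace(0.0, 1.0, 256 + 2)[1:-1]

rF_by_hand = lambda y: np.log(1.0 / (1.0 - y)) / l    # question's closed form
dist = st.expon(scale=1.0 / l)                        # scale = 1 / rate

shaped_sample = dist.ppf(uniform_sample)
print(np.allclose(shaped_sample, rF_by_hand(uniform_sample)))
# ppf is the inverse of cdf on the sample points
print(np.allclose(dist.cdf(shaped_sample), uniform_sample))
```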

As far as I'm aware, there isn't a way to do this directly in numpy. For functions whose cumulative distribution is analytic but whose inverse isn't, I generally use a spline to do the inversion numerically.
import numpy as np
from scipy.interpolate import UnivariateSpline
x = np.linspace(0.0, 1.0, 1000)
F = cumulative_distn(x) # This we know and is analytic (F must be strictly increasing on x)
rF = UnivariateSpline(F, x, s=0) # This will then be the inverse; s=0 interpolates instead of smoothing
Note that if you can do the inversion of F to rF by hand, then you should. This method is only for the case where the inverse cannot be found in closed form.
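A runnable version of the spline approach, using a made-up CDF in place of cumulative_distn: the density f(x) = (1 + 2x)/2 on [0, 1], whose CDF is F(x) = (x + x^2)/2. This example happens to have a closed-form inverse, x = (sqrt(1 + 8y) - 1)/2, which is used here only to check the spline. Passing s=0 makes UnivariateSpline interpolate the points rather than smooth them.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def cumulative_distn(x):
    """Hypothetical analytic CDF on [0, 1] (density (1 + 2x) / 2)."""
    return (x + x ** 2) / 2


x = np.linspace(0.0, 1.0, 1000)
F = cumulative_distn(x)            # strictly increasing, so (F, x) is valid spline input
rF = UnivariateSpline(F, x, s=0)   # numerical inverse of F

uniform_sample = np.linspace(0.0, 1.0, 256 + 2)[1:-1]
shaped_sample = rF(uniform_sample)

exact = (np.sqrt(1 + 8 * uniform_sample) - 1) / 2
print(np.max(np.abs(shaped_sample - exact)))
```

Distributions whose CDF has flat stretches (zero density) would need those points removed first, since the spline requires strictly increasing F values.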

Python - generate array of specific autocorrelation

I am interested in generating an array (or pandas Series) of length N that will exhibit a specific autocorrelation at lag 1. Ideally, I want to specify the mean and variance as well, and have the data drawn from a (multi)normal distribution. But most importantly, I want to specify the autocorrelation. How do I do this with numpy or scikit-learn?
Just to be explicit and precise, this is the autocorrelation I want to control:
numpy.corrcoef(x[0:len(x) - 1], x[1:])[0][1]

If you are interested only in the auto-correlation at lag one, you can generate an auto-regressive process of order one with the parameter equal to the desired auto-correlation; this property is mentioned on the Wikipedia page, but it's not hard to prove it.
Here is some sample code:
import numpy as np
def sample_signal(n_samples, corr, mu=0, sigma=1):
    assert 0 < corr < 1, "Auto-correlation must be between 0 and 1"
    # Find out the offset `c` and the std of the white noise `sigma_e`
    # that produce a signal with the desired mean and variance.
    # See https://en.wikipedia.org/wiki/Autoregressive_model
    # under section "Example: An AR(1) process".
    c = mu * (1 - corr)
    sigma_e = np.sqrt((sigma ** 2) * (1 - corr ** 2))
    # Sample the auto-regressive process, starting from the stationary
    # distribution N(mu, sigma^2) so that there is no initial transient.
    signal = [np.random.normal(mu, sigma)]
    for _ in range(1, n_samples):
        signal.append(c + corr * signal[-1] + np.random.normal(0, sigma_e))
    return np.array(signal)
def compute_corr_lag_1(signal):
    return np.corrcoef(signal[:-1], signal[1:])[0][1]
# Examples.
print(compute_corr_lag_1(sample_signal(5000, 0.5)))
print(np.mean(sample_signal(5000, 0.5, mu=2)))
print(np.std(sample_signal(5000, 0.5, sigma=3)))
The parameter corr lets you set the desired auto-correlation at lag one and the optional parameters, mu and sigma, let you control the mean and standard deviation of the generated signal.
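For long signals the Python loop can be slow. A vectorised sketch of the same AR(1) recursion, assuming SciPy is available: scipy.signal.lfilter with denominator [1, -corr] computes y[n] = corr * y[n-1] + e[n], and the initial state zi is drawn so that the process starts in its stationary distribution. The function name sample_signal_fast and the rng argument are my own.

```python
import numpy as np
from scipy.signal import lfilter


def sample_signal_fast(n_samples, corr, mu=0.0, sigma=1.0, rng=None):
    """AR(1) samples with lag-1 autocorrelation `corr`, mean `mu` and std `sigma`."""
    rng = np.random.default_rng(rng)
    sigma_e = sigma * np.sqrt(1 - corr ** 2)   # white-noise std, as above
    noise = rng.normal(0.0, sigma_e, n_samples)
    y_prev = rng.normal(0.0, sigma)             # stationary draw for y[-1]
    # y[n] = corr * y[n-1] + noise[n]; zi = [corr * y_prev] supplies the y[-1] term
    y, _ = lfilter([1.0], [1.0, -corr], noise, zi=[corr * y_prev])
    return mu + y


signal = sample_signal_fast(100_000, 0.5, mu=2.0, sigma=3.0, rng=0)
print(np.corrcoef(signal[:-1], signal[1:])[0, 1], signal.mean(), signal.std())
```

The statistics match those of sample_signal, but the individual values differ because a different random stream is used.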

python- convolution with step response

I want to compute this integral $\frac{1}{L}\int_{-\infty}^{t}H(t')\exp\left(-\frac{R}{L}(t-t')\right)dt'$ using numpy.convolve, where $H(t)$ is the Heaviside function. I am supposed to get a result equal to $\exp(-\frac{R}{L}t)H(t)$.
Below is what I did.
I changed the limits of the integral to -inf to +inf by a change of variable, multiplying by a different H(t), and then used this as my function to convolve with H(t) (the one inside the integral). But the output plot is definitely not an exponential function, and I could not find any mistakes in my code. Please help; any hint or suggestion will be appreciated!
import numpy as np
import matplotlib.pyplot as plt
R = 1e3
L = 3.
delta = 1
Nf = 100
Nw = 200
k = np.arange(0,Nw,delta)
dt = 0.1e-3
tk = k*dt
Ng = Nf + Nw -2
n = np.arange(0,Nf+Nw-1,delta)
tn = n*dt
#define H
def H(n):
    H = np.ones(n)
    H[0] = 0.5
    return H
#build ftns that get convoluted
f = H(Nf)
w = np.exp((-R/L)*tk)*H(Nw)
#return the value of I
It = np.convolve(w,f)/L
#return the value of Voutput, b(t)
b = H(Ng+1) - R*It
plt.plot(tn,b,'o')
plt.show()

The issue with your code is not so much programming as it is conceptual. Rewrite the convolution as Integral[HeavisideTheta[t-t']*Exp[-R/L * t'], -Inf, t] (that's Mathematica code) and upon inspection you find that H(t-t') is always 1 within the limits (except at t'=t, which is the integration limit... but that's not important). So in reality you're not actually performing a complete convolution... you're basically just taking half (or a third) of the convolution.
If you think of a convolution as inverting one sequence, then shifting it one step at a time and adding it all up (see http://en.wikipedia.org/wiki/Convolution#Derivations - Visual Explanation of Convolution), then what you want is the middle half... i.e. only when they're overlapping. You don't want the lead-in (4th graph down: http://en.wikipedia.org/wiki/File:Convolution3.svg). You do want the lead-out.
Now the easiest way to fix your code is as such:
#build ftns that get convoluted
f = H(Nf)
w = np.exp((-R/L)*tk)*H(Nw)
#return the value of I
It = np.convolve(w,f)/L
max_ind = np.argmax(It)
print(max_ind)
It1 = It[max_ind:]
The lead-in is the only time when the convolution integral (technically sum in our case) increases... thus after the lead-in is finished the convolution integral follows Exp[-x]... so you tell python to only take values after the maximum is achieved.
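With It1 in place of It, the remaining step (#return the value of Voutput, b(t)) works perfectly now, as long as the Heaviside array has the same length as It1 (e.g. H(len(It1))).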
Note: Since you need the lead-out you can't use np.convolve(a,b, mode = 'valid').
So It1 looks like: [plot of It1]
b(t) using It1 looks like: [plot of b(t)]
There is no way you can ever get exp(-x) as the general form, because the equation for b(t) is given by 1 - R*exp(-x)... It can't mathematically follow an exp(-x) form. At this point there are 3 things:
The units don't really make sense... check them. The Heaviside function is 1 and R*It1 is about 10,000. I'm not sure this is an issue, but just in case, the normalized curve looks like this: [plot of the normalized curve]
You can get an exp(-x) form if you use b(t) = R*It1 - H(t)... the code for that is here (You might have to normalize depending on your needs):
b = R*It1 - H(len(It1))
# print(len(tn))
plt.plot(tn[:len(b)], b,'o')
plt.show()
And the plot looks like: [plot of b = R*It1 - H]
Your question might still not be resolved in which case you need to explain what exactly you think was wrong. With the info you've given me... b(t) can never have an Exp[-x] form unless the equation for b(t) is messed with. As it stands in your original code It1 follows Exp[-x] in form but b(t) cannot.

I think there's a bit of confusion here about convolution. We use convolution in the time domain to calculate the response of a linear system to an arbitrary input. To do this, we need to know the impulse response of the system. Be careful switching between continuous and discrete systems - see e.g. http://en.wikipedia.org/wiki/Impulse_invariance.
The (continuous) impulse response of your system (which I assume to be for the resistor voltage of an L-R circuit) I have defined for convenience as a function of time t: IR = lambda t: (R/L)*np.exp(-(R/L)*t) * H.
I have also assumed that your input is the Heaviside step function, which I've defined on the time interval [0, 1], for a timestep of 0.001 s.
When we convolve (discretely), we effectively flip one function around and slide it along the other one, multiplying corresponding values and then taking the sum. To use the continuous impulse response with a step function, which actually comprises a sequence of Dirac delta functions, we need to multiply the continuous impulse response by the time step dt, as described in the Wikipedia link above on impulse invariance. NB: setting H[0] = 0.5 is also important.
We can visualise this operation below. Any given red marker represents the response at a given time t, and is the "sum-product" of the green input and a flipped impulse response shifted to the right by t. I've tried to show this with a few grey impulse responses.
The code to do the calculation is here.
import numpy as np
import matplotlib.pyplot as plt
R = 1e3 # Resistance
L = 3. #Inductance
dt = 0.001 # Millisecond timestep
# Define interval 1 second long, interval dt
t = np.arange(0, 1, dt)
# Define step function
H = np.ones_like(t)
H[0] = 0.5 # Correction for impulse invariance (cf http://en.wikipedia.org/wiki/Impulse_invariance)
# RL circuit - resistor voltage impulse response (cf http://en.wikipedia.org/wiki/RL_circuit)
IR = lambda t: (R/L)*np.exp(-(R/L)*t) * H # Don't really need to multiply by H as w is zero for t < 0
# Response of resistor voltage
response = np.convolve(H, IR(t)*dt, 'full')
The extra code to make the plot is here:
# Define new, longer, time array for plotting response - must be same length as response, with step dt
tp = np.arange(len(response))* dt
plt.plot(0-t, IR(t), '-', label='Impulse response (flipped)')
for q in np.arange(0.01, 0.1, 0.01):
    plt.plot(q-t, IR(t), 'o-', markersize=3, color=str(10*q))
t = np.arange(-1, 1, dt)
H = np.ones_like(t)
H[t<0] = 0.
plt.plot(t, H, 's', label='Unit step function')
plt.plot(tp, response, '-o', label='Response')
plt.tight_layout()
plt.grid()
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.legend()
plt.show()
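A quick check of the dt scaling, as a separate sketch: for this RL impulse response, the discrete step response should approach the closed-form resistor-voltage step response 1 - exp(-R t / L). This assumes a finer timestep than above (dt = 10 microseconds, since the time constant L/R is only 3 ms) and keeps only the first len(t) samples of the full convolution. With H[0] = 0.5 and IR multiplied by H (which halves IR[0]), the sum uses trapezoidal weights for t > 0, so only the t = 0 sample is noticeably off.

```python
import numpy as np

R = 1e3      # Resistance, as above
L = 3.       # Inductance, as above
dt = 1e-5    # assumed finer timestep (L/R = 3 ms)
t = np.arange(0, 0.02, dt)

H = np.ones_like(t)
H[0] = 0.5
IR = (R / L) * np.exp(-(R / L) * t) * H   # as above: multiplying by H halves IR[0]

# Samples 0 .. len(t)-1 of the full convolution correspond to the times in t.
response = np.convolve(H, IR * dt)[: len(t)]
exact = 1 - np.exp(-(R / L) * t)

err = np.abs(response - exact)
print(err[0], err[1:].max())   # the t = 0 sample carries most of the discretisation error
```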
Finally, if you still have some confusion about convolution, I strongly recommend "Digital Signal Processing: A Practical Guide for Engineers and Scientists" by Steven W. Smith.
