From DFT to QFT python/qiskit - python

I have the following code
import numpy as np
x = [1, -4, 5, -2] # Data points
N = len(x) # Number of samples
n = np.arange(N) # Current sample
k = n.reshape((N, 1)) # Current frequency
e = np.exp(-2j * np.pi * k * n / N) # Exponential part
DFT = np.dot(e, x)
How can I make this Classical Fourier Transform into the Quantum version, either via python or qiskit?

I think you should first read about the quantum Fourier transform (QFT) and see what it does. The QFT is not a quantum algorithm for calculating the Fourier transform of classical data. It is a change of basis, from the computational basis to the so-called Fourier basis, applied to the amplitudes of a quantum state. It does not convert a finite sequence of equally spaced samples of a function into a same-length sequence of samples that you can read out, as the DFT does.
That being said, if you simply want to implement the QFT, you can find examples, as well as some intuition for what the transform does and how to use it, here:
https://qiskit.org/textbook/ch-algorithms/quantum-fourier-transform.html
I don't know whether your data points were chosen for a reason, but to use them in Qiskit you may want to replace them with strings of 0s and 1s, or convert them to binary and use quite a few qubits.
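To make the relationship to the DFT concrete, here is a minimal NumPy sketch (a classical simulation of the state vector, not Qiskit code). It assumes the common textbook convention in which the QFT maps basis state |j> to (1/sqrt(N)) * sum_k exp(2*pi*i*j*k/N) |k>, and it assumes the data are loaded as the amplitudes of a normalized 2-qubit state. That encoding is an illustrative choice, not something the question specifies. Under those assumptions the QFT acts on the amplitude vector like NumPy's inverse DFT rescaled by sqrt(N):

```python
import numpy as np

# Data from the question, rescaled to unit norm so it can serve as the
# amplitude vector of a 2-qubit state (assumption: amplitude encoding).
x = np.array([1, -4, 5, -2], dtype=complex)
state = x / np.linalg.norm(x)
N = len(state)

# Unitary QFT matrix under the assumed sign convention:
# entries exp(+2*pi*i*j*k/N) / sqrt(N).
idx = np.arange(N)
qft_matrix = np.exp(2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

# Apply the QFT to the amplitudes.
qft_state = qft_matrix @ state

# np.fft.ifft uses the same sign but a 1/N factor, hence the sqrt(N) rescaling.
reference = np.fft.ifft(state) * np.sqrt(N)
print(np.allclose(qft_state, reference))
print(np.linalg.norm(qft_state))
```

On real hardware the transformed amplitudes cannot be read out directly: a measurement only returns a basis state with probability equal to the squared magnitude of its amplitude, which is why the QFT is not a drop-in replacement for the classical DFT.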

Related

Fourier transformation (fft) for Time Series, but both ends of cleaned data move towards each other

I have a time series that represents the X and the Z coordinates in a virtual environment.
X = np.array(df["X"])
Z = np.array(df["Z"])
Both the X and the Z coordinates contain noise from a different source. To filter out the noise I'd like to use a Fourier Transform.
After some research I used the code from https://medium.com/swlh/5-tips-for-working-with-time-series-in-python-d889109e676d to denoise my data.
def fft_denoiser(x, n_components, to_real=True):
    n = len(x)
    # compute the fft
    fft = np.fft.fft(x, n)
    # compute power spectrum density
    # squared magnitude of each fft coefficient
    PSD = fft * np.conj(fft) / n
    # keep only the components whose power exceeds the threshold
    _mask = PSD > n_components
    fft = _mask * fft
    # inverse fourier transform
    clean_data = np.fft.ifft(fft)
    if to_real:
        clean_data = clean_data.real
    return clean_data
After setting n_components, I would like to use the cleaned data. This works pretty well, as I saw when plotting the X-coordinate.
Only at the beginning and the end do the cleaned data suddenly move towards each other's values. Can somebody help or explain what causes this, and how I can overcome it?
The reason you're having this issue is because the FFT implicitly assumes that the provided input signal is periodic. If you repeat your raw data, you see that at each period there is a large discontinuity (as the signal goes from ~20 back down to ~5). Once some higher frequency components are removed you see the slightly less sharp discontinuity at the edges (a few samples at the start and a few samples at the end).
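The effect is easy to reproduce in isolation. The following is a minimal sketch, assuming a noise-free ramp from 5 to 25 and an arbitrary cutoff of 20 frequency bins (both illustrative choices, not values from the question). Removing the high-frequency bins leaves the interior almost untouched but pulls both ends towards the midpoint of the jump:

```python
import numpy as np

N = 512
x = np.linspace(5.0, 25.0, N)   # ramp from 5 to 25, similar to the trend in the question
K = 20                          # number of low-frequency bins kept (illustrative)

# Low-pass filter in the frequency domain: zero every bin above K.
X = np.fft.rfft(x)
X[K + 1:] = 0
smoothed = np.fft.irfft(X, n=N)

# Compare the reconstruction error at the edges with the error in the interior.
err = np.abs(smoothed - x)
print("error at first sample:", err[0])
print("error at last sample: ", err[-1])
print("max error in the middle half:", err[N // 4: 3 * N // 4].max())
```

The edge errors are of the order of half the jump between the last and the first sample, while the interior error is much smaller.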
To avoid this situation, you can do the filtering in the time-domain using a linear FIR filter, which can process the data sequence without the periodicity assumption.
For the purpose of this answer I constructed a synthetic test signal (which you can use to recreate the same conditions), but you can obviously use your own data instead:
# Generate synthetic signal for testing purposes
fs = 1 # Hz
f0 = 0.002/fs
f1 = 0.01/fs
dt = 1/fs
t = np.arange(200, 901)*dt
m = (25-5)/(t[-1]-t[0])
phi = 4.2
x = 5 + m*(t-t[0]) + 2*np.sin(2*np.pi*f0*t) + 1*np.sin(2*np.pi*f1*t+phi) + 0.2*np.random.randn(len(t))
Now to design the filter we can take the inverse transform of the _mask (instead of applying the mask):
import numpy as np
# Design denoising filter
def freq_sampling_filter(x, threshold):
    n = len(x)
    # compute the fft
    fft = np.fft.fft(x, n)
    # compute power spectrum density
    # squared magnitude of each fft coefficient
    PSD = fft * np.conj(fft) / n
    # keep frequencies with large contributions
    _mask = PSD > threshold
    # impulse response of the mask, shifted so that it is centered
    _coff = np.fft.fftshift(np.real(np.fft.ifft(_mask)))
    return _coff
coff = freq_sampling_filter(x, threshold)
The threshold is a tunable parameter which would be chosen to keep enough of the frequency components you'd like to keep and get rid of the unwanted frequency components. That is of course highly subjective.
Then we can simply apply the filter with scipy.signal.filtfilt:
from scipy.signal import filtfilt
# apply the denoising filter
cleaned = filtfilt(coff, 1, x, padlen=len(x)-1, padtype='constant')
For the purpose of illustration, using a threshold of 10 with the above generated synthetic signal yields the following raw data (variable x) and cleaned data (variable cleaned):
The choice of padtype to 'constant' ensures that the filtered values start and end at the start and end values of the unfiltered data.
Alternative
As was posted in comments, filtfilt may be expensive for longer data sets.
As an alternative, the filtering can be performed using FFT-based convolution with scipy.signal.fftconvolve. Note that in this case there is no equivalent of the padtype argument of filtfilt, so we need to pad the signal manually to avoid edge effects at the start and end.
from scipy.signal import fftconvolve
n = len(x)
# Manually pad signal to avoid edge effects
x_padded = np.concatenate((x[0]*np.ones(n-1), x, x[-1]*np.ones((n-1)//2)))
# Filter using FFT-based convolution
cleaned = fftconvolve(x_padded, coff, mode='same')
# Extract result (remove data from padding)
cleaned = cleaned[2*(n-1)//2:-n//2+1]
For reference, here are some benchmark comparisons (timings in seconds, so smaller is better) for the above signal of length 700:
filtfilt : 0.3831593
fftconvolve : 0.00028040000000029153
Note that relative performance would vary, but FFT-based convolution is expected to perform comparatively better as the signal gets longer.

Parseval's Theorem with Numpy FFT is not fulfilled

I am trying to determine the total energy recorded by a detector in the time domain by means of its spectrum.
The first step after performing the Fast Fourier Transformation with Numpy's FFT library was to confirm Parseval's theorem.
According to the theorem, the total energy in time domain and in frequency domain must be the same. I have two problems that I am not able to solve.
I can confirm the theorem when I don't use the proper units for the x-axis during the np.trapz() integration. As soon as I use the actual sample points/frequencies, the result is off. I do not understand why this is the case and am wondering whether I can apply a normalization to fix this.
I cannot confirm the theorem when I apply a DC offset to the signal (uncomment the f = np.sin(np.pi*t)+1 line).
Below is my code with an example sine function.
# Python code
import matplotlib.pyplot as plt
import numpy as np
# Create a Sine function
dt = 0.001 # Time steps
t = np.arange(0,10,dt) # Time array
f = np.sin(np.pi*t) # Sine function
# f = np.sin(np.pi*t)+1 # Sine function with DC offset
N = len(t) # Number of samples
# Energy of function in time domain
energy_t = np.trapz(abs(f)**2)
# Energy of function in frequency domain
FFT = np.sqrt(2) * np.fft.rfft(f) # only positive frequencies; correct magnitude due to discarding of negative frequencies
FFT[0] /= np.sqrt(2) # DC magnitude does not have to be corrected
FFT[-1] /= np.sqrt(2) # Nyquist frequency does not have to be corrected
frq = np.fft.rfftfreq(N,d=dt) # FFT frequencies
# Energy of function in frequency domain
energy_f = np.trapz(abs(FFT)**2) / N
print('Parsevals theorem fulfilled: ' + str(energy_t - energy_f))
# Parsevals theorem with proper sample points
energy_t = np.trapz(abs(f)**2, x=t)
energy_f = np.trapz(abs(FFT)**2, x=frq) / N
print('Parsevals theorem NOT fulfilled: ' + str(energy_t - energy_f))
The FFT computes the Discrete Fourier Transform (DFT), which is not the same as the (continuous-domain) Fourier Transform.
For the DFT, Parseval’s theorem states that the sum of the square magnitude of the discrete signal equals the sum of the square magnitude of the DFT of the signal. There is no integration involved, and therefore you should not use trapz. Just use sum.
Note that a discrete signal is a set of samples x[n] at n=0..N-1. Fourier analysis in the discrete domain, and all related operations, only consider n, not t. The sampling frequency and the actual times those samples were recorded is irrelevant in these analyses. Likewise, the DFT produces a set of samples X[k] at k=0..N-1, not at any specific f or ω related to any sampling frequency.
Now it is possible to relate n to t because we know the sampling frequency, and it is possible to relate k to f because we know the sampling frequency. But these conversions should not make us think that X[k] is a sampling of the continuous-domain Fourier transform of the original continuous-domain signal. And they should especially not make us think that we can interpolate X[k].
Reconstructing the samples x[n] is accomplished by adding N sinusoids with parameters given by X[k]. “In between” those DFT components should not be anything. Interpolating them would mean we add sinusoids that do not exist in the samples x[n].
trapz uses linear interpolation to obtain an estimate of the integral, and therefore is inappropriate to use in discrete Fourier analysis.
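As a minimal sketch of the check with sum instead of trapz, the snippet below reuses the signal from the question, including the DC offset. It relies on NumPy's default unnormalized forward FFT, for which the discrete Parseval relation reads sum |x[n]|^2 = (1/N) * sum |X[k]|^2. The weights array for the rfft variant is an illustrative helper, not part of the original code:

```python
import numpy as np

dt = 0.001
t = np.arange(0, 10, dt)
f = np.sin(np.pi * t) + 1          # sine with DC offset, as in the question
N = len(f)

# Time-domain energy: plain sum over the samples.
energy_t = np.sum(np.abs(f) ** 2)

# Frequency-domain energy from the full FFT.
F = np.fft.fft(f)
energy_f = np.sum(np.abs(F) ** 2) / N

# Same quantity from rfft: every bin counts twice, except DC and
# (for even N) the Nyquist bin, which have no negative-frequency partner.
R = np.fft.rfft(f)
weights = np.full(R.shape, 2.0)
weights[0] = 1.0
if N % 2 == 0:
    weights[-1] = 1.0
energy_rf = np.sum(weights * np.abs(R) ** 2) / N

print(energy_t - energy_f)
print(energy_t - energy_rf)
```

Both differences should vanish up to floating-point rounding, with or without the offset. To express the energy in physical units, multiply the time-domain sum by dt; the same factor then applies to the frequency-domain sum.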

Generating correlated random potential using fast Fourier transform

I would like to generate a random potential in 1D or 2D space with a specified autocorrelation function. According to some mathematical derivations involving the Wiener-Khinchin theorem and properties of the Fourier transform, this can be done using the following equation:
$$V(x) = \mathcal{F}^{-1}\left[\sqrt{\mathcal{F}[C](k)}\; e^{2\pi i \phi(k)}\right](x)$$
where $C(x)$ is the autocorrelation function and $\phi(k)$ is uniformly distributed in the interval [0, 1). This function satisfies $\phi(-k) = -\phi(k)$, which ensures that the generated potential is always real.
The particular autocorrelation function should not affect what I am doing here, so I take a simple Gaussian, $C(x) = V_0^2 e^{-x^2/\rho^2}$.
The choice of the phase term and the condition of phi(k) is based on the following properties
The phase term must have a modulus of 1 (by the Wiener-Khinchin theorem, i.e. the Fourier transform of the autocorrelation of a function equals the squared modulus of the Fourier transform of that function);
The Fourier transform $F(k)$ of a real function must satisfy $F(-k) = F^*(k)$ (by directly inspecting the definition of the Fourier transform in integral form);
Both the generated potential and the autocorrelation are real.
By combining these three properties, this term can only take the form as stated above.
For the relevant mathematics, you may refer to p.16 of the following pdf:
https://d-nb.info/1007346671/34
I randomly generated a numpy array using uniform distribution and concatenated the negative of the array with the original array, such that it satisfies the condition of phi(k) stated above. And then I performed the numpy (inverse) fast Fourier transform.
I have tried both 1D and 2D cases, and only the 1D case is shown below.
import numpy as np
from numpy.fft import fft, ifft
import matplotlib.pyplot as plt
## The Gaussian autocorrelation function
def c(x, V0, rho):
    return V0**2 * np.exp(-x**2/rho**2)
x_min, x_max, interval_x = -10, 10, 10000
x = np.linspace(x_min, x_max, interval_x, endpoint=False)
V0 = 1
## the correlation length
rho = 1
## (Uniformly) randomly generated array for k>0
phi1 = np.random.rand(interval_x // 2)  # integer division: rand() needs an int size
phi = np.concatenate((-1*phi1[::-1], phi1))
phase = np.exp(2j*np.pi*phi)
C = c(x, V0, rho)
V = ifft(np.power(fft(C), 0.5)*phase)
plt.plot(x, V.real)
plt.plot(x, V.imag)
plt.show()
And the plot is similar to the following:
[Figure: real and imaginary parts of the generated potential V(x)]
However, the generated potential turns out to be complex, and the imaginary parts are of the same order of magnitude as that of the real parts, which is not expected. I have checked the math many times, but I couldn't spot any problems. So I am thinking whether it's related to the implementation problems, for example whether the data points are dense enough for Fast Fourier Transform, etc.
You have a few misunderstandings about how fft (more correctly, DFT) operates.
First, note that the DFT assumes that the samples of the sequence are indexed as 0, 1, ..., N-1, where N is the number of samples. Instead, you generate a sequence on a grid centered at zero (x running from -10 to 10). Second, note that the DFT of a real sequence yields real values at the "frequencies" corresponding to indices 0 and N/2. You do not seem to take this into account either.
I won't go into further details as this is out of the scope of this stackexchange site.
Just for a sanity check, the code below generates a sequence that has the properties expected for the DFT (FFT) of a real-valued sequence:
conjugate symmetry of positive and negative frequencies,
real-valued elements corresponding to frequencies 0 and N/2
sequence assumed to correspond to indices 0 to N-1
As you can see, the ifft of this sequence indeed generates a real-valued sequence
import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import ifft
N = 32 # number of samples
n_range = np.arange(N) # indices over which the sequence is defined
n_range_positive = np.arange(int(N/2)+1) # the "positive frequencies" sample indices
n_range_negative = np.arange(int(N/2)+1, N) # the "negative frequencies" sample indices
# generate a complex-valued sequence with the properties expected for the DFT of a real-valued sequence
abs_FFT_positive = np.exp(-n_range_positive**2/100)
phase_FFT_positive = np.r_[0, np.random.uniform(0, 2*np.pi, int(N/2)-1), 0] # note last frequency has zero phase
FFT_positive = abs_FFT_positive * np.exp(1j * phase_FFT_positive)
FFT_negative = np.conj(np.flip(FFT_positive[1:-1]))
FFT = np.r_[FFT_positive, FFT_negative] # this is the final FFT sequence
# compute the IFFT of the above sequence
IFFT = ifft(FFT)
#plot the results
plt.plot(np.abs(FFT), '-o', label = 'FFT sequence (abs. value)')
plt.plot(np.real(IFFT), '-s', label = 'IFFT (real part)')
plt.plot(np.imag(IFFT), '-x', label = 'IFFT (imag. part)')
plt.legend()
More care needs to be taken when concatenating:
phi1 = np.random.rand(int(interval_x)//2-1)
phi = np.concatenate(([0], phi1, [0], -phi1[::-1]))
The first element is the offset (zero frequency mode). "Negative" frequencies come after the midpoint.
This gives me

Sample a truncated integer power law in Python?

What function can I use in Python if I want to sample a truncated integer power law?
That is, given two parameters a and m, generate a random integer x in the range [1,m) that follows a distribution proportional to 1/x^a.
I've been searching around numpy.random, but I haven't found this distribution.
AFAIK, neither NumPy nor SciPy defines this distribution for you. However, with SciPy it is easy to define your own discrete distribution using scipy.stats.rv_discrete:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
def truncated_power_law(a, m):
    x = np.arange(1, m+1, dtype='float')
    pmf = 1/x**a
    pmf /= pmf.sum()
    return stats.rv_discrete(values=(range(1, m+1), pmf))
a, m = 2, 10
d = truncated_power_law(a=a, m=m)
N = 10**4
sample = d.rvs(size=N)
plt.hist(sample, bins=np.arange(m+1)+0.5)  # bin edges 0.5, 1.5, ..., m+0.5 so that x = m is included
plt.show()
I don't use Python, so rather than risk syntax errors I'll try to describe the solution algorithmically. This is a brute-force discrete inversion. It should translate quite easily into Python. I'm assuming 0-based indexing for the array.
Setup:
Generate an array cdf of size m with cdf[0] = 1 as the first entry, cdf[i] = cdf[i-1] + 1/(i+1)**a for the remaining entries.
Scale all entries by dividing cdf[m-1] into each -- now they actually are CDF values.
Usage:
Generate your random values by generating a Uniform(0,1) and
searching through cdf[] until you find an entry greater than your
uniform. Return the index + 1 as your x-value.
Repeat for as many x-values as you want.
For instance, with a,m = 2,10, I calculate the probabilities directly as:
[0.6452579827864142, 0.16131449569660355, 0.07169533142071269, 0.04032862392415089, 0.02581031931145657, 0.017923832855178172, 0.013168530260947229, 0.010082155981037722, 0.007966147935634743, 0.006452579827864143]
and the CDF is:
[0.6452579827864142, 0.8065724784830177, 0.8782678099037304, 0.9185964338278814, 0.944406753139338, 0.9623305859945162, 0.9754991162554634, 0.985581272236501, 0.9935474201721358, 1.0]
When generating, if I got a Uniform outcome of 0.90 I would return x=4 because 0.918... is the first CDF entry larger than my uniform.
If you're worried about speed you could build an alias table, but with a geometric decay the probability of early termination of a linear search through the array is quite high. With the given example, for instance, you'll terminate on the first peek almost 2/3 of the time.
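The procedure described above translates into Python roughly as follows. This is a sketch: the function names are hypothetical, and np.searchsorted (a binary search) is swapped in for the linear scan described in the text. With side="right" it returns the index of the first CDF entry strictly greater than the uniform draw, which matches the rule above. Like the description, it samples x from 1 to m inclusive:

```python
import numpy as np

def build_cdf(a, m):
    """CDF of P(x) proportional to 1/x**a on x = 1..m (hypothetical helper)."""
    weights = 1.0 / np.arange(1, m + 1, dtype=float) ** a
    cdf = np.cumsum(weights)
    return cdf / cdf[-1]          # scale so the last entry is exactly 1

def sample_power_law(cdf, size, rng):
    """Draw `size` values by discrete inversion of the CDF."""
    u = rng.random(size)          # Uniform[0, 1) draws
    # index of the first CDF entry greater than u, plus 1 to get the x-value
    return np.searchsorted(cdf, u, side="right") + 1

cdf = build_cdf(a=2, m=10)
rng = np.random.default_rng(0)
samples = sample_power_law(cdf, 10_000, rng)
print(cdf[:3])
print(samples[:10])
```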
Use numpy.random.zipf and just reject any samples greater than or equal to m
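A minimal sketch of that rejection approach is shown below, using NumPy's Generator.zipf. The function name truncated_zipf is hypothetical. Note that NumPy's Zipf sampler requires a > 1, so this route does not cover exponents a <= 1:

```python
import numpy as np

def truncated_zipf(a, m, size, rng=None):
    """Sample integers in [1, m) with P(x) proportional to 1/x**a, for a > 1."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty(0, dtype=np.int64)
    while out.size < size:
        draw = rng.zipf(a, size=size)
        # reject every sample greater than or equal to m
        out = np.concatenate((out, draw[draw < m]))
    return out[:size]

samples = truncated_zipf(a=2.0, m=10, size=1000, rng=np.random.default_rng(0))
print(samples[:10])
```

Because the untruncated distribution already puts most of its mass on small values, few draws are rejected unless m is very small.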

Interpolate whole arrays of complex numbers

I have a number of 2-dimensional np.arrays (all of equal size) containing complex numbers. Each of them belongs to one position in a 4-dimensional space. Those positions are sparse and distributed irregularly (a latin hypercube to be precise).
I would like to interpolate this data to other points in the same 4-dimensional space.
I can successfully do this for simple numbers, using either sklearn.kriging(), scipy.interpolate.Rbf() (or others):
# array of co-ordinates: 2 4D sets
X = np.array([[1.0, 0.0, 0.0, 0.0],\
[0.0, 1.0, 0.0, 0.0]])
# two numbers, one for each of the points above
Y = np.array([1,\
0])
# define the type of gaussian process I want
kriging = gp.GaussianProcess(theta0=1e-2, thetaL=1e-4, thetaU=4.0,\
corr='linear', normalize=True, nugget=0.00001, optimizer='fmin_cobyla')
# train the model on the data
kmodel = kriging.fit(X,Y)
# interpolate
kmodel.predict(np.array([0.5, 0.5, 0.0, 0.0]))
# returns: array([ 0.5])
If I try to use arrays (or just complex numbers) as data, this doesn't work:
# two arrays of complex numbers, instead of the numbers
Y = np.array([[1+1j, -1-1j],\
[0+0j, 0+0j]])
kmodel = kriging.fit(X,Y)
# returns: ValueError: The number of features in X (X.shape[1] = 1) should match the sample size used for fit() which is 4.
This is obvious since the docstring for kriging.fit() clearly states that it needs an array of n scalars, one per each element in the first dimension of X.
One solution is to decompose the arrays in Y into individual numbers, those into real and imaginary parts, make a separate interpolation of each of those and then put them together again. I can do this with the right combination of loops and some artistry but it would be nice if there was a method (e.g. in scipy.interpolate) that could handle an entire np.array instead of scalar values.
I'm not fixed on a specific algorithm (yet), so I'd be happy to know about any that can use arrays of complex numbers as the "variable" to be interpolated. Since -- as I said -- there are few and irregular points in space (and no grid to interpolate on), simple linear interpolation won't do, of course.
There are two ways of looking at complex numbers:
Cartesian Form ( a + bi ) and
Polar/Euler Form ( A * exp(i * phi) )
When you say you want to interpolate between two complex numbers, do you want to interpolate with respect to the real/imaginary components (1), or with respect to the number's magnitude and phase (2)?
You CAN break things down into real and imaginary components,
X = 2 + 5j
X_real = np.real(X)
X_imag = np.imag(X)
# Interpolate the X_real and X_imag
# Reconstruct X
X2 = X_real + 1j * X_imag
However, in real-life applications that involve complex numbers, such as digital filter design, you quite often want to work with numbers in polar/exponential form.
Therefore, instead of interpolating the np.real() and np.imag() components, you may want to break the number down into magnitude and phase using np.abs() and np.angle() (or np.arctan2()), and interpolate those separately. You might do this, for example, when trying to interpolate the Fourier transform of a digital filter.
Y = 1+2j
mag = np.abs(Y)
phase = np.angle(Y)
The interpolated values can be converted back into complex (Cartesian) numbers using Euler's formula:
# Complex number
y = mag * np.exp( 1j * phase)
# Or if you want the real and imaginary complex components separately,
realPart, imagPart = mag * np.cos(phase) , mag * np.sin(phase)
Depending on what you're doing, this gives you some real flexibility with the interpolation methods you use.
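The difference between the two options is easiest to see on a small case. The sketch below interpolates halfway between two illustrative values, 1 and 1j, once in Cartesian form and once in polar form. The values and the weight w are assumptions chosen for the example, and np.unwrap is used so that the phase does not jump by 2*pi between neighbouring samples:

```python
import numpy as np

z0, z1 = 1 + 0j, 0 + 1j      # two sample values (illustrative)
w = 0.5                      # interpolation weight: 0 gives z0, 1 gives z1

# (1) Cartesian: interpolate real and imaginary parts separately.
cart = (1 - w) * z0 + w * z1

# (2) Polar: interpolate magnitude and unwrapped phase separately.
mags = np.abs([z0, z1])
phases = np.unwrap(np.angle([z0, z1]))
mag = (1 - w) * mags[0] + w * mags[1]
phase = (1 - w) * phases[0] + w * phases[1]
polar = mag * np.exp(1j * phase)

print(abs(cart))    # magnitude shrinks below 1 in Cartesian interpolation
print(abs(polar))   # magnitude is preserved in polar interpolation
```

Both inputs have magnitude 1, but only the polar result keeps that magnitude, which is why the polar form is often preferred for data such as frequency responses.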
I ended up working around the problem, but after learning a good deal more about response surfaces and the like, I now understand that this is a far-from-trivial problem. I could not have expected a simple solution in numpy, and the question would have probably been better placed in a forum on mathematics than on programming.
If I had to tackle such a task again, I'd probably use scikit-learn to try and establish either a co-Kriging interpolation for both components, or two separate Kriging (or more general, Gaussian Process) models which share a common set of model constants, optimized to minimize the combined error amplitude, (i.e.: Full model error square is the sum of both partial model errors)
-- but first I'd go and have a look if there aren't any useful papers on the topic already.
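For completeness, the simpler route mentioned in the question (splitting each array into real and imaginary parts and interpolating all entries at once) can be sketched as follows. This is not kriging: it swaps in scipy.interpolate.RBFInterpolator (available in SciPy 1.7 and later), which accepts vector-valued data at scattered points. The function name interpolate_complex_arrays and the random test data are hypothetical:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def interpolate_complex_arrays(points, arrays, query):
    """Interpolate complex arrays given at scattered points to new points.

    points: (P, D) coordinates, arrays: (P, ...) complex data, query: (Q, D).
    """
    arrays = np.asarray(arrays)
    n_points = arrays.shape[0]
    flat = arrays.reshape(n_points, -1)               # one row per data point
    stacked = np.hstack((flat.real, flat.imag))       # real columns, then imaginary
    model = RBFInterpolator(points, stacked)          # default thin-plate-spline kernel
    pred = model(np.atleast_2d(query))
    k = flat.shape[1]
    result = pred[:, :k] + 1j * pred[:, k:]           # recombine into complex values
    return result.reshape((-1,) + arrays.shape[1:])

rng = np.random.default_rng(0)
points = rng.random((20, 4))                          # 20 scattered points in 4D
arrays = rng.standard_normal((20, 3, 3)) + 1j * rng.standard_normal((20, 3, 3))

out = interpolate_complex_arrays(points, arrays, np.array([[0.5, 0.5, 0.0, 0.0]]))
print(out.shape)
```

Treating real and imaginary parts as independent outputs ignores any coupling between them, which is exactly the limitation that the co-Kriging idea above is meant to address.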
