I have performed a numpy.fft.rfft on a function to obtain the Fourier coefficients. Since the docs do not seem to contain the exact formula used, I have been assuming a formula found in a textbook of mine:
S(x) = a_0/2 + SUM(real(a_n) * cos(nx) + imag(a_n) * sin(nx))
where imag(a_n) is the imaginary part of the n-th element of the Fourier coefficients.
To translate this into python-speak, I have implemented the following:
import numpy as np

def fourier(freqs, X):
    # input the Fourier coefficients from np.fft.rfft, and arbitrary X
    const_term = np.repeat(np.real(freqs[0])/2, X.shape[0]).reshape(-1, 1)
    # this is the "n" part of the inside of the trig terms
    trig_terms = np.tile(np.arange(1, len(freqs)), (X.shape[0], 1))
    sin_terms = np.imag(freqs[1:])*np.sin(np.einsum('i,ij->ij', X, trig_terms))
    cos_terms = np.real(freqs[1:])*np.cos(np.einsum('i,ij->ij', X, trig_terms))
    return np.concatenate((const_term, sin_terms, cos_terms), axis=1)
This should give me an array of shape [X.shape[0], 2*freqs.shape[0] - 1], containing at entry (i, j) the j-th term of the Fourier decomposition evaluated at the i-th element of X (where the j-th term is a sine term for odd j).
By summing this array over the axis of Fourier terms, I should obtain the function evaluated at the i-th point in X:
import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(-1, 1, 50)
y = X*(X-0.8)*(X+1)

reconstructed_y = np.sum(
    fourier(
        np.fft.rfft(y),
        X
    ),
    axis=1
)

plt.plot(X, y)
plt.plot(X, reconstructed_y, c='r')
plt.show()
In any case, the red line should be basically on top of the blue line. Something has gone wrong either in my assumptions about what numpy.fft.rfft returns, or in my specific implementation, but I am having a hard time tracking down the bug. Can anyone shed some light on what I've done wrong here?
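For reference, np.fft.rfft follows the standard DFT convention c[k] = sum_n y[n]*exp(-2j*pi*k*n/N). Here is a minimal standalone sketch of an explicit reconstruction under that convention (my own check, not the code above): it needs a 1/N normalization, a factor of 2 on every bin except DC (and the Nyquist bin for even N), a minus sign on the sine term, and 2*pi*k*n/N rather than n*x as the trig argument.

import numpy as np

N = 50
y = np.random.rand(N)
c = np.fft.rfft(y)

n = np.arange(N)
recon = np.full(N, np.real(c[0]))
for k in range(1, len(c)):
    # every bin appears twice in the full spectrum, except DC and
    # (for even N) the Nyquist bin at k = N//2
    w = 1.0 if (N % 2 == 0 and k == N // 2) else 2.0
    recon += w * (np.real(c[k]) * np.cos(2*np.pi*k*n/N)
                  - np.imag(c[k]) * np.sin(2*np.pi*k*n/N))
recon /= N
print(np.allclose(recon, y))  # True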
I am interested in integrating in Fourier space after using scipy to take an FFT of some data. I have been following along with the stack exchange post "numerical integration in Fourier space with numpy.fft", but it does not properly integrate a few test cases I have been working with. I have added a few lines to address this issue but still am not recovering the correct integrals. Below is the code I have been using to integrate my test cases; the 3 test cases are at the top of the code.
import numpy as np
import scipy.special as sp
from scipy.fft import fft, ifft, fftfreq
import matplotlib.pyplot as plt
#set number of points in array
Ns = 2**16
#create array in space
x = np.linspace(-np.pi, np.pi, Ns)
#test case 1 from stack exchange post
# y = np.exp(-x**2) # function f(x)
# ys = np.exp(-x**2) * (-2 *x) # derivative f'(x)
#test case 2
# y = np.exp(-x**2) * x - 1/2 *np.sqrt(np.pi)*sp.erf(x)
# ys = np.exp(-x**2) * -2*x**2
#test case 3
y = np.sin(x**2) + (1/4)*np.exp(x)
ys = 1/4*(np.exp(x) + 8*x*np.cos(x**2))
#find spacing in space array
ss = x[1]-x[0]
#define fft integration function
def fft_int(N, s, dydt):
    #create frequency array
    f = fftfreq(N, s)
    #integration step, ignoring divide-by-0 errors
    Fys = fft(dydt)
    with np.errstate(divide="ignore", invalid="ignore"):
        modFys = Fys / (2*np.pi*1j*f)
    #set DC term to 0; it was a nan since we divided by 0
    modFys[0] = 0
    #take inverse fft and subtract the integration constant
    fourier = ifft(modFys)
    fourier = fourier - fourier[0]
    #tilt correction if the function doesn't approach 0 at its ends
    tilt = np.sum(dydt)*s*(np.arange(0, N)/(N-1) - 1/2)
    fourier = fourier + tilt
    return fourier
Test case 1 was from the stack exchange post above. If you copy-paste the code from the top answer and plot the result (solid blue: the FFT integration method; dashed orange: the analytic solution), the two curves differ by a constant offset. I account for this offset with the following line of code:
fourier = fourier-fourier[0]
since I don't believe the code was setting the constant of integration.
Next, for test case 2, the same plot (solid blue: FFT integration; dashed orange: analytic solution) shows a linear tilt in the FFT result. I account for this tilt using the following lines of code:
tilt = np.sum(dydt)*s*(np.arange(0,N)/(N-1) - 1/2)
fourier = fourier + tilt
Finally we arrive at test case 3, where (again with solid blue as the FFT integration and dashed orange as the analytic solution) the offset has appeared again. This is where I'm stuck; I'm not sure why.
TL;DR: How do I correctly integrate a function in Fourier space using scipy.fft?
The tilt component makes no sense. It fixes one function, but it is not a generic solution to the problem.
The problem is that the FFT induces periodicity in the signal, meaning you compute the integral of a different function. Multiplying the FFT of the signal by 1/(2*np.pi*1j*f) is equivalent to a circular convolution of the signal with ifft(1/(2*np.pi*1j*f)). "Circular" is the key here. This is just a boundary problem.
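As a quick standalone illustration of why "circular" matters (my own check, independent of the integration code): multiplying two FFTs and inverting is exactly wrap-around convolution, so the two ends of the sampled interval contaminate each other.

import numpy as np

a = np.random.rand(8)
b = np.random.rand(8)
via_fft = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))
# direct circular convolution: indices wrap around modulo the length
direct = np.array([sum(a[m] * b[(n - m) % 8] for m in range(8)) for n in range(8)])
print(np.allclose(via_fft, direct))  # True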
Padding the function with zeros is one way to attempt to fix this:
import numpy as np
import scipy.special as sp
from scipy.fft import fft, ifft, fftfreq
import matplotlib.pyplot as plt

def fft_int(s, dydt, N=0):
    dydt_padded = np.pad(dydt, (0, N))
    f = fftfreq(dydt_padded.shape[0], s)
    F = fft(dydt_padded)
    with np.errstate(divide="ignore", invalid="ignore"):
        F = F / (2*np.pi*1j*f)
    F[0] = 0
    y_padded = np.real(ifft(F))
    y = y_padded[0:dydt.shape[0]]
    return y - np.mean(y)
N = 2**16
x = np.linspace(-np.pi, np.pi, N)
s = x[1] - x[0]
# Test case 3
y = np.sin(x**2) + (1/4)*np.exp(x)
dy = 1/4*(np.exp(x) + 8*x*np.cos(x**2))
plt.plot(y - np.mean(y))
plt.plot(fft_int(s, dy))
plt.plot(fft_int(s, dy, N))
plt.plot(fft_int(s, dy, 10*N))
plt.show()
(Blue is expected output, computed solution without padding is orange, and with increasing amount of padding, green and red.)
Here I've solved the "offset" problem by plotting all functions with their mean removed. Setting the DC component to 0 is equivalent to subtracting the mean. But after cropping off the padding the mean changes, so fft_int subtracts the mean again after cropping.
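A quick standalone check of that equivalence (my own snippet, assuming nothing beyond numpy):

import numpy as np

v = np.random.rand(100)
F = np.fft.fft(v)
# the DC bin holds N times the mean, so zeroing it subtracts the mean exactly
print(np.allclose(np.real(F[0]) / len(v), v.mean()))       # True
F[0] = 0
print(np.allclose(np.real(np.fft.ifft(F)), v - v.mean()))  # True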
Anyway, note how we get an increasingly better approximation as the padding increases. To get the exact result, one would need an infinite amount of padding, which of course is unrealistic.
Test case #1 doesn't need padding: the function reaches zero at the edges of the sampled domain. We can impose such a behavior on the other cases too; in discrete Fourier analysis this is called windowing. It would look something like this:
def fft_int(s, dydt):
    dydt_windowed = dydt * np.hanning(dydt.shape[0])
    f = fftfreq(dydt.shape[0], s)
    F = fft(dydt_windowed)
    with np.errstate(divide="ignore", invalid="ignore"):
        F = F / (2*np.pi*1j*f)
    F[0] = 0
    y = np.real(ifft(F))
    return y
However, here we get correct integration results only in the middle of the domain, with increasingly suppressed values toward the ends. So this is not a practical solution either.
My conclusion is that no, this is not possible to do. It is much easier to compute the integral with np.cumsum:
yp = np.cumsum(dy) * s
plt.plot(y - np.mean(y))
plt.plot(yp - np.mean(yp))
plt.show()
(not showing output: the two plots overlap perfectly.)
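If the plain rectangle-rule sum from np.cumsum is too crude, the trapezoidal rule is a drop-in refinement. A sketch, assuming a SciPy recent enough to name it cumulative_trapezoid (older versions call it cumtrapz):

import numpy as np
from scipy.integrate import cumulative_trapezoid

x = np.linspace(-np.pi, np.pi, 2**16)
dy = 1/4*(np.exp(x) + 8*x*np.cos(x**2))      # test case 3 derivative
yp = cumulative_trapezoid(dy, x, initial=0)  # same length as dy, starts at 0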
Background: I observe a sample of a variable z that is the sum of two independent and identically distributed variables x and y. I'm trying to recover the distribution of x, y (call it f) from the distribution of z (call it g), under the assumption that f is symmetric about zero. According to Horowitz and Markatou (1996) we have that the Fourier Transform of f is equal to sqrt(|G|), where G is the Fourier transform of g.
Example:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde, laplace
# sample size
size = 10000
# Number of points to perform FFT on
N = 501
# Scale of the laplace rvs
scale = 3.0
# Test deconvolution
laplace_f = laplace(scale=scale)
x = laplace_f.rvs(size=size)
y = laplace_f.rvs(size=size)
z = x + y
t = np.linspace(-4 * scale, 4 * scale, size)
laplace_pdf = laplace_f.pdf(t)
t2 = np.linspace(-4 * scale, 4 * scale, N)
# Get density from z. Kind of cheating using gaussian
z_density = gaussian_kde(z)(t2)
z_density = (z_density + z_density[::-1]) / 2
z_density_half = z_density[:((N - 1) // 2) + 1]
ft_z_density = np.fft.hfft(z_density_half)
inv_fz_density = np.fft.ihfft(np.sqrt(np.abs(ft_z_density)))
inv_fz_density = np.r_[inv_fz_density, inv_fz_density[::-1][:-1]]
f_deconv_shifted = np.real(np.fft.fftshift(inv_fz_density))
f_deconv = np.real(inv_fz_density)
# Normalize to be a pdf
f_deconv_shifted /= f_deconv_shifted.mean()
f_deconv /= f_deconv.mean()
# Plot
plt.subplot(221)
plt.plot(t, laplace_pdf)
plt.title('laplace pdf')
plt.subplot(222)
plt.plot(t2, z_density)
plt.title("z density")
plt.subplot(223)
plt.plot(t2, f_deconv_shifted)
plt.title('Deconvolved with shift')
plt.subplot(224)
plt.plot(t2, f_deconv)
plt.title('Deconvolved without shift')
plt.tight_layout()
plt.show()
Issue: there's clearly something wrong in the resulting plots. I don't think I should need the shift, yet the shifted pdf seems to be closer to the truth. I suspect it has something to do with the domain of the IFFT changing with the sqrt(abs()) operation, but I'm really not sure.
The FFT is defined such that the times associated with the input samples are t = 0..N-1. That is, the origin is in the first sample. The same is true for the output: the associated frequencies are k = 0..N-1.
Your distribution is symmetric about zero, but the FFT function cannot see your t array, and given the t values implied by the FFT definition, your distribution is actually shifted. That shift adds a phase component to the frequency domain. You ignore that phase by using hfft instead of fft, which amounts to treating your input as if it were symmetric about the origin as defined by the FFT (not your origin).
fftshift shifts the signal resulting from the IFFT such that the origin is back where you want it to be. I recommend that you use ifftshift before calling hfft, just to ensure that your signal is actually symmetric as expected by that function. I don't know if it will make a difference; it depends on how that function is implemented.
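A sketch of that recommendation, using a stand-in symmetric density (the Laplace-shaped curve and the grid below are my assumptions, not the kde output from the question):

import numpy as np

N = 501
t2 = np.linspace(-12, 12, N)
density = np.exp(-np.abs(t2) / 3) / 6  # symmetric about t = 0

# ifftshift rotates the array so the sample at t = 0 lands at index 0,
# which is the origin the FFT routines assume
density_shifted = np.fft.ifftshift(density)

# the first half of the shifted array now genuinely starts at the origin,
# so hfft sees a signal that is Hermitian-symmetric by construction
half = density_shifted[:N//2 + 1]
spectrum = np.fft.hfft(half, N)  # real-valued spectrum, no stray phase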
I'm trying to implement the curvature formula for a parametric curve, k = (x'*y'' - y'*x'') / (x'^2 + y'^2)^(3/2), in Python for X and Y points.
I have tried following approach
def f(c):
    """This function computes the curvature of the leaf."""
    tt = c
    n = (tt[0]*tt[3] - tt[1]*tt[2])
    d = (tt[0]**2 + tt[1]**2)
    k = n/d
    R = 1/k  # Radius of Curvature
    return R
There is something incorrect, as it is not giving me the correct result. I think I'm making a mistake while computing the derivatives in the first two lines. How can I fix that?
Here are some of the points which are in a data frame:
pts = pd.DataFrame({'x': x, 'y': y})
x y
0.089631 97.710199
0.089831 97.904541
0.090030 98.099313
0.090229 98.294513
0.090428 98.490142
0.090627 98.686200
0.090827 98.882687
0.091026 99.079602
0.091225 99.276947
0.091424 99.474720
0.091623 99.672922
0.091822 99.871553
0.092022 100.070613
0.092221 100.270102
0.092420 100.470020
0.092619 100.670366
0.092818 100.871142
0.093017 101.072346
0.093217 101.273979
0.093416 101.476041
0.093615 101.678532
0.093814 101.881451
0.094013 102.084800
0.094213 102.288577
pts_x = np.gradient(x_c, t) # first derivatives
pts_y = np.gradient(y_c, t)
pts_xx = np.gradient(pts_x, t) # second derivatives
pts_yy = np.gradient(pts_y, t)
After getting the derivatives, I put x_prim, y_prim, x_prim_prim, and y_prim_prim into another data frame using the following code:
d = pd.DataFrame({'x_prim': pts_x, 'y_prim': pts_y, 'x_prim_prim': pts_xx, 'y_prim_prim':pts_yy})
After having everything in the data frame, I call the function for each row to get the curvature at that point:
# Getting the curvature at each point
curv = []
for i in range(len(d)):
    temp = d.iloc[i]
    c_temp = f(temp)
    curv.append(c_temp)
You do not specify exactly what the structure of the parameter pts is. But it seems that it is a two-dimensional array where each row has two values x and y and the rows are the points in your curve. That itself is problematic, since the documentation is not quite clear on what exactly is returned in such a case.
But you clearly are not getting the derivatives of x or y. If you supply only one array to np.gradient, then numpy assumes that the points are evenly spaced with a distance of one. But that is probably not the case. The x' in your formula means the derivative of x with respect to t, the parameter variable for the curve (which is separate from the parameters of the Python functions). But you never supply the values of t to numpy; they must be the second argument passed to the gradient function.
So to get your derivatives, split the x, y, and t values into separate one-dimensional arrays; let's call them x, y, and t. Then get your first and second derivatives with
pts_x = np.gradient(x, t) # first derivatives
pts_y = np.gradient(y, t)
pts_xx = np.gradient(pts_x, t) # second derivatives
pts_yy = np.gradient(pts_y, t)
Then continue from there. You no longer need the t values to calculate the curvatures, which is the point of the formula you are using. Note that gradient is not really designed to calculate the second derivatives, and it absolutely should not be used to calculate third or higher-order derivatives. More complex formulas are needed for those. Numpy's gradient uses "second order accurate central differences" which are pretty good for the first derivative, poor for the second derivative, and worthless for higher-order derivatives.
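A quick sanity check of this recipe (my own example): on a circle of radius 2, the curvature is exactly 1/2 everywhere.

import numpy as np

t = np.linspace(0, 2*np.pi, 1000)
x = 2*np.cos(t)
y = 2*np.sin(t)

xp = np.gradient(x, t)    # first derivatives w.r.t. the parameter t
yp = np.gradient(y, t)
xpp = np.gradient(xp, t)  # second derivatives (less accurate, as noted above)
ypp = np.gradient(yp, t)

k = np.abs(xp*ypp - yp*xpp) / (xp**2 + yp**2)**1.5
# skip the first and last couple of samples, where one-sided differences are used
print(k[2:-2].min(), k[2:-2].max())  # both very close to 0.5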
I think your problem is that x and y are arrays of double values.
The array x is the independent variable; I'd expect it to be sorted into ascending order. If I evaluate y[i], I expect to get the value of the curve at x[i].
When you call that numpy function you get an array of derivative values that are the same shape as the (x, y) arrays. If there are n pairs from (x, y), then
y'[i] gives the value of the first derivative of y w.r.t. x at x[i];
y''[i] gives the value of the second derivative of y w.r.t. x at x[i].
The curvature k will also be an array with n points:
k[i] = abs(x'[i]*y''[i] - y'[i]*x''[i]) / (x'[i]**2 + y'[i]**2)**1.5
Think of x and y as both being functions of a parameter t. x' = dx/dt, etc. This means curvature k is also a function of that parameter t.
I like to have a well understood closed form solution available when I program a solution.
y(x) = sin(x) for 0 <= x <= pi
y'(x) = cos(x)
y''(x) = -sin(x)
k = sin(x)/(1+(cos(x))**2)**1.5
Now you have a nice formula for curvature as a function of x.
If you want to parameterize it, use
x(t) = pi*t for 0 <= t <= 1
x'(t) = pi
x''(t) = 0
See if you can plot those and make your Python solution match it.
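For instance, here is a minimal sketch checking the numeric gradient against the closed form above (my own code, using the curvature formula for a graph y(x)):

import numpy as np

x = np.linspace(0, np.pi, 500)
y = np.sin(x)

yp = np.gradient(y, x)    # ~cos(x)
ypp = np.gradient(yp, x)  # ~-sin(x)

# curvature of a graph y(x): k = |y''| / (1 + y'**2)**1.5
k_numeric = np.abs(ypp) / (1 + yp**2)**1.5
k_exact = np.sin(x) / (1 + np.cos(x)**2)**1.5
print(np.max(np.abs(k_numeric - k_exact)[2:-2]))  # small away from the endpoints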
I need to use the discrete Fourier transform (DFT) and inverse DFT in Python, and the results I obtain are a bit weird, so I tried a small example, and I am not sure whether the mistake is in the math or in the code. Here is my small version of the code:
from __future__ import division
import numpy as np

def f(x):
    return np.sin(x)

theta = np.arange(0, 2*np.pi, 2*np.pi/4)
k = np.arange(0, 4, 1)
x = f(theta)
y = np.fft.fft(x)
derivative = np.fft.ifft(1j*k*y)
print(derivative)
So what I do is sample sin at 4 different points between 0 and 2*pi and collect these numbers in a vector x. Then I take the DFT of x to get y. What I want is the derivative of sin at the chosen points, so I multiply y by the wave number k (which in this case would be 0,1,2,3) and by the imaginary unit 1j (because in the Fourier sum each term has the form e^{ikx}). In the end I take the inverse DFT of 1j*k*y, which is supposed to give the derivative of sin. But what I get is this:
[ -1.00000000e+00 -6.12323400e-17j -6.12323400e-17 +2.00000000e+00j
1.00000000e+00 +1.83697020e-16j 6.12323400e-17 -2.00000000e+00j]
when I was supposed to get this
[1,0,-1,0]
ignoring round-off errors. Can someone tell me what am I doing wrong? Thank you!
The FFT of a real signal is Hermitian-symmetric: the coefficient at bin N-k is the complex conjugate of the coefficient at bin k. Manipulation of the spectrum must preserve this Hermitian symmetry if the inverse FFT is to yield a real result. Accordingly, the derivative operator in the frequency domain is defined over the lower half of the spectrum, and the upper half is constructed from symmetry. Note that for a spectrum of even size, the value at exactly N/2 must be its own symmetric counterpart, hence must have an imaginary part of 0. The following illustrates how to construct this derivative operator:
N = len(y)
if N % 2 == 0:
    # even N: the Nyquist bin at N/2 must stay real, so its weight is 0
    derivative_operator = np.concatenate((np.arange(0, N//2), [0], np.arange(-N//2 + 1, 0)))*1j
else:
    derivative_operator = np.concatenate((np.arange(0, N//2 + 1), np.arange(-(N//2), 0)))*1j
You'd use this derivative_operator in the frequency domain as follows:
derivative = np.fft.ifft(derivative_operator*y)
In your sample case, you should then get the following result:
[ 1.00000000e+00+0.j 6.12323400e-17+0.j
-1.00000000e+00+0.j -6.12323400e-17+0.j]
which is within roundoff errors of your expected [1,0,-1,0].
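Putting it together for the original 4-point example (a standalone rerun of the pieces above):

import numpy as np

N = 4
theta = np.arange(0, 2*np.pi, 2*np.pi/N)
y = np.fft.fft(np.sin(theta))

# even N: wave numbers [0, 1, ..., N/2-1, 0, -N/2+1, ..., -1]
derivative_operator = np.concatenate((np.arange(0, N//2), [0],
                                      np.arange(-N//2 + 1, 0)))*1j
derivative = np.fft.ifft(derivative_operator*y)
print(np.real(derivative))  # ~[1, 0, -1, 0], i.e. cos at the sample points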
I have used numpy's polyfit and obtained a very good fit (using a 7th order polynomial) for two arrays, x and y. My relationship is thus;
y(x) = p[0]* x^7 + p[1]*x^6 + p[2]*x^5 + p[3]*x^4 + p[4]*x^3 + p[5]*x^2 + p[6]*x^1 + p[7]
where p is the polynomial array output by polyfit.
Is there a way to reverse this method easily, so I have a solution in the form of,
x(y) = p[0]*y^n + p[1]*y^(n-1) + ... + p[n]*y^0
No, there is no easy way in general. Closed-form solutions for the roots of arbitrary polynomials do not exist above degree four, so certainly not for a seventh-order polynomial.
Doing the fit in the reverse direction is possible, but only on monotonically varying regions of the original polynomial. If the original polynomial has minima or maxima on the domain you are interested in, then even though y is a function of x, x cannot be a function of y because there is no 1-to-1 relation between them.
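For completeness: on a single monotonic region you can also invert numerically without refitting at all, for example with np.interp. A sketch (my own addition, assuming the monotonic polynomial y = x**3 + x):

import numpy as np

p = [1.0, 0.0, 1.0, 0.0]  # y = x**3 + x, highest-degree coefficient first
xs = np.linspace(-1.0, 1.0, 1000)
ys = np.polyval(p, xs)    # monotonically increasing on this interval

# tabulated inverse: for a query y, interpolate x from the (ys, xs) table
def x_of_y(yq):
    return np.interp(yq, ys, xs)

print(x_of_y(2.0))  # ~1.0, since y(1) = 2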
If you are (i) OK with redoing the fitting procedure, and (ii) OK with working piecewise on single monotonic regions of your fit at a time, then you could do something like this:
import numpy as np

# generate a random coefficient vector a
degree = 1
a = 2 * np.random.random(degree+1) - 1

# an assumed true polynomial y(x)
def y_of_x(x, coeff_vector):
    """
    Evaluate a polynomial with coeff_vector and degree len(coeff_vector)-1 using Horner's method.
    Coefficients are ordered by increasing degree, from the constant term at coeff_vector[0],
    to the linear term at coeff_vector[1], to the n-th degree term at coeff_vector[n].
    """
    coeff_rev = coeff_vector[::-1]
    b = 0
    for c in coeff_rev:  # renamed from 'a' to avoid shadowing the global coefficients
        b = b * x + c
    return b

# generate some data
my_x = np.arange(-1, 1, 0.01)
my_y = y_of_x(my_x, a)

# verify that polyfit in the "traditional" direction gives the correct result
# [::-1] b/c polyfit returns coeffs in backwards order rel. to y_of_x()
p_test = np.polyfit(my_x, my_y, deg=degree)[::-1]
print(p_test, a)

# fit the data using polyfit, but with y as the independent var and x as the dependent var
p = np.polyfit(my_y, my_x, deg=degree)[::-1]

# define x as a function of y
def x_of_y(yy, a):
    return y_of_x(yy, a)

# compare results
import matplotlib.pyplot as plt
plt.plot(my_x, my_y, '-b', x_of_y(my_y, p), my_y, '-r')
plt.show()
Note: this code does not check for monotonicity but simply assumes it.
By playing around with the value of degree, you should see that the code only works well for all random values of a when degree=1. It occasionally does OK for other degrees, but not when there are lots of minima / maxima. It never does perfectly for degree > 1 because approximating parabolas with square-root functions doesn't always work, etc.