discrete cosine transform implementation differs from library function - python

I've implemented my own DCT function, but the output differs from scipy's fftpack dct function. I was wondering if anyone knows whether fftpack.dct() does any additional transformations, and if so, what they are?
Note: I've tried subtracting 128 from the data but that just changes the colors, not the frequency locations.
import numpy as np
from numpy import empty, arange, exp, real, imag, pi
from numpy.fft import rfft, irfft
import matplotlib.pyplot as plt
from scipy import fftpack

def dct(x):
    # 1D DCT-II computed via an FFT of the mirrored signal
    N = len(x)
    x2 = empty(2*N, float)
    x2[:N] = x[:]
    x2[N:] = x[::-1]
    X = rfft(x2)
    phi = exp(-1j*pi*arange(N)/(2*N))
    return real(phi*X[:N])

def dct2(x):
    # 2D DCT: apply the 1D DCT to every row, then to every column
    M = x.shape[0]
    N = x.shape[1]
    a = empty([M, N], float)
    X = empty([M, N], float)
    for i in range(M):
        a[i, :] = dct(x[i, :])
    for j in range(N):
        X[:, j] = dct(a[:, j])
    return X
if __name__ == "__main__":
    data = np.array([
        [0, 0,  0, 20,  0, 0, 0],
        [0, 0, 20, 50, 20, 0, 0],
        [0, 7, 50, 90, 50, 7, 0],
        [0, 0, 20, 50, 20, 0, 0],
        [0, 0,  0, 20,  0, 0, 0],
    ])
    X = dct2(data)
    plt.matshow(X)
    X2 = fftpack.dct(data)
    plt.matshow(X2)
(Images of data, X and X2 as rendered by matshow were shown here.)

scipy.fftpack.dct performs the 1D DCT, whereas you implemented the 2D DCT. To perform the 2D DCT with SciPy, use:
X2 = fftpack.dct(fftpack.dct(data, axis=0), axis=1)
This should solve your problem: the resulting matrix for your example (shown in the original answer) matches your implementation up to a constant factor. The constant factor can be controlled with the norm argument of dct; see the SciPy documentation for details.
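If you also want the scaling to line up exactly, a minimal sketch (my addition; it assumes the orthonormal DCT-II variant is what you are after) would be:

from scipy import fftpack
import numpy as np

data = np.array([
    [0, 0,  0, 20,  0, 0, 0],
    [0, 0, 20, 50, 20, 0, 0],
    [0, 7, 50, 90, 50, 7, 0],
    [0, 0, 20, 50, 20, 0, 0],
    [0, 0,  0, 20,  0, 0, 0],
], dtype=float)

# 2D DCT-II with orthonormal scaling: transform along axis 0, then axis 1
X2_ortho = fftpack.dct(fftpack.dct(data, axis=0, norm='ortho'), axis=1, norm='ortho')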

Related

3D Fourier transformation of a gaussian function in python

I'm trying to get the 3D Fourier transform of the Gaussian function e^(-r^(2)/2) in Python using the numpy.fft library.
I've attempted using different FFTs from the library with different inputs, shifting the results with np.fft.fftshift, and trying to find a multiplicative factor, among many other things. The last thing I tried was using the 1D FFT function and then cubing the result; here's the corresponding source code:
import numpy as np

R = float(10)
N = float(100)
y = np.dtype(np.float64)
dr = R/N

def F(x):
    return np.exp(-((x*dr)**2)/2)

Frange = np.arange(1, int(N)+1)
y = np.zeros((int(N)))
i = 0
while i < int(N):
    y[i] = F(Frange[i])
    i += 1
y = y/3
y_fft = np.fft.fftshift(np.abs(np.fft.fft(y)))**3
print(y_fft)
The first values I get:
4.62e-03, 4.63e-03, 4.65e-03, 4.69e-03, 4.74e-03
According to Lado, F. (1971), Numerical Fourier transforms in one, two, and three dimensions for liquid state calculations, the analytic solution to the problem is (2*pi)^(3/2) * e^(-k^(2)/2).
And the first values of the analytic solution with the same values of R and N are:
14.99, 12.92, 10.10, 7.15, 4.58
I also created a DFT program using a formula provided in the previous article which gives the expected results, but I haven't been able to replicate the analytic results in any of my attempts using the NumPy or SciPy fft libraries.
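For reference, the formula the program below implements appears to be the radially symmetric reduction of the 3D transform (this is my reading of the code, not a quote from the article):

$$F(k) \;=\; \int f(r)\, e^{-i\,\mathbf{k}\cdot\mathbf{r}}\, d^3 r \;=\; \frac{4\pi}{k}\int_0^{\infty} r\, f(r)\, \sin(kr)\, dr,$$

which for f(r) = e^(-r^2/2) evaluates to (2*pi)^(3/2) * e^(-k^2/2).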
Here's my program for the analytic and DFT results:
import math
import numpy as np

def F(r):
    x = math.exp((-1/2)*(r**2))
    return x

def FT(r):
    x = ((2*math.pi)**(3/2))*(math.exp((-1/2)*(r**2)))
    return x

R = float(10)
N = int(100)
ft = np.zeros(N)
fta = np.zeros(N)
dr = R/N
dk = math.pi/R

print("\tk \t\t\t Discrete \t\t\t Analytic")
for j in range(1, N):
    kj = j*dk
    # Discrete transform
    sum = 0
    for i in range(1, N):
        ri = i*dr
        sum = sum + (dr*ri*(F(ri))*(math.sin(kj*ri)))
    ft[j] = ((4*math.pi)/kj)*sum
    # Analytic transform
    fta[j] = FT(kj)
    # Print results
    print(kj, f" \t\t{ft[j]:.10E} \t\t{fta[j]:.10E}")
And these are the first few results:
k Discrete Analytic
0.3141592653589793 1.4991263193E+01 1.4991263193E+01
0.6283185307179586 1.2928362116E+01 1.2928362116E+01
0.9424777960769379 1.0101494686E+01 1.0101494686E+01
1.2566370614359172 7.1509645344E+00 7.1509645344E+00
1.5707963267948966 4.5864901093E+00 4.5864901093E+00

1D Wasserstein distance in Python

The formula below is a special case of the Wasserstein distance/optimal transport when the source and target distributions, x and y (also called marginal distributions), are 1D, that is, are vectors:

W_p(u, v) = ( ∫_0^1 |F_u^{-1}(z) - F_v^{-1}(z)|^p dz )^(1/p)

where the F^{-1} are the inverse probability distribution functions of the cumulative distributions of the marginals u and v, derived from real data called x and y, both generated from the normal distribution:
import numpy as np
from numpy.random import randn
import scipy.stats as ss
n = 100
x = randn(n)
y = randn(n)
How can the integral in the formula be coded in Python and SciPy? I'm guessing the x and y have to be converted to ranked marginals, which are non-negative and sum to 1, while SciPy's ppf could be used to calculate the inverse F^{-1}'s?
Note that when n gets large we have that a sorted set of n samples approaches the inverse CDF sampled at 1/n, 2/n, ..., n/n. E.g.:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
plt.plot(norm.ppf(np.linspace(0, 1, 1000)), label="invcdf")
plt.plot(np.sort(np.random.normal(size=1000)), label="sortsample")
plt.legend()
plt.show()
Also note that your integral from 0 to 1 can be approximated as a sum over 1/n, 2/n, ..., n/n.
Thus we can simply answer your question:
def W(p, u, v):
    assert len(u) == len(v)
    return np.mean(np.abs(np.sort(u) - np.sort(v))**p)**(1/p)
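As a quick sanity check (my addition, not part of the original answer), for p = 1 this agrees with scipy.stats.wasserstein_distance:

import numpy as np
from numpy.random import randn
from scipy.stats import wasserstein_distance

u, v = randn(100), randn(100)
print(W(1, u, v))                  # quantile-based estimate from the function above
print(wasserstein_distance(u, v))  # SciPy's 1-Wasserstein distance; should match closely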
Note that if len(u) != len(v) you can still apply the method with linear interpolation:
def W(p, u, v):
    u = np.sort(u)
    v = np.sort(v)
    if len(u) != len(v):
        if len(u) > len(v):
            u, v = v, u
        us = np.linspace(0, 1, len(u))
        vs = np.linspace(0, 1, len(v))
        # resample the shorter sorted sample onto the longer one's grid
        u = np.interp(vs, us, u)
    return np.mean(np.abs(u - v)**p)**(1/p)
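For example (my own quick usage check, assuming normally distributed test data):

import numpy as np
from numpy.random import randn

u = randn(100)
v = randn(80)
print(W(2, u, v))  # 2-Wasserstein estimate between samples of unequal length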
An alternative method if you have prior information about the sort of distribution of your data, but not its parameters, is to find the best fitting distribution on your data (e.g. with scipy.stats.norm.fit) for both u and v and then do the integral with the desired precision. E.g.:
from scipy.stats import norm as gauss

def W_gauss(p, u, v, num_steps):
    ud = gauss(*gauss.fit(u))
    vd = gauss(*gauss.fit(v))
    z = np.linspace(0, 1, num_steps, endpoint=False) + 1/(2*num_steps)
    return np.mean(np.abs(ud.ppf(z) - vd.ppf(z))**p)**(1/p)
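A possible call (my example, reusing normal samples; num_steps just sets the integration resolution):

import numpy as np
from numpy.random import randn

u, v = randn(100), randn(100)
print(W_gauss(1, u, v, num_steps=1000))  # parametric (Gaussian-fit) 1-Wasserstein estimate

Because the quantile difference is evaluated at the midpoints of a uniform grid on (0, 1), increasing num_steps refines the integral without touching the fitted parameters.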
I guess I am a bit late, but this is what I would do for an exact solution (using only numpy):
import numpy as np
from numpy.random import randn
n = 100
m = 80
p = 2
x = np.sort(randn(n))
y = np.sort(randn(m))
a = np.ones(n)/n
b = np.ones(m)/m
# cdfs
ca = np.cumsum(a)
cb = np.cumsum(b)
# points on which we need to evaluate the quantile functions
cba = np.sort(np.hstack([ca, cb]))
# weights for integral
h = np.diff(np.hstack([0, cba]))
# construction of first quantile function
bins = ca + 1e-10  # small tolerance to avoid rounding errors and enforce right continuity
index_qx = np.digitize(cba, bins, right=True)  # right=True because the quantile function is right continuous
qx = x[index_qx]  # quantile function F^{-1}
# construction of second quantile function
bins = cb + 1e-10
index_qy = np.digitize(cba, bins, right=True)  # right=True because the quantile function is right continuous
qy = y[index_qy]  # quantile function G^{-1}
ot_cost = np.sum((qx - qy)**p * h)
print(ot_cost)
In case you are interested, here you can find a more detailed numpy-based implementation of the OT problem on the real line, with dual and primal solutions as well: https://github.com/gnies/1d-optimal-transport. (I am still working on it, though.)

Optimize a distance matrix calculation

I'm trying to calculate a distance matrix from a Fourier transformation for the first two components. The matrix is 40k by 40k, and the way I'm doing it is extremely slow. Is there a way to calculate the matrix in a more efficient, faster way?
import numpy as np
from numpy import sqrt
from scipy.linalg import dft

# Transform the data using the Fourier transform (norm_data is my input array).
ft = norm_data.dot(dft(8).transpose())/sqrt(8)

def ft_distance_calc(x, y):
    temp = np.zeros((x, y))
    for i in range(x):
        for z in range(y):
            temp[i, z] = sqrt(np.square(abs(ft[i, 0:2] - ft[z, 0:2])).sum())
    return temp

ft_distance = ft_distance_calc(40000, 40000)
You can use built-in functions for it:
from scipy.spatial.distance import cdist

def ft_distance_calc_2(x, y):
    return cdist(ft[:x, 0:2], ft[:y, 0:2])
Comparison using benchit:
# OP's solution
def ft_distance_calc(x, y):
    temp = np.zeros((x, y))
    for i in range(x):
        for z in range(y):
            temp[i, z] = np.sqrt(np.square(abs(ft[i, 0:2] - ft[z, 0:2])).sum())
    return temp

# Ehsan's solution
def ft_distance_calc_2(x, y):
    return cdist(ft[:x, 0:2], ft[:y, 0:2])

# Quang's solution
def dist_cal(x, y):
    return np.sqrt(np.square(ft[:x, None, :2] - ft[None, :y, :2]).sum(-1))

ft = np.random.rand(1000, 2)
in_ = {n: [n, n] for n in [10, 100, 1000]}
Seems like ft_distance_calc_2 is the fastest.
How about broadcasting?
def dist_cal(x, y):
    return np.sqrt(np.square(ft[:x, None, :2] - ft[None, :y, :2]).sum(-1))

# test
a = ft_distance_calc(400, 200)
b = dist_cal(400, 200)
(np.abs(a - b) < 1e-6).all()
# True
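A side note (my addition, not part of either answer): the full 40000 x 40000 float64 result alone takes roughly 12.8 GB, so if memory is a concern the cdist approach can be applied in row blocks, for example:

import numpy as np
from scipy.spatial.distance import cdist

def distance_chunked(points, block=2000):
    # Pairwise Euclidean distances computed in row blocks to limit peak memory.
    n = len(points)
    out = np.empty((n, n), dtype=np.float64)
    for start in range(0, n, block):
        stop = min(start + block, n)
        out[start:stop] = cdist(points[start:stop], points)
    return out

# If ft is complex, splitting real and imaginary parts preserves the |a - b|-based distance:
# pts = np.column_stack([ft[:, :2].real, ft[:, :2].imag])
# d = distance_chunked(pts)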

numpy and statsmodels give different values when calculating correlations, How to interpret this?

I can't find a reason why calculating the correlation between two series A and B using numpy.correlate gives me different results than the ones I obtain using statsmodels.tsa.stattools.ccf
Here's an example of this difference I mention:
import numpy as np
from matplotlib import pyplot as plt
from statsmodels.tsa.stattools import ccf
# Calculate correlation using numpy.correlate
def corr(x, y):
    result = np.correlate(x, y, mode='full')
    return result[result.size//2:]

# These are the data series I want to analyze
A = np.array([np.absolute(x) for x in np.arange(-1, 1.1, 0.1)])
B = np.array([x for x in np.arange(-1, 1.1, 0.1)])

# Using numpy I get this
plt.plot(corr(B, A))

# Using statsmodels I get this
plt.plot(ccf(B, A, unbiased=False))
The results seem qualitatively different, where does this difference come from?
statsmodels.tsa.stattools.ccf is based on np.correlate but does some additional things to give the correlation in the statistical sense instead of the signal processing sense; see cross-correlation on Wikipedia. You can see exactly what happens in the source code; it's very simple.
For easier reference I copied the relevant lines below:
def ccovf(x, y, unbiased=True, demean=True):
    n = len(x)
    if demean:
        xo = x - x.mean()
        yo = y - y.mean()
    else:
        xo = x
        yo = y
    if unbiased:
        xi = np.ones(n)
        d = np.correlate(xi, xi, 'full')
    else:
        d = n
    return (np.correlate(xo, yo, 'full') / d)[n - 1:]

def ccf(x, y, unbiased=True):
    cvf = ccovf(x, y, unbiased=unbiased, demean=True)
    return cvf / (np.std(x) * np.std(y))
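So, as a rough sketch (my own illustration, following the copied source for the unbiased=False case), you can reproduce ccf from np.correlate by demeaning and normalizing:

import numpy as np
from statsmodels.tsa.stattools import ccf

A = np.abs(np.arange(-1, 1.1, 0.1))
B = np.arange(-1, 1.1, 0.1)

n = len(B)
xo = B - B.mean()
yo = A - A.mean()
manual = (np.correlate(xo, yo, 'full') / n)[n - 1:] / (np.std(B) * np.std(A))

# unbiased=False as in the question (newer statsmodels versions call this adjusted=False)
print(np.allclose(manual, ccf(B, A, unbiased=False)))  # should print True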

How to change elements in sparse matrix in Python's SciPy?

I have built a small code that I want to use for solving eigenvalue problems involving large sparse matrices. It's working fine; all I want to do now is to set some elements in the sparse matrix to zero, i.e. the ones in the very top row (which corresponds to implementing boundary conditions). I can just adjust the column vectors (C0, C1, and C2) below to achieve that. However, I wondered if there is a more direct way. Evidently, NumPy indexing does not work with SciPy's sparse package.
import scipy.sparse as sp
import scipy.sparse.linalg as la
import numpy as np
import matplotlib.pyplot as plt
#discretize x-axis
N = 11
x = np.linspace(-5,5,N)
print(x)
V = x * x / 2
h = len(x)/(N)
hi2 = 1./(h**2)
#discretize Schroedinger Equation, i.e. build
#banded matrix from difference equation
C0 = np.ones(N)*30. + V
C1 = np.ones(N) * -16.
C2 = np.ones(N) * 1.
diagonals = np.array([-2,-1,0,1,2])
H = sp.spdiags([C2, C1, C0,C1,C2],[-2,-1,0,1,2], N, N)
H *= hi2 * (- 1./12.) * (- 1. / 2.)
#solve for eigenvalues
EV = la.eigsh(H,return_eigenvectors = False)
#check structure of H
plt.figure()
plt.spy(H)
plt.show()
This is a visualisation of the matrix that is built by the code above. I want to set the elements in the first row to zero.
As suggested in the comments, I'll post the answer that I found to my own question. There are several matrix classes in SciPy's sparse package; they are listed here. One can convert sparse matrices from one class to another. So, for what I need to do, I chose to convert my sparse matrix to the class csr_matrix, simply by
H = sp.csr_matrix(H)
Then I can set the elements in the first row to 0 by using the regular NumPy notation:
H[0,0] = 0
H[0,1] = 0
H[0,2] = 0
For completeness, I post the full modified code snippet below.
#SciPy Sparse linear algebra takes care of sparse matrix computations
#http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html
import scipy.sparse as sp
import scipy.sparse.linalg as la
import numpy as np
import matplotlib.pyplot as plt
#discretize x-axis
N = 1100
x = np.linspace(-100,100,N)
V = x * x / 2.
h = len(x)/(N)
hi2 = 1./(h**2)
#discretize Schroedinger Equation, i.e. build
#banded matrix from difference equation
C0 = np.ones(N)*30. + V
C1 = np.ones(N) * -16.
C2 = np.ones(N) * 1.
H = sp.spdiags([C2, C1, C0, C1, C2],[-2,-1,0,1,2], N, N)
H *= hi2 * (- 1./12.) * (- 1. / 2.)
H = sp.csr_matrix(H)
H[0,0] = 0
H[0,1] = 0
H[0,2] = 0
#check structure of H
plt.figure()
plt.spy(H)
plt.show()
EV = la.eigsh(H,return_eigenvectors = False)
Using lil_matrix to change elements is much more efficient in SciPy than the plain NumPy-style indexing on csr_matrix shown above.
H = sp.csr_matrix(H)
HL = H.tolil()
HL[1,1] = 5  # same as the numpy indexing notation
print(HL)
print(HL.todense())  # if a numpy-style matrix is required
H = HL.tocsr()  # if csr is required
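Applied to the original question (my own sketch building on this answer), zeroing the whole first row could look like:

HL = H.tolil()   # LIL format supports cheap element and row assignment
HL[0, :] = 0     # zero the entire first row (boundary condition)
H = HL.tocsr()   # convert back to CSR before calling eigsh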
