As it turns out this is still a question of floating point rounding error like others. The asymmetry in fft vs ifft absolute error comes from the difference in the magnitudes of the numbers (1e10 vs 1e8).
There are many questions about the differences between Numpy/Scipy and MATLAB FFTs; however, most of these come down to floating point rounding errors and the fact that MATLAB will turn elements on the order of 1e-15 into true zeros, which is not what I'm after.
I am seeing a totally different issue where, for identical inputs, the Numpy/Scipy FFTs produce differences on the order of 1e-6 from MATLAB. At the same time, for identical inputs, the Numpy/Scipy IFFTs produce differences on the order of 1e-9. My data is a complex 1D vector of length 2^14 with the zero point in the middle of the array (if you know how to share this, let me know). As such, for both languages I am calling fftshift before and after the fft (ifft) operation.
My question is where this difference is coming from and, more importantly, why it is asymmetric between the fft and ifft. I can live with a small difference, but 1e-6 is large when it accumulates over a large number of FFTs.
The functional form of the fft (I'm not doing anything else to it) for either language is:
def myfft(myData):
    return fftshift(fft(fftshift(myData)))
def myifft(myData):
    return fftshift(ifft(fftshift(myData)))
I have the data saved in a .mat file and load it with scipy.io.loadmat into python. The data is a (2**14,) numpy array
The fft differences are calculated and plotted with
myData = loadmat('mydata.mat',squeeze_me=True)
py = myfft(myData['fft_IN'])
mat = myData['fft_OUT']
plt.title('FFT Difference')
and the ifft differences are calculated with
myData = loadmat('mydata.mat',squeeze_me=True)
py = myifft(myData['ifft_IN'])
mat = myData['ifft_OUT']
plt.title('IFFT Difference')
As it turns out this is still a question of floating point rounding error like all the other MATLAB vs numpy fft questions.
For my data, the output of the fft function has numbers on the order of 1e10. A relative precision of around 1e-16 on a float of this size corresponds to an absolute error of up to about 1e-6. The asymmetry in fft vs ifft absolute error comes from the output of the ifft being around 1e8, where the same relative precision gives an absolute error of up to about 1e-8, which is what we see.
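The scaling argument can be checked with np.spacing, which returns the gap between a float64 value and the next representable one. This is only a sketch: the magnitudes 1e10 and 1e8 are the ones quoted above, and one spacing is a floor on the absolute error, since a full FFT accumulates several rounding errors.

```python
import numpy as np

# np.spacing(x) is the distance from x to the next representable float64.
for magnitude in (1.0, 1e8, 1e10):
    print(f"{magnitude:g}: spacing = {np.spacing(magnitude):.3e}")

spacing_fft = np.spacing(1e10)   # scale of the FFT output in the question
spacing_ifft = np.spacing(1e8)   # scale of the IFFT output in the question

# The ratio shows how much coarser the float64 grid is at the larger magnitude.
print("ratio:", spacing_fft / spacing_ifft)
```

The absolute resolution grows with the magnitude of the numbers, so the same relative accuracy looks like a much larger absolute difference for the fft output than for the ifft output.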
Credit for this goes to @CrisLuengo, who also helpfully pointed out that the ordering of fftshift and ifftshift matters for proper handling of odd-length arrays.
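To illustrate the point about odd lengths, here is a minimal sketch. The helper names centered_fft and centered_ifft are made up for this example; they assume the zero point sits in the middle of the input, as in the question.

```python
import numpy as np
from numpy.fft import fft, fftshift, ifft, ifftshift

x = np.arange(5)                   # odd length

shifted = fftshift(x)              # zero index moved to the centre
restored = ifftshift(shifted)      # ifftshift undoes fftshift for any length
double_shift = fftshift(shifted)   # only undoes it when the length is even

def centered_fft(data):
    """FFT of data whose zero point sits in the middle of the array."""
    return fftshift(fft(ifftshift(data)))

def centered_ifft(data):
    """Inverse of centered_fft (hypothetical helper name)."""
    return fftshift(ifft(ifftshift(data)))

# Round trip on an odd-length Gaussian.
signal = np.exp(-np.linspace(-1, 1, 255) ** 2 / (2 * 0.25 ** 2))
roundtrip = centered_ifft(centered_fft(signal))
print(np.max(np.abs(roundtrip.real - signal)))
```

For even lengths (such as 2^14) fftshift and ifftshift do the same thing, so the original code is unaffected; the distinction only matters once the length is odd.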
You'll have to come up with a better workable example to show what you're after (also, I don't have MATLAB, just Octave, and that is likely true of many others). I ran a quick round trip of fft and back with no issues. Be aware that DFTs (FFTs) are generally extremely nuanced to work with: you need to consider sampling, windowing, etc. very carefully.
Also, why the comparison to MATLAB to begin with, are you trusting it more, or just want to learn more about why one package produces an answer vs another? MATLAB uses fftw under the hood, which is very well tested and documented, but it doesn't mean that all the above nuances aren't coming into play in a different way.
import numpy as np
import matplotlib.pyplot as plt
fft = np.fft.fft
ifft = np.fft.ifft
def myfft(myData):
    return fft(myData)
def myifft(myData):
    return ifft(myData)
myData = np.exp(-np.linspace(-1, 1, 256)**2 / (2 * .25**2))
fft_python = myifft(myfft(myData))
plt.plot(myData - fft_python.real)
plt.title('FFT Difference')
I've computed the eigenvalues and eigenstates of a Hamiltonian in Python. I have a matrix containing all the wavefunctions in discrete space psi. I'd like to normalise the total wavefunction (or the 'ket') (i.e the matrix of vectors) such that its modulus squared integrates to 1.
I've tried the following:
A= np.linalg.norm(abs(psi.T)**2)
The matrix is transposed so I can access each state using psi[n].
However, the output of the print statement is:
When it should be 1. I feel like I'm not using linalg.norm correctly. I've also tried using my own integral function based on the trapezium rule, with no success.
I'm not really sure as to what to do at this point. Any help would be great.
It seems you're confusing np.linalg.norm and np.sum. Up to the usual floating point issues, these two snippets should be identical (np.abs is needed in the first one if psi is complex):
normed_psi = psi.T / np.sqrt(np.sum(np.abs(psi.T)**2))
normed_psi = psi.T / np.linalg.norm(psi.T)
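A small sketch of the difference between normalising the whole matrix and normalising each state. The random psi, the grid spacing dx and the simple rectangle-rule integral are all assumptions made for illustration, not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the eigenvector matrix: 200 grid points, 5 states.
psi = rng.normal(size=(200, 5)) + 1j * rng.normal(size=(200, 5))
dx = 0.05  # assumed grid spacing

# Whole matrix scaled to unit Frobenius norm (what the two lines above do).
normed_all = psi.T / np.linalg.norm(psi.T)

# Each state normalised separately so that sum(|psi_n|^2) * dx == 1.
states = psi.T                                   # states[n] is the n-th state
norms = np.sqrt(np.sum(np.abs(states) ** 2, axis=1) * dx)
normed_states = states / norms[:, np.newaxis]

integrals = np.sum(np.abs(normed_states) ** 2, axis=1) * dx
print(integrals)
```

If what you want is each psi[n] integrating to 1 on its own, the per-state version with axis=1 is probably closer to it than a single norm over the full matrix.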
I want to translate this MATLAB code into Python, I guess I did everything right, even though I didn't get the same results.
MATLAB script:
n=2 %Filter_Order
Wn=[0.4 0.6] %# Normalized cutoff frequencies
[b,a] = butter(n,Wn,'bandpass') % Transfer function coefficients of the filter
Python script:
import numpy as np
from scipy import signal
n=2 #Filter_Order
Wn=np.array([0.4,0.6]) # Normalized cutoff frequencies
b, a = signal.butter(n, Wn, btype='band') #Transfer function coefficients of the filter
a coefficients in MATLAB: 1, -5.55e-16, 1.14, -1.66e-16, 0.41
a coefficients in Python: 1, -2.77e-16, 1.14, -1.94e-16, 0.41
Could it just be a question of precision, since the two different values (the 2nd and 4th) are both on the order of 10^(-16)?!
The b coefficients are the same on the other hand.
Your machine precision is about 1e-16 (in MATLAB this can be checked easily with eps; in Python, np.finfo(float).eps gives the same value, about 2.2e-16). The 'error' you are dealing with is thus on the order of machine precision, i.e. the two results are equal to within floating-point rounding.
Also of note is that MATLAB ~= Python (or != in Python), thus the implementations of butter() on one hand and signal.butter() on the other will be slightly different, even if you use the exact same numbers, for example in the order in which the floating-point operations are carried out.
It rarely matters to have coefficients differing 16 orders of magnitude; the smaller ones would be essentially neglected. In case you do need exact values, consider using either symbolic math, or some kind of Variable Precision Arithmetic (vpa() in MATLAB), but I guess that in your case the difference is irrelevant.
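As a rough sketch of how to compare such coefficient vectors in practice: the arrays below are just the rounded values quoted in the question, and the tolerances are arbitrary choices for illustration.

```python
import numpy as np

eps = np.finfo(float).eps  # machine epsilon for float64, about 2.2e-16

# Denominator coefficients as quoted in the question (rounded for display).
a_matlab = np.array([1, -5.55e-16, 1.14, -1.66e-16, 0.41])
a_python = np.array([1, -2.77e-16, 1.14, -1.94e-16, 0.41])

# A purely relative test fails on the entries that are "really" zero ...
relative_only = np.allclose(a_matlab, a_python, rtol=1e-9, atol=0.0)
# ... while allowing an absolute tolerance of a few eps treats them as equal.
with_atol = np.allclose(a_matlab, a_python, rtol=1e-9, atol=10 * eps)

print(relative_only, with_atol)
```

Values that should be exactly zero come out as rounding noise, so a relative comparison on them is meaningless; an absolute tolerance tied to eps is the usual way around that.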
I wrote a C++ wrapper class to some functions in LAPACK. In order to test the class, I use the Python C Extension, where I call numpy, and do the same operations, and compare the results by taking the difference
For example, for the inverse of a matrix, I generate a random matrix in C++, then pass it as a string (with many, many digits, like 30 digits) to Python's terminal using PyRun_SimpleString, and assign the matrix as numpy.matrix(...,dtype=numpy.double) (or numpy.complex128). Then I use numpy.linalg.inv() to calculate the inverse of the same matrix. Finally, I take the difference between numpy's result and my result, and use numpy.isclose with a specific relative tolerance to see whether the results are close enough.
The problem: when I use C++ floats, the relative precision I need to be able to compare is about 1e-2!!! And even with this relative precision I get some statistical failures (with low probability).
Doubles are fine... I can do 1e-10 and it's statistically safe.
While I know that floats have an intrinsic precision of about 1e-6, I'm wondering why I have to go as low as 1e-2 to be able to compare the results, and it still fails sometimes!
So, going so low down to 1e-2 got me wondering whether I'm thinking about this whole thing the wrong way. Is there something wrong with my approach?
Please ask for more details if you need it.
Update 1: Eric requested example of Python calls. Here is an example:
//create my matrices
Matrix<T> mat_d = RandomMatrix<T>(...);
auto mat_d_i = mat_d.getInverse();
//I store everything in the dict 'data'
//original matrix
//mat_d.asString(...) will return in the format [[1,2],[3,4]], where 32 is 32 digits per number
PyRun_SimpleString(std::string("data['a']=np.matrix(" + mat_d.asString(32,'[',']',',') + ",dtype=np.complex128)").c_str());
//pass the inverted matrix to Python
PyRun_SimpleString(std::string("data['b_c']=np.matrix(" + mat_d_i.asString(32,'[',']',',') + ",dtype=np.complex128)").c_str());
//inverse in numpy
//flatten the matrices to make comparing them easier (make them 1-dimensional)
//make the comparison. The function compare_floats(f1,f2,t) calls numpy.isclose(f1,f2,rtol=t)
//prec is an integer that takes its value from a template function, where I choose the precision I want based on type
PyRun_SimpleString(std::string("res=list(set([compare_floats(data['fb_p'][i],data['fb_c'][i],1e-"+ std::to_string(prec) +") for i in range(len(data['fb_p']))]))[0]").c_str());
//the set above eliminates repeated True and False. If all results are True, we expect that res=[True], otherwise, the test failed somewhere
PyRun_SimpleString(std::string("res = ((len(res) == 1) and res[0])").c_str());
//Now if res is True, then success
Comments in the code describe the procedure step-by-step.
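For reference, the same comparison can be sketched entirely in NumPy. This is not the code from the question: the matrix size, the seed and the tolerance are assumptions. One thing it lets you look at is the condition number, since the error of a computed inverse generally scales with roughly cond(A) times the machine epsilon of the type, which for floats can be far above 1e-6.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
a64 = rng.standard_normal((n, n))     # hypothetical random test matrix
a32 = a64.astype(np.float32)

inv64 = np.linalg.inv(a64)
inv32 = np.linalg.inv(a32)            # computed in single precision

def residual(a, a_inv):
    """Max-norm of A @ A^-1 - I, evaluated in double precision."""
    a = a.astype(np.float64)
    a_inv = a_inv.astype(np.float64)
    return np.max(np.abs(a @ a_inv - np.eye(a.shape[0])))

res64 = residual(a64, inv64)
res32 = residual(a32, inv32)
cond = np.linalg.cond(a64)
print(f"cond = {cond:.1e}, residual float64 = {res64:.1e}, float32 = {res32:.1e}")

# Fraction of entries that pass an element-wise relative test at 1e-4.
pass_rate = np.mean(np.isclose(inv32, inv64, rtol=1e-4, atol=0.0))
print(f"entries passing rtol=1e-4: {pass_rate:.1%}")
```

An element-wise relative test is also harsh on entries of the inverse that happen to be close to zero, so a norm-based check (or a nonzero atol) may be a fairer comparison for random matrices.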
I'm trying to replicate an N dimensional Delaunay triangulation that is performed by the Matlab delaunayn function in Python using the scipy.spatial.Delaunay function. However, while the Matlab function gives me the result I want and expect, scipy is giving me something different. I find this odd considering both are wrappers of the QHull library. I assume Matlab is implicitly setting different parameters in its call. The situation I'm trying to replicate between the two of them is found in Matlab's documentation.
The set up is to have a cube with a point in the center as below. The blue lines I provided to help visualize the shape, but they serve no purpose or meaning for this problem.
The triangulation I expect from this results in 12 simplices (listed in the Matlab example) and looks like the following.
However this python equivalent produces "extra" simplices.
import numpy as np
import scipy.spatial
x = np.array([[-1,-1,-1],[-1,-1,1],[-1,1,-1],[1,-1,-1],[1,1,1],[1,1,-1],[1,-1,1],[-1,1,1],[0,0,0]])
simp = scipy.spatial.Delaunay(x).simplices
The returned variable simp should be an M x N array where M is the number of simplices found (should be 12 for my case) and N is the number of points in the simplex. In this case, each simplex should be a tetrahedron meaning N is 4.
What I'm finding though is that M is actually 18 and that the extra 6 simplices are not tetrahedrons, but rather the 6 faces of the cube.
What's going on here? How can I limit the returned simplices to only be tetrahedrons? I used this simple case to demonstrate the problem so I'd like a solution that isn't tailored to this problem.
Thanks to an answer by Amro, I was able to figure this out and I can get a match in simplices between Matlab and Scipy. There were two factors in play. First, as pointed out, Matlab and Scipy use different QHull options. Second, QHull returns simplices with zero volume. Matlab removes these, Scipy doesn't. That was obvious in the example above because all 6 extra simplices were the zero-volume coplanar faces of the cube. These can be removed, in N dimensions, with the following bit of code.
N = 3 # The dimensions of our points
# points is the (M, N) array of input coordinates (x in the question)
options = 'Qt Qbb Qc' if N <= 3 else 'Qt Qbb Qc Qx' # Set the QHull options
tri = scipy.spatial.Delaunay(points, qhull_options = options).simplices
keep = np.ones(len(tri), dtype = bool)
for i, t in enumerate(tri):
    if abs(np.linalg.det(np.hstack((points[t], np.ones([1,N+1]).T)))) < 1E-15:
        keep[i] = False # Simplex has zero volume, we don't want to keep it
tri = tri[keep]
I suppose the other conditions should be addressed, but I'm guaranteed that my points contain no duplicates already, and the orientation condition appears to have no effect on the outputs that I can discern.
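A vectorised variant of the same filter, as a sketch rather than a drop-in replacement. It computes each simplex volume from its edge vectors and uses a tolerance relative to the largest simplex instead of a fixed 1E-15; that threshold is my own assumption and may need tuning for other data.

```python
import math

import numpy as np
import scipy.spatial

# Cube corners plus the centre point, as in the question.
points = np.array([[-1, -1, -1], [-1, -1, 1], [-1, 1, -1], [1, -1, -1],
                   [1, 1, 1], [1, 1, -1], [1, -1, 1], [-1, 1, 1],
                   [0, 0, 0]], dtype=float)
ndim = points.shape[1]

tri = scipy.spatial.Delaunay(points, qhull_options='Qt Qbb Qc').simplices

# Edge vectors from the first vertex of every simplex: shape (M, ndim, ndim).
corners = points[tri]
edges = corners[:, 1:, :] - corners[:, :1, :]

# |det| / ndim! is the simplex volume; np.linalg.det works on stacked matrices.
volumes = np.abs(np.linalg.det(edges)) / math.factorial(ndim)

# Tolerance relative to the largest simplex (an assumption, tune as needed).
keep = volumes > 1e-12 * volumes.max()
tetrahedra = tri[keep]

print(len(tri), "simplices returned,", len(tetrahedra), "kept")
print("total volume of kept simplices:", volumes[keep].sum())
```

A quick sanity check is that the kept volumes add up to the volume of the convex hull, which is 8 for this cube.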
Some notes comparing MATLAB and SciPy functions:
According to MATLAB docs, by default it uses Qt Qbb Qc Qhull options for 3-dimensional input, while SciPy uses Qt Qbb Qc Qz.
Not sure if it matters, but your NumPy array is not in the same order as the points created with ndgrid in MATLAB.
In fact if you look at the MATLAB code in edit delaunayn.m, you can see three extra steps performed:
first it merges duplicate points mergeDuplicatePoints (this is not an issue in your case)
then it enforces an orientation convention for the points (see the code)
finally after getting the result from Qhull (implemented as a MEX-function qhullmx), there is the following comment above a few lines of code:
Strip the zero volume simplices that may have been created by the presence of degeneracy.
Since the file is copyrighted, I won't post the code here, but you can check it on your end.
I get a 512^3 array representing a Temperature distribution from a simulation (written in Fortran). The array is stored in a binary file that's about 1/2G in size. I need to know the minimum, maximum and mean of this array and as I will soon need to understand Fortran code anyway, I decided to give it a go and came up with the following very easy routine.
integer gridsize,unit,j
real mini,maxi
double precision mean
read(unit=unit) tmp
do j=2,gridsize**3
read(unit=unit) tmp
end if
end do
This takes about 25 seconds per file on the machine I use. That struck me as being rather long and so I went ahead and did the following in Python:
import numpy
Now, I expected this to be faster of course, but I was really blown away. It takes less than a second under identical conditions. The mean deviates from the one my Fortran routine finds (which I also ran with 128-bit floats, so I somehow trust it more) but only on the 7th significant digit or so.
How can numpy be so fast? I mean you have to look at every entry of an array to find these values, right? Am I doing something very stupid in my Fortran routine for it to take so much longer?
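For readers following along, a NumPy routine of this kind might look roughly like the sketch below. It is not the code from the question: the file name, the small grid and the raw (stream) layout without Fortran record markers are assumptions, and the temporary file is only there to make the example self-contained.

```python
import os
import tempfile

import numpy as np

# Write a small stand-in file; the real file would hold 512**3 single-precision values.
gridsize = 16
data = np.random.default_rng(0).random(gridsize ** 3).astype(np.float32)
path = os.path.join(tempfile.mkdtemp(), "T.out")  # placeholder file name
data.tofile(path)

# Read the whole file in one call, then reduce over the array in compiled code.
temperature = np.fromfile(path, dtype=np.float32)
mini = temperature.min()
maxi = temperature.max()
mean = temperature.mean(dtype=np.float64)  # accumulate in double precision
print(mini, maxi, mean)
```

The point is that there is one large read followed by three reductions over an in-memory array, with no per-element I/O.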
To answer the questions in the comments:
Yes, also I ran the Fortran routine with 32-bit and 64-bit floats but it had no impact on performance.
I used iso_fortran_env which provides 128-bit floats.
Using 32-bit floats my mean is off quite a bit though, so precision is really an issue.
I ran both routines on different files in different order, so the caching should have been fair in the comparison, I guess?
I actually tried OpenMP, but only to read from the file at different positions at the same time. Having read your comments and answers this sounds really stupid now, and it made the routine take a lot longer as well. I might give it a try on the array operations, but maybe that won't even be necessary.
The files are actually 1/2G in size, that was a typo, Thanks.
I will try the array implementation now.
I implemented what @Alexander Vogt and @casey suggested in their answers, and it is as fast as numpy, but now I have a precision problem, as @Luaan pointed out I might. Using a 32-bit float array, the mean computed by sum is 20% off. Doing
real,allocatable :: tmp (:,:,:)
double precision,allocatable :: tmp2(:,:,:)
solves the issue but increases computing time (not by very much, but noticeably).
Is there a better way to get around this issue? I couldn't find a way to read singles from the file directly to doubles.
And how does numpy avoid this?
Thanks for all the help so far.
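On the NumPy side of that last question: as far as I know, np.sum uses pairwise summation on contiguous data, and both np.sum and np.mean accept dtype=np.float64 to accumulate in double precision without first building a double copy of the array. The sketch below compares a plain running sum in float32 with those two options; the array size and the constant value 0.1 are arbitrary choices for illustration.

```python
import numpy as np

n = 2 ** 22
x = np.full(n, 0.1, dtype=np.float32)
exact = float(x[0]) * n                # reference computed in double precision

# np.cumsum adds the elements one after another in float32 (a running sum).
running = float(np.cumsum(x)[-1])
# np.sum on a contiguous array uses pairwise summation, still in float32.
pairwise = float(np.sum(x))
# Requesting a float64 accumulator avoids converting the whole array first.
in_double = float(np.sum(x, dtype=np.float64))

for name, value in [("running", running), ("pairwise", pairwise),
                    ("float64", in_double)]:
    print(f"{name:9s} relative error = {abs(value - exact) / exact:.2e}")
```

A running single-precision sum loses accuracy once the accumulator is much larger than the individual terms, which is the same effect as the 20% error seen with the Fortran sum.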
Your Fortran implementation suffers two major shortcomings:
You mix IO and computations (and read from the file entry by entry).
You don't use vector/matrix operations.
This implementation does perform the same operation as yours and is faster by a factor of 20 on my machine:
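program test
  implicit none
  integer :: gridsize, unit
  real :: mini, maxi, mean
  real, allocatable :: tmp(:,:,:)

  gridsize = 512  ! grid size from the question
  ! File name and access mode are assumptions; adjust them to match your file.
  open(newunit=unit, file='T.out', status='old', access='stream', &
       form='unformatted', action='read')
  allocate(tmp(gridsize, gridsize, gridsize))
  read(unit=unit) tmp
  close(unit)

  mini = minval(tmp)
  maxi = maxval(tmp)
  mean = sum(tmp)/gridsize**3
  print *, mini, maxi, mean
end program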
The idea is to read in the whole file into one array tmp in one go. Then, I can use the functions MAXVAL, MINVAL, and SUM on the array directly.
For the accuracy issue: Simply using double precision values and doing the conversion on the fly as
mean = sum(real(tmp, kind=kind(1.d0)))/real(gridsize**3, kind=kind(1.d0))
only marginally increases the calculation time. I tried performing the operation element-wise and in slices, but that only increased the required time at the default optimization level.
At -O3, the element-wise addition performs ~3 % better than the array operation. The difference between double and single precision operations is less than 2% on my machine - on average (the individual runs deviate by far more).
Here is a very fast implementation using LAPACK:
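program test
  implicit none
  integer :: gridsize, unit, i, j
  real :: mini, maxi
  integer :: t1, t2, rate
  real, allocatable :: tmp(:,:,:)
  real, allocatable :: work(:)
  ! double precision :: mean
  real :: mean
  real :: slange  ! LAPACK function: norm of a single precision matrix

  call system_clock(count_rate=rate)
  call system_clock(t1)
  gridsize = 512  ! grid size from the question
  ! File name and access mode are assumptions; adjust them to match your file.
  open(newunit=unit, file='T.out', status='old', access='stream', &
       form='unformatted', action='read')
  allocate(tmp(gridsize, gridsize, gridsize), work(gridsize))
  read(unit=unit) tmp
  close(unit)

  mini = minval(tmp)
  maxi = maxval(tmp)
  ! mean = sum(tmp)/gridsize**3
  ! mean = sum(real(tmp, kind=kind(1.d0)))/real(gridsize**3, kind=kind(1.d0))
  mean = 0.0
  do j = 1, gridsize
    do i = 1, gridsize
      ! 1-norm of one column = sum of the absolute values in that column
      mean = mean + slange('1', gridsize, 1, tmp(:,i,j), gridsize, work)
    enddo !i
  enddo !j
  mean = mean / gridsize**3
  print *, mini, maxi, mean
  call system_clock(t2)
  print *, real(t2-t1)/real(rate)
end program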
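This uses the single precision matrix 1-norm SLANGE on matrix columns. The run-time is even faster than the approach using single precision array functions, and it does not show the precision issue. Note that the 1-norm sums absolute values, so this only gives the mean if all entries are non-negative (which should hold for absolute temperatures).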
NumPy is faster because you wrote much more efficient code in Python (and much of the NumPy backend is written in optimized Fortran and C) and terribly inefficient code in Fortran.
Look at your Python code. You load the entire array at once and then call functions that can operate on an array.
Look at your Fortran code. You read one value at a time and do some branching logic with it.
The majority of your discrepancy is the fragmented IO you have written in Fortran.
You can write the Fortran in just about the same way as you wrote the Python, and you'll find it runs much faster that way.
program test
  implicit none
  integer :: gridsize, unit
  real :: mini, maxi, mean
  real, allocatable :: array(:,:,:)

  gridsize = 512  ! grid size from the question
  allocate(array(gridsize, gridsize, gridsize))
  open(newunit=unit, file='T.out', status='old', access='stream',&
       form='unformatted', action='read')
  read(unit) array
  close(unit)

  maxi = maxval(array)
  mini = minval(array)
  mean = sum(array)/size(array)
  print *, mini, maxi, mean
end program test