Why do the Bessel functions in SciPy and Excel give different results? - python

I tried to use SciPy and Excel to calculate a Bessel function, but they give different results. Do you know why? Thanks in advance.
Python code:
import scipy.special as ss
result = ss.k1(0.2155481626213)
print(result)
Excel (I use the current version of the Excel web app on OneDrive):
=BESSELK(0,2155481626213; 1)
The result from Python is 4.405746469429914
The result from Excel is 4,405746474969860 (Excel here uses a comma as the decimal separator).

Since the difference between the results is quite small, it can be attributed to the complexity of the numerical calculation and the error propagation it entails.
Side note:
even Wolfram Alpha gives a slightly different value: 4.405746469430 (which matches SciPy's result to the digits shown).

As @HubertusKaiser says, the error is so small that we can attribute it to floating-point rounding errors.
There's an excellent explanation of why 0.1 + 0.2 != 0.3 on most computers here.
Now imagine doing a lot of those "wrong" floating-point calculations; you end up with the difference you see.
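If you want to check which of the two values is closer to the true one, an arbitrary-precision library such as mpmath can evaluate K1 to many digits. A quick sketch (mpmath is not mentioned in the question; this assumes it is installed):

import mpmath

# Evaluate K_1(0.2155481626213) with 30 significant digits
mpmath.mp.dps = 30
print(mpmath.besselk(1, mpmath.mpf('0.2155481626213')))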

Related

Python pandas.read_csv rounds to get import errors?

I have a 10000 x 250 dataset in a csv file. When I use the command
data = pd.read_csv('pool.csv', delimiter=',',header=None)
while in the correct path, I do import the values.
First I get a DataFrame. Since I want to work with the numpy package, I need to convert it to its values using
data = data.values
And this is where it gets weird. At position [9999,0] in the file I have the value -0.3839. However, after importing it and calculating with it, I noticed that Python (or numpy) does something strange during the import.
Calling data[9999,0] SHOULD give the expected -0.3839, but instead gives something like -0.383899892...
I already imported the file in other languages like Matlab, and there was no issue with rounding those values. I also tried to use the .to_csv command from the pandas package instead of .values, but there is the exact same problem.
The last 10 elements of the first column are
-0.2716
0.3711
0.0487
-1.518
0.5068
0.4456
-1.753
-0.4615
-0.5872
-0.3839
Is there any import routine that does not have these rounding errors?
Passing float_precision='round_trip' should solve this issue:
data = pd.read_csv('pool.csv', delimiter=',', header=None, float_precision='round_trip')
That's a floating-point error, and it comes from how computers represent numbers. (You can look it up if you really want to know how it works.) Don't be bothered by it; it is very small.
If you really need exact precision (because you are testing for exact values) you can look at the decimal module of Python, but your program will be a lot slower (probably around 100 times slower).
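As a quick illustration of what is going on under the hood (a sketch, not part of the original answer):

from decimal import Decimal

# Constructing a Decimal from a float exposes the exact stored binary value,
# which is not exactly -0.3839
print(Decimal(-0.3839))
# Constructing it from a string keeps the exact decimal value
print(Decimal('-0.3839'))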
You can read more here: https://docs.python.org/3/tutorial/floatingpoint.html
You should know that all languages have this problem; some are just better at hiding it. (Also note that in Python 3 this "hiding" of the floating-point error has been improved: repr now prints the shortest string that round-trips to the same float.)
Since this problem has no single ideal solution, you have to choose the most appropriate solution for your situation yourself.
I don't know about 'round_trip' and its limitations, but it can probably help you. Another option is the float_format parameter of the to_csv method (see the format specification mini-language: https://docs.python.org/3/library/string.html#format-specification-mini-language).
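For example, a sketch of writing the frame back out with a fixed number of decimals (the output file name is made up; float_format is a real to_csv parameter):

# Write the DataFrame from the question with 4 decimal places per float
data.to_csv('pool_out.csv', float_format='%.4f', header=False, index=False)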

Why is the output of a value in Jupyter sometimes correct and sometimes not?

I have written code that was calculating a measure correctly for specific data. But after some time, some of the values are no longer shown correctly.
The generated measure is between 0 and 1 (it was 0.8998 across a few runs); however, it now shows the value -4.486000218950312e+183 for the same data, which is irrelevant, machine-generated garbage. I am calculating the Dice score using the medpy library:
import numpy as np
import medpy.metric.binary as metrics

# NOTE: np.ndarray allocates uninitialized memory
mean_dsc = np.ndarray(no_slices, dtype=float)
for i in range(no_slices):
    segm = res_vol[:, :, i]
    gt = lbl[:, :, i]
    mean_dsc[i] = metrics.dc(segm, gt)
print(mean_dsc)
What is the reason for this? Is there a bug in Jupyter or Python? How can I resolve this issue?
Your help is appreciated.
I could solve the issue; the problem was that I was creating the array like this:
mean_dsc = np.ndarray(no_slices, dtype=float)
I changed it to create the array with np.zeros() instead of np.ndarray. np.ndarray returns uninitialized memory, so the array initially contains whatever garbage bytes happen to be there, while np.zeros guarantees every element starts at 0.0.
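A minimal sketch of the difference (hypothetical, just to illustrate):

import numpy as np

a = np.ndarray(5, dtype=float)  # uninitialized: contents are arbitrary garbage
b = np.zeros(5, dtype=float)    # initialized: guaranteed all 0.0
print(a)  # may print huge nonsense values like the one in the question
print(b)  # [0. 0. 0. 0. 0.]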

Disassembly of a Python program using SymPy's solve function (what's going on behind the scenes?)

I have this Python code, which solves a linear system in three variables.
import numpy as np
from sympy import *
init_printing(use_latex='mathjax')
A = Matrix([[-2,3,-1],[2,2,3],[-4,-1,1]])
x,y,z= symbols('x,y,z')
X =Matrix([[x],[y],[z]])
B = Matrix([[1],[1],[1]])
solve(A*X-B)
I am happy with, but also baffled by, that output. I want to understand what steps SymPy follows to solve this, and which solver it is using.
Part 1 of the question: how is SymPy solving A*X - B above?
Part 2: in general, is there a method to see the disassembly of any Python program (for the purpose of understanding it)?
There are two basic methods:
Read the source
The best way to understand it is to read the source. In IPython, you can type solve?? and it will show you the source code, as well as what file that source is in. You can also look at the SymPy GitHub.
solve in SymPy is a bit complicated, because it can solve many different types of equations. I believe in this case, you want to look at solve_linear_system, which uses row reduction. That will be replaced with linsolve in a future version, which uses essentially the same algorithm (Gauss-Jordan elimination).
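For comparison, a sketch of the same system solved directly with linsolve (not from the answer above; it just reuses the matrices from the question):

from sympy import Matrix, symbols, linsolve

x, y, z = symbols('x y z')
A = Matrix([[-2, 3, -1], [2, 2, 3], [-4, -1, 1]])
B = Matrix([[1], [1], [1]])
# linsolve accepts the system as an (A, b) pair
print(linsolve((A, B), x, y, z))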
Use a visual debugger
Another way to understand what is going on is to step through the code in a visual debugger. I recommend a debugger that can show you the code of the function that is being run, as well as a list of the variable, along with their values (pdb is not a great debugger in this respect). I personally prefer PuDB, which runs in the terminal, but there are other good ones as well. The advantage of using a debugger is that you can see exactly what code paths are being traversed and what values the variables have at each step.
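For example, assuming PuDB is installed (pip install pudb), you can drop into the debugger at any point in the code:

# Pause execution here and open the PuDB interface
from pudb import set_trace; set_trace()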

Irregular results from numpy and scipy

I am creating finite element code in Python that relies on numpy and scipy for array, matrix and linear algebra calculations. The initial generated code seems to be working and I am getting the results I need.
However, for another feature I need to call a function that performs the analysis more than once, and when I review the results they differ completely from the first call, although both calls use the same inputs. The only thing I can think of is that garbage collection is not working and memory is being corrupted.
Here is the procedure used:
call setup file to generate model database: mDB = F0(inputs)
call first analysis with some variable input: r1 = F1(mDB, v1)
repeat the first analysis with the same variable from step 2: r2 = F1(mDB, v1)
Since nothing has changed, I would expect the results from steps 2 and 3 to be the same; however, my code produces different results (verified using matplotlib).
I am using:
Python 2.7 (32bit) on Windows 7 with numpy-1.6.2 and scipy-0.11.0
If your results are sensitive to rounding error (e.g. because of a programming error in your code), then in general floating-point results are not reproducible. This can happen purely because of the way modern compilers optimize code, so it does not require e.g. accessing uninitialized memory.
Please see:
http://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf
Another likely possibility is that your computation function modifies its input data. The point you mention in the comment above does not exclude this possibility, since Python passes objects by reference: a function that mutates an argument mutates the caller's object.
OK, based on the suggestions above, I found the problem.
My code relies on dictionaries (hash tables). I pass the contents of the original input dictionary mDB into functions and modify them there; I thought the original contents would not be changed inside a separate function, but they are. I come from Fortran and Matlab, where they would not be.
The answer was to deepcopy the contents of my original dictionary rather than use simple assignment. Note that I also tried a simple copy, as in:
A = mDB['A'].copy()
but that did not work either. I had to use:
import copy
A = copy.deepcopy(mDB['A'])
I know some would say I should have read the manual: "Assignment statements in Python do not copy objects, they create bindings between a target and an object" (documentation). But this is still new and surprising behavior for me. It also explains why the simple copy failed: .copy() is shallow, so the copied dictionary still shares its nested objects with the original.
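A small illustration with a made-up nested dictionary (hypothetical names, just to show the difference):

import copy

mDB = {'A': {'nodes': [1, 2, 3]}}
shallow = mDB['A'].copy()        # top level copied, nested list still shared
deep = copy.deepcopy(mDB['A'])   # everything copied recursively

shallow['nodes'].append(4)
print(mDB['A']['nodes'])  # [1, 2, 3, 4] -- the original changed too
print(deep['nodes'])      # [1, 2, 3]    -- the deep copy is independent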
Any suggestions for storing my original data in something other than dictionaries?

DFT in Python taking significantly longer than C

I'm currently working on translating some C code to Python. The code is used to help identify errors arising from the CLEAN algorithm used in radio astronomy. For this analysis, the values of the Fourier transforms of the intensity map and the Q and U Stokes maps must be found at specific pixel positions (given by ANT_pix). These maps are just 257x257 arrays.
The code below takes a few seconds to run in C but takes hours in Python. I'm pretty sure it is terribly optimized, as my knowledge of Python is quite poor.
Thanks for any help you can give.
Update: my question is whether there is a better way to implement the loops in Python that will speed things up. I've read quite a few answers here to other Python questions that recommend avoiding nested for loops where possible, and I'm wondering if anyone knows a good way to implement something like the code below without the loops, or with better-optimised loops. I realise this may be a tall order though!
I've been using the FFT up till now, but my supervisor wants to see what difference the DFT will make. This is because the antenna positions will not, in general, occur at exact pixel values, and using the FFT requires rounding to the closest pixel.
I'm using Python because CASA, the program used to reduce radio astronomy datasets, is written in Python, and implementing Python scripts in it is far easier than C.
Original Code
import math
import numpy

def DFT_Vis(ANT_Pix, IMap, QMap, UMap, NMap, Nvis):
    UV = numpy.zeros([Nvis, 6])
    Offset = (NMap + 1) / 2
    ANT = ANT_Pix + Offset
    RL = QMap + 1j * UMap
    LR = QMap - 1j * UMap
    # Precompute the twiddle factors e^(-2*pi*j*z/NMap)
    Factor = [math.e ** (-2j * math.pi * z / NMap) for z in range(NMap)]
    for i in range(Nvis):
        X = ANT[i, 0]
        Y = ANT[i, 1]
        SumI = 0
        SumRL = 0
        SumLR = 0
        for l in range(NMap):
            for k in range(NMap):
                Temp = Factor[int((X * l) % NMap)] * Factor[int((Y * k) % NMap)]
                SumI += IMap[l, k] * Temp
                SumRL += RL[l, k] * Temp
                # the original accumulated IMap here, which looks like a bug
                SumLR += LR[l, k] * Temp
        UV[i, 0] = SumI.real
        UV[i, 1] = SumI.imag
        UV[i, 2] = SumRL.real
        UV[i, 3] = SumRL.imag
        UV[i, 4] = SumLR.real
        UV[i, 5] = SumLR.imag
    return UV
You should probably use numpy's Fourier transform code rather than writing your own: http://docs.scipy.org/doc/numpy/reference/routines.fft.html
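For example, a sketch using names from the question (the antenna position must be rounded to the nearest integer pixel for this to work):

import numpy as np

vis = np.fft.fft2(IMap)      # 2-D FFT of the 257x257 map
u = int(round(X)) % NMap     # nearest-pixel antenna coordinates
v = int(round(Y)) % NMap
print(vis[u, v])             # sample the transform at that pixel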
If you are interested in boosting the performance of your script, Cython could be an option.
I am not an expert on the FFT, but my understanding is that the FFT is simply a fast way to compute the DFT. So to me your question sounds like you are trying to write a bubble sort algorithm to see if it gives a better answer than quicksort. They are both sorting algorithms that would give the same result!
So I am questioning your basic premise. I am wondering if you can just change your rounding on your data and get the same result from the SciPy FFT code.
Also, according to my DSP textbook, the FFT can produce a more accurate answer than computing the DFT the long way, simply because floating point operations are inexact, and the FFT invokes fewer floating point operations along the way to finding the correct answer.
If you have some working C code that does the calculation you want, you could always wrap the C code to let you call it from Python. Discussion here: Wrapping a C library in Python: C, Cython or ctypes?
To answer your actual question: as @ZoZo123 noted, it would be a big win to change from range() to xrange(). With range(), Python has to build a list of numbers and then destroy it when done; with xrange(), Python just makes an iterator that yields the numbers one at a time. (But note that in Python 3.x, range() makes an iterator and there is no xrange().)
Also, if this code does not have to integrate with the rest of your code, you might try running this code under PyPy. This is exactly the sort of code that PyPy can best optimize. The problem with PyPy is that currently your project must be "pure" Python, and it looks like you are using NumPy. (There are projects to get NumPy and PyPy to work together, but that's not done yet.) http://pypy.org/
If this code does need to integrate with the rest of your code, then I think you need to look at Cython (as noted by @Krzysztof Rosiński).
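Whichever route you take, the nested loops themselves can be removed with NumPy broadcasting. A sketch (not from any of the answers above; it assumes the array shapes from the question, and it computes exact phases rather than indexing the precomputed Factor table, so results for non-integer antenna positions will differ slightly from the original):

import numpy as np

def dft_vis_vectorized(ANT, IMap, RL, LR, NMap):
    idx = np.arange(NMap)
    # Phase matrices: Ex[i, l] = e^(-2*pi*j * X_i * l / NMap), same for Y and k
    Ex = np.exp(-2j * np.pi * np.outer(ANT[:, 0], idx) / NMap)  # (Nvis, NMap)
    Ey = np.exp(-2j * np.pi * np.outer(ANT[:, 1], idx) / NMap)  # (Nvis, NMap)
    # For each antenna i: sum over l, k of Map[l, k] * Ex[i, l] * Ey[i, k]
    SumI = np.einsum('il,lk,ik->i', Ex, IMap, Ey)
    SumRL = np.einsum('il,lk,ik->i', Ex, RL, Ey)
    SumLR = np.einsum('il,lk,ik->i', Ex, LR, Ey)
    return np.column_stack([SumI.real, SumI.imag,
                            SumRL.real, SumRL.imag,
                            SumLR.real, SumLR.imag])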
