The Shapiro-Wilk p value does not show normality, though the histogram and Q-Q plot seem to. My concern is whether I am using SciPy's shapiro function correctly.
import numpy as np
import matplotlib.pyplot as pty   # aliases as used in the original post
import statsmodels.api as sm
from scipy import stats
pty.hist(RankList[4])
sm.qqplot(np.array(RankList[4]), line='s')
print(stats.shapiro(RankList[4]))
pty.show()
Result from shapiro:
(0.9911481738090515, 7.637918031377922e-08)
I would expect the p value to be higher.
It looks like you have quite a bit of data. If you check the notes on the SciPy documentation page, it states:
For N > 5000 the W test statistic is accurate but the p-value may not be.
You may want to consider using the Anderson-Darling test instead.
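For example, a minimal sketch using scipy.stats.anderson on the data from the question:
from scipy import stats

# Anderson-Darling normality test: compare the statistic against the
# critical values instead of reading a single p-value.
result = stats.anderson(RankList[4], dist='norm')
print(result.statistic)
print(result.critical_values)      # critical values at the 15%, 10%, 5%, 2.5%, 1% levels
print(result.significance_level)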
I have the following code, but it produces this warning:
IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
potC=sc.integrate.quad(lambda r: Psi(r,n2)*(-1/r)*Psi(r,n1)*(r**2),0,np.inf)
How can I fix it?
import scipy as sc
import scipy.integrate   # register the subpackages so sc.integrate and sc.special resolve
import scipy.special
import numpy as np

def Psi(r, n):
    return 2*np.exp(-r/n)*np.sqrt(n)*sc.special.hyp1f1(1-n, 2, 2*r/n)/n**2

def p(n1, n2):
    potC = sc.integrate.quad(lambda r: Psi(r, n2)*(-1/r)*Psi(r, n1)*(r**2), 0, np.inf)
    pot = potC[0]
    return pot

print(p(15, 15))
The warning tells you exactly what the problem is: your function is not "well behaved" in some regions.
For example, with n = 15 and r = 50, special.hyp1f1(-14, 2, 2*50/15) results in NaN. I am not familiar with this function, so I do not know whether this is the expected behaviour, but that is what happens.
You can try to isolate these points and exclude them from the integration, integrating only over the well-behaved regions. (If you know lower and upper bounds of the function's value where it is defined, you can also update the expected error of your integration.)
If it is a bug in SciPy, then please report it to them.
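For example, a sketch of restricting the integral to a finite, well-behaved range (the cutoff r = 40 is an assumption; pick it below the point where hyp1f1 starts returning NaN for your n):
import numpy as np
import scipy.integrate
import scipy.special

def Psi(r, n):
    return 2*np.exp(-r/n)*np.sqrt(n)*scipy.special.hyp1f1(1-n, 2, 2*r/n)/n**2

# Stop at a finite cutoff instead of np.inf so quad never evaluates
# the integrand in the region where hyp1f1 returns NaN.
potC = scipy.integrate.quad(lambda r: Psi(r, 15)*(-1/r)*Psi(r, 15)*(r**2), 0, 40)
print(potC[0])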
P.S.: Tested with SciPy 1.8.0.
P.P.S.: After some reading I found that you can get the values correctly if you do the calculations with complex numbers, so the following code gives you a value:
import scipy as sc
from scipy import integrate
from scipy import special
import numpy as np

def Psi(r, n):
    # Promoting r to complex keeps hyp1f1 from returning NaN
    r = np.array(r, dtype=complex)
    return 2*np.exp(-r/n)*np.sqrt(n)*special.hyp1f1(1-n, 2, 2*r/n)/n**2

def p(n1, n2):
    potC = integrate.quad(lambda r: Psi(r, n2)*(-1/r)*Psi(r, n1)*(r**2), 0, np.inf)
    pot = potC[0]
    return pot

print(p(15, 15))
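Note that quad integrates real-valued functions; continuing from the definitions above, you may want to take the real part of the integrand explicitly rather than relying on an implicit complex-to-float conversion (a sketch):
# Same integral, but with the real part taken explicitly.
pot_real = integrate.quad(lambda r: (Psi(r, 15)*(-1/r)*Psi(r, 15)*(r**2)).real, 0, np.inf)[0]
print(pot_real)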
I have tried to run the stokesCavity example, which uses lid-driven boundary conditions for the flow. At the end of the code, the values in the top-right cell are compared with some reference values.
>>> print(numerix.allclose(pressure.globalValue[..., -1], 162.790867927)) #doctest: +NOT_PYAMGX_SOLVER
1
>>> print(numerix.allclose(xVelocity.globalValue[..., -1], 0.265072740929)) #doctest: +NOT_PYAMGX_SOLVER
1
>>> print(numerix.allclose(yVelocity.globalValue[..., -1], -0.150290488304)) #doctest: +NOT_PYAMGX_SOLVER
1
When I tried running this example, my output was
False
False
False
The actual values I get for the top-right cell are 129.235, 0.278627 and -0.166620 (instead of 162.790867927, 0.265072740929 and -0.150290488304). Does anyone know why I am getting different values? I've tried changing the solver (scipy, Trilinos and pysparse), but the results agree with each other to the 12th digit. The velocity profile looks similar to the one shown in the manual, but I am still worried something is off.
I ran it on Linux (Python 2.7.14, FiPy 3.2, PySparse 1.2.dev0, Trilinos 12.12, SciPy 1.2.1) and on Windows (Python 2.7.15, FiPy 3.1.3, SciPy 1.1.0).
When run as part of the test suite, this example only does 5 sweeps, and the numerical check is hard-wired for this. When you run the example in isolation, it does 300 sweeps and the solution is better (or at least differently) converged. There's nothing wrong with the example, other than it's not written in a very robust way. Thanks for asking about this; we'll try to clean up the example.
I'm classifying 2-class, 1-D data using scikit-learn's LDA classifier in a machine learning pipeline I've created. The following exception occurred:
ValueError: Internal work array size computation failed: -10
at the following line:
LinearDiscriminantAnalysis.fit(X,y)
where X = [-5e15, -5e15, -5e15, 5.7e16] and y = [0, 0, 0, 1], both of float64 dtype.
Additionally the following error was printed to console:
Intel MKL ERROR: Parameter 10 was incorrect on entry to DGESDD
A quick Google search shows that dgesdd is a LAPACK function that scikit-learn relies upon (via SciPy). The dgesdd documentation says it computes the singular value decomposition (SVD) of a real M-by-N matrix A.
Going back to the original exception, I found it was raised in scipy/linalg/lapack.py in the _compute_lwork function. This function takes a function as input, which in this case I believe is dgesdd. Searching for "-10" on the dgesdd documentation page gives the logic behind this error code, but I don't know Fortran, so I'm not exactly sure what it means.
My bet is that the SVD computation is failing due to either (1) the large values in the X array, or (2) the fact that three of the values in the X array are exactly the same number.
I will keep reading into SVD and its limitations. Any insight on how to avoid this error would be tremendously appreciated.
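One workaround that may be worth trying (a sketch, not a confirmed fix; it assumes the failure is triggered by the huge magnitudes in X) is to rescale the feature before fitting:
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X = np.array([[-5e15], [-5e15], [-5e15], [5.7e16]])  # the 1-D feature as a column
y = np.array([0, 0, 0, 1])

# Bring the values into a numerically friendly range before the SVD
# that LDA performs internally.
X_scaled = StandardScaler().fit_transform(X)
LinearDiscriminantAnalysis().fit(X_scaled, y)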
This is the definition of DGESDD:
subroutine dgesdd (JOBZ, M, N, A, LDA, S, U, LDU, VT, LDVT, WORK, LWORK, IWORK, INFO)
The error that you have indicates that the value passed to MKL's implementation of the routine for the 10th parameter, LDVT (the leading dimension of the V**T matrix), does not comply with the expectations of said routine.
This could be a bug in Intel's implementation, which is rather unlikely, assuming that there is a battery of tests stress testing these routines, but not impossible (which version of MKL is this?). Or it is a bug in the LDA code, which is rather likely:
LDVT is INTEGER
The leading dimension of the array VT. LDVT >= 1;
if JOBZ = 'A' or JOBZ = 'O' and M >= N, LDVT >= N;
if JOBZ = 'S', LDVT >= min(M,N).
Would you please print M, N, LDA, LDU and LDVT?
If you set LDVT properly the workspace analysis will run just fine.
Regarding the Intel MKL ERROR: Parameter 10 was incorrect on entry to DGESDD problem: this was actually fixed in MKL v.2018 u4 (Sep 2018). Here is the link to the MKL 2018 bug fix list.
The easiest way to check which version of MKL you use is to set the environment variable MKL_VERBOSE=1 and look at the output, which will contain that kind of info.
E.g.:
MKL_VERBOSE Intel(R) MKL 2019.0 Update 2 Product build 20190118 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors, Lnx 2.80GHz lp64 intel_thread
MKL_VERBOSE ZGETRF(85,85,0x13e66f0,85,0x13e1080,0) 6.18ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:20
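A minimal sketch of turning this on from inside Python (assuming an MKL-backed NumPy; the variable is set before NumPy is imported so MKL picks it up):
import os
os.environ['MKL_VERBOSE'] = '1'   # set before NumPy/SciPy load MKL

import numpy as np
a = np.random.rand(85, 85)
np.linalg.svd(a)                  # MKL_VERBOSE lines should appear on stdout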
I am succeeding in fitting a function with iminuit in Python, but I can't get rid of the warnings, even with print_level=-1 or print_level=0.
Here is the minimal code I use:
from iminuit import Minuit
m = Minuit(chi2, alpha=1., beta=0., print_level=-1)
it returns:
creatFitsFile.py:430: InitialParamWarning: errordef is not given. Default to 1.
m = Minuit(chi2, alpha=1., beta=0., print_level=-1)
creatFitsFile.py:430: InitialParamWarning: Parameter alpha is floating but does not have initial step size. Assume 1.
I just want it to be quiet because I fit inside a loop over ~170,000 sets of data.
Thank you
Try setting pedantic=False in the parameter list.
Example -
from iminuit import Minuit
m = Minuit(chi2, pedantic=False, alpha=1., beta=0., print_level=-1)
From the documentation -
pedantic: warns about parameters that do not have initial value or initial error/stepsize set.
From the warnings in your console, it seems these are the warnings being logged; most probably setting pedantic=False will fix it.
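If pedantic=False alone does not silence everything, a fallback is to filter the warning category with Python's warnings module (a sketch; the import path of InitialParamWarning is an assumption based on iminuit 1.x):
import warnings
from iminuit.util import InitialParamWarning  # path assumed for iminuit 1.x

# Ignore only this category so other warnings still surface in the loop.
warnings.simplefilter('ignore', InitialParamWarning)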
I am porting code from Matlab to Python and am having trouble finding a replacement for the firls() routine. It is used for least-squares linear-phase finite impulse response (FIR) filter design.
I looked at scipy.signal and nothing there looked like it would do the trick. Of course I was able to replace my remez and freqz algorithms, so that's good.
On one blog I found an algorithm that implemented this filter without weighting, but I need one with weights.
Thanks, David
The firls equivalent in python now appears to be implemented as part of the signal package:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.firls.html#scipy.signal.firls
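For example, a minimal weighted design (the band edges and weights below are illustrative, not from the question):
from scipy import signal

# 73-tap linear-phase lowpass: pass band 0-0.3, stop band 0.4-1.0
# (frequencies normalized so that Nyquist = 1), with the stop band
# weighted 10x more heavily than the pass band.
taps = signal.firls(73, [0, 0.3, 0.4, 1.0], [1, 1, 0, 0], weight=[1, 10])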
Also, I agree with everything that pev hall stated above, especially that firls is optimal in many situations (such as when overall signal-to-noise ratio is being optimized for a given number of taps), and with his advice not to use the boxcar window; they are not equivalent at all! firls generally outperforms all window and frequency-sampling approaches when designing traditional FIR filters.
To my current understanding, scipy.signal and Octave only support odd-length (even-order) least-squares filters. When I need an even length (a Type II or Type IV filter), I resort to the windowed design approach, specifically with a Kaiser window, and I have found that solution to come quite close to the optimal least-squares one.
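A sketch of that Kaiser-window fallback for an even-length design (the ripple and width targets are illustrative):
from scipy import signal

# Estimate the tap count and Kaiser beta for 60 dB attenuation and a
# transition width of 0.1 (normalized so that Nyquist = 1).
numtaps, beta = signal.kaiserord(ripple=60, width=0.1)
if numtaps % 2:
    numtaps += 1                  # force an even length (Type II filter)
taps = signal.firwin(numtaps, 0.5, window=('kaiser', beta))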
This blog post contains code detailing how to use scipy.signal to implement FIR filters.
Obviously, this post is somewhat dated, but maybe it is still interesting for some:
I think there are two near-equivalents to firls in Python:
You can try the firwin function with window='boxcar'. This is similar to Matlab, where fir1 with a boxcar window delivers the same (or at least very similar) results as firls.
You could also try the firwin2 method (frequency sampling method, similar to fir2 in Matlab), again using window='boxcar'
I did try one example from the Matlab firls reference and achieved near-identical results for:
Matlab:
F = [0 0.3 0.4 0.6 0.7 0.9];
A = [0 1 0 0 0.5 0.5];
b = firls(24,F,A,'hilbert');
Python:
import scipy.signal as sig
F = [0, 0.3, 0.4, 0.6, 0.7, 0.9, 1]
A = [0, 1, 0, 0, 0.5, 0.5, 0]
bb = sig.firwin2(25, F, A, window='boxcar', antisymmetric=True)
I had to increase the length to N = 25 and add another data point (F = 1, A = 0), which Python insisted upon; the option antisymmetric=True is only necessary for this special case (a Hilbert filter).
This post is really in response to
You can try the firwin function with window='boxcar'...
Don't use boxcar: it means no window at all (it is ideal, but only works "ideally" with an infinite number of multipliers; a sinc in time). The whole purpose of using a window is to reduce the number of multipliers required to get good stop-band attenuation. See Window function.
When comparing filters, please use a dB/log scale, as sketched below.
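For instance, a quick sketch of inspecting any design on a dB scale (the filter here is just a placeholder):
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

taps = signal.firwin(65, 0.4)                       # any design to inspect
w, h = signal.freqz(taps)
plt.plot(w / np.pi, 20 * np.log10(np.maximum(np.abs(h), 1e-12)))
plt.xlabel('Normalized frequency (Nyquist = 1)')
plt.ylabel('Magnitude (dB)')
plt.show()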
SciPy not having a firls (FIR least-squares filter) function is a big limitation, as it generates the optimal filter in many situations.
remez has its place, but the flat roll-off is a real killer when you're trying to get the best results (and not just meet some manager's spec). (Warning: the scipy remez implementation can give amplification in the stop band; see the plot at the bottom.)
If you are using Python (or need to use some window), I recommend the Kaiser window, which gets very good results and can easily be tweaked for your attenuation vs. transition vs. multiplier-count requirement (attenuation (in dB) = 2.285 * (multipliers - 1) * pi * width + 7.95). Its performance is not quite as good as firls, but it has the benefit of being fast and easy to calculate (great if you don't store the coefficients).
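Plugging numbers into that formula, a sketch of estimating the required tap count (the attenuation and width targets are illustrative):
import numpy as np

# attenuation_dB = 2.285 * (taps - 1) * pi * width + 7.95, solved for taps.
atten_dB = 60.0
width = 0.05                       # transition width, normalized so Nyquist = 1
taps = int(np.ceil((atten_dB - 7.95) / (2.285 * np.pi * width) + 1))
print(taps)                        # ~146 taps for these targets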
I found a firls() implementation attached to SciPy ticket 648.
Minor changes to get it working:
Swap the following two lines (so that the weight==None check runs before weight is converted to an array):
bands, desired, weight = array(bands), array(desired), array(weight)
if weight==None : weight = ones(len(bands)/2)
import roots from numpy instead of scipy.signal
Since version 0.18 (July 2016), SciPy includes an implementation of firls, as scipy.signal.firls.
It seems unlikely that you'll find exactly what you seek already written in Python, but perhaps the Matlab function's help page gives or references a description of the algorithm?