How to calculate the cumulative distribution function in Python without using scipy

How can I calculate the cumulative distribution function of a normal distribution in python without using scipy?
I'm specifically referring to this function:
from scipy.stats import norm
norm.cdf(1.96)
I have a Django app running on Heroku, and getting scipy up and running on Heroku is quite a pain. Since I only need this one function from scipy, I'm hoping to use an alternative. I'm already using numpy and pandas, but I can't find the function in there. Is there an alternative package I can use, or could I even implement it myself?

Just use math.erf:
import math

def normal_cdf(x):
    """CDF of the standard normal distribution."""
    q = math.erf(x / math.sqrt(2.0))
    return (1.0 + q) / 2.0
Edit to show comparison with scipy:
scipy.stats.norm.cdf(1.96)
# 0.9750021048517795
normal_cdf(1.96)
# 0.9750021048517796

This question seems to be a duplicate of How to calculate cumulative normal distribution in Python, where many alternatives to scipy are listed.
I wanted to highlight the answer by Xavier Guihot (https://stackoverflow.com/users/9297144/xavier-guihot), which shows that from Python 3.8 the normal distribution is available as a built-in:
from statistics import NormalDist
NormalDist(mu=0, sigma=1).cdf(1.96)
# 0.9750021048517796
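NormalDist also handles non-standard normals and exposes the inverse CDF; a quick sketch (the values in the comments are approximate):
from statistics import NormalDist

d = NormalDist(mu=100, sigma=15)
print(d.cdf(130))        # P(X <= 130), about 0.977
print(d.inv_cdf(0.975))  # about 129.4, the 97.5th percentile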

Related

Implement the MATLAB 'fitdist' in Python

I am working on an image processing tool, and I am having some trouble finding a good Python substitute for MATLAB's fitdist.
The matlab code works something like this:
pdR = fitdist(Red,'Kernel','Support','positive');
Have any of you found a good implementation of this in Python?
Generally, SciPy is useful in your case:
import scipy.stats as st
# KDE
st.gaussian_kde(data)
# Fit to specified distribution (Normal distribution in this example)
st.norm.fit(data)
Full reference is here: https://docs.scipy.org/doc/scipy/reference/stats.html
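Note that the MATLAB call above constrains the kernel fit to positive support, which gaussian_kde does not do directly. One common workaround is to fit the KDE in log space and map the density back with the Jacobian; a minimal sketch, with made-up stand-in data for the Red channel:
import numpy as np
import scipy.stats as st

# Hypothetical stand-in for the 'Red' channel data: strictly positive values
red = np.abs(np.random.randn(1000)) + 0.1

# Plain KDE, no support constraint
kde = st.gaussian_kde(red)

# Positive support via a log transform: fit in log space,
# then map the density back with the Jacobian 1/x
log_kde = st.gaussian_kde(np.log(red))

def positive_kde_pdf(x):
    x = np.asarray(x, dtype=float)
    return log_kde(np.log(x)) / x

print(kde(1.0), positive_kde_pdf(1.0))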

Calculate signal-to-noise ratio in Python with scipy version 1.1

I have looked around online and it seems that the signaltonoise function in scipy.stats is deprecated and no longer available in version 1.1. Is there an equivalent method in the scipy package? I have not been able to find one online.
And if not scipy, is there any other library recommended for such calculations?
As indicated in scipy issue #609 on github, the signaltonoise function
[...] is not useful except for backwards compatibility. The reason is that there's a Matlab signal-to-noise function http://www.mathworks.com/help/signal/ref/snr.html which means something different. This is not good, because scipy clones the Matlab interface of other signal-related functions, and this incompatibility apparently has no offsetting benefit.
If you do need this function for backward compatibility, a short implementation can be found in the history of the scipy repository (reproduced here without the docstring and license header):
import numpy as np

def signaltonoise(a, axis=0, ddof=0):
    a = np.asanyarray(a)
    m = a.mean(axis)
    sd = a.std(axis=axis, ddof=ddof)
    return np.where(sd == 0, 0, m / sd)
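A quick usage example, assuming the function above:
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
print(signaltonoise(data))  # mean/std of the array, about 2.236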

Welch's ANOVA in Python

I'm not sure if Stack Overflow is the best forum for this, but anyway...
Scipy implements ANOVA using stats.f_oneway, which assumes equal variances. It says in the docs that if the variances are unequal, one could consider the Kruskal-Wallis test instead.
However, what I want is Welch's ANOVA. Scipy has a Welch t-test, but of course this doesn't work if I have more than two groups.
What I find interesting is that scipy used to have stats.oneway which allowed for an equal variance setting. However, it has been deprecated.
Is there an easy way to implement Welch's ANOVA in Python?
I needed the same thing. I ended up porting the code from the R package, and I also requested that scipy.stats add this feature. The ~10 lines of code for the implementation are here:
https://github.com/scipy/scipy/issues/11122
The pingouin package has a Welch's ANOVA implemented. You can find the documentation for it at https://pingouin-stats.org/generated/pingouin.welch_anova.html.
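A minimal sketch of the pingouin call, with made-up long-format data (dv names the value column, between the grouping column):
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: one value per row plus its group label
df = pd.DataFrame({
    "score": [4.1, 3.9, 5.0, 6.2, 5.8, 6.1, 7.9, 8.2, 8.0],
    "group": ["a"] * 3 + ["b"] * 3 + ["c"] * 3,
})

aov = pg.welch_anova(data=df, dv="score", between="group")
print(aov)  # one-row DataFrame with F, degrees of freedom, and the p-value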

Multidimensional/multivariate dynamic time warping (DTW) library/code in Python

I am working on time series data. The available data is multivariate, so for every instance in time there are three data points available.
Format:
| X | Y | Z |
So one time series in the above format is generated in real time. I am trying to find a good match for this real-time series within another, much larger time series that is already stored as base data (and was collected at a different frequency). If I apply standard DTW to each of the components (X, Y, Z) individually, they might end up matching at different points within the base data, which is unfavorable. So I need a point in the base data where all three components (X, Y, Z) match well, and at the same point.
I have researched the matter and found that multidimensional DTW is a perfect fit for such a problem. In R the dtw package does include multidimensional DTW, but I have to implement it in Python. The R-Python bridging package rpy2 can probably be of help here, but I have no experience in R. I have looked through the available DTW packages in Python, such as mlpy and dtw, but they were of no help. Can anyone suggest a Python package that does the same, or code for multidimensional DTW using rpy2?
Thanks in advance!
Thanks @lgautier, I dug deeper and found an implementation of multivariate DTW using rpy2 in Python. Just passing the template and query as 2-D matrices (matrices as in R) allows the rpy2 dtw package to do multivariate DTW. Also, if you have R installed, loading the R dtw library and running "?dtw" gives access to the library's documentation and the different functionalities available with it.
For future reference to other users with similar questions:
Official documentation of R dtw package: https://cran.r-project.org/web/packages/dtw/dtw.pdf
Sample code, passing two 2-D matrices for multivariate DTW; the open_begin and open_end arguments enable subsequence matching:
import numpy as np
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
from rpy2.robjects.packages import importr
import rpy2.robjects as robj

R = rpy2.robjects.r
DTW = importr('dtw')

# Generate our data: two series with two dimensions each, as (n, dim) arrays
template = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]).transpose()
rt, ct = template.shape
query = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
                  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]]).transpose()
rq, cq = query.shape

# Convert the numpy matrices to R matrices
templateR = R.matrix(template, nrow=rt, ncol=ct)
queryR = R.matrix(query, nrow=rq, ncol=cq)

# Calculate the alignment and the corresponding distance
alignment = R.dtw(templateR, queryR, keep=True,
                  step_pattern=R.rabinerJuangStepPattern(4, "c"),
                  open_begin=True, open_end=True)
dist = alignment.rx('distance')[0][0]
print(dist)
It seems like tslearn's dtw_path() is exactly what you are looking for. To quote the tslearn docs:
Compute Dynamic Time Warping (DTW) similarity measure between (possibly multidimensional) time series and return both the path and the similarity.
[...]
It is not required that both time series share the same size, but they must be the same dimension. [...]
The implementation they provide follows:
H. Sakoe, S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26(1), pp. 43–49, 1978.
I think it is a good idea to try out a method in whatever implementation is already available before considering whether it is worth working on a reimplementation.
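For reference, a minimal sketch of the call with random stand-in data (both series share dimension 3 but differ in length):
import numpy as np
from tslearn.metrics import dtw_path

s1 = np.random.randn(50, 3)   # shorter multivariate series
s2 = np.random.randn(80, 3)   # longer series, same dimension

path, dist = dtw_path(s1, s2)
print(dist)      # DTW score
print(path[:5])  # first few (i, j) index pairs of the optimal alignment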
Did you try the following?
from rpy2.robjects.packages import importr
# You'll obviously need the R package "dtw" installed with your R
dtw = importr("dtw")
# all functions and objects in the R package "dtw" are now available
# with `dtw.<function or object>`
I happened upon this post and thought I would provide some updated information in case anyone else is trying to do multivariate DTW in Python: the DTAIDistance package has an option to perform multivariate DTW.
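If I recall the API correctly (treat this as a sketch), dtaidistance exposes multivariate DTW through its dtw_ndim module, with each series given as an (n_timesteps, n_dims) array of doubles:
import numpy as np
from dtaidistance import dtw_ndim

# Two short multivariate series as (n_timesteps, n_dims) arrays
s1 = np.array([[0, 0], [0, 1], [2, 1], [0, 1]], dtype=np.double)
s2 = np.array([[0, 0], [2, 1], [0, 1], [0, 0]], dtype=np.double)

d = dtw_ndim.distance(s1, s2)  # dependent DTW over all dimensions at once
print(d)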

Migrating from Stata to Python

Some coworkers who have been struggling with Stata 11 are asking for my help to try to automate their laborious work. They mainly use 3 commands in Stata:
tsset (declares the data to be a time series)
as in: tsset year_column, yearly
varsoc (obtains lag-order selection statistics for VARs)
as in: varsoc column_a column_b
vec (fits a vector error-correction model)
as in: vec column_a column_b, trend(con) lags(1) noetable
Does anyone know any scientific library that I can use through python for this same functionality?
I believe both scikits.timeseries and econpy / pytrix implement vector autoregression methods, but I haven't put either through their paces.
scikits.timeseries is mainly for data handling; it has only some statistical and econometric analysis, and no vector autoregression. pytrix has some econometrics functions, but also no VAR.
(At least the last time I looked.)
scikits.statsmodels and pandas both have VAR, and pandas also does the data handling for time series. I haven't seen any vector error correction models in Python yet, but scikits.statsmodels is getting close.
http://groups.google.ca/group/pystatsmodels?hl=en&pli=1
Check out scikits.statsmodels.tsa.api.VAR (you may need to get the latest development version; use Google) and check out the documentation for it:
http://statsmodels.sourceforge.net/devel/vector_ar.html#var
These models integrate with pandas as well. I'll be working in the coming months to improve the integration of pandas with the rest of statsmodels.
Vector error correction models have not been implemented yet, but they are on the TODO list!
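A minimal sketch of the VAR workflow, using the modern statsmodels import path and made-up data (model.select_order plays roughly the role of Stata's varsoc):
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Made-up data with two series, standing in for column_a / column_b
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((40, 2)).cumsum(axis=0),
                    columns=["column_a", "column_b"])

model = VAR(data)
print(model.select_order(maxlags=4).summary())  # lag-order statistics (AIC, BIC, ...)
results = model.fit(1)                          # fit a VAR(1)
print(results.summary())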
Use rpy2 and call the R var package.
I have absolutely no clue what any of those commands do, but have a look at NumPy and SciPy, and maybe Sage or SymPy.
