I have some data and want to find the distribution that fits them well. I found one post in MATLAB and one post in R. This post talks about a method in Python that tries different distributions and checks which one fits best. I was wondering if there is any direct way (like allfitdist() in MATLAB) to do this in Python.
The fitter package in Python provides similar functionality. The code looks like:
from fitter import Fitter

# `data` is a 1-D array of observed samples
f = Fitter(data)
f.fit()
For more information, please take a look at https://pypi.python.org/pypi/fitter
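As a rough end-to-end sketch (the sample data here is made up for illustration, and summary() / get_best() are part of the fitter API, but check the docs above for the exact signatures in your version):

import numpy as np
from fitter import Fitter

# assumption: synthetic sample data drawn from a known distribution, just for illustration
data = np.random.gamma(shape=2.0, scale=1.5, size=10000)

# restrict the search to a few candidate distributions to keep the fit fast
f = Fitter(data, distributions=["gamma", "lognorm", "norm", "expon"])
f.fit()

f.summary()          # table of tested distributions ranked by goodness of fit
print(f.get_best())  # parameters of the best-fitting distribution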
I have some data in a CSV file to which I am trying to fit a Poisson distribution. I want to estimate the lambda for this data so that I can sample from the fitted distribution. How do I do this using Python or any of its libraries?
A Poisson distribution has a single parameter - the mean, λ - and its maximum-likelihood estimate is simply the sample mean, so you don't need to 'fit' anything per se. Testing whether your data actually follows such a distribution is another question. Hope this helps.
import numpy as np

# the maximum-likelihood estimate of lambda is just the sample mean
poisson_lambda = np.mean(data)
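To then sample with that rate, NumPy's Poisson generator can be used (a small sketch; `data` is assumed to already be loaded from the CSV, e.g. with np.loadtxt or pandas):

import numpy as np

# assumption: `data` has been loaded from the CSV file as a 1-D array of counts
poisson_lambda = np.mean(data)

# draw 1000 new samples from a Poisson distribution with the estimated rate
samples = np.random.poisson(lam=poisson_lambda, size=1000)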
Today I'm using MFCC from librosa in Python with the code below. It gives an array with dimension (40, 40).
import librosa

sound_clip, s = librosa.load("filename.wav")
mfcc = librosa.feature.mfcc(y=sound_clip, sr=s, n_mfcc=40, n_mels=60)
Is there a similar way to extract the GFCC from another library? I cannot find it in librosa.
For example, essentia:
https://essentia.upf.edu/documentation/essentia_python_tutorial.html
https://essentia.upf.edu/documentation/reference/std_GFCC.html
import essentia
import essentia.standard
essentia.standard.GFCC
#Get array with dimension (40,40)
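For reference, a minimal per-frame extraction with essentia might look roughly like this (a sketch based on essentia's standard-mode API; the exact GFCC parameters such as numberCoefficients should be checked against the docs linked above):

import essentia.standard as es
import numpy as np

# load audio as a mono signal (44.1 kHz by default)
audio = es.MonoLoader(filename="filename.wav")()

window = es.Windowing(type="hann")
spectrum = es.Spectrum()
gfcc = es.GFCC(numberCoefficients=40)  # assumption: 40 coefficients to mirror the MFCC setup

frames = []
for frame in es.FrameGenerator(audio, frameSize=2048, hopSize=1024):
    bands, coeffs = gfcc(spectrum(window(frame)))
    frames.append(coeffs)

gfccs = np.array(frames)  # shape: (num_frames, 40)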
I have been facing similar problems, therefore I wrote a small library called spafe that simplifies feature extraction from audio files. Among the supported features there is GFCC. The extraction can be done as follows:
from scipy.io import wavfile
from spafe.features.gfcc import gfcc

# read wav
fs, sig = wavfile.read("test.wav")

# compute features
gfccs = gfcc(sig, fs=fs, num_ceps=13)
You can find a thorough example of GFCC extraction (as a Jupyter notebook) under gfcc-features-example.
The documentation of all the possible input variables and their significance is available under gfcc-docs.
The GFCC implementation is done as in the following paper.
https://github.com/jsingh811/pyAudioProcessing provides GFCC, MFCC, spectral and chroma feature extraction capabilities along with classification, cross-validation and hyperparameter tuning.
The README describes getting-started steps as well as examples of how to run classifications.
How can I calculate the cumulative distribution function of a normal distribution in Python without using scipy?
I'm specifically referring to this function:
from scipy.stats import norm
norm.cdf(1.96)
I have a Django app running on Heroku and getting scipy up and running on Heroku is quite a pain. Since I only need this one function from scipy, I'm hoping I can use an alternative. I'm already using numpy and pandas, but I can't find the function in there. Are there any alternative packages I can use or even implement it myself?
Just use math.erf:
import math

def normal_cdf(x):
    "cdf for standard normal"
    q = math.erf(x / math.sqrt(2.0))
    return (1.0 + q) / 2.0
Edit to show comparison with scipy:
scipy.stats.norm.cdf(1.96)
# 0.9750021048517795
normal_cdf(1.96)
# 0.9750021048517796
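If a non-standard normal is needed, the same erf-based formula works after standardizing the input (a small sketch; mu and sigma are hypothetical names for the distribution's mean and standard deviation):

import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    "cdf for a normal distribution with mean mu and standard deviation sigma"
    z = (x - mu) / sigma
    return (1.0 + math.erf(z / math.sqrt(2.0))) / 2.0

print(normal_cdf(1.96))                    # standard normal, ~0.975
print(normal_cdf(5.0, mu=3.0, sigma=2.0))  # same as the standard CDF at z = 1.0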
This question seems to be a duplicate of How to calculate cumulative normal distribution in Python, where many alternatives to scipy are listed.
I wanted to highlight the answer of Xavier Guihot (https://stackoverflow.com/users/9297144/xavier-guihot), which shows that from Python 3.8 the normal distribution is now built into the standard library:
from statistics import NormalDist
NormalDist(mu=0, sigma=1).cdf(1.96)
# 0.9750021048517796
I am working on time series data. The data available is multivariate, so for every instance of time there are three data points available.
Format:
| X | Y | Z |
So one time series in the above format is generated in real time. I am trying to find a good match for this real-time series within another, already-stored base time series (which is much larger in size and was collected at a different frequency). If I apply standard DTW to each of the series (X, Y, Z) individually, they might end up matching at different points within the base database, which is unfavorable. So I need to find a point in the base database where all three components (X, Y, Z) match well at the same point.
I have researched the matter and found that multidimensional DTW is a perfect solution to such a problem. In R the dtw package does include multidimensional DTW, but I have to implement it in Python. The R-Python bridging package "rpy2" can probably be of help here, but I have no experience in R. I have looked through the available DTW packages in Python, like mlpy and dtw, but they are of no help. Can anyone suggest a package in Python to do the same, or the code for multidimensional DTW using rpy2?
Thanks in advance!
Thanks @lgautier, I dug deeper and found an implementation of multivariate DTW using rpy2 in Python. Just passing the template and query as 2-D matrices (matrices as in R) allows the rpy2 dtw package to do a multivariate DTW. Also, if you have R installed, loading the R dtw library and running "?dtw" gives access to the library's documentation and the different functionalities available with it.
For future reference to other users with similar questions:
Official documentation of R dtw package: https://cran.r-project.org/web/packages/dtw/dtw.pdf
Sample code, passing two 2-D matrices for multivariate DTW; the open_begin and open_end arguments enable subsequence matching:
import numpy as np
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
from rpy2.robjects.packages import importr
import rpy2.robjects as robj
R = rpy2.robjects.r
DTW = importr('dtw')
# Generate our data
template = np.array([[1,2,3,4,5],[1,2,3,4,5]]).transpose()
rt,ct = template.shape
query = np.array([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]]).transpose()
rq,cq = query.shape
# converting numpy matrices to R matrices
templateR = R.matrix(template, nrow=rt, ncol=ct)
queryR = R.matrix(query, nrow=rq, ncol=cq)

# Calculate the alignment vector and corresponding distance
alignment = R.dtw(templateR, queryR, keep=True, step_pattern=R.rabinerJuangStepPattern(4, "c"), open_begin=True, open_end=True)
dist = alignment.rx('distance')[0][0]
print(dist)
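As a possible follow-up, the warping indices can be pulled out of the returned alignment object in the same way as the distance (a sketch under the assumption that the R dtw object exposes index1/index2 as described in the R package documentation; R indices are 1-based):

import numpy as np

# assumption: `alignment` is the object returned by R.dtw(...) above;
# index1/index2 are the warping indices of the first and second input series
idx1 = np.array(alignment.rx('index1')[0], dtype=int) - 1
idx2 = np.array(alignment.rx('index2')[0], dtype=int) - 1

print(idx1[:10])
print(idx2[:10])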
It seems like tslearn's dtw_path() is exactly what you are looking for. To quote the docs linked before:
Compute Dynamic Time Warping (DTW) similarity measure between (possibly multidimensional) time series and return both the path and the similarity.
[...]
It is not required that both time series share the same size, but they must be the same dimension. [...]
The implementation they provide follows:
H. Sakoe, S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26(1), pp. 43–49, 1978.
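A minimal sketch of what this might look like for the X/Y/Z case described in the question (dtw_path lives in tslearn.metrics; the arrays here are made up purely for illustration):

import numpy as np
from tslearn.metrics import dtw_path

# two multivariate series of shape (n_timestamps, n_features); lengths may differ,
# but the number of features (here 3, for X, Y, Z) must match
series_a = np.random.rand(50, 3)
series_b = np.random.rand(80, 3)

path, dist = dtw_path(series_a, series_b)
print(dist)      # DTW distance between the two series
print(path[:5])  # first few (index_a, index_b) pairs of the optimal warping path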
I think that it is a good idea to try out a method in whatever implementation is already available before considering whether it is worth working on a reimplementation.
Did you try the following ?
from rpy2.robjects.packages import importr
# You'll obviously need the R package "dtw" installed with your R
dtw = importr("dtw")
# all functions and objects in the R package "dtw" are now available
# with `dtw.<function or object>`
I happened upon this post and thought I would provide some updated information in case anyone else is trying to locate a way to do multivariate DTW in Python. The DTAIDistance package has the option to perform multivariate DTW.
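For example, a rough sketch with its dtw_ndim module (the series shapes are illustrative; check the dtaidistance docs for the exact options):

import numpy as np
from dtaidistance import dtw_ndim

# two multivariate series of shape (n_timestamps, n_dimensions), e.g. (X, Y, Z) columns
series_1 = np.random.rand(60, 3)
series_2 = np.random.rand(100, 3)

distance = dtw_ndim.distance(series_1, series_2)
print(distance)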