How to calculate the cumulative doubly non-central F distribution in Python
This is an alternative question to a previous one I asked here:
How to calculate inverse of cumulative doubly non-central f distribution in python
I found a way to avoid the inverse problem, so I just need to calculate the cumulative doubly non-central F distribution (DNCF) in Python. I don't know much about statistics, but Wolfram MathWorld indicates the doubly non-central F distribution is the distribution of a ratio of non-central chi-squared variables, each divided by its degrees of freedom: http://mathworld.wolfram.com/NoncentralF-Distribution.html.
Given my DNCF distribution with numerator degrees of freedom and non-centrality parameter (Dfn, Ncn) and denominator degrees of freedom and non-centrality parameter (Dfd, Ncd), would it be accurate to simply do something like this:
import scipy.stats

def DNCF(x, Dfn, Dfd, Ncn, Ncd):
    out = scipy.stats.ncx2.cdf(x, Dfn, Ncn) / scipy.stats.ncx2.cdf(x, Dfd, Ncd)
    return out
This seems like it would work to me, but I would appreciate some verification from someone who knows more about statistics.
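In case it helps with verification, here is a minimal Monte Carlo sanity check I could compare any formula against: it simulates the ratio of independent non-central chi-squared variables, each divided by its degrees of freedom (per the MathWorld definition), and reports the empirical CDF at x. The function name and the sample size are just my placeholders.

import numpy as np
import scipy.stats

def DNCF_mc(x, Dfn, Dfd, Ncn, Ncd, n=200000):
    # Empirical CDF of the doubly non-central F at x, estimated by simulation:
    # draw non-central chi-squared variables, scale each by its degrees of
    # freedom, take the ratio, and count how often it falls at or below x.
    num = scipy.stats.ncx2.rvs(Dfn, Ncn, size=n) / Dfn
    den = scipy.stats.ncx2.rvs(Dfd, Ncd, size=n) / Dfd
    return np.mean(num / den <= x)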
Thanks
Related
A simple way to compute cumulative distribution function in Python
I'm trying to compute the distribution function of any of the usual distributions in Python... However, all the methods I've seen involve first drawing N samples from said distribution, then ordering them somehow and doing a cumulative sum. In Mathematica, I can just do CDF[ChiSquaredDistribution[df],quantile]. If I want another distribution, I just substitute the name of that other distribution for ChiSquaredDistribution. Is there a simple way, like in Mathematica, to compute a cumulative distribution function in Python?
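For reference, scipy.stats works much like the Mathematica call: each distribution object has a cdf method that you evaluate directly, no sampling needed. A quick sketch (df and quantile are placeholder values):

from scipy.stats import chi2, norm

df, quantile = 5, 7.0
print(chi2.cdf(quantile, df))  # analogue of CDF[ChiSquaredDistribution[df], quantile]
print(norm.cdf(1.96))          # swap in another distribution the same way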
Python - calculate normal distribution
I'm quite new to python world. Also, I'm not a statistician. I'm in the need to implementing mathematical models developed by mathematicians in a computer science programming language. I've chosen python after some research. I'm comfortable with programming as such (PHP/HTML/javascript). I have a column of values that I've extracted from a MySQL database & in need to calculate the below: Normal distribution of it. (I don't have the sigma & mu values. These need to be calculated too apparently). Mixture of normal distribution Estimate density of normal distribution Calculate 'Z' score The array of values looks similar to the one below ( I've populated sample data)- data = [3,3,3,3,3,3,3,9,12,6,3,3,3,3,9,21,3,12,3,6,3,30,12,6,3,3,24,30,3,3,3,12,3,3,3,3,3,3,3,6,9,3,3,3,3,3,3,3,3,3,3,3,3,33,3,3,3,6,3,3,6,6,15,3,3,3,3,6,3,3,3,3,3,3,3,3,12,12,3,3,3,3,3,3,78,9,12,3,6,3,15,6,3,3,3,30,3,6,78,3,9,9,3,78,3,3,3,3,3,12,15,3,3,78,3,3,33,78,15,9,3,3,21,6,3,6,30,6,6,3,3,3,3,12,3,3,3,3,3,12,3,3,3,3,3,3,3,3,3,3,3,3,12,6,3,3,9,3,3,12,3,3,3,3,6,3,3,6,3,3,18,6,3,3,3,3,3,6,3,3,3,3,3,3,3,3,9,21,3,9,3,3,12,12,3,3,15,30,3,12,3,3,6,3,3,3,9,9,6,6,3,3,27,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,6,12,6,3,3,3,3,30,3,3,3,3,6,18,24,6,3,3,42,3,3,6,3,15,3,3,3,3,9,3,60,81,54,3,9,3,3,6,3,6,3,3,3,3,6,3,3,3,33,24,3,3,3,3,3,3,3,3,3,3,3,3,3,93,3,3,21,3,3,3,3,6,6,30,3,3,3,3,6,3,9,3,3,6,3,6,3,3,3,39,9,30,6,45,3,3,3,3,3,24,12,3,6,3,78,3,3,3,3,3,3,3,3,3,3,3,9,6,3,3,3,6,15,3,78,3,3,30,3,3,3,33,24,3,3,6,3,3,3,6,3,3,3,12,15,3,3,3,21,3,3,3,3,9,6,3,6,3,3,3,3,6,6,3,15,6,9,3,3,18,3,3,3,3,3,3,3,3,21,3,3,6,3,3,3,3,3,3,12,3,3,3,3,3,3,6,21,12,3,6,9,3,3,3,3,9,15,3,6,78,6,6,3,9,3,9,3,6,3,3,3,24,3,3,6,3,3,27,3,6,3,3,3,3,3,3,3,3,3,3,3,3,21,3,9,6,6,9,27,30,3,3,9,12,6,3,3,12,9,3,21,3,6,9,9,3,3,3,3,9,6,3,3,6,3,3,3,3,3,6,3,6,3,3,3,24,6,3,3,3,3,3,3,3,3,3,3,18,3,3,3,3,3,9,6,3,3,3,18,3,9,3,3,15,9,12,3,18,3,6,3,3,3,6,3,3,3,3,3,3,3,21,9,15,3,3,3,21,3,3,3,3,3,6,9,3,3,21,6,3,3,15,3,18,3,3,21,3,21,3,9,3,6,21,3,9,15,3,69,21,3,3,3,9,3,3,3,12,3,3,9,3,3,27,3,3,9,3,9,3,3,3,3,3,30,3,12,21,18,27,3,3,12,3,6,3,30,3,21,9,15,6,3,3,3,15,9,12,12,33,3,3,30,3,6,6,21,3,3,12,3,3,6,51,3,3,3,3,12,3,6,3,9,78,21,3,3,21,18,6,12,3,3,3,21,9,6,3,3,3,3,3,3,6,3,6,27,3,3,3,3,3,3,12,3,3,3,3,6,3,18,3,3,15,3,3,18,9,6,3,3,24,3,6,12,30,3,12,24,3,3,3,9,3,12,27,3,3,6,3,9,3,9,3,15,3,6,3,3,9,3,3,3,3,3,3,3,3,3,3,3,3,6,3,3,6,3,3,3,9,15,3,3,3,3,9,3,6,3,3,3,3,27,3,6,3,3,3,3,3,3,3,3,3,3,9,3,3,3,12,3,3,3,27,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,9,3,3,3,3,3,3,15,3,3,3,3,3,3,12,3,6,6,3,3,3,3,6,3,3,6,3,3,3,3,3,6,3,3,3,3,6,12,6,3,3,3,3,6,3,3,3,3,3,3,3,3,3,6,3,6,3,3,6,3,3,6,3,3,3,6,6,6,3,3,27,3,3,3,3,3,3,3,27,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,6,3,6,3,3,3,6,3,54,75,3,57,3,6,27,18,3,3,3,3,27,3,3,3,3,3,9,3,27,3,3,6,6,30,3,3,6,3,3,3,6,15,3,6,3,3,6,3,3,3,3,6,3,3,27,9,3,18,3,3,6,6,3,9,3,3,3,6,3,3,3,3,3,3,3,3,6,3,3,3,6,3,3,6,3,3,3,3,6,6,3,3,3,6,6,3,3,3,3,3,3,3,6,3,3,6,3,3,3,3,3,6,3,18,3,3,6,3,6,3,3,3,3,3,3,3,3,6,15,3,6,15,6,3,3,3,3,3,3,3,3,3,3,3,3,6,3,6,3,3,6,12,3,3,6,3,3,6,3,3,3,3,3,27,3,3,3,3,9,3,27,3,3,27,3,3,3,3,3,3,9,6,3,9,3,6,3,3,6,3,6,3,3,3,6,3,3,6,3,18,3,3,3,9,6,3,3,3,3,3,6,3,6,6,3,18,27,3,3,3,6,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,21,3,3,3,3,6,9,3,3,3,3,3,3,6,3,6,3,3,3,3,3,6,3,6,3,3,3,3,3,18,3,3,18,3,3,3,3,6,3,3,3,18,6,3,3,3,3,3,3,3,6,3,3,3,6,3,3,3,3,3,3,6,3,3,3,3,3,3,6,3,3,6,3,6,3,3,3,6,3,3,6,3,3,3,3,6,3,3,3,6,3,3,3,3,3,3,3,6,6,3,3,3,3,3,6,3,6,3,54,3,6,3,6,6,6,3,3,3,3,3,3,6,3,3,6,3,3,6,3,3,9,12,3,6,3,3,3,3,3,6,6,3,3,3,3,6,3,6,3,3,3,3,3,3
,3,3,6,3,3,3,3,3,6,3,3,3,3,3,12,3,3,6,9,27,21,3,3,3,3,3,21,6,3,3,3,3,3,3,3,3,3,3,3,6,3,3,12,3,3,3,3,3,3,3,3,3,3,3,6,3,3,6,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,9,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,6,3,6,3,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,6,3,3,3,3,6,3,3,3,3,6,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,6,6,3,3,3,3,3,3,6,6,3,3,3,3,3,3,6,3,3,6,3,3,3,6,3,3,3,3,6,6,3,6,3,6,6,3,9,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,6,3,3,3,9,9,3,3,3,3,3,6,3,3,3,3,6,3,3,3,3,6,3,3,3,3,3,6,3,6,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,6,3,3,6,3,3,3,3,3,3,3,6,3,3,3,135,3,9,3,3,6,9,3,3,3,6,3,3,3,3,6,3,3,6,6,3,3,3,3,3,3,3,3,3,3,3,3,6,6,3,3,3,6,3,3,3,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,6,3,3,3,135,3,3,3,6,3,3,3,3,6,6,3,3,69,87,57,9,3,3,3,12,3,6,3,3,3,6,3,3,3,3,3,3,3,3,3,3,6,9,12,3,3,3,3,3,3,3,3,6,3,3,9,3,3,3,3,3,3,3,3,3,3,3,3,3,6,3,9,3,3,3,3,12,3,3,33,3,6,3,3,3,3,3,3,6,3,6,3,3,6,3,3,3,6,3,6,3,3,6,3,3,3,6,3,3,6,3,3,3,6,3,3,3,3,9,3,3,6,6,3,3,3,6,6,3,3,3,3,3,3,6,3,3,3,3,6,3,3,3,6,3,18,3,6,3,3,3,3,9,3,3,3,3,3,3,6,3,3,6,3,3,3,3,3,135,3,9,3,3,3,3,3,3,3,3,6,6,3,6,6,3,3,6,3,3,3,6,6,3,3,3,3,6,9,3,3,3,3,3,3,6,6,3,3,3,3,3,3,135,3,3,3,6,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,6,6,6,3,3,3,6,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,9,6,3,3,3,9,3,3,3,3,9,3,3,3,3,3,3,3,3,3,9,3,6,6,3,6,3,3,6,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,6,3,3,9,3,24,3,3,3,3,3,3,3,3,3,3,3,3,3,6,3,3,3,3,6,3,3,3,3,3,3,6,3,135,3,3,3,3,3,3,6,6,3,3,3,3,3,3,3,3,6,3,3,3,3,3,9,6,3,3,3,9,3,3,3,3,3,3,6,3,3,6,3,9,3,3,3,6,3,3,3,6,6,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,9,3,3,3,3,3,9,6,3,9,3,6,3,3,21,9,3,3,3,6,3,3,3,3,6,3,3,3,3,9,3,3,3,3,3,3,3,135,3,6,6,6,3,6,3,3,9,6,6,3,3,3,3,3,3,9,3,6,3,3,3,3,3,3,3,6,9,6,3,3,6,3,6,6,3,3,3,3,6,3,6,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,3,6,3,6,3,12,3,24,3,3,3,3,3,3,21,3,3,3,3,3,3,3,6,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,15,3,3,3,3,3,3,3,6,3,3,6,6,3,3,9,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,6,3,3,3,3,9,3,3,3,6,3,3,3,6,3,6,3,3,3,3,3,3,3,3,3,12,3,3,3,3,3,3,6,3,6,6,3,3,3,6,3,3,6,3,3,3,3,9,6,3,3,3,6,9,3,3,3,6,9,3,6,3,3,3,3,3,3,6,3,3,3,3,6,6,3,3,3,3,3,3,3,3,3,3,9,15,3,3,3,6,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,6,3,3,3,3,12,3,3,3,6,6,6,3,3,3,6,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,12,12,6,3,3,3,3,3,3,3,3,3,9,6,3,3,3,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,6,3,3,3,3,6,3,3,3,6,3,3,3,3,3,3,3,6,3,3,3,6,3,3,6,3,3,12,3,3,3,6,3,3,3,3,564,84,3,60,6,15,3,3,3,3,3,6,3,3,3,3,3,3,3,9,3,3,3,3,3,3,3,3,3,3,3,6,9,3,3,3,3,3,9,3,3,3,3,3,12,6,3,3,3,3,3,3,3,3,6,3,3,3,3,9,57,3,6,3,6,3,3,6,3,3,6,3,3,3,3,3,3,3,3,3,3,3,3,9,3,3,3,3,6,3,3,3,6,12,3,6,3,3,3,3,3,3,3,3,6,3,6,3,3,3,6,3,3,6,3,3,36,3,3,6,6,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,12,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,6,3,3,6,3,6,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,12,6,3,3,3,3,3,3,3,12,3,3,3,6,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,9,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,6,3,3,12,3,3,3,3,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,9,3,3,3,3,3,3,3,9,3,3,3,3,3,3,3,3,3,6,3,3,3,3,3,3,3,3,3,3,6,3,3,3,27,3,3,6,3,3,3,3,3,6,3,3,3,3,6,3,3,9,3,3,3,12,3,3,3,3,3,6,9,3,6,3,3] I've looked around & found quite a bit about cumulative distribution as here (These have the mu & sigma values ready anyway which isn't the case in my scenario). I'm not too sure if cumulative normal distribution & normal distribution are the same. Could I please get some pointers on how to get started with this please? I'd very much appreciate any help here please.
A distribution and the cumulative distribution are not the same - the latter is the integral of the former. If the normal distribution looks like a "bell", the cumulative normal distribution looks like a gentle "step" function. If you have an array data, the following will fit it to a normal distribution using scipy.stats.norm:

import numpy as np
from scipy.stats import norm

mu, std = norm.fit(data)

This will return the mean and standard deviation; the combination of the two defines a normal distribution.
Normal and cumulative distributions are not the same. I'll leave that bit of research to you. The formula for the normal distribution is easy if you have the mean and standard deviation:

f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2 * sigma^2))
What you probably want to look at is the normal distribution, not the cumulative normal distribution. You can calculate the frequency of each element that occurs in the array and plot it to visualize the distribution. Then you can use numpy to calculate the mean with mean = numpy.mean(array) and the standard deviation with std = numpy.std(array). Hope this helps.
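Combining the two answers above into a small, self-contained sketch (the short data list here is just a stand-in for the full array posted in the question):

import numpy as np
from scipy.stats import norm

data = np.array([3, 3, 3, 9, 12, 6, 3, 30])  # replace with the full list from the question

mu, std = norm.fit(data)           # same as data.mean() and data.std()
z = (data - mu) / std              # z-score for each value
density = norm.pdf(data, mu, std)  # normal density evaluated at each value
print(mu, std)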
python scipy.stats pdf and expect functions
I was wondering if someone could please explain what the following functions in scipy.stats do: rv_continuous.expect rv_continuous.pdf I have read the documentation but I am still confused. Here is my task, quite simple in theory, but I am still confused with what these functions do. So, I have a list of areas, 16383 values. I want to find the probability that the variable area takes any value between a smaller value, called "inf", and a larger value, "sup". So, what I thought is:

scipy.stats.rv_continuous.pdf(a)  # a being the list of areas
scipy.stats.rv_continuous.expect(pdf, lb=inf, ub=sup)

So that I can get the probability that any area is between sup and inf. Can anyone help me by explaining in a simple way what the functions do and any hint on how to compute the integral of f(a) between inf and sup, please? Thanks Blaise
rv_continuous is a base class for all of the probability distributions implemented in scipy.stats. You would not call methods on rv_continuous yourself. Your question is not entirely clear about what you want to do, so I will assume that you have an array of 16383 data points drawn from some unknown probability distribution. From the raw data, you will need to estimate the cumulative distribution, find the values of that cumulative distribution at the sup and inf values, and subtract to find the probability that a value drawn from the unknown distribution lies between them.

There are lots of ways to estimate the unknown distribution from the data depending on how much modelling you want to do and how many assumptions you want to make.

At the more complicated end of the spectrum, you could try to fit one of the standard parametric probability distributions to the data. For example, if you had a suspicion that your data were lognormally distributed, you could use scipy.stats.lognorm.fit(data, floc=0) to find the parameters of the lognormal distribution that fit your data. Then you could use scipy.stats.lognorm.cdf(sup, *params) - scipy.stats.lognorm.cdf(inf, *params) to estimate the probability of the value being between those values.

In the middle are the non-parametric forms of distribution estimation like histograms and kernel density estimates. For example, scipy.stats.gaussian_kde(data).integrate_box_1d(inf, sup) is an easy way to make this estimate using a Gaussian kernel density estimate of the unknown distribution. However, kernel density estimates aren't always appropriate and require some tweaking to get right.

The simplest thing you could do is just count the number of data points that fall between inf and sup and divide by the total number of data points that you have. This only works well with a largish number of points (which you have) and with bounds that aren't too far in the tails of the data.
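A rough sketch of the three approaches described above, assuming data is your array of areas and inf/sup are your bounds (the synthetic lognormal data here is only a stand-in):

import numpy as np
from scipy import stats

data = np.random.lognormal(mean=1.0, sigma=0.5, size=16383)  # stand-in for your areas
inf, sup = 1.0, 5.0

# 1. Parametric: fit a lognormal, then take the difference of CDFs
params = stats.lognorm.fit(data, floc=0)
p_param = stats.lognorm.cdf(sup, *params) - stats.lognorm.cdf(inf, *params)

# 2. Non-parametric: Gaussian kernel density estimate
p_kde = stats.gaussian_kde(data).integrate_box_1d(inf, sup)

# 3. Empirical: just count the fraction of points between the bounds
p_emp = np.mean((data > inf) & (data < sup))

print(p_param, p_kde, p_emp)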
The cumulative distribution function might give you what you want. Then the probability P of being between two values is P(inf < area < sup) = cdf(sup) - cdf(inf). There's a tutorial about probabilities here and here. They are all related. The pdf is the "density" of the probabilities; it must be non-negative and integrate to 1. I think of it as indicating how likely something is. The expectation is a generalisation of the idea of average: E[X] = sum(x * P(x)).
Maximum Likelihood Estimate pseudocode
I need to code a Maximum Likelihood Estimator to estimate the mean and variance of some toy data. I have a vector with 100 samples, created with numpy.random.randn(100). The data should have zero mean and unit variance Gaussian distribution. I checked Wikipedia and some extra sources, but I am a little bit confused since I don't have a statistics background. Is there any pseudo code for a maximum likelihood estimator? I get the intuition of MLE but I cannot figure out where to start coding. Wiki says taking argmax of log-likelihood. What I understand is: I need to calculate log-likelihood by using different parameters and then I'll take the parameters which gave the maximum probability. What I don't get is: where will I find the parameters in the first place? If I randomly try different mean & variance to get a high probability, when should I stop trying?
I just came across this, and I know it's old, but I'm hoping that someone else benefits from it. Although the previous comments gave pretty good descriptions of what ML optimization is, no one gave pseudo-code to implement it. Python has a minimizer in SciPy that will do this. Here's pseudo-code for a linear regression.

# import the packages
import numpy as np
from scipy.optimize import minimize
import scipy.stats as stats

# Set up your x values
x = np.linspace(0, 100, num=100)

# Set up your observed y values with a known slope (2.4), intercept (5), and sd (4)
yObs = 5 + 2.4*x + np.random.normal(0, 4, 100)

# Define the likelihood function where params is a list of initial parameter estimates
def regressLL(params):
    # Resave the initial parameter guesses
    b0 = params[0]
    b1 = params[1]
    sd = params[2]

    # Calculate the predicted values from the initial parameter guesses
    yPred = b0 + b1*x

    # Calculate the negative log-likelihood as the negative sum of the log of a normal
    # PDF where the observed values are normally distributed around the mean (yPred)
    # with a standard deviation of sd
    logLik = -np.sum(stats.norm.logpdf(yObs, loc=yPred, scale=sd))

    # Tell the function to return the NLL (this is what will be minimized)
    return logLik

# Make a list of initial parameter guesses (b0, b1, sd)
initParams = [1, 1, 1]

# Run the minimizer
results = minimize(regressLL, initParams, method='nelder-mead')

# Print the results. They should be really close to your actual values
print(results.x)

This works great for me. Granted, this is just the basics. It doesn't profile or give CIs on the parameter estimates, but it's a start. You can also use ML techniques to find estimates for, say, ODEs and other models, as I describe here. I know this question was old; hopefully you've figured it out since then, but maybe someone else will benefit.
If you do maximum likelihood calculations, the first step you need to take is the following: assume a distribution that depends on some parameters. Since you generate your data (you even know your parameters), you "tell" your program to assume a Gaussian distribution. However, you don't tell your program your parameters (0 and 1); you leave them unknown a priori and compute them afterwards.

Now, you have your sample vector (let's call it x, its elements are x[0] to x[99]) and you have to process it. To do so, you have to compute the following (f denotes the probability density function of the Gaussian distribution):

f(x[0]) * ... * f(x[99])

As you can see in my given link, f employs two parameters (the Greek letters µ and σ). You now have to calculate the values of µ and σ such that f(x[0]) * ... * f(x[99]) takes the maximum possible value. When you've done that, µ is your maximum likelihood value for the mean, and σ is the maximum likelihood value for the standard deviation.

Note that I don't explicitly tell you how to compute the values for µ and σ, since this is a quite mathematical procedure I don't have at hand (and probably I would not understand it); I just tell you the technique to get the values, which can be applied to any other distributions as well.

Since you want to maximize the original term, you can "simply" maximize the logarithm of the original term - this saves you from dealing with all these products, and transforms the original term into a sum with some summands. If you really want to calculate it, you can do some simplifications that lead to the following term (hope I didn't mess up anything):

ln L(µ, σ) = -(n/2) * ln(2*pi*σ^2) - (1/(2*σ^2)) * Σ (x[i] - µ)^2

Now, you have to find values for µ and σ such that the above beast is maximal. Doing that is a very nontrivial task called nonlinear optimization. One simplification you could try is the following: fix one parameter and try to calculate the other. This saves you from dealing with two variables at the same time.
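To make the recipe above concrete, here is a small, deliberately crude sketch that evaluates the Gaussian log-likelihood on a grid of µ and σ values and picks the maximum; the grid ranges are arbitrary choices of mine, and this only illustrates the idea rather than replacing a proper optimizer:

import numpy as np
from scipy.stats import norm

x = np.random.randn(100)  # the toy sample from the question

mus = np.linspace(-1.0, 1.0, 201)
sigmas = np.linspace(0.5, 2.0, 151)

# log-likelihood(mu, sigma) = sum over i of log f(x[i]); maximize over the grid
ll = np.array([[norm.logpdf(x, mu, s).sum() for s in sigmas] for mu in mus])
i, j = np.unravel_index(ll.argmax(), ll.shape)
print("mu_hat ~", mus[i], "sigma_hat ~", sigmas[j])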
You need a numerical optimisation procedure. In Python these live in scipy.optimize: look for things like 'the Nelder-Mead algorithm' or 'BFGS' as methods of scipy.optimize.minimize. If all else fails, use RPy and call the R function 'optim()'. These functions work by searching the function space and trying to work out where the maximum is. Imagine trying to find the top of a hill in fog. You might just try always heading up the steepest way. Or you could send some friends off with radios and GPS units and do a bit of surveying. Either method could lead you to a false summit, so you often need to do this a few times, starting from different points. Otherwise you may think the south summit is the highest when there's a massive north summit overshadowing it.
As joran said, the maximum likelihood estimates for the normal distribution can be calculated analytically. The answers are found by taking the partial derivatives of the log-likelihood function with respect to the parameters, setting each to zero, and then solving both equations simultaneously. In the case of the normal distribution you differentiate the log-likelihood with respect to the mean (mu) and with respect to the variance (sigma^2) to get two equations, both equal to zero. After solving the equations for mu and sigma^2, you'll get the sample mean and sample variance as your answers. See the Wikipedia page for more details.
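Since the estimates have this closed form, the whole exercise for the toy data reduces to two NumPy calls; note that np.var with its default ddof=0 is exactly the maximum-likelihood variance, not the unbiased sample variance:

import numpy as np

x = np.random.randn(100)

mu_hat = np.mean(x)   # maximum likelihood estimate of the mean
var_hat = np.var(x)   # maximum likelihood estimate of the variance (ddof=0)
print(mu_hat, var_hat)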
nth order fourier series curve fitting in Python
I've been looking for a way to code a snippet in Python which calculates a Fourier series curve fit of any n-th order. Calculating the fit for one fixed order, say order 3, is quite simple; doing it when the order n is variable is what I haven't managed yet. Perhaps somebody has done it, but my searching can't find it. I wonder if anybody could help. Thanks.
Well, the formula is
n-th cos coeff: a_n = (2/T) * integral(-T/2, T/2, f(t)*cos(2*pi*n*t/T) dt)
n-th sin coeff: b_n = (2/T) * integral(-T/2, T/2, f(t)*sin(2*pi*n*t/T) dt)
Check scipy and scipy.integrate for details on integration. Here, using quad from scipy.integrate (which returns a (value, error) pair, so take the first element), it should be something like

def cos_coeff(f, T, n):
    return (2.0/T) * quad(lambda t: f(t) * math.cos(2*math.pi*n*t/T), -T/2.0, T/2.0)[0]

(Not tested, though.) I am not familiar with the Discrete Fourier Transform, but you can perhaps compute said coefficients from it, too. Check http://docs.scipy.org/doc/scipy/reference/tutorial/fftpack.html
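To handle a variable order n end to end, a minimal sketch along the lines of the formulas above might look like this; the helper names, the test function t**2, and the period are just my choices for illustration:

import math
from scipy.integrate import quad

def fourier_coefficients(f, T, N):
    # a0 plus the first N cosine and sine coefficients of f over one period of length T
    a0 = (2.0 / T) * quad(lambda t: f(t), -T/2.0, T/2.0)[0]
    a = [(2.0 / T) * quad(lambda t: f(t) * math.cos(2*math.pi*n*t/T), -T/2.0, T/2.0)[0]
         for n in range(1, N + 1)]
    b = [(2.0 / T) * quad(lambda t: f(t) * math.sin(2*math.pi*n*t/T), -T/2.0, T/2.0)[0]
         for n in range(1, N + 1)]
    return a0, a, b

def fourier_series(t, a0, a, b, T):
    # Evaluate the truncated series a0/2 + sum_n (a_n cos + b_n sin) at a point t
    out = a0 / 2.0
    for n, (an, bn) in enumerate(zip(a, b), start=1):
        out += an * math.cos(2*math.pi*n*t/T) + bn * math.sin(2*math.pi*n*t/T)
    return out

# Example: approximate f(t) = t**2 on a period of length T = 2 with N = 5 terms
T, N = 2.0, 5
a0, a, b = fourier_coefficients(lambda t: t**2, T, N)
print(fourier_series(0.5, a0, a, b, T), 0.5**2)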