Get mean of a distribution? - python

I generated a vector d of data that follows a normal distribution with some mean and variance.
I then want to calculate a vector s such that each component is a function of the corresponding data point, s_i = f(d_i).
Finally, I want to take the mean of s. Is there a quick way to do that in Python without a loop?

You can use numpy to apply a function to an entire array. For example, given a function such as
def f(x):
    return x * 2
you could use numpy as follows:
>>> import numpy
>>> d = numpy.array([1,2,6,7])
>>> f(d)
array([ 2, 4, 12, 14])
Then to calculate the mean
>>> s = f(d)
>>> numpy.mean(s)
8.0
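The same pattern applies to normally distributed data like that in the question. The sketch below assumes a mean of 5, a standard deviation of 2, and f(x) = x**2 purely for illustration; none of these values come from the question.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility
d = rng.normal(loc=5.0, scale=2.0, size=10_000)  # assumed mean and standard deviation

def f(x):
    # Placeholder for the real function; it is built from elementwise
    # operations, so it works on a whole array at once.
    return x ** 2

s = f(d)            # s[i] == f(d[i]) for every i, with no explicit loop
mean_s = s.mean()   # same as np.mean(s)
print(mean_s)
```

If f cannot be written with array operations (for example, it branches on a scalar value), np.vectorize(f) is a convenient wrapper, although it still loops internally and is not faster than a plain loop.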

Related

Why is the percentile() method not calculating the appropriate percentile? The 25th percentile for this data should be 1.5, or 2 if rounded off

import numpy as np
value = [1, 2, 3, 4, 5, 6]
x = np.percentile(value, 25)
print(x)
I am calculating the percentile with this code to cross-check:
import math
def my_percentile(data, percentile):
    n = len(data)
    p = n * percentile / 100  # true division, so p is a float in Python 3
    if p.is_integer():
        return sorted(data)[int(p)]
    else:
        return sorted(data)[int(math.ceil(p)) - 1]
t = [1, 2, 3, 4, 5, 6]
per = my_percentile(t, 25)
print(per)
There's more than one way to calculate quartiles. Wikipedia has a good summary under quantiles.
The values returned by numpy's default calculation match those returned by, for example, R's summary() function.
You need to do one of these things:
Switch to numpy.percentile's default way of calculating quartiles,
provide a value to numpy.percentile's interpolation parameter (renamed to method in NumPy 1.22), or
write your own custom function.
Valid values for interpolation are listed in the numpy.percentile documentation.
I didn't suggest a value for interpolation, because you didn't include your expected output in your question. You need to consider the effect of your decision on all quartiles, not just on one.
(I don't think scipy.stats.percentileofscore() will work for you.)
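To make the choice concrete, here is a small sketch that computes the 25th percentile of the question's data under several of numpy.percentile's methods. It assumes NumPy 1.22 or newer, where the keyword is called method; older releases use interpolation for the same values.

```python
import numpy as np

value = [1, 2, 3, 4, 5, 6]

# Default ("linear") interpolates between the two nearest order statistics.
default_q1 = np.percentile(value, 25)

# The `method` keyword needs NumPy 1.22+; older releases call it `interpolation`.
results = {
    m: float(np.percentile(value, 25, method=m))
    for m in ("linear", "lower", "higher", "midpoint", "nearest")
}

print(default_q1)
for name, q1 in results.items():
    print(name, q1)
```

For this particular input, "lower" gives 2, the same number as the hand-written my_percentile, but that agreement is not guaranteed for other data or other percentiles, so check all the quartiles you care about before settling on a method.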

How to call a function with parameters as matrix?

I am trying to call scipy.stats.multivariate_normal with four different parameters for mu and sigma. And then for each generated probability density function I need to call that pdf on an array of say, 10 values.
For simplicity let's say that above mentioned function is addXY:
def addXY(x, y):
    return x + y
params = [[1, 2], [1, 3], [1, 4], [1, 5]]  # mu and sigma, four versions
inputs = [1, 2, 3]  # values, in this case 3 of them
matrix = []
for pdf_params in params:
    row = []
    for inp in inputs:
        entry = addXY(*pdf_params)
        row.append(entry * inp)
    matrix.append(row)
print(matrix)
Is this pythonic?
Is there a way to pass params and inputs and get a matrix with all combinations in it that is more pythonic/vectorized/faster?
Important: the inputs in the example are scalar values only to simplify the problem description. I am actually using an array of n-dimensional vectors, and therefore the multivariate_normal pdf.
Hints and tips about similar operations are welcome.
Based on your description of what you are trying to compute, you don't need multivariate_normal. You are calling the PDF method with a set of scalar values for a distribution with a scalar mu and sigma. So you can use the pdf() method of scipy.stats.norm. This method will broadcast its arguments, so by passing in arrays with the proper shape, you can compute the PDF for the different values of mu and sigma in one call. Here's an example.
Here are your x values (you called them inputs), and the parameters:
In [23]: x = np.array([1, 2, 3])
In [24]: params = np.array([[1, 2], [1, 3], [1, 4], [1, 5]])
For convenience, separate the parameters into arrays of mu and sigma values.
In [25]: mu = params[:, 0]
In [26]: sig = params[:, 1]
We'll use scipy.stats.norm to compute the PDF.
In [27]: from scipy.stats import norm
This call computes the PDF for the desired combinations of x and parameters. mu.reshape(-1, 1) and sig.reshape(-1, 1) are 2D arrays with shape (4, 1). x has shape (3,), so when these arguments are broadcast, the result has shape (4, 3). Each row is the PDF evaluated at x for one of the pairs of mu and sigma.
In [28]: p = norm.pdf(x, loc=mu.reshape(-1, 1), scale=sig.reshape(-1, 1))
In [29]: p
Out[29]:
array([[ 0.19947114, 0.17603266, 0.12098536],
[ 0.13298076, 0.12579441, 0.10648267],
[ 0.09973557, 0.09666703, 0.08801633],
[ 0.07978846, 0.07820854, 0.07365403]])
In other words, the rows of p are:
norm.pdf(x, loc=mu[0], scale=sig[0])
norm.pdf(x, loc=mu[1], scale=sig[1])
norm.pdf(x, loc=mu[2], scale=sig[2])
norm.pdf(x, loc=mu[3], scale=sig[3])
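To see that the (4, 3) result comes purely from broadcasting, here is a minimal sketch that writes the normal density by hand with NumPy instead of calling scipy.stats.norm. The helper name normal_pdf is made up for this illustration.

```python
import numpy as np

def normal_pdf(x, mu, sig):
    # Density of N(mu, sig**2), written with elementwise operations so that
    # the arguments broadcast against each other.
    z = (x - mu) / sig
    return np.exp(-0.5 * z ** 2) / (sig * np.sqrt(2.0 * np.pi))

x = np.array([1, 2, 3])
params = np.array([[1, 2], [1, 3], [1, 4], [1, 5]])
mu = params[:, 0].reshape(-1, 1)   # shape (4, 1)
sig = params[:, 1].reshape(-1, 1)  # shape (4, 1)

p = normal_pdf(x, mu, sig)         # (3,) against (4, 1) broadcasts to (4, 3)
print(p.shape)
print(p)
```

The printed array should agree with the norm.pdf output shown above to the displayed precision.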
This is just an idea for shortening the code and making more use of libraries.
Your code does not actually use numpy or scipy, so the question is whether you want numpy arrays for further data processing.
Option 1: use a list to represent an array and a list of lists to represent a matrix:
from itertools import product
matrix_list = [sum(param) * input_x for param, input_x in product(params, inputs)]
# Group the flat list into rows of len(inputs); each row comes back as a tuple
matrix = list(zip(*[iter(matrix_list)] * len(inputs)))
print(matrix)
Credit for the zip idiom goes to the question "convert a flat list to list of list in python".
Option 2: use numpy.array for further processing:
from itertools import product
import numpy as np
matrix_array = np.array([sum(param) * input_x for param, input_x in product(params, inputs)])
matrix = matrix_array.reshape(len(params), len(inputs))
print(matrix)
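For the toy addXY example specifically, the nested loop can also be replaced by broadcasting alone, without itertools. A sketch, assuming params and inputs as defined in the question:

```python
import numpy as np

params = np.array([[1, 2], [1, 3], [1, 4], [1, 5]])
inputs = np.array([1, 2, 3])

# addXY(x, y) for every parameter pair, kept as a column of shape (4, 1).
sums = params.sum(axis=1, keepdims=True)

# (4, 1) * (3,) broadcasts to (4, 3): one row per parameter pair.
matrix = sums * inputs
print(matrix)
```

This only works because addition and multiplication are elementwise array operations; an arbitrary Python function in place of addXY would need to be vectorized first.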

Use Python SciPy to compute the Rodrigues formula P_n(x) (Legendre polynomials)

I'm trying to use Python to calculate the Rodrigues formula, P_n(x).
http://en.wikipedia.org/wiki/Rodrigues%27_formula
That is, I would like a function which takes two input parameters, n and x, and returns the output of this formula.
However, I don't think SciPy has this function yet. SciPy does offer a Legendre module:
http://docs.scipy.org/doc/numpy/reference/routines.polynomials.legendre.html
I don't think any of these is the Rodrigues formula. Am I wrong?
Is there a standard way SciPy offers to do this?
EDIT: I would like the input parameters to be arrays, not just single input values.
If you simply want P_n(x), then you can create a suitable object representing the P_n polynomial using scipy.special.legendre and call it with your values of x:
In [1]: from scipy.special import legendre
In [2]: n = 3
In [3]: Pn = legendre(n)
In [4]: Pn(2.5)
Out[4]: 35.3125 # P_3(2.5)
The object Pn is, in a sense, the "output" of the Rodrigues formula: it is a polynomial of the required order, which can be evaluated at a provided value of x. If you want a single function that takes n and x, you can use eval_legendre:
In [5]: from scipy.special import eval_legendre
In [6]: eval_legendre(3, 2.5)
Out[6]: 35.3125
As noted in the docs, this is the recommended way to do it for large-ish n (e.g. n > 20), instead of creating a polynomial object with all the coefficients which does not handle rounding errors and numerical stability as well.
EDIT: Both approaches work with arrays (at least for the x argument). For example:
In [7]: x = np.array([0, 1, 2, 5, 10])
In [8]: Pn(x)
Out[8]:
array([ 0.00000000e+00, 1.00000000e+00, 1.70000000e+01,
3.05000000e+02, 2.48500000e+03])
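For reference, the Rodrigues formula itself, P_n(x) = 1/(2^n n!) d^n/dx^n (x^2 - 1)^n, can be applied directly to polynomial coefficients with numpy.polynomial. This is only a sketch for checking small n against the values above; the function name rodrigues_legendre is made up here, and for real work eval_legendre remains the better choice.

```python
from math import factorial

import numpy as np
from numpy.polynomial import Polynomial

def rodrigues_legendre(n):
    # (x**2 - 1)**n as a polynomial; coefficients are listed lowest degree first.
    base = Polynomial([-1.0, 0.0, 1.0]) ** n
    # Differentiate n times, then scale by 1 / (2**n * n!).
    return base.deriv(n) / (2 ** n * factorial(n))

P3 = rodrigues_legendre(3)
print(P3(2.5))
print(P3(np.array([0, 1, 2, 5, 10])))
```

Like the object returned by scipy.special.legendre, this stores explicit coefficients, so the same caveat about rounding errors for large n applies.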

Overcoming broadcasting error for Legendre polynomials, scipy eval_legendre

I am trying to evaluate the Legendre polynomial P_n(x) with scipy's special function
scipy.special.eval_legendre(n, x)
which evaluates a Legendre polynomial at given points. I would then like to sum these Legendre polynomials together, \Sigma_n P_n(x).
Begin by evaluating P_n(x) at several n values, say n = 0 through 10. Define an array
arr = np.arange(11)  # array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
and you can evaluate P_n(x) at these values.
My argument, however, is a 100 by 100 matrix. So,
eval_legendre(np.arange(11), matrix)
will not work, as there's a broadcasting error. That's easy to overcome.
But then, I would like to take the sum of all of these Legendre polynomials
"Sum = P_0(x) + P_1(x) + P_2(x) + ... + P_10(x)"
using
import numpy as np
np.sum()
That is more complex, as I am summing each P_n(x).
I suspect the correct approach is something like
for i in arr:
    np.sum(i, matrix)
Is there a cleaner, tidier way to do this?
This should do the job:
sum([eval_legendre(n, matrix) for n in range(11)])
Each call to eval_legendre returns an array with the same shape as the matrix you pass to it. So we can build a list of these arrays with a list comprehension, one for each n from 0 to 10, and sum them as you suggested.
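An alternative that avoids the Python-level loop is numpy.polynomial.legendre.legval, which evaluates a Legendre series sum_n c_n P_n(x) directly; with every coefficient set to 1 that is the requested sum. This swaps NumPy's polynomial module in for scipy's eval_legendre, and the 100 by 100 matrix below is made-up sample data.

```python
import numpy as np
from numpy.polynomial import legendre as L

# Made-up stand-in for the 100 x 100 argument, with entries in [-1, 1].
matrix = np.linspace(-1.0, 1.0, 100 * 100).reshape(100, 100)

n_max = 10
coeffs = np.ones(n_max + 1)  # c_0 = c_1 = ... = c_10 = 1

# P_0(x) + P_1(x) + ... + P_10(x), evaluated elementwise on the matrix.
total = L.legval(matrix, coeffs)
print(total.shape)
```

Staying with scipy, eval_legendre should also broadcast if the orders are given extra axes, for example np.arange(11)[:, None, None] against the matrix, followed by a sum over axis 0; treat that as an untested suggestion.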

Calculating Covariance with Python and Numpy

I am trying to figure out how to calculate covariance with the Python Numpy function cov. When I pass it two one-dimensional arrays, I get back a 2x2 matrix of results. I don't know what to do with that. I'm not great at statistics, but I believe covariance in such a situation should be a single number. This is what I am looking for. I wrote my own:
def cov(a, b):
    if len(a) != len(b):
        return
    a_mean = np.mean(a)
    b_mean = np.mean(b)
    total = 0  # avoids shadowing the built-in sum
    for i in range(0, len(a)):
        total += ((a[i] - a_mean) * (b[i] - b_mean))
    return total / (len(a) - 1)
That works, but I figure the Numpy version is much more efficient, if I could figure out how to use it.
Does anybody know how to make the Numpy cov function perform like the one I wrote?
Thanks,
Dave
When a and b are 1-dimensional sequences, numpy.cov(a,b)[0][1] is equivalent to your cov(a,b).
The 2x2 array returned by np.cov(a,b) has elements equal to
cov(a,a) cov(a,b)
cov(a,b) cov(b,b)
(where, again, cov is the function you defined above.)
Thanks to unutbu for the explanation. By default numpy.cov calculates the sample covariance. To obtain the population covariance you can specify normalisation by the total N samples like this:
numpy.cov(a, b, bias=True)[0][1]
or like this:
numpy.cov(a, b, ddof=0)[0][1]
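A short sketch tying these pieces together, using small made-up sample arrays: it pulls the single covariance number out of the 2x2 matrix, compares the sample and population versions, and checks the result against the formula from the question written without a loop.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])

c = np.cov(a, b)       # 2x2 matrix: [[var(a), cov(a, b)], [cov(a, b), var(b)]]
sample_cov = c[0, 1]   # sample covariance, divides by N - 1
population_cov = np.cov(a, b, ddof=0)[0, 1]  # population covariance, divides by N

# Same sample covariance written out directly, without a Python loop.
manual = ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)
print(sample_cov, population_cov, manual)
```

The diagonal entries c[0, 0] and c[1, 1] are the sample variances of a and b, which is why the function returns a matrix rather than a single number.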
Note that starting in Python 3.10, one can obtain the covariance directly from the standard library,
using statistics.covariance, which is a measure (the number you're looking for) of the joint variability of two inputs:
from statistics import covariance
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
covariance(x, y)
# 0.75
