How to execute a sigma (summation) operation in Python

I am trying to create a function that computes this formula:
Formula non-screenshot:
distance = sigma * (( observed - expected)**2 / expected )
This is my current code:
def distance(observed, expected):
    num = (observed - expected)**2
    den = expected
    dist = sigma * (num / den)
    return dist
I have no idea how I would compute sigma, so I appreciate any help/feedback!
Thanks!

Sigma here means the sum over multiple observed and expected pairs.
For example:
If observed is a list of numbers [ 1,1,3,3,...]
and expected is a list of expected values corresponding to the observed values, say, [1.2,1.3,3.1,3.2...]
Then you are required to find the sum over their individual distances.
def distance(observed, expected):
    res = 0
    for o, e in zip(observed, expected):
        res += (o - e)**2 / e
    return res

Sigma here means summation over the range, not multiplication.
observed and expected must be lists of numbers of the same length.
def distance(observed, expected):
    sample_space_length = len(observed)
    distance = 0
    for x in range(sample_space_length):
        distance += ((observed[x] - expected[x]) ** 2) / expected[x]
    return distance

Here observed should be a list of numbers, and expected a single expected value (the mean of the observations is used below).
# Python 2 code: the integer division in sum(observed)/len(observed)
# gives expected = 13, which is what produces the output below
def distance(observed, expected):
    return sum((item - expected)**2 * 1.0 / expected for item in observed)

observed = [1, 3, 45, 56, 3, 2, 4, 5, 6, 7]
expected = sum(observed) / len(observed)
print distance(observed, expected)
274.461538462
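For larger inputs, the same summation can also be written without an explicit Python loop using numpy. A minimal sketch, assuming both inputs are equal-length sequences:

import numpy as np

def distance(observed, expected):
    # Elementwise (observed - expected)**2 / expected, then sum the per-pair terms
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return np.sum((observed - expected) ** 2 / expected)

print(distance([1, 1, 3, 3], [1.2, 1.3, 3.1, 3.2]))  # sums the per-pair distances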


Taylor series for log(x)

I'm trying to evaluate a Taylor polynomial for the natural logarithm, ln(x), centred at a=1 in Python. I'm using the series given on Wikipedia; however, when I try a simple calculation like ln(2.7), instead of getting something close to 1 it gives me a gigantic number. Is there something obvious that I'm doing wrong?
def log(x):
    n = 1000
    s = 0
    for i in range(1, n):
        s += ((-1)**(i+1)) * ((x-1)**i) / i
    return s
Using the Taylor series ln(x) = Σ_{i=1}^{∞} (-1)^(i+1) * (x-1)^i / i, this gives a huge number instead of something close to 1.
EDIT: If anyone stumbles across this an alternative way to evaluate the natural logarithm of some real number is to use numerical integration (e.g. Riemann sum, midpoint rule, trapezoid rule, Simpson's rule etc) to evaluate the integral that is often used to define the natural logarithm;
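For example, a minimal midpoint-rule sketch of that integral definition, ln(x) = ∫ from 1 to x of dt/t (the function name and subinterval count are illustrative):

import math

def ln_midpoint(x, n=100000):
    """Approximate ln(x) for x > 0 as the integral of 1/t from 1 to x,
    using the midpoint rule with n subintervals."""
    h = (x - 1) / n
    total = 0.0
    for i in range(n):
        t_mid = 1 + (i + 0.5) * h   # midpoint of the i-th subinterval
        total += 1.0 / t_mid
    return total * h

print(ln_midpoint(2.7))   # close to math.log(2.7)
print(math.log(2.7))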
That series only converges for 0 < x <= 2 (i.e. |x - 1| <= 1), so it diverges for x = 2.7. For larger x you will need a different series.
For example this one (found here):
def ln(x): return 2*sum(((x-1)/(x+1))**i/i for i in range(1,100,2))
output:
ln(2.7) # 0.9932517730102833
math.log(2.7) # 0.9932517730102834
Note that it takes a lot more than 100 terms to converge as x gets bigger (up to a point where it becomes impractical).
You can compensate for that by adding the logarithms of smaller factors of x:
def ln(x):
    if x > 2: return ln(x/2) + ln(2)  # ln(x) = ln(x/2 * 2) = ln(x/2) + ln(2)
    return 2*sum(((x-1)/(x+1))**i/i for i in range(1,1000,2))
which is something you can also do in your Taylor based function to support x>1:
def log(x):
    if x > 1: return log(x/2) - log(0.5)  # ln(2) = -ln(1/2)
    n = 1000
    s = 0
    for i in range(1, n):
        s += ((-1)**(i+1)) * ((x-1)**i) / i
    return s
These series also take more terms to converge when x gets closer to zero, so you may want to work them in the other direction as well, keeping the value you actually compute between 0.5 and 1:
def log(x):
    if x > 1: return log(x/2) - log(0.5)    # ln(x/2 * 2) = ln(x/2) + ln(2)
    if x < 0.5: return log(2*x) + log(0.5)  # ln(x*2 / 2) = ln(x*2) - ln(2)
    ...
If performance is an issue, you'll want to store ln(2) or log(0.5) somewhere and reuse it instead of computing it on every call. For example:
ln2 = None

def ln(x):
    if x <= 2:
        return 2*sum(((x-1)/(x+1))**i/i for i in range(1,10000,2))
    global ln2
    if ln2 is None: ln2 = ln(2)
    n2 = 0
    while x > 2: x, n2 = x/2, n2+1
    return ln2*n2 + ln(x)
The program is correct, but the Mercator series has the following caveat:
The series converges to the natural logarithm (shifted by 1) whenever −1 < x ≤ 1.
The series diverges when x > 1 (and in your code the series variable is x - 1 = 1.7), so you shouldn't expect a result close to 1.
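A quick numeric check of that, printing partial sums of the question's series at x = 2.7, shows the sums growing in magnitude rather than converging:

x = 2.7
s = 0.0
for i in range(1, 51):
    s += ((-1)**(i + 1)) * ((x - 1)**i) / i
    if i % 10 == 0:
        print(i, s)   # the magnitude keeps growing as more terms are added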
The Python function math.frexp(x) can be used to advantage here to modify the problem so that the Taylor series is working with a value close to one. math.frexp(x) is described as:
Return the mantissa and exponent of x as the pair (m, e). m is a float
and e is an integer such that x == m * 2**e exactly. If x is zero,
returns (0.0, 0), otherwise 0.5 <= abs(m) < 1. This is used to “pick
apart” the internal representation of a float in a portable way.
Using math.frexp(x) should not be regarded as "cheating" because it is presumably implemented just by accessing the bit fields in the underlying binary floating point representation. It isn't absolutely guaranteed that the representation of floats will be IEEE 754 binary64, but as far as I know every platform uses this. sys.float_info can be examined to find out the actual representation details.
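For example, a quick check of what frexp returns for the value used earlier:

import math
print(math.frexp(2.7))   # (0.675, 2), since 2.7 == 0.675 * 2**2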
Much like the other answer does, you can use the standard logarithmic identities as follows: Let m, e = math.frexp(x). Then log(x) = log(m * 2**e) = log(m) + e * log(2). log(2) can be precomputed to full precision ahead of time and is just a constant in the program. Here is some code illustrating this to compute the two similar Taylor series approximations to log(x). The number of terms in each series was determined by trial and error rather than rigorous analysis.
taylor1 implements log(1 + x) = x - (1/2)*x^2 + (1/3)*x^3 - ...
taylor2 implements log(x) = 2 * [t + (1/3)*t^3 + (1/5)*t^5 + ...], where t = (x - 1) / (x + 1).
import math

_LOG_OF_2 = 0.69314718055994530941723212145817656807550013436025

def taylor1(x):
    m, e = math.frexp(x)
    log_of_m = 0
    num_terms = 36
    sign = 1
    m_minus1_power = m - 1
    for k in range(1, num_terms + 1):
        log_of_m += sign * m_minus1_power / k
        sign = -sign
        m_minus1_power *= m - 1
    return log_of_m + e * _LOG_OF_2

def taylor2(x):
    m, e = math.frexp(x)
    num_terms = 12
    half_log_of_m = 0
    t = (m - 1) / (m + 1)
    t_squared = t * t
    t_power = t
    denominator = 1
    for k in range(num_terms):
        half_log_of_m += t_power / denominator
        denominator += 2
        t_power *= t_squared
    return 2 * half_log_of_m + e * _LOG_OF_2
This seems to work well over most of the domain of log(x), but as x approaches 1 (and log(x) approaches 0) the transformation provided by x = m * 2**e actually produces a less accurate result. So a better algorithm would first check whether x is close to 1, say abs(x - 1) < 0.5, and if so just compute the Taylor series approximation directly on x.
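A minimal sketch of that check, reusing the taylor2 series above for the near-1 case (the 0.5 threshold and the name log_near_one are illustrative):

def log_near_one(x):
    # Near x = 1, skip the frexp rescaling and apply the
    # t = (x - 1)/(x + 1) series to x directly.
    if abs(x - 1) < 0.5:
        t = (x - 1) / (x + 1)
        t_squared = t * t
        t_power = t
        half_log = 0.0
        denominator = 1
        for _ in range(12):
            half_log += t_power / denominator
            denominator += 2
            t_power *= t_squared
        return 2 * half_log
    return taylor2(x)   # fall back to the frexp-based version otherwise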
My answer just uses the Taylor series for ln(x). I really hope this helps. It is simple and straight to the point.

Taylor expansion in python

How do I calculate and print out the value of ln(1+x) using the series expansion:
ln(1+x) = Σ_{n=1}^{∞} [ x^(2n-1)/(2n-1) - x^(2n)/(2n) ]
using a while loop, and including terms whose magnitude is greater than 10^-8. Print out the sum for each number of terms to show the result converging.
So far this is my code but it calculates lnsum2 to be a very large number and hence never ends.
n = 1
lnsum2 = np.cumsum((((-1)**(n+1) * (x**n) / n)))
while lnsum2 > 10**-8:
    n += 1
    lnsum2 = lnsum2 + np.cumsum((((-1)**(n+1) * (x**n) / n)))
else:
    print('The sum of terms greater than 10^-8 is:', lnsum2)
Many thanks.
Right I've now got code that works using a while loop. Thanks for all the help!!
Maybe it's a bit overkill, but here's a nice solution using sympy to evaluate the infinite series.
from sympy.abc import k
from sympy import Sum, oo as inf
import math

x = 0.5
result = Sum(
    (x**(2*k - 1) / (2*k - 1)) - (x**(2*k) / (2*k)),
    (k, 1, inf)
).doit()

# print(result)               # 0.5*hyper((0.5, 1), (3/2,), 0.25) - 0.14384103622589
print(float(result))          # 0.4054651081081644
print(math.log(x+1, math.e))  # 0.4054651081081644
EDIT:
I think the problem with your original code is that you haven't quite implemented the series (if I'm understanding the figure in your question correctly). It looks like the series you're trying to implement can be represented as
Σ [ x^(2n-1)/(2n-1) - x^(2n)/(2n) ]  for n = 1 to n = infinity
whereas your code actually implements this series:
(-1)^2 * (x^1)/1 + Σ [ (-1)^(n+1) * (x^n)/n ]  for n = 2 to n = infinity
EDIT 2:
If you really have to do the iterations yourself, rather than using sympy, here is code which works:
import math

x = 0.5
n = 0
sums = []
while True:
    n += 1
    this_sum = (x**(2*n - 1) / (2*n - 1)) - (x**(2*n) / (2*n))
    if abs(this_sum) < 1e-8:
        break
    sums.append(this_sum)

lnsum = sum(sums)
print('The sum of terms greater than 10^-8 is:\t\t', lnsum)
print('math.log yields:\t\t\t\t', math.log(x+1, math.e))
Output:
The sum of terms greater than 10^-8 is: 0.4054651046035002
math.log yields: 0.4054651081081644

Weighted averaging a list

Thanks for your responses. Yes, I was looking for the weighted average.
rate = [14.424, 14.421, 14.417, 14.413, 14.41]
amount = [3058.0, 8826.0, 56705.0, 30657.0, 12984.0]
I want the weighted average of the top list based on each item of the bottom list.
So, if the first bottom-list item is small (such as 3,058 compared to the total 112,230), then the first top-list item should have less of an effect on the top-list average.
Here is some of what I have tried. It gives me an answer that looks right, but I am not sure if it follows what I am looking for.
for g in range(len(rate)):
    rate[g] = rate[g] * (amount[g] / sum(amount))
rate = sum(rate)
EDIT:
After comparing other responses with my code, I decided to use the zip-based version to keep it as short as possible.
You could use numpy.average to calculate the weighted average.
In [13]: import numpy as np
In [14]: rate = [14.424, 14.421, 14.417, 14.413, 14.41]
In [15]: amount = [3058.0, 8826.0, 56705.0, 30657.0, 12984.0]
In [17]: weighted_avg = np.average(rate, weights=amount)
In [19]: weighted_avg
Out[19]: 14.415602815646439
for g in range(len(rate)):
    rate[g] = rate[g] * amount[g] / sum(amount)
rate = sum(rate)
is the same as:
sum(rate[g] * amount[g] / sum(amount) for g in range(len(rate)))
which is the same as:
sum(rate[g] * amount[g] for g in range(len(rate))) / sum(amount)
which is the same as:
sum(x * y for x, y in zip(rate, amount)) / sum(amount)
Result:
14.415602815646439
This looks like a weighted average.
values = [1, 2, 3, 4, 5]
weights = [2, 8, 50, 30, 10]

s = 0
for x, y in zip(values, weights):
    s += x * y
average = s / sum(weights)
print(average)  # 3.38
This outputs 3.38, which indeed tends more toward the values with the highest weights.
Let's use Python's zip function.
zip([iterable, ...])
This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The returned list is truncated in length to the length of the shortest argument sequence. When there are multiple arguments which are all of the same length, zip() is similar to map() with an initial argument of None. With a single sequence argument, it returns a list of 1-tuples. With no arguments, it returns an empty list.
values = [14.424, 14.421, 14.417, 14.413, 14.41]       # the rates to be averaged
weights = [3058.0, 8826.0, 56705.0, 30657.0, 12984.0]  # the amounts used as weights
weighted_average = sum(value * weight for value, weight in zip(values, weights)) / sum(weights)
As a documented and tested function:
def weighted_average(values, weights=None):
    """
    Returns the weighted average of `values` with weights `weights`.
    Returns the simple arithmetic average if `weights` is None.

    >>> weighted_average([3, 9], [1, 2])
    7.0
    >>> 7 == (3*1 + 9*2) / (1 + 2)
    True
    """
    if weights is None:
        weights = [1] * len(values)
    normalization = 0
    val = 0
    for value, weight in zip(values, weights):
        val += value * weight
        normalization += weight
    return val / normalization
For completeness another version where the values and weights are stored in tuples:
def weighted_average(values_and_weights):
    """
    The input is expected in the form:
    [(value_1, weight_1), (value_2, weight_2), ... (value_n, weight_n)]

    >>> weighted_average([(3, 1), (9, 2)])
    7.0
    >>> 7 == (3*1 + 9*2) / (1 + 2)
    True
    """
    normalization = 0
    val = 0
    for value, weight in values_and_weights:
        val += value * weight
        normalization += weight
    return val / normalization
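On Python 3.11 and later, the standard library can also do this directly, since statistics.fmean accepts a weights argument (a minimal sketch, assuming that version is available):

from statistics import fmean

rate = [14.424, 14.421, 14.417, 14.413, 14.41]
amount = [3058.0, 8826.0, 56705.0, 30657.0, 12984.0]
print(fmean(rate, weights=amount))   # same weighted average as numpy.average above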

Implementation of Theil inequality index in python

I am trying to implement Theil's index (http://en.wikipedia.org/wiki/Theil_index) in Python to measure inequality of revenue in a list.
The formula is basically Shannon's entropy, so it deals with log. My problem is that I have a few revenues at 0 in my list, and log(0) makes my formula unhappy. I believe adding a tiny float to 0 wouldn't work as log(tinyFloat) = -inf, and that would mess my index up.
[EDIT]
Here's a snippet (taken from another, much cleaner and freely available, implementation):
from math import log, exp

def error_if_not_in_range01(value):
    if (value <= 0) or (value > 1):
        raise Exception(str(value) + ' is not in [0,1)!')

def H(x):
    n = len(x)
    entropy = 0.0
    sum = 0.0
    for x_i in x:  # work on all x[i]
        print(x_i)
        error_if_not_in_range01(x_i)
        sum += x_i
        group_negentropy = x_i * log(x_i)
        entropy += group_negentropy
    error_if_not_1(sum)  # defined elsewhere in the quoted implementation
    return -entropy

def T(x):
    print(x)
    n = len(x)
    maximum_entropy = log(n)
    actual_entropy = H(x)
    redundancy = maximum_entropy - actual_entropy
    inequality = 1 - exp(-redundancy)
    return redundancy, inequality
Is there any way out of this problem?
If I understand you correctly, the formula you are trying to implement is the Theil T index from the Wikipedia page: T = (1/n) * Σ_i (x_i / mean(x)) * ln(x_i / mean(x)).
In this case, your problem is calculating the natural logarithm of x_i / mean(x) when x_i = 0.
However, since that logarithm has to be multiplied by x_i / mean(x), if x_i == 0 the value of ln(x_i / mean(x)) doesn't matter, because it will be multiplied by zero. You can treat that entry's contribution as zero and skip calculating the logarithm entirely.
In the case that you are implementing Shannon's formula H = -Σ_i p_i * ln(p_i) directly, the same holds: calculating the log is not necessary when p_i == 0, because whatever value it is, it will have been multiplied by zero.
UPDATE:
Given the code you quoted, you can replace x_i*log(x_i) with a function as follows:
def Group_negentropy(x_i):
    if x_i == 0:
        return 0
    else:
        return x_i * log(x_i)

def H(x):
    n = len(x)
    entropy = 0.0
    sum = 0.0
    for x_i in x:  # work on all x[i]
        print(x_i)
        error_if_not_in_range01(x_i)
        sum += x_i
        group_negentropy = Group_negentropy(x_i)
        entropy += group_negentropy
    error_if_not_1(sum)  # defined elsewhere in the quoted implementation
    return -entropy
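Putting the zero-handling together, here is a small self-contained sketch of the Theil T index as defined on the Wikipedia page linked in the question (the function name and example data are illustrative):

from math import log

def theil_index(revenues):
    """Theil T index: (1/n) * sum((x_i / mean) * ln(x_i / mean)),
    using the convention that a zero revenue contributes zero."""
    n = len(revenues)
    mean = sum(revenues) / n
    total = 0.0
    for x in revenues:
        if x > 0:                  # skip the log entirely when x == 0
            r = x / mean
            total += r * log(r)
    return total / n

print(theil_index([0.0, 5.0, 5.0, 10.0]))   # about 0.3466; 0.0 would mean perfect equality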

iterative Newton's method

I have got this code to solve Newton's method for a given polynomial and initial guess value. I want to turn it into an iterative process, which is what Newton's method actually is. The program should keep running until the output value x_n becomes constant, and that final value of x_n is the actual root. Also, when this method is used in my algorithm it should always produce a positive root between 0 and 1. So would converting a negative output (root) into a positive number make any difference? Thank you.
import copy

poly = [[-0.25, 3], [0.375, 2], [-0.375, 1], [-3.1, 0]]

def poly_diff(poly):
    """ Differentiate a polynomial. """
    newlist = copy.deepcopy(poly)
    for term in newlist:
        term[0] *= term[1]
        term[1] -= 1
    return newlist

def poly_apply(poly, x):
    """ Apply a value to a polynomial. """
    sum = 0.0
    for term in poly:
        sum += term[0] * (x ** term[1])
    return sum

def poly_root(poly):
    """ Returns a root of the polynomial. """
    poly_d = poly_diff(poly)
    x = float(input("Enter initial guess:"))
    x_n = x - (float(poly_apply(poly, x)) / poly_apply(poly_d, x))
    print(x_n)

if __name__ == "__main__":
    poly_root(poly)
First, in poly_diff, you should check to see if the exponent is zero, and if so simply remove that term from the result. Otherwise you will end up with the derivative being undefined at zero.
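A minimal sketch of poly_diff with that check added (keeping the [coefficient, exponent] representation from the question):

def poly_diff(poly):
    """ Differentiate a polynomial, dropping constant terms instead of
    producing a 0 * x**-1 term that is undefined at x = 0. """
    newlist = []
    for coeff, exp in poly:
        if exp != 0:                      # constant terms vanish when differentiated
            newlist.append([coeff * exp, exp - 1])
    return newlist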
def poly_root(poly):
    """ Returns a root of the polynomial. """
    poly_d = poly_diff(poly)
    x = None
    x_n = float(input("Enter initial guess:"))
    while x != x_n:
        x = x_n
        x_n = x - (float(poly_apply(poly, x)) / poly_apply(poly_d, x))
    return x_n
That should do it. However, I think it is possible that for certain polynomials this may not terminate, due to floating point rounding error. It may end up in a repeating cycle of approximations that differ only in the least significant bits. You might terminate when the percentage of change reaches a lower limit, or after a number of iterations.
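One way to write that stopping rule, a sketch reusing poly_diff and poly_apply from above (the tolerance and iteration cap are illustrative):

def poly_root(poly, x0, tol=1e-12, max_iter=100):
    """ Newton iteration with a relative-change tolerance and an iteration cap,
    so a cycle in the least significant bits cannot loop forever. """
    poly_d = poly_diff(poly)
    x = float(x0)
    for _ in range(max_iter):
        x_next = x - poly_apply(poly, x) / poly_apply(poly_d, x)
        if abs(x_next - x) <= tol * max(1.0, abs(x_next)):
            return x_next
        x = x_next
    return x  # best estimate after max_iter steps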
import copy

poly = [[1, 64], [2, 109], [3, 137], [4, 138], [5, 171], [6, 170]]

def poly_diff(poly):
    newlist = copy.deepcopy(poly)
    for term in newlist:
        term[0] *= term[1]
        term[1] -= 1
    return newlist

def poly_apply(poly, x):
    sum = 0.0
    for term in poly:
        sum += term[0] * (x ** term[1])
    return sum

def poly_root(poly):
    poly_d = poly_diff(poly)
    x = float(input("Enter initial guess:"))
    x_n = x - (float(poly_apply(poly, x)) / poly_apply(poly_d, x))
    print(x_n)

if __name__ == "__main__":
    poly_root(poly)
