I'm trying to calculate the error for the Taylor series I've computed with the following code:
# Define initial values, including appropriate value of x for the series input
import numpy as np
x = -0.9
i = 1
taySum = 0
ln = np.log(1.9)
terms = 1
''' Iterate through the series while checking that
the difference between the obtained series value and ln(1.9)
exceeds 10 digits of accuracy. Stop iterating once the series
value is within 10 digit accuracy of ln(1.9).'''
while abs(taySum - ln) > 0.5e-10:
    taySum += (-1) * pow(x, i) / i
    i += 1
    terms += 1
print('value: {}, terms: {}'.format(taySum, terms))
I need to somehow incorporate the error formula, which involves the kth derivative, and I'm not sure how to do this. The error formula is given on this website:
There is no way to calculate the error in a Taylor series exactly unless you know the exact value it is converging to, which for something like ln 1.9 we don't. The formula you have quoted gives the error in terms of a quantity z with c < z < x (assuming c < x), which in this case means 0 < z < 0.9, but z is otherwise unknown. This means we cannot use this formula to find the exact error (otherwise we could find the exact value of ln 1.9, which is not possible).
What this formula does is put bounds on the error. If you look at the formula you will see it has the same form as the next term in the series, with the argument of f^(n+1) changed from c, the point you are expanding around, to z, our unknown parameter. In other words, it is saying that the error is approximately the same size as the next term in the series, or at least it will be if the parameter you are expanding in is small.
With this in mind, I would approximate the error by simply computing the next term in the series and taking that as the error estimate.
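For example, a minimal sketch of that approach for the series in the question (same variable names as the question; the size of the first omitted term is only an estimate of the error, and the comparison against np.log(1.9) is there just to check it):
import numpy as np

x = -0.9        # series argument, so the sum converges to ln(1.9)
taySum = 0.0
i = 1
tol = 0.5e-10

# Stop when the size of the next term (the error estimate) drops below tol,
# without ever using the exact value of ln(1.9) in the stopping test.
while abs((-1) * pow(x, i) / i) > tol:
    taySum += (-1) * pow(x, i) / i
    i += 1

error_estimate = abs((-1) * pow(x, i) / i)   # size of the first omitted term
actual_error = abs(taySum - np.log(1.9))     # only to check the estimate
print('estimated error:', error_estimate)
print('actual error:   ', actual_error)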
I need some help converting a formula into Python code.
D = 2\sum_{i=1}^{n}\left\{ Y \log\left[\frac{Y}{E(Y)}\right] - \left[Y - E(Y)\right] \right\}
Let's say that you've solved the issue of computing the mean of a random variable like Y. So I'm assuming that you have something called mean(sample: list) which receives a list of values taken by the random variable (R.V.) Y.
Now, to convert the formula into code, there are many possible ways. I personally prefer the ones that are easier to read and follow.
import math
sample = [] # Whatever values your RV takes
mean_of_Y = mean(sample)
D = 0
# This part will make the summation - obs stands for observation
for obs in sample:
    D += obs*math.log(obs/mean_of_Y, 10) - (obs - mean_of_Y)
# I made the assumption that the log's base is 10
# Now we need to multiply it by 2
D = D*2
# If you want to see it
print(D)
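For example, using Python's built-in statistics.mean as a stand-in for the assumed mean(sample) helper, and a small made-up sample (the values below are mine, purely for illustration):
import math
from statistics import mean   # standing in for the assumed mean(sample) helper

sample = [1.2, 0.8, 2.5, 1.9]   # made-up observations of Y
mean_of_Y = mean(sample)

D = 0
for obs in sample:
    D += obs*math.log(obs/mean_of_Y, 10) - (obs - mean_of_Y)
D = D*2
print(D)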
import math
ys = [1,2,3] # 3 classes
expectation = {
    1 : 2.718281828459045,
    2 : 7.38905609893065,
    3 : 20.085536923187668,
}
# I have used exponential values (e**y) as the expectations in this example, just as sample values
D = sum([((y * math.log(y/expectation[y])) - (y - expectation[y])) for y in ys]) * 2
The expected value is computed differently for discrete and continuous random variables.
If the random variable is discrete, the expected value is
E(Y) = \sum_{y} y \, P(Y = y)
where P(Y = y) is given by the probability mass function.
If the random variable is continuous, the expected value is
E(Y) = \int_{a}^{b} y \, f_Y(y) \, dy
where f_Y(y) is the probability density function and (a, b) is the range of the values.
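As a small illustration of the discrete case (the pmf below is a made-up example), the expectation is just the probability-weighted sum of the values:
# Toy probability mass function for a discrete Y (values chosen only for illustration)
pmf = {0: 0.2, 1: 0.5, 2: 0.3}   # P(Y = y) for each possible y
expected_Y = sum(y * p for y, p in pmf.items())
print(expected_Y)   # 0*0.2 + 1*0.5 + 2*0.3 = 1.1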
I have an array "D" that contains dogs and their health conditions.
The classifier() method returns either 1 or 0 and takes one row of the 2D array as input.
I want to compare the classifier result to column 13 of the 2D array
In an ideal case the classifier would always return the same value as specified in that column.
Now I try to calculate the total hitrate of the classifier by adding up successes and dividing it by the total number of results.
So far I have worked out an enumerate for loop to hand over rows to the classifier in sequence.
def accuracy(D, classifier):
    for i, item in enumerate(D):
        if classifier(item)==D[i,13]
        #Compare result of classifier with actual value
            x+=1 #Increase x on a hit
    acc=(x/D.length)
    #Divide x by length of D to calculate hitrate eg. "0.5"; 100% would be "1"
    return acc
There is probably a simple formatting error somewhere or I have an error in my logic.
(Am 2 Days into Python now)
I think I might not be doing the if compare correctly.
Assuming both D and classifier are defined, there are some errors in your code which should all give reasonably clear error messages (apart from the float casting; that one can be tricky in Python).
You're missing a : at the end of the if statement, and you're accessing the array D like D[i, 13], which only works for NumPy arrays; a plain list of lists is accessed with another set of [], like D[i][13]. However, since you're already enumerating the 2D array, you may as well use item[13] to get the value.
Lastly, Python lists have no .length attribute, so use len(D) instead, and if you want a decimal value at the end (in Python 2) you'll also need to cast at least one of the values to a float, like float(x)/len(D); otherwise the division is rounded down to 0 or 1. You also need to initialise x = 0 before the loop.
Fixed code:
def accuracy(D, classifier):
    x = 0
    for i, item in enumerate(D):
        if classifier(item) == D[i][13]:
        # if classifier(item) == item[13]: # This should also work, you can use either.
            x += 1  # Increase x on a hit
    acc = float(x) / len(D)
    # Divide x by length of D to calculate hitrate eg. "0.5"; 100% would be "1"
    return acc
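A quick way to exercise the fixed accuracy function (the rows and the threshold classifier below are made up purely for illustration):
# Made-up data: each row has 14 columns and column 13 holds the true label (0 or 1).
D = [
    [0.2]*13 + [0],
    [0.9]*13 + [1],
    [0.7]*13 + [1],
    [0.1]*13 + [0],
]

def classifier(row):
    # Hypothetical classifier: predict 1 if the first feature exceeds 0.5
    return 1 if row[0] > 0.5 else 0

print(accuracy(D, classifier))   # 1.0 for this toy data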
I am using scipy.optimize.minimize to try to determine the optimal parameters of a probability density function (PDF). My PDF involves a discrete Gaussian kernel (https://en.wikipedia.org/wiki/Gaussian_function and https://en.wikipedia.org/wiki/Scale_space_implementation#The_discrete_Gaussian_kernel).
In theory, I know the average value of the PDF (i.e., where the PDF should be centered). So if I were to calculate the expectation value of my PDF, I should recover the mean value that I already know. My PDF is sampled at discrete values of n (which must never be negative and should start at 0 to make any physical sense), and I am trying to determine the optimal value of t (the "scaling factor") that recovers the average value of the PDF (which, again, I already know ahead of time).
My minimal working example to determine the optimal "scaling factor" t is the following:
#!/usr/bin/env python3
import numpy as np
from scipy.special import iv
from scipy.optimize import minimize
def discrete_gaussian_kernel(t, n):
    return np.exp(-t) * iv(n, t)

def expectation_value(t, average):
    # One constraint is that the starting value
    # of the range over which I sample the PDF
    # should be 0.

    # Method 1 - This seems to give good, consistent results
    int_average = int(average)
    ceiling_average = int(np.ceil(average))
    N = range(int_average - ceiling_average + 1,
              int_average + ceiling_average + 2)

    # Method 2 - The multiplicative factor for 'end' is arbitrary.
    # I should in principle be able to make 'end' as large as
    # I want, since the PDF goes to zero for large values of n,
    # but this seems to impact the result and I do not know why.
    #start = 0
    #end = 2 * int(average)
    #N = range(start, end)

    return np.sum([n * discrete_gaussian_kernel(t, n - average) for n in N])

def minimize_function(t, average):
    return average - expectation_value(t, average)

if __name__ == '__main__':
    average = 8.33342
    #average = 7.33342

    solution = minimize(fun = minimize_function,
                        x0 = 1,
                        args = average)
    print(solution)

    t = solution.x[0]
    print('          solution t =', t)
    print('       given average =', average)
    print('recalculated average =', expectation_value(t, average))
I have two problems with my minimal working example:
1) The code works OK for some of the values I choose for the variable "average". One example is 8.33342. However, the code does not work for other values, for example 7.33342. In this case, I get
RuntimeWarning: overflow encountered in exp
so I was thinking that maybe scipy.optimize.minimize is choosing a bad value for t (like a large negative number). I am confident that this is the problem since I have printed out the value of t in the function expectation_value, and t becomes increasingly negative. So I would like to add bounds to the possible values that "t" can take ("t" should not be negative). Looking at the documentation of scipy.optimize.minimize, there is a bounds keyword argument. So I tried:
solution = minimize(fun = minimize_function,
                    x0 = 1,
                    args = average,
                    bounds = ((0, None)))
but I get the error:
ValueError: length of x0 != length of bounds
I searched for this error on Stack Overflow, and there are some other threads, but I did not find any of them helpful. How can I set the bound successfully?
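(For reference, bounds in scipy.optimize.minimize expects a sequence of (min, max) pairs, one pair per entry of x0, and ((0, None)) is the same as (0, None) because a one-element tuple needs a trailing comma, which is why the lengths don't match. A call of roughly this shape should avoid that error:)
solution = minimize(fun = minimize_function,
                    x0 = 1,
                    args = average,
                    bounds = [(0, None)],    # one (min, max) pair per parameter
                    method = 'L-BFGS-B')     # a method that supports bounds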
2) My other question has to do with scipy.optimize.minimize being sensitive to the range over which I calculate the expectation value. For an average value of
average = 8.33342
and the method of calculating the range as
# Method 1 - This seems to give good, consistent results
int_average = int(average)
ceiling_average = int(np.ceil(average))
N = range(int_average - ceiling_average + 1,
int_average + ceiling_average + 2)
the "recalculated average" is 8.3329696426. But for the other method (which has a very similar range),
# Method 2 - The multiplicative factor for 'end' is arbitrary.
# I should in principle be able to make 'end' as large as
# I want, since the PDF goes to zero for large values of n,
# but this seems to impact the result and I do not know why.
start = 0
end = 2 * int(average)
N = range(start, end)
the "recalculated average" is 8.31991111857. The ranges are similar in each case, so I don't know why there is such a large change, especially since I what my recalculated average to be as close as possible to the true average. And if I were to extend the range to larger values (which I think is reasonable since the PDF goes to zero there),
start = 0
end = 4 * int(average)
N = range(start, end)
the "recalculated average" is 9.12939372912, which is even worse. So is there a consistent method to calculate the range so that the reconstructed average is always as close as possible to the true average? The scaling factor can take on any value so I would think scipy.optimize.minimize should be able to find a scaling factor to get back the true average exactly.
EDIT: after more trial and error, I figured out that for some reason Python says that 1/52 is 0. Can anyone explain why, so I can avoid this problem in the future?
I've been struggling with a script for a while now, mainly because neither I nor my fellow students can figure out what's wrong with it.
To keep things simple: we have data and a model, and we have to rescale some of the data points to the model and then do a chi-square minimization in order to find the best rescaling factor.
I've tried multiple things already. Tried putting everything in 1 loop, when that didn't work, I tried splitting the loops up etc.
The relevant part of my code looks like this:
#Here I pick the values of the model that correspond to the data
y4 = np.zeros((len(l),1))
for x in range(0,len(l)):
    if l[x] < 2.16:
        for y in range(0,len(lmodel)):
            if lmodel[y] == l[x]:
                y4[x] = y2[y]
            elif lmodel[y] < l[x] < lmodel[y+1]:
                y4[x] = (y2[y] + y2[y+1])/2
    else:
        y4[x] = y1[x]

#Do Chi2 calculation
#First, I make a matrix with all the possible rescaled values
chi2 = np.zeros((200,1))
y3 = np.zeros((len(l),len(chi2)))
for z in range(0,len(chi2)):
    for x in range(0,len(l)):
        if l[x] < 2.16:
            y3[x,z] = y1[x]*10**(0.4*Al[x]*z/100)
        else:
            y3[x,z] = y1[x]

#Here I calculate the chisquare for each individual column and put it in the chi2 array
dummy = np.zeros((len(l),1))
for x in range(0,len(chi2)):
    for t in range(0, len(l)):
        dummy[t] = (1/52)*((y3[t,x] - y4[t])/fle[t])**2
    chi2[x] = np.sum(dummy)
The thing is that no matter what I try, my dummy array is always all zeros, making every single chi-square value 0.
I've tried making 'dummy' a matrix and summing afterwards, and I've tried printing individual values from the dummy[t] calculation: some of them were 0 (as expected) and some weren't. So logically, if the individual values aren't all 0, not every value in dummy should be 0 either.
I just can't find where I go wrong, and why I keep getting arrays of zeros.
In Python 2 (which most people are still using), 1 / 52 is an integer division, so returns 0. You can fix it by explicitly using floating point numbers, e.g. 1.0 / 52.
In Python 3 this is no longer true: dividing two integers with / always performs true division and returns a float.
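A short illustration of the difference and the usual workarounds:
print(1 / 52)        # 0 in Python 2 (integer division); 0.01923... in Python 3
print(1.0 / 52)      # 0.01923... in both, because one operand is a float
print(1 / float(52)) # another way to force true division

# Alternatively, at the top of a Python 2 file:
# from __future__ import division   # makes / behave as in Python 3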
I am working on a homework problem for which I am supposed to make a function that interpolates sin(x) for n+1 interpolation points and compares the interpolation to the actual values of sin at those points. The problem statement asks for a function Lagrangian(x, points) that accomplishes this, although my current attempt does not use 'x' and 'points' in the loops, so I think I will have to try again (especially since my code doesn't work as is!). However, why can't I access the items in the x_n array with an index, like x_n[k]? Additionally, is there a way to access only the 'x' values in the points array and loop over those for L_x? Finally, I think my 'error' definition is wrong, since it should also be an array of values. Is it necessary to make another for loop to compare each value in the 'error' array to 'max_error'? This is my code right now (we are executing in a GUI our professor made, so I think some of the commands are unique to that, such as messages.write()):
def problem_6_run(problem_6_n, problem_6_m, plot, messages, **kwargs):
    n = problem_6_n.value
    m = problem_6_m.value
    messages.write('\n=== PROBLEM 6 ==========================\n')
    x_n = np.linspace(0,2*math.pi,n+1)
    y_n = np.sin(x_n)
    points = np.column_stack((x_n,y_n))
    i = 0
    k = 1
    L_x = 1.0
    def Lagrange(x, points):
        for i in n+1:
            for k in n+1:
                return L_x = (x- x_n[k] / x_n[i] - x_n[k])
            return Lagrange = y_n[i] * L_x
    error = np.sin(x) - Lagrange
    max_error = 0
    if error > max_error:
        max_error = error
    print.messages('Maximum error = &g' % max_error)
    plot.draw_lines(n+1,np.sin(x))
    plot.draw_points(m,Lagrange)
    plots.draw_points(m,error)
Edited:
Yes, the different things ThiefMaster mentioned are part of my (non CS) professor's environment; and yes, voithos, I'm using numpy and at this point have definitely had more practice with Matlab than Python (I guess that's obvious!). n and m are values entered by the user in the GUI; n+1 is the number of interpolation points and m is the number of points you plot against later.
Pseudocode:
Given n and m
Generate x_n a list of n evenly spaced points from 0 to 2*pi
Generate y_n a corresponding list of points for sin(x_n)
Define points, a 2D array consisting of these ordered pairs
Define Lagrange, a function of x and points
for each value in the range n+1 (this is where I would like to use points but don't know how to access those values appropriately)
evaluate y_n * (x - x_n[later index] / x_n[earlier index] - x_n[later index])
Calculate max error
Calculate error interpolation Lagrange - sin(x)
plot sin(x); plot Lagrange; plot error
Does that make sense?
Some suggestions:
You can access items in x_n via x_n[k] (to answer your question).
Your loops for i in n+1: and for k in n+1: won't work as written: an integer isn't iterable, so Python will raise a TypeError. You need for i in range(n+1) (or xrange) to get the whole list of values [0,1,2,...,n].
in error = np.sin(x) - Lagrange: You haven't defined x anywhere, so this will probably result in an error. Did you mean for this to be within the Lagrange function? Also, you're subtracting a function (Lagrange) from a number np.sin(x), which isn't going to end well.
When you use the return statement in your def Lagrange you are exiting your function. So your loop will never loop more than once because you're returning out of the function. I think you might actually want to store those values instead of returning them.
Can you write some pseudocode to show what you'd like to do? e.g.:
Given a set of points `xs` and "interpolated" points `ys`:
For each point (x,y) in (xs,ys):
Calculate `sin(x)`
Calculate `sin(x)-y` being the difference between the function and y
.... etc etc
This will make the actual code easier for you to write, and easier for us to help you with (especially if you intellectually understand what you're trying to do, and the only problem is with converting that into python).
So: try to fix up some of these points in your code, and try to write some pseudocode to say what you want to do, and we'll keep helping you :)
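In case it helps once the pseudocode is settled, here is a minimal, generic sketch of Lagrange interpolation of sin(x) on n+1 nodes and the maximum error over m evaluation points (this is an illustration of the technique, not the Lagrangian(x, points) interface the assignment asks for):
import numpy as np

def lagrange_eval(x, xs, ys):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i in range(len(xs)):
        L_i = 1.0
        for k in range(len(xs)):
            if k != i:
                L_i *= (x - xs[k]) / (xs[i] - xs[k])
        total += ys[i] * L_i
    return total

n, m = 6, 50                              # example sizes
xs = np.linspace(0, 2 * np.pi, n + 1)     # n+1 interpolation nodes
ys = np.sin(xs)

x_eval = np.linspace(0, 2 * np.pi, m)     # m points to compare against sin(x)
errors = [abs(np.sin(x) - lagrange_eval(x, xs, ys)) for x in x_eval]
print('Maximum error = %g' % max(errors))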