using trapezoidal numerical integration in python - python

I need to integrate this function using trapezoidal rule in python:
theta = .518/r^2 * dr/(sqrt(2*1.158 + 2/r - .518^2/2r^2))
I have written my code and I should be seeing an ellipsoidal structure when plotted. theta should run from 0 to 2pi and r_min = .16 & r_max = .702
import numpy as np
import matplotlib.pyplot as plt
def trapezoidal(f, a, b, n):
h = float(b-a)/n
result = 0.5*f(a) + 0.5*f(b)
for i in range(1, n):
result += f(a + i*h)
result *= h
return result
intg =[]
v = lambda r: (0.5108/(r**2))* (1./np.sqrt(2*1.158+(2/r)-.5108**2/(2*r**2)))
n = np.arange(1,1000,100)
theta = np.arange(0,2*np.pi,100)
for j in n:
numerical = trapezoidal(v, .16,.702 , j)
intg.append(numerical)
plt.plot(numerical,theta)
plt.show()
I am doing some very elementary mistake I guess, because I am getting no plot out of it. I think the trapezoidal routine is correct, because it worked for other functions. your help is very appreciated

Alternatively, you could use quadpy (a project of mine).
import numpy as np
import quadpy
val = quadpy.line_segment.integrate_split(
lambda r: 0.5108/r**2 / np.sqrt(2*1.158 + 2/r - 0.5108**2/(2*r**2)),
0.15, 0.702, 100,
quadpy.line_segment.Trapezoidal()
)
print(val)
gives 0.96194633532. The trapezoidal formula is mostly implemented for historical purposes, however. A better and equally simple rule is quadpy.line_segment.Midpoint. An even better approach is certainly adaptive quadrature
val, error_estimate = quadpy.line_segment.integrate_adaptive(
lambda r: 0.5108/r**2 / np.sqrt(2*1.158 + 2/r - 0.5108**2/(2*r**2)),
[0.15, 0.702],
1.0e-10
)
print(val)
which gives the more accurate 0.961715309492, or even tanh-sinh quadrature
val, error_estimate = quadpy.line_segment.tanh_sinh(
lambda r: 0.5108/r**2 / np.sqrt(2*1.158 + 2/r - 0.5108**2/(2*r**2)),
0.15, 0.702,
1.0e-30
)
print(val)
which gives 0.9617153094932353183036398697528.

There are a couple of issues here.
First one is that the third argument in np.arrange is not the number of values to be generated but the step. This means that theta will have only one value and that n and thus intg will have 10 instead of 100 values.
Assuming that was your intention (100 values) you can do this
intg =[]
v = lambda r: (0.5108/(r**2))* (1./np.sqrt(2*1.158+(2/r)-.5108**2/(2*r**2)))
n = np.arange(1,1000,10)
theta = np.arange(0,2*np.pi,2*np.pi/100)
#print theta
for j in n:
numerical = trapezoidal(v, .16,.702 , j)
intg.append(numerical)
Then you're plotting numerical which is basically a single number and what you probably wanted to plot was the integral value intg - to do so you also need to convert intg from a list into np.array:
intg = np.array(intg)
With these changes the program works as intended,
plt.plot(intg,theta)
plt.show()

If you print the lengths of your numerical and theta, you will see that they are empty lists/arrays.
Try the following:
import numpy as np
import matplotlib.pyplot as plt
def trapezoidal(f, a, b, n):
h = float(b-a)/n
result = 0.5*f(a) + 0.5*f(b)
for i in range(1, n):
result += f(a + i*h)
result *= h
return result
intg =[]
v = lambda r: (0.5108/(r**2))* (1./np.sqrt(2*1.158+(2/r)-.5108**2 /(2*r**2)))
n = np.arange(1, 1001)
theta = np.linspace(0,2.*np.pi,1000)
for j in n:
numerical = trapezoidal(v, .16,.702 , j)
intg.append(numerical)
plt.plot(intg,theta)
plt.show()

Related

how to implement least square polynomial with no built in methods using python?

currently running into a problem solving this.
The objective of the exercise given is to find a polynom of certian degree (the degree is given) from a dataset of points (that can be noist) and to best fit it using least sqaure method.
I don't understand the steps that lead to solving the linear equations?
what are the steps or should anyone provide such a python program that lead to the matrix that I put as an argument in my decomposition program?
Note:I have a python program for cubic splines ,LU decomposition/Guassian decomposition.
Thanks.
I tried to apply guassin / LU decomposition straight away on the dataset but I understand there are more steps to the solution...
I donwt understand how cubic splines add to the mix either..
Edit:
guassian elimintaion :
import numpy as np
import math
def swapRows(v,i,j):
if len(v.shape) == 1:
v[i],v[j] = v[j],v[i]
else:
v[[i,j],:] = v[[j,i],:]
def swapCols(v,i,j):
v[:,[i,j]] = v[:,[j,i]]
def gaussPivot(a,b,tol=1.0e-12):
n = len(b)
# Set up scale factors
s = np.zeros(n)
for i in range(n):
s[i] = max(np.abs(a[i,:]))
for k in range(0,n-1):
# Row interchange, if needed
p = np.argmax(np.abs(a[k:n,k])/s[k:n]) + k
if abs(a[p,k]) < tol: error.err('Matrix is singular')
if p != k:
swapRows(b,k,p)
swapRows(s,k,p)
swapRows(a,k,p)
# Elimination
for i in range(k+1,n):
if a[i,k] != 0.0:
lam = a[i,k]/a[k,k]
a[i,k+1:n] = a[i,k+1:n] - lam*a[k,k+1:n]
b[i] = b[i] - lam*b[k]
if abs(a[n-1,n-1]) < tol: error.err('Matrix is singular')
# Back substitution
b[n-1] = b[n-1]/a[n-1,n-1]
for k in range(n-2,-1,-1):
b[k] = (b[k] - np.dot(a[k,k+1:n],b[k+1:n]))/a[k,k]
return b
def polyFit(xData,yData,m):
a = np.zeros((m+1,m+1))
b = np.zeros(m+1)
s = np.zeros(2*m+1)
for i in range(len(xData)):
temp = yData[i]
for j in range(m+1):
b[j] = b[j] + temp
temp = temp*xData[i]
temp = 1.0
for j in range(2*m+1):
s[j] = s[j] + temp
temp = temp*xData[i]
for i in range(m+1):
for j in range(m+1):
a[i,j] = s[i+j]
return gaussPivot(a,b)
degree = 10 # can be any degree
polyFit(xData,yData,degree)
I was under the impression the code above gets a dataset of points and a degree. The output should be coeefients of a polynom that fits those points but I have a grader that was provided by my proffesor , and after checking the grading the polynom that returns has a lrage error.
After that I tried the following LU decomposition instead:
import numpy as np
def swapRows(v,i,j):
if len(v.shape) == 1:
v[i],v[j] = v[j],v[i]
else:
v[[i,j],:] = v[[j,i],:]
def swapCols(v,i,j):
v[:,[i,j]] = v[:,[j,i]]
def LUdecomp(a,tol=1.0e-9):
n = len(a)
seq = np.array(range(n))
# Set up scale factors
s = np.zeros((n))
for i in range(n):
s[i] = max(abs(a[i,:]))
for k in range(0,n-1):
# Row interchange, if needed
p = np.argmax(np.abs(a[k:n,k])/s[k:n]) + k
if abs(a[p,k]) < tol: error.err('Matrix is singular')
if p != k:
swapRows(s,k,p)
swapRows(a,k,p)
swapRows(seq,k,p)
# Elimination
for i in range(k+1,n):
if a[i,k] != 0.0:
lam = a[i,k]/a[k,k]
a[i,k+1:n] = a[i,k+1:n] - lam*a[k,k+1:n]
a[i,k] = lam
return a,seq
def LUsolve(a,b,seq):
n = len(a)
# Rearrange constant vector; store it in [x]
x = b.copy()
for i in range(n):
x[i] = b[seq[i]]
# Solution
for k in range(1,n):
x[k] = x[k] - np.dot(a[k,0:k],x[0:k])
x[n-1] = x[n-1]/a[n-1,n-1]
for k in range(n-2,-1,-1):
x[k] = (x[k] - np.dot(a[k,k+1:n],x[k+1:n]))/a[k,k]
return x
the results were a bit better but nowhere near what it should be
Edit 2:
I tried the chebyshev method suggested in the comments and came up with:
import numpy as np
def chebyshev_transform(x, n):
"""
Transforms x-coordinates to Chebyshev coordinates
"""
return np.cos(n * np.arccos(x))
def chebyshev_design_matrix(x, n):
"""
Constructs the Chebyshev design matrix
"""
x_cheb = chebyshev_transform(x, n)
T = np.zeros((len(x), n+1))
T[:,0] = 1
T[:,1] = x_cheb
for i in range(2, n+1):
T[:,i] = 2 * x_cheb * T[:,i-1] - T[:,i-2]
return T
degree =10
f = lambda x: np.cos(X)
xdata = np.linspace(-1,1,num=100)
ydata = np.array([f(i) for i in xdata])
M = chebyshev_design_matrix(xdata,degree)
D_x ,D_y = np.linalg.qr(M)
D_x, seq = LUdecomp(D_x)
A = LUsolve(D_x,D_y,seq)
I can't use linalg.qr in my program , it was just for checking how it works.In addition , I didn't get the 'slow way' of the formula that were in the comment.
The program cant get an x point that is not between -1 and 1 , is there any way around it , any normalizition?
Thanks a lot.
Hints:
You are probably asked for an unsophisticated method. If the degree of the polynomial remains low, you can use the straightforward approach below. For the sake of the explanation, I'll use a cubic model.
Assume that you want to fit your data to this polynomial, by observing that it seems to follow a cubic behavior:
ax³ + bx² + cx + d ~ y
[All x and y should be understood with an index i which is omitted for notational convenience.]
If there are more than four data points, you get an overdetermined system of equations, usually with no solution. The trick is to consider the error on the individual equations, e = ax³ + bx² + cx + d - y, and to minimize the total error. As the error is a signed number, negative errors would make minimization impossible. Instead, we minimize the sum of squared errors. (The sum of absolute errors is another option but it unfortunately leads to a much harder problem.)
Min(a, b, c, d) Σ(ax³ + bx² + cx + d - y)²
As the unknown parameters are unconstrained, it suffices to look for a stationary point, i.e. cancel the gradient of the total error. By differentiation on the unknowns a, b, c and d, we obtain
2Σ(ax³x³ + bx²x³ + cxx³ + dx³ - yx³) = 0
2Σ(ax³x² + bx²x² + cxx² + dx² - yx²) = 0
2Σ(ax³x + bx²x + cxx + dx - yx ) = 0
2Σ(ax³ + bx² + cx + d - y ) = 0
As you can recognize, this is a square linear system of equations.

Iterative Binomial Update without Loop

Can this be done without a loop?
import numpy as np
n = 10
x = np.random.random(n+1)
a, b = 0.45, 0.55
for i in range(n):
x = a*x[:-1] + b*x[1:]
I came across this setup in another question. There it was a covered by a little obscure nomenclature. I guess it is related to Binomial options pricing model but don't quite understand the topic to be honest. I just was intrigued by the formula and this iterative update / shrinking of x and wondered if it can be done without a loop. But I can not wrap my head around it and I am not sure if this is even possible.
What makes me think that it might work is that this vatiaton
n = 10
a, b = 0.301201, 0.59692
x0 = 123
x = x0
for i in range(n):
x = a*x + b*x
# ~42
is actually just x0*(a + b)**n
print(np.allclose(x, x0*(a + b)**n))
# True
You are calculating:
sum( a ** (n - i) * b ** i * x[i] * choose(n, i) for 0 <= i <= n)
[That's meant to be pseudocode, not Python.] I'm not sure of the best way to convert that into Numpy.
choose(n, i) is n!/ (i! (n-i)!), not the numpy choose function.
Using #mathfux's comment, one can do
import numpy as np
from scipy.stats import binom
binomial = binom(p=p, n=n)
pmf = binomial(np.arange(n+1))
res = np.sum(x * pmf)
So
res = x.copy()
for i in range(n):
res = p*res[1:] + (p-1)*res[:-1]
is just the expected value of a binomial distributed random variable x.

Does the LCG fail the Kolmogorov-Smirnov test as badly as my code suggests?

I use the following Python code to illustrate the generation of random variables to students:
import numpy as np
import scipy.stats as stats
def lcg(n, x0, M=2**32, a=1103515245, c=12345):
result = np.zeros(n)
for i in range(n):
result[i] = (a*x0 + c) % M
x0 = result[i]
return np.array([x/M for x in result])
x = lcg(10**6, 3)
print(stats.kstest(x, 'uniform'))
The default parameters are the ones used by glibc, according to Wikipedia. The last line of the code prints
KstestResult(statistic=0.043427751892089805, pvalue=0.0)
The pvalue of 0.0 indicates that the observation would basically never occur if the elements of x were truly distributed according to a uniform distribution.
My question is: is there a bug in my code, or does the LCG with the parameters given not pass the Kolmogorov-Smirnov test with 10**6 replicas?
There is problem with your code, it makes uniform distribution like
I've changed your LCG implementation a bit, and all is good now (Python 3.7, Anaconda, Win10 x64)
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
def lcg(n, x0, M=2**32, a=1103515245, c=12345):
result = np.zeros(n)
for i in range(n):
x0 = (a*x0 + c) % M
result[i] = x0
return np.array([x/float(M) for x in result])
#x = np.random.uniform(0.0, 1.0, 1000000)
x = lcg(1000000, 3)
print(stats.kstest(x, 'uniform'))
count, bins, ignored = plt.hist(x, 15, density=True)
plt.plot(bins, np.ones_like(bins), linewidth=2, color='r')
plt.show()
which prints
KstestResult(statistic=0.0007238884545415214, pvalue=0.6711878724246786)
and plots
UPDATE
as #pjs pointed out, you'd better divide by float(M) right in the loop, no need for
second pass over whole array
def lcg(n, x0, M=2**32, a=1103515245, c=12345):
result = np.empty(n)
for i in range(n):
x0 = (a*x0 + c) % M
result[i] = x0 / float(M)
return result
To complement Severin's answer, the reason my code was not working properly is that result was an array of floating point numbers.
We can see the difference between the two implementations already at the second iteration.
After the first iteration, x0 = 3310558080.
In [9]: x0 = 3310558080
In [10]: float_x0 = float(x0)
In [11]: (a*x0 + c) % M
Out[11]: 465823161
In [12]: (a*float_x0 + c) % M
Out[12]: 465823232.0
In [13]: a*x0
Out[13]: 3653251310737929600
In [14]: a*float_x0
Out[14]: 3.6532513107379297e+18
So the problem had to do with the use of floating point numbers.

A question about formatting a DataFrame for use with matplotlib

I have a solution to a PDE that I would like to plot. I have seen two ways to do this in the documentation, one of which works for me, and one that doesn't. No error is generated. One simply results in the correct plot (a sin wave), and the other generates a line with slope of 1. The second method may be useful to know in the future, even if I have code that works now. Thanks in advance.
Working solution:
plt.plot(arange(0, 16*pi, Dt), u[:, index])
plt.show()
This is great and super simple! The below method was found in the matplotlib documentation as well, but it yields an incorrect plot. I'd like to know my error:
Non working solution:
df = pd.DataFrame({'t':arange(0, 16*pi, Dt), 'u':u[:,index]})
plt.plot('t', 'u', data=df)
plt.show()
full code for context
from math import sin, cos, pi, fabs, log10, ceil, floor
from numpy import arange, zeros
import pandas as pd
from matplotlib import pyplot as plt
#function applies periodic boundary condition where h is the period
def apply_pbc(f, i, Dx, M, h):
f[i][0] = f[i][int(h/Dx)]
f[i][int((M + Dx)/Dx)] = f[i][int((M + Dx)/Dx - 1)]
return f
# function for finding an index associated with
# a particular data point of interest for plotting
# or other analysis
def find_index(start, stop, step, x):
counter = len(arange(start, stop, step))
for i in arange(counter):
x_i = start + i*step
if abs(x - x_i) < pow(10, -15):
index = i
print("x = ", x_i, "#index j = ", i)
break
return index
#main body
if __name__ == "__main__":
#constants
a = 0.25
b = 0.25
c = 1
#period of boundary conditions
h = 4*pi
#space and time endpoints
M = 4*pi
N = 16*pi
#mesh
Dx = 0.005*4*pi
Dt = (0.25*Dx)/c
#simplification of numeric method
r = (Dt*pow(c,2))/pow(Dx,2)
#get size of data set
rows = len(arange(0, N, Dt))
cols = len(arange(-Dx, M, Dx))
#initiate solution arrays
u = zeros((rows, cols))
v = zeros((rows, cols))
#apply initial conditions
for j in range(cols):
x = -Dx + j*Dx
u[0][j] = cos(x)
v[0][j] = 0
#solve
for i in range(1, rows):
for j in range(1, cols - 1):
u[i][j] = u[i-1][j] + v[i-1][j]*Dt \
+ (a/2)*(u[i-1][j+1] - 2*u[i-1][j] + u[i-1][j-1])
v[i][j] = v[i-1][j] \
+ r*(u[i-1][j+1] - 2*u[i-1][j] + u[i-1][j-1]) \
+ (b/2)*(v[i-1][j+1] - 2*v[i-1][j] + v[i-1][j-1])
apply_pbc(u, i, Dx, M, h)
apply_pbc(v, i, Dx, M, h)
print("done")
#we want to plot the solution u(t,x), where x = pi
index = find_index(-Dx, M + Dx, Dx, pi)
df = pd.DataFrame({'t':arange(0,16*pi, Dt), 'u':u[:,index]})
plt.plot('t', 'x', data=df)
# plt.plot(arange(0, 16*pi, Dt), u[:, index])
plt.show()
From the documentation of plt.plot():
Plotting labelled data
There's a convenient way for plotting objects with labelled data (i.e.
data that can be accessed by index obj['y']). Instead of giving the
data in x and y, you can provide the object in the data parameter and
just give the labels for x and y:
plot('xlabel', 'ylabel', data=obj)
I think there is just a typo in your code. In the full code you provide, this is the line that makes the plot:
plt.plot('t', 'x', data=df)
which does indeed give
while changing to
plt.plot('t', 'u', data=df)
works as expected:
Last bit of the code:
df = pd.DataFrame({'t':arange(0,16*pi, Dt), 'u':u[:,index]})
plt.plot('t', 'x', data=df) # <-- 'x' instead of 'u'
# plt.plot(arange(0, 16*pi, Dt), u[:, index])
plt.show()

fmin_slsqp returns initial guess finding the minimum of cubic spline

I am trying to find the minimum of a natural cubic spline. I have written the following code to find the natural cubic spline. (I have been given test data and have confirmed this method is correct.) Now I can not figure out how to find the minimum of this function.
This is the data
xdata = np.linspace(0.25, 2, 8)
ydata = 10**(-12) * np.array([1,2,1,2,3,1,1,2])
This is the function
import scipy as sp
import numpy as np
import math
from numpy.linalg import inv
from scipy.optimize import fmin_slsqp
from scipy.optimize import minimize, rosen, rosen_der
def phi(x, xd,yd):
n = len(xd)
h = np.array(xd[1:n] - xd[0:n-1])
f = np.divide(yd[1:n] - yd[0:(n-1)],h)
q = [0]*(n-2)
for i in range(n-2):
q[i] = 3*(f[i+1] - f[i])
A = np.zeros(((n-2),(n-2)))
#define A for j=0
A[0,0] = 2*(h[0] + h[1])
A[0,1] = h[1]
#define A for j = n-2
A[-1,-2] = h[-2]
A[-1,-1] = 2*(h[-2] + h[-1])
#define A for in the middle
for j in range(1,(n-3)):
A[j,j-1] = h[j]
A[j,j] = 2*(h[j] + h[j+1])
A[j,j+1] = h[j+1]
Ainv = inv(A)
B = Ainv.dot(q)
b = (n)*[0]
b[1:(n-1)] = B
# now we find a, b, c and d
a = [0]*(n-1)
c = [0]*(n-1)
d = [0]*(n-1)
s = [0]*(n-1)
for r in range(n-1):
a[r] = 1/(3*h[r]) * (b[r + 1] - b[r])
c[r] = f[r] - h[r]*((2*b[r] + b[r+1])/3)
d[r] = yd[r]
#solution 1 start
for m in range(n-1):
if xd[m] <= x <= xd[m+1]:
s = a[m]*(x - xd[m])**3 + b[m]*(x-xd[m])**2 + c[m]*(x-xd[m]) + d[m]
return(s)
#solution 1 end
I want to find the minimum on the domain of my xdata, so a fmin didn't work as you can not define bounds there. I tried both fmin_slsqp and minimize. They are not compatible with the phi function I wrote so I rewrote phi(x, xd,yd) and added an extra variable such that phi is phi(x, xd,yd, m). M indicates in which subfunction of the spline we are calculating a solution (from x_m to x_m+1). In the code we replaced #solution 1 by the following
# solution 2 start
return(a[m]*(x - xd[m])**3 + b[m]*(x-xd[m])**2 + c[m]*(x-xd[m]) + d[m])
# solution 2 end
To find the minimum in a domain x_m to x_(m+1) we use the following code: (we use an instance where m=0, so x from 0.25 to 0.5. The initial guess is 0.3)
fmin_slsqp(phi, x0 = 0.3, bounds=([(0.25,0.5)]), args=(xdata, ydata, 0))
What I would then do (I know it's crude), is iterate this with a for loop to find the minimum on all subdomains and then take the overall minimum. However, the function fmin_slsqp constantly returns the initial guess as the minimum. So there is something wrong, which I do not know how to fix. If you could help me this would be greatly appreciated. Thanks for reading this far.
When I plot your function phi and the data you feed in, I see that its range is of the order of 1e-12. However, fmin_slsqp is unable to handle that level of precision and fails to find any change in your objective.
The solution I propose is scaling the return of your objective by the same order of precision like so:
return(s*1e12)
Then you get good results.
>>> sol = fmin_slsqp(phi, x0=0.3, bounds=([(0.25, 0.5)]), args=(xdata, ydata))
>>> print(sol)
Optimization terminated successfully. (Exit mode 0)
Current function value: 1.0
Iterations: 2
Function evaluations: 6
Gradient evaluations: 2
[ 0.25]

Categories