Using Horner's method to find P(x) and P'(x)

Using Horner's method to find P(x) and P'(x) - python

My textbook gives the pseudo code for Horner's method as follows:
P(x) = a_n x n + a_n−1 x n−1 + ··· + a_1 x + a_0 = (x−x0)Q(x) + b0
INPUT degree n; coefficients a_0 , a_1 , . . . , a_n ; x0 .
OUTPUT y = P(x 0 ); z = P (x 0 ).
Step 1 Set y = a_n ; (Compute b n for P.)
z = a_n . (Compute b n−1 for Q.)
Step 2 For j = n − 1, n − 2, . . . , 1
set y = x0 * y + a_j ; (Compute b_j for P.)
z = x0 * z + y. (Compute b_j−1 for Q.)
Step 3 Set y = x0 + y + a_0 .
Step 4 OUTPUT (y, z);
STOP.
Now the issue I have here is that the subscript in the pseudo code does not seem to match the subscripts in the formula
I did a python implementation, but the the answers I got seemed wrong, so I changed it slightly
def horner(x0, *a):
'''
Horner's method is an algorithm to calculate a polynomial at
f(x0) and f'(x0)
x0 - The value to avaluate
a - An array of the coefficients
The degree is the polynomial is set equal to the number of coefficients
'''
n = len(a)
y = a[0]
z = a[0]
for j in range(1, n):
y = x0 * y + a[j]
z = x0 * z + y
y = x0 * y + a[-1]
print('P(x0) =', y)
print('P\'(x0) =', z)
It works pretty well, but I need someone with more experience in this regard to give it a once over.
As a very basic test I took the polynomial 2x^4 with a value of -2 for x.
To use it in the method I call it as such
horner(-2, 2, 0, 0 ,0)
The output does indeed look correct for this very simple problem. Since f(x) = 2x^4 then f(-2) = -32, but this is where my implementation gives a different result my answer is positive

I found the problem and I am adding the answer here since it may prove useful to someone in the future
Firstly there is a but in the algorithm, the loop must go up to n-1
def horner(x0, *a):
'''
Horner's method is an algorithm to calculate a polynomial at
f(x0) and f'(x0)
x0 - The value to avaluate
a - An array of the coefficients
The degree is the polynomial is set equal to the number of coefficients
'''
n = len(a)
y = a[0]
z = a[0]
for j in range(1, n - 1):
y = x0 * y + a[j]
z = x0 * z + y
y = x0 * y + a[-1]
print('P(x0) =', y)
print('P\'(x0) =', z)
Second, and this was my biggest mistake, I am not passing enough coefficients to the method
This will calculate 2x^3
horner(-2, 2, 0, 0, 0)
I actually had to call
horner(-2, 2, 0, 0, 0, 0)

Related

Error in implementation of Crank-Nicolson method applied to 1D TDSE?

This is more of a computational physics problem, and I've asked it on physics stack exchange, but no answers on there. This is, I suppose, a mix of the disciplines on here and there (and maybe even mathematics stack exchange), so finding the right place to post is a task in of itself apparently...
I'm attempting to use Crank-Nicolson scheme to solve the TDSE in 1D. The initial wave is a real Gaussian that has been normalised wrt its probability density. As the solution evolves, a depression grows in the central peak of the real part of the wave, and the imaginary part's central trough is perhaps a bit higher than I expect (image below).
Does this behaviour seem reasonable? I have searched around and not seen questions/figures that are similar. I've tested another person's code from Github and it exhibits the same behaviour, which makes me feel a bit better. But I still think the center peak should just decrease in height and increase in width. The likelihood of me getting a physics-based explanation is relatively low here I'd assume, but a computational-based explanation on errors I may have made is more likely.
I'm happy to give more information, for example my code, or the matrices used in the scheme, etc. Thanks in advance!
Here's a link to GIF of time evolution:
And the part of my code relevant to solving the 1D TDSE:
(pretty much the entire thing except the plotting)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
# Define function for norm.
def normf(dxc, uc, ic):
return sum(dxc * np.square(np.abs(uc[ic, :])))
# Define function for expectation value of position.
def xexpf(dxc, xc, uc, ic):
return sum(dxc * xc * np.square(np.abs(uc[ic, :])))
# Define function for expectation value of squared position.
def xexpsf(dxc, xc, uc, ic):
return sum(dxc * np.square(xc) * np.square(np.abs(uc[ic, :])))
# Define function for standard deviation.
def sdaf(xexpc, xexpsc, ic):
return np.sqrt(xexpsc[ic] - np.square(xexpc[ic]))
# Time t: t0 =< t =< tf. Have N steps at which to evaluate the CN scheme. The
# time interval is dt. decp: variable for plotting to certain number of decimal
# places.
t0 = 0
tf = 20
N = 200
dt = tf / N
t = np.linspace(t0, tf, num = N + 1, endpoint = True)
decp = str(dt)[::-1].find('.')
# Initialise array for filling with norm values at each time step.
norm = np.zeros(len(t))
# Initialise array for expectation value of position.
xexp = np.zeros(len(t))
# Initialise array for expectation value of squared position.
xexps = np.zeros(len(t))
# Initialise array for alternate standard deviation.
sda = np.zeros(len(t))
# Position x: -a =< x =< a. M is an even number. There are M + 1 total discrete
# positions, for the points to be symmetric and centred at x = 0.
a = 100
M = 1200
dx = (2 * a) / M
x = np.linspace(-a, a, num = M + 1, endpoint = True)
# The gaussian function u diffuses over time. sd sets the width of gaussian. u0
# is the initial gaussian at t0.
sd = 1
var = np.power(sd, 2)
mu = 0
u0 = np.sqrt(1 / np.sqrt(np.pi * var)) * np.exp(-np.power(x - mu, 2) / (2 * \
var))
u = np.zeros([len(t), len(x)], dtype = 'complex_')
u[0, :] = u0
# Normalise u.
u[0, :] = u[0, :] / np.sqrt(normf(dx, u, 0))
# Set coefficients of CN scheme.
alpha = dt * -1j / (4 * np.power(dx, 2))
beta = dt * 1j / (4 * np.power(dx, 2))
# Tridiagonal matrices Al and AR. Al to be solved using Thomas algorithm.
Al = np.zeros([len(x), len(x)], dtype = 'complex_')
for i in range (0, M):
Al[i + 1, i] = alpha
Al[i, i] = 1 - (2 * alpha)
Al[i, i + 1] = alpha
# Corner elements for BC's.
Al[M, M], Al[0, 0] = 1 - alpha, 1 - alpha
Ar = np.zeros([len(x), len(x)], dtype = 'complex_')
for i in range (0, M):
Ar[i + 1, i] = beta
Ar[i, i] = 1 - (2 * beta)
Ar[i, i + 1] = beta
# Corner elements for BC's.
Ar[M, M], Ar[0, 0] = 1 - 2*beta, 1 - beta
# Thomas algorithm variables. Following similar naming as in Wiki article.
a = np.diag(Al, -1)
b = np.diag(Al)
c = np.diag(Al, 1)
NT = len(b)
cp = np.zeros(NT - 1, dtype = 'complex_')
for n in range(0, NT - 1):
if n == 0:
cp[n] = c[n] / b[n]
else:
cp[n] = c[n] / (b[n] - (a[n - 1] * cp[n - 1]))
d = np.zeros(NT, dtype = 'complex_')
dp = np.zeros(NT, dtype = 'complex_')
# Iterate over each time step to solve CN method. Maintain boundary
# conditions. Keep track of standard deviation.
for i in range(0, N):
# BC's.
u[i, 0], u[i, M] = 0, 0
# Find RHS.
d = np.dot(Ar, u[i, :])
for n in range(0, NT):
if n == 0:
dp[n] = d[n] / b[n]
else:
dp[n] = (d[n] - (a[n - 1] * dp[n - 1])) / (b[n] - (a[n - 1] * \
cp[n - 1]))
nc = NT - 1
while nc > -1:
if nc == NT - 1:
u[i + 1, nc] = dp[nc]
nc -= 1
else:
u[i + 1, nc] = dp[nc] - (cp[nc] * u[i + 1, nc + 1])
nc -= 1
norm[i] = normf(dx, u, i)
xexp[i] = xexpf(dx, x, u, i)
xexps[i] = xexpsf(dx, x, u, i)
sda[i] = sdaf(xexp, xexps, i)
# Fill in final norm value.
norm[N] = normf(dx, u, N)
# Fill in final position expectation value.
xexp[N] = xexpf(dx, x, u, N)
# Fill in final squared position expectation value.
xexps[N] = xexpsf(dx, x, u, N)
# Fill in final standard deviation value.
sda[N] = sdaf(xexp, xexps, N)

Find the inverse (reciprocal) of a polynomial modulo another polynomial with coefficients in a finite field

If I have a polynomial P, is there a way to calculate P^-1 modulo Q, being Q another polynomial?
I know that the coefficients of both polynomials belongs to the field of integers modulo z, being z an integer.
I´m not sure if SymPy has already a function for that in its galoistools module.

This is essentially the same as finding polynomials S, T such that PS + QT = 1. Which is possible when gcd(P, Q) = 1, and can be done with galoistools.gf_gcdex. For example, let's invert 3x^3+2x+4 modulo x^2+2x+3 with the coefficient field Z/11Z:
from sympy.polys.domains import ZZ
from sympy.polys.galoistools import gf_gcdex
p = ZZ.map([3, 0, 2, 4])
q = ZZ.map([1, 2, 3])
z = 11
s, t, g = gf_gcdex(p, q, z, ZZ)
if len(g) == 1 and g[0] == 1:
print(s)
else:
print('no inverse')
This prints [8, 5] - the inverse is 8x+5. Sanity check by hand:
(3x^3+2x+4)*(8x+5) = 24x^4 + 15x^3 + 16x^2 + 42x + 20
= 2x^4 + 4x^3 + 5x^2 + 9x + 9
= (x^2 + 2x + 3)*(2x^2 - 1) + 1
= 1 mod q

Solve a nonlinear ODE system for the Frenet frame

I have checked python non linear ODE with 2 variables , which is not my case. Maybe my case is not called as nonlinear ODE, correct me please.
The question isFrenet Frame actually, in which there are 3 vectors T(s), N(s) and B(s); the parameter s>=0. And there are 2 scalar with known math formula expression t(s) and k(s). I have the initial value T(0), N(0) and B(0).
diff(T(s), s) = k(s)*N(s)
diff(N(s), s) = -k(s)*T(s) + t(s)*B(s)
diff(B(s), s) = -t(s)*N(s)
Then how can I get T(s), N(s) and B(s) numerically or symbolically?
I have checked scipy.integrate.ode but I don't know how to pass k(s)*N(s) into its first parameter at all
def model (z, tspan):
T = z[0]
N = z[1]
B = z[2]
dTds = k(s) * N # how to express function k(s)?
dNds = -k(s) * T + t(s) * B
dBds = -t(s)* N
return [dTds, dNds, dBds]
z = scipy.integrate.ode(model, [T0, N0, B0]

Here is a code using solve_ivp interface from Scipy (instead of odeint) to obtain a numerical solution:
import numpy as np
from scipy.integrate import solve_ivp
from scipy.integrate import cumtrapz
import matplotlib.pylab as plt
# Define the parameters as regular Python function:
def k(s):
return 1
def t(s):
return 0
# The equations: dz/dt = model(s, z):
def model(s, z):
T = z[:3] # z is a (9, ) shaped array, the concatenation of T, N and B
N = z[3:6]
B = z[6:]
dTds = k(s) * N
dNds = -k(s) * T + t(s) * B
dBds = -t(s)* N
return np.hstack([dTds, dNds, dBds])
T0, N0, B0 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
z0 = np.hstack([T0, N0, B0])
s_span = (0, 6) # start and final "time"
t_eval = np.linspace(*s_span, 100) # define the number of point wanted in-between,
# It is not necessary as the solver automatically
# define the number of points.
# It is used here to obtain a relatively correct
# integration of the coordinates, see the graph
# Solve:
sol = solve_ivp(model, s_span, z0, t_eval=t_eval, method='RK45')
print(sol.message)
# >> The solver successfully reached the end of the integration interval.
# Unpack the solution:
T, N, B = np.split(sol.y, 3) # another way to unpack the z array
s = sol.t
# Bonus: integration of the normal vector in order to get the coordinates
# to plot the curve (there is certainly better way to do this)
coords = cumtrapz(T, x=s)
plt.plot(coords[0, :], coords[1, :]);
plt.axis('equal'); plt.xlabel('x'); plt.xlabel('y');
T, N and B are vectors. Therefore, there are 9 equations to solve: z is a (9,) array.
For constant curvature and no torsion, the result is a circle:

thanks for your example. And I thought it again, found that since there is formula for dZ where Z is matrix(T, N, B), we can calculate Z[i] = Z[i-1] + dZ[i-1]*deltaS according to the concept of derivative. Then I code and find this idea can solve the circle example. So
is Z[i] = Z[i-1] + dZ[i-1]*deltaS suitable for other ODE? will it fail in some situation, or does scipy.integrate.solve_ivp/scipy.integrate.ode supply advantage over the direct usage of Z[i] = Z[i-1] + dZ[i-1]*deltaS?
in my code, I have to normalize Z[i] because ||Z[i]|| is not always 1. Why does it happen? A float numerical calculation error?
my answer to my question, at least it works for the circle
import numpy as np
from scipy.integrate import cumtrapz
import matplotlib.pylab as plt
# Define the parameters as regular Python function:
def k(s):
return 1
def t(s):
return 0
def dZ(s, Z):
return np.array(
[k(s) * Z[1], -k(s) * Z[0] + t(s) * Z[2], -t(s)* Z[1]]
)
T0, N0, B0 = np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])
deltaS = 0.1 # step to calculate dZ/ds
num = int(2*np.pi*1/deltaS) + 1 # how many points on the curve we have to calculate
T = np.zeros([num, ], dtype=object)
N = np.zeros([num, ], dtype=object)
B = np.zeros([num, ], dtype=object)
T[0] = T0
N[0] = N0
B[0] = B0
for i in range(num-1):
temp_dZ = dZ(i*deltaS, np.array([T[i], N[i], B[i]]))
T[i+1] = T[i] + temp_dZ[0]*deltaS
T[i+1] = T[i+1]/np.linalg.norm(T[i+1]) # have to do this
N[i+1] = N[i] + temp_dZ[1]*deltaS
N[i+1] = N[i+1]/np.linalg.norm(N[i+1])
B[i+1] = B[i] + temp_dZ[2]*deltaS
B[i+1] = B[i+1]/np.linalg.norm(B[i+1])
coords = cumtrapz(
[
[i[0] for i in T], [i[1] for i in T], [i[2] for i in T]
]
, x=np.arange(num)*deltaS
)
plt.figure()
plt.plot(coords[0, :], coords[1, :]);
plt.axis('equal'); plt.xlabel('x'); plt.xlabel('y');
plt.show()

I found that the equation I listed in the first post does not work for my curve. So I read Gray A., Abbena E., Salamon S-Modern Differential Geometry of Curves and Surfaces with Mathematica. 2006 and found that for arbitrary curve, Frenet equation should be written as
diff(T(s), s) = ||r'||* k(s)*N(s)
diff(N(s), s) = ||r'||*(-k(s)*T(s) + t(s)*B(s))
diff(B(s), s) = ||r'||* -t(s)*N(s)
where ||r'||(or ||r'(s)||) is diff([x(s), y(s), z(s)], s).norm()
now the problem has changed to be some different from that in the first post, because there is no r'(s) function or discrete data array. So I think this is suitable for a new reply other than comment.
I met 2 questions while trying to solve the new equation:
how can we program with r'(s) if scipy's solve_ivp is used?
I try to modify my gaussian solution, but the result is totally wrong.
thanks again
import numpy as np
from scipy.integrate import cumtrapz
import matplotlib.pylab as plt
# Define the parameters as regular Python function:
def k(s):
return 1
def t(s):
return 0
def dZ(s, Z, r_norm):
return np.array([
r_norm * k(s) * Z[1],
r_norm*(-k(s) * Z[0] + t(s) * Z[2]),
r_norm*(-t(s)* Z[1])
])
T0, N0, B0 = np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])
deltaS = 0.1 # step to calculate dZ/ds
num = int(2*np.pi*1/deltaS) + 1 # how many points on the curve we have to calculate
T = np.zeros([num, ], dtype=object)
N = np.zeros([num, ], dtype=object)
B = np.zeros([num, ], dtype=object)
R0 = N0
T[0] = T0
N[0] = N0
B[0] = B0
for i in range(num-1):
r_norm = np.linalg.norm(R0)
temp_dZ = dZ(i*deltaS, np.array([T[i], N[i], B[i]]), r_norm)
T[i+1] = T[i] + temp_dZ[0]*deltaS
T[i+1] = T[i+1]/np.linalg.norm(T[i+1])
N[i+1] = N[i] + temp_dZ[1]*deltaS
N[i+1] = N[i+1]/np.linalg.norm(N[i+1])
B[i+1] = B[i] + temp_dZ[2]*deltaS
B[i+1] = B[i+1]/np.linalg.norm(B[i+1])
R0 = R0 + T[i]*deltaS
coords = cumtrapz(
[
[i[0] for i in T], [i[1] for i in T], [i[2] for i in T]
]
, x=np.arange(num)*deltaS
)
plt.figure()
plt.plot(coords[0, :], coords[1, :]);
plt.axis('equal'); plt.xlabel('x'); plt.xlabel('y');
plt.show()

Computing integral with the Trapezoidal Rule (the approximate value of the integral, and the number of iterations)

The program needs to compute define integral with a predetermined
accuracy (eps) with the Trapezoidal Rule and my function needs to return:
1.the approximate value of the integral.
2.the number of iterations.
My code:
from math import *
def f1(x):
return (x ** 2 - 1)**(-0.5)
def f2(x):
return (cos(x)/(x + 1))
def integral(f,a,b,eps):
n = 2
x = a
h = (b - a) / n
sum = 0.5 * (f(a) + f(b))
for i in range(n):
sum = sum + f(a + i * h)
sum_2 = h * sum
k = 0
flag = 1
while flag == 1:
n = n * 2
sum = 0
k = k + 1
x = a
h = (b - a) / n
sum = 0.5 * (f(a) + f(b))
for i in range(n):
sum = sum + f(a + i * h)
sum_new = h * sum
if eps > abs(sum_new - sum_2):
t1 = sum_new
t2 = k
return t1, t2
else:
sum_2 = sum_new
x1 = float(input("First-begin: "))
x2 = float(input("First-end: "))
y1 = float(input("Second-begin: "))
y2 = float(input("Second-end: "))
int_1 = integral(f1,x1,y1,1e-6)
int_2 = integral(f2,x2,y2,1e-6)
print(int_1)
print(int_2)
It doesn't work correct. Help, please!

You implemented the math wrong. The error is in the lines
for i in range(n):
sum = sum + f(a + i * h)
range(n) always starts at 0, so in your first iteration you just add the f(a) term again.
If you replace it with
for i in range(1, n):
sum = sum + f(a + i * h)
it works.
Also, you have a ton of redundant code; you basically coded the core of the integration algorithm twice. Try to follow the DRY-principle.

The trapezoidal rule of integration simply says that an approximation to the integral $\int_a^b f(x) dx$ is (b-a) (f(a)+f(b))/2. The error is proportional to (b-a)^2, so that it is possible to have a better estimate using the composite rule, i.e., subdividing the initial interval in a number of shorter intervals.
Is it possible to use shorter intervals and still reuse the function values previously computed, so minimizing the total number of function evaluation?
Yes, it is possible if we divide each interval in two equal parts, so that at stage 0 we use 1 intervals, at stage 1 2 equal intervals and in general, at stage n, we use 2n equal intervals.
Let's start with a simple problem and see if it possible to generalize the procedure…
a, b = 0, 32
L = b-a = 32
by the trapezoidal rule the initial approximation say I0, is given by
I0 = L * (f0+f1)/2
= L * S0
with S0 = (f0+f1)/2; a pictorial representation of the real axis, the coordinates of the interval extremes and the evaluated functions follows
x0 x1
01234567890123456789012345679012
f0 f1
Next, we divide the original interval in two,
L = L/2
x0 x2 x1
01234567890123456789012345679012
f0 f2 f1
and the new approximation, stage n=1, is obtained using two times the trapezoidal rule and applying a bit of algebra
I1 = L * (f0+f2)/2 + L * (f2+f1)/2
= L * [(f0+f1)/2 + f2]
= L * [S0 + S1]
with S1 = f2
Another subdivision, stage n=2, L = L/2 and
x0 x3 x2 x4 x1
012345678901234567890123456789012
f0 f3 f2 f4 f1
I2 = L * [(f0+f3) + (f3+f2) + (f2+f4) + (f4+f1)] / 2
= L * [(f0+f1)/2 + f2 + (f3+f4)]
= L * [S0+S1+S2]
with S2 = f3 + f4.
It is not difficult, given this picture,
x0 x5 x3 x6 x2 x7 x4 x8 x1
012345678901234567890123456789012
f0 f5 f3 f6 f2 f7 f4 f8 f1
to understand that our next approximation can be computed as follows
L = L/2
S3 = f5+f6+f7+f8
I3 = L*[S0+S1+S2+S3]
Now, we have to understand how to compute a generalization of Sn,
n = 1, … — for us, the pseudocode is
L_n = (b-a)/2**n
list_x_n = list(a + L_n + 2*Ln*j for j=0, …, 2**n-1)
Sn = Sum(f(xj) for each xj in list_x_n)
For n = 3, L = (b-a)/8 = 4, we have from the formula above list_x_n = [4, 12, 20, 28], please check with the picture...
Now we are ready to code our algorithm in Python
def trapaezia(f, a, b, tol):
"returns integ(f, (a,b)), estimated error and number of evaluations"
from math import fsum # controls accumulation of rounding errors in sums
L = b - a
S = (f(a)+f(b))/2
I = L*S
n = 1
while True:
L = L/2
new_points = (a+L+j*L for j in range(0, n+n, 2))
delta_S = fsum(f(x) for x in new_points)
new_S = S + delta_S
new_I = L*new_S
# error is estimated using Richardson extrapolation (REP)
err = (new_I - I) * 4/3
if abs(err) > tol:
n = n+n
S, I = new_S, new_I
else:
# we return a better estimate using again REP
return (4*new_I-I)/3, err, n+n+1
If you are curious about Richardson extrapolation, I recommend this document that deals exactly with the application of REP to the trapezoidal rule quadrature algorithm.
If you are curious about math.fsum, the docs don't say too much but the link to the original implementation that also includes an extended explanation of all the issues involved.

Karatsuba algorithm too much recursion

I am trying to implement the Karatsuba multiplication algorithm in c++ but right now I am just trying to get it to work in python.
Here is my code:
def mult(x, y, b, m):
if max(x, y) < b:
return x * y
bm = pow(b, m)
x0 = x / bm
x1 = x % bm
y0 = y / bm
y1 = y % bm
z2 = mult(x1, y1, b, m)
z0 = mult(x0, y0, b, m)
z1 = mult(x1 + x0, y1 + y0, b, m) - z2 - z0
return mult(z2, bm ** 2, b, m) + mult(z1, bm, b, m) + z0
What I don't get is: how should z2, z1, and z0 be created? Is using the mult function recursively correct? If so, I'm messing up somewhere because the recursion isn't stopping.
Can someone point out where the error is?

NB: the response below addresses directly the OP's question about
excessive recursion, but it does not attempt to provide a correct
Karatsuba algorithm. The other responses are far more informative in
this regard.
Try this version:
def mult(x, y, b, m):
bm = pow(b, m)
if min(x, y) <= bm:
return x * y
# NOTE the following 4 lines
x0 = x % bm
x1 = x / bm
y0 = y % bm
y1 = y / bm
z0 = mult(x0, y0, b, m)
z2 = mult(x1, y1, b, m)
z1 = mult(x1 + x0, y1 + y0, b, m) - z2 - z0
retval = mult(mult(z2, bm, b, m) + z1, bm, b, m) + z0
assert retval == x * y, "%d * %d == %d != %d" % (x, y, x * y, retval)
return retval
The most serious problem with your version is that your calculations of x0 and x1, and of y0 and y1 are flipped. Also, the algorithm's derivation does not hold if x1 and y1 are 0, because in this case, a factorization step becomes invalid. Therefore, you must avoid this possibility by ensuring that both x and y are greater than b**m.
EDIT: fixed a typo in the code; added clarifications
EDIT2:
To be clearer, commenting directly on your original version:
def mult(x, y, b, m):
# The termination condition will never be true when the recursive
# call is either
# mult(z2, bm ** 2, b, m)
# or mult(z1, bm, b, m)
#
# Since every recursive call leads to one of the above, you have an
# infinite recursion condition.
if max(x, y) < b:
return x * y
bm = pow(b, m)
# Even without the recursion problem, the next four lines are wrong
x0 = x / bm # RHS should be x % bm
x1 = x % bm # RHS should be x / bm
y0 = y / bm # RHS should be y % bm
y1 = y % bm # RHS should be y / bm
z2 = mult(x1, y1, b, m)
z0 = mult(x0, y0, b, m)
z1 = mult(x1 + x0, y1 + y0, b, m) - z2 - z0
return mult(z2, bm ** 2, b, m) + mult(z1, bm, b, m) + z0

Usually big numbers are stored as arrays of integers. Each integer represents one digit. This approach allows to multiply any number by the power of base with simple left shift of the array.
Here is my list-based implementation (may contain bugs):
def normalize(l,b):
over = 0
for i,x in enumerate(l):
over,l[i] = divmod(x+over,b)
if over: l.append(over)
return l
def sum_lists(x,y,b):
l = min(len(x),len(y))
res = map(operator.add,x[:l],y[:l])
if len(x) > l: res.extend(x[l:])
else: res.extend(y[l:])
return normalize(res,b)
def sub_lists(x,y,b):
res = map(operator.sub,x[:len(y)],y)
res.extend(x[len(y):])
return normalize(res,b)
def lshift(x,n):
if len(x) > 1 or len(x) == 1 and x[0] != 0:
return [0 for i in range(n)] + x
else: return x
def mult_lists(x,y,b):
if min(len(x),len(y)) == 0: return [0]
m = max(len(x),len(y))
if (m == 1): return normalize([x[0]*y[0]],b)
else: m >>= 1
x0,x1 = x[:m],x[m:]
y0,y1 = y[:m],y[m:]
z0 = mult_lists(x0,y0,b)
z1 = mult_lists(x1,y1,b)
z2 = mult_lists(sum_lists(x0,x1,b),sum_lists(y0,y1,b),b)
t1 = lshift(sub_lists(z2,sum_lists(z1,z0,b),b),m)
t2 = lshift(z1,m*2)
return sum_lists(sum_lists(z0,t1,b),t2,b)
sum_lists and sub_lists returns unnormalized result - single digit can be greater than the base value. normalize function solved this problem.
All functions expect to get list of digits in the reverse order. For example 12 in base 10 should be written as [2,1]. Lets take a square of 9987654321.
» a = [1,2,3,4,5,6,7,8,9]
» res = mult_lists(a,a,10)
» res.reverse()
» res
[9, 7, 5, 4, 6, 1, 0, 5, 7, 7, 8, 9, 9, 7, 1, 0, 4, 1]

The goal of the Karatsuba multiplication is to improve on the divide-and conquer multiplication algorithm by making 3 recursive calls instead of four. Therefore, the only lines in your script that should contain a recursive call to the multiplication are those assigning z0,z1 and z2. Anything else will give you a worse complexity. You can't use pow to compute bm when you haven't defined multiplication yet (and a fortiori exponentiation), either.
For that, the algorithm crucially uses the fact that it is using a positional notation system. If you have a representation x of a number in base b, then x*bm is simply obtained by shifting the digits of that representation m times to the left. That shifting operation is essentially "free" with any positional notation system. That also means that if you want to implement that, you have to reproduce this positional notation, and the "free" shift. Either you chose to compute in base b=2 and use python's bit operators (or the bit operators of a given decimal, hex, ... base if your test platform has them), or you decide to implement for educational purposes something that works for an arbitrary b, and you reproduce this positional arithmetic with something like strings, arrays, or lists.
You have a solution with lists already. I like to work with strings in python, since int(s, base) will give you the integer corresponding to the string s seen as a number representation in base base: it makes tests easy. I have posted an heavily commented string-based implementation as a gist here, including string-to-number and number-to-string primitives for good measure.
You can test it by providing padded strings with the base and their (equal) length as arguments to mult:
In [169]: mult("987654321","987654321",10,9)
Out[169]: '966551847789971041'
If you don't want to figure out the padding or count string lengths, a padding function can do it for you:
In [170]: padding("987654321","2")
Out[170]: ('987654321', '000000002', 9)
And of course it works with b>10:
In [171]: mult('987654321', '000000002', 16, 9)
Out[171]: '130eca8642'
(Check with wolfram alpha)

I believe that the idea behind the technique is that the zi terms are computed using the recursive algorithm, but the results are not unified together that way. Since the net result that you want is
z0 B^2m + z1 B^m + z2
Assuming that you choose a suitable value of B (say, 2) you can compute B^m without doing any multiplications. For example, when using B = 2, you can compute B^m using bit shifts rather than multiplications. This means that the last step can be done without doing any multiplications at all.
One more thing - I noticed that you've picked a fixed value of m for the whole algorithm. Typically, you would implement this algorithm by having m always be a value such that B^m is half the number of digits in x and y when they are written in base B. If you're using powers of two, this would be done by picking m = ceil((log x) / 2).
Hope this helps!

In Python 2.7: Save this file as Karatsuba.py
def karatsuba(x,y):
"""Karatsuba multiplication algorithm.
Return the product of two numbers in an efficient manner
#author Shashank
date: 23-09-2018
Parameters
----------
x : int
First Number
y : int
Second Number
Returns
-------
prod : int
The product of two numbers
Examples
--------
>>> import Karatsuba.karatsuba
>>> a = 1234567899876543211234567899876543211234567899876543211234567890
>>> b = 9876543211234567899876543211234567899876543211234567899876543210
>>> Karatsuba.karatsuba(a,b)
12193263210333790590595945731931108068998628253528425547401310676055479323014784354458161844612101832860844366209419311263526900
"""
if len(str(x)) == 1 or len(str(y)) == 1:
return x*y
else:
n = max(len(str(x)), len(str(y)))
m = n/2
a = x/10**m
b = x%10**m
c = y/10**m
d = y%10**m
ac = karatsuba(a,c) #step 1
bd = karatsuba(b,d) #step 2
ad_plus_bc = karatsuba(a+b, c+d) - ac - bd #step 3
prod = ac*10**(2*m) + bd + ad_plus_bc*10**m #step 4
return prod

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Using Horner's method to find P(x) and P'(x) - python

Related

Error in implementation of Crank-Nicolson method applied to 1D TDSE?

Find the inverse (reciprocal) of a polynomial modulo another polynomial with coefficients in a finite field

Solve a nonlinear ODE system for the Frenet frame

Computing integral with the Trapezoidal Rule (the approximate value of the integral, and the number of iterations)

Karatsuba algorithm too much recursion

Categories

Resources