Speeding up arithmetic with Python Decimal library - python
I am trying to run a function that is similar to Google's PageRank algorithm (for non-commercial purposes, of course). Here is the Python code; note that a[0] is the only thing that matters here, and a[0] contains an n x n matrix such as [[0,1,1],[1,0,1],[1,1,0]]. Also, you can find where I got this code from on Wikipedia:
import copy   # the code below assumes these imports
import math

def GetNodeRanks(a): # graph, names, size
    numIterations = 10
    adjacencyMatrix = copy.deepcopy(a[0])
    b = [1]*len(adjacencyMatrix)
    tmp = [0]*len(adjacencyMatrix)
    for i in range(numIterations):
        for j in range(len(adjacencyMatrix)):
            tmp[j] = 0
            for k in range(len(adjacencyMatrix)):
                tmp[j] = tmp[j] + adjacencyMatrix[j][k] * b[k]
        norm_sq = 0
        for j in range(len(adjacencyMatrix)):
            norm_sq = norm_sq + tmp[j]*tmp[j]
        norm = math.sqrt(norm_sq)
        for j in range(len(b)):
            b[j] = tmp[j] / norm
        print b
    return b
When I run this implementation (on a matrix much larger than a 3 x 3 matrix, n.b.), it does not yield enough precision to calculate the ranks in a way that allows me to compare them usefully. So I tried this instead:
from decimal import *
import copy   # needed for copy.deepcopy below

getcontext().prec = 5

def GetNodeRanks(a): # graph, names, size
    numIterations = 10
    adjacencyMatrix = copy.deepcopy(a[0])
    b = [Decimal(1)]*len(adjacencyMatrix)
    tmp = [Decimal(0)]*len(adjacencyMatrix)
    for i in range(numIterations):
        for j in range(len(adjacencyMatrix)):
            tmp[j] = Decimal(0)
            for k in range(len(adjacencyMatrix)):
                tmp[j] = Decimal(tmp[j] + adjacencyMatrix[j][k] * b[k])
        norm_sq = Decimal(0)
        for j in range(len(adjacencyMatrix)):
            norm_sq = Decimal(norm_sq + tmp[j]*tmp[j])
        norm = Decimal(norm_sq).sqrt
        for j in range(len(b)):
            b[j] = Decimal(tmp[j] / norm)
        print b
    return b
Even at this unhelpfully low precision, the code was extremely slow and never finished running in the time I sat waiting for it to run. Previously, the code was quick but insufficiently precise.
Is there a sensible/easy way to make the code run quickly and precisely at the same time?
A few tips for speeding up:
optimize the code inside the loops
move everything you can out of the inner loop, if possible (a small sketch follows this list)
do not recompute what is already known; store it in a variable
do not do work that is not necessary; skip it
consider using list comprehensions, they are often a bit faster
stop optimizing as soon as the speed is acceptable
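As a small illustration of the hoisting tips, here is a minimal sketch (my own, reusing the names and the sample 3 x 3 matrix from the question) of the same matrix-vector product with the loop-invariant lookups moved out of the inner loop:

adjacencyMatrix = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # sample matrix from the question
b = [1, 1, 1]
n = len(adjacencyMatrix)          # computed once, not on every loop test
tmp = [0] * n
for j in range(n):
    row = adjacencyMatrix[j]      # one row lookup per row instead of one per element
    acc = 0
    for k in range(n):
        acc += row[k] * b[k]
    tmp[j] = acc                  # one list write per row instead of one per element
print(tmp)                        # [2, 2, 2]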
Walking through your code:
from decimal import *
getcontext().prec = 5

def GetNodeRanks(a): # graph, names, size
    # opt: pass in a[0] directly, you do not use the rest
    numIterations = 10
    adjacencyMatrix = copy.deepcopy(a[0])
    # opt: why copy.deepcopy? You do not modify adjacencyMatrix
    b = [Decimal(1)]*len(adjacencyMatrix)
    # opt: you often call Decimal(1) and Decimal(0), it takes some time
    # do it only once like
    # dec_zero = Decimal(0)
    # dec_one = Decimal(1)
    # prepare also other, repeatedly used data structures
    # len_adjacencyMatrix = len(adjacencyMatrix)
    # adjacencyMatrix_range = range(len_adjacencyMatrix)
    # Replace code with pre-calculated variables yourself
    tmp = [Decimal(0)]*len(adjacencyMatrix)
    for i in range(numIterations):
        for j in range(len(adjacencyMatrix)):
            tmp[j] = Decimal(0)
            for k in range(len(adjacencyMatrix)):
                tmp[j] = Decimal(tmp[j] + adjacencyMatrix[j][k] * b[k])
        norm_sq = Decimal(0)
        for j in range(len(adjacencyMatrix)):
            norm_sq = Decimal(norm_sq + tmp[j]*tmp[j])
        norm = Decimal(norm_sq).sqrt # is this correct? I would expect .sqrt()
        for j in range(len(b)):
            b[j] = Decimal(tmp[j] / norm)
        print b
    return b
Now a few examples of how list processing can be optimized in Python.
Using sum, change:
norm_sq = Decimal(0)
for j in range(len(adjacencyMatrix)):
    norm_sq = Decimal(norm_sq + tmp[j]*tmp[j])
to:
norm_sq = sum(val*val for val in tmp)
A bit of list comprehension:
Change:
for j in range(len(b)):
    b[j] = Decimal(tmp[j] / norm)
change to:
b = [Decimal(tmp_itm / norm) for tmp_itm in tmp]
Once you get used to this coding style, you will be able to optimize the initial loops too, and you will probably find that some of the pre-calculated variables become obsolete.
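Putting the suggestions together, a possible end result could look roughly like the sketch below. This is my own assembly of the advice above, not a tested drop-in replacement; it takes a[0] directly, drops the deepcopy, precomputes the length, uses generator expressions instead of index loops, and calls .sqrt() with parentheses (fixing the issue flagged in the walkthrough):

from decimal import Decimal, getcontext

getcontext().prec = 5

def GetNodeRanks(adjacencyMatrix):
    # sketch: takes a[0] directly and skips the deepcopy, since the matrix is never modified
    numIterations = 10
    n = len(adjacencyMatrix)
    b = [Decimal(1)] * n
    for _ in range(numIterations):
        # matrix-vector product, one generator expression per row
        tmp = [sum(row[k] * b[k] for k in range(n)) for row in adjacencyMatrix]
        norm = Decimal(sum(val * val for val in tmp)).sqrt()   # note the call: .sqrt()
        b = [val / norm for val in tmp]
    return b

print(GetNodeRanks([[0, 1, 1], [1, 0, 1], [1, 1, 0]]))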
Related
Why does my program for the Chudnovsky algorithm give the wrong result?
I am trying to code the Chudnovsky algorithm in python. However, when I run my code, it gives me a very small number (-5.051212624421025e-55) which is not pi. I am in middle school, and I don't know anybody that can help me. What am I doing wrong? Here is a link to the Chudnovsky formula: https://levelup.gitconnected.com/generating-the-value-of-pi-to-a-known-number-of-decimals-places-in-python-e93986bb474d

Here is my code:

import math

def fact(exi):
    memory = exi
    for i in range(1, exi):
        memory *= i
    return memory

k = 10
s = 0
for i in range(0, k):
    a = -1^k
    b = fact(6*k)
    c = (545140134*k) + 13591409
    d = fact(3*k)
    e = (fact(k))^3
    f = (3 * k) + 3/2
    g = math.pow(640320, f)
    numerator = (a*b*c)
    denominator = (d*e*f)
    s += (numerator / denominator)
s *= 12
print(1 / s)

Here is my updated code:

import math

def fact(exi):
    memory = exi
    for i in range(1, exi):
        memory *= i
    return memory

k = 17
s = 0
for i in range(1, k):
    a = (-1)**i
    b = fact(6*i)
    c = (545140134*i) + 13591409
    d = fact(3*i)
    e = (fact(i))**3
    f = (3 * i) + 3/2
    g = math.pow(640320, f)
    num = (a*b*c)
    den = (d*e*g)
    s += (num / den)
s *= 12
print(1 / s)
I see two mistakes:
When comparing with the formula shown on Wikipedia, it looks like you should use the iteration variable i (named q in the formula) where you currently use k in the loop. In your code k is the upper bound for i.
The exponentiation operator in Python is **, not ^ (which is bitwise XOR).
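For illustration only, a corrected loop could look roughly like the sketch below. It is a float-based sketch of my own, so it can only ever be as accurate as double precision; it iterates over q as in the formula, uses ** instead of ^, uses math.factorial (the fact helper in the question returns 0 for fact(0)), and, like the updated code in the question, puts g rather than f in the denominator:

import math

k = 3    # a handful of terms already exhausts double precision
s = 0
for q in range(k):                        # iterate over q, not the bound k
    a = (-1)**q                           # ** is exponentiation; ^ is XOR
    b = math.factorial(6*q)
    c = 545140134*q + 13591409
    d = math.factorial(3*q)
    e = math.factorial(q)**3
    g = math.pow(640320, 3*q + 1.5)
    s += (a*b*c) / (d*e*g)                # g, not f, belongs in the denominator
print(1 / (12 * s))                       # ~3.141592653589793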
Memory limit exceeded in Python
I am solving a problem that needs either a list of integers or a dictionary of size 10^18. Upon running the code the compiler throws an error message saying "Memory Limit Exceeded". Here is my code:

def fun(l, r, p):
    #f = [None, 1, 1]
    f = {0:0, 1:1, 2:1}
    su = 0
    for i in range(1, r):
        if i%2 == 0:
            f[i+2] = 2*f[i+1] - f[i] + 2
            #f.append(2*f[i+1] - f[i] + 2)
        else:
            f[i+2] = 3*f[i]
            #f.append(3*f[i])
    for k in range(l, r):
        su = su + f[k]
    su = (su + f[r]) % p
    print(su)

t, p = input().split()
p = int(p)
t = int(t)
#t = 3
#p = 100000007
for i in range(t):
    l , r = input().split()
    l = int(l)
    r = int(r)
    fun(l, r, p)

It is showing memory limit exceeded with a maximum memory usage of 306612 KiB.
Two observations here:

You don't need to store all numbers simultaneously. You can use the deque and generator functions to generate the numbers by keeping track of only the last three digits generated instead of the entire sequence.

import itertools
from collections import deque

def infinite_fun_generator():
    seed = [0, 1, 1]
    dq = deque(maxlen=2)
    dq.extend(seed)
    yield from seed
    for i in itertools.count(1):
        if i % 2 == 0:
            dq.append(2 * dq[-1] - dq[-2] + 2)
        else:
            dq.append(3 * dq[-2])
        yield dq[-1]

def fun(l, r, p):
    funs = itertools.islice(infinite_fun_generator(), l, r + 1)
    summed_funs = itertools.accumulate(funs, lambda a, b: (a + b) % p)
    return deque(summed_funs, maxlen=1)[-1]

You might have a better chance asking this in Math.SE since I don't want to do the math right now, but just like with the Fibonacci sequence there's likely an analytic solution that you can use to compute the nth member of the sequence analytically without having to iteratively compute the intermediate numbers, and it may even be possible to analytically derive a formula to compute the sums in constant time.
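A possible driver for this, assuming the generator-based fun() above is defined and the same input format as the original (a first line with t and p, then t lines with l and r), might look like:

# Hypothetical usage sketch; fun() now returns the value instead of printing it.
t, p = map(int, input().split())
for _ in range(t):
    l, r = map(int, input().split())
    print(fun(l, r, p))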
Confusing result with quadratic regression
So, I'm trying to fit some pairs of x,y data with a quadratic regression; a sample formula can be found at http://polynomialregression.drque.net/math.html. Following is my code that does the regression using that explicit formula and using numpy's built-in functions:

import numpy as np

x = [6.230825,6.248279,6.265732]
y = [0.312949,0.309886,0.306639472]

toCheck = x[2]

def evaluateValue(coeff,x):
    c,b,a = coeff
    val = np.around( a+b*x+c*x**2,9)
    act = 0.306639472
    error= np.abs(act-val)*100/act
    print "Value = {:.9f} Error = {:.2f}%".format(val,error)

###### Using numpy ######################
coeff = np.polyfit(x,y,2)
evaluateValue(coeff, toCheck)

################# Using explicit formula
def determinant(a,b,c,d,e,f,g,h,i):
    # the matrix is [[a,b,c],[d,e,f],[g,h,i]]
    return a*(e*i - f*h) - b*(d*i - g*f) + c*(d*h - e*g)

a = b = c = d = e = m = n = p = 0
a = len(x)
for i,j in zip(x,y):
    b += i
    c += i**2
    d += i**3
    e += i**4
    m += j
    n += j*i
    p += j*i**2

det = determinant(a,b,c,b,c,d,c,d,e)
c0 = determinant(m,b,c,n,c,d,p,d,e)/det
c1 = determinant(a,m,c,b,n,d,c,p,e)/det
c2 = determinant(a,b,m,b,c,n,c,d,p)/det

evaluateValue([c2,c1,c0], toCheck)

###### Using another explicit alternative
def determinantAlt(a,b,c,d,e,f,g,h,i):
    return a*e*i - a*f*h - b*d*i +b*g*f + c*d*h - c*e*g # <- brackets removed

a = b = c = d = e = m = n = p = 0
a = len(x)
for i,j in zip(x,y):
    b += i
    c += i**2
    d += i**3
    e += i**4
    m += j
    n += j*i
    p += j*i**2

det = determinantAlt(a,b,c,b,c,d,c,d,e)
c0 = determinantAlt(m,b,c,n,c,d,p,d,e)/det
c1 = determinantAlt(a,m,c,b,n,d,c,p,e)/det
c2 = determinantAlt(a,b,m,b,c,n,c,d,p)/det

evaluateValue([c2,c1,c0], toCheck)

This code gives this output:

Value = 0.306639472 Error = 0.00%
Value = 0.308333580 Error = 0.55%
Value = 0.585786477 Error = 91.03%

As you can see, these are different from each other and the third one is totally wrong. Now my questions are:
1. Why is the explicit formula giving a slightly wrong result, and how can I improve that?
2. How is numpy giving such an accurate result?
3. In the third case, just by opening the parentheses, how come the result changes so drastically?
So there are a few things that are going on here that are unfortunately plaguing the way you are doing things. Take a look at this code:

for i,j in zip(x,y):
    b += i
    c += i**2
    d += i**3
    e += i**4
    m += j
    n += j*i
    p += j*i**2

You are building features such that the x values are not only squared, but cubed and fourth powered. If you print out each of these values before you put them into the 3 x 3 matrix to solve:

In [35]: a = b = c = d = e = m = n = p = 0
    ...: a = len(x)
    ...: for i,j in zip(xx,y):
    ...:     b += i
    ...:     c += i**2
    ...:     d += i**3
    ...:     e += i**4
    ...:     m += j
    ...:     n += j*i
    ...:     p += j*i**2
    ...: print(a, b, c, d, e, m, n, p)
    ...:
3 18.744836 117.12356813829001 731.8283056811686 4572.738547313946 0.9294744720000001 5.807505391292503 36.28641270376207

When dealing with floating-point arithmetic and especially for small values, the order of operations does matter. What's happening here is that by fluke, the mix of both small values and large values that have been computed result in a value that is very small. Therefore, when you compute the determinant using the factored form and expanded form, notice how you get slightly different results but also look at the precision of the values:

In [36]: det = determinant(a,b,c,b,c,d,c,d,e)

In [37]: det
Out[37]: 1.0913403514223319e-10

In [38]: det = determinantAlt(a,b,c,b,c,d,c,d,e)

In [39]: det
Out[39]: 2.3283064365386963e-10

The determinant is on the order of 10^-10! The reason why there's a discrepancy is because with floating-point arithmetic, theoretically both determinant methods should yield the same result, but unfortunately in reality they give slightly different results, and this is due to something called error propagation. Because there are a finite number of bits that can represent a floating-point number, the order of operations changes how the error propagates, so even though you are removing the parentheses and the formulas do essentially match, the order of operations to get to the result is now different. This article is an essential read for any software developer who deals with floating-point arithmetic regularly: What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Therefore, when you're trying to solve the system with Cramer's Rule, inevitably when you divide by the main determinant in your code, even though the change is on the order of 10^-10, the change is negligible between the two methods but you will get very different results because you're dividing by this number when solving for the coefficients. The reason why NumPy doesn't have this problem is because they solve the system by least-squares and the pseudo-inverse and not using Cramer's Rule.

I would not recommend using Cramer's Rule to find regression coefficients, mostly due to experience and because there are more robust ways of doing it. However, to solve your particular problem, it's good to normalize the data so that the dynamic range is centered at 0. Therefore, the features you use to construct your coefficient matrix are more sensible and the computational process has an easier time dealing with the data. In your case, something as simple as subtracting the mean of the x values from the data should work. As such, if you have new data points you want to predict, you must subtract the mean of the x data first prior to doing the prediction. Therefore, at the beginning of your code, perform mean subtraction and regress on this data.
I've shown you where I've modified the code given your source above:

import numpy as np

x = [6.230825,6.248279,6.265732]
y = [0.312949,0.309886,0.306639472]

# Calculate mean
me = sum(x) / len(x)

# Make new dataset that is mean subtracted
xx = [pt - me for pt in x]

#toCheck = x[2]

# Data point to check is now mean subtracted
toCheck = x[2] - me

def evaluateValue(coeff,x):
    c,b,a = coeff
    val = np.around( a+b*x+c*x**2,9)
    act = 0.306639472
    error= np.abs(act-val)*100/act
    print("Value = {:.9f} Error = {:.2f}%".format(val,error))

###### Using numpy ######################
coeff = np.polyfit(xx,y,2) # Change
evaluateValue(coeff, toCheck)

################# Using explicit formula
def determinant(a,b,c,d,e,f,g,h,i):
    # the matrix is [[a,b,c],[d,e,f],[g,h,i]]
    return a*(e*i - f*h) - b*(d*i - g*f) + c*(d*h - e*g)

a = b = c = d = e = m = n = p = 0
a = len(x)
for i,j in zip(xx,y): # Change
    b += i
    c += i**2
    d += i**3
    e += i**4
    m += j
    n += j*i
    p += j*i**2

det = determinant(a,b,c,b,c,d,c,d,e)
c0 = determinant(m,b,c,n,c,d,p,d,e)/det
c1 = determinant(a,m,c,b,n,d,c,p,e)/det
c2 = determinant(a,b,m,b,c,n,c,d,p)/det

evaluateValue([c2,c1,c0], toCheck)

###### Using another explicit alternative
def determinantAlt(a,b,c,d,e,f,g,h,i):
    return a*e*i - a*f*h - b*d*i +b*g*f + c*d*h - c*e*g # <- brackets removed

a = b = c = d = e = m = n = p = 0
a = len(x)
for i,j in zip(xx,y): # Change
    b += i
    c += i**2
    d += i**3
    e += i**4
    m += j
    n += j*i
    p += j*i**2

det = determinantAlt(a,b,c,b,c,d,c,d,e)
c0 = determinantAlt(m,b,c,n,c,d,p,d,e)/det
c1 = determinantAlt(a,m,c,b,n,d,c,p,e)/det
c2 = determinantAlt(a,b,m,b,c,n,c,d,p)/det

evaluateValue([c2,c1,c0], toCheck)

When I run this, we now get:

In [41]: run interp_test
Value = 0.306639472 Error = 0.00%
Value = 0.306639472 Error = 0.00%
Value = 0.306639472 Error = 0.00%

As some final reading for you, this is a similar problem that someone else encountered which I addressed in their question: Fitting a quadratic function in python without numpy polyfit. The summary is that I advised them not to use Cramer's Rule and to use least-squares through the pseudo-inverse. I showed them how to get exactly the same results without using numpy.polyfit. Also, using least-squares generalizes: if you have more than 3 points, you can still fit a quadratic through your points so that the model has the smallest error possible.
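For reference, here is a minimal sketch of the least-squares / pseudo-inverse route mentioned above, applied to the same mean-subtracted data. This is my own illustration, not part of the original answer; with only three points the quadratic fit is exact, so it reproduces the checked value:

import numpy as np

x = np.array([6.230825, 6.248279, 6.265732])
y = np.array([0.312949, 0.309886, 0.306639472])

xx = x - x.mean()                                  # mean subtraction, as above
A = np.vstack([xx**2, xx, np.ones_like(xx)]).T     # columns for c*x^2 + b*x + a
c2, c1, c0 = np.linalg.pinv(A) @ y                 # least-squares via the pseudo-inverse

xq = x[2] - x.mean()                               # query point, also mean subtracted
print(c0 + c1*xq + c2*xq**2)                       # ~0.306639472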
Trying to find the T(n) function and complexity in this bit of Python
def wum(aList):
    a = 7
    b = 5
    n = len(aList)
    for i in range(n):
        for j in range(n):
            a = i * a
            b = j * j
            w = i * j
            v = i + w
            x = v * v
        for k in range(n):
            w = a * k + 23
            v = b * b
        a = w + v

I got T(n) = 2n + 6n^2, complexity O(n^2). Does that seem right? Help!
I always find it a bit difficult to give an exact value for T(n) since it’s hard to define what 1 means there. But assuming that each of those assignments is 1 (regardless of what kind of calculation happens), then the total T(n) would be as follows: n * (6n + 2) + 3. But in big-O notation, that is O(n²), yes. You can easily see that since you have two nested loop levels, both over n.

Btw. your function is probably an example from your instructor or something, but it’s really a bad example. You can easily modify the logic to be linear and yield the same results:

a = 7
b = 5
n = len(aList)
for i in range(n):
    a *= i ** n          # `a` is multiplied `n` times by (constant) `i`
    b = (n - 1) ** 2     # at the end of the loop, `j` is `(n - 1)`
    v = i + (i * b)      # at the end of the loop, `w` is `i * b`
    x = v * v
    w = a * (n - 1) + 23 # at the end of the loop, `k` is `(n - 1)`
    v = b ** 2           # `b` (and as such `v`) is never changed in the loop
    a = w + v

And since nothing of that uses any value of the list, you could actually make these calculations in constant time too (I’ll leave that for your exercise ;) ).

And finally, you could argue that since the function does not return anything, and also does not mutate the input list, the function is a big NO-OP, and as such can be replaced by a function that does nothing:

def wum(aList):
    pass
Tridiagonal Matrix Algorithm (TDMA) aka Thomas Algorithm, using Python with NumPy arrays
I found an implementation of the Thomas algorithm or TDMA in MATLAB.

function x = TDMAsolver(a,b,c,d)
%a, b, c are the column vectors for the compressed tridiagonal matrix, d is the right vector
n = length(b); % n is the number of rows

% Modify the first-row coefficients
c(1) = c(1) / b(1); % Division by zero risk.
d(1) = d(1) / b(1); % Division by zero would imply a singular matrix.

for i = 2:n-1
    temp = b(i) - a(i) * c(i-1);
    c(i) = c(i) / temp;
    d(i) = (d(i) - a(i) * d(i-1))/temp;
end

d(n) = (d(n) - a(n) * d(n-1))/( b(n) - a(n) * c(n-1));

% Now back substitute.
x(n) = d(n);
for i = n-1:-1:1
    x(i) = d(i) - c(i) * x(i + 1);
end
end

I need it in python using numpy arrays; here is my first attempt at the algorithm in python.

import numpy

aa = (0.,8.,9.,3.,4.)
bb = (4.,5.,9.,4.,7.)
cc = (9.,4.,5.,7.,0.)
dd = (8.,4.,5.,9.,6.)

ary = numpy.array
a = ary(aa)
b = ary(bb)
c = ary(cc)
d = ary(dd)

n = len(b) ## n is the number of rows

## Modify the first-row coefficients
c[0] = c[0]/ b[0] ## risk of Division by zero.
d[0] = d[0]/ b[0]

for i in range(1,n,1):
    temp = b[i] - a[i] * c[i-1]
    c[i] = c[i]/temp
    d[i] = (d[i] - a[i] * d[i-1])/temp

d[-1] = (d[-1] - a[-1] * d[-2])/( b[-1] - a[-1] * c[-2])

## Now back substitute.
x = numpy.zeros(5)
x[-1] = d[-1]
for i in range(-2, -n-1, -1):
    x[i] = d[i] - c[i] * x[i + 1]

They give different results, so what am I doing wrong?
I made this since none of the online implementations for python actually work. I've tested it against built-in matrix inversion and the results match. Here a = Lower Diag, b = Main Diag, c = Upper Diag, d = solution vector.

import numpy as np

def TDMA(a,b,c,d):
    n = len(d)
    w = np.zeros(n-1,float)
    g = np.zeros(n, float)
    p = np.zeros(n,float)

    w[0] = c[0]/b[0]
    g[0] = d[0]/b[0]

    for i in range(1,n-1):
        w[i] = c[i]/(b[i] - a[i-1]*w[i-1])
    for i in range(1,n):
        g[i] = (d[i] - a[i-1]*g[i-1])/(b[i] - a[i-1]*w[i-1])
    p[n-1] = g[n-1]
    for i in range(n-1,0,-1):
        p[i-1] = g[i-1] - w[i-1]*p[i]
    return p

For an easy performance boost for large matrices, use numba! This code outperforms np.linalg.inv() in my tests:

import numpy as np
from numba import jit

@jit
def TDMA(a,b,c,d):
    n = len(d)
    w = np.zeros(n-1,float)
    g = np.zeros(n, float)
    p = np.zeros(n,float)

    w[0] = c[0]/b[0]
    g[0] = d[0]/b[0]

    for i in range(1,n-1):
        w[i] = c[i]/(b[i] - a[i-1]*w[i-1])
    for i in range(1,n):
        g[i] = (d[i] - a[i-1]*g[i-1])/(b[i] - a[i-1]*w[i-1])
    p[n-1] = g[n-1]
    for i in range(n-1,0,-1):
        p[i-1] = g[i-1] - w[i-1]*p[i]
    return p
There's at least one difference between the two: for i in range(1,n,1): in Python iterates from index 1 to the last index n-1, while for i = 2:n-1 iterates from index 1 (zero-based) to the last-1 index, since Matlab has one-based indexing.
In your loop, the Matlab version iterates over the second through second-to-last elements. To do the same in Python, you want:

for i in range(1,n-1):

(As noted in voithos's comment, this is because the range function excludes the last index, so you need to correct for this in addition to the change to 0-based indexing.)
Writing something like this in python is going to be really slow. You would be much better off using LAPACK to do the numerical heavy lifting and using python for everything around it. LAPACK is compiled, so it will run much faster than python; it is also much more highly optimised than it is feasible for most of us to match. SciPy provides low-level wrappers for LAPACK so that you can call it from python very simply; the one you are looking for can be found here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lapack.dgtsv.html#scipy.linalg.lapack.dgtsv
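As a concrete illustration (my own sketch, using the higher-level scipy.linalg.solve_banded wrapper rather than the raw dgtsv routine linked above), the system from the question could be solved like this and used as a reference to check the hand-written TDMA against:

import numpy as np
from scipy.linalg import solve_banded

# Same data as in the question: a = sub-diagonal (a[0] unused),
# b = main diagonal, c = super-diagonal (c[-1] unused), d = right-hand side.
a = np.array([0., 8., 9., 3., 4.])
b = np.array([4., 5., 9., 4., 7.])
c = np.array([9., 4., 5., 7., 0.])
d = np.array([8., 4., 5., 9., 6.])

# Banded storage expected by solve_banded for one sub- and one super-diagonal.
ab = np.zeros((3, len(b)))
ab[0, 1:] = c[:-1]    # super-diagonal
ab[1, :] = b          # main diagonal
ab[2, :-1] = a[1:]    # sub-diagonal

x = solve_banded((1, 1), ab, d)
print(x)              # reference solution for the tridiagonal system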