vectorizing a double for loop - python

This is a performance question. I am trying to optimize the following double for loop. Here is an MWE:
import numpy as np
from timeit import default_timer as tm
# L1 and L2 will range from 0 to 3 typically, sometimes up to 5
# all of the following are dummy values but match correct `type`
L1, L2, x1, x2, fac = 2, 3, 2.0, 4.5, 2.3
saved_values = np.random.uniform(high=75.0, size=[max(L1,L2) + 1, max(L1,L2) + 1])
facts = np.random.uniform(high=65.0, size=[L1 + L2 + 1])
val = 0
start = tm()
for i in range(L1 + 1):
    sf = saved_values[L1][i] * x1 ** (L1 - i)
    for j in range(L2 + 1):
        m = i + j
        if m % 2 == 0:
            num = sf * facts[m] / (2 * fac) ** (m / 2)
            val += saved_values[L2][j] * x1 ** (L1 - j) * num
end = tm()
time = end-start
print("Long way: time taken was {} and value is {}".format(time, val))
My idea for a solution is to remove the if m % 2 == 0: test and compute all i, j combinations at once, i.e., as a matrix, which I should be able to vectorize, and then use something like np.where() to sum only the elements meeting the requirement m % 2 == 0, where m = i + j.
Even if this is not faster than the explicit for loops, it should be vectorized, because in reality I will be passing arrays to the function containing this double for loop; being able to do that part in a vectorized way should get me the speed gains I am after, even if vectorizing this double loop alone does not.
I am stuck spinning my wheels on how to broadcast while accounting for the sf factor as well as the m factor in the inner loop.
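Here is a minimal sketch of that broadcasting approach (my illustration, not part of the original question; it reproduces the question's x1 ** (L1 - j) inner factor exactly as written):
i = np.arange(L1 + 1)
j = np.arange(L2 + 1)
m = i[:, None] + j[None, :]               # all i + j combinations, shape (L1+1, L2+1)
mask = (m % 2 == 0)                       # replaces the if m % 2 == 0 test
a = saved_values[L1, i] * x1 ** (L1 - i)  # the per-i "sf" factors
b = saved_values[L2, j] * x1 ** (L1 - j)  # the per-j factors, as in the question
num = facts[m] / (2 * fac) ** (m / 2)     # the m-dependent factor for every (i, j)
val_vec = np.sum(a[:, None] * b[None, :] * num * mask)
For L1, L2 up to 5 these arrays are tiny, so the payoff comes mainly when x1, x2, and fac later become arrays and the same broadcasting pattern is extended with extra axes.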

Related

Let n be a square number. Using Python, how can we efficiently calculate all natural numbers y up to a limit l such that n + y^2 is again a square number?

Using Python, I would like to implement a function that takes a natural number n as input and outputs a list of natural numbers [y1, y2, y3, ...] such that n + y1*y1, n + y2*y2, n + y3*y3, and so forth are each again a square.
What I tried so far is to obtain one y-value using the following function:
def find_square(n: int) -> tuple[int, int]:
    if n % 2 == 1:
        y = (n - 1) // 2
        x = n + y * y
        return (y, x)
    return None
It works fine, e.g., find_square(13689) gives me a correct solution, y = 6844. It would be great to have an algorithm that yields all possible y-values, such as y = 44 or y = 156.
The simplest slow approach is of course, for a given N, just to iterate over all possible Y and check whether N + Y^2 is a square.
But there is a much faster approach using an integer factorization technique:
Notice that to solve the equation N + Y^2 = X^2, that is, to find all integer pairs (X, Y) for a given fixed integer N, we can rewrite it as N = X^2 - Y^2 = (X + Y) * (X - Y), which follows from the well-known difference-of-squares formula.
Now let's rename the two factors as A and B, i.e., N = (X + Y) * (X - Y) = A * B, which means that X = (A + B) / 2 and Y = (A - B) / 2.
Notice that A and B must have the same parity, either both odd or both even; otherwise the divisions by 2 in the formulas above would not be exact.
We will factorize N into all possible pairs of factors (A, B) of the same parity. For fast factorization the code below uses Pollard's Rho, an algorithm that is simple to implement yet quite fast. Two extra algorithms are needed as helpers for Pollard's Rho: the Fermat primality test (which quickly checks whether a number is probably prime) and trial-division factorization (which factors out small factors that could otherwise cause Pollard's Rho to fail).
Pollard's Rho has time complexity O(N^(1/4)) for a composite number, which is very fast even for 64-bit numbers. Any faster factorization algorithm can be substituted if a bigger space needs to be searched. The running time of my fast algorithm is dominated by the factorization; the rest is blazingly fast, just a few loop iterations with simple formulas.
If your N is itself a square (so that we know its root easily), then Pollard's Rho can factor N even faster, within O(N^(1/8)) time. Even for 128-bit numbers that means a very short time, about 2^16 operations, and I hope you're solving your task for numbers of fewer than 128 bits.
If you want to process a whole range of possible N values, then the fastest way to factorize them is a technique similar to the Sieve of Eratosthenes: using a set of prime numbers, it computes the factors of every N within the range at once. For a range of Ns this is much faster than factorizing each N separately with Pollard's Rho.
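As a side note, here is a minimal sketch of that sieve idea (my illustration, not part of the answer's code): precompute the smallest prime factor of every number up to a limit, then read each factorization off directly.
def smallest_prime_factors(limit):
    # spf[n] holds the smallest prime factor of n
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for multiple in range(p * p, limit + 1, p):
                if spf[multiple] == multiple:
                    spf[multiple] = p
    return spf

def factor_with_spf(n, spf):
    # repeatedly divide out the smallest prime factor
    fs = []
    while n > 1:
        fs.append(spf[n])
        n //= spf[n]
    return fs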
After factoring N into pairs (A, B) we compute (X, Y) from (A, B) by the formulas above, and output each resulting Y as a solution of the fast algorithm.
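For example, for the question's N = 13689 = 3^4 * 13^2, the factor pairs can even be enumerated by brute-force trial division (my illustration; the fast algorithm instead builds the pairs from the prime factorization):
N = 13689
for A in range(1, N + 1):
    if N % A == 0:
        B = N // A
        # A >= B avoids duplicates; equal parity makes X and Y integers
        if A >= B and (A - B) % 2 == 0:
            X, Y = (A + B) // 2, (A - B) // 2
            print(A, B, X, Y)  # e.g. (169, 81) -> Y = 44, (351, 39) -> Y = 156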
The following example code is implemented in pure Python. Of course one could use Numba to speed it up; Numba usually gives a 30-200x speedup, reaching roughly the speed of optimized C++. But I thought the main thing here was to implement the fast algorithm; Numba optimization can easily be added afterwards.
I added time measurement to the code. Although it is pure Python, my fast algorithm still achieves about an 8500x speedup compared to the regular brute-force approach for a limit of 1 000 000.
You can change the limit variable to tweak the amount of searched space, or the num_tests variable to tweak the number of tests.
The following code implements both solutions: the fast solution find_fast() described above, plus a very tiny brute-force solution find_slow(), which is very slow because it scans all possible candidates. The slow solution is only used to check correctness in the tests and to measure the speedup.
The code uses nothing except a few standard Python library modules; no external modules are needed.
def find_slow(N):
    import math
    def is_square(x):
        root = int(math.sqrt(float(x)) + 0.5)
        return root * root == x, root
    l = []
    for y in range(N):
        if is_square(N + y ** 2)[0]:
            l.append(y)
    return l

def find_fast(N):
    import itertools, functools
    Prod = lambda it: functools.reduce(lambda a, b: a * b, it, 1)
    fs = factor(N)
    mfs = {}
    for e in fs:
        mfs[e] = mfs.get(e, 0) + 1
    fs = sorted(mfs.items())
    del mfs
    Ys = set()
    for take_a in itertools.product(*[
            (range(v + 1) if k != 2 else range(1, v)) for k, v in fs]):
        A = Prod([p ** t for (p, _), t in zip(fs, take_a)])
        B = N // A
        assert A * B == N, (N, A, B, take_a)
        if A < B:
            continue
        X = (A + B) // 2
        Y = (A - B) // 2
        assert N + Y ** 2 == X ** 2, (N, A, B, X, Y)
        Ys.add(Y)
    return sorted(Ys)

def trial_div_factor(n, limit = None):
    # https://en.wikipedia.org/wiki/Trial_division
    fs = []
    while n & 1 == 0:
        fs.append(2)
        n >>= 1
    all_checked = False
    for d in range(3, (limit or n) + 1, 2):
        if d * d > n:
            all_checked = True
            break
        while True:
            q, r = divmod(n, d)
            if r != 0:
                break
            fs.append(d)
            n = q
    if n > 1 and all_checked:
        fs.append(n)
        n = 1
    return fs, n

def fermat_prp(n, trials = 32):
    # https://en.wikipedia.org/wiki/Fermat_primality_test
    import random
    if n <= 16:
        return n in (2, 3, 5, 7, 11, 13)
    for i in range(trials):
        if pow(random.randint(2, n - 2), n - 1, n) != 1:
            return False
    return True

def pollard_rho_factor(n):
    # https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
    import math, random
    fs, n = trial_div_factor(n, 1 << 7)
    if n <= 1:
        return fs
    if fermat_prp(n):
        return sorted(fs + [n])
    for itry in range(8):
        failed = False
        x = random.randint(2, n - 2)
        for cycle in range(1, 1 << 60):
            y = x
            for i in range(1 << cycle):
                x = (x * x + 1) % n
                d = math.gcd(x - y, n)
                if d == 1:
                    continue
                if d == n:
                    failed = True
                    break
                return sorted(fs + pollard_rho_factor(d) + pollard_rho_factor(n // d))
            if failed:
                break
    assert False, f'Pollard Rho failed! n = {n}'

def factor(N):
    import functools
    Prod = lambda it: functools.reduce(lambda a, b: a * b, it, 1)
    fs = pollard_rho_factor(N)
    assert N == Prod(fs), (N, fs)
    return sorted(fs)

def test():
    import random, time
    limit = 1 << 20
    num_tests = 20
    t0, t1 = 0, 0
    for i in range(num_tests):
        if (round(i / num_tests * 1000)) % 100 == 0 or i + 1 >= num_tests:
            print(f'test {i}, ', end = '', flush = True)
        N = random.randrange(limit)
        tb = time.time()
        r0 = find_slow(N)
        t0 += time.time() - tb
        tb = time.time()
        r1 = find_fast(N)
        t1 += time.time() - tb
        assert r0 == r1, (N, r0, r1, t0, t1)
    print(f'\nTime slow {t0:.05f} sec, fast {t1:.05f} sec, speedup {round(t0 / max(1e-6, t1))} times')

if __name__ == '__main__':
    test()
Output:
test 0, test 2, test 4, test 6, test 8, test 10, test 12, test 14, test 16, test 18, test 19,
Time slow 26.28198 sec, fast 0.00301 sec, speedup 8732 times
For the easiest solution, you can try this:
import math
n = 13689  # or we can ask the user to input a square number
for i in range(1, 9999):
    if math.sqrt(n + i**2).is_integer():
        print(i)
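One caveat worth adding (my note, not the answerer's): math.sqrt works in floating point and can misclassify squares for large n; on Python 3.8+ an exact integer check with math.isqrt avoids that:
import math

n = 13689
for i in range(1, 9999):
    r = math.isqrt(n + i * i)  # exact integer square root
    if r * r == n + i * i:
        print(i)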

How to calculate sum of terms of Taylor series of sin(x) without using inner loops or if-else?

I can't use inner loops
I can't use if-else
I need to compute the following series:
x - x^3/3! + x^5/5! - x^7/7! + x^9/9! ...
I am thinking something like the following:
n = 1
x = 0.3
one = 1
fact1 = 1
fact2 = 1
term = 0
sum = 0
for i in range(1, n + 1, 2):
    one = one * (-1)
    fact1 = fact1 * i
    fact2 = fact2 * i + 1
    fact = fact1 * fact2
    x = x * x
    term = x / fact
    sum = sum + term
But I am having a hard time keeping track of the running products for both fact and x.
You want to compute a sum of terms. Each term is the previous term multiplied by -1 * x * x and divided by (n+1) * (n+2), where n is the exponent of the previous term. Just write it:
def func(x):
    eps = 1e-6  # the expected precision order
    term = x
    sum = term
    n = 1
    while True:
        term *= -x * x
        term /= (n + 1) * (n + 2)
        if abs(term) < eps:
            break
        sum += term
        n += 2
    return sum
Demo:
>>> import math
>>> func(math.pi / 6)
0.4999999918690232
giving, as expected, 0.5 to the chosen tolerance of 1e-6
Note: the series is the well-known expansion of the sin function...
Isn't that the Taylor series for sin(x)? And can you use a list comprehension? With a list comprehension it could be something like
x = 0.3
sum([(-1)**(n+1) * x**(2*n - 1) / fact(2*n - 1) for n in range(1, numOfTerms)])
If you can't use a list comprehension, you could simply write it as a loop:
x = 0.3
terms = []
for n in range(1, numberOfTerms):
    term = (-1)**(n+1) * x**(2*n - 1) / fact(2*n - 1)
    terms.append(term)
sumOfTerms = sum(terms)
Then calculate the factorial by recursion:
def fact(k):
    if k == 1:
        return 1
    else:
        return fact(k-1) * k
Or calculate the factorial using Stirling's approximation:
fact(k) ≈ sqrt(2*pi*k) * k**k * e**(-k)
No if-else here, nor inner loops. But then there will be precision errors, and you need the math lib to get the constants (or hard-coded values for pi and e, with even more precision error).
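A quick sanity check of Stirling's approximation (my illustration, not the answerer's code):
import math

def stirling_fact(k):
    # Stirling's approximation: k! ~ sqrt(2*pi*k) * k^k * e^(-k)
    return math.sqrt(2 * math.pi * k) * k**k * math.e**(-k)

print(stirling_fact(10), math.factorial(10))  # ~3598695.6 vs 3628800, about 0.8% low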
Hope this can help!
n = NUMBER_OF_TERMS
x = VALUE_OF_X
m = -1
sum = x  # final sum

def fact(i):
    f = 1
    while i >= 1:
        f = f * i
        i = i - 1
    return f

for i in range(1, n):
    r = 2 * i + 1
    a = pow(x, r)
    term = a * m / fact(r)
    sum = sum + term
    m = m * (-1)

Trying to find the T(n) function and complexity in this bit of Python

def wum(aList):
    a = 7
    b = 5
    n = len(aList)
    for i in range(n):
        for j in range(n):
            a = i * a
            b = j * j
            w = i * j
        v = i + w
        x = v * v
        for k in range(n):
            w = a * k + 23
            v = b * b
            a = w + v
I got T(n) = 2n + 6n^2 and complexity O(n^2). Does that seem right? Help!
I always find it a bit difficult to give an exact value for T(n), since it's hard to define what 1 means there. But assuming that each of those assignments counts as 1 (regardless of what kind of calculation happens), the total T(n) would be the following: n * (6n + 2) + 3. That is 3 initial assignments, plus, per outer iteration, 3 assignments per pass of the j-loop (times n), 2 assignments between the loops, and 3 assignments per pass of the k-loop (times n).
But in big-O notation, that is O(n²), yes. You can easily see that since you have two nested loop levels, both over n.
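A quick empirical check of that count (my sketch; it assumes the loop nesting shown in the question, with v and x between the two inner loops):
def wum_count(aList):
    # Count assignments exactly as T(n) does: 1 per executed assignment.
    count = 3  # a, b, n
    n = len(aList)
    for i in range(n):
        for j in range(n):
            count += 3  # a, b, w
        count += 2      # v, x
        for k in range(n):
            count += 3  # w, v, a
    return count

for n in (1, 2, 5, 10):
    assert wum_count([0] * n) == n * (6 * n + 2) + 3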
Btw. your function is probably an example from your instructor or something, but it's really a bad example: you can easily modify the logic to be linear and yield the same results:
a = 7
b = 5
n = len(aList)
for i in range(n):
    a *= i ** n            # `a` is multiplied `n` times by (constant) `i`
    b = (n - 1) ** 2       # at the end of the inner loop, `j` is `(n - 1)`
    v = i + (i * (n - 1))  # at the end of the inner loop, `w` is `i * (n - 1)`
    x = v * v
    w = a * (n - 1) + 23   # at the end of the loop, `k` is `(n - 1)`
    v = b ** 2             # `b` (and as such `v`) is never changed in the k-loop
    a = w + v
And since nothing of that uses any value of the list, you could actually make these calculations in constant time too (I’ll leave that for your exercise ;) ).
And finally, you could argue that since the function does not return anything, and also does not mutate the input list, the function is a big NO-OP, and as such can be replaced by a function that does nothing:
def wum(aList):
    pass

Finding c so that sum(x+c) over positives = K

Say I have a 1D array x with positive and negative values in Python, e.g.:
x = np.random.rand(10) * 10 - 5  # shifted so it actually contains negative values
For a given positive value of K, I would like to find the offset c that makes the sum of positive elements of the array y = x + c equal to K.
How can I solve this problem efficiently?
How about binary search to determine which elements of x + c will contribute to the sum, followed by solving the linear equation? The running time of this code is O(n log n), but only O(log n) of the work is done in Python. The running time could be dropped to O(n) via a more complicated partitioning strategy, but I'm not sure whether that would be a practical improvement.
import numpy as np

def findthreshold(x, K):
    x = np.sort(np.array(x))[::-1]
    z = np.cumsum(np.array(x))
    l = 0
    u = x.size
    while u - l > 1:
        m = (l + u) // 2
        if z[m] - (m + 1) * x[m] >= K:
            u = m
        else:
            l = m
    return (K - z[l]) / (l + 1)

def test():
    x = np.random.rand(10)
    K = np.random.rand() * x.size
    c = findthreshold(x, K)
    assert np.abs(K - np.sum(np.clip(x + c, 0, np.inf))) / K <= 1e-8
Here's a randomized expected O(n) variant. It's faster (on my machine, for large inputs), but not dramatically so. Watch out for catastrophic cancellation in both versions.
def findthreshold2(x, K):
    sumincluded = 0
    includedsize = 0
    while x.size > 0:
        pivot = x[np.random.randint(x.size)]
        above = x[x > pivot]
        if sumincluded + np.sum(above) - (includedsize + above.size) * pivot >= K:
            x = above
        else:
            notbelow = x[x >= pivot]
            sumincluded += np.sum(notbelow)
            includedsize += notbelow.size
            x = x[x < pivot]
    return (K - sumincluded) / includedsize
You can sort x in descending order, loop over x, and compute the required c as you go. If the next element plus c is positive, it should be included in the sum, so c gets smaller.
Note that there might be no solution: if you include elements up to m, c may be such that element m+1 should also be included, but when you include m+1, c decreases and x[m+1] + c might become negative.
In pseudocode:
sortDescending(x)
i = 0, c = 0, sum = 0
while i < x.length and x[i] + c >= 0
    sum += x[i]
    i++
    c = (K - sum) / i
if i == 0 or x[i-1] + c < 0
    # no solution
The running time is obviously O(n log n) because it is dominated by the initial sort.
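Rendered as runnable Python (my sketch of the pseudocode above, using numpy for the sort; None signals the no-solution case):
import numpy as np

def find_offset(x, K):
    x = np.sort(np.asarray(x, dtype=float))[::-1]  # descending order
    s = 0.0
    c = 0.0
    i = 0
    while i < x.size and x[i] + c >= 0:
        s += x[i]
        i += 1
        c = (K - s) / i  # offset making the i included elements sum to K
    if i == 0 or x[i - 1] + c < 0:
        return None  # no consistent offset exists
    return c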

How to optimize math operations on matrix in python

I am trying to reduce the running time of a function that performs a series of calculations on two matrices. Searching around, I've heard of numpy, but I really do not know how to apply it to my problem. I also think that one of the things making my function slow is all the dotted attribute lookups (I have read about that elsewhere).
The math corresponds to a factorization for the Quadratic Assignment Problem.
My code is:
delta = 0
for k in xrange(self._tam):
    if k != r and k != s:
        delta += \
            self._data.stream_matrix[r][k] \
            * (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) + \
            self._data.stream_matrix[s][k] \
            * (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]) + \
            self._data.stream_matrix[k][r] \
            * (self._data.distance_matrix[sol[k]][sol[s]] - self._data.distance_matrix[sol[k]][sol[r]]) + \
            self._data.stream_matrix[k][s] \
            * (self._data.distance_matrix[sol[k]][sol[r]] - self._data.distance_matrix[sol[k]][sol[s]])
return delta
Running this on a problem of size 20 (matrices of 20x20) takes about 20 seconds; the bottleneck is this function:
ncalls tottime percall cumtime percall filename:lineno(function)
303878 15.712 0.000 15.712 0.000 Heuristic.py:66(deltaC)
I tried to apply map to the for loop, but since the loop body isn't a single function call, that isn't possible directly.
How could I reduce the time?
EDIT1
To answer eickenberg's comment:
sol is a permutation, for example [1,2,3,4]. The function is called when I am generating neighbor solutions, so a neighbor of [1,2,3,4] is [2,1,3,4]. I change only two positions in the original permutation and then call deltaC, which calculates the factorized cost difference of the solution with positions r, s swapped (in the example above r, s = 0, 1). This is done to avoid calculating the entire cost of the neighbor solution. I suppose I can store the values of sol[k], sol[r], sol[s] in local variables to avoid looking them up in each iteration. I do not know if this is what you were asking in your comment.
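For concreteness, the neighbor move described here is just a two-position swap (my illustration, using hypothetical values):
sol = [1, 2, 3, 4]
r, s = 0, 1
neighbor = sol[:]  # copy the permutation
neighbor[r], neighbor[s] = neighbor[s], neighbor[r]
print(neighbor)    # [2, 1, 3, 4]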
EDIT2
A minimal working example:
import random

distance_matrix = [[0, 12, 6, 4], [12, 0, 6, 8], [6, 6, 0, 7], [4, 8, 7, 0]]
stream_matrix = [[0, 3, 8, 3], [3, 0, 2, 4], [8, 2, 0, 5], [3, 4, 5, 0]]

def deltaC(r, s, S=None):
    '''
    Difference between C with positions r and s swapped
    '''
    sol = S if S is not None else [0, 1, 2, 3]
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(4):
        if k != r and k != s:
            delta += (stream_matrix[r][k] \
                * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \
                stream_matrix[s][k] \
                * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \
                stream_matrix[k][r] \
                * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \
                stream_matrix[k][s] \
                * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s]))
    return delta

for _ in xrange(303878):
    d = deltaC(random.randint(0, 3), random.randint(0, 3))
print d
Now I think the better option is to use NumPy. I tried with Matrix(), but it did not improve the performance.
Best solution found
Well, finally I was able to reduce the time a bit more by combining @TooTone's solution and storing the indexes in a set to avoid the if. The time dropped from about 18 seconds to 8 seconds. Here is the code:
def deltaC(self, r, s, sol=None):
    delta = 0
    sol = self.S if sol is None else sol
    sol_r, sol_s = sol[r], sol[s]
    stream_matrix = self._data.stream_matrix
    distance_matrix = self._data.distance_matrix
    indexes = set(xrange(self._tam)) - set([r, s])
    for k in indexes:
        sol_k = sol[k]
        delta += \
            (stream_matrix[r][k] - stream_matrix[s][k]) \
            * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
            + \
            (stream_matrix[k][r] - stream_matrix[k][s]) \
            * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta
In order to reduce the time even more, I think the best way would be to write a compiled extension module.
In the simple example you've given, with for k in xrange(4): the loop body only executes three times (if r == s) or twice (if r != s), and an initial numpy implementation, below, is slower by a large factor. Numpy is optimized for performing calculations over long vectors, and if the vectors are short the overheads can outweigh the benefits. (Note also that in this formula the matrices are sliced along different dimensions and indexed non-contiguously, which can only make things more complicated for a vectorized implementation.)
import numpy as np

distance_matrix_np = np.array(distance_matrix)
stream_matrix_np = np.array(stream_matrix)
n = 4

def deltaC_np(r, s, sol):
    sol_r, sol_s = sol[r], sol[s]
    K = np.array([i for i in xrange(n) if i != r and i != s])
    return np.sum(
        (stream_matrix_np[r, K] - stream_matrix_np[s, K]) \
        * (distance_matrix_np[sol_s, sol[K]] - distance_matrix_np[sol_r, sol[K]]) + \
        (stream_matrix_np[K, r] - stream_matrix_np[K, s]) \
        * (distance_matrix_np[sol[K], sol_s] - distance_matrix_np[sol[K], sol_r]))
In this numpy implementation, rather than looping over the elements of K, the operations are applied to all the elements of K inside numpy. Also, note that your mathematical expression can be simplified: each term in brackets on the left is the negative of a term in brackets on the right.
This applies to your original code too. For example, (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) equals -1 times (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]), so you were doing unnecessary computation, and your original code can be optimized without using numpy, as spelled out in the check below.
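To spell the simplification out (my quick numeric check, not part of the answer): f1*(d2 - d1) + f2*(d1 - d2) collapses to (f1 - f2)*(d2 - d1), which is exactly the two-term form deltaC2 below uses.
import random

# f1, f2 stand for two stream entries; d1, d2 for two distance entries
f1, f2, d1, d2 = [random.random() for _ in range(4)]
lhs = f1 * (d2 - d1) + f2 * (d1 - d2)
rhs = (f1 - f2) * (d2 - d1)
assert abs(lhs - rhs) < 1e-12  # identical up to rounding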
It turns out that the bottleneck in the numpy function is the innocent-looking list comprehension
K = np.array([i for i in xrange(n) if i!=r and i!=s])
Once this is replaced with vectorized code
if r == s:
    K = np.arange(n-1)
    K[r:] += 1
else:
    K = np.arange(n-2)
    if r < s:
        K[r:] += 1
        K[s-1:] += 1
    else:
        K[s:] += 1
        K[r-1:] += 1
the numpy function is much faster.
A graph of run times is shown immediately below (the original graph, from before the numpy function was optimized, is right at the bottom of this answer). You can see that, depending on how large the matrix is, it makes sense to use either your optimized original code or the numpy code.
The full code is below for reference, partly in case someone else can take it further. (The function deltaC2 is your original code optimized to take account of the way the mathematical expression can be simplified.)
def deltaC(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(n):
        if k != r and k != s:
            delta += \
                stream_matrix[r][k] \
                * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \
                stream_matrix[s][k] \
                * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \
                stream_matrix[k][r] \
                * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \
                stream_matrix[k][s] \
                * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s])
    return delta
import numpy as np

def deltaC_np(r, s, sol):
    sol_r, sol_s = sol[r], sol[s]
    if r == s:
        K = np.arange(n-1)
        K[r:] += 1
    else:
        K = np.arange(n-2)
        if r < s:
            K[r:] += 1
            K[s-1:] += 1
        else:
            K[s:] += 1
            K[r-1:] += 1
    #K = np.array([i for i in xrange(n) if i != r and i != s])  #TOO SLOW
    return np.sum(
        (stream_matrix_np[r, K] - stream_matrix_np[s, K]) \
        * (distance_matrix_np[sol_s, sol[K]] - distance_matrix_np[sol_r, sol[K]]) + \
        (stream_matrix_np[K, r] - stream_matrix_np[K, s]) \
        * (distance_matrix_np[sol[K], sol_s] - distance_matrix_np[sol[K], sol_r]))
def deltaC2(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(n):
        if k != r and k != s:
            sol_k = sol[k]
            delta += \
                (stream_matrix[r][k] - stream_matrix[s][k]) \
                * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
                + \
                (stream_matrix[k][r] - stream_matrix[k][s]) \
                * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta
import time

N = 200
elapsed1s = []
elapsed2s = []
elapsed3s = []
ns = range(10, 410, 10)
for n in ns:
    distance_matrix_np = np.random.uniform(0, n**2, size=(n, n))
    stream_matrix_np = np.random.uniform(0, n**2, size=(n, n))
    distance_matrix = distance_matrix_np.tolist()
    stream_matrix = stream_matrix_np.tolist()
    sol = range(n-1, -1, -1)
    sol_np = np.array(range(n-1, -1, -1))
    Is = np.random.randint(0, n-1, 4)
    Js = np.random.randint(0, n-1, 4)

    total1 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total1 += deltaC(i, j, sol)
    elapsed1 = (time.clock() - start)

    total2 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total2 += deltaC_np(i, j, sol_np)
    elapsed2 = (time.clock() - start)

    total3 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total3 += deltaC2(i, j, sol_np)
    elapsed3 = (time.clock() - start)

    print n, elapsed1, elapsed2, elapsed3, total1, total2, total3
    elapsed1s.append(elapsed1)
    elapsed2s.append(elapsed2)
    elapsed3s.append(elapsed3)

    #Check errors of one method against another
    #err = 0
    #for i in range(min(n, 50)):
    #    for j in range(min(n, 50)):
    #        err += np.abs(deltaC(i, j, sol) - deltaC_np(i, j, sol_np))
    #print err

import matplotlib.pyplot as plt
plt.plot(ns, elapsed1s, label='Original', lw=2)
plt.plot(ns, elapsed3s, label='Optimized', lw=2)
plt.plot(ns, elapsed2s, label='numpy', lw=2)
plt.legend(loc='upper left', prop={'size': 16})
plt.xlabel('matrix size')
plt.ylabel('time')
plt.show()
And here is the original graph before optimizing out the list comprehension in deltaC_np
