My recursive function is way too slow (Python)

I want to program the following (I've just started to learn Python):
f[i] := f[i-1] - (1/n)*(1 - (1 - f[i-1])^n) - (1/n)*f[i-1]^n + 2*f[0]/n
with f[0] = x, where x belongs to [0,1] and n is a constant integer.
My try:
import pylab as pl
import numpy as np

N = 20
n = 100
h = 0.01
T = np.arange(0, 1+h, h)

def f(i):
    if i == 0:
        return T
    else:
        return f(i-1)-(1./n)*(1-(1-f(i-1))**n)-(1./n)*(f(i-1))**n+2.*T/n

pl.figure(figsize=(10, 6), dpi=80)
pl.plot(T, f(N), color="red", linestyle='--', linewidth=2.5)
pl.show()
For N=10 (number of iterations) it returns the correct plot fast enough, but for N=20 it keeps running and running (more than 30 minutes already).

The reason your run time is so slow is that, like the naive calculation of the nth Fibonacci number, the function runs in exponential time (in this case 3^N). To see this: before f(i) can return its value, it has to call f(i-1) three times, then each of those has to call f(i-2) three times (3*3 calls), then each of those has to call f(i-3) three times (3*3*3 calls), and so on. As others have shown, this can instead be calculated in linear time. The reason N = 20 seems slow is that your function has to be called 3^20 = 3486784401 times before you get the answer!

You calculate f(i-1) three times in a single recursion layer - so after the first run you "know" the answer but still calculate it two more times. A naive approach:
fi_1 = f(i-1)
return fi_1-(1./n)*(1-(1-fi_1)**n)-(1./n)*(fi_1)**n+2.*T/n
But of course we can still do better and cache every evaluation of f:
cache = {}

def f_cached(i):
    if i not in cache:
        cache[i] = f(i)
    return cache[i]
Then replace every occurrence of f with f_cached, including the recursive call inside f itself (otherwise the recursion still fans out uncached).
There are also libraries out there that can do that for you automatically (with a decorator).
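For example, the standard library's functools.lru_cache decorator (Python 3.2+) does exactly this; a minimal sketch:

from functools import lru_cache

@lru_cache(maxsize=None)   # remembers every f(i) after its first evaluation
def f(i):
    if i == 0:
        return T
    prev = f(i - 1)        # computed once, reused three times below
    return prev - (1./n)*(1 - (1 - prev)**n) - (1./n)*prev**n + 2.*T/n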
While recursion often yields nice and simple formulas, Python is not that good at evaluating them: it performs no tail-call optimization and has a fairly low default recursion limit. You are probably better off rewriting the calculation iteratively.

First of all, you are calculating f(i-1) three times when you can save its result in a variable and calculate it only once:
t = f(i-1)
return t-(1./n)*(1-(1-t)**n)-(1./n)*(t)**n+2.*T/n
That will speed the program up, but I would also recommend calculating f without recursion:
fs = T
for i in range(1, N+1):
    tmp = fs
    fs = tmp - (1./n)*(1-(1-tmp)**n) - (1./n)*tmp**n + 2.*T/n
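Assembled with the setup and plotting code from the question, the iterative version might look like this sketch; it runs in O(N) instead of O(3^N):

import pylab as pl
import numpy as np

N = 20
n = 100
h = 0.01
T = np.arange(0, 1+h, h)

fs = T
for i in range(1, N+1):
    fs = fs - (1./n)*(1-(1-fs)**n) - (1./n)*fs**n + 2.*T/n

pl.figure(figsize=(10, 6), dpi=80)
pl.plot(T, fs, color="red", linestyle='--', linewidth=2.5)
pl.show()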


Does this manipulation in python save more time for my highly inefficient program?

I am using the following code unchanged in form but changed in content:
import numpy as np
import matplotlib.pyplot as plt
import random
from random import seed
from random import randint
import math
from math import *
from random import *
import statistics
from statistics import *
n = 1000
T_plot = [0]
X_relm = [0]

class Objs:
    def __init__(self, xIn, yIn, color):
        self.xIn = xIn
        self.yIn = yIn
        self.color = color
    def yfT(self, t):
        return self.yIn*t + self.yIn*t
    def xfT(self, t):
        return self.xIn*t - self.yIn*t

xi = np.random.uniform(0, 1, n)
yi = np.random.uniform(0, 1, n)
O1 = [Objs(xIn=i, yIn=j, color=choice(["Black", "White"]))
      for i, j in zip(xi, yi)]
X = sorted(O1, key=lambda x: x.xIn)
dt = 1/(2*n)
T = 20
iter = 40000
Black = []
White = []
Xrelm = []
for i in range(1, iter+1):
    t = i*dt
    for j in range(n-1):
        check = X[j].xfT(t) - X[j+1].xfT(t)
        if check < 0:
            X[j], X[j+1] = X[j+1], X[j]
        if check < -10:
            X[j].color, X[j+1].color = X[j+1].color, X[j].color
        if X[j].color == "Black":
            Black.append(X[j].xfT(t))
        else:
            White.append(X[j].xfT(t))
    Xrel = mean(Black) - mean(White)
    Xrelm.append(Xrel)
plot1 = plt.figure(1)
plt.plot(T_plot, Xrelm)
plt.xlabel("time")
plt.ylabel("Relative ")
and it keeps running (I left it for 10 hours) without producing output for some parameters, simply because the workload is too big, I guess. I know my code is not totally faulty (in the sense that it should produce something, even if wrong) because it does give output for fewer time steps and other parameters.
So I am focusing on optimizing my code so that it takes less time to run. Now, this is a routine task for experienced coders, but I am a newbie, and I am coding simply because the simulation will help in my field. So, in general, any inputs that give insight into how to make one's code faster are appreciated.
Besides that, I want to ask whether defining a function beforehand for the inner loop will save any time. I do not think it should, since I am doing the same thing either way, but I am not sure; maybe it does. If it doesn't, any insights on how to deal with nested loops more efficiently, along with those of a general nature, are appreciated.
(I have tried to shorten the code as far as I could while not omitting relevant information.)
There are several issues in your code:
The mean is recomputed from scratch over the ever-growing lists on every iteration, so the cost of mean(Black)-mean(White) is quadratic in the number of elements.
The statistics.mean function is itself inefficient: a basic sum and division is much faster (a manual mean is about 25-30 times faster on my machine).
The CPython interpreter is very slow, so you should avoid loops as much as possible (the OOP code does not help either). If that is not possible and your computation is expensive, consider natively compiled code: tools like PyPy, Numba or Cython, or possibly a part rewritten in C.
Strings are generally quite slow and there is no reason to use them here. Consider an enumeration instead (i.e. integers), as sketched right below.
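For illustration, a standalone sketch of that last point (BLACK and WHITE are hypothetical names, not from the original code):

from random import choice

BLACK, WHITE = 0, 1   # integer tags instead of the strings "Black"/"White"

colors = [choice([BLACK, WHITE]) for _ in range(10)]
black_count = sum(1 for c in colors if c == BLACK)   # int comparison, no string work
print(black_count)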
Here is code fixing the first two points:
dt = 1/(2*n)
T = 20
iter = 40000
Black = []
White = []
Xrelm = []

cur1, cur2 = 0, 0
sum1, sum2 = 0.0, 0.0

for i in range(1, iter+1):
    t = i*dt
    for j in range(n-1):
        check = X[j].xfT(t) - X[j+1].xfT(t)
        if check < 0:
            X[j], X[j+1] = X[j+1], X[j]
        if check < -10:
            X[j].color, X[j+1].color = X[j+1].color, X[j].color
        if X[j].color == "Black":
            Black.append(X[j].xfT(t))
        else:
            White.append(X[j].xfT(t))

    # running means: only the elements appended this iteration are summed
    delta1, delta2 = sum(Black[cur1:]), sum(White[cur2:])
    sum1, sum2 = sum1 + delta1, sum2 + delta2
    cur1, cur2 = len(Black), len(White)
    Xrel = sum1/cur1 - sum2/cur2
    Xrelm.append(Xrel)
Consider resetting Black and White to an empty list if you do not use them later.
This is several hundred times faster: it now takes about 2 minutes, as opposed to more than 20 hours (estimated) for the initial code.
Note that natively compiled code should be at least 10 times faster still, so the execution time could come down to a few dozen seconds; a sketch follows.
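As a rough illustration of the Numba route (an assumption-laden sketch, not the original code: it supposes the per-particle data is kept in plain NumPy arrays instead of Objs instances, since compiled loops work on arrays, not Python objects):

import numpy as np
from numba import njit   # assumes Numba is installed

@njit
def inner_pass(xIn, yIn, color, t):
    # one neighbour-swap pass; xfT(t) = xIn*t - yIn*t from the original class, inlined
    for j in range(xIn.shape[0] - 1):
        check = (xIn[j] - yIn[j])*t - (xIn[j+1] - yIn[j+1])*t
        if check < 0:
            xIn[j], xIn[j+1] = xIn[j+1], xIn[j]
            yIn[j], yIn[j+1] = yIn[j+1], yIn[j]
        if check < -10:
            color[j], color[j+1] = color[j+1], color[j]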
As mentioned in earlier comments, this one is a bit too broad to answer.
To illustrate: the iteration itself doesn't take very long:
import time

start = time.time()
for i in range(10000):
    for j in range(10000):
        pass
end = time.time()
print(end - start)
On my not-so-great machine that takes ~2s to complete.
So the looping portion is only a tiny fraction of your 10h+ run time.
The detail of what you're doing in the loop is the key.
Whilst very basic, the timing approach shown above could be applied to your existing code to work out which bit(s) are the least performant; you could then raise a new question with more specific, actionable detail.
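A standard-library alternative for finding the slow parts is cProfile (a sketch; main() is a hypothetical function wrapping your whole simulation):

import cProfile
import pstats

cProfile.run('main()', 'stats.out')   # profile one full run
pstats.Stats('stats.out').sort_stats('cumulative').print_stats(10)   # top 10 by cumulative time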

Coefficient matrix and loops challenge

I'm trying to solve the equation for T(p,q) as shown more clearly in the attached image.
Where:
p = 0.60
q = 0.45
M is a coefficient matrix with 3 rows and 6 columns
I created five matrices and put them within their own functions in order to later call them in the while loop. However, the loop doesn't cycle through the various values of i.
How can I get the loop to work or is there another/better way I can solve the following equation?
(FYI this is approx my third day ever working with Python and coding)
import numpy as np

def M1(i):
    M = np.array([[1,3,0,4,0,0],[3,3,1,4,0,0],[4,4,0,3,1,0]])
    return M[i-1,0]
def M2(i):
    M = np.array([[1,3,0,4,0,0],[3,3,1,4,0,0],[4,4,0,3,1,0]])
    return M[i-1,1]
def M3(i):
    M = np.array([[1,3,0,4,0,0],[3,3,1,4,0,0],[4,4,0,3,1,0]])
    return M[i-1,2]
def M4(i):
    M = np.array([[1,3,0,4,0,0],[3,3,1,4,0,0],[4,4,0,3,1,0]])
    return M[i-1,3]
def M5(i):
    M = np.array([[1,3,0,4,0,0],[3,3,1,4,0,0],[4,4,0,3,1,0]])
    return M[i-1,4]

def T(p,q):
    sum_i = 0
    i = 1
    while i <= 5:
        sum_i = sum_i + ((M1(i)*p**M2(i))*((1-p)**M3(i))*(q**M4(i))*((1-q)**M5(i)))
        i = i + 1
        return sum_i

print(T(0.6,0.45))
"""I printed the below equation (using a single value for i) to test if the above loop is working and since I get the same answer as the loop, I can see that the loop is not cycling through the various values of i as expected"""
i=1
p=0.6
q=0.45
print(((M1(i)*p**M2(i))*((1-p)**M3(i))*(q**M4(i))*((1-q)**M5(i))))
return is placed inside the while loop, so the function returns during the first iteration; you need to change the code a bit:

while i <= 5:
    sum_i = sum_i + ((M1(i)*p**M2(i))*((1-p)**M3(i))*(q**M4(i))*((1-q)**M5(i)))
    i = i + 1
return sum_i
The real power with numpy is to look at computations like these and try to understand what is repeatable and what structure they have. Once you find operations that are similar or parallel, try to set up your numpy function calls so that they can be done element-wise in parallel with one call.
For instance, inside the typical element of the sum, there are four things being explicitly raised to a power (a fifth if you count M(i,1)^1). You can perform all four of these exponentiations in parallel with one function call if you arrange your arrays smartly:
M = np.array([[1,3,0,4,0,0],[3,3,1,4,0,0],[4,4,0,3,1,0]])
ps_and_qs = np.array([[p, (1-p), q, (1-q)]])
a = np.power(ps_and_qs, M[:,1:5])
Now a will be populated with a 3 x 4 matrix with all of your exponentiations.
Now the next step is how to reduce these. There are several reduction functions that are built into numpy that are efficiently implemented with vectorized loops where possible. They can speed up your code quite a bit. In particular, there is both a product reduction as well as a sum reduction. With your equation, we need to first multiply across the rows to get one number per row and then sum across the remaining column like this:
b = M[:,0] * np.prod(a, axis=1)   # multiply across each row: one term per row
c = np.sum(b)                     # then sum the per-row terms
c should now be a scalar equal to T evaluated at (p,q). That is a lot to take in for a third day, but something to consider if you continue to use numpy for numerical analysis on bigger projects that need better performance.
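Pulling those snippets together into a complete function (a sketch; the names match the question's code):

import numpy as np

M = np.array([[1,3,0,4,0,0],[3,3,1,4,0,0],[4,4,0,3,1,0]])

def T(p, q):
    ps_and_qs = np.array([p, 1-p, q, 1-q])
    a = np.power(ps_and_qs, M[:, 1:5])           # all exponentiations in one call
    return np.sum(M[:, 0] * np.prod(a, axis=1))  # row products, then the sum over rows

print(T(0.6, 0.45))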

Code finding the first triangular number with more than 500 divisors will not finish running

Okay, so I'm working on Euler Problem 12 (find the first triangular number with a number of factors over 500) and my code (in Python 3) is as follows:
factors = 0
y = 1

def factornum(n):
    x = 1
    f = []
    while x <= n:
        if n%x == 0:
            f.append(x)
        x += 1
    return len(f)

def triangle(n):
    t = sum(list(range(1,n)))
    return t

while factors <= 500:
    factors = factornum(triangle(y))
    y += 1
print(y-1)
Basically, one function goes through all the numbers up to the input number n, checks whether each divides n evenly, appends it to a list if so, and returns the length of that list. Another generates a triangular number by summing the numbers from 1 up to the input and returning the sum. A while loop then generates a triangular number using the counter y as input to the triangle function, runs factornum on the result, and stores it in the factors variable. The loop keeps incrementing y until the number of factors is over 500, and the result is then printed.
However, when I run it, nothing happens - no errors, no output, it just keeps running and running. Now, I know my code isn't the most efficient, but I left it running for quite a bit and it still didn't produce a result, so it seems more likely to me that there's an error somewhere. I've been over it and over it and cannot seem to find an error.
I'd merely request that a full solution or a drastically improved one isn't given outright but pointers towards my error(s) or spots for improvement, as the reason I'm doing the Euler problems is to improve my coding. Thanks!
You have a very inefficient algorithm.
Since you ask for pointers rather than a full solution, the main ones are:
There is a more efficient way to calculate the next triangular number: there is an explicit formula in the wiki, and if you generate the sequence anyway, it is cheaper to just add the next n to the previous number. (Side note: the list in sum(list(range(1,n))) makes no sense at all; if you want to keep this approach, sum(range(1, n)) without the list will be more efficient, as it avoids building an intermediate list.)
There are much more efficient ways to factorize numbers.
There is also a more efficient way to count the factors: from the prime factorization n = p1^a1 * p2^a2 * ..., the number of divisors is (a1+1)*(a2+1)*..., so you never need to enumerate the divisors at all (a sketch follows this list).
Generally, Project Euler problems (like many other programming competition problems) are not meant to be solvable by sheer brute force. You should come up with a formula and/or a more efficient algorithm first.
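To make the divisor-count pointer concrete, here is a minimal sketch of counting divisors via the prime factorization (an illustration, independent of the question's code):

def num_divisors(n):
    # if n = p1**a1 * p2**a2 * ..., then the divisor count is (a1+1)*(a2+1)*...
    count = 1
    p = 2
    while p * p <= n:
        exp = 0
        while n % p == 0:
            n //= p
            exp += 1
        count *= exp + 1
        p += 1
    if n > 1:          # one prime factor larger than sqrt(n) may remain
        count *= 2
    return count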
As far as I can tell your code will work, but it will take a very long time to count the factors. For 150 factors it takes on the order of 20 seconds to run, and that time grows dramatically as you look for higher and higher factor counts.
One way to reduce the processing time is to reduce the number of calculations that you're performing. If you analyze your code, you're calculating n%1 every single time, which is an unnecessary calculation because you know every single integer will be divisible by itself and one. Are there any other ways you can reduce the number of calculations? Perhaps by remembering that if a number is divisible by 20, it is also divisible by 2, 4, 5, and 10?
I can be more specific, but you wanted a pointer in the right direction.
From the looks of it the code works fine, it's just not the best approach. A simple optimization is to only test divisors up to half the number, for example. Also, try thinking about how you could do this using prime factors; it might be another solution. Best of luck!
First, define a factor function:

from functools import reduce

def factors(n):
    step = 2 if n % 2 else 1   # an odd n has only odd divisors, so skip even candidates
    return set(reduce(list.__add__,
                      ([i, n//i] for i in range(1, int(pow(n, 0.5) + 1), step)
                       if n % i == 0)))

This creates a set holding all the factors of n.
Second, use a while loop until you reach a triangular number with more than 500 factors:

a = 1
x = 1
while len(factors(a)) < 501:
    x += 1
    a += x

The loop stops as soon as len(factors(a)) exceeds 500; a simple print(a) then gives your answer.

How to find complexity for the following program?

So I am not a CS major and have a hard time answering questions about a program's big-O complexity.
I wrote the following routine to output the pairs of numbers in an array which sum to 0:
asd = [-3, -2, -3, 2, 3, 2, 4, 5, 8, -8, 9, 10, -4]

def sum_zero(asd):
    for i in range(len(asd)):
        for j in range(i, len(asd)):
            if asd[i] + asd[j] == 0:
                print asd[i], asd[j]
Now if someone asks the complexity of this method, I know that since the first loop goes through all n items it will be at least O(n) (unless I am wrong), but can someone explain how to find the correct complexity?
And is there a better, more efficient way of solving this?
I won't give you a full solution, but will try to guide you.
You should get a pencil and a paper, and ask yourself:
How many times does the statement print asd[i], asd[j] execute? (In the worst case, meaning you shouldn't really care about the if-condition there.)
You'll find that it really depends on the loop above it, which gets executed len(asd) (denote it by n) times.
The only thing you need to work out is: how many times does the inner loop run, given that the outer loop's i goes from 0 up to n-1?
If you are still unsure about the result, just take a real example, say n=20, and count how many times the innermost statement executes; that will give you a very good indication of the answer.
def sum_zero(asd):
    for i in range(len(asd)):           # len called once = O(1), range called once = O(1)
        for j in range(i, len(asd)):    # len called once per i = O(n), range called once per i = O(n)
            if asd[i] + asd[j] == 0:    # indexing asd[i], asd[j] = O(2*n²)
                                        # the addition runs once per j = O(n²)
                                        # the comparison with 0 runs once per j = O(n²)
                print asd[i], asd[j]    # indexing asd[i], asd[j] again = O(2*n²)

sum_zero(asd)                           # called once = O(1)
Assuming the worst-case scenario (the if-condition always true), the totals are:

O(1) * 3
O(n) * 2
O(n²) * 6

which gives O(6n² + 2n + 3).
A simple program to demonstrate the complexity:

import matplotlib.pyplot as plt

target = []
quadratic = []
linear = []
for x in xrange(1, 100):
    linear.append(x)
    target.append(6*(x**2) + 2*x + 3)
    quadratic.append(x**2)

plt.plot(linear, label="Linear")
plt.plot(target, label="Target Function")
plt.plot(quadratic, label="Quadratic")
plt.ylabel('Complexity')
plt.xlabel('Time')
plt.legend(loc=2)
plt.show()
EDIT:
As pointed out by @Micah Smith, the above tallies the worst-case operation counts; the big-O is simply O(n²), since constants and lower-order terms are dropped.
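The question also asked whether there is a more efficient way. A common linear-time alternative trades memory for time with a set; a sketch (note its output differs slightly from the index-pair version when duplicates are present):

def sum_zero_fast(asd):
    seen = set()
    for x in asd:
        if -x in seen:        # O(1) expected lookup instead of an inner loop
            print -x, x
        seen.add(x)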

Unexpected performance curve from CPython merge sort

I have implemented a naive merge sorting algorithm in Python. The algorithm and test code are below:
import time
import random
import matplotlib.pyplot as plt
import math
from collections import deque

def sort(unsorted):
    if len(unsorted) <= 1:
        return unsorted
    to_merge = deque(deque([elem]) for elem in unsorted)
    while len(to_merge) > 1:
        left = to_merge.popleft()
        right = to_merge.popleft()
        to_merge.append(merge(left, right))
    return to_merge.pop()

def merge(left, right):
    result = deque()
    while left or right:
        if left and right:
            elem = left.popleft() if left[0] > right[0] else right.popleft()
        elif not left and right:
            elem = right.popleft()
        elif not right and left:
            elem = left.popleft()
        result.append(elem)
    return result

LOOP_COUNT = 100
START_N = 1
END_N = 1000

def test(fun, test_data):
    start = time.clock()
    for _ in xrange(LOOP_COUNT):
        fun(test_data)
    return time.clock() - start

def run_test():
    timings, elem_nums = [], []
    test_data = random.sample(xrange(100000), END_N)
    for i in xrange(START_N, END_N):
        loop_test_data = test_data[:i]
        elapsed = test(sort, loop_test_data)
        timings.append(elapsed)
        elem_nums.append(len(loop_test_data))
        print "%f s --- %d elems" % (elapsed, len(loop_test_data))
    plt.plot(elem_nums, timings)
    plt.show()

run_test()
As far as I can see everything is OK and I should get a nice N*log(N) curve as a result, but the picture differs a bit: the measured curve has irregular spikes in it.
Things I've tried to investigate the issue:
PyPy. The curve is ok.
Disabled the GC using the gc module. Wrong guess. Debug output showed that it doesn't even run until the end of the test.
Memory profiling using meliae - nothing special or suspicious.
I had another implementation (a recursive one using the same merge function), and it behaves the same way. The more full test cycles I run, the more "jumps" there are in the curve.
So how can this behaviour be explained and - hopefully - fixed?
UPD: changed lists to collections.deque
UPD2: added the full test code
UPD3: I use Python 2.7.1 on Ubuntu 11.04, on a quad-core 2 GHz notebook. I tried to turn off most other processes: the number of spikes went down, but at least one of them was still there.
You are simply picking up the impact of other processes on your machine.
You run your sort function 100 times for input size 1 and record the total time spent on this. Then you run it 100 times for input size 2, and record the total time spent. You continue doing so until you reach input size 1000.
Let's say once in a while your OS (or you yourself) start doing something CPU-intensive. Let's say this "spike" lasts as long as it takes you to run your sort function 5000 times. This means that the execution times would look slow for 5000 / 100 = 50 consecutive input sizes. A while later, another spike happens, and another range of input sizes look slow. This is precisely what you see in your chart.
I can think of one way to avoid this problem. Run your sort function just once for each input size: 1, 2, 3, ..., 1000. Repeat this process 100 times, using the same 1000 inputs (it's important, see explanation at the end). Now take the minimum time spent for each input size as your final data point for the chart.
That way, your spikes should affect each input size only a few times out of its 100 runs; and since you're taking the minimum, they will likely have no impact on the final chart at all.
If your spikes are really really long and frequent, you of course might want to increase the number of repetitions beyond the current 100 per input size.
Looking at your spikes, I notice the execution slows down exactly 3 times during a spike. I'm guessing the OS gives your python process one slot out of three during high load. Whether my guess is correct or not, the approach I recommend should resolve the issue.
EDIT:
I realized that I didn't clarify one point in my proposed solution to your problem.
Should you use the same input in each of your 100 runs for a given input size, or 100 different (random) inputs?
Since I recommended taking the minimum of the execution times, the inputs should be the same (otherwise you'll get incorrect output, as you'll be measuring the best-case algorithm complexity instead of the average complexity!).
But when you take the same inputs, you create some noise in your chart since some inputs are simply faster than others.
So a better solution is to resolve the system load problem, without creating the problem of only one input per input size (this is obviously pseudocode):
seed = 'choose whatever you like'
repeats = 4
inputs_per_size = 25
runtimes = defaultdict(lambda: float('inf'))

for r in range(repeats):
    random.seed(seed)
    for i in range(inputs_per_size):
        for n in range(1000):
            input = generate_random_input(size=n)
            execution_time = get_execution_time(input)
            if runtimes[(n, i)] > execution_time:
                runtimes[(n, i)] = execution_time

for n in range(1000):
    runtimes[n] = sum(runtimes[(n, i)] for i in range(inputs_per_size)) / inputs_per_size
Now you can use runtimes[n] to build your plot.
Of course, depending on how noisy your system is, you might change (repeats, inputs_per_size) from (4, 25) to, say, (10, 10), or even (25, 4).
I can reproduce the spikes using your code:
You should choose an appropriate timing function (time.time() vs. time.clock(); from timeit import default_timer picks a suitable one for your platform), the number of repetitions within a test (which sets how long each test takes), and the number of tests to take the minimal time from. That gives you better precision and less external influence on the results. Read this note from the timeit.Timer.repeat() docs:
It's tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python's speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics.
The timeit module can choose appropriate parameters for you:
$ python -mtimeit -s 'from m import testdata, sort; a = testdata[:500]' 'sort(a)'
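The equivalent measurement from Python code might look like this sketch (it assumes the same module m as the command above, exposing testdata and sort):

from timeit import repeat

# take the minimum over several repeats, per the docs quoted above
best = min(repeat('sort(a)',
                  setup='from m import testdata, sort; a = testdata[:500]',
                  repeat=5, number=100))
print(best)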
Here's timeit-based performance curve:
The figure shows that the sort() behavior is consistent with O(N*log(N)):

|------------------------------+-------------------|
| Fitting polynomial           | Function          |
|------------------------------+-------------------|
| 1.00*log2(N) + 1.25e-015     | N                 |
| 2.00*log2(N) + 5.31e-018     | N*N               |
| 1.19*log2(N) + 1.116         | N*log2(N)         |
| 1.37*log2(N) + 2.232         | N*log2(N)*log2(N) |
|------------------------------+-------------------|
To generate the figure I've used make-figures.py:
$ python make-figures.py --nsublists 1 --maxn=0x100000 -s vkazanov.msort -s vkazanov.msort_builtin
where:
# adapt sorting functions for make-figures.py
def msort(lists):
    assert len(lists) == 1
    return sort(lists[0])      # `sort()` from the question

def msort_builtin(lists):
    assert len(lists) == 1
    return sorted(lists[0])    # builtin
Input lists are described here (note: the input is sorted so builtin sorted() function shows expected O(N) performance).
