Time to execute a loop increases suddenly

I have a small piece of software that calculates the number of factors of each triangular number, to find the first one that has more than X factors (yes, it's Project Euler problem 12, although I haven't solved it yet). While trying random values of X to see what the code does and how long it takes, I noticed something strange (to me at least): up to X=47 the execution time increases in an obviously normal way, but at X=48 it increases far more than expected, and the number of function calls grows much faster too; it explodes, if I may say so. Why does it do that?
the code:
def fac(n):
    c = 0
    for i in range(1, n + 1):
        if n % i == 0:
            c = c + 1
    return c
n = 1
while True:
    summ = 0
    for i in range(1, n + 1):
        summ = summ + i
    if fac(summ) > X:
        break
    n = n + 1
print summ
and when profiling:
when X=45 : 314 function calls in 0.027 CPU seconds
when X=46 : 314 function calls in 0.026 CPU seconds
when X=47 : 314 function calls in 0.026 CPU seconds
when X=48 : 674 function calls in 0.233 CPU seconds
when X=49 : 674 function calls in 0.237 CPU seconds
I assume that if I continued I would find other points where the function calls and the time increase suddenly; presumably there were earlier points like that too, but the times were so small that it didn't matter much. Why do the function calls suddenly increase? Isn't it supposed to just call the function one more time for the new value?
P.S. I'm using cProfile as the profiler, and X in the code here is just for demonstration; I write the value directly in the code. Thank you in advance.

Have you looked at the actual values involved?
The first triangular number with more than 47 factors is T(104) = 5460, which has 48 factors.
But the first triangular number with more than 48 factors is T(224) = 25200, which has 90 factors. So no wonder it takes a lot more work.
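If you want to double-check those values, here is a throwaway sketch (any brute-force divisor counter will do at these sizes; fac and T here are mine, not the asker's exact code):

def fac(n):
    # brute-force divisor count, fine for numbers this small
    return sum(1 for i in range(1, n + 1) if n % i == 0)

def T(n):
    return n * (n + 1) // 2

print(fac(T(104)))  # T(104) = 5460 has 48 divisors
print(fac(T(224)))  # T(224) = 25200 has 90 divisors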
If your code runs up to T(n), then it calls range 2n times and fac n times, for a total of 3n function calls. Thus for T(104) it requires 312 function calls, and for T(224) it requires 672 function calls. Presumably there are 2 function calls of overhead somewhere that you're not showing us, which explains the profiling results you get.
Your current strategy is not going to get you to the answer for the Project Euler problem. But I can give some hints.
Do you have to start over again with summ=0 each time you compute a triangular number?
Do you have to loop over all the numbers up to n in order to work out how many divisors it has? Could there be a quicker way? (How many divisors does 216 = 65536 have? Do you have to loop over all the numbers from 1 to 65536?)
How many divisors do triangular numbers have? (Look at some small triangular numbers where you can compute the answer.) Can you see any patterns that would help you compute the answer for much bigger triangular numbers?
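To make the first two hints concrete, here is a minimal sketch (deliberately not the full solution): keep a running triangular number instead of re-summing, and count divisors in pairs so the loop only runs up to the square root.

def count_divisors(n):
    # divisors come in pairs (i, n // i), so we only need to test up to sqrt(n)
    c = 0
    i = 1
    while i * i <= n:
        if n % i == 0:
            c += 1 if i * i == n else 2  # a perfect square pairs with itself
        i += 1
    return c

X = 48  # threshold written directly into the code, as in the question
n, tri = 1, 1
while count_divisors(tri) <= X:
    n += 1
    tri += n  # next triangular number: no need to re-sum from scratch
print(tri)  # 25200 for X = 48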

If you check the output you'll see several spikes (sudden increases) in execution time.
The reason is that the number of loop iterations needed does not go up gradually but abruptly. Print out n after your while True loop and you'll see it.
Note: Project Euler is a math site; don't write brute-force algorithms ;)

Is this just very unlikely? Or is it impossible?

So, I'm a beginner in Python (coding in general, really), and I've tried to make this little program, which generates a random number of rods in 305 attempts:
import random

rods = 0

def blazerods():
    global rods
    seed = random.randint(0, 100000000000)
    random.seed(seed)
    i = 0
    rods = 0
    for i in range(0, 305):
        rnd = random.random()
        if rnd < 0.50:
            rods += 1
    print(rods)
    return rods

while 1 == 1:
    blazerods()
    if rods >= 211:
        break
The goal is to get 211 or more rods. However, I ran the program for 30 minutes without results.
My questions are: Is it even possible to get 211 or higher with just this code I included?
Can I make it more likely that rods can be more than 211 (still being a very unlikely result, ofc) without changing the chance(50%)?
Is random.seed(seed) even useful?
The probability distribution of rods is Binomial(305,0.5), that is the probability of getting exactly n rods is (305 choose n) * 0.5^305.
To get the probability of getting at least 211, you need to sum these terms from 211 to 305. Wolfram Alpha gives that as 8.8e-12.
So... it is really, really unlikely and you will have to wait a long time.
If your loop runs 1000 times a second, you will expect to have enough rods about once every 4 years.
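For reference, that tail sum is easy to check in pure Python (a quick sketch; math.comb needs Python 3.8+):

from math import comb

# P(rods >= 211) for Binomial(305, 0.5)
p = sum(comb(305, n) for n in range(211, 306)) / 2**305
print(p)  # ~8.8e-12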
If I remember correctly, Matt Parker from the YouTube channel Stand-up Maths has something to say about this particular case in his video "How lucky is too lucky".
As pointed out by Jens, this is easy to calculate via the Binomial distribution. The SciPy stats module allows you to calculate this by doing:
from scipy import stats
# i.e. 305 draws with equal probability
d = stats.binom(305, 0.5)
# the probability of seeing something greater than this value
p = d.sf(210)
which should give you the same value as Jens got: ~8.8e-12.
Next we can use the datetime module to convert this number into the expected time you have to wait:
from datetime import timedelta
time_per_try = timedelta(seconds=1/1000)
print(time_per_try / p)
which should give you ~1300 days, or about 3.6 years. Technically, this is the expected (mean) waiting time; there's roughly a 63% chance of seeing it within that span, and it could appear much sooner or later.
You can calculate reasonable values of when this would happen, using the negative binomial distribution. In Python, this looks like:
for q in stats.nbinom(1, p).ppf([0.025, 0.975]):
    print(time_per_try * q)
where the 0.025 and 0.975 values give you the 95% confidence interval you hear scientists talking about.
It tells you that if you had 20 computers running your algorithm in parallel, each doing 1000 tests per second, you could expect the first one to finish in around a month while the slowest one would likely be going on for more than 10 years.

Code finding the first triangular number with more than 500 divisors will not finish running

Okay, so I'm working on Euler Problem 12 (find the first triangular number with a number of factors over 500) and my code (in Python 3) is as follows:
factors = 0
y = 1

def factornum(n):
    x = 1
    f = []
    while x <= n:
        if n % x == 0:
            f.append(x)
        x += 1
    return len(f)

def triangle(n):
    t = sum(list(range(1, n)))
    return t

while factors <= 500:
    factors = factornum(triangle(y))
    y += 1
print(y - 1)
Basically, one function goes through all the numbers below the input number n, checks whether they divide n evenly, and if so adds them to a list, then returns the length of that list. Another generates a triangular number by summing all the numbers from 1 up to the input number and returning the sum. A while loop then keeps generating triangular numbers, using an incrementing variable y as the input to the triangle function, and runs the factornum function on each result, storing it in the factors variable. The loop continues, and y keeps incrementing, until the number of factors exceeds 500. The result is then printed.
However, when I run it, nothing happens - no errors, no output, it just keeps running and running. Now, I know my code isn't the most efficient, but I left it running for quite a bit and it still didn't produce a result, so it seems more likely to me that there's an error somewhere. I've been over it and over it and cannot seem to find an error.
I'd merely request that a full solution or a drastically improved one isn't given outright but pointers towards my error(s) or spots for improvement, as the reason I'm doing the Euler problems is to improve my coding. Thanks!
You have a very inefficient algorithm.
Since you asked for pointers rather than a full solution, the main ones are:
There is a more efficient way to calculate the next triangular number: there is an explicit formula, T(n) = n(n+1)/2 (see the wiki). Alternatively, if you generate the sequence of all triangular numbers anyway, it is more efficient to just add the next n to the previous number. (Side note: the list in sum(list(range(1,n))) makes no sense to me at all; sum(range(1,n)) will be more efficient, since it doesn't materialize the range into a list.)
There are much more efficient ways to factorize numbers
There is a more efficient way to calculate the number of factors: use the prime factorization. If n = p1^a1 * p2^a2 * ..., then the number of divisors is d(n) = (a1+1)*(a2+1)*... (this is the divisor function, not Euler's totient function, which counts coprime integers rather than divisors). A sketch follows below.
Generally Euler project problems (as in many other programming competitions) are not supposed to be solvable by sheer brute force. You should come up with some formula and/or more efficient algorithm first.
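As promised, a minimal sketch of the divisor-count idea (my own illustration, not the asker's code): factorize by trial division up to sqrt(n) and multiply the incremented exponents.

def num_divisors(n):
    # count divisors via the prime factorization:
    # n = p1**a1 * p2**a2 * ...  =>  d(n) = (a1+1) * (a2+1) * ...
    count = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            exp = 0
            while n % p == 0:
                n //= p
                exp += 1
            count *= exp + 1
        p += 1
    if n > 1:  # whatever remains is a single leftover prime factor
        count *= 2
    return count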
As far as I can tell your code will work, but it will take a very long time to calculate the number of factors. For 150 factors, it takes on the order of 20 seconds to run, and that time will grow dramatically as you look for higher and higher numbers of factors.
One way to reduce the processing time is to reduce the number of calculations that you're performing. If you analyze your code, you're calculating n%1 every single time, which is an unnecessary calculation because you know every single integer will be divisible by itself and one. Are there any other ways you can reduce the number of calculations? Perhaps by remembering that if a number is divisible by 20, it is also divisible by 2, 4, 5, and 10?
I can be more specific, but you wanted a pointer in the right direction.
From the looks of it the code works fine, it's just not the best approach. A simple optimization is to only test divisors up to half the number, for example. Also, try thinking about how you could do this using prime factors; it might lead to another solution. Best of luck!
First you have to define a factors function:
from functools import reduce

def factors(n):
    # odd numbers have no even divisors, so we can step by 2 when n is odd
    step = 2 if n % 2 else 1
    return set(reduce(list.__add__,
                      ([i, n // i] for i in range(1, int(pow(n, 0.5)) + 1, step)
                       if n % i == 0)))
This will create a set and put all the factors of the number n into it.
Second, use a while loop until you get more than 500 factors:
a = 1
x = 1
while len(factors(a)) < 501:
    x += 1
    a += x
This loop stops as soon as len(factors(a)) exceeds 500.
A simple print(a) and you will get your answer.

Complexity O Notation

I am a bit confused on this question and would appreciate some guidance on it:
An O(n^2) function takes approximately 1 second to run when N is 10000.
How long will it take when N is 30000?
I was thinking that it would either be 1 second as well or 3 seconds since it is three times the size, but I am not sure if my logic is correct.
Thank you.
From Wikipedia:
In computer science, the time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the length of the string representing the input.
This way, if the complexity is O(n^2) and the input is 3 times greater, then the running time is 3^2 = 9 times greater: 9 seconds.
There are many problems with the question.
First problem: time complexity does not, in general, measure time in seconds. For example, the time complexity of a sorting algorithm might refer to the number of comparisons (or swaps), and the time complexity of a hash table lookup might also refer to the number of comparisons performed. It's debatable whether the actual runtime is proportional to these measurements.
Second problem: the definition of big-O is this:
f(n) = O(g(n)) if there's N and k such that n > N implies f(n) < k*g(n).
That's a problem because, even if the runtime in this case is measured in seconds, applying the definition to O(n^2) says only that for large enough n the function is bounded above by some multiple of n^2.
So there's no guarantee that 10000 and 30000 are big enough to qualify for "big enough", and even if they were, you can't begin to estimate k from a single data point. And even with that estimate, you only get an upper bound.
What the question probably meant to ask was this:
Suppose that a function runs in time approximately proportional to n^2. It takes 1 second when n = 10000. Approximately how long does it take when n = 30000?
Then, one can solve the equations:
1 sec   ~= k * 10000^2
answer  ~= k * 30000^2
         = 3^2 * k * 10000^2
        ~= 3^2 * 1 sec
         = 9 sec
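You can see this scaling empirically with a deliberately quadratic function (a hypothetical example; the sizes here are smaller than the question's so it finishes quickly, but the 3x-input / 9x-time ratio is the same):

import time

def quadratic(n):
    # O(n^2): a double loop doing constant work per iteration
    total = 0
    for i in range(n):
        for j in range(n):
            total += 1
    return total

for n in (2000, 6000):
    start = time.perf_counter()
    quadratic(n)
    print(n, time.perf_counter() - start)
# the second measurement should be roughly 9x the first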

Asymptotic complexity python

I have a task to find the asymptotic complexity of this Python code.
for i in range(x):
    if i == 0:
        for j in range(x):
            for k in range(500):
                print("A ")
From what I know it should be 500*x, because the first loop's body effectively runs only once (when i == 0), the second loop runs x times and the third one 500 times, so it should be 500*x, shouldn't it? However, this isn't the right answer. Could you help me out?
Asymptotic complexity describes the growth of execution time as the variable factors grow arbitrarily large. In short, constants added or multiplied don't count at all, since they don't change with the variable.
Yes, there are 500*x lines printed. You also have x-1 non-functional loop iterations. Your total time would be computed as something like
(x-1) * [loop overhead] + x * [loop overhead] + 500*x * [loop overhead + print time]
However, the loop overhead, being a constant, is insignificant, and dropped out of the complexity expression. Likewise, the 500 is merely a scaling factor, and is also dropped from the expression.
The complexity is O(x).
It's 501*x, since you also have to evaluate the if i == 0 check x times.
As the other answer says, we usually don't include the constant factor, but sometimes we do.
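You can convince yourself of the 501*x count by replacing the print with counters (a quick sketch):

x = 1000
checks = prints = 0
for i in range(x):
    checks += 1  # the `if i == 0` test runs on every iteration
    if i == 0:
        for j in range(x):
            for k in range(500):
                prints += 1
print(checks + prints == 501 * x)  # True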

How to calculate the growth rate, and time amount of an algorithm?

I am currently enjoying Think Complexity by Allen Downey, and a few hours ago I finished the section on growth rate. I paused reading and googled growth rates, expanding on the info the book gave me. I also found out that you can calculate the amount of time it takes for an algorithm to run on the raw data. I have a lot of questions that can't be answered by Google, or maybe I just need a personal touch in the answers, since that really helps me understand. My questions are:
1- How is it possible to calculate the growth rate of a simple algorithm? For example, I just wrote this loop to calculate the sine of a given angle in radians using the Taylor series:
def sine(d):
    # d is the angle in radians; sum the first 360 terms of the series
    return sum(((-1) ** i / factorial(2 * i + 1)) * d ** (2 * i + 1)
               for i in range(0, 360))
And factorial:
def factorial(n):
    factorial = 1
    for i in range(1, n + 1):
        factorial *= i
    return factorial
How do I calculate its growth rate?
2- I became familiar with some very bad algorithms like Bogosort. It takes a large amount of time to sort an array using bogosort. But how do you calculate the time? It differs from computer to computer.
3- What is the Big-O notation and how is it related to the growth rate?
Thanks for your answers.
Generally it is not a good approach to measure the time a function takes, since that depends on many factors. For this reason, we often express complexity in terms of computational steps.
Multiple notations exist (big-Oh, big-Omega, big-Theta).
Big-Oh represents an upper bound: O(n) indicates that it performs at most on the order of n steps.
Big-Omega (Ω) is a lower bound: Ω(n) indicates at least on the order of n steps.
A combination of both is big-Theta (Ө): Ө(n) indicates that it takes on the order of exactly n steps.
In your case, factorial is defined as
def factorial(n):
    factorial = 1
    for i in range(1, n + 1):
        factorial *= i
    return factorial
This function loops from 1 to n, hence it depends on its argument n. We can say that it performs exactly n steps, so factorial is in Ө(n). Note that this is clearly linear.
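One way to see this empirically is to count the loop iterations instead of timing (a quick sketch; counting steps avoids the machine-to-machine variation mentioned above):

def factorial_steps(n):
    # same factorial, but also returns the number of loop iterations
    result, steps = 1, 0
    for i in range(1, n + 1):
        result *= i
        steps += 1
    return result, steps

for n in (10, 100, 1000):
    _, steps = factorial_steps(n)
    print(n, steps)  # steps == n: the growth is linear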
