Unexpected timing result when optimizing Project Euler Problem 12 - Python

I have solved Project Euler problem 12 and tried to optimize my solution. The part I am focusing on is finding the number of divisors.
I expected the first algorithm to be slower than the second, but it wasn't, and I don't understand why.
First (regular counting, going up to n**0.5):
from math import sqrt

def get(n):
    count = 0
    limit = sqrt(n)
    for i in range(1, int(limit) + 1):
        if n % i == 0:
            count += 2
    if limit.is_integer():
        return count - 1
    return count
Second (prime factorization: find the exponent of each prime factor and apply the divisor-count formula; I iterate over candidates of the form 6n±1 to go faster, but it is still slower):
def Get_Devisors_Amount(n):  # prime factorization
    if n <= 1:
        return 1
    dcount = 1
    count = 0
    while n % 2 == 0:
        count += 1
        n //= 2
    dcount *= (count + 1)
    count = 0
    while n % 3 == 0:
        count += 1
        n //= 3
    dcount *= (count + 1)
    i = 1  # counter for candidates of the form 6i +/- 1
    while n != 1:
        t = 6*i + 1
        count = 0
        while n % t == 0:
            count += 1
            n //= t
        dcount *= (count + 1)
        t = 6*i - 1
        count = 0
        while n % t == 0:
            count += 1
            n //= t
        if count != 0:
            dcount *= (count + 1)
        i += 1
    if dcount == 1:
        return 2  # n is a prime
    return dcount
How I timed them:
import time

start = time.time()
for i in range(1, 1000):
    get(i)
print(time.time() - start)

start = time.time()
for i in range(1, 1000):
    Get_Devisors_Amount(i)
print(time.time() - start)
Output:
get: 0.00299835205078125
Get_Devisors_Amount: 0.009994029998779297
Although I am using a property of primes and a formula that I thought should reduce the search time, the first method is still faster. Could you explain why?

In the first approach, you test divisibility by each number from 1 to sqrt(x), so the cost of testing a single number x is sqrt(x). The sum of the first N square roots can be approximated as N*sqrt(N).
Time complexity of method 1: O(N*sqrt(N)) (N is the total count of numbers being tested).
In the second approach, there are 2 cases:
If a number is prime, every candidate up to n is tested. Complexity: O(n/6) = O(n).
If a number isn't prime, we can approximate the complexity as O(log(n)) (there might be a more accurate bound for this case; I'm approximating since it doesn't affect the proof).
For the prime numbers, since testing a prime p costs about p/6 candidates, the total cost is 5/6 + 7/6 + 11/6 + 13/6 + ... + (last prime before N)/6, which is (sum of all primes up to N)/6. The sum of all primes up to N can be approximated as N^2/(2*log N), so this step costs about N^2/(6*2*log N) = N^2/(12*log N).
Time complexity of method 2: O(N^2/(12*log N)) (N is the total count of numbers being tested).
(If you want, you can derive tighter bounds for each step; I made a few approximations since they suffice to prove the point without any overoptimistic assumptions.)

Your first algorithm wisely only considers divisors up to sqrt(n).
But your second algorithm considers divisors all the way up to n, although admittedly if n has many factors, n will be reduced along the way.
If you fix this in your algorithm, by changing this:
t = 6*i-1
to this:
t = 6*i-1
if t*t > n:
    return dcount * 2
Then your second algorithm will be faster.
(The * 2 is because the algorithm would eventually find the remaining prime factor (n itself) and then dcount *= (count + 1) would double dcount before returning it.)
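Putting the pieces together — candidates of the form 6i ± 1 plus the square-root cutoff — gives something like the following sketch. The function name and loop structure are mine, not the original poster's:

```python
def get_divisors_amount(n):
    # count divisors via prime factorization with a sqrt cutoff
    if n <= 1:
        return 1
    dcount = 1
    for p in (2, 3):                      # strip factors of 2 and 3 first
        count = 0
        while n % p == 0:
            count += 1
            n //= p
        dcount *= count + 1
    i = 1
    while n != 1:
        for t in (6 * i - 1, 6 * i + 1):  # candidates of the form 6i +/- 1
            if n == 1:
                break
            if t * t > n:
                return dcount * 2         # whatever remains is a single prime
            count = 0
            while n % t == 0:
                count += 1
                n //= t
            if count:
                dcount *= count + 1
        i += 1
    return dcount

print(get_divisors_amount(12))  # 6 (divisors 1, 2, 3, 4, 6, 12)
```

With the cutoff, a prime input now costs O(sqrt(n)/6) candidate tests instead of O(n/6), which removes the bottleneck identified in the first answer.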

Related

Using dictionaries to improve algorithm efficiency

The nth triangle number is defined as the sum 1+2+...+n. I'm working on Project Euler problem 12 which asks to find the smallest triangle number that has over 500 divisors, so (in Python) I wrote two functions, mytri(n) and mydiv(n), to compute the nth triangle number, and the number of divisors of n, respectively. Then, I used a while loop that iterates until mydiv(mytri(n)) is greater than or equal to 500:
import math

def mytri(n):
    return n*(n+1)/2

def mydivs(n):
    num = 0
    max = math.floor(n/2)
    for k in range(1, max+1):
        if n % k == 0:
            num += 1
    return num+1

n = 1
while (mydivs(mytri(n)) <= 500): n += 1
print(mytri(n))
I thought I wrote mytri() and mydiv() pretty efficiently, but based on some tests, it seems like this program gets unwieldy very quickly. To compute the first number with over 100 divisors takes less than a second, but to compute the first number with over 150 divisors takes about 8-9 seconds, indicating that it's probably exponential in time? I don't have much experience with computational complexity or writing efficient algorithms but I once saw an example of using dictionaries (memoization I think?) to greatly improve a recursive algorithm to compute the Fibonacci numbers, and I was wondering if a similar idea could be used here.
For example, the nth triangle number can be expressed as n(n+1)/2, so without loss of generality it's the product of an odd and even number, say n and (n+1)/2 respectively. If you could store the divisors for each number up to n in a dictionary, then you wouldn't have to redo the computations in mydiv(), and instead you could just reference the dictionary. The only issue is finding out which divisors between n and (n+1)/2 overlap to get the right number of them. Is this a reasonable line of attack? Or am I missing something here?
Additionally, what is the time complexity of my algorithm and how would I calculate it?
mytri(n)'s time complexity is O(1). mydivs(n)'s time complexity is O(n/2), which is O(n). The while (mydivs(mytri(n)) <= 500) loop's time complexity is O(N^3), since it is a loop inside a loop: one loop runs N times and the other runs N^2 times (the Nth triangle number is about N^2/2). You can reduce mydivs(n)'s time complexity to O(sqrt(n)).
def new_mydivs(n):
    res = set()
    for i in range(1, int(n**0.5)+1):
        if n % i == 0:
            res.update([i, n//i])
    return len(res)  # returns the number of divisors
The time complexity of new_mydivs(n) is O(sqrt(n)).
Your code's running time for finding a number with over 250 divisors:
import time
import math

def mytri(n):
    return n*(n+1)/2

def mydivs(n):
    num = 0
    max = math.floor(n/2)
    for k in range(1, max+1):
        if n % k == 0:
            num += 1
    return num+1

def main():
    n = 1
    while (mydivs(mytri(n)) <= 250): n += 1
    print(mytri(n))

startTime = time.time()
main()
print(time.time() - startTime)
output:
2162160.0
100.24735450744629
My code's running time for 250 divisors:
import time
import math

def mytri(n):
    return n*(n+1)/2

def mydivs(n):
    res = set()
    for i in range(1, int(n**0.5)+1):
        if n % i == 0:
            res.update([i, n//i])
    return len(res)  # returns the number of divisors

def main():
    n = 1
    while (mydivs(mytri(n)) <= 250): n += 1
    print(mytri(n))

startTime = time.time()
main()
print(time.time() - startTime)
output:
2162160.0
0.22459840774536133
for 500 divisors:
76576500.0
5.7917985916137695
for 750 divisors:
236215980.0
17.126375198364258
See how drastically the performance increased.
RE: time complexity. You have two loops, one inside another: one runs up to N and the other up to N^2 (the Nth triangle number). This gives the O(N^3) time complexity.
You may use dictionaries to save partial results, but the overall complexity will still be O(N^3), only with a smaller constant factor, because you still have to loop over the remaining values.
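There is also a structural shortcut the question itself hints at with the n and (n+1)/2 observation: the two halves are coprime, and the divisor-count function is multiplicative — d(a*b) = d(a)*d(b) when gcd(a, b) = 1 — so you never need to factor the full triangle number. A sketch of that idea (helper names are mine):

```python
def count_divisors(n):
    # divisor count by trial division up to sqrt(n)
    count, i = 0, 1
    while i * i <= n:
        if n % i == 0:
            count += 1 if i * i == n else 2
        i += 1
    return count

def triangle_divisors(n):
    # T_n = n*(n+1)//2; exactly one of n, n+1 is even, so split T_n into
    # two coprime halves and multiply their divisor counts
    if n % 2 == 0:
        a, b = n // 2, n + 1
    else:
        a, b = n, (n + 1) // 2
    return count_divisors(a) * count_divisors(b)

n = 1
while triangle_divisors(n) <= 500:
    n += 1
print(n * (n + 1) // 2)  # 76576500, matching the output above
```

This keeps every trial division bounded by sqrt(n) rather than sqrt(T_n), roughly the square root of the work per step.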

Optimising Factoring Program

This is my factorising code, which finds all the factors of a number, but beyond roughly 7 digits the program begins to slow down.
So I was wondering if there is any way of optimising this program so it can factorise numbers faster.
number = int(input("Input the whole number here?\n"))
factors = [1]

def factorization():
    global factors
    for i in range(1, number):
        factor = (number/i)
        try:
            factorInt = int(number/i)
            if factorInt == factor:
                factors.append(factorInt)
        except ValueError:
            pass

factorization()
print(factors)
The most effective optimization comes from noting that when the number has non-trivial factors, the smallest of them is at most the square root of the number, so there is no need to continue looping past this square root.
Indeed, let this smallest factor be m. We have n = m*p, and the other factor is such that p >= m. But if m > √n, then m*p > n, a contradiction.
Note that this optimization only speeds up the processing of prime numbers (for composite ones, the search stops before √n anyway). But the density of primes and the fact that n is much larger than √n make it absolutely worth it.
Another optimization comes from noting that the smallest divisor must be prime, so you can use a stored table of primes. (There are fewer than 51 million primes below one billion.) The speedup will be less noticeable.
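A minimal pure-Python sketch of the square-root cutoff described above — each divisor found below √n is paired with its cofactor, so nothing past √n ever needs to be tested (names are illustrative):

```python
def factors(number):
    # each divisor i <= sqrt(number) pairs with the cofactor number // i
    small, large = [], []
    i = 1
    while i * i <= number:
        if number % i == 0:
            small.append(i)
            if i * i != number:        # avoid listing a square root twice
                large.append(number // i)
        i += 1
    return small + large[::-1]         # all factors, in ascending order

print(factors(28))  # [1, 2, 4, 7, 14, 28]
```

For a 12-digit number this loops about a million times instead of a trillion, which is the whole speedup.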
Let me offer a NumPy-based solution. It seems quite efficient:
import numpy as np

def factorize(number):
    n = np.arange(2, np.sqrt(number), dtype=int)
    n2 = number / n
    low = n[n2.astype(int) == n2]
    return np.concatenate((low, number // low))

factorize(34976237696437)
# array([71, 155399, 3170053, 492623066147, 225073763, 11033329])
# 176 msec

Trial Division faster than Sieve for Primality Test?

I wrote two primality tests in Python. The first is based on trial division; the second applies the sieve of Eratosthenes. My understanding is that the sieve has a smaller time complexity than trial division, so the sieve should be asymptotically faster.
However, when I run them, trial division is much faster. For example, when n = 6*(10**11), is_prime(n) takes less than a second, but is_prime_sieve(n) practically never ends! Did I write the sieve wrong?
My code is:
import math

# determines if prime using trial division
def is_prime(n):
    u = math.floor(math.sqrt(n))
    i = 2
    # trial division: works pretty well for determining 600 billion
    while i <= u:
        if n % i == 0:
            return False
        i += 1
    return True

# primality test with sieve
def is_prime_sieve(n):
    # first find all prime numbers from 2 to u
    # then test them
    u = math.floor(math.sqrt(n))
    prime = {}
    for i in range(2, int(u)+1):
        j = 2
        prime[i] = True
        while i*j <= u:
            prime[i*j] = False
            j += 1
    while u >= 2:
        if (u not in prime) or prime[u]:
            if n % u == 0:
                return False
        u -= 1
    return True
For the sieve of Eratosthenes, you are recomputing the sieve every time. The sieve should be cached so that you only generate it once: it works well when you build it once and then perform many primality checks, and it is very inefficient if you only check a single number.
This means, by the way, that you need to anticipate the highest number you will test and generate the sieve table up to that number.
When done right, is_prime_sieve becomes simply:
def is_prime_sieve(n):
    return prime[n]
You would not need the while loop.
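A sketch of that build-once pattern; the bound LIMIT is an assumption and must be at least as large as any number you later test:

```python
LIMIT = 1_000_000                       # must cover every number tested below
composite = [False] * (LIMIT + 1)
for p in range(2, int(LIMIT ** 0.5) + 1):
    if not composite[p]:
        # mark every multiple of p starting at p*p as composite
        for multiple in range(p * p, LIMIT + 1, p):
            composite[multiple] = True

def is_prime_cached(n):
    # O(1) lookup once the sieve has been paid for
    return n >= 2 and not composite[n]

print(is_prime_cached(999983))  # True: the largest prime below one million
```

The sieve construction is paid once; after that each query is a single list lookup, which is where the sieve's advantage over trial division actually shows up.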
The sieve finds all primes from 1 to n. Calculating one sieve is an awful lot faster than doing trial division for each of those numbers. Obviously, if you determine all primes from 1 to n and then throw away the information for the first n-1 numbers, that's very inefficient.
It's like comparing the speed of a bus and a two-seater sports car. The bus is much, much faster if you need to take fifty people from A to B. If you take a single passenger, guess what, the sports car is faster.
But even with the traditional method of building the sieve, there are still far too many operations occurring.
I have developed a way of extracting prime numbers without division (except for data-management purposes) by adapting the basic sieve of Eratosthenes. I do not have to set any upper or lower bounds; the algorithm is completely open-ended. I build a data string from which I can go anywhere in the calculated range and pull up all the prime numbers in a subset range. I waste no calculations on division.

Trying to understand a solution to project Euler # 3

The prime factors of 13195 are 5, 7, 13 and 29.
What is the largest prime factor of the number 600851475143 ? # http://projecteuler.net/problem=3
I have a deal going with myself that if I can't solve a Project Euler problem, I will study the best solution I can find. I did write an algorithm which worked for smaller numbers, but it was too inefficient for bigger ones. So I googled Zach Denton's answer and started studying it.
Here is his code:
#!/usr/bin/env python
import math

def factorize(n):
    res = []
    # iterate over all even numbers first.
    while n % 2 == 0:
        res.append(2)
        n //= 2
    # try odd numbers up to sqrt(n)
    limit = math.sqrt(n+1)
    i = 3
    while i <= limit:
        if n % i == 0:
            res.append(i)
            n //= i
            limit = math.sqrt(n+1)
        else:
            i += 2
    if n != 1:
        res.append(n)
    return res

print max(factorize(600851475143))
Here are the bits I can't figure out for myself:
In the second while loop, why does he use a sqrt(n + 1) instead of just sqrt(n)?
Why wouldn't you also use sqrt(n + 1) when iterating over the even numbers in the first while loop?
How does the algorithm manage to find only prime factors? In the algorithm I first wrote I had a separate test for checking whether a factor was prime, but he doesn't bother.
I suspect the +1 has to do with the imprecision of floating point (I am not sure whether it's actually required, or is simply a defensive move on the author's part).
The first while loop factors all twos out of n; I don't see how sqrt(n + 1) would fit in there.
If you work from small factors to large ones, you automatically eliminate all composite candidates. Think about it: once you've factored out 5, you've automatically factored out 10, 15, 20, etc. No need to check whether they're prime: by that point n will no longer be divisible by them.
I suspect that checking for primality is what's killing your original algorithm's performance.
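For reference, a Python 3 rendering of the same algorithm, with the float limit = sqrt(n + 1) replaced by the exact integer test i * i <= n, which sidesteps the floating-point question entirely:

```python
def factorize(n):
    res = []
    while n % 2 == 0:          # divide out all the twos first
        res.append(2)
        n //= 2
    i = 3
    while i * i <= n:          # same as i <= sqrt(n), but with no floats
        if n % i == 0:
            res.append(i)
            n //= i
        else:
            i += 2
    if n != 1:
        res.append(n)          # whatever survives the loop is prime
    return res

print(factorize(13195))              # [5, 7, 13, 29]
print(max(factorize(600851475143)))  # 6857
```

Because i*i <= n is exact integer arithmetic, no defensive +1 is needed at all.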

Project Euler #25: keep getting OverflowError (result too large) - is it to do with calculating the Fibonacci number?

I'm working on solving the Project Euler problem 25:
What is the first term in the Fibonacci sequence to contain 1000
digits?
My piece of code works for smaller digit counts, but when I try 1000 digits I get the error:
OverflowError: (34, 'Result too large')
I'm thinking it may be in how I compute the Fibonacci numbers, but I've tried several different methods and get the same error.
Here's my code:
'''
What is the first term in the Fibonacci sequence to contain 1000 digits?
'''
def fibonacci(n):
    phi = (1 + pow(5, 0.5))/2  # golden ratio
    return int((pow(phi, n) - pow(-phi, -n))/pow(5, 0.5))  # formula: http://bit.ly/qDumIg

n = 0
while len(str(fibonacci(n))) < 1000:
    n += 1
print n
Do you know what may be the cause of this problem, and how I could alter my code to avoid it?
Thanks in advance.
The problem here is that only integers in Python have unlimited size; floating-point values are still calculated using normal IEEE types, which have a maximum precision and range.
As such, since you're using a floating-point approximation, you will run into that problem eventually.
Instead, try calculating the Fibonacci sequence the normal way, one number (of the sequence) at a time, until you get to 1000 digits.
ie. calculate 1, 1, 2, 3, 5, 8, 13, 21, 34, etc.
By "normal way" I mean this:
         / 1                     , n < 3
Fib(n) = |
         \ Fib(n-2) + Fib(n-1)   , n >= 3
Note that the "obvious" recursive implementation of the above formula is far too slow for this particular problem, so I'll post the code for that wrong approach just to make sure you don't waste time on it:
def fib(n):
    if n < 3:
        return 1
    else:
        return fib(n-2) + fib(n-1)

n = 1
while True:
    f = fib(n)
    if len(str(f)) >= 1000:
        print("#%d: %d" % (n, f))
        exit()
    n += 1
On my machine, the above code starts slowing down badly at around the 30th Fibonacci number, which is still only 6 digits long.
I modified the recursive approach to count the number of calls to the fib function for each number; here are some values:
#1: 1
#10: 67
#20: 8361
#30: 1028457
#40: 126491971
I can reveal that the first Fibonacci number with 1000 digits or more is the 4782nd number in the sequence (unless I miscalculated), so the number of calls to the fib function just for that one number would be:
1322674645678488041058897524122997677251644370815418243017081997189365809170617080397240798694660940801306561333081985620826547131665853835988797427277436460008943552826302292637818371178869541946923675172160637882073812751617637975578859252434733232523159781720738111111789465039097802080315208597093485915332193691618926042255999185137115272769380924184682248184802491822233335279409301171526953109189313629293841597087510083986945111011402314286581478579689377521790151499066261906574161869200410684653808796432685809284286820053164879192557959922333112075826828349513158137604336674826721837135875890203904247933489561158950800113876836884059588285713810502973052057892127879455668391150708346800909439629659013173202984026200937561704281672042219641720514989818775239313026728787980474579564685426847905299010548673623281580547481750413205269166454195584292461766536845931986460985315260676689935535552432994592033224633385680958613360375475217820675316245314150525244440638913595353267694721961
And that is just for the 4782nd number. The actual total is the sum of those counts for all the Fibonacci numbers from 1 up to 4782. There is no way this will ever complete.
In fact, if we gave the code a year of running time (simplified as 365 days), and assumed the machine could make 10,000,000,000 calls every second, the algorithm would only get as far as the 83rd number, which is still only 18 digits long.
Actually, although the advice given above to avoid floating-point numbers is generally good for Project Euler problems, in this case it is incorrect. Fibonacci numbers can be approximated by the formula F_n = phi^n / sqrt(5), so the first Fibonacci number with at least a thousand digits satisfies 10^999 < phi^n / sqrt(5). Taking the base-10 logarithm of both sides -- recall that sqrt(5) is the same as 5^(1/2) -- gives 999 < n*log10(phi) - (1/2)*log10(5), and solving for n gives (999 + (1/2)*log10(5)) / log10(phi) < n. The left-hand side evaluates to 4781.85927, so the smallest n that gives a thousand digits is 4782.
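That arithmetic is easy to check in code — the logarithms stay tiny, so no overflow is possible:

```python
import math

phi = (1 + math.sqrt(5)) / 2  # the golden ratio
# smallest integer n with 999 < n*log10(phi) - 0.5*log10(5)
n = math.ceil((999 + 0.5 * math.log10(5)) / math.log10(phi))
print(n)  # 4782
```

This only locates the index; to print the actual 1000-digit value you still need integer arithmetic, as in the iterative solutions below.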
You can use the sliding-window trick to compute the terms of the Fibonacci sequence iteratively, rather than using the closed form (or recursing as the sequence is normally defined).
The Python version for finding fib(n) is as follows:
def fib(n):
    a = 1
    b = 1
    for i in range(2, n):
        b = a + b
        a = b - a
    return b
This works when F(1) is defined as 1, as it is in Project Euler 25.
I won't give the exact solution to the problem here, but the code above can be reworked so it keeps track of n until a sentry value (10**999) is reached.
An iterative solution such as this one has no trouble executing. I get the answer in less than a second.
def fibonacci():
    current = 0
    previous = 1
    while True:
        temp = current
        current = current + previous
        previous = temp
        yield current

def main():
    for index, element in enumerate(fibonacci()):
        if len(str(element)) >= 1000:
            answer = index + 1  # enumerate starts from 0
            break
    print(answer)

main()
import math as m
import time

start = time.time()
fib0 = 0
fib1 = 1
n = 0
k = 0
count = 1
while k < 1000:
    n = fib0 + fib1
    k = int(m.log10(n)) + 1  # number of digits in n
    fib0 = fib1
    fib1 = n
    count += 1
print n
print count
print time.time() - start
This takes 0.005388 s on my PC; I did nothing fancy, just followed simple code.
Iteration will always be better here. Recursion was taking too long for me as well.
I also used a math function to calculate the number of digits in a number instead of putting the number in a list and iterating through it, which saves a lot of time.
Here is my very simple solution:
list = [1, 1, 2]
for i in range(2, 5000):
    if len(str(list[i] + list[i-1])) == 1000:
        print(i + 2)
        break
    else:
        list.append(list[i] + list[i-1])
This is sort of a "rogue" way of doing it, but if you change the 1000 to any digit count except one, it gets the answer right.
You can use the Decimal datatype. It is a little slower, but you will be able to set whatever precision you need.
So your code becomes:
'''
What is the first term in the Fibonacci sequence to contain 1000 digits?
'''
from decimal import *

def fibonacci(n):
    phi = (Decimal(1) + pow(Decimal(5), Decimal(0.5))) / 2  # golden ratio
    return int((pow(phi, Decimal(n)) - pow(-phi, Decimal(-n))) / pow(Decimal(5), Decimal(0.5)))

n = 0
while len(str(fibonacci(n))) < 1000:
    n += 1
print n
