Using dictionaries to improve algorithm efficiency - python

The nth triangle number is defined as the sum 1+2+...+n. I'm working on Project Euler problem 12, which asks for the smallest triangle number with over 500 divisors, so (in Python) I wrote two functions, mytri(n) and mydivs(n), to compute the nth triangle number and the number of divisors of n, respectively. Then I used a while loop that iterates until mydivs(mytri(n)) exceeds 500:
import math

def mytri(n):
    return n*(n+1)/2

def mydivs(n):
    num = 0
    max = math.floor(n/2)
    for k in range(1, max+1):
        if n%k == 0:
            num += 1
    return num+1

n = 1
while (mydivs(mytri(n)) <= 500): n += 1
print(mytri(n))
I thought I wrote mytri() and mydivs() pretty efficiently, but based on some tests it seems this program gets unwieldy very quickly. Computing the first number with over 100 divisors takes less than a second, but the first number with over 150 divisors takes about 8-9 seconds, which suggests the running time is growing very fast (exponentially?). I don't have much experience with computational complexity or writing efficient algorithms, but I once saw an example of using dictionaries (memoization, I think?) to greatly improve a recursive algorithm for computing the Fibonacci numbers, and I was wondering if a similar idea could be used here.
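(For reference, the memoized Fibonacci example I have in mind looked roughly like this -- a standard illustration, not my own code:)

fib_cache = {1: 1, 2: 1}

def fib(n):
    # look the value up in the dictionary; compute and store it only once
    if n not in fib_cache:
        fib_cache[n] = fib(n - 1) + fib(n - 2)
    return fib_cache[n]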
For example, the nth triangle number can be expressed as n(n+1)/2, so it is always the product of an odd and an even number, say n and (n+1)/2 (or n/2 and n+1, depending on the parity of n). If you could store the divisors of each number up to n in a dictionary, then you wouldn't have to redo the computations in mydivs(); you could just look them up. The only issue is working out which divisors of the two factors overlap so the count comes out right. Is this a reasonable line of attack? Or am I missing something here?
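Concretely, the sort of thing I'm imagining is below (just an untested sketch; it assumes the two factors never share a divisor other than 1, which is exactly the "overlap" question above):

div_cache = {}  # dictionary: number -> divisor count

def cached_divs(k):
    if k not in div_cache:
        div_cache[k] = mydivs(k)  # my slow divisor counter from above
    return div_cache[k]

def tri_divs(n):
    # split T(n) = n*(n+1)/2 into an odd factor and a halved even factor
    if n % 2 == 0:
        a, b = n // 2, n + 1
    else:
        a, b = n, (n + 1) // 2
    # if a and b share no divisors, the counts should simply multiply, and
    # one of the two factors carries over from n to n+1, so the dictionary
    # lookup should save roughly half the work
    return cached_divs(a) * cached_divs(b)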
Additionally, what is the time complexity of my algorithm and how would I calculate it?

mytri(n)'s time complexity is O(1). mydivs(n)'s time complexity is O(n/2), which is O(n). The loop while (mydivs(mytri(n)) <= 500) has time complexity O(N^3), since it is a loop inside a loop: the outer loop runs N times, and the inner loop runs on the order of N^2 times (because mydivs is called on mytri(n), which is about n^2/2). You can reduce mydivs(n)'s time complexity to O(sqrt(n)):
def new_mydivs(n):
    res = set()
    for i in range(1, int(n**0.5)+1):
        #print(i)
        if n%i == 0:
            res.update([i, n//i])
            #print(res)
    return len(res)  # returns the number of divisors
The time complexity of new_mydivs(n) is O(sqrt(n)).
Your code's running time for finding a number with over 250 divisors:
import time
import timeit
import math

def mytri(n):
    return n*(n+1)/2

def mydivs(n):
    num = 0
    max = math.floor(n/2)
    for k in range(1, max+1):
        if n%k == 0:
            num += 1
    return num+1

def main():
    n = 1
    while (mydivs(mytri(n)) <= 250): n += 1
    print(mytri(n))

startTime = time.time()
main()
print(time.time()-startTime)
output:
2162160.0
100.24735450744629
My code's running time for over 250 divisors:
import time
import timeit
import math

def mytri(n):
    return n*(n+1)/2

def mydivs(n):
    res = set()
    for i in range(1, int(n**0.5)+1):
        #print(i)
        if n%i == 0:
            res.update([i, n//i])
            #print(res)
    return len(res)  # returns the number of divisors

def main():
    n = 1
    while (mydivs(mytri(n)) <= 250): n += 1
    print(mytri(n))

startTime = time.time()
main()
print(time.time()-startTime)
output:
2162160.0
0.22459840774536133
for 500 divisors:
76576500.0
5.7917985916137695
for 750 divisors:
236215980.0
17.126375198364258
As you can see, the performance improves drastically.

RE: time complexity. You have two loops, one inside the other: the outer one runs up to N times, and the inner one runs up to N^2 times (it iterates over the divisor candidates of the nth triangle number, which is about n^2/2). This gives the O(N^3) time complexity.
You may use dictionaries to save partial results, but the overall complexity will still be O(N^3), just with a smaller constant factor, because you still have to loop over the remaining values.
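For illustration, the kind of dictionary caching meant here is just a wrapper like the following (a sketch using the definitions above); since every triangle number in the sequence is new, the cache rarely pays off on its own:

divisor_cache = {}

def mydivs_cached(n):
    # store each computed divisor count so repeated arguments are free
    if n not in divisor_cache:
        divisor_cache[n] = mydivs(n)
    return divisor_cache[n]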

Related

Unexpected time result for optimization of Project Euler Problem 12

I have solved Project Euler problem 12 and I tried to optimize my solution.
The part I am focusing on is finding the number of divisors.
I thought the first algorithm I created was going to be slower than the second, but it wasn't, and I don't understand why.
First (regular counting, checking divisors up to n**0.5):
from math import sqrt

def get(n):
    count = 0
    limit = sqrt(n)
    for i in range(1, int(limit)+1):
        if n%i == 0:
            count += 2
    if limit.is_integer():
        return count - 1
    return count
Second (prime factorization to get the exponent of each prime so I can use the divisor-count formula; I only test candidates of the form 6n±1 to make it faster, but it is still slower):
def Get_Devisors_Amount(n):  # Prime factorization
    if n <= 1: return 1
    dcount = 1
    count = 0
    while n%2 == 0:
        count += 1
        n //= 2
    dcount *= (count+1)
    count = 0
    while n%3 == 0:
        count += 1
        n //= 3
    dcount *= (count+1)
    i = 1  # count for the form of primes 6n+-1
    while n != 1:
        t = 6*i + 1
        count = 0
        while n%t == 0:
            count += 1
            n //= t
        dcount *= (count+1)
        t = 6*i - 1
        count = 0
        while n%t == 0:
            count += 1
            n //= t
        if count != 0:
            dcount *= (count+1)
        i += 1
    if dcount == 1: return 2  # n is a prime
    return dcount
How I tested the time
import time

start = time.time()
for i in range(1, 1000):
    get(i)
print(time.time()-start)

start = time.time()
for i in range(1, 1000):
    Get_Devisors_Amount(i)
print(time.time()-start)
Output:
get: 0.00299835205078125
Get_Devisors_Amount: 0.009994029998779297
Although I am using a property of primes and a formula that I thought should reduce the search time, the first method is still faster. Could you explain why?
In the first approach, you test divisibility by each number from 1 to sqrt(x), so the complexity of testing a single number x is sqrt(x). The sum of the first N square roots can be approximated as N*sqrt(N) (see below).
Time complexity of method 1: O(N*sqrt(N)) (N is the total count of numbers being tested).
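For reference, that approximation comes from bounding the sum by an integral:

sum of sqrt(x) for x = 1..N  ≈  integral of sqrt(x) dx from 0 to N  =  (2/3)*N^(3/2)  =  O(N*sqrt(N))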
In the second approach, there are 2 cases:
If a number is prime, every candidate up to n is tested. Complexity: O(n/6) = O(n).
If a number isn't prime, we can approximate the complexity as O(log(n)) (there might be a more accurate bound for this case; I'm approximating since it doesn't change the conclusion).
For the prime numbers, using the fact that we test each of them with about n/6 candidates, the cost becomes 5/6 + 7/6 + 11/6 + 13/6 + 17/6 + ... + (last prime before N)/6, which is (sum of all primes up to N)/6. The sum of all primes up to N can be approximated as N^2/(2*log N), so the complexity of this step becomes N^2/(6*2*log N) = N^2/(12*log N).
Time complexity of method 2: O(N^2/(12*log N)) (N is the total count of numbers being tested).
(If you want, you can derive more accurate bounds for the time complexity of each step. I have made a few approximations, since they are enough to prove the point without making any overoptimistic assumptions.)
Your first algorithm wisely only considers divisors up to sqrt(n).
But your second algorithm considers divisors all the way up to n, although admittedly if n has many factors, n will be reduced along the way.
If you fix this in your algorithm by changing this:

t = 6*i-1

to this:

t = 6*i-1
if t*t > n:
    return dcount * 2
Then your second algorithm will be faster.
(The * 2 is because the algorithm would eventually find the remaining prime factor (n itself) and then dcount *= (count + 1) would double dcount before returning it.)
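For reference, here is roughly what the whole routine looks like with that square-root cutoff folded in (an untested sketch in simplified form, not the asker's exact code):

def divisors_via_factoring(n):
    if n <= 1:
        return 1
    dcount = 1
    for p in (2, 3):                # pull out factors of 2 and 3 first
        count = 0
        while n % p == 0:
            count += 1
            n //= p
        dcount *= count + 1
    t = 5
    while t * t <= n:               # stop once t exceeds sqrt of the remaining n
        for cand in (t, t + 2):     # candidates of the form 6k-1 and 6k+1
            count = 0
            while n % cand == 0:
                count += 1
                n //= cand
            dcount *= count + 1
        t += 6
    if n > 1:                       # whatever remains is a single prime factor
        dcount *= 2
    return dcount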

How to stop loop running out of memory?

I've come back to programming after a long hiatus, so please forgive any stupid errors/inefficient code.
I am creating an encryption program that uses the RSA method of encryption, which involves finding numbers coprime to a given number in order to generate a key. I use the Euclidean algorithm to compute highest common factors, and I add a number to the list of coprimes if the HCF is 1. I generate two lists of coprimes for different numbers, then compare them to find the coprimes present in both sets. The basic code is below:
def gcd(a, b):
    while b:
        a, b = b, a%b
    return a

def coprimes(n):
    cp = []
    for i in range(1, n):
        if gcd(i, n) == 1:
            cp.append(i)
    return cp  # return the list so compare() can use it

def compare(n, m):
    a = coprimes(n)
    b = coprimes(m)
    c = []
    for i in a:
        if i in b:
            c.append(i)
    print(c)
This code works perfectly for small numbers and gives me what I want, but execution takes forever and is eventually Killed when computing with extremely large numbers in the billions range, which is necessary for even a moderate level of security.
I assume this is a memory issue, but I can't work out how to do this in a non-memory-intensive way. I tried multiprocessing, but that just made my computer unusable because of the number of processes running.
How can I calculate the coprimes of large numbers and then compare two sets of coprimes in an efficient and workable way?
If the only thing you're worried about is running out of memory here you could use generators.
def coprimes(n):
    for i in range(1, n):
        if gcd(i, n) == 1:
            yield i
This way you can use each coprime value and then discard it once you don't need it. However, nothing is going to change the fact that your code is O(N^2) and will always perform slowly for large primes. And this assumes Euclid's algorithm is constant time, which it is not.
You could change the strategy and approach this from the perspective of common prime factors. The common coprimes between n and m will be all numbers that are not divisible by any of their common prime factors.
def primeFactors(N):
    p = 2
    while p*p <= N:
        count = 0
        while N%p == 0:
            count += 1
            N //= p
        if count: yield p
        p += 1 + (p&1)
    if N > 1: yield N

import math

def compare2(n, m):
    # skip list for multiples of common prime factors
    skip = { p: p for p in primeFactors(math.gcd(n, m)) }
    for x in range(1, min(m, n)):
        if x in skip:
            p = skip[x]              # skip multiple of common prime
            m = x + p                # determine next multiple to skip
            while m in skip: m += p  # for that prime
            skip[m] = p
        else:
            yield x                  # common coprime of n and m
The performance is considerably better than matching lists of coprimes, especially on larger numbers:
from timeit import timeit
timeit(lambda:list(compare2(10**5,2*10**5)),number=1)
# 0.025 second
timeit(lambda:list(compare2(10**6,2*10**6)),number=1)
# 0.196 second
timeit(lambda:list(compare2(10**7,2*10**7)),number=1)
# 2.18 seconds
timeit(lambda:list(compare2(10**8,2*10**8)),number=1)
# 20.3 seconds
timeit(lambda:list(compare2(10**9,2*10**9)),number=1)
# 504 seconds
At some point, building lists of all the coprimes becomes a bottleneck and you should just use/process them as they come out of the generator (for example to count how many there are):
timeit(lambda:sum(1 for _ in compare2(10**9,2*10**9)),number=1)
# 341 seconds
Another way to approach this, which is somewhat slower than the prime factor approach but much simpler to code, would be to list coprimes of the gcd between n and m:
import math

def compare3(n, m):
    d = math.gcd(n, m)
    for c in range(1, min(n, m)):
        if math.gcd(c, d) == 1:
            yield c
timeit(lambda:list(compare3(10**6,2*10**6)),number=1)
# 0.28 second
timeit(lambda:list(compare3(10**7,2*10**7)),number=1)
# 2.84 seconds
timeit(lambda:list(compare3(10**8,2*10**8)),number=1)
# 30.8 seconds
Given that it uses essentially no memory, it could be advantageous in some cases:
timeit(lambda:sum(1 for _ in compare3(10**9,2*10**9)),number=1)
# 326 seconds

Python - Finding the number of neighbours given a list

Problem statement
On the number line there are N houses (there can be more than one house at the same number). Two houses are said to be neighbours if the distance between them is less than some given D (the distance between two houses at the same number is one).
Find the total number of pairs of neighbours.
Mathematically, the problem boils down to this: given a multiset N and a number D, find the number of pairs of houses whose distance is less than D.
def main():
    number_of_ppl, distance = map(int, input().split())
    inputs = map(int, input().split())
    numbers = sorted(inputs)
    counter = 0
    sum = 0
    for x in range(0, len(numbers)):
        for y in range(i+1, len(numbers)):
            if abs(numbers[x]-numbers[y]) <= distance:
                counter += 1
            else:
                break
        sum += counter
        counter = 0
    print(sum)

main()
This code works, however it fails 3 of the 8 test cases because it runs out of time. Is there something I am missing?
How could I make this algorithm faster?
I tried using dictionaries but got the same result.
P.S. If it helps, I can post the test cases where this program fails.
Your current code is incomplete, but it seems you are using 2 nested loops which, if the distance is big enough, take O(n^2) to run. This can be reduced to O(n log n). Note that when you iterate the numbers in sorted order, the neighbours of arr[i + 1] are the same as the neighbours of arr[i], plus possibly a few more.
def main():
    number_of_ppl, distance = map(int, input().split())
    inputs = map(int, input().split())
    numbers = sorted(inputs)
    sum = 0
    pointer = 0
    for idx, number in enumerate(numbers):
        while pointer < len(numbers) and number + distance >= numbers[pointer]:
            pointer += 1
        sum += (pointer - idx)
    print(sum)

main()

Optimizing Prime Number Python Code

I'm relatively new to the Python world, and the coding world in general, so I'm not really sure how to go about optimizing my Python script. The script that I have is as follows:
import math

z = 1
x = 0
while z != 0:
    x = x+1
    if x == 500:
        z = 0
    calculated = open('Prime_Numbers.txt', 'r')
    readlines = calculated.readlines()
    calculated.close()
    a = len(readlines)
    b = readlines[(a-1)]
    b = int(b) + 1
    for num in range(b, (b+1000)):
        prime = True
        calculated = open('Prime_Numbers.txt', 'r')
        for i in calculated:
            i = int(i)
            q = math.ceil(num/2)
            if (q%i == 0):
                prime = False
        if prime:
            calculated.close()
            writeto = open('Prime_Numbers.txt', 'a')
            num = str(num)
            writeto.write("\n" + num)
            writeto.close()
            print(num)
As some of you can probably guess, I'm calculating prime numbers. The external file that it calls on contains all the prime numbers between 2 and 20.
The reason I've got the while loop in there is that I wanted to be able to control how long it ran for.
If you have any suggestions for cutting out any of the clutter, could you please respond and let me know? Thanks.
Reading and writing to files is very, very slow compared to operations with integers. Your algorithm can be sped up 100-fold by just ripping out all the file I/O:
import itertools

primes = {2}                      # A set containing only 2
for n in itertools.count(3):      # Start counting from 3, by 1
    for prime in primes:          # For every prime less than n
        if n % prime == 0:        # If it divides n
            break                 # Then n is composite
    else:
        primes.add(n)             # Otherwise, it is prime
        print(n)
A much faster prime-generating algorithm would be a sieve. Here's the Sieve of Eratosthenes, in Python 3:
end = int(input('Generate primes up to: '))
numbers = {n: True for n in range(2, end)}  # Assume every number is prime, and then
for n, is_prime in numbers.items():         # (Python 3 only)
    if not is_prime:
        continue                            # For every prime number
    for i in range(n ** 2, end, n):         # Cross off its multiples
        numbers[i] = False
    print(n)
It is very inefficient to keep storing and loading all the primes from a file; in general, file access is very slow. Instead, save the primes to a list or deque (from collections import deque). For this, initialize calculated = deque() and then simply add new primes with calculated.append(num). At the same time, output your primes with print(num) and pipe the result to a file.
When you find out that num is not a prime, you do not have to keep checking the remaining divisors, so break out of the inner loop:
if q%i == 0:
    prime = False
    break
You do not need to go through all previous primes to check a new candidate. Since every composite number factors into two integers, at least one of the factors has to be smaller than or equal to sqrt(num), so limit your search to those divisors.
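Putting those three suggestions together, the loop might look roughly like this (an illustrative sketch, not a drop-in replacement for the asker's script):

from collections import deque

calculated = deque([2])            # primes found so far, kept in memory

def is_prime(num):
    for p in calculated:
        if p * p > num:            # no prime divisor up to sqrt(num): num is prime
            return True
        if num % p == 0:           # divisor found: composite, stop checking
            return False
    return True

num = 3
while len(calculated) < 500:       # collect the first 500 primes
    if is_prime(num):
        calculated.append(num)
        print(num)
    num += 2                       # even numbers greater than 2 are never prime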
Also the first part of your code irritates me.
z = 1
x = 0
while z != 0:
    x = x+1
    if x == 500:
        z = 0
This part seems to do the same as:

for x in range(500):

Also, you use x to limit the run to 500 passes; why not simply use a counter instead, one that you increase whenever a prime is found and check at the same time, breaking once the limit is reached? That would be more readable, in my opinion.
In general, you do not need to introduce a limit at all; you can simply abort the program at any point by hitting Ctrl+C.
However, as others have already pointed out, your chosen algorithm will perform very poorly for medium or large primes. There are more efficient algorithms for finding prime numbers: https://en.wikipedia.org/wiki/Generating_primes, especially https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes.
You're writing a blank line to your file, which is making int() traceback. Also, I'm guessing you need to rstrip() off your newlines.
I'd suggest using two different files - one for initial values, and one for all values - initial and recently computed.
If you can keep your values in memory a while, that'd be a lot faster than going through a file repeatedly. But of course, this will limit the size of the primes you can compute, so for larger values you might return to the iterate-through-the-file method if you want.
For computing primes of modest size, a sieve is actually quite good, and worth a google.
When you get into larger primes, trial division by the first n primes is good, followed by m rounds of Miller-Rabin. If Miller-Rabin probabilistically indicates the number is probably prime, then you do complete trial division or AKS or similar. Miller-Rabin can say "this is probably a prime" or "this is definitely composite"; AKS gives a definitive answer, but it's slower.
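For reference, a Miller-Rabin test as described here looks roughly like this (a probabilistic sketch, not vetted production code):

import random

def miller_rabin(n, rounds=20):
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):           # quick trial division first
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:                        # write n-1 as d * 2**s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                     # witness found: definitely composite
    return True                              # no witness found: probably prime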
FWIW, I've got a bunch of prime-related code collected together at http://stromberg.dnsalias.org/~dstromberg/primes/

Project Euler #25: Keep getting OverflowError (result too large) - is it to do with calculating the Fibonacci number?

I'm working on solving the Project Euler problem 25:
What is the first term in the Fibonacci sequence to contain 1000 digits?
My piece of code works for smaller numbers of digits, but when I try 1000 digits, I get the error:
OverflowError: (34, 'Result too large')
I'm thinking it may be down to how I compute the Fibonacci numbers, but I've tried several different methods and I get the same error.
Here's my code:
'''
What is the first term in the Fibonacci sequence to contain 1000 digits
'''
def fibonacci(n):
    phi = (1 + pow(5, 0.5))/2  # Golden Ratio
    return int((pow(phi, n) - pow(-phi, -n))/pow(5, 0.5))  # Formula: http://bit.ly/qDumIg

n = 0
while len(str(fibonacci(n))) < 1000:
    n += 1
print n
Do you know what might be the cause of this problem and how I could alter my code to avoid it?
Thanks in advance.
The problem here is that only integers in Python have unlimited size; floating-point values are still stored in ordinary IEEE doubles, which have a maximum magnitude and limited precision.
Since you're using an approximation based on floating-point calculations, you will hit that limit eventually.
Instead, try calculating the Fibonacci sequence the normal way, one number (of the sequence) at a time, until you get to 1000 digits.
ie. calculate 1, 1, 2, 3, 5, 8, 13, 21, 34, etc.
By "normal way" I mean this:
/ 1 , n < 3
Fib(n) = |
\ Fib(n-2) + Fib(n-1) , n >= 3
Note that the "obvious" approach given the above formulas is wrong for this particular problem, so I'll post the code for the wrong approach just to make sure you don't waste time on that:
def fib(n):
    if n <= 3:
        return 1
    else:
        return fib(n-2) + fib(n-1)

n = 1
while True:
    f = fib(n)
    if len(str(f)) >= 1000:
        print("#%d: %d" % (n, f))
        exit()
    n += 1
On my machine, the above code starts getting really slow at around the 30th Fibonacci number, which is still only 6 digits long.
I modified the above recursive approach to output the number of calls to the fib function for each number; here are some values:
#1: 1
#10: 67
#20: 8361
#30: 1028457
#40: 126491971
I can reveal that the first Fibonacci number with 1000 digits or more is the 4782nd number in the sequence (unless I miscalculated), and so the number of calls to the fib function in a recursive approach would be this number:
1322674645678488041058897524122997677251644370815418243017081997189365809170617080397240798694660940801306561333081985620826547131665853835988797427277436460008943552826302292637818371178869541946923675172160637882073812751617637975578859252434733232523159781720738111111789465039097802080315208597093485915332193691618926042255999185137115272769380924184682248184802491822233335279409301171526953109189313629293841597087510083986945111011402314286581478579689377521790151499066261906574161869200410684653808796432685809284286820053164879192557959922333112075826828349513158137604336674826721837135875890203904247933489561158950800113876836884059588285713810502973052057892127879455668391150708346800909439629659013173202984026200937561704281672042219641720514989818775239313026728787980474579564685426847905299010548673623281580547481750413205269166454195584292461766536845931986460985315260676689935535552432994592033224633385680958613360375475217820675316245314150525244440638913595353267694721961
And that is just for the 4782nd number. The actual total is the sum of those call counts for all the Fibonacci numbers from 1 up to 4782. There is no way this will ever complete.
In fact, if we gave the code 1 year of running time (simplified as 365 days), and assuming the machine could make 10,000,000,000 calls every second, the algorithm would only get as far as the 83rd number, which is still only 18 digits long.
Actually, although the advice given above to avoid floating-point numbers is generally good advice for Project Euler problems, in this case it is incorrect. Fibonacci numbers can be computed by the formula F_n = phi^n / sqrt(5), so the first Fibonacci number with more than a thousand digits satisfies 10^999 < phi^n / sqrt(5). Taking the logarithm to base ten of both sides -- recall that sqrt(5) is the same as 5^(1/2) -- gives 999 < n*log_10(phi) - (1/2)*log_10(5), and solving for n gives (999 + (1/2)*log_10(5)) / log_10(phi) < n. The left-hand side of that inequality evaluates to 4781.85927, so the smallest n that gives a thousand digits is 4782.
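That calculation is easy to double-check numerically:

import math

phi = (1 + math.sqrt(5)) / 2
n = (999 + 0.5 * math.log10(5)) / math.log10(phi)
print(n)               # ~4781.86
print(math.ceil(n))    # 4782, the index of the first 1000-digit Fibonacci number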
You can use the sliding window trick to compute the terms of the Fibonacci sequence iteratively, rather than using the closed form (or doing it recursively as it's normally defined).
The Python version for finding fib(n) is as follows:
def fib(n):
    a = 1
    b = 1
    for i in range(2, n):
        b = a + b
        a = b - a
    return b
This works when F(1) is defined as 1, as it is in Project Euler 25.
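A quick sanity check of that indexing:

print(fib(10))   # 55, since F(10) = 55 with F(1) = F(2) = 1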
I won't give the exact solution to the problem here, but the code above can be reworked so it keeps track of n until a sentry value (10**999) is reached.
An iterative solution such as this one has no trouble executing. I get the answer in less than a second.
def fibonacci():
    current = 0
    previous = 1
    while True:
        temp = current
        current = current + previous
        previous = temp
        yield current

def main():
    for index, element in enumerate(fibonacci()):
        if len(str(element)) >= 1000:
            answer = index + 1  # enumerate starts from 0
            break
    print(answer)

main()
import math as m
import time

start = time.time()
fib0 = 0
fib1 = 1
n = 0
k = 0
count = 1
while k < 1000:
    n = fib0 + fib1
    k = int(m.log10(n)) + 1
    fib0 = fib1
    fib1 = n
    count += 1
print n
print count
print time.time() - start
This takes 0.005388 s on my PC. I did nothing fancy, just followed simple code.
Iteration will always be better here; recursion was taking too long for me as well.
I also used a math function (log10) to calculate the number of digits instead of converting the number to a list of digits and iterating through it. That saves a lot of time.
Here is my very simple solution
list = [1, 1, 2]
for i in range(2, 5000):
    if len(str(list[i] + list[i-1])) == 1000:
        print(i + 2)
        break
    else:
        list.append(list[i] + list[i-1])
This is sort of a "rogue" way of doing it, but if you change the 1000 to any number except one, it gets the right answer.
You can use the Decimal datatype. This is a little bit slower, but it gives you arbitrary precision, provided you raise the working precision with getcontext().prec.
So your code becomes:
'''
What is the first term in the Fibonacci sequence to contain 1000 digits
'''
from decimal import *
getcontext().prec = 1100  # enough working precision for 1000-digit results

def fibonacci(n):
    phi = (Decimal(1) + pow(Decimal(5), Decimal(0.5))) / 2  # Golden Ratio
    return int((pow(phi, Decimal(n)) - pow(-phi, Decimal(-n))) / pow(Decimal(5), Decimal(0.5)))

n = 0
while len(str(fibonacci(n))) < 1000:
    n += 1
print n
