Reduce time complexity of brute forcing - largest prime factor

Reduce time complexity of brute forcing - largest prime factor - python

I am writing a code to find the largest prime factor of a very large number.
Problem 3 of Project Euler :
What is the largest prime factor of the number 600851475143 ?
I coded it in C...but the data type long long int is not sufficient enough to hold the value .
Now, I have rewritten the code in Python. How can I reduce the time taken for execution (as it is taking a considerable amount of time)?
def isprime(b):
x=2
while x<=b/2:
if(b%x)==0:
return 0
x+=1
return 1
def lpf(a):
x=2
i=2
while i<=a/2:
if a%i==0:
if isprime(i)==1:
if i>x:
x=i
print(x)
i+=1
print("final answer"+x)
z=600851475143
lpf(z)

There are many possible algorithmic speed ups. Some basic ones might be:
First, if you are only interested in the largest prime factor, you should check for them from the largest possible ones, not smallest. So instead of looping from 2 to a/2 try to check from a downto 2.
You could load the database of primes instead of using isprime function (there are dozens of such files in the net)
Also, only odd numbers can be primes (except for 2) so you can "jump" 2 values in each iteration
Your isprime checker could also be speededup, you do not have to look for divisiors up to b/2, it is enough to check to sqrt(b), which reduces complexity from O(n) to O(sqrt(n)) (assuming that modulo operation is constant time).

You could use the 128 int provided by GCC: http://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html . This way, you can continue to use C and avoid having to optimize Python's speed. In addition, you can always add your own custom storage type to hold numbers bigger than long long in C.

I think you're checking too many numbers (incrementing by 1 and starting at 2 in each case). If you want to check is_prime by trial division, you need to divide by fewer numbers: only odd numbers to start (better yet, only primes). You can range over odd numbers in python the following way:
for x in range(3, some_limit, 2):
if some_number % x == 0:
etc.
In addition, once you have a list of primes, you should be able to run through that list backwards (because the question asks for highest prime factor) and test if any of those primes evenly divides into the number.
Lastly, people usually go up to the square-root of a number when checking trial division because anything past the square-root is not going to provide new information. Consider 100:
1 x 100
2 x 50
5 x 20
10 x 10
20 x 5
etc.
You can find all the important divisor information by just checking up to the square root of the number. This tip is useful both for testing primes and for testing where to start looking for a potential divisor for that huge number.

First off, your two while loops only need to go up to the sqrt(n) since you will have hit anything past that earlier (you then need to check a/i for primeness as well). In addition, if you find the lowest number that divides it, and the result of the division is prime, then you have found the largest.
First, correct your isprime function:
def isprime(b):
x=2
sqrtb = sqrt(b)
while x<=sqrtb:
if(b%x)==0:
return 0
x+=1
return 1
Then, your lpf:
def lpf(a):
x=2
i=2
sqrta = sqrt(a)
while i<=sqrt(a):
if a%i==0:
b = a//i # integer
if isprime(b):
return b
if isprime(i):
x=i
print(x)
i+=1
return x

Related

Why does this 'optimized' prime checker run at the same speed as the regular version?

Given this plain is_prime1 function which checks all the divisors from 1 to sqrt(p) with some bit-playing in order to avoid even numbers which are of-course not primes.
import time
def is_prime1(p):
if p & 1 == 0:
return False
# if the LSD is 5 then it is divisible by 5 (i.e. not a prime)
elif p % 10 == 5:
return False
for k in range(2, int(p ** 0.5) + 1):
if p % k == 0:
return False
return True
Versus this "optimized" version. The idea is to save all the primes we have found until a certain number p, then we iterate on the primes (using this basic arithmetic rule that every number is a product of primes) so we don't iterate through the numbers until sqrt(p) but over the primes we found which supposed to be a tiny bit compared to sqrt(p). We also iterate only on half the elements, because then the largest prime would most certainly won't "fit" in the number p.
import time
global mem
global lenMem
mem = [2]
lenMem = 1
def is_prime2(p):
global mem
global lenMem
# if p is even then the LSD is off
if p & 1 == 0:
return False
# if the LSD is 5 then it is divisible by 5 (i.e. not a prime)
elif p % 10 == 5:
return False
for div in mem[0: int(p ** 0.5) + 1]:
if p % div == 0:
return False
mem.append(p)
lenMem += 1
return True
The only idea I have in mind is that "global variables are expensive and time consuming" but I don't know if there is another way, and if there is, will it really help?
On average, when running this same program:
start = time.perf_counter()
for p in range(2, 100000):
print(f'{p} is a prime? {is_prime2(p)}') # change to is_prime1 or is_prime2
end = time.perf_counter()
I get that for is_prime1 the average time for checking the numbers 1-100K is ~0.99 seconds and so is_prime2 (maybe a difference of +0.01s on average, maybe as I said the usage of global variables ruin some performance?)

The difference is a combination of three things:
You're just not doing that much less work. Your test case includes testing a ton of small numbers, where the distinction between testing "all numbers from 2 to square root" and testing "all primes from 2 to square root" just isn't that much of a difference. Your "average case" is roughly the midpoint of the range, 50,000, square root of 223.6, which means testing 48 primes, or testing 222 numbers if the number is prime, but most numbers aren't prime, and most numbers have at least one small factor (proof left as exercise), so you short-circuit and don't actually test most of the numbers in either set (if there's a factor below 8, which applies to ~77% of all numbers, you've saved maybe two tests by limiting yourself to primes)
You're slicing mem every time, which is performed eagerly, and completely, even if you don't use all the values (and as noted, you almost never do for the non-primes). This isn't a huge cost, but then, you weren't getting huge savings from skipping non-primes, so it likely eats what little savings you got from the other optimization.
(You found this one, good show) Your slice of primes took a number of primes to test equal to the square root of number to test, not all primes less than the square root of the number to test. So you actually performed the same number of tests, just with different numbers (many of them primes larger than the square root that definitely don't need to be tested).
A side-note:
Your up-front tests aren't actually saving you much work; you redo both tests in the loop, so they're wasted effort when the number is prime (you test them both twice). And your test for divisibility by five is pointless; % 10 is no faster than % 5 (computers don't operate in base-10 anyway), and if not p % 5: is a slightly faster, more direct, and more complete (your test doesn't recognize multiples of 10, just multiples of 5 that aren't multiples of 10) way to test for divisibility.
The tests are also wrong, because they don't exclude the base case (they say 2 and 5 are not prime, because they're divisible by 2 and 5 respectively).

First of all, you should remove the print call, it is very time consuming.
You should just time your function, not the print function, so you could do it like this:
start = time.perf_counter()
for p in range(2, 100000):
## print(f'{p} is a prime? {is_prime2(p)}') # change to is_prime1 or is_prime2
is_prime1(p)
end = time.perf_counter()
print ("prime1", end-start)
start = time.perf_counter()
for p in range(2, 100000):
## print(f'{p} is a prime? {is_prime2(p)}') # change to is_prime1 or is_prime2
is_prime2(p)
end = time.perf_counter()
print ("prime2", end-start)
is_prime1 is still faster for me.

If you want to hold primes in global memory to accelerate multiple calls, you need to ensure that the primes list is properly populated even when the function is called with numbers in random order. The way is_prime2() stores and uses the primes assumes that, for example, it is called with 7 before being called with 343. If not, 343 will be treated as a prime because 7 is not yet in the primes list.
So the function must compute and store all primes up to √49 before it can respond to the is_prime(343) call.
In order to quickly build a primes list, the Sieve of Eratosthenes is one of the fastest method. But, since you don't know in advance how many primes you need, you can't allocate the sieve's bit flags in advance. What you can do is use a rolling window of the sieve to move forward by chunks (of let"s say 1000000 bits at a time). When a number beyond your maximum prime is requested, you just generate more primes chunk by chunk until you have enough to respond.
Also, since you're going to build a list of primes, you might as well make it a set and check if the requested number is in it to respond to the function call. This will require generating more primes than needed for divisions but, in the spirit of accelerating subsequent calls, that should not be an issue.
Here's an example of an isPrime() function that uses that approach:
primes = {3}
sieveMax = 3
sieveChunk = 1000000 # must be an even number
def isPrime(n):
if not n&1: return n==2
global primes,sieveMax, sieveChunk
while n>sieveMax:
base,sieveMax = sieveMax, sieveMax + sieveChunk
sieve = [True]* sieveChunk
for p in primes:
i = (p - base%p)%p
sieve[i::p]=[False]*len(sieve[i::p])
for i in range(0, sieveChunk,2):
if not sieve[i]: continue
p = i + base
primes.add(p)
sieve[i::p] = [False]*len(sieve[i::p])
return n in primes
On the first call to an unknown prime, it will perform slower than the divisions approach but as the prime list builds up, it will provide much better response time.

My python program won't execute or show anything in the terminal

So I was trying to solve a project Euler question that asks we shoud find the largest prime factor of 600851475143.
This is my code:
factors = [i for i in range(1,600851475144) if 600851475143%i is 0]
prime_factors = []
for num in factors:
factors_of_num = [i for i in range(1, num+1) if num%i is 0]
if factors_of_num == [1, num]:
prime_factors.append(num)
print(max(prime_factors))
The issue is that this code won't run for a large number like this. How can \i get this to work?

Your program is executing, but range(1, 600851475144) is just taking a rrrrrrrrrrrrrrrrrrrrrrrreally long time. There are much better ways to get prime factors instead of first checking each number individually whether it is a divisor and then checking which of those are primes.
First, for each pair of divisors p * q = n, either p or q has to be <= sqrt(n), so you'd in fact only have to check the numbers in range(1, 775147) to get one part of those pairs and get the other for free. This alone should be enough to make your program finish in time. But you'd still get all the divisors, and then have to check which of those are prime.
Next, you do not actually have to get all the prime factors of those divisors to determine whether those are prime: You can use any to stop as soon as you find the first non-primitive factor. And here, too, testing up to sqrt(num) is enough. (Also, you could start with the largest divisor, so you can stop the loop as soon as you find the first one that's prime.)
Alternatively, as soon as you find a divisor, divide the target number by that divisor until it can not be divided any more, then continue with the new, smaller target number and the next potential divisor. This way, all your divisors are guaranteed to be prime (otherwise the number would already have been reduced by its prime factors), and you will also need much fewer tests (unless the number itself is prime).

How many times would this algorithm run?

I have a function in python to calculate the prime factors of a number:
def prime_factors(n):
i = 2
factors = []
while i * i <= n:
if n % i:
i += 1
else:
n //= i
factors.append(i)
if n > 1:
factors.append(n)
return factors
My question is, how many times will the program run to stop on input n?
I can figure it out fairly easy if you take an example such as 56, but abstractly, I cannot seem to write a "formula" for how many times the program will run before having calculated all prime factors.

More of a mathematics question than a programming question, but kinda interesting...
Let the prime factorization of n be of the form:
Where P1 is the smallest prime factor, P2 is the 2nd smallest etc, and Pk is the largest prime factor.
There are two groups of loop iterations to think about:
number of loops that occur because i is a prime factor of n
number of loops that occur because i is not a prime factor of n
Group 1
This number is easier, because it's always the same formula. It's the sum of each of the exponents of the prime factors minus 1, i.e.:
If you think about how the code acts when i is a prime factor (i.e. keeps looping while dividing n by i), the sum of the exponents should be fairly intuitive. You subtract 1 because the last append occurs outside the loop.
Group 2
The second group of loops is slightly more tricky, because the formula itself is actually dependent on whether az > 1, and also on the relative size of Pk and P(k-1) (the second smallest prime factor of n).
If ak > 1, then the last i that went through the loop had the value Pk. So the non-prime factor values of i that were evaluated are all the numbers between 2 and Pk that are not in the set P1 through to Pk. The formula for this turns out to be:
Remember k is the number of distinct prime factors of n. You subtract 1 at the end because it's between 2 and Pk, not 1 and Pk.
On the other hand, let's think about the case where ak = 1 (and ak < 1 is impossible, because ak is the exponent on the largest prime factor - it's got to be 1 or bigger). Then i never took the value of Pk, as the loop ends before that point because of the while i * i <= n: condition.
So in that case, what's the maximum value that i will take? Well it depends.
Remember that if ak = 1, then the last value that n took was Pk. If the second largest prime factor P(k-1) is bigger than the square root of Pk, then once the loop finishes with i = P(k-1), it will immediately exit the loop - so the largest i is P(k-1). On the other hand, if the square root of Pk is bigger, then the loop will continue up to that level before exiting. So the biggest i when ak = 1 is equal to max(P(k-1), Pk**0.5). The formula for the number of loops here would look like:
which simplifies to:
Notice that we take the floor of the square root of Pk, because i will only take integer values.
One formula to...
You put it all together, and it yields:

The loop executes about max(prime_factors(n)) times?
If n is prime, then the worst case is n times.

If I understand your question and since you are not using recursive calls; you are asking "how many times the while loop will execute in order to generate all prime numbers given integer n".
In your code; you are running a while loop with the condition i * i <= n. But since you are manipulating the variable i inside your method the answer is complex(refer #dnozay's answer).
If you are interested read this link which tells how complex it is. Keep read on this link to get insight on prime-counting function.

Optimizing Prime Number Python Code

I'm relatively new to the python world, and the coding world in general, so I'm not really sure how to go about optimizing my python script. The script that I have is as follows:
import math
z = 1
x = 0
while z != 0:
x = x+1
if x == 500:
z = 0
calculated = open('Prime_Numbers.txt', 'r')
readlines = calculated.readlines()
calculated.close()
a = len(readlines)
b = readlines[(a-1)]
b = int(b) + 1
for num in range(b, (b+1000)):
prime = True
calculated = open('Prime_Numbers.txt', 'r')
for i in calculated:
i = int(i)
q = math.ceil(num/2)
if (q%i==0):
prime = False
if prime:
calculated.close()
writeto = open('Prime_Numbers.txt', 'a')
num = str(num)
writeto.write("\n" + num)
writeto.close()
print(num)
As some of you can probably guess I'm calculating prime numbers. The external file that it calls on contains all the prime numbers between 2 and 20.
The reason that I've got the while loop in there is that I wanted to be able to control how long it ran for.
If you have any suggestions for cutting out any clutter in there could you please respond and let me know, thanks.

Reading and writing to files is very, very slow compared to operations with integers. Your algorithm can be sped up 100-fold by just ripping out all the file I/O:
import itertools
primes = {2} # A set containing only 2
for n in itertools.count(3): # Start counting from 3, by 1
for prime in primes: # For every prime less than n
if n % prime == 0: # If it divides n
break # Then n is composite
else:
primes.add(n) # Otherwise, it is prime
print(n)
A much faster prime-generating algorithm would be a sieve. Here's the Sieve of Eratosthenes, in Python 3:
end = int(input('Generate primes up to: '))
numbers = {n: True for n in range(2, end)} # Assume every number is prime, and then
for n, is_prime in numbers.items(): # (Python 3 only)
if not is_prime:
continue # For every prime number
for i in range(n ** 2, end, n): # Cross off its multiples
numbers[i] = False
print(n)

It is very inefficient to keep storing and loading all primes from a file. In general file access is very slow. Instead save the primes to a list or deque. For this initialize calculated = deque() and then simply add new primes with calculated.append(num). At the same time output your primes with print(num) and pipe the result to a file.
When you found out that num is not a prime, you do not have to keep checking all the other divisors. So break from the inner loop:
if q%i == 0:
prime = False
break
You do not need to go through all previous primes to check for a new prime. Since each non-prime needs to factorize into two integers, at least one of the factors has to be smaller or equal sqrt(num). So limit your search to these divisors.
Also the first part of your code irritates me.
z = 1
x = 0
while z != 0:
x = x+1
if x == 500:
z = 0
This part seems to do the same as:
for x in range(500):
Also you limit with x to 500 primes, why don't you simply use a counter instead, that you increase if a prime is found and check for at the same time, breaking if the limit is reached? This would be more readable in my opinion.
In general you do not need to introduce a limit. You can simply abort the program at any point in time by hitting Ctrl+C.
However, as others already pointed out, your chosen algorithm will perform very poor for medium or large primes. There are more efficient algorithms to find prime numbers: https://en.wikipedia.org/wiki/Generating_primes, especially https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes.

You're writing a blank line to your file, which is making int() traceback. Also, I'm guessing you need to rstrip() off your newlines.
I'd suggest using two different files - one for initial values, and one for all values - initial and recently computed.
If you can keep your values in memory a while, that'd be a lot faster than going through a file repeatedly. But of course, this will limit the size of the primes you can compute, so for larger values you might return to the iterate-through-the-file method if you want.
For computing primes of modest size, a sieve is actually quite good, and worth a google.
When you get into larger primes, trial division by the first n primes is good, followed by m rounds of Miller-Rabin. If Miller-Rabin probabilistically indicates the number is probably a prime, then you do complete trial division or AKS or similar. Miller Rabin can say "This is probably a prime" or "this is definitely composite". AKS gives a definitive answer, but it's slower.
FWIW, I've got a bunch of prime-related code collected together at http://stromberg.dnsalias.org/~dstromberg/primes/

Trying to understand a solution to project Euler # 3

The prime factors of 13195 are 5, 7, 13 and 29.
What is the largest prime factor of the number 600851475143 ? # http://projecteuler.net/problem=3
I have a deal going with myself that if I can't solve a project Euler problem I will understand the best solution I can find. I did write an algorithm which worked for smaller numbers but was too inefficient to work for bigger ones. So I googled Zach Denton's answer and started studying it.
Here is his code:
#!/usr/bin/env python
import math
def factorize(n):
res = []
# iterate over all even numbers first.
while n % 2 == 0:
res.append(2)
n //= 2
# try odd numbers up to sqrt(n)
limit = math.sqrt(n+1)
i = 3
while i <= limit:
if n % i == 0:
res.append(i)
n //= i
limit = math.sqrt(n+i)
else:
i += 2
if n != 1:
res.append(n)
return res
print max(factorize(600851475143))
Here are the bits I can't figure out for myself:
In the second while loop, why does he use a sqrt(n + 1) instead of just sqrt(n)?
Why wouldn't you also use sqrt(n + 1) when iterating over the even numbers in the first while loop?
How does the algorithm manage to find only prime factors? In the algorithm I first wrote I had a separate test for checking whether a factor was prime, but he doesn't bother.

I suspect the +1 has to do with the imprecision of float (I am not sure whether it's actually required, or is simply a defensive move on the author's part).
The first while loop factors all twos out of n. I don't see how sqrt(n + 1) would fit in there.
If you work from small factor to large factors, you automatically eliminate all composite candidates. Think about it: once you've factored out 5, you've automatically factored out 10, 15, 20 etc. No need to check whether they're prime or not: by that point n will not be divisible by them.
I suspect that checking for primality is what's killing your original algorithm's performance.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.