python prime factorization performance

python prime factorization performance - python

I'm relatively new to python and I'm confused about the performance of two relatively simple blocks of code. The first function generates a prime factorization of a number n given a list of primes. The second generates a list of all factors of n. I would have though prime_factor would be faster than factors (for the same n), but this is not the case. I'm not looking for better algorithms, but rather I would like to understand why prime_factor is so much slower than factors.
def prime_factor(n, primes):
prime_factors = []
i = 0
while n != 1:
if n % primes[i] == 0:
factor = primes[i]
prime_factors.append(factor)
n = n // factor
else: i += 1
return prime_factors
import math
def factors(n):
if n == 0: return []
factors = {1, n}
for i in range(2, math.floor(n ** (1/2)) + 1):
if n % i == 0:
factors.add(i)
factors.add(n // i)
return list(factors)
Using the timeit module,
{ i:factors(i) for i in range(1, 10000) } takes 2.5 seconds
{ i:prime_factor(i, primes) for i in range(1, 10000) } takes 17 seconds
This is surprising to me. factors checks every number from 1 to sqrt(n), while prime_factor only checks primes. I would appreciate any help in understanding the performance characteristics of these two functions.
Thanks
Edit: (response to roliu)
Here is my code to generate a list of primes from 2 to up_to:
def primes_up_to(up_to):
marked = [0] * up_to
value = 3
s = 2
primes = [2]
while value < up_to:
if marked[value] == 0:
primes.append(value)
i = value
while i < up_to:
marked[i] = 1
i += value
value += 2
return primes

Without seeing what you used for primes, we have to guess (we can't run your code).
But a big part of this is simply mathematics: there are (very roughly speaking) about n/log(n) primes less than n, and that's a lot bigger than sqrt(n). So when you pass a prime, prime_factor(n) does a lot more work: it goes through O(n/log(n)) operations before finding the first prime factor (n itself!), while factors(n) gives up after O(sqrt(n)) operations.
This can be very significant. For example, sqrt(10000) is just 100, but there are 1229 primes less than 10000. So prime_factor(n) can need to do over 10 times as much work to deal with the large primes in your range.

Related

Lucas-Lehmer primality test with python

I wrote the code below, to get the Lucas-Lehmer series up to p, for p the exponent of a Mersenne number. After checking it I found that it doesn't work for some primes p such as 11, 23, 29 etc. Any help will be very valuable!
Here's the code:
def ll_series (p):
ll_list=[4]
print 4
for i in range(1, p+1):
ll_list.append((ll_list[i-1]**2 - 2) % (2**p-1))
print(ll_list[i])
return ll_list

The Lucas-Lehmer test tests if a Mersenne number is prime. 11 is not a Mersenne number therefore the test fails.
Mersennse number is - M_n = 2^n-1.
http://mathworld.wolfram.com/MersenneNumber.html
p.s the implementation can be more efficient if you calculate M=(2^p-1) only once

Try this:
lucas_lehmer = [4]
def mersenne(n):
return (2 ** n) - 1
def ll(n):
global lucas_lehmer
if len(lucas_lehmer) < n:
for num in range(n-1):
lucas_lehmer.append(lucas_lehmer[-1] ** 2 - 2)
return lucas_lehmer[n-1]
def check_prime(n):
m = mersenne(n)
if ll(n - 1) % m == 0:
return m
else:
return -1
Things to note:
You need to call the check_prime function
The input to the check_prime function must be a smaller prime number
Not all prime numbers produce prime results in the Mersenne sequence.
If 2^n - 1 is not prime, it will return -1
I have also cleared the code up a bit and made it easier to read.

Optimizing Totient function

I'm trying to maximize the Euler Totient function on Python given it can use large arbitrary numbers. The problem is that the program gets killed after some time so it doesn't reach the desired ratio. I have thought of increasing the starting number into a larger number, but I don't think it's prudent to do so. I'm trying to get a number when divided by the totient gets higher than 10. Essentially I'm trying to find a sparsely totient number that fits this criteria.
Here's my phi function:
def phi(n):
amount = 0
for k in range(1, n + 1):
if fractions.gcd(n, k) == 1:
amount += 1
return amount

The most likely candidates for high ratios of N/phi(N) are products of prime numbers. If you're just looking for one number with a ratio > 10, then you can generate primes and only check the product of primes up to the point where you get the desired ratio
def totientRatio(maxN,ratio=10):
primes = []
primeProd = 1
isPrime = [1]*(maxN+1)
p = 2
while p*p<=maxN:
if isPrime[p]:
isPrime[p*p::p] = [0]*len(range(p*p,maxN+1,p))
primes.append(p)
primeProd *= p
tot = primeProd
for f in primes:
tot -= tot//f
if primeProd/tot >= ratio:
return primeProd,primeProd/tot,len(primes)
p += 1 + (p&1)
output:
totientRatio(10**6)
16516447045902521732188973253623425320896207954043566485360902980990824644545340710198976591011245999110,
10.00371973209101,
55
This gives you the smallest number with that ratio. Multiples of that number will have the same ratio.
n = 16516447045902521732188973253623425320896207954043566485360902980990824644545340710198976591011245999110
n*2/totient(n*2) = 10.00371973209101
n*11*13/totient(n*11*13) = 10.00371973209101
No number will have a higher ratio until you reach the next product of primes (i.e. that number multiplied by the next prime).
n*263/totient(n*263) = 10.041901868473037
Removing a prime from the product affects the ratio by a proportion of (1-1/P).
For example if m = n/109, then m/phi(m) = n/phi(n) * (1-1/109)
(n//109) / totient(n//109) = 9.91194248684247
10.00371973209101 * (1-1/109) = 9.91194248684247
This should allow you to navigate the ratios efficiently and find the numbers that meed your need.
For example, to get a number with a ratio that is >= 10 but closer to 10, you can go to the next prime product(s) and remove one or more of the smaller primes to reduce the ratio. This can be done using combinations (from itertools) and will allow you to find very specific ratios:
m = n*263/241
m/totient(m) = 10.000234225865265
m = n*(263...839) / (7 * 61 * 109 * 137) # 839 is 146th prime
m/totient(m) = 10.000000079805726

I have a partial solution for you, but the results don't look good.. (this solution may not give you an answer with modern computer hardware (amount of ram is limiting currently)) I took an answer from this pcg challenge and modified it to spit out ratios of n/phi(n) up to a particular n
import numba as nb
import numpy as np
import time
n = int(2**31)
#nb.njit("i4[:](i4[:])", locals=dict(
n=nb.int32, i=nb.int32, j=nb.int32, q=nb.int32, f=nb.int32))
def summarum(phi):
#calculate phi(i) for i: 1 - n
#taken from <a>https://codegolf.stackexchange.com/a/26753/42652</a>
phi[1] = 1
i = 2
while i < n:
if phi[i] == 0:
phi[i] = i - 1
j = 2
while j * i < n:
if phi[j] != 0:
q = j
f = i - 1
while q % i == 0:
f *= i
q //= i
phi[i * j] = f * phi[q]
j += 1
i += 1
#divide each by n to get ratio n/phi(n)
i = 1
while i < n: #jit compiled while loop is faster than: for i in range(): blah blah blah
phi[i] = i//phi[i]
i += 1
return phi
if __name__ == "__main__":
s1 = time.time()
a = summarum(np.zeros(n, np.int32))
locations = np.where(a >= 10)
print(len(locations))
I only have enough ram on my work comp. to test about 0 < n < 10^8 and the largest ratio was about 6. You may or may not have any luck going up to larger n, although 10^8 already took several seconds (not sure what the overhead was... spyder's been acting strange lately)

p55# is a sparsely totient number satisfying the desired condition.
Furthermore, all subsequent primorial numbers are as well, because pn# / phi(pn#) is a strictly increasing sequence:
p1# / phi(p1#) is 2, which is positive. For n > 1, pn# / phi(pn#) is equal to pn-1#pn / phi(pn-1#pn), which, since pn and pn-1# are coprime, is equal to (pn-1# / phi(pn-1#)) * (pn/phi(pn)). We know pn > phi(pn) > 0 for all n, so pn/phi(pn) > 1. So we have that the sequence pn# / phi(pn#) is strictly increasing.
I do not believe these to be the only sparsely totient numbers satisfying your request, but I don't have an efficient way of generating the others coming to mind. Generating primorials, by comparison, amounts to generating the first n primes and multiplying the list together (whether by using functools.reduce(), math.prod() in 3.8+, or ye old for loop).
As for the general question of writing a phi(n) function, I would probably first find the prime factors of n, then use Euler's product formula for phi(n). As an aside, make sure to NOT use floating-point division. Even finding the prime factors of n by trial division should outperform computing gcd n times, but when working with large n, replacing this with an efficient prime factorization algorithm will pay dividends. Unless you want a good cross to die on, don't write your own. There's one in sympy that I'm aware of, and given the ubiquity of the problem, probably plenty of others around. Time as needed.
Speaking of timing, if this is still relevant enough to you (or a future reader) to want to time... definitely throw the previous answer in the mix as well.

An Explanation for the totient finder in this program

I need an explanation for the program suggested in the edit in the first answer over here. It is a program to find the totients of a range of numbers. Can somebody provide a simple explanation? (Ignore the summation part for now, I need to find out how the init method finds the totients.) I know there is an explanation in the answer, but that is an explanation for different programs, I need an explanation for this particular one.
class Totient:
def __init__(self, n):
self.totients = [1 for i in range(n)]
for i in range(2, n):
if self.totients[i] == 1:
for j in range(i, n, i):
self.totients[j] *= i - 1
k = j / i
while k % i == 0:
self.totients[j] *= i
k /= i
def __call__(self, i):
return self.totients[i]
if __name__ == '__main__':
from itertools import imap
totient = Totient(10000)
print sum(imap(totient, range(10000)))

It's a variant of the Sieve of Eratosthenes for finding prime numbers.
If you want to know the totient of a single number n, the best way to find it is to factor n and take the product of 1 less than each factor; for instance, 30 = 2 * 3 * 5, and subtracting 1 from each factor, then multiplying, gives the totient 1 * 2 * 4 = 8. But if you want to find the totients of all the numbers less than a given n, a better approach than factoring each of them is sieving. The idea is simple: Set up an array X from 0 to n, store i in each Xi, then run through the array starting from 0 and whenever Xi = i loop over the multiples of i, multiplying each by i − 1.
Further discussion and code at my blog.

I'm not completely sure what the code is doing -- but frankly it looks pretty bad. It clearly is trying to use that Euler's totient function is multiplicative, meaning that a,b are relatively prime then t(a,b) = t(a)*t(b), together with the fact that if p is a prime then t(p) = p-1. But -- it seems to be using crude trial division to determine such things. If you really want to calculate the totient of all numbers in a given range then you should use an algorithm that sieves the numbers as you go along.
Here is a version which sieves as it goes and exploits the multiplicative nature to the hilt. At each pass through the main loop it starts with a prime, p which hasn't yet been processed. It determines all powers of p <= n and then uses a direct formula for these powers (see https://en.wikipedia.org/wiki/Euler%27s_totient_function ). Once these totients have been added, it forms all possible products <= n of these powers and the numbers for which the totients have been previously computed. This gives a whole slew of numbers to add to the list of previously determined numbers. At most sqrt(n) passes need to be made through the main loop. It runs almost instantly for n = 10000. It returns a list where the ith value is the totient of i (with t(0) = 0 for convenience):
def allTotients(n):
totients = [None]*(n+1) #totients[i] will contain the t(i)
totients[0] = 0
totients[1] = 1
knownTotients = [] #known in range 2 to n
p = 2
while len(knownTotients) < n - 1:
powers = [p]
k = 2
while p ** k <= n:
powers.append(p ** k)
k +=1
totients[p] = p - 1
for i in range(1,len(powers)):
totients[powers[i]] = powers[i] - powers[i-1]
#at this stage powers represent newly discovered totients
#combine with previously discovered totients to get still more
newTotients = powers[:]
for m in knownTotients:
for pk in powers:
if m*pk > n: break
totients[m*pk] = totients[m]*totients[pk]
newTotients.append(m*pk)
knownTotients.extend(newTotients)
#if there are any unkown totients -- the smallest such will be prime
if len(knownTotients) < n-1:
p = totients.index(None)
return totients
For completeness sake, here is a Python implementation of the algorithm to compute the totient of a single number which user448810 described in their answer:
from math import sqrt
#crude factoring algorithm:
small_primes = [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,
53,59,61,67,71,73,79,83,89,97]
def factor(n):
#returns a list of prime factors
factors = []
num = n
#first pull out small prime factors
for p in small_primes:
while num % p == 0:
factors.append(p)
num = num // p
if num == 1: return factors
#now do trial division, starting at 101
k = 101
while k <= sqrt(num):
while num % k == 0:
factors.append(k)
num = num // k
k += 2
if num == 1:
return factors
else:
factors.append(num)
return factors
def totient(n):
factors = factor(n)
unique_factors = set()
t = 1
for p in factors:
if p in unique_factors:
t *= p
else:
unique_factors.add(p)
t *= (p-1)
return t

Euler Project 47 finding all numbers contaning 4 factors

I first create a list of 10^6 false values and what I want to do is to iterate True over the interval for all numbers containing 4 distinct prime factors.
What that means is that the number 2 * 2 * 3 * 5 * 7 is a number containing 4 distinct prime numbers.
I really have no clue how to create the numbers, I don't even know how to think of the it. I want to have 4 different kinds of numbers but in all possible different amounts. Code as far:
""" Pre do prime list """
sieve = [True] * 1000
sieve[0] = sieve[1] = False
def primes(sieve, x):
for i in range(x+x, len(sieve), x):
sieve[i] = False
for x in range(2, int(len(sieve) ** 0.5) + 1):
primes(sieve, x)
PRIMES = list((x for x in range(2, len(sieve)) if sieve[x]))
""" Main """
Numbers = [False] * 10 ** 6
Factors = PRIMES[0] * PRIMES[1] * PRIMES[2] * PRIMES[3]
Numbers[Factors] = True
for prime in PRIMES:
for prime in PRIMES[1:]:
for prime in PRIMES[2:]:
for prime in PRIMES[3:]:

I think the easiest way is to keep track of how many prime factors you have found for each number. You can perform the Sieve of Eratosthenes, but instead of marking multiples of a prime as composite, increment the count of primes dividing them. Make sure that you use an unoptimized loop: Once you choose the prime p, increment the count of primes dividing p, 2*p, 3*p etc. instead of marking p^2, p^2+2*p, etc. composite.
Another possibility is to record the smallest prime factor of each number as you perform the Sieve of Eratosthenes. This lets you find the prime factorization recursively, and you can check which of these have exactly 4 prime factors.

could you do it the other way round: get a list of primes up to root(N) then generate products that are less than N. Something like:
res = {}
for i in range(n):
for j in range(i,n):
for k in range(j,n):
for m in range(k,n):
prod = p[i] * p[j] * p[k] * p[m]
if prod < N:
res[prod] = [p[i], p[j], p[k], p[m]]
ed. just noticed distinct so you would have to put each p[] ** u and iterate each over a suitable number with another four nested loops! Probably still faster to do it this way round.
PPS after a little thought, the above method will be significantly slower than just using the modified sieve as suggested by Douglas Zare. By the time I get to 10**6 my first suggestion takes minutes but the modified sieve is less than 10s
class Numb(object):
def __init__(self):
self.is_prime = True
self.pf = []
def tick(self, factor):
self.is_prime = False
self.pf.append(factor)
N = 1000000
sieve = [Numb() for i in range(N)]
sieve[0].is_prime = sieve[1].is_prime = False
def primes(sieve, x):
for i in range(x + x, len(sieve), x):
sieve[i].tick(x)
for x in range(2, int(len(sieve) ** 0.5) + 1):
if sieve[x].is_prime:
primes(sieve, x)

I continued doing something like this, before trying out sieve method.. And in the end realising that I should use an search algorithm that finds two odd number with 4 distinct factors which differ by 2 and try the number in-between and the number before and after. if this condition is meet the problem is solved.
It's actually falls back on the same problem as stated in post but through some number-theory magic reduces to just finding numbers with no multiples of factors where question conditions is meet.
Factors = list([0, 0, 1, 1, 2, 1, 2, 1, 3, 2]) + [0] * 5 * 10 ** 5
for prime in PRIMES:
Factors[prime] = 1
for number in range(10, 5 * 10 ** 5):
if Factors[number] == 1:
continue
for prime in PRIMES:
if number % prime == 0:
Factors[number] = Factors[prime] + Factors[number // prime]
break

Optimalization of the primes finding function

After 10 minutes of work I have written a function presented below. It returns a list of all primes lower than an argument. I have used all known for me programing and mathematical tricks in order to make this function as fast as possible. To find all the primes lower than a million it takes about 2 seconds.
Do you see any possibilities to optimize it even further? Any ideas?
def Primes(To):
if To<2:
return []
if To<3:
return [2]
Found=[2]
n=3
LastSqr=0
while n<=To:
k=0
Limit=len(Found)
IsPrime=True
while k<Limit:
if k>=LastSqr:
if Found[k]>pow(n,0.5):
LastSqr=k
break
if n%Found[k]==0:
IsPrime=False
break
k+=1
if IsPrime:
Found.append(n)
n+=1
return Found

You can use a couple tricks to speed things up, using the basic sieve of erastothenes. One is to use Wheel Factorization to skip calculating numbers that are known not to be prime. For example, besides 2 and 3, all primes are congruent to 1 or 5 mod 6. This means you don't have to process 4 of every 6 numbers at all.
At the next level, all primes are congruent to 1, 7, 11, 13, 17, 19, 23, or 29, mod 30. You can throw out 22 of every 30 numbers.
Here is a simple implementation of the sieve of Erastothenes that doesn't calculate or store even numbers:
def basic_gen_primes(n):
"""Return a list of all primes less then or equal to n"""
if n < 2:
return []
# The sieve. Each entry i represents (2i + 1)
size = (n + 1) // 2
sieve = [True] * size
# 2(0) + 1 == 1 is not prime
sieve[0] = False
for i, value in enumerate(sieve):
if not value:
continue
p = 2*i + 1
# p is prime. Remove all of its multiples from the sieve
# p^2 == (2i + 1)(2i + 1) == (4i^2 + 4i + 1) == 2(2i^2 + 2i) + 1
multiple = 2 * i * i + 2 * i
if multiple >= size:
break
while multiple < size:
sieve[multiple] = False
multiple += p
return [2] + [2*i+1 for i, value in enumerate(sieve) if value]
As mentioned, you can use more exotic sieves as well.

You can check only odd numbers. So why don't you use n+=2 instead of n+=1?

google and wikipedia for better algorithms. If you are only looking for small primes this might be fast enough. But the real algorithms are a lot faster for large primes.
http://en.wikipedia.org/wiki/Quadratic_sieve
start with that page.
Increment n by two instead of one. ?

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

python prime factorization performance - python

Related

Lucas-Lehmer primality test with python

Optimizing Totient function

An Explanation for the totient finder in this program

Euler Project 47 finding all numbers contaning 4 factors

Optimalization of the primes finding function

Categories

Resources