Computing Euler's Totient Function - Python

I am trying to find an efficient way to compute Euler's totient function.
What is wrong with this code? It doesn't seem to be working.
def isPrime(a):
    return not ( a < 2 or any(a % i == 0 for i in range(2, int(a ** 0.5) + 1)))
def phi(n):
    y = 1
    for i in range(2,n+1):
        if isPrime(i) is True and n % i == 0 is True:
            y = y * (1 - 1/i)
        else:
            continue
    return int(y)

Here's a much faster, working way, based on this description on Wikipedia:
Thus if n is a positive integer, then φ(n) is the number of integers k in the range 1 ≤ k ≤ n for which gcd(n, k) = 1.
I'm not saying this is the fastest or cleanest, but it works.
from math import gcd
def phi(n):
    amount = 0
    for k in range(1, n + 1):
        if gcd(n, k) == 1:
            amount += 1
    return amount

You have three different problems...
y needs to be equal to n as initial value, not 1
As some have mentioned in the comments, don't use integer division
n % i == 0 is True isn't doing what you think because of Python chaining the comparisons! Even if n % i equals 0 then 0 == 0 is True BUT 0 is True is False! Use parens or just get rid of comparing to True since that isn't necessary anyway.
Fixing those problems,
def phi(n):
    y = n
    for i in range(2,n+1):
        if isPrime(i) and n % i == 0:
            y *= 1 - 1.0/i
    return int(y)
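The chained-comparison pitfall from the third point can be checked in isolation. This is only a small illustrative sketch; the variable names (`zero`, `chained`, `grouped`, `plain`) are made up for the demonstration and are not part of the original code.

```python
n, i = 36, 2
zero = 0  # held in a variable so that "is" is not applied to a literal

# Python chains comparisons: "a == b is c" means "(a == b) and (b is c)".
# So this is evaluated as (n % i == zero) and (zero is True).
chained = n % i == zero is True

# Parentheses force the comparison to be evaluated first.
grouped = (n % i == zero) is True

# Dropping the comparison to True is the simplest fix.
plain = n % i == zero

print(chained, grouped, plain)
```

Only the chained form is falsy here, which is why the original `if` branch never ran.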

Calculating the gcd for every number in the range is not efficient and does not scale. You don't need to iterate through the whole range: it is enough to test candidate prime factors up to the square root of n, and whatever remains above 1 after dividing those out is itself prime. Refer to https://stackoverflow.com/a/5811176/3393095.
We must then update phi for every prime by phi = phi*(1 - 1/prime).
def totatives(n):
    phi = int(n > 1 and n)
    for p in range(2, int(n ** .5) + 1):
        if not n % p:
            phi -= phi // p
            while not n % p:
                n //= p
    #if n is > 1 it means it is prime
    if n > 1: phi -= phi // n
    return phi

I'm working on a cryptographic library in Python and this is what I'm using. gcd() is Euclid's method for calculating the greatest common divisor, and phi() is the totient function.
def gcd(a, b):
    while b:
        a, b=b, a%b
    return a
def phi(a):
    b=a-1
    c=0
    while b:
        if not gcd(a,b)-1:
            c+=1
        b-=1
    return c

Most implementations mentioned by other users rely on calling a gcd() or isPrime() function. If you are going to use the phi() function many times, it pays off to calculate these values beforehand. One way of doing this is with a so-called sieve algorithm.
https://stackoverflow.com/a/18997575/7217653 This answer on Stack Overflow provides us with a fast way of finding all primes below a given number.
OK, now we can replace isPrime() with a lookup in our array.
Now the actual phi function:
Wikipedia gives us a clear example: https://en.wikipedia.org/wiki/Euler%27s_totient_function#Example
phi(36) = phi(2^2 * 3^2) = 36 * (1- 1/2) * (1- 1/3) = 36 * 1/2 * 2/3 = 12
In words, this says that the distinct prime factors of 36 are 2 and 3; half of the thirty-six integers from 1 to 36 are divisible by 2, leaving eighteen; a third of those are divisible by 3, leaving twelve numbers that are coprime to 36. And indeed there are twelve positive integers that are coprime with 36 and lower than 36: 1, 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, and 35.
TL;DR
In other words: we have to find all the distinct prime factors of our number and then scale the number by each of them in turn, using foreach prime_factor: n *= 1 - 1/prime_factor.
MAX = 10**5
# CREDIT TO https://stackoverflow.com/a/18997575/7217653
def sieve_for_primes_to(n):
    size = n//2
    sieve = [1]*size
    limit = int(n**0.5)
    for i in range(1,limit):
        if sieve[i]:
            val = 2*i+1
            tmp = ((size-1) - i)//val
            sieve[i+val::val] = [0]*tmp
    return [2] + [i*2+1 for i, v in enumerate(sieve) if v and i>0]
PRIMES = sieve_for_primes_to(MAX)
print("Primes generated")
def phi(n):
    # Note: raises IndexError if n has a prime factor larger than MAX
    original_n = n
    prime_factors = []
    prime_index = 0
    while n > 1: # As long as there are more factors to be found
        p = PRIMES[prime_index]
        if (n % p == 0): # is this prime a factor?
            prime_factors.append(p)
            while n % p == 0: # divide this factor out completely (exact integer test, no float division)
                n = n // p
        prime_index += 1
    for v in prime_factors: # Now we have the prime factors, we do the same calculation as wikipedia
        original_n -= original_n // v # same as original_n *= 1 - 1/v, but exact in integers
    return original_n
print(phi(36)) # = phi(2**2 * 3**2) = 36 * (1- 1/2) * (1- 1/3) = 36 * 1/2 * 2/3 = 12

It looks like you're trying to use Euler's product formula, but you're not calculating the number of primes which divide a. You're calculating the number of elements relatively prime to a.
In addition, since 1 and i are both integers, so is the division, in this case you always get 0.

With regards to efficiency, I haven't noticed anyone mention that gcd(k,n)=gcd(n-k,n). Using this fact can save roughly half the work needed for the methods involving the use of the gcd. For n > 2, just start the count with 2 (because 1/n and (n-1)/n will always be irreducible), test only the values 2 ≤ k < n/2, and add 2 each time the gcd is one.
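A minimal sketch of that symmetry trick, assuming n is a positive integer. The function name `phi_half` is made up for illustration. It counts coprime k only below n/2 and doubles the result, which is valid for n > 2 because k = n/2 is never coprime to n there.

```python
from math import gcd

def phi_half(n):
    """Totient via gcd, testing only k < n/2 and using gcd(k, n) == gcd(n - k, n)."""
    if n <= 2:
        return 1  # phi(1) = phi(2) = 1; the pairing argument needs n > 2
    # Every coprime k below n/2 pairs with the coprime n - k above n/2.
    half = sum(1 for k in range(1, (n - 1) // 2 + 1) if gcd(k, n) == 1)
    return 2 * half

print(phi_half(36))
```

This still does a linear number of gcd calls, so it halves the constant factor but does not change the growth rate.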

Here is a shorter implementation of orlp's answer.
from math import gcd
def phi(n): return sum([gcd(n, k)==1 for k in range(1, n+1)])
As others have already mentioned it leaves room for performance optimization.

Actually, to calculate phi of any number n, we use the formula
$$\varphi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right)$$
where p are the distinct prime factors of n.
So, you have a few mistakes in your code:
1. y should be equal to n.
2. For 1/i, both 1 and i are integers, so in Python 2 the result is also an integer (always 0 for i > 1), which leads to wrong results.
Here is the code with required corrections.
def phi(n):
    y = n
    for i in range(2,n+1):
        if isPrime(i) and n % i == 0 :
            y -= y // i   # exact: y is still divisible by i at this point
        else:
            continue
    return int(y)

Related

Errors in Directly vs Recursively Calculating a given Fibonacci Number

I was bored at work and was playing with some math and python coding, when I noticed the following:
Recursively (or if using a for loop) you simply add integers together to get a given Fibonacci number. However there is also a direct equation for calculating Fibonacci numbers, and for large n this equation will give answers that are, frankly, quite wrong with respect to the recursively calculated Fibonacci number.
I imagine this is due to rounding and floating point arithmetic (sqrt(5) is irrational after all), and if so, can anyone point me in a direction on how I could modify the fib_calc_direct function to return a more accurate result?
Thanks!
import numpy as np
def fib_calc_recur(n, ii = 0, jj = 1):
    #n is the index of the nth fibonacci number, F_n, where F_0 = 0, F_1 = 1, ...
    if n == 0: #use recursion
        return ii
    if n == 1:
        return jj
    else:
        return(fib_calc_recur(n -1, jj, ii + jj))
def fib_calc_direct(n):
    a = (1 + np.sqrt(5))/2
    b = (1 - np.sqrt(5))/2
    f = (1/np.sqrt(5)) * (a**n - b**n)
    return(f)
You could make use of Decimal numbers and set their precision depending on the magnitude of n.
Not your question, but I'd use an iterative version of the addition method. Here is a script that makes both calculations (naive addition, direct with Decimal) for values of n up to 4000:
def fib_calc_iter(n):
    a, b = 0, 1
    if n < 2:
        return n
    for _ in range(1, n):
        a, b = b, a + b
    return b
from decimal import Decimal, getcontext
def fib_calc_decimal(n):
    getcontext().prec = n // 4 + 3 # Choose a precision good enough for this n
    sqrt5 = Decimal(5).sqrt()
    da = (1 + sqrt5) / 2
    db = (1 - sqrt5) / 2
    f = (da**n - db**n) / sqrt5
    return int(f + Decimal(0.5)) # Round to nearest int
# Test it...
for n in range(1, 4000):
    x = fib_calc_iter(n)
    y = fib_calc_decimal(n)
    if x != y:
        print(f"Difference found for n={n}.\nNaive method={x}.\nDecimal method={y}")
        break
else:
    print("No differences found")
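To see where ordinary double-precision floats give out, the closed form can be compared against exact integer addition. This is a rough sketch using `math.sqrt` instead of NumPy; the names `fib_exact`, `fib_float` and `first_mismatch` are made up for the illustration. The exact index found may depend on the platform's floating point behaviour, but it cannot lie beyond n = 79, because F_79 is odd and larger than 2**53, so no double can represent it.

```python
from math import sqrt

def fib_exact(n):
    """F_n by repeated integer addition (exact for any n)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_float(n):
    """F_n from the closed form, evaluated in double precision."""
    s = sqrt(5)
    return round((((1 + s) / 2) ** n - ((1 - s) / 2) ** n) / s)

# Smallest n at which the float formula disagrees with the exact value.
first_mismatch = next(n for n in range(1, 200) if fib_exact(n) != fib_float(n))
print(first_mismatch)
```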

Trying to define one of Euler's approximations to pi, getting unsupported operand type(s) for /: 'list' and 'int'

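I am trying to define a function which will approximate pi in Python using one of Euler's methods. His formula is as follows:
$$\frac{\pi}{4} = \frac{3}{4} \cdot \frac{5}{4} \cdot \frac{7}{8} \cdot \frac{11}{12} \cdot \frac{13}{12} \cdot \frac{17}{16} \cdots$$
(the numerators are the odd primes, and each denominator is the multiple of 4 nearest to its numerator).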
My code so far is this:
def pi_euler1(n):
    numerator = list(range(2 , n))
    for i in numerator:
        j = 2
        while i * j <= numerator[-1]:
            if i * j in numerator:
                numerator.remove(i * j)
            j += 1
    for k in numerator:
        if (k + 1) % 4 == 0:
            denominator = k + 1
        else:
            denominator = k - 1
        #Because all primes are odd, both numbers inbetween them are divisible by 2,
        #and by extension 1 of the 2 numbers is divisible by 4
        term = numerator / denominator
I know this is wrong, and also incomplete. I'm just not quite sure what the TypeError that I mentioned earlier actually means. I'm just quite stuck with it, I want to create a list of the terms and then find their products. Am I on the right lines?
Update:
I have worked ways around this, fixing the clearly obvious errors that were prevalent thanks to msconi and Johanc, now with the following code:
import math
def pi_euler1(n):
    numerator = list(range(2 , 13 + math.ceil(n*(math.log(n)+math.log(math.log(n))))))
    denominator=[]
    for i in numerator:
        j = 2
        while i * j <= numerator[-1]:
            if (i * j) in numerator:
                numerator.remove(i * j)
            j += 1
    numerator.remove(2)
    for k in numerator:
        if (k + 1) % 4 == 0:
            denominator.append(k+1)
        else:
            denominator.append(k-1)
    a=1
    for i in range(n):
        a *= numerator[i] / denominator[i]
    return 4*a
This seems to work. When I tried to plot a graph of the errors from pi on a semilogy axes scale, I was getting a domain error, so I needed to change the upper bound of the range to n+1 because log(0) is undefined. Thank you guys.
Here is the code with some small modifications to get it working:
import math
def pi_euler1(n):
    lim = n * n + 4
    numerator = list(range(3, lim, 2))
    for i in numerator:
        j = 3
        while i * j <= numerator[-1]:
            if i * j in numerator:
                numerator.remove(i * j)
            j += 2
    euler_product = 1
    for k in numerator[:n]:
        if (k + 1) % 4 == 0:
            denominator = k + 1
        else:
            denominator = k - 1
        factor = k / denominator
        euler_product *= factor
    return euler_product * 4
print(pi_euler1(3))
print(pi_euler1(10000))
print(math.pi)
Output:
3.28125
3.148427801913721
3.141592653589793
Remarks:
You only want the odd primes, so you can start with a list of odd numbers.
j can start with 3 and increment in steps of 2. In fact, j can start at i because all the multiples of i smaller than i*i are already removed earlier.
In general it is very bad practice to remove elements from the list over which you are iterating. See e.g. this post. Internally, Python uses an index into the list over which it iterates. Coincidentally, this is not a problem in this specific case, because only numbers larger than the current one are removed.
Also, removing elements from a very long list is very slow, as each time the complete list needs to be moved to fill the gap. Therefore, it is better to work with two separate lists.
You didn't calculate the resulting product, nor did you return it.
As you notice, this formula converges very slowly.
As mentioned in the comments, the previous version interpreted n as the limit for the highest prime, while in fact n should be the number of primes. I adapted the code to rectify that. The version above uses a crude limit; the version below tries a tighter approximation for the limit.
Here is a reworked version, without removing from the list you're iterating. Instead of removing elements, it just marks them. This is much faster, so a larger n can be used in a reasonable time:
import math
def pi_euler_v3(n):
    if n < 3:
        lim = 6
    else:
        lim = n*n
        while lim / math.log(lim) / 2 > n:
            lim //= 2
    print(n, lim)
    numerator = list(range(3, lim, 2))
    odd_primes = []
    for i in numerator:
        if i is not None:
            odd_primes.append(i)
            if len(odd_primes) >= n:
                break
            j = i
            while i * j < lim:
                numerator[(i*j-3) // 2] = None
                j += 2
    if len(odd_primes) != n:
        print(f"Wrong limit calculation, only {len(odd_primes)} primes instead of {n}")
    euler_product = 1
    for k in odd_primes:
        denominator = k + 1 if k % 4 == 3 else k - 1
        euler_product *= k / denominator
    return euler_product * 4
print(pi_euler_v3(100000))
print(math.pi)
Output:
3.141752253548891
3.141592653589793
In term = numerator / denominator you are dividing a list by a number, which doesn't make sense. Divide k by the denominator inside the loop, so that each numerator element is used for one factor of the product at a time. Then you can accumulate the factors with term *= k / denominator, after initializing term = 1 before the loop.
Another issue is the first loop, which won't give you the first n prime numbers. For example, for n=3, list(range(2 , n)) = [2]. Therefore, the only prime you will get is 2.

Python 3 Project Euler Run Time

This is my solution to the Project Euler Problem 3:
def max_prime(x):
    for i in range(2,x+1):
        if x%i == 0:
            a = i
            x = x/i
    return a
max_prime(600851475143)
It takes too much time to run. What's the problem?
There are several problems with your code:
If you're using Python 3.x, use // for integer division instead of / (which will return a float).
Your solution doesn't account for the multiplicity of the prime factors. Take 24, whose factorization is 2*2*2*3. You need to divide x by 2 three times before trying the next number.
You don't need to try all the values up to the initial value of x. You can stop once x has reached 1 (you know you have reached the highest divisor at this point).
Once you solve these three problems, your solution will work fine.
==> projecteuler3.py
import eulerlib
def compute():
    n = 600851475143
    while True:
        p = smallest_prime_factor(n)
        if p < n:
            n //= p
        else:
            return str(n)
# Returns the smallest factor of n, which is in the range [2, n]. The result is always prime.
def smallest_prime_factor(n):
    assert n >= 2
    for i in range(2, eulerlib.sqrt(n) + 1):
        if n % i == 0:
            return i
    return n # n itself is prime
if __name__ == "__main__":
    print(compute())
Your solution is trying to iterate up to 600851475143, which isn't necessary. You only need to iterate up to the square root of whatever is left of x after the factors found so far have been divided out.
def max_prime_factor(x):
    largest = 1
    i = 2
    while i * i <= x:
        while x % i == 0: # factor out ALL multiples of i
            largest = i
            x //= i
        i += 1
    # if x > 1, the remainder is a prime larger than every factor found so far
    return x if x > 1 else largest
print(max_prime_factor(600851475143))

An Explanation for the totient finder in this program

I need an explanation for the program suggested in the edit in the first answer over here. It is a program to find the totients of a range of numbers. Can somebody provide a simple explanation? (Ignore the summation part for now, I need to find out how the init method finds the totients.) I know there is an explanation in the answer, but that is an explanation for different programs, I need an explanation for this particular one.
class Totient:
    def __init__(self, n):
        self.totients = [1 for i in range(n)]
        for i in range(2, n):
            if self.totients[i] == 1:
                for j in range(i, n, i):
                    self.totients[j] *= i - 1
                    k = j // i   # integer division, as in the Python 2 original
                    while k % i == 0:
                        self.totients[j] *= i
                        k //= i
    def __call__(self, i):
        return self.totients[i]
if __name__ == '__main__':
    totient = Totient(10000)
    print(sum(map(totient, range(10000))))
It's a variant of the Sieve of Eratosthenes for finding prime numbers.
If you want to know the totient of a single number n, the best way to find it is to factor n and multiply n by (p − 1)/p for each distinct prime factor p. For a squarefree number this is simply the product of 1 less than each factor; for instance, 30 = 2 * 3 * 5, and subtracting 1 from each factor, then multiplying, gives the totient 1 * 2 * 4 = 8. But if you want to find the totients of all the numbers less than a given n, a better approach than factoring each of them is sieving. The idea is simple: Set up an array X from 0 to n, store i in each X_i, then run through the array starting from 2 and whenever X_i = i (which means i is prime) loop over the multiples of i, multiplying each by (i − 1)/i.
Further discussion and code at my blog.
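The sieving idea described above can be sketched in a few lines. This is an illustrative version, not the code from the linked blog; the function name `totients_up_to` is an assumption. Multiplying by (i − 1)/i is done as an exact integer subtraction.

```python
def totients_up_to(n):
    """Return a list phi where phi[k] is Euler's totient of k, for 0 <= k <= n."""
    phi = list(range(n + 1))          # start with X_i = i
    for i in range(2, n + 1):
        if phi[i] == i:               # untouched so far, so i is prime
            for j in range(i, n + 1, i):
                phi[j] -= phi[j] // i  # multiply by (i - 1) / i, exactly
    return phi

phi = totients_up_to(36)
print(phi[30], phi[36])
```

Each entry is visited once per distinct prime factor, which is what makes this cheaper than factoring every number separately.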
I'm not completely sure what the code is doing -- but frankly it looks pretty bad. It is clearly trying to use the fact that Euler's totient function is multiplicative, meaning that if a and b are relatively prime then t(a*b) = t(a)*t(b), together with the fact that if p is a prime then t(p) = p-1. But -- it seems to be using crude trial division to determine such things. If you really want to calculate the totient of all numbers in a given range then you should use an algorithm that sieves the numbers as you go along.
Here is a version which sieves as it goes and exploits the multiplicative nature to the hilt. At each pass through the main loop it starts with a prime, p which hasn't yet been processed. It determines all powers of p <= n and then uses a direct formula for these powers (see https://en.wikipedia.org/wiki/Euler%27s_totient_function ). Once these totients have been added, it forms all possible products <= n of these powers and the numbers for which the totients have been previously computed. This gives a whole slew of numbers to add to the list of previously determined numbers. At most sqrt(n) passes need to be made through the main loop. It runs almost instantly for n = 10000. It returns a list where the ith value is the totient of i (with t(0) = 0 for convenience):
def allTotients(n):
    totients = [None]*(n+1) #totients[i] will contain the t(i)
    totients[0] = 0
    totients[1] = 1
    knownTotients = [] #known in range 2 to n
    p = 2
    while len(knownTotients) < n - 1:
        powers = [p]
        k = 2
        while p ** k <= n:
            powers.append(p ** k)
            k +=1
        totients[p] = p - 1
        for i in range(1,len(powers)):
            totients[powers[i]] = powers[i] - powers[i-1]
        #at this stage powers represent newly discovered totients
        #combine with previously discovered totients to get still more
        newTotients = powers[:]
        for m in knownTotients:
            for pk in powers:
                if m*pk > n: break
                totients[m*pk] = totients[m]*totients[pk]
                newTotients.append(m*pk)
        knownTotients.extend(newTotients)
        #if there are any unknown totients -- the smallest such will be prime
        if len(knownTotients) < n-1:
            p = totients.index(None)
    return totients
For completeness sake, here is a Python implementation of the algorithm to compute the totient of a single number which user448810 described in their answer:
from math import sqrt
#crude factoring algorithm:
small_primes = [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,
                53,59,61,67,71,73,79,83,89,97]
def factor(n):
    #returns a list of prime factors
    factors = []
    num = n
    #first pull out small prime factors
    for p in small_primes:
        while num % p == 0:
            factors.append(p)
            num = num // p
    if num == 1: return factors
    #now do trial division, starting at 101
    k = 101
    while k <= sqrt(num):
        while num % k == 0:
            factors.append(k)
            num = num // k
        k += 2
    if num == 1:
        return factors
    else:
        factors.append(num)
        return factors
def totient(n):
    factors = factor(n)
    unique_factors = set()
    t = 1
    for p in factors:
        if p in unique_factors:
            t *= p
        else:
            unique_factors.add(p)
            t *= (p-1)
    return t

Can someone explain to me this part of Dixon's factorization algorithm?

I've been trying to implement Dixon's factorization method in python, and I'm a bit confused. I know that you need to give some bound B and some number N and search for numbers between sqrtN and N whose squares are B-smooth, meaning all their factors are in the set of primes less than or equal to B. My question is, given N of a certain size, what determines B so that the algorithm will produce non-trivial factors of N? Here is a wikipedia article about the algorithm, and if it helps, here is my code for my implementation:
from math import gcd
def factor(N, B):
    def isBsmooth(n, b):
        factors = []
        for i in b:
            while n % i == 0:
                n = int(n / i)
                if not i in factors:
                    factors.append(i)
        if n == 1 and factors == b:
            return True
        return False
    factor1 = 1
    while factor1 == 1 or factor1 == N:
        Bsmooth = []
        BsmoothMod = []
        for i in range(int(N ** 0.5), N):
            if len(Bsmooth) < 2 and isBsmooth(i ** 2 % N, B):
                Bsmooth.append(i)
                BsmoothMod.append(i ** 2 % N)
        gcd1 = (Bsmooth[0] * Bsmooth[1]) % N
        gcd2 = int((BsmoothMod[0] * BsmoothMod[1]) ** 0.5)
        factor1 = gcd(gcd1 - gcd2, N)
    factor2 = int(N / factor1)
    return (factor1, factor2)
Maybe someone could help clean my code up a bit, too? It seems very inefficient.
This article discusses the optimal size for B: https://web.archive.org/web/20160205002504/https://vmonaco.com/dixons-algorithm-and-the-quadratic-sieve/. Briefly, the optimal value is thought to be exp((logN loglogN)^(1/2)).
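As a rough illustration of that bound, the expression exp((log N · log log N)^(1/2)) can be evaluated directly with natural logarithms. The function name `dixon_bound` is made up here, and the result is only a heuristic starting point for B, not a guarantee that non-trivial factors will be found.

```python
from math import exp, log, sqrt

def dixon_bound(N):
    """Heuristic smoothness bound B = exp(sqrt(ln N * ln ln N)); needs N > e."""
    return exp(sqrt(log(N) * log(log(N))))

for N in (143, 10**6, 10**12):
    print(N, round(dixon_bound(N)))
```

The bound grows much more slowly than N itself, which is where the sub-exponential running time of the method comes from.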
[ I wrote this for a different purpose, but you might find it interesting. ]
Given x² ≡ y² (mod n) with x ≠ ±y, about half the time gcd(x−y, n) is a factor of n. This congruence of squares, observed by Maurice Kraitchik in the 1920s, is the basis for several factoring methods. One of those methods, due to John Dixon, is important in theory because its sub-exponential run time can be proven, though it is too slow to be useful in practice.
Dixon's method begins by choosing a bound b ≈ e^√(log n log log n) and identifying the factor base of all primes p less than b for which n is a quadratic residue modulo p (the Jacobi symbol of n and p is 1).
function factorBase(n, b)
    fb := [2]
    for p in tail(primes(b))
        if jacobi(n, p) == 1
            append p to fb
    return fb
Then repeatedly choose an integer r on the range 1 < r < n, calculate its square modulo n, and if the square is smooth over the factor base add it to a list of relations, stopping when there are more relations than factors in the factor base, plus a small reserve for those cases that fail. The idea is to identify a set of relations, using linear algebra, where the factor base primes combine to form a square. Then take the square root of the product of all the factor base primes in the relations, take the product of the related r, and calculate the gcd to identify the factor.
struct rel(x, ys)
function dixon(n, fb, count)
    r, rels := floor(sqrt(n)), []
    while count > 0
        fs := smooth((r * r) % n, fb)
        if fs is not null
            append rel(r, fs) to rels
            count := count - 1
        r := r + 1
    return rels
A number n is smooth if all its factors are in the factor base, which is determined by trial division; the smooth function returns a list of factors, which is null if n doesn't completely factor over the factor base.
function smooth(n, fb)
    fs := []
    for f in fb
        while n % f == 0
            append f to fs
            n := n / f
        if n == 1 return fs
    return []
A factor is determined by submitting the accumulated relations to the linear algebra of the congruence of square solver.
For example, consider the factorization of 143. Choose r = 17, so r² ≡ 3 (mod 143). Then choose r = 19, so r² ≡ 75 ≡ 3 · 5² (mod 143). Those two relations can be combined as (17 · 19)² ≡ 3² · 5² ≡ 15² (mod 143), and the two factors are gcd(17·19 − 15, 143) = 11 and gcd(17·19 + 15, 143) = 13. This sometimes fails; for instance, the relation 21² ≡ 2² · 3 (mod 143) can be combined with the relation on 19, but the two factors produced, 1 and 143, are trivial.
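The arithmetic of the 143 example can be replayed in a few lines of Python. This is only a check of the numbers quoted above, not an implementation of the linear algebra step; the helper name `split_with_squares` is made up for illustration.

```python
from math import gcd

def split_with_squares(x, y, n):
    """Given x^2 ≡ y^2 (mod n), return the two candidate factors of n."""
    assert (x * x - y * y) % n == 0, "x^2 and y^2 must be congruent mod n"
    return gcd(x - y, n), gcd(x + y, n)

n = 143
# 17^2 ≡ 3 and 19^2 ≡ 3 * 5^2, so (17 * 19)^2 ≡ (3 * 5)^2 (mod 143)
good = split_with_squares(17 * 19, 3 * 5, n)
# 21^2 ≡ 2^2 * 3 and 19^2 ≡ 3 * 5^2, so (21 * 19)^2 ≡ (2 * 3 * 5)^2 (mod 143)
bad = split_with_squares(21 * 19, 2 * 3 * 5, n)

print(good)  # (11, 13)
print(bad)   # (1, 143)
```

The second pair shows why extra relations are collected: some square congruences only yield the trivial factors.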
Thanks for very interesting question!
In pure Python I implemented from scratch Dixon Factorization Algorithm in 3 different flavors:
Using the simplest sieve. I create a u64 array covering all numbers in the range [N; N * 2), each of which stands for a z^2 value. The array holds products of prime numbers. During sieving I iterate over all factor-base primes and do array[k] *= p at those positions k that are divisible by p. Finally, when the sieved array is ready, I check both that a) array index k is a perfect square and b) array[k] == k - N. Condition b) means that the multiplied primes p reproduce the number itself, which is only true if the number is divisible only by factor-base primes, i.e. it is B-smooth. This is the simplest and slowest of my 3 solutions.
Second solution uses SymPy library to factorize every z^2. I iterate all possible z and do sympy.factorint(z * z), this gives factorization of z^2. If this factorization contains only small primes, i.e. from factor base, then I collect such z and z^2 for later processing. This version of algorithm is also slow, but much faster than first one.
The third solution uses the kind of sieving used in the Quadratic Sieve. This sieving process is the fastest of all three algorithms. Basically, it finds all roots of the equation x^2 = N (mod p) for all primes in the factor base. As I have just a few primes, root finding is done with a simple loop over all candidates; for bigger primes one can use the Tonelli-Shanks algorithm for finding roots, which is really fast. Only around 50% of the primes give a root at all, hence only half of the primes are actually used in the Quadratic Sieve. The roots of this equation can be used to generate lots of solutions at once, because root + k * p is also a valid solution for all k. Sieving is done through array[offset(root) :: p] += Log2(p). Here, instead of the multiplication of the first algorithm, I add the logarithm of the prime. First, it is a bit faster to add a number than to multiply. Second, and more importantly, it supports numbers of any size, e.g. even 256-bit, while multiplying is possible only up to 64-bit numbers, because NumPy has no 128-bit or 256-bit integer support. After the logarithms are added, I check which sums are equal to the logarithm of the original z^2 number; these numbers are the final sieved numbers.
After all three algorithms above have sieved the z^2 values, I do the Linear Algebra stage with the Gaussian Elimination algorithm. This stage is meant to find combinations of B-smooth z^2 numbers whose prime factors, after multiplication, give a number in which all prime powers are EVEN.
Let's call a Relation a triple z, z^2, prime factors of z^2. Basically all relations are given to the Gaussian Elimination stage, where even combinations are found.
Even powers of prime numbers give us equality a^2 = b^2 (mod N), from where we can get a factor by doing factor = GCD(a + b, N), here GCD is Greatest Common Divisor found through Euclidean Algorithm. This GCD sometimes gives trivial factors 1 and N, in this case other even combinations should be checked.
To be 100% sure to get even combinations I do Sieving stage till I find a bit more than amount of prime numbers amount of relations, actually around 105% of amount of prime numbers. This extra 5% of relations ensure us that we certainly will get dependent linear equations in Gaussian stage. All these dependent equation form even combinations.
Actually we need a bit more dependent equations, not just 1 more than amount of primes, but around 5%-10% more, only because some (50-60% of them as I can see experimentally) dependencies give only trivial factor 1 or N. Hence extra equations are needed.
Take a look at the console output at the end of my post. It shows everything my program prints. There I run both the 2nd (Sieve_B) and 3rd (Sieve_C) algorithms in parallel (multi-threaded). The 1st one (Sieve_A) is not run by my program because it is so slow that you would wait forever for it to finish.
At the very end of source file you can tweak variable bits = 64 to some other size, like bits = 96. This is amount of bits in composite number N. This N is created as a product of just two random prime numbers of equal size. Such a composite consisting of two equal in size primes is usually called RSA Number.
Also find B = 1 << 10, which sets the degree of B-smoothness: the factor base consists of all primes < B. You may increase this B limit; this makes sieved z^2 values more frequent, so the whole factoring becomes much faster. The only limitation on a huge B is the Linear Algebra stage (Gaussian Elimination), because with a bigger factor base you have to solve more linear equations of bigger size. And my Gauss is not implemented in a very optimal way; for example, instead of keeping one bit per np.uint8 you may pack bits densely into np.uint64, which should make the Linear Algebra stage around 8x faster.
You may also find variable M = 1 << 23, which tells how large is sieving array size, in other words it is block size that is processed at once. Bigger block is a bit faster, but not much. Bigger values of M will not give much difference because it only tells what size of tasks sieving process is split into, it doesn't influence any computation power. More than that bigger M will occupy more memory, so you can't increases it infinitely, only till you have enough memory.
Besides all mentioned above algorithms I also used Fermat Primality Test, also Sieve of Eratosthenes (for generating prime factor base).
Plus also implemented my own algorithm of filtering square numbers. For this I take some composite modulus that looks close to Primorial, like mod = 2 * 2 * 2 * 3 * 3 * 5 * 7 * 11 * 13. And inside boolean array I mark all numbers modulus mod that are squares. Later when any number K should be checked if it is square or not I get flag_array[K % mod] and if it is True then number is "Possibly" squares, while if it is False then number is "Definitely" not square. Thus this filter gives false positives sometimes but never false negatives. This filter checking stage filters out 95% of non-squares, remaining 5% of possibly squares can be double-checked through math.isqrt().
Please click the Try it online! link below to test my program on the online ReplIt server. This will give you the best impression, especially if you have no Python or no personal laptop. My code below can be run straight away after installing the dependencies with python -m pip install numpy sympy.
Try it online!
import threading
def GenPrimes_SieveOfEratosthenes(end):
    import numpy as np
    composites = np.zeros((end,), dtype = np.uint8)
    for p in range(2, len(composites)):
        if composites[p]:
            continue
        if p * p >= end:
            break
        composites[p * p :: p] = 1
    primes = []
    for p in range(2, len(composites)):
        if not composites[p]:
            primes.append(p)
    return np.array(primes, dtype = np.uint32)
def Print(*pargs, __state = (threading.RLock(),), **nargs):
    with __state[0]:
        print(*pargs, flush = True, **nargs)
def IsSquare(n, *, state = []):
    if len(state) == 0:
        import numpy as np
        Print('Pre-computing squares filter...')
        squares_filter = 2 * 2 * 2 * 3 * 3 * 5 * 7 * 11 * 13
        squares = np.zeros((squares_filter,), dtype = np.uint8)
        squares[(np.arange(0, squares_filter, dtype = np.uint64) ** 2) % squares_filter] = 1
        state.extend([squares_filter, squares])
    if not state[1][n % state[0]]:
        return False, None
    import math
    root = math.isqrt(n)
    return root ** 2 == n, root
def FactorRef(x):
    import sympy
    return dict(sorted(sympy.factorint(x).items()))
def CheckZ(z, N, primes):
    z2 = pow(z, 2, N)
    factors = FactorRef(z2)
    assert all(p <= primes[-1] for p in factors), (primes[-1], factors, N, z, z2)
    return z
def SieveSimple(N, primes):
    import time, math, numpy as np
    Print('Simple Sieve of B-smooth z^2...')
    sieve_block = 1 << 21
    rep0_time = 0
    for iiblock, iblock in enumerate(range(N, N * 2, sieve_block)):
        if time.time() - rep0_time >= 30:
            Print(f'Block {iiblock:>3} (2^{math.log2(max(iblock - N, 1)):>5.2f})')
            rep0_time = time.time()
        iblock_end = iblock + sieve_block
        sieve_arr = np.ones((sieve_block,), dtype = np.uint64)
        iblock_modN = iblock % N
        for p in primes:
            mp = 1
            while True:
                if mp * p >= sieve_block:
                    break
                mp *= p
                off = (mp - iblock_modN % mp) % mp
                sieve_arr[off :: mp] *= p
        for i in range(1 if iblock == N else 0, sieve_block):
            num = iblock + i
            z2 = num - N
            if sieve_arr[i] < z2:
                continue
            assert sieve_arr[i] == z2, (sieve_arr[i], round(math.log2(sieve_arr[i]), 3), z2)
            is_square, z = IsSquare(num)
            if not is_square:
                continue
            #Print('z', z, 'z^2', z2)
            yield CheckZ(z, N, primes)
def SieveFactor(N, primes):
    import math
    Print('Factor Sieve of B-smooth z^2...')
    for iz, z in enumerate(range(math.isqrt(N - 1) + 1, math.isqrt(N * 2 - 1) + 1)):
        z2 = z ** 2 - N
        assert 0 <= z2 and z2 < N, (z, z2)
        factors = FactorRef(z2)
        if any(p > primes[-1] for p in factors):
            continue
        #Print('iz', iz, 'z', z, 'z^2', z2, 'z^2 factors', factors)
        yield CheckZ(z, N, primes)
def BinarySearch(begin, end, Test):
    while begin + 1 < end:
        mid = (begin + end - 1) >> 1
        if Test(mid):
            end = mid + 1
        else:
            begin = mid + 1
    assert begin + 1 == end and Test(begin), (begin, end, Test(begin))
    return begin
def ModSqrt(n, p):
    n %= p
    def Ret(x):
        if pow(x, 2, p) != n:
            return []
        nx = (p - x) % p
        if x == nx:
            return [x]
        elif x <= nx:
            return [x, nx]
        else:
            return [nx, x]
    #if p % 4 == 3 and sympy.isprime(p):
    #    return Ret(pow(n, (p + 1) // 4, p))
    for i in range(p):
        if pow(i, 2, p) == n:
            return Ret(i)
    return []
def SieveQuadratic(N, primes):
    import math, numpy as np
    # https://en.wikipedia.org/wiki/Quadratic_sieve
    # https://www.rieselprime.de/ziki/Multiple_polynomial_quadratic_sieve
    M = 1 << 23
    def Log2I(x):
        return int(round(math.log2(max(1, x)) * (1 << 24)))
    def Log2IF(li):
        return li / (1 << 24)
    Print('Quadratic Sieve of B-smooth z^2...')
    plogs = {}
    for p in primes:
        plogs[int(p)] = Log2I(int(p))
    qprimes = []
    B = int(primes[-1]) + 1
    for p in primes:
        p = int(p)
        res = []
        mp = 1
        while True:
            if mp * p >= B:
                break
            mp *= p
            roots = ModSqrt(N, mp)
            if len(roots) == 0:
                if mp == p:
                    break
                continue
            res.append((mp, tuple(roots)))
        if len(res) > 0:
            qprimes.append(res)
    qprimes_lin = np.array([pinfo[0][0] for pinfo in qprimes], dtype = np.uint32)
    yield qprimes_lin
    Print('QSieve num primes', len(qprimes), f'({len(qprimes) * 100 / len(primes):.1f}%)')
    x_begin0 = math.isqrt(N - 1) + 1
    assert N <= x_begin0 ** 2
    for iblock in range(1 << 30):
        if (x_begin0 + (iblock + 1) * M) ** 2 - N >= N:
            break
        x_begin = x_begin0 + iblock * M
        if iblock != 0:
            Print('\n', end = '')
        Print(f'Block {iblock} (2^{math.log2(max(1, x_begin ** 2 - N)):>6.2f})...')
        a = np.zeros((M,), np.uint32)
        for pinfo in qprimes:
            p = pinfo[0][0]
            plog = np.uint32(plogs[p])
            for imp, (mp, roots) in enumerate(pinfo):
                off_done = set()
                for root in roots:
                    for off in range(mp):
                        if ((x_begin + off) ** 2 - N) % mp == 0 and off not in off_done:
                            break
                    else:
                        continue
                    a[off :: mp] += plog
                    off_done.add(off)
        logs = np.log2(np.array((np.arange(M).astype(np.float64) + x_begin) ** 2 - N, dtype = np.float64))
        logs2if = Log2IF(a.astype(np.float64))
        logs_diff = np.abs(logs - logs2if)
        for ix in range(M):
            if logs_diff[ix] > 0.3:
                continue
            z = x_begin + ix
            z2 = z * z - N
            factors = FactorRef(z2)
            assert all(p <= primes[-1] for p, c in factors.items())
            #Print('iz', ix, 'z', z, 'z^2', z2, f'(2^{math.log2(max(1, z2)):>6.2f})', ', z^2 factors', factors)
            yield CheckZ(z, N, primes)
def LinAlg(N, zs, primes):
    import numpy as np
    Print('Linear algebra...')
    Print('Factoring...')
    m = np.zeros((len(zs), len(primes) + len(zs)), dtype = np.uint8)
    def SwapRows(i, j):
        t = np.copy(m[i])
        m[i][...] = m[j][...]
        m[j][...] = t[...]
    def MatToStr(m):
        s = '\n'
        for i in range(len(m)):
            for j in range(len(m[i])):
                s += str(m[i, j])
            s += '\n'
        return s[1:-1]
    for iz, z in enumerate(zs):
        z2 = z * z - N
        fs = FactorRef(z2)
        for p, c in fs.items():
            i = np.searchsorted(primes, p, 'right') - 1
            assert i >= 0 and i < len(primes) and primes[i] == p, (i, primes[i])
            m[iz, i] = (int(m[iz, i]) + c) % 2
        m[iz, len(primes) + iz] = 1
    Print('Gaussian elemination...')
    #Print(MatToStr(m)); Print()
    one_col, one_rows = 0, 0
    while True:
        while True:
            for i in range(one_rows, len(m)):
                if m[i, one_col]:
                    break
            else:
                one_col += 1
                if one_col >= len(primes):
                    break
                continue
            break
        if one_col >= len(primes):
            break
        assert m[i, one_col]
        assert np.all(m[i, :one_col] == 0)
        for j in range(len(m)):
            if i == j:
                continue
            if not m[j, one_col]:
                continue
            m[j][...] ^= m[i][...]
        SwapRows(one_rows, i)
        one_rows += 1
        one_col += 1
    assert np.all(m[one_rows:, :len(primes)] == 0)
    zeros = m[one_rows:, len(primes):]
    Print(f'Even combinations ({len(m) - one_rows}):')
    Print(MatToStr(zeros))
    return zeros
def ProcessResults(N, zs, la_zeros):
    import math
    Print('Computing final results...')
    factors = []
    for i in range(len(la_zeros)):
        zero = la_zeros[i]
        assert len(zero) == len(zs)
        cz = []
        for j in range(len(zero)):
            if not zero[j]:
                continue
            z = zs[j]
            z2 = z * z - N
            cz.append((z, z2, FactorRef(z2)))
        a = 1
        for z, z2, fs in cz:
            a = (a * z) % N
        cnts = {}
        for z, z2, fs in cz:
            for p, c in fs.items():
                cnts[p] = cnts.get(p, 0) + c
        cnts = dict(sorted(cnts.items()))
        b = 1
        for p, c in cnts.items():
            assert c % 2 == 0, (p, c, cnts)
            b = (b * pow(p, c // 2, N)) % N
        factor = math.gcd(a + b, N)
        Print('a', str(a).rjust(len(str(N))), ' b', str(b).rjust(len(str(N))), ' factor', factor if factor != N else 'N')
        if factor != 1 and factor != N:
            factors.append(factor)
    return factors
def SieveCollectResults(N, its):
    import time, threading, queue, traceback, math
    K = len(its)
    qs = [queue.Queue() for i in range(K)]
    last_dot, finish = False, False
    def Get(it, ty, need, compul):
        nonlocal last_dot, finish
        try:
            cnt = 0
            for iz, z in enumerate(it):
                if finish:
                    break
                if iz < 4:
                    z2 = z * z - N
                    Print(('\n' if last_dot else '') + 'Sieve_' + ('C', 'B', 'A')[K - 1 - ty], ' iz', iz,
                        'z', z, 'z^2', z2, f'(2^{math.log2(max(1, z2)):>6.2f})', ', z^2 factors', FactorRef(z2))
                    last_dot = False
                else:
                    Print(('.', 'b', 'a')[K - 1 - ty], end = '')
                    last_dot = True
                qs[ty].put(z)
                cnt += 1
                if cnt >= need:
                    break
        except:
            Print(traceback.format_exc())
    thr = []
    for ty, (it, need, compul) in enumerate(its):
        thr.append(threading.Thread(target = Get, args = (it, ty, need, compul), daemon = True))
        thr[-1].start()
    for ithr, t in enumerate(thr):
        if its[ithr][2]:
            t.join()
    finish = True
    if last_dot:
        Print()
    zs = [[] for i in range(K)]
    for iq, q in enumerate(qs):
        while not qs[iq].empty():
            zs[iq].append(qs[iq].get())
    return zs
def DixonFactor(N):
    import time, math, numpy as np, sys
    B = 1 << 10
    primes = GenPrimes_SieveOfEratosthenes(B)
    Print('Num primes', len(primes), 'last prime', primes[-1])
    IsSquare(0)
    it = SieveQuadratic(N, primes)
    qprimes = next(it)
    zs = SieveCollectResults(N, [
        #(SieveSimple(N, primes), 3, False),
        (SieveFactor(N, primes), 3, False),
        (it, round(len(qprimes) * 1.06 + 0.5), True),
    ])[-1]
    la_zeros = LinAlg(N, zs, qprimes)
    fs = ProcessResults(N, zs, la_zeros)
    if len(fs) > 0:
        Print('Factored, factors', sorted(set(fs)))
    else:
        Print('Failed to factor! Try running program again...')
def IsPrime_Fermat(n, *, ntrials = 32):
    import random
    if n <= 16:
        return n in (2, 3, 5, 7, 11, 13)
    for i in range(ntrials):
        if pow(random.randint(2, n - 2), n - 1, n) != 1:
            return False
    return True
def GenRandom(bits):
    import random
    return random.randrange(1 << (bits - 1), 1 << bits)
def RandPrime(bits):
    while True:
        n = GenRandom(bits) | 1
        if IsPrime_Fermat(n):
            return n
def Main():
    import math
    bits = 64
    N = RandPrime(bits // 2) * RandPrime((bits + 1) // 2)
    Print('N to factor', N, f'(2^{math.log2(N):>5.1f})')
    DixonFactor(N)
if __name__ == '__main__':
    Main()
Console output:
N to factor 10086068308526249063 (2^ 63.1)
Num primes 172 last prime 1021
Pre-computing squares filter...
Quadratic Sieve of B-smooth z^2...
Factor Sieve of B-smooth z^2...
QSieve num primes 78 (45.3%)
Block 0 (2^ 32.14)...
Sieve_C iz 0 z 3175858067 z^2 6153202727426 (2^ 42.48) , z^2 factors {2: 1, 29: 2, 67: 1, 191: 1, 487: 1, 587: 1}
Sieve_C iz 1 z 3175859246 z^2 13641877439453 (2^ 43.63) , z^2 factors {31: 1, 61: 1, 167: 1, 179: 1, 373: 1, 647: 1}
Sieve_C iz 2 z 3175863276 z^2 39239319203113 (2^ 45.16) , z^2 factors {31: 1, 109: 1, 163: 1, 277: 1, 311: 1, 827: 1}
Sieve_C iz 3 z 3175867115 z^2 63623612174162 (2^ 45.85) , z^2 factors {2: 1, 29: 1, 41: 1, 47: 1, 61: 1, 127: 1, 197: 1, 373: 1}
.........................................................................
Sieve_B iz 0 z 3175858067 z^2 6153202727426 (2^ 42.48) , z^2 factors {2: 1, 29: 2, 67: 1, 191: 1, 487: 1, 587: 1}
......
Linear algebra...
Factoring...
Gaussian elemination...
Even combinations (7):
01000000000000000000000000000000000000000000000000001100000000000000000000000000000
11010100000010000100100000010011100000000001001001001001011001000000110001010000000
11001011000101111100011111001011010011000111101000001001011000001111100101001110000
11010010010000110110101100110101000100001100010011100011101000100010011011001001000
00010110111010000010000010000111010001010010111001000011011011101110110001001100100
00000010111000110010100110001111010101001000011010110011101000110001101101100100010
10010001111111101100011110111110110100000110111011010001010001100000010100000100001
Computing final results...
a 9990591196683978238 b 9990591196683978238 factor 1
a 936902490212600845 b 3051457985176300292 factor 3960321451
a 1072293684177681642 b 8576178744296269655 factor 2546780213
a 1578121372922149955 b 1578121372922149955 factor 1
a 2036768191033218175 b 8049300117493030888 factor N
a 1489997751586754228 b 2231890938565281666 factor 3960321451
a 9673227070299809069 b 3412883990935144956 factor 3960321451
Factored, factors [2546780213, 3960321451]
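
To make the last step (ProcessResults) easier to follow, here is a minimal standalone sketch of the same idea on a toy number. It is not the code above: the helper names trial_factor and combine_relations are made up for illustration, factoring is done by plain trial division, and each relation uses z^2 mod N instead of z^2 - N (the two agree whenever sqrt(N) <= z < sqrt(2N), which is the range the sieves above search). Given relations whose exponents sum to even numbers, it builds a and b with a^2 = b^2 (mod N) and takes gcd(a + b, N), as ProcessResults does.

```python
import math

def trial_factor(n):
    """Factor n by trial division; returns {prime: exponent}. Hypothetical helper."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def combine_relations(N, zs):
    """Multiply relations z^2 = (z^2 mod N) (mod N) and return (a, b, gcd(a + b, N))."""
    a = 1
    counts = {}
    for z in zs:
        a = a * z % N  # a is the product of all z, reduced mod N
        for p, c in trial_factor(z * z % N).items():
            counts[p] = counts.get(p, 0) + c
    # The chosen relations must multiply to a perfect square:
    # every prime exponent in the combined product has to be even.
    assert all(c % 2 == 0 for c in counts.values()), counts
    b = 1
    for p, c in counts.items():
        b = b * pow(p, c // 2, N) % N  # b is the square root of that product
    return a, b, math.gcd(a + b, N)

# Toy input: 513^2 mod 84923 = 8400 and 537^2 mod 84923 = 33600,
# both smooth over {2, 3, 5, 7}, and their product is a perfect square.
N = 84923
a, b, factor = combine_relations(N, [513, 537])
print(a, b, factor, N // factor)
```

This should print 20712 16800 521 163, and indeed 521 * 163 = 84923. The same sanity check works on the run above: 2546780213 * 3960321451 = 10086068308526249063, which is the N that was printed at the start.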
