Williams p+1 integer factorization - python

I am asked to introduce to my classmates the the Williams' p+1 algorithm to factor integers and I don't think I get it right. As far as I understand, this algorithm takes an integer N that factors into primes p,q (N=pq) where p+1 is B-smooth. I understand why, starting from these premises, the algorithm works (I've written proof), but I don't get how to correctly implement and use it. I think it must be implemented as follows:
I take a, randomly, in interval [1,N-1]
I compute x=gcd(a,N). If x !=1, then I return x (I don't get why we don't check first if x is prime because we don't actually know if N really is equal to p*q and x could be composed, right?)
Normally, x == 1, so I have to compute y = gcd(V_M-2,N) where V_0 = 2, V_1 = a, V_n= aV_(n-1) - V_(n-2) . I have found a way to compute V_n doing matrix power modulus N but I don't know which M should I use (I've copied the Pollard's one, but I don't know if that works and why).
If y!=1 and y!=N, I return y (again, as with the x, I think we should check that y is prime, am I right?). Else, just try another random a and start again.
So, that is mainly my question about the implementation, overall concerning the building of M, which might be related to the fact of p+1 B-smoothness I guess.
Regarding the usage, I really don't get in which cases I should use this method and which B should I take. I am going to leave my code in Python3 here and a case example that really is turning me crazy to see if you can give me a hand.
import random
from math import floor, log, gcd
def is_prime(n): #funcion que determina si un numero es primo
for d in range(2,n):
if n%d == 0:
return False
return True
def primes_leq(B): #funcion para obtener los primos que son menor o igual que B
l=[]
for i in range(2,B+1):
if is_prime(i):
l.append(i)
return l
def matrix_square(A, mod):
return mat_mult(A,A,mod)
def mat_mult(A,B, mod):
if mod is not None:
return [[(A[0][0]*B[0][0] + A[0][1]*B[1][0])%mod, (A[0][0]*B[0][1] + A[0][1]*B[1][1])%mod],
[(A[1][0]*B[0][0] + A[1][1]*B[1][0])%mod, (A[1][0]*B[0][1] + A[1][1]*B[1][1])%mod]]
def matrix_pow(M, power, mod):
#Special definition for power=0:
if power <= 0:
return [[1,0],[0,1]]
powers = list(reversed([True if i=="1" else False for i in bin(power)[2:]])) #Order is 1,2,4,8,16,...
matrices = [None for _ in powers]
matrices[0] = M
for i in range(1,len(powers)):
matrices[i] = matrix_square(matrices[i-1], mod)
result = None
for matrix, power in zip(matrices, powers):
if power:
if result is None:
result = matrix
else:
result = mat_mult(result, matrix, mod)
return result
def williams(N, B):
flag = False
while not flag :
a = random.randint(1,N-1)
print("a : " + str(a))
x = gcd(a,N)
print("x : " + str(x))
if x != 1:
return x
else :
M = 1
A = [[0,1],[-1,a]]
for p in primes_leq(B):
M *= p **(floor(log(N,p)))
print("voy por aquí")
C = matrix_pow(A,M,N)
V = 2*C[0][0]+ a*C[0][1]
y = gcd(V-2,N)
print("y : " + str(y))
if y != 1 and y != N:
flag = True
return y
Trying to test my implementation, I've tried to follow some examples to check if my factorization is working alright. For example, I've looked at https://members.loria.fr/PZimmermann/records/Pplus1.html and I've tried williams(2**439-1,10**5) and I'm getting 104110607 but I understand I should be getting 122551752733003055543 (as in the webpage). As far as I understand, both are primes that factor N=2**439-1, but isn't this fact just contradicting the hypothesis of N being a product of two primes p*q?
Thanks for your help, it'll be appreciated

I think you're missing the point of this algorithm...
You find a factor p of N (or a trivial divisor) when M is a multiple of p+1.
If p+1 is smooth -- say all the factors of p+1 are <= B -- then it becomes becomes possible to construct an M that is a multiple of all such possible factors, like this:
M=1
for x in all primes <= B:
let y = largest power of x such that y < N
M = M*y
You should check the successive values of M produced by this loop. Alternatively you could just check all successive factorials. The point is that, in each iteration, you add new factors to M, and when all the factors of p+1 are in M, then M will of course be a multiple of p+1.
The tricky part is that M will very get large, and you can't take M mod N. What you can do, though, is calculate all the VM mod N, and instead of actually calculating each M, you just multiply the subscript of VM by the appropriate factor using the addition formula: Va+b = VaVb - Va-b

2439−1 has more than two prime factors. If you’re not getting
the one you want, you should divide by the one that you got and keep
going on the quotient.

Related

Errors in Directly vs Recursively Calculating a given Fibonacci Number

I was bored at work and was playing with some math and python coding, when I noticed the following:
Recursively (or if using a for loop) you simply add integers together to get a given Fibonacci number. However there is also a direct equation for calculating Fibonacci numbers, and for large n this equation will give answers that are, frankly, quite wrong with respect to the recursively calculated Fibonacci number.
I imagine this is due to rounding and floating point arithmetic ( sqrt(5) is irrational after all), and if so can anyone point me into a direction on how I could modify the fibo_calc_direct function to return a more accurate result?
Thanks!
def fib_calc_recur(n, ii = 0, jj = 1):
#n is the index of the nth fibonacci number, F_n, where F_0 = 0, F_1 = 1, ...
if n == 0: #use recursion
return ii
if n == 1:
return jj
else:
return(fib_calc_recur(n -1, jj, ii + jj))
def fib_calc_direct(n):
a = (1 + np.sqrt(5))/2
b = (1 - np.sqrt(5))/2
f = (1/np.sqrt(5)) * (a**n - b**n)
return(f)
You could make use of Decimal numbers, and set its precision depending on the magninute of n
Not your question, but I'd use an iterative version of the addition method. Here is a script that makes both calculations (naive addition, direct with Decimal) for values of n up to 4000:
def fib_calc_iter(n):
a, b = 0, 1
if n < 2:
return n
for _ in range(1, n):
a, b = b, a + b
return b
from decimal import Decimal, getcontext
def fib_calc_decimal(n):
getcontext().prec = n // 4 + 3 # Choose a precision good enough for this n
sqrt5 = Decimal(5).sqrt()
da = (1 + sqrt5) / 2
db = (1 - sqrt5) / 2
f = (da**n - db**n) / sqrt5
return int(f + Decimal(0.5)) # Round to nearest int
# Test it...
for n in range(1, 4000):
x = fib_calc_iter(n)
y = fib_calc_decimal(n)
if x != y:
print(f"Difference found for n={n}.\nNaive method={x}.\nDecimal method={y}")
break
else:
print("No differences found")

(RSA) This script I got off of stackoverflow returns a negative d value

So I stumbled upon this thread on here with this script and it returns a negative d value and my p and q values are both prime. Any reason for this? Possibly just a faulty script?
def egcd(a, b):
x,y, u,v = 0,1, 1,0
while a != 0:
q, r = b//a, b%a
m, n = x-u*q, y-v*q
b,a, x,y, u,v = a,r, u,v, m,n
gcd = b
return gcd, x, y
def main():
p = 153143042272527868798412612417204434156935146874282990942386694020462861918068684561281763577034706600608387699148071015194725533394126069826857182428660427818277378724977554365910231524827258160904493774748749088477328204812171935987088715261127321911849092207070653272176072509933245978935455542420691737433
q = 156408916769576372285319235535320446340733908943564048157238512311891352879208957302116527435165097143521156600690562005797819820759620198602417583539668686152735534648541252847927334505648478214810780526425005943955838623325525300844493280040860604499838598837599791480284496210333200247148213274376422459183
e = 65537
ct = 313988037963374298820978547334691775209030794488153797919908078268748481143989264914905339615142922814128844328634563572589348152033399603422391976806881268233227257794938078078328711322137471700521343697410517378556947578179313088971194144321604618116160929667545497531855177496472117286033893354292910116962836092382600437895778451279347150269487601855438439995904578842465409043702035314087803621608887259671021452664437398875243519136039772309162874333619819693154364159330510837267059503793075233800618970190874388025990206963764588045741047395830966876247164745591863323438401959588889139372816750244127256609
# compute n
n = p * q
# Compute phi(n)
phi = (p - 1) * (q - 1)
# Compute modular inverse of e
gcd, a, b = egcd(e, phi)
d = a
print( "n: " + str(d) );
# Decrypt ciphertext
pt = pow(ct,d,n)
print( "pt: " + str(pt) )
if __name__ == "__main__":
main()
This can happen, I'll explain why below, but for practical purposes you'll want to know how to fix it. The answer to that is to add phi to d and use that value instead: everything will work as RSA should.
So why does it happen? The algorithm computes the extended gcd. The result of egcd is a*e + b*phi = gcd, and in the case of RSA, we have gcd = 1 so a*e + b*phi = 1.
If you look at this equation modulo phi (which is the order of the multiplicative group), then a*e == 1 mod phi which is what you need to make RSA work. In fact, by the same congruence, you can add or subtract any multiple of phi to a and the congruence still holds.
Now look at the equation again: a*e + b*phi = 1. We know e and phi are positive integers. You can't have all positive integers in this equation or else no way would it add up to 1 (it would be much larger than 1). So that means either a or b is going to be negative. Sometimes it will be a that is negative, other times it will be b. When it is b, then your a comes out as you would expect: a positive integer that you then assign to the value d. But the other times, you get a negative value for a. We don't want that, so simply add phi to it and make that your value of d.

Python: Streamlining Code for Brown Numbers

I was curious if any of you could come up with a more streamline version of code to calculate Brown numbers. as of the moment, this code can do ~650! before it moves to a crawl. Brown Numbers are calculated thought the equation n! + 1 = m**(2) Where M is an integer
brownNum = 8
import math
def squareNum(n):
x = n // 2
seen = set([x])
while x * x != n:
x = (x + (n // x)) // 2
if x in seen: return False
seen.add(x)
return True
while True:
for i in range(math.factorial(brownNum)+1,math.factorial(brownNum)+2):
if squareNum(i) is True:
print("pass")
print(brownNum)
print(math.factorial(brownNum)+1)
break
else:
print(brownNum)
print(math.factorial(brownNum)+1)
brownNum = brownNum + 1
continue
break
print(input(" "))
Sorry, I don't understand the logic behind your code.
I don't understand why you calculate math.factorial(brownNum) 4 times with the same value of brownNum each time through the while True loop. And in the for loop:
for i in range(math.factorial(brownNum)+1,math.factorial(brownNum)+2):
i will only take on the value of math.factorial(brownNum)+1
Anyway, here's my Python 3 code for a brute force search of Brown numbers. It quickly finds the only 3 known pairs, and then proceeds to test all the other numbers under 1000 in around 1.8 seconds on this 2GHz 32 bit machine. After that point you can see it slowing down (it hits 2000 around the 20 second mark) but it will chug along happily until the factorials get too large for your machine to hold.
I print progress information to stderr so that it can be separated from the Brown_number pair output. Also, stderr doesn't require flushing when you don't print a newline, unlike stdout (at least, it doesn't on Linux).
import sys
# Calculate the integer square root of `m` using Newton's method.
# Returns r: r**2 <= m < (r+1)**2
def int_sqrt(m):
if m <= 0:
return 0
n = m << 2
r = n >> (n.bit_length() // 2)
while True:
d = (n // r - r) >> 1
r += d
if -1 <= d <= 1:
break
return r >> 1
# Search for Browns numbers
fac = i = 1
while True:
if i % 100 == 0:
print('\r', i, file=sys.stderr, end='')
fac *= i
n = fac + 1
r = int_sqrt(n)
if r*r == n:
print('\nFound', i, r)
i += 1
You might want to:
pre calculate your square numbers, instead of testing for them on the fly
pre calculate your factorial for each loop iteration num_fac = math.factorial(brownNum) instead of multiple calls
implement your own, memoized, factorial
that should let you run to the hard limits of your machine
one optimization i would make would be to implement a 'wrapper' function around math.factorial that caches previous values of factorial so that as your brownNum increases, factorial doesn't have as much work to do. this is known as 'memoization' in computer science.
edit: found another SO answer with similar intention: Python: Is math.factorial memoized?
You should also initialize the square root more closely to the root.
e = int(math.log(n,4))
x = n//2**e
Because of 4**e <= n <= 4**(e+1) the square root will be between x/2 and x which should yield quadratic convergence of the Heron formula from the first iteration on.

Implementing Pollard's Rho Algorithm for Discrete Logarithms

I am trying to implement Pollard's rho algorithm for computing discrete logarithms based on the description in the book Prime Numbers: A Computational Perspective by Richard Crandall and Carl Pomerance, section 5.2.2, page 232. Here is my Python code:
def dlog(g,t,p):
# l such that g**l == t (mod p), with p prime
# algorithm due to Crandall/Pomerance "Prime Numbers" sec 5.2.2
from fractions import gcd
def inverse(x, p): return pow(x, p-2, p)
def f(xab):
x, a, b = xab[0], xab[1], xab[2]
if x < p/3:
return [(t*x)%p, (a+1)%(p-1), b]
if 2*p/3 < x:
return [(g*x)%p, a, (b+1)%(p-1)]
return [(x*x)%p, (2*a)%(p-1), (2*b)%(p-1)]
i, j, k = 1, [1,0,0], f([1,0,0])
while j[0] <> k[0]:
print i, j, k
i, j, k = i+1, f(j), f(f(k))
print i, j, k
d = gcd(j[1] - k[1], p - 1)
if d == 1: return ((k[2]-j[2]) * inverse(j[1]-k[1],p-1)) % (p-1)
m, l = 0, ((k[2]-j[2]) * inverse(j[1]-k[1],(p-1)/d)) % ((p-1)/d)
while m <= d:
print m, l
if pow(g,l,p) == t: return l
m, l = m+1, (l+((p-1)/d))%(p-1)
return False
The code includes debugging output to show what is happening. You can run the code at http://ideone.com/8lzzOf, where you will also see two test cases. The first test case, which follows the d > 1 path, calculates the correct value. The second test case, which follows the d == 1 path, fails.
Please help me find my error.
Problem 1
One thing that looks suspicious is this function:
def inverse(x, p): return pow(x, p-2, p)
This is computing a modular inverse of x modulo p using Euler's theorem. This is fine if p is prime, but otherwise you need to raise x to the power phi(p)-1.
In your case you are calling this function with a modulo of p-1 which is going to be even and therefore give an incorrect inverse.
As phi(p-1) is hard to compute, it may be better to use the extended Euclidean algorithm for computing the inverse instead.
Problem 2
Running your code for the case g=83, t=566, p=997 produces 977, while you were expecting 147.
In fact 977 is indeed a logarithm of 83 as we can see if we compute:
>>> pow(83,977,997)
566
but it is not the one you were expecting (147).
This is because the Pollard rho method requires g to be a generator of the group. Unfortunately, 83 is not a generator of the group 1,2,..997 because pow(83,166,997)==1. (In other words, after generating 166 elements of the group you start repeating elements.)

Computing Eulers Totient Function

I am trying to find an efficient way to compute Euler's totient function.
What is wrong with this code? It doesn't seem to be working.
def isPrime(a):
return not ( a < 2 or any(a % i == 0 for i in range(2, int(a ** 0.5) + 1)))
def phi(n):
y = 1
for i in range(2,n+1):
if isPrime(i) is True and n % i == 0 is True:
y = y * (1 - 1/i)
else:
continue
return int(y)
Here's a much faster, working way, based on this description on Wikipedia:
Thus if n is a positive integer, then φ(n) is the number of integers k in the range 1 ≤ k ≤ n for which gcd(n, k) = 1.
I'm not saying this is the fastest or cleanest, but it works.
from math import gcd
def phi(n):
amount = 0
for k in range(1, n + 1):
if gcd(n, k) == 1:
amount += 1
return amount
You have three different problems...
y needs to be equal to n as initial value, not 1
As some have mentioned in the comments, don't use integer division
n % i == 0 is True isn't doing what you think because of Python chaining the comparisons! Even if n % i equals 0 then 0 == 0 is True BUT 0 is True is False! Use parens or just get rid of comparing to True since that isn't necessary anyway.
Fixing those problems,
def phi(n):
y = n
for i in range(2,n+1):
if isPrime(i) and n % i == 0:
y *= 1 - 1.0/i
return int(y)
Calculating gcd for every pair in range is not efficient and does not scales. You don't need to iterate throught all the range, if n is not a prime you can check for prime factors up to its square root, refer to https://stackoverflow.com/a/5811176/3393095.
We must then update phi for every prime by phi = phi*(1 - 1/prime).
def totatives(n):
phi = int(n > 1 and n)
for p in range(2, int(n ** .5) + 1):
if not n % p:
phi -= phi // p
while not n % p:
n //= p
#if n is > 1 it means it is prime
if n > 1: phi -= phi // n
return phi
I'm working on a cryptographic library in python and this is what i'm using. gcd() is Euclid's method for calculating greatest common divisor, and phi() is the totient function.
def gcd(a, b):
while b:
a, b=b, a%b
return a
def phi(a):
b=a-1
c=0
while b:
if not gcd(a,b)-1:
c+=1
b-=1
return c
Most implementations mentioned by other users rely on calling a gcd() or isPrime() function. In the case you are going to use the phi() function many times, it pays of to calculated these values before hand. A way of doing this is by using a so called sieve algorithm.
https://stackoverflow.com/a/18997575/7217653 This answer on stackoverflow provides us with a fast way of finding all primes below a given number.
Oke, now we can replace isPrime() with a search in our array.
Now the actual phi function:
Wikipedia gives us a clear example: https://en.wikipedia.org/wiki/Euler%27s_totient_function#Example
phi(36) = phi(2^2 * 3^2) = 36 * (1- 1/2) * (1- 1/3) = 30 * 1/2 * 2/3 = 12
In words, this says that the distinct prime factors of 36 are 2 and 3; half of the thirty-six integers from 1 to 36 are divisible by 2, leaving eighteen; a third of those are divisible by 3, leaving twelve numbers that are coprime to 36. And indeed there are twelve positive integers that are coprime with 36 and lower than 36: 1, 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, and 35.
TL;DR
With other words: We have to find all the prime factors of our number and then multiply these prime factors together using foreach prime_factor: n *= 1 - 1/prime_factor.
import math
MAX = 10**5
# CREDIT TO https://stackoverflow.com/a/18997575/7217653
def sieve_for_primes_to(n):
size = n//2
sieve = [1]*size
limit = int(n**0.5)
for i in range(1,limit):
if sieve[i]:
val = 2*i+1
tmp = ((size-1) - i)//val
sieve[i+val::val] = [0]*tmp
return [2] + [i*2+1 for i, v in enumerate(sieve) if v and i>0]
PRIMES = sieve_for_primes_to(MAX)
print("Primes generated")
def phi(n):
original_n = n
prime_factors = []
prime_index = 0
while n > 1: # As long as there are more factors to be found
p = PRIMES[prime_index]
if (n % p == 0): # is this prime a factor?
prime_factors.append(p)
while math.ceil(n / p) == math.floor(n / p): # as long as we can devide our current number by this factor and it gives back a integer remove it
n = n // p
prime_index += 1
for v in prime_factors: # Now we have the prime factors, we do the same calculation as wikipedia
original_n *= 1 - (1/v)
return int(original_n)
print(phi(36)) # = phi(2**2 * 3**2) = 36 * (1- 1/2) * (1- 1/3) = 36 * 1/2 * 2/3 = 12
It looks like you're trying to use Euler's product formula, but you're not calculating the number of primes which divide a. You're calculating the number of elements relatively prime to a.
In addition, since 1 and i are both integers, so is the division, in this case you always get 0.
With regards to efficiency, I haven't noticed anyone mention that gcd(k,n)=gcd(n-k,n). Using this fact can save roughly half the work needed for the methods involving the use of the gcd. Just start the count with 2 (because 1/n and (n-1)/k will always be irreducible) and add 2 each time the gcd is one.
Here is a shorter implementation of orlp's answer.
from math import gcd
def phi(n): return sum([gcd(n, k)==1 for k in range(1, n+1)])
As others have already mentioned it leaves room for performance optimization.
Actually to calculate phi(any number say n)
We use the Formula
where p are the prime factors of n.
So, you have few mistakes in your code:
1.y should be equal to n
2. For 1/i actually 1 and i both are integers so their evaluation will also be an integer,thus it will lead to wrong results.
Here is the code with required corrections.
def phi(n):
y = n
for i in range(2,n+1):
if isPrime(i) and n % i == 0 :
y -= y/i
else:
continue
return int(y)

Categories