I want to factorize a large number using Fermat's factorization method. This is how I implemented it:
import numpy as np
def fac(n):
x = np.ceil(np.sqrt(n))
y = x*x - n
while not np.sqrt(y).is_integer():
x += 1
y = x*x - n
return(x + np.sqrt(y), x - np.sqrt(y))
Using this method I want to factor N into its components. Note that N=p*q, where p and q are prime.
I chose the following values to compute N:
p = 34058934059834598495823984675767545695711020949846845989934523432842834738974239847294083409583495898523872347284789757987987387543533846141.0
q = 34058934059834598495823984675767545695711020949846845989934523432842834738974239847294083409583495898523872347284789757987987387543533845933.0
and defined N
N = p*q
Now I factor N:
r = fac(n)
However, the factorization seems to not be correct:
int(r[0])*int(r[1]) == N
It does work for smaller ints:
fac(65537)
Out[1]: (65537.0, 1.0)
I'm quite sure the reason is numerical precision at some point.
I tried calculating N in numpy using object types:
N = np.dot(np.array(p).astype(object), np.array(q).astype(object))
but it doesn't help. Still, the numpy requires a float for the sqrt function.
I also tried using the math library instead of numpy, this library seems to not require a float for its sqrt function, but ultimately running into precision issues as well.
Python int are multiple precision numbers. But numpy is a wrapper around C low level libraries to speed up operations. The downside is that it cannot handle those multi-precision numbers. Worse, if you try to use np.sqrt on them, they will be converted to floating point numbers (C double or numpy float64) what have a precision of about 15 decimal digits.
But as Python int type is already a multiprecision type, you could use math.sqrt to get an approximative value of the true square root, and then use Newton to find a closer value:
def isqrt(n):
x = int(math.sqrt(n))
old = None
while True:
d = (n - x * x) // (2 * x)
if d == 0: break
if d == 1: # infinite loop prevention
if old is None:
old = 1
else: break
x += d
return x
Using it, your fac function could become:
def fac(n):
x = isqrt(n)
if x*x < n: x += 1
y = x * x - n
while True:
z = isqrt(y)
if z*z == y: break
x += 1
y = x*x -n
return x+z, x-z
Demo:
p = 34058934059834598495823984675767545695711020949846845989934523432842834738974239847294083409583495898523872347284789757987987387543533846141
q = 34058934059834598495823984675767545695711020949846845989934523432842834738974239847294083409583495898523872347284789757987987387543533845933
N = p*q
print(fac(N) == (p,q))
prints as expected True
Given some function f, I want to compute the following sum using sympy:
In general I want to use the index of summation as the order of differentiation of the function but I could not find out how to do it with sympy.
Given n is an int you know in advance, you can construct a function:
from sympy import diff
def sum_diff_order(f,x,n):
g = 0
for i in range(n+1):
g += diff(f,x,i)
return g
So if you take f to be x**10 and n=5, we get:
>>> x = symbols('x')
>>> f = x**10
>>> sum_diff_order(f,x,5)
x**10 + 10*x**9 + 90*x**8 + 720*x**7 + 5040*x**6 + 30240*x**5
import sympy as sp
x = sp.symbols('x')
f = sp.Function('f')
n = 2
sum([f(x).diff(x,i) for i in range(n+1)])
f(x) + Derivative(f(x), x) + Derivative(f(x), x, x)
If n is an known integer, you can use something like Add(*[diff(f(x), x, i) for i in range(n+1)]). For symbolic n or infinity, it isn't possible yet, as there is no way yet to represent derivatives of symbolic order.
Does some standard Python module contain a function to compute modular multiplicative inverse of a number, i.e. a number y = invmod(x, p) such that x*y == 1 (mod p)? Google doesn't seem to give any good hints on this.
Of course, one can come up with home-brewed 10-liner of extended Euclidean algorithm, but why reinvent the wheel.
For example, Java's BigInteger has modInverse method. Doesn't Python have something similar?
Python 3.8+
y = pow(x, -1, p)
Python 3.7 and earlier
Maybe someone will find this useful (from wikibooks):
def egcd(a, b):
if a == 0:
return (b, 0, 1)
else:
g, y, x = egcd(b % a, a)
return (g, x - (b // a) * y, y)
def modinv(a, m):
g, x, y = egcd(a, m)
if g != 1:
raise Exception('modular inverse does not exist')
else:
return x % m
If your modulus is prime (you call it p) then you may simply compute:
y = x**(p-2) mod p # Pseudocode
Or in Python proper:
y = pow(x, p-2, p)
Here is someone who has implemented some number theory capabilities in Python: http://www.math.umbc.edu/~campbell/Computers/Python/numbthy.html
Here is an example done at the prompt:
m = 1000000007
x = 1234567
y = pow(x,m-2,m)
y
989145189L
x*y
1221166008548163L
x*y % m
1L
You might also want to look at the gmpy module. It is an interface between Python and the GMP multiple-precision library. gmpy provides an invert function that does exactly what you need:
>>> import gmpy
>>> gmpy.invert(1234567, 1000000007)
mpz(989145189)
Updated answer
As noted by #hyh , the gmpy.invert() returns 0 if the inverse does not exist. That matches the behavior of GMP's mpz_invert() function. gmpy.divm(a, b, m) provides a general solution to a=bx (mod m).
>>> gmpy.divm(1, 1234567, 1000000007)
mpz(989145189)
>>> gmpy.divm(1, 0, 5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: not invertible
>>> gmpy.divm(1, 4, 8)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: not invertible
>>> gmpy.divm(1, 4, 9)
mpz(7)
divm() will return a solution when gcd(b,m) == 1 and raises an exception when the multiplicative inverse does not exist.
Disclaimer: I'm the current maintainer of the gmpy library.
Updated answer 2
gmpy2 now properly raises an exception when the inverse does not exists:
>>> import gmpy2
>>> gmpy2.invert(0,5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: invert() no inverse exists
As of 3.8 pythons pow() function can take a modulus and a negative integer. See here. Their case for how to use it is
>>> pow(38, -1, 97)
23
>>> 23 * 38 % 97 == 1
True
Here is a one-liner for CodeFights; it is one of the shortest solutions:
MMI = lambda A, n,s=1,t=0,N=0: (n < 2 and t%N or MMI(n, A%n, t, s-A//n*t, N or n),-1)[n<1]
It will return -1 if A has no multiplicative inverse in n.
Usage:
MMI(23, 99) # returns 56
MMI(18, 24) # return -1
The solution uses the Extended Euclidean Algorithm.
Sympy, a python module for symbolic mathematics, has a built-in modular inverse function if you don't want to implement your own (or if you're using Sympy already):
from sympy import mod_inverse
mod_inverse(11, 35) # returns 16
mod_inverse(15, 35) # raises ValueError: 'inverse of 15 (mod 35) does not exist'
This doesn't seem to be documented on the Sympy website, but here's the docstring: Sympy mod_inverse docstring on Github
Here is a concise 1-liner that does it, without using any external libraries.
# Given 0<a<b, returns the unique c such that 0<c<b and a*c == gcd(a,b) (mod b).
# In particular, if a,b are relatively prime, returns the inverse of a modulo b.
def invmod(a,b): return 0 if a==0 else 1 if b%a==0 else b - invmod(b%a,a)*b//a
Note that this is really just egcd, streamlined to return only the single coefficient of interest.
I try different solutions from this thread and in the end I use this one:
def egcd(a, b):
lastremainder, remainder = abs(a), abs(b)
x, lastx, y, lasty = 0, 1, 1, 0
while remainder:
lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder)
x, lastx = lastx - quotient*x, x
y, lasty = lasty - quotient*y, y
return lastremainder, lastx * (-1 if a < 0 else 1), lasty * (-1 if b < 0 else 1)
def modinv(a, m):
g, x, y = self.egcd(a, m)
if g != 1:
raise ValueError('modinv for {} does not exist'.format(a))
return x % m
Modular_inverse in Python
Here is my code, it might be sloppy but it seems to work for me anyway.
# a is the number you want the inverse for
# b is the modulus
def mod_inverse(a, b):
r = -1
B = b
A = a
eq_set = []
full_set = []
mod_set = []
#euclid's algorithm
while r!=1 and r!=0:
r = b%a
q = b//a
eq_set = [r, b, a, q*-1]
b = a
a = r
full_set.append(eq_set)
for i in range(0, 4):
mod_set.append(full_set[-1][i])
mod_set.insert(2, 1)
counter = 0
#extended euclid's algorithm
for i in range(1, len(full_set)):
if counter%2 == 0:
mod_set[2] = full_set[-1*(i+1)][3]*mod_set[4]+mod_set[2]
mod_set[3] = full_set[-1*(i+1)][1]
elif counter%2 != 0:
mod_set[4] = full_set[-1*(i+1)][3]*mod_set[2]+mod_set[4]
mod_set[1] = full_set[-1*(i+1)][1]
counter += 1
if mod_set[3] == B:
return mod_set[2]%B
return mod_set[4]%B
The code above will not run in python3 and is less efficient compared to the GCD variants. However, this code is very transparent. It triggered me to create a more compact version:
def imod(a, n):
c = 1
while (c % a > 0):
c += n
return c // a
from the cpython implementation source code:
def invmod(a, n):
b, c = 1, 0
while n:
q, r = divmod(a, n)
a, b, c, n = n, c, b - q*c, r
# at this point a is the gcd of the original inputs
if a == 1:
return b
raise ValueError("Not invertible")
according to the comment above this code, it can return small negative values, so you could potentially check if negative and add n when negative before returning b.
To figure out the modular multiplicative inverse I recommend using the Extended Euclidean Algorithm like this:
def multiplicative_inverse(a, b):
origA = a
X = 0
prevX = 1
Y = 1
prevY = 0
while b != 0:
temp = b
quotient = a/b
b = a%b
a = temp
temp = X
a = prevX - quotient * X
prevX = temp
temp = Y
Y = prevY - quotient * Y
prevY = temp
return origA + prevY
Well, here's a function in C which you can easily convert to python. In the below c function extended euclidian algorithm is used to calculate inverse mod.
int imod(int a,int n){
int c,i=1;
while(1){
c = n * i + 1;
if(c%a==0){
c = c/a;
break;
}
i++;
}
return c;}
Translates to Python Function
def imod(a,n):
i=1
while True:
c = n * i + 1;
if(c%a==0):
c = c/a
break;
i = i+1
return c
Reference to the above C function is taken from the following link C program to find Modular Multiplicative Inverse of two Relatively Prime Numbers
New to Python and not sure why my fermat factorisation method is failing? I think it may have something to do with the way large numbers are being implemented but I don't know enough about the language to determine where I'm going wrong.
The code below works when n=p*q is made with p and q extremely close (as in within about 20 of each other) but seems to run forever if they are further apart. For example, with n=991*997 the code works correctly and executes in <1s, likewise for n=104729*104659. If I change it ton=103591*104659 however, it just runs forever (well, I let it go 2 hours then stopped it).
Any points in the right direction would be greatly appreciated!
Code:
import math
def isqrt(n):
x = n
y = (x + n // x) // 2
while y < x:
x = y
y = (x + n // x) // 2
return x
n=103591*104729
a=isqrt(n) + 1
b2=a*a - n
b=isqrt(b2)
while b*b!=b2:
a=a+1
b2=b2+2*a+1
b=isqrt(b2)
p=a+b
q=a-b
print('a=',a,'\n')
print('b=',b,'\n')
print('p=',p,'\n')
print('q=',q,'\n')
print('pq=',p*q,'\n')
print('n=',n,'\n')
print('diff=',n-p*q,'\n')
I looked up the algorithm on Wikipedia and this works for me:
#from math import ceil
def isqrt(n):
x = n
y = (x + n // x) // 2
while y < x:
x = y
y = (x + n // x) // 2
return x
def fermat(n, verbose=True):
a = isqrt(n) # int(ceil(n**0.5))
b2 = a*a - n
b = isqrt(n) # int(b2**0.5)
count = 0
while b*b != b2:
if verbose:
print('Trying: a=%s b2=%s b=%s' % (a, b2, b))
a = a + 1
b2 = a*a - n
b = isqrt(b2) # int(b2**0.5)
count += 1
p=a+b
q=a-b
assert n == p * q
print('a=',a)
print('b=',b)
print('p=',p)
print('q=',q)
print('pq=',p*q)
return p, q
n=103591*104729
fermat(n)
I tried a couple test cases. This one is from the wikipedia page:
>>> fermat(5959)
Trying: a=78 b2=125 b=11
Trying: a=79 b2=282 b=16
a= 80
b= 21
p= 101
q= 59
pq= 5959
(101, 59)
This one is your sample case:
>>> fermat(103591*104729)
Trying: a=104159 b2=115442 b=339
a= 104160
b= 569
p= 104729
q= 103591
pq= 10848981839
(104729, 103591)
Looking at the lines labeled "Trying" shows that, in both cases, it converges quite quickly.
UPDATE: Your very long integer from the comments factors as follows:
n_long=316033277426326097045474758505704980910037958719395560565571239100878192955228495343184968305477308460190076404967552110644822298179716669689426595435572597197633507818204621591917460417859294285475630901332588545477552125047019022149746524843545923758425353103063134585375275638257720039414711534847429265419
fermat(n_long, verbose=False)
a= 17777324810733646969488445787976391269105128850805128551409042425916175469326288448917184096591563031034494377135896478412527365012246902424894591094668262
b= 157517855001095328119226302991766503492827415095855495279739107269808590287074235
p= 17777324810733646969488445787976391269105128850805128551409042425916175469483806303918279424710789334026260880628723893508382860291986009694703181381742497
q= 17777324810733646969488445787976391269105128850805128551409042425916175469168770593916088768472336728042727873643069063316671869732507795155086000807594027
pq= 316033277426326097045474758505704980910037958719395560565571239100878192955228495343184968305477308460190076404967552110644822298179716669689426595435572597197633507818204621591917460417859294285475630901332588545477552125047019022149746524843545923758425353103063134585375275638257720039414711534847429265419
The error was doing the addition after incremeting a so the new value was not the square of a.
This works as intended :
while b*b!=b2:
b2+=2*a+1
a=a+1
b=isqrt(b2)
for big numbers it should be faster than computing the square which has quite a greater number of digits.