I know there's already a question similar to this, but I want to speed it up using GMPY2 (or something similar with GMP).
Here is my current code, it's decent but can it be better?
Edit: new code, checks divisors 2 and 3
def factors(n):
result = set()
result |= {mpz(1), mpz(n)}
def all_multiples(result, n, factor):
z = mpz(n)
while gmpy2.f_mod(mpz(z), factor) == 0:
z = gmpy2.divexact(z, factor)
result |= {mpz(factor), z}
return result
result = all_multiples(result, n, 2)
result = all_multiples(result, n, 3)
for i in range(1, gmpy2.isqrt(n) + 1, 6):
i1 = mpz(i) + 1
i2 = mpz(i) + 5
div1, mod1 = gmpy2.f_divmod(n, i1)
div2, mod2 = gmpy2.f_divmod(n, i2)
if mod1 == 0:
result |= {i1, div1}
if mod2 == 0:
result |= {i2, div2}
return result
If it's possible, I'm also interested in an implementation with divisors only within n^(1/3) and 2^(2/3)*n(1/3)
As an example, mathematica's factor() is much faster than the python code. I want to factor numbers between 20 and 50 decimal digits. I know ggnfs can factor these in less than 5 seconds.
I am interested if any module implementing fast factorization exists in python too.
I just made some quick changes to your code to eliminate redundant name lookups. The algorithm is still the same but it is about twice as fast on my computer.
import gmpy2
from gmpy2 import mpz
def factors(n):
result = set()
n = mpz(n)
for i in range(1, gmpy2.isqrt(n) + 1):
div, mod = divmod(n, i)
if not mod:
result |= {mpz(i), div}
return result
print(factors(12345678901234567))
Other suggestions will need more information about the size of the numbers, etc. For example, if you need all the possible factors, it may be faster to construct those from all the prime factors. That approach will let you decrease the limit of the range statement as you proceed and also will let you increment by 2 (after removing all the factors of 2).
Update 1
I've made some additional changes to your code. I don't think your all_multiplies() function is correct. Your range() statement isn't optimal since 2 is check again but my first fix made it worse.
The new code delays computing the co-factor until it knows the remainder is 0. I also tried to use the built-in functions as much as possible. For example, mpz % integer is faster than gmpy2.f_mod(mpz, integer) or gmpy2.f_mod(integer, mpz) where integer is a normal Python integer.
import gmpy2
from gmpy2 import mpz, isqrt
def factors(n):
n = mpz(n)
result = set()
result |= {mpz(1), n}
def all_multiples(result, n, factor):
z = n
f = mpz(factor)
while z % f == 0:
result |= {f, z // f}
f += factor
return result
result = all_multiples(result, n, 2)
result = all_multiples(result, n, 3)
for i in range(1, isqrt(n) + 1, 6):
i1 = i + 1
i2 = i + 5
if not n % i1:
result |= {mpz(i1), n // i1}
if not n % i2:
result |= {mpz(i2), n // i2}
return result
print(factors(12345678901234567))
I would change your program to just find all the prime factors less than the square root of n and then construct all the co-factors later. Then you decrease n each time you find a factor, check if n is prime, and only look for more factors if n isn't prime.
Update 2
The pyecm module should be able to factor the size numbers you are trying to factor. The following example completes in about a second.
>>> import pyecm
>>> list(pyecm.factors(12345678901234567890123456789012345678901, False, True, 10, 1))
[mpz(29), mpz(43), mpz(43), mpz(55202177), mpz(2928109491677), mpz(1424415039563189)]
There exist different Python factoring modules in the Internet. But if you want to implement factoring yourself (without using external libraries) then I can suggest quite fast and very easy to implement Pollard-Rho Algorithm. I implemented it fully in my code below, you just scroll down directly to my code (at the bottom of answer) if you don't want to read.
With great probability Pollard-Rho algorithm finds smallest non-trivial factor P (not equal to 1 or N) within time of O(Sqrt(P)). To compare, Trial Division algorithm that you implemented in your question takes O(P) time to find factor P. It means for example if a prime factor P = 1 000 003 then trial division will find it after 1 000 003 division operations, while Pollard-Rho on average will find it just after 1 000 operations (Sqrt(1 000 003) = 1 000), which is much much faster.
To make Pollard-Rho algorithm much faster we should be able to detect prime numbers, to exclude them from factoring and don't wait unnecessarily time, for that in my code I used Fermat Primality Test which is very fast and easy to implement within just 7-9 lines of code.
Pollard-Rho algorithm itself is very short, 13-15 lines of code, you can see it at the very bottom of my pollard_rho_factor() function, the remaining lines of code are supplementary helpers-functions.
I implemented all algorithms from scratch without using extra libraries (except random module). That's why you can see my gcd() function there although you can use built-in Python's math.gcd() instead (which finds Greatest Common Divisor).
You can see function Int() in my code, it is used just to convert Python's integers to GMPY2. GMPY2 ints will make algorithm faster, you can just use Python's int(x) instead. I didn't use any specific GMPY2 function, just converted all ints to GMPY2 ints to have around 50% speedup.
As an example I factor first 190 digits of Pi!!! It takes 3-15 seconds to factor them. Pollard-Rho algorithm is randomized hence it takes different time to factor same number on each run. You can restart program again and see that it will print different running time.
Of course factoring time depends greatly on size of prime divisors. Some 50-200 digits numbers can be factoring within fraction of second, some will take months. My example 190 digits of Pi has quite small prime factors, except largest one, that's why it is fast. Other digits of Pi may be not that fast to factor. So digit-size of number doesn't matter very much, only size of prime factors matter.
I intentionally implemented pollard_rho_factor() function as one standalone function, without breaking it into smaller separate functions. Although it breaks Python's style guide, which (as I remember) suggests not to have nested functions and place all possible functions at global scope. Also style guide suggests to do all imports at global scope in first lines of script. I did single function intentionally so that it is easy copy-pastable and fully ready to use in your code. Fermat primality test is_fermat_probable_prime() sub-function is also copy pastable and works without extra dependencies.
In very rare cases Pollard-Rho algorithm may fail to find non-trivial prime factor, especially for very small factors, for example you can replace n inside test() with small number 4 and see that Pollard-Rho fails. For such small failed factors you can easily use your Trial Division algorithm that you implemented in your question.
Try it online!
def pollard_rho_factor(N, *, trials = 16):
# https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
import math, random
def Int(x):
import gmpy2
return gmpy2.mpz(x) # int(x)
def is_fermat_probable_prime(n, *, trials = 32):
# https://en.wikipedia.org/wiki/Fermat_primality_test
import random
if n <= 16:
return n in (2, 3, 5, 7, 11, 13)
for i in range(trials):
if pow(random.randint(2, n - 2), n - 1, n) != 1:
return False
return True
def gcd(a, b):
# https://en.wikipedia.org/wiki/Greatest_common_divisor
# https://en.wikipedia.org/wiki/Euclidean_algorithm
while b != 0:
a, b = b, a % b
return a
def found(f, prime):
print(f'Found {("composite", "prime")[prime]} factor, {math.log2(f):>7.03f} bits... {("Pollard-Rho failed to fully factor it!", "")[prime]}')
return f
N = Int(N)
if N <= 1:
return []
if is_fermat_probable_prime(N):
return [found(N, True)]
for j in range(trials):
i, stage, y, x = 0, 2, Int(1), Int(random.randint(1, N - 2))
while True:
r = gcd(N, abs(x - y))
if r != 1:
break
if i == stage:
y = x
stage <<= 1
x = (x * x + 1) % N
i += 1
if r != N:
return sorted(pollard_rho_factor(r) + pollard_rho_factor(N // r))
return [found(N, False)] # Pollard-Rho failed
def test():
import time
# http://www.math.com/tables/constants/pi.htm
# pi = 3.
# 1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
# 8214808651 3282306647 0938446095 5058223172 5359408128 4811174502 8410270193 8521105559 6446229489 5493038196
# n = first 190 fractional digits of Pi
n = 1415926535_8979323846_2643383279_5028841971_6939937510_5820974944_5923078164_0628620899_8628034825_3421170679_8214808651_3282306647_0938446095_5058223172_5359408128_4811174502_8410270193_8521105559_6446229489
tb = time.time()
print('N:', n)
print('Factors:', pollard_rho_factor(n))
print(f'Time: {time.time() - tb:.03f} sec')
test()
Output:
N: 1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489
Found prime factor, 1.585 bits...
Found prime factor, 6.150 bits...
Found prime factor, 20.020 bits...
Found prime factor, 27.193 bits...
Found prime factor, 28.311 bits...
Found prime factor, 545.087 bits...
Factors: [mpz(3), mpz(71), mpz(1063541), mpz(153422959), mpz(332958319), mpz(122356390229851897378935483485536580757336676443481705501726535578690975860555141829117483263572548187951860901335596150415443615382488933330968669408906073630300473)]
Time: 2.963 sec
I'm relatively new to the python world, and the coding world in general, so I'm not really sure how to go about optimizing my python script. The script that I have is as follows:
import math
z = 1
x = 0
while z != 0:
x = x+1
if x == 500:
z = 0
calculated = open('Prime_Numbers.txt', 'r')
readlines = calculated.readlines()
calculated.close()
a = len(readlines)
b = readlines[(a-1)]
b = int(b) + 1
for num in range(b, (b+1000)):
prime = True
calculated = open('Prime_Numbers.txt', 'r')
for i in calculated:
i = int(i)
q = math.ceil(num/2)
if (q%i==0):
prime = False
if prime:
calculated.close()
writeto = open('Prime_Numbers.txt', 'a')
num = str(num)
writeto.write("\n" + num)
writeto.close()
print(num)
As some of you can probably guess I'm calculating prime numbers. The external file that it calls on contains all the prime numbers between 2 and 20.
The reason that I've got the while loop in there is that I wanted to be able to control how long it ran for.
If you have any suggestions for cutting out any clutter in there could you please respond and let me know, thanks.
Reading and writing to files is very, very slow compared to operations with integers. Your algorithm can be sped up 100-fold by just ripping out all the file I/O:
import itertools
primes = {2} # A set containing only 2
for n in itertools.count(3): # Start counting from 3, by 1
for prime in primes: # For every prime less than n
if n % prime == 0: # If it divides n
break # Then n is composite
else:
primes.add(n) # Otherwise, it is prime
print(n)
A much faster prime-generating algorithm would be a sieve. Here's the Sieve of Eratosthenes, in Python 3:
end = int(input('Generate primes up to: '))
numbers = {n: True for n in range(2, end)} # Assume every number is prime, and then
for n, is_prime in numbers.items(): # (Python 3 only)
if not is_prime:
continue # For every prime number
for i in range(n ** 2, end, n): # Cross off its multiples
numbers[i] = False
print(n)
It is very inefficient to keep storing and loading all primes from a file. In general file access is very slow. Instead save the primes to a list or deque. For this initialize calculated = deque() and then simply add new primes with calculated.append(num). At the same time output your primes with print(num) and pipe the result to a file.
When you found out that num is not a prime, you do not have to keep checking all the other divisors. So break from the inner loop:
if q%i == 0:
prime = False
break
You do not need to go through all previous primes to check for a new prime. Since each non-prime needs to factorize into two integers, at least one of the factors has to be smaller or equal sqrt(num). So limit your search to these divisors.
Also the first part of your code irritates me.
z = 1
x = 0
while z != 0:
x = x+1
if x == 500:
z = 0
This part seems to do the same as:
for x in range(500):
Also you limit with x to 500 primes, why don't you simply use a counter instead, that you increase if a prime is found and check for at the same time, breaking if the limit is reached? This would be more readable in my opinion.
In general you do not need to introduce a limit. You can simply abort the program at any point in time by hitting Ctrl+C.
However, as others already pointed out, your chosen algorithm will perform very poor for medium or large primes. There are more efficient algorithms to find prime numbers: https://en.wikipedia.org/wiki/Generating_primes, especially https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes.
You're writing a blank line to your file, which is making int() traceback. Also, I'm guessing you need to rstrip() off your newlines.
I'd suggest using two different files - one for initial values, and one for all values - initial and recently computed.
If you can keep your values in memory a while, that'd be a lot faster than going through a file repeatedly. But of course, this will limit the size of the primes you can compute, so for larger values you might return to the iterate-through-the-file method if you want.
For computing primes of modest size, a sieve is actually quite good, and worth a google.
When you get into larger primes, trial division by the first n primes is good, followed by m rounds of Miller-Rabin. If Miller-Rabin probabilistically indicates the number is probably a prime, then you do complete trial division or AKS or similar. Miller Rabin can say "This is probably a prime" or "this is definitely composite". AKS gives a definitive answer, but it's slower.
FWIW, I've got a bunch of prime-related code collected together at http://stromberg.dnsalias.org/~dstromberg/primes/
My assignment is to create a function to sum the powers of tuples.
def sumOfPowers(tups, primes):
x = 0;
for i in range (1, len(primes) + 1):
x += pow(tups, i);
return x;
So far I have this.
tups - list of one or more tuples, primes - list of one or more primes
It doesn't work because the inputs are tuples and not single integers. How could I fix this to make it work for lists?
[/edit]
Sample output:
sumOfPowers([(2,3), (5,6)], [3,5,7,11,13,17,19,23,29]) == 2**3 + 5**6
True
sumOfPowers([(2,10**1000000 + 1), (-2,10**1000000 + 1), (3,3)], primes)
27
Sum of powers of [(2,4),(3,5),(-6,3)] is 2^4 + 3^5 + (−6)^3
**The purpose of the prime is to perform the computation of a^k1 + ... a^kn modulo every prime in the list entered. (aka perform the sum computation specified by each input modulo each of the primes in the second input list, then solve using the chinese remainder theorem )
Primes list used in the example input:
15481619,15481633,15481657,15481663,15481727,15481733,15481769,15481787
,15481793,15481801,15481819,15481859,15481871,15481897,15481901,15481933
,15481981,15481993,15481997,15482011,15482023,15482029,15482119,15482123
,15482149,15482153,15482161,15482167,15482177,15482219,15482231,15482263
,15482309,15482323,15482329,15482333,15482347,15482371,15482377,15482387
,15482419,15482431,15482437,15482447,15482449,15482459,15482477,15482479
,15482531,15482567,15482569,15482573,15482581,15482627,15482633,15482639
,15482669,15482681,15482683,15482711,15482729,15482743,15482771,15482773
,15482783,15482807,15482809,15482827,15482851,15482861,15482893,15482911
,15482917,15482923,15482941,15482947,15482977,15482993,15483023,15483029
,15483067,15483077,15483079,15483089,15483101,15483103,15483121,15483151
,15483161,15483211,15483253,15483317,15483331,15483337,15483343,15483359
,15483383,15483409,15483449,15483491,15483493,15483511,15483521,15483553
,15483557,15483571,15483581,15483619,15483631,15483641,15483653,15483659
,15483683,15483697,15483701,15483703,15483707,15483731,15483737,15483749
,15483799,15483817,15483829,15483833,15483857,15483869,15483907,15483971
,15483977,15483983,15483989,15483997,15484033,15484039,15484061,15484087
,15484099,15484123,15484141,15484153,15484187,15484199,15484201,15484211
,15484219,15484223,15484243,15484247,15484279,15484333,15484363,15484387
,15484393,15484409,15484421,15484453,15484457,15484459,15484471,15484489
,15484517,15484519,15484549,15484559,15484591,15484627,15484631,15484643
,15484661,15484697,15484709,15484723,15484769,15484771,15484783,15484817
,15484823,15484873,15484877,15484879,15484901,15484919,15484939,15484951
,15484961,15484999,15485039,15485053,15485059,15485077,15485083,15485143
,15485161,15485179,15485191,15485221,15485243,15485251,15485257,15485273
,15485287,15485291,15485293,15485299,15485311,15485321,15485339,15485341
,15485357,15485363,15485383,15485389,15485401,15485411,15485429,15485441
,15485447,15485471,15485473,15485497,15485537,15485539,15485543,15485549
,15485557,15485567,15485581,15485609,15485611,15485621,15485651,15485653
,15485669,15485677,15485689,15485711,15485737,15485747,15485761,15485773
,15485783,15485801,15485807,15485837,15485843,15485849,15485857,15485863
I am not quite sure if I understand you correctly, but maybe you are looking for something like this:
from functools import reduce
def sumOfPowersModuloPrimes (tups, primes):
return [reduce(lambda x, y: (x + y) % p, (pow (b, e, p) for b, e in tups), 0) for p in primes]
You shouldn't run into any memory issues as your (intermediate) values never exceed max(primes). If your resulting list is too large, then return a generator and work with it instead of a list.
Ignoring primes, since they don't appear to be used for anything:
def sumOfPowers(tups, primes):
return sum( pow(x,y) for x,y in tups)
Is it possible that you are supposed to compute the sum modulo one or more of the prime numbers? Something like
2**3 + 5**2 mod 3 = 8 + 25 mod 3 = 33 mod 3 = 0
(where a+b mod c means to take the remainder of the sum a+b after dividing by c).
One guess at how multiple primes would be used is to use the product of the primes as the
divisor.
def sumOfPower(tups, primes):
# There are better ways to compute this product. Loop
# is for explanatory purposes only.
c = 1
for p in primes:
p *= c
return sum( pow(x,y,c) for x,y in tups)
(I also seem to remember that a mod pq == (a mod p) mod q if p and q are both primes, but I could be mistaken.)
Another is to return one sum for each prime:
def sumOfPower(tups, primes):
return [ sum( pow(x,y,c) for x,y in tups ) for c in primes ]
def sumOfPowers (powerPairs, unusedPrimesParameter):
sum = 0
for base, exponent in powerPairs:
sum += base ** exponent
return sum
Or short:
def sumOfPowers (powerPairs, unusedPrimesParameter):
return sum(base ** exponent for base, exponent in powerPairs)
perform the sum computation specified by each input modulo each of the primes in the second input list
That’s a completely different thing. However, you still haven’t really explained what your function is supposed to do and how it should work. Given that you mentioned Euler's theorem and the Chinese remainder theorem, I guess there is a lot more to it than you actually made us believe. You probably want to solve the exponentiations by using Euler's theorem to reduce those large powers. I’m not willing to further guess what is going on though; this seems to involve a non-trivial math problem you should solve on the paper first.
def sumOfPowers (powerPairs, primes):
for prime in primes:
sum = 0
for base, exponent in powerPairs:
sum += pow(base, exponent, prime)
# do something with the sum here
# Chinese remainder theorem?
return something
I had an overflow error with this program here!, I realized the mistake of that program. I cannot use range or xrange when it came to really long integers. I tried running the program in Python 3 and it works. My code works but then responds after several times. Hence in order to optimize my code, I started thinking of strategies for the optimizing the code.
My problem statement is A number is called lucky if the sum of its digits, as well as the sum of the squares of its digits is a prime number. How many numbers between A and B are lucky?.
I started with this:
squarelist=[0,1,4,9,16,25,36,49,64,81]
def isEven(self, n):
return
def isPrime(n):
return
def main():
t=long(raw_input().rstrip())
count = []
for i in xrange(t):
counts = 0
a,b = raw_input().rstrip().split()
if a=='1':
a='2'
tempa, tempb= map(int, a), map(int,b)
for i in range(len(b),a,-1):
tempsum[i]+=squarelist[tempb[i]]
What I am trying to achieve is since I know the series is ordered, only the last number changes. I can save the sum of squares of the earlier numbers in the list and just keep changing the last number. This does not calculate the sum everytime and check if the sum of squares is prime. I am unable to fix the sum to some value and then keep changing the last number.How to go forward from here?
My sample inputs are provided below.
87517 52088
72232 13553
19219 17901
39863 30628
94978 75750
79208 13282
77561 61794
I didn't get what you want to achieve with your code at all. This is my solution to the question as I understand it: For all natural numbers n in a range X so that a < X < b for some natural numbers a, b with a < b, how many numbers n have the property that the sum of its digits and the sum of the square of its digits in decimal writing are both prime?
def sum_digits(n):
s = 0
while n:
s += n % 10
n /= 10
return s
def sum_digits_squared(n):
s = 0
while n:
s += (n % 10) ** 2
n /= 10
return s
def is_prime(n):
return all(n % i for i in xrange(2, n))
def is_lucky(n):
return is_prime(sum_digits(n)) and is_prime(sum_digits_squared(n))
def all_lucky_numbers(a, b):
return [n for n in xrange(a, b) if is_lucky(n)]
if __name__ == "__main__":
sample_inputs = ((87517, 52088),
(72232, 13553),
(19219, 17901),
(39863, 30628),
(94978, 75750),
(79208, 13282),
(77561, 61794))
for b, a in sample_inputs:
lucky_number_count = len(all_lucky_numbers(a, b))
print("There are {} lucky numbers between {} and {}").format(lucky_number_count, a, b)
A few notes:
The is_prime is the most naive implementation possible. It's still totally fast enough for the sample input. There are many better implementations possible (and just one google away). The most obvious improvement would be skipping every even number except for 2. That alone would cut calculation time in half.
In Python 3 (and I really recommend using it), remember to use //= to force the result of the division to be an integer, and use range instead of xrange. Also, an easy way to speed up is_prime is Python 3's #functools.lru_cache.
If you want to save some lines, calculate the sum of digits by casting them to str and back to int like that:
def sum_digits(n):
return sum(int(d) for d in str(a))
It's not as mathy, though.
This is my code in python for calculation of sum of prime numbers less than a given number.
What more can I do to optimize it?
import math
primes = [2,] #primes store the prime numbers
for i in xrange(3,20000,2): #i is the test number
x = math.sqrt(i)
isprime = True
for j in primes: #j is the devider. only primes are used as deviders
if j <= x:
if i%j == 0:
isprime = False
break
if isprime:
primes.append(i,)
print sum (primes,)
You can use a different algorithm called the Sieve of Eratosthenes which will be faster but take more memory. Keep an array of flags, signifying whether each number is a prime or not, and for each new prime set it to zero for all multiples of that prime.
N = 10000
# initialize an array of flags
is_prime = [1 for num in xrange(N)]
is_prime[0] = 0 # this is because indexing starts at zero
is_prime[1] = 0 # one is not a prime, but don't mark all of its multiples!
def set_prime(num):
"num is a prime; set all of its multiples in is_prime to zero"
for x in xrange(num*2, N, num):
is_prime[x] = 0
# iterate over all integers up to N and update the is_prime array accordingly
for num in xrange(N):
if is_prime[num] == 1:
set_prime(num)
primes = [num for num in xrange(N) if is_prime[num]]
You can actually do this for pretty large N if you use an efficient bit array, such as in this example (scroll down on the page and you'll find a Sieve of Eratosthenes example).
Another thing you could optimize is move the sqrt computation outside the inner loop. After all, i stays constant through it, so there's no need to recompute sqrt(i) every time.
primes = primes + (i,) is very expensive. It copies every element on every pass of the loop, converting your elegant dynamic programming solution into an O(N2) algorithm. Use lists instead:
primes = [2]
...
primes.append(i)
Also, exit the loop early after passing sqrt(i). And, since you are guaranteed to pass sqrt(i) before running off the end of the list of primes, update the list in-place rather than storing isprime for later consumption:
...
if j > math.sqrt(i):
primes.append(i)
break
if i%j == 0:
break
...
Finally, though this has nothing to do with performance, it is more Pythonic to use range instead of while:
for i in range(3, 10000, 2):
...
Just another code without using any imports:
#This will check n, if it is prime, it will return n, if not, it will return 0
def get_primes(n):
if n < 2:
return 0
i = 2
while True:
if i * i > n:
return n
if n % i == 0:
return 0
i += 1
#this will sum up every prime number up to n
def sum_primes(n):
if n < 2:
return 0
i, s = 2, 0
while i < n:
s += get_primes(i)
i += 1
return s
n = 1000000
print sum_primes(n)
EDIT: removed some silliness while under influence
All brute-force type algorithms for finding prime numbers, no matter how efficient, will become drastically expensive as the upper bound increases. A heuristic approach to testing for primeness can actually save a lot of computation. Established divisibility rules can eliminate most non-primes "at-a-glance".