Prime numbers sum - for loop and big numbers - python

I'm running the following code to find the sum of the first 10,000,000 prime numbers.
How can I optimize it such that it doesn't take forever to obtain the result(the sum of prime numbers)?
sum=0
num=2
iterator=0
while iterator<10000000:
prime = True
for i in range(2,num):
if (num%i==0):
prime = False
if prime:
sum=sum+num
# print (num, sum, iterator)
iterator=iterator+1
num=num+1
print(sum)

the 10,000,000 th prime is approximately n * ln(n) + n * ln( ln(n) ) or ~188980383 ... then you can use a sieve to find all primes under that value (discard any extras ... (ie you will get about 50k extra prime numbers when using 10million, note this took approximately 8 seconds for me))
see also : Finding first n primes?
see also : Fastest way to list all primes below N

You can use the Sieve of Eratosthenes. It's a much faster method to find the first n prime numbers.

It turns out the Sieve of Eratosthenes is fairly simple to implement in Python by using correct slicing. So you can use it to recover the n first primes and sum them.
It was pointed out in Joran Beasley's answer that an upper bound for the n-th prime is n * ln(n) + n * ln( ln(n) ), which we can use and then discrard the extra primes. Note that this bound does not work for n smaller than 6.
from math import log
def sum_n_primes(n):
# Calculate the upper bound
upper = int(n * ( log(n) + log(log(n))))
# Prepare our Sieve, for readability we make index match the number by adding 0 and 1
primes = [False] * 2 + [True] * (upper - 1)
# Remove non-primes
for x in range(2, int(upper**(1/2) + 1)):
if primes[x]:
primes[2*x::x] = [False] * (upper // x - 1) # Replace // by / in Python2
# Sum the n first primes
return sum([x for x, is_prime in enumerate(primes) if is_prime][:n])
It takes a few seconds, but outputs that the sum of the 10,000,000 first primes is 870530414842019.
If you need n to be any higher, one solution would be to improve your upper bound.

Related

Unexpected time result for optimization of Project euler Problem 12

I have solved Project Euler problem 12 and I tried to optimize my solution.
The part I am focusing on is the part of finding the number of divisors.
The first algorithm I created I thought was going to be slower than the second but it wasn't and I don't understand why?
First(regular count goes until n**0.5):
from math import sqrt
def get(n):
count = 0
limit = sqrt(n)
for i in range(1,int(limit)+1):
if n%i==0:
count+=2
if limit.is_integer():
return count-1
return count
Second(Prime factoring to get each the degree of each prime in order to use this fomula, I am using the form of primes as you can see here to calculate faster but its is still slower ).:
def Get_Devisors_Amount(n):#Prime factorization
if n <=1: return 1
dcount = 1
count = 0
while n%2==0:
count+=1
n//=2
dcount*=(count+1)
count = 0
while n%3==0:
count+=1
n//=3
dcount*=(count+1)
i = 1#count for the form of primes 6n+-1
while n!=1:
t = 6*i+1
count = 0
while n%t==0:
count+=1
n//=t
dcount*=(count+1)
t = 6*i-1
count = 0
while n%t==0:
count+=1
n//=t
if count!=0:
dcount*=(count+1)
i+=1
if dcount==1: return 2# n is a prime
return dcount
How I tested the time
import time
start = time.time()
for i in range(1,1000):
get(i)
print(time.time()-start)
start = time.time()
for i in range(1,1000):
Get_Devisors_Amount(i)
print(time.time()-start)
Output:
get: 0.00299835205078125
Get_Devisors_Amount: 0.009994029998779297
Although I am using property and a formula that I think should make the search time lower the first method is still faster. could you explain why to me?
In the first approach, you testing divisibility with each number from 1 to sqrt(x), so the complexity of testing a single number is sqrt(x). According to this formula, the sum of first n roots can be approximated to n*sqrt(n).
Time complexity of method 1: O(N*sqrt(N)) (N is the total count of numbers being tested).
In the second approach, there are 2 cases:
If a number isn't prime, all primes upto n are tested. Complexity - O(n/6) = O(n)
If a number is prime, we can approximate the complexity to be O(log(n)) (there might be a more accurate calculation of the complexity for this case, I'm making an approximation since this wouldn't matter in the proof)
For the prime numbers, using the fact that we test them with (n/6) primes, the complexity would become 5/6 + 7/6 + 11/6 + 13/6 + 17/6 ..... (last prime before n)/6. This can be reduced to (sum of all prime numbers till n)/6 for the time being. Now, the sum of all prime numbers upto N can be approximated as N^2/(2*logN). Thus the complexity for this step becomes N^2/(6*(2*logN)) = N^2/(12*lognN).
Time complexity of method 2: O(N^2/(12*lognN)) (N is the total count of numbers being tested).
(if you want, you can make more accurate bounds for the time complexities of each step. I have made a few approximations since it helps in proving the point without making any overoptimistic assumption).
Your first algorithm wisely only considers divisors up to sqrt(n).
But your second algorithm considers divisors all the way up to n, although admittedly if n has many factors, n will be reduced along the way.
If you fix this in your algorithm, by changing this:
t = 6*i-1
to this:
t = 6*i-1
if t*t > n:
return dcount * 2
Then your second algorithm will be faster.
(The * 2 is because the algorithm would eventually find the remaining prime factor (n itself) and then dcount *= (count + 1) would double dcount before returning it.)

How does this python loop check for primality, if it loops less than the number n?

Hi guys so I was wondering how is this code:
def is_prime(n):
for i in range(2, int(n**.5 + 1)):
if n % i == 0:
return False
return True
able to check for prime when on line 2: for i in range(2, int(n**.5 + 1)): the range is not : range(2, n)? Shouldn't it have to iterate through every number till n but excluding it? This one is not doing that but somehow it works... Could someone explain why it works please.
The loop iterates on all numbers from 2 to the square root on n. For any divisor it could find above that square root (if it continued iterating to n - 1), there would obviously be another divisor below it.
Because the prime factorisation of any number n (by trial division) needs only check the prime numbers up to sqrt(n)
.. Furthermore, the trial factors need go no further than sqrt(n)
because, if n is divisible by some number p, then n = p × q and
if q were smaller than p, n would have been detected earlier as
being divisible by q or by a prime factor of q.
On a sidenote, trial division is slow to check for primality or possible primality. There are faster probabilistic tests like the Miller-Rabin test which can check quickly if a number is composite or probably prime.

Optimising Factoring Program

This is my factorising code which is used to find all the factors of a number but after roughly 7 digits, the program begins to slow down.
so I was wondering if there is any method of optimising this program to allow it to factorise numbers faster.
number = int(input("Input the whole number here?\n"))
factors = [1]
def factorization():
global factors
for i in range(1 , number):
factor = (number/i)
try:
factorInt = int(number/i)
if factorInt == factor:
factors.append(factorInt)
except ValueError:
pass
factorization()
print(factors)
The most effective optimization is by noting that when the number has non trivial factors, and the smallest of them is smaller than the square root of the number, and there is no need to continue looping past this square root.
Indeed, let this smallest factor be m. We have n = m.p and the other factor is such that p >= m. But if m > √n, then m.p >= n, a contradiction.
Note that this optimization only speeds-up the processing of prime numbers (for the composite ones, the search stops before √n anyway). But the density of primes and the fact that n is much larger than √n make it abolutely worth.
Another optimization is by noting that the smallest divisor must be a prime, and you can use a stored table of primes. (There are less than 51 million primes below one billion.) The speedup will be less noticeable.
Let me offer a NumPy-based solution. It seems quite efficient:
import numpy as np
def factorize(number):
n = np.arange(2, np.sqrt(number), dtype=int)
n2 = number / n
low = n[n2.astype(int) == n2]
return np.concatenate((low, number // low,))
factorize(34976237696437)
#array([ 71, 155399, 3170053, 492623066147, 225073763, 11033329])])
# 176 msec

Sum of powers for lists of tuples

My assignment is to create a function to sum the powers of tuples.
def sumOfPowers(tups, primes):
x = 0;
for i in range (1, len(primes) + 1):
x += pow(tups, i);
return x;
So far I have this.
tups - list of one or more tuples, primes - list of one or more primes
It doesn't work because the inputs are tuples and not single integers. How could I fix this to make it work for lists?
[/edit]
Sample output:
sumOfPowers([(2,3), (5,6)], [3,5,7,11,13,17,19,23,29]) == 2**3 + 5**6
True
sumOfPowers([(2,10**1000000 + 1), (-2,10**1000000 + 1), (3,3)], primes)
27
Sum of powers of [(2,4),(3,5),(-6,3)] is 2^4 + 3^5 + (−6)^3
**The purpose of the prime is to perform the computation of a^k1 + ... a^kn modulo every prime in the list entered. (aka perform the sum computation specified by each input modulo each of the primes in the second input list, then solve using the chinese remainder theorem )
Primes list used in the example input:
15481619,15481633,15481657,15481663,15481727,15481733,15481769,15481787
,15481793,15481801,15481819,15481859,15481871,15481897,15481901,15481933
,15481981,15481993,15481997,15482011,15482023,15482029,15482119,15482123
,15482149,15482153,15482161,15482167,15482177,15482219,15482231,15482263
,15482309,15482323,15482329,15482333,15482347,15482371,15482377,15482387
,15482419,15482431,15482437,15482447,15482449,15482459,15482477,15482479
,15482531,15482567,15482569,15482573,15482581,15482627,15482633,15482639
,15482669,15482681,15482683,15482711,15482729,15482743,15482771,15482773
,15482783,15482807,15482809,15482827,15482851,15482861,15482893,15482911
,15482917,15482923,15482941,15482947,15482977,15482993,15483023,15483029
,15483067,15483077,15483079,15483089,15483101,15483103,15483121,15483151
,15483161,15483211,15483253,15483317,15483331,15483337,15483343,15483359
,15483383,15483409,15483449,15483491,15483493,15483511,15483521,15483553
,15483557,15483571,15483581,15483619,15483631,15483641,15483653,15483659
,15483683,15483697,15483701,15483703,15483707,15483731,15483737,15483749
,15483799,15483817,15483829,15483833,15483857,15483869,15483907,15483971
,15483977,15483983,15483989,15483997,15484033,15484039,15484061,15484087
,15484099,15484123,15484141,15484153,15484187,15484199,15484201,15484211
,15484219,15484223,15484243,15484247,15484279,15484333,15484363,15484387
,15484393,15484409,15484421,15484453,15484457,15484459,15484471,15484489
,15484517,15484519,15484549,15484559,15484591,15484627,15484631,15484643
,15484661,15484697,15484709,15484723,15484769,15484771,15484783,15484817
,15484823,15484873,15484877,15484879,15484901,15484919,15484939,15484951
,15484961,15484999,15485039,15485053,15485059,15485077,15485083,15485143
,15485161,15485179,15485191,15485221,15485243,15485251,15485257,15485273
,15485287,15485291,15485293,15485299,15485311,15485321,15485339,15485341
,15485357,15485363,15485383,15485389,15485401,15485411,15485429,15485441
,15485447,15485471,15485473,15485497,15485537,15485539,15485543,15485549
,15485557,15485567,15485581,15485609,15485611,15485621,15485651,15485653
,15485669,15485677,15485689,15485711,15485737,15485747,15485761,15485773
,15485783,15485801,15485807,15485837,15485843,15485849,15485857,15485863
I am not quite sure if I understand you correctly, but maybe you are looking for something like this:
from functools import reduce
def sumOfPowersModuloPrimes (tups, primes):
return [reduce(lambda x, y: (x + y) % p, (pow (b, e, p) for b, e in tups), 0) for p in primes]
You shouldn't run into any memory issues as your (intermediate) values never exceed max(primes). If your resulting list is too large, then return a generator and work with it instead of a list.
Ignoring primes, since they don't appear to be used for anything:
def sumOfPowers(tups, primes):
return sum( pow(x,y) for x,y in tups)
Is it possible that you are supposed to compute the sum modulo one or more of the prime numbers? Something like
2**3 + 5**2 mod 3 = 8 + 25 mod 3 = 33 mod 3 = 0
(where a+b mod c means to take the remainder of the sum a+b after dividing by c).
One guess at how multiple primes would be used is to use the product of the primes as the
divisor.
def sumOfPower(tups, primes):
# There are better ways to compute this product. Loop
# is for explanatory purposes only.
c = 1
for p in primes:
p *= c
return sum( pow(x,y,c) for x,y in tups)
(I also seem to remember that a mod pq == (a mod p) mod q if p and q are both primes, but I could be mistaken.)
Another is to return one sum for each prime:
def sumOfPower(tups, primes):
return [ sum( pow(x,y,c) for x,y in tups ) for c in primes ]
def sumOfPowers (powerPairs, unusedPrimesParameter):
sum = 0
for base, exponent in powerPairs:
sum += base ** exponent
return sum
Or short:
def sumOfPowers (powerPairs, unusedPrimesParameter):
return sum(base ** exponent for base, exponent in powerPairs)
perform the sum computation specified by each input modulo each of the primes in the second input list
That’s a completely different thing. However, you still haven’t really explained what your function is supposed to do and how it should work. Given that you mentioned Euler's theorem and the Chinese remainder theorem, I guess there is a lot more to it than you actually made us believe. You probably want to solve the exponentiations by using Euler's theorem to reduce those large powers. I’m not willing to further guess what is going on though; this seems to involve a non-trivial math problem you should solve on the paper first.
def sumOfPowers (powerPairs, primes):
for prime in primes:
sum = 0
for base, exponent in powerPairs:
sum += pow(base, exponent, prime)
# do something with the sum here
# Chinese remainder theorem?
return something

Finding all divisors of a number optimization

I have written the following function which finds all divisors of a given natural number and returns them as a list:
def FindAllDivisors(x):
divList = []
y = 1
while y <= math.sqrt(x):
if x % y == 0:
divList.append(y)
divList.append(int(x / y))
y += 1
return divList
It works really well with the exception that it's really slow when the input is say an 18-digit number. Do you have any suggestions for how I can speed it up?
Update:
I have the following method to check for primality based on Fermat's Little Theorem:
def CheckIfProbablyPrime(x):
return (2 << x - 2) % x == 1
This method is really efficient when checking a single number, however I'm not sure whether I should be using it to compile all primes up to a certain boundary.
You can find all the divisors of a number by calculating the prime factorization. Each divisor has to be a combination of the primes in the factorization.
If you have a list of primes, this is a simple way to get the factorization:
def factorize(n, primes):
factors = []
for p in primes:
if p*p > n: break
i = 0
while n % p == 0:
n //= p
i+=1
if i > 0:
factors.append((p, i));
if n > 1: factors.append((n, 1))
return factors
This is called trial division. There are much more efficient methods to do this. See here for an overview.
Calculating the divisors is now pretty easy:
def divisors(factors):
div = [1]
for (p, r) in factors:
div = [d * p**e for d in div for e in range(r + 1)]
return div
The efficiency of calculating all the divisors depends on the algorithm to find the prime numbers (small overview here) and on the factorization algorithm. The latter is always slow for very large numbers and there's not much you can do about that.
I'd suggest storing the result of math.sqrt(x) in a separate variable, then checking y against it. Otherwise it will be re-calculated at each step of while, and math.sqrt is definitely not a light-weight operation.
I would do a prime factor decomposition, and then compute all divisors from that result.
I don't know if there's much of a performance hit, but I'm pretty sure that cast to an int is unnecessary. At least in Python 2.7, int x / int y returns an int.

Categories