numpy.sum not returning expected value - python

I'm working through the Project Euler problems.
'Find the sum of all the primes below two million'
I've built a prime checker that I think is pretty fast; any advice on how to improve it would be great too.
But what I've spent the last 30 minutes working out is that np.sum isn't returning the correct value. Here's my code:
import math
import numpy as np

def isprime(num, primelist):
    # Give primelist its first value if none exist
    if len(primelist) == 0:
        primelist.append(num)
        return True
    for primes in primelist:
        # Only need to iterate up to the square root of num to test primality
        if primes <= math.sqrt(num):
            # If num is evenly divisible (remainder = 0) by any prime, it is not prime
            if num % primes == 0:
                #print('non-prime')
                return False
        # If we have iterated through all primes <= sqrt of num, num is prime
        else:
            primelist.append(num)
            #print('prime')
            return True

lim = 2000000
n = 3
primelist = [2]
while primelist[-1] <= lim:
    isprime(n, primelist)
    n += 1
if primelist[-1] > lim: primelist = primelist[:-1]
primearray = np.asarray(primelist)
print(np.sum(primearray))
sum = 0
for i in primelist:
    sum = sum + i
print(sum)
I suppose it could also be np.asarray that isn't working rather than np.sum.
I've iterated through the original list to check the value numpy returned:
numpy sum = 1179908154
iterating sum = 142913828922
That's over 100 times larger. Where am I going wrong, please?

My guess is that you are using Windows, where the default integer size in NumPy is 32 bits (before NumPy 2.0). np.sum(primelist) computes the sum using 32-bit integers, and the sum overflows. You can verify this by computing, with Python integers, 142913828922 % (2**31):
In [18]: s = 142913828922
In [19]: s % (2**31)
Out[19]: 1179908154
That is the value that you got with numpy.sum(primelist). (Strictly, a signed 32-bit sum wraps modulo 2**32, but 142913828922 % (2**32) is also 1179908154, because the wrapped value happens to be below 2**31.)
You can avoid the problem (or at least defer it until 64-bit integers overflow) by explicitly converting primelist to an array of 64-bit unsigned integers and then computing its sum:
np.array(primelist, dtype=np.uint64).sum()
Or just don't use numpy when dealing with very large integers.
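To see the overflow on any platform, here is a minimal sketch, assuming NumPy is installed. It builds the primes with a NumPy Sieve of Eratosthenes (which also answers the "how to improve" part of the question) and then sums them three ways. The helpers primes_below and wrap_int32 are illustrative names, not part of the question; passing dtype to np.sum selects the accumulator type, so dtype=np.int32 imitates the 32-bit default described above.

```python
import numpy as np


def primes_below(limit):
    """Sieve of Eratosthenes: all primes < limit as an int64 array."""
    sieve = np.ones(limit, dtype=bool)
    sieve[:2] = False                      # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = False        # cross out multiples of p
    return np.nonzero(sieve)[0].astype(np.int64)


def wrap_int32(value):
    """Value a signed 32-bit integer ends up holding after wrapping around."""
    value %= 2 ** 32
    return value - 2 ** 32 if value >= 2 ** 31 else value


primes = primes_below(2_000_000)
exact = sum(int(p) for p in primes)             # Python ints never overflow
wide = int(np.sum(primes, dtype=np.int64))      # 64-bit accumulator
primes32 = primes.astype(np.int32)              # like np.asarray on a 32-bit-int platform
narrow = int(np.sum(primes32, dtype=np.int32))  # 32-bit accumulator wraps silently

print(exact, wide, narrow, wrap_int32(exact))
```

The 32-bit result matches wrap_int32(exact), which is the question's 1179908154, while the 64-bit accumulator agrees with the plain Python sum.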

Related

How to optimize and find output for large inputs?

For an input number N, I am trying to find the number of special pairs (x, y) such that the following conditions hold:
x != y
1 <= N <= 10^50
0 <= x <= N
0 <= y <= N
F(x) + F(y) is prime, where F(x) is the sum of all digits of x
Finally, print the count modulo 1000000007.
Sample Input: 2
Sample Output: 2
Explanation:
(0,2) Since F(0)+F(2)=2 which is prime
(1,2) Since F(1)+F(2)=3 which is prime
(2,1) is not counted, because (1,2) is the same pair as (2,1)
My code is:
def mod(x,y,p):
    res=1
    x=x%p
    while(y>0):
        if((y&1)==1):
            res=(res*x)%p
        y=y>>1
        x=(x*x)%p
    return res
def sod(x):
    a=str(x)
    res=0
    for i in range(len(a)):
        res+=int(a[i])
    return res
def prime(x):
    p=0
    if(x==1 or x==0):
        return False
    if(x==2):
        return True
    else:
        for i in range(2,(x//2)+1):
            if(x%i==0):
                p+=1
        if(p==0):
            return (True)
        else:
            return(False)
n=int(input())
res=[]
for i in range (n+1):
    for j in range(i,n+1):
        if(prime(sod(int(i))+sod(int(j)))):
            if([i,j]!=[j,i]):
                if([j,i] not in res):
                    res.append([i,j])
count=len(res)
a=mod(count,1,(10**9)+7)
print(res)
print(a)
For the input 9997260736 I expect the output 671653298, but instead I get the error "code execution timed out".
I already posted some rather long comments, so I'm turning them into an answer:
When thinking about such problems, don't translate the problem directly into code; instead, look for work you can do only once or in a different order.
As of now, you're doing N*N passes, each time calculating the sum of digits for x and y (not that bad) AND factoring each sum to check whether it's prime (really bad). That means for a sum s you're checking whether it's prime at least s+1 times (for 0+s, 1+(s-1), ..., (s-1)+1, s+0)!
What can you do differently?
Let's see what we know:
The sum of digits is the same for many numbers.
The sum of sod(x) and sod(y) is the same for many pairs.
A number that is prime on its 1st check is still prime on its nth check (and checking whether it's prime is costly).
So the best approach is to calculate the prime numbers only once, and each of the sums only once. But how do we do that when we have so many numbers?
Change the direction of thinking: take a prime number, split it into two numbers (sodx and sody), then generate x and y from those numbers.
Example:
Prime p = 7. That gives the possible sums 0+7, 1+6, 2+5, 3+4.
Then for each digit sum you can generate the matching numbers, e.g. for N=101 and sod=1 you have 1, 10, 100, and for sod=2 you have 2, 11, 20, 101. You could store these, but generating them should not be too bad.
Another optimisation:
Think about how to limit prime generation using N:
given N with lenN digits (remember, lenN is ~log10(N)), the largest possible digit sum is 9*lenN (for a number consisting only of 9s). That means sodx and sody are both <= 9*lenN, so the prime p = sodx + sody <= 18*lenN.
Look: that means about 18*lenN primality checks versus the N*N checks your previous algorithm needed!
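The answer generates the actual numbers for each digit sum; for N up to 10^50 that is not feasible, so here is a sketch that swaps in a digit DP: it only counts how many integers in [0, N] have each digit sum, then pairs up digit sums whose total is a prime <= 18*lenN, as described above. The function names are hypothetical, and the pair counting assumes unordered pairs with x != y, as in the problem statement.

```python
from math import isqrt

MOD = 10**9 + 7


def digit_sum_counts(n):
    """counts[s] = how many integers in [0, n] have digit sum s (digit DP)."""
    digits = [int(c) for c in str(n)]
    max_sum = 9 * len(digits)
    below = [0] * (max_sum + 1)  # prefixes already strictly below n's prefix
    tight_sum = 0                # digit sum of the prefix equal to n's prefix
    for d in digits:
        nxt = [0] * (max_sum + 1)
        for s, ways in enumerate(below):
            if ways:
                for digit in range(10):  # any digit keeps us below n
                    nxt[s + digit] += ways
        for digit in range(d):           # leave the tight path with a smaller digit
            nxt[tight_sum + digit] += 1
        below = nxt
        tight_sum += d
    below[tight_sum] += 1                # n itself
    return below


def prime_flags(limit):
    """flags[k] is True when k is prime, for 0 <= k <= limit (limit >= 1)."""
    flags = [False, False] + [True] * (limit - 1)
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return flags


def count_special_pairs(n, mod=MOD):
    counts = digit_sum_counts(n)
    is_prime = prime_flags(2 * (len(counts) - 1))  # p <= 18 * lenN
    total = 0
    for a, ca in enumerate(counts):
        for b in range(a, len(counts)):
            if not is_prime[a + b]:
                continue
            if a == b:
                total += ca * (ca - 1) // 2  # unordered pairs inside one group
            else:
                total += ca * counts[b]
    return total % mod


print(count_special_pairs(2))  # sample input, prints 2
```

Counting instead of generating keeps the work proportional to (9*lenN)^2 rather than to N, so even a 50-digit N is handled almost instantly.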

Python Overflow error: int too large to convert to C long

I'm a beginner working through the Project Euler problems, and while doing the third one, which is about finding the largest prime factor of 600851475143, I get this error:
Python int too large to convert to C long
plist = [2]
def primes(min, max):
    if 2 >= min:
        yield 2
    for i in xrange(3, max, 2):
        for p in plist:
            if i % p == 0 or p * p > i:
                break
        if i % p:
            plist.append(i)
            if i >= min:
                yield i
def factors(number):
    for prime in primes(2, number):
        if number % prime == 0:
            number /= prime
            yield prime
            if number == 1:
                break
a = 600851475143
print max(factors(a))
Annoyingly, in Python 2, xrange requires its arguments to fit into a C long. 600851475143 is too big for that on your system. You'll have to rewrite your algorithm to not need such a big range, or use a substitute, such as your own xrange implementation, or a while loop with manual counter management.
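This occurs when the number you are dealing with is larger than a C long can hold. In Python 2 that limit is sys.maxint (not sys.maxsize), which is only 2**31 - 1 on Windows, even 64-bit Windows.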
You could possibly use the numpy module with a larger data type, though I'm not sure how large a type you'd need without checking.
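As the first answer suggests, a while loop with a manually managed counter sidesteps xrange entirely. Here is a minimal sketch (written for Python 3, where integers have no C long limit; the function name is hypothetical) that also divides each factor out completely, so the loop only runs up to the square root of whatever is left:

```python
def largest_prime_factor(number):
    """Trial division with a manual counter; never builds a huge range."""
    if number < 2:
        raise ValueError("number must be at least 2")
    largest = None
    divisor = 2
    while divisor * divisor <= number:
        while number % divisor == 0:   # strip every copy of this factor
            largest = divisor
            number //= divisor
        divisor += 1 if divisor == 2 else 2  # 2, then odd candidates only
    if number > 1:
        largest = number  # the leftover cofactor is itself prime
    return largest


print(largest_prime_factor(600851475143))
```

Because the remaining number shrinks each time a factor is removed, the loop stops long before reaching the square root of the original input.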

How do I optimize my Python code to perform my calculation using less memory?

I have put together the following code to determine whether a number has an odd or even number of divisors. The code works well with relatively small numbers, but once a larger number (say, 9 digits) is entered, it hangs.
def divisors(n):
    num = len(set([x for x in range(1,n+1) if not divmod(n,x)[1]]))
    if (num != 0 and num % 2 == 0):
        return 'even'
    else:
        return 'odd'
What can I do to make this more efficient?
Here's your problem:
num = len(set([x for x in range(1,n+1) if not divmod(n,x)[1]]))
This constructs a list, then constructs a set out of that list, then takes the length of the set. You don't need to do any of that work (range(), or xrange() for that matter, does not produce repeated objects, so we don't need the set, and sum() works on any iterable object, so you don't need the list either). While we're on the subject, divmod(n, x)[1] is just a very elaborate way of writing n % x, and consumes a little bit of extra memory to construct a tuple (which is immediately reclaimed because you throw the tuple away). Here's the fixed version:
num = sum(1 for x in xrange(1,n+1) if not n % x)
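You do not need to test every possible divisor; testing up to sqrt(n) is enough, because divisors come in pairs x and n // x. This makes your function O(sqrt(n)) instead of O(n).
import math
def num_divisors(n):
    upper = int(math.sqrt(n))
    # correct any float rounding so that upper == floor(sqrt(n)) exactly
    while upper * upper > n:
        upper -= 1
    while (upper + 1) * (upper + 1) <= n:
        upper += 1
    # every divisor x <= upper pairs with the divisor n // x >= upper
    num = sum(1 for x in range(1, upper + 1) if not n % x)
    num *= 2
    # for a perfect square, the root pairs with itself and was counted twice
    if upper * upper == n and num != 0:
        num -= 1
    return num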
In my benchmarks using Python 2 this is 1000 times faster than sum(1 for x in range(1, n + 1) if not n % x) with n = int(1e6), and 10000 times faster for 1e8. For 1e9 the latter code gave me a memory error: in Python 2, range() returns a list, so the whole sequence is stored in memory before the sum, and I should have used xrange() instead. In Python 3, range() is fine.
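Since the question only asks whether the count is odd or even, the pairing argument behind num_divisors gives a shortcut: divisors pair up as x and n // x, and the only unpaired divisor is the square root of a perfect square. So n has an odd number of divisors exactly when n is a perfect square. A sketch assuming Python 3.8+ for math.isqrt (the function names are hypothetical):

```python
from math import isqrt


def divisor_parity(n):
    """Return 'odd' or 'even' for the number of divisors of n >= 1."""
    root = isqrt(n)  # exact integer square root, no float rounding
    return 'odd' if root * root == n else 'even'


def divisor_parity_brute(n):
    """O(n) reference version, only for checking small n."""
    count = sum(1 for x in range(1, n + 1) if n % x == 0)
    return 'odd' if count % 2 else 'even'


print(divisor_parity(123456789))  # even: 123456789 is not a perfect square
```

This runs in effectively constant time for 9-digit inputs, compared with O(sqrt(n)) for counting divisors.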

Finding all divisors of a number optimization

I have written the following function, which finds all divisors of a given natural number and returns them as a list:
import math
def FindAllDivisors(x):
    divList = []
    y = 1
    while y <= math.sqrt(x):
        if x % y == 0:
            divList.append(y)
            divList.append(int(x / y))
        y += 1
    return divList
It works well, except that it's really slow when the input is, say, an 18-digit number. Do you have any suggestions for how I can speed it up?
Update:
I have the following method to check for primality based on Fermat's Little Theorem:
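def CheckIfProbablyPrime(x):
    # Fermat test with base 2: is 2**(x - 1) % x == 1?
    # pow() with a modulus avoids building the huge number 2 << (x - 2).
    # Note: this returns False for x == 2, and base-2 pseudoprimes such as 341 pass.
    return pow(2, x - 1, x) == 1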
This method is really efficient when checking a single number; however, I'm not sure whether I should use it to compile all primes up to a certain bound.
You can find all the divisors of a number by calculating the prime factorization. Each divisor has to be a combination of the primes in the factorization.
If you have a list of primes, this is a simple way to get the factorization:
def factorize(n, primes):
    factors = []
    for p in primes:
        if p*p > n: break
        i = 0
        while n % p == 0:
            n //= p
            i+=1
        if i > 0:
            factors.append((p, i))
    if n > 1: factors.append((n, 1))
    return factors
This is called trial division. There are much more efficient methods to do this. See here for an overview.
Calculating the divisors is now pretty easy:
def divisors(factors):
    div = [1]
    for (p, r) in factors:
        div = [d * p**e for d in div for e in range(r + 1)]
    return div
The efficiency of calculating all the divisors depends on the algorithm to find the prime numbers (small overview here) and on the factorization algorithm. The latter is always slow for very large numbers and there's not much you can do about that.
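A self-contained sketch that wires the two functions above together, assuming you still need a prime list: primes_up_to and all_divisors are hypothetical helper names, and the prime list comes from a Sieve of Eratosthenes over a bytearray. For an 18-digit input the primes go up to about 10^9, which is itself costly in Python, so this mainly illustrates the structure.

```python
from math import isqrt


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            # clear p*p, p*p + p, ... in one slice assignment
            flags[p * p::p] = bytes(len(range(p * p, limit + 1, p)))
    return [i for i, is_p in enumerate(flags) if is_p]


def factorize(n, primes):
    """Trial division: list of (prime, exponent) pairs."""
    factors = []
    for p in primes:
        if p * p > n:
            break
        i = 0
        while n % p == 0:
            n //= p
            i += 1
        if i > 0:
            factors.append((p, i))
    if n > 1:
        factors.append((n, 1))  # leftover cofactor is prime
    return factors


def divisors(factors):
    """Every divisor is a product of p**e with 0 <= e <= r for each (p, r)."""
    div = [1]
    for p, r in factors:
        div = [d * p**e for d in div for e in range(r + 1)]
    return div


def all_divisors(n):
    primes = primes_up_to(isqrt(n))
    return sorted(divisors(factorize(n, primes)))


print(all_divisors(360))
```

Sorting is only for readability; divisors() itself returns them grouped by prime power.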
I'd suggest storing the result of math.sqrt(x) in a separate variable, then checking y against it. Otherwise it will be re-calculated at each step of while, and math.sqrt is definitely not a light-weight operation.
I would do a prime factor decomposition, and then compute all divisors from that result.
I don't know whether it costs much performance, but in Python 2.7 int x / int y already returns an int, so the cast to int is unnecessary there. In Python 3, / is float division, so use x // y instead: int(x / y) can round incorrectly for 18-digit numbers.

Optimization of prime number code

This is my code in Python for calculating the sum of prime numbers less than a given number.
What more can I do to optimize it?
import math
primes = [2] #primes stores the prime numbers
for i in xrange(3,20000,2): #i is the test number
    x = math.sqrt(i)
    isprime = True
    for j in primes: #j is the divisor; only primes are used as divisors
        if j <= x:
            if i%j == 0:
                isprime = False
                break
    if isprime:
        primes.append(i)
print sum(primes)
You can use a different algorithm called the Sieve of Eratosthenes, which will be faster but take more memory. Keep an array of flags signifying whether each number is prime, and for each new prime set the flag to zero for all multiples of that prime.
N = 10000
# initialize an array of flags
is_prime = [1 for num in xrange(N)]
is_prime[0] = 0 # this is because indexing starts at zero
is_prime[1] = 0 # one is not a prime, but don't mark all of its multiples!
def set_prime(num):
    "num is a prime; set all of its multiples in is_prime to zero"
    for x in xrange(num*2, N, num):
        is_prime[x] = 0
# iterate over all integers up to N and update the is_prime array accordingly
for num in xrange(N):
    if is_prime[num] == 1:
        set_prime(num)
primes = [num for num in xrange(N) if is_prime[num]]
You can actually do this for pretty large N if you use an efficient bit array, such as in this example (scroll down on the page and you'll find a Sieve of Eratosthenes example).
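A sketch of the same sieve for large N, written for Python 3 with a bytearray (one byte per number rather than a true bit array, so a middle ground between a list and a bit array). Slice assignment clears all multiples of a prime in one step, and crossing out starts at num*num, since smaller multiples were already cleared by smaller primes. The function name is hypothetical.

```python
from math import isqrt


def sum_primes_below(n):
    """Sum of all primes < n using a bytearray Sieve of Eratosthenes."""
    if n < 3:
        return 0
    is_prime = bytearray([1]) * n
    is_prime[0] = is_prime[1] = 0
    for num in range(2, isqrt(n - 1) + 1):
        if is_prime[num]:
            # zero out num*num, num*num + num, ... in a single slice assignment
            is_prime[num * num::num] = bytes(len(range(num * num, n, num)))
    return sum(i for i, flag in enumerate(is_prime) if flag)


print(sum_primes_below(2_000_000))  # 142913828922
```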
Another thing you could optimize is move the sqrt computation outside the inner loop. After all, i stays constant through it, so there's no need to recompute sqrt(i) every time.
primes = primes + (i,) is very expensive. It copies every element on every pass of the loop, converting your elegant dynamic programming solution into an O(N^2) algorithm. Use lists instead:
primes = [2]
...
primes.append(i)
Also, exit the loop early once you pass sqrt(i). And since you are guaranteed to pass sqrt(i) before running off the end of the list of primes, update the list in place rather than storing isprime for later consumption:
...
    for j in primes:
        if j > math.sqrt(i):
            primes.append(i)
            break
        if i%j == 0:
            break
...
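Putting these suggestions together, here is a Python 3 sketch of the question's loop with the early exit and the in-place append. It uses j * j > i in place of j > math.sqrt(i), an integer comparison so no square root is computed per step; that substitution is an assumption of this sketch, not part of the answer, and the function name is hypothetical.

```python
def sum_primes_trial(limit):
    """Sum of primes < limit via trial division by the primes found so far."""
    if limit <= 2:
        return 0
    primes = [2]
    for i in range(3, limit, 2):   # only odd candidates
        for j in primes:
            if j * j > i:          # passed sqrt(i) without finding a divisor
                primes.append(i)
                break
            if i % j == 0:         # composite
                break
    return sum(primes)


print(sum_primes_trial(20000))
```

The inner loop always breaks: a composite i has a prime factor <= sqrt(i) already in the list, and a prime i always has a smaller prime above sqrt(i) in the list, so the append-then-break never runs off the end.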
Finally, though this has nothing to do with performance, it is more Pythonic to use range instead of while:
for i in range(3, 10000, 2):
...
Here is another version that doesn't use any imports:
# This checks n: if it is prime, it returns n; if not, it returns 0
def get_primes(n):
    if n < 2:
        return 0
    i = 2
    while True:
        if i * i > n:
            return n
        if n % i == 0:
            return 0
        i += 1
# this sums up every prime number below n
def sum_primes(n):
    if n < 2:
        return 0
    i, s = 2, 0
    while i < n:
        s += get_primes(i)
        i += 1
    return s
n = 1000000
print sum_primes(n)
All brute-force algorithms for finding prime numbers, however efficient, become drastically more expensive as the upper bound increases. A heuristic approach to testing primality can save a lot of computation: established divisibility rules can eliminate most non-primes at a glance.
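As one example of such divisibility rules, here is a sketch (a 2-3 wheel, swapped in as an illustration; the function name is hypothetical) that rejects multiples of 2 and 3 immediately and then only tries divisors of the form 6k - 1 and 6k + 1, since every prime above 3 has one of those forms:

```python
def is_prime_wheel(n):
    """Trial division that skips all multiples of 2 and 3."""
    if n < 2:
        return False
    if n < 4:
        return True          # 2 and 3
    if n % 2 == 0 or n % 3 == 0:
        return False         # divisibility rules reject these at a glance
    d = 5
    while d * d <= n:        # test d = 6k - 1 and d + 2 = 6k + 1
        if n % d == 0 or n % (d + 2) == 0:
            return False
        d += 6
    return True


print([k for k in range(30) if is_prime_wheel(k)])
```

Multiples of 2 or 3 make up two thirds of all integers, so the early checks settle most inputs, and the loop tests only two candidates out of every six.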
