Finding a Prime Sieve Inconsistency in Python

I'm attempting to learn Python, and I thought developing my own prime sieve would be an interesting problem for the afternoon. Whenever I've needed primes so far, I've just imported a version of the Sieve of Eratosthenes that I found online -- that's what I used as my benchmark.
After trying several different optimizations, I thought I had written a pretty decent sieve:
def sieve3(n):
    top = n+1
    sieved = dict.fromkeys(xrange(3, top, 2), True)
    for si in sieved:
        if si * si > top:
            break
        if sieved[si]:
            for j in xrange((si*2) + si, top, si*2):  # [****]
                sieved[j] = False
    return [2] + [pr for pr in sieved if sieved[pr]]
Using the first 1,000,000 integers as my range, this code would generate the correct number of primes and was only about 3-5x slower than my benchmark. I was about to give up and pat myself on the back when I tried it on a larger range, but it no longer worked!
n = 1,000 -- Benchmark = 168 in 0.00010 seconds
n = 1,000 -- Sieve3 = 168 in 0.00022 seconds
n = 4,194,304 -- Benchmark = 295,947 in 0.288 seconds
n = 4,194,304 -- Sieve3 = 295,947 in 1.443 seconds
n = 4,194,305 -- Benchmark = 295,947 in 3.154 seconds
n = 4,194,305 -- Sieve3 = 2,097,153 in 0.8465 seconds
I think the problem comes from the line marked with [****], but I can't figure out why it's so broken. It's supposed to mark every odd multiple of si (each value of j) as False, and it works most of the time, but for anything above 4,194,304 the sieve is broken. (To be fair, it breaks on other random numbers too, like 10,000 for instance.)
I made a change and it significantly slowed my code down, but it would actually work for all values. This version includes all numbers (not just odds) but is otherwise identical.
def sieve2(n):
    top = n+1
    sieved = dict.fromkeys(xrange(2, top), True)
    for si in sieved:
        if si * si > top:
            break
        if sieved[si]:
            for j in xrange((si*2), top, si):
                sieved[j] = False
    return [pr for pr in sieved if sieved[pr]]
Can anyone help me figure out why my original function (sieve3) doesn't work consistently?
Edit: I forgot to mention that when sieve3 'breaks', sieve3(n) returns roughly n/2 values.

The sieve requires the loop over candidate primes to run in increasing order. The code in question enumerates the keys of a dictionary, which are not guaranteed to come back in any particular order. Instead, reuse the xrange you used to initialize the dictionary, both for the main sieve loop and for the result list, as follows:
def sieve3(n):
    top = n+1
    sieved = dict.fromkeys(xrange(3, top, 2), True)
    for si in xrange(3, top, 2):
        if si * si > top:
            break
        if sieved[si]:
            for j in xrange(3*si, top, si*2):
                sieved[j] = False
    return [2] + [pr for pr in xrange(3, top, 2) if sieved[pr]]
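As a quick sanity check (my own test lines; the expected counts are taken from the benchmark figures in the question):

print len(sieve3(1000))      # expect 168
print len(sieve3(4194305))   # expect 295947, not 2097153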

It's because dictionary keys are not ordered. Some of the time, by chance, for si in sieved: will loop through your keys in increasing order.
With your last example, the first value si gets is big enough to break the loop immediately.
You can simply use:
for si in sorted(sieved):

Well, look at the runtimes -- on the last case you showed, your sieve was nearly 4 times faster than the benchmark, while it had usually been about 5 times slower. That is a red flag: maybe you aren't performing all of the iterations? (And it is faster while reporting roughly 7 times as many 'primes'...)
I don't have time to look into the code more right now, but I hope this helps.

Related

Sum of primes below 2,000,000 in python

I am attempting problem 10 of Project Euler, which is the summation of all primes below 2,000,000. I have tried implementing the Sieve of Eratosthenes using Python, and the code I wrote works perfectly for numbers below 10,000.
However, when I attempt to find the summation of primes for bigger numbers, the code takes too long to run (finding the sum of primes up to 100,000 took 315 seconds). The algorithm clearly needs optimization.
Yes, I have looked at other posts on this website, like Fastest way to list all primes below N, but the solutions there had very little explanation as to how the code worked (I am still a beginner programmer) so I was not able to actually learn from them.
Can someone please help me optimize my code, and clearly explain how it works along the way?
Here is my code:
primes_below_number = 2000000  # number to find summation of all primes below number
numbers = (range(1, primes_below_number + 1, 2))  # creates a list excluding even numbers
pos = 0  # index position
sum_of_primes = 0  # total sum
number = numbers[pos]
while number < primes_below_number and pos < len(numbers) - 1:
    pos += 1
    number = numbers[pos]  # moves to next prime in list numbers
    sum_of_primes += number  # adds prime to total sum
    num = number
    while num < primes_below_number:
        num += number
        if num in numbers[:]:
            numbers.remove(num)  # removes multiples of prime found
print sum_of_primes + 2
As I said before, I am new to programming, therefore a thorough explanation of any complicated concepts would be deeply appreciated. Thank you.
As you've seen, there are various ways to implement the Sieve of Eratosthenes in Python that are more efficient than your code. I don't want to confuse you with fancy code, but I can show how to speed up your code a fair bit.
Firstly, searching a list isn't fast, and removing elements from a list is even slower. However, Python provides a set type which is quite efficient at performing both of those operations (although it does chew up a bit more RAM than a simple list). Happily, it's easy to modify your code to use a set instead of a list.
Another optimization is that we don't have to check for prime factors all the way up to primes_below_number, which I've renamed to hi in the code below. It's sufficient to just go to the square root of hi, since if a number is composite it must have a factor less than or equal to its square root.
We don't need to keep a running total of the sum of the primes. It's better to do that at the end using Python's built-in sum() function, which operates at C speed, so it's much faster than doing the additions one by one at Python speed.
# number to find summation of all primes below number
hi = 2000000
# create a set excluding even numbers
numbers = set(xrange(3, hi + 1, 2))
for number in xrange(3, int(hi ** 0.5) + 1):
    if number not in numbers:
        # number must have been removed because it has a prime factor
        continue
    num = number
    while num < hi:
        num += number
        if num in numbers:
            # remove multiples of prime found
            numbers.remove(num)
print 2 + sum(numbers)
You should find that this code runs in a few seconds; it takes around 5 seconds on my 2GHz single-core machine.
You'll notice that I've moved the comments so that they're above the line they're commenting on. That's the preferred style in Python since we prefer short lines, and also inline comments tend to make the code look cluttered.
There's another small optimization that can be made to the inner while loop, but I let you figure that out for yourself. :)
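For what it's worth, my guess at the intended hint (it is not spelled out in the answer) is to start the inner loop at number * number and to step by 2 * number, since smaller multiples were already crossed off by smaller primes and even multiples were never in the set:

num = number * number          # smaller multiples already handled by smaller primes
while num < hi:
    if num in numbers:
        numbers.remove(num)
    num += 2 * number          # skip even multiples, which were never in the set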
First, removing numbers from the list will be very slow. Instead of this, make a list
primes = [True] * primes_below_number
primes[0] = False
primes[1] = False
Now in your loop, when you find a prime p, change primes[k*p] to False for all suitable k. (You wouldn't actually do multiply, you'd continually add p, of course.)
At the end,
primes = [n for n in range(primes_below_number) if primes[n]]
This should be a great deal faster.
Second, you can stop looking once you find a prime greater than the square root of primes_below_number, since a composite number must have a prime factor that doesn't exceed its square root.
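Putting the two suggestions together, a minimal sketch of the approach described here (my own assembly of the steps above, not the answerer's code) might look like:

primes_below_number = 2000000
primes = [True] * primes_below_number
primes[0] = False
primes[1] = False
p = 2
while p * p < primes_below_number:       # stop at the square root, per the point above
    if primes[p]:
        k = p * p                        # smaller multiples were crossed off by smaller primes
        while k < primes_below_number:
            primes[k] = False
            k += p                       # keep adding p instead of multiplying
    p += 1
prime_list = [n for n in range(primes_below_number) if primes[n]]
print sum(prime_list)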
Try using numpy; it should make this faster. Also, replacing range with xrange may help.
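That answer gives no code; purely as an illustration, a minimal numpy sketch of the same boolean-array idea (the function name is my own) could look like this:

import numpy as np

def sum_primes_below(limit):
    # one boolean flag per number; the slice assignment marks composites in bulk, at C speed
    is_prime = np.ones(limit, dtype=bool)
    is_prime[:2] = False
    for p in xrange(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p*p::p] = False
    # 64-bit accumulator: the total exceeds 2**31
    return np.flatnonzero(is_prime).sum(dtype=np.int64)

print sum_primes_below(2000000)   # 142913828922

The vectorized slice assignment is what buys the speed: the inner marking loop runs inside numpy instead of in Python bytecode.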
Here's an optimization for your code:
import itertools

primes_below_number = 2000000
numbers = list(range(3, primes_below_number, 2))
pos = 0
while pos < len(numbers) - 1:
    number = numbers[pos]
    numbers = list(
        itertools.chain(
            itertools.islice(numbers, 0, pos + 1),
            itertools.ifilter(
                lambda n: n % number != 0,
                itertools.islice(numbers, pos + 1, len(numbers))
            )
        )
    )
    pos += 1
sum_of_primes = sum(numbers) + 2
print sum_of_primes
The optimizations here are:
The sum is computed once, outside the loop.
Instead of removing elements from a list, we just create another one; memory is not an issue here (I hope).
When creating the new list we build it by chaining two parts: the first part is everything before the current number (we already checked those), and the second part is everything after the current number, but only the values that are not divisible by the current number.
Using itertools can make things faster, since we're using iterators instead of looping through the whole list more than once.
Another solution would be to not remove parts of the list but to disable them, like @saulspatz said.
And here's the fastest way I was able to find: http://www.wolframalpha.com/input/?i=sum+of+all+primes+below+2+million 😁
Update
Here is the boolean method:
import itertools

primes_below_number = 2000000
numbers = [v % 2 != 0 for v in xrange(primes_below_number)]
numbers[0] = False
numbers[1] = False
numbers[2] = True
number = 3
while number < primes_below_number:
    n = number * 3  # we already excluded even numbers
    while n < primes_below_number:
        numbers[n] = False
        n += number
    number += 1
    while number < primes_below_number and not numbers[number]:
        number += 1
sum_of_numbers = sum(itertools.imap(lambda index_n: index_n[1] and index_n[0] or 0, enumerate(numbers)))
print(sum_of_numbers)
This executes in seconds (took 3 seconds on my 2.4GHz machine).
Instead of storing a list of numbers, you can store an array of boolean values. This use of a bitmap can be thought of as a way to implement a set, and it works well for dense sets (where there aren't big gaps between the values of the members).
An answer on a recent Python sieve question uses this implementation, Python-style. It turns out a lot of people have implemented a sieve, or something they thought was a sieve, and then come to SO to ask why it was slow. :P Look at the related-questions sidebar of some of them if you want more reading material.
Finding the element that holds the boolean that says whether a number is in the set or not is easy and extremely fast. array[i] is a boolean value that's true if i is in the set, false if not. The memory address can be computed directly from i with a single addition.
(I'm glossing over the fact that an array of boolean might be stored with a whole byte for each element, rather than the more efficient implementation of using every single bit for a different element. Any decent sieve will use a bitmap.)
Removing a number from the set is as simple as setting array[i] = false, regardless of the previous value. No searching, no comparison, no tracking of what happened, just one memory operation. (Well, two for a bitmap: load the old byte, clear the correct bit, store it. Memory is byte-addressable, but not bit-addressable.)
An easy optimization of the bitmap-based sieve is to not even store entries for the even numbers, because there is only one even prime, and we can special-case it to double our memory density. Then the membership status of i is held in array[i/2]. (Dividing by powers of two is easy for computers; other divisors are much slower.)
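To make the array[i/2] layout concrete, here is a small sketch of an odds-only sieve in that style (my own illustration, not code from the linked answers; it uses a list of booleans rather than a true bitmap, and assumes limit >= 2):

def odd_sieve(limit):
    # is_prime[k] records the status of the odd number 2*k + 1; evens are never stored
    is_prime = [True] * ((limit + 1) // 2)
    is_prime[0] = False                                  # 1 is not prime
    i = 3
    while i * i <= limit:
        if is_prime[i // 2]:
            for j in xrange(i * i, limit + 1, 2 * i):    # odd multiples only
                is_prime[j // 2] = False
        i += 2
    return [2] + [2*k + 1 for k in xrange(1, len(is_prime)) if is_prime[k]]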
An SO question:
Why is Sieve of Eratosthenes more efficient than the simple "dumb" algorithm? has many links to good stuff about the sieve. This one in particular has some good discussion about it, in words rather than just code. (Never mind the fact that it's talking about a common Haskell implementation that looks like a sieve but actually isn't. They call this the "unfaithful" sieve in their graphs, and so on.)
Discussion on that question brought up the point that trial division may be faster than big sieves, for some uses, because clearing the bits for all multiples of every prime touches a lot of memory in a cache-unfriendly pattern. CPUs are much faster than memory these days.

How to find complexity for the following program?

So I am not a CS major and have a hard time answering questions about a program's big-O complexity.
I wrote the following routine to output the pairs of numbers in an array which sum to 0:
asd = [-3, -2, -3, 2, 3, 2, 4, 5, 8, -8, 9, 10, -4]

def sum_zero(asd):
    for i in range(len(asd)):
        for j in range(i, len(asd)):
            if asd[i] + asd[j] == 0:
                print asd[i], asd[j]
Now if someone asks the complexity of this method, I know that since the first loop goes through all n items it will be more than O(n) (unless I am wrong), but can someone explain how to find the correct complexity?
Also, is there a better, more efficient way of solving this?
I won't give you a full solution, but will try to guide you.
You should get a pencil and a paper, and ask yourself:
How many times does the statement print asd[i], asd[j] execute? (in worst case, meaning that you shouldn't really care about the condition there)
You'll find that it really depends on the loop above it, which gets executed len(asd) (denote it by n) times.
The only thing you need to know is: how many times is the inner loop executed, given that the outer loop has n iterations? (i runs from 0 up to n.)
If you're still not sure about the result, just take a real example, say n=20, and count how many times the innermost statement executes; this will give you a very good indication of the answer.
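A quick way to do that count in code (just an illustration of the suggestion above):

n = 20
count = 0
for i in range(n):
    for j in range(i, n):
        count += 1    # stands in for the innermost statement
print count           # 210, i.e. n*(n+1)/2, which grows like n**2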
def sum_zero(asd):
    for i in range(len(asd)):         # len called once = O(1), range called once = O(1)
        for j in range(i, len(asd)):  # len called once per i = O(n), range called once per i = O(n)
            if asd[i] + asd[j] == 0:  # asd[i] and asd[j] indexed once each per j = O(2*n²)
                                      # the addition is done once per j = O(n²)
                                      # the comparison with 0 is done once per j = O(n²)
                print asd[i], asd[j]  # asd[i] and asd[j] indexed once each per j = O(2*n²)

sum_zero(asd)                         # called once, O(1)
Assuming the worst case scenario (the if-condition always being true):
Total:
O(1) * 3
O(n) * 2
O(n²) * 6
= O(6n² + 2n + 3)
A simple program to demonstrate the complexity:
target= []
quadraditc = []
linear = []
for x in xrange(1,100):
linear.append(x)
target.append(6*(x**2) + 2*x + 3)
quadraditc.append(x**2)
import matplotlib.pyplot as plt
plt.plot(linear,label="Linear")
plt.plot(target,label="Target Function")
plt.plot(quadraditc,label="Quadratic")
plt.ylabel('Complexity')
plt.xlabel('Time')
plt.legend(loc=2)
plt.show()
EDIT:
As pointed out by @Micah Smith, the above counts worst-case operations; the Big-O is actually O(n²), since the constants and lower-order terms are omitted.

Python prime number code giving runtime error(NZEC) on spoj

I am trying to get an accepted answer for this question: http://www.spoj.com/problems/PRIME1/
It's nothing new, just generating prime numbers between two given numbers. Eventually, I coded the following, but SPOJ is giving me a runtime error (NZEC), and I have no idea how to deal with it. I hope you can help me with it. Thanks in advance.
def is_prime(m, n):
    myList = []
    mySieve = [True] * (n+1)
    for i in range(2, n+1):
        if mySieve[i]:
            myList.append(i)
            for x in range(i*i, n+1, i):
                mySieve[x] = False
    for a in [y for y in myList if y >= m]:
        print(a)

t = input()
count = 0
while count < int(t):
    m, n = input().split()
    count += 1
    is_prime(int(m), int(n))
    if count == int(t):
        break
    print("\n")
Looking at the problem definition:
In each of the next t lines there are two numbers m and n (1 <= m <= n <= 1000000000, n-m<=100000) separated by a space.
Looking at your code:
mySieve= [True] * (n+1)
So, if n is 1000000000, you're going to try to create a list of 1000000001 boolean values. That means you're asking Python to allocate storage for a billion pointers. On a 64-bit platform, that's 8GB—which is fine as far as Python's concerned, but might well throw your system into swap hell or get it killed by a limit or watchdog. On a 32-bit platform, that's 4GB—which will guarantee you a MemoryError.
The problem also explicitly has this warning:
Warning: large Input/Output data, be careful with certain languages
So, if you want to implement it this way, you're going to have to come up with a more compact storage. For example, array.array('B', [True]) * (n+1) will only take 1GB instead of 4 or 8. And you can make it even smaller (128MB) if you store it in bits instead of bytes, but that's not quite as trivial a change to code.
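The answer doesn't spell out the bit-level version, so here is a rough sketch of what one-bit-per-number storage could look like, purely as an illustration; the bytearray is about n/8 bytes (~120 MB for n = 10^9). Note this only fixes the memory side -- a pure-Python marking loop over 10^9 numbers would still be far too slow for the judge's time limit:

def bit_sieve(n):
    """Sieve of Eratosthenes using one bit per candidate (~n/8 bytes)."""
    bits = bytearray([0xFF]) * (n // 8 + 1)    # every number starts out marked "prime"

    def clear(i):                              # mark i as composite
        bits[i >> 3] &= ~(1 << (i & 7))

    def is_set(i):                             # is i still marked prime?
        return (bits[i >> 3] >> (i & 7)) & 1

    clear(0)
    clear(1)
    i = 2
    while i * i <= n:
        if is_set(i):
            for j in xrange(i * i, n + 1, i):
                clear(j)
        i += 1
    return is_set                              # is_set(p) is truthy iff p is prime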
Calculating prime numbers between two numbers directly is meaningless: you can only sieve primes up to a given number, by using the other primes you found before, and then show only the range you wanted.
Here is some Python code and some pre-calculated primes you can continue from:
bzr branch http://bzr.ceremcem.net/calc-primes
This code is somewhat trivial, but it works correctly and is well tested.

NZEC error in python code

My code works perfectly fine on my machine, but it gives an NZEC error when run by SPOJ.
Here is the link to my question:
http://www.spoj.com/problems/CPRIME/
Here is my code:
def smallPrimes(n):
    """Given an integer n, compute a list of the primes <= n"""
    if n <= 1:
        return []
    sieve = range(3, n+1, 2)
    top = len(sieve)
    for si in sieve:
        if si:
            bottom = (si*si - 3) // 2
            if bottom >= top:
                break
            sieve[bottom::si] = [0] * -((bottom - top) // si)
    return [2] + filter(None, sieve)

from math import *
import sys

def main():
    flag = True
    while flag == True:
        x = input()
        if x == 0:
            flag = False
            return 0
        z = x / log(x)
        v = len(smallPrimes(x))
        print round((abs(v - z) * 100 / v), 1)

if __name__ == "__main__":
    main()
In SPOJ the NZEC error is raised when an exception occurs in the Python script execution.
In your case, the input format is well specified and terminates on a zero, and you take that into consideration, so it can't be because of the input.
The error is most likely because of usage of more memory than allowed. In your problem the memory limit is specified as 256 MB. But in your code
sieve = range(3, n+1, 2)
This line creates a list of about n/2 elements. When n = 10^8, that means a list of 5*10^7 integers, which with a naive approximation, ignoring all overheads, takes
(5*10^7)*4 bytes
~ 200 MB
Including the overheads and other memory usage for your second big list declaration
[0] * -((bottom-top)//si)
which can reach about 130 MB neglecting all the overheads, you will exceed the memory limit just storing that many integers. I noticed memory usage of about 1 GB from your code on my machine, so your code crosses the memory limit on SPOJ and an exception is raised.
The best thing to do is to optimize your approach; declaring lists on the order of 10^8 elements is seldom needed in such questions. I can see a way in which you won't need to declare a list that big, but since this is a question from an online judge, it's best to let you figure out the approach. :)

Unexpected performance curve from CPython merge sort

I have implemented a naive merge sorting algorithm in Python. Algorithm and test code is below:
import time
import random
import matplotlib.pyplot as plt
import math
from collections import deque

def sort(unsorted):
    if len(unsorted) <= 1:
        return unsorted
    to_merge = deque(deque([elem]) for elem in unsorted)
    while len(to_merge) > 1:
        left = to_merge.popleft()
        right = to_merge.popleft()
        to_merge.append(merge(left, right))
    return to_merge.pop()

def merge(left, right):
    result = deque()
    while left or right:
        if left and right:
            elem = left.popleft() if left[0] > right[0] else right.popleft()
        elif not left and right:
            elem = right.popleft()
        elif not right and left:
            elem = left.popleft()
        result.append(elem)
    return result

LOOP_COUNT = 100
START_N = 1
END_N = 1000

def test(fun, test_data):
    start = time.clock()
    for _ in xrange(LOOP_COUNT):
        fun(test_data)
    return time.clock() - start

def run_test():
    timings, elem_nums = [], []
    test_data = random.sample(xrange(100000), END_N)
    for i in xrange(START_N, END_N):
        loop_test_data = test_data[:i]
        elapsed = test(sort, loop_test_data)
        timings.append(elapsed)
        elem_nums.append(len(loop_test_data))
        print "%f s --- %d elems" % (elapsed, len(loop_test_data))
    plt.plot(elem_nums, timings)
    plt.show()

run_test()
As far as I can see, everything is OK and I should get a nice N*log(N) curve as a result. But the picture differs a bit:
Things I've tried to investigate the issue:
PyPy. The curve is ok.
Disabled the GC using the gc module. Wrong guess. Debug output showed that it doesn't even run until the end of the test.
Memory profiling using meliae - nothing special or suspicious.
I had another implementation (a recursive one using the same merge function), and it behaves the same way. The more full test cycles I run, the more "jumps" there are in the curve.
So how can this behaviour be explained and - hopefully - fixed?
UPD: changed lists to collections.deque
UPD2: added the full test code
UPD3: I use Python 2.7.1 on Ubuntu 11.04, on a quad-core 2GHz notebook. I tried to turn off most other processes: the number of spikes went down, but at least one of them was still there.
You are simply picking up the impact of other processes on your machine.
You run your sort function 100 times for input size 1 and record the total time spent on this. Then you run it 100 times for input size 2, and record the total time spent. You continue doing so until you reach input size 1000.
Let's say once in a while your OS (or you yourself) start doing something CPU-intensive. Let's say this "spike" lasts as long as it takes you to run your sort function 5000 times. This means that the execution times would look slow for 5000 / 100 = 50 consecutive input sizes. A while later, another spike happens, and another range of input sizes look slow. This is precisely what you see in your chart.
I can think of one way to avoid this problem. Run your sort function just once for each input size: 1, 2, 3, ..., 1000. Repeat this process 100 times, using the same 1000 inputs (it's important, see explanation at the end). Now take the minimum time spent for each input size as your final data point for the chart.
That way, the spikes should affect each input size only a few times out of the 100 runs; and since you're taking the minimum, they will likely have no impact on the final chart at all.
If your spikes are really really long and frequent, you of course might want to increase the number of repetitions beyond the current 100 per input size.
Looking at your spikes, I notice the execution slows down exactly 3 times during a spike. I'm guessing the OS gives your python process one slot out of three during high load. Whether my guess is correct or not, the approach I recommend should resolve the issue.
EDIT:
I realized that I didn't clarify one point in my proposed solution to your problem.
Should you use the same input in each of your 100 runs for the given input size? Or should use 100 different (random) inputs?
Since I recommended taking the minimum of the execution times, the inputs should be the same (otherwise you'll get an incorrect result, as you'll be measuring the best-case complexity instead of the average complexity!).
But when you take the same inputs, you create some noise in your chart since some inputs are simply faster than others.
So a better solution is to resolve the system load problem, without creating the problem of only one input per input size (this is obviously pseudocode):
# pseudocode -- generate_random_input() and get_execution_time() are placeholders
from collections import defaultdict
import random

seed = 'choose whatever you like'
repeats = 4
inputs_per_size = 25
runtimes = defaultdict(lambda: float('inf'))

for r in range(repeats):
    random.seed(seed)
    for i in range(inputs_per_size):
        for n in range(1000):
            input = generate_random_input(size=n)
            execution_time = get_execution_time(input)
            if runtimes[(n, i)] > execution_time:
                runtimes[(n, i)] = execution_time

for n in range(1000):
    runtimes[n] = sum(runtimes[(n, i)] for i in range(inputs_per_size)) / inputs_per_size
Now you can use runtimes[n] to build your plot.
Of course, depending on how noisy your system is, you might change (repeats, inputs_per_size) from (4, 25) to, say, (10, 10), or even (25, 4).
I can reproduce the spikes using your code:
You should choose an appropriate timing function (time.time() vs. time.clock() -- from timeit import default_timer), a number of repetitions per test (how long each test takes), and a number of tests to choose the minimal time from. That gives you better precision and less external influence on the results. Read the note from the timeit.Timer.repeat() docs:
It’s tempting to calculate mean and standard deviation from the result
vector and report these. However, this is not very useful. In a
typical case, the lowest value gives a lower bound for how fast your
machine can run the given code snippet; higher values in the result
vector are typically not caused by variability in Python’s speed, but
by other processes interfering with your timing accuracy. So the min()
of the result is probably the only number you should be interested in.
After that, you should look at the entire vector and apply common
sense rather than statistics.
timeit module can choose appropriate parameters for you:
$ python -mtimeit -s 'from m import testdata, sort; a = testdata[:500]' 'sort(a)'
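The same measurement can be done from inside a script; a rough equivalent of the command line above (a sketch only -- m is the module holding the question's sort() and testdata, exactly as in the command):

import timeit

def time_sort(n, repeat=5, number=100):
    timer = timeit.Timer('sort(a)',
                         setup='from m import testdata, sort; a = testdata[:%d]' % n)
    # per the docs quoted above, the min() of the repeats is the number to look at
    return min(timer.repeat(repeat=repeat, number=number)) / number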
Here's timeit-based performance curve:
The figure shows that sort() behavior is consistent with O(n*log(n)):
|------------------------------+-------------------|
| Fitting polynom | Function |
|------------------------------+-------------------|
| 1.00 log2(N) + 1.25e-015 | N |
| 2.00 log2(N) + 5.31e-018 | N*N |
| 1.19 log2(N) + 1.116 | N*log2(N) |
| 1.37 log2(N) + 2.232 | N*log2(N)*log2(N) |
To generate the figure I've used make-figures.py:
$ python make-figures.py --nsublists 1 --maxn=0x100000 -s vkazanov.msort -s vkazanov.msort_builtin
where:
# adapt sorting functions for make-figures.py
def msort(lists):
    assert len(lists) == 1
    return sort(lists[0])  # `sort()` from the question

def msort_builtin(lists):
    assert len(lists) == 1
    return sorted(lists[0])  # builtin
Input lists are described here (note: the input is sorted, so the builtin sorted() function shows the expected O(N) performance).
