This question already has answers here:
Sieve of Eratosthenes - Finding Primes Python
(22 answers)
Closed 7 years ago.
I'm fairly new to programming and I decided to do some exercises to improve my abilities. I'm stuck with an exercise: "Find the sum of all the primes below two million." My code is too slow.
Initially, I tried to solve as a normal prime problem and ended up with this:
sum = 2 + 3
for i in range(5, 2000000, 2):
    for j in range(3, i, 2):
        if i % j == 0:
            break
    else:
        sum += i
print(sum)
In this way, all even numbers are excluded from the loop. But it did not solve my problem; the magnitude here is really big.
So I tried to understand what was happening with this code. I have a loop inside a loop, and the inner loop runs roughly as many times as the outer loop's index (not quite, because the range doesn't start from 0), right? So when I try to find the primes under 20, the outer loop runs 8 times, but the inner loop about 60 times in total (I don't know if this math is correct; as I said, I'm quite new to programming). But when I use it with 2,000,000, I'm running the inner loop something like 999,993,000,012 times in total, and that is madness.
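Rather than estimating by hand, the number of inner-loop iterations in the worst case (when i is prime, so the inner loop never breaks early) can be counted directly; this little helper is only my own way of checking the estimate, not part of the exercise:

```python
def worst_case_checks(n):
    # one term per odd i: how many odd j in range(3, i, 2) the inner loop
    # would visit if it never hit the break
    return sum(len(range(3, i, 2)) for i in range(5, n, 2))

print(worst_case_checks(20))         # the small case discussed above
print(worst_case_checks(2_000_000))  # the full problem size
```

The real count is lower (composites break out early), but the worst case already shows why the trial-division version is hopeless at this size.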
My friend told me about the Sieve of Eratosthenes, and I tried to create a new code:
list = [2]
list.extend(range(3, 2000000, 2))
for i in list:
    for j in list:
        if j % i == 0 and j > i:
            list.remove(j)
print(sum(list))
And that's what I achieved trying to simulate the sieve (ignoring even numbers helped). It's a lot faster (with the other code it would take a long time to find the primes under 200,000; with this one I can), but it is still not enough to handle 2,000,000 in a reasonable time. The code has been running in the background since I started writing this, and still nothing. I don't know how many times this thing is looping, and I'm too tired to think about it now.
I came here to ask for help. Why is it so slow? What should I learn/read/do to improve my code? Is there any other method more efficient than this sieve? Thank you for your time.
Because list.remove is an O(n) operation, and you're doing it a lot. Also, you're not performing a true sieve, just trial division in disguise: you're still doing all the remainder testing you did in the original code.
A Sieve of Eratosthenes is typically implemented with an array of flags; in the simplest form, each index corresponds to the number itself, and the value is initially True for every index except 0 and 1. You iterate along, and when you find a True value, you set all indices that are multiples of it to False. This means the work is sequential addition, not multiplication or division (which are much more expensive).
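A minimal sketch of that flag-array sieve (the function name is mine, not from the thread):

```python
def prime_sum(limit):
    # is_prime[i] starts True for every i >= 2; we cross off composites
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            # every multiple of i from i*i upward is composite; marking them
            # is pure stepping by i -- no division or remainder anywhere
            for j in range(i * i, limit, i):
                is_prime[j] = False
    return sum(i for i, flag in enumerate(is_prime) if flag)

print(prime_sum(2_000_000))  # sums every prime below two million, well under a second
```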
Related
I'm currently trying to solve the 'dance recital' kattis challenge in Python 3. See here
After taking input on how many performances there are in the dance recital, you must arrange performances in such a way that sequential performances by a dancer are minimized.
I've seen this challenge completed in C++, but my code kept running out of time and I wanted to optimize it.
Question: As of right now, I generate all possible permutations of the performances and run comparisons on them. A faster way would be to not generate all permutations, since some of them are simply reversals of others and would produce the exact same output.
import itertools
print(list(itertools.permutations(range(2)))) --> [(0,1),(1,0)] #They're the same, backwards and forwards
print(magic_algorithm(range(2))) --> [(0,1)] #This is what I want
How might I generate such a list of permutations?
I've tried:
-Generating all permutations, then running over them again to remove reversed() duplicates and saving the result. This takes too long, and the result cannot be hard-coded into the solution because the file becomes too big.
-Only generating permutations up to the halfway mark, then stopping, assuming that no unique permutations are generated after that (not true, as I found out).
-Checking out questions here and on the web, but no one seems to have the same question as me.
Here's my current code:
from itertools import permutations

number_of_routines = int(input())  # first line is number of routines
dance_routine_list = [0] * 10
permutation_list = list(permutations(range(number_of_routines)))  # generate permutations

for q in range(number_of_routines):
    s = input()
    for c in s:
        v = ord(c) - 65
        dance_routine_list[q] |= (1 << v)  # each routine, e.g. 'ABC', is A-Z where each char represents a performer in the routine

def calculate():
    least_changes_possible = 1e9  # this will become smaller, as optimizations are found
    for j in permutation_list:
        tmp = 0
        for i in range(1, number_of_routines):
            tmp += bin(dance_routine_list[j[i]] & dance_routine_list[j[i - 1]]).count('1')  # each 1 represents a performer who must complete sequential routines
        least_changes_possible = min(least_changes_possible, tmp)
    return least_changes_possible

print(calculate())
Edit: Took a shower and decided adding a 2-element-comparison look-up table would speed it up, as many of the operations are repeated. Still doesn't fix iterating over the whole permutations, but it should help.
Edit: Found another thread that answered this pretty well. How to generate permutations of a list without "reverse duplicates" in Python using generators
Thank you all!
There are at most 10 possible dance routines, so at most 3.6M permutations, and even bad algorithms like generate 'em all and test will be done very quickly.
If you wanted a fast solution for up to 24 or so routines, then I would do it like this...
Given the R dance routines, at any point in the recital, in order to decide which routine you can perform next, you need to know:
Which routines you've already performed, because you can't do those ones next. There are 2^R possible sets of already-performed routines; and
Which routine was performed last, because that helps determine the cost of the next one. There are at most R possible values for that.
So there are fewer than (R+1)*2^R possible recital states...
Imagine a directed graph that connects each possible state to all the possible following states, by an edge for the routine that you would perform to get to that state. Label each edge with the cost of performing that routine.
For example, if you've performed routines 5 and 6, with 5 last, then you would be in state (5,6):5, and there would be an edge to (3,5,6):3 that you could get to after performing routine 3.
Starting at the initial "nothing performed yet" state ():-, use Dijkstra's algorithm to find the least cost path to a state with all routines performed.
Total complexity is O(R^2 * 2^R) or so, depending on exactly how you implement it.
For R=10, R^2 * 2^R is ~100,000, which won't take very long at all. For R=24 it's about 9 billion, which should take under half a minute in pretty good C++.
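The answer describes a Dijkstra search over that state graph; since the graph is ordered by set size, the same state space can also be swept with a plain dynamic program. The sketch below is my own (the function name and the bitmask representation are assumptions, not the answerer's code):

```python
def min_quick_changes(routines):
    """routines: list of bitmasks, one bit per performer (bit 0 = 'A', etc.)."""
    R = len(routines)
    INF = float('inf')
    # cost[s][last] = least total overlap over any order that performs
    # exactly the set s of routines and ends with routine `last`
    cost = [[INF] * R for _ in range(1 << R)]
    for r in range(R):
        cost[1 << r][r] = 0
    for s in range(1 << R):
        for last in range(R):
            c = cost[s][last]
            if c == INF:
                continue
            for nxt in range(R):
                if s >> nxt & 1:
                    continue  # routine nxt already performed
                # performers appearing in both consecutive routines
                overlap = bin(routines[last] & routines[nxt]).count('1')
                ns = s | (1 << nxt)
                if c + overlap < cost[ns][nxt]:
                    cost[ns][nxt] = c + overlap
    return min(cost[(1 << R) - 1])  # best over all possible final routines
```

This touches O(R^2 * 2^R) transitions, matching the estimate above; for example, with routines A, AB, and B, the order A, B, AB needs only one quick change.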
I wrote the program below for one of the questions asked on an interview preparation website. I would like to know whether this code is efficient or not, and how we can improve it if it is not.
My problem
Write a function that takes an integer flight_length (in minutes) and a list of integers movie_lengths (in minutes) and returns a boolean indicating whether there are two numbers in movie_lengths whose sum equals flight_length.
When building your function:
Assume your users will watch exactly two movies
Don't make your users watch the same movie twice
Optimize for runtime over memory
def movie_length(fligh_length, movietimes):
    newmovietimes = []
    for time in movietimes[0::]:
        if time not in newmovietimes:
            newmovietimes.append(time)
        else:
            print("movie times are equal")
    if fligh_length == sum(newmovietimes):
        return print("you can watch")
    else:
        return print("you can't watch")

movie_length(11, [8, 2])
The loop to reduce times to only unique times would certainly be more efficient with newmovietimes = set(movietimes) but this is a bug actually; if there are two 5-minute movies, you can watch them back to back in 10 minutes.
Your code doesn't solve the full problem if I understand it correctly; you should find two movies from a list of movie times, and if there are multiple combinations which satisfy the constraint, propose them all. Like for (10, [4, 5, 6, 5]) you should find the combinations 6+4 and 5+5.
(In the end you only return a boolean to say whether you found an answer, so you can quit as soon as you find one; but your code should easily extend to find all permutations if necessary - that's what it takes to solve this completely anyway.)
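Since the exercise says to optimize for runtime, the usual approach is a single pass with a set of lengths already seen; this sketch is mine, not the original poster's code:

```python
def can_fill_flight(flight_length, movie_lengths):
    seen = set()  # lengths of the movies scanned so far
    for length in movie_lengths:
        if flight_length - length in seen:
            return True  # an earlier, *different* movie completes the pair
        seen.add(length)
    return False
```

Because each movie is only matched against earlier movies, two 5-minute films on a 10-minute flight correctly count as a valid pair, while a single 5-minute film does not.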
I've got a problem in Python:
I want to find how many UNIQUE a**b values exist if:
2 ≤ a ≤ 100 and 2 ≤ b ≤ 100?
I wrote the following script, but it's too slow on my laptop (and doesn't even produce a result):
List = []
a = 2
b = 2
c = pow(a, b)
while b != 101:
    while a != 101:
        if List.count(c) == 0:
            List.append(c)
            a += 1
    b += 1
print len(List)
Is it good? Why is it slow?
This code doesn't work: it's an infinite loop, because of the way you don't increment a on every iteration of the inner loop. After you fix that, you still won't get the right answer, because you never reset a to 2 before b moves on to its next value.
Then, List will only ever contain 4, because you set c to 2 ** 2 outside the loop and never change it inside. And when you fix that, it'll still be slower than it needs to be, because you read through the entire list each time to get the count, and as the list grows, that takes more and more time.
You should generally use in rather than count if you just need to know whether an item is in a list, since it stops as soon as it finds the item; but in this specific case you should be using a set anyway, since you are looking for unique values. You can just add to the set without checking whether the item is already in it.
Finally, using for loops is more readable than using while loops.
result = set()
for a in xrange(2, 101):
    for b in xrange(2, 101):
        result.add(a ** b)
print len(result)
This takes less than a second on my machine.
The reason your script is slow and doesn't return a value is that you have created an infinite loop. You need to dedent the a += 1 line by one level; otherwise, after the first time through the inner while loop, a will not get incremented again.
There are some additional issues with the script that have been pointed out in the comments, but this is what is responsible for the problems you are experiencing.
Your code is not good, since it does not produce correct results. As the comment by #grael pointed out, you do not recalculate the value of c inside the loop, so you are counting only one value over and over again. There are also other problems, as other people have noted.
Your code is not fast for several reasons.
You are using a brute-force method. The answer can be found more simply by using number theory and combinatorics. Look at the prime factorization of each number between 2 and 100 and consider the prime factorization of each power of that number. You never need to calculate the complete number--the prime factorization is enough. I'll leave the details to you but this would be much faster.
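One way to act on that idea, as a sketch (the canonical form here uses the smallest base r with r**k == a rather than a full prime factorization; all names are my own, and this is not the only possible shortcut):

```python
def min_base(a):
    # smallest r such that r**k == a for some k >= 1; returns (r, k)
    for r in range(2, a + 1):
        k, p = 1, r
        while p < a:
            p *= r
            k += 1
        if p == a:
            return r, k

def count_distinct_powers(lo, hi):
    # a**b == r**(k*b), so two powers collide exactly when their
    # (minimal base, total exponent) pairs match
    seen = set()
    for a in range(lo, hi + 1):
        r, k = min_base(a)
        for b in range(lo, hi + 1):
            seen.add((r, k * b))
    return len(seen)
```

The full numbers are never computed, only small (base, exponent) pairs, which is the point of the number-theoretic shortcut.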
You are rolling your own loop, but it is faster to use python's. Loop a and b with:
for a in range(2, 101):
    for b in range(2, 101):
        c = pow(a, b)
        # other code here
This code uses the built-in capabilities of the language and should be faster. This also avoids your errors since it is simpler.
You use a very slow method to see if a number has already been calculated. Your if List.count(c) == 0 must check every previous number to see if the current number has been seen. This will become very slow when you have already seen thousands of numbers. It is much faster to keep the already-seen numbers in a set rather than a list. Checking if a number is in a set is much faster than using count() on a list.
Try combining all these suggestions. As another answer shows, just using the last two probably suffices.
Again I have a question concerning large loops.
Suppose I have a function limits:

def limits(a, b):
    # evaluate the integral with upper and lower limits a and b
    return result  # a float
A and B are simple np.arrays that store my values a and b. Now I want to evaluate the integral 300,000^2/2 times, because A and B each have length 300,000 and the integral is symmetric.
In Python I tried several ways, like itertools.combinations_with_replacement, to create the combinations of A and B and then feed them into the integral, but that takes a huge amount of time and completely overloads the memory.
Is there any way, for example transferring the loop in another language, to speed this up?
I would like to run the loop

for i in range(len(A)):
    for j in range(len(B)):
        np.histogram(limits(A[i], B[j]))

I think histogramming the return value of limits is desirable in order not to store additional arrays that grow quadratically.
From what I read, Python is not really the best choice for this kind of iterative approach.
So would it be reasonable to evaluate this loop in another language from within Python, and if so, how? I know there are ways to transfer code, but I have never done it so far.
Thanks for your help.
If you're worried about memory footprint, all you need to do is bin the results as you go in the for loop.
num_bins = 100
bin_upper_limits = np.linspace(-456, 456, num=num_bins-1)
# (the last bin has no upper limit; it goes from 456 to infinity)
bin_count = np.zeros(num_bins)
for a in A:
    for b in B:
        if b < a:
            # you said the integral is symmetric, so we can skip these, right?
            continue
        new_result = limits(a, b)
        which_bin = np.digitize([new_result], bin_upper_limits)
        bin_count[which_bin] += 1
So nothing large is saved in memory.
As for speed, I imagine that the overwhelming majority of the time is spent evaluating limits(a,b). The looping and binning are plenty fast in this case, even in Python. To convince yourself of this, try replacing the line new_result = limits(a,b) with new_result = 234. You'll find that the loop runs very fast. (A few minutes on my computer, much, much less than the 4-hour figure you quoted.) Python does not loop very fast compared to C, but it doesn't matter in this case.
Whatever you do to speed up the limits() call (including implementing it in another language) will speed up the program.
If you change the algorithm, there is vast room for improvement. Let's take an example of what it seems you're doing. Let's say A and B are 0,1,2,3. You're integrating a function over the ranges 0-->0, 0-->1, 1-->1, 1-->2, 0-->2, etc. etc. You're re-doing the same work over and over. If you have integrated 0-->1 and 1-->2, then you can add up those two results to get the integral 0-->2. You don't have to use a fancy integration algorithm, you just have to add two numbers you already know.
Therefore it seems to me that you can compute integrals in all the smallest ranges (0-->1, 1-->2, 2-->3), store the results in an array, and add subsets of the results to get the integral over whatever range you want. If you want this program to run in a few minutes instead of 4 hours, I suggest thinking through an alternative algorithm along those lines.
(Sorry if I'm misunderstanding the problem you're trying to solve.)
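The add-up-the-pieces idea above can be sketched with a prefix sum over the smallest segments (pure standard library; the names, and the assumption that limits(a, b) integrates one fixed function, are mine):

```python
from itertools import accumulate

def make_range_integrator(points, limits):
    # points: sorted sequence of limits; limits(a, b) integrates f from a to b
    # integrate each adjacent segment exactly once
    seg = [limits(points[k], points[k + 1]) for k in range(len(points) - 1)]
    # prefix[k] = integral from points[0] to points[k]
    prefix = [0.0] + list(accumulate(seg))
    def between(i, j):
        # integral from points[i] to points[j] (i <= j) is now a subtraction
        return prefix[j] - prefix[i]
    return between
```

With 300,000 sorted points that is 299,999 real integrations up front, after which any of the ~4.5*10^10 pairs costs a single subtraction.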
I'm trying to solve problem 50 on Project Euler. Don't give me the answer or solve it for me, just try to answer this specific question.
The goal is to find the longest sum of consecutive primes that adds to a prime below one million. I wrote a sieve to find all the primes below n, and I have confirmed that it is correct. Next, I am going to check the sum of each subset of consecutive primes using the following method:
I have an empty list sums. For each prime number, I add it to each element in sums and check the new sum; then I append the prime to sums.
Here it is in python
primes = allPrimesBelow(1000000)
sums = []
for p in primes:
    for i in range(len(sums)):
        sums[i] += p
        check(sums[i])
    sums.append(p)
I want to know if I have called check() for every sum of two or more consecutive primes below one million.
The problem says that there is a prime, 953, that can be written as the sum of 21 consecutive primes, but I am not finding it.
Your code is correct. I ran it and it does generate the number 953, so the problem is probably with your prime generating function. There should be 78498 primes below a million - you may want to check if you get that result.
That said, your code will take a long time to run, since it will call check() 3,080,928,753 times. You may want to find a method that checks fewer sums. I won't expand on this since you asked for no spoilers, but let me know if you're interested in general hints.
I don't have a straight answer off the top of my head, but have you tried making sums into a nested array and then appending primes p to the sub-arrays instead of adding them to a summation counter? That would let you visually check which primes were being added to each sub-array, and by extension would tell you which primes the original code was summing up.