l = [[i, i, 1] for i in range(1, 1000000)]

def collatz(li):
    for el in li:
        if el[1] == 1:
            li.remove(el)
        elif el[1] % 2 == 0:
            el[1] = el[1] / 2
            el[2] += 1
        elif el[1] % 2 == 1:
            el[1] = 3 * el[1] + 1
            el[2] += 1
    return li

while len(collatz(l)) >= 2:
    l = collatz(l)
print l
Hi, this is a (partial) solution to Project Euler problem 14, written in Python.
Longest Collatz sequence
Problem 14
The following iterative sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts the terms are allowed to go above one million.
I wrote "partial" because it does not really output the solution: I can't run it over the whole 1-1000000 range. It's way too slow; it had been running for more than 20 minutes the last time I killed the process. I have only just started with Python and programming in general (about two weeks), and I am looking to understand the obvious mistake I am making in terms of efficiency. I googled some solutions, and even the average ones are orders of magnitude faster than mine. So what am I missing here? Any pointers to literature that would help me avoid making the same mistakes in the future?
A little improvement upon Sara's answer:
import time

start = time.time()

def collatz(n):
    length = 1
    nList = [n]
    while n != 1:
        if n not in dic:
            n = collatzRule(n)
            nList.append(n)
            length += 1
        else:
            # We don't need the values, but we do need the real length for the for-loop below.
            nList.extend([None for _ in range(dic[n] - 1)])
            length = (length - 1) + dic[n]
            break
    for seq in nList:
        if seq not in dic:
            dic[seq] = len(nList) - nList.index(seq)
    return length

def collatzRule(n):
    if n % 2 == 0:
        return n // 2
    else:
        return 3 * n + 1

longestLen = 0
longestNum = 0
dic = {}

for n in range(2, 1000001):
    prsntLen = collatz(n)
    if prsntLen > longestLen:
        longestLen = prsntLen
        longestNum = n
    # print(f'{n}: {prsntLen}')

print(f'The starting num is: {longestNum} with the longest chain having: {longestLen} terms.')
print(f'time taken: {time.time() - start}')
Sara's answer is great, but can be more efficient. If the value we return from the function is len(seq), why not just count the number of iterations instead of building a list first? I changed the code slightly, and the performance improvement is significant:
def collatz(x):
    count = 1
    temp = x
    while temp > 1:
        if temp % 2 == 0:
            temp = temp // 2
            if temp in has2:  # compute the next term, then check the cache
                count += has2[temp]
                break
            else:
                count += 1
        else:
            temp = 3 * temp + 1
            if temp in has2:
                count += has2[temp]
                break
            else:
                count += 1
    has2[x] = count
    return count
837799 has 525 elements. calculation time =1.97099995613 seconds.
Compared to the original version:
837799 has 525 elements. calculation time =11.3389999866 seconds.
Counting with an int rather than building the whole list is ~80% faster.
The problem is that you are using an inefficient brute-force algorithm. This is my solution to problem 14 from Project Euler; it takes a few seconds to run. The key is to save previous results in a dictionary so you don't have to compute those results again:
# Problem 14, Project Euler
import time

start = time.time()
has2 = {}

def collatz(x):
    seq = []
    seq.append(x)
    temp = x
    while temp > 1:
        if temp % 2 == 0:
            temp = int(temp / 2)
            if temp in has2:
                seq += has2[temp]
                break
            else:
                seq.append(temp)
        else:
            temp = 3 * temp + 1
            if temp in has2:
                seq += has2[temp]
                break
            else:
                seq.append(temp)
    has2[x] = seq
    return len(seq)

num = 0
greatest = 0
for i in range(1000000):
    c = collatz(i)
    if num < c:
        num = c
        greatest = i
print('{0} has {1} elements. calculation time ={2} seconds.'.format(greatest, num, time.time() - start))
As Sara says, you could use a dictionary to save previous results and then look them up to make the program run faster. But I don't quite understand your results; taking more than 20 minutes sounds like something else is wrong.

Using brute force, I get the code to run in about 16 seconds.
#!/bin/python3
########################
#  Collatz Conjecture  #
#  Written by jeb 2015 #
########################
import time

current = 0
high = 0

# While the number is not one, either divide it by 2
# or multiply by 3 and add one.
# Returns the number of iterations.
def NonRecursiveCollatz(i):
    counter = 1
    while i != 1:
        counter = counter + 1
        if i % 2 == 0:
            i = i // 2
        else:
            i = 3 * i + 1
    return counter

time_start = time.time()

# Test all numbers between 1 and 1,000,000.
# If the returned length is higher than the last one, store it and remember
# what number we used as input to the function.
for i in range(1, 1000000):
    current = NonRecursiveCollatz(i)
    if current > high:
        high = current
        number = i

elapsed_time = time.time() - time_start

print("Highest chain")
print(high)
print("From number ")
print(number)
print("Time taken ")
print(elapsed_time)
With the output:
Highest chain
525
From number
837799
Time taken
16.730340004
// Longest Collatz Sequence
public class Problem14 {

    static long getLength(long numb) {
        long length = 0;
        for (long i = numb; i >= 1;) {
            length++;
            if (i == 1)
                break;
            if (i % 2 == 0)
                i = i / 2;
            else
                i = (3 * i) + 1;
        }
        return length;
    }

    static void solution(long numb) {
        long number = numb;
        long maxLength = getLength(number);
        for (long i = numb; i >= 1; i--) {
            if (getLength(i) >= maxLength) {
                maxLength = getLength(i);
                number = i;
            }
        }
        System.out.println("Length of " + number + " is : " + maxLength);
    }

    public static void main(String args[]) {
        long begin = System.currentTimeMillis();
        solution(1000000);
        long end = System.currentTimeMillis();
        System.out.println("Time : " + (end - begin));
    }
}
Output:
Length of 837799 is : 525
Time : 502
Related
What's the goal?
My goal is to find all Armstrong/narcissistic numbers in hex for a given number of digits.
The basic idea
The basic idea is that for a set of digits, e.g. [A, 3, F, 5], the sum of powers is always the same no matter the order in which the digits occur. That means we don't have to look at every possible number up to our maximum, which should greatly reduce the runtime.
What I have so far
# Armstrong numbers base 16 for n digits
import time
import itertools
from collections import Counter

pows = [[]]

def genPow(max, base):
    global pows
    pows = [[0] * 1 for i in range(base)]
    for i in range(base):
        pows[i][0] = i ** max

def check(a, b):
    c1 = Counter(a)
    c2 = Counter(b)
    diff1 = c1 - c2
    diff2 = c2 - c1
    # Check if elements in both 'sets' are equal in occurrence
    return (diff1 == diff2)

def armstrong(digits):
    results = []
    genPow(digits, 16)
    # Generate all combinations without consideration of order
    for set in itertools.combinations_with_replacement('0123456789abcdef', digits):
        sum = 0
        # Generate the sum for every 'digit' in the set
        for digit in set:
            sum = sum + pows[int(digit, 16)][0]
        # Convert to hex
        hexsum = format(sum, 'x')
        # No point in comparing if the length isn't the same
        if len(hexsum) == len(set):
            if check(hexsum, set):
                results.append(hexsum)
    return sorted(results)

start_time = time.time()
print(armstrong(10))
print("--- %s seconds ---" % (time.time() - start_time))
My problem
My issue is that this is still rather slow: it takes up to ~60 seconds for 10 digits. I'm pretty sure there are ways to do this more efficiently. Some things I can think of, but don't know how to do, are: a faster way to generate the combinations, a condition for stopping the calculation of the sum early, a better way to compare the sum and the set, and converting to hex only after comparing.
Any ideas how to optimize this?
Edit: I tried to compare/check a bit differently, and it's already a bit faster this way: https://gist.github.com/Claypaenguin/d657c4413b510be580c1bbe3e7872624. Meanwhile, I'm trying to understand the recursive approach below, because it looks like it will be a lot faster.
Your problem is that combinations_with_replacement for base b and length l returns (b+l-1 choose l) different things. Which in your case (base 16, length 10) means that you have 3,268,760 combinations.
Each of which you then do a heavyweight calculation on.
What you need to do is filter the combinations that you are creating as you are creating them. As soon as you realize that you are not on the way to an Armstrong number, abandon that path. The calculation will seem more complicated, but it is worthwhile when it lets you skip over whole blocks of combinations without having to individually generate them.
Here is pseudocode for the heart of the technique:
# Recursive search for Armstrong numbers with:
#
#   base         = base of desired number
#   length       = length of desired number
#   known_digits = already chosen digits (not in order)
#   max_digit    = the largest digit we are allowed to add
#
# The base case is that we are past or at a solution.
#
# The recursive cases are that we lower max_digit, or add max_digit to known_digits.
#
# When we add max_digit we compute min/max sums. Looking at those we
# stop searching if our min_sum is too big or our max_sum is too small.
# We then look for leading digits in common. This may let us discover
# more digits that we need. (And if they are too big, we can't do that.)
def search(base, length, known_digits, max_digit):
    digits = known_digits.copy()  # Be sure we do not modify the original.
    answer = []
    if length < len(digits):
        # We can't have any solutions.
        return []
    elif length == len(digits):
        if digits is a solution:
            return [digits]
        else:
            return []
    elif 0 < max_digit:
        answer = search(base, length, digits, max_digit - 1)
        digits.append(max_digit)

    # We now have some answers, and known_digits. Can we find more?
    find min_sum (all remaining digits are 0)
    if min_sum < base**(length-1):
        min_sum = base**(length-1)
    find max_sum (all remaining digits are max_digit)
    if base**length <= max_sum:
        max_sum = base**length - 1
    # Is there a possible answer between them?
    if max_sum < base**(length-1) or base**length <= min_sum:
        return answer  # can't add more
    else:
        min_sum_digits = base_digits(min_sum, base)
        max_sum_digits = base_digits(max_sum, base)
        common_leading_digits = what digits are in common?
        new_digits = what digits in common_leading_digits can't be found in our known_digits?
        if 0 == len(new_digits):
            return answer + search(base, length, digits, max_digit)
        elif max_digit < max(new_digits):
            # Can't add this digit
            return answer
        else:
            digits.extend(new_digits)
            return answer + search(base, length, digits, max_digit)
I had a small logic error, but here is working code:
def in_base(n, b):
    answer = []
    while 0 < n:
        answer.append(n % b)
        n = n // b
    return answer

def powers(b, length, cached={}):
    if (b, length) not in cached:
        answer = []
        for i in range(b):
            answer.append(i**length)
        cached[(b, length)] = answer
    return cached[(b, length)]

def multiset_minus(a, b):
    count_a = {}
    for x in a:
        if x not in count_a:
            count_a[x] = 1
        else:
            count_a[x] += 1
    minus_b = []
    for x in b:
        if x in count_a:
            if 1 == count_a[x]:
                count_a.pop(x)
            else:
                count_a[x] -= 1
        else:
            minus_b.append(x)
    return minus_b

def armstrong_search(length, b, max_digit=None, known=None):
    if max_digit is None:
        max_digit = b - 1
    elif max_digit < 0:
        return []
    if known is None:
        known = []
    else:
        known = known.copy()  # Be sure not to accidentally share
    if len(known) == length:
        base_rep = in_base(sum([powers(b, length)[x] for x in known]), b)
        if 0 == len(multiset_minus(known, base_rep)):
            return [base_rep]
        else:
            return []
    elif length < len(known):
        return []
    else:
        min_sum = sum([powers(b, length)[x] for x in known])
        max_sum = min_sum + (length - len(known)) * powers(b, length)[max_digit]
        if min_sum < b**(length - 1):
            min_sum = b**(length - 1)
        elif b**length < min_sum:
            return []
        if b**length < max_sum:
            max_sum = b**length - 1
        elif max_sum < b**(length - 1):
            return []
        min_sum_rep = in_base(min_sum, b)
        max_sum_rep = in_base(max_sum, b)
        common_digits = []
        for i in range(length - 1, -1, -1):
            if min_sum_rep[i] == max_sum_rep[i]:
                common_digits.append(min_sum_rep[i])
            else:
                break
        new_digits = multiset_minus(known, common_digits)
        if 0 == len(new_digits):
            answers = armstrong_search(length, b, max_digit - 1, known)
            known.append(max_digit)
            answers.extend(armstrong_search(length, b, max_digit, known))
            return answers
        else:
            known.extend(new_digits)
            return armstrong_search(length, b, max_digit, known)
And for a quick example:
digits = list('0123456789abcdef')
print([''.join(reversed([digits[i] for i in x])) for x in armstrong_search(10, len(digits))])
Takes a little over 2 seconds to find that the only answer is bcc6926afe.
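A quick one-liner to sanity-check that result (my own verification snippet, not part of the answer's code):

# Confirm bcc6926afe is narcissistic in base 16: the sum of the 10th powers
# of its digits should reproduce the number itself (expect True).
n = int("bcc6926afe", 16)
print(n == sum(int(d, 16) ** 10 for d in "bcc6926afe"))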
Since itertools's combinations_with_replacement returns the digit tuples in ascending order, it is more efficient to compare against a sorted list of the sum's digits.

Here's a general-purpose narcissistic number generator that uses that mode of comparison:
import string
import itertools

def narcissic(base=10, startSize=1, endSize=None):
    baseDigits = string.digits + string.ascii_uppercase + string.ascii_lowercase
    if not endSize:
        endSize = 1
        while (base / (base - 1))**(endSize + 1) < base * (endSize + 1):
            endSize += 1

    def getDigits(N):
        result = []
        while N:
            N, digit = divmod(N, base)
            result.append(digit)
        return result[::-1]

    yield (0, "0")
    allDigits = [*range(base)]
    for size in range(startSize, endSize):
        powers = [i**size for i in range(base)]
        for digits in itertools.combinations_with_replacement(allDigits, size):
            number = sum(powers[d] for d in digits)
            numDigits = getDigits(number)
            if digits == tuple(sorted(numDigits)):
                baseNumber = "".join(baseDigits[d] for d in numDigits)
                yield number, baseNumber
Output of the following call:

for i, (n, bn) in enumerate(narcissic(5)): print(i+1, ":", n, "-->", bn)
1 : 0 --> 0
2 : 1 --> 1
3 : 2 --> 2
4 : 3 --> 3
5 : 4 --> 4
6 : 13 --> 23
7 : 18 --> 33
8 : 28 --> 103
9 : 118 --> 433
10 : 353 --> 2403
11 : 289 --> 2124
12 : 419 --> 3134
13 : 4890 --> 124030
14 : 4891 --> 124031
15 : 9113 --> 242423
16 : 1874374 --> 434434444
17 : 338749352 --> 1143204434402
18 : 2415951874 --> 14421440424444
Using timeit to compare performance, we get a 3.5x speed improvement:
from timeit import timeit

t = timeit(lambda: list(narcissic(16, 10, 11)), number=1)
print("narcissic", t)   # 11.006802322999999

t = timeit(lambda: armstrong(10), number=1)
print("armstrong:", t)  # 40.324530023
Note that the processing time increases exponentially with each new size, so a mere 3.5x speed boost will not be as meaningful as it seems: it only pushes the issue to the next size.
I am trying to count the number of unique numbers in a sorted array using binary search. I need to find the edge of each change from one number to the next in order to count. I was thinking of doing this without using recursion. Is there an iterative approach?
def unique(x):
    start = 0
    end = len(x) - 1
    count = 0
    # This is the current number we are looking for
    item = x[start]
    while start <= end:
        middle = (start + end) // 2
        if item == x[middle]:
            start = middle + 1
        elif item < x[middle]:
            end = middle - 1
        # when item is greater, change to the next number
        count += 1
        # if the number
    return count

unique([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5])
Thank you.
Edit: Even if the runtime benefit over O(n) is negligible, what is my binary search missing? It's confusing when not looking for an actual item. How can I fix this?
Working code exploiting binary search (returns 3 for the given example).

As discussed in the comments, the complexity is about O(k*log(n)), where k is the number of unique items, so this approach works well when k is small compared with n, and might become worse than a linear scan when k ~ n.
def countuniquebs(A):
    n = len(A)
    t = A[0]
    l = 1
    count = 0
    while l < n - 1:
        r = n - 1
        while l < r:
            m = (r + l) // 2
            if A[m] > t:
                r = m
            else:
                l = m + 1
        count += 1
        if l < n:
            t = A[l]
    return count

print(countuniquebs([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5]))
I wouldn't quite call it "using a binary search", but this binary divide-and-conquer algorithm works in O(k*log(n)/log(k)) time, which is better than a repeated binary search, and never worse than a linear scan:
def countUniques(A, start, end):
    length = end - start
    if length < 1:
        return 0
    if A[start] == A[end - 1]:
        return 1
    if length < 3:
        return 2
    mid = start + length // 2
    return countUniques(A, start, mid + 1) + countUniques(A, mid, end) - 1

A = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,4,5,5,5,5,5,5,5,5,5,5]
print(countUniques(A, 0, len(A)))
This is an interesting question that I came across in a coding challenge:
There are k cities and n days. A travel agent is going to show you city k on day n. You're supposed to find the minimum number of days in which you can visit all cities. You're allowed to visit cities more than once, but ideally you wouldn't want to, since you want to minimize the number of days.

Input: you're given an array of days and cities, where days are indices and cities are values.

A = [7,4,7,3,4,1,7]. So A[0] = 7 means the travel agent will show you city 7 on day 0, city 4 on day 1, etc.

So here, if you start out on day 0, you'll have visited all cities by day 5, but you can also start on day 2 and finish up on day 5.

Output: 4, because it took you 4 days to visit all the cities at least once.

My solution: I do have an O(N^2) solution that tries out all starting days. But the test said that the ideal time and space complexity should be O(N). How do I do this?
import sys

def findmin(A):
    hashtable1 = {}
    locationcount = 0
    # get the number of unique locations
    for x in A:
        if x not in hashtable1:
            hashtable1[x] = 1
            locationcount += 1
    index1 = 0
    daycount = sys.maxint
    hashtable2 = {}
    # brute force
    while index1 < len(A):
        index2 = index1
        prevday = index2
        ans = 0
        count1 = 0
        while index2 < len(A):
            if A[index2] not in hashtable2:
                count1 += 1
                ans += (index2 - prevday)
                hashtable2[A[index2]] = 1
            index2 += 1
            if count1 == locationcount:
                daycount = min(ans, daycount)
        hashtable2.clear()
        index1 += 1
    return daycount + 1
This problem can be solved with a two-pointer approach.

Some data structure should hold the element counts for the current window; your hash table is suitable.

Set the left and right pointers to the start of the list.

Move the right pointer, incrementing table entries for elements like this:

hashtable2[A[rightindex]] = hashtable2[A[rightindex]] + 1

When all (locationcount) table entries have become non-zero, stop moving the right pointer. You now have a left-right interval covering all cities. Remember the interval length.

Now move the left pointer, decrementing table entries. When some table entry becomes zero, stop moving the left pointer.

Move the right pointer again. Repeat until the end of the list.

Note that the indexes run through the list only once, so the complexity is linear (provided a table-entry update is O(1), as a hash map gives on average). A minimal sketch of this approach follows.
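Here is a rough sketch of that approach (the function name min_days and the variable names are my own, not from the question):

from collections import defaultdict

def min_days(A):
    locationcount = len(set(A))  # distinct cities we must cover
    counts = defaultdict(int)    # city -> occurrences inside the current window
    covered = 0                  # distinct cities currently in the window
    best = len(A)
    left = 0
    for right, city in enumerate(A):
        if counts[city] == 0:
            covered += 1
        counts[city] += 1
        # Shrink from the left while the window still covers every city.
        while covered == locationcount:
            best = min(best, right - left + 1)
            counts[A[left]] -= 1
            if counts[A[left]] == 0:
                covered -= 1
            left += 1
    return best

print(min_days([7, 4, 7, 3, 4, 1, 7]))  # 4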
I had this problem in an interview and failed, because I thought of a sliding window too late. I took it again a few days later, and here is my C# solution, which I think is O(n) (the array will be parsed at most 2 times).

The remaining difficulty after my flash of insight was understanding how to update the end pointer. There's probably a better solution; mine will always produce the highest possible starting and ending days, even if the vacation could be started earlier.
public int solution(int[] A) {
    if (A.Length is 0 or 1) {
        return A.Length;
    }

    var startingIndex = 0;
    var endingIndex = 0;
    var locationVisitedCounter = new int[A.Length];
    locationVisitedCounter[A[0] - 1] = 1;

    for (var i = 1; i < A.Length; i++)
    {
        var locationIndex = A[i] - 1;
        locationVisitedCounter[locationIndex]++;

        if (A[i] == A[i - 1])
        {
            continue;
        }

        endingIndex = i;
        while (locationVisitedCounter[A[startingIndex] - 1] > 1)
        {
            locationVisitedCounter[A[startingIndex] - 1]--;
            startingIndex++;
        }
    }

    return endingIndex - startingIndex + 1;
}
I solved it using a two-pointer approach: pointer i moves forward through the array, while pointer j moves up behind it toward the optimal window.

Time complexity: O(2*N), i.e. O(N).
def solution(A):
    n = len(A)
    hashSet = dict()
    max_count = len(set(A))
    i = 0
    j = 0
    result = float("inf")
    while i < n:
        if A[i] in hashSet:
            hashSet[A[i]] += 1
        else:
            hashSet[A[i]] = 1
        if len(hashSet) == max_count:
            result = min(result, i - j)
            while len(hashSet) == max_count and j <= i:
                hashSet[A[j]] -= 1
                if hashSet[A[j]] == 0:
                    del hashSet[A[j]]
                j += 1
                if len(hashSet) < max_count:
                    break
                result = min(result, i - j)
            if result == max_count:
                return result
            j += 1
        i += 1
    return result
Python solution
def vacation(A):
    # Get all unique vacation locations
    v_set = set(A)
    a_l = len(A)
    day_count = 0
    # Maximum days to cover all locations will be the length of the array
    max_day_count = a_l
    for i in range(a_l):
        count = 0
        v_set_copy = v_set.copy()
        # Starting point to find the next number of days
        # that covers all unique locations
        for j in range(i, a_l):
            # Remove from the set if the location exists,
            # meaning we have visited the location
            if A[j] in v_set_copy:
                v_set_copy.remove(A[j])
            count = count + 1
            # If we have visited all locations,
            # record the current minimum days needed to visit all, and break
            if len(v_set_copy) == 0:
                day_count = min(count, max_day_count)
                max_day_count = day_count
                break
    return day_count
From L = 0, move right until all distinct locations are visited; say R.
Maintain a map of element to frequency for the window from L to R inclusive.
Then, until R == n - 1 or R - L + 1 == distinct element count:
Increase L until we get an invalid window, i.e. some map entry's frequency becomes 0.
Increase R by 1 and update the map.
For reference, this question is essentially LeetCode question 76, Minimum Window Substring. You can watch the solution explained by NeetCode. My solution in Python follows that same tutorial.
def solution(A):
    if not A:
        return
    locations = dict()
    for location in A:
        locations[location] = 0
    res, resLen = [-1, -1], float("infinity")
    # left_pointer, right_pointer
    lp, rp = 0, 0
    for rp in range(len(A)):
        locations[A[rp]] = locations.get(A[rp], 0) + 1
        while 0 not in locations.values():
            if (rp - lp + 1) < resLen:
                res = [lp, rp]
                resLen = (rp - lp + 1)
            locations[A[lp]] -= 1
            lp += 1
    lp, rp = res
    return len(A[lp:rp+1]) if resLen != float("infinity") else 0

A = [7, 4, 7, 3, 4, 1, 7]
# A = [2, 1, 1, 3, 2, 1, 1, 3]
# A = [7, 3, 2, 3, 1, 2, 1, 7, 7, 1]
print(solution(A=A))
Although others have posted their answers, I think my solution is a little bit simpler and neater; I hope it helps.
from collections import defaultdict
from typing import List

# It is a sliding-window problem.
def min_days_to_visit_all_cities(arr: List[int]):
    no_of_places = len(set(arr))
    l, r = 0, 0
    place_to_count = defaultdict(int)
    res = len(arr)
    while r < len(arr):
        while r < len(arr) and len(place_to_count) < no_of_places:
            place_to_count[arr[r]] += 1
            r += 1
        while len(place_to_count) >= no_of_places:
            res = min(res, r - l)
            place_to_count[arr[l]] -= 1
            if place_to_count[arr[l]] == 0:
                del place_to_count[arr[l]]
            l += 1
    return res
I got this question in an interview: how do you find the length of the repeating block of a decimal expansion?

For example:

1/3 = 0.3333..., so it returns 1.
5/7 = 0.7142857142857143, so it returns 6, since 714285 is the repeating block.
1/15 = 0.066666666666666, so it returns 1.
17/150 = 0.11333333333333333, so it returns 1, since 3 is the repeating block.

I have tried to write some code:
def solution(a, b):
    n = a % b
    if n == 0:
        return 0
    mem = []
    n *= 10
    while True:
        n = n % b
        if n == 0:
            return 0
        if n in mem:
            i = mem.index(n)
            return len(mem[i:])
        else:
            mem.append(n)
            n *= 10
However, my code can't pass all the tests, and it is slower than it needs to be: the lookup "n in mem" scans a list, so each step costs O(n). How can I improve that and make the time complexity O(n) overall?
Probably the proper way is to follow the Math Stack Exchange link suggested by Henry. But concerning your code, here's my optimized version of it. The key point is to use a dictionary instead of a list: the "in" operation is much faster in this case.
def solution(a, b):
    n = a % b
    if n == 0:
        return 0
    mem = {}
    n *= 10
    pos = 0
    while True:
        pos += 1
        n = n % b
        if n == 0:
            return 0
        if n in mem:
            i = mem[n]
            return pos - i
        else:
            mem[n] = pos
            n *= 10
On my computer for 29/39916801 this code finishes calculations in several seconds.
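For completeness, there is also a purely number-theoretic route (presumably what the linked Math SE discussion covers): in lowest terms, the length of the repeating block of a/b is the multiplicative order of 10 modulo b once all factors of 2 and 5 are stripped from b. A sketch of that idea (my own illustration, not code from this thread):

from math import gcd

def repeating_length(a, b):
    # Reduce the fraction; factors of 2 and 5 only affect the non-repeating prefix.
    b //= gcd(a, b)
    for p in (2, 5):
        while b % p == 0:
            b //= p
    if b == 1:
        return 0  # the decimal expansion terminates
    # The period is the smallest k with 10**k congruent to 1 (mod b).
    k, r = 1, 10 % b
    while r != 1:
        r = (r * 10) % b
        k += 1
    return k

print(repeating_length(5, 7))     # 6
print(repeating_length(17, 150))  # 1

This needs only O(period) modular multiplications and stores no seen remainders at all.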
I wrote some code for Project Euler Problem 35:
# Project Euler: Problem 35
import time

start = time.time()

def sieve_erat(n):
    '''creates list of all primes < n'''
    x = range(2, n)
    b = 0
    while x[b] < int(n ** 0.5) + 1:
        x = filter(lambda y: y % x[b] != 0 or y == x[b], x)
        b += 1
    else:
        return x

def circularPrimes(n):
    '''returns # of circular primes below n'''
    count = 0
    primes = sieve_erat(n)
    b = set(primes)
    for prime in primes:
        inc = 0
        a = str(prime)
        while inc < len(a):
            if int(a) not in b:
                break
            a = a[-1] + a[0:len(a) - 1]
            inc += 1
        else:
            count += 1
    else:
        return count

print circularPrimes(1000000)
elapsed = (time.time() - start)
print "Found in %s seconds" % elapsed
I am wondering why this code (above) runs so much faster when I set b = set(primes) in the circularPrimes function. The running time for this code is about 8 seconds. Initially, I did not set b = set(primes) and my circularPrimes function was this:
def circularPrimes(n):
    '''returns # of circular primes below n'''
    count = 0
    primes = sieve_erat(n)
    for prime in primes:
        inc = 0
        a = str(prime)
        while inc < len(a):
            if int(a) not in primes:
                break
            a = a[-1] + a[0:len(a) - 1]
            inc += 1
        else:
            count += 1
    else:
        return count
My initial code (without b = set(primes)) ran so long that I didn't wait for it to finish. I am curious why there is such a large discrepancy in running time between the two pieces of code, as I do not believe that primes would have had any duplicates that would have made iterating through it take so much longer than iterating through set(primes). Maybe my idea of set() is wrong. Any help is welcome.
I believe the culprit here is the line "if int(a) not in b:". Sets are implemented internally as hash tables, so checking for membership is significantly less expensive than with a list: on average O(1) (hash the item and probe its bucket) versus O(n) for scanning the list element by element.
You can check out the innards of sets here.
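As a rough illustration of the difference (with stand-in data rather than the actual prime list):

import timeit

data_list = list(range(1000000))
data_set = set(data_list)

# Membership testing: O(n) scan for the list, O(1) average for the set.
print(timeit.timeit('999999 in data_list', globals=globals(), number=100))
print(timeit.timeit('999999 in data_set', globals=globals(), number=100))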