Further Optimisation of Project Euler problem 14 (Collatz Sequence) - python

When I first started on this problem, my code took over a minute to finish and give me the answer. I have already tried dynamic programming, storing the lengths of previous numbers so the same number never has to be processed more than once. I have also tried combining (n*3)+1 and n / 2 into a single step, ((n*3)+1) / 2, but together these changes have only cut the running time to 10 seconds. Is there anything else I can try to speed up my code?
def Collatz(n):
    dic = {a: 0 for a in range(1,1000000)}
    dic[1] = 0
    dic[2] = 1
    number,length = 1,1
    for i in range(3,n,1):
        z = i
        testlength = 0
        loop = "T"
        while loop == "T":
            if z % 2 == 0:
                z = z // 2  # integer division keeps z an int
                testlength += 1
            else:
                z = ((z*3)+1) // 2  # odd step and the halving that follows it
                testlength += 2
            if z < i:
                # everything below i is already stored
                testlength += dic[z]
                loop = "F"
        dic[i] = testlength
        if testlength > length:
            print(i,testlength)
            number,length = i,testlength
    return number,length
print(Collatz(1000000))

When you calculate the sequence for one input, you find out the sequence length for all the intermediate values. It helps to remember all of these in the dictionary so you never have to calculate a sequence twice of any number < n.
I also started at (n-1)//2, since there's no point testing any number x if 2x is going to be tested later, because 2x will certainly have a longer sequence:
def Collatz(n):
    dic = [-1]*n
    dic[1] = 0
    bestlen = 0
    bestval = 1
    q=[]
    for i in range((n-1)//2,n,1):
        q.clear()
        z = i
        while z >= n or dic[z] < 0:
            q.append(z)
            if z % 2 == 0:
                z = z//2
            else:
                z = z*3+1
        testlen = len(q)+dic[z]
        if testlen > bestlen:
            bestlen = testlen
            bestval = i
            print (bestval, bestlen)
        for j in range(0,len(q)):
            z = q[j]
            if z < n:
                dic[z] = testlen-j
    return bestval, bestlen
print(Collatz(1000000))
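The claim that 2x always has a longer sequence than x can be checked directly: the first step from 2x is a halving that lands on x, so its length is exactly one more. The following is only a small sketch to illustrate that reasoning; the helper names collatz_steps and best_start are made up for this example and are not part of the answer above.

```python
def collatz_steps(x):
    """Number of steps for x to reach 1 (1 itself takes 0 steps)."""
    steps = 0
    while x != 1:
        x = x // 2 if x % 2 == 0 else 3 * x + 1
        steps += 1
    return steps

# Doubling a start value always adds exactly one (halving) step.
for x in range(1, 2000):
    assert collatz_steps(2 * x) == collatz_steps(x) + 1

def best_start(lo, hi):
    """Start value in [lo, hi) with the most steps (hypothetical helper)."""
    return max(range(lo, hi), key=collatz_steps)

# Searching only the upper half finds the same start value as a full search.
n = 5000
print(best_start(1, n) == best_start((n - 1) // 2, n))
```

This is why the loop above can begin at (n-1)//2: every smaller start value x has a double 2x that is still below n and has a longer sequence.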

Although the answer from Matt Timmermanns is fast, it is not quite as easy to understand as a recursive function. Here is my attempt, which is actually faster for n = 10 million and perhaps easier to understand:
f = 10000000
def collatz(n):
    if n>=collatz.bounds:
        if (n % 4) == 0:
            return collatz(n//4)+2
        if (n % 2) == 0:
            return collatz(n//2)+1
        return collatz((3*n+1)//2)+2
    if collatz.memory[n]>=0:
        return collatz.memory[n]
    if (n % 2) == 0:
        count = collatz(n//2)+1
    else:
        count = collatz((3*n+1)//2)+2
    collatz.memory[n] = count
    return count
collatz.memory = [-1]*f
collatz.memory[1] = 0
collatz.bounds = f
highest = max(collatz(i) for i in range(f//2, f+1))
highest_n = collatz.memory.index(highest)
print(f"collatz({highest_n}) is {highest}")
My results:
$ time /usr/bin/python3 collatz.py
collatz(8400511) is 685
real 0m9.445s
user 0m9.375s
sys 0m0.060s
Compared to
$ time /usr/bin/python3 mattsCollatz.py
(8400511, 685)
real 0m10.672s
user 0m10.599s
sys 0m0.066s

Related

How can i sequence for a range(5,10000) until it hits 1?

I am new to Python and I am currently working on a task for my university. The question is the following:
Given that f(x) = x / 2 if x is even and f(x) = 3*x+1 if x is odd, how do I build a loop that takes each number from range(5,10000) and keeps applying f to it until it hits 1, at which point it stops? So far I have only managed to make my loop sort the numbers into two different lists. At least :D
This is my current code:
odd = []
even = []
for num in range (5,10000):
    if num % 2 == 0:
        even.append(num)  # append the number itself, not the built-in sum
    else:
        if num % 2 == 1:
            odd.append(num)
This is a famous math problem known as the Collatz conjecture. To keep it simple: we repeatedly apply the function x/2 if x is even and 3x+1 if x is odd until the value becomes 1, which is where the sequence stops.
def collatz_sequence(x):
    seq = [x]
    if x < 1:
        return []
    while x > 1:
        if x % 2 == 0:
            x = x // 2  # integer division, so the sequence stays made of ints
        else:
            x = 3 * x + 1
        seq.append(x)
    return seq
maxLength = -1
maxNum = 1
for num in range(5, 10001):
    currseq = collatz_sequence(num)
    print(currseq)
    currseq_len = len(currseq)
    # remember the start value with the longest sequence so far
    if currseq_len > maxLength:
        maxLength = currseq_len
        maxNum = num
print(maxNum, maxLength)

Having trouble with implementing the Miller-Rabin compositeness test in Python

I'm not sure if this is the right place to post this question, so if it isn't, let me know! I'm trying to implement the Miller-Rabin test in Python. The task is to find the first number that is a witness to the compositeness of N, an odd number. My code works for numbers that are fairly short but stops working when I enter a huge number. (The "challenge" wants the witness of N := 14779897919793955962530084256322859998604150108176966387469447864639173396414229372284183833167, for which my code returns that it is prime when it isn't.) The first part of the test is to convert N - 1 into the form 2^k * q, where q is an odd number.
Is there some limit with python that doesn't allow huge numbers for this?
Here is my code for that portion of the test.
def convertN(n): #this turns n into 2^x * q
    placeholder = False
    list = []
    #this will be x in the equation
    count = 1
    while placeholder == False:
        #x = result of division of 2^count
        x = (n / (2**count))
        #y tells if we can divide by 2 again or not
        y = x%2
        #if y != 0, it means that we cannot divide by 2, loop exits
        if y != 0:
            placeholder = True
            list.append(count) #x
            list.append(x) #q
        else:
            count += 1
    #makes list to return
    #print(list)
    return list
The code for the actual test:
def test(N):
    #if even return false
    if N == 2 | N%2 == 0:
        return "even"
    #convert number to 2^k+q and put into said variables
    n = N - 1
    nArray = convertN(n)
    k = nArray[0]
    q = int(nArray[1])
    #this is the upper limit a witness can be
    limit = int(math.floor(2 * (math.log(N))**2))
    #Checks when 2^q*k = 1 mod N
    for a in range(2,limit):
        modu = pow(a,q,N)
        for i in range(k):
            print(a,i,modu)
            if i==0:
                if modu == 1:
                    break
                elif modu == -1:
                    break
            elif i != 0:
                if modu == 1:
                    #print(i)
                    return a
            #instead of recalculating 2^q*k+1, can square old result and modN that.
            modu = pow(modu,2,N)
Any feedback is appreciated!
I don't like unanswered questions, so I decided to give a small update.
As it turns out, I was entering the wrong number from the start. In addition, in the second part my code should have tested not for a value of 1 but for a value of -1.
The fixed code for the checking:
#Checks whether a^q = 1 or a^(2^i * q) = -1 (mod N)
for a in range(2,limit):
    modu = pow(a,q,N)
    witness = True #I couldn't think of a better way of doing this so I decided to go with a boolean value. So if 1 pops up when i = 0, or -1 pops up at any i, we know it's not a witness.
    for i in range(k):
        print(a,i,modu)
        if i==0 and modu == 1:
            witness = False
            break
        elif modu == N - 1: #pow() returns a value in [0, N-1], so -1 mod N shows up as N - 1
            witness = False
            break
        #instead of recalculating a^(2^i * q), can square old result and modN that.
        modu = pow(modu,2,N)
    if(witness == True):
        return a
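On the question of whether Python limits huge numbers: Python integers are arbitrary precision, but in Python 3 the `/` operator returns a float, which only carries 53 bits of precision, so a decomposition like the one in convertN can silently lose digits for very large n. The sketch below assumes Python 3 and sticks to integer arithmetic throughout; the names decompose, is_witness and first_witness are made up for this illustration and are not the poster's code.

```python
# Float division loses precision for big integers; floor division does not.
big = 2**100 + 2
print(int(big / 2) == big // 2)   # False

def decompose(n):
    """Write n as 2**k * q with q odd, using integer arithmetic only."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k, n

def is_witness(a, N):
    """True if a proves that the odd number N > 2 is composite."""
    k, q = decompose(N - 1)
    x = pow(a, q, N)
    if x == 1 or x == N - 1:       # -1 mod N appears as N - 1
        return False
    for _ in range(k - 1):
        x = pow(x, 2, N)
        if x == N - 1:
            return False
    return True

def first_witness(N, limit):
    """Smallest a in [2, limit) that is a witness for N, or None."""
    for a in range(2, limit):
        if is_witness(a, N):
            return a
    return None

print(first_witness(561, 20))   # 561 = 3 * 11 * 17 is composite -> 2
print(first_witness(97, 20))    # 97 is prime, so there is no witness -> None
```

The three-argument pow(a, q, N) keeps every intermediate value below N, so the size of N is not a problem as long as no float sneaks into the computation.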
Mei, I wrote a Miller-Rabin test in Python. The Miller-Rabin part runs in a multiprocessing pool, so in my testing it is very fast, faster than sympy for larger numbers:
import math
def strailing(N):
    return N>>lars_last_powers_of_two_trailing(N)
def lars_last_powers_of_two_trailing(N):
    """ This utilizes a bit trick to find the trailing zeros in a number
        Finding the trailing number of zeros is simply a lookup for most
        numbers and only in the case of 1 do you have to shift to find the
        number of zeros, so there is no need to bit shift in 7 of 8 cases.
        In those 7 cases, it's simply a lookup to find the amount of zeros.
    """
    p,y=1,2
    orign = N
    N = N&15
    if N == 1:
        if ((orign -1) & (orign -2)) == 0: return orign.bit_length()-1
        while orign&y == 0:
            p+=1
            y<<=1
        return p
    if N in [3, 7, 11, 15]: return 1
    if N in [5, 13]: return 2
    if N == 9: return 3
    return 0
def primes_sieve2(limit):
    a = [True] * limit
    a[0] = a[1] = False
    for (i, isprime) in enumerate(a):
        if isprime:
            yield i
            for n in range(i*i, limit, i):
                a[n] = False
def llinear_diophantinex(a, b, divmodx=1, x=1, y=0, offset=0, withstats=False, pow_mod_p2=False):
    """ For the case we use here, using a
        llinear_diophantinex(num, 1<<num.bit_length()) returns the
        same result as a
        pow(num, 1<<num.bit_length()-1, 1<<num.bit_length()). This
        is 100 to 1000x times faster so we use this instead of a pow.
        The extra code is worth it for the time savings.
    """
    origa, origb = a, b
    r=a
    q = a//b
    prevq=1
    #k = powp2x(a)
    if a == 1:
        return 1
    if withstats == True:
        print(f"a = {a}, b = {b}, q = {q}, r = {r}")
    while r != 0:
        prevr = r
        a,r,b = b, b, r
        q,r = divmod(a,b)
        x, y = y, x - q * y
        if withstats == True:
            print(f"a = {a}, b = {b}, q = {q}, r = {r}, x = {x}, y = {y}")
    y = 1 - origb*x // origa - 1
    if withstats == True:
        print(f"x = {x}, y = {y}")
    x,y=y,x
    modx = (-abs(x)*divmodx)%origb
    if withstats == True:
        print(f"x = {x}, y = {y}, modx = {modx}")
    if pow_mod_p2==False:
        return (x*divmodx)%origb, y, modx, (origa)%origb
    else:
        if x < 0: return (modx*divmodx)%origb
        else: return (x*divmodx)%origb
def MillerRabin(arglist):
    """ This is a standard MillerRabin Test, but refactored so it can be
        used with multi threading, so you can run a pool of MillerRabin
        tests at the same time.
    """
    N = arglist[0]
    primetest = arglist[1]
    iterx = arglist[2]
    powx = arglist[3]
    withstats = arglist[4]
    primetest = pow(primetest, powx, N)
    if withstats == True:
        print("first: ",primetest)
    if primetest == 1 or primetest == N - 1:
        return True
    else:
        for x in range(0, iterx-1):
            primetest = pow(primetest, 2, N)
            if withstats == True:
                print("else: ", primetest)
            if primetest == N - 1: return True
            if primetest == 1: return False
    return False
# For trial division, we setup this global variable to hold primes
# up to 100,000
SFACTORINT_PRIMES=list(primes_sieve2(100000))
# Uses MillerRabin in a unique algorithmically deterministic way and
# also uses multithreading so all MillerRabin Tests are performed at
# the same time, speeding up the isprime test by a factor of 5 or more.
# More k tests can be performed than 5, but in my testing i've found
# that's all you need.
def sfactorint_isprime(N, kn=5, trialdivision=True, withstats=False):
    from multiprocessing import Pool
    if N == 2:
        return True
    if N % 2 == 0:
        return False
    if N < 2:
        return False
    # Trial Division Factoring
    if trialdivision == True:
        for xx in SFACTORINT_PRIMES:
            if N%xx == 0 and N != xx:
                return False
    iterx = lars_last_powers_of_two_trailing(N)
    """ This k test is a deterministic algorithmic test builder instead of
        using random numbers. The offset of k, from -2 to +2 produces pow
        tests that fail or pass instead of having to use random numbers
        and more iterations. All you need are those 5 numbers from k to
        get a primality answer. I've tested this against all numbers in
        https://oeis.org/A001262/b001262.txt and all fail, plus other
        exhaustive testing comparing to other isprimes to confirm its
        accuracy.
    """
    k = llinear_diophantinex(N, 1<<N.bit_length(), pow_mod_p2=True) - 1
    t = N >> iterx
    tests = []
    if kn % 2 == 0: offset = 0
    else: offset = 1
    for ktest in range(-(kn//2), (kn//2)+offset):
        tests.append(k+ktest)
    for primetest in range(len(tests)):
        if tests[primetest] >= N:
            tests[primetest] %= N
    arglist = []
    for primetest in range(len(tests)):
        if tests[primetest] >= 2:
            arglist.append([N, tests[primetest], iterx, t, withstats])
    with Pool(kn) as p:
        s=p.map(MillerRabin, arglist)
    if s.count(True) == len(arglist): return True
    else: return False
# The guard keeps worker processes from re-running the test on platforms
# where multiprocessing starts workers by re-importing this module.
if __name__ == "__main__":
    sinn=14779897919793955962530084256322859998604150108176966387469447864639173396414229372284183833167
    print(sfactorint_isprime(sinn))

Python 3: Optimizing Project Euler Problem #14

I'm trying to solve the Hackerrank Project Euler Problem #14 (Longest Collatz sequence) using Python 3. Following is my implementation.
cache_limit = 5000001
lookup = [0] * cache_limit
lookup[1] = 1
def collatz(num):
    if num == 1:
        return 1
    elif num % 2 == 0:
        return num >> 1
    else:
        return (3 * num) + 1
def compute(start):
    global cache_limit
    global lookup
    cur = start
    count = 1
    while cur > 1:
        count += 1
        if cur < cache_limit:
            retrieved_count = lookup[cur]
            if retrieved_count > 0:
                count = count + retrieved_count - 2
                break
            else:
                cur = collatz(cur)
        else:
            cur = collatz(cur)
    if start < cache_limit:
        lookup[start] = count
    return count
def main(tc):
    test_cases = [int(input()) for _ in range(tc)]
    bound = max(test_cases)
    results = [0] * (bound + 1)
    start = 1
    maxCount = 1
    for i in range(1, bound + 1):
        count = compute(i)
        if count >= maxCount:
            maxCount = count
            start = i
        results[i] = start
    for tc in test_cases:
        print(results[tc])
if __name__ == "__main__":
    tc = int(input())
    main(tc)
There are 12 test cases. The above implementation passes till test case #8 but fails for test cases #9 through #12 with the following reason.
Terminated due to timeout
I'm stuck with this for a while now. Not sure what else can be done here.
What else can be optimized here so that I stop getting timed out?
Any help will be appreciated :)
Note: Using the above implementation, I'm able to solve the actual Project Euler Problem #14. It is giving timeout only for those 4 test cases in hackerrank.
Yes, there are things you can do to your code to optimize it. But I think, more importantly, there is a mathematical observation you need to consider which is at the heart of the problem:
whenever n is odd, then 3 * n + 1 is always even.
Given this, one can always divide (3 * n + 1) by 2. And that saves one a fair bit of time...
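As a rough illustration of that observation (a sketch only, with function names invented for this example), the odd step and the halving that must follow it can be merged into one loop iteration that counts as two steps. The step totals come out the same as with the plain rule:

```python
def steps_plain(n):
    """Count Collatz steps one rule application at a time."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

def steps_shortcut(n):
    """Same count, but an odd n jumps straight to (3n + 1) / 2."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
            steps += 1
        else:
            n = (3 * n + 1) // 2   # 3n + 1 is even, so halve it right away
            steps += 2             # that was two steps of the original rule
    return steps

# Both versions agree on every start value checked here.
assert all(steps_plain(n) == steps_shortcut(n) for n in range(1, 10000))
print(steps_shortcut(27))
```

The shortcut removes one loop iteration for every odd value in a sequence, which adds up over millions of start values.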
Here is an improvement (it takes 1.6 seconds): there is no need to compute the sequence of every number. You can create a dictionary and store the number of the elements of a sequence. If a number that has appeared already comes up, the sequence is computed as dic[original_number] = dic[n] + count - 1. This saves a lot of time.
import time
start = time.time()
def main(n,dic):
    '''Counts the elements of the sequence starting at n and finishing at 1'''
    count = 1
    original_number = n
    while True:
        if n < original_number:
            dic[original_number] = dic[n] + count - 1 #-1 because when n < original_number, n is counted twice otherwise
            break
        if n == 1:
            dic[original_number] = count
            break
        if (n % 2 == 0):
            n = n//2  # integer division keeps n an int key for dic
        else:
            n = 3*n + 1
        count += 1
    return dic
limit = 10**6
dic = {n:0 for n in range(1,limit+1)}
if __name__ == '__main__':
    n = 1
    while n < limit:
        dic=main(n,dic)
        n += 1
    print('Longest chain: ', max(dic.values()))
    print('Number that gives the longest chain: ', max(dic, key=dic.get))
    end = time.time()
    print('Time taken:', end-start)
The trick to solving this question is to run the computation once, only up to the largest input among the test cases, and save the running best answer for every smaller bound as a lookup table, rather than computing up to the extreme upper bound.
Here is my implementation, which passes all the test cases (Python 3):
MAX = int(5 * 1e6)
ans = [0]
steps = [0]*(MAX+1)
def solve(N):
    if N < MAX+1:
        if steps[N] != 0:
            return steps[N]
    if N == 1:
        return 0
    else:
        if N % 2 != 0:
            result = 1+ solve(3*N + 1) # This is recursion
        else:
            result = 1 + solve(N>>1) # This is recursion
    if N < MAX+1:
        steps[N]=result # This is memoization
    return result
inputs = [int(input()) for _ in range(int(input()))]
largest = max(inputs)
mx = 0
collatz=1
for i in range(1,largest+1):
    curr_count=solve(i)
    if curr_count >= mx:
        mx = curr_count
        collatz = i
    ans.append(collatz)
for _ in inputs:
    print(ans[_])
This is my brute-force take:
#counter
C = 0
N = 0
for i in range(1,1000001):
    n = i
    c = 0
    while n != 1:
        if n % 2 == 0:
            _next = n//2  # integer division avoids floats
        else:
            _next= 3*n+1
        c = c + 1
        n = _next
    if c > C:
        C = c
        N = i
print(N,C)
Here's my implementation(for the question specifically on Project Euler website):
num = 1
limit = int(input())
seq_list = []
while num < limit:
    sequence_num = 0
    n = num
    if n == 1:
        sequence_num = 1
    else:
        while n != 1:
            if n % 2 == 0:
                n = n // 2  # integer division avoids floats
                sequence_num += 1
            else:
                n = 3 * n + 1
                sequence_num += 1
        sequence_num += 1  # count the final 1 as a term
    seq_list.append(sequence_num)
    num += 1
k = seq_list.index(max(seq_list))
print(k + 1)

Why does set( ) make this code run so much faster?

I wrote some code for Project Euler Problem 35:
#Project Euler: Problem 35
import time
start = time.time()
def sieve_erat(n):
    '''creates list of all primes < n'''
    x = range(2,n)
    b = 0
    while x[b] < int(n ** 0.5) + 1:
        x = filter(lambda y: y % x[b] != 0 or y == x[b], x)
        b += 1
    else:
        return x
def circularPrimes(n):
    '''returns # of circular primes below n'''
    count = 0
    primes = sieve_erat(n)
    b = set(primes)
    for prime in primes:
        inc = 0
        a = str(prime)
        while inc < len(a):
            if int(a) not in b:
                break
            a = a[-1] + a[0:len(a) - 1]
            inc += 1
        else:
            count += 1
    else:
        return count
print circularPrimes(1000000)
elapsed = (time.time() - start)
print "Found in %s seconds" % elapsed
I am wondering why this code (above) runs so much faster when I set b = set(primes) in the circularPrimes function. The running time for this code is about 8 seconds. Initially, I did not set b = set(primes) and my circularPrimes function was this:
def circularPrimes(n):
    '''returns # of circular primes below n'''
    count = 0
    primes = sieve_erat(n)
    for prime in primes:
        inc = 0
        a = str(prime)
        while inc < len(a):
            if int(a) not in primes:
                break
            a = a[-1] + a[0:len(a) - 1]
            inc += 1
        else:
            count += 1
    else:
        return count
My initial code (without b = set(primes)) ran so long that I didn't wait for it to finish. I am curious as to why there is such a large discrepancy in terms of running time between the two pieces of code as I do not believe that primes would have had any duplicates that would have made iterating through it take so much longer that iterating through set(primes). Maybe my idea of set( ) is wrong. Any help is welcome.
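I believe the difference comes from the line if int(a) not in b:. Sets are implemented internally as hash tables, so a membership test takes roughly constant time on average: the value is hashed and only the matching slot is examined. A membership test on a list has to compare the value against the elements one by one. Your list holds every prime below one million (78,498 of them) and the test runs for every rotation of every prime, so the list version does vastly more comparisons. Duplicates have nothing to do with it.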
You can check out the innards of sets here.
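To see the difference on your own machine, here is a small timing sketch. It assumes Python 3 and the standard timeit module; the container size and the probe value are arbitrary choices made for this illustration, and the actual timings will vary from machine to machine.

```python
import timeit

values = list(range(100000))
as_set = set(values)
probe = 99999  # worst case for the list: the last element

# Both containers give the same answer; only the cost of the lookup differs.
assert (probe in values) == (probe in as_set)

list_time = timeit.timeit(lambda: probe in values, number=200)
set_time = timeit.timeit(lambda: probe in as_set, number=200)
print(f"list: {list_time:.5f}s  set: {set_time:.5f}s")
```

The list lookup walks through the elements until it finds a match, while the set lookup hashes the probe and jumps to its slot, so the gap widens as the container grows.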

sum of factors of a given number

I have written this code, which finds the factors of a number. After a lot of thinking and trying, I could not get the sum of the numbers that appear in the output. I would like to compute the sum of these numbers recursively. Here's my code:
def p(n,c):
    s = 0
    if c >= n:
        return n
    if n % c == 0:
        s += c
        print(s,end=',')
    return p(n,c+1)
n = int(input('enter no:'))
c = 1
print(p(n,c))
Given the comments, it appears that this might be what you want:
sum([n for n in range(1,24) if 24 % n == 0])
To make it a bit more generic:
def sum_of_factors(x):
    # range rather than xrange, since the question uses Python 3
    return sum([n for n in range(1,x) if x % n == 0])
EDIT: here's a recursive version:
def sum_of_factors(x, y=1):
    if (y >= x):
        return 0
    if (x % y == 0):
        return y + sum_of_factors(x, y + 1)
    return sum_of_factors(x, y + 1)
>>> sum_of_factors(24)
36
Is this the output you are looking for?
Use a global variable:
s = 0
def p(n,c):
    global s
    if c >= n:
        return n
    if n % c == 0:
        s += c
        print(s,end=',')
    return p(n,c+1)
n = int(input('enter no:'))
c = 1
print(p(n,c))
Output
enter no:1,3,6,10,16,24,36,24
