How can I get the length of repeating decimal? - python

I had an interview, and the question was how to get the length of the repeating part of a decimal.
For example:
1/3=0.3333..., it returns 1,
5/7=0.7142857142857143, it returns 6, since 714285 is the repeating part.
1/15=0.066666666666666, it returns 1.
17/150=0.11333333333333333, it returns 1, since 3 is the repeating part.
I tried writing this code:
def solution(a, b):
    n = a % b
    if n == 0:
        return 0
    mem = []
    n *= 10
    while True:
        n = n % b
        if n == 0:
            return 0
        if n in mem:
            i = mem.index(n)
            return len(mem[i:])
        else:
            mem.append(n)
            n *= 10
However, my code can't pass all tests, and its time complexity is O(n*logn). How can I improve that and make its time complexity O(n)?

Probably the proper way is to follow the Math Stack Exchange link suggested by @Henry. But concerning your code, here's my optimized version of it. The key point is to use a dictionary instead of a list: the in operation is a hash lookup on a dictionary, whereas on a list it scans the elements one by one.
def solution(a, b):
    n = a % b
    if n == 0:
        return 0
    mem = {}
    n *= 10
    pos = 0
    while True:
        pos += 1
        n = n % b
        if n == 0:
            return 0
        if n in mem:
            i = mem[n]
            return pos - i
        else:
            mem[n] = pos
            n *= 10
On my computer for 29/39916801 this code finishes calculations in several seconds.
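As a rough sketch of the number-theoretic route mentioned above (not taken from the linked post; the function name repetend_length is made up for illustration): once the fraction is reduced and all factors of 2 and 5 are stripped from the denominator, the length of the repeating block is the smallest k with 10**k % b == 1. This needs no memory of previous remainders, though it can still take up to b steps.

```python
from math import gcd

def repetend_length(a, b):
    """Length of the repeating block of a/b in base 10 (0 if it terminates).

    Hypothetical helper for illustration; assumes a >= 0 and b > 0.
    """
    b //= gcd(a, b)              # reduce the fraction first
    for p in (2, 5):             # factors of 2 and 5 only add non-repeating digits
        while b % p == 0:
            b //= p
    if b == 1:
        return 0                 # nothing left: the decimal terminates
    # Find the smallest k with 10**k % b == 1.
    k, r = 1, 10 % b
    while r != 1:
        r = (r * 10) % b
        k += 1
    return k

print(repetend_length(1, 3), repetend_length(5, 7),
      repetend_length(1, 15), repetend_length(17, 150))
```

The four calls correspond to the examples in the question.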

Related

Further Optimisation of Project Euler problem 14 (Collatz Sequence)

When I first started trying the question, my code would take over a minute to even finish running and give me the answer. I have already tried dynamic programming, storing previous results so the same number doesn't have to be processed multiple times. I have also tried compacting (n*3)+1 and n / 2 into a single step with ((n*3)+1) / 2, but both of these changes have only managed to cut the running time to 10 seconds. Is there anything else I can try to speed up my code?
def Collatz(n):
    dic = {a: 0 for a in range(1,1000000)}
    dic[1] = 0
    dic[2] = 1
    number,length = 1,1
    for i in range(3,n,1):
        z = i
        testlength = 0
        loop = "T"
        while loop == "T":
            if z % 2 == 0:
                z = z / 2
                testlength += 1
            else:
                z = ((z*3)+1) / 2
                testlength += 2
            if z < i:
                testlength += dic[z]
                loop = "F"
        dic[i] = testlength
        if testlength > length:
            print(i,testlength)
            number,length = i,testlength
    return number,length
print(Collatz(1000000))
When you calculate the sequence for one input, you find out the sequence length for all the intermediate values. It helps to remember all of these in the dictionary so you never have to calculate a sequence twice of any number < n.
I also started at (n-1)//2, since there's no point testing any number x if 2x is going to be tested later, because 2x will certainly have a longer sequence:
def Collatz(n):
    dic = [-1]*n
    dic[1] = 0
    bestlen = 0
    bestval = 1
    q=[]
    for i in range((n-1)//2,n,1):
        q.clear()
        z = i
        while z >= n or dic[z] < 0:
            q.append(z)
            if z % 2 == 0:
                z = z//2
            else:
                z = z*3+1
        testlen = len(q)+dic[z]
        if testlen > bestlen:
            bestlen = testlen
            bestval = i
            print (bestval, bestlen)
        for j in range(0,len(q)):
            z = q[j]
            if z < n:
                dic[z] = testlen-j
    return bestval, bestlen
print(Collatz(1000000))
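The two ideas in this answer (remember every intermediate value, and note that 2x always takes one more step than x) can be seen in isolation in a much smaller sketch. The names collatz_steps and _cache are made up for illustration, and unlike the answer's list this version uses an unbounded dictionary, so it trades memory for simplicity.

```python
_cache = {1: 0}  # steps needed to reach 1, filled in as values are seen

def collatz_steps(n):
    """Number of Collatz steps from n down to 1, caching every value on the path."""
    path = []
    while n not in _cache:
        path.append(n)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    steps = _cache[n]
    # Walk back along the path so each intermediate value is recorded once.
    for value in reversed(path):
        steps += 1
        _cache[value] = steps
    return steps

# Doubling a number adds exactly one step, which is why the search
# above can start at (n-1)//2 instead of 1.
print(collatz_steps(27), collatz_steps(54))
```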
Although the answer from Matt Timmermanns is fast, it is not quite as easy to understand as a recursive function. Here is my attempt, which is actually faster for n = 10 million and perhaps easier to understand:
f = 10000000
def collatz(n):
    if n>=collatz.bounds:
        if (n % 4) == 0:
            return collatz(n//4)+2
        if (n % 2) == 0:
            return collatz(n//2)+1
        return collatz((3*n+1)//2)+2
    if collatz.memory[n]>=0:
        return collatz.memory[n]
    if (n % 2) == 0:
        count = collatz(n//2)+1
    else:
        count = collatz((3*n+1)//2)+2
    collatz.memory[n] = count
    return count
collatz.memory = [-1]*f
collatz.memory[1] = 0
collatz.bounds = f
highest = max(collatz(i) for i in range(f//2, f+1))
highest_n = collatz.memory.index(highest)
print(f"collatz({highest_n}) is {highest}")
My results:
$ time /usr/bin/python3 collatz.py
collatz(8400511) is 685
real 0m9.445s
user 0m9.375s
sys 0m0.060s
Compared to
$ time /usr/bin/python3 mattsCollatz.py
(8400511, 685)
real 0m10.672s
user 0m10.599s
sys 0m0.066s

How do you use a loop to get the least significant digit of a number?

I am a little confused about what the question is asking me to do here: (https://i.stack.imgur.com/TSfHH.jpg)
This is using python and the rules are:
Only loops and conditionals can be used
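Here are three solutions to your problem. The first uses a loop, the second uses recursion, and the third converts the number to a string and counts the 5s.
def contains_two_fives_loop(n):
    """Using loop."""
    counter = 0
    while n:
        # n % 10 is the least significant digit; True counts as 1
        counter += n % 10 == 5
        n //= 10
    return 2 <= counter
def count_fives(n):
    """Recursively count the digits of n that are equal to 5."""
    if n == 0:
        return 0
    return (n % 10 == 5) + count_fives(n // 10)
def contains_two_fives_recursion(n):
    """Using recursion."""
    return 2 <= count_fives(n)
def contains_two_fives_str_counter(n):
    """Convert to string and count 5s in the string."""
    return 2 <= str(n).count("5")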

Down to zero problem - getting time exceeded error

I am trying to solve a HackerRank problem.
You are given Q queries. Each query consists of a single number N. You can perform either of 2 operations on N in each move: if N = a×b (a≠1, b≠1), you can change N to max(a,b); or you can decrease the value of N by 1.
Determine the minimum number of moves required to reduce the value of N to 0.
I have used a BFS approach to solve this.
a. Generate all prime numbers using a sieve.
b. Using the prime numbers, I can skip calculating the factors of primes.
c. I enqueue N-1 along with all the factors to get to zero.
d. I also use previous results so that already-encountered values are not enqueued again.
This still gives me a time-limit-exceeded error. Any ideas? I have also added comments in the code.
import math
#find out all the prime numbers
primes = [1]*(1000000+1)
primes[0] = 0
primes[1] = 0
for i in range(2, 1000000+1):
    if primes[i] == 1:
        j = 2
        while i*j < 1000000:
            primes[i*j] = 0
            j += 1
n = int(input())
for i in range(n):
    memoize= [-1 for i in range(1000000)]
    count = 0
    n = int(input())
    queue = []
    queue.append((n, count))
    while len(queue):
        data, count = queue.pop(0)
        if data <= 1:
            count += 1
            break
        #if it is a prime number then just enqueue -1
        if primes[data] == 1 and memoize[data-1] == -1:
            queue.append((data-1, count+1))
            memoize[data-1] = 1
            continue
        #enqueue -1 along with all the factors
        queue.append((data-1, count+1))
        sqr = int(math.sqrt(data))
        for i in range(sqr, 1, -1):
            if data%i == 0:
                div = max(int(data/i), i)
                if memoize[div] == -1:
                    memoize[div] = 1
                    queue.append((div, count+1))
    print(count)
There are two large causes of slowness with this code.
Clearing an array is slower than clearing a set
The first problem is this line:
    memoize= [-1 for i in range(1000000)]
This prepares 1 million integers and is executed for each of your 1000 test cases. A faster approach is to simply use a Python set to indicate which values have already been visited.
Unnecessary loop being executed
The second problem is this line:
    if primes[data] == 1 and memoize[data-1] == -1:
If data is prime but data-1 has already been visited, this condition is false, so execution falls through to the slow loop that searches for factors. That loop can never find any, because data is prime.
Faster code
In fact, the improvement due to using sets is so much that you don't even need your prime testing code and the following code passes all tests within the time limit:
import math
n = int(input())
for i in range(n):
    memoize = set()
    count = 0
    n = int(input())
    queue = []
    queue.append((n, count))
    while len(queue):
        data, count = queue.pop(0)
        if data <= 1:
            if data==1:
                count += 1
            break
        if data-1 not in memoize:
            memoize.add(data-1)
            queue.append((data-1, count+1))
        sqr = int(math.sqrt(data))
        for i in range(sqr, 1, -1):
            if data%i == 0:
                div = max(int(data/i), i)
                if div not in memoize:
                    memoize.add(div)
                    queue.append((div, count+1))
    print(count)
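One further tweak, which is not part of the answer above: queue.pop(0) on a list has to shift every remaining element, whereas collections.deque.popleft() runs in constant time. A minimal sketch of the same BFS wrapped in a function follows. The name down_to_zero is made up for illustration, and math.isqrt assumes Python 3.8 or later.

```python
from collections import deque
from math import isqrt

def down_to_zero(n):
    """Minimum number of moves to reduce n to 0 (BFS over reachable values)."""
    seen = {n}
    queue = deque([(n, 0)])
    while queue:
        value, moves = queue.popleft()   # O(1), unlike list.pop(0)
        if value == 0:
            return moves
        candidates = [value - 1]
        # For each divisor i <= sqrt(value), the larger factor is value // i.
        for i in range(isqrt(value), 1, -1):
            if value % i == 0:
                candidates.append(value // i)
        for nxt in candidates:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, moves + 1))

print([down_to_zero(k) for k in range(11)])
```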
Alternatively, there's a O(n*sqrt(n)) time and O(n) space complexity solution that passes all the test cases just fine.
The idea is to cache the minimum count for every non-negative integer up to 1,000,000 (the maximum possible input in the question) before running any query. Each query then just looks up the cached minimum count for its number, so retrieving a result takes O(1) time per query.
To find the minimal count for each number (stored in a list called down2ZeroCounts), we should consider several cases:
0 and 1 have minimal counts of 0 and 1, respectively.
A prime number p has no factors other than 1 and itself. Hence, its minimal count is 1 plus the minimal count of p - 1, or more formally down2ZeroCounts[p] = down2ZeroCounts[p - 1] + 1.
A composite number num is a bit more complicated. For every pair of factors a > 1, b > 1 such that num = a*b, one candidate is down2ZeroCounts[max(a, b)] + 1; another candidate is down2ZeroCounts[num - 1] + 1. The minimal count of num is the smallest of these candidates.
So, we can gradually build the minimal counts in ascending order. The minimal count of each subsequent number is based on the optimal counts of lower numbers, so in the end a complete list of optimal counts is built.
To better understand the approach, please check the code:
from __future__ import print_function
import os
import sys
maxNumber = 1000000
down2ZeroCounts = [None] * 1000001
def cacheDown2ZeroCounts():
    down2ZeroCounts[0] = 0
    down2ZeroCounts[1] = 1
    currentNum = 2
    while currentNum <= maxNumber:
        if down2ZeroCounts[currentNum] is None:
            down2ZeroCounts[currentNum] = down2ZeroCounts[currentNum - 1] + 1
        else:
            down2ZeroCounts[currentNum] = min(down2ZeroCounts[currentNum - 1] + 1, down2ZeroCounts[currentNum])
        for i in xrange(2, currentNum + 1):
            product = i * currentNum
            if product > maxNumber:
                break
            elif down2ZeroCounts[product] is not None:
                down2ZeroCounts[product] = min(down2ZeroCounts[product], down2ZeroCounts[currentNum] + 1)
            else:
                down2ZeroCounts[product] = down2ZeroCounts[currentNum] + 1
        currentNum += 1
def downToZero(n):
    return down2ZeroCounts[n]
if __name__ == '__main__':
    fptr = open(os.environ['OUTPUT_PATH'], 'w')
    q = int(raw_input())
    cacheDown2ZeroCounts()
    for q_itr in xrange(q):
        n = int(raw_input())
        result = downToZero(n)
        fptr.write(str(result) + '\n')
    fptr.close()
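The code above targets Python 2 (xrange, raw_input). The same precomputation can be sketched in Python 3 with a small limit so the table is easy to inspect. The name build_counts is made up for illustration, and the table starts from the upper bound counts[k] = k (k decrements) instead of None, which removes the None checks.

```python
def build_counts(limit):
    """counts[k] = minimum number of moves to reduce k to 0, for 0 <= k <= limit."""
    counts = list(range(limit + 1))      # k decrements always work: upper bound
    for current in range(2, limit + 1):
        # Option 1: decrement by one.
        counts[current] = min(counts[current], counts[current - 1] + 1)
        # Option 2: current is the larger factor of current * i (i <= current),
        # so each such product can jump to current in a single move.
        for i in range(2, current + 1):
            product = i * current
            if product > limit:
                break
            counts[product] = min(counts[product], counts[current] + 1)
    return counts

print(build_counts(10))
```

Each query is then a plain list lookup, as in the original answer.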

Python - Why does this prime factorization function get better performance from this?

I have this prime factorization function that I wrote:
def prime_factorization(n):
    prime_factors = {}
    for i in _prime_candidates(n):
        if n % i == 0:
            prime_factors[i] = 0
            while n % i == 0:
                n /= i
                prime_factors[i] += 1
    if n != 1: prime_factors[int(n)] = 1
    return prime_factors
def _prime_candidates(n):
    yield 2
    for i in range(3, int(n**.5)+1, 2):
        yield i
It takes around 0.387 seconds on my machine for n = 10^13. But if I copy the content of the for loop and run it for the number 2 before running the actual for loop, I get the same correct results but with a running time of about 0.003 seconds for n = 10^13. You can see that code below:
def prime_factorization(n):
    prime_factors = {}
    if n % 2 == 0:
        prime_factors[2] = 0
        while n % 2 == 0:
            n /= 2
            prime_factors[2] += 1
    for i in _prime_candidates(n):
        if n % i == 0:
            prime_factors[i] = 0
            while n % i == 0:
                n /= i
                prime_factors[i] += 1
    if n != 1: prime_factors[int(n)] = 1
    return prime_factors
def _prime_candidates(n):
    yield 2
    for i in range(3, int(n**.5)+1, 2):
        yield i
Why does this cause such a massive performance gain?
Edit: I'm using Python 3.5 and I'm using the clock() function of the time module to benchmark.
In your initial version, _prime_candidates gets passed 10^13, so it generates candidates up to the square root of that.
In your second version, _prime_candidates gets passed 5^13, because all the factors of 2 have been divided out. It generates a much smaller number of candidates to test.
By folding the _prime_candidates logic into prime_factorization and recomputing the upper bound whenever you find a factor, you can get an even better, more general improvement:
def prime_factorization(n):
    prime_factors = {}
    # Divide out all factors of 2 first so only odd candidates remain.
    factor_multiplicity = 0
    while n % 2 == 0:
        n //= 2
        factor_multiplicity += 1
    if factor_multiplicity:
        prime_factors[2] = factor_multiplicity
    factor_bound = n**.5
    candidate = 3
    while candidate <= factor_bound:
        factor_multiplicity = 0
        while n % candidate == 0:
            n //= candidate
            factor_multiplicity += 1
        if factor_multiplicity:
            prime_factors[candidate] = factor_multiplicity
            # n shrank, so the search bound shrinks with it.
            factor_bound = n**.5
        candidate += 2
    if n != 1:
        prime_factors[n] = 1
    return prime_factors
Note that for large enough n, the computation of n**.5 eventually generates the wrong bound due to the limits of floating-point precision. You could fix this by comparing candidate * candidate <= n, or by using something like the decimal module to compute the bound to sufficient precision.
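A minimal sketch of the first suggested fix, comparing candidate * candidate against n with integers only, so no floating-point bound is involved. The name prime_factorization_int is made up to keep it apart from the function above, and it assumes n is a positive integer.

```python
def prime_factorization_int(n):
    """Trial division using only integer arithmetic (no n**.5)."""
    factors = {}
    candidate = 2
    # The bound is re-evaluated on every pass, so it shrinks as n shrinks.
    while candidate * candidate <= n:
        while n % candidate == 0:
            factors[candidate] = factors.get(candidate, 0) + 1
            n //= candidate
        candidate += 1 if candidate == 2 else 2   # 2, 3, 5, 7, 9, ...
    if n > 1:
        factors[n] = 1   # whatever is left is a single prime factor
    return factors

print(prime_factorization_int(10**13))
```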
The reason is inside _prime_candidates function.
In your first example it generates all numbers 3,5,...,3162277 and you try to divide your n by all these candidates.
In your second example you first greatly reduce your n, so _prime_candidates generates only the numbers 3,5,...,34937. That is far fewer numbers to check.
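The difference in the number of candidates can be counted directly. This is only a sketch: candidate_count is a made-up helper that mirrors _prime_candidates but uses math.isqrt (Python 3.8 or later) in place of int(n**.5).

```python
from math import isqrt

def candidate_count(n):
    """How many values _prime_candidates would yield for n: 2, then odd numbers."""
    return 1 + len(range(3, isqrt(n) + 1, 2))

before = candidate_count(10**13)   # generator receives the original n
after = candidate_count(5**13)     # generator receives n with the 2s divided out
print(before, after, before / after)
```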

Why does set( ) make this code run so much faster?

I wrote some code for Project Euler Problem 35:
#Project Euler: Problem 35
import time
start = time.time()
def sieve_erat(n):
    '''creates list of all primes < n'''
    x = range(2,n)
    b = 0
    while x[b] < int(n ** 0.5) + 1:
        x = filter(lambda y: y % x[b] != 0 or y == x[b], x)
        b += 1
    else:
        return x
def circularPrimes(n):
    '''returns # of circular primes below n'''
    count = 0
    primes = sieve_erat(n)
    b = set(primes)
    for prime in primes:
        inc = 0
        a = str(prime)
        while inc < len(a):
            if int(a) not in b:
                break
            a = a[-1] + a[0:len(a) - 1]
            inc += 1
        else:
            count += 1
    else:
        return count
print circularPrimes(1000000)
elapsed = (time.time() - start)
print "Found in %s seconds" % elapsed
I am wondering why this code (above) runs so much faster when I set b = set(primes) in the circularPrimes function. The running time for this code is about 8 seconds. Initially, I did not set b = set(primes) and my circularPrimes function was this:
def circularPrimes(n):
    '''returns # of circular primes below n'''
    count = 0
    primes = sieve_erat(n)
    for prime in primes:
        inc = 0
        a = str(prime)
        while inc < len(a):
            if int(a) not in primes:
                break
            a = a[-1] + a[0:len(a) - 1]
            inc += 1
        else:
            count += 1
    else:
        return count
My initial code (without b = set(primes)) ran so long that I didn't wait for it to finish. I am curious as to why there is such a large discrepancy in running time between the two pieces of code, as I do not believe that primes would have had any duplicates that would have made iterating through it take so much longer than iterating through set(primes). Maybe my idea of set( ) is wrong. Any help is welcome.
I believe the culprit here is if int(a) not in b:. Sets are implemented internally as hash tables, so a membership check hashes the value and inspects only the matching slot (plus any collisions), which takes roughly constant time on average. With a list, the same check compares against the elements one by one, so it takes time proportional to the length of the list.
You can check out the innards of sets here.
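The gap is easy to measure with the standard timeit module. The sketch below uses arbitrary sizes and a value that is absent from both containers, which is the worst case for the list because every element gets compared.

```python
import timeit

values = list(range(100_000))
as_set = set(values)
missing = -1   # not present: the list check has to scan all 100,000 items

list_time = timeit.timeit(lambda: missing in values, number=100)
set_time = timeit.timeit(lambda: missing in as_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

The absolute numbers depend on the machine; the point is the ratio between the two.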
