Efficiently calculating mathematical formulas with exponents - python

I'm implementing a program that calculates an equation: F(n) = F(n-1) + 'a' + func1(func2(F(n-1))).
func1 takes every 'a' and makes it 'c' and every 'c' becomes 'a'.
func2 reverses the string (e.x. "xyz" becomes "zyx").
I want to calculate the Kth character of F(10**2017).
The basic rules are F(0) = "" (empty string), and examples are F(1) = "a", F(2) = "aac", and so on.
How do I do this efficiently?
The basic part of my code is this:
def op1 (str1):
if str1 == 'a':
return 'c'
else:
return 'a'
def op2 (str2):
return str2[::-1]
sinitial = ''
while (counter < 10**2017):
Finitial = Finitial + 'a' + op1(op2(Finitial))
counter += 1
print Finitial

Let's start by fixing your original code and defining a function to compute F(n) for small n. We'll also print out the first few values of F. All code below is for Python 3; if you're using Python 2, you'll need to make some minor changes, like replacing str.maketrans with string.maketrans and range with xrange.
swap_ac = str.maketrans({ord('a'): 'c', ord('c'): 'a'})
def F(n):
s = ''
for _ in range(n):
s = s + 'a' + s[::-1].translate(swap_ac)
return s
for n in range(7):
print("F({}) = {!r}".format(n, F(n)))
This gives the following output:
F(0) = ''
F(1) = 'a'
F(2) = 'aac'
F(3) = 'aacaacc'
F(4) = 'aacaaccaaaccacc'
F(5) = 'aacaaccaaaccaccaaacaacccaaccacc'
F(6) = 'aacaaccaaaccaccaaacaacccaaccaccaaacaaccaaaccacccaacaacccaaccacc'
A couple of observations at this point:
F(n) is a string of length 2**n-1. That means that F(n) grows fast. Computing F(50) would already require some serious hardware: even if we stored one character per bit, we'd need over 100 terabytes to store the full string. F(200) has more characters than there are estimated atoms in the solar system. So the idea of computing F(10**2017) directly is laughable: we need a different approach.
By construction, each F(n) is a prefix of F(n+1). So what we really have is a well-defined infinite string, where each F(n) merely gives us the first 2**n-1 characters of that infinite string, and we're looking to compute its kth character. And for any practical purpose, F(10**2017) might as well be that infinite string: for example, when we do our computation, we don't need to check that k < 2**(10**2017)-1, since a k exceeding this can't even be represented in normal binary notation in this universe.
Luckily, the structure of the string is simple enough that computing the kth character directly is straightforward. The major clue comes when we look at the characters at even and odd positions:
>>> F(6)[::2]
'acacacacacacacacacacacacacacacac'
>>> F(6)[1::2]
'aacaaccaaaccaccaaacaacccaaccacc'
The characters at even positions simply alternate between a and c (and it's straightforward to prove that this is true, based on the construction). So if our k is even, we can simply look at whether k/2 is odd or even to determine whether we'll get an a or a c.
What about the odd positions? Well F(6)[1::2] should look somewhat familiar: it's just F(5):
>>> F(6)[1::2] == F(5)
True
Again, it's straightforward to prove (e.g., by induction) that this isn't simply a coincidence, and that F(n+1)[1::2] == F(n) for all nonnegative n.
We now have an effective way to compute the kth character in our infinite string: if k is even, we just look at the parity of k/2. If k is odd, then we know that the character at position k is equal to that at position (k-1)/2. So here's a first solution to computing that character:
def char_at_pos(k):
"""
Return the character at position k of the string F(n), for any
n satisfying 2**n-1 > k.
"""
while k % 2 == 1:
k //= 2
return 'ac'[k//2%2]
And a check that this does the right thing:
>>> ''.join(char_at_pos(i) for i in range(2**6-1))
'aacaaccaaaccaccaaacaacccaaccaccaaacaaccaaaccacccaacaacccaaccacc'
>>> ''.join(char_at_pos(i) for i in range(2**6-1)) == F(6)
True
But we can do better. We're effectively staring at the binary representation of k, removing all trailing '1's and the next '0', then simply looking at the next bit to determine whether we've got an 'a' or a 'c'. Identifying the trailing 1s can be done by bit-operation trickery. This gives us the following semi-obfuscated loop-free solution, which I leave it to you to unwind:
def char_at_pos2(k):
"""
Return the character at position k of the string F(n), for any
n satisfying 2**n-1 > k.
"""
return 'ac'[k//(1+(k+1^k))%2]
Again, let's check:
>>> F(20) == ''.join(char_at_pos2(i) for i in range(2**20-1))
True
Final comments: this is a very well-known and well-studied sequence: it's called the dragon curve sequence, or the regular paper-folding sequence, and is sequence A014577 in the online encyclopaedia of integer sequences. Some Google searches will likely give you many other ways to compute its elements. See also this codegolf question.

Based on what you have already coded, here's my suggestion:
def main_function(num):
if num == 0:
return ''
previous = main_function(num-1)
return previous + 'a' + op1(op2(previous))
print(main_function(10**2017))
P.S: I'm not sure of the efficiency.

Related

Python recursion (format issue)

Write a recursive function replace_digit(n, d, r) which replaces each occurrence of digit d in the number n by r.
replace_digit(31242154125, 1, 0) => 30242054025
My code is as such
def replace_digit(n, d, r):
y=str(n)
if len(y)==0:
return ''
else:
if y[0]== str(d):
return str(r) + replace_digit(str(n)[1:],d,r)
else:
return y[0]+ replace_digit(str(n)[1:],d,r)
However, the answer I get is in a string format. Any idea how to convert into an integer format? I have been stuck for quite some time on this :(
If your recursive function must return an integer, then return integers. You can always convert the returned integer back into a string for recursive calls.
You'll have to stop when you run out of digits before calling, so only recurse if there are 2 or more characters in y.
However, this approach a big problem: leading zeros are dropped when converting to int():
>>> int('025')
25
You have two options here:
Pad the number when you convert to a string (using str.zfill() or format(), and use the length of the value you passed into the recursive call).
Recurse from the end. This would also allow you to not use strings.
Here is an approach using zero-padding:
def replace_digit(n, d, r):
nstr = str(n)
first, rest = nstr[0], nstr[1:]
if rest:
rest = str(replace_digit(rest, d, r)).zfill(len(rest))
if first == str(d):
first = str(r)
return int(first + rest)
Note that you always want to separate out the first character from the tail anyway, so I used variables for both.
This way, you can use if rest: to guard against recursing when there are no digits left, and you can call str() on the return value. The function returns the int() conversion of the (possibly replaced first value) with the updated rest value.
Demo:
>>> replace_digit(31242154125, 1, 0)
30242054025
Recursing from the opposite end would not have problems with zeros, except if the input value was 0 to begin with. However, you could instead use division and modules operations to work on the integer value directly:
number % 10 gives you the right-most digit, as an integer.
number // 10 gives you the remaining numbers, again as integer.
You could combine the two operations into one using the divmod() function. Personally, I don't do so, as I don't think it particularly improves readability, and using the operators is slightly faster when using CPython.
You can re-combine the recursive call result with the (possibly replaced) last digit by multiplying the returned value by 10 again:
def replace_digit(n, d, r):
head, last = n // 10, n % 10
if head:
head = replace_digit(head, d, r)
if last == d:
last = r
return (head * 10) + last
This works for any natural number, including 0:
>>> replace_digit(0, 1, 0)
0
>>> replace_digit(0, 0, 1)
1
>>> replace_digit(31242154125, 1, 0)
30242054025
>>> replace_digit(31242154125, 4, 9)
31292159125

String sorting problem with code execution time limit

I was recently trying to solve a HackerEarth problem. The code worked on the sample inputs and some custom inputs that I gave. But, when I submitted, it showed errors for exceeding the time limit. Can someone explain how I can make the code run faster?
Problem Statement: Cyclic shift
A large binary number is represented by a string A of size N and comprises of 0s and 1s. You must perform a cyclic shift on this string. The cyclic shift operation is defined as follows:
If the string A is [A0, A1,..., An-1], then after performing one cyclic shift, the string becomes [A1, A2,..., An-1, A0].
You performed the shift infinite number of times and each time you recorded the value of the binary number represented by the string. The maximum binary number formed after performing (possibly 0) the operation is B. Your task is to determine the number of cyclic shifts that can be performed such that the value represented by the string A will be equal to B for the Kth time.
Input format:
First line: A single integer T denoting the number of test cases
For each test case:
First line: Two space-separated integers N and K
Second line: A denoting the string
Output format:
For each test case, print a single line containing one integer that represents the number of cyclic shift operations performed such that the value represented by string A is equal to B for the Kth time.
Code:
import math
def value(s):
u = len(s)
d = 0
for h in range(u):
d = d + (int(s[u-1-h]) * math.pow(2, h))
return d
t = int(input())
for i in range(t):
x = list(map(int, input().split()))
n = x[0]
k = x[1]
a = input()
v = 0
for j in range(n):
a = a[1:] + a[0]
if value(a) > v:
b = a
v = value(a)
ctr = 0
cou = 0
while ctr < k:
a = a[1:] + a[0]
cou = cou + 1
if a == b:
ctr = ctr + 1
print(cou)
In the problem, the constraint on n is 0<=n<=1e5. In the function value(), you calculating integer from the binary string whose length can go up to 1e5. so the integer calculating by you can go as high as pow(2, 1e5). This surely impractical.
As mentioned by Prune, you must use some efficient algorithms for finding a subsequence, say sub1, whose repetitions make up the given string A. If you solve this by brute-force, the time complexity will be O(n*n), as maximum value of n is 1e5, time limit will exceed. so use some efficient algorithm.
I can't do much with the code you posted, since you obfuscated it with meaningless variables and a lack of explanation. When I scan it, I get the impression that you've made the straightforward approach of doing a single-digit shift in a long-running loop. You count iterations until you hit B for the Kth time.
This is easy to understand, but cumbersome and inefficient.
Since the cycle repeats every N iterations, you gain no new information from repeating that process. All you need to do is find where in the series of N iterations you encounter B ... which could be multiple times.
In order for B to appear multiple times, A must consist of a particular sub-sequence of bits, repeated 2 or more times. For instance, 101010 or 011011. You can detect this with a simple addition to your current algorithm: at each iteration, check to see whether the current string matches the original. The first time you hit this, simply compute the repetition factor as rep = len(a) / j. At this point, exit the shifting loop: the present value of b is the correct one.
Now that you have b and its position in the first j rotations, you can directly compute the needed result without further processing.
I expect that you can finish the algorithm and do the coding from here.
Ah -- taken as a requirements description, the wording of your problem suggests that B is a given. If not, then you need to detect the largest value.
To find B, append A to itself. Find the A-length string with the largest value. You can hasten this by finding the longest string of 1s, applying other well-known string-search algorithms for the value-trees after the first 0 following those largest strings.
Note that, while you iterate over A, you look for the first place in which you repeat the original value: this is the desired repetition length, which drives the direct-computation phase in the first part of my answer.

Repeated String Match

I did Contest 52 of leetcode.com and I had trouble understanding the solution. The problem statement is:
Given two strings A and B, find the minimum number of times A has to be >repeated such that B is a substring of it. If no such solution, return -1.
For example, with A = "abcd" and B = "cdabcdab.
Return 3, because by repeating A three times (“abcdabcdabcd”), B is a >substring of it; and B is not a substring of A repeated two times
("abcdabcd").
The solution is:
def repeatedStringMatch(self, A, B):
"""
:type A: str
:type B: str
:rtype: int
"""
times = int(math.ceil(float(len(B)) / len(A)))
for i in range(2):
if B in (A * (times + i)):
return times + i
return -1
The explanation from one of the collaborators was:
A has to be repeated sufficient times such that it is at least as long as >B (or one more), hence we can conclude that the theoretical lower bound >for the answer would be length of B / length of A.
Let x be the theoretical lower bound, which is ceil(len(B)/len(A)).
The answer n can only be x or x + 1
I don't understand why n can only be x or x+1, can someone help?
If x+1 < n and B is a substring of A repeated n times and you've embedded B in it then either you can chop off the last copy of A without hitting B (meaning that n is not minimal) or else the start of B in A is after the end of the first copy so you can chop off the first copy (and again n is not minimal).
Therefore if it fits at all, it must fit within x+1 copies. Based on length alone it can't fit within < x copies. So the only possibilities left are x and x+1 copies. (And examples can be found where each is the answer.)
Suppose we have two strings m = "abcde" and any 11 character string "e_abcde_abcde", n to be searched. Since the 11 digits are to be matched, the minimum number of characters required is 11 which requires ((n/m)+1)=3 minimum sets for search due to integer rounding. Now the ending position depends on the beginning position and there are at most length(m) unique starting points in each set of m elements. So we can start at each of the positions till the first set is exhausted. The last try may go to m_last = 4, hence the last element of the matched string will go to (n/m+2) set. After the first set is exhausted, the above pattern is exhausted and there is no new searches. Hence we require (n/m + 2) * length (m) for full search.
I know this is an old question still, I would like to add my 2 cents here.
I think #btilly has provided enough clarity but a picture may help further.

Project Euler 104: Need help in understanding the solution

Project Euler Q104 (https://projecteuler.net/problem=104) is as such:
The Fibonacci sequence is defined by the recurrence relation:
Fn = Fn−1 + Fn−2, where F1 = 1 and F2 = 1. It turns out that F541,
which contains 113 digits, is the first Fibonacci number for which the
last nine digits are 1-9 pandigital (contain all the digits 1 to 9,
but not necessarily in order). And F2749, which contains 575 digits,
is the first Fibonacci number for which the first nine digits are 1-9
pandigital.
Given that Fk is the first Fibonacci number for which the first nine
digits AND the last nine digits are 1-9 pandigital, find k.
And I wrote this simple code in Python:
def fibGen():
a,b = 1,1
while True:
a,b = b,a+b
yield a
k = 0
fibG = fibGen()
while True:
k += 1
x = str(fibG.next())
if (set(x[-9:]) == set("123456789")):
print x #debugging print statement
if(set(x[:9]) == set("123456789")):
break
print k
However, it was taking well.. forever.
After leaving it running for 30 mins, puzzled, I gave up and checked the solution.
I came across this code in C#:
long fn2 = 1;
long fn1 = 1;
long fn;
long tailcut = 1000000000;
int n = 2;
bool found = false;
while (!found) {
n++;
fn = (fn1 + fn2) % tailcut;
fn2 = fn1;
fn1 = fn;
if (IsPandigital(fn)) {
double t = (n * 0.20898764024997873 - 0.3494850021680094);
if (IsPandigital((long)Math.Pow(10, t - (long)t + 8)))
found = true;
}
}
Which.. I could barely understand. I tried it out in VS, got the correct answer and checked the thread for help.
I found these two, similar looking answers in Python then.
One here, http://blog.dreamshire.com/project-euler-104-solution/
And one from the thread:
from math import sqrt
def isPandigital(s):
return set(s) == set('123456789')
rt5=sqrt(5)
def check_first_digits(n):
def mypow( x, n ):
res=1.0
for i in xrange(n):
res *= x
# truncation to avoid overflow:
if res>1E20: res*=1E-10
return res
# this is an approximation for large n:
F = mypow( (1+rt5)/2, n )/rt5
s = '%f' % F
if isPandigital(s[:9]):
print n
return True
a, b, n = 1, 1, 1
while True:
if isPandigital( str(a)[-9:] ):
print a
# Only when last digits are
# pandigital check the first digits:
if check_first_digits(n):
break
a, b = b, a+b
b=b%1000000000
n += 1
print n
These worked pretty fast, under 1 minute!
I really need help understanding these solutions. I don't really know the meaning or the reason behind using stuff like log. And though I could easily do the first 30 questions, I cannot understand these tougher ones.
How is the best way to solve this question and how these solutions are implementing it?
These two solutions work on the bases that as fibonacci numbers get bigger, the ratio between two consecutive terms gets closer to a number known as the Golden Ratio, (1+sqrt(5))/2, roughly 1.618. If you have one (large) fibonacci number, you can easily calculate the next, just by multiplying it by that number.
We know from the question that only large fibonacci numbers are going to satisfy the conditions, so we can use that to quickly calculate the parts of the sequence we're interested in.
In your implementation, to calculate fib(n), you need to calculate fib(n-1), which needs to calculate fib(n-2) , which needs to calculate fib(n-3) etc, and it needs to calculate fib(n-2), which calculates fib(n-3) etc. That's a huge number of function calls when n is big. Having a single calculation to know what number comes next is a huge speed increase. A computer scientist would call the first method O(n^2)*: to calculate fib(n), you need n^2 sub calculations. Using the golden mean, the fibonacci sequence becomes (approximately, but close enouigh for what we need):
(using phi = (1+sqrt(5))/2)
1
1*phi
1*phi*phi = pow(phi, 2)
1*phi*phi*phi = pow(phi, 3)
...
1*phi*...*phi = pow(phi, n)
\ n times /
So, you can do an O(1) calculation: fib(n): return round(pow(golden_ratio, n)/(5**0.5))
Next, there's a couple of simplifications that let you use smaller numbers.
If I'm concerned about the last nine digits of a number, what happens further up isn't all that important, so I can throw anything after the 9th digit from the right away. That's what b=b%1000000000 or fn = (fn1 + fn2) % tailcut; are doing. % is the modulus operator, which says, if I divide the left number by the right, what's the remainder?
It's easiest to explain with equivalent code:
def mod(a,b):
while a > b:
a -= b
return a
So, there's a quick addition loop that adds together the last nine digits of fibonacci numbers, waiting for them to be pandigital. If it is, it calculates the whole value of the fibonacci number, and check the first nine digits.
Let me know if I need to cover anything in more detail.
* https://en.wikipedia.org/wiki/Big_O_notation

HackerRank "AND product"

When I submit the below code for testcases in HackerRank challenge "AND product"...
You will be given two integers A and B. You are required to compute the bitwise AND amongst all natural numbers lying between A and B, both inclusive.
Input Format:
First line of the input contains T, the number of testcases to follow.
Each testcase in a newline contains A and B separated by a single space.
from math import log
for case in range(int(raw_input())):
l, u = map(int, (raw_input()).split())
if log(l, 2) == log(u, 2) or int(log(l,2))!=int(log(l,2)):
print 0
else:
s = ""
l, u = [x for x in str(bin(l))[2:]], [x for x in str(bin(u))[2:]]
while len(u)!=len(l):
u.pop(0)
Ll = len(l)
for n in range(0, len(l)):
if u[n]==l[n]:
s+=u[n]
while len(s)!=len(l):
s+="0"
print int(s, 2)
...it passes 9 of the test cases, Shows "Runtime error" in 1 test case and shows "Wrong Answer" in the rest 10 of them.
What's wrong in this?
It would be better for you to use the Bitwise operator in Python for AND. The operator is: '&'
Try this code:
def andProduct(a, b):
j=a+1
x=a
while(j<=b):
x = x&j
j+=1
return x
For more information on Bitwise operator you can see: https://wiki.python.org/moin/BitwiseOperators
Yeah you can do this much faster.
You are doing this very straightforward, calculating all ands in a for loop.
It should actually be possible to calculate this in O(1) (I think)
But here are some optimisations:
1) abort the for loop if you get the value 0, because it will stay 0 no matter what
2)If there is a power of 2 between l and u return 0 (you don't need a loop in that case)
My Idea for O(1) would be to think about which bits change between u and l.
Because every bit that changes somewhere between u and l becomes 0 in the answer.
EDIT 1: Here is an answer in O(same leading digits) time.
https://math.stackexchange.com/questions/1073532/how-to-find-bitwise-and-of-all-numbers-for-a-given-range
EDIT 2: Here is my code, I have not tested it extensively but it seems to work. (O(log(n))
from math import log
for case in [[i+1,j+i+1] for i in range(30) for j in range(30)]:
#Get input
l, u = case
invL=2**int(log(l,2)+1)-l
invU=2**int(log(u,2)+1)-u
#Calculate pseudo bitwise xnor of input and format to binary rep
e=format((u&l | invL & invU),'010b')
lBin=format(l,'010b')
#output to zero
res=0
#boolean to check if we have found any zero
anyZero=False
#boolean to check the first one because we have leading zeros
firstOne=False
for ind,i in enumerate(e):
#for every digit
#if it is a leading one
if i=='1' and (not anyZero):
firstOne=True
#leftshift result (multiply by 2)
res=res<<1
#and add 1
res=res+int(lBin[ind])
#else if we already had a one and find a zero this happens every time
elif(firstOne):
anyZero=True
#leftshift
res=res<<1
#test if we are in the same power, if not there was a power between
if(res!=0):
#print "test",(int(log(res,2))!=int(log(l,2))) | ((log(res,2))!=int(log(u,2)))
if((int(log(res,2))!=int(log(l,2))) or (int(log(res,2))!=int(log(u,2)))):
res=0
print res
Worked for every but a single testcase. Small change needed to get the last one. You'll have to find out what that small change is yourself. Seriously

Categories