Finding the balanced code of size k using recursion

I have a problem for an assignment I am working on, where I have to write a recursive function in python which returns the balanced code of size k, which is defined as the list of all binary strings of length 2k that contain an equal number of 0s in each half of the string. It is only allowed to accept one parameter, k. I have so far found a way to return a list of all possible binary strings of length 2k, but am having trouble reducing the list to only those that meet the criteria. This is my code so far:
def balanced_code(k):
    if k >= 0:
        if k == 0:
            return ['']
        else:
            L = []
            x = balanced_code(k - 1)
            for i in range(0, len(x)):
                L.append('00' + x[i])
                L.append('01' + x[i])
                L.append('10' + x[i])
                L.append('11' + x[i])
            return L
    else:
        return
My plan was after the for loop, I would check each item in L for the criteria mentioned (number of 0s equal in each half of the string), but quickly realized that this didn't give the right result as it would reduce L during every call, and I only want to reduce it once all calls to the function have been made. Is there any way I could track what recursion level the code is on or something like that so that I only reduce the list once all calls have been made?

How recursive does this have to be? Where does the recursion need to be?
If this were me, I'd write a recursive function:
def all_strings_of_length_k_with_n_zeros(k, n):
    ... you should be able to write this easily as recursion
And then
import itertools

def balanced_code(k):
    result = []
    for zeros in range(0, k + 1):
        temp = all_strings_of_length_k_with_n_zeros(k, zeros)
        for left, right in itertools.product(temp, temp):
            result.append(left + right)
    return result
It's strange that your instructor is asking you to write some code recursively that can be written straightforwardly without recursion. (The function I left as an exercise to the reader could be written using itertools.combinations).
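The helper left as an exercise above could be written recursively along the following lines. This is only a sketch: the body of all_strings_of_length_k_with_n_zeros is not from the original answer, and it assumes the helper should return every binary string of length k that contains exactly n zeros.

```python
import itertools


def all_strings_of_length_k_with_n_zeros(k, n):
    # Assumed behaviour: every binary string of length k with exactly n zeros.
    if n < 0 or n > k:
        return []      # impossible request
    if k == 0:
        return [""]    # only reachable with n == 0
    # The first character is either '0' (uses up one zero) or '1'.
    with_zero = ["0" + s for s in all_strings_of_length_k_with_n_zeros(k - 1, n - 1)]
    with_one = ["1" + s for s in all_strings_of_length_k_with_n_zeros(k - 1, n)]
    return with_zero + with_one


def balanced_code(k):
    result = []
    for zeros in range(k + 1):
        half = all_strings_of_length_k_with_n_zeros(k, zeros)
        # Any left half can be paired with any right half that has
        # the same number of zeros.
        for left, right in itertools.product(half, half):
            result.append(left + right)
    return result
```

Only the helper is recursive here; balanced_code still takes the single parameter k that the assignment requires.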

You can approach the recursion by adding "0" and "1" bits on each side of the k-1 results. The bits need to be added last on the right side and at every position on the left side. Since this is going to produce duplicates, using a set to return the strings will ensure distinct results.
def balancedCodes(k):
    if not k: return {""}
    return { code[:pos]+bit+code[pos:]+bit for code in balancedCodes(k-1)
                                           for pos in range(k)
                                           for bit in ("0","1") }

for bc in sorted(balancedCodes(3)): print(bc)
000000
001001
001010
001100
010001
010010
010100
011011
011101
011110
100001
100010
100100
101011
101101
101110
110011
110101
110110
111111
The 111111 result is a case of having no zeroes on each side

One way to solve this recursively without creating duplicates is by using what's described by The On-Line Encyclopedia of Integer Sequences® as the "1's-counting sequence: number of 1's in binary expansion of n (or the binary weight of n)" (sequence A000120), which can be formulated as a recurrence relation (see the function, f, below). Although we could "unpack" the sequence into our result at every stage for a self-contained recursive answer, it seemed superfluous so I left the unpacking of the sequence as a separate function we can call on the result.
def str_tuple(a, b, k):
    return "{0:0{align}b}{1:0{align}b}".format(a, b, align=k)

def unpack(seq, k):
    bitcounts = [[] for _ in range(k + 1)]
    products = []
    for n, i in enumerate(seq):
        for n0 in bitcounts[i]:
            products.extend([str_tuple(n0, n, k), str_tuple(n, n0, k)])
        products.append(str_tuple(n, n, k))
        bitcounts[i].append(n)
    return products

def f(n):
    if n == 1:
        return [0, 1]
    seq_l = f(n - 1)
    seq_r = list(map(lambda x: x + 1, seq_l))
    return seq_l + seq_r

k = 3
print(f(k))
print(unpack(f(k), k))
Output:
[0, 1, 1, 2, 1, 2, 2, 3]
['000000', '001001', '001010', '010001', '010010',
'011011', '001100', '100001', '010100', '100010',
'100100', '011101', '101011', '101101', '011110',
'110011', '101110', '110101', '110110', '111111']

Related

Sum of two squares in Python

I have written a code based on the two pointer algorithm to find the sum of two squares. My problem is that I run into a memory error when running this code for an input n=55555**2 + 66666**2. I am wondering how to correct this memory error.
def sum_of_two_squares(n):
    look=tuple(range(n))
    i=0
    j = len(look)-1
    while i < j:
        x = (look[i])**2 + (look[j])**2
        if x == n:
            return (j,i)
        elif x < n:
            i += 1
        else:
            j -= 1
    return None

n=55555**2 + 66666**2
print(sum_of_two_squares(n))
The problem I'm trying to solve using the two-pointer algorithm is:
return a tuple of two positive integers whose squares add up to n, or return None if the integer n cannot be so expressed as a sum of two squares. The returned tuple must present the larger of its two numbers first. Furthermore, if some integer can be expressed as a sum of two squares in several ways, return the breakdown that maximizes the larger number. For example, the integer 85 allows two such representations 7*7 + 6*6 and 9*9 + 2*2, of which this function must therefore return (9, 2).

You're creating a tuple of size 55555^2 + 66666^2 = 7530713581
So if each element of the tuple takes one byte, the tuple will take up 7.01 GiB.
You'll need to either reduce the size of the tuple, or possibly make each element take up less space by specifying the type of each element: I would suggest looking into Numpy for the latter.
Specifically for this problem:
Why use a tuple at all?
You create the variable look, which is just a tuple of consecutive integers:
look=tuple(range(n)) # = (0, 1, 2, ..., n-1)
Then you reference it, but never modify it. So: look[i] == i and look[j] == j.
So you're looking up numbers in a list of numbers. Why look them up? Why not just use i in place of look[i] and remove look altogether?

As others have pointed out, there's no need to use tuples at all.
One reasonably efficient way of solving this problem is to generate a series of integer square values (0, 1, 4, 9, etc...) and test whether or not subtracting these values from n leaves you with a value that is a perfect square.
You can generate a series of perfect squares efficiently by adding successive odd numbers together: 0 (+1) → 1 (+3) → 4 (+5) → 9 (etc.)
There are also various tricks you can use to test whether or not a number is a perfect square (for example, see the answers to this question), but — in Python, at least — it seems that simply testing the value of int(n**0.5) is faster than iterative methods such as a binary search.
def integer_sqrt(n):
    # If n is a perfect square, return its (integer) square
    # root. Otherwise return -1
    r = int(n**0.5)
    if r * r == n:
        return r
    return -1

def sum_of_two_squares(n):
    # If n can be expressed as the sum of two squared integers,
    # return these integers as a tuple. Otherwise return <None>
    # i: iterator variable
    # x: value of i**2
    # y: value we need to add to x to obtain (i+1)**2
    i, x, y = 0, 0, 1
    # If i**2 > n / 2, then we can stop searching
    max_x = n >> 1
    while x <= max_x:
        r = integer_sqrt(n-x)
        if r >= 0:
            # r >= i here, so this puts the larger number first,
            # as the question requires
            return (r, i)
        i, x, y = i+1, x+y, y+2
    return None
This returns a solution to sum_of_two_squares(55555**2 + 66666**2) in a fraction of a second.
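One caveat with int(n**0.5) is that it goes through a floating-point value, so for very large n the perfect-square test can be off by one. A hedged variant of the same search, assuming Python 3.8 or later (where math.isqrt exists) and assuming that only positive integers count, as the question states; the function name is made up for this sketch:

```python
import math


def sum_of_two_squares_isqrt(n):
    # Hypothetical name; same idea as above, but with exact integer roots.
    i = 1  # the question asks for positive integers, so start at 1
    while 2 * i * i <= n:
        rest = n - i * i
        r = math.isqrt(rest)  # exact floor of the square root
        if r * r == rest:
            return (r, i)  # r >= i here, so the larger number comes first
        i += 1
    return None


print(sum_of_two_squares_isqrt(85))  # (9, 2)
```

Because i grows from the smallest candidate upward, the first hit is also the decomposition with the largest possible larger number.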

You do not need the ranges at all, and certainly do not need to convert them into tuples. They take a ridiculous amount of space, but you only need their current elements, numbers i and j. Also, as the friendly commenter suggested, you can start with sqrt(n) to improve the performance further.
def sum_of_two_squares(n):
    i = 1
    j = int(n ** (1/2))
    while i < j:
        x = i * i + j * j
        if x == n:
            return j, i
        if x < n:
            i += 1
        else:
            j -= 1
Bear in mind that the problem takes a very long time to be solved. Be patient. And no, NumPy won't help. There is nothing here to vectorize.

Trying to understand the time complexity of this dynamic recursive subset sum

# Returns true if there exists a subsequence of `A[0…n]` with the given sum
def subsetSum(A, n, k, lookup):
    # return true if the sum becomes 0 (subset found)
    if k == 0:
        return True
    # base case: no items left, or sum becomes negative
    if n < 0 or k < 0:
        return False
    # construct a unique key from dynamic elements of the input
    key = (n, k)
    # if the subproblem is seen for the first time, solve it and
    # store its result in a dictionary
    if key not in lookup:
        # Case 1. Include the current item `A[n]` in the subset and recur
        # for the remaining items `n-1` with the decreased total `k-A[n]`
        include = subsetSum(A, n - 1, k - A[n], lookup)
        # Case 2. Exclude the current item `A[n]` from the subset and recur for
        # the remaining items `n-1`
        exclude = subsetSum(A, n - 1, k, lookup)
        # assign true if we get subset by including or excluding the current item
        lookup[key] = include or exclude
    # return solution to the current subproblem
    return lookup[key]

if __name__ == '__main__':
    # Input: a set of items and a sum
    A = [7, 3, 2, 5, 8]
    k = 14
    # create a dictionary to store solutions to subproblems
    lookup = {}
    if subsetSum(A, len(A) - 1, k, lookup):
        print('Subsequence with the given sum exists')
    else:
        print('Subsequence with the given sum does not exist')
It is said that the complexity of this algorithm is O(n * sum), but I can't understand how or why;
can someone help me? Could be a wordy explanation or a recurrence relation, anything is fine

The simplest explanation I can give is to realize that when lookup[(n, k)] has a value, it is True or False and indicates whether some subset of A[:n+1] sums to k.
Imagine a naive algorithm that just fills in all the elements of lookup row by row.
lookup[(0, i)] (for 0 ≤ i ≤ total) has just two elements true, i = A[0] and i = 0, and all the other elements are false.
lookup[(1, i)] (for 0 ≤ i ≤ total) is true if lookup[(0, i)] is true, or if i ≥ A[1] and lookup[(0, i - A[1])] is true. I can reach the sum i either by using A[1] or not, and I've already calculated both of those.
...
lookup[(r, i)] (for 0 ≤ i ≤ total) is true if lookup[(r - 1, i)] is true, or if i ≥ A[r] and lookup[(r - 1, i - A[r])] is true.
Filling in the table this way, it is clear that we can completely fill the lookup table for rows 0 ≤ row < len(A) in time len(A) * total, since filling in each element takes constant time. And our final answer is just checking whether the entry (len(A) - 1, sum) is True in the table.
Your program is doing the exact same thing, but calculating the value of entries of lookup as they are needed.
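The row-by-row fill described above can be sketched as follows. This is an illustration rather than the code from the question: the function name is made up, and it assumes the items and the target are non-negative integers.

```python
def subset_sum_table(A, total):
    # Hypothetical helper: table[r][i] is True when some subset of
    # A[:r+1] sums to i.
    if not A:
        return total == 0
    table = [[False] * (total + 1) for _ in range(len(A))]
    for r in range(len(A)):
        for i in range(total + 1):
            if i == 0:
                table[r][i] = True            # the empty subset sums to 0
            elif r == 0:
                table[r][i] = (i == A[0])     # only A[0] is available
            else:
                without = table[r - 1][i]
                with_item = i >= A[r] and table[r - 1][i - A[r]]
                table[r][i] = without or with_item
    return table[-1][total]


print(subset_sum_table([7, 3, 2, 5, 8], 14))  # True, since 7 + 2 + 5 == 14
```

The two nested loops touch len(A) * (total + 1) cells and do constant work per cell, which is where the O(n * sum) bound comes from.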

Sorry for submitting two answers. I think I came up with a slightly simpler explanation.
Take your code and imagine putting the three lines inside if key not in lookup: into a separate function, calculateLookup(A, n, k, lookup). I'm going to define "the cost of calling calculateLookup for n and k" to be the total time spent in the call to calculateLookup(A, n, k, lookup), excluding any recursive calls to calculateLookup.
The key insight is that, as defined above, the cost of calling calculateLookup() for any n and k is O(1). Since we are excluding recursive calls in the cost, and there are no for loops, the cost of calculateLookup is the cost of just executing a few tests.
The entire algorithm does a fixed amount of work, calls calculateLookup, and then does a small amount of work. Hence asking how much time is spent in our code is the same as asking how many times we call calculateLookup.
Now we're back to the previous answer. Because of the lookup table, every call to calculateLookup is made with a different value of (n, k). We also know that we check the bounds of n and k before each call to calculateLookup, so 1 ≤ k ≤ sum and 0 ≤ n < len(A). So calculateLookup is called at most (len(A) * sum) times.
In general, for these algorithms that use memoization/caching, the easiest thing to do is to separately calculate and then sum:
How long things take assuming all values you need are cached.
How long it takes to fill the cache.
The algorithm you presented is just filling up the lookup cache. It's doing it in an unusual order, and it's not filling every entry in the table, but that's all it's doing.
The code would be slightly faster with
lookup[key] = subsetSum(A, n - 1, k - A[n], lookup) or subsetSum(A, n - 1, k, lookup)
Doesn't change the O() of the code in the worst case, but can avoid some unnecessary calculations.
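To see the bound on the number of solved subproblems in practice, one can count the entries that end up in the lookup table. The sketch below reuses the logic from the question under a different, made-up function name and assumes positive integer items; the size of lookup equals the number of times the body under if key not in lookup runs.

```python
def subset_sum_counted(A, n, k, lookup):
    # Same logic as subsetSum; lookup doubles as a record of which
    # (n, k) subproblems were actually solved.
    if k == 0:
        return True
    if n < 0 or k < 0:
        return False
    key = (n, k)
    if key not in lookup:
        lookup[key] = (subset_sum_counted(A, n - 1, k - A[n], lookup)
                       or subset_sum_counted(A, n - 1, k, lookup))
    return lookup[key]


A = [7, 3, 2, 5, 8]
k = 14
lookup = {}
found = subset_sum_counted(A, len(A) - 1, k, lookup)
# The number of stored keys can never exceed len(A) * k.
print(found, len(lookup), "<=", len(A) * k)
```

Every key is a pair with 0 ≤ n < len(A) and 1 ≤ k ≤ sum, so the dictionary cannot hold more than len(A) * sum entries no matter what the input is.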

recursive generalized continued fraction from a single list in Python

I need to write a recursive function that takes a list L (of odd length) representing a generalized continued fraction, and returns the corresponding rational number.
The reason the length of L is always odd is that the rational needs to be created by the continued fraction
L[0] + L[1]/(L[2] + L[3]/(L[4] + ...))
I've seen this as two lists a, b. Essentially, in this case a is L[even] and b is L[odd].
My base case is when it gets down to the last 3:
if n == 3: #base case
    return L[0] + (L[1] / L[2])
I am not sure how to advance both the numerator and the denominator in the recursion.
I keep trying things like the following (GCF2R is the function, L is the list, n = len(L)):
if n > 3:
    return L[0] + (L[n-2] / GCF2R(L[0:n]))
but since the recursion is only happening in the denominator, the numerator doesn't change with each recursion.
I know I am missing some fundamental step. Any help in understanding this is much appreciated.

If you look at the form of a generalized continued fraction, you can see the substructure:
              L[1]
L[0] + --------------------------
                    L[3]
       L[2] + -------------------
                        L[5]
              L[4] + ---------------
                          ...
Basically, a GCF is either an integer, or an expression of the form a + b/c, where a and b are integers and c is itself a GCF.
From that, the recursion follows naturally:
def GCF2R(L):
    if len(L) == 1:
        return L[0]
    else:  # Assume len(L) > 2
        return L[0] + L[1]/GCF2R(L[2:])
Remove the first two elements, and recurse on the rest.
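Since the question asks for a rational number, the same front-to-back recursion can be run with exact arithmetic instead of floating-point division. A minimal sketch, assuming the entries of L are integers and using the standard library's fractions.Fraction; the function name is hypothetical:

```python
from fractions import Fraction


def gcf_to_rational(L):
    # Hypothetical name; same recursion as GCF2R, but with exact arithmetic.
    if len(L) == 1:
        return Fraction(L[0])
    # L[0] + L[1] / (value of the continued fraction built from L[2:])
    return L[0] + Fraction(L[1]) / gcf_to_rational(L[2:])


print(gcf_to_rational([1, 2, 3, 4, 5]))  # 1 + 2/(3 + 4/5) = 29/19
```

A zero denominator anywhere in the chain raises ZeroDivisionError, which this sketch does not try to handle.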

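I think you want to peel these off of the end, rather than off of the front. To do that, fold the last three elements a, b, c into the single value a + b/c and recurse on the shortened list:
def GCF2R(L):
    if len(L) == 1:
        return L[0]
    else:
        # Collapse the innermost fraction L[-3] + L[-2]/L[-1] into one term
        return GCF2R(L[:-3] + [L[-3] + L[-2]/L[-1]])
I suppose if you passed the length, you wouldn't need to cut a new list every time; the value built so far from the tail is then carried along in an extra argument:
def GCF2R(L, N, inner=None):
    # inner is the value of the fraction built so far from the tail
    if inner is None:
        inner = L[N-1]
    if N == 1:
        return inner
    else:
        return GCF2R(L, N-2, L[N-3] + L[N-2]/inner)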

Beautiful sequence

A sequence of integers is beautiful if each element of this sequence is divisible by 4. You are given a sequence a1, a2, ..., an. In one step, you may choose any two elements of this sequence, remove them from the sequence and append their sum to the sequence. Compute the minimum number of steps necessary to make the given sequence beautiful, or print -1 if this is not possible.
for i in range(int(input())):
    n=int(input())
    arr=list(map(int,input().split()))
    if((sum(arr))%4)!=0:
        print(-1)
        continue
    else:
        counter=[]
        for i in range(n):
            if arr[i]%4!=0:
                counter.append(arr[i])
            else:
                continue
        x=sum(counter)
        while(x%4==0):
            x=x//4
        print(x)
My approach: if the sum of the array is not divisible by 4, then the array cannot be made beautiful. Otherwise, if the sum of the array mod 4 is equal to zero, I collect the elements of the array that are not divisible by 4 in a list, then find the sum of that list and divide it by 4 until the quotient mod 4 is no longer zero. What am I doing wrong here?
Edit: I have a working script which works well
for i in range(int(input())):
    n=int(input())
    arr=list(map(int,input().split()))
    count1=0
    count2=0
    count3=0
    summ=0
    for i in range(n):
        x=arr[i]%4
        summ+=x
        if x==1:
            count1+=1
        if x==2:
            count2+=1
        if x==3:
            count3+=1
    if (summ%4)!=0:
        print(-1)
        continue
    else:
        if count2==0 and count1!=0 and count3==0:
            tt=count1//4
            print(3*tt)
        if count2==0 and count1==0 and count3!=0:
            tt=count3//4
            print(3*tt)
        if count2%2==0 and count1==count3:
            print(count2//2+count1)
        flag1=min(count1,count3)
        flag2=abs(count1-count3)
        if count2%2==0 and count1!=count3:
            flag3=flag2//4
            flag4=flag3*3
            print(count2//2+ flag1+ flag4)
        if count2%2!=0 and count1!=count3:
            flag3=flag2-2
            flag4=flag3//4
            flag5=flag4*3
            print(((count2-1)//2)+flag1+flag5+2)

First some observations:
For the sake of 4-divisibility, we can replace all numbers by their division-by-4 remainder, so we only have to cope with values 0, 1, 2 and 3.
The ordering doesn't matter, counting the zeroes, ones, twos and threes is enough.
There are pairs immediately giving a sum divisible by 4: (1, 3) and (2, 2). Each existence of such a pair needs one step.
There are triples (1, 1, 2) and (3, 3, 2) needing two steps.
There are quadruples (1, 1, 1, 1) and (3, 3, 3, 3) needing three steps.
Algorithm:
Count the remainder-0 (can be omitted), remainder-1, remainder-2 and remainder-3 numbers.
If the total sum (from the counts) isn't divisible by 4, there's no solution.
For all the N-tuples described above, find how often they fit into the counts; add the resulting number of steps, subtract the numbers consumed from the counts.
Finally, the remainder-1, remainder-2 and remainder-3 counts should be zero.

Here is an O(N) implementation going pretty much in the direction suggested by Ralf Kleberhoff:
from collections import Counter

def beautify(seq):
    # only mod4 is interesting, count 1s, 2s, and 3s
    c = Counter(x % 4 for x in seq)
    c1, c2, c3 = c.get(1, 0), c.get(2, 0), c.get(3, 0)
    steps22, twos = divmod(c2, 2)  # you have either 0 or 1 2s left
    steps13, ones_or_threes = min(c1, c3), abs(c1 - c3)
    if not twos and not ones_or_threes % 4:
        # 3 steps for every quadruple of 1s or 3s
        return steps22 + steps13 + 3 * (ones_or_threes // 4)
    if twos and ones_or_threes % 4 == 2:
        # 3 steps per full quadruple, plus 2 extra steps to join the
        # remaining 2 1s or 3s with the remaining 2
        return steps22 + steps13 + 3 * (ones_or_threes // 4) + 2
    return -1

I'm not entirely sure what your issue is, but perhaps you could change your approach to the problem. Your logic seems fine, but it seems that you're trying to do everything in one go; this problem would be much easier if you broke it down into pieces. It looks like it would fit a divide and conquer / recursive approach quite nicely. I also took the liberty of solving this problem myself, as it seems like a fun question to attempt.
Suggestions below
The first thing you could do is write a function that finds two numbers that have a sum divisible by k, and returns them:
def two_sum(numbers, k):
    n = len(numbers)
    for i in range(0, n):
        for j in range(i+1, n):
            if (numbers[i] + numbers[j]) % k == 0:
                return numbers[i], numbers[j]
    return None
Note that the above function is O(n^2); this could be made more efficient.
Secondly, you could write a recursive function that uses the above function, and has a base case where it stops recursing when all the numbers in the list are divisible by k, therefore the list has become "beautiful". Here is one way of doing this:
def rec_helper(numbers, k, count):
    if all(x % k == 0 for x in numbers):
        return count
    # probably safer to check if two_sum() is not None here
    first, second = two_sum(numbers, k)
    numbers.remove(first)
    numbers.remove(second)
    numbers.append(first + second)
    return rec_helper(numbers, k, count + 1)
Procedure of above code
Base case: if all the items in the list are currently divisible by k, return the current accumulated count.
Otherwise, obtain a pair of integers whose sum is divisible by k from two_sum()
remove() these two numbers from the list, and append() their sum to the end of the list.
Finally, call rec_helper() again with the modified list and with count incremented by one, that is, count + 1. count here is the number of steps taken so far.
Lastly, you can now write a main calling function:
def beautiful_array(numbers, k):
    if sum(numbers) % k != 0:
        return -1
    return rec_helper(numbers, k, 0)
Which first checks that the sum() of the numbers in the list is divisible by k, before proceeding to calling rec_helper(). If it doesn't pass this test, the function simply returns -1, and the list cannot be made "beautiful".
Behavior of above code
>>> beautiful_array([1, 2, 3, 1, 2, 3, 8], 4)
3
>>> beautiful_array([1, 3, 2, 2, 4, 8], 4)
2
>>> beautiful_array([1, 5, 2, 2, 4, 8], 4)
-1
Note: The above code examples are just suggestions; you can follow or use them however you want to. They also don't handle the input(), since I believe the main issue in your code is the approach. I didn't want to create a whole new solution that handles your input as well. Please comment below if there is something wrong with the above code, or if you don't understand anything.

Find the index of a given combination (of natural numbers) among those returned by `itertools` Python module

Given a combination of k of the first n natural numbers, for some reason I need to find the position of such a combination among those returned by itertools.combinations(range(1,n),k) (the reason is that this way I can use a list instead of a dict to access values associated to each combination, knowing the combination).
Since itertools yields its combinations in a regular pattern it is possible to do it (and I also found a neat algorithm), but I'm looking for an even faster/natural way which I might ignore.
By the way here is my solution:
def find_idx(comb,n):
    k=len(comb)
    idx=0
    last_c=0
    for c in comb:
        #idx+=sum(nck(n-2-x,k-1) for x in range(c-last_c-1)) # a little faster without nck caching
        idx+=nck(n-1,k)-nck(n-c+last_c,k) # more elegant (thanks to Ray), faster with nck caching
        n-=c-last_c
        k-=1
        last_c=c
    return idx
where nck returns the binomial coefficient of n,k.
For example:
comb=list(itertools.combinations(range(1,14),6))[654] #pick the 654th combination
find_idx(comb,14) # -> 654
And here is an equivalent but maybe less involved version (actually I derived the previous one from the following one). I considered the integers of the combination c as positions of 1s in a binary digit, I built a binary tree on parsing 0/1, and I found a regular pattern of index increments during parsing:
def find_idx(comb,n):
    k=len(comb)
    b=bin(sum(1<<(x-1) for x in comb))[2:]
    idx=0
    for s in b[::-1]:
        if s=='0':
            idx+=nck(n-2,k-1)
        else:
            k-=1
        n-=1
    return idx
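For readers who want to try the first version without writing nck themselves, here is a self-contained sketch. It assumes Python 3.8 or later, where math.comb is available and returns 0 when k exceeds n, and it uses math.comb in place of the question's nck helper; the loop at the end checks every index against itertools for one small case.

```python
import itertools
from math import comb  # Python 3.8+; stands in for the nck helper


def find_idx(combination, n):
    # Index of `combination` within itertools.combinations(range(1, n), k).
    k = len(combination)
    idx = 0
    last_c = 0
    for c in combination:
        # Skip all combinations whose next element is smaller than c.
        idx += comb(n - 1, k) - comb(n - c + last_c, k)
        n -= c - last_c
        k -= 1
        last_c = c
    return idx


n, k = 8, 3
for expected, combination in enumerate(itertools.combinations(range(1, n), k)):
    assert find_idx(combination, n) == expected
```

The check relies on itertools.combinations emitting tuples in lexicographic order when its input is sorted, which the itertools documentation guarantees.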

Your solution seems quite fast. In find_idx, you have two for loops; the inner loop can be optimized using the formula:
C(n, k) + C(n-1, k) + ... + C(n-r, k) = C(n+1, k+1) - C(n-r, k+1)
so, you can replace sum(nck(n-2-x,k-1) for x in range(c-last_c-1)) with nck(n-1, k) - nck(n-c+last_c, k).
I don't know how you implement your nck(n, k) function, but it should be O(k) measured in time complexity. Here I provide my implementation:
from operator import mul
from functools import reduce # In python 3

def nck_safe(n, k):
    if k < 0 or n < k: return 0
    return reduce(mul, range(n, n-k, -1), 1) // reduce(mul, range(1, k+1), 1)
Finally, your solution becomes O(k^2) without recursion. It's quite fast since k wouldn't be too large.
Update
I've noticed that nck's parameters are (n, k). Both n and k won't be too large. We may speed up the program by caching.
def nck(n, k, _cache={}):
    if (n, k) in _cache: return _cache[n, k]
    ....
    # before returning the result
    _cache[n, k] = result
    return result
In python3 this can be done by using the functools.lru_cache decorator:
@functools.lru_cache(maxsize=500)
def nck(n, k):
    ...
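A complete cached version might look like the sketch below. The body is an assumption, since the answer elides it; it multiplies and divides incrementally so that every intermediate value stays an exact integer, and it uses maxsize=None (an unbounded cache) rather than the 500 shown above.

```python
import functools


@functools.lru_cache(maxsize=None)
def nck(n, k):
    # Binomial coefficient C(n, k); 0 outside the valid range.
    if k < 0 or n < k:
        return 0
    result = 1
    for i in range(1, k + 1):
        # After step i, result == C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) // i
    return result


print(nck(13, 6))  # 1716
```

Repeated calls with the same (n, k) are then answered from the cache, which is what makes the subtraction form of find_idx cheap.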

I dug up some old (although it's been converted to Python 3 syntax) code that includes the function combination_index which does what you request:
def fact(n, _f=[1, 1, 2, 6, 24, 120, 720]):
    """Return n!
    The “hidden” list _f acts as a cache"""
    try:
        return _f[n]
    except IndexError:
        while len(_f) <= n:
            _f.append(_f[-1] * len(_f))
        return _f[n]

def indexed_combination(n: int, k: int, index: int) -> tuple:
    """Select the 'index'th combination of k over n
    Result is a tuple (i | i∈{0…n-1}) of length k
    Note that if index ≥ binomial_coefficient(n,k)
    then the result is almost always invalid"""
    result= []
    for item, n in enumerate(range(n, -1, -1)):
        pivot= fact(n-1)//fact(k-1)//fact(n-k)
        if index < pivot:
            result.append(item)
            k-= 1
            if k <= 0: break
        else:
            index-= pivot
    return tuple(result)

def combination_index(combination: tuple, n: int) -> int:
    """Return the index of combination (length == k)
    The combination argument should be a sorted sequence (i | i∈{0…n-1})"""
    k= len(combination)
    index= 0
    item_in_check= 0
    n-= 1 # to simplify subsequent calculations
    for offset, item in enumerate(combination, 1):
        while item_in_check < item:
            index+= fact(n-item_in_check)//fact(k-offset)//fact(n+offset-item_in_check-k)
            item_in_check+= 1
        item_in_check+= 1
    return index

def test():
    for n in range(1, 11):
        for k in range(1, n+1):
            max_index= fact(n)//fact(k)//fact(n-k)
            for i in range(max_index):
                comb= indexed_combination(n, k, i)
                i2= combination_index(comb, n)
                if i2 != i:
                    raise RuntimeError("mismatching n:%d k:%d i:%d≠%d" % (n, k, i, i2))
indexed_combination does the inverse operation.
PS I remember that I sometime attempted removing all those fact calls (by substituting appropriate incremental multiplications and divisions) but the code became much more complicated and wasn't actually faster. A speedup was achievable if I substituted a pre-calculated list of factorials for the fact function, but again the speed difference was negligible for my use cases, so I kept this version.

It looks like you need to specify your task better, or I am just getting it wrong. It seems to me that while you are iterating through itertools.combinations you can save the indexes you need in an appropriate data structure. If you need all of them, then I would go with a dict (one dict for all your needs):
import itertools

combinationToIdx = {}
for (idx, comb) in enumerate(itertools.combinations(range(1,14),6)):
    combinationToIdx[comb] = idx

def findIdx(comb):
    return combinationToIdx[comb]
