Recursive Decompression of Strings - python

I'm trying to decompress strings using recursion. For example, the input:
3[b3[a]]
should output:
baaabaaabaaa
but I get:
baaaabaaaabaaaabbaaaabaaaabaaaaa
I have the following code but it is clearly off. The first find_end function works as intended. I am absolutely new to using recursion and any help understanding / tracking where the extra letters come from or any general tips to help me understand this really cool methodology would be greatly appreciated.
def find_end(original, start, level):
if original[start] != "[":
message = "ERROR in find_error, must start with [:", original[start:]
raise ValueError(message)
indent = level * " "
index = start + 1
count = 1
while count != 0 and index < len(original):
if original[index] == "[":
count += 1
elif original[index] == "]":
count -= 1
index += 1
if count != 0:
message = "ERROR in find_error, mismatched brackets:", original[start:]
raise ValueError(message)
return index - 1
def decompress(original, level):
# set the result to an empty string
result = ""
# for any character in the string we have not looked at yet
for i in range(len(original)):
# if the character at the current index is a digit
if original[i].isnumeric():
# the character of the current index is the number of repetitions needed
repititions = int(original[i])
# start = the next index containing the '[' character
x = 0
while x < (len(original)):
if original[x].isnumeric():
start = x + 1
x = len(original)
else:
x += 1
# last = the index of the matching ']'
last = find_end(original, start, level)
# calculate a substring using `original[start + 1:last]
sub_original = original[start + 1 : last]
# RECURSIVELY call decompress with the substring
# sub = decompress(original, level + 1)
# concatenate the result of the recursive call times the number of repetitions needed to the result
result += decompress(sub_original, level + 1) * repititions
# set the current index to the index of the matching ']'
i = last
# else
else:
# concatenate the letter at the current index to the result
if original[i] != "[" and original[i] != "]":
result += original[i]
# return the result
return result
def main():
passed = True
ORIGINAL = 0
EXPECTED = 1
# The test cases
provided = [
("3[b]", "bbb"),
("3[b3[a]]", "baaabaaabaaa"),
("3[b2[ca]]", "bcacabcacabcaca"),
("5[a3[b]1[ab]]", "abbbababbbababbbababbbababbbab"),
]
# Run the provided tests cases
for t in provided:
actual = decompress(t[ORIGINAL], 0)
if actual != t[EXPECTED]:
print("Error decompressing:", t[ORIGINAL])
print(" Expected:", t[EXPECTED])
print(" Actual: ", actual)
print()
passed = False
# print that all the tests passed
if passed:
print("All tests passed")
if __name__ == '__main__':
main()

From what I gathered from your code, it probably gives the wrong result because of the approach you've taken to find the last matching closing brace at a given level (I'm not 100% sure, the code was a lot). However, I can suggest a cleaner approach using stacks (almost similar to DFS, without the complications):
def decomp(s):
stack = []
for i in s:
if i.isalnum():
stack.append(i)
elif i == "]":
temp = stack.pop()
count = stack.pop()
if count.isnumeric():
stack.append(int(count)*temp)
else:
stack.append(count+temp)
for i in range(len(stack)-2, -1, -1):
if stack[i].isnumeric():
stack[i] = int(stack[i])*stack[i+1]
else:
stack[i] += stack[i+1]
return stack[0]
print(decomp("3[b]")) # bbb
print(decomp("3[b3[a]]")) # baaabaaabaaa
print(decomp("3[b2[ca]]")) # bcacabcacabcaca
print(decomp("5[a3[b]1[ab]]")) # abbbababbbababbbababbbababbbab
This works on a simple observation: rather tha evaluating a substring after on reading a [, evaluate the substring after encountering a ]. That would allow you to build the result AFTER the pieces have been evaluated individually as well. (This is similar to the prefix/postfix evaluation using programming).
(You can add error checking to this as well, if you wish. It would be easier to check if the string is semantically correct in one pass and evaluate it in another pass, rather than doing both in one go)

Here is the solution with the similar idea from above:
we go through string putting everything on stack until we find ']', then we go back until '[' taking everything off, find the number, multiply and put it back on stack
It's much less consuming as we don't add strings, but work with lists
Note: multiply number can't be more than 9 as we parse it as one element string
def decompress(string):
stack = []
letters = []
for i in string:
if i != ']':
stack.append(i)
elif i == ']':
letter = stack.pop()
while letter != '[':
letters.append(letter)
letter = stack.pop()
word = ''.join(letters[::-1])
letters = []
stack.append(''.join([word for j in range(int(stack.pop()))]))
return ''.join(stack)

Related

Ignoring Changed Index Check (Python)

I have made a script:
our_word = "Success"
def duplicate_encode(word):
char_list = []
final_str = ""
changed_index = []
base_wrd = word.lower()
for k in base_wrd:
char_list.append(k)
for i in range(0, len(char_list)):
count = 0
for j in range(i + 1, len(char_list)):
if j not in changed_index:
if char_list[j] == char_list[i]:
char_list[j] = ")"
changed_index.append(j)
count += 1
else:
continue
if count > 0:
char_list[i] = ")"
else:
char_list[i] = "("
print(changed_index)
print(char_list)
final_str = "".join(char_list)
return final_str
print(duplicate_encode(our_word))
essentialy the purpose of this script is to convert a string to a new string where each character in the new string is "(", if that character appears only once in the original string, or ")", if that character appears more than once in the original string. I have made a rather layered up script (I am relatively new to the python language so didn't want to use any helpful in-built functions) that attempts to do this. My issue is that where I check if the current index has been previously edited (in order to prevent it from changing), it seems to ignore it. So instead of the intended )())()) I get )()((((. I'd really appreciate an insightful answer to why I am getting this issue and ways to work around this, since I'm trying to gather an intuitive knowledge surrounding python. Thanks!
word = "Success"
print(''.join([')' if word.lower().count(c) > 1 else '(' for c in word.lower()]))
The issue here has nothing to do with your understanding of Python. It's purely algorithmic. If you retain this 'layered' algorithm, it is essential that you add one more check in the "i" loop.
our_word = "Success"
def duplicate_encode(word):
char_list = list(word.lower())
changed_index = []
for i in range(len(word)):
count = 0
for j in range(i + 1, len(word)):
if j not in changed_index:
if char_list[j] == char_list[i]:
char_list[j] = ")"
changed_index.append(j)
count += 1
if i not in changed_index: # the new inportant check to avoid reversal of already assigned ')' to '('
char_list[i] = ")" if count > 0 else "("
return "".join(char_list)
print(duplicate_encode(our_word))
Your algorithm can be greatly simplified if you avoid using char_list as both the input and output. Instead, you can create an output list of the same length filled with ( by default, and then only change an element when a duplicate is found. The loops will simply walk along the entire input list once for each character looking for any matches (other than self-matches). If one is found, the output list can be updated and the inner loop will break and move on to the next character.
The final code should look like this:
def duplicate_encode(word):
char_list = list(word.lower())
output = list('(' * len(word))
for i in range(len(char_list)):
for j in range(len(char_list)):
if i != j and char_list[i] == char_list[j]:
output[i] = ')'
break
return ''.join(output)
for our_word in (
'Success',
'ChJsTk(u cIUzI htBp#qX)OTIHpVtHHhQ',
):
result = duplicate_encode(our_word)
print(our_word)
print(result)
Output:
Success
)())())
ChJsTk(u cIUzI htBp#qX)OTIHpVtHHhQ
))(()(()))))())))()()((())))()))))

Write a recursive function matching_bracket(string, idx) to find the index of the close bracket matching the open bracket at string[idx]

While there are many questions on stackoverflow to check if the string is balanced, what I need is to find the index of the closing bracket of string[idx]. For example:
>>> matching_bracket('([])', 0)
3
>>> matching_bracket('([])', 1)
2
There are 3 conditions that will return -1:
the closing bracket is not of the same type
the nested brackets are not matched [IMPORTANT]
there are no more brackets available
Here is what I have so far:
def matching_bracket(string, idx):
open_tup = ("(", "{", "<", "[")
close_tup = (")", "}", ">", "]")
chosen = string[idx]
b_index = open_tup.index(chosen)
n = len(string) - 1
if string[idx + 1] in open_tup: # Case 1: Check if nested brackets match
return matching_bracket(string, idx + 1)
elif string[n] != close_tup[b_index]: # Case 2: Closing bracket not the same
return matching_bracket(string[0 : n], idx)
elif len(string) == 1: # Case 3: No more available brackets
return -1
else:
return n
While I am running a recursive function to check if the nested brackets are closed as well, I am having difficulty getting the correct output as I end up returning the index of the closing bracket that is nested instead. See below:
>>> matching_bracket('([])', 0)
2
How should I modify my code?
In above code, in first if condition your are checking whether the next bracket is of type open. if it is you are calling matching_bracket with next bracket index. and losing the actual open bracket index for which you want close bracket index.
Checkout following solution using :
def matching_bracket(string, idx):
open_tup = ("(", "{", "<", "[")
close_tup = (")", "}", ">", "]")
dict_brackets = {"{": "}", "(": ")", "<": ">", "[": "]"}
stack = []
if string[idx] in close_tup or idx >= len(string):
return -1
stack.append(string[idx])
for t in range(idx + 1, len(string)):
if string[t] in open_tup:
stack.append(string[t])
else:
if string[t] != dict_brackets.get(stack.pop()):
return -1
elif len(stack) == 0:
return t
return -1
It's a little convoluted, but should do:
def matching_bracket(string, idx):
bracket_dict = {'[':']', '(':')', '{':'}', '<':'>'}
# Actual recursive function
def inner_func(ix_open, ix_close):
if string[ix_close] == bracket_dict[string[ix_open]]:
return ix_open, ix_close
else:
if ix_close + 1 == len(string) - 1:
return ix_open, -1
else:
return inner_func(ix_open+1, ix_close+1)
if idx == len(string) - 1:
return -1
elif string[idx + 1] == bracket_dict[string[idx]]:
return idx + 1
elif idx == len(string) - 2:
return -1
else:
ix_open, ix_close = idx+2, idx+1
while ix_open != idx and ix_close != -1:
ix_open, ix_close = ix_open - 1, ix_close + 1
ix_open, ix_close = inner_func(ix_open, ix_close)
return ix_close
PS: Wrote down the solution way back, forgot to post :p
If your goal is to match open delimiters and close them with corresponding delimiters, take a look at this library I made, perhaps the algorithm can help you, though it is in java.
Here's how it works-
First the class needs to know which opening delimiter matches which closing delimiter - you can use a dictionary for this in python
delim_dict = {}
delim_dict['('] = ')'
.....
Now if you're only interested in checking whether the closing and opening delimiters don't match - take a look at this function.
Simply put, you have to count the number of each closing delimiter and open delimiter, reverse iterating the string from backwards. Whenever you see the, if the counts don't match, you know the delimiters are also not matched
Now if you want to find the index of your desired delimiter - take a look at this function
It's designed to find the mathematical function in an expression, given its closing delimiter, but you can modify it to match your usecase. Since you want to find a closing delimiter, given opening delimiter, you should be iterating the expression in normal order, instead of reverse
# opening_delim is given as parameter
closing_delim = get_corresponding_delimiter(opening_delim)
closing_delim_count, opening_delim_count = 0, 0
i = 0
for item in expression:
if expression[i] == opening_delim:
opening_delim_count += 1
elif expression[i] == closing_delim:
closing_delim_count+= 1
if opening_delim_count == closing_delim_count:
return i
i += 1
Of course, this code is only for the first index's delimiter and it also assumes the delimiters are matched correctly

How to rearrange a string's characters such that none of it's adjacent characters are the same, using Python

In my attempt to solve the above question, I've written the following code:
Logic: Create a frequency dict for each character in the string (key= character, value= frequency of the character). If any character's frequency is greater than ceil(n/2), there is no solution. Else, print the most frequent character followed by reducing its frequency in the dict/
import math, operator
def rearrangeString(s):
# Fill this in.
n = len(s)
freqDict = {}
for i in s:
if i not in freqDict.keys():
freqDict[i] = 1
else:
freqDict[i] += 1
for j in list(freqDict.values()):
if j > math.ceil(n / 2):
return None
return maxArrange(freqDict)[:-4]
temp = ""
def maxArrange(inp):
global temp
n = len(inp)
if list(inp.values()) != [0] * n:
resCh = max(inp.items(), key=operator.itemgetter(1))[0]
if resCh is not None and resCh != temp:
inp[resCh] -= 1
# Terminates with None
temp = resCh
return resCh + str(maxArrange(inp))
# Driver
print(rearrangeString("abbccc"))
# cbcabc
print(rearrangeString("abbcccc"))
In the first try, with input abbccc, it gives the right answer, i.e. cbcabc, but fails for the input abbcccc, returning ccbcabc, without handling it using the temp variable, else returning cbcabc and skipping c altogether when handled using temp
How should I modify the logic, or is there a better approach?

Find tokens that are connected

I wrote code that gets text-tokens as input:
tokens = ["Tap-", "Berlin", "Was-ISt", "das", "-ist", "cool", "oh", "Man", "-Hum", "-Zuh-UH-", "glit"]
The code should find all tokens that contain hyphens or are connected to each other with hyphens: Basically the output should be:
[["Tap-", "Berlin"], ["Was-ISt"], ["das", "-ist"], ["Man", "-Hum", "-Zuh-UH-", "glit"]]
I wrote a code, but somehow Im not getting the with hypens connected Tokens back: To try it out: http://goo.gl/iqov0q
def find_hyphens(self):
tokens_with_hypens =[]
for i in range(len(self.tokens)):
hyp_leng = 0
while self.hypen_between_two_tokens(i + hyp_leng):
hyp_leng += 1
if self.has_hypen_in_middle(i) or hyp_leng > 0:
if hyp_leng == 0:
tokens_with_hypens.append(self.tokens[i:i + 1])
else:
tokens_with_hypens.append(self.tokens[i:i + hyp_leng])
i += hyp_leng - 1
return tokens_with_hypens
What do I wrong? Is there a more performant solution? Thanks
I found 3 mistakes in your code:
1) You are comparing the last 2 characters of tok1 here, rather than the last of tok1 and the first of tok2:
if "-" in joined[len(tok1) - 2: len(tok1)]:
# instead, do this:
if "-" in joined[len(tok1) - 1: len(tok1) + 1]:
2) You are omitting the last matching token here. Increase the end-index of your slice here by 1:
tokens_with_hypens.append(self.tokens[i:i + hyp_leng])
# instead, do this:
tokens_with_hypens.append(self.tokens[i:i + 1 + hyp_leng])
3) You cannot manipulate the index of a for i in range loop in python. the next iteration will just retrieve the next index, overwriting your change. Instead, you could use a while-loop like this:
i = 0
while i < len(self.tokens):
[...]
i += 1
These 3 corrections lead to your test passing: http://goo.gl/fd07oL
Nonetheless I couldn't resist to write an algorithm from scratch, solving your problem as simple as possible:
def get_hyphen_groups(tokens):
i_start, i_end = 0, 1
while i_start < len(tokens):
while (i_end < len(tokens) and
(tokens[i_end].startswith("-") ^ tokens[i_end - 1].endswith("-"))):
i_end += 1
yield tokens[i_start:i_end]
i_start, i_end = i_end, i_end + 1
tokens = ["Tap-", "Berlin", "Was-ISt", "das", "-ist", "cool", "oh", "Man", "-Hum", "-Zuh-UH-", "glit"]
for group in get_hyphen_groups(tokens):
print ("".join(group))
To exclude 1-element-groups, like in your expected result, wrap the yield into this if:
if i_end - i_start > 1:
yield tokens[i_start:i_end]
To include 1-element-groups that already include a hyphen, change that if to this for example:
if i_end - i_start > 1 or "-" in tokens[i_start]:
yield tokens[i_start:i_end]
One thing that is wrong with your approach is trying to change the value of i in the for i in range(len(self.tokens)) loop. It won't work because the value of i will get the next value from range in each iteration, ignoring your changes.
I changed your algorithm to use an iterative algorithm that pops one item at the time from the list and decides what to do with it. It uses a buffer where it stored items belonging to one chain until it's complete.
The full code is:
class Hyper:
def __init__(self, tokens):
self.tokens = tokens
def find_hyphens(self):
tokens_with_hypens =[]
copy = list(self.tokens)
buffer = []
while len(copy) > 0:
item = copy.pop(0)
if self.has_hyphen_in_middle(item) and item[0] != '-' and item[-1] != '-':
# words with hyphens that are not part of a bigger chain
tokens_with_hypens.append([item])
elif item[-1] == '-' or (len(copy) > 0 and copy[0][0] == '-'):
# part of a chain - append to the buffer
buffer.append(item)
elif len(buffer) > 0:
# the last word in a chain - the buffer contains the complete chain
buffer.append(item)
tokens_with_hypens.append(buffer)
buffer = []
return tokens_with_hypens
#staticmethod
def has_hyphen_in_middle(input):
return len(input) > 2 and "-" in input[1:-2]
tokens = ["Tap-", "Berlin", "Was-ISt", "das", "-ist", "cool", "oh", "Man", "-Hum", "-Zuh-UH-", "glit"]
hyper = Hyper(tokens)
result = hyper.find_hyphens()
print(result)
print(result == [["Tap-", "Berlin"], ["Was-ISt"], ["das", "-ist"], ["Man", "-Hum", "-Zuh-UH-", "glit"]])

How do you reverse the words in a string using python (manually)? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Reverse the ordering of words in a string
I know there are methods that python already provides for this, but I'm trying to understand the basics of how those methods work when you only have the list data structure to work with. If I have a string hello world and I want to make a new string world hello, how would I think about this?
And then, if I can do it with a new list, how would I avoid making a new list and do it in place?
Split the string, make a reverse iterator then join the parts back.
' '.join(reversed(my_string.split()))
If you are concerned with multiple spaces, change split() to split(' ')
As requested, I'm posting an implementation of split (by GvR himself from the oldest downloadable version of CPython's source code: Link)
def split(s,whitespace=' \n\t'):
res = []
i, n = 0, len(s)
while i < n:
while i < n and s[i] in whitespace:
i = i+1
if i == n:
break
j = i
while j < n and s[j] not in whitespace:
j = j+1
res.append(s[i:j])
i = j
return res
I think now there are more pythonic ways of doing that (maybe groupby) and the original source had a bug (if i = n:, corrrected to ==)
Original Answer
from array import array
def reverse_array(letters, first=0, last=None):
"reverses the letters in an array in-place"
if last is None:
last = len(letters)
last -= 1
while first < last:
letters[first], letters[last] = letters[last], letters[first]
first += 1
last -= 1
def reverse_words(string):
"reverses the words in a string using an array"
words = array('c', string)
reverse_array(words, first=0, last=len(words))
first = last = 0
while first < len(words) and last < len(words):
if words[last] != ' ':
last += 1
continue
reverse_array(words, first, last)
last += 1
first = last
if first < last:
reverse_array(words, first, last=len(words))
return words.tostring()
Answer using list to match updated question
def reverse_list(letters, first=0, last=None):
"reverses the elements of a list in-place"
if last is None:
last = len(letters)
last -= 1
while first < last:
letters[first], letters[last] = letters[last], letters[first]
first += 1
last -= 1
def reverse_words(string):
"""reverses the words in a string using a list, with each character
as a list element"""
characters = list(string)
reverse_list(characters)
first = last = 0
while first < len(characters) and last < len(characters):
if characters[last] != ' ':
last += 1
continue
reverse_list(characters, first, last)
last += 1
first = last
if first < last:
reverse_list(characters, first, last=len(characters))
return ''.join(characters)
Besides renaming, the only change of interest is the last line.
You have a string:
str = "A long string to test this algorithm"
Split the string (at word boundary -- no arguments to split):
splitted = str.split()
Reverse the array obtained -- either using ranges or a function
reversed = splitted[::-1]
Concatenate all words with spaces in between -- also known as joining.
result = " ".join(reversed)
Now, you don't need so many temps, combining them into one line gives:
result = " ".join(str.split()[::-1])
str = "hello world"
" ".join(str.split()[::-1])

Categories