Python Recursive enumerate(), with Start Value Incrementation

I am attempting to create a Python function which parses through a bracket representation of a binary tree and outputs a line-by-line bipartite graph representation of it, where the partitions are separated by a "|", thus:
Binary tree bracket representation:
(((A, B), C), D)
Bipartite graph relationship output:
A B | C D
A B C | D
I approached it using recursion, maintaining each bipartite relationship line in a list, taking the original bracket notation string and the starting index of parsing as input.
def makeBipartRecursive(treeStr, index):
    bipartStr = ""
    bipartList = []
    for ind, char in enumerate(treeStr, start=index):
        if char == '(':
            # Make recursive call to makeBipartRecursive
            indx = ind
            bipartList.append(makeBipartRecursive(treeStr, indx+1))
        elif char == ')':
            group1 = treeStr[0:index-1] + treeStr[ind+1::]
            group2 = treeStr[index:ind]
            return group1 + " | " + group2
        elif char == ',':
            bipartStr += " "
        else:
            # Begin construction of string
            bipartStr += char
Each time an open-parenthesis is encountered, a recursive call is made, beginning enumeration at the index immediately following the open-parenthesis, so as to prevent infinite recursion (Or so I think). If possible, try to ignore the fact that I'm not actually returning a list. The main issue is that I encounter infinite recursion, where the enumeration never progresses beyond the first character in my string. Should my recursive call with the incremented start position for the enumeration not fix this?
Thanks in advance.

You are misinterpreting the start parameter of enumerate. It does not mean "start the enumeration at this position"; it means "start counting from this number". See help(enumerate):
| The enumerate object yields pairs containing a count (from start, which
| defaults to zero) and a value yielded by the iterable argument.
So basically each time you perform a recursive call you start again from the beginning of your string.
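A minimal sketch of the difference, using a made-up sample string `tree`. Using `itertools.islice` to skip the leading characters is one possible fix and an assumption here, not something the original answer proposes:

```python
from itertools import islice

tree = "((A,B),C)"  # hypothetical sample input

# start=2 only changes the first counter value; iteration still begins at tree[0]
relabelled = list(enumerate(tree, start=2))[:3]

# islice skips the first two characters, and start=2 keeps the counter
# aligned with the positions in the original string
skipped = list(enumerate(islice(tree, 2, None), start=2))[:3]

print(relabelled)  # [(2, '('), (3, '('), (4, 'A')]
print(skipped)     # [(2, 'A'), (3, ','), (4, 'B')]
```

In the question's function, the first form is why every recursive call sees the opening parenthesis at the start of the string again.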

Related

Parsing a block of mathematical expressions and separate the terms

In a text file, I have a block of text between two keywords (let's call them "keyword1" and "keyword2"). The block is one big mathematical expression: a sum of smaller expressions of varying complexity. Names of the form x"random_number" refer to numbered variables.
For example, it could look like this:
keyword1 x47*ln(x46+2*x38) + (x35*x24 + exp(x87 + x56))^2 - x34 + ...
+ .....
+ .....
keyword2
All I want to do is split this big expression into the terms it is composed of and store these "atomic" terms in a list, so that every term of the sum appears as one element (if a term is negative, it should be stored as - term).
With the example above, this should return:
L = [x47*ln(x46+2*x38), (x35*x24 + exp(x87 + x56))^2, - x34, ...]
I would try a regex that matches the + or - symbols separating the terms, but I think this is wrong because it will also match the +/- symbols that appear inside the smaller expressions, which I don't want to split.
So I'm a bit stuck on this.
Thank you in advance for helping me solve my problem.
I think for extracting the part between the keywords, a regex will work just fine. With the help of an online regex creator you should be able to create that. Then you have the string left with the mathematical formula in it.
Essentially what you want is to split the string at all places where the bracket 'depth' is 0. For example, if you have x1*(x2+x3)+x4 the + between the brackets should be ignored.
I wrote the following function, which scans the string and keeps track of the current bracket depth. If the depth is 0 and a + or - is encountered, the index is stored. In the end, we can split the string at these indices to obtain the split you require. I first wrote a recursive variant, but the iterative variant works just as well and is probably easier to understand.
Recursive function
def find_split_indexes(block, index=0, depth=0, indexes=None):
    # use a fresh list per top-level call (a mutable default would be shared between calls)
    if indexes is None:
        indexes = []
    # return when the string has been searched entirely
    if index >= len(block):
        return indexes
    # change the depth when a bracket is encountered
    if block[index] == '(':
        depth += 1
    elif block[index] == ')':
        depth -= 1
    # if a + or minus is encountered at depth 0, store the index
    if depth == 0 and (block[index] == '+' or block[index] == '-'):
        indexes.append(index)
    # finally return the list of indexes
    return find_split_indexes(block, index+1, depth, indexes)
Iterative function
Of course an iterative (using a loop) version of this function can also be created, and is likely a bit simpler to understand
def find_split_indexes_iterative(block):
    indexes = []
    depth = 0
    # iterate over the string
    for index in range(len(block)):
        if block[index] == '(':
            depth += 1
        elif block[index] == ')':
            depth -= 1
        elif depth == 0 and (block[index] == '+' or block[index] == '-'):
            indexes.append(index)
    return indexes
Using the indices
To then use these indices, you can, for instance, split the string as explained in this other question to obtain the parts you want. The only thing left to do is remove the leading and trailing spaces.
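As a rough sketch of that last step, the function below repeats the depth scan and then cuts the string at the stored indices. The name `split_terms` and the choice to drop a leading + while keeping a leading - are assumptions made for illustration. It also treats every + or - at depth 0 as a term separator, so a unary minus such as x1*-x2 would be split incorrectly.

```python
def split_terms(block):
    """Split a sum into its top-level terms (sketch, see assumptions above)."""
    indexes = []
    depth = 0
    for index, char in enumerate(block):
        if char == '(':
            depth += 1
        elif char == ')':
            depth -= 1
        elif depth == 0 and char in '+-':
            indexes.append(index)

    # consecutive boundaries delimit one term each
    bounds = [0] + indexes + [len(block)]
    terms = []
    for begin, end in zip(bounds, bounds[1:]):
        term = block[begin:end].strip()
        if term.startswith('+'):
            term = term[1:].strip()  # drop the sign of positive terms
        if term:
            terms.append(term)
    return terms


expression = "x47*ln(x46+2*x38) + (x35*x24 + exp(x87 + x56))^2 - x34"
print(split_terms(expression))
# ['x47*ln(x46+2*x38)', '(x35*x24 + exp(x87 + x56))^2', '- x34']
```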

Python strings: quickly summarize the character count in order of appearance

Let's say I have the following strings in Python3.x
string1 = 'AAAAABBBBCCCDD'
string2 = 'CCBADDDDDBACDC'
string3 = 'DABCBEDCCAEDBB'
I would like to create a summary "frequency string" that counts the number of characters in the string in the following format:
string1_freq = '5A4B3C2D' ## 5 A's, followed by 4 B's, 3 C's, and 2D's
string2_freq = '2C1B1A5D1B1A1C1D1C'
string3_freq = '1D1A1B1C1B1E1D2C1A1E1D2B'
My problem:
How would I quickly create such a summary string?
My idea would be: create an empty list to keep track of the count. Then create a for loop which checks the next character. If there's a match, increase the count by +1 and move to the next character. Otherwise, append to end of the string 'count' + 'character identity'.
That's very inefficient in Python. Is there a quicker way (maybe using the functions below)?
There are several ways to count the elements of a string in python. I like collections.Counter, e.g.
from collections import Counter
counter_str1 = Counter(string1)
print(counter_str1['A']) # 5
print(counter_str1['B']) # 4
print(counter_str1['C']) # 3
print(counter_str1['D']) # 2
There's also str.count(sub[, start[, end]]):
Return the number of non-overlapping occurrences of substring sub in
the range [start, end]. Optional arguments start and end are
interpreted as in slice notation.
As an example:
print(string1.count('A')) ## 5
The following code accomplishes the task without importing any modules.
def freq_map(s):
    if not s:  # an empty string has no runs (and s[0] would fail)
        return ''
    num = 0 # number of adjacent, identical characters
    curr = s[0] # current character being processed
    result = '' # result of function
    for i in range(len(s)):
        if s[i] == curr:
            num += 1
        else:
            result += str(num) + curr
            curr = s[i]
            num = 1
    result += str(num) + curr
    return result
Note: Since you asked for performance, I suggest you use this code or a modified version of it.
I ran a rough performance test against the code provided by CoryKramer for reference. This code did the same job in 58% of the time, without importing any modules. The snippet can be found here.
I would use itertools.groupby to group consecutive runs of the same letter. Then use a generator expression within join to create a string representation of the count and letter for each run.
from itertools import groupby
def summarize(s):
    return ''.join(str(sum(1 for _ in i[1])) + i[0] for i in groupby(s))
Examples
>>> summarize(string1)
'5A4B3C2D'
>>> summarize(string2)
'2C1B1A5D1B1A1C1D1C'
>>> summarize(string3)
'1D1A1B1C1B1E1D2C1A1E1D2B'
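To make the intermediate step visible, here is a small sketch of what groupby yields for such a string. The helper name `runs` is hypothetical and not part of the answer above:

```python
from itertools import groupby


def runs(s):
    """Return (character, run_length) pairs for consecutive runs in s."""
    return [(char, len(list(group))) for char, group in groupby(s)]


print(runs('AAAAABBBBCCCDD'))
# [('A', 5), ('B', 4), ('C', 3), ('D', 2)]
```

Joining each pair as count followed by character gives the same summary string as summarize.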

Taking long time to execute Python code for the definition

This is the problem definition:
Given a string s of lowercase letters, determine the index of the
character whose removal will make s a palindrome. If s is already a
palindrome or no such character exists, then print -1. There will always
be a valid solution, and any correct answer is acceptable. For
example, if s = "bcbc", we can either remove 'b' at index 0 or 'c' at index 3.
I tried this code:
# !/bin/python
import sys
def palindromeIndex(s):
    # Complete this function
    length = len(s)
    index = 0
    while index != length:
        string = list(s)
        del string[index]
        if string == list(reversed(string)):
            return index
        index += 1
    return -1
q = int(raw_input().strip())
for a0 in xrange(q):
    s = raw_input().strip()
    result = palindromeIndex(s)
    print(result)
This code works for small inputs, but it takes a very long time for larger ones.
Here is the sample: Link to sample
The sample above is the larger input that has to be handled. At a minimum, the solution must work for the following input:
Input (stdin)
3
aaab
baa
aaa
Expected Output
3
0
-1
How to optimize the solution?
Here is code optimized for this particular task:
def palindrome_index(s):
    # Complete this function
    rev = s[::-1]
    if rev == s:
        return -1
    for i, (a, b) in enumerate(zip(s, rev)):
        if a != b:
            candidate = s[:i] + s[i + 1:]
            if candidate == candidate[::-1]:
                return i
            else:
                return len(s) - i - 1
First we calculate the reverse of the string. If rev equals the original, it was a palindrome to begin with. Then we iterate over the characters from both ends, keeping track of the index as well:
for i, (a, b) in enumerate(zip(s, rev)):
a will hold the current character from the beginning of the string and b from the end. i will hold the index from the beginning of the string. If at any point a != b then it means that either a or b must be removed. Since there is always a solution, and it is always one character, we test if the removal of a results in a palindrome. If it does, we return the index of a, which is i. If it doesn't, then by necessity, the removal of b must result in a palindrome, therefore we return its index, counting from the end.
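The same reasoning can be sketched with two explicit indices instead of a reversed copy. This is an illustrative variant, not the answer's own code: the function name is made up, and unlike the version above it verifies both candidates and returns -1 if neither removal works.

```python
def palindrome_index_two_pointer(s):
    """Index of a character whose removal makes s a palindrome, else -1."""
    i, j = 0, len(s) - 1
    while i < j:
        if s[i] != s[j]:
            # first mismatch: either s[i] or s[j] has to go
            without_i = s[i + 1:j + 1]
            if without_i == without_i[::-1]:
                return i
            without_j = s[i:j]
            if without_j == without_j[::-1]:
                return j
            return -1  # no single removal helps
        i += 1
        j -= 1
    return -1  # already a palindrome


for sample in ("aaab", "baa", "aaa"):
    print(palindrome_index_two_pointer(sample))
# 3
# 0
# -1
```

Only the part of the string between the two mismatching positions needs to be checked, because everything outside it has already been matched.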
There is no need to convert the string to a list, as you can compare strings directly. This removes a computation that is performed many times, which speeds up the process. To reverse a string, all you need to do is use slicing:
>>> s = "abcdef"
>>> s[::-1]
'fedcba'
So using this, you can re-write your function to:
def palindromeIndex(s):
    if s == s[::-1]:
        return -1
    for i in range(len(s)):
        c = s[:i] + s[i+1:]
        if c == c[::-1]:
            return i
    return -1
and the tests from your question:
>>> palindromeIndex("aaab")
3
>>> palindromeIndex("baa")
0
>>> palindromeIndex("aaa")
-1
and for the first one in the link that you gave, the result was:
16722
which computed in about 900ms compared to your original function which took 17000ms but still gave the same result. So it is clear that this function is a drastic improvement. :)

Algorithm to compute edit set for transforming one string into another?

I'd like to compute the edits required to transform one string, A, into another string B using only inserts and deletions, with the minimum number of operations required.
So something like "kitten" -> "sitting" would yield a list of operations something like ("delete at 0", "insert 's' at 0", "delete at 4", "insert 'i' at 3", "insert 'g' at 6")
Is there an algorithm to do this? Note that I don't want the edit distance; I want the actual edits.
I had an assignment similar to this at one point. Try using an A* variant. Construct a graph of possible 'neighbors' for a given word and search outward using A*, with the distance heuristic being the number of letters that need to change in the current word to reach the target. It should be clear why this is a good heuristic: it is always going to underestimate. You can think of a neighbor as a word that can be reached from the current word using only one operation. It should be clear that, with slight modification, this algorithm will solve your problem optimally.
I tried to make something that works, at least for your precise case.
word_before = "kitten"
word_after = "sitting"
# If the strings aren't the same length, we stuff the smallest one with spaces
if len(word_before) > len(word_after):
    word_after += " "*(len(word_before)-len(word_after))
elif len(word_before) < len(word_after):
    word_before += " "*(len(word_after)-len(word_before))
operations = []
for idx, char in enumerate(word_before):
    if char != word_after[idx]:
        if char != " ":
            operations += ["delete at "+str(idx)]
        operations += ["insert '"+word_after[idx]+"' at "+str(idx)]
print(operations)
This should be what you're looking for. It uses itertools.zip_longest to zip the two strings together and iterate over them in pairs. Each pair is compared and, where the characters differ, the corresponding operation is appended to a list. After each operation it checks whether the strings now match, breaking out if they do and continuing if they don't.
from itertools import zip_longest
a = "kitten"
b = "sitting"
def transform(a, b):
    ops = []
    for i, j in zip_longest(a, b, fillvalue=''):
        if i == j:
            pass
        else:
            index = a.index(i)
            print(a, b)
            ops.append('delete {} '.format(i)) if i != '' else ''
            a = a.replace(i, '')
            if a == b:
                break
            ops[-1] += 'insert {} at {},'.format(j, index if i not in b else b.index(j))
    return ops
result = transform(a, b)
print(result, ' {} operation(s) was carried out'.format(len(result)))
Since you only have delete and insert operations, this is an instance of the Longest Common Subsequence problem: https://en.wikipedia.org/wiki/Longest_common_subsequence_problem
Indeed, two strings S and T, S of length n and T of length m, have a common subsequence of length k if and only if you can transform S into T with m+n-2k insert and delete operations. As intuition: the order of the letters is preserved both when adding and deleting letters and when taking a subsequence.
EDIT: since you asked for the list of edits, one possible way is to first remove all the characters of S that are not in the common subsequence, and then insert all the characters of T that are not in the common subsequence.
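A minimal sketch of this idea, under a few assumptions of my own: the names `edit_script` and `apply_ops` are hypothetical, the table is the standard dynamic-programming LCS table, and deletions and insertions are interleaved from left to right rather than done in two passes. Each position refers to the string as it stands when that operation is applied.

```python
def edit_script(s, t):
    """Return a shortest list of ('insert'|'delete', position, char) turning s into t."""
    n, m = len(s), len(t)
    # lcs[i][j] = length of the longest common subsequence of s[i:] and t[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if s[i] == t[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    ops = []
    i = j = 0
    # invariant: the partially edited string equals t[:j] + s[i:]
    while i < n or j < m:
        if i < n and j < m and s[i] == t[j]:
            i += 1  # character belongs to the common subsequence, keep it
            j += 1
        elif j < m and (i == n or lcs[i][j + 1] >= lcs[i + 1][j]):
            ops.append(("insert", j, t[j]))
            j += 1
        else:
            ops.append(("delete", j, s[i]))
            i += 1
    return ops


def apply_ops(s, ops):
    """Apply the operations in order, to check the script."""
    chars = list(s)
    for kind, pos, char in ops:
        if kind == "insert":
            chars.insert(pos, char)
        else:
            del chars[pos]
    return "".join(chars)


ops = edit_script("kitten", "sitting")
print(len(ops))                  # 5, i.e. 6 + 7 - 2*4
print(apply_ops("kitten", ops))  # sitting
```

The table costs O(n*m) time and memory, which is fine for words but may need a more careful algorithm for very long strings.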

Displaying multiple substring indices within a string by using recursion

def subStringMatchExact(target, key):
    if (target.find(key) == -1):
        return []
    else:
        foundStringAt = [target.find(key)]
        target = target[foundStringAt[0] + len(key):]
        return foundStringAt + subStringMatchExact(target, key)
string = subStringMatchExact("your code works with wrongly correlated coefficients which incorporates more costs", "co")
print(string)
Current incorrect output:
[5, 22, 9, 19, 14]
I am having trouble summing the length of the substring on the previous recursion step. Like the second element of the list should be 29 instead of 22 as in len(previousSubstring) + len(key) - 1 + len(currentSubstring).
Any ideas to improve my code and/or fix my error too?
The fast way
You don't have to implement your own solution; it's already done! Use the finditer function from the re module:
>>> import re
>>> s = 'your code works with wrongly correlated coefficients which incorporates more costs'
>>> matches = re.finditer('co', s)
>>> positions = [ match.start() for match in matches ]
>>> positions
[5, 29, 40, 61, 77]
Your own way
If you want to make your own implementation (using recursion), you could take advantage of the extra arguments of the str.find function. Let's see what help(str.find) says about it:
S.find(sub [,start [,end]]) -> int
Return the lowest index in S where substring sub is found,
such that sub is contained within s[start:end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
There is an extra argument called start that tells str.find where to start searching the substring. That's just what we need!
So, modifying your implementation, we can get a simple, fast and beautiful solution:
def substring_match_exact(pattern, string, where_should_I_start=0):
    # Save the result in a variable to avoid doing the same thing twice
    pos = string.find(pattern, where_should_I_start)
    if pos == -1:
        # Not found!
        return []
    # No need for an else statement
    return [pos] + substring_match_exact(pattern, string, pos + len(pattern))
What is the recursion doing here?
You first search for the substring in the string, starting at position 0.
If the substring isn't found, an empty list [] is returned.
If the substring is found, [pos] is returned, plus all the positions where the substring appears in the string starting at position pos + len(pattern).
Using our brand new function
>>> s = 'your code works with wrongly correlated coefficients which incorporates more costs'
>>> substring_match_exact('co', s)
[5, 29, 40, 61, 77]
Currently, your code is attempting to find the index of co in the shortened string, rather than the original string. Therefore, while [5, 22, 9, 19, 14] may seem incorrect, the script is doing exactly what you told it to do. By including an offset, like the script below, this code could work.
def subStringMatchExact(target, key, offset=0): # note the addition of offset
    if (target.find(key) == -1):
        return []
    else:
        foundStringAt = target.find(key)
        target = target[foundStringAt + len(key):]
        foundStringAt += offset # added
        return [foundStringAt] + subStringMatchExact(target, key, foundStringAt + len(key))
        # added foundStringAt + len(key) part
string = subStringMatchExact("your code works with wrongly correlated coefficients which incorporates more costs", "co")
# no need to call w/ 0 since offset defaults to 0 if no offset is given
print(string)
I should add that making foundStringAt a list from the beginning isn't great practice when dealing with only one value, as you add some overhead with every [0] index lookup. Instead, since you want a list return type, you should just enclose it in [] in the return statement (as shown in my code).
You are always adding the position within the respective substring. In
return foundStringAt + subStringMatchExact(target, key)
the result of the function call refers to the "new" string target, which differs from the "old" one because it was redefined with target = target[foundStringAt[0] + len(key):].
So you should add exactly this value to the function call results:
foundStringAt = target.find(key)
offset = foundStringAt + len(key)
target = target[offset:]
return [foundStringAt] + [i + offset for i in subStringMatchExact(target, key)]
should do the trick (untested).
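For reference, a complete function built around that fragment might look like the sketch below. The base case is taken from the question's code; the snake_case name is my own choice.

```python
def sub_string_match_exact(target, key):
    """Recursively collect the start positions of key in target."""
    found = target.find(key)
    if found == -1:
        return []
    # everything up to and including this match is cut off before recursing
    offset = found + len(key)
    rest = sub_string_match_exact(target[offset:], key)
    # positions in rest are relative to the shortened string, so shift them back
    return [found] + [i + offset for i in rest]


sentence = ("your code works with wrongly correlated coefficients "
            "which incorporates more costs")
print(sub_string_match_exact(sentence, "co"))  # [5, 29, 40, 61, 77]
```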
I wouldn't bother using recursion for this, except as an exercise.
To fix the problem:
I am having trouble summing the length of the substring on the previous recursion step.
What you really want to "sum" is the amount of the string that has already been searched. Pass this to the function as a parameter (use 0 for the first call), adding the amount of string removed (foundStringAt[0] + len(key):, currently) to the input value for the recursive call.
As a matter of formatting (and to make things correspond better to their names), you'll probably find it neater to let foundStringAt store the result directly (instead of that 1-element list), and do the list wrapping as part of the expression with the recursive call.
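If recursion is not required, the same bookkeeping can be done in a loop using the start argument of str.find. This is a sketch only: the name `find_all` is hypothetical, and rejecting an empty key is my own assumption (an empty key would otherwise never advance).

```python
def find_all(target, key):
    """Return the start positions of non-overlapping occurrences of key."""
    if not key:
        raise ValueError("key must not be empty")
    positions = []
    start = target.find(key)
    while start != -1:
        positions.append(start)
        # continue searching just past the match that was found
        start = target.find(key, start + len(key))
    return positions


text = ("your code works with wrongly correlated coefficients "
        "which incorporates more costs")
print(find_all(text, "co"))  # [5, 29, 40, 61, 77]
```

Because the original string is never sliced, the positions need no offset correction, and long inputs cannot hit the recursion limit.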
