Parsing a block of mathematical expressions and separating the terms - python

In a text file, I have a block of text between two keywords (let's call them "keyword1" and "keyword2") which consists of a big mathematical expression: a sum of smaller expressions that can be more or less complex. x"random_number" refers to numbered variables.
For example, it could look like this:
keyword1 x47*ln(x46+2*x38) + (x35*x24 + exp(x87 + x56))^2 - x34 + ...
+ .....
+ .....
keyword2
All I want to do is split this big mathematical expression into the terms it is composed of and store these "atomic" terms in a list, so that every term which appears in the sum is one element (if a term is negative, the element should be - term).
With the example above, this should return:
L = [x47*ln(x46+2*x38), (x35*x24 + exp(x87 + x56))^2, - x34, ...]
I would try to use a regex which matches the + or - symbols that separate the terms, but I think this is wrong because it would also match the +/- symbols which appear inside smaller expressions, which I don't want to be split.
So I'm a bit stuck with this.
Thank you in advance for helping me solve my problem.

I think for extracting the part between the keywords, a regex will work just fine. With the help of an online regex creator you should be able to create that. Then you have the string left with the mathematical formula in it.
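As a minimal sketch of that first step, assuming the keywords appear literally in the text and the block may span several lines (the function name `extract_block` and its parameters are made up for illustration):

```python
import re

def extract_block(text, start="keyword1", end="keyword2"):
    """Return the text between the first `start` and the next `end`, or None."""
    # re.escape guards against keywords containing regex metacharacters;
    # re.DOTALL lets "." match newlines so multi-line blocks are captured.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None

sample = "header\nkeyword1 x1 + x2\n+ x3\nkeyword2 footer"
print(extract_block(sample))
```

The non-greedy `(.*?)` stops at the first occurrence of the end keyword; if a file contains several blocks, `re.findall` with the same pattern would probably be the natural extension.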
Essentially what you want is to split the string at all places where the bracket 'depth' is 0. For example, if you have x1*(x2+x3)+x4 the + between the brackets should be ignored.
I wrote the following function, which scans through the string and keeps track of the current bracket depth. If the depth is 0 and a + or - is encountered, the index is stored. In the end, we can split the string at these indices to obtain the split you require. I first wrote a recursive variant, but the iterative variant works just as well and is probably easier to understand.
Recursive function
def find_split_indexes(block, index=0, depth=0, indexes=None):
    # use a fresh list per top-level call; a mutable default ([]) would be
    # shared between calls and keep indexes from earlier strings
    if indexes is None:
        indexes = []
    # return when the string has been searched entirely
    if index >= len(block):
        return indexes
    # change the depth when a bracket is encountered
    if block[index] == '(':
        depth += 1
    elif block[index] == ')':
        depth -= 1
    # if a + or - is encountered at depth 0, store the index
    if depth == 0 and (block[index] == '+' or block[index] == '-'):
        indexes.append(index)
    # recurse on the next character; the final call returns the list of indexes
    return find_split_indexes(block, index+1, depth, indexes)
Iterative function
Of course an iterative (using a loop) version of this function can also be created, and is likely a bit simpler to understand
def find_split_indexes_iterative(block):
    indexes = []
    depth = 0
    # iterate over the string
    for index in range(len(block)):
        if block[index] == '(':
            depth += 1
        elif block[index] == ')':
            depth -= 1
        elif depth == 0 and (block[index] == '+' or block[index] == '-'):
            indexes.append(index)
    return indexes
Using the indices
To then use these indices, you can, for instance, split the string as explained in this other question to obtain the parts you want. The only thing left to do is remove the leading and trailing spaces.
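A possible way to put the pieces together is sketched below. It repeats the depth-tracking scan so that it is self-contained, then slices the string between consecutive split points. The name `split_terms` is made up for illustration, and the sketch assumes every top-level + or - is a binary operator or a leading sign (something like `x1*-x2` or `1e-5` at depth 0 would be split incorrectly).

```python
def split_terms(expression):
    """Split a sum into its top-level terms, keeping a leading '-' on negative terms."""
    indexes = []
    depth = 0
    for index, char in enumerate(expression):
        if char == '(':
            depth += 1
        elif char == ')':
            depth -= 1
        elif depth == 0 and char in '+-':
            indexes.append(index)
    # Slice between consecutive split points; each slice starts at its sign.
    bounds = [0] + indexes + [len(expression)]
    terms = []
    for start, end in zip(bounds, bounds[1:]):
        term = expression[start:end].strip()
        if term.startswith('+'):
            term = term[1:].strip()  # drop the '+', keep a '-'
        if term:
            terms.append(term)
    return terms

expr = "x47*ln(x46+2*x38) + (x35*x24 + exp(x87 + x56))^2 - x34"
print(split_terms(expr))
# ['x47*ln(x46+2*x38)', '(x35*x24 + exp(x87 + x56))^2', '- x34']
```

Keeping the sign at the start of each slice means negative terms come out as "- term", which is the form asked for in the question.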

Related

Trying to solve the n-parenthesis problem - but failing

I am trying to implement a solution to the 'n-parenthesis problem'
def gen_paren_pairs(n):
    def gen_pairs(left_count, right_count, build_str, build_list=[]):
        print(f'left count is:{left_count}, right count is:{right_count}, build string is:{build_str}')
        if left_count == 0 and right_count == 0:
            build_list.append(build_str)
            print(build_list)
            return build_list
        if left_count > 0:
            build_str += "("
            gen_pairs(left_count - 1, right_count, build_str, build_list)
        if left_count < right_count:
            build_str += ")"
            #print(f'left count is:{left_count}, right count is:{right_count}, build string is:{build_str}')
            gen_pairs(left_count, right_count - 1, build_str, build_list)
    in_str = ""
    gen_pairs(n,n,in_str)

gen_paren_pairs(2)
It almost works but isn't quite there.
The code is supposed to generate a list of correctly nested brackets whose count matches the input 'n'
Here is the final contents of a list. Note that the last string starts with an unwanted left bracket.
['(())', '(()()']
Please advise.
Here's a less convoluted approach:
memory = {0:[""]}
def gp(n):
    if n not in memory:
        local_mem = []
        for a in range(n):
            part1s = list(gp(a))
            for p2 in gp(n-1-a):
                for p1 in part1s:
                    pat = "("+p1+")"+p2
                    local_mem.append(pat)
        memory[n] = local_mem
    return memory[n]
The idea is to take one pair of parentheses, go over all the ways to divide the remaining N-1 pairs between going inside that pair and going after it, find the set of patterns for each of those sizes, and make all of the combinations.
To eliminate redundant computation, we save the values returned for each input n, so if asked for the same n again, we can just look it up.
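For comparison, the backtracking code from the question can also be repaired with a small change. The stray left bracket seems to come from `build_str += "("` modifying the local string before the second branch runs, so the ")" branch continues from a string that already contains the extra "(". The sketch below is one possible fix, not the answer's method: it passes the extended string as an argument instead of mutating it, and collects results in a list owned by the outer function rather than in a mutable default argument.

```python
def gen_paren_pairs(n):
    """Return all correctly nested strings made of n pairs of brackets."""
    results = []

    def gen_pairs(left_count, right_count, build_str):
        # left_count / right_count are the brackets still available
        if left_count == 0 and right_count == 0:
            results.append(build_str)
            return
        if left_count > 0:
            # build_str itself is left untouched for the branch below
            gen_pairs(left_count - 1, right_count, build_str + "(")
        if left_count < right_count:
            gen_pairs(left_count, right_count - 1, build_str + ")")

    gen_pairs(n, n, "")
    return results

print(gen_paren_pairs(2))  # ['(())', '()()']
```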

Python Recursive enumerate(), with Start Value Incrementation

I am attempting to create a Python function which parses through a bracket representation of a binary tree and outputs a line-by-line bipartite graph representation of it, where the partitions are separated by a "|", thus:
Binary tree bracket representation:
(((A, B), C), D)
Bipartite graph relationship output:
A B | C D
A B C | D
I approached it using recursion, maintaining each bipartite relationship line in a list, taking the original bracket notation string and the starting index of parsing as input.
def makeBipartRecursive(treeStr, index):
    bipartStr = ""
    bipartList = []
    for ind, char in enumerate(treeStr, start=index):
        if char == '(':
            # Make recursive call to makeBipartRecursive
            indx = ind
            bipartList.append(makeBipartRecursive(treeStr, indx+1))
        elif char == ')':
            group1 = treeStr[0:index-1] + treeStr[ind+1::]
            group2 = treeStr[index:ind]
            return group1 + " | " + group2
        elif char == ',':
            bipartStr += " "
        else:
            # Begin construction of string
            bipartStr += char
Each time an open-parenthesis is encountered, a recursive call is made, beginning enumeration at the index immediately following the open-parenthesis, so as to prevent infinite recursion (Or so I think). If possible, try to ignore the fact that I'm not actually returning a list. The main issue is that I encounter infinite recursion, where the enumeration never progresses beyond the first character in my string. Should my recursive call with the incremented start position for the enumeration not fix this?
Thanks in advance.
You are misinterpreting the start parameter of enumerate. It does not mean "start the enumeration at this position" but "start counting from this number". See help(enumerate):
| The enumerate object yields pairs containing a count (from start, which
| defaults to zero) and a value yielded by the iterable argument.
So basically each time you perform a recursive call you start again from the beginning of your string.
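A small illustration of the difference, using a made-up sample string: `start` only shifts the counter, while slicing the string is one way to actually skip the leading characters and still keep indices that refer to the original string.

```python
text = "(AB)C"

# start=2 only changes the numbering; iteration still begins at text[0]
counted = list(enumerate(text, start=2))
print(counted[0])   # (2, '(')

# slicing skips the first two characters; start=2 keeps the indices
# aligned with positions in the original string
sliced = list(enumerate(text[2:], start=2))
print(sliced)       # [(2, 'B'), (3, ')'), (4, 'C')]
```

For long strings, `itertools.islice(enumerate(text), 2, None)` would avoid copying the slice, at the cost of being a little less readable.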

Algorithm to print all valid combinations of n pairs of parentheses [duplicate]

This is a very popular interview question and there are tons of pages on the internet about the solution to this problem.
eg. Calculating the complexity of algorithm to print all valid (i.e., properly opened and closed) combinations of n-pairs of parentheses
So before marking this as a duplicate question please read the full details.
I implemented my own solution to this problem, but I'm missing some edge cases that I'm having a hard time figuring out.
def get_all_parens(num):
    if num == 0:
        return []
    if num == 1:
        return ['()']
    else:
        sub_parens = get_all_parens(num - 1)
        temp = []
        for parens in sub_parens:
            temp.append('(' + parens + ')')
            temp.append('()' + parens)
            temp.append(parens + '()')
        return set(temp)
There is basically a recursive call to the subproblem, followed by putting parentheses around the combinations from the subproblem.
For num = 4, it returns 13 possible combinations; however, the correct answer is 14, and the missing one is (())(())
I'm not sure what I'm doing wrong here. Am I moving in the right direction, or is this a completely wrong approach?
For the first-time reader, here is the question:
Implement an algorithm to print all valid (e.g., properly opened and closed) combinations of n pairs of parentheses.
E.g. Input: 3, Output: ()()(), ()(()), (())(), (()()), ((()))
It looks like a wrong approach.
As you can see in your failure case (())(()), your algorithm could only obtain such a string by placing parentheses around ())((). Unfortunately the latter is not a valid combination and cannot be generated: the prior recursive call only builds valid ones.
There are several things to correct in your approach:
1. recursion: it is not the fastest solution
2. returning a set built from a list with duplicates (did you consider using a set from the start instead of a list?)
3. generating only 3 types of new combinations:
a) surrounding parentheses
b) parentheses on the left
c) parentheses on the right,
which also generates many duplicates and omits the symmetrical results
You can add one additional loop (it will not fix the problems mentioned above), and it will add the missing results to the returned set.
I modified your function by adding only one loop (my proposal is to go through every position of ( and insert a pair of parentheses at that point in the string):
def get_all_parens(num):
    if num == 0:
        return []
    if num == 1:
        return ['()']
    else:
        sub_parens = get_all_parens(num - 1)
        temp = []
        for parens in sub_parens:
            temp.append('()' + parens)
            temp.append('(' + parens + ')')
            temp.append(parens + '()')
            # added loop
            last_index = 0
            for _ in range(parens.count('(')):
                temp.append(parens[:last_index] + '()' + parens[last_index:])
                last_index = parens.index('(', last_index) + 1
            # end of added loop
        return set(temp)
EDIT:
I propose an iterative (non-recursive) version of that algorithm:
def get_all_combinations(n):
    results = set()
    for i in range(n):
        new_results = set()
        if i == 0:
            results = {"()"}
            continue
        for it in results:
            output = set()
            last_index = 0
            for _ in range(it.count("(")):
                output.add(it[:last_index] + "()" + it[last_index:])
                last_index = it.index("(", last_index) + 1
            output.add(it[:last_index] + "()" + it[last_index:])
            new_results.update(output)
        results = new_results
    return list(results), len(results)
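One way to sanity-check any of these generators is to compare their output with a brute-force enumeration, which is slow but hard to get wrong. The sketch below is only a testing aid under that assumption, with made-up helper names: it tries every string of 2n brackets, keeps the balanced ones, and compares the count with the Catalan number C(2n, n) / (n + 1), which is known to count valid combinations of n pairs.

```python
from itertools import product
from math import comb  # available from Python 3.8

def is_balanced(s):
    """True if every ')' closes an earlier '(' and nothing stays open."""
    depth = 0
    for ch in s:
        depth += 1 if ch == '(' else -1
        if depth < 0:
            return False
    return depth == 0

def brute_force_parens(n):
    """All balanced strings of n bracket pairs, found by exhaustive search."""
    candidates = (''.join(p) for p in product('()', repeat=2 * n))
    return {s for s in candidates if is_balanced(s)}

def catalan(n):
    return comb(2 * n, n) // (n + 1)

print(len(brute_force_parens(4)), catalan(4))  # 14 14
```

Since the search space has 4^n candidates, this is only practical for small n, which is usually enough to expose a missing case such as (())(()).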

How to turn an input string into a list, change it into a palindrome (if it isn't one already), and convert it back to a string

A string is a palindrome if it reads the same forward and backward. Given a string that contains only lowercase English letters, you are required to create a new palindrome string from the given string following the rules given below:
1. You can reduce (but not increase) any character in a string by one; for example you can reduce the character h to g but not from g to h
2. In order to achieve your goal, if you have to then you can reduce a character of a string repeatedly until it becomes the letter a; but once it becomes a, you cannot reduce it any further.
Each reduction operation is counted as one. So you need to count as well how many reductions you make. Write a Python program that reads a string from a user input (using raw_input statement), creates a palindrome string from the given string with the minimum possible number of operations and then prints the palindrome string created and the number of operations needed to create the new palindrome string.
I tried to convert the string to a list first, then modify the list so that any given string that is not a palindrome is automatically edited into one, then convert the list back to a string and print the result.
c=raw_input("enter a string ")
x=list(c)
y = ""
i = 0
j = len(x)-1
a = 0
while i < j:
    if x[i] < x[j]:
        a += ord(x[j]) - ord(x[i])
        x[j] = x[i]
        print x
    else:
        a += ord(x[i]) - ord(x[j])
        x[i] = x[j]
        print x
    i = i + 1
    j = (len(x)-1)-1
print "The number of operations is ",a
print "The palindrome created is",( ''.join(x) )
Am I approaching it the right way, or is there something I'm missing?
Since only reduction is allowed, it is clear that the number of reductions for each pair will be the difference between them. For example, consider the string 'abcd'.
Here the pairs to check are (a,d) and (b,c).
Now the difference between 'a' and 'd' is 3, which is obtained by (ord('d')-ord('a')).
I am using the absolute value to avoid checking which letter has the higher ASCII value.
I hope this approach will help.
s=input()
l=len(s)
count=0
m=0
n=l-1
while m<n:
    count+=abs(ord(s[m])-ord(s[n]))
    m+=1
    n-=1
print(count)
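The snippet above only counts the operations. A minimal sketch that also builds the palindrome, assuming Python 3 and lowercase input (the function name `make_palindrome` is made up for illustration), lowers the larger letter of each mirrored pair to the smaller one:

```python
def make_palindrome(s):
    """Return (palindrome, operations) using only letter reductions."""
    chars = list(s)
    operations = 0
    i, j = 0, len(chars) - 1
    while i < j:
        # the larger letter is reduced down to the smaller one
        low = min(chars[i], chars[j])
        operations += abs(ord(chars[i]) - ord(chars[j]))
        chars[i] = chars[j] = low
        i += 1
        j -= 1   # both ends move inward on every step
    return ''.join(chars), operations

print(make_palindrome('abcd'))   # ('abba', 4)
```

Moving both `i` and `j` on every iteration is the part the code in the question gets wrong: there `j` is reset to `len(x) - 2` each time instead of being decremented.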
This is a common "homework" or competition question. The basic concept here is that you have to find a way to get to minimum values with as few reduction operations as possible. The trick here is to utilize string manipulation to keep that number low. For this particular problem, there are two very simple things to remember: 1) you have to split the string, and 2) you have to apply a bit of symmetry.
First, split the string in half. The following function should do it.
def split_string_to_halves(string):
    half, rem = divmod(len(string), 2)
    a, b, c = '', '', ''
    a, b = string[:half], string[half:]
    if rem > 0:
        # odd length: c is the middle character, b starts right after it
        b, c = string[half + 1:], string[half]
    return (a, b, c)
The above should recreate the string if you do a + c + b. Next, convert a and b to lists and map the ord function over each half. Leave the remainder alone, if any.
def convert_to_ord_list(string):
    # list(...) keeps this working on Python 3, where map returns an iterator
    return list(map(ord, string))
Since you just have to do a one-way operation (only reduction, no need for addition), you can assume that for each pair of elements in the two converted lists, the higher value less the lower value is the number of operations needed. Easier shown than said:
def convert_to_palindrome(string):
    halfone, halftwo, rem = split_string_to_halves(string)
    if halfone == halftwo[::-1]:
        # already a palindrome: reassemble it in the original order
        return halfone + rem + halftwo, 0
    halftwo = halftwo[::-1]
    # materialize the pairs, since they are traversed twice below
    zipped = list(zip(convert_to_ord_list(halfone), convert_to_ord_list(halftwo)))
    counter = sum([max(x) - min(x) for x in zipped])
    floors = [min(x) for x in zipped]
    res = "".join(map(chr, floors))
    res += rem + res[::-1]
    return res, counter
Finally, some tests:
target = 'ideal'
print(convert_to_palindrome(target)) # ('iaeai', 6)
target = 'euler'
print(convert_to_palindrome(target)) # ('eelee', 29)
target = 'ohmygodthisisinsane'
print(convert_to_palindrome(target)) # ('ehasgidihihidigsahe', 84)
I'm not sure if this is optimized nor if I covered all bases. But I think this pretty much covers the general concept of the approach needed. Compared to your code, this is clearer and actually works (yours does not). Good luck and let us know how this works for you.

python recursion with bubble sort

So, I have this problem where I receive two strings made of the letters ACGT: one contains only letters, the other contains letters and dashes "-". Both have the same length. The string with the dashes is compared to the string without them, cell by cell, and each pairing is scored. I wrote the code below for the scoring system.
For example:
dna1: -ACA
dna2: TACG
The score is -1: a dash compared to a letter (T) gives -2, a letter compared to the same letter gives +1 (A to A) and +1 (C to C), and non-matching letters (A to G) give -1, so the sum is -1.
def get_score(dna1, dna2, match=1, mismatch=-1, gap=-2):
    """Return the alignment score of two equal-length sequences."""
    score = 0
    for index in range(len(dna1)):
        # compare characters by value (==), not by identity (is)
        if dna1[index] == dna2[index]:
            score += match
        elif "-" not in (dna1[index], dna2[index]):
            score += mismatch
        else:
            score += gap
    return score
This is working fine.
Now I have to use recursion to find the best possible score for two strings.
I receive two strings, which can be of different lengths this time (I can't change the order of the letters).
So I wrote this code, which adds "-" as many times as needed to the shorter string to create two strings of the same length, putting the dashes at the start of the list. Now I want to start moving the dashes and record the score for every dash position, and finally get the highest possible score. To move the dashes around I wrote a little bubble sort, but it doesn't seem to do what I want. I realize it's a long question, but I'd love some help. Let me know if anything I wrote is unclear.
def best_score(dna1, dna2, match=1, mismatch=-1, gap=-2,\
               score=[], count=0):
    """"""
    diff = abs(len(dna1) - len(dna2))
    if len(dna1) is len(dna2):
        short = []
    elif len(dna1) < len(dna2):
        short = [base for base in iter(dna1)]
    else:
        short = [base for base in iter(dna2)]
    for i in range(diff):
        short.insert(count, "-")
    for i in range(diff+count, len(short)-1):
        if len(dna1) < len(dna2):
            score.append((get_score(short, dna2),\
                          ''.join(short), dna2))
        else:
            score.append((get_score(dna1, short),\
                          dna1, ''.join(short)))
        short[i+1], short[i] = short[i], short[i+1]
    if count is min(len(dna1), len(dna2)):
        return score[score.index(max(score))]
    return best_score(dna1, dna2, 1, -1, -2, score, count+1)
First, if I have deciphered your cost function correctly, the choice of the best alignment does not depend on gap, because the number of dashes is fixed (it is the difference between the two lengths).
Second, the score depends linearly on the number of mismatches, so the best alignment does not depend on the exact values of match and mismatch either, as long as they are positive and negative respectively.
So your task reduces to finding the longest subsequence of the longer string's letters that strictly matches a subsequence of the shorter string's letters.
Third, let M(string, substr) be the function returning the length of the best match described above. If the first letter of the shorter string is S, that is substr == 'S<letters>', then
M(string, 'S<letters>') = \
    max(1 + M(string[string.index(S) + 1:], '<letters>'),  # found S, continue after it
        M(string[1:], '<letters>'))                        # letter S not found, placed at 1st place
The latter is a recursive expression that is easy to implement.
For a pair string, substr with m = M(string, substr), the best score is equal to
m * match + (len(substr) - m) * mismatch + (len(string)-len(substr)) * gap
By recording which branch gave the maximum in the recursive expression, it is straightforward to recover the best match itself.
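As a rough sketch of how such a recursion could be written in Python: the version below is not the exact expression from the answer. It is a swapped-in formulation that recurses over positions in both strings and, at each step, either pairs the next letter of the shorter string with the next letter of the longer one or places a dash there, with memoization to avoid recomputing subproblems. The name `best_alignment_score` is made up, and the scoring defaults are taken from the question.

```python
from functools import lru_cache

def best_alignment_score(dna1, dna2, match=1, mismatch=-1, gap=-2):
    """Best score when dashes may only be inserted into the shorter string."""
    short, long_ = sorted((dna1, dna2), key=len)

    @lru_cache(maxsize=None)
    def best(i, j):
        # best score for aligning short[i:] against long_[j:]
        remaining_short = len(short) - i
        remaining_long = len(long_) - j
        if remaining_short == 0:
            return remaining_long * gap          # only dashes are left
        pair = match if short[i] == long_[j] else mismatch
        paired = pair + best(i + 1, j + 1)
        if remaining_long == remaining_short:
            return paired                        # no dashes left to place
        # otherwise also try placing a dash opposite long_[j]
        return max(paired, gap + best(i, j + 1))

    return best(0, 0)

print(best_alignment_score("ACA", "TACG"))  # -1, from aligning -ACA with TACG
```

Returning the alignment itself, not just the score, would need each call to also report which branch won, in line with the last remark of the answer.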
