Getting the expressions inside a mathematical equation

I have an equation that I read from a file, something like (((2+1)*(4+5))/2). What I am looking for is the distinct mathematical expressions inside it.
In this case:
2+1
4+5
(2+1)*(4+5) and finally ((2+1)*(4+5))/2.
I started by looking at this question: How can I split a string of a mathematical expressions in python?
But I was not able to arrive at a solution.
Can you please help?

You can make a bare-bones parser by iterating through the string: each time you find an opening parenthesis, push the index just past it onto a stack. When you find a closing parenthesis, pop the last index off the stack and take a slice from there to where you are now:
stack = []
s = "(((2+1)*(4+5))/2)"
for i, c in enumerate(s):
    if c == "(":
        stack.append(i+1)
    if c == ")":
        f = stack.pop()
        print(s[f:i])
Result
2+1
4+5
(2+1)*(4+5)
((2+1)*(4+5))/2
If pop() fails because the stack is empty, or something is left on the stack when you're done, the parentheses are not balanced; this can be fleshed out to do error checking.
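The error checking mentioned above could be fleshed out along these lines. This is only a minimal sketch, not part of the original answer: the function name `subexpressions` and the choice to raise `ValueError` are assumptions made for illustration.

```python
def subexpressions(s):
    """Return the parenthesised sub-expressions of s, innermost first.

    Hypothetical helper: raises ValueError when the parentheses do not balance.
    """
    stack = []   # indices just past each '(' that is still open
    found = []
    for i, c in enumerate(s):
        if c == "(":
            stack.append(i + 1)
        elif c == ")":
            if not stack:
                # A ')' with no matching '(' before it
                raise ValueError("unmatched ')' at index %d" % i)
            start = stack.pop()
            found.append(s[start:i])
    if stack:
        # At least one '(' was never closed
        raise ValueError("unmatched '(' before index %d" % stack[-1])
    return found


print(subexpressions("(((2+1)*(4+5))/2)"))
# ['2+1', '4+5', '(2+1)*(4+5)', '((2+1)*(4+5))/2']
```

Returning a list instead of printing makes the result easier to reuse, and the two checks correspond to the two failure modes described above.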

Related

Parsing a Chemistry Formula in Python

I am trying to solve this problem: https://leetcode.com/articles/number-of-atoms/#approach-1-recursion-accepted.
The question is: given a formula like C(Mg2(OH)4)2, return a hash table with elements and their counts. Element names always start with a capital letter and may be followed by a small letter.
I thought I would start by solving the simplest case: no brackets.
def bracket_hash(formula):
    element = ""
    element_count = 0
    element_hash = {}
    for x in formula:
        if x.isupper():
            # A new element starts; the previous one had no explicit count
            if element != "":
                element_hash[element] = 1
                element = ""
            element = x
        elif x.islower():
            element += x
        else:
            # A single digit gives the count of the current element
            element_count = int(x)
            element_hash[element] = element_count
            element_count = 0
            element = ""
    if element != "":
        element_hash[element] = 1
    return element_hash
This code works perfectly fine for cases like:
print(bracket_hash("H2O"))
print(bracket_hash("CO2"))
print(bracket_hash("Mg2O4"))
print(bracket_hash("OH"))
Now I thought that somehow stacks must be used to handle the case of multiple brackets like OH(Ag3(OH)2)4, here Ag's count has to be 3*4 and O and H's count will be 2*4 + 1.
So far I started with something like this:
def formula_hash(formula):
    stack = []
    final_hash = {}
    cur = ""
    i = 0
    while i < len(formula):
        if formula[i] == '(':
            j = i
            while formula[j]!=')':
                j = j + 1
            cur = formula[i:j]
            stack.append(bracket_hash(cur))
            cur = ""
            i = j + 1
but now I am stuck.
I tend to get stuck as coding problems get longer and involve a mix of data structures. Here they use a hash table and a stack.
So my question is: how do I break this problem down into manageable parts and solve it? To really solve it, I have to map it to manageable code segments. Any help would be greatly appreciated.
Thanks.

I think you can use recursion to solve this problem. Here is how your function should work:
Do like you do in the first code, until you encounter an opening parenthesis.
When you encounter an opening parenthesis, find the corresponding closing parenthesis. This can be done with a counter: initialize it to 1, then when you encounter a new opening parenthesis, you increment the counter, and when you encounter a closing parenthesis you decrement it. When the counter equals 0, you have found the matching closing parenthesis.
Cut the string between parentheses and call the same function with this string (here's the recursive aspect).
Add the values in the returned dictionary to the current dictionary, multiplied by the number which follows the parenthesis.
If you have problems implementing some parts of this solution, tell me and I will give more details.
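The steps above can be sketched roughly as follows. This is one possible reading of them rather than the answerer's own code: the name `count_atoms`, the use of `re` and `collections.Counter`, and the `ValueError` on malformed input are all assumptions made for illustration.

```python
import re
from collections import Counter

# An element name (capital letter, optional lowercase letters) and an optional count
ELEMENT = re.compile(r"([A-Z][a-z]*)(\d*)")
# The optional multiplier that may follow a closing parenthesis
NUMBER = re.compile(r"\d*")


def count_atoms(formula):
    """Return a Counter mapping each element in formula to its atom count."""
    counts = Counter()
    i = 0
    while i < len(formula):
        if formula[i] == "(":
            # Find the matching ')' with a depth counter
            depth = 1
            j = i + 1
            while depth > 0:
                if j >= len(formula):
                    raise ValueError("unbalanced parentheses in %r" % formula)
                if formula[j] == "(":
                    depth += 1
                elif formula[j] == ")":
                    depth -= 1
                j += 1
            # formula[i + 1:j - 1] is the text between the matching parentheses
            inner = count_atoms(formula[i + 1:j - 1])
            digits = NUMBER.match(formula, j).group()
            multiplier = int(digits) if digits else 1
            # Merge the group's counts, scaled by the multiplier
            for element, n in inner.items():
                counts[element] += n * multiplier
            i = j + len(digits)
        else:
            m = ELEMENT.match(formula, i)
            if m is None:
                raise ValueError("unexpected character %r" % formula[i])
            counts[m.group(1)] += int(m.group(2)) if m.group(2) else 1
            i = m.end()
    return counts


print(dict(count_atoms("OH(Ag3(OH)2)4")))
# {'O': 9, 'H': 9, 'Ag': 12}
```

Unlike the single-digit handling in the question's `bracket_hash`, the regular expression here also accepts multi-digit counts such as `C12`.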
EDIT: about the stack approach
The stack approach just simulates recursion. Instead of calling the function again with a local counter, it keeps a stack of counters. When a parenthesis is opened, it starts counting in a new context, and when the parenthesis is closed, it merges that context into the one that contains it, with the corresponding multiplicity.
I prefer by far the recursive approach, which is more natural.
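For comparison, the stack-of-counters idea might look like the sketch below. As before, the name `count_atoms_stack` and the error handling are illustrative assumptions, not code from the answer.

```python
import re
from collections import Counter

# An element name followed by an optional count
ELEMENT = re.compile(r"([A-Z][a-z]*)(\d*)")
# The optional multiplier after a closing parenthesis
NUMBER = re.compile(r"\d*")


def count_atoms_stack(formula):
    """Count atoms using an explicit stack of Counters, one per open group."""
    stack = [Counter()]   # stack[0] is the top-level context
    i = 0
    while i < len(formula):
        ch = formula[i]
        if ch == "(":
            # Open a fresh context for the group
            stack.append(Counter())
            i += 1
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unmatched ')' in %r" % formula)
            digits = NUMBER.match(formula, i + 1).group()
            multiplier = int(digits) if digits else 1
            # Close the group and merge it into the enclosing context
            group = stack.pop()
            for element, n in group.items():
                stack[-1][element] += n * multiplier
            i += 1 + len(digits)
        else:
            m = ELEMENT.match(formula, i)
            if m is None:
                raise ValueError("unexpected character %r" % ch)
            stack[-1][m.group(1)] += int(m.group(2)) if m.group(2) else 1
            i = m.end()
    if len(stack) != 1:
        raise ValueError("unmatched '(' in %r" % formula)
    return stack[0]


print(dict(count_atoms_stack("C(Mg2(OH)4)2")))
# {'C': 1, 'Mg': 4, 'O': 8, 'H': 8}
```

The string is scanned only once here, whereas a recursive version that slices out each group rescans nested text; for short formulas the difference is unlikely to matter.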

You may want to Google for python parser generator. A parser generator is a library that helps developers create parsers for any kind of formula or language (technically, any "grammar") without doing all the work from scratch.
You may have to do some reading to understand what type of grammar a chemical formula adheres to.
An interesting overview for Python is this.

Trouble with top down recursive algorithm

I am trying to make word chains, but can't get my head around the recursive search.
I want to return a list of the words required to get to the target word.
get_words_quicker returns a list of words that can be made by just changing one letter.
def dig(InWord, OutWord, Depth):
    if Depth == 0:
        return False
    else:
        d = Depth - 1;
        wordC = 0;
        wordS = [];
        for q in get_words_quicker(InWord):
            wordC+=1
            if(OutWord == q):
                return q
            wordS.append(q)
        for i in range(0,wordC):
            return dig(wordS[i],OutWord,d)
Any help/questions would be much appreciated.

ANALYSIS
There is nowhere in your code that you form a list to return. The one place where you make a list is appending to wordS, but you never return this list, and your recursive call passes only one element (a single word) from that list.
As jasonharper already pointed out, your final loop can iterate once and return whatever the recursion gives it, or it can fall off the end and return None (rather than "nothing").
You have two other returns in the code: one returns False, and the other will return q, but only if q has the same value as OutWord.
Since there is no code where you use the result or alter the return value in any way, the only possibilities for your code's return value are None, False, and OutWord.
REPAIR
I'm afraid that I'm not sure how to fix this routine your way, since you haven't really described how you intended this code to carry out the high-level tasks you describe. The abbreviated variable names hide their purposes. wordC is a counter whose only function is to hold the length of the list returned from get_words_quicker -- which could be done much more easily.
If you can clean up the code, improve the data flow and/or documentation to something that shows only one or two disruptions in logic, perhaps we can fix the remaining problems. As it stands, I hesitate to try -- you'd have my solution, not yours.
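Purely to illustrate the points in the analysis (building a list, trying every candidate rather than returning on the first one, and using the result of the recursive call), a generic depth-limited search might look like the sketch below. It is not a repair of the original routine: `find_chain`, the tiny `WORDS` list, and `one_letter_apart` (a hypothetical stand-in for `get_words_quicker`) are all assumptions.

```python
def find_chain(start, target, depth, neighbours):
    """Return a list of words leading from start to target, or None.

    At most `depth` steps are allowed; `neighbours(word)` must return the
    words reachable from `word` in one step.
    """
    if start == target:
        return [start]
    if depth == 0:
        return None
    for word in neighbours(start):
        rest = find_chain(word, target, depth - 1, neighbours)
        if rest is not None:
            # Prepend the current word to the chain found below it
            return [start] + rest
    # Every candidate was tried and none led to the target
    return None


# Hypothetical word list, for the example only
WORDS = ["cat", "cog", "cot", "dog", "dot"]


def one_letter_apart(word):
    """Stand-in for get_words_quicker: words differing from `word` in one letter."""
    return [w for w in WORDS
            if len(w) == len(word)
            and sum(a != b for a, b in zip(w, word)) == 1]


print(find_chain("cat", "dog", 3, one_letter_apart))
# ['cat', 'cot', 'cog', 'dog']
```

Note the contrast with a loop that returns on its first iteration: here the function only returns inside the loop when a chain was actually found, and otherwise moves on to the next candidate.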

print pattern recursion

I need to write a recursive function printPattern() that takes an integer n as a parameter and prints n star marks followed by n exclamation marks, all on one line. The function should not have any loops and should not use multiplication of strings. The printing of the characters should be done recursively only. The following are some examples of the behavior of the function:
>>>printPattern(3)
***!!!
>>>printPattern(10)
**********!!!!!!!!!!
This is what I have at the moment
def printPattern(n):
    if n < 1:
        pass
    else:
        return '*'*printPattern(n)+'!'*printPattern(n)
I know I am completely off, and this would be easier without recursion, but it is necessary for my assignment.

Q: What's printPattern(0)?
A: Nothing.
Q: What's printPattern(n), for n>=1?
A: *, then printPattern(n-1), then !.
Now you should be able to do it. Just remember to think recursively.

Recursion is based on two things:
a base case
a way to get an answer based off something closer to the base case, given something that's not the base case.
In your case, the simplest base case is probably 0, which would print nothing (the empty string). So printPattern(0) is ''.
So how do you get closer to 0 from your input? Well, probably by reducing it by 1.
So let's say that you are currently at n=5 and want to base your answer off something closer to the base case - you'd want to get the answer for n=5 from the one for n=4.
The output for n=5 is *****!!!!!.
The output for n=4 is ****!!!!.
How do you get from the output of n=4 to n=5? Well, you add a * on the front and a ! on the end.
So you could say that printPattern(5) is actually just '*' + printPattern(4) + '!'.
See where this is going?

Try this:
def printPattern(n):
    if n <= 0:
        return ''
    return '*' + printPattern(n-1) + '!'
print(printPattern(5))
> *****!!!!!
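Since the assignment asks for the printing itself to be done recursively, a variant that prints each character instead of building a string could look like this. It is a sketch that assumes Python 3 (for the `end` argument of `print`); the name `print_pattern` is chosen here for illustration.

```python
def print_pattern(n):
    """Print n stars followed by n exclamation marks, with no trailing newline."""
    if n <= 0:
        return                # base case: nothing to print
    print("*", end="")        # one star before the smaller pattern
    print_pattern(n - 1)      # the pattern for n - 1 goes in the middle
    print("!", end="")        # one exclamation mark after it


print_pattern(3)
print()  # finish the line
# ***!!!
```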

Failing to understand recursion

New to Python and trying to understand recursion. I'm trying to make a program that prints out the number of times string 'key' is found in string 'target' using a recursive function, as in Problem 1 of the MIT intro course problem set. I'm having a problem trying to figure out how the function will run. I've read the documentation and some tutorials on it, but does anyone have any tips on how to better comprehend recursion to help me fix this code?
from string import *
def countR(target,key):
    numb = 0
    if target.find(key) == -1:
        print numb
    else:
        numb +=1
        return countR(target[find(target,key):],key)
countR('ajdkhkfjsfkajslfajlfjsaiflaskfal','a')

By recursion you want to split the problem into smaller sub-problems that you can solve independently and then combine their solution together to get the final solution.
In your case you can split the task in two parts: checking where (and whether) the first occurrence of key exists, and then counting recursively in the rest.
Is there a key in there:
- No: Return 0.
- Yes: Remove key and say that the number of keys is 1 + number of key in the rest
In Code:
def countR(target,key):
    if target.find(key) == -1:
        return 0
    else:
        return 1+ countR(target[target.find(key)+len(key):],key)
Edit:
The following code then prints the desired result:
print(countR('ajdkhkfjsfkajslfajlfjsaiflaskfal','a'))

This is not how recursion works. numb is useless - every time you enter the recursion, numb is created again as 0, so it can only be 0 or 1 - never the actual result you seek.
Recursion works by finding the answer to a smaller problem, and using it to solve the big problem. In this case, you need to find the number of appearances of the key in a string that does not contain the first appearance, and add 1 to it.
Also, you need to actually advance the slice so the string you just found won't appear again.
def countR(target,key):
    if target.find(key) == -1:
        return 0
    else:
        # Skip past the occurrence just found so it is not counted again
        return 1+countR(target[target.find(key)+len(key):],key)
print(countR('ajdkhkfjsfkajslfajlfjsaiflaskfal','a'))

Most recursive functions that I've seen make a point of returning an interesting value upon which higher frames build. Your function doesn't do that, which is probably why it's confusing you. Here's a recursive function that gives you the factorial of an integer:
def factorial(n):
    """return the factorial of any positive integer n"""
    if n > 1:
        return n * factorial(n - 1)
    else:
        return 1 # Cheating a little bit by ignoring illegal values of n
The above function demonstrates what I'd call the "normal" kind of recursion – the value returned by inner frames is operated upon by outer frames.
Your function is a little unusual in that it:
Doesn't always return a value.
Outer frames don't do anything with the returned value of inner frames.
Let's see if we can refactor it to follow a more conventional recursion pattern. (Written as spoiler syntax so you can see if you can get it on your own, first):
def countR(target,key):
    idx = target.find(key)
    if idx > -1:
        return 1 + countR(target[idx + 1:], key)
    else:
        return 0
Here, countR adds 1 each time it finds the key, and then recurs upon the remainder of the string. If it doesn't find a match it still returns a value, but it does two critical things:
When added to outer frames, doesn't change the value.
Doesn't recur any further.
(OK, so the critical things are things it doesn't do. You get the picture.)
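One detail worth noticing: this version advances by `idx + 1`, while the other answers advance by `idx + len(key)`. For a one-character key the two agree, but for longer keys the first counts overlapping matches and the second does not. The side-by-side sketch below uses hypothetical function names and assumes a non-empty key (an empty key would recurse forever in both).

```python
def count_overlapping(target, key):
    """Count occurrences of key in target, allowing matches to overlap."""
    idx = target.find(key)
    if idx == -1:
        return 0
    # Move one character past the start of the match
    return 1 + count_overlapping(target[idx + 1:], key)


def count_non_overlapping(target, key):
    """Count occurrences of key in target without overlap, like str.count."""
    idx = target.find(key)
    if idx == -1:
        return 0
    # Move past the whole match
    return 1 + count_non_overlapping(target[idx + len(key):], key)


print(count_overlapping("aaaa", "aa"))      # 3
print(count_non_overlapping("aaaa", "aa"))  # 2
```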
Meta/Edit: Despite this meta article it's apparently not possible to actually properly format code in spoiler text. So I'll leave it unformatted until that feature is fixed, or forever, whichever comes first.

If key is not found in target, print numb; otherwise create a new string that starts after the found occurrence (so cut away the beginning) and continue the search from there.

Remove all nested blocks, whilst leaving non-nested blocks alone via python

Source:
[This] is some text with [some [blocks that are nested [in a [variety] of ways]]]
Resultant text:
[This] is some text with
I don't think you can do this with a regex, from looking at the threads on Stack Overflow.
Is there a simple way to do this, or must one reach for pyparsing (or another parsing library)?

Here's an easy way that doesn't require any dependencies: scan the text and keep a counter for the braces that you pass over. Increment the counter each time you see a "["; decrement it each time you see a "]".
As long as the counter is at zero or one, put the text you see onto the output string.
Otherwise, you are in a nested block, so don't put the text onto the output string.
If the counter doesn't finish at zero, the string is malformed; you have an unequal number of opening and closing braces. (If it's greater than zero, you have that many excess [s; if it's less than zero you have that many excess ]s.)
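A counter-based scan along these lines might be sketched as follows. Note that simply copying every character seen at depth zero or one would keep the outer text of a nested block, whereas the question's expected output drops such blocks entirely, so this sketch buffers each top-level block and discards it if the counter ever exceeded one inside it. The name `remove_nested_blocks` and the `ValueError` on malformed input are assumptions for illustration.

```python
def remove_nested_blocks(text):
    """Drop every top-level [...] block that contains another [...] block."""
    out = []         # text that is kept
    block = []       # characters of the top-level block being read
    depth = 0        # number of '[' opened and not yet closed
    nested = False   # True once depth has exceeded 1 inside the current block
    for ch in text:
        if ch == "[":
            depth += 1
            if depth > 1:
                nested = True
            block.append(ch)
        elif ch == "]":
            if depth == 0:
                raise ValueError("unmatched ']'")
            depth -= 1
            block.append(ch)
            if depth == 0:
                # The top-level block just closed: keep it only if it was flat
                if not nested:
                    out.append("".join(block))
                block = []
                nested = False
        elif depth > 0:
            block.append(ch)
        else:
            out.append(ch)
    if depth != 0:
        raise ValueError("unmatched '['")
    return "".join(out)


src = "[This] is some text with [some [blocks that are nested [in a [variety] of ways]]]"
print(repr(remove_nested_blocks(src)))
# '[This] is some text with '
```

The whitespace around a removed block is left in place; strip or collapse it afterwards if that matters.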

Taking the OP's example as normative (any block including further nested blocks must be removed), what about...:
import itertools
x = '''[This] is some text with [some [blocks that are nested [in a [variety]
of ways]]] and some [which are not], and [any [with nesting] must go] away.'''
def nonest(txt):
    pieces = []
    d = 0
    level = []
    # level[i] is the nesting depth at character i of txt
    for c in txt:
        if c == '[': d += 1
        level.append(d)
        if c == ']': d -= 1
    # Split into alternating runs of unnested and bracketed text
    for k, g in itertools.groupby(zip(txt, level), lambda x: x[1]>0):
        block = list(g)
        if max(d for c, d in block) > 1: continue
        pieces.append(''.join(c for c, d in block))
    print(''.join(pieces))
nonest(x)
This emits
[This] is some text with and some [which are not], and away.
which under the normative hypothesis would seem to be the desired result.
The idea is to compute, in level, a parallel list of counts "how nested are we at this point" (i.e., how many opened and not yet closed brackets have we met so far); then segment the zip of level with the text, with groupby, into alternate blocks with zero nesting and nesting > 0. For each block, the maximum nesting herein is then computed (will stay at zero for blocks with zero nesting - more generally, it's just the maximum of the nesting levels throughout the block), and if the resulting nesting is <= 1, the corresponding block of text is preserved. Note that we need to make the group g into a list block as we want to perform two iteration passes (one to get the max nesting, one to rejoin the characters into a block of text) -- to do it in a single pass we'd need to keep some auxiliary state in the nested loop, which is a bit less convenient in this case.

You will be better off writing a parser, especially if you use a parser generator like pyparsing. It will be more maintainable and extendable.
In fact pyparsing already implements the parser for you, you just need to write the function that filters the parser output.

I took a couple of passes at writing a single parser expression that could be used with expression.transformString(), but I had difficulty distinguishing between nested and unnested []'s at parse time. In the end I had to open up the loop in transformString and iterate over the scanString generator explicitly.
To address the question of whether [some] should be included or not based on the original question, I explored this by adding more "unnested" text at the end, using this string:
src = """[This] is some text with [some [blocks that are
nested [in a [variety] of ways]] in various places]"""
My first parser follows the original question's lead, and rejects any bracketed expression that has any nesting. My second pass takes the top level tokens of any bracketed expression, and returns them in brackets - I didn't like this solution so well, as we lose the information that "some" and "in various places" are not contiguous. So I took one last pass, and had to make a slight change to the default behavior of nestedExpr. See the code below:
from pyparsing import nestedExpr, ParseResults, CharsNotIn
# 1. scan the source string for nested [] exprs, and take only those that
# do not themselves contain [] exprs
out = []
last = 0
for tokens,start,end in nestedExpr("[","]").scanString(src):
    out.append(src[last:start])
    if not any(isinstance(tok,ParseResults) for tok in tokens[0]):
        out.append(src[start:end])
    last = end
out.append(src[last:])
print("".join(out))
# 2. scan the source string for nested [] exprs, and take only the toplevel
# tokens from each
out = []
last = 0
for t,s,e in nestedExpr("[","]").scanString(src):
    out.append(src[last:s])
    topLevel = [tok for tok in t[0] if not isinstance(tok,ParseResults)]
    out.append('['+" ".join(topLevel)+']')
    last = e
out.append(src[last:])
print("".join(out))
# 3. scan the source string for nested [] exprs, and take only the toplevel
# tokens from each, keeping each group separate
out = []
last = 0
for t,s,e in nestedExpr("[","]", CharsNotIn('[]')).scanString(src):
    out.append(src[last:s])
    for tok in t[0]:
        if isinstance(tok,ParseResults): continue
        out.append('['+tok.strip()+']')
    last = e
out.append(src[last:])
print("".join(out))
Giving:
[This] is some text with
[This] is some text with [some in various places]
[This] is some text with [some][in various places]
I hope one of these comes close to the OP's question. But if nothing else, I got to explore nestedExpr's behavior a little further.
