Parsing a Chemistry Formula in Python

I am trying to solve this problem: https://leetcode.com/articles/number-of-atoms/#approach-1-recursion-accepted.
The question is: given a formula like C(Mg2(OH)4)2, return a hash table with the elements and their counts. Element names always start with an uppercase letter and may be followed by a lowercase letter.
I thought I would start by solving the simplest case: no brackets.
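def bracket_hash(formula):
    element = ""
    digits = ""  # digits seen after the current element, e.g. "12" in "C12"
    element_hash = {}
    for x in formula:
        if x.isupper():
            # A new element starts: record the previous one (count defaults to 1).
            if element != "":
                element_hash[element] = element_hash.get(element, 0) + int(digits or 1)
            element = x
            digits = ""
        elif x.islower():
            element += x
        else:
            digits += x  # collect multi-digit counts instead of one digit at a time
    if element != "":
        element_hash[element] = element_hash.get(element, 0) + int(digits or 1)
    return element_hash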
This code works perfectly fine for cases like:
print(bracket_hash("H2O"))
print(bracket_hash("CO2"))
print(bracket_hash("Mg2O4"))
print(bracket_hash("OH"))
Now I think a stack must somehow be used to handle nested brackets, as in OH(Ag3(OH)2)4: here Ag's count has to be 3*4, and the counts of O and H are each 2*4 + 1.
So far I have started with something like this:
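def formula_hash(formula):
    stack = []
    final_hash = {}
    cur = ""
    i = 0
    while i < len(formula):
        if formula[i] == '(':
            j = i
            while formula[j] != ')':
                j = j + 1
            cur = formula[i + 1:j]  # text between '(' and the first ')'
            stack.append(bracket_hash(cur))
            cur = ""
            i = j + 1
        else:
            i += 1  # other characters not handled yet; advance so the loop ends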
But now I am stuck.
I tend to get stuck when coding problems get longer and involve a mix of data structures. Here they use a hash table and a stack.
So my question is: how do I break this problem down into manageable parts and solve it? To really solve it, I have to map it onto manageable code segments. Any help would be greatly appreciated.
Thanks.

I think you can use recursion to solve this problem. Here is how your function should work:
1. Proceed as in your first function until you encounter an opening parenthesis.
2. When you encounter an opening parenthesis, find the corresponding closing parenthesis. This can be done with a counter: initialize it to 1, increment it when you encounter another opening parenthesis, and decrement it when you encounter a closing parenthesis. When the counter reaches 0, you have found the matching closing parenthesis.
3. Cut out the string between the parentheses and call the same function on it (this is the recursive step).
4. Add the values in the returned dictionary to the current dictionary, multiplied by the number that follows the closing parenthesis (1 if no number follows).
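A minimal sketch of the four steps above, assuming the input is well-formed (balanced parentheses, element symbols as described in the question). The names count_atoms and read_number are my own, not from the question:

```python
def count_atoms(formula):
    """Map each element to its count, e.g. "C(Mg2(OH)4)2" -> C:1, Mg:4, O:8, H:8."""
    counts = {}
    i, n = 0, len(formula)

    def read_number(pos):
        # Read the digits starting at pos; return (value, next_pos), defaulting to 1.
        start = pos
        while pos < n and formula[pos].isdigit():
            pos += 1
        return (int(formula[start:pos]) if pos > start else 1), pos

    while i < n:
        if formula[i] == "(":
            # Step 2: find the matching ")" with a depth counter.
            depth, j = 1, i + 1
            while depth:
                if formula[j] == "(":
                    depth += 1
                elif formula[j] == ")":
                    depth -= 1
                j += 1
            # j is now just past the matching ")".
            inner = count_atoms(formula[i + 1:j - 1])  # Step 3: recurse on the inside
            mult, i = read_number(j)
            for elem, k in inner.items():  # Step 4: merge, multiplied
                counts[elem] = counts.get(elem, 0) + k * mult
        else:
            # Step 1: an uppercase letter plus any following lowercase letters.
            j = i + 1
            while j < n and formula[j].islower():
                j += 1
            k, next_i = read_number(j)
            counts[formula[i:j]] = counts.get(formula[i:j], 0) + k
            i = next_i
    return counts

print(count_atoms("OH(Ag3(OH)2)4"))  # {'O': 9, 'H': 9, 'Ag': 12}
```

Unbalanced input would make the depth loop run off the end of the string and raise IndexError; real code would want an explicit check there.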
If you have problems implementing some parts of this solution, tell me and I will give more details.
EDIT: about the stack approach
The stack approach just simulates recursion. Instead of calling the function again with a local counter, it keeps a stack of counters. When a parenthesis opens, a new counter is pushed and counting happens in that context; when the parenthesis closes, that counter is popped and merged into the enclosing context, multiplied by the number after the closing parenthesis.
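Here is a sketch of that stack version, under the same well-formed-input assumption; count_atoms_stack is a hypothetical name. Each stack entry is the dictionary of counts for one parenthesis level:

```python
def count_atoms_stack(formula):
    stack = [{}]  # stack[0] holds the top-level counts
    i, n = 0, len(formula)

    def read_number(pos):
        # Read the digits starting at pos; return (value, next_pos), defaulting to 1.
        start = pos
        while pos < n and formula[pos].isdigit():
            pos += 1
        return (int(formula[start:pos]) if pos > start else 1), pos

    while i < n:
        c = formula[i]
        if c == "(":
            stack.append({})  # open a new counting context
            i += 1
        elif c == ")":
            inner = stack.pop()  # close the context...
            mult, i = read_number(i + 1)
            top = stack[-1]
            for elem, k in inner.items():  # ...and merge it into the enclosing one
                top[elem] = top.get(elem, 0) + k * mult
        else:
            j = i + 1
            while j < n and formula[j].islower():
                j += 1
            k, next_i = read_number(j)
            top = stack[-1]
            top[formula[i:j]] = top.get(formula[i:j], 0) + k
            i = next_i
    return stack[0]

print(count_atoms_stack("OH(Ag3(OH)2)4"))  # {'O': 9, 'H': 9, 'Ag': 12}
```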
I much prefer the recursive approach, which is more natural.

You may want to search for "python parser generator". A parser generator is a library that helps developers create parsers for any kind of formula or language (technically, any "grammar") without doing all the work from scratch.
You may have to do some reading to understand what type of grammar a chemical formula adheres to.
An interesting overview for Python is this.
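Short of pulling in a parser generator, a hand-written recursive-descent parser can follow a small grammar directly, and it splits the problem into two manageable parts: a tokenizer and a parser. The grammar in the comments is my own formulation of the syntax described in the question, not taken from any library, and TOKEN and parse are hypothetical names:

```python
import re
from collections import Counter

# Grammar (my own formulation):
#   formula := group*
#   group   := (ELEMENT | "(" formula ")") NUMBER?
TOKEN = re.compile(r"[A-Z][a-z]*|\d+|[()]")

def parse(formula):
    tokens = TOKEN.findall(formula)  # characters matching no token are silently skipped
    pos = 0

    def parse_formula():
        # formula := group*  (stops at ")" or at the end of the input)
        nonlocal pos
        total = Counter()
        while pos < len(tokens) and tokens[pos] != ")":
            total += parse_group()
        return total

    def parse_group():
        # group := (ELEMENT | "(" formula ")") NUMBER?
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = parse_formula()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
        elif tok[0].isupper():
            inner = Counter({tok: 1})
        else:
            raise ValueError(f"unexpected token {tok!r}")
        mult = 1
        if pos < len(tokens) and tokens[pos].isdigit():
            mult = int(tokens[pos])
            pos += 1
        return Counter({elem: k * mult for elem, k in inner.items()})

    result = parse_formula()
    if pos != len(tokens):
        raise ValueError("unexpected ')'")
    return dict(result)
```

Each grammar rule becomes one function, which is essentially what a parser generator would build for you from the grammar.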

Related

Getting the expressions inside a mathematical equation

I have an equation that I read from a file, something like (((2+1)*(4+5))/2). What I am looking for are the distinct mathematical expressions inside it.
In this case:
2+1
4+5
(2+1)*(4+5) and finally ((2+1)*(4+5))/2.
I started by looking at this question: How can I split a string of a mathematical expressions in python?
But I was not able to arrive at a solution.
Can you please help.

You can make a bare-bones parser by iterating through the string: each time you find an opening parenthesis, push the index just after it onto a stack. When you find a closing parenthesis, pop the last index off the stack and take a slice from there to where you are now:
stack = []
s = "(((2+1)*(4+5))/2)"
for i, c in enumerate(s):
    if c == "(":
        stack.append(i+1)
    if c == ")":
        f = stack.pop()
        print(s[f:i])
Result
2+1
4+5
(2+1)*(4+5)
((2+1)*(4+5))/2
If pop() fails (it raises IndexError on an empty list) or something is left in the stack when you're done, your parentheses are not balanced; this can be fleshed out into proper error checking.
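A sketch of that error checking, wrapping the same stack loop in a function; sub_expressions is a hypothetical name, and raising ValueError is my own choice of error signal:

```python
def sub_expressions(s):
    """Return the text inside each pair of parentheses, in the order they close."""
    stack = []  # index just after each unmatched "("
    found = []
    for i, c in enumerate(s):
        if c == "(":
            stack.append(i + 1)
        elif c == ")":
            if not stack:
                # A ")" with nothing to pop: too many closing parentheses.
                raise ValueError(f"unmatched ')' at index {i}")
            start = stack.pop()
            found.append(s[start:i])
    if stack:
        # Leftover entries: too many opening parentheses.
        raise ValueError(f"unmatched '(' at index {stack[-1] - 1}")
    return found

print(sub_expressions("(((2+1)*(4+5))/2)"))
# ['2+1', '4+5', '(2+1)*(4+5)', '((2+1)*(4+5))/2']
```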

Building IP addresses from string

Write a program that determines where to add periods to a decimal string so that the resulting string is a valid IP address. There may be more than one valid IP address corresponding to a string, in which case you should print all possibilities. For example, for "19216811", two of the nine possible IP addresses are 192.168.1.1 and 19.216.81.1.
Below is my (incomplete) solution:
def valid_ips(string):
    def is_valid_part(part):
        # Compare with the character '0', not the integer 0, to reject leading zeros.
        return len(part) == 1 or (part[0] != '0' and int(part) <= 255)
    def build_valid_ips(substring):
        result = []
        for i in range(1, min(4, len(substring))):
            part = substring[:i]
            if is_valid_part(part):
                for sub in build_valid_ips(substring[i:]):
                    result.append(part + '.' + sub)
        return result
    return build_valid_ips(string)
This is a variant of a problem in the book I'm working from, so I don't have a solution to look at. However, I have a couple of questions:
1. This solution is incorrect, as it always returns an empty list, but I'm not sure why. It seems like I'm handling the inductive step and base case just fine. Could someone point me in the right direction?
2. How can I do this better? I understand each recursive call generates a new list and multiple new strings, which adds a lot of overhead, but how can I avoid this?

Your function always returns an empty list because you never append anything to result in the bottom layer of recursion.
In build_valid_ips you only append to result when looping through values obtained from a recursive call to build_valid_ips, but that would only return values obtained by looping through further recursive calls to build_valid_ips. Somewhere the recursion has to stop, but at this level, nothing gets appended. As a result there's nothing to pass back up the recursion.
Try adding the lines
if is_valid_part(substring):
    result.append(substring)
in build_valid_ips, just after the line result = []. You should then find that your code returns a non-empty list.
However, the result is still not correct. Nowhere in your code do you enforce that there must be four parts to an IP address, so the code will generate incorrect output such as 1.9.2.1.6.8.1.1. I'll leave it up to you to modify your code to fix this.
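One possible way to make that modification, sketched here as an assumption rather than the book's solution: pass down how many parts are still needed (the parts_left parameter is my own addition), so the recursion stops exactly at the fourth part. The leading-zero rule is the usual IP convention:

```python
def valid_ips(string):
    def is_valid_part(part):
        # 1-3 digits, no leading zero (except "0" itself), value at most 255.
        return 1 <= len(part) <= 3 and (part == "0" or part[0] != "0") and int(part) <= 255

    def build_valid_ips(substring, parts_left):
        # Base case: the last part must be the whole remaining substring.
        if parts_left == 1:
            return [substring] if is_valid_part(substring) else []
        result = []
        # Take 1-3 leading digits, always leaving at least one digit for the rest.
        for i in range(1, min(4, len(substring))):
            part = substring[:i]
            if is_valid_part(part):
                for sub in build_valid_ips(substring[i:], parts_left - 1):
                    result.append(part + '.' + sub)
        return result

    return build_valid_ips(string, 4)

print(valid_ips("19216811"))
```

For "19216811" this yields the nine addresses mentioned in the question, including 192.168.1.1 and 19.216.81.1.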
As for how to improve the code, that's more a question for Code Review. For a small example such as yours, which will never run for very long, I wouldn't be too worried about generating too many lists and strings. Worry about these things only when the performance of your code becomes a problem.

Given that an IP address has to contain exactly four parts, you can use recursion to generate every grouping of the digits and then keep the four-part groupings whose parts are valid:
s = "19216811"
def ips(d, current=()):
    # Yield every way to split d into consecutive chunks of 1 to 3 digits.
    if not d:
        yield list(current)
    else:
        for i in range(1, min(3, len(d)) + 1):
            yield from ips(d[i:], current + (d[:i],))
def valid(part):
    # A part must be at most 255 and must not have a leading zero (except "0").
    return int(part) <= 255 and (part == "0" or part[0] != "0")
new_ips = ['.'.join(p) for p in ips(s) if len(p) == 4 and all(valid(x) for x in p)]
Output:
['1.92.168.11', '19.2.168.11', '19.21.68.11', '19.216.8.11', '19.216.81.1', '192.1.68.11', '192.16.8.11', '192.16.81.1', '192.168.1.1']

Trouble with top down recursive algorithm

I am trying to make word chains, but I can't get my head around recursive searching.
I want to return a list of the words required to get to the target word.
get_words_quicker returns a list of words that can be made by just changing one letter.
def dig(InWord, OutWord, Depth):
    if Depth == 0:
        return False
    else:
        d = Depth - 1;
        wordC = 0;
        wordS = [];
        for q in get_words_quicker(InWord):
            wordC+=1
            if(OutWord == q):
                return q
            wordS.append(q)
        for i in range(0,wordC):
            return dig(wordS[i],OutWord,d)
Any help/questions would be much appreciated.

ANALYSIS
There is nowhere in your code that you form a list to return. The one place where you make a list is appending to wordS, but you never return this list, and your recursive call passes only one element (a single word) from that list.
As jasonharper already pointed out, your final loop can iterate once and return whatever the recursion gives it, or it can fall off the end and return None (rather than "nothing").
You have two other returns in the code: one returns False, and the other will return q, but only if q has the same value as OutWord.
Since there is no code where you use the result or alter the return value in any way, the only possibilities for your code's return value are None, False, and OutWord.
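To illustrate the analysis above (not as a repair of your specific routine), here is a minimal sketch of a depth-limited recursive search that does form a list and passes it back up. The word set and this get_words_quicker are hypothetical stand-ins for your own helper:

```python
WORDS = {"cold", "cord", "card", "ward", "warm", "word", "worm", "corm"}  # hypothetical

def get_words_quicker(word):
    # Hypothetical stand-in: words in WORDS exactly one letter away from `word`.
    # Sorted so the search order (and thus the chain found) is deterministic.
    return sorted(w for w in WORDS
                  if len(w) == len(word) and sum(a != b for a, b in zip(w, word)) == 1)

def dig(in_word, out_word, depth):
    # Return the chain [in_word, ..., out_word] using at most `depth` one-letter
    # changes, or None if there is no such chain.
    if in_word == out_word:
        return [in_word]
    if depth == 0:
        return None
    for q in get_words_quicker(in_word):
        rest = dig(q, out_word, depth - 1)
        if rest is not None:
            # The deeper call found a chain: prepend the current word and pass it up.
            return [in_word] + rest
    # Try every neighbour before giving up, instead of returning after the first.
    return None

print(dig("cold", "warm", 4))  # ['cold', 'cord', 'card', 'ward', 'warm']
```

The depth limit is what stops the recursion from cycling between neighbouring words forever.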
REPAIR
I'm afraid I'm not sure how to fix this routine your way, since you haven't really described how you intended this code to carry out the high-level tasks you describe. The abbreviated variable names hide their purposes. wordC is a counter whose only function is to hold the length of the list returned from get_words_quicker, which len(wordS) would give you directly.
If you can clean up the code, improve the data flow and/or documentation to something that shows only one or two disruptions in logic, perhaps we can fix the remaining problems. As it stands, I hesitate to try -- you'd have my solution, not yours.

Python KeyError: 0 when working with dictionary and functions

I'm new to Python, so my question may seem easy to some, but I'm stuck on my own and need your help! This is the code that I am having trouble with:
def identify_language(sequence, **common_words):
    result = {}
    for i in common_words:
        result[i] = 0
    for i in func_op(sequence.lower()):
        for j in common_words:
            if i in common_words[j]:
                result[j] += 1
    return sort(result[0][0])
...
dictionary = {'cro':list_cro, 'eng':list_eng}
language = identify_language('I had a little lamb. It was called Billy.', **dictionary)
I am trying to identify the language based on samples stored in list_cro and list_eng (and hopefully others). I am getting KeyError: 0. sort and func_op work fine; I tested them separately. What could be the problem?
Also, if I change the order of the arguments in the function (putting the list argument first and the string second), I get a syntax error.
Thanks for listening!

At the end of the function, result should look like this: {'cro': X, 'eng': Y}, where X and Y are numbers. I don't know what your word lists contain, so I can't guess what the numbers are. Evaluating result['eng'] will produce a number, as will result['cro'], but there is no 0 key in this dictionary, which is why result[0] raises KeyError: 0.
Further, the second indexing operation will also give you issues. result['eng'][0] will give you an error because result['eng'] is a number, and you can't index into a number.
What do you expect the output of this function to look like? Where is sort defined and what is it supposed to do?
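Assuming the intent of sort(result[0][0]) was "sort the languages by count and return the name of the top one", a minimal sketch could look like this. The regex tokenizer is my stand-in for func_op, and the word lists are hypothetical samples:

```python
import re

def identify_language(sequence, **common_words):
    # Count, for each language, how many words of `sequence` appear in its sample list.
    # Assumption: a lowercase regex split stands in for the asker's func_op.
    tokens = re.findall(r"[a-z']+", sequence.lower())
    result = {lang: 0 for lang in common_words}
    for token in tokens:
        for lang, words in common_words.items():
            if token in words:
                result[lang] += 1
    # Sort the (language, count) pairs by count, highest first;
    # [0][0] is then the name of the top language.
    ranked = sorted(result.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0]

# Hypothetical sample lists, for illustration only.
list_eng = ["i", "had", "a", "little", "it", "was", "called"]
list_cro = ["je", "i", "a", "bio", "mala"]

print(identify_language('I had a little lamb. It was called Billy.',
                        cro=list_cro, eng=list_eng))  # eng
```

Indexing happens on the sorted list of pairs, not on the dictionary. As for the syntax error: Python requires a **kwargs parameter such as **common_words to come last in the parameter list, so it cannot be moved before sequence.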

Failing to understand recursion

I'm new to Python and trying to understand recursion. I'm trying to make a program that uses a recursive function to print the number of times the string 'key' is found in the string 'target', as in Problem 1 of the MIT intro course problem set. I'm having trouble figuring out how the function will run. I've read the documentation and some tutorials on it, but does anyone have any tips on how to better understand recursion so I can fix this code?
from string import *
def countR(target,key):
    numb = 0
    if target.find(key) == -1:
        print numb
    else:
        numb +=1
        return countR(target[find(target,key):],key)
countR('ajdkhkfjsfkajslfajlfjsaiflaskfal','a')

With recursion, you want to split the problem into smaller sub-problems that you can solve independently, and then combine their solutions to get the final solution.
In your case you can split the task into two parts: checking whether (and where) the first occurrence of key exists, and then counting recursively in the rest of the string.
Is there a key in there?
- No: return 0.
- Yes: remove everything up to and including that key, and say that the number of keys is 1 + the number of keys in the rest.
In code:
def countR(target,key):
    if target.find(key) == -1:
        return 0
    else:
        return 1+ countR(target[target.find(key)+len(key):],key)
Edit:
The following code then prints the desired result:
print(countR('ajdkhkfjsfkajslfajlfjsaiflaskfal','a'))

This is not how recursion works. numb is useless: every time you enter the recursion, numb is created again as 0, so it can only be 0 or 1, never the actual result you seek.
Recursion works by finding the answer to a smaller problem and using it to solve the big problem. In this case, you need to find the number of appearances of the key in the part of the string after its first appearance, and add 1 to it.
Also, you need to actually advance the slice past the match so the occurrence you just found won't be found again.
def countR(target,key):
    if target.find(key) == -1:
        return 0
    else:
        return 1+countR(target[target.find(key)+len(key):],key)
print(countR('ajdkhkfjsfkajslfajlfjsaiflaskfal','a'))

Most recursive functions that I've seen make a point of returning an interesting value upon which higher frames build. Your function doesn't do that, which is probably why it's confusing you. Here's a recursive function that gives you the factorial of an integer:
def factorial(n):
    """return the factorial of any positive integer n"""
    if n > 1:
        return n * factorial(n - 1)
    else:
        return 1 # Cheating a little bit by ignoring illegal values of n
The above function demonstrates what I'd call the "normal" kind of recursion: the value returned by inner frames is operated on by outer frames.
Your function is a little unusual in that it:
Doesn't always return a value.
Outer frames don't do anything with the returned value of inner frames.
Let's see if we can refactor it to follow a more conventional recursion pattern (try to get it on your own first):
def countR(target,key):
    idx = target.find(key)
    if idx > -1:
        # Skipping only one character past the match also counts overlapping occurrences.
        return 1 + countR(target[idx + 1:], key)
    else:
        return 0
Here, countR adds 1 each time it finds the key, and then recurs on the remainder of the string. If it doesn't find a match it still returns a value, but it does two critical things:
It doesn't change the value when added to outer frames.
It doesn't recur any further.
(OK, so the critical things are things it doesn't do. You get the picture.)

If key is not found in target, print numb; otherwise, create a new string that starts after the found occurrence (cutting away the beginning) and continue the search from there.
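A minimal sketch of that idea, assuming numb is turned into a parameter so its value survives from one call to the next instead of being reset to 0 each time:

```python
def countR(target, key, numb=0):
    # numb carries the running count down the recursion.
    # Assumes key is non-empty (an empty key would be "found" forever).
    idx = target.find(key)
    if idx == -1:
        print(numb)
        return numb
    # Cut away everything up to and including the match, then continue from there.
    return countR(target[idx + len(key):], key, numb + 1)

countR('ajdkhkfjsfkajslfajlfjsaiflaskfal', 'a')  # prints 6
```

Here the answer is built on the way down and simply handed back unchanged, whereas the 1 + countR(...) versions build it on the way back up.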
