Regex to parse equation - python

I'm trying to parse an equation such as
5x>=7-5y+4z
into a list of tuples with python:
[('', '5', 'x', '>='), ('', '7', '', ''), ('-', '5', 'y', ''), ('+', '4', 'z', '')]
I've managed to write a pattern (pattern = "[+-]?\d*[a-z]?[><=]*") to break the equation into groups, but I have no idea how to make it return tuples.
Any help appreciated...

I think you want this:
import re
pattern = re.compile(r'([+-]?)([0-9]+)([a-z]?)([><]?=?)')
print(re.findall(pattern, '5x>=7-5y+4z'))
# [('', '5', 'x', '>='), ('', '7', '', ''), ('-', '5', 'y', ''), ('+', '4', 'z', '')]
When the pattern contains more than one capturing group, re.findall returns one tuple per match, and each tuple holds one string per group (an empty string where a group matched nothing).
I took some liberties with the interpretation of the actual regex, since I'm not sure what the expected output for other cases would be (for example, would there be a 0x term?)
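To make the role of the groups concrete, here is a minimal sketch comparing a pattern without capturing groups to the grouped pattern from this answer. The groupless pattern is a tightened variant of the one in the question (an assumption made here so that it cannot match the empty string), not the asker's exact pattern.

```python
import re

equation = '5x>=7-5y+4z'

# No capturing groups: findall returns one whole-match string per term.
flat = re.findall(r'[+-]?[0-9]+[a-z]?[><]?=?', equation)

# Four capturing groups: findall returns one 4-tuple per term.
grouped = re.findall(r'([+-]?)([0-9]+)([a-z]?)([><]?=?)', equation)

print(flat)     # ['5x>=', '7', '-5y', '+4z']
print(grouped)  # [('', '5', 'x', '>='), ('', '7', '', ''), ('-', '5', 'y', ''), ('+', '4', 'z', '')]
```

The only difference between the two calls is the parentheses, which is what turns each match into a tuple.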

Related

Split a String with multiple delimiter and get the used delimiter

I want to split a String in python using multiple delimiter. In my case I also want the delimiter which was used returned in a list of delimiters.
Example:
string = '1000+20-12+123-165-564'
(Methods which split the string and return lists with numbers and delimiter)
numbers = ['1000', '20', '12', '123', '165', '564']
delimiter = ['+', '-', '+', '-', '-']
I hope my question is understandable.
You might use re.split for this task in the following way:
import re
string = '1000+20-12+123-165-564'
elements = re.split(r'(\d+)',string) # note capturing group
print(elements) # ['', '1000', '+', '20', '-', '12', '+', '123', '-', '165', '-', '564', '']
numbers = elements[1::2] # last 2 is step, get every 2nd element, start at index 1
delimiter = elements[2:-1:2] # every 2nd element from index 2, stopping before the trailing ''
print(numbers) # ['1000', '20', '12', '123', '165', '564']
print(delimiter) # ['+', '-', '+', '-', '-']
Just capture (...) the delimiter along with matching/splitting with re.split:
import re
s = '1000+20-12+123-165-564'
parts = re.split(r'([+-])', s)
numbers, delims = parts[::2], parts[1::2]
print(numbers, delims)
['1000', '20', '12', '123', '165', '564'] ['+', '-', '+', '-', '-']
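The capture-and-slice idea can be wrapped in a small reusable function. This is only a sketch: the name split_keep_delims and its delims parameter are hypothetical, and it assumes every delimiter is a single character.

```python
import re

def split_keep_delims(text, delims='+-'):
    """Split text on any single character in delims; return (pieces, used_delims)."""
    # re.escape keeps characters such as '-' or '*' literal inside the class.
    pattern = '([' + re.escape(delims) + '])'
    parts = re.split(pattern, text)
    # Even indices hold the pieces, odd indices hold the captured delimiters.
    return parts[::2], parts[1::2]

numbers, delimiter = split_keep_delims('1000+20-12+123-165-564')
print(numbers)    # ['1000', '20', '12', '123', '165', '564']
print(delimiter)  # ['+', '-', '+', '-', '-']
```

Because the delimiter sits in a capturing group, re.split keeps it in the result list instead of discarding it.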

How to remove string quotations, commas and parenthesis from Python itertools output?

This nice script generates all 4 character permutations of the given set s, and prints them on new lines.
import itertools
s = ('7', '8', '-')
l = itertools.product(s, repeat=4)
print(*l, sep='\n')
Sample output:
...
('9', '-', '7', '8')
('9', '-', '7', '9')
('9', '-', '8', '7')
('9', '-', '8', '8')
('9', '-', '8', '9')
...
I can't figure out how to remove all single quotes, commas and left/right parenthesis.
Desired output:
...
9-78
9-79
9-87
9-88
9-89
...
Tried adding:
c = []
for i in l:
    i = i.replace(",", '')
    c.append(i)
print(*c, sep='\n')
Error: AttributeError: 'tuple' object has no attribute 'replace'
Also tried: I can't seem to find where to put the print(' '.join()) logic.
Each time you are printing a value, you can use:
for vals in l:
    print("".join([str(v) for v in vals]))
This just joins all the characters, noting that .join requires the values to be strings.
You can also use:
for vals in l:
    print(*vals)
... but that has a space between values.
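Putting the join approach together with the script from the question gives roughly the following sketch. It builds the joined strings into a list (named lines here, a name not in the original) because itertools.product returns an iterator that can only be consumed once.

```python
import itertools

s = ('7', '8', '-')

# Each item from product() is a tuple of one-character strings;
# ''.join() glues them together with no quotes, commas or parentheses.
lines = [''.join(combo) for combo in itertools.product(s, repeat=4)]

print(len(lines))            # 81
print(*lines[:3], sep='\n')  # 7777, 7778, 777- on separate lines
```

To print everything, as the original script does, use print(*lines, sep='\n').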

Creating a Tuple with a variable and Boolean Python

I'm supposed to add a variable and a boolean to a new Tuple - the actual assignment is below, with my code. I know tuples are immutable - this is the first time I've tried to make one. Additionally, I can't find anything about inserting the variable and a boolean. Thanks in advance!
My code just creates a new list. This is the desired result:
[('h', False), ('1', True), ('C', False), ('i', False), ('9', True), ('True', False), ('3.1', False), ('8', True), ('F', False), ('4', True), ('j', False)]
Assignment:
The string module provides sequences of various types of Python
characters. It has an attribute called digits that produces the string
‘0123456789’. Import the module and assign this string to the variable
nums. Below, we have provided a list of characters called chars. Using
nums and chars, produce a list called is_num that consists of tuples.
The first element of each tuple should be the character from chars,
and the second element should be a Boolean that reflects whether or
not it is a Python digit.
import string
nums = string.digits
chars = ['h', '1', 'C', 'i', '9', 'True', '3.1', '8', 'F', '4', 'j']
is_num = []
for item in chars:
    if item in string.digits:
        is_num.insert(item, bool)
    elif item not in string.digits:
        is_num.insert(item, bool)
You can use a list comprehension for this, which is like a more concise for loop that creates a new list
>>> from string import digits
>>> chars = ['h', '1', 'C', 'i', '9', 'True', '3.1', '8', 'F', '4', 'j']
>>> is_num = [(i, i in digits) for i in chars]
>>> is_num
[('h', False), ('1', True), ('C', False), ('i', False), ('9', True), ('True', False), ('3.1', False), ('8', True), ('F', False), ('4', True), ('j', False)]
This would be equivalent to the following loop:
is_num = []
for i in chars:
    is_num.append((i, i in digits))
>>> is_num
[('h', False), ('1', True), ('C', False), ('i', False), ('9', True), ('True', False), ('3.1', False), ('8', True), ('F', False), ('4', True), ('j', False)]
Note that the containment check is being done using in against string.digits
>>> digits
'0123456789'
>>> '7' in digits
True
>>> 'b' in digits
False
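The easiest way is to convert nums to a list first, i.e. nums = list(nums), before testing membership. Against a list, in compares whole elements, whereas against a string it also matches substrings such as '12'.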
import string
chars = ['h', '1', 'C', 'i', '9', 'True', '3.1', '8', 'F', '4', 'j']
nums = string.digits
nums = list(nums)
is_num = []
for char in chars:
    if char in nums:
        is_num.append((char, True))
    else:
        is_num.append((char, False))
print(is_num)
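The difference between testing against the string and against a list only shows up for inputs that are not single characters. A small sketch of that pitfall follows; the helper is_single_digit is a hypothetical name introduced here, not part of the assignment.

```python
import string

digits_str = string.digits          # '0123456789'
digits_list = list(string.digits)   # ['0', '1', ..., '9']

# Against a string, `in` is a substring test.
print('12' in digits_str)    # True
print('' in digits_str)      # True
# Against a list, `in` compares whole elements.
print('12' in digits_list)   # False

def is_single_digit(text):
    """Hypothetical helper: True only for exactly one character from string.digits."""
    return len(text) == 1 and text in string.digits

chars = ['h', '1', 'C', 'i', '9', 'True', '3.1', '8', 'F', '4', 'j']
is_num = [(c, is_single_digit(c)) for c in chars]
print(is_num)
```

For the chars list in the assignment both checks give the same result, since none of its multi-character entries happens to be a substring of '0123456789'.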

Parse equation to list of tuples in Python

I want to parse equations and get a list of tuples.
For example, when I enter
2x = 4+3y,
I want to get
[('', '2', 'x', '='), ('','4','',''), ('+','3','y','')]
This is my regex so far:
([+-]*)([0-9]+)([a-z]*)([<=>]*)
It works fine for the above query but it does not capture equations like
2 = x +3y, (where x does not have any coefficient)
How do I capture that?
(\d*)(\w*) *(=) *(\d*)(\w*) *[-+*/] *(\d*)(\w*)
How about this regex?
It separates all operands and operators. (and inside operands it also splits number and variable).
For testing the regex I normally use https://regex101.com/ so you can build regex with live changes there.
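As a rough check of what this regex captures, the sketch below runs it with re.match on a few inputs. The operator character class is written here as [-+*/] (without the | characters, which would be matched literally inside a class); that spelling is an assumption about the intent.

```python
import re

pattern = re.compile(r'(\d*)(\w*) *(=) *(\d*)(\w*) *[-+*/] *(\d*)(\w*)')

m1 = pattern.match('2 = x +3y')
print(m1.groups())  # ('2', '', '=', '', 'x', '3', 'y')

m2 = pattern.match('2x = 4+3y')
print(m2.groups())  # ('2', 'x', '=', '4', '', '3', 'y')

# The shape is fixed: exactly one operator is required on the right-hand side.
m3 = pattern.match('2x = 4')
print(m3)           # None
```

Note that the operator between the two right-hand operands is matched but not captured, and that equations with more or fewer terms than this fixed shape are not fully covered.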
If you change the quantifier on the coefficient from + (one or more) to * (zero or more) then you should get the result you are after. You will also get an empty string match due to all the quantifiers now being * but you can filter out that match.
>>> import re
>>> e1 = "2x=4+3y"
>>> e2 = "2=x+3y"
>>> re.findall("([+-]*)([0-9]*)([a-z]*)([<=>]*)", e1)
[('', '2', 'x', '='), ('', '4', '', ''), ('+', '3', 'y', ''), ('', '', '', '')]
>>> re.findall("([+-]*)([0-9]*)([a-z]*)([<=>]*)", e2)
[('', '2', '', '='), ('', '', 'x', ''), ('+', '3', 'y', ''), ('', '', '', '')]
Note: whilst this solves your direct question this is not a good approach to parsing infix equations.
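Filtering out the empty match can be sketched in two ways. The first simply drops tuples whose groups are all empty; the second is an assumed variant that adds a lookahead so the pattern never matches the empty string in the first place. Both strip spaces first, and the function names are hypothetical.

```python
import re

TERM = re.compile(r'([+-]*)([0-9]*)([a-z]*)([<=>]*)')
# Assumed variant: after the sign, a digit or letter must follow.
TERM_STRICT = re.compile(r'([+-]*)(?=[0-9a-z])([0-9]*)([a-z]*)([<=>]*)')

def parse_terms(equation):
    """Find all terms, then drop tuples in which every group is empty."""
    compact = equation.replace(' ', '')
    return [term for term in TERM.findall(compact) if any(term)]

def parse_terms_strict(equation):
    """Same result, but the lookahead prevents empty matches up front."""
    return TERM_STRICT.findall(equation.replace(' ', ''))

print(parse_terms('2 = x +3y'))
# [('', '2', '', '='), ('', '', 'x', ''), ('+', '3', 'y', '')]
print(parse_terms_strict('2x = 4+3y'))
# [('', '2', 'x', '='), ('', '4', '', ''), ('+', '3', 'y', '')]
```

Neither version validates the equation; characters the pattern does not recognise are silently skipped by findall.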

How can I split a string into tokens?

If I have a string
'x+13.5*10x-4e1'
how can I split it into the following list of tokens?
['x', '+', '13', '.', '5', '*', '10', 'x', '-', '4', 'e', '1']
Currently I'm using the shlex module:
import shlex
expr = 'x+13.5*10x-4e1'  # renamed from str to avoid shadowing the built-in
lexer = shlex.shlex(expr)
tokenList = []
for token in lexer:
    tokenList.append(str(token))
print(tokenList)
But this returns:
['x', '+', '13', '.', '5', '*', '10x', '-', '4e1']
So I'm trying to split the letters from the numbers. I'm considering taking the strings that contain both letters and numbers then somehow splitting them, but not sure about how to do this or how to add them all back into the list with the others afterwards. It's important that the tokens stay in order, and I can't have nested lists.
In an ideal world, e and E would not be recognised as letters in the same way, so
'-4e1'
would become
['-', '4e1']
but
'-4x1'
would become
['-', '4', 'x', '1']
Can anybody help?
Use the regular expression module's split() function, to split at
'\d+' -- digits (number characters) and
'\W+' -- non-word characters:
CODE:
import re
print([i for i in re.split(r'(\d+|\W+)', 'x+13.5*10x-4e1') if i])
OUTPUT:
['x', '+', '13', '.', '5', '*', '10', 'x', '-', '4', 'e', '1']
If you don't want to separate the dot (as a floating-point number in the expression) then you should use this:
[\d.]+ -- digit or dot characters (although this also allows you to write: 13.5.5)
CODE:
print([i for i in re.split(r'([\d.]+|\W+)', 'x+13.5*10x-4e1') if i])
OUTPUT:
['x', '+', '13.5', '*', '10', 'x', '-', '4', 'e', '1']
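For the "ideal world" case in the question, where 4e1 stays together but 4x1 is split, one possible sketch uses re.findall with an ordered alternation instead of re.split. The pattern and the tokenize name are assumptions, and the sketch treats e or E as part of a number only when it is directly followed by digits.

```python
import re

# Alternatives are tried left to right:
#   1. digits, optionally followed by e/E and more digits (e.g. 4e1)
#   2. a single letter
#   3. any other single non-space character (operators, the dot)
TOKEN = re.compile(r'\d+(?:[eE]\d+)?|[A-Za-z]|\S')

def tokenize(expr):
    """Return the tokens of expr in order, as a flat list of strings."""
    return TOKEN.findall(expr)

print(tokenize('x+13.5*10x-4e1'))
# ['x', '+', '13', '.', '5', '*', '10', 'x', '-', '4e1']
print(tokenize('-4x1'))
# ['-', '4', 'x', '1']
```

Signed exponents such as 4e-1 are deliberately left split here, since the question does not say how they should be treated.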
Another alternative not suggested here is to use the nltk.tokenize module.
Well, the problem is not quite that simple. I think a good way to get a robust (but, unfortunately, not so short) solution is to use Python Lex-Yacc to build a full tokenizer. Lex-Yacc is a common practice for this (not only in Python), so ready-made grammars for a simple arithmetic tokenizer may already exist (like this one), and you only have to adapt them to your specific needs.
