How to do math operations with string? - python

If I have the string calculation = '1+1x8', how can I convert it into calculation = 1+1*8? I tried something like:
for char in calculation:
    if char == 'x':
        calculation = calculation.replace('x', *)
    # and
    if char == '1':
        calculation = calculation.replace('1', 1)
This clearly doesn't work: str.replace() only accepts strings, so you can't replace a character with an integer (and a bare * isn't valid Python at all). Converting the entire string to an integer doesn't work either, because 'x' and '+' can't be converted to integers.

Let's use a more complicated string as an example: 1+12x8. What follows is a rough outline; you need to supply the implementation for each step.
First, you tokenize it, turning 1+12x8 into ['1', '+', '12', 'x', '8']. For this step you need to write a tokenizer or a lexical analyzer. This is the step where you define your operators and literals.
Next, you convert the token stream into a parse tree. Perhaps you represent the tree as an S-expression ['+', '1', ['x', '12', '8']] or [operator.add, 1, [operator.mul, 12, 8]]. This step requires writing a parser, which requires you to define things like the precedence of your operators.
Finally, you write an evaluator that reduces the parse tree to a single value, innermost subexpressions first. For the example, this takes two steps:
[operator.add, 1, [operator.mul, 12, 8]] to [operator.add, 1, 96]
[operator.add, 1, 96] to 97
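The three steps above can be sketched as a small recursive-descent (precedence-climbing) program. This is only a minimal sketch under stated assumptions: non-negative integer literals, the operators +, -, x and /, and parentheses. The names OPS, tokenize, parse and evaluate are hypothetical, chosen for illustration; the parse tree uses the [operator.add, 1, [operator.mul, 12, 8]] shape described above.

```python
import operator
import re

# Hypothetical operator table: symbol -> (precedence, function). Higher binds tighter.
OPS = {
    '+': (1, operator.add),
    '-': (1, operator.sub),
    'x': (2, operator.mul),
    '/': (2, operator.truediv),
}

def tokenize(s):
    """Step 1: '1+12x8' -> ['1', '+', '12', 'x', '8'] (whitespace is skipped)."""
    tokens = re.findall(r'\d+|[-+x/()]|\S', s)
    for t in tokens:
        if not (t.isdigit() or t in OPS or t in '()'):
            raise ValueError(f'unexpected character {t!r}')
    return tokens

def parse(tokens):
    """Step 2: build a nested-list parse tree such as [operator.add, 1, [operator.mul, 12, 8]]."""
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError('unexpected end of input')
        tok = tokens[pos]
        pos += 1
        if tok == '(':
            node = expr(1)
            if pos >= len(tokens) or tokens[pos] != ')':
                raise ValueError('missing )')
            pos += 1
            return node
        if tok.isdigit():
            return int(tok)
        raise ValueError(f'unexpected token {tok!r}')

    def expr(min_prec):
        nonlocal pos
        left = primary()
        # Consume operators that bind at least as tightly as min_prec;
        # parsing the right side with prec + 1 makes operators left-associative.
        while pos < len(tokens) and tokens[pos] in OPS and OPS[tokens[pos]][0] >= min_prec:
            prec, func = OPS[tokens[pos]]
            pos += 1
            right = expr(prec + 1)
            left = [func, left, right]
        return left

    tree = expr(1)
    if pos != len(tokens):
        raise ValueError('trailing tokens')
    return tree

def evaluate(tree):
    """Step 3: reduce the parse tree bottom-up to a single value."""
    if isinstance(tree, list):
        func, left, right = tree
        return func(evaluate(left), evaluate(right))
    return tree

print(evaluate(parse(tokenize('1+12x8'))))  # 97
```

Precedence lives only in the OPS table, so adding an operator means adding one entry there rather than changing the parser.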

You could write something like:
def parse_exp(s):
    return eval(s.replace('x', '*'))
and expand for whatever other exotic symbols you want to use.
To limit the risks of eval you can also eliminate bad characters:
import string

good = string.digits + '()/*+-x'

def parse_exp(s):
    # Drop every character that isn't a digit, an operator or a parenthesis
    s2 = ''.join(i for i in s if i in good)
    return eval(s2.replace('x', '*'))
Edit: an additional bonus is that the built-in eval function takes care of parentheses and the usual operator-precedence rules :)
Edit 2: As another user pointed out, eval can be dangerous, so only use it if your code will only ever run locally on input you trust.
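Filtering characters reduces the risk but does not remove it: '*' is allowed, so an input such as '9**9**9' still makes eval run for a very long time. A sketch of a safer alternative, assuming Python 3.8+ (for ast.Constant), parses the string with the standard ast module and evaluates only numbers, unary +/-, and the four arithmetic operators, rejecting everything else. The name safe_eval is hypothetical.

```python
import ast
import operator

# Only these operator node types are evaluated; ** and everything else is rejected.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_eval(s):
    """Evaluate +, -, *, / (with 'x' accepted for *) and parentheses, nothing else."""
    tree = ast.parse(s.replace('x', '*'), mode='eval')

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f'unsupported expression: {ast.dump(node)}')

    return walk(tree)

print(safe_eval('1+1x8'))  # 9
```

Because ast.parse only builds the syntax tree, nothing in the input is executed before the walk decides whether each node is allowed.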

Adding code to what chepner suggested:
Tokenize '1+12x8' -> ['1', '+', '12', 'x', '8'].
Apply the operators in order of precedence ('/' and 'x' first, then '+' and '-', each level left to right) to reduce the calculation to 1 + (12*8).
Return the answer.
import re
import operator

# Operators grouped by precedence, highest first; within a group,
# operators are applied left to right
precedence = [
    {'/': operator.truediv, 'x': operator.mul},
    {'+': operator.add, '-': operator.sub},
]

def op(levels, data):
    # data alternates number, operator, number, ...
    for level in levels:
        i = 1
        while i < len(data):
            if data[i] in level:
                # Replace "left op right" with its result, then look at the same index again
                result = level[data[i]](float(data[i-1]), float(data[i+1]))
                data[i-1:i+2] = [result]
            else:
                i += 2
    return float(data[0])

def func(data):
    # Tokenize: split around runs of digits and drop empty strings
    d = [i for i in re.split(r'(\d+)', data) if i]
    # Use order of operations
    return op(precedence, d)

s1 = "1+1x8"
s2 = '2-4/2+5'
s = func(s1)  # 9.0
print(s)
t = func(s2)  # 5.0
print(t)

Related

avoiding nested for loops python

I have a function which takes in expressions and replaces the variables with all combinations of the values that I am using as inputs. This code is tested and works; however, after looking through SO, I've seen people say that nested for loops are a bad idea, and I am unsure how to make this more efficient. Could somebody help? Thanks.
def replaceVar(expression):
    eval_list = list()
    a = [1, 8, 12, 13]
    b = [1, 2, 3, 4]
    c = [5, 9, 2, 7]
    for i in expression:
        first_eval = [i.replace("a", str(j)) for j in a]
        tmp = list()
        for k in first_eval:
            snd_eval = [k.replace("b", str(l)) for l in b]
            tmp2 = list()
            for m in snd_eval:
                trd_eval = [m.replace("c", str(n)) for n in c]
                tmp2.append(trd_eval)
            tmp.append(tmp2)
        eval_list.append(tmp)
    print(eval_list)
    return eval_list

print(replaceVar(['b-16+(c-(a+11))', 'a-(c-5)+a-b-10']))
Foreword
Nested loops are not a bad thing per se. They are only bad if they are used for problems for which a better algorithm is known ("better" and "bad" in terms of efficiency with respect to the input size). Sorting a list of integers, for example, is such a problem: a naive nested-loop sort needs O(n²) comparisons, whereas algorithms such as merge sort need only O(n log n).
Analyzing the Problem
The size
In your case above you have three lists, all of size 4. This gives 4 * 4 * 4 = 64 possible combinations, if a always comes before b and b before c. So you need at least 64 iterations per expression!
Your approach
In your approach there are 4 iterations over the values of a, and for each of them 4 over the values of b, and for each of those 4 over the values of c. So we have 4 * 4 * 4 = 64 iterations in total. So in fact your solution is quite good!
As there is no faster way of listing all combinations, your way is also the best one.
The style
Regarding style, you can improve your code with better variable names and by combining some of the for loops, e.g. like this:
def replaceVar(expressions):
    """
    Takes a list of expressions and returns a list of expressions with
    evaluated variables.
    """
    evaluatedExpressions = list()
    valuesOfA = [1, 8, 12, 13]
    valuesOfB = [1, 2, 3, 4]
    valuesOfC = [5, 9, 2, 7]
    for expression in expressions:
        for valueOfA in valuesOfA:
            for valueOfB in valuesOfB:
                for valueOfC in valuesOfC:
                    newExpression = expression.\
                        replace('a', str(valueOfA)).\
                        replace('b', str(valueOfB)).\
                        replace('c', str(valueOfC))
                    evaluatedExpressions.append(newExpression)
    print(evaluatedExpressions)
    return evaluatedExpressions

print(replaceVar(['b-16+(c-(a+11))', 'a-(c-5)+a-b-10']))
Notice, however, that the number of iterations remains the same!
Itertools
As Kevin noticed, you could also use itertools.product to generate the Cartesian product. It does effectively the same work as the combined for loops:
import itertools

def replaceVar(expressions):
    """
    Takes a list of expressions and returns a list of expressions with
    evaluated variables.
    """
    evaluatedExpressions = list()
    valuesOfA = [1, 8, 12, 13]
    valuesOfB = [1, 2, 3, 4]
    valuesOfC = [5, 9, 2, 7]
    for expression in expressions:
        for values in itertools.product(valuesOfA, valuesOfB, valuesOfC):
            valueOfA = values[0]
            valueOfB = values[1]
            valueOfC = values[2]
            newExpression = expression.\
                replace('a', str(valueOfA)).\
                replace('b', str(valueOfB)).\
                replace('c', str(valueOfC))
            evaluatedExpressions.append(newExpression)
    print(evaluatedExpressions)
    return evaluatedExpressions

print(replaceVar(['b-16+(c-(a+11))', 'a-(c-5)+a-b-10']))
Here are some ideas:
As your lists a, b and c are hard-coded, hard-code them as strings; then you don't have to convert every element to a string at each step.
Use list comprehensions; they are a little faster than a normal for loop with append.
Instead of .replace, use .format, which does all the replacements in a single step.
Use itertools.product to combine a, b and c.
With all that, I arrive at this:
import itertools

def replaceVar(expression):
    a = ['1', '8', '12', '13']
    b = ['1', '2', '3', '4']
    c = ['5', '9', '2', '7']
    # Prepare the expressions so they can be used with format
    expression = [exp.replace('a', '{0}').replace('b', '{1}').replace('c', '{2}')
                  for exp in expression]
    return [exp.format(*arg) for exp in expression for arg in itertools.product(a, b, c)]
The speed gain is marginal, but it is there: on my machine it goes from 148 milliseconds to 125.
The functionality is the same as R.Q.'s version.
"The problem" with nested loops is basically just that the number of levels is hard coded. You wrote nesting for 3 variables. What if you only have 2? What if it jumps to 5? Then you need non-trivial surgery on the code. That's why itertools.product() is recommended.
Relatedly, all suggestions so far hard-code the number of replace() calls. Same "problem": if you don't have exactly 3 variables, the replacement code has to be modified.
Instead of doing that, think about a cleaner way to do the replacements. For example, suppose your input string were:
s = '{b}-16+({c}-({a}+11))'
instead of:
'b-16+(c-(a+11))'
That is, the variables to be replaced are enclosed in curly braces. Then Python can do all the substitutions "at once" for you:
>>> s.format(a=333, b=444, c=555)
'444-16+(555-(333+11))'
That hard-codes the names and number of names too, but the same thing can be accomplished with a dict:
>>> d = dict(zip(["a", "b", "c"], (333, 444, 555)))
>>> s.format(**d)
'444-16+(555-(333+11))'
Now nothing about the number of variables, or their names, is hard-coded in the format() call.
The tuple of values ((333, 444, 555)) is exactly the kind of thing itertools.product() returns. The list of variable names (["a", "b", "c"]) can be created just once at the top, or even passed in to the function.
You just need a bit of code to transform your input expressions to enclose the variable names in curly braces.
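A sketch of that transformation, assuming each variable name is a single letter that never appears inside another word, that there is at least one variable, and that the expressions contain no literal braces. The function name expand_all is hypothetical. re.sub wraps each variable in braces once, and itertools.product supplies the value tuples, so neither the number nor the names of the variables are hard-coded.

```python
import itertools
import re

def expand_all(expressions, values):
    """values maps variable name -> list of candidate values (hypothetical helper)."""
    names = list(values)
    # Matches any one of the variable names, e.g. 'a|b|c'
    pattern = re.compile('|'.join(re.escape(n) for n in names))
    # 'b-16+(c-(a+11))' -> '{b}-16+({c}-({a}+11))'
    templates = [pattern.sub(lambda m: '{' + m.group(0) + '}', e) for e in expressions]
    results = []
    for template in templates:
        for combo in itertools.product(*(values[n] for n in names)):
            results.append(template.format(**dict(zip(names, combo))))
    return results

print(expand_all(['b-16+(c-(a+11))'], {'a': [1, 8], 'b': [1], 'c': [5]}))
# ['1-16+(5-(1+11))', '1-16+(5-(8+11))']
```

Adding a fourth variable only means adding a fourth key to the dict; no loop or replace() call has to change.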
So, your current structure addresses one of the inefficiencies that the solutions with itertools.product will not address. Your code is saving the intermediately substituted expressions and reusing them, rather than redoing these substitutions with each itertools.product tuple. This is good and I think your current code is efficient.
However, it is brittle and only works when substituting in exactly three variables. A dynamic programming approach can solve this issue. To do so, I'm going to slightly alter the input parameters. The function will use two inputs:
expressions - The expressions to be substituted into
replacement_map - A dictionary which provides the values to substitute for each variable
The dynamic programming function is given below:
def replace_variable(expressions, replacement_map):
    return [list(_replace_variable([e], replacement_map)) for e in expressions]

def _replace_variable(expressions, replacement_map):
    if not replacement_map:
        for e in expressions:
            yield e
    else:
        map_copy = replacement_map.copy()
        key, value_list = map_copy.popitem()
        for value in value_list:
            substituted = [e.replace(key, value) for e in expressions]
            for e in _replace_variable(substituted, map_copy):
                yield e
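With the example usage (the output order shown is for Python 3.7+, where dict.popitem() removes the most recently inserted key, so 'b' is substituted first):
expressions = ['a+b', 'a-b']
replacement_map = {
    'a': ['1', '2'],
    'b': ['3', '4']
}
print(replace_variable(expressions, replacement_map))
# [['1+3', '2+3', '1+4', '2+4'], ['1-3', '2-3', '1-4', '2-4']]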
Note that if you're using Python 3.3 or later, you can use the yield from construct instead of re-yielding each e in a for loop in _replace_variable. This function would look like:
def _replace_variable(expressions, replacement_map):
    if not replacement_map:
        yield from expressions
    else:
        map_copy = replacement_map.copy()
        key, value_list = map_copy.popitem()
        for value in value_list:
            substituted = [e.replace(key, value) for e in expressions]
            yield from _replace_variable(substituted, map_copy)

Changing all values of string to a certain value

input = ["AB0","A","BBBB"]
output = ["000","0","0000"]
Is there a function like .replace("", "") which could take in any input and give a string of zeros with the same number of characters?
There is no such built-in function, but you can easily write a list comprehension for that:
>>> input = ["AB0","A","BBBB"]
>>>
>>> ["0" * len(item) for item in input]
['000', '0', '0000']
Another way to do this (mostly for fun):
>>> input = ["AB0", "A", "BBBB"]
>>> zeros = ''.zfill
>>> [zeros(len(s)) for s in input]
['000', '0', '0000']
Note that this only works for filling with 0. If you want to fill with different characters then this method won't work.
You could use ljust or rjust to fill with different characters...
>>> input = ["AB0", "A", "BBBB"]
>>> pad = ''.ljust
>>> [pad(len(s), '1') for s in input]
['111', '1', '1111']
However, most of these are really just clever ways to do it. They aren't faster:
>>> import timeit
>>> timeit.timeit("[pad(len(s), '1') for s in input]", 'from __main__ import pad, input')
1.3355789184570312
>>> timeit.timeit("['1' * len(s) for s in input]", 'from __main__ import pad, input')
0.8812301158905029
>>> zeros = ''.zfill
>>> timeit.timeit("[zeros(len(s)) for s in input]", 'from __main__ import zeros, input')
1.110482931137085
Though, depending on your particular preferences/background, you might find one way clearer to understand than another (and that's worth something)...
FWIW, my first instinct is to use the multiplication method as proposed in Selcuk's answer, so that's probably what I find easiest to read and understand...
This will work:
input = ["AB0","A","BBBB"]
output = ["0"*len(x) for x in input]
or the same:
input = ["AB0","A","BBBB"]
output = []
for x in input:
    output.append("0"*len(x))
You can use Python's re module, like the following:
import re
input = ["AB0","A","BBBB"]
output = []
for value in input:
    # '.' matches any character except a newline
    zeros = re.sub(".", "0", value)
    output.append(zeros)
print(output)
list(map(lambda x: "0"*len(x), ["AB0","A","BBB"]))

Replace multiple elements in string with str methods

I am trying to write a function that takes a string of DNA and returns the complement. I have been trying to solve this for a while now and looked through the Python documentation but couldn't work it out. I have written the docstring for the function so you can see what the answer should look like. I have seen a similar question asked on this forum but I could not understand the answers. I would be grateful if someone could explain this using only str formatting and loops / if statements, as I have not yet studied dictionaries/lists in detail.
I tried str.replace but could not get it to work for multiple elements, tried nested if statements and this didn't work either. I then tried writing 4 separate for loops, but to no avail.
def get_complementary_sequence(dna):
    """ (str) -> str
    Return the DNA sequence that is complementary
    to the given DNA sequence.
    >>> get_complementary_sequence('AT')
    'TA'
    >>> get_complementary_sequence('GCTTAA')
    'CGAATT'
    """
    for char in dna:
        if char == 'A':
            dna = dna.replace('A', 'T')
        elif char == 'T':
            dna = dna.replace('T', 'A')
        # ...and so on
    return dna
For a problem like this, you can use string.maketrans (str.maketrans in Python 3) combined with str.translate. In Python 3:
table = str.maketrans('CGAT', 'GCTA')
print('GCTTAA'.translate(table))
# outputs CGAATT
You can map each letter to another letter with a dict.
You don't need to create a translation table with all possible combinations.
>>> M = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
>>> STR = 'CGAATT'
>>> S = "".join([M.get(c,c) for c in STR])
>>> S
'GCTTAA'
How this works:
# this returns a list of char according to your dict M
>>> L = [M.get(c,c) for c in STR]
>>> L
['G', 'C', 'T', 'T', 'A', 'A']
The join() method returns a string in which the string elements of a sequence have been joined by the separator string it is called on.
>>> sep = "-"
>>> L = ['a','b','c']
>>> sep.join(L)
'a-b-c'
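The asker wanted a solution using only loops and if statements. A minimal sketch under that constraint builds a new string one character at a time. This avoids the problem with chained str.replace calls overwriting each other: 'AT'.replace('A', 'T') gives 'TT', and a following .replace('T', 'A') then gives 'AA'. Copying characters other than A, T, C and G unchanged is an assumption of this sketch.

```python
def get_complementary_sequence(dna):
    """ (str) -> str

    Return the DNA sequence that is complementary
    to the given DNA sequence.

    >>> get_complementary_sequence('AT')
    'TA'
    >>> get_complementary_sequence('GCTTAA')
    'CGAATT'
    """
    complement = ''
    for char in dna:
        # Append each base's partner to a new string instead of
        # calling replace() on the string being read
        if char == 'A':
            complement = complement + 'T'
        elif char == 'T':
            complement = complement + 'A'
        elif char == 'C':
            complement = complement + 'G'
        elif char == 'G':
            complement = complement + 'C'
        else:
            complement = complement + char
    return complement

print(get_complementary_sequence('GCTTAA'))  # CGAATT
```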

Expanding a logical statement (multiplying out)

I am looking for a way to expand a logical expression (in a string) of the form:
'(A or B) and ((C and D) or E)'
in Python to produce a list of all positive sets, i.e.
['A and C and D',
'A and E',
'B and C and D',
'B and E']
but I have been unable to find how to do this. I have investigated pyparsing, but I cannot work out which example is relevant in this case. This may be very easy with some sort of logic manipulation but I do not know any formal logic. Any help, or a reference to a resource that might help, would be greatly appreciated.
Here's the pyparsing bit, taken from the example SimpleBool.py. First, use infixNotation (formerly known as operatorPrecedence) to define an expression grammar that supports parenthetical grouping, and recognizes precedence of operations:
from pyparsing import *

term = Word(alphas)
AND = Keyword("and")
OR = Keyword("or")
expr = infixNotation(term,
    [
        (AND, 2, opAssoc.LEFT),
        (OR, 2, opAssoc.LEFT),
    ])
sample = '(A or B) and ((C and D) or E)'
result = expr.parseString(sample)
from pprint import pprint
pprint(result.asList())
prints:
[[['A', 'or', 'B'], 'and', [['C', 'and', 'D'], 'or', 'E']]]
From this, we can see that the expression is at least parsed properly.
Next, we add parse actions to each level of the hierarchy of operations. For parse actions here, we actually pass classes, so that instead of executing functions and returning some value, the parser will call the class constructor and initializer and return a class instance for the particular subexpression:
class Operation(object):
    def __init__(self, tokens):
        self._tokens = tokens[0]
        self.assign()

    def assign(self):
        """
        function to copy tokens to object attributes
        """

    def __repr__(self):
        return self.__class__.__name__ + ":" + repr(self.__dict__)
    __str__ = __repr__

class BinOp(Operation):
    def assign(self):
        self.op = self._tokens[1]
        self.terms = self._tokens[0::2]
        del self._tokens

class AndOp(BinOp):
    pass

class OrOp(BinOp):
    pass

expr = infixNotation(term,
    [
        (AND, 2, opAssoc.LEFT, AndOp),
        (OR, 2, opAssoc.LEFT, OrOp),
    ])
sample = '(A or B) and ((C and D) or E)'
result = expr.parseString(sample)
pprint(result.asList())
returns:
[AndOp:{'terms': [OrOp:{'terms': ['A', 'B'], 'op': 'or'},
OrOp:{'terms': [AndOp:{'terms': ['C', 'D'],
'op': 'and'}, 'E'], 'op': 'or'}],
'op': 'and'}]
Now that the expression has been converted to a data structure of subexpressions, I leave it to you to do the work of adding methods to AndOp and OrOp to generate the various combinations of terms that will evaluate overall to True. (Look at the logic in the invregex.py example that inverts regular expressions for ideas on how to add generator functions to the parsed classes to generate the different combinations of terms that you want.)
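As a sketch of that last step (a different, simpler technique than the invregex.py generators, not pyparsing's own API), the nested list printed by result.asList() in the first example can be expanded directly: an 'or' node concatenates the alternatives of its operands, and an 'and' node takes their Cartesian product. The parsed structure is hard-coded below so the sketch runs without pyparsing; expand is a hypothetical name, and the sketch assumes each group holds a single operator, which infixNotation's precedence grouping produces.

```python
import itertools

def expand(node):
    """Return a list of conjunctions (each a list of terms) whose OR is equivalent to node."""
    if isinstance(node, str):
        return [[node]]
    if len(node) == 1:  # outer wrapper list from asList()
        return expand(node[0])
    op = node[1]
    parts = [expand(term) for term in node[0::2]]
    if op == 'or':
        # Any alternative from any operand
        return [conj for part in parts for conj in part]
    # 'and': choose one alternative from each operand and concatenate them
    return [[t for conj in combo for t in conj] for combo in itertools.product(*parts)]

parsed = [[['A', 'or', 'B'], 'and', [['C', 'and', 'D'], 'or', 'E']]]
print([' and '.join(conj) for conj in expand(parsed)])
# ['A and C and D', 'A and E', 'B and C and D', 'B and E']
```

This multiplies out the expression but does not simplify it, so redundant sets (as in 'A or (A and B)') are kept.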
It sounds as if you want to convert these expressions to Disjunctive Normal Form (an OR of ANDs). A canonical algorithm for doing that, which also minimizes the result, is the Quine-McCluskey algorithm; you can find some information about Python implementations thereof in the relevant Wikipedia article and in the answers to this SO question.

How do I do what strtok() does in C, in Python?

I am learning Python and trying to figure out an efficient way to tokenize a string of numbers separated by commas into a list. Well formed cases work as I expect, but less well formed cases not so much.
If I have this:
A = '1,2,3,4'
B = [int(x) for x in A.split(',')]
B results in [1, 2, 3, 4]
which is what I expect, but if the string is something more like
A = '1,,2,3,4,'
if I'm using the same list comprehension expression for B as above, I get an exception. I think I understand why (some of the "x" string values are not integers), but I'm thinking there should be a way to parse this still quite elegantly, so that tokenizing the string A works more like calling strtok(A,",\n\t") iteratively would in C.
To be clear about what I am asking: I am looking for an elegant/efficient/typical way in Python to have all of the following example cases of strings:
A='1,,2,3,\n,4,\n'
A='1,2,3,4'
A=',1,2,3,4,\t\n'
A='\n\t,1,2,3,,4\n'
return with the same list of:
B=[1,2,3,4]
via some sort of compact expression.
How about this:
A = '1, 2,,3,4 '
B = [int(x) for x in A.split(',') if x.strip()]
x.strip() trims whitespace from the string, which will make it empty if the string is all whitespace. An empty string is "false" in a boolean context, so it's filtered by the if part of the list comprehension.
Generally, I try to avoid regular expressions, but if you want to split on several different characters, they work. Try this:
import re
result = [int(x) for x in filter(None, re.split(r'[,\n\t]', A))]
Mmm, functional goodness (with a bit of generator expression thrown in):
a = "1,2,,3,4,"
print(list(map(int, filter(None, (i.strip() for i in a.split(','))))))
For full functional joy:
a = "1,2,,3,4,"
print(list(map(int, filter(None, map(str.strip, a.split(','))))))
For the sake of completeness, I will answer this seven-year-old question:
The C program that uses strtok:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char myLine[] = "This is;a-line,with pieces";
    char *p;
    for (p = strtok(myLine, " ;-,"); p != NULL; p = strtok(NULL, " ;-,"))
    {
        printf("piece=%s\n", p);
    }
    return 0;
}
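can be accomplished in Python with re.split as:
import re
myLine = "This is;a-line,with pieces"
# '+' treats a run of delimiters as one separator; the if drops the empty
# strings re.split returns for leading/trailing delimiters, as strtok would
for p in re.split(r"[ ;,-]+", myLine):
    if p:
        print("piece=" + p)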
This will work, and never raise an exception, if all the numbers are non-negative ints: isdigit() is false if there's a decimal point or a minus sign in the string, so such pieces are silently skipped.
>>> nums = ['1,,2,3,\n,4\n', '1,2,3,4', ',1,2,3,4,\t\n', '\n\t,1,2,3,,4\n']
>>> for n in nums:
...     [int(i.strip()) for i in n.split(',') if i.strip() and i.strip().isdigit()]
...
[1, 2, 3, 4]
[1, 2, 3, 4]
[1, 2, 3, 4]
[1, 2, 3, 4]
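How about this?
>>> a = "1,2,,3,4,"
>>> list(map(int, filter(None, a.split(","))))
[1, 2, 3, 4]
filter will remove all false values (i.e. empty strings), which are then mapped to int. Whitespace-only pieces such as '\n' are not removed, though, so int() raises ValueError on the question's '\n' cases.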
EDIT: Just tested this against the above posted versions, and it seems to be significantly faster, 15% or so compared to the strip() one and more than twice as fast as the isdigit() one
Why accept inferior substitutes that cannot segfault your interpreter? With ctypes you can just call the real thing! :-)
# strtok in Python
from ctypes import c_char_p, cdll, create_string_buffer
try:
    libc = cdll.LoadLibrary('libc.so.6')
except OSError:
    libc = cdll.LoadLibrary('msvcrt.dll')
libc.strtok.restype = c_char_p
# strtok writes into its input, so pass a mutable buffer, not an immutable bytes object
dat = create_string_buffer(b"1,,2,3,4")
sep = b",\n\t"
result = [libc.strtok(dat, sep)] + list(iter(lambda: libc.strtok(None, sep), None))
print(result)  # [b'1', b'2', b'3', b'4']
Why not just wrap in a try except block which catches anything not an integer?
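A minimal sketch of that suggestion: attempt int() on each comma-separated piece and skip pieces that raise ValueError. int() already ignores surrounding whitespace, so '4\n' converts cleanly while empty or whitespace-only pieces are dropped. Note that this also silently drops genuinely malformed pieces such as 'abc'. The name parse_ints is hypothetical.

```python
def parse_ints(text, sep=','):
    """Return the integers found between separators, skipping anything else."""
    result = []
    for piece in text.split(sep):
        try:
            result.append(int(piece))  # int() tolerates surrounding whitespace
        except ValueError:
            pass  # empty, blank or non-numeric piece
    return result

print(parse_ints('1,,2,3,\n,4,\n'))  # [1, 2, 3, 4]
```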
I badly needed a strtok equivalent in Python, so I wrote a simple one of my own:
def strtok(val, delim):
    token_list = [val]
    # Split every current token on each delimiter in turn,
    # dropping tokens that are empty or whitespace-only
    for key in delim:
        nList = []
        for token in token_list:
            subTokens = [x for x in token.split(key) if x.strip()]
            nList = nList + subTokens
        token_list = nList
    return token_list
I'd guess regular expressions are the way to go: http://docs.python.org/library/re.html
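A sketch of the regex route: instead of splitting, re.findall can collect the runs of characters that are not delimiters, which mirrors how strtok never returns empty tokens. The default delimiter set (comma, tab, newline, space) is an assumption chosen to cover the cases in the question, and strtok_all is a hypothetical name.

```python
import re

def strtok_all(text, delims=',\t\n '):
    """Like calling strtok(text, delims) repeatedly: all non-empty runs of non-delimiters."""
    return re.findall('[^' + re.escape(delims) + ']+', text)

A = '\n\t,1,2,3,,4\n'
print([int(x) for x in strtok_all(A)])  # [1, 2, 3, 4]
```

re.escape keeps characters such as '-' or ']' from changing the meaning of the character class if they appear in delims.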
