Avoiding nested for loops in Python

I have a function that takes in expressions and replaces the variables with every combination of the values I am using as inputs. The code below is tested and works. However, after looking through SO, I have seen people say that nested for loops are a bad idea, and I am unsure how to make this more efficient. Could somebody help? Thanks.
def replaceVar(expression):
    eval_list = list()
    a = [1, 8, 12, 13]
    b = [1, 2, 3, 4]
    c = [5, 9, 2, 7]
    for i in expression:
        first_eval = [i.replace("a", str(j)) for j in a]
        tmp = list()
        for k in first_eval:
            snd_eval = [k.replace("b", str(l)) for l in b]
            tmp2 = list()
            for m in snd_eval:
                trd_eval = [m.replace("c", str(n)) for n in c]
                tmp2.append(trd_eval)
            tmp.append(tmp2)
        eval_list.append(tmp)
    print(eval_list)
    return eval_list

print(replaceVar(['b-16+(c-(a+11))', 'a-(c-5)+a-b-10']))

Foreword
Nested loops are not a bad thing per se. They are only bad if they are used for problems for which a better algorithm has been found (better and worse in terms of efficiency with respect to the input size). Sorting a list of integers, for example, is such a problem.
Analyzing the Problem
The size
In your case above you have three lists, all of size 4. This makes 4 * 4 * 4 = 64 possible combinations of them, if a comes always before b and b before c. So you need at least 64 iterations!
Your approach
In your approach we have 4 iterations for each possible value of a, 4 iterations for each possible value of b and the same for c. So we have 4 * 4 * 4 = 64 iterations in total for each expression. So in fact your solution is quite good!
As there is no faster way of listing all combinations, your way is also the best one.
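To make the count concrete, here is a small sketch that tallies the combinations for the lists from the question. The names per_expression and total are only illustrative.

```python
a = [1, 8, 12, 13]
b = [1, 2, 3, 4]
c = [5, 9, 2, 7]
expressions = ['b-16+(c-(a+11))', 'a-(c-5)+a-b-10']

# One result per (a, b, c) triple: 4 * 4 * 4 of them for each expression.
per_expression = sum(1 for x in a for y in b for z in c)
total = per_expression * len(expressions)

print(per_expression)  # 64
print(total)           # 128
```

Any approach that produces every substituted string has to do at least this much work, however the loops are written.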
The style
As for style, you can improve your code with better variable names and by combining some of the for loops, e.g. like this:
def replaceVar(expressions):
    """
    Takes a list of expressions and returns a list of expressions with
    evaluated variables.
    """
    evaluatedExpressions = list()
    valuesOfA = [1, 8, 12, 13]
    valuesOfB = [1, 2, 3, 4]
    valuesOfC = [5, 9, 2, 7]
    for expression in expressions:
        for valueOfA in valuesOfA:
            for valueOfB in valuesOfB:
                for valueOfC in valuesOfC:
                    newExpression = expression.\
                        replace('a', str(valueOfA)).\
                        replace('b', str(valueOfB)).\
                        replace('c', str(valueOfC))
                    evaluatedExpressions.append(newExpression)
    print(evaluatedExpressions)
    return evaluatedExpressions

print(replaceVar(['b-16+(c-(a+11))', 'a-(c-5)+a-b-10']))
Notice, however, that the number of iterations remains the same!
Itertools
As Kevin noticed, you could also use itertools to generate the cartesian product. Internally it will do the same as what you did with the combined for loops:
import itertools

def replaceVar(expressions):
    """
    Takes a list of expressions and returns a list of expressions with
    evaluated variables.
    """
    evaluatedExpressions = list()
    valuesOfA = [1, 8, 12, 13]
    valuesOfB = [1, 2, 3, 4]
    valuesOfC = [5, 9, 2, 7]
    for expression in expressions:
        for values in itertools.product(valuesOfA, valuesOfB, valuesOfC):
            valueOfA = values[0]
            valueOfB = values[1]
            valueOfC = values[2]
            newExpression = expression.\
                replace('a', str(valueOfA)).\
                replace('b', str(valueOfB)).\
                replace('c', str(valueOfC))
            evaluatedExpressions.append(newExpression)
    print(evaluatedExpressions)
    return evaluatedExpressions

print(replaceVar(['b-16+(c-(a+11))', 'a-(c-5)+a-b-10']))

Here are some ideas:
Since your lists a, b and c are hardcoded, hardcode them as strings, so you don't have to cast every element to a string at each step.
Use list comprehensions; they are a little faster than a normal for loop with append.
Instead of .replace, use .format; it does all the replacements for you in a single step.
Use itertools.product to combine a, b and c.
With all that, I arrive at this:
import itertools

def replaceVar(expression):
    a = ['1', '8', '12', '13']
    b = ['1', '2', '3', '4']
    c = ['5', '9', '2', '7']
    # prepare the expressions so they can be used with format
    expression = [exp.replace('a', '{0}').replace('b', '{1}').replace('c', '{2}')
                  for exp in expression]
    return [exp.format(*arg) for exp in expression for arg in itertools.product(a, b, c)]
The speed gain is marginal, but it is something: on my machine it goes from 148 milliseconds to 125.
The functionality is the same as in R.Q.'s version.

"The problem" with nested loops is basically just that the number of levels is hard coded. You wrote nesting for 3 variables. What if you only have 2? What if it jumps to 5? Then you need non-trivial surgery on the code. That's why itertools.product() is recommended.
Relatedly, all suggestions so far hard-code the number of replace() calls. Same "problem": if you don't have exactly 3 variables, the replacement code has to be modified.
Instead of doing that, think about a cleaner way to do the replacements. For example, suppose your input string were:
s = '{b}-16+({c}-({a}+11))'
instead of:
'b-16+(c-(a+11))'
That is, the variables to be replaced are enclosed in curly braces. Then Python can do all the substitutions "at once" for you:
>>> s.format(a=333, b=444, c=555)
'444-16+(555-(333+11))'
That hard-codes the names and number of names too, but the same thing can be accomplished with a dict:
>>> d = dict(zip(["a", "b", "c"], (333, 444, 555)))
>>> s.format(**d)
'444-16+(555-(333+11))'
Now nothing about the number of variables, or their names, is hard-coded in the format() call.
The tuple of values ((333, 444, 555)) is exactly the kind of thing itertools.product() returns. The list of variable names (["a", "b", "c"]) can be created just once at the top, or even passed in to the function.
You just need a bit of code to transform your input expressions to enclose the variable names in curly braces.
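One possible way to write that missing bit of code is sketched below. It assumes the variable names appear as whole words in the expression and that the expressions contain no literal curly braces. The function name substitute_all and the values_by_name parameter are made up for this illustration and are not part of the original question.

```python
import itertools
import re


def substitute_all(expressions, values_by_name):
    """Return every expression with every combination of values filled in.

    values_by_name is a hypothetical dict such as {'a': [1, 2], 'b': [3, 4]}.
    """
    names = list(values_by_name)
    # Match any variable name as a whole word, e.g. r"\b(a|b|c)\b".
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    results = []
    for expression in expressions:
        # 'b-16+(c-(a+11))' becomes '{b}-16+({c}-({a}+11))'
        template = pattern.sub(r"{\1}", expression)
        value_lists = (values_by_name[name] for name in names)
        for combo in itertools.product(*value_lists):
            results.append(template.format(**dict(zip(names, combo))))
    return results


print(substitute_all(['a+b', 'a-b'], {'a': [1, 2], 'b': [3, 4]}))
# ['1+3', '1+4', '2+3', '2+4', '1-3', '1-4', '2-3', '2-4']
```

Nothing in the function depends on how many variables there are or what they are called, which is the point made above.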

So, your current structure addresses one of the inefficiencies that the solutions with itertools.product will not address. Your code is saving the intermediately substituted expressions and reusing them, rather than redoing these substitutions with each itertools.product tuple. This is good and I think your current code is efficient.
However, it is brittle and only works when substituting in exactly three variables. A dynamic programming approach can solve this issue. To do so, I'm going to slightly alter the input parameters. The function will use two inputs:
expressions - The expressions to be substituted into
replacement_map - A dictionary which provides the values to substitute for each variable
The dynamic programming function is given below:
def replace_variable(expressions, replacement_map):
    return [list(_replace_variable([e], replacement_map)) for e in expressions]

def _replace_variable(expressions, replacement_map):
    if not replacement_map:
        for e in expressions:
            yield e
    else:
        map_copy = replacement_map.copy()
        key, value_list = map_copy.popitem()
        for value in value_list:
            substituted = [e.replace(key, value) for e in expressions]
            for e in _replace_variable(substituted, map_copy):
                yield e
With the example usage:
expressions = ['a+b', 'a-b']
replacement_map = {
    'a': ['1', '2'],
    'b': ['3', '4']
}
print replace_variable(expressions, replacement_map)
# [['1+3', '1+4', '2+3', '2+4'], ['1-3', '1-4', '2-3', '2-4']]
Note that if you're using Python 3.X, you can use the yield from construct instead of looping over the results just to re-yield them in _replace_variable. The function would then look like:
def _replace_variable(expressions, replacement_map):
    if not replacement_map:
        yield from expressions
    else:
        map_copy = replacement_map.copy()
        key, value_list = map_copy.popitem()
        for value in value_list:
            substituted = [e.replace(key, value) for e in expressions]
            yield from _replace_variable(substituted, map_copy)

Related

Python assigning copies of the object to variables

Not a technical question, just a matter of coding style.
To me it would make more sense for the syntax for assigning the same value to several variables to be a, b = 0 rather than a, b = 0, 0, but it is what it is. At least you can get around it with a = b = 0 if the object is a number or a string, but today I ran into a situation where I needed 10 identical lists. So I went like:
list1, list2... = big_list[:], big_list[:]....
So big and ugly. How would you do it more in accordance with the do-not-repeat-yourself principle?
You could do the following:
list1, list2, list3 = (big_list[:] for _ in range(3))
Whether that's an improvement is debatable. If you need many parallel lists, perhaps you should keep them in a collection instead of separate variables?
Personally, I would use a dictionary comprehension like
listdict = {key: big_list[:] for key in range(1, 11)}
which would give you a dictionary containing the lists, for reference using the keys 1 through 10 (i.e. listdict[1], listdict[2], etc.).
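As a small sketch of why the slice copy matters here (the names copies and aliases are only illustrative): each big_list[:] is a separate list, whereas repeating the same reference just gives several names for one list.

```python
big_list = [1, 2, 3]

# Ten independent shallow copies kept in one collection.
copies = [big_list[:] for _ in range(10)]
copies[0].append(99)
print(copies[0])   # [1, 2, 3, 99]
print(copies[1])   # [1, 2, 3]
print(big_list)    # [1, 2, 3]

# By contrast, repeating the reference shares a single list object.
aliases = [big_list] * 3
aliases[0].append(4)
print(aliases[2])  # [1, 2, 3, 4]
print(big_list)    # [1, 2, 3, 4]
```

Note that big_list[:] is a shallow copy, so nested mutable items inside the list would still be shared between the copies.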

When to drop list Comprehension and the Pythonic way?

I created a line that appends an object to a list in the following manner
>>> foo = list()
>>> def sum(a, b):
...     c = a+b; return c
...
>>> bar_list = [9,8,7,6,5,4,3,2,1,0]
>>> [foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
[None, None, None, None, None, None, None, None, None, None]
>>> foo
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
>>>
The line
[foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
would give a pylint W0106 "Expression is assigned to nothing" warning, but since I am already using the foo list to append the values I don't need to assign the list comprehension line to anything.
My question is more a matter of programming correctness.
Should I drop the list comprehension and just use a simple for statement?
>>> for i, x in enumerate(bar_list):
...     foo.append(sum(i,x))
or is there a correct way to use a list comprehension and assign it to nothing?
Answer
Thank you @user2387370, @kindall and @Martijn Pieters. As for the rest of the comments: I use append because I'm not using a list(), and I'm not using i+x because this is just a simplified example.
I left it as the following:
histogramsCtr = hist_impl.HistogramsContainer()
for index, tupl in enumerate(local_ranges_per_histogram_list):
    histogramsCtr.append(doSubHistogramData(index, tupl))
return histogramsCtr
Yes, this is bad style. A list comprehension is to build a list. You're building a list full of Nones and then throwing it away. Your actual desired result is a side effect of this effort.
Why not define foo using the list comprehension in the first place?
foo = [sum(i,x) for i, x in enumerate(bar_list)]
If it is not to be a list but some other container class, as you mentioned in a comment on another answer, write that class to accept an iterable in its constructor (or, if it's not your code, subclass it to do so), then pass it a generator expression:
foo = MyContainer(sum(i, x) for i, x in enumerate(bar_list))
If foo already has some value and you wish to append new items:
foo.extend(sum(i,x) for i, x in enumerate(bar_list))
If you really want to use append() and don't want to use a for loop for some reason then you can use this construction; the generator expression will at least avoid wasting memory and CPU cycles on a list you don't want:
any(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
But this is a good deal less clear than a regular for loop, and there's still some extra work being done: any is testing the return value of foo.append() on each iteration. You can write a function to consume the iterator and eliminate that check; the fastest way uses a zero-length collections.deque:
from collections import deque
do = deque([], maxlen=0).extend
do(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
This is actually fairly readable, but I believe it's not actually any faster than any() and requires an extra import. However, either do() or any() is a little faster than a for loop, if that is a concern.
I think it's generally frowned upon to use list comprehensions just for side-effects, so I would say a for loop is better in this case.
But in any case, couldn't you just do foo = [sum(i,x) for i, x in enumerate(bar_list)]?
You should definitely drop the list comprehension. End of.
You are confusing anyone reading your code. You are building a list for the side-effects.
You are paying CPU cycles and memory for building a list you are discarding again.
In your simplified case, you are overlooking the fact you could have used a list comprehension directly:
[sum(i,x) for i, x in enumerate(bar_list)]

What does [u'abcd', u'bcde'] mean in Python?

Used a loop to add a bunch of elements to a list with
mylist = []
for x in otherlist:
mylist.append(x[0:5])
But instead of the expected result ['x1','x2',...], I got: [u'x1', u'x2',...]. Where did the u's come from and why? Also is there a better way to loop through the other list, inserting the first six characters of each element into a new list?
The u means unicode; you probably will not need to worry about it.
mylist.extend(x[:5] for x in otherlist)
The u means unicode. It's Python's internal string representation (from version ... ?).
Most times you don't need to worry about it. (Until you do.)
The answers above already covered the "u" part: the strings are unicode strings. As for whether there's a better way to extract the first few characters from the items in a list (note that a [0:5] slice yields five characters, so use [:6] if you really want six):
>>> a = ["abcdefgh", "012345678"]
>>> b = map(lambda n: n[0:5], a)
>>> for x in b:
...     print(x)
...
abcde
01234
So, map applies a function (lambda n: n[0:5]) to each element of a and returns the results of the function for every element. More precisely, in Python 3 it returns an iterator, so the function gets called only as many times as needed (i.e. if your list has 5000 items, but you only pull 10 from the result b, lambda n: n[0:5] gets called only 10 times). In Python 2, map returns a full list, so to get this lazy behaviour there you need to use itertools.imap instead.
>>> a = [1, 2, 3]
>>> def plusone(x):
...     print("called with {}".format(x))
...     return x + 1
...
>>> b = map(plusone, a)
>>> print("first item: {}".format(b.__next__()))
called with 1
first item: 2
Of course, you can apply the function "eagerly" to every element by calling list(b), which will give you a normal list with the function applied to each element on creation.
>>> b = map(plusone, a)
>>> list(b)
called with 1
called with 2
called with 3
[2, 3, 4]

Slicing a list using a variable, in Python

Given a list
a = range(10)
You can slice it using statements such as
a[1]
a[2:4]
However, I want to do this based on a variable set elsewhere in the code. I can easily do this for the first one
i = 1
a[i]
But how do I do this for the other one? I've tried indexing with a list:
i = [2, 3, 4]
a[i]
But that doesn't work. I've also tried using a string:
i = "2:4"
a[i]
But that doesn't work either.
Is this possible?
That's what slice() is for:
a = range(10)
s = slice(2,4)
print a[s]
That's the same as using a[2:4].
Why does it have to be a single variable? Just use two variables:
i, j = 2, 4
a[i:j]
If it really needs to be a single variable you could use a tuple.
With the assignments below you are still using the same type of slicing operations you show, but now with variables for the values.
a = range(10)
i = 2
j = 4
then
print a[i:j]
[2, 3]
>>> a=range(10)
>>> i=[2,3,4]
>>> a[i[0]:i[-1]]
range(2, 4)
>>> list(a[i[0]:i[-1]])
[2, 3]
I ran across this recently, while looking up how to have the user mimic the usual slice syntax of a:b:c, ::c, etc. via arguments passed on the command line.
The argument is read as a string, and I'd rather not split on ':', pass that to slice(), etc. Besides, if the user passes a single integer i, the intended meaning is clearly a[i]. Nevertheless, slice(i) will default to slice(None,i,None), which isn't the desired result.
In any case, the most straightforward solution I could come up with was to read in the string as a variable st say, and then recover the desired list slice as eval(f"a[{st}]").
This uses the eval() builtin and an f-string where st is interpolated inside the braces. It handles precisely the usual colon-separated slicing syntax, since it just plugs in that colon-containing string as-is.
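If the string comes from a user, note that eval() will execute whatever expression it is given. As an alternative sketch (explicitly not the eval approach described above), the string can be parsed by hand. parse_index is a hypothetical helper name, and the sketch assumes the input contains only integers and colons.

```python
def parse_index(text):
    """Turn '2', '2:4', '::3' or ':' into an int or a slice object."""
    parts = text.split(":")
    if len(parts) == 1:
        # A bare integer means plain indexing, a[i], not slice(i).
        return int(parts[0])
    if len(parts) > 3:
        raise ValueError("too many ':' in %r" % text)
    # Empty fields become None, exactly as in the literal a[::3] syntax.
    return slice(*(int(p) if p.strip() else None for p in parts))


a = list(range(10))
print(a[parse_index("2:4")])  # [2, 3]
print(a[parse_index("::3")])  # [0, 3, 6, 9]
print(a[parse_index("1")])    # 1
```

This does split on ':', which the eval version avoids, but it never evaluates the input as code.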

How do I do what strtok() does in C, in Python?

I am learning Python and trying to figure out an efficient way to tokenize a string of numbers separated by commas into a list. Well formed cases work as I expect, but less well formed cases not so much.
If I have this:
A = '1,2,3,4'
B = [int(x) for x in A.split(',')]
B results in [1, 2, 3, 4]
which is what I expect, but if the string is something more like
A = '1,,2,3,4,'
If I use the same list comprehension for B as above, I get an exception. I think I understand why (some of the "x" string values are not integers), but I suspect there is a way to parse this quite elegantly, so that tokenizing the string A works more like strtok(A,",\n\t") would when called iteratively in C.
To be clear about what I am asking: I am looking for an elegant/efficient/typical way in Python to have all of the following example cases of strings:
A='1,,2,3,\n,4,\n'
A='1,2,3,4'
A=',1,2,3,4,\t\n'
A='\n\t,1,2,3,,4\n'
return with the same list of:
B=[1,2,3,4]
via some sort of compact expression.
How about this:
A = '1, 2,,3,4 '
B = [int(x) for x in A.split(',') if x.strip()]
x.strip() trims whitespace from the string, which will make it empty if the string is all whitespace. An empty string is "false" in a boolean context, so it's filtered by the if part of the list comprehension.
Generally, I try to avoid regular expressions, but if you want to split on a bunch of different things, they work. Try this:
import re
result = [int(x) for x in filter(None, re.split(r'[,\n\t]', A))]
Mmm, functional goodness (with a bit of generator expression thrown in):
a = "1,2,,3,4,"
print(list(map(int, filter(None, (i.strip() for i in a.split(','))))))
For full functional joy (string.strip only exists in Python 2; str.strip works in both Python 2 and 3):
a = "1,2,,3,4,"
print(list(map(int, filter(None, map(str.strip, a.split(','))))))
For the sake of completeness, I will answer this seven year old question:
The C program that uses strtok:
#include <stdio.h>
#include <string.h>

int main()
{
    char myLine[]="This is;a-line,with pieces";
    char *p;
    for(p=strtok(myLine, " ;-,"); p != NULL; p=strtok(NULL, " ;-,"))
    {
        printf("piece=%s\n", p);
    }
    return 0;
}
can be accomplished in Python with re.split as:
import re
myLine="This is;a-line,with pieces"
for p in re.split(r"[ ;\-,]",myLine):
    print("piece="+p)
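One difference worth keeping in mind: strtok() treats a run of delimiters as a single separator and never returns empty tokens, while re.split on a single-character class returns an empty string between adjacent delimiters. A minimal sketch of a closer match follows; strtok_like is a made-up name, and it assumes a non-empty delimiter string.

```python
import re


def strtok_like(text, delimiters):
    """Split text on runs of any delimiter character, dropping empty pieces."""
    # re.escape keeps characters such as '-' literal inside the class.
    pattern = "[" + re.escape(delimiters) + "]+"
    return [piece for piece in re.split(pattern, text) if piece]


print(strtok_like("This is;a-line,with pieces", " ;-,"))
# ['This', 'is', 'a', 'line', 'with', 'pieces']
print(strtok_like("1,,2,3,\n,4,\n", ",\n\t"))
# ['1', '2', '3', '4']
```

The `if piece` filter removes the empty strings that re.split leaves when the text starts or ends with a delimiter.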
This will work, and never raise an exception, if all the numbers are ints. The isdigit() call is false if there's a decimal point in the string.
>>> nums = ['1,,2,3,\n,4\n', '1,2,3,4', ',1,2,3,4,\t\n', '\n\t,1,2,3,,4\n']
>>> for n in nums:
...     [ int(i.strip()) for i in n.split(',') if i.strip() and i.strip().isdigit() ]
...
[1, 2, 3, 4]
[1, 2, 3, 4]
[1, 2, 3, 4]
[1, 2, 3, 4]
How about this?
>>> a = "1,2,,3,4,"
>>> map(int,filter(None,a.split(",")))
[1, 2, 3, 4]
filter will remove all false values (i.e. empty strings), which are then mapped to int.
EDIT: Just tested this against the above posted versions, and it seems to be significantly faster, 15% or so compared to the strip() one and more than twice as fast as the isdigit() one
Why accept inferior substitutes that cannot segfault your interpreter? With ctypes you can just call the real thing! :-)
# strtok in Python
from ctypes import c_char_p, cdll, create_string_buffer
try:
    libc = cdll.LoadLibrary('libc.so.6')
except OSError:  # WindowsError is an OSError, so this also covers Windows
    libc = cdll.LoadLibrary('msvcrt.dll')
libc.strtok.restype = c_char_p
dat = create_string_buffer(b"1,,2,3,4")  # mutable buffer: strtok writes into it
sep = c_char_p(b",\n\t")
result = [libc.strtok(dat, sep)] + list(iter(lambda: libc.strtok(None, sep), None))
print(result)
Why not just wrap in a try except block which catches anything not an integer?
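A minimal sketch of that idea, assuming the tokens are separated by commas and that anything int() rejects should simply be skipped. The name parse_ints is only illustrative.

```python
def parse_ints(text, separator=","):
    """Collect every token that int() accepts and silently skip the rest."""
    numbers = []
    for token in text.split(separator):
        try:
            # int() ignores surrounding whitespace, so '4\n' parses as 4.
            numbers.append(int(token))
        except ValueError:
            # Empty or whitespace-only tokens (and any other junk) end up here.
            pass
    return numbers


cases = ['1,,2,3,\n,4,\n', '1,2,3,4', ',1,2,3,4,\t\n', '\n\t,1,2,3,,4\n']
for case in cases:
    print(parse_ints(case))  # [1, 2, 3, 4] each time
```

Unlike strtok(A, ",\n\t"), this only splits on the comma, so two numbers separated by nothing but a newline would be skipped rather than parsed.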
I was desperately in need of a strtok equivalent in Python, so I wrote a simple one on my own:
def strtok(val, delim):
    token_list = []
    token_list.append(val)
    for key in delim:
        nList = []
        for token in token_list:
            subTokens = [x for x in token.split(key) if x.strip()]
            nList = nList + subTokens
        token_list = nList
    return token_list
I'd guess regular expressions are the way to go: http://docs.python.org/library/re.html
