While reading data from an ASCII file, I find myself doing something like this:
(a, b, c1, c2, c3, d, e, f1, f2) = (float(x) for x in line.strip().split())
c = (c1, c2, c3)
f = (f1, f2)
If I have a determinate number of elements per line (which I do)¹ and only one multi-element entry to unpack, I can use something like `(a, b, *c, d, e) = ...` (extended iterable unpacking).
Even if I don't, I can of course replace one of the two multi-element entries from the example above with a starred component: (a, b, *c, d, e, f1, f2) = ....
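For instance, with made-up values for a nine-field line, the starred form handles the one variable-length group directly (a sketch, not my real data):

```python
line = "1 2 3 4 5 6 7 8 9"  # hypothetical input line
a, b, *c, d, e, f1, f2 = (float(x) for x in line.split())
f = (f1, f2)
print(c)  # [3.0, 4.0, 5.0] -- a starred target always collects into a list
print(f)  # (8.0, 9.0)
```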
As far as I can tell, the itertools are not of immediate use here.
Are there any alternatives to the three-line code above that may be considered "more pythonic" for a reason I'm probably not aware of?
¹It's determinate but still varies per line; the pattern is too complicated for NumPy's loadtxt or genfromtxt functions.
If you use such statements really often and want maximum flexibility and reusability of code instead of writing such patterns again and again, I'd propose creating a small function for it. Just put it into some module and import it (you can even import the script I created).
For usage examples, see the if __name__ == "__main__" block. The trick is to use a list of group ids to group the values of t together. The length of this id list should be at least the length of t.
I will only explain the main concepts, if you don't understand anything, just ask.
I use groupby from itertools. Even though it might not be straightforward how to use it here, I hope it will become understandable soon.
As the key function I use a method I create dynamically via a factory function. The main concept here is a "closure": the list of group ids is "attached" to the internal function get_group. Thus:
- The list is specific to each call of extract_groups_from_iterable. You can use it multiple times; no globals are used.
- The state of this list is shared between subsequent calls to the same instance of get_group (remember: functions are objects, too! So I have two instances of get_group during the execution of my script).
Besides this, I have a simple method to create either lists or scalars from the groups returned by groupby.
That's it.
from itertools import groupby

def extract_groups_from_iterable(iterable, group_ids):
    return [_make_list_or_scalar(g) for k, g in
            groupby(iterable, _get_group_id_provider(group_ids))
            ]

def _get_group_id_provider(group_ids):
    def get_group(value, group_ids=group_ids):
        return group_ids.pop(0)
    return get_group

def _make_list_or_scalar(iterable):
    list_ = list(iterable)
    return list_ if len(list_) != 1 else list_[0]

if __name__ == "__main__":
    t1 = range(9)
    group_ids1 = [1,2,3,4,5,5,6,7,8]
    a,b,c,d,e,f,g,h = extract_groups_from_iterable(t1, group_ids1)
    for varname in "abcdefgh":
        print varname, globals()[varname]
    print

    t2 = range(15)
    group_ids2 = [1,2,2,3,4,5,5,5,5,5,6,6,6,7,8]
    a,b,c,d,e,f,g,h = extract_groups_from_iterable(t2, group_ids2)
    for varname in "abcdefgh":
        print varname, globals()[varname]
Output is:
a 0
b 1
c 2
d 3
e [4, 5]
f 6
g 7
h 8
a 0
b [1, 2]
c 3
d 4
e [5, 6, 7, 8, 9]
f [10, 11, 12]
g 13
h 14
Once again, this might seem like overkill, but if it helps you reduce your code, use it.
Why not just slice a tuple?
t = tuple(float(x) for x in line.split())
c = t[2:5] #maybe t[2:-4] instead?
f = t[-2:]
demo:
>>> line = "1 2 3 4 5 6 7 8 9"
>>> t = tuple(float(x) for x in line.split())
>>> c = t[2:5] #maybe t[2:-4] instead?
>>> f = t[-2:]
>>> c
(3.0, 4.0, 5.0)
>>> t
(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
>>> c = t[2:-4]
>>> c
(3.0, 4.0, 5.0)
While we're on the topic of being pythonic: line.strip().split() can always be safely written as line.split() when line is a string. split strips the whitespace for you when you don't give it any arguments.
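A quick demonstration of that point:

```python
line = "  1.0\t2.0  3.0 \n"
print(line.split())           # ['1.0', '2.0', '3.0']
print(line.strip().split())   # identical result -- the strip() is redundant
```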
Related
I have a function which takes in expressions and replaces the variables with all the permutations of the values that I am using as inputs. This is my code, which I have tested and which works; however, after looking through SO, people have said that nested for loops are a bad idea, but I am unsure how to make this more efficient. Could somebody help? Thanks.
def replaceVar(expression):
    eval_list = list()
    a = [1, 8, 12, 13]
    b = [1, 2, 3, 4]
    c = [5, 9, 2, 7]
    for i in expression:
        first_eval = [i.replace("a", str(j)) for j in a]
        tmp = list()
        for k in first_eval:
            snd_eval = [k.replace("b", str(l)) for l in b]
            tmp2 = list()
            for m in snd_eval:
                trd_eval = [m.replace("c", str(n)) for n in c]
                tmp2.append(trd_eval)
            tmp.append(tmp2)
        eval_list.append(tmp)
    print(eval_list)
    return eval_list

print(replaceVar(['b-16+(c-(a+11))', 'a-(c-5)+a-b-10']))
Foreword
Nested loops are not a bad thing per se. They are only bad if they are used for problems for which better algorithms have been found (better and bad in terms of efficiency with regard to the input size). Sorting a list of integers, for example, is such a problem.
Analyzing the Problem
The size
In your case above you have three lists, all of size 4. This makes 4 * 4 * 4 = 64 possible combinations of them, if a comes always before b and b before c. So you need at least 64 iterations!
Your approach
In your approach we have 4 iterations for each possible value of a, 4 iterations for each possible value of b and the same for c. So we have 4 * 4 * 4 = 64 iterations in total. So in fact your solution is quite good!
As there is no faster way of listing all combinations, your way is also the best one.
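The count above can be verified directly with itertools.product, which enumerates exactly those combinations:

```python
import itertools

a = [1, 8, 12, 13]
b = [1, 2, 3, 4]
c = [5, 9, 2, 7]
combos = list(itertools.product(a, b, c))
print(len(combos))  # 64 -- one tuple per (a, b, c) combination
```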
The style
Regarding style, you can improve your code with better variable names and by combining some of the for loops, e.g. like this:
def replaceVar(expressions):
    """
    Takes a list of expressions and returns a list of expressions with
    evaluated variables.
    """
    evaluatedExpressions = list()
    valuesOfA = [1, 8, 12, 13]
    valuesOfB = [1, 2, 3, 4]
    valuesOfC = [5, 9, 2, 7]
    for expression in expressions:
        for valueOfA in valuesOfA:
            for valueOfB in valuesOfB:
                for valueOfC in valuesOfC:
                    newExpression = expression.\
                        replace('a', str(valueOfA)).\
                        replace('b', str(valueOfB)).\
                        replace('c', str(valueOfC))
                    evaluatedExpressions.append(newExpression)
    print(evaluatedExpressions)
    return evaluatedExpressions

print(replaceVar(['b-16+(c-(a+11))', 'a-(c-5)+a-b-10']))
Notice however that the number of iterations remains the same!
Itertools
As Kevin noticed, you could also use itertools to generate the cartesian product. Internally it will do the same as what you did with the combined for loops:
import itertools

def replaceVar(expressions):
    """
    Takes a list of expressions and returns a list of expressions with
    evaluated variables.
    """
    evaluatedExpressions = list()
    valuesOfA = [1, 8, 12, 13]
    valuesOfB = [1, 2, 3, 4]
    valuesOfC = [5, 9, 2, 7]
    for expression in expressions:
        for values in itertools.product(valuesOfA, valuesOfB, valuesOfC):
            valueOfA = values[0]
            valueOfB = values[1]
            valueOfC = values[2]
            newExpression = expression.\
                replace('a', str(valueOfA)).\
                replace('b', str(valueOfB)).\
                replace('c', str(valueOfC))
            evaluatedExpressions.append(newExpression)
    print(evaluatedExpressions)
    return evaluatedExpressions

print(replaceVar(['b-16+(c-(a+11))', 'a-(c-5)+a-b-10']))
Here are some ideas:
- As your lists a, b and c are hardcoded, hardcode them as strings, so you don't have to cast every element to a string at each step.
- Use list comprehensions; they are a little faster than a normal for loop with append.
- Instead of .replace, use .format; it does all the replacements for you in a single step.
- Use itertools.product to combine a, b and c.
With all that, I arrive at this:
import itertools

def replaceVar(expression):
    a = ['1', '8', '12', '13']
    b = ['1', '2', '3', '4']
    c = ['5', '9', '2', '7']
    expression = [exp.replace('a', '{0}').replace('b', '{1}').replace('c', '{2}')
                  for exp in expression]  # prepare the expressions so they can be used with format
    return [exp.format(*arg) for exp in expression for arg in itertools.product(a, b, c)]
The speed gain is marginal, but it is something; on my machine it goes from 148 milliseconds to 125.
Functionality is the same as in R.Q.'s version.
"The problem" with nested loops is basically just that the number of levels is hard coded. You wrote nesting for 3 variables. What if you only have 2? What if it jumps to 5? Then you need non-trivial surgery on the code. That's why itertools.product() is recommended.
Relatedly, all suggestions so far hard-code the number of replace() calls. Same "problem": if you don't have exactly 3 variables, the replacement code has to be modified.
Instead of doing that, think about a cleaner way to do the replacements. For example, suppose your input string were:
s = '{b}-16+({c}-({a}+11))'
instead of:
'b-16+(c-(a+11))'
That is, the variables to be replaced are enclosed in curly braces. Then Python can do all the substitutions "at once" for you:
>>> s.format(a=333, b=444, c=555)
'444-16+(555-(333+11))'
That hard-codes the names and number of names too, but the same thing can be accomplished with a dict:
>>> d = dict(zip(["a", "b", "c"], (333, 444, 555)))
>>> s.format(**d)
'444-16+(555-(333+11))'
Now nothing about the number of variables, or their names, is hard-coded in the format() call.
The tuple of values ((333, 444, 555)) is exactly the kind of thing itertools.product() returns. The list of variable names (["a", "b", "c"]) can be created just once at the top, or even passed in to the function.
You just need a bit of code to transform your input expressions to enclose the variable names in curly braces.
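One hedged sketch of that transformation, assuming the variables are whole words like a, b, c (the helper name brace_variables is made up for illustration):

```python
import re

def brace_variables(expression, names=("a", "b", "c")):
    """Wrap each bare variable name in curly braces for str.format()."""
    pattern = "|".join(re.escape(n) for n in names)
    # \b ensures we only match standalone names, not parts of longer tokens
    return re.sub(r"\b(%s)\b" % pattern, r"{\1}", expression)

print(brace_variables("b-16+(c-(a+11))"))  # {b}-16+({c}-({a}+11))
```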
So, your current structure addresses one of the inefficiencies that the solutions with itertools.product will not address. Your code is saving the intermediately substituted expressions and reusing them, rather than redoing these substitutions with each itertools.product tuple. This is good and I think your current code is efficient.
However, it is brittle and only works when substituting in exactly three variables. A dynamic programming approach can solve this issue. To do so, I'm going to slightly alter the input parameters. The function will use two inputs:
expressions - The expressions to be substituted into
replacement_map - A dictionary which provides the values to substitute for each variable
The dynamic programming function is given below:
def replace_variable(expressions, replacement_map):
    return [list(_replace_variable([e], replacement_map)) for e in expressions]

def _replace_variable(expressions, replacement_map):
    if not replacement_map:
        for e in expressions:
            yield e
    else:
        map_copy = replacement_map.copy()
        key, value_list = map_copy.popitem()
        for value in value_list:
            substituted = [e.replace(key, value) for e in expressions]
            for e in _replace_variable(substituted, map_copy):
                yield e
With the example usage:
expressions = ['a+b', 'a-b']
replacement_map = {
    'a': ['1', '2'],
    'b': ['3', '4']
}
print replace_variable(expressions, replacement_map)
# [['1+3', '1+4', '2+3', '2+4'], ['1-3', '1-4', '2-3', '2-4']]
Note that if you're using Python 3.x, you can use the yield from construct instead of iterating over the values explicitly in _replace_variable. The function would then look like:
def _replace_variable(expressions, replacement_map):
    if not replacement_map:
        yield from expressions
    else:
        map_copy = replacement_map.copy()
        key, value_list = map_copy.popitem()
        for value in value_list:
            substituted = [e.replace(key, value) for e in expressions]
            yield from _replace_variable(substituted, map_copy)
I'm going through some old code trying to understand what it does, and I came across this odd statement:
*x ,= p
p is a list in this context. I've been trying to figure out what this statement does. As far as I can tell, it just sets x to the value of p. For example:
p = [1,2]
*x ,= p
print(x)
Just gives
[1, 2]
So is this any different than x = p? Any idea what this syntax is doing?
*x ,= p is basically an obfuscated version of x = list(p) using extended iterable unpacking. The comma after x is required to make the assignment target a tuple (it could also be a list though).
*x, = p is different from x = p because the former creates a copy of p (i.e. a new list) while the latter creates a reference to the original list. To illustrate:
>>> p = [1, 2]
>>> *x, = p
>>> x == p
True
>>> x is p
False
>>> x = p
>>> x == p
True
>>> x is p
True
It's a feature that was introduced in Python 3.0 (PEP 3132). In Python 2, you could do something like this:
>>> p = [1, 2, 3]
>>> q, r, s = p
>>> q
1
>>> r
2
>>> s
3
Python 3 extended this so that one variable could hold multiple values:
>>> p = [1, 2, 3]
>>> q, *r = p
>>> q
1
>>> r
[2, 3]
This, therefore, is what is being used here. Instead of several variables each holding one value, however, it is just one variable that takes every value in the list. This is different from x = p because x = p just means that x is another name for p. Here, however, x is a new list that just happens to have the same values in it. (You may be interested in "Least Astonishment" and the Mutable Default Argument.)
Two other common ways of producing this effect are:
>>> x = list(p)
and
>>> x = p[:]
Since Python 3.3, the list object actually has a method intended for copying:
x = p.copy()
The slice is actually a very similar concept. As nneonneo pointed out, however, it works only with objects such as lists and tuples that support slices. The method you mention, however, works with any iterable: dictionaries, sets, generators, etc.
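That distinction is easy to check: the starred form (like list()) consumes any iterable, while [:] would fail on a generator or a dict:

```python
# Generators have no slice support, but starred unpacking copies them fine.
*squares, = (n * n for n in range(4))
print(squares)        # [0, 1, 4, 9]

# Iterating a dict yields its keys.
*keys, = {"a": 1, "b": 2}
print(sorted(keys))   # ['a', 'b']
```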
You should always throw these to dis and see what it throws back at you; you'll see how *x, = p is actually different from x = p:
dis('*x, = p')
1 0 LOAD_NAME 0 (p)
2 UNPACK_EX 0
4 STORE_NAME 1 (x)
While, the simple assignment statement:
dis('x = p')
1 0 LOAD_NAME 0 (p)
2 STORE_NAME 1 (x)
(Stripping off unrelated None returns)
As you can see UNPACK_EX is the different op-code between these; it's documented as:
Implements assignment with a starred target: Unpacks an iterable in TOS (top of stack) into individual values, where the total number of values can be smaller than the number of items in the iterable: one of the new values will be a list of all leftover items.
Which is why, as Eugene noted, you get a new object that's referred to by the name x and not a reference to an already existing object (as is the case with x = p).
*x, does seem very odd (the extra comma there and all) but it is required here. The left-hand side must be either a tuple or a list and, due to the quirkiness of creating a single-element tuple in Python, you need to use a trailing ,:
i = 1, # one element tuple
If you like confusing people, you can always use the list version of this:
[*x] = p
which does exactly the same thing but doesn't have that extra comma hanging around there.
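A quick check that the bracketed form behaves identically, i.e. it still produces a copy rather than a second name for the same list:

```python
p = [1, 2]
[*x] = p            # same effect as *x, = p
print(x == p)       # True  -- equal contents
print(x is p)       # False -- but a distinct list object
```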
You can understand it clearly from the example below:
L = [1, 2, 3, 4]
while L:
    temp, *L = L
    print(temp, L)
What it does is: the front variable gets the first item each time, and the remaining list is assigned back to L.
The output looks like this:
1 [2, 3, 4]
2 [3, 4]
3 [4]
4 []
Also look at this example:
x, *y, z = "python"
print(x,y,z)
Here both x and z get one letter each: the first letter of the string is assigned to x, the last letter is assigned to z, and the remaining letters are assigned to y as a list.
p ['y', 't', 'h', 'o'] n
One more example,
a, b, *c = [0,1,2,3]
print(a,b,c)
0 1 [2, 3]
Boundary case: if there is nothing left for the starred variable, it gets an empty list.
Example:
a, *b = [1]
print(a,b)
1 []
I am curious to know if there is a "pythonic" way to assign the values in a list to elements? To be clearer, I am asking for something like this:
myList = [3, 5, 7, 2]
a, b, c, d = something(myList)
So that:
a = 3
b = 5
c = 7
d = 2
I am looking for any other, better option than doing this manually:
a = myList[0]
b = myList[1]
c = myList[2]
d = myList[3]
Simply type it out:
>>> a,b,c,d = [1,2,3,4]
>>> a
1
>>> b
2
>>> c
3
>>> d
4
Python employs assignment unpacking when you have an iterable being assigned to multiple variables like above.
In Python 3.x this has been extended: you can also unpack to a number of variables that is less than the length of the iterable, using the star operator:
>>> a,b,*c = [1,2,3,4]
>>> a
1
>>> b
2
>>> c
[3, 4]
I totally agree with NDevox's answer:
a,b,c,d = [1,2,3,4]
I think it is also worth mentioning that if you only need part of the list, e.g. only the second and last element, you could do
_, a, _, b = [1,2,3,4]
a, b, c, d = myList
is what you want.
Basically, a function returns a tuple, which is similar to a list in that it is an iterable.
This works with all iterables, by the way. And you need to know the length of the iterable when using it.
One trick is to use the walrus operator (Python 3.8+) so that you still have the my_list variable, and make it a one-line operation.
>>> my_list = [a:=3, b:=5, c:=7, d:=2]
>>> a
3
>>> b
5
>>> c
7
>>> d
2
>>> my_list
[3, 5, 7, 2]
PS: Using camelCase (myList) is not pythonic either.
What's new in Python 3.8 : https://docs.python.org/3/whatsnew/3.8.html
You can also use a dictionary. This way, if you have more elements in a list, you don't have to waste time hardcoding them.
import string
arr = [1,2,3,4,5,6,7,8,9,10]
var = {let:num for num,let in zip(arr,string.ascii_lowercase)}
Now we can access variables in this dictionary like so.
var['a']
I want to use two generators in a single for loop. Something like:
for a,b,c,d,e,f in f1(arg),f2(arg):
    print a,b,c,d,e,f
Where a,b,c,d and e come from f1 and f comes from f2. I need to use the yield operator because of space constraints.
The above code, however, doesn't work. For some reason it keeps taking values (for all six variables) from f1 until it is exhausted and then starts taking values from f2.
Please let me know if this is possible and if not is there any workaround. Thank you in advance.
You can use zip (itertools.izip if you're using Python 2) and sequence unpacking:
def f1(arg):
    for i in range(10):
        yield 1, 2, 3, 4, 5

def f2(arg):
    for i in range(10):
        yield 6

arg = 1
for (a, b, c, d, e), f in zip(f1(arg), f2(arg)):
    print(a, b, c, d, e, f)
I have three collection.deques and what I need to do is to iterate over each of them and perform the same action:
for obj in deque1:
    some_action(obj)

for obj in deque2:
    some_action(obj)

for obj in deque3:
    some_action(obj)
I'm looking for some function XXX which would ideally allow me to write:
for obj in XXX(deque1, deque2, deque3):
    some_action(obj)
The important thing here is that XXX has to be efficient enough - without making a copy, silently using range(), etc. I was expecting to find it in the built-in functions, but I found nothing similar so far.
Is there such thing already in Python or I have to write a function for that by myself?
Depending on what order you want to process the items:
import itertools

for items in itertools.izip(deque1, deque2, deque3):
    for item in items:
        some_action(item)

for item in itertools.chain(deque1, deque2, deque3):
    some_action(item)
I'd recommend doing this to avoid hard-coding the actual deques or number of deques:
deques = [deque1, deque2, deque3]
for item in itertools.chain(*deques):
    some_action(item)
To demonstrate the difference in order of the above methods:
>>> a = range(5)
>>> b = range(5)
>>> c = range(5)
>>> d = [a, b, c]
>>>
>>> for items in itertools.izip(*d):
...     for item in items:
...         print item,
...
0 0 0 1 1 1 2 2 2 3 3 3 4 4 4
>>>
>>> for item in itertools.chain(*d):
...     print item,
...
0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
>>>
The answer is in itertools
itertools.chain(*iterables)
Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted. Used for treating consecutive sequences as a single sequence. Equivalent to:
def chain(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it:
            yield element
Call me crazy, but why is using itertools thought to be necessary? What's wrong with:
def perform_func_on_each_object_in_each_of_multiple_containers(func, containers):
    for container in containers:
        for obj in container:
            func(obj)

perform_func_on_each_object_in_each_of_multiple_containers(some_action, (deque1, deque2, deque3))
Even crazier: you probably are going to use this once. Why not just do:
for d in (deque1, deque2, deque3):
    for obj in d:
        some_action(obj)
What's going on there is immediately obvious without having to look at the code/docs for the long-name function or having to look up the docs for itertools.something()
Use itertools.chain(deque1, deque2, deque3)
How about zip?
for obj in zip(deque1, deque2, deque3):
    for sub_obj in obj:
        some_action(sub_obj)
Accepts a bunch of iterables, and yields the contents of each of them in sequence.
def XXX(*lists):
    for aList in lists:
        for item in aList:
            yield item
l1 = [1, 2, 3, 4]
l2 = ['a', 'b', 'c']
l3 = [1.0, 1.1, 1.2]
for item in XXX(l1, l2, l3):
    print item
1
2
3
4
a
b
c
1.0
1.1
1.2
It looks like you want itertools.chain:
"Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted. Used for treating consecutive sequences as a single sequence."
If I understand your question correctly, you can use map with the first argument set to None and all the other arguments as your lists to iterate over.
E.g. (from an IPython prompt, but you get the idea):
In [85]: p = [1,2,3,4]
In [86]: q = ['a','b','c','d']
In [87]: f = ['Hi', 'there', 'world', '.']
In [88]: for i,j,k in map(None, p,q,f):
   ....:     print i,j,k
   ....:
   ....:
1 a Hi
2 b there
3 c world
4 d .
I would simply do this :
for obj in deque1 + deque2 + deque3:
    some_action(obj)
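Note that concatenating deques with + requires Python 3.5 or later (plain lists have always supported it); a quick sketch:

```python
from collections import deque

d1, d2 = deque([1, 2]), deque([3])
combined = d1 + d2  # deque gained __add__ in Python 3.5
print(list(combined))  # [1, 2, 3]
```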
>>> a = [[],[],[]]
>>> b = [[],[],[]]
>>> for c in [*a,*b]:
...     c.append("derp")
...
>>> a
[['derp'], ['derp'], ['derp']]
>>> b
[['derp'], ['derp'], ['derp']]
>>>