I'm having difficulty testing Python functions that return an iterable, such as generator functions that yield, or functions that simply return an iterable, like return imap(f, some_iter) or return permutations([1,2,3]).
So with the permutations example, I expect the output of the function to be [(1, 2, 3), (1, 3, 2), ...]. So, I start testing my code.
def perm3():
    return permutations([1,2,3])

# Let's ignore test framework and such details
def test_perm3():
    assertEqual(perm3(), [(1, 2, 3), (1, 3, 2), ...])
This will not work, since perm3() is an iterable, not a
list. So we can fix this particular example.
def test_perm3():
    assertEqual(list(perm3()), [(1, 2, 3), (1, 3, 2), ...])
And this works fine. But what if I have nested iterables, that is,
iterables yielding iterables? Take the expression
product(permutations([1, 2]), permutations([3, 4])). Now this is
probably not useful, but it's clear that it will be (once the
iterators are unrolled) something like [((1, 2), (3, 4)), ((1, 2), (4, 3)), ...].
However, we cannot just wrap list around our result, as that will only
turn iterable<blah> into [iterable<blah>, iterable<blah>, ...]. Of course
I can do map(list, product(...)), but this only works for a
nesting level of 2.
So, does the Python testing community have any solution for the
problems when testing iterables? Naturally some iterables can't
be tested in this way, like if you want an infinite generator, but
still this issue should be common enough for somebody to have thought
about this.
I use KennyTM's assertRecursiveEq:
import unittest
import collections
import itertools

class TestCase(unittest.TestCase):
    def assertRecursiveEq(self, first, second, *args, **kwargs):
        """
        https://stackoverflow.com/a/3124155/190597 (KennyTM)
        """
        if (isinstance(first, collections.Iterable)
                and isinstance(second, collections.Iterable)):
            for first_, second_ in itertools.izip_longest(
                    first, second, fillvalue = object()):
                self.assertRecursiveEq(first_, second_, *args, **kwargs)
        else:
            # If first = np.nan and second = np.nan, I want them to
            # compare equal. np.isnan raises TypeErrors on some inputs,
            # so I use `first != first` as a proxy. I avoid dependency on numpy
            # as a bonus.
            if not (first != first and second != second):
                self.assertAlmostEqual(first, second, *args, **kwargs)

def perm3():
    return itertools.permutations([1,2,3])

class Test(TestCase):
    def test_perm3(self):
        self.assertRecursiveEq(perm3(),
            [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)])

if __name__ == '__main__':
    import sys
    sys.argv.insert(1, '--verbose')
    unittest.main(argv = sys.argv)
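On Python 3 the snippet above needs two changes: collections.Iterable lives in collections.abc, and itertools.izip_longest became itertools.zip_longest. Here is a minimal sketch of the same idea as a plain function. The names recursive_eq and _is_nested are hypothetical, and treating str and bytes as leaves is an added assumption (a one-character string iterates to itself, so it would otherwise recurse without end). Unlike the original, it uses exact equality and has no NaN handling.

```python
from collections.abc import Iterable
from itertools import permutations, product, zip_longest

_MISSING = object()  # sentinel: marks that one side ran out of items first


def _is_nested(obj):
    # Assumption: strings and bytes are compared as whole values, not unrolled.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def recursive_eq(first, second):
    """Return True if both arguments unroll to the same nested values."""
    if _is_nested(first) and _is_nested(second):
        # zip_longest pads the shorter side with _MISSING, so a length
        # mismatch ends up comparing a real item against the sentinel.
        return all(recursive_eq(a, b)
                   for a, b in zip_longest(first, second, fillvalue=_MISSING))
    return first == second


nested = product(permutations([1, 2]), permutations([3, 4]))
print(recursive_eq(nested, [((1, 2), (3, 4)), ((1, 2), (4, 3)),
                            ((2, 1), (3, 4)), ((2, 1), (4, 3))]))  # True
```

Inside a TestCase this could be wrapped as self.assertTrue(recursive_eq(got, expected)), at the cost of a less informative failure message than the recursive assert method gives.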
1. If the order of results doesn't matter
Use unittest.TestCase.assertItemsEqual() (renamed assertCountEqual() in Python 3). This tests that the same items are present in both the result and the reference, but ignores the order. This works on your one-level-deep example. It also works on a 2-deep example that I concocted.
2. If the order of results matters
I would suggest never casting the results of perm3() to a list. Instead, compare the elements directly as you iterate. Here's a test function that will work for your example. I added it to a subclass of unittest.TestCase:
def assertEqualIterables(self, itable1, itable2):
    # Note: zip stops at the shorter iterable, so a length mismatch goes undetected.
    for ival1, ival2 in zip(itable1, itable2):
        if "__iter__" in dir(ival1):
            self.assertEqualIterables(ival1, ival2)
        else:
            self.assertEqual(ival1, ival2)
Use it like:
def test_perm3(self):
    reference = [((1, 2), (3, 4)), ((1, 2), (4, 3)),
                 ((2, 1), (3, 4)), ((2, 1), (4, 3)),]
    self.assertEqualIterables(perm3(), reference)
You could extend your suggestion to include the type (which would allow you to distinguish between lists, tuples, etc.), like so:
def unroll(item):
    if "__iter__" in dir(item):
        return map(unroll, item), type(item)
    else:
        return item, type(item)
For example:
got = unroll(permutations([1,2]))
# ([([(1, <type 'int'>), (2, <type 'int'>)], <type 'tuple'>), ([(2, <type 'int'>), (1, <type 'int'>)], <type 'tuple'>)], <type 'itertools.permutations'>)
# note the final: <type 'itertools.permutations'>
expected = [(1, 2), (2, 1)]
assertEqual(got[0], unroll(expected)[0])  # check underlying
assertEqual(got[1], type(permutations([])))  # check type
One thing to mention, type is coarse in distinguishing between objects e.g. <type 'classobj'>...
I don't know of any standard way python programmers test iterables, but you
can simply apply your idea of map and list into a recursive function
working for any level of nestedness.
def unroll(item):
    if "__iter__" in dir(item):
        return map(unroll, item)
    else:
        return item
Then your test will actually work.
def test_product_perms():
    got = unroll(product(...))
    expected = [[[1, 2], [3, 4]], [[1, 2], [4, 3]], ...]
    assertEqual(got, expected)
However, as you can see, there is a flaw with this. When unrolling something, it
will always be turned into a list. This was desirable for iterables, but it
also applies to tuples. Therefore I had to manually convert the tuples in the expected result to lists. Hence, you can't differentiate whether the outputs are lists or tuples.
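One way around the tuple/list flattening might be to rebuild tuples as tuples and turn everything else into lists. The following is a sketch for Python 3 under that assumption. It keeps the name unroll from above, but the isinstance checks are my substitution for the "__iter__" in dir(item) test, and leaving str and bytes untouched is an extra assumption.

```python
from collections.abc import Iterable
from itertools import permutations, product


def unroll(item):
    # Leaves: anything that is not iterable, plus strings and bytes.
    if isinstance(item, (str, bytes)) or not isinstance(item, Iterable):
        return item
    unrolled = [unroll(x) for x in item]
    # Tuples stay tuples; every other iterable (generators, iterators,
    # lists, ...) becomes a list.
    if isinstance(item, tuple):
        return tuple(unrolled)
    return unrolled


got = unroll(product(permutations([1, 2]), permutations([3, 4])))
expected = [((1, 2), (3, 4)), ((1, 2), (4, 3)),
            ((2, 1), (3, 4)), ((2, 1), (4, 3))]
print(got == expected)  # True
```

With this variant the expected value can be written with the tuples left in place, although a list result and a lazy iterator result still compare as equal.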
Another problem with this naive approach is that a passing test doesn't mean
that the function works. Say that you check assertEqual(list(my_fun()), [1, 2, 3]),
while you think it returns an iterable that when "listed" is
equal to [1, 2, 3]. It might be that it did not return a lazy iterable as you
wanted; it might have returned a list or a tuple too!
Related
I have a set of tuples of length 2. Tuples in set could be in format (x,y) or reversed (y,x).
It's guaranteed that there exist one tuple (x,y) or (y,x) in the set, but I can't know in advance in what order.
I need to remove either (x,y) or (y,x) from the set, without knowing which it is.
I tried it like this:
def flexRemove(S, tup):
    try:
        S.remove(tup)
    except KeyError:
        S.remove(tuple([tup[1], tup[0]]))
S = {(6, 1), (2, 4), (3, 8), (7, 5)}
flexRemove(S, (4, 2))
The above example removes (4,2) or (2,4) from the set, as desired.
Is there more elegant or more pythonic way to achieve this (without invoking the Exception)?
You could use an if ... else statement instead of the try ... except block.
Quoting this answer to "Using try vs if in python":
So, whereas an if statement always costs you, it's nearly free to set
up a try/except block. But when an Exception actually occurs, the cost
is much higher.
Therefore if your code raises an exception too often, then the try...except syntax is less performant. Also I find the syntax below more readable, but this is more a matter of preference, if you will.
def flexRemove(S, tup):
    if tup in S:
        S.remove(tup)
    else:
        S.remove(tuple([tup[1], tup[0]]))

S = {(6, 1), (2, 4), (3, 8), (7, 5)}
flexRemove(S, (4, 2))
print(S)
You can also write this as a one-liner, like below:
def flexRemove(S, tup):
    S.remove(tup) if tup in S else S.remove(tuple([tup[1], tup[0]]))
Output:
{(6, 1), (7, 5), (3, 8)}
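If exactly one of the two orientations is guaranteed to be in the set, a set operation can remove it with neither a membership test nor an exception. A sketch, with flex_remove as a hypothetical name. Note that, unlike the versions above, it raises nothing when neither tuple is present, and it would remove both if both were present.

```python
def flex_remove(s, tup):
    # Remove tup and/or its reverse from s in place; absent items are ignored.
    s.difference_update({tup, tup[::-1]})


S = {(6, 1), (2, 4), (3, 8), (7, 5)}
flex_remove(S, (4, 2))
print(S == {(6, 1), (3, 8), (7, 5)})  # True
```

Whether the silent no-op on a missing pair is acceptable depends on how much you rely on the KeyError as a sanity check.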
I have a "large" list of tuples:
thelist=[(1,2),(1,3),(2,3)]
I want to check whether any tuple in the list starts with a 1, and if it does, print "aaa":
templist = []
for i in thelist:
    templist.append((i[0], i))
for i in templist:
    if i[0] == 1:
        print("aaa")
        break
This is rather arduous, as I have to create the templist. Is there any way I can do this:
if (1, _) in thelist:
    print("aaa")
Where _ is the universal selector. Note that the list would be very large and thus it is very costly to implement another list.
There isn't, although you can just use any:
any(i[0] == 1 for i in thelist)  # True if any tuple's first element is 1
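any() stops at the first match, and the generator expression builds no intermediate list, so nothing like templist is needed. A small sketch follows. The name first_match is hypothetical, and next() with a default is just one way to also get hold of the matching tuple.

```python
thelist = [(1, 2), (1, 3), (2, 3)]

# True as soon as one tuple starts with 1; later tuples are not inspected.
if any(t[0] == 1 for t in thelist):
    print("aaa")

# Same lazy scan, but returns the first matching tuple (or None if none match).
first_match = next((t for t in thelist if t[0] == 1), None)
print(first_match)  # (1, 2)
```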
If you don't need the tuple itself, as in your example, then you can use tuple unpacking for exactly that purpose:
>>> the_list = [(1, 2), (1, 3), (2, 3)]
>>> for x, y in the_list:
...     if x == 1:
...         print('aaa')
...         break
...
aaa
If you add a * in front of the y, you can also unpack tuples of different sizes, collecting the remainder of the tuple:
>>> other_list = [(1, 2, 3, 4, 5), (1, 3), (2, 3)]
>>> for x, *y in other_list:
...     if x == 1:
...         print(y)
...         break
...
[2, 3, 4, 5]
Otherwise, if you just want to filter your list based on some condition and then do something with the filtered items, you can use filter with a custom function:
>>> def starts_with_one(x):
...     return x[0] == 1
...
>>> the_list = [(1, 2), (1, 3), (2, 3)]
>>> for x in filter(starts_with_one, the_list):
...     print(x)
...
(1, 2)
(1, 3)
This is probably the most flexible way, and it also avoids creating a separate list in memory, as the elements are filtered lazily when you iterate over the list with your loop.
Finally, if you just want to figure out whether any of your items starts with a 1, like you do in your example code, then you could just do it like this:
>>> if any(filter(starts_with_one, the_list)):
...     print('aaa')
...
aaa
But I assume that this was just an oversimplified example.
I was trying to write a function that inputs a nested tuple and returns a tuple where all the elements are backwards, including those elements in other tuples (basically mirrors it).
So with this input:
((1, (2, 3)), (4, 5))
It should return:
((5, 4), ((3, 2), 1))
What I tried
def mirror(t):
    n = 1
    for i in t:
        if isinstance(i, tuple):
            mirror(i)
        if n == len(t):
            t = list(t)
            t = t[::-1]
            t = tuple(t)
        n += 1
    return t
Maybe I'm missing something, but I think it can be done relatively simply:
def mirror(data):
    if not isinstance(data, tuple):
        return data
    return tuple(map(mirror, reversed(data)))

>>> mirror(((1, (2, 3)), (4, 5)))
((5, 4), ((3, 2), 1))
This applies the mirror function to every element in the tuple, combining them into one new tuple in reverse order.
The trickiness of this problem lies in the fact that tuple objects are immutable. One solution I can think of is recursively building each piece in the final reversed result, and then using itertools to join them together.
from itertools import chain

def mirror(data):
    r = []
    for t in reversed(data):
        if isinstance(t, tuple):
            t = mirror(t)
        r.append((t, ))
    return tuple(chain.from_iterable(r))

>>> mirror(((1, (2, 3)), (4, 5)))
((5, 4), ((3, 2), 1))
Thanks to Chris_Rands for the improvement.
Here's a simpler solution, courtesy PM2 Ring -
def mirror(t):
    return tuple(mirror(u) for u in t[::-1]) if isinstance(t, tuple) else t

>>> mirror(((1, (2, 3)), (4, 5)))
((5, 4), ((3, 2), 1))
It builds the result tuple recursively, but using a generator expression.
This type of structure, a tuple inside a tuple, is called a hierarchical structure. Its property is that the whole structure is assembled from small structures which resemble the large structure and are themselves assembled from even smaller structures.
Imagine a tree whose branches resemble the whole tree, with leaves at the tips. The first thing is to distinguish branches from leaves. If you see a branch, you treat it as a smaller tree (this naturally forms a recursion). If you see a leaf, that means you have reached the tip of the structure and you can return it (the base case of the recursion).
To go from a bigger branch to smaller branches (the recursive step), there are generally two recursive approaches. The first is what I did: split the branch into left and right and go along each of them. The other way is to map over each branch, as was done by khelwood.
def mirror(T):
    if not isinstance(T, tuple):
        return T
    elif T == ():
        return ()
    else:
        return mirror(T[1:]) + (mirror(T[0]),)

print(mirror(((1,(2,3)),(4,5))))
Couldn't help myself :)
(This is a joke of course, but it has the added benefit of reversing the digits ;)
def rev(s, i, acc):
    if i == len(s):
        return acc
    ps = {'(': ')', ')': '('}
    return rev(s, i + 1, s[i] + acc) if s[i] not in ps else rev(s, i + 1, ps[s[i]] + acc)

def funnyMirror(t):
    return eval(rev(str(t), 0, ''))

print(funnyMirror(((1, (2, 83)), (4, 5))))  # ((5, 4), ((38, 2), 1))
I'm trying to find duplicates in a list. I want to preserve the values and insert them into a tuple with their number of occurrences.
For example:
list_of_n = [2, 3, 5, 5, 5, 6, 2]
occurance_of_n = zip(set(list_of_n), [list_of_n.count(n) for n in set(list_of_n)])
[(2, 2), (3, 1), (5, 3), (6, 1)]
This works fine with small sets. My question is: as list_of_n gets larger, will I have to worry about arg1 and arg2 in zip(arg1, arg2) not lining up correctly if they're the same set?
I.e. Is there a conceivable future where I call zip() and it accidentally aligns index [0] of list_of_n in arg1 with some other index of list_of_n in arg2?
(in case it's not clear, I'm converting the list to a set for purposes of speed in arg2, and under the pretense that zip will behave better if they're the same in arg1)
Since your sample output preserves the order of appearance, you might want to go with a collections.OrderedDict to gather the counts:
from collections import OrderedDict

list_of_n = [2, 3, 5, 5, 5, 6, 2]
d = OrderedDict()
for x in list_of_n:
    d[x] = d.get(x, 0) + 1
occurance_of_n = list(d.items())
# [(2, 2), (3, 1), (5, 3), (6, 1)]
If order does not matter, the appropriate approach is using a collections.Counter:
from collections import Counter

occurance_of_n = list(Counter(list_of_n).items())
Note that both approaches require only one iteration of the list. Your version could be amended to something like:
occurance_of_n = list(set((n, list_of_n.count(n)) for n in set(list_of_n)))
# [(6, 1), (3, 1), (5, 3), (2, 2)]
but each of the repeated calls to list.count makes an entire iteration of the initial list, once per unique element.
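As for the alignment worry: building each (value, count) pair from a single mapping sidesteps it, since nothing depends on two separate iterations of a set agreeing on order. A sketch with collections.Counter follows. It assumes Python 3.7+, where a Counter, being a dict subclass, keeps first-seen order. The names pairs and by_frequency are hypothetical.

```python
from collections import Counter

list_of_n = [2, 3, 5, 5, 5, 6, 2]
counts = Counter(list_of_n)  # one pass over the list

# Value and count come from the same mapping entry, so they cannot misalign.
pairs = list(counts.items())
print(pairs)  # [(2, 2), (3, 1), (5, 3), (6, 1)]

# most_common() sorts by count, highest first.
by_frequency = counts.most_common()
print(by_frequency[0])  # (5, 3)
```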
("dezip" is obviously a bad name, but I'm not sure what the right one would be. Please excuse me if that means I've missed an authoritative answer by not knowing what to search for.)
Let's say we have
people = [
(1, 'anne'),
(2, 'ben'),
(3, 'charlie'),
]
(common in django for choices etc.)
Now we want a list of "keys" or list of the first elements: [1, 2, 3]
In python 3 I'm using
people_ids, _ = list(zip(*people))
# or even
people_ids = [p[0] for p in people]
The zip way doesn't seem very neat, particularly with the extra list(...) needed now that Python 3 makes zip an iterator.
The second approach, the comprehension, is slightly more readable but wouldn't generalise as well, e.g. to return lists of the second, third elements etc. in the same call.
Is there a better way?
(where "better" mainly means clean and readable, but performance might also be a consideration)
Using next, you can get the first item from the iterable:
>>> people = [
... (1, 'anne'),
... (2, 'ben'),
... (3, 'charlie'),
... ]
>>> next(zip(*people))
(1, 2, 3)
Alternative using map with operator.itemgetter:
>>> import operator
>>> list(map(operator.itemgetter(0), people))
[1, 2, 3]
BTW, the zip solution should work without list:
>>> people_ids, _ = zip(*people)
>>> people_ids
(1, 2, 3)
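zip(*people) transposes the rows into columns, so every column can be picked up in one call. A sketch follows. Note that the unpacking on the left needs as many names as there are columns, and that it fails with ValueError on an empty list.

```python
people = [
    (1, 'anne'),
    (2, 'ben'),
    (3, 'charlie'),
]

# Each result is a tuple holding one column of the original rows.
people_ids, people_names = zip(*people)
print(people_ids)    # (1, 2, 3)
print(people_names)  # ('anne', 'ben', 'charlie')

# With no rows there are no columns to unpack.
try:
    ids, names = zip(*[])
except ValueError:
    print("empty input")
```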
For the particular example, you can "abuse" a little bit the dict:
people = [
(1, 'anne'),
(2, 'ben'),
(3, 'charlie'),
]
d_people = dict(people)
And then you have a nice data model. This has some problems: keys cannot be repeated, and it won't work with tuples of more than two elements. But for this case (which is quite typical!) it works very nicely.
Then you can simply get the keys by calling keys():
d_people.keys()
or, explicit list, either:
list(d_people)
list(d_people.keys())
which are equivalent.
Getting a subset of fields by index can be done with operator.itemgetter. Getting a subslice of each tuple can be done by creating an explicit slice object and passing it to a function.
import operator
people = [
    (1, 'anne', 'some'),
    (2, 'ben', 'another'),
    (3, 'charlie', 'field'),
]
people_ids = [p[0] for p in people] # 0 may be passed as function argument
people_ids_and_another = [operator.itemgetter(*[0, 2])(p) for p in people] # [0, 2] may be passed as function argument
people_ids_and_name_via_slice = [p[slice(0,2,None)] for p in people] # equal to p[0:2], but passable as argument
To demonstrate function usage:
def dezip(seq, what):
    if isinstance(what, list):
        # several indices: itemgetter picks the selected fields from each row
        return [operator.itemgetter(*what)(p) for p in seq]
    else:
        # a single index or a slice object
        return [p[what] for p in seq]

assert dezip(people, slice(0,2,None)) == [(1, 'anne'), (2, 'ben'), (3, 'charlie')]
assert dezip(people, 0) == [1, 2, 3]
assert dezip(people, [0, 2]) == [(1, 'some'), (2, 'another'), (3, 'field')]
If you drop the 'list of indices' requirement, you can drop the if statement in the function body.