Python: determining whether any item in a sequence is equal to any other

I'd like to compare multiple objects and return True only if the objects are pairwise unequal. I tried the code below, but it doesn't work: if obj1 and obj3 are equal and obj2 and obj3 are not equal, the result is True.
obj1 != obj2 != obj3
I have more than 3 objects to compare. Using the code below is out of the question:
all([obj1 != obj2, obj1 != obj3, obj2 != obj3])

@Michael Hoffman's answer is good if the objects are all hashable. If not, you can use itertools.combinations:
>>> import itertools
>>> all(a != b for a, b in itertools.combinations(['a', 'b', 'c', 'd', 'a'], 2))
False
>>> all(a != b for a, b in itertools.combinations(['a', 'b', 'c', 'd'], 2))
True

If the objects are all hashable, then you can see whether a frozenset of the sequence of objects has the same length as the sequence itself:
def all_different(objs):
    return len(frozenset(objs)) == len(objs)
Example:
>>> all_different([3, 4, 5])
True
>>> all_different([3, 4, 5, 3])
False

If the objects are unhashable but orderable (for example, lists) then you can transform the itertools solution from O(n^2) to O(n log n) by sorting:
def all_different(*objs):
    s = sorted(objs)
    return all(x != y for x, y in zip(s[:-1], s[1:]))
Here's a full implementation:
import itertools

def all_different(*objs):
    try:
        return len(frozenset(objs)) == len(objs)
    except TypeError:
        try:
            s = sorted(objs)
            return all(x != y for x, y in zip(s[:-1], s[1:]))
        except TypeError:
            return all(x != y for x, y in itertools.combinations(objs, 2))
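For example, exercising the three branches (the dict arguments are just a convenient unhashable, unorderable case):
>>> all_different(3, 4, 5)
True
>>> all_different([3], [4], [3])
False
>>> all_different({'a': 1}, {'b': 2}, {'a': 1})
False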

from itertools import combinations
all(x != y for x, y in combinations(objs, 2))

You can check that all of the items in a list are unique by converting it to a set.
my_obs = [obj1, obj2, obj3]
all_not_equal = len(set(my_obs)) == len(my_obs)


A tuple that only accepts some group of letters

I was trying to create a function that only receives tuples whose elements contain only the letters C, B, E, D, with arguments like CEE, DDBBB, ECDBE, or CCCCB. The input was going to be tup = ('CEE', 'DDBBB', 'ECDBE', 'CCCCB'), and other functions that I created should convert each element into a number that represents a position.
def obter_pin(tup):
    pin = ()
    posicao = 5
    if not 4 <= len(tup) <= 10 or 'CBED' not in tup:
        raise ValueError('obter pin: argumento invalido')
    else:
        for ele in tup:
            dig = obter_digito(ele, posicao)
            posicao = dig
            pin += (dig,)
        return pin
Using set comparison:
>>> allow = {'C', 'B', 'E', 'D'}
>>> tup1 = ('CEE', 'DDBBB', 'ECDBE', 'CCCCB')
>>> tup2 = ('CEE', 'DDBBB', 'ECDBE', 'CCCCBA')
>>> allow >= set().union(*tup1)
True
>>> allow >= set().union(*tup2)
False
You can use this one-line function.
Function definition:
def myfunction(t):
    return all(set(e).issubset('CBED') for e in t)
Function in use:
tup1 = ('CEE', 'DDBBB', 'ECDBE', 'CCCCB')
tup2 = ('CEE', 'DDBBB', 'ECDBE', 'GOO')
tup3 = ('CEE', 'DDBBB', 'ECDBE', 'ALPHA')
print(myfunction(tup1)) # True
print(myfunction(tup2)) # False
print(myfunction(tup3)) # False
Even though I like @Jab's solution much better, I just wanted to add another way of going about it:
invalid = any(s.replace('C', '').replace('B', '').replace('E', '').replace('D', '') for s in tup)
Much better:
invalid = any(s.strip('CBED') for s in tup)
Or:
if ''.join(tup).strip('CBED'):
    ...

Python: Combining numbers in a list

I have a list, say
x = [0,1,2,3,"a","b","cd"]
I want to keep the smallest number and all of the letters, hence in the example it would become
x = [0,"a","b","cd"]
How can I do this? Ideally the code would be very efficient, as I'm doing this for millions of lists.
Attempts: I've tried min(x); however, it results in an error because there are strings in the list.
I think the code below is the most efficient. It does not take extra memory, and the complexity is O(N).
import sys

x = [0, 1, 2, 3, "a", "b", "cd"]
minimum = sys.maxsize  # for Python 3.x
# minimum = sys.maxint  # for Python 2
j = 0
for i in range(len(x)):
    if isinstance(x[i], str):
        x[j] = x[i]
        j += 1
    else:
        minimum = min(minimum, x[i])
print([minimum] + x[:j])
Output
[0, 'a', 'b', 'cd']
try this:
Python 2.7
output = [s for s in x if isinstance(s, str)]
output.append(min(x))
#>>> output
#['a', 'b', 'cd', 0]
Python3:
output = [s for s in x if isinstance(s, str)]
output.append(min([i for i in x if isinstance(i, int)]))
You can use itertools.groupby
>>> from itertools import groupby
>>> x = [0,1,2,3,"a","b","cd"]
>>> [min(n, *g) if t == int else n for t, g in groupby(x, type) for n in g]
[0, 'a', 'b', 'cd']
More efficient would be to just min the integers and unpack the strings.
>>> x = [0,1,2,3,"a","b","cd"]
>>> grouped = [list(g) for t, g in groupby(x, type)]
>>> [min(grouped[0]), *grouped[1]]
[0, 'a', 'b', 'cd']
One option would be to check each element's type, finding the minimum number while building a new list for the strings. This should be O(n). Not super fast, but not slow either.
You could say something like:
min_number = None
string_list = []
for i in x:
    if isinstance(i, (int, float)):
        if min_number is None or i < min_number:
            min_number = i
    elif isinstance(i, str):
        string_list.append(i)
if min_number is not None:
    string_list.insert(0, min_number)
x = string_list
I don't think this would be the most efficient, but you could separate the list into two lists, one with ints and one with strings, then find the min and rejoin them. This would look something like:
x = [0, 1, 2, 'a', 'b', 'c']
nums = []
strings = []
for item in x:
    if isinstance(item, int):
        nums.append(item)
    else:
        strings.append(item)
Now, after you run this, you can get the min and then rejoin the lists:
result = [min(nums)] + strings
This will give [0, 'a', 'b', 'c']
Something like this should work
def minNum(array):
    min = None
    numPos = []
    for i in array:
        if type(i) == int or type(i) == float:
            if min is None or i < min:
                min = i
                numPos.append(array.index(i))
            else:
                numPos.append(array.index(i))
        else:
            pass
    numPos.reverse()
    for j in numPos:
        if array[j] != min:
            del array[j]
    return array
Definitely not the only solution, but it's fairly compact and works well for all the test cases I gave.

Is there a way to sort cases in a list comprehension to produce two outputs? [duplicate]

I have some code like:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
The goal is to split up the contents of mylist into two other lists, based on whether or not they meet a condition.
How can I do this more elegantly? Can I avoid doing two separate iterations over mylist? Can I improve performance by doing so?
Iterate manually, using the condition to select a list to which each element will be appended:
good, bad = [], []
for x in mylist:
    (bad, good)[x in goodvals].append(x)
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
How can I do this more elegantly?
That code is already perfectly elegant.
There might be slight performance improvements from using sets, but the difference is trivial. Set-based approaches will also discard duplicates and will not preserve the order of the elements. I find the list comprehension far easier to read, too.
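If you do want the set speedup without losing order or duplicates, a minimal sketch is to convert only the lookup side rather than mylist itself:
goodvals_set = set(goodvals)
good = [x for x in mylist if x in goodvals_set]
bad = [x for x in mylist if x not in goodvals_set]
Only the membership test changes, so good and bad keep the order and any duplicates of mylist.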
In fact, we could even more simply just use a for loop:
good, bad = [], []
for x in mylist:
    if x in goodvals:
        good.append(x)
    else:
        bad.append(x)
This approach makes it easier to add additional logic. For example, the code is easily modified to discard None values:
good, bad = [], []
for x in mylist:
    if x is None:
        continue
    if x in goodvals:
        good.append(x)
    else:
        bad.append(x)
Here's the lazy iterator approach:
from itertools import tee

def split_on_condition(seq, condition):
    l1, l2 = tee((condition(item), item) for item in seq)
    return (i for p, i in l1 if p), (i for p, i in l2 if not p)
It evaluates the condition once per item and returns two generators, first yielding values from the sequence where the condition is true, the other where it's false.
Because it's lazy you can use it on any iterator, even an infinite one:
from itertools import count, islice
def is_prime(n):
    return n > 1 and all(n % i for i in range(2, n))
primes, not_primes = split_on_condition(count(), is_prime)
print("First 10 primes", list(islice(primes, 10)))
print("First 10 non-primes", list(islice(not_primes, 10)))
Usually though the non-lazy list returning approach is better:
def split_on_condition(seq, condition):
    a, b = [], []
    for item in seq:
        (a if condition(item) else b).append(item)
    return a, b
Edit: For your more specific use case of splitting items into different lists by some key, here's a generic function that does that:
DROP_VALUE = lambda _: _

def split_by_key(seq, resultmapping, keyfunc, default=DROP_VALUE):
    """Split a sequence into lists based on a key function.

    seq - input sequence
    resultmapping - a dictionary that maps from target lists to keys that go to that list
    keyfunc - function to calculate the key of an input value
    default - the target where items that don't have a corresponding key go; by default they are dropped
    """
    result_lists = dict((key, []) for key in resultmapping)
    appenders = dict((key, result_lists[target].append)
                     for target, keys in resultmapping.items()
                     for key in keys)
    if default is not DROP_VALUE:
        result_lists.setdefault(default, [])
        default_action = result_lists[default].append
    else:
        default_action = DROP_VALUE
    for item in seq:
        appenders.get(keyfunc(item), default_action)(item)
    return result_lists
Usage:
def file_extension(f):
    return f[2].lower()
split_files = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension, default='anims')
print split_files['images']
print split_files['anims']
The problem with most of the proposed solutions is that they scan the list and apply the filtering function twice. I'd make a simple small function like this:
def split_into_two_lists(lst, f):
    a = []
    b = []
    for elem in lst:
        if f(elem):
            a.append(elem)
        else:
            b.append(elem)
    return a, b
That way you are not processing anything twice and also are not repeating code.
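A quick usage sketch (the evenness predicate here is just an example):
evens, odds = split_into_two_lists(range(10), lambda n: n % 2 == 0)
# evens == [0, 2, 4, 6, 8], odds == [1, 3, 5, 7, 9]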
My take on it. I propose a lazy, single-pass, partition function,
which preserves relative order in the output subsequences.
1. Requirements
I assume that the requirements are:
maintain elements' relative order (hence, no sets and
dictionaries)
evaluate condition only once for every element (hence not using
(i)filter or groupby)
allow for lazy consumption of either sequence (if we can afford to
precompute them, then the naïve implementation is likely to be
acceptable too)
2. split library
My partition function (introduced below) and other similar functions
have made it into a small library:
python-split
It's installable normally via PyPI:
pip install --user split
To split a list based on a condition, use the partition function:
>>> from split import partition
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi') ]
>>> image_types = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> images, other = partition(lambda f: f[-1] in image_types, files)
>>> list(images)
[('file1.jpg', 33L, '.jpg')]
>>> list(other)
[('file2.avi', 999L, '.avi')]
3. partition function explained
Internally we need to build two subsequences at once, so consuming
only one output sequence will force the other one to be computed
too. And we need to keep state between user requests (store processed
but not yet requested elements). To keep state, I use two double-ended
queues (deques):
from collections import deque
SplitSeq class takes care of the housekeeping:
class SplitSeq:
    def __init__(self, condition, sequence):
        self.cond = condition
        self.goods = deque([])
        self.bads = deque([])
        self.seq = iter(sequence)
Magic happens in its .getNext() method. It is almost like .next() of an iterator, but allows you to specify which kind of element we want this time. Behind the scenes it doesn't discard the rejected elements, but instead puts them in one of the two queues:
    def getNext(self, getGood=True):
        if getGood:
            these, those, cond = self.goods, self.bads, self.cond
        else:
            these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
        if these:
            return these.popleft()
        else:
            while 1:  # exit on StopIteration
                n = self.seq.next()
                if cond(n):
                    return n
                else:
                    those.append(n)
The end user is supposed to use partition function. It takes a
condition function and a sequence (just like map or filter), and
returns two generators. The first generator builds a subsequence of
elements for which the condition holds, the second one builds the
complementary subsequence. Iterators and generators allow for lazy
splitting of even long or infinite sequences.
def partition(condition, sequence):
    cond = condition if condition else bool  # evaluate as bool if condition == None
    ss = SplitSeq(cond, sequence)
    def goods():
        while 1:
            yield ss.getNext(getGood=True)
    def bads():
        while 1:
            yield ss.getNext(getGood=False)
    return goods(), bads()
I chose the test function to be the first argument to facilitate
partial application in the future (similar to how map and filter
have the test function as the first argument).
I basically like Anders' approach as it is very general. Here's a version that puts the categorizer first (to match filter syntax) and uses a defaultdict (assumed imported).
def categorize(func, seq):
    """Return mapping from categories to lists
    of categorized items.
    """
    d = defaultdict(list)
    for item in seq:
        d[func(item)].append(item)
    return d
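A small usage sketch, assuming the defaultdict import mentioned above (the even/odd key function is just an example):
>>> categorize(lambda x: x % 2 == 0, [1, 2, 3, 4, 5])
defaultdict(<class 'list'>, {False: [1, 3, 5], True: [2, 4]})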
First go (pre-OP-edit): Use sets:
mylist = [1,2,3,4,5,6,7]
goodvals = [1,3,7,8,9]
myset = set(mylist)
goodset = set(goodvals)
print list(myset.intersection(goodset)) # [1, 3, 7]
print list(myset.difference(goodset)) # [2, 4, 5, 6]
That's good for both readability (IMHO) and performance.
Second go (post-OP-edit):
Create your list of good extensions as a set:
IMAGE_TYPES = set(['.jpg','.jpeg','.gif','.bmp','.png'])
and that will increase performance. Otherwise, what you have looks fine to me.
itertools.groupby almost does what you want, except it requires the items to be sorted to ensure that you get a single contiguous range, so you need to sort by your key first (otherwise you'll get multiple interleaved groups for each type). eg.
def is_good(f):
    return f[2].lower() in IMAGE_TYPES

files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file3.gif', 123L, '.gif')]
for key, group in itertools.groupby(sorted(files, key=is_good), key=is_good):
    print key, list(group)
gives:
False [('file2.avi', 999L, '.avi')]
True [('file1.jpg', 33L, '.jpg'), ('file3.gif', 123L, '.gif')]
Similar to the other solutions, the key func can be defined to divide into any number of groups you want.
Elegant and Fast
Inspired by DanSalmo's comment, here is a solution that is concise, elegant, and at the same time is one of the fastest solutions.
good_set = set(goodvals)
good, bad = [], []
for item in my_list:
    good.append(item) if item in good_set else bad.append(item)
Tip: Turning goodvals into a set gives us an easy speed boost.
Fastest
For maximum speed, we take the fastest answer and turbocharge it by turning good_list into a set. That alone gives us a 40%+ speed boost, and we end up with a solution that is more than 5.5x as fast as the slowest solution, even while it remains readable.
good_list_set = set(good_list) # 40%+ faster than a tuple.
good, bad = [], []
for item in my_origin_list:
    if item in good_list_set:
        good.append(item)
    else:
        bad.append(item)
A little shorter
This is a more concise version of the previous answer.
good_list_set = set(good_list) # 40%+ faster than a tuple.
good, bad = [], []
for item in my_origin_list:
    out = good if item in good_list_set else bad
    out.append(item)
Elegance can be somewhat subjective, but some of the Rube Goldberg style solutions that are cute and ingenious are quite concerning and should not be used in production code in any language, let alone Python, which is elegant at heart.
Benchmark results:
filter_BJHomer 80/s -- -3265% -5312% -5900% -6262% -7273% -7363% -8051% -8162% -8244%
zip_Funky 118/s 4848% -- -3040% -3913% -4450% -5951% -6085% -7106% -7271% -7393%
two_lst_tuple_JohnLaRoy 170/s 11332% 4367% -- -1254% -2026% -4182% -4375% -5842% -6079% -6254%
if_else_DBR 195/s 14392% 6428% 1434% -- -882% -3348% -3568% -5246% -5516% -5717%
two_lst_compr_Parand 213/s 16750% 8016% 2540% 967% -- -2705% -2946% -4786% -5083% -5303%
if_else_1_line_DanSalmo 292/s 26668% 14696% 7189% 5033% 3707% -- -331% -2853% -3260% -3562%
tuple_if_else 302/s 27923% 15542% 7778% 5548% 4177% 343% -- -2609% -3029% -3341%
set_1_line 409/s 41308% 24556% 14053% 11035% 9181% 3993% 3529% -- -569% -991%
set_shorter 434/s 44401% 26640% 15503% 12303% 10337% 4836% 4345% 603% -- -448%
set_if_else 454/s 46952% 28358% 16699% 13349% 11290% 5532% 5018% 1100% 469% --
The full benchmark code for Python 3.7 (modified from FunkySayu):
good_list = ['.jpg','.jpeg','.gif','.bmp','.png']
import random
import string
my_origin_list = []
for i in range(10000):
    fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
    if random.getrandbits(1):
        fext = random.choice(list(good_list))
    else:
        fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))
    my_origin_list.append((fname + fext, random.randrange(1000), fext))
# Parand
def two_lst_compr_Parand(*_):
    return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
# dbr
def if_else_DBR(*_):
    a, b = list(), list()
    for e in my_origin_list:
        if e[2] in good_list:
            a.append(e)
        else:
            b.append(e)
    return a, b
# John La Rooy
def two_lst_tuple_JohnLaRoy(*_):
    a, b = list(), list()
    for e in my_origin_list:
        (b, a)[e[2] in good_list].append(e)
    return a, b
# # Ants Aasma
# def f4():
#     l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
#     return [i for p, i in l1 if p], [i for p, i in l2 if not p]
# My personal way to do
def zip_Funky(*_):
    a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
    return list(filter(None, a)), list(filter(None, b))
# BJ Homer
def filter_BJHomer(*_):
    return list(filter(lambda e: e[2] in good_list, my_origin_list)), list(filter(lambda e: not e[2] in good_list, my_origin_list))
# ChaimG's answer; as a list.
def if_else_1_line_DanSalmo(*_):
    good, bad = [], []
    for e in my_origin_list:
        _ = good.append(e) if e[2] in good_list else bad.append(e)
    return good, bad
# ChaimG's answer; as a set.
def set_1_line(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        _ = good.append(e) if e[2] in good_list_set else bad.append(e)
    return good, bad
# ChaimG set and if else list.
def set_shorter(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        out = good if e[2] in good_list_set else bad
        out.append(e)
    return good, bad
# ChaimG's best answer; if else as a set.
def set_if_else(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        if e[2] in good_list_set:
            good.append(e)
        else:
            bad.append(e)
    return good, bad
# ChaimG's answer; if else as a tuple.
def tuple_if_else(*_):
    good_list_tuple = tuple(good_list)
    good, bad = [], []
    for e in my_origin_list:
        if e[2] in good_list_tuple:
            good.append(e)
        else:
            bad.append(e)
    return good, bad
def cmpthese(n=0, functions=None):
    results = {}
    for func_name in functions:
        args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
        t = Timer(*args)
        results[func_name] = 1 / (t.timeit(number=n) / n)  # passes/sec
    functions_sorted = sorted(functions, key=results.__getitem__)
    for f in functions_sorted:
        diff = []
        for func in functions_sorted:
            if func == f:
                diff.append("--")
            else:
                diff.append(f"{results[f]/results[func]*100 - 100:5.0%}")
        diffs = " ".join(f'{x:>8s}' for x in diff)
        print(f"{f:27s} \t{results[f]:,.0f}/s {diffs}")
if __name__ == '__main__':
    from timeit import Timer
    cmpthese(1000, 'two_lst_compr_Parand if_else_DBR two_lst_tuple_JohnLaRoy zip_Funky filter_BJHomer if_else_1_line_DanSalmo set_1_line set_if_else tuple_if_else set_shorter'.split(" "))
good.append(x) if x in goodvals else bad.append(x)
This elegant and concise answer by @dansalmo showed up buried in the comments, so I'm just reposting it here as an answer so it can get the prominence it deserves, especially for new readers.
Complete example:
good, bad = [], []
for x in my_list:
    good.append(x) if x in goodvals else bad.append(x)
bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]
append returns None, so it works.
Personally, I like the version you cited, assuming you already have a list of goodvals hanging around. If not, something like:
good = filter(lambda x: is_good(x), mylist)
bad = filter(lambda x: not is_good(x), mylist)
Of course, that's really very similar to using a list comprehension like you originally did, but with a function instead of a lookup:
good = [x for x in mylist if is_good(x)]
bad = [x for x in mylist if not is_good(x)]
In general, I find the aesthetics of list comprehensions to be very pleasing. Of course, if you don't actually need to preserve ordering and don't need duplicates, using the intersection and difference methods on sets would work well too.
If you want to make it in FP style:
good, bad = [ sum(x, []) for x in zip(*(([y], []) if y in goodvals else ([], [y])
for y in mylist)) ]
Not the most readable solution, but at least iterates through mylist only once.
Sometimes, it looks like a list comprehension is not the best thing to use!
I made a little test based on the answers people gave to this topic, tested on a randomly generated list. Here is the generation of the list (there's probably a better way to do it, but that's not the point):
good_list = ('.jpg','.jpeg','.gif','.bmp','.png')
import random
import string
my_origin_list = []
for i in xrange(10000):
    fname = ''.join(random.choice(string.lowercase) for i in range(random.randrange(10)))
    if random.getrandbits(1):
        fext = random.choice(good_list)
    else:
        fext = "." + ''.join(random.choice(string.lowercase) for i in range(3))
    my_origin_list.append((fname + fext, random.randrange(1000), fext))
And here we go
# Parand
def f1():
    return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
# dbr
def f2():
    a, b = list(), list()
    for e in my_origin_list:
        if e[2] in good_list:
            a.append(e)
        else:
            b.append(e)
    return a, b
# John La Rooy
def f3():
    a, b = list(), list()
    for e in my_origin_list:
        (b, a)[e[2] in good_list].append(e)
    return a, b
# Ants Aasma
def f4():
    l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
    return [i for p, i in l1 if p], [i for p, i in l2 if not p]
# My personal way to do
def f5():
    a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
    return list(filter(None, a)), list(filter(None, b))
# BJ Homer
def f6():
    return filter(lambda e: e[2] in good_list, my_origin_list), filter(lambda e: not e[2] in good_list, my_origin_list)
Using the cmpthese function, the best result is the dbr answer:
f1 204/s -- -5% -14% -15% -20% -26%
f6 215/s 6% -- -9% -11% -16% -22%
f3 237/s 16% 10% -- -2% -7% -14%
f4 240/s 18% 12% 2% -- -6% -13%
f5 255/s 25% 18% 8% 6% -- -8%
f2 277/s 36% 29% 17% 15% 9% --
from itertools import tee, filterfalse

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)
This is the partition recipe from the itertools documentation.
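A usage sketch of the recipe (is_odd is just the predicate from its docstring comment):
>>> is_odd = lambda n: n % 2
>>> evens, odds = partition(is_odd, range(10))
>>> list(evens), list(odds)
([0, 2, 4, 6, 8], [1, 3, 5, 7, 9])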
I think a generalization for splitting an iterable based on N conditions is handy:
from collections import OrderedDict
def partition(iterable, *conditions):
    '''Returns a list with the elements that satisfy each condition.
    Conditions are assumed to be exclusive.'''
    d = OrderedDict((i, list()) for i in range(len(conditions)))
    for e in iterable:
        for i, condition in enumerate(conditions):
            if condition(e):
                d[i].append(e)
                break
    return d.values()
For instance:
ints,floats,other = partition([2, 3.14, 1, 1.69, [], None],
lambda x: isinstance(x, int),
lambda x: isinstance(x, float),
lambda x: True)
print " ints: {}\n floats:{}\n other:{}".format(ints,floats,other)
ints: [2, 1]
floats:[3.14, 1.69]
other:[[], None]
If the element may satisfy multiple conditions, remove the break.
Yet another solution to this problem. I needed a solution that is as fast as possible. That means only one iteration over the list and preferably O(1) for adding data to one of the resulting lists. This is very similar to the solution provided by sastanin, except much shorter:
from collections import deque

def split(iterable, function):
    dq_true = deque()
    dq_false = deque()
    # deque - the fastest way to consume an iterator and append items
    deque((
        (dq_true if function(item) else dq_false).append(item) for item in iterable
    ), maxlen=0)
    return dq_true, dq_false
Then, you can use the function in the following way:
lower, higher = split([0,1,2,3,4,5,6,7,8,9], lambda x: x < 5)
selected, other = split([0,1,2,3,4,5,6,7,8,9], lambda x: x in {0,4,9})
If you're not fine with the resulting deque objects, you can easily convert them to list, set, whatever you like (for example list(lower)). The conversion is much faster than constructing the lists directly.
This method keeps the order of the items, as well as any duplicates.
If you don't mind using an external library, there are two I know of that natively implement this operation:
>>> files = [ ('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
iteration_utilities.partition:
>>> from iteration_utilities import partition
>>> notimages, images = partition(files, lambda x: x[2].lower() in IMAGE_TYPES)
>>> notimages
[('file2.avi', 999, '.avi')]
>>> images
[('file1.jpg', 33, '.jpg')]
more_itertools.partition
>>> from more_itertools import partition
>>> notimages, images = partition(lambda x: x[2].lower() in IMAGE_TYPES, files)
>>> list(notimages) # returns a generator so you need to explicitly convert to list.
[('file2.avi', 999, '.avi')]
>>> list(images)
[('file1.jpg', 33, '.jpg')]
For example, splitting a list into even and odd numbers (in Python 3, reduce lives in functools):
from functools import reduce

arr = range(20)
even, odd = reduce(lambda res, next: res[next % 2].append(next) or res, arr, ([], []))
Or in general:
def split(predicate, iterable):
    return reduce(lambda res, e: res[predicate(e)].append(e) or res, iterable, ([], []))
Advantages:
Shortest possible way
Predicate applies only once for each element
Disadvantages
Requires knowledge of the functional programming paradigm
Inspired by @gnibbler's great (but terse!) answer, we can apply that approach to map to multiple partitions:
from collections import defaultdict
def splitter(l, mapper):
    """Split an iterable into multiple partitions generated by a callable mapper."""
    results = defaultdict(list)
    for x in l:
        results[mapper(x)] += [x]
    return results
Then splitter can then be used as follows:
>>> l = [1, 2, 3, 4, 2, 3, 4, 5, 6, 4, 3, 2, 3]
>>> split = splitter(l, lambda x: x % 2 == 0) # partition l into odds and evens
>>> split.items()
[(False, [1, 3, 3, 5, 3, 3]), (True, [2, 4, 2, 4, 6, 4, 2])]
This works for more than two partitions with a more complicated mapping (and on iterators, too):
>>> import math
>>> l = xrange(1, 23)
>>> split = splitter(l, lambda x: int(math.log10(x) * 5))
>>> split.items()
[(0, [1]),
(1, [2]),
(2, [3]),
(3, [4, 5, 6]),
(4, [7, 8, 9]),
(5, [10, 11, 12, 13, 14, 15]),
(6, [16, 17, 18, 19, 20, 21, 22])]
Or using a dictionary to map:
>>> map = {'A': 1, 'X': 2, 'B': 3, 'Y': 1, 'C': 2, 'Z': 3}
>>> l = ['A', 'B', 'C', 'C', 'X', 'Y', 'Z', 'A', 'Z']
>>> split = splitter(l, map.get)
>>> split.items()
[(1, ['A', 'Y', 'A']), (2, ['C', 'C', 'X']), (3, ['B', 'Z', 'Z'])]
solution
from itertools import tee

def unpack_args(fn):
    return lambda t: fn(*t)

def separate(fn, lx):
    return map(
        unpack_args(
            lambda i, ly: filter(
                lambda el: bool(i) == fn(el),
                ly)),
        enumerate(tee(lx, 2)))
test
[even, odd] = separate(
    lambda x: bool(x % 2),
    [1, 2, 3, 4, 5])
print(list(even) == [2, 4])
print(list(odd) == [1, 3, 5])
If the list is made of groups and intermittent separators, you can use:
def split(items, p):
    groups = [[]]
    for i in items:
        if p(i):
            groups.append([])
        groups[-1].append(i)
    return groups
Usage:
split(range(1,11), lambda x: x % 3 == 0)
# gives [[1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
Use Boolean logic to assign data to two arrays
>>> images, anims = [[i for i in files if t ^ (i[2].lower() in IMAGE_TYPES) ] for t in (0, 1)]
>>> images
[('file1.jpg', 33, '.jpg')]
>>> anims
[('file2.avi', 999, '.avi')]
For performance, try itertools.
The itertools module standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an “iterator algebra” making it possible to construct specialized tools succinctly and efficiently in pure Python.
See itertools.ifilter or imap.
itertools.ifilter(predicate, iterable)
Make an iterator that filters elements from iterable returning only those for which the predicate is True
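In Python 3 the same idea is spelled differently: filter is a builtin and itertools.filterfalse replaces ifilterfalse, so a rough sketch of the two-pass split using the files/IMAGE_TYPES names from the question is:
from itertools import filterfalse

is_image = lambda f: f[2].lower() in IMAGE_TYPES
images = list(filter(is_image, files))
anims = list(filterfalse(is_image, files))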
If you insist on clever, you could take Winden's solution and add just a bit of spurious cleverness:
def splay(l, f, d=None):
    d = d or {}
    for x in l: d.setdefault(f(x), []).append(x)
    return d
Sometimes you won't need that other half of the list.
For example:
import sys
from itertools import ifilter
trustedPeople = sys.argv[1].split(',')
newName = sys.argv[2]
myFriends = ifilter(lambda x: x.startswith('Shi'), trustedPeople)
print '%s is %smy friend.' % (newName, newName not in myFriends and 'not ' or '')
Already quite a few solutions here, but yet another way of doing that would be -
anims = []
images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
Iterates over the list only once, and looks a bit more pythonic and hence readable to me.
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file1.bmp', 33L, '.bmp')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> anims = []
>>> images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
>>> print '\n'.join([str(anims), str(images)])
[('file2.avi', 999L, '.avi')]
[('file1.jpg', 33L, '.jpg'), ('file1.bmp', 33L, '.bmp')]
>>>
I'd take a 2-pass approach, separating evaluation of the predicate from filtering the list:
def partition(pred, iterable):
    xs = list(zip(map(pred, iterable), iterable))
    return [x[1] for x in xs if x[0]], [x[1] for x in xs if not x[0]]
What's nice about this, performance-wise (in addition to evaluating pred only once on each member of iterable), is that it moves a lot of logic out of the interpreter and into highly-optimized iteration and mapping code. This can speed up iteration over long iterables, as described in this answer.
Expressivity-wise, it takes advantage of expressive idioms like comprehensions and mapping.
Not sure if this is a good approach but it can be done in this way as well
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi')]
images, anims = reduce(lambda (i, a), f: (i + [f], a) if f[2] in IMAGE_TYPES else (i, a + [f]), files, ([], []))
Yet another answer, short but "evil" (for list-comprehension side effects).
digits = list(range(10))
odd = [digits.pop(i) for i, x in enumerate(digits) if x % 2]
>>> odd
[1, 3, 5, 7, 9]
>>> digits
[0, 2, 4, 6, 8]

How to check isinstance of iterable in Python? [duplicate]

Consider this example:
p = [[1,2,3,4], (1,2,3), set([1,2,3])]
Instead of checking for each type like
for x in p:
    if isinstance(x, list):
        xxxxx
    elif isinstance(x, tuple):
        xxxxxx
    elif isinstance(x, set):
        xxxxxxx
Is there some equivalent for the following:
for element in something:
    if isinstance(element, iterable):
        do something
You could try using the Iterable ABC from the collections module:
In [1]: import collections
In [2]: p = [[1,2,3,4], (1,2,3), set([1,2,3]), 'things', 123]
In [3]: for item in p:
   ...:     print isinstance(item, collections.Iterable)
   ...:
True
True
True
True
False
You can check whether the object has an __iter__ attribute to determine if it is iterable or not.
a = [1, 2, 3]
b = {1, 2, 3}
c = (1, 2, 3)
d = {"a": 1}
f = "Welcome"
e = 1
print (hasattr(a, "__iter__"))
print (hasattr(b, "__iter__"))
print (hasattr(c, "__iter__"))
print (hasattr(d, "__iter__"))
print (hasattr(f, "__iter__") or isinstance(f, str))
print (hasattr(e, "__iter__"))
Output
True
True
True
True
True
False
Note: Even though strings are iterable, in Python 2 they don't have __iter__, but in Python 3 they do. So in Python 2 you might want to add or isinstance(f, str) as well.
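Another common approach that sidesteps the Python 2/3 difference is to just ask iter() and catch the failure; a small sketch:
def is_iterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:
        return False

print(is_iterable([1, 2, 3]))   # True
print(is_iterable("Welcome"))   # True
print(is_iterable(1))           # False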

assertAlmostEqual in Python unit-test for collections of floats

The assertAlmostEqual(x, y) method in Python's unit testing framework tests whether x and y are approximately equal assuming they are floats.
The problem with assertAlmostEqual() is that it only works on floats. I'm looking for a method like assertAlmostEqual() which works on lists of floats, sets of floats, dictionaries of floats, tuples of floats, lists of tuples of floats, sets of lists of floats, etc.
For instance, let x = 0.1234567890, y = 0.1234567891. x and y are almost equal because they agree on each and every digit except for the last one. Therefore self.assertAlmostEqual(x, y) is True because assertAlmostEqual() works for floats.
I'm looking for a more generic assertAlmostEquals() which also evaluates the following calls to True:
self.assertAlmostEqual_generic([x, x, x], [y, y, y]).
self.assertAlmostEqual_generic({1: x, 2: x, 3: x}, {1: y, 2: y, 3: y}).
self.assertAlmostEqual_generic([(x,x)], [(y,y)]).
Is there such a method or do I have to implement it myself?
Clarifications:
assertAlmostEquals() has an optional parameter named places and the numbers are compared by computing the difference rounded to number of decimal places. By default places=7, hence self.assertAlmostEqual(0.5, 0.4) is False while self.assertAlmostEqual(0.12345678, 0.12345679) is True. My speculative assertAlmostEqual_generic() should have the same functionality.
Two lists are considered almost equal if they have almost equal numbers in exactly the same order. formally, for i in range(n): self.assertAlmostEqual(list1[i], list2[i]).
Similarly, two sets are considered almost equal if they can be converted to almost equal lists (by assigning an order to each set).
Similarly, two dictionaries are considered almost equal if the key set of each dictionary is almost equal to the key set of the other dictionary, and for each such almost equal key pair there's a corresponding almost equal value.
In general: I consider two collections almost equal if they're equal except for some corresponding floats which are just almost equal to each other. In other words, I would like to really compare objects but with a low (customized) precision when comparing floats along the way.
If you don't mind using NumPy (which comes with your Python(x,y)), you may want to look at the np.testing module, which defines, among others, an assert_almost_equal function.
The signature is np.testing.assert_almost_equal(actual, desired, decimal=7, err_msg='', verbose=True)
>>> x = 1.000001
>>> y = 1.000002
>>> np.testing.assert_almost_equal(x, y)
AssertionError:
Arrays are not almost equal to 7 decimals
ACTUAL: 1.000001
DESIRED: 1.000002
>>> np.testing.assert_almost_equal(x, y, 5)
>>> np.testing.assert_almost_equal([x, x, x], [y, y, y], 5)
>>> np.testing.assert_almost_equal((x, x, x), (y, y, y), 5)
As of Python 3.5 you may compare using
math.isclose(a, b, rel_tol=1e-9, abs_tol=0.0)
as described in PEP 485.
The implementation should be equivalent to
abs(a-b) <= max( rel_tol * max(abs(a), abs(b)), abs_tol )
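For example, applied element-wise to the values from the question (a sketch, not a drop-in assertAlmostEqual replacement):
>>> import math
>>> x, y = 0.1234567890, 0.1234567891
>>> all(math.isclose(a, b, rel_tol=1e-7) for a, b in zip([x, x, x], [y, y, y]))
True
>>> math.isclose(0.5, 0.4, rel_tol=1e-7)
False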
Here's how I've implemented a generic is_almost_equal(first, second) function:
First, duplicate the objects you need to compare (first and second), but don't make an exact copy: cut the insignificant decimal digits of any float you encounter inside the object.
Now that you have copies of first and second for which the insignificant decimal digits are gone, just compare first and second using the == operator.
Let's assume we have a cut_insignificant_digits_recursively(obj, places) function which duplicates obj but leaves only the [places] most significant decimal digits of each float in the original obj. Here's a working implementation of is_almost_equal(first, second, places):
from insignificant_digit_cutter import cut_insignificant_digits_recursively
def is_almost_equal(first, second, places):
    '''Returns True if first and second are equal.
    Also returns True if first and second aren't equal but have exactly the same
    structure and values except for a bunch of floats which are just almost
    equal (floats are almost equal if they're equal when we consider only the
    [places] most significant digits of each).'''
    if first == second: return True
    cut_first = cut_insignificant_digits_recursively(first, places)
    cut_second = cut_insignificant_digits_recursively(second, places)
    return cut_first == cut_second
And here's a working implementation of cut_insignificant_digits_recursively(obj, places):
def cut_insignificant_digits(number, places):
    '''cut the least significant decimal digits of a number,
    leave only [places] decimal digits'''
    if type(number) != float: return number
    number_as_str = str(number)
    end_of_number = number_as_str.find('.') + places + 1
    if end_of_number > len(number_as_str): return number
    return float(number_as_str[:end_of_number])

def cut_insignificant_digits_lazy(iterable, places):
    for obj in iterable:
        yield cut_insignificant_digits_recursively(obj, places)

def cut_insignificant_digits_recursively(obj, places):
    '''return a copy of obj except that every float loses its least significant
    decimal digits, remaining only [places] decimal digits'''
    t = type(obj)
    if t == float: return cut_insignificant_digits(obj, places)
    if t in (list, tuple, set):
        return t(cut_insignificant_digits_lazy(obj, places))
    if t == dict:
        return {cut_insignificant_digits_recursively(key, places):
                cut_insignificant_digits_recursively(val, places)
                for key, val in obj.items()}
    return obj
The code and its unit tests are available here: https://github.com/snakile/approximate_comparator. I welcome any improvement and bug fix.
If you don't mind using the numpy package then numpy.testing has the assert_array_almost_equal method.
This works for array_like objects, so it is fine for arrays, lists and tuples of floats, but it does not work for sets and dictionaries.
The documentation is here.
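A minimal sketch with the question's values (decimal plays roughly the role of places):
import numpy as np

x, y = 0.1234567890, 0.1234567891
np.testing.assert_array_almost_equal([x, x, x], [y, y, y], decimal=7)  # passes
np.testing.assert_array_almost_equal([0.5], [0.4], decimal=7)          # raises AssertionError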
There is no such method, you'd have to do it yourself.
For lists and tuples the definition is obvious, but note that the other cases you mention aren't obvious, so it's no wonder such a function isn't provided. For instance, is {1.00001: 1.00002} almost equal to {1.00002: 1.00001}? Handling such cases requires making a choice about whether closeness depends on keys or values or both. For sets you are unlikely to find a meaningful definition, since sets are unordered, so there is no notion of "corresponding" elements.
You may have to implement it yourself. While it's true that lists and sets can be iterated the same way, dictionaries are a different story: you iterate over their keys, not their values. The third example also seems a bit ambiguous to me: do you mean to compare each value within the set, or each value from each set?
Here's a simple code snippet:
def almost_equal(value_1, value_2, accuracy=10**-8):
    return abs(value_1 - value_2) < accuracy
x = [1,2,3,4]
y = [1,2,4,5]
assert all(almost_equal(*values) for values in zip(x, y))
None of these answers work for me. The following code should work for python collections, classes, dataclasses, and namedtuples. I might have forgotten something, but so far this works for me.
import unittest
from collections import namedtuple, OrderedDict
from dataclasses import dataclass
from typing import Any
def are_almost_equal(o1: Any, o2: Any, max_abs_ratio_diff: float, max_abs_diff: float) -> bool:
    """
    Compares two objects by recursively walking them through. Equality is as usual except for floats.
    Floats are compared according to the two measures defined below.
    :param o1: The first object.
    :param o2: The second object.
    :param max_abs_ratio_diff: The maximum allowed absolute value of the difference
        `abs(1 - (o1 / o2))` (and vice versa if o2 == 0.0). Ignored if < 0.
    :param max_abs_diff: The maximum allowed absolute difference `abs(o1 - o2)`. Ignored if < 0.
    :return: Whether the two objects are almost equal.
    """
    if type(o1) != type(o2):
        return False
    composite_type_passed = False

    if hasattr(o1, '__slots__'):
        if len(o1.__slots__) != len(o2.__slots__):
            return False
        if any(not are_almost_equal(getattr(o1, s1), getattr(o2, s2),
                                    max_abs_ratio_diff, max_abs_diff)
               for s1, s2 in zip(sorted(o1.__slots__), sorted(o2.__slots__))):
            return False
        else:
            composite_type_passed = True

    if hasattr(o1, '__dict__'):
        if len(o1.__dict__) != len(o2.__dict__):
            return False
        if any(not are_almost_equal(k1, k2, max_abs_ratio_diff, max_abs_diff)
               or not are_almost_equal(v1, v2, max_abs_ratio_diff, max_abs_diff)
               for ((k1, v1), (k2, v2))
               in zip(sorted(o1.__dict__.items()), sorted(o2.__dict__.items()))
               if not k1.startswith('__')):  # avoid infinite loops
            return False
        else:
            composite_type_passed = True

    if isinstance(o1, dict):
        if len(o1) != len(o2):
            return False
        if any(not are_almost_equal(k1, k2, max_abs_ratio_diff, max_abs_diff)
               or not are_almost_equal(v1, v2, max_abs_ratio_diff, max_abs_diff)
               for ((k1, v1), (k2, v2)) in zip(sorted(o1.items()), sorted(o2.items()))):
            return False
    elif any(issubclass(o1.__class__, c) for c in (list, tuple, set)):
        if len(o1) != len(o2):
            return False
        if any(not are_almost_equal(v1, v2, max_abs_ratio_diff, max_abs_diff)
               for v1, v2 in zip(o1, o2)):
            return False
    elif isinstance(o1, float):
        if o1 == o2:
            return True
        else:
            if max_abs_ratio_diff > 0:  # if max_abs_ratio_diff < 0, max_abs_ratio_diff is ignored
                if o2 != 0:
                    if abs(1.0 - (o1 / o2)) > max_abs_ratio_diff:
                        return False
                else:  # if both == 0, we already returned True
                    if abs(1.0 - (o2 / o1)) > max_abs_ratio_diff:
                        return False
            if 0 < max_abs_diff < abs(o1 - o2):  # if max_abs_diff < 0, max_abs_diff is ignored
                return False
            return True
    else:
        if not composite_type_passed:
            return o1 == o2
    return True
class EqualityTest(unittest.TestCase):

    def test_floats(self) -> None:
        o1 = ('hi', 3, 3.4)
        o2 = ('hi', 3, 3.400001)
        self.assertTrue(are_almost_equal(o1, o2, 0.0001, 0.0001))
        self.assertFalse(are_almost_equal(o1, o2, 0.00000001, 0.00000001))

    def test_ratio_only(self):
        o1 = ['hey', 10000, 123.12]
        o2 = ['hey', 10000, 123.80]
        self.assertTrue(are_almost_equal(o1, o2, 0.01, -1))
        self.assertFalse(are_almost_equal(o1, o2, 0.001, -1))

    def test_diff_only(self):
        o1 = ['hey', 10000, 1234567890.12]
        o2 = ['hey', 10000, 1234567890.80]
        self.assertTrue(are_almost_equal(o1, o2, -1, 1))
        self.assertFalse(are_almost_equal(o1, o2, -1, 0.1))

    def test_both_ignored(self):
        o1 = ['hey', 10000, 1234567890.12]
        o2 = ['hey', 10000, 0.80]
        o3 = ['hi', 10000, 0.80]
        self.assertTrue(are_almost_equal(o1, o2, -1, -1))
        self.assertFalse(are_almost_equal(o1, o3, -1, -1))

    def test_different_lengths(self):
        o1 = ['hey', 1234567890.12, 10000]
        o2 = ['hey', 1234567890.80]
        self.assertFalse(are_almost_equal(o1, o2, 1, 1))

    def test_classes(self):
        class A:
            d = 12.3

            def __init__(self, a, b, c):
                self.a = a
                self.b = b
                self.c = c

        o1 = A(2.34, 'str', {1: 'hey', 345.23: [123, 'hi', 890.12]})
        o2 = A(2.34, 'str', {1: 'hey', 345.231: [123, 'hi', 890.121]})
        self.assertTrue(are_almost_equal(o1, o2, 0.1, 0.1))
        self.assertFalse(are_almost_equal(o1, o2, 0.0001, 0.0001))
        o2.hello = 'hello'
        self.assertFalse(are_almost_equal(o1, o2, -1, -1))

    def test_namedtuples(self):
        B = namedtuple('B', ['x', 'y'])
        o1 = B(3.3, 4.4)
        o2 = B(3.4, 4.5)
        self.assertTrue(are_almost_equal(o1, o2, 0.2, 0.2))
        self.assertFalse(are_almost_equal(o1, o2, 0.001, 0.001))

    def test_classes_with_slots(self):
        class C(object):
            __slots__ = ['a', 'b']

            def __init__(self, a, b):
                self.a = a
                self.b = b

        o1 = C(3.3, 4.4)
        o2 = C(3.4, 4.5)
        self.assertTrue(are_almost_equal(o1, o2, 0.3, 0.3))
        self.assertFalse(are_almost_equal(o1, o2, -1, 0.01))

    def test_dataclasses(self):
        @dataclass
        class D:
            s: str
            i: int
            f: float

        @dataclass
        class E:
            f2: float
            f4: str
            d: D

        o1 = E(12.3, 'hi', D('hello', 34, 20.01))
        o2 = E(12.1, 'hi', D('hello', 34, 20.0))
        self.assertTrue(are_almost_equal(o1, o2, -1, 0.4))
        self.assertFalse(are_almost_equal(o1, o2, -1, 0.001))
        o3 = E(12.1, 'hi', D('ciao', 34, 20.0))
        self.assertFalse(are_almost_equal(o2, o3, -1, -1))

    def test_ordereddict(self):
        o1 = OrderedDict({1: 'hey', 345.23: [123, 'hi', 890.12]})
        o2 = OrderedDict({1: 'hey', 345.23: [123, 'hi', 890.0]})
        self.assertTrue(are_almost_equal(o1, o2, 0.01, -1))
        self.assertFalse(are_almost_equal(o1, o2, 0.0001, -1))
Use Pandas
Another way is to convert each of the two dicts etc into pandas dataframes and then use pd.testing.assert_frame_equal() to compare the two. I have used this successfully to compare lists of dicts.
Previous answers often don't work on structures involving dictionaries, but this one should. I haven't exhaustively tested this on highly nested structures, but I imagine pandas would handle them correctly.
Example 1: compare two dicts
To illustrate this I will use your example data of a dict, since the other methods don't work with dicts. Your dict was:
x, y = 0.1234567890, 0.1234567891
{1: x, 2: x, 3: x}, {1: y, 2: y, 3: y}
Then we can do:
pd.testing.assert_frame_equal(
    pd.DataFrame.from_dict({1: x, 2: x, 3: x}, orient='index'),
    pd.DataFrame.from_dict({1: y, 2: y, 3: y}, orient='index'))
This doesn't raise an error, meaning that they are equal to a certain degree of precision.
However if we were to do
pd.testing.assert_frame_equal(
    pd.DataFrame.from_dict({1: x, 2: x, 3: x}, orient='index'),
    pd.DataFrame.from_dict({1: y, 2: y, 3: y + 1}, orient='index'))  # add 1 to last value
then we are rewarded with the following informative message:
AssertionError: DataFrame.iloc[:, 0] (column name="0") are different
DataFrame.iloc[:, 0] (column name="0") values are different (33.33333 %)
[index]: [1, 2, 3]
[left]: [0.123456789, 0.123456789, 0.123456789]
[right]: [0.1234567891, 0.1234567891, 1.1234567891]
For further details see the pd.testing.assert_frame_equal documentation, particularly the parameters check_exact, rtol, and atol, which specify the required degree of precision, either relative or absolute.
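For example, a sketch of loosening the comparison explicitly (check_exact, rtol and atol are accepted by recent pandas versions; the tolerances here are just illustrative):
pd.testing.assert_frame_equal(
    pd.DataFrame.from_dict({1: x, 2: x, 3: x}, orient='index'),
    pd.DataFrame.from_dict({1: y, 2: y, 3: y}, orient='index'),
    check_exact=False, rtol=1e-5, atol=1e-8)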
Example 2: Nested dict of dicts
a = {i*10 : {1:1.1,2:2.1} for i in range(4)}
b = {i*10 : {1:1.1000001,2:2.100001} for i in range(4)}
# a = {0: {1: 1.1, 2: 2.1}, 10: {1: 1.1, 2: 2.1}, 20: {1: 1.1, 2: 2.1}, 30: {1: 1.1, 2: 2.1}}
# b = {0: {1: 1.1000001, 2: 2.100001}, 10: {1: 1.1000001, 2: 2.100001}, 20: {1: 1.1000001, 2: 2.100001}, 30: {1: 1.1000001, 2: 2.100001}}
and then do
pd.testing.assert_frame_equal( pd.DataFrame(a), pd.DataFrame(b) )
It doesn't raise an error: all the values are fairly similar.
However, if we change a value e.g.
b[30][2] += 1
# b = {0: {1: 1.1000001, 2: 2.1000001}, 10: {1: 1.1000001, 2: 2.1000001}, 20: {1: 1.1000001, 2: 2.1000001}, 30: {1: 1.1000001, 2: 3.1000001}}
and then run the same test, we get the following clear error message:
AssertionError: DataFrame.iloc[:, 3] (column name="30") are different
DataFrame.iloc[:, 3] (column name="30") values are different (50.0 %)
[index]: [1, 2]
[left]: [1.1, 2.1]
[right]: [1.1000001, 3.1000001]
Looking at this myself, I used the addTypeEqualityFunc method of the unittest library in combination with math.isclose.
Sample setup:
import math
from unittest import TestCase
class SomeFixtures(TestCase):
    @classmethod
    def float_comparer(cls, a, b, msg=None):
        if len(a) != len(b):
            raise cls.failureException(msg)
        if not all(map(lambda args: math.isclose(*args), zip(a, b))):
            raise cls.failureException(msg)

    def some_test(self):
        self.addTypeEqualityFunc(list, self.float_comparer)
        self.assertEqual([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
I would still use self.assertEqual(), for it stays the most informative when shit hits the fan. You can do that by rounding, e.g.
self.assertEqual(round_tuple((13.949999999999999, 1.121212), 2), (13.95, 1.12))
where round_tuple is
def round_tuple(t: tuple, ndigits: int) -> tuple:
    return tuple(round(e, ndigits=ndigits) for e in t)

def round_list(l: list, ndigits: int) -> list:
    return [round(e, ndigits=ndigits) for e in l]
According to the python docs (see https://stackoverflow.com/a/41407651/1031191) you can get away with rounding issues like 13.94999999, because 13.94999999 == 13.95 is True.
You can also recursively call the already present unittest.assertAlmostEquals() and keep track of what element you are comparing, by adding a method to your unittest.
E.g. for lists of lists and list of tuples of floats:
def assertListAlmostEqual(self, first, second, delta=None, context=None):
    """Asserts lists of lists or tuples to check if they compare and
    shows which element is wrong when comparing two lists
    """
    self.assertEqual(len(first), len(second), msg="Lists have different length")
    context = [first, second] if context is None else context
    for i in range(0, len(first)):
        if isinstance(first[0], tuple):
            context.append(i)
            self.assertListAlmostEqual(first[i], second[i], delta, context=context)
        if isinstance(first[0], list):
            context.append(i)
            self.assertListAlmostEqual(first[i], second[i], delta, context=context)
        elif isinstance(first[0], float):
            msg = "Difference in \n{} and \n{}\nFaulty element index={}".format(context[0], context[1], context[2:] + [i]) \
                if context is not None else None
            self.assertAlmostEqual(first[i], second[i], delta, msg=msg)
Outputs something like:
line 23, in assertListAlmostEqual
self.assertAlmostEqual(first[i], second[i], delta, msg=msg)
AssertionError: 5.0 != 6.0 within 7 places (1.0 difference) : Difference in
[(0.0, 5.0), (8.0, 2.0), (10.0, 1.999999), (11.0, 1.9999989090909092)] and
[(0.0, 6.0), (8.0, 2.0), (10.0, 1.999999), (11.0, 1.9999989)]
Faulty element index=[0, 1]
An alternative approach is to convert your data into a comparable form by, e.g., turning each float into a string with fixed precision.
def comparable(data):
"""Converts `data` to a comparable structure by converting any floats to a string with fixed precision."""
if isinstance(data, (int, str)):
return data
if isinstance(data, float):
return '{:.4f}'.format(data)
if isinstance(data, list):
return [comparable(el) for el in data]
if isinstance(data, tuple):
return tuple([comparable(el) for el in data])
if isinstance(data, dict):
return {k: comparable(v) for k, v in data.items()}
Then you can:
self.assertEquals(comparable(value1), comparable(value2))
