How to compare two list of dictionaries [duplicate] - python

a = [1, 2, 3, 1, 2, 3]
b = [3, 2, 1, 3, 2, 1]
a & b should be considered equal, because they have exactly the same elements, only in different order.
The thing is, my actual lists will consist of objects (my class instances), not integers.

O(n): The Counter() method is best (if your objects are hashable):
from collections import Counter

def compare(s, t):
    return Counter(s) == Counter(t)
O(n log n): The sorted() method is next best (if your objects are orderable):
def compare(s, t):
    return sorted(s) == sorted(t)
O(n * n): If the objects are neither hashable nor orderable, you can use equality:
def compare(s, t):
    t = list(t)  # make a mutable copy
    try:
        for elem in s:
            t.remove(elem)
    except ValueError:
        return False
    return not t
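For class instances like the asker's, a minimal sketch of the Counter variant, using a hypothetical frozen dataclass (frozen dataclasses get value-based __eq__ and __hash__ for free, which makes the instances hashable):

```python
from collections import Counter
from dataclasses import dataclass

def compare(s, t):
    return Counter(s) == Counter(t)

# Hypothetical value class for illustration; frozen=True makes instances hashable.
@dataclass(frozen=True)
class Point:
    x: int
    y: int

a = [Point(1, 2), Point(3, 4), Point(1, 2)]
b = [Point(3, 4), Point(1, 2), Point(1, 2)]
print(compare(a, b))       # True: same elements, order ignored
print(compare(a, a[:-1]))  # False: multiplicities differ
```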

You can sort both:
sorted(a) == sorted(b)
Counting with collections.Counter could also be more efficient (but it requires the objects to be hashable).
>>> from collections import Counter
>>> a = [1, 2, 3, 1, 2, 3]
>>> b = [3, 2, 1, 3, 2, 1]
>>> print(Counter(a) == Counter(b))
True

If you know the items are always hashable, you can use a Counter(), which is O(n).
If you know the items are always sortable, you can use sorted(), which is O(n log n).
In the general case you can't rely on being able to sort or hash the elements, so you need a fallback like this, which is unfortunately O(n^2):
len(a) == len(b) and all(a.count(i) == b.count(i) for i in a)
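The count()-based fallback works even for unhashable, unorderable elements such as dicts, since list.count() compares with ==. A quick sketch:

```python
a = [{'x': 1}, {'x': 2}, {'x': 1}]
b = [{'x': 2}, {'x': 1}, {'x': 1}]
print(len(a) == len(b) and all(a.count(i) == b.count(i) for i in a))  # True

c = [{'x': 1}, {'x': 1}, {'x': 2}]
d = [{'x': 1}, {'x': 2}, {'x': 2}]
print(len(c) == len(d) and all(c.count(i) == d.count(i) for i in c))  # False: counts differ
```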

If you have to do this in tests:
https://docs.python.org/3.5/library/unittest.html#unittest.TestCase.assertCountEqual
assertCountEqual(first, second, msg=None)
Test that sequence first contains the same elements as second, regardless of their order. When they don’t, an error message listing the differences between the sequences will be generated.
Duplicate elements are not ignored when comparing first and second. It verifies whether each element has the same count in both sequences. Equivalent to: assertEqual(Counter(list(first)), Counter(list(second))) but works with sequences of unhashable objects as well.
New in version 3.2.
or in 2.7:
https://docs.python.org/2.7/library/unittest.html#unittest.TestCase.assertItemsEqual
Outside of tests I would recommend the Counter method.
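A minimal sketch of the testing-context usage (running the case programmatically just for demonstration):

```python
import unittest

class TestSameElements(unittest.TestCase):
    def test_unordered_equal(self):
        a = [1, 2, 3, 1, 2, 3]
        b = [3, 2, 1, 3, 2, 1]
        self.assertCountEqual(a, b)  # passes: same element counts, order ignored

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSameElements)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```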

The best way to do this is by sorting the lists and comparing them. (Using Counter won't work with objects that aren't hashable.) This is straightforward for integers:
sorted(a) == sorted(b)
It gets a little trickier with arbitrary objects. If you care about object identity, i.e., whether the same objects are in both lists, you can use the id() function as the sort key.
sorted(a, key=id) == sorted(b, key=id)
(In Python 2.x you don't actually need the key= parameter, because you can compare any object to any object. The ordering is arbitrary but stable, so it works fine for this purpose; it doesn't matter what order the objects are in, only that the ordering is the same for both lists. In Python 3, though, comparing objects of different types is disallowed in many circumstances -- for example, you can't compare strings to integers -- so if you will have objects of various types, best to explicitly use the object's ID.)
If you want to compare the objects in the list by value, on the other hand, first you need to define what "value" means for the objects. Then you will need some way to provide that as a key (and for Python 3, as a consistent type). One potential way that would work for a lot of arbitrary objects is to sort by their repr(). Of course, this could waste a lot of extra time and memory building repr() strings for large lists and so on.
sorted(a, key=repr) == sorted(b, key=repr)
If the objects are all your own types, you can define __lt__() on them so that the object knows how to compare itself to others. Then you can just sort them and not worry about the key= parameter. Of course you could also define __hash__() and use Counter, which will be faster.
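A sketch of the repr() approach with a hypothetical Item class. Note that the class still needs a value-based __eq__, because after sorting, the final list comparison compares elements with ==:

```python
class Item:
    # Hypothetical class for illustration.
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return f'Item({self.name!r})'
    def __eq__(self, other):
        return isinstance(other, Item) and self.name == other.name

a = [Item('b'), Item('a')]
b = [Item('a'), Item('b')]
print(sorted(a, key=repr) == sorted(b, key=repr))  # True
```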

If the comparison is to be performed in a testing context, use assertCountEqual(a, b) (py >= 3.2) or assertItemsEqual(a, b) (2.7 <= py < 3.2).
Works on sequences of unhashable objects too.

If the list contains items that are not hashable (such as a list of objects) you might be able to use the Counter Class and the id() function such as:
from collections import Counter
...
if Counter(map(id, a)) == Counter(map(id, b)):
    print("Lists a and b contain the same objects")
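Keep in mind that mapping to id() checks object identity, not equality: the very same instances must appear in both lists. A quick sketch with a made-up class:

```python
from collections import Counter

class Thing:
    pass  # hypothetical object compared by identity

x, y = Thing(), Thing()
a = [x, y, x]
b = [y, x, x]
print(Counter(map(id, a)) == Counter(map(id, b)))  # True: same objects, any order
print(Counter(map(id, a)) == Counter(map(id, [x, y, Thing()])))  # False: third object is distinct
```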

Let a, b be lists:
def ass_equal(a, b):
    try:
        for x in b:            # try to remove all the elements of b from a
            a.pop(a.index(x))  # index() raises ValueError if x is not in a
        return len(a) == 0     # if a is empty, b has removed them all
    except ValueError:
        return False           # b failed to remove some items from a
No need to make them hashable or sort them. Note that this mutates a, so pass list(a) if you need to keep the original.

I hope the below piece of code might work in your case:
if len(a) == len(b) and all(i in a for i in b):
    print('True')
else:
    print('False')
This checks that both lists have the same length and that every element of b appears somewhere in a, regardless of order. Caveat: it ignores multiplicities, so a = [1, 1, 2] and b = [1, 2, 2] print 'True' even though they differ.

You can write your own function to compare the lists.
Let's get two lists.
list_1 = ['John', 'Doe']
list_2 = ['Doe', 'Joe']
Firstly, we define an empty dictionary, count the list items, and write the counts into the dictionary.
def count_list(list_items):
    empty_dict = {}
    for list_item in list_items:
        list_item = list_item.strip()
        if list_item not in empty_dict:
            empty_dict[list_item] = 1
        else:
            empty_dict[list_item] += 1
    return empty_dict
After that, we'll compare both lists by using the following function.
def compare_list(list_1, list_2):
    if count_list(list_1) == count_list(list_2):
        return True
    return False

compare_list(list_1, list_2)

from collections import defaultdict

def _list_eq(a: list, b: list) -> bool:
    if len(a) != len(b):
        return False
    b_set = set(b)
    a_map = defaultdict(int)
    b_map = defaultdict(int)
    for item1, item2 in zip(a, b):
        if item1 not in b_set:
            return False
        a_map[item1] += 1
        b_map[item2] += 1
    return a_map == b_map
Sorting can be quite slow if the data is highly unordered (timsort is extra good when the items have some degree of ordering). Sorting both also requires fully iterating through both lists.
Rather than mutating a list, just allocate a set and do a left-->right membership check, keeping a count of how many of each item exist along the way:
If the two lists are not the same length you can short circuit and return False immediately.
If you hit any item in list a that isn't in list b you can return False
If you get through all items then you can compare the values of a_map and b_map to find out if they match.
This allows you to short-circuit in many cases long before you've iterated both lists.

plug in this:
import collections

def lists_equal(l1: list, l2: list) -> bool:
    """
    Compare two lists as multisets.
    ref:
    - https://stackoverflow.com/questions/9623114/check-if-two-unordered-lists-are-equal
    - https://stackoverflow.com/questions/7828867/how-to-efficiently-compare-two-unordered-lists-not-sets
    """
    compare = lambda x, y: collections.Counter(x) == collections.Counter(y)
    set_comp = set(l1) == set(l2)      # removes duplicates, so it can be True when the lists differ
    multiset_comp = compare(l1, l2)    # true multiset comparison
    return set_comp and multiset_comp  # set_comp is here in case the compare function doesn't work

Related


partition iterator in two based on attribute

I have a list of objects from which I have filtered out those that have a particular value in a single attribute:
import itertools
iterator = itertools.ifilter(lambda record: record.outcome==1, list)
The iterator has now all objects with outcome = 1
However, they now differ in the value of another attribute order=X
I would like to partition the iterator in two: one for those objects whose order = 1
and another one for those with order > 1
Is there a another way than looping over all elements and adding them to one of the two lists / a list comprehension?
Say I have a list l with obj1,obj2,obj3 as content where obj1.order=1, obj2.order=2 and obj3.order=3 I would like to yield a # containing obj1
and b # containing obj2 & obj3
Preferably, I would like to have two other iterators so that I can do with the partial lists whatever I would like to do!
I was thinking about itertools.groupby but as my variable order=X has a number of possible values, it would give me more than two sub-iterators!
Use tee and ifilter. You don't absolutely need the tee, but it potentially makes it more efficient if the original iterator is expensive.
import itertools
iter_outcome = itertools.ifilter(lambda record: record.outcome==1, list)
a, b = itertools.tee(iter_outcome, 2)
iter_order_1 = itertools.ifilter(lambda record: record.order == 1, a)
iter_greater_1 = itertools.ifilter(lambda record: record.order > 1, b)
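In Python 3, itertools.ifilter is gone and the built-in filter() is already lazy, so the same split can be sketched like this (the Record class is a made-up stand-in for the asker's objects):

```python
from itertools import tee

class Record:
    # Hypothetical stand-in for the asker's class instances.
    def __init__(self, outcome, order):
        self.outcome = outcome
        self.order = order

records = [Record(1, 1), Record(0, 1), Record(1, 2), Record(1, 3)]
matched = filter(lambda r: r.outcome == 1, records)
a, b = tee(matched)  # two independent iterators over the filtered records
iter_order_1 = filter(lambda r: r.order == 1, a)
iter_greater_1 = filter(lambda r: r.order > 1, b)

print([r.order for r in iter_order_1])    # [1]
print([r.order for r in iter_greater_1])  # [2, 3]
```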

How do I determine whether a container is infinitely recursive and find its smallest unique container?

I was reading Flatten (an irregular) list of lists and decided to adopt it as a Python exercise - a small function I'll occasionally rewrite without referring to the original, just for practice. The first time I tried this, I had something like the following:
def flat(iterable):
    try:
        iter(iterable)
    except TypeError:
        yield iterable
    else:
        for item in iterable:
            yield from flat(item)
This works fine for basic structures like nested lists containing numbers, but strings crash it because the first element of a string is a single-character string, the first element of which is itself, the first element of which is itself again, and so on. Checking the question linked above, I realized that that explains the check for strings. That gave me the following:
def flatter(iterable):
    try:
        iter(iterable)
        if isinstance(iterable, str):
            raise TypeError
    except TypeError:
        yield iterable
    else:
        for item in iterable:
            yield from flatter(item)
Now it works for strings as well. However, I then recalled that a list can contain references to itself.
>>> lst = []
>>> lst.append(lst)
>>> lst
[[...]]
>>> lst[0][0][0][0] is lst
True
So, a string isn't the only type that could cause this sort of problem. At this point, I started looking for a way to guard against this issue without explicit type-checking.
The following flattener.py ensued. flattish() is a version that just checks for strings. flatten_notype() checks whether an object's first item's first item is equal to itself to determine recursion. flatten() does this and then checks whether either the object or its first item's first item is an instance of the other's type. The Fake class basically just defines a wrapper for sequences. The comments on the lines that test each function describe the results, in the form should be `desired_result` [> `undesired_actual_result`]. As you can see, each fails in various ways on Fake wrapped around a string, Fake wrapped around a list of integers, single-character strings, and multiple-character strings.
def flattish(*i):
    for item in i:
        try: iter(item)
        except: yield item
        else:
            if isinstance(item, str): yield item
            else: yield from flattish(*item)

class Fake:
    def __init__(self, l):
        self.l = l
        self.index = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.index >= len(self.l):
            raise StopIteration
        else:
            self.index += 1
            return self.l[self.index-1]
    def __str__(self):
        return str(self.l)

def flatten_notype(*i):
    for item in i:
        try:
            n = next(iter(item))
            try:
                n2 = next(iter(n))
                recur = n == n2
            except TypeError:
                yield from flatten_notype(*item)
            else:
                if recur:
                    yield item
                else:
                    yield from flatten_notype(*item)
        except TypeError:
            yield item

def flatten(*i):
    for item in i:
        try:
            n = next(iter(item))
            try:
                n2 = next(iter(n))
                recur = n == n2
            except TypeError:
                yield from flatten(*item)
            else:
                if recur:
                    yield item if isinstance(n2, type(item)) or isinstance(item, type(n2)) else n2
                else:
                    yield from flatten(*item)
        except TypeError:
            yield item
f = Fake('abc')
print(*flattish(f)) # should be `abc`
print(*flattish((f,))) # should be `abc` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake([1, 2, 3])
print(*flattish(f)) # should be `1 2 3`
print(*flattish((f,))) # should be `1 2 3` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake('abc')
print(*flatten_notype(f)) # should be `abc`
print(*flatten_notype((f,))) # should be `abc` > `c`
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`
f = Fake([1, 2, 3])
print(*flatten_notype(f)) # should be `1 2 3` > `2 3`
print(*flatten_notype((f,))) # should be `1 2 3` > ``
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`
f = Fake('abc')
print(*flatten(f)) # should be `abc` > `a`
print(*flatten((f,))) # should be `abc` > `c`
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake([1, 2, 3])
print(*flatten(f)) # should be `1 2 3` > `2 3`
print(*flatten((f,))) # should be `1 2 3` > ``
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`
I've also tried the following with the recursive lst defined above and flatten():
>>> print(*flatten(lst))
[[...]]
>>> lst.append(0)
>>> print(*flatten(lst))
[[...], 0]
>>> print(*list(flatten(lst))[0])
[[...], 0] 0
As you can see, it fails similarly to 1 ('a',) bc as well as in its own special way.
I read how can python function access its own attributes? thinking that maybe the function could keep track of every object it had seen, but that wouldn't work either because our lst contains an object with matching identity and equality, strings contain objects that may only have matching equality, and equality isn't enough due to the possibility of something like flatten([1, 2], [1, 2]).
Is there any reliable way (i.e. doesn't simply check known types, doesn't require that a recursive container and its containers all be of the same type, etc.) to check whether a container holds iterable objects with potential infinite recursion, and reliably determine the smallest unique container? If there is, please explain how it can be done, why it is reliable, and how it handles various recursive circumstances. If not, please explain why this is logically impossible.
I don't think there's a reliable way to find out if an arbitrary iterable is infinite. The best we can do is yield primitives infinitely from such an iterable without exhausting the stack, for example:
from collections import deque
def flat(iterable):
    d = deque([iterable])
    def _primitive(x):
        return type(x) in (int, float, bool, str, unicode)
    def _next():
        x = d.popleft()
        if _primitive(x):
            return True, x
        d.extend(x)
        return False, None
    while d:
        ok, x = _next()
        if ok:
            yield x

xs = [1, [2], 'abc']
xs.insert(0, xs)
for p in flat(xs):
    print p
The above definition of "primitive" is, well, primitive, but that surely can be improved.
The scenario you ask about is very loosely defined. As defined in your question, it is logically impossible "to check whether a container holds iterable objects with potential infinite recursion[.]" The only limit on the scope of your question is "iterable" object. The official Python documentation defines "iterable" as follows:
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. [...]
The key phrase here is "any classes [defined] with an __iter__() or __getitem__() method." This allows for "iterable" objects with members that are generated on demand. For example, suppose that someone seeks to use a bunch of string objects that automatically sort and compare in chronological order based on the time at which the particular string was created. They either subclass str or reimplement its functionality, adding a timestamp associated with each pointer to a timestampedString() object, and adjust the comparison methods accordingly.
Accessing a substring by index location is a way of creating a new string, so a timestampedString() of len() == 1 could legitimately return a timestampedString() of len() == 1 with the same character but a new timestamp when you access timestampedString()[0:1]. Because the timestamp is part of the specific object instance, there is no kind of identity test that would say that the two objects are the same unless any two strings consisting of the same character are considered to be the same. You state in your question that this should not be the case.
To detect infinite recursion, you first need to add a constraint to the scope of your question that the container only contain static, i.e. pre-generated, objects. With this constraint, any legal object in the container can be converted to some byte-string representation of the object. A simple way to do this would be to pickle each object in the container as you reach it, and maintain a stack of the byte-string representations that result from pickling. If you allow any arbitrary static object, nothing less than a raw-byte interpretation of the objects is going to work.
However, algorithmically enforcing the constraint that the container only contain static objects presents another problem: it requires type-checking against some pre-approved list of types such as some notion of primitives. Two categories of objects can then be accommodated: single objects of a known-static type (e.g. primitives) and containers for which the number of contained items can be determined in advance. The latter category can then be shown to be finite when that many contained objects have been iterated through and all have been shown to be finite. Containers within the container can be handled recursively. The known-static type single objects are the recursive base-case.
If the container produces more objects, then it violates the definition of this category of object. The problem with allowing arbitrary objects in Python is that these objects can be defined in Python code that can use components written in C code and any other language that C can be linked to. There is no way to evaluate this code to determine if it actually complies with the static requirement.
There's an issue with your test code that's unrelated to the recursive container issue you're trying to solve. The issue is that your Fake class is an iterator and can only be used once. After you iterate over all its values, it will always raise StopIteration when you try to iterate on it again.
So if you do multiple operations on the same Fake instance, you shouldn't expect to get anything be empty output after the first operation has consumed the iterator. If you recreate the iterator before each operation, you won't have that problem (and you can actually try addressing the recursion issue).
So on to that issue. One way to avoid infinite recursion is to maintain a stack with the objects that you're currently nested in. If the next value you see is already on the stack somewhere, you know it's recursive and can skip it. Here's an implementation of this using a list as the stack:
def flatten(obj, stack=None):
    if stack is None:
        stack = []
    if obj in stack:
        yield obj
        return  # already nested inside this object: stop to avoid infinite recursion
    try:
        it = iter(obj)
    except TypeError:
        yield obj
    else:
        stack.append(obj)
        for item in it:
            yield from flatten(item, stack)
        stack.pop()
Note that this can still yield values from the same container more than once, as long as it's not nested within itself (e.g. for x=[1, 2]; y=[x, 3, x]; print(*flatten(y)) will print 1 2 3 1 2).
It also does recurse into strings, but it will only do so for only one level, so flatten("foo") will yield the letters 'f', 'o' and 'o' in turn. If you want to avoid that, you probably do need the function to be type aware, since from the iteration protocol's perspective, a string is not any different than an iterable container of its letters. It's only single character strings that recursively contain themselves.
What about something like this:
def flat(obj, used=[], old=None):
    # This is to get inf. recurrences
    if obj == old:
        if obj not in used:
            used.append(obj)
            yield obj
        return  # PEP 479: `raise StopIteration` inside a generator is an error in Python 3.7+
    try:
        # Get strings
        if isinstance(obj, str):
            raise TypeError
        # Try to iterate the obj
        for item in obj:
            yield from flat(item, used, obj)
    except TypeError:
        # Get non-iterable items
        if obj not in used:
            used.append(obj)
            yield obj
After a finite number of (recursion) steps a list will contain at most itself as an iterable element (since we have to generate it in finitely many steps). That's what we test for with obj == old, where obj is an element of old.
The list used keeps track of all elements, since we want each element only once. We could remove it, but we'd get an ugly (and, more importantly, not well-defined) behaviour regarding which elements get yielded how often.
Drawback is that we store the entire list at the end in the list used...
Testing this with some lists seems to work:
>>> lst = [1]
>>> lst.append(lst)
>>> print('\nList1: ', lst)
>>> print('Elements: ', [x for x in flat(lst)])

List1:  [1, [...]]
Elements:  [1, [1, [...]]]
# We'd need to reset `used` here, since the mutable default persists between calls!
>>> lst2 = []
>>> lst2.append(lst2)
>>> lst2.append((1, 'ab'))
>>> lst2.append(lst)
>>> lst2.append(3)
>>> print('\nList2: ', lst2)
>>> print('Elements: ', [x for x in flat(lst2)])

List2:  [[...], (1, 'ab'), [1, [...]], 3]
Elements:  [[[...], (1, 'ab'), [1, [...]], 3], 1, 'ab', [1, [...]], 3]
Note: It actually makes sense that the infinite lists [[...], (1, 'ab'), [1, [...]], 3] and [1, [...]] are considered as elements since these actually contain themselves but if that's not desired one can comment out the first yield in the code above.
Just avoid flattening recurring containers. In the example below keepobj keeps track of them and keepcls ignores containers of a certain type. I believe this works down to python 2.3.
def flatten(item, keepcls=(), keepobj=()):
    if not hasattr(item, '__iter__') or isinstance(item, keepcls) or item in keepobj:
        yield item
    else:
        for i in item:
            for j in flatten(i, keepcls, keepobj + (item,)):
                yield j
It can flatten circular lists like lst = [1, 2, [5, 6, {'a': 1, 'b': 2}, 7, 'string'], [...]] and keep some containers like strings and dicts un-flattened.
>>> list(flatten(lst, keepcls=(dict, str)))
[1, 2, 5, 6, {'a': 1, 'b': 2}, 7, 'string', [1, 2, [5, 6, {'a': 1, 'b': 2}, 7, 'string'], [...]]]
It also works with the following case:
>>> list(flatten([[1,2],[1,[1,2]],[1,2]]))
[1, 2, 1, 1, 2, 1, 2]
You may want to keep some default classes in keepcls to make calling
the function more terse.

Is there a short contains function for lists?

Given a list xs and a value item, how can I check whether xs contains item (i.e., if any of the elements of xs is equal to item)? Is there something like xs.contains(item)?
For performance considerations, see Fastest way to check if a value exists in a list.
Use:
if my_item in some_list:
    ...
Also, the inverse operation:
if my_item not in some_list:
    ...
It works fine for lists, tuples, sets and dicts (check keys).
Note that this is an O(n) operation in lists and tuples, but an O(1) operation in sets and dicts.
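A quick sketch of in across the built-in containers:

```python
xs = [1, 2, 3]
print(2 in xs)            # True
print(5 not in xs)        # True
print('b' in ('a', 'b'))  # True: works for tuples too

d = {'a': 1, 'b': 2}
print('a' in d)           # True: dicts check keys...
print(1 in d)             # False: ...not values

print(3 in {1, 2, 3})     # True: O(1) on average for sets
```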
In addition to what others have said, you may also be interested to know that what in does is call the list.__contains__ method, which you can define on any class you write and which can get extremely handy for using Python to its full extent.
A dumb use may be:
>>> class ContainsEverything:
...     def __init__(self):
...         return None
...     def __contains__(self, *elem, **k):
...         return True
...
>>> a = ContainsEverything()
>>> 3 in a
True
>>> a in a
True
>>> False in a
True
>>> False not in a
False
I came up with this one-liner recently for getting True if a list contains any number of occurrences of an item, or False if it contains none at all. Using next(...) gives this a default return value (False), and it stops at the first match instead of scanning the whole list.
list_does_contain = next((True for item in list_to_test if item == test_item), False)
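For comparison, the built-in any() expresses the same short-circuiting check more idiomatically, and the plain in operator covers the common case where == is the comparison you want:

```python
list_to_test = [3, 1, 4, 1, 5]
test_item = 4

list_does_contain = next((True for item in list_to_test if item == test_item), False)
print(list_does_contain)  # True

# Equivalent, and arguably clearer:
print(any(item == test_item for item in list_to_test))  # True
print(test_item in list_to_test)                        # True
```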
The list method index() raises a ValueError if the item is not present (it does not return -1; that behaviour belongs to str.find()), and returns the index of the item in the list if it is present. Alternatively, in an if statement you can do the following:
if myItem in list:
    # do things
You can also check if an element is not in a list with the following if statement:
if myItem not in list:
    # do things
There is also the list method:
[2, 51, 6, 8, 3].__contains__(8)
# Out[33]: True
[2, 51, 6, 3].__contains__(8)
# Out[33]: False
There is one other method that uses index(). One fault to be aware of: a bare except would also hide unrelated errors, so catch ValueError specifically (and avoid shadowing the built-in name list).
lst = [5, 4, 3, 1]
try:
    lst.index(2)
    # code for when the item is expected to be in the list
    print("present")
except ValueError:
    # code for when the item is not in the list
    print("not present")
Output:
not present

How to uniformly handle A LIST or A LIST OF LIST (multi-dimensional list)

Without any heavy libraries such as numpy, I want to uniformly handle a single list or multi-dimensional list in my code. For example, the function sum_up(list_or_matrix) should
return 6 for argument [1, 2, 3] and return 9 for [[1, 2, 3], [1, 2, 0]].
My question is:
1. Can I code in a way without explicitly detecting the dimension of my input such as by isinstance(arg[0], (tuple, list))?
2. If I have to do so, is there any elegant way of detecting the dimension of a list (of list of list ...), e.g. recursively?
As many users suggested, you can always use a dict instead of a list for an any-dimensional collection. Dictionaries accept tuples as keys, since tuples are hashable. So you can easily fill up your collection like:
>>> m = {}
>>> m[1] = 1
>>> m[1,2] = 12
>>> m[1,2,"three",4.5] = 12345
>>> sum(m.values()) #better use m.itervalues() in python 2.*
12358
You can solve this problem using recursion, like this:
#!/usr/bin/env python

def sum(seq_or_elem):
    if hasattr(seq_or_elem, '__iter__'):
        # We were passed a sequence, so iterate over it, summing the elements.
        total = 0
        for i in seq_or_elem:
            total += sum(i)
        return total
    else:
        # We were passed an atomic element; the sum is identical to the passed value.
        return seq_or_elem
Test:
>>> print(sum([1, 2, [3, [4]], [], 5]))
15
Well, I don't see a way if you are planning to use a single function to sum up your list like sum_up(list_or_matrix).
If you have a list of lists, I would imagine you need to loop through the list to find out whether it's a 1-D or a 2-D list. Anyway, what's wrong with looping?
def sum_up(matrix):
    is_2d = False
    for item in matrix:
        if type(item) == list:
            is_2d = True
    if is_2d:
        return sum(sum(row) for row in matrix)  # sum a 2-D matrix
    else:
        return sum(matrix)                      # sum a 1-D matrix
A simple way to sum up a matrix is as follows:
def sum_up(matrix):
    if isinstance(matrix[0], (tuple, list)):
        return sum(sum(x) for x in matrix)  # 2-D: sum each row, then the row sums
    else:
        return sum(matrix)                  # 1-D: sum directly
The 2-D branch uses a generator expression, a powerful and quick tool.
You could sum recursively until you have a scalar value:
def flatten(x):
    if isinstance(x, list):
        return sum(map(flatten, x))
    return x
Note: you can use collections.abc.Iterable (or another base class) instead of list, depending on what you want to flatten.
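Following that note, a sketch using collections.abc.Iterable, with a guard for strings (which are iterable but usually should be treated as scalars); the name flatten_sum is mine:

```python
from collections.abc import Iterable

def flatten_sum(x):
    # Recurse into any non-string iterable; return scalars as-is.
    if isinstance(x, Iterable) and not isinstance(x, str):
        return sum(map(flatten_sum, x))
    return x

print(flatten_sum([1, 2, 3]))               # 6
print(flatten_sum([[1, 2, 3], [1, 2, 0]]))  # 9
print(flatten_sum([1, (2, [3])]))           # 6
```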
