partition iterator in two based on attribute

I have a list of objects from which I have filtered out those that have a particular value in a single attribute:
import itertools
iterator = itertools.ifilter(lambda record: record.outcome==1, list)
The iterator has now all objects with outcome = 1
However, they now differ in the value of another attribute order=X
I would like to partition the iterator in two: one for those objects whose order = 1
and another one for those with order > 1
Is there another way than looping over all elements and adding them to one of two lists, or using a list comprehension?
Say I have a list l containing obj1, obj2 and obj3, where obj1.order=1, obj2.order=2 and obj3.order=3. I would like to get a (containing obj1) and b (containing obj2 and obj3).
Preferably, I would like to get two further iterators so that I can do whatever I like with the partial lists.
I was thinking about itertools.groupby but as my variable order=X has a number of possible values, it would give me more than two sub-iterators!

Use tee and ifilter. You don't absolutely need the tee, but it potentially makes it more efficient if the original iterator is expensive.
import itertools
iter_outcome = itertools.ifilter(lambda record: record.outcome==1, list)
a, b = itertools.tee(iter_outcome, 2)
iter_order_1 = itertools.ifilter(lambda record: record.order == 1, a)
iter_greater_1 = itertools.ifilter(lambda record: record.order > 1, b)
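On Python 3, itertools.ifilter no longer exists because the built-in filter is already lazy. The following is a minimal sketch of the same tee-based approach under that assumption; the Record namedtuple and the sample data are hypothetical stand-ins for the asker's objects.

```python
import itertools
from collections import namedtuple

# Hypothetical record type standing in for the asker's objects.
Record = namedtuple("Record", ["name", "outcome", "order"])

records = [
    Record("obj1", 1, 1),
    Record("obj2", 1, 2),
    Record("obj3", 1, 3),
    Record("obj4", 0, 1),  # dropped by the first filter: outcome != 1
]

# Built-in filter() is lazy in Python 3, like ifilter was in Python 2.
iter_outcome = filter(lambda record: record.outcome == 1, records)

# tee() returns two independent iterators over the same underlying stream.
a, b = itertools.tee(iter_outcome, 2)
iter_order_1 = filter(lambda record: record.order == 1, a)
iter_greater_1 = filter(lambda record: record.order > 1, b)

order_1_names = [r.name for r in iter_order_1]
greater_1_names = [r.name for r in iter_greater_1]
print(order_1_names)    # ['obj1']
print(greater_1_names)  # ['obj2', 'obj3']
```

Note that tee buffers every item one iterator has seen and the other has not, so consuming one branch completely before touching the other keeps the whole stream in memory.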

Related

How to compare two list of dictionaries [duplicate]

a = [1, 2, 3, 1, 2, 3]
b = [3, 2, 1, 3, 2, 1]
a & b should be considered equal, because they have exactly the same elements, only in different order.
The thing is, my actual lists will consist of objects (my class instances), not integers.

O(n): The Counter() method is best (if your objects are hashable):
from collections import Counter

def compare(s, t):
    return Counter(s) == Counter(t)
O(n log n): The sorted() method is next best (if your objects are orderable):
def compare(s, t):
    return sorted(s) == sorted(t)
O(n * n): If the objects are neither hashable, nor orderable, you can use equality:
def compare(s, t):
    t = list(t)  # make a mutable copy
    try:
        for elem in s:
            t.remove(elem)  # raises ValueError if elem is not in t
    except ValueError:
        return False
    return not t  # True only if nothing is left over in t

You can sort both:
sorted(a) == sorted(b)
A counting sort could also be more efficient (but it requires the object to be hashable).
>>> from collections import Counter
>>> a = [1, 2, 3, 1, 2, 3]
>>> b = [3, 2, 1, 3, 2, 1]
>>> print (Counter(a) == Counter(b))
True

If you know the items are always hashable, you can use a Counter() which is O(n)
If you know the items are always sortable, you can use sorted() which is O(n log n)
In the general case you can't rely on being able to sort or hash the elements, so you need a fallback like this, which is unfortunately O(n^2):
len(a)==len(b) and all(a.count(i)==b.count(i) for i in a)

If you have to do this in tests:
https://docs.python.org/3.5/library/unittest.html#unittest.TestCase.assertCountEqual
assertCountEqual(first, second, msg=None)
Test that sequence first contains the same elements as second, regardless of their order. When they don’t, an error message listing the differences between the sequences will be generated.
Duplicate elements are not ignored when comparing first and second. It verifies whether each element has the same count in both sequences. Equivalent to: assertEqual(Counter(list(first)), Counter(list(second))) but works with sequences of unhashable objects as well.
New in version 3.2.
or in 2.7:
https://docs.python.org/2.7/library/unittest.html#unittest.TestCase.assertItemsEqual
Outside of tests I would recommend the Counter method.
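As a rough illustration of the test-context option, here is a small sketch using assertCountEqual on unhashable objects. The Point class and the test names are hypothetical and exist only for this example.

```python
import unittest


class Point:
    """Hypothetical value type that defines __eq__ but no __hash__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    # Defining __eq__ without __hash__ sets __hash__ to None,
    # so Point instances cannot be counted with Counter or put in a set.


class SameElementsTest(unittest.TestCase):
    def test_same_elements_any_order(self):
        a = [Point(1, 2), Point(3, 4), Point(1, 2)]
        b = [Point(3, 4), Point(1, 2), Point(1, 2)]
        self.assertCountEqual(a, b)

    def test_counts_matter(self):
        # Same distinct elements, different multiplicities: must fail.
        a = [Point(1, 2), Point(1, 2), Point(3, 4)]
        b = [Point(1, 2), Point(3, 4), Point(3, 4)]
        with self.assertRaises(AssertionError):
            self.assertCountEqual(a, b)


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SameElementsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
print(result.wasSuccessful())
```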

The best way to do this is by sorting the lists and comparing them. (Using Counter won't work with objects that aren't hashable.) This is straightforward for integers:
sorted(a) == sorted(b)
It gets a little trickier with arbitrary objects. If you care about object identity, i.e., whether the same objects are in both lists, you can use the id() function as the sort key.
sorted(a, key=id) == sorted(b, key=id)
(In Python 2.x you don't actually need the key= parameter, because you can compare any object to any object. The ordering is arbitrary but stable, so it works fine for this purpose; it doesn't matter what order the objects are in, only that the ordering is the same for both lists. In Python 3, though, comparing objects of different types is disallowed in many circumstances -- for example, you can't compare strings to integers -- so if you will have objects of various types, best to explicitly use the object's ID.)
If you want to compare the objects in the list by value, on the other hand, first you need to define what "value" means for the objects. Then you will need some way to provide that as a key (and for Python 3, as a consistent type). One potential way that would work for a lot of arbitrary objects is to sort by their repr(). Of course, this could waste a lot of extra time and memory building repr() strings for large lists and so on.
sorted(a, key=repr) == sorted(b, key=repr)
If the objects are all your own types, you can define __lt__() on them so that the object knows how to compare itself to others. Then you can just sort them and not worry about the key= parameter. Of course you could also define __hash__() and use Counter, which will be faster.
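A minimal sketch of that last suggestion, assuming a hypothetical Item class whose "value" is its (name, price) pair. It defines __eq__, __lt__ and __hash__ so that both the sorted() and the Counter approach work.

```python
from collections import Counter
from functools import total_ordering


@total_ordering
class Item:
    """Hypothetical class used only for illustration."""

    def __init__(self, name, price):
        self.name = name
        self.price = price

    def _key(self):
        # The tuple that defines what "value" means for an Item.
        return (self.name, self.price)

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Must agree with __eq__: equal items hash equally.
        return hash(self._key())


a = [Item("pen", 2), Item("ink", 5), Item("pen", 2)]
b = [Item("ink", 5), Item("pen", 2), Item("pen", 2)]
c = [Item("ink", 5), Item("ink", 5), Item("pen", 2)]

same_by_sort = sorted(a) == sorted(b)     # uses __lt__ and __eq__
same_by_count = Counter(a) == Counter(b)  # uses __hash__ and __eq__
same_as_c = Counter(a) == Counter(c)      # counts differ
print(same_by_sort, same_by_count, same_as_c)  # True True False
```

Objects used this way should not have their name or price changed while they sit in a Counter, since the hash is derived from those fields.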

If the comparison is to be performed in a testing context, use assertCountEqual(a, b) (py>=3.2) or assertItemsEqual(a, b) (2.7<=py<3.2).
Works on sequences of unhashable objects too.

If the list contains items that are not hashable (such as a list of objects) you might be able to use the Counter Class and the id() function such as:
from collections import Counter
...
if Counter(map(id, a)) == Counter(map(id, b)):
    print("Lists a and b contain the same objects")

Let a and b be lists:
def ass_equal(a, b):
    a = list(a)  # work on a copy so the caller's list is not mutated
    try:
        for x in b:
            a.remove(x)  # try to remove each element of b from a; raises ValueError on failure
    except ValueError:
        return False  # b failed to remove some items from a
    return not a  # if a is empty, b has removed them all
No need to make them hashable or sort them.

I hope the below piece of code might work in your case :-
if len(a) == len(b) and all(i in a for i in b):
    print('True')
else:
    print('False')
This checks that both lists have the same length and that every element of b also occurs in a, regardless of order. It does not compare how often each element occurs, so lists such as [1, 1, 2] and [1, 2, 2] also pass.
For better understanding, refer to my answer in this question

You can write your own function to compare the lists.
Let's get two lists.
list_1=['John', 'Doe']
list_2=['Doe','Joe']
Firstly, we define an empty dictionary, count the list items and write in the dictionary.
def count_list(list_items):
    empty_dict = {}
    for list_item in list_items:
        list_item = list_item.strip()
        if list_item not in empty_dict:
            empty_dict[list_item] = 1
        else:
            empty_dict[list_item] += 1
    return empty_dict
After that, we'll compare both lists by using the following function.
def compare_list(list_1, list_2):
    if count_list(list_1) == count_list(list_2):
        return True
    return False

compare_list(list_1, list_2)

from collections import defaultdict

def _list_eq(a: list, b: list) -> bool:
    if len(a) != len(b):
        return False
    b_set = set(b)
    a_map = defaultdict(lambda: 0)
    b_map = defaultdict(lambda: 0)
    for item1, item2 in zip(a, b):
        if item1 not in b_set:
            return False
        a_map[item1] += 1
        b_map[item2] += 1
    return a_map == b_map
Sorting can be quite slow if the data is highly unordered (timsort is extra good when the items have some degree of ordering). Sorting both also requires fully iterating through both lists.
Rather than mutating a list, just allocate a set and do a left-->right membership check, keeping a count of how many of each item exist along the way:
If the two lists are not the same length you can short circuit and return False immediately.
If you hit any item in list a that isn't in list b you can return False
If you get through all items then you can compare the values of a_map and b_map to find out if they match.
This allows you to short-circuit in many cases long before you've iterated both lists.

plug in this:
import collections

def lists_equal(l1: list, l2: list) -> bool:
    """
    Return True if l1 and l2 hold the same hashable elements with the same counts.

    ref:
    - https://stackoverflow.com/questions/9623114/check-if-two-unordered-lists-are-equal
    - https://stackoverflow.com/questions/7828867/how-to-efficiently-compare-two-unordered-lists-not-sets
    """
    compare = lambda x, y: collections.Counter(x) == collections.Counter(y)
    set_comp = set(l1) == set(l2)  # removes duplicates, so on its own it sometimes returns True for unequal lists :(
    multiset_comp = compare(l1, l2)  # approximates multiset
    return set_comp and multiset_comp  # set_comp is here in case the compare function doesn't work

Is there another way besides "all" to check if all elements values before my target element are true?

I'm trying to return True only if all the previous elements, up to the current position, are true.
I have it set up with the all function, but I don't want to code it this way.
def check(lightsOnOff, light):
    # walk over the states of the lights before position `light`
    for on in list(lightsOnOff.values())[:light]:
        if not on:
            return False
    return True

In general, all is a useful construct, but I can see why it looks wrong in this expression:
all(list(lightsOnOff.values())[:light])
but the smelly part is actually the list(iterable)[:number] construction, which forces construction of the whole list then truncates it.
As an important aside, if lightsOnOff is a plain dict (not e.g. an OrderedDict) on a Python version older than 3.7, the iteration order is not guaranteed and your code will be non-deterministic (see notes at bottom).
If you don't want to create a list and slice it, you can leverage itertools:
from itertools import islice
...
all(islice(lightsOnOff.values(), light))
As a frame challenge, if your dict has an order and you know the keys, you can simply rewrite it as:
all(lightsOnOff[k] for k in keys[:light])
and if your dict has keys that are ordered and e.g. integers, just use a list?
all(listOfLights[:light])
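To make the islice variant concrete, here is a small sketch. It assumes Python 3.7 or later, where a plain dict preserves insertion order; the lights_on_off data and the all_on_before helper are hypothetical names chosen for this example.

```python
from itertools import islice

# Hypothetical data: light name -> on/off state, in insertion order.
lights_on_off = {"hall": True, "kitchen": True, "porch": False, "attic": True}


def all_on_before(lights, position):
    """Return True if the first `position` values of `lights` are all truthy."""
    # islice stops after `position` items without building a full list,
    # and all() stops at the first falsy value it sees.
    return all(islice(lights.values(), position))


print(all_on_before(lights_on_off, 2))  # True: hall and kitchen are on
print(all_on_before(lights_on_off, 3))  # False: porch is off
print(all_on_before(lights_on_off, 0))  # True: nothing to check
```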

Provided you want to implement all yourself on an arbitrary list, you can do something like:
my_list = [1, 7, 2, 1, None, 2, 3]
up_to_ix = 5
def my_all(some_list, up_to_index):
    for element in some_list[:up_to_index]:
        if not element:
            return False
    return True

my_all(my_list, up_to_ix)
The function loops through the elements of the list up to, but excluding, up_to_index; if it finds at least one falsy value it returns False, otherwise True.

Priority queue with two priority values

As is well known, elements inserted into a priority queue have a value that determines their priority. For example, say I have five elements A, B, C, D, E with priorities (let's call these priority values priorityI):
A = 10, B = 5, C = 1, D = 3, E = 2.
But how can I write a priority queue where I can define two priority values? I mean:
if two elements have the same value of priorityI, then the value of priorityII decides which element should be taken first. For example:
element A has priorityI = 3 and priorityII = 5
element B has priorityI = 3 and priorityII = 1
then element B will be taken from the queue first.

Starting from Python2.6, you can use Queue.PriorityQueue.
Items inserted into the queue are sorted based on their __cmp__ method, so just implement one for the class whose objects are to be inserted into the queue.
Note that if your items consist of tuples of objects, you don't need to implement a container class for the tuple, as the built in tuple comparison implementation probably fits your needs, as stated above (pop the lower value item first). Though, you might need to implement the __cmp__ method for the class whose objects reside in the tuple.
>>> from Queue import PriorityQueue
>>> priority_queue = PriorityQueue()
>>> priority_queue.put((1, 2))
>>> priority_queue.put((1, 1))
>>> priority_queue.get()
(1, 1)
>>> priority_queue.get()
(1, 2)
EDIT: As @Blckknght noted, if your priority queue is only going to be used by a single thread, the heapq module, available from Python2.3, is the preferred solution. If so, please refer to his answer.
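On Python 3 the module is called queue, and ordering relies on rich comparison methods such as __lt__ instead of __cmp__. The sketch below shows one way to do the same thing there; the Job dataclass and its field names are hypothetical and chosen only for this example.

```python
import queue
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    """Hypothetical item type; fields are compared in declaration order."""
    priority_1: int
    priority_2: int
    # compare=False keeps the payload out of ordering comparisons.
    name: str = field(compare=False)


pq = queue.PriorityQueue()
pq.put(Job(3, 5, "Element A"))
pq.put(Job(3, 1, "Element B"))
pq.put(Job(1, 9, "Element C"))

order = []
while not pq.empty():
    order.append(pq.get().name)

print(order)  # ['Element C', 'Element B', 'Element A']
```

Element B comes out before Element A because both share priority_1 == 3 and B has the lower priority_2.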

The usual way to do this is to make your priority value a tuple of your two priorities. Python sorts tuples lexicographically, so it will first compare the first item of each tuple, and only if they are equal will the next items be compared.
The usual way to make a priority queue in Python is using the heapq module's functions to manipulate a list. Since the whole value is compared, we can simply put our two priorities along with a value into a single tuple:
import heapq
q = [] # the queue is a regular list
A = (3, 5, "Element A") # our first item, a three-tuple with two priorities and a value
B = (3, 1, "Element B") # a second item
heapq.heappush(q, A) # push the items into the queue
heapq.heappush(q, B)
print(heapq.heappop(q)[2]) # pop the highest priority item and print its value
This prints "Element B".

Explanation regarding "generator object"

Could someone please explain why this code:
A = [1,2,3,4]
B = ((element) for element in A)
print(B)
produces: <generator object <genexpr> at 0x0319B490>
while this code:
A = [1,2,3,4]
for element in A:
    print(element)
produces:
1
2
3
4
They seem to be the same to me but they are obviously different. I can't figure out the difference between them.
Thanks.

The first code is a generator expression, hence it will create a generator object at a certain memory address. If you want to use list comprehension then use [] as per:
A = [1,2,3,4]
B = [element for element in A]
print(B)
# [1, 2, 3, 4]
This list comprehension is equivalent to:
A = [1,2,3,4]
B = []
for element in A:
    B.append(element)

The first is not a loop but a generator expression, so printing B shows the object reference.
The second one is a loop; it iterates over the elements and prints them all.
Try doing this, you can iterate over a generator:
A = [1,2,3,4]
B = ((element) for element in A)
for e in B:
    print(e)
This will give the same result as your second snippet:
for e in A:
    print(e)
Notice that you can only iterate once; after that the generator is exhausted.
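A short sketch of that exhaustion behaviour, reusing the A and B names from the question:

```python
A = [1, 2, 3, 4]
B = (element for element in A)

first_pass = list(B)   # consumes every value the generator can produce
second_pass = list(B)  # the generator is exhausted, so nothing is left

print(first_pass)   # [1, 2, 3, 4]
print(second_pass)  # []

# A list, by contrast, can be iterated as many times as needed.
C = [element for element in A]
print(list(C) == list(C))  # True
```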

The fundamental difference between the two is that a generator expression defines an object that will generate values as you loop. In other words, the values will be generated on each iteration and consumed on demand. With a list comprehension, the values are created up-front and will consume as much memory as is required to hold all the values in memory at once.
It's easy to look at these two constructs as being the exact same thing, but in the case of the generator you are consuming the values on demand in a lazy way. This is very useful because you don't have to pay the cost of memory to hold all of the data up-front.
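The difference can be observed with a rough sketch like the one below. It uses sys.getsizeof, which reports only the size of the container object itself, and the exact numbers depend on the Python implementation, so only the relative comparison is meaningful here.

```python
import sys

n = 100_000

squares_list = [i * i for i in range(n)]  # all values are built right away
squares_gen = (i * i for i in range(n))   # no values are computed yet

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(gen_size < list_size)  # True: the generator object stays small

# Values are produced one at a time, only when requested.
first = next(squares_gen)
second = next(squares_gen)
total_rest = sum(squares_gen)  # consumes the remaining values lazily
print(first, second)  # 0 1
```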
