Python aggregate on a generator

I have a generator that returns a list in each iteration. Each element of the list could be either 0 or 1. I want to count the total number of elements returned (including both 0 and 1) and the total number of 1s returned. I tried to implement this using the reduce function like this:
t = reduce( (lambda x,y:(y[0]+1,y[1]+x)), gen_fn(), (0,0))
gen_fn() above is the generator that returns part of the list in each yield statement. I wanted to implement it by initializing the count with the tuple (0,0). Given that the elements returned from the generator are the following:
[0, 1, 1, 0, 1]
My expected output for t is (5,3), but my code fails with this error message:
TypeError: unsupported operand type(s) for +: 'int' and 'tuple'
Can anybody help me identify the problem? My lack of experience with reduce and lambda functions is preventing me from figuring out what I am doing wrong. Thanks in advance.

I think the best answer here is to keep it simple:
count = 0
total = 0
for item in gen_fn():
    count += 1
    total += item
Using reduce() here only makes your code less readable.
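A runnable version of the loop, assuming a stand-in gen_fn that yields the sample values [0, 1, 1, 0, 1] one at a time (the asker's real generator isn't shown, so this one is hypothetical):

```python
def gen_fn():
    # Hypothetical stand-in for the asker's generator: yields single 0/1 values.
    yield from [0, 1, 1, 0, 1]

count = 0  # number of items seen
total = 0  # running sum, i.e. the number of 1s
for item in gen_fn():
    count += 1
    total += item

print(count, total)  # 5 3
```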
If this is code golf and you want a one-liner (while keeping lazy evaluation), then you want:
count, total = collections.deque(zip(itertools.count(1), itertools.accumulate(gen_fn())), maxlen=1).pop()
Of course, you'd be mad to pick such a construction over the simple solution.
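For reference, here is the one-liner with its imports spelled out, again assuming a hypothetical gen_fn that yields the sample values. zip pairs a running item number with a running sum, and a deque with maxlen=1 consumes the iterator while keeping only the last pair. Note that this raises IndexError if the generator yields nothing, because pop() is then called on an empty deque.

```python
import collections
import itertools

def gen_fn():
    # Hypothetical stand-in generator yielding single 0/1 values.
    yield from [0, 1, 1, 0, 1]

# itertools.count(1) numbers the items 1, 2, 3, ...; accumulate() keeps a running sum.
pairs = zip(itertools.count(1), itertools.accumulate(gen_fn()))
# A deque with maxlen=1 drains the iterator lazily and retains only the final pair.
count, total = collections.deque(pairs, maxlen=1).pop()
print(count, total)  # 5 3
```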
Edit:
If the generator yields multiple smaller parts, then simply use itertools.chain.from_iterable(gen_fn()) to flatten it.
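A sketch of that flattening, assuming a hypothetical gen_fn that yields lists (chunks) rather than single values:

```python
import itertools

def gen_fn():
    # Hypothetical generator yielding chunks of the overall 0/1 sequence.
    yield [0, 1]
    yield [1, 0, 1]

count = 0
total = 0
# chain.from_iterable lazily walks through each yielded list in turn.
for item in itertools.chain.from_iterable(gen_fn()):
    count += 1
    total += item

print(count, total)  # 5 3
```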

You have the lambda arguments the wrong way around; the first argument (x) is the total so far (the tuple) and the second (y) is the new value (the integer). Try:
t = reduce((lambda x, y: (x[0]+1, x[1]+y)), gen_fn(), (0,0))
Using a dummy function:
def gen_fn():
    for x in [0, 1, 1, 0, 1]:
        yield x
I get (5, 3).
This equivalent implementation of reduce from the docs might make things clearer:
def reduce(function, iterable, initializer=None):
    it = iter(iterable)
    if initializer is None:
        try:
            initializer = next(it)
        except StopIteration:
            raise TypeError('reduce() of empty sequence with no initial value')
    accum_value = initializer
    for x in it:
        accum_value = function(accum_value, x)  # note value so far is first arg
    return accum_value
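To see the argument order in practice, here is a small sketch that records every (accumulator, item) pair that reduce passes to the function. The helper name step and the calls log are illustrative only.

```python
from functools import reduce

calls = []  # log of (accumulator, item) pairs, for illustration only

def step(acc, item):
    # acc is the running (count, ones) tuple; item is the next value from the iterable.
    calls.append((acc, item))
    return (acc[0] + 1, acc[1] + item)

t = reduce(step, iter([0, 1, 1, 0, 1]), (0, 0))
print(t)          # (5, 3)
print(calls[:2])  # [((0, 0), 0), ((1, 0), 1)]
```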

As jonrsharpe has pointed out, you are using your lambda arguments backwards, given the way reduce works. However, there may be a further issue with how you're adding things up, if each item yielded from your generator is a list.
The issue is that your y value (the item yielded by the generator) is not a single number but a list. You need to count its length and the number of 1s it contains, so you probably want your lambda function to be:
lambda x, y: (x[0]+len(y), x[1]+sum(y))
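Put together, and assuming a hypothetical gen_fn that yields lists of 0s and 1s, this might look like:

```python
from functools import reduce

def gen_fn():
    # Hypothetical generator yielding lists (chunks) of 0/1 values.
    yield [0, 1, 1]
    yield [0, 1]

# x is the running (count, ones) tuple; y is one yielded list.
t = reduce(lambda x, y: (x[0] + len(y), x[1] + sum(y)), gen_fn(), (0, 0))
print(t)  # (5, 3)
```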

How about taking a completely different approach?
t = [(len(row), len(list(filter(lambda x: x == 1, row)))) for row in gen_fn()]
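This produces one (length, ones) tuple per yielded row rather than a single total. Also note that in Python 3, filter() returns an iterator, which has no len(), so the count needs list(...) or an equivalent such as row.count(1) (used below). Here is a sketch of summing the per-row tuples into the (5, 3) the question asks for, assuming a hypothetical gen_fn that yields lists:

```python
def gen_fn():
    # Hypothetical generator yielding lists of 0/1 values.
    yield [0, 1, 1]
    yield [0, 1]

# One (length, number_of_ones) tuple per yielded row.
rows = [(len(row), row.count(1)) for row in gen_fn()]
print(rows)  # [(3, 2), (2, 1)]

# Sum each column to get the overall totals; an empty generator gives (0, 0).
t = (sum(n for n, _ in rows), sum(k for _, k in rows))
print(t)  # (5, 3)
```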

Related

Selection_sort (python)

I came across this code for the selection sort algorithm:
ls = [2, 5, 1, -9, 10, 13, 7, 2]
def selection_sort(ls):
    for i in range(len(ls)):
        imin = min(range(i,len(ls)), key = lambda x: ls[x])
        ls[i], ls[imin] = ls[imin], ls[i]
I know the typical selection_sort with the if block, but this one is hard to understand. I tried to print imin for all possible i's and the result was 33337777, which doesn't make sense to me. I think my problem is that I don't know how this specific key works. Does anyone have any insight on this?

Function Declaration
This statement is the function definition, which takes one argument, the list ls.
def selection_sort(ls):
Loop Initialization
A for-loop iterates over the indices of the list, from i = 0 to len(ls) - 1.
for i in range(len(ls)):
Main Logic
Inside the for-loop, there are 2 statements:
imin = min(range(i,len(ls)), key = lambda x: ls[x])
This uses Python's built-in min with two arguments: an iterable, range(i, len(ls)), holding the indices of the unsorted part of the list, and a key function. min calls the key on each index and compares the results, so lambda x: ls[x] makes it rank the indices by the values stored at them. The result is the index of the smallest item from position i to the end of the list, not the item itself.
ls[i], ls[imin] = ls[imin], ls[i]
This swaps that minimum item with the item at index i of the list.
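A small instrumented sketch may make the key clearer. The lambda maps each candidate index x to the value ls[x], so min compares values but returns an index. The trace list and the function name selection_sort_traced are additions for illustration. Because earlier swaps reorder the list, the imin sequence depends on whether the swap actually runs.

```python
def selection_sort_traced(ls):
    """Same algorithm as above, but also returns the imin chosen at each step."""
    trace = []
    for i in range(len(ls)):
        # Candidates are the indices i..len(ls)-1; key=lambda x: ls[x] ranks them by value.
        imin = min(range(i, len(ls)), key=lambda x: ls[x])
        trace.append(imin)
        ls[i], ls[imin] = ls[imin], ls[i]
    return trace

# The key turns indices into values for comparison only; min still returns an index.
print(min(range(3), key=lambda x: [5, 1, 4][x]))  # 1

orig = [2, 5, 1, -9, 10, 13, 7, 2]
ls = list(orig)
trace = selection_sort_traced(ls)
print(trace)  # [3, 2, 3, 7, 7, 6, 7, 7]
print(ls)     # [-9, 1, 2, 2, 5, 7, 10, 13]

# The same min call on the original, never-swapped list, for each i.
no_swap = [min(range(i, len(orig)), key=lambda x: orig[x]) for i in range(len(orig))]
print(no_swap)  # [3, 3, 3, 3, 7, 7, 7, 7]
```

For comparison, the last line shows that if the swap is left out, the printed indices are 3, 3, 3, 3, 7, 7, 7, 7: the position of -9 until i passes it, then the position of the trailing 2.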

Python all()/any() like method for a portion/part of list?

What would be the most elegant/pythonic way of achieving "if x% of the total values in a list are greater than y, return True"? I have currently implemented a function:
def check(listItems, val):
    '''A method to check all elements of a list against a given value.
    Returns true if all items of list are greater than value.'''
    return all(x>val for x in listItems)
But for my use case, waiting for this particular condition is quite costly and somewhat useless. I would want to proceed if ~80% of the items in list are greater than the given value.
One approach in my mind is to sort the list in descending order, create another list and copy 80% of the elements of list to the new list, and run the function for that new list. However, I am hoping there is a more elegant way of doing this. Any suggestions?

It sounds like you are dealing with long lists, which is why this is costly. It would be nice if you could exit early as soon as the condition is met. any() will do this, but you'll want to avoid reading the whole list before passing it to any(). One option might be to use itertools.accumulate to keep a running total of True values and then pass that to any. Something like:
from itertools import accumulate
a = [1, 2, 2, 3, 4, 2, 4, 1, 1, 1]
# true if at least 50% are greater than 1
goal = .5 * len(a)  # at least 5 out of 10
any(x >= goal for x in accumulate(n > 1 for n in a))
accumulate won't need to read the whole list; it just passes along the number of True values seen up to that point. any should short-circuit as soon as it finds a true value, which in the above case is at index 5.
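A sketch wrapping this idea into the function from the question, with a hypothetical ratio parameter (defaulting to the question's 0.8). len() still requires a sized sequence, but the counting itself stops at the first running total that reaches the goal:

```python
from itertools import accumulate

def check(list_items, val, ratio=0.8):
    """Return True if at least `ratio` of list_items are greater than val."""
    goal = ratio * len(list_items)
    # Running count of items that pass; any() stops at the first total >= goal.
    running = accumulate(item > val for item in list_items)
    return any(total >= goal for total in running)

a = [1, 2, 2, 3, 4, 2, 4, 1, 1, 1]
print(check(a, 1, 0.5))  # True  (6 of 10 items are > 1)
print(check(a, 1))       # False (needs 8 of 10)
```

One difference from all(): an empty list returns False here, because accumulate yields nothing for any() to find.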

What about this:
def check(listItems, val, threshold=0.8):
    return sum(x > val for x in listItems) > len(listItems) * threshold
It states: check is True if more than threshold (0.8, i.e. 80%, by default) of the elements in listItems are greater than val.

You can use filter for this. In my timing this was by far the fastest method, faster than the methods in my other answer.
def check(listItems, val, goal=0.8):
    return len((*filter(val.__lt__, listItems),)) >= len(listItems) * goal
Timing this alongside the methods in my other answer gave:
1.684135717988247

Check each item in order.
If you reach a point where you are satisfied then return True early.
If you reach a point where you can never be satisfied, even if every future item passes the test, then return False early.
Otherwise keep going (in case the later elements help you satisfy the requirement).
This is the same idea FatihAkici suggested in the comments above, but with a further optimization.
def check(list_items, ratio, val):
    passing = 0
    satisfied = ratio * len(list_items)
    for index, item in enumerate(list_items):
        if item > val:
            passing += 1
            if passing >= satisfied:
                return True
        remaining_items = len(list_items) - index - 1
        if passing + remaining_items < satisfied:
            return False
    return passing >= satisfied  # reached only if no early exit happened (e.g. an empty list)
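To make the early exits visible, here is the same loop with a hypothetical count of inspected items returned alongside the result (check_verbose is a name made up for this sketch):

```python
def check_verbose(list_items, ratio, val):
    """Early-exit check that also reports how many items were examined."""
    passing = 0
    satisfied = ratio * len(list_items)
    for index, item in enumerate(list_items):
        if item > val:
            passing += 1
            if passing >= satisfied:
                return True, index + 1  # goal reached early
        remaining_items = len(list_items) - index - 1
        if passing + remaining_items < satisfied:
            return False, index + 1  # goal can no longer be reached
    return passing >= satisfied, len(list_items)

print(check_verbose([5, 5, 5, 5, 5, 0, 0, 0, 0, 0], 0.5, 1))  # (True, 5)
print(check_verbose([0] * 10, 0.8, 1))                        # (False, 3)
```

In the second call, after three failing items only seven remain, which can no longer reach the eight required, so the loop stops there.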

I don't want to take credit for Mark Meyer's answer: he came up with the idea of using accumulate and any, and his version is more pythonic/readable. But if you're looking for the "fastest" approach, then modifying his approach to use map instead of comprehensions is faster.
any(map(goal.__le__, accumulate(map(val.__lt__, listItems))))
Just to test:
from timeit import timeit
from itertools import accumulate
def check1(listItems, val):
    goal = len(listItems)*0.8
    # >= so that check1 tests the same condition as goal.__le__ in check2
    return any(x >= goal for x in accumulate(n > val for n in listItems))
def check2(listItems, val):
    goal = len(listItems)*0.8
    return any(map(goal.__le__, accumulate(map(val.__lt__, listItems))))
items = [1, 2, 2, 3, 4, 2, 4, 1, 1, 1]
for t in (check1, check2):
    print(timeit(lambda: t(items, 1)))
The results are:
3.2596251670038328
2.0594907909980975
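One caveat with the dunder-method spelling: when val is an int, val.__lt__ is int.__lt__, and handed a float it returns NotImplemented rather than True or False, so mixed int/float data is not really compared. Below is a sketch of the same map-based pipeline using operator.lt and operator.le through functools.partial, which go through the normal comparison protocol. The name check_map and the ratio parameter are illustrative, not from the answers above.

```python
from functools import partial
from itertools import accumulate
import operator

def check_map(list_items, val, ratio=0.8):
    """map-based variant using operator functions instead of dunder methods."""
    goal = len(list_items) * ratio
    # partial(operator.lt, val)(x) evaluates val < x, i.e. x > val.
    running = accumulate(map(partial(operator.lt, val), list_items))
    # partial(operator.le, goal)(total) evaluates goal <= total.
    return any(map(partial(operator.le, goal), running))

# int.__lt__ does not know how to compare with a float:
print((1).__lt__(1.5))                            # NotImplemented
print(check_map([1.5, 2.5, 3.5, 0.5], 1, 0.5))    # True  (3 of 4 pass, goal 2)
print(check_map([1.5, 2.5, 3.5, 0.5], 1))         # False (goal 3.2)
```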

Length of a list with the reduce() function

I need some help to count the numbers of elements in a list by using the reduce function.
def lenReduce(L):
    return reduce(lambda x: x + 1, L)
With this one I get the following error message:
TypeError: <lambda>() takes 1 positional argument but 2 were given
Greetings from Berlin. ;-)

If lenReduce calls reduce(lambda count, item: count + 1, L) without an initial value, then
lenReduce([5,3,1])
returns 7.
This is because the first time the lambda function is invoked, count is set to 5 and item is set to 3, the first two elements of the list. Each invocation then adds 1 to count (5 becomes 6, then 7), so the result is L[0] + len(L) - 1 rather than len(L). Therefore that solution does not work.
The solution is to start count at a value of our choosing rather than the first element of the list. To do that, invoke reduce with three arguments.
from functools import reduce
def lenReduce(L):
    return reduce(lambda count, item: count + 1, L, 0)
In the above call of reduce, count starts at 0 and item is set to each element of the list in turn, starting from index 0.
lenReduce([3,2,1])
outputs 3, which is the desired result.
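A runnable comparison of the two calls (lenReduce_no_init is a name made up here for the version without an initial value):

```python
from functools import reduce

def lenReduce(L):
    # Start the count at 0; the lambda ignores `item` and adds 1 per element.
    return reduce(lambda count, item: count + 1, L, 0)

def lenReduce_no_init(L):
    # Without an initial value, the first element becomes the starting count.
    return reduce(lambda count, item: count + 1, L)

print(lenReduce([5, 3, 1]))          # 3
print(lenReduce([]))                 # 0
print(lenReduce_no_init([5, 3, 1]))  # 7
```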

The function argument to reduce takes two arguments: the return value of the previous call, and an item from the list.
def counter(count, item):
    return count + 1
In this case, you don't care what the value of item is; counter ignores it and simply returns the current count plus 1.
def lenReduce(L):
    return reduce(counter, L, 0)
or, using a lambda expression,
def lenReduce(L):
    return reduce(lambda count, item: count + 1, L, 0)
Even though your function ignores the second argument, reduce still passes it, so the function must be defined to take two arguments. The third argument, 0, is the starting count; without it, reduce would use the first element of the list as the starting count.

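Answers that call reduce without an initial value do not work, because the first element of the list becomes the starting count and 1 is then added to it on every step. One trick is to add 0 as the first element of the list. This version also passes 0 as the initial value, so the count comes out one too high; decrease the return value by 1:
lenReduce = lambda L: reduce(lambda x, y: x+1, [0]+L, 0) - 1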

In this short recursive function `list_sum(aList)`, the finish condition is `if not aList: return 0`. I see no logic in why this condition works

I am learning the recursive functions. I completed an exercise, but in a different way than proposed.
"Write a recursive function which takes a list argument and returns the sum of its integers."
L = [0, 1, 2, 3, 4] # The sum of elements will be 10
My solution is:
def list_sum(aList):
    count = len(aList)
    if count == 0:
        return 0
    count -= 1
    return aList[0] + list_sum(aList[1:])
The proposed solution is:
def proposed_sum(aList):
    if not aList:
        return 0
    return aList[0] + proposed_sum(aList[1:])
My solution is very clear in how it works.
The proposed solution is shorter, but it is not clear to me why the function works. How does if not aList even happen? I mean, how would the rest of the code ever make not aList true? if not aList checks for True/False, but how is the list True/False here?
I understand that return 0 causes the recursion to stop.
As a side note, executing without if not aList throws IndexError: list index out of range.
Also, timeit with 1 million runs says my function is slower: it takes 3.32 seconds while the proposed one takes 2.26. Which means I gotta understand the proposed solution.

On the last call of the function, aList has no elements. Each call passes aList[1:], a new list that is one element shorter than the one it received, so you keep cutting off the first element until only the empty list [] is left. An empty list is falsy, so not aList is True there, and when you reach it you know you're done.
If you don't use that condition, the function will try to read aList[0] from an empty list, an element that doesn't exist, so it raises that IndexError.

You are counting the items in the list, and the proposed solution checks whether it's empty with if not aList, which is equivalent to len(aList) == 0, so both of you use the same logic.
But you're also doing count -= 1, which makes no sense: with recursion you pass the list minus one element, and count is never used again, so here you just lose some time.
According to PEP 8, this is the proper way:
• For sequences, (strings, lists, tuples), use the fact that empty
sequences are false.
Yes: if not seq:
     if seq:
No:  if len(seq):
     if not len(seq):
Here are my amateur thoughts about why:
This implicit check will be faster than calling len(). len() is a function that gets the length of a collection by calling the object's __len__ method, and its result then still has to be compared with 0.
So both will find that there are no items, but one does it directly.

not aList
returns True if there are no elements in aList. That if statement in the solution covers the edge case: it checks whether the input parameter is an empty list.

To understand this function, let's run it step by step:
step 0:
L = [0,1,2,3,4]
proposed_sum([0,1,2,3,4])
aList != []
return aList[0] + proposed_sum([1,2,3,4])    # 0 + proposed_sum([1,2,3,4])
step 1, compute proposed_sum([1,2,3,4]):
proposed_sum([1,2,3,4])
aList != []
return aList[0] + proposed_sum([2,3,4])    # 1 + proposed_sum([2,3,4])
step 2, compute proposed_sum([2,3,4]):
proposed_sum([2,3,4])
aList != []
return aList[0] + proposed_sum([3,4])    # 2 + proposed_sum([3,4])
step 3, compute proposed_sum([3,4]):
proposed_sum([3,4])
aList != []
return aList[0] + proposed_sum([4])    # 3 + proposed_sum([4])
step 4, compute proposed_sum([4]):
proposed_sum([4])
aList != []
return aList[0] + proposed_sum([])    # 4 + proposed_sum([])
step 5, compute proposed_sum([]):
proposed_sum([])
aList == []
return 0
step 6, substitute back:
proposed_sum([0,1,2,3,4])
= 0 + proposed_sum([1,2,3,4])
= 0 + 1 + proposed_sum([2,3,4])
= 0 + 1 + 2 + proposed_sum([3,4])
= 0 + 1 + 2 + 3 + proposed_sum([4])
= 0 + 1 + 2 + 3 + 4 + proposed_sum([])
= 0 + 1 + 2 + 3 + 4 + 0
= 10
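The same trace can be produced by the code itself. This sketch adds a hypothetical log list and depth counter to proposed_sum (renamed traced_sum here), so each call is recorded with indentation showing the recursion depth:

```python
def traced_sum(aList, depth=0, log=None):
    """proposed_sum with a call log; log and depth are added for illustration."""
    if log is None:
        log = []
    log.append("  " * depth + f"proposed_sum({aList})")
    if not aList:  # the empty list is falsy: this is the base case
        return 0, log
    rest, _ = traced_sum(aList[1:], depth + 1, log)
    return aList[0] + rest, log

total, log = traced_sum([0, 1, 2, 3, 4])
print("\n".join(log))
print(total)  # 10
```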

Python considers multiple values as False:
False (of course)
0
None
empty collections (dictionaries, lists, tuples)
empty strings ('', "", '''''', """""", r'', u"", etc...)
any other object whose __bool__ method returns False (__nonzero__ in Python 2), or whose __len__ returns 0 when __bool__ is not defined
In your case, the list is evaluated as a boolean. If it is empty, it is considered False; otherwise it is considered True. So if not aList: is just a shorter way to write if len(aList) == 0:
In addition, concerning your new question in the comments, consider the last line of your function:
return aList[0] + proposed_sum(aList[1:])
This line calls a new "instance" of the function, but with a subset of the original list (the original list minus its first element). At each recursion, the list passed as an argument loses an element, and after a certain number of recursions the passed list is empty.
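A quick way to check these truthiness rules in Python 3, where the hook is __bool__ and Python falls back to __len__ when __bool__ isn't defined. The Basket class is a made-up example:

```python
class Basket:
    """Hypothetical container with __len__ but no __bool__."""
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

falsy = [False, 0, 0.0, None, [], (), {}, set(), "", Basket([])]
print(all(not value for value in falsy))  # True
print(bool(Basket([1])))                  # True

aList = []
# `not aList` and `len(aList) == 0` agree for lists.
print(not aList, len(aList) == 0)         # True True
```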

Returning the index of the largest element in an array in Python

I'm trying to create a function that returns the index of the largest element of an array. I feel I have the correct code but my syntax is in the wrong order. I'm trying to use a for/while loop in order to do so. So far I have the following:
def manindex(arg):
    ans = 0
    while True:
        for i in range (len(arg)):
            if arg[i] > arg[ans]:
                pass
                ans = i
    return ans
Not sure where I'm going wrong; if anyone could provide some guidance, thanks.
EDIT: It's been pointed out that I'm causing an infinite loop, so if I take out the while statement I'm left with
def manindex(arg):
    ans = 0
    for i in range (len(arg)):
        if arg[i] > arg[ans]:
            ans = i
    return ans
But I have a feeling it's still not correct

When you say array, I think you mean list in Python. You don't need a for loop or while loop to achieve this at all.
You can also use index with max, like so:
xs.index(max(xs))
Sample:
xs = [1,123,12,234,34,23,42,34]
xs.index(max(xs))
3

You could use max with the key parameter set to seq.__getitem__:
def argmax(seq):
return max(range(len(seq)), key=seq.__getitem__)
print(argmax([0,1,2,3,100,4,5]))
yields
4

The idea behind finding the index of the largest element is always the same: iterate over the elements of the array and compare each one to the max value found so far. If it's larger, the index of the current element becomes the new maximum; if it's not, keep looking.
enumerate approach:
def max_element_index(items):
    max_index, max_value = None, None
    for index, item in enumerate(items):
        # In Python 3, None can't be compared with a number, so test for it explicitly.
        if max_value is None or item > max_value:
            max_index, max_value = index, item
    return max_index
functional approach:
from functools import reduce
def max_element_index(items):
    # x is the best (index, element) so far; (None, None) is the starting value.
    return reduce(lambda x, y: y if x[1] is None or y[1] >= x[1] else x,
                  enumerate(items), (None, None))[0]
At the risk of looking cryptic, the functional approach uses the reduce function, which takes two elements and decides what the reduction is. Those elements are (index, element) tuples, produced by the enumerate function.
The lambda takes two such tuples and returns the one with the larger element. As reduce keeps reducing until only one element is left, the champion is the tuple containing the index of the largest element and the element itself, so we only need to take index 0 of that tuple to get the index.
On the other hand, if the list is empty, None is returned, which is guaranteed by the third argument to reduce, (None, None).

Before I write a long-winded explanation, let me give you the solution:
index, value = max(enumerate(list1), key=lambda x: x[1])
One line, efficient (single pass O(n)), and readable (I think).
Explanation
In general, it's a good idea to use as much of python's incredibly powerful built-in functions as possible.
In this instance, the two key functions are enumerate() and max().
enumerate() converts a list (or actually any iterable) into a sequence of (index, value) pairs, e.g.
>>> list1 = ['apple', 'banana', 'cherry']
>>> for tup in enumerate(list1):
...     print(tup)
...
(0, 'apple')
(1, 'banana')
(2, 'cherry')
max() takes an iterable and returns the maximum element. Unfortunately, max(enumerate(list1)) doesn't work, because max() compares the tuples created by enumerate() starting with their first element, which sadly is the index.
One lesser-known feature of max() is that it can take a keyword argument in the form max(list1, key=something). The key is a function that is applied to each value in the list, and the output of that function is what gets used to determine the maximum. We can use this feature to tell max() that it should rank items by the second item of each tuple, which is the value contained in the list.
Combining enumerate() and max() with key (plus a little help from lambda to create a function that returns the second element of a tuple) gives you this solution.
index, value = max(enumerate(list1), key=lambda x: x[1])
I came up with this recently (and am sprinkling it everywhere in my code) after watching Raymond Hettinger's talk on Transforming Code into Beautiful, Idiomatic Python, where he suggests exorcising the for i in xrange(len(list1)): pattern from your code.
Alternatively, without resorting to lambda (thanks @sweeneyrod!):
from operator import itemgetter
index, value = max(enumerate(list1), key=itemgetter(1))
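A small sketch contrasting the two calls on a made-up list with a repeated maximum. Without a key the index decides, so the last pair wins; with key=itemgetter(1) the value decides, and max returns the first maximal item it encounters:

```python
from operator import itemgetter

list1 = [3, 9, 2, 9, 4]

# Without a key, tuples are compared by their first item, the index.
print(max(enumerate(list1)))  # (4, 4)

# With key=itemgetter(1), the value decides; ties go to the first maximum.
index, value = max(enumerate(list1), key=itemgetter(1))
print(index, value)           # 1 9
```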

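I believe if you change your for loop to:
for i in range (len(arg)):
    if arg[i] > ans:
        ans = arg[i]
it should work, although this version returns the largest value itself rather than its index, and because ans starts at 0 it returns 0 for a list of all-negative numbers.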

You could try something like this. If the list is empty, the function raises an error (IndexError).
m is set to the first element of the list; we then iterate over the list, comparing against m at every step.
def findMax(xs):
    m = xs[0]
    for x in xs:
        if x > m:
            m = x
    return m
findMax([]) # raises IndexError
findMax([1]) # 1
findMax([2,1]) # 2
If you wanted to use a for loop and make it more generic, then:
def findGeneric(pred, xs):
    m = xs[0]
    for x in xs:
        if pred(x,m):
            m = x
    return m
findGeneric(lambda a,b: len(a) > len(b), [[1],[1,1,1,1],[1,1]]) # [1,1,1,1]
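For the common case where the predicate means "greater by some key", the built-in max with key= gives the same result, and both keep the first maximal item on ties. A sketch comparing them (the comparison with max is an addition, not part of the answer above):

```python
def findGeneric(pred, xs):
    m = xs[0]
    for x in xs:
        if pred(x, m):  # replace m only when x is strictly "better"
            m = x
    return m

data = [[1], [1, 1, 1, 1], [1, 1]]
by_loop = findGeneric(lambda a, b: len(a) > len(b), data)
by_max = max(data, key=len)  # built-in equivalent for a "greater by key" predicate
print(by_loop, by_max)  # [1, 1, 1, 1] [1, 1, 1, 1]

# Ties: both return the first of the equally long lists.
print(findGeneric(lambda a, b: len(a) > len(b), [[1], [2]]), max([[1], [2]], key=len))  # [1] [1]
```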
