Python reduce explanation

I'm not able to understand the following code segment:
>>> lot = ((1, 2), (3, 4), (5,))
>>> reduce(lambda t1, t2: t1 + t2, lot)
(1, 2, 3, 4, 5)
How does the reduce function produce the tuple (1, 2, 3, 4, 5)?

It's easier if you break out the lambda into a function, so it's clearer what's going on:
>>> def do_and_print(t1, t2):
...     print 't1 is', t1
...     print 't2 is', t2
...     return t1+t2
...
>>> reduce(do_and_print, ((1,2), (3,4), (5,)))
t1 is (1, 2)
t2 is (3, 4)
t1 is (1, 2, 3, 4)
t2 is (5,)
(1, 2, 3, 4, 5)

reduce() applies a function sequentially, chaining the elements of a sequence:
reduce(f, [a,b,c,d], s)
is the same as
f(f(f(f(s, a), b), c), d)
In your case f() is a lambda function (lambda t1, t2: t1 + t2) that just adds its two arguments, so you end up with
(((s + a) + b) + c) + d
and because sequence concatenation is associative (the parenthesization makes no difference), this is
s + a + b + c + d
or with your actual values
(1, 2) + (3, 4) + (5,)
If s is not given, the innermost call is skipped and the first element serves as the starting value, i.e. f(f(f(a, b), c), d). Usually the neutral element is passed as s, so in your case () would have been correct:
reduce(lambda t1, t2: t1 + t2, lot, ())
But without it, you only run into trouble if lot has no elements (TypeError: reduce() of empty sequence with no initial value).
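On Python 3, reduce is no longer a builtin and must be imported from functools. A small Python 3 sketch of the two cases above: passing () as the initial value, and what happens with an empty sequence:

```python
from functools import reduce  # reduce moved to functools in Python 3

lot = ((1, 2), (3, 4), (5,))

# With () as the neutral element, the fold starts from the empty tuple.
print(reduce(lambda t1, t2: t1 + t2, lot, ()))  # (1, 2, 3, 4, 5)

# With an empty sequence, the initial value is simply returned...
print(reduce(lambda t1, t2: t1 + t2, (), ()))   # ()

# ...while omitting it raises TypeError.
try:
    reduce(lambda t1, t2: t1 + t2, ())
except TypeError as exc:
    print("TypeError:", exc)
```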

reduce(...)
reduce(function, sequence[, initial]) -> value
Apply a function of two arguments cumulatively to the items of a sequence,
from left to right, so as to reduce the sequence to a single value.
For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
((((1+2)+3)+4)+5). If initial is present, it is placed before the items
of the sequence in the calculation, and serves as a default when the
sequence is empty.
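To make the documented behaviour concrete, here is a minimal pure-Python sketch of the same left-to-right fold. The name my_reduce and the _MISSING sentinel are hypothetical, for illustration only; this is not how CPython actually implements reduce:

```python
_MISSING = object()  # sentinel: distinguishes "no initial given" from initial=None


def my_reduce(function, iterable, initial=_MISSING):
    """Sketch of reduce's documented semantics: fold items from left to right."""
    it = iter(iterable)
    if initial is _MISSING:
        try:
            value = next(it)  # the first item becomes the starting value
        except StopIteration:
            raise TypeError("my_reduce() of empty sequence with no initial value") from None
    else:
        value = initial  # initial is placed before the items
    for element in it:
        value = function(value, element)  # combine the running result with the next item
    return value


print(my_reduce(lambda x, y: x + y, [1, 2, 3, 4, 5]))  # 15, i.e. ((((1+2)+3)+4)+5)
```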

Let's trace the reduce:
result = (1,2) + (3,4)
result = result + (5, )
Notice that your reduction concatenates tuples.
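If you want to see every intermediate value of this trace without writing a helper, itertools.accumulate (Python 3) yields the running results of the same left fold; its default function is addition, which concatenates tuples here:

```python
from itertools import accumulate

lot = ((1, 2), (3, 4), (5,))

# Each printed line is the running result after one more tuple is added.
for step in accumulate(lot):
    print(step)
# (1, 2)
# (1, 2, 3, 4)
# (1, 2, 3, 4, 5)
```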

reduce takes a function and an iterable as arguments. The function must accept two arguments.
reduce iterates through the iterable: first it passes the first two values to the function, then it passes the result of that call together with the next value, and so on.
So in your case, it takes the first and second items of the tuple, (1,2) and (3,4), and passes them to the lambda function, which adds (concatenates) them. The result is passed to the lambda again, together with the third item. Since there are no more items in the tuple, that result is returned.

Related

How can zip be used to chunk data into equal sized groups?

>>> n = 3
>>> x = range(n ** 2)
>>> xn = list(zip(*[iter(x)] * n))
In PEP 618, the author gives this example of how zip can be used to chunk data into equal sized groups.
How does it work?
I think it relies on an implementation detail of zip: taking the first element from each item of the list [iter(x)] * n amounts to taking the first n elements of x, because every item is the same iterator and its state advances as each element is taken.
I believe this because the following code replicates the behavior above:
n = 3
x = range(n ** 2)
xn = [iter(x)] * n  # n references to one shared iterator
res = []
while True:
    try:
        col = []
        for element in xn:
            col.append(next(element))  # each call advances the shared iterator
        res.append(col)
    except StopIteration:
        break
However, I would like to make sure that this is indeed the case and that this is a reliable behavior that can be used to chunk elements of an iterable.
It's not really specific to zip, but you basically have that right. In effect, it's zipping 3 references to the same iterator, causing it to round-robin between them. During each iteration, one more element is consumed from the iterator.
Effectively, it's the same as doing this:
>>> n = 3
>>> x = range(n ** 2)
>>> a = b = c = iter(x)
>>> list(zip(a, b, c))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
Note that it only produces equal sized groups and may drop elements (that part is a characteristic of zip, because it's limited by the smallest iterable, though you could use itertools.zip_longest if you want):
>>> n = 4
>>> x = range(n ** 2)
>>> a = b = c = iter(x)
>>> list(zip(a, b, c))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 14)]
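As a sketch of the zip_longest variant, assuming you want the last, shorter group padded rather than dropped (this is the same trick as the grouper recipe in the itertools documentation):

```python
from itertools import zip_longest

n = 3
x = range(10)  # 10 is not a multiple of 3

# Same idea as zip(*[iter(x)] * n): n references to one iterator,
# but the incomplete last group is padded with fillvalue instead of dropped.
groups = list(zip_longest(*[iter(x)] * n, fillvalue=None))
print(groups)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
```

If you are on Python 3.12 or newer, itertools.batched(x, n) yields the groups directly, with a shorter last group instead of padding.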
It's not an implementation detail of zip. It's how iterators work in Python: they are always "consumed" and only move forward.
For example:
whatever = iter([1, 2, 3])
next(whatever)
# 1
next(whatever)
# 2
What zip does is "advance" each object it's given, and with the example you provided, [iter(x)] * n, this is basically zip(whatever, whatever, whatever).
Since zip works left to right, it first calls next on whatever, then calls next on whatever again, which has already moved past the first value, so it gets 2. The third call gets 3, and so on.
This is behaviour by design, and the language guarantees it: the documentation of zip states that the left-to-right evaluation order of the iterables is guaranteed, and it mentions the zip(*[iter(s)]*n) idiom for clustering a data series into n-length groups.
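A quick way to convince yourself that [iter(x)] * n really is one iterator referenced n times is to compare it with n independent iterators (Python 3 sketch; the names shared and separate are just for illustration):

```python
x = range(9)

shared = [iter(x)] * 3                  # one iterator object, referenced three times
separate = [iter(x) for _ in range(3)]  # three independent iterators

print(shared[0] is shared[1])      # True
print(separate[0] is separate[1])  # False

shared_groups = list(zip(*shared))      # every next() advances the same iterator
separate_groups = list(zip(*separate))  # each iterator advances on its own
print(shared_groups)    # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
print(separate_groups)  # nine tuples, from (0, 0, 0) to (8, 8, 8)
```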

Implementation of nested pairs

I'm doing Advent of Code, and I had no problem solving the puzzles on my own until day 18, which got me. I want to solve it on my own, but I feel like I can't even start, even though the task itself is not difficult. It's rather long to explain, but the point is the following:
I have to implement nested pairs, where every item of a pair can be another pair, and so on, for example:
[[[[[9,8],1],2],3],4]
[[3,[2,[1,[7,3]]]],[6,[5,[4,[3,2]]]]]
I have to do different operations on these pairs, such as deleting pairs at a very deep level and using their values at a higher level. I know Python handles tuples really well, but I have yet to find an approach for recursive (?) traversal, deletion, and "saving" values from deep levels. Deleting items while iterating is not a wise solution. Should I use a different approach or some custom data structure for a task like this? I don't need an exact solution, just some general guidance.
I haven't read problem 18 of Advent of Code, but here is an example answer to your question, with a list version and a tuple version.
Imagine I have nested pairs, and I want to write a function that replaces every innermost pair (a pair of two numbers) with a single number, the sum of the two numbers in the pair. For example:
input -> output
(1, 2) -> 3
((1, 2), 3) -> (3, 3)
(((1,2), (3,4)), ((5,6), (7,8))) -> ((3, 7), (11, 15))
((((((1, 2), 3), 4), 5), 6), 7) -> (((((3, 3), 4), 5), 6), 7)
Here is a version with immutable tuples, that returns a new tuple:
def updated(p):
    result, _ = updated_helper(p)
    return result
def updated_helper(p):
    # returns (new_value, p_was_a_number)
    if isinstance(p, tuple) and len(p) == 2:
        a, b = p
        new_a, a_is_number = updated_helper(a)
        new_b, b_is_number = updated_helper(b)
        if a_is_number and b_is_number:
            return a+b, False
        else:
            return (new_a, new_b), False
    else:
        return p, True
Here is a version with mutable lists, that returns nothing useful but mutates the list:
def update(p):
    # returns True if p is a number, False if it is a pair
    if isinstance(p, list) and len(p) == 2:
        a, b = p
        a_is_number = update(a)
        b_is_number = update(b)
        # unwrap children that were collapsed into one-element lists
        if isinstance(a, list) and len(a) == 1:
            p[0] = a[0]
        if isinstance(b, list) and len(b) == 1:
            p[1] = b[0]
        if a_is_number and b_is_number:
            p[:] = [a+b]
        return False
    else:
        return True
Note how I used an adjective, updated, and a verb, update, to highlight the different logic of these two similar functions. The function update performs an action (modifying a list), whereas updated(p) performs no action; it simply is the updated pair.
Testing:
print( updated( (((1,2), (3,4)), ((5,6), (7,8))) ) )
# ((3, 7), (11, 15))
l = [[[1,2], [3,4]], [[5,6], [7,8]]]
update(l)
print(l)
# [[3, 7], [11, 15]]
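For the general question of recursive traversal, a generator that walks the nested structure and reports how deep each number sits is often a useful starting point. A minimal Python 3 sketch, assuming pairs are lists or tuples and leaves are numbers; leaves_with_depth is a hypothetical helper name, and depth counts how many pairs enclose a number:

```python
def leaves_with_depth(p, depth=0):
    """Yield (number, depth) for every number in a nested pair, left to right."""
    if isinstance(p, (list, tuple)):
        for item in p:
            # each level of nesting adds one to the depth
            yield from leaves_with_depth(item, depth + 1)
    else:
        yield p, depth


pair = [[[[[9, 8], 1], 2], 3], 4]
print(list(leaves_with_depth(pair)))
# [(9, 5), (8, 5), (1, 4), (2, 3), (3, 2), (4, 1)]
```

Because the traversal only reads the structure, you can collect what you need first and build a new structure afterwards, which avoids deleting items while iterating.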

python tuple remove the first match appearance using only higher order functions

I want to write a function Rem(a, b) that returns a new tuple like a, but with the first occurrence of element b removed. For example,
Rem((0, 1, 9, 1, 4), 1) should return (0, 9, 1, 4).
I am only allowed to use higher order functions such as lambda, filter, map, and reduce.
I thought about using filter, but it deletes all of the matching elements:
def myRem(T, E):
    return tuple(filter(lambda x: (x!=E), T))
With myRem((0, 1, 9, 1, 4), 1) I get (0, 9, 4).
The following works (Warning: hacky code):
tuple(map(lambda y: y[1], filter(lambda x: (x[0]!=T.index(E)), enumerate(T))))
But I would never recommend doing this unless the requirements are rigid.
Trick with temporary list:
def removeFirst(t, v):
    tmp_lst = [v]
    return tuple(filter(lambda x: (x != v or (not tmp_lst or v != tmp_lst.pop(0))), t))
print(removeFirst((0, 1, 9, 1, 4), 1))
tmp_lst.pop(0) - will be called only once (thus, excluding the 1st occurrence of the crucial value v)
not tmp_lst - all remaining/potential occurrences will be included due to this condition
The output:
(0, 9, 1, 4)
For fun, using itertools, you can sorta use mostly higher-order functions...
>>> from itertools import *
>>> data = (0, 1, 9, 1, 4)
>>> not1 = (1).__ne__
>>> tuple(chain(takewhile(not1, data), islice(dropwhile(not1, data), 1, None)))
(0, 9, 1, 4)
By the way, here are some timings comparing different approaches for dropping a particular index from a tuple:
>>> timeit.timeit("t[:i] + t[i+1:]", "t = tuple(range(100000)); i=50000", number=10000)
10.42419078599778
>>> timeit.timeit("(*t[:i], *t[i+1:])", "t = tuple(range(100000)); i=50000", number=10000)
20.06185237201862
>>> timeit.timeit("(*islice(t,None, i), *islice(t, i+1, None))", "t = tuple(range(100000)); i=50000; from itertools import islice", number=10000)
>>> timeit.timeit("tuple(chain(islice(t,None, i), islice(t, i+1, None)))", "t = tuple(range(100000)); i=50000; from itertools import islice, chain", number=10000)
19.71128663700074
>>> timeit.timeit("it = iter(t); tuple(chain(islice(it,None, i), islice(it, 1, None)))", "t = tuple(range(100000)); i=50000; from itertools import islice, chain", number=10000)
17.6895881179953
Looks like it is hard to beat the straightforward: t[:i] + t[i+1:], which is not surprising.
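For comparison, the straightforward slicing approach wrapped in a function (Python 3 sketch; it does not meet the higher-order-functions-only constraint, and remove_first is a hypothetical name):

```python
def remove_first(t, v):
    """Return a copy of tuple t without the first occurrence of v."""
    i = t.index(v)            # raises ValueError if v is not in t
    return t[:i] + t[i + 1:]  # everything before and after position i


print(remove_first((0, 1, 9, 1, 4), 1))  # (0, 9, 1, 4)
```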
Note, this one is shockingly less performant:
>>> timeit.timeit("tuple(j for i, j in enumerate(t) if i != idx)", "t = tuple(range(100000)); idx=50000", number=10000)
111.66658291200292
Which makes me think all these solutions using takewhile, filter, and lambda will suffer pretty badly...
Although:
>>> timeit.timeit("not1 = (i).__ne__; tuple(chain(takewhile(not1, t), islice(dropwhile(not1, t), 1, None)))", "t = tuple(range(100000)); i=50000; from itertools import chain, takewhile,dropwhile, islice", number=10000)
62.22159145199112
It's almost twice as fast as the generator expression, which goes to show that generator overhead can be quite large. Part of the reason is that takewhile and dropwhile are implemented in C, even though this approach has some redundancy: takewhile and dropwhile both scan the part of the tuple before the match, so that part is traversed twice.
Another interesting observation: if we simply substitute a list comprehension for the generator expression, it is significantly faster, even though the list comprehension plus the tuple call iterates over the result twice, compared to only once with the generator expression:
>>> timeit.timeit("tuple([j for i, j in enumerate(t) if i != idx])", "t = tuple(range(100000)); idx=50000", number=10000)
82.59887028901721
Goes to show how steep the generator-expression price can be...
Here is a solution that only uses lambda, filter(), map(), reduce() and tuple().
from functools import reduce  # a builtin in Python 2, but lives in functools in Python 3
def myRem(T, E):
    # map each value to a one-element list [(value, indicator)]
    M = map(lambda x: [(x, 1)] if x == E else [(x,0)], T)
    # make the indicator 0 once the first instance of E is found
    # think of this as a boolean mask of items to remove
    # here the second reduce can be changed to the sum function
    R = reduce(
        lambda x, y: x + (y if reduce(lambda a, b: a+b, map(lambda z: z[1], x)) < 1
                          else [(y[0][0], 0)]),
        M
    )
    # filter the reduced output based on the indicator
    F = filter(lambda x: x[1]==0, R)
    # map the output back to the desired format
    O = map(lambda x: x[0], F)
    return tuple(O)
Explanation
A good way to understand what's going on is to print the outputs of the intermediate steps.
Step 1: First Map
For each value in the tuple, we return a tuple with the value and a flag to indicate if it's the value to remove. These tuples are encapsulated in a list because it makes combining easier in the next step.
# original example
T = (0, 1, 9, 1, 4)
E = 1
M = map(lambda x: [(x, 1)] if x == E else [(x,0)], T)
print(M)
#[[(0, 0)], [(1, 1)], [(9, 0)], [(1, 1)], [(4, 0)]]
Step 2: Reduce
This returns a list of tuples in a similar structure to the contents of M, but the flag variable is set to 1 for the first instance of E, and 0 for all subsequent instances. This is achieved by calculating the sum of the indicator up to that point (implemented as another reduce()).
R = reduce(
lambda x, y: x + (y if reduce(lambda a, b: a+b, map(lambda z: z[1], x)) < 1
else [(y[0][0], 0)]),
M
)
print(R)
#[(0, 0), (1, 1), (9, 0), (1, 0), (4, 0)]
Now the output is in the form of (value, to_be_removed).
Step 3: Filter
Filter out the value to be removed.
F = filter(lambda x: x[1]==0, R)
print(F)
#[(0, 0), (9, 0), (1, 0), (4, 0)]
Step 4: Second map and conversion to tuple
Extract the value from the filtered list, and convert it to a tuple.
O = map(lambda x: x[0], F)
print(tuple(O))
#(0, 9, 1, 4)
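A shorter variant that still sticks to reduce and lambda carries a (kept, removed) pair as the accumulator, so no indicator list or extra filter/map pass is needed. A Python 3 sketch (hence the functools import); the name rem is hypothetical:

```python
from functools import reduce  # builtin in Python 2, in functools in Python 3


def rem(t, e):
    # The accumulator is (items kept so far, whether e has already been dropped).
    return reduce(
        lambda acc, x: (acc[0], True) if not acc[1] and x == e  # drop the first e only
        else (acc[0] + (x,), acc[1]),                           # keep everything else
        t,
        ((), False),  # start: nothing kept, nothing removed yet
    )[0]


print(rem((0, 1, 9, 1, 4), 1))  # (0, 9, 1, 4)
```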
The solution below violates your requirement of using only higher-order functions, but since it's not clear why that is a requirement, I include it anyway.
def myRem(tup, n):
    idx = tup.index(n)
    return tuple(j for i, j in enumerate(tup) if i != idx)
myRem((0, 1, 9, 1, 4), 1)
# (0, 9, 1, 4)
Here is a numpy solution (still not using higher-order functions):
import numpy as np
def myRem(tup, n):
    tup_arr = np.array(tup)
    # delete the smallest index at which the array equals n
    return tuple(np.delete(tup_arr, np.min(np.nonzero(tup_arr == n)[0])))
myRem((0, 1, 9, 1, 4), 1)
# (0, 9, 1, 4)

numpy / python - indexing by arrays with duplicates

I'm trying to make a 3D histogram. Initially h = zeros((6,6,8)).
I'll explain my problem with an example. Suppose I have 3 lists of coordinates for h, each list for one dimension:
x = array([2,1,0,1,2,2])
y = array([1,3,0,3,2,1])
z = array([6,2,0,2,5,6])
(the coordinates (x[0],y[0],z[0]) and (x[5],y[5],z[5]) are duplicates, and so are (x[1],y[1],z[1]) and (x[3],y[3],z[3]))
and also a list of corresponding quantities to accumulate into h:
q = array([1,2,5,9,8,7])
I tried h[x,y,z] += q, but it does not work: only q[5] = 7 is added to h[2,1,6], and q[0] = 1 is not.
How can I work around this? Thank you.
IIUC, you want np.add.at. To quote the docs: "For addition ufunc, this method is equivalent to a[indices] += b, except that results are accumulated for elements that are indexed more than once."
For example:
>>> np.add.at(h, (x, y, z), q)
>>> for i, val in np.ndenumerate(h):
... if val: print(i, val)
...
((0, 0, 0), 5.0)
((1, 3, 2), 11.0)
((2, 1, 6), 8.0)
((2, 2, 5), 8.0)
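Another option, assuming all coordinates are in bounds for h, is to convert each coordinate triple into a flat index with np.ravel_multi_index and let np.bincount sum the weights of repeated indices. A Python 3 sketch using the arrays from the question (I have not benchmarked it against np.add.at):

```python
import numpy as np

h = np.zeros((6, 6, 8))
x = np.array([2, 1, 0, 1, 2, 2])
y = np.array([1, 3, 0, 3, 2, 1])
z = np.array([6, 2, 0, 2, 5, 6])
q = np.array([1, 2, 5, 9, 8, 7])

# Turn each (x, y, z) coordinate into one flat index into h...
flat = np.ravel_multi_index((x, y, z), h.shape)
# ...then bincount sums the weights of repeated indices, so duplicates accumulate.
h += np.bincount(flat, weights=q, minlength=h.size).reshape(h.shape)

print(h[2, 1, 6], h[1, 3, 2])  # 8.0 11.0
```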

Sum possibilities, one loop

Earlier, a lot of wonderful programmers helped me get a function done. However, the instructor wanted it in a single loop, and all the working solutions used multiple loops.
I wrote another program that almost solves the problem. Instead of using a loop to compare all the values, you have to use has_key to check whether a specific key exists. That removes the need to iterate through the dictionary to find matching values, because you can directly tell whether they match.
Again, charCount is just a function that counts the elements of the list into a dictionary, mapping each element to its frequency, and returns that dictionary.
def sumPair(theList, n):
    for a, b in level5.charCount(theList).iteritems():
        x = n - a
        if level5.charCount(theList).get(a):
            if a == x:
                if b > 1: # checks that the frequency of the number is greater than one, so the program won't pair a single occurrence with itself (e.g. 6+6=12: there may be only one 6, but it would still return 6+6)
                    return a, x
            else:
                if level5.charCount(theList).get(a) != x:
                    return a, x
print sumPair([6,3,8,3,2,8,3,2], 9)
I need to just make this code find the sum without iteration by seeing if the current element exists in the list of elements.
You can use collections.Counter instead of level5.charCount.
I also don't think you need the check if level5.charCount(theList).get(a):, because a is a key taken from level5.charCount(theList) itself, so that lookup always succeeds.
Note that the partner x = n - a has to be looked up among the keys of the counter; comparing it with the count b does not tell you whether x is in the list.
So I simplified your code:
from collections import Counter
def sumPair(the_list, n):
    counts = Counter(the_list)
    for a, b in counts.iteritems():
        x = n - a
        if a == x and b > 1:
            return a, x
        if a != x and x in counts:
            return a, x
print sumPair([6, 3, 8, 3, 2, 8, 3, 2], 9) #output>>> (3, 6)
You can also use a list comprehension like this:
>>> from collections import Counter
>>> the_list, n = [6, 3, 8, 3, 2, 8, 3, 2], 9
>>> counts = Counter(the_list)
>>> result = [(a, n-a) for a, b in counts.iteritems() if (a == n-a and b > 1) or (a != n-a and n-a in counts)]
>>> print result
[(3, 6), (6, 3)]
>>> print result[0] #this is the result you want
(3, 6)
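For the single-loop requirement itself, a common pattern is to remember the values already seen in a set and check whether the complement n - a is among them. This sketch swaps in a set for the charCount dictionary and is written for Python 3; sum_pair is a hypothetical name. It returns the earlier element first, and a value pairs with itself only if it occurs twice:

```python
def sum_pair(numbers, n):
    """Return the first pair (earlier, later) from numbers that adds up to n, or None."""
    seen = set()
    for a in numbers:      # the only loop
        if n - a in seen:  # set membership test instead of an inner loop
            return n - a, a
        seen.add(a)        # a can only pair with itself after a second occurrence
    return None


print(sum_pair([6, 3, 8, 3, 2, 8, 3, 2], 9))  # (6, 3)
```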
