Pythonic way to extract elements from heapq

Pythonic way to extract elements from heapq - python

I am using a priority queue (heapq) with datetime.datetime as priority.
What is the most pythonic way to extract a subset of element from this list if I have the startTime and endTime to search for.
(I can not alter the original list, so I must create a new list and return, or return an iterator)
Below is an example of what I have:
>>> import heapq
>>> timeLine = []
>>> from datetime import datetime
>>> heapq.heappush(timeLine, (datetime.now(),'A'))
>>> heapq.heappush(timeLine, (datetime.now(),'B'))
>>> heapq.heappush(timeLine, (datetime.now(),'C'))
>>> timeLine
[(datetime.datetime(2013, 2, 8, 15, 25, 14, 720000), 'A'), (datetime.datetime(2013, 2, 8, 15, 25, 30, 575000), 'B'), (datetime.datetime(2013, 2, 8, 15, 25, 36, 959000), 'C')]
The real application-list is huge.

Heaps are not the ideal structure to perform this operation; if you stick to heapq's public API, the heap will be altered and made useless for further operations. #Anonymous' solution may work, but (IMHO) relies too much on implementation details. While these are publicly documented, I'm not sure if you should really be using them.
Simply sorting the list and doing two binary searches is an easy way to do what you want:
from bisect import bisect_left, bisect_right
def find_range(timeline, start, end):
l = bisect_left(timeline, start)
r = bisect_right(timeline, end)
for i in xrange(l, r):
yield timeline[i]
The only trouble with this approach is that sorting takes O(n lg n) time in the worst case, but then so does your way of constructing the heap (heapq.heapify would take linear time).

Every node in a heap is larger than all its children. Also, if the node is at index i, its immediate children are at indices 2 * i + 1 and 2 * i + 2.
Viewing the heap as a binary tree, you can recurse down the heap, stopping if an entry is bigger than the maxkey you've got (since all its children will be larger), and outputting the node if its key is between minkey and maxkey.
Putting this together gives this:
def extract_range(h, i, minkey, maxkey):
if i >= len(h) or h[i][0] >= maxkey:
return
if h[i][0] >= minkey:
yield h[i]
for k in 1, 2:
for r in extract_range(h, 2 * i + k, minkey, maxkey):
yield r

Related

How do I generate a list of all possible combinations from a single element in a pair, from numerous pairs nested within a parent list?

I find myself in a unique situation in which I need to multiply single elements within a listed pair of numbers where each pair is nested within a parent list of elements. For example, I have my pre-defined variables as:
output = []
initial_list = [[1,2],[3,4],[5,6]]
I am trying to calculate an output such that each element is the product of a unique combination (always of length len(initial_list)) of a single element from each pair. Using my example of initial_list, I am looking to generate an output of length pow(2 * len(initial_list)) that is scable for any "n" number of pairs in initial_list (with a minimum of 2 pairs). So in this case each element of the output would be as follows:
output[0] = 1 * 3 * 5
output[1] = 1 * 3 * 6
output[2] = 1 * 4 * 5
output[3] = 1 * 4 * 6
output[4] = 2 * 3 * 5
output[5] = 2 * 3 * 6
output[6] = 2 * 4 * 5
output[7] = 2 * 4 * 6
In my specific case, the order of output assignments does not matter other than output[0], which I need to be equivalent to the product of the first element in each pair in initial_list. What is the best way to proceed to generate an output list such that each element is a unique combination of every element in each list?
...
My initial approach consisted of using;
from itertools import combinations
from itertools import permutations
from itertools import product
to somehow generate a list of every possible combination then multiply the products together and append each product to the output list, but I couldn't figure out a wait to implement the tools successfully. I have since tried to create a recursive function that combines for x in range(2): with nested recursion recalls, but once again I cannot figured out a solution.
Someone more experienced and smarter than me please help me out; Any and all help is appreciated! Thank you!

Without using any external library
def multi_comb(my_list):
"""
This returns the multiplication of
every possible combinationation of
the `my_list` of type [[a1, a2], [b1, b2], ...]
Arg: List
Return: List
"""
if not my_list: return [1]
a, b = my_list.pop(0)
result = multi_comb(my_list)
left = [a * i for i in result]
right = [b * i for i in result]
return (left + right)
print(multi_comb([[1, 2], [3, 4], [5, 6]]))
# Output
# [15, 18, 20, 24, 30, 36, 40, 48]
I am using reccursion to get the result. Here's the visual illustration of how this works.
Instead of taking a top-down approach, we can take bottom-up approach to better understand how this program works.
At the last step, a and b becomes 5 and 6 respectively. Calling multi_comb() with empty list returns [1] as a result. So left and right becomes [5] and [6]. Thus we return [5, 6] to our previous step.
At the second last step, a and b was 3 and 4 respectively. From the last step we got [5, 6] as a result. After multiplying each of the values inside the result with a and b (notice left and right), we return the result [15, 18, 20, 24] to our previous step.
At our first step, that is our starting step, we had a and b as 1 and 2 respectively. The value returned from our last step becomes our result, ie, [15, 18, 20, 24]. Now we multiply both a and b with this result and return our final output.
Note:
This program works only if list is in the form [ [a1, a2], [b1, b2], [c1, c2], ... ] as told by the OP in the comments. The problem of solving the list containing the sub-list of n items will be little different in code, but the concept is same as in this answer.

This problem can also be solved using dynamic programming
output = [1, ]
for arr in initial_list:
output = [a * b for a in arr for b in product]
This problem is easy to solve if you have just one subarray -- the output is the given subarray.
Suppose you solved the problem for the first n - 1 subarrays, and you got the output. The new subarray is appended. How the output should change? The new output is all pair-wise products of the previous output and the "new" subarray.

Look closely, there's an easy pattern. Let there be n sublists, and 2 elements in each: at index 0 and 1. Now, the indexes selected can be represented as a binary string of length n.
It'll start with 0000..000, then 0000...001, 0000...010 and so on. So all you need to do is:
n = len(lst)
for i in range(2**n):
binary = bin(i)[2:] #get binary representation
for j in range(n):
if binary[j]=="1":
#include jth list's 1st index in product
else:
#include jth list's 0th index in product
The problem would a scalable solution would be, since you're generating all possible pairs, the time complexity will be O(2^N)

Your idea to use itertools.product is great!
import itertools
initial_list = [[1,2],[3,4],[5,6]]
combinations = list(itertools.product(*initial_list))
# [(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)]
Now, you can get the product of each tuple in combination using for-loops, or using functools.reduce, or you can use math.prod which was introduced in python 3.8:
import itertools
import math
initial_list = [[1,2],[3,4],[5,6]]
output = [math.prod(c) for c in itertools.product(*initial_list)]
# [15, 18, 20, 24, 30, 36, 40, 48]
import itertools
import functools
import operator
initial_list = [[1,2],[3,4],[5,6]]
output = [functools.reduce(operator.mul, c) for c in itertools.product(*initial_list)]
# [15, 18, 20, 24, 30, 36, 40, 48]
import itertools
output = []
for c in itertools.product(*initial_list):
p = 1
for x in c:
p *= x
output.append(p)
# output == [15, 18, 20, 24, 30, 36, 40, 48]
Note: if you are more familiar with lambdas, operator.mul is pretty much equivalent to lambda x,y: x*y.

itertools.product and math.prod are a nice fit -
from itertools import product
from math import prod
input = [[1,2],[3,4],[5,6]]
output = [prod(x) for x in product(*input)]
print(output)
[15, 18, 20, 24, 30, 36, 40, 48]

Most Pythonic way to iterate by multiples

I was reviewing some algorithms and they had a for loop that increases by a constant multiple. What would be the most Pythonic way of solving this?
This is not an issue of how to solve the problem, but more of a discussion on what the best solution would be?
This is the Java snip:
for (int i = 1; i <=n; i *= c) {
// some stuff
}
Here is an actual solution in python. I don't think it is the most Pythonic method:
i = 1
while i < limit:
# some stuff, remember to use i - 1 as array index
i *= constant
Pythonic way I could see (That does not exist):
for i in mrange(1, limit, c):
# some stuff
First post here. Hope I tagged and all correctly...

You still can do this :
def mrange(start, stop, step):
i = start
while i < stop:
yield i
i *= step
And then :
for i in mrange(1, 100, 4):
print(i)
Prints :
1
4
16
64
Python cannot provide default range functions to fit every needs, but it is pythonic to create your own generators.
If you don't like this solution, the while alternative looks ok too.

You can use itertools.accumulate; start with a range 1 to n, then apply a function which multiplies its first argument by your constant and ignores its second argument.
>>> from itertools import accumulate
>>> [x for x in accumulate(range(1,10), lambda x,_: 4*x)]
[1, 4, 16, 64, 256, 1024, 4096, 16384, 65536]
Having missed that you want to take the values less than n, start with the infinite sequence [c, c**2, c**3, c**4, ...] and use takewhile to "filter" it. (Also, I just realized you only need map, not accumulate, although accumulate may be more efficient. Note the difference in starting points when using map vs accumulate, too.):
>>> from itertools import count, takewhile
>>> n = 100
>>> [x for x in takewhile(lambda x: x < n, map(lambda x: 4**x, count(0)))]
[1, 4, 16, 64]
>>> [x for x in takewhile(lambda x: x < n, accumulate(count(1), lambda x,_: x*4))]
[1, 4, 16, 64]

Using math module:-
for i in range(math.ceil(math.log(limit, const))):
# code goes here
Ex:-
>>> for i in range(math.ceil(math.log(20, 2))):
... print("runs")
...
runs
runs
runs
runs
runs
which is similar to:-
i = 1
while i< 20:
print('runs')
i*=2
Finally; in easy seeming way:-
>>> import math
>>> mrange = lambda i, limit, const: range(i, math.ceil(math.log(limit, const)))
>>> for i in mrange(0, 20, 2):
print('whoa')
..
whoa
whoa
whoa
whoa
whoa

Compare nums need optimisation (codingame.com)

www.codingame.com
Task
Write a program which, using a given number of strengths,
identifies the two closest strengths and shows their difference with an integer
Info
n = Number of horses
pi = strength of each horse
d = difference
1 < n < 100000
0 < pi ≤ 10000000
My code currently
def get_dif(a, b):
return abs(a - b)
horse_str = [10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7]
n = len(horse_str)
d = 10000001
for x in range(len(horse_str)):
for y in range(x, len(horse_str) - 1):
d = min([get_dif(horse_str[x], horse_str[y + 1]), d])
print(d)
Test cases
[3,5,8, 9] outputs: 1
[10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7] outputs: 1
Problem
They both work but then the next test gives me a very long list of horse strengths and i get **Process has timed out. This may mean that your solution is not optimized enough to handle some cases.
How can i optimise it? Thank you!
EDIT ONE
Default code given
import sys
import math
# Auto-generated code below aims at helping you parse
# the standard input according to the problem statement.
n = int(input())
for i in range(n):
pi = int(input())
# Write an action using print
# To debug: print("Debug messages...", file=sys.stderr)
print("answer")

Since you can use sort method (which is optimized to avoid performing a costly bubble sort or double loop by hand which has O(n**2) complexity, and times out with a very big list), let me propose something:
sort the list
compute the minimum of absolute value of difference of the adjacent values, passing a generator comprehension to the min function
The minimum has to be the abs difference of adjacent values. Since the list is sorted using a fast algorithm, the heavy lifting is done for you.
like this:
horse_str = [10, 5, 15, 17, 3, 8, 11, 28, 6, 55, 7]
sh = sorted(horse_str)
print(min(abs(sh[i]-sh[i+1]) for i in range(len(sh)-1)))
I also get 1 as a result (I hope I didn't miss anything)

pythonic format for indices

I am after a string format to efficiently represent a set of indices.
For example "1-3,6,8-10,16" would produce [1,2,3,6,8,9,10,16]
Ideally I would also be able to represent infinite sequences.
Is there an existing standard way of doing this? Or a good library? Or can you propose your own format?
thanks!
Edit: Wow! - thanks for all the well considered responses. I agree I should use ':' instead. Any ideas about infinite lists? I was thinking of using "1.." to represent all positive numbers.
The use case is for a shopping cart. For some products I need to restrict product sales to multiples of X, for others any positive number. So I am after a string format to represent this in the database.

You don't need a string for that, This is as simple as it can get:
from types import SliceType
class sequence(object):
def __getitem__(self, item):
for a in item:
if isinstance(a, SliceType):
i = a.start
step = a.step if a.step else 1
while True:
if a.stop and i > a.stop:
break
yield i
i += step
else:
yield a
print list(sequence()[1:3,6,8:10,16])
Output:
[1, 2, 3, 6, 8, 9, 10, 16]
I'm using Python slice type power to express the sequence ranges. I'm also using generators to be memory efficient.
Please note that I'm adding 1 to the slice stop, otherwise the ranges will be different because the stop in slices is not included.
It supports steps:
>>> list(sequence()[1:3,6,8:20:2])
[1, 2, 3, 6, 8, 10, 12, 14, 16, 18, 20]
And infinite sequences:
sequence()[1:3,6,8:]
1, 2, 3, 6, 8, 9, 10, ...
If you have to give it a string then you can combine #ilya n. parser with this solution. I'll extend #ilya n. parser to support indexes as well as ranges:
def parser(input):
ranges = [a.split('-') for a in input.split(',')]
return [slice(*map(int, a)) if len(a) > 1 else int(a[0]) for a in ranges]
Now you can use it like this:
>>> print list(sequence()[parser('1-3,6,8-10,16')])
[1, 2, 3, 6, 8, 9, 10, 16]

If you're into something Pythonic, I think 1:3,6,8:10,16 would be a better choice, as x:y is a standard notation for index range and the syntax allows you to use this notation on objects. Note that the call
z[1:3,6,8:10,16]
gets translated into
z.__getitem__((slice(1, 3, None), 6, slice(8, 10, None), 16))
Even though this is a TypeError if z is a built-in container, you're free to create the class that will return something reasonable, e.g. as NumPy's arrays.
You might also say that by convention 5: and :5 represent infinite index ranges (this is a bit stretched as Python has no built-in types with negative or infinitely large positive indexes).
And here's the parser (a beautiful one-liner that suffers from slice(16, None, None) glitch described below):
def parse(s):
return [slice(*map(int, x.split(':'))) for x in s.split(',')]
There's one pitfall, however: 8:10 by definition includes only indices 8 and 9 -- without upper bound. If that's unacceptable for your purposes, you certainly need a different format and 1-3,6,8-10,16 looks good to me. The parser then would be
def myslice(start, stop=None, step=None):
return slice(start, (stop if stop is not None else start) + 1, step)
def parse(s):
return [myslice(*map(int, x.split('-'))) for x in s.split(',')]
Update: here's the full parser for a combined format:
from sys import maxsize as INF
def indices(s: 'string with indices list') -> 'indices generator':
for x in s.split(','):
splitter = ':' if (':' in x) or (x[0] == '-') else '-'
ix = x.split(splitter)
start = int(ix[0]) if ix[0] is not '' else -INF
if len(ix) == 1:
stop = start + 1
else:
stop = int(ix[1]) if ix[1] is not '' else INF
step = int(ix[2]) if len(ix) > 2 else 1
for y in range(start, stop + (splitter == '-'), step):
yield y
This handles negative numbers as well, so
print(list(indices('-5, 1:3, 6, 8:15:2, 20-25, 18')))
prints
[-5, 1, 2, 6, 7, 8, 10, 12, 14, 20, 21, 22, 23, 24, 25, 18, 19]
Yet another alternative is to use ... (which Python recognizes as the built-in constant Ellipsis so you can call z[...] if you want) but I think 1,...,3,6, 8,...,10,16 is less readable.

This is probably about as lazily as it can be done, meaning it will be okay for even very large lists:
def makerange(s):
for nums in s.split(","): # whole list comma-delimited
range_ = nums.split("-") # number might have a dash - if not, no big deal
start = int(range_[0])
for i in xrange(start, start + 1 if len(range_) == 1 else int(range_[1]) + 1):
yield i
s = "1-3,6,8-10,16"
print list(makerange(s))
output:
[1, 2, 3, 6, 8, 9, 10, 16]

import sys
class Sequencer(object):
def __getitem__(self, items):
if not isinstance(items, (tuple, list)):
items = [items]
for item in items:
if isinstance(item, slice):
for i in xrange(*item.indices(sys.maxint)):
yield i
else:
yield item
>>> s = Sequencer()
>>> print list(s[1:3,6,8:10,16])
[1, 2, 6, 8, 9, 16]
Note that I am using the xrange builtin to generate the sequence. That seems awkward at first because it doesn't include the upper number of sequences by default, however it proves to be very convenient. You can do things like:
>>> print list(s[1:10:3,5,5,16,13:5:-1])
[1, 4, 7, 5, 5, 16, 13, 12, 11, 10, 9, 8, 7, 6]
Which means you can use the step part of xrange.

This looked like a fun puzzle to go with my coffee this morning. If you settle on your given syntax (which looks okay to me, with some notes at the end), here is a pyparsing converter that will take your input string and return a list of integers:
from pyparsing import *
integer = Word(nums).setParseAction(lambda t : int(t[0]))
intrange = integer("start") + '-' + integer("end")
def validateRange(tokens):
if tokens.from_ > tokens.to:
raise Exception("invalid range, start must be <= end")
intrange.setParseAction(validateRange)
intrange.addParseAction(lambda t: list(range(t.start, t.end+1)))
indices = delimitedList(intrange | integer)
def mergeRanges(tokens):
ret = set()
for item in tokens:
if isinstance(item,int):
ret.add(item)
else:
ret += set(item)
return sorted(ret)
indices.setParseAction(mergeRanges)
test = "1-3,6,8-10,16"
print indices.parseString(test)
This also takes care of any overlapping or duplicate entries, such "3-8,4,6,3,4", and returns a list of just the unique integers.
The parser takes care of validating that ranges like "10-3" are not allowed. If you really wanted to allow this, and have something like "1,5-3,7" return 1,5,4,3,7, then you could tweak the intrange and mergeRanges parse actions to get this simpler result (and discard the validateRange parse action altogether).
You are very likely to get whitespace in your expressions, I assume that this is not significant. "1, 2, 3-6" would be handled the same as "1,2,3-6". Pyparsing does this by default, so you don't see any special whitespace handling in the code above (but it's there...)
This parser does not handle negative indices, but if that were needed too, just change the definition of integer to:
integer = Combine(Optional('-') + Word(nums)).setParseAction(lambda t : int(t[0]))
Your example didn't list any negatives, so I left it out for now.
Python uses ':' for a ranging delimiter, so your original string could have looked like "1:3,6,8:10,16", and Pascal used '..' for array ranges, giving "1..3,6,8..10,16" - meh, dashes are just as good as far as I'm concerned.

Useful code which uses reduce()? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
Does anyone here have any useful code which uses reduce() function in python? Is there any code other than the usual + and * that we see in the examples?
Refer Fate of reduce() in Python 3000 by GvR

The other uses I've found for it besides + and * were with and and or, but now we have any and all to replace those cases.
foldl and foldr do come up in Scheme a lot...
Here's some cute usages:
Flatten a list
Goal: turn [[1, 2, 3], [4, 5], [6, 7, 8]] into [1, 2, 3, 4, 5, 6, 7, 8].
reduce(list.__add__, [[1, 2, 3], [4, 5], [6, 7, 8]], [])
List of digits to a number
Goal: turn [1, 2, 3, 4, 5, 6, 7, 8] into 12345678.
Ugly, slow way:
int("".join(map(str, [1,2,3,4,5,6,7,8])))
Pretty reduce way:
reduce(lambda a,d: 10*a+d, [1,2,3,4,5,6,7,8], 0)

reduce() can be used to find Least common multiple for 3 or more numbers:
#!/usr/bin/env python
from math import gcd
from functools import reduce
def lcm(*args):
return reduce(lambda a,b: a * b // gcd(a, b), args)
Example:
>>> lcm(100, 23, 98)
112700
>>> lcm(*range(1, 20))
232792560

reduce() could be used to resolve dotted names (where eval() is too unsafe to use):
>>> import __main__
>>> reduce(getattr, "os.path.abspath".split('.'), __main__)
<function abspath at 0x009AB530>

Find the intersection of N given lists:
input_list = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]]
result = reduce(set.intersection, map(set, input_list))
returns:
result = set([3, 4, 5])
via: Python - Intersection of two lists

I think reduce is a silly command. Hence:
reduce(lambda hold,next:hold+chr(((ord(next.upper())-65)+13)%26+65),'znlorabggbbhfrshy','')

The usage of reduce that I found in my code involved the situation where I had some class structure for logic expression and I needed to convert a list of these expression objects to a conjunction of the expressions. I already had a function make_and to create a conjunction given two expressions, so I wrote reduce(make_and,l). (I knew the list wasn't empty; otherwise it would have been something like reduce(make_and,l,make_true).)
This is exactly the reason that (some) functional programmers like reduce (or fold functions, as such functions are typically called). There are often already many binary functions like +, *, min, max, concatenation and, in my case, make_and and make_or. Having a reduce makes it trivial to lift these operations to lists (or trees or whatever you got, for fold functions in general).
Of course, if certain instantiations (such as sum) are often used, then you don't want to keep writing reduce. However, instead of defining the sum with some for-loop, you can just as easily define it with reduce.
Readability, as mentioned by others, is indeed an issue. You could argue, however, that only reason why people find reduce less "clear" is because it is not a function that many people know and/or use.

Function composition: If you already have a list of functions that you'd like to apply in succession, such as:
color = lambda x: x.replace('brown', 'blue')
speed = lambda x: x.replace('quick', 'slow')
work = lambda x: x.replace('lazy', 'industrious')
fs = [str.lower, color, speed, work, str.title]
Then you can apply them all consecutively with:
>>> call = lambda s, func: func(s)
>>> s = "The Quick Brown Fox Jumps Over the Lazy Dog"
>>> reduce(call, fs, s)
'The Slow Blue Fox Jumps Over The Industrious Dog'
In this case, method chaining may be more readable. But sometimes it isn't possible, and this kind of composition may be more readable and maintainable than a f1(f2(f3(f4(x)))) kind of syntax.

You could replace value = json_obj['a']['b']['c']['d']['e'] with:
value = reduce(dict.__getitem__, 'abcde', json_obj)
If you already have the path a/b/c/.. as a list. For example, Change values in dict of nested dicts using items in a list.

#Blair Conrad: You could also implement your glob/reduce using sum, like so:
files = sum([glob.glob(f) for f in args], [])
This is less verbose than either of your two examples, is perfectly Pythonic, and is still only one line of code.
So to answer the original question, I personally try to avoid using reduce because it's never really necessary and I find it to be less clear than other approaches. However, some people get used to reduce and come to prefer it to list comprehensions (especially Haskell programmers). But if you're not already thinking about a problem in terms of reduce, you probably don't need to worry about using it.

reduce can be used to support chained attribute lookups:
reduce(getattr, ('request', 'user', 'email'), self)
Of course, this is equivalent to
self.request.user.email
but it's useful when your code needs to accept an arbitrary list of attributes.
(Chained attributes of arbitrary length are common when dealing with Django models.)

reduce is useful when you need to find the union or intersection of a sequence of set-like objects.
>>> reduce(operator.or_, ({1}, {1, 2}, {1, 3})) # union
{1, 2, 3}
>>> reduce(operator.and_, ({1}, {1, 2}, {1, 3})) # intersection
{1}
(Apart from actual sets, an example of these are Django's Q objects.)
On the other hand, if you're dealing with bools, you should use any and all:
>>> any((True, False, True))
True

I'm writing a compose function for a language, so I construct the composed function using reduce along with my apply operator.
In a nutshell, compose takes a list of functions to compose into a single function. If I have a complex operation that is applied in stages, I want to put it all together like so:
complexop = compose(stage4, stage3, stage2, stage1)
This way, I can then apply it to an expression like so:
complexop(expression)
And I want it to be equivalent to:
stage4(stage3(stage2(stage1(expression))))
Now, to build my internal objects, I want it to say:
Lambda([Symbol('x')], Apply(stage4, Apply(stage3, Apply(stage2, Apply(stage1, Symbol('x'))))))
(The Lambda class builds a user-defined function, and Apply builds a function application.)
Now, reduce, unfortunately, folds the wrong way, so I wound up using, roughly:
reduce(lambda x,y: Apply(y, x), reversed(args + [Symbol('x')]))
To figure out what reduce produces, try these in the REPL:
reduce(lambda x, y: (x, y), range(1, 11))
reduce(lambda x, y: (y, x), reversed(range(1, 11)))

reduce can be used to get the list with the maximum nth element
reduce(lambda x,y: x if x[2] > y[2] else y,[[1,2,3,4],[5,2,5,7],[1,6,0,2]])
would return [5, 2, 5, 7] as it is the list with max 3rd element +

Reduce isn't limited to scalar operations; it can also be used to sort things into buckets. (This is what I use reduce for most often).
Imagine a case in which you have a list of objects, and you want to re-organize it hierarchically based on properties stored flatly in the object. In the following example, I produce a list of metadata objects related to articles in an XML-encoded newspaper with the articles function. articles generates a list of XML elements, and then maps through them one by one, producing objects that hold some interesting info about them. On the front end, I'm going to want to let the user browse the articles by section/subsection/headline. So I use reduce to take the list of articles and return a single dictionary that reflects the section/subsection/article hierarchy.
from lxml import etree
from Reader import Reader
class IssueReader(Reader):
def articles(self):
arts = self.q('//div3') # inherited ... runs an xpath query against the issue
subsection = etree.XPath('./ancestor::div2/#type')
section = etree.XPath('./ancestor::div1/#type')
header_text = etree.XPath('./head//text()')
return map(lambda art: {
'text_id': self.id,
'path': self.getpath(art)[0],
'subsection': (subsection(art)[0] or '[none]'),
'section': (section(art)[0] or '[none]'),
'headline': (''.join(header_text(art)) or '[none]')
}, arts)
def by_section(self):
arts = self.articles()
def extract(acc, art): # acc for accumulator
section = acc.get(art['section'], False)
if section:
subsection = acc.get(art['subsection'], False)
if subsection:
subsection.append(art)
else:
section[art['subsection']] = [art]
else:
acc[art['section']] = {art['subsection']: [art]}
return acc
return reduce(extract, arts, {})
I give both functions here because I think it shows how map and reduce can complement each other nicely when dealing with objects. The same thing could have been accomplished with a for loop, ... but spending some serious time with a functional language has tended to make me think in terms of map and reduce.
By the way, if anybody has a better way to set properties like I'm doing in extract, where the parents of the property you want to set might not exist yet, please let me know.

Not sure if this is what you are after but you can search source code on Google.
Follow the link for a search on 'function:reduce() lang:python' on Google Code search
At first glance the following projects use reduce()
MoinMoin
Zope
Numeric
ScientificPython
etc. etc. but then these are hardly surprising since they are huge projects.
The functionality of reduce can be done using function recursion which I guess Guido thought was more explicit.
Update:
Since Google's Code Search was discontinued on 15-Jan-2012, besides reverting to regular Google searches, there's something called Code Snippets Collection that looks promising. A number of other resources are mentioned in answers this (closed) question Replacement for Google Code Search?.
Update 2 (29-May-2017):
A good source for Python examples (in open-source code) is the Nullege search engine.

After grepping my code, it seems the only thing I've used reduce for is calculating the factorial:
reduce(operator.mul, xrange(1, x+1) or (1,))

import os
files = [
# full filenames
"var/log/apache/errors.log",
"home/kane/images/avatars/crusader.png",
"home/jane/documents/diary.txt",
"home/kane/images/selfie.jpg",
"var/log/abc.txt",
"home/kane/.vimrc",
"home/kane/images/avatars/paladin.png",
]
# unfolding of plain filiname list to file-tree
fs_tree = ({}, # dict of folders
[]) # list of files
for full_name in files:
path, fn = os.path.split(full_name)
reduce(
# this fucction walks deep into path
# and creates placeholders for subfolders
lambda d, k: d[0].setdefault(k, # walk deep
({}, [])), # or create subfolder storage
path.split(os.path.sep),
fs_tree
)[1].append(fn)
print fs_tree
#({'home': (
# {'jane': (
# {'documents': (
# {},
# ['diary.txt']
# )},
# []
# ),
# 'kane': (
# {'images': (
# {'avatars': (
# {},
# ['crusader.png',
# 'paladin.png']
# )},
# ['selfie.jpg']
# )},
# ['.vimrc']
# )},
# []
# ),
# 'var': (
# {'log': (
# {'apache': (
# {},
# ['errors.log']
# )},
# ['abc.txt']
# )},
# [])
#},
#[])

I just found useful usage of reduce: splitting string without removing the delimiter. The code is entirely from Programatically Speaking blog. Here's the code:
reduce(lambda acc, elem: acc[:-1] + [acc[-1] + elem] if elem == "\n" else acc + [elem], re.split("(\n)", "a\nb\nc\n"), [])
Here's the result:
['a\n', 'b\n', 'c\n', '']
Note that it handles edge cases that popular answer in SO doesn't. For more in-depth explanation, I am redirecting you to original blog post.

I used reduce to concatenate a list of PostgreSQL search vectors with the || operator in sqlalchemy-searchable:
vectors = (self.column_vector(getattr(self.table.c, column_name))
for column_name in self.indexed_columns)
concatenated = reduce(lambda x, y: x.op('||')(y), vectors)
compiled = concatenated.compile(self.conn)

I have an old Python implementation of pipegrep that uses reduce and the glob module to build a list of files to process:
files = []
files.extend(reduce(lambda x, y: x + y, map(glob.glob, args)))
I found it handy at the time, but it's really not necessary, as something similar is just as good, and probably more readable
files = []
for f in args:
files.extend(glob.glob(f))

Let say that there are some yearly statistic data stored a list of Counters.
We want to find the MIN/MAX values in each month across the different years.
For example, for January it would be 10. And for February it would be 15.
We need to store the results in a new Counter.
from collections import Counter
stat2011 = Counter({"January": 12, "February": 20, "March": 50, "April": 70, "May": 15,
"June": 35, "July": 30, "August": 15, "September": 20, "October": 60,
"November": 13, "December": 50})
stat2012 = Counter({"January": 36, "February": 15, "March": 50, "April": 10, "May": 90,
"June": 25, "July": 35, "August": 15, "September": 20, "October": 30,
"November": 10, "December": 25})
stat2013 = Counter({"January": 10, "February": 60, "March": 90, "April": 10, "May": 80,
"June": 50, "July": 30, "August": 15, "September": 20, "October": 75,
"November": 60, "December": 15})
stat_list = [stat2011, stat2012, stat2013]
print reduce(lambda x, y: x & y, stat_list) # MIN
print reduce(lambda x, y: x | y, stat_list) # MAX

I have objects representing some kind of overlapping intervals (genomic exons), and redefined their intersection using __and__:
class Exon:
def __init__(self):
...
def __and__(self,other):
...
length = self.length + other.length # (e.g.)
return self.__class__(...length,...)
Then when I have a collection of them (for instance, in the same gene), I use
intersection = reduce(lambda x,y: x&y, exons)

def dump(fname,iterable):
with open(fname,'w') as f:
reduce(lambda x, y: f.write(unicode(y,'utf-8')), iterable)

Using reduce() to find out if a list of dates are consecutive:
from datetime import date, timedelta
def checked(d1, d2):
"""
We assume the date list is sorted.
If d2 & d1 are different by 1, everything up to d2 is consecutive, so d2
can advance to the next reduction.
If d2 & d1 are not different by 1, returning d1 - 1 for the next reduction
will guarantee the result produced by reduce() to be something other than
the last date in the sorted date list.
Definition 1: 1/1/14, 1/2/14, 1/2/14, 1/3/14 is consider consecutive
Definition 2: 1/1/14, 1/2/14, 1/2/14, 1/3/14 is consider not consecutive
"""
#if (d2 - d1).days == 1 or (d2 - d1).days == 0: # for Definition 1
if (d2 - d1).days == 1: # for Definition 2
return d2
else:
return d1 + timedelta(days=-1)
# datelist = [date(2014, 1, 1), date(2014, 1, 3),
# date(2013, 12, 31), date(2013, 12, 30)]
# datelist = [date(2014, 2, 19), date(2014, 2, 19), date(2014, 2, 20),
# date(2014, 2, 21), date(2014, 2, 22)]
datelist = [date(2014, 2, 19), date(2014, 2, 21),
date(2014, 2, 22), date(2014, 2, 20)]
datelist.sort()
if datelist[-1] == reduce(checked, datelist):
print "dates are consecutive"
else:
print "dates are not consecutive"

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Pythonic way to extract elements from heapq - python

Related

How do I generate a list of all possible combinations from a single element in a pair, from numerous pairs nested within a parent list?

Most Pythonic way to iterate by multiples

Compare nums need optimisation (codingame.com)

pythonic format for indices

Useful code which uses reduce()? [closed]

Categories

Resources