Negative bounds slice

Negative bounds slice - python

I have two n-length tuples and I need to check whether all the elements in the same position are the same except for the element in position w. This is what I wrote:
if all(tup1[i] == tup2[i] for i in xrange(n) if i != w):
...
In order to avoid the loop (since this piece of code will be used many times), I tried to use slicing. Unfortunately, this doesn't work:
if tup1[w-1:w-n:-1] == tup2[w-1:w-n:-1]:
...
Am I obliged to write something like this?
if tup1[:w-1] == tup2[:w-1] and tup1[w+1:] == tup2[w+1:]
Isn't there a more elegant method?
Or are both the loop and slicing no good, and is there a better way of obtaining the result I'm looking for? (I can't use filter because there may be other elements with the same value as the one in position w.)

I think you've all but found the best solution already. One correction: to skip only index w, the first slice should stop at w rather than w-1:
tup1[:w] == tup2[:w] and tup1[w+1:] == tup2[w+1:]
If the tuples were extremely long and you didn't want to copy the data and you wanted early-out behavior, there is a much more complicated alternative using itertools and operator:
>>> from operator import eq
>>> from itertools import imap, islice
>>> w = 5
>>> t1 = (10, 20, 30, 40, -1, 50, 60, 70)
>>> t2 = (10, 20, 30, 40, -1, 50, 60, 70)
>>> it1, it2 = iter(t1), iter(t2)
>>> all(imap(eq, islice(it1, w), islice(it2, w))) \
... and (next(it1, True) or True) and (next(it2, True) or True) \
... and all(imap(eq, it1, it2))
True
This is a lot of set-up work and stepwise isn't as fast as tuple slicing, but it does avoid copying all the data and it does have an early out.
In the non-extreme case, I would stick with your double-sliced-tuple-equality solution.
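For reference, both approaches can be wrapped in small helpers. This is only a sketch in Python 3, assuming w is a zero-based index as in the question's loop; the names equal_except_slices and equal_except_lazy are made up for illustration.

```python
def equal_except_slices(t1, t2, w):
    """Compare by slicing around index w (copies both halves)."""
    return t1[:w] == t2[:w] and t1[w + 1:] == t2[w + 1:]


def equal_except_lazy(t1, t2, w):
    """Compare element by element, skipping index w; all() stops at the first mismatch."""
    if len(t1) != len(t2):
        return False
    return all(a == b for i, (a, b) in enumerate(zip(t1, t2)) if i != w)


t1 = (10, 20, 30, 40, -1, 50, 60, 70)
t2 = (10, 20, 30, 40, 99, 50, 60, 70)
print(equal_except_slices(t1, t2, 4), equal_except_lazy(t1, t2, 4))  # True True
print(equal_except_slices(t1, t2, 3), equal_except_lazy(t1, t2, 3))  # False False
```

The lazy version is essentially the question's original generator expression; it avoids copying but does its work in Python-level steps, so for short tuples the slice comparison is likely to be faster.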

Related

Most Pythonic way to iterate by multiples

I was reviewing some algorithms and they had a for loop that increases by a constant multiple. What would be the most Pythonic way of solving this?
This is not an issue of how to solve the problem, but more of a discussion on what the best solution would be?
This is the Java snippet:
for (int i = 1; i <= n; i *= c) {
    // some stuff
}
Here is an actual solution in Python. I don't think it is the most Pythonic method:
i = 1
while i < limit:
    # some stuff, remember to use i - 1 as array index
    i *= constant
Pythonic way I could see (that does not exist):
for i in mrange(1, limit, c):
    # some stuff
First post here. Hope I tagged and all correctly...

You can still do this:
def mrange(start, stop, step):
    i = start
    while i < stop:
        yield i
        i *= step
And then:
for i in mrange(1, 100, 4):
    print(i)
Prints:
1
4
16
64
Python cannot provide built-in range functions to fit every need, but it is Pythonic to create your own generators.
If you don't like this solution, the while alternative looks OK too.

You can use itertools.accumulate; start with a range 1 to n, then apply a function which multiplies its first argument by your constant and ignores its second argument.
>>> from itertools import accumulate
>>> [x for x in accumulate(range(1,10), lambda x,_: 4*x)]
[1, 4, 16, 64, 256, 1024, 4096, 16384, 65536]
Having missed that you only want the values less than n: start with the infinite sequence [1, c, c**2, c**3, ...] and use takewhile to cut it off at n. (Also, I just realized you only need map, not accumulate, although accumulate may be more efficient. Note the difference in starting points when using map vs accumulate, too.):
>>> from itertools import count, takewhile
>>> n = 100
>>> [x for x in takewhile(lambda x: x < n, map(lambda x: 4**x, count(0)))]
[1, 4, 16, 64]
>>> [x for x in takewhile(lambda x: x < n, accumulate(count(1), lambda x,_: x*4))]
[1, 4, 16, 64]

Using the math module:
import math
for i in range(math.ceil(math.log(limit, const))):
    # code goes here
Example:
>>> import math
>>> for i in range(math.ceil(math.log(20, 2))):
...     print("runs")
...
runs
runs
runs
runs
runs
which runs the same number of times as:
i = 1
while i < 20:
    print('runs')
    i *= 2
Finally, wrapped up as a helper (note that i here counts iterations 0, 1, 2, ... rather than taking the values 1, 2, 4, ...):
>>> import math
>>> mrange = lambda i, limit, const: range(i, math.ceil(math.log(limit, const)))
>>> for i in mrange(0, 20, 2):
...     print('whoa')
...
whoa
whoa
whoa
whoa
whoa
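One caveat with the logarithm approach: math.log works in floating point, so when limit is an exact power of the constant the result can land slightly above or below an integer, and math.ceil may then be off by one. A sketch in Python 3 that counts the iterations with integer arithmetic instead (the helper name iterations is hypothetical):

```python
def iterations(limit, const):
    """Number of times `i = 1; while i < limit: i *= const` runs its body."""
    count, i = 0, 1
    while i < limit:
        count += 1
        i *= const
    return count


# The loop variable is an exponent; raise const to it to recover the values.
for k in range(iterations(20, 2)):
    print(k, 2 ** k)

print(iterations(125, 5))  # 3, because 1, 5 and 25 are below 125
```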

Cluster analysis within a set of integers

Sorry for the broad title, I just do not know how to name this.
I have a list of integers, let's say:
X = [20, 30, 40, 50, 60, 70, 80, 100]
And a second list of tuples of size 2 to 6 made from these integers:
Y = [(20, 30), (40, 50, 80, 100), (100, 100, 100), ...]
Some of the numbers come up quite often in Y, and I'd like to identify the groups of integers that appear together often.
Right now, I'm counting the number of occurrences of each integer. It gives me some information, but nothing about the groups.
Example:
Y = [(20, 40, 80), (30, 60, 80), (60, 80, 100), (60, 80, 100, 20), (40, 60, 80, 20, 100), ...]
In that example, (60, 80) and (60, 80, 100) are combinations which come up often.
I could use itertools.combinations_with_replacement() to generate every combination and then count the number of occurrences, but is there a better way to do this?
Thanks.

I don't know whether this is strictly better or just similar, but you could check what fraction of the tuples each subset appears in. Below is a brute-force way of doing so, storing the results in a dictionary. Quite possibly it would be better to build a tree where you don't search a branch if the appearance rate of its elements already failed to make the cut (i.e. if (20, 80) does not appear together often enough, why search for (20, 80, 100)?).
import itertools
N = len(Y)
dicter = {}
for i in range(2, 7):
    for comb in itertools.combinations(X, i):
        c3 = set(comb)
        # fraction of tuples in Y that contain every element of comb
        d3 = sum(c3.issubset(set(val)) for val in Y) / float(N)
        dicter['{}'.format(c3)] = d3
As an edit: you are probably not interested in subsets that never appear, so here is a piece of code to cut down the final dictionary size. First we define a function that returns a shallow copy of the dictionary with one key removed. This is required to avoid a RuntimeError when looping over the dict.
def removekey(d, key):
    r = dict(d)
    del r[key]
    return r
Then we remove insignificant "clusters":
for d, v in dicter.items():
    if v < 0.1:
        dicter = removekey(dicter, d)
It will still be unsorted, as itertools and sets do not sort by themselves. Hope this will help you further along.

The approach that you are looking for is called Frequent Itemset Mining.
It finds frequent subsets, given a list of sets.
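To make the idea concrete, here is a naive counting sketch in Python 3. It is not Apriori or FP-growth (the usual algorithms in this area), just the simplest form of the idea: enumerate the subsets that actually occur in each tuple and keep those above a support threshold. The function name and the min_support and max_size parameters are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.5, max_size=3):
    """Return {itemset: support}, where support is the fraction of
    transactions that contain every element of the itemset."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # ignore duplicates and order inside a tuple
        for size in range(2, max_size + 1):
            counts.update(combinations(items, size))
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


Y = [(20, 40, 80), (30, 60, 80), (60, 80, 100),
     (60, 80, 100, 20), (40, 60, 80, 20, 100)]
for itemset, support in sorted(frequent_itemsets(Y).items()):
    print(itemset, support)
```

Because only subsets that occur in the data are generated, this avoids scanning every combination of X against every tuple, although it still grows quickly with tuple size.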

All possible combinations of value-pairs (2-item tuples) in a sequence - PYTHON 2.7

I'm having a math brain fart moment, and google has failed to answer my quandary.
Given a sequence or list of 2 item tuples (from a Counter object), how do I quickly and elegantly get python to spit out a linear sequence or array of all the possible combinations of those tuples? My goal is trying to find the combinations of results from a Counter object.....
For example clarity, if I have this sequence:
[(500, 2), (250, 1)]
Doing this example out manually by hand, it should yield these results:
250, 500, 750, 1000, 1250.
Basically, I THINK it's a*b for the range of b and then add the resulting lists together...
I've tried this (where c=Counter object):
res = [[k*(j+1) for j in range(c[k])] for k in c]
And it will give me back:
res = [[250], [500, 1000]]
So far so good, it's going through each tuple and multiplying x * y for each y... But the resulting list isn't full of all the combinations yet, the first list [250] needs to be added to each element of the second list. This would be the case for any number of results I believe.
Now I think I need to take each list in this result list and add it to the other elements in the other lists in turn. Am I going about this wrong? I swear there should be a simpler way. I feel there should be a way to do this in a one line list comp.
Is the solution recursive? Is there a magic import or builtin method I don't know about? My head hurts......

I'm not entirely sure I follow you, but maybe you're looking for something like
from itertools import product
def lincombs(s):
    terms, ffs = zip(*s)
    factors = product(*(range(f+1) for f in ffs))
    outs = (sum(v*f for v,f in zip(terms, ff)) for ff in factors if any(ff))
    return outs
which gives
>>> list(lincombs([(500, 2), (250, 1)]))
[250, 500, 750, 1000, 1250]
>>> list(lincombs([(100, 3), (10, 3)]))
[10, 20, 30, 100, 110, 120, 130, 200, 210, 220, 230, 300, 310, 320, 330]

The v*f multiplication from @DSM's answer could be avoided:
>>> from itertools import product
>>> terms = [(500, 2), (250, 1)]
>>> map(sum, product(*[xrange(0, v*a+1, v) for v, a in terms]))
[0, 250, 500, 750, 1000, 1250]
To get a sorted output without duplicates:
from itertools import groupby, imap
from operator import itemgetter
it = imap(itemgetter(0), groupby(sorted(it)))
though sorted(set(it)) that you use is ok in this case.

python: how to merge a list into clusters?

I have a list of tuples:
[(3,4), (18,27), (4,14)]
and need code that merges tuples which share numbers, producing another list in which every number appears in only one tuple. The list should be sorted by the length of the tuples, longest first, i.e.:
>>> MergeThat([(3,4), (18,27), (4,14)])
[(3,4,14), (18,27)]
>>> MergeThat([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
[(57,66,76,85), (1,3,10), (15,21)]
I understand it's something similar to hierarchical clustering algorithms, which I've read about, but can't figure them out.
Is there a relatively simple code for a MergeThat() function?

I tried hard to figure this out, but only after I tried the approach Ian's answer (thanks!) suggested did I realize what the theoretical problem is: the input is a list of edges and defines an (undirected) graph. We are looking for the connected components of this graph. It's as simple as that.
While you can do this efficiently, there is actually no reason to implement this yourself! Just import a good graph library:
import networkx as nx
# one of your examples
g1 = nx.Graph([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
print nx.connected_components(g1) # [[57, 66, 76, 85], [1, 10, 3], [21, 15]]
# my own test case
g2 = nx.Graph([(1,2),(2,10), (20,3), (3,4), (4,10)])
print nx.connected_components(g2) # [[1, 2, 3, 4, 10, 20]]

import itertools
def merge_it(lot):
    merged = [ set(x) for x in lot ] # operate on sets only
    finished = False
    while not finished:
        finished = True
        for a, b in itertools.combinations(merged, 2):
            if a & b:
                # we merged in this iteration, we may have to do one more
                finished = False
                if a in merged: merged.remove(a)
                if b in merged: merged.remove(b)
                merged.append(a.union(b))
                break # don't inflate 'merged' with intermediate results
    return merged
if __name__ == '__main__':
    print merge_it( [(3,4), (18,27), (4,14)] )
    # => [set([18, 27]), set([3, 4, 14])]
    print merge_it( [(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)] )
    # => [set([21, 15]), set([1, 10, 3]), set([57, 66, 76, 85])]
    print merge_it( [(1,2), (2,3), (3,4), (4,5), (5,9)] )
    # => [set([1, 2, 3, 4, 5, 9])]
Here's a snippet (including doctests): http://gist.github.com/586252

def collapse(L):
    """ The input L is a list that contains tuples of various sizes.
    If any tuples have shared elements,
    exactly one instance of the shared and unshared elements are merged into the first tuple with a shared element.
    This function returns a new list that contain merged tuples and an int that represents how many merges were performed."""
    answer = []
    merges = 0
    seen = [] # a list of all the numbers that we've seen so far
    for t in L:
        tAdded = False
        for num in t:
            pleaseMerge = True
            if num in seen and pleaseMerge:
                answer += merge(t, answer)
                merges += 1
                pleaseMerge = False
                tAdded= True
            else:
                seen.append(num)
        if not tAdded:
            answer.append(t)
    return (answer, merges)
def merge(t, L):
    """ The input L is a list that contains tuples of various sizes.
    The input t is a tuple that contains an element that is contained in another tuple in L.
    Return a new list that is similar to L but contains the new elements in t added to the tuple with which t has a common element."""
    answer = []
    while L:
        tup = L[0]
        tupAdded = False
        for i in tup:
            if i in t:
                try:
                    L.remove(tup)
                    newTup = set(tup)
                    for i in t:
                        newTup.add(i)
                    answer.append(tuple(newTup))
                    tupAdded = True
                except ValueError:
                    pass
        if not tupAdded:
            L.remove(tup)
            answer.append(tup)
    return answer
def sortByLength(L):
    """ L is a list of n-tuples, where n>0.
    This function will return a list with the same contents as L
    except that the tuples are sorted in non-ascending order by length"""
    lengths = {}
    for t in L:
        if len(t) in lengths.keys():
            lengths[len(t)].append(t)
        else:
            lengths[len(t)] = [(t)]
    l = lengths.keys()[:]
    l.sort(reverse=True)
    answer = []
    for i in l:
        answer += lengths[i]
    return answer
def MergeThat(L):
    answer, merges = collapse(L)
    while merges:
        answer, merges = collapse(answer)
    return sortByLength(answer)
if __name__ == "__main__":
    print 'starting'
    print MergeThat([(3,4), (18,27), (4,14)])
    # [(3, 4, 14), (18, 27)]
    print MergeThat([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
    # [(57, 66, 76, 85), (1, 10, 3), (15, 21)]

Here's another solution that doesn't use itertools and takes a different, slightly more verbose, approach. The tricky bit of this solution is the merging of cluster sets when t0 in index and t1 in index.
import doctest
def MergeThat(a):
    """ http://stackoverflow.com/questions/3744048/python-how-to-merge-a-list-into-clusters
    >>> MergeThat([(3,4), (18,27), (4,14)])
    [(3, 4, 14), (18, 27)]
    >>> MergeThat([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
    [(57, 66, 76, 85), (1, 3, 10), (15, 21)]
    """
    index = {}
    for t0, t1 in a:
        if t0 not in index and t1 not in index:
            index[t0] = set()
            index[t1] = index[t0]
        elif t0 in index and t1 in index:
            index[t0] |= index[t1]
            oldt1 = index[t1]
            for x in index.keys():
                if index[x] is oldt1:
                    index[x] = index[t0]
        elif t0 not in index:
            index[t0] = index[t1]
        else:
            index[t1] = index[t0]
        assert index[t0] is index[t1]
        index[t0].add(t0)
        index[t0].add(t1)
    return sorted([tuple(sorted(x)) for x in set(map(frozenset, index.values()))], key=len, reverse=True)
if __name__ == "__main__":
    import doctest
    doctest.testmod()

The code others have written will surely work, but here's another option, maybe simpler to understand and possibly with lower algorithmic complexity.
Keep a dictionary from numbers to the cluster (implemented as a python set) they're a member of. Also include that number in the corresponding set. Process an input pair either as:
Neither element is in the dictionary: create a new set, hook up dictionary links appropriately.
One or the other, but not both elements are in the dictionary: Add the yet-unseen element to the set of its brother, and add its dictionary link into the correct set.
Both elements are seen before, but in different sets: Take the union of the old sets and update all dictionary links to the new set.
You've seen both members before, and they're in the same set: Do nothing.
Afterward, simply collect the unique values from the dictionary and sort in descending order of size. This portion of the job is O(m log n) and thus will not dominate runtime.
This should work in a single pass. Writing the actual code is left as an exercise for the reader.
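For readers who would rather not do the exercise, the four cases might be sketched as follows in Python 3. This assumes the input consists of 2-tuples as in the question; the names merge_pairs and cluster_of are made up for illustration.

```python
def merge_pairs(pairs):
    """Merge pairs that share a number into clusters, largest cluster first."""
    cluster_of = {}  # number -> the set (cluster) it currently belongs to
    for a, b in pairs:
        ca, cb = cluster_of.get(a), cluster_of.get(b)
        if ca is None and cb is None:
            # Neither number seen before: start a new cluster.
            cluster_of[a] = cluster_of[b] = {a, b}
        elif ca is None:
            # Only b seen before: a joins b's cluster.
            cb.add(a)
            cluster_of[a] = cb
        elif cb is None:
            # Only a seen before: b joins a's cluster.
            ca.add(b)
            cluster_of[b] = ca
        elif ca is not cb:
            # Both seen, in different clusters: fold the smaller into the larger.
            if len(ca) < len(cb):
                ca, cb = cb, ca
            ca |= cb
            for x in cb:
                cluster_of[x] = ca
        # Otherwise both are already in the same cluster: nothing to do.
    unique = {id(c): c for c in cluster_of.values()}.values()
    return sorted((tuple(sorted(c)) for c in unique), key=len, reverse=True)


print(merge_pairs([(3, 4), (18, 27), (4, 14)]))
print(merge_pairs([(1, 3), (15, 21), (1, 10), (57, 66), (76, 85), (66, 76)]))
```

Folding the smaller cluster into the larger one keeps the relinking work down; a union-find structure would be the more standard way to get the same effect.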

This is not efficient for huge lists.
def merge_that(lot):
    final_list = []
    while len(lot) > 0:
        temp_set = set(lot.pop(0))
        # Tuples consumed later might have missed their brothers,
        # so keep looping over the remaining tuples until nothing more is absorbed
        changed = True
        while changed:
            changed = False
            for tup2 in lot[:]: # iterate over a copy, since lot is modified below
                if tup2[0] in temp_set or tup2[1] in temp_set:
                    temp_set = temp_set.union(tup2)
                    lot.remove(tup2)
                    changed = True
        final_list.append(tuple(temp_set))
    return final_list
It looks ugly but works.

Useful code which uses reduce()? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Does anyone here have any useful code which uses reduce() function in python? Is there any code other than the usual + and * that we see in the examples?
Refer to Fate of reduce() in Python 3000 by GvR

The other uses I've found for it besides + and * were with and and or, but now we have any and all to replace those cases.
foldl and foldr do come up in Scheme a lot...
Here's some cute usages:
Flatten a list
Goal: turn [[1, 2, 3], [4, 5], [6, 7, 8]] into [1, 2, 3, 4, 5, 6, 7, 8].
reduce(list.__add__, [[1, 2, 3], [4, 5], [6, 7, 8]], [])
List of digits to a number
Goal: turn [1, 2, 3, 4, 5, 6, 7, 8] into 12345678.
Ugly, slow way:
int("".join(map(str, [1,2,3,4,5,6,7,8])))
Pretty reduce way:
reduce(lambda a,d: 10*a+d, [1,2,3,4,5,6,7,8], 0)
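In Python 3, reduce is no longer a builtin and has to be imported from functools. A small sketch of the two examples above under that assumption, with itertools.chain shown as a non-reduce way to flatten:

```python
from functools import reduce
from itertools import chain

nested = [[1, 2, 3], [4, 5], [6, 7, 8]]
flat = reduce(list.__add__, nested, [])
# chain.from_iterable gives the same result without building intermediate lists
assert flat == list(chain.from_iterable(nested))

digits = [1, 2, 3, 4, 5, 6, 7, 8]
number = reduce(lambda a, d: 10 * a + d, digits, 0)
print(flat, number)
```

Note that the list.__add__ version copies the accumulated list at every step, so it is best kept for short inputs.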

reduce() can be used to find Least common multiple for 3 or more numbers:
#!/usr/bin/env python
from math import gcd
from functools import reduce
def lcm(*args):
    return reduce(lambda a,b: a * b // gcd(a, b), args)
Example:
>>> lcm(100, 23, 98)
112700
>>> lcm(*range(1, 20))
232792560

reduce() could be used to resolve dotted names (where eval() is too unsafe to use):
>>> import os
>>> import __main__
>>> reduce(getattr, "os.path.abspath".split('.'), __main__)
<function abspath at 0x009AB530>

Find the intersection of N given lists:
input_list = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]]
result = reduce(set.intersection, map(set, input_list))
returns:
result = set([3, 4, 5])
via: Python - Intersection of two lists

I think reduce is a silly command. Hence:
reduce(lambda hold,next:hold+chr(((ord(next.upper())-65)+13)%26+65),'znlorabggbbhfrshy','')

The usage of reduce that I found in my code involved the situation where I had some class structure for logic expression and I needed to convert a list of these expression objects to a conjunction of the expressions. I already had a function make_and to create a conjunction given two expressions, so I wrote reduce(make_and,l). (I knew the list wasn't empty; otherwise it would have been something like reduce(make_and,l,make_true).)
This is exactly the reason that (some) functional programmers like reduce (or fold functions, as such functions are typically called). There are often already many binary functions like +, *, min, max, concatenation and, in my case, make_and and make_or. Having a reduce makes it trivial to lift these operations to lists (or trees or whatever you got, for fold functions in general).
Of course, if certain instantiations (such as sum) are often used, then you don't want to keep writing reduce. However, instead of defining the sum with some for-loop, you can just as easily define it with reduce.
Readability, as mentioned by others, is indeed an issue. You could argue, however, that the only reason why people find reduce less "clear" is that it is not a function that many people know and/or use.
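A sketch of that lifting in Python 3. The make_and below is only a hypothetical stand-in that builds nested tuples, and MAKE_TRUE plays the role of the make_true mentioned above; the real expression classes are not shown in this answer.

```python
from functools import reduce


def make_and(left, right):
    """Hypothetical stand-in: combine two expressions into a conjunction."""
    return ("and", left, right)


MAKE_TRUE = ("true",)  # hypothetical neutral element, used for an empty list


def conjunction(exprs):
    """Lift the binary make_and to a whole list of expressions."""
    exprs = list(exprs)
    if not exprs:
        return MAKE_TRUE
    return reduce(make_and, exprs)


print(conjunction(["a", "b", "c"]))  # ('and', ('and', 'a', 'b'), 'c')
print(conjunction([]))               # ('true',)
```

The same shape works for any binary function with a sensible neutral element, such as max, min or string concatenation.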

Function composition: If you already have a list of functions that you'd like to apply in succession, such as:
color = lambda x: x.replace('brown', 'blue')
speed = lambda x: x.replace('quick', 'slow')
work = lambda x: x.replace('lazy', 'industrious')
fs = [str.lower, color, speed, work, str.title]
Then you can apply them all consecutively with:
>>> call = lambda s, func: func(s)
>>> s = "The Quick Brown Fox Jumps Over the Lazy Dog"
>>> reduce(call, fs, s)
'The Slow Blue Fox Jumps Over The Industrious Dog'
In this case, method chaining may be more readable. But sometimes it isn't possible, and this kind of composition may be more readable and maintainable than a f1(f2(f3(f4(x)))) kind of syntax.

You could replace value = json_obj['a']['b']['c']['d']['e'] with:
value = reduce(dict.__getitem__, 'abcde', json_obj)
This helps if you already have the path a/b/c/.. as a list; see, for example, Change values in dict of nested dicts using items in a list.
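A slightly more general sketch in Python 3 uses operator.getitem instead of dict.__getitem__, so the path can also step through lists. The helper name get_path and the sample data are assumptions for illustration.

```python
from functools import reduce
from operator import getitem


def get_path(obj, path):
    """Follow a sequence of keys (or indexes) into a nested structure."""
    return reduce(getitem, path, obj)


json_obj = {"a": {"b": {"c": {"d": {"e": 42}}}}}
print(get_path(json_obj, ["a", "b", "c", "d", "e"]))        # 42
print(get_path(json_obj, "abcde"))                          # 42
print(get_path({"rows": [{"id": 7}]}, ["rows", 0, "id"]))   # 7
```

A missing key still raises KeyError (or IndexError for lists), just as the chained subscript form would.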

@Blair Conrad: You could also implement your glob/reduce using sum, like so:
files = sum([glob.glob(f) for f in args], [])
This is less verbose than either of your two examples, is perfectly Pythonic, and is still only one line of code.
So to answer the original question, I personally try to avoid using reduce because it's never really necessary and I find it to be less clear than other approaches. However, some people get used to reduce and come to prefer it to list comprehensions (especially Haskell programmers). But if you're not already thinking about a problem in terms of reduce, you probably don't need to worry about using it.

reduce can be used to support chained attribute lookups:
reduce(getattr, ('request', 'user', 'email'), self)
Of course, this is equivalent to
self.request.user.email
but it's useful when your code needs to accept an arbitrary list of attributes.
(Chained attributes of arbitrary length are common when dealing with Django models.)

reduce is useful when you need to find the union or intersection of a sequence of set-like objects.
>>> reduce(operator.or_, ({1}, {1, 2}, {1, 3})) # union
{1, 2, 3}
>>> reduce(operator.and_, ({1}, {1, 2}, {1, 3})) # intersection
{1}
(Apart from actual sets, an example of these is Django's Q objects.)
On the other hand, if you're dealing with bools, you should use any and all:
>>> any((True, False, True))
True

I'm writing a compose function for a language, so I construct the composed function using reduce along with my apply operator.
In a nutshell, compose takes a list of functions to compose into a single function. If I have a complex operation that is applied in stages, I want to put it all together like so:
complexop = compose(stage4, stage3, stage2, stage1)
This way, I can then apply it to an expression like so:
complexop(expression)
And I want it to be equivalent to:
stage4(stage3(stage2(stage1(expression))))
Now, to build my internal objects, I want it to say:
Lambda([Symbol('x')], Apply(stage4, Apply(stage3, Apply(stage2, Apply(stage1, Symbol('x'))))))
(The Lambda class builds a user-defined function, and Apply builds a function application.)
Now, reduce, unfortunately, folds the wrong way, so I wound up using, roughly:
reduce(lambda x,y: Apply(y, x), reversed(args + [Symbol('x')]))
To figure out what reduce produces, try these in the REPL:
reduce(lambda x, y: (x, y), range(1, 11))
reduce(lambda x, y: (y, x), reversed(range(1, 11)))
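For ordinary Python callables (not the Lambda and Apply objects described above, which belong to the author's language implementation), a compose with the same argument order could be sketched like this in Python 3; the three stage functions are placeholders.

```python
from functools import reduce


def compose(*funcs):
    """compose(f, g, h)(x) == f(g(h(x))): the rightmost function runs first."""
    def composed(x):
        # Fold from the right-hand end so the last function listed is applied first.
        return reduce(lambda acc, f: f(acc), reversed(funcs), x)
    return composed


def stage1(x):
    return x + 1


def stage2(x):
    return x * 2


def stage3(x):
    return x - 3


complexop = compose(stage3, stage2, stage1)
print(complexop(5))  # stage3(stage2(stage1(5))) == 9
```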

reduce can be used to get the list with the maximum nth element
reduce(lambda x,y: x if x[2] > y[2] else y,[[1,2,3,4],[5,2,5,7],[1,6,0,2]])
would return [5, 2, 5, 7] as it is the list with max 3rd element

Reduce isn't limited to scalar operations; it can also be used to sort things into buckets. (This is what I use reduce for most often).
Imagine a case in which you have a list of objects, and you want to re-organize it hierarchically based on properties stored flatly in the object. In the following example, I produce a list of metadata objects related to articles in an XML-encoded newspaper with the articles function. articles generates a list of XML elements, and then maps through them one by one, producing objects that hold some interesting info about them. On the front end, I'm going to want to let the user browse the articles by section/subsection/headline. So I use reduce to take the list of articles and return a single dictionary that reflects the section/subsection/article hierarchy.
from lxml import etree
from Reader import Reader
class IssueReader(Reader):
    def articles(self):
        arts = self.q('//div3') # inherited ... runs an xpath query against the issue
        subsection = etree.XPath('./ancestor::div2/@type')
        section = etree.XPath('./ancestor::div1/@type')
        header_text = etree.XPath('./head//text()')
        return map(lambda art: {
            'text_id': self.id,
            'path': self.getpath(art)[0],
            'subsection': (subsection(art)[0] or '[none]'),
            'section': (section(art)[0] or '[none]'),
            'headline': (''.join(header_text(art)) or '[none]')
        }, arts)
    def by_section(self):
        arts = self.articles()
        def extract(acc, art): # acc for accumulator
            section = acc.get(art['section'], False)
            if section:
                # look the subsection up inside its section, not in the top-level accumulator
                subsection = section.get(art['subsection'], False)
                if subsection:
                    subsection.append(art)
                else:
                    section[art['subsection']] = [art]
            else:
                acc[art['section']] = {art['subsection']: [art]}
            return acc
        return reduce(extract, arts, {})
I give both functions here because I think it shows how map and reduce can complement each other nicely when dealing with objects. The same thing could have been accomplished with a for loop, ... but spending some serious time with a functional language has tended to make me think in terms of map and reduce.
By the way, if anybody has a better way to set properties like I'm doing in extract, where the parents of the property you want to set might not exist yet, please let me know.
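One possible answer to that last question is dict.setdefault, which returns the existing value for a key or stores and returns a default when the key is missing. A sketch in Python 3 with made-up article dicts standing in for the output of articles():

```python
from functools import reduce


def by_section(articles):
    """Group article dicts as {section: {subsection: [article, ...]}}."""
    def extract(acc, art):
        # Each setdefault creates the missing parent on the way down.
        acc.setdefault(art["section"], {}).setdefault(art["subsection"], []).append(art)
        return acc
    return reduce(extract, articles, {})


articles = [
    {"section": "news", "subsection": "local", "headline": "A"},
    {"section": "news", "subsection": "world", "headline": "B"},
    {"section": "news", "subsection": "local", "headline": "C"},
    {"section": "sport", "subsection": "[none]", "headline": "D"},
]
tree = by_section(articles)
print(sorted(tree))                                     # ['news', 'sport']
print([a["headline"] for a in tree["news"]["local"]])   # ['A', 'C']
```

A nested collections.defaultdict would be another option if the structure is always two levels deep.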

Not sure if this is what you are after but you can search source code on Google.
Follow the link for a search on 'function:reduce() lang:python' on Google Code search
At first glance the following projects use reduce()
MoinMoin
Zope
Numeric
ScientificPython
etc. etc. but then these are hardly surprising since they are huge projects.
The functionality of reduce can be done using function recursion which I guess Guido thought was more explicit.
Update:
Since Google's Code Search was discontinued on 15-Jan-2012, besides reverting to regular Google searches, there's something called Code Snippets Collection that looks promising. A number of other resources are mentioned in the answers to this (closed) question: Replacement for Google Code Search?.
Update 2 (29-May-2017):
A good source for Python examples (in open-source code) is the Nullege search engine.

After grepping my code, it seems the only thing I've used reduce for is calculating the factorial:
reduce(operator.mul, xrange(1, x+1) or (1,))

import os
files = [
    # full filenames
    "var/log/apache/errors.log",
    "home/kane/images/avatars/crusader.png",
    "home/jane/documents/diary.txt",
    "home/kane/images/selfie.jpg",
    "var/log/abc.txt",
    "home/kane/.vimrc",
    "home/kane/images/avatars/paladin.png",
]
# unfolding of plain filename list to file-tree
fs_tree = ({}, # dict of folders
           []) # list of files
for full_name in files:
    path, fn = os.path.split(full_name)
    reduce(
        # this function walks deep into path
        # and creates placeholders for subfolders
        lambda d, k: d[0].setdefault(k, # walk deep
                                     ({}, [])), # or create subfolder storage
        path.split(os.path.sep),
        fs_tree
    )[1].append(fn)
print fs_tree
#({'home': (
# {'jane': (
# {'documents': (
# {},
# ['diary.txt']
# )},
# []
# ),
# 'kane': (
# {'images': (
# {'avatars': (
# {},
# ['crusader.png',
# 'paladin.png']
# )},
# ['selfie.jpg']
# )},
# ['.vimrc']
# )},
# []
# ),
# 'var': (
# {'log': (
# {'apache': (
# {},
# ['errors.log']
# )},
# ['abc.txt']
# )},
# [])
#},
#[])

I just found a useful usage of reduce: splitting a string without removing the delimiter. The code is entirely from the Programatically Speaking blog. Here's the code:
reduce(lambda acc, elem: acc[:-1] + [acc[-1] + elem] if elem == "\n" else acc + [elem], re.split("(\n)", "a\nb\nc\n"), [])
Here's the result:
['a\n', 'b\n', 'c\n', '']
Note that it handles edge cases that the popular answer on SO doesn't. For a more in-depth explanation, I refer you to the original blog post.

I used reduce to concatenate a list of PostgreSQL search vectors with the || operator in sqlalchemy-searchable:
vectors = (self.column_vector(getattr(self.table.c, column_name))
           for column_name in self.indexed_columns)
concatenated = reduce(lambda x, y: x.op('||')(y), vectors)
compiled = concatenated.compile(self.conn)

I have an old Python implementation of pipegrep that uses reduce and the glob module to build a list of files to process:
files = []
files.extend(reduce(lambda x, y: x + y, map(glob.glob, args)))
I found it handy at the time, but it's really not necessary, as something similar is just as good, and probably more readable
files = []
for f in args:
    files.extend(glob.glob(f))

Let's say some yearly statistics are stored in a list of Counters.
We want to find the MIN/MAX values for each month across the different years.
For example, the minimum for January would be 10, and for February it would be 15.
We need to store the results in a new Counter.
from collections import Counter
stat2011 = Counter({"January": 12, "February": 20, "March": 50, "April": 70, "May": 15,
                    "June": 35, "July": 30, "August": 15, "September": 20, "October": 60,
                    "November": 13, "December": 50})
stat2012 = Counter({"January": 36, "February": 15, "March": 50, "April": 10, "May": 90,
                    "June": 25, "July": 35, "August": 15, "September": 20, "October": 30,
                    "November": 10, "December": 25})
stat2013 = Counter({"January": 10, "February": 60, "March": 90, "April": 10, "May": 80,
                    "June": 50, "July": 30, "August": 15, "September": 20, "October": 75,
                    "November": 60, "December": 15})
stat_list = [stat2011, stat2012, stat2013]
print reduce(lambda x, y: x & y, stat_list) # MIN
print reduce(lambda x, y: x | y, stat_list) # MAX
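The same idea in Python 3, as a sketch with a cut-down data set (only two months, reusing the values above) and operator.and_ / operator.or_ in place of the lambdas:

```python
from collections import Counter
from functools import reduce
from operator import and_, or_

stats = [
    Counter({"January": 12, "February": 20}),
    Counter({"January": 36, "February": 15}),
    Counter({"January": 10, "February": 60}),
]
monthly_min = reduce(and_, stats)  # & keeps the smaller count for each key
monthly_max = reduce(or_, stats)   # | keeps the larger count for each key
print(monthly_min)
print(monthly_max)
```

One thing to keep in mind: Counter's & and | drop entries whose resulting count is zero or negative, so a month that is missing from any one year disappears from the minimum.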

I have objects representing some kind of overlapping intervals (genomic exons), and redefined their intersection using __and__:
class Exon:
    def __init__(self):
        ...
    def __and__(self,other):
        ...
        length = self.length + other.length # (e.g.)
        return self.__class__(...length,...)
Then when I have a collection of them (for instance, in the same gene), I use
intersection = reduce(lambda x,y: x&y, exons)

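def dump(fname, iterable):
    with open(fname, 'w') as f:
        # pass an initial value (None) so the first item is written too;
        # without it, reduce uses the first item as the starting accumulator
        reduce(lambda x, y: f.write(unicode(y, 'utf-8')), iterable, None)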

Using reduce() to find out if a list of dates are consecutive:
from datetime import date, timedelta
def checked(d1, d2):
    """
    We assume the date list is sorted.
    If d2 & d1 are different by 1, everything up to d2 is consecutive, so d2
    can advance to the next reduction.
    If d2 & d1 are not different by 1, returning d1 - 1 for the next reduction
    will guarantee the result produced by reduce() to be something other than
    the last date in the sorted date list.
    Definition 1: 1/1/14, 1/2/14, 1/2/14, 1/3/14 is considered consecutive
    Definition 2: 1/1/14, 1/2/14, 1/2/14, 1/3/14 is considered not consecutive
    """
    #if (d2 - d1).days == 1 or (d2 - d1).days == 0: # for Definition 1
    if (d2 - d1).days == 1: # for Definition 2
        return d2
    else:
        return d1 + timedelta(days=-1)
# datelist = [date(2014, 1, 1), date(2014, 1, 3),
#             date(2013, 12, 31), date(2013, 12, 30)]
# datelist = [date(2014, 2, 19), date(2014, 2, 19), date(2014, 2, 20),
#             date(2014, 2, 21), date(2014, 2, 22)]
datelist = [date(2014, 2, 19), date(2014, 2, 21),
            date(2014, 2, 22), date(2014, 2, 20)]
datelist.sort()
if datelist[-1] == reduce(checked, datelist):
    print "dates are consecutive"
else:
    print "dates are not consecutive"
