Is it possible to initialize an empty default value on leftOuterJoin?

Is it possible to initialize an empty default value on leftOuterJoin? - python

I have two following rdds:
name_to_hour = sc.parallelize([("Amy", [7,8,7,18,19]), ("Dan", [6,7]), ("Emily", [1,2,3,7,7,7,2])])
name_biz = sc.parallelize(["Amy", "Brian", "Chris", "Dan", "Emily"])
and I want to join them, so my resulting rdd looks like this:
[('Amy', [7, 8, 7, 18, 19]), ('Chris', []), ('Brian', []), ('Dan', [6, 7]), ('Emily', [1, 2, 3, 7, 7, 7, 2])]
I can achieve that with what I think is a clumsy solution:
from pyspark import SparkContext
sc = SparkContext()
name_to_hour = sc.parallelize([("Amy", [7,8,7,18,19]), ("Dan", [6,7]), ("Emily", [1,2,3,7,7,7,2])])
name_biz = sc.parallelize(["Amy", "Brian", "Chris", "Dan", "Emily"])
temp = name_biz.map(lambda x: (x, []))
joined_rdd = temp.leftOuterJoin(name_to_hour)
def concat(my_tup):
if my_tup[1] is None:
return []
else:
return my_tup[1]
result_rdd = joined_rdd.map(lambda x: (x[0], concat(x[1])))
print "\033[0;34m{}\033[0m".format(result_rdd.collect())
Is there a better way to do it?
I was thinking that if it was possible to somehow specify on the leftOuterJoin, that non-empty fields keep what they had in name_to_hour and empty get default value of [], the my problem could be solved much more easily, but I don't think there is such a way.

One way you can approach this problem is to leverage lexicographical ordering of Python lists. Since empty list is always "less than" non empty one we can simply make an union and reduce with max:
temp.union(name_to_hour).reduceByKey(max)
This of course assumes that the keys are unique.

Related

Replace specific character of item in list - Python

I'm trying to create a very simple guitar tab creator in python, I had the idea but i'm not sure how to execute it the way I want. For example, I have a list containing each guitar string as a different item in the list, and I want the the user to be able to input which fret they want to use and which note. How can I replace a specific char in a list item based on a index number.
This is what I have so far:
tab = ['E|------------', 'B|------------', 'G|------------', 'D|------------', 'A|------------', 'E|------------']
for line in tab:
print(line)
fret = input("Fret: ")
note = input("Note: ")
#if note == 'E':
(Stuck here)
The output I'm looking for is something like:
e|-------5-7-----7-|-8-----8-2-----2-|-0---------0-----|-----------------|
B|-----5-----5-----|---5-------3-----|---1---1-----1---|-0-1-1-----------|
G|---5---------5---|-----5-------2---|-----2---------2-|-0-2-2---2-------|
D|-7-------6-------|-5-------4-------|-3---------------|-----------------|
A|-----------------|-----------------|-----------------|-2-0-0---0--/8-7-|
E|-----------------|-----------------|-----------------|-----------------|
I am aware that there is much more needed to get the output i'm looking for, but I want to try to figure that stuff out by myself first, I'm just looking for a nudge in the right direction to get a list item changed.

Note sure how you want to construct the whole thing, but here are some ideas:
The main line is
"-".join(str(d.get(idx, "-")) for idx in range(1, 9))
which, given a dict d having indices as keys and corresponding fret numbers as values, construct the string representation.
It uses the dict.get method which allows for a default value of "-" when nothing corresponds in the given dict.
from collections import namedtuple
def construct_string(indices, frets):
d = dict(zip(indices, frets))
out = "-".join(str(d.get(idx, "-")) for idx in range(1, 9))
return f"-{out}-"
String = namedtuple("Note", ["indices", "frets"])
strings = {
"e": String([4, 5, 8], [5, 7, 7]),
"B": String([3, 6], [5, 5]),
"G": String([2, 7], [5, 5]),
"D": String([1, 5], [7, 6]),
"A": String([], []),
"E": String([], []),
}
for string_name, string in strings.items():
string_line = construct_string(string.indices, string.frets)
print(f"{string_name}|{string_line}")
which prints:
e|-------5-7-----7-
B|-----5-----5-----
G|---5---------5---
D|-7-------6-------
A|-----------------
E|-----------------

How to make a Custom Sorting Function for Dictionary Key Values?

I have a dictionary whose key values are kind of like this,
CC-1A
CC-1B
CC-1C
CC-3A
CC-3B
CC-5A
CC-7A
CC-7B
CC-7D
SS-1A
SS-1B
SS-1C
SS-3A
SS-3B
SS-5A
SS-5B
lst = ['CC-1A', 'CC-1B', 'CC-1C', 'CC-3A', 'CC-3B', 'CC-5A', 'CC-7A', 'CC-7B',
'CC-7D', 'SS-1A', 'SS-1B', 'SS-1C', 'SS-3A', 'SS-3B', 'SS-5A', 'SS-5B']
d = dict.fromkeys(lst)
^Not exactly in this order, but in fact they are all randomly placed in the dictionary as key values.
Now, I want to sort them. If I use the built in function to sort the dictionary, it sorts all the key values according to the order given above.
However, I want the dictionary to be first sorted based upon the values after the - sign (i.e. 1A, 1B, 1C etc.) and then based upon the first two characters.
So, for the values given above, following would be my sorted list,
CC-1A
CC-1B
CC-1C
SS-1A
SS-1B
SS-1C
CC-3A
CC-3B
SS-3A
SS-3B
CC-5A
and so on
First, sorting is done based upon the "4th" character in the keys. (that is, 1, 3, etc.)
Then sorting is done based upon the last character (i.e. A, B etc.)
Then sorting is done based upon the first two characters of the keys (i.e. CC, SS etc.)
Is there any way to achieve this?

Your "wanted" and your sorting description deviate.
Your "wanted" can be achieved by
di = {"CC-1A":"value1","CC-1A":"value2","CC-1B":"value3",
"CC-1C":"value4","CC-3A":"value5","CC-3B":"value6",
"CC-5A":"value7","CC-7A":"value8","CC-7B":"value9",
"CC-7D":"value0","SS-1A":"value11","SS-1B":"value12",
"SS-1C":"value13","SS-3A":"value14","SS-3B":"value15",
"SS-5A":"value16","SS-5B":"value17"}
print(*((v,di[v]) for v in sorted(di, key= lambda x: (x[3], x[:2], x[4]) )),
sep="\n")
to get
('CC-1A', 'value2')
('CC-1B', 'value3')
('CC-1C', 'value4')
('SS-1A', 'value11')
('SS-1B', 'value12')
('SS-1C', 'value13')
('CC-3A', 'value5')
('CC-3B', 'value6')
('SS-3A', 'value14')
('SS-3B', 'value15')
('CC-5A', 'value7')
('SS-5A', 'value16')
('SS-5B', 'value17')
('CC-7A', 'value8')
('CC-7B', 'value9')
('CC-7D', 'value0')
which sorts by number (Pos 4 - (1based)), Start (Pos 1+2 (1based)) then letter (Pos 5 (1based))
but that conflicts with
First, sorting is done based upon the "4th" character in the keys.
(that is, 1, 3, etc.)
Then sorting is done based upon the last character (i.e. A, B etc.)
Then sorting is done based upon the first two characters of the keys
(i.e. CC, SS etc.)

One suggestion is to use a nested dictionary, so instead of:
my_dict = {'CC-1A1': 2,
'CC-1A2': 3,
'CC-1B': 1,
'CC-1C': 5,
'SS-1A': 33,
'SS-1B': 23,
'SS-1C': 31,
'CC-3A': 55,
'CC-3B': 222,
}
you would have something like:
my_dict = {'CC': {'1A1': 2, '1A2': 3, '1B': 1, '1C': 5, '3A': 55, '3B': 222},
'SS': {'1A': 33, '1B': 22, '1C': 31}
}
which would allow you to sort first based on the leading number/characters and then by group. (Actually I think you want this concept reversed based on your question).
Then you can create two lists with your sorted keys/values by doing something like:
top_keys = sorted(my_dict)
keys_sorted = []
values_sorted = []
for key in top_keys:
keys_sorted.append([f"{key}-{k}" for k in my_dict[key].keys()])
values_sorted.append([v for v in my_dict[key].values()])
flat_keys = [key for sublist in keys_sorted for key in sublist]
flat_values = [value for sublist in values_sorted for value in sublist]
Otherwise, you'd have to implement a custom sorting algorithm based first the characters after the - and subsequently on the initial characters.

You can write a function to build a sorting key that will make the required decomposition of the key strings and return a tuple to sort by. Then use that function as the key= parameter of the sorted function:
D = {'CC-1A': 0, 'CC-1B': 1, 'CC-1C': 2, 'CC-3A': 3, 'CC-3B': 4,
'CC-5A': 5, 'CC-7A': 6, 'CC-7B': 7, 'CC-7D': 8, 'SS-1A': 9,
'SS-1B': 10, 'SS-1C': 11, 'SS-3A': 12, 'SS-3B': 13, 'SS-5A': 14,
'SS-5B': 15}
def sortKey(s):
L,R = s.split("-",1)
return (R[:-1],L)
D={k:D[k] for k in sorted(D.keys(),key=sortKey)}
print(D)
{'CC-1A': 0,
'CC-1B': 1,
'CC-1C': 2,
'SS-1A': 9,
'SS-1B': 10,
'SS-1C': 11,
'CC-3A': 3,
'CC-3B': 4,
'SS-3A': 12,
'SS-3B': 13,
'CC-5A': 5,
'SS-5A': 14,
'SS-5B': 15,
'CC-7A': 6,
'CC-7B': 7,
'CC-7D': 8}
If you expect the numbers to eventually go beyond 9 and want a numerical order, then right justify the R part in the tuple: e.g. return (R[:-1].rjust(10),L)

You could use a custom function that implements your rule as sorting key:
def get_order(tpl):
s = tpl[0].split('-')
return (s[1][0], s[0], s[1][1])
out = dict(sorted(d.items(), key=get_order))
Output:
{'CC-1A': None, 'CC-1B': None, 'CC-1C': None, 'SS-1A': None, 'SS-1B': None, 'SS-1C': None, 'CC-3A': None, 'CC-3B': None, 'SS-3A': None, 'SS-3B': None, 'CC-5A': None, 'SS-5A': None, 'SS-5B': None, 'CC-7A': None, 'CC-7B': None, 'CC-7D': None}

Python: list comprehension to produce consecutive sublists of even numbers

I'm trying to practice Python exercises, but using list comprehension to solve problems rather than the beginner style loops shown in the book. There is one example where it asks for a list of numbers to be put into a list of even numbers only, BUT they must be in sublists so that if the numbers follow after one another without being interrupted by an odd number, they should be put into a sublist together:
my_list = [2,3,5,7,8,9,10,12,14,15,17,25,31,32]
desired_output = [[2],[8],[10,12,14],[32]]
So you can see in the desired output above, 10,12,14 are evens that follow on from one another without being interrupted by an odd, so they get put into a sublist together. 8 has an odd on either side of it, so it gets put into a sublist alone after the odds are removed.
I can put together an evens list easily using list comprehension like this below, but I have no idea how to get it into sublists like the desired output shows. Could someone please suggest an idea for this using list comprehension (or generators, I don't mind which as I'm trying to learn both at the moment). Thanks!
evens = [x for x in my_list if x%2==0]
print(evens)
[2, 8, 10, 12, 14, 32]

As explained in the comments, list comprehensions should not be deemed "for beginners" - first focus on writing your logic using simple for loops.
When you're ready, you can look at comprehension-based methods. Here's one:
from itertools import groupby
my_list = [2,3,5,7,8,9,10,12,14,15,17,25,31,32]
condition = lambda x: all(i%2==0 for i in x)
grouper = (list(j) for _, j in groupby(my_list, key=lambda x: x%2))
res = filter(condition, grouper)
print(list(res))
# [[2], [8], [10, 12, 14], [32]]
The main point to note in this solution is nothing is computed until you call list(res). This is because filter and generator comprehensions are lazy.

You mentioned also wanting to learn generators, so here is a version that's also a bit more readable, imho.
from itertools import groupby
def is_even(n):
return n%2 == 0
def runs(lst):
for even, run in groupby(lst, key=is_even):
if even:
yield list(run)
if __name__ == '__main__':
lst = [2, 3, 5, 7, 8, 9, 10, 12, 14, 15, 17, 25, 31, 32]
res = list(runs(lst))
print(res)
Incidentally, if you absolutely, positively want to implement it as a list comprehension, this solutions falls out of the above quite naturally:
[list(run) for even, run in groupby(lst, key=is_even) if even]

If you don't want to use itertools, there's another way to do it with list comprehensions.
First, take the indices of the odd elements:
[i for i,x in enumerate(my_list) if x%2==1]
And add two sentinels: [-1] before and [len(my_list)] after:
odd_indices = [-1]+[i for i,x in enumerate(my_list) if x%2==1]+[len(my_list)]
# [-1, 1, 2, 3, 5, 9, 10, 11, 12, 14]
You have now something like that:
[2,3,5,7,8,9,10,12,14,15,17,25,31,32]
^---^-^-^---^-----------^--^--^--^----^
You can see your sequences. Now, take the elements between those indices. To do that, zip odd_indices with itself to get the intervals as tuples:
zip(odd_indices, odd_indices[1:])
# [(-1, 1), (1, 2), (2, 3), (3, 5), (5, 9), (9, 10), (10, 11), (11, 12), (12, 14)]
even_groups = [my_list[a+1:b] for a,b in zip(odd_indices, odd_indices[1:])]
# [[2], [], [], [8], [10, 12, 14], [], [], [], [32]]
You just have to filter the non empty lists:
even_groups = [my_list[a+1:b] for a,b in zip(odd_indices, odd_indices[1:]) if a+1<b]
# [[2], [8], [10, 12, 14], [32]]
You can merge the two steps into one comprehension list, but that is a bit unreadable:
>>> my_list = [2,3,5,7,8,9,10,12,14,15,17,25,31,32]
>>> [my_list[a+1:b] for l1 in [[-1]+[i for i,x in enumerate(my_list) if x%2==1]+[len(my_list)]] for a,b in zip(l1, l1[1:]) if b>a+1]
[[2], [8], [10, 12, 14], [32]]
As pointed by #jpp, prefer basic loops until you feel comfortable. And maybe avoid those nested list comprehensions forever...

How is sorted(key=lambda x:) implemented behind the scene?

An example:
names = ["George Washington", "John Adams", "Thomas Jefferson", "James Madison"]
sorted(names, key=lambda name: name.split()[-1].lower())
I know key is used to compare different names, but it can have two different implementations:
First compute all keys for each name, and bind the key and name together in some way, and sort them. The p
Compute the key each time when a comparison happens
The problem with the first approach is that it has to define another data structure to bind the key and data. The problem with the second approach is that the key might be computed for multiple times, that is, name.split()[-1].lower() will be executed many times, which is very time-consuming.
I am just wondering in which way Python implemented sorted().

The key function is executed just once per value, to produce a (keyvalue, value) pair; this is then used to sort and later on just the values are returned in the sorted order. This is sometimes called a Schwartzian transform.
You can test this yourself; you could count how often the function is called, for example:
>>> def keyfunc(value):
... keyfunc.count += 1
... return value
...
>>> keyfunc.count = 0
>>> sorted([0, 8, 1, 6, 4, 5, 3, 7, 9, 2], key=keyfunc)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> keyfunc.count
10
or you could collect all the values that are being passed in; you'll see that they follow the original input order:
>>> def keyfunc(value):
... keyfunc.arguments.append(value)
... return value
...
>>> keyfunc.arguments = []
>>> sorted([0, 8, 1, 6, 4, 5, 3, 7, 9, 2], key=keyfunc)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> keyfunc.arguments
[0, 8, 1, 6, 4, 5, 3, 7, 9, 2]
If you want to read the CPython source code, the relevant function is called listsort(), and the keyfunc is used in the following loop (saved_ob_item is the input array), which is executed before sorting takes place:
for (i = 0; i < saved_ob_size ; i++) {
keys[i] = PyObject_CallFunctionObjArgs(keyfunc, saved_ob_item[i],
NULL);
if (keys[i] == NULL) {
for (i=i-1 ; i>=0 ; i--)
Py_DECREF(keys[i]);
if (saved_ob_size >= MERGESTATE_TEMP_SIZE/2)
PyMem_FREE(keys);
goto keyfunc_fail;
}
}
lo.keys = keys;
lo.values = saved_ob_item;
so in the end, you have two arrays, one with keys and one with the original values. All sort operations act on the two arrays in parallel, sorting the values in lo.keys and moving the elements in lo.values in tandem.

Useful code which uses reduce()? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
Does anyone here have any useful code which uses reduce() function in python? Is there any code other than the usual + and * that we see in the examples?
Refer Fate of reduce() in Python 3000 by GvR

The other uses I've found for it besides + and * were with and and or, but now we have any and all to replace those cases.
foldl and foldr do come up in Scheme a lot...
Here's some cute usages:
Flatten a list
Goal: turn [[1, 2, 3], [4, 5], [6, 7, 8]] into [1, 2, 3, 4, 5, 6, 7, 8].
reduce(list.__add__, [[1, 2, 3], [4, 5], [6, 7, 8]], [])
List of digits to a number
Goal: turn [1, 2, 3, 4, 5, 6, 7, 8] into 12345678.
Ugly, slow way:
int("".join(map(str, [1,2,3,4,5,6,7,8])))
Pretty reduce way:
reduce(lambda a,d: 10*a+d, [1,2,3,4,5,6,7,8], 0)

reduce() can be used to find Least common multiple for 3 or more numbers:
#!/usr/bin/env python
from math import gcd
from functools import reduce
def lcm(*args):
return reduce(lambda a,b: a * b // gcd(a, b), args)
Example:
>>> lcm(100, 23, 98)
112700
>>> lcm(*range(1, 20))
232792560

reduce() could be used to resolve dotted names (where eval() is too unsafe to use):
>>> import __main__
>>> reduce(getattr, "os.path.abspath".split('.'), __main__)
<function abspath at 0x009AB530>

Find the intersection of N given lists:
input_list = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]]
result = reduce(set.intersection, map(set, input_list))
returns:
result = set([3, 4, 5])
via: Python - Intersection of two lists

I think reduce is a silly command. Hence:
reduce(lambda hold,next:hold+chr(((ord(next.upper())-65)+13)%26+65),'znlorabggbbhfrshy','')

The usage of reduce that I found in my code involved the situation where I had some class structure for logic expression and I needed to convert a list of these expression objects to a conjunction of the expressions. I already had a function make_and to create a conjunction given two expressions, so I wrote reduce(make_and,l). (I knew the list wasn't empty; otherwise it would have been something like reduce(make_and,l,make_true).)
This is exactly the reason that (some) functional programmers like reduce (or fold functions, as such functions are typically called). There are often already many binary functions like +, *, min, max, concatenation and, in my case, make_and and make_or. Having a reduce makes it trivial to lift these operations to lists (or trees or whatever you got, for fold functions in general).
Of course, if certain instantiations (such as sum) are often used, then you don't want to keep writing reduce. However, instead of defining the sum with some for-loop, you can just as easily define it with reduce.
Readability, as mentioned by others, is indeed an issue. You could argue, however, that only reason why people find reduce less "clear" is because it is not a function that many people know and/or use.

Function composition: If you already have a list of functions that you'd like to apply in succession, such as:
color = lambda x: x.replace('brown', 'blue')
speed = lambda x: x.replace('quick', 'slow')
work = lambda x: x.replace('lazy', 'industrious')
fs = [str.lower, color, speed, work, str.title]
Then you can apply them all consecutively with:
>>> call = lambda s, func: func(s)
>>> s = "The Quick Brown Fox Jumps Over the Lazy Dog"
>>> reduce(call, fs, s)
'The Slow Blue Fox Jumps Over The Industrious Dog'
In this case, method chaining may be more readable. But sometimes it isn't possible, and this kind of composition may be more readable and maintainable than a f1(f2(f3(f4(x)))) kind of syntax.

You could replace value = json_obj['a']['b']['c']['d']['e'] with:
value = reduce(dict.__getitem__, 'abcde', json_obj)
If you already have the path a/b/c/.. as a list. For example, Change values in dict of nested dicts using items in a list.

#Blair Conrad: You could also implement your glob/reduce using sum, like so:
files = sum([glob.glob(f) for f in args], [])
This is less verbose than either of your two examples, is perfectly Pythonic, and is still only one line of code.
So to answer the original question, I personally try to avoid using reduce because it's never really necessary and I find it to be less clear than other approaches. However, some people get used to reduce and come to prefer it to list comprehensions (especially Haskell programmers). But if you're not already thinking about a problem in terms of reduce, you probably don't need to worry about using it.

reduce can be used to support chained attribute lookups:
reduce(getattr, ('request', 'user', 'email'), self)
Of course, this is equivalent to
self.request.user.email
but it's useful when your code needs to accept an arbitrary list of attributes.
(Chained attributes of arbitrary length are common when dealing with Django models.)

reduce is useful when you need to find the union or intersection of a sequence of set-like objects.
>>> reduce(operator.or_, ({1}, {1, 2}, {1, 3})) # union
{1, 2, 3}
>>> reduce(operator.and_, ({1}, {1, 2}, {1, 3})) # intersection
{1}
(Apart from actual sets, an example of these are Django's Q objects.)
On the other hand, if you're dealing with bools, you should use any and all:
>>> any((True, False, True))
True

I'm writing a compose function for a language, so I construct the composed function using reduce along with my apply operator.
In a nutshell, compose takes a list of functions to compose into a single function. If I have a complex operation that is applied in stages, I want to put it all together like so:
complexop = compose(stage4, stage3, stage2, stage1)
This way, I can then apply it to an expression like so:
complexop(expression)
And I want it to be equivalent to:
stage4(stage3(stage2(stage1(expression))))
Now, to build my internal objects, I want it to say:
Lambda([Symbol('x')], Apply(stage4, Apply(stage3, Apply(stage2, Apply(stage1, Symbol('x'))))))
(The Lambda class builds a user-defined function, and Apply builds a function application.)
Now, reduce, unfortunately, folds the wrong way, so I wound up using, roughly:
reduce(lambda x,y: Apply(y, x), reversed(args + [Symbol('x')]))
To figure out what reduce produces, try these in the REPL:
reduce(lambda x, y: (x, y), range(1, 11))
reduce(lambda x, y: (y, x), reversed(range(1, 11)))

reduce can be used to get the list with the maximum nth element
reduce(lambda x,y: x if x[2] > y[2] else y,[[1,2,3,4],[5,2,5,7],[1,6,0,2]])
would return [5, 2, 5, 7] as it is the list with max 3rd element +

Reduce isn't limited to scalar operations; it can also be used to sort things into buckets. (This is what I use reduce for most often).
Imagine a case in which you have a list of objects, and you want to re-organize it hierarchically based on properties stored flatly in the object. In the following example, I produce a list of metadata objects related to articles in an XML-encoded newspaper with the articles function. articles generates a list of XML elements, and then maps through them one by one, producing objects that hold some interesting info about them. On the front end, I'm going to want to let the user browse the articles by section/subsection/headline. So I use reduce to take the list of articles and return a single dictionary that reflects the section/subsection/article hierarchy.
from lxml import etree
from Reader import Reader
class IssueReader(Reader):
def articles(self):
arts = self.q('//div3') # inherited ... runs an xpath query against the issue
subsection = etree.XPath('./ancestor::div2/#type')
section = etree.XPath('./ancestor::div1/#type')
header_text = etree.XPath('./head//text()')
return map(lambda art: {
'text_id': self.id,
'path': self.getpath(art)[0],
'subsection': (subsection(art)[0] or '[none]'),
'section': (section(art)[0] or '[none]'),
'headline': (''.join(header_text(art)) or '[none]')
}, arts)
def by_section(self):
arts = self.articles()
def extract(acc, art): # acc for accumulator
section = acc.get(art['section'], False)
if section:
subsection = acc.get(art['subsection'], False)
if subsection:
subsection.append(art)
else:
section[art['subsection']] = [art]
else:
acc[art['section']] = {art['subsection']: [art]}
return acc
return reduce(extract, arts, {})
I give both functions here because I think it shows how map and reduce can complement each other nicely when dealing with objects. The same thing could have been accomplished with a for loop, ... but spending some serious time with a functional language has tended to make me think in terms of map and reduce.
By the way, if anybody has a better way to set properties like I'm doing in extract, where the parents of the property you want to set might not exist yet, please let me know.

Not sure if this is what you are after but you can search source code on Google.
Follow the link for a search on 'function:reduce() lang:python' on Google Code search
At first glance the following projects use reduce()
MoinMoin
Zope
Numeric
ScientificPython
etc. etc. but then these are hardly surprising since they are huge projects.
The functionality of reduce can be done using function recursion which I guess Guido thought was more explicit.
Update:
Since Google's Code Search was discontinued on 15-Jan-2012, besides reverting to regular Google searches, there's something called Code Snippets Collection that looks promising. A number of other resources are mentioned in answers this (closed) question Replacement for Google Code Search?.
Update 2 (29-May-2017):
A good source for Python examples (in open-source code) is the Nullege search engine.

After grepping my code, it seems the only thing I've used reduce for is calculating the factorial:
reduce(operator.mul, xrange(1, x+1) or (1,))

import os
files = [
# full filenames
"var/log/apache/errors.log",
"home/kane/images/avatars/crusader.png",
"home/jane/documents/diary.txt",
"home/kane/images/selfie.jpg",
"var/log/abc.txt",
"home/kane/.vimrc",
"home/kane/images/avatars/paladin.png",
]
# unfolding of plain filiname list to file-tree
fs_tree = ({}, # dict of folders
[]) # list of files
for full_name in files:
path, fn = os.path.split(full_name)
reduce(
# this fucction walks deep into path
# and creates placeholders for subfolders
lambda d, k: d[0].setdefault(k, # walk deep
({}, [])), # or create subfolder storage
path.split(os.path.sep),
fs_tree
)[1].append(fn)
print fs_tree
#({'home': (
# {'jane': (
# {'documents': (
# {},
# ['diary.txt']
# )},
# []
# ),
# 'kane': (
# {'images': (
# {'avatars': (
# {},
# ['crusader.png',
# 'paladin.png']
# )},
# ['selfie.jpg']
# )},
# ['.vimrc']
# )},
# []
# ),
# 'var': (
# {'log': (
# {'apache': (
# {},
# ['errors.log']
# )},
# ['abc.txt']
# )},
# [])
#},
#[])

I just found useful usage of reduce: splitting string without removing the delimiter. The code is entirely from Programatically Speaking blog. Here's the code:
reduce(lambda acc, elem: acc[:-1] + [acc[-1] + elem] if elem == "\n" else acc + [elem], re.split("(\n)", "a\nb\nc\n"), [])
Here's the result:
['a\n', 'b\n', 'c\n', '']
Note that it handles edge cases that popular answer in SO doesn't. For more in-depth explanation, I am redirecting you to original blog post.

I used reduce to concatenate a list of PostgreSQL search vectors with the || operator in sqlalchemy-searchable:
vectors = (self.column_vector(getattr(self.table.c, column_name))
for column_name in self.indexed_columns)
concatenated = reduce(lambda x, y: x.op('||')(y), vectors)
compiled = concatenated.compile(self.conn)

I have an old Python implementation of pipegrep that uses reduce and the glob module to build a list of files to process:
files = []
files.extend(reduce(lambda x, y: x + y, map(glob.glob, args)))
I found it handy at the time, but it's really not necessary, as something similar is just as good, and probably more readable
files = []
for f in args:
files.extend(glob.glob(f))

Let say that there are some yearly statistic data stored a list of Counters.
We want to find the MIN/MAX values in each month across the different years.
For example, for January it would be 10. And for February it would be 15.
We need to store the results in a new Counter.
from collections import Counter
stat2011 = Counter({"January": 12, "February": 20, "March": 50, "April": 70, "May": 15,
"June": 35, "July": 30, "August": 15, "September": 20, "October": 60,
"November": 13, "December": 50})
stat2012 = Counter({"January": 36, "February": 15, "March": 50, "April": 10, "May": 90,
"June": 25, "July": 35, "August": 15, "September": 20, "October": 30,
"November": 10, "December": 25})
stat2013 = Counter({"January": 10, "February": 60, "March": 90, "April": 10, "May": 80,
"June": 50, "July": 30, "August": 15, "September": 20, "October": 75,
"November": 60, "December": 15})
stat_list = [stat2011, stat2012, stat2013]
print reduce(lambda x, y: x & y, stat_list) # MIN
print reduce(lambda x, y: x | y, stat_list) # MAX

I have objects representing some kind of overlapping intervals (genomic exons), and redefined their intersection using __and__:
class Exon:
def __init__(self):
...
def __and__(self,other):
...
length = self.length + other.length # (e.g.)
return self.__class__(...length,...)
Then when I have a collection of them (for instance, in the same gene), I use
intersection = reduce(lambda x,y: x&y, exons)

def dump(fname,iterable):
with open(fname,'w') as f:
reduce(lambda x, y: f.write(unicode(y,'utf-8')), iterable)

Using reduce() to find out if a list of dates are consecutive:
from datetime import date, timedelta
def checked(d1, d2):
"""
We assume the date list is sorted.
If d2 & d1 are different by 1, everything up to d2 is consecutive, so d2
can advance to the next reduction.
If d2 & d1 are not different by 1, returning d1 - 1 for the next reduction
will guarantee the result produced by reduce() to be something other than
the last date in the sorted date list.
Definition 1: 1/1/14, 1/2/14, 1/2/14, 1/3/14 is consider consecutive
Definition 2: 1/1/14, 1/2/14, 1/2/14, 1/3/14 is consider not consecutive
"""
#if (d2 - d1).days == 1 or (d2 - d1).days == 0: # for Definition 1
if (d2 - d1).days == 1: # for Definition 2
return d2
else:
return d1 + timedelta(days=-1)
# datelist = [date(2014, 1, 1), date(2014, 1, 3),
# date(2013, 12, 31), date(2013, 12, 30)]
# datelist = [date(2014, 2, 19), date(2014, 2, 19), date(2014, 2, 20),
# date(2014, 2, 21), date(2014, 2, 22)]
datelist = [date(2014, 2, 19), date(2014, 2, 21),
date(2014, 2, 22), date(2014, 2, 20)]
datelist.sort()
if datelist[-1] == reduce(checked, datelist):
print "dates are consecutive"
else:
print "dates are not consecutive"

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Is it possible to initialize an empty default value on leftOuterJoin? - python

One way you can approach this problem is to leverage lexicographical ordering of Python lists. Since empty list is always "less than" non empty one we can simply make an union and reduce with max: temp.union(name_to_hour).reduceByKey(max) This of course assumes that the keys are unique.

Related

Replace specific character of item in list - Python

How to make a Custom Sorting Function for Dictionary Key Values?

Python: list comprehension to produce consecutive sublists of even numbers

How is sorted(key=lambda x:) implemented behind the scene?

Useful code which uses reduce()? [closed]

Categories

Resources