Pythonic way of defining the function - python

A homework assignment asks us to write some functions, namely orSearch and andSearch .
"""
Input: an inverse index, as created by makeInverseIndex, and a list of words to query
Output: the set of document ids that contain _any_ of the specified words
Feel free to use a loop instead of a comprehension.
>>> idx = makeInverseIndex(['Johann Sebastian Bach', 'Johannes Brahms', 'Johann Strauss the Younger', 'Johann Strauss the Elder', ' Johann Christian Bach', 'Carl Philipp Emanuel Bach'])
>>> orSearch(idx, ['Bach','the'])
{0, 2, 3, 4, 5}
>>> orSearch(idx, ['Johann', 'Carl'])
{0, 2, 3, 4, 5}
"""
The docstring above documents orSearch. andSearch is similar, but it returns only the documents that contain all of the words in the query list.
We can assume that the inverse index has already been provided. An example of an inverse index for ['hello world','hello','hello cat','hellolot of cats'] is {'hello': {0, 1, 2}, 'cat': {2}, 'of': {3}, 'world': {0}, 'cats': {3}, 'hellolot': {3}}
I was able to write a single-line comprehension for orSearch:
def orSearch(inverseIndex, query):
    return {index for word in query if word in inverseIndex for index in inverseIndex[word]}
But I can't think of the most Pythonic way to write andSearch. The following code works, but I suspect it is not very Pythonic:
def andSearch(inverseIndex, query):
    if len(query) != 0:
        result = inverseIndex[query[0]]
    else:
        result = set()
    for word in query:
        if word in inverseIndex.keys():
            result = result & inverseIndex[word]
    return result
Any suggestions for more compact code for andSearch?

Rewrite orSearch() to use any() to find any of the terms, and then derive andSearch() by modifying your solution to use all() instead to find all of the terms.
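One possible reading of this suggestion is sketched below. The names or_search, and_search and all_ids are mine, not part of the assignment, and the sketch assumes the index maps each word to a set of document ids. Note that all() is true for an empty query, so this and_search returns every document id in that case, which may or may not be what the assignment wants.

```python
def or_search(inverse_index, query):
    """Ids of documents that contain any of the query words."""
    # Every document id that appears anywhere in the index.
    all_ids = set().union(*inverse_index.values())
    return {doc for doc in all_ids
            if any(doc in inverse_index.get(word, ()) for word in query)}


def and_search(inverse_index, query):
    """Ids of documents that contain all of the query words."""
    all_ids = set().union(*inverse_index.values())
    # Only the test changes: any() becomes all().
    return {doc for doc in all_ids
            if all(doc in inverse_index.get(word, ()) for word in query)}


idx = {'hello': {0, 1, 2}, 'cat': {2}, 'of': {3},
       'world': {0}, 'cats': {3}, 'hellolot': {3}}
print(or_search(idx, ['cat', 'world']))
print(and_search(idx, ['hello', 'cat']))
```

This scans every document id for each query, so it is slower than combining the sets directly, but it makes the symmetry between the two functions obvious.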

A more Pythonic way to write andSearch() would be:
from functools import reduce
def andSearch(inverseIndex, query):
    return reduce(lambda x, y: x & y, [inverseIndex[key] for key in query])
Here reduce combines the intermediate results: it intersects the sets one pair at a time.
It may also be useful to check that all items of query are in inverseIndex. The function then looks like this:
from functools import reduce
def andSearch(inverseIndex, query):
    # <= (subset) rather than < (proper subset), so that a query naming
    # every indexed word is still accepted
    if set(query) <= set(inverseIndex.keys()):
        return reduce(lambda x, y: x & y, [inverseIndex[key] for key in query])
    else:
        return False  # or whatever is meaningful to return
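As an alternative to reduce, set.intersection accepts any number of sets, so the whole thing can be one call. This is only a sketch under two assumptions of mine: an empty query returns an empty set, and a word missing from the index matches no documents. The name and_search is hypothetical.

```python
def and_search(inverse_index, query):
    """Return the ids of documents that contain every word in query."""
    if not query:
        # Assumption: an empty query matches nothing.
        return set()
    # A word that is not in the index contributes an empty set,
    # which makes the whole intersection empty.
    doc_sets = [inverse_index.get(word, set()) for word in query]
    return set.intersection(*doc_sets)


idx = {'hello': {0, 1, 2}, 'cat': {2}, 'of': {3},
       'world': {0}, 'cats': {3}, 'hellolot': {3}}
print(and_search(idx, ['hello', 'cat']))
print(and_search(idx, ['hello', 'dog']))
```

Returning an empty set instead of False keeps the return type consistent, so callers can always treat the result as a set.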

Related

How to reduce on a list of tuples in python

I have an array and I want to count the occurrence of each item in the array.
I have managed to use a map function to produce a list of tuples.
def mapper(a):
    return (a, 1)
r = list(map(lambda a: mapper(a), arr))
# output example:
# (11817685, 1), (2014036792, 1), (2014047115, 1), (11817685, 1)
I'm expecting the reduce function can help me to group counts by the first number (id) in each tuple. For example:
(11817685, 2), (2014036792, 1), (2014047115, 1)
I tried
cnt = reduce(lambda a, b: a + b, r)
and some other approaches, but none of them does the trick.
NOTE
Thanks for all the advice on other ways to solve the problems, but I'm just learning Python and how to implement a map-reduce here, and I have simplified my real business problem a lot to make it easy to understand, so please kindly show me a correct way of doing map-reduce.
You could use Counter:
from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
counter = Counter(arr)
print zip(counter.keys(), counter.values())
EDIT:
As pointed out by @ShadowRanger, Counter has an items() method:
from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
print Counter(arr).items()
You can also do this with plain logic, without importing any module:
track = {}
for intr in arr:
    if intr not in track:
        track[intr] = 1
    else:
        track[intr] += 1
Example code:
There is a common pattern for this kind of list problem.
Suppose you have a list:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
and you want to convert it to a dict that uses the first element of each tuple as the key and the second element as the value, something like:
{2008: [9], 2006: [5], 2007: [4]}
But there is a catch: some tuples, such as (2006,1) and (2006,5), share a key but have different values. You want those values collected under a single key, so the expected output is:
{2008: [9], 2006: [1, 5], 2007: [4]}
For this type of problem, first create a new dict, then follow this pattern:
if item[0] not in new_dict:
    new_dict[item[0]]=[item[1]]
else:
    new_dict[item[0]].append(item[1])
That is, check whether the key is already in the new dict. If it is not, start a new list for it; if it is, append the duplicate key's value to the existing list.
Full code:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
new_dict={}
for item in a:
    if item[0] not in new_dict:
        new_dict[item[0]]=[item[1]]
    else:
        new_dict[item[0]].append(item[1])
print(new_dict)
output:
{2008: [9], 2006: [1, 5], 2007: [4]}
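The same grouping pattern can be written without the explicit membership test by using collections.defaultdict. This is just a sketch of the idea; the variable names grouped, year and value are mine.

```python
from collections import defaultdict

a = [(2006, 1), (2007, 4), (2008, 9), (2006, 5)]

# Missing keys start out as an empty list, so no "if key not in dict" check is needed.
grouped = defaultdict(list)
for year, value in a:
    grouped[year].append(value)

print(dict(grouped))
```

Replacing list with int and append with += turns the same loop into a counter, which is what the original question asks for.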
After writing my answer to a different question, I remembered this post and thought it would be helpful to write a similar answer here.
Here is a way to use reduce on your list to get the desired output.
arr = [11817685, 2014036792, 2014047115, 11817685]
def mapper(a):
    return (a, 1)
def reducer(x, y):
    if isinstance(x, dict):
        ykey, yval = y
        if ykey not in x:
            x[ykey] = yval
        else:
            x[ykey] += yval
        return x
    else:
        xkey, xval = x
        ykey, yval = y
        a = {xkey: xval}
        if ykey in a:
            a[ykey] += yval
        else:
            a[ykey] = yval
        return a
mapred = reduce(reducer, map(mapper, arr))
print mapred.items()
Which prints:
[(2014036792, 1), (2014047115, 1), (11817685, 2)]
Please see the linked answer for a more detailed explanation.
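The isinstance branch above exists only because the first call to the reducer receives two tuples. A sketch of one way around that, assuming Python 3 (where reduce lives in functools), is to pass an empty dict as the initial accumulator, so the reducer always sees a dict on the left and a tuple on the right:

```python
from functools import reduce

arr = [11817685, 2014036792, 2014047115, 11817685]


def mapper(a):
    # Map step: emit a (key, 1) pair for every item.
    return (a, 1)


def reducer(counts, pair):
    # Reduce step: fold one (key, value) pair into the running dict.
    key, value = pair
    counts[key] = counts.get(key, 0) + value
    return counts


# The third argument is the initial accumulator.
counts = reduce(reducer, map(mapper, arr), {})
print(sorted(counts.items()))
# [(11817685, 2), (2014036792, 1), (2014047115, 1)]
```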
If all you need is cnt, then a dict would probably be better than a list of tuples here (if you need this format, just use dict.items).
The collections module has a useful data structure for this, a defaultdict.
from collections import defaultdict
cnt = defaultdict(int)  # create a default dict where the default value is
                        # the result of calling int
for key in arr:
    cnt[key] += 1  # if key is not in cnt, it will put in the default
# cnt_list = list(cnt.items())

Extending python dictionary and changing key values

Assume I have a python dictionary with 2 keys.
dic = {0:'Hi!', 1:'Hello!'}
What I want to do is to extend this dictionary by duplicating itself, but change the key value.
For example, if I have a code
dic = {0:'Hi!', 1:'Hello'}
multiplier = 3
def DictionaryExtend(number_of_multiplier, dictionary):
    "Function code"
then the result should look like
>>> DictionaryExtend(multiplier, dic)
>>> dic
>>> dic = {0:'Hi!', 1:'Hello', 2:'Hi!', 3:'Hello', 4:'Hi!', 5:'Hello'}
In this case, I changed the key values by adding the multiplier at each duplication step. What's an efficient way of doing this?
Plus, I'm also planning to do the same for a list, that is, extend a list by duplicating itself and changing some values, as in the example above. Any suggestions for this would be helpful, too!
You can try itertools to repeat the values and OrderedDict to maintain input order.
import itertools as it
import collections as ct
def extend_dict(multiplier, dict_):
    """Return a dictionary of repeated values."""
    return dict(enumerate(it.chain(*it.repeat(dict_.values(), multiplier))))
d = ct.OrderedDict({0:'Hi!', 1:'Hello!'})
multiplier = 3
extend_dict(multiplier, d)
# {0: 'Hi!', 1: 'Hello!', 2: 'Hi!', 3: 'Hello!', 4: 'Hi!', 5: 'Hello!'}
Regarding handling other collection types, it is not clear what output is desired, but the following modification reproduces the latter and works for lists as well:
def extend_collection(multiplier, iterable):
    """Return a collection of repeated values."""
    repeat_values = lambda x: it.chain(*it.repeat(x, multiplier))
    try:
        iterable = iterable.values()
    except AttributeError:
        result = list(repeat_values(iterable))
    else:
        result = dict(enumerate(repeat_values(iterable)))
    return result
lst = ['Hi!', 'Hello!']
multiplier = 3
extend_collection(multiplier, lst)
# ['Hi!', 'Hello!', 'Hi!', 'Hello!', 'Hi!', 'Hello!']
It's not immediately clear why you might want to do this. If the keys are always consecutive integers then you probably just want a list.
Anyway, here's a snippet:
def dictExtender(multiplier, d):
    return dict(zip(range(multiplier * len(d)), list(d.values()) * multiplier))
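If the new keys should be computed from the original keys rather than renumbered from zero, a dict comprehension can shift each key by a multiple of the dictionary's length. This is a sketch that assumes integer keys; the name extend_dict_offset is hypothetical. For keys that are not 0..n-1, shifted keys may collide, in which case later values overwrite earlier ones.

```python
def extend_dict_offset(multiplier, d):
    """Repeat d `multiplier` times, shifting the keys by len(d) on each pass."""
    n = len(d)
    return {key + i * n: value
            for i in range(multiplier)
            for key, value in d.items()}


dic = {0: 'Hi!', 1: 'Hello'}
print(extend_dict_offset(3, dic))
# {0: 'Hi!', 1: 'Hello', 2: 'Hi!', 3: 'Hello', 4: 'Hi!', 5: 'Hello'}
```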
I don't think you need to use inheritance to achieve that. It's also unclear what the keys should be in the resulting dictionary.
If the keys are always consecutive integers, then why not use a list?
origin = ['Hi', 'Hello']
extended = origin * 3
extended
>> ['Hi', 'Hello', 'Hi', 'Hello', 'Hi', 'Hello']
extended[4]
>> 'Hi'
If you want to perform a different operation with the keys, then simply:
mult_key = lambda key: [key,key+2,key+4] # just an example, this can be any custom implementation but beware of duplicate keys
dic = {0:'Hi', 1:'Hello'}
extended = { mkey:dic[key] for key in dic for mkey in mult_key(key) }
extended
>> {0:'Hi', 1:'Hello', 2:'Hi', 3:'Hello', 4:'Hi', 5:'Hello'}
You don't need to extend anything, you need to pick a better input format or a more appropriate type.
As others have mentioned, you need a list, not an extended dict or OrderedDict. Here's an example with lines.txt:
1:Hello!
0: Hi.
2: pylang
And here's a way to parse the lines in the correct order:
def extract_number_and_text(line):
    number, text = line.split(':')
    return (int(number), text.strip())
with open('lines.txt') as f:
    lines = f.readlines()
data = [extract_number_and_text(line) for line in lines]
print(data)
# [(1, 'Hello!'), (0, 'Hi.'), (2, 'pylang')]
sorted_text = [text for i,text in sorted(data)]
print(sorted_text)
# ['Hi.', 'Hello!', 'pylang']
print(sorted_text * 2)
# ['Hi.', 'Hello!', 'pylang', 'Hi.', 'Hello!', 'pylang']
print(list(enumerate(sorted_text * 2)))
# [(0, 'Hi.'), (1, 'Hello!'), (2, 'pylang'), (3, 'Hi.'), (4, 'Hello!'), (5, 'pylang')]

Algorithmics issue, python string, no idea

I have an algorithm problem involving Python strings.
My issue:
My function should split the string into substrings and return the maximum possible sum of their values.
For example:
ae-afi-re-fi -> 2+6+3+5=16
but
ae-a-fi-re-fi -> 2-10+5+3+5=5
I tried using the string.count function to count substrings, but this approach does not work well.
What would be the best way to do this in Python? Thanks in advance.
string = "aeafirefi"
Sum the value of substrings.
My solution uses permutations from the itertools module to list every ordering of the substrings from your question, which are stored in a dict called vals. For each ordering, it walks through the substrings and removes from the input string each one it finds. It then sums the values for each ordering and finally takes the max.
PS: The key to this solution is the get_sublists() function.
This is an example with some tests:
from itertools import permutations
def get_sublists(a, perm_vals):
    # Find the sublists in the input string
    # Based on the permutations of the dict vals.keys()
    for k in perm_vals:
        if k in a:
            a = ''.join(a.split(k))
            # Yield the sublist if we found any
            yield k
def sum_sublists(a, sub, vals):
    # Join the sublist and compare it to the input string
    # Get the difference by length
    diff = len(a) - len(''.join(sub))
    # Sum the value of each sublist (on every permutation)
    return sub , sum(vals[k] for k in sub) - diff * 10
def get_max_sum_sublists(a, vals):
    # Get all the possible permutations
    perm_vals = permutations(vals.keys())
    # Remove duplicates if there is any
    sub = set(tuple(get_sublists(a, k)) for k in perm_vals)
    # Get the sum of each possible permutation
    aa = (sum_sublists(a, k, vals) for k in sub)
    # return the max of the above operation
    return max(aa, key= lambda x: x[1])
vals = {'ae': 2, 'qd': 3, 'qdd': 5, 'fir': 4, 'afi': 6, 're': 3, 'fi': 5}
# Test
a = "aeafirefi"
final, s = get_max_sum_sublists(a, vals)
print("Sublists: {}\nSum: {}".format(final, s))
print('----')
a = "aeafirefiqdd"
final, s = get_max_sum_sublists(a, vals)
print("Sublists: {}\nSum: {}".format(final, s))
print('----')
a = "aeafirefiqddks"
final, s = get_max_sum_sublists(a, vals)
print("Sublists: {}\nSum: {}".format(final, s))
Output:
Sublists: ('ae', 'afi', 're', 'fi')
Sum: 16
----
Sublists: ('afi', 'ae', 'qdd', 're', 'fi')
Sum: 21
----
Sublists: ('afi', 'ae', 'qdd', 're', 'fi')
Sum: 1
Please try this solution with as many input strings as you can, and don't hesitate to comment if you find a wrong result.
You could use a dictionary that maps each substring to its value:
key = substring: value = value
So if you have:
string = "aeafirefi"
first look up the whole string in the dictionary. If you don't find it, cut off the last letter, giving "aeafiref", and repeat until you find a substring or only a single letter is left.
Then skip the letters you used: for example, if you found "aeaf", start over with string = "irefi".
Here's a brute force solution:
values_dict = {
    'ae': 2,
    'qd': 3,
    'qdd': 5,
    'fir': 4,
    'afi': 6,
    're': 3,
    'fi': 5
}
def get_value(x):
    return values_dict[x] if x in values_dict else -10
def next_tokens(s):
    """Returns possible tokens"""
    # Return any tokens in values_dict
    for x in values_dict.keys():
        if s.startswith(x):
            yield x
    # Return single character.
    yield s[0]
def permute(s, stack=[]):
    """Returns all possible variations"""
    if len(s) == 0:
        yield stack
        return
    for token in next_tokens(s):
        perms = permute(s[len(token):], stack + [token])
        for perm in perms:
            yield perm
def process_string(s):
    def process_tokens(tokens):
        return sum(map(get_value, tokens))
    return max(map(process_tokens, permute(s)))
print('Max: {}'.format(process_string('aeafirefi')))
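The brute-force search revisits the same suffix of the string many times. A memoized version avoids that by caching the best score from each position onward. The sketch below reuses the values from this thread and assumes, as the brute-force answer does, that any single character not covered by a token costs -10. The names VALUES, PENALTY, max_value and best are mine.

```python
from functools import lru_cache

VALUES = {'ae': 2, 'qd': 3, 'qdd': 5, 'fir': 4, 'afi': 6, 're': 3, 'fi': 5}
PENALTY = -10  # assumed cost of a character not covered by any token


def max_value(s):
    """Best total over all ways of splitting s into tokens and single characters."""

    @lru_cache(maxsize=None)
    def best(i):
        # Best score obtainable from s[i:].
        if i == len(s):
            return 0
        # Option 1: treat s[i] as an unmatched single character.
        options = [PENALTY + best(i + 1)]
        # Option 2: consume any token that starts at position i.
        for token, value in VALUES.items():
            if s.startswith(token, i):
                options.append(value + best(i + len(token)))
        return max(options)

    return best(0)


print(max_value("aeafirefi"))       # 16
print(max_value("aeafirefiqdd"))    # 21
print(max_value("aeafirefiqddks"))  # 1
```

Each position is evaluated once, so the work grows with len(s) times the number of tokens instead of exponentially. Since best is recursive, very long strings would need an iterative version.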

How to hash strings in python to match within 1 character?

I've read about LSH hashing and am wondering what is the best implementation to match strings within 1 character?
test = {'dog':1, 'cat': 2, 'eagle': 3}
test['dog']
>> 1
I would want to also return 1 if I lookup test['dogs'] or test['dogg']. I realize that it would also return 1 if I were to look up "log" or "cog", but I can write a method to exclude those results.
Also how can I further this method for general strings to return a match within X characters?
string1 = "brown dogs"
string2 = "brown doggie"
Assuming only string1 is stored in my dictionary, a lookup for string2 would return string1.
Thanks
Well, you can define the similarity between 2 strings by the length of the start they share in common (3 for doga and dogs, for instance). This is simplistic, but that could fit your needs.
With this assumption, you can define this:
>>> test = {'dog':1, 'cat': 2, 'eagle': 3}
>>> def same_start(s1, s2):
        ret = 0
        for i in range(min(len(s1), len(s2))):
            if s1[i] != s2[i]:
                break
            ret += 1
        return ret
>>> def closest_match(s):
        return max(((k, v, same_start(k, s)) for k, v in test.iteritems()), key=lambda x: x[2])[1]
>>> closest_match('dogs') # matches dog
1
>>> closest_match('cogs') # matches cat
2
>>> closest_match('eaogs') # matches eagle
3
>>>
Maybe you could try using a Soundex function as your dictionary key?
Since your relation is not 1:1, maybe you could define your own dict type with redefined __getitem__ which could return a list of possible items. Here's what I mean:
class MyDict(dict):
    def __getitem__(self, key):
        l = []
        for k, v in self.items():
            if key.startswith(k):  # or some other comparison method
                l.append(v)
        return l
This is just an idea; other dict methods should probably be redefined too, to avoid possible errors or infinite loops. Also, @Emmanuel's answer could be very useful here if you want only one item returned instead of a list, and that way you wouldn't have to redefine everything.
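For the "within 1 character" case specifically, the startswith test can be swapped for a direct check that two strings differ by at most one insertion, deletion or substitution. This is a linear scan over the keys, not a hashing scheme, and the names within_one_edit and lookup are hypothetical; treat it as a sketch of the comparison rather than an LSH implementation.

```python
def within_one_edit(a, b):
    """True if a and b differ by at most one inserted, deleted or replaced character."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Same length: allow one substituted character at position i.
        return a[i + 1:] == b[i + 1:]
    # b is one longer: allow one extra character in b at position i.
    return a[i:] == b[i + 1:]


def lookup(table, key):
    """Values of all keys in table that are within one edit of key."""
    return [v for k, v in table.items() if within_one_edit(k, key)]


test = {'dog': 1, 'cat': 2, 'eagle': 3}
print(lookup(test, 'dogs'))    # [1]
print(lookup(test, 'dogg'))    # [1]
print(lookup(test, 'doggie'))  # []
```

As the question anticipates, lookup(test, 'log') also returns [1], so excluding such matches needs a separate rule.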

Map list onto dictionary

Is there a way to map a list onto a dictionary? What I want to do is give it a function that will return the name of a key, and the value will be the original value. For example;
somefunction(lambda a: a[0], ["hello", "world"])
=> {"h":"hello", "w":"world"}
(This isn't a specific example that I want to do, I want a generic function like map() that can do this)
In Python 3 you can use this dictionary comprehension syntax:
def foo(somelist):
    return {x[0]:x for x in somelist}
I don't think a standard function exists that does exactly that, but it's very easy to construct one using the dict builtin and a comprehension:
def somefunction(keyFunction, values):
    return dict((keyFunction(v), v) for v in values)
print somefunction(lambda a: a[0], ["hello", "world"])
Output:
{'h': 'hello', 'w': 'world'}
But coming up with a good name for this function is more difficult than implementing it. I'll leave that as an exercise for the reader.
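On Python 2.7 and later, the same function can be written as a dict comprehension. The sketch below uses the placeholder name key_by, which is my choice and not an established one. When two values produce the same key, the later value replaces the earlier one.

```python
def key_by(key_func, values):
    """Build a dict that maps key_func(v) to v for each v in values."""
    # On duplicate keys, the last value wins.
    return {key_func(v): v for v in values}


print(key_by(lambda a: a[0], ["hello", "world"]))
# {'h': 'hello', 'w': 'world'}
print(key_by(len, ["hello", "world"]))
# {5: 'world'}
```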
If I understand your question correctly, I believe you can accomplish this with a combination of map, zip, and the dict constructor:
def dictMap(f, xs):
    return dict(zip(map(f, xs), xs))
And a saner implementation:
def dictMap(f, xs):
    return dict((f(i), i) for i in xs)
Taking hints from other answers I achieved this using map operation. I am not sure if this exactly answers your question.
mylist = ["hello", "world"]
def convert_to_dict( somelist ):
    return dict( map( lambda x: (x[0], x), somelist ) )
final_ans = convert_to_dict( mylist )
print final_ans
If you want a general function to do this, then you're asking almost the right question. Your example doesn't specify what happens when the key function produces duplicates, though. Do you keep the last one? The first one? Do you actually want to make a list of all the words that start with the same letter? These questions are probably best answered by the user of the function, not the designer.
Parametrizing over these results in a more complicated, but very general, function. Here's one that I've used for several years:
def reduce_list(key, update_value, default_value, l):
    """Reduce a list to a dict.
    key :: list_item -> dict_key
    update_value :: list_item * existing_value -> updated_value
    default_value :: initial value passed to update_value
    l :: The list
    default_value comes before l. This is different from functools.reduce,
    because functools.reduce's order is wrong.
    """
    d = {}
    for k in l:
        j = key(k)
        d[j] = update_value(k, d.get(j, default_value))
    return d
Then you can write your function by saying:
reduce_list(lambda s: s[0], lambda s, old: s, '', ['hello', 'world'])
# OR
reduce_list(lambda s: s[0], lambda s, old: old or s, '', ['hello', 'world'])
The first call keeps the last word starting with, for example, 'h'; the second keeps the first.
This function is very general, though, so most of the time it's the basis for other functions, like group_dict or histogram:
def group_dict(l):
    return reduce_list(lambda x:x, lambda x,old: [x] + old, [], l)
def histogram(l):
    return reduce_list(lambda x:x, lambda x,total: total + 1, 0, l)
>>> dict((a[0], a) for a in "hello world".split())
{'h': 'hello', 'w': 'world'}
If you want to use a function instead of subscripting, use operator.itemgetter:
>>> from operator import itemgetter
>>> first = itemgetter(0)
>>> dict((first(x), x) for x in "hello world".split())
{'h': 'hello', 'w': 'world'}
Or as a function:
>>> dpair = lambda x : (first(x), x)
>>> dict(dpair(x) for x in "hello world".split())
{'h': 'hello', 'w': 'world'}
Finally, if you want more than one word per letter as a possibility, use collections.defaultdict
>>> from collections import defaultdict
>>> words = defaultdict(set)
>>> addword = lambda x : words[first(x)].add(x)
>>> for word in "hello house home hum world wry wraught".split():
        addword(word)
>>> print words['h']
set(['house', 'hello', 'hum', 'home'])
