Is there a way to map a list onto a dictionary? What I want to do is give it a function that returns the name of a key, with the original value as the value. For example:
somefunction(lambda a: a[0], ["hello", "world"])
=> {"h":"hello", "w":"world"}
(This isn't a specific thing I want to do; I want a generic function like map() that can do this.)
In Python 3 (and 2.7) you can use this dictionary comprehension syntax:
def foo(somelist):
    return {x[0]: x for x in somelist}
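The comprehension above hard-codes x[0] as the key. A minimal sketch of the generic version the question asks for just takes the key function as a parameter; key_by is a hypothetical name, not a standard function. When two items produce the same key, the later item wins, because each assignment overwrites the previous one.

```python
# key_by is a hypothetical name, not a standard library function.
def key_by(key_func, values):
    """Map key_func(value) -> value; later duplicates overwrite earlier ones."""
    return {key_func(v): v for v in values}

print(key_by(lambda a: a[0], ["hello", "world"]))  # {'h': 'hello', 'w': 'world'}
print(key_by(lambda a: a[0], ["hello", "house"]))  # {'h': 'house'}
```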
I don't think a standard function exists that does exactly that, but it's very easy to construct one using the dict builtin and a generator expression:
def somefunction(keyFunction, values):
    return dict((keyFunction(v), v) for v in values)
print(somefunction(lambda a: a[0], ["hello", "world"]))
Output:
{'h': 'hello', 'w': 'world'}
But coming up with a good name for this function is more difficult than implementing it. I'll leave that as an exercise for the reader.
If I understand your question correctly, I believe you can accomplish this with a combination of map, zip, and the dict constructor:
def dictMap(f, xs):
    return dict(zip(map(f, xs), xs))
And a saner implementation:
def dictMap(f, xs):
    return dict((f(i), i) for i in xs)
Taking hints from other answers, I achieved this using map. I'm not sure whether this exactly answers your question.
mylist = ["hello", "world"]
def convert_to_dict(somelist):
    return dict(map(lambda x: (x[0], x), somelist))
final_ans = convert_to_dict(mylist)
print(final_ans)
If you want a general function to do this, then you're asking almost the right question. Your example doesn't specify what happens when the key function produces duplicates, though. Do you keep the last one? The first one? Do you actually want to make a list of all the words that start with the same letter? These questions are probably best answered by the user of the function, not the designer.
Parametrizing over these results in a more complicated, but very general, function. Here's one that I've used for several years:
def reduce_list(key, update_value, default_value, l):
    """Reduce a list to a dict.
    key :: list_item -> dict_key
    update_value :: list_item * existing_value -> updated_value
    default_value :: initial value passed to update_value
    l :: The list
    default_value comes before l. This is different from functools.reduce,
    because functools.reduce's order is wrong.
    """
    d = {}
    for k in l:
        j = key(k)
        d[j] = update_value(k, d.get(j, default_value))
    return d
Then you can write your function by saying:
reduce_list(lambda s: s[0], lambda s, old: s, '', ['hello', 'world'])
# OR
reduce_list(lambda s: s[0], lambda s, old: old or s, '', ['hello', 'world'])
The first keeps the last word starting with, for example, 'h'; the second keeps the first one.
This function is very general, though, so most of the time it's the basis for other functions, like group_dict or histogram:
def group_dict(l):
    return reduce_list(lambda x: x, lambda x, old: [x] + old, [], l)
def histogram(l):
    return reduce_list(lambda x: x, lambda x, total: total + 1, 0, l)
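To see how the parameters interact, here is a runnable sketch that reuses reduce_list unchanged and adds a hypothetical group_by_first_letter helper covering the "list of all words that start with the same letter" case. Note that old + [w] builds a new list each time, so the shared [] default is never mutated.

```python
def reduce_list(key, update_value, default_value, l):
    """Reduce a list to a dict (same logic as the function above)."""
    d = {}
    for k in l:
        j = key(k)
        d[j] = update_value(k, d.get(j, default_value))
    return d

def histogram(l):
    # Each item is its own key; the value is a running count.
    return reduce_list(lambda x: x, lambda x, total: total + 1, 0, l)

def group_by_first_letter(words):
    # Hypothetical helper: collect every word under its first letter, in input order.
    return reduce_list(lambda w: w[0], lambda w, old: old + [w], [], words)

print(histogram(['a', 'b', 'a']))                          # {'a': 2, 'b': 1}
print(group_by_first_letter(['hello', 'house', 'world']))  # {'h': ['hello', 'house'], 'w': ['world']}
```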
>>> dict((a[0], a) for a in "hello world".split())
{'h': 'hello', 'w': 'world'}
If you want to use a function instead of subscripting, use operator.itemgetter:
>>> from operator import itemgetter
>>> first = itemgetter(0)
>>> dict((first(x), x) for x in "hello world".split())
{'h': 'hello', 'w': 'world'}
Or as a function:
>>> dpair = lambda x : (first(x), x)
>>> dict(dpair(x) for x in "hello world".split())
{'h': 'hello', 'w': 'world'}
Finally, if you want to allow more than one word per letter, use collections.defaultdict:
>>> from collections import defaultdict
>>> words = defaultdict(set)
>>> addword = lambda x : words[first(x)].add(x)
>>> for word in "hello house home hum world wry wraught".split():
...     addword(word)
...
>>> sorted(words['h'])
['hello', 'home', 'house', 'hum']
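The addword lambda in the last example exists only for its side effect. A plainer sketch of the same defaultdict idea uses an ordinary loop and indexes the first character directly instead of going through itemgetter:

```python
from collections import defaultdict

words = defaultdict(set)  # a missing key starts out as an empty set
for word in "hello house home hum world wry wraught".split():
    words[word[0]].add(word)

print(sorted(words['h']))  # ['hello', 'home', 'house', 'hum']
```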
Related
I have data like this, and I want to join the string values with commas and sum the int values.
data=[{"string": "x","int": 1},
{"string": "y","int": 2},
{"string": "z","int": 3}]
I'm expecting output like this:
Output:
{ "string":"x,y,z","int":"6"}
I tried using the reduce function:
from functools import reduce  # needed in Python 3
func = lambda x, y: dict((m, n + y[m]) for m, n in x.items())
print(reduce(func, data))
and I am getting something like this:
{"string": "xyz", "int": "6"}
How can I get the strings comma-separated?
func = lambda x, y: dict((m, n + y[m]) for m, n in x.items())
You need a custom function to replace n + y[m] (let's call it custom_join(a, b)) which:
if the arguments are integers, returns their algebraic sum;
if the arguments are strings, joins them with ',' and returns the resulting string.
Let's implement it.
def custom_join(a, b):
    arr = list((a, b))
    return sum(arr) if is_int_array(arr) else ','.join(arr)
We have no is_int_array/1 yet. Let's write it now.
def is_int_array(arr):
    return all(i for i in map(is_int, arr))
There's no is_int/1 either. Let's write it:
def is_int(e):
    return isinstance(e, int)
Do the same for strings:
def is_str(e):
    return isinstance(e, str)
def is_str_array(arr):
    return all(i for i in map(is_str, arr))
Putting it all together: https://repl.it/LPRR
OK, this is insane, but when you try to implement a functional-only approach, you need to be ready for situations like this. :)
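In case the repl.it link is unavailable, here is a minimal self-contained sketch assembling the pieces above with functools.reduce in Python 3. It assumes every dict in data has the same keys and that the values for a given key are either all ints or all strings.

```python
from functools import reduce

def is_int(e):
    return isinstance(e, int)

def is_int_array(arr):
    return all(map(is_int, arr))

def custom_join(a, b):
    arr = [a, b]
    # Sum integers; otherwise assume strings and join them with commas.
    return sum(arr) if is_int_array(arr) else ','.join(arr)

data = [{"string": "x", "int": 1},
        {"string": "y", "int": 2},
        {"string": "z", "int": 3}]

# Same shape as the question's lambda, with n + y[m] replaced by custom_join.
func = lambda x, y: {m: custom_join(n, y[m]) for m, n in x.items()}
print(reduce(func, data))  # {'string': 'x,y,z', 'int': 6}
```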
You can use str.join() and sum() with some generator expressions like this:
res = {"string": ','.join(d['string'] for d in data), "int": sum(d['int'] for d in data)}
Output:
>>> res
{'int': 6, 'string': 'x,y,z'}
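If you'd rather not hard-code the key names, a sketch under the same assumptions (every dict has the keys of the first one, and each key's values are all strings or all numbers) can choose the operation per key from the type of the first value. combine_records is a hypothetical name.

```python
# combine_records is a hypothetical helper name.
def combine_records(records):
    """Join string fields with commas and sum the other fields, key by key."""
    combined = {}
    for key in records[0]:
        values = [r[key] for r in records]
        if isinstance(values[0], str):
            combined[key] = ','.join(values)
        else:
            combined[key] = sum(values)
    return combined

data = [{"string": "x", "int": 1},
        {"string": "y", "int": 2},
        {"string": "z", "int": 3}]
print(combine_records(data))  # {'string': 'x,y,z', 'int': 6}
```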
Assume I have a Python dictionary with 2 keys:
dic = {0:'Hi!', 1:'Hello!'}
What I want to do is extend this dictionary by duplicating its contents, but with different keys.
For example, if I have this code:
dic = {0:'Hi!', 1:'Hello'}
multiplier = 3
def DictionaryExtend(number_of_multiplier, dictionary):
    "Function code"
then the result should look like this:
>>> DictionaryExtend(multiplier, dic)
>>> dic
{0:'Hi!', 1:'Hello', 2:'Hi!', 3:'Hello', 4:'Hi!', 5:'Hello'}
In this case, I changed the keys by continuing the numbering at each duplication step. What's an efficient way of doing this?
I'm also planning to do the same for a list, i.e. extend a list by duplicating it and changing some values as in the example above. Any suggestions for this would be helpful, too!
You can try itertools to repeat the values and OrderedDict to maintain input order.
import itertools as it
import collections as ct
def extend_dict(multiplier, dict_):
    """Return a dictionary of repeated values."""
    return dict(enumerate(it.chain(*it.repeat(dict_.values(), multiplier))))
d = ct.OrderedDict([(0, 'Hi!'), (1, 'Hello!')])  # a list of pairs keeps the order on any Python version
multiplier = 3
extend_dict(multiplier, d)
# {0: 'Hi!', 1: 'Hello!', 2: 'Hi!', 3: 'Hello!', 4: 'Hi!', 5: 'Hello!'}
Regarding other collection types, it is not clear what output is desired, but the following modification reproduces the previous result and works for lists as well:
def extend_collection(multiplier, iterable):
    """Return a collection of repeated values."""
    repeat_values = lambda x: it.chain(*it.repeat(x, multiplier))
    try:
        iterable = iterable.values()
    except AttributeError:
        result = list(repeat_values(iterable))
    else:
        result = dict(enumerate(repeat_values(iterable)))
    return result
lst = ['Hi!', 'Hello!']
multiplier = 3
extend_collection(multiplier, lst)
# ['Hi!', 'Hello!', 'Hi!', 'Hello!', 'Hi!', 'Hello!']
It's not immediately clear why you might want to do this. If the keys are always consecutive integers then you probably just want a list.
Anyway, here's a snippet:
def dictExtender(multiplier, d):
    return dict(zip(range(multiplier * len(d)), list(d.values()) * multiplier))
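The question's example calls DictionaryExtend and then inspects dic, which implies the dictionary is modified in place rather than returned. A minimal sketch of that variant, assuming the keys are 0..len(d)-1 as in the example and that insertion order matches key order (Python 3.7+), appends the copies with dict.update; extend_in_place is a hypothetical name.

```python
# extend_in_place is a hypothetical name; it mutates d and returns None.
def extend_in_place(multiplier, d):
    """Append (multiplier - 1) copies of d's values under consecutive integer keys."""
    values = list(d.values())  # snapshot taken before d starts growing
    n = len(values)
    for i in range(1, multiplier):
        d.update((i * n + j, v) for j, v in enumerate(values))

dic = {0: 'Hi!', 1: 'Hello'}
extend_in_place(3, dic)
print(dic)  # {0: 'Hi!', 1: 'Hello', 2: 'Hi!', 3: 'Hello', 4: 'Hi!', 5: 'Hello'}
```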
I don't think you need to use inheritance to achieve that. It's also unclear what the keys should be in the resulting dictionary.
If the keys are always consecutive integers, then why not use a list?
origin = ['Hi', 'Hello']
extended = origin * 3
extended
>> ['Hi', 'Hello', 'Hi', 'Hello', 'Hi', 'Hello']
extended[4]
>> 'Hi'
If you want to perform a different operation with the keys, then simply:
mult_key = lambda key: [key,key+2,key+4] # just an example, this can be any custom implementation but beware of duplicate keys
dic = {0:'Hi', 1:'Hello'}
extended = { mkey:dic[key] for key in dic for mkey in mult_key(key) }
extended
>> {0:'Hi', 1:'Hello', 2:'Hi', 3:'Hello', 4:'Hi', 5:'Hello'}
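To tie mult_key to an arbitrary multiplier instead of hard-coding key+2 and key+4, one option is to offset each key by multiples of the dictionary's length. This sketch assumes the keys are 0..len(dic)-1; with other keys the offsets can collide, which is exactly the duplicate-key risk the comment above warns about.

```python
dic = {0: 'Hi', 1: 'Hello'}
multiplier = 3

# Offset each key by 0, len(dic), 2 * len(dic), ...
mult_key = lambda key: [key + i * len(dic) for i in range(multiplier)]

extended = {mkey: dic[key] for key in dic for mkey in mult_key(key)}
print(sorted(extended.items()))
# [(0, 'Hi'), (1, 'Hello'), (2, 'Hi'), (3, 'Hello'), (4, 'Hi'), (5, 'Hello')]
```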
You don't need to extend anything, you need to pick a better input format or a more appropriate type.
As others have mentioned, you need a list, not an extended dict or OrderedDict. Here's an example with lines.txt:
1:Hello!
0: Hi.
2: pylang
And here's a way to parse the lines in the correct order:
def extract_number_and_text(line):
    number, text = line.split(':', 1)  # split only on the first colon, so text may contain ':'
    return (int(number), text.strip())
with open('lines.txt') as f:
    lines = f.readlines()
data = [extract_number_and_text(line) for line in lines]
print(data)
# [(1, 'Hello!'), (0, 'Hi.'), (2, 'pylang')]
sorted_text = [text for i,text in sorted(data)]
print(sorted_text)
# ['Hi.', 'Hello!', 'pylang']
print(sorted_text * 2)
# ['Hi.', 'Hello!', 'pylang', 'Hi.', 'Hello!', 'pylang']
print(list(enumerate(sorted_text * 2)))
# [(0, 'Hi.'), (1, 'Hello!'), (2, 'pylang'), (3, 'Hi.'), (4, 'Hello!'), (5, 'pylang')]
A homework assignment asks us to write some functions, namely orSearch and andSearch.
"""
Input: an inverse index, as created by makeInverseIndex, and a list of words to query
Output: the set of document ids that contain _any_ of the specified words
Feel free to use a loop instead of a comprehension.
>>> idx = makeInverseIndex(['Johann Sebastian Bach', 'Johannes Brahms', 'Johann Strauss the Younger', 'Johann Strauss the Elder', ' Johann Christian Bach', 'Carl Philipp Emanuel Bach'])
>>> orSearch(idx, ['Bach','the'])
{0, 2, 3, 4, 5}
>>> orSearch(idx, ['Johann', 'Carl'])
{0, 2, 3, 4, 5}
"""
Above is the documentation for orSearch. Similarly, andSearch should return only the set of documents that contain all of the words in the query list.
We can assume that the inverse index has already been provided. An example of an inverse index for ['hello world','hello','hello cat','hellolot of cats'] is {'hello': {0, 1, 2}, 'cat': {2}, 'of': {3}, 'world': {0}, 'cats': {3}, 'hellolot': {3}}
So my question is this: I was able to write a single-line comprehension for orSearch:
def orSearch(inverseIndex, query):
    return {index for word in query if word in inverseIndex.keys() for index in inverseIndex[word]}
But I can't think of the most Pythonic way to write andSearch. I have written the following code; it works, but I suspect it isn't very Pythonic:
def andSearch(inverseIndex, query):
    if len(query) != 0:
        result = inverseIndex[query[0]]
    else:
        result = set()
    for word in query:
        if word in inverseIndex.keys():
            result = result & inverseIndex[word]
    return result
Any suggestions for more compact code for andSearch?
Rewrite orSearch() to use any() to find any of the terms, and then derive andSearch() by modifying your solution to use all() instead to find all of the terms.
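Here is a sketch of that suggestion. The any()/all() form tests each document against the query, so it needs the set of all document ids; this sketch derives it from the index itself. make_inverse_index is a hypothetical stand-in for the assignment's makeInverseIndex (it splits on whitespace), and the snake_case search functions are illustrative names.

```python
def make_inverse_index(docs):
    # Hypothetical stand-in for makeInverseIndex: word -> set of doc ids.
    index = {}
    for doc_id, doc in enumerate(docs):
        for word in doc.split():
            index.setdefault(word, set()).add(doc_id)
    return index

def all_doc_ids(inverse_index):
    # Every document id that appears anywhere in the index.
    return set().union(*inverse_index.values())

def or_search(inverse_index, query):
    return {d for d in all_doc_ids(inverse_index)
            if any(d in inverse_index.get(w, set()) for w in query)}

def and_search(inverse_index, query):
    return {d for d in all_doc_ids(inverse_index)
            if all(d in inverse_index.get(w, set()) for w in query)}

idx = make_inverse_index(['Johann Sebastian Bach', 'Johannes Brahms',
                          'Johann Strauss the Younger', 'Johann Strauss the Elder',
                          ' Johann Christian Bach', 'Carl Philipp Emanuel Bach'])
print(or_search(idx, ['Bach', 'the']))      # {0, 2, 3, 4, 5}
print(and_search(idx, ['Johann', 'Bach']))  # {0, 4}
```

One caveat: all() over an empty query is True, so and_search(idx, []) returns every document; decide whether that is what you want.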
A more Pythonic way to write andSearch() would be:
from functools import reduce
def andSearch(inverseIndex, query):
    return reduce(lambda x, y: x & y, [inverseIndex[key] for key in query])
Here reduce combines the intermediate results, intersecting the sets pairwise.
It may also be useful to check that all items of query are in inverseIndex. Then our function would look like this:
from functools import reduce
def andSearch(inverseIndex, query):
    if set(query) <= set(inverseIndex.keys()):  # every query word is indexed
        return reduce(lambda x, y: x & y, [inverseIndex[key] for key in query])
    else:
        return False  # or whatever is meaningful to return
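An alternative sketch uses set.intersection, which accepts any number of sets, together with dict.get, so a word missing from the index contributes an empty set (making the result empty) instead of raising KeyError. It returns an empty set for an empty query, because set.intersection needs at least one argument.

```python
def and_search(inverse_index, query):
    """Return the ids of documents containing every word in query."""
    if not query:
        return set()
    # Unknown words map to an empty set, so they empty the result.
    return set.intersection(*(inverse_index.get(w, set()) for w in query))

idx = {'hello': {0, 1, 2}, 'cat': {2}, 'of': {3}, 'world': {0},
       'cats': {3}, 'hellolot': {3}}
print(and_search(idx, ['hello', 'cat']))  # {2}
print(and_search(idx, ['hello', 'dog']))  # set()
```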
I've read about LSH (locality-sensitive hashing) and am wondering: what is the best implementation for matching strings within 1 character?
test = {'dog':1, 'cat': 2, 'eagle': 3}
test['dog']
>> 1
I would want to also return 1 if I lookup test['dogs'] or test['dogg']. I realize that it would also return 1 if I were to look up "log" or "cog", but I can write a method to exclude those results.
Also, how can I extend this method to general strings, so that it returns a match within X characters?
string1 = "brown dogs"
string2 = "brown doggie"
Assuming only string1 is stored in my dictionary, a lookup for string2 would return string1.
Thanks
Well, you can define the similarity between 2 strings as the length of the prefix they share (3 for "doga" and "dogs", for instance). This is simplistic, but it could fit your needs.
With this assumption, you can define this:
>>> test = {'dog': 1, 'cat': 2, 'eagle': 3}
>>> def same_start(s1, s2):
...     ret = 0
...     for i in range(min(len(s1), len(s2))):
...         if s1[i] != s2[i]:
...             break
...         ret += 1
...     return ret
...
>>> def closest_match(s):
...     return max(((k, v, same_start(k, s)) for k, v in test.items()), key=lambda x: x[2])[1]
...
>>> closest_match('dogs')  # matches dog
1
>>> closest_match('cogs')  # matches cat
2
>>> closest_match('eaogs')  # matches eagle
3
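For the "within X characters" part of the question, a common measure (swapped in here, not what the answer above uses) is Levenshtein edit distance, which counts insertions, deletions and substitutions. This sketch scans every key, so it is fine for small dicts but is not LSH and will not scale; within_distance and max_distance are hypothetical names.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# within_distance is a hypothetical helper name.
def within_distance(d, s, max_distance=1):
    """Return the values of all keys within max_distance edits of s."""
    return [v for k, v in d.items() if levenshtein(k, s) <= max_distance]

test = {'dog': 1, 'cat': 2, 'eagle': 3}
print(within_distance(test, 'dogs'))  # [1]
print(within_distance(test, 'dogg'))  # [1]
print(within_distance({'brown dogs': 1}, 'brown doggie', max_distance=3))  # [1]
```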
Maybe you could try using a Soundex function as your dictionary key?
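Here is a minimal sketch of that idea using a simplified implementation of American Soundex: keep the first letter, map the remaining consonants to digits, collapse adjacent duplicates, and pad to four characters. It assumes non-empty alphabetic keys. Note that Soundex groups words by sound rather than by character count, so it gives no "within X characters" guarantee, and two keys with the same code would collide in the lookup dict.

```python
# Digit for each consonant group; vowels, y, h and w have no code.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}

def soundex(word):
    """Simplified American Soundex code, e.g. 'Robert' -> 'R163'."""
    word = word.lower()
    result = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return (result + "000")[:4]

test = {'dog': 1, 'cat': 2, 'eagle': 3}
by_sound = {soundex(k): v for k, v in test.items()}
print(by_sound[soundex('dogs')])  # 1
print(by_sound[soundex('dogg')])  # 1
```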
Since your relation is not 1:1, maybe you could define your own dict type with an overridden __getitem__ that returns a list of possible items. Here's what I mean:
class MyDict(dict):
    def __getitem__(self, key):
        l = []
        for k, v in self.items():
            if key.startswith(k):  # or some other comparison method
                l.append(v)
        return l
This is just an idea; other dict methods should probably be overridden too, to avoid possible errors or infinite loops. Also, @Emmanuel's answer could be very useful here if you want only one item returned instead of a list, and that way you wouldn't have to redefine everything.
In my application I am receiving a string 'abc[0]=123'.
I want to convert this string to an array of items. I have tried eval(), but it didn't work for me. I know the array name is abc, but the number of items will be different each time.
I can split the string, get the array index, and do it that way. But I would like to know if there is a direct way to convert this string into an array insertion.
I would greatly appreciate any suggestions.
Are you looking for something like this?
In [36]: s = "abc[0]=123"
In [37]: vars()[s[:3]] = []
In [38]: vars()[s[:3]].append(eval(s[s.find('=') + 1:]))
In [39]: abc
Out[39]: [123]
But this is not a good way to create variables.
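A safer sketch that avoids both eval() and dynamically created variables parses the string with a regular expression and stores the result in a dictionary of lists. It assumes the input is always name[index]=value with a non-negative integer index; parse_assignment and arrays are hypothetical names, and values stay strings (apply int() if you need numbers).

```python
import re

# name[index]=value, e.g. "abc[0]=123"
_ASSIGNMENT = re.compile(r'^(\w+)\[(\d+)\]=(.*)$')

# parse_assignment is a hypothetical helper name.
def parse_assignment(s, arrays):
    """Store the value from s at arrays[name][index], growing the list as needed."""
    match = _ASSIGNMENT.match(s)
    if match is None:
        raise ValueError('unrecognised assignment: %r' % s)
    name, index, value = match.group(1), int(match.group(2)), match.group(3)
    items = arrays.setdefault(name, [])
    if index >= len(items):
        items.extend([None] * (index + 1 - len(items)))  # pad any gaps with None
    items[index] = value
    return arrays

arrays = {}
parse_assignment('abc[0]=123', arrays)
parse_assignment('abc[1]=345', arrays)
print(arrays)  # {'abc': ['123', '345']}
```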
Here's a function for parsing URL query strings according to PHP rules (i.e. using square brackets to create arrays or nested structures):
import re
from urllib.parse import parse_qsl  # Python 3; in Python 2 this was urlparse.parse_qsl
def parse_qs_as_php(qs):
    def sint(x):
        try:
            return int(x)
        except ValueError:
            return x
    def nested(rest, base, val):
        curr, rest = base, re.findall(r'\[(.*?)\]', rest)
        while rest:
            curr = curr.setdefault(
                sint(rest.pop(0) or len(curr)),
                {} if rest else val)
        return base
    def dtol(d):
        if not hasattr(d, 'items'):
            return d
        # Dicts keyed exactly 0..n-1 become lists; the int check avoids comparing str and int keys.
        if all(isinstance(k, int) for k in d) and sorted(d) == list(range(len(d))):
            return [dtol(d[x]) for x in range(len(d))]
        return {k: dtol(v) for k, v in d.items()}
    r = {}
    for key, val in parse_qsl(qs):
        name, rest = re.match(r'^(\w+)(.*)$', key).groups()
        r[name] = nested(rest, r.get(name, {}), val) if rest else val
    return dtol(r)
Example:
qs = 'one=1&abc[0]=123&abc[1]=345&foo[bar][baz]=555'
print(parse_qs_as_php(qs))
# {'one': '1', 'abc': ['123', '345'], 'foo': {'bar': {'baz': '555'}}}
Your other application is doing it wrong. It should not be specifying index values in the parameter keys. The correct way to specify multiple values for a single key in a GET is to simply repeat the key:
http://my_url?abc=123&abc=456
The Python server side should correctly resolve this into a dictionary-like object: you don't say what framework you're running, but for instance Django uses a QueryDict which you can then access using request.GET.getlist('abc') which will return ['123', '456']. Other frameworks will be similar.
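Outside a framework, the standard library behaves the same way: urllib.parse.parse_qs (Python 3) collects repeated keys into lists. A minimal example using the URL above:

```python
from urllib.parse import parse_qs, urlsplit

url = 'http://my_url?abc=123&abc=456'
params = parse_qs(urlsplit(url).query)  # repeated keys become lists
print(params)         # {'abc': ['123', '456']}
print(params['abc'])  # ['123', '456']
```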