Extending python dictionary and changing key values - python

Assume I have a python dictionary with 2 keys.
dic = {0:'Hi!', 1:'Hello!'}
What I want to do is to extend this dictionary by duplicating itself, but change the key value.
For example, if I have this code:
dic = {0:'Hi!', 1:'Hello'}
multiplier = 3
def DictionaryExtend(number_of_multiplier, dictionary):
    "Function code"
then the result should look like
>>> DictionaryExtend(multiplier, dic)
>>> dic
{0:'Hi!', 1:'Hello', 2:'Hi!', 3:'Hello', 4:'Hi!', 5:'Hello'}
In this case, I changed the key values by adding the multiplier at each duplication step. What's an efficient way of doing this?
Plus, I'm also planning to do the same job for a list variable. I mean, extend a list by duplicating itself and change some values like in the above example. Any suggestion for this would be helpful, too!

You can try itertools to repeat the values and OrderedDict to maintain input order.
import itertools as it
import collections as ct
def extend_dict(multiplier, dict_):
    """Return a dictionary of repeated values."""
    return dict(enumerate(it.chain(*it.repeat(dict_.values(), multiplier))))
d = ct.OrderedDict({0:'Hi!', 1:'Hello!'})
multiplier = 3
extend_dict(multiplier, d)
# {0: 'Hi!', 1: 'Hello!', 2: 'Hi!', 3: 'Hello!', 4: 'Hi!', 5: 'Hello!'}
Regarding handling other collection types, it is not clear what output is desired, but the following modification reproduces the latter and works for lists as well:
def extend_collection(multiplier, iterable):
    """Return a collection of repeated values."""
    repeat_values = lambda x: it.chain(*it.repeat(x, multiplier))
    try:
        iterable = iterable.values()
    except AttributeError:
        result = list(repeat_values(iterable))
    else:
        result = dict(enumerate(repeat_values(iterable)))
    return result
lst = ['Hi!', 'Hello!']
multiplier = 3
extend_collection(multiplier, lst)
# ['Hi!', 'Hello!', 'Hi!', 'Hello!', 'Hi!', 'Hello!']

It's not immediately clear why you might want to do this. If the keys are always consecutive integers then you probably just want a list.
Anyway, here's a snippet:
def dictExtender(multiplier, d):
    return dict(zip(range(multiplier * len(d)), list(d.values()) * multiplier))
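The question calls the function and then inspects dic, which suggests the dictionary should be modified in place, whereas the snippets above return a new one. A minimal sketch of an in-place variant follows, assuming the keys are the consecutive integers 0..len(d)-1 as in the question; extend_in_place is a hypothetical name, not something from the original posts.

```python
def extend_in_place(multiplier, d):
    """Add shifted copies of d's items to d itself (hypothetical helper).

    Assumes the keys are consecutive integers starting at 0.
    """
    base = list(d.items())  # snapshot, so we never iterate over d while mutating it
    size = len(base)
    for step in range(1, multiplier):
        for key, value in base:
            d[key + step * size] = value  # shift each key by a whole copy

dic = {0: 'Hi!', 1: 'Hello'}
extend_in_place(3, dic)
print(dic)
# {0: 'Hi!', 1: 'Hello', 2: 'Hi!', 3: 'Hello', 4: 'Hi!', 5: 'Hello'}
```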

I don't think you need to use inheritance to achieve that. It's also unclear what the keys should be in the resulting dictionary.
If the keys are always consecutive integers, then why not use a list?
origin = ['Hi', 'Hello']
extended = origin * 3
extended
>> ['Hi', 'Hello', 'Hi', 'Hello', 'Hi', 'Hello']
extended[4]
>> 'Hi'
If you want to perform a different operation with the keys, then simply:
mult_key = lambda key: [key,key+2,key+4] # just an example, this can be any custom implementation but beware of duplicate keys
dic = {0:'Hi', 1:'Hello'}
extended = { mkey:dic[key] for key in dic for mkey in mult_key(key) }
extended
>> {0:'Hi', 1:'Hello', 2:'Hi', 3:'Hello', 4:'Hi', 5:'Hello'}

You don't need to extend anything, you need to pick a better input format or a more appropriate type.
As others have mentioned, you need a list, not an extended dict or OrderedDict. Here's an example with lines.txt:
1:Hello!
0: Hi.
2: pylang
And here's a way to parse the lines in the correct order:
def extract_number_and_text(line):
    number, text = line.split(':')
    return (int(number), text.strip())
with open('lines.txt') as f:
    lines = f.readlines()
    data = [extract_number_and_text(line) for line in lines]
print(data)
# [(1, 'Hello!'), (0, 'Hi.'), (2, 'pylang')]
sorted_text = [text for i,text in sorted(data)]
print(sorted_text)
# ['Hi.', 'Hello!', 'pylang']
print(sorted_text * 2)
# ['Hi.', 'Hello!', 'pylang', 'Hi.', 'Hello!', 'pylang']
print(list(enumerate(sorted_text * 2)))
# [(0, 'Hi.'), (1, 'Hello!'), (2, 'pylang'), (3, 'Hi.'), (4, 'Hello!'), (5, 'pylang')]

Related

Handle dictionary collision in python3

I currently have the code below working fine:
Can someone help me solve the collision created from having two keys with the same number in the dictionary?
I tried multiple approaches (not listed here) to create an array to handle it, but they were all unsuccessful.
I am using Python 3.7.
def find_key(dic1, n):
    '''
    Return the key '3' from the dict
    below.
    '''
    d = {}
    for x, y in dic1.items():
        # swap keys and values
        # and update the result to 'd'
        d[y] = x
    try:
        if n in d:
            # look up n itself, not the last value seen in the loop
            return d[n]
    except Exception as e:
        return (e)
dic1 = {'james':2,'david':3}
# Test case that produces a 'collision':
# comment 'dic1' above and replace it by
# dic1 below to create a 'collision'
# dic1 = {'james':2,'david':3, 'sandra':3}
n = 3
print(find_key(dic1,n))
Any help would be much appreciated.
You know there should be multiple returns, so plan for that in advance.
def find_keys_for_value(d, value):
    for k, v in d.items():
        if v == value:
            yield k
data = {'james': 2, 'david': 3, 'sandra':3}
for result in find_keys_for_value(data, 3):
    print(result)
You can use a defaultdict:
from collections import defaultdict
def find_key(dct, n):
    dd = defaultdict(list)
    for x, y in dct.items():
        dd[y].append(x)
    return dd[n]
dic1 = {'james':2, 'david':3, 'sandra':3}
print(find_key(dic1, 3))
print(find_key(dic1, 2))
print(find_key(dic1, 1))
Output:
['david', 'sandra']
['james']
[]
Building a defaultdict from all keys and values is only justified if you will repeatedly search the same dict for different values, though. Otherwise, Kenny Ostrom's approach is preferable. As written, the function above rebuilds the whole defaultdict on every call, so it gains nothing over a plain scan.
If you are not at ease with generators and yield, here is the approach of Kenny Ostrom translated to lists (less efficient than generators, better than the above for one-shot searches):
def find_key(dct, n):
    return [x for x, y in dct.items() if y == n]
The output is the same as above.
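For the repeated-lookup case mentioned above, the value-to-keys index only pays off if it is built once and then reused. Here is a small sketch of that split; build_index is a hypothetical helper name, and .get is used for lookups so that missing values do not add empty lists to the index.

```python
from collections import defaultdict

def build_index(dct):
    """Map each value to the list of keys that carry it (hypothetical helper)."""
    index = defaultdict(list)
    for key, value in dct.items():
        index[value].append(key)
    return index

dic1 = {'james': 2, 'david': 3, 'sandra': 3}
index = build_index(dic1)   # built once, reused for every lookup below
print(index.get(3, []))     # ['david', 'sandra']
print(index.get(2, []))     # ['james']
print(index.get(1, []))     # []
```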

How to structure dictionary to apply to function with enumerate

I am trying to rebuild a simple function that asks for a dictionary as input. No matter what I try, I cannot figure out a minimum working example of a dictionary to pass to this function. I've read up on dictionaries and there is not much room to create one differently, so I do not know what the problem is.
I've tried to apply following minimum dictionary examples:
import nltk
#Different dictionaries to try as minimum working examples:
comments1 = {1 : 'Rockies', 2: 'Red Sox'}
comments2 = {'key1' : 'Rockies', 'key2': 'Red Sox'}
comments3 = dict([(1, 3), (2, 3)])
#Function:
def tokenize_body(comments):
    tokens = {}
    for idx, com_id in enumerate(comments):
        body = comments[com_id]['body']
        tokenized = [x.lower() for x in nltk.word_tokenize(body)]
        tokens[com_id] = tokenized
    return tokens
tokens = tokenize_body(comments1)
I know that with enumerate I am basically getting the index and the key, but I cannot figure out how to get the 'body', i.e. the strings that I want to tokenize.
For both comments1 and comments2 with strings as inputs I receive the error: TypeError: string indices must be integers.
If I apply integers instead of strings, comments3, I receive the error:
TypeError: 'int' object is not subscriptable.
This may seem trivial to you, but I can not figure out what I am doing wrong. If you could provide a minimum working example, that would be highly appreciated.
In order to loop through a dictionary in python, you need to use the items method to get both keys and values:
comments = {"key1": "word", "key2": "word2"}
def tokenize_body(comments):
    tokens = {}
    for key, value in comments.items():
        # values - word, word2
        # keys - key1, key2
        tokens[key] = [x.lower() for x in nltk.word_tokenize(value)]
    return tokens
enumerate works on any iterable and is typically used with lists, in order to get the index of each element:
l = ['a', 'b']
for index, elm in enumerate(l):
    print(index) # => 0, 1
You might be looking for .items(), e.g.:
for idx, item in enumerate(comments1.items()):
    print(idx, item)
This will print
0 (1, 'Rockies')
1 (2, 'Red Sox')
See a demo on ideone.com.
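If the original function is to run unchanged, each value in the dictionary has to be a mapping with a 'body' key, because the loop indexes comments[com_id]['body']. A minimal sketch of such an input follows. Note the assumptions: str.split stands in for nltk.word_tokenize so the snippet runs without NLTK data, the tokenize parameter is added only for that purpose, and the comment texts are made up.

```python
def tokenize_body(comments, tokenize=str.split):
    """Tokenize the 'body' of each comment.

    `tokenize` is a stand-in for nltk.word_tokenize (an assumption of this sketch).
    """
    tokens = {}
    for com_id in comments:  # iterating over a dict yields its keys
        body = comments[com_id]['body']
        tokens[com_id] = [x.lower() for x in tokenize(body)]
    return tokens

# Each value is itself a dict carrying the text under 'body'.
comments = {
    1: {'body': 'Go Rockies'},
    2: {'body': 'Red Sox win'},
}
tokens = tokenize_body(comments)
print(tokens)
# {1: ['go', 'rockies'], 2: ['red', 'sox', 'win']}
```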

Keeping the order of an OrderedDict

I have an OrderedDict that I'm passing to a function. Somewhere in the function it changes the ordering, though I'm not sure why and am trying to debug it. Here is the function, followed by an example call and its output:
def unnest_data(data):
    path_prefix = ''
    UNNESTED = OrderedDict()
    list_of_subdata = [(data, ''),] # (data, prefix)
    while list_of_subdata:
        for subdata, path_prefix in list_of_subdata:
            for key, value in subdata.items():
                path = (path_prefix + '.' + key).lstrip('.').replace('.[', '[')
                if not (isinstance(value, (list, dict))):
                    UNNESTED[path] = value
                elif isinstance(value, dict):
                    list_of_subdata.append((value, path))
                elif isinstance(value, list):
                    list_of_subdata.extend([(_, path) for _ in value])
            list_of_subdata.remove((subdata, path_prefix))
        if not list_of_subdata: break
    return UNNESTED
Then, if I call it:
from collections import OrderedDict
data = OrderedDict([('Item', OrderedDict([('[#ID]', '288917'), ('Main', OrderedDict([('Platform', 'iTunes'), ('PlatformID', '353736518')])), ('Genres', OrderedDict([('Genre', [OrderedDict([('[#FacebookID]', '6003161475030'), ('Value', 'Comedy')]), OrderedDict([('[#FacebookID]', '6003172932634'), ('Value', 'TV-Show')])])]))]))])
unnest_data(data)
I get an OrderedDict that doesn't match the ordering of my original one:
OrderedDict([('Item[#ID]', '288917'), ('Item.Genres.Genre[#FacebookID]', ['6003172932634', '6003161475030']), ('Item.Genres.Genre.Value', ['TV-Show', 'Comedy']), ('Item.Main.Platform', 'iTunes'), ('Item.Main.PlatformID', '353736518')])
Notice how it has "Genre" before "PlatformID", which is not the way it was sorted in the original dict. What seems to be my error here and how would I fix it?
It’s hard to say exactly what’s wrong without a complete working example. But based on the code you’ve shown, I suspect your problem isn’t with OrderedDict at all, but rather that you’re modifying list_of_subdata while iterating through it, which will result in items being unexpectedly skipped.
>>> a = [1, 2, 3, 4, 5, 6, 7]
>>> for x in a:
...     print(x)
...     a.remove(x)
...
1
3
5
7
Given your use, consider a deque instead of a list.
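As a rough sketch of the deque suggestion: the work queue is consumed with popleft(), so nothing is removed from a container while a for loop is iterating over it. This is my own reworking of the question's function, not the asker's code. It assumes list items are dicts, and it visits the data breadth-first, so nested values land after the scalar values of their parent level; a stack or a recursive walk would be needed to follow the original document order exactly.

```python
from collections import OrderedDict, deque

def unnest_data(data):
    """Flatten nested dicts/lists into dotted paths, breadth-first."""
    unnested = OrderedDict()
    queue = deque([(data, '')])  # entries are (subdata, path_prefix)
    while queue:
        subdata, prefix = queue.popleft()  # take the oldest entry off the front
        for key, value in subdata.items():
            path = (prefix + '.' + key).lstrip('.').replace('.[', '[')
            if isinstance(value, dict):
                queue.append((value, path))
            elif isinstance(value, list):
                # assumption: every list item is itself a dict
                queue.extend((item, path) for item in value)
            else:
                unnested[path] = value
    return unnested

data = OrderedDict([('a', 1), ('b', OrderedDict([('c', 2)])), ('d', 3)])
result = unnest_data(data)
print(list(result.items()))
# [('a', 1), ('d', 3), ('b.c', 2)]
```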

How to reduce on a list of tuples in python

I have an array and I want to count the occurrence of each item in the array.
I have managed to use a map function to produce a list of tuples.
def mapper(a):
    return (a, 1)
r = list(map(lambda a: mapper(a), arr))
# output example:
# (11817685, 1), (2014036792, 1), (2014047115, 1), (11817685, 1)
I'm expecting the reduce function can help me to group counts by the first number (id) in each tuple. For example:
(11817685, 2), (2014036792, 1), (2014047115, 1)
I tried
cnt = reduce(lambda a, b: a + b, r)
and some other ways, but none of them do the trick.
NOTE
Thanks for all the advice on other ways to solve the problems, but I'm just learning Python and how to implement a map-reduce here, and I have simplified my real business problem a lot to make it easy to understand, so please kindly show me a correct way of doing map-reduce.
You could use Counter:
from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
counter = Counter(arr)
print(list(zip(counter.keys(), counter.values())))
EDIT:
As pointed out by @ShadowRanger, Counter has an items() method:
from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
print(list(Counter(arr).items()))
Instead of importing a module, you can do the counting yourself with a plain dict:
arr = [11817685, 2014036792, 2014047115, 11817685]
track = {}
for intr in arr:
    if intr not in track:
        track[intr] = 1
    else:
        track[intr] += 1
There is a general pattern for this kind of list problem.
Suppose you have a list:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
And you want to convert it to a dict that uses the first element of each tuple as the key and the second element as the value, something like:
{2008: [9], 2006: [5], 2007: [4]}
But there is a catch: some tuples, like (2006,1) and (2006,5), share the same key but have different values. You want all of those values collected under the one key, so the expected output is:
{2008: [9], 2006: [1, 5], 2007: [4]}
For this type of problem, first create a new dict, then follow this pattern:
if item[0] not in new_dict:
    new_dict[item[0]] = [item[1]]
else:
    new_dict[item[0]].append(item[1])
So we first check whether the key is already in the new dict. If it is not, we start a new list for it; if it is, we append the duplicate key's value to the existing list.
Full code:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
new_dict={}
for item in a:
    if item[0] not in new_dict:
        new_dict[item[0]] = [item[1]]
    else:
        new_dict[item[0]].append(item[1])
print(new_dict)
output:
{2008: [9], 2006: [1, 5], 2007: [4]}
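The same check-then-append pattern can be condensed with dict.setdefault, which returns the existing list for a key or inserts and returns a fresh one. This is only a compact variant of the loop above, not a different algorithm; the printed order shown assumes Python 3.7+, where dicts keep insertion order.

```python
a = [(2006, 1), (2007, 4), (2008, 9), (2006, 5)]
new_dict = {}
for key, value in a:
    # setdefault returns the list already stored under key,
    # or stores [] first if the key is new
    new_dict.setdefault(key, []).append(value)
print(new_dict)
# {2006: [1, 5], 2007: [4], 2008: [9]}
```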
After writing my answer to a different question, I remembered this post and thought it would be helpful to write a similar answer here.
Here is a way to use reduce on your list to get the desired output.
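from functools import reduce  # reduce is no longer a builtin in Python 3

arr = [11817685, 2014036792, 2014047115, 11817685]
def mapper(a):
    return (a, 1)
def reducer(x, y):
    if isinstance(x, dict):
        # x is the running dict of counts; fold the pair y into it
        ykey, yval = y
        if ykey not in x:
            x[ykey] = yval
        else:
            x[ykey] += yval
        return x
    else:
        # first call: x and y are both (key, count) pairs
        xkey, xval = x
        ykey, yval = y
        a = {xkey: xval}
        if ykey in a:
            a[ykey] += yval
        else:
            a[ykey] = yval
        return a
mapred = reduce(reducer, map(mapper, arr))
print(list(mapred.items()))
Which prints (on Python 3.7+, where dicts keep insertion order):
[(11817685, 2), (2014036792, 1), (2014047115, 1)]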
Please see the linked answer for a more detailed explanation.
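One possible simplification, offered as a sketch rather than as the linked answer's method: passing an empty dict as the third argument to reduce means the accumulator is always a dict, so the reducer needs no isinstance branch and also behaves sensibly for a one-element or empty list.

```python
from functools import reduce

def mapper(a):
    return (a, 1)

def reducer(counts, pair):
    """Fold one (key, count) pair into the running dict of totals."""
    key, val = pair
    counts[key] = counts.get(key, 0) + val
    return counts

arr = [11817685, 2014036792, 2014047115, 11817685]
mapred = reduce(reducer, map(mapper, arr), {})  # {} is the initial accumulator
print(list(mapred.items()))
# [(11817685, 2), (2014036792, 1), (2014047115, 1)]
```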
If all you need is cnt, then a dict would probably be better than a list of tuples here (if you need this format, just use dict.items).
The collections module has a useful data structure for this, a defaultdict.
from collections import defaultdict
cnt = defaultdict(int)  # create a default dict where the default value is
                        # the result of calling int
for key in arr:
    cnt[key] += 1  # if key is not in cnt, it will put in the default
# cnt_list = list(cnt.items())

Map list onto dictionary

Is there a way to map a list onto a dictionary? What I want to do is give it a function that will return the name of a key, and the value will be the original value. For example:
somefunction(lambda a: a[0], ["hello", "world"])
=> {"h":"hello", "w":"world"}
(This isn't a specific example that I want to do, I want a generic function like map() that can do this)
In Python 3 (and Python 2.7) you can use this dictionary comprehension syntax:
def foo(somelist):
    return {x[0]:x for x in somelist}
I don't think a standard function exists that does exactly that, but it's very easy to construct one using the dict builtin and a comprehension:
def somefunction(keyFunction, values):
    return dict((keyFunction(v), v) for v in values)
print(somefunction(lambda a: a[0], ["hello", "world"]))
Output:
{'h': 'hello', 'w': 'world'}
But coming up with a good name for this function is more difficult than implementing it. I'll leave that as an exercise for the reader.
If I understand your question correctly, I believe you can accomplish this with a combination of map, zip, and the dict constructor:
def dictMap(f, xs):
    return dict(zip(map(f, xs), xs))
And a saner implementation:
def dictMap(f, xs):
    return dict((f(i), i) for i in xs)
Taking hints from other answers I achieved this using map operation. I am not sure if this exactly answers your question.
mylist = ["hello", "world"]
def convert_to_dict(somelist):
    return dict(map(lambda x: (x[0], x), somelist))
final_ans = convert_to_dict(mylist)
print(final_ans)
If you want a general function to do this, then you're asking almost the right question. Your example doesn't specify what happens when the key function produces duplicates, though. Do you keep the last one? The first one? Do you actually want to make a list of all the words that start with the same letter? These questions are probably best answered by the user of the function, not the designer.
Parametrizing over these results in a more complicated, but very general, function. Here's one that I've used for several years:
def reduce_list(key, update_value, default_value, l):
    """Reduce a list to a dict.
    key :: list_item -> dict_key
    update_value :: list_item * existing_value -> updated_value
    default_value :: initial value passed to update_value
    l :: The list
    default_value comes before l. This is different from functools.reduce,
    because functools.reduce's order is wrong.
    """
    d = {}
    for k in l:
        j = key(k)
        d[j] = update_value(k, d.get(j, default_value))
    return d
Then you can write your function by saying:
reduce_list(lambda s: s[0], lambda s, old: s, '', ['hello', 'world'])
# OR
reduce_list(lambda s: s[0], lambda s, old: old or s, '', ['hello', 'world'])
Depending on whether you want to keep the last or the first word starting with, for example, 'h'.
This function is very general, though, so most of the time it's the basis for other functions, like group_dict or histogram:
def group_dict(l):
    return reduce_list(lambda x:x, lambda x,old: [x] + old, [], l)
def histogram(l):
    return reduce_list(lambda x:x, lambda x,total: total + 1, 0, l)
>>> dict((a[0], a) for a in "hello world".split())
{'h': 'hello', 'w': 'world'}
If you want to use a function instead of subscripting, use operator.itemgetter:
>>> from operator import itemgetter
>>> first = itemgetter(0)
>>> dict((first(x), x) for x in "hello world".split())
{'h': 'hello', 'w': 'world'}
Or as a function:
>>> dpair = lambda x : (first(x), x)
>>> dict(dpair(x) for x in "hello world".split())
{'h': 'hello', 'w': 'world'}
Finally, if you want more than one word per letter as a possibility, use collections.defaultdict
>>> from collections import defaultdict
>>> words = defaultdict(set)
>>> addword = lambda x : words[first(x)].add(x)
>>> for word in "hello house home hum world wry wraught".split():
...     addword(word)
...
>>> sorted(words['h'])
['hello', 'home', 'house', 'hum']
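Pulling the pieces above together, a generic map()-like helper that keeps every value per key might look like the following. This is a sketch under my own naming (group_by is hypothetical); it uses lists rather than sets so that duplicates and input order are preserved.

```python
from collections import defaultdict

def group_by(key_func, values):
    """Return {key_func(v): [v, ...]} with values in input order (hypothetical helper)."""
    groups = defaultdict(list)
    for v in values:
        groups[key_func(v)].append(v)
    return dict(groups)  # plain dict, so missing keys raise KeyError as usual

print(group_by(lambda w: w[0], ["hello", "house", "world"]))
# {'h': ['hello', 'house'], 'w': ['world']}
```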
