I have the following lists:
keys = ['god', 'hel', 'helo']
values = ['good','god', 'hell', 'hello']
I want to create a dictionary like this:
{'god':set(['god', 'good']), 'hel':'hell', 'helo': 'hello'}
where the key is determined by reducing repeated letters in the value to a single letter.
How would I do this programmatically?
"all repeated letters are reduced to single letters"
Actually according to this rule you don't need the keys list, because it will be created from the values.
I would also suggest using a dict of sets for all values, including the single ones such as "hell" and "hello". It makes the dictionary much simpler to use:
import itertools as it
values = ['good','god', 'hell', 'hello']
d = {}
for value in values:
    d.setdefault(''.join(k for k, v in it.groupby(value)), set()).add(value)
# d == {'god': set(['god', 'good']),
# 'hel': set(['hell']),
# 'helo': set(['hello'])}
This should do it for you:
import re
import collections
values = ['good', 'god', 'hell', 'hello']
result = collections.defaultdict(set)
for value in values:
    key = re.sub(r'(\w)\1*', r'\1', value)
    result[key].add(value)
# result: defaultdict(<type 'set'>, {'hel': set(['hell']), 'god': set(['god', 'good']), 'helo': set(['hello'])})
# if you want to ensure that all your keys exist in the dictionary
keys = ['god', 'hel', 'helo', 'bob']
for key in keys:
    result[key]
# result: defaultdict(<type 'set'>, {'hel': set(['hell']), 'god': set(['god', 'good']), 'helo': set(['hello']), 'bob': set([])})
Some code golf (sort of - obviously more obfuscation is possible) upon eumiro's answer, observing that itertools.groupby can be used twice (once to get the letter-sets in order of appearance, something I didn't think of - and again to actually create the key-value pairs for the dictionary).
from itertools import groupby
data = ['good', 'god', 'hell', 'hello']
dict((''.join(k), list(v)) for k, v in groupby(data, lambda x: zip(*groupby(x))[0]))
How it works: each word is first processed with lambda x: zip(*groupby(x))[0]. That is, we take the list of (letter, grouper-object) pairs produced by the groupby generator, transform it into a pair (list-of-letters, list-of-grouper-objects) (the generator contents are implicitly evaluated for passing to zip), and discard the list-of-grouper-objects which we don't want. Then, we group the entire word-list according to the list-of-letters produced by each word, transform the list of letters back into a string, evaluate the grouper-object generators to get the corresponding words, and use those key-value pairs to construct the final dict.
Edit: I guess it's cleaner to do the ''.join step within the lambda:
from itertools import groupby
data = ['good', 'god', 'hell', 'hello']
dict((k, list(v)) for k, v in groupby(data, lambda x: ''.join(zip(*groupby(x))[0])))
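The subscripted zip(...)[0] in the snippets above relies on Python 2, where zip returns a list. Here is a minimal sketch of the same idea for Python 3, assuming the word list is already ordered so that words with the same collapsed key are adjacent (the helper name collapse is made up for illustration):

```python
from itertools import groupby

def collapse(word):
    # groupby yields one (letter, run) pair per run of identical letters;
    # keeping only the letters drops the repeats.
    return ''.join(letter for letter, _ in groupby(word))

data = ['good', 'god', 'hell', 'hello']

# groupby only merges adjacent items, so equal keys must already be neighbours.
result = {key: list(words) for key, words in groupby(data, collapse)}
print(result)
# {'god': ['good', 'god'], 'hel': ['hell'], 'helo': ['hello']}
```

If the words are not already ordered that way, sorting them with key=collapse before grouping should give the same result.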
Related
I want to sort the following word pool according to occurrence of their 3-letter suffix, from most frequent to least frequent:
wordPool = ['beat','neat','food','good','mood','wood','bike','like','mike']
Expected output:
['food','good','mood','wood','bike','like','mike','beat','neat']
For simplicity, the pool contains only 4-letter words and the suffix is always 3 letters long.
(Note: If the counts are the same, then order can be arbitrary.)
You can use collections.Counter() to get the frequency of the suffixes, and then use sort() with a key parameter to sort by the generated frequencies:
from collections import Counter
suffix_counters = Counter(s[-3:] for s in wordPool)
wordPool.sort(key=lambda x: suffix_counters[x[-3:]], reverse=True)
print(wordPool)
This outputs:
['food', 'good', 'mood', 'wood', 'bike', 'like', 'mike', 'beat', 'neat']
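One thing to watch for: sorting by count alone leaves words interleaved when two different suffixes happen to have the same count, because the sort is stable and keeps their original relative order. A possible variant, assuming you want each suffix group kept together, adds the suffix itself as a tie-breaker (the function name and the shuffled input are made up for illustration):

```python
from collections import Counter

def sort_by_suffix_frequency(words, n=3):
    # Count how often each n-letter suffix occurs.
    counts = Counter(w[-n:] for w in words)
    # Sort by descending count, then by suffix so equal-count groups stay contiguous.
    return sorted(words, key=lambda w: (-counts[w[-n:]], w[-n:]))

mixed = ['bike', 'food', 'like', 'good', 'mike', 'mood']
print(sort_by_suffix_frequency(mixed))
# ['bike', 'like', 'mike', 'food', 'good', 'mood']
```

Unlike wordPool.sort(), this returns a new list and leaves the input untouched.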
Group by suffix using a dict of lists;
Sort the groups by decreasing order of size;
Join all the groups into a list.
def sorted_by_suffix_frequency(wordpool, n=3):
    groups = {}
    for w in wordpool:
        groups.setdefault(w[-n:], []).append(w)
    return [w for g in sorted(groups.values(), key=len, reverse=True) for w in g]
wordpool = ['beat','neat','food','good','mood','wood','bike','like','mike']
sorted_wordpool = sorted_by_suffix_frequency(wordpool)
print(sorted_wordpool)
# ['food', 'good', 'mood', 'wood', 'bike', 'like', 'mike', 'beat', 'neat']
Is it possible to combine all the keys which have same values inside a dictionary and swap keys with values and values with keys?
I am not sure if anything like this is even possible, but here is the example.
mydict = {'./three/failures/1.log': ['UVM_ERROR: This is one error'], './one/failures/1.log': ['UVM_ERROR: This is one error'], './two/failures/1.log': ['UVM_ERROR: This is two error']}
Expected output:
{'UVM_ERROR: This is one error': ['./three/failures/1.log', './one/failures/1.log'], 'UVM_ERROR: This is two error': ['./two/failures/1.log']}
A small hint I found for finding the keys that share a value:
>>> [k for k,v in a.items() if v == 'UVM_ERROR: This is one error']
['./one/failures/1.log', './three/failures/1.log']
Updated after trying one of the solutions:
If my dictionary doesn't have same values for any of the keys, then defaultdict doesn't work.
For example:
Dictionary : {'./three/failures/1.log': 'UVM_ERROR: This is three error', './one/failures/1.log': 'UVM_ERROR: This is one error', './two/failures/1.log': 'UVM_ERROR: This is two error'}
Output:
defaultdict(<type 'list'>, {'U': ['./three/failures/1.log', './one/failures/1.log', './two/failures/1.log']})
You can use defaultdict:
from collections import defaultdict
mydict = {'./three/failures/1.log': 'UVM_ERROR: This is one error', './one/failures/1.log': 'UVM_ERROR: This is one error', './two/failures/1.log': 'UVM_ERROR: This is two error'}
output = defaultdict(list)
for k, v in mydict.items():
    output[v].append(k)
print(output)
Output:
defaultdict(<type 'list'>, {'UVM_ERROR: This is two error': ['./two/failures/1.log'], 'UVM_ERROR: This is one error': ['./three/failures/1.log', './one/failures/1.log']})
defaultdict is derived from dict, so you can use it exactly how you can use dict. If you really wanted a pure dict, just do dict(output).
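The single 'U' key shown in the question's update is what you would get from indexing a string value with [0], which may be what happened when the values changed from one-element lists to plain strings. A rough sketch that accepts either shape, assuming Python 3 and that each value is either a string or a list of strings (the function name invert is made up for illustration):

```python
from collections import defaultdict

def invert(mapping):
    inverted = defaultdict(list)
    for path, messages in mapping.items():
        # Treat a bare string as a one-element list so both shapes work.
        if isinstance(messages, str):
            messages = [messages]
        for message in messages:
            inverted[message].append(path)
    return dict(inverted)

as_lists = {'./three/failures/1.log': ['UVM_ERROR: This is one error'],
            './one/failures/1.log': ['UVM_ERROR: This is one error'],
            './two/failures/1.log': ['UVM_ERROR: This is two error']}
as_strings = {'./three/failures/1.log': 'UVM_ERROR: This is three error',
              './one/failures/1.log': 'UVM_ERROR: This is one error',
              './two/failures/1.log': 'UVM_ERROR: This is two error'}

print(invert(as_lists))
print(invert(as_strings))
```

With all-distinct values, every message simply maps to a one-element list of paths.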
You can use itertools.groupby to group mydict items by values (doc):
mydict = {'./three/failures/1.log': ['UVM_ERROR: This is one error'], './one/failures/1.log': ['UVM_ERROR: This is one error'], './two/failures/1.log': ['UVM_ERROR: This is two error']}
from itertools import groupby
out = {}
for v, g in groupby(sorted(mydict.items(), key=lambda k: k[1]), lambda k: k[1]):
    out[v[0]] = [i[0] for i in g]
print(out)
Prints:
{'UVM_ERROR: This is one error': ['./three/failures/1.log', './one/failures/1.log'],
'UVM_ERROR: This is two error': ['./two/failures/1.log']}
It's not too difficult. You can do it all in-line in fact!
d = {'a':1, 'b':2}
d0 = dict(zip(list(d.values()), list(d.keys())))
d0
{1: 'a', 2: 'b'}
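One caveat worth checking: when some values repeat, the zip-based swap keeps only one key per value, because later pairs overwrite earlier ones. A small sketch, assuming Python 3.7+ dict ordering (the dictionary d below is made up for illustration):

```python
d = {'a': 1, 'b': 2, 'c': 1}

# Pairs are (1, 'a'), (2, 'b'), (1, 'c'); the last pair for key 1 wins.
swapped = dict(zip(d.values(), d.keys()))
print(swapped)
# {1: 'c', 2: 'b'}
```

So this approach does not collect the keys that share a value; for that, the values need to be grouped into lists as in the other answers.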
I have a list1 with names:
["SAM","TOM","LOUIS"]
And I have a dict1 like this (where the lists of values contain no repeated names):
{"NICE": ["SAM", "MAIK", "CARL", "LAURA", "MARTH"],
"BAD": ["LOUIS", "TOM", "KEVIN"],
"GOOD": ["BILL", "JEN", "ALEX"]}
How could I iterate through list1 so that, if a name appears in any of the lists in dict1, the corresponding key of the dict is assigned to it?
I would like to generate the following output:
["NICE","BAD","BAD"]
which corresponds to the keys of the values that appear in the list: SAM, TOM, LOUIS.
This is what I thought about:
lista=[]
for k,v in dict1:
    for values in arr1:
        if values in v:
            lista.append(v)
lista
However, I am not sure how to iterate over the different v. How can I get the desired output efficiently?
You can create an intermediate dict that maps names to their keys in dict1:
categories = {name: category for category, names in dict1.items() for name in names}
so that you can map the names in list1 to their respective keys efficiently with:
[categories[name] for name in list1]
which returns:
['NICE', 'BAD', 'BAD']
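If list1 might contain a name that is not in any of the lists, the plain lookup raises KeyError. A minimal sketch using dict.get instead, assuming None is an acceptable placeholder for unknown names ("ZOE" is a made-up name added for illustration):

```python
list1 = ["SAM", "TOM", "LOUIS", "ZOE"]  # "ZOE" does not appear in dict1
dict1 = {"NICE": ["SAM", "MAIK", "CARL", "LAURA", "MARTH"],
         "BAD": ["LOUIS", "TOM", "KEVIN"],
         "GOOD": ["BILL", "JEN", "ALEX"]}

# Build the reverse lookup once: name -> category.
categories = {name: category for category, names in dict1.items() for name in names}

# get() returns None for names that have no category instead of raising.
result = [categories.get(name) for name in list1]
print(result)
# ['NICE', 'BAD', 'BAD', None]
```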
I think you need the items() function for dictionaries here. For each name, iterate over all of the dictionary pairs and stop when a match is found, adding the corresponding adjective to your list.
lista = []
for name in list1:
    for adjective, names in dict1.items():
        if name in names:
            lista.append(adjective)
            break  # stop at the first match; names are not repeated
print(lista)
There are two ways to achieve your result.
The way you intended was to use dict1.items(). Unfortunately, that way is computationally slow, so for the sake of completeness I'll add the more efficient way:
# First, convert the dict to a different representation:
from itertools import chain, repeat
# Change {k: [v1,v2], k2: [v3,v4]} to {v1: k, v2: k, v3: k2, v4: k2}
name_to_adjective = dict(chain.from_iterable(zip(v, repeat(k)) for k, v in dict1.items()))
name_to_adjective is now equal to this:
{'ALEX': 'GOOD',
'BILL': 'GOOD',
'CARL': 'NICE',
'JEN': 'GOOD',
'KEVIN': 'BAD',
'LAURA': 'NICE',
'LOUIS': 'BAD',
'MAIK': 'NICE',
'MARTH': 'NICE',
'SAM': 'NICE',
'TOM': 'BAD'}
Then, you can get your result in one run:
result = [name_to_adjective[name] for name in list1]
I believe the following will give you your desired output:
L = ["SAM","TOM","LOUIS"]
D = {"NICE": ["SAM", "MAIK", "CARL", "LAURA", "MARTH"]
, "BAD": ["LOUIS", "TOM", "KEVIN"]
, "GOOD": ["BILL", "JEN", "ALEX"]}
lista = []
for key in D.keys():
    for name in L:
        if name in D[key]:
            lista.append(key)
print(lista)
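The D.keys() part gives you the keys of the dictionary (e.g. "NICE", "BAD", "GOOD").
You iterate over these and then look for each name from L (in order) in the list stored under that key.
This is not the most efficient way to do it, but it is a more straightforward approach. Note that the results come out in the order of the dictionary's keys rather than the order of the names in L; the two happen to coincide for this input.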
I have one list of program names which needs to be sorted into lists of smaller JSON objects based on a priority list. I need to do this in Python 3.
B and C being of the same priority 2, they will be in a list together.
program_names = ['A','B','C','D']
priorities = [1,2,2,3]
Required end result:
[[{"name": "A"}], [{"name":"B"}, {"name":"C"}], [{"name":"D"}]]
Current code:
program_names_list = []
final_list = []
for x in program_names.split(','):
    program_names_list.append(x)
for x in program_names_list:
    final_list.append([{"name": x}])
That's what I currently have which is outputting the following result:
[[{'name': 'A'}], [{'name': 'B'}], [{'name': 'C'}], [{'name': 'D'}]]
I should add that program_names is a string "A,B,C,D"
Full solution
items = {}
for k, v in zip(priorities, program_names):
    items.setdefault(k, []).append(v)
[[{'name': name} for name in items[key]] for key in sorted(items.keys())]
returns:
[[{'name': 'A'}], [{'name': 'B'}, {'name': 'C'}], [{'name': 'D'}]]
In steps
Create a dictionary that uses the priorities as keys and a list of all program names with corresponding priority as values:
items = {}
for k, v in zip(priorities, program_names):
    items.setdefault(k, []).append(v)
Go through the sorted keys and create a new list of program names by getting them from the dictionary by the key:
[[{'name': name} for name in items[key]] for key in sorted(items.keys())]
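Since the question mentions that program_names actually arrives as the string "A,B,C,D", here is a sketch of the same two steps starting from that string. It assumes the names are separated by plain commas with no surrounding whitespace (the variable name result is made up for illustration):

```python
program_names = "A,B,C,D"
priorities = [1, 2, 2, 3]

# Step 1: group the name dicts by priority.
items = {}
for priority, name in zip(priorities, program_names.split(',')):
    items.setdefault(priority, []).append({'name': name})

# Step 2: emit the groups in ascending priority order.
result = [items[priority] for priority in sorted(items)]
print(result)
# [[{'name': 'A'}], [{'name': 'B'}, {'name': 'C'}], [{'name': 'D'}]]
```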
Loop through the priorities and use a dictionary with priorities as keys and lists of programs as values to group all elements with the same priority.
In [24]: from collections import defaultdict
In [25]: program_names = ['A','B','C','D']
In [26]: priorities = [1,2,2,3]
In [27]: d = defaultdict(list)
In [28]: for i, p in enumerate(sorted(priorities)):
   ....:     d[p].append({'name': program_names[i]})
   ....:
In [29]: list(d.values())
Out[29]: [[{'name': 'A'}], [{'name': 'B'}, {'name': 'C'}], [{'name': 'D'}]]
Use groupby.
from itertools import groupby
program_names = ['a','b','c','d']
priorities = [1,2,2,3]
data = zip(priorities, program_names)
groups_dict = []
for k, g in groupby(data, lambda x: x[0]):
    # Build a real list: in Python 3, map() returns a lazy iterator that would print as <map object ...>
    m = [dict(name=x[1]) for x in g]
    groups_dict.append(m)
print(groups_dict)
Although this may be wrong from an educational point of view, I cannot resist answering such questions by one-liners:
[[{'name': p_n} for p_i, p_n in zip(priorities, program_names) if p_i == p] for p in sorted(set(priorities))]
(This assumes your "priorities" list may be sorted and is less efficient than the "normal" approach with a defaultdict(list)).
Update: Borrowing from damn_c-s answer, here's an efficient one-liner (not counting the implied from itertools import groupby):
[[{'name': pn} for pi, pn in l] for v, l in groupby(zip(priorities, program_names), lambda x: x[0])]
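The groupby versions rely on equal priorities being adjacent, which holds for the sample input because it is already sorted. A minimal sketch for input that is not sorted, assuming the pairs should be ordered by priority first (the shuffled priorities below are made up for illustration):

```python
from itertools import groupby
from operator import itemgetter

program_names = ['A', 'B', 'C', 'D']
priorities = [2, 1, 3, 2]  # deliberately unsorted

# Sort the (priority, name) pairs by priority so equal priorities become adjacent.
pairs = sorted(zip(priorities, program_names), key=itemgetter(0))

grouped = [[{'name': name} for _, name in group]
           for _, group in groupby(pairs, key=itemgetter(0))]
print(grouped)
# [[{'name': 'B'}], [{'name': 'A'}, {'name': 'D'}], [{'name': 'C'}]]
```

Because sorted is stable, names that share a priority keep their original relative order.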
I have a list of lists, like so:
items = [['118', 'white'], ['118','Jack'], ['118','guilty'], ['200','black'], ['200','mark'], ['200','not guilty']]
Is there a way to use a for loop to grab the second value in each list and collapse it into a new list?
Like so:
['white', 'jack', 'guilty']
['black','mark','not guilty']
Assuming your list always has elements with the same key grouped as in your example, you can use itertools.groupby() to do this efficiently:
>>> import itertools
>>> items = [['118', 'white'], ['118','Jack'], ['118','guilty'], ['200','black'], ['200','mark'], ['200','not guilty']]
>>> [[x[1] for x in g] for k, g in itertools.groupby(items, lambda x: x[0])]
[['white', 'Jack', 'guilty'], ['black', 'mark', 'not guilty']]
You could also use operator.itemgetter(0) as an alternative to lambda x: x[0].
Note that if items does not necessarily have elements grouped by their keys, you can use sorted(items) instead of items in the groupby() call and it will work.
Here is a version that preserves the key as well:
>>> [(k, [x[1] for x in g]) for k, g in itertools.groupby(items, lambda x: x[0])]
[('118', ['white', 'Jack', 'guilty']), ('200', ['black', 'mark', 'not guilty'])]
You could pass this list directly into the dict() built-in function to convert this to a dictionary.
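As a quick sketch of that last step, assuming the items are already grouped by key as in the question (the variable name grouped is made up for illustration):

```python
import itertools
from operator import itemgetter

items = [['118', 'white'], ['118', 'Jack'], ['118', 'guilty'],
         ['200', 'black'], ['200', 'mark'], ['200', 'not guilty']]

# dict() accepts an iterable of (key, value) pairs.
grouped = dict((k, [x[1] for x in g])
               for k, g in itertools.groupby(items, itemgetter(0)))
print(grouped)
# {'118': ['white', 'Jack', 'guilty'], '200': ['black', 'mark', 'not guilty']}
```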
from collections import defaultdict
entries = defaultdict(list)
for (key, value) in items:
    entries[key].append(value)
Now entries is a dict of lists of the second values. You can either get them by key ('118') or use values() for a list of lists.
>>> k = set(x[0] for x in items)
>>> [ [x[1] for x in items if x[0] == key] for key in sorted(k) ]
[['white', 'Jack', 'guilty'], ['black', 'mark', 'not guilty']]
output_dict = {}
for each_key in items:
    for each_item in each_key:
        try:
            output_dict[each_key].append(each_item) #fails as I'm trying to use a list as a dict key
        except Exception as e:
            output_dict[each_key] = [] #see above
            output_dict[each_key].append(each_item) #see above
for each_list in output_dict:
    print(each_list)
As Peter DeGlopper pointed out below, this code is bad and I should feel bad. I've commented the code to point out my error. There are better solutions, but just to correct my mistake:
items = [['118', 'white'], ['118','Jack'], ['118','guilty'], ['200','black'], ['200','mark'], ['200','not guilty']]
output_dict = {}
for each_list in items:
    if each_list[0] not in output_dict:
        output_dict[each_list[0]] = list()
    output_dict[each_list[0]].append(each_list[1])
>>> for each_list in output_dict.values():
...     print(each_list)
['white', 'Jack', 'guilty']
['black', 'mark', 'not guilty']