Dictionary with lists as values - find longest list - python

I have a dictionary where the values are lists. I need to find which key has the longest list as its value, after removing the duplicates. If I just find the longest list, this won't work, as there may be a lot of duplicates. I have tried several things, but nothing is remotely close to being correct.

# d is your dictionary of lists
max_key = max(d, key=lambda x: len(set(d[x])))
# here's the short version. I'll explain....
max(                      # the function that grabs the biggest value
    d,                    # this is the dictionary, it iterates through and grabs each key...
    key=                  # this overrides the default behavior of max
        lambda x:         # defines a lambda to handle new behavior for max
            len(          # the length of...
                set(      # the set containing (sets have no duplicates)
                    d[x]  # the list defined by key `x`
                )
            )
)
Since max iterates through the dictionary's keys (that's what iterating over a dictionary gives you, by the by: for x in d: print(x) will print each key in d), it returns the key for which the function we passed as key= (that's what the lambda is) produces the highest result. You could literally do ANYTHING here, that's the beauty of it. However, if you wanted the key AND the value, you might be able to do something like this....
d = # your dictionary
max_key, max_value = max(d.items(), key = lambda k,v: len(set(v)))
# THIS DOESN'T WORK, SEE MY NOTE AT BOTTOM
This differs because instead of passing d, which is a dictionary, we pass d.items(), which gives tuples built from d's keys and values (a list in Python 2, a view object in Python 3). For example:
d = {"foo":"bar", "spam":['green','eggs','and','ham']}
print(d.items())
# [ ("foo", "bar"),
# ("spam", ["green","eggs","and","ham"])]
We're not looking at a dictionary anymore, but all the data is still there! It makes it easier to deal with using the unpack statement I used: max_key, max_value =. This works the same way as if you did WIDTH, HEIGHT = 1024, 768. max still works as usual, it iterates through the new list we built with d.items() and passes those values to its key function (the lambda k,v: len(set(v))). You'll also notice we don't have to do len(set(d[k])) but instead are operating directly on v, that's because d.items() has already created the d[k] value, and using lambda k,v is using that same unpack statement to assign the key to k and the value to v.
Magic! Magic that doesn't work, apparently. I didn't dig deep enough here, and lambdas cannot, in fact, unpack values on their own. Instead, do:
max_key, max_value = max(d.items(), key = lambda x: len(set(x[1])))
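A small runnable check of the approach above. The dictionary `sample` and its contents are made up for illustration and are not from the question; the sketch also assumes the list elements are hashable, since they are placed in a set.

```python
# Hypothetical sample data: "a" has the longest raw list,
# but "b" has the most unique items.
sample = {
    "a": [1, 1, 1, 1, 2],
    "b": [1, 2, 3],
    "c": [4, 4],
}

# Key whose list is longest once duplicates are removed.
max_key = max(sample, key=lambda k: len(set(sample[k])))

# Same idea, but returning the key and the value together.
# The lambda receives one (key, value) tuple and indexes into it.
max_key2, max_value = max(sample.items(), key=lambda kv: len(set(kv[1])))

print(max_key)              # b
print(max_key2, max_value)  # b [1, 2, 3]
```

If several keys tie for the largest number of unique items, max returns the first one it encounters.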

For less advanced users, this can be a solution (it collects every key whose de-duplicated list has the maximum length):
longest = max(len(set(item)) for item in your_dict.values())
result = [key for key, item in your_dict.items() if len(set(item)) == longest]

Related

How to get a subset from an OrderedDict?

I have an OrderedDict in Python, and I only want to get the first few key-value pairs. How can I do that? For example, to get the first 4 elements, I did the following:
subdict = {}
for index, pair in enumerate(my_ordered_dict.items()):
    if index < 4:
        subdict[pair[0]] = pair[1]
Is this a good way to do it?
That approach runs over the whole dictionary even though you only need the first four elements, checks the index over and over, and unpacks the pairs by hand.
Making it short-circuit is easy:
subdict = {}
for index, pair in enumerate(my_ordered_dict.items()):
    if index >= 4:
        break # Ends the loop without iterating all of my_ordered_dict
    subdict[pair[0]] = pair[1]
and you can nest the unpacking to get nicer names:
subdict = {}
# Inner parentheses mandatory for nested unpacking
for index, (key, val) in enumerate(my_ordered_dict.items()):
    if index >= 4:
        break # Ends the loop
    subdict[key] = val
but you can improve on that with itertools.islice to remove the manual index checking:
from itertools import islice # At top of file
subdict = {}
# islice lazily produces the first four pairs then stops for you
for key, val in islice(my_ordered_dict.items(), 4):
    subdict[key] = val
at which point you can actually one-line the whole thing (because now you have an iterable of exactly the four pairs you want, and the dict constructor accepts an iterable of pairs):
subdict = dict(islice(my_ordered_dict.items(), 4))
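A self-contained sketch of the islice one-liner, using made-up data (the contents of `my_ordered_dict` here are an assumption for illustration). It also shows two details that may be useful: keeping the OrderedDict type, and what happens when fewer items exist than requested.

```python
from collections import OrderedDict
from itertools import islice

# Hypothetical data for illustration.
my_ordered_dict = OrderedDict((letter, i) for i, letter in enumerate("abcdef"))

# Take the first four key-value pairs without iterating the rest.
subdict = dict(islice(my_ordered_dict.items(), 4))
print(subdict)  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}

# To keep the OrderedDict type, pass the same pairs to OrderedDict instead.
ordered_subset = OrderedDict(islice(my_ordered_dict.items(), 4))

# Asking for more items than exist is safe: islice simply stops early.
everything = dict(islice(my_ordered_dict.items(), 100))
print(len(everything))  # 6
```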
You can use a map function, like this:
subdict = dict(map(lambda k: (k, my_ordered_dict[k]), [*my_ordered_dict][:4]))
Here is one approach:
sub_dict = dict(pair for i, pair in zip(range(4), my_ordered_dict.items()))
zip(a, b) stops as soon as the shorter of a and b is exhausted, so if my_ordered_dict.items() has more than 4 items, zip(range(4), my_ordered_dict.items()) just takes the first 4. These key-value pairs are passed to the dict builtin to make a new dict.

Can Python sorting with a key pass me the ordinal position of the item in the original list, instead of the item itself

I have a list of items I need to sort, but the value I want used in the sort for each item is not in the list itself. The sorting information is in another list, which positionally aligns with the first one.
I.e., l = list of items to sort, v = list of values to use for sorting. When sorting l[0], the value in v[0] should be used.
So during the sort, I need python to tell me the ordinal position of the item being sorted, instead of giving the item itself.
So effectively what I would do is this:
l = sort(key = lambda index_of_item: v[index_of_item])
By default I think that this would not work as python is invoking that lambda with an actual item from l, not the item's position. Is there a way to have python give me the position instead?
(If there were some identifier in each item being sorted I could use that myself inside the lambda to extrapolate the index_of_item, but sadly there isn't)
Convert your list of items to a list of tuples that includes the original index; this can be done using enumerate(). Then you can use that index to access the other list.
augmented_list = list(enumerate(l))
augmented_list.sort(key = lambda item: v[item[0]])
result = [x[1] for x in augmented_list]
Another option is to use zip() to combine the elements of both lists, then use the value from the other list as the sort key.
result = [x[0] for x in sorted(zip(l, v), key = lambda x: x[1])]
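A third variation, not taken from the answers above, is to sort the positions themselves and then look the items up. This is a minimal sketch with made-up data; it is closest to what the question literally asks for, because the key function receives an index.

```python
# Hypothetical data: items in l, sort values in v, aligned by position.
l = ["pear", "apple", "fig"]
v = [3, 1, 2]

# Sort the positions 0..len(l)-1 by their value in v,
# then pull the items out in that order.
order = sorted(range(len(l)), key=lambda i: v[i])
result = [l[i] for i in order]

print(order)   # [1, 2, 0]
print(result)  # ['apple', 'fig', 'pear']

# The zip-based approach gives the same ordering.
zipped_result = [item for item, _ in sorted(zip(l, v), key=lambda pair: pair[1])]
```

Keeping `order` around can be handy if more than one list has to be rearranged the same way.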

Parse a list, check if it has elements from another list and print out these elements

I have a list populated from entries of a log; for sake of simplicity, something like
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo", "entry3:orieqor", "entry2:iroewiow"......]
This list can have an undefined number of entries, which may or may not be in sequence, since I run multiple operations in async fashion.
Then I have another list, which I use as reference to get only the list of entries; which may be like
list_template = ["entry1", "entry2", "entry3"]
I am trying to use the second list to get sequences of entries, so that I can isolate a single sequence, taking only the first instance found of each entry.
Since I am not dealing with numbers, I can't use set, so I tried a loop inside a loop, comparing the values in each list.
This does not work, because another entry may appear before the one I am looking for (say I want entry1, entry2, entry3, and the loop finds entry1 but then finds entry3; since I compare every element of each list, it will be happy to have found an element).
for item in listlog:
    entry, value = item.split(":")
    for reference_entry in list_template:
        if entry == reference_entry:
            print(item)
            break
In a nutshell, I have to find the sequence given in the template list, even though the items are not necessarily in order in the log. I am trying to parse the list only once; otherwise I could do a very expensive multi-pass, one pass for each element of the template list, stopping at the first occurrence. I thought that a loop inside a loop would be more efficient, since my reference list is always smaller than the log list and usually has only a few elements.
How would you approach this problem in the most efficient and pythonic way? All I can think of is multiple passes over the log list.
You can use a dict:
>>> listlog
['entry1:abcde', 'entry2:abbds', 'entry1:eorieo', 'entry3:orieqor', 'entry2:iroewiow']
>>> list_template
['entry1', 'entry2', 'entry3']
>>> my_dict = {}
>>> for x in listlog:
...     key, value = x.split(":")
...     if key not in my_dict and key in list_template:
...         my_dict[key] = value
...
>>> my_dict
{'entry2': 'abbds', 'entry3': 'orieqor', 'entry1': 'abcde'}
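The same single-pass idea can be sketched as a script that stops as soon as every template entry has been seen and then reports the matches in template order. The names `wanted`, `first_seen`, and `sequence` are made up for this example. Note that a set works fine with strings, so it can be used for the membership test.

```python
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo",
           "entry3:orieqor", "entry2:iroewiow"]
list_template = ["entry1", "entry2", "entry3"]

wanted = set(list_template)  # set membership tests are O(1) on average
first_seen = {}

for line in listlog:
    # partition splits on the first ":" only, so values may contain colons.
    key, _, value = line.partition(":")
    if key in wanted and key not in first_seen:
        first_seen[key] = value
        if len(first_seen) == len(wanted):
            break  # every template entry has been found; stop scanning

# Report in template order, skipping entries that never appeared.
sequence = [f"{key}:{first_seen[key]}" for key in list_template if key in first_seen]
print(sequence)  # ['entry1:abcde', 'entry2:abbds', 'entry3:orieqor']
```

The log is read at most once, and usually only partly, because the loop exits early.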
Disclaimer: This answer could use someone's insight on performance. Sure, list/dict comprehensions are pythonic, but the following may very well be a poor use of those tools.
You could use a list comprehension (pairing the filtered values with ref through zip would mismatch keys and values whenever ref is in a different order from data):
>>> data = ["a:12", "b:32", "c:54"]
>>> ref = ['c', 'b']
>>> matches = [(key, val) for key, val in (item.split(':') for item in data) if key in ref]
>>> for k, v in matches:
...     print("{}:{}".format(k, v))
...
b:32
c:54
Here's another (worse? I'm not sure, performance-wise) way to get around this:
>>> data = ["a:12", "b:32", "c:54"]
>>> ref = ['c', 'b']
>>> data_dict = {x: y for x, y in [item.split(':') for item in data]}
>>> ["{}:{}".format(key, val) for key, val in data_dict.items() if key in ref]
['b:32', 'c:54']
Explanation:
Convert your initial list into a dict using a dict comprehension.
For each (key, val) pair found in the dict, join both into a string if the key is found in the 'ref' list.
You can use a list comprehension something like this:
import re
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo", "entry3:orieqor", "entry2:iroewiow"]
print([item for item in listlog if re.search('entry', item)])
# ['entry1:abcde', 'entry2:abbds', 'entry1:eorieo', 'entry3:orieqor', 'entry2:iroewiow']
Then you can split them as you wish and create a dictionary if you want:
import re
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo", "entry3:orieqor", "entry2:iroewiow"]
mylist = [item for item in listlog if re.search('entry', item)]
def create_dict(string, dict_splitter=':'):
    _dict = {}
    temp = string.split(dict_splitter)
    key = temp[0]
    value = temp[1]
    _dict[key] = value
    return _dict
mydictionary = {}
for x in mylist:
    x = str(x)
    mydictionary.update(create_dict(x))
for k, v in mydictionary.items():
    print(k, v)
# entry1 eorieo
# entry2 iroewiow
# entry3 orieqor
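As you can see, this method needs refinement, because update() overwrites the value stored for a key each time that key reappears, so the last occurrence wins rather than the first. That's bad. It would be better to leave the value of an existing key alone, which is easier than you might think.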
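One way to keep the first value seen for each key, rather than the last, is dict.setdefault, which stores a value only when the key is not already present. This is a minimal sketch; the name `first_values` is made up for the example.

```python
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo",
           "entry3:orieqor", "entry2:iroewiow"]

first_values = {}
for item in listlog:
    # Split on the first ":" only, in case a value contains a colon.
    key, value = item.split(":", 1)
    # setdefault leaves an existing key untouched, so the first occurrence wins.
    first_values.setdefault(key, value)

for k, v in first_values.items():
    print(k, v)
# entry1 abcde
# entry2 abbds
# entry3 orieqor
```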

Python: checking if an item is in a dictionary. Key OR value

I'm trying to check if a dictionary has a certain value in its keys as well as its values with just one command instead of having to OR two searches. I.e.
'b' in d.keys()
'b' in d.keys() or 'b' in d.values()
Searching the internet with these terms has returned nothing but instructions on how to search just keys or just values.
d = {'a':'b'}
d.items() ## this is the closest thing I could find to all the items
'b' in d.items() ## but this returns false
There's no single function to do this. You'll have to do either this (leaving out keys() is faster):
'b' in d or 'b' in d.values()
or some kind of loop over items:
def contains(d, item):  # hypothetical wrapper, needed so that return is valid
    for i in d.items():
        if item in i:
            return True
    return False
or:
any(('b' in i) for i in d.items())
PS. It also points at a bad design. Dictionaries are cool for key lookups, because they're fast at that. If you check both keys and values, you're just looking through all the stored items anyway. (and it shows you're not even sure which side you're looking at) I'd suggest checking if maybe some combination of sets and dicts is better suited for what you want to do.
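The problem with d.items() is that it yields the key/value pairs as tuples, so 'b' will not be in [('a', 'b')]. Instead, combine the keys and the values into one list and search that:
all_items = list(d.keys()) + list(d.values())  # list() is needed on Python 3, where keys() and values() return views
'b' in all_items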
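If a single expression is the goal, the keys and values can also be searched lazily with itertools.chain, without building a combined list. The helper name `in_keys_or_values` is made up for this sketch.

```python
from itertools import chain

def in_keys_or_values(d, item):
    """Return True if item is a key or a value of d (hypothetical helper)."""
    # chain(d, d.values()) walks the keys first, then the values,
    # and the `in` test stops at the first match.
    return item in chain(d, d.values())

d = {'a': 'b'}
print(in_keys_or_values(d, 'a'))  # True
print(in_keys_or_values(d, 'b'))  # True
print(in_keys_or_values(d, 'c'))  # False
```

The trade-off is that the keys are scanned linearly here instead of being looked up by hash, so `item in d or item in d.values()` is likely the better choice for large dictionaries.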

How to print the longest dictionary value?

I have a dictionary whose values are all lists of strings, and I want to print the longest value (in this case, the list with the most strings). I created this for loop:
count = 0
for values in d.values():
    if len(values) > count:
        count = len(values)
        print(values)
However, this prints all of the values ever associated with 'count'. I only want the largest one (which is the last line). This is an example of what the for loop prints:
['gul', 'lug']
['tawsed', 'wadset', 'wasted']
['lameness', 'maleness', 'maneless', 'nameless', 'salesmen']
['pores', 'poser', 'prose', 'repos', 'ropes', 'spore']
['arrest', 'rarest', 'raster', 'raters', 'starer', 'tarres', 'terras']
['carets', 'cartes', 'caster', 'caters', 'crates', 'reacts', 'recast', 'traces']
['estrin', 'inerts', 'insert', 'inters', 'niters', 'nitres', 'sinter', 'triens', 'trines']
['least', 'setal', 'slate', 'stale', 'steal', 'stela', 'taels', 'tales', 'teals', 'tesla']
['apers', 'apres', 'asper', 'pares', 'parse', 'pears', 'prase', 'presa', 'rapes', 'reaps', 'spare', 'spear']
How can I get it to print only that last (longest) line?
max(d.values(), key=len)
This returns the longest list of words from your dict values (wrap it in print() to display it). I used the max() function, which takes a key parameter, set here to the len function. Basically, the criterion for which value is the 'max' is now its len.
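A runnable sketch of max with key=len on a cut-down version of the data. The dictionary keys below are made up, since the question does not show them.

```python
# Hypothetical keys; the values are taken from the question's sample output.
d = {
    "glu": ["gul", "lug"],
    "adestw": ["tawsed", "wadset", "wasted"],
    "eoprs": ["pores", "poser", "prose", "repos", "ropes", "spore"],
}

# Longest list among the values.
longest = max(d.values(), key=len)
print(longest)  # ['pores', 'poser', 'prose', 'repos', 'ropes', 'spore']

# The key that owns the longest list, if that is needed as well.
longest_key = max(d, key=lambda k: len(d[k]))
print(longest_key)  # eoprs

# max raises ValueError on an empty dict unless a default is supplied.
nothing = max({}.values(), key=len, default=[])
```

When two lists tie for the greatest length, max returns the first one it encounters.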
Inspired by Dilbert, there is room for simplification: there is no need to use a lambda to define a function for comparing values. We can take advantage of the fact that len returns the length of each item, which is a perfect key for deciding who comes first and who comes last:
print(sorted(d.values(), key=len)[-1])
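count = 0
values_with_largest_count_first_hit = None
for values in d.values():
    if len(values) > count:
        count = len(values)  # remember the largest length seen so far
        values_with_largest_count_first_hit = values
print(values_with_largest_count_first_hit)
# Python 2 only: the cmp argument and the cmp() builtin were removed in Python 3
print sorted(d.values(), cmp = lambda x, y: cmp(len(x), len(y)))[-1]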
