Get the smallest item per key from a list of lists (group by key) - Python

I've got a list like
listOfLists = [['key2', 1], ['key1', 2], ['key2', 2], ['key1', 1]]
The first item of an inner list is the key. The second item of an inner list is the value.
I want the output [['key1', 1], ['key2', 1]]: for each key, the inner list with the smallest value among the lists sharing that key, with the result grouped by key (in SQL terms, a MIN with GROUP BY).
I've written some code like this:
listOfLists = [['key2', 1], ['key1', 2], ['key2', 2], ['key1', 1]]
listOfLists.sort()  # sorts by key, then ascending by value
output = []
for index, l in enumerate(listOfLists):
    if index == 0:
        output.append(l)
    elif l[0] == listOfLists[index - 1][0]:
        # same key as the previous item and a larger value: discard
        continue
    else:
        output.append(l)
This doesn't seem smart enough.
Is there a simpler way to do this?

How about using a dictionary (no need to sort the data)?
>>> listOfLists = [['key2', 1], ['key1', 2], ['key2', 2], ['key1', 1]]
>>> d = {}
>>> for k, v in listOfLists:
...     d.setdefault(k, []).append(v)
...
>>> d = {k: min(v) for k, v in d.items()}
>>> d
{'key2': 1, 'key1': 1}
You can convert to a list if you want
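One possible way to do that conversion, as a minimal sketch: sort the dictionary's items by key and turn each pair back into a list. The name `result` is only illustrative.

```python
listOfLists = [['key2', 1], ['key1', 2], ['key2', 2], ['key1', 1]]

# Collect every value seen for each key.
d = {}
for k, v in listOfLists:
    d.setdefault(k, []).append(v)

# Keep the minimum per key, then rebuild a key-sorted list of [key, value] pairs.
d = {k: min(v) for k, v in d.items()}
result = [[k, v] for k, v in sorted(d.items())]
print(result)  # [['key1', 1], ['key2', 1]]
```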

O(N log N) solution
You can just use the dict constructor for this. It is O(N log N) because of the sorting step
>>> dict(sorted(listOfLists, reverse=True))
{'key2': 1, 'key1': 1}
To see why this works, look at the result of sorted
>>> sorted(listOfLists, reverse=True)
[['key2', 2], ['key2', 1], ['key1', 2], ['key1', 1]]
The dict constructor overwrites the value for each key as it traverses the list, and because sorted (with reverse=True) has pushed the minimum for each key to the end of that key's run, the minimum is the value that survives.
O(N) solution
>>> d = {}
>>> for k, v in listOfLists:
...     d[k] = min(d.get(k, v), v)
...
>>> d
{'key2': 1, 'key1': 1}

The itertools module has a very useful groupby function that is probably exactly what you need:
from itertools import groupby
listOfLists.sort()
for key, subgroup in groupby(listOfLists, lambda item: item[0]):
    print(key, min(subgroup))
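If you need the list-of-lists output from the question rather than printed lines, the same groupby idea can be sketched as a list comprehension. This assumes each inner list is [key, value], so that min on a subgroup effectively compares by value once the keys are equal; `ordered` and `smallest` are illustrative names.

```python
from itertools import groupby
from operator import itemgetter

listOfLists = [['key2', 1], ['key1', 2], ['key2', 2], ['key1', 1]]

# groupby only merges adjacent items, so sort by key first.
# sorted() is used here so the original list is left untouched.
ordered = sorted(listOfLists)
smallest = [min(subgroup) for _, subgroup in groupby(ordered, key=itemgetter(0))]
print(smallest)  # [['key1', 1], ['key2', 1]]
```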

Related

Fastest way to match two list fields python

I have a performance problem in my latest Python script.
In essence, I have two lists, e.g.
List1:
([a,1],[b,2])
List2:
([a,3],[b,4])
In the example above each list has two entries, but in reality there are about 150,000.
In my current script I take the first field from an entry of the first list (a) and loop through the entire List2 until there is a match. The two entries are then combined.
The final result would be:
([a,1,3],[b,2,4])
However, given the size of the lists, this is taking forever.
Is there a way I can take the key field from list1 (a) and retrieve, in constant time, all entries in list2 that have that key?
I have seen some answers online suggesting sets, but I am unsure how to implement one and use it to solve the problem above.
Any help would be appreciated.
Further example:
l1=(['abc123','hi'], ['efg456','bye']) - l1 has around 2000 tuples
l2=(['abc123','letter'],['abc123','john'],['abc123','leaf']) - l2 has around 100,000+ tuples
Output:
l3=(['abc123','hi','letter'],['abc123','hi','john'],['abc123','hi','leaf'])
This isn't hard: use a dict for list1 and a for loop over list2.
dict1 = {key1: [value1] for key1, value1 in list1}  # convert list1 to a dict
# each value is wrapped in a list so that more values can be appended to it
for key2, value2 in list2:
    try:
        dict1[key2].append(value2)
    except KeyError:
        continue  # I'm not sure what you want to do if a key in list2 doesn't exist in list1, so just ignore it
list3 = tuple([key3, *value3] for key3, value3 in dict1.items())
print(list3)
If your a and b values are unique, you can convert the "lists" (what you have is actually a tuple of lists, not a list of lists) into dictionaries and then merge them. For example:
l1 = (['a', 1], ['b', 2], ['c', 5])
l2 = (['a', 3], ['b', 4])
d1 = { k : [v] for [k, v] in l1 }
d2 = { k : [v] for [k, v] in l2 }
for k in d1.keys():
    d1[k] += d2.get(k, [])
print(d1)
Output:
{'a': [1, 3], 'b': [2, 4], 'c': [5]}
You can convert that dictionary back to a tuple of lists using a comprehension:
print(tuple([k, *v] for k, v in d1.items()))
Output:
(['a', 1, 3], ['b', 2, 4], ['c', 5])
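For the asker's further example, where every row of l2 should be paired with its match in l1 (one output row per l2 row), a dictionary lookup gives the constant-time access that was asked for. This is only a sketch: it assumes keys in l1 are unique and that l2 rows without a match in l1 are simply skipped, and `lookup` is an illustrative name.

```python
l1 = (['abc123', 'hi'], ['efg456', 'bye'])
l2 = (['abc123', 'letter'], ['abc123', 'john'], ['abc123', 'leaf'])

# Build the index once: O(len(l1)). Assumes each key appears once in l1.
lookup = {key: value for key, value in l1}

# One dictionary lookup per l2 row: O(len(l2)) overall.
l3 = tuple(
    [key, lookup[key], value]
    for key, value in l2
    if key in lookup  # assumption: unmatched rows are dropped
)
print(l3)
# (['abc123', 'hi', 'letter'], ['abc123', 'hi', 'john'], ['abc123', 'hi', 'leaf'])
```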

Python: How to append list item to dictionary if item equals key?

I'm attempting to iterate through a list and if the list item equals a dictionary key, append the list item to the dictionary.
mylist = [1, 2, 3]
mydict = dict.fromkeys(mylist, [])
for item in mylist:
    for key in mydict:
        if key == item:
            mydict[key].append(item)
print(mydict)
Output:
{1: [1, 2, 3], 2: [1, 2, 3], 3: [1, 2, 3]}
Required output:
{1: [1], 2: [2], 3: [3]}
Much thanks!
That's because here:
mydict = dict.fromkeys(mylist, [])
all of mydict's values will be the same list object, so when you append to mydict[something], you're appending to that one list, no matter what something is.
Three numbers get appended to it, so every key shows the same three-element list.
To avoid this, assign new lists to each key:
mydict = {}
for item in mylist:
    mydict.setdefault(item, []).append(item)
Or, you know:
mydict = {key: [key] for key in mylist}
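The shared-object behaviour described above can be checked directly with the `is` operator. A small sketch (the names `shared` and `separate` are illustrative):

```python
mylist = [1, 2, 3]

# fromkeys evaluates [] once and reuses that single list for every key.
shared = dict.fromkeys(mylist, [])
print(shared[1] is shared[2])  # True

# A comprehension evaluates [] once per key, giving independent lists.
separate = {key: [] for key in mylist}
print(separate[1] is separate[2])  # False

separate[1].append(1)
print(separate)  # {1: [1], 2: [], 3: []}
```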
By using:
mylist = [1, 2, 3]
mydict = dict.fromkeys(mylist, [])
you are creating a dict that has all the elements of mylist as keys, and all of its values are references to the same list. To fix this, you can use:
mydict = dict.fromkeys(mylist)
for item in mylist:
    mydict[item] = [item]
print(mydict)
Output:
{1: [1], 2: [2], 3: [3]}
The same thing in a more efficient and compact form, using a dictionary comprehension:
mydict = {item: [item] for item in mylist}
Is this what you wanted?
mylist = [1,2,3,3]
mydict = {}
for item in mylist:
    if item not in mydict:
        mydict[item] = [item]
    else:
        mydict[item].append(item)
print(mydict)
It will output: {1: [1], 2: [2], 3: [3, 3]}
mylist = [1, 2, 3]
mydict = dict.fromkeys(mylist, [])
for item in mylist:
    for key in mydict:
        if key == item:
            mydict[key] = [item]
print(mydict)

How to update multiple dictionary values based on a condition?

I have a dictionary which looks like:
dict = {'A':[1,2], 'B':[0], 'c':[4]}
I need it to look like:
dict = {'A':[1,2], 'B':[0,0], 'c':[4,0]}
What I am doing right now:
dict = {x: y+[0] for (x,y) in dict.items() if len(y) < 2}
which generates:
dict = {'B':[0,0], 'c':[4,0]}
Any idea how I could avoid dropping the entries that do not meet the condition?
You're almost there. Try:
my_dict = {x: y + [0] if len(y) < 2 else y
           for (x, y) in dict.items()}
(as mentioned by jp_data_analysis, avoid naming variables after builtins like dict)
This is one way.
Note: do not name variables after classes, e.g. use d instead of dict.
d = {'A':[1,2], 'B':[0], 'c':[4]}
d = {k: v if len(v)==2 else v+[0] for k, v in d.items()}
# {'A': [1, 2], 'B': [0, 0], 'c': [4, 0]}
You can use dictionary comprehension:
d = {'A':[1,2], 'B':[0], 'c':[4]}
new_d = {a:b+[0] if len(b) == 1 else b for a, b in d.items()}
Also, it is best practice not to assign variables to names shadowing common builtins, such as dict, as you are then overriding the function in the current namespace.
Your code is almost correct. The problem is that the if clause at the end of the comprehension filters out every list of length 2 or more. What you need to do instead is place those lists in the new dictionary unchanged. This can be done with the ternary operator (a conditional expression), which has the form value1 if condition else value2.
Also, if you want a more general way to pad every list in your dictionary to be of equal length, you can use map and max.
Here is your code with the above modifications:
>>> d = {'A':[1, 2], 'B': [0], 'c': [4]}
>>>
>>> max_len = max(map(len, d.values()))
>>> {k: v + [0] * (max_len - len(v)) if len(v) < max_len else v for k, v in d.items()}
{'A': [1, 2], 'B': [0, 0], 'c': [4, 0]}
>>>
A generalized way:
d = {'A':[1,2], 'B':[0], 'c':[4]}
m = max(len(v) for v in d.values())
for k, v in d.items():
    if len(v) < m:
        d[k].extend([0 for i in range(m - len(v))])
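The generalized padding above can also be wrapped in a small function that returns a new dictionary instead of modifying the input. This is a sketch; `pad_values` and its `fill` parameter are hypothetical names used for illustration.

```python
def pad_values(d, fill=0):
    """Return a new dict whose lists are all padded to the longest length."""
    if not d:
        return {}
    longest = max(len(v) for v in d.values())
    # v + [...] creates a new list, so the input lists are not mutated.
    return {k: v + [fill] * (longest - len(v)) for k, v in d.items()}


d = {'A': [1, 2], 'B': [0], 'c': [4]}
print(pad_values(d))  # {'A': [1, 2], 'B': [0, 0], 'c': [4, 0]}
print(d)              # {'A': [1, 2], 'B': [0], 'c': [4]}
```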
You were very close, just use update():
d = {'A':[1,2], 'B':[0], 'c':[4]}
d.update({x: y+[0] for (x,y) in d.items() if len(y) < 2})
d
# {'A': [1, 2], 'B': [0, 0], 'c': [4, 0]}
Like others have said, don't reassign builtin names like dict; it's a one-way street to debugging hell.

Python filter defaultdict

I have a defaultdict of lists, but I want to basically do this:
myDefaultDict = filter(lambda k: len(k)>1, myDefaultDict)
Except it only seems to work with lists. What can I do?
Are you trying to get only values with len > 1?
Dictionary comprehensions are a good way to handle this:
reduced_d = {k: v for k, v in myDefaultDict.items() if len(v) > 1}
As martineau pointed out, this does not give you the defaultdict functionality of the source myDefaultDict. To keep it, pass the dict comprehension to the defaultdict constructor, as martineau shows:
from collections import defaultdict
myDefaultDict = defaultdict(list, {'ab': [1,2,3], 'c': [4], 'def': [5,6]}) # original
reduced_d = defaultdict(list, {k: v for k, v in myDefaultDict.items() if len(v) > 1})
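To illustrate the difference, here is a small sketch comparing the plain dict result with the one wrapped in defaultdict again. The names `plain` and `kept` are illustrative.

```python
from collections import defaultdict

myDefaultDict = defaultdict(list, {'ab': [1, 2, 3], 'c': [4], 'def': [5, 6]})

plain = {k: v for k, v in myDefaultDict.items() if len(v) > 1}
kept = defaultdict(list, plain)

# The defaultdict creates a list for a missing key on first access.
kept['new'].append(7)
print(dict(kept))  # {'ab': [1, 2, 3], 'def': [5, 6], 'new': [7]}

# The plain dict raises KeyError instead.
try:
    plain['new'].append(7)
except KeyError:
    print('plain dict has no default')
```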
I'm not sure whether the length test should apply to the keys or to the values.
Assuming it's the length of the key, here's how to do it with filter:
from collections import defaultdict
# create test data
my_defaultdict = defaultdict(list, {'ab': [1,2,3], 'c': [4], 'def': [5,6]})
my_defaultdict = defaultdict(my_defaultdict.default_factory,
                             filter(lambda i: len(i[0])>1, my_defaultdict.items()))
print(my_defaultdict)
Output:
defaultdict(<type 'list'>, {'ab': [1, 2, 3], 'def': [5, 6]})
If it's the length of the associated value, just change the len(i[0]) to len(i[1]).
You can use filter to find the keys and then map to delete each matching entry in place (this relies on Python 2, where filter returns a list and map runs eagerly):
>>> example={'l1':[1], 'l2':[2,3], 'l3':[4,5,6], 'l4':[7]}
>>> filter(lambda k: len(example[k])<2, example)
['l4', 'l1']
>>> map(example.__delitem__, filter(lambda k: len(example[k])<2, example))
[None, None]
>>> example
{'l2': [2, 3], 'l3': [4, 5, 6]}
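In Python 3 the session above does not carry over directly, because filter and map return lazy iterators. A rough Python 3 equivalent, as a sketch, collects the keys to remove first and then deletes them (`short_keys` is an illustrative name):

```python
example = {'l1': [1], 'l2': [2, 3], 'l3': [4, 5, 6], 'l4': [7]}

# Materialise the keys first so the dict is not resized while being iterated.
short_keys = [k for k in example if len(example[k]) < 2]
for k in short_keys:
    del example[k]

print(example)  # {'l2': [2, 3], 'l3': [4, 5, 6]}
```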

Python remove keys with the same value on a dictionary

I need to do a somewhat unnatural operation on a dictionary, so I'm wondering what the most Pythonic way to do it is.
I need to simplify a dictionary by removing keys whose value duplicates another key's value (the keys are different, the values are the same).
For example:
Input:
dict = {key1 : [1,2,3], key2: [1,2,6], key3: [1,2,3]}
expected output:
{key1 : [1,2,3], key2:[1,2,6]}
I don't care which key is deleted (in the example, key1 or key3).
Exchange keys and values; duplicated key-value pairs will be removed as a side effect (because dictionary does not allow duplicated keys). Exchange keys and values again.
>>> d = {'key1': [1,2,3], 'key2': [1,2,6], 'key3': [1,2,3]}
>>> d2 = {tuple(v): k for k, v in d.items()} # exchange keys, values
>>> d = {v: list(k) for k, v in d2.items()} # exchange again
>>> d
{'key2': [1, 2, 6], 'key1': [1, 2, 3]}
NOTE: tuple(v) was used because list is not hashable; cannot be used as key directly.
BTW, don't use dict as a variable name. It will shadow builtin function/type dict.
This solution deletes the keys with same values without creating a new dictionary.
seen = set()
for key in list(mydict.keys()):  # copy the keys so entries can be deleted while looping
    value = tuple(mydict[key])
    if value in seen:
        del mydict[key]
    else:
        seen.add(value)
You can also do it this way, although there are probably more efficient approaches, since it counts the values again for every key. It modifies the dictionary in place.
for i in list(dictionary.keys()):
    if list(dictionary.values()).count(dictionary[i]) > 1:
        del dictionary[i]
You can iterate over your dict items and use a set to check what we have seen so far, deleting a key if we have already seen the value:
d = {"key1" : [1,2,3], "key2": [1,2,6], "key3": [1,2,3]}
seen = set()
for k, v in list(d.items()):  # copy the items so d can be modified while looping
    temp = tuple(v)
    if temp in seen:
        del d[k]
    seen.add(temp)
print(d)
{'key1': [1, 2, 3], 'key2': [1, 2, 6]}
This will be more efficient than creating a dict and reversing it twice, as you only have to convert each value to a tuple once, not from a tuple back to a list.
this worked for me:
seen = set()
for key in mydict.copy():
    value = tuple(mydict[key])
    if value in seen:
        del mydict[key]
    else:
        seen.add(value)
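If you would rather not modify the dictionary while looping over it, the same seen-set idea can be sketched as a function that builds a new dictionary. `drop_duplicate_values` is a hypothetical name, and the sketch assumes Python 3.7+ (where dicts preserve insertion order), so the first key seen for each value is the one that is kept.

```python
def drop_duplicate_values(mapping):
    """Return a new dict keeping only the first key seen for each value."""
    seen = set()
    result = {}
    for key, value in mapping.items():
        marker = tuple(value)  # lists are unhashable, tuples are not
        if marker not in seen:
            seen.add(marker)
            result[key] = value
    return result


d = {'key1': [1, 2, 3], 'key2': [1, 2, 6], 'key3': [1, 2, 3]}
print(drop_duplicate_values(d))  # {'key1': [1, 2, 3], 'key2': [1, 2, 6]}
```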
