Quickest way to dedupe list in dict [duplicate] - python

I have a dict containing lists and need a fast way to dedupe the lists.
I know how to dedupe a list in isolation using the set() function, but in this case I want a fast way of iterating through the dict, deduping each list on the way.
hello = {'test1':[2,3,4,2,2,5,6], 'test2':[5,5,8,4,3,3,8,9]}
I'd like it to appear like;
hello = {'test1':[2,3,4,5,6], 'test2':[5,8,4,3,9]}
Though I don't necessarily need to have the original order of the lists preserved.
I've tried using a set like this, but it's not quite correct (it's not iterating properly and I'm losing the first key)
for key, value in hello.items(): goodbye = {key: set(value)}
>>> goodbye
{'test2': set([8, 9, 3, 4, 5])}
EDIT: Following PM 2Ring's comment below, I'm now populating the dict differently to avoid duplicates in the first place. Previously I was using lists, but a set never stores duplicates, so adding a value that is already present does nothing:
>>> my_numbers = {}
>>> my_numbers['first'] = [1,2,2,2,6,5]
>>> from collections import defaultdict
>>> final_list = defaultdict(set)
>>> for n in my_numbers['first']: final_list['test_first'].add(n)
...
>>> final_list['test_first']
set([1, 2, 5, 6])
As you can see, the final output is a deduped set, as required.

The loop iterates correctly; the problem is that each pass rebinds goodbye to a brand-new one-item dict, so only the last key survives. Create an empty dict before the loop, then assign each key's value inside the loop.
goodbye = {}
for key, value in hello.items(): goodbye[key] = set(value)
>>> goodbye
{'test1': set([2, 3, 4, 5, 6]), 'test2': set([8, 9, 3, 4, 5])}
Also, since sets don't preserve order, if you do want to keep the original order, write a simple function that builds a new list and skips values that have already been added.
def uniqueList(li):
    newList = []
    for x in li:
        if x not in newList:
            newList.append(x)
    return newList
goodbye = {}
for key, value in hello.items(): goodbye[key] = uniqueList(value)
>>> goodbye
{'test1': [2, 3, 4, 5, 6], 'test2': [5, 8, 4, 3, 9]}

You can use a list comprehension with a deduplicate function that preserves the order:
def deduplicate(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]
{key: deduplicate(value) for key, value in hello.items()}
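As an alternative sketch (a different technique from the function above), the keys of a plain dict can do the same job. This assumes Python 3.7 or later, where dicts keep insertion order, and that the list items are hashable. The name dedupe_keep_order is illustrative only.

```python
# Sketch: order-preserving dedupe using dict keys.
# Assumes Python 3.7+ (ordered dicts) and hashable list items.
hello = {'test1': [2, 3, 4, 2, 2, 5, 6], 'test2': [5, 5, 8, 4, 3, 3, 8, 9]}


def dedupe_keep_order(seq):
    # dict.fromkeys keeps only the first occurrence of each key,
    # in the order the keys were first seen.
    return list(dict.fromkeys(seq))


deduped = {key: dedupe_keep_order(value) for key, value in hello.items()}
print(deduped)
# {'test1': [2, 3, 4, 5, 6], 'test2': [5, 8, 4, 3, 9]}
```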

>>> hello = {'test1':[2,3,4,2,2,5,6], 'test2':[5,5,8,4,3,3,8,9]}
>>> for key, value in hello.items():
...     hello[key] = list(set(value))
...
>>> hello
{'test1': [2, 3, 4, 5, 6], 'test2': [8, 9, 3, 4, 5]}

This is a more verbose way of doing it, which preserves order and works in all Python versions:
for key in hello:
    s = set()
    l = []
    for subval in hello[key]:
        if subval not in s:
            l.append(subval)
            s.add(subval)
    hello[key] = l

my_list = [1,2,2,2,3,4,5,6,7,7,7,7,7,8,9,10]
seen = set()
print(list(filter(lambda x: x not in seen and not seen.add(x), my_list)))

Related

Adding Two Lists into One Dictionary (Python)

I have two lists, one containing a list of keys, and another containing the values to be assigned to each key, chronologically, by key.
For example;
key_list = ['cat', 'dog', 'salamander']
value_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
I'm looking to make a quick method that takes these two lists, and from it can spit out a dictionary that looks like this:
key_value_pairs = {
'cat': [1, 4, 7],
'dog': [2, 5, 8],
'salamander': [3, 6, 9]
}
Regardless of the length of the values, I'm looking for a way to just iterate through each value and append it to a dictionary containing one entry for each item in the key_list. Any ideas?
key_value_pairs = {k: [v for v_i, v in enumerate(value_list) if v_i % len(key_list) == k_i] for k_i, k in enumerate(key_list)}
Edit: that's a fun one-liner, but it has worse time complexity than the following solution, which doesn't use any nested loops:
lists = [[] for _ in key_list]
for i, v in enumerate(value_list):
    lists[i % len(key_list)].append(v)
key_value_pairs = dict(zip(key_list, lists))
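The same round-robin assignment can also be sketched with extended slicing, where every key takes each len(key_list)-th value starting at its own index. This is an alternative to the loop above, not a replacement for it, and the name step is introduced here only for illustration.

```python
# Sketch: round-robin distribution of values using extended slices.
key_list = ['cat', 'dog', 'salamander']
value_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

step = len(key_list)
# value_list[i::step] takes items i, i + step, i + 2 * step, ...
key_value_pairs = {key: value_list[i::step] for i, key in enumerate(key_list)}
print(key_value_pairs)
# {'cat': [1, 4, 7], 'dog': [2, 5, 8], 'salamander': [3, 6, 9]}
```

This also works when the number of values is not a multiple of the number of keys; the later keys simply receive shorter lists.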

Can a loop reference a list without naming it, within a global frame?

I have been tasked with grouping a list by frequency. This is a very common question on Stack Overflow, and so far the forum has been very educational. However, of all the examples given, only one satisfies these requirements:
Sort the given iterable so that its elements end up in the decreasing frequency order.
If two elements have the same frequency, they should end up in the same order as the first appearance in the iterable.
Using these two lists:
[4, 6, 2, 2, 6, 4, 4, 4]
[17, 99, 42]
The following commonly suggested solutions to this question fail.
from collections import Counter
freq = Counter(items)
# Ex 1
# The items don't stay grouped in the final list :(
sorted(items, key = items.count, reverse=True)
sorted(items, key=lambda x: -freq[x])
[4, 4, 4, 4, 6, 2, 2, 6]
# Ex 2
# The order that the items appear in the list gets rearranged :(
sorted(sorted(items), key=freq.get, reverse=True)
[4, 4, 4, 4, 2, 2, 6, 6]
# Ex 3
# With a list of integers, after the quantity gets sorted,
# the int value gets sorted :(
sorted(items, key=lambda x: (freq[x], x), reverse=True)
[99, 42, 17]
I did find a solution that works great though:
s_list = sorted(freq, key=freq.get, reverse=True)
new_list = []
for num in s_list:
    for rep in range(freq[num]):
        new_list.append(num)
print(new_list)
I can't figure out how the second loop references the number of occurrences though.
I ran the process through pythontutor to visualize it, and the code seems to simply know that there are four "4", two "6" and two "2" in the 'items' list. The only explanation I can think of is that Python can reference a list in a global frame without it being named, or perhaps that it uses the values from the "freq" dictionary. Is this correct?
referenced thread:
Sort list by frequency in python
Yes, the values of freq are the ones making the second loop work.
freq is a Counter:
It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values.
In other words, freq is a dictionary whose keys are the unique elements of items, each mapped to the number of times it appeared in items.
And to illustrate your example:
>>> from collections import Counter
>>> items = [4, 6, 2, 2, 6, 4, 4, 4]
>>> freq = Counter(items)
>>> freq
Counter({4: 4, 6: 2, 2: 2})
So when range(freq[num]) is iterated over in your second loop, the loop body simply runs once for each time num appeared in items.
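To make that concrete, here is a minimal sketch of the same idea wrapped in a function. The function name group_by_frequency is made up for this example. It assumes Python 3.7 or later, where Counter keeps keys in first-seen order; since sorted() is stable even with reverse=True, ties stay in order of first appearance.

```python
from collections import Counter


def group_by_frequency(items):
    # Count how many times each distinct element occurs.
    freq = Counter(items)
    result = []
    # Iterating over freq yields its keys (the distinct elements);
    # freq[value] is the stored count used to repeat each one.
    for value in sorted(freq, key=freq.get, reverse=True):
        result.extend([value] * freq[value])
    return result


print(group_by_frequency([4, 6, 2, 2, 6, 4, 4, 4]))
# [4, 4, 4, 4, 6, 6, 2, 2]
print(group_by_frequency([17, 99, 42]))
# [17, 99, 42]
```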
Edit 2019-02-13: Additional information and example for Python Tutor
It looks like Python Tutor represents simple built-in types (integers, strings, ...) as-is, and not as "objects" in their own cell.
You can see the references clearly if you use new objects instead of integers. For instance, you could wrap each integer like this:
from collections import Counter
class MyIntWrapper:
    def __init__(self, value):
        self.value = value
items = [4, 6, 2, 2, 6, 4, 4, 4]
items_wrapped = [MyIntWrapper(item) for item in items]
freq = Counter(items_wrapped)
s_list = sorted(freq, key=freq.get, reverse=True)
new_list = []
for num in s_list:
    for rep in range(freq[num]):
        new_list.append(num)

Python list comprehension: adding unique elements into list?

I'm trying to write simpler code for adding unique elements into a python list. I have a dataset that contains a list of dictionaries, and I'm trying to iterate through a list inside the dictionary
Why doesn't this work? It's adding all the items, including the duplicates, instead of adding unique items.
unique_items = []
unique_items = [item for d in data for item in d['items'] if item not in unique_items]
vs. the longer form which works:
unique_items = []
for d in data:
    for item in d['items']:
        if item not in unique_items:
            unique_items.append(item)
Is there a way of making this work using list comprehension, or am I stuck with using double for loops? I want to keep the ordering for this.
Here's the list of dictionaries:
[{"items":["apple", "banana"]}, {"items":["banana", "strawberry"]}, {"items":["blueberry", "kiwi", "apple"]}]
output should be ["apple", "banana", "strawberry", "blueberry", "kiwi"]
I noticed someone asking a similar question on another post: Python list comprehension, with unique items, but I was wondering if there's another way to do it without OrderedDict or if that's the best way
unique_items isn't updated while the list comprehension runs; the name is only rebound once the comprehension has finished, so every membership test is made against the original empty list.
I would do this instead:
data = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 1, 2, 3, 4,]
items = []
_ = [items.append(d) for d in data if d not in items]
print(items)
and I get:
[1, 2, 3, 4, 5, 6]
But there are more efficient ways to do this anyway.
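As one sketch of a more efficient approach (not the only one), a set can track what has already been seen, so each membership test is O(1) on average instead of a scan of the result list. The data below is the example from the question; the names seen and unique_items are illustrative, and the items are assumed to be hashable.

```python
# Sketch: order-preserving flatten-and-dedupe with a helper set.
data = [
    {"items": ["apple", "banana"]},
    {"items": ["banana", "strawberry"]},
    {"items": ["blueberry", "kiwi", "apple"]},
]

seen = set()        # fast membership tests
unique_items = []   # keeps first-seen order

for d in data:
    for item in d["items"]:
        if item not in seen:
            seen.add(item)
            unique_items.append(item)

print(unique_items)
# ['apple', 'banana', 'strawberry', 'blueberry', 'kiwi']
```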
Why not just use set?
e.g. -
>>> data = {1: {'items': [1, 2, 3, 4, 5]}, 2: {'items': [1, 2, 3, 4, 5]}}
>>> {val for item in data for val in data[item]['items']}
{1, 2, 3, 4, 5}
If you want a list:
>>> list({val for item in data for val in data[item]['items']})
[1, 2, 3, 4, 5]
Instead of the curly braces {} of the set comprehension, you could also pass a generator expression to the built-in set() function, since the braces may be overly obscure for some.
The easiest way is to use OrderedDict:
from collections import OrderedDict
from itertools import chain
l = [{"items":["apple", "banana"]}, {"items":["banana", "strawberry"]}, {"items":["blueberry", "kiwi", "apple"]}]
list(OrderedDict.fromkeys(chain.from_iterable(d['items'] for d in l))) # ['apple', 'banana', 'strawberry', 'blueberry', 'kiwi']
If you want alternatives, check the OrderedSet recipe and the package based on it.

python replace list values using a tuple

If I have a list:
my_list = [3,2,2,3,4,1,3,4]
and a tuple
my_tuple = (3,5)
What's the best way of replacing elements in my_list using the tuple:
result = [5,2,2,5,4,1,5,4]
e.g.
for item in my_list:
    if item == my_tuple[0]:
        item = my_tuple[1]
More generally, I would have a list of lists, and a list of tuples, and I would like to apply each of the tuples to each of the lists within the list of lists.
The more natural data structure for my_tuple is a dictionary. Consider something like this and use the .get() method:
>>> my_lists = [[3,2,2,3,4,1,3,4], [1,2,3,4,5,6]]
>>> my_tuple_list = [(3,5), (6, 7)]
>>> my_dict = dict(my_tuple_list)
>>> my_dict
{3: 5, 6: 7}
>>> my_lists = [[my_dict.get(x,x) for x in somelist] for somelist in my_lists]
>>> my_lists
[[5, 2, 2, 5, 4, 1, 5, 4], [1, 2, 5, 4, 5, 7]]
Per @Wooble's comment, your code will work if you use enumerate.
list_of_lists = [[3,2,2,3,4,1,3,4], [1,3,5,3,4,6,3]]
list_of_tuples = [(3,5), (1,9)]
def tup_replace(mylist, mytuple):
    for i, item in enumerate(mylist):
        if item == mytuple[0]:
            mylist[i] = mytuple[1]
    return mylist
Then you can nest that to work on a list of lists and a list of tuples:
for mylist in list_of_lists:
    for mytuple in list_of_tuples:
        mylist = tup_replace(mylist, mytuple)
    print(mylist)
That said, the dictionary approach is probably better.
Using if item == my_tuple[0], ... in a general case sounds like you are making a switch statement that you want to apply to each item in your list. Use a dictionary instead if you can. (Why isn't there a switch or case statement in python?)
Convert your list of tuples to a lookup dictionary (python's switch statement):
replacements = dict(my_tuples)  # thanks to @julio
Then for a single list, reproduce the list with a comprehension, but replace each value with the new value from replacements if it exists:
replaced_list = [replacements.get(original,original) for original in my_list]
I guess there is a more efficient way to do it, but that's for a single list with a list of tuples. You say you also need to do it for a list of lists? Just nest that?
Could you explain more about where you are getting this data and why you need to do it?
If you are trying to replace every 3 in your list with 5, this will do:
[my_tuple[1] if x == my_tuple[0] else x for x in my_list]
If you want to do this with more than one "translation" tuple, then I strongly suggest using a dictionary instead:
trans = {3: 5, 4: 6}
[trans.get(x,x) for x in my_list]
And in the more general case where you have more than one list:
ll = [[3, 2, 3, 4], [5, 4, 3, 4]]
trans = {3: 5, 4: 6}
for i in range(len(ll)):
    ll[i] = [trans.get(x,x) for x in ll[i]]
This assumes that you want to replace every old list in ll with the new one.

List of unique items in a list of tuples

I have a list of tuples like this: mylist = [(1,2,3),(6,1,1),(7,8,1),(3,4,5)]. If I use the list comprehension slist = [item for sublist in mylist for item in sublist], I could get slist = [1,2,3,6,1,1,7,8,1,3,4,5].
How should I modify it if I need only the unique elements in slist, like this: [1,2,3,6,7,8,4,5]?
Use a set instead of a list.
set(slist)
If you really need it as a list then you can convert it back to a list:
slist = list(set(slist))
Note that this conversion won't preserve the original order of the elements. If you need the same order you can use this instead:
>>> result = []
>>> seen = set()
>>> for innerlist in mylist:
...     for item in innerlist:
...         if item not in seen:
...             seen.add(item)
...             result.append(item)
...
>>> result
[1, 2, 3, 6, 7, 8, 4, 5]
You can actually make your first part a bit easier by using itertools.chain.from_iterable and then passing the result to set, which will only retain the unique elements:
>>> mylist = [(1,2,3),(6,1,1),(7,8,1),(3,4,5)]
>>> import itertools
>>> set(itertools.chain.from_iterable(mylist))
set([1, 2, 3, 4, 5, 6, 7, 8])
import itertools
chain = itertools.chain(*mylist)
print(set(chain))
taken from Flattening a shallow list in Python and adapted for use in this question.
You should use python sets http://docs.python.org/library/sets.html
set(yourlist)
it will do the trick
