I'm trying to write simpler code for adding unique elements into a python list. I have a dataset that contains a list of dictionaries, and I'm trying to iterate through a list inside the dictionary
Why doesn't this work? It's adding all the items, including the duplicates, instead of adding unique items.
unique_items = []
unique_items = [item for d in data for item in d['items'] if item not in unique_items]
vs. the longer form which works:
unique_items = []
for d in data:
for item in d['items']:
if (item not in unique_items):
unique_items.append(item)
Is there a way of making this work using list comprehension, or am I stuck with using double for loops? I want to keep the ordering for this.
Here's the list of dictionaries:
[{"items":["apple", "banana"]}, {"items":["banana", "strawberry"]}, {"items":["blueberry", "kiwi", "apple"]}]
output should be ["apple", "banana", "strawberry", "blueberry", "kiwi"]
I noticed someone asking a similar question on another post: Python list comprehension, with unique items, but I was wondering if there's another way to do it without OrderedDict or if that's the best way
all_items isn't continuously overwritten during the list comprehension, so you're constantly looking for things in an empty list.
I would do this instead:
data = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 1, 2, 3, 4,]
items = []
_ = [items.append(d) for d in data if d not in items]
print(items)
and I get:
[1, 2, 3, 4, 5, 6]
But there are more efficient ways to do this anyway.
Why not just use set?
e.g. -
>>> data = {1: {'items': [1, 2, 3, 4, 5]}, 2: {'items': [1, 2, 3, 4, 5]}}
>>> {val for item in data for val in data[item]['items']}
>>> {1, 2, 3, 4, 5}
If you want a list:
>>> list(repeat above)
>>> [1, 2, 3, 4, 5]
Instead of the curly braces {} for the set you could also just use the set keyword, since the braces may be overly obscure for some.
Here's a link to the syntax
The easiest way is to use OrderedDict:
from collections import OrderedDict
from itertools import chain
l = [{"items":["apple", "banana"]}, {"items":["banana", "strawberry"]}, {"items":["blueberry", "kiwi", "apple"]}]
OrderedDict.fromkeys(chain.from_iterable(d['items'] for d in l)).keys() # ['apple', 'banana', 'strawberry', 'blueberry', 'kiwi']
If you want alternatives check OrderedSet recipe and package based on it.
Related
I have two lists, one containing a list of keys, and another containing the values to be assigned to each key, chronologically, by key.
For example;
key_list = ['cat', 'dog', 'salamander']
value_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
I'm looking to make a quick method that takes these two lists, and from it can spit out a dictionary that looks like this:
key_value_pairs = {
'cat': [1, 4, 7],
'dog': [2, 5, 8],
'salamander': [3, 6, 9]
}
Regardless of the length of the values, I'm looking for a way to just iterate through each value and amend them to a dictionary containing one entry for each item in the key_list. Any ideas?
key_value_pairs = {k: [v for v_i, v in enumerate(value_list) if v_i % len(key_list) == k_i] for k_i, k in enumerate(key_list)}
Edit: that's a fun one-liner, but it has worse time complexity than the following solution, which doesn't use any nested loops:
lists = [[] for _ in key_list]
for i, v in enumerate(value_list):
lists[i % len(key_list)].append(v)
key_value_pairs = dict(zip(keys, lists))
This question was previously asked here with an egregious typo: Counting "unique pairs" of numbers into a python dictionary?
This is an algorithmic problem, and I don't know of the most efficient solution. My idea would be to somehow cache values in a list and enumerate pairs...but that would be so slow. I'm guessing there's something useful from itertools.
Let's say I have a list of integers whereby are never repeats:
list1 = [2, 3]
In this case, there is a unique pair 2-3 and 3-2, so the dictionary should be:
{2:{3: 1}, 3:{2: 1}}
That is, there is 1 pair of 2-3 and 1 pair of 3-2.
For larger lists, the pairing is the same, e.g.
list2 = [2, 3, 4]
has the dicitonary
{2:{3:1, 4:1}, 3:{2:1, 4:1}, 4:{3:1, 2:1}}
(1) Once the size of the lists become far larger, how would one algorithmically find the "unique pairs" in this format using python data structures?
(2) I mentioned that the lists cannot have repeat integers, e.g.
[2, 2, 3]
is impossible, as there are two 2s.
However, one may have a list of lists:
list3 = [[2, 3], [2, 3, 4]]
whereby the dictionary must be
{2:{3:2, 4:1}, 3:{2:2, 4:1}, 4:{2:1, 3:1}}
as there are two pairs of 2-3 and 3-2. How would one "update" the dictionary given multiple lists within a list?
EDIT: My ultimate use case is, I want to iterate through hundreds of lists of integers, and create a single dictionary with the "counts" of pairs. Does this make sense? There might be another data structure which is more useful.
For the nested list example, you can do the following, making use of itertools.permutations and dict.setdefault:
from itertools import permutations
list3 = [[2, 3], [2, 3, 4]]
d = {}
for l in list3:
for a, b in permutations(l, 2):
d[a][b] = d.setdefault(a, {}).setdefault(b, 0) + 1
# {2: {3: 2, 4: 1}, 3: {2: 2, 4: 1}, 4: {2: 1, 3: 1}}
For flat lists l, use only the inner loop and omit the outer one
For this example I'll just use a list with straight numbers and no nested list:
values = [3, 2, 4]
result = dict.from_keys(values)
for key, value in result.items():
value = {}
for num in values:
if num != key:
value[num] = 1
This creates a dict with each number as a key. Now in each key, make the value a nested dict who's contents are num: 1 for each number in the original values list if it isn't the name of the key that we're in
use defaultdict with permutations
from collections import defaultdict
from itertools import permutations
d = defaultdict(dict)
for i in [x for x in permutations([4,2,3])]:
d[i[0]] = {k: 1 for k in i[1:]}
output is
In [22]: d
Out[22]: defaultdict(dict, {2: {3: 1, 4: 1}, 4: {2: 1, 3: 1}, 3: {2: 1, 4: 1}})
for inherit list of lists https://stackoverflow.com/a/52206554/8060120
This question already has answers here:
How do I remove duplicates from a list, while preserving order?
(31 answers)
Closed 7 years ago.
I have a dict containing lists and need a fast way to dedupe the lists.
I know how to dedupe a list in isolation using the set() function, but in this case I want a fast way of iterating through the dict, deduping each list on the way.
hello = {'test1':[2,3,4,2,2,5,6], 'test2':[5,5,8,4,3,3,8,9]}
I'd like it to appear like;
hello = {'test1':[2,3,4,5,6], 'test2':[5,8,4,3,9]}
Though I don't necessarily need to have the original order of the lists preserved.
I've tried using a set like this, but it's not quite correct (it's not iterating properly and I'm losing the first key)
for key, value in hello.items(): goodbye = {key: set(value)}
>>> goodbye
{'test2': set([8, 9, 3, 4, 5])}
EDIT: Following PM 2Ring's comment below, I'm now populating the dict differently to avoid duplicates in the first place. Previously I was using lists, but using sets prevents dupes to be appended by default;
>>> my_numbers = {}
>>> my_numbers['first'] = [1,2,2,2,6,5]
>>> from collections import defaultdict
>>> final_list = defaultdict(set)
>>> for n in my_numbers['first']: final_list['test_first'].add(n)
...
>>> final_list['test_first']
set([1, 2, 5, 6])
As you can see, the final output is a deduped set, as required.
It's not iterating wrong, you're just assigning goodbye as a new dict each time. You need to assign as an empty dict then assign the values to keys in each iteration.
goodbye = {}
for key, value in hello.items(): goodbye[key] = set(value)
>>> goodbye
{'test1': set([2, 3, 4, 5, 6]), 'test2': set([8, 9, 3, 4, 5])}
Also since sets don't preserve order, if you do want to preserve order it's best to make a simple iterating function that will return a new list that skips over already added values.
def uniqueList(li):
newList = []
for x in li:
if x not in newList:
newList.append(x)
return newList
goodbye = {}
for key, value in hello.items(): goodbye[key] = uniqueList(value)
>>> goodbye
{'test1': [2, 3, 4, 5, 6], 'test2': [5, 8, 4, 3, 9]}
You can use a list comprehension with a deduplicate function that preserves the order:
def deduplicate(seq):
seen = set()
seen_add = seen.add
return [ x for x in seq if not (x in seen or seen_add(x))]
{key: deduplicate(value) for key, value in hello.items()}
>>>hello = {'test1':[2,3,4,2,2,5,6], 'test2':[5,5,8,4,3,3,8,9]}
>>>for key,value in hello.iteritems():
hello[key] = list(set(value))
>>>hello
{'test1': [2, 3, 4, 5, 6], 'test2': [8, 9, 3, 4, 5]}
This is a more verbose way of doing it, which preserves order and works in all Python versions:
for key in hello:
s = set()
l = []
for subval in hello[key]:
if subval not in s:
l.append(subval)
s.add(subval)
hello[key] = l
my_list = [1,2,2,2,3,4,5,6,7,7,7,7,7,8,9,10]
seen = set()
print list(filter(lambda x:x not in seen and not seen.add(x),my_list))
If I have a list:
my_list = [3,2,2,3,4,1,3,4]
and a tuple
my_tuple = (3,5)
What's the best way of replacing elements in my_list using the tuple:
result = [5,2,2,5,4,1,5,4]
e.g.
for item in my_list:
if(item == my_tuple[0]):
item = my_tuple[1]
More generally, I would have a list of lists, and a list of tuples, and I would like to apply each of the tuples to each of the lists within the list of lists.
The more natural data structure for my_tuple is a dictionary. Consider something like this and use the .get() method:
>>> my_lists = [[3,2,2,3,4,1,3,4], [1,2,3,4,5,6]]
>>> my_tuple_list = [(3,5), (6, 7)]
>>> my_dict = dict(my_tuple_list)
>>> my_dict
{3: 5, 6: 7}
>>> my_lists = [[my_dict.get(x,x) for x in somelist] for somelist in my_lists]
>>> my_lists
[[5, 2, 2, 5, 4, 1, 5, 4], [1, 2, 5, 4, 5, 7]]
Per #Wooble's comment, your code will work if you enumerate.
list_of_lists = [[3,2,2,3,4,1,3,4], [1,3,5,3,4,6,3]]
list_of_tuples = [(3,5), (1,9)]
def tup_replace(mylist, mytuple):
for i, item in enumerate(mylist):
if item == mytuple[0]:
mylist[i] = mytuple[1]
return mylist
then you can just nest that some more to work on a list of list and list of tuples.
for mylist in list_of_lists:
for mytuple in list_of_tuples:
mylist = tup_replace(mylist, mytuple)
print mylist
That said, the dictionary approach is probably better.
Using if item == my_tuple[0], ... in a general case sounds like you are making a switch statement that you want to apply to each item in your list. Use a dictionary instead if you can. (Why isn't there a switch or case statement in python?)
Convert your list of tuples to a lookup dictionary (python's switch statement):
replacements = dict(my_tuples) #thanks to #julio
Then for a single list, reproduce the list with a comprehension, but replace each value with the new value from replacements if it exists:
replaced_list = [replacements.get(original,original) for original in my_list]
I guess there is a more efficient way to do it, but that's for a single list with a list of tuples. You say you also need to do it for a list of lists? Just nest that?
Could you explain more about where you are getting this data and why you need to do it?
If you are trying to replace every 3 in your list with 5, this will do:
[x == my_tuple[0] and my_tuple[1] or x for x in my_list]
If you want to do this, with more than one "translational" tuple, then I really suggest to use a dictionary instead:
trans = {3: 5, 4: 6}
[trans.get(x,x) for x in my_list]
And in the more general case where you have more than one list:
ll = [[3, 2, 3, 4], [5, 4, 3, 4]]
trans = {3: 5, 4: 6}
for i in range(len(ll)):
ll[i] = [trans.get(x,x) for x in ll[i]]
Supposing that you want to replace every old list in ll with the new one.
I have a list of tuples like this: mylist = [(1,2,3),(6,1,1),(7,8,1),(3,4,5)]. If I use the list comprehension slist = [item for sublist in mylist for item in sublist], I could get slist = [1,2,3,6,1,1,7,8,1,3,4,5].
How should I modify if I need only unique elements in slist like this [1,2,3,6,7,8,4,5]?
Use a set instead of a list.
set(slist)
If you really need it as a list then you can convert it back to a list:
slist = list(set(slist))
Note that this conversion won't preserve the original order of the elements. If you need the same order you can use this instead:
>>> result = []
>>> seen = set()
>>> for innerlist in mylist:
for item in innerlist:
if not item in seen:
seen.add(item)
result.append(item)
>>> result
[1, 2, 3, 6, 7, 8, 4, 5]
You can actually make your first part a bit easier by using itertools.chain.from_iterable and then passing the result to set, which will only retain the unique elements:
>>> mylist = [(1,2,3),(6,1,1),(7,8,1),(3,4,5)]
>>> import itertools
>>> set(itertools.chain.from_iterable(mylist))
set([1, 2, 3, 4, 5, 6, 7, 8])
import itertools
chain = itertools.chain(*mylist)
print(set(chain))
taken from Flattening a shallow list in Python and adapted for use in this question.
You should use python sets http://docs.python.org/library/sets.html
set(yourlist)
it will do the trick