How to take duplicated values from the several lists? - python

I have several lists in python, and I would like to take only values which are in each list, is there any function to do it directly?
for example I have:
{'a','b','c','d','e'},{'a','g','c','d','h','e'}, {'i','b','m','d','e','a'}
and I want to make one list which contains
{'a','d','e'}
but i don't know how many lists I actually have, cause it's dependent on value 'i'.
thanks for any help!

if the elements are unique and hashable (and order doesn't matter in the result), you can use set intersection: e.g.:
common_elements = list(set(list1).intersection(list2).intersection(list3))
This is functionally equivalent to:
common_elements = list( set(list1) & set(list2) & set(list3) )
The & operator only works with sets whereas the the intersection method works with any iterable.
If you have a list of lists and you want the intersection of all of them, you can do this easily:
common_elements = list( set.intersection( * map(set, your_list_of_lists) ) )
special thanks to DSM for pointing this one out
Or you could just use a loop:
common_elements = set(your_list_of_lists[0])
for elem in your_list_of_lists[1:]:
common_elements = common_elements.intersection(elem) #or common_elements &= set(elem) ...
else:
common_elements = list(common_elements)
Note that if you really want to get the order that they were in the original list, you can do that using a simple sort:
common_elements.sort( key = lambda x, your_list_of_lists[0].index(x) )
By construction, there is no risk of a ValueError being raised here.

Just to put a one-liner on the table:
l=['a','b','c','d','e'],['a','g','c','d','h'], ['i','b','m','d','e']
reduce(lambda a, b: a & b, map(set, l))
or
from operator import and_
l=['a','b','c','d','e'],['a','g','c','d','h'], ['i','b','m','d','e']  
reduce(and_, map(set, l))  

You need make set from first list, then use set's .intersection() method.
a, b, c = ['a','b','c','d','e'], ['a','g','c','d','h'], ['i','b','m','d','e']
exists_in_all = set(a).intersection(b).intersection(c)
Updated.
Simplified according to mgilson's comment.

from operator import and_
import operator
a = [['a','b','c','d','e'],['a','g','c','d','h','e'], ['i','b','m','d','e','a']]
print list(reduce(operator.and_, map(set, a)))
it will give you the commeen element from the list
['a', 'e', 'd']

Related

Dropping values from a list of tuples

I have a list of tuples which I would like to only return the second column of data from and only unique values
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
Desired output:
['Andrew#gmail.com','Jim#gmail.com','Sarah#gmail.com']
My idea would be to iterate through the list and append the item from the second column into a new list then use the following code. Before I go down that path too far I know there is a better way to do this.
from collections import Counter
cnt = Counter(mytuple_new)
unique_mytuple_new = [k for k, v in cnt.iteritems() if v > 1]
You can use zip function :
>>> set(zip(*mytuple)[1])
set(['Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com'])
Or as a less performance way you can use map and operator.itemgetter and use set to get the unique tuple :
>>> from operator import itemgetter
>>> tuple(set(map(lambda x:itemgetter(1)(x),mytuple)))
('Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com')
a benchmarking on some answers :
my answer :
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set(zip(*mytuple)[1])
"""
print timeit.timeit(stmt=s, number=100000)
0.0740020275116
icodez answer :
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
seen = set()
[x[1] for x in mytuple if x[1] not in seen and not seen.add(x[1])]
"""
print timeit.timeit(stmt=s, number=100000)
0.0938332080841
Hasan's answer :
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set([k[1] for k in mytuple])
"""
print timeit.timeit(stmt=s, number=100000)
0.0699651241302
Adem's answer :
s = """
from itertools import izip
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set(map(lambda x: x[1], mytuple))
"""
print timeit.timeit(stmt=s, number=100000)
0.237300872803 !!!
unique_emails = set(item[1] for item in mytuple)
The list comprehension will help you generate a list containing only the second column data, and converting that list to set() removes duplicated values.
try:
>>> unique_mytuple_new = set([k[1] for k in mytuple])
>>> unique_mytuple_new
set(['Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com'])
You can use a list comprehension and a set to keep track of seen values:
>>> mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
>>> seen = set()
>>> [x[1] for x in mytuple if x[1] not in seen and not seen.add(x[1])]
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
>>>
The most important part of this solution is that order is preserved like in your example. Doing just set(x[1] for x in mytuple) or something similar will get you the unique items, but their order will be lost.
Also, the if x[1] not in seen and not seen.add(x[1]) may seem a little strange, but it is actually a neat trick that allows you to add items to the set inside the list comprehension (otherwise, we would need to use a for-loop).
Because and performs short-circuit evaluation in Python, not seen.add(x[1]) will only be evaluated if x[1] not in seen returns True. So, the condition sees if x[1] is in the set and adds it if not.
The not operator is placed before seen.add(x[1]) so that the condition evaluates to True if x[1] needed to be added to the set (set.add returns None, which is treated as False. not False is True).
How about the obvious and simple loop? There is no need to create a list and then convert to set, just don't add dupliates.
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
result = []
for item in mytuple:
if item[1] not in result:
result.append(item[1])
print result
Output:
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
Is the order of the items important? A lot of the proposed answers use set to unique-ify the list. That's good, proper, and performant if the order is unimportant. If order does matter, you can used an OrderedDict to perform set-like unique-ification while preserving order.
# test data
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
from collections import OrderedDict
emails = list(OrderedDict((t[1], 1) for t in mytuple).keys())
print emails
Yielding:
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
Update
Based on iCodez's suggestion, restating answer to:
from collections import OrderedDict
emails = list(OrderedDict.fromkeys(t[1] for t in mytuple).keys())

How to use pop for multidimensional array in pythonic way

I want to remove the list items found in list B from the list A. This is the function I wrote:
def remove(A,B):
to_remove=[];
for i in range(len(A)):
for j in range(len(B)):
if (B[j]==A[i]):
to_remove.append(i);
for j in range(len(to_remove)):
A.pop(to_remove[j]);
Is this the normal way to do it ? Although, this works completely fine (if typos, I don't know), I think there might be more pythonic way to do it. Please suggest.
Convert B to a set first and then create a new array from A using a list comprehension:
s = set(B)
A = [item for item in A if item not in s]
Item lookup in a set is an O(1) operation.
If you don't want to change the id() of A, then:
A[:] = [item for item in A if item not in s]
First, note that your function doesn't work right. Try this:
A = [1, 2, 3]
B = [1, 2, 3]
remove(A, B)
You'll get an IndexError, because the correct indices to delete change each time you do a .pop().
You'll doubtless get answers recommending using sets, and that's indeed much better if the array elements are hashable and comparable, but in general you may need something like this:
def remove(A, B):
A[:] = [avalue for avalue in A if avalue not in B]
That works for any kinds of array elements (provided only they can be compared for equality), and preserves the original ordering. But it takes worst-case time proportional to len(A) * len(B).
List comprehenstion to the rescue:
[item for item in A if item not in B]
This however creates a new list. You can return the list from the function.
Or, if you are ok with loosing any duplicates in list A, or there are no duplicates, you can use set difference:
return list(set(A) - set(B))
One caveat is, this won't preserve the order of elements in A. So, if you want elements in order, this is not what you want. Use the 1st approach instead.
What about list comprehension?
def remove(removeList, fromList):
return [x for x in fromList if x not in removeList]
Also, to make life easier and removing faster you can make a set from list removeList, leaving only unique elements:
def remove(removeList, fromList):
removeSet = set(removeList)
return [x for x in fromList if x not in removeSet]
>>> print remove([1,2,3], [1,2,3,4,5,6,7])
[4, 5, 6, 7]
And, of course, you can use built-in filter function, though someone will say that it's non-pythonic, and you should use list generators instead. Either way, here is an example:
def remove(removeList, fromList):
removeSet = set(removeList)
return filter(lambda x : x not in removeSet, fromList)

What is the simplest and most efficient function to return a sublist based on an index list?

Say I have a list l:
['a','b','c','d','e']
and a list of indexes idx:
[1,3]
What is the simplest and most efficient function that will return:
['b','d']
Try using this:
[l[i] for i in idx]
You want operator.itemgetter.
In my first example, I'll show how you can use itemgetter to construct a callable which you can use on any indexable object:
from operator import itemgetter
items = itemgetter(1,3)
items(yourlist) #('b', 'd')
Now I'll show how you can use argument unpacking to store your indices as a list
from operator import itemgetter
a = ['a','b','c','d','e']
idx = [1,3]
items = itemgetter(*idx)
print items(a) #('b', 'd')
Of course, this gives you a tuple, not a list, but it's trivial to construct a list from a tuple if you really need to.
Here is an option using a list comprehension:
[v for i, v in enumerate(l) if i in idx]
This will be more efficient if you convert idx to a set first.
An alternative with operator.itemgetter:
import operator
operator.itemgetter(*idx)(l)
As noted in comments, [l[i] for i in idx] will probably be your best bet here, unless idx may contain indices greater than the length of l, or if idx is not ordered and you want to keep the same order as l.

How to find common elements in list of lists?

I'm trying to figure out how to compare an n number of lists to find the common elements.
For example:
p=[ [1,2,3],
[1,9,9],
..
..
[1,2,4]
>> print common(p)
>> [1]
Now if I know the number of elements I can do comparions like:
for a in b:
for c in d:
for x in y:
...
but that wont work if I don't know how many elements p has. I've looked at this solution that compares two lists
https://stackoverflow.com/a/1388864/1320800
but after spending 4 hrs trying to figure a way to make that recursive, a solution still eludes me so any help would be highly appreciated!
You are looking for the set intersection of all the sublists, and the data type you should use for set operations is a set:
result = set(p[0])
for s in p[1:]:
result.intersection_update(s)
print result
A simple solution (one-line) is:
set.intersection(*[set(list) for list in p])
The set.intersection() method supports intersecting multiple inputs at a time. Use argument unpacking to pull the sublists out of the outer list and pass them into set.intersection() as separate arguments:
>>> p=[ [1,2,3],
[1,9,9],
[1,2,4]]
>>> set(p[0]).intersection(*p)
set([1])
Why not just:
set.intersection(*map(set, p))
Result:
set([1])
Or like this:
ip = iter(p)
s = set(next(ip))
s.intersection(*ip)
Result:
set([1])
edit:
copied from console:
>>> p = [[1,2,3], [1,9,9], [1,2,4]]
>>> set.intersection(*map(set, p))
set([1])
>>> ip = iter(p)
>>> s = set(next(ip))
>>> s.intersection(*ip)
set([1])
p=[ [1,2,3],
[1,9,9],
[1,2,4]]
ans = [ele[0] for ele in zip(*p) if len(set(ele)) == 1]
Result:
>>> ans
[1]
reduce(lambda x, y: x & y, (set(i) for i in p))
You are looking for the set intersection of all the sublists, and the data type you should use for set operations is a set:
result = set(p[0])
for s in p[1:]:
result.intersection_update(s)
print result
However, there is a limitation of 10 lists in a list. Anything bigger causes 'result' list to be out of order. Assuming you've made 'result' into a list by list(result).
Make sure you result.sort() to ensure it's ordered if you depend on it to be that way.

python intersect of dict items

Suppose I have a dict like:
aDict[1] = '3,4,5,6,7,8'
aDict[5] = '5,6,7,8,9,10,11,12'
aDict[n] = '5,6,77,88'
The keys are arbitrary, and there could be any number of them. I want to consider every value in the dictionary.
I want to treat each string as comma-separated values, and find the intersection across the entire dictionary (the elements common to all dict values). So in this case the answer would be '5,6'. How can I do this?
from functools import reduce # if Python 3
reduce(lambda x, y: x.intersection(y), (set(x.split(',')) for x in aDict.values()))
First of all, you need to convert these to real lists.
l1 = '3,4,5,6,7,8'.split(',')
Then you can use sets to do the intersection.
result = set(l1) & set(l2) & set(l3)
Python Sets are ideal for that task. Consider the following (pseudo code):
intersections = None
for value in aDict.values():
temp = set([int(num) for num in value.split(",")])
if intersections is None:
intersections = temp
else:
intersections = intersections.intersection(temp)
print intersections
result = None
for csv_list in aDict.values():
aList = csv_list.split(',')
if result is None:
result = set(aList)
else:
result = result & set(aList)
print result
Since set.intersection() accepts any number of sets, you can make do without any use of reduce():
set.intersection(*(set(v.split(",")) for v in aDict.values()))
Note that this version won't work for an empty aDict.
If you are using Python 3, and your dictionary values are bytes objects rather than strings, just split at b"," instead of ",".

Categories