I want to efficiently find the intersection of two lists, keeping duplicates from both, e.g. A=[1,1,2,3], B=[1,1,2,4] should return [1,1,1,1,2].
I know a similar question was asked previously (Python intersection of two lists keeping duplicates),
however it does not help me, because only the duplicates from one list are retained.
The following works
def intersect(A, B):
    C = []
    for a in A:
        for b in B:
            if a == b:
                C.append(a)
    return C
however it isn't efficient enough for what I'm doing! To speed things up I tried sorting the lists
def intersect(A, B):
    A.sort()
    B.sort()
    C = []
    i = 0
    j = 0
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            if A[i] == B[j]:
                C.append(A[i])
            i += 1
        else:
            j = j + 1
    return C
however this only keeps the duplicates from list B. Any suggestions?
Here is the answer to your question as asked:
import collections

for A, B, expected_output in (
        ([1,1,2,3], [1,1,2,4], [1,1,1,1,2]),
        ([1,1,2,3], [1,2,4], [1,1,2])):
    cntA = collections.Counter(A)
    cntB = collections.Counter(B)
    output = [
        x for x in sorted(set(A) & set(B)) for i in range(cntA[x]*cntB[x])]
    assert output == expected_output
Here is the answer to the question as originally interpreted by myself and two others:
import collections

A = [1,1,2,3]
B = [1,1,2,4]
expected_output = [1,1,1,1,2,2]
cnt_sum = collections.Counter(A) + collections.Counter(B)
output = [x for x in sorted(set(A) & set(B)) for i in range(cnt_sum[x])]
assert output == expected_output
You can find the collections.Counter() documentation in the Python standard library reference. collections is a great module and I highly recommend giving the documentation for the whole module a read.
I realized you don't actually need to intersect the sets, because the "count of a missing element is zero" according to the documentation:
import collections

for A, B, expected_output in (
        ([1,1,2,3], [1,1,2,4], [1,1,1,1,2]),
        ([1,1,2,3], [1,2,4], [1,1,2])):
    cntA = collections.Counter(A)
    cntB = collections.Counter(B)
    output = [
        x for x in sorted(set(A)) for i in range(cntA[x]*cntB[x])]
    assert output == expected_output
How about this:
a_set = set(A)
b_set = set(B)
intersect = [i for i in A if i in b_set] + [j for j in B if j in a_set]
Two list comprehensions concatenated. A bit of extra time and memory is used to create sets of A and B, but that will be more than offset by the efficiency of checking membership of items in a set vs list.
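If you want to verify that trade-off on your own data, a quick timeit micro-benchmark might look like this (the sizes here are arbitrary, not from the question):

```python
from timeit import timeit

# Arbitrary example data; 999 is the worst case for a list scan.
data_list = list(range(1000))
data_set = set(data_list)

t_list = timeit('999 in data_list', globals=globals(), number=10000)
t_set = timeit('999 in data_set', globals=globals(), number=10000)

# Set lookup is O(1) on average; list lookup scans, so it is O(n).
print(t_set < t_list)
```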
You could also spruce it up a bit:
set_intersect = set(A) & set(B)
list_intersect = [ele for ele in A+B if ele in set_intersect]
Coerce both lists to sets, take their intersection, then use a list comprehension to add all elements from both lists A and B if they appear in the set intersection.
I'm having a hard time speeding up your code since I don't know what you are running it on. It makes a lot of difference whether you run it on small or large lists and how many distinct elements there are. Anyway, here are some suggestions:
1.
from collections import Counter

def intersect(a, b):
    count_a = Counter(a)
    count_b = Counter(b)
    count_mul = []
    for i in count_a:
        count_mul.extend([i] * (count_a[i] * count_b[i]))
    return count_mul
2.
This returns an iterator; you can use list(iterator) to turn it into a list.
from collections import Counter

def intersect(a, b):
    count_a = Counter(a)
    count_b = Counter(b)
    count_mul = Counter()
    for i in count_a:
        count_mul[i] += count_a[i] * count_b[i]
    return count_mul.elements()
3.
Very similar to your way, but as a single list comprehension rather than repeatedly appending to the list, which takes time.
def intersect(A, B):
    return [a for a in A for b in B if a == b]
I'm not sure the last one improves on your original way, it really depends on the inputs, but your way is O(n*m) while the Counter-based ones are O(n+m).
You can use the timeit module to check how fast it runs on your input:
from timeit import timeit
timeit('test.intersect(A, B)', 'import test; A = [1,1,2,3]; B = [1,1,2,4]')
Related
I have the following function that works fine:
output = []
for a, b in itertools.product(list_a, list_b):
    x = perform_action(a, b)
    if b.relevant:
        output.append(x)
return output
How can I rewrite this using a list comprehension, if possible?
In short what I'm looking for is to perform_action for all items and include only the relevant ones in the output.
Reproducible example:
from itertools import product

a = [2,3,4]
b = ["a","b"]

def foo(p, r):
    out = "{0}---{1}".format(p, r)
    print(out)
    return out

li = [foo(p, r) for p, r in product(a, b) if p > 3]
print(li)
How do you feel about two layers of list comprehension?
return [x for b, x in
        [(b, perform_action(a, b)) for a, b in itertools.product(l_a, l_b)]
        if b.relevant]
I know you didn't come here for code readability advice, but don't turn this into a list comprehension; it's far less readable as one. If you're thinking of memory efficiency, just use a generator.
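To illustrate the generator idea, the loop can become a generator expression that still calls perform_action for every pair but yields only the relevant results, lazily. The Item class and perform_action below are hypothetical stand-ins for the question's objects:

```python
import itertools

# Hypothetical stand-ins for the question's perform_action and b.relevant.
class Item:
    def __init__(self, value, relevant):
        self.value = value
        self.relevant = relevant

def perform_action(a, b):
    return (a, b.value)

list_a = [1, 2]
list_b = [Item('x', True), Item('y', False)]

# Generator: perform_action runs for each pair, but results are produced
# lazily and irrelevant ones are discarded without building a full list.
output = (x
          for b, x in ((b, perform_action(a, b))
                       for a, b in itertools.product(list_a, list_b))
          if b.relevant)

result = list(output)
print(result)  # [(1, 'x'), (2, 'x')]
```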
I have the following code:
a = ['hello','how','are','hello','you']
b = ['hello','how','you','today']
len_b = len(b)
for word in a:
    count = 0
    while count < len_b:
        if word == b[count]:
            a.remove(word)
            break
        else:
            count = count + 1
print a
The goal is that it basically outputs (contents of list a) - (contents of list b),
so the wanted result in this case would be a = ['are','hello'],
but when I run my code I get a = ['how','are','you'].
Can anybody either point out what is wrong with my implementation, or is there a better way to solve this?
You can use a set to get all the non-duplicate elements,
so you could do set(a) - set(b) for the difference of the two sets.
The reason for this is that you are mutating the list a while iterating over it.
If you want to solve it correctly, you can try the method below. It uses a dictionary and a list comprehension to keep track of the number of each word in the resulting set:
>>> a = ['hello','how','are','hello','you']
>>> b = ['hello','how','you','today']
>>>
>>> cnt_a = {}
>>> for w in a:
...     cnt_a[w] = cnt_a.get(w, 0) + 1
...
>>> for w in b:
...     if w in cnt_a:
...         cnt_a[w] -= 1
...         if cnt_a[w] == 0:
...             del cnt_a[w]
...
>>> [y for k, v in cnt_a.items() for y in [k] * v]
['hello', 'are']
It works well in cases where there are duplicates, even in the resulting list. However, it may not preserve the order; it can easily be modified to do so if you want.
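One order-preserving variant of the same counting idea, as a sketch: count b's words first, then walk a in its original order and consume one counted occurrence per match.

```python
a = ['hello', 'how', 'are', 'hello', 'you']
b = ['hello', 'how', 'you', 'today']

# Count how many of each word should be subtracted.
cnt_b = {}
for w in b:
    cnt_b[w] = cnt_b.get(w, 0) + 1

# Walk a in order, consuming one counted occurrence per match,
# so duplicates beyond b's count survive in their original positions.
result = []
for w in a:
    if cnt_b.get(w, 0) > 0:
        cnt_b[w] -= 1
    else:
        result.append(w)

print(result)  # ['are', 'hello']
```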
set(a+b) is alright, too. You can use sets to get unique elements.
I was wondering if anyone can teach me how to do element-wise addition on a tuple or list without using zip, numpy arrays, or any of those modules?
For example, if I have:
a = (1,0,0,1)
b = (2,1,0,1)
how can I get (3,1,0,2) instead of (1,0,0,1,2,1,0,1)?
You can do this using operator.add:
from operator import add

>>> map(add, a, b)
[3, 1, 0, 2]
In Python 3:
>>> list(map(add, a, b))
List comprehensions are really useful:
[a[i] + b[i] for i in range(len(a))]
You can use the map function, see here:
https://docs.python.org/2/tutorial/datastructures.html#functional-programming-tools
map(func, seq)
For example:
a,b=(1,0,0,1),(2,1,0,1)
c = map(lambda x,y: x+y,a,b)
print c
This will save you if the lengths of the two lists are not the same:
result = [a[i] + b[i] for i in range(min(len(a), len(b)))]
This can be done by simply iterating over the length of the lists (assuming both lists have equal length) and adding up the values at that index in each list.
a = (1,0,0,1)
b = (2,1,0,1)
c = (1,3,5,7)
# You can add more lists as well
n = len(a)
# If the lengths are not equal then we can use:
n = min(len(a), len(b), len(c))
# as this avoids an IndexError
sums = []
for i in xrange(n):
    sums.append(a[i] + b[i] + c[i])
print sums
Here is a solution that works for deeply nested as well as shallow lists or tuples:
import operator

def list_recur(l1, l2, op=operator.add):
    if not l1:
        return type(l1)([])
    elif isinstance(l1[0], type(l1)):
        return type(l1)([list_recur(l1[0], l2[0], op)]) + \
            list_recur(l1[1:], l2[1:], op)
    else:
        return type(l1)([op(l1[0], l2[0])]) + \
            list_recur(l1[1:], l2[1:], op)

It performs element-wise addition by default, but you can specify more complex functions and/or lambdas (provided they are binary).
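As a usage sketch (the helper is repeated here so the example runs standalone), element-wise addition and multiplication on nested tuples:

```python
import operator

def list_recur(l1, l2, op=operator.add):
    # Recursive element-wise combination, repeated from the answer above.
    if not l1:
        return type(l1)([])
    elif isinstance(l1[0], type(l1)):
        return type(l1)([list_recur(l1[0], l2[0], op)]) + \
            list_recur(l1[1:], l2[1:], op)
    else:
        return type(l1)([op(l1[0], l2[0])]) + \
            list_recur(l1[1:], l2[1:], op)

a = ((1, 2), (3, 4))
b = ((10, 20), (30, 40))

print(list_recur(a, b))                # ((11, 22), (33, 44))
print(list_recur(a, b, operator.mul))  # ((10, 40), (90, 160))
```

Because the result is rebuilt with type(l1), tuples stay tuples and lists stay lists at every nesting level.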
I'm trying to create a function that takes in 2 lists and returns the list that only has the differences of the two lists.
Example:
a = [1,2,5,7,9]
b = [1,2,4,8,9]
The result should print [4,5,7,8]
The function so far:
def xor(list1, list2):
    list3 = list1 + list2
    for i in range(0, len(list3)):
        x = list3[i]
        y = i
        while y > 0 and x < list3[y-1]:
            list3[y] = list3[y-1]
            y = y - 1
        list3[y] = x
    last = list3[-1]
    for i in range(len(list3) - 2, -1, -1):
        if last == list3[i]:
            del list3[i]
        else:
            last = list3[i]
    return list3

print xor([1,2,5,7,8],[1,2,4,8,9])
The first for loop sorts it, and the second one removes the duplicates. The problem is that the result is [1,2,4,5,7,8,9], not [4,5,7,8], so it doesn't completely remove the duplicates. What can I add to do this?
I can't use any special modules, .sort, set or anything like that; just loops, basically.
You basically want to add an element to your new list if it is present in one and not present in another. Here is a compact loop which can do it. For each element in the two lists (concatenate them with list1+list2), we add element if it is not present in one of them:
[a for a in list1+list2 if (a not in list1) or (a not in list2)]
You can easily transform it into a more unPythonic code with explicit looping through elements as you have now, but honestly I don't see a point (not that it matters):
def xor(list1, list2):
    outputlist = []
    list3 = list1 + list2
    for i in range(0, len(list3)):
        if ((list3[i] not in list1) or (list3[i] not in list2)) and (list3[i] not in outputlist):
            outputlist[len(outputlist):] = [list3[i]]
    return outputlist
Using a set is better:
>>> a = [1,2,5,7,9]
>>> b = [1,2,4,8,9]
>>> set(a).symmetric_difference(b)
{4, 5, 7, 8}
Thanks to @DSM, a better spelling is:
>>> set(a)^set(b)
These two statements are equivalent, but the latter is clearer.
Update: sorry, I did not see the last requirement: cannot use set. As far as I can see, the solution provided by @sashkello is the best.
Note: This is really unpythonic and should only be used as a homework answer :)
After you have sorted both lists, you can find the non-shared elements by doing the following:
1) Place iterators at the start of A and B.
2) If the value at Aitr is greater than the value at Bitr, append Bitr's value to the return list and advance Bitr.
3) Else if the value at Bitr is greater than the value at Aitr, append Aitr's value to the return list and advance Aitr.
4) Else you have found a duplicate, so advance both Aitr and Bitr.
This code works assuming you've got sorted lists. It works in linear time, rather than quadratic like many of the other solutions given.
def diff(sl0, sl1):
    i0, i1 = 0, 0
    while i0 < len(sl0) and i1 < len(sl1):
        if sl0[i0] == sl1[i1]:
            i0 += 1
            i1 += 1
        elif sl0[i0] < sl1[i1]:
            yield sl0[i0]
            i0 += 1
        else:
            yield sl1[i1]
            i1 += 1
    for i in xrange(i0, len(sl0)):
        yield sl0[i]
    for i in xrange(i1, len(sl1)):
        yield sl1[i]

print list(diff([1,2,5,7,9], [1,2,4,8,9]))
Try this,
a = [1,2,5,7,9]
b = [1,2,4,8,9]
print set(a).symmetric_difference(set(b))
Simple, but not particularly efficient :)
>>> a = [1,2,5,7,9]
>>> b = [1,2,4,8,9]
>>> [i for i in a+b if (a+b).count(i)==1]
[5, 7, 4, 8]
Or with "just loops"
>>> res = []
>>> for i in a+b:
...     c = 0
...     for j in a+b:
...         if i == j:
...             c += 1
...     if c == 1:
...         res.append(i)
...
>>> res
[5, 7, 4, 8]
I have several lists in Python, and I would like to take only the values which appear in every list. Is there any function to do this directly?
For example I have:
{'a','b','c','d','e'}, {'a','g','c','d','h','e'}, {'i','b','m','d','e','a'}
and I want to make one list which contains
{'a','d','e'}
but I don't know how many lists I actually have, because it depends on the value 'i'.
Thanks for any help!
If the elements are unique and hashable (and order doesn't matter in the result), you can use set intersection, e.g.:
common_elements = list(set(list1).intersection(list2).intersection(list3))
This is functionally equivalent to:
common_elements = list( set(list1) & set(list2) & set(list3) )
The & operator only works with sets, whereas the intersection method works with any iterable.
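A small illustration of that difference, with made-up lists:

```python
list1 = ['a', 'b', 'c', 'd', 'e']
list2 = ['a', 'g', 'c', 'd', 'h']

# .intersection() accepts any iterable, so list2 needs no conversion:
common = set(list1).intersection(list2)
print(sorted(common))  # ['a', 'c', 'd']

# The & operator requires sets on both sides; a plain list is rejected:
try:
    set(list1) & list2
except TypeError:
    print('& with a plain list raises TypeError')
```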
If you have a list of lists and you want the intersection of all of them, you can do this easily:
common_elements = list(set.intersection(*map(set, your_list_of_lists)))
Special thanks to DSM for pointing this one out.
Or you could just use a loop:
common_elements = set(your_list_of_lists[0])
for elem in your_list_of_lists[1:]:
    common_elements = common_elements.intersection(elem)  # or common_elements &= set(elem) ...
common_elements = list(common_elements)
Note that if you really want to get the order they had in the original list, you can do that using a simple sort:
common_elements.sort(key=lambda x: your_list_of_lists[0].index(x))
By construction, there is no risk of a ValueError being raised here.
Just to put a one-liner on the table:
l = ['a','b','c','d','e'], ['a','g','c','d','h'], ['i','b','m','d','e']
reduce(lambda a, b: a & b, map(set, l))
or
from operator import and_
l = ['a','b','c','d','e'], ['a','g','c','d','h'], ['i','b','m','d','e']
reduce(and_, map(set, l))
You need to make a set from the first list, then use the set's .intersection() method:
a, b, c = ['a','b','c','d','e'], ['a','g','c','d','h'], ['i','b','m','d','e']
exists_in_all = set(a).intersection(b).intersection(c)
Updated.
Simplified according to mgilson's comment.
import operator

a = [['a','b','c','d','e'], ['a','g','c','d','h','e'], ['i','b','m','d','e','a']]
print list(reduce(operator.and_, map(set, a)))

This will give you the common elements from the lists:
['a', 'e', 'd']