Accessing Panda dataframe columns [duplicate]

Accessing Panda dataframe columns [duplicate] - python

a = [1,2,3,4,5]
b = [1,3,5,6]
c = a and b
print c
actual output: [1,3,5,6]
expected output: [1,3,5]
How can we achieve a boolean AND operation (list intersection) on two lists?

If order is not important and you don't need to worry about duplicates then you can use set intersection:
>>> a = [1,2,3,4,5]
>>> b = [1,3,5,6]
>>> list(set(a) & set(b))
[1, 3, 5]

Using list comprehensions is a pretty obvious one for me. Not sure about performance, but at least things stay lists.
[x for x in a if x in b]
Or "all the x values that are in A, if the X value is in B".

If you convert the larger of the two lists into a set, you can get the intersection of that set with any iterable using intersection():
a = [1,2,3,4,5]
b = [1,3,5,6]
set(a).intersection(b)

Make a set out of the larger one:
_auxset = set(a)
Then,
c = [x for x in b if x in _auxset]
will do what you want (preserving b's ordering, not a's -- can't necessarily preserve both) and do it fast. (Using if x in a as the condition in the list comprehension would also work, and avoid the need to build _auxset, but unfortunately for lists of substantial length it would be a lot slower).
If you want the result to be sorted, rather than preserve either list's ordering, an even neater way might be:
c = sorted(set(a).intersection(b))

Here's some Python 2 / Python 3 code that generates timing information for both list-based and set-based methods of finding the intersection of two lists.
The pure list comprehension algorithms are O(n^2), since in on a list is a linear search. The set-based algorithms are O(n), since set search is O(1), and set creation is O(n) (and converting a set to a list is also O(n)). So for sufficiently large n the set-based algorithms are faster, but for small n the overheads of creating the set(s) make them slower than the pure list comp algorithms.
#!/usr/bin/env python
''' Time list- vs set-based list intersection
See http://stackoverflow.com/q/3697432/4014959
Written by PM 2Ring 2015.10.16
'''
from __future__ import print_function, division
from timeit import Timer
setup = 'from __main__ import a, b'
cmd_lista = '[u for u in a if u in b]'
cmd_listb = '[u for u in b if u in a]'
cmd_lcsa = 'sa=set(a);[u for u in b if u in sa]'
cmd_seta = 'list(set(a).intersection(b))'
cmd_setb = 'list(set(b).intersection(a))'
reps = 3
loops = 50000
def do_timing(heading, cmd, setup):
t = Timer(cmd, setup)
r = t.repeat(reps, loops)
r.sort()
print(heading, r)
return r[0]
m = 10
nums = list(range(6 * m))
for n in range(1, m + 1):
a = nums[:6*n:2]
b = nums[:6*n:3]
print('\nn =', n, len(a), len(b))
#print('\nn = %d\n%s %d\n%s %d' % (n, a, len(a), b, len(b)))
la = do_timing('lista', cmd_lista, setup)
lb = do_timing('listb', cmd_listb, setup)
lc = do_timing('lcsa ', cmd_lcsa, setup)
sa = do_timing('seta ', cmd_seta, setup)
sb = do_timing('setb ', cmd_setb, setup)
print(la/sa, lb/sa, lc/sa, la/sb, lb/sb, lc/sb)
output
n = 1 3 2
lista [0.082171916961669922, 0.082588911056518555, 0.0898590087890625]
listb [0.069530963897705078, 0.070394992828369141, 0.075379848480224609]
lcsa [0.11858987808227539, 0.1188349723815918, 0.12825107574462891]
seta [0.26900982856750488, 0.26902294158935547, 0.27298116683959961]
setb [0.27218389511108398, 0.27459001541137695, 0.34307217597961426]
0.305460649521 0.258469975867 0.440838458259 0.301898526833 0.255455833892 0.435697630214
n = 2 6 4
lista [0.15915989875793457, 0.16000485420227051, 0.16551494598388672]
listb [0.13000702857971191, 0.13060092926025391, 0.13543915748596191]
lcsa [0.18650484085083008, 0.18742108345031738, 0.19513416290283203]
seta [0.33592700958251953, 0.34001994132995605, 0.34146714210510254]
setb [0.29436492919921875, 0.2953648567199707, 0.30039691925048828]
0.473793098554 0.387009751735 0.555194537893 0.540689066428 0.441652573672 0.633583767462
n = 3 9 6
lista [0.27657914161682129, 0.28098297119140625, 0.28311991691589355]
listb [0.21585917472839355, 0.21679902076721191, 0.22272896766662598]
lcsa [0.22559309005737305, 0.2271728515625, 0.2323150634765625]
seta [0.36382699012756348, 0.36453008651733398, 0.36750602722167969]
setb [0.34979605674743652, 0.35533690452575684, 0.36164689064025879]
0.760194128313 0.59330170819 0.62005595016 0.790686848184 0.61710008036 0.644927481902
n = 4 12 8
lista [0.39616990089416504, 0.39746403694152832, 0.41129183769226074]
listb [0.33485794067382812, 0.33914685249328613, 0.37850618362426758]
lcsa [0.27405810356140137, 0.2745978832244873, 0.28249192237854004]
seta [0.39211201667785645, 0.39234519004821777, 0.39317893981933594]
setb [0.36988520622253418, 0.37011313438415527, 0.37571001052856445]
1.01034878821 0.85398540833 0.698928091731 1.07106176249 0.905302334456 0.740927452493
n = 5 15 10
lista [0.56792402267456055, 0.57422614097595215, 0.57740211486816406]
listb [0.47309303283691406, 0.47619009017944336, 0.47628307342529297]
lcsa [0.32805585861206055, 0.32813096046447754, 0.3349759578704834]
seta [0.40036201477050781, 0.40322518348693848, 0.40548801422119141]
setb [0.39103078842163086, 0.39722800254821777, 0.43811702728271484]
1.41852623806 1.18166313332 0.819398061028 1.45237674242 1.20986133789 0.838951479847
n = 6 18 12
lista [0.77897095680236816, 0.78187918663024902, 0.78467702865600586]
listb [0.629547119140625, 0.63210701942443848, 0.63321495056152344]
lcsa [0.36563992500305176, 0.36638498306274414, 0.38175487518310547]
seta [0.46695613861083984, 0.46992206573486328, 0.47583580017089844]
setb [0.47616910934448242, 0.47661614418029785, 0.4850609302520752]
1.66818870637 1.34819326075 0.783028414812 1.63591241329 1.32210827369 0.767878297495
n = 7 21 14
lista [0.9703209400177002, 0.9734041690826416, 1.0182771682739258]
listb [0.82394003868103027, 0.82625699043273926, 0.82796716690063477]
lcsa [0.40975093841552734, 0.41210508346557617, 0.42286920547485352]
seta [0.5086359977722168, 0.50968098640441895, 0.51014018058776855]
setb [0.48688101768493652, 0.4879908561706543, 0.49204087257385254]
1.90769222837 1.61990115188 0.805587768483 1.99293236904 1.69228211566 0.841583309951
n = 8 24 16
lista [1.204819917678833, 1.2206029891967773, 1.258256196975708]
listb [1.014998197555542, 1.0206191539764404, 1.0343101024627686]
lcsa [0.50966787338256836, 0.51018595695495605, 0.51319599151611328]
seta [0.50310111045837402, 0.50556015968322754, 0.51335406303405762]
setb [0.51472997665405273, 0.51948785781860352, 0.52113485336303711]
2.39478683834 2.01748351664 1.01305257092 2.34068341135 1.97190418975 0.990165516871
n = 9 27 18
lista [1.511646032333374, 1.5133969783782959, 1.5639569759368896]
listb [1.2461750507354736, 1.254518985748291, 1.2613379955291748]
lcsa [0.5565330982208252, 0.56119203567504883, 0.56451296806335449]
seta [0.5966339111328125, 0.60275578498840332, 0.64791703224182129]
setb [0.54694414138793945, 0.5508568286895752, 0.55375313758850098]
2.53362406013 2.08867620074 0.932788243907 2.76380331728 2.27843203069 1.01753187594
n = 10 30 20
lista [1.7777848243713379, 2.1453688144683838, 2.4085969924926758]
listb [1.5070111751556396, 1.5202279090881348, 1.5779800415039062]
lcsa [0.5954139232635498, 0.59703707695007324, 0.60746097564697266]
seta [0.61563014984130859, 0.62125110626220703, 0.62354087829589844]
setb [0.56723213195800781, 0.57257509231567383, 0.57460403442382812]
2.88774814689 2.44791645689 0.967161734066 3.13413984189 2.6567803378 1.04968299523
Generated using a 2GHz single core machine with 2GB of RAM running Python 2.6.6 on a Debian flavour of Linux (with Firefox running in the background).
These figures are only a rough guide, since the actual speeds of the various algorithms are affected differently by the proportion of elements that are in both source lists.

A functional way can be achieved using filter and lambda operator.
list1 = [1,2,3,4,5,6]
list2 = [2,4,6,9,10]
>>> list(filter(lambda x:x in list1, list2))
[2, 4, 6]
Edit: It filters out x that exists in both list1 and list, set difference can also be achieved using:
>>> list(filter(lambda x:x not in list1, list2))
[9,10]
Edit2: python3 filter returns a filter object, encapsulating it with list returns the output list.

a = [1,2,3,4,5]
b = [1,3,5,6]
c = list(set(a).intersection(set(b)))
Should work like a dream. And, if you can, use sets instead of lists to avoid all this type changing!

You can also use numpy.intersect1d(ar1, ar2).
It return the unique and sorted values that are in both of two arrays.

This way you get the intersection of two lists and also get the common duplicates.
>>> from collections import Counter
>>> a = Counter([1,2,3,4,5])
>>> b = Counter([1,3,5,6])
>>> a &= b
>>> list(a.elements())
[1, 3, 5]

This is an example when you need Each element in the result should appear as many times as it shows in both arrays.
def intersection(nums1, nums2):
#example:
#nums1 = [1,2,2,1]
#nums2 = [2,2]
#output = [2,2]
#find first 2 and remove from target, continue iterating
target, iterate = [nums1, nums2] if len(nums2) >= len(nums1) else [nums2, nums1] #iterate will look into target
if len(target) == 0:
return []
i = 0
store = []
while i < len(iterate):
element = iterate[i]
if element in target:
store.append(element)
target.remove(element)
i += 1
return store

In the case you have a list of lists map comes handy:
>>> lists = [[1, 2, 3], [2, 3, 4], [2, 3, 5]]
>>> set(lists.pop()).intersection(*map(set, lists))
{2, 3}
would work for similar iterables:
>>> lists = ['ash', 'nazg']
>>> set(lists.pop()).intersection(*map(set, lists))
{'a'}
pop will blow if the list is empty so you may want to wrap in a function:
def intersect_lists(lists):
try:
return set(lists.pop()).intersection(*map(set, lists))
except IndexError: # pop from empty list
return set()

It might be late but I just thought I should share for the case where you are required to do it manually (show working - haha) OR when you need all elements to appear as many times as possible or when you also need it to be unique.
Kindly note that tests have also been written for it.
from nose.tools import assert_equal
'''
Given two lists, print out the list of overlapping elements
'''
def overlap(l_a, l_b):
'''
compare the two lists l_a and l_b and return the overlapping
elements (intersecting) between the two
'''
#edge case is when they are the same lists
if l_a == l_b:
return [] #no overlapping elements
output = []
if len(l_a) == len(l_b):
for i in range(l_a): #same length so either one applies
if l_a[i] in l_b:
output.append(l_a[i])
#found all by now
#return output #if repetition does not matter
return list(set(output))
else:
#find the smallest and largest lists and go with that
sm = l_a if len(l_a) len(l_b) else l_b
for i in range(len(sm)):
if sm[i] in lg:
output.append(sm[i])
#return output #if repetition does not matter
return list(set(output))
## Test the Above Implementation
a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
exp = [1, 2, 3, 5, 8, 13]
c = [4, 4, 5, 6]
d = [5, 7, 4, 8 ,6 ] #assuming it is not ordered
exp2 = [4, 5, 6]
class TestOverlap(object):
def test(self, sol):
t = sol(a, b)
assert_equal(t, exp)
print('Comparing the two lists produces')
print(t)
t = sol(c, d)
assert_equal(t, exp2)
print('Comparing the two lists produces')
print(t)
print('All Tests Passed!!')
t = TestOverlap()
t.test(overlap)

Most of the solutions here don't take the order of elements in the list into account and treat lists like sets. If on the other hand you want to find one of the longest subsequences contained in both lists, you can try the following code.
def intersect(a, b):
if a == [] or b == []:
return []
inter_1 = intersect(a[1:], b)
if a[0] in b:
idx = b.index(a[0])
inter_2 = [a[0]] + intersect(a[1:], b[idx+1:])
if len(inter_1) <= len(inter_2):
return inter_2
return inter_1
For a=[1,2,3] and b=[3,1,4,2] this returns [1,2] rather than [1,2,3]. Note that such a subsequence is not unique as [1], [2], [3] are all solutions for a=[1,2,3] and b=[3,2,1].

If, by Boolean AND, you mean items that appear in both lists, e.g. intersection, then you should look at Python's set and frozenset types.

You can also use a counter! It doesn't preserve the order, but it'll consider the duplicates:
>>> from collections import Counter
>>> a = [1,2,3,4,5]
>>> b = [1,3,5,6]
>>> d1, d2 = Counter(a), Counter(b)
>>> c = [n for n in d1.keys() & d2.keys() for _ in range(min(d1[n], d2[n]))]
>>> print(c)
[1,3,5]

when we used tuple and we want to intersection
a=([1,2,3,4,5,20], [8,3,9,5,1,4,20])
for i in range(len(a)):
b=set(a[i-1]).intersection(a[i])
print(b)
{1, 3, 4, 5, 20}

Related

Remove multiple objects in array with for loop in Python [duplicate]

a = [1,2,3,4,5]
b = [1,3,5,6]
c = a and b
print c
actual output: [1,3,5,6]
expected output: [1,3,5]
How can we achieve a boolean AND operation (list intersection) on two lists?

If order is not important and you don't need to worry about duplicates then you can use set intersection:
>>> a = [1,2,3,4,5]
>>> b = [1,3,5,6]
>>> list(set(a) & set(b))
[1, 3, 5]

Using list comprehensions is a pretty obvious one for me. Not sure about performance, but at least things stay lists.
[x for x in a if x in b]
Or "all the x values that are in A, if the X value is in B".

If you convert the larger of the two lists into a set, you can get the intersection of that set with any iterable using intersection():
a = [1,2,3,4,5]
b = [1,3,5,6]
set(a).intersection(b)

Make a set out of the larger one:
_auxset = set(a)
Then,
c = [x for x in b if x in _auxset]
will do what you want (preserving b's ordering, not a's -- can't necessarily preserve both) and do it fast. (Using if x in a as the condition in the list comprehension would also work, and avoid the need to build _auxset, but unfortunately for lists of substantial length it would be a lot slower).
If you want the result to be sorted, rather than preserve either list's ordering, an even neater way might be:
c = sorted(set(a).intersection(b))

Here's some Python 2 / Python 3 code that generates timing information for both list-based and set-based methods of finding the intersection of two lists.
The pure list comprehension algorithms are O(n^2), since in on a list is a linear search. The set-based algorithms are O(n), since set search is O(1), and set creation is O(n) (and converting a set to a list is also O(n)). So for sufficiently large n the set-based algorithms are faster, but for small n the overheads of creating the set(s) make them slower than the pure list comp algorithms.
#!/usr/bin/env python
''' Time list- vs set-based list intersection
See http://stackoverflow.com/q/3697432/4014959
Written by PM 2Ring 2015.10.16
'''
from __future__ import print_function, division
from timeit import Timer
setup = 'from __main__ import a, b'
cmd_lista = '[u for u in a if u in b]'
cmd_listb = '[u for u in b if u in a]'
cmd_lcsa = 'sa=set(a);[u for u in b if u in sa]'
cmd_seta = 'list(set(a).intersection(b))'
cmd_setb = 'list(set(b).intersection(a))'
reps = 3
loops = 50000
def do_timing(heading, cmd, setup):
t = Timer(cmd, setup)
r = t.repeat(reps, loops)
r.sort()
print(heading, r)
return r[0]
m = 10
nums = list(range(6 * m))
for n in range(1, m + 1):
a = nums[:6*n:2]
b = nums[:6*n:3]
print('\nn =', n, len(a), len(b))
#print('\nn = %d\n%s %d\n%s %d' % (n, a, len(a), b, len(b)))
la = do_timing('lista', cmd_lista, setup)
lb = do_timing('listb', cmd_listb, setup)
lc = do_timing('lcsa ', cmd_lcsa, setup)
sa = do_timing('seta ', cmd_seta, setup)
sb = do_timing('setb ', cmd_setb, setup)
print(la/sa, lb/sa, lc/sa, la/sb, lb/sb, lc/sb)
output
n = 1 3 2
lista [0.082171916961669922, 0.082588911056518555, 0.0898590087890625]
listb [0.069530963897705078, 0.070394992828369141, 0.075379848480224609]
lcsa [0.11858987808227539, 0.1188349723815918, 0.12825107574462891]
seta [0.26900982856750488, 0.26902294158935547, 0.27298116683959961]
setb [0.27218389511108398, 0.27459001541137695, 0.34307217597961426]
0.305460649521 0.258469975867 0.440838458259 0.301898526833 0.255455833892 0.435697630214
n = 2 6 4
lista [0.15915989875793457, 0.16000485420227051, 0.16551494598388672]
listb [0.13000702857971191, 0.13060092926025391, 0.13543915748596191]
lcsa [0.18650484085083008, 0.18742108345031738, 0.19513416290283203]
seta [0.33592700958251953, 0.34001994132995605, 0.34146714210510254]
setb [0.29436492919921875, 0.2953648567199707, 0.30039691925048828]
0.473793098554 0.387009751735 0.555194537893 0.540689066428 0.441652573672 0.633583767462
n = 3 9 6
lista [0.27657914161682129, 0.28098297119140625, 0.28311991691589355]
listb [0.21585917472839355, 0.21679902076721191, 0.22272896766662598]
lcsa [0.22559309005737305, 0.2271728515625, 0.2323150634765625]
seta [0.36382699012756348, 0.36453008651733398, 0.36750602722167969]
setb [0.34979605674743652, 0.35533690452575684, 0.36164689064025879]
0.760194128313 0.59330170819 0.62005595016 0.790686848184 0.61710008036 0.644927481902
n = 4 12 8
lista [0.39616990089416504, 0.39746403694152832, 0.41129183769226074]
listb [0.33485794067382812, 0.33914685249328613, 0.37850618362426758]
lcsa [0.27405810356140137, 0.2745978832244873, 0.28249192237854004]
seta [0.39211201667785645, 0.39234519004821777, 0.39317893981933594]
setb [0.36988520622253418, 0.37011313438415527, 0.37571001052856445]
1.01034878821 0.85398540833 0.698928091731 1.07106176249 0.905302334456 0.740927452493
n = 5 15 10
lista [0.56792402267456055, 0.57422614097595215, 0.57740211486816406]
listb [0.47309303283691406, 0.47619009017944336, 0.47628307342529297]
lcsa [0.32805585861206055, 0.32813096046447754, 0.3349759578704834]
seta [0.40036201477050781, 0.40322518348693848, 0.40548801422119141]
setb [0.39103078842163086, 0.39722800254821777, 0.43811702728271484]
1.41852623806 1.18166313332 0.819398061028 1.45237674242 1.20986133789 0.838951479847
n = 6 18 12
lista [0.77897095680236816, 0.78187918663024902, 0.78467702865600586]
listb [0.629547119140625, 0.63210701942443848, 0.63321495056152344]
lcsa [0.36563992500305176, 0.36638498306274414, 0.38175487518310547]
seta [0.46695613861083984, 0.46992206573486328, 0.47583580017089844]
setb [0.47616910934448242, 0.47661614418029785, 0.4850609302520752]
1.66818870637 1.34819326075 0.783028414812 1.63591241329 1.32210827369 0.767878297495
n = 7 21 14
lista [0.9703209400177002, 0.9734041690826416, 1.0182771682739258]
listb [0.82394003868103027, 0.82625699043273926, 0.82796716690063477]
lcsa [0.40975093841552734, 0.41210508346557617, 0.42286920547485352]
seta [0.5086359977722168, 0.50968098640441895, 0.51014018058776855]
setb [0.48688101768493652, 0.4879908561706543, 0.49204087257385254]
1.90769222837 1.61990115188 0.805587768483 1.99293236904 1.69228211566 0.841583309951
n = 8 24 16
lista [1.204819917678833, 1.2206029891967773, 1.258256196975708]
listb [1.014998197555542, 1.0206191539764404, 1.0343101024627686]
lcsa [0.50966787338256836, 0.51018595695495605, 0.51319599151611328]
seta [0.50310111045837402, 0.50556015968322754, 0.51335406303405762]
setb [0.51472997665405273, 0.51948785781860352, 0.52113485336303711]
2.39478683834 2.01748351664 1.01305257092 2.34068341135 1.97190418975 0.990165516871
n = 9 27 18
lista [1.511646032333374, 1.5133969783782959, 1.5639569759368896]
listb [1.2461750507354736, 1.254518985748291, 1.2613379955291748]
lcsa [0.5565330982208252, 0.56119203567504883, 0.56451296806335449]
seta [0.5966339111328125, 0.60275578498840332, 0.64791703224182129]
setb [0.54694414138793945, 0.5508568286895752, 0.55375313758850098]
2.53362406013 2.08867620074 0.932788243907 2.76380331728 2.27843203069 1.01753187594
n = 10 30 20
lista [1.7777848243713379, 2.1453688144683838, 2.4085969924926758]
listb [1.5070111751556396, 1.5202279090881348, 1.5779800415039062]
lcsa [0.5954139232635498, 0.59703707695007324, 0.60746097564697266]
seta [0.61563014984130859, 0.62125110626220703, 0.62354087829589844]
setb [0.56723213195800781, 0.57257509231567383, 0.57460403442382812]
2.88774814689 2.44791645689 0.967161734066 3.13413984189 2.6567803378 1.04968299523
Generated using a 2GHz single core machine with 2GB of RAM running Python 2.6.6 on a Debian flavour of Linux (with Firefox running in the background).
These figures are only a rough guide, since the actual speeds of the various algorithms are affected differently by the proportion of elements that are in both source lists.

A functional way can be achieved using filter and lambda operator.
list1 = [1,2,3,4,5,6]
list2 = [2,4,6,9,10]
>>> list(filter(lambda x:x in list1, list2))
[2, 4, 6]
Edit: It filters out x that exists in both list1 and list, set difference can also be achieved using:
>>> list(filter(lambda x:x not in list1, list2))
[9,10]
Edit2: python3 filter returns a filter object, encapsulating it with list returns the output list.

a = [1,2,3,4,5]
b = [1,3,5,6]
c = list(set(a).intersection(set(b)))
Should work like a dream. And, if you can, use sets instead of lists to avoid all this type changing!

You can also use numpy.intersect1d(ar1, ar2).
It return the unique and sorted values that are in both of two arrays.

This way you get the intersection of two lists and also get the common duplicates.
>>> from collections import Counter
>>> a = Counter([1,2,3,4,5])
>>> b = Counter([1,3,5,6])
>>> a &= b
>>> list(a.elements())
[1, 3, 5]

This is an example when you need Each element in the result should appear as many times as it shows in both arrays.
def intersection(nums1, nums2):
#example:
#nums1 = [1,2,2,1]
#nums2 = [2,2]
#output = [2,2]
#find first 2 and remove from target, continue iterating
target, iterate = [nums1, nums2] if len(nums2) >= len(nums1) else [nums2, nums1] #iterate will look into target
if len(target) == 0:
return []
i = 0
store = []
while i < len(iterate):
element = iterate[i]
if element in target:
store.append(element)
target.remove(element)
i += 1
return store

In the case you have a list of lists map comes handy:
>>> lists = [[1, 2, 3], [2, 3, 4], [2, 3, 5]]
>>> set(lists.pop()).intersection(*map(set, lists))
{2, 3}
would work for similar iterables:
>>> lists = ['ash', 'nazg']
>>> set(lists.pop()).intersection(*map(set, lists))
{'a'}
pop will blow if the list is empty so you may want to wrap in a function:
def intersect_lists(lists):
try:
return set(lists.pop()).intersection(*map(set, lists))
except IndexError: # pop from empty list
return set()

It might be late but I just thought I should share for the case where you are required to do it manually (show working - haha) OR when you need all elements to appear as many times as possible or when you also need it to be unique.
Kindly note that tests have also been written for it.
from nose.tools import assert_equal
'''
Given two lists, print out the list of overlapping elements
'''
def overlap(l_a, l_b):
'''
compare the two lists l_a and l_b and return the overlapping
elements (intersecting) between the two
'''
#edge case is when they are the same lists
if l_a == l_b:
return [] #no overlapping elements
output = []
if len(l_a) == len(l_b):
for i in range(l_a): #same length so either one applies
if l_a[i] in l_b:
output.append(l_a[i])
#found all by now
#return output #if repetition does not matter
return list(set(output))
else:
#find the smallest and largest lists and go with that
sm = l_a if len(l_a) len(l_b) else l_b
for i in range(len(sm)):
if sm[i] in lg:
output.append(sm[i])
#return output #if repetition does not matter
return list(set(output))
## Test the Above Implementation
a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
exp = [1, 2, 3, 5, 8, 13]
c = [4, 4, 5, 6]
d = [5, 7, 4, 8 ,6 ] #assuming it is not ordered
exp2 = [4, 5, 6]
class TestOverlap(object):
def test(self, sol):
t = sol(a, b)
assert_equal(t, exp)
print('Comparing the two lists produces')
print(t)
t = sol(c, d)
assert_equal(t, exp2)
print('Comparing the two lists produces')
print(t)
print('All Tests Passed!!')
t = TestOverlap()
t.test(overlap)

Most of the solutions here don't take the order of elements in the list into account and treat lists like sets. If on the other hand you want to find one of the longest subsequences contained in both lists, you can try the following code.
def intersect(a, b):
if a == [] or b == []:
return []
inter_1 = intersect(a[1:], b)
if a[0] in b:
idx = b.index(a[0])
inter_2 = [a[0]] + intersect(a[1:], b[idx+1:])
if len(inter_1) <= len(inter_2):
return inter_2
return inter_1
For a=[1,2,3] and b=[3,1,4,2] this returns [1,2] rather than [1,2,3]. Note that such a subsequence is not unique as [1], [2], [3] are all solutions for a=[1,2,3] and b=[3,2,1].

If, by Boolean AND, you mean items that appear in both lists, e.g. intersection, then you should look at Python's set and frozenset types.

You can also use a counter! It doesn't preserve the order, but it'll consider the duplicates:
>>> from collections import Counter
>>> a = [1,2,3,4,5]
>>> b = [1,3,5,6]
>>> d1, d2 = Counter(a), Counter(b)
>>> c = [n for n in d1.keys() & d2.keys() for _ in range(min(d1[n], d2[n]))]
>>> print(c)
[1,3,5]

when we used tuple and we want to intersection
a=([1,2,3,4,5,20], [8,3,9,5,1,4,20])
for i in range(len(a)):
b=set(a[i-1]).intersection(a[i])
print(b)
{1, 3, 4, 5, 20}

remove pairs of equal elements from two lists in sympy

I am quite new to Sympy (running with Python 3).
Given two lists of integers, in general not of equal lengths and whose elements might repeat, I need to eliminate all pairs of equal elements between the two lists.
For example, given
List_a=[1, 2, 3, 4, 4], List_b=[0, 2, 2, 4]
I shall eliminate the pairs (2,2) and (4,4) and output
List_ar=[1,3,4], List_br=[0,2]
If the two lists are equal I shall get two empty lists as output.
I have been trying to compare the lists element by element ( "for" and "while" loops) and when found equal to delete the pair from both lists. Upon that to repeat the procedure on the reduced lists till no deletion is made.
But lacking of a "goto" control I do not know how to handle the variable length lists.
Thanks for helping.

I you convert them to multiset you can compare the counts of common keys:
>>> from sympy.utilities.iterables import multiset
>>> ma = multiset(lista)
>>> mb = multiset(listb)
>>> for k in set(ma) & set(mb):
... n = min(ma[k],mb[k])
... ma[k] -= n
... mb[k] -= n
...
>>> lista = [i for k in ma for i in [k]*ma[k]]
>>> listb = [i for k in mb for i in [k]*mb[k]]
You could also treat this like a merge sort but the above is direct.

A general Python solution:
def g(a, b):
def ensure2(xs):
ys = [list(x) for x in xs]
if ys == []:
return [[], []]
else:
return ys
n = min(len(a), len(b))
c, d = ensure2(
zip(*
filter(lambda x: x[0] != x[1],
zip(a, b))))
return c + a[n:], d + b[n:]
a = [1,2,3,4,4]
b = [0,2,2,4]
c = [0,2,2,4]
d = []
# two different lists
ar, br = g(a, b)
print(ar) # [1, 3, 4]
print(br) # [0, 2]
# two identical lists
br, cr = g(b, c)
print(br) # []
print(cr) # []
# one empty list
ar, dr = g(a, d)
print(ar) # [1, 2, 3, 4, 4]
print(dr) # []
Use zip() to create the pairs
Use filter() to remove pairs with equal elements
Use zip(*...) to split the remaining pairs back to two lists
Use ensure2() to guard against empty lists
Use + list[n:] to append excessive elements back to the originally longer list

Cumulative sum in python giving different results when summing with a for loop

Ok let's try this again. I have 1 set of data. I want to make 2 copies, and then sort the copies in descending order based on different columns. Then I want to get the cumulative sum of the respective columns. When I run the following code I get different results for the two instances I call on print (setA[x][2]).
set = [[2,2,0],[1,3,0],[3,1,0]]
def getkey_setA (item):
return item[0]
setA = sorted(set, key=getkey_setA, reverse=True)
def getkey_setB (item):
return item[1]
setB = sorted(set, key=getkey_setB, reverse=True)
setA[0][2] = setA[0][0]
setB[0][2] = setB[0][1]
for x in range(1, 3):
setA[x][2] = setA[x-1][2] + setA[x][0]
print(setA[x][2])
for x in range(1, 3):
setB[x][2] = setB[x-1][2] + setB[x][1]
for x in range(1, 3):
print (setA[x][2])
This produces:
5
6
8
6
but I expected it to produce
5
6
5
6
instead.

sorted() creates a shallow copy of the sequence being sorted. This means that your nested lists are not copied, they are merely referenced:
>>> set = [[2,2,0],[1,3,0],[3,1,0]]
>>> setA = sorted(set, key=getkey_setA, reverse=True)
>>> setB = sorted(set, key=getkey_setB, reverse=True)
>>> setA[0] is set[2]
True
>>> setB[2] is set[2]
True
>>> setA[0] is setB[2]
True
So the last element in set is exactly the same object as setA[0] and setB[2]. Making changes to any one of those references is reflected in the others:
>>> setA[0][2]
0
>>> setA[0][2] = 42
>>> setB[2]
[3, 1, 42]
>>> set[2]
[3, 1, 42]
This is why the set object (from which you produced your sorted setA and setB lists) is also changed after running your code:
>>> set
[[2, 2, 8], [1, 3, 6], [3, 1, 9]]
You need to create a proper copy of the nested lists; you could use the copy.deepcopy() function to create a recursive copy of the list objects, or you could use a generator expression when sorting:
setA = sorted((subl[:] for subl in set), key=getkey_setA, reverse=True)
setB = sorted((subl[:] for subl in set), key=getkey_setB, reverse=True)
This shallowly copies the nested lists; this is fine because those nested lists only contain immutable objects themselves.

What is the fastest way to merge two lists in python?

Given,
list_1 = [1,2,3,4]
list_2 = [5,6,7,8]
What is the fastest way to achieve the following in python?
list = [1,2,3,4,5,6,7,8]
Please note that there can be many ways to merge two lists in python.
I am looking for the most time-efficient way.
I tried the following and here is my understanding.
CODE
import time
c = list(range(1,10000000))
c_n = list(range(10000000, 20000000))
start = time.time()
c = c+c_n
print len(c)
print time.time() - start
c = list(range(1,10000000))
start = time.time()
for i in c_n:
c.append(i)
print len(c)
print time.time() - start
c = list(range(1,10000000))
start = time.time()
c.extend(c_n)
print len(c)
print time.time() - start
OUTPUT
19999999
0.125061035156
19999999
1.02858018875
19999999
0.03928399086
So, if someone does not bother reusing list_1/list_2 in the question then extend is the way to go. On the other hand, "+" is the fastest way.
I am not sure about other options though.

You can just use concatenation:
list = list_1 + list_2
If you don't need to keep list_1 around, you can just modify it:
list_1.extend(list_2)

I tested out several ways to merge two lists (see below) and came up with the following order after running each several times to normalize the cache changes (which make about a 15% difference).
import time
c = list(range(1,10000000))
c_n = list(range(10000000, 20000000))
start = time.time()
*insert method here*
print (time.time()-start)
Method 1: c.extend(c_n)
Representative result: 0.11861872673034668
Method 2: c += c_n
Representative result: 0.10558319091796875
Method 3: c = c + c_n
Representative result: 0.25804924964904785
Method 4: c = [*c, *c_n]
Representative result: 0.22019600868225098
Conclusion
Use += or .extend() if you want to merge in place. They are significantly faster.

If you are using python 3, there is one more way to do this and a little bit faster (tested only on python 3.7)
[*list1, *list2]
Benchmark
from timeit import timeit
x = list(range(10000))
y = list(x)
def one():
x + y
def two():
[*x, *y]
print(timeit(one, number=1000, globals={'x':x, 'y': y}))
print(timeit(two, number=1000, globals={'x':x, 'y': y}))
0.10456193100253586
0.09631731400440913

list_1 + list_2 does it. Example -
>>> list_1 = [1,2,3,4]
>>> list_2 = [5,6,7,8]
>>> list_1 + list_2
[1, 2, 3, 4, 5, 6, 7, 8]

a=[1,2,3]
b=[4,5,6]
c=a+b
print(c)
OUTPUT:
>>> [1, 2, 3, 4, 5, 6]
In the above code "+" operator is used to concatenate the 2 lists into a single list.
ANOTHER SOLUTION:
a=[1,2,3]
b=[4,5,6]
c=[] #Empty list in which we are going to append the values of list (a) and (b)
for i in a:
c.append(i)
for j in b:
c.append(j)
print(c)
OUTPUT:
>>> [1, 2, 3, 4, 5, 6]

How to find elements existing in two lists but with different indexes

I have two lists of the same length which contains a variety of different elements. I'm trying to compare them to find the number of elements which exist in both lists, but have different indexes.
Here are some example inputs/outputs to demonstrate what I mean:
>>> compare([1, 2, 3, 4], [4, 3, 2, 1])
4
>>> compare([1, 2, 3], [1, 2, 3])
0
# Each item in the first list has the same index in the other
>>> compare([1, 2, 4, 4], [1, 4, 4, 2])
2
# The 3rd '4' in both lists don't count, since they have the same indexes
>>> compare([1, 2, 3, 3], [5, 3, 5, 5])
1
# Duplicates don't count
The lists are always the same size.
This is the algorithm I have so far:
def compare(list1, list2):
# Eliminate any direct matches
list1 = [a for (a, b) in zip(list1, list2) if a != b]
list2 = [b for (a, b) in zip(list1, list2) if a != b]
out = 0
for possible in list1:
if possible in list2:
index = list2.index(possible)
del list2[index]
out += 1
return out
Is there a more concise and eloquent way to do the same thing?

This python function does hold for the examples you provided:
def compare(list1, list2):
D = {e:i for i, e in enumerate(list1)}
return len(set(e for i, e in enumerate(list2) if D.get(e) not in (None, i)))

since duplicates don't count, you can use sets to find only the elements in each list. A set only holds unique elements. Then select only the elements shared between both using list.index
def compare(l1, l2):
s1, s2 = set(l1), set(l2)
shared = s1 & s2 # intersection, only the elements in both
return len([e for e in shared if l1.index(e) != l2.index(e)])
You can actually bring this down to a one-liner if you want
def compare(l1, l2):
return len([e for e in set(l1) & set(l2) if l1.index(e) != l2.index(e)])
Alternative:
Functionally you can use the reduce builtin (in python3, you have to do from functools import reduce first). This avoids construction of the list which saves excess memory usage. It uses a lambda function to do the work.
def compare(l1, l2):
return reduce(lambda acc, e: acc + int(l1.index(e) != l2.index(e)),
set(l1) & set(l2), 0)
A brief explanation:
reduce is a functional programming contruct that reduces an iterable to a single item traditionally. Here we use reduce to reduce the set intersection to a single value.
lambda functions are anonymous functions. Saying lambda x, y: x + 1 is like saying def func(x, y): return x + y except that the function has no name. reduce takes a function as its first argument. The first argument a the lambda receives when used with reduce is the result of the previous function, the accumulator.
set(l1) & set(l2) is a set consisting of unique elements that are in both l1 and l2. It is iterated over, and each element is taken out one at a time and used as the second argument to the lambda function.
0 is the initial value for the accumulator. We use this since we assume there are 0 shared elements with different indices to start.

I dont claim it is the simplest answer, but it is a one-liner.
import numpy as np
import itertools
l1 = [1, 2, 3, 4]
l2 = [1, 3, 2, 4]
print len(np.unique(list(itertools.chain.from_iterable([[a,b] for a,b in zip(l1,l2) if a!= b]))))
I explain:
[[a,b] for a,b in zip(l1,l2) if a!= b]
is the list of couples from zip(l1,l2) with different items. Number of elements in this list is number of positions where items at same position differ between the two lists.
Then, list(itertools.chain.from_iterable() is for merging component lists of a list. For instance :
>>> list(itertools.chain.from_iterable([[3,2,5],[5,6],[7,5,3,1]]))
[3, 2, 5, 5, 6, 7, 5, 3, 1]
Then, discard duplicates with np.unique(), and take len().

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Accessing Panda dataframe columns [duplicate] - python

a = [1,2,3,4,5] b = [1,3,5,6] c = a and b print c actual output: [1,3,5,6] expected output: [1,3,5] How can we achieve a boolean AND operation (list intersection) on two lists?

If order is not important and you don't need to worry about duplicates then you can use set intersection: >>> a = [1,2,3,4,5] >>> b = [1,3,5,6] >>> list(set(a) & set(b)) [1, 3, 5]

Using list comprehensions is a pretty obvious one for me. Not sure about performance, but at least things stay lists. [x for x in a if x in b] Or "all the x values that are in A, if the X value is in B".

If you convert the larger of the two lists into a set, you can get the intersection of that set with any iterable using intersection(): a = [1,2,3,4,5] b = [1,3,5,6] set(a).intersection(b)

a = [1,2,3,4,5] b = [1,3,5,6] c = list(set(a).intersection(set(b))) Should work like a dream. And, if you can, use sets instead of lists to avoid all this type changing!

You can also use numpy.intersect1d(ar1, ar2). It return the unique and sorted values that are in both of two arrays.

This way you get the intersection of two lists and also get the common duplicates. >>> from collections import Counter >>> a = Counter([1,2,3,4,5]) >>> b = Counter([1,3,5,6]) >>> a &= b >>> list(a.elements()) [1, 3, 5]

If, by Boolean AND, you mean items that appear in both lists, e.g. intersection, then you should look at Python's set and frozenset types.

when we used tuple and we want to intersection a=([1,2,3,4,5,20], [8,3,9,5,1,4,20]) for i in range(len(a)): b=set(a[i-1]).intersection(a[i]) print(b) {1, 3, 4, 5, 20}

Related

Remove multiple objects in array with for loop in Python [duplicate]

remove pairs of equal elements from two lists in sympy

Cumulative sum in python giving different results when summing with a for loop

What is the fastest way to merge two lists in python?

How to find elements existing in two lists but with different indexes

Categories

Resources