Suppose for example you have the list
a = [['hand', 'head'], ['phone', 'wallet'], ['lost', 'stock']]
and another list
b = ['phone', 'lost']
And you want to find a list c, that contains the indices of the rows in a (thinking of a as a 2D matrix) whose first column is a value in b. So in this case
c = [1, 2]
I tried to use the following list comprehensions
c = [i if a[i][0] in b for i in range(0, 1)]
c = [i if a[i][0] in b]
But both of these were invalid syntax.
Use enumerate():
c = [i for i, v in enumerate(a) if v[0] in b]
enumerate() gives you both the index and the value of the iterable you pass in. Note that the if test goes at the end; list comprehensions should be written in the same order that you would use when nesting loops:
c = []
for i, v in enumerate(a):
    if v[0] in b:
        c.append(i)
You really want to make b a set:
b = set(b)
to make membership testing an O(1) constant-time operation, as opposed to an O(n) linear-time scan of a list.
Demo:
>>> a = [['hand', 'head'], ['phone', 'wallet'], ['lost', 'stock']]
>>> b = {'phone', 'lost'} # set literal
>>> [i for i, v in enumerate(a) if v[0] in b]
[1, 2]
First, list indices start from 0.
So c must be:
c=[1,2]
If you need to do it with a list comprehension, the solution can be:
c=[pos for pos, val_a in enumerate(a) for val_b_to_check in val_a if val_b_to_check in b ]
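Note that the nested comprehension above tests every column of each row and emits the index once per matching cell, so it can report rows whose first column is not in b and can repeat an index. A minimal sketch of the difference, using made-up sample data (not the data from the question) chosen so the three variants disagree:

```python
# Hypothetical sample data: 'phone' also appears outside the first column.
a = [['hand', 'phone'], ['phone', 'wallet'], ['lost', 'phone']]
b = {'phone', 'lost'}

# Nested form: one index per matching cell, so row 2 is listed twice.
per_cell = [pos for pos, row in enumerate(a) for val in row if val in b]

# any() form: each row index at most once, whichever column matches.
per_row = [pos for pos, row in enumerate(a) if any(val in b for val in row)]

# First-column-only form, which is what the question asks for.
first_col = [pos for pos, row in enumerate(a) if row[0] in b]

print(per_cell)   # [0, 1, 2, 2]
print(per_row)    # [0, 1, 2]
print(first_col)  # [1, 2]
```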
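You can use NumPy to do this as well (recent NumPy releases deprecate np.in1d in favor of np.isin, which accepts the same two arguments here):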
>>> import numpy as np
>>> a = np.array([['hand', 'head'], ['phone', 'wallet'], ['lost', 'stock']])
>>> b = np.array(['phone', 'lost'])
>>> np.in1d(a[:,0],b)
array([False, True, True], dtype=bool)
Or, if you want the indices:
>>> np.where(np.in1d(a[:,0],b))[0].tolist()
[1, 2]
Edit: Removed undefined variable.
My code checks whether each value of one list is present in another. If it is, the value is appended to a third list; if it is not, the value is appended to a fourth list. What is the most efficient and readable way to do this? Example of my code:
a = [1,2,3]
b = [2,3,4,5,6,7]
c = []
d = []
for ele in a:
    if ele in b:
        c.append(ele)
    else:
        d.append(ele)
a=[2,3,4,5]
b=[3,5,7,9]
c = [value for value in a if value in b]
d = [value for value in a if value not in b]
print(f'Present in B: {c}')
print(f"Not present in B: {d}")
c = [i for i in a if i in b]
d = [i for i in a if i not in b]
If the order of the results and duplicate values don't matter, the most direct way to solve this is with sets:
import random
a = [random.randint(1, 15) for _ in range(5)]
b = [random.randint(1, 15) for _ in range(7)]
print(a)
print(b)
set_a = set(a)
set_b = set(b)
set_intersection = set_a.intersection(set_b)
set_diff = set_a.difference(set_b)
print(list(set_intersection))
print(list(set_diff))
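Set intersection and difference discard both the order of a and any repeated values. If those need to survive, one option is to keep the loop but test membership against a set built from b. This is only a sketch; the names b_set, present and absent are illustrative and the sample values are made up:

```python
a = [2, 3, 4, 5, 3]
b = [3, 5, 7, 9]

b_set = set(b)  # built once; each lookup is then O(1) on average

present, absent = [], []
for value in a:
    # Choose the target list, then append; order and duplicates of `a` are kept.
    (present if value in b_set else absent).append(value)

print(present)  # [3, 5, 3]
print(absent)   # [2, 4]
```

This walks a once, whereas the two-comprehension version walks it twice.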
I have N lists, and would like to know which elements are present in strictly X of those lists. I understand that if I have two lists, it's rather straightforward:
lst_a = [1,2,3]
lst_b = [1,2,5]
overlap = list(set(lst_a) & set(lst_b))
What if I have, say, 5 lists, and want to know which elements are in strictly 4 of those?
Merge using counters:
from collections import Counter
lst_a = [1,2,3]
lst_b = [1,2,5]
lsts = [lst_a, lst_b]
counter = Counter()
for lst in lsts:
    unique = set(lst)
    counter += Counter(unique)
n = 2
print(f"elements in exactly {n} lsts:")
for k, v in counter.items():
    if v == n:
        print(k)
Similar to @wim's code, but more concise:
[i for i, c in sum(map(Counter, lsts), Counter()).items() if c == 2]
If the items in the input lists are not necessarily unique, you can map the lists to sets first:
[i for i, c in sum(map(Counter, map(set, lsts)), Counter()).items() if c == 2]
This returns:
[1, 2]
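For the five-list case from the question, the same counting idea can be wrapped in a small helper. The function name in_exactly and the sample lists are hypothetical, and sorting the result assumes the elements are mutually comparable:

```python
from collections import Counter
from itertools import chain

def in_exactly(lists, n):
    """Return the elements that occur in exactly n of the given lists."""
    # set(lst) makes each list count an element at most once.
    counts = Counter(chain.from_iterable(set(lst) for lst in lists))
    return sorted(k for k, v in counts.items() if v == n)

# Made-up data: 1 and 2 each appear in four of the five lists.
lists = [[1, 2, 3], [1, 2, 5], [1, 5, 5], [2, 3, 1], [2, 9]]
print(in_exactly(lists, 4))  # [1, 2]
print(in_exactly(lists, 2))  # [3, 5]
```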
Say I have a list of lists like so:
l = [[1,2,3,4],[4,5,3,2],[7,8,9,2],[5,6,8,9]]
I want to get the indices of the inner lists that contain unique elements. For the example above, the list at index 2 is the only one that contains 7, and the list at index 3 is the only one that contains 6.
How would one go about implementing this in python?
Here's a solution using Counter. Each inner list is checked for a value that only has a single count, and then the corresponding index is printed (a la enumerate).
from collections import Counter
from itertools import chain
c = Counter(chain.from_iterable(l))
idx = [i for i, x in enumerate(l) if any(c[y] == 1 for y in x)]
print(idx)
[0, 2, 3]
A possible optimisation might include precomputing unique elements in a set to replace the any call with a set.intersection.
c = Counter(chain.from_iterable(l))
u = {k for k in c if c[k] == 1}
idx = [i for i, x in enumerate(l) if u.intersection(x)]
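To see which values make each row qualify, the set-based variant can return a mapping instead of a bare index list. This is a sketch under the assumption that no value is repeated inside a single inner list; the function name rows_with_unique is made up for illustration:

```python
from collections import Counter
from itertools import chain

def rows_with_unique(rows):
    """Map each row index to the values that appear in that row and nowhere else."""
    counts = Counter(chain.from_iterable(rows))
    unique = {k for k, v in counts.items() if v == 1}
    result = {}
    for i, row in enumerate(rows):
        found = unique.intersection(row)
        if found:  # skip rows that contribute no unique value
            result[i] = sorted(found)
    return result

l = [[1, 2, 3, 4], [4, 5, 3, 2], [7, 8, 9, 2], [5, 6, 8, 9]]
print(rows_with_unique(l))  # {0: [1], 2: [7], 3: [6]}
```

Index 0 shows up because 1 occurs only in the first inner list.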
A naive solution:
>>> from collections import Counter
>>> from itertools import chain
>>> my_list = [[1,2,3,4],[4,5,3,2],[7,8,9,2],[5,6,8,9]]
# find out the counts.
>>> counter = Counter(chain(*my_list))
# find the unique numbers
>>> uniques = [element for element,count in counter.items() if count==1]
# find the index of those unique numbers
>>> result = [indx for indx,elements in enumerate(my_list) for e in uniques if e in elements]
>>> result
[0, 2, 3]
Using itertools.chain with set.difference:
from itertools import chain
l = [[1,2,3,4],[4,5,3,2],[7,8,9,2],[5,6,8,9]]
[i for i in range(len(l)) if set(l[i]).difference(set(chain(*[j for j in l if j!=l[i]])))]
#[0, 2, 3]
How can I use a boolean index array to filter a list without using numpy?
For example:
>>> l = ['a','b','c']
>>> b = [True,False,False]
>>> l[b]
The result should be:
['a']
I know numpy supports it, but I want to know how to do it in plain Python.
>>> import numpy as np
>>> l = np.array(['a','b','c'])
>>> b = np.array([True,False,False])
>>> l[b]
array(['a'],
dtype='|S1')
Python lists do not support boolean indexing, but the itertools.compress function does exactly what you want. It returns an iterator, which means you need to pass it to the list constructor to get a list.
>>> from itertools import compress
>>> l = ['a', 'b', 'c']
>>> b = [True, False, False]
>>> list(compress(l, b))
['a']
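One difference from NumPy worth knowing: compress stops silently at the shorter of its two inputs, while NumPy rejects a boolean mask of the wrong length. If that check matters, a thin wrapper can add it. The name mask_filter is hypothetical, and the sketch assumes both arguments are sequences that support len():

```python
from itertools import compress

def mask_filter(items, mask):
    """Return the items whose corresponding mask entry is truthy."""
    if len(items) != len(mask):
        raise ValueError("items and mask must have the same length")
    return list(compress(items, mask))

print(mask_filter(['a', 'b', 'c'], [True, False, False]))  # ['a']
print(list(compress(['a', 'b', 'c'], [True, True])))       # ['a', 'b'], no error
```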
[a for a, t in zip(l, b) if t]
# => ["a"]
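In Python 2, the iterator version is a bit more efficient (in Python 3 the built-in zip is already lazy and itertools.izip no longer exists):
from itertools import izip  # Python 2 only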
[a for a, t in izip(l, b) if t]
# => ["a"]
EDIT: user3100115's version is nicer.
Using enumerate
l = ['a','b','c']
b = [True,False,False]
res = [item for i, item in enumerate(l) if b[i]]
print(res)
gives
['a']
Let's say I have two lists of strings:
a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
where #### represents 4-digit random number. And
b = ['boo', 'aaa', 'bii']
I need to know which string entries in list a contain any given entry in b. I was able to accomplish this with a couple of nested loops, using the in operator to check whether the string contains the current entry in b. But, being relatively new to Python, I'm almost positive this is not the most pythonic or elegant way to write it. So, is there an idiom that would simplify my solution?
The following code gives you a list with the indexes of a where the part after the slash is an element of b.
a_sep = [x.split('/')[1] for x in a]
idxs = [i for i, x in enumerate(a_sep) if x in b]
To improve performance, make b a set instead of a list.
Demo:
>>> a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
>>> b = ['boo', 'aaa', 'bii']
>>> a_sep = [x.split('/')[1] for x in a]
>>> idxs = [i for i, x in enumerate(a_sep) if x in b]
>>> idxs
[0, 3]
>>> [a[i] for i in idxs]
['####/boo', '####/bii']
If you prefer to get the elements directly instead of the indexes:
>>> a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
>>> b = ['boo', 'aaa', 'bii']
>>> [x for x in a if x.split('/')[1] in b]
['####/boo', '####/bii']
ThiefMaster's answer is good, and mine will be quite similar, but if you don't need to know the indexes, you can take a shortcut:
>>> a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
>>> b = ['boo', 'aaa', 'bii']
>>> [x for x in a if x.split('/')[1] in b]
['####/boo', '####/bii']
Again, if b is a set, that will improve performance for large numbers of elements.
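A minimal sketch of the set-based variant, with made-up four-digit prefixes. It uses str.partition instead of split('/')[1] so that an entry without a slash is skipped rather than raising IndexError; that extra entry is an assumption added for illustration, not something in the question:

```python
a = ['1234/boo', '5678/baa', '9012/bee', '3456/bii', '7890/buu', 'noslash']
b = {'boo', 'aaa', 'bii'}  # a set, so each lookup is O(1) on average

# partition never raises: without a '/', the part after it is just ''.
matches = [x for x in a if x.partition('/')[2] in b]
idxs = [i for i, x in enumerate(a) if x.partition('/')[2] in b]

print(matches)  # ['1234/boo', '3456/bii']
print(idxs)     # [0, 3]
```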
import random
a=[str(random.randint(1000,9999))+'/'+e for e in ['boo','baa','bee','bii','buu']]
b = ['boo', 'aaa', 'bii']
c=[x.split('/')[-1] for x in a if x.split('/')[-1] in b]
print(c)
prints:
['boo', 'bii']
Or, if you want the entire entry:
print([x for x in a if x.split('/')[-1] in b])
prints (the four-digit prefixes vary from run to run):
['3768/boo', '9110/bii']
>>> [i for i in a for j in b if j in i]
['####/boo', '####/bii']
This should do what you want, elegant and pythonic.
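Keep in mind that j in i is a substring test and that the nested form yields an entry once per matching element of b. The sketch below uses made-up data to show both effects, along with an any() variant that lists each entry at most once:

```python
# Hypothetical data: 'oo' is a substring of both entries, 'boo' of both as well.
a = ['1234/boo', '5678/booth']
b = ['boo', 'oo']

# Nested form: one result per (entry, pattern) pair that matches.
nested = [i for i in a for j in b if j in i]

# any() form: each entry at most once, still a substring match.
once = [i for i in a if any(j in i for j in b)]

# Exact match on the part after the slash.
exact = [i for i in a if i.partition('/')[2] in b]

print(nested)  # ['1234/boo', '1234/boo', '5678/booth', '5678/booth']
print(once)    # ['1234/boo', '5678/booth']
print(exact)   # ['1234/boo']
```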
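As other answers have indicated, you can use set operations to make this faster. Here's a way to do this (iterating over a set has no guaranteed order, so the result may come out in a different order than shown, and entries of a that share the same suffix collapse to one dict key):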
>>> a_dict = dict((item.split('/')[1], item) for item in a)
>>> common = set(a_dict) & set(b)
>>> [a_dict[i] for i in common]
['####/boo', '####/bii']