Let's say I have two lists of strings:
a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
where #### represents a random 4-digit number. And
b = ['boo', 'aaa', 'bii']
I need to know which string entries in list a contain any given entry in b. I was able to accomplish this with a couple of nested loops, using the in operator to check whether the string contains the current entry in b. But, being relatively new to Python, I'm almost positive this was not the most Pythonic or elegant way to write it. So, is there an idiom that would simplify my solution?
The following code gives you a list with the indexes of a where the part after the slash is an element of b.
a_sep = [x.split('/')[1] for x in a]
idxs = [i for i, x in enumerate(a_sep) if x in b]
To improve performance, make b a set instead of a list.
Demo:
>>> a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
>>> b = ['boo', 'aaa', 'bii']
>>> a_sep = [x.split('/')[1] for x in a]
>>> idxs = [i for i, x in enumerate(a_sep) if x in b]
>>> idxs
[0, 3]
>>> [a[i] for i in idxs]
['####/boo', '####/bii']
If you prefer to get the elements directly instead of the indexes:
>>> a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
>>> b = ['boo', 'aaa', 'bii']
>>> [x for x in a if x.split('/')[1] in b]
['####/boo', '####/bii']
ThiefMaster's answer is good, and mine will be quite similar, but if you don't need to know the indexes, you can take a shortcut:
>>> a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
>>> b = ['boo', 'aaa', 'bii']
>>> [x for x in a if x.split('/')[1] in b]
['####/boo', '####/bii']
Again, if b is a set, that will improve performance for large numbers of elements.
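A minimal sketch of the set-based variant mentioned above, assuming every entry of a has the form 'prefix/name'. The helper name filter_by_suffix and the concrete 4-digit prefixes are made up for illustration:

```python
def filter_by_suffix(entries, wanted):
    """Return the entries whose part after the last '/' is in `wanted`."""
    wanted_set = set(wanted)  # build once; average O(1) membership tests
    return [e for e in entries if e.rsplit('/', 1)[-1] in wanted_set]


a = ['1234/boo', '5678/baa', '9012/bee', '3456/bii', '7890/buu']
b = ['boo', 'aaa', 'bii']
print(filter_by_suffix(a, b))  # ['1234/boo', '3456/bii']
```

For a b of three elements the difference is unlikely to be measurable; the set only pays off once b grows large.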
import random
a=[str(random.randint(1000,9999))+'/'+e for e in ['boo','baa','bee','bii','buu']]
b = ['boo', 'aaa', 'bii']
c=[x.split('/')[-1] for x in a if x.split('/')[-1] in b]
print(c)
prints:
['boo', 'bii']
Or, if you want the entire entry:
print([x for x in a if x.split('/')[-1] in b])
prints:
['3768/boo', '9110/bii']
>>> [i for i in a for j in b if j in i]
['####/boo', '####/bii']
This should do what you want, elegant and pythonic.
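One caveat with the double loop: an entry of a that contains more than one entry of b is emitted once per match. The sketch below uses made-up data to show the difference, with any() as one possible way to avoid the duplicates:

```python
a = ['1234/boo', '5678/baa', '9012/booaaa']
b = ['boo', 'aaa']

# Double loop: '9012/booaaa' matches both 'boo' and 'aaa', so it appears twice.
with_duplicates = [i for i in a for j in b if j in i]

# any() stops at the first match, so each entry appears at most once.
unique_matches = [i for i in a if any(j in i for j in b)]

print(with_duplicates)  # ['1234/boo', '9012/booaaa', '9012/booaaa']
print(unique_matches)   # ['1234/boo', '9012/booaaa']
```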
As other answers have indicated, you can use set operations to make this faster. Here's a way to do this (note that the order of the result is not guaranteed, because sets are unordered):
>>> a_dict = dict((item.split('/')[1], item) for item in a)
>>> common = set(a_dict) & set(b)
>>> [a_dict[i] for i in common]
['####/boo', '####/bii']
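A dict keyed by the part after the slash keeps only one entry per key, so if two entries of a share the same suffix, one of them is lost. A possible workaround, sketched here with made-up data, is to group the entries in lists and walk b in order so the result order is predictable:

```python
from collections import defaultdict

a = ['1111/boo', '2222/baa', '3333/boo', '4444/bii']
b = ['boo', 'aaa', 'bii']

# Group full entries by the part after the slash, keeping every entry.
by_suffix = defaultdict(list)
for item in a:
    by_suffix[item.split('/')[1]].append(item)

# Walk b in order; `in` does not create missing keys on a defaultdict.
matches = [item for name in b if name in by_suffix for item in by_suffix[name]]
print(matches)  # ['1111/boo', '3333/boo', '4444/bii']
```

The result follows the order of b rather than the order of a, which may or may not be what you want.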
As best I can describe it, I have two lists of strings and I want to return all results from list A that contain any of the strings in list B. Here are details:
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']
How do I return
C = ['dataFile2000', 'dataFile2001']?
I've been looking into list comprehensions, doing something like below
C=[x for x in A if B in A]
but I can't seem to make it work. Am I on the right track?
You were close, use any:
C=[x for x in A if any(b in x for b in B)]
More detailed:
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']
C = [x for x in A if any(b in x for b in B)]
print(C)
Output
['dataFile2000', 'dataFile2001']
You can use any() to check if any element of your list B is in x:
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']
c = [x for x in A if any(k in x for k in B)]
print(c)
Output:
['dataFile2000', 'dataFile2001']
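If the any() call with a generator expression looks opaque, it may help to see it written as an explicit loop. The helper name contains_any is hypothetical; the behavior is meant to match any(k in x for k in B):

```python
def contains_any(text, needles):
    """Explicit-loop equivalent of any(n in text for n in needles)."""
    for n in needles:
        if n in text:
            return True  # stop at the first hit, as any() does
    return False


A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']
C = [x for x in A if contains_any(x, B)]
print(C)  # ['dataFile2000', 'dataFile2001']
```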
First, I would construct a set of the years for the O(1) lookup time. [1]
>>> A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
>>> B = ['2000', '2001']
>>>
>>> years = set(B)
Now, keep only the elements of A that end with an element of years.
>>> [file for file in A if file[-4:] in years]
['dataFile2000', 'dataFile2001']
[1] If you have very small lists (two elements certainly qualify), keep the lists. Sets have O(1) lookup, but the hashing still introduces overhead.
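The slice file[-4:] assumes every year is exactly four characters long. If the suffixes could vary in length, one alternative is str.endswith, which accepts a tuple of suffixes. This is a sketch of a different technique, not a set lookup: it checks the suffixes one by one.

```python
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']

# str.endswith accepts a tuple of suffixes and returns True if any matches.
suffixes = tuple(B)
C = [name for name in A if name.endswith(suffixes)]
print(C)  # ['dataFile2000', 'dataFile2001']
```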
How can I use boolean index arrays to filter a list without using numpy?
For example:
>>> l = ['a','b','c']
>>> b = [True,False,False]
>>> l[b]
The result should be:
['a']
I know numpy supports it, but I want to know how to do it in plain Python.
>>> import numpy as np
>>> l = np.array(['a','b','c'])
>>> b = np.array([True,False,False])
>>> l[b]
array(['a'],
      dtype='|S1')
Python lists do not support boolean indexing, but the itertools.compress function does exactly what you want. It returns an iterator, which means you need to use the list constructor to get a list.
>>> from itertools import compress
>>> l = ['a', 'b', 'c']
>>> b = [True, False, False]
>>> list(compress(l, b))
['a']
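Two details of compress are worth seeing in action: the selectors only need to be truthy or falsy, and the returned iterator can be consumed just once. A short sketch with made-up data:

```python
from itertools import compress

l = ['a', 'b', 'c', 'd']
mask = [True, False, 1, 0]  # any truthy/falsy values work as selectors

selected = compress(l, mask)   # lazy iterator, nothing computed yet
first_pass = list(selected)
second_pass = list(selected)   # the iterator is already exhausted here

print(first_pass)   # ['a', 'c']
print(second_pass)  # []
```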
[a for a, t in zip(l, b) if t]
# => ["a"]
A bit more efficient on Python 2 is the iterator version (Python 3 has no itertools.izip because the built-in zip is already lazy):
from itertools import izip
[a for a, t in izip(l, b) if t]
# => ["a"]
EDIT: user3100115's version is nicer.
Using enumerate
l = ['a','b','c']
b = [True,False,False]
res = [item for i, item in enumerate(l) if b[i]]
print(res)
gives
['a']
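The zip and enumerate versions behave differently when the two lists have different lengths: zip silently stops at the shorter one, while b[i] raises IndexError if the mask is too short. If that matters, a small helper can make the requirement explicit. The name boolean_filter is hypothetical:

```python
def boolean_filter(items, mask):
    """Keep items whose corresponding mask value is truthy."""
    if len(items) != len(mask):
        raise ValueError("items and mask must have the same length")
    return [item for item, keep in zip(items, mask) if keep]


print(boolean_filter(['a', 'b', 'c'], [True, False, False]))  # ['a']
```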
Example:
a = ['abc123','abc','543234','blah','tete','head','loo2']
So I want to filter the above list of strings using the following list: b = ['ab','2']
I want to remove every string containing 'ab', along with every string containing any other entry of b, so that I get the following:
a = ['blah', 'tete', 'head']
You can use a list comprehension:
[i for i in a if not any(x in i for x in b)]
This returns:
['blah', 'tete', 'head']
>>> a = ['abc123','abc','543234','blah','tete','head','loo2']
>>> b = ['ab','2']
>>> [e for e in a if not [s for s in b if s in e]]
['blah', 'tete', 'head']
newA = []
for c in a:
    for d in b:
        if d in c:
            # c contains a forbidden substring, so skip it
            break
    else:
        # no substring from b was found in c, so keep it
        newA.append(c)
a = newA
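Another option, sketched here as an alternative technique rather than anything the answers above propose, is to build a single regular expression from b. It assumes b is non-empty; an empty b would produce an empty pattern that matches every string.

```python
import re

a = ['abc123', 'abc', '543234', 'blah', 'tete', 'head', 'loo2']
b = ['ab', '2']

# re.escape keeps characters such as '.' or '*' in b from being read as regex syntax.
pattern = re.compile('|'.join(re.escape(s) for s in b))
kept = [s for s in a if not pattern.search(s)]
print(kept)  # ['blah', 'tete', 'head']
```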
Suppose for example you have the list
a = [['hand', 'head'], ['phone', 'wallet'], ['lost', 'stock']]
and another list
b = ['phone', 'lost']
And you want to find a list c, that contains the indices of the rows in a (thinking of a as a 2D matrix) whose first column is a value in b. So in this case
c = [1, 2]
I tried to use the following list comprehensions
c = [i if a[i][0] in b for i in range(0, 1)]
c = [i if a[i][0] in b]
But both of these were invalid syntax.
Use enumerate():
c = [i for i, v in enumerate(a) if v[0] in b]
enumerate() gives you both the index and the value of the iterable you pass in. Note that the if test goes at the end; list comprehensions should be written in the same order that you would use when nesting loops:
c = []
for i, v in enumerate(a):
    if v[0] in b:
        c.append(i)
You really want to make b a set:
b = set(b)
to make membership testing an O(1) constant-time operation, as opposed to an O(n) linear-time test against a list.
Demo:
>>> a = [['hand', 'head'], ['phone', 'wallet'], ['lost', 'stock']]
>>> b = {'phone', 'lost'} # set literal
>>> [i for i, v in enumerate(a) if v[0] in b]
[1, 2]
First, list indices start at 0, so c must be:
c=[1,2]
If you need to do it with a list comprehension, one solution is:
c=[pos for pos, val_a in enumerate(a) for val_b_to_check in val_a if val_b_to_check in b]
Note that this checks every element of each row, not only the first column.
You can use NumPy to do this as well (newer NumPy releases recommend np.isin over np.in1d):
>>> import numpy as np
>>> a = np.array([['hand', 'head'], ['phone', 'wallet'], ['lost', 'stock']])
>>> b = np.array(['phone', 'lost'])
>>> np.in1d(a[:,0],b)
array([False, True, True], dtype=bool)
Or, if you want the indices:
>>> np.where(np.in1d(a[:,0],b))[0].tolist()
[1, 2]
I have a list like
myl = ['A','B','C','D','E','F'] #length always even
Now my desired output is 'AB','CD','EF'
I tried
>>> myl = ['A','B','C','D','E','F']
>>> even_pos = myl[::2]
>>> odd_pos = myl[::-2]
>>> odd_pos.reverse()
>>> newlist = zip(even_pos,odd_pos)
>>> for x in newlist:
...     print "".join(list(x))
...
AB
CD
EF
>>>
I don't like this approach because it seems like too much code.
So, is there a better way to achieve my output?
You can do this concisely using a list comprehension or generator expression:
>>> myl = ['A','B','C','D','E','F']
>>> [''.join(myl[i:i+2]) for i in range(0, len(myl), 2)]
['AB', 'CD', 'EF']
>>> print '\n'.join(''.join(myl[i:i+2]) for i in range(0, len(myl), 2))
AB
CD
EF
You could replace ''.join(myl[i:i+2]) with myl[i] + myl[i+1] for this particular case, but using the ''.join() method is easier for when you want to do groups of three or more.
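To illustrate the point about groups of three or more, here is a sketch that turns the group size into a parameter. The function name join_chunks is made up for this example:

```python
def join_chunks(items, size):
    """Join consecutive groups of `size` strings; the last group may be shorter."""
    return [''.join(items[i:i + size]) for i in range(0, len(items), size)]


myl = ['A', 'B', 'C', 'D', 'E', 'F']
print(join_chunks(myl, 2))  # ['AB', 'CD', 'EF']
print(join_chunks(myl, 3))  # ['ABC', 'DEF']
print(join_chunks(myl, 4))  # ['ABCD', 'EF']
```

Because slicing never raises on a short tail, this version also tolerates a list whose length is not a multiple of size.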
Or an alternative that comes from the documentation for zip():
>>> map(''.join, zip(*[iter(myl)]*2))
['AB', 'CD', 'EF']
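The zip idiom works because both arguments are the same iterator object, so each tuple consumes two consecutive items. A sketch for Python 3, where map() is lazy and has to be wrapped in list() to display the result:

```python
myl = ['A', 'B', 'C', 'D', 'E', 'F']

it = iter(myl)
# Both zip arguments are the SAME iterator, so each tuple takes two
# consecutive items: ('A', 'B'), ('C', 'D'), ('E', 'F').
pairs = [''.join(pair) for pair in zip(it, it)]
print(pairs)  # ['AB', 'CD', 'EF']

# Python 3: map() returns an iterator, so wrap it in list() to see the result.
mapped = list(map(''.join, zip(*[iter(myl)] * 2)))
print(mapped)  # ['AB', 'CD', 'EF']
```

Unlike the slicing version, zip drops a trailing unpaired element if the list has odd length, which is fine here since the length is always even.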
Why is your method so complicated? You could do basically what you did, but in one line, like so:
[ "".join(t) for t in zip(myl[::2], myl[1::2]) ]
F.J's answer is more efficient though.
How about this?
>>> ["%s%s" % (myl[c], myl[c+1]) for c in range(0, 6, 2)]
['AB', 'CD', 'EF']
I'd probably write:
[myl[i] + myl[i + 1] for i in range(0, len(myl), 2)]
You could do:
myl = ['A','B','C','D','E','F']
[''.join(myl[i:i+2]) for i in range(0, len(myl), 2)]
print '\n'.join(''.join(myl[i:i+2]) for i in range(0, len(myl), 2))