How to compare two lists to keep matching substrings?


As best I can describe it, I have two lists of strings and I want to return all results from list A that contain any of the strings in list B. Here are details:
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']
How do I return
C = ['dataFile2000', 'dataFile2001']?
I've been looking into list comprehensions, doing something like below
C=[x for x in A if B in A]
but I can't seem to make it work. Am I on the right track?

You were close, use any:
C=[x for x in A if any(b in x for b in B)]
More detailed:
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']
C = [x for x in A if any(b in x for b in B)]
print(C)
Output
['dataFile2000', 'dataFile2001']
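To see why the original attempt returns nothing, here is a small sketch that runs both versions side by side. The names attempt and fixed are only illustrative:

```python
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']

# `B in A` asks whether the list B itself is an element of A, which it never is,
# so the condition is False for every x and nothing is kept.
attempt = [x for x in A if B in A]
print(attempt)  # []

# any() instead tests each string in B against the current element x.
fixed = [x for x in A if any(b in x for b in B)]
print(fixed)  # ['dataFile2000', 'dataFile2001']
```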

You can use any() to check if any element of your list B is in x:
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']
c = [x for x in A if any(k in x for k in B)]
print(c)
Output:
['dataFile2000', 'dataFile2001']

First, I would construct a set of the years for the O(1) lookup time. [1]
>>> A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
>>> B = ['2000', '2001']
>>>
>>> years = set(B)
Now, keep only the elements of A that end with an element of years (this assumes the year is always the last four characters).
>>> [file for file in A if file[-4:] in years]
['dataFile2000', 'dataFile2001']
[1] If you have very small lists (two elements certainly qualify), keep the lists. Sets have O(1) lookup, but the hashing still introduces overhead.
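If the year is always a suffix, str.endswith also accepts a tuple of suffixes, which avoids hard-coding the slice length. This is a minimal sketch of that alternative (not the approach used in the answer above), with suffixes as an illustrative name:

```python
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']

# str.endswith takes a single string or a tuple of strings (a list raises TypeError).
suffixes = tuple(B)
C = [name for name in A if name.endswith(suffixes)]
print(C)  # ['dataFile2000', 'dataFile2001']
```

Unlike the slice-and-set version, this works when the suffixes have different lengths, at the cost of checking each suffix in turn.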

Related

How can I take specific items from a list of lists?

For example, we have:
L = [["Ak","154"],["Bm","200"],["Ck","250"], ["Ad","500"],["Ac","600"]]
I want to select the sublists whose first element starts with 'A' and collect their second elements. The output should look like
["154","500","600"] or like [["154"],["500"],["600"]]

Filter and map with a list comprehension:
[b for a, b in L if a[0] == "A"]
Or, if you need to search for prefixes of more than one character:
[b for a, b in L if a.startswith("A")]
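As a runnable sketch of the comprehension above, producing both output shapes mentioned in the question (the names flat and nested are illustrative only):

```python
L = [["Ak", "154"], ["Bm", "200"], ["Ck", "250"], ["Ad", "500"], ["Ac", "600"]]

# Unpack each [key, value] pair and keep the value when the key starts with "A".
flat = [value for key, value in L if key.startswith("A")]
print(flat)  # ['154', '500', '600']

# Same filter, but wrap each value in its own one-element list.
nested = [[value] for key, value in L if key.startswith("A")]
print(nested)  # [['154'], ['500'], ['600']]
```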

Another solution, using the map() and filter() functions:
L = [["Ak","154"],["Bm","200"],["Ck","250"], ["Ad","500"],["Ac","600"]]
k = list(map(lambda y: y[1], filter(lambda x: x[0][0] == 'A', L)))
print(k)
Output:
['154', '500', '600']

Identifying common element in list in python

I have two lists. I need to check whether any element appears in both of them.
Input:
a = ['1001,1715']
b = ['1009,1715']
Output : 1715
Please suggest how to do it?
I tried doing:
set(''.join(a))
and
set(''.join(b))
but it gave me {'5', '0', '7', ',', '1'}. How can I convert ['1001,1715'] to [1001,1715]?

a = ['1001,1715']
b = ['1009,1715']
def fun(a):
    return a[0].split(",")
def intersect(a, b):
    return list(set(a) & set(b))
print(intersect(fun(a),fun(b)))

There are 2 parts to your problem.
1. Convert strings to sets of integers
Since your string is the only element of a list, you can use list indexing and str.split with map:
a_set = set(map(int, a[0].split(',')))
b_set = set(map(int, b[0].split(',')))
2. Calculate the intersection of the 2 sets
res = a_set & b_set
# alternatively, a_set.intersection(b_set)
print(res)
{1715}

You can use set intersection:
set(a[0].split(',')).intersection(set(b[0].split(',')))
Which returns:
{'1715'}
Converting from '1001,1715' to ['1001', '1715'] can simply be done with .split(',')

A more general solution if you have lists with more elements (e.g a = ['1001,1715','1009,2000'] )
a = [x for xs in a for x in xs.split(',')]
b = [x for xs in b for x in xs.split(',')]
common = set(a).intersection(set(b))
Example:
a = ['1001,1715','1009,2000']
b = ['1009,1715']
Output:
{'1009', '1715'}
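Combining the flattening step with the integer conversion from the earlier answer could look like the sketch below. The helper name split_all is made up for illustration, and the result is sorted only to make the output order predictable:

```python
def split_all(entries):
    """Split every comma-separated string in entries and convert the parts to int."""
    return [int(part) for entry in entries for part in entry.split(',')]

a = ['1001,1715', '1009,2000']
b = ['1009,1715']

# Set intersection keeps the numbers present in both lists.
common = sorted(set(split_all(a)) & set(split_all(b)))
print(common)  # [1009, 1715]
```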

Assign values to an array using two values

I am trying to generate an array that is the element-wise sum of two existing arrays, e.g.
c = [A + B for A in a and B in b]
Here I get the error message
NameError: name 'B' is not defined
where
len(a) = len(b) = len(c)
Please can you let me know what I am doing wrong. Thanks.

The boolean and operator does not wire iterables together, it evaluates the truthiness (or falsiness) of its two operands.
What you're looking for is zip:
c = [A + B for A, B in zip(a, b)]
Items from the two iterables are successively assigned to A and B until one of the two is exhausted. B is now defined!

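Syntactically, it should be
c = [A + B for A in a for B in b]
with for instead of and. Note, however, that two for clauses nest, so this adds every element of a to every element of b and yields len(a) * len(b) results rather than an element-wise sum; for an element-wise sum, use zip. You might also want to consider NumPy, where two arrays of the same shape can be added directly and more efficiently.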

'for' does not work the way you want it to work.
You could use zip().
A = [1,2,3]
B = [4,5,6]
c = [ a + b for a,b in zip(A,B)]
zip iterates through A & B and produces tuples.
To see what this looks like try:
[ x for x in zip(A,B)]
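The difference between zip and two nested for clauses is easiest to see by running both on the same input. A short sketch, with pairs, elementwise and all_pairs as illustrative names:

```python
A = [1, 2, 3]
B = [4, 5, 6]

# zip pairs items by position.
pairs = list(zip(A, B))
print(pairs)  # [(1, 4), (2, 5), (3, 6)]

# Element-wise sum: one result per position.
elementwise = [a + b for a, b in zip(A, B)]
print(elementwise)  # [5, 7, 9]

# Nested for clauses: every a combined with every b, so 3 * 3 = 9 results.
all_pairs = [a + b for a in A for b in B]
print(all_pairs)  # [5, 6, 7, 6, 7, 8, 7, 8, 9]
```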

Python find duplicates array operations

How can I form an array (c) composed of elements of b which are not in a?
a=[1,2,"ID123","ID126","ID124","ID125"]
b=[1,"ID123","ID124","ID125","343434","fffgfgf"]
c= []
Can this be done without using a list comprehension?

If the lists are long, you want to make a set of a first:
a_set = set(a)
c = [x for x in b if x not in a_set]
If the order of the elements doesn't matter, then just use sets:
c = list(set(b) - set(a))
Python lists don't offer a direct - operator, as Ruby arrays do.
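The two approaches differ in more than ordering: the set difference also collapses duplicates. A small sketch, where b_dup is a hypothetical variant of b with one repeated entry added for illustration:

```python
a = [1, 2, "ID123", "ID126", "ID124", "ID125"]
b = [1, "ID123", "ID124", "ID125", "343434", "fffgfgf"]

a_set = set(a)

# Comprehension with a set lookup: keeps the order of b.
ordered = [x for x in b if x not in a_set]
print(ordered)  # ['343434', 'fffgfgf']

# Hypothetical input with a repeated element.
b_dup = b + ["343434"]
kept = [x for x in b_dup if x not in a_set]   # the duplicate survives
unique = set(b_dup) - a_set                   # the duplicate is collapsed
print(kept)         # ['343434', 'fffgfgf', '343434']
print(len(unique))  # 2
```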

Using a list comprehension is the most straightforward:
>>> c = [i for i in b if i not in a]
>>> c
['343434', 'fffgfgf']
However, if you really do not want to use a list comprehension, you could use a generator expression:
c = (i for i in b if i not in a)
This will also not generate the result list all at once in memory (in case that would be a concern).
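One thing to keep in mind with the generator version is that it is lazy and can be consumed only once. A brief sketch of that behaviour, with first_pass and second_pass as illustrative names:

```python
a = [1, 2, "ID123", "ID126", "ID124", "ID125"]
b = [1, "ID123", "ID124", "ID125", "343434", "fffgfgf"]

# Nothing is computed yet; items are produced on demand.
c = (i for i in b if i not in a)

first_pass = list(c)   # consumes the generator
second_pass = list(c)  # the generator is now exhausted

print(first_pass)   # ['343434', 'fffgfgf']
print(second_pass)  # []
```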

The following will do it:
c = [v for v in b if v not in a]
If a is long, it might improve performance to turn it into a set:
a_set = set(a)
c = [v for v in b if v not in a_set]

Cross-list comprehension in Python

Let's say I have two lists of strings:
a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
where #### represents 4-digit random number. And
b = ['boo', 'aaa', 'bii']
I need to know which entries in list a contain any of the entries in b. I was able to accomplish this with a couple of nested loops, using the in operator to check whether the string contains the current entry of b. But, being relatively new to Python, I'm almost positive this was not the most pythonic or elegant way to write it. Is there an idiom that would shorten my solution?

The following code gives you an array with the indexes of a where the part after the slash is an element from b.
a_sep = [x.split('/')[1] for x in a]
idxs = [i for i, x in enumerate(a_sep) if x in b]
To improve performance, make b a set instead of a list.
Demo:
>>> a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
>>> b = ['boo', 'aaa', 'bii']
>>> a_sep = [x.split('/')[1] for x in a]
>>> idxs = [i for i, x in enumerate(a_sep) if x in b]
>>> idxs
[0, 3]
>>> [a[i] for i in idxs]
['####/boo', '####/bii']
If you prefer to get the elements directly instead of the indexes:
>>> a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
>>> b = ['boo', 'aaa', 'bii']
>>> [x for x in a if x.split('/')[1] in b]
['####/boo', '####/bii']

ThiefMaster's answer is good, and mine will be quite similar, but if you don't need to know the indexes, you can take a shortcut:
>>> a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
>>> b = ['boo', 'aaa', 'bii']
>>> [x for x in a if x.split('/')[1] in b]
['####/boo', '####/bii']
Again, if b is a set, that will improve performance for large numbers of elements.

import random
a=[str(random.randint(1000,9999))+'/'+e for e in ['boo','baa','bee','bii','buu']]
b = ['boo', 'aaa', 'bii']
c=[x.split('/')[-1] for x in a if x.split('/')[-1] in b]
print(c)
prints:
['boo', 'bii']
Or, if you want the entire entry:
print([x for x in a if x.split('/')[-1] in b])
prints:
['3768/boo', '9110/bii']

>>> [i for i in a for j in b if j in i]
['####/boo', '####/bii']
This should do what you want, elegant and pythonic.
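One caveat with the nested-for form: an entry of a is emitted once per matching entry of b, so overlapping search terms produce duplicates. The sketch below uses a hypothetical b chosen to trigger that case and compares it with any():

```python
a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
b = ['boo', 'oo']  # hypothetical: both terms occur in '####/boo'

# Nested for clauses: one output per (i, j) pair that matches.
nested = [i for i in a for j in b if j in i]
print(nested)  # ['####/boo', '####/boo']

# any() stops at the first match, so each entry appears at most once.
with_any = [i for i in a if any(j in i for j in b)]
print(with_any)  # ['####/boo']
```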

As other answers have indicated, you can use set operations to make this faster. Here's a way to do this:
>>> a_dict = dict((item.split('/')[1], item) for item in a)
>>> common = set(a_dict) & set(b)
>>> [a_dict[i] for i in common]
['####/boo', '####/bii']
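Because sets are unordered, the order of the list built from common is not guaranteed. If the original order of a matters, a variation like the following sketch keeps it by iterating over the dict (insertion-ordered in Python 3.7+) and using the set only for membership tests; ordered and b_set are illustrative names:

```python
a = ['####/boo', '####/baa', '####/bee', '####/bii', '####/buu']
b = ['boo', 'aaa', 'bii']

# Map the part after the slash to the full entry.
a_dict = {item.split('/')[1]: item for item in a}
b_set = set(b)

# Walk the dict in insertion order and keep the entries whose key is in b.
ordered = [item for key, item in a_dict.items() if key in b_set]
print(ordered)  # ['####/boo', '####/bii']
```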
