List comprehension using two lists - Python

How can I create a list C from two lists A and B using a list comprehension, where C contains an item from A only when the corresponding item in B is True? My for loop implementation is here:
A = ["ID", "population", "median_age"]
B = [False, False, True]
C = []
for x in range(len(A)):
    if B[x] == True:
        C.append(A[x])

You could do something like this:
C = [a for a, b in zip(A, B) if b]
Doing something like for i in range(len(lst)) is rarely idiomatic in Python; you would usually prefer for i, value in enumerate(lst). In this case, though, zip is both safer and more idiomatic: if A and B have different lengths, it simply stops at the shorter one instead of raising an IndexError.
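As a small sketch beyond the original answer, the standard library's itertools.compress performs exactly this selection, and like zip it stops at the shorter input:

```python
from itertools import compress

A = ["ID", "population", "median_age"]
B = [False, False, True]

# Comprehension over zip: keep a when its paired flag b is truthy
C_zip = [a for a, b in zip(A, B) if b]

# compress(data, selectors) yields the items of data whose selector is truthy
C_compress = list(compress(A, B))

print(C_zip)       # ['median_age']
print(C_compress)  # ['median_age']

# With mismatched lengths, both stop silently at the shorter input
short = list(compress(A, [True]))
print(short)  # ['ID']
```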

C = [ A[x] for x in range(len(A)) if B[x] ]


How to compare two lists to keep matching substrings?

As best I can describe it, I have two lists of strings and I want to return all results from list A that contain any of the strings in list B. Here are details:
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']
How do I return
C = ['dataFile2000', 'dataFile2001']?
I've been looking into list comprehensions, doing something like below
C=[x for x in A if B in A]
but I can't seem to make it work. Am I on the right track?
You were close. Use any():
C=[x for x in A if any(b in x for b in B)]
More detailed:
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']
C = [x for x in A if any(b in x for b in B)]
print(C)
Output
['dataFile2000', 'dataFile2001']
You can use any() to check if any element of your list B is in x:
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']
c = [x for x in A if any(k in x for k in B)]
print(c)
Output:
['dataFile2000', 'dataFile2001']
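Another option, sketched here as an alternative that the answers above do not use, is to build a single regular expression that matches any string in B. re.escape keeps characters with special regex meaning in B literal:

```python
import re

A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']

# One pattern of the form "2000|2001"; re.escape makes each needle literal
pattern = re.compile("|".join(map(re.escape, B)))

# pattern.search looks for a match anywhere in the string, like `b in x`
C = [x for x in A if pattern.search(x)]
print(C)  # ['dataFile2000', 'dataFile2001']
```

One difference to watch: if B is empty, the empty pattern matches every string, whereas any(...) returns False, so guard that case if it can occur.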
First, I would construct a set of the years for O(1) average lookup time.(1)
>>> A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
>>> B = ['2000', '2001']
>>>
>>> years = set(B)
Now, keep only the elements of A that end with an element of years.
>>> [file for file in A if file[-4:] in years]
['dataFile2000', 'dataFile2001']
(1) If you have very small lists (two elements certainly qualify), keep the lists. Sets have O(1) lookup, but hashing still introduces overhead.
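If, as the set-based answer assumes, the years always sit at the end of the name, str.endswith also accepts a tuple of suffixes, which avoids hard-coding the slice length [-4:]. A minimal sketch:

```python
A = ['dataFile1999', 'dataFile2000', 'dataFile2001', 'dataFile2002']
B = ['2000', '2001']

# str.endswith takes a str or a tuple of str, so convert the list once
suffixes = tuple(B)
C = [name for name in A if name.endswith(suffixes)]
print(C)  # ['dataFile2000', 'dataFile2001']
```

This checks each suffix in turn rather than doing a hash lookup, which is fine for a short B; the slice-plus-set version scales better when B is large and every suffix has the same length.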

How would I change this one line for loop to normal for loop?

This is a general question that I was not able to understand.
If I have this:
somelist = [[a for a, b in zip(X, y) if b == c] for c in np.unique(y)]
How can I write this as normal multiline for loop? I never seem to get it right.
EDIT: So far I've tried this:
somelist = []
for c in np.unique(y):
    for x, t in zip(X, y):
        if t == c:
            separated.append(x)
But I wasn't sure if this was right because I wasn't getting an expected result in some other part of my code.
Let me know if this works:
Evaluate the outer for clause first: it becomes the outer loop. Then evaluate the inner list comprehension, which becomes the inner loop and builds one list per value of c.
somelist = []
for c in np.unique(y):
    ans = []
    for a, b in zip(X, y):
        if b == c:
            ans.append(a)
    somelist.append(ans)
To flatten a nested comprehension, follow these steps:
First, create an empty container: somelist = []
If the comprehension has an if clause, put it inside the loop, right after the corresponding for.
Then, flatten the nested comprehensions, starting with the innermost.
The inner comprehension is:
row = []
for a, b in zip(X, y):
    if b == c:
        row.append(a)
Then, somelist is nothing more than [row for c in np.unique(y)], where row is the inner list built for the current value of c.
This one is equivalent to:
somelist = []
for c in np.unique(y):
    somelist.append(row)
So the complete version is:
somelist = []
for c in np.unique(y):
    row = []
    for a, b in zip(X, y):
        if b == c:
            row.append(a)
    somelist.append(row)
This is how it looks using a "normal" for loop (a.k.a. without a list comprehension):
somelist = []
for c in np.unique(y):
    l = []
    for a, b in zip(X, y):
        if b == c:
            l.append(a)
    somelist.append(l)
You were very close. The problem with your approach is that you forgot an important point: the result of the list comprehension is a list of lists. Thus, the values computed in the inner loop need to be held in a temporary list, which is then appended to the "main" list somelist to create a list of lists:
somelist = []
for c in np.unique(y):
    # create a temporary list that will hold the values computed in the
    # inner loop.
    sublist = []
    for x, t in zip(X, y):
        if t == c:
            sublist.append(x)
    # after the list has been computed, add the temporary list to the main
    # list `somelist`. That way, a list of lists is created.
    somelist.append(sublist)
The general rule of thumb when converting a list comprehension to a vanilla for loop is that for each level of nesting, you'll need another nested for loop and another temporary list to hold the values computed in the nested loop.
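One way to check a hand-expanded loop is to run it next to the original comprehension on small inputs and compare. This sketch assumes plain Python lists and uses sorted(set(y)) as a stand-in for np.unique(y) (both give the sorted unique values), so it runs without NumPy; X and y are made-up sample data:

```python
# Hypothetical sample data: X holds items, y holds their class labels
X = ["a", "b", "c", "d", "e"]
y = [1, 0, 1, 2, 0]
labels = sorted(set(y))  # stand-in for np.unique(y)

# Original nested comprehension: one inner list per label
comp = [[a for a, b in zip(X, y) if b == c] for c in labels]

# Expanded version: one temporary list per level of nesting
loop = []
for c in labels:
    sublist = []
    for a, b in zip(X, y):
        if b == c:
            sublist.append(a)
    loop.append(sublist)

print(comp)          # [['b', 'e'], ['a', 'c'], ['d']]
print(comp == loop)  # True
```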
As a caveat, once you get past 2-3 levels of nesting in a comprehension, you should seriously consider converting it to a normal for loop. Whatever efficiency you gain is offset by the reduced readability of the nested comprehension. Remember Donald Knuth's advice: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
After offering the obvious caveat that, for performance and Pythonic reasons, you should not expand your list comprehension into a multi-line loop, you would write it from the outside in:
somelist = []
for c in np.unique(y):
    inner_list = []
    for a, b in zip(X, y):
        if b == c:
            inner_list.append(a)
    somelist.append(inner_list)
And now you see the beauty of list comprehensions.
somelist = []
for c in np.unique(y):
    somelist.append([a for a, b in zip(X, y) if b == c])
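Both forms scan all of zip(X, y) once per unique label. As an alternative sketch (a swapped-in technique, not one used in the answers above), collections.defaultdict can group the items in a single pass; the sample data is hypothetical, and the result is ordered by sorted label to match np.unique:

```python
from collections import defaultdict

# Hypothetical sample data
X = ["a", "b", "c", "d", "e"]
y = [1, 0, 1, 2, 0]

groups = defaultdict(list)
for item, label in zip(X, y):
    groups[label].append(item)  # a single pass over the data

# Order the groups by sorted label, as np.unique(y) would
somelist = [groups[label] for label in sorted(groups)]
print(somelist)  # [['b', 'e'], ['a', 'c'], ['d']]
```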

Fixing a 4 nested for loops in Python

So I'm trying to implement the agglomerative clustering algorithm, and to check the distances between each pair of clusters I use this:
a, b = None, None
c = max
for i in range(len(map)-1):
    for n in range(len(map[i])):
        for j in range(i+1, len(map)):
            for m in range(len(map[j])):
                # dist is the distance function
                d = dist(map[i][n], map[j][m])
                if c > d:
                    a, b, c = i, j, d
print(a, ' ', b)
return a, b
map looks like this: { 0: [[1,2,3], [2,2,2]], 1: [[3,3,3]], 2: [[4,4,4], [5,5,5]] }
What I expect is for each row item to be compared with every item of every other row. So something like this:
comparisons:
[1,2,3] and [3,3,3], [1,2,3] and [4,4,4], [1,2,3] and [5,5,5], [2,2,2] and [3,3,3] and so on
When I run this, it works only once and fails on every subsequent try at line 6 with a KeyError.
I suspect that the problem is either here or in merging clusters.
If map is a dict of values, you have a general problem with your indexing:
for m in range(len(map[j])):
You use range() to create numerical indices. However, what you need j to be in this example is a valid key of the dictionary map.
EDIT:
That is, of course, assuming that you did not use 0-based incremented integers as the keys of map, in which case you might as well have gone with a list. In general, you seem to be relying on the ordering provided by a list or an OrderedDict (or a plain dict, whose insertion order is a CPython implementation detail in 3.6 and a language guarantee from Python 3.7). See for j in range(i+1, len(map)): as a good example. Therefore, I would advise using a list.
EDIT 2: Alternatively, create a list of the map.keys() and use it to index the map:
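a, b = None, None
c = float('inf')  # start at infinity; comparing a number with the builtin max raises TypeError in Python 3
keys = list(map.keys())
for i in range(len(map)-1):
    for n in range(len(map[keys[i]])):
        for j in range(i+1, len(map)):
            for m in range(len(map[keys[j]])):
                # dist is the distance function
                d = dist(map[keys[i]][n], map[keys[j]][m])
                if c > d:
                    # store the dict keys, not list positions, so the caller can index map with them
                    a, b, c = keys[i], keys[j], d
print(a, ' ', b)
return a, b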
Before accessing map[j], check whether j is a valid key, like this:
if j in map.keys():
    ...  # whatever
or put it in a try/except:
try:
    ...  # access map[j] here
except KeyError:
    ...  # handle the missing key
Edit:
It's better to use a for loop like this:
for i in map.keys():
    ...  # .....
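The four nested index loops can also be replaced by iterating over the dictionary's keys directly, which avoids the KeyError entirely. A minimal sketch, assuming clusters is a dict mapping a cluster key to a list of points, and using Euclidean math.dist (Python 3.8+) as a stand-in for the question's dist; the function name closest_clusters is hypothetical, and the parameter is renamed from map so it does not shadow the builtin:

```python
from itertools import combinations, product
from math import dist, inf


def closest_clusters(clusters, distance=dist):
    """Hypothetical helper: return ((key_a, key_b), d) for the two clusters
    whose closest members are nearest to each other (single linkage)."""
    best_pair, best_d = None, inf
    # Each unordered pair of keys once, like j in range(i + 1, len(map))
    for key_a, key_b in combinations(clusters, 2):
        # product replaces the two inner loops over the points
        d = min(distance(p, q) for p, q in product(clusters[key_a], clusters[key_b]))
        if d < best_d:  # strict, so ties keep the first pair found
            best_pair, best_d = (key_a, key_b), d
    return best_pair, best_d


clusters = {0: [[1, 2, 3], [2, 2, 2]], 1: [[3, 3, 3]], 2: [[4, 4, 4], [5, 5, 5]]}
print(closest_clusters(clusters))
```

Because the loop works with keys rather than positions, it keeps working after clusters are merged and some integer keys disappear.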

Assign values to an array using two values

I am trying to generate an array that is the sum of two previous arrays, e.g.:
c = [A + B for A in a and B in b]
Here, I get the error message
NameError: name 'B' is not defined
where
len(a) = len(b) = len(c)
Please can you let me know what I am doing wrong. Thanks.
The boolean and operator does not wire iterables together, it evaluates the truthiness (or falsiness) of its two operands.
What you're looking for is zip:
c = [A + B for A, B in zip(a, b)]
Items from the two iterables are successively assigned to A and B until one of the two is exhausted. B is now defined!
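Replacing and with a second for makes the comprehension valid:
c = [A + B for A in a for B in b]
Note, however, that this adds every element of a to every element of b, so len(c) == len(a) * len(b) rather than len(a); for element-wise sums, pair the items with zip instead. You might also want to consider numpy, where you can add two arrays directly and more efficiently.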
'for' does not work the way you want it to work.
You could use zip().
A = [1,2,3]
B = [4,5,6]
c = [ a + b for a,b in zip(A,B)]
zip iterates through A & B and produces tuples.
To see what this looks like try:
[ x for x in zip(A,B)]
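Two related points, sketched with small made-up lists: map with operator.add is a loop-free equivalent of the zip comprehension, and a second for clause pairs every element of a with every element of b instead of matching them by position.

```python
import operator

a = [1, 2, 3]
b = [4, 5, 6]

# Element-wise: map calls operator.add(a[i], b[i]) for each position
pairwise = list(map(operator.add, a, b))
print(pairwise)  # [5, 7, 9]

# Two for clauses give the Cartesian product: len(a) * len(b) results
every_pair = [x + y for x in a for y in b]
print(len(every_pair))  # 9
```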

Python find duplicates array operations

How can I form an array (c) composed of elements of b which are not in a?
a=[1,2,"ID123","ID126","ID124","ID125"]
b=[1,"ID123","ID124","ID125","343434","fffgfgf"]
c= []
Can this be done without using a list comprehension?
If the lists are long, you want to make a set of a first:
a_set = set(a)
c = [x for x in b if x not in a_set]
If the order of the elements doesn't matter (and duplicates can be dropped), just use sets:
c = list(set(b) - set(a))
Python lists don't offer a direct - operator, as Ruby arrays do.
Using a list comprehension is most straightforward:
>>> c = [i for i in b if i not in a]
>>> c
['343434', 'fffgfgf']
However, if you really did not want to use a list comprehension, you could use a generator expression:
c = (i for i in b if i not in a)
This will also not generate the result list all at once in memory (in case that would be a concern).
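One thing to keep in mind with the generator version: it is lazy and can be consumed only once, so a second pass sees nothing. A small sketch with the question's lists:

```python
a = [1, 2, "ID123", "ID126", "ID124", "ID125"]
b = [1, "ID123", "ID124", "ID125", "343434", "fffgfgf"]

c = (i for i in b if i not in a)  # nothing is computed yet

first = list(c)   # consumes the generator
second = list(c)  # already exhausted
print(first)   # ['343434', 'fffgfgf']
print(second)  # []
```

Wrap it in list(...) once if you need to iterate over the result more than once.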
The following will do it:
c = [v for v in b if v not in a]
If a is long, it might improve performance to turn it into a set:
a_set = set(a)
c = [v for v in b if v not in a_set]
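The question also asks whether this can be done without a list comprehension. One way, sketched here with the standard library's itertools.filterfalse, keeps the items of b for which membership in a_set is false, preserving the order of b:

```python
from itertools import filterfalse

a = [1, 2, "ID123", "ID126", "ID124", "ID125"]
b = [1, "ID123", "ID124", "ID125", "343434", "fffgfgf"]

a_set = set(a)
# filterfalse(pred, iterable) yields the items for which pred(item) is falsy
c = list(filterfalse(a_set.__contains__, b))
print(c)  # ['343434', 'fffgfgf']
```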
