So I'm trying to implement the agglomerative clustering algorithm, and to check the distances between clusters I use this:
a, b = None, None
c = max
for i in range(len(map)-1):
    for n in range(len(map[i])):
        for j in range(i+1, len(map)):
            for m in range(len(map[j])):
                # dist is distance func.
                d = dist(map[i][n], map[j][m])
                if c > d:
                    a, b, c = i, j, d
print(a, ' ', b)
return a, b
map looks like this: { 0: [[1,2,3], [2,2,2]], 1: [[3,3,3]], 2: [[4,4,4], [5,5,5]] }
What I expect from this is for each item of a row to be compared with every item of every other row. So something like this:
comparisons:
[1,2,3] and [3,3,3], [1,2,3] and [4,4,4], [1,2,3] and [5,5,5], [2,2,2] and [3,3,3] and so on
When I run this it works only once and fails on every subsequent run at line 6 with a KeyError.
I suspect that the problem is either here or in merging clusters.
If map is a dict of values, you have a general problem with your indexing:
for m in range(len(map[j])):
You use range() to create numerical indices. However, what you need j to be in this example is a valid key of the dictionary map.
EDIT:
That is - of course - assuming that you did not use 0-based incrementing integers as the keys of map, in which case you might as well have gone with a list. In general you seem to be relying on the ordering provided by a list or OrderedDict (or dict in Python 3.6+ as an implementation detail); for j in range(i+1, len(map)): is a good example. Therefore I would advise using a list.
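A minimal sketch of that list-based layout (hypothetical variable names, not the asker's exact code) - since the clusters live in a plain list, the range-based indices stay valid as long as the list is re-packed after each merge:
clusters = [[[1, 2, 3], [2, 2, 2]], [[3, 3, 3]], [[4, 4, 4], [5, 5, 5]]]
for i in range(len(clusters) - 1):
    for j in range(i + 1, len(clusters)):
        for p in clusters[i]:
            for q in clusters[j]:
                ...  # d = dist(p, q), track the closest pair as before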
EDIT 2: Alternatively, create a list of the map.keys() and use it to index the map:
a, b = None, None
c = max
keys = list(map.keys())
for i in range(len(map)-1):
    for n in range(len(map[keys[i]])):
        for j in range(i+1, len(map)):
            for m in range(len(map[keys[j]])):
                # dist is distance func.
                d = dist(map[keys[i]][n], map[keys[j]][m])
                if c > d:
                    a, b, c = i, j, d
print(a, ' ', b)
return a, b
Before accessing map[j], check whether it is a valid key:
if j in map.keys():
    # whatever
or put it in a try/except:
try:
    # ...
except KeyError:
    # ...
Edit:
it's better to use a for loop like this:
for i in map.keys():
    # .....
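For instance, a minimal sketch of that approach (hypothetical names; dist stands for the asker's distance function), looping over the dict's keys directly instead of over range():
from itertools import combinations

def closest_pair(clusters, dist):
    best_a, best_b, best_d = None, None, float('inf')
    for i, j in combinations(clusters, 2):  # i and j are dict keys
        for p in clusters[i]:
            for q in clusters[j]:
                d = dist(p, q)
                if d < best_d:
                    best_a, best_b, best_d = i, j, d
    return best_a, best_b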
If you look at https://en.wikipedia.org/wiki/Clique_problem, you'll notice there is a distinction between cliques and maximal cliques. A maximal clique is contained in no other clique but itself. So I want those cliques, but networkx seems to only provide:
networkx.algorithms.clique.enumerate_all_cliques(G)
So I tried a simple for loop filtering mechanism (see below).
def filter_cliques(self, cliques):
    # TODO: why do we need this? Post in forum...
    res = []
    for C in cliques:
        C = set(C)
        for D in res:
            if C.issuperset(D) and len(C) != len(D):
                res.remove(D)
                res.append(C)
                break
            elif D.issuperset(C):
                break
        else:
            res.append(C)
    res1 = []
    for C in res:
        for D in res1:
            if C.issuperset(D) and len(C) != len(D):
                res1.remove(D)
                res1.append(C)
            elif D.issuperset(C):
                break
        else:
            res1.append(C)
    return res1
I want to filter out all the proper subcliques, but as you can see this is ugly because I had to filter twice; it's not very elegant. So the problem is: given a list of lists of objects (integers, strings) that were the node labels in the graph - which is exactly what enumerate_all_cliques(G) returns - filter out all proper subcliques. For instance:
[[a, b, c], [a, b], [b, c, d]] => [[a, b, c], [b, c, d]]
What's the quickest pythonic way of doing that?
There's a function for that: networkx.algorithms.clique.find_cliques, and yes, it does return only maximal cliques, despite the absence of "maximal" from the name. It should run a lot faster than any filtering approach.
If you find the name confusing (I do), you can rename it:
from networkx.algorithms.clique import find_cliques as maximal_cliques
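For example, on a small hypothetical graph (not from the question), it returns only the maximal cliques:
import networkx as nx

G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("b", "d"), ("c", "d")])
print(list(nx.find_cliques(G)))
# e.g. [['a', 'b', 'c'], ['b', 'c', 'd']] - ordering of cliques and of nodes may vary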
I have 3 very large lists of strings, for visualization purposes consider:
A = ['one','four', 'nine']
B = ['three','four','six','five']
C = ['four','five','one','eleven']
How can I calculate the difference between these lists so that I get only the elements that do not appear in any of the other lists? For example:
A = ['nine']
B = ['three','six']
C = ['eleven']
Method 1
You can arbitrarily add more lists just by changing the first line, e.g. my_lists = (A, B, C, D, E).
my_lists = (A, B, C)
my_sets = {n: set(my_list) for n, my_list in enumerate(my_lists)}
my_unique_lists = tuple(
    list(my_sets[n].difference(*(my_sets[i] for i in range(len(my_sets)) if i != n)))
    for n in range(len(my_sets)))
>>> my_unique_lists
(['nine'], ['six', 'three'], ['eleven'])
my_sets uses a dictionary comprehension to create a set for each of the lists. The dictionary key is the list's position in my_lists.
Each set is then differenced with all other sets in the dictionary (barring itself) and then converted back to a list.
The ordering of my_unique_lists corresponds to the ordering in my_lists.
Method 2
You can use Counter to get all unique items (i.e. those that only appear in just one list and not the others), and then use a list comprehension to iterate through each list and select those that are unique.
from collections import Counter
c = Counter([item for my_list in my_lists for item in set(my_list)])
unique_items = tuple(item for item, count in c.items() if count == 1)
>>> tuple([item for item in my_list if item in unique_items] for my_list in my_lists)
(['nine'], ['three', 'six'], ['eleven'])
With sets:
convert all lists to sets
take the differences
convert back to lists
A, B, C = map(set, (A, B, C))
a = A - B - C
b = B - A - C
c = C - A - B
A, B, C = map(list, (a, b, c))
The (possible) problem with this is that the final lists are no longer ordered, e.g.
>>> A
['nine']
>>> B
['six', 'three']
>>> C
['eleven']
This could be fixed by sorting by the original indices, but then the time complexity will dramatically increase, so the benefit of using sets is almost entirely lost.
With list-comps (for-loops):
convert lists to sets
use list-comps to filter out elements from the original lists that are not in the other sets
sA, sB, sC = map(set, (A, B, C))
A = [e for e in A if e not in sB and e not in sC]
B = [e for e in B if e not in sA and e not in sC]
C = [e for e in C if e not in sA and e not in sB]
which then produces a result that maintains the original order of the lists:
>>> A
['nine']
>>> B
['three', 'six']
>>> C
['eleven']
Summary:
In conclusion, if you don't care about the order of the result, convert the lists to sets and then take their differences (and don't bother converting back to lists). However, if you do care about order, still convert the lists to sets (hash tables), since lookups during filtering will then be faster (average O(1) vs O(n) for lists).
You can also iterate through all the lists' elements, adding the current element to a set if it's not already there, and removing it from its list if it is. This way you use up to O(n) additional space and O(n) time, but the elements remain in order.
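A rough sketch of that idea (building new lists instead of removing in place, and assuming the elements are hashable):
seen, dupes = set(), set()
for lst in (A, B, C):
    for e in set(lst):  # set() so repeats inside one list don't count as duplicates
        (dupes if e in seen else seen).add(e)
# keep only the elements seen in exactly one list, preserving order
A, B, C = ([e for e in lst if e not in dupes] for lst in (A, B, C))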
You can also use a function defined specifically to compute the difference between three lists. Here's an example of such a function:
def three_list_difference(l1, l2, l3):
    lst = []
    for i in l1:
        if not (i in l2 or i in l3):
            lst.append(i)
    return lst
The function three_list_difference takes three lists and checks whether each element of the first list l1 is also in either l2 or l3. The difference can be determined by simply calling the function in the right configuration:
three_list_difference(A, B, C)
three_list_difference(B, A, C)
three_list_difference(C, B, A)
with outputs:
['nine']
['three', 'six']
['eleven']
Using a function is advantageous because the code is reusable.
I am trying to generate an array that is the sum of two previous arrays, e.g.
c = [A + B for A in a and B in b]
Here, I get the error message
NameError: name 'B' is not defined
where
len(a) = len(b) = len(c)
Please can you let me know what I am doing wrong. Thanks.
The boolean and operator does not wire iterables together, it evaluates the truthiness (or falsiness) of its two operands.
What you're looking for is zip:
c = [A + B for A, B in zip(a, b)]
Items from the two iterables are successively assigned to A and B until one of the two is exhausted. B is now defined!
It should be
c = [A + B for A in a for B in b]
for instead of and. You might want to consider using numpy, where you can add two arrays directly, which is also more efficient.
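For example, a quick numpy sketch (assuming a and b are equal-length numeric lists):
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]
c = (np.array(a) + np.array(b)).tolist()  # element-wise sum: [5, 7, 9]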
'for' does not work the way you want it to work.
You could use zip().
A = [1,2,3]
B = [4,5,6]
c = [ a + b for a,b in zip(A,B)]
zip iterates through A & B and produces tuples.
To see what this looks like try:
[ x for x in zip(A,B)]
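which, for the A and B above, gives [(1, 4), (2, 5), (3, 6)].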
I need to store in a list the indexes of those values in 3 lists which exceed a given maximum limit. This is what I got:
# Data lists.
a = [3,4,5,12,6,8,78,5,6]
b = [6,4,1,2,8,784,43,6,2]
c = [8,4,32,6,1,7,2,9,23]
# Maximum limit.
max_limit = 20.
# Store indexes in list.
indexes = []
for i, a_elem in enumerate(a):
    if a_elem > max_limit or b[i] > max_limit or c[i] > max_limit:
        indexes.append(i)
This works but I find it quite ugly. How can I make it more elegant/pythonic?
You could replace your for loop with:
indexes = []
for i, triplet in enumerate(zip(a, b, c)):
    if any(e > max_limit for e in triplet):
        indexes.append(i)
... which you could then reduce to a list comprehension:
indexes = [i for i, t in enumerate(zip(a, b, c)) if any(e > max_limit for e in t)]
... although that seems a little unwieldy to me - this is really about personal taste, but I prefer to keep listcomps simple; the three-line for loop is clearer in my opinion.
As pointed out by user2357112, you can reduce the apparent complexity of the list comprehension with max():
indexes = [i for i, t in enumerate(zip(a, b, c)) if max(t) > max_limit]
... although this won't short-circuit in the same way that the any() version (and your own code) does, so will probably be slightly slower.
You could try
if max(a_elem, b[i], c[i]) > max_limit:
    indexes.append(i)
The logic here is finding out whether any one of these three values is greater than max_limit. If the greatest of the three exceeds max_limit, your condition is satisfied.
I like the exceeders = line best myself
import collections
# Data lists.
a = [3,4,5,12,6,8,78,5,6]
b = [6,4,1,2,8,784,43,6,2]
c = [8,4,32,6,1,7,2,9,23]
Triad = collections.namedtuple('Triad', 'a b c')
triads = [Triad(*args) for args in zip(a, b, c)]
triads = [t for t in zip(a, b, c)] # if you don't need namedtuple
# Maximum limit.
max_limit = 20.
# Store indexes in list.
indexes = [i for i, t in enumerate(triads) if max(t) > max_limit]
print(indexes)
# store the bad triads themselves in a list for
# a more Pythonic approach
exceeders = [t for t in triads if max(t) > max_limit]
print(exceeders)
As I commented above, using parallel arrays to represent data that are related makes simple code much less simple than it need be.
added in response to comment
Perhaps I gave you too many alternatives, so I shall give only one way instead. One feature that all of the answers have in common is that they fuse the separate "data lists" into rows using zip:
triads = [t for t in zip(a, b, c)]
exceeders = [t for t in triads if max(t) > max_limit]
That's it: two lines. The important point is that storing the index of anything in a list is a C-style way of doing things and you asked for a Pythonic way. Keeping a list of indices means that anytime you want to do something with the data at that index, you have to do an indirection. After those two lines execute, exceeders has the value:
[(5, 1, 32), (8, 784, 7), (78, 43, 2), (6, 2, 23)]
Each member of the list is the "column" of your three data rows that was found to exceed your limit.
Now you might say "but I really wanted the indices instead". If that is so, there is another part of your problem which you didn't show us that also relies on list indexing. If so, you are still doing things in a C/C++/Java way and "Pythonic" will remain elusive.
>>> maximums = map(max, zip(a, b, c))
>>> [i for i, num in enumerate(maximums) if num > max_limit]
[2, 5, 6, 8]
Old answer
Previously, I posted the mess below. The list comp above is much more manageable.
>>> next(zip(*filter(lambda i: i[1] > max_limit, enumerate(map(max, zip(a, b, c))))))
(2, 5, 6, 8)
m = lambda l: [i for i, e in enumerate(l) if e>max_limit]
indexes = sorted(set(m(a) + m(b) + m(c)))
How can I form an array (c) composed of elements of b which are not in a?
a=[1,2,"ID123","ID126","ID124","ID125"]
b=[1,"ID123","ID124","ID125","343434","fffgfgf"]
c= []
Can this be done without using a list comprehension?
If the lists are long, you want to make a set of a first:
a_set = set(a)
c = [x for x in b if x not in a_set]
If the order of the elements doesn't matter, then just use sets:
c = list(set(b) - set(a))
Python lists don't offer a direct - operator, as Ruby arrays do.
Using a list comprehension is the most straightforward:
>>> c = [i for i in b if i not in a]
>>> c
['343434', 'fffgfgf']
However, if you really do not want to use a list comprehension, you could use a generator expression:
c = (i for i in b if i not in a)
This will also not generate the result list all at once in memory (in case that would be a concern).
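If you want to avoid comprehension syntax entirely, a small sketch using filter (assuming a and b as above) also works:
a_set = set(a)  # set lookup keeps the membership test fast
c = list(filter(lambda x: x not in a_set, b))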
The following will do it:
c = [v for v in b if v not in a]
If a is long, it might improve performance to turn it into a set:
a_set = set(a)
c = [v for v in b if v not in a_set]