How to get the variable names of pairwise combinations? - python

I have multiple pairwise comparisons to do in order to find common items between lists. My code below works, but I would like to keep track of the names of the lists, which are geno1, geno2 and geno3. I'm not sure how to get the combinations to refer to the variable names instead of the arrays. Although there are related questions on Stack Overflow, such as Getting method parameter names, I'm hoping for a fairly easy solution. Thank you in advance for your time.
import itertools #to get all pairwise combinations
geno1 = [1,2,3]
geno2 = [2,5]
geno3 = [1,2,4,5]
genotypes = [geno1,geno2,geno3]
combinations = list(itertools.combinations(genotypes,2))
for pair in combinations:
    commonItem = [x for x in pair[0] if x in pair[1]]
    print(f'{len(commonItem)} common items between {pair}')  # Here, instead of pair, I want to know which pair of genotypes it is, such as geno1 geno2 or geno1 geno3.
    print(commonItem)
    print()

Create a dictionary where the keys are the names of the lists and the values are the lists that you originally have. You could do something using locals() if you didn't want to write out the names of the lists as strings, but it's pretty hacky and I wouldn't recommend it:
import itertools
geno1 = [1,2,3]
geno2 = [2,5]
geno3 = [1,2,4,5]
genotypes = {"geno1": geno1, "geno2": geno2, "geno3": geno3}
combinations = list(itertools.combinations(genotypes.items(),2))
for (fst_name, fst_lst), (snd_name, snd_lst) in combinations:
    commonItem = [x for x in fst_lst if x in snd_lst]
    print(f'{len(commonItem)} common items between {fst_name} and {snd_name}')
    print(commonItem)
    print()
Output:
1 common items between geno1 and geno2
[2]
2 common items between geno1 and geno3
[1, 2]
2 common items between geno2 and geno3
[2, 5]
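For illustration only, the locals() approach mentioned above could look roughly like the sketch below. It assumes the lists are defined inside a function (the hypothetical collect_genotypes) and that every variable of interest starts with the prefix geno. Both are assumptions made for this example. The hack stays fragile, because any other local variable with that prefix would be picked up too.

```python
import itertools


def collect_genotypes():
    """Hypothetical helper: gather local lists whose names start with 'geno'."""
    geno1 = [1, 2, 3]
    geno2 = [2, 5]
    geno3 = [1, 2, 4, 5]
    # Snapshot the local namespace before creating any further variables.
    scope = dict(locals())
    # Keep only the names with the assumed 'geno' prefix, in sorted order.
    return {name: value for name, value in sorted(scope.items())
            if name.startswith("geno")}


genotypes = collect_genotypes()
for (name_a, list_a), (name_b, list_b) in itertools.combinations(genotypes.items(), 2):
    common = [x for x in list_a if x in list_b]
    print(f"{len(common)} common items between {name_a} and {name_b}: {common}")
```

Writing the dictionary out explicitly, as in the answer above, is usually easier to read and does not depend on a naming convention.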

You're probably best off putting it all into a dictionary. Then you can get the combinations of the names first, and look at the actual lists only inside the loop:
import itertools
genotypes = {
    'geno1': [1,2,3],
    'geno2': [2,5],
    'geno3': [1,2,4,5],
}
combinations = list(itertools.combinations(genotypes,2))
for left_name, right_name in combinations:
    left_geno = genotypes[left_name]
    right_geno = genotypes[right_name]
    commonItem = [x for x in left_geno if x in right_geno]
    print(f'{len(commonItem)} common items between {left_name} and {right_name}: {commonItem}\n')
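If the items are hashable and neither their order nor duplicates within a list matter, the membership test in the list comprehension could be swapped for a set intersection. The sketch below is one possible variant under those assumptions, not part of the original answer. The names geno_sets and common_by_pair are made up for the example.

```python
import itertools

genotypes = {
    "geno1": [1, 2, 3],
    "geno2": [2, 5],
    "geno3": [1, 2, 4, 5],
}

# Convert each list to a set once, so each pairwise comparison is a set intersection.
geno_sets = {name: set(items) for name, items in genotypes.items()}

common_by_pair = {}
for left_name, right_name in itertools.combinations(geno_sets, 2):
    # sorted() turns the intersection back into a list with a stable order.
    common = sorted(geno_sets[left_name] & geno_sets[right_name])
    common_by_pair[(left_name, right_name)] = common
    print(f"{len(common)} common items between {left_name} and {right_name}: {common}")
```

Storing the results in a dictionary keyed by the pair of names keeps them available after the loop, which may or may not be needed.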

Where you want to treat the names as data, you should be storing them as data. A dict will let you hold names and values together:
import itertools
genotypes = {
    'geno1': [1,2,3],
    'geno2': [2,5],
    'geno3': [1,2,4,5],
}
combinations = itertools.combinations(genotypes.items(), 2)
for (k1, v1), (k2, v2) in combinations:
    commonItem = [x for x in v1 if x in v2]
    print(f'{len(commonItem)} common items between {k1} and {k2}')
    print(commonItem)
Output:
1 common items between geno1 and geno2
[2]
2 common items between geno1 and geno3
[1, 2]
2 common items between geno2 and geno3
[2, 5]
For more context, see:
How do I create variable variables?
Getting the name of a variable as a string

Related

How to create a dictionary using two lists (keys have repeated data) in python?

I am facing an issue. I have two lists. One contains keys, say ['a','a','a','b','b','b'] (it was originally ['a','b','a','b','a','b'] and was converted into the above form using sorted()), and the other contains values, say ['1','2','3','4','5','6']. I want to make a dictionary from the two lists. How can I do that? I have tried zip, but it returns only the last value for each key.
My code:
input:
lst = ['a','b','a','b','a','b']
lst2 = sorted(lst)
score =[1,2,3,4,5,6]
dictionary = dict(zip(lst2, score))
Output:
{'a':3,'b':6}
What I want is
input:
lst = ['a','b','a','b','a','b']
lst2 = sorted(lst)
score =[1,2,3,4,5,6]
expected output:
{'a':1,'a':2,'a':3,'b':4,'b':5,'b':6}
In a Python dictionary, keys must be unique, so it can't hold duplicates, as others have already mentioned.
I believe the collections module in Python can help you group the data.
lst = ['a','b','a','b','a','b']
lst.sort() # in-place sort
score = [1,2,3,4,5,6]
from collections import defaultdict
output_list = defaultdict(list)
for A, B in zip(lst, score):
    output_list[A].append(B)
output_list yields:
defaultdict(list, {'a': [1, 2, 3], 'b': [4, 5, 6]})
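The same grouping can also be sketched with a plain dict and setdefault, which avoids the defaultdict wrapper in the printed result. If every key/value pair really has to stay a separate entry, a list of tuples is one possible stand-in for the dictionary with repeated keys. The names grouped and pairs are chosen only for this example.

```python
lst = sorted(['a', 'b', 'a', 'b', 'a', 'b'])
score = [1, 2, 3, 4, 5, 6]

# Group the values under each key; setdefault creates the empty list on first sight.
grouped = {}
for key, value in zip(lst, score):
    grouped.setdefault(key, []).append(value)
print(grouped)

# Alternative: keep every (key, value) pair as its own entry in a list.
pairs = list(zip(lst, score))
print(pairs)
```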

How can I check if a list of nodes have already been included in a list within a list of lists?

I have the following list: a = [[1,2,3],[4,5,6],[7,8,9]] which contains 3 lists, each being a list of nodes of a graph.
I am also given a tuple of nodes z = ([1,2], [4,9]). Now, I would like to check whether either of the lists in z is included in a list in a. For example, [1,2] is in [1,2,3], in a, but [4,9] is not in [4,5,6], although there is an overlapping node.
Remark: To clarify, I am also checking for sub-list of a list, or whether every item in a list is in another list. For example, I consider [1,3] to be "in" [1,2,3].
How can I do this? I tried implementing something similar to what I found at Python 3 How to check if a value is already in a list in a list, but I have reached a mental deadlock.
Some insight on this issue would be great!
You can use any and all:
a = [[1,2,3],[4,5,6],[7,8,9]]
z = ([1,2], [4,9])
results = [i for i in z if any(all(c in b for c in i) for b in a)]
Output:
[[1, 2]]
You can use sets to check whether the nodes appear in a; the <= operator for sets is equivalent to issubset().
The itertools module provides some useful functions; itertools.product() is equivalent to nested for loops.
E.g.:
In []:
import itertools as it
[m for m, n in it.product(z, a) if set(m) <= set(n)]
Out[]:
[[1, 2]]
a = [[1,2,3],[4,5,6],[7,8,9]]
z = ([1,2], [4,9])
for z_ in z:
    for a_ in a:
        if set(z_).issubset(a_):
            print(z_)
itertools.product is your friend (it is part of the Python standard library, so no installation is needed):
from itertools import product
print([i for i in z if any(tuple(i) in list(product(l, repeat=len(i))) for l in a)])  # repeat=len(i) builds every len(i)-tuple drawn from l
Output:
[[1, 2]]
Since you're only looking to test the sub-lists as if they were subsets, you can convert the sub-lists to sets and then use set.issubset() for the test:
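s = list(map(set, a))  # materialize the sets: in Python 3 a bare map object would be exhausted after the first pass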
print([l for l in z for i in s if set(l).issubset(i)])
This outputs:
[[1, 2]]
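The subset test used in these answers can be wrapped in a small helper so the intent is easier to read. The sketch below assumes the nodes are hashable. The function name is_contained is hypothetical and not taken from any of the answers.

```python
def is_contained(candidate, groups):
    """Return True if every node in candidate appears together in one list of groups."""
    wanted = set(candidate)
    # any() stops at the first group that contains all wanted nodes.
    return any(wanted <= set(group) for group in groups)


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
z = ([1, 2], [4, 9])

results = [nodes for nodes in z if is_contained(nodes, a)]
print(results)
```

Because the comparison is set-based, [1, 3] also counts as contained in [1, 2, 3], which matches the remark in the question.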

Finding indices of items from a list in another list even if they repeat

This answer works very well for finding indices of items from a list in another list, but the problem with it is, it only gives them once. However, I would like my list of indices to have the same length as the searched for list.
Here is an example:
thelist = ['A','B','C','D','E'] # the list whose indices I want
Mylist = ['B','C','B','E'] # my list of values that I am searching in the other list
ilist = [i for i, x in enumerate(thelist) if any(thing in x for thing in Mylist)]
With this solution, ilist = [1,2,4] but what I want is ilist = [1,2,1,4] so that len(ilist) = len(Mylist). It leaves out the index that has already been found, but if my items repeat in the list, it will not give me the duplicates.
thelist = ['A','B','C','D','E']
Mylist = ['B','C','B','E']
ilist = [thelist.index(x) for x in Mylist]
print(ilist) # [1, 2, 1, 4]
Basically, "for each element of Mylist, get its position in thelist."
This assumes that every element in Mylist exists in thelist. If the element occurs in thelist more than once, it takes the first location.
UPDATE
For substrings:
thelist = ['A','boB','C','D','E']
Mylist = ['B','C','B','E']
ilist = [next(i for i, y in enumerate(thelist) if x in y) for x in Mylist]
print(ilist) # [1, 2, 1, 4]
UPDATE 2
Here's a version that does substrings in the other direction using the example in the comments below:
thelist = ['A','B','C','D','E']
Mylist = ['Boo','Cup','Bee','Eerr','Cool','Aah']
ilist = [next(i for i, y in enumerate(thelist) if y in x) for x in Mylist]
print(ilist) # [1, 2, 1, 4, 2, 0]
The code below would work:
ilist = [thelist.index(i) for i in Mylist]
Make a reverse lookup from strings to indices:
string_indices = {c: i for i, c in enumerate(thelist)}
ilist = [string_indices[c] for c in Mylist]
This avoids the quadratic behaviour of repeated .index() lookups.
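One detail worth checking with the reverse lookup: if thelist itself contains duplicates, a dict comprehension keeps the last index for each item, whereas list.index returns the first. The sketch below keeps the first index via setdefault and returns None for items that are missing. The modified sample data and the name first_index are assumptions made for this illustration.

```python
thelist = ['A', 'B', 'C', 'B', 'E']   # 'B' appears twice on purpose
Mylist = ['B', 'C', 'B', 'E', 'Z']    # 'Z' does not occur in thelist

# Build the lookup once; setdefault keeps the first index seen for each item.
first_index = {}
for i, item in enumerate(thelist):
    first_index.setdefault(item, i)

# dict.get returns None instead of raising for items that are not present.
ilist = [first_index.get(item) for item in Mylist]
print(ilist)
```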
If your data can be implicitly converted to an ndarray, as your example implies, you could use numpy_indexed (disclaimer: I am its author) to perform this kind of operation in an efficient (fully vectorized and NlogN) manner.
import numpy_indexed as npi
ilist = npi.indices(thelist, Mylist)
npi.indices is essentially the array-generalization of list.index. Also, it has a kwarg to give you control over how to deal with missing values and such.

How to Create Combination of Element in Different Set?

Let's say that I have n lists and they are not disjoint. I want to make every combination of n elements, taking one element from each list, such that all elements within a combination are different and no combination is repeated. So [1,1,2] isn't allowed, and [1,2,3] is the same as [2,1,3].
For example, I have A=[1,2,3], B=[2,4,1], and C=[1,5,3]. So, the output that I want is [[1,2,5],[1,2,3],[1,4,5],[1,4,3],[2,4,1],[2,4,5],[2,4,3],[3,2,5],[3,4,5],[3,1,5]].
I have searched Google, and I think the product function in the itertools module can do it. But I have no idea how to exclude repeated elements within a combination or how to avoid duplicate combinations.
Maybe something like:
from itertools import product
A=[1,2,3]
B=[2,4,1]
C=[1,5,3]
L = list(set([ tuple(sorted(l)) for l in product(A,B,C) if len(set(l))==3 ]))
Of course, you would have to change 3 to the relevant value if you work with more than 3 lists.
How about this? Create a dictionary with the sorted permutations as keys, and accept values only if all three integers are different:
from itertools import product
A=[1,2,3]
B=[2,4,1]
C=[1,5,3]
LEN = 3
dct = {tuple(sorted(item)): item for item in product(A,B,C)
       if len(set(item)) == LEN}
print(dct)
vals = list(dct.values())
print(vals)
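Both answers hard-code the number of lists. A possible generalization, sketched below, takes any number of lists and derives the required length from the argument count. The function name distinct_combinations is hypothetical, and the elements are assumed to be hashable and sortable.

```python
from itertools import product


def distinct_combinations(*lists):
    """One element per list, all elements different, order within a combination ignored."""
    n = len(lists)
    # Sorting each combination makes [1, 2, 3] and [2, 1, 3] collapse into one tuple.
    unique = {tuple(sorted(combo)) for combo in product(*lists)
              if len(set(combo)) == n}
    return sorted(unique)


A = [1, 2, 3]
B = [2, 4, 1]
C = [1, 5, 3]
result = distinct_combinations(A, B, C)
print(len(result))
print(result)
```

Note that product enumerates every tuple before filtering, so this sketch can get slow when there are many large lists.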

Get list based on occurrences in unknown number of sublists

I'm looking for a way to turn a list containing lists (a below) into a single list (b below) with 2 conditions:
The order of the new list (b) is based on the number of times the value has occurred in some of the lists in a.
A value can only appear once
Basically turn a into b:
a = [[1,2,3,4], [2,3,4], [4,5,6]]
# value 4 occurs 3 times in list a and gets first position
# value 2 occurs 2 times in list a and get second position and so on...
b = [4,2,3,1,5,6]
I figure one could do this with set and some list magic, but I can't get my head around it when a can contain any number of lists. The a list is created based on user input (I guess that it can contain between 1 and 20 lists with up to 200-300 items in each list).
I'm trying something along the lines of [set(l) for l in a], but I don't know how to perform set(l) & set(l).... to get all matched items.
Is it possible without a for loop iterating (sublist count * items in sublist) times?
I think this is probably the closest you're going to get:
from collections import defaultdict
d = defaultdict(int)
for sub in a:  # a is the list of lists from the question
    for val in sub:
        d[val] += 1
print(sorted(d.keys(), key=lambda k: d[k], reverse=True))
# Output: [4, 2, 3, 1, 5, 6]
On Python versions before 3.7, there is an off chance that the order of elements that appear an identical number of times may be indeterminate, because the output of d.keys() is not ordered there.
import itertools
all_items = set(itertools.chain(*a))
b = sorted(all_items, key = lambda y: -sum(x.count(y) for x in a))
Try this -
a = [[1,2,3,4], [2,3,4], [4,5,6]]
s = set()
for l in a:
    s.update(l)
print(s)
# {1, 2, 3, 4, 5, 6}
b = list(s)
This will add each list to the set, which will give you a unique set of all elements in all the lists, if that is what you are after.
Edit: to preserve the order of elements in the original list, you can't use sets.
a = [[1,2,3,4], [2,3,4], [4,5,6]]
b = []
for l in a:
    for i in l:
        if i not in b:
            b.append(i)
print(b)
# [1, 2, 3, 4, 5, 6] - the same order as the set in this case, since that's the order they appear in the list
import itertools
from collections import defaultdict
def list_by_count(lists):
    data_stream = itertools.chain.from_iterable(lists)
    counts = defaultdict(int)
    for item in data_stream:
        counts[item] += 1
    return [item for (item, count) in
            sorted(counts.items(), key=lambda x: (-x[1], x[0]))]
Having the x[0] in the sort key ensures that items with the same count are in some kind of sequence as well.
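The counting step can also be sketched with collections.Counter, which does the tallying that the defaultdict loops perform by hand. This is an alternative formulation rather than any answerer's code. It keeps the same tie-breaking idea of sorting by descending count and then by value, which assumes the values are mutually comparable.

```python
from collections import Counter
from itertools import chain

a = [[1, 2, 3, 4], [2, 3, 4], [4, 5, 6]]

# Flatten the sublists and count every value in a single pass.
counts = Counter(chain.from_iterable(a))

# Most frequent first; ties are broken by the value itself for a deterministic order.
b = sorted(counts, key=lambda item: (-counts[item], item))
print(b)
```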
