I have multiple pairwise comparisons to do in order to find common items between the lists. My code below works. However, I would like to keep track of the names of the lists, which are geno1, geno2 and geno3. I'm not sure how to get the combinations referring to the variable names instead of the arrays. Although there are related questions on Stack Overflow, such as Getting method parameter names, I'm hoping for a fairly easy solution. Thank you in advance for your time.
import itertools #to get all pairwise combinations
geno1 = [1,2,3]
geno2 = [2,5]
geno3 = [1,2,4,5]
genotypes = [geno1,geno2,geno3]
combinations = list(itertools.combinations(genotypes,2))
for pair in combinations:
    commonItem = [x for x in pair[0] if x in pair[1]]
    print(f'{len(commonItem)} common items between {pair}')  # Here, instead of pair, I want to know which pair of genotypes, such as geno1 geno2 or geno1 geno3.
    print(commonItem)
    print()
Create a dictionary where the keys are the names of the lists and the values are the lists you originally have. You could do something using locals() if you didn't want to write out the names of the lists as strings, but it's pretty hacky and I wouldn't recommend it:
import itertools
geno1 = [1,2,3]
geno2 = [2,5]
geno3 = [1,2,4,5]
genotypes = {"geno1": geno1, "geno2": geno2, "geno3": geno3}
combinations = list(itertools.combinations(genotypes.items(),2))
for (fst_name, fst_lst), (snd_name, snd_lst) in combinations:
    commonItem = [x for x in fst_lst if x in snd_lst]
    print(f'{len(commonItem)} common items between {fst_name} and {snd_name}')
    print(commonItem)
    print()
Output:
1 common items between geno1 and geno2
[2]
2 common items between geno1 and geno3
[1, 2]
2 common items between geno2 and geno3
[2, 5]
You're probably best off putting it all into a dictionary. Then you can get the combinations of the names first, and look at the actual lists only inside the loop:
import itertools
genotypes = {
    'geno1': [1,2,3],
    'geno2': [2,5],
    'geno3': [1,2,4,5],
}
combinations = list(itertools.combinations(genotypes,2))
for left_name, right_name in combinations:
    left_geno = genotypes[left_name]
    right_geno = genotypes[right_name]
    commonItem = [x for x in left_geno if x in right_geno]
    print(f'{len(commonItem)} common items between {left_name} and {right_name}: {commonItem}\n')
Where you want to treat the names as data, you should be storing them as data. A dict will let you hold names and values together:
import itertools
genotypes = {
    'geno1': [1,2,3],
    'geno2': [2,5],
    'geno3': [1,2,4,5],
}
combinations = itertools.combinations(genotypes.items(), 2)
for (k1, v1), (k2, v2) in combinations:
    commonItem = [x for x in v1 if x in v2]
    print(f'{len(commonItem)} common items between {k1} and {k2}')
    print(commonItem)
Output:
1 common items between geno1 and geno2
[2]
2 common items between geno1 and geno3
[1, 2]
2 common items between geno2 and geno3
[2, 5]
For more context, see:
How do I create variable variables?
Getting the name of a variable as a string
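As a minimal sketch of the dictionary approach described in the answers above, the loop can be wrapped in a helper that returns the shared items keyed by the pair of names. The function name pairwise_common is hypothetical (it is not from the original posts), and the sketch assumes the list items are hashable so that a set can be used for the membership test:

```python
import itertools

def pairwise_common(named_lists):
    """Map each pair of names to the items the two lists share.

    Hypothetical helper; assumes the items are hashable.
    """
    result = {}
    for (name_a, items_a), (name_b, items_b) in itertools.combinations(named_lists.items(), 2):
        lookup = set(items_b)  # set membership avoids rescanning items_b for every x
        result[(name_a, name_b)] = [x for x in items_a if x in lookup]
    return result

genotypes = {"geno1": [1, 2, 3], "geno2": [2, 5], "geno3": [1, 2, 4, 5]}
common = pairwise_common(genotypes)
for (name_a, name_b), shared in common.items():
    print(f"{len(shared)} common items between {name_a} and {name_b}: {shared}")
```

Returning a dict keyed by name pairs keeps the names as data, so later code can look up a specific pair without relying on variable names.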
I have a grouped list of strings that sort of looks like this, the lists inside of these groups will always contain 5 elements:
text_list = [['aaa','bbb','ccc','ddd','eee'],
['fff','ggg','hhh','iii','jjj'],
['xxx','mmm','ccc','bbb','aaa'],
['fff','xxx','aaa','bbb','ddd'],
['aaa','bbb','ccc','ddd','eee'],
['fff','xxx','aaa','ddd','eee'],
['iii','xxx','ggg','jjj','aaa']]
The objective is simple: group all of the lists that are similar, where the first 3 elements of one list are compared against all of the elements of the other lists.
So from the above example the output might look like this (output is the index of the list):
[[0,2,4],[3,5]]
Notice that if another list contains the same elements but in a different order, it is removed.
I've written the following code to extract the groups, but it returns duplicates and I am unsure how to proceed. I also think this might not be the most efficient way to do the extraction, as the real list can contain up to millions of groups:
grouped_list = []
for i in range(0,len(text_list)):
    int_temp = []
    for m in range(0,len(text_list)):
        if i == m:
            continue
        bool_check = all( x in text_list[m] for x in text_list[i][0:3])
        if bool_check:
            if len(int_temp) == 0:
                int_temp.append(i)
                int_temp.append(m)
                continue
            int_temp.append(m)
    grouped_list.append(int_temp)
## remove index with no groups
grouped_list = [x for x in grouped_list if x != []]
Is there a better way to go about this? How do I remove the duplicate group afterwards? Thank you.
Edit:
To be clearer, I would like to retrieve the lists that are similar to each other, but using only the first 3 elements of each list. For example, using the first 3 elements of list A, check whether lists B, C, D... contain all 3 of those elements. Repeat for the entire list, then remove any list that contains duplicate elements.
You can build a set of frozensets to keep track of indices of groups with the first 3 items being a subset of the rest of the members:
groups = set()
sets = list(map(set, text_list))
for i, lst in enumerate(text_list):
    groups.add(frozenset((i, *(j for j, s in enumerate(sets) if set(lst[:3]) <= s))))
print([sorted(group) for group in groups if len(group) > 1])
If the input list is long, it would be faster to create a set of frozensets of the first 3 items of all sub-lists and use the set to filter all combinations of 3 items from each sub-list, so that the time complexity is essentially linear to the input list rather than quadratic despite the overhead in generating combinations:
from itertools import combinations
sets = {frozenset(lst[:3]) for lst in text_list}
groups = {}
for i, lst in enumerate(text_list):
    for c in map(frozenset, combinations(lst, 3)):
        if c in sets:
            groups.setdefault(c, []).append(i)
print([sorted(group) for group in groups.values() if len(group) > 1])
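The combinations-based approach above can be sketched as a reusable function that returns the groups in a deterministic order, which makes it easier to check against the expected output from the question. The name group_by_first_three is hypothetical, and the sketch assumes the elements within each sub-list are distinct:

```python
from itertools import combinations

def group_by_first_three(rows):
    """Group the indices of rows that contain the first 3 items of some row.

    Hypothetical helper; assumes the items within each row are distinct.
    """
    # Every "first 3 items" key, order-insensitive
    keys = {frozenset(row[:3]) for row in rows}
    groups = {}
    for index, row in enumerate(rows):
        # A 5-item row has only 10 combinations of 3, so this stays cheap per row
        for combo in map(frozenset, combinations(row, 3)):
            if combo in keys:
                groups.setdefault(combo, set()).add(index)
    # Sort both levels because set and dict ordering should not be relied on here
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)

text_list = [['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
             ['fff', 'ggg', 'hhh', 'iii', 'jjj'],
             ['xxx', 'mmm', 'ccc', 'bbb', 'aaa'],
             ['fff', 'xxx', 'aaa', 'bbb', 'ddd'],
             ['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
             ['fff', 'xxx', 'aaa', 'ddd', 'eee'],
             ['iii', 'xxx', 'ggg', 'jjj', 'aaa']]

result = group_by_first_three(text_list)
print(result)  # [[0, 2, 4], [3, 5]]
```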
I have the following pairs stored in the following list
sample = [[CGCG,ATAT],[CGCG,CATC],[ATAT,TATA]]
Each pairwise comparison can have only two unique combinations of characters; if it has more, that pairwise comparison is eliminated. E.g.,
In sample[1]
C C
G A
C T
G C
Look at the corresponding elements in both sub-lists: CC, GA, CT, GC.
Here, there are more than two types of pairs (CC), (GA), (CT) and (GC). So this pairwise comparison cannot occur.
Every comparison can have only 2 combinations out of (AA, GG,CC,TT, AT,TA,AC,CA,AG,GA,GC,CG,GT,TG,CT,TC) ... basically all possible combinations of ACGT where order matters.
In the above example, more than 2 such combinations are found.
However,
In sample[0]
C A
G T
C A
G T
There are only 2 unique combinations: CA and GT
Thus, the only pairs that remain are:
output = [[CGCG,ATAT],[ATAT,TATA]]
I would prefer if the code was in traditional for-loop format and not comprehensions
This is a small part of the question listed here. This portion of the question is re-asked, as the answer provided earlier provided incorrect output.
def filter_sample(sample):
    filtered_sample = []
    for s1, s2 in sample:
        pairs = {pair for pair in zip(s1, s2)}
        if len(pairs) <= 2:
            filtered_sample.append([s1, s2])
    return filtered_sample
Running this
sample = [["CGCG","ATAT"],["CGCG","CATC"],["ATAT","TATA"]]
filter_sample(sample)
Returns this
[['CGCG', 'ATAT'], ['ATAT', 'TATA']]
sample = [["CGCG","ATAT"],["CGCG","CATC"],["ATAT","CATC"]]
result = []
for s in sample:
    first = s[0]
    second = s[1]
    combinations = []
    # collect each distinct (first[i], second[i]) pair once
    for i in range(0,len(first)):
        comb = [first[i],second[i]]
        if comb not in combinations:
            combinations.append(comb)
    if len(combinations) == 2:
        result.append(s)
print(result)
The core of this task is extracting the pairs from your sublists and counting the number of unique pairs. Assuming your samples actually contain strings, you can use zip(*sub_list) to get the pairs. Then you can use set() to remove duplicate entries.
sample = [['CGCG','ATAT'],['CGCG','CATC'],['ATAT','CATC']]
def filter(sub_list, n_pairs):
    pairs = zip(*sub_list)
    return len(set(pairs)) == n_pairs
Then you can use a for loop or a list comprehension to apply this function to your main list.
new_sample = [sub_list for sub_list in sample if filter(sub_list, 2)]
...or as a for loop...
new_sample = []
for sub_list in sample:
    if filter(sub_list, 2):
        new_sample.append(sub_list)
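To see what zip and set actually produce for a single sub-list, here is a small sketch that prints the unique character pairs for one rejected and one kept comparison from the question. The helper name unique_pairs is hypothetical, and it assumes both strings in a sub-list have the same length (zip silently stops at the shorter one):

```python
def unique_pairs(sub_list):
    """Return the set of distinct (char_from_first, char_from_second) pairs.

    Hypothetical helper; assumes both strings have equal length.
    """
    first, second = sub_list
    return set(zip(first, second))

rejected = unique_pairs(["CGCG", "CATC"])
kept = unique_pairs(["CGCG", "ATAT"])

print(sorted(rejected))  # [('C', 'C'), ('C', 'T'), ('G', 'A'), ('G', 'C')]
print(sorted(kept))      # [('C', 'A'), ('G', 'T')]
```

The first comparison yields four distinct pairs, so it is dropped; the second yields exactly two, so it stays.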
I need to split a list like:
m=[[1,2,3,4,5,6,7,8,9,0],[11,12,13,14,15,16,17,18,19,20],[21,22,23,24,25,26,27,28,29,30],[31,32,33,34,35,36,37,38,39,40],[41,42,43,44,45,46,47,48,49,50],[51,52,53,54,55,56,57,58,59,60],[61,62,63,64,65,66,67,68,69,70],[71,72,73,74,75,76,77,78,79,80],[81,82,83,84,85,86,87,88,89,90],[91,92,93,94,95,96,97,98,99,100]]
into smaller 5x5 lists like:
m1=[[1,2,3,4,5],[11,12,13,14,15],[21,22,23,24,25],[31,32,33,34,35],[41,42,43,44,45]]
m2=[[6,7,8,9,0],[16,17,18,19,20],[26,27,28,29,30],[36,37,38,39,40],[46,47,48,49,50]]
and have a new list that contains these smaller lists:
new_list=[m1,m2,m3,m4]
thanks
First, how can you split a list of 10 elements into two lists of 5 elements?
def split_list(m):
    return m[:len(m)//2], m[len(m)//2:]
Now, we want to map that over each list in m:
mm = [split_list(sublist) for sublist in m]
But now we have a list of pairs of lists, not a pair of lists of lists. How do you fix that? zip is the answer: it turns an X of Y of foo into a Y of X of foo:
new_list = list(zip(*mm))
If you don't like the fact that this gives you a list of tuples of lists instead of a list of lists of lists, just use a list comprehension with the list function:
new_list = [list(m) for m in zip(*mm)]
If you want to change it to split any list into N/5 groups of 5, instead of 2 groups of N/2, that's just a matter of changing the first function. A general-purpose grouper function will do that, like the one from the itertools recipes (or see this question for other options, if you prefer):
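import itertools

def grouper(iterable, n=5):
    # n references to the same iterator, so each output tuple takes n consecutive items
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args)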
So:
new_list = [list(mm) for mm in zip(*(grouper(sublist, 5) for sublist in m))]
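Note that the approach above splits each row into column groups but keeps all the rows together. If the goal is the four 5x5 blocks from the question, one possible sketch uses plain slicing over both rows and columns instead of zip. This is a swapped-in technique rather than the answer's method; the function name split_into_blocks is hypothetical, and the example matrix is generated as 1..100 (so the first row ends in 10, not 0 as in the question's data):

```python
def split_into_blocks(matrix, size=5):
    """Split a rectangular matrix into size x size blocks, row-major order.

    Hypothetical helper; assumes the matrix is non-empty and all rows have equal length.
    """
    blocks = []
    for row_start in range(0, len(matrix), size):
        band = matrix[row_start:row_start + size]  # the rows belonging to this band of blocks
        for col_start in range(0, len(matrix[0]), size):
            blocks.append([row[col_start:col_start + size] for row in band])
    return blocks

# Example 10x10 matrix holding 1..100
m = [[r * 10 + c + 1 for c in range(10)] for r in range(10)]
new_list = split_into_blocks(m)
m1, m2, m3, m4 = new_list
print(m1)
```

Here m1 is the top-left block, m2 the top-right, m3 the bottom-left and m4 the bottom-right.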
I have N lists of 3 elements. I want to find all combinations between them that don't use the same index twice. Each combination must always have 3 items.
Example:
list1 = [l11, l12, l13]
list2 = [l21, l22, l23]
list3 = [l31, l32, l33]
All combinations possible:
combinaison1 = l11, l22, l33
combinaison2 = l11, l23, l32
combinaison3 = l12, l21, l33
combinaison4 = l12, l23, l31
combinaison5 = l13, l21, l32
combinaison6 = l13, l22, l31
BUT I don't want:
BADcombinaison = l11,l21,l32
How can I do that in python?
Since you want only 3 items from 3 or more lists, the first step is to find the k-permutations of the list of lists with k = 3, i.e. permutations(lists, 3). From there you don't actually have to permute the indexes too, because you want unique indexes. (Note: this allows a variable number of lists and also a variable length of the lists, but the lengths of all input and output lists are equal.)
Essentially, instead of trying to permute the indexes, the indexes stay fixed at (0, 1, 2), since you specify no repetition of indexes, and the lists are permuted instead.
from itertools import permutations
# number of lists may vary (>= length of lists)
list1 = ["l11", "l12", "l13"]
list2 = ["l21", "l22", "l23"]
list3 = ["l31", "l32", "l33"]
list4 = ["l41", "l42", "l43"]
lists = [list1, list2, list3, list4]
# lengths of lists must be the same and will be the size of outputs
size = len(lists[0])
for subset in permutations(lists, size):
    print([sublist[item_i] for item_i, sublist in enumerate(subset)])
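As a quick check of the idea above, the following sketch runs the same permutation trick on just the three lists from the question and compares the result with the six combinations listed there. The function name index_unique_combinations is hypothetical. Each output is ordered by index position rather than by list, so the comparison ignores order within a combination:

```python
from itertools import permutations

def index_unique_combinations(lists):
    """Pick one item per index position, each from a different list.

    Hypothetical helper; assumes all lists have the same length and
    that there are at least as many lists as that length.
    """
    size = len(lists[0])
    return [[sublist[i] for i, sublist in enumerate(subset)]
            for subset in permutations(lists, size)]

lists = [["l11", "l12", "l13"],
         ["l21", "l22", "l23"],
         ["l31", "l32", "l33"]]

combos = index_unique_combinations(lists)
for combo in combos:
    print(combo)

# The six combinations from the question, compared without regard to order
expected = {
    frozenset({"l11", "l22", "l33"}), frozenset({"l11", "l23", "l32"}),
    frozenset({"l12", "l21", "l33"}), frozenset({"l12", "l23", "l31"}),
    frozenset({"l13", "l21", "l32"}), frozenset({"l13", "l22", "l31"}),
}
found = {frozenset(combo) for combo in combos}
print(found == expected)  # True
```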