Populate list with tuples - python

I'm just fiddling with a simulation of Mendel's First Law of Inheritance.
Before I can let the critters mate and analyze the outcome, the population has to be generated, i.e., a list has to be filled with varying numbers of three different types of tuples without unpacking them.
While trying to get familiar with itertools (I'll need combinations later in the mating part), I came up with the following solution:
import itertools
k = 2
m = 3
n = 4
hd = ('A', 'A') # homozygous dominant
het = ('A', 'a') # heterozygous
hr = ('a', 'a') # homozygous recessive
fhd = itertools.repeat(hd, k)
fhet = itertools.repeat(het, m)
fhr = itertools.repeat(hr, n)
population = [x for x in fhd] + [x for x in fhet] + [x for x in fhr]
which would result in:
[('A', 'A'), ('A', 'A'), ('A', 'a'), ('A', 'a'), ('A', 'a'), ('A', 'a'), ('A', 'a'), ('A', 'a'), ('A', 'a')]
Is there a more reasonable, Pythonic, or memory-saving way to build the final list, e.g. without generating the lists for the three types of individuals first?

You could use itertools.chain to combine the iterators:
population = list(itertools.chain(fhd, fhet, fhr))
Though I would say there's no need to use itertools.repeat when you could simply do [hd] * k. Indeed, I would approach this simulation as follows:
pops = (20, 30, 44)
alleles = (('A', 'A'), ('A', 'a'), ('a', 'a'))
population = [a for n, a in zip(pops, alleles) for _ in range(n)]
or perhaps
allele_freqs = ((20, ('A', 'A')),
(30, ('A', 'a')),
(44, ('a', 'a')))
population = [a for n, a in allele_freqs for _ in range(n)]
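If you do want to stay with itertools and avoid building the intermediate per-genotype lists entirely (the memory-saving angle of the question), a fully lazy variant along the same lines might look like this, reusing allele_freqs from above:
import itertools
population = list(itertools.chain.from_iterable(
    itertools.repeat(allele, n) for n, allele in allele_freqs))
Nothing is materialised until the final list() call, and if you only ever iterate over the population once you can drop the list() as well.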

This should work I suppose.
pops = [2,3,4]
alleles = [('A','A'), ('A', 'a'), ('a','a')]
out = [pop*[allele] for pop, allele in zip(pops,alleles)]
print([item for sublist in out for item in sublist])

population = 2*[('A', 'A')] + 3*[('A', 'a')] + 4*[('a', 'a')]
or
hd = ('A', 'A') # homozygous dominant
het = ('A', 'a') # heterozygous
hr = ('a', 'a') # homozygous recessive
population = 2*[hd] + 3*[het] + 4*[hr]

Related

Markov Chain from String

I am currently working on a problem involving Markov chains, where the input is given in the form of a list of strings. This input has to be transformed into a Markov chain. I have been sitting on this problem for a couple of hours already.
My idea: As you can see below, I have tried to use Counter from collections to count all transitions, which has worked. Now I am trying to count all the tuples where A and B are the first elements. This gives me all possible transitions for A.
Then I'll count the transitions like (A, B).
Then I want to use these to create a matrix with all probabilities.
from collections import Counter

def markov(seq):
    states = Counter(seq).keys()
    liste = []
    print(states)
    a = zip(seq[:-1], seq[1:])
    print(list(a))

print(markov(["A","A","B","B","A","B","A","A","A"]))
So far I can't get the counting of the tuples to work.
Any help or new ideas on how to solve this are appreciated.
To count the tuples, you can create another Counter:
b = Counter()
for word_pair in a:
    b[word_pair] += 1
b will keep the count of each pair.
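Two small caveats, not in the original answer: in Python 3, zip returns a one-shot iterator, so the print(list(a)) in the question already exhausts a before the counting loop runs; and Counter can consume the pairs directly, which makes the explicit loop unnecessary:
pairs = list(zip(seq[:-1], seq[1:]))  # materialise once so the pairs can be printed and counted
b = Counter(pairs)                    # counts each (state, next_state) pair in one step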
To create the matrix, you can use numpy:
import numpy as np
c = np.array([[b[(i, j)] for j in states] for i in states], dtype=float)
I will leave the task of normalizing each row sum to 1 as an exercise.
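For completeness, one way to do that normalization (a sketch, assuming c from the line above and that every state has at least one outgoing transition, so no row sum is zero):
row_sums = c.sum(axis=1, keepdims=True)
transition_matrix = c / row_sums  # each row now sums to 1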
I didn't get exactly what you wanted but here is what I think it is:
from collections import Counter

def count_occurence(seq):
    counted_states = []
    transition_dict = {}
    for tup in seq:
        if tup not in counted_states:
            transition_dict[tup] = seq.count(tup)
            counted_states.append(tup)
    print(transition_dict)
    # {('A', 'A'): 3, ('A', 'B'): 2, ('B', 'B'): 1, ('B', 'A'): 2}

def markov(seq):
    states = Counter(seq).keys()
    print(states)
    # dict_keys(['A', 'B'])
    a = list(zip(seq[:-1], seq[1:]))
    print(a)
    # [('A', 'A'), ('A', 'B'), ('B', 'B'), ('B', 'A'), ('A', 'B'), ('B', 'A'), ('A', 'A'), ('A', 'A')]
    return a

seq = markov(["A","A","B","B","A","B","A","A","A"])
count_occurence(seq)
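If the goal is transition probabilities rather than raw counts, a possible follow-up (a sketch built on the transition_dict printed above) is to divide each count by the total number of transitions leaving its source state:
def transition_probabilities(transition_dict):
    # total number of outgoing transitions per source state
    totals = {}
    for (src, _), n in transition_dict.items():
        totals[src] = totals.get(src, 0) + n
    # probability of each (src, dst) transition
    return {(src, dst): n / totals[src]
            for (src, dst), n in transition_dict.items()}

print(transition_probabilities({('A', 'A'): 3, ('A', 'B'): 2,
                                ('B', 'B'): 1, ('B', 'A'): 2}))
# {('A', 'A'): 0.6, ('A', 'B'): 0.4, ('B', 'B'): 0.333..., ('B', 'A'): 0.666...}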

How to efficiently group pairs based on shared item?

I have a list of pairs (tuples), for simplification something like this:
L = [("A","B"), ("B","C"), ("C","D"), ("E","F"), ("G","H"), ("H","I"), ("G","I"), ("G","J")]
Using Python, I want to efficiently split this list into:
L1 = [("A","B"), ("B","C"), ("C","D")]
L2 = [("E","F")]
L3 = [("G","H"), ("G","I"), ("G","J"), ("H","I")]
How can I efficiently split the list into groups of pairs, where every pair in a group shares at least one item with some other pair in that group? As stated in one of the answers, this is actually a network problem. The goal is to efficiently split the network into disconnected (isolated) parts.
The types (lists, tuples, sets) may be changed to achieve higher efficiency.
This is more like a network problem, so we can use networkx:
import networkx as nx
G = nx.from_edgelist(L)
l = list(nx.connected_components(G))
# after that we create the map dict, to get a unique id for each node
mapdict = {z: x for x, y in enumerate(l) for z in y}
# then append the id back to the original data for groupby
newlist = [x + (mapdict[x[0]],) for x in L]
import itertools
# using groupby, gather pairs with the same id into one sublist
newlist = sorted(newlist, key=lambda x: x[2])
yourlist = [list(y) for x, y in itertools.groupby(newlist, key=lambda x: x[2])]
yourlist
[[('A', 'B', 0), ('B', 'C', 0), ('C', 'D', 0)], [('E', 'F', 1)], [('G', 'H', 2), ('H', 'I', 2), ('G', 'I', 2), ('G', 'J', 2)]]
Then to match your output format:
L1,L2,L3=[[y[:2]for y in x] for x in yourlist]
L1
[('A', 'B'), ('B', 'C'), ('C', 'D')]
L2
[('E', 'F')]
L3
[('G', 'H'), ('H', 'I'), ('G', 'I'), ('G', 'J')]
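If the id/groupby bookkeeping feels heavy, a shorter variant (a sketch using the same G and L as above) groups the original pairs directly by connected component, since both endpoints of an edge always land in the same component:
grouped = [[pair for pair in L if pair[0] in comp]
           for comp in nx.connected_components(G)]
# three sublists matching L1, L2 and L3 (component order is not guaranteed)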
Initialise a list of groups as empty
Let (a, b) be the next pair
Collect all groups that contain any elements with a or b
Remove them all, join them, add (a, b), and insert as a new group
Repeat till done
That'd be something like this:
import itertools

def partition(pred, iterable):
    t1, t2 = itertools.tee(iterable)
    return itertools.filterfalse(pred, t1), filter(pred, t2)

groups = []
for a, b in L:
    unrelated, related = partition(lambda group: any(aa == a or bb == b or aa == b or bb == a for aa, bb in group), groups)
    groups = [*unrelated, sum(related, [(a, b)])]
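Run against the sample L from the question, this should yield the same three groups, although the pairs inside each merged group end up in reverse order of insertion:
print(groups)
# [[('C', 'D'), ('B', 'C'), ('A', 'B')], [('E', 'F')],
#  [('G', 'J'), ('G', 'I'), ('H', 'I'), ('G', 'H')]]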
An efficient and Pythonic approach is to convert the list of tuples to a set of frozensets and use it as a pool of candidates. In an outer while loop, start a new empty group; in an inner loop, keep expanding the group by adding the first candidate that intersects it (or any candidate, if the group is still empty), performing a set union, and removing that candidate from the pool. When no intersecting candidate remains, go back to the outer loop to start a new group:
pool = set(map(frozenset, L))
groups = []
while pool:
    group = set()
    groups.append([])
    while True:
        for candidate in pool:
            if not group or group & candidate:
                group |= candidate
                groups[-1].append(tuple(candidate))
                pool.remove(candidate)
                break
        else:
            break
Given your sample input, groups will become:
[[('A', 'B'), ('C', 'B'), ('C', 'D')],
[('G', 'H'), ('H', 'I'), ('G', 'J'), ('G', 'I')],
[('E', 'F')]]
Keep in mind that sets are unordered in Python, which is why the order of the above output doesn't match your expected output, but for your purpose the order should not matter.
You can use the following code:
l = [("A","B"), ("B","C"), ("C","D"), ("E","F"), ("G","H"), ("H","I"), ("G","I"), ("G","J")]
result = []
if len(l) > 1:
    tmp = [l[0]]
    for i in range(1, len(l)):
        if l[i][0] == l[i-1][1] or l[i][1] == l[i-1][0] or l[i][1] == l[i-1][1] or l[i][0] == l[i-1][0]:
            tmp.append(l[i])
        else:
            result.append(tmp)
            tmp = [l[i]]
    result.append(tmp)
else:
    result = l
for elem in result:
    print(elem)
output:
[('A', 'B'), ('B', 'C'), ('C', 'D')]
[('E', 'F')]
[('G', 'H'), ('H', 'I'), ('G', 'I'), ('G', 'J')]
Note: this code assumes that pairs belonging to the same group appear consecutively in the initial list. If that is not the case it will not work, as it makes only one pass over the whole list to create the groups (complexity O(n)).
Explanations:
result will store your groups.
if len(l) > 1: if you have only one element in your list, or an empty list, no processing is needed; you already have the answer.
You make a single pass over the list and compare the 4 possible equalities between the tuple at position i and the one at position i-1.
tmp is used to build the groups: as long as the condition is met, you keep appending tuples to tmp.
When the condition is not met, you append tmp (the group built so far) to result, reinitialize tmp with the current tuple, and continue.
You can use a while loop and start iterating from the first member of L (using a for loop inside). Check the whole list for any pair that shares either of the two members, append it to a list L1, and pop it from the original list L. The while loop then runs again (until list L is empty), and the inner for loop builds a new list L2 for the next group. You can try this. (I will provide code if this doesn't help.)
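For illustration, a rough sketch of that idea (hypothetical function name; it is quadratic in the worst case, so treat it as an illustration rather than the efficient solution the question asks for):
def group_pairs(pairs):
    remaining = list(pairs)
    groups = []
    while remaining:
        group = [remaining.pop(0)]        # seed a new group with the first leftover pair
        members = set(group[0])
        grew = True
        while grew:                       # keep absorbing pairs until nothing new connects
            grew = False
            for pair in remaining[:]:
                if members & set(pair):
                    members.update(pair)
                    group.append(pair)
                    remaining.remove(pair)
                    grew = True
        groups.append(group)
    return groups

print(group_pairs(L))
# [[('A', 'B'), ('B', 'C'), ('C', 'D')], [('E', 'F')],
#  [('G', 'H'), ('H', 'I'), ('G', 'I'), ('G', 'J')]]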

python: dedup and count a given list

I am using the following code to dedup and count a given list:
def my_dedup_count(l):
    l.append(None)
    new_l = []
    current_x = l[0]
    current_count = 1
    for x in l[1:]:
        if x == current_x:
            current_count += 1
        else:
            new_l.append((current_x, current_count))
            current_x = x
            current_count = 1
    return new_l
With my testing code:
my_test_list = ['a','a','b','b','b','c','c','d']
my_dedup_count(my_test_list)
result is:
[('a', 2), ('b', 3), ('c', 2), ('d', 1)]
The code is doing fine and the output is correct. However, I feel my code is quite lengthy and am wondering whether anyone could suggest a more elegant way to improve it. Thanks!
Yes, don't re-invent the wheel. Use the standard library instead; you want to use the collections.Counter() class here:
from collections import Counter

def my_dedup_count(l):
    return Counter(l).items()
You may want to just return the counter itself and use all functionality it provides (such as giving you a key-count list sorted by counts).
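For example (tie order may differ on Python versions older than 3.7):
>>> Counter(['a','a','b','b','b','c','c','d']).most_common()
[('b', 3), ('a', 2), ('c', 2), ('d', 1)]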
If you expected only consecutive runs to be counted (so ['a', 'b', 'a'] results in [('a', 1), ('b', 1), ('a', 1)]), then use itertools.groupby():
from itertools import groupby

def my_dedup_count(l):
    return [(k, sum(1 for _ in g)) for k, g in groupby(l)]
I wrote two versions of some shorter ways to write what you accomplished.
This first option ignores ordering, and all like values in the list will be deduplicated.
from collections import defaultdict

def my_dedup_count(test_list):
    foo = defaultdict(int)
    for el in test_list:
        foo[el] += 1
    return foo.items()

my_test_list = ['a','a','b','b','b','c','c','d','a','a','d']
>>> my_dedup_count(my_test_list)
[('a', 4), ('c', 2), ('b', 3), ('d', 2)]
This second option respects order and only deduplicates consecutive duplicate values.
def my_dedup_count(my_test_list):
    output = []
    succession = 1
    for idx, el in enumerate(my_test_list):
        if idx+1 < len(my_test_list) and el == my_test_list[idx+1]:
            succession += 1
        else:
            output.append((el, succession))
            succession = 1
    return output

my_test_list = ['a','a','b','b','b','c','c','d','a','a','d']
>>> my_dedup_count(my_test_list)
[('a', 2), ('b', 3), ('c', 2), ('d', 1), ('a', 2), ('d', 1)]

Optimal strategy for choosing pairs from a list of combinations

Question: can someone help me figure out how to calculate cycles that have the maximum number of pairs (three per cycle; see the last example)?
This is what I want to do:
-> pair two users every cycle such that
- each user is paired with exactly one other user in a given cycle
- each user is paired with every other user at most once across all cycles
Real world:
You meet one new person from a list every week (week = cycle).
You never meet the same person again.
Every user is matched to someone else per week
This is my problem:
I'm able to create combinations of users and select pairs of users that have never met. However, sometimes I can only match two pairs in a cycle instead of three. Therefore,
I'm searching for a way to create the optimal selection from the list of combinations.
1) I start with 6 users:
users = ["A","B","C","D","E","F"]
2) From this list, I create possible combinations:
import itertools
candidates = []
x = itertools.combinations(users, 2)
for i in x:
    candidates.append(i)
This gives me:
. A,B A,C A,D A,E A,F
. . B,C B,D B,E B,F
. . . C,D C,E C,F
. . . . D,E D,F
. . . . . E,F
or
candidates = [('A', 'B'), ('A', 'C'), ('A', 'D'), ('A', 'E'), ('A', 'F'), ('B', 'C'),
('B', 'D'), ('B', 'E'), ('B', 'F'), ('C', 'D'), ('C', 'E'), ('C', 'F'),
('D', 'E'), ('D', 'F'), ('E', 'F')]
3) Now, I would like to select pairs from this list, such that each user (A to F) is present only once and all users are paired with someone in this cycle.
Example:
cycle1 = ('A','B'), ('C','D'), ('E','F')
Next cycle, I want to find another set of three pairs.
I calculated that with 6 users there should be 5 cycles with 3 pairs each:
Example:
cycle 1: AF BC DE
cycle 2: AB CD EF
cycle 3: AC BE DF
cycle 4: AE BD CF
cycle 5: AD BF CE
Can someone help me figure out how to calculate cycles that have the maximum number of pairs (three per cycle; see the last example)?
As Whatang mentioned in the comments, your problem is in fact equivalent to that of creating a round-robin style tournament schedule. This is a Python version of the algorithm mentioned on the Wikipedia page; see also this and this answer.
def schedule(users):
    # first make a copy or convert to a list with length `n`
    users = list(users)
    n = len(users)
    # add a dummy for an uneven number of participants
    if n % 2:
        users.append('_')
        n += 1
    cycles = []
    for _ in range(n-1):
        # "folding", `//` for integer division
        c = zip(users[:n//2], reversed(users[n//2:]))
        cycles.append(list(c))
        # rotate, fixing user 0 in place
        users.insert(1, users.pop())
    return cycles
schedule(['A', 'B', 'C', 'D', 'E', 'F'])
For your example it produces the following:
[[('A', 'F'), ('B', 'E'), ('C', 'D')],
[('A', 'E'), ('F', 'D'), ('B', 'C')],
[('A', 'D'), ('E', 'C'), ('F', 'B')],
[('A', 'C'), ('D', 'B'), ('E', 'F')],
[('A', 'B'), ('C', 'F'), ('D', 'E')]]
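A quick sanity check that this really covers every pairing exactly once (not part of the original answer, just a convenient property of round-robin scheduling):
cycles = schedule(['A', 'B', 'C', 'D', 'E', 'F'])
pairs = [frozenset(p) for cycle in cycles for p in cycle]
assert len(pairs) == len(set(pairs)) == 15  # all C(6, 2) pairs, each exactly once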
Here's an itertools-based solution:
import itertools

def hasNoRepeats(matching):
    flattenedList = list(itertools.chain.from_iterable(matching))
    flattenedSet = set(flattenedList)
    return len(flattenedSet) == len(flattenedList)

def getMatchings(users, groupSize=2):
    # Get all possible pairings of users
    pairings = list(itertools.combinations(users, groupSize))
    # Get all possible groups of pairings of the correct size, then filter to eliminate groups of pairings where a user appears more than once
    possibleMatchings = filter(hasNoRepeats, itertools.combinations(pairings, len(users)/groupSize))
    # Select a series of the possible matchings, making sure no users are paired twice, to create a series of matching cycles.
    cycles = [possibleMatchings.pop(0)]
    for matching in possibleMatchings:
        # pairingsToDate represents a flattened list of all pairs made in cycles so far
        pairingsToDate = list(itertools.chain.from_iterable(cycles))
        # The following checks that no pair in matching (the group of pairs being considered for this cycle) has occurred in previous cycles (pairingsToDate)
        if not any([pair in pairingsToDate for pair in matching]):
            # Ok, 'matching' contains only pairs that have never occurred so far, so we'll add 'matching' as the next cycle
            cycles.append(matching)
    return cycles

# Demo:
users = ["A","B","C","D","E","F"]
matchings = getMatchings(users, groupSize=2)
for matching in matchings:
    print matching
output:
(('A', 'B'), ('C', 'D'), ('E', 'F'))
(('A', 'C'), ('B', 'E'), ('D', 'F'))
(('A', 'D'), ('B', 'F'), ('C', 'E'))
(('A', 'E'), ('B', 'D'), ('C', 'F'))
(('A', 'F'), ('B', 'C'), ('D', 'E'))
Python 2.7. It's a little brute-forcey, but it gets the job done.
OK, this is pseudocode, but it should do the trick:
while length(candidates) > length(users)/2 do
{
    (pairs, candidates) = selectPairs(candidates, candidates)
    if (length(pairs) == length(users)/2)
        cycles.append(pairs)
}

selectPairs(ccand, cand)
{
    if notEmpty(ccand) then
        cpair = ccand[0]
        ncand = remove(cpair, cand)
        nccand = removeOccurences(cpair, ncand)
        (pairs, tmp) = selectPairs(nccand, ncand)
        return (pairs.append(cpair), tmp)
    else
        return ([], cand)
}
where:
remove(x, xs) removes x from xs
removeOccurences(x, xs) removes every pair of xs containing at least one element of the pair x
EDIT: the condition to stop the algorithm may need further thought ...
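For reference, a rough Python rendering of that pseudocode (an untested sketch with hypothetical names; like the pseudocode it is greedy, so it can still finish a pass with fewer than len(users)/2 pairs, which is what the EDIT hints at):
def select_pairs(candidates):
    # greedily pick pairs that share no user, scanning candidates in order
    pairs, used = [], set()
    for pair in candidates:
        if not used & set(pair):
            pairs.append(pair)
            used.update(pair)
    return pairs

def greedy_cycles(users, candidates):
    candidates = list(candidates)
    cycles = []
    while len(candidates) >= len(users) // 2:
        pairs = select_pairs(candidates)
        if len(pairs) == len(users) // 2:   # keep the cycle only if everyone is matched
            cycles.append(pairs)
        # drop the pairs that were just used from the candidate pool
        candidates = [p for p in candidates if p not in pairs]
    return cycles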

List multiplication [duplicate]

This question already has answers here:
Operation on every pair of element in a list
(5 answers)
I have a list L = [a, b, c] and I want to generate a list of tuples :
[(a,a), (a,b), (a,c), (b,a), (b,b), (b,c)...]
I tried doing L * L but it didn't work. Can someone tell me how to get this in Python?
You can do it with a list comprehension:
[ (x,y) for x in L for y in L]
edit
You can also use itertools.product as others have suggested, but only if you are using 2.6 onwards. The list comprehension will work with all versions of Python from 2.0. If you do use itertools.product, bear in mind that it returns an iterator instead of a list, so you may need to convert it (depending on what you want to do with it).
The itertools module contains a number of helpful functions for this sort of thing. It looks like you may be looking for product:
>>> import itertools
>>> L = [1,2,3]
>>> itertools.product(L,L)
<itertools.product object at 0x83788>
>>> list(_)
[(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
Take a look at the itertools module, which provides a product member.
L =[1,2,3]
import itertools
res = list(itertools.product(L,L))
print(res)
Gives:
[(1,1),(1,2),(1,3),(2,1), .... and so on]
Two main alternatives:
>>> L = ['a', 'b', 'c']
>>> import itertools
>>> list(itertools.product(L, L))
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'b'), ('c', 'c')]
>>> [(one, two) for one in L for two in L]
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'b'), ('c', 'c')]
>>>
the former one needs Python 2.6 or better -- the latter works in just about any Python version you might be tied to.
x = [a,b,c]
y = []
for item in x:
    for item2 in x:
        y.append((item, item2))
Maybe not the most Pythonic way, but it works.
OK, I tried:
L2 = [(x, y) for x in L for y in L]
and this got L squared. Is this the most Pythonic way to do this? I would expect L * L to work in Python.
The most old-fashioned way to do it would be:
def perm(L):
    result = []
    for i in L:
        for j in L:
            result.append((i, j))
    return result
This has a runtime of O(n^2), which is unavoidable since the output itself contains n^2 pairs, but you could consider it to be "vintage" style code.
