Python - multiple combinations maths question

I'm trying to make a program that lists all the 64 codons/triplet base sequences of DNA...
In more mathematical terms, there are 4 letters: A, T, G and C.
I want to list every possible sequence of three letters, where a letter can be used more than once, but I have no idea how!
I know there are 64 possibilities and I wrote them all down on paper but I want to write a program that generates all of them for me instead of me typing up all 64!
Currently, I am at this point but I have most surely overcomplicated it and I am stuck:
list = ['A','T','G','C']
list2 = []
y = 0
x = 1
z = 2
skip = False
back = False
for i in range(4):
    print(list[y],list[y],list[y])
    if i == 0:
        skip = True
    else:
        y=y+1
for i in range(16):
    print(list[y],list[y],list[x])
    print(list[y],list[x], list[x])
    print(list[y],list[x], list[y])
    print(list[y],list[x], list[z])
    if i == 0:
        skip = True
    elif z == 3:
        back = True
        x = x+1
    elif back == True:
        z = z-1
        x = x-1
    else:
        x = x+1
        z = z+1
Any help would be much appreciated!!!!

You should really be using itertools.product for this.
from itertools import product
l = ['A','T','G','C']
combos = list(product(l,repeat=3 ))
# all 64 combinations
Since product returns an iterator, you don't need to wrap it in list() if you're just going to loop over it. (Also, don't name your list list; it shadows the built-in.)
If you want a list of strings you can join() them as John Coleman shows in a comment under your question.
list_of_strings = ["".join(c) for c in product(l,repeat=3) ]
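For comparison, here is a minimal sketch showing that product with repeat=3 gives the same 64 strings, in the same order, as three nested loops over the bases. The names BASES, codons and nested_codons are made up for this illustration.

```python
from itertools import product

BASES = "ATGC"  # illustrative name; any iterable of the four letters works

# Cartesian product of BASES with itself three times, joined into strings.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# The same thing written as three explicit nested loops.
nested_codons = [a + b + c for a in BASES for b in BASES for c in BASES]

print(len(codons))              # 64
print(codons[:5])               # ['AAA', 'AAT', 'AAG', 'AAC', 'ATA']
print(codons == nested_codons)  # True
```

The rightmost position changes fastest, which is why the output starts AAA, AAT, AAG, AAC before moving on to ATA.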

Look up permutations with repetition; there is tons of code available for Python.
I would just use the library. If you want to see how it is implemented, look inside the library. These guys usually do it very efficiently.
import itertools
x = [1, 2, 3, 4, 5, 6]
[p for p in itertools.product(x, repeat=2)]


For all sets in a list, extract the first number only

I have a list that looks like this:
b = [{'dg_12.942_ch_293','dg_22.38_ca_627'},
{'dg_12.651_cd_286','dg_14.293_ce_334'},
{'dg_17.42_cr_432','dg_18.064_cm_461','dg_18.85_cn_474','dg_20.975_cf_489'}]
I want to keep only the first number for each item in each set:
b = [{'12','22'},
{'12','14'},
{'17','18','18','20'}]
I then want to find the difference between the smallest and the largest number of each set and put it in a list, so in this case I would have:
b = [3,2,3]
Ugly and without any sanity checks, but it does the job.
import re
SEARCH_NUMBER_REGEX = re.compile(r"(\d+)")
def foo(dataset):
    out = []
    for entries in dataset:
        numbers = []
        for entry in entries:
            # Search for the first number in the str
            n = SEARCH_NUMBER_REGEX.search(entry).group(1)
            n = int(n)
            numbers.append(n)
        # Sort the numbers and subtract the first one (smallest)
        # from the last one (largest)
        numbers.sort()
        out.append(numbers[-1] - numbers[0])
    return out
b = [
    {'dg_12.942_ch_293', 'dg_22.38_ca_627'},
    {'dg_12.651_cd_286', 'dg_14.293_ce_334'},
    {'dg_17.42_cr_432', 'dg_18.064_cm_461', 'dg_18.85_cn_474', 'dg_20.975_cf_489'}
]
print(foo(b))
# > [10, 2, 3]
This gives the output [10, 2, 3]
(the difference between 22 and 12 is 10, not 3).
b = [{'12','22'},
{'12','14'},
{'17','18','18','20'}]
l = []
for i in b:
    # Start from infinities so that any value replaces the initial bounds
    large, small = float('-inf'), float('inf')
    for j in i:
        j = int(j)
        if large < j:
            large = j
        if small > j:
            small = j
    l.append(large - small)
print(l)
Here's yet another way to do it:
import re
ba = [{'dg_12.942_ch_293', 'dg_22.38_ca_627'},
{'dg_12.651_cd_286', 'dg_14.293_ce_334'},
{'dg_17.42_cr_432', 'dg_18.064_cm_461', 'dg_18.85_cn_474', 'dg_20.975_cf_489'}]
bb = []
for s in ba:
    ns = sorted([int(re.search(r'(\d+)', ss)[0]) for ss in s])
    bb.append(ns[-1]-ns[0])
print(bb)
Output:
[10, 2, 3]
Or, if you want to be ridiculous (the := operator needs Python 3.8+):
ba = [{'dg_12.942_ch_293', 'dg_22.38_ca_627'},
{'dg_12.651_cd_286', 'dg_14.293_ce_334'},
{'dg_17.42_cr_432', 'dg_18.064_cm_461', 'dg_18.85_cn_474', 'dg_20.975_cf_489'}]
bb = [(n := sorted([int(re.search(r'(\d+)', ss)[0]) for ss in s]))[-1]-n[0] for s in ba]
print(bb)
In your expected result I see "[3,2,3]", but if I am understanding your question correctly, it should be [10,2,3]. Either way, the code below will at least point you in the right direction (hopefully).
This code iterates through each tuple in the list, extracts the first number from each string (since that is all we want to compare), and collects those numbers in a temporary list. It then subtracts the smallest number from the largest and appends the result to a separate list, which is the final one shown in your question.
Good luck, hopefully this helps!
import re
b = [('dg_12.942_ch_293','dg_22.38_ca_627'), ('dg_12.651_cd_286','dg_14.293_ce_334'), ('dg_17.42_cr_432','dg_18.064_cm_461','dg_18.85_cn_474','dg_20.975_cf_489')]
final_array = []
for tup in b:
    temp_array = []
    for num in tup:
        # Take the first run of digits and convert it to int, so that
        # max() and min() compare numerically rather than as strings
        split_number = int(re.search(r'\d+', num).group())
        temp_array.append(split_number)
    difference = max(temp_array) - min(temp_array)
    final_array.append(difference)
print(final_array)
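One detail worth spelling out: the numbers have to be converted to int before taking the maximum and minimum, because digit strings compare character by character ('9' sorts above '10'). Below is a rough sketch with the sanity check the earlier answers skip. The helper names first_int and spreads are made up here, and it assumes every label contains at least one run of digits.

```python
import re

FIRST_INT = re.compile(r"\d+")

def first_int(label):
    """Return the first run of digits in label as an int."""
    match = FIRST_INT.search(label)
    if match is None:
        raise ValueError(f"no number found in {label!r}")
    return int(match.group())

def spreads(groups):
    """For each group of labels, return max minus min of the leading integers."""
    result = []
    for group in groups:
        numbers = [first_int(label) for label in group]
        result.append(max(numbers) - min(numbers))
    return result

b = [{'dg_12.942_ch_293', 'dg_22.38_ca_627'},
     {'dg_12.651_cd_286', 'dg_14.293_ce_334'},
     {'dg_17.42_cr_432', 'dg_18.064_cm_461', 'dg_18.85_cn_474', 'dg_20.975_cf_489'}]
print(spreads(b))        # [10, 2, 3]
print(max(['9', '10']))  # '9', which is why the int conversion matters
```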

Python create two lists with one list comprehension

I am currently using nested for loops to add data to two lists at once. See below code.
headers in the code below is a Beautiful Soup object.
openData = []
count = 0
openC = 7
closeData = []
closeC = 10
for j in headers:
    for z in j:
        for data in z:
            count += 1
            if count == 1:
                openData.append(data)
            elif count == openC:
                openData.append(data)
                openC += 6
            if count == 4:
                closeData.append(data)
            elif closeC == count:
                closeData.append(data)
                closeC += 6
The two lists here are openData and closeData.
As a rough example, I know I could do something like
openData = [data for j in headers for z in j for data in z]
closeData = [data for j in headers for z in j for data in z]
I am worried that this would take roughly twice as long, since the loops are carried out twice. Is there a way to combine both statements, like
openData, closeData = [list comprehension]
I am also confused about how to incorporate the if/elif logic into the list comprehension. Finally, is this something I should be doing, or would it be an abuse of list comprehensions? The code I wrote above works, but it looks ugly. My goal is better code than what I have.
My attempt (with some basic initial data):
#some initial data
data = list(range(20))
openData, closeData = data[1::6], data[4::6]
print(openData, closeData)
Prints:
[1, 7, 13, 19] [4, 10, 16]
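To connect the slicing idea back to the nested loops in the question: count is incremented before it is tested, so count == 1, 7, 13, ... are the items at zero-based positions 0, 6, 12, ..., and count == 4, 10, 16, ... are positions 3, 9, 15, ... Here is a rough sketch, assuming headers can be treated as an iterable of iterables of iterables (the question iterates it that way, but I have not checked this against a real Beautiful Soup object). The function name split_open_close and the stand-in data are made up.

```python
from itertools import chain

def split_open_close(headers, step=6):
    # Flatten two levels of nesting, matching the three nested for loops.
    flat = list(chain.from_iterable(chain.from_iterable(headers)))
    # count == 1, 7, 13, ... in the loop means indices 0, 6, 12, ...
    open_data = flat[0::step]
    # count == 4, 10, 16, ... means indices 3, 9, 15, ...
    close_data = flat[3::step]
    return open_data, close_data

# Stand-in data: 4 outer items, each holding 2 rows of 3 cells.
headers = [[[f"cell{6 * j + 3 * z + k}" for k in range(3)]
            for z in range(2)]
           for j in range(4)]
open_data, close_data = split_open_close(headers)
print(open_data)   # ['cell0', 'cell6', 'cell12', 'cell18']
print(close_data)  # ['cell3', 'cell9', 'cell15', 'cell21']
```

The data is walked once to build the flat list; the two slices are cheap compared with the nested iteration.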

Funny behaviour of my recursive function

t = 8
string = "1 2 3 4 3 3 2 1"
string.replace(" ","")
string2 = [x for x in string]
print string2
for n in range(t-1):
    string2.remove(' ')
print string2
def remover(ca):
    newca = []
    print len(ca)
    if len(ca) == 1:
        return ca
    else:
        for i in ca:
            newca.append(int(i) - int(min(ca)))
        for x in newca:
            if x == 0:
                newca.remove(0)
        print newca
        return remover(newca)
print (remover(string2))
It's supposed to be a program that takes in a list of numbers and subtracts min(list) from every number in the list. It works fine for the first few iterations but not towards the end. I've added print statements here and there to help out.
EDIT:
t = 8
string = "1 2 3 4 3 3 2 1"
string = string.replace(" ","")
string2 = [x for x in string]
print len(string2)
def remover(ca):
    newca = []
    if len(ca) == 1: return()
    else:
        for i in ca:
            newca.append(int(i) - int(min(ca)))
        while 0 in newca:
            newca.remove(0)
        print len(newca)
        return remover(newca)
print (remover(string2))
for x in newca:
    if x == 0:
        newca.remove(0)
Iterating over a list and removing things from it at the same time can lead to strange and unexpected behavior. Try using a while loop instead.
while 0 in newca:
    newca.remove(0)
Or a list comprehension:
newca = [item for item in newca if item != 0]
Or create yet another temporary list:
newnewca = []
for x in newca:
    if x != 0:
        newnewca.append(x)
print newnewca
return remover(newnewca)
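To see concretely why removing inside the for loop misbehaves, here is a small Python 3 sketch (the function names are made up). The for loop walks the list by index, so each removal shifts the remaining items left and the loop stops before it has looked at all of them.

```python
def remove_zeros_buggy(values):
    values = list(values)     # work on a copy
    for x in values:          # iterates by position under the hood
        if x == 0:
            values.remove(0)  # shifts later items left; some are never visited
    return values

def remove_zeros(values):
    # Build a new list instead of mutating the one being iterated.
    return [v for v in values if v != 0]

print(remove_zeros_buggy([0, 0, 1, 0, 0]))  # [1, 0, 0]  (zeros left behind)
print(remove_zeros([0, 0, 1, 0, 0]))        # [1]
```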
(Not a real answer, JFYI:)
Your program can be waaay shorter if you decompose it into proper parts.
def aboveMin(items):
    min_value = min(items) # only calculate it once
    return differenceWith(min_value, items)
def differenceWith(min_value, items):
    result = []
    for value in items:
        result.append(value - min_value)
    return result
The above pattern can, as usual, be replaced with a comprehension:
def differenceWith(min_value, items):
    return [value - min_value for value in items]
Try it:
>>> print aboveMin([1, 2, 3, 4, 5])
[0, 1, 2, 3, 4]
Note how no item is ever removed, and that data are generally not mutated at all. This approach helps reason about programs a lot; try it.
So, IF I've understood the description of what you expect,
I believe the script below would result in something closer to your goal.
Logic:
split() returns a list of each "number" given to raw_input. Even if you had used the output of replace, you'd end up with one very long number (you removed the spaces that separated the numbers from one another), and your list comprehension then splits it into single digits, which does not match your described intent.
You should test that each input provided is an integer.
As you already print inside your function, there is no need for it to return anything.
Avoid adding zeros to your new array; just test first.
string = raw_input()
array = string.split()
intarray = []
for x in array:
    try:
        intarray.append(int(x))
    except ValueError:
        # skip anything that is not an integer
        pass
def remover(arrayofint):
    newarray = []
    minimum = min(arrayofint)
    for i in arrayofint: # iterate over the argument, not the global array
        if i > minimum:
            newarray.append(i - minimum)
    if len(newarray) > 0:
        print newarray
        remover(newarray)
remover(intarray)

Another IndexError with python

This is supposed to become a random name generator in the end, and the random part is working. The only problem is that it is REALLY random, so I get weird stuff like aaaaaaaa.
So I'm trying to add a rule that disallows two vowels in a row (and likewise two consonants).
Please help me out here. I've been looking through this code for 2 hours now and I can't find the problem.
I'm just pasting my entire code here.
import random
import string
import numpy as np
from sys import argv
import csv
# abcdefghijklmnopqrstuvwxyz
# Example output: floke fl0ke flok3 fl0k3
#
class facts:
    kons = list('bcdfghjklmnpqrstvwxz') #20
    voks = list('aeiouy') #6
    abc = list('abcdefghijklmnopqrstuvwxyz')
def r_trfa(): #True Or False (1/0)
    x = random.randrange(0, 2)
    return x;
def r_kons(): #Consonant
    y = random.randrange(0, 20)
    x = facts.kons[y]
    return x;
def r_vok(): #Vowel
    y = random.randrange(0, 6)
    x = facts.voks[y]
    return x;
def r_len(): #Length
    x = random.randrange(4, 8)
    return x;
def r_type():
    x = random.randrange(1, 4)
    return x;
def r_structure(length): #Builds the structure
    y = r_type()
    if y == 0:
        no1 = 1
    else:
        no1 = 2
    i = 0
    x = [no1]
    y = r_type()
    if not no1 == y:
        x.append(y)
    while i < length:
        y = r_type()
        if not x[i] == y:
            x.append(y)
        i = i + 1
    x2 = list(x)
    return x2;
def name(): #Final product
    struct = r_structure(r_len())
    name = struct
You've got several bugs. For example, you're checking the value y against 0 even though r_type() always returns a value from 1 to 3 (randrange(1, 4) excludes 4), which is probably unintended behavior. Furthermore, you never actually call a function that gets you a character, and you never create a string. Thus it's not clear what you're trying to do.
Here's how I'd rewrite things based on my guess of what you want to do.
import random, itertools
voks = frozenset('aeiouy')
abc = 'abcdefghijklmnopqrstuvwxyz'
def r_gen():
    last=None #both classes ok
    while 1:
        new = random.choice(abc)
        if (new in voks) != last:
            yield new
            last = (new in voks)
def name(): #Final product
    length = random.randrange(4, 8)
    return ''.join(itertools.islice(r_gen(), length))
The problem you're having is that your loop increments i always, but only adds an additional value to your x list if the random value doesn't match x[i]. This means that if you get several matches in a row, i may become larger than the largest index into x and so you'll get an IndexError exception.
I'm not entirely sure I understand what you're trying to do, but I think this will do something similar to your current r_structure function:
def r_structure(length):
    """Returns a list of random "types", avoiding any immediate repeats"""
    x = [r_type()]
    while len(x) < length:
        y = r_type()
        if y != x[-1]: # check against the last item in the list
            x.append(y)
    return x
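Because the original failure depends on the random draws, it can be hard to reproduce on demand. As a rough illustration, the sketch below replaces the random calls with a fixed sequence (the draws parameter and both function names are made up for this example) so the IndexError happens every time, and shows the same draws-driven version of the corrected loop.

```python
def r_structure_buggy(length, draws):
    """Same loop shape as the question, fed from draws instead of random."""
    draws = iter(draws)
    x = [next(draws)]
    i = 0
    while i < length:
        y = next(draws)
        if not x[i] == y:  # x[i] fails once i has outrun len(x) - 1
            x.append(y)
        i = i + 1          # advances even when nothing was appended
    return x

def r_structure_fixed(length, draws):
    """Compare against the last item and loop until the list is long enough."""
    draws = iter(draws)
    x = [next(draws)]
    while len(x) < length:
        y = next(draws)
        if y != x[-1]:
            x.append(y)
    return x

try:
    r_structure_buggy(3, [2, 2, 2, 2])  # a repeat on the first draw is enough
except IndexError as exc:
    print("IndexError:", exc)

print(r_structure_fixed(3, [2, 2, 1, 1, 3]))  # [2, 1, 3]
```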
If your goal is simply to randomly generate a sequence of alternating vowels and consonants, there's an easier way than what you seem to be doing. First off, you can use random.choice to pick your characters. Further, rather than picking many letters and rejecting ones that are of the wrong type, you can simply pick from one string, then pick from the other, for as long as you need:
import random
def alternating_characters(length):
    characters = ["aeiouy", "bcdfghjklmnpqrstvwxz"]
    char_type = random.randrange(2) # pick a random letter type to start with
    results = []
    while len(results) < length:
        results.append(random.choice(characters[char_type])) # pick random char
        char_type = 1-char_type # pick from the other list next time
    return "".join(results)
Well, it's unclear what you want to do. As the condition on vowels and consonants is the same, why do you need to differentiate between them?
So all you need to do is take a random letter and check that it doesn't match the last letter.
Here's some code:
import random
abc = 'abcdefghijklmnopqrstuvwxyz'
def gen_word(length):
    last = ''
    while length > 0:
        l = random.choice(abc)
        if l != last:
            length -= 1
            last = l # remember the letter so the next one must differ
            yield l
if __name__ == '__main__':
    word = ''.join(gen_word(10))
    print(word)

Python 3.0+ Calculating Mode

I have written a program to calculate the most frequently occurring number. This works great unless the list has two most frequent numbers, such as 7,7,7,9,9,9. For that I wrote in:
if len(modeList) > 1 and modeList[0] != modeList[1]:
    break
but then I run into other problems, like a set of numbers such as 7,9,9,9,9. What do I do? Below is my code that will calculate one mode.
list1 = [7,7,7,9,9,9,9]
numList=[]
modeList=[]
finalList =[]
for i in range(len(list1)):
    for k in range(len(list1)):
        if list1[i] == list1[k]:
            numList.append(list1[i])
numList.append("EOF")
w = 0
for w in range(len(numList)):
    if numList[w] == numList[w + 1]:
        modeList.append(numList[w])
    if numList[w + 1] == "EOF":
        break
w = 0
lenMode = len(modeList)
print(lenMode)
while lenMode > 1:
    for w in range(lenMode):
        print(w)
        if w != lenMode - 1:
            if modeList[w] == modeList[w + 1]:
                finalList.append(modeList[w])
                print(w)
    lenFinal = len(finalList)
    modeList = []
    for i in range(lenFinal):
        modeList.append(finalList[i])
    finalList = []
    lenMode = len(modeList)
and then
print(modeList)
We have not learned counters but I would be open to it if someone could explain!
I would just use collections.Counter for this:
>>> from collections import Counter
>>> c = Counter([7,9,9,9,9])
>>> max(c.items(), key=lambda x:x[1])[0]
9
This is really rather simple. All it does is count how many times each value appears in the list, and then selects the element with the highest count.
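Since the question specifically mentions ties such as 7,7,7,9,9,9, here is a small sketch of how the Counter approach could be extended to return every value that shares the highest count. The function name modes is made up, and returning the values sorted is just one possible choice.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often, in ascending order."""
    counts = Counter(values)  # maps each value to how often it appears
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, n in counts.items() if n == highest)

print(modes([7, 7, 7, 9, 9, 9]))  # [7, 9]
print(modes([7, 9, 9, 9, 9]))     # [9]
```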
I would use statistics.mode() for this. Before Python 3.8 it raises StatisticsError if there is more than one mode; from Python 3.8 on it returns the first mode encountered instead. If you need to handle multiple modes (it's not clear to me whether that's the case), you probably want to use statistics.multimode() (Python 3.8+) or a collections.Counter object as suggested by NPE.
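If you are on Python 3.8 or newer (an assumption about your setup), the statistics module also provides multimode(), which returns a list of all the most common values in the order they first appear. A quick example:

```python
from statistics import multimode

print(multimode([7, 7, 7, 9, 9, 9]))  # [7, 9]
print(multimode([7, 9, 9, 9, 9]))     # [9]
```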
