Comparing strings to a dictionary in groups of multiples of 3 - python

I am writing a program which reads in a number of DNA characters (the count is always divisible by 3) and checks whether they correspond to the same amino acids. For example, AAT and AAC both correspond to N, so my program should print "It's the same". It does this fine, but I just don't know how to compare 6/9/12/any multiple of 3 characters and see if the definitions are the same. For example:
AAAAACAAG
AAAAACAAA
should return "It's the same", as they are both KNK.
This is my code:
sequence = {}
d = 0
for line in open('codon_amino.txt'):
    pattern, character = line.split()
    sequence[pattern] = character
a = input('Enter original DNA: ')
b = input('Enter patient DNA: ')
for i in range(len(a)):
    if sequence[a] == sequence[b]:
        d = d + 0
    else:
        d = d + 1
if d == 0:
    print('It\'s the same')
else:
    print('Mutated!')
My codon_amino.txt is structured like this:
AAA K
AAC N
AAG K
AAT N
ACA T
ACC T
ACG T
ACT T
How do I compare the DNA structures in patterns of 3? I have it working for strings which are 3 letters long, but it returns an error for anything longer.
EDIT:
If I knew how to split a and b into lists of three-character pieces, that might help, something like:
a2 = a.split(SPLITINTOINTERVALSOFTHREE)
Then I could easily use a for loop to iterate through them, but how do I split them in the first place?
EDIT: THE SOLUTION:
sequence = {}
d = 0
for line in open('codon_amino.txt'):
    pattern, character = line.split()
    sequence[pattern] = character
a = input('Enter original DNA: ')
b = input('Enter patient DNA: ')
for i in range(len(a)):
    if all(sequence[a[i:i+3]] == sequence[b[i:i+3]] for i in range(0, len(a), 3)):
        d = d + 1
    else:
        d = d + 0
if d == 0:
    print('The patient\'s amino acid sequence is mutated.')
else:
    print('The patient\'s amino acid sequence is not mutated.')

I think you can replace your second loop and comparisons with:
if all(sequence[a[i:i+3]] == sequence[b[i:i+3]] for i in range(0, len(a), 3)):
    print('It\'s the same')
else:
    print('Mutated!')
The all function iterates over the generator expression, and will be False if any of the values is False. The generator expression compares length-three slices of the strings.

I think what you should do is:
Write a function to split a string into chunks of 3 characters.
(Some hints here)
Write a function to convert a string into its corresponding amino acid sequence (using the previous function).
Compare the sequences.
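The steps above can be sketched roughly as follows. This is only a minimal sketch: it assumes a small hard-coded CODON_TABLE standing in for the dictionary built from codon_amino.txt, and the function names split_codons, translate and same_protein are made up for illustration.

```python
# Hypothetical stand-in for the dictionary read from codon_amino.txt.
CODON_TABLE = {
    "AAA": "K", "AAC": "N", "AAG": "K", "AAT": "N",
    "ACA": "T", "ACC": "T", "ACG": "T", "ACT": "T",
}


def split_codons(dna):
    """Split a DNA string into consecutive 3-character chunks."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna, table=CODON_TABLE):
    """Convert a DNA string into its amino acid string."""
    return "".join(table[codon] for codon in split_codons(dna))


def same_protein(original, patient, table=CODON_TABLE):
    """True if both DNA strings translate to the same amino acids."""
    return translate(original, table) == translate(patient, table)


print(translate("AAAAACAAG"))
print(same_protein("AAAAACAAG", "AAAAACAAA"))
```

Comparing the translated strings rather than the raw codons is what lets AAG and AAA count as "the same", since both map to K.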

If this is what you mean:
def DNA(string):
    return [string[i:i+3] for i in range(0, len(string), 3)]
amino_1 = DNA("AAAAACAAG")
amino_2 = DNA("AAAAACAAA")
print(amino_1, amino_2)
print(amino_1 == amino_2)
Output: ['AAA', 'AAC', 'AAG'] ['AAA', 'AAC', 'AAA']
False

Related

How to compare two strings by character and print matching positions in python

I want to compare two strings by character and then print how many times they have the same character in the same position. If I were to input 'soon' and 'moon' as the two strings, it would print that they match in 3 positions.
I've run into another problem where if the 2nd string is shorter, it gives me an error "string index out of range".
I tried
a = input('Enter string')
b = input('Enter string')
i=0
count = 0
while i<len(a):
    if b[i] == a[i]:
        match = match + 1
    i = i + 1
print(match, 'positions.')
You have some extraneous code in the form of the 2nd if statement, and you don't have the match incrementor in the first if statement. You also don't need the found variable. This code should solve the problem:
# get input from the user
A = input('Enter string')
B = input('Enter string')
# set the incrementors to 0
i = 0
match = 0
# loop through every character in the string.
# Note, you may want to check to ensure that A and B are the same lengths.
while i < len(A):
    # if the current characters of A and B are the same, advance the match incrementor
    if B[i] == A[i]:
        # This is the important change. In your code this line
        # is outside the if statement, so it advances for every
        # character that is checked, not every character that passes.
        match = match + 1
    # Move to the next character
    i = i + 1
# Display the output to the user.
print(match, 'positions.')
num_matching = 0
a = "Hello"
b = "Yellow"
shortest_string = a if len(a) <= len(b) else b
longest_string = b if shortest_string == a else a
for idx, val in enumerate(shortest_string):
    # compare against the longer string so the index is always valid
    if val == longest_string[idx]:
        num_matching += 1
print(f"In {a} & {b} there are {num_matching} characters in the same position!")
Simplifying my answer in light of @gog's insight regarding zip versus zip_longest:
string1 = "soon"
string2 = "moon"
matching_positions = 0
for c1, c2 in zip(string1, string2):
    if c1 == c2:
        matching_positions += 1
print(matching_positions)
Output:
3
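The zip loop can also be collapsed into a single expression. A minimal sketch, assuming a helper named count_matching_positions (a made-up name, not from the answers); because zip stops at the end of the shorter string, inputs of unequal length should not raise "string index out of range":

```python
def count_matching_positions(first, second):
    """Count positions where both strings hold the same character."""
    # zip() stops at the end of the shorter string, so no index check is needed.
    # Each comparison is a bool, and True counts as 1 when summed.
    return sum(c1 == c2 for c1, c2 in zip(first, second))


print(count_matching_positions("soon", "moon"))
print(count_matching_positions("Hello", "Yellow"))
```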

How to join list elements into strings?

I have a list here where I only need to input 10 letters or strings. I am having a problem separating the list.
print ("Please input letter: \n")
num_string = []
num = 10
for i in range (0,num):
    element = str(input(str(i + 1) + ". "))
    num_string.append(element)
string = ' '.join([str(item) for item in num_string])
print (string)
In my code, for example, I entered a b c d e f g h i j, since there are only 10 inputs. Because I used the join method, the output is a b c d e f g h i j on one line. Instead, I want each element on its own line, like this:
a
b
c
d
e
f
g
h
i
j
You are almost there: instead of joining with a space, join with a newline. You also don't need to convert each element to a string, because input always returns a string in Python 3 (so str(input()) is redundant; it is exactly the same as input()):
string = '\n'.join(num_string)
Complete example (removed the redundant str):
print("Please input letter: \n")
num_string = []
num = 10
# the 0 isn't necessary either but
# I guess for clarification it can stay
for i in range(0, num):
    element = input(str(i + 1) + ". ")
    num_string.append(element)
string = '\n'.join(num_string)
print(string)
Alternatively you can use this (instead of the last two lines in the above code example):
print(*num_string, sep='\n')
And if you really want to shorten the code (it can be as short as 3 lines):
print("Please input letter: \n")
num = 10
print('\n'.join(input(f'{i + 1}. ') for i in range(num)))
print ("Please input letter: \n")
num_string = []
num = 10
for i in range (0,num):
    element = str(input(str(i + 1) + ". "))
    num_string.append(element)
string = '\n'.join([str(item) for item in num_string])
print (string)
Use '\n' as the separator; it acts as a line break in your output.

How to display A B C D E if I input 5 numbers 1 2 3 4 5

import random
ltr =" ABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(ltr.strip())
a = input('')
b = input('')
c = input('')
d = input('')
e = input('')
print(a,b,c,d,e)
var1 = random.randrange(1,26)
var2 = random.randrange(1,26)
var3 = random.randrange(1,26)
var4 = random.randrange(1,26)
var5 = random.randrange(1,26)
print(var1,var2,var3,var4,var5)
What I want to do is when I input numbers from 1 to 26 it should display a corresponding result. As you can see, the user must input 5 numbers. For example, if we input 1 2 3 4 5 the result must be A B C D E. Also, we have random numbers. For example, if our random numbers are 4 5 3 1 2 the result must be D E C A B.
I don't know what to do to display the result.
random.randrange(1,26)
This is wrong. The second argument of randrange is exclusive, meaning you'll only get numbers from 1 to 25.
You should use:
random.randrange(1, len(ltr))
Then your result letter is just the character at that index in your ltr string: ltr[var1].
As for user input, you need to convert it to integer values like this:
a = int(input(''))
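Putting those pieces together might look like the sketch below. It is only an illustration: the helper name numbers_to_letters is made up, and fixed lists stand in for the five int(input('')) calls so the example runs without typing anything.

```python
import random

ltr = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # leading space so that index 1 is 'A'


def numbers_to_letters(numbers):
    """Map each number in 1..26 to its letter, separated by spaces."""
    return " ".join(ltr[n] for n in numbers)


# Stand-ins for five values read with int(input('')).
print(numbers_to_letters([1, 2, 3, 4, 5]))
print(numbers_to_letters([4, 5, 3, 1, 2]))

# randrange excludes its stop value, so len(ltr) (27) yields 1..26.
random_numbers = [random.randrange(1, len(ltr)) for _ in range(5)]
print(random_numbers, numbers_to_letters(random_numbers))
```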
strip() your ltr, take the element at (entered_position - 1), repeat five times, and join the results separated by a space:
import random
ltr = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(' '.join([ltr.strip()[int(input('number 1-26: '))-1] for _ in range(5)]))
print(' '.join([ltr.strip()[random.randrange(1,27)-1] for _ in range(5)]))
What you want to do is to print the letter in the alphabet which is at the position the user entered. In other words, the user enters the index of that letter inside the alphabet. Since you can access a character inside a string using its index in python using letter = string[index] you can do this:
ltr =" ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# input() returns a string, so convert to int before using it as an index
a = int(input(''))
b = int(input(''))
c = int(input(''))
d = int(input(''))
e = int(input(''))
print(a,b,c,d,e)
print(ltr[a], ltr[b], ltr[c], ltr[d], ltr[e])
Note that due to the space at the start of ltr, A will be output when entering 1.
Edit: Update according to question in comments.
You can sort the inputs if you put them in a list and sort() them. Then you can get the characters at the positions from the input list:
ltr =" ABCDEFGHIJKLMNOPQRSTUVWXYZ"
inputs = []
for _ in range(5):
    inputs.append(int(input("Enter a number:")))
inputs.sort()
print(', '.join(map(str, inputs)))
print(', '.join(ltr[i] for i in inputs))

How to count instances of consecutive letters in a string in Python 3?

I want to count the number of instances where consecutive letters are identical in a given string.
For example, my string input is:
EOOOEOEE
I would only like to find the number of occasions where there is more than one consecutive 'O'.
The Output should be:
1
Since, there is only one set of O's that come consecutively.
This is possible with itertools.groupby:
from itertools import groupby
x = 'EOOOEOEE'
res = sum(len(list(j)) > 1 for i, j in groupby(x) if i == 'O') # 1
You can use a regex:
>>> import re
>>> s = 'EOOOEOEEOO'
>>> sum(1 for x in re.finditer(r'O{2,}', s))
2
Just count with a for-loop:
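n = 0  # number of runs of two or more 'O'
g = 0  # length of the current run of 'O'
s = 'EOOOEOEE'
for c in s:
    if c == 'O':
        g += 1
    elif g:
        if g > 1:
            n += 1
        g = 0
# a run that reaches the end of the string only counts if it is longer than 1
if g > 1:
    n += 1
which gives n as 1.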
I assume you want to count the runs of consecutive identical letters for every letter in the string, not just for the letter 'O'.
Make a dictionary whose keys are the characters in the string and whose values hold this count: char_dict = {}
The idea is to have two conditions to match:
(1) Is the current character the same as the previous character?
(2) If the first condition is true, is the current pair of consecutive characters part of a larger run of the same character that has already been counted?
Simply put, take for example ABBBCBB. When we encounter the third B, we want to check whether it is part of a run that has already been accounted for, i.e. BBB should give a consecutive count of 1 and not 2. To implement this we use a flag variable that tracks this condition.
If we use only condition (1), BBB will count as BB and BB and not as a single BBB.
The rest of the code is pretty straightforward.
char_dict = {}
string = "EOOOEOEEFFFOFEOOO"
prev_char = None
flag = 0
for char in string:
    if char not in char_dict:
        # initialize the count to zero
        char_dict[char] = 0
    if char != prev_char:
        # a new run starts here and has not been counted yet
        flag = 1
    elif flag != 0:
        # second character of a run: count the run once, then clear the flag
        char_dict[char] += 1
        flag = 0
    prev_char = char
print(char_dict)
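For comparison, the same per-character count can be sketched with itertools.groupby and collections.Counter. This is a minimal sketch rather than a drop-in replacement; the function name count_runs and its min_length parameter are made up for illustration.

```python
from collections import Counter
from itertools import groupby


def count_runs(text, min_length=2):
    """Count, per character, the runs of at least min_length identical characters."""
    # groupby yields one (character, run) pair per maximal run, so a run such
    # as BBB is seen once and can never be counted twice.
    return Counter(
        char for char, run in groupby(text) if len(list(run)) >= min_length
    )


print(count_runs("EOOOEOEEFFFOFEOOO"))
print(count_runs("ABBBCBB"))
```

Characters that never repeat consecutively are simply absent from the result instead of appearing with a count of 0.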

Compare two strings, including duplicate letters?

I'm trying to write a function that takes two user inputs: a word and a maximum length. The function reads from a text file (opened earlier in the program), looks at all the words that fit within the maximum length given, and returns a list of words from the file that contain all of the letters in the word that the user gave. Here's my code so far:
def comparison():
    otherWord = input("Enter word: ")
    otherWord = list(otherWord)
    maxLength = input("What is the maximum length of the words you want: ")
    listOfWords = []
    for line in file:
        line = line.rstrip()
        letterCount = 0
        if len(line) <= int(maxLength):
            for letter in otherWord:
                if letter in line:
                    letterCount += 1
            if letterCount == len(otherWord):
                listOfWords.append(line)
    return listOfWords
This code works, but my problem is that it does not account for duplicate letters in the words read from the file. For example, if I enter "GREEN" as otherWord, then the function returns a list of words containing the letters G, R, E, and N. I would like it to return a list containing words that have 2 E's. I imagine I'll also have to do some tweaking with the letterCount part, as the duplicates would affect that, but I'm more concerned with recognizing duplicates for now. Any help would be much appreciated.
You could use a Counter for the otherWord, like this:
>>> from collections import Counter
>>> otherWord = 'GREEN'
>>> otherWord = Counter(otherWord)
>>> otherWord
Counter({'E': 2, 'R': 1, 'N': 1, 'G': 1})
And then your check could look like this:
if len(line) <= int(maxLength):
    match = True
    # otherWord is the Counter built above
    for l, c in otherWord.items():
        if line.count(l) < c:
            match = False
            break
    if match:
        listOfWords.append(line)
You can also write this without a match variable using Python’s for..else construct:
if len(line) <= int(maxLength):
    # otherWord is the Counter built above
    for l, c in otherWord.items():
        if line.count(l) < c:
            break
    else:
        # runs only if the loop finished without hitting break
        listOfWords.append(line)
Edit: If you want to have an exact match on character count, check for equality instead, and further check if there are any extra characters (which is the case if the line length is different).
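A minimal sketch of that exact-match variant, assuming a helper named exact_letter_match (a made-up name) that receives the user's word and one line from the file:

```python
from collections import Counter


def exact_letter_match(word, line):
    """True if line uses exactly the letters of word, with the same counts."""
    needed = Counter(word)
    # Equality instead of "<": every letter must occur exactly as often.
    same_counts = all(
        line.count(letter) == count for letter, count in needed.items()
    )
    # Equal per-letter counts plus equal length means no extra characters.
    return same_counts and len(line) == len(word)


print(exact_letter_match("GREEN", "GENRE"))
print(exact_letter_match("GREEN", "GREENS"))
```

Under the same assumptions, comparing Counter(line) == Counter(word) directly should give the same answer in one step.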
You can use collections.Counter that also lets you perform (multi)set operations:
In [1]: from collections import Counter
In [2]: c = Counter('GREEN')
In [3]: l = Counter('GGGRREEEENN')
In [4]: c & l # find intersection
Out[4]: Counter({'E': 2, 'R': 1, 'G': 1, 'N': 1})
In [5]: c & l == c # are all letters in "GREEN" present in "GGGRREEEENN"?
Out[5]: True
In [6]: c == l # Or if you want, test for equality
Out[6]: False
So your function could become something like:
def word_compare(inputword, wordlist, maxlength):
    c = Counter(inputword)
    # keep words no longer than maxlength that contain every letter of inputword
    return [word for word in wordlist if len(word) <= maxlength
            and c & Counter(word) == c]
