I'm writing a program that reverses a DNA sequence and replaces all As with Ts, all Cs with Gs, all Gs with Cs, and all Ts with As. The program should read a sequence of bases and output the reverse complement sequence. I'm having trouble getting it to work, so can anyone please help by having a look at my code:
word = raw_input("Enter sequence: ")
a = word.replace('A', 'T')
b = word.replace('C', 'G')
c = word.replace('G', 'C')
d = word.replace('T', 'A')
if a == word and b == word and c == word and d == word:
    print "Reverse complement sequence: ", word
And I want this sort of output:
Enter sequence: CGGTGATGCAAGG
Reverse complement sequence: CCTTGCATCACCG
Regards
I would probably do something like:
word = raw_input("Enter sequence:")
# build a dictionary to know what letter to switch to
swap_dict = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
# find out what each letter in the reversed word maps to and then join them
newword = ''.join(swap_dict[letter] for letter in reversed(word))
print "Reverse complement sequence:", newword
I don't quite understand your if statement, but the above code avoids needing one by looping over each letter, deciding what it should become, and then combining the results. That way each letter only gets converted once.
Edit: oops, I didn't notice that you wanted to reverse the string too. Fixed.
Your code as written is problematic, because steps 1 and 4 are the opposite of each other. Thus they can't be done in completely separate steps: you convert all As to Ts, then convert those (plus the original Ts) to As in step 4.
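To see the collision concretely, here is a minimal sketch that chains the two replacements. It is written for Python 3 (the code in this thread is Python 2), and the sample sequence is only an illustration:

```python
# Chaining replace() calls: the second call cannot tell the original Ts
# apart from the Ts that the first call just created.
sequence = "AATT"

step1 = sequence.replace("A", "T")   # every A becomes T
step2 = step1.replace("T", "A")      # every T, old or new, becomes A

print(step1)  # TTTT
print(step2)  # AAAA, not the intended TTAA
```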
For something simple, built-in, and (hopefully) efficient, I'd consider using translation tables from the string module:
import string
sequence = "ATGCAATCG"
trans_table = string.maketrans( "ATGC" , "TACG")
new_seq = string.translate( sequence.upper() , trans_table )
print new_seq
This gives the complemented sequence:
'TACGTTAGC'
Although I doubt that your users will ever forget to capitalize all letters, it's good practice to ensure that the input is in the form expected; hence the use of sequence.upper(). Any letters/bases with conversions not included in the translation table will be unaffected:
>>> string.translate( "AEIOUTGC" , trans_table )
'TEIOUACG'
As for the reverse complement sequence? You can do that concisely using slice notation on the output string, with a step of -1:
>>> new_seq[::-1]
'CGATTGCAT'
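The string.maketrans and string.translate functions used above exist only in Python 2. On Python 3 the same idea could be sketched with str.maketrans and str.translate; the function name reverse_complement is made up for this example:

```python
def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    # Python 3: the translation table comes from str.maketrans.
    table = str.maketrans("ATGC", "TACG")
    # Upper-case first, complement each base, then reverse with a -1 step.
    return sequence.upper().translate(table)[::-1]

print(reverse_complement("CGGTGATGCAAGG"))  # CCTTGCATCACCG
```

The sample input is the one from the question, and the result matches the output the asker wanted.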
So if I understand what you want to do, you want to swap all Ts and As as well as swap all Gs and Cs and you want to reverse the string.
OK, first let's work on reversing the string, which you haven't implemented yet. There's no obviously named method for it, but this SO question about how to reverse strings in Python should give you some ideas. The best solution seems to be
reversedWord = word[::-1]
Next, you need to swap the letters. You can't call replace("T", "A") and replace("A","T") on the same string because that will set all of your As and Ts to T. You seem to have recognized this, but you use a separate string for each swap and never combine them. Instead you need to go through the string one letter at a time and check each one. Something like this:
swappedWord = ""  # start swapped word empty
for letter in word:  # for every letter in word
    if letter == "A":  # if the letter is "A"
        swappedWord += "T"  # add a "T"
    elif letter == "T":  # if it's "T"
        swappedWord += "A"  # add an "A"
    elif letter == "C":  # if it's "C"
        ...  # you get the idea
    else:  # if it isn't one of the above letters
        swappedWord += letter  # add the letter unchanged
(EDIT - DSM's dictionary based solution is better than my solution. Our solutions are very similar though in that we both look at each character and decide what the swapped character should be but DSM's is much more compact. However, I still feel my solution is useful for helping you understand the general idea of what DSM's solution is doing. Instead of my big if statement, DSM uses a dictionary to quickly and simply return the proper letter. DSM also collapsed it into a single line.)
The reason why your if statement isn't working is that you're basically saying "if a, b, c, d, and word are all exactly the same" since == means "are equal" and if a is equal to word and b is equal to word then a must be equal to b. This can only be true if the string has no As, Ts, Cs, or Gs (i.e. word is unchanged by the swaps), so you never print out the output.
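A small sketch can make that condition visible. It is Python 3, and the helper name unchanged_by_swaps is made up here; it simply repeats the four comparisons from the question:

```python
def unchanged_by_swaps(word):
    """Mirror the question's condition: every replace() leaves word as-is."""
    swaps = [("A", "T"), ("C", "G"), ("G", "C"), ("T", "A")]
    return all(word.replace(old, new) == word for old, new in swaps)

print(unchanged_by_swaps("CGGTGATGCAAGG"))  # False: the print is skipped
print(unchanged_by_swaps("XYZ"))            # True: no A, C, G or T present
```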
Related
What I'm trying to figure out is how to go back a position in a string.
Say I have a word and I'm checking every letter, but once I get to a "Y" I need to check whether the character before it was a vowel. (I'm a beginner in this language, so I'm trying to practice some things I did in C, which is the language I'm studying at college.)
I'm using a for loop to check the letters in the word, but I don't know if there's any way to go back in the index. I know that in C strings are treated like arrays, so I would have a for loop, and once I get to a "Y", that would be my word[i] (i being the index of the position I'm currently at). What I would normally do is check if word[i-1] in "AEIOUaeiou" (i-1 being the position before the one I'm currently at). I don't know how that can be done in Python, and it would be awesome if someone could give me a hand :(
One option is to iterate through by index, as you'd do in C:
word = "today"
for i in range(1, len(word)):
    if word[i].lower() == 'y' and word[i-1].lower() in 'aeiou':
        print(word[i-1:i+1])
Another is to zip the string with itself shifted by one character:
for x, y in zip(word, word[1:]):
    if y.lower() == 'y' and x.lower() in 'aeiou':
        print(x+y)
There's a good answer here already but I wanted to point out a more "C-like" way to iterate strings (or anything else).
Some people may consider it un-Pythonic, but in my opinion it's often a good approach when writing certain algorithms:
word = "today"
len_word = len(word)
vowels = "aeiou"
i = 0
while i < len_word:
    # i > 0 keeps word[i-1] from wrapping around to the last character
    if i > 0 and word[i] == "y":
        if word[i-1].lower() in vowels:
            print(word[i-1])
    i += 1
This approach gives you more flexibility: for example, you can do more complex things like "jumping" back and forth with the index. However, you also need to be more careful not to set the index to something that is out of range of the iterable.
You could use a regular expression here, e.g. to flag words which don't have a vowel before Y you could use:
import re
inp = "blahYes"
if re.search(r'[^\WAEIOUaeiou_]Y', inp):
    print("INVALID")
else:
    print("VALID")
You can easily do this in the C style:
vowels = ['a', 'e', 'i', 'o', 'u']
# start at 1 so that your_string[i-1] never wraps around to the last character
for i in range(1, len(your_string)):
    if your_string[i].lower() == 'y':
        # do your calculation here
        if your_string[i-1].lower() in vowels:
            print(f"String has vowel '{your_string[i-1]}' at index {i-1} and has 'y' at {i}")
Using your_string[i].lower() == 'y' means it will match both y and Y.
Or you can also use the enumerate function.
for index, value in enumerate(your_string):
    if value.lower() == 'y':
        # check if index-1 was a vowel (index > 0 avoids wrapping to the end)
        if index > 0 and your_string[index-1].lower() in vowels:
            print(index)
In Python, strings are sequences that support indexing, so you can get the previous character of a string with [i-1].
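As a rough sketch of the enumerate idea, the function below collects the positions of every y that directly follows a vowel. It assumes Python 3, and the name vowel_before_y is made up for this example:

```python
def vowel_before_y(text):
    """Return the indices of every y/Y that directly follows a vowel."""
    vowels = "aeiou"
    hits = []
    for index, value in enumerate(text):
        # index > 0 guards against text[-1], which would wrap around
        # to the last character instead of raising an error.
        if value.lower() == "y" and index > 0 and text[index - 1].lower() in vowels:
            hits.append(index)
    return hits

print(vowel_before_y("today"))  # [4]
print(vowel_before_y("yoyo"))   # [2]
```

Without the index > 0 guard, a word such as "ya" would be reported as a match, because text[-1] is the final "a".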
I'm working on a hangman game in Python. My "answer" list contains all the letters of the word in order, the "work" list starts off with dashes for each letter, which are then populated with correct letters.
When using index(), it only returns the lowest position in the list that the value appears. However, I need a way to make all instances of the value be returned (otherwise repeating letters aren't getting filled in).
I'm new to Python, so I'm not sure if some kind of loop is best, or if there is a different function to get the result I'm looking for. I've looked at enumerate() but I'm not sure how this would work in this instance.
if guess in word:
    print("Correct!")
    for i in range(count):
        work[answer.index(guess)] = [guess]
    print(work)
As you mentioned the problem is that index returns only the first occurrence of the character in the string. In order to solve your problem you need to iterate over answer and get the indices of the characters equal to guess, something like this (using enumerate):
guess = 'l'
word = "hello"
work = [""] * len(word)
answer = list(word)
if guess in word:
    print("Correct!")
    for i, c in enumerate(answer):
        if c == guess:
            work[i] = guess
    print(work)
Output
Correct!
['', '', 'l', 'l', '']
Note that work is slightly different from what you put in the comments.
Further:
How to find all occurrences of an element in a list?
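One way to collect every matching position, sketched here with a list comprehension over enumerate. This assumes Python 3; the function name reveal and the dash placeholders are illustrative choices, not part of the original code:

```python
def reveal(answer, work, guess):
    """Copy guess into work at every position where answer has it."""
    # enumerate yields (index, letter) pairs, so repeated letters are all found.
    positions = [i for i, letter in enumerate(answer) if letter == guess]
    for i in positions:
        work[i] = guess
    return positions

answer = list("hello")
work = ["-"] * len(answer)
print(reveal(answer, work, "l"))  # [2, 3]
print(work)                       # ['-', '-', 'l', 'l', '-']
```

An empty list from reveal means the guess was wrong, so the same call can also replace the separate "if guess in word" test.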
I have to enter a string, remove all spaces and print the string without vowels. I also have to print a string of all the removed vowels.
I have gotten very close to this goal, but for some reason when I try to remove all the vowels it will not remove two vowels in a row. Why is this? Please give answers for this specific block of code, as other solutions have helped me solve the challenge but not my specific problem.
# first define our function
def disemvowel(words):
    # separate the sentence into separate letters in a list
    no_v = list(words.lower().replace(" ", ""))
    print no_v
    # create an empty list for all vowels
    v = []
    # assign the number 0 to a
    a = 0
    for l in no_v:
        # if a letter in the list is a vowel:
        if l == "a" or l == "e" or l == "i" or l == "o" or l == "u":
            # add it to the vowel list
            v.append(l)
            #print v
            # delete it from the original list with a
            del no_v[a]
            print no_v
        # increment a by 1, in order to keep a's position in the list moving
        else:
            a += 1
    # print both lists with all spaces removed, joined together
    print "".join(no_v)
    print "".join(v)
disemvowel(raw_input(""))
Mistakes
There are many other, and perhaps better, approaches to this problem. But as you asked, I'll only discuss the mistakes in your code and what you can do better.
1. Make a list of input word
There are a few things you could do better here:
no_v = list(words.lower().replace(" ", ""))
This removes only space characters, not other whitespace such as tabs or newlines, so use this instead (Python 2):
no_v = list(words.lower().translate( None, string.whitespace))
2. Replace for loop with while loop
When you delete an element of the list, for l in no_v: still advances to the next position. But after the deletion the next letter has moved into the current position, so you need to stay at the same position to catch every vowel in no_v and put it in v.
while a < len(no_v):
    l = no_v[a]
3. Return the values
Since it's a function, don't print the values inside it; return them instead. Replace the two print statements at the end with a return, and print the result at the call site.
return (no_v,v) # returning both lists as tuple
4. Not a mistake, but be prepared for Python 3.x
Try to always use print("Have a nice day") instead of print "Have a nice day"
Your Algorithm without the mistakes
Your algorithm now looks like this
import string
def disemvowel(words):
    no_v = list(words.lower().translate( None, string.whitespace))
    v = []
    a = 0
    while a < len(no_v):
        l = no_v[a]
        if l == "a" or l == "e" or l == "i" or l == "o" or l == "u":
            v.append(l)
            del no_v[a]
        else:
            a += 1
    return ("".join(no_v),"".join(v))
print(disemvowel("Stackoverflow is cool !"))
Output
For the sentence Stackoverflow is cool !\n it outputs
('stckvrflwscl!', 'aoeoioo')
How I would do this in python
This wasn't asked, but here is the solution I would probably use. Since the task is about string replacement and matching, I would just use a regex.
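import re
import string
def myDisemvowel(words):
    words = words.lower().translate( None, string.whitespace)
    nv = re.sub("[aeiou]*","", words)
    # [^aeiou] matches any non-vowel; extra ^ characters inside the class would be taken literally
    v = re.sub("[^aeiou]*","", words)
    return (nv, v)
print(myDisemvowel("Stackoverflow is cool !\n"))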
I use just regular expressions: for the nv string I replace all vowels with an empty string, and for the vowel string I replace every non-vowel with an empty string. Written compactly, this could be solved in two lines of code (just returning the replacements).
Output
For the sentence Stackoverflow is cool !\n it outputs
('stckvrflwscl!', 'aoeoioo')
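The two-argument form translate(None, string.whitespace) works only on Python 2 strings. A possible Python 3 variant of the same regex idea is sketched below; the name disemvowel_py3 is made up for this example:

```python
import re

def disemvowel_py3(words):
    """Return (text without vowels, removed vowels), ignoring whitespace."""
    # Strip all whitespace with a regex instead of Python 2's translate(None, ...).
    words = re.sub(r"\s+", "", words.lower())
    no_vowels = re.sub(r"[aeiou]", "", words)   # drop every vowel
    vowels = re.sub(r"[^aeiou]", "", words)     # drop everything else
    return no_vowels, vowels

print(disemvowel_py3("Stackoverflow is cool !\n"))
# ('stckvrflwscl!', 'aoeoioo')
```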
You are modifying no_v while iterating through it. It'd be a lot simpler just to make two new lists, one with vowels and one without.
Another option is to convert it to a while loop:
while a < len(no_v):
    l = no_v[a]
This way you have just a single variable tracking your place in no_v instead of the two you currently have.
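Both points can be sketched briefly in Python 3. The first part shows a for loop skipping an element when the list shrinks underneath it, and the second builds two new lists instead; split_vowels is a hypothetical name:

```python
# Removing items while iterating: after 'a' is removed, 'e' slides into
# index 0 and the loop moves on to index 1, so 'e' is never visited.
letters = list("aei")
for letter in letters:
    letters.remove(letter)
print(letters)  # ['e']

def split_vowels(words):
    """Build two new lists rather than deleting from the one being looped over."""
    kept, removed = [], []
    for letter in words.lower().replace(" ", ""):
        if letter in "aeiou":
            removed.append(letter)
        else:
            kept.append(letter)
    return "".join(kept), "".join(removed)

print(split_vowels("see you"))  # ('sy', 'eeou')
```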
For educational purposes, this all can be made significantly less cumbersome.
def devowel(input_str, vowels="aeiou"):
    filtered_chars = [char for char in input_str
                      if char.lower() not in vowels and not char.isspace()]
    return ''.join(filtered_chars)
assert devowel('big BOOM') == 'bgBM'
To help you learn, do the following:
Define a function that returns True if a particular character has to be removed.
Using that function, loop through the characters of the input string and only leave eligible characters.
In the above, avoid using indexes and len(), instead iterate over characters, as in for char in input_str:.
Learn about list comprehensions.
(Bonus points:) Read about the filter function.
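One possible way to work through those steps is sketched below in Python 3. The names should_remove and devowel_filter are invented for this example, and the built-in filter is used for the bonus step:

```python
def should_remove(char, vowels="aeiou"):
    """Return True if char is whitespace or a vowel (in either case)."""
    return char.isspace() or char.lower() in vowels

def devowel_filter(input_str):
    """Keep only the characters that should_remove rejects."""
    # filter yields the characters for which the lambda returns True.
    return "".join(filter(lambda char: not should_remove(char), input_str))

print(devowel_filter("big BOOM"))  # bgBM
```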
I am wondering how to count specific letters in a string. The first thing that popped into my head was the function len. Out of curiosity, is there a way to write this code without using built in functions and using len?
There is a question asked similar to this here and I am having trouble understanding it.
def count_letters(word, char):
    count = 0
    for c in word:
        if char == c:
            count = count + 1
    return count
What exactly is going on in if char == c: and count += 1? I understand why the person started with a for loop but I don't understand why place an if after?
The if is needed because you only want to count instances of a specific character, char. Without it, you would wind up doing count = count + 1 for every character in the string, so you'd get the full string length, not the number of occurrences of the specific character you're looking for.
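The difference is easy to see side by side. This is a small Python 3 sketch with an arbitrary sample word:

```python
word = "hello"

# Without the if: the counter goes up for every character.
total = 0
for c in word:
    total = total + 1

# With the if: the counter goes up only for the character being counted.
matches = 0
for c in word:
    if c == "l":
        matches = matches + 1

print(total, matches)  # 5 2
```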
With comments in code:
for c in word:  # go through each character in word
    if char == c:  # if the character is the one we're counting
        count = count + 1  # add one to the current count of characters
Strings have a built in count() method:
>>> s = 'aaabbbccc'
>>> s.count('a')
3
>>> s.count('aa')
1
Have you tried the Python Wiki?
It states that any iterable object can be looped over with for, which answers your second question.
Since you don't want to use the len function, you can use a for loop, as in the answer you linked, to cycle through the object (the string word) looking for the character char, and record each time char is found in count.
The parameter char is the character to look for in the word parameter, and c is just a variable that is set to each letter (or character, since not every part of the string need be alphabetical) as the loop cycles through word.
The if statement is there so that the block runs only when the loop variable c equals char. In this block, count is incremented (count = count + 1), and once the for loop has finished, the function returns count (how many times char was found, which is the count of a specific letter in the string that you asked for).
Long-winded but in short, yes, that function you posted will give you a count of how many times the letter is in the word.
You want to count only for specific char. Meaning that if you have the word "hello" and the letter "l", you want to return 2 because "l" appears 2 times in "hello".
The if simply checks whether the current character is the one you want; if it is, you increment the counter by one. for c in word: iterates over the characters of word.
Python's syntax is very readable and easy to understand, try to speak the code and you'll understand what it does.
Another way to do that is:
print len([c for c in word if c == char])
Example:
[c for c in 'hello world' if c == 'l']
Will return:
['l', 'l', 'l']
Then len will return 3.
Using collections.Counter
>>> word = "Hallelujahhhhh"
>>> from collections import Counter
>>> Counter(word)
Counter({'h': 5, 'l': 3, 'a': 2, 'e': 1, 'H': 1, 'j': 1, 'u': 1})
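To get the count for one specific letter, the Counter can be indexed like a dictionary. A short Python 3 sketch using the same word:

```python
from collections import Counter

counts = Counter("Hallelujahhhhh")

print(counts["h"])                # 5
print(counts["z"])                # 0, missing keys count as zero
print(counts["h"] + counts["H"])  # 6, upper and lower case are separate keys
```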
for c in word:
In this statement, word is iterated over one character at a time, and c is each character in turn. That means the loop repeats once for each character of the word.
So in the statement if char == c:, every character of word is compared with char.
If the comparison is true, count increases by 1.
So, as per your question,
if char == c: compares char with each character of word
and
count+=1 increases the value of count by 1
I am trying to count the number of times 'e' appears in a word.
def has_no_e(word):  # counts 'e's in a word
    letters = len(word)
    count = 0
    while letters >= 0:
        if word[letters-1] == 'e':
            count = count + 1
        letters = letters - 1
    print count
It seems to work fine except when the word ends with an 'e'. It will count that 'e' twice. I have no idea why. Any help?
I know my code may be sloppy, I'm a beginner! I'm just trying to figure out the logic behind what's happening.
>>> word = 'eeeooooohoooooeee'
>>> word.count('e')
6
Why not this?
As others mention, you can implement the test with a simple word.count('e'). Unless you're doing this as a simple exercise, this is far better than trying to reinvent the wheel.
The problem with your code is that it counts the last character twice because you are testing index -1 at the end, which in Python returns the last character in the string. Fix it by changing while letters >= 0 to while letters > 0.
There are other ways you can tidy up your code (assuming this is an exercise in learning):
Python provides a nice way of iterating over a string using a for loop. This is far more concise and easier to read than using a while loop and maintaining your own counter variable. As you've already seen here, adding complexity results in bugs. Keep it simple.
Most languages provide a += operator, which for integers adds the amount to a variable. It's more concise than count = count + 1.
Use a parameter to define which character you're counting to make it more flexible. When there is an obvious default, give the parameter a default value by writing char='e' in the parameter list.
Choose a more appropriate name for the function. The name has_no_e() makes the reader think the code checks whether the word has no e, but what it actually does is count the occurrences of e.
Putting this all together we get:
def count_letter(word, char='e'):
    count = 0
    for c in word:
        if c == char:
            count += 1
    return count
Some tests:
>>> count_letter('tee')
2
>>> count_letter('tee', 't')
1
>>> count_letter('tee', 'f')
0
>>> count_letter('wh' + 'e'*100)
100
Why not simply
def has_no_e(word):
    return sum(1 for letter in word if letter=="e")
The problem is that the last value of 'letters' in your iteration is '0', and when this happens you look at:
word[letters-1]
meaning, you look at word[-1], which in Python means "last letter of the word".
So you're actually counting correctly, and adding a "bonus" one if the last letter is 'e'.
It will count it twice when ending with an e because you decrement letters one time too many (because you loop while letters >= 0 and you should be looping while letters > 0). When letters reaches zero you check word[letters-1] == word[-1] which corresponds to the last character in the word.
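With that one-character change the original loop counts correctly. Here is a Python 3 sketch of the corrected version; the function is renamed count_e for the example and returns the count instead of printing it:

```python
def count_e(word):
    """Count 'e' characters by walking the word from the end to the start."""
    letters = len(word)
    count = 0
    while letters > 0:            # stop before letters - 1 becomes -1
        if word[letters - 1] == "e":
            count += 1
        letters -= 1
    return count

print(count_e("tease"))  # 2
print("tease"[-1])       # e, the index the old loop visited a second time
```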
Many of these suggested solutions will work fine.
Know that, in Python, list[-1] will return the last element of the list.
So, in your original code, when you were referencing word[letters-1] in a while loop constrained by letters >= 0, you would count the 'e' on the end of the word twice (once when letters was the length of the word, giving index length-1, and a second time when letters was 0, giving index -1).
For example, if my word was "Pete", your code trace would look like this (if you printed out word[letters-1] on each pass through the loop):
e (for word[3])
t (for word[2])
e (for word[1])
P (for word[0])
e (for word[-1])
Hope this helps to clear things up and to reveal an interesting little quirk about Python.
@marcog makes some excellent points;
in the meantime, you can do simple debugging by inserting print statements -
def has_no_e(word):
    letters = len(word)
    count = 0
    while letters >= 0:
        ch = word[letters-1]  # what is it looking at?
        if ch == 'e':
            count = count + 1
            print('{0} <-'.format(ch))
        else:
            print('{0}'.format(ch))
        letters = letters - 1
    print count
then
has_no_e('tease')
prints
e <-
s
a
e <-
t
e <-
3
from which you can see that
you are going through the string in reverse order
it is correctly recognizing e's
you are 'wrapping around' to the end of the string - hence the extra e if your string ends in one
If what you really want is 'has_no_e' then the following may be more appropriate than counting 'e's and then later checking for zero,
def has_no_e(word):
    return 'e' not in word
>>> has_no_e('Adrian')
True
>>> has_no_e('test')
False
>>> has_no_e('NYSE')
True
If you want to check there are no 'E's either,
def has_no_e(word):
    return 'e' not in word.lower()
>>> has_no_e('NYSE')
False
You don't have to use a while-loop. Strings can be used in for-loops in Python.
def has_no_e(word):
    count = 0
    for letter in word:
        if letter == "e":
            count += 1
    print count
or something simpler:
def has_no_e(word):
    return sum(1 for letter in word if letter=="e")