This is what I have so far, but it is not replacing the C with a G for some reason. Why might this be?
DNASeq=raw_input("Enter the DNA sequence here: ")
DNASeq=DNASeq.upper().replace(" ","")
reverse=DNASeq[::-1]
print reverse.replace('A','U').replace('T','A').replace('C','G').replace('G','C')
The problem is that you replace C with G and then G with C, so every C ends up as a C again. One simple way to avoid this is to replace C with a lowercase g, which the later G-to-C step ignores, and then uppercase the result:
gattaca="GATTACA"
rev = gattaca[::-1]
print rev.replace('A','u').replace('T','a').replace('C','g').replace('G','c').upper()
This correctly outputs UGUAAUC instead of UCUAAUC as your example did.
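To see the collision concretely, it can help to print the string after each replacement. This is a small sketch in Python 3; the helper name trace_replacements is made up for illustration and is not part of the original code.

```python
def trace_replacements(seq, pairs):
    """Apply each (old, new) replacement in order and record every intermediate string."""
    steps = [seq]
    for old, new in pairs:
        seq = seq.replace(old, new)
        steps.append(seq)
    return steps


# "ACATTAG" is "GATTACA" reversed, as in the question.
steps = trace_replacements(
    "ACATTAG", [("A", "U"), ("T", "A"), ("C", "G"), ("G", "C")]
)
for step in steps:
    print(step)
```

The fourth line printed is UGUAAUG (the C has become a G), and the last step turns both that new G and the original G into C, giving UCUAAUC.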
UPDATE
The more Pythonic way avoids the case-based hack, is more efficient because the string is scanned once rather than five times, and makes the purpose more obvious:
from string import maketrans
transtab = maketrans("ATCG", "UAGC")
print rev.translate(transtab)
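The snippet above is Python 2. On current Python 3 versions the string module does not provide maketrans; the equivalent is the static method str.maketrans. A minimal sketch, assuming Python 3 and uppercase input (the function name reverse_complement_rna is made up for illustration):

```python
# Build the translation table once; each character is mapped in a single pass.
TRANSTAB = str.maketrans("ATCG", "UAGC")


def reverse_complement_rna(dna):
    """Reverse the sequence, then map A->U, T->A, C->G, G->C."""
    return dna[::-1].translate(TRANSTAB)


print(reverse_complement_rna("GATTACA"))  # UGUAAUC
```

Because translate looks up every character independently, there is no ordering problem between the C-to-G and G-to-C mappings.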
As pointed out already, the problem is that you are replacing ALL Cs with Gs first. I wanted to throw in this approach because I think it would be the most efficient:
>>> complement = {'A':'U', 'G':'C', 'C':'G','T':'A'}
>>> seq = "GATTACA"
>>> "".join(complement[c] for c in seq)[::-1]
'UGUAAUC'
>>>
Your code is almost correct.
The one thing you need to understand is that you first replace C with G and then replace G with C:
.replace('C','G').replace('G','C')
so every C that has just become a G is turned straight back into a C. Simply removing .replace('G','C') is not enough, because the original Gs would then never be complemented. Use a temporary placeholder for one of the two swaps instead.
Here is the corrected code:
DNASeq=raw_input("Enter the DNA sequence here: ")
DNASeq=DNASeq.upper().replace(" ","")
reverse=DNASeq[::-1]
print reverse.replace('A','U').replace('T','A').replace('C','g').replace('G','C').replace('g','G')
I'm working on an exercism.io exercise in Python where one of the tests requires that I convert an SGF value with escape characters into one without. I don't know why they leave newline characters intact, however.
input_val = "\\]b\nc\nd\t\te \n\\]"
output_val = "]b\nc\nd e \n]"
I tried some codecs and ast functions to no avail. Any suggestions? Thanks in advance.
The purpose of your exercise is unclear, but the solution is trivial:
input_val.replace("\\", "").replace("\t", " ")
You can use this code:
def no_escapes(text):  # get text argument
    # get a list of strings split with \ and join them together without it
    text = text.split('\\')
    text = [t.split('\t') for t in text]
    text = [i for t in text for i in t]
    return ''.join(text)
It will first split "\\]b\nc\nd\t\te \n\\]" on backslashes into ['', ']b\nc\nd\t\te \n', ']']. It'll then split each piece on tabs, giving [[''], [']b\nc\nd', '', 'e \n'], [']']]. Next, it'll flatten that into ['', ']b\nc\nd', '', 'e \n', ']'] and join the pieces with nothing between them, so you end up with ']b\nc\nde \n]'. Note that this drops the tabs entirely rather than turning them into spaces.
Example:
>>> print(no_escapes('\\yeet\nlol\\'))
yeet
lol
And if you want it raw:
>>> string = no_escapes('\\yeet\nlol\\')
>>> print(f'{string!r}')
'yeet\nlol'
After looking at the SGF text value rules here, which say 'all whitespaces except line breaks become spaces,' I came up with this solution. Oddly, they don't say that '\\' characters should be erased. Not sure if there's a cleaner way to do this?
import re
s = '\\]b\nc\nd\t\te \n\\]'
# '\\' (not r'\\') is a single backslash, which is what s actually contains
r = re.sub(r'[^\S\n]', ' ', s).replace('\\', '')
print(r)
# ']b\nc\nd e \n]'
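The answers above treat the two rules (drop the escaping backslash, turn whitespace other than newlines into spaces) as separate string replacements. A character-by-character scan is another option, sketched here in Python 3. The function name unescape_sgf_text is made up, and the handling is an assumption based only on the rules quoted in this thread: a backslash is dropped and the character after it is kept, and every whitespace character except a newline becomes one space. It does not attempt any other SGF rules.

```python
def unescape_sgf_text(value):
    """Drop escaping backslashes and map non-newline whitespace to spaces."""
    out = []
    escaped = False
    for ch in value:
        # Whitespace conversion applies whether or not the character was escaped.
        if ch.isspace() and ch != "\n":
            ch = " "
        if escaped:
            out.append(ch)  # keep the escaped character, including a backslash
            escaped = False
        elif ch == "\\":
            escaped = True  # remember the backslash but do not emit it
        else:
            out.append(ch)
    return "".join(out)


print(repr(unescape_sgf_text("\\]b\nc\nd\t\te \n\\]")))
```

Unlike a blanket replace of every backslash, this keeps a backslash that is itself escaped. Each of the two tabs becomes its own space, so the result contains two spaces between d and e.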
I'm very new to Python and am trying to understand how to manipulate strings.
What I want to do is change a string by removing the spaces and alternating the case from upper to lower, i.e. "This is harder than I thought it would be" to "ThIsIsHaRdErThAnItHoUgHtItWoUlDbE"
I've cobbled together some code to remove the spaces (heavily borrowed from here):
string1 = input("Ask user for something.")
nospace = ""
for a in string1:
    if a == " ":
        pass
    else:
        nospace=nospace+a
... but just can't get my head around the caps/lower case part. There are several similar issues on this site and I've tried amending a few of them, with no joy. I realise I need to define a range and iterate through it, but that's where I draw a blank.
for c in nospace[::]:
    d = ""
    c = nospace[:1].lower()
    d = d + c
    c = nospace[:1].upper
    print d
All I am getting is a column of V's. I'm obviously getting this very wrong. Please can someone advise where? Thanks in advance.
Here is a cutesie way to do this:
>>> s = "This is harder than I thought it would be"
>>> from itertools import cycle
>>> funcs = cycle([str.upper, str.lower])
>>> ''.join(next(funcs)(c) for c in s if c != ' ')
'ThIsIsHaRdErThAnItHoUgHtItWoUlDbE'
>>>
Or, as suggested by Moses in the comments, you can use str.isspace, which handles all whitespace characters and not just a single space ' ':
>>> ''.join(next(funcs)(c) for c in s if not c.isspace())
'ThIsIsHaRdErThAnItHoUgHtItWoUlDbE'
This approach only does a single pass on the string. Although, a two-pass method is likely performant enough.
Now, if you were starting with a nospace string, the best way is to convert to some mutable type (e.g. a list) and use slice-assignment notation. It's a little bit inefficient because it builds intermediate data structures, but slicing is fast in Python, so it may be quite performant. You have to ''.join at the end, to bring it back to a string:
>>> nospace
'ThisisharderthanIthoughtitwouldbe'
>>> nospace = list(nospace)
>>> nospace[0::2] = map(str.upper, nospace[0::2])
>>> nospace[1::2] = map(str.lower, nospace[1::2])
>>> ''.join(nospace)
'ThIsIsHaRdErThAnItHoUgHtItWoUlDbE'
>>>
You're trying to do everything at once. Don't. Break your program into steps.
Read the string.
Remove the spaces from the string (as @A.Sherif just demonstrated here)
Go over the string character by character. If the character is in an odd position, convert it to uppercase. Otherwise, convert to lowercase.
Your second loop is where it breaks: nospace is never shortened or indexed by position, so c = nospace[:1] grabs the first character of the string on every iteration, and that is the only character that's ever printed. A solution would be as follows.
string1 = str(input("Ask user for something."))
nospace = ''.join(string1.split(' '))
for i in range(0, len(nospace)):
    if i % 2 == 0:
        print(nospace[i].upper(), end="")
    else:
        print(nospace[i].lower(), end="")
You could also replace the if/else statement with a ternary operator.
for i in range(0, len(nospace)):
    print(nospace[i].upper() if (i % 2 == 0) else nospace[i].lower(), end='')
A final way, using enumerate as suggested in the comments:
for i, c in enumerate(nospace):
    print(c.upper() if (i % 2 == 0) else c.lower(), end='')
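The loops above print one character at a time. If the result is needed as a value, the same enumerate idea can build and return a string instead. A minimal sketch in Python 3; the function name alternate_case is made up for illustration, and it removes only plain spaces, as the question does.

```python
def alternate_case(text):
    """Remove spaces, then uppercase even positions and lowercase odd ones."""
    nospace = text.replace(" ", "")
    return "".join(
        c.upper() if i % 2 == 0 else c.lower()
        for i, c in enumerate(nospace)
    )


print(alternate_case("This is harder than I thought it would be"))
# ThIsIsHaRdErThAnItHoUgHtItWoUlDbE
```

Positions are counted after the spaces are removed, which is why the spaces are stripped first rather than skipped inside the loop.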
I'm trying to move all non-alphanumeric characters to the end of the strings. I am having a hard time with the regex since I don't know where the special characters will be. Here are a couple of simple examples.
hello*there*this*is*a*str*ing*with*asterisks
and&this&is&a&str&ing&&with&ersands&in&i&t
one%mo%refor%good%mea%sure%I%think%you%get%it
How would I go about sliding all the special characters to the end of the string?
Here is what I tried, but I didn't get anything.
re.compile(r'(.+?)(\**)')
r.sub(r'\1\2', string)
Edit:
Expected output for the first string would be:
hellotherethisisastringwithasterisks********
There's no need for regex here. Just use str.isalpha and build up two lists, then join them:
strings = ['hello*there*this*is*a*str*ing*with*asterisks',
           'and&this&is&a&str&ing&&with&ersands&in&i&t',
           'one%mo%refor%good%mea%sure%I%think%you%get%it']
for s in strings:
    a = []
    b = []
    for c in s:
        if c.isalpha():
            a.append(c)
        else:
            b.append(c)
    print(''.join(a+b))
Result:
hellotherethisisastringwithasterisks********
andthisisastringwithampersandsinit&&&&&&&&&&&
onemoreforgoodmeasureIthinkyougetit%%%%%%%%%%
Alternative print() call for Python 3.5 and higher:
print(*a, *b, sep='')
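The two-list approach can also be written as a single stable sort. The sketch below (Python 3) relies on the fact that Python's sorted is stable, so characters with the same key keep their original relative order. It uses str.isalnum rather than str.isalpha because the question asks about non-alphanumeric characters; that swap is my choice, not part of the answer above, and the function name is made up for illustration.

```python
def move_specials_to_end(text):
    """Stable sort: alphanumerics (key False) first, everything else (key True) last."""
    return "".join(sorted(text, key=lambda c: not c.isalnum()))


print(move_specials_to_end("hello*there*this*is*a*str*ing*with*asterisks"))
# hellotherethisisastringwithasterisks********
```

This is O(n log n) rather than the O(n) of the explicit loop, which is unlikely to matter for short strings.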
Here is my proposed solution for this with regex:
import re
def move_nonalpha(string, char):
    # re.escape makes characters such as * match literally
    pattern = re.escape(char)
    char_list = re.findall(pattern, string)
    items = re.split(pattern, string)
    # with no matches, char_list is empty and the string comes back unchanged
    return ''.join(items) + ''.join(char_list)
Usage:
string = "hello*there*this*is*a*str*ing*with*asterisks"
print (move_nonalpha(string,"*"))
Gives me output:
hellotherethisisastringwithasterisks********
I tried with your other input patterns as well and it's working. Hope it'll help.
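The function above needs to be told which character to move. If the special characters are not known in advance, a character class can match anything that is not a letter or digit. A minimal sketch in Python 3, assuming ASCII input; the function name move_nonalnum_to_end and the exact character class are my assumptions, not part of the answer above.

```python
import re

# Anything that is not an ASCII letter or digit counts as "special" here.
SPECIAL = re.compile(r"[^A-Za-z0-9]")


def move_nonalnum_to_end(text):
    """Collect the special characters in order, strip them, and append them."""
    specials = SPECIAL.findall(text)
    kept = SPECIAL.sub("", text)
    return kept + "".join(specials)


print(move_nonalnum_to_end("hello*there*this*is*a*str*ing*with*asterisks"))
# hellotherethisisastringwithasterisks********
```

Mixed special characters keep their original relative order at the end of the string.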
I have a string with many occurrences of a single pattern, like
a = 'eresQQQutnohnQQQjkhjhnmQQQlkj'
and I have another string like
b = 'rerTTTytu'
I want to substitute the entire second string into the first, lining up its 'TTT' with each 'QQQ' in turn, so in this case I expect 3 different results:
'ererTTTytuohnQQQjkhjhnmQQQlkj'
'eresQQQutnrerTTTytujhnmQQQlkj'
'eresQQQutnohnQQQjkhjrerTTTytu'
I've tried using re.sub
re.sub('\w{3}QQQ\w{3}' ,b,a)
but I obtain only the first one, and I don't know how to get the other two solutions.
Edit: As you requested, the two characters surrounding 'QQQ' will be replaced as well now.
I don't know if this is the most elegant or simplest solution for the problem, but it works:
import re
# Find all occurrences of ??QQQ?? in a - where ? is any non-whitespace character
matches = [x.start() for x in re.finditer(r'\S{2}QQQ\S{2}', a)]
# Replace each ??QQQ?? with b
results = [a[:idx] + re.sub(r'\S{2}QQQ\S{2}', b, a[idx:], 1) for idx in matches]
print(results)
Output
['errerTTTytunohnQQQjkhjhnmQQQlkj',
'eresQQQutnorerTTTytuhjhnmQQQlkj',
'eresQQQutnohnQQQjkhjhrerTTTytuj']
Since you didn't specify the output format, I just put it in a list.
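The question's own pattern uses three characters on each side of 'QQQ', and with that pattern the same idea reproduces the three results listed in the question. A minimal sketch in Python 3 that slices around each match object instead of calling re.sub on the tail; the function name replace_each_match is made up for illustration.

```python
import re

a = 'eresQQQutnohnQQQjkhjhnmQQQlkj'
b = 'rerTTTytu'


def replace_each_match(text, pattern, replacement):
    """Return one string per match, each with only that match replaced."""
    return [
        text[:m.start()] + replacement + text[m.end():]
        for m in re.finditer(pattern, text)
    ]


results = replace_each_match(a, r'\w{3}QQQ\w{3}', b)
for r in results:
    print(r)
```

Note that re.finditer only yields non-overlapping matches, so two occurrences of 'QQQ' fewer than three characters apart would not both be found by this sketch.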
I have a bunch of mathematical expressions stored as strings. Here's a short one:
stringy = "((2+2)-(3+5)-6)"
I want to break this string up into a list that contains ONLY the information in each "sub-parenthetical phrase" (I'm sure there's a better way to phrase that.) So my yield would be:
['2+2','3+5']
I have a couple of ideas about how to do this, but I keep running into a "okay, now what" issue.
For example:
for x in stringy:
    substring = stringy[stringy.find('(')+1 : stringy.find(')')+1]
    stringlist.append(substring)
Works just peachy to return 2+2, but that's about as far as it goes, and I am completely blanking on how to move through the remainder...
One way using regex:
import re
stringy = "((2+2)-(3+5)-6)"
for exp in re.findall(r"\(([\s\d+*/-]+)\)", stringy):
    print(exp)
Output
2+2
3+5
You could use regular expressions like the following:
import re
x = "((2+2)-(3+5)-6)"
re.findall(r"(?<=\()[0-9+/*-]+(?=\))", x)
Result:
['2+2', '3+5']
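For comparison, the innermost groups can also be found without regex by scanning the string once and remembering where the most recent opening parenthesis was. A minimal sketch in Python 3; the function name innermost_groups is made up for illustration, and unlike the patterns above it does not restrict which characters may appear inside the parentheses.

```python
def innermost_groups(expr):
    """Return the text of every parenthesised group that contains no nested group."""
    groups = []
    start = None  # index just after the most recent unmatched "("
    for i, ch in enumerate(expr):
        if ch == "(":
            start = i + 1  # a newer "(" replaces any outer one
        elif ch == ")" and start is not None:
            groups.append(expr[start:i])
            start = None  # later ")" characters close outer groups; skip them
    return groups


print(innermost_groups("((2+2)-(3+5)-6)"))  # ['2+2', '3+5']
```

This is the step the original loop was missing: it moves forward through the string instead of calling find from the beginning each time.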