I'm very new to Python and am trying to understand how to manipulate strings.
What I want to do is change a string by removing the spaces and alternating the case between upper and lower, e.g. "This is harder than I thought it would be" to "ThIsIsHaRdErThAnItHoUgHtItWoUlDbE"
I've cobbled together some code to remove the spaces (heavily borrowed from here):
string1 = input("Ask user for something.")
nospace = ""
for a in string1:
    if a == " ":
        pass
    else:
        nospace=nospace+a
... but just can't get my head around the caps/lower case part. There are several similar issues on this site and I've tried amending a few of them, with no joy. I realise I need to define a range and iterate through it, but that's where I draw a blank.
for c in nospace[::]:
    d = ""
    c = nospace[:1].lower()
    d = d + c
    c = nospace[:1].upper
    print d
All I am getting is a column of V's. I'm obviously getting this very wrong. Please can someone advise where? Thanks in advance.
Here is a cutesie way to do this:
>>> s = "This is harder than I thought it would be"
>>> from itertools import cycle
>>> funcs = cycle([str.upper, str.lower])
>>> ''.join(next(funcs)(c) for c in s if c != ' ')
'ThIsIsHaRdErThAnItHoUgHtItWoUlDbE'
>>>
Or, as suggested by Moses in the comments, you can use str.isspace, which handles every whitespace character and not just a single space ' ':
>>> ''.join(next(funcs)(c) for c in s if not c.isspace())
'ThIsIsHaRdErThAnItHoUgHtItWoUlDbE'
This approach makes only a single pass over the string, although a two-pass method is likely fast enough.
Now, if you were starting with a nospace string, the best way is to convert to some mutable type (e.g. a list) and use slice-assignment notation. It's a little bit inefficient because it builds intermediate data structures, but slicing is fast in Python, so it may be quite performant. You have to ''.join at the end, to bring it back to a string:
>>> nospace
'ThisisharderthanIthoughtitwouldbe'
>>> nospace = list(nospace)
>>> nospace[0::2] = map(str.upper, nospace[0::2])
>>> nospace[1::2] = map(str.lower, nospace[1::2])
>>> ''.join(nospace)
'ThIsIsHaRdErThAnItHoUgHtItWoUlDbE'
>>>
You're trying to do everything at once. Don't. Break your program into steps.
Read the string.
Remove the spaces from the string (as @A.Sherif just demonstrated here)
Go over the string character by character. If the character is in an odd position, convert it to uppercase. Otherwise, convert to lowercase.
Your second loop is where it breaks: nospace is never shortened, so nospace[:1] always grabs the first character of the string, and that is the only character that ever gets printed. A solution would be as follows.
string1 = str(input("Ask user for something."))
nospace = ''.join(string1.split(' '))
for i in range(0, len(nospace)):
    if i % 2 == 0:
        print(nospace[i].upper(), end="")
    else:
        print(nospace[i].lower(), end="")
You could also replace the if/else statement with a ternary operator.
for i in range(0, len(nospace)):
    print(nospace[i].upper() if (i % 2 == 0) else nospace[i].lower(), end='')
A final way, using enumerate as suggested in the comments:
for i, c in enumerate(nospace):
    print(c.upper() if (i % 2 == 0) else c.lower(), end='')
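If you would rather get the result back as a string than print it piece by piece, the same enumerate idea can be wrapped in a function. This is only a minimal sketch; the name alternate_case is made up for illustration and does not come from the answers above.

```python
def alternate_case(text):
    """Drop spaces, then upper-case even indexes and lower-case odd ones."""
    nospace = text.replace(" ", "")
    return "".join(
        c.upper() if i % 2 == 0 else c.lower()
        for i, c in enumerate(nospace)
    )


print(alternate_case("This is harder than I thought it would be"))
```

Building the string with join avoids the repeated end="" prints and makes the result easy to reuse or test.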
I am trying to pull a substring out of a function result, but I'm having trouble figuring out the best way to strip the necessary string out using Python.
Output Example:
[<THIS STRING-STRING-STRING THAT THESE THOSE>]
In this example, I would like to grab "STRING-STRING-STRING" and throw away all the rest of the output. In this example, "[<THIS " &" THAT THESE THOSE>]" are static.
Many many ways to solve this. Here are two examples:
First one is a simple replacement of your unwanted characters.
targetstring = '[<THIS STRING-STRING-STRING THAT THESE THOSE>]'
#ALTERNATIVE 1
newstring = targetstring.replace(r" THAT THESE THOSE>]", '').replace(r"[<THIS ", '')
print(newstring)
and this drops everything except your target pattern:
#ALTERNATIVE 2
match = "STRING-STRING-STRING"
start = targetstring.find(match)
stop = len(match)
targetstring[start:start+stop]
These can be shortened but thought it might be useful for OP to have them written out.
I found this extremely useful, might be of help to you as well: https://www.computerhope.com/issues/ch001721.htm
If by '"[<THIS " &" THAT THESE THOSE>]" are static' you mean that they are always the exact same string, then:
s = "[<THIS STRING-STRING-STRING THAT THESE THOSE>]"
before = len("[<THIS ")
after = len(" THAT THESE THOSE>]")
s[before:-after]
# 'STRING-STRING-STRING'
Like so (as long as the position of the characters in the string doesn't change):
myString = "[<THIS STRING-STRING-STRING THAT THESE THOSE>]"
myString = myString[7:27]
Another alternative method:
import re
my_str = "[<THIS STRING-STRING-STRING THAT THESE THOSE>]"
string_pos = [(m.start(), m.end()) for m in re.finditer('STRING-STRING-STRING', my_str)]
start, end = string_pos[0]
# m.end() is already exclusive, so slice with end rather than end + 1
print(my_str[start:end])
STRING-STRING-STRING
If STRING-STRING-STRING occurs multiple times in the string, the start and end indexes of each occurrence will be given as tuples in string_pos.
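When the middle part is not known in advance but the surrounding text is static, a regex with a capturing group between the two fixed pieces may be closer to what the question asks for. A rough sketch, assuming the prefix and suffix are exactly the strings from the question; the function name extract_middle is hypothetical.

```python
import re


def extract_middle(text, prefix="[<THIS ", suffix=" THAT THESE THOSE>]"):
    """Return whatever sits between the static prefix and suffix, or None."""
    # re.escape keeps characters such as [ and ] from being read as regex syntax
    pattern = re.escape(prefix) + r"(.*?)" + re.escape(suffix)
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(extract_middle("[<THIS STRING-STRING-STRING THAT THESE THOSE>]"))
```

Returning None when nothing matches lets the caller decide how to handle unexpected output.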
I'm working on an exercism.io exercise in Python where one of the tests requires that I convert an SGF value with escape characters into one without. I don't know why they leave newline characters intact, however.
input_val = "\\]b\nc\nd\t\te \n\\]"
output_val = "]b\nc\nd e \n]"
I tried some codecs and ast functions to no avail. Any suggestions? Thanks in advance.
The purpose of your exercise is unclear, but the solution is trivial:
input_val.replace("\\", "").replace("\t", " ")
You can use this code:
def no_escapes(text): # get text argument
    # get a list of strings split with \ and join them together without it
    text = text.split('\\')
    text = [t.split('\t') for t in text]
    text = [i for t in text for i in t]
    return ''.join(text)
It will first split "\\]b\nc\nd\t\te \n\\]" on the backslash into ["", "]b\nc\nd\t\te \n", "]"]. It'll then split each piece on the tab into [[""], ["]b\nc\nd", "", "e \n"], ["]"]]. Next, it'll flatten that into ["", "]b\nc\nd", "", "e \n", "]"] and join the pieces with nothing between them, so you'll end up with "]b\nc\nde \n]". Note that the tabs are removed outright rather than replaced with spaces.
Example:
>>> print(no_escapes('\\yeet\nlol\\'))
yeet
lol
And if you want it raw:
>>> string = no_escapes('\\yeet\nlol\\')
>>> print(f'{string!r}')
'yeet\nlol'
After looking at the SGF text value rules here, which say 'all whitespaces except line breaks become spaces,' I came up with this solution. Oddly, they don't say that '\\' characters should be erased. Not sure if there's a cleaner way to do this?
import re
s = '\\]b\nc\nd\t\te \n\\]'
# s contains single backslashes, so replace '\\' (one backslash), not r'\\' (two)
r = re.sub(r'[^\S\n]', ' ', s).replace('\\', '')
print(repr(r))
# ']b\nc\nd  e \n]'
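One possible refinement is to drop each backslash while keeping the character it escapes, so that an escaped backslash survives. This is a sketch under the assumption that a backslash simply makes the next character literal; that is my reading and is not confirmed by the exercise text quoted here, and the name unescape_sgf_text is made up.

```python
import re


def unescape_sgf_text(value):
    """Remove escaping backslashes, then turn non-newline whitespace into spaces."""
    # Assumption: "\x" means a literal "x" for any character x
    unescaped = re.sub(r"\\(.)", r"\1", value, flags=re.DOTALL)
    # [^\S\n] matches any whitespace character except a newline
    return re.sub(r"[^\S\n]", " ", unescaped)


print(repr(unescape_sgf_text("\\]b\nc\nd\t\te \n\\]")))
```

Unlike a plain replace("\\", ""), this keeps one backslash when the input contains an escaped backslash.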
I'm trying to move all non-alphanumeric characters to the end of each string. I am having a hard time with the regex since I don't know where the special characters will be. Here are a couple of simple examples.
hello*there*this*is*a*str*ing*with*asterisks
and&this&is&a&str&ing&&with&ampersands&in&i&t
one%mo%refor%good%mea%sure%I%think%you%get%it
How would I go about sliding all the special characters to the end of the string?
Here is what I tried, but I didn't get anything.
r = re.compile(r'(.+?)(\**)')
r.sub(r'\1\2', string)
Edit:
Expected output for the first string would be:
hellotherethisisastringwithasterisks********
There's no need for regex here. Just use str.isalpha and build up two lists, then join them:
strings = ['hello*there*this*is*a*str*ing*with*asterisks',
           'and&this&is&a&str&ing&&with&ampersands&in&i&t',
           'one%mo%refor%good%mea%sure%I%think%you%get%it']
for s in strings:
    a = []
    b = []
    for c in s:
        if c.isalpha():
            a.append(c)
        else:
            b.append(c)
    print(''.join(a+b))
Result:
hellotherethisisastringwithasterisks********
andthisisastringwithampersandsinit&&&&&&&&&&&
onemoreforgoodmeasureIthinkyougetit%%%%%%%%%%
Alternative print() call for Python 3.5 and higher:
print(*a, *b, sep='')
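The two-list idea can also be expressed with a stable sort, since Python's sorted keeps the relative order of items that compare equal. A small sketch, using str.isalnum because the question talks about non-alphanumeric characters; the function name is hypothetical.

```python
def move_specials_to_end(text):
    """Move non-alphanumeric characters to the end, preserving relative order."""
    # The key is False for letters and digits and True for everything else.
    # False sorts before True, and the sort is stable within each group.
    return "".join(sorted(text, key=lambda c: not c.isalnum()))


print(move_specials_to_end("hello*there*this*is*a*str*ing*with*asterisks"))
```

This is shorter than the explicit loop, though the loop is linear while the sort is O(n log n), which only matters for very long strings.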
Here is my proposed solution for this with regex:
import re
def move_nonalpha(string,char):
    pattern = re.escape(char)  # escape so characters like * are matched literally
    char_list = re.findall(pattern,string)
    if len(char_list)>0:
        items = re.split(pattern,string)
        return ''.join(items)+''.join(char_list)
    return string  # nothing to move
Usage:
string = "hello*there*this*is*a*str*ing*with*asterisks"
print (move_nonalpha(string,"*"))
Gives me output:
hellotherethisisastringwithasterisks********
I tried with your other input patterns as well and it's working. Hope it'll help.
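If the special character is not known up front, a character class can stand in for it. A hedged sketch that treats anything outside A-Z, a-z and 0-9 as special; that definition of "special" is an assumption, and move_specials is an illustrative name.

```python
import re

# Assumption: "special" means anything that is not an ASCII letter or digit
SPECIAL = re.compile(r"[^A-Za-z0-9]")


def move_specials(text):
    """Strip the special characters out and append them at the end."""
    specials = SPECIAL.findall(text)
    kept = SPECIAL.sub("", text)
    return kept + "".join(specials)


print(move_specials("one%mo%refor%good%mea%sure%I%think%you%get%it"))
```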
This is what I have so far, it is not replacing the C with a G for some reason. Why might this be?
DNASeq=raw_input("Enter the DNA sequence here: ")
DNASeq=DNASeq.upper().replace(" ","")
reverse=DNASeq[::-1]
print reverse.replace('A','U').replace('T','A').replace('C','G').replace('G','C')
The problem is that you replace C with G then G with C. One simple way to prevent you going from one to the other would be to replace C with g, so it wouldn't then go back to C, then uppercase the result:
gattaca="GATTACA"
rev = gattaca[::-1]
print rev.replace('A','u').replace('T','a').replace('C','g').replace('G','c').upper()
This correctly outputs UGUAAUC instead of UCUAAUC as your example did.
UPDATE
The more Pythonic way, though, and avoiding the case-based hack, and being more efficient as the string doesn't need to be scanned five times, and being more obvious as to the purpose, would be:
from string import maketrans
transtab = maketrans("ATCG", "UAGC")
print rev.translate(transtab)
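In Python 3 the string module no longer provides maketrans; the equivalent is the static method str.maketrans. A minimal sketch of the same idea for Python 3, with a made-up function name:

```python
def reverse_complement_rna(dna):
    """Reverse the sequence and map A->U, T->A, C->G, G->C in one pass."""
    table = str.maketrans("ATCG", "UAGC")
    cleaned = dna.upper().replace(" ", "")
    return cleaned[::-1].translate(table)


print(reverse_complement_rna("GATTACA"))
```

Because translate looks up every character in the table exactly once, a C that becomes G is never turned back into C.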
As pointed out already, the problem is that you are replacing ALL Cs with Gs first. I wanted to throw in this approach because I think it would be the most efficient:
>>> complement = {'A':'U', 'G':'C', 'C':'G','T':'A'}
>>> seq = "GATTACA"
>>> "".join(complement[c] for c in seq)[::-1]
'UGUAAUC'
>>>
Your code is almost correct,
but one thing you need to understand is that you first replace C with G and then replace G with C again.
.replace('C','G').replace('G','C')
just remove .replace('G','C') from your code and everything works fine.
here is the correct code:
DNASeq=raw_input("Enter the DNA sequence here: ")
DNASeq=DNASeq.upper().replace(" ","")
reverse=DNASeq[::-1]
print reverse.replace('A','U').replace('T','A').replace('C','G').replace('G','C')
What is the easiest way to "interpret" formatting control characters in a string, to show the results as if they were printed. For simplicity, I will assume there are no newlines in the string.
So for example,
>>> sys.stdout.write('foo\br')
shows for, therefore
interpret('foo\br') should be 'for'
>>> sys.stdout.write('foo\rbar')
shows bar, therefore
interpret('foo\rbar') should be 'bar'
I can write a regular expression substitution here, but, in the case of '\b' replacement, it would have to be applied recursively until there are no more occurrences. It would be quite complex if done without recursion.
Is there an easier way?
If efficiency doesn't matter, a simple stack would work fine:
string = "foo\rbar\rbash\rboo\b\bba\br"
res = []
for char in string:
    if char == "\r":
        res.clear()
    elif char == "\b":
        if res: del res[-1]
    else:
        res.append(char)
"".join(res)
#>>> 'bbr'
Otherwise, I think this is about as fast as you can hope for in complex cases:
string = "foo\rbar\rbash\rboo\b\bba\br"
try:
    string = string[string.rindex("\r")+1:]
except ValueError:
    pass
split_iter = iter(string.split("\b"))
res = list(next(split_iter, ''))
for part in split_iter:
    if res: del res[-1]
    res.extend(part)
"".join(res)
#>>> 'bbr'
Note that I haven't timed this.
Python does not have any built-in or standard library module for doing this.
However if you only care for simple control characters like \r, \b and \n you can write a simple function to handle this:
def interpret(text):
    lines = []
    current_line = []
    for char in text:
        if char == '\n':
            lines.append(''.join(current_line))
            current_line = []
        elif char == '\r':
            current_line.clear()
            # del current_line[:]  # in old python versions
        elif char == '\b':
            del current_line[-1:]
        else:
            current_line.append(char)
    if current_line:
        # join the characters first, so that lines only ever holds strings
        lines.append(''.join(current_line))
    return '\n'.join(lines)
You can extend the function handling any control character you want. For example you might want to ignore some control characters that don't get actually displayed in a terminal (e.g. the bell \a)
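As an illustration of that kind of extension, here is a sketch of the same loop with a set of characters to skip. The ignored parameter and its default (just the bell) are assumptions made for this example.

```python
def interpret(text, ignored=frozenset("\a")):
    """Apply \\n, \\r and \\b to text, skipping any character in ignored."""
    lines = []
    current_line = []
    for char in text:
        if char in ignored:
            continue  # e.g. the bell, which a terminal would not display
        if char == "\n":
            lines.append("".join(current_line))
            current_line = []
        elif char == "\r":
            current_line.clear()
        elif char == "\b":
            del current_line[-1:]  # no error if the line is already empty
        else:
            current_line.append(char)
    if current_line:
        lines.append("".join(current_line))
    return "\n".join(lines)


print(interpret("foo\br"))
print(interpret("foo\rbar"))
```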
UPDATE: after 30 minutes of asking for clarifications and an example string, we find the question is actually quite different: "How to repeatedly apply formatting control characters (backspace) to a Python string?"
In that case yes you apparently need to apply the regex/fn repeatedly until you stop getting matches.
SOLUTION:
import re
def repeated_re_sub(pattern, sub, s, flags=re.U):
    """Match-and-replace repeatedly until we run out of matches..."""
    patc = re.compile(pattern, flags)
    sold = ''
    while sold != s:
        sold = s
        print("patc=>%s< sold=>%s< s=>%s<" % (patc,sold,s))
        s = patc.sub(sub, sold)
        #print help(patc.sub)
    return s
print(repeated_re_sub('[^\b]\b', '', 'abc\b\x08de\b\bfg'))
#print repeated_re_sub('.\b', '', 'abcd\b\x08e\b\bfg')
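The same repeat-until-stable idea can be written without the debug output by using re.subn, which returns the number of substitutions made. A minimal Python 3 sketch; the function name apply_backspaces is hypothetical.

```python
import re

# One non-backspace character followed by one backspace.
# In a normal (non-raw) string literal, "\b" is the backspace character itself.
BACKSPACE_PAIR = re.compile("[^\b]\b")


def apply_backspaces(s):
    """Delete char+backspace pairs repeatedly until no pair is left."""
    while True:
        s, count = BACKSPACE_PAIR.subn("", s)
        if count == 0:
            return s


print(apply_backspaces("abc\b\x08de\b\bfg"))
```

A backspace with nothing before it is left in the result, since there is no character for it to erase.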
[multiple previous answers, asking for clarifications and pointing out that both re.sub(...) or string.replace(...) could be used to solve the problem, non-recursively.]