I have strings like so: hey what is up!, "what did you say?", "he said 'well'", etc. and a regex expression like so: [!%&'\(\)$#\"\/\\*+,-.:;<=>?#\[\]^_´{|}~]´. These are my delimiters and into the strings shown a space shall be inserted like so: "hey what is up !", "what did you say ?", "he said ' well '". So if one of the delimiters is in front of another character sequence, add a space, and if its is after, add space as well.
How can I achieve this? I do not want to split by these delimiters.
Here's my solution but I would be curious how to solve it with regex.
space = set("[!%&'()$#\"/\*+,-.:;<=>?#[]^_´`{|}~]")
for sent in self.sentences:
sent = list(sent)
for i, char in enumerate(sent):
# Make sure to respect length of string when indexing
if i != 0:
# insert space in front if char is punctuation
if sent[i] in space and sent[i - 1] != " ":
sent.insert(i, " ")
if i != len(sent)-1:
# insert space after if char is punctuation
if sent[i] in space and sent[i + 1] != " ":
sent.insert(i + 1, " ")
You could expand your pattern to catch optional spaces and then replace by capture group plus spaces before and after (loop only for demo, not neccessary):
import re
strings = ["hey what is up!", "what did you say?", "he said 'well'"]
pattern = r'(\s?[!%&\'\(\)$#\"\/\\*+,-.:;<=>?#\[\]^_´{|}~]\s?)'
for string in strings:
print(re.sub(pattern, r' \1 ', string))
This will give this output:
hey what is up !
what did you say ?
he said ' well '
Without the aid of the re module you could simply do this:
punctuation = "!%&'()$#\"/\\*+,-.:;<=>?#[]^_´{|}~"
mystring = "Well hello! How are you?"
mylist = list(mystring)
i = 0
for c in mystring:
if c in punctuation:
mylist.insert(i, ' ')
i += 2
else:
i += 1
print(''.join(mylist))
You can make a loop that goes through your strings and when it finds a ponctuation character use the slice function to cut your string in half and concatenate with a space in between.
For example:
for i in yourString:
if yourString[i] == '!':
newString = yourString.slice(0, i) + " " + yourString.slice(i + 1)
It only checks for "!" but you could replace it with a dictionnary of ponctuation characters
Related
Hello is use some method like .isupper() in a loop, or string[i+1] to find my lower char but i don't know how to do that
input in function -> "ThisIsMyChar"
expected -> "This is my char"
I´ve done it with regex, could be done with less code but my intention is readable
import re
def split_by_upper(input_string):
pattern = r'[A-Z][a-z]*'
matches = re.findall(pattern, input_string)
if (matches):
output = matches[0]
for word in matches[1:]:
output += ' ' + word[0].lower() + word[1:]
return output
else:
return input_string
print(split_by_upper("ThisIsMyChar"))
>> split_by_upper() -> "This is my char"
You could use re.findall and str.lower:
>>> import re
>>> s = 'ThisIsMyChar'
>>> ' '.join(w.lower() if i >= 1 else w for i, w in enumerate(re.findall('.[^A-Z]*', s)))
'This is my char'
You should first try by yourself. If you didn't get it done, you can do something like this:
# to parse input string
def parse(str):
result= "" + str[0];
for i in range(1, len(str)):
ch = str[i]
if ch.isupper():
result += " ";
result += ch.lower();
return result;
# input string
str = "ThisIsMyChar";
print(parse(str))
First you need to run a for loop and check for Uppercase words then when you find it just add a space at the starting, lower the word and increment it to your new string. Simple, more code is explained in comments in the code itself.
def AddSpaceInTitleCaseString(string):
NewStr = ""
# Check for Uppercase string in the input string char-by-char.
for i in string:
# If it found one, add it to the NewStr variable with a space and lowering it's case.
if i.isupper(): NewStr += f" {i.lower()}"
# Else just add it as usual.
else: NewStr += i
# Before returning the NewStr, remove all the leading and trailing spaces from it.
# And as shown in your question I'm assuming that you want the first letter or your new sentence,
# to be in uppercase so just use 'capitalize' function for it.
return NewStr.strip().capitalize()
# Test.
MyStr = AddSpaceInTitleCaseString("ThisIsMyChar")
print(MyStr)
# Output: "This is my char"
Hope it helped :)
Here is a concise regex solution:
import re
capital_letter_pattern = re.compile(r'(?!^)[A-Z]')
def add_spaces(string):
return capital_letter_pattern.sub(lambda match: ' ' + match[0].lower(), string)
if __name__ == '__main__':
print(add_spaces('ThisIsMyChar'))
The pattern searches for capital letters ([A-Z]), and the (?!^) is negative lookahead that excludes the first character of the input ((?!foo) means "don't match foo, ^ is "start of line", so (?!^) is "don't match start of line").
The .sub(...) method of a pattern is usually used like pattern.sub('new text', 'my input string that I want changed'). You can also use a function in place of 'new text', in which case the function is called with the match object as an argument, and the value returned by the function is used as the replacement string.
The expression capital_letter_pattern.sub(lambda match: ' ' + match[0].lower(), string) replaces all matches (all capital letters except at the start of the line) using a lambda function to add a space before and make the letter lowercase. match[0] means "the entirety of the matched text", which in this case is the captial letter.
You can split it via Regex using r"(?<!^)(?=[A-Z])" pattern:
import re
txt = 'ThisIsMyChar'
c = re.compile(r"(?<!^)(?=[A-Z])")
first, *rest = map(str.lower, c.split(txt))
print(f'{first.title()} {" ".join(rest)}')
Pattern explanation:
(?<!^) checks to see if it is not at the beginning.
(?=[A-Z]) checks to see there a capital letter after it.
note These are non-capturing groups.
This question already has answers here:
Split a string at uppercase letters
(22 answers)
Closed 2 years ago.
I am trying to make a script that will accept a string as input in which all of the words are run together, but the first character of each word is uppercase. It should convert the string to a string in which the words are separated by spaces and only the first word starts with an uppercase letter.
For Example (The Input):
"StopWhateverYouAreDoingInterestingIDontCare"
The expected output:
"Stop whatever you are doing interesting I dont care"
Here is the one I wrote so far:
string_input = "StopWhateverYouAreDoingInterestingIDontCare"
def organize_string():
start_sentence = string_input[0]
index_of_i = string_input.index("I")
for i in string_input[1:]:
if i == "I" and string_input[index_of_i + 1].isupper():
start_sentence += ' ' + i
elif i.isupper():
start_sentence += ' ' + i.lower()
else:
start_sentence += i
return start_sentence
While this takes care of some parts, I am struggling with differentiating if the letter "I" is single or a whole word. Here is my output:
"Stop whatever you are doing interesting i dont care"
Single "I" needs to be uppercased, while the "I" in the word "Interesting" should be lowercased "interesting".
I will really appreciate all the help!
A regular expression will do in this example.
import re
s = "StopWhateverYouAreDoingInterestingIDontCare"
t = re.sub(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z])', ' ', s)
Explained:
(?<=[a-z])(?=[A-Z]) - a lookbehind for a lowercase letter followed by a lookahead uppercase letter
| - (signifies or)
(?<=[A-Z])(?=[A-Z]) - a lookbehind for a uppercase letter followed by a lookahead uppercase letter
This regex substitutes a space when there is a lowercase letter followed by an uppercase letter, OR, when there is an uppercase letter followed by an uppercase letter.
UPDATE: This doesn't correctly lowercase the words (with the exception of I and the first_word)
UPDATE2: The fix to this is:
import re
s = "StopWhateverYouAreDoingInterestingIDontCare"
first_word, *rest = re.split(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z])', s)
rest = [word.lower() if word != 'I' else word for word in rest]
print(first_word, ' '.join(rest))
Prints:
Stop whatever you are doing interesting I dont care
Update 3: I looked at why your code failed to correctly form the sentence (which I should have done in the first place instead of posting my own solution :-)).
Here is the corrected code with some remarks about the changes.
string_input = "StopWhateverYouAreDoingInterestingIDontCare"
def organize_string():
start_sentence = string_input[0]
#index_of_i = string_input.index("I")
for i, char in enumerate(string_input[1:], start=1):
if char == "I" and string_input[i + 1].isupper():
start_sentence += ' ' + char
elif char.isupper():
start_sentence += ' ' + char.lower()
else:
start_sentence += char
return start_sentence
print(organize_string())
!. I commented out the line index_of_i = string_input.index("I") as it doesn't do what you need (it finds the index of the first capital I and not an I that should stand alone (it finds the index of the I in Interesting instead of the IDont further in the string_input string). It is not a correct statement.
for i, char in enumerate(string_input[1:], 1) enumerate states the index of the letters in the string starting at 1 (since string_input[1:] starts at index 1 so they are in sync). i is the index of a letter in string_input.
I changed the i's to char to make it clearer that char is the character. Other than these changes, the code stands as you wrote it.
Now the program gives the correct output.
string_input = "StopWhateverYouAreDoingInterestingIDontCare"
counter = 1
def organize_string():
global counter
start_sentence = string_input[0]
for i in string_input[1:]:
if i == "I" and string_input[counter+1].isupper():
start_sentence += ' ' + i
elif i.isupper():
start_sentence += ' ' + i.lower()
else:
start_sentence += i
counter += 1
print(start_sentence)
organize_string()
I made some changes to your program. I used a counter to check the index position. I get your expected output:
Stop whatever you are doing interesting I dont care
s = 'StopWhateverYouAreDoingInterestingIDontCare'
ss = ' '
res = ''.join(ss + x if x.isupper() else x for x in s).strip(ss).split(ss)
sr = ''
for w in res:
sr = sr + w.lower() + ' '
print(sr[0].upper() + sr[1:])
output
Stop whatever you are doing interesting i dont care
I hope this will work fine :-
string_input = "StopWhateverYouAreDoingInterestingIDontCare"
def organize_string():
i=0
while i<len(string_input):
if string_input[i]==string_input[i].upper() and i==0 :
print(' ',end='')
print(string_input[i].upper(),end='')
elif string_input[i]==string_input[i].upper() and string_input[i+1]==string_input[i+1].upper():
print(' ',end='')
print(string_input[i].upper(),end='')
elif string_input[i]==string_input[i].upper() and i!=0:
print(' ',end='')
print(string_input[i].lower(),end='')
if string_input[i]!=string_input[i].upper():
print(string_input[i],end='')
i=i+1
organize_string()
Here is one solution utilising the re package to split the string based on the upper case characters. [Docs]
import re
text = "StopWhateverYouAreDoingInterestingIDontCare"
# Split text by upper character
text_splitted = re.split('([A-Z])', text)
print(text_splitted)
As we see in the output below the separator (The upper case character) and the text before and after is kept. This means that the upper case character is always followed by the rest of the word. The empty first string originates from the first upper case character, which is the first separator.
# Output of print
[
'',
'S', 'top',
'W', 'hatever',
'Y', 'ou',
'A', 're',
'D', 'oing',
'I', 'nteresting',
'I', '',
'D', 'ont',
'C', 'are'
]
As we have seen the first character is always followed by the rest of the word. By combining the two we have the splitted words. This also allows us to easily handle your special case with the I
# Remove first character because it is always empty if first char is always upper
text_splitted = text_splitted[1:]
result = []
for i in range(0, len(text_splitted), 2):
word = text_splitted[i]+text_splitted[i+1]
if (i > 0) and (word != 'I') :
word = word.lower()
result.append(word)
result = ' '.join(result)
split the sentence into individual words. If you find the word "I" in this list, leave it alone. Leave the first word alone. All of the other words, you cast to lower case.
You have to use some string manipulation like this:
output=string_input[0]
for l in string_input[1:]:
if l.islower():
new_s+=l
else:
new_s+=' '+l.lower()
print(output)
So I have, p.e, this string: ' I love python ' and I want to convert all the spaces to '_'. My problem is that I also need to delete the outside spaces so I dont finish with the result: '_I_love_python__' and more like this 'I_love_python'
I searched and found out that I can develop it with a single line of code mystring.strip().replace(" ", "_") which is unfortunaly is sintax that I cant apply in my essay.
So what I landed with was this:
frase= str(input('Introduza: '))
aux=''
for car in frase:
if car==' ':
car='_'
aux+=car
else:
aux+=car
print(aux)
My problem now is on deleting those outside spaces. What I thought about was runing another for i in in the start and another on the final of the string and to stop until they found a non space caracter. But unfortunaly I havent been able to do that...
Apreciate all the help you can suply!
I came up with following solution:
You iterate over the string, but instead of replacing the space with underscore as soon as it appears, you store the amount of spaces encountered. Then, once a non-space-character is reached, you add the amount of spaces found to the string. So if the string ends with lots of spaces, it will never reach a non-space-character and therefore never add the underscores.
For cutting off the spaces at the beginning, I just added a condition to add the underscores being: "Have I encountered a non-space-character before?"
Here is the code:
text = " I love python e "
out = ""
string_started = False
underscores_to_add = 0
for c in text:
if c == " ":
underscores_to_add += 1
else:
if string_started:
out += "_" * underscores_to_add
underscores_to_add = 0
string_started = True
out += c
print(out) # prints "I_love___python____e"
You can use the following trick to remove leading and trailing spaces in your string:
s = ' I love python '
ind1 = min(len(s) if c == ' ' else n for n, c in enumerate(s))
ind2 = max(0 if c == ' ' else n for n, c in enumerate(s))
s = ''.join('_' if c == ' ' else c for c in s[ind1:ind2 + 1])
print('*', s, '*', sep='')
Output:
*I_love_python*
If you are not allowed to use strip() method
def find(text):
for i, s in enumerate(text):
if s != " ":
break
return i
text = " I love python e "
text[find(text):len(text)-find(text[::-1])].replace(" ","_")
texts = [" I love python e ","I love python e"," I love python e","I love python e ", "I love python e"]
for text in texts:
print (text[find(text):len(text)-find(text[::-1])].replace(" ","_"))
output:
I_love___python____e
I_love___python____e
I_love___python____e
I_love___python____e
I_love___python____e
Given a string find will find the first non space character in the string
Use find to find the first nonspace character and the last nonspace character
Get the substring using above found indices
Replace all spaces with _ in the above substring
I stuck on the following lab exercise:
The first piece we need is a routine that, given a word, will in
someway jumble up the order of all but the first and the last
characters. Rather than just randomly moving the characters around we
will reverse the order of the letters. The following code achieves
this:
def jumble(x):
return x[len(x)::-1]
my_string="Alistair"
print(" Reverse ",jumble(my_string))
Copy the above code to a file and run it. Currently it reverses the
order of all the characters in "my_string". Modify the code so that
the first and last letters of the word are NOT reversed. That is,
instead of producing "riatsilA" it produces "Aiatsilr".
This is my code for the above part:
def jumble(x):
temp0=x[0]
temp_last=x[-1]
x_new=temp0 + x[-2:0:-1] + temp_last
return x_new
my_string="Alistair"
print(" Reverse ",jumble(my_string))
The above routine does not account for leading or trailing white
space, punctuation or other characters that might legitimately be part
of the character string, but that should not be jumbled up. For
example if the string were " Alistair, " the result should be
" riatsilA, ". Modify your routine so that only the FIRST contiguous
series of alphabetical characters (minus the first and last
characters) are reversed. Ensure that the final returned string
includes all other leading and trailing characters.
I am not sure how to do this, as white space and punctuations can happen anywhere, I am thinking about have two lists, one for empty space and punctuations while another one for "contigous series of alphabetical characters", using append method to append elements to each list, but not sure how to preserve index.Can someone help me out? Thanks in advance.
#!/usr/bin/env python
#-*-coding:utf-8-*-
import re
def reverse(s):
p = re.compile(r'\w+')
for m in p.finditer(s):
word = m.group()
if word:
p = s.partition(word)
l = list(p)
index = p.index(word)
l[index] = l[index][::-1]
l2 = list(l[index])
l2[0],l2[-1]=l2[-1],l2[0]
l[index]=''.join(l2)
s = ''.join(l)
return s
s="Alistair Chris,"
print reverse(s)
fixed it.
The following code snipplet might help you for the last task.
There's no special handling for reverting of subsets, if a special char is found somewhere else than at start or end of the string.
# Special chars which should be ignored for reverting
SPECIALCHARS = [' ', '.', ',']
def reverse( string_ ):
# Find occurence of 'special' chars. Stack position and char into a list.
specchar = [( i, ltr ) for i, ltr in enumerate( string_ ) if ltr in SPECIALCHARS]
if specchar:
# Remove all the special characters
newstring = ''.join( c for c in string_ if c not in SPECIALCHARS )
# Reverse the legal characters
newstring = newstring[::-1]
offset = 0
# Re-insert the removed special chars
for pos, char in specchar:
if pos + 1 + offset >= len( newstring ):
# Append at the end
newstring += char
else:
# Insert
newstring = newstring[:pos + offset] + char + newstring[pos + offset:]
offset += 1
return newstring
else: # No special char at all, so just revert
return string_[::-1]
print " '%s' =?= ' riatsilA, '" % ( reverse( " Alistair, " ) )
will lead to the following output: ' riatsilA, ' =?= ' riatsilA, '
Just reverting with ignoring the first and last char is a one liner.
Using this one to invert the 'middle': t[1:-1][::-1]
Then add the first and last char to it.
>>> t = "Alistair"
>>> t[0]+t[1:-1][::-1]+t[-1]
'Aiatsilr'
Edit
Ok, now I think I understood what you want :-)
newstr = ""
fullstr = ""
for char in mystr:
if char in SPECIALCHARS:
if len( newstr ): # Is already something to invert there?
#Ignore 1st and last char and revert the rest
newstr = newstr[0] + newstr[1:-1][::-1] + newstr[-1]
fullstr += newstr + char
else: # Special char found but nothing to revert so far
fullstr += char
newstr = ""
else: # No special char so just append
newstr += char
print "'%s'" % fullstr
I have couple of strings (each string is a set of words) which has special characters in them. I know using strip() function, we can remove all occurrences of only one specific character from any string. Now, I would like to remove set of special characters (include !##%&*()[]{}/?<> ) etc.
What is the best way you can get these unwanted characters removed from the strings.
in-str = "#John, It's a fantastic #week-end%, How about () you"
out-str = "John, It's a fantastic week-end, How about you"
import string
s = "#John, It's a fantastic #week-end%, How about () you"
for c in "!##%&*()[]{}/?<>":
s = string.replace(s, c, "")
print s
prints "John, It's a fantastic week-end, How about you"
The strip function removes only leading and trailing characters.
For your purpose I would use python set to store your characters, iterate over your input string and create new string from characters not present in the set. According to other stackoverflow article this should be efficient. At the end, just remove double spaces by clever " ".join(output_string.split()) construction.
char_set = set("!##%&*()[]{}/?<>")
input_string = "#John, It's a fantastic #week-end%, How about () you"
output_string = ""
for i in range(0, len(input_string)):
if not input_string[i] in char_set:
output_string += input_string[i]
output_string = " ".join(output_string.split())
print output_string
Try out this:
import re
foo = 'a..!b...c???d;;'
chars = [',', '!', '.', ';', '?']
print re.sub('[%s]' % ''.join(chars), '', foo)
I presume that this is what you wanted.
try
s = "#John, It's a fantastic #week-end%, How about () you"
chars = "!##%&*()[]{}/?<>"
s_no_chars = "".join([k for k in s if k not in chars])
s_no_chars_spaces = " ".join([ d for d in "".join([k for k in s if k not in chars]).split(" ") if d])