I am working on a data science project and I have an issue. I have an array full of string like the following string and I want to add a space between the words and between the special characters
sentence[i] = 'This is a⓵⓶⓷string'
and I expect something like that:
sentence[i] = 'This is a ⓵ ⓶ ⓷ string'
My last try:
l=[]
for i in lines:
for j in i:
if j.isalpha() == False:
l.append(i.split())
else:
l.append(i)
print(l)
for i in l:
s = ' '.join(i)
You could simply scan the complete line and selectively add space for each character that is neither alphabet nor a space.
s = 'This is a⓵⓶⓷string'
t = ''
for x in s :
if not str.isalpha(x) and x != ' ' :
if t[-1] != ' ':
t+= ' '
t += x
t += ' '
else: t += x
this works for example you have given.
Use regular expressions to accomplish that task, specifically re.sub together with a backreference in order to surround the matched characters with spaces.
Related
I have strings like so: hey what is up!, "what did you say?", "he said 'well'", etc. and a regex expression like so: [!%&'\(\)$#\"\/\\*+,-.:;<=>?#\[\]^_´{|}~]´. These are my delimiters and into the strings shown a space shall be inserted like so: "hey what is up !", "what did you say ?", "he said ' well '". So if one of the delimiters is in front of another character sequence, add a space, and if its is after, add space as well.
How can I achieve this? I do not want to split by these delimiters.
Here's my solution but I would be curious how to solve it with regex.
space = set("[!%&'()$#\"/\*+,-.:;<=>?#[]^_´`{|}~]")
for sent in self.sentences:
sent = list(sent)
for i, char in enumerate(sent):
# Make sure to respect length of string when indexing
if i != 0:
# insert space in front if char is punctuation
if sent[i] in space and sent[i - 1] != " ":
sent.insert(i, " ")
if i != len(sent)-1:
# insert space after if char is punctuation
if sent[i] in space and sent[i + 1] != " ":
sent.insert(i + 1, " ")
You could expand your pattern to catch optional spaces and then replace by capture group plus spaces before and after (loop only for demo, not neccessary):
import re
strings = ["hey what is up!", "what did you say?", "he said 'well'"]
pattern = r'(\s?[!%&\'\(\)$#\"\/\\*+,-.:;<=>?#\[\]^_´{|}~]\s?)'
for string in strings:
print(re.sub(pattern, r' \1 ', string))
This will give this output:
hey what is up !
what did you say ?
he said ' well '
Without the aid of the re module you could simply do this:
punctuation = "!%&'()$#\"/\\*+,-.:;<=>?#[]^_´{|}~"
mystring = "Well hello! How are you?"
mylist = list(mystring)
i = 0
for c in mystring:
if c in punctuation:
mylist.insert(i, ' ')
i += 2
else:
i += 1
print(''.join(mylist))
You can make a loop that goes through your strings and when it finds a ponctuation character use the slice function to cut your string in half and concatenate with a space in between.
For example:
for i in yourString:
if yourString[i] == '!':
newString = yourString.slice(0, i) + " " + yourString.slice(i + 1)
It only checks for "!" but you could replace it with a dictionnary of ponctuation characters
This question already has answers here:
Split a string at uppercase letters
(22 answers)
Closed 2 years ago.
I am trying to make a script that will accept a string as input in which all of the words are run together, but the first character of each word is uppercase. It should convert the string to a string in which the words are separated by spaces and only the first word starts with an uppercase letter.
For Example (The Input):
"StopWhateverYouAreDoingInterestingIDontCare"
The expected output:
"Stop whatever you are doing interesting I dont care"
Here is the one I wrote so far:
string_input = "StopWhateverYouAreDoingInterestingIDontCare"
def organize_string():
start_sentence = string_input[0]
index_of_i = string_input.index("I")
for i in string_input[1:]:
if i == "I" and string_input[index_of_i + 1].isupper():
start_sentence += ' ' + i
elif i.isupper():
start_sentence += ' ' + i.lower()
else:
start_sentence += i
return start_sentence
While this takes care of some parts, I am struggling with differentiating if the letter "I" is single or a whole word. Here is my output:
"Stop whatever you are doing interesting i dont care"
Single "I" needs to be uppercased, while the "I" in the word "Interesting" should be lowercased "interesting".
I will really appreciate all the help!
A regular expression will do in this example.
import re
s = "StopWhateverYouAreDoingInterestingIDontCare"
t = re.sub(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z])', ' ', s)
Explained:
(?<=[a-z])(?=[A-Z]) - a lookbehind for a lowercase letter followed by a lookahead uppercase letter
| - (signifies or)
(?<=[A-Z])(?=[A-Z]) - a lookbehind for a uppercase letter followed by a lookahead uppercase letter
This regex substitutes a space when there is a lowercase letter followed by an uppercase letter, OR, when there is an uppercase letter followed by an uppercase letter.
UPDATE: This doesn't correctly lowercase the words (with the exception of I and the first_word)
UPDATE2: The fix to this is:
import re
s = "StopWhateverYouAreDoingInterestingIDontCare"
first_word, *rest = re.split(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z])', s)
rest = [word.lower() if word != 'I' else word for word in rest]
print(first_word, ' '.join(rest))
Prints:
Stop whatever you are doing interesting I dont care
Update 3: I looked at why your code failed to correctly form the sentence (which I should have done in the first place instead of posting my own solution :-)).
Here is the corrected code with some remarks about the changes.
string_input = "StopWhateverYouAreDoingInterestingIDontCare"
def organize_string():
start_sentence = string_input[0]
#index_of_i = string_input.index("I")
for i, char in enumerate(string_input[1:], start=1):
if char == "I" and string_input[i + 1].isupper():
start_sentence += ' ' + char
elif char.isupper():
start_sentence += ' ' + char.lower()
else:
start_sentence += char
return start_sentence
print(organize_string())
!. I commented out the line index_of_i = string_input.index("I") as it doesn't do what you need (it finds the index of the first capital I and not an I that should stand alone (it finds the index of the I in Interesting instead of the IDont further in the string_input string). It is not a correct statement.
for i, char in enumerate(string_input[1:], 1) enumerate states the index of the letters in the string starting at 1 (since string_input[1:] starts at index 1 so they are in sync). i is the index of a letter in string_input.
I changed the i's to char to make it clearer that char is the character. Other than these changes, the code stands as you wrote it.
Now the program gives the correct output.
string_input = "StopWhateverYouAreDoingInterestingIDontCare"
counter = 1
def organize_string():
global counter
start_sentence = string_input[0]
for i in string_input[1:]:
if i == "I" and string_input[counter+1].isupper():
start_sentence += ' ' + i
elif i.isupper():
start_sentence += ' ' + i.lower()
else:
start_sentence += i
counter += 1
print(start_sentence)
organize_string()
I made some changes to your program. I used a counter to check the index position. I get your expected output:
Stop whatever you are doing interesting I dont care
s = 'StopWhateverYouAreDoingInterestingIDontCare'
ss = ' '
res = ''.join(ss + x if x.isupper() else x for x in s).strip(ss).split(ss)
sr = ''
for w in res:
sr = sr + w.lower() + ' '
print(sr[0].upper() + sr[1:])
output
Stop whatever you are doing interesting i dont care
I hope this will work fine :-
string_input = "StopWhateverYouAreDoingInterestingIDontCare"
def organize_string():
i=0
while i<len(string_input):
if string_input[i]==string_input[i].upper() and i==0 :
print(' ',end='')
print(string_input[i].upper(),end='')
elif string_input[i]==string_input[i].upper() and string_input[i+1]==string_input[i+1].upper():
print(' ',end='')
print(string_input[i].upper(),end='')
elif string_input[i]==string_input[i].upper() and i!=0:
print(' ',end='')
print(string_input[i].lower(),end='')
if string_input[i]!=string_input[i].upper():
print(string_input[i],end='')
i=i+1
organize_string()
Here is one solution utilising the re package to split the string based on the upper case characters. [Docs]
import re
text = "StopWhateverYouAreDoingInterestingIDontCare"
# Split text by upper character
text_splitted = re.split('([A-Z])', text)
print(text_splitted)
As we see in the output below the separator (The upper case character) and the text before and after is kept. This means that the upper case character is always followed by the rest of the word. The empty first string originates from the first upper case character, which is the first separator.
# Output of print
[
'',
'S', 'top',
'W', 'hatever',
'Y', 'ou',
'A', 're',
'D', 'oing',
'I', 'nteresting',
'I', '',
'D', 'ont',
'C', 'are'
]
As we have seen the first character is always followed by the rest of the word. By combining the two we have the splitted words. This also allows us to easily handle your special case with the I
# Remove first character because it is always empty if first char is always upper
text_splitted = text_splitted[1:]
result = []
for i in range(0, len(text_splitted), 2):
word = text_splitted[i]+text_splitted[i+1]
if (i > 0) and (word != 'I') :
word = word.lower()
result.append(word)
result = ' '.join(result)
split the sentence into individual words. If you find the word "I" in this list, leave it alone. Leave the first word alone. All of the other words, you cast to lower case.
You have to use some string manipulation like this:
output=string_input[0]
for l in string_input[1:]:
if l.islower():
new_s+=l
else:
new_s+=' '+l.lower()
print(output)
For this problem, I am given strings ThatAreLikeThis where there are no spaces between words and the 1st letter of each word is capitalized. My task is to lowercase each capital letter and add spaces between words. The following is my code. What I'm doing there is using a while loop nested inside a for-loop. I've turned the string into a list and check if the capital letter is the 1st letter or not. If so, all I do is make the letter lowercase and if it isn't the first letter, I do the same thing but insert a space before it.
def amendTheSentence(s):
s_list = list(s)
for i in range(len(s_list)):
while(s_list[i].isupper()):
if (i == 0):
s_list[i].lower()
else:
s_list.insert(i-1, " ")
s_list[i].lower()
return ''.join(s_list)
However, for the test case, this is the behavior:
Input: s: "CodesignalIsAwesome"
Output: undefined
Expected Output: "codesignal is awesome"
Console Output: Empty
You can use re.sub for this:
re.sub(r'(?<!\b)([A-Z])', ' \\1', s)
Code:
import re
def amendTheSentence(s):
return re.sub(r'(?<!\b)([A-Z])', ' \\1', s).lower()
On run:
>>> amendTheSentence('GoForPhone')
go for phone
Try this:
def amendTheSentence(s):
start = 0
string = ""
for i in range(1, len(s)):
if s[i].isupper():
string += (s[start:i] + " ")
start = i
string += s[start:]
return string.lower()
print(amendTheSentence("CodesignalIsAwesome"))
print(amendTheSentence("ThatAreLikeThis"))
Output:
codesignal is awesome
that are like this
def amendTheSentence(s):
new_sentence=''
for char in s:
if char.isupper():
new_sentence=new_sentence + ' ' + char.lower()
else:
new_sentence=new_sentence + char
return new_sentence
new_sentence=amendTheSentence("CodesignalIsAwesome")
print (new_sentence)
result is codesignal is awesome
this is the string for example:
'I have an apple. I want to eat it. But it is so sore.'
and I want to convert it to this one:
'I have an apple want to eat it is is so sore'
Here is a way to do it without regexes, using del as you have mentioned:
def remove_after_sym(s, sym):
# Find first word
first = s.strip().split(' ')[0]
# Split the string using the symbol
l = []
s = s.strip().split(sym)
# Split words by space in each sentence
for a in s:
x = a.strip().split(' ')
del x[0]
l.append(x)
# Join words in each sentence
for i in range(len(l)):
l[i] = ' '.join(l[i])
# Combine sentences
final = first + ' ' + ' '.join(l)
final = final.strip() + '.'
return final
Here, sym is a str (a single character).
Also I have used the word 'sentence' very liberally as in your example, sym is a dot. But here sentence really means parts of the string broken by the symbol you want.
Here is what it outputs.
In [1]: remove_after_sym(string, '.')
Out[1]: 'I have an apple want to eat it it is so sore.'
lword = (bword.lower())
word = str(lword)
spaces = []
for spaces in word:
if spaces == ' ':
spaces.append (' ')
else:
spaces.append('_')
print (spaces)
How would this look if it was in Python 2.7?
You made a mistake in your code:
lword = (bword.lower())
word = str(lword)
spaces = []
for character in word: # character is the temporally variable inside the for-loop
if character == ' '
spaces.append(' ')
else:
spaces.append('_')
print (spaces)
This code will give the result what you're expecting.