I want to eliminate all the whitespace from a string, on both ends, and in between words.
I have this Python code:
def my_handle(self):
sentence = ' hello apple '
sentence.strip()
But that only eliminates the whitespace on both sides of the string. How do I remove all whitespace?
If you want to remove leading and ending spaces, use str.strip():
>>> " hello apple ".strip()
'hello apple'
If you want to remove all space characters, use str.replace() (NB this only removes the “normal” ASCII space character ' ' U+0020 but not any other whitespace):
>>> " hello apple ".replace(" ", "")
'helloapple'
If you want to remove duplicated spaces, use str.split() followed by str.join():
>>> " ".join(" hello apple ".split())
'hello apple'
To remove only spaces use str.replace:
sentence = sentence.replace(' ', '')
To remove all whitespace characters (space, tab, newline, and so on) you can use split then join:
sentence = ''.join(sentence.split())
or a regular expression:
import re
pattern = re.compile(r'\s+')
sentence = re.sub(pattern, '', sentence)
If you want to only remove whitespace from the beginning and end you can use strip:
sentence = sentence.strip()
You can also use lstrip to remove whitespace only from the beginning of the string, and rstrip to remove whitespace from the end of the string.
An alternative is to use regular expressions and match these strange white-space characters too. Here are some examples:
Remove ALL spaces in a string, even between words:
import re
sentence = re.sub(r"\s+", "", sentence, flags=re.UNICODE)
Remove spaces in the BEGINNING of a string:
import re
sentence = re.sub(r"^\s+", "", sentence, flags=re.UNICODE)
Remove spaces in the END of a string:
import re
sentence = re.sub(r"\s+$", "", sentence, flags=re.UNICODE)
Remove spaces both in the BEGINNING and in the END of a string:
import re
sentence = re.sub("^\s+|\s+$", "", sentence, flags=re.UNICODE)
Remove ONLY DUPLICATE spaces:
import re
sentence = " ".join(re.split("\s+", sentence, flags=re.UNICODE))
(All examples work in both Python 2 and Python 3)
"Whitespace" includes space, tabs, and CRLF. So an elegant and one-liner string function we can use is str.translate:
Python 3
' hello apple '.translate(str.maketrans('', '', ' \n\t\r'))
OR if you want to be thorough:
import string
' hello apple'.translate(str.maketrans('', '', string.whitespace))
Python 2
' hello apple'.translate(None, ' \n\t\r')
OR if you want to be thorough:
import string
' hello apple'.translate(None, string.whitespace)
For removing whitespace from beginning and end, use strip.
>> " foo bar ".strip()
"foo bar"
' hello \n\tapple'.translate({ord(c):None for c in ' \n\t\r'})
MaK already pointed out the "translate" method above. And this variation works with Python 3 (see this Q&A).
In addition, strip has some variations:
Remove spaces in the BEGINNING and END of a string:
sentence= sentence.strip()
Remove spaces in the BEGINNING of a string:
sentence = sentence.lstrip()
Remove spaces in the END of a string:
sentence= sentence.rstrip()
All three string functions strip lstrip, and rstrip can take parameters of the string to strip, with the default being all white space. This can be helpful when you are working with something particular, for example, you could remove only spaces but not newlines:
" 1. Step 1\n".strip(" ")
Or you could remove extra commas when reading in a string list:
"1,2,3,".strip(",")
Be careful:
strip does a rstrip and lstrip (removes leading and trailing spaces, tabs, returns and form feeds, but it does not remove them in the middle of the string).
If you only replace spaces and tabs you can end up with hidden CRLFs that appear to match what you are looking for, but are not the same.
eliminate all the whitespace from a string, on both ends, and in between words.
>>> import re
>>> re.sub("\s+", # one or more repetition of whitespace
'', # replace with empty string (->remove)
''' hello
... apple
... ''')
'helloapple'
https://en.wikipedia.org/wiki/Whitespace_character
Python docs:
https://docs.python.org/library/stdtypes.html#textseq
https://docs.python.org/library/stdtypes.html#str.replace
https://docs.python.org/library/string.html#string.replace
https://docs.python.org/library/re.html#re.sub
https://docs.python.org/library/re.html#regular-expression-syntax
I use split() to ignore all whitespaces and use join() to concatenate
strings.
sentence = ''.join(' hello apple '.split())
print(sentence) #=> 'helloapple'
I prefer this approach because it is only a expression (not a statement).
It is easy to use and it can use without binding to a variable.
print(''.join(' hello apple '.split())) # no need to binding to a variable
import re
sentence = ' hello apple'
re.sub(' ','',sentence) #helloworld (remove all spaces)
re.sub(' ',' ',sentence) #hello world (remove double spaces)
In the following script we import the regular expression module which we use to substitute one space or more with a single space. This ensures that the inner extra spaces are removed. Then we use strip() function to remove leading and trailing spaces.
# Import regular expression module
import re
# Initialize string
a = " foo bar "
# First replace any number of spaces with a single space
a = re.sub(' +', ' ', a)
# Then strip any leading and trailing spaces.
a = a.strip()
# Show results
print(a)
I found that this works the best for me:
test_string = ' test a s test '
string_list = [s.strip() for s in str(test_string).split()]
final_string = ' '.join(string_array)
# final_string: 'test a s test'
It removes any whitespaces, tabs, etc.
try this.. instead of using re i think using split with strip is much better
def my_handle(self):
sentence = ' hello apple '
' '.join(x.strip() for x in sentence.split())
#hello apple
''.join(x.strip() for x in sentence.split())
#helloapple
I would like to know how to remove specific words from a string in python without deleting them from other words they composed.
For example if I want to remove 'is' from the following sentence:
s = 'isabelle is in Paris'
The .replace() function delete 'is' in 'isabelle' and in 'Paris':
s = 'isabelle is in Paris'
s.replace('is', '')
It gives me abelle in Par But I want isabelle in Paris. Is there a way to delete only 'is'?
I tried: s.replace(' is ', '') with a space each side of 'is' but in this case 'is' is not removed in the string s = 'Isabelle is, as you know, in Paris'
Thank you
Use a regular expression instead of replacing an ordinary string. You can then use \b in the regexp to match word boundaries.
import re
s = re.sub(r'\bis\b', '', s)
String two-part fixed and separated by a space
import re
text="jahir islam"
print(re.sub(r' ',text))
Input: jahir islam
Output: islam jahir
You don't need to do that. Just split the string on the space and reverse it and join it back;
input_str = "jahir islam"
output_str = " ".join(input_str.split(" ")[::-1])
Using regex you can do this way if you know that you have only 2 words you need to swap
import re
text = 'jahir islam'
print re.sub(r'(.*) (.*)', r'\2 \1', text)
explanation:
re.sub(r'(.*) (.*)', r'\2 \1', text)
it groups the words and '\1' and '\2' are the representation of the groups you made, now you can place them any where as you want
for further query you can post a comment
This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 6 years ago.
I've been looking for a way to isolate special characters in a regex expression, but I only seem to find the exact opposite of what I'm looking for. So basically I want to is something along the lines of this:
import re
str = "I only want characters from the pattern below to appear in a list ()[]' including quotations"
pattern = """(){}[]"'-"""
result = re.findall(pattern, str)
What I expect from this is:
print(result)
#["(", ")", "[", "]", "'"]
Edit: thank you to whomever answered then deleted their comment with this regex that solved my problem:
pattern = r"""[(){}\[\]"'\-]"""
Why would you need regex for this when it can be done without regex?
>>> str = "I only want characters from the pattern below to appear in a list ()[]' including quotations"
>>> pattern = """(){}[]"'-"""
>>> [x for x in str if x in pattern]
['(', ')', '[', ']', "'"]
If it's for learning purposes (regex isn't really the best way here), then you can use:
import re
text = "I only want characters from the pattern below to appear in a list ()[]' including quotations"
output = re.findall('[' + re.escape("""(){}[]"'-""") + ']', text)
# ['(', ')', '[', ']', "'"]
Surrounding the characters in [ and ] makes it a regex character class and re.escape will escape any characters that have special regex meaning to avoid breaking the regex string (eg: ] terminating the characters early or - in a certain place causing it to act like a character range).
Several of the characters in your set have special meaning in regular expressions; to match them literally, you need to backslash-escape them.
pattern = r"""\(\)\{\}\[]"'-"""
Alternatively, you could use a character class:
pattern = """[]-[(){}"']"""
Notice also the use of a "raw string" r'...' to avoid having Python interpret the backslashes.
I am having trouble splitting continuous strings into more reasonable parts:
E.g. 'MarieMüller' should become 'Marie Müller'
So far I've used this, which works if no special characters occur:
' '.join([a for a in re.split(ur'([A-Z][a-z]+)', ''.join(entity)) if a])
This outputs for e.g. 'TinaTurner' -> 'Tina Turner', but doesn't work
for 'MarieMüller', which outputs: 'MarieMüller' -> 'Marie M \utf8 ller'
Now I came accros using regex \p{L}:
' '.join([a for a in re.split(ur'([\p{Lu}][\p{Ll}]+)', ''.join(entity)) if a])
But this produces weird things like:
'JenniferLawrence' -> 'Jennifer L awrence'
Could anyone give me a hand?
If you work with Unicode and need to use Unicode categories, you should consider using PyPi regex module. There, you have support for all the Unicode categories:
>>> import regex
>>> p = regex.compile(ur'(?<=\p{Ll})(?=\p{Lu})')
>>> test_str = u"Tina Turner\nMarieM\u00FCller\nJacek\u0104cki"
>>> result = p.sub(u" ", test_str)
>>> result
u'Tina Turner\nMarie M\xfcller\nJacek \u0104cki'
^ ^ ^
Here, the (?<=\p{Ll})(?=\p{Lu}) regex finds all locations between the lower- (\p{Ll}) and uppercase (\p{Lu}) letters, and then the regex.sub inserts a space there. Note that regex module automatically compiles the regex with regex.UNICODE flag if the pattern is a Unicode string (u-prefixed).
It won't work for extended character
You can use re.sub() for this. It will be much simpler
(?=(?!^)[A-Z])
For handling spaces
print re.sub(r'(?<=[^\s])(?=(?!^)[A-Z])', ' ', ' Tina Turner'.strip())
For handling cases of consecutive capital letters
print re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', ' TinaTXYurner'.strip())
Ideone Demo
Regex Breakdown
(?= #Lookahead to find all the position of capital letters
(?!^) #Ignore the first capital letter for substitution
[A-Z]
)
Using a function constructed of Python's string operations instead of regular expressions, this should work:
def split_combined_words(combined):
separated = [combined[1]]
for letter in combined[1:]:
print letter
if (letter.islower() or (letter.isupper() and separated[-1].isupper())):
separated.append(letter)
else:
separated.extend((" ", letter))
return "".join(separated)