I want to eliminate all the whitespace from a string, on both ends, and in between words.
I have this Python code:
def my_handle(self):
    sentence = ' hello apple '
    sentence.strip()
But that only eliminates the whitespace on both sides of the string. How do I remove all whitespace?
If you want to remove leading and ending spaces, use str.strip():
>>> " hello apple ".strip()
'hello apple'
If you want to remove all space characters, use str.replace() (NB this only removes the “normal” ASCII space character ' ' U+0020 but not any other whitespace):
>>> " hello apple ".replace(" ", "")
'helloapple'
If you want to remove duplicated spaces, use str.split() followed by str.join():
>>> " ".join(" hello apple ".split())
'hello apple'
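As a quick check of the three cases above, here is a minimal sketch on a string that also contains a tab and a newline. The sample string and variable names are made up for illustration.

```python
# Hypothetical sample with a tab and a newline mixed in with spaces.
sample = "  hello \t apple\n"

stripped = sample.strip()             # trims both ends only
no_spaces = sample.replace(" ", "")   # removes U+0020 only; tab and newline stay
collapsed = " ".join(sample.split())  # collapses every whitespace run to one space

print(repr(stripped))   # 'hello \t apple'
print(repr(no_spaces))  # 'hello\tapple\n'
print(repr(collapsed))  # 'hello apple'
```

Note that the replace() result still carries the tab and the newline, which is the limitation the NB above describes.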
To remove only spaces use str.replace:
sentence = sentence.replace(' ', '')
To remove all whitespace characters (space, tab, newline, and so on) you can use split then join:
sentence = ''.join(sentence.split())
or a regular expression:
import re
pattern = re.compile(r'\s+')
sentence = re.sub(pattern, '', sentence)
If you want to only remove whitespace from the beginning and end you can use strip:
sentence = sentence.strip()
You can also use lstrip to remove whitespace only from the beginning of the string, and rstrip to remove whitespace from the end of the string.
An alternative is to use regular expressions and match these strange white-space characters too. Here are some examples:
Remove ALL spaces in a string, even between words:
import re
sentence = re.sub(r"\s+", "", sentence, flags=re.UNICODE)
Remove spaces in the BEGINNING of a string:
import re
sentence = re.sub(r"^\s+", "", sentence, flags=re.UNICODE)
Remove spaces in the END of a string:
import re
sentence = re.sub(r"\s+$", "", sentence, flags=re.UNICODE)
Remove spaces both in the BEGINNING and in the END of a string:
import re
sentence = re.sub(r"^\s+|\s+$", "", sentence, flags=re.UNICODE)
Remove ONLY DUPLICATE spaces:
import re
sentence = " ".join(re.split(r"\s+", sentence, flags=re.UNICODE))
(All examples work in both Python 2 and Python 3)
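If the same pattern is applied many times, it may be worth compiling it once. The sketch below assumes Python 3, where \s in a str pattern already matches Unicode whitespace, so the re.UNICODE flag is redundant there. The constant name and the sample text are illustrative only.

```python
import re

# Compiled once and reused; on Python 3 str patterns, \s is Unicode-aware by default.
WHITESPACE = re.compile(r"\s+")

# Hypothetical input containing a no-break space (U+00A0) and an em space (U+2003).
text = "\u00a0hello\u2003apple \n"

removed = WHITESPACE.sub("", text)              # drop every whitespace run
collapsed = WHITESPACE.sub(" ", text).strip()   # one space per run, then trim

print(repr(removed))    # 'helloapple'
print(repr(collapsed))  # 'hello apple'
```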
"Whitespace" includes space, tabs, and CRLF. So an elegant and one-liner string function we can use is str.translate:
Python 3
' hello apple '.translate(str.maketrans('', '', ' \n\t\r'))
OR if you want to be thorough:
import string
' hello apple'.translate(str.maketrans('', '', string.whitespace))
Python 2
' hello apple'.translate(None, ' \n\t\r')
OR if you want to be thorough:
import string
' hello apple'.translate(None, string.whitespace)
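One caveat with the "thorough" variant: string.whitespace only lists the six ASCII whitespace characters. A small Python 3 sketch, using a made-up sample that contains a no-break space, shows the difference from split() followed by join():

```python
import string

# Translation table that deletes the ASCII whitespace characters only.
table = str.maketrans("", "", string.whitespace)

# Hypothetical sample: U+00A0 (no-break space), a tab and a newline.
text = "hello\u00a0apple\tpie\n"

ascii_only = text.translate(table)   # the no-break space survives
all_unicode = "".join(text.split())  # split() also treats U+00A0 as whitespace

print(repr(ascii_only))   # 'hello\xa0applepie'
print(repr(all_unicode))  # 'helloapplepie'
```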
For removing whitespace from beginning and end, use strip.
>>> " foo bar ".strip()
'foo bar'
' hello \n\tapple'.translate({ord(c):None for c in ' \n\t\r'})
MaK already pointed out the "translate" method above. And this variation works with Python 3 (see this Q&A).
In addition, strip has some variations:
Remove spaces in the BEGINNING and END of a string:
sentence= sentence.strip()
Remove spaces in the BEGINNING of a string:
sentence = sentence.lstrip()
Remove spaces in the END of a string:
sentence= sentence.rstrip()
All three string methods, strip, lstrip, and rstrip, can take a parameter naming the characters to strip, with the default being all whitespace. This can be helpful when you are working with something particular; for example, you could remove only spaces but not newlines:
" 1. Step 1\n".strip(" ")
Or you could remove extra commas when reading in a string list:
"1,2,3,".strip(",")
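One detail worth spelling out: the argument is treated as a set of characters, not as a prefix or suffix. The strings below are invented for illustration.

```python
# Each character in the argument is stripped individually, in any order.
tagged = "xyxhelloyx".strip("xy")

# Several different characters can be stripped in one call.
numbers = ",,1,2,3,,\n".strip(",\n")

# Only spaces are removed here; the trailing newline is kept.
step = "  1. Step 1\n".strip(" ")

print(repr(tagged))   # 'hello'
print(repr(numbers))  # '1,2,3'
print(repr(step))     # '1. Step 1\n'
```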
Be careful:
strip does both an rstrip and an lstrip (it removes leading and trailing spaces, tabs, returns and form feeds, but it does not remove them in the middle of the string).
If you only replace spaces and tabs you can end up with hidden CRLFs that appear to match what you are looking for, but are not the same.
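A minimal sketch of that pitfall, assuming a line read from a file with Windows line endings (the sample value is made up):

```python
# Hypothetical line as it might come from a file with CRLF line endings.
line = "hello apple\r\n"

# Removing only spaces and tabs leaves the CRLF in place.
partial = line.replace(" ", "").replace("\t", "")

# split() + join() removes every whitespace character, including \r and \n.
full = "".join(line.split())

print(partial == "helloapple")  # False
print(repr(partial))            # 'helloapple\r\n'
print(full == "helloapple")     # True
```

Printing with repr() makes the hidden characters visible, which is often the quickest way to spot this kind of mismatch.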
eliminate all the whitespace from a string, on both ends, and in between words.
>>> import re
>>> re.sub(r"\s+",  # one or more whitespace characters
...        '',      # replace with the empty string (i.e. remove)
...        ''' hello
... apple
... ''')
'helloapple'
https://en.wikipedia.org/wiki/Whitespace_character
Python docs:
https://docs.python.org/library/stdtypes.html#textseq
https://docs.python.org/library/stdtypes.html#str.replace
https://docs.python.org/library/string.html#string.replace
https://docs.python.org/library/re.html#re.sub
https://docs.python.org/library/re.html#regular-expression-syntax
I use split() to ignore all whitespaces and use join() to concatenate
strings.
sentence = ''.join(' hello apple '.split())
print(sentence) #=> 'helloapple'
I prefer this approach because it is only an expression (not a statement).
It is easy to use and it can be used without binding to a variable.
print(''.join(' hello apple '.split())) # no need to bind to a variable
import re
sentence = ' hello  apple'
re.sub(' ', '', sentence)    # 'helloapple' (removes all spaces)
re.sub('  ', ' ', sentence)  # ' hello apple' (replaces each double space with a single space)
In the following script we import the regular expression module which we use to substitute one space or more with a single space. This ensures that the inner extra spaces are removed. Then we use strip() function to remove leading and trailing spaces.
# Import regular expression module
import re
# Initialize string
a = " foo bar "
# First replace any number of spaces with a single space
a = re.sub(' +', ' ', a)
# Then strip any leading and trailing spaces.
a = a.strip()
# Show results
print(a)
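The pattern ' +' above only matches ordinary spaces. If tabs and newlines should be collapsed as well, one possible variation (a sketch, with a hypothetical helper name) is to use \s+ instead:

```python
import re

def normalize_spaces(text):
    """Collapse each whitespace run to a single space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical input mixing spaces, a tab and a newline.
result = normalize_spaces("  foo \t  bar\n")
print(repr(result))  # 'foo bar'
```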
I found that this works the best for me:
test_string = ' test a s test '
string_list = [s.strip() for s in str(test_string).split()]
final_string = ' '.join(string_list)
# final_string: 'test a s test'
It removes any whitespaces, tabs, etc.
Try this. Instead of using re, I think using split with strip is much better:
def my_handle(self):
    sentence = ' hello apple '
    ' '.join(x.strip() for x in sentence.split())
    # 'hello apple'
    ''.join(x.strip() for x in sentence.split())
    # 'helloapple'
I need to be able to space-separate a string unless the space is contained within escapable quotes. In other words, spam spam spam "and \"eggs" should return spam, spam, spam, and and "eggs. I intend to do this using the re.split method in Python, where you identify the characters to split on using a regex.
I found this which finds everything between escapable quotes:
((?<![\\])['"])((?:.(?!(?<![\\])\1))*.?)\1
from: https://www.metaltoad.com/blog/regex-quoted-string-escapable-quotes
and this which splits by character unless between quotes:
\s(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)
from: https://stackabuse.com/regex-splitting-by-character-unless-in-quotes/. This finds all spaces with an even number of double quotes between the space and the end of the line.
I'm struggling to join those two solutions together.
For reference, I found this super-useful regex cheat sheet: https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285
I also found https://regex101.com/ extremely useful: allows you to test regex
Finally managed it:
\s(?=(?:(?:\\\"|[^\"])*(?<!\\)\"(?:\\\"|[^\"])*(?<!\\)\")*(?:\\\"|[^\"])*$)
This combines the two solutions in the question to find spaces with an even number of unescaped double quotes to the right-hand side. Explanation:
\s # space
(?= # followed by (not included in match though)
(?: # match pattern (but don't capture)
(?:
\\\" # match escaped double quotes
| # OR
[^\"] # any character that is not double quotes
)* # 0 or more times
(?<!\\)\" # followed by unescaped quotes
(?:\\\"|[^\"])* # as above match escaped double quotes OR any character that is not double quotes
(?<!\\)\" # as above - followed by unescaped quotes
# the above pairs of unescaped quotes
)* # repeated 0 or more times (acting on pairs of quotes given an even number of quotes returned)
(?:\\\"|[^\"])* # as above
$ # end of the line
)
So the final python is:
import re
test_str = r'spam spam spam "and \"eggs"'
regex = r'\s(?=(?:(?:\\\"|[^\"])*(?<!\\)\"(?:\\\"|[^\"])*(?<!\\)\")*(?:\\\"|[^\"])*$)'
test_list = re.split(regex, test_str)
print(test_list)
>>> ['spam', 'spam', 'spam', '"and \\"eggs"']
The only downside to this method is that it leaves the leading and trailing quotes in place; however, I can easily identify and remove these with the following Python:
# remove leading and trailing unescaped quotes
test_list = list(map(lambda x: re.sub(r'(?<!\\)"', '', x), test_list))
# remove escape characters - they are no longer required
test_list = list(map(lambda x: x.replace(r'\"', '"'), test_list))
print(test_list)
>>> ['spam', 'spam', 'spam', 'and "eggs']
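If shell-style quoting rules are acceptable for the input, the standard library's shlex.split may give the same result in one call. This is a different technique from the regex above, not a drop-in equivalent: it follows POSIX shell conventions, so single quotes and backslashes outside quotes are also treated specially.

```python
import shlex

test_str = r'spam spam spam "and \"eggs"'

# shlex.split honours double quotes and backslash-escaped quotes inside them,
# and strips the surrounding quotes from each token.
tokens = shlex.split(test_str)
print(tokens)  # ['spam', 'spam', 'spam', 'and "eggs']
```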
I am trying to write a regex that grabs blocks of whitespace from either side of a string. I can get the beginning, but I can't seem to grab the end block.
s = ' This is a string with whitespace on either side '
strip_regex = re.compile(r'(\s+)(.*)(something to grab end block)')
mo = strip_regex.findall(s)
What I get as an output is this:
[(' ', 'This is a string with whitespace on either side ')]
I have played around with what to do at the end, and the best I can get is one whitespace character, but I can never just grab the string up to the end of 'side'. I don't want to use the characters in 'side' because I want the regex to work with any string surrounded by whitespace. I am pretty sure that it's because I am using (.*), which just grabs everything after the first whitespace block, but I can't figure out how to make it stop before the end whitespace block.
Thanks for any help :)
If what you want to do is strip whitespace, you could use strip() instead.
See: https://www.journaldev.com/23625/python-trim-string-rstrip-lstrip-strip
As for your regex, if you want both the start and end whitespace, I suggest matching the whole line, with the middle part not greedy like so:
s = ' This is a string with whitespace on either side '
strip_regex = re.compile(r'^(\s+)(.*?)(\s+)$')
mo = strip_regex.findall(s)
Result:
[(' ', 'This is a string with whitespace on either side', ' ')]
More about greedy: How can I write a regex which matches non greedy?
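To make the greedy versus non-greedy difference concrete, here is a small side-by-side sketch. The sample string uses three spaces on each side, and the variable names are illustrative.

```python
import re

s = "   This is a string with whitespace on either side   "

# Greedy (.*) swallows the trailing spaces, leaving nothing for the last group.
greedy = re.match(r"^(\s+)(.*)(\s*)$", s).groups()

# Non-greedy (.*?) stops as early as possible, so the final \s+ gets the end block.
lazy = re.match(r"^(\s+)(.*?)(\s+)$", s).groups()

print(greedy)
print(lazy)
```

The anchors ^ and $ matter here: without $, the non-greedy group would be free to match nothing at all.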
I'm trying to add \n after the quotation mark (") and space.
The closest that I could find is re.sub; however, it removes certain characters.
line = 'Type: "SecurityIncident" RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"'
q = re.sub(r'[\d\w]" ', '\n', line)
print(q)
Output:
Type: "SecurityInciden\nRowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F2\n
Looking for a solution that does not remove any characters.
Your attempted regex [\d\w]" is almost fine but has some small shortcomings. You don't need \d alongside \w in the character set, because \w already includes the digits. And since \w alone matches a letter, digit, or underscore, there is no need to wrap it in a character set [], so the regex becomes \w" (followed by the space).
But if you match this regex and substitute \n for it, it matches the literal letter t, then " and a space, and replaces all three with \n, which is why you are getting this output,
SecurityInciden\nRowID
You need to capture the matched text in group 1 and reuse it in the replacement so that it doesn't get dropped; that is, use \1\n as the replacement instead of just \n.
Try this updated regex,
(\w" )
And replace it by \1\n
Demo1
If you look closely, the first line of the result now ends with an extra space. If you don't want that space, move it outside the capturing parentheses and use this regex,
(\w") followed by a single space (the space sits outside the group, so it is matched but not kept)
Demo2
Here is a sample python code,
import re
line = 'Type: "SecurityIncident" RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"'
q = re.sub(r'(\w") ', r'\1\n', line)
print(q)
Output,
Type: "SecurityIncident"
RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"
Try this:
import re
line = 'Type: "SecurityIncident" RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"'
pattern = re.compile(r'(\w+): (".+?"\s?)', re.IGNORECASE)
q = re.sub(pattern, r'\g<1>: \g<2>\n', line)
print(repr(q))
It should give you the following result:
'Type: "SecurityIncident" \nRowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"\n'
In your regex you are removing the t from incident because you are matching it and not using it in the replacement.
Another option to get your result might be to split on a double quote followed by a whitespace when preceded with a word character using a positive lookbehind.
Then join the result back together using a newline.
(?<=\w)"
Regex demo | Python demo
For example:
import re
line = 'Type: "SecurityIncident" RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"'
print("\n".join(re.split(r'(?<=\w)" ', line)))
Result
Type: "SecurityIncident
RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"
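Note that splitting on the quote itself drops the closing quote from the first line. If it should be kept, one possible variation (a sketch, not the answerer's code) is to put the quote inside the lookbehind and substitute only the space:

```python
import re

line = 'Type: "SecurityIncident" RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"'

# The lookbehind asserts a word character plus a closing quote before the space,
# so only the space itself is replaced by the newline.
result = re.sub(r'(?<=\w") ', '\n', line)
print(result)
```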
I want to remove \n from a string if it is in a string.
I have tried:
slashn = str(chr(92))+"n"
if slashn in newString:
    newerString = newString.replace(slashn,'')
    print(newerString)
else:
    print(newString)
Assume that newString is a word that has \n at the end of it. E.g. text\n.
I have also tried the same code, except with slashn set to "\\"+"n".
Use str.replace() but with raw string literals:
newString = r"new\nline"
newerString = newString.replace(r"\n", "")
If you put an r right before the quotes enclosing a string literal, it becomes a raw string literal that does not treat any backslash characters as special escape sequences.
Example to clarify raw string literals (output is behind the #> comments):
# Normal string literal: single backslash escapes the 'n' and makes it a new-line character.
print("new\nline")
#> new
#> line
# Normal string literal: first backslash escapes the second backslash and makes it a
# literal backslash. The 'n' won't be escaped and stays a literal 'n'.
print("new\\nline")
#> new\nline
# Raw string literal: All characters are taken literally, the backslash does not have any
# special meaning and therefore does not escape anything.
print(r"new\nline")
#> new\nline
# Raw string literal: All characters are taken literally, no backslash has any
# special meaning and therefore they do not escape anything.
print(r"new\\nline")
#> new\\nline
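When it is unclear whether a string holds a real newline or the two characters backslash and n, checking the length or the repr usually settles it. The helper below is a hypothetical sketch that removes both forms:

```python
real = "text\n"      # 5 characters: the last one is a newline
literal = r"text\n"  # 6 characters: a backslash followed by the letter n

print(len(real), len(literal))  # 5 6

def drop_newlines(text):
    # Hypothetical helper: removes the two-character sequence and real newlines.
    return text.replace(r"\n", "").replace("\n", "")

print(drop_newlines(real))     # text
print(drop_newlines(literal))  # text
```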
You can use the strip() method of a string, or strip('\n'). strip is a built-in string method.
Example:
>>>
>>>
>>> """vivek
...
... """
'vivek\n\n'
>>>
>>> """vivek
...
... """.strip()
'vivek'
>>>
>>> """vivek
...
... \n"""
'vivek\n\n\n'
>>>
>>>
>>> """vivek
...
... \n""".strip()
'vivek'
>>>
You can look up the built-in help for the string method strip like this:
>>>
>>> help(''.strip)
Help on built-in function strip:
strip(...)
S.strip([chars]) -> string or unicode
Return a copy of the string S with leading and trailing
whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping
>>>
Use
string_here.rstrip('\n')
To remove the newline.
Try with strip()
your_string.strip("\n") # removes \n before and after the string
If you want to remove the newline from the ends of a string, I'd use .strip(). If no arguments are given then it will remove whitespace characters, this includes newlines (\n).
Using .strip():
if newString.endswith('\n'):  # Test whether the string ends with a newline
    newerString = newString.strip()
    print(newerString)
else:
    print(newString)
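Keep in mind that strip() with no arguments removes all leading and trailing whitespace, not just the newline. If the surrounding spaces matter, rstrip('\n') is the narrower tool. A short sketch with a made-up value:

```python
# Hypothetical value with meaningful leading and trailing spaces.
raw = "  text \n\n"

only_newlines = raw.rstrip("\n")  # trailing newlines removed, spaces kept
everything = raw.strip()          # all surrounding whitespace removed

print(repr(only_newlines))  # '  text '
print(repr(everything))     # 'text'
```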
Another .strip() example (Using Python 2.7.9)
Also, the newline character can simply be represented as "\n".
Text="test.\nNext line."
print(Text)
Output:::: test.\nNextline"
This is because the text is stored in double quotes. In such cases the next line behaves as text enclosed in the string.