Python print list [duplicate] - python

How can I remove the last character of a string if it is a newline?
"abc\n" --> "abc"

Try the method rstrip() (see doc Python 2 and Python 3)
>>> 'test string\n'.rstrip()
'test string'
Python's rstrip() method strips all kinds of trailing whitespace by default, not just one newline as Perl does with chomp.
>>> 'test string \n \r\n\n\r \n\n'.rstrip()
'test string'
To strip only newlines:
>>> 'test string \n \r\n\n\r \n\n'.rstrip('\n')
'test string \n \r\n\n\r '
In addition to rstrip(), there are also the methods strip() and lstrip(). Here is an example with the three of them:
>>> s = " \n\r\n \n abc def \n\r\n \n "
>>> s.strip()
'abc def'
>>> s.lstrip()
'abc def \n\r\n \n '
>>> s.rstrip()
' \n\r\n \n abc def'

And I would say the "pythonic" way to get lines without trailing newline characters is splitlines().
>>> text = "line 1\nline 2\r\nline 3\nline 4"
>>> text.splitlines()
['line 1', 'line 2', 'line 3', 'line 4']

The canonical way to strip end-of-line (EOL) characters is to use the string rstrip() method removing any trailing \r or \n. Here are examples for Mac, Windows, and Unix EOL characters.
>>> 'Mac EOL\r'.rstrip('\r\n')
'Mac EOL'
>>> 'Windows EOL\r\n'.rstrip('\r\n')
'Windows EOL'
>>> 'Unix EOL\n'.rstrip('\r\n')
'Unix EOL'
Using '\r\n' as the parameter to rstrip means that it will strip out any trailing combination of '\r' or '\n'. That's why it works in all three cases above.
This nuance matters in rare cases. For example, I once had to process a text file which contained an HL7 message. The HL7 standard requires a trailing '\r' as its EOL character. The Windows machine on which I was using this message had appended its own '\r\n' EOL character. Therefore, the end of each line looked like '\r\r\n'. Using rstrip('\r\n') would have taken off the entire '\r\r\n' which is not what I wanted. In that case, I simply sliced off the last two characters instead.
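The HL7 situation described above can be sketched as follows. The helper name `strip_one_crlf` and the sample message are made up for illustration; the idea is simply to remove one trailing '\r\n' only when it is actually present, leaving the protocol's own '\r' in place.

```python
def strip_one_crlf(line):
    """Remove a single trailing '\\r\\n' if present (hypothetical helper)."""
    if line.endswith('\r\n'):
        return line[:-2]
    return line

# Illustrative data: an HL7-style segment ending in '\r', plus a Windows EOL.
raw = 'MSH|^~\\&|demo\r\r\n'

print(repr(raw.rstrip('\r\n')))    # every trailing '\r' and '\n' is removed
print(repr(strip_one_crlf(raw)))   # the segment terminator '\r' survives
```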
Note that unlike Perl's chomp function, this will strip all specified characters at the end of the string, not just one:
>>> "Hello\n\n\n".rstrip("\n")
'Hello'

Note that rstrip doesn't act exactly like Perl's chomp() because it doesn't modify the string. That is, in Perl:
$x="a\n";
chomp $x
results in $x being "a".
but in Python:
x="a\n"
x.rstrip()
will mean that the value of x is still "a\n". Even x=x.rstrip() doesn't always give the same result, as it strips all whitespace from the end of the string, not just one newline at most.

I might use something like this:
import os
s = s.rstrip(os.linesep)
I think the problem with rstrip("\n") is that you'll probably want to make sure the line separator is portable (some antiquated systems are rumored to use "\r\n"). The other gotcha is that rstrip will strip out repeated occurrences of the characters, not just one. Hopefully os.linesep will contain the right characters. The above works for me.

You may use line = line.rstrip('\n'). This will strip all newlines from the end of the string, not just one.

s = s.rstrip()
will remove all trailing whitespace, including newlines, at the end of the string s. The assignment is needed because rstrip returns a new string instead of modifying the original string.

>>> "line 1\nline 2\r\n...".replace('\n', '').replace('\r', '')
'line 1line 2...'
or you could always get geekier with regexps

This would replicate exactly perl's chomp (minus behavior on arrays) for "\n" line terminator:
def chomp(x):
    if x.endswith("\r\n"): return x[:-2]
    if x.endswith("\n") or x.endswith("\r"): return x[:-1]
    return x
(Note: it does not modify string 'in place'; it does not strip extra trailing whitespace; takes \r\n in account)

you can use strip:
line = line.strip()
demo:
>>> "\n\n hello world \n\n".strip()
'hello world'

rstrip doesn't do the same thing as chomp, on so many levels. Read http://perldoc.perl.org/functions/chomp.html and see that chomp is very complex indeed.
However, my main point is that chomp removes at most 1 line ending, whereas rstrip will remove as many as it can.
Here you can see rstrip removing all the newlines:
>>> 'foo\n\n'.rstrip(os.linesep)
'foo'
A much closer approximation of typical Perl chomp usage can be accomplished with re.sub, like this:
>>> re.sub(os.linesep + r'\Z','','foo\n\n')
'foo\n'

Careful with "foo".rstrip(os.linesep): that will only chomp the newline characters for the platform where your Python is being executed. Imagine you're chomping the lines of a Windows file under Linux, for instance:
$ python
Python 2.7.1 (r271:86832, Mar 18 2011, 09:09:48)
[GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> sys.platform
'linux2'
>>> "foo\r\n".rstrip(os.linesep)
'foo\r'
>>>
Use "foo".rstrip("\r\n") instead, as Mike says above.

An example in Python's documentation simply uses line.strip().
Perl's chomp function removes one linebreak sequence from the end of a string only if it's actually there.
Here is how I plan to do that in Python, if process is conceptually the function that I need in order to do something useful to each line from this file:
import os
sep_pos = -len(os.linesep)
with open("file.txt") as f:
    for line in f:
        if line[sep_pos:] == os.linesep:
            line = line[:sep_pos]
        process(line)
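One caveat worth checking, sketched here under the assumption of Python 3 and a file opened in the default text mode: the reader translates '\r\n' and '\r' to '\n' before the lines reach your code, so a comparison against os.linesep may never match on Windows, and rstrip('\n') is enough. The temporary file and its contents are purely illustrative.

```python
import os
import tempfile

# Write Windows line endings in binary mode so nothing is translated on write.
with tempfile.NamedTemporaryFile(delete=False, suffix='.txt') as tmp:
    tmp.write(b'first\r\nsecond\r\nthird')
    path = tmp.name

# Default text mode (newline=None) translates '\r\n' and '\r' to '\n' on read.
with open(path) as f:
    raw_lines = list(f)
chomped = [line.rstrip('\n') for line in raw_lines]
os.remove(path)

print(raw_lines)
print(chomped)
```

Opening the file with newline='' would disable that translation, in which case the original line endings are visible again.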

I don't program in Python, but I came across an FAQ at python.org advocating S.rstrip("\r\n") for python 2.2 or later.

import re
r_unwanted = re.compile("[\n\t\r]")
r_unwanted.sub("", your_text)

If your question is to clean up all the line breaks in a multiple line str object (oldstr), you can split it into a list according to the delimiter '\n' and then join this list into a new str(newstr).
newstr = "".join(oldstr.split('\n'))

I find it convenient to be able to get the chomped lines via an iterator, parallel to the way you can get the un-chomped lines from a file object. You can do so with the following code:
import operator

def chomped_lines(it):
    return map(operator.methodcaller('rstrip', '\r\n'), it)
Sample usage:
with open("file.txt") as infile:
    for line in chomped_lines(infile):
        process(line)

I'm bubbling up my regular expression based answer from one I posted earlier in the comments of another answer. I think using re is a clearer more explicit solution to this problem than str.rstrip.
>>> import re
If you want to remove one or more trailing newline chars:
>>> re.sub(r'[\n\r]+$', '', '\nx\r\n')
'\nx'
If you want to remove newline chars everywhere (not just trailing):
>>> re.sub(r'[\n\r]+', '', '\nx\r\n')
'x'
If you want to remove only 1-2 trailing newline chars (i.e., \r, \n, \r\n, \n\r, \r\r, \n\n)
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n\r\n')
'\nx\r'
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n\r')
'\nx\r'
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n')
'\nx'
I have a feeling what most people really want here, is to remove just one occurrence of a trailing newline character, either \r\n or \n and nothing more.
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\n\n', count=1)
'\nx\n'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\r\n\r\n', count=1)
'\nx\r\n'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\r\n', count=1)
'\nx'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\n', count=1)
'\nx'
(The ?: is to create a non-capturing group.)
(By the way this is not what '...'.rstrip('\n').rstrip('\r') does, which may not be clear to others stumbling upon this thread. str.rstrip strips as many of the trailing characters as possible, so a string like foo\n\n\n would be reduced to foo, whereas you may have wanted to preserve the other newlines after stripping a single trailing one.)
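A small variation on the patterns above, offered as a sketch rather than a definitive chomp: `$` (without re.MULTILINE) also matches just before a trailing newline, which is why some of the examples need count=1. Anchoring with `\Z` matches only at the very end of the string. The name `chomp_once` is hypothetical, and a lone '\r' is treated as a line ending here as an assumption.

```python
import re

# One line ending ('\r\n' tried first), anchored at the absolute end.
_ONE_EOL = re.compile(r'(?:\r\n|\n|\r)\Z')

def chomp_once(s):
    """Remove at most one trailing line ending (hypothetical helper)."""
    return _ONE_EOL.sub('', s)

for sample in ('x\n\n', 'x\r\n\r\n', 'x\r\n', 'x\r', 'x'):
    print(repr(sample), '->', repr(chomp_once(sample)))
```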

workaround solution for special case:
if the newline character is the last character (as is the case with most file inputs), then for any element in the collection you can index as follows:
foobar= foobar[:-1]
to slice out your newline character.

It looks like there is no perfect analog for Perl's chomp. In particular, rstrip treats its argument as a set of characters, so it cannot remove exactly one multi-character newline delimiter like \r\n. However, splitlines does handle such delimiters.
Following my answer on a different question, you can combine join and splitlines to remove/replace all newlines from a string s:
''.join(s.splitlines())
The following removes exactly one trailing newline (as chomp would, I believe). Passing True as the keepends argument to splitlines retains the delimiters. Then, splitlines is called again to remove the delimiters on just the last "line":
def chomp(s):
    if len(s):
        lines = s.splitlines(True)
        last = lines.pop()
        return ''.join(lines + last.splitlines())
    else:
        return ''
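One thing to keep in mind with splitlines-based approaches, shown here as a quick check: str.splitlines recognises more line boundaries than '\n', '\r' and '\r\n', including form feed and the Unicode line separator. The sample strings are made up for illustration.

```python
samples = ['a\nb', 'a\r\nb', 'a\x0cb', 'a\u2028b']
for s in samples:
    print(repr(s), '->', s.splitlines())

# A form feed inside the text is treated as a line break by splitlines,
# while rstrip('\r\n') leaves it alone.
text = 'page one\x0cpage two\n'
print(text.splitlines())
print(repr(text.rstrip('\r\n')))
```

If only '\n' and '\r' should count as line endings, rstrip('\r\n') or an explicit endswith check may be the safer choice.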

s = '''Hello World \t\n\r\tHi There'''
# import the string module
import string
# use the translate method to delete every whitespace character
s.translate({ord(c): None for c in string.whitespace})
>>'HelloWorldHiThere'
With regex
s = ''' Hello World
\t\n\r\tHi '''
print(re.sub(r"\s+", "", s), sep='') # \s matches all white spaces
>HelloWorldHi
Replace \n,\t,\r
s.replace('\n', '').replace('\t','').replace('\r','')
>' Hello World Hi '
With regex
s = '''Hello World \t\n\r\tHi There'''
regex = re.compile(r'[\n\r\t]')
regex.sub("", s)
>'Hello World Hi There'
with Join
s = '''Hello World \t\n\r\tHi There'''
' '.join(s.split())
>'Hello World Hi There'

>>> ' spacious '.rstrip()
' spacious'
>>> "AABAA".rstrip("A")
'AAB'
>>> "ABBA".rstrip("AB") # both AB and BA are stripped
''
>>> "ABCABBA".rstrip("AB")
'ABC'
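The examples above show that the argument to rstrip is a set of characters, not a suffix. A minimal sketch of the difference for line endings, using a hypothetical helper `strip_suffix` that removes an exact suffix once:

```python
def strip_suffix(s, suffix):
    """Remove `suffix` once if `s` ends with it (hypothetical helper)."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

text = 'text\n\r\n'
print(repr(text.rstrip('\r\n')))         # 'text'
print(repr(strip_suffix(text, '\r\n')))  # 'text\n'
```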

Just use :
line = line.rstrip("\n")
or
line = line.strip("\n")
You don't need any of this complicated stuff

There are three types of line endings that we normally encounter: \n, \r and \r\n. A rather simple regular expression in re.sub, namely r"\r?\n?$", is able to catch them all.
(And we gotta catch 'em all, am I right?)
import re
re.sub(r"\r?\n?$", "", the_text, 1)
With the last argument, we limit the number of occurrences replaced to one, mimicking chomp to some extent. Example:
import re
text_1 = "hellothere\n\n\n"
text_2 = "hellothere\n\n\r"
text_3 = "hellothere\n\n\r\n"
a = re.sub(r"\r?\n?$", "", text_1, 1)
b = re.sub(r"\r?\n?$", "", text_2, 1)
c = re.sub(r"\r?\n?$", "", text_3, 1)
... where a == b == c is True.

If you are concerned about speed (say you have a very long list of strings) and you know the nature of the newline char, string slicing is actually faster than rstrip. A little test to illustrate this (Python 2, hence xrange):
import time
loops = 50000000
def method1(loops=loops):
    test_string = 'num\n'
    t0 = time.time()
    for num in xrange(loops):
        out_string = test_string[:-1]
    t1 = time.time()
    print('Method 1: ' + str(t1 - t0))
def method2(loops=loops):
    test_string = 'num\n'
    t0 = time.time()
    for num in xrange(loops):
        out_string = test_string.rstrip()
    t1 = time.time()
    print('Method 2: ' + str(t1 - t0))
method1()
method2()
Output:
Method 1: 3.92700004578
Method 2: 6.73000001907
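The same comparison can be sketched with the standard timeit module, which also runs on Python 3. The loop count is an arbitrary choice, and absolute figures depend on the machine and interpreter, so no expected output is shown.

```python
import timeit

test_string = 'num\n'

# Each callable is run `number` times and the total elapsed time is returned.
slice_time = timeit.timeit(lambda: test_string[:-1], number=200_000)
rstrip_time = timeit.timeit(lambda: test_string.rstrip(), number=200_000)

print('slice :', slice_time)
print('rstrip:', rstrip_time)
```

Note that slicing removes the last character unconditionally, so it is only a safe replacement when every string is known to end with exactly one newline.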

This will work for both Windows and Linux line endings (the extra re.search makes it a bit expensive if you only want a regex solution):
import re
if re.search("(\\r|)\\n$", line):
    line = re.sub("(\\r|)\\n$", "", line)

A catch all:
line = line.rstrip('\r\n')
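Keep in mind that rstrip takes a set of characters rather than a regular expression, so a '|' placed in the argument is stripped as well. A short check with a made-up line:

```python
line = 'a|b|\r\n'

print(repr(line.rstrip('\r|\n')))   # the trailing '|' is removed too
print(repr(line.rstrip('\r\n')))    # only '\r' and '\n' are removed
```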

Related

Avoid newline in paths (Python) [duplicate]


Remove all line breaks from a long string of text

Basically, I'm asking the user to input a string of text into the console, but the string is very long and includes many line breaks. How would I take the user's string and delete all line breaks to make it a single line of text. My method for acquiring the string is very simple.
string = raw_input("Please enter string: ")
Is there a different way I should be grabbing the string from the user? I'm running Python 2.7.4 on a Mac.
P.S. Clearly I'm a noob, so even if a solution isn't the most efficient, the one that uses the most simple syntax would be appreciated.
How do you enter line breaks with raw_input? But, once you have a string with some characters in it you want to get rid of, just replace them.
>>> mystr = raw_input('please enter string: ')
please enter string: hello world, how do i enter line breaks?
>>> # pressing enter didn't work...
...
>>> mystr
'hello world, how do i enter line breaks?'
>>> mystr.replace(' ', '')
'helloworld,howdoienterlinebreaks?'
>>>
In the example above, I replaced all spaces. The string '\n' represents newlines. And \r represents carriage returns (if you're on windows, you might be getting these and a second replace will handle them for you!).
basically:
# you probably want to use a space ' ' to replace `\n`
mystring = mystring.replace('\n', ' ').replace('\r', '')
Note also, that it is a bad idea to call your variable string, as this shadows the module string. Another name I'd avoid but would love to use sometimes: file. For the same reason.
You can try using string replace:
string = string.replace('\r', '').replace('\n', '')
You can split the string with no separator arg, which will treat consecutive whitespace as a single separator (including newlines and tabs). Then join using a space:
In : " ".join("\n\nsome text \r\n with multiple whitespace".split())
Out: 'some text with multiple whitespace'
https://docs.python.org/2/library/stdtypes.html#str.split
The canonical answer, in Python, would be:
s = ''.join(s.splitlines())
It splits the string into lines (letting Python do it according to its own best practices). Then you merge it. Two possibilities here:
replace the newline by a whitespace (' '.join())
or without a whitespace (''.join())
updated based on Xbello comment:
string = my_string.rstrip('\r\n')
read more here
Another option is regex:
>>> import re
>>> re.sub("\n|\r", "", "Foo\n\rbar\n\rbaz\n\r")
'Foobarbaz'
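If your string contains the literal two-character sequences \n and \r (a backslash followed by a letter) rather than actual newline characters, use raw strings with replace, i.e. r'\n' instead of '\n':
mystring = mystring.replace(r'\n', ' ').replace(r'\r', '')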
A method taking into consideration
additional white characters at the beginning/end of string
additional white characters at the beginning/end of every line
various end-line characters
it takes such a multi-line string which may be messy e.g.
test_str = '\nhej ho \n aaa\r\n a\n '
and produces nice one-line string
>>> ' '.join([line.strip() for line in test_str.strip().splitlines()])
'hej ho aaa a'
UPDATE:
To fix multiple new-line character producing redundant spaces:
' '.join([line.strip() for line in test_str.strip().splitlines() if line.strip()])
This works for the following too
test_str = '\nhej ho \n aaa\r\n\n\n\n\n a\n '
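The approach above can be wrapped in a small function. This is only a sketch; the name `flatten_lines` is made up for illustration.

```python
def flatten_lines(text):
    """Join the non-empty, stripped lines of `text` with single spaces."""
    stripped = (line.strip() for line in text.splitlines())
    return ' '.join(line for line in stripped if line)

test_str = '\nhej ho \n aaa\r\n\n\n\n\n a\n '
print(flatten_lines(test_str))
```

Unlike ' '.join(text.split()), this keeps any runs of spaces inside a line as they are and only normalises the line boundaries.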
Regular expressions are another way to do this (import re first):
s='''some kind of
string with a bunch\r of
extra spaces in it'''
re.sub(r'\s(?=\s)','',re.sub(r'\s',' ',s))
result:
'some kind of string with a bunch of extra spaces in it'
The problem with rstrip() is that it only removes trailing characters, so it does not help with line breaks inside the text. Instead you can use
text = text.replace("\n"," ")
This will replace every newline '\n' with a space.
You really don't need to remove ALL the signs: lf cr crlf.
# Pythonic:
r'\n', r'\r', r'\r\n'
Some texts must have breaks, but you probably need to join broken lines to keep particular sentences together.
Therefore it is natural that line breaking happens after a period, semicolon or colon, but not after a comma.
My code takes the above conditions into account. It works well with texts copied from PDFs.
Enjoy!:
import re

def unbreak_pdf_text(raw_text):
    """Careful newline removal tool.

    Args:
        raw_text (str): string containing unwanted newlines (\\n, \\r or \\r\\n),
            e.g. imported from OCR or copied from a PDF document.

    Returns:
        str: the text with line breaks that follow a comma, space or word
            character replaced by a single space.
    """
    # Try '\r\n' first so a Windows line ending is consumed as one unit.
    pat = re.compile(r"[, \w](?:\r\n|\n|\r)")
    processed_text = raw_text
    for match in pat.finditer(raw_text):
        processed_text = processed_text.replace(match.group(), match.group()[0] + " ")
    return processed_text

Remove all newlines from inside a string

I'm trying to remove all newline characters from a string. I've read up on how to do it, but it seems that I for some reason am unable to do so. Here is step by step what I am doing:
string1 = "Hello \n World"
string2 = string1.strip('\n')
print string2
And I'm still seeing the newline character in the output. I've tried with rstrip as well, but I'm still seeing the newline. Could anyone shed some light on why I'm doing this wrong? Thanks.
strip only removes characters from the beginning and end of a string. You want to use replace:
string2 = string1.replace("\n", "")
re.sub(r'\s{2,}', ' ', string2) # To collapse runs of more than one whitespace character
As mentioned by #john, the most robust answer is:
string = "a\nb\rv"
new_string = " ".join(string.splitlines())
Answering late since I recently had the same question when reading text from file; tried several options such as:
with open('verdict.txt') as f:
First option below produces a list called alist, with '\n' stripped, then joins back into full text (optional if you wish to have only one text):
alist = f.read().splitlines()
jalist = " ".join(alist)
The second option below is much simpler: it produces a string of text called atext, replacing '\n' with a space:
atext = f.read().replace('\n',' ')
It works; I have done it. This is clean, easier, and efficient.
strip() returns the string after removing leading and trailing whitespace. see doc
In your case, you may want to try replace():
string2 = string1.replace('\n', '')
or you can try this:
string1 = 'Hello \n World'
tmp = string1.split()
string2 = ' '.join(tmp)
This should work in many cases -
text = ' '.join([line.strip() for line in text.strip().splitlines() if line.strip()])
text = re.sub('[\r\n]+', ' ', text)
strip() returns the string with leading and trailing whitespaces(by default) removed.
So it would turn " Hello World " to "Hello World", but it won't remove the \n character as it is present in between the string.
Try replace().
str = "Hello \n World"
str2 = str.replace('\n', '')
print str2
If the file includes a line break in the middle of the text, neither strip() nor rstrip() will solve the problem.
The strip family is used to trim from the beginning and the end of the string.
replace() is the way to solve your problem:
>>> my_name = "Landon\nWO"
>>> print my_name
Landon
WO
>>> my_name = my_name.replace('\n','')
>>> print my_name
LandonWO

How to strip all whitespace from string

How do I strip all the spaces in a python string? For example, I want a string like strip my spaces to be turned into stripmyspaces, but I cannot seem to accomplish that with strip():
>>> 'strip my spaces'.strip()
'strip my spaces'
Taking advantage of str.split's behavior with no sep parameter:
>>> s = " \t foo \n bar "
>>> "".join(s.split())
'foobar'
If you just want to remove spaces instead of all whitespace:
>>> s.replace(" ", "")
'\tfoo\nbar'
Premature optimization
Even though efficiency isn't the primary goal—writing clear code is—here are some initial timings:
$ python -m timeit '"".join(" \t foo \n bar ".split())'
1000000 loops, best of 3: 1.38 usec per loop
$ python -m timeit -s 'import re' 're.sub(r"\s+", "", " \t foo \n bar ")'
100000 loops, best of 3: 15.6 usec per loop
Note the regex is cached, so it's not as slow as you'd imagine. Compiling it beforehand helps some, but would only matter in practice if you call this many times:
$ python -m timeit -s 'import re; e = re.compile(r"\s+")' 'e.sub("", " \t foo \n bar ")'
100000 loops, best of 3: 7.76 usec per loop
Even though re.sub is 11.3x slower, remember your bottlenecks are assuredly elsewhere. Most programs would not notice the difference between any of these 3 choices.
For Python 3:
>>> import re
>>> re.sub(r'\s+', '', 'strip my \n\t\r ASCII and \u00A0 \u2003 Unicode spaces')
'stripmyASCIIandUnicodespaces'
>>> # Or, depending on the situation:
>>> re.sub(r'(\s|\u180B|\u200B|\u200C|\u200D|\u2060|\uFEFF)+', '', \
... '\uFEFF\t\t\t strip all \u000A kinds of \u200B whitespace \n')
'stripallkindsofwhitespace'
...handles any whitespace characters that you're not thinking of - and believe us, there are plenty.
\s on its own always covers the ASCII whitespace:
(regular) space
tab
new line (\n)
carriage return (\r)
form feed
vertical tab
Additionally:
for Python 2 with re.UNICODE enabled,
for Python 3 without any extra actions,
...\s also covers the Unicode whitespace characters, for example:
non-breaking space,
em space,
ideographic space,
...etc. See the full list here, under "Unicode characters with White_Space property".
However \s DOES NOT cover characters not classified as whitespace, which are de facto whitespace, such as among others:
zero-width joiner,
Mongolian vowel separator,
zero-width non-breaking space (a.k.a. byte order mark),
...etc. See the full list here, under "Related Unicode characters without White_Space property".
So these 6 characters are covered by the list in the second regex, \u180B|\u200B|\u200C|\u200D|\u2060|\uFEFF.
Sources:
https://docs.python.org/2/library/re.html
https://docs.python.org/3/library/re.html
https://en.wikipedia.org/wiki/Unicode_character_property
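The distinction described above can be checked directly. This is a small sketch assuming Python 3, where `\s` in a str pattern covers Unicode whitespace by default; the sample strings are illustrative.

```python
import re

samples = {
    'no-break space (U+00A0)': 'a\u00a0b',
    'em space (U+2003)': 'a\u2003b',
    'zero-width space (U+200B)': 'a\u200bb',
    'byte order mark (U+FEFF)': 'a\ufeffb',
}
for name, text in samples.items():
    cleaned = re.sub(r'\s+', '', text)
    # The first two are whitespace to \s; the last two are left in place.
    print(name, '->', 'removed' if cleaned == 'ab' else 'kept')
```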
Alternatively,
"strip my spaces".translate( None, string.whitespace )
And here is Python3 version:
"strip my spaces".translate(str.maketrans('', '', string.whitespace))
Remove the Starting Spaces in Python
string1 = " This is Test String to strip leading space"
print(string1)
print(string1.lstrip())
Remove the Trailing or End Spaces in Python
string2 = "This is Test String to strip trailing space "
print(string2)
print(string2.rstrip())
Remove the whiteSpaces from Beginning and end of the string in Python
string3 = " This is Test String to strip leading and trailing space "
print(string3)
print(string3.strip())
Remove all the spaces in python
string4 = " This is Test String to test all the spaces "
print(string4)
print(string4.replace(" ", ""))
The simplest is to use replace:
"foo bar\t".replace(" ", "").replace("\t", "")
Alternatively, use a regular expression:
import re
re.sub(r"\s", "", "foo bar\t")
As mentioned by Roger Pate following code worked for me:
s = " \t foo \n bar "
"".join(s.split())
'foobar'
I am using Jupyter Notebook to run the following code:
i = 0
ProductList = []
while i < len(new_list):
    temp = ''  # new_list[i] is e.g. ' Plain Utthapam '
    # temp = new_list[i].strip()  # use this instead if we want the output 'Plain Utthapam'
    temp = "".join(new_list[i].split())  # output: 'PlainUtthapam'
    temp = temp.upper()  # output: 'PLAINUTTHAPAM'
    ProductList.append(temp)
    i = i + 2
Try a regex with re.sub. You can search for all whitespace and replace it with an empty string.
\s in your pattern will match any whitespace character (tabs, newlines, etc.), not just a space. You can read more about it in the manual.
import re
re.sub(r'\s+', '', 'strip my spaces')
The standard techniques to filter a list apply, although they are not as efficient as the split/join or translate methods.
We need a set of whitespaces:
>>> import string
>>> ws = set(string.whitespace)
The filter builtin:
>>> "".join(filter(lambda c: c not in ws, "strip my spaces"))
'stripmyspaces'
A list comprehension (yes, use the brackets: see benchmark below):
>>> import string
>>> "".join([c for c in "strip my spaces" if c not in ws])
'stripmyspaces'
A fold:
>>> import functools
>>> "".join(functools.reduce(lambda acc, c: acc if c in ws else acc+c, "strip my spaces"))
'stripmyspaces'
Benchmark:
>>> from timeit import timeit
>>> timeit('"".join("strip my spaces".split())')
0.17734256500003198
>>> timeit('"strip my spaces".translate(ws_dict)', 'import string; ws_dict = {ord(ws):None for ws in string.whitespace}')
0.457635745999994
>>> timeit('re.sub(r"\s+", "", "strip my spaces")', 'import re')
1.017787621000025
>>> SETUP = 'import string, operator, functools, itertools; ws = set(string.whitespace)'
>>> timeit('"".join([c for c in "strip my spaces" if c not in ws])', SETUP)
0.6484303600000203
>>> timeit('"".join(c for c in "strip my spaces" if c not in ws)', SETUP)
0.950212219999969
>>> timeit('"".join(filter(lambda c: c not in ws, "strip my spaces"))', SETUP)
1.3164566040000523
>>> timeit('"".join(functools.reduce(lambda acc, c: acc if c in ws else acc+c, "strip my spaces"))', SETUP)
1.6947649049999995
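The approaches timed above are interchangeable for ASCII input but not for every input. A small sketch (the helper names are illustrative assumptions) that checks where they agree: on Python 3, str.split() and \s treat the non-breaking space as whitespace, while string.whitespace covers ASCII only.

```python
import re
import string

# Translation table that deletes ASCII whitespace only.
WS_TABLE = {ord(c): None for c in string.whitespace}


def via_split(s):
    return "".join(s.split())


def via_translate(s):
    return s.translate(WS_TABLE)


def via_regex(s):
    return re.sub(r"\s+", "", s)


# The second word is followed by U+00A0 (non-breaking space), not a plain space.
text = "strip my\u00a0spaces"
for candidate in (via_split, via_translate, via_regex):
    print(candidate.__name__, ascii(candidate(text)))
```

So the fastest option is only a fair comparison if the input is known to contain nothing beyond ASCII whitespace.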
Parse your string into separate words
Strip whitespace on both sides of each word
Join them with a single space
Final line of code:
' '.join(word.strip() for word in message_text.split())
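The three steps above can be sketched as a small function. The name collapse_whitespace is an illustrative assumption; since split() with no arguments already drops the surrounding whitespace of every word, the per-word strip() is not needed in this variant:

```python
def collapse_whitespace(message_text):
    """Collapse every run of whitespace to one space and trim both ends."""
    # split() with no arguments splits on any whitespace run and
    # discards empty strings, so each word is already stripped.
    return " ".join(message_text.split())


print(collapse_whitespace("  hello \t\n  world  "))  # hello world
```

Note that this keeps one space between words, unlike the approaches above that remove whitespace entirely.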
If optimal performance is not a requirement and you just want something dead simple, you can define a basic function to test each character using the string class's built in "isspace" method:
def remove_space(input_string):
    no_white_space = ''
    for c in input_string:
        if not c.isspace():
            no_white_space += c
    return no_white_space
Building the no_white_space string this way will not have ideal performance, but the solution is easy to understand.
>>> remove_space('strip my spaces')
'stripmyspaces'
If you don't want to define a function, you can convert this into something vaguely similar with list comprehension. Borrowing from the top answer's join solution:
>>> "".join([c for c in "strip my spaces" if not c.isspace()])
'stripmyspaces'
TL;DR
This solution was tested using Python 3.6
To strip all spaces from a string in Python 3 you can use the following function:
def remove_spaces(in_string: str):
    return in_string.translate(str.maketrans({' ': ''}))
To remove any whitespace characters (' \t\n\r\x0b\x0c') you can use the following function:
import string
def remove_whitespace(in_string: str):
    return in_string.translate(str.maketrans(dict.fromkeys(string.whitespace)))
Explanation
Python's str.translate is a built-in method of str. It takes a translation table and returns a copy of the string with each character mapped through that table. Full documentation for str.translate
To create the translation table, str.maketrans is used. This is a static method of str. Here we use it with only one parameter, a dictionary whose keys are the characters to be replaced and whose values are their replacements. It returns a translation table for use with str.translate. Full documentation for str.maketrans
The string module in Python contains some common string operations and constants. string.whitespace is a constant containing all ASCII characters that are considered whitespace. This includes the characters space, tab, linefeed, return, formfeed, and vertical tab. Full documentation for string.whitespace
In the second function dict.fromkeys is used to create a dictionary where the keys are the characters in the string returned by string.whitespace each with value None. Full documentation for dict.fromkeys
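To make the explanation concrete, here is a short sketch that inspects the table built in the second function. The variable name table is an illustrative assumption:

```python
import string

# dict.fromkeys maps every whitespace character to None;
# str.maketrans converts the one-character keys to integer code points.
table = str.maketrans(dict.fromkeys(string.whitespace))

print(sorted(table))      # the code points of ' \t\n\r\x0b\x0c'
print(table[ord(" ")])    # None, which tells translate to delete the character
print("a b\tc\n".translate(table))
```

A value of None in the table means "delete", which is why no replacement string has to be supplied.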
Here's another way using plain old list comprehension:
''.join([c for c in aString if c not in [' ','\t','\n']])
Example:
>>> aString = 'aaa\nbbb\t\t\tccc '
>>> print(aString)
aaa
bbb ccc
>>> ''.join([c for c in aString if c not in [' ','\t','\n']])
'aaabbbccc'
This was asked in an interview. If you have to give a solution using only the strip method, here's an approach:
s='string with spaces'
res=''.join((i.strip(' ') for i in s))
print(res)

How can I remove a trailing newline?

How can I remove the last character of a string if it is a newline?
"abc\n" --> "abc"
Try the method rstrip() (see doc Python 2 and Python 3)
>>> 'test string\n'.rstrip()
'test string'
Python's rstrip() method strips all kinds of trailing whitespace by default, not just one newline as Perl does with chomp.
>>> 'test string \n \r\n\n\r \n\n'.rstrip()
'test string'
To strip only newlines:
>>> 'test string \n \r\n\n\r \n\n'.rstrip('\n')
'test string \n \r\n\n\r '
In addition to rstrip(), there are also the methods strip() and lstrip(). Here is an example with the three of them:
>>> s = " \n\r\n \n abc def \n\r\n \n "
>>> s.strip()
'abc def'
>>> s.lstrip()
'abc def \n\r\n \n '
>>> s.rstrip()
' \n\r\n \n abc def'
And I would say the "pythonic" way to get lines without trailing newline characters is splitlines().
>>> text = "line 1\nline 2\r\nline 3\nline 4"
>>> text.splitlines()
['line 1', 'line 2', 'line 3', 'line 4']
The canonical way to strip end-of-line (EOL) characters is to use the string rstrip() method removing any trailing \r or \n. Here are examples for Mac, Windows, and Unix EOL characters.
>>> 'Mac EOL\r'.rstrip('\r\n')
'Mac EOL'
>>> 'Windows EOL\r\n'.rstrip('\r\n')
'Windows EOL'
>>> 'Unix EOL\n'.rstrip('\r\n')
'Unix EOL'
Using '\r\n' as the parameter to rstrip means that it will strip out any trailing combination of '\r' or '\n'. That's why it works in all three cases above.
This nuance matters in rare cases. For example, I once had to process a text file which contained an HL7 message. The HL7 standard requires a trailing '\r' as its EOL character. The Windows machine on which I was using this message had appended its own '\r\n' EOL character. Therefore, the end of each line looked like '\r\r\n'. Using rstrip('\r\n') would have taken off the entire '\r\r\n' which is not what I wanted. In that case, I simply sliced off the last two characters instead.
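A minimal sketch of that special case, assuming a line that ends in the HL7 '\r' followed by a Windows '\r\n'. The helper name and the sample line are hypothetical:

```python
def strip_one_crlf(line):
    # Remove a single trailing CR LF pair if present; leave everything else alone.
    return line[:-2] if line.endswith("\r\n") else line


# Hypothetical line: HL7's own '\r' terminator plus an appended Windows '\r\n'.
hl7_line = "MSH|demo\r\r\n"

print(repr(strip_one_crlf(hl7_line)))   # 'MSH|demo\r'  (the HL7 terminator survives)
print(repr(hl7_line.rstrip("\r\n")))    # 'MSH|demo'    (rstrip removes all three characters)
```

Checking endswith before slicing avoids cutting off real data when a line happens to lack the Windows terminator.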
Note that unlike Perl's chomp function, this will strip all specified characters at the end of the string, not just one:
>>> "Hello\n\n\n".rstrip("\n")
'Hello'
Note that rstrip doesn't act exactly like Perl's chomp() because it doesn't modify the string. That is, in Perl:
$x="a\n";
chomp $x
results in $x being "a".
but in Python:
x="a\n"
x.rstrip()
will mean that the value of x is still "a\n". Even x=x.rstrip() doesn't always give the same result, as it strips all whitespace from the end of the string, not just one newline at most.
I might use something like this:
import os
s = s.rstrip(os.linesep)
I think the problem with rstrip("\n") is that you'll probably want to make sure the line separator is portable (some antiquated systems are rumored to use "\r\n"). The other gotcha is that rstrip will strip out repeated whitespace. Hopefully os.linesep will contain the right characters. The above works for me.
You may use line = line.rstrip('\n'). This will strip all newlines from the end of the string, not just one.
s = s.rstrip()
will remove all trailing whitespace, including newlines, at the end of the string s. The assignment is needed because rstrip returns a new string instead of modifying the original string.
>>> "line 1\nline 2\r\n...".replace('\n', '').replace('\r', '')
'line 1line 2...'
or you could always get geekier with regexps
This would replicate exactly Perl's chomp (minus behavior on arrays) for the "\n" line terminator:
def chomp(x):
    if x.endswith("\r\n"): return x[:-2]
    if x.endswith("\n") or x.endswith("\r"): return x[:-1]
    return x
(Note: it does not modify the string 'in place'; it does not strip extra trailing whitespace; it takes \r\n into account.)
you can use strip:
line = line.strip()
demo:
>>> "\n\n hello world \n\n".strip()
'hello world'
rstrip doesn't do the same thing as chomp, on so many levels. Read http://perldoc.perl.org/functions/chomp.html and see that chomp is very complex indeed.
However, my main point is that chomp removes at most 1 line ending, whereas rstrip will remove as many as it can.
Here you can see rstrip removing all the newlines:
>>> import os, re
>>> 'foo\n\n'.rstrip(os.linesep)
'foo'
A much closer approximation of typical Perl chomp usage can be accomplished with re.sub, like this:
>>> re.sub(os.linesep + r'\Z','','foo\n\n')
'foo\n'
Careful with "foo".rstrip(os.linesep): That will only chomp the newline characters for the platform where your Python is being executed. Imagine you're chomping the lines of a Windows file under Linux, for instance:
$ python
Python 2.7.1 (r271:86832, Mar 18 2011, 09:09:48)
[GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> sys.platform
'linux2'
>>> "foo\r\n".rstrip(os.linesep)
'foo\r'
>>>
Use "foo".rstrip("\r\n") instead, as Mike says above.
An example in Python's documentation simply uses line.strip().
Perl's chomp function removes one linebreak sequence from the end of a string only if it's actually there.
Here is how I plan to do that in Python, if process is conceptually the function that I need in order to do something useful to each line from this file:
import os
sep_pos = -len(os.linesep)
with open("file.txt") as f:
    for line in f:
        if line[sep_pos:] == os.linesep:
            line = line[:sep_pos]
        process(line)
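One thing to keep in mind with a loop like the one above: when a file is opened in text mode with the default newline=None, Python translates '\r\n' and '\r' to '\n' on input, so lines read this way end in '\n' on every platform. A sketch under that assumption (read_chomped and the temporary file are illustrative, not part of the original answer):

```python
import os
import tempfile


def read_chomped(path):
    """Yield each line of a text file with at most one trailing newline removed."""
    with open(path) as f:  # default newline=None: '\r\n' and '\r' arrive as '\n'
        for line in f:
            yield line[:-1] if line.endswith("\n") else line


# Write mixed line endings in binary mode so nothing is translated on output.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"unix\nwindows\r\nlast line without newline")

try:
    lines = list(read_chomped(tmp.name))
finally:
    os.remove(tmp.name)

print(lines)
```

If the file is opened with newline='' instead, no translation happens and the original terminators have to be handled explicitly.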
I don't program in Python, but I came across an FAQ at python.org advocating S.rstrip("\r\n") for python 2.2 or later.
import re
r_unwanted = re.compile("[\n\t\r]")
r_unwanted.sub("", your_text)
If your goal is to remove all the line breaks in a multi-line str object (oldstr), you can split it into a list on the delimiter '\n' and then join this list into a new str (newstr).
newstr = "".join(oldstr.split('\n'))
I find it convenient to be able to get the chomped lines via an iterator, parallel to the way you can get the un-chomped lines from a file object. You can do so with the following code:
import operator
def chomped_lines(it):
    return map(operator.methodcaller('rstrip', '\r\n'), it)
Sample usage:
with open("file.txt") as infile:
    for line in chomped_lines(infile):
        process(line)
I'm bubbling up my regular expression based answer from one I posted earlier in the comments of another answer. I think using re is a clearer more explicit solution to this problem than str.rstrip.
>>> import re
If you want to remove one or more trailing newline chars:
>>> re.sub(r'[\n\r]+$', '', '\nx\r\n')
'\nx'
If you want to remove newline chars everywhere (not just trailing):
>>> re.sub(r'[\n\r]+', '', '\nx\r\n')
'x'
If you want to remove only 1-2 trailing newline chars (i.e., \r, \n, \r\n, \n\r, \r\r, \n\n)
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n\r\n')
'\nx\r'
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n\r')
'\nx\r'
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n')
'\nx'
I have a feeling what most people really want here, is to remove just one occurrence of a trailing newline character, either \r\n or \n and nothing more.
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\n\n', count=1)
'\nx\n'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\r\n\r\n', count=1)
'\nx\r\n'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\r\n', count=1)
'\nx'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\n', count=1)
'\nx'
(The ?: is to create a non-capturing group.)
(By the way, this is not what '...'.rstrip('\n').rstrip('\r') does, which may not be clear to others stumbling upon this thread. str.rstrip strips as many of the trailing characters as possible, so a string like foo\n\n\n would be reduced to foo, whereas you may have wanted to preserve the other newlines after stripping a single trailing one.)
Workaround solution for a special case:
If the newline character is the last character (as is the case with most file inputs), then for any element in the collection you can index as follows:
foobar= foobar[:-1]
to slice out your newline character.
It looks like there is not a perfect analog for perl's chomp. In particular, rstrip cannot handle multi-character newline delimiters like \r\n. However, splitlines does as pointed out here.
Following my answer on a different question, you can combine join and splitlines to remove/replace all newlines from a string s:
''.join(s.splitlines())
The following removes exactly one trailing newline (as chomp would, I believe). Passing True as the keepends argument to splitlines retains the delimiters. Then, splitlines is called again to remove the delimiters on just the last "line":
def chomp(s):
    if len(s):
        lines = s.splitlines(True)
        last = lines.pop()
        return ''.join(lines + last.splitlines())
    else:
        return ''
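A short sketch of the intermediate values this approach relies on, plus one caveat worth knowing: str.splitlines treats more characters than '\n' and '\r' as line boundaries (form feed, for example), so a trailing one of those is removed too. The function is restated under the hypothetical name chomp_once so the snippet runs on its own:

```python
def chomp_once(s):
    # Same idea as above: keep the delimiters, then strip them from the last piece only.
    if not s:
        return ""
    lines = s.splitlines(True)
    last = lines.pop()
    return "".join(lines + last.splitlines())


print("one\ntwo\r\n".splitlines(True))   # ['one\n', 'two\r\n']
print(repr(chomp_once("one\ntwo\r\n")))  # 'one\ntwo'
print(repr(chomp_once("a\n\n")))         # 'a\n'
print(repr(chomp_once("a\x0c")))         # 'a': form feed counts as a line boundary
```

Whether that last behaviour is acceptable depends on the data; for plain text files it rarely matters.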
s = '''Hello World \t\n\r\tHi There'''
# import the string module
import string
# use the translate method to delete every whitespace character
s.translate({ord(c): None for c in string.whitespace})
>>'HelloWorldHiThere'
With regex
import re
s = ''' Hello World
\t\n\r\tHi '''
print(re.sub(r"\s+", "", s), sep='') # \s matches all whitespace characters
>HelloWorldHi
Replace \n,\t,\r
s.replace('\n', '').replace('\t','').replace('\r','')
>' Hello World Hi '
With regex
s = '''Hello World \t\n\r\tHi There'''
regex = re.compile(r'[\n\r\t]')
regex.sub("", s)
>'Hello World Hi There'
with Join
s = '''Hello World \t\n\r\tHi There'''
' '.join(s.split())
>'Hello World Hi There'
>>> ' spacious '.rstrip()
' spacious'
>>> "AABAA".rstrip("A")
'AAB'
>>> "ABBA".rstrip("AB") # both AB and BA are stripped
''
>>> "ABCABBA".rstrip("AB")
'ABC'
Just use :
line = line.rstrip("\n")
or
line = line.strip("\n")
You don't need any of this complicated stuff
There are three types of line endings that we normally encounter: \n, \r and \r\n. A rather simple regular expression in re.sub, namely r"\r?\n?$", is able to catch them all.
(And we gotta catch 'em all, am I right?)
import re
re.sub(r"\r?\n?$", "", the_text, 1)
With the last argument, we limit the number of occurrences replaced to one, mimicking chomp to some extent. Example:
import re
text_1 = "hellothere\n\n\n"
text_2 = "hellothere\n\n\r"
text_3 = "hellothere\n\n\r\n"
a = re.sub(r"\r?\n?$", "", text_1, 1)
b = re.sub(r"\r?\n?$", "", text_2, 1)
c = re.sub(r"\r?\n?$", "", text_3, 1)
... where a == b == c is True.
If you are concerned about speed (say you have a looong list of strings) and you know the nature of the newline char, string slicing is actually faster than rstrip. A little test to illustrate this:
import time
loops = 50000000

def method1(loops=loops):
    test_string = 'num\n'
    t0 = time.time()
    for num in xrange(loops):  # xrange is Python 2; use range on Python 3
        out_string = test_string[:-1]
    t1 = time.time()
    print('Method 1: ' + str(t1 - t0))

def method2(loops=loops):
    test_string = 'num\n'
    t0 = time.time()
    for num in xrange(loops):  # xrange is Python 2; use range on Python 3
        out_string = test_string.rstrip()
    t1 = time.time()
    print('Method 2: ' + str(t1 - t0))

method1()
method2()
Output:
Method 1: 3.92700004578
Method 2: 6.73000001907
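For a quicker way to repeat this comparison on Python 3, here is a sketch using the standard timeit module. The function names and the iteration count are illustrative assumptions, and the timings will differ by machine and Python version, so no numbers are quoted:

```python
from timeit import timeit

line = "num\n"


def by_slice(s):
    # Only correct when the string is known to end in exactly one newline.
    return s[:-1]


def by_rstrip(s):
    return s.rstrip()


slice_time = timeit(lambda: by_slice(line), number=100_000)
rstrip_time = timeit(lambda: by_rstrip(line), number=100_000)
print(f"slice:  {slice_time:.4f}s")
print(f"rstrip: {rstrip_time:.4f}s")
```

The speed of slicing comes with a trade-off: by_slice("num") returns "nu", so it should only be used when the trailing newline is guaranteed.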
This will work for both Windows and Linux line endings (a bit expensive with re.sub, if you are looking for a regex-only solution):
import re
if re.search("(\\r|)\\n$", line):
    line = re.sub("(\\r|)\\n$", "", line)
A catch-all:
line = line.rstrip('\r\n')
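Keep in mind that rstrip treats its argument as a set of characters rather than a pattern, so regex-style syntax such as '|' is taken literally and stripped as well. A small sketch with a made-up sample value:

```python
line = "value|\r\n"  # hypothetical line whose data ends with a pipe character

# Only carriage returns and newlines are removed; the pipe is kept.
print(repr(line.rstrip("\r\n")))   # 'value|'

# With '|' in the argument, the pipe is stripped along with the line ending.
print(repr(line.rstrip("\r|\n")))  # 'value'
```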
