How to use text strip() function? - python

I can strip numerics but not alpha characters:
>>> text
'132abcd13232111'
>>> text.strip('123')
'abcd'
Why is the following not working?
>>> text.strip('abcd')
'132abcd13232111'

The reason is simple and stated in the documentation of strip:
str.strip([chars])
Return a copy of the string with the leading and trailing characters removed.
The chars argument is a string specifying the set of characters to be removed.
None of the characters 'a', 'b', 'c' or 'd' is leading or trailing in the string '132abcd13232111', so nothing is stripped.

Just to add a few examples to Jim's answer, according to .strip() docs:
Return a copy of the string with the leading and trailing characters removed.
The chars argument is a string specifying the set of characters to be removed.
If omitted or None, the chars argument defaults to removing whitespace.
The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.
So it doesn't matter whether the characters are digits or not. Your second call didn't work as you expected because "abcd" sits in the middle of the string, not at either end.
Example1:
s = '132abcd13232111'
print(s.strip('123'))
print(s.strip('abcd'))
Output:
abcd
132abcd13232111
Example2:
t = 'abcd12312313abcd'
print(t.strip('123'))
print(t.strip('abcd'))
Output:
abcd12312313abcd
12312313
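To make the rule concrete, here is a minimal sketch of what strip(chars) does. The helper strip_chars is hypothetical and written only for illustration; it is not how CPython implements the method, but it should return the same results for the examples above.

```python
def strip_chars(s, chars):
    """Hypothetical re-implementation of str.strip(chars), for illustration only."""
    start, end = 0, len(s)
    # Advance past leading characters that belong to the set.
    while start < end and s[start] in chars:
        start += 1
    # Back up over trailing characters that belong to the set.
    while end > start and s[end - 1] in chars:
        end -= 1
    return s[start:end]


s = '132abcd13232111'
t = 'abcd12312313abcd'
print(strip_chars(s, '123'))   # digits are at both ends, so they are removed
print(strip_chars(s, 'abcd'))  # the first and last characters are digits, so nothing happens
print(strip_chars(t, 'abcd'))  # letters are at both ends, so they are removed
```

The scan stops at the first character from each end that is not in the set, which is why characters in the middle are never touched.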

Related

How is lstrip() method removing chars from left? [duplicate]

My understanding is that the lstrip(arg) removes characters from the left based on the value of arg.
I am executing the following code:
'htp://www.abc.com'.lstrip('/')
Output:
'htp://www.abc.com'
My understanding is that all the characters should be stripped from left until / is reached.
In other words, the output should be:
'www.abc.com'
I am also not sure why running the following code is generating below output:
'htp://www.abc.com'.lstrip('/:pth')
Output:
'www.abc.com'
Calling the help function shows the following:
Help on built-in function lstrip:
lstrip(chars=None, /) method of builtins.str instance
Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
This means that leading whitespace is removed by default. If the chars argument is given, lstrip instead removes leading characters for as long as each one appears in chars, and stops at the first character that does not. For example, if you pass 'abc', the string must start with 'a', 'b' or 'c', otherwise the function won't change anything.
The string need not begin with 'abc' as a whole.
>>> print(' the left strip'.lstrip()) # strips off the whitespace
the left strip
>>> print('ththe left strip'.lstrip('th')) # strips off the given characters as the string starts with those
e left strip
>>> print('ththe left strip'.lstrip('left')) # removes the first 't' because 'left' contains 't', then stops at 'h'
hthe left strip
>>> print('ththe left strip'.lstrip('zeb')) # doesn't change anything as none of these characters matches the beginning of the string
ththe left strip
>>> print('ththe left strip'.lstrip('h')) # doesn't change anything as the string starts with 't', not 'h'
ththe left strip
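The behaviour in the question can be reproduced with itertools.dropwhile, which may make the rule easier to see. This is only a sketch: lstrip_chars is a hypothetical helper, not part of the standard library.

```python
from itertools import dropwhile


def lstrip_chars(s, chars):
    # Hypothetical helper: drop leading characters while they belong to chars,
    # then keep everything from the first non-matching character onward.
    return ''.join(dropwhile(lambda c: c in chars, s))


url = 'htp://www.abc.com'
print(lstrip_chars(url, '/'))      # the first character is 'h', so nothing is dropped
print(lstrip_chars(url, '/:pth'))  # 'h', 't', 'p', ':', '/', '/' are dropped, then 'w' stops the scan
```

With '/' alone the scan stops immediately at 'h', which is why the original call returned the string unchanged.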
If you want everything to the right of a given separator, try split:
url = 'htp://www.abc.com'
print(url.split('//')[1])
Output:
www.abc.com
lstrip only returns a copy of the string with leading characters stripped, not characters in between.
I think you want this:
a = 'htp://www.abc.com'
a = a[a.find('//')+2:]  # skip past both slashes; gives 'www.abc.com'
From the Python docs:
str.lstrip([chars])
Return a copy of the string with leading characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. **The chars argument is not a prefix; rather, all combinations of its values are stripped:**
The last sentence of that passage answers your question.
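A short sketch of why "not a prefix" matters in practice. The URL and the helper remove_prefix are made up for illustration; on Python 3.9 and later the built-in str.removeprefix should do the same job as the helper.

```python
def remove_prefix(s, prefix):
    # Hypothetical helper: remove prefix only if s starts with it as a whole.
    return s[len(prefix):] if s.startswith(prefix) else s


url = 'http://thepath.com'

# lstrip treats 'http://' as the character set {'h', 't', 'p', ':', '/'},
# so it keeps eating into the host name until it reaches 'e'.
stripped = url.lstrip('http://')
print(stripped)

# Removing the prefix as a unit leaves the host name intact.
removed = remove_prefix(url, 'http://')
print(removed)
```

The first result loses the leading "th" of the host, which is the usual surprise when lstrip is used to remove a prefix.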
According to the Python documentation, str.lstrip only removes the leading characters specified in its argument, or whitespace if no characters are provided.
You can try using str.rfind like this:
>>> url = "https://www.google.com"
>>> url[url.rfind('/')+1:]
'www.google.com'
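The rfind approach relies on the URL having no path after the host. As a hedged alternative, the standard library's urllib.parse.urlsplit can pick out the host part directly; the second URL below is an invented example.

```python
from urllib.parse import urlsplit

url = "https://www.google.com"
url_with_path = "https://www.google.com/search?q=strip"  # hypothetical URL with a path

# rfind('/') finds the LAST slash, so a path changes the result.
print(url[url.rfind('/') + 1:])
print(url_with_path[url_with_path.rfind('/') + 1:])

# urlsplit separates scheme, host, path and query, so the host is stable.
print(urlsplit(url).netloc)
print(urlsplit(url_with_path).netloc)
```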

Python: strip() method not removing whitespace from text

I have a problem where what looks like whitespace preceding a string isn't removed using the strip method. This is the script:
text = '"X-DSPAM-Confidence: 0.8475";'
startpos = text.find(":")
endpos = text.find('\";', startpos)
extracted_text = text[startpos+1:endpos]
extracted_text.strip()
print("Substring:",extracted_text)
This returns:
Substring:  0.8475
Assuming that strip() was used correctly, any advice on debugging to identify what is actually printed to screen that appears to be whitespace but isn't?
You have to re-assign the variable:
extracted_text=extracted_text.strip()
Alternatively:
print("Substring:",extracted_text.strip())
str.strip does not work in place; it returns the stripped string.
To isolate the number without the trailing characters, you can combine str.strip and str.split, take the second value, and remove the trailing characters with str.replace:
>>> text.strip().split()[1].replace('";', '')
'0.8475'
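On the debugging part of the question, one common trick is to print repr() of the value, which shows quotes and escape sequences so that invisible characters become visible. A minimal sketch reusing the question's variables (cleaned is a new name introduced here):

```python
text = '"X-DSPAM-Confidence: 0.8475";'
startpos = text.find(":")
endpos = text.find('";', startpos)
extracted_text = text[startpos + 1:endpos]

# repr() puts quotes around the value, so the leading space is easy to spot.
print(repr(extracted_text))

# Strings are immutable: strip() returns a new string and leaves the old one alone.
cleaned = extracted_text.strip()
print(repr(extracted_text))  # unchanged
print(repr(cleaned))         # stripped copy
```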

strip(char) on a string

I am trying to strip the characters '_ ' (underscore and space) away from my string. The first code fails to strip anything.
The code for word_1 works just as I intend. Could anyone enlighten me how to modify the first code to get output 'ale'?
word = 'a_ _ le'
word.strip('_ ')
word_1 = '_ _ le'
word_1.strip('_ ')
You need replace() in this use case, not strip():
word.replace('_ ', '')
strip():
string.strip(s[, chars])
Return a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the both ends of the string this method is called on.
replace():
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.
Strings in Python
.strip removes the given characters from the start and end of the source string; it never touches the middle.
You want .replace.
>>> word = 'a_ _ le'
>>> word = word.replace("_ ", "")
>>> word
'ale'
.strip() is used when the given characters have to be removed from the start and end of a string. It does not work in the middle. For that, use .replace(), as in word.replace('_ ', ''), which returns 'ale'.
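replace('_ ', '') removes the two-character sequence "underscore then space". If the goal is to remove every underscore and every space wherever they occur, one possible sketch uses str.translate with a deletion table; the second input below is an invented example to show the difference.

```python
# Third argument of str.maketrans lists the characters to delete.
delete_table = str.maketrans('', '', '_ ')

word = 'a_ _ le'
print(word.translate(delete_table))

# Invented input where the underscores are not each followed by a space.
other = 'a__ le_'
print(other.replace('_ ', ''))        # only removes the exact pair '_ '
print(other.translate(delete_table))  # removes each '_' and ' ' individually
```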

Split string at capital letter but only if no whitespace

Set-up
I've got a string of names which need to be separated into a list.
Following this answer, I have,
import re
string = 'KreuzbergLichtenbergNeuköllnPrenzlauer Berg'
re.findall('[A-Z][a-z]*', string)
where the last line gives me,
['Kreuzberg', 'Lichtenberg', 'Neuk', 'Prenzlauer', 'Berg']
Problems
1) Whitespace is ignored
'Prenzlauer Berg' is actually 1 name but the code splits according to the 'split-at-capital-letter' rule.
How do I keep it from splitting at a capital letter when the preceding character is whitespace?
2) Special characters not handled well
The code used cannot handle 'ö'. How do I include such 'German' characters?
I.e. I want to obtain,
['Kreuzberg', 'Lichtenberg', 'Neukölln', 'Prenzlauer Berg']
You can use positive and negative lookbehind and just list the Umlauts explicitly:
>>> string = 'KreuzbergLichtenbergNeuköllnPrenzlauer Berg'
>>> re.findall(r'(?<!\s)[A-ZÄÖÜ](?:[a-zäöüß\s]|(?<=\s)[A-ZÄÖÜ])*', string)
['Kreuzberg', 'Lichtenberg', 'Neukölln', 'Prenzlauer Berg']
(?<!\s)...: matches ... that is not preceded by \s
(?<=\s)...: matches ... that is preceded by \s
(?:...): non-capturing group so as to not mess with the findall results
This works:
import re
string = "KreuzbergLichtenbergNeuköllnPrenzlauer Berg"
# Raw string so that \s reaches the regex engine unchanged.
pattern = r"[A-Z][a-ü]+\s[A-Z][a-ü]+|[A-Z][a-ü]+"
re.findall(pattern, string)
# ['Kreuzberg', 'Lichtenberg', 'Neukölln', 'Prenzlauer Berg']
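Another way to look at the problem is to split at every position where a lowercase letter is directly followed by an uppercase one. The sketch below assumes Python 3.7 or later, since re.split only splits on empty matches from that version on, and it lists the German letters explicitly as in the first answer.

```python
import re

string = 'KreuzbergLichtenbergNeuköllnPrenzlauer Berg'

# Zero-width split point: a lowercase letter behind, an uppercase letter ahead.
# A space before 'Berg' is not a lowercase letter, so no split happens there.
names = re.split(r'(?<=[a-zäöüß])(?=[A-ZÄÖÜ])', string)
print(names)
```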

How to check if \n is in a string

I want to remove \n from a string if it is in a string.
I have tried:
slashn = str(chr(92))+"n"
if slashn in newString:
    newerString = newString.replace(slashn,'')
    print(newerString)
else:
    print(newString)
Assume that newString is a word that has \n at the end of it. E.g. text\n.
I have also tried the same code with slashn set to "\\"+"n".
Use str.replace() but with raw string literals:
newString = r"new\nline"
newerString = newString.replace(r"\n", "")
If you put an r right before the quotes enclosing a string literal, it becomes a raw string literal that does not treat any backslash characters as special escape sequences.
Example to clarify raw string literals (output is behind the #> comments):
# Normal string literal: single backslash escapes the 'n' and makes it a new-line character.
print("new\nline")
#> new
#> line
# Normal string literal: first backslash escapes the second backslash and makes it a
# literal backslash. The 'n' won't be escaped and stays a literal 'n'.
print("new\\nline")
#> new\nline
# Raw string literal: All characters are taken literally, the backslash does not have any
# special meaning and therefore does not escape anything.
print(r"new\nline")
#> new\nline
# Raw string literal: All characters are taken literally, no backslash has any
# special meaning and therefore they do not escape anything.
print(r"new\\nline")
#> new\\nline
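Whether "\n" is in the string therefore depends on which of two things the string really holds: a single newline character, or a backslash followed by the letter n. A small sketch with two made-up values shows how to tell them apart and remove each one:

```python
real_newline = "text\n"          # 5 characters, the last one is a newline
literal_backslash_n = r"text\n"  # 6 characters, ending in a backslash and an 'n'

print(len(real_newline), len(literal_backslash_n))

# A raw r"\n" and an escaped "\\n" are the same two-character string.
print(r"\n" == "\\n")

# Each form needs the matching search string.
print("\n" in real_newline, r"\n" in real_newline)
print("\n" in literal_backslash_n, r"\n" in literal_backslash_n)

print(real_newline.replace("\n", ""))
print(literal_backslash_n.replace(r"\n", ""))
```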
You can use the string's strip() method, or strip('\n'). strip is a built-in string method.
Example:
>>>
>>>
>>> """vivek
...
... """
'vivek\n\n'
>>>
>>> """vivek
...
... """.strip()
'vivek'
>>>
>>> """vivek
...
... \n"""
'vivek\n\n\n'
>>>
>>>
>>> """vivek
...
... \n""".strip()
'vivek'
>>>
Look for the help command for a string builtin function strip like this:
>>>
>>> help(''.strip)
Help on built-in function strip:
strip(...)
S.strip([chars]) -> string or unicode
Return a copy of the string S with leading and trailing
whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping
>>>
Use
string_here.rstrip('\n')
to remove the trailing newline.
Try with strip()
your_string.strip("\n") # removes \n before and after the string
If you want to remove the newline from the ends of a string, I'd use .strip(). If no arguments are given, it removes whitespace characters, which includes newlines (\n).
Using .strip():
if newString.endswith('\n'):  # test whether the last character is a newline
    newerString = newString.strip()
    print(newerString)
else:
    print(newString)
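The suggestions above differ in what else they remove. A brief comparison on an invented sample string, printed with repr() so the remaining whitespace is visible:

```python
s = "  line one\nline two\n\n"  # made-up sample with leading spaces and trailing newlines

# rstrip('\n') removes only trailing newlines and keeps the leading spaces.
print(repr(s.rstrip("\n")))

# strip() with no argument removes all leading and trailing whitespace.
print(repr(s.strip()))

# replace() also removes the newline in the middle of the string.
print(repr(s.replace("\n", "")))
```

None of these calls needs the if test first: when there is nothing to remove, they simply return an equal string.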
Another .strip() example (Using Python 2.7.9)
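Also, the newline character can simply be written as "\n" in a normal string literal:
Text = "test.\nNext line."
print(Text)
Output:
test.
Next line.
If the output shows a literal \n instead, the string contains a backslash followed by the letter n (for example because it was written as "\\n" or as a raw literal). That two-character sequence is ordinary text, and strip('\n') will not remove it.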
