lstrip is removing a character I wouldn't expect it to - python

The following code:
s = "www.wired.com"
print s
s = s.lstrip('www.')
print s
outputs:
www.wired.com
ired.com
Note the missing w on the second line. I'm not sure I understand the behavior. I would expect:
www.wired.com
wired.com
EDIT:
Following the first two answers, I now understand the behavior. My question is now: how do I strip the leading www. without touching the rest?

The argument to string.lstrip is treated as a set of characters, not as a prefix:
>>> help(string.lstrip)
Help on function lstrip in module string:
lstrip(s, chars=None)
lstrip(s [,chars]) -> string
Return a copy of the string s with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
>>>
It removes ALL occurrences of those leading characters.
print s.lstrip('w.') # does the same!
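One way to picture what lstrip does with its argument is to walk in from the left and drop characters for as long as each one belongs to the set. This is only a sketch: the helper name lstrip_chars is made up here, and it is not how CPython implements the method. It is written for Python 3.

```python
def lstrip_chars(s, chars):
    """Hypothetical re-implementation of s.lstrip(chars), for illustration only."""
    i = 0
    # Advance past every leading character that is a member of chars.
    while i < len(s) and s[i] in chars:
        i += 1
    return s[i:]

print(lstrip_chars("www.wired.com", "www."))  # ired.com
print(lstrip_chars("www.wired.com", "w."))    # ired.com
```

The order and repetition of characters in the argument make no difference, which is why 'www.' and 'w.' behave identically.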
[EDIT]:
If you wanted to strip the initial www., but only if the string starts with it, you could use a regular expression or something like:
s = s[4:] if s.startswith('www.') else s

According to the documentation:
The chars argument is a string specifying the set of characters to be removed...The chars argument is not a prefix; rather, all combinations of its values are stripped
You would achieve the same result by just saying:
'www.wired.com'.lstrip('w.')
If you wanted something more general, removing the first occurrence of www. wherever it appears in the string, I would do something like this:
i = s.find('www.')
if i >= 0:
    s = s[0:i] + s[i+4:]

To remove the leading www.
>>> import re
>>> s = "www.wired.com"
>>> re.sub(r'^www\.', '', s)
'wired.com'
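If the prefix lives in a variable rather than being hard-coded, the same regex idea should still work, provided the prefix is escaped first. A minimal sketch for Python 3, assuming a helper name (remove_prefix_re) that is not part of the answer above:

```python
import re

def remove_prefix_re(s, prefix):
    """Hypothetical helper: drop prefix from the start of s, if present."""
    # re.escape makes characters such as '.' match literally;
    # '^' anchors the match to the beginning of the string.
    return re.sub('^' + re.escape(prefix), '', s)

print(remove_prefix_re("www.wired.com", "www."))  # wired.com
print(remove_prefix_re("wired.com", "www."))      # wired.com
print(remove_prefix_re("wwwxwired.com", "www."))  # wwwxwired.com
```

On Python 3.9 and later, str.removeprefix covers this case without a regex.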

Related

Python string.rstrip() doesn't strip specified characters

string = "hi())("
string = string.rstrip("abcdefghijklmnoprstuwxyz")
print(string)
I want to remove every letter from the given string using the rstrip method; however, it does not change the string at all.
Output:
'hi())('
What I want:
'())('
I know that I can use regex, but I really don't understand why it doesn't work.
Note : It is a part of the Valid Parentheses challenge on code-wars
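rstrip only removes characters from the right end, and your string ends with (, which is not in the set, so nothing is stripped. The letters are at the left end, so you have to use lstrip instead of rstrip: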
>>> string = "hi())("
>>> string = string.lstrip("abcdefghijklmnoprstuwxyz")
>>> string
'())('
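To see the difference side by side, here is a short Python 3 sketch. It uses string.ascii_lowercase instead of the hand-typed alphabet, which is my substitution (the alphabet in the question leaves out q and v). The second half shows one possible way to drop letters that sit in the middle of a string, which no strip variant will do.

```python
import string

s = "hi())("
letters = string.ascii_lowercase

print(repr(s.rstrip(letters)))  # 'hi())(' : the last character is '(', so nothing is stripped
print(repr(s.lstrip(letters)))  # '())('   : the leading 'h' and 'i' are removed

# Letters in the middle are not touched by any strip variant;
# filtering character by character removes them everywhere.
t = "a(b)c"
print(repr(t.strip(letters)))                              # '(b)'
print(repr("".join(ch for ch in t if ch not in letters)))  # '()'
```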

Python string regular expression

I need to do a string compare to see if 2 strings are equal, like:
>>> x = 'a1h3c'
>>> x == 'a__c'
>>> True
independent of the 3 characters in middle of the string.
You need to use anchors.
>>> import re
>>> x = 'a1h3c'
>>> pattern = re.compile(r'^a.*c$')
>>> pattern.match(x) != None
True
This checks that the first and last characters are a and c, and it does not care about the characters in the middle (there may be any number of them, including none).
If you want to check for exactly three chars to be present at the middle then you could use this,
>>> pattern = re.compile(r'^a...c$')
>>> pattern.match(x) != None
True
Note that the end-of-line anchor $ is important: without $, a...c would also match afoocbarbuz.
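The effect of the trailing anchor can be checked directly. A small Python 3 sketch; the re.fullmatch line is an addition of mine (available from Python 3.4) that requires the whole string to match without writing the anchors by hand:

```python
import re

# re.match only anchors at the start, so a trailing '$' is needed
# to stop the pattern from matching the beginning of a longer string.
print(re.match(r'a...c', 'afoocbarbuz') is not None)   # True
print(re.match(r'a...c$', 'afoocbarbuz') is not None)  # False
print(re.match(r'a...c$', 'a1h3c') is not None)        # True

# fullmatch succeeds only if the entire string fits the pattern.
print(re.fullmatch(r'a...c', 'a1h3c') is not None)        # True
print(re.fullmatch(r'a...c', 'afoocbarbuz') is not None)  # False
```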
Your problem could be solved with string indexing, but if you want an intro to regex, here ya go.
import re
your_match_object = re.match(pattern,string)
the pattern in your case would be
pattern = re.compile("a...c") # the dot denotes any char but a newline
from here, you can see if your string fits this pattern with
print pattern.match("a1h3c") != None
https://docs.python.org/2/howto/regex.html
https://docs.python.org/2/library/re.html#search-vs-match
if str1[0] == str2[0]:
    # do something.
You can repeat this statement as many times as you like.
This is indexing. We're getting the first character. To get the last character, use [-1].
I'll also mention that with indexing, the string can be of any size, as long as you know the relative position from the beginning or the end of the string.
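Put together, an index-based comparison might look like the sketch below (Python 3). The helper name same_ends and the length check are assumptions of mine, added so that strings of different sizes do not compare as equal:

```python
def same_ends(s, template):
    """Hypothetical helper: True if s and template have equal length
    and agree on their first and last characters."""
    if len(s) != len(template) or not s:
        return False
    return s[0] == template[0] and s[-1] == template[-1]

print(same_ends('a1h3c', 'a___c'))  # True
print(same_ends('a1h3c', 'b___c'))  # False
print(same_ends('a1h3c', 'a__c'))   # False (different lengths)
```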

Strip all matching characters from string

Given any of the following strings:
'test'
'test='
'test=='
'test==='
I'd like to run a function on it that will remove any/all '=' characters from the end. Now, I could write something like this in two seconds, in fact, here goes one, and I can imagine a dozen alternative approaches:
def cleanup():
    p = passwd()
    while True:
        new_p = p.rstrip('=')
        if len(new_p) == len(p):
            return new_p
        p = new_p
But I was wondering if anything like that already exists as part of the Python Standard Library?
str.rstrip() already removes all matching characters:
>>> 'test===='.rstrip('=')
'test'
There is no need to loop.
All you need is str.rstrip:
>>> 'test'.rstrip('=')
'test'
>>> 'test='.rstrip('=')
'test'
>>> 'test=='.rstrip('=')
'test'
>>> 'test==='.rstrip('=')
'test'
>>>
From the docs:
str.rstrip([chars])
Return a copy of the string with trailing characters removed.
It should be noted however that str.rstrip only removes characters from the right end of the string. You need to use str.lstrip to remove characters from the left end and str.strip to remove characters from both ends.
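The three variants can be compared on a single string. A quick Python 3 sketch:

```python
s = '==test=='

print(s.rstrip('='))  # ==test
print(s.lstrip('='))  # test==
print(s.strip('='))   # test

# Characters in the interior are never removed by the strip family.
print('a=b=='.rstrip('='))  # a=b
```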

Surprising output of text manipulation when contains '#'

I have a string in python 2.7
s1='path#poss|<-poss<-home->prep->in->pobj->|pobj'
from which I want to strip 'path#' at the beginning.
When I use lstrip it ends up in weird output with an extra 'p' stripped. The output of
s2 = s1.lstrip('path#')
is
'oss|<-poss<-home->prep->in->pobj->|pobj'
instead of
'poss|<-poss<-home->prep->in->pobj->|pobj'
It works perfectly for other examples like:
'path#nsubj|<-nsubj<-leader->prep->of->pobj->|pobj'
which is stripped correctly to:
'nsubj|<-nsubj<-leader->prep->of->pobj->|pobj'
Why is python stripping the extra letter from the string?
This should do it:
prefix_to_strip = 'path#'
s1 = 'path#poss|<-poss<-home->prep->in->pobj->|pobj'
s1 = s1[len(prefix_to_strip):]
strip() doesn't work because it removes any character found in the string you pass to it, in any order and any number of times, rather than that exact sequence.
P.S. If you want to be able to safely apply this to any string (i.e. string that might not start with path#), do this:
if s1.startswith(prefix_to_strip):
    s1 = s1[len(prefix_to_strip):]
or even:
def strip_prefix(prefix, string):
    return string[len(prefix):] if string.startswith(prefix) else string
strip_prefix('foo#', 'foo#bar') # => 'bar'
strip_prefix('foo#', 'hello') # => 'hello'
Excerpt from the documentation of lstrip:
Return a copy of the string with leading characters removed. The chars
argument is a string specifying the set of characters to be removed.
The characters you pass as an argument are treated as a set, and since 'p' is part of it, lstrip also removes the 'p' you are missing.
This will explain everything I believe
s1 = 'pppppppppppppp1pppppppppppppp'
print s1.lstrip("path#")
print s1.rstrip("path#")
print s1.strip("path#")
Output
1pppppppppppppp
pppppppppppppp1
1
No, it's not weird. str.strip doesn't remove a prefix or suffix; it removes all combinations of the characters passed to it.
From docs on str.strip([chars]):
Return a copy of the string with the leading and trailing characters
removed. The chars argument is a string specifying the set of
characters to be removed. If omitted or None, the chars argument
defaults to removing whitespace. The chars argument is not a
prefix or suffix; rather, all combinations of its values are stripped:
And same thing is applicable to str.lstrip and str.rstrip.
Fix:
>>> s1 = 'path#poss|<-poss<-home->prep->in->pobj->|pobj'
>>> if s1.startswith('path#'):
...     s2 = s1[len('path#'):]
...
>>> s2
'poss|<-poss<-home->prep->in->pobj->|pobj'
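Since the same set semantics apply to rstrip, the suffix case has the same pitfall and a similar fix. A minimal Python 3 sketch; strip_suffix is a hypothetical helper of mine, with its argument order chosen to match strip_prefix above:

```python
def strip_suffix(suffix, string):
    """Hypothetical counterpart to a prefix-stripping helper."""
    # The 'suffix and' guard matters: string[:-0] would be an empty string.
    if suffix and string.endswith(suffix):
        return string[:-len(suffix)]
    return string

print('report.txt'.rstrip('.txt'))         # repor (the set also eats the final 't' of 'report')
print(strip_suffix('.txt', 'report.txt'))  # report
print(strip_suffix('.txt', 'notes.md'))    # notes.md
```

On Python 3.9 and later, str.removeprefix and str.removesuffix do this directly.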

Strip in Python

I have a question regarding strip() in Python. I am trying to strip a semi-colon from a string. I know how to do this when the semi-colon is at the end of the string, but how would I do it if it is not the last element but, say, the second-to-last element?
eg:
1;2;3;4;\n
I would like to strip that last semi-colon.
Strip the other characters as well.
>>> '1;2;3;4;\n'.strip('\n;')
'1;2;3;4'
>>> "".join("1;2;3;4;\n".rpartition(";")[::2])
'1;2;3;4\n'
how about replace?
string1='1;2;3;4;\n'
string2=string1.replace(";\n","\n")
>>> string = "1;2;3;4;\n"
>>> string.strip().strip(";")
'1;2;3;4'
This will first strip any leading or trailing white space, and then remove any leading or trailing semicolon.
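Note that the chained call also discards the trailing newline, and it would remove leading semicolons and whitespace too. If the line ending should survive, one possible variant is sketched below (Python 3). The helper name drop_trailing_semicolon is invented for this sketch, and it assumes the only thing that can follow the semicolon is a line ending.

```python
def drop_trailing_semicolon(line):
    """Hypothetical helper: remove one ';' that sits just before the line ending."""
    body = line.rstrip("\r\n")   # the text without its line ending
    ending = line[len(body):]    # whatever line ending was present, possibly ''
    if body.endswith(";"):
        body = body[:-1]
    return body + ending

print(repr(drop_trailing_semicolon("1;2;3;4;\n")))  # '1;2;3;4\n'
print(repr(drop_trailing_semicolon("1;2;3;4")))     # '1;2;3;4'

# For comparison, the chained strip removes more than one character:
print(repr(" ;;1;2;;\n".strip().strip(";")))        # '1;2'
```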
Try this:
def remove_last(string):
    index = string.rfind(';')
    if index == -1:
        # Semi-colon doesn't exist
        return string
    return string[:index] + string[index+1:]
This should be able to remove the last semicolon of the line, regardless of what characters come after it.
>>> remove_last('Test')
'Test'
>>> remove_last('Test;abc')
'Testabc'
>>> remove_last(';test;abc;foobar;\n')
';test;abc;foobar\n'
>>> remove_last(';asdf;asdf;asdf;asdf')
';asdf;asdf;asdfasdf'
The other answers provided are probably faster since they're tailored to your specific example, but this one is a bit more flexible.
You could split the string with semi colon and then join the non-empty parts back again using ; as separator
parts = '1;2;3;4;\n'.split(';')
non_empty_parts = []
for s in parts:
    if s.strip() != "":
        non_empty_parts.append(s.strip())
# join is called on the separator and takes a single iterable
print ';'.join(non_empty_parts)
If you only want to use the strip function this is one method:
Using slice notation, you can limit the strip() function's scope to one part of the string and append the "\n" on at the end:
# create a var for later (named line rather than str, to avoid shadowing the built-in)
line = "1;2;3;4;\n"
# format and assign to newstr
newstr = line[:8].strip(';') + line[8:]
Using the rfind() method (similar to Micheal0x2a's solution) you can make the statement applicable to many strings:
# create a var for later
line = "1;2;3;4;\n"
# format and assign to newstr
newstr = line[:line.rfind(';') + 1].strip(';') + line[line.rfind(';') + 1:]
re.sub(r';(\W*$)', r'\1', '1;2;3;4;\n') -> '1;2;3;4\n'
