How is lstrip() method removing chars from left? [duplicate] - python

My understanding is that the lstrip(arg) removes characters from the left based on the value of arg.
I am executing the following code:
'htp://www.abc.com'.lstrip('/')
Output:
'htp://www.abc.com'
My understanding is that all the characters should be stripped from left until / is reached.
In other words, the output should be:
'www.abc.com'
I am also not sure why running the following code generates the output below:
'htp://www.abc.com'.lstrip('/:pth')
Output:
'www.abc.com'

Calling the help function shows the following:
Help on built-in function lstrip:
lstrip(chars=None, /) method of builtins.str instance
Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
This means that leading (i.e. left-hand) whitespace is removed or, if the chars argument is given, every leading character that appears in chars is removed. chars is treated as a set of individual characters, not as a prefix: if you pass 'abc', stripping continues for as long as the next character is 'a', 'b' or 'c', and stops at the first character that is not one of them. If the string does not start with any of those characters, nothing changes.
The string need not begin with 'abc' as a whole.
>>> print(' the left strip'.lstrip()) # strips off the leading whitespace
the left strip
>>> print('ththe left strip'.lstrip('th')) # strips 't', 'h', 't', 'h' because each is in the set; stops at 'e'
e left strip
>>> print('ththe left strip'.lstrip('left')) # removes only the first 't': 'left' contains 't', but not the 'h' that follows
hthe left strip
>>> print('ththe left strip'.lstrip('zeb')) # doesn't change anything: the first character 't' is not in 'zeb'
ththe left strip
>>> print('ththe left strip'.lstrip('h')) # doesn't change anything: the string starts with 't', not 'h'
ththe left strip
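To make the rule above concrete, here is a minimal sketch of what lstrip(chars) does when chars is given. The helper name lstrip_sketch is made up for illustration; it is not how CPython implements the method, only an equivalent loop.

```python
def lstrip_sketch(text, chars):
    """Illustrative stand-in for text.lstrip(chars) when chars is not None."""
    char_set = set(chars)  # order and repetition inside chars do not matter
    i = 0
    # Skip every leading character that belongs to the set.
    while i < len(text) and text[i] in char_set:
        i += 1
    # Everything from the first character outside the set is kept.
    return text[i:]


url = 'htp://www.abc.com'
print(lstrip_sketch(url, '/'))      # htp://www.abc.com  ('h' is not in the set)
print(lstrip_sketch(url, '/:pth'))  # www.abc.com
```

This is why lstrip('/') leaves the URL alone: the very first character, 'h', is not '/', so the loop stops before removing anything.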

If you want everything to the right of a given substring, try split:
url = 'htp://www.abc.com'
print(url.split('//')[1])
Output:
www.abc.com
lstrip only returns a copy of the string with leading characters stripped; it does not remove anything from the middle.

I think you want this:
a = 'htp://www.abc.com'
a = a[a.find('//')+2:]
(Searching for a single '/' and adding 1 would leave one slash behind: '/www.abc.com'.)
From Python Docs :
str.lstrip([chars])
Return a copy of the string with leading characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix; rather, all combinations of its values are stripped.
The last sentence is the key point and should resolve your doubt.
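If the goal really is to drop a known prefix or everything up to a separator, a short sketch of two standard-library options follows. It assumes Python 3.9 or later for str.removeprefix; the variable names are only illustrative.

```python
url = 'htp://www.abc.com'

# str.partition splits once, at the first occurrence of the separator.
scheme, sep, rest = url.partition('//')
print(rest)                         # www.abc.com

# str.removeprefix (Python 3.9+) removes an exact leading substring.
print(url.removeprefix('htp://'))   # www.abc.com

# Unlike lstrip, the argument is a prefix, so a non-matching one is a no-op.
print(url.removeprefix('//'))       # htp://www.abc.com
```

Both calls treat their argument as a whole substring, which is the behaviour the question expected from lstrip.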

According to the Python documentation, str.lstrip removes only the leading characters specified in its argument, or whitespace if no argument is provided.
You can try using str.rfind like this:
>>> url = "https://www.google.com"
>>> url[url.rfind('/')+1:]
'www.google.com'


Why does sentence.strip() remove certain characters but not others from the end of this string?

Trying to figure out how strip() works when reading characters in a string.
This:
sentence = "All the single ladies"
sentence = sentence.strip("All the si")
print(sentence)
returns this:
ngle lad
I get why 'All the si' is removed from the start of the string. But how does Python decide to remove the 'ies' from the end of the string? If the 'e' is being removed from the 'ies', why isn't it being removed from 'the' too? What are the rules for string stripping behavior?
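.strip() takes a string that it treats as a set of characters to remove, not as a substring. The characters i, e and s all appear in the argument you passed ('All the si'), so they are stripped from the end. The d (now the last character of the result) does not, so stripping stops there. Characters in the middle, such as the e in 'single', are untouched because stripping only works inward from each end and stops at the first character outside the set.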
See more in the docs.
To remove the substring you would use:
sentence.replace("All the si", "")
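To see exactly which characters strip() consumes from each end, the following sketch walks inward with itertools.takewhile. The names allowed, leading and trailing are illustrative only.

```python
from itertools import takewhile

sentence = "All the single ladies"
allowed = set("All the si")  # the argument is treated as a set of characters

# Characters consumed from the left, until one is outside the set ('n').
leading = ''.join(takewhile(allowed.__contains__, sentence))

# Characters consumed from the right, until one is outside the set ('d').
trailing = ''.join(takewhile(allowed.__contains__, reversed(sentence)))[::-1]

print(repr(leading))                        # 'All the si'
print(repr(trailing))                       # 'ies'
print(repr(sentence.strip("All the si")))   # 'ngle lad'
```

Note that replace() removes every occurrence of the substring, not only a leading one; on Python 3.9+ sentence.removeprefix("All the si") limits the removal to the start.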

Python: strip() method not removing whitespace from text

I have a problem where what looks like whitespace preceding a string isn't removed using the strip method. This is the script:
text = '"X-DSPAM-Confidence: 0.8475";'
startpos = text.find(":")
endpos = text.find('\";', startpos)
extracted_text = text[startpos+1:endpos]
extracted_text.strip()
print("Substring:",extracted_text)
This returns:
Substring: 0.8475
Assuming that strip() was used correctly, any advice on debugging to identify what is actually printed to screen that appears to be whitespace but isn't?
You have to re-assign the variable:
extracted_text=extracted_text.strip()
Alternatively:
print("Substring:",extracted_text.strip())
str.strip does not work in place; it returns the stripped string. Strings are immutable, so the original is never modified.
In order to isolate the last number without the trailing characters you can use a combination of str.strip and str.split then get the second value and remove the trailing characters using str.replace:
>>> text.strip().split()[1].replace('";', '')
'0.8475'
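On the debugging part of the question, printing repr() of the value makes otherwise invisible characters visible. A small sketch using the same text as above:

```python
text = '"X-DSPAM-Confidence: 0.8475";'
startpos = text.find(":")
endpos = text.find('";', startpos)
extracted_text = text[startpos + 1:endpos]

print(repr(extracted_text))   # ' 0.8475'  (the leading space is now visible)

extracted_text.strip()        # the stripped copy is discarded
print(repr(extracted_text))   # ' 0.8475'  (unchanged)

extracted_text = extracted_text.strip()  # keep the result this time
print(repr(extracted_text))   # '0.8475'
```

If the character were something other than ordinary whitespace, repr() would show it as an escape sequence, which helps tell the two cases apart.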

Proper replacement of "beginning" non-alphanumeric characters, in python, using regular expressions

NOTE: This post is not the same as the post "Re.sub not working for me".
That post is about matching and replacing ANY non-alphanumeric substring in a string.
This question is specifically about matching and replacing non-alphanumeric substrings that explicitly show up at the beginning of a string.
The following method attempts to match any non-alphanumeric character string "AT THE BEGINNING" of a string and replace it with a new string "BEGINNING_"
def m_getWebSafeString(self, dirtyAttributeName):
    cleanAttributeName = ''.join(dirtyAttributeName)
    # Deal with beginning of string...
    cleanAttributeName = re.sub('^[^a-zA-z]*',"BEGINNING_",cleanAttributeName)
    # Deal with end of string...
    if "BEGINNING_" in cleanAttributeName:
        print ' ** ** ** D: "{}" ** ** ** C: "{}"'.format(dirtyAttributeName, cleanAttributeName)
PROBLEM DESCRIPTION: The method not only replaces non-alphanumeric characters, it also incorrectly inserts the "BEGINNING_" string at the beginning of every string that is passed into it. In other words...
GOOD RESULT: If the method is passed the string *##$ThisIsMyString1, it correctly returns BEGINNING_ThisIsMyString1
BAD/UNWANTED RESULT: However, if the method is passed the string ThisIsMyString2, it incorrectly (and always) inserts the replacement string (BEGINNING_), even though there are no leading non-alphanumeric characters, and yields the result BEGINNING_ThisIsMyString2
MY QUESTION: What is the correct way to write the re.sub() line so that it replaces only the non-alphanumeric characters at the beginning of the string and does not always insert the replacement string at the beginning of the original input string?
You're matching 0 or more instances of non-alphabetic characters by using the * quantifier, which means it'll always be picked up by your pattern. You can replace what you have with
re.sub('^[^a-zA-Z]+', ...)
to ensure that only 1 or more instances are matched.
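Replace
re.sub('^[^a-zA-z]*',"BEGINNING_",cleanAttributeName)
with
re.sub('^[^a-zA-Z]+',"BEGINNING_",cleanAttributeName)
Note that the range is also corrected from A-z to A-Z: in ASCII, A-z additionally spans the punctuation characters [ \ ] ^ _ ` that sit between Z and a, so the original class would not treat them as non-alphabetic.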
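The two points made in these answers, the * versus + quantifier and the A-z versus A-Z range, can be checked with a short sketch. The input '__name' is an invented example chosen to expose the range problem.

```python
import re

samples = ['*##$ThisIsMyString1', 'ThisIsMyString2']

# '*' also matches the empty string at position 0, so it always substitutes.
star = [re.sub(r'^[^a-zA-Z]*', 'BEGINNING_', s) for s in samples]
print(star)   # ['BEGINNING_ThisIsMyString1', 'BEGINNING_ThisIsMyString2']

# '+' requires at least one non-alphabetic character before substituting.
plus = [re.sub(r'^[^a-zA-Z]+', 'BEGINNING_', s) for s in samples]
print(plus)   # ['BEGINNING_ThisIsMyString1', 'ThisIsMyString2']

# The A-z range includes '_', so leading underscores are not matched by it.
wide_range = re.sub(r'^[^a-zA-z]+', 'BEGINNING_', '__name')
narrow_range = re.sub(r'^[^a-zA-Z]+', 'BEGINNING_', '__name')
print(wide_range)    # __name
print(narrow_range)  # BEGINNING_name
```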
There is a more elegant solution. You can use this:
re.sub(r'^\W+', 'BEGINNING_', cleanAttributeName)
\W matches any character that is not a word character; for ASCII text this is equivalent to the class [^a-zA-Z0-9_]. Note that digits and the underscore count as word characters, so they are not matched.
>>> re.sub(r'^\W+', 'BEGINNING_', '##$ThisIsMyString1')
'BEGINNING_ThisIsMyString1'
>>> re.sub(r'^\W+', 'BEGINNING_', 'ThisIsMyString2')
'ThisIsMyString2'
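One caveat worth checking before adopting \W: it is not a drop-in replacement for [^a-zA-Z], because digits and underscores are word characters. A brief sketch with invented inputs:

```python
import re

# A leading underscore is a word character, so \W+ leaves it alone.
underscore_plain = re.sub(r'^\W+', 'BEGINNING_', '_private')
print(underscore_plain)   # _private

# Adding '_' to a character class together with \W catches it.
underscore_class = re.sub(r'^[\W_]+', 'BEGINNING_', '_private')
print(underscore_class)   # BEGINNING_private

# Leading digits are also word characters; [^a-zA-Z]+ would replace them.
digits_w = re.sub(r'^\W+', 'BEGINNING_', '123abc')
digits_alpha = re.sub(r'^[^a-zA-Z]+', 'BEGINNING_', '123abc')
print(digits_w)       # 123abc
print(digits_alpha)   # BEGINNING_abc
```

Which pattern is right depends on whether leading digits and underscores should count as "non-alphanumeric" for the attribute names in question.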

How to use text strip() function?

I can strip numerics but not alpha characters:
>>> text
'132abcd13232111'
>>> text.strip('123')
'abcd'
Why is the following not working?
>>> text.strip('abcd')
'132abcd13232111'
The reason is simple and stated in the documentation of strip:
str.strip([chars])
Return a copy of the string with the leading and trailing characters removed.
The chars argument is a string specifying the set of characters to be removed.
None of the characters in 'abcd' is at the start or end of the string '132abcd13232111' (it begins and ends with digits), so nothing is stripped.
Just to add a few examples to Jim's answer, according to .strip() docs:
Return a copy of the string with the leading and trailing characters removed.
The chars argument is a string specifying the set of characters to be removed.
If omitted or None, the chars argument defaults to removing whitespace.
The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.
So it doesn't matter whether the characters are digits or letters. The main reason your second call didn't work as you expected is that "abcd" is located in the middle of the string.
Example1:
s = '132abcd13232111'
print(s.strip('123'))
print(s.strip('abcd'))
Output:
abcd
132abcd13232111
Example2:
t = 'abcd12312313abcd'
print(t.strip('123'))
print(t.strip('abcd'))
Output:
abcd12312313abcd
12312313
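If the aim is to remove characters from the middle of a string, strip() is the wrong tool. A minimal sketch of two alternatives, str.replace for a whole substring and str.translate for individual characters:

```python
s = '132abcd13232111'

# Remove the substring 'abcd' wherever it occurs.
without_substring = s.replace('abcd', '')
print(without_substring)   # 13213232111

# Remove every occurrence of the characters a, b, c and d individually.
delete_table = str.maketrans('', '', 'abcd')
without_chars = s.translate(delete_table)
print(without_chars)       # 13213232111
```

The two results coincide here only because the letters appear as one contiguous run; on a string like 'a1b2' only translate() would remove anything.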

Replace pairs of characters at start of string with a single character

I only want this done at the start of the string. Some examples (I want to replace "--" with "-"):
"--foo" -> "-foo"
"-----foo" -> "---foo"
"foo--bar" -> "foo--bar"
I can't simply use s.replace("--", "-") because of the third case. I also tried a regex, but I can't get it to work specifically with replacing pairs. I get as far as trying to replace r"^(?:(-){2})+" with r"\1", but that tries to replace the full block of dashes at the start, and I can't figure how to get it to replace only pairs within that block.
Final regex was:
re.sub(r'^(-+)\1', r'\1', "------foo--bar")
^ - match start
(-+) - match at least one -, but...
\1 - an equal number must exist outside the capture group.
and finally, replace with that number of hyphens, effectively cutting the number of hyphens in half.
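The pattern described above can be wrapped in a small function and checked against the examples from the question. The function name collapse_leading_pairs is made up for this sketch.

```python
import re


def collapse_leading_pairs(s):
    """Replace each leading '--' pair with a single '-'."""
    # (-+) captures a run of hyphens and \1 requires the same run again,
    # so the match covers the largest even-length prefix of hyphens.
    # Replacing it with \1 keeps half of that prefix.
    return re.sub(r'^(-+)\1', r'\1', s)


print(collapse_leading_pairs('--foo'))            # -foo
print(collapse_leading_pairs('-----foo'))         # ---foo
print(collapse_leading_pairs('foo--bar'))         # foo--bar
print(collapse_leading_pairs('------foo--bar'))   # ---foo--bar
```

With five leading hyphens the match covers four of them (two plus two), which become two, and the unmatched fifth hyphen is left in place.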
import re
print(re.sub(r'--', '', "--foo"))
print(re.sub(r'--', '', "-----foo"))
Output:
foo
-foo
EDIT: this answer was written for the original question, before it was completely edited and changed.
Here's it all written out for anyone else who comes this way.
>>> import re
>>> foo = '--foo'
>>> bar = '-----foo'
>>> foobar = 'foo--bar'
>>> foobaz = '-----foo--bar'
>>> re.sub('^(-+)\\1', '\\1', foo)
'-foo'
>>> re.sub('^(-+)\\1', '\\1', bar)
'---foo'
>>> re.sub('^(-+)\\1', '\\1', foobar)
'foo--bar'
>>> re.sub('^(-+)\\1', '\\1', foobaz)
'---foo--bar'
The pattern for re.sub() is:
re.sub(pattern, replacement, string)
In this case we want to replace -- with -. The difficulty is that some occurrences of -- must be left alone.
Here we only want to match -- at the beginning of the string. In Python regular expressions, the ^ character at the start of a pattern anchors the match to the beginning of the string (or of each line, if re.MULTILINE is set), which is just what we are looking for.
Note that the ^ character behaves differently when used within square brackets.
Square brackets can be used to indicate a set of chars, so [abc] matches 'a' or 'b' or 'c'... An up-hat (^) at the start of a square-bracket set inverts it, so [^ab] means any char except 'a' or 'b'.
Getting back to the pattern: the parentheses define a "group", which can then be referenced with \\1, meaning the first group. If there were a second set of parentheses, we could reference that sub-pattern with \\2. The extra \ escapes the backslash that follows it, because inside an ordinary string literal a backslash itself has to be escaped. The call can also be written as re.sub(r'^(-+)\1', r'\1', foo); the r prefix makes Python treat the literal as a raw string, so the backslashes do not need to be doubled.
Now that the pattern is all set up, you just make the replacement whatever you want to replace the pattern with, and put in the string that you are searching through.
A link that I keep handy when dealing with regular expressions, is Google's developer's notes on them.
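For comparison, the same result can be reached without a regular expression by counting the leading hyphens with lstrip. This is only a sketch of an alternative, not something proposed in the answers above, and the function name is invented.

```python
def collapse_leading_pairs_no_regex(s):
    """Halve the run of leading hyphens, rounding up for an odd count."""
    body = s.lstrip('-')       # everything after the leading hyphens
    n = len(s) - len(body)     # how many leading hyphens there were
    # Each pair becomes one hyphen; a leftover single hyphen is kept.
    return '-' * ((n + 1) // 2) + body


print(collapse_leading_pairs_no_regex('--foo'))           # -foo
print(collapse_leading_pairs_no_regex('-----foo'))        # ---foo
print(collapse_leading_pairs_no_regex('foo--bar'))        # foo--bar
print(collapse_leading_pairs_no_regex('-----foo--bar'))   # ---foo--bar
```

Here lstrip('-') is used exactly as documented: it removes every leading '-' and stops at the first other character, so hyphens later in the string are unaffected.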
