Strip in Python - python

I have a question regarding strip() in Python. I am trying to strip a semi-colon from a string, I know how to do this when the semi-colon is at the end of the string, but how would I do it if it is not the last element, but say the second to last element.
eg:
1;2;3;4;\n
I would like to strip that last semi-colon.

Strip the other characters as well.
>>> '1;2;3;4;\n'.strip('\n;')
'1;2;3;4'

>>> "".join("1;2;3;4;\n".rpartition(";")[::2])
'1;2;3;4\n'

how about replace?
string1='1;2;3;4;\n'
string2=string1.replace(";\n","\n")

>>> string = "1;2;3;4;\n"
>>> string.strip().strip(";")
"1;2;3;4"
This will first strip any leading or trailing white space, and then remove any leading or trailing semicolon.

Try this:
def remove_last(string):
index = string.rfind(';')
if index == -1:
# Semi-colon doesn't exist
return string
return string[:index] + string[index+1:]
This should be able to remove the last semicolon of the line, regardless of what characters come after it.
>>> remove_last('Test')
'Test'
>>> remove_last('Test;abc')
'Testabc'
>>> remove_last(';test;abc;foobar;\n')
';test;abc;foobar\n'
>>> remove_last(';asdf;asdf;asdf;asdf')
';asdf;asdf;asdfasdf'
The other answers provided are probably faster since they're tailored to your specific example, but this one is a bit more flexible.

You could split the string with semi colon and then join the non-empty parts back again using ; as separator
parts = '1;2;3;4;\n'.split(';')
non_empty_parts = []
for s in parts:
if s.strip() != "": non_empty_parts.append(s.strip())
print "".join(non_empty_parts, ';')

If you only want to use the strip function this is one method:
Using slice notation, you can limit the strip() function's scope to one part of the string and append the "\n" on at the end:
# create a var for later
str = "1;2;3;4;\n"
# format and assign to newstr
newstr = str[:8].strip(';') + str[8:]
Using the rfind() method(similar to Micheal0x2a's solution) you can make the statement applicable to many strings:
# create a var for later
str = "1;2;3;4;\n"
# format and assign to newstr
newstr = str[:str.rfind(';') + 1 ].strip(';') + str[str.rfind(';') + 1:]

re.sub(r';(\W*$)', r'\1', '1;2;3;4;\n') -> '1;2;3;4\n'

Related

Remove variable parts of a string that start and end the same

I have a string as the following:
'1:CH,AG,ME,GS,AP,CH,HE,AC,AC,AG,CA,HE,AT,AT,AC,AT,OG,NE,AG,AC,CS,OD\n&:TA,EB,PA,AC,BR,TH,PO,AC,2I,AC,TH,PE,TH,AZ,AZ,ZE,CS,OD,CH,EO,ZE,OG\n&:TH,ZE,ZE,HE,HE,HP,HP,OG,HP,ZE\n2:ZE,FD,FD,AG,EO,OG,AG,NE,RU,GS,HP,ZE,ZE,HM,HM,PC,PC,AS,AS,TY,TY,AG\n&:AG,GS,NO,EU,ZF,HE,AT,AT,OD,OD,EB,OD,GS,TR,OD,AC,TR,GS,OD,TR,OD,AT,GS\n&:CA,GS,NE,GS,AG,PS,HL,AG,NE,ID,AJ,AX,DI,OD,ME,AT,GS,MU,HO,PB,LT,9Z,PT,9Y\n&:9W,9X,AR,9V,9U,9T,AX,9S,9R,AT,AJ,DI,ST,EA,AG,ME,NE,MU,9Q,9P,9O,9N,9M,9L\n&:9K,ID,MG,OD,FY,AU,AU,HR,HR,9J,TL,9I,9H,9G,9F,AC,BR,AC,9E,9D,9C,9B,99\n'
As you can see, I would like to get the '\n(number or & here):' replaced by ','
Since they all start with '\n' and end with ':' I believe that there should be a way to replace them all at once.
The output would be as the sort:
'CH,AG,ME,GS,AP,CH,HE,AC,AC,AG,CA,HE,AT,AT,AC,AT,OG,NE,AG,AC,CS,OD,TA,EB,PA,AC,BR,TH,PO,AC,2I,AC,TH,PE,TH,AZ,AZ,ZE,CS,OD,CH,EO,ZE,OG,TH,ZE,ZE,HE,HE,HP,HP,OG,HP,ZE,ZE,FD,FD,AG,EO,OG,AG,NE,RU,GS,HP,ZE,ZE,HM,HM,PC,PC,AS,AS,TY,TY,AG,AG,GS,NO,EU,ZF,HE,AT,AT,OD,OD,EB,OD,GS,TR,OD,AC,TR,GS,OD,TR,OD,AT,GS,CA,GS,NE,GS,AG,PS,HL,AG,NE,ID,AJ,AX,DI,OD,ME,AT,GS,MU,HO,PB,LT,9Z,PT,9Y,9W,9X,AR,9V,9U,9T,AX,9S,9R,AT,AJ,DI,ST,EA,AG,ME,NE,MU,9Q,9P,9O,9N,9M,9L,9K,ID,MG,OD,FY,AU,AU,HR,HR,9J,TL,9I,9H,9G,9F,AC,BR,AC,9E,9D,9C,9B,99'
What could work was making a for lop for numbers and &.
string.replace('\n&:',',')
for i in range(1,20):
string.replace('\ni:',',')
But I believe there must be a better way.
You can use regex to get the job done:
Input:
import re
text = '1:CH,AG,ME,GS,AP,CH,HE,AC,AC,AG,CA,HE,AT,AT,AC,AT,OG,NE,AG,AC,CS,OD\n&:TA,EB,PA,AC,BR,TH,PO,AC,2I,AC,TH,PE,TH,AZ,AZ,ZE,CS,OD,CH,EO,ZE,OG\n&:TH,ZE,ZE,HE,HE,HP,HP,OG,HP,ZE\n2:ZE,FD,FD,AG,EO,OG,AG,NE,RU,GS,HP,ZE,ZE,HM,HM,PC,PC,AS,AS,TY,TY,AG\n&:AG,GS,NO,EU,ZF,HE,AT,AT,OD,OD,EB,OD,GS,TR,OD,AC,TR,GS,OD,TR,OD,AT,GS\n&:CA,GS,NE,GS,AG,PS,HL,AG,NE,ID,AJ,AX,DI,OD,ME,AT,GS,MU,HO,PB,LT,9Z,PT,9Y\n&:9W,9X,AR,9V,9U,9T,AX,9S,9R,AT,AJ,DI,ST,EA,AG,ME,NE,MU,9Q,9P,9O,9N,9M,9L\n&:9K,ID,MG,OD,FY,AU,AU,HR,HR,9J,TL,9I,9H,9G,9F,AC,BR,AC,9E,9D,9C,9B,99\n'
text = re.sub(r'\n&*(\d*:)*',',', text[2:]).rstrip(',')
Output:
'CH,AG,ME,GS,AP,CH,HE,AC,AC,AG,CA,HE,AT,AT,AC,AT,OG,NE,AG,AC,CS,OD,TA,EB,PA,AC,BR,TH,PO,AC,2I,AC,TH,PE,TH,AZ,AZ,ZE,CS,OD,CH,EO,ZE,OG,TH,ZE,ZE,HE,HE,HP,HP,OG,HP,ZE,ZE,FD,FD,AG,EO,OG,AG,NE,RU,GS,HP,ZE,ZE,HM,HM,PC,PC,AS,AS,TY,TY,AG,AG,GS,NO,EU,ZF,HE,AT,AT,OD,OD,EB,OD,GS,TR,OD,AC,TR,GS,OD,TR,OD,AT,GS,CA,GS,NE,GS,AG,PS,HL,AG,NE,ID,AJ,AX,DI,OD,ME,AT,GS,MU,HO,PB,LT,9Z,PT,9Y,9W,9X,AR,9V,9U,9T,AX,9S,9R,AT,AJ,DI,ST,EA,AG,ME,NE,MU,9Q,9P,9O,9N,9M,9L,9K,ID,MG,OD,FY,AU,AU,HR,HR,9J,TL,9I,9H,9G,9F,AC,BR,AC,9E,9D,9C,9B,99'
You can use a regular expression replace:
s = '1:CH,AG,ME,GS,AP,CH,HE,AC,AC,AG,CA,HE,AT,AT,AC,AT,OG,NE,AG,AC,CS,OD\n&:TA,EB,PA,AC,BR,TH,PO,AC,2I,AC,TH,PE,TH,AZ,AZ,ZE,CS,OD,CH,EO,ZE,OG\n&:TH,ZE,ZE,HE,HE,HP,HP,OG,HP,ZE\n2:ZE,FD,FD,AG,EO,OG,AG,NE,RU,GS,HP,ZE,ZE,HM,HM,PC,PC,AS,AS,TY,TY,AG\n&:AG,GS,NO,EU,ZF,HE,AT,AT,OD,OD,EB,OD,GS,TR,OD,AC,TR,GS,OD,TR,OD,AT,GS\n&:CA,GS,NE,GS,AG,PS,HL,AG,NE,ID,AJ,AX,DI,OD,ME,AT,GS,MU,HO,PB,LT,9Z,PT,9Y\n&:9W,9X,AR,9V,9U,9T,AX,9S,9R,AT,AJ,DI,ST,EA,AG,ME,NE,MU,9Q,9P,9O,9N,9M,9L\n&:9K,ID,MG,OD,FY,AU,AU,HR,HR,9J,TL,9I,9H,9G,9F,AC,BR,AC,9E,9D,9C,9B,99\n'
s = re.sub(r"(\n\d*?:)|(\n&:)", ",", s).strip() # replaces the middle bits with commas and strips trailing \n
s = re.sub(r"^(\d*?:)|(&:)", "", s) # removes the initial 1: or similar

How to remove words after certain character in a line in python [duplicate]

I have a string. How do I remove all text after a certain character? (In this case ...)
The text after will ... change so I that's why I want to remove all characters after a certain one.
Split on your separator at most once, and take the first piece:
sep = '...'
stripped = text.split(sep, 1)[0]
You didn't say what should happen if the separator isn't present. Both this and Alex's solution will return the entire string in that case.
Assuming your separator is '...', but it can be any string.
text = 'some string... this part will be removed.'
head, sep, tail = text.partition('...')
>>> print head
some string
If the separator is not found, head will contain all of the original string.
The partition function was added in Python 2.5.
S.partition(sep) -> (head, sep, tail)
Searches for the separator sep in S, and returns the part before it,
the separator itself, and the part after it. If the separator is not
found, returns S and two empty strings.
If you want to remove everything after the last occurrence of separator in a string I find this works well:
<separator>.join(string_to_split.split(<separator>)[:-1])
For example, if string_to_split is a path like root/location/child/too_far.exe and you only want the folder path, you can split by "/".join(string_to_split.split("/")[:-1]) and you'll get
root/location/child
Without a regular expression (which I assume is what you want):
def remafterellipsis(text):
where_ellipsis = text.find('...')
if where_ellipsis == -1:
return text
return text[:where_ellipsis + 3]
or, with a regular expression:
import re
def remwithre(text, there=re.compile(re.escape('...')+'.*')):
return there.sub('', text)
import re
test = "This is a test...we should not be able to see this"
res = re.sub(r'\.\.\..*',"",test)
print(res)
Output: "This is a test"
The method find will return the character position in a string. Then, if you want remove every thing from the character, do this:
mystring = "123⋯567"
mystring[ 0 : mystring.index("⋯")]
>> '123'
If you want to keep the character, add 1 to the character position.
From a file:
import re
sep = '...'
with open("requirements.txt") as file_in:
lines = []
for line in file_in:
res = line.split(sep, 1)[0]
print(res)
This is in python 3.7 working to me
In my case I need to remove after dot in my string variable fees
fees = 45.05
split_string = fees.split(".", 1)
substring = split_string[0]
print(substring)
Yet another way to remove all characters after the last occurrence of a character in a string (assume that you want to remove all characters after the final '/').
path = 'I/only/want/the/containing/directory/not/the/file.txt'
while path[-1] != '/':
path = path[:-1]
another easy way using re will be
import re, clr
text = 'some string... this part will be removed.'
text= re.search(r'(\A.*)\.\.\..+',url,re.DOTALL|re.IGNORECASE).group(1)
// text = some string

lstrip is removing a character I wouldn't expect it to

The following code:
s = "www.wired.com"
print s
s = s.lstrip('www.')
print s
outputs:
www.wired.com
ired.com
Note the missing w on the second line. I'm not sure I understand the behavior. I would expect:
www.wired.com
wired.com
EDIT:
Following the first two answers, I now understand the behavior. My question is now: how do I strip the leading www. without touching the rest?
The argument to string.lstrip is a list of characters:
>>> help(string.lstrip)
Help on function lstrip in module string:
lstrip(s, chars=None)
lstrip(s [,chars]) -> string
Return a copy of the string s with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
>>>
It removes ALL occurrences of those leading characters.
print s.lstrip('w.') # does the same!
[EDIT]:
If you wanted to strop the initial www., but only if it started with that, you could use a regular expression or something like:
s = s[4:] if s.startswith('www.') else s
According to the documentation:
The chars argument is a string specifying the set of characters to be removed...The chars argument is not a prefix; rather, all combinations of its values are stripped
You would achieve the same result by just saying:
'www.wired.com'.lstrip('w.')
If you wanted something more general, I would do something like this:
i = find(s, 'www.')
if i >= 0:
s = s[0:i] + s[i+4:]
To remove the leading www.
>>> import re
>>> s = "www.wired.com"
>>> re.sub(r'^www\.', '', s)
'wired.com'

Removing the single quotes after using re.sub() in python

After replacing all word characters in a string with the character '^', using re.sub("\w", "^" , stringorphrase) I'm left with :
>>> '^^^ ^^ ^^^^'
Is there any way to remove the single quotes so it looks cleaner?
>>> ^^^ ^^ ^^^^
Are you sure it's just not how it's displayed in the interactive prompt or something (and there aren't actually apost's in your string)?
If the ' is actually part of the string, and is first/last then either:
string = string.strip("'")
or:
string = string[1:-1] # lop ending characters off
Use the print statement. The quotes aren't actually part of the string.
To remove all occurrences of single quotes:
mystr = some_string_with_single_quotes
answer = mystr.replace("'", '')
To remove single quotes ONLY at the ends of the string:
mystr = some_string_with_single_quotes
answer = mystr.strip("'")
Hope this helps

How to remove all characters after a specific character in python?

I have a string. How do I remove all text after a certain character? (In this case ...)
The text after will ... change so I that's why I want to remove all characters after a certain one.
Split on your separator at most once, and take the first piece:
sep = '...'
stripped = text.split(sep, 1)[0]
You didn't say what should happen if the separator isn't present. Both this and Alex's solution will return the entire string in that case.
Assuming your separator is '...', but it can be any string.
text = 'some string... this part will be removed.'
head, sep, tail = text.partition('...')
>>> print head
some string
If the separator is not found, head will contain all of the original string.
The partition function was added in Python 2.5.
S.partition(sep) -> (head, sep, tail)
Searches for the separator sep in S, and returns the part before it,
the separator itself, and the part after it. If the separator is not
found, returns S and two empty strings.
If you want to remove everything after the last occurrence of separator in a string I find this works well:
<separator>.join(string_to_split.split(<separator>)[:-1])
For example, if string_to_split is a path like root/location/child/too_far.exe and you only want the folder path, you can split by "/".join(string_to_split.split("/")[:-1]) and you'll get
root/location/child
Without a regular expression (which I assume is what you want):
def remafterellipsis(text):
where_ellipsis = text.find('...')
if where_ellipsis == -1:
return text
return text[:where_ellipsis + 3]
or, with a regular expression:
import re
def remwithre(text, there=re.compile(re.escape('...')+'.*')):
return there.sub('', text)
import re
test = "This is a test...we should not be able to see this"
res = re.sub(r'\.\.\..*',"",test)
print(res)
Output: "This is a test"
The method find will return the character position in a string. Then, if you want remove every thing from the character, do this:
mystring = "123⋯567"
mystring[ 0 : mystring.index("⋯")]
>> '123'
If you want to keep the character, add 1 to the character position.
From a file:
import re
sep = '...'
with open("requirements.txt") as file_in:
lines = []
for line in file_in:
res = line.split(sep, 1)[0]
print(res)
This is in python 3.7 working to me
In my case I need to remove after dot in my string variable fees
fees = 45.05
split_string = fees.split(".", 1)
substring = split_string[0]
print(substring)
Yet another way to remove all characters after the last occurrence of a character in a string (assume that you want to remove all characters after the final '/').
path = 'I/only/want/the/containing/directory/not/the/file.txt'
while path[-1] != '/':
path = path[:-1]
another easy way using re will be
import re, clr
text = 'some string... this part will be removed.'
text= re.search(r'(\A.*)\.\.\..+',url,re.DOTALL|re.IGNORECASE).group(1)
// text = some string

Categories